Find common ancestors — findCommonAncestors • ontoProc

Given a set of ontology terms, find their latest common ancestors based on the term hierarchy.

Usage

findCommonAncestors(..., g, remove.self = TRUE, descriptions = NULL)

Arguments

...: One or more (possibly named) character vectors containing ontology terms.
g: A graph object containing the hierarchy of all ontology terms.
remove.self: Logical scalar indicating whether to ignore ancestors whose only descendent term in ... is the ancestor itself.
descriptions: Named character vector containing plain-English descriptions for each term. The names should be the term identifiers and the values the corresponding descriptions.

Value

A DataFrame where each row corresponds to a common ancestor term. It contains the columns number, the number of descendent terms across all vectors in ...; and descendents, a List of DataFrames identifying those descendents. If descriptions is supplied, it also contains the column description, holding the description for each ancestor term.

Details

This function identifies all terms in g that are the latest common ancestor (LCA) of any subset of terms in .... An LCA is one that has no children that have the exact same set of descendent terms in ..., i.e., it is the most specific term for that set of observed descendents. Knowing the LCA is useful for deciding how terms should be rolled up to broader definitions in downstream applications, usually when the exact terms in ... are too specific for practical use.
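
The LCA definition above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy hierarchy, the term names and the helpers ancestors() and children_of() are all hypothetical and exist only for illustration.

```r
# Hypothetical toy hierarchy: each element lists the parents of that term.
parents <- list(
  root   = character(0),
  immune = "root",
  Tcell  = "immune",
  Bcell  = "immune",
  CD4    = "Tcell",
  CD8    = "Tcell"
)
observed <- c("CD4", "CD8", "Bcell")  # stands in for the terms passed via ...

# All ancestors of a term, including the term itself.
ancestors <- function(term) {
  found <- term
  queue <- term
  while (length(queue) > 0) {
    queue <- setdiff(unlist(parents[queue], use.names = FALSE), found)
    found <- c(found, queue)
  }
  found
}

# For every term, the observed terms that it covers (itself or its descendents).
covered <- lapply(names(parents), function(term) {
  hit <- vapply(observed, function(o) term %in% ancestors(o), logical(1))
  sort(observed[hit])
})
names(covered) <- names(parents)

# Direct children of a term.
children_of <- function(term) {
  names(parents)[vapply(parents, function(p) term %in% p, logical(1))]
}

# A term is an LCA if it covers at least one observed term and
# no child covers exactly the same set of observed terms.
is_lca <- vapply(names(parents), function(term) {
  mine <- covered[[term]]
  if (length(mine) == 0) return(FALSE)
  kids <- children_of(term)
  !any(vapply(kids, function(k) identical(covered[[k]], mine), logical(1)))
}, logical(1))
lca <- names(is_lca)[is_lca]

# Mimic remove.self = TRUE: drop LCAs that only cover themselves.
keep <- !vapply(lca, function(term) identical(covered[[term]], term), logical(1))
lca[keep]
# "immune" "Tcell"
```

Here root is not reported because its child immune covers the same three observed terms, so immune is the more specific choice.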

The descendents DataFrame in each row of the output describes the descendents for each LCA, stratified by their presence or absence in each entry of .... This is particularly useful for seeing how different sets of terms would be aggregated into broader terms, e.g., when harmonizing annotation from different datasets or studies. Note that any names for ... will be reflected in the columns of the DataFrame for each LCA.
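
As a rough sketch of what such a stratified table looks like, the base-R snippet below builds a presence/absence table for a single LCA. The input vectors, the term names and the lca_members variable are hypothetical, and a plain data.frame is used here in place of the DataFrame that the function returns.

```r
# Hypothetical named inputs, standing in for the named vectors passed via ...
inputs <- list(
  A = c("CD4", "Bcell"),
  B = c("CD8", "Bcell")
)

# Hypothetical set of observed descendents for one LCA.
lca_members <- c("CD4", "CD8")

# One row per descendent, one logical column per input vector.
membership <- data.frame(
  lapply(inputs, function(terms) lca_members %in% terms),
  row.names = lca_members
)
membership
# CD4 is TRUE only in column A; CD8 is TRUE only in column B.
```

The column names come from the names of the inputs, which mirrors how any names supplied for ... appear in the columns of the per-LCA table.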

Author

Aaron Lun

Examples

co <- getOnto("cellOnto")
#> loading from cache

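# TODO: wrap in utility function.
# 'parents' maps each term to its parent terms.
parents <- co$parents
# Repeat each term once per parent so that 'self' lines up with unlist(parents).
self <- rep(names(parents), lengths(parents))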
library(igraph)
# Each column of the 2-row matrix is one directed edge from parent to child.
g <- make_graph(rbind(unlist(parents), self))

# Selecting random terms:
LCA <- ontoProc:::findCommonAncestors(A=sample(names(V(g)), 20),
   B=sample(names(V(g)), 20), g=g)
#> Warning: The dim() method for DataFrameList objects is deprecated. Please use dims()
#>   on these objects instead.
#> Warning: The nrow() method for DataFrameList objects is deprecated. Please use nrows()
#>   on these objects instead.
#> Warning: The ncol() method for DataFrameList objects is deprecated. Please use ncols()
#>   on these objects instead.

LCA[1,]
#> DataFrame with 1 row and 2 columns
#>               number           descendents
#>            <integer>       <DataFrameList>
#> CL:0002076         2 TRUE:FALSE,FALSE:TRUE
LCA[1,"descendents"][[1]]
#> DataFrame with 2 rows and 2 columns
#>                    A         B
#>            <logical> <logical>
#> CL:0002237      TRUE     FALSE
#> CL:0002332     FALSE      TRUE