Given a set of ontology terms, find their latest common ancestors based on the term hierarchy.
Arguments
- ...
One or more (possibly named) character vectors containing ontology terms.
- g
A graph object containing the hierarchy of all ontology terms.
- remove.self
Logical scalar indicating whether to ignore ancestors whose only descendent among the supplied terms is the ancestor itself.
- descriptions
Named character vector containing plain-English descriptions for each term. Names should be the term identifiers and values the corresponding descriptions.
Value
A DataFrame where each row corresponds to a common ancestor term.
This contains the columns "number", the number of descendent terms across all vectors in ...; and "descendents", a List of DataFrames containing the identities of the descendents.
It may also contain the column "description", containing the description for each term.
Details
This function identifies all terms in g that are the latest common ancestor (LCA) of any subset of terms in the ... arguments.
An LCA is one that has no children that have the exact same set of descendent terms in ..., i.e., it is the most specific term for that set of observed descendents.
Knowing the LCA is useful for deciding how terms should be rolled up to broader definitions in downstream applications,
usually when the exact terms in ... are too specific for practical use.
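The LCA definition above can be sketched on a toy hierarchy using base R only. This is an illustration of the definition, not the package's implementation; the term names, the children list and the observed vector are all hypothetical.

```r
# Hypothetical toy hierarchy: each entry lists the children of the named term.
children <- list(
  root = c("immune", "neuron"),
  immune = "lymphocyte",
  lymphocyte = c("Tcell", "Bcell"),
  neuron = character(0),
  Tcell = character(0),
  Bcell = character(0)
)

# Hypothetical observed terms, standing in for the vectors passed via '...'.
observed <- c("Tcell", "Bcell")

# Collect a term together with everything below it in the hierarchy.
descendants <- function(term) {
  unique(c(term, unlist(lapply(children[[term]], descendants))))
}

# Observed terms found at or below each term.
hits <- lapply(names(children), function(term) {
  sort(intersect(descendants(term), observed))
})
names(hits) <- names(children)

# A term is an LCA if it covers at least one observed term and
# none of its children covers exactly the same set.
is.lca <- vapply(names(children), function(term) {
  mine <- hits[[term]]
  same.as.child <- vapply(children[[term]], function(child) {
    identical(hits[[child]], mine)
  }, logical(1))
  length(mine) > 0 && !any(same.as.child)
}, logical(1))

lca <- names(which(is.lca))
lca
```

In this sketch, root and immune are not LCAs because lymphocyte already covers the same two observed terms. Tcell and Bcell qualify only as ancestors of themselves, which is the case that remove.self is described as ignoring.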
The "descendents" DataFrame in each row of the output describes the descendents for each LCA, stratified by their presence or absence in each vector supplied via the ... arguments.
This is particularly useful for seeing how different sets of terms would be aggregated into broader terms,
e.g., when harmonizing annotation from different datasets or studies.
Note that any names given to the vectors in ... will be used as the column names of the "descendents" DataFrame for each LCA.
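The stratification described above can be sketched as a logical membership table with one column per named input vector. This uses a base R data.frame in place of the DataFrame the function returns; the inputs and the set of observed descendents are hypothetical values chosen for illustration.

```r
# Hypothetical inputs, named as they would be when passed through '...'.
inputs <- list(
  A = c("Tcell", "NK"),
  B = c("Bcell", "NK")
)

# Assumed for illustration: the observed descendents of a single LCA.
observed <- c("Bcell", "NK", "Tcell")

# One logical column per input vector, one row per descendent.
membership <- vapply(inputs, function(x) observed %in% x,
                     logical(length(observed)))
rownames(membership) <- observed
membership <- as.data.frame(membership)

membership
```

Rows that are TRUE in more than one column mark terms shared between inputs, while rows that are TRUE in a single column mark terms that only one dataset would contribute when rolling up to this ancestor.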
Examples
co <- getOnto("cellOnto")
#> loading from cache
# TODO: wrap in utility function.
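# Each element of 'parents' holds the parent terms of the term it is named after.
parents <- co$parents
# Repeat each child term once per parent so it lines up with unlist(parents).
self <- rep(names(parents), lengths(parents))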
library(igraph)
#>
#> Attaching package: ‘igraph’
#> The following object is masked from ‘package:IRanges’:
#>
#> union
#> The following object is masked from ‘package:S4Vectors’:
#>
#> union
#> The following objects are masked from ‘package:BiocGenerics’:
#>
#> normalize, path, union
#> The following objects are masked from ‘package:stats’:
#>
#> decompose, spectrum
#> The following object is masked from ‘package:base’:
#>
#> union
# Build a directed graph with one edge from each parent to its child.
g <- make_graph(rbind(unlist(parents), self))
# Selecting random terms:
LCA <- ontoProc:::findCommonAncestors(A=sample(names(V(g)), 20),
B=sample(names(V(g)), 20), g=g)
#> Warning: The dim() method for DataFrameList objects is deprecated. Please use
#> dims() on these objects instead.
#> Warning: The nrow() method for DataFrameList objects is deprecated. Please use
#> nrows() on these objects instead.
#> Warning: The ncol() method for DataFrameList objects is deprecated. Please use
#> ncols() on these objects instead.
LCA[1,]
#> DataFrame with 1 row and 2 columns
#>               number           descendents
#>            <integer>       <DataFrameList>
#> CL:0002076         2 TRUE:FALSE,FALSE:TRUE
LCA[1,"descendents"][[1]]
#> DataFrame with 2 rows and 2 columns
#>                    A         B
#>            <logical> <logical>
#> CL:0002237      TRUE     FALSE
#> CL:0002332     FALSE      TRUE