celltoprotein -- connecting Cell and Protein Ontologies
Vincent J. Carey, stvjc at channing.harvard.edu
March 25, 2026
Source: vignettes/celltoprotein.Rmd
Introduction
In a pair of papers from the J. Craig Venter Institute, Bakken et al. and Aevermann et al. discuss ontological implications of single-cell transcriptomics. They introduce a process of cell type definition via “necessary and sufficient marker gene” enumeration.
In this vignette we show how the Cell Ontology, Relation Ontology, and Protein Ontology can be connected to assess formal relationships between declared cell types and plasma membrane features that can play a role in cell type definition.
Given a cell type, what proteins are noted as parts of its plasma membrane?
Connect to the Relation Ontology (RO) and search for CURIEs related to “plasma membrane”.
library(ontoProc2)
ro = semsql_connect(ontology="ro")
search_labels(ro, "plasma membrane")
##      subject                           label
## 1 RO:0002104        has plasma membrane part
## 2 RO:0015015 has high plasma membrane amount
## 3 RO:0015016  has low plasma membrane amount
We have a helper resource for finding exact Cell Ontology names of cell types.
data("tag2cn", package="ontoProc2")
cd8reg = grep("CD8-positive.*regulatory", tag2cn, value=TRUE)
cd8reg
## CL:0000795
## "CD8-positive, alpha-beta regulatory T cell"
## CL:0000919
## "CD8-positive, CD25-positive, alpha-beta regulatory T cell"
## CL:0000920
## "CD8-positive, CD28-negative, alpha-beta regulatory T cell"
## CL:0001041
## "CD8-positive, CXCR3-positive, alpha-beta regulatory T cell"
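The lookup above works because `grep(..., value = TRUE)` keeps the names of a named character vector, so each matching label stays attached to its CURIE. The following is a minimal sketch in base R using a hypothetical three-entry stand-in (`toy_tag2cn`) rather than the real `tag2cn` object; the variable names are assumptions made for illustration.

```r
# Toy stand-in for tag2cn: values are labels, names are CURIEs.
# These entries are illustrative only; the real tag2cn is much larger.
toy_tag2cn <- c(
  "CL:0000795" = "CD8-positive, alpha-beta regulatory T cell",
  "CL:0000919" = "CD8-positive, CD25-positive, alpha-beta regulatory T cell",
  "CL:0000084" = "T cell"
)

# value = TRUE returns the matching elements and preserves their names
hits <- grep("CD8-positive.*regulatory", toy_tag2cn, value = TRUE)

# The names of the result are the CURIEs to hand to downstream queries
toy_curies <- names(hits)
print(toy_curies)
```

Extracting `names()` from the match result is what makes it possible to query by identifier rather than by label text.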
With these cell type identifiers, we can search for the proteins identified as “part of plasma membrane”. We query by CURIE rather than by label for precision.
prtab = get_present_pmp(names(cd8reg))
library(DT)
datatable(prtab)
Given a protein, what cell types are asserted to possess it as a membrane part?
We pick two proteins and look for associated cell types.
prs = c("PR:000001094", "PR:000001380")
clk = cells_with_pmp(prs)
datatable(clk)
Some details
The “entailed edge” table of the Semantic SQL representation of Cell Ontology includes all assertions that are derivable from base axioms of the ontology.
cl = semsql_connect(ontology="cl")
cl
## <SemsqlConn> prefix: CL | labeled terms: 22,298
## # Source: table<`entailed_edge`> [?? x 3]
## # Database: sqlite 3.51.2 [/Users/vincentcarey/Library/Caches/org.R-project.R/R/BiocFileCache/40e27456c620_cl.db]
## subject predicate object
## <chr> <chr> <chr>
## 1 UBERON:0001772 rdfs:subClassOf UBERON:0001772
## 2 UBERON:0019190 rdfs:subClassOf UBERON:0019190
## 3 GO:0051034 rdfs:subClassOf GO:0051033
## 4 GO:0051033 rdfs:subClassOf GO:0051033
## 5 GO:1904522 rdfs:subClassOf GO:1904522
## 6 UBERON:0018685 rdfs:subClassOf UBERON:0018685
## 7 GO:0106027 rdfs:subClassOf GO:0106027
## 8 GO:0050679 rdfs:subClassOf GO:0050679
## 9 GO:1901647 rdfs:subClassOf GO:0050679
## 10 GO:1904692 rdfs:subClassOf GO:0050679
## # ℹ more rows
## # Source: SQL [?? x 1]
## # Database: sqlite 3.51.2 [/Users/vincentcarey/Library/Caches/org.R-project.R/R/BiocFileCache/40e27456c620_cl.db]
## n
## <int>
## 1 2966269
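To make the notion of an entailed edge concrete, here is a toy sketch in base R that derives the reflexive and transitive `rdfs:subClassOf` edges from a small set of asserted ones. The `EX:` terms and the `entail_subclass()` helper are hypothetical, and this is not the procedure used to build the Semantic SQL database; it only illustrates why the entailed table is much larger than the set of base axioms and why it contains self-edges like those printed above.

```r
# Hypothetical asserted edges: a three-term subclass chain A -> B -> C
asserted <- data.frame(
  subject = c("EX:A", "EX:B"),
  object  = c("EX:B", "EX:C"),
  stringsAsFactors = FALSE
)

# Reflexive-transitive closure by repeated self-join until no new edges appear
entail_subclass <- function(edges) {
  edges <- edges[, c("subject", "object")]
  terms <- unique(c(edges$subject, edges$object))
  # Start from the asserted edges plus one reflexive edge per term
  closure <- unique(rbind(
    edges,
    data.frame(subject = terms, object = terms, stringsAsFactors = FALSE)
  ))
  repeat {
    # Chain subject -> mid with mid -> object to get subject -> object
    left  <- setNames(closure, c("subject", "mid"))
    right <- setNames(closure, c("mid", "object"))
    step  <- merge(left, right, by = "mid")[, c("subject", "object")]
    grown <- unique(rbind(closure, step))
    if (nrow(grown) == nrow(closure)) break
    closure <- grown
  }
  closure[order(closure$subject, closure$object), ]
}

entailed <- entail_subclass(asserted)
print(entailed)
```

Two asserted edges yield six entailed ones here (three self-edges, the two asserted edges, and the derived `EX:A` to `EX:C`), which is the same kind of growth reflected in the row count above.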
We can look for statements that have “RO:0002104” as predicate: