repair nomenclature mismatches (to curated term set) in a vector of terms
Source: R/clfixer.R
nomenCheckup.Rd
Arguments
- cand
character vector of candidate terms
- namedOffic
named character vector of curated terms; the names are treated as tags and are intended to be identifiers in curated ontologies
- n
numeric(1) number of nearest neighbors to return
- tagcolname
character(1) prefix used to name columns for tags in output
- ...
passed to adist
Value
a data.frame with 2n+1 columns: column 1 is the candidate, and the
remaining n pairs of columns are (term, tag) for the n nearest neighbors,
as measured by adist.
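The nearest-neighbor lookup described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper name `nearestTerms` and the toy curated vector `offic` are hypothetical, and the `hit` column prefix is assumed from the example output on this page. The sketch relies on `utils::adist`, which computes generalized Levenshtein distances between two character vectors.

```r
# Hypothetical helper (an assumption, not the code in R/clfixer.R):
# rank curated terms by edit distance and keep the n closest per candidate.
nearestTerms <- function(cand, namedOffic, n = 1, tagcolname = "tag", ...) {
  # Distance matrix: one row per candidate, one column per curated term
  d <- utils::adist(cand, namedOffic, ...)
  out <- data.frame(cand = cand, stringsAsFactors = FALSE)
  for (k in seq_len(n)) {
    # Index of the k-th nearest curated term for each candidate
    idx <- apply(d, 1, function(row) order(row)[k])
    out[[paste0("hit", k)]] <- unname(namedOffic[idx])        # matched term
    out[[paste0(tagcolname, k)]] <- names(namedOffic)[idx]    # its tag
  }
  out
}

# Toy curated set; terms and tags are copied from the example output on this page
offic <- c("CLO:0009994" = "JHH-7",
           "CLO:0004303" = "HuT 102",
           "CLO:0004087" = "Hs 739.T")
nearestTerms(c("JHH7", "HUT102"), offic, n = 2, tagcolname = "clo")
```

In this sketch, extra arguments such as `ignore.case = TRUE` are forwarded to `adist` through `...`, which may help when candidates differ from curated terms only in letter case.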
Examples
candidates = c("JHH7", "HUT102", "HS739T", "NCIH716")
# the candidates are cell line names returned in the text dump from
# https://portals.broadinstitute.org/ccle/page?gene=AHR
# note that one must travel to the third nearest neighbor
# to find the match (and tag) for Hs 739.T
# in this example, we compare to cell line names in Cell Line Ontology
nomenCheckup(candidates, cleanCLOnames(), n=3, tagcolname="clo")
#> loading from cache
#>      cand     hit1        clo1     hit2        clo2     hit3        clo3
#> 1    JHH7    JHH-7 CLO:0009994     FH 7 CLO:0003207       HH CLO:0003744
#> 2  HUT102  HuT 102 CLO:0004303    FC102 CLO:0003020   HCC202 CLO:0003649
#> 3  HS739T   Hs 3.T CLO:0003921  Hs 39.T CLO:0003941 Hs 739.T CLO:0004087
#> 4 NCIH716 NCI-H716 CLO:0008108 NCI-H711 CLO:0008107 NCI-H719 CLO:0008109