R/clfixer.R
nomenCheckup.Rd
Repair nomenclature mismatches in a vector of terms by matching them against a curated term set
nomenCheckup(cand, namedOffic, n = 1, tagcolname = "tag", ...)
cand: character vector of candidate terms
namedOffic: named character vector of curated terms; the names are regarded as tags and are intended to be identifiers in curated ontologies
n: numeric(1) number of nearest neighbors to return
tagcolname: character(1) prefix used to name the tag columns in the output
...: additional arguments passed to adist
a data.frame with 2n+1 columns: column 1 (cand) holds the candidate, and the remaining n pairs of columns (hit1, <tagcolname>1, ..., hitn, <tagcolname>n in the example below) give the term and tag of each of the n nearest neighbors, as measured by adist.
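To make the documented return shape concrete, here is a minimal base-R sketch of the matching step. It assumes nomenCheckup simply ranks the curated terms by adist distance for each candidate; the function name nomen_checkup_sketch and the toy terms and tags are hypothetical, and the package's actual implementation may differ (for example in preprocessing or tie-breaking).

```r
# Hypothetical sketch of the behaviour documented above; NOT the package's
# implementation. Uses only base R (adist() lives in the utils package).
nomen_checkup_sketch <- function(cand, namedOffic, n = 1, tagcolname = "tag", ...) {
  stopifnot(is.character(cand), is.character(namedOffic),
            !is.null(names(namedOffic)), n >= 1, n <= length(namedOffic))
  # generalized Levenshtein distances: rows = candidates, cols = curated terms;
  # extra arguments (e.g. costs, ignore.case) go straight to adist()
  d <- utils::adist(cand, namedOffic, ...)
  d <- matrix(d, nrow = length(cand))  # keep a matrix even for one candidate
  out <- data.frame(cand = cand, stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    # index of the i-th closest curated term for each candidate;
    # order() is stable, so ties keep the order of namedOffic
    idx <- apply(d, 1, function(row) order(row)[i])
    out[[paste0("hit", i)]] <- unname(namedOffic[idx])      # matched term
    out[[paste0(tagcolname, i)]] <- names(namedOffic)[idx]  # its tag
  }
  out
}

# toy curated terms; the tags T1..T3 are made up for illustration
offic <- c(T1 = "JHH-7", T2 = "HuT 102", T3 = "FH 7")
res <- nomen_checkup_sketch(c("JHH7", "HUT102"), offic, n = 2)
print(res)
```

With n = 2 the result has 2n+1 = 5 columns (cand, hit1, tag1, hit2, tag2), mirroring the layout described above with the default tagcolname.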
candidates <- c("JHH7", "HUT102", "HS739T", "NCIH716")
# the candidates are cell line names taken from the text dump at
# https://portals.broadinstitute.org/ccle/page?gene=AHR
# note that the match (and tag) for HS739T, Hs 739.T, appears only as
# the third nearest neighbor
# here we compare against cell line names from the Cell Line Ontology (CLO)
nomenCheckup(candidates, cleanCLOnames(), n=3, tagcolname="clo")
#> loading from cache
#>      cand     hit1        clo1     hit2        clo2     hit3        clo3
#> 1    JHH7    JHH-7 CLO:0009994     FH 7 CLO:0003207       HH CLO:0003744
#> 2  HUT102  HuT 102 CLO:0004303    FC102 CLO:0003020   HCC202 CLO:0003649
#> 3  HS739T   Hs 3.T CLO:0003921  Hs 39.T CLO:0003941 Hs 739.T CLO:0004087
#> 4 NCIH716 NCI-H716 CLO:0008108 NCI-H711 CLO:0008107 NCI-H719 CLO:0008109
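The HS739T row can be checked directly with base R's adist. Assuming nomenCheckup compares the strings as given, with adist's default unit costs (the call above passes no extra arguments), the three reported hits turn out to be equally distant from the candidate:

```r
# edit distances from the candidate to its three reported hits,
# using utils::adist() with default insertion/deletion/substitution costs
hits <- c("Hs 3.T", "Hs 39.T", "Hs 739.T")
d <- setNames(as.vector(utils::adist("HS739T", hits)), hits)
print(d)  # each distance is 3
```

Because the distances tie, the third-place position of Hs 739.T presumably reflects how equal distances are ordered rather than a worse match; adist arguments such as costs, passed through `...`, change the distances and may change the ranking.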