repair nomenclature mismatches in a vector of terms by matching them against a curated term set

nomenCheckup(cand, namedOffic, n = 1, tagcolname = "tag", ...)

Arguments

cand

character vector of candidate terms

namedOffic

named character vector of curated terms; the names are treated as tags and are intended to be identifiers in curated ontologies

n

numeric(1) number of nearest neighbors to return

tagcolname

character(1) prefix used to name columns for tags in output

...

passed to adist

Value

a data.frame instance with 2n+1 columns: column 1 is the candidate, and the remaining n pairs of columns are (term, tag) for the n nearest neighbors as measured by adist.
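
The layout of the returned value can be illustrated with a minimal sketch built directly on utils::adist. This is not the package's implementation: the helper name nearestTerms and the column prefix "hit" (taken from the example output) are assumptions made for illustration, and ties in distance are simply broken by the order of the curated vector.

```r
# Hypothetical helper, NOT the package's code: ranks curated terms by
# generalized Levenshtein distance (utils::adist) to each candidate.
nearestTerms <- function(cand, namedOffic, n = 1, tagcolname = "tag", ...) {
  stopifnot(is.character(cand), is.character(namedOffic),
            !is.null(names(namedOffic)),
            n >= 1, n <= length(namedOffic))
  # rows correspond to candidates, columns to curated terms;
  # extra arguments (e.g. ignore.case = TRUE) are forwarded to adist
  d <- utils::adist(cand, namedOffic, ...)
  out <- data.frame(cand = cand, stringsAsFactors = FALSE)
  for (k in seq_len(n)) {
    # index of the k-th closest curated term for every candidate
    kth <- vapply(seq_along(cand),
                  function(i) order(d[i, ])[k],
                  integer(1))
    out[[paste0("hit", k)]] <- unname(namedOffic[kth])
    out[[paste0(tagcolname, k)]] <- names(namedOffic)[kth]
  }
  out
}

# toy curated set: three (tag, term) pairs taken from the example output
curated <- c("CLO:0009994" = "JHH-7",
             "CLO:0004303" = "HuT 102",
             "CLO:0004087" = "Hs 739.T")
res <- nearestTerms(c("JHH7", "HUT102"), curated, n = 2, tagcolname = "clo")
```

With n = 2 the sketch yields 2 * 2 + 1 = 5 columns: cand, hit1, clo1, hit2, clo2.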

Examples

candidates = c("JHH7", "HUT102", "HS739T", "NCIH716")
# the candidates are cell line names returned in the text dump from
# https://portals.broadinstitute.org/ccle/page?gene=AHR
# note that one must travel to the third nearest neighbor
# to find the match (and tag) for Hs 739.T
# in this example, we compare to cell line names in Cell Line Ontology
nomenCheckup(candidates, cleanCLOnames(), n=3, tagcolname="clo")
#> loading from cache
#>      cand     hit1        clo1     hit2        clo2     hit3        clo3
#> 1    JHH7    JHH-7 CLO:0009994     FH 7 CLO:0003207       HH CLO:0003744
#> 2  HUT102  HuT 102 CLO:0004303    FC102 CLO:0003020   HCC202 CLO:0003649
#> 3  HS739T   Hs 3.T CLO:0003921  Hs 39.T CLO:0003941 Hs 739.T CLO:0004087
#> 4 NCIH716 NCI-H716 CLO:0008108 NCI-H711 CLO:0008107 NCI-H719 CLO:0008109