Retrieve the top-k semantically closest EDAM terms per type
Source: R/embed.R
Embeds content using the provider recorded in edam_emb,
computes cosine similarity against a pre-computed EDAM embedding matrix,
and returns the top retrieve_k candidates per EDAM type
(topic, operation, data, format).
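The ranking step described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: top_k_cosine and the toy matrix are hypothetical names introduced here, and applying the similarity threshold before truncating to k is an assumption. The real function also repeats the ranking once per EDAM type.

```r
# Hypothetical helper (not part of the package): rank the rows of an
# embedding matrix by cosine similarity to a query vector.
top_k_cosine <- function(query, emb_matrix, k = 5L, threshold = 0.3) {
  # Scale the query and every row to unit length so that the dot
  # product equals the cosine similarity.
  q <- query / sqrt(sum(query^2))
  m <- emb_matrix / sqrt(rowSums(emb_matrix^2))
  sims <- as.vector(m %*% q)
  names(sims) <- rownames(emb_matrix)
  # Assumption: drop low-similarity rows first, then keep the best k.
  sims <- sims[sims >= threshold]
  head(sort(sims, decreasing = TRUE), k)
}

# Toy 2-dimensional "embeddings"; row names are placeholders, not EDAM ids.
toy <- rbind(
  term_a = c(1, 0),
  term_b = c(1, 1),
  term_c = c(0, 1),
  term_d = c(-1, 0)
)
top_k_cosine(c(1, 0), toy, k = 2L)
```

With the query pointing along the first axis, term_a and term_b survive the 0.3 threshold, while term_c (similarity 0) and term_d (similarity -1) are dropped.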
Usage
retrieve_edam_candidates(
content,
edam_emb,
retrieve_k = 75L,
sim_threshold = 0.3,
embed_model = edam_emb$model
)
Arguments
- content
character(1) text to use as the query (e.g. a package description).
- edam_emb
list as returned by
get_edam_embeddings or make_edam_embeddings.
- retrieve_k
integer(1) number of candidates to keep per type.
- sim_threshold
numeric(1) minimum cosine similarity; candidates below this value are dropped before the LLM selection step. Defaults to 0.3.
- embed_model
character(1) embedding model; must match the model used to build
edam_emb. Defaults to edam_emb$model.
Value
named list of data.frames (topic, operation, data, format), each
with columns id and lbl, ordered by descending cosine
similarity to content.
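As a rough illustration of working with a return value of this shape, the sketch below builds a mock list by hand and stacks it into one data.frame. The ids and labels are placeholders rather than real EDAM terms, and the names mock, n_per_type, parts, and combined are introduced here only for the example.

```r
# Mock result with the documented shape; ids and labels are placeholders.
mock <- list(
  topic     = data.frame(id = c("topic_x", "topic_y"),
                         lbl = c("Label X", "Label Y")),
  operation = data.frame(id = "operation_x", lbl = "Label Z"),
  data      = data.frame(id = character(0), lbl = character(0)),
  format    = data.frame(id = "format_x", lbl = "Label W")
)

# Number of candidates retained per type (a type may come back empty).
n_per_type <- vapply(mock, nrow, integer(1))

# Stack the four data.frames into one, adding a `type` column.
parts <- Map(function(df, type) {
  data.frame(type = rep(type, nrow(df)), df, stringsAsFactors = FALSE)
}, mock, names(mock))
combined <- do.call(rbind, unname(parts))
rownames(combined) <- NULL
combined
```

Because each element is already ordered by descending similarity, stacking in this way preserves the within-type ranking.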
Examples
# Model mismatch is caught before any API call
emb <- get_edam_embeddings()
#> Loading bundled EDAM embeddings from /private/var/folders/yw/gfhgh7k565v9w83x_k764wbc0000gp/T/RtmpbnJ5Gy/temp_libpathed7e7dd7ef17/biocEDAM/demo_embedding/edam_embeddings.rds
tryCatch(
  retrieve_edam_candidates("some text", emb,
                           embed_model = "text-embedding-3-large"),
  error = function(e) conditionMessage(e)
)
#> [1] "embed_model 'text-embedding-3-large' does not match the artifact model 'text-embedding-3-small'.\nRun make_edam_embeddings(model = 'text-embedding-3-large') to generate a matching artifact."
if (interactive()) {
  candidates <- retrieve_edam_candidates(
    "RNA-seq transcript quantification and metadata management",
    emb, retrieve_k = 5L)
  candidates$topic
}