Connects to the current EDAM SemanticSQL release, fetches term labels and definitions, embeds them using the specified provider, and saves the result to outfile. The saved object can be submitted to AnnotationHub or loaded directly via get_edam_embeddings.

Usage

make_edam_embeddings(
  outfile = file.path(tempdir(), "edam_embeddings.rds"),
  model = "text-embedding-3-small",
  provider = "openai"
)

Arguments

outfile

character(1) path for the output .rds file. Defaults to edam_embeddings.rds in tempdir().

model

character(1) embedding model identifier. For provider="openai" use e.g. "text-embedding-3-small"; for provider="huggingface" use a HuggingFace model ID such as "FremyCompany/BioLORD-2023-C".

provider

character(1) embedding provider: "openai" (default) or "huggingface". The API key environment variable for the chosen provider must be set (see llm_env_var).

Value

Invisibly returns the embedding list, which has the same structure as the AnnotationHub resource returned by get_edam_embeddings.

Examples

# This is a maintainer-level function; requires a provider API key and
# the EDAM SemanticSQL database (ontoProc2::semsql_connect).
# See inst/scripts/make_edam_embeddings.R for a worked example.
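
A minimal sketch of a call with the "huggingface" provider, using only the arguments documented above. It assumes the package that provides make_edam_embeddings is already attached and that the provider's API key variable is set (see llm_env_var). The output file name and the exists() guard are illustrative choices, not part of the package, and reading the file back with readRDS assumes the saved object matches the returned one.

```r
# Hypothetical output path; any writable .rds path should work.
out <- file.path(tempdir(), "edam_embeddings_biolord.rds")

# Guard (illustrative only): skip the call when the function is not
# available, e.g. when the providing package has not been attached.
if (exists("make_edam_embeddings", mode = "function")) {
  emb <- make_edam_embeddings(
    outfile = out,
    model = "FremyCompany/BioLORD-2023-C",
    provider = "huggingface"
  )

  # The value is returned invisibly, so assign it (as above) to inspect it.
  str(emb, max.level = 1)

  # outfile is an ordinary .rds file, so base R can read it back.
  # Assumption: the saved object is the same list that was returned.
  emb_from_disk <- readRDS(out)
}
```

Because the result is returned invisibly, nothing is printed at the console unless the value is assigned or wrapped in parentheses.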