Use Anh Vu's OpenAI prompting to develop structured metadata about Bioconductor packages, targeting the EDAM ontology and the bio.tools schema

Usage

edamize(content_for_edam, temp = 0, prescrub = TRUE)

Arguments

content_for_edam: character(1) a URL for documentation originating from the developer
temp: numeric(1) temperature setting for OpenAI chat, see `https://gptcache.readthedocs.io/en/latest/bootcamp/temperature/chat.html`; defaults to 0.0
prescrub: logical(1) if TRUE, apply the cleantxt function to the input before trying to assign EDAM tags; defaults to TRUE

Value

a list with components 'topic' and 'function', which can be converted to a data.frame using `mkdf`

Note

This function is not deterministic. In the provided example the input is a fixed text, yet the final result can be NULL, a data frame with 12 rows, or a data frame with 14 rows. More work is needed to make the output more predictable.
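
Because a call may return NULL, one way to cope is to retry a bounded number of times. The following is a minimal sketch under stated assumptions, not part of biocEDAM: `retry_non_null` and `make_flaky` are hypothetical names introduced here, and `make_flaky` is only a stand-in for a non-deterministic call so that the sketch runs without an API key.

```r
# Hypothetical helper (not part of biocEDAM): call `f` until it returns
# a non-NULL value or `max_tries` attempts have been used.
retry_non_null <- function(f, max_tries = 3L) {
  stopifnot(is.function(f), max_tries >= 1L)
  for (i in seq_len(max_tries)) {
    res <- f()
    if (!is.null(res)) {
      return(list(value = res, tries = i))
    }
  }
  # Every attempt returned NULL; report that explicitly.
  list(value = NULL, tries = as.integer(max_tries))
}

# Stand-in for a non-deterministic call: NULL on the first call,
# a placeholder list on later calls.
make_flaky <- function() {
  n <- 0L
  function() {
    n <<- n + 1L
    if (n < 2L) NULL else list(topic = "placeholder")
  }
}

out <- retry_non_null(make_flaky(), max_tries = 3L)
out$tries
```

With a valid key set, the same wrapper could presumably be applied as `retry_non_null(function() edamize(content$focus))`. Each attempt would be a fresh call to `edamize`, so keeping `max_tries` small seems prudent.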

Examples

if (interactive()) {
  key = Sys.getenv("OPENAI_API_KEY")
  if (nchar(key)==0) stop("need to have OPENAI_API_KEY set")
  # avoid repetitious reprocessing of tximeta vignette
  # content = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html")
  content = readRDS(system.file("rds/tximetaFocused.rds", package="biocEDAM"))
  str(content)
  lk = edamize(content$focus)
  if (is.null(lk)) lk = edamize(content$focus)  # sometimes a second try is needed
  print(mkdf(lk))
  # try content derived from a pdf vignette
  # content2 = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/IRanges/inst/doc/IRangesOverview.pdf")
  content2 = readRDS(system.file("rds/IRangesOVdata.rds", package="biocEDAM"))
  lk2 = edamize(content2$focus)
  if (is.null(lk2)) lk2 = edamize(content2$focus)  # retry once, as for the first example
  print(mkdf(lk2))
}