Use Anh Vu's OpenAI prompting to develop structured metadata about Bioconductor packages, targeting the EDAM ontology and the bio.tools schema
Source: R/edamize.R
edamize.Rd
Arguments
- content_for_edam
character(1) a URL for doc originating from the developer
- temp
numeric(1) temperature setting for the OpenAI chat, see `https://gptcache.readthedocs.io/en/latest/bootcamp/temperature/chat.html`; defaults to 0.0
- prescrub
logical(1) if TRUE, apply the `cleantxt` function to the input before trying to assign EDAM tags; defaults to TRUE
effort in the Python operations in inst/curbioc; defaults to 1
Value
a list with components 'topic' and 'function', which can be converted to a data.frame using `mkdf`
Note
This function is not deterministic. For the provided example, the input to the function is a fixed text, but the output at the end can be NULL, a data frame with 12 rows, or a data frame with 14 rows. More work is needed to achieve greater predictability.
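Because a call can return NULL, one way to cope is to wrap it in a bounded retry loop. The following is a minimal sketch, not part of biocEDAM: `retry_until_non_null` and `make_flaky` are hypothetical names, and `make_flaky` is only a stand-in that mimics a call returning NULL on its first attempts.

```r
# Hypothetical helper (not part of biocEDAM): call `f` repeatedly until it
# returns a non-NULL value, giving up after `max_tries` attempts.
retry_until_non_null <- function(f, max_tries = 3L) {
  stopifnot(is.function(f), max_tries >= 1L)
  for (i in seq_len(max_tries)) {
    result <- f()
    if (!is.null(result)) {
      return(list(value = result, tries = i))
    }
  }
  # Every attempt returned NULL.
  list(value = NULL, tries = max_tries)
}

# Stand-in for a non-deterministic call: NULL twice, then a value.
make_flaky <- function() {
  calls <- 0L
  function() {
    calls <<- calls + 1L
    if (calls < 3L) NULL else "tagged"
  }
}

out <- retry_until_non_null(make_flaky(), max_tries = 5L)
```

With the package loaded, the same wrapper could presumably be used as `retry_until_non_null(function() edamize(content$focus))`. Each attempt would be a separate OpenAI request, so a small `max_tries` seems prudent. Retrying only addresses the NULL case; it does not make the number of returned rows reproducible.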
Examples
if (interactive()) {
  key = Sys.getenv("OPENAI_API_KEY")
  if (nchar(key) == 0) stop("need to have OPENAI_API_KEY set")
  # avoid repetitious reprocessing of tximeta vignette
  # content = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html")
  content = readRDS(system.file("rds/tximetaFocused.rds", package = "biocEDAM"))
  str(content)
  lk = edamize(content$focus)
  if (is.null(lk)) lk = edamize(content$focus)  # sometimes a second try is needed
  print(mkdf(lk))
  # try content derived from a pdf vignette
  # content2 = vig2data("https://bioconductor.org/packages/release/bioc/vignettes/IRanges/inst/doc/IRangesOverview.pdf")
  content2 = readRDS(system.file("rds/IRangesOVdata.rds", package = "biocEDAM"))
  lk2 = edamize(content2$focus)
  mkdf(lk2)
}