This package investigates tools that can help clarify the relationship between biocViews and the EDAM ontology.
Installation
Use BiocManager::install('vjcitn/biocEDAM') to install this package.
The DESCRIPTION file of the package defines Python requirements.
When functions requiring python are called, a virtual environment will be created by reticulate to resolve references to modules in use.
Note that some functions require a valid OpenAI API key that is readable through the environment variable OPENAI_API_KEY. When this is absent, those functions will fail.
Charges to the associated OpenAI account will accrue when these functions are called. All the experimental work underlying the development of the code in March 2025 produced charges of $1.34. You can examine your charges at platform.openai.com/usage.
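The OPENAI_API_KEY requirement described above can be checked before any GPT-backed function is called. The following is a minimal sketch in base R; have_openai_key is a hypothetical helper introduced here for illustration and is not part of biocEDAM.

```r
# Hypothetical helper (not a biocEDAM function): returns TRUE when a
# non-empty OPENAI_API_KEY is visible to the current R session.
have_openai_key <- function() {
  nzchar(Sys.getenv("OPENAI_API_KEY", unset = ""))
}

if (!have_openai_key()) {
  # Warn early instead of letting a GPT-backed call fail later.
  message("OPENAI_API_KEY is not set; functions that call OpenAI will fail.")
}
```

In a typical R setup the key can be supplied for a single session with Sys.setenv(OPENAI_API_KEY = "...") or persistently through an .Renviron file.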
Purpose
biocViews is an ad hoc vocabulary in the form of a graphNEL instance with over 400 terms. EDAM is an OWL model of a vocabulary devoted to concepts of data analysis and data management in the biosciences. This package unites these two resources with the objective of permitting exploration that will lead to formal ontological tagging of all Bioconductor software and data packages and workflows.
Tools
vig2data
This function uses rvest, pdftools, and gpt-4o via ellmer to transform content (typically referenced via URL for Bioconductor vignettes, see examples) to structured data about authors, topics (determined ad libitum by GPT), and a component, focused, which is a concise summary of content up to 450 words. This component will be used in edamize.
edamize
This function calls Python code by Anh Nguyet Vu that prompts GPT to select and organize terms from EDAM on the basis of relevance to the supplied text (focused from vig2data in the intended application). Terms from the topic, operation, data, and format components of EDAM may be returned.
bvbrowse
This function starts a shiny app that presents term-filtered sets of packages and their views annotation.
allmap
The allmap data.frame is the output of text2term applied to biocViews terms for evaluation of similarity to terms in the EDAM ontology.
> head(allmap)
                              Source Term ID    Source Term
0 http://ccb.hms.harvard.edu/t2t/RFzhTje9ucG      BiocViews
1 http://ccb.hms.harvard.edu/t2t/RFzhTje9ucG      BiocViews
2 http://ccb.hms.harvard.edu/t2t/R3hqXkeJtkt       Software
3 http://ccb.hms.harvard.edu/t2t/R4dWXrwrX3W AnnotationData
4 http://ccb.hms.harvard.edu/t2t/R4dWXrwrX3W AnnotationData
5 http://ccb.hms.harvard.edu/t2t/R4dWXrwrX3W AnnotationData
     Mapped Term Label   Mapped Term CURIE
0     GenomeReviews ID      EDAM.DATA:2751
1                 BioC    EDAM.FORMAT:3782
2 Software engineering     EDAM.TOPIC:3372
3           Annotation      EDAM.DATA:2018
4           Annotation EDAM.OPERATION:0226
5          Gene report      EDAM.DATA:0916
                         Mapped Term IRI Mapping Score Tags
0      http://edamontology.org/data_2751         0.459 None
1    http://edamontology.org/format_3782         0.380 None
2     http://edamontology.org/topic_3372         0.721 None
3      http://edamontology.org/data_2018         0.805 None
4 http://edamontology.org/operation_0226         0.805 None
5      http://edamontology.org/data_0916         0.703 None
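
One way the mapping scores might be used is to shortlist the stronger candidate mappings and group them by EDAM component. The sketch below is illustrative only: it rebuilds the six rows shown above as a small stand-in data.frame, and the object name allmap_head, the simplified column names, and the 0.7 cutoff are assumptions made for this example, not part of the package.

```r
# Stand-in for head(allmap); values are copied from the listing above.
# Column names are simplified for this sketch and may differ from the
# names in the real allmap object.
allmap_head <- data.frame(
  source_term  = c("BiocViews", "BiocViews", "Software",
                   "AnnotationData", "AnnotationData", "AnnotationData"),
  mapped_label = c("GenomeReviews ID", "BioC", "Software engineering",
                   "Annotation", "Annotation", "Gene report"),
  mapped_curie = c("EDAM.DATA:2751", "EDAM.FORMAT:3782", "EDAM.TOPIC:3372",
                   "EDAM.DATA:2018", "EDAM.OPERATION:0226", "EDAM.DATA:0916"),
  score        = c(0.459, 0.380, 0.721, 0.805, 0.805, 0.703),
  stringsAsFactors = FALSE
)

# Derive the EDAM component (data, format, topic, operation) from the CURIE,
# e.g. "EDAM.TOPIC:3372" becomes "topic".
allmap_head$component <- tolower(
  sub("^EDAM\\.([A-Z]+):.*$", "\\1", allmap_head$mapped_curie)
)

cutoff <- 0.7  # hypothetical threshold chosen only for illustration

# Keep mappings at or above the cutoff, highest score first.
strong <- allmap_head[allmap_head$score >= cutoff, ]
strong <- strong[order(-strong$score), ]

print(strong[, c("source_term", "mapped_label", "component", "score")])
```

A single biocViews term can map to several EDAM components (AnnotationData appears under both data and operation in the rows above), so any cutoff-based shortlist would still need manual review.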