Skip to contents

This package investigates tools that can help understand the relationship of biocViews in relation to the EDAM ontology.

Installation

  • Use BiocManager::install('vjcitn/biocEDAM') to install this package.

  • The DESCRIPTION file of the package defines python requirements.

  • When functions requiring python are called, a virtual environment will be created by reticulate to resolve references to modules in use.

Note that some functions require that a valid OpenAI API key is readable through the environment variable OPENAI_API_KEY. When this is absent, functions will simply fail.

Charges to the associated OpenAI account will accrue when these functions are called. All the experimental work underlying the development of the code in March 2025 produced charges of $1.34. You can examine your charges at platform.openai.com/usage.

Purpose

biocViews is an ad hoc vocabulary in the form of a graphNEL instance with over 400 terms. EDAM is an OWL model of a vocabulary devoted to concepts of data analysis and data management in the biosciences. This package unites these two resources with the objective of permitting exploration that will lead to formal ontological tagging of all Bioconductor software and data packages and workflows.

Tools

vig2data

This function uses rvest, pdftools, and gpt-4o via ellmer to transform content (typically referenced via URL for Bioconductor vignettes, see examples) to structured data about authors, topics (determined ad libitum by GPT), and a component focused which is a concise summary of content up to 450 words. This component will be used in edamize.

edamize

This function calls python code of Anh Nguyet Vu that prompts GPT to select and organize terms from EDAM on the basis of relevance to the supplied text (focused from vig2data in the intended application). Terms from the topic, operation, data, and format components of EDAM may be returned.

bvbrowse

This function starts a shiny app that presents term-filtered sets of packages and their views annotation.

allmap

The allmap data.frame is the output of text2term applied to biocViews terms for evaluation of similarity to terms in the EDAM ontology.

> head(allmap)
                              Source Term ID    Source Term
0 http://ccb.hms.harvard.edu/t2t/RFzhTje9ucG      BiocViews
1 http://ccb.hms.harvard.edu/t2t/RFzhTje9ucG      BiocViews
2 http://ccb.hms.harvard.edu/t2t/R3hqXkeJtkt       Software
3 http://ccb.hms.harvard.edu/t2t/R4dWXrwrX3W AnnotationData
4 http://ccb.hms.harvard.edu/t2t/R4dWXrwrX3W AnnotationData
5 http://ccb.hms.harvard.edu/t2t/R4dWXrwrX3W AnnotationData
     Mapped Term Label   Mapped Term CURIE
0     GenomeReviews ID      EDAM.DATA:2751
1                 BioC    EDAM.FORMAT:3782
2 Software engineering     EDAM.TOPIC:3372
3           Annotation      EDAM.DATA:2018
4           Annotation EDAM.OPERATION:0226
5          Gene report      EDAM.DATA:0916
                         Mapped Term IRI Mapping Score Tags
0      http://edamontology.org/data_2751         0.459 None
1    http://edamontology.org/format_3782         0.380 None
2     http://edamontology.org/topic_3372         0.721 None
3      http://edamontology.org/data_2018         0.805 None
4 http://edamontology.org/operation_0226         0.805 None
5      http://edamontology.org/data_0916         0.703 None