The oc2bioc package (currently at github.com/vjcitn/oc2bioc) incorporates the open-cravat python modules. This workshop takes advantage of R functions and interfaces defined in that package to illustrate the programmatic use of OpenCRAVAT in R/Bioconductor.
Case studies of the use of the OpenCRAVAT GUI are published in a Journal of Clinical Oncology article (Pagel et al. (2020)).
We acquire the metadata for all OpenCRAVAT ‘modules’ in real time using populate_module_set
.
library(oc2bioc)
modset = populate_module_set()
modset
#> OpenCRAVATModuleSet, created Fri Sep 6 20:21:16 2024
#> There are 392 modules.
#> Module types/counts:
#> aggregator annotator common converter group
#> 1 236 2 8 7
#> mapper package postaggregator reporter webapp
#> 2 5 4 8 1
#> webviewerwidget
#> 118
We transformed the module set object to a searchable HTML table:
The queryOC
function uses the OpenCRAVAT REST API to acquire information on a single position in GRCh38 coordinates. This function requires that a registered username and password be supplied. Visit run.opencravat.org
to set a username and password.
An example API response is available in var_in_tx
in the oc2bioc
package. This was returned by the call queryOC(chr="chr7", pos="140753336", annotators=c("pubmed", "segway_breast", "chasmplus_BRCA"))
var_in_tx
#> Response [https://run.opencravat.org/submit/annotate?chrom=chr7&pos=140753336&ref_base=A&alt_base=T&annotators=chasmplus_BRCA,pubmed,segway_breast]
#> Date: 2020-09-07 10:33
#> Status: 200
#> Content-Type: application/json; charset=utf-8
#> Size: 1.31 kB
We use httr::content
to explore this result.
A peek at the result:
names(vcon)
#> [1] "pubmed" "segway_breast" "chasmplus_BRCA" "crx"
str(head(vcon,3))
#> List of 3
#> $ pubmed :List of 2
#> ..$ n : int 3194
#> ..$ term: chr "http://www.ncbi.nlm.nih.gov/pubmed?term=BRAF[TIAB] AND cancer[MH]"
#> $ segway_breast :List of 2
#> ..$ breast_myoepithelial_cells: chr "Transcribed"
#> ..$ breast_vhmec : chr "Transcribed"
#> $ chasmplus_BRCA:List of 4
#> ..$ score : num 0.159
#> ..$ transcript: chr "NM_004333.4"
#> ..$ pval : num 0.0307
#> ..$ results : chr "*NM_004333.4:(0.159:0.0307)"
Click here for the PubMed references related to the gene harboring the query variant.
The crx
component of the response provides information on variant impacts at the transcript level.
nl = rjson::fromJSON(vcon$crx$all_mappings)
DT::datatable(data.frame(do.call(rbind, unlist(nl, recursive=FALSE))))
We’ll use TCGA as a source of realistic variant sets. The curatedTCGAData package will be used.
library(curatedTCGAData)
suppressMessages({
acc = curatedTCGAData("ACC", assays="Mutation", dry.run=FALSE,
cache=BiocFileCache::bfccache(BiocFileCache::BiocFileCache(ask=FALSE)),
version="2.0.1")
})
eacc = experiments(acc)[[1]]
eacc
#> class: RaggedExperiment
#> dim: 20166 90
#> assays(47): Hugo_Symbol Entrez_Gene_Id ... Trna_alt1 Trna_alt2
#> rownames: NULL
#> colnames(90): TCGA-OR-A5J1-01A-11D-A29I-10 TCGA-OR-A5J2-01A-11D-A29I-10
#> ... TCGA-PK-A5HB-01A-11D-A29I-10 TCGA-PK-A5HC-01A-11D-A30A-10
#> colData names(0):
muts = as(eacc, "GRangesList")
sum(elementNROWS(muts))
#> [1] 20166
DT::datatable(as.data.frame(head(muts[[1]][,1:4])))
Notice that there is a variant with width 24. We will focus on single-nucleotide variants (SNVs).
We will use a function based on the TnT package to visualize variant locations in the context of gene-like features.
Here’s a 1Mb slice:
TnTdemo(acc, viewstart=6.4e7, viewend=6.5e7)
Now we drill down a bit, to a 0.1Mb region near MAP4K2:
TnTdemo(acc, viewstart=6.455e7, viewend=6.465e7)
We use the function make_oc_POSTable
to transform a GRanges instance into a data.frame that can be saved to a file and submitted to the OpenCRAVAT API.
mdf = make_oc_POSTable(snvs)
head(mdf)
#> chr pos ref alt samp var
#> 1 1 11561526 G A TCGA-OR-A5J1-01A-11D-A29I-10 v_1
#> 2 1 12309384 T G TCGA-OR-A5J1-01A-11D-A29I-10 v_2
#> 3 1 33820015 C T TCGA-OR-A5J1-01A-11D-A29I-10 v_3
#> 4 1 152800122 C T TCGA-OR-A5J1-01A-11D-A29I-10 v_4
#> 5 1 152800131 C T TCGA-OR-A5J1-01A-11D-A29I-10 v_5
#> 6 1 173499091 C A TCGA-OR-A5J1-01A-11D-A29I-10 v_6
The data.frame must be formatted for ingestion by OpenCRAVAT. The following call to write.table will accomplish this.
tf = tempfile()
write.table(mdf, file=tf, sep="\t", col.names=FALSE,
row.names=FALSE, quote=FALSE)
head(read.delim(tf, h=FALSE, sep="\t"))
#> V1 V2 V3 V4 V5 V6
#> 1 1 11561526 G A TCGA-OR-A5J1-01A-11D-A29I-10 v_1
#> 2 1 12309384 T G TCGA-OR-A5J1-01A-11D-A29I-10 v_2
#> 3 1 33820015 C T TCGA-OR-A5J1-01A-11D-A29I-10 v_3
#> 4 1 152800122 C T TCGA-OR-A5J1-01A-11D-A29I-10 v_4
#> 5 1 152800131 C T TCGA-OR-A5J1-01A-11D-A29I-10 v_5
#> 6 1 173499091 C A TCGA-OR-A5J1-01A-11D-A29I-10 v_6