vignettes/BiocOpenCRAVAT.Rmd
The oc2bioc package (currently at github.com/vjcitn/oc2bioc) incorporates the open-cravat Python modules. This workshop uses R functions and interfaces defined in that package to illustrate the programmatic use of OpenCRAVAT in R/Bioconductor.
Case studies of the use of the OpenCRAVAT GUI are published in a JCO Clinical Cancer Informatics article (Pagel et al. (2020)).
We acquire the metadata for all OpenCRAVAT ‘modules’ in real time using `populate_module_set`.
```r
library(oc2bioc)
modset = populate_module_set()
modset
#> OpenCRAVATModuleSet, created Wed Sep 30 03:13:37 2020
#> There are 293 modules.
#> Module types/counts:
#>      aggregator       annotator          common       converter           group
#>               1             186               2               7               6
#>          mapper  postaggregator        reporter webviewerwidget
#>               1               3               6              81
```
The module set object can also be transformed to a searchable HTML table.
The `queryOC` function uses the OpenCRAVAT REST API to acquire information on a single position in GRCh38 coordinates. This function requires that a registered username and password be supplied. Visit run.opencravat.org to set a username and password.
An example API response is available in `var_in_tx` in the oc2bioc package. This was returned by the call `queryOC(chr="chr7", pos="140753336", annotators=c("pubmed", "segway_breast", "chasmplus_BRCA"))`.
```r
var_in_tx
#> Response [https://run.opencravat.org/submit/annotate?chrom=chr7&pos=140753336&ref_base=A&alt_base=T&annotators=chasmplus_BRCA,pubmed,segway_breast]
#>   Date: 2020-09-07 10:33
#>   Status: 200
#>   Content-Type: application/json; charset=utf-8
#>   Size: 1.31 kB
```
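The request URL recorded in the response above suggests how a single-position query is laid out. The following is a minimal base R sketch that reassembles that URL from its parts. The helper name `build_annotate_url` is hypothetical and is not part of oc2bioc; the alphabetical ordering of annotators is an assumption made only to reproduce the order seen in the recorded URL, and authentication is not handled here.

```r
# Hypothetical helper (not an oc2bioc function): assemble the query URL
# for a single variant from its components.
build_annotate_url <- function(chrom, pos, ref_base, alt_base, annotators,
                               base = "https://run.opencravat.org/submit/annotate") {
  query <- c(
    chrom      = chrom,
    pos        = as.character(pos),
    ref_base   = ref_base,
    alt_base   = alt_base,
    # Assumption: annotators are joined with commas in alphabetical order,
    # matching the recorded URL.
    annotators = paste(sort(annotators), collapse = ",")
  )
  paste0(base, "?", paste(names(query), query, sep = "=", collapse = "&"))
}

url <- build_annotate_url("chr7", "140753336", "A", "T",
                          c("pubmed", "segway_breast", "chasmplus_BRCA"))
print(url)
```

No request is sent by this sketch; it only shows which query parameters (`chrom`, `pos`, `ref_base`, `alt_base`, `annotators`) appear in the recorded call.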
We use `httr::content` to parse the JSON body of this response into a nested R list, referred to below as `vcon`.
A peek at the result:
```r
names(vcon)
#> [1] "pubmed"         "segway_breast"  "chasmplus_BRCA" "crx"
str(head(vcon,3))
#> List of 3
#>  $ pubmed        :List of 2
#>   ..$ n   : int 3194
#>   ..$ term: chr "http://www.ncbi.nlm.nih.gov/pubmed?term=BRAF[TIAB] AND cancer[MH]"
#>  $ segway_breast :List of 2
#>   ..$ breast_myoepithelial_cells: chr "Transcribed"
#>   ..$ breast_vhmec              : chr "Transcribed"
#>  $ chasmplus_BRCA:List of 4
#>   ..$ score     : num 0.159
#>   ..$ transcript: chr "NM_004333.4"
#>   ..$ pval      : num 0.0307
#>   ..$ results   : chr "*NM_004333.4:(0.159:0.0307)"
```
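Since the parsed response is an ordinary nested list, individual annotator results can be pulled out with `$`. The sketch below uses a stand-in list, `vcon_mock`, whose values are copied from the `str` output above (the `crx` component is omitted); it is an illustration of list access, not a call to the API.

```r
# Stand-in for the parsed response; values copied from the str() output above.
vcon_mock <- list(
  pubmed = list(
    n = 3194L,
    term = "http://www.ncbi.nlm.nih.gov/pubmed?term=BRAF[TIAB] AND cancer[MH]"
  ),
  segway_breast = list(
    breast_myoepithelial_cells = "Transcribed",
    breast_vhmec = "Transcribed"
  ),
  chasmplus_BRCA = list(
    score = 0.159,
    transcript = "NM_004333.4",
    pval = 0.0307,
    results = "*NM_004333.4:(0.159:0.0307)"
  )
)

# Extract the CHASMplus score and p-value for the query variant.
chasm <- vcon_mock$chasmplus_BRCA
chasm_summary <- c(score = chasm$score, pval = chasm$pval)
print(chasm_summary)

# List the fields that each annotator returned.
fields_by_annotator <- lapply(vcon_mock, names)
print(fields_by_annotator)
```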
The `term` field of the `pubmed` component is a PubMed query URL for references related to the gene harboring the query variant (BRAF).
The `crx` component of the response provides information on variant impacts at the transcript level.
```r
nl = rjson::fromJSON(vcon$crx$all_mappings)
DT::datatable(data.frame(do.call(rbind, unlist(nl, recursive=FALSE))))
```
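The `unlist(..., recursive=FALSE)` followed by `do.call(rbind, ...)` idiom used above removes one level of nesting and then stacks the remaining elements as rows. The sketch below shows the idiom on a small made-up list, `nl_mock`; its gene names, field values and four-field layout are placeholders chosen for illustration and are not the actual structure or content of `all_mappings`.

```r
# Made-up nested list: one entry per gene, each holding one character
# vector per transcript-level mapping. All values are placeholders.
nl_mock <- list(
  GENE1 = list(
    c("protA", "p.X1Y", "missense", "tx1"),
    c("protB", "p.X2Y", "missense", "tx2")
  ),
  GENE2 = list(
    c("protC", "", "intron", "tx3")
  )
)

# Drop the gene level only, leaving a flat list of per-transcript vectors;
# element names are derived from the gene-level names.
flat <- unlist(nl_mock, recursive = FALSE)

# Stack the vectors as rows of a matrix, then convert to a data.frame.
mapping_df <- data.frame(do.call(rbind, flat))
print(mapping_df)
```

This only works cleanly when every mapping has the same number of fields; `rbind` would otherwise recycle shorter vectors with a warning.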
We use TCGA as a source of realistic variant sets, accessed through the curatedTCGAData package. Here we retrieve the mutation data for adrenocortical carcinoma (ACC).
```r
library(curatedTCGAData)
suppressMessages({
  acc = curatedTCGAData("ACC", assays="Mutation", dry.run=FALSE,
    cache=BiocFileCache::bfccache(BiocFileCache::BiocFileCache(ask=FALSE)))
})
eacc = experiments(acc)[[1]]
eacc
#> class: RaggedExperiment
#> dim: 20166 90
#> assays(47): Hugo_Symbol Entrez_Gene_Id ... Trna_alt1 Trna_alt2
#> rownames: NULL
#> colnames(90): TCGA-OR-A5J1-01A-11D-A29I-10 TCGA-OR-A5J2-01A-11D-A29I-10
#>   ... TCGA-PK-A5HB-01A-11D-A29I-10 TCGA-PK-A5HC-01A-11D-A30A-10
#> colData names(0):
muts = as(eacc, "GRangesList")
sum(elementNROWS(muts))
#> [1] 20166
DT::datatable(as.data.frame(head(muts[[1]][,1:4])))
```
We will use a function based on the TnT package to visualize variant locations in the context of gene-like features.
Here’s a 1Mb slice:
```r
library(BiocOpenCRAVAT)
TnTdemo(acc, viewstart=6.4e7, viewend=6.5e7)
```
Now we drill down a bit, to a 0.1Mb region near MAP4K2:
```r
TnTdemo(acc, viewstart=6.455e7, viewend=6.465e7)
```
We use the function `make_oc_POSTable` to transform a GRanges instance into a data.frame that can be saved to a file and submitted to the OpenCRAVAT API.
```r
mdf = make_oc_POSTable(snvs)
head(mdf)
#>   chr       pos ref alt                         samp var
#> 1   1  11561526   G   A TCGA-OR-A5J1-01A-11D-A29I-10 v_1
#> 2   1  12309384   T   G TCGA-OR-A5J1-01A-11D-A29I-10 v_2
#> 3   1  33820015   C   T TCGA-OR-A5J1-01A-11D-A29I-10 v_3
#> 4   1 152800122   C   T TCGA-OR-A5J1-01A-11D-A29I-10 v_4
#> 5   1 152800131   C   T TCGA-OR-A5J1-01A-11D-A29I-10 v_5
#> 6   1 173499091   C   A TCGA-OR-A5J1-01A-11D-A29I-10 v_6
```
The data.frame must be formatted for ingestion by OpenCRAVAT. The following call to write.table will accomplish this.
```r
tf = tempfile()
write.table(mdf, file=tf, sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)
head(read.delim(tf, header=FALSE, sep="\t"))
#>   V1        V2 V3 V4                           V5  V6
#> 1  1  11561526  G  A TCGA-OR-A5J1-01A-11D-A29I-10 v_1
#> 2  1  12309384  T  G TCGA-OR-A5J1-01A-11D-A29I-10 v_2
#> 3  1  33820015  C  T TCGA-OR-A5J1-01A-11D-A29I-10 v_3
#> 4  1 152800122  C  T TCGA-OR-A5J1-01A-11D-A29I-10 v_4
#> 5  1 152800131  C  T TCGA-OR-A5J1-01A-11D-A29I-10 v_5
#> 6  1 173499091  C  A TCGA-OR-A5J1-01A-11D-A29I-10 v_6
```
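The same `write.table` settings can be checked on a small stand-in table without any Bioconductor objects. In the sketch below, `mdf_mock` holds the first three rows of the `head(mdf)` output shown earlier; the object name and the `.tsv` file extension are illustrative choices. Reading back with `colClasses = "character"` is a suggested precaution rather than something the original workflow does: without it, `read.delim` type-converts columns, so a column containing only `T` alleles would be read as logical.

```r
# Stand-in for the table to be submitted; rows copied from head(mdf) above.
mdf_mock <- data.frame(
  chr  = c("1", "1", "1"),
  pos  = c(11561526L, 12309384L, 33820015L),
  ref  = c("G", "T", "C"),
  alt  = c("A", "G", "T"),
  samp = "TCGA-OR-A5J1-01A-11D-A29I-10",
  var  = c("v_1", "v_2", "v_3"),
  stringsAsFactors = FALSE
)

# Write tab-separated text with no header, no row names and no quoting.
tf_mock <- tempfile(fileext = ".tsv")
write.table(mdf_mock, file = tf_mock, sep = "\t",
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# Inspect the raw lines: each should hold six tab-separated fields.
raw_lines <- readLines(tf_mock)
n_fields <- lengths(strsplit(raw_lines, "\t", fixed = TRUE))
print(raw_lines[1])

# Read back as character so alleles and chromosome names are kept verbatim.
back <- read.delim(tf_mock, header = FALSE, sep = "\t",
                   colClasses = "character")
print(back)
```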
Pagel, Kymberleigh A., Rick Kim, Kyle Moad, Ben Busby, Lily Zheng, Collin Tokheim, Michael Ryan, and Rachel Karchin. 2020. “Integrated Informatics Analysis of Cancer-Related Variants.” JCO Clinical Cancer Informatics, no. 4: 310–17. https://doi.org/10.1200/cci.19.00132.