vignettes/htxlook.Rmd
suppressPackageStartupMessages({ library(BiocStyle) })
Building on Sean Davis’ BigRNA project, in which more than 181,000 RNA-seq experiments from NCBI SRA were processed with a salmon-based workflow, we converted the gene-level quantifications to HDF5 and loaded them into the HDF Scalable Data Service (HSDS).
The restfulSE package uses rhdf5client to implement a DelayedArray/SummarizedExperiment interface to this collection of transcriptomes.
suppressPackageStartupMessages({
  library(HumanTranscriptomeCompendium)
  library(SummarizedExperiment)
})
htx = htx_load()
#> Loading required namespace: BiocFileCache
#> using temporary cache /tmp/RtmpL5IWjw/BiocFileCache
#> adding RDS to local cache, future invocations will use local image
#> adding rname 'https://s3.amazonaws.com/bcfound-bigrna/rangedHtxGeneSE.rds'
htx
#> class: RangedSummarizedExperiment
#> dim: 58288 181134
#> metadata(1): rangeSource
#> assays(1): counts
#> rownames(58288): ENSG00000000003.14 ENSG00000000005.5 ...
#>   ENSG00000284747.1 ENSG00000284748.1
#> rowData names(0):
#> colnames(181134): DRX001125 DRX001126 ... SRX999990 SRX999991
#> colData names(4): experiment_accession experiment_platform
#>   study_accession study_title
system.time(lka <- assay(htx))
#> Loading required package: rhdf5client
#>    user  system elapsed
#>   0.183   0.000   0.184
lka
#> <58288 x 181134> matrix of class DelayedMatrix and type "double":
#>                       DRX001125     DRX001126     DRX001127 ...   SRX999990
#> ENSG00000000003.14    40.001250   1322.844547   1528.257578   .   1149.0341
#>  ENSG00000000005.5     0.000000      9.999964      6.000006   .      0.0000
#> ENSG00000000419.12    64.000031   1456.004418   2038.996875   .   1485.0003
#> ENSG00000000457.13    31.814591   1583.504257   1715.041308   .    631.7751
#> ENSG00000000460.16    12.430602    439.321234    529.280324   .    945.6903
#>                ...            .             .             .   .           .
#>  ENSG00000284744.1   1.05614505   24.81388079   32.29261298   .    7.316061
#>  ENSG00000284745.1   0.99999879   15.99996994   16.99999743   .    0.000000
#>  ENSG00000284746.1   0.00000000    0.00379458    0.00000000   .    0.000000
#>  ENSG00000284747.1   7.77564984  270.83296409  239.88056843   .  108.011633
#>  ENSG00000284748.1   1.00000768   22.23010514   37.73881938   .   11.278980
#>                       SRX999991
#> ENSG00000000003.14    1430.3955
#>  ENSG00000000005.5       0.0000
#> ENSG00000000419.12    1970.0004
#> ENSG00000000457.13     802.0563
#> ENSG00000000460.16    1259.7648
#>                ...            .
#>  ENSG00000284744.1     3.268453
#>  ENSG00000284745.1     0.000000
#>  ENSG00000284746.1     0.000000
#>  ENSG00000284747.1    94.606851
#>  ENSG00000284748.1     5.240970
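The assay above is a DelayedMatrix, so operations on it are recorded and only evaluated when values are actually needed. The sketch below illustrates that behaviour offline with a small in-memory matrix wrapped by `DelayedArray()`. The `toy` matrix, its values, and its dimnames are made up for illustration and are not part of the compendium.

```r
library(DelayedArray)

# Toy stand-in for the remote count matrix (values are made up)
toy <- matrix(as.numeric(1:20), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))
delayed <- DelayedArray(toy)

# Subsetting is recorded as a delayed operation; the result is still delayed
block <- delayed[1:2, c("sample1", "sample3")]
is(block, "DelayedMatrix")

# as.matrix() realizes only the requested block as an ordinary matrix
realized <- as.matrix(block)
realized
```

With the real object, an analogous call such as `as.matrix(assay(htx)[1:5, 1:3])` should retrieve only that small block rather than the full 58288 x 181134 matrix, assuming the HSDS back end is reachable.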
The metadata for each sample in this compendium is limited; Sean Davis’ Omicidx can be queried for more attributes.
names(colData(htx))
#> [1] "experiment_accession" "experiment_platform"  "study_accession"
#> [4] "study_title"
head(sort(table(htx$study_title), decreasing=TRUE))
#>
#> Genotype-Tissue Expression (GTEx) Common Fund Project
#>   9495
#> Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site)
#>   5146
#> Single Cell RNA-seq Study of Midbrain and Dopaminergic Neuron Development in Mouse, Human, and Stem Cells
#>   4002
#> Single_Cell_RNAseq_at_various_stages_of_HiPSCs_differentiating_toward_definitive_endoderm_and_endoderm_derived_lineages_
#>   3987
#> Single-cell RNA-seq analysis of human pancreas from healthy individuals and type 2 diabetes patients
#>   3493
#> Single-Cell RNAseq analysis of diffuse neoplastic infiltrating cells at the migrating front of human glioblastoma
#>   3383
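Because `study_title` is a column of `colData`, it can also be used to restrict the compendium to the samples of a single study. The following is a minimal sketch on a toy SummarizedExperiment, so that it runs without network access. The object `toy_se`, its counts, and the study titles are invented for illustration.

```r
library(SummarizedExperiment)

# Toy stand-in for htx: 3 genes x 4 samples with made-up study titles
sample_ids <- paste0("SRX", 1:4)
toy_se <- SummarizedExperiment(
  assays = list(counts = matrix(as.numeric(1:12), nrow = 3,
                                dimnames = list(paste0("gene", 1:3), sample_ids))),
  colData = DataFrame(
    study_title = c("Study A", "Study B", "Study A", "Study C"),
    row.names = sample_ids
  )
)

# Tabulate samples per study, largest first (same idiom as above)
sort(table(toy_se$study_title), decreasing = TRUE)

# Keep only the columns (samples) that belong to one study
study_a <- toy_se[, toy_se$study_title == "Study A"]
dim(study_a)
colnames(study_a)
```

Applied to the real object, the same pattern would be `htx[, htx$study_title == "Genotype-Tissue Expression (GTEx) Common Fund Project"]`. This subsets only the sample annotation and the delayed assay, so it should not trigger a download of the counts themselves.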