Taking a look at HumanTranscriptomeCompendium

Building on Sean Davis’ BigRNA project, in which more than 181,000 RNA-seq experiments from NCBI SRA were processed with a salmon-based workflow, we converted the gene-level quantifications to HDF5 and loaded them into the HDF Scalable Data Service (HSDS).

The restfulSE package uses rhdf5client to implement a DelayedArray/SummarizedExperiment interface to this collection of transcriptomes.

suppressPackageStartupMessages({
  library(HumanTranscriptomeCompendium)
  library(SummarizedExperiment)
})
htx = htx_load()
#> Loading required namespace: BiocFileCache
#> using temporary cache /tmp/RtmpL5IWjw/BiocFileCache
#> adding RDS to local cache, future invocations will use local image
#> adding rname 'https://s3.amazonaws.com/bcfound-bigrna/rangedHtxGeneSE.rds'
htx
#> class: RangedSummarizedExperiment 
#> dim: 58288 181134 
#> metadata(1): rangeSource
#> assays(1): counts
#> rownames(58288): ENSG00000000003.14 ENSG00000000005.5 ...
#>   ENSG00000284747.1 ENSG00000284748.1
#> rowData names(0):
#> colnames(181134): DRX001125 DRX001126 ... SRX999990 SRX999991
#> colData names(4): experiment_accession experiment_platform
#>   study_accession study_title
system.time(lka <- assay(htx))
#> Loading required package: rhdf5client
#>    user  system elapsed 
#>   0.183   0.000   0.184
lka
#> <58288 x 181134> matrix of class DelayedMatrix and type "double":
#>                       DRX001125    DRX001126    DRX001127 ...  SRX999990
#> ENSG00000000003.14    40.001250  1322.844547  1528.257578   .  1149.0341
#>  ENSG00000000005.5     0.000000     9.999964     6.000006   .     0.0000
#> ENSG00000000419.12    64.000031  1456.004418  2038.996875   .  1485.0003
#> ENSG00000000457.13    31.814591  1583.504257  1715.041308   .   631.7751
#> ENSG00000000460.16    12.430602   439.321234   529.280324   .   945.6903
#>                ...            .            .            .   .          .
#>  ENSG00000284744.1   1.05614505  24.81388079  32.29261298   .   7.316061
#>  ENSG00000284745.1   0.99999879  15.99996994  16.99999743   .   0.000000
#>  ENSG00000284746.1   0.00000000   0.00379458   0.00000000   .   0.000000
#>  ENSG00000284747.1   7.77564984 270.83296409 239.88056843   . 108.011633
#>  ENSG00000284748.1   1.00000768  22.23010514  37.73881938   .  11.278980
#>                     SRX999991
#> ENSG00000000003.14  1430.3955
#>  ENSG00000000005.5     0.0000
#> ENSG00000000419.12  1970.0004
#> ENSG00000000457.13   802.0563
#> ENSG00000000460.16  1259.7648
#>                ...          .
#>  ENSG00000284744.1   3.268453
#>  ENSG00000284745.1   0.000000
#>  ENSG00000284746.1   0.000000
#>  ENSG00000284747.1  94.606851
#>  ENSG00000284748.1   5.240970
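
The sub-second timing above reflects that `assay(htx)` only builds a DelayedMatrix handle; values are fetched when a block is realized. Below is a minimal sketch of pulling a small block into memory, assuming the packages above are installed and the remote HSDS store is reachable. The object names `small` and `m` are illustrative and do not come from the package.

```r
library(HumanTranscriptomeCompendium)
library(SummarizedExperiment)

htx <- htx_load()

# Subsetting the SummarizedExperiment is delayed: no counts are transferred yet.
small <- htx[1:5, 1:3]

# as.matrix() realizes the block, requesting only these 5 x 3 values.
m <- as.matrix(assay(small))
dim(m)
#> [1] 5 3
```

Subsetting before realization keeps the request small; realizing the full 58288 x 181134 matrix in one call is unlikely to be practical.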

Only four metadata fields are stored for each sample in this compendium; Sean Davis’ OmicIDX can be queried for more attributes.

names(colData(htx))
#> [1] "experiment_accession" "experiment_platform"  "study_accession"     
#> [4] "study_title"
head(sort(table(htx$study_title), decreasing=TRUE))
#> 
#>                                                                    Genotype-Tissue Expression (GTEx) Common Fund Project 
#>                                                                                                                     9495 
#>                                                Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) 
#>                                                                                                                     5146 
#>                Single Cell RNA-seq Study of Midbrain and Dopaminergic Neuron Development in Mouse, Human, and Stem Cells 
#>                                                                                                                     4002 
#> Single_Cell_RNAseq_at_various_stages_of_HiPSCs_differentiating_toward_definitive_endoderm_and_endoderm_derived_lineages_ 
#>                                                                                                                     3987 
#>                     Single-cell RNA-seq analysis of human pancreas from healthy individuals and type 2 diabetes patients 
#>                                                                                                                     3493 
#>        Single-Cell RNAseq analysis of diffuse neoplastic infiltrating cells at the migrating front of human glioblastoma 
#>                                                                                                                     3383
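
Because `study_title` is an ordinary column of `colData`, the samples of one study can be selected by matching on it. The following sketch uses a toy character vector in place of `htx$study_title`; the helper `samples_in_study` is hypothetical and not part of any package.

```r
# Hypothetical helper: indices of samples whose study title matches exactly.
# which() drops NA comparisons, so samples with a missing title are never selected.
samples_in_study <- function(titles, study) {
  which(titles == study)
}

# Toy stand-in for htx$study_title (invented values, for illustration only).
titles <- c("Study A", "Study B", "Study A", NA)
idx <- samples_in_study(titles, "Study A")
idx
#> [1] 1 3
```

With the compendium loaded, `htx[, samples_in_study(htx$study_title, "Genotype-Tissue Expression (GTEx) Common Fund Project")]` should give the 9495-sample GTEx subset counted in the table above.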