BiocArrayProc

A vehicle for summarizing developments in scalable array processing

View the Project on GitHub vjcitn/BiocArrayProc

Welcome to BiocArrayProc

Overview

The purpose of this gh-pages site is to collect links and comments concerning array processing methods relevant to Bioconductor.

Software design concepts

A common objective is to hide the complexity of working with very large data while leaving users free to make flexible use of whatever computing resources are available. There are many tradeoffs to navigate, and a comprehensive account of the issues will take substantial work. A brief sketch follows.

Exemplary documents, packages, datasets, and repositories

Documents
Packages
> assay(tenx)
<27998 x 1306127> DelayedMatrix object of type "integer":
           AAACCTGAGATAGGAG-1 ... TTTGTCATCTGAAAGA-133
    [1,]                    0   .                    0
    [2,]                    0   .                    0
    [3,]                    0   .                    0
    [4,]                    0   .                    0
    [5,]                    0   .                    0
     ...                    .   .                    .
[27994,]                    0   .                    0
[27995,]                    1   .                    0
[27996,]                    0   .                    0
[27997,]                    0   .                    0
[27998,]                    0   .                    0

The vignette also discusses the issues raised by a sparse representation, which in this case is stored in HDF5:

> h5ls(fname)
  group       name       otype  dclass        dim
0     /       mm10   H5I_GROUP                   
1 /mm10   barcodes H5I_DATASET  STRING    1306127
2 /mm10       data H5I_DATASET INTEGER 2624828308
3 /mm10 gene_names H5I_DATASET  STRING      27998
4 /mm10      genes H5I_DATASET  STRING      27998
5 /mm10    indices H5I_DATASET INTEGER 2624828308
6 /mm10     indptr H5I_DATASET INTEGER    1306128
7 /mm10      shape H5I_DATASET INTEGER          2
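
The dataset names in this listing (data, indices, indptr, shape) and their lengths are consistent with a compressed sparse column layout: indptr has one more entry than there are barcodes, and data and indices have one entry per stored nonzero count. The following base R sketch illustrates how such a layout can be read one column (cell) at a time. The toy values and the helper name csc_column are hypothetical, and the 0-based convention for indices and indptr is an assumption about how the file was written, not something the listing itself confirms.

```r
# Toy compressed sparse column (CSC) layout using the same names as the
# HDF5 datasets listed above. All values here are made up for illustration.
# The 3 x 4 matrix being encoded (rows = genes, columns = cells) is:
#      [,1] [,2] [,3] [,4]
# [1,]    0    2    0    0
# [2,]    5    0    0    1
# [3,]    0    3    0    0
data    <- c(5L, 2L, 3L, 1L)       # nonzero values, stored column by column
indices <- c(1L, 0L, 2L, 1L)       # row of each nonzero (assumed 0-based)
indptr  <- c(0L, 1L, 3L, 3L, 4L)   # column offsets into data (assumed 0-based)
shape   <- c(3L, 4L)               # number of rows, number of columns

# Hypothetical helper: expand column j (1-based) into a dense integer vector.
csc_column <- function(j, data, indices, indptr, shape) {
  col <- integer(shape[1])
  n <- indptr[j + 1] - indptr[j]   # number of nonzeros stored for column j
  if (n > 0) {
    k <- indptr[j] + seq_len(n)    # 1-based positions in data and indices
    col[indices[k] + 1L] <- data[k]
  }
  col
}

# Rebuild the whole toy matrix; feasible only because it is tiny.
dense <- sapply(seq_len(shape[2]), csc_column,
                data = data, indices = indices, indptr = indptr, shape = shape)

# Fraction of cells that are nonzero in the real file, from the dimensions
# reported by assay(tenx) and h5ls(fname). Doubles are used because the
# nonzero count exceeds R's integer range.
nnz     <- 2624828308
density <- nnz / (27998 * 1306127)
```

For the real data only a handful of columns would be expanded at a time, since the dense matrix has tens of billions of cells while only about 7% of them are nonzero.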

Repositories

Key data structures

Support or Contact

Have a look at the scalability channel at community-bioc.slack.com