Transform an AnnData input to one that has had gene filtering, normalization, log transformation, low-variance gene exclusion, and discretization applied.

basic_preproc(
  ad,
  workdir = tempdir(),
  min_n_cells = 3L,
  norm_method = "lib_size",
  do_log = TRUE,
  n_top_genes = NULL,
  n_bins = 6L,
  simba_ref
)

Arguments

ad: AnnData instance (i.e., inherits from `anndata._core.anndata.AnnData`)
workdir: character(1) defines working directory for graph serialization; set to NULL to take simba default
min_n_cells: integer(1) used to filter genes, excluding those expressed in fewer than `min_n_cells` cells
norm_method: character(1) normalization method, defaults to 'lib_size'; see the simba documentation for alternatives
do_log: logical(1) if TRUE, applies a log transformation to the normalized values
n_top_genes: NULL or numeric(1) if non-NULL, genes are excluded when their variance is smaller than that of the gene ranked `n_top_genes`-th by variance; if NULL, no variance-based exclusion is performed
n_bins: integer(1) number of bins for expression discretization
simba_ref: instance of python.builtin.module, checked to have component 'tl'

Value

An AnnData instance with layers `raw` and `simba`, the latter holding a sparse matrix. Other components depend on the operations selected through the arguments.

Examples

p3k = get_10x3kpbmc_path(overwrite=TRUE)
ref = simba_ref()
pp = ref$read_h5ad(p3k)
pp
#> AnnData object with n_obs × n_vars = 2700 × 32738
#>     obs: 'celltype'
#>     var: 'gene_ids'
bb = basic_preproc(pp, simba_ref=ref)
bb
#> AnnData object with n_obs × n_vars = 2700 × 13714
#>     obs: 'celltype', 'n_counts', 'n_genes', 'pct_genes', 'pct_mt'
#>     var: 'gene_ids', 'n_counts', 'n_cells', 'pct_cells'
#>     uns: 'disc'
#>     layers: 'raw', 'simba'
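
The sequence of steps named in the description can be illustrated on a toy matrix in base R. This is only a sketch of the general idea and not simba's implementation: the scale factor of 1e4, the use of equal-width bins, and the choice to leave zeros unbinned are all assumptions made for illustration, and the variable names other than `min_n_cells`, `n_top_genes`, and `n_bins` are hypothetical.

```r
# Toy counts: 5 cells (rows) x 4 genes (columns), matching AnnData's obs x var layout.
counts <- matrix(
  c(0, 2, 1, 5,
    0, 1, 1, 3,
    1, 0, 1, 4,
    0, 3, 1, 6,
    0, 0, 2, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("cell", 1:5), paste0("gene", 1:4))
)

min_n_cells <- 2L
n_top_genes <- 2L
n_bins <- 3L

# 1. Gene filtering: drop genes expressed in fewer than min_n_cells cells.
keep <- colSums(counts > 0) >= min_n_cells
filtered <- counts[, keep, drop = FALSE]

# 2. Library-size normalization (assumed scale factor of 1e4).
#    Dividing a matrix by a vector of length nrow recycles down columns,
#    so each row is divided by its own total.
norm <- filtered / rowSums(filtered) * 1e4

# 3. Log transformation.
logged <- log1p(norm)

# 4. Low-variance exclusion: keep genes whose variance is at least that of
#    the gene ranked n_top_genes-th by variance.
gene_var <- apply(logged, 2, var)
threshold <- sort(gene_var, decreasing = TRUE)[[n_top_genes]]
selected <- logged[, gene_var >= threshold, drop = FALSE]

# 5. Discretization (assumed scheme): zeros stay 0, nonzero values are cut
#    into n_bins equal-width bins coded 1..n_bins.
disc <- matrix(0L, nrow = nrow(selected), ncol = ncol(selected),
               dimnames = dimnames(selected))
nz <- selected > 0
disc[nz] <- cut(selected[nz], breaks = n_bins, labels = FALSE)

disc
```

Keeping zeros at 0 leaves the discretized matrix as sparse as the input, which is in the spirit of the sparse `simba` layer described under Value; whether simba bins values in this way should be checked against its own documentation.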