Apply gene filtering, normalization, log transformation, low-variance gene exclusion, and discretization to an AnnData input, returning the processed AnnData.

basic_preproc(
  ad,
  workdir = tempdir(),
  min_n_cells = 3L,
  norm_method = "lib_size",
  do_log = TRUE,
  n_top_genes = NULL,
  n_bins = 6L,
  simba_ref
)

Arguments

ad

AnnData instance (i.e., inherits from `anndata._core.anndata.AnnData`)

workdir

character(1) working directory for graph serialization; set to NULL to use the simba default

min_n_cells

integer(1) used to filter genes, excluding those expressed in fewer than `min_n_cells` cells

norm_method

character(1) normalization method, defaults to "lib_size"; see the simba documentation for details

do_log

logical(1) if TRUE, applies a log transformation after normalization

n_top_genes

NULL or numeric(1) if non-NULL, genes are excluded when their variance is smaller than that of the gene ranked `n_top_genes`-th by variance; with the default NULL, no genes are excluded on this basis

n_bins

integer(1) number of bins for expression discretization

simba_ref

instance of `python.builtin.module` referring to the simba module (e.g., as returned by `simba_ref()`), checked to have component `tl`

Value

AnnData instance with layers `raw` and `simba` (the latter a sparse matrix), plus other components determined by the operations selected through the arguments.

Examples

p3k = get_10x3kpbmc_path(overwrite=TRUE)
ref = simba_ref()
pp = ref$read_h5ad(p3k)
pp
#> AnnData object with n_obs × n_vars = 2700 × 32738
#>     obs: 'celltype'
#>     var: 'gene_ids'
bb = basic_preproc(pp, simba_ref=ref)
bb
#> AnnData object with n_obs × n_vars = 2700 × 13714
#>     obs: 'celltype', 'n_counts', 'n_genes', 'pct_genes', 'pct_mt'
#>     var: 'gene_ids', 'n_counts', 'n_cells', 'pct_cells'
#>     uns: 'disc'
#>     layers: 'raw', 'simba'
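
The steps named in the description can be sketched in base R on a toy count matrix, without Python or simba. This is an illustration under stated assumptions, not the code that `basic_preproc` runs: the toy counts, the `scale_factor` value, the use of `log1p`, and the equal-width binning are all assumptions made for the example, and simba's own normalization and discretization may differ. The low-variance exclusion is omitted, matching the default `n_top_genes = NULL`.

```r
# Toy count matrix: 4 cells (rows) x 5 genes (columns); values are invented.
counts <- matrix(
  c(0, 2, 5, 0, 1,
    0, 3, 4, 0, 0,
    1, 0, 6, 0, 2,
    0, 1, 8, 0, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), paste0("g", 1:5))
)

# 1. Gene filtering: keep genes with a nonzero count in at least
#    min_n_cells cells.
min_n_cells <- 3L
keep <- colSums(counts > 0) >= min_n_cells
filtered <- counts[, keep, drop = FALSE]

# 2. Library-size normalization: rescale each cell to a common total.
#    scale_factor is an assumed value chosen for illustration.
scale_factor <- 1e4
lib <- rowSums(filtered)
normed <- filtered / lib * scale_factor

# 3. Log transformation; log1p keeps zeros at zero (assumed variant).
logged <- log1p(normed)

# 4. Discretization into n_bins levels. Equal-width bins are an
#    assumption here; simba may choose bin edges differently.
n_bins <- 6L
breaks <- seq(min(logged), max(logged), length.out = n_bins + 1L)
binned <- matrix(
  cut(logged, breaks = breaks, labels = FALSE, include.lowest = TRUE),
  nrow = nrow(logged), dimnames = dimnames(logged)
)

colnames(filtered)  # genes that survive the filter
binned              # integer bin codes, one per cell and gene
```

In the toy data, genes `g1` and `g4` are detected in fewer than three cells and are dropped, which mirrors the reduction from 32738 to 13714 variables in the example above.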