Source: R/basic_preproc.R
Transform an AnnData input by applying gene filtering, normalization, log transformation, low-variance gene exclusion, and discretization.
basic_preproc(
  ad,
  workdir = tempdir(),
  min_n_cells = 3L,
  norm_method = "lib_size",
  do_log = TRUE,
  n_top_genes = NULL,
  n_bins = 6L,
  simba_ref
)
ad: AnnData instance (i.e., inherits from `anndata._core.anndata.AnnData`)
workdir: character(1), working directory for graph serialization; set to NULL to use the simba default
min_n_cells: integer(1), used to filter genes, excluding those expressed in fewer than `min_n_cells` cells
norm_method: character(1), normalization method; defaults to "lib_size", see the simba documentation
do_log: logical(1), if TRUE, applies a log transformation
n_top_genes: NULL or numeric(1); if non-NULL, genes are excluded if their variance is smaller than that of the `n_top_genes`-th most variable gene
n_bins: integer(1), number of bins for expression discretization
simba_ref: instance of python.builtin.module, checked to have component 'tl'
An AnnData instance with layers `raw` and `simba`, the latter holding a sparse matrix, plus other components determined by the operations selected through the arguments.
p3k = get_10x3kpbmc_path(overwrite=TRUE)
ref = simba_ref()
pp = ref$read_h5ad(p3k)
pp
#> AnnData object with n_obs × n_vars = 2700 × 32738
#> obs: 'celltype'
#> var: 'gene_ids'
bb = basic_preproc(pp, simba_ref=ref)
bb
#> AnnData object with n_obs × n_vars = 2700 × 13714
#> obs: 'celltype', 'n_counts', 'n_genes', 'pct_genes', 'pct_mt'
#> var: 'gene_ids', 'n_counts', 'n_cells', 'pct_cells'
#> uns: 'disc'
#> layers: 'raw', 'simba'
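The steps named in the description (gene filtering, library-size normalization, log transformation, variance-based gene selection, and discretization) can be illustrated on a toy count matrix in base R. This is only a sketch under stated assumptions, not the code that `basic_preproc` or simba runs: the toy matrix, the median-library-size scaling factor, the ranking by sample variance, and the equal-width binning of nonzero values are all choices made here for illustration, and simba's own methods may differ.

```r
# Toy count matrix: 6 cells (rows) x 5 genes (columns), following AnnData's
# cells-by-genes orientation. All data and object names are hypothetical.
counts <- matrix(
  c(0, 2, 0, 5, 1,
    0, 0, 3, 4, 0,
    1, 1, 0, 6, 1,
    0, 3, 2, 2, 0,
    0, 0, 0, 8, 2,
    0, 4, 1, 3, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("cell", 1:6), paste0("gene", 1:5))
)

min_n_cells <- 3L
n_top_genes <- 2L
n_bins <- 3L

# 1. Gene filtering: keep genes with a nonzero count in at least min_n_cells cells.
n_cells_expr <- colSums(counts > 0)
kept <- counts[, n_cells_expr >= min_n_cells, drop = FALSE]

# 2. Library-size normalization: divide each cell by its total count, then
#    rescale by the median library size (the scaling factor is an assumption).
lib_size <- rowSums(kept)
normed <- kept / lib_size * median(lib_size)

# 3. Log transformation, using log(1 + x) so that zeros stay zero.
logged <- log1p(normed)

# 4. Keep the n_top_genes genes with the largest variance across cells.
gene_var <- apply(logged, 2, var)
top <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_top_genes)]
selected <- logged[, colnames(logged) %in% top, drop = FALSE]

# 5. Discretization: zeros stay 0; nonzero values are cut into n_bins
#    equal-width bins coded 1..n_bins (the binning rule is an assumption).
disc <- matrix(0L, nrow(selected), ncol(selected), dimnames = dimnames(selected))
nz <- selected > 0
disc[nz] <- as.integer(cut(selected[nz], breaks = n_bins, labels = FALSE))

disc
```

In this sketch the untouched `counts` matrix plays the role of the `raw` layer and `disc` that of the `simba` layer, although the real layer is stored as a sparse matrix.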