Extract and group genomic features from a GTFParquet object
Source: R/gtf_parquet.R
Generic functions to extract genomic features of a given type grouped based on another type of genomic feature. These methods extend the GenomicFeatures generics for GTFParquet objects.
Usage
# S4 method for class 'GTFParquet'
transcriptsBy(x, by = "gene", filter = NULL)
# S4 method for class 'GTFParquet'
exonsBy(x, by = c("tx", "gene"), filter = NULL)
# S4 method for class 'GTFParquet'
cdsBy(x, by = c("tx", "gene"), filter = NULL)
Arguments
- x
  A GTFParquet object.
- by
  One of "gene" or "tx" (transcript). Determines the grouping. For transcriptsBy, only "gene" is currently supported.
- filter
  Optional named list for filtering features before grouping. Names should be column names (e.g., gene_type, chrom); values are vectors of acceptable values. Example: filter = list(gene_type = "protein_coding", chrom = "chr1")
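As a rough illustration of what a named-list filter expresses, the sketch below applies one to a plain data frame in base R. The helper apply_filter and the toy table are hypothetical, and combining the entries with AND is an assumption; the GTFParquet methods may evaluate the filter differently.

```r
# Toy annotation table; column names mirror those mentioned above.
features <- data.frame(
  gene_id   = c("G1", "G2", "G3", "G4"),
  gene_type = c("protein_coding", "lncRNA", "protein_coding", "protein_coding"),
  chrom     = c("chr1", "chr1", "chr2", "chr1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of any package): keep rows whose value in
# every named column is among the acceptable values. Entries are assumed
# to combine with AND.
apply_filter <- function(df, filter = NULL) {
  if (is.null(filter)) return(df)
  stopifnot(is.list(filter), !is.null(names(filter)),
            all(names(filter) %in% names(df)))
  keep <- rep(TRUE, nrow(df))
  for (col in names(filter)) {
    keep <- keep & df[[col]] %in% filter[[col]]
  }
  df[keep, , drop = FALSE]
}

kept <- apply_filter(features,
                     list(gene_type = "protein_coding", chrom = "chr1"))
kept$gene_id
```

Each value may also be a vector, for example chrom = c("chr1", "chr2"), in which case a row passes if it matches any of the listed values for that column.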
Value
A GRangesList object. The names of the list elements are the IDs of the grouping features (gene IDs or transcript IDs).
For GTFParquet objects, the names use stripped (unversioned) IDs by default
(e.g., ENSG00000141510 rather than ENSG00000141510.18).
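The relationship between versioned and stripped IDs can be sketched in base R as follows. This assumes that "stripped" means removing a trailing ".<digits>" version suffix; it is not necessarily how the package derives the names.

```r
# The versioned ID from the text above, plus an already unversioned one.
versioned <- c("ENSG00000141510.18", "ENSG00000141510")

# Remove a trailing ".<digits>" version suffix, if present.
stripped <- sub("\\.[0-9]+$", "", versioned)
stripped
```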
Details
These functions return a GRangesList object where the ranges within each of the elements are ordered according to the following rule:
When using exonsBy or cdsBy with by = "tx",
the returned exons or CDS are ordered by ascending exon number for each
transcript, that is, by their position in the transcript.
In all other cases, the ranges will be ordered by chromosome, strand,
start, and end values.
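The two ordering rules can be illustrated on a toy exon table in base R. The table and its values are made up for illustration, and the sketch only mimics the described ordering; it does not call the GTFParquet methods. The strand factor levels follow the usual Bioconductor convention ("+", "-", "*").

```r
# Toy exon table: one minus-strand transcript (tx1) and one plus-strand
# transcript (tx2). On the minus strand, exon 1 has the highest coordinates.
exons <- data.frame(
  tx_id       = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  chrom       = c("chr1", "chr1", "chr1", "chr1", "chr1"),
  strand      = factor(c("-", "-", "-", "+", "+"), levels = c("+", "-", "*")),
  start       = c(100L, 300L, 500L, 50L, 200L),
  end         = c(150L, 350L, 550L, 90L, 260L),
  exon_number = c(3L, 2L, 1L, 1L, 2L),
  stringsAsFactors = FALSE
)

# by = "tx" case: within each transcript, order by ascending exon_number.
by_tx <- exons[order(exons$tx_id, exons$exon_number), ]
split(by_tx$start, by_tx$tx_id)

# All other cases: order by chromosome, strand, start, and end.
by_pos <- exons[order(exons$chrom, exons$strand, exons$start, exons$end), ]
by_pos$start
```

For the minus-strand transcript, ordering by exon number yields descending genomic coordinates, which is why the two rules can give different results for the same ranges.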
Unlike TxDb methods, GTFParquet methods preserve rich metadata columns
including transcript_name, transcript_type, exon_number,
protein_id, and frame.
The filter argument allows efficient server-side filtering before
data is loaded into R, which can dramatically improve performance for
large annotation files.
See also
- GTFParquet-class for the class definition
- genes,GTFParquet-method for extracting ungrouped features
- transcriptsBy for the generic
- exonsBy for the generic
- cdsBy for the generic
Examples
if (FALSE) { # \dontrun{
gtf <- GTFParquet(system.file("gc49", package="lkparq"))
# Exons grouped by transcript (sorted by exon_number)
ebt <- exonsBy(gtf, by = "tx")
ebt[[1]] # Exons for first transcript
# Exons grouped by gene
ebg <- exonsBy(gtf, by = "gene")
# CDS grouped by transcript
cbt <- cdsBy(gtf, by = "tx")
# Transcripts grouped by gene
tbg <- transcriptsBy(gtf, by = "gene")
# Filter to protein-coding only
pc_exons <- exonsBy(gtf, by = "tx",
                    filter = list(gene_type = "protein_coding"))
# Filter by chromosome
chr1_cds <- cdsBy(gtf, by = "tx", filter = list(chrom = "chr1"))
} # }