An S4 class for accessing GTF annotations stored in Parquet format. Unlike a TxDb, it preserves all GTF attributes (gene_type, gene_name, transcript_support_level, tags, etc.).
Arguments
- x
A GTFParquet object.
Value
A Seqinfo object containing chromosome names and genome build.
Details
GTFParquet objects are created by the GTFParquet constructor function from a directory of Parquet files generated by gtf_to_parquet.py.
The class implements methods for GenomicFeatures generics, including genes, transcripts, exons, cds, exonsBy, cdsBy, and transcriptsBy.
All methods support a filter argument for efficient querying (e.g., filter = list(gene_type = "protein_coding")).
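The filter argument can also be combined with a grouped extractor. The following is a minimal sketch, not a definitive usage: it assumes that the lkparq package named in the Examples exports GTFParquet() and ships the "gc49" directory, that transcriptsBy accepts by = "gene" as the GenomicFeatures generic does, and that the filter is applied before grouping. The block does nothing if lkparq is not installed.

```r
# Sketch only: every call on `gtf` below relies on the assumptions stated above.
if (requireNamespace("lkparq", quietly = TRUE)) {
  # Open the example Parquet directory (same path as in the Examples section)
  gtf <- lkparq::GTFParquet(system.file("gc49", package = "lkparq"))

  # Group transcripts by gene, keeping only protein-coding features
  tx_by_gene <- GenomicFeatures::transcriptsBy(
    gtf,
    by = "gene",
    filter = list(gene_type = "protein_coding")
  )

  # Number of groups (genes) returned
  print(length(tx_by_gene))
}
```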
Slots
- path
Character. Path to the Parquet directory.
- files
List. Paths to individual Parquet files.
- available
Logical vector. Which files are present.
- is_partitioned
Logical. Whether genes are partitioned by chromosome.
- .genome
Character. Reference genome build (e.g., "GRCh38").
See also
GTFParquet for the constructor function
genes,GTFParquet-method for extracting genes
transcriptsBy,GTFParquet-method for grouped extractors
TxDb for comparison with TxDb objects
seqinfo
Examples
if (FALSE) { # \dontrun{
# Create from Parquet directory
gtf <- GTFParquet(system.file("gc49", package="lkparq"))
# Extract genes with full attributes
gr <- genes(gtf)
mcols(gr) # gene_name, gene_type, level, tags, etc.
# Filter by gene type
pc <- genes(gtf, filter = list(gene_type = "protein_coding"))
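# Inspect sequence metadata (sketch; assumes the seqinfo method described
# under Value, plus the seqnames() and genome() accessors from GenomeInfoDb)
si <- seqinfo(gtf)
seqnames(si) # chromosome names
genome(si) # genome build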
} # }