
An S4 class for accessing GTF annotations stored in Parquet format. Unlike a TxDb object, it preserves all GTF attributes (gene_type, gene_name, transcript_support_level, tags, etc.).

Usage

# S4 method for class 'GTFParquet'
genome(x)

# S4 method for class 'GTFParquet'
seqinfo(x)

Arguments

x

A GTFParquet object.

Value

A Seqinfo object containing chromosome names and genome build.
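
For orientation, a Seqinfo object of this kind can be built by hand with the Seqinfo() constructor from GenomeInfoDb. The sketch below is only an illustration of the return type: the two chromosome names are toy placeholders, and it does not read from a GTFParquet object.

```r
library(GenomeInfoDb)

# Toy Seqinfo: two placeholder chromosome names tagged with a genome build.
si <- Seqinfo(seqnames = c("chr1", "chr2"), genome = "GRCh38")

seqlevels(si)        # chromosome names
unique(genome(si))   # genome build shared by all sequences
```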

Details

GTFParquet objects are created by the GTFParquet constructor function from a directory of Parquet files generated by gtf_to_parquet.py.

The class implements methods for GenomicFeatures generics including genes, transcripts, exons, cds, exonsBy, cdsBy, and transcriptsBy.

These feature-extraction methods accept a filter argument, a named list of attribute values, for efficient querying (e.g., filter = list(gene_type = "protein_coding")).
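
To make the semantics of a named-list filter concrete, here is a minimal base R sketch that applies one to a plain data frame. It is not the package's implementation: apply_filter() is a hypothetical helper, the table is toy data, and the assumptions that several fields combine with AND and that a field may list several allowed values are illustrative only.

```r
# Hypothetical helper (not part of the package): keep rows where every
# named column matches one of the allowed values for that column.
apply_filter <- function(df, filter = list()) {
  keep <- rep(TRUE, nrow(df))
  for (col in names(filter)) {
    if (!col %in% names(df)) {
      stop("Unknown filter column: ", col)
    }
    # Assumption: conditions on different columns are combined with AND.
    keep <- keep & df[[col]] %in% filter[[col]]
  }
  df[keep, , drop = FALSE]
}

# Toy annotation table standing in for a Parquet-backed gene table.
genes_df <- data.frame(
  gene_name = c("A1", "B2", "C3", "D4"),
  gene_type = c("protein_coding", "lncRNA", "protein_coding", "miRNA"),
  seqnames  = c("chr1", "chr1", "chr2", "chr2"),
  stringsAsFactors = FALSE
)

# Single condition, mirroring filter = list(gene_type = "protein_coding").
pc <- apply_filter(genes_df, list(gene_type = "protein_coding"))

# Two conditions combined.
pc_chr2 <- apply_filter(
  genes_df,
  list(gene_type = "protein_coding", seqnames = "chr2")
)
```

A Parquet-backed implementation would typically push such conditions down to the file reader rather than filter in memory, which is where the efficiency gain comes from.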

Slots

path

Character. Path to the Parquet directory.

files

List. Paths to individual Parquet files.

available

Logical vector. Which files are present.

is_partitioned

Logical. Whether genes are partitioned by chromosome.

.genome

Character. Reference genome build (e.g., "GRCh38").
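
The slot layout above can be pictured with a small S4 sketch. This is an illustration under stated assumptions, not the package's actual class definition: the class name GTFParquetSketch, the constructor new_sketch(), the Parquet file names, and the rule used to detect partitioning are all hypothetical.

```r
library(methods)

# Illustrative class with the same slot names and types as documented above.
setClass(
  "GTFParquetSketch",
  representation(
    path = "character",
    files = "list",
    available = "logical",
    is_partitioned = "logical",
    .genome = "character"
  )
)

# Hypothetical constructor: the file names are placeholders, not the
# layout actually written by gtf_to_parquet.py.
new_sketch <- function(path, genome = NA_character_) {
  files <- list(
    genes = file.path(path, "genes.parquet"),
    transcripts = file.path(path, "transcripts.parquet")
  )
  # Named logical vector recording which expected files exist.
  available <- vapply(files, file.exists, logical(1))
  new(
    "GTFParquetSketch",
    path = path,
    files = files,
    available = available,
    # Assumption: a "genes" subdirectory signals per-chromosome partitioning.
    is_partitioned = dir.exists(file.path(path, "genes")),
    .genome = genome
  )
}

# Demonstrate on a temporary directory containing one empty placeholder file.
demo_dir <- file.path(tempdir(), "gtfparquet_sketch")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "genes.parquet")))

obj <- new_sketch(demo_dir, genome = "GRCh38")
obj@available
```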

See also

seqinfo

Examples

if (FALSE) { # \dontrun{
# Create from Parquet directory
gtf <- GTFParquet(system.file("gc49", package="lkparq"))

# Extract genes with full attributes
gr <- genes(gtf)
mcols(gr)  # gene_name, gene_type, level, tags, etc.

# Filter by gene type
pc <- genes(gtf, filter = list(gene_type = "protein_coding"))
} # }