Extracts a transcript-to-gene mapping table from GENCODE annotation files, such as the transcriptome FASTA file. Currently, only FASTA files are supported.
Usage
make_tx_to_gene(file_path, file_type = c("fasta", "gff"))
Value
A tibble containing the transcript-to-gene mapping information, including transcript IDs, gene IDs, transcript names, gene names, and transcript types.
Details
The function reads the header lines of the FASTA file and extracts the fields needed to build the mapping table. Support for GTF and GFF3 files is not yet implemented.
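To make the header-reading step concrete, here is a minimal base R sketch of how pipe-delimited FASTA headers could be turned into a mapping table. It is not the package's implementation: the helper name parse_gencode_headers, the example headers, and the sequence lines are all made up for illustration, and the field positions are assumed from the column order shown in the example output.

```r
# Hypothetical helper, not part of the package: split pipe-delimited
# FASTA headers and collect selected fields into a data frame.
parse_gencode_headers <- function(lines) {
  # Keep only header lines, which start with ">"
  headers <- lines[startsWith(lines, ">")]
  # Drop the leading ">" and split each header on the pipe character
  fields <- strsplit(sub("^>", "", headers), "|", fixed = TRUE)
  # Extract the i-th field from every header (NA if a header is too short)
  pick <- function(i) vapply(fields, function(x) x[i], character(1))
  data.frame(
    transcript_id   = pick(1),
    gene_id         = pick(2),
    transcript_name = pick(5),
    gene_name       = pick(6),
    transcript_type = pick(8),  # assumed position of the transcript type
    stringsAsFactors = FALSE
  )
}

# Made-up headers and sequences that mimic the assumed layout
example_lines <- c(
  ">ENST00000000001.1|ENSG00000000001.1|-|-|GENEA-201|GENEA|1000|lncRNA|",
  "ACGTACGTACGT",
  ">ENST00000000002.1|ENSG00000000002.1|-|-|GENEB-201|GENEB|2000|protein_coding|",
  "TTGGCCAATTGG"
)

mapping <- parse_gencode_headers(example_lines)
print(mapping)
```

Sequence lines are ignored because only lines starting with ">" are kept, so the same approach would work on lines read from a decompressed transcriptome file with readLines().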
Examples
# Download the GENCODE transcriptome FASTA file (an existing copy is reused):
fasta_file <- download_reference(
version = "43",
organism = "human",
file_type = "fasta",
output_path = "data-raw"
)
#> ℹ data-raw/gencode.v43.transcripts.fa.gz already exists.
# Create the transcript-to-gene mapping table
tx_to_gene <- make_tx_to_gene(file_path = fasta_file, file_type = "fasta")
# View the first few rows
head(tx_to_gene)
#> # A tibble: 6 × 8
#>   transcript_id    gene_id havanna_gene_id havanna_transcript_id transcript_name
#>   <chr>            <chr>   <chr>           <chr>                 <chr>
#> 1 ENST00000456328… ENSG00… -               OTTHUMT00000362751.1  DDX11L2-202
#> 2 ENST00000450305… ENSG00… OTTHUMG0000000… OTTHUMT00000002844.2  DDX11L1-201
#> 3 ENST00000488147… ENSG00… OTTHUMG0000000… OTTHUMT00000002839.1  WASH7P-201
#> 4 ENST00000619216… ENSG00… -               -                     MIR6859-1-201
#> 5 ENST00000473358… ENSG00… OTTHUMG0000000… OTTHUMT00000002840.1  MIR1302-2HG-202
#> 6 ENST00000469289… ENSG00… OTTHUMG0000000… OTTHUMT00000002841.2  MIR1302-2HG-201
#> # ℹ 3 more variables: gene_name <chr>, entrez_id <chr>, transcript_type <chr>