Extracts a transcript-to-gene mapping table from GENCODE annotation files, such as the transcriptome FASTA file. Currently, only FASTA files are supported.


make_tx_to_gene(file_path, file_type = c("fasta", "gff"))



file_path
A character string specifying the path to the reference file (e.g., a GENCODE FASTA file).


file_type
A character string specifying the type of the reference file. Currently, only "fasta" is supported. Default is "fasta".
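
The usage signature lists `c("fasta", "gff")` as the default for `file_type`, which is the usual R idiom for resolving a choice with `match.arg()`. The sketch below shows how that idiom yields "fasta" when the argument is omitted. It is an illustration only: `resolve_file_type()` is a hypothetical helper, and raising an error for "gff" is an assumption about how unsupported types might be handled, not documented behaviour of `make_tx_to_gene()`.

```r
# Hypothetical helper illustrating the match.arg() idiom; not the package's code.
resolve_file_type <- function(file_type = c("fasta", "gff")) {
  # With the argument missing, match.arg() returns the first choice ("fasta")
  file_type <- match.arg(file_type)

  # Assumption: unsupported types stop with an error
  if (file_type != "fasta") {
    stop("Only 'fasta' is currently supported.", call. = FALSE)
  }
  file_type
}

resolve_file_type()          # default resolves to the first choice
resolve_file_type("fasta")   # explicit value is accepted
```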


A tibble containing the transcript-to-gene mapping information, including transcript IDs, gene IDs, transcript names, gene names, and transcript types.


The function reads the header lines of the FASTA file and extracts the relevant fields to create a mapping table. Support for GTF and GFF3 files is not yet implemented.
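
As a rough illustration of header-based extraction, the base R sketch below reads the lines starting with ">" and splits them on "|". It assumes the pipe-delimited layout used by GENCODE transcript FASTA headers (transcript ID, gene ID, HAVANA gene ID, HAVANA transcript ID, transcript name, gene name, sequence length, transcript type). `parse_gencode_headers()`, the column names, and the two example records are all made up for this sketch; the package's own implementation and column naming may differ.

```r
# Hypothetical helper; a minimal sketch, not the package's implementation.
parse_gencode_headers <- function(path) {
  # gzfile() reads both gzip-compressed and plain-text files
  con <- gzfile(path, "r")
  on.exit(close(con))
  lines <- readLines(con)

  # Keep only header lines and drop the leading ">"
  headers <- sub("^>", "", lines[grepl("^>", lines)])

  # Split on the literal "|" and keep the first eight fields of each header
  fields <- strsplit(headers, "|", fixed = TRUE)
  mat <- do.call(rbind, lapply(fields, function(x) x[1:8]))

  out <- as.data.frame(mat, stringsAsFactors = FALSE)
  # Column names are assumptions chosen for this sketch
  names(out) <- c(
    "transcript_id", "gene_id", "havana_gene_id", "havana_transcript_id",
    "transcript_name", "gene_name", "length", "transcript_type"
  )
  out
}

# Made-up records in the assumed header layout
tmp <- tempfile(fileext = ".fa")
writeLines(c(
  ">ENST00000000001.1|ENSG00000000001.1|-|-|GENEA-201|GENEA|1000|lncRNA|",
  "ACGTACGT",
  ">ENST00000000002.1|ENSG00000000002.1|-|-|GENEB-201|GENEB|2000|protein_coding|",
  "TTGACCAA"
), tmp)

tx <- parse_gencode_headers(tmp)
tx[, c("transcript_id", "gene_id", "gene_name", "transcript_type")]
```

Reading every line is simple but slow for large files; a production version would more likely stream the file or use a dedicated FASTA reader.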


# Assuming you have downloaded the GENCODE transcriptome FASTA file:
fasta_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "fasta",
  output_path = "data-raw"
)
#>  data-raw/gencode.v43.transcripts.fa.gz already exists.

# Create the transcript-to-gene mapping table
tx_to_gene <- make_tx_to_gene(file_path = fasta_file, file_type = "fasta")

# View the first few rows
head(tx_to_gene)
#> # A tibble: 6 × 8
#>   transcript_id    gene_id havanna_gene_id havanna_transcript_id transcript_name
#>   <chr>            <chr>   <chr>           <chr>                 <chr>          
#> 1 ENST00000456328… ENSG00… -               OTTHUMT00000362751.1  DDX11L2-202    
#> 2 ENST00000450305… ENSG00… OTTHUMG0000000… OTTHUMT00000002844.2  DDX11L1-201    
#> 3 ENST00000488147… ENSG00… OTTHUMG0000000… OTTHUMT00000002839.1  WASH7P-201     
#> 4 ENST00000619216… ENSG00… -               -                     MIR6859-1-201  
#> 5 ENST00000473358… ENSG00… OTTHUMG0000000… OTTHUMT00000002840.1  MIR1302-2HG-202
#> 6 ENST00000469289… ENSG00… OTTHUMG0000000… OTTHUMT00000002841.2  MIR1302-2HG-201
#> # ℹ 3 more variables: gene_name <chr>, entrez_id <chr>, transcript_type <chr>