
Downloads reference annotation files from the GENCODE database for human or mouse genomes. Supports downloading GTF, GFF, and transcriptome FASTA files. The function handles directory creation and checks for existing files to avoid redundant downloads.

Usage

download_reference(
  version = "46",
  reference = "gencode",
  organism = c("human", "mouse"),
  file_type = c("gtf", "gff", "fasta"),
  output_path = "data-raw",
  timeout_limit = 3600,
  method = "auto"
)

Arguments

version

A character string specifying the GENCODE release version. For mouse references, include the letter 'M' in the version string (e.g., "M32"). Default is "46".
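Human and mouse releases use different version formats, so it can help to check the string before any URL is built. The sketch below assumes human releases are plain numbers ("46") and mouse releases are "M" followed by a number ("M32"). `check_gencode_version()` is a hypothetical helper for illustration, not part of the package.

```r
# Hypothetical helper (not part of the package API): checks that a GENCODE
# release string matches the organism before any URL is built.
# Assumption: human = digits only, mouse = "M" followed by digits.
check_gencode_version <- function(version, organism = c("human", "mouse")) {
  organism <- match.arg(organism)
  version <- as.character(version)
  pattern <- if (organism == "mouse") "^M[0-9]+$" else "^[0-9]+$"
  if (!grepl(pattern, version)) {
    stop(sprintf("'%s' is not a valid GENCODE %s release", version, organism),
         call. = FALSE)
  }
  version
}

check_gencode_version("46", "human")   # returns "46"
check_gencode_version("M32", "mouse")  # returns "M32"
```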

reference

A character string specifying the source of the reference file. Currently, only "gencode" is supported. Default is "gencode".

organism

A character string specifying the organism. Valid options are "human" or "mouse". Defaults to "human".

file_type

A character string specifying the type of file to download. Valid options are "gtf", "gff", or "fasta". Defaults to "gtf". Note: "fasta" refers to the transcriptome FASTA file.
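Each file_type value corresponds to a different GENCODE file name. As a sketch, assuming the package follows GENCODE's public naming scheme (which matches the file names printed in the Examples), the mapping could look like this. `gencode_file_name()` is hypothetical, and the "gff" suffix in particular is an assumption, since no GFF example is shown.

```r
# Hypothetical helper: maps a file_type value to a GENCODE file name.
# match.arg() picks the first choice ("gtf") when file_type is not supplied.
gencode_file_name <- function(version, file_type = c("gtf", "gff", "fasta")) {
  file_type <- match.arg(file_type)
  suffix <- switch(file_type,
    gtf   = "annotation.gtf.gz",
    gff   = "annotation.gff3.gz",  # assumed GFF3 suffix
    fasta = "transcripts.fa.gz"    # transcriptome FASTA
  )
  sprintf("gencode.v%s.%s", version, suffix)
}

gencode_file_name("43", "gtf")    # "gencode.v43.annotation.gtf.gz"
gencode_file_name("M32")          # "gencode.vM32.annotation.gtf.gz"
gencode_file_name("43", "fasta")  # "gencode.v43.transcripts.fa.gz"
```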

output_path

A character string specifying the directory where the downloaded file will be saved. Defaults to "data-raw".

timeout_limit

A numeric value specifying the maximum time in seconds for the download to complete. This argument takes precedence over options("timeout"). Defaults to 3600 seconds (1 hour).
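One common way to make a per-call timeout take precedence over options("timeout") is to set the option temporarily and restore it when the call exits, whether the download succeeds or fails. The sketch below illustrates that pattern under this assumption; it is not the package's own code, and `with_download_timeout()` is a hypothetical helper.

```r
# Hypothetical helper: evaluates `expr` with options(timeout = timeout_limit)
# in effect, then restores the caller's previous timeout on exit.
with_download_timeout <- function(timeout_limit, expr) {
  old <- options(timeout = timeout_limit)  # options() returns the old values
  on.exit(options(old), add = TRUE)        # runs even if expr throws an error
  force(expr)                              # expr is evaluated lazily, here
}
```

A call such as `with_download_timeout(3600, utils::download.file(url, destfile, mode = "wb"))`, with `url` and `destfile` as placeholders, would run the download under a one-hour limit instead of R's default of 60 seconds.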

method

A character string specifying the method used by utils::download.file(). Defaults to "auto".

Value

A character string giving the path to the downloaded file, or to the existing file if it was already present in output_path.
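Because the returned value is simply a path to a gzipped file, it can be opened with base R connections. The sketch below assumes a GTF was downloaded (file_type = "gtf") and reads its first few records. `peek_gtf()` is a hypothetical helper, and it assumes no record in the rows read contains a "#" character.

```r
# Hypothetical helper: reads the first `n` records of a (gzipped) GTF file.
# GENCODE GTFs begin with "##" header lines, which comment.char = "#" skips.
peek_gtf <- function(path, n = 5) {
  con <- gzfile(path, "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  utils::read.delim(
    con, header = FALSE, comment.char = "#", nrows = n,
    quote = "", stringsAsFactors = FALSE,
    col.names = c("seqname", "source", "feature", "start", "end",
                  "score", "strand", "frame", "attribute")
  )
}
```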

Details

The function constructs the download URL from the specified organism, version, and file type, and saves the file in output_path, creating that directory if it does not exist. If the file is already present in output_path, it is not downloaded again and the existing file path is returned. An internet connection is required. During the download, timeout_limit is applied in place of options("timeout"), so large files are not cut off by R's default 60-second timeout.
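The check-then-download flow described above can be sketched as follows. The base URL follows the layout of GENCODE's public FTP site at EBI and is an assumption; the package may construct its URL differently. `fetch_gencode_file()` and its message text are hypothetical.

```r
# Hypothetical sketch of "download only if missing".
# Assumed URL layout: .../Gencode_<organism>/release_<version>/<file_name>
fetch_gencode_file <- function(file_name, organism, version,
                               output_path = "data-raw",
                               timeout_limit = 3600, method = "auto") {
  dest <- file.path(output_path, file_name)
  if (file.exists(dest)) {
    message(dest, " already exists; skipping download.")
    return(dest)  # reuse the existing file, no network access
  }
  dir.create(output_path, recursive = TRUE, showWarnings = FALSE)
  url <- sprintf(
    "https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_%s/release_%s/%s",
    organism, version, file_name
  )
  old <- options(timeout = timeout_limit)  # temporary timeout override
  on.exit(options(old), add = TRUE)
  # mode = "wb" keeps the .gz file intact on Windows
  utils::download.file(url, destfile = dest, method = method, mode = "wb")
  dest
}
```

Calling it a second time with the same arguments returns the existing path without contacting the server.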

Note

Currently, only "gencode" reference files are supported. The "mane" reference is not implemented yet.

Examples

# Download human GTF file for GENCODE release 43
gtf_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "gtf",
  output_path = "data-raw"
)
#>  data-raw/gencode.v43.annotation.gtf.gz successfully downloaded.

# Download mouse GTF file for GENCODE release M32
gtf_file_mouse <- download_reference(
  version = "M32",
  organism = "mouse",
  file_type = "gtf",
  output_path = "data-raw"
)
#>  data-raw/gencode.vM32.annotation.gtf.gz successfully downloaded.

# Download human transcriptome FASTA file for GENCODE release 43
fasta_file <- download_reference(
  version = "43",
  organism = "human",
  file_type = "fasta",
  output_path = "data-raw"
)
#>  data-raw/gencode.v43.transcripts.fa.gz successfully downloaded.