
Prepare Data for Gene and Transcript Expression Profile Plot
Source: R/prepare_profile_data.R
This function processes gene- and transcript-level expression data, along with differential expression results, to prepare a tidy data frame suitable for plotting expression profiles across different sample groups.
Usage
prepare_profile_data(
  txi_gene = NULL,
  txi_transcript,
  sample_metadata,
  tx_to_gene,
  de_result_gene,
  de_result_transcript,
  var,
  var_levels,
  gene_col = "gene_name",
  tx_col = "transcript_name",
  pvalue_cutoff = 0.05,
  lfc_cutoff = 1,
  use_fdr = TRUE
)
Arguments
- txi_gene
  A tibble or tximport output containing gene-level expression abundances. If NULL, gene-level abundances will be summarized from txi_transcript. Default is NULL.
- txi_transcript
  A tibble or tximport output containing transcript-level expression abundances.
- sample_metadata
  A data.frame or tibble containing sample metadata. The first column should contain sample names matching the column names in txi_gene and txi_transcript.
- tx_to_gene
  A data.frame or tibble containing transcript-to-gene mapping information. Must include the columns specified by gene_col and tx_col.
- de_result_gene
  A data.frame or tibble containing differential expression results at the gene level. Must include gene_name, log2FC, and qvalue columns.
- de_result_transcript
  A data.frame or tibble containing differential expression results at the transcript level. Must include transcript_name, log2FC, and qvalue columns.
- var
  A string specifying the column name in sample_metadata that indicates the grouping variable (e.g., treatment, condition).
- var_levels
  A character vector specifying the levels of var to include in the contrasts.
- gene_col
  A string specifying the column name in tx_to_gene that contains gene names. Default is "gene_name".
- tx_col
  A string specifying the column name in tx_to_gene that contains transcript names. Default is "transcript_name".
- pvalue_cutoff
  A numeric value specifying the significance cutoff for differential expression. It is applied to the p-value, or to qvalue when use_fdr is TRUE. Default is 0.05.
- lfc_cutoff
  A numeric value specifying the log2 fold-change cutoff for determining significant differential expression. Default is 1.
- use_fdr
  A logical value indicating whether to use the false discovery rate (qvalue) instead of the p-value for the significance cutoff. Default is TRUE.
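When txi_gene is NULL, gene-level abundances are derived from txi_transcript. The sketch below shows one plausible way to do that in base R: sum transcript abundances within each gene, using the tx_to_gene mapping. The toy data, the sample names s1 and s2, and the choice of summation are assumptions for illustration; the package's internal method may differ.

```r
# Hypothetical toy inputs; column names follow the documented defaults
# ("transcript_name" and "gene_name"). Sample columns s1 and s2 are made up.
tx_abundance <- data.frame(
  transcript_name = c("A-201", "A-202", "B-201"),
  s1 = c(10, 5, 8),
  s2 = c(12, 3, 9)
)
tx_to_gene <- data.frame(
  transcript_name = c("A-201", "A-202", "B-201"),
  gene_name = c("A", "A", "B")
)

# Attach the parent gene to each transcript row.
merged <- merge(tx_abundance, tx_to_gene, by = "transcript_name")

# Assumption: gene-level abundance is the sum of its transcripts' abundances,
# computed separately for each sample column.
gene_abundance <- aggregate(
  merged[, c("s1", "s2")],
  by = list(gene_name = merged$gene_name),
  FUN = sum
)

print(gene_abundance)
```

The result has one row per gene and one column per sample, which is the same shape a gene-level abundance table passed as txi_gene would be expected to have.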
Value
A tibble containing processed expression data and differential expression flags, ready for plotting.
Details
The function combines gene and transcript expression data with differential expression results to generate a tidy data frame. It filters significant genes and transcripts based on specified cutoffs and prepares the data for plotting expression profiles across specified sample groups.
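The filtering step described above can be sketched as follows. This is a minimal illustration, not the package's implementation: flag_de is a hypothetical helper, the pvalue column used when use_fdr is FALSE is an assumption (only qvalue is documented as required), and the exact comparison operators are guesses.

```r
# Hypothetical helper, not part of the package: adds a logical DE column.
flag_de <- function(de_result, pvalue_cutoff = 0.05, lfc_cutoff = 1,
                    use_fdr = TRUE) {
  # Assumption: a "pvalue" column is available when use_fdr = FALSE.
  p_col <- if (use_fdr) "qvalue" else "pvalue"
  # Assumption: a feature is significant when its (adjusted) p-value is below
  # the cutoff AND its absolute log2 fold change reaches lfc_cutoff.
  de_result$DE <- de_result[[p_col]] < pvalue_cutoff &
    abs(de_result$log2FC) >= lfc_cutoff
  de_result
}

# Made-up gene-level results with the documented columns plus "pvalue".
de_toy <- data.frame(
  gene_name = c("A", "B", "C"),
  log2FC = c(2.1, -0.4, -1.5),
  pvalue = c(0.001, 0.03, 0.04),
  qvalue = c(0.01, 0.20, 0.08)
)

flagged_fdr <- flag_de(de_toy)                   # cutoff applied to qvalue
flagged_raw <- flag_de(de_toy, use_fdr = FALSE)  # cutoff applied to pvalue
print(flagged_fdr)
print(flagged_raw)
```

With these toy values, gene C passes only when the raw p-value is used, which shows how use_fdr changes which features are flagged.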
Examples
if (FALSE) { # \dontrun{
# Assuming txi_gene, txi_transcript, sample_metadata, tx_to_gene, de_result_gene,
# and de_result_transcript are pre-loaded data frames:

# Prepare data for plotting
expr_df <- prepare_profile_data(
  txi_gene = txi_gene,
  txi_transcript = txi_transcript,
  sample_metadata = sample_metadata,
  tx_to_gene = tx_to_gene,
  de_result_gene = de_result_gene,
  de_result_transcript = de_result_transcript,
  var = "condition",
  var_levels = c("control", "treatment"),
  gene_col = "gene_name",
  tx_col = "transcript_name",
  pvalue_cutoff = 0.05,
  lfc_cutoff = 1,
  use_fdr = TRUE
)

# View the prepared data
utils::head(expr_df)

# Plotting example (assuming ggplot2 is installed)
library(ggplot2)
ggplot(expr_df, aes(x = condition, y = mean_TPM, fill = DE)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  facet_wrap(~ parent_gene + transcript_type)
} # }