scATAC-seq Advanced Analysis: Peak-Gene Regulatory Association Analysis

Author: SeekGene

Time: 17 min

Words: 3.3k words

Updated: 2026-02-28

Reads: 0 times

scATAC + RNA-seq Analysis Guide Notebooks Peak-Gene Links Analysis

Background

ATAC_Peak2Gene is a Peak-to-Gene association analysis pipeline for integrating scATAC-seq and scRNA-seq in single-cell multi-omics data. It aims to systematically identify regulatory relationships between chromatin accessible regions (peaks) and gene expression, revealing the potential impact of cis-regulatory elements (CREs) on gene transcription. Based on the Signac/Seurat multi-omics integration framework, this pipeline combines the LinkPeaks algorithm, correlation analysis, clustering visualization, and statistical testing to provide a comprehensive interpretation from "chromatin accessibility → gene expression → regulatory association".

Core Principles

Peak-to-Gene Association Detection
- Uses Signac's LinkPeaks() function to establish associations between chromatin accessible regions and nearby genes based on distance constraints and statistical correlation.
- Calculates the correlation coefficient between each peak and gene expression, and obtains significance assessment through background model correction.
Omics Data Integration
- Analyzes both ATAC-seq (chromatin accessibility) and RNA-seq (gene expression) data to ensure biological plausibility of association analysis.
- Filters high-quality peak-gene association pairs through variance filtering, correlation thresholds, and significance testing.
Clustering and Visualization
- Performs K-means clustering based on Z-score standardized values of ATAC and RNA data to identify peak-gene combinations with similar regulatory patterns.
- Displays expression patterns of peak-gene associations in different cell types via heatmaps, revealing cell-type-specific regulatory relationships.

Parameter Settings

Please mount the cloud platform pipeline data to be analyzed first. Refer to the Jupyter usage tutorial for mounting instructions.

Specific parameters:

rds: Mounted pipeline-related rds data, located in /home/{UserName}/workspace/project/{UserName}/ directory, e.g., /home/shumeng/workspace/data/AY1752167131383/
meta: The metadata file in the same directory as rds
outdir: Path to save analysis results
clusters_col: Cell type column name. Select the label corresponding to the cell type or cluster to be analyzed. If you want to analyze annotated cell types, select the corresponding label, e.g., CellAnnotation, used in conjunction with celltypes.
celltypes: Cell types (multiple selection). Select the cell types or clustering results to be analyzed, such as Monocyte and Macrophage.
SAMple_col: Sample column, the column name in meta.data.
SAMples: Samples. Used with SAMple_col, specific content of the SAMple_col column in meta.data.
species: Species information
filter: Whether to filter Links. Options are TRUE or FALSE. If TRUE, fill in the following four specific filtering parameters: corCutOff (correlation threshold), pCutOff (significance threshold), varCutOffATAC (ATAC variance threshold), varCutOffRNA (RNA variance threshold).
k: Number of clusters for K-means clustering, used for link grouping. Default is 10.
nPlot: Default 1000. Limits the number of links analyzed.

# Parameters
outdir = "/PROJ2/FLOAT/advanced-analysis/prod/18676-49338-86595-580897984602987130/_build/html/"
rds = "/PROJ2/FLOAT/advanced-analysis/prod/18676-49338-86595-580897984602987130/input.rds"
meta = "/PROJ2/FLOAT/advanced-analysis/prod/18676-49338-86595-580897984602987130/meta.txt"
species = "/jp_envs/reference/human/refdata-arc-GRCh38-2020-A/fasta/genome.fa"
clusters_col = "wknn_res.0.5_d30_l2_50"
celltypes = "0,1,10,11,12,13,14,15,16,17,2,3,4,5,6,7,8,9"
sample_col = "Sample"
samples = "LXHBCC_p_arc,LXHBCC_z_arc,SGKMM_p_arc,SGKMM_z_arc,SQKBCC_p_arc,SQKBCC_z_arc,XALSCC_p_arc,XALSCC_z_arc"
filter = "FALSE"
corCutOff = ""
pCutOff = ""
varCutOffATAC = ""
varCutOffRNA = ""
k = "10"
nPlot = "1000"

Environment Preparation

Load R Packages

Please select the common_r environment for this integration tutorial.

suppressPackageStartupMessages({
  library(Seurat)
  library(Signac)
  library(ArchR)
  library(BSgenome.Hsapiens.UCSC.hg38)
  library(BSgenome.Mmusculus.UCSC.mm10)
  library(dplyr)
  library(MatrixGenerics)
  library(future)
  library(parallel)
  library(ComplexHeatmap)
  library(magick)
})
plan(multisession, workers = min(detectCores()-1, 25))
options(future.globals.maxSize = 30 * 1024^3)

outdir <- paste0(outdir,'/output')
dir.create(outdir, recursive = TRUE)

celltypes <- strsplit(celltypes,",")[[1]]
samples <- strsplit(samples,",")[[1]]

filter <- as.logical(filter)
corCutOff <- as.numeric(corCutOff)
pCutOff <- as.numeric(pCutOff)
varCutOffATAC <- as.numeric(varCutOffATAC)
varCutOffRNA <- as.numeric(varCutOffRNA)

k <- as.numeric(k)
nPlot <- as.numeric(nPlot)

Data Loading

obj <- readRDS(rds)
if (meta != ""){
  meta <- read.table(meta,header=T,sep='\t',check.names=F)
  rownames(meta) <- meta$barcodes
  obj <- AddMetaData(obj, meta)
}
obj <- subset(obj, subset = !!sym(clusters_col) == celltypes)
obj <- subset(obj, subset = !!sym(sample_col) == samples)

n_fragments <- length(obj@assays$ATAC@fragments)
for (i in 1:n_fragments) {
  original_path <- obj@assays$ATAC@fragments[[i]]@path
  filename <- basename(original_path)
  obj@assays$ATAC@fragments[[i]]@path <- paste0(dirname(dirname(outdir)), '/', filename)
}

Define Heatmap Function

peak2GeneHeatmap <- function(
  seurat_obj,
  filter = FALSE,
  corCutOff = 0.45,
  pCutOff = 0.0001,
  varCutOffATAC = 0.25,
  varCutOffRNA = 0.25,
  k = 10,
  nPlot = 5000,
  groupBy = "seurat_clusters",
  palGroup = NULL,
  palATAC = paletteContinuous("solarExtra"),
  palRNA = paletteContinuous("blueYellow"),
  seed = 1
) {
    
  ## Set random seed
  set.seed(seed)
  
  ## Extract ATAC data
  peak_matrix <- GetAssayData(seurat_obj, assay = "ATAC", slot = "data")
  peak_vars <- rowVars(peak_matrix)
  peak_var_quantiles <- rank(peak_vars) / length(peak_vars)
  names(peak_var_quantiles) <- rownames(peak_matrix)
  
  ## Extract RNA data
  rna_matrix <- GetAssayData(seurat_obj, assay = "RNA", slot = "data")
  rna_vars <- rowVars(rna_matrix)
  rna_var_quantiles <- rank(rna_vars) / length(rna_vars)
  names(rna_var_quantiles) <- rownames(rna_matrix)

  links_data <- data.frame(
    peak = links$peak,
    gene = links$gene,
    score = links$score,
    pvalue = links$pvalue
  )
  
  if (filter){
    links_filtered <- links_data[
      abs(links_data$score) >= corCutOff & 
      links_data$pvalue <= pCutOff,
    ]

    links_filtered$peak_id <- match(links_filtered$peak, rownames(peak_matrix))
    links_filtered$gene_id <- match(links_filtered$gene, rownames(rna_matrix))
    ## Add variance quantiles
    links_filtered$VarQATAC <- peak_var_quantiles[rownames(peak_matrix)[links_filtered$peak_id]]
    links_filtered$VarQRNA <- rna_var_quantiles[rownames(rna_matrix)[links_filtered$gene_id]]
    ## Further filter links (based on variance quantiles)
    links_filtered <- links_filtered[links_filtered$VarQATAC > varCutOffATAC, ]
    links_filtered <- links_filtered[links_filtered$VarQRNA > varCutOffRNA, ]

  }else{
    links_filtered <- links_data
  }

  if(nrow(links_filtered) < 20) { 
    stop("Too few links, consider relaxing the filtering threshold")
  }

  ## Limit the number of links for analysis
  if (nrow(links_filtered) > nPlot) {
    links_filtered <- links_filtered[sample(1:nrow(links_filtered), nPlot), ]
  }

  ## Extract required peak regions and gene data
  peak_ids <- match(links_filtered$peak, rownames(peak_matrix))
  gene_ids <- match(links_filtered$gene, rownames(rna_matrix))
  
  atac_data <- peak_matrix[peak_ids, ]
  rna_data <- rna_matrix[gene_ids, ]
  
  ## Calculate Z-scores
  rowZscores <- function(matrix1, matrix2, min = NULL, max = NULL, limit = TRUE) {
    ## Z-score standardization for rows
    z1 <- t(scale(t(matrix1)))
    z2 <- t(scale(t(matrix2)))
    ## If range limitation is needed
    if(limit) {
        ## If min and max are not specified, calculate quantiles to determine appropriate range
        if(is.null(min) || is.null(max)) {
          middle_range1 <- quantile(abs(z1), probs = 0.875, na.rm = TRUE)
          ## Determine range for the first matrix
          if (middle_range1 < 1) {
            min1 <- -1
            max1 <- 1
          } else if (middle_range1 < 1.5) {
            min1 <- -1.5
            max1 <- 1.5
          } else {
            min1 <- -2
            max1 <- 2
          }
          middle_range2 <- quantile(abs(z2), probs = 0.875, na.rm = TRUE)
        
          if (middle_range2 < 1) {
            min2 <- -1
            max2 <- 1
          } else if (middle_range2 < 1.5) {
            min2 <- -1.5
            max2 <- 1.5
          } else {
            min2 <- -2
            max2 <- 2
          }
        min <- max(min1, min2)  # Take larger minimum
        max <- min(max1, max2)  # Take smaller maximum
        }

        z1[z1 > max] <- max
        z1[z1 < min] <- min
        z2[z2 > max] <- max
        z2[z2 < min] <- min
    }
    return(list(matrix1 = z1, matrix2 = z2))
  }
  # atac_zscores <- rowZscores(as.matrix(atac_data))
  # rna_zscores <- rowZscores(as.matrix(rna_data))
  zs <- rowZscores(as.matrix(atac_data), as.matrix(rna_data))
  atac_zscores <- zs$matrix1
  rna_zscores <- zs$matrix2
  ## Get cell grouping information
  cell_groups <- seurat_obj@meta.data[, groupBy]
  names(cell_groups) <- colnames(seurat_obj)
  
  ## K-means clustering
  m <- nrow(atac_zscores)
  if(k > m) {
    message(sprintf("Requested cluster number (k=%d) is larger than data point number (m=%d)", k, m))
    # k <- min(k, m)
    k <- 1
    message(sprintf("Automatically adjusting cluster number to: k=%d", k))
  }

  k_clusters <- kmeans(atac_zscores, centers = k)
  cluster_ids <- k_clusters$cluster

  while(1 %in% table(cluster_ids)){
    k <- ceiling(k/2)
    message(sprintf("Too many clusters, automatically adjusting cluster number to: k=%d", k))
    k_clusters <- kmeans(atac_zscores, centers = k)
    cluster_ids <- k_clusters$cluster
  }

  ## Calculate mean of each group in each cluster
  group_means_atac <- matrix(0, nrow = k, ncol = length(levels(as.factor(cell_groups))))
  group_means_rna <- matrix(0, nrow = k, ncol = length(levels(as.factor(cell_groups))))
  colnames(group_means_atac) <- levels(as.factor(cell_groups))
  colnames(group_means_rna) <- levels(as.factor(cell_groups))
  
  for (i in 1:k) {
    cluster_rows <- which(cluster_ids == i)
    if (length(cluster_rows) > 0) {
      cluster_atac <- atac_zscores[cluster_rows, , drop = FALSE]
      cluster_rna <- rna_zscores[cluster_rows, , drop = FALSE]
      
      ## If there is only one row, keep it as a matrix
      # if (length(cluster_rows) == 1) {
      #   cluster_atac <- matrix(cluster_atac, nrow = 1)
      #   cluster_rna <- matrix(cluster_rna, nrow = 1)
      #   colnames(cluster_atac) <- colnames(atac_zscores)
      #   colnames(cluster_rna) <- colnames(rna_zscores)
      # }
      
      for (g in levels(as.factor(cell_groups))) {
        group_cells <- names(cell_groups)[cell_groups == g]
        if (length(group_cells) > 0) {
          group_means_atac[i, g] <- mean(rowMeans(cluster_atac[, group_cells, drop = FALSE]), na.rm = TRUE)
          group_means_rna[i, g] <- mean(rowMeans(cluster_rna[, group_cells, drop = FALSE]), na.rm = TRUE)
        }
      }
    }
  }

  ## Reorder clusters
  hc <- hclust(dist(group_means_atac))
  cluster_order <- hc$order
  
  ## Ordered data
  ordered_links <- list()
  ordered_atac <- list()
  ordered_rna <- list()
  
  for (i in cluster_order) {
    cluster_rows <- which(cluster_ids == i)
    if (length(cluster_rows) > 0) {
      ordered_links <- c(ordered_links, list(links_filtered[cluster_rows, ]))
      ordered_atac <- c(ordered_atac, list(atac_zscores[cluster_rows, ]))
      ordered_rna <- c(ordered_rna, list(rna_zscores[cluster_rows, ]))
    }
  }
  
  ## Merge data
  ordered_links_df <- do.call(rbind, ordered_links)
  ordered_atac_mat <- do.call(rbind, ordered_atac)
  ordered_rna_mat <- do.call(rbind, ordered_rna)

  ## Prepare group colors
  if (is.null(palGroup)) {
    group_levels <- levels(as.factor(cell_groups))
    palGroup <- setNames(
      scales::hue_pal()(length(group_levels)),
      group_levels
    )
  }
  
  ## Create group annotations
  column_atac <- ComplexHeatmap::HeatmapAnnotation(
    Group = cell_groups,
    col = list(Group = palGroup),
    show_legend = FALSE,
    show_annotation_name = FALSE
  )

  column_rna <- ComplexHeatmap::HeatmapAnnotation(
    Cell = cell_groups,
    col = list(Cell = palGroup),
    show_annotation_name = FALSE
  )

  ## Create heatmap
  atac_heatmap <- ComplexHeatmap::Heatmap(
    ordered_atac_mat,
    name = "ATAC Z-score",
    col = palATAC,
    cluster_rows = FALSE,
    cluster_columns = FALSE,
    column_order = order(colMeans(ordered_rna_mat), decreasing = TRUE),
    show_row_names = FALSE,
    show_column_names = FALSE,
    top_annotation = column_atac,
    column_split = cell_groups,
    column_gap = unit(0, "mm"),
    column_title = NULL,
    row_split = rep(1:k, sapply(ordered_atac, nrow)),
    use_raster = TRUE,
    raster_quality = 1,
    heatmap_legend_param = list(
      direction = "horizontal",
      title_position = "topcenter"
    )
  )
  
  rna_heatmap <- ComplexHeatmap::Heatmap(
    ordered_rna_mat,
    name = "RNA Z-score",
    col = palRNA,
    cluster_rows = FALSE,
    cluster_columns = FALSE,
    column_order = order(colMeans(ordered_rna_mat), decreasing = TRUE),
    show_row_names = FALSE,
    show_column_names = FALSE,
    top_annotation = column_rna,
    column_split = cell_groups,
    column_gap = unit(0, "mm"),
    column_title = NULL,
    row_split = rep(1:k, sapply(ordered_rna, nrow)),
    use_raster = TRUE,
    raster_quality = 1,
    heatmap_legend_param = list(
      direction = "horizontal",
      title_position = "topcenter"
    )
  )

  draw((atac_heatmap + rna_heatmap), heatmap_legend_side = "bottom")
  return(links_filtered)
}

Define Histogram Function

plotPeaksPerGene <- function(links) {
  ## Count number of linked peaks per gene
  gene_counts <- table(links$gene)
  gene_counts_df <- data.frame(
    gene = names(gene_counts),
    count = as.numeric(gene_counts)
  )
    
  ## Calculate median
  median_linked_Peaks <- median(gene_counts_df$count)
  
  ## Plot histogram
  p <- ggplot(gene_counts_df, aes(x = count)) +
    geom_histogram(binwidth = 1, fill = "black", color = "white") +
    geom_vline(xintercept = median_linked_Peaks, linetype = "dashed", color = "gray") +
    annotate("text", x = 10, y = 1000, 
             label = paste0("Median Linked Peaks = ", median_linked_Peaks)) +
    labs(x = "Number of linked peaks", y = "Number of genes") +
    theme_classic() +
    xlim(0, 25) #+ 
    # theme(text = element_text(family = "sans"))
    
  return(p)
}

Define Volcano Plot Function

## Add volcano plot: Display correlation and significance of peak-gene links
plotPeak2GeneVolcano <- function(links, topN = 10, p = 0.05, score = 0.05) {
  ## Prepare data
  volcano_data <- data.frame(
    peak = links$peak,
    gene = links$gene,
    correlation = links$score,
    pvalue = links$pvalue,
    logPvalue = -log10(links$pvalue + 1e-300) # Avoid log(0)
  )
  
  ## Directly find the top N most significant genes
  top_genes_indices <- order(volcano_data$logPvalue, decreasing = TRUE)[1:min(topN, nrow(volcano_data))]
  top_genes <- volcano_data[top_genes_indices, ]
  
  ## Mark points to be highlighted (labeled points)
  volcano_data$highlighted <- "No"
  volcano_data$highlighted[top_genes_indices] <- "Yes"
  top_genes$highlighted <- "Yes"
  
  ## Plot volcano plot
  p <- ggplot(volcano_data, aes(x = correlation, y = logPvalue, color = highlighted)) +
    geom_point(alpha = 0.6, size = 1) +
    scale_color_manual(values = c("No" = "gray", "Yes" = "red")) +
    labs(
      # title = "Peak-to-Gene Linkage Volcano Plot",
      x = "Correlation Score",
      y = "-log10(p-value)",
      color = "Top Genes"
    ) +
    theme_classic() +
    theme(
      legend.position = "none",
      # text = element_text(family = "sans")
    ) +
    ## Add dashed lines
    geom_vline(xintercept = -score, linetype = "dashed", color = "black", alpha = 0.7) +
    geom_vline(xintercept = score, linetype = "dashed", color = "black", alpha = 0.7) +
    geom_hline(yintercept = -log10(p), linetype = "dashed", color = "black", alpha = 0.7) 
  
  ## Label top genes
  if(nrow(top_genes) > 0) {
    p <- p + ggrepel::geom_text_repel(
      data = top_genes,
      # family = "sans",
      aes(label = gene),
      size = 3,
      box.padding = 0.5,
      point.padding = 0.2,
      force = 2,
      max.overlaps = 20
    )
  }
  
  return(p)
}

Define Correlation Rank Plot

## Plot correlation rank using all score values in links
plotRankCorrelation <- function(links) {
  ## Extract all correlation values
  all_correlations <- links$score
  
  ## Sort by correlation from high to low
  sorted_correlations <- sort(all_correlations, decreasing = TRUE)
  correlation_df <- data.frame(
    rank = 1:length(sorted_correlations),
    correlation = sorted_correlations
  )
  
  ## Plot rank plot
  p <- ggplot(correlation_df, aes(x = rank, y = correlation)) +
    geom_line(size = 1) +
    geom_hline(yintercept = 0, linetype = "solid", color = "black") +
    labs(
      # title = "Peak-Gene correlation",
      x = "Peak-Gene Linkage",
      y = "Correlation Score"
    ) +
    theme_classic() +
    theme(
      # text = element_text(family = "sans"),
      axis.title = element_text(size = 14),
      axis.text = element_text(size = 12),
      plot.title = element_text(hjust = 0.5, size = 14),
      axis.text.x = element_blank(),  # Hide x-axis tick labels
      axis.ticks.x = element_blank()  # Hide x-axis tick marks
    )
  
  return(p)
}

Start Peak-to-Gene Association Analysis Between Cell Groups

We start implementing Peak-to-Gene association analysis. The core of this analysis is to establish statistical associations between chromatin accessible regions (peaks) and gene expression, thereby identifying potential cis-regulatory relationships.

Analysis Workflow Overview

Our analysis follows these steps:

Check existing associations: First check if peak-to-gene links already exist in the Seurat object
Establish associations: If links do not exist, use LinkPeaks() function to establish new associations
Data preprocessing: Perform chromosome filtering and region statistics calculation on genomic data
Output results: Save association results as table files for subsequent analysis

Key Technical Parameters

LinkPeaks function: Integrates ATAC-seq and RNA-seq data, calculates correlation between peaks and gene expression
Distance constraint: Defaults to searching for associated peak regions within 500kb of gene TSS
Statistical assessment: Calculates Z-score and P-value through background model correction to assess association significance
Genome compatibility: Supports human and mouse genomes, automatically adapts chromosome information

Output Results Description

Upon completion of analysis, we will obtain:
- Gene2PeakLinks.xls: Detailed table of peak-gene association pairs, including peak position, gene name, correlation score, and significance metrics
- Association statistics: Summary information such as peak count, gene count, significant association pair count, etc.
- QC metrics: Statistical features such as association strength distribution, distance distribution, etc.

## Check peak-to-gene links
links <- Signac::Links(obj[["ATAC"]])
if (length(links) == 0) {
  # if (species == "human"){
  #     genome = BSgenome.Hsapiens.UCSC.hg38
  # }else if (species == "mouse"){
  #     genome = BSgenome.Mmusculus.UCSC.mm10
  # }
  library(Biostrings)
  genome <- readDNAStringSet(species)
  names(genome) <- sub(" .*", "", names(genome)) 
  keep_chromosomes <- unique(as.character(seqnames(granges(obj[["ATAC"]]))))
  genome <- genome[which(names(genome) %in% keep_chromosomes)]

  obj <- RegionStats(
    object = obj,
    genome = genome,
    assay = "ATAC"
  )
  obj <- LinkPeaks(
      object = obj,
      peak.assay = "ATAC",
      expression.assay = "RNA"
  )
  links <- Signac::Links(obj[["ATAC"]])
  
  if (length(links) == 0) {
    stop("No peak-to-gene links found")
  }
}
write.table(as.data.frame(links), file = file.path(outdir, "Gene2PeakLinks.xls"), sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)

Results Visualization

Heatmap

The heatmap shows the correlation patterns between chromatin accessible regions (peaks) and gene expression. The heatmap is divided into two parts:
Each row in the heatmap represents a peak-gene pair, and each column represents a cell.
ATAC heatmap (left): Shows ATAC-seq signal intensity (chromatin accessibility). Blue indicates low accessibility, red indicates high accessibility.
RNA heatmap (right): Shows expression levels of corresponding genes. Blue indicates low expression, yellow indicates high expression.
The top color bar identifies different cell types or groups, helping to observe peak-gene association patterns in specific cell types.

pdf(file.path(outdir, "Peak2GeneHeatmap.pdf"), width = 12, height = 10)
links <- peak2GeneHeatmap(
    obj,
    filter = filter,
    corCutOff = corCutOff,
    pCutOff = pCutOff,
    varCutOffATAC = varCutOffATAC,
    varCutOffRNA = varCutOffRNA,
    k = k,
    nPlot = nPlot,
    groupBy = clusters_col
  )
dev.off()

png(file.path(outdir, "Peak2GeneHeatmap.png"), width = 12, height = 10, units = "in", res = 300)
links <- peak2GeneHeatmap(
    obj,
    filter = filter,
    corCutOff = corCutOff,
    pCutOff = pCutOff,
    varCutOffATAC = varCutOffATAC,
    varCutOffRNA = varCutOffRNA,
    k = k,
    nPlot = nPlot,
    groupBy = clusters_col
  )
dev.off()
# img <- image_read_pdf(file.path(outdir, "Peak2GeneHeatmap.pdf"), density = 300)
# image_write(img, path = file.path(outdir, "Peak2GeneHeatmap.png"), format = "png")

Histogram

The histogram shows the number of genes with 1 ~ 25 associated peaks.
The x-axis represents the number of associated peaks, and the y-axis represents the number of genes associated with that number of peaks.
The dashed line indicates the median position, and the text labels the specific median value.

pdf(file.path(outdir, "Peak2GeneHistogram.pdf"), width = 8, height = 8)
print(plotPeaksPerGene(links))
dev.off()

png(file.path(outdir, "Peak2GeneHistogram.png"), width = 8, height = 8, units = "in", res = 300)
print(plotPeaksPerGene(links))
dev.off()
# img <- image_read_pdf(file.path(outdir, "Peak2GeneHistogram.pdf"), density = 300)
# image_write(img, path = file.path(outdir, "Peak2GeneHistogram.png"), format = "png")

Volcano Plot

The volcano plot labels the Top 10 most significant peak-gene associated genes.
The x-axis is the correlation score, and the y-axis is -log10(p).

pdf(file.path(outdir, "Peak2GeneVolcano.pdf"), width = 8, height = 8)
if(filter == TRUE) {
  print(plotPeak2GeneVolcano(links, p = pCutOff, score = corCutOff))
} else {
  print(plotPeak2GeneVolcano(links))
}
dev.off()

png(file.path(outdir, "Peak2GeneVolcano.png"), width = 8, height = 8, units = "in", res = 300)
if(filter == TRUE) {
  print(plotPeak2GeneVolcano(links, p = pCutOff, score = corCutOff))
} else {
  print(plotPeak2GeneVolcano(links))
}
dev.off()
# img <- image_read_pdf(file.path(outdir, "Peak2GeneVolcano.pdf"), density = 300)
# image_write(img, path = file.path(outdir, "Peak2GeneVolcano.png"), format = "png")

Correlation Rank Plot

The curve plot shows the correlation coefficients of all peak-gene pairs, sorted from high to low.
The x-axis represents peak-gene pairs, and the y-axis represents their correlation coefficients.

pdf(file.path(outdir, "Peak2GeneRank.pdf"), width = 8, height = 6)
print(plotRankCorrelation(links))
dev.off()

png(file.path(outdir, "Peak2GeneRank.png"), width = 8, height = 6, units = "in", res = 300)
print(plotRankCorrelation(links))
dev.off()
# img <- image_read_pdf(file.path(outdir, "Peak2GeneRank.pdf"), density = 300)
# image_write(img, path = file.path(outdir, "Peak2GeneRank.png"), format = "png")

Result Files

Peak2GeneHeatmap.pdf/png - Peak-to-Gene Association Heatmap
Peak2GeneHistogram.pdf/png - Peak-to-Gene Histogram
Peak2GeneVolcano.pdf/png - Peak-to-Gene Volcano Plot
Peak2GeneRank.pdf/png - Peak-to-Gene Rank Curve Plot

Results Directory: Includes all image and table result files involved in this analysis.

Literature Case Study

- Literature 1: In the paper "Spatial multiomic landscape of the human placenta at molecular resolution", to reveal the relationship between genes and distal CREs, the authors associated distal peaks with genes in cis and identified 43,622 significant peak-gene pairs (representing potential enhancer-gene relationships).

References

[1] Stuart, Tim, Avi Srivastava, Caleb Lareau, and Rahul Satija. "Multimodal single-cell chromatin analysis with Signac." BioRxiv (2020): 2020-11.

[2] Ma S, Zhang B, LaFave LM, Earl AS, Chiang Z, Hu Y, Ding J, Brack A, Kartha VK, Tay T, Law T, Lareau C, Hsu YC, Regev A, Buenrostro JD. Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin. Cell. 2020 Nov 12;183(4):1103-1116.e20. doi: 10.1016/j.cell.2020.09.056. Epub 2020 Oct 23. PMID: 33098772; PMCID: PMC7669735.

Appendix

.csv/.xls/.txt: Result data table files, separated by commas or tabs. Linux/Mac users use less or more commands to view; Windows users use advanced text editors like Notepad++ or open with Microsoft Excel.
.png: Result image files, bitmap, lossless compression.
.pdf: Result image files, vector graphics, can be zoomed in and out without distortion, convenient for viewing and editing. Can be edited with Adobe Illustrator for publication.

scATAC-seq Advanced Analysis: Peak-Gene Regulatory Association Analysis ​

Background ​

Core Principles ​

Parameter Settings ​

Environment Preparation ​

Start Peak-to-Gene Association Analysis Between Cell Groups ​

Analysis Workflow Overview ​

Results Visualization ​

Result Files ​

Literature Case Study ​

References ​

Appendix ​