scATAC-seq 高级分析：Peak-Gene 调控关联分析

作者: SeekGene

时长: 18 分钟

字数: 4.2k 字

更新: 2026-02-28

阅读: 0 次

ATAC + RNA 双组学分析指南 Notebooks Peak-Gene Links分析

背景介绍

ATAC_Peak2Gene 是面向单细胞多组学数据中 scATAC-seq 与 scRNA-seq 整合分析的 Peak-to-Gene 关联分析流程，旨在系统性地识别染色质开放区域（peaks）与基因表达之间的调控关系，揭示顺式调控元件（CREs）对基因转录的潜在影响。该流程基于 Signac/Seurat 的多组学整合框架，结合 LinkPeaks 算法、相关性分析、聚类可视化以及统计检验，提供从"染色质可及性 → 基因表达 → 调控关联"的全面解读。

核心原理

Peak-to-Gene 关联检测
- 采用 Signac 的 LinkPeaks() 函数，基于距离约束和统计相关性，将染色质开放区域与邻近基因建立关联关系。
- 计算每个 peak 与基因表达之间的相关系数，并通过背景模型校正获得显著性评估。
组学数据整合
- 同时分析 ATAC-seq（染色质可及性）和 RNA-seq（基因表达）数据，确保关联分析的生物学合理性。
- 通过方差过滤、相关性阈值和显著性检验，筛选高质量的 peak-gene 关联对。
聚类与可视化
- 基于 ATAC 和 RNA 数据的 Z-score 标准化值进行 K-means 聚类，识别具有相似调控模式的 peak-gene 组合。
- 通过热图展示不同细胞类型中 peak-gene 关联的表达模式，揭示细胞类型特异的调控关系。

参数设置

请先挂载将要分析的云平台相关流程数据，挂载方式请参照 jupyter 使用教程

具体参数填写：

rds：挂载的流程相关的 rds 数据，在/home/{UserName}/workspace/project/{UserName}/目录下，如下列项目数据/home/shumeng/workspace/data/AY1752167131383/
meta：与 rds 同目录下的 metadata 文件
outdir: 分析结果保存路径
clusters_col：细胞类型列名，选择要分析的细胞类型或者聚类对应的标签。假如想对注释好的细胞类型进行分析，则选择其对应的标签，例如 CellAnnotation，与<细胞类型>配合使用。
celltypes：细胞类型，多选，选择要分析的细胞类型或者聚类结果，如 Monocyte 和 Macrophage 等。
SAMple_col: 样本列，meta.data 的列名。
SAMples：样本，和 SAMple_col 搭配使用，是 meta.data 的 SAMple_col 列具体的内容。
species：物种信息
filter：是否过滤 Links,选项为 TRUE or FALSE，若为 TRUE，则填写以下四个具体过滤参数，corCutOff（相关性阈值）、pCutOff（显著性阈值）、varCutOffATAC（ATAC 方差阈值）、varCutOffRNA（RNA 方差阈值）。
k: K-means 聚类的簇数，用于 link 分组，默认为 10。
nPlot：默认 1000，限制分析的链接数量。

# Parameters
outdir = "/PROJ2/FLOAT/advanced-analysis/prod/18676-49338-86595-580897984602987130/_build/html/"
rds = "/PROJ2/FLOAT/advanced-analysis/prod/18676-49338-86595-580897984602987130/input.rds"
meta = "/PROJ2/FLOAT/advanced-analysis/prod/18676-49338-86595-580897984602987130/meta.txt"
species = "/jp_envs/reference/human/refdata-arc-GRCh38-2020-A/fasta/genome.fa"
clusters_col = "wknn_res.0.5_d30_l2_50"
celltypes = "0,1,10,11,12,13,14,15,16,17,2,3,4,5,6,7,8,9"
sample_col = "Sample"
samples = "LXHBCC_p_arc,LXHBCC_z_arc,SGKMM_p_arc,SGKMM_z_arc,SQKBCC_p_arc,SQKBCC_z_arc,XALSCC_p_arc,XALSCC_z_arc"
filter = "FALSE"
corCutOff = ""
pCutOff = ""
varCutOffATAC = ""
varCutOffRNA = ""
k = "10"
nPlot = "1000"

环境准备

R 包加载

请选择 common_r 这个环境进行该整合教程的学习

suppressPackageStartupMessages({
  library(Seurat)
  library(Signac)
  library(ArchR)
  library(BSgenome.Hsapiens.UCSC.hg38)
  library(BSgenome.Mmusculus.UCSC.mm10)
  library(dplyr)
  library(MatrixGenerics)
  library(future)
  library(parallel)
  library(ComplexHeatmap)
  library(magick)
})
plan(multisession, workers = min(detectCores()-1, 25))
options(future.globals.maxSize = 30 * 1024^3)

outdir <- paste0(outdir,'/output')
dir.create(outdir, recursive = TRUE)

celltypes <- strsplit(celltypes,",")[[1]]
samples <- strsplit(samples,",")[[1]]

filter <- as.logical(filter)
corCutOff <- as.numeric(corCutOff)
pCutOff <- as.numeric(pCutOff)
varCutOffATAC <- as.numeric(varCutOffATAC)
varCutOffRNA <- as.numeric(varCutOffRNA)

k <- as.numeric(k)
nPlot <- as.numeric(nPlot)

数据读取

obj <- readRDS(rds)
if (meta != ""){
  meta <- read.table(meta,header=T,sep='\t',check.names=F)
  rownames(meta) <- meta$barcodes
  obj <- AddMetaData(obj, meta)
}
obj <- subset(obj, subset = !!sym(clusters_col) == celltypes)
obj <- subset(obj, subset = !!sym(sample_col) == samples)

n_fragments <- length(obj@assays$ATAC@fragments)
for (i in 1:n_fragments) {
  original_path <- obj@assays$ATAC@fragments[[i]]@path
  filename <- basename(original_path)
  obj@assays$ATAC@fragments[[i]]@path <- paste0(dirname(dirname(outdir)), '/', filename)
}

定义热图函数

peak2GeneHeatmap <- function(
  seurat_obj,
  filter = FALSE,
  corCutOff = 0.45,
  pCutOff = 0.0001,
  varCutOffATAC = 0.25,
  varCutOffRNA = 0.25,
  k = 10,
  nPlot = 5000,
  groupBy = "seurat_clusters",
  palGroup = NULL,
  palATAC = paletteContinuous("solarExtra"),
  palRNA = paletteContinuous("blueYellow"),
  seed = 1
) {
    
  # 设置随机种子
  set.seed(seed)
  
  # 提取ATAC数据
  peak_matrix <- GetAssayData(seurat_obj, assay = "ATAC", slot = "data")
  peak_vars <- rowVars(peak_matrix)
  peak_var_quantiles <- rank(peak_vars) / length(peak_vars)
  names(peak_var_quantiles) <- rownames(peak_matrix)
  
  # 提取RNA数据
  rna_matrix <- GetAssayData(seurat_obj, assay = "RNA", slot = "data")
  rna_vars <- rowVars(rna_matrix)
  rna_var_quantiles <- rank(rna_vars) / length(rna_vars)
  names(rna_var_quantiles) <- rownames(rna_matrix)

  links_data <- data.frame(
    peak = links$peak,
    gene = links$gene,
    score = links$score,
    pvalue = links$pvalue
  )
  
  if (filter){
    links_filtered <- links_data[
      abs(links_data$score) >= corCutOff & 
      links_data$pvalue <= pCutOff,
    ]

    links_filtered$peak_id <- match(links_filtered$peak, rownames(peak_matrix))
    links_filtered$gene_id <- match(links_filtered$gene, rownames(rna_matrix))
    # 添加方差quantiles
    links_filtered$VarQATAC <- peak_var_quantiles[rownames(peak_matrix)[links_filtered$peak_id]]
    links_filtered$VarQRNA <- rna_var_quantiles[rownames(rna_matrix)[links_filtered$gene_id]]
    # 进一步过滤链接（基于方差quantiles）
    links_filtered <- links_filtered[links_filtered$VarQATAC > varCutOffATAC, ]
    links_filtered <- links_filtered[links_filtered$VarQRNA > varCutOffRNA, ]

  }else{
    links_filtered <- links_data
  }

  if(nrow(links_filtered) < 20) { 
    stop("link过少，建议放宽过滤阈值")
  }

  # 限制分析的链接数量
  if (nrow(links_filtered) > nPlot) {
    links_filtered <- links_filtered[sample(1:nrow(links_filtered), nPlot), ]
  }

  # 提取需要的峰区域和基因数据
  peak_ids <- match(links_filtered$peak, rownames(peak_matrix))
  gene_ids <- match(links_filtered$gene, rownames(rna_matrix))
  
  atac_data <- peak_matrix[peak_ids, ]
  rna_data <- rna_matrix[gene_ids, ]
  
  # 计算Z-scores
  rowZscores <- function(matrix1, matrix2, min = NULL, max = NULL, limit = TRUE) {
    # 对行进行z-score标准化
    z1 <- t(scale(t(matrix1)))
    z2 <- t(scale(t(matrix2)))
    # 如果需要限制范围
    if(limit) {
        # 如果没有指定min和max，计算分位数来确定合适的范围
        if(is.null(min) || is.null(max)) {
          middle_range1 <- quantile(abs(z1), probs = 0.875, na.rm = TRUE)
          # 确定第一个矩阵的范围
          if (middle_range1 < 1) {
            min1 <- -1
            max1 <- 1
          } else if (middle_range1 < 1.5) {
            min1 <- -1.5
            max1 <- 1.5
          } else {
            min1 <- -2
            max1 <- 2
          }
          middle_range2 <- quantile(abs(z2), probs = 0.875, na.rm = TRUE)
        
          if (middle_range2 < 1) {
            min2 <- -1
            max2 <- 1
          } else if (middle_range2 < 1.5) {
            min2 <- -1.5
            max2 <- 1.5
          } else {
            min2 <- -2
            max2 <- 2
          }
        min <- max(min1, min2)  # 取较大的最小值
        max <- min(max1, max2)  # 取较小的最大值
        }

        z1[z1 > max] <- max
        z1[z1 < min] <- min
        z2[z2 > max] <- max
        z2[z2 < min] <- min
    }
    return(list(matrix1 = z1, matrix2 = z2))
  }
  # atac_zscores <- rowZscores(as.matrix(atac_data))
  # rna_zscores <- rowZscores(as.matrix(rna_data))
  zs <- rowZscores(as.matrix(atac_data), as.matrix(rna_data))
  atac_zscores <- zs$matrix1
  rna_zscores <- zs$matrix2
  # 获取细胞分组信息
  cell_groups <- seurat_obj@meta.data[, groupBy]
  names(cell_groups) <- colnames(seurat_obj)
  
  # K-means聚类
  m <- nrow(atac_zscores)
  if(k > m) {
    message(sprintf("要求的聚类数(k=%d)大于数据点数量(m=%d)", k, m))
    # k <- min(k, m)
    k <- 1
    message(sprintf("自动调整聚类数为: k=%d", k))
  }

  k_clusters <- kmeans(atac_zscores, centers = k)
  cluster_ids <- k_clusters$cluster

  while(1 %in% table(cluster_ids)){
    k <- ceiling(k/2)
    message(sprintf("聚类数过多，自动调整聚类数为: k=%d", k))
    k_clusters <- kmeans(atac_zscores, centers = k)
    cluster_ids <- k_clusters$cluster
  }

  # 计算每个聚类中每个组的平均值
  group_means_atac <- matrix(0, nrow = k, ncol = length(levels(as.factor(cell_groups))))
  group_means_rna <- matrix(0, nrow = k, ncol = length(levels(as.factor(cell_groups))))
  colnames(group_means_atac) <- levels(as.factor(cell_groups))
  colnames(group_means_rna) <- levels(as.factor(cell_groups))
  
  for (i in 1:k) {
    cluster_rows <- which(cluster_ids == i)
    if (length(cluster_rows) > 0) {
      cluster_atac <- atac_zscores[cluster_rows, , drop = FALSE]
      cluster_rna <- rna_zscores[cluster_rows, , drop = FALSE]
      
      # 如果只有一行，需要保持为矩阵形式
      # if (length(cluster_rows) == 1) {
      #   cluster_atac <- matrix(cluster_atac, nrow = 1)
      #   cluster_rna <- matrix(cluster_rna, nrow = 1)
      #   colnames(cluster_atac) <- colnames(atac_zscores)
      #   colnames(cluster_rna) <- colnames(rna_zscores)
      # }
      
      for (g in levels(as.factor(cell_groups))) {
        group_cells <- names(cell_groups)[cell_groups == g]
        if (length(group_cells) > 0) {
          group_means_atac[i, g] <- mean(rowMeans(cluster_atac[, group_cells, drop = FALSE]), na.rm = TRUE)
          group_means_rna[i, g] <- mean(rowMeans(cluster_rna[, group_cells, drop = FALSE]), na.rm = TRUE)
        }
      }
    }
  }

  # 对聚类重新排序
  hc <- hclust(dist(group_means_atac))
  cluster_order <- hc$order
  
  # 排序后的数据
  ordered_links <- list()
  ordered_atac <- list()
  ordered_rna <- list()
  
  for (i in cluster_order) {
    cluster_rows <- which(cluster_ids == i)
    if (length(cluster_rows) > 0) {
      ordered_links <- c(ordered_links, list(links_filtered[cluster_rows, ]))
      ordered_atac <- c(ordered_atac, list(atac_zscores[cluster_rows, ]))
      ordered_rna <- c(ordered_rna, list(rna_zscores[cluster_rows, ]))
    }
  }
  
  # 合并数据
  ordered_links_df <- do.call(rbind, ordered_links)
  ordered_atac_mat <- do.call(rbind, ordered_atac)
  ordered_rna_mat <- do.call(rbind, ordered_rna)

  # 准备分组颜色
  if (is.null(palGroup)) {
    group_levels <- levels(as.factor(cell_groups))
    palGroup <- setNames(
      scales::hue_pal()(length(group_levels)),
      group_levels
    )
  }
  
  # 创建分组注释
  column_atac <- ComplexHeatmap::HeatmapAnnotation(
    Group = cell_groups,
    col = list(Group = palGroup),
    show_legend = FALSE,
    show_annotation_name = FALSE
  )

  column_rna <- ComplexHeatmap::HeatmapAnnotation(
    Cell = cell_groups,
    col = list(Cell = palGroup),
    show_annotation_name = FALSE
  )

  # 创建热图
  atac_heatmap <- ComplexHeatmap::Heatmap(
    ordered_atac_mat,
    name = "ATAC Z-score",
    col = palATAC,
    cluster_rows = FALSE,
    cluster_columns = FALSE,
    column_order = order(colMeans(ordered_rna_mat), decreasing = TRUE),
    show_row_names = FALSE,
    show_column_names = FALSE,
    top_annotation = column_atac,
    column_split = cell_groups,
    column_gap = unit(0, "mm"),
    column_title = NULL,
    row_split = rep(1:k, sapply(ordered_atac, nrow)),
    use_raster = TRUE,
    raster_quality = 1,
    heatmap_legend_param = list(
      direction = "horizontal",
      title_position = "topcenter"
    )
  )
  
  rna_heatmap <- ComplexHeatmap::Heatmap(
    ordered_rna_mat,
    name = "RNA Z-score",
    col = palRNA,
    cluster_rows = FALSE,
    cluster_columns = FALSE,
    column_order = order(colMeans(ordered_rna_mat), decreasing = TRUE),
    show_row_names = FALSE,
    show_column_names = FALSE,
    top_annotation = column_rna,
    column_split = cell_groups,
    column_gap = unit(0, "mm"),
    column_title = NULL,
    row_split = rep(1:k, sapply(ordered_rna, nrow)),
    use_raster = TRUE,
    raster_quality = 1,
    heatmap_legend_param = list(
      direction = "horizontal",
      title_position = "topcenter"
    )
  )

  draw((atac_heatmap + rna_heatmap), heatmap_legend_side = "bottom")
  return(links_filtered)
}

定义直方图函数

plotPeaksPerGene <- function(links) {
  # 统计每个基因连接的peak数量
  gene_counts <- table(links$gene)
  gene_counts_df <- data.frame(
    gene = names(gene_counts),
    count = as.numeric(gene_counts)
  )
    
  # 计算中位数
  median_linked_Peaks <- median(gene_counts_df$count)
  
  # 绘制直方图
  p <- ggplot(gene_counts_df, aes(x = count)) +
    geom_histogram(binwidth = 1, fill = "black", color = "white") +
    geom_vline(xintercept = median_linked_Peaks, linetype = "dashed", color = "gray") +
    annotate("text", x = 10, y = 1000, 
             label = paste0("Median Linked Peaks = ", median_linked_Peaks)) +
    labs(x = "Number of linked peaks", y = "Number of genes") +
    theme_classic() +
    xlim(0, 25) #+ 
    # theme(text = element_text(family = "sans"))
    
  return(p)
}

定义火山图函数

# 添加火山图：展示峰区域与基因连接的相关性和显著性
plotPeak2GeneVolcano <- function(links, topN = 10, p = 0.05, score = 0.05) {
  # 准备数据
  volcano_data <- data.frame(
    peak = links$peak,
    gene = links$gene,
    correlation = links$score,
    pvalue = links$pvalue,
    logPvalue = -log10(links$pvalue + 1e-300) # 避免log(0)
  )
  
  # 直接找出最显著的topN个基因
  top_genes_indices <- order(volcano_data$logPvalue, decreasing = TRUE)[1:min(topN, nrow(volcano_data))]
  top_genes <- volcano_data[top_genes_indices, ]
  
  # 标记哪些点需要高亮显示（被标注的点）
  volcano_data$highlighted <- "No"
  volcano_data$highlighted[top_genes_indices] <- "Yes"
  top_genes$highlighted <- "Yes"
  
  # 绘制火山图
  p <- ggplot(volcano_data, aes(x = correlation, y = logPvalue, color = highlighted)) +
    geom_point(alpha = 0.6, size = 1) +
    scale_color_manual(values = c("No" = "gray", "Yes" = "red")) +
    labs(
      # title = "Peak-to-Gene Linkage Volcano Plot",
      x = "Correlation Score",
      y = "-log10(p-value)",
      color = "Top Genes"
    ) +
    theme_classic() +
    theme(
      legend.position = "none",
      # text = element_text(family = "sans")
    ) +
    # 添加虚线
    geom_vline(xintercept = -score, linetype = "dashed", color = "black", alpha = 0.7) +
    geom_vline(xintercept = score, linetype = "dashed", color = "black", alpha = 0.7) +
    geom_hline(yintercept = -log10(p), linetype = "dashed", color = "black", alpha = 0.7) 
  
  # 标记顶部基因
  if(nrow(top_genes) > 0) {
    p <- p + ggrepel::geom_text_repel(
      data = top_genes,
      # family = "sans",
      aes(label = gene),
      size = 3,
      box.padding = 0.5,
      point.padding = 0.2,
      force = 2,
      max.overlaps = 20
    )
  }
  
  return(p)
}

定义相关性排序图

# 使用links中的所有score值绘制相关性排序图
plotRankCorrelation <- function(links) {
  # 提取所有相关性值
  all_correlations <- links$score
  
  # 按相关性从高到低排序
  sorted_correlations <- sort(all_correlations, decreasing = TRUE)
  correlation_df <- data.frame(
    rank = 1:length(sorted_correlations),
    correlation = sorted_correlations
  )
  
  # 绘制排序图
  p <- ggplot(correlation_df, aes(x = rank, y = correlation)) +
    geom_line(size = 1) +
    geom_hline(yintercept = 0, linetype = "solid", color = "black") +
    labs(
      # title = "Peak-Gene correlation",
      x = "Peak-Gene Linkage",
      y = "Correlation Score"
    ) +
    theme_classic() +
    theme(
      # text = element_text(family = "sans"),
      axis.title = element_text(size = 14),
      axis.text = element_text(size = 12),
      plot.title = element_text(hjust = 0.5, size = 14),
      axis.text.x = element_blank(),  # 隐藏x轴刻度标签
      axis.ticks.x = element_blank()  # 隐藏x轴刻度线
    )
  
  return(p)
}

开始细胞群间 Peak-to-Gene 关联分析

我们开始实施 Peak-to-Gene 关联分析。该分析的核心在于建立染色质开放区域（peaks）与基因表达之间的统计关联，从而识别潜在的顺式调控关系。

分析流程概述

我们的分析采用以下步骤：

检查现有关联：首先检查 Seurat 对象中是否已存在 peak-to-gene 链接
建立关联关系：如果不存在链接，则使用 LinkPeaks()函数建立新的关联
数据预处理：对基因组数据进行染色体筛选和区域统计计算
结果输出：将关联结果保存为表格文件，便于后续分析

关键技术参数

LinkPeaks 函数：整合 ATAC-seq 和 RNA-seq 数据，计算 peak 与基因表达的相关性
距离约束：默认在基因 TSS 附近 500kb 范围内寻找关联的 peak 区域
统计评估：通过背景模型校正，计算 Z 得分和 P 值，评估关联的显著性
基因组兼容性：支持人类和小鼠基因组，自动适配染色体信息

输出结果说明

分析完成后，我们将获得：
- Gene2PeakLinks.xls：包含 peak-gene 关联对的详细表格，包括 peak 位置、基因名称、相关性得分和显著性指标
- 关联统计信息：peak 数量、基因数量、显著关联对数量等汇总信息
- 质量控制指标：关联强度分布、距离分布等统计特征

# 检查peak-to-gene链接
links <- Signac::Links(obj[["ATAC"]])
if (length(links) == 0) {
  # if (species == "human"){
  #     genome = BSgenome.Hsapiens.UCSC.hg38
  # }else if (species == "mouse"){
  #     genome = BSgenome.Mmusculus.UCSC.mm10
  # }
  library(Biostrings)
  genome <- readDNAStringSet(species)
  names(genome) <- sub(" .*", "", names(genome)) 
  keep_chromosomes <- unique(as.character(seqnames(granges(obj[["ATAC"]]))))
  genome <- genome[which(names(genome) %in% keep_chromosomes)]

  obj <- RegionStats(
    object = obj,
    genome = genome,
    assay = "ATAC"
  )
  obj <- LinkPeaks(
      object = obj,
      peak.assay = "ATAC",
      expression.assay = "RNA"
  )
  links <- Signac::Links(obj[["ATAC"]])
  
  if (length(links) == 0) {
    stop("未找到peak-to-gene链接")
  }
}
write.table(as.data.frame(links), file = file.path(outdir, "Gene2PeakLinks.xls"), sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)

结果可视化

热图

热图展示了染色质开放区域(peaks)与基因表达之间的相关性模式。热图分为两部分：
热图中的每一行代表一个 peak-gene 对，每一列代表一个细胞。
ATAC 热图（左）：显示 ATAC-seq 信号强度（染色质可及性），蓝色表示低可及性，红色表示高可及性。
RNA 热图（右）：显示相应基因的表达水平，蓝色表示低表达，黄色表示高表达。
顶部颜色条标识了不同的细胞类型或群体，帮助观察特定细胞类型中 peak-gene 关联的模式。

pdf(file.path(outdir, "Peak2GeneHeatmap.pdf"), width = 12, height = 10)
links <- peak2GeneHeatmap(
    obj,
    filter = filter,
    corCutOff = corCutOff,
    pCutOff = pCutOff,
    varCutOffATAC = varCutOffATAC,
    varCutOffRNA = varCutOffRNA,
    k = k,
    nPlot = nPlot,
    groupBy = clusters_col
  )
dev.off()

png(file.path(outdir, "Peak2GeneHeatmap.png"), width = 12, height = 10, units = "in", res = 300)
links <- peak2GeneHeatmap(
    obj,
    filter = filter,
    corCutOff = corCutOff,
    pCutOff = pCutOff,
    varCutOffATAC = varCutOffATAC,
    varCutOffRNA = varCutOffRNA,
    k = k,
    nPlot = nPlot,
    groupBy = clusters_col
  )
dev.off()
# img <- image_read_pdf(file.path(outdir, "Peak2GeneHeatmap.pdf"), density = 300)
# image_write(img, path = file.path(outdir, "Peak2GeneHeatmap.png"), format = "png")

直方图

直方图显示 1 ~ 25 个关联 peak 的基因数量。
横轴代表关联 peak 数，纵轴代表关联到横轴所示数量的 peak 的基因数
虚线为中位数所在位置，文本标注中位数具体数值。

pdf(file.path(outdir, "Peak2GeneHistogram.pdf"), width = 8, height = 8)
print(plotPeaksPerGene(links))
dev.off()

png(file.path(outdir, "Peak2GeneHistogram.png"), width = 8, height = 8, units = "in", res = 300)
print(plotPeaksPerGene(links))
dev.off()
# img <- image_read_pdf(file.path(outdir, "Peak2GeneHistogram.pdf"), density = 300)
# image_write(img, path = file.path(outdir, "Peak2GeneHistogram.png"), format = "png")

火山图

火山图标注了 Top 10 最显著的 peak-gene 关联的基因。
横轴为相关性 score，纵轴为-log10(p)。

pdf(file.path(outdir, "Peak2GeneVolcano.pdf"), width = 8, height = 8)
if(filter == TRUE) {
  print(plotPeak2GeneVolcano(links, p = pCutOff, score = corCutOff))
} else {
  print(plotPeak2GeneVolcano(links))
}
dev.off()

png(file.path(outdir, "Peak2GeneVolcano.png"), width = 8, height = 8, units = "in", res = 300)
if(filter == TRUE) {
  print(plotPeak2GeneVolcano(links, p = pCutOff, score = corCutOff))
} else {
  print(plotPeak2GeneVolcano(links))
}
dev.off()
# img <- image_read_pdf(file.path(outdir, "Peak2GeneVolcano.pdf"), density = 300)
# image_write(img, path = file.path(outdir, "Peak2GeneVolcano.png"), format = "png")

相关性排序图

曲线图展示了所有 peak-gene 对的相关系数，按照相关性从高到低排序。
横轴代表 peak-gene 对，纵轴代表其相关系数。

pdf(file.path(outdir, "Peak2GeneRank.pdf"), width = 8, height = 6)
print(plotRankCorrelation(links))
dev.off()

png(file.path(outdir, "Peak2GeneRank.png"), width = 8, height = 6, units = "in", res = 300)
print(plotRankCorrelation(links))
dev.off()
# img <- image_read_pdf(file.path(outdir, "Peak2GeneRank.pdf"), density = 300)
# image_write(img, path = file.path(outdir, "Peak2GeneRank.png"), format = "png")

结果文件

Peak2GeneHeatmap.pdf/png - Peak-to-Gene 关联热图
Peak2GeneHistogram.pdf/png - Peak-to-Gene 直方图
Peak2GeneVolcano.pdf/png - Peak-to-Gene 小提琴图
Peak2GeneRank.pdf/png - Peak-to-Gene 排序曲线图

结果目录：目录下包括此项分析所涉及的所有图片以及表格等结果文件

文献案例解析

文献一：
文献《Spatial multiomic landscape of the human placenta at molecular resolution》为了揭示基因和远端 CREs 之间的关系，作者以顺式方式将远端峰与基因相关联，并确定了 43,622 个显著的峰-基因对（代表潜在的增强子-基因关系）。

参考资料

[1] Stuart, Tim, Avi Srivastava, Caleb Lareau, and Rahul Satija. "Multimodal single-cell chromatin analysis with Signac." BioRxiv (2020): 2020-11.

[2] Ma S, Zhang B, LaFave LM, Earl AS, Chiang Z, Hu Y, Ding J, Brack A, Kartha VK, Tay T, Law T, Lareau C, Hsu YC, Regev A, Buenrostro JD. Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin. Cell. 2020 Nov 12;183(4):1103-1116.e20. doi: 10.1016/j.cell.2020.09.056. Epub 2020 Oct 23. PMID: 33098772; PMCID: PMC7669735.

附录

.csv/.xls/.txt ：结果数据表格文件，文件以逗号或制表符（Tab）分隔。Linux/Mac 用户使用 less 或 more 命令查看；Windows 用户使用高级文本编辑器 Notepad++ 等查看，也可以用 Microsoft Excel 打开。
.png：结果图像文件，位图，无损压缩。
.pdf：结果图像文件，矢量图，可以放大和缩小而不失真，方便用户查看和编辑处理，可使用 Adobe Illustrator 进行图片编辑，用于文章发表等。

scATAC-seq 高级分析：Peak-Gene 调控关联分析 ​

背景介绍 ​

核心原理 ​

参数设置 ​

环境准备 ​

开始细胞群间 Peak-to-Gene 关联分析 ​

分析流程概述 ​

结果可视化 ​

结果文件 ​

文献案例解析 ​

参考资料 ​

附录 ​