{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f2344013",
   "metadata": {},
   "source": [
    "---\n",
    "title: scATAC-seq 差异可及性与功能富集分析教程\n",
    "author: SeekGene\n",
    "date: 2026-01-29\n",
    "tags:\n",
    "  - ATAC + RNA 双组学\n",
    "  - Notebooks\n",
    "  - 分析指南\n",
    "  - 差异富集分析\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aca702d2-45f2-420e-83d1-ac0361d75f6a",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# scATAC-seq 差异可及性与功能富集分析教程"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73ac05c8-e2c1-47f7-80f3-c7e3477c383f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     ".custom-table"
    ]
   },
   "source": [
    "## 背景介绍\n",
    "\n",
    "ATAC_DiffEnrich 是面向单细胞多组学数据中 scATAC-seq 的差异可及性与富集分析流程，旨在从不同细胞群或条件之间系统地鉴定差异可及染色质区域（DARs），解析其潜在调控因子（TF）及受影响的生物学过程/信号通路。该流程基于 Signac/Seurat 的 ATAC 工作流，结合 presto 的快速差异检验、JASPAR/TFBSTools 的 motif 资源、chromVAR 的 motif 活性评估，以及 clusterProfiler 的 GO/KEGG 富集分析，提供从“差异峰 → 调控因子 → 功能通路”的一体化解读。\n",
    "\n",
    "### 核心原理\n",
    "- 差异可及性检测（DARs）\n",
    "  - 在细胞群（或分组）间采用 Wilcoxon 秩和检验（presto::wilcoxauc）评估每个峰的可及性差异，并进行多重检验校正，获得显著 DAR 集合。\n",
    "- 峰-基因连接（Peak-to-Gene）\n",
    "  - 以 ClosestFeature 将显著差异峰关联至邻近基因（如 TSS 附近），用于后续功能富集与通路解释。\n",
    "- 转录因子推断（Motif/ChromVAR）\n",
    "  - 基于 JASPAR motif 集对差异峰做富集分析（FindMotifs），并用 chromVAR 量化全局 motif 偏好与细胞层面的 motif 活性变化。\n",
    "- 功能与通路富集（GO/KEGG）\n",
    "  - 将峰邻近基因映射到人/鼠注释库，使用 clusterProfiler 进行 GO/KEGG 富集，刻画受影响的生物过程与信号通路。\n",
    "\n",
    "### 目的与意义\n",
    "- 细胞类型/状态特异调控图谱：识别在特定细胞群中开放的调控元件与核心 TF。\n",
    "- 机制层解析：从差异峰—motif—chromVAR 活性—GO/KEGG 多层证据支持调控网络假设。\n",
    "- 条件对比与干预线索：在病例-对照或处理-未处理分组中定位被扰动的调控通路与候选靶点。\n",
    "- 结果可解释性提升：将无基因注释的峰信号转译为可读的基因与通路层结论。\n",
    "\n",
    "### 本教程涵盖内容\n",
    "- 差异 Peaks 分析：按细胞群或分组比较，输出显著 DAR 表格与可视化。\n",
    "- Top peaks 展示：对 LogFC 靠前的代表性峰绘制 CoveragePlot。\n",
    "- Motif 富集与活性：差异峰的 motif 富集、motif 序列图与 chromVAR UMAP 展示。\n",
    "- 峰邻近基因：输出显著差异峰附近基因表。\n",
    "- GO/KEGG 富集：对峰邻近基因进行富集，生成柱状图/气泡图与汇总表。\n",
    "- 交互式结果浏览：显著 DAR、motif、富集结果支持筛选与导出。\n",
    "\n",
    "### 预期分析结果\n",
    "- 显著差异 Peaks 列表：按细胞群汇总的显著 DAR 及其统计指标。\n",
    "- 代表性信号图：Top 差异峰的覆盖度图，用于直观评估峰级别差异。\n",
    "- Motif 与 TF 线索：富集的 TF motif、对应序列图与细胞层 motif 活性变化。\n",
    "- 功能通路概览：GO/KEGG 富集结果与可视化，指向受影响的关键生物学过程。\n",
    "- 一站式输出目录：包含差异表、峰-基因表、motif/富集图与综合汇总文件，便于下游报告与复用。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf538254",
   "metadata": {},
   "source": [
    "## 参数设置\n",
    "\n",
    "请先挂载将要分析的云平台相关流程数据，挂载方式请参照 jupyter 使用教程\n",
    "\n",
    "具体参数填写：\n",
    "\n",
    "- rds：挂载的流程相关的 rds 数据，在/home/{UserName}/workspace/project/{UserName}/目录下，如下列项目数据/home/shumeng/workspace/data/AY1752167131383/\n",
    "- meta：与 rds 同目录下的 metadata 文件  \n",
    "- outdir: 分析结果保存路径\n",
    "- clusters_col：细胞类型列名，选择要分析的细胞类型或者聚类对应的标签。假如想对注释好的细胞类型进行分析，则选择其对应的标签，例如 CellAnnotation，与<细胞类型>配合使用。\n",
    "- celltypes：细胞类型，多选，选择要分析的细胞类型或者聚类结果，如 Monocyte 和 Macrophage 等。\n",
    "- species：物种信息\n",
    "- group_comparison：分组比较，选择要进行比较分析的组别名，如要对 Ctrl 组和 STIM 组的基因活力进行比较，则选择 Ctrl 组和 STIM 组对应的标签名，与下面的<case_group：>和<control_group：>搭配适用\n",
    "- case_group：比较组，如上述的 STIM 组。\n",
    "- control_group：对照组，如上述的 Ctrl 组。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d294917",
   "metadata": {
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# Parameters\n",
    "outdir = \"/home/shumeng/workspace/project/shumeng/\"\n",
    "rds = \"/home/shumeng/workspace/data/AY1752167131383/input.rds\"\n",
    "meta = \"/home/shumeng/workspace/data/AY1752167131383/meta.tsv\"\n",
    "species = \"mouse\"\n",
    "clusters_col = \"wknn_res.0.5_d30_l2_50\"\n",
    "celltypes = \"0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18\"\n",
    "group_comparison = \"\"\n",
    "case_group = \"\"\n",
    "control_group = \"\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d59777c6",
   "metadata": {},
   "source": [
    "## 环境准备\n",
    "\n",
    "**R 包加载**\n",
    "\n",
    "请选择 common_r 这个环境进行该整合教程的学习"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2dd37f51-7539-4df9-8968-7051a2961983",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "suppressPackageStartupMessages(suppressWarnings({\n",
    "    library(Signac)\n",
    "    library(Seurat)\n",
    "    library(JASPAR2020)\n",
    "    library(TFBSTools)\n",
    "    library(BSgenome.Hsapiens.UCSC.hg38)\n",
    "    library(BSgenome.Mmusculus.UCSC.mm10)\n",
    "    library(patchwork)\n",
    "    library(ggplot2)\n",
    "    library(presto)\n",
    "    library(clusterProfiler)\n",
    "    library(stringr)\n",
    "    library(tibble)\n",
    "    library(dplyr)\n",
    "    library(DOSE)\n",
    "    library(org.Hs.eg.db)\n",
    "    library(org.Mm.eg.db)\n",
    "    library(enrichplot)\n",
    "    library(foreach)\n",
    "    library(doParallel)\n",
    "    library(base64enc) \n",
    "    library(DT) \n",
    "    library(KEGG.db)\n",
    "}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bda7628a-008b-47ee-90b0-6a77f0fa46fe",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "#创建必要的输出目录\n",
    "output <- paste0(outdir,'/output')\n",
    "celltypes <- strsplit(celltypes,\",\")[[1]]\n",
    "# pct <- as.numeric(pct)\n",
    "# logfc <- as.numeric(logfc)\n",
    "# pval_adj <- as.numeric(pval_adj)\n",
    "# top_n <- as.numeric(top_n)\n",
    "\n",
    "cl <- makeCluster(min(detectCores(), 25))\n",
    "registerDoParallel(cl)\n",
    "\n",
    "options(future.globals.maxSize = 10000 * 1024^2)\n",
    "\n",
    "dir.create(paste0(output,'/DIFF/01_diffPeak'), recursive = TRUE)\n",
    "dir.create(paste0(output,'/DIFF/02_topPeak'), recursive = TRUE)\n",
    "dir.create(paste0(output,'/DIFF/03_motif'), recursive = TRUE)\n",
    "dir.create(paste0(output,'/DIFF/04_closestFeature'), recursive = TRUE)\n",
    "dir.create(paste0(output,'/clusterProfiler/GO'), recursive = TRUE)\n",
    "dir.create(paste0(output,'/clusterProfiler/KEGG'), recursive = TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52cede64-3a7d-4c2c-be04-b87b43ae7898",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "#数据读取\n",
    "obj <- readRDS(rds)\n",
    "if (meta != \"\"){\n",
    "  meta <- read.table(meta,header=T,sep='\\t',check.names=F)\n",
    "  rownames(meta) <- meta$barcodes\n",
    "  obj <- AddMetaData(obj, meta)\n",
    "}\n",
    "obj <- subset(obj, subset = !!sym(clusters_col) == celltypes)\n",
    "\n",
    "n_fragments <- length(obj@assays$ATAC@fragments)\n",
    "for (i in 1:n_fragments) {\n",
    "  original_path <- obj@assays$ATAC@fragments[[i]]@path\n",
    "  filename <- basename(original_path)\n",
    "  obj@assays$ATAC@fragments[[i]]@path <- paste0(dirname(dirname(outdir)), '/', filename)\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bed23eaa-c80e-4f31-aaf5-98c92b5359af",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "DefaultAssay(obj) <- \"ATAC\"\n",
    "Idents(obj) <- clusters_col"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ed1be3c",
   "metadata": {},
   "source": [
    "## Motif 资源与 chromVAR 初始化\n",
    "\n",
    "Motif 分析用于从差异可及性区域（DARs）推断潜在调控因子（TF）及其活性变化。本流程在显著差异峰上进行 motif 富集（FindMotifs），并结合 chromVAR 计算的 motif 偏离分数（deviation）进行细胞层面的活性可视化，从而在“峰级差异 → TF 线索 → 细胞活性”三层面形成证据闭环。\n",
    "\n",
    "\n",
    "按物种加载 JASPAR2020 的 PFM 矩阵，将 motif 映射到峰集，并使用对应参考基因组运行 chromVAR；同时设置后续富集与注释所需的物种与数据库标识。\n",
    "\n",
    "- 功能概述\n",
    "  - 获取人/鼠的 JASPAR PFM 集：`getMatrixSet(JASPAR2020, opts = list(species = 9606/10090))`\n",
    "  - 将 motif 写入对象：`AddMotifs(object = obj, genome = BSgenome.* , pfm = pfm)`\n",
    "  - 计算 motif 偏离分数：`RunChromVAR(object = obj, genome = BSgenome.*)`\n",
    "  - 统一下游物种标识：\n",
    "    - human → `species = \"hsa\"`, `db = \"org.Hs.eg.db\"`, `spe = \"human\"`, 基因组 `hg38`\n",
    "    - mouse → `species = \"mmu\"`, `db = \"org.Mm.eg.db\"`, `spe = \"mouse\"`, 基因组 `mm10`\n",
    "\n",
    "- 使用要点\n",
    "  - 参考基因组需与物种一致（human→`BSgenome.Hsapiens.UCSC.hg38`，mouse→`BSgenome.Mmusculus.UCSC.mm10`）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32647ed3-7b55-4857-ad85-b1dc905d458f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "if (species == \"human\"){\n",
    "    pfm <- getMatrixSet(\n",
    "    x = JASPAR2020,\n",
    "    opts = list(species = 9606, all_versions = FALSE)\n",
    "    )\n",
    "    obj <- AddMotifs(\n",
    "    object = obj,\n",
    "    genome = BSgenome.Hsapiens.UCSC.hg38,\n",
    "    # genome = genome,\n",
    "    pfm = pfm\n",
    "    )\n",
    "    obj <- RunChromVAR(\n",
    "    object = obj,\n",
    "    genome = BSgenome.Hsapiens.UCSC.hg38\n",
    "    # genome = genome,\n",
    "    )\n",
    "    species = \"hsa\"\n",
    "    db = 'org.Hs.eg.db'\n",
    "    spe = \"human\"\n",
    "}else if (species == \"mouse\"){\n",
    "    pfm <- getMatrixSet(\n",
    "    x = JASPAR2020,\n",
    "    opts = list(species = 10090, all_versions = FALSE)\n",
    "    )\n",
    "    obj <- AddMotifs(\n",
    "    object = obj,\n",
    "    genome = BSgenome.Mmusculus.UCSC.mm10,\n",
    "    pfm = pfm\n",
    "    )\n",
    "    obj <- RunChromVAR(\n",
    "    object = obj,\n",
    "    genome = BSgenome.Mmusculus.UCSC.mm10\n",
    "    )\n",
    "\n",
    "    species = \"mmu\"\n",
    "    db = 'org.Mm.eg.db'\n",
    "    spe = \"mouse\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35f271f8-2f87-4532-ab89-97ed5261a7d1",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## 细胞群间差异 Peaks 分析\n",
    "\n",
    "### 差异 peaks 分析\n",
    "\n",
    "为在不同细胞群（或分组条件）间识别差异可及性区域（DARs），本流程采用 Wilcoxon 秩和检验进行差异可及性（DA）评估，并使用 Benjamini–Hochberg（BH）进行多重检验校正。`presto::wilcoxauc` 面向 Seurat 对象的高效实现，可在大规模 scATAC 数据上显著提速且保持统计稳健性。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b966779f-f954-45ff-a803-3c4a87139ead",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "if (group_comparison == \"\") {\n",
    "    da_peaks <- wilcoxauc(obj, clusters_col, assay='data', seurat_assay='ATAC')\n",
    "    colnames(da_peaks)[2] <- 'cluster'\n",
    "    # write.table(da_peaks, \n",
    "    #            file=paste0(output,'/DIFF/all_diffPeak.xls'),\n",
    "    #            quote=F, row.names=F, col.names=T, sep='\\t')\n",
    "} else {\n",
    "    all_da_peaks <- list()\n",
    "    obj <- subset(obj, subset = !!sym(group_comparison) == c(case_group, control_group))\n",
    "    \n",
    "    for (cell_type in unique(obj@meta.data[[clusters_col]])) {\n",
    "        tryCatch({\n",
    "            sub_obj <- subset(obj, subset = !!sym(clusters_col) == cell_type)\n",
    "            da_peaks <- wilcoxauc(sub_obj, \n",
    "                                        group_by = group_comparison, \n",
    "                                        assay = 'data',\n",
    "                                        seurat_assay = 'ATAC',\n",
    "                                        groups_use = c(case_group, control_group))\n",
    "            da_peaks$cluster <- as.character(cell_type)\n",
    "            all_da_peaks[[as.character(cell_type)]] <- da_peaks\n",
    "            # write.table(da_peaks,\n",
    "            #            file = paste0(output, '/DIFF/by_celltype/', cell_type, '_diffPeak.xls'),\n",
    "            #            quote = F, row.names = F, col.names = T, sep = '\\t')\n",
    "        }, error = function(e) {\n",
    "        message(sprintf(\"Error in cluster %s: %s\", cell_type, e$message))\n",
    "        })\n",
    "    }\n",
    "\n",
    "    da_peaks <- do.call(rbind, all_da_peaks)\n",
    "\n",
    "    # write.table(da_peaks,\n",
    "    #            file = paste0(output, '/DIFF/all_diffPeak.xls'),\n",
    "    #            quote = F, row.names = F, col.names = T, sep = '\\t')\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b57d3c1c",
   "metadata": {},
   "source": [
    "### 差异 peaks 结果保存与可视化\n",
    "\n",
    "下面基于差异可及性结果，对每个细胞群（cluster）进行显著峰筛选、峰-基因映射、motif 富集与活性展示，以及代表性 Top peaks 的覆盖度图绘制，直观呈现峰级差异与潜在转录调控线索。\n",
    "\n",
    "- 核心流程\n",
    "  - 集群排序：使用通用排序函数 `sort_clusters`，优先按数值顺序排列（兼容混合字符编号）。\n",
    "  - 显著峰筛选：对每个 cluster 选取 `padj < 0.05` 的差异 peaks（`feature` 列），并导出结果表：\n",
    "    - `output/DIFF/01_diffPeak/c_cluster_diffPeak_significant.xls`\n",
    "  - 峰-基因映射：`ClosestFeature(obj, regions = peaks)` 将显著 peaks 映射至邻近基因：\n",
    "    - `output/DIFF/04_closestFeature/c_cluster_diffPeak_sig_genes.xls`\n",
    "  \n",
    "- Motif 富集与展示：\n",
    "    - `FindMotifs(object = obj, features = peaks)` 输出富集表（含 p.adjust 与 fold.enrichment）：\n",
    "      - `output/DIFF/03_motif/c_cluster_diffPeak_motifs.xls`\n",
    "    - 依据 `fold.enrichment` 取 Top 6 motif，`MotifPlot` 绘制序列图：\n",
    "      - `..._diffPeak_motifs_top.pdf/png`\n",
    "  \n",
    "- Top peaks 覆盖度图（CoveragePlot）：\n",
    "    - 按 `logFC` 降序选取 Top 6 差异 peaks，窗口上下游各扩展 5 kb：\n",
    "      - 无分组对比：整体绘制；\n",
    "      - 有分组（`group_comparison` 非空）：按 `case_group` vs `control_group` 分面绘制；\n",
    "    - 输出：`output/DIFF/02_topPeak/c_cluster_diffPeak_top.pdf/png`\n",
    "  - chromVAR 活性 UMAP：\n",
    "    - 切换至 `chromvar` assay，使用 `FeaturePlot` 展示 Top motifs 的偏离分数（ncol=6/3）：\n",
    "      - `output/DIFF/03_motif/c_cluster_diffPeak_motifs_top_umap.pdf/png`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6797d2cb-e104-4221-8237-ca651a330c91",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "cluster <- unique(da_peaks$cluster)\n",
    "# 通用排序函数\n",
    "sort_clusters <- function(x) {\n",
    "  if(all(!is.na(suppressWarnings(as.numeric(x))))) {\n",
    "    return(x[order(as.numeric(x))])\n",
    "  } else {\n",
    "    nums <- suppressWarnings(as.numeric(gsub(\"[^0-9]\", \"\", x)))\n",
    "    if(all(!is.na(nums))) {\n",
    "      return(x[order(nums)])\n",
    "    } else {\n",
    "      return(sort(x))\n",
    "    }\n",
    "  }\n",
    "}\n",
    "\n",
    "cluster <- sort_clusters(cluster)\n",
    "\n",
    "for (i in cluster) {\n",
    "    tryCatch({\n",
    "        DefaultAssay(obj) <- \"ATAC\"\n",
    "        data <- da_peaks[da_peaks$cluster == i, ]\n",
    "        # write.table(data,paste0(output,'/DIFF/c_',gsub(\" \",\"\",i),'_diffPeak.xls'),quote=F,row.names=F,col.names=T,sep='\\t')\n",
    "        # data.sig <- data[data$p_val_adj < 0.05, ]\n",
    "        data.sig <- data[data$padj < 0.05, ]\n",
    "        write.table(data.sig,paste0(output,'/DIFF/01_diffPeak/c_',gsub(\" \",\"\",i),'_diffPeak_significant.xls'),quote=F,row.names=F,col.names=T,sep='\\t')\n",
    "        # peaks <- rownames(data.sig)\n",
    "        peaks <- data.sig$feature\n",
    "        \n",
    "        genes <- ClosestFeature(obj, regions = peaks)\n",
    "        genes$cluster <- as.character(i)\n",
    "        write.table(genes,paste0(output,'/DIFF/04_closestFeature/c_',gsub(\" \",\"\",i),'_diffPeak_sig_genes.xls'),quote=F,row.names=F,col.names=T,sep='\\t')\n",
    "    \n",
    "        enriched.motifs <- FindMotifs(object = obj, features = peaks)\n",
    "        enriched.motifs$cluster <- as.character(i)\n",
    "        write.table(enriched.motifs,paste0(output,'/DIFF/03_motif/c_',gsub(\" \",\"\",i),'_diffPeak_motifs.xls'),quote=F,row.names=F,col.names=T,sep='\\t')\n",
    "    \n",
    "        motif.top <- head(enriched.motifs[order(enriched.motifs$fold.enrichment, decreasing = TRUE), ])\n",
    "        p1 <- MotifPlot(object = obj, motifs = rownames(motif.top), nrow = 1)\n",
    "        p11 <- MotifPlot(object = obj, motifs = rownames(motif.top), nrow = 2)\n",
    "    \n",
    "        ggsave(paste0(output,'/DIFF/03_motif/c_',gsub(\" \",\"\",i),'_diffPeak_motifs_top.pdf'), p11, width = 12, height = 6)\n",
    "        # ggsave(paste0(output,'/DIFF/c_',gsub(\" \",\"\",i),'_diffPeak_motifs_top.png'), p1, width = 8, height = 4, dpi = 100, bg = \"white\")\n",
    "        ggsave(paste0(output,'/DIFF/03_motif/c_',gsub(\" \",\"\",i),'_diffPeak_motifs_top.png'), p1, width = 24, height = 3, dpi = 100, bg = \"white\")\n",
    "    \n",
    "        # peaks.top <- rownames(head(data.sig[order(data.sig$avg_log2FC, decreasing = TRUE), ]))\n",
    "        peaks.top <- head(data.sig[order(data.sig$logFC, decreasing = TRUE), ])[['feature']]\n",
    "        if (group_comparison == \"\") {\n",
    "          p2 <- CoveragePlot(object = obj, region = peaks.top, extend.upstream = 5000, extend.downstream = 5000, nrow = 1)\n",
    "          # p2 <- CoveragePlot(object = obj, region = peaks.top, extend.upstream = 5000, extend.downstream = 5000, nrow = 3)\n",
    "          p22 <- CoveragePlot(object = obj, region = peaks.top, extend.upstream = 5000, extend.downstream = 5000, nrow = 2)\n",
    "        } else {\n",
    "          p2 <- CoveragePlot(object = obj, region = peaks.top, extend.upstream = 5000, extend.downstream = 5000, nrow = 1, group.by = group_comparison, idents = c(case_group, control_group))\n",
    "          p22 <- CoveragePlot(object = obj, region = peaks.top, extend.upstream = 5000, extend.downstream = 5000, nrow = 2, group.by = group_comparison, idents = c(case_group, control_group))\n",
    "        }\n",
    "        ggsave(paste0(output,'/DIFF/02_topPeak/c_',gsub(\" \",\"\",i),'_diffPeak_top.pdf'), p22, width = 12, height = 6)\n",
    "        # ggsave(paste0(output,'/DIFF/c_',gsub(\" \",\"\",i),'_diffPeak_top.png'), p2, width = 8, height = 4, dpi = 100, bg = \"white\")\n",
    "        ggsave(paste0(output,'/DIFF/02_topPeak/c_',gsub(\" \",\"\",i),'_diffPeak_top.png'), p2, width = 24, height = 3, dpi = 100, bg = \"white\")\n",
    "        \n",
    "        DefaultAssay(obj) <- 'chromvar'\n",
    "        p3 <- FeaturePlot(object = obj, features = rownames(motif.top), ncol = 6)\n",
    "        # p3 <- FeaturePlot(object = obj, features = rownames(motif.top), ncol = 2)\n",
    "        p33 <- FeaturePlot(object = obj, features = rownames(motif.top), ncol = 3)\n",
    "        \n",
    "        ggsave(paste0(output,'/DIFF/03_motif/c_',gsub(\" \",\"\",i),'_diffPeak_motifs_top_umap.pdf'), p33, width = 12, height = 6)\n",
    "        # ggsave(paste0(output,'/DIFF/c_',gsub(\" \",\"\",i),'_diffPeak_motifs_top.png'), p1, width = 8, height = 4, dpi = 100, bg = \"white\")\n",
    "        ggsave(paste0(output,'/DIFF/03_motif/c_',gsub(\" \",\"\",i),'_diffPeak_motifs_top_umap.png'), p3, width = 24, height = 3, dpi = 100, bg = \"white\")\n",
    "    }, error = function(e) {\n",
    "        message(sprintf(\"Error in cluster %s: %s\", i, e$message))\n",
    "    })\n",
    "} \n",
    "saveRDS(obj,paste0(output, '/output.rds'))\n",
    "write.table(da_peaks[da_peaks$padj < 0.05, ],paste0(output,'/DIFF/01_diffPeak/all_diffPeak_significant.xls'),quote=F,row.names=F,col.names=T,sep='\\t')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99511b5c-0d22-4aa9-8272-54f8392aa209",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "#### 细胞群间显著差异 Peaks 列表\n",
    "\n",
    "计算得到的各细胞群间校正 p 值<0.05 的差异 Peaks 表格。\n",
    "\n",
    "**指标解读（差异 Peaks 表）**\n",
    "- feature: 峰位点坐标，格式为 chr:start-end，用于唯一标识每个 ATAC 峰。\n",
    "- cluster: 目标细胞群（或标签）名称，表示该行统计是以此群体为“in-group”与其余细胞为“out-group”进行比较。\n",
    "- avgExpr: 目标细胞群内该峰的平均可及性信号（来源于 ATAC data 矩阵的归一化值，如 TF-IDF/归一化计数），数值越大表示该群体整体更开放。\n",
    "- logFC: 目标群体相对其余细胞的对数倍数变化，>0 表示在该 cluster 中更开放，数值越大差异越明显。\n",
    "- statistic: Wilcoxon 秩和检验的统计量（秩和），用于计算 p 值。\n",
    "- auc: AUC（Area Under Curve）衡量两组可分性，0.5 表示无差异；通常 0.6≈弱、0.7≈中等、≥0.8≈较强分离。\n",
    "- pval: 未校正 p 值，由 Wilcoxon 检验得到。\n",
    "- padj: 多重检验校正后的 p 值（Benjamini–Hochberg），用于显著性判断（常用阈值 < 0.05）。\n",
    "- pct_in: 目标细胞群中该峰为非零/可及的细胞占比（%），反映在群体内的“普遍性”。\n",
    "- pct_out: 其余细胞中该峰为非零/可及的细胞占比（%），用于与 pct_in 对比评估特异性。\n",
    ""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44e1088f-2578-47a0-aa0e-4c401abe2c9a",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE)  \n",
    "\n",
    "# da_peaks <- da_peaks %>% mutate(across(where(is.numeric), ~round(., 2)))\n",
    "\n",
    "datatable(da_peaks[da_peaks$padj < 0.05, ], rownames = FALSE, filter = 'top', options = list(pageLength = 5, dom = 'Blfrtip', buttons = c('csv', 'excel', 'pdf')), extensions = 'Buttons') %>%\n",
    "    formatSignif(  \n",
    "        columns = names(da_peaks)[sapply(da_peaks, is.numeric)],\n",
    "        digit = 3, zero.print = NA\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ecfcec2-74ad-4c18-b042-befefdb7ef30",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "#### top peaks 展示\n",
    "\n",
    "使用 `CoveragePlot` 对显著差异峰（`padj < 0.05`）中按 `logFC` 降序的 Top 6 进行可视化。每个面板展示该峰附近 ±5 kb 的归一化覆盖度曲线（Normalized signal），下方同步显示基因结构与预测 peaks，便于联读。\n",
    "\n",
    "- 图片解读\n",
    "  - 上半部分彩色轨迹代表不同 cluster（或细胞类型）的覆盖度，轨迹越高表示该区域越开放。\n",
    "  - 横轴为基因组坐标；下方蓝色箭头为基因坐标（如 Ppp1r16a、Cpt、Mfsd3），灰色条为 peaks 位置。\n",
    "  - 同一峰在多条轨迹中的高度差异，揭示其跨 cluster 的开放度差异；目标组/cluster 曲线整体更高提示更强的染色质开放与潜在调控活性。\n",
    "\n",
    ""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acae7ba8-2797-45ae-a0a4-b2dfc23ea695",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 2. Peaks图表\n",
    "\n",
    "peaks_links <- data.frame(\n",
    "    Cluster = cluster,\n",
    "    Preview = sapply(cluster, function(i) {\n",
    "        file_path_ab <- paste0(output,'/DIFF/02_topPeak/c_',gsub(\" \",\"\",i),'_diffPeak_top.png')\n",
    "        file_path <- paste0('../output/DIFF/02_topPeak/c_', gsub(\" \",\"\", i), '_diffPeak_top.png')\n",
    "        if(file.exists(file_path_ab)) {\n",
    "            sprintf('<a href=\"%s\"><img src=\"%s\" height=\"200px\"></a>', file_path, file_path)\n",
    "        } else {\n",
    "            \"图片不存在\"\n",
    "        }\n",
    "    })\n",
    ")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "59b4f24d-2b28-430a-a085-3cef67d4ddbb",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE) \n",
    "\n",
    "# Peaks表格\n",
    "datatable(peaks_links, filter = 'top', \n",
    "          escape = FALSE,\n",
    "          options = list(\n",
    "              pageLength = 1,\n",
    "              dom = 'tp'\n",
    "          ),\n",
    "          caption = \"Peak Coverage Plots\",\n",
    "          rownames = FALSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2faea2ba-6896-41d2-8519-26d3b2a99c8d",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "#### Top peaks 的 Motif 展示\n",
    "\n",
    "前面已经对每个 `cluster` 的显著差异峰集合（`padj < 0.05`）上通过执行 `FindMotifs`，评估 JASPAR Motif 的富集，识别可能驱动该峰集开放的转录因子（TF），下面分别用并对 Top Motif 进行可视化与活性展示。\n",
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8394f998",
   "metadata": {},
   "source": [
    "##### Motif 列表\n",
    "\n",
    "**差异 peak 富集 Motif 列表指标解读**\n",
    "- motif：Motif 的唯一标识符（如 JASPAR ID）。\n",
    "- observed：在差异 peaks 区域中实际检测到该 motif 的次数。\n",
    "- background：在背景区域（非差异 peaks）中检测到该 motif 的次数。\n",
    "- percent.observed：该 motif 在差异 peaks 区域中的出现比例（%）。\n",
    "- percent.background：该 motif 在背景区域中的出现比例（%）。\n",
    "- fold.enrichment：富集倍数，表示 motif 在差异 peaks 区域的富集程度（percent.observed / percent.background），数值越大富集越显著。\n",
    "- pvalue：富集检验的原始 p 值，衡量 motif 富集的统计显著性。\n",
    "- motif.name：Motif 对应的转录因子名称或注释。\n",
    "- p.adjust：多重检验校正后的 p 值（如 FDR），常用阈值 < 0.05 判定显著富集。\n",
    "- cluster：对应的细胞群/cluster，表明该 motif 富集于哪个 cluster 的差异 peaks。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a95c31d3-e533-4888-a0ea-607cbb8ad8db",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "data_dir <- paste0(output,\"/DIFF/03_motif\")\n",
    "xls_files <- list.files(path = data_dir, \n",
    "                       pattern = \"motifs\\\\.xls$\", \n",
    "                       full.names = TRUE)\n",
    "\n",
    "merged_data <- data.frame()\n",
    "for (file in xls_files) {\n",
    "    df <- read.table(file, sep=\"\\t\", header=TRUE)\n",
    "    df$cluster <- as.character(df$cluster)\n",
    "    merged_data <- bind_rows(merged_data, df)\n",
    "}\n",
    "merged_data$cluster <- as.character(merged_data$cluster)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ab43ebb-2110-44fd-87d5-61024e7db9e3",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE)  \n",
    "\n",
    "# merged_data_motif <- merged_data %>% mutate(across(where(is.numeric), ~round(., 2)))\n",
    "merged_data_motif <- merged_data \n",
    "datatable(merged_data_motif[merged_data_motif$p.adjust < 0.05, ], rownames = FALSE, filter = 'top', options = list(pageLength = 5, dom = 'Blfrtip', buttons = c('csv', 'excel', 'pdf')), extensions = 'Buttons') %>%\n",
    "    formatSignif(  \n",
    "        columns = names(merged_data_motif)[sapply(merged_data_motif, is.numeric)],\n",
    "        digit = 3, zero.print = NA\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b54c463",
   "metadata": {},
   "source": [
    "##### Motif 可视化（chromVAR UMAP）\n",
    "\n",
    "- 按 `cluster` 展示 Top Motif 的序列 logo ，直观反映其碱基组成和保守性。\n",
    "- 按 `cluster` 展示 Top Motif 的细胞层活性分布预览。每个小图为一个 Motif 的 chromVAR 偏离分数（deviation z-score）在 UMAP 空间的投影。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8f61aed-96e3-4c41-a070-16d60f604d9b",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 1. Motifs图表\n",
    "\n",
    "motifs_links1 <- data.frame(\n",
    "    Cluster = cluster,\n",
    "    Preview = sapply(cluster, function(i) {\n",
    "        file_path_ab <- paste0(output,'/DIFF/03_motif/c_',gsub(\" \",\"\",i),'_diffPeak_motifs_top.png')\n",
    "        file_path <- paste0('../output/DIFF/03_motif/c_', gsub(\" \",\"\", i), '_diffPeak_motifs_top.png')\n",
    "        if(file.exists(file_path_ab)) {\n",
    "            sprintf('<a href=\"%s\"><img src=\"%s\" height=\"200px\"></a>', file_path, file_path)\n",
    "        } else {\n",
    "            \"图片不存在\"\n",
    "        }\n",
    "    })\n",
    ")\n",
    "\n",
    "motifs_links2 <- data.frame(\n",
    "    Cluster = cluster,\n",
    "    Preview = sapply(cluster, function(i) {\n",
    "        file_path_ab <- paste0(output,'/DIFF/03_motif/c_',gsub(\" \",\"\",i),'_diffPeak_motifs_top_umap.png')\n",
    "        file_path <- paste0('../output/DIFF/03_motif/c_', gsub(\" \",\"\", i), '_diffPeak_motifs_top_umap.png')\n",
    "        if(file.exists(file_path_ab)) {\n",
    "            sprintf('<a href=\"%s\"><img src=\"%s\" height=\"200px\"></a>', file_path, file_path)\n",
    "        } else {\n",
    "            \"图片不存在\"\n",
    "        }\n",
    "    })\n",
    ")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8a3304d-928e-4748-ae50-8ce909db4672",
   "metadata": {
    "editable": true,
    "scrolled": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE)  \n",
    "\n",
    "# Motifs表格\n",
    "datatable(motifs_links1, filter = 'top', \n",
    "          escape = FALSE,\n",
    "          options = list(\n",
    "              pageLength = 1,\n",
    "              dom = 'tp'\n",
    "          ),\n",
    "          caption = \"Motifs Top Plots\",\n",
    "          rownames = FALSE)\n",
    "\n",
    "datatable(motifs_links2, filter = 'top', \n",
    "          escape = FALSE,\n",
    "          options = list(\n",
    "              pageLength = 1,\n",
    "              dom = 'tp'\n",
    "          ),\n",
    "          rownames = FALSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76202099-787e-4734-8d4e-cb70820256d3",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "#### 差异 peaks 附近的基因\n",
    "\n",
    "单独的峰位点坐标难以直接解释其功能。我们使用 `ClosestFeature` 将显著差异 peaks（`padj < 0.05`）映射到最近的基因，从而推断可能受这些调控元件影响的候选基因，并为后续 GO/KEGG 富集分析提供输入。\n",
    "\n",
    "**指标解读（ClosestFeature 结果表）**\n",
    "- tx_id: 最近转录本的 Ensembl 转录本 ID。\n",
    "- gene_name: 最近基因的基因符号（SYMBOL）。\n",
    "- gene_id: 最近基因的 Ensembl 基因 ID。\n",
    "- gene_biotype: 基因类型（如 protein_coding、lincRNA 等）。\n",
    "- type: 距离最近的功能注释类型（如 exon、UTR、intron、promoter 等）。\n",
    "- closest_region: 最近功能注释片段的基因组坐标。\n",
    "- query_region: 输入的差异 peak 坐标。\n",
    "- distance: 两者之间的距离（bp）。0 表示重叠；数值越小，调控关联可能性越高；常将 ≤2000 bp 视为启动子近端候选。\n",
    "- cluster: 该差异 peak 所属的细胞群标签。\n",
    "\n",
    ""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "607b2005-cca3-4bd7-8207-de6c5f00e576",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "data_dir <- paste0(output,\"/DIFF/04_closestFeature\")\n",
    "xls_files <- list.files(path = data_dir, \n",
    "                       pattern = \"diffPeak_sig_genes\\\\.xls$\", \n",
    "                       full.names = TRUE)\n",
    "\n",
    "merged_data <- data.frame()\n",
    "for (file in xls_files) {\n",
    "    df <- read.table(file, sep=\"\\t\", header=TRUE)\n",
    "    df$cluster <- as.character(df$cluster)\n",
    "    merged_data <- bind_rows(merged_data, df)\n",
    "}\n",
    "merged_data$cluster <- as.character(merged_data$cluster)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0c34bd9-8d22-438e-b5e9-da77c527d66e",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE) \n",
    "\n",
    "# merged_data_genes <- merged_data %>% mutate(across(where(is.numeric), ~round(., 2)))\n",
    "merged_data_genes <- merged_data \n",
    "datatable(merged_data_genes, rownames = FALSE, filter = 'top', options = list(pageLength = 5, dom = 'Blfrtip', buttons = c('csv', 'excel', 'pdf')), extensions = 'Buttons') %>%\n",
    "    formatSignif(  \n",
    "        columns = names(merged_data_genes)[sapply(merged_data_genes, is.numeric)],\n",
    "        digit = 3, zero.print = NA\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "faa11976-5277-4c06-a514-074f11f07045",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "#### 差异 peaks 附近基因的功能富集\n",
    "\n",
    "将 `ClosestFeature` 得到的邻近基因作为输入，使用 `clusterProfiler` 对每个 `cluster` 分别进行 GO 与 KEGG 富集，系统刻画差异调控区域可能影响的生物学过程与信号通路。\n",
    "\n",
    "- **分析流程**\n",
    "  - 基因准备：按 `cluster` 提取 `gene_name`，用 `bitr` 映射到 `ENSEMBL/ENTREZID`。\n",
    "  \n",
    "- GO 富集：`enrichGO(ont = \"ALL\", pAdjustMethod = \"BH\")`，输出显著条目与可视化。\n",
    "  \n",
    "- KEGG 富集：`enrichKEGG(organism = hsa/mmu)`，并可视化。\n",
    "  \n",
    ""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "286ed79d-a88e-4f36-bee2-6aedbb58e3df",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "go_enrich <- function(eg,db,outdir,prefix) {\n",
    "    genelist <- eg$SYMBOL\n",
    "    go <- enrichGO(genelist, OrgDb=db, ont='ALL',pAdjustMethod = 'BH',qvalueCutoff = 1,pvalueCutoff = 1,keyType = 'SYMBOL')\n",
    "    go1 <- data.frame(cluster=prefix, go)\n",
    "    write.table(go1,file=paste(outdir,'/',prefix,'_GOenrich.xls',sep=''),sep='\\t',quote=F,row.names=F)\n",
    "\n",
    "    pdf(paste0(outdir,'/',prefix,'_GO_bar.pdf',sep=''),width = 10, height = 9)\n",
    "    print(barplot(go,showCategory=20,drop=T)+ scale_y_discrete(labels = function(x) str_wrap(x, width = 50)))\n",
    "    dev.off()\n",
    "    pdf(paste0(outdir,'/',prefix,'_GO_dot.pdf',sep=''),width = 10, height = 8)\n",
    "    print(dotplot(go,showCategory=20)+ scale_y_discrete(labels = function(x) str_wrap(x, width = 50)))\n",
    "    dev.off()\n",
    "    png(paste0(outdir,'/',prefix,'_GO_bar.png',sep=''),width = 780, height = 580)\n",
    "    print(barplot(go,showCategory=20,drop=T)+ scale_y_discrete(labels = function(x) str_wrap(x, width = 50)))\n",
    "    dev.off()\n",
    "    png(paste0(outdir,'/',prefix,'_GO_dot.png',sep=''),width = 780, height = 580)\n",
    "    print(dotplot(go,showCategory=20)+ scale_y_discrete(labels = function(x) str_wrap(x, width = 50)))\n",
    "    dev.off()\n",
    "}\n",
    "\n",
    "\n",
    "kegg_enrich <- function(eg,species,outdir,prefix) {\n",
    "    genenames<-eg$SYMBOL\n",
    "    names(genenames)<-eg$ENTREZID\n",
    "    genelist <- eg$ENTREZID\n",
    "    download.KEGG.Path <- function (species) {\n",
    "    keggpathid2extid.df <- clusterProfiler:::kegg_link(species, \"pathway\")\n",
    "    if (is.null(keggpathid2extid.df)) {\n",
    "      message <- paste(\"Failed to download KEGG data.\", \"Wrong 'species' or the network is unreachable.\", \n",
    "                     \"The 'species' should be one of organisms listed in\", \n",
    "                     \"'https://www.genome.jp/kegg/catalog/org_list.html'\")\n",
    "      stop(message)\n",
    "    }\n",
    "    keggpathid2extid.df[, 1] %<>% gsub(\"[^:]+:\", \"\", .)\n",
    "    keggpathid2extid.df[, 2] %<>% gsub(\"[^:]+:\", \"\", .)\n",
    "    keggpathid2name.df <- clusterProfiler:::kegg_list(\"pathway\")\n",
    "    keggpathid2name.df[, 1] %<>% gsub(\"map\", species, .)\n",
    "    keggpathid2extid.df <- keggpathid2extid.df[keggpathid2extid.df[, 1] %in% keggpathid2name.df[, 1], ]\n",
    "    return(list(KEGGPATHID2EXTID = keggpathid2extid.df, KEGGPATHID2NAME = keggpathid2name.df))\n",
    "  }\n",
    "    \n",
    "    assignInNamespace('download.KEGG.Path',download.KEGG.Path,ns = 'clusterProfiler')\n",
    "    head(genelist)\n",
    "    kegg <- enrichKEGG(genelist, organism = species, keyType = 'kegg', pAdjustMethod = 'BH',pvalueCutoff = 1,qvalueCutoff = 1,use_internal_data = TRUE)\n",
    "    gene_list <- strsplit(kegg$geneID,split='/')\n",
    "    geneName<-unlist(lapply(gene_list,FUN=function(x){paste(genenames[x],collapse='/')}))\n",
    "   \n",
    "    KEGGenrich <- data.frame(cluster=prefix, kegg[,1:8],geneName,kegg[,9,drop=F])\n",
    "    write.table(KEGGenrich,file=paste(outdir,'/',prefix,'_KEGGenrich.xls',sep=''),sep='\\t',quote=F,row.names=F)\n",
    "    pdf(paste0(outdir,'/',prefix,'_KEGG_dot.pdf',sep=''),width = 8, height = 8)\n",
    "    print(dotplot(kegg, showCategory=20)+ scale_y_discrete(labels = function(x) str_wrap(x, width = 50)))\n",
    "    dev.off()\n",
    "    pdf(paste0(outdir,'/',prefix,'_KEGG_bar.pdf',sep=''),width = 8, height = 8)\n",
    "    print(barplot(kegg, showCategory=20)+ scale_y_discrete(labels = function(x) str_wrap(x, width = 50)))\n",
    "    dev.off()\n",
    "    png(paste0(outdir,'/',prefix,'_KEGG_dot.png',sep=''),width = 780, height = 580)\n",
    "    print(dotplot(kegg, showCategory=20)+ scale_y_discrete(labels = function(x) str_wrap(x, width = 50)))\n",
    "    dev.off()\n",
    "    png(paste0(outdir,'/',prefix,'_KEGG_bar.png',sep=''),width = 780, height = 580)\n",
    "    print(barplot(kegg, showCategory=20)+ scale_y_discrete(labels = function(x) str_wrap(x, width = 50)))\n",
    "    dev.off()\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4fd99252-8c2c-4fc5-a4a7-385c5e3dace5",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "files <- list.files(paste0(output,\"/DIFF/04_closestFeature\"),pattern=\"diffPeak_sig_genes.xls\")\n",
    "\n",
    "# cores <- detectCores() - 1  \n",
    "# cl <- makeCluster(cores)\n",
    "# registerDoParallel(cl)\n",
    "\n",
    "# foreach(f=files,.packages = c(\"clusterProfiler\", \"reshape2\", \"ggplot2\", \"stringr\", \"tibble\", \"dplyr\", \"DOSE\", \"enrichplot\")) %dopar% {\n",
    "for (f in files){\n",
    "  prefix <- gsub('_diffPeak_sig_genes.xls','',f)\n",
    "  f <- paste0(outdir,\"/output/DIFF/04_closestFeature/\",f)\n",
    "  print(f)\n",
    "  print(prefix)\n",
    "  clus_1<-read.table(f,header=T,sep='\\t')\n",
    "  clus<-clus_1$gene_name\n",
    "\n",
    "  tryCatch({\n",
    "    eg <- bitr(clus, fromType=\"SYMBOL\", toType=c(\"ENSEMBL\",\"ENTREZID\"), OrgDb=db)\n",
    "    print(head(eg))\n",
    "  }, error = function(e) {\n",
    "    message(paste(\"处理文件\", f, \"时出错:\", e$message))\n",
    "  })\n",
    "  \n",
    "  tryCatch({\n",
    "    go_enrich(eg,db,paste0(outdir,\"/output/clusterProfiler/GO\"),prefix)\n",
    "  }, error = function(e) {\n",
    "    message(paste0(\"GO富集分析错误: \", prefix, \" - \", e$message))\n",
    "  })\n",
    "  tryCatch({\n",
    "    kegg_enrich(eg, species, paste0(outdir,\"/output/clusterProfiler/KEGG\"), prefix)\n",
    "  }, error = function(e) {\n",
    "    message(paste0(\"KEGG富集分析错误: \", prefix, \" - \", e$message))\n",
    "  })\n",
    "\n",
    "}\n",
    "# stopCluster(cl)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c19a16d-b081-4c92-b607-7dd01c17db10",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 合并 GO 富集分析结果\n",
    "go_dir <- paste0(output,\"/clusterProfiler/GO\")\n",
    "if(dir.exists(go_dir)) {  # 检查目录是否存在\n",
    "    xls_files <- list.files(path = go_dir, \n",
    "                           pattern = \"\\\\.xls$\", \n",
    "                           full.names = TRUE)\n",
    "    \n",
    "    merged_data <- data.frame()\n",
    "    for (file in xls_files) {\n",
    "        df <- read.table(file, sep=\"\\t\", header=TRUE, fill=TRUE, quote=\"\", check.names=FALSE)\n",
    "        merged_data <- bind_rows(merged_data, df)\n",
    "    }\n",
    "    \n",
    "    # 直接使用完整路径写入文件\n",
    "    write.table(merged_data, \n",
    "                file = file.path(go_dir, \"all_GOenrich.xls\"), \n",
    "                row.names = FALSE,\n",
    "                sep=\"\\t\",\n",
    "                quote=FALSE)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc55cde4",
   "metadata": {},
   "source": [
    "**GO 富集结果**\n",
    "\n",
    "表格汇总各 `cluster` 的 GO 富集条目与显著性统计，用于刻画差异 peaks 邻近基因的功能指向。\n",
    "\n",
    "- 指标解读\n",
    "  - **cluster**：结果所属的细胞群标签。\n",
    "  - **ONTOLOGY**：GO 域，BP（生物过程）/ CC（细胞组分）/ MF（分子功能）。\n",
    "  - **ID**：GO 术语编号（如 `GO:0007264`）。\n",
    "  - **Description**：GO 术语名称。\n",
    "  - **GeneRatio**：命中该术语的输入基因数/输入基因总数（越大说明在本次输入中命中比例越高）。\n",
    "  - **BgRatio**：该术语在背景基因集中的命中数/背景总数（用于与 GeneRatio 对比评估富集强度）。\n",
    "  - **pvalue**：富集检验原始 p 值。\n",
    "  - **p.adjust**：多重检验校正后的 p 值（BH/FDR），常用阈值 < 0.05 判定显著。\n",
    "  - **qvalue**：q 值估计（FDR 另一种表示），可用阈值 < 0.2 辅助筛选。\n",
    "  - **geneID**：命中该术语的基因列表，常用 “/” 分隔。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3b49a7a-7faa-4629-b0b8-421dbda97675",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE) \n",
    "\n",
    "# merged_data_go <- merged_data %>% mutate(across(where(is.numeric), ~round(., 2)))\n",
    "merged_data_go <- merged_data \n",
    "if(nrow(merged_data_go[merged_data_go$p.adjust < 0.05, ]) > 0) {\n",
    "datatable(merged_data_go[merged_data_go$p.adjust < 0.05, ], rownames = FALSE, filter = 'top', caption = \"GO Enrichment\",\n",
    "          options = list(pageLength = 5, dom = 'Blfrtip', buttons = c('csv', 'excel', 'pdf')), \n",
    "          extensions = 'Buttons') %>%\n",
    "    formatSignif(  \n",
    "        columns = names(merged_data_go)[sapply(merged_data_go, is.numeric)],\n",
    "        digit = 3, zero.print = NA\n",
    "    )\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0d0e06b",
   "metadata": {},
   "source": [
    "**GO 富集结果可视化**\n",
    "\n",
    "Dot（左图）\n",
    "横轴 GeneRatio：命中该术语的输入基因数/输入总数，越大代表相对更集中。\n",
    "纵轴 GO 术语（BP/CC/MF）。\n",
    "圆点大小 Count：命中基因的绝对数量，越大越多。\n",
    "颜色 p.adjust：颜色越偏红表示校正后显著性越高（p.adjust 越小）。\n",
    "解读：优先关注“红且大、靠右”的点（显著性高、命中多、比例大）。\n",
    "Bar（右图）\n",
    "横轴 Count：命中基因数；条形越长命中越多。\n",
    "纵轴 GO 术语。\n",
    "颜色 p.adjust：颜色越红表示显著性越高。\n",
    "解读：从上到下对比条形长度与颜色，锁定“长且红”的条目。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2126d99-48c7-4fa4-955e-8203606db3fd",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 3. 富集图表\n",
    "\n",
    "go_links <- data.frame(\n",
    "    Cluster = cluster,\n",
    "    GO_dot = sapply(cluster, function(i) {\n",
    "        file_path_ab <- paste0(output,'/clusterProfiler/GO/c_', gsub(\" \",\"\", i), '_GO_dot.png')\n",
    "        file_path <- paste0('../output/clusterProfiler/GO/c_', gsub(\" \",\"\", i), '_GO_dot.png')\n",
    "        if(file.exists(file_path_ab)) {\n",
    "            sprintf('<a href=\"%s\"><img src=\"%s\" height=\"200px\"></a>', file_path, file_path)\n",
    "        } else {\n",
    "            \"图片不存在\"\n",
    "        }\n",
    "    }),\n",
    "    GO_bar = sapply(cluster, function(i) {\n",
    "        file_path_ab <- paste0(output,'/clusterProfiler/GO/c_', gsub(\" \",\"\", i), '_GO_bar.png')\n",
    "        file_path <- paste0('../output/clusterProfiler/GO/c_', gsub(\" \",\"\", i), '_GO_bar.png')\n",
    "        if(file.exists(file_path_ab)) {\n",
    "            sprintf('<a href=\"%s\"><img src=\"%s\" height=\"200px\"></a>', file_path, file_path)\n",
    "        } else {\n",
    "            \"图片不存在\"\n",
    "        }\n",
    "    })\n",
    ")\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d7505b7-e1e7-47ff-938b-d1242d7b5063",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE)  \n",
    "\n",
    "# Enrich表格\n",
    "datatable(go_links, filter = 'top', \n",
    "          escape = FALSE,\n",
    "          options = list(\n",
    "              pageLength = 1,\n",
    "              dom = 'tp'\n",
    "          ),\n",
    "          caption = \"Enrich Plots\",\n",
    "          rownames = FALSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "981486fc-9a43-4fa0-847d-8a84ff27ee43",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "```{note}\n",
    "- 柱状图：x 轴为注释到 GO 编号上的基因数，y 轴为 GO term 描述，颜色代表校正 p 值，颜色越红 p 值越小，颜色越蓝 p 值越大。\n",
    "- 气泡图：x 轴为 GeneRatio，y 轴为 GO term 描述，颜色代表校正 p 值，颜色越红 p 值越小，颜色越蓝 p 值越大，点的大小表示富集在通路内的差异基因个数。\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27c8e6bc-18a4-4436-b91d-81da031934a4",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 合并 KEGG 富集分析结果\n",
    "kegg_dir <- paste0(output,\"/clusterProfiler/KEGG\")\n",
    "if(dir.exists(kegg_dir)) {  # 检查目录是否存在\n",
    "    xls_files <- list.files(path = kegg_dir, \n",
    "                           pattern = \"\\\\.xls$\", \n",
    "                           full.names = TRUE)\n",
    "    \n",
    "    merged_data <- data.frame()\n",
    "    for (file in xls_files) {\n",
    "        df <- read.table(file, sep=\"\\t\", header=TRUE, fill=TRUE, quote=\"\", check.names=FALSE)\n",
    "        merged_data <- bind_rows(merged_data, df)\n",
    "    }\n",
    "    \n",
    "    # 直接使用完整路径写入文件\n",
    "    write.table(merged_data, \n",
    "                file = file.path(kegg_dir, \"all_KEGGenrich.xls\"), \n",
    "                row.names = FALSE,\n",
    "                sep=\"\\t\",\n",
    "                quote=FALSE)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b0aeceb",
   "metadata": {},
   "source": [
    "**KEGG 富集结果**\n",
    "\n",
    "汇总各 `cluster` 的 KEGG 通路富集条目，用于刻画差异 peaks 邻近基因可能参与的信号通路与代谢路径（基于 `enrichKEGG`，物种代码 human→`hsa`，mouse→`mmu`）。\n",
    "\n",
    "- 指标解读\n",
    "  - **cluster**：所属细胞群标签。\n",
    "  - **category**：KEGG 一级分类（如 Metabolism、Cellular Processes）。\n",
    "  - **subcategory**：KEGG 二级分类（如 Carbohydrate metabolism、Signal transduction）。\n",
    "  - **ID**：通路编号（如 `mmu04015`）。\n",
    "  - **Description**：通路名称。\n",
    "  - **GeneRatio**：输入基因中命中该通路的比例（命中数/输入总数），越大说明在当前输入中更集中。\n",
    "  - **BgRatio**：背景基因集中命中该通路的比例（命中数/背景总数），用于与 GeneRatio 对比评估富集强度。\n",
    "  - **pvalue**：富集检验原始 p 值。\n",
    "  - **p.adjust**：多重检验校正后的 p 值（FDR/BH），常用阈值 < 0.05 判定显著。\n",
    "  - **geneID/geneName**：命中该通路的基因列表，用 “/” 分隔。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d86e0850-3a01-4f8a-9c02-fbb143e45e29",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE) \n",
    "\n",
    "# merged_data_kegg <- merged_data %>% mutate(across(where(is.numeric), ~round(., 2)))\n",
    "merged_data_kegg <- merged_data \n",
    "if(nrow(merged_data_kegg[merged_data_kegg$p.adjust < 0.05, ]) > 0) {\n",
    "datatable(merged_data_kegg[merged_data_kegg$p.adjust < 0.05, ], rownames = FALSE, filter = 'top', caption = \"KEGG Enrichment\",\n",
    "          options = list(pageLength = 5, dom = 'Blfrtip', buttons = c('csv', 'excel', 'pdf')), \n",
    "          extensions = 'Buttons') %>%\n",
    "    formatSignif(  \n",
    "        columns = names(merged_data_kegg)[sapply(merged_data_kegg, is.numeric)],\n",
    "        digit = 3, zero.print = NA\n",
    "    )\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91dc0306",
   "metadata": {},
   "source": [
    "##### KEGG 富集结果可视化\n",
    "\n",
    "- Dot（左图）\n",
    "  - 横轴 GeneRatio：命中该通路的输入基因数/输入总数，越大越集中。\n",
    "  - 纵轴 通路名称（含物种代码，如 mmu04015）。\n",
    "  - 点大小 Count：命中基因数，越大命中越多。\n",
    "  - 颜色 p.adjust：颜色越偏红表示显著性越高（校正后 p 值更小）。\n",
    "  - 解读：优先关注“红且大、靠右”的点（显著、命中多、比例高）。\n",
    "\n",
    "- Bar（右图）\n",
    "  - 横轴 Count：命中基因数；条形越长表示命中越多。\n",
    "  - 纵轴 通路名称。\n",
    "  - 颜色 p.adjust：越红显著性越高。\n",
    "  - 解读：从上到下比较条形长度与颜色，锁定“长且红”的通路。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc088050-e8d2-4c42-86b7-75f2a02327cd",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-cell"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 3. 富集图表\n",
    "\n",
    "kegg_links <- data.frame(\n",
    "    Cluster = cluster,\n",
    "    KEGG_dot = sapply(cluster, function(i) {\n",
    "        file_path_ab <- paste0(output,'/clusterProfiler/KEGG/c_', gsub(\" \",\"\", i), '_KEGG_dot.png')\n",
    "        file_path <- paste0('../output/clusterProfiler/KEGG/c_', gsub(\" \",\"\", i), '_KEGG_dot.png')\n",
    "        if(file.exists(file_path_ab)) {\n",
    "            sprintf('<a href=\"%s\"><img src=\"%s\" height=\"200px\"></a>', file_path, file_path)\n",
    "        } else {\n",
    "            \"图片不存在\"\n",
    "        }\n",
    "    }),\n",
    "    KEGG_bar = sapply(cluster, function(i) {\n",
    "        file_path_ab <- paste0(output,'/clusterProfiler/KEGG/c_', gsub(\" \",\"\", i), '_KEGG_bar.png')\n",
    "        file_path <- paste0('../output/clusterProfiler/KEGG/c_', gsub(\" \",\"\", i), '_KEGG_bar.png')\n",
    "        if(file.exists(file_path_ab)) {\n",
    "            sprintf('<a href=\"%s\"><img src=\"%s\" height=\"200px\"></a>', file_path, file_path)\n",
    "        } else {\n",
    "            \"图片不存在\"\n",
    "        }\n",
    "    })\n",
    ")\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97acf376-1e37-4fb3-bc0e-ca22255ea12f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "remove-input"
    ],
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "options(warn = -1)          \n",
    "options(message = FALSE)  \n",
    "\n",
    "# Enrich表格\n",
    "datatable(kegg_links, filter = 'top', \n",
    "          escape = FALSE,\n",
    "          options = list(\n",
    "              pageLength = 1,\n",
    "              dom = 'tp'\n",
    "          ),\n",
    "          caption = \"Enrich Plots\",\n",
    "          rownames = FALSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5e0b172-24b0-4688-b009-39c5fd16dee8",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "```{note}\n",
    "- 柱状图：x 轴为注释到 KEGG 通路上的基因数，y 轴为 KEGG 通路描述，颜色代表校正 p 值，颜色越红 p 值越小，颜色越蓝 p 值越大。\n",
    "- 气泡图：x 轴为 GeneRatio，y 轴为 KEGG 通路描述，颜色代表校正 p 值，颜色越红 p 值越小，颜色越蓝 p 值越大，点的大小表示富集在通路内的差异基因个数。\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60694d7d-d5d1-49b5-946a-e0571778f28a",
   "metadata": {
    "editable": true,
    "jp-MarkdownHeadingCollapsed": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## 结果文件\n",
    "\n",
    "- DIFF/*diffPeak.xls 差异 Peaks 总表\n",
    "- DIFF/*diffPeak_significant.xls 显著差异 Peaks 表格\n",
    "- DIFF/*diffPeak_top.pdf/png top6 的显著差异 Peaks 染色质可及性信号图\n",
    "- DIFF/*diffPeak_motifs.xls 显著差异 Peaks 的 motif 表格\n",
    "- DIFF/*diffPeak_motifs_top.pdf/png top6 的显著差异 Peaks 的 motif 图\n",
    "- DIFF/*diffPeak_sig_genes.xls 显著差异 peaks 附近的基因\n",
    "  \n",
    "- clusterProfiler/*/all_*enrich.xls 富集结果总表\n",
    "- clusterProfiler/*/*enrich.xls 各细胞群富集结果\n",
    "- clusterProfiler/*/*bar.pdf/png 各细胞群富集柱状图\n",
    "- clusterProfiler/*/*dot.pdf/png 各细胞群富集气泡图\n",
    "\n",
    "\n",
    "结果目录：目录下包括此项分析所涉及的所有图片以及表格等结果文件\n",
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a536d97-1ca8-4d46-bf92-75526e415226",
   "metadata": {},
   "source": [
    "## 文献案例解析\n",
    "\n",
    "* **文献一：**  \n",
    "文献《Single-cell multiomics analysis reveals regulatory programs in clear cell renal cell carcinoma》进行差异可及染色质区域（DAR）相关分析，发现肿瘤细胞的 DAR 在代谢相关的生物过程中显着富集。\n",
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e622a5c-6196-4f95-bb82-f298661eb656",
   "metadata": {},
   "source": [
    "## 参考资料\n",
    "[1] Stuart, Tim, Avi Srivastava, Caleb Lareau, and Rahul Satija. \"Multimodal single-cell chromatin analysis with Signac.\" BioRxiv (2020): 2020-11.\n",
    "\n",
    "[2] Korsunsky I, A. Nathan N Millard, and S. Raychaudhuri. \"presto: Fast Functions for Differential Expression using Wilcox and AUC.\" R package. https://rdrr.io/github/immunogenomics/presto (2019).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "059455e6-b826-4582-b83d-aa42d6a4594c",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## 附录\n",
    "\n",
    "- .csv/.xls/.txt ：结果数据表格文件，文件以逗号或制表符（Tab）分隔。Linux/Mac 用户使用 less 或 more 命令查看；Windows 用户使用高级文本编辑器 Notepad++ 等查看，也可以用 Microsoft Excel 打开。\n",
    "\n",
    "- .png：结果图像文件，位图，无损压缩。\n",
    "\n",
    "- .pdf：结果图像文件，矢量图，可以放大和缩小而不失真，方便用户查看和编辑处理，可使用 Adobe Illustrator 进行图片编辑，用于文章发表等。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ATAC",
   "language": "R",
   "name": "atac"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}