{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "title: 单细胞差异表达与功能富集分析全流程教程\n",
    "author: SeekGene\n",
    "date: 2026-01-29\n",
    "tags:\n",
    "  - 3' 转录组\n",
    "  - 5' + 免疫组库\n",
    "  - ATAC + RNA 双组学\n",
    "  - FFPE 单细胞转录组\n",
    "  - Notebooks\n",
    "  - 全序列转录组\n",
    "  - 分析指南\n",
    "  - 差异富集分析\n",
    "  - 甲基化 + RNA 双组学\n",
    "  - 空间转录组\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 单细胞差异表达与功能富集分析全流程教程"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 差异表达与富集分析介绍\n",
    "本教程提供了单细胞 RNA 测序数据差异表达分析和富集分析的综合指南。分析包括：\n",
    "\n",
    "- **差异表达分析**：识别不同细胞簇或实验条件之间差异表达的基因\n",
    "- **富集分析**：通过通路和功能分析了解差异表达基因的生物学意义"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 内核配置\n",
    "\n",
    "**重要提示**：本教程应使用 **\"common_r\"** 内核运行。\n",
    "\n",
    "所需包：\n",
    "- Seurat (>= 4.0)\n",
    "- clusterProfiler, msigdbr, org.Hs.eg.db, org.Mm.eg.db\n",
    "- dplyr, tidyr, readr\n",
    "- enrichplot, ggplot2, ggrepel, RColorBrewer, viridis, ComplexHeatmap and patchwork"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 加载库和参数配置\n",
    "\n",
    "### 加载库\n",
    "\n",
    "加载所有必需的库。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:40:57.424130Z",
     "iopub.status.busy": "2025-09-16T09:40:57.392582Z",
     "iopub.status.idle": "2025-09-16T09:42:01.645339Z",
     "shell.execute_reply": "2025-09-16T09:42:01.643279Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] \"Libraries loaded successfully!\"\n"
     ]
    }
   ],
   "source": [
    "# 加载所需的库\n",
    "suppressMessages({\n",
    "    library(Seurat)\n",
    "    library(dplyr)\n",
    "    library(tidyr)\n",
    "    library(ggplot2)\n",
    "    library(patchwork)\n",
    "    library(clusterProfiler)\n",
    "    library(org.Hs.eg.db)\n",
    "    library(org.Mm.eg.db)\n",
    "    library(enrichplot)\n",
    "    library(msigdbr)\n",
    "    library(viridis)\n",
    "    library(RColorBrewer)\n",
    "    library(ComplexHeatmap)\n",
    "    library(readr)\n",
    "    library(ggrepel)\n",
    "})\n",
    "\n",
    "# 设置随机种子以保证可重复性\n",
    "set.seed(42)\n",
    "\n",
    "print(\"Libraries loaded successfully!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 配置分析参数\n",
    "\n",
    "设置分析参数，包括文件路径和可视化设置。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:42:01.651464Z",
     "iopub.status.busy": "2025-09-16T09:42:01.648998Z",
     "iopub.status.idle": "2025-09-16T09:42:01.855069Z",
     "shell.execute_reply": "2025-09-16T09:42:01.853028Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Analysis configuration:\n",
      "- Output directory: /home/demo-seekgene-com/workspace/project/demo-seekgene-com/DiffEnrich/result \n",
      "- group comparison S133TvsS133A within clusters: 0, 5, 6\n",
      "\n",
      "- Enrichment analysis type: ORA"
     ]
    }
   ],
   "source": [
    "# 分析参数\n",
    "params <- list(\n",
    "  # 数据路径\n",
    "  input_path = \"/home/demo-seekgene-com/workspace/project/demo-seekgene-com/CellAnnotation/result/Step4_Final_Cell_Annotation.rds\",\n",
    "  output_path = \"/home/demo-seekgene-com/workspace/project/demo-seekgene-com/DiffEnrich/result\",\n",
    "  \n",
    "  # 物种信息\n",
    "  species = \"human\",  # Options: \"human\", \"mouse\"\n",
    "  \n",
    "  # 可视化参数\n",
    "  discrete_colors = \"Paired\",  # 用于离散变量\n",
    "  continuous_colors = viridis(100),  # 用于连续变量\n",
    "  plot_dpi = 300,   # 绘图分辨率\n",
    "  \n",
    "  # 差异表达参数\n",
    "  differential_analysis_type = \"group_comparison\",  # 选项：\"clusters\" 或 \"group_comparison\"\n",
    "  cluster_by = \"RNA_snn_res.0.5\",  # 聚类或细胞类型的列名（例如 resolution.0.5_d30, celltype）\n",
    "  select_clusters = c(\"0\", \"5\", \"6\"),  # 要分析的 \"cluster_by\" 的可选子集；设为 NULL 以包含所有簇\n",
    "  group_by = \"sample\",  # 分组的列名（例如 sample, condition）\n",
    "  compare_groups = c(\"S133TvsS133A\"),  # 要分析的 \"group_by\" 的可选子集，成对比较规范（例如 \"TreatedvsControl\",\"DiseasevsHealth\"）；设为 NULL 以自动生成所有对\n",
    "  de_method = \"wilcox\",  # 选项：\"wilcox\", \"bimod\", \"roc\", \"t\", \"negbinom\", \"poisson\", \"LR\", \"MAST\"\n",
    "  min_pct = 0.1,  # 表达该基因的细胞的最小百分比\n",
    "  logfc_threshold = 0.5,  # Seurat 寻找标记物时使用的对数倍数变化阈值\n",
    "  p_adj_cutoff = 0.05,          # 保留 DEG 表中行的校正后 P 值截止值\n",
    "  abs_log2fc_cutoff = 0.5,       # 保留 DEG 表中行的绝对 log2FC 截止值\n",
    "  \n",
    "  # 富集分析参数\n",
    "  enrichment_analysis_type = \"ORA\",  # 选项：\"ORA\" 或 \"GSEA\"\n",
    "  enrichment_databases = \"GO,KEGG,HALLMARK,REACTOME\",  # 逗号分隔\n",
    "  pvalue_cutoff = 0.05,  # 富集分析的 P 值截止值\n",
    "  qvalue_cutoff = 0.2,   # 富集分析的 Q 值截止值\n",
    "  minGSSize = 10,        # 富集分析中的最小基因集大小\n",
    "  maxGSSize = 500       # 富集分析中的最大基因集大小\n",
    ")\n",
    "\n",
    "# 如果输出目录不存在，则创建它\n",
    "if (!dir.exists(params$output_path)) {\n",
    "  dir.create(params$output_path, recursive = TRUE)\n",
    "}\n",
    "\n",
    "cat(\"Analysis configuration:\")\n",
    "cat(\"\\n- Output directory:\", params$output_path, \"\\n\")\n",
    "if (params$differential_analysis_type == \"clusters\") {\n",
    "  if (!is.null(params$select_clusters)) {\n",
    "    cat(\"- differential analysis among clusters: \",\n",
    "        paste(params$select_clusters, collapse = \", \"), \"\\n\", sep = \"\")\n",
    "  } else {\n",
    "    cat(\"- pairwise differential analysis across all clusters\\n\")\n",
    "  }\n",
    "} else if (params$differential_analysis_type == \"group_comparison\") {\n",
    "  comp_str <- if (is.null(params$compare_groups)) \"auto-generate all pairwise groups\" else paste(params$compare_groups, collapse = \", \")\n",
    "  cl_str <- if (is.null(params$select_clusters)) \"all clusters\" else paste(params$select_clusters, collapse = \", \")\n",
    "  cat(\"- group comparison \", comp_str, \" within clusters: \", cl_str, \"\\n\", sep = \"\")\n",
    "}\n",
    "cat(\"\\n- Differential analysis type:\", params$differential_analysis_type)\n",
    "cat(\"\\n- Enrichment analysis type:\", params$enrichment_analysis_type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 加载和准备数据\n",
    "\n",
    "加载单细胞数据并为差异表达分析做好准备。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:42:01.860381Z",
     "iopub.status.busy": "2025-09-16T09:42:01.858735Z",
     "iopub.status.idle": "2025-09-16T09:42:13.846963Z",
     "shell.execute_reply": "2025-09-16T09:42:13.844687Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading data from: /home/demo-seekgene-com/workspace/project/demo-seekgene-com/CellAnnotation/result/Step4_Final_Cell_Annotation.rds \n",
      "\n",
      "Data Summary:\n",
      "- Number of cells: 16924\n",
      "- Number of genes: 24091\n",
      "- Available metadata columns: orig.ident, nCount_RNA, nFeature_RNA, sample, species, percent.mitochondrial, filter_status, pANN_0.05_0.3_1372, DF.classifications_0.05_0.3_1372, RNA_snn_res.0.5, seurat_clusters, SingleR_HumanPrimaryCellAtlas, SingleR_score_HumanPrimaryCellAtlas, SingleR_BlueprintEncode, SingleR_score_BlueprintEncode, Lymphoid_B_cells_Score, Lymphoid_Plasma_cells_Score, Lymphoid_T_cells_Score, Lymphoid_NK_cells_Score, Myeloid_Monocytes_Score, Myeloid_Macrophages_Score, Myeloid_DCs_Score, Myeloid_pDCs_Score, Myeloid_Granulocytes_Score, Myeloid_Neutrophils_Score, Myeloid_Mast_cells_Score, Myeloid_Megakaryocytes_Score, Myeloid_Erythrocytes_Score, Stromal_Epithelial_cells_Score, Stromal_Endothelial_cells_Score, Stromal_Fibroblasts_Score, Stromal_Proliferating_cells_Score, celltype\n",
      "- Available clusters: c(1, 2, 7, 10, 5, 3, 6, 11, 8, 4, 16, 15, 9, 13, 12, 17, 14, 18)\n",
      "- Available groups: S133A, S133T\n",
      "The scale.data slot exists.\n",
      "Filtering clusters: 0, 5, 6\n",
      "Cells after filtering: 6395\n",
      "Cluster distribution after filtering:\n",
      "\n",
      "   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 \n",
      "4645    0    0    0    0  892  858    0    0    0    0    0    0    0    0    0 \n",
      "  16   17 \n",
      "   0    0 \n",
      "\n",
      "Data loaded successfully!"
     ]
    }
   ],
   "source": [
    "# 加载 Seurat 对象\n",
    "cat(\"Loading data from:\", params$input_path, \"\\n\")\n",
    "seurat_obj <- readRDS(params$input_path)\n",
    "\n",
    "# 基本数据信息\n",
    "cat(\"\\nData Summary:\")\n",
    "cat(\"\\n- Number of cells:\", ncol(seurat_obj))\n",
    "cat(\"\\n- Number of genes:\", nrow(seurat_obj))\n",
    "cat(\"\\n- Available metadata columns:\", paste(colnames(seurat_obj@meta.data), collapse = \", \"))\n",
    "\n",
    "# 检查可用簇\n",
    "if (params$cluster_by %in% colnames(seurat_obj@meta.data)) {\n",
    "  cat(\"\\n- Available clusters:\", paste(unique(seurat_obj[[params$cluster_by]]), collapse = \", \"))\n",
    "}\n",
    "\n",
    "# 检查可用分组\n",
    "if (params$group_by %in% colnames(seurat_obj@meta.data)) {\n",
    "  cat(\"\\n- Available groups:\", paste(unique(seurat_obj@meta.data[[params$group_by]]), collapse = \", \"))\n",
    "}\n",
    "\n",
    "# 如果尚未设置，则将默认测定设置为 RNA\n",
    "DefaultAssay(seurat_obj) <- \"RNA\"\n",
    "\n",
    "# 设置簇颜色\n",
    "n_clusters <- length(table(seurat_obj[[params$cluster_by]]))\n",
    "clusters_colors <- brewer.pal(min(n_clusters, 12), params$discrete_colors)\n",
    "if (n_clusters > 12) {\n",
    "    clusters_colors <- colorRampPalette(clusters_colors)(n_clusters)\n",
    "}\n",
    "\n",
    "# 缩放数据\n",
    "if (dim(seurat_obj@assays$RNA@scale.data)[1] > 1) {\n",
    "  cat(\"\\nThe scale.data slot exists.\")\n",
    "} else {\n",
    "  cat(\"\\nScaling data...\")\n",
    "  seurat_obj <- ScaleData(seurat_obj)\n",
    "}\n",
    "\n",
    "# 如果指定了 select_clusters，先对数据进行子集化\n",
    "if (!is.null(params$select_clusters)) {\n",
    "  cat(\"\\nFiltering clusters:\", paste(params$select_clusters, collapse = \", \"))\n",
    "  \n",
    "  # 确保 select_clusters 为字符型以匹配元数据类型\n",
    "  select_clusters <- as.character(params$select_clusters)\n",
    "  \n",
    "  # 验证请求的簇是否存在\n",
    "  available_clusters <- unique(seurat_obj@meta.data[[params$cluster_by]])\n",
    "  missing_clusters <- setdiff(select_clusters, available_clusters)\n",
    "  if (length(missing_clusters) > 0) {\n",
    "    warning(\"The following clusters do not exist and will be ignored: \", paste(missing_clusters, collapse = \", \"))\n",
    "  }\n",
    "  \n",
    "  # 仅保留有效簇\n",
    "  valid_clusters <- intersect(select_clusters, available_clusters)\n",
    "  if (length(valid_clusters) == 0) {\n",
    "    stop(\"No valid clusters found for differential analysis\")\n",
    "  }\n",
    "  \n",
    "  # 将数据子集化为选定的簇\n",
    "  seurat_obj <- subset(seurat_obj, \n",
    "                          cells = rownames(seurat_obj@meta.data)[\n",
    "                            as.character(seurat_obj@meta.data[[params$cluster_by]]) %in% valid_clusters\n",
    "                          ])\n",
    "  \n",
    "  cat(\"\\nCells after filtering:\", ncol(seurat_obj))\n",
    "  cat(\"\\nCluster distribution after filtering:\\n\")\n",
    "  print(table(seurat_obj@meta.data[[params$cluster_by]]))\n",
    "}\n",
    "\n",
    "cat(\"\\nData loaded successfully!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 差异表达与富集分析\n",
    "\n",
    "### 差异表达分析与可视化\n",
    "\n",
    "**簇间比较**：比较并可视化细胞簇之间的差异<br>\n",
    "**簇内分组比较**：比较并可视化特定簇内不同实验组之间的差异"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "定义差异表达分析和可视化的函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:42:13.852745Z",
     "iopub.status.busy": "2025-09-16T09:42:13.850811Z",
     "iopub.status.idle": "2025-09-16T09:42:13.889665Z",
     "shell.execute_reply": "2025-09-16T09:42:13.887320Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 函数 1：簇差异表达分析\n",
    "perform_clusters_de <- function(seurat_obj, params) {\n",
    "  cat(\"Performing differential expression analysis across all clusters...\\n\")  \n",
    "\n",
    "  # 查找所有簇的标记物\n",
    "  all_markers <- FindAllMarkers(\n",
    "    seurat_obj,\n",
    "    only.pos = TRUE,\n",
    "    min.pct = params$min_pct,\n",
    "    logfc.threshold = params$logfc_threshold,\n",
    "    test.use = params$de_method\n",
    "  )\n",
    "  \n",
    "  # 保存结果\n",
    "  base_dir <- file.path(params$output_path, \"cluster_DE\")\n",
    "  if (!dir.exists(base_dir)) dir.create(base_dir, recursive = TRUE)\n",
    "  output_file <- file.path(base_dir, \"clusters_markers_all.csv\")\n",
    "  write.csv(all_markers, output_file, row.names = FALSE)\n",
    "  cat(\"Results saved to:\", output_file, \"\\n\")\n",
    "  \n",
    "  return(all_markers)\n",
    "}\n",
    "\n",
    "# 函数 2：解析比较对\n",
    "parse_compare_groups <- function(params, available_groups) {\n",
    "  parsed <- list()\n",
    "  \n",
    "  for (comp in params$compare_groups) {\n",
    "    # 确保字符串包含 \"vs\"\n",
    "    if (!grepl(\"vs\", comp)) {\n",
    "      warning(\"Invalid comparison format:\", comp, \"- should contain 'vs'\")\n",
    "      next\n",
    "    }\n",
    "    \n",
    "    # 拆分字符串\n",
    "    parts <- strsplit(comp, \"vs\")[[1]]\n",
    "    if (length(parts) != 2) {\n",
    "      warning(\"Invalid comparison format:\", comp, \"- should have exactly one 'vs'\")\n",
    "      next\n",
    "    }\n",
    "    \n",
    "    ident1 <- trimws(parts[1])\n",
    "    ident2 <- trimws(parts[2])\n",
    "    \n",
    "    # 验证分组是否存在\n",
    "    if (!(ident1 %in% available_groups)) {\n",
    "      warning(\"Group\", ident1, \"not found in available groups\")\n",
    "      next\n",
    "    }\n",
    "    if (!(ident2 %in% available_groups)) {\n",
    "      warning(\"Group\", ident2, \"not found in available groups\")\n",
    "      next\n",
    "    }\n",
    "    \n",
    "    parsed[[length(parsed) + 1]] <- list(\n",
    "      ident1 = ident1,\n",
    "      ident2 = ident2,\n",
    "      comparison = comp\n",
    "    )\n",
    "  }\n",
    "  \n",
    "  return(parsed)\n",
    "}\n",
    "\n",
    "# 函数 3：分组比较差异表达分析\n",
    "perform_group_comparison_de <- function(seurat_obj, params) {\n",
    "  # 验证分组列是否存在\n",
    "  if (!params$group_by %in% colnames(seurat_obj@meta.data)) {\n",
    "    stop(paste(\"Group column\", params$group_by, \"not found in metadata\"))\n",
    "  }\n",
    "  \n",
    "  # 获取所有可用分组\n",
    "  available_groups <- unique(seurat_obj@meta.data[[params$group_by]])\n",
    "  cat(\"Available groups:\", paste(available_groups, collapse = \", \"), \"\\n\")\n",
    "  \n",
    "  # 如果未指定，自动生成所有成对比较\n",
    "  if (is.null(params$compare_groups)) {\n",
    "    if (length(available_groups) >= 2) {\n",
    "      params$compare_groups <- combn(available_groups, 2, function(x) paste(x[1], \"vs\", x[2], sep = \"\"))\n",
    "      cat(\"Auto-generated comparisons:\", paste(params$compare_groups, collapse = \", \"), \"\\n\")\n",
    "    } else {\n",
    "      stop(\"Need at least 2 groups for comparison\")\n",
    "    }\n",
    "  }\n",
    "  \n",
    "  # 解析比较对\n",
    "  parsed_comparisons <- parse_compare_groups(params, available_groups)\n",
    "  \n",
    "  # 获取可用簇\n",
    "  available_clusters <- unique(seurat_obj@meta.data[[params$cluster_by]])\n",
    "  cat(\"Available clusters:\", paste(available_clusters, collapse = \", \"), \"\\n\")\n",
    "  \n",
    "  # 存储所有结果\n",
    "  group_markers_list <- list()\n",
    "  \n",
    "  # 比较每个簇内的分组\n",
    "  for (cluster in available_clusters) {\n",
    "    cat(\"\\nAnalyzing cluster:\", cluster, \"\\n\")\n",
    "    \n",
    "    # 子集化为特定簇\n",
    "    cells_in_cluster <- colnames(seurat_obj)[seurat_obj@meta.data[[params$cluster_by]] == cluster]\n",
    "    cluster_subset <- subset(seurat_obj, cells = cells_in_cluster)\n",
    "\n",
    "    # 确保该簇至少有两个分组\n",
    "    cluster_groups <- unique(cluster_subset@meta.data[[params$group_by]])\n",
    "    if (length(cluster_groups) < 2) {\n",
    "      cat(\"Skipping cluster\", cluster, \"- insufficient groups\\n\")\n",
    "      next\n",
    "    }\n",
    "\n",
    "    # 对每个比较对执行 DE\n",
    "    for (comp in parsed_comparisons) {\n",
    "      ident1 <- comp$ident1\n",
    "      ident2 <- comp$ident2\n",
    "      \n",
    "      # 检查该簇中是否存在这两个分组\n",
    "      if (!(ident1 %in% cluster_groups) || !(ident2 %in% cluster_groups)) {\n",
    "        cat(\"Skipping comparison\", comp$comparison, \"- groups not found in cluster\", cluster, \"\\n\")\n",
    "        next\n",
    "      }\n",
    "      \n",
    "      cat(\"  Comparing\", ident1, \"vs\", ident2, \"\\n\")\n",
    "      \n",
    "      # 设置分组标识\n",
    "      Idents(cluster_subset) <- params$group_by\n",
    "      \n",
    "      # 运行差异表达\n",
    "      tryCatch({\n",
    "        markers <- FindMarkers(\n",
    "          cluster_subset,\n",
    "          ident.1 = ident1,\n",
    "          ident.2 = ident2,\n",
    "          only.pos = FALSE,\n",
    "          min.pct = params$min_pct,\n",
    "          logfc.threshold = params$logfc_threshold,\n",
    "          test.use = params$de_method\n",
    "        )\n",
    "        \n",
    "        # 添加元数据\n",
    "        markers$gene <- rownames(markers)\n",
    "        markers$cluster <- cluster\n",
    "        markers$ident1 <- ident1\n",
    "        markers$ident2 <- ident2\n",
    "        markers$comparison <- comp$comparison\n",
    "        \n",
    "        # Store results\n",
    "        result_name <- paste0(\"cluster_\", cluster, \"_\", comp$comparison)\n",
    "        group_markers_list[[result_name]] <- markers\n",
    "        \n",
    "        cat(\"    Found\", nrow(markers), \"differentially expressed genes\\n\")\n",
    "        \n",
    "      }, error = function(e) {\n",
    "        cat(\"    Error in comparison\", comp$comparison, \":\", e$message, \"\\n\")\n",
    "      })\n",
    "    }\n",
    "  }\n",
    "  \n",
    "  # 合并所有结果\n",
    "  if (length(group_markers_list) > 0) {\n",
    "    group_markers <- do.call(rbind, group_markers_list)\n",
    "    \n",
    "    # 保存结果\n",
    "    base_dir <- file.path(params$output_path, \"group_DE\")\n",
    "    if (!dir.exists(base_dir)) dir.create(base_dir, recursive = TRUE)\n",
    "    output_file <- file.path(base_dir, \"group_comparison_markers_all.csv\")\n",
    "    write.csv(group_markers, output_file, row.names = FALSE)\n",
    "    cat(\"\\nGroup comparison results saved to:\", output_file, \"\\n\")\n",
    "    \n",
    "    return(group_markers)\n",
    "  } else {\n",
    "    cat(\"\\nNo valid group comparisons found.\\n\")\n",
    "    return(NULL)\n",
    "  }\n",
    "}\n",
    "\n",
    "# 函数 4：创建簇分析的可视化\n",
    "create_clusters_plots <- function(seurat_obj, all_markers, params, clusters_colors) {\n",
    "  \n",
    "  # 1. 每个簇的前 3 个基因点图\n",
    "  top_genes_per_cluster <- all_markers %>% \n",
    "    group_by(cluster) %>% \n",
    "    top_n(3, avg_log2FC) %>% \n",
    "    pull(gene)\n",
    "  \n",
    "  dotplot_plot <- DotPlot(seurat_obj, features = unique(top_genes_per_cluster)) +\n",
    "    scale_color_gradientn(colors = params$continuous_colors) +\n",
    "    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +\n",
    "    labs(title = \"Top 3 Marker Genes per Cluster\", \n",
    "         subtitle = \"DotPlot\")\n",
    "  \n",
    "  # 保存点图\n",
    "  ggsave(file.path(params$output_path, \"cluster_DE\", \"clusters_dotplot.pdf\"), \n",
    "         dotplot_plot, width = 10, height = 8, dpi = params$plot_dpi)\n",
    "  \n",
    "  # 2. 热门基因热图\n",
    "  top_genes_heatmap <- all_markers %>% \n",
    "    group_by(cluster) %>% \n",
    "    top_n(5, avg_log2FC) %>% \n",
    "    pull(gene)\n",
    "\n",
    "  # 检查基因是否存在于 scale.data 插槽中\n",
    "  available_genes_heatmap <- intersect(unique(top_genes_heatmap), rownames(GetAssayData(seurat_obj, slot = \"scale.data\")))\n",
    "  \n",
    "  if (length(available_genes_heatmap) > 0) {\n",
    "    heatmap_plot <- DoHeatmap(seurat_obj, features = available_genes_heatmap, \n",
    "                              slot = \"scale.data\", \n",
    "                              group.colors = clusters_colors) +\n",
    "      scale_fill_gradientn(colors = params$continuous_colors) +\n",
    "      labs(title = \"Top 5 Marker Genes per Cluster\", \n",
    "          subtitle = \"Heatmap\")\n",
    "    \n",
    "    # 保存热图\n",
    "    ggsave(file.path(params$output_path, \"cluster_DE\", \"clusters_heatmap.pdf\"), \n",
    "          heatmap_plot, width = 12, height = 10, dpi = params$plot_dpi)\n",
    "  } else {\n",
    "    cat(\"Warning: No requested features found in the scale.data slot for the RNA assay. Skipping heatmap generation.\\n\")\n",
    "  }\n",
    "  \n",
    "  # 3. 热门基因小提琴图\n",
    "  top_genes_violin <- all_markers %>% \n",
    "    group_by(cluster) %>% \n",
    "    top_n(3, avg_log2FC) %>% \n",
    "    pull(gene)\n",
    "\n",
    "  violin_plot <- VlnPlot(seurat_obj, features = unique(top_genes_violin), \n",
    "                         stack = TRUE, flip = TRUE, \n",
    "                         fill.by = \"ident\",cols = clusters_colors) +\n",
    "    theme(axis.text.x = element_text(angle = 0, hjust = 1))+\n",
    "    labs(title = \"Top 3 Marker Genes per Cluster\", \n",
    "         subtitle = \"Violin Plot\")\n",
    "  \n",
    "  # 保存小提琴图\n",
    "  ggsave(file.path(params$output_path, \"cluster_DE\", \"clusters_violin_plot.pdf\"), \n",
    "         violin_plot, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "  \n",
    "  cat(\"\\nAll clusters visualization plots saved to:\", params$output_path)\n",
    "}\n",
    "\n",
    "# 函数 5：创建分组比较分析的可视化\n",
    "create_group_comparison_plots <- function(seurat_obj, group_markers, params) {\n",
    "  if (is.null(group_markers) || nrow(group_markers) == 0) {\n",
    "    cat(\"No group comparison results to visualize.\\n\")\n",
    "    return(invisible(NULL))\n",
    "  }\n",
    "\n",
    "  group_col   <- params$group_by\n",
    "  cluster_col <- params$cluster_by\n",
    "\n",
    "  # 1. 按簇分组比例的堆叠条形图\n",
    "  cat(\"Creating stacked bar chart for group proportions by cluster...\\n\")\n",
    "  prop_data <- seurat_obj@meta.data %>%\n",
    "    dplyr::group_by(!!rlang::sym(cluster_col), !!rlang::sym(group_col)) %>%\n",
    "    dplyr::summarise(count = dplyr::n(), .groups = 'drop') %>%\n",
    "    dplyr::group_by(!!rlang::sym(cluster_col)) %>%\n",
    "    dplyr::mutate(proportion = count / sum(count)) %>%\n",
    "    dplyr::ungroup()\n",
    "\n",
    "  stacked_bar <- ggplot(prop_data, aes(x = !!rlang::sym(cluster_col), y = proportion, fill = !!rlang::sym(group_col))) +\n",
    "    geom_bar(stat = \"identity\", position = \"stack\") +\n",
    "    scale_fill_brewer(palette = params$discrete_colors) +\n",
    "    labs(title = \"Group Proportions by Cluster\",\n",
    "         subtitle = \"Stacked Bar Chart\",\n",
    "         x = \"Cluster\", y = \"Proportion\", fill = \"Group\") +\n",
    "    theme_minimal() +\n",
    "    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +\n",
    "    scale_y_continuous(labels = scales::percent_format())\n",
    "  ggsave(file.path(params$output_path, \"group_DE\", \"group_comparison_group_proportions_by_cluster.pdf\"),\n",
    "         stacked_bar, width = 10, height = 6, dpi = params$plot_dpi)\n",
    "\n",
    "  # 2. 每个簇输出：点图、小提琴图、热图、火山图\n",
    "  cat(\"Creating per-cluster Dot/Violin/Heatmap/Volcano...\\n\")\n",
    "\n",
    "  clusters_all <- sort(unique(as.character(seurat_obj@meta.data[[cluster_col]])))\n",
    "  n_clusters   <- length(clusters_all)\n",
    "  clusters_colors <- RColorBrewer::brewer.pal(min(max(3, n_clusters), 12), params$discrete_colors)\n",
    "  if (n_clusters > length(clusters_colors)) {\n",
    "    clusters_colors <- grDevices::colorRampPalette(clusters_colors)(n_clusters)\n",
    "  }\n",
    "\n",
    "  out_dir <- file.path(params$output_path, \"group_DE\")\n",
    "  if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)\n",
    "\n",
    "  for (cluster in clusters_all) {\n",
    "    cluster_data <- group_markers[group_markers$cluster == cluster, , drop = FALSE]\n",
    "    if (nrow(cluster_data) == 0) next\n",
    "\n",
    "    up_genes <- cluster_data %>%\n",
    "      dplyr::filter(avg_log2FC > 0) %>%\n",
    "      dplyr::top_n(5, avg_log2FC) %>%\n",
    "      dplyr::pull(gene)\n",
    "\n",
    "    down_genes <- cluster_data %>%\n",
    "      dplyr::filter(avg_log2FC < 0) %>%\n",
    "      dplyr::top_n(5, abs(avg_log2FC)) %>%\n",
    "      dplyr::pull(gene)\n",
    "\n",
    "    cluster_genes <- unique(c(up_genes, down_genes))\n",
    "    cluster_genes <- cluster_genes[cluster_genes %in% rownames(seurat_obj)]\n",
    "    if (length(cluster_genes) == 0) next\n",
    "\n",
    "    cells_in_cluster <- rownames(seurat_obj@meta.data)[as.character(seurat_obj@meta.data[[cluster_col]]) == cluster]\n",
    "    cluster_subset <- subset(seurat_obj, cells = cells_in_cluster)\n",
    "\n",
    "    # 2.1 点图\n",
    "    cluster_dot <- DotPlot(cluster_subset, features = cluster_genes, cols = c(\"lightgrey\", \"red\"),\n",
    "                           group.by = group_col) +\n",
    "      scale_color_gradientn(colors = params$continuous_colors) +\n",
    "      theme(axis.text.x = element_text(angle = 45, hjust = 1)) +\n",
    "      labs(title = paste(\"Cluster\", cluster, \"DE Genes\"),\n",
    "           subtitle = \"Top 5 Up + Top 5 Down\") +\n",
    "      theme_bw()\n",
    "    ggsave(file.path(out_dir, paste0(\"cluster_\", cluster, \"_de_genes_dotplot.pdf\")),\n",
    "           cluster_dot, width = 10, height = 6, dpi = params$plot_dpi)\n",
    "\n",
    "    # 2.2 小提琴图\n",
    "    cluster_violin <- VlnPlot(cluster_subset, features = cluster_genes,\n",
    "                              stack = TRUE, flip = TRUE,\n",
    "                              fill.by = \"ident\",\n",
    "                              group.by = group_col,\n",
    "                              cols = brewer.pal(8, params$discrete_colors)[1:2]) +\n",
    "      NoLegend() +\n",
    "      labs(title = paste(\"Cluster\", cluster, \"DE Genes\"),\n",
    "           subtitle = \"Top 5 Up + Top 5 Down\")\n",
    "    ggsave(file.path(out_dir, paste0(\"cluster_\", cluster, \"_de_genes_violin.pdf\")),\n",
    "           cluster_violin, width = 6, height = 10, dpi = params$plot_dpi)\n",
    "\n",
    "    # 2.3 热图\n",
    "    available_genes_heatmap <- intersect(cluster_genes, rownames(GetAssayData(cluster_subset, slot = \"scale.data\")))\n",
    "    if (length(available_genes_heatmap) > 1) {\n",
    "      heatmap_cluster <- DoHeatmap(cluster_subset, features = available_genes_heatmap,\n",
    "                                   slot = \"scale.data\",\n",
    "                                   group.colors = clusters_colors,\n",
    "                                   group.by = group_col, draw.lines = TRUE) +\n",
    "        scale_fill_gradientn(colors = params$continuous_colors) +\n",
    "        labs(title = paste(\"Cluster\", cluster, \"DE Genes\"),\n",
    "             subtitle = \"Top 5 Up + Top 5 Down\")\n",
    "      ggsave(file.path(out_dir, paste0(\"cluster_\", cluster, \"_de_genes_heatmap.pdf\")),\n",
    "             heatmap_cluster, width = 10, height = 3, dpi = params$plot_dpi)\n",
    "    } else {\n",
    "      cat(\"No genes found in scale.data for cluster\", cluster, \"heatmap generation.\\n\")\n",
    "    }\n",
    "\n",
    "    # 2.4 火山图\n",
    "    cluster_data$top_genes <- ifelse(cluster_data$gene %in% cluster_genes, cluster_data$gene, \"\")\n",
    "    volcano_plot <- ggplot(cluster_data, aes(x = avg_log2FC, y = -log10(p_val_adj))) +\n",
    "      geom_point(aes(color = ifelse(p_val_adj < 0.05 & abs(avg_log2FC) > 0.5,\n",
    "                                    ifelse(avg_log2FC > 0, \"Up\", \"Down\"), \"NS\")),\n",
    "                 alpha = 0.6, size = 1.5) +\n",
    "      scale_color_manual(values = c(\"Down\" = \"blue\", \"NS\" = \"grey\", \"Up\" = \"red\")) +\n",
    "      geom_hline(yintercept = -log10(0.05), linetype = \"dashed\", color = \"red\", alpha = 0.7) +\n",
    "      geom_vline(xintercept = c(-0.5, 0.5), linetype = \"dashed\", color = \"red\", alpha = 0.7) +\n",
    "      ggrepel::geom_text_repel(aes(label = top_genes),\n",
    "                               size = 3, max.overlaps = 20,\n",
    "                               box.padding = 0.5, point.padding = 0.3,\n",
    "                               segment.color = \"black\", segment.size = 0.3) +\n",
    "      labs(title = paste(\"Volcano Plot - Cluster\", cluster),\n",
    "           subtitle = \"Top 5 Up + Top 5 Down Genes Marked\",\n",
    "           x = \"Average Log2 Fold Change\",\n",
    "           y = \"-log10(Adjusted P-value)\",\n",
    "           color = \"Regulation\") +\n",
    "      theme_bw() +\n",
    "      theme(legend.position = \"bottom\",\n",
    "            plot.title = element_text(hjust = 0.5, size = 14),\n",
    "            plot.subtitle = element_text(hjust = 0.5, size = 12))\n",
    "    ggsave(file.path(out_dir, paste0(\"cluster_\", cluster, \"_de_genes_volcano.pdf\")),\n",
    "           volcano_plot, width = 6, height = 6, dpi = params$plot_dpi)\n",
    "  }\n",
    "\n",
    "  cat(\"\\nGroup comparison visualization plots saved to:\", params$output_path, \"\\n\")\n",
    "  invisible(NULL)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行差异分析和可视化 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:42:13.895303Z",
     "iopub.status.busy": "2025-09-16T09:42:13.893347Z",
     "iopub.status.idle": "2025-09-16T09:42:38.689256Z",
     "shell.execute_reply": "2025-09-16T09:42:38.687191Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Performing GROUP COMPARISON differential expression analysis...\n",
      "Available groups: S133A, S133T \n",
      "Available clusters: 0, 6, 5 \n",
      "\n",
      "Analyzing cluster: 0 \n",
      "  Comparing S133T vs S133A \n",
      "    Found 293 differentially expressed genes\n",
      "\n",
      "Analyzing cluster: 6 \n",
      "  Comparing S133T vs S133A \n",
      "    Found 548 differentially expressed genes\n",
      "\n",
      "Analyzing cluster: 5 \n",
      "  Comparing S133T vs S133A \n",
      "    Found 855 differentially expressed genes\n",
      "\n",
      "Group comparison results saved to: /home/demo-seekgene-com/workspace/project/demo-seekgene-com/DiffEnrich/result/group_DE/group_comparison_markers_all.csv \n",
      "Group comparison filtering results:\n",
      "- Total genes before filtering: 1696 \n",
      "- Total genes after filtering: 933 \n",
      "- P-value cutoff: 0.05 \n",
      "- Log2FC cutoff: 0.5 \n",
      "Creating stacked bar chart for group proportions by cluster...\n",
      "Creating per-cluster Dot/Violin/Heatmap/Volcano...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Warning message:\n",
      "“Scaling data with a low number of groups may produce misleading results”\n",
      "\u001b[1m\u001b[22mScale for \u001b[32mcolour\u001b[39m is already present.\n",
      "Adding another scale for \u001b[32mcolour\u001b[39m, which will replace the existing scale.\n",
      "\u001b[1m\u001b[22mScale for \u001b[32mfill\u001b[39m is already present.\n",
      "Adding another scale for \u001b[32mfill\u001b[39m, which will replace the existing scale.\n",
      "Warning message:\n",
      "“Scaling data with a low number of groups may produce misleading results”\n",
      "\u001b[1m\u001b[22mScale for \u001b[32mcolour\u001b[39m is already present.\n",
      "Adding another scale for \u001b[32mcolour\u001b[39m, which will replace the existing scale.\n",
      "\u001b[1m\u001b[22mScale for \u001b[32mfill\u001b[39m is already present.\n",
      "Adding another scale for \u001b[32mfill\u001b[39m, which will replace the existing scale.\n",
      "Warning message:\n",
      "“Scaling data with a low number of groups may produce misleading results”\n",
      "\u001b[1m\u001b[22mScale for \u001b[32mcolour\u001b[39m is already present.\n",
      "Adding another scale for \u001b[32mcolour\u001b[39m, which will replace the existing scale.\n",
      "\u001b[1m\u001b[22mScale for \u001b[32mfill\u001b[39m is already present.\n",
      "Adding another scale for \u001b[32mfill\u001b[39m, which will replace the existing scale.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Group comparison visualization plots saved to: /home/demo-seekgene-com/workspace/project/demo-seekgene-com/DiffEnrich/result \n",
      "\n",
      "Summary of group comparison differential expression (filtered):\n",
      "- Total differentially expressed genes: 933\n",
      "- Comparisons performed: S133TvsS133A\n",
      "- Clusters analyzed: 0, 6, 5"
     ]
    }
   ],
   "source": [
    "if (params$differential_analysis_type == \"clusters\") {\n",
    "  cat(\"Performing CLUSTERS differential expression analysis...\\n\")\n",
    "  \n",
    "  # 执行分析（Seurat DE 的所有基因）\n",
    "  clusters_markers_all <- perform_clusters_de(seurat_obj, params)\n",
    "  group_markers <- NULL\n",
    "  \n",
    "  # 用于报告/绘图的过滤版本\n",
    "  if (!is.null(clusters_markers_all)) {\n",
    "    clusters_markers <- clusters_markers_all %>%\n",
    "      dplyr::filter(\n",
    "        is.finite(p_val_adj) & p_val_adj < params$p_adj_cutoff\n",
    "      ) %>%\n",
    "      dplyr::filter(\n",
    "        is.finite(avg_log2FC) & abs(avg_log2FC) >= params$abs_log2fc_cutoff\n",
    "      )\n",
    "    write.csv(clusters_markers, file.path(params$output_path, \"cluster_DE\", \"clusters_markers_filtered.csv\"), row.names = FALSE)\n",
    "    \n",
    "    cat(\"Filtering results:\\n\")\n",
    "    cat(\"- Total genes before filtering:\", nrow(clusters_markers_all), \"\\n\")\n",
    "    cat(\"- Total genes after filtering:\", nrow(clusters_markers), \"\\n\")\n",
    "    cat(\"- P-value cutoff:\", params$p_adj_cutoff, \"\\n\")\n",
    "    cat(\"- Log2FC cutoff:\", params$abs_log2fc_cutoff, \"\\n\")\n",
    "  } else {\n",
    "    clusters_markers <- NULL\n",
    "  }\n",
    "\n",
    "  # 可视化使用过滤后的数据\n",
    "  create_clusters_plots(seurat_obj, clusters_markers, params, clusters_colors)\n",
    "\n",
    "  # 摘要（过滤后）\n",
    "  cat(\"\\nSummary of all clusters differential expression (filtered):\")\n",
    "  cat(\"\\n- Total differentially expressed genes:\", nrow(clusters_markers))\n",
    "  cat(\"\\n- Average genes per cluster:\", ifelse(length(unique(clusters_markers$cluster))>0, round(nrow(clusters_markers) / length(unique(clusters_markers$cluster)), 1), 0))\n",
    "  cat(\"\\n- Top 5 genes per cluster:\")\n",
    "  if (nrow(clusters_markers) > 0) print(clusters_markers %>% group_by(cluster) %>% top_n(5, avg_log2FC))\n",
    "  \n",
    "} else if (params$differential_analysis_type == \"group_comparison\") {\n",
    "  cat(\"Performing GROUP COMPARISON differential expression analysis...\\n\")\n",
    "  \n",
    "  # 执行分析（所有）\n",
    "  group_markers_all <- perform_group_comparison_de(seurat_obj, params)\n",
    "  clusters_markers <- NULL\n",
    "  \n",
    "  # 过滤后\n",
    "  if (!is.null(group_markers_all)) {\n",
    "    group_markers <- group_markers_all %>%\n",
    "      dplyr::filter(\n",
    "        is.finite(p_val_adj) & p_val_adj < params$p_adj_cutoff\n",
    "      ) %>%\n",
    "      dplyr::filter(\n",
    "        is.finite(avg_log2FC) & abs(avg_log2FC) >= params$abs_log2fc_cutoff\n",
    "      )\n",
    "    write.csv(group_markers, file.path(params$output_path, \"group_DE\", \"group_comparison_markers_filtered.csv\"), row.names = FALSE)\n",
    "    \n",
    "    cat(\"Group comparison filtering results:\\n\")\n",
    "    cat(\"- Total genes before filtering:\", nrow(group_markers_all), \"\\n\")\n",
    "    cat(\"- Total genes after filtering:\", nrow(group_markers), \"\\n\")\n",
    "    cat(\"- P-value cutoff:\", params$p_adj_cutoff, \"\\n\")\n",
    "    cat(\"- Log2FC cutoff:\", params$abs_log2fc_cutoff, \"\\n\")\n",
    "  } else {\n",
    "    group_markers <- NULL\n",
    "  }\n",
    "  \n",
    "  # 可视化使用过滤后的数据\n",
    "  create_group_comparison_plots(seurat_obj, group_markers, params)\n",
    "  \n",
    "  # 摘要（过滤后）\n",
    "  if (!is.null(group_markers)) {\n",
    "    cat(\"\\nSummary of group comparison differential expression (filtered):\")\n",
    "    cat(\"\\n- Total differentially expressed genes:\", nrow(group_markers))\n",
    "    cat(\"\\n- Comparisons performed:\", paste(unique(group_markers$comparison), collapse = \", \"))\n",
    "    cat(\"\\n- Clusters analyzed:\", paste(unique(group_markers$cluster), collapse = \", \"))\n",
    "  }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 富集分析\n",
    "\n",
    "执行 ORA（过表达分析）或 GSEA（基因集富集分析）以了解差异表达基因的生物学意义。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "准备富集分析所需的数据库。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:42:38.693839Z",
     "iopub.status.busy": "2025-09-16T09:42:38.692223Z",
     "iopub.status.idle": "2025-09-16T09:42:38.718018Z",
     "shell.execute_reply": "2025-09-16T09:42:38.716047Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Enrichment databases setup for species: human \n",
      "Available databases: GO, KEGG, HALLMARK, REACTOME \n"
     ]
    }
   ],
   "source": [
    "# 设置富集分析数据库\n",
    "setup_enrichment_databases <- function(species) {\n",
    "  \n",
    "  if (species == \"human\") {\n",
    "    org_db <- org.Hs.eg.db\n",
    "    kegg_org <- \"hsa\"\n",
    "  } else if (species == \"mouse\") {\n",
    "    org_db <- org.Mm.eg.db\n",
    "    kegg_org <- \"mmu\"\n",
    "  } else {\n",
    "    stop(\"Species must be 'human' or 'mouse'\")\n",
    "  }\n",
    "  \n",
    "  # 解析富集数据库\n",
    "  databases <- strsplit(params$enrichment_databases, \",\")[[1]]\n",
    "  \n",
    "  return(list(\n",
    "    org_db = org_db,\n",
    "    kegg_org = kegg_org,\n",
    "    databases = databases\n",
    "  ))\n",
    "}\n",
    "\n",
    "# 设置数据库\n",
    "db_setup <- setup_enrichment_databases(params$species)\n",
    "cat(\"Enrichment databases setup for species:\", params$species, \"\\n\")\n",
    "cat(\"Available databases:\", paste(db_setup$databases, collapse = \", \"), \"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "定义富集分析和可视化的函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:42:38.722923Z",
     "iopub.status.busy": "2025-09-16T09:42:38.721420Z",
     "iopub.status.idle": "2025-09-16T09:42:38.762326Z",
     "shell.execute_reply": "2025-09-16T09:42:38.760232Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 函数 1：读取 GMT 文件\n",
    "read_gmt_file <- function(file_path) {\n",
    "  if (!file.exists(file_path)) {\n",
    "    stop(\"GMT file not found:\", file_path)\n",
    "  }\n",
    "  \n",
    "  # 读取 GMT 文件\n",
    "  gmt_data <- readLines(file_path)\n",
    "  \n",
    "  # 解析 GMT 格式\n",
    "  genesets <- list()\n",
    "  for (line in gmt_data) {\n",
    "    parts <- strsplit(line, \"\\t\")[[1]]\n",
    "    if (length(parts) >= 3) {\n",
    "      term_name <- parts[1]\n",
    "      term_description <- parts[2]\n",
    "      genes <- parts[3:length(parts)]\n",
    "      genes <- genes[genes != \"\"]  # 移除空字符串\n",
    "      \n",
    "      if (length(genes) > 0) {\n",
    "        genesets[[term_name]] <- data.frame(\n",
    "          term = term_name,\n",
    "          gene = genes,\n",
    "          stringsAsFactors = FALSE\n",
    "        )\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "  \n",
    "  # 合并所有基因集\n",
    "  if (length(genesets) > 0) {\n",
    "    combined_genesets <- do.call(rbind, genesets)\n",
    "    return(combined_genesets)\n",
    "  } else {\n",
    "    stop(\"No valid genesets found in GMT file\")\n",
    "  }\n",
    "}\n",
    "\n",
    "# 函数 2：ORA 富集分析\n",
    "perform_ora_analysis <- function(markers_data, db_setup, params) {\n",
    "  cat(\"Performing Over-Representation Analysis (ORA) with local data (split Up/Down)...\\n\")\n",
    "  \n",
    "  # 按（簇或比较）和方向准备基因列表\n",
    "  if (\"cluster\" %in% colnames(markers_data)) {\n",
    "    split_key <- markers_data$cluster\n",
    "  } else {\n",
    "    split_key <- markers_data$comparison\n",
    "  }\n",
    "  \n",
    "  markers_data$direction <- ifelse(is.finite(markers_data$avg_log2FC) & markers_data$avg_log2FC > 0, \"Up\",\n",
    "                                   ifelse(is.finite(markers_data$avg_log2FC) & markers_data$avg_log2FC < 0, \"Down\", NA))\n",
    "  markers_data <- markers_data[!is.na(markers_data$direction), , drop = FALSE]\n",
    "\n",
    "  index_keys <- unique(split_key)\n",
    "  directions <- c(\"Up\",\"Down\")\n",
    "\n",
    "  # 将 symbol 转换为 entrez\n",
    "  convert_to_entrez <- function(genes) {\n",
    "    m <- AnnotationDbi::select(db_setup$org_db, keys = genes, columns = \"ENTREZID\", keytype = \"SYMBOL\")\n",
    "    unique(m$ENTREZID[!is.na(m$ENTREZID)])\n",
    "  }\n",
    "\n",
    "  enrichment_results <- list()\n",
    "  combined_results_list <- list()\n",
    "\n",
    "  for (db in db_setup$databases) {\n",
    "    cat(\"Analyzing database:\", db, \"\\n\")\n",
    "\n",
    "    for (k in index_keys) {\n",
    "      for (dirc in directions) {\n",
    "        sub <- markers_data[split_key == k & markers_data$direction == dirc, , drop = FALSE]\n",
    "        if (nrow(sub) == 0) next\n",
    "        genes <- unique(sub$gene)\n",
    "        eg <- convert_to_entrez(genes)\n",
    "        if (length(eg) == 0) next\n",
    "\n",
    "        if (db == \"GO\") {\n",
    "          ego <- enrichGO(gene = eg,\n",
    "                          OrgDb = db_setup$org_db,\n",
    "                          ont = \"ALL\",\n",
    "                          pAdjustMethod = \"BH\",\n",
    "                          pvalueCutoff = params$pvalue_cutoff,\n",
    "                          qvalueCutoff = params$qvalue_cutoff,\n",
    "                          minGSSize = params$minGSSize,\n",
    "                          maxGSSize = params$maxGSSize)\n",
    "          if (nrow(ego) > 0) {\n",
    "            ego_df <- as.data.frame(ego)\n",
    "            ego_df$cluster   <- k\n",
    "            ego_df$database  <- db\n",
    "            ego_df$direction <- dirc\n",
    "            enrichment_results[[paste(db, k, dirc, sep = \"_\")]] <- ego\n",
    "            combined_results_list[[paste(db, k, dirc, sep = \"_\")]] <- ego_df\n",
    "          }\n",
    "\n",
    "        } else if (db == \"KEGG\") {\n",
    "          ekegg <- enrichKEGG(gene = eg,\n",
    "                              organism = db_setup$kegg_org,\n",
    "                              pAdjustMethod = \"BH\",\n",
    "                              pvalueCutoff = params$pvalue_cutoff,\n",
    "                              qvalueCutoff = params$qvalue_cutoff,\n",
    "                              minGSSize = params$minGSSize,\n",
    "                              maxGSSize = params$maxGSSize)\n",
    "          if (nrow(ekegg) > 0) {\n",
    "            ekegg_df <- as.data.frame(ekegg)\n",
    "            ekegg_df$cluster   <- k\n",
    "            ekegg_df$database  <- db\n",
    "            ekegg_df$direction <- dirc\n",
    "            enrichment_results[[paste(db, k, dirc, sep = \"_\")]] <- ekegg\n",
    "            combined_results_list[[paste(db, k, dirc, sep = \"_\")]] <- ekegg_df\n",
    "          }\n",
    "\n",
    "        } else if (db == \"HALLMARK\") {\n",
    "          data_dir <- \"/jp_envs/Enrichment/\"\n",
    "          hallmark_file <- file.path(data_dir, \"h.all.v7.5.1.entrez.gmt\")\n",
    "          if (file.exists(hallmark_file)) {\n",
    "            hallmark_genesets <- read_gmt_file(hallmark_file)\n",
    "            emsig <- tryCatch({\n",
    "              enricher(gene = eg,\n",
    "                       TERM2GENE = hallmark_genesets,\n",
    "                       pAdjustMethod = \"BH\",\n",
    "                       pvalueCutoff = params$pvalue_cutoff,\n",
    "                       qvalueCutoff = params$qvalue_cutoff,\n",
    "                       minGSSize = params$minGSSize,\n",
    "                       maxGSSize = params$maxGSSize)\n",
    "            }, error = function(e) NULL)\n",
    "            if (!is.null(emsig) && nrow(emsig) > 0) {\n",
    "              emsig_df <- as.data.frame(emsig)\n",
    "              emsig_df$cluster   <- k\n",
    "              emsig_df$database  <- db\n",
    "              emsig_df$direction <- dirc\n",
    "              enrichment_results[[paste(db, k, dirc, sep = \"_\")]] <- emsig\n",
    "              combined_results_list[[paste(db, k, dirc, sep = \"_\")]] <- emsig_df\n",
    "            }\n",
    "          } else {\n",
    "            cat(\"Warning: HALLMARK file not found, skipping...\\n\")\n",
    "          }\n",
    "\n",
    "        } else if (db == \"REACTOME\") {\n",
    "          data_dir <- \"/jp_envs/Enrichment/\"\n",
    "          reactome_file <- file.path(data_dir, \"c2.cp.reactome.v7.5.1.entrez.gmt\")\n",
    "          if (file.exists(reactome_file)) {\n",
    "            reactome_genesets <- read_gmt_file(reactome_file)\n",
    "            emsig <- tryCatch({\n",
    "              enricher(gene = eg,\n",
    "                       TERM2GENE = reactome_genesets,\n",
    "                       pAdjustMethod = \"BH\",\n",
    "                       pvalueCutoff = params$pvalue_cutoff,\n",
    "                       qvalueCutoff = params$qvalue_cutoff,\n",
    "                       minGSSize = params$minGSSize,\n",
    "                       maxGSSize = params$maxGSSize)\n",
    "            }, error = function(e) NULL)\n",
    "            if (!is.null(emsig) && nrow(emsig) > 0) {\n",
    "              emsig_df <- as.data.frame(emsig)\n",
    "              emsig_df$cluster   <- k\n",
    "              emsig_df$database  <- db\n",
    "              emsig_df$direction <- dirc\n",
    "              enrichment_results[[paste(db, k, dirc, sep = \"_\")]] <- emsig\n",
    "              combined_results_list[[paste(db, k, dirc, sep = \"_\")]] <- emsig_df\n",
    "            }\n",
    "          } else {\n",
    "            cat(\"Warning: REACTOME file not found, skipping...\\n\")\n",
    "          }\n",
    "        }\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "  \n",
    "  # 合并结果\n",
    "  if (length(combined_results_list) > 0) {\n",
    "    all_columns <- unique(unlist(lapply(combined_results_list, colnames)))\n",
    "    cat(\"All columns found:\", paste(all_columns, collapse = \", \"), \"\\n\")\n",
    "    \n",
    "    standardized_results <- lapply(combined_results_list, function(df) {\n",
    "      for (col in all_columns) {\n",
    "        if (!col %in% colnames(df)) df[[col]] <- NA\n",
    "      }\n",
    "      df <- df[, all_columns, drop = FALSE]\n",
    "      df\n",
    "    })\n",
    "    combined_results <- do.call(rbind, standardized_results)\n",
    "\n",
    "    # 保存结果（带方向）\n",
    "    base_dir <- file.path(params$output_path, \"ORA\")\n",
    "    if (!dir.exists(base_dir)) dir.create(base_dir, recursive = TRUE)\n",
    "    output_file <- file.path(base_dir, \"ora_enrichment_results.csv\")\n",
    "    write.csv(combined_results, output_file, row.names = FALSE)\n",
    "    cat(\"ORA results saved to:\", output_file, \"\\n\")\n",
    "\n",
    "    return(list(results = combined_results, enrichment_objects = enrichment_results))\n",
    "  } else {\n",
    "    cat(\"No significant enrichment results found.\\n\")\n",
    "    return(NULL)\n",
    "  }\n",
    "}\n",
    "\n",
    "# 函数 3：GSEA 富集分析\n",
    "perform_gsea_analysis <- function(markers_data, db_setup, params) {\n",
    "  cat(\"Performing Gene Set Enrichment Analysis (GSEA) with local data...\\n\")\n",
    "  \n",
    "  # 准备排序的基因列表 \n",
    "  if (\"cluster\" %in% colnames(markers_data)) {\n",
    "    gene_lists <- split(markers_data, markers_data$cluster)\n",
    "  } else {\n",
    "    gene_lists <- split(markers_data, markers_data$comparison)\n",
    "  }\n",
    "  \n",
    "  # 将基因符号转换为 Entrez ID 并创建排序列表\n",
    "  ranked_lists <- list()\n",
    "  \n",
    "  for (name in names(gene_lists)) {\n",
    "    cluster_data <- gene_lists[[name]]\n",
    "    \n",
    "    # 基于对数倍数变化创建排序列表（无过滤）\n",
    "    ranked_genes <- cluster_data$avg_log2FC\n",
    "    names(ranked_genes) <- cluster_data$gene\n",
    "    \n",
    "    # 按倍数变化排序\n",
    "    ranked_genes <- sort(ranked_genes, decreasing = TRUE)\n",
    "    \n",
    "    # 转换为 Entrez ID\n",
    "    entrez_mapping <- AnnotationDbi::select(db_setup$org_db, \n",
    "                                           keys = names(ranked_genes), \n",
    "                                           columns = \"ENTREZID\", \n",
    "                                           keytype = \"SYMBOL\")\n",
    "    \n",
    "    # 使用 Entrez ID 创建排序列表\n",
    "    entrez_ranked <- ranked_genes[match(entrez_mapping$SYMBOL, names(ranked_genes))]\n",
    "    names(entrez_ranked) <- entrez_mapping$ENTREZID\n",
    "    entrez_ranked <- entrez_ranked[!is.na(names(entrez_ranked))]\n",
    "    \n",
    "    ranked_lists[[name]] <- entrez_ranked\n",
    "  }\n",
    "  \n",
    "  # 对每个数据库执行 GSEA\n",
    "  gsea_results <- list()\n",
    "  combined_results_list <- list()\n",
    "  \n",
    "  for (db in db_setup$databases) {\n",
    "    cat(\"Analyzing database:\", db, \"\\n\")\n",
    "    \n",
    "    if (db == \"GO\") {\n",
    "      # GO GSEA\n",
    "      for (name in names(ranked_lists)) {\n",
    "        if (length(ranked_lists[[name]]) > 0) {\n",
    "          gsea_go <- gseGO(geneList = ranked_lists[[name]],\n",
    "                           OrgDb = db_setup$org_db,\n",
    "                           ont = \"ALL\",\n",
    "                           pAdjustMethod = \"BH\",\n",
    "                           pvalueCutoff = params$pvalue_cutoff,\n",
    "                           minGSSize = params$minGSSize,\n",
    "                           maxGSSize = params$maxGSSize,\n",
    "                           nPerm = 1000)\n",
    "          \n",
    "          if (nrow(gsea_go) > 0) {\n",
    "            gsea_go_df <- as.data.frame(gsea_go)\n",
    "            gsea_go_df$cluster <- name\n",
    "            gsea_go_df$database <- db\n",
    "            \n",
    "            gsea_results[[paste(db, name, sep = \"_\")]] <- gsea_go\n",
    "            combined_results_list[[paste(db, name, sep = \"_\")]] <- gsea_go_df\n",
    "          }\n",
    "        }\n",
    "      }\n",
    "      \n",
    "    } else if (db == \"KEGG\") {\n",
    "      # KEGG GSEA\n",
    "      for (name in names(ranked_lists)) {\n",
    "        if (length(ranked_lists[[name]]) > 0) {\n",
    "          gsea_kegg <- gseKEGG(geneList = ranked_lists[[name]],\n",
    "                               organism = db_setup$kegg_org,\n",
    "                               pAdjustMethod = \"BH\",\n",
    "                               pvalueCutoff = params$pvalue_cutoff,\n",
    "                               minGSSize = params$minGSSize,\n",
    "                               maxGSSize = params$maxGSSize,\n",
    "                               nPerm = 1000)\n",
    "          \n",
    "          if (nrow(gsea_kegg) > 0) {\n",
    "            gsea_kegg_df <- as.data.frame(gsea_kegg)\n",
    "            gsea_kegg_df$cluster <- name\n",
    "            gsea_kegg_df$database <- db\n",
    "            \n",
    "            gsea_results[[paste(db, name, sep = \"_\")]] <- gsea_kegg\n",
    "            combined_results_list[[paste(db, name, sep = \"_\")]] <- gsea_kegg_df\n",
    "          }\n",
    "        }\n",
    "      }\n",
    "      \n",
    "    } else if (db == \"HALLMARK\") {\n",
    "      # 使用本地数据的 HALLMARK GSEA\n",
    "      data_dir <- \"/jp_envs/Enrichment/\"\n",
    "      hallmark_file <- file.path(data_dir, \"h.all.v7.5.1.entrez.gmt\")\n",
    "      \n",
    "      if (file.exists(hallmark_file)) {\n",
    "        hallmark_genesets <- read_gmt_file(hallmark_file)\n",
    "        \n",
    "        for (name in names(ranked_lists)) {\n",
    "          if (length(ranked_lists[[name]]) > 0) {\n",
    "            tryCatch({\n",
    "              gsea_msig <- GSEA(geneList = ranked_lists[[name]],\n",
    "                                TERM2GENE = hallmark_genesets,\n",
    "                                pAdjustMethod = \"BH\",\n",
    "                                pvalueCutoff = params$pvalue_cutoff,\n",
    "                                minGSSize = params$minGSSize,\n",
    "                                maxGSSize = params$maxGSSize,\n",
    "                                nPerm = 1000)\n",
    "              \n",
    "              if (nrow(gsea_msig) > 0) {\n",
    "                gsea_msig_df <- as.data.frame(gsea_msig)\n",
    "                gsea_msig_df$cluster <- name\n",
    "                gsea_msig_df$database <- db\n",
    "                \n",
    "                gsea_results[[paste(db, name, sep = \"_\")]] <- gsea_msig\n",
    "                combined_results_list[[paste(db, name, sep = \"_\")]] <- gsea_msig_df\n",
    "              }\n",
    "            }, error = function(e) {\n",
    "              cat(\"Warning: Failed to analyze HALLMARK for\", name, \":\", e$message, \"\\n\")\n",
    "            })\n",
    "          }\n",
    "        }\n",
    "      } else {\n",
    "        cat(\"Warning: HALLMARK file not found, skipping...\\n\")\n",
    "      }\n",
    "      \n",
    "    } else if (db == \"REACTOME\") {\n",
    "      # 使用本地数据的 REACTOME GSEA\n",
    "      data_dir <- \"/jp_envs/Enrichment/\"\n",
    "      reactome_file <- file.path(data_dir, \"c2.cp.reactome.v7.5.1.entrez.gmt\")\n",
    "      \n",
    "      if (file.exists(reactome_file)) {\n",
    "        reactome_genesets <- read_gmt_file(reactome_file)\n",
    "        \n",
    "        for (name in names(ranked_lists)) {\n",
    "          if (length(ranked_lists[[name]]) > 0) {\n",
    "            tryCatch({\n",
    "              gsea_msig <- GSEA(geneList = ranked_lists[[name]],\n",
    "                                TERM2GENE = reactome_genesets,\n",
    "                                pAdjustMethod = \"BH\",\n",
    "                                pvalueCutoff = params$pvalue_cutoff,\n",
    "                                minGSSize = params$minGSSize,\n",
    "                                maxGSSize = params$maxGSSize,\n",
    "                                nPerm = 1000)\n",
    "              \n",
    "              if (nrow(gsea_msig) > 0) {\n",
    "                gsea_msig_df <- as.data.frame(gsea_msig)\n",
    "                gsea_msig_df$cluster <- name\n",
    "                gsea_msig_df$database <- db\n",
    "                \n",
    "                gsea_results[[paste(db, name, sep = \"_\")]] <- gsea_msig\n",
    "                combined_results_list[[paste(db, name, sep = \"_\")]] <- gsea_msig_df\n",
    "              }\n",
    "            }, error = function(e) {\n",
    "              cat(\"Warning: Failed to analyze REACTOME for\", name, \":\", e$message, \"\\n\")\n",
    "            })\n",
    "          }\n",
    "        }\n",
    "      } else {\n",
    "        cat(\"Warning: REACTOME file not found, skipping...\\n\")\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "  \n",
    "  # 合并结果\n",
    "  if (length(combined_results_list) > 0) {\n",
    "    all_columns <- unique(unlist(lapply(combined_results_list, colnames)))\n",
    "    cat(\"All columns found:\", paste(all_columns, collapse = \", \"), \"\\n\")\n",
    "    \n",
    "    standardized_results <- lapply(combined_results_list, function(df) {\n",
    "      for (col in all_columns) {\n",
    "        if (!col %in% colnames(df)) df[[col]] <- NA\n",
    "      }\n",
    "      df <- df[, all_columns, drop = FALSE]\n",
    "      df\n",
    "    })\n",
    "    \n",
    "    combined_results <- do.call(rbind, standardized_results)\n",
    "    \n",
    "    # 保存结果\n",
    "    base_dir <- file.path(params$output_path, \"GSEA\")\n",
    "    if (!dir.exists(base_dir)) dir.create(base_dir, recursive = TRUE)\n",
    "    output_file <- file.path(base_dir, \"gsea_enrichment_results.csv\")\n",
    "    write.csv(combined_results, output_file, row.names = FALSE)\n",
    "    cat(\"GSEA results saved to:\", output_file, \"\\n\")\n",
    "    \n",
    "    return(list(results = combined_results, gsea_objects = gsea_results, ranked_lists = ranked_lists))\n",
    "  } else {\n",
    "    cat(\"No significant GSEA results found.\\n\")\n",
    "    return(NULL)\n",
    "  }\n",
    "}\n",
    "\n",
    "# 函数 4：创建 ORA 可视化图\n",
    "create_ora_plots <- function(ora_results, params) {\n",
    "  if (is.null(ora_results)) { cat(\"No ORA results to visualize.\\n\"); return(invisible(NULL)) }\n",
    "  df <- ora_results$results\n",
    "  if (is.null(df) || nrow(df) == 0) { cat(\"Empty ORA results.\\n\"); return(invisible(NULL)) }\n",
    "\n",
    "  if (!\"direction\" %in% names(df)) {\n",
    "    df$direction <- \"All\"\n",
    "  } else {\n",
    "    df$direction <- ifelse(is.na(df$direction) | df$direction == \"\", \"All\", as.character(df$direction))\n",
    "  }\n",
    "\n",
    "  # 生成\n",
    "  if (!\"generation\" %in% names(df)) {\n",
    "    gen <- rep(NA_real_, nrow(df))\n",
    "    if (\"GeneRatio\" %in% names(df)) {\n",
    "      gr <- as.character(df$GeneRatio)\n",
    "      gen_from_gr <- suppressWarnings(vapply(gr, function(x) {\n",
    "        if (is.na(x) || x == \"\") return(NA_real_)\n",
    "        if (grepl(\"/\", x, fixed = TRUE)) {\n",
    "          p <- strsplit(x, \"/\", fixed = TRUE)[[1]]\n",
    "          as.numeric(p[1]) / as.numeric(p[2])\n",
    "        } else as.numeric(x)\n",
    "      }, numeric(1)))\n",
    "      gen <- gen_from_gr\n",
    "    }\n",
    "    if (all(is.na(gen)) && \"Count\" %in% names(df)) gen <- suppressWarnings(as.numeric(df$Count))\n",
    "    df$generation <- gen\n",
    "  }\n",
    "\n",
    "  # GO\n",
    "  df$ont <- NA_character_\n",
    "  if (\"ONTOLOGY\" %in% names(df)) df$ont <- as.character(df$ONTOLOGY)\n",
    "  else if (\"gs_subcat\" %in% names(df)) df$ont <- toupper(as.character(df$gs_subcat))\n",
    "  else if (\"subcategory\" %in% names(df)) df$ont <- toupper(as.character(df$subcategory))\n",
    "  else if (\"category\" %in% names(df)) df$ont <- toupper(as.character(df$category))\n",
    "  df$ont[grepl(\"BP\", df$ont, ignore.case = TRUE)] <- \"BP\"\n",
    "  df$ont[grepl(\"CC\", df$ont, ignore.case = TRUE)] <- \"CC\"\n",
    "  df$ont[grepl(\"MF\", df$ont, ignore.case = TRUE)] <- \"MF\"\n",
    "  df$ont[is.na(df$ont) | df$ont == \"\"] <- \"Other\"\n",
    "\n",
    "  clusters   <- sort(unique(df$cluster))\n",
    "  databases  <- sort(unique(df$database))\n",
    "  directions <- sort(unique(df$direction))\n",
    "\n",
    "  for (db in databases) {\n",
    "    for (dirc in directions) {\n",
    "      db_dot_dir <- file.path(params$output_path, \"ORA\", db, dirc, \"dotplot\")\n",
    "      db_bar_dir <- file.path(params$output_path, \"ORA\", db, dirc, \"bar\")\n",
    "      if (!dir.exists(db_dot_dir)) dir.create(db_dot_dir, recursive = TRUE)\n",
    "      if (!dir.exists(db_bar_dir)) dir.create(db_bar_dir, recursive = TRUE)\n",
    "\n",
    "      for (cl in clusters) {\n",
    "        cl_df <- df[df$database == db & df$cluster == cl & df$direction == dirc, , drop = FALSE]\n",
    "        # 过滤\n",
    "        cl_df <- cl_df[is.finite(cl_df$generation) & !is.na(cl_df$Description), , drop = FALSE]\n",
    "        if (nrow(cl_df) == 0) next\n",
    "\n",
    "        # Top10\n",
    "        cl_top <- cl_df[order(-cl_df$generation, cl_df$p.adjust), , drop = FALSE]\n",
    "        if (nrow(cl_top) > 10) cl_top <- cl_top[seq_len(10), , drop = FALSE]\n",
    "        cl_top$neglogp <- -log10(cl_top$p.adjust)\n",
    "\n",
    "        # 1. 点图\n",
    "        cl_top$Description <- forcats::fct_reorder(cl_top$Description, cl_top$generation, .desc = FALSE)\n",
    "        p_dot <- ggplot(cl_top, aes(x = generation, y = Description, size = Count, color = neglogp))\n",
    "        if (identical(db, \"GO\")) {\n",
    "          p_dot <- p_dot + geom_point(aes(shape = ont)) +\n",
    "            scale_shape_manual(values = c(BP = 18, CC = 17, MF = 15, Other = 16), na.translate = FALSE)\n",
    "        } else {\n",
    "          p_dot <- p_dot + geom_point()\n",
    "        }\n",
    "        p_dot <- p_dot +\n",
    "          scale_color_gradientn(colors = params$continuous_colors) +\n",
    "          labs(title = paste0(\"ORA dotplot - \", dirc, \" - Cluster \", cl, \" (\", db, \")\"),\n",
    "               x = \"generation\", y = \"Enriched Terms\",\n",
    "               size = \"Gene Count\", color = \"-log10(Adj.P)\") +\n",
    "          theme_bw() + theme(axis.text.y = element_text(size = 8))\n",
    "        ggsave(file.path(db_dot_dir, paste0(\"cluster_\", cl, \".pdf\")),\n",
    "               p_dot, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "\n",
    "        # 2. 条形图\n",
    "        p_bar <- ggplot(cl_top, aes(\n",
    "          x = generation,\n",
    "          y = forcats::fct_reorder(Description, generation, .desc = FALSE),\n",
    "          fill = neglogp\n",
    "        )) +\n",
    "          geom_col() +\n",
    "          scale_fill_gradientn(colors = params$continuous_colors) +\n",
    "          labs(\n",
    "            title = paste0(\"ORA Bar - \", dirc, \" - Cluster \", cl, \" (\", db, \")\"),\n",
    "            x = \"generation\", y = \"Enriched Terms\", fill = \"-log10(Adj.P)\"\n",
    "          ) +\n",
    "          theme_bw() + theme(axis.text.y = element_text(size = 8))\n",
    "        ggsave(file.path(db_bar_dir, paste0(\"cluster_\", cl, \".pdf\")),\n",
    "               p_bar, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "  cat(\"\\nORA plots saved (split by Up/Down).\\n\")\n",
    "  invisible(NULL)\n",
    "}\n",
    "\n",
    "# 函数 5：创建 GSEA 可视化图\n",
    "create_gsea_plots <- function(gsea_results, params) {\n",
    "  if (is.null(gsea_results)) {\n",
    "    cat(\"No GSEA results to visualize.\\n\")\n",
    "    return(invisible(NULL))\n",
    "  }\n",
    "  df <- gsea_results$results\n",
    "  gsea_objects <- gsea_results$gsea_objects\n",
    "  if (is.null(df) || nrow(df) == 0) {\n",
    "    cat(\"Empty GSEA results.\\n\")\n",
    "    return(invisible(NULL))\n",
    "  }\n",
    "\n",
    "  df$neglogp <- -log10(df$p.adjust)\n",
    "\n",
    "  df$ont <- NA_character_\n",
    "  if (\"ONTOLOGY\" %in% names(df)) df$ont <- as.character(df$ONTOLOGY)\n",
    "  df$ont[grepl(\"BP\", df$ont, ignore.case = TRUE)] <- \"BP\"\n",
    "  df$ont[grepl(\"CC\", df$ont, ignore.case = TRUE)] <- \"CC\"\n",
    "  df$ont[grepl(\"MF\", df$ont, ignore.case = TRUE)] <- \"MF\"\n",
    "  df$ont[is.na(df$ont) | df$ont == \"\"] <- \"Other\"\n",
    "\n",
    "  clusters  <- sort(unique(df$cluster))\n",
    "  databases <- sort(unique(df$database))\n",
    "\n",
    "  base_dir <- file.path(params$output_path, \"GSEA\")\n",
    "  if (!dir.exists(base_dir)) dir.create(base_dir, recursive = TRUE)\n",
    "\n",
    "  for (db in databases) {\n",
    "    db_dot_dir   <- file.path(base_dir, db, \"dotplot\")\n",
    "    db_bar_dir   <- file.path(base_dir, db, \"bar\")\n",
    "    db_curve_dir <- file.path(base_dir, db, \"curve\")\n",
    "    if (!dir.exists(db_dot_dir))   dir.create(db_dot_dir, recursive = TRUE)\n",
    "    if (!dir.exists(db_bar_dir))   dir.create(db_bar_dir, recursive = TRUE)\n",
    "    if (!dir.exists(db_curve_dir)) dir.create(db_curve_dir, recursive = TRUE)\n",
    "\n",
    "    for (cl in clusters) {\n",
    "      cl_df <- df[df$database == db & df$cluster == cl, , drop = FALSE]\n",
    "      cl_df <- cl_df[is.finite(cl_df$NES) & !is.na(cl_df$Description), , drop = FALSE]\n",
    "      if (nrow(cl_df) == 0) next\n",
    "\n",
    "      # 按 |NES| 降序，然后按 p.adjust 升序的前 10 个\n",
    "      cl_top <- cl_df[order(-(cl_df$NES), cl_df$p.adjust), , drop = FALSE]\n",
    "      if (nrow(cl_top) > 10) cl_top <- cl_top[seq_len(10), , drop = FALSE]\n",
    "\n",
    "      # 1. 点图\n",
    "      cl_top$Description <- forcats::fct_reorder(cl_top$Description, cl_top$NES, .desc = FALSE)\n",
    "      p_dot <- ggplot(cl_top, aes(x = NES, y = Description, size = setSize, color = neglogp))\n",
    "      if (identical(db, \"GO\")) {\n",
    "        p_dot <- p_dot + geom_point(aes(shape = ont), alpha = 0.85) +\n",
    "          scale_shape_manual(values = c(BP = 16, CC = 17, MF = 15, Other = 18), name = \"GO\")\n",
    "      } else {\n",
    "        p_dot <- p_dot + geom_point(alpha = 0.85)\n",
    "      }\n",
    "      p_dot <- p_dot +\n",
    "        scale_size_continuous(name = \"Set Size\") +\n",
    "        scale_color_gradientn(colors = params$continuous_colors) +\n",
    "        labs(title = paste0(\"GSEA dotplot - Cluster \", cl, \" (\", db, \")\"),\n",
    "             x = \"NES\", y = \"Enriched Terms\", color = \"-log10(Adj.P)\") +\n",
    "        theme_minimal() + theme(axis.text.y = element_text(size = 8))\n",
    "      ggsave(file.path(db_dot_dir, paste0(\"cluster_\", cl, \".pdf\")),\n",
    "             p_dot, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "\n",
    "      # 2. 条形图\n",
    "      p_bar <- ggplot(cl_top, aes(x = NES,\n",
    "                                  y = forcats::fct_reorder(Description, NES, .desc = FALSE),\n",
    "                                  fill = neglogp)) +\n",
    "        geom_col() +\n",
    "        scale_fill_gradientn(colors = params$continuous_colors) +\n",
    "        labs(title = paste0(\"GSEA Bar - Cluster \", cl, \" (\", db, \")\"),\n",
    "             x = \"NES\", y = \"Enriched Terms\", fill = \"-log10(Adj.P)\") +\n",
    "        theme_minimal() + theme(axis.text.y = element_text(size = 8))\n",
    "      ggsave(file.path(db_bar_dir, paste0(\"cluster_\", cl, \".pdf\")),\n",
    "             p_bar, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "\n",
    "      # 3. 曲线图\n",
    "      k_curve <- 5\n",
    "      top_terms <- cl_df[order(cl_df$NES), , drop = FALSE]\n",
    "      top_terms <- top_terms[seq_len(min(k_curve, nrow(top_terms))), , drop = FALSE]\n",
    "\n",
    "      obj_keys <- names(gsea_objects)\n",
    "      if (length(obj_keys) > 0) {\n",
    "        cand <- obj_keys[grepl(paste0(\"^\", db, \"_\"), obj_keys)]\n",
    "        cand <- cand[grepl(paste0(\"_\", cl, \"$\"), cand)]\n",
    "      } else cand <- character(0)\n",
    "\n",
    "      if (length(cand) > 0) {\n",
    "        gobj <- gsea_objects[[cand[1]]]\n",
    "        if (inherits(gobj, \"gseaResult\") && nrow(gobj@result) > 0) {\n",
    "          ids_in_obj <- gobj@result$ID\n",
    "          if (\"ID\" %in% names(top_terms)) {\n",
    "            term_ids <- intersect(top_terms$ID, ids_in_obj)\n",
    "          } else {\n",
    "            term_ids <- gobj@result$ID[match(top_terms$Description, gobj@result$Description)]\n",
    "            term_ids <- term_ids[!is.na(term_ids)]\n",
    "          }\n",
    "          # Draw up to first k_curve curves\n",
    "          if (length(term_ids) > 0) {\n",
    "            # 单独曲线\n",
    "            for (tid in term_ids) {\n",
    "              tid_idx <- which(gobj@result$ID == tid)[1]\n",
    "              if (length(tid_idx) == 1 && !is.na(tid_idx)) {\n",
    "                gp <- enrichplot::gseaplot2(gobj, geneSetID = tid_idx,\n",
    "                                            title = paste0(\"GSEA - \", gobj@result$Description[tid_idx],\n",
    "                                                           \" (Cluster \", cl, \", \", db, \")\"))\n",
    "                safe_name <- gsub(\"[^A-Za-z0-9]+\", \"_\", gobj@result$Description[tid_idx])\n",
    "                ggsave(file.path(db_curve_dir, paste0(\"cluster_\", cl, \"_\", safe_name, \".pdf\")),\n",
    "                       gp, width = 10, height = 8, dpi = params$plot_dpi)\n",
    "              }\n",
    "            }\n",
    "            # 多重曲线 \n",
    "            idx_vec <- which(gobj@result$ID %in% term_ids)\n",
    "            if (length(idx_vec) > 1) {\n",
    "              idx_vec <- idx_vec[seq_len(min(k_curve, length(idx_vec)))]\n",
    "              gp_multi <- enrichplot::gseaplot2(gobj, geneSetID = idx_vec,\n",
    "                                                title = paste0(\"GSEA Top Terms - Cluster \", cl, \" (\", db, \")\"))\n",
    "              ggsave(file.path(db_curve_dir, paste0(\"cluster_\", cl, \"_top_terms.pdf\")),\n",
    "                     gp_multi, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "            }\n",
    "          }\n",
    "        }\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "\n",
    "  cat(\"\\nGSEA plots saved (simple mode) to:\", base_dir, \"\\n\")\n",
    "  invisible(NULL)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行富集分析和可视化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:42:38.766986Z",
     "iopub.status.busy": "2025-09-16T09:42:38.765335Z",
     "iopub.status.idle": "2025-09-16T09:47:29.784006Z",
     "shell.execute_reply": "2025-09-16T09:47:29.781974Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Performing Over-Representation Analysis (ORA)...\n",
      "Performing Over-Representation Analysis (ORA) with local data (split Up/Down)...\n",
      "Analyzing database: GO \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Analyzing database: KEGG \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "Reading KEGG annotation online: \"https://rest.kegg.jp/link/hsa/pathway\"...\n",
      "\n",
      "Reading KEGG annotation online: \"https://rest.kegg.jp/list/pathway/hsa\"...\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Analyzing database: HALLMARK \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Analyzing database: REACTOME \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n",
      "'select()' returned 1:1 mapping between keys and columns\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All columns found: ONTOLOGY, ID, Description, GeneRatio, BgRatio, pvalue, p.adjust, qvalue, geneID, Count, cluster, database, direction, category, subcategory \n",
      "ORA results saved to: /home/demo-seekgene-com/workspace/project/demo-seekgene-com/DiffEnrich/result/ORA/ora_enrichment_results.csv \n",
      "\n",
      "ORA plots saved (split by Up/Down).\n"
     ]
    }
   ],
   "source": [
    "if (params$enrichment_analysis_type == \"ORA\") {\n",
    "  cat(\"Performing Over-Representation Analysis (ORA)...\\n\")\n",
    "  \n",
    "  # 对可用标记物执行 ORA 分析\n",
    "  if (!is.null(clusters_markers)) {\n",
    "    ora_results_clusters <- perform_ora_analysis(markers_data = clusters_markers, db_setup, params)\n",
    "    ora_results_group <- NULL\n",
    "    \n",
    "    # 创建 ORA 可视化\n",
    "    create_ora_plots(ora_results_clusters, params)\n",
    "    \n",
    "  } else if (!is.null(group_markers)) {\n",
    "    ora_results_group <- perform_ora_analysis(markers_data = group_markers, db_setup, params)\n",
    "    ora_results_clusters <- NULL\n",
    "    \n",
    "    # 创建 ORA 可视化\n",
    "    create_ora_plots(ora_results_group, params)\n",
    "    \n",
    "  } else {\n",
    "    ora_results_clusters <- NULL\n",
    "    ora_results_group <- NULL\n",
    "    cat(\"No differential expression results available for ORA analysis.\\n\")\n",
    "  }\n",
    "  \n",
    "  # 由于未执行 GSEA，将 GSEA 结果设置为 NULL\n",
    "  gsea_results_clusters <- NULL\n",
    "  gsea_results_group <- NULL\n",
    "  \n",
    "} else if (params$enrichment_analysis_type == \"GSEA\") {\n",
    "  cat(\"Performing Gene Set Enrichment Analysis (GSEA)...\\n\")\n",
    "  \n",
    "  # 对可用标记物执行 GSEA 分析\n",
    "  if (!is.null(clusters_markers_all)) {\n",
    "    gsea_results_clusters <- perform_gsea_analysis(clusters_markers_all, db_setup, params)\n",
    "    saveRDS(gsea_results_clusters, file.path(params$output_path, \"GSEA\", \"gsea_results.rds\"))\n",
    "    gsea_results_group <- NULL\n",
    "    \n",
    "    # 创建 GSEA 可视化\n",
    "    create_gsea_plots(gsea_results_clusters, params)\n",
    "    \n",
    "  } else if (!is.null(group_markers_all)) {\n",
    "    gsea_results_group <- perform_gsea_analysis(group_markers_all, db_setup, params)\n",
    "    saveRDS(gsea_results_group, file.path(params$output_path, \"GSEA\", \"gsea_results.rds\"))\n",
    "    gsea_results_clusters <- NULL\n",
    "    \n",
    "    # 创建 GSEA 可视化\n",
    "    create_gsea_plots(gsea_results_group, params)\n",
    "    \n",
    "  } else {\n",
    "    gsea_results_clusters <- NULL\n",
    "    gsea_results_group <- NULL\n",
    "    cat(\"No differential expression results available for GSEA analysis.\\n\")\n",
    "  }\n",
    "  \n",
    "  # 由于未执行 ORA，将 ORA 结果设置为 NULL\n",
    "  ora_results_clusters <- NULL\n",
    "  ora_results_group <- NULL\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 可视化自定义差异表达基因和富集通路\n",
    "\n",
    "基于自定义差异表达基因和富集通路的可视化。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "配置自定义基因和通路可视化参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:47:29.788682Z",
     "iopub.status.busy": "2025-09-16T09:47:29.787077Z",
     "iopub.status.idle": "2025-09-16T09:47:29.807349Z",
     "shell.execute_reply": "2025-09-16T09:47:29.805338Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "custom_plot <- FALSE # 启用/禁用自定义基因/通路可视化\n",
    "custom_genes <- c(\"C1QA\",\"C1QB\",\"C1QC\", \"CD3D\", \"CD3E\") # 要在自定义图中突出显示的感兴趣基因（必须存在于 Seurat 对象中）\n",
    "terms <- c(\"regulation of T cell activation\", \n",
    "        \"positive regulation of lymphocyte activation\", \n",
    "        \"positive regulation of leukocyte activation\",\n",
    "        \"leukocyte cell-cell adhesion\",\n",
    "        \"positive regulation of cell activation\") # 要在 ORA/GSEA 结果中绘制的通路术语\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "定义自定义基因和通路可视化函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:47:29.812538Z",
     "iopub.status.busy": "2025-09-16T09:47:29.810912Z",
     "iopub.status.idle": "2025-09-16T09:47:29.847997Z",
     "shell.execute_reply": "2025-09-16T09:47:29.845948Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 函数 1：簇 DE 的自定义基因图\n",
    "plot_custom_genes_clusters <- function(seurat_obj, custom_genes, params, clusters_colors) {\n",
    "  if (length(custom_genes) == 0) {\n",
    "    cat(\"No custom genes provided.\\n\"); return(invisible(NULL))\n",
    "  }\n",
    "  out_dir <- file.path(params$output_path, \"custom_cluster_DE\")\n",
    "  if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)\n",
    "\n",
    "  # 仅保留对象中存在的基因\n",
    "  genes_in_obj <- intersect(unique(custom_genes), rownames(seurat_obj))\n",
    "  if (length(genes_in_obj) == 0) {\n",
    "    cat(\"Custom genes not found in object; skip all plots.\\n\"); return(invisible(NULL))\n",
    "  }\n",
    "\n",
    "  # 1. 点图\n",
    "  cluster_col <- params$cluster_by\n",
    "  if (!(cluster_col %in% colnames(seurat_obj@meta.data))) {\n",
    "    cluster_col <- \"ident\"\n",
    "    seurat_obj@meta.data$ident <- as.character(Idents(seurat_obj))\n",
    "    cat(\"Using Idents as cluster column for custom genes plots\\n\")\n",
    "  } else {\n",
    "    cat(\"Using cluster column for custom genes plots:\", cluster_col, \"\\n\")\n",
    "  }\n",
    "\n",
    "  cat(\"Creating DotPlot for custom genes...\\n\")\n",
    "  p_dot <- DotPlot(seurat_obj, features = genes_in_obj, group.by = cluster_col) +\n",
    "    scale_color_gradientn(colors = params$continuous_colors) +\n",
    "    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +\n",
    "    labs(title = \"Custom Genes - DotPlot (All Clusters)\",\n",
    "         x = \"Genes\", y = \"Clusters\")\n",
    "  ggsave(file.path(out_dir, \"custom_genes_clusters_dotplot.pdf\"),\n",
    "         p_dot, width = 12, height = 6, dpi = params$plot_dpi)\n",
    "\n",
    "  # 2. 小提琴图\n",
    "  cat(\"Creating Violin for custom genes...\\n\")\n",
    "  mat <- FetchData(seurat_obj, vars = genes_in_obj, slot = \"data\")\n",
    "  var_ok <- sapply(as.data.frame(mat), function(x) stats::var(x, na.rm = TRUE) > 0)\n",
    "  genes_var <- genes_in_obj[var_ok]\n",
    "  if (length(genes_var) == 0) {\n",
    "    cat(\"All requested genes have zero variance; skip Violin.\\n\")\n",
    "  } else {\n",
    "    if (length(genes_var) >= 2) {\n",
    "      p_vln <- VlnPlot(seurat_obj, features = genes_var,\n",
    "                       stack = TRUE, flip = TRUE, group.by = cluster_col, cols = clusters_colors, fill.by = \"ident\") +\n",
    "        labs(title = \"Custom Genes - Stacked Violin (All Clusters)\",\n",
    "             x = cluster_col, y = \"Expression\")\n",
    "    } else {\n",
    "      p_vln <- VlnPlot(seurat_obj, features = genes_var,\n",
    "                       stack = FALSE, flip = FALSE, group.by = cluster_col, cols = clusters_colors, fill.by = \"ident\") +\n",
    "        labs(title = \"Custom Gene - Violin (All Clusters)\",\n",
    "             x = cluster_col, y = \"Expression\")\n",
    "    }\n",
    "    ggsave(file.path(out_dir, \"custom_genes_clusters_violin.pdf\"),\n",
    "           p_vln, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "  }\n",
    "\n",
    "  # 3. 热图 \n",
    "  cat(\"Creating Heatmap for custom genes...\\n\")\n",
    "  genes_in_scale <- intersect(genes_in_obj, rownames(GetAssayData(seurat_obj, slot = \"scale.data\")))\n",
    "  if (length(genes_in_scale) < 2) {\n",
    "    cat(\"No or insufficient custom genes in scale.data; skip Heatmap.\\n\")\n",
    "  } else {\n",
    "    heat_obj <- seurat_obj\n",
    "    max_cells <- 1200\n",
    "    if (ncol(heat_obj) > max_cells) {\n",
    "      set.seed(42)\n",
    "      heat_obj <- subset(heat_obj, cells = sample(colnames(heat_obj), max_cells))\n",
    "    }\n",
    "    p_heat <- DoHeatmap(heat_obj, features = genes_in_scale, slot = \"scale.data\",\n",
    "                        draw.lines = TRUE, group.colors = clusters_colors) +\n",
    "      scale_fill_gradientn(colors = params$continuous_colors) +\n",
    "      labs(title = \"Custom Genes - Heatmap (All Clusters)\")\n",
    "    ggsave(file.path(out_dir, \"custom_genes_clusters_heatmap.pdf\"),\n",
    "           p_heat, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "  }\n",
    "  invisible(list(dotplot = p_dot, violin = if (exists(\"p_vln\")) p_vln else NULL))\n",
    "}\n",
    "\n",
    "# 函数 2：分组比较 DE 的自定义基因图\n",
    "plot_custom_genes_group_comparison <- function(seurat_obj, group_markers, custom_genes, params, clusters_colors) {\n",
    "  if (is.null(group_markers) || nrow(group_markers) == 0) {\n",
    "    cat(\"No group comparison markers; skip custom plots.\\n\")\n",
    "    return()\n",
    "  }\n",
    "  if (!requireNamespace(\"ggrepel\", quietly = TRUE)) {\n",
    "    cat(\"Package 'ggrepel' not available; labels will be limited.\\n\")\n",
    "  }\n",
    "\n",
    "  # 确定分组列和簇列\n",
    "  group_col <- params$group_by\n",
    "  \n",
    "  # 通过 params$cluster_by 定位簇列；如果缺失则回退到 Idents\n",
    "  cluster_col <- params$cluster_by\n",
    "  if (!(cluster_col %in% colnames(seurat_obj@meta.data))) {\n",
    "    cluster_col <- \"ident\"\n",
    "    seurat_obj@meta.data$ident <- as.character(Idents(seurat_obj))\n",
    "    cat(\"Using Idents as cluster column for custom genes plots\\n\")\n",
    "  } else {\n",
    "    cat(\"Using cluster column for custom genes plots:\", cluster_col, \"\\n\")\n",
    "  }\n",
    "\n",
    "  # 1. 火山图\n",
    "  for (cl in unique(group_markers$cluster)) {\n",
    "    df <- group_markers[group_markers$cluster == cl, , drop = FALSE]\n",
    "    if (nrow(df) == 0) next\n",
    "    df$highlight <- ifelse(df$gene %in% custom_genes, \"Custom\", \"Other\")\n",
    "\n",
    "    p_vol <- ggplot(df, aes(x = avg_log2FC, y = -log10(p_val_adj))) +\n",
    "      geom_point(aes(color = highlight), alpha = 0.7, size = 1.5) +\n",
    "      scale_color_manual(values = c(\"Custom\" = \"red\", \"Other\" = \"grey70\")) +\n",
    "      geom_hline(yintercept = -log10(0.05), linetype = \"dashed\", color = \"red\") +\n",
    "      geom_vline(xintercept = c(-0.5, 0.5), linetype = \"dashed\", color = \"red\") +\n",
    "      labs(title = paste0(\"Volcano (highlight custom) - Cluster \", cl),\n",
    "           x = \"Average Log2 Fold Change\", y = \"-log10(Adjusted P-value)\", color = \"\") +\n",
    "      theme_bw()\n",
    "\n",
    "    if (requireNamespace(\"ggrepel\", quietly = TRUE)) {\n",
    "      lab_df <- df[df$highlight == \"Custom\" & df$p_val_adj < 0.05 & abs(df$avg_log2FC) > 0.5, ]\n",
    "      if (nrow(lab_df) > 0) {\n",
    "        p_vol <- p_vol + ggrepel::geom_text_repel(data = lab_df,\n",
    "                  aes(label = gene), size = 3, max.overlaps = 20)\n",
    "      }\n",
    "    }\n",
    "    if (!dir.exists(file.path(params$output_path, \"custom_group_DE\"))) dir.create(file.path(params$output_path, \"custom_group_DE\"), recursive = TRUE)\n",
    "    ggsave(file.path(params$output_path, \"custom_group_DE\", paste0(\"custom_genes_cluster_\", cl, \"_volcano.pdf\")),\n",
    "           p_vol, width = 8, height = 6, dpi = params$plot_dpi)\n",
    "  }\n",
    "\n",
    "  # 2. 为每个簇创建分组自定义基因可视化\n",
    "  clusters <- unique(group_markers$cluster)\n",
    "  genes_available <- intersect(custom_genes, rownames(seurat_obj))\n",
    "  \n",
    "  if (length(genes_available) == 0) {\n",
    "    cat(\"No custom genes found in object; skip cluster-specific plots.\\n\")\n",
    "    return()\n",
    "  }\n",
    "\n",
    "  for (cl in clusters) {\n",
    "    cat(\"Creating custom gene plots for cluster\", cl, \"...\\n\")\n",
    "    \n",
    "    # 子集化为特定簇\n",
    "    cluster_subset <- subset(seurat_obj, \n",
    "                            cells = rownames(seurat_obj@meta.data)[\n",
    "                              as.character(seurat_obj@meta.data[[cluster_col]]) == cl\n",
    "                            ])\n",
    "    \n",
    "    if (ncol(cluster_subset) == 0) {\n",
    "      cat(\"No cells found in cluster\", cl, \"; skip.\\n\")\n",
    "      next\n",
    "    }\n",
    "    \n",
    "    # 检查该簇中的可用分组\n",
    "    cluster_groups <- unique(cluster_subset@meta.data[[group_col]])\n",
    "    if (length(cluster_groups) < 2) {\n",
    "      cat(\"Cluster\", cl, \"has insufficient groups; skip.\\n\")\n",
    "      next\n",
    "    }\n",
    "    \n",
    "    # 2.1 点图\n",
    "    p_dot <- DotPlot(cluster_subset, features = genes_available, \n",
    "                     group.by = group_col) +\n",
    "      scale_color_gradientn(colors = params$continuous_colors) +\n",
    "      theme(axis.text.x = element_text(angle = 45, hjust = 1)) +\n",
    "      labs(title = paste(\"Custom Genes - Cluster\", cl),\n",
    "           subtitle = paste(\"Groups:\", paste(cluster_groups, collapse = \" vs \")),\n",
    "           x = \"Genes\", y = \"Groups\") +\n",
    "      theme_bw()\n",
    "    \n",
    "    ggsave(file.path(params$output_path, \"custom_group_DE\", paste0(\"custom_genes_cluster_\", cl, \"_dotplot.pdf\")),\n",
    "           p_dot, width = 12, height = 6, dpi = params$plot_dpi)\n",
    "    \n",
    "    # 2.2 小提琴图\n",
    "    mat <- FetchData(cluster_subset, vars = genes_available, slot = \"data\")\n",
    "    var_ok <- sapply(as.data.frame(mat), function(x) stats::var(x, na.rm = TRUE) > 0)\n",
    "    genes_var <- genes_available[var_ok]\n",
    "    \n",
    "    if (length(genes_var) == 0) {\n",
    "      cat(\"All custom genes have zero variance in cluster\", cl, \"; skip violin.\\n\")\n",
    "    } else {\n",
    "      if (length(cluster_groups) >= 2) {\n",
    "        if (length(genes_var) >= 2) {\n",
    "          p_vln <- VlnPlot(cluster_subset, features = genes_var, \n",
    "                           stack = TRUE, flip = TRUE, \n",
    "                           fill.by = \"ident\",\n",
    "                           group.by = group_col,\n",
    "                           cols = brewer.pal(8, params$discrete_colors)[1:2]) +\n",
    "            labs(title = paste(\"Custom Genes - Cluster\", cl),\n",
    "                 subtitle = paste(\"Groups:\", paste(cluster_groups, collapse = \" vs \")))\n",
    "        } else {\n",
    "          p_vln <- VlnPlot(cluster_subset, features = genes_var, \n",
    "                           stack = FALSE, flip = FALSE, \n",
    "                           fill.by = \"ident\",\n",
    "                           group.by = group_col,\n",
    "                           cols = brewer.pal(8, params$discrete_colors)[1:2]) +\n",
    "            labs(title = paste(\"Custom Gene - Cluster\", cl),\n",
    "                 subtitle = paste(\"Groups:\", paste(cluster_groups, collapse = \" vs \")))\n",
    "        }\n",
    "        \n",
    "        ggsave(file.path(params$output_path, \"custom_group_DE\", paste0(\"custom_genes_cluster_\", cl, \"_violin.pdf\")),\n",
    "               p_vln, width = 6, height = 10, dpi = params$plot_dpi)\n",
    "      } else {\n",
    "        cat(\"Cluster\", cl, \"has only one group; skip violin.\\n\")\n",
    "      }\n",
    "    }\n",
    "    \n",
    "    # 2.3 热图\n",
    "    genes_heat <- intersect(genes_available, rownames(GetAssayData(seurat_obj, slot = \"scale.data\")))\n",
    "    if (length(genes_heat) > 0) {\n",
    "      if (length(genes_heat) > 20) {\n",
    "        genes_heat <- genes_heat[1:20]\n",
    "      }\n",
    "      \n",
    "      if (length(cluster_groups) >= 2) {\n",
    "        p_heat <- DoHeatmap(cluster_subset, features = genes_heat, \n",
    "                            slot = \"scale.data\", \n",
    "                            group.colors = brewer.pal(8, params$discrete_colors),\n",
    "                            group.by = group_col) +\n",
    "          scale_fill_gradientn(colors = params$continuous_colors) +\n",
    "          labs(title = paste(\"Custom Genes - Cluster\", cl),\n",
    "               subtitle = paste(\"Groups:\", paste(cluster_groups, collapse = \" vs \")))\n",
    "      } else {\n",
    "        p_heat <- DoHeatmap(cluster_subset, features = genes_heat, \n",
    "                            slot = \"scale.data\") +\n",
    "          scale_fill_gradientn(colors = params$continuous_colors) +\n",
    "          labs(title = paste(\"Custom Genes - Cluster\", cl),\n",
    "               subtitle = paste(\"Groups:\", paste(cluster_groups, collapse = \" vs \")))\n",
    "      }\n",
    "      \n",
    "      ggsave(file.path(params$output_path, \"custom_group_DE\", paste0(\"custom_genes_cluster_\", cl, \"_heatmap.pdf\")),\n",
    "             p_heat, width = 12, height = 8, dpi = params$plot_dpi)\n",
    "    } else {\n",
    "      cat(\"No custom genes found in scale.data for cluster\", cl, \"; skip heatmap.\\n\")\n",
    "    }\n",
    "  }\n",
    "}\n",
    "\n",
    "# 函数 3：自定义 ORA 图\n",
    "plot_custom_from_ora_csv <- function(\n",
    "  csv_path,\n",
    "  terms,\n",
    "  output_dir,\n",
    "  plot_dpi = 300\n",
    ") {\n",
    "  if (!file.exists(csv_path)) {\n",
    "    cat(\"ORA CSV not found:\", csv_path, \"\\n\"); return(invisible(NULL))\n",
    "  }\n",
    "  if (length(terms) == 0) {\n",
    "    cat(\"No terms provided. Nothing to plot.\\n\"); return(invisible(NULL))\n",
    "  }\n",
    "\n",
    "  df <- suppressMessages(readr::read_csv(csv_path, show_col_types = FALSE))\n",
    "\n",
    "  need_cols <- c(\"Description\", \"p.adjust\", \"Count\", \"cluster\", \"database\")\n",
    "  miss <- setdiff(need_cols, colnames(df))\n",
    "  if (length(miss) > 0) {\n",
    "    cat(\"CSV missing columns:\", paste(miss, collapse = \", \"), \"\\n\")\n",
    "    return(invisible(NULL))\n",
    "  }\n",
    "  if (!\"direction\" %in% colnames(df)) df$direction <- \"All\"\n",
    "\n",
    "  # 术语\n",
    "  sel <- tolower(df$Description) %in% tolower(terms)\n",
    "  sub <- df[sel, , drop = FALSE]\n",
    "  if (nrow(sub) == 0) {\n",
    "    cat(\"No matched pathways in CSV (case-insensitive).\\n\"); return(invisible(NULL))\n",
    "  }\n",
    "\n",
    "  out_dir_base <- file.path(output_dir, \"custom_ORA\")\n",
    "  if (!dir.exists(out_dir_base)) dir.create(out_dir_base, recursive = TRUE)\n",
    "\n",
    "  # 生成和 -log10(padj)\n",
    "  if (!\"generation\" %in% colnames(sub)) {\n",
    "    if (\"GeneRatio\" %in% colnames(sub)) {\n",
    "      gr <- as.character(sub$GeneRatio)\n",
    "      sub$generation <- suppressWarnings(vapply(gr, function(x) {\n",
    "        if (is.na(x) || x == \"\") return(NA_real_)\n",
    "        if (grepl(\"/\", x, fixed = TRUE)) {\n",
    "          p <- strsplit(x, \"/\", fixed = TRUE)[[1]]\n",
    "          as.numeric(p[1]) / as.numeric(p[2])\n",
    "        } else as.numeric(x)\n",
    "      }, numeric(1)))\n",
    "    } else if (\"Count\" %in% colnames(sub)) {\n",
    "      sub$generation <- suppressWarnings(as.numeric(sub$Count))\n",
    "    } else {\n",
    "      sub$generation <- NA_real_\n",
    "    }\n",
    "  }\n",
    "  sub$neglogp <- -log10(sub$p.adjust)\n",
    "\n",
    "  # GO\n",
    "  if (!\"ONTOLOGY\" %in% colnames(sub)) sub$ONTOLOGY <- NA_character_\n",
    "\n",
    "  # 绘图\n",
    "  for (db in sort(unique(sub$database))) {\n",
    "    for (dirc in sort(unique(sub$direction))) {\n",
    "      sub_db_dir <- file.path(out_dir_base, db, dirc)\n",
    "      dir_dot <- file.path(sub_db_dir, \"dotplot\")\n",
    "      dir_bar <- file.path(sub_db_dir, \"bar\")\n",
    "      for (d in c(dir_dot, dir_bar)) if (!dir.exists(d)) dir.create(d, recursive = TRUE)\n",
    "\n",
    "      sub_db_dirc <- sub[sub$database == db & sub$direction == dirc, , drop = FALSE]\n",
    "      if (nrow(sub_db_dirc) == 0) next\n",
    "\n",
    "      for (cl in sort(unique(sub_db_dirc$cluster))) {\n",
    "        sub_cl <- sub_db_dirc[sub_db_dirc$cluster == cl, , drop = FALSE]\n",
    "        if (nrow(sub_cl) == 0) next\n",
    "\n",
    "        # sort\n",
    "        sub_cl <- sub_cl[order(sub_cl$generation, sub_cl$p.adjust), , drop = FALSE]\n",
    "        sub_cl$Description <- forcats::fct_reorder(sub_cl$Description, sub_cl$generation, .desc = FALSE)\n",
    "\n",
    "        # GO\n",
    "        sub_cl$go_type <- \"Other\"\n",
    "        if (\"ONTOLOGY\" %in% colnames(sub_cl)) {\n",
    "          sub_cl$go_type <- toupper(ifelse(is.na(sub_cl$ONTOLOGY), \"Other\", as.character(sub_cl$ONTOLOGY)))\n",
    "          sub_cl$go_type[!sub_cl$go_type %in% c(\"BP\",\"CC\",\"MF\")] <- \"Other\"\n",
    "        }\n",
    "\n",
    "        # 1. 点图\n",
    "        p_dot <- ggplot(sub_cl, aes(x = generation, y = Description, size = Count, color = neglogp)) +\n",
    "          geom_point(aes(shape = go_type)) +\n",
    "          scale_shape_manual(values = c(BP = 16, CC = 17, MF = 15, Other = 19), na.translate = FALSE) +\n",
    "          scale_color_gradientn(colors = params$continuous_colors) +\n",
    "          labs(title = paste0(\"ORA dotplot (Selected Terms) - \", dirc, \" - Cluster \", cl, \" (\", db, \")\"),\n",
    "               x = \"generation\", y = \"Enriched Terms\",\n",
    "               size = \"Gene Count\", color = \"-log10(Adj.P)\", shape = \"GO Type\") +\n",
    "          theme_minimal() + theme(axis.text.y = element_text(size = 8))\n",
    "        ggsave(file.path(dir_dot, paste0(\"cluster_\", cl, \"_\", dirc, \".pdf\")),\n",
    "               p_dot, width = 12, height = 8, dpi = plot_dpi)\n",
    "\n",
    "        # 2. 条形图\n",
    "        p_bar <- ggplot(sub_cl, aes(x = generation,\n",
    "                                    y = forcats::fct_reorder(Description, generation, .desc = FALSE),\n",
    "                                    fill = neglogp)) +\n",
    "          geom_col() +\n",
    "          scale_fill_gradientn(colors = params$continuous_colors) +\n",
    "          labs(title = paste0(\"ORA Bar (Selected Terms) - \", dirc, \" - Cluster \", cl, \" (\", db, \")\"),\n",
    "               x = \"generation\", y = \"Enriched Terms\", fill = \"-log10(Adj.P)\") +\n",
    "          theme_minimal() + theme(axis.text.y = element_text(size = 8))\n",
    "        ggsave(file.path(dir_bar, paste0(\"cluster_\", cl, \"_\", dirc, \".pdf\")),\n",
    "               p_bar, width = 12, height = 8, dpi = plot_dpi)\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "\n",
    "  cat(\"Custom ORA plots saved to:\", out_dir_base, \"\\n\")\n",
    "  invisible(NULL)\n",
    "}\n",
    "\n",
    "# 函数 4：自定义 GSEA 图\n",
    "plot_custom_from_gsea_csv <- function(csv_path,\n",
    "                                      terms,      \n",
    "                                      output_dir,         \n",
    "                                      plot_dpi = 300,\n",
    "                                      discrete_palette = \"Paired\",\n",
    "                                      gsea_rds_path\n",
    ") {\n",
    "  if (!file.exists(csv_path)) stop(\"GSEA CSV not found: \", csv_path)\n",
    "  if (!file.exists(gsea_rds_path)) stop(\"GSEA RDS not found: \", gsea_rds_path)\n",
    "\n",
    "  # 读取 GSEA 对象\n",
    "  gsea_res <- readRDS(gsea_rds_path)\n",
    "  gsea_objects <- gsea_res$gsea_objects\n",
    "  if (is.null(gsea_objects) || !length(gsea_objects)) {\n",
    "    stop(\"gsea_objects not found inside RDS. Ensure you saved the full result of perform_gsea_analysis().\")\n",
    "  }\n",
    "\n",
    "  df <- suppressMessages(readr::read_csv(csv_path, show_col_types = FALSE))\n",
    "  need_cols <- c(\"ID\",\"Description\",\"setSize\",\"NES\",\"p.adjust\",\"cluster\",\"database\")\n",
    "  miss <- setdiff(need_cols, colnames(df))\n",
    "  if (length(miss) > 0) stop(\"CSV missing required columns: \", paste(miss, collapse=\", \"))\n",
    "\n",
    "  if (!dir.exists(output_dir)) dir.create(output_dir, recursive = TRUE)\n",
    "\n",
    "  if (length(terms) == 0) stop(\"No terms provided.\")\n",
    "  desc_norm  <- tolower(df$Description)\n",
    "  terms_norm <- tolower(terms)\n",
    "  sub <- df[desc_norm %in% terms_norm, , drop = FALSE]\n",
    "  if (nrow(sub) == 0) stop(\"No matched pathways in CSV for given terms.\")\n",
    "\n",
    "  # GO 类型\n",
    "  if (\"ONTOLOGY\" %in% colnames(sub)) {\n",
    "    sub$go_type <- toupper(as.character(sub$ONTOLOGY))\n",
    "    sub$go_type[!sub$go_type %in% c(\"BP\",\"CC\",\"MF\")] <- \"Other\"\n",
    "  } else {\n",
    "    sub$go_type <- \"Other\"\n",
    "  }\n",
    "  sub$go_type <- factor(sub$go_type, levels = c(\"BP\",\"CC\",\"MF\",\"Other\"))\n",
    "\n",
    "  base_dir <- file.path(output_dir, \"custom_GSEA\")\n",
    "  if (!dir.exists(base_dir)) dir.create(base_dir, recursive = TRUE)\n",
    "\n",
    "  for (db in unique(sub$database)) {\n",
    "    db_dotplot_dir <- file.path(base_dir, db, \"dotplot\")\n",
    "    db_bar_dir    <- file.path(base_dir, db, \"bar\")\n",
    "    db_curve_dir  <- file.path(base_dir, db, \"curve\")\n",
    "    for (d in c(db_dotplot_dir, db_bar_dir, db_curve_dir)) if (!dir.exists(d)) dir.create(d, recursive = TRUE)\n",
    "\n",
    "    for (cl in unique(sub$cluster)) {\n",
    "      cl_df <- sub[sub$cluster == cl & sub$database == db, , drop = FALSE]\n",
    "      if (nrow(cl_df) == 0) next\n",
    "\n",
    "      # 1. barplot\n",
    "      p_bar <- ggplot(cl_df,\n",
    "                      aes(x = reorder(Description, NES),\n",
    "                          y = NES,\n",
    "                          fill = -log10(p.adjust))) +\n",
    "        geom_bar(stat = \"identity\") +\n",
    "        coord_flip() +\n",
    "        scale_fill_gradientn(colors = params$continuous_colors, name = \"-log10(Adj.P)\") +\n",
    "        labs(title = paste0(\"GSEA Bar (Selected Terms) - Cluster \", cl, \" (\", db, \")\"),\n",
    "             x = \"Enriched Terms\", y = \"NES\") +\n",
    "        theme_minimal() + theme(axis.text.y = element_text(size = 8))\n",
    "      ggsave(file.path(db_bar_dir, paste0(\"cluster_\", cl, \"_custom_bar.pdf\")),\n",
    "             p_bar, width = 12, height = 8, dpi = plot_dpi)\n",
    "\n",
    "      # 2. dotplot\n",
    "      p_dot <- ggplot(cl_df,\n",
    "                      aes(x = NES, y = reorder(Description, NES),\n",
    "                          size = setSize, color = -log10(p.adjust),\n",
    "                          shape = go_type)) +\n",
    "        scale_color_gradientn(colors = params$continuous_colors, name = \"-log10(Adj.P)\") +\n",
    "        scale_size_continuous(name = \"Set Size\", range = c(2, 8)) +\n",
    "        scale_shape_manual(values = c(\"BP\"=18, \"CC\"=17, \"MF\"=15, \"Other\"=16), name = \"ont\") +\n",
    "        geom_point(alpha = 0.8) +\n",
    "        labs(title = paste0(\"GSEA DotPlot (Selected Terms) - Cluster \", cl, \" (\", db, \")\"),\n",
    "             x = \"NES\", y = \"Enriched Terms\") +\n",
    "        theme_minimal() + theme(axis.text.y = element_text(size = 8)) +\n",
    "        geom_vline(xintercept = 0, linetype = \"dashed\", color = \"grey50\")\n",
    "      ggsave(file.path(db_dotplot_dir, paste0(\"cluster_\", cl, \"_custom_dotplot.pdf\")),\n",
    "             p_dot, width = 12, height = 8, dpi = plot_dpi)\n",
    "\n",
    "      # 3. curve\n",
    "      keys <- names(gsea_objects)\n",
    "      keys <- keys[grepl(paste0(\"^\", db, \"_\"), keys)]\n",
    "      keys <- keys[grepl(paste0(\"_\", cl, \"$\"), keys)]\n",
    "      if (length(keys) == 0) {\n",
    "        warning(\"No gseaResult object found for \", db, \" cluster=\", cl, \". Skip curves.\"); next\n",
    "      }\n",
    "      gobj <- gsea_objects[[keys[1]]]\n",
    "      if (!(inherits(gobj, \"gseaResult\") && nrow(gobj) > 0)) {\n",
    "        warning(\"gsea_object is not a valid gseaResult or empty for \", db, \" cluster=\", cl); next\n",
    "      }\n",
    "\n",
    "      for (i in seq_len(nrow(cl_df))) {\n",
    "        term_id   <- cl_df$ID[i]\n",
    "        term_desc <- cl_df$Description[i]\n",
    "        clean_nm  <- gsub(\"[^A-Za-z0-9]\", \"_\", term_desc)\n",
    "        idx <- which(gobj@result$ID == term_id)\n",
    "        if (length(idx) == 0) next\n",
    "        gp <- gseaplot2(gobj, geneSetID = idx,\n",
    "                        title = paste0(\"GSEA - \", term_desc, \" (Cluster \", cl, \")\"))\n",
    "        ggsave(file.path(db_curve_dir, paste0(\"cluster_\", cl, \"_\", clean_nm, \".pdf\")),\n",
    "               gp, width = 10, height = 8, dpi = plot_dpi)\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "  cat(\"Custom GSEA plots (bar, dot, curves) saved to:\", base_dir, \"\\n\")\n",
    "  invisible(TRUE)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行自定义基因和通路可视化 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2025-09-16T09:47:29.852665Z",
     "iopub.status.busy": "2025-09-16T09:47:29.851043Z",
     "iopub.status.idle": "2025-09-16T09:47:29.869582Z",
     "shell.execute_reply": "2025-09-16T09:47:29.867563Z"
    },
    "vscode": {
     "languageId": "r"
    }
   },
   "outputs": [],
   "source": [
    "# 自定义绘图\n",
    "if (custom_plot) {\n",
    "  if (params$differential_analysis_type == \"clusters\") {\n",
    "      # 跨簇可视化自定义基因\n",
    "      cat(\"Performing ALL CLUSTERS differential expression analysis...\\n\")\n",
    "      plot_custom_genes_clusters(seurat_obj, custom_genes, params, clusters_colors)\n",
    "  } else if (params$differential_analysis_type == \"group_comparison\") {\n",
    "      # 在簇内按分组拆分可视化自定义基因\n",
    "      cat(\"Performing GROUP COMPARISON differential expression analysis...\\n\")\n",
    "      plot_custom_genes_group_comparison(seurat_obj, group_markers, custom_genes, params, clusters_colors)\n",
    "  }\n",
    "\n",
    "  if (params$enrichment_analysis_type == \"ORA\") {\n",
    "    # 绘制来自 ORA CSV 的选定通路\n",
    "    file_path <- file.path(params$output_path, \"ORA\", \"ora_enrichment_results.csv\")\n",
    "    plot_custom_from_ora_csv(\n",
    "      csv_path       = file_path,\n",
    "      terms          = terms,\n",
    "      output_dir     = params$output_path,\n",
    "      plot_dpi       = params$plot_dpi\n",
    "    )\n",
    "  } else if (params$enrichment_analysis_type == \"GSEA\") {\n",
    "    # 绘制来自 GSEA CSV 的选定通路\n",
    "    file_path <- file.path(params$output_path, \"GSEA\", \"gsea_enrichment_results.csv\")\n",
    "    plot_custom_from_gsea_csv(\n",
    "      csv_path       = file_path,\n",
    "      terms          = terms,\n",
    "      output_dir     = params$output_path,\n",
    "      plot_dpi       = params$plot_dpi,\n",
    "      discrete_palette = params$discrete_colors,\n",
    "      gsea_rds_path = file.path(params$output_path, \"GSEA\", \"gsea_results.rds\")\n",
    "    )\n",
    "  }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模版信息\n",
    "\n",
    "**模版发布日期**：2025年9月16日<br>\n",
    "**最后更新**：2025年9月16日  \n",
    "\n",
    "**联系方式**：有关本教程的问题、问题或建议，请通过 **“智能咨询”** 服务联系我们。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 参考资料\n",
    "[1] **Seurat Package**: https://github.com/satijalab/Seurat\n",
    "\n",
    "[2] **clusterProfiler**: https://github.com/YuLab-SMU/clusterProfiler\n",
    "\n",
    "[3] **Gene Ontology**: http://geneontology.org/\n",
    "\n",
    "[4] **KEGG**: https://www.genome.jp/kegg/\n",
    "\n",
    "[5] **MSigDB**: https://www.gsea-msigdb.org/\n",
    "\n",
    "[6] **Reactome**: https://reactome.org/ ### 本教程引用 Liu X., Wu C., Pan L.,  et al. (2025). SeekSoul Online: A user-friendly bioinformatics platform focused on single-cell multi-omics analysis. *The Innovation Life* **3**:100156. https://doi.org/10.59717/j.xinn-life.2025.100156\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "common_r",
   "language": "R",
   "name": "common_r"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}