Skip to content

单细胞空间转录组: 空间聚类分析(基于 Banksy)

作者: SeekGene
时长: 34 分钟
字数: 7.7k 字
更新: 2026-04-13
阅读: 0 次
空间转录组 分析指南 Nookbooks

模块简介

本模块基于 Banksy 算法,用于对空间转录组数据进行空间聚类区域分割 (Domain Segmentation)

Banksy 是一种利用空间邻域信息增强聚类效果的算法,它结合了基因表达特征和空间微环境特征,实现以下核心分析目标:

  • 空间区域识别:根据基因表达模式和空间邻域特征,将组织划分为具有生物学意义的空间区域(如肿瘤区、基质区、免疫浸润区)。
  • 去噪与增强:利用邻域信息平滑基因表达,减少技术噪声。
  • 可视化:直观展示不同空间区域的分布。

综合分析:从“细胞类型”到“生态位功能”

Banksy 通过基因表达矩阵和空间位置坐标信息,构建基因-细胞表达矩阵与邻域细胞-表达矩阵进行空间聚类,空间聚类结果(Cluster)代表着细胞处在相似的微环境。

通过 Banksy 空间聚类,我们可以探讨以下生物学问题:

  • 谁在哪里?(空间域 + 细胞类型) 哪些细胞类型富集在哪个特定的空间域中?比如,“增殖活跃的肿瘤细胞”是否特异性地聚集在“缺氧核心域”,而“耗竭状态的 T 细胞”是否被限制在“肿瘤浸润边缘域”?这种空间分布特征有助于揭示细胞微环境的异质性。

  • 它们在那儿做什么?(空间特异性表达 + 细胞状态) 在特定空间域中,细胞的基因表达模式发生了怎样的改变?该区域富集了哪些关键的功能通路?例如,位于“缺氧核心域”的肿瘤细胞亚群可能特异性地上调了糖酵解途径相关基因,而远离该区域的肿瘤细胞则不然。

  • 它们如何互动?(细胞间通讯 + 空间邻近性) 空间上相邻的细胞群之间是否存在活跃的信号交流?一个空间域(如成纤维细胞富集区)中表达的配体,是否与紧邻的另一个空间域(如免疫细胞浸润区)中表达的受体相匹配?通过结合空间位置信息,可以构建出更具生物学意义的细胞间通讯网络图。

  • 疾病演进与异质性?(空间演化 + 临床病理) 空间聚类结果是否与病理学家标注的组织学结构(如正常组织、癌前病变、浸润癌)相吻合?不同样本(如治疗前 vs 治疗后,或不同患者)之间空间生态位的组成和分布是否存在显著差异?这种差异可能为疾病的进展机制或治疗响应提供重要线索。

💡 Note
本流程输入为 Seurat 对象(.rds 文件),分析结果将保存为 SpatialExperiment 对象及相关图片/表格。

输入文件准备

文件结构示例

text
./
├── input.rds              # 必需:输入的 Seurat 对象(需包含 RNA assay 和 spatial 空间坐标信息)
└── meta.txt               # 可选:外部的 Metadata 文本文件

输入文件详细说明

  1. Seurat 对象 (input.rds)

    • 表达矩阵:需包含经过初步质控的细胞/空间点基因表达数据。
    • 空间坐标信息:必须包含空间物理坐标(通常存储在 reductions$spatial),Banksy 算法需要依赖此坐标来计算空间邻域特征。
    • 样本标识:对象内部的 metadata 中应包含明确区分不同切片或样本的列(如 Sample),这在多样本整合模式下尤为关键。
  2. 外部 Metadata 文件 (meta.txt)

    • 若 Seurat 对象中缺失部分重要的元数据,或需要更新样本分组、临床信息等,可通过外部制表符分隔的文本文件(如 .txt.tsv)导入。
    • 文件应包含 barcodes(或 barcode)列作为行名,以便与 Seurat 对象中的细胞/空间点准确匹配。

数据加载与预处理

本步骤将 Seurat 对象转换为 Banksy 所需的 SpatialExperiment 对象并完成基础数据处理,包含以下环节:

  1. 加载 Seurat 对象:从 .rds 读入带有空间坐标信息的 Seurat 对象。
  2. 提取坐标并转换格式:提取空间坐标,转换为 SpatialExperiment 格式。
  3. 数据归一化:根据数据集大小选择全基因归一化或高变基因提取。
R
# --- 环境准备 ---
# 加载空间组学分析与 Banksy 相关的核心 R 包。
library(Seurat)
library(Banksy)
library(SummarizedExperiment)
library(SpatialExperiment)
library(scuttle)
library(scater)
library(cowplot)
library(ggplot2)
library(tidyverse)
library(futile.logger)
library(future)
library(parallel)
library(doParallel)
library(dplyr)
library(tibble)
library(data.table)
library(harmony)
library(base64enc, quietly = TRUE)
library(reshape2)
library(scales)
library(patchwork)
output
Attaching SeuratObject

Warning message:
“package ‘SummarizedExperiment’ was built under R version 4.3.2”
Loading required package: MatrixGenerics

Warning message:
“package ‘MatrixGenerics’ was built under R version 4.3.3”
Loading required package: matrixStats

Warning message:
“package ‘matrixStats’ was built under R version 4.3.3”

Attaching package: ‘MatrixGenerics’


The following objects are masked from ‘package:matrixStats’:

colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
colWeightedMeans, colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
rowWeightedSds, rowWeightedVars


Loading required package: GenomicRanges

Warning message:
“package ‘GenomicRanges’ was built under R version 4.3.3”
Loading required package: stats4

Loading required package: BiocGenerics

Warning message:
“package ‘BiocGenerics’ was built under R version 4.3.2”

Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min


Loading required package: S4Vectors

Warning message:
“package ‘S4Vectors’ was built under R version 4.3.3”

Attaching package: ‘S4Vectors’


The following object is masked from ‘package:utils’:

findMatches


The following objects are masked from ‘package:base’:

expand.grid, I, unname


Loading required package: IRanges

Warning message:
“package ‘IRanges’ was built under R version 4.3.3”
Loading required package: GenomeInfoDb

Warning message:
“package ‘GenomeInfoDb’ was built under R version 4.3.2”
Loading required package: Biobase

Warning message:
“package ‘Biobase’ was built under R version 4.3.3”
Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.



Attaching package: ‘Biobase’


The following object is masked from ‘package:MatrixGenerics’:

rowMedians


The following objects are masked from ‘package:matrixStats’:

anyMissing, rowMedians



Attaching package: ‘SummarizedExperiment’


The following object is masked from ‘package:SeuratObject’:

Assays


The following object is masked from ‘package:Seurat’:

Assays


Loading required package: SingleCellExperiment

Loading required package: ggplot2

Warning message:
“package ‘cowplot’ was built under R version 4.3.3”
Warning message:
“package ‘tibble’ was built under R version 4.3.3”
Warning message:
“package ‘tidyr’ was built under R version 4.3.3”
Warning message:
“package ‘dplyr’ was built under R version 4.3.3”
Warning message:
“package ‘stringr’ was built under R version 4.3.3”
Warning message:
“package ‘lubridate’ was built under R version 4.3.3”
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
dplyr 1.1.4
readr 2.1.5

forcats 1.0.0
stringr 1.5.1

lubridate 1.9.4
tibble 3.2.1

purrr 1.0.2
tidyr 1.3.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
lubridate::%within%() masks IRanges::%within%()
dplyr::collapse() masks IRanges::collapse()
dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
dplyr::count() masks matrixStats::count()
dplyr::desc() masks IRanges::desc()
tidyr::expand() masks S4Vectors::expand()
dplyr::filter() masks stats::filter()
dplyr::first() masks S4Vectors::first()
dplyr::lag() masks stats::lag()
ggplot2::Position() masks BiocGenerics::Position(), base::Position()
purrr::reduce() masks GenomicRanges::reduce(), IRanges::reduce()
dplyr::rename() masks S4Vectors::rename()
lubridate::second() masks S4Vectors::second()
lubridate::second<-() masks S4Vectors::second<-()
dplyr::slice() masks IRanges::slice()
lubridate::stamp() masks cowplot::stamp()
ℹ Use the conflicted package () to force all conflicts to become errors
Warning message:
“package ‘futile.logger’ was built under R version 4.3.3”
Warning message:
“package ‘future’ was built under R version 4.3.3”
Warning message:
“package ‘doParallel’ was built under R version 4.3.3”
Loading required package: foreach

Warning message:
“package ‘foreach’ was built under R version 4.3.3”

Attaching package: ‘foreach’


The following objects are masked from ‘package:purrr’:

accumulate, when


Loading required package: iterators

Warning message:
“package ‘iterators’ was built under R version 4.3.3”
Warning message:
“package ‘data.table’ was built under R version 4.3.3”

Attaching package: ‘data.table’


The following objects are masked from ‘package:lubridate’:

hour, isoweek, mday, minute, month, quarter, second, wday, week,
yday, year


The following objects are masked from ‘package:dplyr’:

between, first, last


The following object is masked from ‘package:purrr’:

transpose


The following object is masked from ‘package:SummarizedExperiment’:

shift


The following object is masked from ‘package:GenomicRanges’:

shift


The following object is masked from ‘package:IRanges’:

shift


The following objects are masked from ‘package:S4Vectors’:

first, second


Loading required package: Rcpp

Warning message:
“package ‘Rcpp’ was built under R version 4.3.3”

Attaching package: ‘DT’


The following object is masked from ‘package:SeuratObject’:

JS


The following object is masked from ‘package:Seurat’:

JS


Warning message:
“package ‘reshape2’ was built under R version 4.3.3”

Attaching package: ‘reshape2’


The following objects are masked from ‘package:data.table’:

dcast, melt


The following object is masked from ‘package:tidyr’:

smiths



Attaching package: ‘scales’


The following object is masked from ‘package:purrr’:

discard


The following object is masked from ‘package:readr’:

col_factor
R
# --- 输入参数配置 ---

## rds_path:输入的 Seurat 对象路径
rds_path = "./25020230_nao_CS.rds"

## colName:Seurat 对象 metadata 中样本列名
colName = "Sample"

## sample_name:需要分析的样本名称。
##   - 单样本模式:填入单个样本名即可(例如:"sample1")。
##   - 多样本整合模式:填入多个样本名,用逗号分隔(例如:"sample1,sample2,sample3")。系统会自动触发多样本整合与批次校正。
sample_name = "25020230_nao_CS_expression"

## meta_path:外部 metadata 文件路径
meta_path = "./meta.tsv"

## npcs:PCA 降维时使用的的主成分数量
npcs = "30"

## lambda:空间权重参数,控制空间特征和表达特征的相对贡献,多个值用逗号分隔
lambda = "0.6,0.8"

## algo:聚类算法,可选 "leiden", "louvain", "kmeans", "mclust"
algo = "leiden"

## resolution:聚类分辨率,多个值用逗号分隔(适用于 leiden/louvain)
resolution = "0.4,0.8"

## kmeans.centers:kmeans 聚类数量,多个值用逗号分隔(适用于 kmeans)
kmeans.centers = "5,10,15,20"

## mclust.G:mclust 聚类数量,多个值用逗号分隔(适用于 mclust)
mclust.G = "5,10,15,20"
R
# 参数解析
# 解析并处理传入的各项分析参数,如空间权重 (`lambda`)、聚类分辨率 (`resolution`) 等。
if(class(lambda) == 'character'){
    lambda = as.numeric(strsplit(as.character(lambda),",")[[1]])
}

if(class(resolution) == 'character'){
    resolution = as.numeric(strsplit(as.character(resolution),",")[[1]])
}


if(class(kmeans.centers) == 'character'){
    kmeans.centers = as.numeric(strsplit(as.character(kmeans.centers),",")[[1]])
}


if(class(mclust.G) == 'character'){
    mclust.G = as.numeric(strsplit(as.character(mclust.G),",")[[1]])
}

npcs = as.numeric(npcs)

if (any(resolution == "")) {
  resolution = NULL
}else if (any(kmeans.centers == "")) {
  kmeans.centers = NULL
}else if (any(mclust.G == "")) {
  mclust.G = NULL
}
if(is.null(resolution)){
    rm(resolution)
}else if(is.null(kmeans.centers)){
    rm(kmeans.centers)
}else if(is.null(mclust.G)){
    rm(mclust.G)
}
sample_name = strsplit(sample_name,",")[[1]]

💡 Note
多样本坐标处理注意:在进行多样本整合时,由于各个切片原始的物理坐标往往会发生重叠。因此在提取坐标信息时,本代码会根据样本标识对不同切片的坐标点进行自动平移(加上特定的偏移量),从而将它们在同一张虚拟画布上平铺展开,以防计算空间邻域网络时不同切片的细胞发生错误关联。

R
# --- 数据加载与坐标处理 ---
# 读取 Seurat 对象及元数据,提取细胞空间坐标,并根据样本进行坐标的标准化和平移处理。
obj<-readRDS(rds_path)
meta = read.delim(meta_path)
obj@meta.data = meta
if(colnames(meta)[1]=="barcodes"){
    obj@meta.data = column_to_rownames(obj@meta.data,"barcodes")
} else {
    obj@meta.data = column_to_rownames(obj@meta.data,"barcode")
}

columns_to_keep <- colnames(obj@meta.data)[!grepl("clust_|HarBsy_", colnames(obj@meta.data))]
obj@meta.data = obj@meta.data[,columns_to_keep]


obj_sample = subset(obj,cells = rownames(obj@meta.data[obj@meta.data[[colName]] %in% sample_name,]))

spatialCoords <- obj_sample@reductions$spatial@cell.embeddings
if(length(sample_name)>1){
    
    spatialCoords = as.data.frame(spatialCoords)
    spatialCoords$Sample = obj_sample@meta.data[[colName]]
    spatialCoords$sample_id = as.numeric(factor(spatialCoords$Sample, levels = unique(spatialCoords$Sample)))
    spatialCoords = spatialCoords[,-which(colnames(spatialCoords) %in% "Sample")]
    locs_dt <- data.table(spatialCoords)
    colnames(locs_dt) <- c("sdimx", "sdimy", "group")
    locs_dt[, sdimx := sdimx - min(sdimx), by = group]
    locs_dt$group = as.numeric(locs_dt$group)
    global_max <- max(locs_dt$sdimx) * 1.5
    locs_dt[, sdimx := sdimx + group * global_max]
    locs <- as.matrix(locs_dt[, 1:2])
    rownames(locs) <- rownames(obj_sample@meta.data)
    
} else if(length(sample_name)==1){
    
    colnames(spatialCoords) <- c("sdimx", "sdimy")

}
R
# --- 对象转换 ---
# 将 Seurat 对象转换为 `SpatialExperiment` 对象,以满足 Banksy 算法的输入格式要求。
sce <- Seurat::as.SingleCellExperiment(obj_sample)
sample_id <- as.character(unique(obj_sample$Sample))
sample <- gsub(sample_id,pattern = "_expression",replacement = "")
# 转为SpatialExperiment对象,该对象作为banksy的输入
if(length(sample_name)>1){
    
    spe = SpatialExperiment(
        assays = assays(sce),
        rowData = rowData(sce),
        colData = colData(sce),
        spatialCoords = locs
    )
    
} else if(length(sample_name)==1){
    
    spe = SpatialExperiment(
        assays = assays(sce),
        rowData = rowData(sce),
        colData = colData(sce),
        sample_id = sample_id,
        spatialCoords = spatialCoords,
    )
    
}
R
# --- 数据归一化与高变基因选择 ---
# 根据数据集大小自动选择处理策略:对于小数据集使用全基因进行归一化,对于大数据集提取高变特征基因 (HVGs) 进行后续分析。
if (dim(spe)[2] <= 50000 & dim(spe)[1] <= 40000) {
    # 默认使用全部基因,如果使用全部基因,细胞数必需在50000以内,将数据归一化
    spe <- computeLibraryFactors(spe)
    aname <- "normcounts"
    assay(spe, aname) <- normalizeCounts(spe, log = FALSE)
} else {
    # 使用高变基因
    seu <- as.Seurat(spe, data = NULL)
    seu <- FindVariableFeatures(seu, nfeatures = 2000)
    
    # 标准化
    scale_factor <- median(colSums(assay(spe, "counts")))
    seu <- NormalizeData(seu, scale.factor = scale_factor, normalization.method = "RC")

    # 标准化结果加到SpatialExperiment对象
    aname <- "normcounts"
    assay(spe, aname) <- GetAssayData(seu)
    spe <- spe[VariableFeatures(seu),]
    
}

空间特征计算与聚类核心

Banksy 通过 计算邻域特征降维空间聚类 完成区域识别。该过程包含以下关键步骤:

  1. 特征计算:利用空间邻域信息增强基因表达特征。
  2. 降维:使用 PCA 和 UMAP 在增强特征空间内进行降维。
  3. 批次效应校正:如果存在多样本,利用 Harmony 算法对多个样本间的空间特征进行批次效应校正。
  4. 聚类:基于降维结果执行空间微环境聚类。
R
# --- Banksy 特征计算与降维 ---
# 计算基于空间邻域的 Banksy 增强特征矩阵,并执行 PCA 降维以提取主要空间特征。
k_geom <- c(15, 30)
spe <- Banksy::computeBanksy(spe, assay_name = aname, compute_agf = TRUE, k_geom = k_geom)
  
lambda <- lambda
  
set.seed(1000)
spe <- Banksy::runBanksyPCA(spe, use_agf = TRUE,  lambda = lambda, npcs = npcs)
output
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 10.1 GiB”
Computing neighbors...n
Spatial mode is kNN_median

Parameters: k_geom=15

Done

Computing neighbors...n
Spatial mode is kNN_median

Parameters: k_geom=30

Done

Computing harmonic m = 0

Using 15 neighbors

Done

Computing harmonic m = 1

Using 30 neighbors

Centering

Done

Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 10.1 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 10.1 GiB”

单样本与多样本整合聚类:

本教程同时支持单样本空间聚类和多样本整合空间聚类。系统会根据您在后续配置中输入的样本数量(sample_name 参数)自动判断分析模式:

  • 单样本模式:仅对单个组织切片进行空间特征增强和聚类。
  • 多样本模式:针对多个组织切片,在空间特征计算后,自动引入 Harmony 算法进行批次效应校正,随后再进行联合降维与聚类,确保在消除技术批次差异的同时保留真实的空间生物学异质性。
R
# --- 批次效应校正 (仅多样本模式) ---
if(length(sample_name)>1){
    for (i in lambda) {
    
        set.seed(1000)
        pca_name <- paste0("PCA_M1_lam", i)
        print(paste0("computeBanksy Harmony: ", pca_name))

        harmony_embedding <- HarmonyMatrix(
            data_mat = reducedDim(spe, pca_name),
            meta_data = colData(spe),
            vars_use = colName,
            do_pca = FALSE,
            max.iter.harmony = 20,
            verbose = FALSE
        )

        banksy_name <- paste0("HarBsy_lam", i)
        reducedDim(spe, banksy_name) <- harmony_embedding
    }
}

降维与聚类单/多样本差异处理:

  • 多样本整合模式:将基于 Harmony 校正后的特征(dimred = 'HarBsy_lam...')进行 UMAP 降维与聚类。
  • 单样本模式:直接使用未校正的原始 Banksy 空间特征进行后续降维与聚类。
R
# --- 降维与空间聚类 ---
if(length(sample_name)>1){
    print("computeBanksy UMAP")
    #spe <- Banksy::runBanksyUMAP(spe, use_agf = TRUE, lambda = lambda, npcs = npcs)
    for (i in lambda) {
        banksy_name <- paste0("HarBsy_lam", i)
        spe <- Banksy::runBanksyUMAP(spe, dimred = banksy_name)
    }


    for (i in lambda) {
        banksy_name <- paste0("HarBsy_lam", i)
        if(algo == "leiden"|algo == "louvain"){
            spe <- Banksy::clusterBanksy(spe, use_agf = TRUE, lambda = lambda, algo = 'leiden', npcs = npcs, resolution = resolution,dimred = banksy_name)
        }else if(algo == "kmeans"){
            spe <- Banksy::clusterBanksy(spe, use_agf = TRUE, lambda = lambda,  algo = "kmeans",npcs = npcs, kmeans.centers = kmeans.centers,dimred = banksy_name)
        }else if(algo == "mclust"){
            spe <- Banksy::clusterBanksy(spe, use_agf = TRUE, lambda = lambda,  algo = "mclust",npcs = npcs, mclust.G = mclust.G,dimred = banksy_name)
        }
    }

    spe <- Banksy::connectClusters(spe)
} else if(length(sample_name)==1){
    spe <- Banksy::runBanksyUMAP(spe, use_agf = TRUE, lambda = lambda, npcs = npcs)
    
    if(algo == "leiden"|algo == "louvain"){
        spe <- Banksy::clusterBanksy(spe, use_agf = TRUE, lambda = lambda, algo = 'leiden', npcs = npcs, resolution = resolution)
    }else if(algo == "kmeans"){
        spe <- Banksy::clusterBanksy(spe, use_agf = TRUE, lambda = lambda,  algo = "kmeans",npcs = npcs, kmeans.centers = kmeans.centers)
    }else if(algo == "mclust"){
        spe <- Banksy::clusterBanksy(spe, use_agf = TRUE, lambda = lambda,  algo = "mclust",npcs = npcs, mclust.G = mclust.G)
    }

    spe <- Banksy::connectClusters(spe)
}
output
clust_M1_lam0.6_k50_res0.4 --> clust_M1_lam0.8_k50_res0.4

clust_M1_lam0.8_k50_res0.8 --> clust_M1_lam0.6_k50_res0.4

clust_M1_lam0.6_k50_res0.8 --> clust_M1_lam0.8_k50_res0.8
R
colData(spe) <- cbind(colData(spe), spatialCoords(spe))
data = as.data.frame(colData(spe))

color_palette_24 = c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00", 
                  "#FFFF33", "#A65628", "#F781BF", "#999999", "#66C2A5",
                  "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F",
                  "#E5C494", "#B3B3B3", "#6A3D9A", "#FF4500", "#2E8B57",
                  "#9467BD", "#8C564B", "#E377C2", "#7F7F7F", "#BCBD22",
                  "#17BECF", "#A6761D", "#1B9E77", "#D95F02", "#7570B3",
                  "#E7298A", "#66A61E", "#E6AB02", "#A6761D", "#666666",
                  "#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99",
                  "#E31A1C", "#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A",
                  "#FFFF99", "#B15928", "#BDC3C7", "#2C3E50", "#E67E22")

cnames <- grep("^clust", colnames(colData(spe)), value = TRUE)

空间聚类可视化

图注说明:下图展示组织切片的空间聚类图。

  • 坐标轴:代表组织切片的物理空间坐标,映射细胞在真实组织中的位置。
  • 颜色标记:不同的颜色区分了不同的空间聚类簇(Cluster),每一个簇可以理解为具有特定分子特征和空间邻近性的组织微环境(或生态位)。
  • 图名参数说明
    • lam:表示空间权重参数(λ),决定了细胞自身表达与周围微环境特征的混合比例。λ 值越大,聚类结果越倾向于考虑空间邻域信息,使得物理位置相近的细胞更容易被聚为一类,形成更平滑连续的空间域;λ 值越小,则越依赖细胞自身的转录组特征。
    • 分群参数:根据所选算法不同,res 后的数值表示 leiden/louvain 算法的聚类分辨率;kmeans / mclust 后的数值表示预设的分群数量。

💡 分析建议与优化: 空间分群的精细程度受空间权重(λ)与分群参数(res/kmeans/mclust)共同控制。若当前分群结果过于细碎,可在后续的云平台交互界面中,结合解剖学结构手动合并相邻或相似的区域。

R
options(repr.plot.height=8, repr.plot.width=16)

for(i in 1:length(sample_name)){
    a = data[data$Sample %in% sample_name[i],]
    for(j in 1:length(cnames)){
        # 动态计算图例列数
        num_clusters <- length(unique(a[[cnames[j]]]))  # 获取当前cluster变量的类别数
        if(num_clusters <= 10){
            ncol_legend <- 1
        } else if(num_clusters <= 20){
            ncol_legend <- 2
        } else if(num_clusters <= 30){
            ncol_legend <- 3
        } else {
            ncol_legend <- 4  # 超过30个时使用4列
        }
        
        plots = ggplot(a, aes(sdimx, sdimy))+
                        geom_point(aes_string(color=cnames[j], fill = cnames[j]), size = 1.2)+
                        coord_equal()+
                        theme_classic()+
                        theme(plot.background = element_rect(fill = "black", color = NA),
                              panel.background = element_rect(fill = "black", color = NA),
                              legend.background = element_rect(fill = "black", color = NA),
                              legend.key = element_rect(fill = "black", color = NA),
                              text = element_text(color = "white", size = 16),
                              axis.text = element_text(color = "white", size = 14),
                              axis.title = element_text(color = "white", size = 18),
                              axis.line = element_line(color = "white"),
                              axis.ticks = element_line(color = "white"),
                              legend.title = element_text(color = "white", size = 16),
                              legend.text = element_text(color = "white", size = 14),
                              legend.position = "right",
                              legend.spacing = unit(0.5, "cm"),
                              legend.key.size = unit(1, "cm")) +
                        scale_color_manual(values = color_palette_24)+
                        xlab("spatial_1")+
                        ylab("spatial_2")+
                        guides(color = guide_legend(override.aes = list(size = 5), 
                                                   ncol = ncol_legend))  # 使用动态计算的列数
        print(plots)
    }
}

图注说明:下图展示细胞在降维空间中的 UMAP 图。

  • 横/纵坐标:UMAP 降维后的二维坐标(UMAP_1 与 UMAP_2),保留高维特征空间的拓扑结构。
  • 颜色:空间聚类结果(Cluster)。
R
# 设置图形参数
options(repr.plot.height=16, repr.plot.width=16)

# 初始化列表存储所有图形
plot_list <- list()
plot_counter <- 1

# 根据样本数量执行不同的处理
if(length(sample_name) > 1){
    # 多样本情况
    if(algo == "leiden" | algo == "louvain"){
        for(i in 1:length(lambda)){
            # 获取UMAP降维数据
            b = reducedDim(spe, paste0("UMAP_HarBsy_lam", lambda[i]))
            b = as.data.frame(b)
            colnames(b) = c(paste0("UMAP_HarBsy_lam", lambda[i], "_1"), 
                           paste0("UMAP_HarBsy_lam", lambda[i], "_2"))
            data = cbind(data, b)
            
            for(j in 1:length(resolution)){
                # 获取聚类列名
                cluster_col <- paste0("clust_HarBsy_lam", lambda[i], "_k50_res", resolution[j])
                
                # 计算聚类数量并设置图例列数
                n_clusters <- length(unique(data[[cluster_col]]))
                ncol_legend <- case_when(
                    n_clusters <= 8 ~ 1,
                    n_clusters <= 16 ~ 2,
                    n_clusters <= 24 ~ 3,
                    n_clusters <= 32 ~ 4,
                    n_clusters <= 40 ~ 5,
                    TRUE ~ 6
                )
                
                # 创建图形
                plots = ggplot(data, aes_string(x = paste0("UMAP_HarBsy_lam", lambda[i], "_1"), 
                                                y = paste0("UMAP_HarBsy_lam", lambda[i], "_2"))) +
                    geom_point(aes_string(color = cluster_col, fill = cluster_col), size = 0.5) +
                    coord_equal() +
                    theme_classic() +
                    theme(legend.position = "right",
                          legend.box = "vertical",
                          legend.title = element_blank(),
                          legend.spacing.x = unit(0.2, "cm"),
                          plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
                          plot.title = element_text(size = 12, hjust = 0.5)) +
                    scale_color_manual(values = color_palette_24) +
                    xlab("UMAP_1") +
                    ylab("UMAP_2") +
                    ggtitle(paste0("lambda=", lambda[i], ", res=", resolution[j])) +
                    guides(color = guide_legend(override.aes = list(size = 5), ncol = ncol_legend), 
                           fill = guide_legend(override.aes = list(size = 5), ncol = ncol_legend))
                
                # 存储图形
                plot_list[[plot_counter]] <- plots
                plot_counter <- plot_counter + 1
            }
        }
    } else if(algo == "kmeans"){
        for(i in 1:length(lambda)){
            b = reducedDim(spe, paste0("UMAP_HarBsy_lam", lambda[i]))
            b = as.data.frame(b)
            colnames(b) = c(paste0("UMAP_HarBsy_lam", lambda[i], "_1"), 
                           paste0("UMAP_HarBsy_lam", lambda[i], "_2"))
            data = cbind(data, b)
            
            for(j in 1:length(kmeans.centers)){
                cluster_col <- paste0("clust_HarBsy_lam", lambda[i], "_k50_kmeans", kmeans.centers[j])
                n_clusters <- length(unique(data[[cluster_col]]))
                ncol_legend <- case_when(
                    n_clusters <= 8 ~ 1,
                    n_clusters <= 16 ~ 2,
                    n_clusters <= 24 ~ 3,
                    n_clusters <= 32 ~ 4,
                    n_clusters <= 40 ~ 5,
                    TRUE ~ 6
                )
                
                plots = ggplot(data, aes_string(x = paste0("UMAP_HarBsy_lam", lambda[i], "_1"), 
                                                y = paste0("UMAP_HarBsy_lam", lambda[i], "_2"))) +
                    geom_point(aes_string(color = cluster_col, fill = cluster_col), size = 0.5) +
                    coord_equal() +
                    theme_classic() +
                    theme(legend.position = "right",
                          legend.box = "vertical",
                          legend.title = element_blank(),
                          legend.spacing.x = unit(0.2, "cm"),
                          plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
                          plot.title = element_text(size = 12, hjust = 0.5)) +
                    scale_color_manual(values = color_palette_24) +
                    xlab("UMAP_1") +
                    ylab("UMAP_2") +
                    ggtitle(paste0("lambda=", lambda[i], ", kmeans=", kmeans.centers[j])) +
                    guides(color = guide_legend(override.aes = list(size = 5), ncol = ncol_legend), 
                           fill = guide_legend(override.aes = list(size = 5), ncol = ncol_legend))
                
                plot_list[[plot_counter]] <- plots
                plot_counter <- plot_counter + 1
            }
        }
    } else if(algo == "mclust"){
        for(i in 1:length(lambda)){
            b = reducedDim(spe, paste0("UMAP_HarBsy_lam", lambda[i]))
            b = as.data.frame(b)
            colnames(b) = c(paste0("UMAP_HarBsy_lam", lambda[i], "_1"), 
                           paste0("UMAP_HarBsy_lam", lambda[i], "_2"))
            data = cbind(data, b)
            
            for(j in 1:length(mclust.G)){
                cluster_col <- paste0("clust_HarBsy_lam", lambda[i], "_k50_mclust", mclust.G[j])
                n_clusters <- length(unique(data[[cluster_col]]))
                ncol_legend <- case_when(
                    n_clusters <= 8 ~ 1,
                    n_clusters <= 16 ~ 2,
                    n_clusters <= 24 ~ 3,
                    n_clusters <= 32 ~ 4,
                    n_clusters <= 40 ~ 5,
                    TRUE ~ 6
                )
                
                plots = ggplot(data, aes_string(x = paste0("UMAP_HarBsy_lam", lambda[i], "_1"), 
                                                y = paste0("UMAP_HarBsy_lam", lambda[i], "_2"))) +
                    geom_point(aes_string(color = cluster_col, fill = cluster_col), size = 0.5) +
                    coord_equal() +
                    theme_classic() +
                    theme(legend.position = "right",
                          legend.box = "vertical",
                          legend.title = element_blank(),
                          legend.spacing.x = unit(0.2, "cm"),
                          plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
                          plot.title = element_text(size = 12, hjust = 0.5)) +
                    scale_color_manual(values = color_palette_24) +
                    xlab("UMAP_1") +
                    ylab("UMAP_2") +
                    ggtitle(paste0("lambda=", lambda[i], ", mclust G=", mclust.G[j])) +
                    guides(color = guide_legend(override.aes = list(size = 5), ncol = ncol_legend), 
                           fill = guide_legend(override.aes = list(size = 5), ncol = ncol_legend))
                
                plot_list[[plot_counter]] <- plots
                plot_counter <- plot_counter + 1
            }
        }
    }
    
} else if(length(sample_name) == 1){
    # 单样本情况
    if(algo == "leiden" | algo == "louvain"){
        for(i in 1:length(lambda)){
            b = reducedDim(spe, paste0("UMAP_M1_lam", lambda[i]))
            b = as.data.frame(b)
            colnames(b) = c(paste0("UMAP_M1_lam", lambda[i], "_1"), 
                           paste0("UMAP_M1_lam", lambda[i], "_2"))
            data = cbind(data, b)
            
            for(j in 1:length(resolution)){
                cluster_col <- paste0("clust_M1_lam", lambda[i], "_k50_res", resolution[j])
                n_clusters <- length(unique(data[[cluster_col]]))
                ncol_legend <- case_when(
                    n_clusters <= 8 ~ 1,
                    n_clusters <= 16 ~ 2,
                    n_clusters <= 24 ~ 3,
                    n_clusters <= 32 ~ 4,
                    n_clusters <= 40 ~ 5,
                    TRUE ~ 6
                )
                
                plots = ggplot(data, aes_string(x = paste0("UMAP_M1_lam", lambda[i], "_1"), 
                                                y = paste0("UMAP_M1_lam", lambda[i], "_2"))) +
                    geom_point(aes_string(color = cluster_col, fill = cluster_col), size = 0.5) +
                    coord_equal() +
                    theme_classic() +
                    theme(legend.position = "right",
                          legend.box = "vertical",
                          legend.title = element_blank(),
                          legend.spacing.y = unit(0.2, "cm"),
                          plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
                          plot.title = element_text(size = 12, hjust = 0.5)) +
                    scale_color_manual(values = color_palette_24) +
                    xlab("UMAP_1") +
                    ylab("UMAP_2") +
                    ggtitle(paste0("lambda=", lambda[i], ", res=", resolution[j])) +
                    guides(color = guide_legend(override.aes = list(size = 5), ncol = ncol_legend))
                
                plot_list[[plot_counter]] <- plots
                plot_counter <- plot_counter + 1
            }
        }
    } else if(algo == "kmeans"){
        for(i in 1:length(lambda)){
            b = reducedDim(spe, paste0("UMAP_M1_lam", lambda[i]))
            b = as.data.frame(b)
            colnames(b) = c(paste0("UMAP_M1_lam", lambda[i], "_1"), 
                           paste0("UMAP_M1_lam", lambda[i], "_2"))
            data = cbind(data, b)
            
            for(j in 1:length(kmeans.centers)){
                cluster_col <- paste0("clust_M1_lam", lambda[i], "_k50_kmeans", kmeans.centers[j])
                n_clusters <- length(unique(data[[cluster_col]]))
                ncol_legend <- case_when(
                    n_clusters <= 8 ~ 1,
                    n_clusters <= 16 ~ 2,
                    n_clusters <= 24 ~ 3,
                    n_clusters <= 32 ~ 4,
                    n_clusters <= 40 ~ 5,
                    TRUE ~ 6
                )
                
                plots = ggplot(data, aes_string(x = paste0("UMAP_M1_lam", lambda[i], "_1"), 
                                                y = paste0("UMAP_M1_lam", lambda[i], "_2"))) +
                    geom_point(aes_string(color = cluster_col, fill = cluster_col), size = 0.5) +
                    coord_equal() +
                    theme_classic() +
                    theme(legend.position = "right",
                          legend.box = "vertical",
                          legend.title = element_blank(),
                          legend.spacing.y = unit(0.2, "cm"),
                          plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
                          plot.title = element_text(size = 12, hjust = 0.5)) +
                    scale_color_manual(values = color_palette_24) +
                    xlab("UMAP_1") +
                    ylab("UMAP_2") +
                    ggtitle(paste0("lambda=", lambda[i], ", kmeans=", kmeans.centers[j])) +
                    guides(color = guide_legend(override.aes = list(size = 5), ncol = ncol_legend))
                
                plot_list[[plot_counter]] <- plots
                plot_counter <- plot_counter + 1
            }
        }
    } else if(algo == "mclust"){
        for(i in 1:length(lambda)){
            b = reducedDim(spe, paste0("UMAP_M1_lam", lambda[i]))
            b = as.data.frame(b)
            colnames(b) = c(paste0("UMAP_M1_lam", lambda[i], "_1"), 
                           paste0("UMAP_M1_lam", lambda[i], "_2"))
            data = cbind(data, b)
            
            for(j in 1:length(mclust.G)){
                cluster_col <- paste0("clust_M1_lam", lambda[i], "_k50_mclust", mclust.G[j])
                n_clusters <- length(unique(data[[cluster_col]]))
                ncol_legend <- case_when(
                    n_clusters <= 8 ~ 1,
                    n_clusters <= 16 ~ 2,
                    n_clusters <= 24 ~ 3,
                    n_clusters <= 32 ~ 4,
                    n_clusters <= 40 ~ 5,
                    TRUE ~ 6
                )
                
                plots = ggplot(data, aes_string(x = paste0("UMAP_M1_lam", lambda[i], "_1"), 
                                                y = paste0("UMAP_M1_lam", lambda[i], "_2"))) +
                    geom_point(aes_string(color = cluster_col, fill = cluster_col), size = 0.5) +
                    coord_equal() +
                    theme_classic() +
                    theme(legend.position = "right",
                          legend.box = "vertical",
                          legend.title = element_blank(),
                          legend.spacing.y = unit(0.2, "cm"),
                          plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
                          plot.title = element_text(size = 12, hjust = 0.5)) +
                    scale_color_manual(values = color_palette_24) +
                    xlab("UMAP_1") +
                    ylab("UMAP_2") +
                    ggtitle(paste0("lambda=", lambda[i], ", mclust G=", mclust.G[j])) +
                    guides(color = guide_legend(override.aes = list(size = 5), ncol = ncol_legend))
                
                plot_list[[plot_counter]] <- plots
                plot_counter <- plot_counter + 1
            }
        }
    }
}

# 组合所有图形
# 计算每行显示的图形数量(可以根据需要调整)
ncol_layout <- 2  # 每行显示2个图形
nrow_layout <- ceiling(length(plot_list) / ncol_layout)

# 使用wrap_plots组合图形
combined_plot <- wrap_plots(plot_list, ncol = ncol_layout, byrow = TRUE) +
    plot_annotation(
        title = paste0("Clustering Results - ", toupper(algo)),
        theme = theme(plot.title = element_text(size = 16, hjust = 0.5, face = "bold"))
    )

# 显示组合后的图形
print(combined_plot)
output
Warning message:
annotation$theme is not a valid theme.
Please use \`theme()\` to construct themes.”

图注说明:下图展示各样本的聚类比例堆叠柱状图。

  • 横坐标:样本名称。
  • 纵坐标:百分比(Percentage),表示各聚类在样本中的细胞占比。
  • 颜色:空间聚类结果(Cluster)。
  • 判读:用于比较不同样本之间空间区域组成的差异。
R
 # 加载必要的包
library(ggplot2)
library(reshape2)
library(dplyr)
library(stringr)
library(scales)
library(patchwork)

# 初始化列表存储所有图形
plot_list <- list()
plot_counter <- 1

# 设置图形参数  
options(repr.plot.height=12, repr.plot.width=16)

for(i in 1:length(cnames)){
    
    # 提取参数
    lam = as.numeric(gsub(".*lam([0-9.]+)_.*", "\\1", cnames[i]))
    if(algo == "leiden" | algo == "louvain"){
        res = as.numeric(gsub(".*res([0-9.]+).*", "\\1", cnames[i]))
        param_label <- paste0("lambda=", lam, ", res=", res)
    } else if(algo == "kmeans"){
        km = as.numeric(gsub(".*kmeans([0-9]+).*", "\\1", cnames[i]))
        param_label <- paste0("lambda=", lam, ", kmeans=", km)
    } else if(algo == "mclust"){
        g = as.numeric(gsub(".*mclust([0-9]+).*", "\\1", cnames[i]))
        param_label <- paste0("lambda=", lam, ", mclust G=", g)
    }
    
    # 从table命令创建基础数据
    tab <- table(data[[colName]], data[[cnames[i]]])
    
    # 转换为数据框
    df <- as.data.frame.matrix(tab)
    df$Sample <- rownames(df)
    
    # 转换为长格式
    df_long <- melt(df, id.vars = "Sample", variable.name = "Cluster", value.name = "Count")
    df_long$Sample = str_replace_all(df_long$Sample, c("^\\d+_" = "", "_expression.*$" = ""))
    
    # 计算百分比并按样本分组
    df_long <- df_long %>%
        group_by(Sample) %>%
        mutate(
            Percentage = Count / sum(Count) * 100,
            Percentage_label = sprintf("%.1f%%", Percentage)
        ) %>%
        filter(Percentage > 0) %>%
        arrange(Sample, desc(Percentage))
    
    # 获取聚类数量并自动设置图例列数
    n_clusters <- length(unique(df_long$Cluster))
    
    ncol_legend <- case_when(
        n_clusters <= 8 ~ 1,
        n_clusters <= 16 ~ 2,
        n_clusters <= 24 ~ 3,
        n_clusters <= 32 ~ 4,
        n_clusters <= 40 ~ 5,
        TRUE ~ 6
    )
    
    # 绘制图形
    p <- ggplot(df_long, aes(x = Sample, y = Percentage, fill = Cluster)) +
        geom_bar(stat = "identity", position = "stack", width = 0.7) +
        scale_fill_manual(values = color_palette_24[1:n_clusters]) +
        labs(
            title = param_label,
            x = "Sample",
            y = "Percentage (%)"
        ) +
        scale_y_continuous(
            labels = percent_format(scale = 1),
            expand = expansion(mult = c(0, 0.05))
        ) +
        theme_minimal() +
        theme(
            # X轴标签
            axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
            axis.title.x = element_text(size = 12, face = "bold"),
            
            # Y轴标签
            axis.text.y = element_text(size = 10),
            axis.title.y = element_text(size = 12, face = "bold"),
            
            # 标题
            plot.title = element_text(size = 12, face = "bold", hjust = 0.5, margin = margin(b = 10)),
            
            # 图例设置
            legend.position = "right",
            legend.title = element_text(size = 11, face = "bold"),
            legend.text = element_text(size = 9),
            legend.key.size = unit(0.8, "cm"),
            legend.spacing.y = unit(0.2, "cm"),
            legend.margin = margin(t = 10, r = 10, b = 10, l = 10),
            
            # 网格线
            panel.grid.major.x = element_blank(),
            panel.grid.minor = element_blank(),
            
            # 背景和边距
            panel.background = element_rect(fill = "white", color = NA),
            plot.background = element_rect(fill = "white", color = NA),
            plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm")
        ) +
        guides(
            fill = guide_legend(
                ncol = ncol_legend,
                byrow = TRUE,
                title = "Cluster",
                title.position = "top",
                title.hjust = 0.5,
                label.position = "right",
                label.hjust = 0
            )
        )
    
    # 存储图形
    plot_list[[plot_counter]] <- p
    plot_counter <- plot_counter + 1
}

# 组合所有图形
# 计算布局(每行显示2-3个图形)
ncol_layout <- ifelse(length(plot_list) >= 6, 3, 2)
nrow_layout <- ceiling(length(plot_list) / ncol_layout)

# 使用wrap_plots组合图形
combined_plot <- wrap_plots(plot_list, ncol = ncol_layout, byrow = TRUE) +
    plot_annotation(
        title = paste0("Cluster Distribution by Sample - ", toupper(algo)),
        subtitle = paste0("Total plots: ", length(plot_list), " parameter combinations"),
        theme = theme(
            plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
            plot.subtitle = element_text(size = 12, hjust = 0.5, color = "gray50")
        )
    )

# 显示组合后的图形
print(combined_plot)
output
Warning message:
annotation$theme is not a valid theme.
Please use \`theme()\` to construct themes.”
0 条评论·0 条回复