Skip to content

scFAST-seq 突变分析:数据合并、UMAP 展示与整合

作者: SeekGene
时长: 15 分钟
字数: 3.2k 字
更新: 2026-02-28
阅读: 0 次
全序列转录组 Notebooks 突变分析

右上角选择 R 环境

R
library(Seurat)
library(dplyr)
library(Matrix)
library(stringr)
library(ggplot2)
options(repr.plot.width = 12, repr.plot.height = 6)
output
The legacy packages maptools, rgdal, and rgeos, underpinning this package
will retire shortly. Please refer to R-spatial evolution reports on
https://r-spatial.org/r/2023/05/15/evolution4.html for details.
This package is now running under evolution status 0

rgeos version: 0.6-3, (SVN revision 696)
GEOS runtime version: 3.11.2-CAPI-1.17.2
Please note that rgeos will be retired during October 2023,
plan transition to sf or terra functions using GEOS at your earliest convenience.
See https://r-spatial.org/r/2023/05/15/evolution4.html for details.
GEOS using OverlayNG
Linking to sp version: 1.6-0
Polygon checking: TRUE


Attaching SeuratObject

Attaching sp


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

filter, lag


The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

将突变数据合并到单样本 rds 里

输入数据

  1. 单样本 rds(可以选择对应分析流程的 rds)
  2. 该样本的两个突变矩阵(all/alt,可联系客户经理申请释放,仅限于全序列数据)

矩阵文件可以在左上角的 upload 处点击上传

需要修改的内容

修改下方的 rds 路径,相对目录为../data/,绝对目录为/home/mambauser/data/

修改下方的矩阵文件名

R
#load rds and matrix
obj = readRDS("data/AY1732591902625/input.rds") # AY1732591902625为流程ID
# AY1732591902625为流程ID
meta = read.table("data/AY1732591902625/meta.tsv", header = T, sep = "\t", check.names = F)
rownames(meta) = meta$barcode
obj = AddMetaData(object = obj,metadata =  meta)

snv_cover_mat = read.delim("PBMC.snp_indel.all_UMI.matrix", header = T, row.names = 1)
snv_mut_mat = read.delim("PBMC.snp_indel.alt_UMI.matrix", header = T, row.names = 1)

查看矩阵

R
snv_cover_mat[1:3, 1:3]
snv_mut_mat[1:3, 1:3]
A data.frame: 3 × 3
AAGTTCGTACTGGTTCTCTGCAGGTACGGAGTAGTAACGACCGACTGCGCA
<int><int><int>
SDF4:chr1-1223263:T>G000
SLC35E2B:chr1-1668373:C>T000
CDK11A:chr1-1709071:C>T000
A data.frame: 3 × 3
AAGTTCGTACTGGTTCTCTGCAGGTACGGAGTAGTAACGACCGACTGCGCA
<int><int><int>
SDF4:chr1-1223263:T>G000
SLC35E2B:chr1-1668373:C>T000
CDK11A:chr1-1709071:C>T000
R
# Function:process cover/mut matrix
## object : Seurat object
## mat : matrix or dgCMatrix or data.frame
## assay_name: specified new assay name for matrix
generate_mat = function(object, mat, assay_name) {
    #make sure consistent of barcodes
    diff_barcode = setdiff(colnames(object), colnames(mat))
    if (length(diff_barcode) == ncol(object)) {
            stop("no common barcodes between matrix and seurat object")  
    }
    if (length(diff_barcode) != 0) {
        diff_matrix = matrix(0,
                         nrow = nrow(mat),
                         ncol = length(diff_barcode),
                         dimnames = list(rownames(mat), diff_barcode))
        mat = cbind(mat, diff_matrix)
    }
    mat = as(as.matrix(mat), 'dgCMatrix')
    #reorder barcode according rds
    mat = mat[, colnames(object)]
    #add new assay into rds
    object[[assay_name]] = CreateAssayObject(mat)
    return(object)
}

运行该命令,分别将突变矩阵和覆盖矩阵加入到 obj 中

R
#cover matrix
obj = generate_mat(object = obj, mat = snv_cover_mat, assay_name = "SNV_all")

#mutation matrix
obj = generate_mat(object = obj, mat = snv_mut_mat, assay_name = "SNV")
output
Warning message:
“Keys should be one or more alphanumeric characters followed by an underscore, setting key from snv_all_ to snvall_”

obj 有三个 assay,添加成功

R
#result
Assays(obj)
  1. 'RNA'
  2. 'SNV_all'
  3. 'SNV'
R
#check value
obj@assays$SNV[1:3,1:3]
output
3 x 3 sparse Matrix of class "dgCMatrix"
AAGTTCGTACTGGTTCT CTGCAGGTACGGAGTAG CCTCTAGATGTAATTCC
SDF4:chr1-1223263:T>G . . .
SLC35E2B:chr1-1668373:C>T . . .
CDK11A:chr1-1709071:C>T . . .

保存 rds

R
saveRDS(obj, file = "add_snv.rds")

展示突变的发生情况

展示单个突变在细胞群上的发生情况

指定 assay 为 SNV,即指定突变矩阵,展示发生该突变的细胞

例如指定突变:SLC35E2B:chr1-1668373:C>T

R
DefaultAssay(obj) = "SNV"
FeaturePlot(obj, features = "SLC35E2B:chr1-1668373:C>T",ncol = 2) +
DimPlot(obj, group.by = "resolution.0.8", label = T)

保存图片

图片文件名为 SLC35E2B_C_to_T_mut.png,保存在当前目录下,可在左侧下载图片

R
p = FeaturePlot(obj, features = "SLC35E2B:chr1-1668373:C>T",ncol = 2) +
DimPlot(obj, group.by = "resolution.0.8", label = T)
ggsave(p, file = "SLC35E2B_C_to_T_mut.png", width = 12, height = 6)

展示多个突变在细胞群上的发生情况

指定 assay 为 SNV,即指定突变矩阵,展示发生该突变的细胞

例如指定前 5 个突变

R
DefaultAssay(obj) = "SNV"
head5_mut = head(rownames(obj), 5)
head5_mut
  1. 'SDF4:chr1-1223263:T>G'
  2. 'SLC35E2B:chr1-1668373:C>T'
  3. 'CDK11A:chr1-1709071:C>T'
  4. 'RPL22:chr1-6197724:C>T'
  5. 'DNAJC11:chr1-6667742:TTC>-'

批量保存图片

R
for ( i in head5_mut) {
    p = FeaturePlot(obj, features = i, ncol = 2)
    filename = stringr::str_replace_all(i, "[:>]", "_")
    ggsave(p, file = paste0(filename, ".png"), width = 6, height = 6) #保存在当前目录下
    }

展示覆盖到该位点的细胞群

指定 assay 为 SNV_all,即指定覆盖矩阵,展示覆盖到该突变所在位点的细胞

R
DefaultAssay(obj) = "SNV_all"
FeaturePlot(obj, features = "SLC35E2B:chr1-1668373:C>T",ncol = 2) +
DimPlot(obj, group.by = "resolution.0.8", label = T)

展示该突变所在基因的表达情况

指定 assay 为 RNA,即指定表达数据,展示该突变所在基因的表达情况

R
DefaultAssay(obj) = "RNA"
FeaturePlot(obj, features = "SLC35E2B",ncol = 2) +
DimPlot(obj, group.by = "resolution.0.8", label = T)

将是否发生突变添加为新的标签

将细胞分类为两类:发生该突变/未发生该突变,并作为新的标签进行展示;

例如指定突变为"SLC35E2B:chr1-1668373:C>T"

R
interest_mut = "SLC35E2B:chr1-1668373:C>T"
DefaultAssay(obj) = "SNV"
mut_label = FetchData(obj, interest_mut)
mut_label = ifelse(mut_label == 0, "negative", "positive")
table(mut_label)
output
mut_label
negative positive
1909 12
添加标签
R
obj = AddMetaData(obj, metadata = mut_label, col.name = "SLC35E2B_C_to_T")
展示
R
DefaultAssay(obj) = "RNA"
FeaturePlot(obj, features = "SLC35E2B",ncol = 2) +
DimPlot(obj, group.by = "SLC35E2B_C_to_T", cols = c("grey","red"))
导出表格
R
df = data.frame(barcode = colnames(obj),
                SLC35E2B_C_to_T = mut_label)
head(df)
write.csv(df, file="SLC35E2B_C_to_T_mut_meta.csv", quote=F, row.names = F)
A data.frame: 6 × 2
barcodeSLC35E2B.chr1.1668373.C.T
<chr><chr>
AAGTTCGTACTGGTTCTAAGTTCGTACTGGTTCTnegative
CTGCAGGTACGGAGTAGCTGCAGGTACGGAGTAGnegative
CCTCTAGATGTAATTCCCCTCTAGATGTAATTCCnegative
TAACGACCGACTGCGCATAACGACCGACTGCGCAnegative
ATCAGGTGCTTTCAGACATCAGGTGCTTTCAGACnegative
GACCCTTTACGTGTTCCGACCCTTTACGTGTTCCnegative
上传表格到云平台

表格保存到当前路径下,在左侧点击下载,进入云平台

云平台-选择分析流程-辅助信息-标签管理-新增标签-上传合并 meta-点击上传

全序列突变数据的多样本整合

将多个样本的突变矩阵与整合后 rds 进行合并,从而可以在样本间进行突变的相关分析。

输入数据

  1. 整合后 rds。(可以选择对应分析流程)
  2. 各个样本的突变发生矩阵,覆盖矩阵(all/alt,可联系客户经理申请释放,仅限于全序列数据)

例如:整合 rds 中 SAMple 列为"PBMC", 突变矩阵文件名为 "PBMC.snp_indel.all_UMI.matrix", "PBMC.snp_indel.alt_UMI.matrix"

矩阵文件可以在左上角的 upload 处点击上传

需要修改的内容

修改下方的 rds 路径,相对目录为../data/,绝对目录为/home/mambauser/data/

修改下方的矩阵文件名

R
##read integrated rds
integrate_obj = readRDS("integrated_seurat.rds") #或 integrate_obj = readRDS("data/流程ID/input.rds")
unique(integrate_obj$Sample)

#snv_cover_mat&snv_mut_mat的矩阵文件需放在当前工作目录下,您可以通过setwd()来更改工作目录
snv_cover_mat = c("PBMC.snp_indel.all_UMI.matrix", "cellline.snp_indel.all_UMI.matrix")
snv_mut_mat = c("PBMC.snp_indel.alt_UMI.matrix", "cellline.snp_indel.alt_UMI.matrix")
  1. 'PBMC'
  2. 'cellline'
R
# Function:process cover/mut matrix
## object : Seurat object
## mat : matrix or dgCMatrix or data.frame
## assay_name: specified new assay name for matrix
generate_mat = function(object, mat, assay_name) {
    #make sure consistent of barcodes
    diff_barcode = setdiff(colnames(object), colnames(mat))
    if (length(diff_barcode) == ncol(object)) {
            stop("no common barcodes between matrix and seurat object")  
    }
    if (length(diff_barcode) != 0) {
        diff_matrix = matrix(0,
                         nrow = nrow(mat),
                         ncol = length(diff_barcode),
                         dimnames = list(rownames(mat), diff_barcode))
        mat = cbind(mat, diff_matrix)
    }
    mat = as(as.matrix(mat), 'dgCMatrix')
    #reorder barcode according rds
    mat = mat[, colnames(object)]
    #add new assay into rds
    object[[assay_name]] = CreateAssayObject(mat)
    return(object)
}

# Function:process cover/mut matrix for integrated rds
## object: Seurat object
## snv_cover_mat: vector, cover matrix for each sample
## snv_mut_mat:vector, mut matrix for each sample
## meta_name: column name in meta.data, contains samples' name, same as marix filename's prefix
## cover_assay_name : new assay name for cover matrix
## mut_assay_name : new assay name for mut matrix
generate_mat_integrated = function(object,
                                   snv_cover_mat,
                                   snv_mut_mat,
                                   meta_name = "Sample",
                                   cover_assay_name = "SNV_all",
                                   mut_assay_name = "SNV") {

    #get barcode suffix for each sample
    suffix = as.data.frame(object@meta.data) %>%
        select(sym(meta_name)) %>%
        unique()
    suffix$barcode = rownames(suffix)
    suffix$suffix = stringr::str_replace_all(suffix$barcode,
                                             "[A-Z]{17}", "")

    #check sample name
    cover_sample = stringr::str_replace_all(snv_cover_mat,
                                            ".snp_indel.all_UMI.matrix","")
    mut_sample = stringr::str_replace_all(snv_mut_mat,
                                          ".snp_indel.alt_UMI.matrix","")
    
    if(! identical(cover_sample, mut_sample)) {
        stop("filenames are not same for snv_cover_mat and snv_mut_mat")
    }
    
    if(! all(suffix[[meta_name]] %in% cover_sample)) {
        stop("some samples do not have corresponding matrix files")
    }

    #get all matrix, add suffix
    cover_mat_list = lapply(1:nrow(suffix), function(e) {
        mat = read.delim(paste0(suffix[[meta_name]][e], ".snp_indel.all_UMI.matrix"),
                         header = T, row.names = 1)
        colnames(mat) = paste0(colnames(mat), suffix[["suffix"]][e])
        as.matrix(mat)
    })

    mut_mat_list = lapply(1:nrow(suffix), function(e) {
        mat = read.delim(paste0(suffix[[meta_name]][e], ".snp_indel.alt_UMI.matrix"),
                         header = T, row.names = 1)
        colnames(mat) = paste0(colnames(mat), suffix[["suffix"]][e])
        as.matrix(mat)
    })

    ##define two matrix
    all_muts = unique(unlist(lapply(cover_mat_list, rownames)))
    all_barcodes = unlist(lapply(cover_mat_list, colnames))
    
    all_cover_mat = Matrix::Matrix(0,
                               nrow = length(all_muts),
                               ncol = length(all_barcodes),
                               dimnames = list(all_muts, all_barcodes))
    all_mut_mat = Matrix::Matrix(0,
                               nrow = length(all_muts),
                               ncol = length(all_barcodes),
                               dimnames = list(all_muts, all_barcodes))

    #fill in value
    for (i in seq.int(cover_mat_list)) {
        mat = cover_mat_list[[i]]
        all_cover_mat[rownames(mat), colnames(mat)] = mat
        
        mat = mut_mat_list[[i]]
        all_mut_mat[rownames(mat), colnames(mat)] = mat
    }                      

    #add new assay, function is from top-line in this file
    object = generate_mat(object,
                          mat = all_cover_mat,
                          assay_name = cover_assay_name)
    
    object = generate_mat(object,
                          mat = all_mut_mat,
                          assay_name = mut_assay_name)

}

运行命令,合并数据

R
integrate_obj = generate_mat_integrated(integrate_obj,
                                        snv_cover_mat,
                                        snv_mut_mat,
                                        meta_name = "Sample")
output
Warning message:
“Keys should be one or more alphanumeric characters followed by an underscore, setting key from snv_all_ to snvall_”

新对象有三个 assay,添加成功

R
Assays(integrate_obj)
  1. 'RNA'
  2. 'SNV_all'
  3. 'SNV'

保存数据

R
saveRDS(integrate_obj, file = "integrate_add_snv.rds")

也可以先合并数据,再做整合

R
#PBMC data
obj1 = readRDS("PBMC_seurat.rds")
snv_cover_mat1 = read.delim("PBMC.snp_indel.all_UMI.matrix", header = T, row.names = 1)
snv_mut_mat1 = read.delim("PBMC.snp_indel.alt_UMI.matrix", header = T, row.names = 1)
#cover matrix
obj1 = generate_mat(object = obj1, mat = snv_cover_mat1, assay_name = "SNV_all")
#mutation matrix
obj1 = generate_mat(object = obj1, mat = snv_mut_mat1, assay_name = "SNV")
output
Warning message:
“Keys should be one or more alphanumeric characters followed by an underscore, setting key from snv_all_ to snvall_”
R
#cellline data
obj2 = readRDS("cellline_seurat.rds")
snv_cover_mat2 = read.delim("cellline.snp_indel.all_UMI.matrix", header = T, row.names = 1)
snv_mut_mat2 = read.delim("cellline.snp_indel.alt_UMI.matrix", header = T, row.names = 1)
#cover matrix
obj2 = generate_mat(object = obj2, mat = snv_cover_mat2, assay_name = "SNV_all")
#mutation matrix
obj2 = generate_mat(object = obj2, mat = snv_mut_mat2, assay_name = "SNV")
output
Warning message:
“Keys should be one or more alphanumeric characters followed by an underscore, setting key from snv_all_ to snvall_”
R
integrate_obj = merge(obj1, obj2)
output
Warning message in CheckDuplicateCellNames(object.list = objects):
“Some cell names are duplicated across objects provided. Renaming to enforce unique cell names.”

新对象有三个 assay,添加成功

R
Assays(integrate_obj)
  1. 'RNA'
  2. 'SNV_all'
  3. 'SNV'

保存数据

R
saveRDS(integrate_obj, file = "integrate_add_snv.rds")

END

0 条评论·0 条回复