Single-Cell Host-Virus Interaction: Correlation Between Gene Expression Abundance and Viral Load

Author: SeekGene
Updated: 2026-02-27
R
#######################################################
# Select copyKAT as the Jupyter execution environment #
#######################################################
R
library(Seurat)
library(ggplot2)
library(ggpubr)
output
Loading required package: SeuratObject

Loading required package: sp

‘SeuratObject’ was built with package ‘Matrix’ 1.6.4 but the current
version is 1.7.0; it is recomended that you reinstall ‘SeuratObject’ as
the ABI for ‘Matrix’ may have changed


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

intersect, t

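The gene expression data in this example come from the "ESCC article reproduction" - "major cluster reproduction" dataset.

Read the pipeline's rds data. The relative directory is data/ and the absolute directory is /home/mambauser/data/.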

R
# The pipeline rds file is located at data/<pipeline ID>/input.rds
obj = readRDS("data/AY1743044870655/input.rds")
head(obj@meta.data)
A data.frame: 6 × 10
                      orig.ident     nCount_RNA  nFeature_RNA  Sample  mito    raw_Sample       Tissue  Patient  resolution.0.6_d20  mitorelatedgenes
                      <chr>          <dbl>       <int>         <fct>   <dbl>   <chr>            <fct>   <fct>    <fct>               <dbl>
AAACCTGAGATACACA-1_1  SeuratProject  2797        1425          S150T   4.2903  GSE145370_S150T  Tumor   S150     1                   3.3249911
AAACCTGAGCTAACTC-1_1  SeuratProject  2790        1349          S150T   1.1470  GSE145370_S150T  Tumor   S150     5                   1.0035842
AAACCTGAGGAGCGAG-1_1  SeuratProject  1768        1054          S150T   4.7511  GSE145370_S150T  Tumor   S150     1                   4.0158371
AAACCTGAGGGAAACA-1_1  SeuratProject  4455        2017          S150T   2.2896  GSE145370_S150T  Tumor   S150     8                   1.8855219
AAACCTGAGTCCCACG-1_1  SeuratProject  1422        861           S150T   0.9845  GSE145370_S150T  Tumor   S150     1                   0.7032349
AAACCTGAGTGAACAT-1_1  SeuratProject  2522        1308          S150T   1.7843  GSE145370_S150T  Tumor   S150     4                   1.5463918
R
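# Extract the raw count matrix. `slot` still works but is deprecated as of
# SeuratObject 5.0.0, where `layer = "counts"` is the preferred argument.
gene_expression_matrix = Seurat::GetAssayData(object = obj, slot = "counts")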
gene_expression_matrix[1:3, 1:3]
output
Warning message:
“The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.”

3 x 3 sparse Matrix of class "dgCMatrix"
            AAACCTGAGATACACA-1_1 AAACCTGAGCTAACTC-1_1 AAACCTGAGGAGCGAG-1_1
MIR1302-2HG                    .                    .                    .
AL627309.1                     .                    .                    .
AL627309.2                     .                    .                    .

The viral gene expression data used here are simulated from a negative binomial distribution.

R
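# Rows are viral genes, columns are cells.
# Note: read.delim() rewrites "-" in the barcodes as "." (check.names = TRUE).
virus_expression = read.delim("sim_virus.matrix")
virus_expression[1:3,1:3]
# Convert the data.frame to a matrix so it can be stacked under the host counts
virus_expression = Matrix::as.matrix(virus_expression)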
A data.frame: 3 × 3
             AAACCTGAGATACACA.1_1  AAACCTGAGCTAACTC.1_1  AAACCTGAGGAGCGAG.1_1
             <int>                 <int>                 <int>
virus-gene1  0                     1                     0
virus-gene2  0                     0                     0
virus-gene3  0                     0                     0
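
The notebook does not show how sim_virus.matrix was produced. As a rough sketch of how such a matrix could be simulated, the snippet below draws counts from a negative binomial distribution with base R's rnbinom() and writes them in a layout that read.delim() can read back with gene names as row names. The three barcodes, the number of viral genes, and the mu and size values are illustrative assumptions, not the settings behind the file read above.

```r
# Hypothetical simulation of a viral count matrix (genes x cells)
set.seed(1)                                   # reproducible draws

# In practice these would be colnames(gene_expression_matrix)
barcodes <- c("AAACCTGAGATACACA-1_1",
              "AAACCTGAGCTAACTC-1_1",
              "AAACCTGAGGAGCGAG-1_1")
n_virus_genes <- 5                            # assumed number of viral genes

# rnbinom(n, size, mu): mu is the mean, size the dispersion parameter
sim_counts <- matrix(
  rnbinom(n_virus_genes * length(barcodes), size = 0.5, mu = 1),
  nrow = n_virus_genes,
  dimnames = list(paste0("virus-gene", seq_len(n_virus_genes)), barcodes)
)

# Tab-delimited output; the header has one field fewer than the data rows,
# so read.delim() uses the first column as row names
out_file <- file.path(tempdir(), "sim_virus_example.matrix")
write.table(sim_counts, out_file, sep = "\t", quote = FALSE)
```

A small size value gives strongly overdispersed, zero-heavy counts, which is the usual reason for choosing a negative binomial over a Poisson model for this kind of data.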
R
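# Build a new object from the combined host and viral counts, then normalize it.
# rbind2() stacks by column position, so both matrices must list the cells in the same order.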
new_obj = CreateSeuratObject(counts = Matrix::rbind2(gene_expression_matrix, virus_expression))
new_obj = NormalizeData(new_obj)
new_obj$Sample = obj$Sample
output
Normalizing layer: counts
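
Because the two matrices are stacked by column position and read.delim() has rewritten the hyphens in the barcodes as dots, a mismatch in cell order would go unnoticed. A defensive sketch, using toy matrices in place of the real ones, restores the barcodes and reorders the viral matrix by name before stacking. The gene names, barcodes, and the regular expression (which assumes barcodes ending in "-<digits>_<digits>") are illustrative assumptions.

```r
library(Matrix)

# Toy host matrix (sparse) with hyphenated barcodes, as in a Seurat object
host <- Matrix(c(0, 2, 1, 0, 0, 3), nrow = 2, sparse = TRUE,
               dimnames = list(c("geneA", "geneB"),
                               c("AAAC-1_1", "AAAG-1_1", "AAAT-1_1")))

# Toy viral matrix as it looks after read.delim(): dots instead of hyphens
# and, deliberately, a different column order
virus <- matrix(c(1, 0, 0, 5, 0, 2), nrow = 2,
                dimnames = list(c("virus-gene1", "virus-gene2"),
                                c("AAAT.1_1", "AAAC.1_1", "AAAG.1_1")))

# Restore the hyphen that check.names = TRUE turned into a dot
colnames(virus) <- sub("\\.([0-9]+_[0-9]+)$", "-\\1", colnames(virus))

# Fail early if the cell sets differ, then reorder the columns by barcode
stopifnot(setequal(colnames(virus), colnames(host)))
virus <- virus[, colnames(host), drop = FALSE]

# Stack host and viral counts into one sparse matrix
combined <- rbind2(host, Matrix(virus, sparse = TRUE))
```

Reading the file with read.delim(..., check.names = FALSE) keeps the barcodes unchanged in the first place, which makes the sub() step unnecessary.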
R
# Get the viral gene names and the host gene names separately
virus_gene = rownames(new_obj)[grep("^virus", rownames(new_obj))]
host_gene = setdiff(rownames(new_obj), virus_gene)
R
# Get the normalized viral gene expression and sum it per cell
virus_data=GetAssayData(new_obj,assay = "RNA", slot = "data")[virus_gene,]
virus_data_sum = colSums(virus_data)
R
# Get the host gene expression and count the genes with non-zero expression per cell
host_data = GetAssayData(obj, slot = "data")[host_gene,]
host_gene_num = colSums(host_data != 0)

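Definitions

Gene expression abundance: the number of genes detected (non-zero expression) in a cell.

Viral load: the sum of the normalized viral gene expression values in a cell (following the article mentioned above).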
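
As a small worked illustration of these two definitions, the toy example below reproduces the default log-normalization of NormalizeData() by hand (counts divided by the cell total, multiplied by a scale factor of 10,000, then log1p-transformed) and computes both quantities per cell. The count values, gene names, and cell names are made up for illustration.

```r
# Toy count matrix: 2 host genes and 2 viral genes across 3 cells
counts <- matrix(c(5, 0, 3, 2,    # cell1
                   0, 0, 9, 1,    # cell2
                   4, 6, 0, 0),   # cell3
                 nrow = 4,
                 dimnames = list(c("geneA", "geneB", "virus-gene1", "virus-gene2"),
                                 c("cell1", "cell2", "cell3")))

# Log-normalization: log1p(count / cell total * 10000)
norm <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)

is_virus <- grepl("^virus", rownames(counts))

# Viral load: sum of normalized viral gene expression per cell
virus_load <- colSums(norm[is_virus, , drop = FALSE])

# Gene expression abundance: number of host genes with non-zero expression
host_gene_num <- colSums(counts[!is_virus, , drop = FALSE] != 0)
```

Here cell3 carries no viral counts, so its viral load is 0, while cell2 expresses no host genes at all and gets an abundance of 0.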

R
plot_data = data.frame(virus_load = virus_data_sum,
                       host_gene_num = host_gene_num,
                       Sample = new_obj$Sample)
head(plot_data)
A data.frame: 6 × 3
                      virus_load  host_gene_num  Sample
                      <dbl>       <int>          <fct>
AAACCTGAGATACACA-1_1   2.726585   1425           S150T
AAACCTGAGCTAACTC-1_1   4.249812   1349           S150T
AAACCTGAGGAGCGAG-1_1   8.147058   1054           S150T
AAACCTGAGGGAAACA-1_1   5.170728   2017           S150T
AAACCTGAGTCCCACG-1_1  13.314534    861           S150T
AAACCTGAGTGAACAT-1_1   0.000000   1308           S150T

Plotting

R
p <- ggplot(plot_data, aes(x = virus_load, y = host_gene_num)) +
  geom_point(color = "#9C9C9C", size = 1.5) +
  geom_smooth(method = "lm", se = TRUE, color = "Firebrick") +  # add a linear fit with its confidence band
  facet_wrap(~Sample) +  # one panel per sample
  theme_classic() +
  theme(
    axis.title = element_text(size = 15),  # larger axis title font
    plot.title = element_text(hjust = 0.5, size = 18, face = "bold")  # centered title
  ) +
  labs(
    x = "Normalized Virus Load",  # x-axis title
    y = "Host Detected Genes"  # y-axis title
  ) +
  stat_cor(label.x = 20, label.y = 4000,  # position of the correlation label
           size = 2,  # font size of the correlation label
           color = "brown") +  # add the correlation coefficient
  scale_x_continuous(expand = c(0, 0)) +  # remove padding on the x axis
  scale_y_continuous(expand = c(0, 0), limits = range(plot_data$host_gene_num))  # remove padding on the y axis and limit it to the observed range; values outside the limits are dropped
p
output
`geom_smooth()` using formula = 'y ~ x'
Warning message:
“Removed 152 rows containing missing values or values outside the scale range
(`geom_smooth()`).”
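
stat_cor() annotates each panel with a correlation coefficient and p-value, using Pearson correlation by default. To get the same kind of statistics as a table, one option is to run cor.test() per sample with base R. The sketch below uses a small made-up data frame with the same column names as plot_data; the sample labels S1 and S2 and all numbers are illustrative only.

```r
# Toy stand-in for plot_data: two samples, 30 cells each
set.seed(42)
toy <- data.frame(
  virus_load = runif(60, 0, 15),
  Sample     = rep(c("S1", "S2"), each = 30)
)
# Simulated negative relationship between viral load and detected host genes
toy$host_gene_num <- round(1500 - 40 * toy$virus_load + rnorm(60, sd = 200))

# One Pearson test per sample, collected into a single data frame
cor_by_sample <- do.call(rbind, lapply(split(toy, toy$Sample), function(d) {
  ct <- cor.test(d$virus_load, d$host_gene_num, method = "pearson")
  data.frame(Sample  = d$Sample[1],
             n       = nrow(d),
             r       = unname(ct$estimate),
             p_value = ct$p.value)
}))
cor_by_sample
```

With the real data, toy would simply be replaced by plot_data. Switching to method = "spearman" is worth considering when the relationship is monotonic but not linear.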

Save the figure

R
# Name the arguments explicitly instead of relying on partial matching of `file`
ggsave(filename = "correlation.png", plot = p, width = 10, height = 10)
output
`geom_smooth()` using formula = 'y ~ x'
Warning message:
“Removed 152 rows containing missing values or values outside the scale range
(`geom_smooth()`).”

END