Skip to content

Single-Cell Host-Virus Interaction: Correlation Study between Gene Expression Abundance and Viral Load

Author: SeekGene
Time: 7 min
Words: 1.3k words
Updated: 2026-02-27
Reads: 0 times
3' scRNA-seq 5' + Immune Profiling Analysis Guide Correlation Analysis FFPE scRNA-seq Notebooks Spatial-seq scATAC + RNA-seq scFAST-seq
R
##################################### 
# Select Jupyter script execution environment as copyKAT #
#####################################
R
library(Seurat)
library(ggplot2)
library(ggpubr)
output
Loading required package: SeuratObject

Loading required package: sp

‘SeuratObject’ was built with package ‘Matrix’ 1.6.4 but the current
version is 1.7.0; it is recomended that you reinstall ‘SeuratObject’ as
the ABI for ‘Matrix’ may have changed


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

intersect, t

Gene expression data uses "ESCC Paper Reproduction" - "Large Group Reproduction" data as an example

Read workflow RDS data, relative path data/, absolute path /home/mambauser/data/

R
# Workflow rds is data/ProcessID/input.rds
obj = readRDS("data/AY1743044870655/input.rds")
head(obj@meta.data)
A data.frame: 6 × 10
orig.identnCount_RNAnFeature_RNASamplemitoraw_SampleTissuePatientresolution.0.6_d20mitorelatedgenes
<chr><dbl><int><fct><dbl><chr><fct><fct><fct><dbl>
AAACCTGAGATACACA-1_1SeuratProject27971425S150T4.2903GSE145370_S150TTumorS15013.3249911
AAACCTGAGCTAACTC-1_1SeuratProject27901349S150T1.1470GSE145370_S150TTumorS15051.0035842
AAACCTGAGGAGCGAG-1_1SeuratProject17681054S150T4.7511GSE145370_S150TTumorS15014.0158371
AAACCTGAGGGAAACA-1_1SeuratProject44552017S150T2.2896GSE145370_S150TTumorS15081.8855219
AAACCTGAGTCCCACG-1_1SeuratProject1422 861S150T0.9845GSE145370_S150TTumorS15010.7032349
AAACCTGAGTGAACAT-1_1SeuratProject25221308S150T1.7843GSE145370_S150TTumorS15041.5463918
R
gene_expression_matrix = Seurat::GetAssayData(object = obj, slot = "counts")
gene_expression_matrix[1:3, 1:3]
output
Warning message:
The \`slot\` argument of \`GetAssayData()\` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the \`layer\` argument instead.”



3 x 3 sparse Matrix of class "dgCMatrix"
AAACCTGAGATACACA-1_1 AAACCTGAGCTAACTC-1_1 AAACCTGAGGAGCGAG-1_1
MIR1302-2HG . . .
AL627309.1 . . .
AL627309.2 . . .

Viral gene expression data here uses simulated data based on negative binomial distribution

R
virus_expression = read.delim("sim_virus.matrix")
virus_expression[1:3,1:3]
virus_expression = Matrix::as.matrix(virus_expression)
A data.frame: 3 × 3
AAACCTGAGATACACA.1_1AAACCTGAGCTAACTC.1_1AAACCTGAGGAGCGAG.1_1
<int><int><int>
virus-gene1010
virus-gene2000
virus-gene3000
R
# Construct new object, obtain normalized gene expression data
new_obj = CreateSeuratObject(counts = Matrix::rbind2(gene_expression_matrix, virus_expression))
new_obj = NormalizeData(new_obj)
new_obj$Sample = obj$Sample
output
Normalizing layer: counts
R
# Obtain viral gene names and host gene names respectively
virus_gene = rownames(new_obj)[grep("^virus", rownames(new_obj))]
host_gene = setdiff(rownames(new_obj), virus_gene)
R
# Obtain viral gene expression data
virus_data=GetAssayData(new_obj,assay = "RNA", slot = "data")[virus_gene,]
virus_data_sum = colSums(virus_data)
R
# Obtain host gene expression data
host_data = GetAssayData(obj, slot = "data")[host_gene,]
host_gene_num = colSums(host_data != 0)

Definition

Gene Expression Abundance: Number of genes expressed in cells

Viral Load: Sum of normalized viral gene expression values (refer to the article above)

R
plot_data=data.frame(virus_load = virus_data_sum,
                     host_gene_num = host_gene_num,
                    Sample = new_obj$Sample)
head(plot_data)
A data.frame: 6 × 3
virus_loadhost_gene_numSample
<dbl><int><fct>
AAACCTGAGATACACA-1_1 2.7265851425S150T
AAACCTGAGCTAACTC-1_1 4.2498121349S150T
AAACCTGAGGAGCGAG-1_1 8.1470581054S150T
AAACCTGAGGGAAACA-1_1 5.1707282017S150T
AAACCTGAGTCCCACG-1_113.314534 861S150T
AAACCTGAGTGAACAT-1_1 0.0000001308S150T

Plotting

R
p <- ggplot(plot_data, aes(x = virus_load, y = host_gene_num)) +
  geom_point(color = "#9C9C9C", size = 1.5) +
  geom_smooth(method = "lm", se = TRUE, color = "Firebrick") +  # Add linear fit line
  facet_wrap(~Sample) + # Facet by Sample
  theme_classic() +
  theme(
    axis.title = element_text(size = 15),  # Increase axis font size
    plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),  # Center title
  ) +
  labs(
    x = "Normalized Virus Load", # x-axis title
    y = "Host Detected Genes"# y-axis title
  ) +
  stat_cor(label.x = 20, label.y = 4000, # Position of correlation value
           size = 2, # Correlation value font size
           color = "brown") + # Add correlation coefficient 
    scale_x_continuous(expand = c(0,0)) + # Adjust x-axis range
    scale_y_continuous(expand = c(0,0),limits = range(plot_data$host_gene_num)) # Adjust y-axis range
p
output
\`geom_smooth()\` using formula = 'y ~ x'
Warning message:
Removed 152 rows containing missing values or values outside the scale range
(\`geom_smooth()\`).”

Save Image

R
ggsave(p, file = "correlation.png", width = 10, height = 10)
output
\`geom_smooth()\` using formula = 'y ~ x'
Warning message:
Removed 152 rows containing missing values or values outside the scale range
(\`geom_smooth()\`).”

END

0 comments·0 replies