CopyscAT Analysis: scATAC-seq-based CNV Inference and Tumor Identification
Background
Chromosomal Copy Number Variation (CNV) is a typical hallmark of tumor genomic instability, characterized by large-scale amplification or deletion of chromosomal segments. CNV drives tumor initiation, metastasis, and therapeutic resistance by altering the copy number of oncogenes (e.g., MYC) or tumor suppressor genes (e.g., TP53) and inducing gene dosage effects. Resolving CNV at single-cell resolution is critical for revealing tumor heterogeneity: relying on malignant cell-specific CNV events (e.g., Chromosome 7 amplification, Chromosome 10 deletion), tumor cells can be distinguished from non-malignant cells (e.g., T cells, fibroblasts) in complex microenvironments, thereby resolving subclone-specific epigenetic regulatory networks and their dynamic interactions with the microenvironment.
Copy-scAT is designed for CNV inference from scATAC-seq data. By integrating chromatin accessibility signals and genomic structural features, it constructs a high-precision CNV inference model at the single-cell level, enabling the identification of malignant and non-malignant cells and fine characterization of subclones.
Core Principles
- Focal CNV Detection:
- Uses a 1 Mbp binning strategy to correct technical biases and identify focal amplification events (e.g., EGFR double minutes, extrachromosomal DNA/ecDNA regions).
- Large-scale CNV Inference:
- Based on normalized chromosome arm-level signal intensity, combined with Gaussian mixture model clustering, to discriminate segmental (e.g., EGFR amplification) and whole chromosome arm (e.g., Chromosome 7 gain) copy number variations.
- Signal Integration and Correction:
- Identifies significant signal peaks through change point analysis, combined with normalized coverage matrix, to improve sensitivity and specificity of CNV detection.
Parameter Settings
Please mount the cloud platform workflow data to be analyzed first. Refer to the Jupyter tutorial for mounting instructions.
Specific parameter filling:
- rds: Mounted workflow-related rds file. RDS file is a standard Seurat object file, data has been preprocessed and multiple samples merged
- meta: Mounted workflow-related meta file, corresponding to the meta.data of the workflow rds file;
- species: Species information, supports human only;
- SAMplenames: Sample names. Note that to be compatible with SeekArc output fragment file naming, sample names need to include A, e.g., "joint14_A", "joint22_A", cannot just be "joint14", "joint22";
- reduction: Dimensionality reduction method. In visualization, maps each cell's CNV status to the reduction plot (umap/tsne/lsi/atacumap/atactsne/wnnumap/wnntsne) to intuitively compare CNV differences across cell types;
- celltype_col: Cell type column name in meta.data. Select the label corresponding to the cell type or cluster to be analyzed. If analyzing annotated cell types, select the corresponding label, e.g., CellAnnotation;
- group_col: Cell group column name in meta.data, sample grouping status, facilitates cell grouping display during CNV heatmap visualization
- resolution_cluster: Cell cluster column name in meta.data;
- minimumSegments: Minimum number of segments: Cell filtering threshold, defaults to keeping cells with ≥ 40 segments, recommended to use default parameters
- methods: Two choices: auto and reference. reference means using normal cells as reference to calculate CNV, auto means no normal cell reference needed;
- Normal_celltype: If methods="reference" above, select which specific cells to use as reference here. Generally recommended to use immune B cells or T cells, etc. as reference.
- file_path: Fragment files of samples to be analyzed, supports multiple samples;
# Parameters
rds = "/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6349-581745468808125050/input.rds"
meta = "/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6349-581745468808125050/meta.txt"
#outdir = "/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6349-581745468808125050/output/"
species = "human"
samplenames = c("joint14_A","joint22_A")# To be compatible with SeekArc output fragment file naming, sample names need to include A, e.g., "joint14_A", "joint22_A", cannot just be "joint14", "joint22";
reduction = "wnnumap"
celltype_col = "CellAnnotation"
group_col = ""
resolution_cluster = "wknn_res.0.5_d30_l2_50"
minimumSegments = "40"
methods = "reference"
Normal_celltype = "T_cells"
file_path=c("/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6345-581728456711101050/joint14_A_fragments.tsv.gz","/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6345-581728456711101050/joint22_A_fragments.tsv.gz")Environment Preparation
Load R Packages
Please select the common_r environment for this integration tutorial
options(warn = -1)
suppressWarnings({
suppressMessages({
invisible(capture.output(library(CopyscAT)))
library(GenomicRanges)
library(Seurat, lib.loc = "/PROJ2/FLOAT/shumeng/apps/miniconda3/envs/Geneactivity/lib/R/library",quietly = TRUE)
library(tidyverse)
library(ggplot2)
library(ComplexHeatmap)
library(circlize)
library(RColorBrewer)
library(base64enc, quietly = TRUE)
library(DT, quietly = TRUE)
library(htmltools)
library(dplyr)
library(data.table)
source("/PROJ2/FLOAT/advanced-analysis/Seek_module_dev/CopyscAT/Module/bin/cnv_caller_functions_packaged.R")
})
})
# 5. Load genome database based on species
suppressMessages({
if(species=="human"){
library(BSgenome.Hsapiens.UCSC.hg38, quietly = TRUE)
} else {
message("Please install the corresponding species database")
}
})
options(warn = 0)genomeFilepath="/PROJ2/FLOAT/advanced-analysis/Seek_module_dev/CopyscAT/Module/bin/"
dir.create("./result", recursive = TRUE, showWarnings = FALSE)
setOutputFile("./result","result")
resolution.cluster=resolution_clusterif (species == "human") {
genomeFile <- paste0(genomeFilepath, "hg38_chrom_sizes.tsv")
cytobandFile <- paste0(genomeFilepath, "hg38_1e+06_cytoband_densities_granges.tsv")
cpgFile = paste0(genomeFilepath, "hg38_1e+06_cpg_densities.tsv")
} else {
stop(paste0("Error: Unsupported species '", species, "'. Supported species are human."))
}Fragment File Preprocessing
Fragments files generated by SeekArctools or cellranger arc/ATAC contain fragment information for all barcodes, only a part of which are real cells. To save computational resources and maintain consistency with subsequent cell annotation, fragments need to be filtered before CNV analysis: only 'cell whitelist' barcodes that passed QC and are used for downstream analysis are retained. It is recommended to subset fragments based on established cell metadata/annotation tables (e.g., list of barcodes that passed QC and are included in analysis) to avoid introducing low-quality barcodes or non-cell background noise into CNV inference.
Fragment File Filtering
Fragments files generated by SeekArctools or cellranger arc/ATAC contain fragment information for all barcodes, only a part of which are real cells. To save computational resources and maintain consistency with subsequent cell annotation, fragments need to be filtered before CNV analysis: only 'cell whitelist' barcodes that passed QC and are used for downstream analysis are retained. It is recommended to subset fragments based on established cell metadata/annotation tables (e.g., list of barcodes that passed QC and are included in analysis) to avoid introducing low-quality barcodes or non-cell background noise into CNV inference.
meta <- read.table(meta,sep="\t",header=T) # Read mounted meta file, which contains barcode information for all samples in the mounted project data
Barcodes <- meta$barcodes # Extract all Barcodes
# Modify loop to save files separately for each sample
for(sample in samplenames){
file_path <- grep(pattern = sample, file_paths, value = TRUE) # Note this should be file_paths (plural)
frag <- fread(file_path, header=F)
# Fix column name assignment syntax error here
colnames(frag) <- c("chr", "start", "end", "Barcode", "reads") # Use <- for assignment
frag_filter <- frag[Barcode %in% Barcodes]
# Save files separately for each sample, naming format: {sample}_filtered_fragments.tsv
output_file <- paste0("./", sample, "_filtered_fragments.tsv")
write.table(frag_filter,
file = output_file,
quote = FALSE,
row.names = FALSE,
col.names = TRUE, # Write column names, as each file is independent
sep = "\t")
}Fragment File Bin Processing
Description
Before CopyscAT performs CNV analysis on scATAC-seq data, filtered fragments need further processing. Each chromosome is divided into 1000kb bins, and transposase cleavage events in each bin are counted to generate a new matrix file. Rows are cells, columns are chromosome bin IDs, e.g., chr1_0 represents merging the first 1000kb bases of chromosome 1 into a bin, '0' represents the starting base of this bin, and the value represents the total number of transposition events detected in this barcode in the corresponding bin. As shown below:
Cell_id chr1_0 chr1_1000000 chr1_2000000 chr1_3000000 chr1_4000000
GTAACCACGACCGCTAG_1 321.0 6057.0 4540.0 1942.0 2437.0
AACACACCGACCAGCGT_1 368.0 998.0 490.0 0.0 0.0
ACGTAGTATGTGAGGTC_1 363.0 1123.0 831.0 56.0 0.0
AACACACTACATGATTG_1 400.0 2629.0 476.0 658.0 172.0
TTACGCCCGACCTGCGA_1 261.0 1960.0 2450.0 1916.0 0.0
CAGGTATGCTTGACTGT_1 429.0 670.0 286.0 1018.0 0.0
AGTCAACGCTACCGACC_1 2264.0 14874.0 17016.0 11043.0 4900.0
CTTGAGACGACCGTGGT_1 921.0 4590.0 4726.0 1870.0 1255.0
CAACCATACATTTGCC_1 762.0 4810.0 6431.0 5360.0 827.0
Processing Method
Go to CopyscAT official github to copy process_fragment_file.py script and hg38_chrom_sizes.tsv file, link: https://github.com/spcdot/CopyscAT/blob/master/process_fragment_file.pyhttps://github.com/spcdot/CopyscAT/blob/master/hg38_references/hg38_chrom_sizes.tsv
Then set parameters as required, for example:
- python ./process_fragment_file.py -i ./
{SAMple}_filtered_output_1000kb.tsv -b 1000000 -f 0 -g ./hg38_chrom_sizes.tsv
CNV Analysis
Based on the aforementioned background and parameter settings, we begin executing the Copy-scAT CNV inference workflow. This workflow converts scATAC-seq fragment data into a genome-wide fixed-window coverage matrix and resolves chromosomal copy number variations at the single-cell level through a multi-level analysis strategy.
Execution Steps Overview
- Data Input: Use the processing result file ${SAMple}_filtered_output_1000kb.tsv from the previous part as input, supports batch processing of multiple samples.
- Focal CNV Detection: Identifies focal amplification events based on significant signal peaks (e.g., extrachromosomal DNA/ecDNA regions) in the normalized coverage matrix via change point analysis.
- Large-scale CNV Inference: Uses normalized chromosome arm-level signal intensity combined with GMM clustering to distinguish segmental (e.g., EGFR amplification) and whole-arm (e.g., Chr 7 gain) CNVs.
- Result Integration: Distinguishes malignant from non-malignant cell populations, outputs single-cell CNV atlas and subclonal epigenetic feature association analysis.
Key Parameters Description
- Binning Strategy: Adopts a 1 Mbp binning strategy to balance CNV resolution and statistical robustness
- Signal Correction: Corrects technical biases through normalization to improve CNV detection accuracy
- Clustering Analysis: Combines Gaussian Mixture Models to support joint detection of focal and large-scale CNVs
- Quality Control: Automatically filters low-quality signals to ensure result reliability
suppressWarnings({
suppressMessages({
sc=readRDS(rds)
meta=read.table(meta, sep = "\t", header = T)
rownames(meta)=meta$barcodes
sc=AddMetaData(sc, metadata=meta)
sc@meta.data$Sample <- gsub("_arc$", "_A", sc@meta.data$Sample)
sc@meta.data$num <- sub("^.*_(\\d+)$", "\\1", sc@meta.data$barcodes)
})
})if (length(samplenames) > 1) {
inputdata <- list()
# Dynamically generate cellSuffix based on samplename length
suffixes <- paste0("_", seq_along(samplenames)) # Generate e.g. "_1", "_2", ..., "_n"
# Initialize environment (adjust cellSuffix dynamically based on sample count)
initialiseEnvironment(
genomeFile = genomeFile,
cytobandFile = cytobandFile,
cpgFile = cpgFile,
binSize = 1e6,
minFrags = 1e4,
cellSuffix = suffixes, # Dynamically pass suffix
lowerTrim = 0.5,
upperTrim = 0.8
)
for (i in seq_along(samplenames)) {
inputdata[[i]] <- readInputTable(paste0(outdir, samplenames[[i]], "_output_1000kb.tsv"))
inputdata[[i]]$Barcode=rownames(inputdata[[i]])
}
scData <- rbindlist(inputdata)
scData <- as.data.frame(scData)
rownames(scData)=scData$Barcode
#setDT(scData, keep.rownames = TRUE)
scData$Barcode=NULL
} else {
# Use "_1" directly for single sample
initialiseEnvironment(
genomeFile = genomeFile,
cytobandFile = cytobandFile,
cpgFile = cpgFile,
binSize = 1e6,
minFrags = 1e4,
cellSuffix = "_1", # Fixed suffix for single sample
lowerTrim = 0.5,
upperTrim = 0.8
)
scData <- readInputTable(paste0(outdir, samplenames, "_output_1000kb.tsv"))
rownames(scData) <- paste0(rownames(scData), "_1")
sc@meta.data$barcodes=paste0(sc@meta.data$barcodes,"_1")
}#PART 1: INITIAL DATA NORMALIZATION
#step 1 normalize the matrix
suppressWarnings({
suppressMessages({
scData_k_norm <- normalizeMatrixN(scData,logNorm = FALSE,maxZero=2000,imputeZeros = FALSE,blacklistProp = 0.8,blacklistCutoff=125,dividingFactor=1,upperFilterQuantile = 0.95)
#collapse into chromosome arm level
summaryFunction<-cutAverage
scData_collapse<-collapseChrom3N(scData_k_norm,summaryFunction=summaryFunction,binExpand = 1,minimumChromValue = 100,logTrans = FALSE,tssEnrich = 1,logBase=2,minCPG=300,powVal=0.73)
})
})options(repr.plot.width = 12, repr.plot.height = 6)
suppressWarnings({
suppressMessages({
#apply additional filters
scData_collapse<-filterCells(scData_collapse,minimumSegments = minimumSegments,minDensity =0.1)
cells=rownames(sc@meta.data[which(sc@meta.data$barcodes %in% colnames(scData_collapse)),])
sc_filter=subset(sc, cells=cells)
#show unscaled chromosome list
graphCNVDistribution(scData_collapse,outputSuffix = ("_violinsn2"))
#compute centers
median_iqr <- computeCenters(scData_collapse,summaryFunction=summaryFunction)
})
})options(warn = -1)
suppressWarnings({
suppressMessages({
nmf_results<-identifyNonNeoplastic(scData_collapse,methodHclust="ward.D",cutHeight = 0.4)
#?identifyNonNeoplastic
write.table(x=rownames_to_column(data.frame(nmf_results$cellAssigns),var="Barcode"),file=str_c(scCNVCaller$locPrefix,scCNVCaller$outPrefix,"_nmf_clusters.csv"),quote=FALSE,row.names = FALSE,sep=",")
#print(paste("Normal cluster is: ",nmf_results$clusterNormal))
})
})
options(warn = 0)options(warn = -1)
suppressWarnings({
suppressMessages({
#PART 2: ASSESSMENT OF CHROMOSOME-LEVEL CNVs
if(methods == "auto"){
#print("methods is auto")
#OPTION 1: identify chromosome-level amplifications using all cells to generate 'normal' control
#identify chromosome-level amplifications
candidate_cnvs<-identifyCNVClusters(scData_collapse,median_iqr,useDummyCells = TRUE,propDummy=0.25,minMix=0.01,deltaMean = 0.03,deltaBIC2 = 0.25,bicMinimum = 0.1, subsetSize=600,fakeCellSD = 0.08, uncertaintyCutoff = 0.55,summaryFunction=summaryFunction,maxClust = 4,mergeCutoff = 3,IQRCutoff= 0.2,medianQuantileCutoff = 0.4)
#cleanup step
candidate_cnvs_clean<-clusterCNV(initialResultList = candidate_cnvs,medianIQR = candidate_cnvs[[3]],minDiff=1.5) #= 1.5)
#final results and annotation
final_cnv_list<-annotateCNV4(candidate_cnvs_clean, saveOutput=TRUE,outputSuffix = "clean_cnv",sdCNV = 0.5,filterResults=TRUE,filterRange=0.8)
}else if(methods == "reference"){
#print("methods is reference")
Idents(sc_filter)=sc_filter@meta.data[[celltype_col]]
table(Idents(sc_filter))
Normal_sc=subset(sc_filter, idents=Normal_celltype)
Normal_cells=Normal_sc@meta.data$barcodes
median_iqr <- computeCenters(scData_collapse %>% dplyr::select(chrom,Normal_cells),summaryFunction=summaryFunction)
#setting medianQuantileCutoff to -1 and feeding non-neoplastic barcodes in as normalCells can improve accuracy of CNV calls
candidate_cnvs<-identifyCNVClusters(scData_collapse,median_iqr,useDummyCells = TRUE,propDummy=0.25,minMix=0.01,deltaMean = 0.03,deltaBIC2 = 0.25,bicMinimum = 0.1, subsetSize=800,fakeCellSD = 0.09, uncertaintyCutoff = 0.65,summaryFunction=summaryFunction,maxClust = 4,mergeCutoff = 3,IQRCutoff = 0.25,medianQuantileCutoff = -1,normalCells=Normal_cells)
candidate_cnvs_clean<-clusterCNV(initialResultList = candidate_cnvs,medianIQR = candidate_cnvs[[3]],minDiff=1.0) #= 1.5)
#to save this data you can use annotateCNV4 as per usual
#final_cnv_list<-annotateCNV4(candidate_cnvs_clean, saveOutput=TRUE,outputSuffix = "clean_cnv",sdCNV = 0.6,filterResults=TRUE,filterRange=0.4)
final_cnv_list<-annotateCNV4B(candidate_cnvs_clean, Normal_cells, saveOutput=TRUE,outputSuffix = "clean_cnv",sdCNV = 0.6,filterResults=TRUE,filterRange=0.4,minAlteredCellProp = 0.5)
}
})
})
options(warn = 0)Heatmap Visualization
Heatmap visualization of cell CNV status. Copy number values around 2 indicate normal diploidy; values between 1-2 indicate chromosomal loss (closer to 1 means more severe loss); values between 2-3 indicate chromosomal amplification (closer to 3 means more severe amplification). The left sidebar of the heatmap arranges cells by cell type and clustering. Each row represents a cell, each column a chromosome arm. Heatmap color represents CNV score: redder means more severe amplification, bluer means more severe loss.
options(warn = -1)
ht_opt$message <- FALSE
# 1. Read Data
cnv_data <- final_cnv_list[[3]]
cnv_matrix <- cnv_data %>%
tibble::column_to_rownames("rowname") %>%
as.matrix()
# 2. Extract Metadata
if(nchar(group_col) > 0){
if(nchar(celltype_col) > 0){
metadata <- sc_filter@meta.data[,c("raw_Sample", celltype_col, "barcode", resolution.cluster, group_col)]
} else {
metadata <- sc_filter@meta.data[,c("raw_Sample", "CellAnnotation", "barcode", resolution.cluster, group_col)]
}
} else {
if(nchar(celltype_col) > 0){
metadata <- sc_filter@meta.data[,c("raw_Sample", celltype_col, "barcode", resolution.cluster)]
} else {
metadata <- sc_filter@meta.data[,c("raw_Sample", "CellAnnotation", "barcode", resolution.cluster)]
}
}
# 3. Dynamically Generate Colors - Fix all colorRampPalette functions
# Colors for Group
if(nchar(group_col) > 0) {
n_groups <- length(unique(metadata[[group_col]]))
group_colors <- colorRampPalette(brewer.pal(min(9, 3), "Set1"))(n_groups)
}
# Colors for Cluster
n_clusters <- length(unique(metadata[[resolution.cluster]]))
clusters_colors <- colorRampPalette(brewer.pal(min(8, 3), "Set2"))(n_clusters)
# Colors for Sample
n_samples <- length(unique(metadata$Sample))
sample_colors <- colorRampPalette(brewer.pal(min(12, 3), "Paired"))(n_samples)
# Colors for Cell type
if(nchar(celltype_col) > 0) {
n_celltypes <- length(unique(metadata[[celltype_col]]))
cell_type_colors <- colorRampPalette(brewer.pal(min(12, 3), "Set3"))(n_celltypes)
} else {
n_celltypes <- length(unique(metadata$CellAnnotation))
cell_type_colors <- colorRampPalette(brewer.pal(min(12, 3), "Set3"))(n_celltypes)
}
# 4. Prepare Heatmap Color Mapping
temp <- log(cnv_matrix + 0.01)
d <- parallelDist::parallelDist(temp)
hc_re <- hclust(d, method = "ward.D2")
color_thre = c(min(cnv_matrix), max(cnv_matrix))
c <- colorRampPalette(rev(RColorBrewer::brewer.pal(n = 7, name = "RdYlBu")))(100)
colormap <- circlize::colorRamp2(c(1:100) * (color_thre[2] - color_thre[1]) * 0.01 + color_thre[1], c)
# 5. Create Annotations - Hide Cell_type legend
if(nchar(group_col) > 0){
if(nchar(celltype_col) > 0){
ha = rowAnnotation(
Sample = metadata$Sample,
Group = metadata[[group_col]],
Cell_type = metadata[[celltype_col]],
Cluster = metadata[[resolution.cluster]],
col = list(
Group = setNames(group_colors, unique(metadata[[group_col]])),
Cluster = setNames(clusters_colors, unique(metadata[[resolution.cluster]])),
Sample = setNames(sample_colors, unique(metadata$Sample)),
Cell_type = setNames(cell_type_colors, unique(metadata[[celltype_col]]))
),
show_annotation_name = TRUE,
show_legend = c(TRUE, TRUE, TRUE, TRUE) # Hide Cell_type legend
)
} else {
ha = rowAnnotation(
Sample = metadata$Sample,
Group = metadata[[group_col]],
Cell_type = metadata$CellAnnotation,
Cluster = metadata[[resolution.cluster]],
col = list(
Group = setNames(group_colors, unique(metadata[[group_col]])),
Cluster = setNames(clusters_colors, unique(metadata[[resolution.cluster]])),
Sample = setNames(sample_colors, unique(metadata$Sample)),
Cell_type = setNames(cell_type_colors, unique(metadata$CellAnnotation))
),
show_annotation_name = TRUE,
show_legend = c(TRUE, TRUE, TRUE, TRUE) # Hide Cell_type legend
)
}
} else {
if(nchar(celltype_col) > 0){
ha = rowAnnotation(
Sample = metadata$Sample,
Cell_type = metadata[[celltype_col]],
Cluster = metadata[[resolution.cluster]],
col = list(
Cluster = setNames(clusters_colors, unique(metadata[[resolution.cluster]])),
Sample = setNames(sample_colors, unique(metadata$Sample)),
Cell_type = setNames(cell_type_colors, unique(metadata[[celltype_col]]))
),
show_annotation_name = TRUE,
show_legend = c(TRUE, TRUE, TRUE) # Hide Cell_type legend
)
} else {
ha = rowAnnotation(
Sample = metadata$Sample,
Cell_type = metadata$CellAnnotation,
Cluster = metadata[[resolution.cluster]],
col = list(
Cluster = setNames(clusters_colors, unique(metadata[[resolution.cluster]])),
Sample = setNames(sample_colors, unique(metadata$Sample)),
Cell_type = setNames(cell_type_colors, unique(metadata$CellAnnotation))
),
show_annotation_name = TRUE,
show_legend = c(TRUE, TRUE, TRUE) # Hide Cell_type legend
)
}
}
# 6. Plot Heatmap
ht <- Heatmap(
cnv_matrix,
name = "Copy Number",
col = colormap,
left_annotation = ha,
show_row_names = FALSE,
cluster_rows = TRUE,
cluster_columns = FALSE,
row_split = metadata[,2],
use_raster = TRUE,
row_title = NULL,
column_names_rot = 90,
show_row_dend = FALSE,
border = FALSE,
column_names_gp = gpar(fontsize = 8),
heatmap_legend_param = list(
title = "Copy number",
at = c(1, 1.5, 2, 2.5, 3),
legend_height = unit(4, "cm"),
grid_width = unit(0.5, "cm")
)
)
# 8. Save Image
invisible(pdf("./result/cnv_heatmap_annotated.pdf", width = 15, height = 10))
invisible(draw(ht))
invisible(dev.off())
draw(ht)
options(warn = 0)Visualize CNV score for each chromosome
CNV scores for each chromosomal region are graded (<1.5 as loss (1), 1.5-2.5 as normal (2), >2.5 as amplification (3)). By mapping the CNV scores of each chromosome for each cell onto a UMAP plot, one can intuitively see which cell clusters have significant chromosomal CNV on which chromosomes.
options(warn = -1)
suppressWarnings({
suppressMessages({
celltype_colors <- c(
"1" = "#7A94F4", # Blue
"2" = "#E4EAEA", # Orange
"3" = "#F769A3"
)
library(RColorBrewer)
library(scales)
base_colors <- c("#EDD496","#65D1BF","#7ADAEA","#F2AFAF","#EA8F8F",
"#C1A9D3","#F2DA9E","#D9EF78","#EF9ED9","#C9BCC6","#8BD198")
generate_cols <- function(n) {
if (n <= 0) return(character(0))
if (n <= length(base_colors)) {
base_colors[1:n]
} else {
colorRampPalette(base_colors)(n)
}
}
# Create directory to save results
output_dir <- file.path("./result", "cnv_plots")
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
# Define function to save and encode images
save_and_encode_plot <- function(plot, filename, width, height) {
# Save image file
full_path_pdf <- paste0(filename, ".pdf")
temp_png <- tempfile(fileext = ".png")
# Save PDF
ggsave(full_path_pdf, plot, width = width, height = height)
# Save temp PNG for base64 encoding
ggsave(temp_png, plot, width = width, height = height, dpi = 300)
# Convert image to base64
png_data <- readBin(temp_png, "raw", file.info(temp_png)$size)
base64_img <- base64encode(png_data)
# Delete temp PNG file
unlink(temp_png)
return(list(
base64 = base64_img,
pdf_path = full_path_pdf
))
}
# Generate cell type annotation colors
if (nchar(celltype_col) > 0) {
unique_celltype <- sort(unique(sc_filter@meta.data[[celltype_col]]))
clusters_cols <- generate_cols(length(unique_celltype))
identity_colors <- setNames(clusters_cols, unique_celltype)
celltype_umap <- DimPlot(sc_filter,
group.by = celltype_col,
cols = identity_colors,
label = TRUE,
reduction = reduction) + NoLegend()
} else {
unique_celltype <- sort(unique(sc_filter@meta.data[["CellAnnotation"]]))
clusters_cols <- generate_cols(length(unique_celltype))
identity_colors <- setNames(clusters_cols, unique_celltype)
celltype_umap <- DimPlot(sc_filter,
group.by = "CellAnnotation",
cols = identity_colors,
label = TRUE,
reduction = reduction) + NoLegend()
}
# Generate cluster annotation colors
unique_clusters <- sort(unique(sc_filter@meta.data[[resolution.cluster]]))
clusters_cols <- generate_cols(length(unique_clusters))
identity_colors <- setNames(clusters_cols, unique_clusters)
cluster_umap <- DimPlot(sc_filter,
group.by = resolution.cluster,
cols = identity_colors,
label = TRUE,
reduction = reduction) + NoLegend()
plot_filename2 <- file.path(output_dir, paste0(resolution.cluster, "_umap"))
plot_filename3 <- file.path(output_dir, paste0("celltype", "_umap"))
result2 <- save_and_encode_plot(
plot = cluster_umap,
filename = plot_filename2,
width = 5,
height = 4
)
result3 <- save_and_encode_plot(
plot = celltype_umap,
filename = plot_filename3,
width = 5,
height = 4
)
# Create dataframe for display
image_df <- data.frame(
Chromosome = character(),
CNV_UMAP = character(),
cell_culster=character(),
celltype_culster=character(),
stringsAsFactors = FALSE
)
# Initialize storage list
char <- list()
# Iterate over each chromosome column
for(i in colnames(cnv_matrix)) {
# Process single chromosome
char[[i]] <- cnv_data %>%
dplyr::select(rowname, i) %>%
mutate(
copy_number = case_when(
.data[[i]] < 1.5 ~ "1",
.data[[i]] >= 1.5 & .data[[i]] <= 2.5 ~ "2",
.data[[i]] > 2.5 ~ "3"
)
)
colnames(char[[i]]) <- c("Barcodes", paste0(i,"_scores"), i)
rownames(char[[i]]) <- char[[i]]$Barcodes
char[[i]] <- char[[i]][,-1]
sc_filter <- AddMetaData(sc_filter, metadata=char[[i]])
# Prepare plotting data
max.col=max(cnv_matrix)
#p=FeaturePlot(sc_filter, features=paste0(i,"_scores"),min.cutoff = 0,max.cutoff = max.col,raster=T)
p=DimPlot(sc_filter, group.by=i,cols=celltype_colors,reduction = reduction)
# Save image and get base64 encoding
plot_filename <- file.path(output_dir, paste0(i, "_umap"))
result <- save_and_encode_plot(
plot = p,
filename = plot_filename,
width = 5,
height = 4
)
# Add to dataframe
img_tag <- sprintf(
'<a href="%s" target="_blank">
<img src="/placeholder.svg" width="400px"/>
</a>',
result$pdf_path,
result$base64
)
img_tag2 <- sprintf(
'<a href="%s" target="_blank">
<img src="/placeholder.svg" width="400px"/>
</a>',
result2$pdf_path,
result2$base64
)
img_tag3 <- sprintf(
'<a href="%s" target="_blank">
<img src="/placeholder.svg" width="400px"/>
</a>',
result3$pdf_path,
result3$base64
)
image_df <- rbind(image_df, data.frame(
Chromosome = i,
CNV_UMAP = img_tag,
cluster_UMAP = img_tag2,
celltype_UMAP = img_tag3
))
# Also save normal PDF (same as original)
#ggsave(paste0(outdir, i, "_umap.pdf"), plot = p, width = 5, height = 4)
}
# Create interactive table to display results
# 6. Create Interactive Table
cnv_table <- DT::datatable(
image_df,
escape = FALSE,
options = list(
pageLength = 1,
autoWidth = TRUE,
scrollX = TRUE,
dom = 'Bfrtip',
server = TRUE, # Force server-side mode
deferRender = TRUE, # Defer render
scroller = TRUE, # Scroll loading
columnDefs = list(
# Only for first two columns (index 0 and 1)
list(targets = 0, width = "4%"),
list(targets = 1, width = "32%"),
list(targets = 2, width = "32%"),
list(targets = 2, width = "32%")
)
),
rownames = FALSE,
filter = 'top',
caption = htmltools::tags$caption(
style = 'caption-side: top; text-align: center; font-size: 1.2em; font-weight: bold;',
'Chromosomal Region CNV UMAP Visualization'
)
)
# Display table
cnv_table
})
})
options(warn = 0)Result Files
├── cnv_plots
│ ├── chr*_umap.pdf Displays CNV variation for each chromosome arm
└── cnv_heatmap_annotated.pdf CNV Heatmap Display
Result Directory: Includes all images and table files involved in this analysis
Literature Case Analysis
《Single-cell multi-omics sequencing uncovers region-specific plasticity of glioblastoma for complementary therapeutic targeting》
《Single-cell landscape of primary central nervous system diffuse large B-cell lymphoma》
In this study, the authors used inferCNV for CNV analysis of scRNA-seq and Copy-scAT for CNV analysis of scATAC-seq to distinguish between tumor and non-tumor cells
References
[1] Nikolic A, Singhal D, Ellestad K, et al.Copy-scAT: Deconvoluting single-cell chromatin accessibility of genetic subclones in cancer[J].Science advances, 2021, 7(42):eabg6045.DOI: 10.1126/sciadv.abg6045.
Appendix
.txt : Result data table file, tab-separated. Unix/Linux/Mac users can use less or more commands; Windows users can use advanced text editors like Notepad++ or Microsoft Excel.
.pdf : Result image file, vector graphics, can be zoomed without distortion, convenient for viewing and editing (e.g., with Adobe Illustrator) for publication.
.rds : Seurat object containing gene activity assay, needs to be opened in R environment for viewing and further analysis.
