CopyscAT Analysis: scATAC-seq-based CNV Inference and Tumor Identification

Author: SeekGene

Time: 21 min

Words: 4.0k words

Updated: 2026-02-28

Reads: 0 times

scATAC + RNA-seq Analysis Guide Notebooks Copy Number Variation Analysis

Background

Chromosomal Copy Number Variation (CNV) is a typical hallmark of tumor genomic instability, characterized by large-scale amplification or deletion of chromosomal segments. CNV drives tumor initiation, metastasis, and therapeutic resistance by altering the copy number of oncogenes (e.g., MYC) or tumor suppressor genes (e.g., TP53) and inducing gene dosage effects. Resolving CNV at single-cell resolution is critical for revealing tumor heterogeneity: relying on malignant cell-specific CNV events (e.g., Chromosome 7 amplification, Chromosome 10 deletion), tumor cells can be distinguished from non-malignant cells (e.g., T cells, fibroblasts) in complex microenvironments, thereby resolving subclone-specific epigenetic regulatory networks and their dynamic interactions with the microenvironment.

Copy-scAT is designed for CNV inference from scATAC-seq data. By integrating chromatin accessibility signals and genomic structural features, it constructs a high-precision CNV inference model at the single-cell level, enabling the identification of malignant and non-malignant cells and fine characterization of subclones.

Core Principles

Focal CNV Detection:
- Uses a 1 Mbp binning strategy to correct technical biases and identify focal amplification events (e.g., EGFR double minutes, extrachromosomal DNA/ecDNA regions).
Large-scale CNV Inference:
- Based on normalized chromosome arm-level signal intensity, combined with Gaussian mixture model clustering, to discriminate segmental (e.g., EGFR amplification) and whole chromosome arm (e.g., Chromosome 7 gain) copy number variations.
Signal Integration and Correction:
- Identifies significant signal peaks through change point analysis, combined with normalized coverage matrix, to improve sensitivity and specificity of CNV detection.

Parameter Settings

Please mount the cloud platform workflow data to be analyzed first. Refer to the Jupyter tutorial for mounting instructions.

Specific parameter filling:

rds: Mounted workflow-related rds file. RDS file is a standard Seurat object file, data has been preprocessed and multiple samples merged
meta: Mounted workflow-related meta file, corresponding to the meta.data of the workflow rds file;
species: Species information, supports human only;
SAMplenames: Sample names. Note that to be compatible with SeekArc output fragment file naming, sample names need to include A, e.g., "joint14_A", "joint22_A", cannot just be "joint14", "joint22";
reduction: Dimensionality reduction method. In visualization, maps each cell's CNV status to the reduction plot (umap/tsne/lsi/atacumap/atactsne/wnnumap/wnntsne) to intuitively compare CNV differences across cell types;
celltype_col: Cell type column name in meta.data. Select the label corresponding to the cell type or cluster to be analyzed. If analyzing annotated cell types, select the corresponding label, e.g., CellAnnotation;
group_col: Cell group column name in meta.data, sample grouping status, facilitates cell grouping display during CNV heatmap visualization
resolution_cluster: Cell cluster column name in meta.data;
minimumSegments: Minimum number of segments: Cell filtering threshold, defaults to keeping cells with ≥ 40 segments, recommended to use default parameters
methods: Two choices: auto and reference. reference means using normal cells as reference to calculate CNV, auto means no normal cell reference needed;
Normal_celltype: If methods="reference" above, select which specific cells to use as reference here. Generally recommended to use immune B cells or T cells, etc. as reference.
file_path: Fragment files of samples to be analyzed, supports multiple samples;

# Parameters
rds = "/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6349-581745468808125050/input.rds"
meta = "/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6349-581745468808125050/meta.txt"
#outdir = "/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6349-581745468808125050/output/"
species = "human"
samplenames = c("joint14_A","joint22_A")# To be compatible with SeekArc output fragment file naming, sample names need to include A, e.g., "joint14_A", "joint22_A", cannot just be "joint14", "joint22";
reduction = "wnnumap"
celltype_col = "CellAnnotation"
group_col = ""
resolution_cluster = "wknn_res.0.5_d30_l2_50"
minimumSegments = "40"
methods = "reference"
Normal_celltype = "T_cells"
file_path=c("/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6345-581728456711101050/joint14_A_fragments.tsv.gz","/PROJ2/FLOAT/advanced-analysis/test/2388-5504-6345-581728456711101050/joint22_A_fragments.tsv.gz")

Environment Preparation

Load R Packages

Please select the common_r environment for this integration tutorial

options(warn = -1)
suppressWarnings({
  suppressMessages({
      invisible(capture.output(library(CopyscAT)))
      library(GenomicRanges)
      library(Seurat, lib.loc = "/PROJ2/FLOAT/shumeng/apps/miniconda3/envs/Geneactivity/lib/R/library",quietly = TRUE)
      library(tidyverse)
      library(ggplot2)
      library(ComplexHeatmap)
      library(circlize)
      library(RColorBrewer)
      library(base64enc, quietly = TRUE)
      library(DT, quietly = TRUE)
      library(htmltools)
      library(dplyr)
      library(data.table)
      source("/PROJ2/FLOAT/advanced-analysis/Seek_module_dev/CopyscAT/Module/bin/cnv_caller_functions_packaged.R")
  })
})  
# 5. Load genome database based on species
suppressMessages({
    if(species=="human"){
        library(BSgenome.Hsapiens.UCSC.hg38, quietly = TRUE)
    } else {
        message("Please install the corresponding species database")
    }
})
options(warn = 0)

genomeFilepath="/PROJ2/FLOAT/advanced-analysis/Seek_module_dev/CopyscAT/Module/bin/"
dir.create("./result", recursive = TRUE, showWarnings = FALSE)
setOutputFile("./result","result")
resolution.cluster=resolution_cluster

if (species == "human") {
    genomeFile <- paste0(genomeFilepath, "hg38_chrom_sizes.tsv")
    cytobandFile <- paste0(genomeFilepath, "hg38_1e+06_cytoband_densities_granges.tsv")
    cpgFile = paste0(genomeFilepath, "hg38_1e+06_cpg_densities.tsv")
} else {
  stop(paste0("Error: Unsupported species '", species, "'. Supported species are human."))
}

Fragment File Preprocessing

Fragments files generated by SeekArctools or cellranger arc/ATAC contain fragment information for all barcodes, only a part of which are real cells. To save computational resources and maintain consistency with subsequent cell annotation, fragments need to be filtered before CNV analysis: only 'cell whitelist' barcodes that passed QC and are used for downstream analysis are retained. It is recommended to subset fragments based on established cell metadata/annotation tables (e.g., list of barcodes that passed QC and are included in analysis) to avoid introducing low-quality barcodes or non-cell background noise into CNV inference.

Fragment File Filtering

meta <- read.table(meta,sep="\t",header=T)  # Read mounted meta file, which contains barcode information for all samples in the mounted project data
Barcodes <- meta$barcodes # Extract all Barcodes

# Modify loop to save files separately for each sample
for(sample in samplenames){
    file_path <- grep(pattern = sample, file_paths, value = TRUE) # Note this should be file_paths (plural)
    frag <- fread(file_path, header=F)
    
    # Fix column name assignment syntax error here
    colnames(frag) <- c("chr", "start", "end", "Barcode", "reads") # Use <- for assignment
    
    frag_filter <- frag[Barcode %in% Barcodes]
    
    # Save files separately for each sample, naming format: {sample}_filtered_fragments.tsv
    output_file <- paste0("./", sample, "_filtered_fragments.tsv")
    write.table(frag_filter, 
                file = output_file, 
                quote = FALSE, 
                row.names = FALSE, 
                col.names = TRUE, # Write column names, as each file is independent
                sep = "\t")
}

Fragment File Bin Processing

Description

Before CopyscAT performs CNV analysis on scATAC-seq data, filtered fragments need further processing. Each chromosome is divided into 1000kb bins, and transposase cleavage events in each bin are counted to generate a new matrix file. Rows are cells, columns are chromosome bin IDs, e.g., chr1_0 represents merging the first 1000kb bases of chromosome 1 into a bin, '0' represents the starting base of this bin, and the value represents the total number of transposition events detected in this barcode in the corresponding bin. As shown below:
Cell_id chr1_0 chr1_1000000 chr1_2000000 chr1_3000000 chr1_4000000
GTAACCACGACCGCTAG_1 321.0 6057.0 4540.0 1942.0 2437.0
AACACACCGACCAGCGT_1 368.0 998.0 490.0 0.0 0.0
ACGTAGTATGTGAGGTC_1 363.0 1123.0 831.0 56.0 0.0
AACACACTACATGATTG_1 400.0 2629.0 476.0 658.0 172.0
TTACGCCCGACCTGCGA_1 261.0 1960.0 2450.0 1916.0 0.0
CAGGTATGCTTGACTGT_1 429.0 670.0 286.0 1018.0 0.0
AGTCAACGCTACCGACC_1 2264.0 14874.0 17016.0 11043.0 4900.0
CTTGAGACGACCGTGGT_1 921.0 4590.0 4726.0 1870.0 1255.0
CAACCATACATTTGCC_1 762.0 4810.0 6431.0 5360.0 827.0

Processing Method

Go to CopyscAT official github to copy process_fragment_file.py script and hg38_chrom_sizes.tsv file, link: https://github.com/spcdot/CopyscAT/blob/master/process_fragment_file.py https://github.com/spcdot/CopyscAT/blob/master/hg38_references/hg38_chrom_sizes.tsv

Then set parameters as required, for example:

python ./process_fragment_file.py -i ./ ${S A M p l e}_{f} i l t e r e d_{f} r a g m e n t s . t s v - o . /$ {SAMple}_filtered_output_1000kb.tsv -b 1000000 -f 0 -g ./hg38_chrom_sizes.tsv

CNV Analysis

Based on the aforementioned background and parameter settings, we begin executing the Copy-scAT CNV inference workflow. This workflow converts scATAC-seq fragment data into a genome-wide fixed-window coverage matrix and resolves chromosomal copy number variations at the single-cell level through a multi-level analysis strategy.

Execution Steps Overview

Data Input: Use the processing result file ${SAMple}_filtered_output_1000kb.tsv from the previous part as input, supports batch processing of multiple samples.
Focal CNV Detection: Identifies focal amplification events based on significant signal peaks (e.g., extrachromosomal DNA/ecDNA regions) in the normalized coverage matrix via change point analysis.
Large-scale CNV Inference: Uses normalized chromosome arm-level signal intensity combined with GMM clustering to distinguish segmental (e.g., EGFR amplification) and whole-arm (e.g., Chr 7 gain) CNVs.
Result Integration: Distinguishes malignant from non-malignant cell populations, outputs single-cell CNV atlas and subclonal epigenetic feature association analysis.

Key Parameters Description

Binning Strategy: Adopts a 1 Mbp binning strategy to balance CNV resolution and statistical robustness
Signal Correction: Corrects technical biases through normalization to improve CNV detection accuracy
Clustering Analysis: Combines Gaussian Mixture Models to support joint detection of focal and large-scale CNVs
Quality Control: Automatically filters low-quality signals to ensure result reliability

suppressWarnings({
    suppressMessages({
        sc=readRDS(rds)
        meta=read.table(meta, sep = "\t", header = T)
        rownames(meta)=meta$barcodes
        sc=AddMetaData(sc, metadata=meta)
        sc@meta.data$Sample <- gsub("_arc$", "_A", sc@meta.data$Sample)
        sc@meta.data$num <- sub("^.*_(\\d+)$", "\\1", sc@meta.data$barcodes)
    })
})

if (length(samplenames) > 1) {
    inputdata <- list()
    # Dynamically generate cellSuffix based on samplename length
    suffixes <- paste0("_", seq_along(samplenames))  # Generate e.g. "_1", "_2", ..., "_n"
    # Initialize environment (adjust cellSuffix dynamically based on sample count)
    initialiseEnvironment(
        genomeFile = genomeFile,
        cytobandFile = cytobandFile,
        cpgFile = cpgFile,
        binSize = 1e6,
        minFrags = 1e4,
        cellSuffix = suffixes,  # Dynamically pass suffix
        lowerTrim = 0.5,
        upperTrim = 0.8
        )
    for (i in seq_along(samplenames)) {
        inputdata[[i]] <- readInputTable(paste0(outdir, samplenames[[i]], "_output_1000kb.tsv"))
        inputdata[[i]]$Barcode=rownames(inputdata[[i]])
    }
    scData <- rbindlist(inputdata)
    scData <- as.data.frame(scData)
    rownames(scData)=scData$Barcode
    #setDT(scData, keep.rownames = TRUE) 
    scData$Barcode=NULL
} else {
    # Use "_1" directly for single sample
    initialiseEnvironment(
        genomeFile = genomeFile,
        cytobandFile = cytobandFile,
        cpgFile = cpgFile,
        binSize = 1e6,
        minFrags = 1e4,
        cellSuffix = "_1",  # Fixed suffix for single sample
        lowerTrim = 0.5,
        upperTrim = 0.8
    )
    scData <- readInputTable(paste0(outdir, samplenames, "_output_1000kb.tsv"))
    rownames(scData) <- paste0(rownames(scData), "_1")
    sc@meta.data$barcodes=paste0(sc@meta.data$barcodes,"_1")
}

#PART 1: INITIAL DATA NORMALIZATION
#step 1 normalize the matrix
suppressWarnings({
  suppressMessages({
      scData_k_norm <- normalizeMatrixN(scData,logNorm = FALSE,maxZero=2000,imputeZeros = FALSE,blacklistProp = 0.8,blacklistCutoff=125,dividingFactor=1,upperFilterQuantile = 0.95)
      #collapse into chromosome arm level
      summaryFunction<-cutAverage
      scData_collapse<-collapseChrom3N(scData_k_norm,summaryFunction=summaryFunction,binExpand = 1,minimumChromValue = 100,logTrans = FALSE,tssEnrich = 1,logBase=2,minCPG=300,powVal=0.73) 
        })
})

options(repr.plot.width = 12, repr.plot.height = 6)
suppressWarnings({
  suppressMessages({
      #apply additional filters
      scData_collapse<-filterCells(scData_collapse,minimumSegments = minimumSegments,minDensity =0.1)
      cells=rownames(sc@meta.data[which(sc@meta.data$barcodes %in% colnames(scData_collapse)),])
      sc_filter=subset(sc, cells=cells)
      #show unscaled chromosome list
      graphCNVDistribution(scData_collapse,outputSuffix = ("_violinsn2"))
      #compute centers
      median_iqr <- computeCenters(scData_collapse,summaryFunction=summaryFunction)
              })
})

options(warn = -1)
suppressWarnings({
  suppressMessages({
      nmf_results<-identifyNonNeoplastic(scData_collapse,methodHclust="ward.D",cutHeight = 0.4)
      #?identifyNonNeoplastic
      write.table(x=rownames_to_column(data.frame(nmf_results$cellAssigns),var="Barcode"),file=str_c(scCNVCaller$locPrefix,scCNVCaller$outPrefix,"_nmf_clusters.csv"),quote=FALSE,row.names = FALSE,sep=",")
      #print(paste("Normal cluster is: ",nmf_results$clusterNormal))
                })
}) 
options(warn = 0)

options(warn = -1)
suppressWarnings({
  suppressMessages({
      #PART 2: ASSESSMENT OF CHROMOSOME-LEVEL CNVs 
      if(methods == "auto"){
          #print("methods is auto")
          #OPTION 1: identify chromosome-level amplifications using all cells to generate 'normal' control
          #identify chromosome-level amplifications
          candidate_cnvs<-identifyCNVClusters(scData_collapse,median_iqr,useDummyCells = TRUE,propDummy=0.25,minMix=0.01,deltaMean = 0.03,deltaBIC2 = 0.25,bicMinimum = 0.1, subsetSize=600,fakeCellSD = 0.08, uncertaintyCutoff = 0.55,summaryFunction=summaryFunction,maxClust = 4,mergeCutoff = 3,IQRCutoff= 0.2,medianQuantileCutoff = 0.4)
          #cleanup step
          candidate_cnvs_clean<-clusterCNV(initialResultList = candidate_cnvs,medianIQR = candidate_cnvs[[3]],minDiff=1.5) #= 1.5)
          #final results and annotation
          final_cnv_list<-annotateCNV4(candidate_cnvs_clean, saveOutput=TRUE,outputSuffix = "clean_cnv",sdCNV = 0.5,filterResults=TRUE,filterRange=0.8)
      
      }else if(methods == "reference"){
          #print("methods is reference")
          Idents(sc_filter)=sc_filter@meta.data[[celltype_col]]
          table(Idents(sc_filter))
          Normal_sc=subset(sc_filter, idents=Normal_celltype)
          Normal_cells=Normal_sc@meta.data$barcodes
          median_iqr <- computeCenters(scData_collapse %>% dplyr::select(chrom,Normal_cells),summaryFunction=summaryFunction)
          #setting medianQuantileCutoff to -1 and feeding non-neoplastic barcodes in as normalCells can improve accuracy of CNV calls
          candidate_cnvs<-identifyCNVClusters(scData_collapse,median_iqr,useDummyCells = TRUE,propDummy=0.25,minMix=0.01,deltaMean = 0.03,deltaBIC2 = 0.25,bicMinimum = 0.1, subsetSize=800,fakeCellSD = 0.09, uncertaintyCutoff = 0.65,summaryFunction=summaryFunction,maxClust = 4,mergeCutoff = 3,IQRCutoff = 0.25,medianQuantileCutoff = -1,normalCells=Normal_cells) 
          candidate_cnvs_clean<-clusterCNV(initialResultList = candidate_cnvs,medianIQR = candidate_cnvs[[3]],minDiff=1.0) #= 1.5)          
          #to save this data you can use annotateCNV4 as per usual
          #final_cnv_list<-annotateCNV4(candidate_cnvs_clean, saveOutput=TRUE,outputSuffix = "clean_cnv",sdCNV = 0.6,filterResults=TRUE,filterRange=0.4)
          final_cnv_list<-annotateCNV4B(candidate_cnvs_clean, Normal_cells, saveOutput=TRUE,outputSuffix = "clean_cnv",sdCNV = 0.6,filterResults=TRUE,filterRange=0.4,minAlteredCellProp = 0.5)

      }
          })
}) 
options(warn = 0)

Heatmap Visualization

Heatmap visualization of cell CNV status. Copy number values around 2 indicate normal diploidy; values between 1-2 indicate chromosomal loss (closer to 1 means more severe loss); values between 2-3 indicate chromosomal amplification (closer to 3 means more severe amplification). The left sidebar of the heatmap arranges cells by cell type and clustering. Each row represents a cell, each column a chromosome arm. Heatmap color represents CNV score: redder means more severe amplification, bluer means more severe loss.

options(warn = -1)
ht_opt$message <- FALSE

# 1. Read Data
cnv_data <- final_cnv_list[[3]]
cnv_matrix <- cnv_data %>%
    tibble::column_to_rownames("rowname") %>%
    as.matrix()

# 2. Extract Metadata
if(nchar(group_col) > 0){
  if(nchar(celltype_col) > 0){
    metadata <- sc_filter@meta.data[,c("raw_Sample", celltype_col, "barcode", resolution.cluster, group_col)]
  } else {
    metadata <- sc_filter@meta.data[,c("raw_Sample", "CellAnnotation", "barcode", resolution.cluster, group_col)]
  }
} else {
  if(nchar(celltype_col) > 0){
    metadata <- sc_filter@meta.data[,c("raw_Sample", celltype_col, "barcode", resolution.cluster)]
  } else {
    metadata <- sc_filter@meta.data[,c("raw_Sample", "CellAnnotation", "barcode", resolution.cluster)]
  }
}

# 3. Dynamically Generate Colors - Fix all colorRampPalette functions
# Colors for Group
if(nchar(group_col) > 0) {
  n_groups <- length(unique(metadata[[group_col]]))
  group_colors <- colorRampPalette(brewer.pal(min(9, 3), "Set1"))(n_groups)
}

# Colors for Cluster
n_clusters <- length(unique(metadata[[resolution.cluster]]))
clusters_colors <- colorRampPalette(brewer.pal(min(8, 3), "Set2"))(n_clusters)

# Colors for Sample
n_samples <- length(unique(metadata$Sample))
sample_colors <- colorRampPalette(brewer.pal(min(12, 3), "Paired"))(n_samples)

# Colors for Cell type
if(nchar(celltype_col) > 0) {
  n_celltypes <- length(unique(metadata[[celltype_col]]))
  cell_type_colors <- colorRampPalette(brewer.pal(min(12, 3), "Set3"))(n_celltypes)
} else {
  n_celltypes <- length(unique(metadata$CellAnnotation))
  cell_type_colors <- colorRampPalette(brewer.pal(min(12, 3), "Set3"))(n_celltypes)
}

# 4. Prepare Heatmap Color Mapping
temp <- log(cnv_matrix + 0.01)
d <- parallelDist::parallelDist(temp)
hc_re <- hclust(d, method = "ward.D2")
color_thre = c(min(cnv_matrix), max(cnv_matrix))
c <- colorRampPalette(rev(RColorBrewer::brewer.pal(n = 7, name = "RdYlBu")))(100)
colormap <- circlize::colorRamp2(c(1:100) * (color_thre[2] - color_thre[1]) * 0.01 + color_thre[1], c)

# 5. Create Annotations - Hide Cell_type legend
if(nchar(group_col) > 0){
    if(nchar(celltype_col) > 0){
        ha = rowAnnotation(
            Sample = metadata$Sample,
            Group = metadata[[group_col]],
            Cell_type = metadata[[celltype_col]],
            Cluster = metadata[[resolution.cluster]],
            col = list(
                Group = setNames(group_colors, unique(metadata[[group_col]])),
                Cluster = setNames(clusters_colors, unique(metadata[[resolution.cluster]])),
                Sample = setNames(sample_colors, unique(metadata$Sample)),
                Cell_type = setNames(cell_type_colors, unique(metadata[[celltype_col]]))
            ),
            show_annotation_name = TRUE,
            show_legend = c(TRUE, TRUE, TRUE, TRUE)  # Hide Cell_type legend
        )
    } else {
        ha = rowAnnotation(
            Sample = metadata$Sample,
            Group = metadata[[group_col]],
            Cell_type = metadata$CellAnnotation,
            Cluster = metadata[[resolution.cluster]],
            col = list(
                Group = setNames(group_colors, unique(metadata[[group_col]])),
                Cluster = setNames(clusters_colors, unique(metadata[[resolution.cluster]])),
                Sample = setNames(sample_colors, unique(metadata$Sample)),
                Cell_type = setNames(cell_type_colors, unique(metadata$CellAnnotation))
            ),
            show_annotation_name = TRUE,
            show_legend = c(TRUE, TRUE, TRUE, TRUE)  # Hide Cell_type legend
        )
    }
} else {
    if(nchar(celltype_col) > 0){
        ha = rowAnnotation(
            Sample = metadata$Sample,
            Cell_type = metadata[[celltype_col]],
            Cluster = metadata[[resolution.cluster]],
            col = list(
                Cluster = setNames(clusters_colors, unique(metadata[[resolution.cluster]])),
                Sample = setNames(sample_colors, unique(metadata$Sample)),
                Cell_type = setNames(cell_type_colors, unique(metadata[[celltype_col]]))
            ),
            show_annotation_name = TRUE,
            show_legend = c(TRUE, TRUE, TRUE)  # Hide Cell_type legend
        )
    } else {
        ha = rowAnnotation(
            Sample = metadata$Sample,
            Cell_type = metadata$CellAnnotation,
            Cluster = metadata[[resolution.cluster]],
            col = list(
                Cluster = setNames(clusters_colors, unique(metadata[[resolution.cluster]])),
                Sample = setNames(sample_colors, unique(metadata$Sample)),
                Cell_type = setNames(cell_type_colors, unique(metadata$CellAnnotation))
            ),
            show_annotation_name = TRUE,
            show_legend = c(TRUE, TRUE, TRUE)  # Hide Cell_type legend
        )
    }
}

# 6. Plot Heatmap
ht <- Heatmap(
    cnv_matrix,
    name = "Copy Number",
    col = colormap,
    left_annotation = ha,
    show_row_names = FALSE,
    cluster_rows = TRUE,
    cluster_columns = FALSE,
    row_split = metadata[,2],
    use_raster = TRUE,
    row_title = NULL,
    column_names_rot = 90,
    show_row_dend = FALSE,
    border = FALSE,
    column_names_gp = gpar(fontsize = 8),
    heatmap_legend_param = list(
        title = "Copy number",
        at = c(1, 1.5, 2, 2.5, 3),
        legend_height = unit(4, "cm"),
        grid_width = unit(0.5, "cm")
    )
)
# 8. Save Image
invisible(pdf("./result/cnv_heatmap_annotated.pdf", width = 15, height = 10))
invisible(draw(ht))
invisible(dev.off())
draw(ht)
options(warn = 0)

Visualize CNV score for each chromosome

CNV scores for each chromosomal region are graded (<1.5 as loss (1), 1.5-2.5 as normal (2), >2.5 as amplification (3)). By mapping the CNV scores of each chromosome for each cell onto a UMAP plot, one can intuitively see which cell clusters have significant chromosomal CNV on which chromosomes.

options(warn = -1)
suppressWarnings({
  suppressMessages({

    celltype_colors <- c(
        "1" = "#7A94F4",          # Blue
        "2" = "#E4EAEA", # Orange
        "3" = "#F769A3"
    )
      library(RColorBrewer)
      library(scales)  
      base_colors <- c("#EDD496","#65D1BF","#7ADAEA","#F2AFAF","#EA8F8F",
                 "#C1A9D3","#F2DA9E","#D9EF78","#EF9ED9","#C9BCC6","#8BD198")

      generate_cols <- function(n) {
          if (n <= 0) return(character(0))
          if (n <= length(base_colors)) {
            base_colors[1:n]
          } else {
              colorRampPalette(base_colors)(n)
          }
      }
      # Create directory to save results
      output_dir <- file.path("./result", "cnv_plots")
      dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

      # Define function to save and encode images
      save_and_encode_plot <- function(plot, filename, width, height) {
      # Save image file
      full_path_pdf <- paste0(filename, ".pdf")
      temp_png <- tempfile(fileext = ".png")
      
      # Save PDF
      ggsave(full_path_pdf, plot, width = width, height = height)
      # Save temp PNG for base64 encoding
      ggsave(temp_png, plot, width = width, height = height, dpi = 300)
      
      # Convert image to base64
      png_data <- readBin(temp_png, "raw", file.info(temp_png)$size)
      base64_img <- base64encode(png_data)
      
      # Delete temp PNG file
      unlink(temp_png)
      
      return(list(
        base64 = base64_img,
        pdf_path = full_path_pdf
      ))
    }
      # Generate cell type annotation colors
      if (nchar(celltype_col) > 0) {
      unique_celltype <- sort(unique(sc_filter@meta.data[[celltype_col]]))
      clusters_cols <- generate_cols(length(unique_celltype))
      identity_colors <- setNames(clusters_cols, unique_celltype)
      celltype_umap <- DimPlot(sc_filter, 
                          group.by = celltype_col, 
                          cols = identity_colors,
                          label = TRUE,
                          reduction = reduction) + NoLegend()
      } else {
          unique_celltype <- sort(unique(sc_filter@meta.data[["CellAnnotation"]]))
          clusters_cols <- generate_cols(length(unique_celltype))
          identity_colors <- setNames(clusters_cols, unique_celltype)
          celltype_umap <- DimPlot(sc_filter,
                          group.by = "CellAnnotation",
                          cols = identity_colors,
                          label = TRUE,
                          reduction = reduction) + NoLegend()
      }
      # Generate cluster annotation colors
      unique_clusters <- sort(unique(sc_filter@meta.data[[resolution.cluster]]))
      clusters_cols <- generate_cols(length(unique_clusters))
      identity_colors <- setNames(clusters_cols, unique_clusters)
      cluster_umap <- DimPlot(sc_filter,
                       group.by = resolution.cluster,
                       cols = identity_colors,
                       label = TRUE,
                       reduction = reduction) + NoLegend()
      plot_filename2 <- file.path(output_dir, paste0(resolution.cluster, "_umap"))
      plot_filename3 <- file.path(output_dir, paste0("celltype", "_umap"))
      result2 <- save_and_encode_plot(
          plot = cluster_umap,
          filename = plot_filename2,
          width = 5,
          height = 4
      )
      result3 <- save_and_encode_plot(
          plot = celltype_umap,
          filename = plot_filename3,
          width = 5,
          height = 4
      )
      # Create dataframe for display
      image_df <- data.frame(
        Chromosome = character(),
        CNV_UMAP = character(),
        cell_culster=character(),
        celltype_culster=character(),
        stringsAsFactors = FALSE
      )
      # Initialize storage list
      char <- list()
      # Iterate over each chromosome column
      for(i in colnames(cnv_matrix)) {
          # Process single chromosome
          char[[i]] <- cnv_data %>%
          dplyr::select(rowname, i) %>%
          mutate(
              copy_number = case_when(
              .data[[i]] < 1.5 ~ "1",
              .data[[i]] >= 1.5 & .data[[i]] <= 2.5 ~ "2",
              .data[[i]] > 2.5 ~ "3"
              )
          )
          colnames(char[[i]]) <- c("Barcodes", paste0(i,"_scores"), i)
          rownames(char[[i]]) <- char[[i]]$Barcodes
          char[[i]] <- char[[i]][,-1]
          sc_filter <- AddMetaData(sc_filter, metadata=char[[i]])
      
          # Prepare plotting data
          max.col=max(cnv_matrix)
          #p=FeaturePlot(sc_filter, features=paste0(i,"_scores"),min.cutoff = 0,max.cutoff = max.col,raster=T)
          p=DimPlot(sc_filter, group.by=i,cols=celltype_colors,reduction = reduction)
          # Save image and get base64 encoding
          plot_filename <- file.path(output_dir, paste0(i, "_umap"))
          result <- save_and_encode_plot(
            plot = p,
            filename = plot_filename,
            width = 5,
            height = 4
          )

          # Add to dataframe
          img_tag <- sprintf(
            '<a href="%s" target="_blank">
              <img src="/placeholder.svg" width="400px"/>
            </a>',
            result$pdf_path,
            result$base64
          )
        
          img_tag2 <- sprintf(
            '<a href="%s" target="_blank">
              <img src="/placeholder.svg" width="400px"/>
            </a>',
            result2$pdf_path,
            result2$base64
          )
           img_tag3 <- sprintf(
            '<a href="%s" target="_blank">
              <img src="/placeholder.svg" width="400px"/>
            </a>',
            result3$pdf_path,
            result3$base64
          )
          image_df <- rbind(image_df, data.frame(
            Chromosome = i,
            CNV_UMAP = img_tag,
            cluster_UMAP = img_tag2,
            celltype_UMAP = img_tag3
          ))
      
          # Also save normal PDF (same as original)
          #ggsave(paste0(outdir, i, "_umap.pdf"), plot = p, width = 5, height = 4)
        }
    
         # Create interactive table to display results
         # 6. Create Interactive Table
         cnv_table <- DT::datatable(
           image_df,
           escape = FALSE,
           options = list(
             pageLength = 1,
             autoWidth = TRUE,
             scrollX = TRUE,
             dom = 'Bfrtip',
             server = TRUE,       # Force server-side mode
             deferRender = TRUE,  # Defer render
             scroller = TRUE,     # Scroll loading
             columnDefs = list(
                   # Only for first two columns (index 0 and 1)
                   list(targets = 0, width = "4%"),
                   list(targets = 1, width = "32%"),
                   list(targets = 2, width = "32%"),
                   list(targets = 2, width = "32%")
               )
           ),
           rownames = FALSE,
           filter = 'top',
           caption = htmltools::tags$caption(
             style = 'caption-side: top; text-align: center; font-size: 1.2em; font-weight: bold;',
             'Chromosomal Region CNV UMAP Visualization'
           )
         )
    
         # Display table
         cnv_table
        })
})
options(warn = 0)

Result Files

├── cnv_plots
│ ├── chr*_umap.pdf Displays CNV variation for each chromosome arm
└── cnv_heatmap_annotated.pdf CNV Heatmap Display

Result Directory: Includes all images and table files involved in this analysis

Literature Case Analysis

《Single-cell multi-omics sequencing uncovers region-specific plasticity of glioblastoma for complementary therapeutic targeting》
《Single-cell landscape of primary central nervous system diffuse large B-cell lymphoma》
In this study, the authors used inferCNV for CNV analysis of scRNA-seq and Copy-scAT for CNV analysis of scATAC-seq to distinguish between tumor and non-tumor cells

References

[1] Nikolic A, Singhal D, Ellestad K, et al.Copy-scAT: Deconvoluting single-cell chromatin accessibility of genetic subclones in cancer[J].Science advances, 2021, 7(42):eabg6045.DOI: 10.1126/sciadv.abg6045.

Appendix

.txt : Result data table file, tab-separated. Unix/Linux/Mac users can use less or more commands; Windows users can use advanced text editors like Notepad++ or Microsoft Excel.

.pdf : Result image file, vector graphics, can be zoomed without distortion, convenient for viewing and editing (e.g., with Adobe Illustrator) for publication.

.rds : Seurat object containing gene activity assay, needs to be opened in R environment for viewing and further analysis.

CopyscAT Analysis: scATAC-seq-based CNV Inference and Tumor Identification ​

Background ​

Core Principles ​

Parameter Settings ​

Environment Preparation ​

Fragment File Preprocessing ​

Fragment File Filtering ​

Fragment File Bin Processing ​

Description ​

Processing Method ​

CNV Analysis ​

Execution Steps Overview ​

Key Parameters Description ​

Heatmap Visualization ​

Visualize CNV score for each chromosome ​

Result Files ​

Literature Case Analysis ​

References ​

Appendix ​