scATAC-seq Pseudotime Analysis: Chromatin Dynamic Trajectory Resolution
Background
The ATAC_Monocle3 analysis pipeline is designed for trajectory inference and developmental dynamics research of single-cell ATAC-seq data. Based on Signac/Seurat for ATAC data preprocessing and combined with Monocle3's pseudotime analysis capabilities, this pipeline reveals chromatin accessibility trajectories during development, differentiation, or response. By constructing cell state transition paths, ATAC_Monocle3 helps resolve dynamic opening of regulatory elements, activity changes of key transcription factors, and their impact on gene regulatory networks.
Core Principles
- Trajectory Inference: Uses Monocle3 to perform dimensionality reduction and clustering on high-dimensional ATAC-seq data to reconstruct pseudotime paths of cell state transitions.
- Dynamic Accessibility Analysis: Analyzes dynamic changes in chromatin open regions along pseudotime to identify regulatory elements related to cell fate determination.
- Regulatory Network Resolution: Combines gene activity and motif enrichment to infer the roles of key transcription factors and their regulatory networks in cell development.
Purpose and Significance
- Reveal chromatin remodeling rules during cell differentiation/development
- Identify key regulatory elements and transcription factors driving cell fate transitions
- Provide dynamic epigenetic evidence for fields like disease mechanisms and developmental biology
Parameter Settings
Please first mount the cloud platform process data to be analyzed. For mounting methods, please refer to the jupyter usage tutorial.
Specific parameter entry:
- rds: The process-related rds data mounted, located in the /home/{UserName}/workspace/project/{UserName}/ directory, e.g., /home/shumeng/workspace/data/AY1752167131383/
- meta: The metadata file in the same directory as rds
- outdir: Analysis result output path
- clusters_col: Cell type column name, select the cell type or cluster label to analyze. If analyzing annotated cell types, select the corresponding label, e.g., CellAnnotation, used with celltypes.
- celltypes: Cell types, multiple selection allowed. Select cell types or cluster results to analyze, such as Monocyte and Macrophage.
- root: The cluster or celltype used as the developmental starting point.
- SAMple_col: Sample classification label, such as SAMple, which is a column name in meta.data
- SAMples: Sample names used for analysis
- use_partition: Parameter passed to Monocle3 learn_graph, controls whether to learn the trajectory graph by partition. use_partition = TRUE (recommended, default): fits trajectories within each partition independently, without cross-partition connections, avoiding forcing connections between unrelated cells.
- downSAMple: Whether to downsample
- downSAMple_num: If downsampling, how many cells to downsample per cluster or celltype
# Parameters
outdir = "/PROJ2/FLOAT/advanced-analysis/prod/17801-49246-86550-580868389543631482/_build/html/"
rds = "/PROJ2/FLOAT/advanced-analysis/prod/17801-49246-86550-580868389543631482/input.rds"
meta = "/PROJ2/FLOAT/advanced-analysis/prod/17801-49246-86550-580868389543631482/meta.txt"
root = "9"
clusters_col = "resolution.0.5_d30"
celltypes = "9,2"
sample_col = "Sample"
samples = "NSC_0722_arc,TSC_0722_arc"
use_partition = "TRUE"
downsample = "TRUE"
downsample_num = "1000"Environment Preparation
R Package Loading
Please select the common_r environment for this integrated tutorial.
suppressPackageStartupMessages({
library(Signac)
library(Seurat)
library(SeuratWrappers)
library(monocle3)
#library(cicero)
library(Matrix)
library(ggplot2)
library(patchwork)
library(parallel)
})Monocle3 Trajectory Analysis
Monocle3[1] is a single-cell omics analysis framework developed by the Cole Trapnell Lab at the University of Washington, designed to reconstruct cell state transition trajectories and calculate pseudotime. Given a user-specified root cell, Monocle3 calculates the geodesic distance from the root to each cell based on the learned principal graph, defining pseudotime as a continuous scale to characterize the process position of cells during development, differentiation, or response. This allows sorting discrete cell states to reconstruct continuous biological processes. In the context of scATAC-seq, pseudotime reflects progressive changes in chromatin accessibility patterns, helping to reveal the temporal activity of key regulatory elements and transcription factors and the dynamic remodeling of gene regulatory networks.
outdir <- paste0(outdir,'/output')
dir.create(outdir, recursive = TRUE)
celltypes <- strsplit(celltypes,",")[[1]]
samples <- strsplit(samples,",")[[1]]
use_partition <- as.logical(use_partition)
downsample <- as.logical(downsample)
downsample_num <- as.numeric(downsample_num)Data Loading
Read input objects and metadata, complete barcode alignment, cell/sample filtering, and default Assay settings to prepare data for subsequent Monocle3 trajectory analysis.
- Read RDS and import metadata, align by barcode
- Filter objects by cell type and sample
- Set default Assay to ATAC, optional downsampling to control scale
obj <- readRDS(rds)
if (meta != ""){
meta <- read.table(meta,header=T,sep='\t',check.names=F)
rownames(meta) <- meta$barcodes
obj <- AddMetaData(obj, meta)
}
obj@meta.data[[clusters_col]] <- as.character(obj@meta.data[[clusters_col]])
obj <- subset(obj, subset = !!sym(clusters_col) %in% celltypes)
obj <- subset(obj, subset = !!sym(sample_col) %in% samples)
table(obj@meta.data[[clusters_col]])
DefaultAssay(obj) <- "ATAC"
Idents(obj) <- obj@meta.data[[clusters_col]]
if (downsample){
obj <-subset(obj,cells=WhichCells(obj,downsample=downsample_num))
}
table(obj@meta.data[[clusters_col]])CDS Construction and Pseudotime Inference
This section constructs a cell_data_set suitable for Monocle3, completing trajectory learning and pseudotime inference. Choose whether to enable use_partition based on data connectivity, and set root to determine the pseudotime direction after identifying the biological starting point; enable downSAMple to improve efficiency without changing the overall trend when cell numbers are large. Outputs include the learned trajectory graph, cell pseudotime, and visualizations of key branches.```
cds <- as.cell_data_set(obj)
cds <- detect_genes(cds)
cds <- cluster_cells(cds = cds, reduction_method = "UMAP")
cds <- learn_graph(cds, use_partition = use_partition)
root_cells=rownames(obj@meta.data[obj@meta.data[[clusters_col]]==root,])
cds <- order_cells(cds, reduction_method = "UMAP", root_cells = root_cells)
saveRDS(cds, file.path(outdir, 'cds.rds'))Monocle3 Result Visualization
Display and interpret core visualizations of trajectory inference to help understand the timing and branch structure of cell state transitions.
- Trajectory Plot (UMAP + principal graph): Overlay the learned trajectory backbone on UMAP to observe continuous transitions and branch directions in low-dimensional space.
- Pseudotime Distribution (Pseudotime on UMAP): Display cell pseudotime with a color gradient to assess if the process direction from the root cell aligns with biological expectations.
- Feature Trends: Display peak/gene activity or motif activity trends along pseudotime to identify stage-specific regulatory features.
Interpretation Points:
- Whether continuity and directionality are reasonable (affected by
rootselection). - Whether branches correspond to known cell subpopulations or state transitions.
- Partitions should not have hard connections (
use_partition=TRUE); cross-partition connections require careful interpretation. - Downsampling only changes point density, not the overall trajectory shape and trend.
p1=plot_cells(
cds = cds,
color_cells_by = "pseudotime",
label_cell_groups = F,
show_trajectory_graph = T
)
p2=plot_cells(
cds = cds,
color_cells_by = "pseudotime",
label_cell_groups = F,
show_trajectory_graph = F
)
p3=plot_cells(
cds = cds,
color_cells_by = clusters_col,
show_trajectory_graph = T,
label_cell_groups = F
)
p4=plot_cells(
cds = cds,
color_cells_by = clusters_col,
show_trajectory_graph = F,
label_cell_groups = F
)
ggsave(p1+p2, file = file.path(outdir, 'pseudotime.png'), width = 12, height = 5, dpi=300)
ggsave(p3+p4, file = file.path(outdir, 'celltype.png'), width = 12, height = 5, dpi=300)
ggsave(p1+p2, file = file.path(outdir, 'pseudotime.pdf'), width = 12, height = 5)
ggsave(p3+p4, file = file.path(outdir, 'celltype.pdf'), width = 12, height = 5)input_cds_lin <- cds[,is.finite(pseudotime(cds))]
pData(input_cds_lin)$Pseudotime <- pseudotime(input_cds_lin)
pData(input_cds_lin)$cell_subtype <- cut(pseudotime(input_cds_lin), 10)
binned_input_lin <- aggregate_by_cell_bin(input_cds_lin, "cell_subtype")
acc_fits <- fit_models(binned_input_lin, cores = ifelse(detectCores() > 32, 25, trunc(0.8*detectCores())),
model_formula_str = "~Pseudotime + num_genes_expressed" )
fit_coefs <- coefficient_table(acc_fits)
significant_sites <- subset(fit_coefs, term == "Pseudotime" & q_value < .05)important_cols <- c("site_name", "estimate", "normalized_effect", "p_value", "q_value")
important_data <- significant_sites[, important_cols]
write.csv(important_data, file.path(outdir, "pseudotime_differentially_accessible_sites.csv"), row.names = FALSE)top_increasing <- significant_sites[order(-significant_sites$estimate), ][1:5, ] # Most significant increase over time
top_decreasing <- significant_sites[order(significant_sites$estimate), ][1:5, ] # Most significant decrease over time
top_increasing <- top_increasing[!is.na(top_increasing$site_name), ]
top_decreasing <- top_decreasing[!is.na(top_decreasing$site_name), ]p5=plot_accessibility_in_pseudotime(input_cds_lin[top_increasing$site_name])
p6=plot_accessibility_in_pseudotime(input_cds_lin[top_decreasing$site_name])
ggsave(p5, file = file.path(outdir, 'top_increasing.png'), width = 5, height = 8, dpi=300)
ggsave(p6, file = file.path(outdir, 'top_decreasing.png'), width = 5, height = 8, dpi=300)
ggsave(p5, file = file.path(outdir, 'top_increasing.pdf'), width = 5, height = 8)
ggsave(p6, file = file.path(outdir, 'top_decreasing.pdf'), width = 5, height = 8)4.3.1 Pseudotime Trajectory Plot Interpretation
- Each point represents a cell; black segments are the learned trajectory backbone and branch paths.
- Darker colors indicate closer to the developmental start, lighter colors indicate closer to the developmental end (pseudotime increases from dark -> light).
- Numbers in white circles are markers for root nodes.
- Cells not in the same partition as the root node do not have calculated pseudotime and are displayed in gray, typically representing outliers or disconnected components.
options(repr.plot.width = 20, repr.plot.height = 8)
print(p1+p2)4.3.2 Cell Type Trajectory Plot Interpretation
- Points and Colors: Each point represents a cell; different colors correspond to different cell groups/annotations.
- Trajectory Backbone: Black lines are the learned trajectory backbone and branch paths, indicating state transition directions.
- Branch Nodes: Black circles with numbers indicate branch nodes connecting different differentiation fates (analogous to branches).
- Terminal States: Gray circles indicate terminal differentiation states, the final result of cell fate (analogous to leaves).
- Numbering: Node numbers are random labels with no sequential meaning; differentiation direction must be judged combined with pseudotime.
options(repr.plot.width = 20, repr.plot.height = 8)
print(p3+p4)4.3.3 Accessibility Trends Along Pseudotime Interpretation
This section displays the dynamic changes of peak/gene activity along the pseudotime axis to identify early, late, and transient regulatory modules.
- Method: Sort cells by pseudotime, smooth selected features (e.g., LOESS), and draw heatmaps/mean curves.
- Interpretation: Upregulation -> late opening; Downregulation -> early opening; Biphasic -> stage-specific regulation; Branch-specific curves indicate fate differentiation differences.
- Practical Advice: Standardize features by z-score within cells; cluster trends into modules and perform GO/TF enrichment; interpret cross-partitions separately.
options(repr.plot.width = 20, repr.plot.height = 8)
print(p5+p6)Result Files
- pseudotime.pdf/png
- Pseudotime trajectory plot
- celltype.pdf/png
- Cell type trajectory plot
- top_increasing.pdf/png
- Top 5 sites with most significant accessibility increase over time
- top_decreasing.pdf/png
- Top 5 sites with most significant accessibility decrease over time
Result Directory: The directory includes all images and table files involved in this analysis
Literature Case Analysis
- Literature 1: In the literature 'Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney' [2], researchers used Monocle3 to deeply analyze proximal tubule (PT) cell subpopulations, discovering a specific PT subpopulation expressing VCAM1, which has regenerative potential after kidney injury. Through pseudotime analysis, researchers traced the dynamic process of transition from conventional PT cells to VCAM1+PT cells and identified the key role of the NF-κB signaling pathway in this transition. Additionally, they found a co-accessible region associated with the VCAM1 gene enriched with NF-κB binding sites, confirming the role of NF-κB in regulating VCAM1 expression.
References
[1] Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS, Rinn J L The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014 Apr;32(4):381-386. doi: 10.1038/nbt.2859. Epub 2014 Mar 23. PMID: 24658644; PMCID: PMC4122333.
[2] Muto Y, Wilson P C, Ledru N et al. Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney[J]. Nat Commun, 2021, 12: 2190.
Appendix
.csv/.xls/.txt: Result data table files, separated by commas or tabs. Linux/Mac users use less or more commands to view; Windows users use advanced text editors like Notepad++, or open with Microsoft Excel.
.png: Result image files, bitmap, lossless compression.
.pdf: Result image files, vector graphics, can be zoomed in/out without distortion, convenient for viewing and editing, can be edited using Adobe Illustrator for publication.
