
Standard Analysis for Single-cell scATAC-seq & scRNA-seq Multi-omics: Motif Analysis

Author: SeekGene
Updated: 2025-11-20

Document Overview

Motif analysis is a crucial step in deciphering transcription factor regulatory networks in single-cell ATAC-seq data. Common analytical strategies include: identifying enriched motifs within differentially accessible peak sets and comparing motif activity among different cell populations. This document systematically introduces the basic concepts of motifs, methods for motif enrichment analysis within differentially accessible peaks, and approaches for analyzing differential motif activity.

1. Core Concept: What is a Motif?

(1) Definition of Motif

Transcription factors generally bind to open DNA regions (peaks) to exert regulatory effects, and each transcription factor has preferred binding sequences (for example, SP1 prefers GC-rich sequences). These sequence patterns are known as motifs. A motif is thus the short DNA sequence pattern that a transcription factor recognizes at its binding sites, typically 6-20 base pairs long, as illustrated below:

(2) Motif Databases

To date, several authoritative resources have consolidated and curated motif information from various species, providing structured and comprehensive input for downstream analysis. Commonly used motif resources include JASPAR, HOMER, MEME Suite, and TRANSFAC. Among these, JASPAR is one of the most widely used open-access databases of transcription factor binding profiles and contains motifs from multiple species.
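Databases such as JASPAR distribute each motif as a position frequency matrix (PFM): the counts of A, C, G, and T observed at each position of the binding site. The following is a minimal base-R sketch of how a PFM can be turned into a log-odds position weight matrix and used to score candidate sites. The counts are made up for illustration and are not a real JASPAR entry; the pseudocount of 0.25, the uniform background, and the helper name `score_site` are assumptions.

```r
# Toy position frequency matrix (hypothetical counts, 20 sites, 4 positions)
pfm <- matrix(
  c( 2, 18,  0,  1,   # A
     1,  0, 19, 17,   # C
    16,  1,  1,  1,   # G
     1,  1,  0,  1),  # T
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), paste0("pos", 1:4))
)

# Convert counts to probabilities (with a small pseudocount to avoid log(0))
pseudocount <- 0.25
ppm <- sweep(pfm + pseudocount, 2, colSums(pfm + pseudocount), "/")

# Log-odds weights against an assumed uniform background (0.25 per base)
pwm <- log2(ppm / 0.25)

# Most frequent base at each position
consensus <- paste(rownames(pfm)[apply(pfm, 2, which.max)], collapse = "")

# Hypothetical helper: sum the weights of the bases in a candidate site
score_site <- function(site, pwm) {
  bases <- strsplit(site, "")[[1]]
  stopifnot(length(bases) == ncol(pwm), all(bases %in% rownames(pwm)))
  sum(pwm[cbind(match(bases, rownames(pwm)), seq_along(bases))])
}

score_site(consensus, pwm)  # high score: matches the motif
score_site("TTTT", pwm)     # low score: poor match
```

Motif scanners apply the same idea along every peak sequence and report positions whose score exceeds a threshold.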

TIP

In the early days, motif identification mainly relied on experimental techniques (such as ChIP-seq and SELEX) or bioinformatics predictions, resulting in scattered and limited sets of transcription factor binding sequences. With research advances and the accumulation of extensive experimental data, motif information has become increasingly rich and standardized. As a result, motif collections are now relatively comprehensive and can be obtained directly from the relevant databases for many commonly studied species.

2. Motif Enrichment Analysis

(1) What is Motif Enrichment?

Motif enrichment refers to calculating the frequency of specific motifs in particular genomic regions (such as differentially accessible peaks) and comparing these frequencies to those in background peaks (e.g., all peaks or matched controls). If a motif appears significantly more frequently in the set of differential peaks than in the background, it is considered "enriched" in those regions.

In simpler terms, motif enrichment means counting how many peaks in a region set of interest (such as differential peaks) contain a certain motif and comparing that count with the background set. If the motif occurs more often in the differential peaks than in the background, it is considered enriched.

Applications include:

  • Comparing peak accessibility across cell types or states to find peaks that are specifically open in one cell type or condition, then analyzing the motifs enriched in those peaks to infer key regulatory factors.
  • Performing motif enrichment analysis on differential peaks (e.g., disease group vs. control group) to discover TFs (transcription factors) associated with abnormal regulation, aiding biological interpretation and functional annotation.

TIP

For a more intuitive understanding of motif enrichment analysis on differential peaks, you may refer to practical case studies such as Single-cell ATAC_RNA Multi-omics Differential Peaks Analysis.

(2) Implementation of Motif Enrichment Analysis:

In Signac, for example, the AddMotifs() function adds motif information to a Seurat object, and FindMotifs() performs the motif enrichment analysis. Below is a key code example (using the human genome, hg38):

r
library(Signac)
library(Seurat)
library(JASPAR2022)
library(TFBSTools)
library(BSgenome.Hsapiens.UCSC.hg38) # Or adjust according to your species

# 1. Load the motif database (JASPAR2022 CORE vertebrate collection, which includes human)
pfm <- getMatrixSet(
  x = JASPAR2022,
  opts = list(collection = "CORE", tax_group = "vertebrates", all_versions = FALSE)
)

# 2. Add motif information to the ATAC assay of the object (here named obj)
DefaultAssay(obj) <- "ATAC"
obj <- AddMotifs(
  object = obj,                  # Your Seurat/Signac object
  genome = BSgenome.Hsapiens.UCSC.hg38,
  pfm = pfm
)

# 3. Perform motif enrichment analysis on the peaks of interest.
#    By default, FindMotifs() samples a random background set of peaks from obj (background = 40000).
motif.results <- FindMotifs(
  object = obj,
  features = diff_peak    # character vector of differential peak names from differential analysis
)

head(motif.results)
motif     observed  background  pct.obs  pct.bkg  fold.enrich  pvalue  motif.name  p.adjust
MA0497.1       556        8315    49.16    20.79        2.365       0  MEF2C              0
MA0052.4       533        7948    47.13    19.87        2.372       0  MEF2A              0
MA0773.1       398        4930    35.19    12.33        2.855       0  MEF2D              0
MA0660.1       345        4002    30.50    10.01        3.049       0  MEF2B              0
MA1151.1       286        3206    25.29     8.02        3.155       0  RORC               0
MA0592.3       350        4669    30.95    11.67        2.651       0  ESRRA              0

motif: The identifier of the motif in the database, typically representing the binding sequence of a specific transcription factor.
observed: The number of target peaks (e.g., differential peaks) that contain this motif.
background: The number of background peaks (e.g., control group or all peaks) that contain this motif.
pct.obs: The percentage of target peaks containing this motif (calculated as observed / total number of target peaks × 100%).
pct.bkg: The percentage of background peaks containing this motif (calculated as background / total number of background peaks × 100%).
fold.enrich: Fold enrichment of the motif in the target peaks, calculated as pct.obs / pct.bkg; higher values indicate stronger motif enrichment in differential peaks.
pvalue: The raw p-value for motif enrichment (Signac uses a hypergeometric test); a smaller value suggests that the enrichment is less likely to occur by chance. A value shown as 0 means the p-value is too small to be represented numerically.
motif.name: The name of the transcription factor corresponding to the motif, allowing for direct identification of regulatory factors.
p.adjust: The p-value adjusted for multiple hypothesis testing (Benjamini-Hochberg FDR correction by default); p.adjust < 0.05 is often considered significantly enriched.

These metrics can be used to determine which motifs (and their corresponding transcription factors) are significantly and strongly enriched in differential peaks, thus revealing key TFs potentially involved in regulation.
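The arithmetic behind these columns can be reproduced in base R. The sketch below takes the first row of the example output (observed = 556, background = 8315) and assumes 1,131 target peaks and 40,000 background peaks; those totals are inferred from the reported percentages and are not stated in the output itself. The p-value uses a hypergeometric upper tail, which is the kind of test FindMotifs() is documented to use, but this is an illustration rather than Signac's exact code path.

```r
# Counts from the first row of the example output (MA0497.1)
observed     <- 556     # target peaks containing the motif
background   <- 8315    # background peaks containing the motif

# Totals inferred from the reported percentages (assumption)
n_target     <- 1131
n_background <- 40000

pct_obs     <- 100 * observed / n_target
pct_bkg     <- 100 * background / n_background
fold_enrich <- pct_obs / pct_bkg

# Probability of seeing at least `observed` motif-containing peaks when drawing
# n_target peaks at random from the background
pvalue <- phyper(
  q = observed - 1,
  m = background,
  n = n_background - background,
  k = n_target,
  lower.tail = FALSE
)

round(c(pct.obs = pct_obs, pct.bkg = pct_bkg, fold.enrich = fold_enrich), 3)
pvalue
```

With counts this extreme the p-value is vanishingly small, which is why the example output prints it as 0.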

TIP

Background peaks:

  • In motif enrichment analysis, background peaks refer to a set of peaks used as a control, which usually represent all peaks or peaks with no significant differences. Their function is to provide a baseline for the frequency of a given motif in the peaks of interest (such as differential peaks).
  • By comparing the level of enrichment of a motif in differential peaks versus background peaks, one can determine whether the motif is significantly enriched under a specific biological condition.
  • The FindMotifs() function takes a background argument, which can be either a vector of peak names to use as the background set or a number of peaks to select at random (40,000 by default).
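As a hedged sketch of the two ways to set the background, the wrapper below simply forwards a background set to FindMotifs(). The helper name `find_motifs_with_background` is hypothetical and not part of Signac, and the objects named in the commented calls (`obj`, `diff_peak`, `stable_peaks`) are assumed to exist in your session.

```r
# Hypothetical helper (not part of Signac): run FindMotifs() with an explicit background.
# `background_peaks` may be a character vector of peak names or a single number of
# peaks to sample at random.
find_motifs_with_background <- function(object, target_peaks, background_peaks) {
  Signac::FindMotifs(
    object = object,
    features = target_peaks,
    background = background_peaks
  )
}

# Example calls (not run here):
# find_motifs_with_background(obj, diff_peak, stable_peaks)  # explicit peak set
# find_motifs_with_background(obj, diff_peak, 20000)         # 20,000 random peaks
```

An explicit background of peaks that do not change between conditions answers a slightly different question than a random sample of all peaks, so it is worth recording which one was used.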

3. Motif Activity Analysis

(1) What is Motif Activity?

  • Motif activity is a per-cell estimate of how accessible the peaks containing a given transcription factor motif are, and is used as a proxy for the potential of that factor to bind and regulate in the cell.
  • Unlike motif enrichment analysis, motif activity does not simply count occurrences of a motif in a peak set. Instead, it considers which peaks contain the motif and how accessible those peaks are in each cell.
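To make the idea concrete, here is a deliberately simplified base-R sketch of a raw accessibility deviation for one motif. It is not chromVAR's full algorithm, which additionally corrects for GC content and average accessibility using matched background peak sets and reports z-scores. The count matrix and the motif annotation are made up for illustration.

```r
# Toy peak-by-cell count matrix (hypothetical data): 6 peaks, 4 cells
counts <- matrix(
  c(4, 3, 1, 0,
    3, 4, 0, 1,
    5, 3, 1, 1,
    1, 0, 4, 3,
    0, 1, 3, 5,
    1, 1, 4, 3),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("peak", 1:6), paste0("cell", 1:4))
)

# Which peaks contain the motif of interest (hypothetical annotation)
has_motif <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Expected motif-peak counts per cell if every cell had the average accessibility profile
peak_fraction <- rowSums(counts) / sum(counts)
cell_depth    <- colSums(counts)
expected      <- sum(peak_fraction[has_motif]) * cell_depth

# Observed motif-peak counts per cell
observed <- colSums(counts[has_motif, , drop = FALSE])

# Raw deviation: relative excess (positive) or deficit (negative) of accessibility
raw_deviation <- (observed - expected) / expected
round(raw_deviation, 3)
```

In this toy example, cells 1 and 2 have more reads in motif-containing peaks than expected (positive deviation), while cells 3 and 4 have fewer (negative deviation).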

(2) Calculation of Motif Activity

Taking Signac as an example, the main workflow and R code for motif activity analysis are as follows:

r
library(Signac)
library(Seurat)
library(JASPAR2022)
library(TFBSTools)
library(BSgenome.Hsapiens.UCSC.hg38) # Or adjust according to your species

# Obtain the motif dataset
pfm <- getMatrixSet(
  x = JASPAR2022,
  opts = list(collection = "CORE", tax_group = "vertebrates")
)

# Add motif information to the ATAC object (assuming the object is named obj and the Assay is named ATAC)
obj <- AddMotifs(
  object = obj,
  genome = BSgenome.Hsapiens.UCSC.hg38,
  pfm = pfm
)

# Run chromVAR to calculate motif activity
obj <- RunChromVAR(
  object = obj,
  genome = BSgenome.Hsapiens.UCSC.hg38
)

Viewing Motif Activity Results

  • After running the analysis, a new assay named "chromvar" is added to the object. You can use Seurat's FeaturePlot() to visualize motif activity.
  • For example, to visualize the activity of a specific motif on the UMAP, first make "chromvar" the default assay:
r
DefaultAssay(obj) <- "chromvar"
FeaturePlot(obj, features = "MA0497.1")

Interpretation of Motif Activity Scores

  • Positive deviation score: Peaks containing the motif are more accessible than expected in the cell, suggesting that the corresponding transcription factor (TF) may be functionally active.
  • Negative deviation score: Peaks containing the motif are less accessible than expected, suggesting that the motif is inactive or repressed in the cell.
  • Deviation score close to zero: No notable difference in accessibility compared to the background expectation.

WARNING

Notes

  • Use the genome build that matches the one used to call your peaks (for example, BSgenome.Hsapiens.UCSC.hg38 for peaks called against hg38) when calculating motif activity.
  • If there are many motifs or a large number of cells, the analysis can be time-consuming; parallel computation may help speed up the process.

4. Differential Analysis of Motif Activity

Previously, we introduced how to calculate the chromVAR deviation score for each motif at the single-cell level. Once you have obtained the motif activity score matrix, it is common to compare different cell types or states to identify which motifs show significant differences in activity. This allows further inference of key transcription factors.

Direct Analysis of Differential Motif Activity

Directly comparing motif activity between cell types tends to give results similar to motif enrichment analysis on differential peaks, while providing a more intuitive, per-cell picture of regulatory activity.

In the Signac/Seurat analysis framework, you can use the FindMarkers() or FindAllMarkers() functions to perform differential analysis directly on the chromVAR-derived z-scores (i.e., the values in the motif activity matrix) between groups of cells. Setting mean.fxn = rowMeans and fc.name = "avg_diff" makes the fold-change column report the difference in average z-score between groups:

r
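differential.activity <- FindMarkers(
  object = obj,
  assay = "chromvar",    # test the chromVAR z-scores rather than the peak counts
  ident.1 = 'Pvalb',     # compare motif activity between the 'Pvalb' and 'Sst' cell groups
  ident.2 = 'Sst',
  only.pos = TRUE,       # keep only motifs with higher activity in ident.1
  mean.fxn = rowMeans,   # average the z-scores directly
  fc.name = "avg_diff"   # column name for the difference in average z-score
)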
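What "difference in average z-score" means can be shown on a toy matrix in base R. The values below are made up, and the Wilcoxon rank-sum test is used because it is Seurat's default test; this is a sketch of the computation, not a replacement for FindMarkers().

```r
# Toy motif-by-cell z-score matrix (hypothetical values): 10 'Pvalb' and 10 'Sst' cells
group <- rep(c("Pvalb", "Sst"), each = 10)
z <- rbind(
  MotifA = c(seq(1.5, 2.5, length.out = 10),     # clearly higher in Pvalb
             seq(-0.5, 0.5, length.out = 10)),
  MotifB = c(seq(-0.45, 0.45, length.out = 10),  # essentially no difference
             seq(-0.5, 0.4, length.out = 10))
)

# Difference in average z-score between the two groups (what mean.fxn = rowMeans reports)
avg_diff <- rowMeans(z[, group == "Pvalb"]) - rowMeans(z[, group == "Sst"])

# Wilcoxon rank-sum test per motif
p_val <- apply(z, 1, function(x) {
  wilcox.test(x[group == "Pvalb"], x[group == "Sst"])$p.value
})

data.frame(avg_diff = avg_diff, p_val = p_val)
```

Here MotifA shows a large difference with a small p-value, whereas MotifB does not, which is the pattern used to shortlist differentially active motifs.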

TIP

  • FindAllMarkers() is suitable for automatically comparing each cell group to all others in one step, making it ideal for comprehensive differential analysis in multi-group scenarios (e.g., multiple cell types or states).
  • FindMarkers() is suitable for direct comparisons between two groups (such as disease vs. control), enabling flexible one-to-one screening of differential motif activity between specified groups.

After identifying differential motifs, you can use MotifPlot to visualize the significant motifs:

r
MotifPlot(
  object = obj,
  motifs = head(rownames(differential.activity)),
  assay = 'ATAC'    # the assay that holds the motif information added by AddMotifs()
)

5. Interpretation of Motif Activity

In multi-omics data, motif activity (from ATAC data) and TF expression (from RNA data) can be examined in the same cells.

(1) Consistency Between Motif Activity and TF Expression:

Ideally, if a transcription factor is active in a certain cell type, both its motif activity and its RNA expression should be high.

(2) Inconsistency Between Motif Activity and TF Expression:

  1. High motif activity but low TF expression. Possible reasons:
  • The transcription factor may regulate in a manner independent of its own expression (e.g., via co-activators)
  • The transcription factor is present at the protein level, but its RNA abundance is low
  • Other transcription factors share the same motif
  2. High TF expression but low motif activity. Possible reasons:
  • The transcription factor may bind to regions other than the identified peaks (non-peak regions)
  • The transcription factor may function in a way that does not depend on DNA binding
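The consistent and inconsistent cases above can be tabulated for a cell type once per-TF summaries are available. The base-R sketch below uses made-up TF names and values, and the two cutoffs are arbitrary assumptions that would need tuning for real data.

```r
# Hypothetical per-TF summaries for one cell type
tf_summary <- data.frame(
  tf             = c("TF_A", "TF_B", "TF_C", "TF_D"),
  motif_activity = c(2.1, 1.8, -0.3, 0.1),   # e.g., mean chromVAR z-score
  rna_expression = c(1.5, 0.05, 1.2, 0.02)   # e.g., mean normalized expression
)

# Arbitrary illustrative cutoffs (assumptions)
activity_cutoff   <- 1
expression_cutoff <- 0.5

high_activity   <- tf_summary$motif_activity > activity_cutoff
high_expression <- tf_summary$rna_expression > expression_cutoff

tf_summary$category <- ifelse(
  high_activity & high_expression, "consistent: candidate key regulator",
  ifelse(high_activity, "high activity, low expression",
         ifelse(high_expression, "high expression, low activity", "low in both"))
)

tf_summary
```

TFs in the two inconsistent categories are the ones to check against the possible reasons listed above.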

TIP

Identifying Key Transcription Factors:

  • In multi-omics data, special attention should be paid to transcription factors with both high motif activity and high expression.
  • These TFs are typically the core regulators supported by converging evidence—they possess strong binding potential (motif activity), are clearly expressed (RNA), and may directly regulate their target genes (peak-gene association).

6. Biological Applications of Motif Analysis

(1) Cell Type-Specific Regulatory Factors

Identifying Cell Type Marker Transcription Factors (TFs):

  • By comparing motif activity across different cell types, one can identify transcription factors that are specific to each cell type.
  • These TFs can serve as regulatory markers for cell identity, similar to marker genes at the expression level.
  • For example, SPI1/PU.1 is a marker transcription factor for microglia; PAX5 is a marker transcription factor for B cells.

(2) Disease-Associated Regulatory Abnormalities

Abnormal Motif Activity in Disease Samples:

  • Comparing motif activity between disease samples and control samples can identify transcription factors with aberrant regulatory activity.
  • Such abnormalities may provide clues to underlying disease mechanisms or potential therapeutic targets.
  • For instance, in Alzheimer's disease, the motif activity of certain transcription factors (such as ZEB1, MAFB) may be abnormal.

(3) Advantages of Multi-omics Data

RNA-ATAC Integrated Validation:

  • In multi-omics data, you can simultaneously examine:

    1. Motif Activity (from ATAC): Reflects the potential binding ability of transcription factors.
    2. TF Expression (from RNA): Reflects the actual expression level of transcription factors.
  • If both motif activity and expression of a transcription factor are high, it suggests that the TF is active and involved in regulation in that cell.

    TIP

    Triple-Evidence Regulatory Analysis:

    • Peak-Gene-TF Trio Analysis:
      1. Identify differentially accessible peaks (differential peak analysis).
      2. Link peaks to nearby genes (gene annotation).
      3. Identify enriched motifs within peaks (motif analysis).
      4. Validate the correlation between TF expression, peak accessibility, and gene expression (multi-omics integration).
    • Through this triple-evidence approach, it is possible to more reliably identify key transcription factors and regulatory mechanisms.
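    The final validation step can be sketched as a correlation across clusters. The per-cluster averages below are made up for illustration, and Spearman correlation is one reasonable choice among several, not a prescribed method.

```r
# Hypothetical per-cluster averages for one TF, one linked peak, and one target gene
trio <- data.frame(
  cluster     = paste0("cluster", 1:6),
  tf_expr     = c(0.1, 0.3, 1.2, 1.8, 2.5, 2.9),
  peak_access = c(0.05, 0.10, 0.40, 0.55, 0.80, 0.85),
  gene_expr   = c(0.2, 0.1, 0.9, 1.1, 2.0, 2.4)
)

# Pairwise rank correlations between the three layers of evidence
trio_cor <- cor(trio[, c("tf_expr", "peak_access", "gene_expr")], method = "spearman")
round(trio_cor, 3)
```

    Consistently high correlations across all three pairs support, but do not prove, a TF-peak-gene regulatory relationship.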

7. Frequently Asked Questions

Q1: What is the difference between Motif Activity and Motif Enrichment?

A: They are related but distinct concepts:

  • Motif Enrichment:

    • Focus: Which motifs are significantly enriched in a specific set of peaks (e.g., differential peaks).
    • Method: Quantifies the occurrence frequency of motifs in the target peak set and compares it to a background set.
    • Result: Produces a motif enrichment list, indicating which transcription factors may regulate these peaks.
    • Application: Commonly used for functional interpretation after differential peak analysis.
  • Motif Activity:

    • Focus: The activity level of each motif in every single cell.
    • Method: Considers the chromatin accessibility of peaks containing a given motif in each cell and computes a deviation score.
    • Result: Generates a cell × motif activity matrix.
    • Application: Useful for cell clustering, identifying cell-type specific TFs, and analyzing dynamic changes in TF activity.