mut Analysis

Author: SeekGene
Updated: 2026-01-26
SeekSoul™ Online

Introduction

IMPORTANT

The mut module performs single-cell SNV/Indel enrichment analysis: by jointly analyzing mutation matrices and expression matrices, it localizes the cell populations that carry specific mutations and evaluates their functional pathways. By default, the workflow accepts upstream mutation detection results (*.snp_indel.all_UMI.matrix, *.snp_indel.alt_UMI.matrix), so long-running steps such as VarScan do not need to be re-run within this module.

As single-cell sequencing becomes widely used, many studies ask whether a specific cell population carries particular mutations and shows functional gains. The mut module in SeekSoul™ Online is designed for this question: it automatically performs matrix validation, sample splitting, mutation enrichment, differential/pathway analysis, and report generation, which lowers the barrier to this kind of bioinformatics analysis.


Theoretical Foundation of mut Analysis

Core Principles

  1. Matrix Validation and Download: The system automatically reads the all_UMI/alt_UMI paths recorded in sample_matrix.txt, performs availability validation, and downloads them in parallel to ensure all mutation matrices for each sample are ready.
  2. RDS Subset and Sample Identification: Generates a subset RDS based on the user-specified sample column (default Sample) and cell annotation column, unifying barcode naming for subsequent analysis and filtering out irrelevant cells.
  3. Mutation Information Summary: The workflow reads the mutation matrix and computes, for each site, the UMI count, the number of barcodes carrying the mutation, the mutation rate, and the distribution across cell types/clusters.
  4. Mutation Enrichment Determination: Attaches the mutation matrix to the single-cell object as an additional Assay, uses Fisher's exact test to determine significant enrichment for each site × cell population pair, and outputs *_snv_markers.xls with UMAP visualization.
  5. Differential and Pathway Analysis: If the species is human/mouse, the system automatically selects the top 10 significant sites, performs differential analysis between mutant cells and covered cells, and conducts GO/KEGG/Reactome enrichment, outputting tables and images.
  6. Report Generation: Produces directly deliverable HTML/PDF reports.
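
Step 3 can be illustrated with a minimal Python sketch. The in-memory layout used here (a dictionary mapping each site to per-barcode alt UMI counts) and the names summarize_sites, alt_counts, and cell_types are assumptions made for illustration; they are not the module's actual file format or code. The mut_rate value follows the barcode_count / total_cells definition used in this document.

```python
from collections import Counter


def summarize_sites(alt_counts, cell_types):
    """Summarize each mutation site.

    alt_counts: {site: {barcode: alt UMI count}}  (hypothetical layout)
    cell_types: {barcode: cell type label} for all cells in the sample
    """
    total_cells = len(cell_types)
    summary = {}
    for site, per_barcode in alt_counts.items():
        # Keep only annotated barcodes that carry at least one alt UMI.
        carriers = {bc: n for bc, n in per_barcode.items()
                    if n > 0 and bc in cell_types}
        summary[site] = {
            "umi": sum(carriers.values()),
            "barcodes": len(carriers),
            "mut_rate": len(carriers) / total_cells if total_cells else 0.0,
            "by_celltype": dict(Counter(cell_types[bc] for bc in carriers)),
        }
    return summary


# Toy data: four annotated cells, one site with two mutant B cells.
cell_types = {"AAAC": "B", "AAAG": "B", "AAAT": "T", "AACA": "T"}
alt_counts = {"chr1-100-G>A": {"AAAC": 2, "AAAG": 1, "AAAT": 0}}
print(summarize_sites(alt_counts, cell_types))
```

Barcodes with zero alt UMIs are treated as non-carriers in this sketch, so they do not contribute to the barcode count or to mut_rate.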

Single Sample vs. Multiple Sample Strategy

| Scenario | Processing Approach | Output |
| --- | --- | --- |
| Single Sample | Directly performs statistics, enrichment, and visualization on the sample's *.snp_indel.*.matrix | Sample.mut.info.txt, mutation_umap/, differential enrichment (if species ∈ {human, mouse}) |
| Multiple Samples | The system generates two matrix sets, multi (all mutations) and common (shared sites), and performs statistics and enrichment separately for each, facilitating comparison of overall and intersection results | Dual results of multi.* and common.*, presented in separate sections in the report |
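
The difference between the multi and common sets can be sketched as a union versus an intersection of the mutation sites seen in each sample. This is an illustrative assumption about what "all mutations" and "shared sites" mean; the function name build_site_sets and the site strings are hypothetical and do not come from the module.

```python
def build_site_sets(sites_by_sample):
    """Return (multi, common) site sets from {sample: iterable of sites}."""
    site_sets = [set(sites) for sites in sites_by_sample.values()]
    if not site_sets:
        return set(), set()
    multi = set().union(*site_sets)          # every site seen in any sample
    common = set.intersection(*site_sets)    # sites seen in every sample
    return multi, common


sites_by_sample = {
    "S1": ["chr1-100-G>A", "chr2-200-C>T"],
    "S2": ["chr2-200-C>T", "chr3-300-A>G"],
}
multi, common = build_site_sets(sites_by_sample)
print(sorted(multi))
print(sorted(common))
```

When the samples share no site, common is empty, which corresponds to the empty "common variation" section discussed in the FAQ.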

Key Statistical Indicators

  • UMI / barcode: Reflects the coverage of mutations at the cellular level, which can be used to assess whether sequencing depth is sufficient.
  • mut_rate: barcode_count / total_cells, measuring the frequency of mutations in the sample.
  • Fisher's exact test: Constructs a 2×2 contingency table of "mutation vs. coverage" and "target cells vs. other cells," returning indicators such as p_val and ident1_mut.
  • Differential expression/enrichment: Default logfc.threshold=0.25, with GO/KEGG/Reactome unified plotting and table output.
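
To make the 2×2 table concrete, the following Python sketch computes a one-sided Fisher's exact test from the hypergeometric distribution using only the standard library. Two points are assumptions rather than documented behavior of the module: the table counts mutant cells against covered-but-non-mutant cells, and the test is one-sided (enrichment in the target population). The function name fisher_enrichment is hypothetical.

```python
from math import comb


def fisher_enrichment(target_mut, target_wt, other_mut, other_wt):
    """One-sided Fisher's exact test for enrichment in the target cells.

    Table layout (assumed):
                     mutant       covered, non-mutant
        target       target_mut   target_wt
        other        other_mut    other_wt
    Returns P(X >= target_mut) under the hypergeometric null.
    """
    n_total = target_mut + target_wt + other_mut + other_wt
    n_mut = target_mut + other_mut        # all mutant cells
    n_target = target_mut + target_wt     # all covered target cells
    denom = comb(n_total, n_target)
    upper = min(n_mut, n_target)
    tail = sum(comb(n_mut, k) * comb(n_total - n_mut, n_target - k)
               for k in range(target_mut, upper + 1))
    return tail / denom


# 3 of 4 covered target cells are mutant versus 1 of 4 covered other cells.
print(fisher_enrichment(3, 1, 1, 3))  # 17/70, about 0.243
```

With only eight covered cells the result is far from significant, which illustrates why sites with low coverage rarely pass the p_val < 0.05 filter.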

SeekSoul™ Online Operation Guide

Pre-Analysis Preparation

CAUTION

  • The upstream mutation matrix files should maintain consistent barcode naming with those in RDS; if they include suffixes, the mut process will automatically match them, but it's still recommended to check before uploading.
  • The column names and content of metadata should not contain Chinese characters or special characters (&, spaces, etc.), otherwise the process may fail.
  • The differential enrichment module is only executed when species is set to human/mouse.
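
The first two checks can be run locally before uploading. The sketch below is one possible approach, not the module's own matching logic: the suffix pattern (a trailing "-" or "_" followed by digits), the allowed character set for metadata names, and the function names are all assumptions. Stripping suffixes also discards any sample identity they encode, so treat the overlap as a rough sanity check.

```python
import re

# Assumed suffix pattern: trailing "-<digits>" or "_<digits>", e.g. "-1", "_1".
_SUFFIX = re.compile(r"[-_]\d+$")
# Conservative assumption: ASCII letters, digits, underscore, and dot only.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_.]+")


def strip_suffix(barcode):
    """Remove one trailing numeric suffix from a barcode."""
    return _SUFFIX.sub("", barcode)


def barcode_overlap(matrix_barcodes, rds_barcodes):
    """Fraction of matrix barcodes found in the RDS after suffix stripping."""
    matrix_core = {strip_suffix(b) for b in matrix_barcodes}
    rds_core = {strip_suffix(b) for b in rds_barcodes}
    if not matrix_core:
        return 0.0
    return len(matrix_core & rds_core) / len(matrix_core)


def unsafe_names(names):
    """Return metadata names containing characters outside the safe set."""
    return [n for n in names if not _SAFE_NAME.fullmatch(n)]


print(barcode_overlap(["AAAC-1", "AAAG-1"], ["AAAC_1", "TTTT_1"]))
print(unsafe_names(["Sample", "Cell Type", "A&B"]))
```

An overlap of 0.0 corresponds to the complete mismatch case that makes the workflow report an error.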

Parameter Details

| Interface Parameter | Description | Notes |
| --- | --- | --- |
| Task Name | Must start with an English letter; may contain Chinese characters, digits, and underscores | Used for report header and task tracking |
| Group.by | Column representing samples in metadata, default Sample | Determines subset_samples.R --group |
| Cell Type | Cell annotation column in metadata, e.g., CellAnnotation | Affects enrichment test and differential analysis |
| Sample Information | Samples to be analyzed, along with their corresponding all_UMI.matrix and alt_UMI.matrix | Supports OSS paths |
| Species | human / mouse / other | Controls whether differential enrichment is performed |
| Note | Custom text | Records analysis background |

Result Interpretation

Result Directory Overview

| Path | Content | Description |
| --- | --- | --- |
| output/results/<sample>.mut.info.txt | UMI, barcode, mut_rate and cluster information for each site | Can be used for downstream screening of hotspot mutations |
| output/results/<sample>/mutation_umap/SNV_diff/*.png | UMAP visualization of Fisher significant mutations | The file name indicates the mutation site |
| output/results/<sample>/mutation_umap/SNV_diff/<sample>_snv_markers.xls | Mutation enrichment statistics table | Contains p_val, ident*_mut/cover, etc. |
| .../diff_pathway/pos*/diffgene.xls | Differential expression results | ident.1 = alt, ident.2 = WT |
| .../diff_pathway/pos*/gokeggreactome/ | GO/KEGG/Reactome enrichment tables and plots | |
| report/ | HTML report directory | Packaged as report.zip for download |
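
For downstream screening, the enrichment table can be filtered to the sites that would qualify for differential analysis (p_val < 0.05, top 10). The sketch below assumes the *_snv_markers.xls file is tab-delimited text and that it has a site identifier column, here called "site"; both are assumptions to verify against a real output file, and ranking by smallest p_val is an illustrative choice, not the module's documented rule. Only the p_val column name comes from this document.

```python
import csv
import io


def top_significant_sites(tsv_text, p_cutoff=0.05, top_n=10, site_col="site"):
    """Return up to top_n sites with p_val below p_cutoff, smallest first.

    site_col is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best = {}
    for row in reader:
        p = float(row["p_val"])
        if p < p_cutoff:
            site = row[site_col]
            # A site can appear once per cell population; keep its best p.
            best[site] = min(p, best.get(site, 1.0))
    return sorted(best, key=best.get)[:top_n]


example = (
    "site\tp_val\n"
    "chr1-1-G>A\t0.001\n"
    "chr2-5-C>T\t0.2\n"
    "chr1-1-G>A\t0.03\n"
    "chr3-9-A>G\t0.01\n"
)
print(top_significant_sites(example))
```

An empty result from a filter like this matches the situation where the differential enrichment step produces no output.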

Key Visualization Examples

Single Sample View

Shows significant sites filtered by the Fisher test (example: THRAP3 chr1-36296730 G>A, enriched in B cells of PBMC samples). Red dots represent mutant cells, and gray dots represent covered but non-mutant cells.

Based on differential analysis between mutant Monocytes and WT cells at the pos0EGR1 site, the system filters significant terms and plots them as bar charts, making themes such as "leukocyte migration" and "wound healing" quick to identify.

KEGG enrichment dot plots focus on immune/infection-related pathways (Chemokine signaling, NF-κB, Platelet activation, etc.), where dot color and size represent significance and number of enriched genes respectively.

Multi-Sample View

Under the multi matrix, RALY chr20-34077058 C>CAG is significantly enriched in Basophil cells. In the figure, red dots represent mutant cells, and gray dots represent covered cells.

Multi-sample differential analysis shows that RALY-mutant Basophils are enriched in ribosomal/mitochondrial processes such as ribosome biogenesis and mitochondrial gene expression.

In terms of KEGG, the same site highlights proliferation-related pathways such as DNA replication and Cell cycle, suggesting that these mutant cells have high synthetic activity.

Common Variation View

The common matrix emphasizes the mutation SRP14 chr15-40036395 GTGC>- that exists in all samples, showing consistent enrichment in Plasma Cells.

SRP14 mutation-related cells are mainly enriched in transcription/translation processes such as ribosome biogenesis and RNA processing.

KEGG results emphasize basic molecular machinery such as Ribosome and Spliceosome, deepening the understanding of the functional background of common mutations.


Case Reference: Latest Single-cell Mutation Practices

The workflow of the mut module is consistent with high-impact research in recent years. Take the single-cell multi-omics study of hepatoblastoma (HB) published by Roehrig A et al. in Nature Communications (2024, 15:3031) as an example:

  • Hepatoblastoma Clonal Evolution and Chemotherapy Response Study
    • Roehrig A et al. combined single-cell multi-omics (snRNA-seq + snATAC-seq) with whole-genome sequencing (WGS) to reconstruct the clonal architecture of HB tumors and localize mutations at the single-cell level, which closely matches the mut module's "mutation - cell population - function" logic. The study first used WGS to identify key driver alterations in HB (such as CTNNB1 activating mutations and copy-neutral loss of heterozygosity, cnLOH, at 11p15.5), then mapped them to specific cell subsets in the single-cell data to define the range of differentiation states of each genetic subclone (such as scH hepatocyte-like, scLP liver progenitor-like, and scM mesenchymal-like).
    • Similarly, the mut module uses Fisher's exact test to locate cell subsets significantly enriched for specific mutations (corresponding to the "subclone differentiation status analysis" in the literature). For example, if CTNNB1 mutations were found to be significantly enriched in the scLP subset of HB samples, differential expression analysis between this mutant population and wild-type cells could then be run to check for the features described in the literature, such as high expression of stem cell markers (like PROM1) and DNA repair genes. Subsequent KEGG pathway enrichment can test whether these mutations are associated with cell cycle and DNA repair pathways (compare the faster proliferation of scLP subclones after chemotherapy reported in the literature), which helps explain how mutations may contribute to tumor cell chemoresistance.

The recommended workflow is:

  1. Mutation Localization: Use <sample>.mut.info.txt and SNV_diff to identify SNVs significantly enriched within a specific cell type/cluster.
  2. Functional Evaluation: Perform differential analysis + GO/KEGG/Reactome enrichment on these sites to observe whether they are concentrated in cell cycle, immune pathways or metabolic pathways.
  3. Result Delivery: Export images and tables through the report module, connect the "mutation-cell type-functional pathway" chain, and incorporate it into project reports or papers.

In this way, we can understand tumor heterogeneity at single-cell resolution and provide deeper insights for precision medicine.


Precautions and Best Practices

WARNING

The mut workflow does not perform variant detection; it only analyzes matrices output by upstream processes. Poor matrix quality or mismatched sample barcodes will directly affect the enrichment results.

  • Reasonable Sample Screening: Single-cell samples vary greatly, so it is recommended to prioritize projects with sufficient coverage (≥3k cells, UMI depth >20k) and accurate meta annotations.
  • Multi-sample Interpretation: The multi and common results have different meanings: the former shows all mutations, while the latter highlights hotspots that are consistent across samples. The report presents them in separate sections.

FAQ (Frequently Asked Questions)

  1. Q: Why does it prompt "mutation matrix and RDS barcode do not match"?
    A: This is usually because the upstream matrix retains suffixes like _1 or -1. The mut process will attempt to match, but it will report an error if there is no overlap at all (setdiff=all). Please confirm whether the matrix column names are consistent with the Seurat object or can be matched by suffix.

  2. Q: Why are there no differential enrichment results?
    A: Two conditions must both be met: species ∈ {human, mouse}, and at least one site in SNV_diff has p_val < 0.05. Confirm the species setting when submitting parameters, or relax group_input_name to obtain more significant sites.

  3. Q: Why is the "common variation" section empty in the report?
    A: In multi-sample projects, if there are no common sites between different samples (common_alt_pos is empty), the com_mut section will only display prompt information. You can check whether all matrices share the same pos column.


References

  1. Skinnider MA, et al. Cell type prioritization in single-cell data. Nat Biotechnol. 2021, 39(4):436-447.
  2. Courtine G, et al. The neurons that restore walking after paralysis. Nature. 2022, 607(7918):313-319.
  3. Roehrig A, et al. Single-cell multiomics reveals the interplay of clonal evolution and cellular plasticity in hepatoblastoma. Nature Communications. 2024, 15:3031.