Run

Author: SeekGene

Time: 12 min

Words: 2.3k words

Updated: 2026-02-26

Reads: 0 times

scMethyl + RNA-seq Analysis Guide

Usage

Activate Environment

bash

conda activate seeksoulmethyl

Run Dual-omics Analysis (Shell script)

bash

# sc_methy_workflow.sh can be found in the SeekSoulMethyl directory you cloned
bash sc_methy_workflow.sh \
/path/to/expression_R1.fastq.gz \
/path/to/expression_R2.fastq.gz \
/path/to/methy_R1.fastq.gz \
/path/to/methy_R2.fastq.gz \
--sample WTJW880 \
--outdir /path/to/results \
--database_dir /path/to/human-reference-GRCh38 \
--chemistry DD-MET3 \
--core 64 \
--filter_ch 2

If a sample has multiple datasets, separate file paths with commas. Ensure that the FASTQ files are listed in the correct order for each dataset.

shell

bash sc_methy_workflow.sh \
/path/to/WTJW969_E_L003_R1.fq.gz,/path/to/WTJW969_E_L004_R1.fq.gz \
/path/to/WTJW969_E_L003_R2.fq.gz,/path/to/WTJW969_E_L004_R2.fq.gz \
/path/to/WTJW969_Met_L000_R1.fq.gz,/path/to/WTJW969_Met_L001_R1.fq.gz,/path/to/WTJW969_Met_L002_R1.fq.gz,/path/to/WTJW969_Met_L003_R1.fq.gz,/path/to/WTJW969_Met_L004_R1.fq.gz \
/path/to/WTJW969_Met_L000_R2.fq.gz,/path/to/WTJW969_Met_L001_R2.fq.gz,/path/to/WTJW969_Met_L002_R2.fq.gz,/path/to/WTJW969_Met_L003_R2.fq.gz,/path/to/WTJW969_Met_L004_R2.fq.gz \
--sample WTJW969 \
--outdir /path/to/results \
--database_dir /path/to/human-reference-GRCh38 \
--chemistry DD-MET5 \
--core 64 \
--filter_ch 2

Input Parameters:

$1: Single-cell transcriptome Read1 fastq file path.
$2: Single-cell transcriptome Read2 fastq file path.
$3: Single-cell methylation Read1 fastq file path.
$4: Single-cell methylation Read2 fastq file path.
sample: Sample name.
outdir: Output directory path.
database_dir: Reference genome database path.
chemistry: Methylation chemistry (DD-MET3 or DD-MET5). DD-MET3 is a dual-label dataset, meaning the RNA and DNA methylation data barcodes are different for the same cell, while DD-MET5 is single-label, meaning the RNA and DNA methylation data barcodes are the same for the same cell.
core: Number of CPU cores.
filter_ch: Filter reads that contain > n CH methylation sites. If you do not want to enable this filter, set filter_ch to 0.

Quick Start with Nextflow (Recommended)

If you want to process samples in batch and get pipeline-level reports, use Nextflow:

bash

conda activate seeksoulmethyl
conda install -n seeksoulmethyl -c bioconda nextflow

# Prepare the sample sheet (see example above)
nextflow run SeekSoulMethyl/nf/main.nf \
  --outdir /path/to/results \
  --samplesheet samplelist.csv \
  -w /path/to/work \
  -c SeekSoulMethyl/nf/nextflow.config \
  -profile aliyun_k8s \
  --database_dir /path/to/reference \
  --split_fastq 4 \
  --filter_ch 2 \
  --chemistry DD-MET3 > methy.log

Figure 6: SeekSoulMethyl nextflow pipeline workflow

Running with Nextflow

Install nextflow:

bash

conda install -n seeksoulmethyl -c bioconda nextflow

Prepare your input samplesheet

cat > samplelist.csv << EOF
sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
XYRD-WTJW880,/path/to/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz,/path/to/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz
EOF

# expression_r1: Single-cell transcriptome Read1 fastq file
# expression_r2: Single-cell transcriptome Read2 fastq file
# methylation_r1: Single-cell methylation Read1 fastq file
# methylation_r2: Single-cell methylation Read2 fastq file

If a single sample has multiple datasets and the transcriptome and methylation FASTQ counts do not match (e.g., WTJW969), add multiple rows to the samplesheet, with each row representing one dataset.

text

sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
WTJW969,/path/to/WTJW969_E_L003_R1.fq.gz,/path/to/WTJW969_E_L003_R2.fq.gz,/path/to/WTJW969_Met_L000_R1.fq.gz,/path/to/WTJW969_Met_L000_R2.fq.gz
WTJW969,/path/to/WTJW969_E_L004_R1.fq.gz,/path/to/WTJW969_E_L004_R2.fq.gz,/path/to/WTJW969_Met_L001_R1.fq.gz,/path/to/WTJW969_Met_L001_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L002_R1.fq.gz,/path/to/WTJW969_Met_L002_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L003_R1.fq.gz,/path/to/WTJW969_Met_L003_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L004_R1.fq.gz,/path/to/WTJW969_Met_L004_R2.fq.gz

Run the pipeline:

bash

nextflow run -bg SeekSoulMethyl/nf/main.nf \
--outdir /path/to/tiny_demo/results/ \
--samplesheet samplelist.csv \
-w /path/to/tiny_demo/results/work \
-c SeekSoulMethyl/nf/nextflow.config \
-profile aliyun_k8s \
--database_dir /path/to/human-reference-GRCh38/ \
--split_fastq 1 \
--filter_ch 2 \
--chemistry DD-MET3 > methy.log

# --outdir: final results directory
# --samplesheet: input samplesheet file
# -w: working directory for nextflow
# -c: nextflow configuration file. Must be modified according to your server configuration.
# -profile: nextflow profile for aliyun k8s
# --database_dir: reference genome database directory
# --split_fastq: To speed up the analysis process, split fastq according to first n bases of barcode. Default is 4.
# --filter_ch: filter reads that contain > 2 CH methylation sites.
# --chemistry: methylation chemistry (DD-MET3 or DD-MET5)

Notes on nextflow.config

nextflow.config declares the execution environment, resource quotas, and run policies. You must customize it to your own server or cluster.

Location: SeekSoulMethyl/nf/nextflow.config (specify with -c SeekSoulMethyl/nf/nextflow.config when running).
Key items to tailor to your infrastructure:
- Executor: process.executor (e.g., local, slurm, pbs, k8s, awsbatch).
- Resources: process.cpus, process.memory, process.time, or fine-grained overrides via withLabel/withName.
- Work directory: workDir (can also be set via -w); ensure it is writable and has sufficient space.
- Environment: enable one of conda.enabled, docker.enabled, or singularity.enabled according to what your server supports.

Example configurations (replace paths and parameters with values valid on your servers):

groovy

profiles {
  // Local machine
  local {
    process.executor = 'local'
    workDir          = '/path/to/work'
    process.cpus     = 8
    process.memory   = '32 GB'
    conda.enabled    = true
    // Or containers: singularity.enabled = true / docker.enabled = true
  }

  // Slurm cluster (HPC)
  slurm {
    process.executor   = 'slurm'
    workDir            = '/lustre/work/your_user'
    process.cpus       = 8
    process.memory     = '32 GB'
    process.queue      = 'normal'
    process.clusterOptions = '-A your_account --qos=normal'

    withLabel: 'high_mem' {
      cpus   = 16
      memory = '64 GB'
    }
  }

  // Kubernetes (e.g., Alibaba Cloud ACK)
  aliyun_k8s {
    process.executor   = 'k8s'
    workDir            = '/mnt/nf-work'    // persistent volume path
    k8s {
      namespace        = 'your-namespace'
      storageClaimName = 'your-pvc'
      cpu              = 4
      memory           = '16 GB'
    }
    // If using container images globally: docker.enabled = true
  }
}

Tips:

Pick the -profile that matches your environment (e.g., slurm, local, aliyun_k8s), then adapt the parameters.
Keep workDir on fast storage with ample capacity; the final results directory is controlled by --outdir.
If you use the README’s conda environment, prefer conda.enabled; if your cluster favors containers, use Docker/Singularity.

References:

Nextflow configuration & profiles: https://www.nextflow.io/docs/latest/config.html
Executors (local, Slurm, K8s, etc.): https://www.nextflow.io/docs/latest/executor.html
Kubernetes guide: https://www.nextflow.io/docs/latest/kubernetes.html

Nextflow Step-by-Step Details

This section describes each Nextflow process with inputs, core logic, key parameters, and outputs for troubleshooting and interpretation. The workflow is defined in nf/main.nf (script/SeekSoulMethyl/nf/main.nf:1) and processes are implemented in nf/modules/*.nf.

Step 1: Preprocessing and Barcode Parsing (step1.nf)

Compute genome-wide CpG count: COMPUTE_CPG_SITES (script/SeekSoulMethyl/nf/modules/step1.nf:6)
- Input: params.genomefa, params.chrom_size_path
- Action: run count_cg_sites.py to count CpG sites
- Output: genome_cg_info.json
Expression FASTQ QC (multi-group): FASTP_EXPRESSION_MULTI (script/SeekSoulMethyl/nf/modules/step1.nf:22)
- Input: paired FASTQs per sample (groups G1/G2/...)
- Action: fastp trimming and QC
- Output: cleaned *_expression_clean_R1/2.fastq.gz, *.html, *.json
Methylation FASTQ QC (multi-group): FASTP_METHYLATION_MULTI (script/SeekSoulMethyl/nf/modules/step1.nf:88)
- Input: paired FASTQs per sample (groups G1/G2/...)
- Action: fastp QC (adapter detection disabled, trimming as per pipeline)
- Output: cleaned *_methylation_clean_R1/2.fastq.gz, *.html, *.json
RNA alignment and quantification: SEEKSOULTOOLS_RNA (script/SeekSoulMethyl/nf/modules/step1.nf:156)
- Input: cleaned expression R1/R2 lists
- Action: seeksoultools rna run (STAR mapping, counting, filtering, clustering, DE)
- Output: Analysis/step3/filtered_feature_bc_matrix/ etc.
Methylation barcode parsing: METHYLATION_BARCODE_EXTRACTION (script/SeekSoulMethyl/nf/modules/step1.nf:206)
- Input: cleaned methylation R1/R2 lists, whitelist params.methy_barcode_wl
- Action: run barcode_cs_multi.py to parse/correct cell barcodes and UMIs by params.chemistry; optional --split_fastq n to shard reads by first n barcode bases
- Output: step1/${sample}_forward_*{1,2}.fq.gz, step1/${sample}_reverse_*{1,2}.fq.gz, ${sample}_methy_summary.json, optional ${sample}_barcode_stats.txt
Build forward/reverse pairing lists: PARSE_FASTQ_FILES (script/SeekSoulMethyl/nf/modules/step1.nf:255)
- Input: forward/reverse FASTQ sets
- Action: scan step1/ and pair files by identifiers
- Output: forward_pairs.txt, reverse_pairs.txt
Post-barcode-extraction QC: FASTP_METHYLATION_BARCODE_EXTRACT (script/SeekSoulMethyl/nf/modules/step1.nf:368)
- Input: paired sub-FASTQs
- Action: fastp QC
- Output: per-pair *.html, *.json

Step 2: Bismark alignment and BAM sorting (step2.nf)

Forward-strand alignment: BISMARK_ALIGNMENT_FORWARD (script/SeekSoulMethyl/nf/modules/step2.nf:3)
- Key flags: --add_barcode, --add_umi; max insert size -X 1000
- Output: *_bismark_bt2_pe.bam, *_bismark_bt2_PE_report.txt
Reverse (PBAT) alignment: BISMARK_ALIGNMENT_REVERSE (script/SeekSoulMethyl/nf/modules/step2.nf:31)
- Key flags: --pbat, --add_barcode, --add_umi
- Output: same as above
Sort by read name: SORT_BAM_BY_NAME (script/SeekSoulMethyl/nf/modules/step2.nf:61)
- Action: samtools sort -n
- Output: *_sortbyname.bam

Step 3: Per-cell split, ALLC generation/merge, dataset building (step3.nf)

Split BAMs by cell barcode: SPLIT_BAM_FILES (script/SeekSoulMethyl/nf/modules/step3.nf:1)
- Input: name-sorted BAM and GEX barcodes barcodes.tsv.gz
- Action: run step3_split_bams.py to split by cell barcode and keep shared cells only
- Output: per-cell BAM dir, *_filtered_barcode, *_filtered_barcode_reads_counts.csv
Merge forward/reverse per-cell BAMs and counts: MERGE_BISMARK_BAM (script/SeekSoulMethyl/nf/modules/step3.nf:27)
- Action: samtools merge -n per matching cell; merge/deduplicate barcodes and read counts
- Output: *_merged_fr_bam/, *_merge_filtered_barcode, *_merge_filtered_barcode_reads_counts.csv
Generate per-cell ALLC: ALLCOOLS_BAM_TO_ALLC (script/SeekSoulMethyl/nf/modules/step3.nf:86)
- Action: run step3_bam_to_allc.py (custom ALLCools), UR-based dedup and methylation quantification
- Output: per-cell *_allc.gz and index
Merge cell metrics: MERGE_FILTERED_BARCODE_READS_COUNTS (script/SeekSoulMethyl/nf/modules/step3.nf:114)
- Action: merge barcodes and read counts, create ${sample}_cells.csv and .json
- Output: filtered_barcode, filtered_barcode_reads_counts.csv, ${sample}_cells.{csv,json}
Build multi-scale dataset: ALLCOOLS_GENERATE_DATASETS (script/SeekSoulMethyl/nf/modules/step3.nf:136)
- Action: allcools generate-dataset for chrom10k/20k/50k/100k/500k/1M, metrics like count and hypo-score
- Output: ${sample}.mcds
Merge ALLC (when sharded): ALLCOOLS_SUBMERGE (script/SeekSoulMethyl/nf/modules/step3.nf:188), ALLCOOLS_MERGE (script/SeekSoulMethyl/nf/modules/step3.nf:223)
- Action: merge per-shard/per-sample ALLCs
- Output: ${sample}_merge_allc.gz and index
Extract CG context ALLC: ALLCOOLS_EXTRACT (script/SeekSoulMethyl/nf/modules/step3.nf:252)
- Action: allcools extract-allc --mc_contexts CGN --strandness merge
- Output: *.CGN-Merge*

(Methylation-only workflow methy_only.nf additionally includes COUNTS_MAPPED_READS and ESTIMATED_CELLS for read-count-based cell estimation and barcode filtering, see script/SeekSoulMethyl/nf/modules/utils.nf:1 and :17)

Step 4: Summary, DR & joint report (step4.nf)

Methylation summary: METHYLATION_SUMMARY (script/SeekSoulMethyl/nf/modules/step4.nf:2)
- Action: step4_wgs_summary.py aggregates Bismark reports, cell metrics and CpG stats to produce ${sample}_methy_summary.json and ${sample}_wgs_summary.csv
LSI/PCA clustering and visualization: METHYLATION_LSI_PCA_CLUSTERING (script/SeekSoulMethyl/nf/modules/step4.nf:26)
- Action: step4_allcools_PCA_cluster.py --var_dim chrom20k --reduc lsi
- Output: *.h5ad, *.pdf, *.png
Transcriptome+Methylation joint report: MULTI_REPORT (script/SeekSoulMethyl/nf/modules/step4.nf:52)
- Action: step4_report_rna_met.py integrates transcriptome and methylation outputs
- Output: ${sample}_rna_methyl_report.html, ${sample}_rna_met.json

Methylation-only workflow (test version, currently not recommended for use)

Use the simplified workflow when you only have methylation reads:

bash

nextflow run SeekSoulMethyl/nf/methy_only.nf \
  --outdir /path/to/results \
  --samplesheet samplelist.csv \
  -w /path/to/work \
  -c SeekSoulMethyl/nf/nextflow.config \
  -profile aliyun_k8s \
  --database_dir /path/to/reference \
  --split_fastq 4 \
  --filter_ch 2 \
  --chemistry DD-MET3

Key parameters and tips

--database_dir: reference directory with fasta/genome.fa, genes/genes.gtf, star/, bed/chr_nochrM.bed (script/SeekSoulMethyl/nf/main.nf:19)
--chemistry: DD-MET3 or DD-MET5; also sets transcriptome chemistry and barcode whitelist (script/SeekSoulMethyl/nf/main.nf:27)
--split_fastq: shard by the first n barcode bases (default 4; increases parallelism but adds scheduling/merge overhead) (script/SeekSoulMethyl/nf/main.nf:36)
--filter_ch: filter reads with > n CH methylation sites (default 2) (script/SeekSoulMethyl/nf/modules/step1.nf:241)
The samplesheet header must be sample_id (script/SeekSoulMethyl/nf/main.nf:116)

Execution environment and resources

Recommended container image: registry.cn-beijing.aliyuncs.com/seekgene/seeksoulmethyl:1.1.2 (script/SeekSoulMethyl/nf/nextflow.config:69)
Choose -profile aliyun_k8s or -profile docker, and edit nf/nextflow.config for your infra
Heavy steps: Bismark/ALLCools need substantial CPU/RAM; defaults are set in withName blocks, scale up if needed (script/SeekSoulMethyl/nf/nextflow.config:313)

System Requirements

If you use sc_methy_workflow.sh, the system requirements are as follows:

CPU: 64 cores (recommended)
Memory: 128GB RAM (recommended)
Storage: At least 500GB available space
OS: Linux (recommended Ubuntu 18.04+ or CentOS 7+)

FAQ

Samplesheet parsing error: ensure first column is sample_id, use absolute paths (script/SeekSoulMethyl/nf/main.nf:111)
Missing ${sample}.mcds: check ALLCOOLS_BAM_TO_ALLC produced per-cell *_allc.gz and chrom_size_path is correct
Stuck at Bismark: verify reference indices and that params.bismark_ref is visible in the container
Resume runs: use -resume with the same -w work directory

Run ​

Usage ​

Activate Environment ​

Run Dual-omics Analysis (Shell script) ​

Input Parameters: ​

Quick Start with Nextflow (Recommended) ​

Running with Nextflow ​

Notes on nextflow.config ​

Nextflow Step-by-Step Details ​

Step 1: Preprocessing and Barcode Parsing (step1.nf) ​

Step 2: Bismark alignment and BAM sorting (step2.nf) ​

Step 3: Per-cell split, ALLC generation/merge, dataset building (step3.nf) ​

Step 4: Summary, DR & joint report (step4.nf) ​

Methylation-only workflow (test version, currently not recommended for use) ​

Key parameters and tips ​

Execution environment and resources ​

System Requirements ​

FAQ ​

Run

Usage

Activate Environment

Run Dual-omics Analysis (Shell script)

Input Parameters:

Quick Start with Nextflow (Recommended)

Running with Nextflow

Notes on nextflow.config

Nextflow Step-by-Step Details

Step 1: Preprocessing and Barcode Parsing (step1.nf)

Step 2: Bismark alignment and BAM sorting (step2.nf)

Step 3: Per-cell split, ALLC generation/merge, dataset building (step3.nf)

Step 4: Summary, DR & joint report (step4.nf)

Methylation-only workflow (test version, currently not recommended for use)

Key parameters and tips

Execution environment and resources

System Requirements

FAQ