
Inputs and Data Preparation

Author: SeekGene
Updated: 2026-05-12
scMethyl + RNA-seq Inputs

Required Input Files

SeekSoul™ Methyl Tools accepts transcriptome and methylation paired-end FASTQ files for each sample:

  • Transcriptome Read1 FASTQ
  • Transcriptome Read2 FASTQ
  • Methylation Read1 FASTQ
  • Methylation Read2 FASTQ

If one sample contains multiple sequencing datasets, list the files in the same order for every input, so that each Read1 file lines up with its Read2 mate. In shell mode, separate file paths with commas. In Nextflow mode, provide one row per dataset in the samplesheet.
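As a minimal sketch of the shell-mode convention above, the helper below joins the Read1 and Read2 files of one library into comma-separated strings while keeping their order paired. The function name `join_fastq_paths` and the example paths are hypothetical and are not part of SeekSoul™ Methyl Tools.

```python
def join_fastq_paths(read1_files, read2_files):
    """Return (r1_arg, r2_arg) as comma-separated strings in matching order."""
    if len(read1_files) != len(read2_files):
        raise ValueError("Read1 and Read2 lists must have the same length")
    for path in list(read1_files) + list(read2_files):
        # A comma inside a path would be read as a list separator.
        if "," in path:
            raise ValueError(f"path contains a comma: {path}")
    return ",".join(read1_files), ",".join(read2_files)


# Hypothetical paths for two sequencing datasets of the same sample.
r1_arg, r2_arg = join_fastq_paths(
    ["/data/S1_L001_R1.fq.gz", "/data/S1_L002_R1.fq.gz"],
    ["/data/S1_L001_R2.fq.gz", "/data/S1_L002_R2.fq.gz"],
)
print(r1_arg)
print(r2_arg)
```

The two strings can then be passed as a Read1/Read2 argument pair in shell mode.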

Shell Input Parameters

The shell wrapper sc_methy_workflow.sh uses the following arguments:

  • $1: transcriptome Read1 FASTQ path
  • $2: transcriptome Read2 FASTQ path
  • $3: methylation Read1 FASTQ path
  • $4: methylation Read2 FASTQ path
  • --sample: sample name
  • --outdir: output directory
  • --database_dir: reference genome database directory
  • --chemistry: DD-MET3 or DD-MET5
  • --core: number of CPU cores
  • --filter_ch: remove reads with more than n CH methylation sites, where n is the value supplied; set to 0 to disable
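The argument list above could be assembled programmatically as in the following sketch. It assumes that the four positional FASTQ arguments come before the named options and that the wrapper is launched with `bash`; neither detail is confirmed by this page. The function name `build_command` and all example values are hypothetical.

```python
import shlex

ALLOWED_CHEMISTRY = ("DD-MET3", "DD-MET5")


def build_command(expr_r1, expr_r2, met_r1, met_r2, *, sample, outdir,
                  database_dir, chemistry, core, filter_ch,
                  script="sc_methy_workflow.sh"):
    """Build an argv list for the shell wrapper (assumed argument order)."""
    if chemistry not in ALLOWED_CHEMISTRY:
        raise ValueError(f"chemistry must be one of {ALLOWED_CHEMISTRY}")
    if core < 1 or filter_ch < 0:
        raise ValueError("core must be >= 1 and filter_ch must be >= 0")
    return [
        "bash", script,
        expr_r1, expr_r2, met_r1, met_r2,  # $1..$4
        "--sample", sample,
        "--outdir", outdir,
        "--database_dir", database_dir,
        "--chemistry", chemistry,
        "--core", str(core),
        "--filter_ch", str(filter_ch),     # 0 disables the CH filter
    ]


# Hypothetical example values.
cmd = build_command(
    "/data/S1_E_R1.fq.gz", "/data/S1_E_R2.fq.gz",
    "/data/S1_MET_R1.fq.gz", "/data/S1_MET_R2.fq.gz",
    sample="S1", outdir="/results/S1", database_dir="/refs/GRCh38",
    chemistry="DD-MET3", core=16, filter_ch=0,
)
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.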

Sample Naming Rules

From v2.0.0 onward, sample_id and outdir are validated more strictly:

  • Whitespace is not allowed
  • Whitespace-like separators should be normalized to underscores
  • Keep sample names stable across transcriptome and methylation inputs
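The naming rules above can be applied before launching a run with a small helper such as the one below. This is only a sketch: the exact validation performed by the pipeline is not documented here, and the function name `normalize_sample_id` is hypothetical. It treats any run of whitespace characters as one separator.

```python
import re


def normalize_sample_id(name):
    """Trim the name and replace each run of whitespace with one underscore."""
    cleaned = re.sub(r"\s+", "_", name.strip())
    if not cleaned:
        raise ValueError("sample name is empty after normalization")
    return cleaned


print(normalize_sample_id("WTJW 969"))
print(normalize_sample_id("  tumor\tsample 01 "))
```

Using the normalized value for both the transcriptome and methylation inputs keeps the sample name stable across the two data types.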

Nextflow Samplesheet Format

The required header is:

sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2

Example for one dataset:

sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
XYRD-WTJW880,/path/to/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz,/path/to/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz

Example when transcriptome and methylation FASTQ counts are not equal (the expression columns are left empty on the extra methylation rows):

sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
WTJW969,/path/to/WTJW969_E_L003_R1.fq.gz,/path/to/WTJW969_E_L003_R2.fq.gz,/path/to/WTJW969_Met_L000_R1.fq.gz,/path/to/WTJW969_Met_L000_R2.fq.gz
WTJW969,/path/to/WTJW969_E_L004_R1.fq.gz,/path/to/WTJW969_E_L004_R2.fq.gz,/path/to/WTJW969_Met_L001_R1.fq.gz,/path/to/WTJW969_Met_L001_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L002_R1.fq.gz,/path/to/WTJW969_Met_L002_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L003_R1.fq.gz,/path/to/WTJW969_Met_L003_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L004_R1.fq.gz,/path/to/WTJW969_Met_L004_R2.fq.gz
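A samplesheet with unequal dataset counts, like the one above, could be generated with the following sketch. It pads the shorter list with empty fields using `itertools.zip_longest`. The function names and example paths are hypothetical. The page only shows empty expression columns; whether empty methylation columns are accepted is an assumption that should be checked against the pipeline.

```python
import csv
import io
from itertools import zip_longest

HEADER = ["sample_id", "expression_r1", "expression_r2",
          "methylation_r1", "methylation_r2"]


def samplesheet_rows(sample_id, expression_pairs, methylation_pairs):
    """Pair up (R1, R2) tuples row by row, leaving missing fields empty."""
    rows = []
    for expr, met in zip_longest(expression_pairs, methylation_pairs,
                                 fillvalue=("", "")):
        rows.append([sample_id, expr[0], expr[1], met[0], met[1]])
    return rows


def render_samplesheet(rows):
    """Return the samplesheet as CSV text with the required header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


# One transcriptome dataset and two methylation datasets (hypothetical paths).
rows = samplesheet_rows(
    "WTJW969",
    [("/path/to/E_L003_R1.fq.gz", "/path/to/E_L003_R2.fq.gz")],
    [("/path/to/Met_L000_R1.fq.gz", "/path/to/Met_L000_R2.fq.gz"),
     ("/path/to/Met_L001_R1.fq.gz", "/path/to/Met_L001_R2.fq.gz")],
)
sheet = render_samplesheet(rows)
print(sheet)
```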

Reference Database Requirements

The --database_dir directory should contain:

<database_dir>/
|-- bed/
|   |-- chr_len.bed
|   `-- chr_nochrM.bed
|-- fasta/
|   |-- genome.fa
|   |-- genome.fa.fai
|   `-- Bisulfite_Genome/
|-- genes/
|   `-- genes.gtf
`-- star/

Path conventions used by the pipeline:

  • star/: STAR genome index
  • fasta/genome.fa: genome FASTA
  • fasta/: Bismark genome folder; running bismark_genome_preparation on it creates the Bisulfite_Genome/ subdirectory
  • genes/genes.gtf: gene annotation
  • bed/chr_len.bed: chromosome sizes
  • bed/chr_nochrM.bed: recommended chromosome list without mitochondrial contigs
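Before starting a run, the layout above can be checked with a short script like the sketch below. It only tests that the listed files and directories exist; it does not verify that the STAR index or the Bismark conversion is complete. The function name `missing_database_entries` and the example path are hypothetical.

```python
from pathlib import Path

REQUIRED_FILES = [
    "bed/chr_len.bed",
    "bed/chr_nochrM.bed",
    "fasta/genome.fa",
    "fasta/genome.fa.fai",
    "genes/genes.gtf",
]
REQUIRED_DIRS = [
    "fasta/Bisulfite_Genome",
    "star",
]


def missing_database_entries(database_dir):
    """Return the relative paths that are absent from database_dir."""
    root = Path(database_dir)
    missing = [p for p in REQUIRED_FILES if not (root / p).is_file()]
    missing += [p for p in REQUIRED_DIRS if not (root / p).is_dir()]
    return missing


# Replace with the directory passed to --database_dir.
for entry in missing_database_entries("/path/to/database_dir"):
    print(f"missing: {entry}")
```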

For detailed construction steps, see:

Repository Layout

After cloning, the main pipeline components are:

  • nf/main.nf: top-level Nextflow entry point; choose the sub-workflow through --workflow
  • nf/subworkflows/: workflow definitions for rna_met, met_only, and force_cell
  • nf/modules/: step-wise processing modules
  • nf/bin/: helper scripts and barcode resources
  • nf/nextflow.config: execution profile and resource configuration
  • nf/nextflow_schema.json: parameter schema
  • sc_methy_workflow.sh: shell wrapper for dual-omics analysis