Inputs and Data Preparation
Updated: 2026-05-12
Required Input Files
SeekSoul™ Methyl Tools accepts transcriptome and methylation paired-end FASTQ files for each sample:
- Transcriptome Read1 FASTQ
- Transcriptome Read2 FASTQ
- Methylation Read1 FASTQ
- Methylation Read2 FASTQ
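If a sample has multiple sequencing datasets (for example, several lanes), list the files for each input in the same dataset order, so that every Read1 path lines up with its Read2 mate. In shell mode, separate file paths with commas. In Nextflow mode, provide one row per dataset in the samplesheet.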
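The multi-dataset rules can be scripted. The Python sketch below is illustrative and not part of SeekSoul™ Methyl Tools: the helper names (`shell_paths`, `samplesheet_rows`, `samplesheet_csv`) are hypothetical, and it assumes each dataset is given as an (R1, R2) path pair so the two reads cannot drift out of order. For shell mode it joins each read's paths with commas; for Nextflow mode it writes one samplesheet row per dataset and leaves a modality's columns empty once that modality runs out of datasets, as in the unequal-count samplesheet example on this page. Leaving the methylation columns empty when there are more transcriptome datasets is an assumption; the page only shows the opposite case.

```python
import csv
import io
from itertools import zip_longest

# Column names from the Nextflow samplesheet header on this page.
HEADER = ["sample_id", "expression_r1", "expression_r2",
          "methylation_r1", "methylation_r2"]


def shell_paths(pairs):
    """Join (R1, R2) pairs into the two comma-separated strings used in shell mode."""
    r1 = ",".join(p[0] for p in pairs)
    r2 = ",".join(p[1] for p in pairs)
    return r1, r2


def samplesheet_rows(sample_id, expression_pairs, methylation_pairs):
    """Return one row per dataset; row i pairs dataset i of each modality.

    A modality with fewer datasets gets empty columns in the remaining rows.
    ASSUMPTION: empty methylation columns (more transcriptome than methylation
    datasets) are not shown in the documentation and may be rejected.
    """
    rows = []
    for expr, met in zip_longest(expression_pairs, methylation_pairs,
                                 fillvalue=("", "")):
        rows.append([sample_id, expr[0], expr[1], met[0], met[1]])
    return rows


def samplesheet_csv(rows):
    """Render the header plus rows as CSV text with Unix line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


expression = [
    ("/path/to/WTJW969_E_L003_R1.fq.gz", "/path/to/WTJW969_E_L003_R2.fq.gz"),
    ("/path/to/WTJW969_E_L004_R1.fq.gz", "/path/to/WTJW969_E_L004_R2.fq.gz"),
]
methylation = [
    ("/path/to/WTJW969_Met_L000_R1.fq.gz", "/path/to/WTJW969_Met_L000_R2.fq.gz"),
    ("/path/to/WTJW969_Met_L001_R1.fq.gz", "/path/to/WTJW969_Met_L001_R2.fq.gz"),
    ("/path/to/WTJW969_Met_L002_R1.fq.gz", "/path/to/WTJW969_Met_L002_R2.fq.gz"),
]
print(shell_paths(methylation)[0])  # value for $3 in shell mode
print(samplesheet_csv(samplesheet_rows("WTJW969", expression, methylation)))
```

Pairing rows by position mirrors the unequal-count example, where the first methylation dataset shares a row with the first transcriptome dataset.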
Shell Input Parameters
The shell wrapper sc_methy_workflow.sh uses the following arguments:
- $1: transcriptome Read1 FASTQ path
- $2: transcriptome Read2 FASTQ path
- $3: methylation Read1 FASTQ path
- $4: methylation Read2 FASTQ path
- --sample: sample name
- --outdir: output directory
- --database_dir: reference genome database directory
- --chemistry: DD-MET3 or DD-MET5
- --core: number of CPU cores
- --filter_ch: remove reads with more than n CH methylation sites; set to 0 to disable
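A hedged sketch of assembling a `sc_methy_workflow.sh` call in Python, with basic checks on `--chemistry`, `--core`, and `--filter_ch`. The function name `build_shell_command`, launching the script with `bash`, and passing each option as a separate `--name value` pair after the four positional paths are assumptions; confirm them against the wrapper script itself. All paths and numeric values are placeholders, not recommendations.

```python
import shlex

# Allowed values listed for --chemistry on this page.
VALID_CHEMISTRIES = {"DD-MET3", "DD-MET5"}


def build_shell_command(expr_r1, expr_r2, met_r1, met_r2, *, sample, outdir,
                        database_dir, chemistry, core, filter_ch):
    """Return an argument list for sc_methy_workflow.sh (nothing is executed).

    Each FASTQ argument is a list of paths; multiple datasets are joined with
    commas. ASSUMPTION: options follow the four positional arguments as
    separate "--name value" tokens, and the script is started with bash.
    """
    if chemistry not in VALID_CHEMISTRIES:
        raise ValueError(f"unsupported chemistry: {chemistry!r}")
    if core < 1:
        raise ValueError("core must be a positive integer")
    if filter_ch < 0:
        raise ValueError("filter_ch must be >= 0 (0 disables the filter)")
    if not expr_r1 or not met_r1:
        raise ValueError("at least one transcriptome and one methylation dataset is required")
    if len(expr_r1) != len(expr_r2) or len(met_r1) != len(met_r2):
        raise ValueError("each Read1 list must match its Read2 list in length")
    return [
        "bash", "sc_methy_workflow.sh",
        ",".join(expr_r1), ",".join(expr_r2),  # $1, $2
        ",".join(met_r1), ",".join(met_r2),    # $3, $4
        "--sample", sample,
        "--outdir", outdir,
        "--database_dir", database_dir,
        "--chemistry", chemistry,
        "--core", str(core),
        "--filter_ch", str(filter_ch),
    ]


# Placeholder paths and illustrative values.
cmd = build_shell_command(
    ["/path/to/S1_E_L003_R1.fq.gz"], ["/path/to/S1_E_L003_R2.fq.gz"],
    ["/path/to/S1_Met_L001_R1.fq.gz", "/path/to/S1_Met_L002_R1.fq.gz"],
    ["/path/to/S1_Met_L001_R2.fq.gz", "/path/to/S1_Met_L002_R2.fq.gz"],
    sample="S1", outdir="results", database_dir="/path/to/database_dir",
    chemistry="DD-MET3", core=16, filter_ch=3,
)
print(shlex.join(cmd))
```

Printing with `shlex.join` gives a shell-safe line to copy into a terminal; the same list could instead be passed to `subprocess.run`.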
Sample Naming Rules
From v2.0.0 onward, sample_id and outdir are validated more strictly:
- Whitespace is not allowed
- Whitespace-like separators should be normalized to underscores
- Keep sample names stable across transcriptome and methylation inputs
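The naming rules can be applied before a run with a small helper. This Python sketch assumes "whitespace-like separators" means any Unicode whitespace (spaces, tabs, non-breaking spaces) and collapses each run of it into one underscore. The exact characters rejected by the v2.0.0 validation are not listed here, so treat `normalize_sample_name` and `check_names_match` as hypothetical helpers rather than the pipeline's own check.

```python
import re


def normalize_sample_name(name):
    """Trim the ends, then collapse each run of Unicode whitespace into "_".

    ASSUMPTION: this approximates, but is not, the pipeline's own validation.
    """
    cleaned = re.sub(r"\s+", "_", name.strip())
    if not cleaned:
        raise ValueError("sample name is empty after normalization")
    return cleaned


def check_names_match(expression_name, methylation_name):
    """Fail if the transcriptome and methylation inputs normalize to different names."""
    a = normalize_sample_name(expression_name)
    b = normalize_sample_name(methylation_name)
    if a != b:
        raise ValueError(f"sample names differ: {a!r} vs {b!r}")
    return a


print(normalize_sample_name("WTJW 969\trep1"))  # WTJW_969_rep1
```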
Nextflow Samplesheet Format
The required header is:
```text
sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
```
Example for one dataset:
```text
sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
XYRD-WTJW880,/path/to/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz,/path/to/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz
```
Example when transcriptome and methylation FASTQ counts are not equal:
```text
sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
WTJW969,/path/to/WTJW969_E_L003_R1.fq.gz,/path/to/WTJW969_E_L003_R2.fq.gz,/path/to/WTJW969_Met_L000_R1.fq.gz,/path/to/WTJW969_Met_L000_R2.fq.gz
WTJW969,/path/to/WTJW969_E_L004_R1.fq.gz,/path/to/WTJW969_E_L004_R2.fq.gz,/path/to/WTJW969_Met_L001_R1.fq.gz,/path/to/WTJW969_Met_L001_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L002_R1.fq.gz,/path/to/WTJW969_Met_L002_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L003_R1.fq.gz,/path/to/WTJW969_Met_L003_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L004_R1.fq.gz,/path/to/WTJW969_Met_L004_R2.fq.gz
```
Reference Database Requirements
The --database_dir directory should contain:
```text
<database_dir>/
|-- bed/
|   |-- chr_len.bed
|   `-- chr_nochrM.bed
|-- fasta/
|   |-- genome.fa
|   |-- genome.fa.fai
|   `-- Bisulfite_Genome/
|-- genes/
|   `-- genes.gtf
`-- star/
```
Path conventions used by the pipeline:
- star/: STAR genome index
- fasta/genome.fa: genome FASTA
- fasta/: Bismark genome folder (holds Bisulfite_Genome/ after running bismark_genome_preparation)
- genes/genes.gtf: gene annotation
- bed/chr_len.bed: chromosome sizes
- bed/chr_nochrM.bed: recommended chromosome list without mitochondrial contigs
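A pre-flight check of `--database_dir` against the expected layout might look like the following Python sketch. The function `missing_database_entries` is hypothetical; it only tests that each listed file or directory exists and does not validate the contents of the STAR or Bismark indexes.

```python
from pathlib import Path

# Expected entries, copied from the directory tree in this section.
REQUIRED_FILES = [
    "bed/chr_len.bed",
    "bed/chr_nochrM.bed",
    "fasta/genome.fa",
    "fasta/genome.fa.fai",
    "genes/genes.gtf",
]
REQUIRED_DIRS = [
    "fasta/Bisulfite_Genome",  # created by bismark_genome_preparation
    "star",                    # STAR genome index
]


def missing_database_entries(database_dir):
    """List expected entries absent under database_dir (existence check only)."""
    root = Path(database_dir)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    missing += [rel + "/" for rel in REQUIRED_DIRS if not (root / rel).is_dir()]
    return missing


problems = missing_database_entries("/path/to/database_dir")
if problems:
    print("missing from --database_dir:", ", ".join(problems))
```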
For detailed construction steps, see:
Repository Layout
After cloning, the main pipeline components are:
- nf/main.nf: top-level Nextflow entry point; choose the sub-workflow through --workflow
- nf/subworkflows/: workflow definitions for rna_met, met_only, and force_cell
- nf/modules/: step-wise processing modules
- nf/bin/: helper scripts and barcode resources
- nf/nextflow.config: execution profile and resource configuration
- nf/nextflow_schema.json: parameter schema
- sc_methy_workflow.sh: shell wrapper for dual-omics analysis
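To launch the Nextflow entry point with a chosen sub-workflow, a command can be assembled as in this Python sketch. `build_nextflow_command` is hypothetical; `--workflow` and its three values come from this page, while any other parameter must use the names defined in `nf/nextflow_schema.json`. The `outdir` key in the usage line is only an illustration, and execution profiles from `nf/nextflow.config` are not handled.

```python
# Sub-workflow names listed for nf/subworkflows/ on this page.
VALID_WORKFLOWS = ("rna_met", "met_only", "force_cell")


def build_nextflow_command(workflow, params=None, repo_dir="."):
    """Return an argument list that launches nf/main.nf (nothing is executed).

    `params` maps pipeline parameter names to values; Nextflow reads pipeline
    parameters from the command line as "--name value". Valid names are
    defined in nf/nextflow_schema.json.
    """
    if workflow not in VALID_WORKFLOWS:
        raise ValueError(f"unknown workflow {workflow!r}; expected one of {VALID_WORKFLOWS}")
    cmd = ["nextflow", "run", f"{repo_dir}/nf/main.nf", "--workflow", workflow]
    for name, value in (params or {}).items():
        cmd += [f"--{name}", str(value)]
    return cmd


# "outdir" is an illustrative parameter name only.
print(build_nextflow_command("rna_met", {"outdir": "results"}))
```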
