Inputs and Data Preparation

Author: SeekGene

Updated: 2026-05-12

scMethyl + RNA-seq Inputs

Required Input Files

SeekSoul™ Methyl Tools accepts transcriptome and methylation paired-end FASTQ files for each sample:

Transcriptome Read1 FASTQ
Transcriptome Read2 FASTQ
Methylation Read1 FASTQ
Methylation Read2 FASTQ

If one sample contains multiple sequencing datasets, list them in the same order for every input, so that the Read1 and Read2 files of each dataset line up. In shell mode, separate file paths with commas. In Nextflow mode, provide one row per dataset in the samplesheet.

Shell Input Parameters

The shell wrapper sc_methy_workflow.sh uses the following arguments:

$1: transcriptome Read1 FASTQ path
$2: transcriptome Read2 FASTQ path
$3: methylation Read1 FASTQ path
$4: methylation Read2 FASTQ path
--sample: sample name
--outdir: output directory
--database_dir: reference genome database directory
--chemistry: DD-MET3 or DD-MET5
--core: number of CPU cores
--filter_ch: remove reads with more than n CH methylation sites; set to 0 to disable
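
To make the argument order concrete, the following is a minimal sketch that assembles an argument list for the wrapper. It assumes the four positional FASTQ arguments come first and that each option is passed as `--name value`; the helper name `build_shell_command` and all example values are hypothetical and are not part of the pipeline.

```python
from typing import List, Sequence


def build_shell_command(
    rna_r1: Sequence[str],
    rna_r2: Sequence[str],
    met_r1: Sequence[str],
    met_r2: Sequence[str],
    sample: str,
    outdir: str,
    database_dir: str,
    chemistry: str,
    core: int,
    filter_ch: int,
    script: str = "sc_methy_workflow.sh",
) -> List[str]:
    """Return an argv list: four comma-joined FASTQ lists, then the named options."""
    # Read1 and Read2 must describe the same datasets in the same order.
    if len(rna_r1) != len(rna_r2) or len(met_r1) != len(met_r2):
        raise ValueError("Read1 and Read2 lists must have the same length")
    if chemistry not in ("DD-MET3", "DD-MET5"):
        raise ValueError("chemistry must be DD-MET3 or DD-MET5")

    # Shell mode: multiple datasets for one sample are separated by commas.
    positional = [",".join(paths) for paths in (rna_r1, rna_r2, met_r1, met_r2)]

    # Assumption: options follow the positional arguments as "--name value".
    return [
        "bash", script, *positional,
        "--sample", sample,
        "--outdir", outdir,
        "--database_dir", database_dir,
        "--chemistry", chemistry,
        "--core", str(core),
        "--filter_ch", str(filter_ch),
    ]
```

The resulting list can be passed to `subprocess.run` or joined for display. Building it as a list rather than a single string avoids quoting problems when paths contain special characters.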

Sample Naming Rules

From v2.0.0 onward, sample_id and outdir are validated more strictly:

Whitespace is not allowed
Whitespace-like separators should be normalized to underscores
Keep sample names stable across transcriptome and methylation inputs
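
The naming rules above can be applied before launching a run. The sketch below is one possible pre-check, not the validation the pipeline itself performs; the function names are hypothetical, and treating every run of whitespace as a single separator is an assumption.

```python
import re


def has_whitespace(name: str) -> bool:
    """Return True if the name contains any whitespace character."""
    return re.search(r"\s", name) is not None


def normalize_sample_id(raw: str) -> str:
    """Trim the ends and replace each run of whitespace with one underscore."""
    cleaned = re.sub(r"\s+", "_", raw.strip())
    if not cleaned:
        raise ValueError("sample_id must not be empty")
    return cleaned
```

Normalizing once and reusing the result for both the transcriptome and methylation inputs keeps the sample name stable across the two libraries.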

Nextflow Samplesheet Format

The required header is:

sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
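
A samplesheet can be checked against this header before it is handed to Nextflow. The following sketch uses only the Python standard library; the function name `read_samplesheet` is hypothetical, and requiring the columns in exactly this order is an assumption rather than documented pipeline behavior.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

REQUIRED_HEADER = [
    "sample_id",
    "expression_r1",
    "expression_r2",
    "methylation_r1",
    "methylation_r2",
]


def read_samplesheet(text: str) -> Dict[str, List[dict]]:
    """Check the header, then group the rows of a samplesheet by sample_id."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames is None for an empty sheet, so this also rejects empty input.
    if reader.fieldnames != REQUIRED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")

    rows_by_sample: Dict[str, List[dict]] = defaultdict(list)
    for row in reader:
        rows_by_sample[row["sample_id"]].append(row)
    return dict(rows_by_sample)
```

Grouping by `sample_id` makes it easy to see how many datasets each sample contributes, since the samplesheet uses one row per dataset.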

Example for one dataset:

sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
XYRD-WTJW880,/path/to/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz,/path/to/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz

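Example when transcriptome and methylation FASTQ counts are not equal (here the sample has more methylation datasets, so the expression columns are left empty on the extra rows):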

sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
WTJW969,/path/to/WTJW969_E_L003_R1.fq.gz,/path/to/WTJW969_E_L003_R2.fq.gz,/path/to/WTJW969_Met_L000_R1.fq.gz,/path/to/WTJW969_Met_L000_R2.fq.gz
WTJW969,/path/to/WTJW969_E_L004_R1.fq.gz,/path/to/WTJW969_E_L004_R2.fq.gz,/path/to/WTJW969_Met_L001_R1.fq.gz,/path/to/WTJW969_Met_L001_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L002_R1.fq.gz,/path/to/WTJW969_Met_L002_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L003_R1.fq.gz,/path/to/WTJW969_Met_L003_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L004_R1.fq.gz,/path/to/WTJW969_Met_L004_R2.fq.gz
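
Rows like these can be generated rather than typed by hand. The sketch below pairs the transcriptome and methylation datasets of one sample and leaves the columns empty where one side runs out. The function name `samplesheet_text` is hypothetical. Only empty expression columns appear in the example above, so padding the methylation columns in the opposite case is an assumption.

```python
import csv
import io
from itertools import zip_longest
from typing import List, Tuple

HEADER = [
    "sample_id",
    "expression_r1",
    "expression_r2",
    "methylation_r1",
    "methylation_r2",
]

Pair = Tuple[str, str]  # (Read1 path, Read2 path) for one dataset


def samplesheet_text(
    sample_id: str,
    expression_pairs: List[Pair],
    methylation_pairs: List[Pair],
) -> str:
    """Return CSV text with one row per dataset for a single sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    # zip_longest pads the shorter list with empty columns.
    # Assumption: padding is acceptable on either side; the documented
    # example only shows empty expression columns.
    rows = zip_longest(expression_pairs, methylation_pairs, fillvalue=("", ""))
    for expression, methylation in rows:
        writer.writerow([sample_id, *expression, *methylation])
    return buffer.getvalue()
```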

Reference Database Requirements

The --database_dir directory should contain:

<database_dir>/
|-- bed/
|   |-- chr_len.bed
|   `-- chr_nochrM.bed
|-- fasta/
|   |-- genome.fa
|   |-- genome.fa.fai
|   `-- Bisulfite_Genome/
|-- genes/
|   `-- genes.gtf
`-- star/

Path conventions used by the pipeline:

star/: STAR genome index
fasta/genome.fa: genome FASTA
fasta/: Bismark genome folder; it contains Bisulfite_Genome/ after bismark_genome_preparation has been run
genes/genes.gtf: gene annotation
bed/chr_len.bed: chromosome sizes
bed/chr_nochrM.bed: recommended chromosome list without mitochondrial contigs
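
Before a long run, it can be useful to confirm that a database directory matches this layout. The sketch below only checks that the listed paths exist; it does not verify that the STAR index or the Bismark conversion is complete. The function name `missing_reference_paths` is hypothetical.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory layout described above.
EXPECTED_PATHS = [
    "bed/chr_len.bed",
    "bed/chr_nochrM.bed",
    "fasta/genome.fa",
    "fasta/genome.fa.fai",
    "fasta/Bisulfite_Genome",
    "genes/genes.gtf",
    "star",
]


def missing_reference_paths(database_dir: str) -> List[str]:
    """Return the expected relative paths that are absent under database_dir."""
    root = Path(database_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

An empty list means every expected file and folder is present; any returned entries name what still needs to be built or copied.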

For detailed construction steps, see:

Repository Layout

After cloning, the main pipeline components are:

nf/main.nf: top-level Nextflow entry point; choose the sub-workflow through --workflow
nf/subworkflows/: workflow definitions for rna_met, met_only, and force_cell
nf/modules/: step-wise processing modules
nf/bin/: helper scripts and barcode resources
nf/nextflow.config: execution profile and resource configuration
nf/nextflow_schema.json: parameter schema
sc_methy_workflow.sh: shell wrapper for dual-omics analysis
