
Barcode-Converter

Author: SeekGene
Updated: 2025-08-11

This tool converts CellBarcodes in single-cell sequencing data.

Conversion workflow

First, the tool detects CellBarcodes in R1 of the input FASTQ files. One mismatch is allowed when matching a CellBarcode against the input whitelist.

Then, it converts each detected CellBarcode to the target chemistry using the whitelist mapping.
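The one-mismatch matching step can be illustrated with a minimal Python sketch. It is not the code used by conv; the function name `candidates_within_one_mismatch` and the choice to return every candidate (so that ambiguous reads can be handled later) are assumptions made for illustration.

```python
from typing import List, Set

BASES = "ACGT"


def candidates_within_one_mismatch(seq: str, whitelist: Set[str]) -> List[str]:
    """Return whitelist barcodes at Hamming distance <= 1 from seq (hypothetical helper).

    An exact match wins outright. Otherwise every single-base substitution
    is tried, and all whitelist hits are returned in sorted order.
    """
    if seq in whitelist:
        return [seq]
    hits = set()
    for i, base in enumerate(seq):
        for alt in BASES:
            if alt == base:
                continue
            candidate = seq[:i] + alt + seq[i + 1:]
            if candidate in whitelist:
                hits.add(candidate)
    return sorted(hits)


whitelist = {"AAAA", "AAAT", "CCCC"}
print(candidates_within_one_mismatch("AAAC", whitelist))  # two candidates: ambiguous
print(candidates_within_one_mismatch("CCCN", whitelist))  # one candidate: corrected
```

A read with exactly one candidate can be corrected directly; a read with several candidates is ambiguous and needs a separate decision rule.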

Download

Download package

NOTE

Download and extract the package.

Quick start

TIP

Before converting, make sure conv.0.1.2 is properly installed and the corresponding whitelist files are prepared.

Example 1: Convert DD CellBarcodes to 10x

--wl1 specifies the whitelist file for the input chemistry, --wl2 specifies the whitelist file for the target (output) chemistry, --rs specifies the R1 structure pattern (here 17C+T: a 17-base CellBarcode followed by other bases), -t specifies the number of threads, and -o specifies the output directory.

/path/to/conv.0.1.2 --fq1 ./demo_dd_S39_L001_R1_001.fastq.gz --fq2 ./demo_dd_S39_L001_R2_001.fastq.gz --wl1 ./P3CB.barcode.txt.gz --wl2 3M-february-2018.txt.gz --rs 17C+T -t 12 -o output/

Example 2: Convert multiple files for the same sample

--fq1 and --fq2 each accept multiple space-separated files; list the R1 and R2 files in the same order so that each R1 file is paired with its R2 file.

/path/to/conv.0.1.2 --fq1 ./demo_dd_S39_L001_R1_001.fastq.gz ./demo_dd_S39_L001_R1_002.fastq.gz --fq2 ./demo_dd_S39_L001_R2_001.fastq.gz ./demo_dd_S39_L001_R2_002.fastq.gz --wl1 ./P3CB.barcode.txt.gz --wl2 3M-february-2018.txt.gz --rs 17C+T -t 12 -o output/

Example 3: Use an existing CellBarcode mapping

Use this when several omics libraries from the same sample must share one CellBarcode mapping. --map takes the place of --wl1 and --wl2.

/path/to/conv.0.1.2 --fq1 ./demo_dd_S39_L001_R1_001.fastq.gz --fq2 ./demo_dd_S39_L001_R2_001.fastq.gz --map ../map.tsv --rs 17C+T -t 12 -o output/

Options

Option           Description
--fq1 ...        Input R1 FASTQ files; you can specify multiple runs for one sample, space-separated
--fq2 ...        Input R2 FASTQ files; you can specify multiple runs for one sample, space-separated
--rs             R1 structure pattern built from counts and letters: a number = base count, + = all remaining bases, C = CellBarcode bases, T = other bases [default: 17C+T]. DD series use 17C+T
--wl1            Whitelist file for the input FASTQ chemistry; DD series use barcode/P3CB.barcode.txt
--wl2            Whitelist file for the target (output) chemistry; e.g., 3' libraries use 3M-february-2018.txt.gz, 5' libraries use 737K-august-2016.txt.gz
--map            Barcode mapping file with two tab-separated columns: first = input whitelist barcode, second = output whitelist barcode. --map takes precedence over --wl1 and --wl2. Either --wl1 and --wl2, or --map, must be provided
--no-multi       Controls redistribution of reads with multiple matching barcodes; redistribution is enabled by default
-t, --threads    Number of threads [default: 10]
-o, --out        Output directory [default: ./]
-h, --help       Print help
-V, --version    Print version
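To make the --rs pattern concrete, here is a minimal Python sketch of how a pattern such as 17C+T could be interpreted. It follows only the grammar described in the table (a number or + followed by a letter); `parse_rs` and `extract_cell_barcode` are hypothetical names, and the real parser in conv may accept or reject patterns differently.

```python
import re
from typing import List, Optional, Tuple

# One token = a base count (digits) or "+" (all remaining bases), then a letter.
_TOKEN = re.compile(r"(\d+|\+)([A-Z])")


def parse_rs(pattern: str) -> List[Tuple[Optional[int], str]]:
    """Split an --rs style pattern into (length, kind) segments.

    length is None for "+", meaning "all remaining bases".
    """
    segments = []
    pos = 0
    for match in _TOKEN.finditer(pattern):
        if match.start() != pos:
            raise ValueError(f"unexpected text at offset {pos} in {pattern!r}")
        length = None if match.group(1) == "+" else int(match.group(1))
        segments.append((length, match.group(2)))
        pos = match.end()
    if not segments or pos != len(pattern):
        raise ValueError(f"invalid pattern: {pattern!r}")
    return segments


def extract_cell_barcode(read: str, pattern: str) -> str:
    """Concatenate the bases covered by C segments of the pattern."""
    offset = 0
    parts = []
    for length, kind in parse_rs(pattern):
        end = len(read) if length is None else offset + length
        if kind == "C":
            parts.append(read[offset:end])
        offset = end
    return "".join(parts)


print(parse_rs("17C+T"))
```

With 17C+T, the first 17 bases of R1 are taken as the CellBarcode and everything after them is treated as other sequence.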

Output files

The output directory contains:

  1. <OUT>/*fastq.gz: Converted FASTQ files
  2. <OUT>/multi_*fastq.gz: Intermediate FASTQ files for reads with multiple matching barcodes; candidate barcodes joined by "_"
  3. <OUT>/map.txt: Barcode mapping (two columns, tab-separated): first = input whitelist, second = output whitelist
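Before reusing a mapping file, it can be helpful to check that it has the two-column, tab-separated layout described above. The following Python sketch is one way to do that; `load_barcode_map` is a hypothetical helper, and treating a repeated input barcode as an error is an assumption rather than documented behavior of conv.

```python
import csv
from typing import Dict


def load_barcode_map(path: str) -> Dict[str, str]:
    """Read a two-column, tab-separated barcode mapping into a dict."""
    mapping: Dict[str, str] = {}
    with open(path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            if not row:
                continue  # tolerate blank lines
            if len(row) != 2:
                raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
            source, target = row
            if source in mapping:
                # Assumption: each input barcode should appear only once.
                raise ValueError(f"line {line_no}: duplicate input barcode {source}")
            mapping[source] = target
    return mapping
```

For example, `load_barcode_map("output/map.txt")` would return a dictionary keyed by input barcode, assuming the file was written to the output directory used in the examples.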

Notes

IMPORTANT

Vendors may provide multiple whitelists, and different products may use different ones, so set --wl1 and --wl2 to match the actual chemistries. For 10x Genomics, the whitelist used by each chemistry is defined in cellranger-*/lib/python/cellranger/chemistry_defs.json or cellranger-5.0.1/lib/python/cellranger/chemistry.py; the whitelist files themselves are under cellranger-*/lib/python/cellranger/barcodes/.

NOTE

The tool compares the number of CellBarcodes in --wl1 and --wl2. If --wl1 has more barcodes than --wl2, 10M reads are sampled to count the CellBarcodes actually present in the input. If the input FASTQ contains more unique CellBarcodes than the --wl2 whitelist holds, only the most frequent input CellBarcodes are mapped to --wl2 barcodes. Otherwise, all observed input CellBarcodes are mapped to --wl2 barcodes and the remaining --wl2 barcodes are assigned randomly.
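The assignment rule in this note can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation in conv: `build_mapping` is a hypothetical name, ties in frequency are broken alphabetically, targets are consumed in whitelist order, and leftover target barcodes are given to randomly chosen unobserved input barcodes.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List


def build_mapping(
    observed: Iterable[str],
    wl1: List[str],
    wl2: List[str],
    seed: int = 0,
) -> Dict[str, str]:
    """Map input barcodes to target barcodes, most frequent first."""
    counts = Counter(observed)
    # Rank observed barcodes by read count (descending), then alphabetically.
    ranked = sorted(counts, key=lambda bc: (-counts[bc], bc))
    # zip stops at the shorter list, so only the top len(wl2) barcodes are mapped.
    mapping = dict(zip(ranked, wl2))
    leftover_targets = wl2[len(mapping):]
    if leftover_targets:
        # Assumption: leftover targets go to unobserved input barcodes at random.
        unobserved = [bc for bc in wl1 if bc not in counts]
        random.Random(seed).shuffle(unobserved)
        mapping.update(zip(unobserved, leftover_targets))
    return mapping


reads = ["A", "A", "A", "B", "B", "C"]
print(build_mapping(reads, wl1=["A", "B", "C", "D"], wl2=["X", "Y"]))
```

In the usage line, three distinct barcodes are observed but the target whitelist holds only two, so the least frequent barcode (C) is left unmapped.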

WARNING

When redistribution of multi-matching reads is active (the default; controlled by --no-multi), reads with multiple matching barcodes are redistributed after the reads per CellBarcode have been counted. The candidate barcodes of each read are sorted by read count, and the read is assigned to the candidate with the most reads. If the top two candidates have equal read counts, the read is not assigned.
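The decision rule for a single multi-matching read can be written as a short Python sketch. `resolve_multi` is a hypothetical name, and treating a candidate with no recorded reads as having a count of zero is an assumption.

```python
from typing import Mapping, Optional, Sequence


def resolve_multi(candidates: Sequence[str], read_counts: Mapping[str, int]) -> Optional[str]:
    """Pick the candidate barcode with the most reads, or None on a tie."""
    ranked = sorted(candidates, key=lambda bc: read_counts.get(bc, 0), reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and read_counts.get(ranked[0], 0) == read_counts.get(ranked[1], 0):
        return None  # top two candidates tie: skip assignment
    return ranked[0]


counts = {"AAAA": 120, "AAAT": 3, "CCCC": 3}
# Candidate barcodes joined by "_" can be split back into a list.
print(resolve_multi("AAAA_AAAT".split("_"), counts))  # AAAA
print(resolve_multi("AAAT_CCCC".split("_"), counts))  # None
```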

TIP

For multiple omics of the same sample (e.g., 5'/TCR/BCR), use a consistent mapping. Convert one data type first (the RNA library is recommended), then pass its output map.txt via --map when converting the other data types.

NOTE

conv.0.1.2 is an upgraded version of conv. It fixes high memory usage when the number of worker threads is small: read_ahead's chunk_size and chunk_queue_size are now set to the square of the thread count instead of the default of 100.
