How to Build a Reference Genome
Updated: 2026-02-27
Scenario 1: Building a Reference Genome Compatible with Single-cell Data from Different Platforms
NOTE
If you have single-cell data from both 10x Genomics and SeekSpace™ products, we recommend building the reference genome with 10x Genomics Cell Ranger. SeekSpace™ Tools is compatible with reference genomes built by Cell Ranger.
The following commands filter the gene annotation (GTF) file, build the reference genome, and decompress the GTF file in the resulting reference directory:
shell
/path/to/cellranger mkgtf Homo_sapiens.GRCh38.ensembl.gtf Homo_sapiens.GRCh38.ensembl.filtered.gtf \
--attribute=gene_biotype:protein_coding \
--attribute=gene_biotype:lncRNA \
--attribute=gene_biotype:antisense \
--attribute=gene_biotype:IG_LV_gene \
--attribute=gene_biotype:IG_V_gene \
--attribute=gene_biotype:IG_V_pseudogene \
--attribute=gene_biotype:IG_D_gene \
--attribute=gene_biotype:IG_J_gene \
--attribute=gene_biotype:IG_J_pseudogene \
--attribute=gene_biotype:IG_C_gene \
--attribute=gene_biotype:IG_C_pseudogene \
--attribute=gene_biotype:TR_V_gene \
--attribute=gene_biotype:TR_V_pseudogene \
--attribute=gene_biotype:TR_D_gene \
--attribute=gene_biotype:TR_J_gene \
--attribute=gene_biotype:TR_J_pseudogene \
--attribute=gene_biotype:TR_C_gene
/path/to/cellranger mkref --genome=GRCh38 --fasta=GRCh38.fa --genes=Homo_sapiens.GRCh38.ensembl.filtered.gtf
cd GRCh38/genes
gunzip -dc genes.gtf.gz > genes.gtf
TIP
- If the reference genome built by Cell Ranger is incompatible with the STAR version used by SeekSpace™ Tools, you can pass the path of Cell Ranger's STAR binary to SeekSpace™ Tools, for example:
  --star_path /path/to/cellranger-5.0.1/lib/bin/STAR
- Chromosome names in the FASTA file must match those in the GTF file. For example, if chromosome 1 is named chr1 in the FASTA file, it must also be named chr1 in the GTF file.
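The chromosome-name requirement can be checked before building the reference. The following is a minimal sketch, assuming an uncompressed FASTA and a tab-separated GTF; the function names (fasta_seqnames, gtf_seqnames, check_chrom_names) are illustrative helpers and are not part of Cell Ranger or SeekSpace™ Tools.

```shell
# Illustrative helpers (hypothetical names, not part of any tool).

# Print the unique sequence names found in FASTA header lines:
# the text after ">" up to the first whitespace.
fasta_seqnames() {
    awk '/^>/ {sub(/^>/, "", $1); print $1}' "$1" | sort -u
}

# Print the unique sequence names in column 1 of a GTF file,
# skipping comment lines and empty lines.
gtf_seqnames() {
    awk -F'\t' '!/^#/ && NF {print $1}' "$1" | sort -u
}

# Print every GTF sequence name that is absent from the FASTA file.
# Returns 0 if all GTF names are present, 1 otherwise.
check_chrom_names() {
    ccn_tmp=$(mktemp) || return 2
    fasta_seqnames "$1" > "$ccn_tmp"
    # grep -vxF -f keeps only GTF names with no exact match in the FASTA list.
    ccn_missing=$(gtf_seqnames "$2" | grep -vxF -f "$ccn_tmp" || true)
    rm -f "$ccn_tmp"
    if [ -n "$ccn_missing" ]; then
        printf '%s\n' "$ccn_missing"
        return 1
    fi
    return 0
}
```

For example, check_chrom_names GRCh38.fa Homo_sapiens.GRCh38.ensembl.filtered.gtf prints nothing and returns 0 when every name used in the GTF file also appears in the FASTA file. A FASTA file that uses chr1 with a GTF file that uses 1 would list 1 as missing.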
Scenario 2: Only SeekSpace™ Products, No Need to Consider Platform Compatibility
The code for building the STAR index is as follows:
shell
/demo/seekspacetools_v1.0.2/bin/STAR \
--runMode genomeGenerate \
--runThreadN 16 \
--genomeDir /path/to/star \
--genomeFastaFiles /path/to/genome.fa \
--sjdbGTFfile /path/to/genome.gtf \
--sjdbOverhang 149 \
--limitGenomeGenerateRAM 17179869184
TIP
- Chromosome names in the FASTA file must match those in the GTF file. For example, if chromosome 1 is named chr1 in the FASTA file, it must also be named chr1 in the GTF file.
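The two numeric values in the STAR command can be derived rather than copied. The sketch below assumes 150-base reads and a 16 GiB memory limit; neither is stated on this page, and the variable names are illustrative. STAR's manual suggests read length minus 1 for --sjdbOverhang, and --limitGenomeGenerateRAM is given in bytes.

```shell
# Assumed inputs (not stated in this guide): adjust to your data and machine.
read_length=150   # bases per sequenced read (mate length)
ram_gib=16        # memory allowed for index generation, in GiB

# STAR's manual suggests read length minus 1 for --sjdbOverhang.
sjdb_overhang=$((read_length - 1))

# --limitGenomeGenerateRAM takes bytes; 1 GiB = 1024 * 1024 * 1024 bytes.
limit_ram_bytes=$((ram_gib * 1024 * 1024 * 1024))

echo "--sjdbOverhang $sjdb_overhang --limitGenomeGenerateRAM $limit_ram_bytes"
```

With these assumed inputs the computed values match the ones in the command above (149 and 17179869184). For a different read length or memory budget, change the two input variables and use the printed options instead.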
