Skip to content

SeekSpace Advanced Analysis: stLearn-based Cell Communication Hotspot Identification

Author: SeekGene
Time: 8 min
Words: 1.5k words
Updated: 2026-02-28
Reads: 0 times
Spatial-seq Analysis Guide Notebooks Cell Communication Analysis
python
import stlearn as st
import numpy as np
import pandas as pd
# matplotlib version 3.5.3, higher versions conflict with stlearn
import matplotlib.pyplot as plt
import scanpy as sc
import re

stLearn Cell Communication Module Input Parameter Description

matrix_path: paths to matrix, feature, barcode files
spatial_path: spatial location path
SAMple_name: sample name
image_path: spatial H&E image path
celltype_path: path to annotated cell type file, requires barcode as row name, column name "celltype" (cell type) as column
spot_diameter_fullres: size of cell/spot, related to spatial distance range in analysis steps, optional 50 or 100
species: species, related to lr_database_bool below. If lr_database_bool is True, must be "human" or "mouse"; if False, not required
lr_database_bool: whether to use stlearn's built-in LR database, default True. If False, provide corresponding database in lr_database
lr_database_path: used with lr_database_bool. If lr_database_bool is False, provide LR database path, file content includes ligand_receptor column (with row names, no column names), e.g., ligand_receptor
ncpus: number of threads, affects speed, can be set higher
cluster_name: set column name for cell types in adata object obs
lr_pair: whether to manually set displayed LR pairs. If None, displays top 3 significant pairs; if set, provide pair names separated by comma, "A_B,C_D"
grid_step: whether to draw grids. Recommended True for data with many cells to reduce runtime; False if not used
n_: number of grid divisions for length and width. Required if grid_step is True

python
matrix_path = ""
spatial_path = ""
sample_name = ""
image_path = ""
celltype_path = ""
spot_diameter_fullres = 50
species = "human"
lr_database_bool = True
lr_database_path = ""
n_cpus = 16
cluster_name = "celltype"
lr_pair = "None"
grid_step = True
n_ = 125

Read SeekSpace data and perform normalization, dimensionality reduction, clustering

python
adata = sc.read_10x_mtx(matrix_path)
spatial = pd.read_csv(spatial_path,sep=",",index_col=0)
spatial = spatial.loc[:,("x","y")]
selected_rows = spatial.loc[spatial.index.isin(adata.obs_names)]
selected_rows.columns = ["imagecol","imagerow"]
selected_rows = selected_rows.reindex(adata.obs_names)
selected_rows = selected_rows*0.265385
a = st.create_stlearn(count=adata.to_df(),spatial=selected_rows,library_id=sample_name, scale=1,image_path=image_path,spot_diameter_fullres=spot_diameter_fullres)
a.layers["raw_count"] = a.X
# Preprocessing
#st.pp.filter_genes(a,min_cells=3)
st.pp.normalize_total(a)
st.pp.log1p(a)
# Keep raw data
a.raw = a
st.pp.scale(a)
st.em.run_pca(a,n_comps=50,random_state=0)
st.pp.neighbors(a,n_neighbors=25,use_rep='X_pca',random_state=0)
st.tl.clustering.louvain(a,random_state=0)
sc.tl.umap(a)

Add annotated cell types to adata object

python
celltype = pd.read_csv(celltype_path,index_col=0)
celltype.index = [re.sub(r'_9$', '', s) for s in celltype.index]
celltype = celltype.loc[a.obs.index]
a.obs[cluster_name] = celltype["celltype"]
a.obs[cluster_name] = a.obs[cluster_name].astype('category')

Divide spatial slice into length n_, width n_

python
if grid_step:
    #n_ = 125
    print(f'{n_} by {n_} has this many spots:\n', n_*n_)
    a = st.tl.cci.grid(a,n_row=n_, n_col=n_, use_label = cluster_name)
    print(a.shape )

Load LR database, calculate significance of LR expression

python

if lr_database_bool:
    lrs = st.tl.cci.load_lrs(['connectomeDB2020_lit'], species=species)
else:
    lrs = pd.read_csv(lr_database_path,names=["0"])
    lrs = np.array(list(lrs["0"]))

#lrs = st.tl.cci.load_lrs(['connectomeDB2020_lit'], species=species)
st.tl.cci.run(a, lrs,
                  min_spots = 20, #Filter out any LR pairs with no scores for less than min_spots
                  distance=None, # None defaults to spot+immediate neighbours; distance=0 for within-spot mode
                  n_pairs=100, # Number of random pairs to generate; low as example, recommend ~10,000
                  n_cpus=8, # Number of CPUs for parallel. If None, detects & use all available.
                  )

Display table of LR expression significance results

Rows: Ligand-Receptor Pairs
Column 1 n_spot: Total number of cells/spots with LR interaction (unfiltered by pval)
Column 2 n_spots_sig: Number of cells/spots with significant LR interaction after corrected pval filtering
Column 3 n_spots_sig_pval: Number of cells/spots with LR interaction filtered by set pval

python
lr_info = a.uns['lr_summary']
lr_info
python
st.tl.cci.adj_pvals(a, correct_axis='spot',
                   pval_adj_cutoff=0.05, adj_method='fdr_bh')

Diagnostic Plot (based on lrfeatures results)

A key aspect of LR analysis is controlling for expression level and frequency of LRs when determining significant hotspots.
Therefore, diagnostic plots should show little to no correlation between LR pair hotspots and expression levels/frequencies.
The following diagnostic methods help check and confirm this; otherwise, a larger number of Permutations might be needed.

  • Left Plot:
    X-axis: LR pairs sorted by n_spots_sig
    Y-axis: Median expression of LR pairs (non-zero)
  • Right Plot:
    X-axis: LR pairs sorted by n_spots_sig
    Y-axis: Proportion of cells with zero expression for LR pair
python
st.pl.lr_diagnostics(a, figsize=(10,2.5))

Plot based on lr_summary results: Showing top 500 and top 50 LR pairs
X-axis represents LR pairs sorted by n_spots_sig
Y-axis represents number of cells/spots with significant LR interaction after corrected pval filtering

python
st.pl.lr_summary(a, n_top=500)
st.pl.lr_summary(a, n_top=50, figsize=(10,3))

Plot based on lr_summary results: Showing top 500 and top 50 LR pairs
X-axis represents LR pairs sorted by n_spots_sig
Y-axis represents total number of cells/spots with LR interaction (unfiltered by pval) Color represents significant and non-significant LR pairs

python
st.pl.lr_n_spots(a, n_top=50, figsize=(11, 3),
                    max_text=100)
st.pl.lr_n_spots(a, n_top=500, figsize=(11, 3),
                    max_text=100)

Predict Significant Interacting Cell Types

python
st.tl.cci.run_cci(a, cluster_name, # Spot cell information either in data.obs or data.uns
                  min_spots=3, # Minimum number of spots for LR to be tested.
                  sig_spots=True, # Only consider neighbourhoods of spots which had significant LR scores.
                  n_perms=1000, # Permutations of cell information to get background, recommend ~1000
                  n_cpus=16
                 )

Diagnostic Plot: Check correlation between LR interaction and cell counts of cell types

If permutations are sufficient, the charts below should show little or no correlation; otherwise, increase n_perms.
X-axis represents cell types
Bar chart left Y-axis represents cell type count
Line chart right Y-axis represents number of interacting LRs for cell type

python
st.pl.cci_check(a, cluster_name)

CCI Network Plot

Displays interaction plot for corresponding LR pair, plotted via per_lr_cci_* results

python
# Visualising the no. of interactions between cell types across all LR pairs #
pos_1 = st.pl.ccinet_plot(a, cluster_name, return_pos=True)

# Just examining the cell type interactions between selected pairs #

if lr_pair == None:
    lr_pair=None
else:
    lr_pair=lr_pair.strip.split(",")

if lr_pair is not None:
    for best_lr in lr_pair:
            st.pl.ccinet_plot(a, cluster_name, best_lr, min_counts=2,
                              figsize=(10,7.5), pos=pos_1
                             )
else:
    lr_pair = a.uns['lr_summary'].index.values[0:3]
    for best_lr in lr_pair[0:3]:
        st.pl.ccinet_plot(a, cluster_name, best_lr, min_counts=2,
                          figsize=(10,7.5), pos=pos_1
                         )

CCI Chord Diagram

The figure shows the interaction chord diagram for corresponding ligand-receptor pairs

python
st.pl.lr_chord_plot(a, cluster_name)
for lr in lr_pair:
    st.pl.lr_chord_plot(a, cluster_name, lr)

CCI Heatmap

Displays interaction heatmap for corresponding LR pairs. X-axis: interacting cell types, Y-axis: interacting LRs. Adjust count via n_top_lrs and n_top_ccis, or specify LR pairs

python
st.pl.lr_cci_map(a, cluster_name, lrs=None, min_total=100, figsize=(20,4))
python
st.pl.lr_cci_map(a, cluster_name, lrs=lr_pair, min_total=100, figsize=(20,4))

Displays interaction heatmap for corresponding LR pairs. X and Y axes display cell types. Can specify displayed LR pairs

python
st.pl.cci_map(a, cluster_name)

lr_pair = a.uns['lr_summary'].index.values[0:3]
for lr in lr_pair[0:3]:
    st.pl.cci_map(a, cluster_name, lr)

Literature Case Analysis

  • Literature 1:
    Literature 'Spatially organized tumor-stroma boundary determines the efficacy of immunotherapy in colorectal cancer patients' used stlearn to reveal cell type interactions at the tumor-stroma boundary.

References

[1] Pham, D., Tan, X., Balderson, B. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat Commun 14, 7739 (2023). [2] Feng, Y., Ma, W., Zang, Y. et al. Spatially organized tumor-stroma boundary determines the efficacy of immunotherapy in colorectal cancer patients. Nat Commun 15, 10259 (2024).

0 comments·0 replies