SeekSpace Advanced Analysis: stLearn-based Cell Communication Hotspot Identification
import stlearn as st
import numpy as np
import pandas as pd
# matplotlib version 3.5.3, higher versions conflict with stlearn
import matplotlib.pyplot as plt
import scanpy as sc
import restLearn Cell Communication Module Input Parameter Description
matrix_path: paths to matrix, feature, barcode files
spatial_path: spatial location path
SAMple_name: sample name
image_path: spatial H&E image path
celltype_path: path to annotated cell type file, requires barcode as row name, column name "celltype" (cell type) as column
spot_diameter_fullres: size of cell/spot, related to spatial distance range in analysis steps, optional 50 or 100
species: species, related to lr_database_bool below. If lr_database_bool is True, must be "human" or "mouse"; if False, not required
lr_database_bool: whether to use stlearn's built-in LR database, default True. If False, provide corresponding database in lr_database
lr_database_path: used with lr_database_bool. If lr_database_bool is False, provide LR database path, file content includes ligand_receptor column (with row names, no column names), e.g., ligand_receptor
ncpus: number of threads, affects speed, can be set higher
cluster_name: set column name for cell types in adata object obs
lr_pair: whether to manually set displayed LR pairs. If None, displays top 3 significant pairs; if set, provide pair names separated by comma, "A_B,C_D"
grid_step: whether to draw grids. Recommended True for data with many cells to reduce runtime; False if not used
n_: number of grid divisions for length and width. Required if grid_step is True
matrix_path = ""
spatial_path = ""
sample_name = ""
image_path = ""
celltype_path = ""
spot_diameter_fullres = 50
species = "human"
lr_database_bool = True
lr_database_path = ""
n_cpus = 16
cluster_name = "celltype"
lr_pair = "None"
grid_step = True
n_ = 125Read SeekSpace data and perform normalization, dimensionality reduction, clustering
adata = sc.read_10x_mtx(matrix_path)
spatial = pd.read_csv(spatial_path,sep=",",index_col=0)
spatial = spatial.loc[:,("x","y")]
selected_rows = spatial.loc[spatial.index.isin(adata.obs_names)]
selected_rows.columns = ["imagecol","imagerow"]
selected_rows = selected_rows.reindex(adata.obs_names)
selected_rows = selected_rows*0.265385
a = st.create_stlearn(count=adata.to_df(),spatial=selected_rows,library_id=sample_name, scale=1,image_path=image_path,spot_diameter_fullres=spot_diameter_fullres)
a.layers["raw_count"] = a.X
# Preprocessing
#st.pp.filter_genes(a,min_cells=3)
st.pp.normalize_total(a)
st.pp.log1p(a)
# Keep raw data
a.raw = a
st.pp.scale(a)
st.em.run_pca(a,n_comps=50,random_state=0)
st.pp.neighbors(a,n_neighbors=25,use_rep='X_pca',random_state=0)
st.tl.clustering.louvain(a,random_state=0)
sc.tl.umap(a)Add annotated cell types to adata object
celltype = pd.read_csv(celltype_path,index_col=0)
celltype.index = [re.sub(r'_9$', '', s) for s in celltype.index]
celltype = celltype.loc[a.obs.index]
a.obs[cluster_name] = celltype["celltype"]
a.obs[cluster_name] = a.obs[cluster_name].astype('category')Divide spatial slice into length n_, width n_
if grid_step:
#n_ = 125
print(f'{n_} by {n_} has this many spots:\n', n_*n_)
a = st.tl.cci.grid(a,n_row=n_, n_col=n_, use_label = cluster_name)
print(a.shape )Load LR database, calculate significance of LR expression
if lr_database_bool:
lrs = st.tl.cci.load_lrs(['connectomeDB2020_lit'], species=species)
else:
lrs = pd.read_csv(lr_database_path,names=["0"])
lrs = np.array(list(lrs["0"]))
#lrs = st.tl.cci.load_lrs(['connectomeDB2020_lit'], species=species)
st.tl.cci.run(a, lrs,
min_spots = 20, #Filter out any LR pairs with no scores for less than min_spots
distance=None, # None defaults to spot+immediate neighbours; distance=0 for within-spot mode
n_pairs=100, # Number of random pairs to generate; low as example, recommend ~10,000
n_cpus=8, # Number of CPUs for parallel. If None, detects & use all available.
)Display table of LR expression significance results
Rows: Ligand-Receptor Pairs
Column 1 n_spot: Total number of cells/spots with LR interaction (unfiltered by pval)
Column 2 n_spots_sig: Number of cells/spots with significant LR interaction after corrected pval filtering
Column 3 n_spots_sig_pval: Number of cells/spots with LR interaction filtered by set pval
lr_info = a.uns['lr_summary']
lr_infost.tl.cci.adj_pvals(a, correct_axis='spot',
pval_adj_cutoff=0.05, adj_method='fdr_bh')Diagnostic Plot (based on lrfeatures results)
A key aspect of LR analysis is controlling for expression level and frequency of LRs when determining significant hotspots.
Therefore, diagnostic plots should show little to no correlation between LR pair hotspots and expression levels/frequencies.
The following diagnostic methods help check and confirm this; otherwise, a larger number of Permutations might be needed.
- Left Plot:
X-axis: LR pairs sorted by n_spots_sig
Y-axis: Median expression of LR pairs (non-zero) - Right Plot:
X-axis: LR pairs sorted by n_spots_sig
Y-axis: Proportion of cells with zero expression for LR pair
st.pl.lr_diagnostics(a, figsize=(10,2.5))Plot based on lr_summary results: Showing top 500 and top 50 LR pairs
X-axis represents LR pairs sorted by n_spots_sig
Y-axis represents number of cells/spots with significant LR interaction after corrected pval filtering
st.pl.lr_summary(a, n_top=500)
st.pl.lr_summary(a, n_top=50, figsize=(10,3))Plot based on lr_summary results: Showing top 500 and top 50 LR pairs
X-axis represents LR pairs sorted by n_spots_sig
Y-axis represents total number of cells/spots with LR interaction (unfiltered by pval) Color represents significant and non-significant LR pairs
st.pl.lr_n_spots(a, n_top=50, figsize=(11, 3),
max_text=100)
st.pl.lr_n_spots(a, n_top=500, figsize=(11, 3),
max_text=100)Predict Significant Interacting Cell Types
st.tl.cci.run_cci(a, cluster_name, # Spot cell information either in data.obs or data.uns
min_spots=3, # Minimum number of spots for LR to be tested.
sig_spots=True, # Only consider neighbourhoods of spots which had significant LR scores.
n_perms=1000, # Permutations of cell information to get background, recommend ~1000
n_cpus=16
)Diagnostic Plot: Check correlation between LR interaction and cell counts of cell types
If permutations are sufficient, the charts below should show little or no correlation; otherwise, increase n_perms.
X-axis represents cell types
Bar chart left Y-axis represents cell type count
Line chart right Y-axis represents number of interacting LRs for cell type
st.pl.cci_check(a, cluster_name)CCI Network Plot
Displays interaction plot for corresponding LR pair, plotted via per_lr_cci_* results
# Visualising the no. of interactions between cell types across all LR pairs #
pos_1 = st.pl.ccinet_plot(a, cluster_name, return_pos=True)
# Just examining the cell type interactions between selected pairs #
if lr_pair == None:
lr_pair=None
else:
lr_pair=lr_pair.strip.split(",")
if lr_pair is not None:
for best_lr in lr_pair:
st.pl.ccinet_plot(a, cluster_name, best_lr, min_counts=2,
figsize=(10,7.5), pos=pos_1
)
else:
lr_pair = a.uns['lr_summary'].index.values[0:3]
for best_lr in lr_pair[0:3]:
st.pl.ccinet_plot(a, cluster_name, best_lr, min_counts=2,
figsize=(10,7.5), pos=pos_1
)CCI Chord Diagram
The figure shows the interaction chord diagram for corresponding ligand-receptor pairs
st.pl.lr_chord_plot(a, cluster_name)
for lr in lr_pair:
st.pl.lr_chord_plot(a, cluster_name, lr)CCI Heatmap
Displays interaction heatmap for corresponding LR pairs. X-axis: interacting cell types, Y-axis: interacting LRs. Adjust count via n_top_lrs and n_top_ccis, or specify LR pairs
st.pl.lr_cci_map(a, cluster_name, lrs=None, min_total=100, figsize=(20,4))st.pl.lr_cci_map(a, cluster_name, lrs=lr_pair, min_total=100, figsize=(20,4))Displays interaction heatmap for corresponding LR pairs. X and Y axes display cell types. Can specify displayed LR pairs
st.pl.cci_map(a, cluster_name)
lr_pair = a.uns['lr_summary'].index.values[0:3]
for lr in lr_pair[0:3]:
st.pl.cci_map(a, cluster_name, lr)Literature Case Analysis
- Literature 1:
Literature 'Spatially organized tumor-stroma boundary determines the efficacy of immunotherapy in colorectal cancer patients' used stlearn to reveal cell type interactions at the tumor-stroma boundary.
References
[1] Pham, D., Tan, X., Balderson, B. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat Commun 14, 7739 (2023). [2] Feng, Y., Ma, W., Zang, Y. et al. Spatially organized tumor-stroma boundary determines the efficacy of immunotherapy in colorectal cancer patients. Nat Commun 15, 10259 (2024).
