I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computation experience. I will help you to learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
Hello Bioinformatics lovers, I have enabled the RSS feed for my blog (https://divingintogeneticsandgenomics.com/#posts), so whenever I publish a new blog post, it will be sent to your email. This is different from my weekly Saturday newsletter. My blog posts are mostly technical tutorials. This is a new update from my blog:

Exploring Spatial Transcriptomics A Dive into Visium Data Analysis in Python

Published on May 3, 2025

Why guest posting?

I want to write more hands-on tutorials, but I realized:
So that’s why I started to experiment with guest posting! If you want to write a guest post for my blog, which gets 30k views per month, feel free to contact me on LinkedIn. I love to collaborate and share knowledge! This is the first ever guest post on my blog.

Author: Angel Galvez Merchan
Angel is going to introduce the basics of spatial transcriptomics data analysis with a Visium dataset using Python. I love how he introduces spatial data by building on his previous experience with single-cell data, and he explains the details in a very accessible way. Enjoy!

Overview of Spatial Transcriptomics Technologies

Spatial transcriptomics is transforming our understanding of tissue biology by enabling researchers to measure gene expression within the spatial context of intact tissues. Unlike traditional single-cell RNA sequencing (scRNA-seq), which dissociates cells and loses spatial information, spatial transcriptomics retains the physical location of gene expression, opening up powerful insights into tissue architecture, cellular niches, and microenvironments in health and disease. Several technologies have emerged in recent years, each with distinct approaches, strengths, and trade-offs:
Why Focus on Visium?

In this blog post, we will focus on 10x Genomics Visium because it provides a particularly smooth transition into spatial transcriptomics for those already familiar with scRNA-seq. Its data structure and analysis workflow align closely with standard scRNA-seq practices, allowing researchers to reuse tools like In fact, this was how I personally began exploring spatial transcriptomics: by extending familiar scRNA-seq workflows into spatial data, and gradually discovering the rich new layers it adds to biological interpretation. I hope this approach helps you make the transition as smooth and enjoyable as it was for me.

Link to Google Colab: https://colab.research.google.com/drive/1nrnBi1LsigG3YEDEJTg5xGwq6c3ye4X3?usp=sharing

Imports and installs
Read data

Let’s kick things off by loading a real Visium dataset. For this tutorial, we will use the “V1_Human_Lymph_Node” sample provided by 10x Genomics, which is conveniently available through This dataset comes from a human lymph node section and includes gene expression data, spatial coordinates, and a histology image.
Understanding the Visium data structure

Let’s inspect the adata object:
At first glance, the Visium dataset looks a lot like a standard scRNA-seq dataset. And that is because they share the same foundation: a gene count matrix. In the AnnData object, observations (in this case, spatial spots) are stored as rows and genes as columns. But Visium adds two extra layers of spatial context:
These extra layers turn a familiar data object into something more informative, adding new dimensions that change how we explore and interpret the data.

Shared foundation: the Gene Count Matrix

This matrix is functionally very similar to what you would find in scRNA-seq workflows. Each spot (a small region of tissue) acts like a pseudo-cell, and each entry in the matrix represents the expression level of a gene in that spot. Because of this similarity, many pre-processing steps (depth-normalization, log transformation, dimensionality reduction, clustering, etc.) can be performed using the same tools and techniques used for scRNA-seq.

QC metrics

Most QC metrics used in scRNA-seq can be translated to Visium data. We can use the same function in
Filters

Just like in single-cell RNA-seq, it makes sense to filter out spots with very few detected genes, as these are often low-quality or empty regions.
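To make the QC and filtering logic concrete, here is a minimal NumPy sketch on a toy count matrix. It is an illustration under assumptions, not the tutorial's original code: the matrix is simulated, spots are rows and genes are columns, and the `min_genes` threshold is a hypothetical value chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy count matrix: 6 spots (rows) x 5 genes (columns). Simulated data.
counts = rng.poisson(lam=2.0, size=(6, 5))
counts[0, :] = 0                 # an "empty" spot with no transcripts
counts[1, :] = [1, 0, 0, 0, 0]   # a spot with a single detected gene

# Per-spot QC metrics.
total_counts = counts.sum(axis=1)              # total transcripts per spot
n_genes_by_counts = (counts > 0).sum(axis=1)   # number of detected genes per spot

# Keep spots with at least `min_genes` detected genes (illustrative threshold).
min_genes = 2
keep = n_genes_by_counts >= min_genes
filtered = counts[keep]

print(f"kept {int(keep.sum())} of {len(keep)} spots")
```

On real data, the threshold is usually picked after looking at the distribution of detected genes per spot rather than fixed in advance.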
Normalization

In most scRNA-seq workflows, we perform cell-depth normalization to correct for differences in sequencing depth or capture efficiency, followed by a log transformation to stabilize variance. However, there is an important caveat:

Visium != single cells

Although Visium data looks like single-cell data (a gene-by-spot matrix), each spot captures transcripts from a multi-cellular region of the tissue, making it more akin to a series of bulk RNA-seq samples with spatial information. That difference matters, especially when thinking about how to normalize the data. In spatial transcriptomics, the total number of transcripts in a spot might actually reflect meaningful biology, like differences in tissue density, cellularity, or metabolic activity. Let’s visualize the total counts per spot:
You might notice that total counts show some spatial structure and aren’t randomly distributed across the tissue. In some cases, these patterns may align with known tissue morphology, hinting at potential biological relevance. This opens up the possibility that total counts may reflect meaningful differences, like local cell density or transcriptional activity.

So… what should we do?

There is no single best answer right now. How to normalize spatial transcriptomics data is still an open question in the field, and different approaches may be better suited for different downstream analyses. Sometimes, the spatial structure you see in total counts may reflect real biology, but other times, it could stem from technical artifacts, such as differences in how well certain regions of the slide captured transcripts. Some areas might just perform better than others in terms of RNA capture, leading to apparent “hotspots” or “cold spots” that aren’t necessarily biologically meaningful. For the purpose of this tutorial, we will keep things simple and apply the standard scRNA-seq normalization approach. But it is important to be aware of the assumptions behind this method, and the limitations it brings when working with spatial data.
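As a rough illustration of what depth normalization followed by a log transformation does, here is a small NumPy sketch. The function name `normalize_log1p` and the `target_sum` of 10,000 are assumptions made for this example, and the counts are made up.

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    """Scale each spot (row) to the same total, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0          # avoid dividing by zero for empty spots
    scaled = counts / totals * target_sum
    return np.log1p(scaled)

# Two spots with the same gene proportions but a 10x difference in depth,
# plus one empty spot.
counts = np.array([[10, 0, 30],
                   [1, 0, 3],
                   [0, 0, 0]])
lognorm = normalize_log1p(counts)
print(lognorm.round(2))
```

The first two spots end up with identical values after normalization. That is exactly the assumption to keep in mind: any biological signal carried by the total counts is removed.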
This gives us a log-normalized expression matrix, similar to what we would use in single-cell workflows. Just keep in mind that we are treating total counts as technical noise, which may not always be true.

Highly variable genes

As in scRNA-seq, we can select highly variable genes (HVGs) to focus on the most informative features for downstream analysis:
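The idea behind HVG selection can be sketched in a few lines. Note that this is a deliberately simplified stand-in: it ranks genes by plain variance of the log-normalized values, whereas single-cell toolkits typically correct for the mean-variance relationship. The helper `top_variable_genes` and the simulated matrix are hypothetical.

```python
import numpy as np

def top_variable_genes(lognorm, n_top):
    """Return the column indices of the n_top genes with the highest variance."""
    variances = lognorm.var(axis=0)
    order = np.argsort(variances)[::-1]   # most variable first
    return np.sort(order[:n_top])

rng = np.random.default_rng(1)

# Simulated log-normalized matrix: 50 spots x 8 genes with little variation.
lognorm = rng.normal(loc=1.0, scale=0.1, size=(50, 8))
lognorm[:, 2] += np.repeat([0.0, 3.0], 25)   # gene 2: high in half of the spots
lognorm[:, 5] += np.linspace(0.0, 4.0, 50)   # gene 5: smooth gradient across spots

hvg_idx = top_variable_genes(lognorm, n_top=2)
print(hvg_idx)
```

Only the two genes with real structure are selected, while the near-constant genes are dropped.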
Dimensionality reduction and visualization

You can apply the same dimensionality reduction and visualization pipeline used in scRNA-seq. After normalization and HVG selection, standard steps like PCA, nearest neighbor graph construction, Leiden clustering and UMAP can be used to project the data into a low-dimensional space:
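To show what the first step of that pipeline computes, here is a minimal PCA sketch based on a singular value decomposition in NumPy. It covers PCA only (not the neighbor graph, Leiden, or UMAP), the helper `pca_scores` is a hypothetical name, and the input is simulated.

```python
import numpy as np

def pca_scores(X, n_comps):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comps] * S[:n_comps]         # spot coordinates in PC space
    explained = (S ** 2) / (S ** 2).sum()         # fraction of variance per PC
    return scores, explained[:n_comps]

rng = np.random.default_rng(2)

# Simulated matrix: 100 spots x 20 genes, with one gene dominating the variance.
X = rng.normal(size=(100, 20))
X[:, 0] *= 10

scores, var_ratio = pca_scores(X, n_comps=10)
print(scores.shape, var_ratio.round(3))
```

The resulting `scores` matrix (spots by components) is the kind of compact representation that neighbor graphs and clustering are then built on.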
You can try to identify cell types in Visium data using the same approach as in single-cell RNA-seq: by plotting known marker genes. For example, to explore B cell populations, we can visualize expression of a few canonical B cell markers:
This brings us back to the key difference we mentioned earlier: each dot in the UMAP represents a spot, not an individual cell. Unlike in scRNA-seq, we don’t always see crisp clusters corresponding to distinct cell types. Instead, marker gene expression often appears broadly distributed, reflecting the fact that many spots likely contain mixtures of cell types. In the case of the lymph node, where B cells are abundant, this result makes sense. It is somewhat expected that most spots contain at least some B cell transcript signal.

The Unique Layers of Spatial Transcriptomics

So far, we have worked with the gene count matrix, a structure that closely mirrors scRNA-seq. Now, let’s look at what makes spatial transcriptomics different: the additional layers of spatial context.

Histology Image

Alongside the gene expression data, Visium also provides a histology image of the tissue section. This image is stored within the
We can visualize the tissue image using:
Calculating features in image

Tissue images can contain one or more channels. For example, fluorescence-based data might include separate channels for different markers. Even in standard histology images like H&E, we can extract useful information by calculating image-based features. These features might include pixel intensity, texture, or structural patterns associated with different tissue regions or staining types. While in this example the image isn’t rich in contrast and has no multi-channel content, we will still use it to demonstrate how image features can be extracted and used in spatial analyses.
These features are computed using a circular region around each spot on the tissue image, providing a local summary of the image content beneath each capture area.
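The idea of summarizing the image under each spot can be sketched with plain NumPy. This is an illustration under assumptions: the image is a synthetic single-channel array, the spot centers and radius are made up, and `mean_intensity_under_spots` is a hypothetical helper rather than a library function.

```python
import numpy as np

def mean_intensity_under_spots(image, centers, radius):
    """Mean pixel intensity inside a circular mask around each (x, y) center."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]          # pixel row (y) and column (x) indices
    means = []
    for cx, cy in centers:
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
        means.append(image[mask].mean())
    return np.array(means)

# Synthetic 40 x 40 grayscale image: dark left half, bright right half.
image = np.zeros((40, 40))
image[:, 20:] = 1.0

# Two spot centers given as (x, y): one on the dark side, one on the bright side.
centers = np.array([[8, 20], [30, 20]])
feat = mean_intensity_under_spots(image, centers, radius=5)
print(feat)
```

Each spot gets one number per feature, so the result can be treated like any other per-spot table.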
This plot represents the mean pixel intensity from the image under each spot, giving us a rough idea of local brightness across the tissue.

Clustering on image features

Since the image features are numerical values (just like gene expression), we can use them for clustering. This allows us to group spots based on similarities in their local tissue appearance.
Spatial Coordinates

Each spot in Visium data comes with associated (x, y) coordinates that indicate its physical location on the tissue slide. These spatial coordinates allow us to map gene expression data back onto the tissue’s layout. You can access them in
This returns an array with the x and y positions (in pixels) for each spot on the slide.

Building a Spatial Neighborhood Graph

To incorporate spatial relationships into our analysis, we can build a spatial neighborhood graph. This graph defines which spots are considered neighbors based on their physical proximity on the slide. With
The resulting graph is stored in
We can also visualize the graph to see how spots are linked together.
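For intuition, a spatial neighborhood graph can be built by hand from the coordinates alone. The sketch below uses a k-nearest-neighbor rule in NumPy, which is one possible definition of "neighbor" and an assumption of this example. The coordinates are made up and `knn_adjacency` is a hypothetical helper.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Symmetric 0/1 adjacency matrix linking each spot to its k nearest spots."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)               # a spot is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]         # indices of the k closest spots
    n = coords.shape[0]
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nn.ravel()] = 1
    return np.maximum(adj, adj.T)                # make the graph undirected

# Four toy spots along a line, the last one far away from the others.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0]])
adj = knn_adjacency(coords, k=1)
print(adj)
```

Even the isolated spot is linked to its closest neighbor here, which is a property of the k-nearest-neighbor rule; a distance cutoff would behave differently.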
Integrating the Layers of Spatial Transcriptomics

So far, we’ve explored the core components of spatial transcriptomics: the gene count matrix, the histology image, and the spatial coordinates. Each of these layers is valuable on its own, but the real power of spatial transcriptomics emerges when we combine them. By integrating gene expression with spatial context and image-derived features, we can uncover patterns that would be invisible in standard scRNA-seq. This is where spatial transcriptomics moves beyond simply measuring gene expression. It starts to reveal how cells are organized, how they interact, and how structure relates to function. In the following sections, we will explore a few ways to combine these layers for spatially aware analysis.

Gene expression features observed on tissue

One of the most powerful aspects of spatial transcriptomics is the ability to visualize gene expression directly on the tissue. This allows us to observe how specific genes are spatially distributed and how they relate to tissue structure. For example, we can visualize expression of the T cell marker CD3E:
We can see that CD3E expression is broadly distributed across the tissue, with some localized areas of higher expression. This pattern likely reflects regions with enriched T cell presence.

Spatial Mapping of Gene Expression Clusters

We can visualize the Leiden clusters computed from the gene expression data directly on the tissue. This allows us to see whether spots that are transcriptionally similar are also spatially close to one another and whether distinct expression programs correspond to specific tissue regions.
Cluster 9 stands out as a spatially localized group of spots. To understand what characterizes this cluster, we can examine its marker genes:
The results indicate that cluster 9 is enriched for interferon-induced genes, suggesting that this region of the tissue is actively responding to interferon signaling. To further support this, we can visualize the expression of additional known interferon-stimulated genes that were not among the top-ranked markers.
Finding Spatially Variable Genes

In addition to identifying genes that vary across transcriptional clusters, we can also look for genes that show spatial structure, that is, genes whose expression levels are non-randomly distributed across the tissue. One way to quantify this is with Moran’s I, a measure of spatial autocorrelation. Genes with high Moran’s I values tend to be expressed in spatially coherent patterns, rather than scattered randomly across spots. We can compute this using:
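To make the statistic itself concrete, here is a minimal NumPy sketch of Moran’s I for a single gene, given a spot-by-spot weight matrix. It follows the textbook definition and uses the weights as given, which may differ in detail from what a spatial analysis library does internally. The toy graph and expression vectors are made up.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I of values x under a spatial weight matrix w (n x n)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                       # deviations from the mean
    num = (w * np.outer(z, z)).sum()       # weighted cross-products of neighbors
    den = (z ** 2).sum()
    return (len(x) / w.sum()) * num / den

# Toy neighborhood graph: 6 spots in a chain, each linked to the next one.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = np.array([0, 0, 0, 1, 1, 1])     # expression grouped in space
alternating = np.array([0, 1, 0, 1, 0, 1])   # neighbors always differ

print(morans_i(clustered, w), morans_i(alternating, w))
```

The spatially grouped pattern gives a positive value, while the alternating pattern gives a negative one, matching the interpretation of high Moran’s I as spatial coherence.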
Note that this function uses the spatial neighborhood graph that we computed in previous sections. We can check the top spatially structured genes:
And we can visualize some of them, confirming their spatial structure:
Segmenting the Tissue into Spatial Domains

A key goal in spatial transcriptomics is to define tissue regions that reflect both molecular identity and physical organization. In contrast to clustering based purely on gene expression, spatial domain identification focuses on grouping spots that are transcriptionally similar and spatially close, capturing functional zones within the tissue. There are many sophisticated methods for identifying spatial domains including concordex, STAGATE, and others. Working with these tools and deeply exploring spatial domain detection could easily be a blog post (or several) on its own. For now, we will use a simple but intuitive approach that captures the essence of spatial domain identification. We take the principal components (PCs) from the PCA of the gene count matrix (a compact summary of transcriptional variation) and concatenate them with the spatial coordinates of each spot. This creates a feature space that incorporates both molecular identity and physical location. While this method lacks the complexity of more specialized spatial domain tools, it is surprisingly effective at revealing spatially coherent, transcriptionally distinct regions. It also serves as a great way to build intuition about how gene expression and tissue architecture align.
But wait, something looks off. After running clustering on the combined PCA and spatial coordinates, we get a very clean segmentation of the tissue, but the regions look suspiciously like purely spatial clusters. If we inspect the inputs to our combined matrix, we can see the issue. The PC values look like this:
But the spatial coordinates are on a completely different scale:
Because the spatial coordinates have much larger magnitudes, they dominate the clustering, overpowering the transcriptional signal. The result is clustering that is driven almost entirely by physical location. To fix this, we simply need to standardize both the PCA and spatial features so they contribute equally:
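The effect of the scale mismatch, and of standardizing, can be checked numerically. The sketch below uses simulated PCs and pixel coordinates (the magnitudes are assumptions chosen to mimic the situation described above) and hypothetical helpers `zscore` and `spatial_share`.

```python
import numpy as np

def zscore(a):
    """Standardize each column to zero mean and unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def spatial_share(m):
    """Fraction of total variance carried by the last two (spatial) columns."""
    v = m.var(axis=0)
    return v[-2:].sum() / v.sum()

rng = np.random.default_rng(3)

# Simulated inputs: 10 PCs with small values, pixel coordinates in the thousands.
pcs = rng.normal(scale=5.0, size=(200, 10))
coords = rng.uniform(2000, 9000, size=(200, 2))

combined_raw = np.hstack([pcs, coords])
combined_scaled = np.hstack([zscore(pcs), zscore(coords)])

print(round(spatial_share(combined_raw), 4), round(spatial_share(combined_scaled), 4))
```

Before standardization, nearly all of the variance (and therefore nearly all of the distance between spots) comes from the two coordinate columns. After standardization, every column has unit variance.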
We now get regions that reflect a balance of spatial coherence and transcriptional similarity, a much more meaningful segmentation of the tissue.

Here is a question for you: What happens if you use 20 principal components instead of 10? What do you think might change? I will leave this as an exercise for the reader. Take a moment to guess before running the code below!
Conclusions

In this tutorial, we explored the foundational concepts of spatial transcriptomics using 10x Visium data, with a perspective rooted in scRNA-seq workflows. Beginning with the familiar gene count matrix, we examined how spatial data builds on this structure by introducing additional layers, such as spatial coordinates and histology images. This tutorial serves as an entry point into the field. While there is much more to explore (from spatial domain modeling to cell-cell interaction analysis), we hope it has provided a solid foundation and demonstrated how spatial transcriptomics extends naturally from single-cell analysis, while offering entirely new opportunities for discovery. Happy exploring!

Notes from Tommy

We developed monkeybread (https://monkeybread.readthedocs.io/en/latest/index.html) when I was at Immunitas. We use a table of the cell type counts around each cell within 50 µm and cluster that matrix to find cellular niches, and we have heard good things from users. Tutorial here: https://monkeybread.readthedocs.io/en/latest/notebooks/tutorial.html

Happy Learning!

Tommy aka. crazyhottommy