Why Subscribe?
✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube
✅ No fluff, just deep insights and working code examples
✅ Trusted by grad students, postdocs, and biotech professionals
✅ 100% free
That UMAP Plot Might Be Lying to You—Here’s What You Need to Know
Published 4 days ago • 2 min read
Hello Bioinformatics lovers,
Tommy here. It was a crazy week: I had two presentations.
Enough life lessons. (I have started writing life lessons beyond my bioinformatics posts; ignore them if you only like bioinformatics!)
Today, let’s talk about how dimensionality reduction (PCA, t-SNE, UMAP) can shape the way we see single-cell data… and how it can also mislead us.
How to Read Your Single-Cell Plots Without Fooling Yourself
That UMAP plot you’re staring at? It might not be telling you the truth.
Let’s start with a meme-worthy idea: Same hand. Different angle. Completely different interpretation. That’s exactly what happens when you project high-dimensional data (like thousands of genes per cell) into 2D.
You’re changing perspective. And that changes meaning.
In single-cell RNA-seq, we work with thousands of features. You can’t plot a 20,000-dimensional space. So we reduce it.
PCA. t-SNE. UMAP. Each method gives a different story. Let’s break them down.
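PCA is linear. The axes (principal components) are ranked by the variance they explain. Distances between cells mean something: PCA only rotates the centered data, so distances across the components you keep approximate the distances in the original gene space. If two cells are far apart along the top PCs, their expression profiles really do differ along the main axes of variation.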
But PCA is blind to nonlinear relationships.
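To make "ranked by variance" concrete, here is a minimal Python sketch that runs PCA with plain NumPy on made-up data. The toy matrix, the `pca` helper, and the size of the group shift are my assumptions for illustration only; on real data you would normally rely on the PCA step in your single-cell toolkit, applied to normalized expression values.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD on a cells x genes matrix (rows are cells)."""
    Xc = X - X.mean(axis=0)                        # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # cell coordinates on the PCs
    var_ratio = (S ** 2) / np.sum(S ** 2)            # share of variance per PC
    return scores, var_ratio[:n_components]

rng = np.random.default_rng(0)

# Toy data (assumption): 200 "cells" x 50 "genes",
# with the first 100 cells shifted upward in the first 10 genes.
X = rng.normal(size=(200, 50))
X[:100, :10] += 3.0

scores, var_ratio = pca(X, n_components=5)
print("variance explained by PC1..PC5:", np.round(var_ratio, 3))

# Distance between two cells: full gene space vs. the top 5 PCs.
# The PC distance is a projection, so it can never exceed the full distance.
d_full = np.linalg.norm(X[0] - X[150])
d_pc = np.linalg.norm(scores[0] - scores[150])
print("full-space distance:", round(d_full, 2), "| top-5-PC distance:", round(d_pc, 2))
```

The variance shares come out in decreasing order, and PC1 lines up with the group shift. That is the sense in which PCA axes and PCA distances are interpretable.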
t-SNE and UMAP are nonlinear. They preserve local structure. If two cells are near each other in high-dimensional space, t-SNE or UMAP will try to keep them close in 2D.
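But the global distances? Not reliable. The spacing between clusters depends on parameters (perplexity for t-SNE, n_neighbors and min_dist for UMAP) and on the random initialization. Two clusters may look far apart… but they may not be.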
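One way to see the local-versus-global point is to score an embedding on both. The sketch below is a toy illustration in Python with NumPy, not real UMAP or t-SNE output: I use the first two PCs as a stand-in 2D embedding, then push one cluster far away by hand to mimic a layout where the gap between clusters is arbitrary. The helper names (`knn_overlap`, `global_distance_agreement`) and all the numbers are my own assumptions.

```python
import numpy as np

def pairwise_dist(A):
    """Euclidean distance matrix between the rows of A."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.sqrt(np.maximum(d2, 0.0))

def knn_overlap(high, low, k=10):
    """Mean fraction of each cell's k nearest neighbors in `high`
    that are also among its k nearest neighbors in `low` (local structure)."""
    dh, dl = pairwise_dist(high), pairwise_dist(low)
    np.fill_diagonal(dh, np.inf)   # a cell is not its own neighbor
    np.fill_diagonal(dl, np.inf)
    nh = np.argsort(dh, axis=1)[:, :k]
    nl = np.argsort(dl, axis=1)[:, :k]
    shared = [len(set(a) & set(b)) for a, b in zip(nh, nl)]
    return float(np.mean(shared)) / k

def global_distance_agreement(high, low):
    """Pearson correlation between all pairwise distances in the two spaces
    (a rough score for global structure)."""
    iu = np.triu_indices(high.shape[0], k=1)
    return float(np.corrcoef(pairwise_dist(high)[iu], pairwise_dist(low)[iu])[0, 1])

rng = np.random.default_rng(1)

# Toy data (assumption): 3 well-separated clusters of 60 "cells" in 30 dimensions.
labels = np.repeat([0, 1, 2], 60)
centers = np.zeros((3, 30))
centers[1, 0] = 20.0
centers[2, 1] = 20.0
high = centers[labels] + rng.normal(size=(180, 30))

# Stand-in 2D embedding: the first two principal components.
Xc = high - high.mean(axis=0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
emb = U[:, :2] * S[:2]

# Same embedding, but cluster 2 is pushed far away on the x-axis.
emb_moved = emb.copy()
emb_moved[labels == 2, 0] += 100.0

local_a = knn_overlap(high, emb)
local_b = knn_overlap(high, emb_moved)
global_a = global_distance_agreement(high, emb)
global_b = global_distance_agreement(high, emb_moved)

print("kNN overlap, original vs. moved:", round(local_a, 3), round(local_b, 3))
print("global agreement, original vs. moved:", round(global_a, 3), round(global_b, 3))
```

The neighbor score barely changes when the cluster is moved, while the global score drops. A plot can keep every cell's neighbors intact and still show a gap that says nothing about how different the clusters are in gene space. To run the same check on your own data, pass your PCA matrix as `high` and your UMAP or t-SNE coordinates as `low`.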
So why do we still use UMAP and t-SNE?
Because they shine at visualizing local structure:
Rare cell types
Developmental trajectories
Branching lineages
They help you see potential patterns—but not measure them.
Here’s the danger: Over-interpreting the 2D plot.
Don’t assume cluster distance means biological difference.
Don’t trust that gap or island unless you validate it.
Don’t let your eye trick your mind.
What to do instead?
Use PCA when you need interpretability (note: in standard workflows, the cell clustering step runs on the PCA coordinates, not on the UMAP/t-SNE coordinates)
Use UMAP/t-SNE to visualize the possible structure
Cross-validate with marker genes, expression heatmaps, or statistical tests
Always ask: “Does this make biological sense?”
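As one example of the "statistical tests" item above, here is a small Python sketch of a permutation test that asks whether a gene is really higher in one cluster than in the rest of the cells. Everything in it is an assumption for illustration: the simulated expression values, the cluster assignment, and the `permutation_test` helper. In practice you would more likely use the differential expression tools in your single-cell framework; this only shows the logic.

```python
import numpy as np

def permutation_test(expr, in_cluster, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean expression
    between cells inside and outside a cluster."""
    rng = np.random.default_rng(seed)
    observed = expr[in_cluster].mean() - expr[~in_cluster].mean()
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(in_cluster)   # shuffle the cluster labels
        diff = expr[shuffled].mean() - expr[~shuffled].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

rng = np.random.default_rng(42)

# Toy data (assumption): 200 cells, the first 50 belong to the cluster of interest.
in_cluster = np.zeros(200, dtype=bool)
in_cluster[:50] = True

# A simulated marker gene: higher counts inside the cluster, then log1p-transformed.
marker = np.log1p(np.where(in_cluster, rng.poisson(5.0, 200), rng.poisson(1.0, 200)))
# A simulated unrelated gene: same distribution in every cell.
unrelated = np.log1p(rng.poisson(2.0, 200))

diff_marker, p_marker = permutation_test(marker, in_cluster)
diff_null, p_null = permutation_test(unrelated, in_cluster)

print("marker gene:    mean difference", round(diff_marker, 3), "p =", round(p_marker, 4))
print("unrelated gene: mean difference", round(diff_null, 3), "p =", round(p_null, 4))
```

If a cluster that looks like a separate island on the UMAP has no gene that passes a check like this, treat the island with suspicion.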
Final thought: Dimensionality reduction is like a sketch—it simplifies reality. But you are the one who gives it meaning.
That UMAP plot? It’s not your conclusion. It’s your starting point.
Now go question everything.
Let me know: What’s the biggest mistake you’ve seen in interpreting single-cell plots? Reply and share your story. I read every message.
Happy Learning!
Tommy aka crazyhottommy
Other posts you may find useful
H3K4me3 is a post-transcriptional histone mark | bioRxiv
Want to reorder bars in ggplot2? Here is a breakdown.
Survival analysis is a critical skill for bioinformaticians. But be careful with interpretation.
When should you use a package vs. solving from scratch in bioinformatics?
Looking for the best bioinformatics newsletter? Welcome to chatomics, a practical, beginner-friendly, and coding-focused newsletter read by thousands of scientists and data enthusiasts worldwide.