That UMAP Plot Might Be Lying to You—Here’s What You Need to Know

Published 28 days ago • 2 min read

Hello Bioinformatics lovers,

Tommy here. It was a crazy week: I had two presentations.

I am always nervous when speaking in public.

Then, I realized pressure is always there, no matter what stage of life you are in.

*I rehearsed 10 times before my PhD defense*

Enough life lessons. (I have started adding life lessons to my bioinformatics posts; skip them if you are only here for the bioinformatics!)

Today, let’s talk about how dimensionality reduction—PCA, t-SNE, UMAP—can shape the way we see single-cell data… and how it can also mislead us.

How to Read Your Single-Cell Plots Without Fooling Yourself

That UMAP plot you’re staring at?
It might not be telling you the truth.

Let’s start with a meme-worthy idea:
Same hand. Different angle. Completely different interpretation.
That’s exactly what happens when you project high-dimensional data (like thousands of genes per cell) into 2D.

You’re changing perspective. And that changes meaning.

In single-cell RNA-seq, we work with thousands of features.
You can’t plot a 20,000-dimensional space.
So we reduce it.

PCA.
t-SNE.
UMAP.
Each method gives a different story.
Let’s break them down.

PCA is linear.
The axes are ranked by variance.
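Distances between cells mean something.
PCA is a projection, so it can only shrink distances, never inflate them.
If two cells are far apart in PCA space, they are at least that far apart in the full (scaled) expression space, and their global gene expression patterns really are different.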

But PCA is blind to nonlinear relationships.
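
To make the "distances mean something" point concrete, here is a minimal sketch, assuming a purely synthetic cells-by-genes matrix (the sizes, the two groups, and the helper `pairwise_dist` are all made up for illustration). It runs PCA via SVD with NumPy and compares cell-to-cell distances before and after the projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "expression matrix": 60 cells x 50 genes, two groups of cells that
# differ in the first 10 genes. Purely synthetic, for illustration only.
n_cells, n_genes = 60, 50
X = rng.normal(size=(n_cells, n_genes))
X[30:, :10] += 3.0

# PCA by SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                    # cell coordinates on each PC
explained = S**2 / np.sum(S**2)   # fraction of variance per PC, largest first

def pairwise_dist(A):
    """Euclidean distance between every pair of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

d_full = pairwise_dist(Xc)              # distances in the original gene space
d_all_pcs = pairwise_dist(scores)       # distances using every PC
d_top2 = pairwise_dist(scores[:, :2])   # distances using only PC1 and PC2

print("variance explained by PC1..PC3:", np.round(explained[:3], 3))
print("max |d_full - d_all_pcs|:", np.abs(d_full - d_all_pcs).max())
print("corr(d_full, d_top2):", np.corrcoef(d_full.ravel(), d_top2.ravel())[0, 1])
```

With every PC kept, PCA is just a rotation, so the distances match the original space. With only the top PCs, distances can shrink but never grow, which is why a large distance in a PCA plot is real.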

t-SNE and UMAP are nonlinear.
They preserve local structure.
If two cells are near each other in high-dimensional space, t-SNE or UMAP will try to keep them close in 2D.

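But the global distances?
Not reliable.
Cluster sizes and the gaps between clusters shift with parameters such as perplexity (t-SNE) or n_neighbors (UMAP), and with the random seed.
Two clusters may look far apart… but they may not be.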
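
You can check this yourself. Below is a rough sketch, assuming scikit-learn is installed and using three made-up "cell types" (the data, the cluster centers, and the helper `centroid_ratio` are all hypothetical). In the original space, cluster C sits about ten times farther from A than B does. The script prints that same ratio after t-SNE under a few settings so you can see how much of it survives.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)

# Three synthetic "cell types" in 20 dimensions. A and B are close together;
# C is ten times farther from A than B is.
n_per, n_dim = 50, 20
centers = np.zeros((3, n_dim))
centers[1, 0] = 5.0    # B
centers[2, 0] = 50.0   # C
X = np.vstack([c + rng.normal(size=(n_per, n_dim)) for c in centers])
labels = np.repeat([0, 1, 2], n_per)

def centroid_ratio(coords):
    """Ratio of the A-to-C centroid distance to the A-to-B centroid distance."""
    cents = np.array([coords[labels == k].mean(axis=0) for k in range(3)])
    d_ab = np.linalg.norm(cents[0] - cents[1])
    d_ac = np.linalg.norm(cents[0] - cents[2])
    return d_ac / d_ab

ratio_original = centroid_ratio(X)
print(f"original space: A-C / A-B = {ratio_original:.1f}")

# Same data, different perplexity and seed: compare how the ratio moves.
for perplexity, seed in [(10, 0), (30, 0), (30, 1)]:
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=seed).fit_transform(X)
    print(f"t-SNE perplexity={perplexity} seed={seed}: "
          f"A-C / A-B = {centroid_ratio(emb):.1f}")
```

If the printed ratios differ from the original one, or from each other, that is the point: the same cells, the same biology, a different picture. The same experiment should work with UMAP coordinates in place of `emb`.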

So why do we still use UMAP and t-SNE?

Because they shine at visualizing local structure:

Rare cell types
Developmental trajectories
Branching lineages

They help you see potential patterns—but not measure them.

Here’s the danger:
Over-interpreting the 2D plot.

Don’t assume cluster distance means biological difference.
Don’t trust that gap or island unless you validate it.
Don’t let your eye trick your mind.

What to do instead?

Use PCA when you need interpretability (note: in standard workflows, the cell clustering step runs on the PCA coordinates, not on the UMAP)
Use UMAP/t-SNE to visualize the possible structure
Validate clusters with marker genes, expression heatmaps, or statistical tests
Always ask: “Does this make biological sense?”
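
As one way to run the "statistical tests" part of that checklist, here is a minimal sketch, assuming made-up log-expression values for a single hypothetical marker gene in two clusters. It uses a simple permutation test on the difference in means, written with NumPy only. This is an illustration, not the test any particular single-cell package uses.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic log-expression of one hypothetical marker gene in two clusters
# that look well separated on a 2D plot. Values are made up for illustration.
cluster_a = rng.normal(loc=2.0, scale=1.0, size=80)
cluster_b = rng.normal(loc=0.5, scale=1.0, size=120)

def permutation_pvalue(x, y, n_perm=5000, rng=rng):
    """Two-sided permutation test on the difference in means."""
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = abs(perm[: x.size].mean() - perm[x.size :].mean())
        hits += diff >= observed
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

p_real = permutation_pvalue(cluster_a, cluster_b)

# Negative control: split one homogeneous cluster at random. A "gap" on the
# plot between such halves would not be backed by this gene.
shuffled = rng.permutation(cluster_a)
p_null = permutation_pvalue(shuffled[:40], shuffled[40:])

print(f"marker differs between clusters: p = {p_real:.4f}")
print(f"random split of one cluster:     p = {p_null:.4f}")
```

A small p-value for the marker supports the idea that the two clusters differ in that gene. The random split is a sanity check for what "no real difference" looks like.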

Final thought:
Dimensionality reduction is like a sketch—it simplifies reality.
But you are the one who gives it meaning.

That UMAP plot?
It’s not your conclusion.
It’s your starting point.

Now go question everything.

Let me know:
What’s the biggest mistake you’ve seen in interpreting single-cell plots?
Reply and share your story. I read every message.

Happy Learning!

Tommy aka crazyhottommy

Chatomics! — The Bioinformatics Newsletter
