
Chatomics! — The Bioinformatics Newsletter

That UMAP Plot Might Be Lying to You—Here’s What You Need to Know


Hello Bioinformatics lovers,

Tommy here. It was a crazy week: I gave two presentations.

I am always nervous when speaking in public.

Then, I realized pressure is always there, no matter what stage of life you are in.

Enough life lessons. (I've started writing about life beyond bioinformatics; skip those bits if you're only here for the bioinformatics!)

Today, let’s talk about how dimensionality reduction—PCA, t-SNE, UMAP—can shape the way we see single-cell data… and how it can also mislead us.

How to Read Your Single-Cell Plots Without Fooling Yourself

That UMAP plot you’re staring at?
It might not be telling you the truth.

Let’s start with a meme-worthy idea:
Same hand. Different angle. Completely different interpretation.
That’s exactly what happens when you project high-dimensional data (like thousands of genes per cell) into 2D.

You’re changing perspective. And that changes meaning.
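Here is a toy sketch of the "same hand, different angle" idea, assuming numpy is installed. The data are synthetic: two hypothetical groups of points in 3D that differ only along one axis. From one 2D angle they sit on top of each other, and from another they are clearly apart. The helper `centroid_gap` is an illustration of our own, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical groups of 200 "cells" measured on 3 features.
# They differ only along the third axis (z).
n = 200
group_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(n, 3))
group_b = rng.normal(loc=[0.0, 0.0, 8.0], scale=1.0, size=(n, 3))

def centroid_gap(a, b, axes):
    """Distance between the two group centroids when only `axes` are kept,
    i.e. when the data are viewed from one particular 2D angle."""
    return float(np.linalg.norm(a[:, axes].mean(axis=0) - b[:, axes].mean(axis=0)))

# View 1: keep x and y (look straight down the z axis). The groups overlap.
xy_gap = centroid_gap(group_a, group_b, [0, 1])
# View 2: keep x and z (look from the side). The groups are far apart.
xz_gap = centroid_gap(group_a, group_b, [0, 2])

print(f"x-y view: centroid gap {xy_gap:.2f}")
print(f"x-z view: centroid gap {xz_gap:.2f}")
```

Nothing about the points changed between the two views. Only the projection did.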

In single-cell RNA-seq, we work with thousands of features.
You can’t plot a 20,000-dimensional space.
So we reduce it.

PCA.
t-SNE.
UMAP.
Each method tells a different story.
Let’s break them down.


PCA is linear.
The axes (principal components) are ranked by the variance they explain.
Distances between cells are meaningful.
If two cells are far apart in the top PCs, their expression profiles genuinely differ along the major axes of variation.
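Below is a minimal PCA sketch using singular value decomposition (numpy only). `pca` and `pairwise_dist` are illustrative helpers, and the matrix is random, hypothetical data. In a real workflow, X would be the normalized, log-transformed expression matrix. The sketch checks the properties above. The components come out ranked by variance, and keeping every PC is a pure rotation, so cell-to-cell distances are unchanged. Keeping only two PCs can only shrink distances, which means cells that overlap in a PC1 vs PC2 plot may still differ on the PCs you are not plotting.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA via SVD. X is a cells x genes matrix (rows are cells)."""
    Xc = X - X.mean(axis=0)                  # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                           # cell coordinates on each PC
    var_ratio = S**2 / np.sum(S**2)          # fraction of variance per PC, largest first
    return scores[:, :n_components], var_ratio[:n_components]

def pairwise_dist(A):
    """Euclidean distance between every pair of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))   # hypothetical: 100 cells x 50 genes
X[:, 0] *= 5                     # one gene varies much more than the rest

all_pcs, var_ratio = pca(X, n_components=50)
top2, _ = pca(X, n_components=2)

d_genes = pairwise_dist(X)
d_all = pairwise_dist(all_pcs)
d_top2 = pairwise_dist(top2)

print("variance ratio of PC1-PC3:", np.round(var_ratio[:3], 3))
# Keeping every PC is just a rotation, so cell-cell distances are unchanged.
print("all PCs preserve distances:", np.allclose(d_genes, d_all))
# Keeping only 2 PCs drops the remaining axes, so distances can only shrink.
print("top-2 distances never exceed full distances:", bool(np.all(d_top2 <= d_genes + 1e-9)))
```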

But PCA is blind to nonlinear relationships: curved trajectories and cyclic processes get flattened onto straight axes.
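Here is a small sketch of that blind spot, using synthetic data and numpy. The setup is an assumption made for illustration: a hypothetical cyclic process (one hidden variable, the position along the cycle) is read out by two genes that rise and fall out of phase, so the cells trace a circle. PCA splits the variance evenly across two linear axes, and no single PC orders the cells along the cycle.

```python
import numpy as np

# Hypothetical cyclic process driven by ONE hidden variable (position along the cycle),
# read out by two genes whose expression rises and falls out of phase.
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])   # 400 cells x 2 genes

Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)   # singular values only
var_ratio = S**2 / np.sum(S**2)

# Both linear axes carry the same share of variance, so no single PC
# recovers the one-dimensional position along the cycle.
print("variance ratio:", np.round(var_ratio, 3))
```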


t-SNE and UMAP are nonlinear.
They preserve local structure.
If two cells are near each other in high-dimensional space, t-SNE or UMAP will try to keep them close in 2D.
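One way to check how well an embedding keeps neighbors together is to compare each cell's k nearest neighbors before and after the reduction. The sketch below uses numpy only. `knn_indices` and `neighbor_preservation` are hypothetical helper names, k=10 is an arbitrary choice, and the function returns the average overlap. scikit-learn ships a related score, `sklearn.manifold.trustworthiness`. With real data, X_high would typically be the top PCs and X_low the 2D UMAP or t-SNE coordinates (for example `adata.obsm["X_pca"]` and `adata.obsm["X_umap"]` in Scanpy).

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each row's k nearest neighbours (Euclidean), excluding the row itself."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(d2, np.inf)                      # a cell is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def neighbor_preservation(X_high, X_low, k=10):
    """Mean fraction of each cell's k nearest neighbours in X_high that are
    also among its k nearest neighbours in X_low (1.0 = perfectly preserved)."""
    hi = knn_indices(X_high, k)
    lo = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(hi, lo)]
    return float(np.mean(overlap))

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 30))            # hypothetical: 300 cells x 30 PCs
random_2d = rng.normal(size=(300, 2))     # an embedding unrelated to X

print("identical layout:", neighbor_preservation(X, X))
print("random 2D layout:", round(neighbor_preservation(X, random_2d), 3))
```

The dense distance matrix is fine for a few thousand cells. For larger data sets, a nearest-neighbor index would be the better design choice.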

But the global distances?
Not to be trusted.
Two clusters may look far apart… but in the original high-dimensional space they may be closer (or farther) than the plot suggests.
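Here is a rough way to test whether between-cluster distances survive an embedding. Compute distances between cluster centroids in the high-dimensional space and in the 2D layout, then compare their rankings with a Spearman correlation. This is a sketch on synthetic, hypothetical clusters (numpy only, helper names are ours). The second call uses the first two raw dimensions as a stand-in for a 2D embedding. A value near 1 means the layout keeps the ordering of cluster distances. A low value means the arrangement of islands should not be read as biology. With only a handful of clusters there are few pairs, so treat the number as a rough signal.

```python
import numpy as np

def centroid_distances(X, labels):
    """Euclidean distances between every pair of cluster centroids."""
    groups = np.unique(labels)
    C = np.array([X[labels == g].mean(axis=0) for g in groups])
    i, j = np.triu_indices(len(groups), k=1)
    return np.linalg.norm(C[i] - C[j], axis=1)

def rank(v):
    """Ranks 0..n-1 (no tie handling; fine for continuous distances)."""
    return np.argsort(np.argsort(v))

def global_agreement(X_high, X_low, labels):
    """Spearman correlation between cluster-centroid distances before and after embedding."""
    d_high = centroid_distances(X_high, labels)
    d_low = centroid_distances(X_low, labels)
    return float(np.corrcoef(rank(d_high), rank(d_low))[0, 1])

rng = np.random.default_rng(3)
centers = rng.normal(scale=10, size=(6, 20))   # 6 hypothetical clusters in 20D
labels = np.repeat(np.arange(6), 50)           # 50 cells per cluster
X = centers[labels] + rng.normal(size=(300, 20))

print("same space:", round(global_agreement(X, X, labels), 3))
print("first 2 dims only:", round(global_agreement(X, X[:, :2], labels), 3))
```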


So why do we still use UMAP and t-SNE?

Because they shine at visualizing local structure:

  • Rare cell types
  • Developmental trajectories
  • Branching lineages

They help you see potential patterns—but not measure them.


Here’s the danger:
Over-interpreting the 2D plot.

  • Don’t assume cluster distance means biological difference.
  • Don’t trust that gap or island unless you validate it.
  • Don’t let your eye trick your mind.

What to do instead?

  • Use PCA when you need interpretability (note: in standard workflows such as Seurat and Scanpy, cells are clustered on the top PC coordinates, not on the UMAP or t-SNE coordinates)
  • Use UMAP/t-SNE to visualize the possible structure
  • Cross-validate with marker genes, expression heatmaps, or statistical tests
  • Always ask: “Does this make biological sense?”
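Below is a minimal sketch of the marker-gene check. It assumes a cells x genes matrix of log-normalized expression plus cluster labels, and uses numpy only. `rank_markers` is a hypothetical helper, and the data are synthetic. The function ranks genes for one cluster against all other cells using the difference in mean log-expression and Welch's t statistic. In practice you would reach for Scanpy's `sc.tl.rank_genes_groups` or Seurat's `FindMarkers`.

```python
import numpy as np

def rank_markers(expr, labels, cluster, gene_names, top_n=5):
    """Rank genes for one cluster vs. all other cells by Welch's t statistic.
    expr: cells x genes matrix, assumed to be log-normalised expression."""
    in_c = labels == cluster
    a, b = expr[in_c], expr[~in_c]
    diff = a.mean(axis=0) - b.mean(axis=0)          # difference in mean log-expression
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = diff / np.where(se > 0, se, np.inf)          # genes with zero variance get t = 0
    top = np.argsort(-t)[:top_n]
    return [(gene_names[g], round(float(diff[g]), 2), round(float(t[g]), 1)) for g in top]

rng = np.random.default_rng(4)
expr = rng.normal(size=(300, 20))                    # hypothetical: 300 cells x 20 genes
labels = np.array([0] * 100 + [1] * 200)
expr[labels == 0, 3] += 3.0                          # "gene3" is a true marker of cluster 0
genes = [f"gene{g}" for g in range(20)]

for name, d, t in rank_markers(expr, labels, cluster=0, gene_names=genes, top_n=3):
    print(name, d, t)
```

Treat these statistics as a ranking rather than as p-values. The clusters were defined from the same data, so any test comparing them is optimistic. The real check is whether the top genes match known biology for that cell type.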

Final thought:
Dimensionality reduction is like a sketch—it simplifies reality.
But you are the one who gives it meaning.

That UMAP plot?
It’s not your conclusion.
It’s your starting point.

Now go question everything.


Let me know:
What’s the biggest mistake you’ve seen in interpreting single-cell plots?
Reply and share your story. I read every message.

Happy Learning!

Tommy aka crazyhottommy

Other posts you may find useful

  1. H3K4me3 is a post-transcriptional histone mark | bioRxiv
  2. Want to reorder bars in ggplot2? Here is a breakdown.
  3. Survival analysis is a critical skill for bioinformaticians. But be careful with interpretation.
  4. When should you use a package vs. solving from scratch in bioinformatics?
  5. How to assign unique IDs to combinations of multiple columns in an R dataframe.
  6. I had a roadmap for biology -> computation. What about the reverse?
  7. How to map mouse genes to human genes.
  8. What is cloud computing? Clearly explained!
  9. Bioinformatics sounds exciting—until you spend 6 hours just trying to install a tool.

PS:

If you want to learn Bioinformatics, there are three ways that I can help:

  1. My free YouTube channel, Chatomics. Make sure you subscribe (I just passed 10K subscribers!).
  2. I have many resources collected on my GitHub.
  3. I have been writing blog posts for over 10 years: https://divingintogeneticsandgenomics.com/

Stay awesome!


Why subscribe?
✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube
✅ No fluff: just deep insights and working code examples
✅ Trusted by grad students, postdocs, and biotech professionals
✅ 100% free
