
Hi! I'm Tommy Tang

Heatmaps Are Lying to You (Unless You Understand These 2 Things)


Hello Bioinformatics lovers,

How is your learning going? Are you learning a bit more than yesterday?

The past week was challenging for me.

Two sick kids and a sick wife.

I had three wisdom teeth extracted, and the bleeding would not stop.

I went back to the dental office to have it fixed, and then I got sick too.

(If you have kids, you know how it works when one of them is sick...)

However, I still got quite a bit done. I made a new video for the ChIP-seq analysis tutorial playlist.

Okay, let's talk about heatmaps!

If you work in bioinformatics, you’ve made a heatmap. Probably dozens.

But here’s the thing:

Most people making heatmaps don’t really understand them.

They look good. They’re colorful. They get published.
But they often mislead.

It is easy to make one by calling a heatmap function from a package, but do you really understand what it does?

Here’s what you need to know to stop making bad heatmaps —

and 7 resources that helped me finally get it right.


What is a heatmap?

A heatmap is a way to visualize a matrix of values using color.

Rows might be genes. Columns could be samples, time points, or conditions.

Each cell shows a value — often expression — represented by color intensity.

Sounds simple. It’s not.

Every step involves decisions. And each one can distort the story your data is trying to tell.


Why color mapping matters

Humans are visual creatures. We’re wired to notice contrast.

If your heatmap uses a nonlinear or poorly scaled color gradient,

the viewer might see a pattern that isn’t real. Or miss one that is.

For example:

  • Mapping values from -3 to 3 on a red-blue scale sounds okay.
  • But if most of your data lies between -0.5 and 0.5, your heatmap will look empty — mostly white.
  • The effect? Your viewer thinks there’s no signal when there’s actually a ton — just compressed by the color scale.

The choice of colors (e.g., diverging, sequential), the center of the scale, and whether it’s linear — they all matter.
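
To make the -3 to 3 example concrete, here is a minimal Python sketch of a linear diverging color mapping. The function names (color_fraction, to_rgb) and the blue-white-red palette are my own assumptions for illustration; real heatmap packages have their own color mapping functions. It shows that the same value of 0.5 is a faint tint on a -3 to 3 scale but fully saturated on a -0.5 to 0.5 scale.

```python
def color_fraction(value, vmin, vmax):
    """Signed position of value on a linear scale: -1 at vmin, 0 at the center, +1 at vmax."""
    center = (vmin + vmax) / 2.0
    half_range = (vmax - vmin) / 2.0
    clipped = max(vmin, min(vmax, value))  # values outside the scale saturate
    return (clipped - center) / half_range


def to_rgb(fraction):
    """Assumed palette: -1 is pure blue, 0 is white, +1 is pure red."""
    fade = round(255 * (1 - abs(fraction)))
    if fraction >= 0:
        return (255, fade, fade)
    return (fade, fade, 255)


value = 0.5
wide = color_fraction(value, -3, 3)       # about 0.17 of full intensity
tight = color_fraction(value, -0.5, 0.5)  # 1.0, fully saturated

print(f"scale -3..3:     {wide:.2f} of full intensity, rgb {to_rgb(wide)}")
print(f"scale -0.5..0.5: {tight:.2f} of full intensity, rgb {to_rgb(tight)}")
```

Same number, very different color. Which scale is right depends on the range your data actually covers, so look at the distribution of values before fixing the limits.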


The importance of z-score normalization

This might be the most common misunderstanding:

When plotting gene expression across samples,

it’s not raw values that matter — it’s the variation.

Well, I take it back.

It depends on the purpose of your visualization. Sometimes you may want the raw values.

Let’s say Gene A has values [100, 110, 120]
Gene B has values [1, 10, 100]

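Without z-score normalization, Gene A dominates because of its sheer magnitude: on a shared color scale, all three of its values sit near the top, so its steady rise is hard to see.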

But what if you standardize each row to have mean 0 and variance 1 — aka z-score normalization?

Suddenly, the heatmap shows patterns of change, not just magnitude.

This is usually what you care about:

  • Which genes go up or down
  • How patterns compare across conditions
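
Here is a minimal sketch of row-wise z-scoring for the two genes above, using only the Python standard library. The function name zscore_row is hypothetical, and I am assuming the sample standard deviation (dividing by n - 1), which is also what R's scale() uses. Returning zeros for a constant row is my own choice for the sketch, not a standard.

```python
from statistics import mean, stdev


def zscore_row(values):
    """Center a row at 0 and scale it to unit sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        # A constant row has no variation to show; zeros are an assumed convention.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


gene_a = [100, 110, 120]
gene_b = [1, 10, 100]

z_a = zscore_row(gene_a)
z_b = zscore_row(gene_b)

print("Gene A:", [round(z, 2) for z in z_a])
print("Gene B:", [round(z, 2) for z in z_b])
```

After scaling, both genes live on the same scale, so the color shows each gene's pattern across samples rather than its absolute level. Gene A becomes an even rise from -1 to 1, while Gene B stays low in the first two samples and jumps in the third.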

Plotting raw counts? That’s a rookie move.
Z-scoring before the heatmap? That’s the mark of someone who gets it.


7 essential reads that helped me level up

1. Mapping quantitative data to color
Foundational knowledge. How numbers turn into shades.

2. Nature Methods: What makes a good heatmap
A must-read. This short piece changed how I think about visual encoding.

3. A tale of two heatmap functions
My own deep dive into R’s base heatmap() and heatmap.2() from gplots.

This is a must-read if you want to understand them deeply. I plan to move it to my blog.

4. Heatmap demystified
Another blog post where I walk you through a real example by analyzing a public dataset.

5. Understand color mapping
Even small changes in scale or midpoint can alter interpretation. This is a tutorial from ComplexHeatmap.

The same data matrix looks different under a different color mapping.

6. Learn what rasterization means
What rasterization is, and why it is important when you have a lot of rows or columns.

Why your PDF heatmap looks blurry — and how to fix it.

7. Handling massive matrices
What happens when your gene matrix has 20,000 rows? Here’s how to stay sane.

Hint: your screen resolution is not even 20K
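
One way to read that hint: if 20,000 rows have to fit into roughly 1,000 pixels, each pixel row ends up standing for many genes anyway. The sketch below averages neighboring rows down to a target height. It is only an illustration of the idea under my own assumptions (the function name downsample_rows and the choice of the mean are hypothetical), not the method any particular package uses.

```python
from statistics import mean


def downsample_rows(matrix, target_rows):
    """Average consecutive blocks of rows so the result has target_rows rows."""
    n = len(matrix)
    if target_rows >= n:
        return [list(row) for row in matrix]  # already small enough
    reduced = []
    for i in range(target_rows):
        start = i * n // target_rows
        end = (i + 1) * n // target_rows
        block = matrix[start:end]
        # Column-wise mean over the rows in this block
        reduced.append([mean(column) for column in zip(*block)])
    return reduced


small = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(downsample_rows(small, 2))

# 20,000 one-column rows squeezed into 1,080 pixel rows
big = [[i] for i in range(20000)]
print(len(downsample_rows(big, 1080)))
```

Averaging only makes sense if neighboring rows are similar, for example after clustering. How the reduction is done changes what the final picture shows, so it is worth knowing what your plotting tool does here.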


The tool I trust

Zuguang Gu’s ComplexHeatmap package in R changed the game for me.
Powerful, flexible, and honest. My go-to for serious heatmap work.


Takeaways

  • A heatmap is not just a pretty plot. It’s a powerful way to compress complex data.
  • Misused, it can mislead.
  • Understand why you normalize rows or columns before plotting gene expression.
  • Always check your color scale.

I hope you understand heatmaps more deeply after today's newsletter.

Learn to understand.

Happy Learning!

Tommy aka crazyhottommy


Other posts from the past week you may find helpful

  1. 🧵 Want to automate bioinformatics tasks? Learn the bash 'for loop'.
  2. 🧵 Regular expressions can save your bioinformatics analysis. Here’s how to use them to clean up messy IDs and merge datasets.
  3. 🧵 Bioinformatics evolves fast. New tech. New data. New analysis. But here's how to stay grounded and not get overwhelmed.
  4. 🧵 Want to speed up your bioinformatics tasks? Use GNU parallel.
  5. 🧵 "All biology is computational biology."
  6. 🧵 8 R/command line tools to deal with Excel, TSV, and CSV files that make your life easier.
  7. 🧵 Bioinformaticians: Drowning in multiple projects? Here's why context switching is killing your productivity—and how to fix it.
  8. If you're doing bioinformatics and not using Git, you're walking a tightrope without a safety net.
  9. There is never a quick analysis for bioinformatics:)

PS:

If you want to learn Bioinformatics, there are other ways that I can help:

  1. My free YouTube channel, Chatomics. Make sure you subscribe to it.
  2. I have collected many resources on my GitHub.
  3. I have been writing blog posts for over 10 years at https://divingintogeneticsandgenomics.com/

Stay awesome!


I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computational experience. I will help you learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
