I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computation experience. I will help you to learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
Hello Bioinformatics lovers,

The past week was challenging for me. Two sick kids and a sick wife. Three extracted wisdom teeth would not stop bleeding, so I went back to the dental office to have them fixed, and then I became sick. (If you have kids, you know how it works when one of them is sick...)

However, I still got quite a bit done. I made this video for the ChIP-seq analysis tutorial in the playlist.

Okay, let's talk about heatmaps!

If you work in bioinformatics, you’ve made a heatmap. Probably dozens. But here’s the thing: most people making heatmaps don’t really understand them.

They look good. They’re colorful. They get published. It is so easy to make one by calling a heatmap function in a package, but do you really understand it?

Here’s what you need to know to stop making bad heatmaps, and 7 resources that helped me finally get it right.

What is a heatmap?

A heatmap is a way to visualize a matrix of values using color. Rows might be genes. Columns could be samples, time points, or conditions. Each cell shows a value, often expression, represented by color intensity.

Sounds simple. It’s not. Every step involves decisions. And each one can distort the story your data is trying to tell.

Why color mapping matters

Humans are visual creatures. We’re wired to notice contrast. If your heatmap uses a nonlinear or poorly scaled color gradient, the viewer might see a pattern that isn’t real. Or miss one that is. For example:
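Here is a minimal Python sketch of how the same numbers can end up with very different colors. Everything in it is an assumption made for illustration: the values are made-up z-scores with one outlier, `make_color_map` is a hypothetical helper (not a function from any heatmap package), and the blue-white-red palette is just one possible diverging scale.

```python
def make_color_map(breaks, colors):
    """Return a function that maps a number to an (R, G, B) tuple.

    Values are linearly interpolated between adjacent breakpoints and
    clamped to the first or last color outside the breakpoint range.
    """
    def to_rgb(value):
        if value <= breaks[0]:
            return colors[0]
        if value >= breaks[-1]:
            return colors[-1]
        # Find the pair of breakpoints that brackets the value.
        for lo, hi, c_lo, c_hi in zip(breaks, breaks[1:], colors, colors[1:]):
            if lo <= value <= hi:
                t = (value - lo) / (hi - lo)
                return tuple(round(a + t * (b - a)) for a, b in zip(c_lo, c_hi))
    return to_rgb


BLUE, WHITE, RED = (0, 0, 255), (255, 255, 255), (255, 0, 0)

# Hypothetical z-scores; 50.0 is a single extreme outlier.
values = [-2.0, -1.0, 0.0, 1.0, 2.0, 50.0]

# Mapping A: the scale is stretched over the full data range, centered at 0.
full_range = make_color_map([-50.0, 0.0, 50.0], [BLUE, WHITE, RED])

# Mapping B: the scale is clamped to [-2, 2]; the outlier saturates at red.
clamped = make_color_map([-2.0, 0.0, 2.0], [BLUE, WHITE, RED])

for v in values:
    print(v, full_range(v), clamped(v))
```

Under mapping A, a value of 2.0 is drawn as (255, 245, 245), which is almost indistinguishable from white, because the outlier stretches the scale. Under mapping B, the same value is fully red. Nothing about the data changed, only the breakpoints.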
The choice of colors (e.g., diverging, sequential), the center of the scale, and whether it’s linear: they all matter.

The importance of z-score normalization

This might be the most common misunderstanding: when plotting gene expression across samples, it’s not raw values that matter, it’s the variation. Well, I take that back. It depends on the purpose of your visualization; sometimes you may want the raw values.

Let’s say Gene A has values [100, 110, 120]. Without z-score normalization, Gene A dominates lower-expressed genes because of its sheer magnitude.

But what if you standardize each row to have mean 0 and variance 1, aka z-score normalization? Suddenly, the heatmap shows patterns of change, not just magnitude. This is usually what you care about:
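To make the Gene A example concrete, here is a minimal sketch of row-wise z-scoring in plain Python. Gene A's values come from the text above. Gene B and its values are hypothetical, added only so there is a low-expressed gene to compare against, and `zscore_rows` is an illustrative helper rather than a function from any package. The sketch assumes the sample standard deviation (the n - 1 convention that R's `sd()` uses).

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Scale each row to mean 0 and sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)  # sample SD, i.e. the n - 1 denominator
        if sd == 0:
            # A constant row has no variation to show; map it to zeros.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - mu) / sd for x in row])
    return scaled


expression = {
    "GeneA": [100, 110, 120],  # values from the text
    "GeneB": [1, 2, 3],        # hypothetical low-expressed gene
}

z = dict(zip(expression, zscore_rows(list(expression.values()))))

for gene, row in z.items():
    print(gene, row)
```

On the raw scale, Gene A would take up nearly the whole color range and Gene B would look flat. After scaling, both rows become [-1.0, 0.0, 1.0], so the heatmap shows that the two genes follow the same increasing pattern across samples.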
Plotting raw counts? That’s a rookie move.

7 essential reads that helped me level up

1. Mapping quantitative data to color
2. Nature Methods: What makes a good heatmap
3. A tale of two heatmap functions
   This is a must-read if you want to understand it deeply. I plan to move it to my blog.
4. Heatmap demystified
5. Understand color mapping
   The same data matrix looks different under different color mappings.
6. Learn what rasterization means
   Why your PDF heatmap looks blurry, and how to fix it.
7. Handling massive matrices
   Hint: your screen resolution is not even 20K.

The tool I trust

Zuguang Gu’s ComplexHeatmap package in R changed the game for me.

Takeaways
I hope you understand heatmaps more deeply after today's newsletter. Learn to understand.

Happy Learning!

Tommy aka crazyhottommy

Other posts from the past week you may find helpful
PS: If you want to learn Bioinformatics, there are other ways that I can help:
Stay awesome!