
Hi! I'm Tommy Tang

The Secret to Recreating Any Bioinformatics Figure (It’s Easier Than You Think)


Hello Bioinformatics lovers,

Tommy here. I want to teach you the fundamentals so you understand how to solve a problem from the ground up!

The other day, I was on Reddit and saw someone ask how to generate the figure below.

Let me give you some general principles for approaching this type of problem.

How to Recreate Bioinformatics Figures

Have you ever seen a great figure in a paper and wondered how to make something similar?

The key is to break it down step by step before you start coding.

Step 1: Analyze the Figure

Before jumping into ggplot2 or ComplexHeatmap, take a moment to examine the figure:

  • What’s on the x-axis?
  • What’s on the y-axis?
  • What type of plot is it?

Step 2: Recognize Common Plot Types

Most bioinformatics figures are built from a handful of basic plots:
✅ Boxplot
✅ Bar plot
✅ Scatter plot
✅ Histogram
✅ Line plot
✅ Heatmap

Step 3: Master the Right Tools

  • Use ggplot2 for general plots.
  • Use ComplexHeatmap for heatmaps.
  • If you can use both well, you can recreate most bioinformatics figures.

Step 4: Get Your Data Structure Right

The most important step isn’t plotting—it’s structuring your data correctly.

  • For ggplot2, put everything into a single tidy dataframe (one observation per row, one variable per column).
  • For ComplexHeatmap, prepare two corresponding matrices if needed.

Step 5: Ask the Right Questions

To structure your data properly, ask:

  • What column maps to the x-axis?
  • What column maps to the y-axis?
  • What columns control shape, color, or size?
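Here is a minimal sketch of how those answers turn into ggplot2 code. The data frame and its column names (mean_expression, log2FC, significant, n_samples) are made up for illustration. The point is only that each answer becomes one entry in aes().

```r
library(ggplot2)

# Hypothetical tidy data frame: one row per gene, one column per variable.
genes <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  mean_expression = c(2.1, 5.4, 3.3, 7.8),
  log2FC = c(-1.2, 0.4, 2.0, -0.3),
  significant = c(TRUE, FALSE, TRUE, FALSE),
  n_samples = c(10, 25, 15, 30)
)

# Each question from the list becomes one aesthetic mapping:
#   x-axis -> mean_expression, y-axis -> log2FC,
#   color  -> significant,     size   -> n_samples
p <- ggplot(genes, aes(x = mean_expression, y = log2FC)) +
  geom_point(aes(color = significant, size = n_samples))
p
```

If a column you want to map does not exist yet, that tells you to go back and reshape the data before plotting.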

Let's analyze this plot. (We will not discuss whether a bar plot is a good visualization today :))

  1. It is a bar plot.
  2. The x-axis is the metabolite abundance. The values are both positive and negative, so they could be log2 fold changes between treatment and control.
  3. The y-axis is the name of the metabolites.
  4. The color of the bar maps to the log2 fold change too.
  5. You also have error bars.

Once you know this, you should get a data frame like this:

column 1: metabolites

column 2: log2 fold change (logFC)

column 3: standard deviation (sd)
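If you start from per-replicate values, one possible way to get to that three-column data frame is sketched below in base R. The metabolite names and numbers are invented, and I am assuming the error bars show the standard deviation across replicates. Your figure may use something else, such as the standard error.

```r
# Hypothetical per-replicate log2 fold changes (treatment vs. control).
# Names and values are made up for illustration only.
replicates <- data.frame(
  metabolites = rep(c("Lactate", "Citrate", "Glutamine"), each = 3),
  logFC = c(1.2, 1.5, 0.9, -0.8, -1.1, -0.5, 0.3, 0.1, 0.5)
)

# Collapse to one row per metabolite: the mean logFC and its standard deviation.
means <- aggregate(logFC ~ metabolites, data = replicates, FUN = mean)
sds <- aggregate(logFC ~ metabolites, data = replicates, FUN = sd)
names(sds)[2] <- "sd"

# merge() joins the two summaries on the shared "metabolites" column.
df <- merge(means, sds, by = "metabolites")
df
```

The result has exactly the columns listed above (metabolites, logFC, sd), one row per bar.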

Then, you are ready to plot:

library(ggplot2)

ggplot(df, aes(x = metabolites, y = logFC)) + # map the x-axis to the metabolites column and the y-axis to logFC
  geom_col(aes(fill = logFC)) + # same as geom_bar(stat = "identity"); fill colors the whole bar, color would only draw the outline
  geom_errorbar(aes(ymin = logFC - sd, ymax = logFC + sd), width = 0.2) + # add the error bars by mapping ymin and ymax
  coord_flip() # flip the axes so the metabolites end up on the y-axis
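To get closer to a published figure, you would usually sort the bars and pick a diverging color scale. Below is a self-contained sketch of that idea. The data frame values are hypothetical, and the blue/white/red palette is my assumption, not taken from the original figure.

```r
library(ggplot2)

# Hypothetical summary table; names and numbers are invented.
df <- data.frame(
  metabolites = c("Lactate", "Citrate", "Glutamine", "Succinate"),
  logFC = c(1.2, -0.8, 0.3, -1.6),
  sd = c(0.30, 0.30, 0.20, 0.25)
)

p <- ggplot(df, aes(x = reorder(metabolites, logFC), y = logFC)) + # sort bars by logFC
  geom_col(aes(fill = logFC)) + # fill (not color) so the whole bar is colored
  geom_errorbar(aes(ymin = logFC - sd, ymax = logFC + sd), width = 0.2) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) + # diverging scale centered at 0
  coord_flip() + # metabolites on the y-axis, logFC on the x-axis
  labs(x = "Metabolite", y = "log2 fold change")
p
```

reorder() only changes the order of the factor levels, so the data frame itself stays untouched.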

Read here for a tutorial on the error bar with ggplot2.

Another Real Example – Single-Cell RNA-seq Dot Plot

A dot plot in single-cell RNA-seq is essentially a heatmap:

  • Color = Expression level
  • Size = Proportion of nonzero cells in each cluster

If using ggplot2, make sure your data is in tidy format. Use pivot_longer() and pivot_wider() from the tidyr package to reshape your dataframe as needed.
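A rough sketch of the ggplot2 route, assuming you already have one wide table of average expression and one of the proportion of expressing cells. The values and the cluster names are hypothetical. In practice you would compute these tables from your single-cell object.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide tables: one row per gene, one column per cluster.
avg_expr <- data.frame(
  gene = c("CD3E", "CD79A", "LYZ"),
  cluster1 = c(2.5, 0.1, 0.2),
  cluster2 = c(0.2, 3.1, 0.3),
  cluster3 = c(0.1, 0.2, 4.0)
)
pct_expr <- data.frame(
  gene = c("CD3E", "CD79A", "LYZ"),
  cluster1 = c(0.90, 0.05, 0.10),
  cluster2 = c(0.08, 0.85, 0.12),
  cluster3 = c(0.05, 0.07, 0.95)
)

# Reshape each table to long format: one row per gene-cluster pair.
expr_long <- pivot_longer(avg_expr, cols = -gene,
                          names_to = "cluster", values_to = "expression")
pct_long <- pivot_longer(pct_expr, cols = -gene,
                         names_to = "cluster", values_to = "proportion")

# Combine into a single tidy data frame with both variables.
dot_df <- merge(expr_long, pct_long, by = c("gene", "cluster"))

# Color = expression level, size = proportion of nonzero cells.
p <- ggplot(dot_df, aes(x = cluster, y = gene)) +
  geom_point(aes(color = expression, size = proportion)) +
  scale_color_gradient(low = "grey90", high = "red")
p
```

Once the data is tidy, the plot itself is only three lines.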

If using ComplexHeatmap, prepare two separate matrices:

  1. One matrix for expression values (mapped to color)
  2. One matrix for proportion values (mapped to dot size)
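With ComplexHeatmap, one way to do this is to hide the default heatmap rectangles and draw a circle in each cell with cell_fun. The sketch below uses made-up matrices. The 3 mm maximum radius and the grey-to-red palette are my own choices, and clustering is turned off to keep the example small.

```r
library(ComplexHeatmap)
library(circlize) # colorRamp2()
library(grid)     # grid.circle(), gpar(), unit()

# Hypothetical matrices with identical row and column names.
genes <- c("CD3E", "CD79A", "LYZ")
clusters <- c("cluster1", "cluster2", "cluster3")
expr_mat <- matrix(c(2.5, 0.1, 0.2,
                     0.2, 3.1, 0.3,
                     0.1, 0.2, 4.0),
                   nrow = 3, dimnames = list(genes, clusters))
pct_mat <- matrix(c(0.90, 0.05, 0.10,
                    0.08, 0.85, 0.12,
                    0.05, 0.07, 0.95),
                  nrow = 3, dimnames = list(genes, clusters))

# The two matrices must line up cell by cell.
stopifnot(identical(dimnames(expr_mat), dimnames(pct_mat)))

# Map expression values to colors.
col_fun <- colorRamp2(c(0, 4), c("grey90", "red"))

ht <- Heatmap(
  expr_mat,
  name = "expression",
  col = col_fun,
  rect_gp = gpar(type = "none"), # do not draw the usual heatmap rectangles
  cell_fun = function(j, i, x, y, width, height, fill) {
    # One circle per cell: color from expr_mat, size from pct_mat.
    # sqrt() makes the circle area, not the radius, scale with the proportion.
    grid.circle(
      x = x, y = y,
      r = unit(3 * sqrt(pct_mat[i, j]), "mm"),
      gp = gpar(fill = col_fun(expr_mat[i, j]), col = NA)
    )
  },
  cluster_rows = FALSE,
  cluster_columns = FALSE
)
draw(ht)
```

Setting cluster_rows and cluster_columns back to TRUE gives you the clustered version, and the circles follow the reordered rows and columns automatically.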

Step 6: Learn More

For a detailed walkthrough with code, check out this guide:
🔗 Clustered Dot Plot for Single-Cell RNA-seq (the link in the blog post has a walkthrough using ggplot)

You can also see how complex a figure can get once you have the basic skills under your belt:

How to make a multi-group dotplot for single-cell RNAseq data

Yes, there are R packages to make such plots, but knowing that you can make any figure you want is liberating!

Key Takeaway

Before plotting, structure your data properly. That’s 90% of the work! Then learn ggplot2 and ComplexHeatmap to do the rest.

What’s your biggest struggle when making bioinformatics figures? Reply and let me know!

Other posts that you may find helpful

  1. Good tutorial! Cox Proportional Hazards Regression Model https://hbiostat.org/rmsc/cox
  2. Identification and characterization of cell niches in tissue from spatial omics data at single-cell resolution https://www.nature.com/articles/s41467-025-57029-9
  3. Review: Decoding cell–cell communication using spatial transcriptomics https://www.nature.com/articles/s41576-025-00824-3
  4. Bioinformatics data is messy. Here’s a real example that made me want to pull my hair out 🧵👇
  5. MUUMI: an R package for statistical and network-based meta-analysis for MUlti-omics data Integration https://www.biorxiv.org/content/10.1101/2025.03.10.642416v1
  6. Bioinformatics tutorials give you clean, pre-processed data. But real-world data? It's messy. Here’s why mastering data wrangling is essential 🧵👇
  7. Single-cell RNA-seq is popular and powerful, but it's not always the best choice. Here's why you should think carefully before diving in.
  8. How to stand out in a tough job market. A strong bioinformatics CV isn’t just a list of skills—it shows what you can do.
  9. Bioinformatics isn’t just about fancy plots. It’s 90% data wrangling, 10% analysis.
  10. "AI can generate code—do I still need to learn coding for bioinformatics?" Short answer: YES. AI is a tool, not a replacement. 🧵👇

Happy Learning!

Tommy, aka crazyhottommy

PS:

If you want to learn bioinformatics, here are a few ways that I can help:

  1. My free YouTube channel, Chatomics. Make sure you subscribe to it. I just added a new video to the replicating genomics paper playlist here.
  2. I have collected many resources on my GitHub here.
  3. I have been writing blog posts for over 10 years: https://divingintogeneticsandgenomics.com/

Stay awesome!


I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computational experience. I will help you learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
