The Secret to Recreating Any Bioinformatics Figure (It’s Easier Than You Think)

Published 3 months ago • 3 min read

Hello Bioinformatics lovers,

Tommy here. I want to teach you the fundamentals so you understand how to solve a problem from the ground up!

The other day, I was on Reddit and saw someone ask how to generate the figure below.

r/bioinformatics - Does anyone know how to generate a metabolite figure like this?

Let me give you some general principles for approaching this type of problem.

How to Recreate Bioinformatics Figures

Have you ever seen a great figure in a paper and wondered how to make something similar?

The key is to break it down step by step before you start coding.

Step 1: Analyze the Figure

Before jumping into ggplot2 or ComplexHeatmap, take a moment to examine the figure:

What’s on the x-axis?
What’s on the y-axis?
What type of plot is it?

Step 2: Recognize Common Plot Types

Most bioinformatics figures are built from a handful of basic plots:
✅ Boxplot
✅ Bar plot
✅ Scatter plot
✅ Histogram
✅ Line plot
✅ Heatmap

Step 3: Master the Right Tools

Use ggplot2 for general plots.
Use ComplexHeatmap for heatmaps.
If you can use both well, you can recreate most bioinformatics figures.

Step 4: Get Your Data Structure Right

The most important step isn’t plotting—it’s structuring your data correctly.

For ggplot2, put everything into a single tidy dataframe (one observation per row, one variable per column).
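For ComplexHeatmap, prepare a numeric matrix. If the figure encodes two quantities (for example color and size), prepare two matrices with the same dimensions and the same row and column order.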
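Here is a minimal sketch of the two shapes, assuming a small wide table with one row per metabolite and one column per sample. The table, its column names, and its values are all invented for illustration. tidyr's pivot_longer() gives the tidy form for ggplot2, and as.matrix() gives the matrix form for a heatmap:

```r
library(tidyr)

# Hypothetical wide table: one row per metabolite, one column per sample.
wide <- data.frame(
  metabolite = c("glucose", "lactate", "citrate"),
  ctrl_1     = c(1.0, 2.0, 3.0),
  ctrl_2     = c(1.2, 2.1, 2.9),
  treat_1    = c(2.0, 1.0, 3.5)
)

# Tidy (long) form for ggplot2: one row per metabolite-sample pair.
long <- pivot_longer(
  wide,
  cols      = -metabolite,
  names_to  = "sample",
  values_to = "abundance"
)

# Matrix form for a heatmap: numeric values only, metabolites as row names.
mat <- as.matrix(wide[, -1])
rownames(mat) <- wide$metabolite
```

The long table has one row for each of the 9 metabolite-sample pairs, while the matrix keeps the 3 x 3 layout.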

Step 5: Ask the Right Questions

To structure your data properly, ask:

What column maps to the x-axis?
What column maps to the y-axis?
What columns control shape, color, or size?
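To make these questions concrete, here is a small sketch in which every answer is one column of the data frame. The data frame, its column names, and its values are hypothetical, made up only to show the mapping:

```r
library(ggplot2)

# Hypothetical tidy data frame: every aesthetic gets its own column.
df <- data.frame(
  gene      = c("A", "B", "C", "D"),
  logFC     = c(-2.0, -0.5, 0.8, 2.5),
  neg_log_p = c(4.0, 0.7, 1.2, 6.0),
  direction = c("down", "ns", "ns", "up"),
  mean_expr = c(10, 50, 200, 80)
)

# x and y are set once for the whole plot; color and size are set for the points.
p <- ggplot(df, aes(x = logFC, y = neg_log_p)) +
  geom_point(aes(color = direction, size = mean_expr))
```

If a column you need for an aesthetic does not exist yet, that tells you what to compute or reshape before you plot.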

Let's analyze this plot. (We will not discuss whether a bar plot is a good visualization today :))

It is a bar plot.
The x-axis is the metabolite abundance. The values are both positive and negative, so they could be log2 fold changes between treatment and control.
The y-axis is the name of the metabolite.
The color of each bar also maps to the log2 fold change.
There are also error bars.

Once you know this, you should get a data frame like this.

column1: metabolites

column2: log2 fold change (logFC)

column3: standard deviation (sd)
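A minimal sketch of such a data frame is below. The metabolite names and numbers are invented for illustration, so replace them with your own results. The factor step is my own optional addition: it sorts the bars by fold change.

```r
# Hypothetical values for illustration; replace with your own results.
df <- data.frame(
  metabolites = c("glucose", "lactate", "citrate", "alanine"),
  logFC       = c(1.8, -1.2, 0.6, -0.4),
  sd          = c(0.30, 0.25, 0.15, 0.10)
)

# Optional: order the metabolites by logFC so the bars appear sorted.
df$metabolites <- factor(
  df$metabolites,
  levels = df$metabolites[order(df$logFC)]
)

str(df)
```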

Then, you are ready to plot:

library(ggplot2)

ggplot(df, aes(x = metabolites, y = logFC)) + # map the x-axis to the metabolites column and the y-axis to logFC
  geom_bar(stat = "identity", aes(fill = logFC)) + # map the bar fill to logFC (color would only change the outline)
  geom_errorbar(aes(ymin = logFC - sd, ymax = logFC + sd), width = 0.2) + # add error bars of one sd on each side
  coord_flip() # flip the axes so the metabolites run along the y-axis
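If you want to get closer to a published look, one possible refinement is to sort the bars and use a diverging color scale centered at zero. This is a sketch under my own assumptions: the data values are invented, and the blue/white/red colors are a guess, not taken from the original figure.

```r
library(ggplot2)

# Hypothetical values for illustration only.
df <- data.frame(
  metabolites = c("glucose", "lactate", "citrate", "alanine"),
  logFC       = c(1.8, -1.2, 0.6, -0.4),
  sd          = c(0.30, 0.25, 0.15, 0.10)
)

p <- ggplot(df, aes(x = reorder(metabolites, logFC), y = logFC)) +
  geom_col(aes(fill = logFC)) +  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_errorbar(aes(ymin = logFC - sd, ymax = logFC + sd), width = 0.2) +
  # Diverging scale: negative values blue, zero white, positive values red.
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  coord_flip() +
  labs(x = "Metabolite", y = "log2 fold change", fill = "log2FC")
```

reorder() sorts the metabolites by logFC inside the plot call, so the data frame itself stays untouched.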

Read here for a tutorial on the error bar with ggplot2.

Another Real Example – Single-Cell RNA-seq Dot Plot

A dot plot in single-cell RNA-seq is essentially a heatmap:

Color = Expression level
Size = Proportion of cells with nonzero expression in each cluster

If using ggplot2, make sure your data is in tidy format. Use pivot_longer() and pivot_wider() to reshape your dataframe as needed.
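As a rough sketch of the ggplot2 route, assume you already have two wide tables of per-cluster summaries, one with average expression and one with the fraction of expressing cells. The table names, cluster names, and all values below are invented for illustration. Each table is reshaped with pivot_longer() and the two are joined so that every row carries both values:

```r
library(tidyr)
library(ggplot2)

# Hypothetical per-cluster summaries (genes x clusters), invented for illustration.
avg_expr <- data.frame(
  gene   = c("CD3E", "MS4A1", "LYZ"),
  T_cell = c(2.5, 0.1, 0.2),
  B_cell = c(0.1, 3.0, 0.3),
  Mono   = c(0.2, 0.1, 4.0)
)
pct_expr <- data.frame(
  gene   = c("CD3E", "MS4A1", "LYZ"),
  T_cell = c(0.90, 0.05, 0.10),
  B_cell = c(0.05, 0.85, 0.15),
  Mono   = c(0.10, 0.05, 0.95)
)

# Reshape both tables to one row per gene-cluster pair.
avg_long <- pivot_longer(avg_expr, cols = -gene, names_to = "cluster", values_to = "avg")
pct_long <- pivot_longer(pct_expr, cols = -gene, names_to = "cluster", values_to = "pct")

# Join on gene + cluster so each row has both the color and the size value.
dot_df <- merge(avg_long, pct_long, by = c("gene", "cluster"))

p <- ggplot(dot_df, aes(x = cluster, y = gene)) +
  geom_point(aes(color = avg, size = pct)) +
  scale_color_gradient(low = "grey90", high = "red")
```

Once the data is in this shape, the dot plot is just a scatter plot with two extra aesthetics.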

If using ComplexHeatmap, prepare two separate matrices:

One matrix for expression values (mapped to color)
One matrix for proportion values (mapped to dot size)
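Here is one way the two matrices could be built from cell-level data, shown on a tiny made-up example. The expression matrix, gene names, and cluster labels are all hypothetical. The ComplexHeatmap part only runs if the package is installed; it draws circles through cell_fun, which is one possible approach and not necessarily how the linked guide does it.

```r
# Hypothetical normalized expression: 2 genes x 6 cells, invented for illustration.
expr <- matrix(
  c(2, 0, 4, 0, 0, 0,
    0, 0, 0, 3, 1, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:6))
)
cluster <- c("c1", "c1", "c1", "c2", "c2", "c2")
cells_by_cluster <- split(seq_along(cluster), cluster)

# Matrix 1: mean expression per gene per cluster (mapped to color).
expr_mat <- sapply(cells_by_cluster,
                   function(idx) rowMeans(expr[, idx, drop = FALSE]))

# Matrix 2: fraction of cells with nonzero expression (mapped to dot size).
pct_mat <- sapply(cells_by_cluster,
                  function(idx) rowMeans(expr[, idx, drop = FALSE] > 0))

# Optional drawing step, skipped when the packages are not installed.
if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  col_fun <- circlize::colorRamp2(c(0, max(expr_mat)), c("grey90", "red"))
  ht <- ComplexHeatmap::Heatmap(
    expr_mat,
    name = "expression",
    col = col_fun,
    rect_gp = grid::gpar(type = "none"),  # suppress the default filled rectangles
    cell_fun = function(j, i, x, y, width, height, fill) {
      # Draw one circle per cell: area scales with pct_mat, color with expr_mat.
      grid::grid.circle(
        x = x, y = y,
        r = sqrt(pct_mat[i, j]) * min(grid::unit.c(width, height)) / 2,
        gp = grid::gpar(fill = col_fun(expr_mat[i, j]), col = NA)
      )
    }
  )
  # ComplexHeatmap::draw(ht) would render the plot.
}
```

The important part is that expr_mat and pct_mat have identical dimensions and row/column order, so index [i, j] refers to the same gene and cluster in both.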

Step 6: Learn More

For a detailed walkthrough with code, check out this guide:
🔗 Clustered Dot Plot for Single-Cell RNA-seq (the link in the blog post has a walkthrough using ggplot2)

You can also see how far you can go once you have the basic skills under your belt:

How to make a multi-group dotplot for single-cell RNAseq data

Yes, there are R packages to make such plots, but knowing that you can make any figure you want is liberating!

Key Takeaway

Before plotting, structure your data properly. That’s 90% of the work! And learn ggplot2 and ComplexHeatmap!

What’s your biggest struggle when making bioinformatics figures? Reply and let me know!

Hi! I'm Tommy Tang