From Wet Lab to Code: Why Biologists Should Embrace Data and AI (But Carefully)

Published 8 months ago • 4 min read

Hi Bioinformatics lovers,

How are you doing? It is April, but it still feels like winter here in Boston.

What warmed us up was hosting Josh Starmer at our home (you should go and check out his StatQuest YouTube videos on statistics and machine learning).

I even got his "The StatQuest Illustrated Guide to Neural Networks and AI" book signed by him! Our kids had a lot of fun playing UNO with him.

Okay, let's dive into today's topic: learning to code and use AI.

I used to be just like that confused little dog on the right of those “expectation vs. reality” memes. Biology made sense. Code? Not so much.

But here’s the truth: biology has changed. It's now a data-heavy science, and the amount of data we generate is enormous, especially with high-throughput sequencing and single-cell technologies.

The good news?
You don’t need to be a full-time bioinformatician to make sense of the data.

There are tools—R, Python, AI—that can help you explore, test hypotheses, and generate new ones, even using public datasets.

And here’s the best part: if you understand the biology, you know exactly where to look.

That gives you a major edge.
You’ll spot results that don’t make sense. You’ll debug faster. You’ll close the loop quicker.
You don’t need to rely on a computational collaborator for every little thing.

But I get it.
Coding isn’t everyone’s thing.

Still, I believe every biologist should try picking up some basic data analysis skills.

It won’t hurt. You might even enjoy it.

And if not?
That’s what collaboration is for.

Science today is a team sport.
We need each other. I’d rather work with someone who spent years honing their comp bio skills than try to DIY something outside my depth.
And trust me, I don’t see many computational biologists heading to the bench to do gel extractions either.

That brings me to something I’m really excited about: AI—especially AI for coding.

AI can make you up to 10x faster at writing code. But there’s a catch.

Here’s what I’ve learned:

1. You still need a solid foundation.
If you don’t know how things work under the hood, you’ll get stuck. Fast.

2. LLMs (like ChatGPT or Claude) can write convincing code—but they can also give you the wrong answer.
It might look right. It might run. But it might give you garbage results. That’s dangerous in science.

3. I use AI daily—and even I get burned.
Once, I asked it to generate a command after giving it the full help file. It got it wrong. Only after reading the doc myself did I spot the problem.

4. Debugging is where it breaks down.
Writing code is one thing. But when something fails—do you know how to fix it?
Do you know why it failed? Can you even ask the right question?
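
One cheap guard against the kind of mistake in point 3 is to check an AI-generated command against the help text before running it. Here is a minimal sketch in Python. The tool name `mytool`, its help text, and the generated command are all made up for illustration, and the check only catches flags that never appear in the help text. It cannot tell you whether a documented flag is being used correctly, so it does not replace reading the docs.

```python
import re
import shlex


def unknown_flags(command: str, help_text: str) -> list[str]:
    """Return flags used in `command` that never appear in `help_text`."""
    # Collect every -x / --long-option token mentioned in the help text.
    documented = set(re.findall(r"(?<![\w-])--?[A-Za-z][\w-]*", help_text))

    used = []
    for token in shlex.split(command):
        # Keep tokens that look like flags; "--opt=value" is reduced to "--opt".
        if re.fullmatch(r"--?[A-Za-z][\w-]*(=.*)?", token):
            used.append(token.split("=", 1)[0])

    return [flag for flag in used if flag not in documented]


# Hypothetical help text and AI-generated command (not a real tool).
help_text = """
usage: mytool [options] <in.bam>
  -o, --output FILE   write results to FILE
  -t, --threads INT   number of threads
  --min-mapq INT      skip reads below this mapping quality
"""
generated = "mytool --threads 4 --min-quality 30 -o out.txt in.bam"

print(unknown_flags(generated, help_text))  # ['--min-quality']
```

In this made-up case the command looks plausible, but `--min-quality` is not an option the tool documents. Note that the sketch treats combined short flags (like `-xvf`) as a single unknown flag, so expect some false alarms with tools that allow them.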

A story from my own learning curve

Last winter, I spent 40+ hours building a web app on Replit. It ran R and Python code in the browser. Super cool.

But then came the database setup:
User registration, authentication—total mess.
The LLM kept messing up logic. I eventually got it to work, but when I wanted to add a new feature… it broke again. And I didn’t know how to fix it.

Why?
Because I’m not a trained software engineer.
I didn’t understand the internals.

Same applies in bioinformatics.
Sure, you can copy-paste code from ChatGPT. But when it fails?
When it runs but gives you bad science?

That’s when your skills matter.

AI can make you faster, but not smarter—unless you use it to learn.

Don’t let AI replace understanding. Use it to accelerate your learning.

Key Takeaways

Use AI to speed up learning
Read the docs (seriously)
Learn the fundamentals
Don’t trust AI blindly
Ask why something works, not just how

Yes, I used AI to help write this email. But I knew what I wanted to say—and I made sure it said it right.

I hope you found this useful.

Let’s make science smarter. Together.

Happy Learning!

Tommy aka crazyhottommy

Want more tips like this? Just hit reply and tell me what you’re working on. I read every message.

Getting emails like this keeps me going!

Other posts that you may find helpful

What are you willing to endure for your dreams?
Two tools for plotting genomics data
Just finished recording another video for reproducing a genomics paper figure
single-cell data integration best practice
If PCA and VAE confuse you, you're not alone. what the heck is latent space?
Want to filter one file based on another in UNIX?, this one-liner does that.
🧵“What cutoff should I use for p-value, log2FC, or mito content?”
Terminal Genome Viewer https://github.com/zeqianli/tgv Another one that is around for a while: ASCIIGenome https://github.com/dariober/ASCIIGenome
🧵 Heatmaps are everywhere in bioinformatics. But most people get one critical thing wrong: the color map.
🧵 How to sanity check whether your 10X cell barcodes are from v2 or v3 chemistry, this unix one-liner does that.
🧵 Unlocking the true potential of AI and bioinformatics hinges on one missing link: real-time biological data.
🧵 The Myth of Push-Button Bioinformatics
Mastering sed for Bioinformatics: Handling Paths Like a Pro
🧵 Should you integrate single-cell RNA-seq datasets or not? You've got PBMCs from multiple donors. Merge them—or keep them separate? Let's break it down.

PS:

If you want to learn Bioinformatics, there are other ways that I can help:

My free YouTube channel, Chatomics. Make sure you subscribe to it.
I have many resources collected on my GitHub here.
I have been writing blog posts for over 10 years at https://divingintogeneticsandgenomics.com/

Stay awesome!
