Chatomics! — The Bioinformatics Newsletter

Why Deep Learning Isn’t Always the Answer in Bioinformatics


Hello Bioinformatics lovers,

Tommy here. Summer is over in Boston. I love the chilly wind, and I can already smell the Fall.

What's your current challenge in learning bioinformatics? Are you overwhelmed by the LLM and deep learning papers?

Think you need deep learning for every bioinformatics problem? Think again.

Most of the time, simpler models not only hold their ground—they win.

Structured biological data like gene expression tables, clinical metadata, or text rarely need deep neural nets.

Linear regression, random forests, and XGBoost often do the job better.

They predict outcomes. They prioritize features. They explain why a gene matters.

Here’s why simpler works (see also Occam's razor):

  • Biological datasets are usually small. Deep learning thrives on massive data. Without it, overfitting is hard to avoid.
  • Simpler models are interpretable. You can see which genes or mutations drive predictions. Deep nets? Too often a black box.
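
To see the small-data problem in action, here is a minimal sketch using simulated noise. It assumes 20 training samples and 200 features, with an outcome that has nothing to do with any feature. A plain least-squares fit stands in for any model with more free parameters than samples (it is not a deep net), and all the numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Far more features than training samples, as in many omics datasets.
n_train, n_test, n_features = 20, 1000, 200

# Pure noise: the outcome has no real relationship to any feature.
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, n_features))
y_test = rng.normal(size=n_test)

# Minimum-norm least-squares fit. With 200 free weights and only 20 samples,
# the model can reproduce the training outcomes almost exactly.
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_mse = float(np.mean((X_train @ weights - y_train) ** 2))
test_mse = float(np.mean((X_test @ weights - y_test) ** 2))

print(f"train MSE: {train_mse:.2e}")  # essentially zero: the noise was memorized
print(f"test MSE:  {test_mse:.2f}")   # no better than guessing on new samples
```

A near-perfect training fit alongside a poor test fit is the signature of overfitting. That is why held-out or cross-validated performance is the number to trust.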

When deep learning shines: Images. Microscopy, histopathology, radiology—CNNs detect patterns and spatial structure automatically. That’s their home turf.

The tradeoffs are clear:

  • Simple models → easy to tune, fast to train, interpretable
  • Deep learning → powerful, but data-hungry, hard to debug, slower to deploy

A real example: Predicting patient response from RNA-seq? A random forest often matches a deep net in accuracy—and beats it on clarity.
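
As a rough illustration of the "clarity" part, here is a sketch with scikit-learn on simulated data. The expression matrix, the gene names (`gene_0`, `gene_1`, ...), and the rule that only two genes drive response are all assumptions made up for the demo, not real RNA-seq results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_patients, n_genes = 200, 50
gene_names = [f"gene_{i}" for i in range(n_genes)]  # hypothetical gene names

# Simulated, already-normalized expression matrix (patients x genes).
X = rng.normal(size=(n_patients, n_genes))

# Assumption for the demo: response depends only on gene_0 and gene_1.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=300, random_state=0)

# 5-fold cross-validated accuracy, estimated on held-out patients.
cv_accuracy = cross_val_score(forest, X, y, cv=5, scoring="accuracy").mean()

# Fit on all patients, then rank genes by impurity-based importance.
forest.fit(X, y)
ranked = np.argsort(forest.feature_importances_)[::-1]
top_genes = [gene_names[i] for i in ranked[:2]]

print(f"cross-validated accuracy: {cv_accuracy:.2f}")
print("top genes by importance:", top_genes)
```

On real data, treat the importance ranking as a list of candidates to follow up, not as proof. Impurity-based importances can be skewed by correlated genes.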

A recent paper in Nature Methods makes the same point: “Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines.”
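
To make "simple baseline" concrete, here is a sketch of two very simple baselines for predicting a double perturbation. One adds up the log fold changes (LFCs) seen for each single perturbation, and the other predicts no change from control. The LFC values are made up, and this illustrates the idea only. It is not the paper's code or data.

```python
import numpy as np

# Hypothetical LFCs (vs. control) for 4 genes after each single perturbation.
lfc_a = np.array([1.0, -0.5, 0.0, 0.2])
lfc_b = np.array([0.3, 0.0, -1.0, 0.1])

# Additive baseline: predict the double perturbation A+B as the sum of single effects.
pred_additive = lfc_a + lfc_b

# "No change" baseline: predict that expression stays at the control level (LFC = 0).
pred_no_change = np.zeros_like(lfc_a)

# Made-up observed LFCs for the double perturbation, used only for scoring.
observed = np.array([1.2, -0.4, -0.9, 0.4])


def rmse(pred, truth):
    """Root mean squared error between predicted and observed LFCs."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


additive_rmse = rmse(pred_additive, observed)
no_change_rmse = rmse(pred_no_change, observed)

print(f"additive baseline RMSE:  {additive_rmse:.3f}")
print(f"no-change baseline RMSE: {no_change_rmse:.3f}")
```

Any fancier model should have to beat numbers like these before it earns a place in your pipeline.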

Key takeaway: Choose the right tool, not the fanciest one.

  • Structured data → regression, random forest, XGBoost
  • Images → deep learning

Most of the time, simple > sophisticated.

Action item for you: Next time you face a dataset, start simple. Build a baseline model. Understand it deeply. Only escalate if needed.
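
One possible way to "start simple" with scikit-learn is sketched below. It compares a majority-class guess against a scaled logistic regression using the same cross-validation. The dataset comes from `make_classification`, and its size and settings are arbitrary choices for the demo. Swap in your own feature matrix and labels.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated stand-in for a small tabular dataset (150 samples, 30 features).
X, y = make_classification(
    n_samples=150,
    n_features=30,
    n_informative=5,
    n_clusters_per_class=1,
    class_sep=1.5,
    random_state=0,
)

# Floor: always predict the most frequent class.
floor = DummyClassifier(strategy="most_frequent")

# Simple baseline: standardize features, then fit a logistic regression.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

floor_acc = cross_val_score(floor, X, y, cv=5).mean()
baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()

print(f"majority-class accuracy:      {floor_acc:.2f}")
print(f"logistic regression accuracy: {baseline_acc:.2f}")
```

If a more complex model cannot clearly beat the simple baseline under the same cross-validation, the extra complexity is probably not paying for itself.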

Bonus reading: Why do tree-based models still outperform deep learning on tabular data? https://arxiv.org/abs/2207.08815

Other posts that you may find useful

  1. A new Chatomics! blog post on reproducible bioinformatics.
  2. “How on earth did they make that?” Let me break it down.
  3. Free online ebook: R and Python comparisons side by side
  4. A single job opening receives >1,000 applications (I am not kidding). How to stand out? (The advice only works for some people: people who take action.)
  5. scplotter provides a set of functions to visualize single-cell sequencing data in an easy and efficient way
  6. Learning bioinformatics at home: I started this repo 5 years ago at Harvard FAS Informatics.
  7. You can save millions of dollars for your organization if everyone follows best practices when filling in a spreadsheet.
  8. Bioinformatics data is messy. Here’s the nightmare that almost broke me 🧵👇
  9. "AI can now write code. Does that mean bioinformaticians can stop learning to code? Short answer: No. Long answer: Let me explain. 🧵"
  10. Tutorials lie. They hand you clean data. But real bioinformatics? It’s chaos. And unless you master that chaos, you’ll sink. 🧵
  11. Bioinformatics isn’t just running pipelines. The hardest part? Knowing when not to code. A thread on why context comes first 🧵
  12. Bioinformatics doesn’t break you with code. It breaks you with the invisible details in data. That’s where most people get lost. 🧵

Happy Learning!

Tommy aka crazyhottommy

PS:

If you want to learn Bioinformatics, there are other ways that I can help:

  1. My free YouTube channel, Chatomics. Make sure you subscribe to it.
  2. I have many resources collected on my GitHub here.
  3. I have been writing blog posts for over 12 years: https://divingintogeneticsandgenomics.com/

Stay awesome!

Why Subscribe?

  ✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube
  ✅ No fluff, just deep insights and working code examples
  ✅ Trusted by grad students, postdocs, and biotech professionals
  ✅ 100% free