Your Machine Learning model learned the noise

Published 2 months ago • 1 min read

Hello Bioinformatics lovers,

Tommy here. Two months of 2026 have already passed! Are you learning new skills?

Today, we will talk about ML.
Your model hit 99% accuracy. Now ask: what did it actually learn?

High accuracy doesn't mean your model learned biology.

It might have learned your batch effects, your sequencing center, your hospital's imaging artifacts.

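Here's a famous example from the LIME paper (Ribeiro et al., 2016). The researchers trained a model to classify wolves vs. huskies, using training images in which the wolves were deliberately shown against snow. High accuracy.

Then they used LIME to explain its predictions and showed that the model was classifying snow in the background, not the animals. Snow = wolf. No snow = husky.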

This happens in our field constantly. Three examples:

COVID-19 X-ray classifiers learned hospital-specific artifacts — image contrast, patient positioning (AP vs. PA), even source hospital metadata. High accuracy. Wrong reasons.

Deep learning for cancer prognosis on TCGA. Howard, Kather & Pearson (Cancer Cell, 2023) showed that without site-preserved cross-validation, models trained on TCGA histology data can learn to infer the submitting institution rather than tumor biology — inflating accuracy that won't replicate elsewhere.
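To make "site-preserved" concrete, here is a minimal sketch of the core idea: every submitting site lands in exactly one fold, so no site is ever split between training and testing. This is not the authors' implementation (it does not balance outcomes across folds), and the function names and the site labels A to D are made up for illustration. If you use scikit-learn, GroupKFold gives you the same kind of grouping.

```python
from collections import defaultdict


def site_preserved_folds(sites, n_folds):
    """Assign sample indices to folds so each site lands in exactly one fold.

    Greedy balancing: largest sites first, each into the currently smallest fold.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)
    if n_folds > len(by_site):
        raise ValueError("need at least as many sites as folds")

    folds = [[] for _ in range(n_folds)]
    # Sort by size (descending), then by name so the result is deterministic.
    ordered = sorted(by_site.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _site, members in ordered:
        min(folds, key=len).extend(members)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


# Hypothetical site label for each of 8 samples.
sites = ["A", "A", "A", "B", "B", "C", "C", "D"]
folds = site_preserved_folds(sites, n_folds=2)

for train_idx, test_idx in train_test_splits(folds):
    train_sites = sorted({sites[i] for i in train_idx})
    test_sites = sorted({sites[i] for i in test_idx})
    print("train sites:", train_sites, "| test sites:", test_sites)
```

If accuracy drops sharply when you switch from random folds to folds like these, the gap is a rough estimate of how much your model was leaning on site identity.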

TCGA germline exome data (Rasnic et al., BMC Cancer, 2019) carries severe batch effects by sequencing center. Up to 30% variability in called germline variants — including in BRCA1 and KRAS. A model trained without batch correction is learning Broad vs. Baylor, not cancer biology.

The pattern is always the same: your model finds the easiest signal that predicts your labels. If technical variables correlate with your outcome, it will find that shortcut before it finds the biology.
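A toy illustration of that shortcut, with entirely made-up numbers: the "model" below is just a one-feature rule that keeps whichever feature best matches the labels in training. In the training data a hypothetical site_flag matches the label perfectly, while a hypothetical biology_marker matches it only 80% of the time. In the external data the site flag is unrelated to the label.

```python
# Each sample is (biology_marker, site_flag, label). All values are invented.
FEATURES = ["biology_marker", "site_flag"]
LABEL = 2  # index of the label within each tuple

train = (
    [(1, 1, 1)] * 8 + [(0, 1, 1)] * 2    # cases: all collected at site 1
    + [(0, 0, 0)] * 8 + [(1, 0, 0)] * 2  # controls: all collected at site 0
)

external = (
    [(1, 1, 1)] * 4 + [(1, 0, 1)] * 4 + [(0, 1, 1)] + [(0, 0, 1)]    # cases
    + [(0, 1, 0)] * 4 + [(0, 0, 0)] * 4 + [(1, 1, 0)] + [(1, 0, 0)]  # controls
)


def accuracy(rows, feature_idx):
    """Accuracy of the rule 'predict label = value of this feature'."""
    hits = sum(1 for row in rows if row[feature_idx] == row[LABEL])
    return hits / len(rows)


def fit_one_feature_rule(rows):
    """Pick the single feature with the highest training accuracy."""
    return max(range(len(FEATURES)), key=lambda i: accuracy(rows, i))


chosen = fit_one_feature_rule(train)
train_acc = accuracy(train, chosen)
external_acc = accuracy(external, chosen)
biology_external_acc = accuracy(external, FEATURES.index("biology_marker"))

print("rule uses:", FEATURES[chosen])
print("train accuracy:", train_acc)
print("external accuracy:", external_acc)
print("biology marker on external data:", biology_external_acc)
```

The rule picks the site flag because it is the easiest signal in training, then falls to chance on the external data, where the weaker but real biology marker would have held up.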

Four ways to protect yourself:

1. Run SHAP, LIME, or GradCAM. Find out what your model is actually paying attention to.
2. Check for correlations between your labels and technical variables (batch, sequencing center, hospital) before you train.
3. Validate on a genuinely independent dataset: different cohort, different institution.
4. Be suspicious of suspiciously good results. That 15-point benchmark improvement is a hypothesis, not a discovery.
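The label vs. technical-variable check can be done before any training. Below is a minimal sketch that cross-tabulates labels against a batch variable and computes Cramér's V (0 means no association, 1 means the batch fully determines the label). The sample labels and center names are hypothetical, this is the plain uncorrected statistic, and it is only one of several reasonable ways to run this check.

```python
from collections import Counter
from math import sqrt


def cramers_v(labels, batches):
    """Cramér's V between two categorical variables (no bias correction)."""
    n = len(labels)
    joint = Counter(zip(labels, batches))
    label_totals = Counter(labels)
    batch_totals = Counter(batches)

    # Pearson chi-square statistic over the full contingency table.
    chi2 = 0.0
    for lab, n_lab in label_totals.items():
        for bat, n_bat in batch_totals.items():
            expected = n_lab * n_bat / n
            observed = joint.get((lab, bat), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(label_totals), len(batch_totals)) - 1
    if k == 0:
        return 0.0  # one variable is constant, so there is nothing to associate
    return sqrt(chi2 / (n * k))


labels = ["tumor"] * 10 + ["normal"] * 10

# Confounded design: every tumor came from one center, every normal from another.
confounded_batches = ["center_1"] * 10 + ["center_2"] * 10
# Balanced design: each center contributed half of each class.
balanced_batches = (["center_1"] * 5 + ["center_2"] * 5) * 2

v_confounded = cramers_v(labels, confounded_batches)
v_balanced = cramers_v(labels, balanced_batches)

print("confounded design:", v_confounded)
print("balanced design:", v_balanced)
```

A value close to 1 means a model could reach high accuracy from the batch variable alone, so fix the design or the splits before you trust any accuracy number.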

Be suspicious when it is too good to be true

Accuracy is easy. Getting the right answer for the right reasons is the hard part.

Have you run into this in your own work?

Pro tip: the next time you evaluate your ML model, feed this paper (Multimodal deep learning: An improvement in prognostication or a reflection of batch effect?) into Claude Code and ask it to examine your model accordingly.

PS:

If you want to learn Bioinformatics, there are four ways that I can help:

1. My free YouTube channel, Chatomics. Make sure you subscribe to it.
2. I have many resources collected on my GitHub here.
3. I have been writing blog posts for over 10 years: https://divingintogeneticsandgenomics.com/
4. My daily LinkedIn posts: https://www.linkedin.com/in/%F0%9F%8E%AF-ming-tommy-tang-40650014/recent-activity/all/ You will find nuggets there!

Stay awesome!

PPS, get your privacy back.

Chatomics! — The Bioinformatics Newsletter
