How Sample Naming Wastes Millions—and What Bioinformatics Can Do About It

Published about 2 months ago • 2 min read

Hello Bioinformatics lovers,

Tommy here. I was interviewed by Nature yesterday about best practices for using spreadsheets and naming files.

P.S.: This is a must-read for any scientist (wet lab or dry lab): https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989

Sample naming turns out to be a big problem for bioinformaticians.

How many hours do bioinformaticians lose matching sample IDs across assays?
Too many.
And every one of them is avoidable.

Let’s talk about why this keeps happening—and how we can stop it.

The Chaos Begins with a Name

The same two cell lines, labeled five different ways:

MCF7
MCF-7
MDA-MB-361
MDA_MB_361
MDAMB361

And that’s within the same team.
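
If you are stuck with names like these today, a rough first pass is to strip punctuation and case before comparing. Here is a minimal Python sketch. The function name normalize_cell_line is made up for illustration, and the heuristic is a stopgap rather than a fix: two genuinely different lines whose names differ only by punctuation would be merged, so treat the output as candidates for a human to confirm.

```python
import re


def normalize_cell_line(name: str) -> str:
    """Drop everything except letters and digits, then uppercase.

    Hypothetical helper: a heuristic for spotting spelling variants,
    not an authoritative cell line identifier.
    """
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()


variants = ["MCF7", "MCF-7", "MDA-MB-361", "MDA_MB_361", "MDAMB361"]

# Group the raw labels by their normalized form.
groups = {}
for label in variants:
    groups.setdefault(normalize_cell_line(label), []).append(label)

for key, labels in groups.items():
    print(key, labels)
# MCF7 ['MCF7', 'MCF-7']
# MDAMB361 ['MDA-MB-361', 'MDA_MB_361', 'MDAMB361']
```

Five labels collapse to two groups, which is at least a shorter list to check by hand.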

Then Come the Timepoints

Add time, treatment, or dose… and now you’ve got:

MCF7_24h
MCF7_48h
MCF7_resistant
MCF7_combo_dose2

And this is just one dataset.

The Multi-Omics Nightmare

Team A runs RNA-seq.
Team B runs proteomics.
Each with their own naming system.
No coordination. No UUIDs.

Then someone says:
“Let’s integrate the data!”
You sigh—and open a spreadsheet that will ruin your evening.

Sound Familiar?

You spend 3 hours cleaning it.
You're still unsure if MCF7_24h_A is the same as MCF-7_T24.
Then you find out…
The same sample was sequenced twice, under two different names, in the same assay.

No one noticed.
Resources wasted.
Trust gone.

This Isn’t Just a Small Lab Problem

Even large-scale efforts like TCGA, pharma pipelines, and multimillion-dollar projects suffer from mislabeling and duplication.

Why?
Because data pipelines are only as good as the names they start with.

The Fix Is Boring—but Powerful

Assign a UUID (universally unique identifier) to every biospecimen
Keep a centralized sample registry
Enforce naming conventions
Track samples robustly
Keep the wet lab and data teams talking
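
To make the first two items concrete, here is a minimal sketch of a registry in Python. Everything in it is an assumption for illustration: the SampleRegistry class, the idea of keying on a tube barcode such as "TUBE-0001", and the claim that the two labels below belong to the same tube. A real registry would live in a database or LIMS, not in memory. The only library call used is uuid.uuid4() from the Python standard library.

```python
import uuid


class SampleRegistry:
    """In-memory stand-in for a central sample registry (hypothetical)."""

    def __init__(self):
        self._id_by_specimen = {}  # physical specimen key -> UUID string
        self._id_by_alias = {}     # any team-specific label -> UUID string

    def register(self, specimen_key: str, alias: str) -> str:
        """Attach a team's label to a specimen and return the specimen's UUID."""
        known = self._id_by_alias.get(alias)
        sample_id = self._id_by_specimen.get(specimen_key)
        # Refuse to let one label point at two different specimens.
        if known is not None and known != sample_id:
            raise ValueError(f"alias {alias!r} already points to another specimen")
        if sample_id is None:
            # First time we see this specimen: mint a random (version 4) UUID.
            sample_id = str(uuid.uuid4())
            self._id_by_specimen[specimen_key] = sample_id
        self._id_by_alias[alias] = sample_id
        return sample_id

    def resolve(self, alias: str) -> str:
        """Look up the UUID behind a team-specific label."""
        return self._id_by_alias[alias]

    def aliases(self, sample_id: str) -> list:
        """List every label that has been attached to a UUID."""
        return sorted(a for a, i in self._id_by_alias.items() if i == sample_id)


registry = SampleRegistry()
# Assumption for the example: both teams labeled the same physical tube.
rna_id = registry.register("TUBE-0001", "MCF7_24h_A")
prot_id = registry.register("TUBE-0001", "MCF-7_T24")

print(rna_id == prot_id)         # True
print(registry.aliases(rna_id))  # ['MCF-7_T24', 'MCF7_24h_A']
```

The design choice worth noting: teams keep their own labels, but every label resolves to one ID, so a sample submitted twice under two names shows up as one UUID with two aliases.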

This isn’t rocket science.
It’s just science done right.

In a Small Company?

You can fix this in a day.

In a Large Company?

It’s harder.
Siloed teams. Legacy systems. Lack of communication.

But fixing it can save millions per year.
And preserve trust in your data.

Final Thought:

Bioinformatics isn’t janitorial work.
It’s science.
But right now, we're wasting time cleaning up names like MCF7, MCF-7, and MCF_7.

That’s not where our talent belongs.

Takeaways:

Bad naming = wasted hours and broken analyses
Use UUIDs to align assays across teams
Communication is infrastructure
Let bioinformaticians do real science, not name-guessing
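
Here is what "align assays across teams" can look like once every sample carries a UUID. This is a toy Python sketch, not a pipeline: the UUIDs are generated on the spot, and the label "MCF-7_T48" and the pairing of labels to IDs are invented for illustration. With a shared key, matching samples becomes plain set arithmetic instead of name-guessing.

```python
import uuid

# Three made-up sample IDs standing in for registry-issued UUIDs.
a, b, c = (str(uuid.uuid4()) for _ in range(3))

# Each team keeps its own label, keyed by the shared UUID.
rnaseq = {a: "MCF7_24h_A", b: "MCF7_48h"}
proteomics = {b: "MCF-7_T48", c: "MCF7_resistant"}

# dict.keys() supports set operations directly.
shared = rnaseq.keys() & proteomics.keys()
only_rnaseq = rnaseq.keys() - proteomics.keys()
only_proteomics = proteomics.keys() - rnaseq.keys()

for sample_id in sorted(shared):
    # Same specimen, two labels, no guessing required.
    print(sample_id, rnaseq[sample_id], proteomics[sample_id])

print(len(shared), len(only_rnaseq), len(only_proteomics))  # 1 1 1
```

The samples missing from one assay are listed explicitly, which is usually the first question someone asks after "Let's integrate the data!"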

If you're spending time cleaning up naming errors, it's not your fault.
But it is time to demand better.

Start with one rule:
Every sample gets a universal name.

Chatomics! — The Bioinformatics Newsletter