Why Subscribe?
✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube
✅ No fluff, just deep insights and working code examples
✅ Trusted by grad students, postdocs, and biotech professionals
✅ 100% free
Hello Bioinformatics lovers, Tommy here. It’s already mid-May 2026! Are you overwhelmed by all the AI advances? openclaw, hermes, paperclip, and so on (if you don’t know them, google or ChatGPT them). But let’s work on some bioinformatics fundamentals first.

MCF7. MCF-7. MDA-MB-361. MDA_MB_361. MDAMB361. That’s two cell lines, one team, five names. Now add timepoints and conditions: MCF7_24h, MCF7_48h, MCF7_resistant, MCF7_combo_dose2. And you haven’t even opened the second dataset yet.

This is the bioinformatics tax nobody puts in a grant budget, and it’s bigger than you think.

The hidden cost

You spend three hours cleaning a sample sheet and still get it wrong. RNA-seq data comes from Team A with one naming logic, proteomics from Team B with another, and someone in the kickoff meeting says “let’s do multi-omics.”

The worst version of this: a sample gets sequenced twice. Same cell line, different name, and nobody catches it. Reagents burned. Sequencing slots burned. And once you find one duplicate, you start wondering what else is wrong. That is the real damage: you lose trust in the data.

This isn’t a small-lab problem. The NCI Genomic Data Commons built its entire biospecimen model around UUIDs precisely because TCGA-scale projects can’t survive on team-invented IDs. ISBER’s biobank Best Practices and the ISBT 128 standard exist for the same reason.

What actually works

Assign a UUID up front, at biospecimen registration, before any assay touches the sample. Link the UUID to the specimen’s source, metadata, and assay history, and make every downstream file inherit it. Concrete practices that hold up:
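To make the registration idea concrete, here is a minimal Python sketch. It is not the GDC data model or any particular LIMS: the `Registry` class, `normalize_name`, and the `<uuid>_<assay>.<ext>` filename pattern are hypothetical choices for illustration. It mints a UUID at registration, records assay history against that UUID, flags a likely alias (same name up to case and separators, same metadata) before a second UUID is minted, and derives downstream filenames from the UUID.

```python
# Hypothetical sketch: Registry, normalize_name, and the filename pattern are
# illustrative assumptions, not part of any real LIMS or of the GDC model.
import re
import uuid
from dataclasses import dataclass, field


def normalize_name(name: str) -> str:
    """Uppercase and drop separators, so 'MCF-7' and 'mcf_7' both become 'MCF7'."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


class DuplicateSampleError(ValueError):
    """Raised when a registration looks like an already-registered specimen."""


@dataclass
class Biospecimen:
    sample_id: str                               # UUID assigned once, at registration
    source_name: str                             # name exactly as the submitting team wrote it
    metadata: dict                               # e.g. timepoint, treatment, passage
    assays: list = field(default_factory=list)   # assay history, appended over time


class Registry:
    """Hypothetical in-memory registry; in practice this would be a LIMS or database."""

    def __init__(self) -> None:
        self._by_id: dict[str, Biospecimen] = {}
        self._by_key: dict[tuple, str] = {}

    def register(self, source_name: str, **metadata: str) -> Biospecimen:
        # Alias check: normalized name plus metadata. A spelling difference alone
        # should not produce a second UUID for the same specimen.
        key = (normalize_name(source_name), tuple(sorted(metadata.items())))
        if key in self._by_key:
            raise DuplicateSampleError(
                f"{source_name!r} {metadata} looks like existing sample {self._by_key[key]}"
            )
        spec = Biospecimen(str(uuid.uuid4()), source_name, dict(metadata))
        self._by_id[spec.sample_id] = spec
        self._by_key[key] = spec.sample_id
        return spec

    def log_assay(self, sample_id: str, assay: str) -> None:
        # Every assay is recorded against the UUID, never against the free-text name.
        self._by_id[sample_id].assays.append(assay)

    def output_name(self, sample_id: str, assay: str, ext: str) -> str:
        # Downstream files inherit the UUID, e.g. "<uuid>_rnaseq.bam".
        if sample_id not in self._by_id:
            raise KeyError(f"unregistered sample {sample_id}")
        return f"{sample_id}_{assay}.{ext}"
```

With this sketch, registering `"MDA_MB_361"` with the same timepoint as an existing `"MDA-MB-361"` raises instead of silently minting a second UUID, while MCF7 at 24h and MCF7 at 48h stay separate because their metadata differs. Name normalization is only a heuristic, so a hit should prompt a human check rather than count as proof of a duplicate; the UUID, not the name, remains the identifier.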
In a small org, this is a Friday afternoon. In a big org, it’s a years-long fight against legacy systems and team silos. It’s still worth the fight, because the alternative is a senior scientist matching

The takeaway

Sample tracking isn’t janitorial. It’s the substrate everything else sits on: your DE analysis, your multi-omics integration, your clinical correlations, your reproducibility. Get the IDs right and the rest of the stack stops lying to you. If your team doesn’t have a UUID policy, that’s the first ticket to file Monday morning.

Happy Learning!
Tommy aka crazyhottommy

What’s the worst sample-naming story you’ve inherited? Hit reply; I read every one.

PS: If you want to learn bioinformatics, there are four ways I can help:
Stay awesome!

PPS: My Nextflow Summit talk on Reproducible Bioinformatics is now on YouTube: https://www.youtube.com/watch?v=c9QJm8N67BAT
The slides can be found here: https://divingintogeneticsandgenomics.com/talk/2026-nextflow-summit-boston/