Why Subscribe?
✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube
✅ No fluff, just deep insights and working code examples
✅ Trusted by grad students, postdocs, and biotech professionals
✅ 100% free
Hello Bioinformatics lovers,

Tommy here.

I made a 40-minute video to show you how to do RNAseq analysis end-to-end. Watch it here!

I was recently interviewed by Pure Storage: Data Cleaning ‘Janitorial Work’ is Key to Unlocking Life Sciences Breakthroughs

Today, we will talk about single-cell cell type annotation.

Your model might be accurate, but is it biologically meaningful?

Everyone’s benchmarking single-cell annotation models these days. You train on a million cells. You annotate a new dataset. Everything looks great.

But here’s the uncomfortable truth: your prediction is only as good as your reference.

What’s the ground truth, really?

We don’t even have a universal definition of a “cell type.” It’s a human-imposed label, a convenient shorthand. In reality, many cells exist on a continuum, not in discrete boxes.

Take CD8 T cells, for example. They behave differently in healthy tissues than in tumors. You’ll find states like:
Each state has unique transcriptional signatures and biological implications.

Here’s the issue: if your model is trained on millions of cells without state-level annotation, it can’t predict these nuanced states in a new dataset. And those states are exactly what matter in biology. They drive immune responses, therapy outcomes, and disease progression.

📖 Example:

So what’s the point of a model that just says “CD4” or “CD8”?

If your predictions stop at the broad categories, you’re missing the biology that truly matters. Instead, we should be building models that understand states, not just types.

I highly recommend exploring ProjecTILs, a tool that does this elegantly.

Further reading

If you’re serious about rethinking how we define cell identity, start here:
The takeaway

We build models to understand biology, not to show off scale. If your model doesn’t bring new biological insight, it doesn’t matter how many cells it was trained on.

Deep domain knowledge isn’t optional. It’s what separates real understanding from mere computation.

Happy Learning!

Tommy aka crazyhottommy

PS: If you want to learn Bioinformatics, there are other ways that I can help:
Stay awesome!