
Chatomics! — The Bioinformatics Newsletter

Every common stat test is the same test


Hello Bioinformatics lovers,

Tommy here. I drove back from Rutgers University last night after a panel discussion on AI and bioinformatics.

I am writing this in the early morning. :)

We talked a lot about how AI can transform the drug development process, and I see big potential there too.

btw, the campus is beautiful.

But I then thought: how about the foundations, the basic statistics?

Those are still the important things to learn, especially in the age of AI.

Let's talk about it.

In grad school, I memorized a flowchart. Two groups? t-test. More than two? ANOVA. Non-normal? Mann-Whitney.

Each test felt like a separate tool. Turns out, most of the common ones are the same thing — special cases of y = b0 + b1*x.

Jonas Lindeløv mapped this on one page: "Common statistical tests are linear models." One printable cheat sheet.

I wish someone had handed me this instead of that decision tree.
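You can verify the equivalence yourself in a few lines of R. This is a minimal sketch on simulated data (the numbers are made up for illustration); the key detail is var.equal = TRUE, which makes the classic t-test match the linear model exactly:

```r
# Simulated two-group data; values are illustrative only
set.seed(1)
y <- c(rnorm(20, mean = 5), rnorm(20, mean = 6))
group <- factor(rep(c("A", "B"), each = 20))

# Classic two-sample t-test (equal variances, so it matches lm exactly)
t_res <- t.test(y ~ group, var.equal = TRUE)

# The same comparison as a linear model: y = b0 + b1*x
lm_res <- summary(lm(y ~ group))

# The two p-values are identical
t_res$p.value
lm_res$coefficients["groupB", "Pr(>|t|)"]
```

The dummy-coded group variable is the x in y = b0 + b1*x: b0 is the mean of group A, and b1 is the difference between groups, which is exactly what the t-test tests.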

"Non-parametric" tests aren't magic

The common ones, Mann-Whitney and Wilcoxon, are close approximations of parametric tests run on rank-transformed data. Ranking helps with outliers and skew. No assumption-free magic happening.
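A small R sketch of that approximation, on simulated skewed data (again, the numbers are made up). The p-values are close, not identical, which is the point:

```r
# Skewed, simulated data: two exponential groups with different means
set.seed(42)
y <- c(rexp(30, rate = 1), rexp(30, rate = 0.5))
group <- factor(rep(c("A", "B"), each = 30))

# Mann-Whitney / Wilcoxon rank-sum test
p_wilcox <- wilcox.test(y ~ group)$p.value

# Approximately the same test: a linear model on the ranks of y
p_lm <- summary(lm(rank(y) ~ group))$coefficients["groupB", "Pr(>|t|)"]

p_wilcox
p_lm
```

Per Lindeløv, the approximation is good for moderate sample sizes and degrades with small n or many ties, which is exactly the caveat below.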

A caveat though. When I posted this on LinkedIn, a commenter pushed back, and they were right.

There are hundreds of non-parametric tests (Brunner-Munzel, ART-ANOVA, Fligner-Policello, quantile regression) that don't reduce to ranked linear models.

Even the common ones are approximations, not exact equivalences — they can break down with tied observations or small samples. Lindeløv himself says so.

But thinking of tests as linear models is transformative, and it helped me learn the common tests much better.

Right framing: for the tests you use in a typical bioinformatics workflow, the linear model connection is real and useful. Not a universal law.

Where it gets practical

I wrote a blog post applying this to real data: Partial correlation controlling cancer types.

cor.test() and lm() give the same p-value. Scale both variables, and the regression coefficient equals the correlation coefficient.

Same model. But lm() lets you add covariates, which cor.test() can't.
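Here's that equivalence on simulated data (a minimal sketch; the variables are made up, not the DepMap data from the post):

```r
# Two correlated variables, simulated for illustration
set.seed(7)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Correlation test
ct <- cor.test(x, y)

# Same p-value from a linear model
fit <- summary(lm(y ~ x))
fit$coefficients["x", "Pr(>|t|)"]  # identical to ct$p.value

# Scale both variables: the slope equals the correlation coefficient
fit_scaled <- lm(scale(y) ~ scale(x))
coef(fit_scaled)[2]  # equals ct$estimate
```

Scaling works because a regression slope is r * sd(y)/sd(x); standardize both variables and the slope collapses to r itself.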

I looked at CRISPR dependency scores from DepMap for FOXA1 and ESR1. Raw correlation across all cell lines: 0.38.

Breast cancer only: 0.52. Partial correlation controlling for cancer type: 0.30.

No new test — just add a term to the model.
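A toy sketch of that move in R. The FOXA1/ESR1 numbers above come from the actual DepMap analysis in the blog post; this uses simulated data with a made-up cancer-type confounder just to show the mechanics:

```r
# Simulated data: three cancer types shift both genes' dependency scores,
# inflating the raw correlation (all values are illustrative)
set.seed(3)
cancer_type <- factor(rep(c("breast", "lung", "colon"), each = 30))
type_effect <- rep(c(2, 0, -2), each = 30)
foxa1 <- type_effect + rnorm(90)
esr1  <- type_effect + 0.3 * foxa1 + rnorm(90)

# Raw correlation is inflated by the shared cancer-type effect
cor(foxa1, esr1)

# Partial correlation: regress cancer type out of both genes
# with lm(), then correlate the residuals
r_foxa1 <- resid(lm(foxa1 ~ cancer_type))
r_esr1  <- resid(lm(esr1 ~ cancer_type))
cor(r_foxa1, r_esr1)  # smaller than the raw correlation
```

Same pattern as the DepMap result: once the cancer-type term soaks up the confounded variance, the remaining correlation is the within-type relationship.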

Learn AI. But learn statistics first.

I use Claude Code daily. AI tools are a real multiplier for bioinformatics.

But I keep seeing people learning to prompt an LLM before they understand what a p-value means. If you don't know that a t-test and linear regression do the same thing, you won't catch it when the AI picks the wrong one.

Statistics is the foundation. AI is the tool. Get them in the right order.

Bookmark Lindeløv's page. Read the blog post with R code.

Happy Learning!

Tommy

PS:

If you want to learn Bioinformatics, there are four ways that I can help:

  1. My free YouTube Chatomics channel, make sure you subscribe to it.
  2. I have many resources collected on my github here.
  3. I have been writing blog posts for over 10 years https://divingintogeneticsandgenomics.com/
  4. Lastly, I have a book called "From Cell Line to Command Line" to teach you bioinformatics.

Stay awesome!

PPS: if you love building, try lovable to bring your idea into reality.


Why Subscribe?
✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube
✅ No fluff, just deep insights and working code examples
✅ Trusted by grad students, postdocs, and biotech professionals
✅ 100% free
