Why Subscribe?
✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube
✅ No fluff, just deep insights and working code examples
✅ Trusted by grad students, postdocs, and biotech professionals
✅ 100% free
Hello bioinformatics lovers,

Tommy here. One of the key things I came to understand while doing RNA-seq analysis is multiple testing adjustment.

Under the null hypothesis, with no real difference, a p-value is a random variable that follows a uniform distribution between 0 and 1. This means about 5% of genes will have p < 0.05 by chance alone, and about 1% will have p < 0.01, even when there are no real differences. Test 10,000 genes where nothing changes and you expect to collect about 500 false hits at p < 0.05 before you reach one real signal.

Picture 10,000 jars of jellybeans. You taste one bean from each jar, hunting for your favorite flavor. No jar contains it. You still walk away convinced that 500 jars did. You make the same mistake with a gene list.

Bonferroni: the strict fix. Divide 0.05 by 10,000 and call a gene real only when its p-value falls below that tiny threshold (0.000005). You block nearly all false positives. You also discard many of your real discoveries. GWAS needs that severity. You usually cannot afford it in a typical expression study.

FDR: the smarter fix. The False Discovery Rate caps the expected fraction of bad calls among your hits rather than blocking each one. Set FDR to 5%, report 100 genes, and you expect about 5 of them to be wrong. You keep your discoveries and drop most of the junk.

The q-value: per gene. FDR describes a whole list. The q-value applies to one gene: it is the smallest FDR at which that gene would still make the list. A q-value of 0.01 says that to include this gene, you accept a list where about 1% of the hits are expected to be false. You report that number instead of a bare “p < 0.05.”

In R, each takes one line: one call for FDR-adjusted p-values and one for q-values. Both control false discoveries. The second estimates the share of your genes with no real effect, so it recovers a few more real hits.

Correct for multiple testing and your reviewers trust your gene list while your collaborators stop chasing ghosts.

For the full breakdown, read my blog post: https://divingintogeneticsandgenomics.com/post/understanding-p-value-multiple-comparisons-fdr-and-q-value/

Happy Learning,
Tommy aka crazyhottommy

PS: If you want to learn bioinformatics, there are four ways that I can help:
Stay awesome!
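To see the numbers from this issue for yourself, here is a minimal sketch in plain Python. It is not the newsletter's R code. It simulates 10,000 genes with no real effect by drawing uniform p-values, then counts hits under a raw p < 0.05 cutoff, under Bonferroni, and under Benjamini-Hochberg (BH) FDR adjustment. The names `bh_adjust` and `null_pvalues` and the random seed are made up for illustration. The BH adjustment written out here should match what base R's `p.adjust(p, method = "BH")` computes.

```python
import random


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from largest p-value to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = n - position  # rank n for the largest p-value, 1 for the smallest
        # Scale by n / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Assumption for illustration: every gene is null, so p-values are uniform on [0, 1).
random.seed(1)
n_genes = 10_000
null_pvalues = [random.random() for _ in range(n_genes)]

raw_hits = sum(p < 0.05 for p in null_pvalues)
bonferroni_hits = sum(p < 0.05 / n_genes for p in null_pvalues)
bh_hits = sum(q < 0.05 for q in bh_adjust(null_pvalues))

print(f"raw p < 0.05:        {raw_hits}")
print(f"Bonferroni < 0.05/n: {bonferroni_hits}")
print(f"BH adjusted < 0.05:  {bh_hits}")
```

On pure-null data like this, the raw count should land near 500, while both corrected counts should usually be at or near zero. Real expression data mixes null and non-null genes, which is where BH keeps more true hits than Bonferroni.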