Why Subscribe?✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube✅ No fluff—just deep insights and working code examples✅ Trusted by grad students, postdocs, and biotech professionals✅ 100% free
Hello Bioinformatics lovers,

Tommy here. Can you believe it is June now? Boston summer is finally here and I hope you are learning new things.

Let's talk about UMAP today.

Tilt a photo of a hand and the same gesture reads as a wave or a threat. Same data, different projection, different meaning. That is the risk every time you flatten single-cell data into two dimensions.

A single-cell experiment gives you expression for thousands of genes per cell. You cannot see thousands of dimensions. So you reduce them: PCA, t-SNE, UMAP. Each method makes a different trade, and each tells a different story about the same cells.

PCA is linear. It ranks new axes by variance, and distances between points carry real meaning: how much two cells differ overall. That makes it interpretable. It also misses nonlinear structure, where a lot of biology lives.

t-SNE and UMAP are nonlinear. Both build a nearest-neighbor graph and place cells so that close neighbors in high-dimensional space stay close on the plot. They preserve local structure. They do not preserve distance between distant clusters.

Here is where people fool themselves. On a UMAP, two clusters sit far apart and you read a deep biological difference into the gap. The gap is not a measurement. Neither t-SNE nor UMAP preserves those large distances, so the space between clusters tells you little. Cluster size misleads the same way: t-SNE inflates sparse clusters and compresses dense ones, so a big blob is not a more heterogeneous population.

So read it for what it does well:
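A quick way to check whether the gaps on your own plot mean anything is to compare inter-cluster distances before and after the reduction. Below is a minimal sketch in Python with NumPy, run on made-up data (4 clusters of 100 "cells" across 50 "genes"). The helper names `pca_project`, `centroid_distances`, and `rank_correlation` are my own for illustration and do not come from any package. PCA is done here with a plain SVD, and the rank correlation assumes there are no tied distances.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by variance explained
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def centroid_distances(X, labels):
    """Euclidean distances between every pair of cluster centroids."""
    groups = np.unique(labels)
    centroids = np.array([X[labels == g].mean(axis=0) for g in groups])
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(groups), k=1)  # each pair once
    return dist[upper]

def rank_correlation(a, b):
    """Spearman-style rank correlation, assuming no tied values."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

# Synthetic stand-in for an expression matrix (not real data)
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(4, 50))
labels = np.repeat(np.arange(4), 100)
X = centers[labels] + rng.normal(size=(400, 50))

# Swap in your own t-SNE or UMAP coordinates here to test them
embedding = pca_project(X, n_components=2)

d_high = centroid_distances(X, labels)        # distances in gene space
d_low = centroid_distances(embedding, labels)  # distances on the 2D plot
agreement = rank_correlation(d_high, d_low)
print(f"rank agreement of inter-cluster distances: {agreement:.2f}")
```

Because PCA is a linear projection, it can shrink a distance between clusters but it cannot invent one. If you replace `embedding` with t-SNE or UMAP coordinates for the same cells, a low agreement score is a hint that the layout of clusters on the plot should not be read as a measurement.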
Dimensionality reduction compresses the data. It does not do your thinking. That striking UMAP is the start of the analysis, not the result.

What is the worst over-read UMAP you have seen in a paper or a talk? Reply and tell me. I collect these.

Happy Learning!

Tommy aka crazyhottommy

PS: If you want to learn Bioinformatics, there are four ways that I can help:
Stay awesome!