I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computation experience. I will help you to learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
Hello Bioinformatics lovers,

Tommy here. Summer is finally here in Boston! I look forward to more outdoor activities with the kids.

Okay, let's get into today's topic: Matrix Factorization

Matrix algebra used to feel abstract. Something from a dusty textbook. All symbols, no meaning.

Then I applied it to biology. And suddenly, it was everywhere: in my RNA-seq pipelines, my proteomics scripts, even single-cell analysis.

Let me show you what changed.

The Matrix Is the Foundation of -Omics

Biological data lives in matrices:
But matrices don’t just store data. We analyze, reduce, factor, and project them.

Enter: Matrix Factorization.

What Is Matrix Factorization?

It’s the art of breaking a large matrix into the product of smaller ones.

A simple example: Non-negative Matrix Factorization (NMF)
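In symbols, NMF can be written as below. The letters V, W, H and the sizes n, m, k are the conventional notation and are an assumption here, since the text does not name them: V is the non-negative data matrix (n genes × m samples), W holds the gene weights for each of k latent components, and H holds the weight of each component in each sample.

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{n \times m}, \quad
W \in \mathbb{R}_{\ge 0}^{n \times k}, \quad
H \in \mathbb{R}_{\ge 0}^{k \times m}, \quad
k \ll \min(n, m)
```

A common way to find W and H (one choice among several) is to minimize the squared reconstruction error while keeping every entry non-negative:

```latex
\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2
```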
You’re not just compressing data. You’re learning the hidden biology that drives it.

Need more background on latent space? I wrote a post here.

Let’s Make It Real

Imagine this: You have an RNA-seq matrix: 20,000 genes × 500 samples. You factor it with NMF into 5 latent components.

Matrix Factorization in the Wild

It’s not just NMF. Matrix factorization powers tools you already use:
These are all different ways to crack open the matrix and ask:

Why NMF Works So Well for Bio

NMF is especially useful in omics because expression values are non-negative. You can’t have “negative expression.”

Here’s a minimal R example:
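As a language-neutral illustration, here is a rough sketch in plain Python (no packages) of NMF with the classic multiplicative update rules. It is a toy under stated assumptions, not the R snippet referred to above: the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`), the 6 × 4 toy matrix, and the defaults of 500 iterations and seed 0 are all made up for this example.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Square root of the summed squared differences between V and W*H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for row_v, row_wh in zip(V, WH)
               for v, wh in zip(row_v, row_wh)) ** 0.5


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x k) and H (k x m).

    Uses multiplicative updates, which keep every entry non-negative
    as long as V and the random starting values are non-negative.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Hypothetical toy data: 6 "genes" x 4 "samples" built from two expression programs.
V = [
    [5, 5, 0, 0],
    [4, 4, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 6, 6],
    [0, 0, 1, 1],
]

W, H = nmf(V, k=2)
print("W is", len(W), "x", len(W[0]), "and H is", len(H), "x", len(H[0]))
print("reconstruction error:", round(frobenius_error(V, W, H), 4))
```

Scaled up to the 20,000 genes × 500 samples matrix with 5 components, W would be 20,000 × 5 and H would be 5 × 500: about 102,500 numbers standing in for 10,000,000. For real data you would reach for an optimized library rather than nested lists.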
I dive deep into using NMF on single-cell data in this post: https://divingintogeneticsandgenomics.com/post/matrix-factorization-for-single-cell-rnaseq-data/

How It’s Been Used

In breast cancer research, NMF helped define molecular subtypes from gene expression profiles. That insight changed how we stratify patients.

Matrix factorization has also been used to:
One Cell, Many Signals

In single-cell RNA-seq, matrix factorization helps decode what each cell is made of. Instead of 20,000 numbers per cell, you get a handful of loadings: weights on latent processes. You cluster cells by those. That’s powerful.

Further Reading

Want to understand it deeper? Read:

Key Takeaways
What You Should Do Next
Final Thought

Stop seeing gene expression as a pile of numbers. Start seeing it as a mixture of hidden processes: subtypes, pathways, cell states. Matrix factorization helps you decode that language. That’s why it’s not just math.

Other posts from the past week that you may find useful
Happy Learning!

Tommy aka crazyhottommy

PS: If you want to learn Bioinformatics, there are other ways that I can help:
Stay awesome!