
Hi! I'm Tommy Tang

Learn to Fish: Mastering Matrices for Bioinformatics


Hello Bioinformatics lovers,

Tommy here.

The purpose of this newsletter is simple: teach you how to fish, not just hand you fish.

There are so many concepts I wish I had learned differently. Let’s break them down in a way that makes sense.

1. Common statistical tests are just linear models

Many statistical tests—like t-tests and ANOVA—are just special cases of linear models. Understanding this will change how you see statistics. Check out this great resource:
Tests as Linear Models
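
To see the idea in action, here is a minimal sketch in Python with NumPy. The data are simulated and the variable names are only for illustration; they are not taken from the linked resource. It computes a classic two-sample t statistic (equal variances assumed) and then the same number from a linear model with a 0/1 group indicator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) expression values for one gene in two groups
control = rng.normal(loc=5.0, scale=1.0, size=10)
treated = rng.normal(loc=6.0, scale=1.0, size=12)
n1, n2 = control.size, treated.size

# Classic Student's t statistic with pooled variance
pooled_var = ((n1 - 1) * control.var(ddof=1)
              + (n2 - 1) * treated.var(ddof=1)) / (n1 + n2 - 2)
t_classic = (treated.mean() - control.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# The same test as a linear model: y = b0 + b1 * group
y = np.concatenate([control, treated])
group = np.concatenate([np.zeros(n1), np.ones(n2)])
X = np.column_stack([np.ones_like(group), group])  # intercept + group indicator

beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # ordinary least squares
residuals = y - X @ beta
sigma2 = residuals @ residuals / (y.size - X.shape[1])
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
t_lm = beta[1] / np.sqrt(cov_beta[1, 1])           # t statistic for the slope

print(t_classic, t_lm)  # the two values agree up to floating-point error
```

The slope `beta[1]` is the difference between the group means, and its t statistic is the t-test statistic.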

2. Matrix multiplication = Linear transformation

Matrix operations aren’t just number crunching—they have a geometric meaning that makes them intuitive. If you haven’t seen it before, this video will change your perspective:
3Blue1Brown: Linear Transformations
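
A tiny NumPy sketch of the geometric view follows. The matrices `A` and `S` are examples I picked for illustration; they are not from the video. The columns of a matrix tell you where the basis vectors land, and multiplying two matrices composes the two transformations.

```python
import numpy as np

# Columns of A are where the basis vectors (1, 0) and (0, 1) land.
# This A rotates the plane by 90 degrees counterclockwise.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

rotated_e1 = A @ e1   # lands on (0, 1), the first column of A
rotated_e2 = A @ e2   # lands on (-1, 0), the second column of A

# S stretches the x axis by a factor of 2
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])

# Matrix multiplication composes transformations:
# (S @ A) means "rotate first, then stretch"
v = np.array([1.0, 1.0])
composed = (S @ A) @ v
step_by_step = S @ (A @ v)

print(rotated_e1, rotated_e2, composed, step_by_step)
```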

3. Bioinformatics = Working with matrices

Most bioinformatics data types are just matrices (genes x samples, peaks x cells, etc.). That’s why understanding matrix manipulation in Python or R is essential:

  • Slicing rows and columns
  • Reordering data
  • Transposing matrices
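
The three operations above can be sketched with NumPy on a toy genes x samples matrix. The gene names, sample names, and counts below are made up for illustration.

```python
import numpy as np

# Hypothetical count matrix: 4 genes (rows) x 3 samples (columns)
genes = np.array(["TP53", "MYC", "GAPDH", "ACTB"])
samples = np.array(["s1", "s2", "s3"])
counts = np.array([[10, 12,  9],
                   [ 0,  3,  1],
                   [50, 48, 55],
                   [30, 33, 29]])

# Slicing: first two genes, samples s1 and s3
subset = counts[:2, [0, 2]]

# Reordering: sort genes by mean expression, highest first,
# and keep the gene labels in sync with the rows
order = np.argsort(counts.mean(axis=1))[::-1]
counts_sorted = counts[order, :]
genes_sorted = genes[order]

# Transposing: samples x genes instead of genes x samples
samples_by_genes = counts.T

print(subset)
print(genes_sorted)
print(samples_by_genes.shape)
```

Keeping the row labels (`genes`) indexed with the same `order` as the matrix is the habit that prevents mismatched labels later.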

4. Finding insights through matrix factorization

Once you understand matrices, you unlock powerful methods like:

  • PCA (Principal Component Analysis)
  • ICA (Independent Component Analysis)
  • NMF (Non-negative Matrix Factorization)

These are just different ways of factorizing a matrix to extract patterns. I highly recommend this paper:
Enter the Matrix: Factorization Uncovers Knowledge from Omics
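
To make "factorizing a matrix" concrete, here is a minimal NMF sketch using the textbook multiplicative update rules of Lee and Seung, written with NumPy only. This is a generic illustration on simulated data, not the method or code from the paper; the rank `k`, the iteration count, and the small constant `eps` are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-negative matrix (20 features x 8 samples) built from 2 hidden patterns
true_W = rng.random((20, 2))
true_H = rng.random((2, 8))
V = true_W @ true_H

k = 2                                   # number of patterns to extract (assumed)
W = rng.random((V.shape[0], k))         # feature weights per pattern
H = rng.random((k, V.shape[1]))         # pattern activity per sample
eps = 1e-10                             # avoids division by zero

initial_error = np.linalg.norm(V - W @ H)

for _ in range(500):
    # Multiplicative updates keep W and H non-negative
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

final_error = np.linalg.norm(V - W @ H)
print(initial_error, final_error)       # the reconstruction error drops
```

PCA and ICA follow the same pattern of approximating the data matrix by a product of two smaller matrices; they differ in the constraints placed on the factors.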

5. Want to learn single-cell RNA-seq? Start with matrices.

Before diving into scRNA-seq analysis, get comfortable with matrices.

Check out my blog post on matrix factorization for single-cell data: https://divingintogeneticsandgenomics.com/post/matrix-factorization-for-single-cell-rnaseq-data/

The same skills apply across bioinformatics—gene expression, epigenomics, and beyond.

Bottom line: Learn the fundamentals of matrices, and you’ll be better equipped to analyze any bioinformatics dataset.

Struggling to understand bioinformatics concepts? That’s normal.

Learning takes time. I’ve revisited key topics multiple times before they clicked.

The first time I used prcomp() for PCA in R, I had no clue what was happening. I just ran it.

I’ve watched Josh Starmer’s PCA videos over 20 times!

Years later, I learned about Singular Value Decomposition (SVD) and realized that prcomp() uses SVD internally!

I wrote about this here.

Seeing PCA as an eigenvalue problem was a game-changer.

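The eigenvectors of X^T * X (with the columns of X centered) are the columns of the V matrix in the SVD X = U * D * V^T, and the eigenvalues are the squared singular values!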
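
Here is a small NumPy check of the link between SVD and the eigenvalue view of PCA. The data are simulated, and the names `scores` and `sdev` are my own labels for quantities analogous to what prcomp() reports; this is a sketch, not a reimplementation of prcomp().

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 50 samples (rows) x 4 features (columns)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)                 # center each column first

# Route 1: SVD of the centered matrix, Xc = U * diag(s) * V^T
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Route 2: eigen-decomposition of X^T X
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
eigvals = eigvals[::-1]                 # eigh sorts ascending; flip to descending
eigvecs = eigvecs[:, ::-1]

# Eigenvalues equal the squared singular values
print(np.allclose(eigvals, s ** 2))

# Eigenvectors match the columns of V up to sign:
# the absolute dot product of each matching pair is 1
alignment = np.abs(np.sum(eigvecs * Vt.T, axis=0))
print(alignment)

# Principal component scores and standard deviations
scores = Xc @ Vt.T                      # same as U * s
sdev = s / np.sqrt(X.shape[0] - 1)
```

The sign of each eigenvector is arbitrary, which is why the comparison uses absolute values.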

Recently, I re-learned linear algebra through 3blue1brown’s videos.

Suddenly, eigenvalues/eigenvectors made perfect sense.

Math is beautiful, but understanding takes time.

Each time I revisited PCA, my intuition got stronger.

This applies to all bioinformatics concepts.

I took the "Data Analysis for Life Sciences with R" course three times. Each time, I understood more.

Another example: the "An Introduction to Statistical Learning" course from Stanford. I took it twice, and the second time was much easier.

https://www.statlearning.com/

✅ Learning takes time—don’t rush it

✅ Revisit concepts multiple times

✅ Apply them in real analysis for deeper understanding

If you don’t fully understand something yet, don’t get discouraged.

Time, experience, and real-world application will make it clearer.

Other posts that you may find helpful

  1. Bioinformatics isn’t just about running pipelines. Before diving into analysis, take time to understand the biology & existing methods
  2. A good review on IC50, GI50, GR50, and EC50 for drug response: Applicability of drug response metrics for cancer studies using biomaterials
  3. In bioinformatics, you don’t always need deep learning. Simple models often work better for text and tabular data. Here's why
  4. VCF files store variant calls, but sorting them while keeping the header intact can be tricky. Here’s a quick and efficient way to do it.
  5. Biology data is fragile. A conversation with a wet lab friend made me realize just how common this issue is
  6. Bioinformatics is hard—not because of coding, but because of nuances in the data.
  7. Here’s how to split large files by chromosome, sample, or any column using simple one-liners
  8. There is no perfect data in bioinformatics
  9. Bioinformatics is all about efficient data wrangling. Here are some powerful one-liners to filter and sort files like a pro

Happy Learning!

Tommy aka crazyhottommy

PS:

If you want to learn Bioinformatics, there are other ways that I can help:

  1. My free YouTube channel, Chatomics. I added a new video to the playlist on reproducing a genomics paper figure, and I will add more. Make sure you subscribe!
  2. I have collected many resources on my GitHub here.
  3. I have been writing blog posts for over 10 years: https://divingintogeneticsandgenomics.com/

Stay awesome!

I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computational experience. I will help you learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
