Learn to Fish: Mastering Matrices for Bioinformatics

Published 4 months ago • 3 min read

Hello Bioinformatics lovers,

Tommy here.

The purpose of this newsletter is simple: teach you how to fish, not just hand you fish.

There are so many concepts I wish I had learned differently. Let’s break them down in a way that makes sense.

1. Common statistical tests are just linear models

Many statistical tests, such as t-tests and ANOVA, are special cases of linear models. A two-sample t-test, for example, is a regression of the outcome on a 0/1 group indicator. Understanding this will change how you see statistics. Check out this great resource:
Tests as Linear Models
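
To make the idea concrete, here is a minimal sketch in plain Python (standard library only). It computes a classic equal-variance two-sample t statistic, then fits a straight line to the same data with group coded as 0/1 and computes the t statistic of the slope. The expression values and the helper names two_sample_t and slope_t are made up for illustration.

```python
import math
import statistics


def two_sample_t(a, b):
    """Student's t statistic (equal variances assumed) for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / se


def slope_t(x, y):
    """Slope of the least-squares line y = b0 + b1 * x and its t statistic."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1, b1 / se_b1


# Made-up expression values for one gene in two groups (illustrative only).
control = [5.1, 4.9, 5.6, 5.0, 5.3]
treated = [6.2, 5.8, 6.5, 6.1, 5.9]

# Linear-model view: outcome y, predictor x = 0 for control, 1 for treated.
x = [0] * len(control) + [1] * len(treated)
y = control + treated

t_classic = two_sample_t(control, treated)
slope, t_model = slope_t(x, y)

print(f"t-test statistic:       {t_classic:.6f}")
print(f"regression slope:       {slope:.6f}  (difference in group means)")
print(f"slope t statistic:      {t_model:.6f}")
```

The two t statistics agree, and the slope is simply the difference between the group means. In R the same comparison would typically be done with t.test(..., var.equal = TRUE) versus lm().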

2. Matrix multiplication = Linear transformation

Matrix operations aren’t just number crunching. They have a geometric meaning that makes them intuitive: multiplying by a matrix rotates, stretches, or shears space. If you haven’t seen it before, this video will change your perspective:
3Blue1Brown: Linear Transformations
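
A small sketch of that geometric view, assuming plain Python lists instead of NumPy so it runs anywhere. The matrices rotate90 and stretch_x and the helpers matvec and matmul are illustrative names, not part of any library. It shows two things: the columns of a matrix are where the basis vectors land, and multiplying two matrices composes their transformations.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


theta = math.pi / 2  # 90 degrees counter-clockwise
rotate90 = [
    [math.cos(theta), -math.sin(theta)],
    [math.sin(theta), math.cos(theta)],
]
stretch_x = [
    [2.0, 0.0],  # doubles the x coordinate
    [0.0, 1.0],  # leaves the y coordinate alone
]

# The columns of a matrix are the images of the basis vectors.
rotated_e1 = matvec(rotate90, [1.0, 0.0])  # lands (approximately) on [0, 1]
rotated_e2 = matvec(rotate90, [0.0, 1.0])  # lands (approximately) on [-1, 0]

# Matrix multiplication composes transformations:
# "stretch first, then rotate" is the single matrix rotate90 @ stretch_x.
combined = matmul(rotate90, stretch_x)
v = [1.0, 1.0]
step_by_step = matvec(rotate90, matvec(stretch_x, v))
in_one_go = matvec(combined, v)

print(rotated_e1, rotated_e2)
print(step_by_step, in_one_go)
```

Applying the two transformations one after the other gives the same point as applying the combined matrix once, which is exactly what matrix multiplication encodes.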

3. Bioinformatics = Working with matrices

Most bioinformatics data types are just matrices (genes x samples, peaks x cells, etc.). That’s why understanding matrix manipulation in Python or R is essential:

Slicing rows and columns
Reordering data
Transposing matrices
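
The three operations above can be sketched on a tiny genes x samples matrix. This version uses plain Python lists so it has no dependencies; the gene names, sample names, and counts are invented for illustration.

```python
# A toy genes x samples count matrix (all names and values are made up).
genes = ["GeneA", "GeneB", "GeneC"]
samples = ["S1", "S2", "S3", "S4"]
counts = [
    [10, 12, 50, 48],  # GeneA
    [0, 1, 0, 2],      # GeneB
    [30, 28, 31, 29],  # GeneC
]

# Slicing rows: keep only GeneA and GeneC.
row_idx = [genes.index(g) for g in ["GeneA", "GeneC"]]
sub_rows = [counts[i] for i in row_idx]

# Slicing columns: keep only samples S3 and S4.
col_idx = [samples.index(s) for s in ["S3", "S4"]]
sub_cols = [[row[j] for j in col_idx] for row in counts]

# Reordering: put S3 and S4 before S1 and S2 (for example, to match metadata).
new_order = ["S3", "S4", "S1", "S2"]
order_idx = [samples.index(s) for s in new_order]
reordered = [[row[j] for j in order_idx] for row in counts]

# Transposing: genes x samples becomes samples x genes.
transposed = [list(col) for col in zip(*counts)]

print(sub_rows)
print(sub_cols)
print(reordered)
print(transposed)
```

In day-to-day work you would more likely use NumPy or pandas in Python, or a matrix in R, where the same ideas are written as index expressions such as counts[rows, ] and t(counts) in R. The key habit is the same everywhere: always reorder by name, so rows and columns stay matched to their labels.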

4. Finding insights through matrix factorization

Once you understand matrices, you unlock powerful methods like:

PCA (Principal Component Analysis)
ICA (Independent Component Analysis)
NMF (Non-negative Matrix Factorization)

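These are different ways of factorizing a matrix to extract patterns: each approximates the data as a product of smaller matrices under its own constraint (uncorrelated components for PCA, statistically independent components for ICA, non-negative factors for NMF). I highly recommend this paper: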
Enter the Matrix: Factorization Uncovers Knowledge from Omics
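
To show what "factorizing a matrix" means in practice, here is a rough sketch of NMF using the classic multiplicative update rules (Lee and Seung), written in plain Python. It is a teaching toy, not how any particular package implements NMF. The function names, the toy matrix, and the choice of k = 2 are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(a, b):
    """Square root of the summed squared differences between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (n x m) as W (n x k) times H (k x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update H: H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update W: W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


# Toy genes x samples matrix built from two obvious "programs" (made-up values).
v = [
    [5, 5, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 6, 6],
    [2, 2, 1, 1],
]

w, h = nmf(v, k=2)
error = frobenius_error(v, matmul(w, h))
print("reconstruction error:", error)
```

W holds one weight per gene per pattern and H holds one weight per pattern per sample. PCA and ICA produce the same kind of two-matrix summary, only under different constraints.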

5. Want to learn single-cell RNA-seq? Start with matrices.

Before diving into scRNA-seq analysis, get comfortable with matrices.

Check out my blog post on matrix factorization for single-cell data: https://divingintogeneticsandgenomics.com/post/matrix-factorization-for-single-cell-rnaseq-data/

The same skills apply across bioinformatics—gene expression, epigenomics, and beyond.

Bottom line: Learn the fundamentals of matrices, and you’ll be better equipped to analyze any bioinformatics dataset.

Struggling to understand bioinformatics concepts? That’s normal.

Learning takes time. I’ve revisited key topics multiple times before they clicked.

The first time I used prcomp() for PCA in R, I had no clue what was happening. I just ran it.

I’ve watched Josh Starmer’s PCA videos over 20 times!

Years later, I learned about Singular Value Decomposition (SVD) and realized that prcomp() uses SVD internally!

I wrote about this here.

Seeing PCA as an eigenvalue problem was a game-changer.

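For a centered data matrix X, the eigenvectors of X^T X are the columns of the V matrix in the SVD (X = U Σ V^T), and the eigenvalues are the squared singular values!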
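
The reason is short: if X = U Σ V^T, then X^T X = V Σ^2 V^T, which is an eigen-decomposition of X^T X. Here is a hedged sketch that checks this on a toy dataset with two features, where the eigenvalues of the 2 x 2 matrix X^T X have a closed form, so no linear algebra library is needed. The data values and the helper unit_eigenvector are made up for illustration.

```python
import math

# Toy data: 6 samples (rows) x 2 features (columns); values are made up.
data = [
    [2.0, 1.9],
    [0.5, 0.7],
    [3.1, 3.0],
    [1.2, 0.9],
    [4.0, 4.3],
    [2.2, 2.4],
]

# Center each column, as PCA does.
n = len(data)
means = [sum(row[j] for row in data) / n for j in range(2)]
x = [[row[j] - means[j] for j in range(2)] for row in data]

# X^T X is a symmetric 2 x 2 matrix [[a, b], [b, d]].
a = sum(r[0] * r[0] for r in x)
b = sum(r[0] * r[1] for r in x)
d = sum(r[1] * r[1] for r in x)

# Eigenvalues of a symmetric 2 x 2 matrix (largest first).
half_trace = (a + d) / 2
root = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
eigenvalues = [half_trace + root, half_trace - root]


def unit_eigenvector(lam):
    """Unit eigenvector of [[a, b], [b, d]] for eigenvalue lam (assumes b != 0)."""
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return [vx / norm, vy / norm]


# Each entry of v is one eigenvector, i.e. one column of V.
v = [unit_eigenvector(lam) for lam in eigenvalues]

# Singular values are the square roots of the eigenvalues.
singular_values = [math.sqrt(lam) for lam in eigenvalues]

# PC scores are X v_i; dividing by the singular value gives the columns of U.
scores = [[r[0] * vi[0] + r[1] * vi[1] for r in x] for vi in v]
u = [[s / sv for s in col] for col, sv in zip(scores, singular_values)]

# Rebuild X from the pieces: X = sum_i sigma_i * u_i * v_i^T.
rebuilt = [
    [sum(singular_values[i] * u[i][r] * v[i][c] for i in range(2)) for c in range(2)]
    for r in range(n)
]

# Variance along each principal component is eigenvalue / (n - 1).
pc_variances = [lam / (n - 1) for lam in eigenvalues]
print("PC variances:", pc_variances)
```

The columns of U built this way have unit length and are orthogonal to each other, and the three pieces multiply back to the centered data, which is what makes this a valid SVD. The eigenvectors should match the rotation matrix reported by prcomp() up to a sign flip.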

Recently, I re-learned linear algebra through 3Blue1Brown’s videos.

Suddenly, eigenvalues/eigenvectors made perfect sense.

Math is beautiful, but understanding takes time.

Each time I revisited PCA, my intuition got stronger.

This applies to all bioinformatics concepts.

I took the "Data Analysis for Life Sciences with R" course three times. Each time, I understood more.

Another example: "An Introduction to Statistical Learning" from Stanford. I took it twice, and the second time was much easier.

https://www.statlearning.com/

✅ Learning takes time—don’t rush it

✅ Revisit concepts multiple times

✅ Apply them in real analysis for deeper understanding

If you don’t fully understand something yet, don’t get discouraged.

Time, experience, and real-world application will make it clearer.

Chatomics! — The Bioinformatics Newsletter