
Hi! I'm Tommy Tang

How I Would Learn Bioinformatics From Scratch 12 Years Later: A Roadmap


Hello Bioinformatics lovers,

First of all, thank you to all the subscribers. I hope you get value from this newsletter.

I want to give a shout-out to Krittiyabhorn (Namthip), a PhD student in Thailand. Her journey should be an inspiration for all of you.

Read her post here, and make sure you check out her blog posts too. They are all well written and attentive to detail.

The best way to learn is to "just play around".

Unfortunately, many students never get started.


Six years ago, I wrote a blog post, "My opinionated selection of books/urls for bioinformatics/data science curriculum", and many of its links are now broken, so I decided to write a new one.

You Can Change Your Appetites

Linear algebra, statistics, machine learning—these used to feel abstract to me.

I had zero experience with bioinformatics when I was studying for my PhD in a wet lab.

I memorized formulas without truly understanding them.

But over time, I found the right resources that made these concepts click, especially in the context of bioinformatics.

If I were starting my bioinformatics/computational journey again today, 12 years later, here are the FREE resources I would recommend.


1. Master the Linux Command Line

Knowing how to work in a Unix environment is a must for any bioinformatician. Start with these:


2. Learn R for Genomics

R is an essential tool for bioinformatics, especially for data wrangling and visualization.


3. Build a Strong Statistical Foundation

Understanding statistics is critical. These books and videos will help:


4. Linear Algebra: Make It Click

I never understood eigenvectors and eigenvalues—until I found these:

Why is it important to learn linear algebra? Most genomics data are just matrices:

  • an RNA-seq expression matrix is a gene-by-sample matrix whose entries are the read counts for each gene
  • a single-cell expression matrix is a gene-by-cell matrix whose entries are the read counts for each gene
  • a ChIP-seq count matrix is a peak-by-sample matrix whose entries are the number of reads in each peak
  • a drug response matrix is a drug-by-sample matrix whose entries are, for example, IC50 values

And many more. In other words, matrices are EVERYWHERE in bioinformatics (and in many other data science topics)!

Many bioinformatics problems can be rephrased as matrix manipulation.
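As one small illustration of rephrasing a problem as matrix manipulation, here is a minimal sketch in plain Python (no packages needed). It assumes a toy gene-by-sample count matrix with made-up gene names and numbers, and rescales each sample (column) to counts per million. The helper names `column_sums` and `counts_per_million` are my own for this example, not from any library.

```python
# Toy gene-by-sample count matrix (made-up numbers, for illustration only).
genes = ["GeneA", "GeneB", "GeneC"]
samples = ["S1", "S2"]
counts = [
    [100, 200],   # GeneA
    [300, 600],   # GeneB
    [600, 1200],  # GeneC
]


def column_sums(matrix):
    """Sum each column: the total read count (library size) per sample."""
    return [sum(col) for col in zip(*matrix)]


def counts_per_million(matrix):
    """Rescale every column so that it sums to one million."""
    lib_sizes = column_sums(matrix)
    return [
        [value / lib_sizes[j] * 1_000_000 for j, value in enumerate(row)]
        for row in matrix
    ]


library_sizes = column_sums(counts)
cpm = counts_per_million(counts)

print("library sizes:", dict(zip(samples, library_sizes)))
for gene, row in zip(genes, cpm):
    print(gene, [round(v) for v in row])
```

In this toy data, S2 was sequenced twice as deeply as S1, so the raw counts differ by a factor of two while the rescaled values agree. In practice you would do the same thing with a matrix library in R or Python; the point is only that "normalize for sequencing depth" is a column-wise matrix operation.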

Understand deeply what matrix multiplication means, and why matrix factorization is useful for genomics (see my post).
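To make the meaning of matrix multiplication concrete, here is a minimal sketch in plain Python. It assumes a toy gene-by-sample expression matrix and two hypothetical gene programs (signatures), all with made-up numbers. Multiplying a signature-by-gene weight matrix by the gene-by-sample matrix gives a signature-by-sample score matrix: each entry is a weighted sum of one sample's gene values. The `matmul` helper is written here for illustration only.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    b_cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Gene-by-sample expression matrix (3 genes x 3 samples, made-up values).
expression = [
    [2.0, 0.0, 1.0],  # GeneA
    [1.0, 3.0, 0.0],  # GeneB
    [0.0, 1.0, 4.0],  # GeneC
]

# Signature-by-gene weights (2 hypothetical programs x 3 genes).
signatures = [
    [1.0, 1.0, 0.0],  # program 1 = GeneA + GeneB
    [0.0, 0.0, 1.0],  # program 2 = GeneC
]

# (2 x 3) times (3 x 3) gives a 2 x 3 signature-by-sample score matrix.
scores = matmul(signatures, expression)
for row in scores:
    print(row)
# [3.0, 3.0, 1.0]
# [0.0, 1.0, 4.0]
```

Matrix factorization runs roughly in the opposite direction: given only the expression matrix, it looks for a small gene-by-program matrix and a program-by-sample matrix whose product approximates it.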

Matrix calculation is also the foundation of deep learning!


5. Get Comfortable with Machine Learning

Statistics and machine learning go hand in hand:


6. Python for Bioinformatics

I’m primarily an R user, but I use Python for workflow automation. If I had to start again:


Just Start!

Pick any resource that fits your learning stage and dive in.

Waiting won’t change anything. Taking action will.

Those who start and experiment win in bioinformatics.

Of course, subscribe to my YouTube channel, Chatomics, to learn bioinformatics too! https://www.youtube.com/@chatomics

If you found this newsletter helpful, share it with others who might benefit.

Happy learning!

Tommy aka crazyhottommy

Other posts from last week that you may find helpful:

  1. Bioinformatics is hard—not because of coding, but because of nuances in the data. 🧵👇
  2. Here’s how to split large files by chromosome, sample, or any column using simple one-liners with Awk.
  3. Struggling to understand bioinformatics concepts? That’s normal. Learning takes time. I’ve revisited key topics multiple times before they clicked. Here’s my journey & why relearning is essential 🧵👇
  4. There is no perfect data in bioinformatics. Let me give you an example. In ChIP-seq, every experiment has noise and variability. Blindly trusting defaults? A bad idea. Here’s why and what you should do instead.
  5. Bioinformatics is all about efficient data wrangling. Here are some powerful one-liners to filter and sort files like a pro 🧵👇
  6. Bioinformatics is full of shiny new tools. But chasing every new tool can slow you down. Here's why picking one and moving forward is the best strategy 🧵👇
  7. You think you'll remember what you did 3 months ago? Think again. Poor documentation = wasted time & unreproducible results.
  8. Working with paired-end reads? You might need interleaved FASTQ files. Here’s why they matter & how to create them with seqtk & UNIX! 🧵👇
  9. 🚀 How to merge RNA-seq counts per sample from the featureCounts output?
  10. 🧵 "I looked over hundreds of bioinformatics GitHub repos in my career. Here's why most tool documentation fails users completely..."

PS:

If you want to learn Bioinformatics, there are other ways that I can help:

  1. My free YouTube channel, Chatomics. Make sure you subscribe to it.
  2. I have collected many resources on my GitHub here.
  3. I have been writing blog posts for over 10 years at https://divingintogeneticsandgenomics.com/

Stay awesome!


I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computational experience. I will help you learn the computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
