
Hi! I'm Tommy Tang

The Hardest Part of Bioinformatics? An Underrated Skill You Will Learn in This Newsletter


Hello Bioinformatics lovers,

Another week has passed by.

I often feel how time flies. When I look at how fast my three kids grow, I realize how "old" I have become.

Whew... I will turn 40 next year! I started learning bioinformatics at 28.

The other day I went to a Chinese restaurant and got this fortune cookie message:

I think it is awesome to be a cheerleader for those who want to learn bioinformatics.

I knew I needed it when I started.

Technical blog post this week: PCA analysis on scATACseq data. In this blog post, I will show you how the first PC is correlated with sequencing depth.

Today, we are going to talk about an underrated skill: naming files.

What’s the most challenging thing in bioinformatics? It’s not coding. It’s naming files.

How many test.txt or foo.txt files do you have? Probably too many. Let’s talk about better file naming practices.

Why File Naming Matters

Poor file names lead to:

• Confusion (final_final_v3.txt)

• Lost data (old_test_copy(1).txt)

• Errors in pipelines (file 1.txt breaks scripts; avoid spaces in names!)
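To see why a space breaks scripts, here is a minimal Python sketch. It uses shlex from the standard library, which splits a command line roughly the way a POSIX shell does. The wc -l command is only an illustration; any tool that takes a file name behaves the same way.

```python
import shlex

# shlex.split tokenizes a command line roughly like a POSIX shell.
# Unquoted, "file 1.txt" is seen as two separate arguments.
bad = shlex.split("wc -l file 1.txt")
good = shlex.split("wc -l file_1.txt")

print(bad)   # ['wc', '-l', 'file', '1.txt']
print(good)  # ['wc', '-l', 'file_1.txt']
```

With the space, the tool would look for two files, "file" and "1.txt", and neither exists.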

Let’s fix this.

Three Principles for Good File Naming

📌 Machine-readable

• No spaces or special characters (* & !)

• Use underscores or hyphens (_ or -)

📌 Human-readable

• Name should explain what’s inside

• Use meaningful descriptors, not foo.txt

📌 Plays well with ordering

• Use numeric prefixes (001_sample.txt)

• Follow ISO 8601 for dates (YYYY-MM-DD)
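The machine-readable and ordering principles can be checked with a few lines of Python. This is a sketch, not a standard: the allowed character set and the function name is_machine_readable are my assumptions, so adjust them to your own rules.

```python
import re

# Assumption: "machine-readable" means only letters, digits,
# dots, underscores and hyphens. Change the pattern to fit your project.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")

def is_machine_readable(name: str) -> bool:
    """Return True if the name has no spaces or special characters."""
    return SAFE_NAME.fullmatch(name) is not None

print(is_machine_readable("001_sample.txt"))   # True
print(is_machine_readable("file 1.txt"))       # False
print(is_machine_readable("final_v2(1).csv"))  # False

# Without zero padding, plain alphabetical sorting puts 10 before 1 and 2.
unpadded = sorted(["10_sample.txt", "2_sample.txt", "1_sample.txt"])
print(unpadded)  # ['10_sample.txt', '1_sample.txt', '2_sample.txt']

# With zero padding, alphabetical order matches numeric order.
padded = sorted(["010_sample.txt", "002_sample.txt", "001_sample.txt"])
print(padded)    # ['001_sample.txt', '002_sample.txt', '010_sample.txt']
```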

Examples of Good vs. Bad Names

❌ test.txt → ✅ RNAseq_project1_counts.tsv

❌ final_v2(1).csv → ✅ 2024-02-04_sample_metadata.csv

❌ data analysis.docx → ✅ ChIPseq_peak_QC_notes.txt

Why Use ISO 8601 for Dates?

Bad: 02-04-2024 (Is this Feb 4 or April 2?)

Good: 2024-02-04 (Always unambiguous)

This format sorts correctly in file explorers and scripts.
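Here is a small Python sketch of the sorting behavior. The three dates and the _counts.tsv suffix are made up for illustration.

```python
from datetime import date

# Hypothetical dates, deliberately out of order.
dates = [date(2024, 2, 4), date(2023, 12, 25), date(2024, 1, 15)]

# The same files named with MM-DD-YYYY versus ISO 8601 (YYYY-MM-DD).
us_style = sorted(d.strftime("%m-%d-%Y") + "_counts.tsv" for d in dates)
iso_style = sorted(d.isoformat() + "_counts.tsv" for d in dates)

print(us_style)
# ['01-15-2024_counts.tsv', '02-04-2024_counts.tsv', '12-25-2023_counts.tsv']
print(iso_style)
# ['2023-12-25_counts.tsv', '2024-01-15_counts.tsv', '2024-02-04_counts.tsv']
```

With MM-DD-YYYY, the December 2023 file lands after the 2024 files. With ISO 8601, alphabetical order is chronological order.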

Handling Large Projects

For projects with multiple datasets:

• Use consistent prefixes (RNAseq_, ChIPseq_)

• Organize by folders (/data/raw/, /data/processed/)
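A rough sketch of how consistent prefixes and folders pay off in a script. The helper names make_layout and files_for are hypothetical, and so are the file names other than RNAseq_project1_counts.tsv. Everything runs in a temporary directory, so nothing on your disk is touched.

```python
from pathlib import Path
import tempfile

def make_layout(root: Path) -> None:
    """Create data/raw and data/processed under root."""
    for sub in ("data/raw", "data/processed"):
        (root / sub).mkdir(parents=True, exist_ok=True)

def files_for(root: Path, prefix: str) -> list[str]:
    """Sorted names of files in data/processed that start with prefix."""
    processed = root / "data" / "processed"
    return sorted(p.name for p in processed.glob(prefix + "*"))

# Demo in a throwaway directory with made-up, empty files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_layout(root)
    for name in ("RNAseq_project1_counts.tsv",
                 "ChIPseq_project1_peaks.bed",
                 "RNAseq_project1_tpm.tsv"):
        (root / "data" / "processed" / name).touch()
    rnaseq = files_for(root, "RNAseq_")

print(rnaseq)
# ['RNAseq_project1_counts.tsv', 'RNAseq_project1_tpm.tsv']
```

Because every RNAseq file shares the same prefix, one glob pattern is enough to pick them all up.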

Key Takeaways

• Stop using test.txt and final_v2(3).csv

• Make names machine-readable and human-readable

• Use ISO 8601 dates and numeric prefixes

Action Item

Check your files today. Rename them using these principles.
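If you want a starting point, here is a minimal sketch that suggests a machine-readable version of a name. The function suggest_name and its cleaning rules are my assumptions, not a standard tool. It only returns a suggestion and renames nothing, and it cannot make a name human-readable for you: test.txt stays test.txt until you decide what is inside.

```python
import re

def suggest_name(name: str) -> str:
    """Suggest a machine-readable file name. Nothing is renamed on disk."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = name, ""
    # Replace runs of spaces and special characters with one underscore,
    # then trim underscores left at either end of the stem.
    stem = re.sub(r"[^A-Za-z0-9.-]+", "_", stem).strip("_")
    return stem + dot + ext

print(suggest_name("data analysis.docx"))    # data_analysis.docx
print(suggest_name("final_v2(1).csv"))       # final_v2_1.csv
print(suggest_name("old_test_copy(1).txt"))  # old_test_copy_1.txt
```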

More on good file naming: Jenny Bryan’s guide https://speakerdeck.com/jennybc/how-to-name-files

Other posts you may find helpful from the past week.

  1. Filling Missing Gene Names (NAs) in Genomics Data with {tidyr}
  2. Handling NAs in Genomics Data: A Must-Know Guide
  3. How to Reorder Rows in R Using a Custom Order
  4. Bash Quirks Every Bioinformatician Should Know
  5. How to Quickly Inspect Dataframe Headers in UNIX
  6. Two Types of Bioinformaticians—Which One Are You?
  7. Why Many Bioinformaticians Struggle in Industry
  8. Bash Strict Mode: Stop Silent Failures in Your Scripts
  9. Looking for a job in biotech sucks right now. With layoffs happening across biotech and pharma
  10. Why are workflow languages like Snakemake essential for bioinformatics? Let’s explore their power in streamlining genomic data analysis.

There is no easy way to master any skill.

You need to make decisions, hard decisions.

Should I spend the night going over a book chapter, or watching the NBA?

Should I go out to have a party or spend the night watching some YouTube videos? (of course, go and check chatomics :))

etc, etc...

Most of the time, it is a hard choice now and an easy life eventually.

Or an easy choice now and a hard life later.

I spend 4 hours EVERY weekend writing these long posts.

Hope they are useful to you.

PS:

If you want to learn Bioinformatics, there are other ways that I can help:

  1. My free YouTube channel, Chatomics. Make sure you subscribe to it.
  2. I have many resources collected on my GitHub here.
  3. I have been writing blog posts for over 10 years at https://divingintogeneticsandgenomics.com/

Stay awesome!


I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computational experience. I will help you learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
