I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computational experience. I will help you learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
Hello Bioinformatics lovers,

Another week has passed by. I often feel how time flies. When I look at how fast my three kids grow, I realize how "old" I have become. Whew... I will turn 40 next year! I started learning bioinformatics at 28.

The other day I went to a Chinese restaurant and got this fortune cookie message:

I think it is awesome to be a cheerleader for those who want to learn bioinformatics. I knew I needed it when I started.

Technical blog post this week: PCA analysis on scATACseq data. In this blog post, I will show you how the first PC is correlated with sequencing depth.

Today, we are going to talk about an underrated skill: Naming files

What’s the most challenging thing in bioinformatics? It’s not coding. It’s naming files. How many test.txt or foo.txt files do you have? Probably too many. Let’s talk about better file naming practices.

Why File Naming Matters

Poor file names lead to:
• Confusion (final_final_v3.txt)
• Lost data (old_test_copy(1).txt)
• Errors in pipelines (file 1.txt breaks scripts, so avoid spaces in names!)

Let’s fix this.

Three Principles for Good File Naming

📌 Machine-readable
• No spaces or special characters (* & !)
• Use underscores or hyphens (_ or -)

📌 Human-readable
• Name should explain what’s inside
• Use meaningful descriptors, not foo.txt

📌 Plays well with ordering
• Use numeric prefixes (001_sample.txt)
• Follow ISO 8601 for dates (YYYY-MM-DD)

Examples of Good vs. Bad Names

❌ test.txt → ✅ RNAseq_project1_counts.tsv
❌ final_v2(1).csv → ✅ 2024-02-04_sample_metadata.csv
❌ data analysis.docx → ✅ ChIPseq_peak_QC_notes.txt

Why Use ISO 8601 for Dates?

Bad: 02-04-2024 (Is this Feb 4 or April 2?)
Good: 2024-02-04 (Always unambiguous)

This format sorts correctly in file explorers and scripts, because sorting the names alphabetically also sorts them by date.
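Here is a minimal Python sketch of the three principles above, using only the standard library. The function names (slugify, build_name, find_problems) and the short list of "vague" names are illustrative assumptions, not part of any existing tool, so adapt them to your own projects.

```python
import re
from datetime import date

# Anything other than letters, digits, dot, underscore, or hyphen is "unsafe".
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")

# Hypothetical list of uninformative names; extend it for your own projects.
_VAGUE_STEMS = {"test", "foo", "final", "data", "untitled"}


def slugify(text):
    """Replace runs of spaces/special characters with a single underscore."""
    return _UNSAFE.sub("_", text.strip()).strip("_")


def build_name(description, ext, when=None, index=None, width=3):
    """Build a name like 001_2024-02-04_description.ext.

    index -> zero-padded numeric prefix (plays well with ordering)
    when  -> datetime.date, written in ISO 8601 form (YYYY-MM-DD)
    """
    parts = []
    if index is not None:
        parts.append(f"{index:0{width}d}")
    if when is not None:
        parts.append(when.isoformat())
    parts.append(slugify(description))
    return "_".join(parts) + "." + ext.lstrip(".")


def find_problems(filename):
    """Return a list of naming problems found in a file name."""
    issues = []
    if " " in filename:
        issues.append("contains spaces")
    if re.search(r"[^A-Za-z0-9._\- ]", filename):
        issues.append("contains special characters")
    stem = filename.rsplit(".", 1)[0].lower()
    if stem in _VAGUE_STEMS:
        issues.append("vague name")
    return issues


if __name__ == "__main__":
    print(build_name("sample metadata", "csv", when=date(2024, 2, 4)))
    # 2024-02-04_sample_metadata.csv
    print(build_name("sample", "txt", index=1))
    # 001_sample.txt
    print(find_problems("file 1.txt"))
    # ['contains spaces']
    print(find_problems("final_v2(1).csv"))
    # ['contains special characters']

    # ISO 8601 names sort chronologically with a plain alphabetical sort.
    print(sorted(["2024-02-04_run.csv", "2023-12-31_run.csv"]))
    # ['2023-12-31_run.csv', '2024-02-04_run.csv']
```

To check a whole folder, you could loop over os.listdir() and print every name for which find_problems returns a non-empty list. Review the suggestions before renaming anything.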
Handling Large Projects

For projects with multiple datasets:
• Use consistent prefixes (RNAseq_, ChIPseq_)
• Organize by folders (/data/raw/, /data/processed/)

Key Takeaways

• Stop using test.txt and final_v2(3).csv
• Make names machine-readable and human-readable
• Use ISO 8601 dates and numeric prefixes

Action Item

Check your files today. Rename them using these principles.

More on good file naming: Jenny Bryan’s guide https://speakerdeck.com/jennybc/how-to-name-files

Other posts you may find helpful from the past week.
There is no easy way to master any skill. You need to make decisions, hard decisions. Should I spend the night going over a book chapter, or watching the NBA? Should I go out to a party or spend the night watching some YouTube videos? (Of course, go and check chatomics :)) And so on.

Most of the time, it comes down to this: hard choices, an easy life eventually; or easy choices, a hard life.

I spend 4 hours EVERY weekend writing these long posts. I hope they are useful to you.

PS: If you want to learn Bioinformatics, there are other ways that I can help:
Stay awesome!