
Hi! I'm Tommy Tang

The Bioinformatics Tool You’re Using Might Be a Waste of Time – Here’s Why


Hello, Bioinformatics lovers and new subscribers,

Tommy here. It is March, can you believe it?

Boston had heavy snow last week, and it has finally gotten a little warmer.

The snow is finally melting! Spring is near, I guess.

It is my pleasure to share my bioinformatics knowledge through this newsletter.

Btw, I am adding video tutorials on replicating the genomics paper to this playlist.

Remember: 1% better every day adds up to a lot of progress in a year!

If you find it helpful, kindly forward it to your friends.

Today, we will talk about how to choose Bioinformatics tools and documentation.


Why Most Bioinformatics Tools Fail Users Before They Even Start

I’ve looked at hundreds of bioinformatics GitHub repos. Here’s the harsh truth: most tools fail not because of poor algorithms, but because of bad documentation and usability issues.

And here’s the kicker: many of these tools are published in high-impact journals. But publication ≠ usability. Let’s talk about how to write better documentation AND choose better tools so you don’t waste time.


1. Can You Even Install It?

The first test of any bioinformatics tool: installation. If I need to wrestle with outdated dependencies, cryptic errors, or a broken Makefile, I’m moving on.

✅ Good sign: Works in a clean Conda/Docker setup
🚩 Red flag: “Run these 12 manual steps and hope for the best”

Your tool might be amazing, but it's dead on arrival if users can’t even install it.
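
Here is a rough way to automate that first test inside a fresh Conda environment or Docker container. This is only a sketch: `mytool` is a placeholder name, and I am assuming the tool supports a `--help` flag that exits with status 0, which is not true for every tool.

```python
import shutil
import subprocess


def smoke_test(command, args=("--help",), timeout=30):
    """Return True if `command` is found and exits with status 0."""
    # shutil.which returns None when the executable cannot be found.
    if shutil.which(command) is None:
        return False
    try:
        result = subprocess.run(
            [command, *args],
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary exists but could not be started, or it hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # "mytool" is a placeholder; substitute the tool you are evaluating.
    print("installed and runnable:", smoke_test("mytool"))
```

If the tool you are checking prints its help text but exits with a non-zero status, pass a different argument (for example a version flag) through `args`.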


2. Documentation: What Users Need

Developers often write:
❌ “Uses a novel graph-based normalization algorithm.”

What users want:
What’s the input format?
What’s the output format?
How long will it take?
How does it fit into my workflow?

Good docs solve problems, not just describe features.

Example: Instead of saying “outputs a normalized matrix”, say:

Input: Raw count matrix (genes × cells, CSV format)
Output: Log2-transformed, batch-corrected counts
Use case: Preparing scRNA-seq for clustering
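
The same idea carries over to the code itself: put the input, output, and use case right in the docstring. The function below is a hypothetical illustration, not any real tool's API. The name `log2_normalize`, the pseudocount of 1, and the list-of-lists matrix are my assumptions, and it only does the log2 step (no batch correction).

```python
import math


def log2_normalize(counts, pseudocount=1.0):
    """Log2-transform a raw count matrix.

    Input:  list of rows (genes), each a list of non-negative raw counts (cells).
    Output: matrix of the same shape holding log2(count + pseudocount).
    Use case: preparing scRNA-seq counts for clustering.
    Note: batch correction is NOT performed here.
    """
    normalized = []
    for row in counts:
        if any(value < 0 for value in row):
            raise ValueError("raw counts must be non-negative")
        normalized.append([math.log2(value + pseudocount) for value in row])
    return normalized
```

A user who reads only the docstring already knows what to feed in and what comes out.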

3. Real-World Tool Selection: Do the Docs Pass the Test?

I recently found a “revolutionary” scRNA-seq tool.
🔹 Nature Methods paper
🔹 Impressive benchmarks

But…
🔻 Sparse documentation
🔻 Last GitHub update: 3 years ago
🔻 Maintainer left academia

I had to abandon it. Great science, but useless in practice.
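
You can turn the "last update" check into a quick habit. The sketch below assumes you have already looked up the date of the last commit yourself (for example on the repo page). The 24-month cutoff is an arbitrary threshold I picked for illustration, not a rule.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # Do not count the final month if its day has not been reached yet.
    if later.day < earlier.day:
        months -= 1
    return months


def looks_unmaintained(last_commit, today, max_months=24):
    """Flag a repo whose last commit is more than `max_months` months old."""
    return months_between(last_commit, today) > max_months
```

A stale repo is not automatically a bad tool, but combined with sparse docs it is a strong hint to look for alternatives.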


4. Pro Tip: Include Example Data!

Nothing helps users more than:
📂 example_input.csv
📂 run_tool.sh
📂 expected_output.csv

If a tool makes me guess what the input should look like, I move on.

Clear example data = instant usability boost.
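
Example data also gives you a free regression test: run the tool on the example input and compare the result with the expected output. Here is a minimal sketch using only the Python standard library. The file names and the numeric tolerance are assumptions, and it expects both CSVs to have rows in the same order.

```python
import csv
import math


def read_rows(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def cells_match(a, b, tol=1e-6):
    """Compare two cells numerically when possible, else as exact strings."""
    try:
        return math.isclose(float(a), float(b), rel_tol=tol, abs_tol=tol)
    except ValueError:
        return a == b


def outputs_match(actual_path, expected_path, tol=1e-6):
    """Return True if both CSV files have the same shape and matching cells."""
    actual, expected = read_rows(actual_path), read_rows(expected_path)
    if len(actual) != len(expected):
        return False
    for row_a, row_e in zip(actual, expected):
        if len(row_a) != len(row_e):
            return False
        if not all(cells_match(a, e, tol) for a, e in zip(row_a, row_e)):
            return False
    return True
```

After running the tool on the example input, something like `outputs_match("my_output.csv", "expected_output.csv")` tells users right away whether their installation behaves as intended.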


5. Does It Play Nice With Others?

A tool might be well-documented, but if it doesn’t integrate with existing workflows, it’s a problem.

✔️ Does it work with Seurat, Scanpy, or Nextflow?
✔️ Does it accept standard formats (FASTQ, BAM, CSV)?
✔️ Does it output something I can actually use?

If I need to write 500 lines of glue code just to use it, it’s not worth it.


6. Controversial Take: High-Impact Papers Often Mean High-Maintenance Tools

Some of the worst-documented and hardest-to-install tools I have come across were published in journals like Nature Methods. Why?

🔹 The authors moved on to new projects
🔹 No long-term funding for tool maintenance
🔹 They optimized for publication, not usability

Meanwhile, well-maintained GitHub projects (often from industry or long-term academic labs) are safer bets.


7. Before Releasing a Tool: Do This One Test

Ask a colleague who’s never seen your tool to install and run it.

Watch silently.
See where they struggle.
That’s where your documentation is failing.

I guarantee you’ll find gaps you never noticed.


8. Key Takeaways

Good tools install easily and integrate well
Documentation should answer what, how, and why
Example data makes adoption 10x easier
Maintenance matters more than publication
Always test with real users before release


Next time you build a bioinformatics tool, or need to choose one, remember: usability beats novelty.

Do you have a favorite (or most frustrating) bioinformatics tool?

Reply and let me know!

Other posts from the past week you may find helpful

  1. I have curated a list of TCR/BCR analysis tools and resources here https://github.com/crazyhottommy/TCR-BCR-seq-analysis (first commit, 9 years ago!)
  2. "Expert in bioinformatics? Let me tell you why that's both true and false, and why it matters for your career."
  3. Hidden Characters in Bioinformatics Files Can Break Your Analysis 🧵
  4. 🧵 "Have you tried turning it off and on again?" isn't just an IT joke. Let me tell you why restarting is actually bioinformatics debugging.
  5. 🧵 "Why are you still typing passwords for ssh? Here's the 2-minute setup that will change how you work in bioinformatics..."
  6. 🧵 Remember when red error messages made your heart skip a beat? Let me tell you why they're actually your friends...
  7. 8 years ago, I wanted to be a professor. Apparently, I did not succeed at that :) but I curated many resources for becoming a faculty member, ranging from chalk talks and grant writing to funding opportunities: The-world-of-faculty https://github.com/crazyhottommy/The-world-of-faculty Hope it is still useful.

Happy Learning!

Tommy

PS:

If you want to learn Bioinformatics, there are four ways that I can help:

  1. My free YouTube channel, Chatomics. Make sure you subscribe to it.
  2. I have many resources collected on my GitHub here.
  3. I have been writing blog posts for over 10 years https://divingintogeneticsandgenomics.com/
  4. Lastly, I have a book called "From Cell Line to Command Line" to teach you bioinformatics.

Stay awesome!


I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computational experience. I will help you learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
