
Hi! I'm Tommy Tang

Your sequencing failed? Maybe you're studying bacteria, not human.


Hello Bioinformatics lovers,

After 13 years of analyzing sequencing data, I am an "expert" at it.

I gained that experience not because I am smarter,

but simply because I have made more mistakes and run into more problems.

Today, I will talk about:


The Hidden Enemy in Your Cell Line: Mycoplasma Contamination


Have a cell line with poor sequencing results?
Low mapping rates?
Before you blame the aligner, stop.

You might be sequencing bacterial DNA instead of human DNA.


1. The Dirty Secret of Cell Culture
Mycoplasma contamination spreads silently.
No cloudiness. No smell. (When it gets really bad, you can see some black dots under the microscope.)
But it destroys your experiment from the inside: slowing growth, skewing gene expression, and corrupting chromatin data.


2. It’s Common—and Often Undetected
An estimated 15–35% of cell lines are contaminated.
Worse: Many labs don’t routinely test.
Why? It’s invisible—until your experiment breaks.


3. My Story: The ATAC-seq That Went Sideways
I analyzed ATAC-seq data from a cancer line.
Mapping rate to the human genome? Dismal.
At first, I blamed bad data.
But I looked closer...


4. The Surprise in the Reads
I downloaded a mycoplasma genome.
Mapped the same reads to it.
Over 50% aligned perfectly.

Turns out, we were sequencing bacterial DNA.

Full story and code.
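
If you want a quick feel for the numbers, here is a minimal Python sketch of the mapping-rate check. It is not the code from the full story. It assumes you already aligned the reads to a mycoplasma genome and have the result as SAM text. The function name and the toy records are made up for illustration.

```python
from typing import Iterable

# Bits of the SAM FLAG field (column 2), as defined in the SAM specification.
UNMAPPED = 0x4          # the read did not align
SECONDARY = 0x100       # secondary alignment of a read counted elsewhere
SUPPLEMENTARY = 0x800   # supplementary (chimeric) alignment


def mapping_rate(sam_lines: Iterable[str]) -> float:
    """Fraction of primary SAM records that are mapped.

    Header lines (starting with "@") are skipped. Secondary and
    supplementary records are ignored so each read is counted once.
    For paired-end data, each mate counts as its own record.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0


# Toy records (hypothetical): two mapped reads, one unmapped read,
# and one secondary alignment that should not be counted.
toy_sam = [
    "@SQ\tSN:myco_contig1\tLN:1000",
    "r1\t0\tmyco_contig1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tmyco_contig1\t50\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r2\t256\tmyco_contig1\t90\t0\t5M\t*\t0\t0\t*\t*",
]

print(f"Mapped to mycoplasma: {mapping_rate(toy_sam):.0%}")
```

On real data you would stream the records in, for example from the output of samtools view, instead of building a list in memory.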


5. Why It’s a Problem
Mycoplasma alters everything:

  • Gene expression
  • Growth kinetics
  • Response to treatment

You’re not studying cancer.
You’re studying a contaminated co-culture.


6. What You Can Do

  • Map to a combined genome: human + mycoplasma
  • See where your reads truly belong
    (Hint: most reads go “home”)
  • Include contamination checks in your pipeline
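
The combined-genome check above could look something like this in Python. This is a sketch under assumptions, not a definitive pipeline step. It assumes you built the combined reference yourself and prefixed every mycoplasma contig name with "myco_" (a hypothetical naming choice), aligned to it, and ran samtools idxstats on the sorted, indexed BAM. The 1% threshold is also just an illustrative default.

```python
from collections import Counter

MYCO_PREFIX = "myco_"  # hypothetical prefix added when building the combined FASTA


def reads_by_organism(idxstats_text: str, myco_prefix: str = MYCO_PREFIX) -> Counter:
    """Tally reads per organism from `samtools idxstats` output.

    Each line is tab-separated: contig name, contig length,
    mapped count, unmapped count. The "*" line holds reads
    that have no coordinates at all.
    """
    counts = Counter()
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, unmapped = line.split("\t")
        counts["unmapped"] += int(unmapped)
        if name == "*":
            continue
        organism = "mycoplasma" if name.startswith(myco_prefix) else "human"
        counts[organism] += int(mapped)
    return counts


def myco_fraction(counts: Counter) -> float:
    """Share of mapped reads that landed on mycoplasma contigs."""
    mapped = counts["human"] + counts["mycoplasma"]
    return counts["mycoplasma"] / mapped if mapped else 0.0


def looks_contaminated(counts: Counter, threshold: float = 0.01) -> bool:
    """Flag the sample when the mycoplasma share exceeds the threshold."""
    return myco_fraction(counts) > threshold


# Toy idxstats output (made-up numbers, made-up mycoplasma contig name).
toy_idxstats = (
    "chr1\t248956422\t400\t0\n"
    "chr2\t242193529\t100\t0\n"
    "myco_contig1\t800000\t500\t0\n"
    "*\t0\t0\t50\n"
)

counts = reads_by_organism(toy_idxstats)
print(dict(counts))
print(f"Mycoplasma share of mapped reads: {myco_fraction(counts):.0%}")
print("Contaminated?", looks_contaminated(counts))
```

Mapping to the combined genome lets each read compete for its best hit, so reads are not forced onto the human reference. The prefix trick just makes the per-organism tally a one-line string check.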

7. Where It Comes From

  • Contaminated reagents
  • Cross-contamination from other lines
  • Poor aseptic technique

Once it’s in the lab, it spreads fast.


8. Can You Treat It?

  • Some use antibiotics (e.g., Plasmocin)
  • But most discard the line
    Painful, but necessary.

9. Takeaways

  • Low mapping rate? Think contamination
  • Mycoplasma changes everything
  • Screen regularly
  • Prevention is always better than cure

10. Final Thought
If something feels off in your data—
Don’t just keep running scripts.
Pause. Investigate.
Even a minor signal might lead to the real story.

And sometimes, that story is bacterial.


You’re not paranoid. You’re cautious.
And in bioinformatics, that’s what saves your science.

Other posts from the past week that you may find useful

  1. Why are there intronic reads in your bulk RNA-seq data?
  2. Your code keeps failing, this small trick can do the magic.
  3. login shell vs interactive shell, why it is important to know the difference for bioinformatics.
  4. How to survive installing R packages.
  5. How long will the bioinformatics analysis take? It depends. Here’s why
  6. How to write good bioinformatics code?
  7. Hyperparameter in machine learning explained!
  8. An awk and tidyr::separate_rows() trick
  9. KNN in single-cell RNAseq explained
  10. 10 websites for drawing scientific figures
  11. Doing bioinformatics is not just crunching numbers. It’s running experiments—just like a wet lab scientist does. But with code.

Happy Learning!

Tommy aka crazyhottommy

PS:

If you want to learn Bioinformatics, there are other ways that I can help:

  1. My free YouTube channel, Chatomics. Make sure you subscribe to it.
  2. I have collected many resources on my GitHub.
  3. I have been writing blog posts for over 10 years: https://divingintogeneticsandgenomics.com/

Stay awesome!

I am a bioinformatician/computational biologist with six years of wet lab experience and over 12 years of computational experience. I will help you learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!