Claude invented my gene IDs

Published 21 days ago • 2 min read

Hello Bioinformatics lovers,

Tommy here. Last week, I asked Claude Code to highlight specific genes on a volcano plot.

Simple request. The data matrix used Ensembl gene identifiers: ENSG followed by 11 digits (for example, ENSG00000141510 for TP53), each mapping to a specific human gene.

I gave Claude Code the gene symbols I wanted highlighted. Instead of telling me it couldn't map the symbols to Ensembl IDs, it fabricated identifiers that looked structurally valid.

No error message. No warning. Just confidently invented gene IDs.

I caught it for two reasons. First, I checked the mapping table.

Second — and this is the part that matters — my positive control gene, one I know should be upregulated, was sitting in the wrong place on the plot. The biology didn't make sense.

Biology saved me. Not the AI.

I'm not quitting AI tools. I'm managing them differently.

I use Claude Code every day now. Multiple sessions running in parallel.

I feed it background context, ask specific questions, and it writes scripts for each figure. My efficiency has gone up significantly.

But my role has changed. I'm less "person who writes all the code" and more "manager reviewing work from a fast but overconfident junior analyst."

That metaphor is precise. A talented junior analyst will produce work quickly and confidently.

They'll also occasionally fabricate something rather than admit they're stuck. Sound familiar?

After getting burned by hallucinated Ensembl IDs, here's the workflow that actually catches errors:

Run AI-generated code line by line, not as a black box.

When Claude Code hands you a script, step through it. Watch what happens at each stage.

The moment you run an entire pipeline on faith is the moment errors slip through.

Cross-check all ID mappings against your reference tables.

Gene ID conversion is one of the most error-prone steps in any bioinformatics workflow — even without AI involved.

Tools like biomaRt, AnnotationDbi, or even a simple lookup against your GTF file can verify that ENSG IDs actually exist and map to the genes you expect.
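Here is a minimal sketch of that kind of lookup in plain Python, assuming a GTF whose gene rows carry Ensembl-style `gene_id` and `gene_name` attributes. The function names (`load_gtf_gene_map`, `check_ids`) and the toy GTF lines are made up for illustration, and the coordinates are placeholders. In R, biomaRt or AnnotationDbi would do the same job.

```python
import re

# Human Ensembl gene ID: "ENSG" plus 11 digits (version suffix already stripped).
ENSG_PATTERN = re.compile(r"^ENSG\d{11}$")
# Matches GTF attribute pairs such as: gene_id "ENSG00000141510"
ATTR_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def load_gtf_gene_map(lines):
    """Build {ensembl_gene_id: gene_symbol} from the 'gene' rows of a GTF."""
    gene_map = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_PATTERN.findall(fields[8]))
        gene_id = attrs.get("gene_id", "").split(".")[0]  # drop a ".N" version
        if gene_id:
            gene_map[gene_id] = attrs.get("gene_name", "")
    return gene_map


def check_ids(symbol_to_id, gene_map):
    """Return (symbol, ensembl_id, problem) for every mapping that fails."""
    problems = []
    for symbol, ensembl_id in symbol_to_id.items():
        if not ENSG_PATTERN.match(ensembl_id):
            problems.append((symbol, ensembl_id, "malformed ID"))
        elif ensembl_id not in gene_map:
            problems.append((symbol, ensembl_id, "ID not in reference"))
        elif gene_map[ensembl_id] != symbol:
            problems.append(
                (symbol, ensembl_id, f"reference says {gene_map[ensembl_id]}")
            )
    return problems


# Toy reference: two gene rows with placeholder coordinates.
TOY_GTF = [
    'chr17\ttoy\tgene\t1\t100\t.\t-\t.\tgene_id "ENSG00000141510"; gene_name "TP53";',
    'chr12\ttoy\tgene\t1\t100\t.\t+\t.\tgene_id "ENSG00000111640"; gene_name "GAPDH";',
]

# Mapping as an AI tool might hand it back. The GAPDH ID is invented on
# purpose: it has the right shape but does not exist in the reference.
claimed = {"TP53": "ENSG00000141510", "GAPDH": "ENSG00000999999"}

problems = check_ids(claimed, load_gtf_gene_map(TOY_GTF))
```

The format check alone would have passed my fabricated IDs, since they looked structurally valid. The lookup against the reference is the part that catches them.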

Sanity-check with positive and negative controls. This is what saved me.

If you know a gene should be differentially expressed, check that it shows up where you expect.

If your negative controls are lighting up, something is wrong — regardless of whether a human or an AI wrote the code.
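A control check like this is easy to automate. The sketch below is one possible version, assuming your differential expression results can be reduced to a dictionary of gene to (log2 fold change, adjusted p-value). The function name `check_controls`, the gene names, the numbers, and the cutoffs are all placeholders for illustration. It also assumes the positive controls are expected to be upregulated, as mine was.

```python
def check_controls(results, positive, negative, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Flag controls that do not behave as expected.

    results:  {gene: (log2_fold_change, adjusted_p_value)}
    positive: genes expected to be significantly upregulated
    negative: genes expected NOT to be significantly changed
    The cutoffs are illustrative defaults, not recommendations.
    """
    issues = []
    for gene in positive:
        if gene not in results:
            issues.append(f"{gene}: positive control missing from results")
            continue
        lfc, padj = results[gene]
        if not (lfc >= lfc_cutoff and padj < padj_cutoff):
            issues.append(
                f"{gene}: expected up, got log2FC={lfc}, padj={padj}"
            )
    for gene in negative:
        if gene not in results:
            issues.append(f"{gene}: negative control missing from results")
            continue
        lfc, padj = results[gene]
        if abs(lfc) >= lfc_cutoff and padj < padj_cutoff:
            issues.append(
                f"{gene}: expected unchanged, got log2FC={lfc}, padj={padj}"
            )
    return issues


# Placeholder results: the positive control sits in the wrong place and
# the negative control lights up, so both should be flagged.
example_results = {
    "POS_GENE": (-0.3, 0.6),
    "NEG_GENE": (3.0, 0.0001),
}
issues_found = check_controls(example_results, ["POS_GENE"], ["NEG_GENE"])
```

Running this before looking at any figure means a misplaced control shows up as an explicit message rather than something you have to notice by eye.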

Treat AI output exactly like you'd treat a colleague's first draft.

You'd review it.
You'd question surprising results.
You'd check the methods.

Give AI the same scrutiny.

The real safeguard isn't better prompts — it's your expertise.

AI makes mistakes. So do we. How many times have you said "I'll double-check that" and moved on?

The difference isn't whether errors happen. It's whether you have the safeguards to catch them.

A solid bioinformatics foundation lets you spot computational errors.

Strong biological domain knowledge lets you catch when results don't make sense.

Without both, you're trusting a tool that will confidently fabricate gene identifiers rather than say "I don't know."

If you're not experimenting with AI coding tools yet — start now.

Not because they're perfect. Because you need to understand what they get wrong.

Build your own verification steps. Develop intuition for where hallucinations tend to happen (ID mappings, parameter defaults, edge cases in data wrangling).

The people who figure out human-AI workflows now — who learn where to trust and where to verify — will have a real advantage as these tools get better.

I'm not getting replaced. But how I work has changed completely.

What's the worst hallucination you've caught from an AI coding tool? Hit reply and tell me — I might feature the best stories in a future newsletter.

Happy Learning!

Tommy aka crazyhottommy

PS:

If you want to learn Bioinformatics, there are four ways that I can help:

My free YouTube channel, Chatomics. Make sure you subscribe.
I have collected many resources on my GitHub.
I have been writing blog posts for over 10 years: https://divingintogeneticsandgenomics.com/
Lastly, check out my daily LinkedIn posts on Bioinformatics x AI: https://www.linkedin.com/in/%F0%9F%8E%AF-ming-tommy-tang-40650014/recent-activity/all/

Stay awesome!


Chatomics! — The Bioinformatics Newsletter
