I am a computational biologist with six years of wet lab experience and over 12 years of computation experience. I will help you to learn computational skills to tame astronomical data and derive insights. Check out the resources I offer below and sign up for my newsletter!
Hello Bioinformatics lovers,
Happy Thanksgiving! I promise you will learn something about bioinformatics by the end :) But I want to ask this first: What are you thankful for?
I am grateful for all the people who have supported me in my career, and also for the setbacks and failures that have taught me invaluable lessons.
Most importantly, I am thankful to YOU, the reader of this newsletter. I started this newsletter to share the hard-learned experience and tips from my bioinformatics learning journey. It is you who motivate me to write this newsletter consistently EVERY Saturday. The responses below make my day.
So, thank you, all the subscribers!
Now, let's dive into today's topic: dealing with repetition in bioinformatics.
What do I mean by repetition?
First, a simple dummy example.
In its simplest form, you perform an operation or analysis for one sample, and now you need to do the same thing for many samples.
For a single-sample single-cell RNAseq analysis, you can easily run the Seurat/Scanpy workflow.
The tutorials are usually easy to follow. But in real-life bioinformatics, the story is different.
You may encounter many samples in separate count matrices in GEO.
The idea is to write a function to read in one sample, and then apply it to all samples.
In the script, we define a read_counts function and then use purrr::map() to apply it to all samples.
The full script is here, and the YouTube video is Creating a Seurat Object from a GEO Dataset.
The same idea applies on the command line. For one sample, the simpleaf quant command looks like this:
simpleaf quant --reads1 sample_R1.fq --reads2 sample_R2.fq (omitting other arguments)
Now, you can loop over many samples using shell commands. However, real-life bioinformatics is more complicated.
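Before getting to the complication, here is a minimal sketch of the simple case, where each sample has exactly one R1 and one R2 file. The sample names are hypothetical, and the sketch only prints the commands instead of running them, because the other simpleaf arguments are left out here just as they are in the command above.

```shell
#!/bin/sh
# Hypothetical sample names; replace them with your own.
samples="sampleA sampleB sampleC"

# Print (rather than run) one simpleaf quant command per sample.
# To run for real, remove the leading "echo" and add the other
# required simpleaf arguments that this example omits.
quant_cmds() {
  for s in $samples; do
    echo simpleaf quant --reads1 "${s}_R1.fq" --reads2 "${s}_R2.fq"
  done
}

quant_cmds
```

Printing the commands first is a cheap way to check the file names before spending compute time on 20 samples.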
You usually have more than one fq file for the same sample, and you may have 20 samples to deal with!
Moreover, the fastq files for the same sample need to be separated by commas:
simpleaf quant --reads1 sample_L1_R1.fq,sample_L2_R1.fq,sample_L3_R1.fq --reads2 sample_L1_R2.fq,sample_L2_R2.fq,sample_L3_R2.fq
How do you automate that?
Read this blog post in which I walk you through a real-life problem for single-cell RNAseq quantification step by step.
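To give a flavor of the idea, here is a minimal sketch of one possible approach; it is not necessarily the approach taken in the blog post. It assumes the files are named sample_L1_R1.fq, sample_L2_R1.fq and so on, as in the command above. The helper names join_by_comma and build_quant_cmd, the sample names, and the empty demo files are all made up for illustration, and the commands are only printed, not run.

```shell
#!/bin/sh
# join_by_comma (hypothetical helper): print all arguments joined by commas.
# "$*" joins the arguments with the first character of IFS; the subshell
# keeps the IFS change from leaking into the rest of the script.
join_by_comma() {
  ( IFS=,; printf '%s\n' "$*" )
}

# build_quant_cmd SAMPLE DIR (hypothetical helper): print the simpleaf quant
# command for one sample, assuming files named DIR/SAMPLE_L<lane>_R1.fq and
# DIR/SAMPLE_L<lane>_R2.fq. The shell sorts glob matches, so the R1 and R2
# lists come out in the same lane order.
build_quant_cmd() {
  sample=$1
  dir=$2
  r1=$(join_by_comma "$dir/${sample}"_L*_R1.fq)
  r2=$(join_by_comma "$dir/${sample}"_L*_R2.fq)
  echo "simpleaf quant --reads1 $r1 --reads2 $r2"
}

# Demo: create empty placeholder files for two made-up samples, three lanes each.
demo_dir=$(mktemp -d)
for s in sampleA sampleB; do
  for lane in L1 L2 L3; do
    : > "$demo_dir/${s}_${lane}_R1.fq"
    : > "$demo_dir/${s}_${lane}_R2.fq"
  done
done

# Print one command per sample; other simpleaf arguments are omitted here too.
for s in sampleA sampleB; do
  build_quant_cmd "$s" "$demo_dir"
done
```

The point is the same as before: write the logic for one sample as a function, then apply it to every sample.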
Today's topic is a little more advanced. You may or may not understand the examples with detailed code. That's FINE!
At the very least, you get the idea of why programs are useful and why we use computers to do repetitive work.
As you grow more experienced, you will understand it and apply those skills to real-world bioinformatics problems.
Happy Learning!
Tommy aka Crazyhottommy
PS:
If you want to learn Bioinformatics, there are four ways that I can help:
Stay awesome!