In bioinformatics, I often need to retrieve information from very large databases (e.g. 1,000,000 rows and 10 columns) for a target list of records (e.g. 1,000 IDs within the database). For example, suppose that we have a database of 1,000,000 SNPs with information on which gene each SNP sits in, but we only need to know the genes for a target list of 1,000 SNPs within that database. This is a common task that I never do by hand, because the chance of introducing errors is high; instead, we need to write some code. I always use Perl, but I believe this can be solved in any programming language (e.g. Python or R). In Perl, it can be solved very easily with a hash (Perl's built-in hash table): looking up a key takes constant time on average, so each target ID is found directly instead of by scanning all 1,000,000 rows.
Here is my code to do that:
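As a rough illustration of the same hash-table idea, here is a minimal sketch in Python (one of the alternatives mentioned above), not a reproduction of the Perl script. It assumes a tab-separated database whose first column holds the SNP ID and a target list with one ID per line; the function name `lookup_targets` and the toy rows are hypothetical. It hashes the 1,000 target IDs first and then reads the large table once, line by line.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, TextIO


def lookup_targets(
    db: TextIO, targets: Iterable[str], id_col: int = 0
) -> Dict[str, Optional[List[str]]]:
    """Return {target ID: matching database row, or None if the ID is absent}.

    Assumes a tab-separated database with the record ID in column `id_col`.
    """
    # Hash the small target list first. Python dicts are hash tables (like a
    # Perl hash) and keep insertion order, so results follow the target list.
    wanted: Dict[str, Optional[List[str]]] = {}
    for line in targets:
        snp_id = line.strip()
        if snp_id:  # skip blank lines
            wanted[snp_id] = None

    # Stream the large table once; each `in` test is an average O(1) hash
    # lookup, so the target list is never searched element by element.
    # If an ID occurs on several rows, the last one wins.
    for row in csv.reader(db, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > id_col and row[id_col] in wanted:
            wanted[row[id_col]] = row
    return wanted


if __name__ == "__main__":
    # Toy data (hypothetical SNP IDs and gene names): ID<TAB>gene
    db = io.StringIO("rs1\tGENE_A\nrs2\tGENE_B\nrs3\tGENE_C\n")
    hits = lookup_targets(db, ["rs3", "rs1", "rs99"])
    for snp_id, row in hits.items():
        print(snp_id, row[1] if row else "NOT FOUND")
    # rs3 GENE_C
    # rs1 GENE_A
    # rs99 NOT FOUND
```

Hashing the small target list and streaming the database keeps memory use low even for millions of rows; the opposite choice (hashing the whole database and then querying it) also works but holds every row in memory. With real files, pass open file handles instead of `io.StringIO`, e.g. `open("snps.tsv")` and `open("targets.txt")`, where both file names are placeholders.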
As you can see, this is a very basic use of Perl scripting to solve a fundamental task in bioinformatics. Feel free to use this script or share it with anyone who might need it. That's all for today!
Hi, I am Santi. I created this blog series mainly to share a summary of each of my publications. However, it is also a place where I will write about science and about my life as a researcher in the field of evolutionary biology.