Matching target SNP list to database

11/30/2017

In bioinformatics, I often need to retrieve information from a very large database (e.g. 1,000,000 rows and 10 columns) for a target list of records (e.g. 1,000 IDs within the database). For example, suppose that we have a database of 1,000,000 SNPs that records which gene each SNP sits in, but we only need the genes for a target list of 1,000 of those SNPs. This is a common task that I never do manually, because doing it by hand makes it very easy to introduce errors; it calls for a small script. I always use Perl, but I believe the same approach works in any programming language (e.g. Python, R...). In Perl, this can be solved very easily with a hash: store the target IDs as hash keys, then read through the database once and keep only the rows whose ID exists in the hash. Each hash lookup takes constant time on average, so the whole job is a single pass over the database.

Here is my code to do that:
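The sketch below illustrates the hash approach. It is a minimal example under assumed file formats, not the original script: it assumes (hypothetically) a target file with one SNP ID per line and a tab-separated database with the SNP ID in column 1 and the gene name in column 2. The subroutine name `match_targets` and the script name `match_snps.pl` are illustrative only.

```perl
#!/usr/bin/perl
# Sketch: look up the genes for a target list of SNP IDs in a large SNP database
# using a hash. Assumed (hypothetical) file formats:
#   targets.txt  - one SNP ID per line
#   database.tsv - tab-separated, SNP ID in column 1, gene name in column 2
use strict;
use warnings;

sub match_targets {
    my ($target_fh, $db_fh) = @_;

    # Step 1: load the (small) target list into a hash; the keys are SNP IDs.
    # @order remembers the input order so the report follows the target list.
    my (%is_target, @order);
    while (my $line = <$target_fh>) {
        $line =~ s/\r?\n\z//;                            # strip Unix or Windows line endings
        next if $line eq '';
        push @order, $line unless $is_target{$line}++;   # skip duplicate IDs
    }

    # Step 2: read the (large) database once, line by line, and keep only rows
    # whose SNP ID is a key in the hash. Each lookup is constant time on average.
    my %genes_of;
    while (my $line = <$db_fh>) {
        $line =~ s/\r?\n\z//;
        my ($snp, $gene) = (split /\t/, $line)[0, 1];
        next unless defined $gene && exists $is_target{$snp};
        # A SNP can overlap more than one gene, so collect every match.
        push @{ $genes_of{$snp} }, $gene;
    }

    return (\@order, \%genes_of);
}

# Usage: perl match_snps.pl targets.txt database.tsv > matched.tsv
if (@ARGV == 2) {
    my ($target_file, $db_file) = @ARGV;
    open my $t_fh, '<', $target_file or die "Cannot open $target_file: $!\n";
    open my $d_fh, '<', $db_file     or die "Cannot open $db_file: $!\n";
    my ($order, $genes_of) = match_targets($t_fh, $d_fh);

    # One output line per target; IDs absent from the database are reported as NA
    # so that no target silently disappears from the result.
    for my $snp (@$order) {
        my $genes = $genes_of->{$snp} ? join(',', @{ $genes_of->{$snp} }) : 'NA';
        print "$snp\t$genes\n";
    }
}
```

Only the small target list is held in memory; the large database is streamed one line at a time, so memory use grows with the 1,000 targets rather than the 1,000,000 database rows. The file names in the usage line are placeholders.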

As you can see, this is a very basic use of Perl scripting to solve a fundamental task in bioinformatics. Feel free to use this script or share it with someone who might need it. That's all for today!
