Transcript Bioperl modules - Ohio State University
Parsing BLAST output
Output of a local BLAST search
“less” program
Full path to the BLAST output file
BLAST program used for the search
Reference
Information of the query sequence
Information of the database
One-line summary of the search results
Detailed information for the first 2 HSPs of the first hit: accession number, description, organism, score, E value, identities, positives, and alignment
Sample BLAST output (continued)
HSP information from the first hit
Press “q” to quit the “less” viewing mode
The size of the BLAST output is limited only by the free disk space on your computer. A large output file is impractical to open in a text editor, let alone read through line by line.
The purpose of parsing BLAST output is to extract user-defined information from the BLAST output file for clear visualization and summarization.
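Before any parsing, a few standard Unix commands can give a feel for how big a report is without loading it into an editor. The following is only a sketch: the file is a tiny hypothetical stand-in built on the spot, and the count assumes the classic text format in which each query record starts with a "Query=" line.

```shell
# Hypothetical stand-in for a real BLAST report; a real file would be far larger.
report=$(mktemp)
printf 'BLAST header (stand-in)\n\nQuery= seq1\n(hits omitted)\n\nQuery= seq2\n(hits omitted)\n' > "$report"

# Size on disk, reported without opening the file.
ls -lh "$report"

# Total number of lines in the report.
line_count=$(wc -l < "$report")

# Number of query records: one "Query=" line per record.
query_count=$(grep -c '^Query=' "$report")

# Peek at the first three lines only.
head -n 3 "$report"

echo "lines: $line_count, queries: $query_count"
```

On a real report, replace "$report" with the full path to the BLAST output file.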
Search result parsing
The Bio::SearchIO system was designed for parsing sequence database searches (BLAST, sim4, waba, FASTA, HMMER, exonerate, etc.).
One-line summary of the search results
Load the Bio::SearchIO module
Usage information (it appears if the program is invoked without arguments)
Define the class
Print out the header information
Process each result
Process each hit
Process each HSP
Control for the number of hits to be extracted
Indicator showing the work is done
Change directory (cd) to where the Perl script and the BLAST output file are stored
Confirm that the Perl script and the BLAST output are in place
Oops… an error message. It is caused by a line-ending incompatibility between Windows and Unix text files.
Find the file in Windows and open it with Notepad++
Select “convert to UNIX format” in the “Format” drop-down menu
After the conversion, save the file and exit Notepad++
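The same conversion can also be done from the Unix command line. The sketch below is an alternative to the Notepad++ route, not the method shown on the slides: it builds a small hypothetical script with Windows line endings (carriage return plus line feed) and strips the carriage returns with tr. Where the dos2unix utility is installed, it does the same job.

```shell
# Hypothetical script saved on Windows: every line ends in CR+LF.
script=$(mktemp)
printf '#!/usr/local/bin/perl\r\nprint "hello\\n";\r\n' > "$script"

# A literal carriage return character, used as the search pattern.
cr=$(printf '\r')

# Count the lines that contain a carriage return before the fix.
before=$(grep -c "$cr" "$script" || true)

# Delete every carriage return and write the result to a new file.
fixed=$(mktemp)
tr -d '\r' < "$script" > "$fixed"

# Count again after the fix; zero means the file is in Unix format.
after=$(grep -c "$cr" "$fixed" || true)

echo "lines with CR before: $before, after: $after"
```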
Another error message. This is because the Perl interpreter is installed in another location (/usr/bin/), while the script looks for it in /usr/local/bin.
Solution: create a symbolic link to /usr/bin/perl in /usr/local/bin. Command: ln
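How a symbolic link redirects one path to another can be tried safely in a scratch directory. This is a minimal sketch under stated assumptions: the sandbox directories and the stand-in "perl" file are hypothetical, and the real-system form in the final comment is the usual ln -s TARGET LINK_NAME pattern, which typically needs administrator rights.

```shell
# Sandbox directories standing in for /usr/bin and /usr/local/bin (hypothetical).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/usr/bin" "$sandbox/usr/local/bin"

# Stand-in for the real interpreter found at /usr/bin/perl.
printf '#!/bin/sh\necho "perl stand-in"\n' > "$sandbox/usr/bin/perl"
chmod +x "$sandbox/usr/bin/perl"

# ln -s TARGET LINK_NAME: create the link at the path the script expects.
ln -s "$sandbox/usr/bin/perl" "$sandbox/usr/local/bin/perl"

# Running the link runs the target it points to.
link_output=$("$sandbox/usr/local/bin/perl")
echo "$link_output"

# Assumed equivalent on a real system (usually requires administrator rights):
#   ln -s /usr/bin/perl /usr/local/bin/perl
```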
Congratulations! You’ve just parsed a BLAST output!
This is the file you’ve just generated.
Let’s see what the file looks like, using “less”.
Here is what it looks like.
The parsed output is tab-delimited and can be imported into Excel for better visualization.
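A tab-delimited file can also be checked on the command line before importing it. The rows and column names below are hypothetical sample data, not the real parsed output; the sketch only illustrates confirming a consistent column count and pulling out one column.

```shell
# Hypothetical tab-delimited rows; the real file's columns may differ.
parsed=$(mktemp)
printf 'query\thit_accession\tevalue\n' > "$parsed"
printf 'seq1\tACC0001\t1e-50\n' >> "$parsed"
printf 'seq1\tACC0002\t3e-20\n' >> "$parsed"
printf 'seq2\tACC0003\t2e-05\n' >> "$parsed"

# Distinct field counts across all rows; a single value means the table is regular.
field_counts=$(awk -F '\t' '{ print NF }' "$parsed" | sort -u)

# Second column without the header row, joined onto one line.
accessions=$(tail -n +2 "$parsed" | cut -f 2 | paste -s -d ' ' -)

echo "fields per row: $field_counts"
echo "accessions: $accessions"
```

If more than one field count is reported, some row has a missing or extra tab and may land in the wrong spreadsheet columns.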
Locate the file in Windows
Header row
Query sequence
Accession numbers of the top 3 hits
E values of the top 3 hits
Descriptions of the top 3 hits
Information of each HSP of the top 3 hits