Nimble Perl Programming Using Scriptome
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
1/22/2009
http://lane.stanford.edu
Objectives
Determining whether Scriptome can …
1. Enable you to perform operations otherwise difficult/time-consuming/error-prone?
2. Help you learn Perl?
Also, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery …
And don’t worry: This experiment won’t hurt a bit!
So What Is Scriptome?
Scriptome is a resident Perl program that performs various data manipulation tasks useful to biologists
Originally developed by Harvard’s FAS Center for Systems Biology
Maintained and extended by many volunteers not associated with Harvard
Why Bother With Scriptome?
Code is visible, so you can learn how to do things in Perl … or not
Can handle arbitrarily large files
  No size limitations of the kind Excel imposes
Free; runs on everything: PC, Mac, Linux
It’s programmatic!
  Much faster than manual operations
  You can string operations together and save them in, e.g., a .bat file
How Do You Use Scriptome?
You tell Scriptome which function you want it to perform (more later)
You can also string Scriptome functions into a protocol
Input: Scriptome operates on text files
  No binary files, but you could add that capability yourself
  E.g., process Excel files in native form using Perl modules such as Spreadsheet::ParseExcel
Output: printed to the command line or written to another file
Scriptome: Pick Your Flavor
http://lane.stanford.edu/howto/index.html?id=_1257
http://sysbio.harvard.edu/csb/resources/computational/scriptome/
Installing Scriptome - Windows
1. Download Scriptome_exe.tar.gz using this link:
   http://sysbio.harvard.edu/csb/resources/computational/scriptome/bin/Scriptome_exe.tar.gz
2. Create a directory named “Scriptome”
   → Final location: I suggest C:\Program Files\Scriptome
3. Decompress Scriptome_exe.tar.gz by double-clicking
   → Notice the four files inside
4. Update the PATH variable: add this string at the END of the contents of the PATH variable:
   ;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat
Scriptome Usage
1. Using a specific tool:
   Scriptome flags toolname [input_filenames] [> output_filename]
   Example:
   Scriptome -t change_fasta_to_tab LONGhmcad.fst
2. Finding a tool by type:
   Scriptome -t tooltype
   where tooltype = Calc, Choose, Sort, Fetch, Merge, or Change
   Example:
   Scriptome -t Calc
Let’s examine each area briefly before going over specifics…
Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
Examples and noteworthy tools
Calc Tool Examples - 1
Compute column sums:
    Scriptome -t calc_col_sum SubjectData1.tab
→ select columns to add
IMPORTANT: column numbers start at 0, not 1
Note the visible Perl code → easy to modify, expand

    perl -e "
    $col=1;
    while(<>) {
        s/\r?\n//;
        @F=split /\t/, $_;
        $sum += $F[$col];
    }
    warn qq~\nSum of column $col for $. lines\n\n~;
    print qq~$sum\n~
    " file.tab
Calc Tool Examples - 2
Compute row sums:
    Scriptome -t calc_row_sum SubjectData1.tab
→ enter 1 for column 1, 2 for column 2, etc. (numbering starts at 0, so column 1 is the second column in the file)

    perl -e "
    @cols=(1, 2, 3);
    while(<>) {
        s/\r?\n//;
        @F=split /\t/, $_;
        $sum = 0;
        foreach $col (@cols) {
            $sum += $F[$col]
        };
        print qq~$_\t$sum\n~;
    }
    warn qq~\nSum of columns @cols for each line ($. lines)\n\n~
    " in.tab
Change Tool Examples - 1
Create tab-delimited file from FASTA file:
    Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab
→ change_fasta_to_tab is an important tool because many Scriptome tools use tab-delimited files

    perl -e "
    $count=0;
    $len=0;
    while(<>) {
        s/\r?\n//;
        s/\t/ /g;
        if (s/^>//) {
            if ($. != 1) {
                print qq~\n~
            }
            s/ |$/\t/;
            $count++;
            $_ .= qq~\t~;
        }
        else {
            s/ //g;
            $len += length($_)
        }
        print $_;
    }
    print qq~\n~;
    warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~;
    " seqs.fna
Change Tool Examples - 2
Change rows to columns or vice versa:
    Scriptome -t change_transpose_table SubjectData1.tab
Note: change_transpose_table operates on tab-delimited files
Change Tool Examples - 3
Convert a sequence file from one format to another:
    Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
enter ‘fasta’ as input format (no quotes)
enter ‘genbank’ as output format (no quotes)
change_bio_format_to_bio_format addresses the common problem of converting formats
Important: requires BioPerl to be installed
Note: the code shown below is set up for the reverse direction (GenBank in, FASTA out); set $informat and $outformat to match your files

    perl -MBio::SeqIO -e "
    $informat= qq~genbank~;
    $outformat= qq~fasta~;
    $count = 0;
    for $infile (@ARGV) {
        $in = Bio::SeqIO->newFh(-file => $infile , -format => $informat);
        $out = Bio::SeqIO->newFh(-format => $outformat);
        while (<$in>) {
            print $out $_;
            $count++;
        }
    }
    warn qq~Translated $count sequences from $informat to $outformat format\n~
    " myseqs.genbank > myseqs.fasta
Conclusions
Scriptome is …
A good solution for manipulating medium to large data files quickly and reliably
A way to learn Perl in a “real” context (no toy problems)
Able to perform a wide range of tasks, from simple, generic file manipulations to complex bio-specific tasks
Resources
For Perl help, see the resources listed in the workshop description for Lane’s Perl Programming for Biologists
Some recommended titles:
Polling Time: Do you think Scriptome will be useful to your research?
1. Definitely
2. Likely
3. Not likely
4. No way
5. What’s the question again?