Perl: subroutines - University of Iowa

Download Report

Transcript Perl: subroutines - University of Iowa

Perl: subroutines (for sorting)
1
Good Programming
Strategies for Subroutines
#!/usr/bin/perl
# example why globals are bad
$one = <STDIN>;
$two = <STDIN>;
$max = &larger;
print "$max\n";
sub larger {
if ($one > $two) {
$one;
}
else {
$two;
}
In this example, it is tedious to use the same code for 2 different variables,
such as $fred and $barney
2
#!/usr/bin/perl
# example why globals are bad
$one = <STDIN>;
$two = <STDIN>;
$max = &larger;
print "$max\n";
$fred = <STDIN>; #hack to make "larger" work with 2 different variables
$barney = <STDIN>;
$keep_one = $one; # redundant stuff
$keep_two = $two;
$one = $fred;
$two = $barney;
$max = &larger;
print "$max\n";
$one = $keep_one;
$two = $keep_two;
sub larger {
if ($one > $two) {
$one;
}
else {
$two;
}
#there HAS to be a better way
3
Arguments Passed to Subroutines
• To pass parameters to a subroutine,
simply list them in parentheses after the
subroutine call
$n = &larger (10, 15);
$X = &larger ($one, $two);
4
Arguments Passed to Subroutines
New automatic variable (@_)
• parameters passed to a subroutine are automatically stored in this
array – as many as are required
• recall array syntax: $_[0] is first element of @_ array
– do NOT confuse this with $_ -- the default variable
– example: $i is scalar, and distinct from @i and $i[0] – both arrays
• scope – the variables in @_ are local to the subroutine
• this means that when the subroutine is called, the values of the
variables used in calling the subroutine are copied into the
automatic argument array – so changes are only local to the
subroutine
• This also means that if a subroutine calls a subroutine, the local
scope of @_ is maintained – there would be two separate and
distinct versions of @_
– this is why recursion works
– this is taken care of by perl – we don't have to worry about it – we just
have to understand that it occurs
5
Arguments Passed to Subroutines
We can re-write our subroutine with parameter passing as:
#!/usr/bin/perl
$x = 5;
$y = 18;
$max = &larger ($x, $y); #call the subroutine
print "$max";
###########################
sub larger {
if($_[0] > $_[1]) {
$_[0];
#local copy of X
}
else {
$_[1];
#local copy of Y
}
However, this is hard to read, write, check, and debug
Another problem – what happens if an extra parameter is included – you don't know from the context
of the name of the subroutine "larger" that it only accepts 2 parameters.
$x = larger(10, 15, 30); #30 will be ignored
6
Arguments Passed to Subroutines
• all variables in perl are global by default
• you can specify local variables with "my"
sub larger {
my ($a, $b); #my $a; my $b;
($a, $b) = @_; #list assignment
# $a = $_[0];
# $b = $_[1];
if ($a > $b) { $a; } else {$b;}
#note, you may omit ;'s for single lines of code within a
#block
}
7
Argument Passing to Sub's
• The variables $a and $b are private ("scoped")
to only this code block – the subroutine
• They will not affect any other $a/$b in the
program
• Changes to $a/$b in other parts of the program
will NOT affect $a/$b in this subroutine
• The subroutine is modular and reusable, it can
be placed into virtually any perl program and
operate predictably.
8
More condensed without comments
sub larger {
my ($a, $b);
($a, $b) = @_;
if ($a > $b) { $a; } else {$b;}
}
# even simpler
sub larger {
my ($a, $b) = @_; #subroutines often have this line
if ($a > $b) { $a; } else {$b;}
}
9
Variable-length Parameter Lists
• many traditional programming languages require subroutine
parameter lists to be strictly typed
– predefined number and type of parameters
• this may be enforced in perl as well
sub larger {
if(@_ != 2) { # scalar context
print "Warning: 2 arguments expected\n"
}
:
:
}
10
Another way:
Variable-length Parameter Lists
#!/usr/bin/perl
$max = &larger (3, 5, 10, 4, 5);
sub larger{
my ($largest) = shift @_; # shift element off of the LHS of the array
foreach (@_) {
# default variable used ($_)
if($_ > $ largest) {
$largest = $_;
}
}
$largest;
}
11
Pragma: use strict;
• A pragma is used to convey information to the perl compiler
• use strict;
– tells perl's compiler that it should enforce some good programming rules
for this code block
– essentially forces you to declare all variables with "my"
• -w (warnings pragma) – already talked about
• same as: use warnings;
• -w applies to all code (modules, subroutines, etc.)
#!/usr/bin/perl
$i =~ m/tag/; #warnings:
Name "main::i" used only once: possible typo at test.pl line2
# Use of uninitialized value in pattern match (m//) at line 2.
12
use strict;
#!/usr/bin/perl
use strict;
$i++;
$i =~ m/tag/;
# Global symbol "$i" requires explicit package name at ./strict.pl line 3.
# Global symbol "$i" requires explicit package name at ./strict.pl line 4.
# Execution of ./strict.pl aborted due to compilation errors.
perldoc warnings
perldoc perllexwarn
perldoc strict
More useful than appears:
example
13
#!/usr/bin/perl
# maxBad.pl
# Example program to show use of subroutines and strict
use strict;
my @numbers = (1, 10, 11, 3, 9, 8, 5, 3);
my $max;
$max = &max(@numbers);
print "max = $max\n";
if($number[1] == $max) {
print "The largest number is at position 1 in array\n";
}
sub max {
my $large = shift @_;
foreach my $i (@_) {
if($i > $large) {
$large = $i;
}
$large;
}
Global symbol "@number" requires explicit package name at ./maxBad.pl line 12.
Execution of ./maxBad.pl aborted due to compilation errors.
Missing right curly or square bracket at ./maxBad.pl line 24, at end of line
syntax error at ./maxBad.pl line 24, at EOF
14
Comment on Pragmas
use warnings;
use strict;
Ideally these would be used from the beginning of
every program that is longer than a few lines.
It can be quite a challenge to take a large and
complicated program and clean up all the
warnings/errors
Adding these checks after the program is written
and debugged defeats the benefit of reducing
development time by finding "mistakes".
15
return Operator
•
return EXPRESSION
– immediately returns a value from a subroutine
my @seqs = qw/TTT GTC CTG ATG GTA CGA/;
my $index = &which_codon_is("ATG", @seqs);
sub which_codon_is {
my ($this_codon, @list) = @_;
foreach (0..$#list) { # indices of list
if($this_codon eq $list[$_] ) { # missing ) on web
return $_;
}
}
return(-1); # -1 if not found
}
# return is optional here – could just put -1
16
#!/usr/bin/perl
# translate
# Take input from STDIN,
# convert DNA to AA's
#
# Modified -- to have function call another function
$line = <>;
while( $line = <>) {
chomp($line); #take care of new lines
$sequence = $sequence.$line;
}
# amino acid hash
# key values pairs
%aminos = ( "TTT", "F", "TTC", "F", "TTA", "L", "TTG", "L",
"CTT", "L", "CTC", "L", "CTA", "L", "CTG", "L",
"ATT", "I", "ATC", "I", "ATA", "I", "ATG", "M",
"GTT", "V", "GTC", "V", "GTA", "V", "GTG", "V",
"TCT", "S", "TCC", "S", "TCA", "S", "TCG", "S",
"CCT", "P", "CCC", "P", "CCA", "P", "CCG", "P",
"ACT", "T", "ACC", "T", "ACA", "T", "ACG", "T",
"GCT", "A", "GCC", "A", "GCA", "A", "GCG", "A",
"TAT", "Y", "TAC", "Y", "TAA", ".", "TAG", ".",
"CAT", "H", "CAC", "H", "CAA", "Q", "CAG", "Q",
"AAT", "N", "AAC", "N", "AAA", "K", "AAG", "K",
"GAT", "D", "GAC", "D", "GAA", "E", "GAG", "E",
"TGT", "C", "TGC", "C", "TGA", ".", "TGG", "W",
"CGT", "R", "CGC", "R", "CGA", "R", "CGG", "R",
"AGT", "S", "AGC", "S", "AGA", "R", "AGG", "R",
"GGT", "G", "GGC", "G", "GGA", "G", "GGG", "G",
);
$peptide = &tlate($sequence);
print "$peptide\n";
############# Subroutines
sub tlate
{
$seq = $_[0];
$peptide="";
$length = length($seq);
while($length>=3)
{
($codon,$seq) = getCodon($seq);
if($aminos{$codon})
{
$peptide = $peptide.$aminos{$codon};
}
else {
$peptide = $peptide."."; #just put in a stop for XXX,
ATX, etc
}
$length = length($seq);
#print "$length\r";
}
print "\n";
return($peptide)
}
sub getCodon
{
$sequence = $_[0];
$codon = substr($sequence,0,3); # look up codon
$sequence = substr($sequence,3,$length-3); #
remove that codon
#print "codon=$codon\n";
return($codon,$sequence);
}
17
Output
./subsubRoutine.pl
Main program
s1 a = 4 b = 6
s2 a = 6 b = 10
return1 = 60
val =
product =
18
Sort Subroutine
A sort subroutine might (INCORRECTLY?) be expected to take 2
values, compare those values, and return them in sorted order:
# POOR example
sub sort_sub {
my $a = shift;
my $b = shift;
# my ($a, $b) = @_; #same thing
if ($a > $b) { return ($b, $a)}
else {return ($a, $b)}
}
19
Sort Subroutine Problems
• this subroutine may be called hundreds/thousands of times
• inefficient because
– allocate variables
– assign values
– return 2 values
• alternative way
– both $a and $b have been allocated in the "calling" code block
– return one coded value
• -1 if $a < $b #Don't swap
• 1 if $a > $b # swap
• 0 if order doesn't matter
• Note – sorting is a special case that is performed so often that it has
been highly optimized (that's why it gets all of this "special"
attention)
20
numeric sort subroutine
sub by_num {
# expect $a and $b
if($a < $b) {-1} elsif ($a > $b) {1} else {0}
} # ;'s are optional if one line in code block
21
"sort" with custom subroutine
The custom sorting subroutine (without &) may be specified for the "sort" operator in perl. It assumes
the -1, 1, 0 cases
#!/usr/bin/perl
use strict;
my @nums = qw/5 2 9 12 37 2 18/;
@results = sort by_num @nums;
sub by_num {
if($a < $b) {-1} elsif ($a > $b) {1} else {0}
}
Note, in "by_num", we need don't need to declare $a (even with strict)
If we do, this will not work correctly.
This is an example of "pass by reference"
The values of the array are passed to the subroutine by reference – the actual values are not
passed back and forth.
That is why this works – and why changing $a changes the array.
22
sort Syntax (pg 217)
sort SUBNAME LIST
sort BLOCK list
sort LIST
#Note, this is a special case where the
subroutine is called WITHOUT the &
23
Example
#!/usr/bin/perl
# sort.pl
#
@array = (9, 8, 7, 1, 2, 3, 9, 69, 5,);
@sorted = sort by_num @array;
print "sorted = @sorted\n";
sub by_num {
print "$a $b\n";
if($a < $b) {-1} elsif ($a > $b) {1} else {0}
}
# example -- sort-brok.pl
24
Shortcut to Sorting
• sorting like this is so common, a special operator exists
to replace:
if($a < $b) {-1} elsif ($a > $b) {1} else {0}
Therefore, we can also replace the whole subroutine:
@results = sort by_num @nums;
@results = sort { $a <=> $b} @nums;
Descending order:
@results = sort { $b <=> $a} @nums;
cmp can be used for string comparisons – but this is how
the sort operator already sorts.
25
Example Profile
#!/usr/bin/perl
# sort-prof.pl (for profile)_
#
# Note how if I change the value of $a, it changes in the original array
use strict;
my @nums = qw/5 2 9 12 37 2 18/;
my @results = sort by_num @nums;
print "@nums\n";
print "@results\n";
sub by_num {
print "$a $b\n";
if($a == 5) { $a = 55}
if($a < $b) {-1} elsif ($a > $b) {1} else {0}
}
26
Example Profile
52
9 12
37 2
29
9 55
55 12
2 18
18 37
22
29
9 18
18 12
18 55
55 37
55 2 9 12 37 2 18
2 2 9 12 18 37 55
27
Sorting a Hash
#!/usr/bin/perl
use strict;
my %score = ("tim" => 195, "tracy" => 205, "indy" => 30);
# sort keys of hash, based on values of hash
my @winners = sort by_score keys %score;
foreach (@winners) {
print "$_ $score{$_}\n";
}
sub by_score { $score{$b} <=> $score{$a} } #note descending order
#tracy 205
#tim 195
#indy 30
28
Regular Expression Example
#!/usr/bin/perl
$i = "4-18-69";
# match 1 digit, dashe, 2 digits, dash, and 2 digits
if($i =~ m/\d-\d\d-\d\d/) {
print "$i\n";
print "$&\n";
}
#But what happens if day is 1 digit
#match both cases
$i = "4-1-69";
if($i =~ m/\d-(\d\d|\d)-\d\d/) {
print "$i\n";
print "$&\n";
}
#Match 1 or more digits
#But, this also matches 100 for day
$i = "4-1-69";
if($i =~ m/\d-\d+-\d\d/) {
print "$i\n";
print "$&\n";
}
#What about 2 digit years VS 4 digit years
# Carefull, this will match 19 before 69
$i = "4-1-1969";
if($i =~ m/\d-\d+-\d\d|\d\d\d\d/) {
print "$i\n";
print "$&\n";
}
#
$i = "4-1-1969";
if($i =~ m/\d-\d+-(\d\d\d\d|\d\d)/) {
print "$i\n";
print "$&\n";
}
#What about 2 digit months
# But this will match 13
$i = "12-1-1969";
if($i =~ m/\d+-\d+-\d\d\d\d|\d\d/) {
print "$i\n";
print "$&\n";
}
#How about this
#Doesn work, becuase it matches 1
$i = "13-1-1969";
if($i =~ m/1|2|3|4|5|6|7|8|9|10|11|12-\d+-\d\d\d\d|\d\d/) {
print "$i\n";
print "$&\n";
}
# Added ^ and $
# Added parenthises for grouping
# Will NOT match because of 14
$i = "14-1-1969";
if($i =~ m/^(1|2|3|4|5|6|7|8|9|10|11|12)-\d+-(\d\d\d\d|\d\d)$/) {
print "$i\n";
print "$&\n";
}
# Now it will match
$i = "12-1-1969";
if($i =~ m/^(1|2|3|4|5|6|7|8|9|10|11|12)-\d+-(\d\d\d\d|\d\d)$/) {
print "$i\n";
print "$&\n";
}
# Note limit on days to 1 or 2 digits
# also, storing results into vars
$i = "9-28-1969";
if($i =~ m/^(1|2|3|4|5|6|7|8|9|10|11|12)-(\d{1,2})-(\d\d\d\d|\d\d)$/) {
print "$i\n";
print "$&\n";
$month = $1;
$day = $2;
$year = $3;
print "$month $day $year\n";
}
29
30
End
31