Intermediate Perl - University of Minnesota

Intermediate Perl
by
Benjamin J. Lynch
[email protected]
Introduction
- Perl is a powerful interpreted language that takes very little knowledge to get started. It can be used to automate many research tasks with little effort.
- The greatest strength and weakness of Perl is the ability to accomplish the same task with two very different pieces of code.
Outline
- Review of Perl
  - Variable types
  - Context
  - Operators
  - Control structure
  - Pattern Matching
  - Subroutines
  - Context
  - References
  - grep
  - map
  - modules
When should I use Perl?
- Perl stands for: Practical Extraction and Report Language
- Perl is most useful for:
  - parsing files to extract desired data
  - doing almost anything you can do in a shell script
  - CGI scripts to generate HTML for web pages
  - updating or retrieving information from databases
  - acting as an interface between programs
Programming Style
- Questions you should ask:
  - Who else might look at the code?
    - Co-workers?
    - Complete strangers?
  - How often will the code be modified?
- Remember your target audience
- There is no substitute for comments
An Interpreted Language
- Perl programs are also called Perl scripts because Perl is an interpreted language.
- When you execute a Perl script, the script is first compiled into a set of instructions for the Perl interpreter.
- This set of instructions (a parse tree) is then run by the Perl interpreter.
- The Perl interpreter shares many similarities with the virtual machine in Java.
- There is no need to compile a Perl script as a separate preliminary step, which makes Perl scripts similar to shell scripts (at least on the exterior).
A Simple Perl Script
#!/usr/bin/perl
print "Hello world! \n";

blynch@msi[~] % chmod +x hello.pl
blynch@msi[~] % ./hello.pl
Hello world!
blynch@msi[~] %

\n is a newline.
The print routine prints the item or list of items that follows it.
Variable Types
- Scalar
- Reference (scalar pointer to another variable)
- List (array)
- Hash (associative array)
Scalars
- Examples:
$var = '3';
$name = "Larry";
$float = 1.1235813;
$sum = $a + 1.2;
- A scalar is a single value.
$number = 1;
$text = 'Hello world!';
$a = 1.2;
$b = 1.3;
$sum = $a + $b;
print "$sum \n";
2.5
The scalar can be:
  integer
  64-bit floating point
  string
  reference
The way that the data is stored (integer, floating point, …) does not need to be specified. The Perl interpreter will determine it automatically.
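As a small illustration of the point above, the sketch below stores a number as a string and lets Perl convert it as each operator requires. The variable names and values are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only for illustration.
my $count = '3';                  # stored as a string
my $total = $count + 4;           # + treats the scalar as a number
my $label = $total . ' items';    # . treats the number as a string

print "$total\n";
print "$label\n";
```

The same scalar is used as a number and as a string without any declaration of its type.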
Lists
- A list (or array) of values can be specified like:
@number_list = (1,1,2,3,5,8,13,21);
@grocery_list = ('apples','chicken','canned soup');
- An array variable name always starts with @
Lists (arrays)
@mylist = (1,2,2,3,4,4,4);
@names = ('Larry', ' Moe');
push(@names, ' Curly');    # push adds an item to the end of a list
print @names;
Larry Moe Curly
Lists
A list (or array) of values
@grocery_list = ('apples','chicken','canned soup');
print $grocery_list[2];
canned soup
Note the numbering of elements: the first element has index 0, so index 2 is the third element.
A '$' is used in the print statement because of the context. We only want print to handle a single value from the array, and so we use $ to denote the scalar context.
Hashes (associative arrays)
- A hash is an associative array
- Instead of using an integer index, a hash uses a key to access elements of the hash
%lunch = ('monday'    => 'pizza',
          'tuesday'   => 'burritos',
          'wednesday' => 'sandwich',
          'thursday'  => 'fish');
print "on Tuesday I'll eat $lunch{'tuesday'}";
on Tuesday I'll eat burritos
Hashes (associative arrays)
- A hash can be created with a list of key/value pairs.
- Each key has one value associated with it.
%hash = ('Larry' => 1, 'Moe' => 2, 'Curly' => 3);
%hash = ('Larry' , 1, 'Moe' , 2, 'Curly' , 3);
Either of these works to specify a hash.
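Because each key has exactly one value, assigning to an existing key replaces its value rather than adding a second entry. A minimal sketch follows; the keys 'friday' and 'sunday' and the new values are invented for the example.

```perl
use strict;
use warnings;

my %lunch = ('monday' => 'pizza', 'tuesday' => 'burritos');

$lunch{'friday'} = 'soup';     # new key: the hash grows by one pair
$lunch{'monday'} = 'salad';    # existing key: the old value is replaced

my $pairs      = scalar(keys %lunch);                  # number of keys
my $has_sunday = exists $lunch{'sunday'} ? 'yes' : 'no';

print "$pairs pairs, monday is $lunch{'monday'}, sunday present: $has_sunday\n";
```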
Variable Context
@number_list = (1,1,2,3,5,8,13,21);
@grocery_list = ('apples','chicken','canned soup');
print @grocery_list;
appleschickencanned soup
If we use the array (or list) context, the print command will print out all elements of the array.
Variable Context
@number_list = (1,1,2,3,5,8,13,21);
@grocery_list = ('apples','chicken','canned soup');
print $grocery_list[1];
chicken
If we use the scalar context, we must specify the element we want to print from the list.
Variable Context
@grocery_list = ('apples','chicken','canned soup');
$var = @grocery_list;
print $var;
3
If we request a scalar from an array, the array returns its length (the number of elements).
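The three cases above can be put side by side. This is only a sketch using the same grocery list; the variable names $count, $first, $last, and $also are assumptions made for the example.

```perl
use strict;
use warnings;

my @grocery_list = ('apples', 'chicken', 'canned soup');

my $count   = @grocery_list;            # scalar context: number of elements
my ($first) = @grocery_list;            # list context: takes the first element
my $last    = $grocery_list[-1];        # one element, counted from the end
my $also    = scalar(@grocery_list);    # scalar context forced explicitly

print "$count $first $last $also\n";
```

The parentheses around $first are what switch the assignment from scalar context to list context.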
Perl Operators
$mass * $height         Multiplication
$a + $b                 Addition
$a - $b                 Subtraction
$a / $b                 Division
$str1 . $str2           Concatenate
$count++                Increment $count by 1
$missing--              Decrease $missing by 1
$total += $subtotal     Increase $total by $subtotal
$interest *= $factor    Set $interest to $interest*$factor
$string .= $more        Append $more to the end of $string
rand
rand($num)
returns a random, double-precision floating-point number greater than or equal to 0 and less than $num.
$var = rand(4);
Control structure
#!/usr/bin/perl
@my_grocery_list = ('apples','chicken','canned soup');
foreach $item (@my_grocery_list){
    &purchase($item);
}
while ( some condition is true ){
    &do_this;
}
Control structure
Two ways to write if/then
if ($condition) {print "It is true \n"}
print "It is true \n" if $condition;
Retrieving a random element from a list
@greeting = ('Hello','Greetings','Hola','Howdy');
print $greeting[rand @greeting];
Each line below is one step in how Perl evaluates that statement (for one possible random draw):
print $greeting[rand 4];
print $greeting[2.59196266661263168];
print $greeting[2];
print 'Hola';
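The evaluation steps above can also be written out explicitly. In this sketch the call to int() is an addition for clarity: it performs the truncation that the array subscript otherwise does implicitly.

```perl
use strict;
use warnings;

my @greeting = ('Hello', 'Greetings', 'Hola', 'Howdy');

# rand evaluates @greeting in scalar context (4) and returns a float in [0, 4).
# int() drops the fractional part, leaving a valid index 0..3.
my $index = int(rand(@greeting));
my $pick  = $greeting[$index];

print "$pick\n";
```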
Subroutines
- Defined like this:
sub my_sub_name {
    do something
}
- Used like this:
&my_sub_name(variables passed);
Subroutines
- Variables passed to a subroutine enter the routine as a single list
@list1 = ('a ','b ','d ');
$scalar = 42;
&mysub(@list1, $scalar);
sub mysub{
    print @_;
}
a b d 42
Returning values from subroutines
- Subroutines return whatever is returned by the return statement, or else the value of the last expression evaluated in the subroutine.
@list1 = (2,3);
print &mymult(@list1);
sub mymult{
    $product = $_[0]*$_[1];
    return $product;
}
6
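Both ways of returning a value can be compared directly. The sketch below is not from the slides: the subroutine names multiply and add are hypothetical, and the arguments are copied from @_ into named my variables, a common convention.

```perl
use strict;
use warnings;

# Explicit return statement.
sub multiply {
    my ($x, $y) = @_;    # copy the arguments out of @_
    return $x * $y;
}

# No return statement: the value of the last expression is returned.
sub add {
    my ($x, $y) = @_;
    $x + $y;
}

my $product = multiply(2, 3);
my $sum     = add(2, 3);

print "$product $sum\n";
```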
Pattern Matching
Perl uses a very robust pattern matching syntax.
The most basic pattern match looks like:
$string =~ /some pattern/
In Perl, anything but undef, the empty string '', and 0 (or the string '0') is considered TRUE.
$string = ' 1 2 three';
if ($string =~ /2/) {
    print "the number 2 is in the string\n"
}
Pattern Matching
$, = "\n";
$string = "1 2 hello 2 5";
$matching = ($string =~ /\d/);
print $matching;
1
@matches = ($string =~ /\d/g);
print @matches;
1
2
2
5
g is for global. This will allow the pattern to be matched multiple times.
\d will match any single digit.
Pattern Matching
$, = "\n";
$string = '1.45 1.482 1.938 other text 10.2849';
print ($string =~ /\d.\d+/g);
1.45
1.482
1.938
0.2849
\d matches a single digit, so the leading 1 of 10.2849 is not part of the last match.
Pattern matching
/pattern/
/(sub-expression1)(sub-expression2)/
\d                  digit
\s                  whitespace
\S                  non-whitespace
pattern{2}          will match pattern exactly twice
[character list]    defined character class, e.g. [abcDEF]
[^a]                NOT 'a'
|                   OR statement - it will match the pattern on either side
/(bb|[^b]{2})/      This is written on a T-shirt I own
Reading /(bb|[^b]{2})/ piece by piece:
bb                  the literal characters bb
|                   OR statement
[ ]                 new character class
^b                  NOT 'b'
{2}                 we want 2 of them
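To see what the T-shirt pattern accepts, it can be tried against a few short strings. This is only a sketch, and the test strings are invented for the example.

```perl
use strict;
use warnings;

my @results;
foreach my $text ('bb', 'xy', 'b', 'ab') {
    # Matches either the literal "bb" or two consecutive characters, neither of which is "b".
    my $verdict = ($text =~ /(bb|[^b]{2})/) ? 'match' : 'no match';
    push @results, "$text: $verdict";
}

print "$_\n" for @results;
```

The string 'ab' fails because its second character is a 'b' and it does not contain 'bb'.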
Pattern Matching
-----------------------------------------------
Charge Models 2 and 3 (CM2 and CM3) and
Solvation Model SM5.42 GAMESSPLUS version 4.3
-----------------------------------------------
Gas-phase
-----------------------------------------------
 Center   Atomic   CM3      RLPA     Lowdin
 Number   Number   Charge   Charge   Charge
-----------------------------------------------
 1        3        .218     -1.090   -.938
Gas-phase dipole moment (Debye)
-----------------------------------------------
           X        Y        Z        Total
 CM3       -.718    -.592    -1.748   1.980
 RLPA      -.327    1.122    -.840    1.440
 Lowdin    -.116    1.662    -.761    1.832
-----------------------------------------------
Pattern Matching
Each pattern below captures the fourth number on the CM3 line (the total dipole moment); each is shorter than the one before it.
if (/ CM3\s+([-]?\d*\.\d+)\s*([-]?\d*\.\d+)\s*([-]?\d*\.\d+)\s*([-]?\d*\.\d+)\s*/) {
    $amsol[9]=$4;
}
if (/ CM3\s+([-]?\d*\.\d+\s*){3}([-]?\d*\.\d+)\s*/) {
    $amsol[9]=$2;
}
if (/ CM3\s+(-?\d*\.\d+\s*){3}(-?\d*\.\d+)\s*/) {
    $amsol[9]=$2;
}
if (/ CM3(\s+\S+){3}\s+(\S+)/) {
    $amsol[9]=$2;
}
Pattern Matching
if (/ CM3\s+(-?\d*\.\d+\s*){3}(-?\d*\.\d+)\s*/) {
    $amsol[9]=$2;
}
if (/ CM3(\s+\S+){3}\s+(\S+)/) {
    $amsol[9]=$2;
}
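The shortest pattern can be checked against the CM3 dipole line of the sample output. In this sketch the line is held in a variable rather than read from a file, and the names $line and $total are assumptions made for the example.

```perl
use strict;
use warnings;

# The CM3 row of the dipole-moment table, including its leading space.
my $line = ' CM3       -.718    -.592    -1.748   1.980';

my $total;
if ($line =~ / CM3(\s+\S+){3}\s+(\S+)/) {
    # $1 holds only the last repetition of the first group (-1.748);
    # $2 holds the fourth number, the total dipole moment.
    $total = $2;
}

print "$total\n";
```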
Substitutions
s/search pattern/replace/
$string = 'words9words383words';
$string =~ s/\d+/, /g;
print $string;
words, words, words
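A few common variations on the substitution above, as a sketch. The variable names $cleaned, $tagged, and $count are invented for the example.

```perl
use strict;
use warnings;

my $string = 'words9words383words';

# Copy first, then substitute in the copy, so the original is left untouched.
(my $cleaned = $string) =~ s/\d+/, /g;

# Text captured with parentheses can be reused in the replacement as $1.
(my $tagged = $string) =~ s/(\d+)/[$1]/g;

# s///g returns the number of substitutions it made.
my $count = ($string =~ s/\d+//g);

print "$cleaned\n$tagged\n$count $string\n";
```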
Special Variables
 $1, $2, $3, …
 Holds the contents of the most recent subpatterns matched
if ($string =~ /(Larry) (Moe) Curly/){
print $2
}
Moe
Special Variables
 $[
 Determines which index in a list is the first,
the default is 0.
my @mylist = (Larry, Moe, Curly);
print $mylist[1];
$[ = 1;
print $mylist[1];
Moe
Larry
Special Variables
 $&
 Entire pattern from most recent match
Special Variables
 $/
 Input record separator, default is \n
undef $/;
open(FILE,<input.txt);
$buffer=<INFILE>;
 $buffer contains the entire file
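A variation on the example above that limits the change to $/ to a single block. This is a sketch under stated assumptions: an in-memory filehandle stands in for input.txt so that the code runs on its own, and the sample text is made up.

```perl
use strict;
use warnings;

# Stand-in for the contents of input.txt (hypothetical data).
my $text = "line one\nline two\nline three\n";
open(my $fh, '<', \$text) or die "cannot open: $!";

my $buffer;
{
    local $/;           # $/ is undef only inside this block
    $buffer = <$fh>;    # one read returns everything up to end of file
}
close($fh);

print length($buffer), "\n";
```

Using local means that later reads elsewhere in the script go back to reading one line at a time.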
Special Variables
 $.
 Current line number
Special Variables
 $,
 Default separator used when a list is printed, default is
‘’
 $,=‘ ‘; will add a space between each item if you print
out a list.
 $\
 Default record separator, default is ‘’
 $\ = “\n”; will add a blank line after each print
statement.
Special Variables
- $^T    time the Perl program started running
- $|     autoflush
- $$     process ID number for Perl
- $0     name of the Perl script executed
- %ENV   hash containing the environment variables
Special Variables
- @ARGV is a list that holds all the arguments passed to the Perl script.
- @_ is a list of all the variables passed to the current subroutine
Special Variables
$_ is a variable that holds the current topic.
e.g.:
while (<FILE1>){
    print "line $. $_"
}
$. is the current line number.
$_ is the current line being processed in FILE1.
References
- A reference is a scalar
- Instead of a number or string, a reference holds the memory location of another variable or subroutine.
$myref = \$variable;
$subref= \&subroutine;
Dereferencing the Reference
- To retrieve the value a reference points to, you must dereference it.
$name = 'Larry';
$ref_name = \$name;
print $ref_name , "\n";
print $$ref_name, "\n";
SCALAR(0x60000000000218a0)
Larry
Dereferencing the Reference
- Modifying a dereferenced reference to a variable is the same as modifying the variable.
$name = 'Larry';
$ref_name = \$name;
$$ref_name .= ', Moe, and Curly';
print $$ref_name , "\n";
print $name, "\n";
Larry, Moe, and Curly
Larry, Moe, and Curly
Where do we want to use a reference?
- References are very useful when passing lists to a subroutine.
@mylist = ('Larry', 'Moe', 'Curly');
$list_ref = \@mylist;
&mysub($list_ref);
sub mysub {
    my $ref = $_[0];
    my @list = @$ref;
    print $list[2], "\n";
}
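References matter most when a subroutine needs more than one list, because lists passed directly are flattened into a single @_. The sketch below is an illustration only: the subroutine pair_up and the array @numbers are hypothetical.

```perl
use strict;
use warnings;

my @names   = ('Larry', 'Moe', 'Curly');
my @numbers = (1, 2, 3);

# Each argument is one scalar (a reference), so the two arrays stay separate.
sub pair_up {
    my ($names_ref, $numbers_ref) = @_;
    my @pairs;
    foreach my $i (0 .. $#{$names_ref}) {
        push @pairs, "$names_ref->[$i]=$numbers_ref->[$i]";
    }
    return @pairs;
}

my @pairs = pair_up(\@names, \@numbers);
print "@pairs\n";
```

Had the call been pair_up(@names, @numbers), the subroutine would have received six scalars with no way to tell where the first array ended.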
Where do we want to use a reference?
- References are very useful when passing lists, hashes, or subroutines to a subroutine.
%myhash = (1 => 'Larry', 2 => 'Moe', 3 => 'Curly');
$hash_ref = \%myhash;
&mysub($hash_ref);
sub mysub {
    my $ref_inside = $_[0];
    print $$ref_inside{2}, "\n";
    print ${$_[0]}{2}, "\n";
}
Both print the same thing (Moe).
We can even pass subroutines
$sub_ref = \&my_subroutine;
&run_this($sub_ref);
sub run_this {
    my $ref = $_[0];
    &$ref;
}
GREP
@matching_lines=grep(/expression/,@input_lines);
@matching_lines=grep {/expression/} @input_lines;
@no_comments=grep {!/^#/} @lines_of_code;
open(FILE1,"<mycode.pl");
@no_comments=grep {!/^#/} <FILE1>;
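A runnable sketch of the comment filter, using an in-line list in place of a file. The sample lines are invented, and the pattern is loosened with \s* (an addition to the slide's version) so that indented comments are removed as well.

```perl
use strict;
use warnings;

# Hypothetical lines of code, standing in for the contents of mycode.pl.
my @lines_of_code = (
    "#!/usr/bin/perl\n",
    "# a comment\n",
    "print 1;\n",
    "  # an indented comment\n",
    "print 2;\n",
);

# Keep lines that do not start with optional whitespace followed by #.
my @no_comments = grep { !/^\s*#/ } @lines_of_code;

# In scalar context grep returns the number of matching items.
my $print_count = grep { /print/ } @lines_of_code;

print @no_comments;
print "$print_count\n";
```

Note that this filter also drops the #! line, since it starts with #.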
MAP
- map BLOCK @array
- Returns the list generated by executing BLOCK for each value of @array
foreach $number (@mylist){
    print $number+1;
}
print map {$_+1} @mylist;
MAP
- The block can have any amount of code or subroutine calls
map {&mysub($_)} @array;
map {
    &sub1($_);
    &sub2($_);
    $a=1;
} @array
This map would simply return a list of 1s.
The last value evaluated is returned.
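Two small uses of map, as a sketch: one builds a new list, the other builds a hash by returning a key/value pair for each item. The sample data and variable names are made up for the example.

```perl
use strict;
use warnings;

my @mylist = (1, 2, 3);

# One output value per input value.
my @plus_one = map { $_ + 1 } @mylist;

# Two output values (a key and a value) per input value, collected into a hash.
# The inner parentheses make it unambiguous that the braces are a block.
my %length_of = map { ($_ => length($_)) } ('apples', 'chicken');

print "@plus_one\n";
print "$length_of{'apples'} $length_of{'chicken'}\n";
```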
keys
keys %hash
- Will return a list of the keys in the hash
%myhash = (1 , 'Larry', 2, 'Moe', 3, 'Curly');
@keylist = keys %myhash;
print @keylist;
123
The keys come back in no guaranteed order; use sort keys %myhash when the order matters.
Modules
- Modules are reusable packages defined in a library file
- They offer simple access to routines such as:
  - Database access
  - Matrix manipulations
  - Communication libraries
  - Editing standard binary formats (.doc, .xls, …)
  - Graphics libraries (OpenGL, Tk, …)
Using a Module
use Mail::Sendmail;
...
foreach $user (@email_list){
%mail_mess = ( To => "$user",
From=>'[email protected]',
Subject => "$subject",
Message => "$message"
);
sendmail(%mail_mess);
}
New Objects in Modules
- Perl is an object-oriented language
- You can define new object types
- A scalar can hold other object types, such as a matrix, a tensor, a window, a database, …
- The behavior of new objects is defined in the corresponding modules.
- See www.cpan.org for a few thousand useful Perl modules.
Overloading Operators
- The standard operators in Perl can be defined for additional operations when placed between objects that are not fundamental types.
$a = $b + 4;
use Math::MatrixSparse;
…
$matrix_product = $matrix1*$matrix2;
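How a module defines an operator for its own objects can be sketched with the core overload pragma. The class Vec2 below is hypothetical and exists only for this illustration; it is not how Math::MatrixSparse is implemented.

```perl
use strict;
use warnings;

package Vec2;    # hypothetical two-component vector class

# Tell Perl which subroutines implement + and string conversion for Vec2 objects.
use overload
    '+'  => \&add,
    '""' => \&to_string;

sub new {
    my ($class, $x, $y) = @_;
    return bless { x => $x, y => $y }, $class;
}

sub add {
    my ($left, $right) = @_;
    return Vec2->new($left->{x} + $right->{x}, $left->{y} + $right->{y});
}

sub to_string {
    my ($self) = @_;
    return "($self->{x}, $self->{y})";
}

package main;

my $sum = Vec2->new(1, 2) + Vec2->new(3, 4);
print "$sum\n";
```

The + between the two objects calls Vec2::add, and interpolating $sum into a string calls Vec2::to_string.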
global, local, my
- Global variables are the default in Perl.
- Globals can be seen in any subroutine.
$var = 1;
&printme();
sub printme{
    print $var
}
We didn't pass $var, but this works because $var is global.
global, local, my
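- Local variables are also global
- local gives a global variable a temporary value; when the enclosing block goes out of scope, the previous value is restored (undef if it had none).
{
    local $name;
    &mysub();
}
$name has its previous value again here (undef if it was never set)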
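The effect of local can be observed from a subroutine that reads the global. A minimal sketch follows; the subroutine show and the values are invented for the example, and our is used so that the global can be declared under use strict.

```perl
use strict;
use warnings;

our $name = 'Larry';    # a package (global) variable

sub show { return $name; }

my @seen;
push @seen, show();         # value before the block
{
    local $name = 'Moe';    # temporary value, also visible inside called subroutines
    push @seen, show();
}
push @seen, show();         # the old value has been restored

print "@seen\n";
```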
global, local, my
- my variables are preferred for most Perl code.
my $scalar;
&sub1(\%myhash, @array);
$scalar will not be available in &sub1 (when &sub1 is defined outside the scope of $scalar) because it is not explicitly passed.
my Code
use strict;
my @array;
my %hash;
…
use strict will not allow global variables to be defined on the fly.
Global variables that appear partway through the code often make your script unreadable.
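What use strict rejects can be demonstrated without stopping the script by compiling the offending code inside a string eval. This is only a sketch; the variable name $undeclared is made up.

```perl
use strict;
use warnings;

# The string is compiled at run time; a compile error makes eval return undef
# and leaves the error message in $@.
my $ok = eval q{ use strict; $undeclared = 1; 1 };

if (defined $ok) {
    print "compiled\n";
}
else {
    print "rejected: $@";
}
```

Declaring the variable with my (or our) inside the string would let it compile.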
How much have you learned?
- How do we make a Perl subroutine that takes a list as its argument and returns the unique values from that list?
@array = (1, 1, 2, 3, 4, 4, 34, 20, 20);
$, = ', ';
print &unique(@array);
1, 2, 3, 4, 34, 20
sub unique{
    $max_el=@_-1;
    my @u_list;
    $u_list[0]=$_[0];
    for $i (1 .. $max_el){
        $original=1;
        foreach $item (@u_list){
            if ($item == $_[$i]){
                $original=0;
            }
        }
        if ($original){
            push (@u_list, $_[$i]);
        }
    }
    return @u_list;
}
This is how a FORTRAN77 expert might solve the problem.
Simple, straightforward, lots of code.
A smaller script
sub unique{
my @u_list= keys %{{ map {$_ => 1} @_ }};
return @u_list;
}
How does that work?
A smaller script
sub unique{
my @u_list= keys %{{ map {$_ => 1} @_ }};
return @u_list;
}
map { $_ => 1 } @_
This will return a list of key/value pairs.
The keys will be each value ($_) in @_
(the array passed to the subroutine).
The values will all be 1
A smaller script
sub unique{
my @u_list= keys %{ { key/value pairs } };
return @u_list;
}
{ key/value pairs }
This creates an anonymous hash and returns a
reference to it.
A smaller script
sub unique{
my @u_list= keys %{ $reference_to_a_hash };
return @u_list;
}
%{ $reference_to_a_hash }
We dereference the anonymous hash
A smaller script
sub unique{
my @u_list= keys %hash;
return @u_list;
}
This returns the keys in the hash
Where did we wipe out the duplicates?
A smaller script
sub unique{
my @u_list= keys %{ { key/value pairs } };
return @u_list;
}
{ key/value pairs }
When we create our anonymous hash, we assign the value
1 for each key. When a key is repeated, it simply reassigns that
key to the value specified (always 1 in this case).
Our Anonymous Hash
&unique('a','b','c','c','d','d');
pair added      contents of the anonymous hash
{'a' => 1,      {a=>1}
 'b' => 1,      {a=>1, b=>1}
 'c' => 1,      {a=>1, b=>1, c=>1}
 'c' => 1,      {a=>1, b=>1, c=>1}
 'd' => 1,      {a=>1, b=>1, c=>1, d=>1}
 'd' => 1}      {a=>1, b=>1, c=>1, d=>1}
A smaller script
sub unique{
my @u_list= keys %{{ map {$_ => 1} @_ }};
return @u_list;
}
We don't need to explicitly create @u_list.
A subroutine returns the value of the last expression evaluated in it, unless another value is returned explicitly with a return statement.
A smaller script
sub unique{
keys %{{ map {$_ => 1} @_ }}
}
Our compact and slightly cryptic routine to return unique items
Why some people dislike Perl
sub unique{
keys %{{ map {$_ => 1} @_ }}
}
sub unique{
@l{@_}=();
keys %l
}
sub unique{
grep{!$l{$_}++}@_
}
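A variation on the grep version, offered as a sketch rather than as the author's solution. It keeps the hash in a my variable, so repeated calls do not see keys left over from earlier calls, and it returns the items in their original order, whereas the keys-based versions return them in whatever order the hash yields. The name %seen is an assumption.

```perl
use strict;
use warnings;

sub unique {
    my %seen;    # fresh, private hash on every call
    # $seen{$_}++ is 0 (false) the first time a value appears, so !... keeps it.
    return grep { !$seen{$_}++ } @_;
}

my @first  = unique(1, 1, 2, 3, 4, 4, 34, 20, 20);
my @second = unique(1, 1, 2);

print join(', ', @first), "\n";
print join(', ', @second), "\n";
```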
The End
Questions?
 [email protected][email protected]
 612-626-0802 (MSI helpline)