Chapter 11: Perl Scripting
Off Larry’s Wall
In this chapter …
• Background
• Terminology
• Syntax
• Variables
• Control Structures
• File Manipulation
• Regular Expressions
Perl
• Practical Extraction and Report Language
• Developed by Larry Wall in 1987
• Originally created for data processing and
report generation
• Elements of C, AWK, sed, scripting
• Add-on modules and third party code make it
a more general programming language
Features
• C-derived syntax
• Ambiguous variables & dynamic typing
• Singular and plural variables
• Informal, easy to use
• Many paradigms – procedural, functional, object-oriented
• Extensive third party modules
Features, con’t
• As elegant as you make it
• Do What I Mean intelligence
• Fast, easy, down and dirty coding
• Interpreted, not compiled
• perldoc – man pages for Perl modules
Terminology
• Module – one stand alone piece of code
• Distribution – set of modules
• Package – a namespace for one or more
distributions
• Package variable – declared in package,
accessible between modules
• Lexical variable – local variable (scope)
Terminology, con’t
• Scalar – variable that contains only one
value (number, string, etc)
• Composite – variable made of one or more
scalars
• List – series of zero or more scalars
– e.g. (2, 4, 'Zach')
• Array – composite variable containing a list
Invoking Perl
• perl -e 'text of perl program'
• perl perl_script
• Make perl script executable and you can
execute the script itself
– e.g. ./my_script.pl
• The common .pl file extension is not required
• Like other scripts, start with a #! line to specify
the interpreter
Invoking Perl, con’t
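• Use perl -w to display warnings
– Warns about likely mistakes, such as using undefined values or variables named only once
– Instead of -w, add use warnings; to your script
• Similar effect, but limited to the file or block where it appears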
• Usually you’ll find perl in /usr/bin/perl
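
The invocation notes above can be pulled together in a minimal script skeleton. This is only a sketch: the file name hello.pl and the greeting text are assumptions made for illustration, and the #! line assumes perl lives at /usr/bin/perl.

```perl
#!/usr/bin/perl
# hello.pl (hypothetical file name): smallest useful script skeleton.
use strict;      # refuse undeclared variables
use warnings;    # in-script alternative to running with perl -w

my $greeting = "Hello from Perl";   # illustrative value
print $greeting, "\n";
```

Run it with perl hello.pl, or make it executable (chmod +x hello.pl) and run ./hello.pl.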
Syntax
• Each Perl statement ends with a semicolon (;)
• Can have multiple statements per line
• Whitespace is largely ignored
– Except within quoted strings
• Double quotes allow interpretation of
variables and special characters (like \n)
• Single quotes don’t (just like the shell)
Syntax, con’t
• Forward slash used to delimit regular
expressions (e.g. /.*sh?/)
• Backslash used for escape characters
– E.g. \n – newline, \t – tab
• Lines beginning with # are ignored as
comments
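
The difference between the two kinds of quotes is easiest to see side by side. A small sketch, with $name and its value chosen purely for illustration:

```perl
use strict;
use warnings;

my $name = "Zach";                    # illustrative value

my $double = "Hi $name\tbye";         # $name and \t are interpreted
my $single = 'Hi $name\tbye';         # taken literally, character for character

print $double, "\n";                  # prints: Hi Zach<TAB>bye
print $single, "\n";                  # prints: Hi $name\tbye
```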
Output
• Old way
– print what_to_print;
– Concatenate
• print item_1, item_2;
– Want a newline?
• print what_to_print, "\n";
• New way
– say what_to_print;
• Automatically adds newline
• Must be enabled first with use feature 'say'; (Perl 5.10 or later)
Output, con’t
• what_to_print can be many things
– Quoted string – "Here's some text"
– Variables – $myvar
– Result of a function – uc($myvar)
– A combination
• print "Sub Tot: $total\n", "Tax: ", $total*$tax, "\n";
• Arithmetic is not evaluated inside quotes, so it goes outside them as its own item
• Want to display an error and exit?
– die "Uh-oh!\n";
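
A short sketch of both output styles follows. The values of $total and $tax are made up for illustration; the point is that variables interpolate inside double quotes while arithmetic has to be passed to print as a separate item.

```perl
use strict;
use warnings;
use feature 'say';        # say is only available once this feature is enabled

my $total = 20;           # illustrative values
my $tax   = 0.5;

# Arithmetic is not evaluated inside double quotes, so pass it as its own item.
print "Sub Tot: $total\n", "Tax: ", $total * $tax, "\n";

# say prints its arguments and appends the newline itself.
say "Total: ", $total + $total * $tax;
```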
Variables
• Perl variables can be singular or plural
• Data typing done dynamically at runtime
• Three types
– Scalar (singular)
– Array (plural)
– Hash a.k.a. Associative Arrays (plural)
• Variable names are case sensitive
• Can contain letters, numbers, underscore
Variables, con’t
• Each type of variable starts with a different
special character to mark type
• By default all variables are package in scope
• To make lexical, preface declaration with my
keyword
• Lexical variables override package variables
• Include use strict; to not allow use of
undeclared variables
Variables, con’t
• We’ve already covered use warnings;
• Undeclared variables, if referenced, have a
default value of undef
– Equates to 0 or null string
– Can check by using defined() function
• $. holds the current line number of the file handle most recently read
• $_ is the default operand – ‘it’
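
The scoping rules above can be sketched in a few lines. The variable names $count and $label are assumptions for illustration only; our is used to declare the package variable so the example still passes use strict.

```perl
use strict;
use warnings;

our $count = 10;          # package variable, visible throughout the package
my  $label;               # lexical variable, declared but never assigned

print "label is undefined\n" unless defined($label);

{
    my $count = 99;       # lexical $count hides the package variable in this block
    print "inside:  $count\n";    # prints: inside:  99
}
print "outside: $count\n";        # prints: outside: 10

# With no loop variable named, the loop puts each item in $_,
# and print with no arguments prints $_.
for ("a\n", "b\n") { print; }
```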
Scalars
• Singular, holds one value, either string or
number
• Must be preceded with $, e.g. $myvar
• Perl will automatically cast between strings
and numbers
• Will treat as a number or string, whichever is
appropriate in context
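
As a quick illustration of context-driven conversion, the same scalar can act as a number or a string depending on the operator applied to it. The value "42" is arbitrary:

```perl
use strict;
use warnings;

my $n = "42";             # stored as a string
my $sum  = $n + 8;        # + is numeric, so "42" is used as the number 42
my $text = $n . 8;        # . is string concatenation, so 8 is used as "8"

print "$sum\n";           # prints: 50
print "$text\n";          # prints: 428
```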
Arrays
• Plural, containing an ordered list of scalars
• Zero-based indexing
• Dynamic size and allocation
• Begin with @ e.g. @myarray
• @variable references entire array
• To reference a single element (which would be a scalar, right?) $variable[index]
Arrays, con’t
• $#array returns the index of the last element
– Zero based – this means it’s one less than the
size of the array
• @array[x..y] returns a ‘slice’ or sublist
• Printing arrays
– Array enclosed in double quotes prints space
delimited list
– Not in quotes all entries concatenated
Arrays, con’t
• Arrays can be treated like FIFO queues
– shift(@array) – remove the first element
– push(@array, scalar) – add an element at the end
• Use splice to combine arrays
– splice(@array,offset,length,@otherarray)
• Replaces length elements starting at offset with the contents of @otherarray
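
The array operations from the last three slides can be walked through in one sketch. The array names and contents are illustrative only:

```perl
use strict;
use warnings;

my @queue = (2, 4, 'Zach');

print "last index: ", $#queue, "\n";        # prints: last index: 2
print "size: ", scalar(@queue), "\n";       # prints: size: 3
print "first: $queue[0]\n";                 # prints: first: 2

my @slice = @queue[1..2];                   # (4, 'Zach')
print "@slice\n";                           # prints: 4 Zach   (quoted: space delimited)
print @slice, "\n";                         # prints: 4Zach    (unquoted: run together)

push(@queue, 'new');                        # add at the end
my $first = shift(@queue);                  # remove from the front; $first is 2

# Replace 1 element starting at offset 1 with the contents of @extra.
my @extra = ('x', 'y');
splice(@queue, 1, 1, @extra);
print "@queue\n";                           # prints: 4 x y new
```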
Hashes
• Plural, contain an array of key-value pairs
• Prefix with %, e.g. %myhash
• Keys are strings, act as indexes to array
• Each key must be unique, returns one value
• Unordered
• Optimized for random access
• Keys don’t need quotes unless there are spaces
Hashes, con’t
• Element access
– $hashvar{index} = value
• e.g. $myvar{boat} = "tuna"; print $myvar{boat};
– %hashvar = ( key => value, …);
• e.g. %myvar = ( boat => "tuna", 4 => "fish");
– Get array of keys or values
• keys(%hashvar)
• values(%hashvar)
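
A small sketch of building and walking a hash. It reuses the boat/tuna pairs from the slide; the extra key net and the name %catch are assumptions added for illustration. Because hashes are unordered, the keys are sorted before printing.

```perl
use strict;
use warnings;

my %catch = ( boat => "tuna", 4 => "fish" );

$catch{net} = "cod";                 # add or overwrite one element
print $catch{boat}, "\n";            # prints: tuna

# Hashes are unordered, so sort the keys for predictable output.
for my $key (sort keys(%catch)) {
    print "$key => $catch{$key}\n";
}

my $n = keys(%catch);                # keys in scalar context gives the count
print "$n pairs\n";                  # prints: 3 pairs
```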
Evaluating Expressions
• Most control structures use an expression to
evaluate whether they are run
• Perl uses different comparison operators for
strings and numbers
• Also uses the same file operators (existence,
access, etc) that bash uses
Expressions
• Numeric operators
– ==, !=, <, >, <=, >=
– <=> returns 0 if equal, 1 if >, -1 if <
• String Operators
– eq, ne, lt, gt, le, ge
– cmp is the string equivalent of <=>
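
Why two sets of operators matter shows up as soon as the same values are compared both ways. In this sketch the values 10 and 9 are arbitrary; the numeric and string comparisons give opposite answers.

```perl
use strict;
use warnings;

my ($x, $y) = ("10", "9");            # illustrative values

print "numeric: 10 is greater\n" if $x >  $y;    # 10 > 9 as numbers
print "string:  10 sorts first\n" if $x lt $y;   # "10" lt "9" as strings

my $num_order = $x <=> $y;            # 1, because 10 > 9
my $str_order = $x cmp $y;            # -1, because "1" sorts before "9"

# The same two operators drive numeric and string sorts.
my @nums  = sort { $a <=> $b } (10, 9, 100);     # (9, 10, 100)
my @words = sort { $a cmp $b } (10, 9, 100);     # (10, 100, 9)
print "@nums\n@words\n";
```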
Control Structures
• if (expr) {…}
• unless (expr) {…}
• if (expr) {…} else {…}
• if (expr) {…} elsif (expr) {…} … else {…}
• while (expr) {…}
• until (expr) {…}
Control Structures, con’t
• for and foreach are interchangeable
• Syntax 1
– Similar to bash for…in structure
– foreach [var] (list) {…}
– If var not defined, $_ assumed
– For each loop iteration, the next value from list is
populated in var
Control Structures, con’t
• for/foreach Syntax 2
– Similar to C’s for loop
– foreach (expr1; expr2; expr3) {…}
– expr1 sets initial condition
– expr2 is the loop condition; the loop ends when it becomes false
– expr3 is the incrementor
Control Structures, con’t
• Short-circuiting loops
– Use last to break out of loop altogether
• Same as bash’s break
– Use next to skip to the next iteration of the loop
• Same as bash’s continue
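
Both loop syntaxes, along with next, last and unless, can be sketched together. The ranges and limits below are arbitrary choices for illustration:

```perl
use strict;
use warnings;

my @found;

# Syntax 1: walk a list; each value lands in $n.
foreach my $n (1..10) {
    next if $n % 2;        # odd: skip to the next iteration
    last if $n > 6;        # past 6: leave the loop altogether
    push(@found, $n);
}
print "@found\n";          # prints: 2 4 6

# Syntax 2: C-style loop; runs while $i < 3 is true.
for (my $i = 0; $i < 3; $i++) {
    print "i = $i\n";
}

# unless runs its block when the expression is false.
unless (@found > 5) {
    print "no more than five matches\n";
}
```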
Handles
• A handle is essentially a variable linked to a
file or process
• Perl automatically opens handles for the
default streams
– STDIN, STDOUT, STDERR
• You can open additional handles
– To a file for input/output/appending
– To a process for input/output
Handles, con’t
• Basic syntax
– open(handle, ['mode'], "ref");
– handle is a variable to reference the handle
– mode can be many things
• Simple cases: <, >, >>, |
• Input (<) implied if omitted
– ref is what to open – file or process
– mode and ref can be combined as one string
Handles, con’t
• Once open access via handle variable
• Output
– print handle “what to print”
• Input
– $var = <handle> gets one line of input
– Use <handle> as a loop condition to read input
one line at a time, populating $_
Handles, con’t
• <> – magic handle, reads from the files named as command line
arguments, or from STDIN if there are none
• Line of input contains EOL character
– Use chomp($var) to remove it
– Use chop($var) to remove the last character
• When done close(handle);
– Housekeeping, good coding practice
– Perl closes any handles still open when the program exits
Handles, con’t
• Examples
– open(my $INPUT, "/path/to/file");
– open(my $ERRLOG, ">>/var/log/errors");
– open(my $SORT, "| sort -n");
– open(my $ALIST, "grep '^[Aa]' /usr/share/dict/words |");
– while (<$INPUT>) { print $ERRLOG $_; }
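
A fuller, self-contained sketch of the open/print/read/close cycle follows. It uses the three-argument form of open with the mode given separately, checks each open with die, and writes to a scratch file from the core File::Temp module so that no real path is touched. The handle names and file contents are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module; used only to get a scratch file

# Scratch file so the sketch does not touch real paths.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

# Write two lines, then append a third.
open(my $out, '>', $path) or die "Cannot write $path: $!\n";
print $out "first\n", "second\n";
close($out);

open(my $log, '>>', $path) or die "Cannot append to $path: $!\n";
print $log "third\n";
close($log);

# Read the file back one line at a time.
my @lines;
open(my $in, '<', $path) or die "Cannot read $path: $!\n";
while (my $line = <$in>) {
    chomp($line);                                # strip the trailing newline
    push(@lines, sprintf("%d: %s", $., $line));  # $. is the current line number
}
close($in);

print "$_\n" for @lines;       # prints 1: first, 2: second, 3: third (one per line)
```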
Regular Expressions
• Recall Appendix A
• Perl has a few unique features and caveats
• Regular Expressions (RE) delimited by
forward slash
• Perl uses the =~ operator for RE matching
– Ex. if ($myvar =~ /^T/) { …} # if myvar starts w/ T
• To negate RE matching use !~ operator
RE, con’t
• =~ operator can also be used to do
replacement
– Ex. $result =~ s/old/new/;
– The first match of ‘old’ is replaced with ‘new’
• Remember, RE (esp. in Perl) are greedy
– Will match longest possible match
• Bracketed expressions don’t need escaped parentheses:
use plain ( and ) rather than \( and \)
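
Matching, negation, greediness, bracketed expressions and substitution can all be seen in one short sketch. The sample text in $line is made up for illustration.

```perl
use strict;
use warnings;

my $line = "Tuna <b>fresh</b> and <b>cheap</b>";   # illustrative text

print "starts with T\n"    if $line =~ /^T/;
print "no digits\n"        if $line !~ /[0-9]/;

# Greedy: .* grabs as much as it can, up to the LAST </b>.
my ($greedy) = $line =~ /<b>(.*)<\/b>/;
print "$greedy\n";          # prints: fresh</b> and <b>cheap

# Plain parentheses group and capture; the text lands in $1.
if ($line =~ /^(\w+) /) {
    print "first word: $1\n";      # prints: first word: Tuna
}

# s/old/new/ changes the first match only.
(my $copy = $line) =~ s/<b>/[/;
print "$copy\n";            # prints: Tuna [fresh</b> and <b>cheap</b>
```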