Perl Basics
A Perl Tutorial
NLP Course - 2006
What is Perl?
Practical Extraction and Report Language
Interpreted Language
Optimized for String Manipulation and File I/O
Full support for Regular Expressions
Running Perl Scripts
Windows
Download ActivePerl from ActiveState
Just run the script from a 'Command Prompt' window
UNIX – Cygwin
Put the following in the first line of your script
#!/usr/bin/perl
Run the script
% perl script_name
Basic Syntax
Statements end with semicolon ‘;’
Comments start with ‘#’
Only single line comments
Variables
You don't have to declare a variable before you access it
You don't have to declare a variable's type
Scalars and Identifiers
Identifiers
A variable name
Case sensitive
Scalar
A single value (string or numerical)
Accessed by prefixing an identifier with '$'
Assignment with '='
$scalar = expression
Strings
Quoting Strings
With ' (apostrophe)
Everything is interpreted literally
With " (double quotes)
Variables get expanded
With ` (backtick)
The text is executed as a separate process, and the output of the command is returned as the value of the string
Check 01_printDate.pl
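The three quoting styles can be compared side by side. This is a minimal sketch, not the contents of 01_printDate.pl; the variable names are illustrative, and it assumes an `echo` command is available in the shell.

```perl
use strict;      # not used on the slides; added here for safety
use warnings;

my $name    = "Perl";
my $literal = 'Hello, $name\n';   # single quotes: nothing is expanded
my $expand  = "Hello, $name\n";   # double quotes: $name and \n are expanded
my $output  = `echo hello`;       # backticks: runs the command, captures its output

print $literal, "\n";
print $expand;
print "Command output: $output";
```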
Comparison Operators
String    Operation                   Arithmetic
lt        less than                   <
gt        greater than                >
eq        equal to                    ==
le        less than or equal to       <=
ge        greater than or equal to    >=
ne        not equal to                !=
cmp       compare, return 1, 0, -1    <=>
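Choosing the string or the arithmetic operator changes the result for the same operands. A small sketch with illustrative variable names:

```perl
use strict;
use warnings;

# Arithmetic comparison converts both sides to numbers first
my $num_equal = ("10" == 10.0)   ? "true" : "false";   # true
# String comparison looks at the characters themselves
my $str_equal = ("10" eq "10.0") ? "true" : "false";   # false

my $num_order = 10 <=> 9;        # 1: 10 is numerically greater than 9
my $str_order = "10" cmp "9";    # -1: the character "1" sorts before "9"

print "numeric equal: $num_equal, string equal: $str_equal\n";
print "numeric order: $num_order, string order: $str_order\n";
```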
Logical Operators
Operator    Operation
||, or      logical or
&&, and     logical and
!, not      logical not
xor         logical xor
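A short sketch of the logical operators in use. The values and variable names are made up for illustration; note that an empty string counts as false.

```perl
use strict;
use warnings;

my ($yes, $no) = (1, 0);
my $input = "";

# || returns the first true operand, which makes it handy for defaults
my $name = $input || "anonymous";

my $and = ($yes && $no)   ? "true" : "false";   # false
my $or  = ($yes or $no)   ? "true" : "false";   # true
my $not = (!$no)          ? "true" : "false";   # true
my $xor = ($yes xor $yes) ? "true" : "false";   # false: both sides are true

print "name=$name and=$and or=$or not=$not xor=$xor\n";
```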
String Operators
Operator    Operation
.           string concatenation
x           string repetition
.=          concatenation and assignment
$string1 = "potato";
$string2 = "head";
$newstring = $string1 . $string2; #"potatohead"
$newerstring = $string1 x 2; #"potatopotato"
$string1 .= $string2; #"potatohead"
Check concat_input.pl
Perl Functions
Perl functions are identified by their unique names (print, chop, close, etc.)
Function arguments are supplied as a comma-separated list in parentheses.
The commas are necessary
The parentheses are often not
Be careful! You can write some nasty and unreadable code this way!
Check 02_unreadable.pl
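The same call can be written with or without parentheses. This is an illustrative sketch, not the contents of 02_unreadable.pl:

```perl
use strict;
use warnings;

my @words = ("pear", "apple", "fig");

# Fully parenthesized: the nesting of the calls is explicit
my $with = join(", ", sort(@words));

# Parentheses omitted: the commas are still required
my $without = join ", ", sort @words;

print "$with\n";
print "$without\n";
```

Both lines print the same text; the parenthesized form is usually easier to read once calls are nested.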
Lists
Ordered collection of scalars
Zero indexed (first item in position '0')
Elements addressed by their positions
List Operators
(): list constructor
, : element separator
[]: take slices (single or multiple element chunks)
List Operations
sort(LIST)
a new list, the sorted version of LIST
reverse(LIST)
a new list, the reverse of LIST
join(EXPR, LIST)
a string version of LIST, delimited by EXPR
split(PATTERN, EXPR)
create a list from the portions of EXPR that lie between matches of PATTERN (PATTERN is the delimiter)
Check 03_listOps.pl
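A minimal sketch of the list operations above, using made-up data (it is not the contents of 03_listOps.pl):

```perl
use strict;
use warnings;

my @fruit    = ("pear", "apple", "fig");
my @sorted   = sort(@fruit);            # apple, fig, pear
my @reversed = reverse(@sorted);        # pear, fig, apple
my $joined   = join("-", @reversed);    # "pear-fig-apple"
my @parts    = split(/-/, $joined);     # back to a three-element list
my @slice    = @sorted[0, 2];           # slice: elements 0 and 2

print "joined: $joined\n";
print "parts:  @parts\n";
print "slice:  @slice\n";
```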
Arrays
A named list
Dynamically allocated, can be saved
Zero-indexed
Shares list operations, and adds to them
Array Operators
@: reference to the array (or a portion of it, with [])
$: reference to an element (used with [])
Array Operations
push(@ARRAY, LIST)
add the LIST to the end of the @ARRAY
pop(@ARRAY)
remove and return the last element of @ARRAY
unshift(@ARRAY, LIST)
add the LIST to the front of @ARRAY
shift(@ARRAY)
remove and return the first element of @ARRAY
scalar(@ARRAY)
return the number of elements in the @ARRAY
Check 04_arrayOps.pl
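The array operations can be traced step by step. A small sketch with illustrative names (not the contents of 04_arrayOps.pl):

```perl
use strict;
use warnings;

my @queue = (2, 3);
push(@queue, 4, 5);            # add to the end:   2 3 4 5
unshift(@queue, 1);            # add to the front: 1 2 3 4 5

my $last  = pop(@queue);       # removes and returns 5
my $first = shift(@queue);     # removes and returns 1

my $count      = scalar(@queue);   # number of elements left
my $last_index = $#queue;          # index of the last element (count - 1)

print "removed $first and $last, left: @queue\n";
print "count: $count, last index: $last_index\n";
```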
Associative Arrays - Hashes
Arrays indexed on arbitrary string values
Key-Value pairs
Use the "Key" to look up the element that holds the "Value"
Hash Operators
% : refers to the hash
{}: denotes the key
$ : the value of the element indexed by the key (used with {})
Hash Operations
keys(%ARRAY)
return a list of all the keys in the %ARRAY
values(%ARRAY)
return a list of all the values in the %ARRAY
each(%ARRAY)
iterates through the key-value pairs of the %ARRAY
delete($ARRAY{KEY})
removes the key-value pair associated with KEY from the %ARRAY
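A brief sketch of the hash operations, with invented keys and values. The order returned by keys, values and each is not predictable, so the keys are sorted before printing.

```perl
use strict;
use warnings;

my %age;
$age{"alice"} = 30;
$age{"bob"}   = 25;
$age{"carol"} = 41;

my @names = sort(keys(%age));      # alice, bob, carol

my $total = 0;
$total += $_ for values(%age);     # sum of all values

delete($age{"bob"});               # remove one key-value pair
my $remaining = scalar(keys(%age));

# each() returns one (key, value) pair per call until the hash is exhausted
while (my ($name, $years) = each(%age)) {
    print "$name is $years\n";
}
print "names: @names, total: $total, remaining: $remaining\n";
```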
Arrays Example
#!/usr/bin/perl
# Simple list operations
@stringInstruments = ("violin", "viola", "cello", "bass");
@brass = ("trumpet", "horn", "trombone", "euphonium", "tuba");
# Address an element in the list
$biggestInstrument = $stringInstruments[3];
print("The biggest instrument: ", $biggestInstrument, "\n");
# Join elements at positions 0, 1, 2 and 4 into a
# white-space delimited string
print("orchestral brass: ", join(" ", @brass[0,1,2,4]), "\n");
# Sort the list (numerically; the default sort compares strings)
@unsorted_num = ('3','5','2','1','4');
@sorted_num = sort { $a <=> $b } @unsorted_num;
print("Numbers (Sorted, 1-5): ", join(" ", @sorted_num), "\n");
# Add a few more numbers
@numbers_10 = @sorted_num;
push(@numbers_10, ('6','7','8','9','10'));
print("Numbers (1-10): ", join(" ", @numbers_10), "\n");
# Remove the last (pop returns the removed element)
print("Removed the last: ", pop(@numbers_10), "\n");
# Remove the first (shift returns the removed element)
print("Removed the first: ", shift(@numbers_10), "\n");
# Count the elements; $#numbers_10 would give the last index instead
print("Count elements (2-9): ", scalar(@numbers_10), "\n");
print("What's left (numbers 2-9): ", join(" ", @numbers_10), "\n");
Hashes Example
#!/usr/bin/perl
# Simple hash operations
$player{"clarinet"} = "Susan Bartlett";
$player{"bassoon"} = "Andrew Vandesteeg";
$player{"flute"} = "Heidi Lawson";
$player{"oboe"} = "Jeanine Hassel";
@woodwinds = keys(%player);
@woodwindPlayers = values(%player);
# Who plays the oboe?
print("Oboe: ", $player{'oboe'}, "\n");
# How many players are there?
$playerCount = scalar(@woodwindPlayers);
# Print every player with their instrument
while (($instrument, $name) = each(%player))
{
print("$name plays the $instrument\n");
}
Pattern Matching
A pattern is a sequence of characters to be
searched for in a character string
/pattern/
Match operators
=~: tests whether a pattern is matched
!~: tests whether a pattern is not matched
Patterns
Pattern      Matches                 Pattern       Matches
/def/        "define"                /d.f/         dif
/\bdef\b/    a def word              /d.+f/        dabcf
/^def/       def in start of line    /d.*f/        df, daffff
/^def$/      def line                /de{1,3}f/    deef, deeef
/de?f/       df, def                 /de{3}f/      deeef
/d[eE]f/     def, dEf                /de{3,}f/     deeeeef
/d[^eE]f/    daf, dzf                /de{0,3}f/    up to deeef
Character Ranges
Escape Sequence    Pattern           Description
\d                 [0-9]             Any digit
\D                 [^0-9]            Anything but a digit
\w                 [_0-9A-Za-z]      Any word character
\W                 [^_0-9A-Za-z]     Anything but a word char
\s                 [ \r\t\n\f]       White-space
\S                 [^ \r\t\n\f]      Anything but white-space
Backreferences
Memorize the matched portion of input
Use of parentheses.
/[a-z]+(.)[a-z]+\1[a-z]+/
Matches: asd-eeed-sdsa, sd-sss-ws
Does NOT match: as_eee-dfg
They can even be accessed immediately after the
pattern is matched
\1 in the previous pattern is what is matched by (.)
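The backreference pattern above can be tried against the three sample strings. In this sketch the `%separator` hash is an illustrative name; after a successful match, `$1` holds whatever `(.)` captured.

```perl
use strict;
use warnings;

# \1 must repeat exactly the character that (.) captured
my $pattern = qr/[a-z]+(.)[a-z]+\1[a-z]+/;

my %separator;   # text => captured separator, filled only on a match
for my $text ("asd-eeed-sdsa", "sd-sss-ws", "as_eee-dfg") {
    if ($text =~ $pattern) {
        $separator{$text} = $1;
        print "$text matches, separator is '$1'\n";
    } else {
        print "$text does not match\n";
    }
}
```

The third string fails because its two separators differ (`_` and `-`).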
Pattern Matching Options
Option    Description
g         Match all occurrences of the pattern
i         Ignore case
x         Ignore white-space in pattern
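A small sketch of the three options on an invented sample string. In list context, a match with `g` and no parentheses returns every matched substring.

```perl
use strict;
use warnings;

my $text = "Please, please, PLEASE";

my @exact = $text =~ /please/g;    # g only: case-sensitive, finds one occurrence
my @all   = $text =~ /please/gi;   # g and i: finds all three spellings

# x: spaces inside the pattern are ignored, so it can be laid out readably
my $spaced = ($text =~ / ^ please , /xi) ? "matched" : "not matched";

print "case-sensitive matches: ", scalar(@exact), "\n";
print "case-insensitive matches: ", scalar(@all), "\n";
print "spaced-out pattern: $spaced\n";
```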
Substitutions
Substitution operator
s/pattern/substitution/options
If $string = "abc123def"; (each example starts from this value)
$string =~ s/123/456/;
Result: "abc456def"
$string =~ s/123//;
Result: "abcdef"
$string =~ s/(\d+)/[$1]/;
Result: "abc[123]def"
Use of backreference!
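The substitutions above can be run without overwriting the original string by copying it first. A minimal sketch; the copy-then-substitute form `(my $copy = $string) =~ s/.../.../` and the last example with the g option are additions for illustration.

```perl
use strict;
use warnings;

my $string = "abc123def";

# Copy $string, then substitute in the copy; $string itself is unchanged
(my $replaced  = $string) =~ s/123/456/;        # "abc456def"
(my $deleted   = $string) =~ s/123//;           # "abcdef"
(my $bracketed = $string) =~ s/(\d+)/[$1]/;     # "abc[123]def"

# With the g option every run of digits is replaced, not just the first
(my $masked = "a1b22c333") =~ s/\d+/#/g;        # "a#b#c#"

print "$replaced $deleted $bracketed $masked\n";
```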
Predefined Read-only Variables
$&
is the part of the string that matched the regular expression
$`
is the part of the string before the part that matched
$'
is the part of the string after the part that matched
EXAMPLE
$_ = "this is a sample string";
/sa.*le/; # matches "sample" within the string
# $` is now "this is a "
# $& is now "sample"
# $' is now " string"
Because these variables are set on each successful match, you should save the values elsewhere if you need them later in the program.
The split and join Functions
The split function takes a regular expression and a string, and looks for all occurrences of the regular expression within that string. The parts of the string that don't match the regular expression are returned in sequence as a list of values.
The join function takes a list of values and glues them together with a glue string between each list element.
Split Example
$line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
@fields = split(/:/,$line); # split $line, using : as delimiter
# now @fields is ("merlyn","","118","10","Randal",
# "/home/merlyn","/usr/bin/perl")
Join Example
$bigstring = join($glue,@list);
For example, to rebuild the password file line, try something like:
$outline = join(":", @fields);
String - Pattern Examples
A simple Example
#!/usr/bin/perl
print ("Ask me a question politely:\n");
$question = <STDIN>;
# what about capital P in "please"?
if ($question =~ /please/)
{
print ("Thank you for being polite!\n");
}
else
{
print ("That was not very polite!\n");
}
String – Pattern Example
#!/usr/bin/perl
print ("Enter a variable name:\n");
$varname = <STDIN>;
chomp ($varname); # remove the trailing newline only
# Try asd$asdas... It gets accepted!
if ($varname =~ /\$[A-Za-z][_0-9a-zA-Z]*/)
{
print ("$varname is a legal scalar variable\n");
}
elsif ($varname =~ /@[A-Za-z][_0-9a-zA-Z]*/)
{
print ("$varname is a legal array variable\n");
}
elsif ($varname =~ /[A-Za-z][_0-9a-zA-Z]*/)
{
print ("$varname is a legal file variable\n");
}
else
{
print ("I don't understand what $varname is.\n");
}
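As the comment in the script notes, input such as asd$asdas is accepted because the patterns may match anywhere in the string. One possible fix is to anchor each pattern with ^ and $ so the whole input has to match. In this sketch the `classify` function name is hypothetical, and `@` is escaped as `\@` to avoid any array interpolation.

```perl
use strict;
use warnings;

# Return the kind of variable a name looks like, or "unknown"
sub classify {
    my ($name) = @_;
    return "scalar" if $name =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/;
    return "array"  if $name =~ /^\@[A-Za-z][_0-9a-zA-Z]*$/;
    return "file"   if $name =~ /^[A-Za-z][_0-9a-zA-Z]*$/;
    return "unknown";
}

for my $name ('$count', '@list', 'INFILE', 'asd$asdas') {
    print "$name: ", classify($name), "\n";
}
```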