Perl Basics

A Perl Tutorial NLP Course - 2003

What is Perl?

  - Practical Extraction and Report Language
  - Interpreted Language
      - Optimized for String Manipulation and File I/O
      - Full support for Regular Expressions

Running Perl Scripts

  - Windows
      - Download ActivePerl from ActiveState
      - Just run the script from a 'Command Prompt' window
  - UNIX / Cygwin
      - Put the following in the first line of your script
            #!/usr/local/bin/perl
      - Make the script executable
            % chmod +x script_name
      - Run the script
            % ./script_name

Basic Syntax

  - Statements end with a semicolon
  - Comments start with '#'
      - Only single-line comments
  - Variables
      - You don't have to declare a variable before you access it
      - You don't have to declare a variable's type

Scalars and Identifiers

  - Identifiers
      - A variable name
      - Case sensitive
  - Scalar
      - A single value (string or numerical)
      - Accessed by prefixing an identifier with '$'
      - Assignment with '='
            $scalar = expression

Strings

  - Quoting Strings
      - With ' (apostrophe)
          - Everything is interpreted literally
      - With " (double quotes)
          - Variables get expanded
      - With ` (backtick)
          - The text is executed as a separate process, and the output of the command is returned as the value of the string

  Check 01_printDate.pl
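The three quoting styles can be compared side by side. This is a minimal sketch, not the course's 01_printDate.pl: the variable names are made up for illustration, and it assumes an `echo` command is available on the system. Because the sketch enables `use strict`, variables are declared with `my`, which plain Perl does not require.

```perl
use strict;
use warnings;

my $name = "Perl";

my $literal  = 'Hello, $name\n';   # single quotes: no expansion, \n stays as two characters
my $expanded = "Hello, $name\n";   # double quotes: $name is expanded, \n becomes a newline
my $output   = `echo hello`;       # backticks: run the command, capture what it prints

print $literal, "\n";
print $expanded;
print "command said: $output";
```

The first line printed still contains the text `$name`, while the second contains `Perl`.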

Comparison Operators

String   Arithmetic   Operation
lt       <            less than
gt       >            greater than
eq       ==           equal to
le       <=           less than or equal to
ge       >=           greater than or equal to
ne       !=           not equal to
cmp      <=>          compare, returns -1, 0, or 1
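Choosing the wrong operator family gives a different answer for the same values, because the string operators compare character by character. A small sketch with made-up variable names:

```perl
use strict;
use warnings;

# The same two values compared as numbers and as strings.
my ($x, $y) = (10, 9);

my $numeric_less = ($x <  $y) ? 1 : 0;   # 10 < 9 as numbers: false
my $string_less  = ($x lt $y) ? 1 : 0;   # "10" lt "9" as strings: true ("1" sorts before "9")

my $numeric_cmp = $x <=> $y;             # 1, because 10 is greater than 9
my $string_cmp  = $x cmp $y;             # -1, because "10" sorts before "9"

print "numeric <: $numeric_less, string lt: $string_less\n";
print "<=>: $numeric_cmp, cmp: $string_cmp\n";
```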

Logical Operators

Operator   Operation
||, or     logical or
&&, and    logical and
!, not     logical not
xor        logical xor

String Operators

Operator   Operation
.          string concatenation
x          string repetition
.=         concatenation and assignment

  $string1 = "potato";
  $string2 = "head";
  $newstring = $string1 . $string2;    # "potatohead"
  $newerstring = $string1 x 2;         # "potatopotato"
  $string1 .= $string2;                # "potatohead"

Check concat_input.pl

Perl Functions

  - Perl functions are identified by their unique names (print, chop, close, etc.)
  - Function arguments are supplied as a comma-separated list in parentheses
      - The commas are necessary
      - The parentheses are often not
      - Be careful! You can write some nasty and unreadable code this way!

Check 02_unreadable.pl
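To see what dropping the parentheses does to readability, the same nested call can be written both ways. This is an illustrative sketch with a made-up word list, not the contents of 02_unreadable.pl.

```perl
use strict;
use warnings;

my @words = ("pear", "apple", "fig");

# Fully parenthesized: the nesting is explicit.
my $with_parens = join(", ", reverse(sort(@words)));

# The same calls without parentheses: legal, but the reader has to work out
# which arguments belong to which function.
my $without_parens = join ", ", reverse sort @words;

print "$with_parens\n";      # pear, fig, apple
print "$without_parens\n";   # pear, fig, apple
```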

Lists

  - Ordered collection of scalars
      - Zero indexed (first item in position '0')
      - Elements addressed by their positions
  - List Operators
      - () : list constructor
      - ,  : element separator
      - [] : take slices (single or multiple element chunks)

List Operations

   

  - sort(LIST)
        a new list, the sorted version of LIST
  - reverse(LIST)
        a new list, the reverse of LIST
  - join(EXPR, LIST)
        a string version of LIST, delimited by EXPR
  - split(PATTERN, EXPR)
        a list of the portions of EXPR that lie between matches of PATTERN (PATTERN is the delimiter)

  Check 03_listOps.pl
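The four list operations chain together naturally. A short sketch with made-up data (this is not 03_listOps.pl); it also takes a list slice with [].

```perl
use strict;
use warnings;

my $line = "delta,alpha,charlie,bravo";

my @parts    = split(/,/, $line);    # the comma is the delimiter: 4 elements
my @sorted   = sort(@parts);         # alphabetical order
my @reversed = reverse(@sorted);     # reverse alphabetical order
my $joined   = join(" ", @sorted);   # back to one string, space-delimited
my @slice    = (@sorted)[0, 1];      # list slice: elements 0 and 1

print "$joined\n";                   # alpha bravo charlie delta
print join("-", @reversed), "\n";    # delta-charlie-bravo-alpha
print "@slice\n";                    # alpha bravo
```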

Arrays

  - A named list
      - Dynamically allocated, can be saved
      - Zero-indexed
      - Shares list operations, and adds to them
  - Array Operators
      - @ : refers to the whole array (or a portion of it, with [])
      - $ : refers to a single element (used with [])

Array Operations

    

  - push(@ARRAY, LIST)
        add the LIST to the end of the @ARRAY
  - pop(@ARRAY)
        remove and return the last element of @ARRAY
  - unshift(@ARRAY, LIST)
        add the LIST to the front of @ARRAY
  - shift(@ARRAY)
        remove and return the first element of @ARRAY
  - scalar(@ARRAY)
        return the number of elements in the @ARRAY

  Check 04_arrayOps.pl
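A short walk through the five operations, plus the @ and $ prefixes for slices and single elements. The array contents are made up for illustration; this is not 04_arrayOps.pl.

```perl
use strict;
use warnings;

my @queue = ("b", "c");

push(@queue, "d", "e");        # @queue is now (b, c, d, e)
unshift(@queue, "a");          # @queue is now (a, b, c, d, e)

my $last  = pop(@queue);       # removes and returns "e"
my $first = shift(@queue);     # removes and returns "a"

my $count  = scalar(@queue);   # 3 elements remain: (b, c, d)
my $second = $queue[1];        # single element, note the $ prefix: "c"
my @pair   = @queue[0, 1];     # slice, note the @ prefix: (b, c)

print "first=$first last=$last count=$count second=$second pair=@pair\n";
```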

Associative Arrays - Hashes

  - Arrays indexed on arbitrary string values
      - Key-Value pairs
      - Use the key to look up its associated value
  - Hash Operators
      - %  : refers to the hash
      - {} : denotes the key
      - $  : the value of the element indexed by the key (used with {})

Hash Operations

   

  - keys(%ARRAY)
        return a list of all the keys in the %ARRAY
  - values(%ARRAY)
        return a list of all the values in the %ARRAY
  - each(%ARRAY)
        iterates through the key-value pairs of the %ARRAY
  - delete($ARRAY{KEY})
        removes the key-value pair associated with {KEY} from the %ARRAY
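The hash operations in use, as a sketch with made-up names and ages. Note that keys() and values() return their results in no particular order, so the example sorts them before printing.

```perl
use strict;
use warnings;

my %age = (alice => 31, bob => 27, carol => 45);

$age{dave} = 52;                  # add a key-value pair
delete($age{bob});                # remove the pair whose key is "bob"

my @names = sort(keys(%age));     # sorted, because hash order is not fixed
my @ages  = sort { $a <=> $b } values(%age);

# each() returns one (key, value) pair per call until the hash is exhausted.
my $total = 0;
while (my ($name, $years) = each(%age)) {
    $total += $years;
}

print "names: @names\n";          # alice carol dave
print "ages: @ages\n";            # 31 45 52
print "total: $total\n";          # 128
```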

Pattern Matching

  - A pattern is a sequence of characters to be searched for in a character string
        /pattern/
  - Match operators
      - =~ : tests whether a pattern is matched
      - !~ : tests whether a pattern is not matched
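Both match operators produce a true or false value, so they fit directly into a condition. A minimal sketch with a made-up sentence:

```perl
use strict;
use warnings;

my $sentence = "The quick brown fox";

my $has_fox = ($sentence =~ /fox/) ? "yes" : "no";   # pattern found, so =~ is true
my $no_dog  = ($sentence !~ /dog/) ? "yes" : "no";   # pattern absent, so !~ is true

print "contains fox: $has_fox\n";   # yes
print "lacks dog: $no_dog\n";       # yes
```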

Patterns

Pattern        Matches
/def/          "define"
/\bdef\b/      a def word
/^def/         def word
/^def$/        def
/de?f/         df, def
/d[eE]f/       def, dEf
/d[^eE]f/      daf, dzf
/d.f/          dif
/d.+f/         dabcf
/d.*f/         df, daffff
/de{1,3}f/     deef, deeef
/de{3}f/       deeef
/de{3,}f/      deeeeef
/de{0,3}f/     up to deeef
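A few of these patterns can be checked mechanically, including some strings that should not match. This is only a sketch: the test strings beyond those in the table are made up, and the list of cases uses array references ([ ... ] and @$case), which this tutorial does not cover.

```perl
use strict;
use warnings;

# Each entry: [pattern, candidate string, expected result (1 = match)]
my @cases = (
    [ '\bdef\b',  'a def word', 1 ],
    [ '\bdef\b',  'define',     0 ],   # "def" is followed by a word character
    [ '^def$',    'def',        1 ],
    [ 'de?f',     'df',         1 ],
    [ 'd[^eE]f',  'def',        0 ],
    [ 'd.+f',     'df',         0 ],   # .+ needs at least one character
    [ 'd.*f',     'df',         1 ],   # .* accepts zero characters
    [ 'de{1,3}f', 'deeeef',     0 ],   # four e's is one too many
    [ 'de{3,}f',  'deeeeef',    1 ],
);

my $failures = 0;
for my $case (@cases) {
    my ($pattern, $text, $expected) = @$case;
    my $matched = ($text =~ /$pattern/) ? 1 : 0;
    $failures++ if $matched != $expected;
    printf "%-12s %-14s %s\n", "/$pattern/", "\"$text\"", $matched ? "match" : "no match";
}
print "unexpected results: $failures\n";
```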

Character Ranges

Escape Sequence   Pattern            Description
\d                [0-9]              Any digit
\D                [^0-9]             Anything but a digit
\w                [_0-9A-Za-z]       Any word character
\W                [^_0-9A-Za-z]      Anything but a word char
\s                [ \r\t\n\f]        White-space
\S                [^ \r\t\n\f]       Anything but white-space

Backreferences

  - Memory of matched portion of input
        /[a-z]+(.)[a-z]+\1[a-z]+/
      - Matches: asd-eeed-sdsa, sd-sss-ws
      - Does NOT match: as.eee-dfg
  - They can even be accessed immediately after the pattern is matched
      - (.) in the previous pattern is $1
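The slide's pattern and its three example strings can be run directly. A minimal sketch; the loop and the @matched array are only scaffolding for the demonstration.

```perl
use strict;
use warnings;

my @matched;
for my $text ("asd-eeed-sdsa", "sd-sss-ws", "as.eee-dfg") {
    if ($text =~ /[a-z]+(.)[a-z]+\1[a-z]+/) {
        # $1 holds whatever the first parenthesized group matched
        print "$text: matched, separator is '$1'\n";
        push(@matched, $text);
    } else {
        print "$text: no match\n";
    }
}
```

The third string fails because its two separators differ, so \1 cannot repeat what (.) captured.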

Pattern Matching Options

Option   Description
g        Match all occurrences of the pattern
i        Ignore case
m        Treat string as multiple lines
o        Only evaluate (compile) the pattern once
s        Treat string as single line
x        Ignore white-space in pattern
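The options are appended after the closing slash and can be combined. A sketch with made-up text that exercises g, i, x, m, and s:

```perl
use strict;
use warnings;

my $text = "Perl is fun. perl is terse. PERL is everywhere.";

my @exact = ($text =~ /perl/g);    # g: all matches, case-sensitive: 1 match
my @any   = ($text =~ /perl/gi);   # g and i: all matches, ignoring case: 3 matches

# x: white-space inside the pattern is ignored, so it can be spaced out
my ($word) = ($text =~ / is \s+ (\w+) /x);   # first word after "is"

my $lines  = "first\nsecond";
my $m_flag = ($lines =~ /^second$/m)     ? 1 : 0;   # m: ^ and $ match at line boundaries
my $s_flag = ($lines =~ /first.second/s) ? 1 : 0;   # s: . also matches a newline

print scalar(@exact), " ", scalar(@any), " $word $m_flag $s_flag\n";   # 1 3 fun 1 1
```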

Substitutions

  - Substitution operator
        s/pattern/substitution/options
  - If $string = "abc123def"; (each example starts from this value)
        $string =~ s/123/456/          Result: "abc456def"
        $string =~ s/123//             Result: "abcdef"
        $string =~ s/(\d+)/[$1]/       Result: "abc[123]def"
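Because s/// changes the string in place, the sketch below copies the original before each substitution so that all three results from the slide can be reproduced in one script. The variable names and the final g example are additions for illustration.

```perl
use strict;
use warnings;

my $original = "abc123def";

# (my $copy = $original) =~ s/.../.../ copies first, then edits the copy.
(my $replaced = $original) =~ s/123/456/;      # "abc456def"
(my $deleted  = $original) =~ s/123//;         # "abcdef"
(my $wrapped  = $original) =~ s/(\d+)/[$1]/;   # "abc[123]def"

# With the g option every match is replaced, not just the first.
(my $masked = $original) =~ s/\d/#/g;          # "abc###def"

print "$replaced $deleted $wrapped $masked\n";
```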