A Brief History

Perl
• Perl - Practical Extraction and Report Language
– for text files
– system management
– combines C, sed, awk, sh
– interpreted
– dynamic
Perl notes 2
Data Structures
• scalars $num
• arrays @num
• associative arrays %num
• $num[50]
– element at index 50 of the array num (the 51st, since indices start at 0)
• $#num
– last index of num
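A minimal sketch of the notation above. The variable contents are made up for illustration; the point is that $num, @num and %num are three separate variables and that array indices start at 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data, chosen only to illustrate the notation.
my $num = 42;                      # scalar
my @num = (10, 20, 30);            # array, unrelated to $num
my %num = (one => 1, two => 2);    # associative array (hash)

print "scalar:        ", $num, "\n";
print "first element: ", $num[0], "\n";      # index 0 is the first element
print "last index:    ", $#num, "\n";        # 2 for a three-element array
print "element count: ", scalar(@num), "\n"; # always last index + 1
print "hash lookup:   ", $num{two}, "\n";
```

Because $#num is the last index, $num[$#num] is always the last element of a non-empty array.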
Examples
#! /usr/local/bin/perl -w
# find the sum of a list of numbers from STDIN
# one number per line
$sum = 0;
while( <STDIN> ) {
    $sum += int $_;
}
print "the sum is $sum\n";
Examples
#!/usr/bin/perl -w
# find the sum of a list of numbers from STDIN
# several numbers per line
$sum = 0;
while( <STDIN> ) {
    @nums = split;
    foreach (@nums) {
        $sum += int $_;
    }
}
print "the sum is $sum\n";
Average
#!/usr/bin/perl -w
# find the average of a list of
# numbers from STDIN
# several numbers per line
$sum = 0;
$count = 0;
while( <STDIN> ) {
    @nums = split;
    foreach (@nums) {
        $sum += int $_;
        $count++;
    }
}
print "the average is ", $sum/$count, "\n";
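median
#!/usr/bin/perl -w
# find the median of a list of numbers
# from STDIN
# several numbers per line
@nums = ();
while( <STDIN> ) {
    push @nums, split;
}
# sort numerically; a plain sort compares as strings ("10" before "9")
@nums = sort { $a <=> $b } @nums;
if($#nums % 2) {
    # even number of elements: average the two middle values
    $median = ($nums[($#nums - 1)/2]
             + $nums[($#nums + 1)/2])/2;
}
else {
    # odd number of elements: take the middle value
    $median = $nums[$#nums/2];
}
print "the median is $median\n";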
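Output?
#!/usr/bin/perl -w
@stuff = ("one", "two", "three");
print @stuff, "\n";                 # list: elements printed back to back
$stuff = ("one", "two", "three");
print $stuff, "\n";                 # comma operator in scalar context: last value
$stuff = @stuff;
print $stuff, "\n";                 # array in scalar context: number of elements
Output:
onetwothree
three
3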
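Pattern Matching
m//   match
s///  substitute
Modifiers
• i case-insensitive
• m multiple lines (^ and $ also match at line boundaries)
• s single line (. also matches a newline)
• x extended (whitespace and comments allowed in the pattern)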
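The four modifiers can be tried on a two-line string. This is only a sketch; the string and the patterns are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Hello\nworld";   # hypothetical two-line string

# i: ignore case
print "i matched\n" if $text =~ /hello/i;

# m: ^ and $ also match at the start and end of each line
print "m matched\n" if $text =~ /^world$/m;

# s: . also matches a newline
print "s matched\n" if $text =~ /Hello.world/s;

# x: whitespace and comments inside the pattern are ignored
print "x matched\n" if $text =~ / (\w+) \n (\w+)   # two words on two lines
                                /x;
```

All four tests succeed. Without its modifier, each of the first three patterns fails on this string.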
Regular Expressions
Code    Meaning
\w      Alphanumeric Characters
\W      Non-Alphanumeric Characters
\s      White Space
\S      Non-White Space
\d      Digits
\D      Non-Digits
\b      Word Boundary
\B      Non-Word Boundary
\A ^    At the Beginning of a String
\Z $    At the End of a String
.       Match Any Single Character
Regular Expressions
*                     Zero or More Occurrences
?                     Zero or One Occurrence
+                     One or More Occurrences
{N}                   Exactly N Occurrences
{N,M}                 Between N and M Occurrences
.*<thingy>            Greedy Match, up to the last thingy
.*?<thingy>           Non-Greedy Match, up to the first thingy
[set_of_things]       Match Any Item in the Set
[^set_of_things]      Match Any Item Not in the Set
(some_expression)     Tag an Expression
$1..$N                Tagged Expressions used in Substitutions
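The difference between the greedy and non-greedy forms, and the use of tagged expressions in a substitution, can be sketched as follows. The input strings are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "<b>bold</b> and <i>italic</i>";   # hypothetical input

# Greedy: .* runs to the LAST ">" in the string.
my ($greedy) = $text =~ /<(.*)>/;
print "greedy: $greedy\n";     # b>bold</b> and <i>italic</i

# Non-greedy: .*? stops at the FIRST ">".
my ($lazy) = $text =~ /<(.*?)>/;
print "lazy:   $lazy\n";       # b

# Tagged expressions reused in a substitution: swap two words.
my $swapped = "hello world";
$swapped =~ s/(\w+) (\w+)/$2 $1/;
print "swapped: $swapped\n";   # world hello
```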
Rules
• Rule 1
– The engine tries to match as far left as it can
• Rule 2
– The regular expression is regarded as a set of alternatives. The engine tries them left to right. (see page 61)
• Rule 3
– Items that have choices match from left to right
  /x*y*/
• Rule 4
– Assertions
– ^ $ \b \B \A \Z \G (?=…) (?!…)
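Rule 1 has a consequence that is easy to miss with /x*y*/: since both atoms may match zero times, the leftmost possible match is the empty string at offset 0. A small sketch, using an invented test string, also shows a lookahead assertion from Rule 4, which checks the text ahead without consuming it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "abcxxyy";   # hypothetical test string

# x* and y* can both match nothing, so the leftmost match is empty.
if ($str =~ /(x*y*)/) {
    printf "/x*y*/ matched '%s' at offset %d\n", $1, $-[0];   # '' at 0
}

# Requiring at least one x forces the engine to move right.
if ($str =~ /(x+y*)/) {
    printf "/x+y*/ matched '%s' at offset %d\n", $1, $-[0];   # 'xxyy' at 3
}

# (?=bar) asserts that "bar" follows, but it is not part of the match.
if ("foobar" =~ /(foo)(?=bar)/) {
    printf "matched '%s', match ends at offset %d\n", $1, $+[0];   # 'foo', 3
}
```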
Rules
• Rule 5
– A quantified atom matches only if the atom itself matches some number of times allowed by the quantifier
Maximal   Minimal   Allowed range
{n,m}     {n,m}?    At least n, at most m
{n,}      {n,}?     At least n
{n}       {n}?      Exactly n
*         *?        0 or more
+         +?        1 or more
?         ??        0 or 1
Rules
• Rule 6
– Each atom matches according to its type
– (…) ==> grouping + storage $1, $2
– . matches any char except \n
– […] character class (matches one char from the set)
– Special characters \a \n \r …
– \1 \2 ... backreference to (…)
– \033 octal char
– \xf7 hex char
– \cD control char
– any other \ matches the char itself
precedence (highest to lowest)
• () (?: )
• Repetition
• Sequence
• | alternation
Pattern                      strings
/ab*c/                       abc, ac, ababd, abbbc
/abc*/                       a, ab, abc, abccc, abcabc
/(abc)*/                     abc, abcc, empty string, abcabc
/ed|jo/                      ed, jo, edo, ejo
/(ed)|(jo)/                  ed, jo, edo, ejo
/ed|jo{1,3}/                 ed, jo, edo, ejo, joo, jooooo
/ed|jo{1,3}?/                ed, jo, edo, ejo, joo, jooooo
/^ed|jo$/                    fred and joe, ed jo, fred jo, jo
/^(ed|jo)$/                  fred and joe, ed jo, fred jo, jo
$pat = 'bob'; /${pat}{3}/    pat, bob, bobbobbob, bobbb, patt
$pat = 'bob'; /($pat){3}/    pat, bob, bobbobbob, bobbb, patt
(write ${pat}{3}, because $pat{3} would be read as an element of the hash %pat)
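Because alternation binds loosest, /^ed|jo$/ means "starts with ed, or ends with jo", while the parenthesized form anchors both alternatives at both ends. The following sketch runs the two patterns over the test strings listed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @strings = ("fred and joe", "ed jo", "fred jo", "jo");

print "string          loose  anchored\n";
for my $s (@strings) {
    # loose:    (^ed) or (jo$)
    my $loose    = ($s =~ /^ed|jo$/)   ? "yes" : "no";
    # anchored: the whole string is exactly "ed" or "jo"
    my $anchored = ($s =~ /^(ed|jo)$/) ? "yes" : "no";
    printf "%-15s %-6s %s\n", $s, $loose, $anchored;
}
```

Only "jo" matches both patterns. "ed jo" and "fred jo" match the loose form only, and "fred and joe" matches neither.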
Pattern       strings
/\w+/         Greetings, planet earth!
/\w*/         Greetings, planet earth!
/n[et]*/      Greetings, planet earth!
/n[et]+/      Greetings, planet earth!
/G.*t/        Greetings, planet earth!
/('.*')/      this 'test' isn't good
• How do you fix it?
/('[^']*')/
Examples
s/^([^ ]+) +([^ ]+)/$2 $1/     # swap the first two words
/(\w+)\s*=\s*\1/               # same word on both sides of =
/.{40,}/                       # at least 40 characters
/^(\d+\.?\d*|\.\d+)$/          # a decimal number
if (/Time: (..):(..):(..)/) {
    $hours = $1;
    $minutes = $2;
    $seconds = $3;
}
Default arguments
• $_, @_, @ARGV, STDIN
sub foo {
    my $x = shift;    # shift defaults to @_ inside a sub
}
• in the main program shift defaults to @ARGV
while (defined($_ = shift)) {    # defined: an argument of "0" must not end the loop
    if (/^-(.*)/) {
        process_option($1);
    } else {
        process_file($_);
    }
}
Reading a stream
open FIN, "myfile" or die;
while (<FIN>){        # reads one line at a time
    # do something with $_
}
foreach (<FIN>){      # reads all lines into a list first
    # do something with $_
}
print sort <FIN>;     # list context: all lines at once
Reading a stream
# print a window
@f = <FIN>;
foreach ( 0..$#f ) {
    if ($f[$_] =~ /\bShazam\b/) {
        $lo = ($_ > 0) ? $_ - 1 : $_;
        $hi = ($_ < $#f) ? $_ + 1 : $_;
        print map { "$_: $f[$_]" } $lo .. $hi;
    }
}
Sorting
• sort numerically
sub numerically { $a <=> $b }
@list = sort numerically (16, 1, 8, 2, 4, 32);
or
@list = sort { $a <=> $b } (16, 1, 8, 2, 4, 32);
# case-insensitive
@list = sort { uc($a) cmp uc($b) } qw(this is a test);
# reverse
@list = sort { $b <=> $a } (16, 1, 8, 2, 4, 32);
example
#! /usr/bin/perl -w
# This script will count the frequency of distinct words
# in the file that is given as an argument.
# Warning: Error checking is minimal!
die "usage: $0 file\n" unless @ARGV;
while(<>){
    tr/A-Z/a-z/;                # translate to lowercase
    @w = split(/[\W]+/,$_);     # split into words
    foreach (@w){
        $list{$_}++;            # increment the counter
    }
}
foreach $key (sort {$list{$b} <=> $list{$a}} keys %list) {
    print $key, ' = ', $list{$key}, "\n";
}
Tokenizing
# tokenize an arithmetic expression
while($_){
    if(/^(\d+)/) {
        push @tok, 'num', $1;
    } elsif(/^([+\-\/*()])/) {
        push @tok, 'punct', $1;
    } elsif (/^([\d\D])/) {
        die "invalid char $1 in input";
    }
    $_ = substr($_, length $1);
}
• substr slows things down
– it cuts off the start of the string on every pass
Tokenizing 2
while(/
    (\d+) |
    ([+\-\/*()]) |
    ([\d\D])/gx) {
    if (defined $1) {
        push @tok, 'num', $1;
    } elsif (defined $2) {
        push @tok, 'punct', $2;
    } else {
        die "invalid char $3 in input";
    }
}
Tokenizing 3
{
    if(/\G(\d+)/gc) {
        push @tok, 'num', $1;
    } elsif(/\G([+\-\/*()])/gc) {
        push @tok, 'punct', $1;
    } elsif (/\G([\d\D])/gc) {
        die "invalid char $1 in input";
    } else {
        last;
    }
    redo;
}
Use split for clarity
($a, $b, $c) = /^(\S+)\s+(\S+)\s+(\S+)/;
($a, $b, $c) = split /\s+/, $_;
($a, $b, $c) = split;
Get the fifth field:
($a) = /[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;
or
($a) = /(?:[^:]*:){4}([^:]*)/;
or
($a) = (split /:/)[4];
unpack
ps l
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY     TIME COMMAND
100  1216 30562 30561   7   0  2804 1768 rt_sig S    pts/2   0:00 -tcsh
000  1216 30658 30562  10   0  2780 1080        R    pts/2   0:00 ps l
chomp (@ps = `ps l`);
shift @ps;                       # drop the header line
for(@ps){
    ($uid, $pid, $sz, $tt) =
        unpack '@3 A6 @9 A7 @30 A5 @52 A7', $_;
    print "$uid, $pid, $sz, $tt\n";
}
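Avoid regex for simple strings
do_it() if $answer eq 'yes';        # exact match
do_it() if $answer =~ /^yes$/;      # same with a regex (also accepts "yes\n")
do_it() if $answer =~ /yes/;        # "yes" anywhere in the string
do_it() if lc($answer) eq 'yes';    # case-insensitive, no regex
do_it() if $answer =~ /^yes$/i;     # case-insensitive with a regex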
#!/usr/bin/perl
# remove the comments from a C program
$filename = shift or die "usage: $0 filename\n";
open FIN, $filename or die "can't open file";
while (<FIN>){
    chomp;    # the newline is printed once per line below
    # split into string literals, comment delimiters, and the text between them
    for(split m!("(?:\\\W|.)*?"|/\*|\*/)!){
        if($in_comment){
            $in_comment = 0 if $_ eq "*/";
        } else {
            if ($_ eq "/*") {
                $in_comment = 1;
                print " ";
            } else {
                print;
            }
        }
    }
    print "\n";
}
References
$a = 3.1416;
$scalar_ref = \$a;
$array_ref = \@a;
$hash_ref = \%a;
$array_el_ref = \$a[3];
$hash_el_ref = \$a{'John'};
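The slide above only creates references. A brief sketch of how they might be dereferenced follows. The variable names $pi, @list and %age are hypothetical stand-ins for $a, @a and %a, chosen to avoid confusion with sort's $a.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data for the illustration.
my $pi   = 3.1416;
my @list = (10, 20, 30);
my %age  = (John => 42);

my $scalar_ref = \$pi;
my $array_ref  = \@list;
my $hash_ref   = \%age;

print "scalar:        ", $$scalar_ref, "\n";          # 3.1416
print "array element: ", $$array_ref[1], "\n";        # 20
print "same, arrow:   ", $array_ref->[1], "\n";       # 20
print "whole array:   ", join(" ", @$array_ref), "\n";
print "hash element:  ", $hash_ref->{John}, "\n";     # 42

# Writing through a reference changes the original variable.
$$scalar_ref = 3;
push @$array_ref, 40;
print "pi is now $pi, list has ", scalar(@list), " elements\n";
```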
Lists of Lists
@LoL = (
    ["fred", "barney" ],
    ["george", "jane", "elroy" ],
    ["homer", "marge", "bart" ],
);
print $LoL[2][2];    # prints "bart"
$ref_to_LoL = [
    ["fred", "barney" ],
    ["george", "jane", "elroy" ],
    ["homer", "marge", "bart" ],
];
print $ref_to_LoL->[2][2];
• Note:
$LoL[2][2] implies $LoL[2]->[2]
Grow your own
while(<>){
@tmp = split;
push @LoL, [ @tmp ];
}
Hashes of Arrays
%HoL = (
    flinstones => ["fred", "barney" ],
    jetsons => ["george", "jane", "elroy" ],
    simpsons => ["homer", "marge", "bart" ],
);
• generation
# reading from a file with format:
# flinstones: fred barney ..
while(<>){
    next unless s/^(.*?):\s*//;
    $HoL{$1} = [ split ];
}
• or
while($line = <>){
    ($who, $rest) = split /:\s*/, $line, 2;
    @fields = split ' ', $rest;
    $HoL{$who} = [ @fields ];
}
Hashes of Arrays
# calling a function
for $group (qw(flinstones jetsons simpsons)) {
    $HoL{$group} = [ get_family($group) ];
}
# append members to an existing family
push @{ $HoL{flinstones} }, "wilma", "betty";
• access
$HoL{flinstones}[0] = "fred";
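Putting the pieces together, a hash of arrays can be built, extended and walked as in the sketch below. The data is taken from the slides, but the loop and the output format are only one possible way to print the structure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %HoL = (
    flinstones => [ "fred", "barney" ],
    jetsons    => [ "george", "jane", "elroy" ],
);

# Append to an existing family through its array reference.
push @{ $HoL{flinstones} }, "wilma";

# Walk the families in sorted order and print each member list.
for my $family (sort keys %HoL) {
    my @members = @{ $HoL{$family} };
    print "$family: ", join(" ", @members), "\n";
}

# Single-element access; the arrow between subscripts is optional.
print "first jetson: ", $HoL{jetsons}[0], "\n";     # george
print "last index:   ", $#{ $HoL{jetsons} }, "\n";  # 2
```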
Packages, Modules, and Object Classes