Regular Expressions

Perl-CGI 2
Regular Expressions
Regular Expressions are used for pattern matching.
$scalarName = "This is a line with pattern in it";
Matching:
• if ($scalarName =~ m"pattern") { … }
- evaluates to true (1) when the pattern is found
Substituting or Replacing:
• $line =~ s"patternA"patternB"; # searches for patternA in $line and replaces its first occurrence with patternB
Translating:
• $scalarName =~ tr/A-Z/a-z/; # converts upper-case letters to lower case
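A minimal sketch of the three operations side by side. The copies $replaced and $lower are names introduced here for illustration, so that the original string stays intact.

```perl
use strict;
use warnings;

my $scalarName = "This is a line with pattern in it";

# Matching: true because "pattern" occurs in the string.
my $found = ($scalarName =~ m/pattern/) ? 1 : 0;

# Substituting: work on a copy and replace the first "pattern".
(my $replaced = $scalarName) =~ s/pattern/PATTERN/;

# Translating: map every upper-case letter to its lower-case form.
(my $lower = $scalarName) =~ tr/A-Z/a-z/;

print "$found\n";      # 1
print "$replaced\n";   # This is a line with PATTERN in it
print "$lower\n";      # this is a line with pattern in it
```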
Regular Expressions
Regular Expressions match only on scalars
#!/usr/local/bin/perl
$name = "Smith";
if ($name =~ m"it") {
    print "yes\n";
}
$name =~ s/S/s/;    # substitution
print "$name\n";    # smith
$pattern = 'abc';
$a = 'We start with abcdef and more abcdef';
$status = ($a =~ /abc/);
# Double-quoted strings can be used as the pattern
$status = ($a =~ "abc");
print "$status\n";  # 1
$browser = $ENV{'HTTP_USER_AGENT'};
if ($browser =~ m/Mozilla/) { … }
Regular Expressions
$pattern = 'abc';
$a = 'We start with abcdef and more abcdef';
$b = 'bcdefjl';
$status = ($a =~ s/abc/ABC/);   # replaces the first abc only
print "$status\n";              # 1
if ($a =~ m/abc/) {             # the second abc is still there
    print "match\n";
}
if ($b !~ m/abc/) {
    print "No match\n";
}
if ($a =~ m/${pattern}/) {      # variable interpolation
    print "match\n";
}
Regular Expressions
Regular Expressions can be matched against the value in the
special variable $_ without using =~ or !~.
Example:
my @elements = ('a1','a2','a3');
foreach (@elements) { s/a/b/; }
# The special variable $_ holds each element of the list in turn,
# one per iteration.
# The elements become b1, b2 and b3 because of the substitution.
while (<FD>) { print if m/ERROR/; } # prints each line that contains ERROR
$text = "abcdef";
$text =~ m/a/;   # true if $text contains the character a
$text !~ m/a/;   # true if $text does NOT contain the character a
$text =~ s/a/A/; # substitutes A for the first a; true if a substitution was made
$text !~ s/a/A/; # substitutes A for the first a; false if a substitution was made
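The default-variable behaviour can be sketched as follows. The array @log is a hypothetical stand-in for lines read from a filehandle; it is not part of the original slides.

```perl
use strict;
use warnings;

my @elements = ('a1', 'a2', 'a3');
s/a/b/ foreach @elements;          # s/// works on $_, an alias for each element

# @log is a hypothetical stand-in for lines read from a filehandle.
my @log = ("INFO start", "ERROR disk full", "INFO done");
my @errors;
foreach (@log) {
    push @errors, $_ if m/ERROR/;  # m// tests $_ when no =~ is given
}

my $text    = "abcdef";
my $has_a   = ($text =~ m/a/) ? 1 : 0;   # =~ is true when the pattern matches
my $lacks_a = ($text !~ m/a/) ? 1 : 0;   # !~ is true when it does not

print "@elements\n";        # b1 b2 b3
print "@errors\n";          # ERROR disk full
print "$has_a $lacks_a\n";  # 1 0
```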
Regular Expressions
• . (period) – matches any single character (except a newline, by default).
• [] – matches any one character from the set or range given.
Examples:
$level = "12345";
if ( $level =~ /^[1-9]/ ) { … } # begins with a digit from 1 to 9
if ( $level =~ /[^1-9]/ ) { print "ok"; } # Negated class: true only if $level
# contains at least one character that is not 1-9, so nothing is printed here
$level = "123450";
if ( $level =~ /[^1-9]/ ) { print "ok"; } # what is the output?
# Matching on the word boundary
$line = "A rugged rug";
$line =~ s/\brug\b/RUG/;
print $line; # A rugged RUG
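A small sketch that runs the negated-class test on both values of $level, and checks the period and the word boundary. The variable names @verdicts and $dot are introduced here for illustration only.

```perl
use strict;
use warnings;

my @verdicts;
for my $level ("12345", "123450") {
    # /[^1-9]/ succeeds only if some character is NOT in the range 1-9.
    push @verdicts, ($level =~ /[^1-9]/) ? "yes" : "no";
}

my $dot = ("a-c" =~ /a.c/) ? 1 : 0;             # "." stands for any one character
(my $line = "A rugged rug") =~ s/\brug\b/RUG/;  # \b keeps "rugged" untouched

print "@verdicts\n";   # no yes
print "$dot\n";        # 1
print "$line\n";       # A rugged RUG
```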
Regular Expressions
Example: Grouping and quantifying
$line = "tootootoo";
if ($line =~ m/((too){2})/) {
    print $1, "\n";   # tootoo
}
# match against $_
Example:
my @vars = ('a1','a2','a3');
foreach (@vars) { s/a/b/; }
print "@vars\n";      # b1 b2 b3
Common Wildcards
• \d – matches a digit
• \D – matches a non-digit
• \w – matches a word character (an upper-case or lower-case letter, a digit or an underscore)
• \s – matches a whitespace character (space, tab, newline)
• \S – matches a non-whitespace character
• \b – matches at a word boundary (the beginning or end of a word)
• \B – matches anywhere that is not a word boundary
• ^ – beginning of the string
• $ – end of the string
• * – zero or more times
• + – one or more times
• ? – zero or one time
• {X} – exactly X times
• {X,} – X or more times
• {X,Y} – X to Y times
• | – alternation: matches any one of several patterns
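A few of these wildcards and quantifiers combined, as a rough sketch. The test strings are made up for illustration; each result is reduced to 1 or 0.

```perl
use strict;
use warnings;

my $iso_date  = ("2024-01-31"   =~ /^\d{4}-\d{2}-\d{2}$/) ? 1 : 0;  # {X}: exact counts
my $short     = ("24-1-31"      =~ /^\d{4}-\d{2}-\d{2}$/) ? 1 : 0;  # too few digits
my $optional  = ("color"        =~ /^colou?r$/)           ? 1 : 0;  # ?: zero or one u
my $either    = ("dog"          =~ /^(cat|dog)$/)         ? 1 : 0;  # |: alternation
my $too_many  = ("aaaa"         =~ /^a{2,3}$/)            ? 1 : 0;  # {X,Y}: 2 to 3 only
my $two_words = ("hello  world" =~ /^\w+\s+\w+$/)         ? 1 : 0;  # \w and \s with +

print "$iso_date $short $optional $either $too_many $two_words\n";  # 1 0 1 1 0 1
```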
Regular Expressions
Regular Expressions match only on scalars
Use grep to match array elements (more examples on grep later)
@array = ('one','two','ton');
$n = grep (m/on/, @array);   # 2: 'one' and 'ton' contain "on"
Regular Expressions
A Regular Expression finds the earliest possible match of a
given pattern. By default, it matches or replaces the pattern
only once.
Example:
$variable = "A crazy horse jumped over a crazy fox";
$pattern = "crazy fox";
• If only a partial match is found (here "crazy " followed by "horse"
rather than "fox"), the engine "backtracks": it gives up that attempt,
moves forward by the least possible amount in the string and starts
matching again. The match succeeds at the second "crazy".
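The start offsets make the behaviour visible. This sketch uses Perl's built-in @- array ($-[0] is the offset where the last successful match began); the names $first_start, $phrase_start and $once, and the replacement word "calm", are assumptions made for illustration.

```perl
use strict;
use warnings;

my $variable = "A crazy horse jumped over a crazy fox";

$variable =~ m/crazy/;
my $first_start = $-[0];      # earliest match: the first "crazy"

$variable =~ m/crazy fox/;
my $phrase_start = $-[0];     # the first "crazy" fails at "horse", so this is the second one

(my $once = $variable) =~ s/crazy/calm/;   # without /g only the first is replaced

print "$first_start $phrase_start\n";   # 2 28
print "$once\n";                        # A calm horse jumped over a crazy fox
```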
Regular Expressions
Regular Expressions can take any and all characters that double-quoted strings can,
including interpolated variables.
Examples:
$name = "John Smith";
$line = " John Smith is the author of this book.";
if ($line =~ m/author of/) { print "Author\n"; }
if ($line =~ m/${name} is/) { print "match\n"; } # matches John Smith is
• If the pattern contains the delimiter character (such as /), either backslash it
or choose a different delimiter, such as double quotes.
Example:
$path =~ m"usr/local/bin";
• Use the function quotemeta() to backslash special characters automatically.
Example:
$pattern = "({";
$variable =~ m/$pattern/;   # is the same as saying
$variable =~ m/({/;         # which causes a runtime error because ( and { are special characters
$pattern = quotemeta("({"); # makes the pattern \(\{
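A hedged sketch of the escaping options. The input string in $text is hypothetical; quotemeta() and the \Q...\E escape are standard Perl, and eval is used here only to show that the unescaped pattern fails to compile.

```perl
use strict;
use warnings;

my $literal = "({";
my $text    = "call foo({ bar })";   # hypothetical input

my $escaped = quotemeta($literal);                 # \(\{
my $hit     = ($text =~ m/$escaped/)     ? 1 : 0;
my $inline  = ($text =~ m/\Q$literal\E/) ? 1 : 0;  # \Q...\E escapes while interpolating

# The raw pattern does not compile; eval traps the "Unmatched (" error.
my $raw_ok  = eval { $text =~ m/$literal/; 1 } ? 1 : 0;

print "$escaped $hit $inline $raw_ok\n";   # \(\{ 1 1 0
```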
Regular Expressions
A Regular Expression produces two things in the process of being evaluated:
a result status and back references.
Result Status: indicates whether a match succeeded (1 or an empty string for m//)
or how many replacements were made (for s///).
$line = " round and round the rugged rock the rascal ran";
$status = ($line =~ m"round");
print "$status\n";                      # 1
$matches = ($line =~ s"round"Round");
print "Matches = $matches\n";           # 1
$line = " round and round the rugged rock the rascal ran"; # start again from the original string
$matches = ($line =~ s"round"Round"g);
print "Matches = $matches\n";           # 2
Regular Expressions
Back References enable you to save parts of a match for later
use. The parts of the pattern you want to save are enclosed in ().
Example:
$line = "round and round the rugged rock the rascal ran";
$matches = ($line =~ m"(round) the (rugged)");
print "Matches = $matches\n"; # 1
print $1;   # round
print $2;   # rugged
print $&;   # $& contains the whole matched string: round the rugged
if ($line =~ m"(round) the (rugged)") {
    $first = $1;
    $second = $2;
}
Regular Expressions
Using the back references in the regular expressions
Switch the words
$line = "first minus second ";
$line =~ s"(first) minus (second)"$2 minus $1";
print $line; # second minus first
Example:
$line = "A taxi and a fox";
if ($line =~ s/(a+x+)/--/) {
    #print "Yes", $1, "\n";
    print $line, "\n";
    print "Before: ", $`, "\n";
    print "After: ", $', "\n";
    print "Match: ", $&, "\n";
    #A t--i and a fox
    #Before: A t
    #After: i and a fox
    #Match: ax
}
Exercise
• Write a regular expression to swap the first two words
• Match a line of 80 characters
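One possible approach to both exercises, offered as a sketch and not as the intended answer. It assumes "words" means runs of \w characters and that the 80-character line contains no newline; the sample strings are made up.

```perl
use strict;
use warnings;

# Exercise 1: swap the first two words, keeping the whitespace between them.
(my $swapped = "hello world again") =~ s/^(\w+)(\s+)(\w+)/$3$2$1/;

# Exercise 2: a line of exactly 80 characters (any characters except newline).
my $eighty = "x" x 80;
my $is_80  = ($eighty     =~ /^.{80}$/) ? 1 : 0;
my $is_79  = (("x" x 79)  =~ /^.{80}$/) ? 1 : 0;

print "$swapped\n";        # world hello again
print "$is_80 $is_79\n";   # 1 0
```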
Backreferences
$line = 'It is THIS and not THAT';
$line =~ /(TH..)/;
print "$1\n";      # THIS (the earliest match is the default behavior)
$line =~ /(TH..).*(TH..)/;
print "$1 $2\n";   # THIS THAT
($one,$two) = ($line =~ /(TH..).*(TH..)/);
print "1:$one 2:$two\n";   # 1:THIS 2:THAT
Using back references
What happens to the back references if the regular expression fails to match?
Example:
$text = "This is scary";
$text =~ m"(scari)";
print $1; # $1 is not set when the match fails; it keeps whatever the last successful match left in it.
# Make use of short-circuit evaluation as shown below:
($text =~ m"(scari)") && ($found = $1);
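The stale-$1 trap can be shown with a short sketch. The first match on (This) is added here only to put a value into $1 beforehand; the list-assignment form at the end is an alternative idiom, not something the slide prescribes.

```perl
use strict;
use warnings;

my $text = "This is scary";

$text =~ m/(This)/;      # succeeds, so $1 is "This"
$text =~ m/(scari)/;     # fails, so $1 is left as it was
my $stale = $1;          # still "This"

# Only copy $1 when the match succeeded.
my $found;
($text =~ m/(scari)/) && ($found = $1);

# List assignment gives undef on failure and the capture on success.
my ($missing) = $text =~ m/(scari)/;
my ($word)    = $text =~ m/(scar\w*)/;

print "$stale\n";                                  # This
print defined $found   ? "set" : "unset", "\n";    # unset
print defined $missing ? "set" : "unset", "\n";    # unset
print "$word\n";                                   # scary
```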
Using back references
Nesting backreferences
Some rules:
The earlier a backreference is in an expression, the lower its backreference
number.
Example:
$string = 'abracadabra';
$string =~ m"(a)(b)";
print "$1 $2\n";   # a b
Using back references
Nesting backreferences
Some rules:
The more general (outer) a backreference is, the lower its number:
groups are numbered by the position of their opening parenthesis.
Example:
$string = "softly slowly surely subtly";
$string =~ m"((s...ly\s*)*)";
print "$1\n"; # softly slowly surely subtly
print "$2\n"; # subtly
Explanation:
The pattern (s...ly\s*)* matches several things in turn:
first softly, then slowly, then surely and then subtly.
Since the group matches repeatedly, the earlier matches are discarded and $2 holds
only the last one, subtly.
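Both numbering rules can be seen together in a short sketch. The date string is a made-up example; the second match repeats the slide's pattern.

```perl
use strict;
use warnings;

# Groups are numbered by their opening parenthesis, left to right.
"2024-01-31" =~ m/((\d{4})-(\d{2}))-(\d{2})/;
my @groups = ($1, $2, $3, $4);

# A repeated group keeps only its last repetition.
"softly slowly surely subtly" =~ m/((s...ly\s*)*)/;
my ($outer, $inner) = ($1, $2);

print "@groups\n";   # 2024-01 2024 01 31
print "$outer\n";    # softly slowly surely subtly
print "$inner\n";    # subtly
```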
Backreferences
• Back references can be used in the regular expression itself.
• If you put () around a group of characters and want to refer to the
captured text in the second (replacement) part of s" " ", use $1, $2 etc. To
refer to it inside m" " or the first (pattern) part of s" " ", use \1, \2 etc.
Example:
$string = "sample examples";
if ($string =~ m"(amp..) ex\1"){
print "Matches! \n";
}
Backreferences
Example:
• $text = 'bballball';
• $text =~ s"(b)\1(a..)\1\2"$1$2"; # Does this match the string in $text?
• Steps in matching:
1. The first b in () matches the first b in $text and is saved in $1 and \1.
2. \1 matches the second b in the string.
3. (a..) matches the string all, which is stored in $2 and \2.
4. \1 matches the next b.
5. \2 (contains all) matches the next and last three characters, all.
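Running the steps above confirms the match. The doubled-word search is an extra illustration of \1 with a made-up sentence, not part of the slides.

```perl
use strict;
use warnings;

my $text  = 'bballball';
my $count = ($text =~ s/(b)\1(a..)\1\2/$1$2/);   # b, b again, all, b again, all again

# The same idea finds an accidentally doubled word.
my ($doubled) = "this is is a test" =~ /\b(\w+)\s+\1\b/;

print "$count $text\n";   # 1 ball
print "$doubled\n";       # is
```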
Multiple-Match Operators
There are six multiple-match operators:
• * – zero or more times
• + – one or more times
• ? – zero or one time
• {X} – exactly X times
• {X,} – X or more times
• {X,Y} – X to Y times
Greediness
Example:
$line = 'This is the example of the greedy match pattern';
$line =~ m/This (.*)the/;
print $1;        # is the example of
$line = 'Somethings';
$line =~ m"(\w{2,3})";
print "$1\n";    # Som
Greediness
By default, the multiple-match operators gobble up the
maximum number of characters in a string that still lets the
pattern match.
Example 1: Find the first "round" and replace it with "square" in the
string.
$line = "About a round and round rock";
# .* tries to match the maximal number of "any" characters, so it
# runs up to the last "round".
$line =~ s/.*(round)/square/;
print $line; # square rock
# With a question mark after a greedy quantifier, the smallest quantity is
# chosen, so .*? stops at the first "round".
$line = "About a round and round rock"; # start again from the original string
$line =~ s/.*?(round)/square/;
print $line; # square and round rock
Curbing Greediness
*? – match zero, 1 or many times, but match the fewest number of times.
+? – match 1 or many times, but match the fewest number of times.
?? - match zero or 1 time, but match the fewest possible number of times.
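The difference is easiest to see by capturing what each quantifier consumed. This sketch reuses the strings from the earlier slides; the variable names are introduced here for illustration.

```perl
use strict;
use warnings;

my $line = "About a round and round rock";

my ($greedy) = $line =~ /(.*)round/;    # .* runs up to the last "round"
my ($lazy)   = $line =~ /(.*?)round/;   # .*? stops at the first "round"

my ($most)   = "Somethings" =~ /(\w{2,3})/;    # takes three characters
my ($fewest) = "Somethings" =~ /(\w{2,3}?)/;   # settles for two

print "[$greedy]\n";       # [About a round and ]
print "[$lazy]\n";         # [About a ]
print "$most $fewest\n";   # Som So
```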
Some modifiers
The /g modifier in substitution means that every single instance of a
regular expression is replaced.
$line = " round and round the rugged rock the rascal ran";
$matches = ($line =~ s"round"Round");
print "Matches = $matches\n";   # 1
$line = " round and round the rugged rock the rascal ran"; # start again from the original string
$matches = ($line =~ s"round"Round"g);
print "Matches = $matches\n";   # 2
Perl uses the /g modifier in a different way with match than it does
with substitution.
Some modifiers
Perl attaches an iterator to the /g modifier. When you match once, Perl
remembers where the match ended, so you can continue
matching from where you left off. When no further match is found, the
iterator is reset.
Example:
$line = 'hello Susan hello Jane';
while ($line =~ m/hello (\w+)/g) {
    print "$1\n";
}
Output:
Susan
Jane
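The iterator can be inspected with Perl's built-in pos() function. In this sketch the array @seen and the scalar $after are names chosen for illustration.

```perl
use strict;
use warnings;

my $line = 'hello Susan hello Jane';
my @seen;
while ($line =~ m/hello (\w+)/g) {
    # pos() reports where the next m//g attempt on $line will start.
    push @seen, "$1\@" . pos($line);
}
my $after = defined pos($line) ? "kept" : "reset";   # the failed final match clears pos()

print "@seen\n";    # Susan@11 Jane@22
print "$after\n";   # reset
```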
Some modifiers
• With a match, /g specifies global pattern matching, that is, matching as many times as
possible within the string.
• How it behaves depends on the context. In list context, it returns a list
of all the substrings matched by all the parentheses in the regular
expression. If there are no parentheses, it returns a list of all the
matched strings, as if there were parentheses around the whole pattern.
• In scalar context, each execution of m//g finds the next match,
returning TRUE if it matches, and FALSE if there is no further match.
Example:
$line = 'hello Susan hello Jane';
@matches = ($line =~ m/hello (\w+)/g);
print "@matches\n";
# Susan Jane
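A sketch of the list-context cases described above. The "a=1, b=2" string and the counting idiom with an empty list are additions for illustration.

```perl
use strict;
use warnings;

my $line = 'hello Susan hello Jane';

my @names = $line =~ m/hello (\w+)/g;      # one capture per match
my @whole = $line =~ m/hello \w+/g;        # no parentheses: whole matches
my $count = () = $line =~ m/hello/g;       # count matches via list assignment
my @pairs = "a=1, b=2" =~ m/(\w)=(\d)/g;   # two groups: captures are flattened

print "@names\n";                # Susan Jane
print join("|", @whole), "\n";   # hello Susan|hello Jane
print "$count\n";                # 2
print "@pairs\n";                # a 1 b 2
```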
Alternation to match more than one set of characters
• Alternation (|) matches any one of several patterns.
• Alternatives are always tried from left to right: the first item in the
parentheses is tried first; if it does not match, the second is tried, and so on.
Example:
$line = "starship";
$line =~ m"(.*|star)"; # matches starship, because .* is tried first and succeeds
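Reversing the alternatives shows that order matters. A minimal sketch, with variable names chosen here for illustration:

```perl
use strict;
use warnings;

my $line = "starship";

my ($greedy_first) = $line =~ m/(.*|star)/;   # .* is tried first and takes everything
my ($star_first)   = $line =~ m/(star|.*)/;   # star is tried first and succeeds

print "$greedy_first $star_first\n";   # starship star
```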
Perl’s grep
• The grep function evaluates the BLOCK or EXPR for each element
of LIST, locally setting the $_ variable equal to each element.
• BLOCK is one or more Perl statements delimited by curly brackets.
• LIST is an ordered set of values.
• EXPR is one or more variables, operators, literals, functions, or
subroutine calls.
• grep returns a list of those elements for which the EXPR or BLOCK
evaluates to TRUE.
• LIST can be a list or an array. In a scalar context, grep returns the
number of times the expression was TRUE.
grep -Some examples
#!/usr/local/bin/perl -w
@numbers = (1,2,3,4,5,6,7,8,9);
@lessThanFive = grep ($_ < 5, @numbers);
print @lessThanFive, "\n";
@tokens = ("A String", 234, "String 2", 111);
@ints = grep (m"^\d+$", @tokens);
print @ints, "\n";
@words = qw(a silly fox jumped over a silly horse);
print @words, "\n";
grep - Some Examples
$howMany = grep /silly/, @words;
print $howMany;   # 2
@line = qw(jumped over a rail and hopped across a meadow);
@list = grep { s/ed/s/ if /^jump/ } @line;
print @list;      # jumps
grep - Some Examples
@numbers = (1,30,200,50,200,450,2000);
@greaterThan50 = grep { $_ > 50 } @numbers;
print "@greaterThan50\n";
@colors = ();
$colors[1] = "yellow"; $colors[5] = "green";
$colors[10] = "blue";
@array = grep { defined $_ } @colors;
print "@array\n";
What does this do?
@results = grep { $array[$_] =~ /PACKAGE/ } (0..$#array);
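One way to explore the question is to run the expression on sample data. The contents of @array below are hypothetical; the second grep is added for comparison.

```perl
use strict;
use warnings;

# Hypothetical data; only the element contents matter for the illustration.
my @array = ('use strict;', 'PACKAGE Foo', 'my $x;', 'end PACKAGE');

my @results = grep { $array[$_] =~ /PACKAGE/ } (0 .. $#array);   # indices of matching elements
my @lines   = grep { /PACKAGE/ } @array;                          # the matching elements themselves

print "@results\n";              # 1 3
print join("|", @lines), "\n";   # PACKAGE Foo|end PACKAGE
```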
grep – Some Examples
open FILE, "<myfile" or die "can't open myfile: $!";
@lines = <FILE>;
print grep /xml|xslt/i, @lines;
@unique = grep { not $found{$_}++ } qw(a b a c d d e f g f h h);
print "@unique\n";   # a b c d e f g h