Lecture 6: Regular Expressions
Lecture 7: grep
Why Regular Expressions?
Regular expressions are used to describe text patterns/filters.
Unix commands/utilities that support regular expressions:
grep (fgrep, egrep) - search a file for a string or regular expression
sed - stream editor
awk (nawk) - pattern scanning and processing language
There are some minor differences between the regular expressions supported by these programs.
We will cover the general matching operators first.
Character Class
[] matches any one of the enclosed characters
[abc] matches a single a, b or c
[a-z] matches any one of abcdef…xyz
[^A-Za-z] matches a single character as long as it is not a letter.
Example: [Dd][Aa][Vv][Ee]
Matches "Dave" or "dave" or "dAVE"
Does not match "ave" or "da"
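The character classes above can be tried directly with grep. This is a minimal sketch; the sample words and variable names are made up for illustration, and -x is used so that the whole line has to match.

```shell
# Hypothetical sample words, one per line.
names='Dave
dave
dAVE
ave
da'

# -x: the whole line must match, so "ave" and "da" are rejected.
class_matches=$(printf '%s\n' "$names" | grep -x '[Dd][Aa][Vv][Ee]')
printf '%s\n' "$class_matches"

# [^A-Za-z] needs at least one non-letter somewhere on the line.
# LC_ALL=C keeps the ranges a-z and A-Z limited to ASCII letters.
nonletter_matches=$(printf '%s\n' 'abc' 'ab1c' 'a-b' | LC_ALL=C grep '[^A-Za-z]')
printf '%s\n' "$nonletter_matches"
```

The first command prints Dave, dave and dAVE; the second prints ab1c and a-b.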
Regular Expression Operators
Any character (except a metacharacter!) matches itself.
.  Matches any single character except newline.
*  Matches 0 or more instances of the immediately preceding R.E.
?  Matches 0 or 1 instances of the immediately preceding R.E.
+  Matches 1 or more instances of the immediately preceding R.E.
^  Matches the following R.E. at the beginning of the line.
$  Matches the preceding R.E. at the end of the line.
|  Matches the R.E. specified before or after this symbol.
\  Turns off the special meaning of the following character.
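A small sketch of how each operator behaves, using grep -E (extended R.E.) so that ?, + and | are available. The helper name found and the test strings are assumptions made up for this illustration.

```shell
# Hypothetical helper: print "yes" if the extended R.E. in $1 matches
# somewhere in the string $2, otherwise print "no".
found() {
  if printf '%s\n' "$2" | grep -E -q -e "$1"; then echo yes; else echo no; fi
}

r_dot=$(found 'c.t' 'cut')          # . stands for any one character
r_star=$(found '^ab*c$' 'ac')       # * accepts zero b's
r_plus=$(found '^ab+c$' 'ac')       # + needs at least one b
r_opt=$(found '^ab?c$' 'abbc')      # ? accepts at most one b
r_bol=$(found '^cat' 'tomcat')      # ^ ties the match to the start of the line
r_eol=$(found 'cat$' 'tomcat')      # $ ties the match to the end of the line
r_alt=$(found 'cat|dog' 'hotdog')   # | accepts either alternative
r_esc=$(found 'a\.b' 'axb')         # \. is a literal dot, not "any character"

echo "$r_dot $r_star $r_plus $r_opt $r_bol $r_eol $r_alt $r_esc"
```

The last line prints: yes yes no no no yes yes no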
Examples of R.E.
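x[abc]?x matches "xax" or "xx"
[abc]* matches "aaaaa" or "acbca"
0*10 matches "010" or "0000010" or "10"
^(dog)$ matches lines that consist of exactly "dog"
[\t ]* matches zero or more tabs or spaces (in tools where \t denotes a tab, such as awk)
(A|a)+b*c? matches one or more "A" or "a", then zero or more "b", then an optional "c"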
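The examples above can be checked with a short sketch like the following. The helper name try and the extra non-matching strings are assumptions added for illustration. Note that -x (whole-line match) matters here: a pattern such as [abc]* can match the empty string, so without -x it would select every line.

```shell
# Hypothetical helper: print each candidate string that the extended R.E.
# in $1 matches as a whole line (-x).
try() {
  re=$1
  shift
  printf '%s\n' "$@" | grep -E -x -e "$re"
}

optional=$(try 'x[abc]?x' xax xx xabx)          # xabx has two letters between the x's
star=$(try '[abc]*' aaaaa acbca abd)            # d is not in the class
zeros=$(try '0*10' 010 0000010 10 0100)         # 0100 has a trailing extra 0
mixed=$(try '(A|a)+b*c?' Ab aAbbc bc Abcc)      # bc lacks A/a, Abcc has two c's

printf '%s\n' "$optional" "$star" "$zeros" "$mixed"
```

Each call prints only the candidates that match; the last candidate in every list is the one that gets rejected (and bc as well in the final call).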
Grouping with parens
If you put a subpattern inside parens, you can apply + * and ? to the entire subpattern.
a(bc)*d matches "ad" and "abcbcd"
It does not match "abcxd" or "bcbcd"
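A quick sketch of grouping, run once with extended syntax (grep -E, plain parens) and once with basic syntax (grep, escaped parens). The variable names are made up for this illustration.

```shell
candidates='ad
abcbcd
abcxd
bcbcd'

# Extended R.E.: ( ) group, and * applies to the whole group "bc".
group_ext=$(printf '%s\n' "$candidates" | grep -E 'a(bc)*d')

# Basic R.E.: the same grouping is written with \( \).
group_basic=$(printf '%s\n' "$candidates" | grep 'a\(bc\)*d')

printf '%s\n' "$group_ext"
```

Both commands select ad and abcbcd only.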
Example
1. Christian Scott lives here and will put on a Christmas party
2. There are around 30 to 35 people invited.
3. They are:
4. Tom
5. Dan
6. Rhonda Savage
7. Nicky and Kimberly.
8. Steve, Suzanne, Ginger and Larry
^[A-Z]..$
^[A-Z][a-z]*3[0-5]
^ *[A-Z][a-z][a-z]$
^[A-Z][a-z]*[^,][A-Za-z]*$
[a-z]*\.
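One way to check your answers to this exercise is to let grep report which line numbers each pattern selects. This is a sketch under assumptions: the eight lines are stored without the slide's numbering and without any leading whitespace (the original indentation is not known), and the helper name lines_for is made up.

```shell
text='Christian Scott lives here and will put on a Christmas party
There are around 30 to 35 people invited.
They are:
Tom
Dan
Rhonda Savage
Nicky and Kimberly.
Steve, Suzanne, Ginger and Larry'

# Hypothetical helper: print the numbers of the lines matched by the basic
# R.E. in $1, separated by spaces. LC_ALL=C keeps a-z and A-Z ASCII-only.
lines_for() {
  printf '%s\n' "$text" | LC_ALL=C grep -n -e "$1" | cut -d: -f1 | paste -s -d ' ' -
}

p1=$(lines_for '^[A-Z]..$')
p2=$(lines_for '^[A-Z][a-z]*3[0-5]')
p3=$(lines_for '^ *[A-Z][a-z][a-z]$')
p4=$(lines_for '^[A-Z][a-z]*[^,][A-Za-z]*$')
p5=$(lines_for '[a-z]*\.')

printf 'pattern 1: %s\npattern 2: %s\npattern 3: %s\npattern 4: %s\npattern 5: %s\n' \
  "$p1" "$p2" "$p3" "$p4" "$p5"
```

An empty result means that no line matches the pattern.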
Review: Metacharacters for filename abbreviation
*  Matches anything: ls Test*.doc
?  Matches any single character: ls Test?.doc
[abc…]  Matches any of the enclosed characters: ls T[eE][sS][tT].doc
[a-z]  Matches any character in a range: ls [a-zA-Z]*
[!abc…]  Matches any character except those listed: ls [!0-9]*
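The filename metacharacters can be explored safely in a scratch directory. This is a sketch only; the file names are invented, and echo is used instead of ls so that the shell's expansion is visible directly.

```shell
export LC_ALL=C                 # fixed sort order and ASCII-only ranges
dir=$(mktemp -d)                # scratch directory, removed at the end
cd "$dir" || exit 1
touch Test.doc TEST.doc Test1.doc Test2.doc Test10.doc notes.txt 9lives

star=$(echo Test*.doc)              # * : any run of characters, including none
one=$(echo Test?.doc)               # ? : exactly one character
anycase=$(echo T[eE][sS][tT].doc)   # [..]: one of the listed characters per position
neg=$(echo [!0-9]*)                 # [!..]: names that do not start with a digit

printf '%s\n' "$star" "$one" "$anycase" "$neg"

cd / && rm -rf "$dir"
```

Test?.doc expands to Test1.doc and Test2.doc only: Test.doc has no character in that position and Test10.doc has two.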
Difference!!
Although there are similarities to the metacharacters used in filename expansion, we are talking about something different!
Filename expansion is done by the shell.
Regular expressions are used by commands (programs).
However, because of this overlap, be careful when specifying an RE on the command line.
It is a good idea to always quote an RE that contains special chars ('' or "") on the command line.
Example:
% grep '[a-z]*' chap[12]*
Note: the filename mask chap[12]* is not quoted, so the shell expands it.
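The following sketch shows what can go wrong when an RE is left unquoted. The file names chap1 and abxc and their contents are made up; the point is that a file whose name happens to fit the pattern as a filename mask changes what grep receives.

```shell
dir=$(mktemp -d)                # scratch directory, removed at the end
cd "$dir" || exit 1
printf '%s\n' 'ac' 'abc' 'abxc' > chap1
touch abxc                      # a file name that fits the mask ab*c

# Quoted: grep receives the R.E. ab*c ("a", zero or more "b", then "c").
quoted=$(grep 'ab*c' chap1)

# Unquoted: the shell first expands ab*c to the file name abxc,
# so grep searches for the literal string "abxc" instead.
unquoted=$(grep ab*c chap1)

printf 'quoted:\n%s\nunquoted:\n%s\n' "$quoted" "$unquoted"

cd / && rm -rf "$dir"
```

The quoted command selects ac and abc; the unquoted one selects only abxc.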
grep - search for a string
grep [-bchilnsvw] PATTERN [filename...]
Reads files or standard/redirected input
Searches for the specified pattern in each line
Sends results to the standard output
Examples:
% grep '^X11' *  - search all files for lines starting with the string "X11"
% grep -v text file  - print lines that do not match "text"
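A small sketch of both input styles, files and standard input. The file names and their contents are invented for this illustration.

```shell
dir=$(mktemp -d)                # scratch directory, removed at the end
printf '%s\n' 'X11 server' 'uses X11' 'plain text' > "$dir/a.txt"
printf '%s\n' 'X11R6 notes' 'no match here' > "$dir/b.txt"

# Several files: each matching line is prefixed with its file name.
anchored=$(cd "$dir" && grep '^X11' *)

# Standard input instead of a file; -v keeps the lines that do NOT match.
inverted=$(printf '%s\n' 'some text' 'other line' | grep -v text)

printf '%s\n' "$anchored" "$inverted"

rm -rf "$dir"
```

The line "uses X11" is not selected by the first command because X11 is not at the beginning of that line.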
Regular expressions for grep
c        any non special character
\c       turn off any special meaning of character c
^        beginning of line
$        end of line
.        any single character
[...]    any of characters in range …
[^...]   any single character not in range …
r*       zero or more occurrences of r
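A short sketch of why \c matters, using basic grep only. The sample strings are made up.

```shell
# An unescaped dot matches any single character ...
dot_any=$(printf '%s\n' 3.14 3014 314 | grep '3.14')

# ... while \. matches only a literal dot.
dot_literal=$(printf '%s\n' 3.14 3014 314 | grep '3\.14')

# ^ directly followed by $ describes an empty line; -c counts such lines.
blank_count=$(printf '%s\n' one '' two '' | grep -c '^$')

printf '%s\n' "$dot_any" "$dot_literal" "$blank_count"
```

The first command selects 3.14 and 3014, the second only 3.14, and the count of empty lines is 2.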
Regular Expressions for grep
\<  beginning of word anchor
    \<abc matches "abcd" but not "dabc"
\>  end of word anchor
    abc\> matches "dabc" but not "abcd"
\(…\)  stores the pattern …
    \(abc\)def matches "abcdef" and stores abc in \1, so \(abc\)def\1 matches "abcdefabc". Up to 9 patterns can be stored (\1 through \9).
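The anchors and the stored pattern can be tried as follows. This sketch assumes a grep that supports \< and \> (GNU grep and many others do, but they are extensions rather than part of the POSIX standard); the extra string abcdefxyz is made up.

```shell
# \< : "abc" must start a word.
word_start=$(printf '%s\n' abcd dabc | grep '\<abc')

# \> : "abc" must end a word.
word_end=$(printf '%s\n' abcd dabc | grep 'abc\>')

# \(abc\) stores "abc"; \1 requires the same text again later in the line.
backref=$(printf '%s\n' abcdefabc abcdefxyz | grep '\(abc\)def\1')

printf '%s\n' "$word_start" "$word_end" "$backref"
```

This prints abcd, dabc and abcdefabc, one per line.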
grep - options
Some useful options
-c  count number of matching lines
-h  do not display filename
-l  list only the files with matching lines
-v  display lines that do not match
-n  print line numbers
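A sketch that runs each of these options on two small files. The file names one.txt and two.txt and their contents are invented for this illustration.

```shell
dir=$(mktemp -d)                # scratch directory, removed at the end
cd "$dir" || exit 1
printf '%s\n' 'apple' 'banana' 'apple pie' > one.txt
printf '%s\n' 'cherry' 'banana' > two.txt

count=$(grep -c apple one.txt)                # number of matching lines
listed=$(grep -l apple one.txt two.txt)       # only names of files with a match
with_names=$(grep banana one.txt two.txt)     # default with several files: name prefix
bare=$(grep -h banana one.txt two.txt)        # -h drops the file name prefix
inverted=$(grep -v apple one.txt)             # lines that do not match
numbered=$(grep -n apple one.txt)             # line number in front of each match

printf '%s\n' "$count" "$listed" "$with_names" "$bare" "$inverted" "$numbered"

cd / && rm -rf "$dir"
```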
File db
northwest  NW  Charles Main    3.0  .98  3  34
western    WE  Sharon Gray     5.3  .97  5  23
southwest  SW  Lewis Dalsass   2.7  .8   2  18
southern   SO  Suan Chin       5.1  .95  4  15
southeast  SE  Patricia Heme   4.0  .7   4  17
eastern    EA  TB Savage       4.4  .84  5  20
northeast  NE  AM Main Jr.     5.1  .94  3  13
north      NO  Margot Webber   4.5  .89  5  9
central    CT  Ann Stephens    5.7  .94  5  13
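A few sample searches against this file, as a sketch. It assumes the columns are separated by single spaces (the original spacing is not known) and writes the data to a temporary file; the variable names are made up.

```shell
db=$(mktemp)                    # temporary copy of the file db
cat > "$db" <<'EOF'
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Heme 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
EOF

north_count=$(grep -c '^north' "$db")     # lines that start with "north"
exact_north=$(grep -w 'north' "$db")      # -w: "north" as a whole word only
mains=$(grep -c 'Main' "$db")             # lines that contain "Main"
fives=$(grep -c '5\..' "$db")             # a 5, a literal dot, then any character
we_regions=$(grep '^[we]' "$db" | cut -d' ' -f1)   # regions starting with w or e

printf '%s\n' "$north_count" "$exact_north" "$mains" "$fives" "$we_regions"

rm -f "$db"
```

Note the difference between '^north', which selects three lines, and -w 'north', which selects only the line for the region north.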
grep with pipes
Remember, we can use pipes when a file is expected
ls -l | grep '\<Feb.*3\>'
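Because real ls -l output differs from machine to machine, the sketch below feeds grep a made-up listing through a pipe instead. The file names, sizes and dates are invented, and \< and \> are assumed to be supported by the local grep.

```shell
# Hypothetical listing in the style of "ls -l" output.
listing='-rw-r--r-- 1 user staff 120 Feb  3 10:15 notes.txt
-rw-r--r-- 1 user staff 300 Feb 14 09:00 plan.txt
-rw-r--r-- 1 user staff  88 Mar  3 11:00 todo.txt
-rw-r--r-- 1 user staff  45 Feb 23 08:30 log.txt'

# A word starting with "Feb", later followed by a 3 that ends a word.
feb3=$(printf '%s\n' "$listing" | grep '\<Feb.*3\>')
printf '%s\n' "$feb3"
```

This selects the lines for notes.txt (Feb 3) and log.txt (Feb 23). The pattern also accepts days such as 13 and 23, since it only requires a word that ends in 3.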
egrep
Extended grep
allows for more kinds of regular expressions
unfortunately, egrep regular expressions are not a superset of grep regular expressions
• some of grep's regular expressions are not available in egrep
grep vs. egrep
new to egrep
f+    matches one or more occurrences of f
f?    matches zero or one occurrences of f
f|g   matches f or g
(ab)  groups characters a and b together
only in grep
\( … \), \<, \>
Final Note: Different versions of grep/egrep may support different expressions. Make sure to check the man pages.
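A sketch contrasting the two syntaxes. It uses grep -E, the POSIX spelling of egrep; the sample strings are made up. As the final note says, behavior can differ between versions, so treat the results as what a POSIX-style grep is expected to do.

```shell
# In an extended R.E., + means "one or more of the preceding".
plus_ext=$(printf '%s\n' 'a+b' 'aab' | grep -E 'a+b')

# In a basic R.E., + is an ordinary character.
plus_basic=$(printf '%s\n' 'a+b' 'aab' | grep 'a+b')

opt=$(printf '%s\n' color colour colouur | grep -E -x 'colou?r')   # f?
alt=$(printf '%s\n' cat dog bird | grep -E 'cat|dog')              # f|g
grp=$(printf '%s\n' ab abab ba | grep -E -x '(ab)+')               # (ab)

printf '%s\n' "$plus_ext" "$plus_basic" "$opt" "$alt" "$grp"
```

With -E the pattern a+b selects aab; without it, the same pattern selects only the literal text a+b.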
Recommended Reading
Chapter 3
Chapter 4, sections 4.1 – 4.5