CS 497C - Lecture 12

CS 497C – Introduction to UNIX
Lecture 29: Filters Using Regular
Expressions – grep and sed
Chin-Chih Chang
Regular Expressions
• egrep's extended regular expression set
includes two more special characters, + and ?.
They are often used in place of * to restrict
the matching scope.
• + – matches one or more occurrences of the
previous character.
• ? – matches zero or one occurrence of the
previous character.
$ egrep "true?man" emp.lst
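A minimal sketch of how ? and + differ. It uses grep -E, which is equivalent to egrep, and a hypothetical file names.txt, since the contents of emp.lst are not shown in these slides:

```shell
# Hypothetical sample data (not the real emp.lst).
printf '%s\n' 'trueman' 'truman' 'trueeman' 'tryman' > names.txt

# ? : the e before it may appear zero times or once.
optional=$(grep -E 'true?man' names.txt)

# + : the e before it must appear at least once.
repeated=$(grep -E 'true+man' names.txt)

echo "$optional"   # trueman, truman
echo "$repeated"   # trueman, trueeman
```

Neither pattern matches tryman, and only the + pattern accepts the doubled e in trueeman.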
Regular Expressions
• The characters |, ( and ) can be used to
search for multiple patterns: | separates the
alternatives and ( ) groups them.
$ egrep 'wood(house|cock)' emp.lst
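The grouping above can be tried on a small hypothetical file (names.txt and its contents are assumptions made for illustration); grep -E is used as the equivalent of egrep:

```shell
# Hypothetical sample data (not the real emp.lst).
printf '%s\n' 'woodhouse' 'woodcock' 'woodward' 'wood' > names.txt

# "wood" must be followed by either "house" or "cock".
matches=$(grep -E 'wood(house|cock)' names.txt)

echo "$matches"   # woodhouse, woodcock
```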
• sed is a multipurpose tool that combines
the work of several filters.
• Designed by Lee McMahon, it is derived
from the ed line editor.
• sed is used to perform noninteractive
operations.
sed: The Stream Editor
• sed has numerous features, almost
bordering on a programming language, but
many of its functions have been taken over
by perl.
• Everything in sed is an instruction. An
instruction combines an address for
selecting lines with an action to be taken on
them:
sed options 'address action' file(s)
• The address and action are enclosed within
single quotes.
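As a rough sketch of the address-plus-action form, the instruction below applies a substitution only to lines 2 and 3. The file foo.txt and its contents are hypothetical:

```shell
# Hypothetical sample file.
printf '%s\n' 'bold move' 'be bold' 'bold text' > foo.txt

# address: 2,3    action: s/bold/BOLD/
result=$(sed '2,3 s/bold/BOLD/' foo.txt)

echo "$result"   # line 1 is unchanged; lines 2 and 3 show BOLD
```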
sed: The Stream Editor
• The components of a sed instruction are
shown below:
sed '1,$ s/^bold/BOLD/g' foo
address: 1,$    action: s/^bold/BOLD/g
• You can have multiple instructions in a
single sed command, each with its own
address and action components.
• Addressing in sed is done in two ways:
– By line number (like 3,7p).
– By specifying a pattern (like /From:/p).
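One way to supply several instructions is to give each its own -e option. The sketch below pairs a line-number address with a pattern address; the file mail.txt and its contents are hypothetical:

```shell
# Hypothetical sample file.
printf '%s\n' 'From: alice' 'To: bob' 'From: carol' 'Subject: hi' > mail.txt

# First instruction:  line address 2, substitute action.
# Second instruction: pattern address /^From:/, substitute action.
edited=$(sed -e '2 s/^To:/Cc:/' -e '/^From:/ s/From:/Sender:/' mail.txt)

echo "$edited"
```

Each instruction acts only on the lines its own address selects, so the Subject line passes through unchanged.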
Line Addressing
• In the first form, the address specifies either
a single line or a pair of line numbers (3,7)
that selects a group of contiguous lines.
• The second form uses one or two patterns.
• In either case, the action (here p, the print
command) is appended to the address.
• You can simulate head -3 with the 3q
instruction, in which 3 is the address and q is
the quit action.
Line Addressing
$ sed '3q' emp.lst
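A quick check that 3q behaves like head, sketched on a hypothetical five-line file (lines.txt is an assumption; head -n 3 is the current spelling of head -3):

```shell
# Hypothetical sample file with five lines.
printf '%s\n' one two three four five > lines.txt

with_sed=$(sed '3q' lines.txt)      # quit after line 3 has been printed
with_head=$(head -n 3 lines.txt)    # first three lines

echo "$with_sed"   # one, two, three
```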
• The p (print) command prints the addressed
lines.
$ sed '1,2p' emp.lst
• By default, sed prints every line to
standard output, in addition to the lines
printed by the p action. So the addressed
lines are printed twice.
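The doubling can be seen on a hypothetical three-line file (lines.txt is an assumption made for illustration):

```shell
# Hypothetical sample file with three lines.
printf '%s\n' one two three > lines.txt

# Lines 1 and 2 are printed by p and again by sed's default output.
doubled=$(sed '1,2p' lines.txt)

echo "$doubled"   # one, one, two, two, three
```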