Sed : a “Stream EDitor” - University of Washington

An Introduction to Sed & Awk
Presented Tues, Jan 14th, 2003
Send any suggestions to Siobhan Quinn
([email protected])
Sed : a “Stream EDitor”
What is Sed ?


A “non-interactive” text editor that is called from the unix command
line.
Input text flows through the program, is modified, and is directed to
standard output.
An Example:
The following sentence is input to the sed program:
echo "Instrumental in ruining entire operation for a Midwest chain operation." |
sed 's/ruining/running/'
Output:
Instrumental in running entire operation for a Midwest chain operation.
Why Use Sed?


Eliminate the tedium of routine editing tasks! (find, replace, delete,
append, insert)
… but your word processor can already do that right? Wrong.
Sed is extremely powerful AND comes with every Unix system in the
world!
Sed is designed to be especially useful in three cases:
1. To edit files too large for comfortable interactive editing;
2. To edit any size file when the sequence of editing commands is too
complicated to be comfortably typed in interactive mode;
3. To perform multiple 'global' editing functions efficiently in one pass
through the input.
How Sed Works
While (read line){
1 ) Sed reads an input line from STDIN or a given file, one line at a
time, into the pattern space.
Pattern Space = a data buffer - the “current text” as it’s being edited
2) For each line, sed executes a series of editing commands (written by
the user, you) on the pattern space.
3) Writes the pattern space to STDOUT.
}
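Step 3 of the loop can be made visible with a small experiment. This is only a sketch with made-up sample text: the first command relies on the automatic write at the end of each cycle, the second switches it off with -n and prints the pattern space explicitly with p. Both should produce the same output.

```shell
input='line one
line two'

# Default behaviour: the pattern space is written to stdout at the end of each cycle.
auto=$(printf '%s\n' "$input" | sed 's/line/LINE/')

# With -n the automatic write is suppressed, so the p command must print explicitly.
manual=$(printf '%s\n' "$input" | sed -n 's/line/LINE/; p')

printf '%s\n' "$auto"
printf '%s\n' "$manual"
```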
How Sed Works (cont…)
echo "amy enjoys hiking and ben enjoys skiing" |
sed -e 's/skiing/hiking/g; s/hiking/biking/g'
1) Sed read in the line “amy enjoys hiking and ben enjoys skiing” and executed
the first ‘substitute’ command. The resulting line, in the pattern space:
“amy enjoys hiking and ben enjoys hiking”
2) Then the second substitute command is executed on the line in the pattern
space, and the result is :
“amy enjoys biking and ben enjoys biking”
3) And the result is written to standard out.
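Because each command works on whatever the previous command left in the pattern space, the order of the commands matters. The sketch below runs the same two substitutions in both orders; the variable names are only for illustration.

```shell
line='amy enjoys hiking and ben enjoys skiing'

# Order used above: skiing -> hiking first, then every hiking -> biking.
forward=$(echo "$line" | sed -e 's/skiing/hiking/g; s/hiking/biking/g')

# Reversed order: hiking -> biking first, then skiing -> hiking.
reversed=$(echo "$line" | sed -e 's/hiking/biking/g; s/skiing/hiking/g')

echo "$forward"    # amy enjoys biking and ben enjoys biking
echo "$reversed"   # amy enjoys biking and ben enjoys hiking
```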
Invoking Sed Commands
$ sed [-e script] [-f script-file] [-n] [files...]
-e
an "in-line" script, i.e. a script for sed to execute that is given on the command line. Multiple command line
scripts can be given, each with its own -e option.
-n
by default, sed writes each line to stdout when it reaches the end of the script (whatever is then in the pattern
space). This option prevents that, i.e. there is no output unless a command specifically orders sed to print.
-f
read the script from the specified file. Several -f options can appear.
files
the files to read. If a "-" appears, sed reads from stdin; if no files are given, it also reads from stdin.
Different Ways to Invoke Sed:
sed -e 'command;command;command' input_file
see the results on the screen
sed -e 'command;command;command' input_file > output_file
save the results in a file
.... | sed -e 'command;command;command' | ....
use sed in a pipeline
sed -f sedcommands input_file > output_file
the commands are kept in a separate file (here, sedcommands)
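As a rough illustration of the -f form, the sketch below writes two commands into a script file and checks that the result matches the same commands given with -e. The temporary directory and the sample text are assumptions made for the example.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'first draft' 'second draft' > "$workdir/input_file"

# The script file holds one sed command per line.
printf '%s\n' 's/draft/version/' 's/first/1st/' > "$workdir/sedcommands"

sed -f "$workdir/sedcommands" "$workdir/input_file" > "$workdir/output_file"
from_file=$(cat "$workdir/output_file")

# The same two commands given in-line, each with its own -e.
inline=$(sed -e 's/draft/version/' -e 's/first/1st/' "$workdir/input_file")

echo "$from_file"
rm -r "$workdir"
```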
Invoking Sed (some notes)
1. sed commands are usually written on one line.
2. If a command needs more than one line (a multi-line command), the first line must end with a `\'.
3. Commands that fit on one line can share a line if they are separated by a `;'.
4. The remaining lines of a multi-line command (all except the first) must each be on a line by
themselves.
5. On the command line, whatever follows a `-e' is treated like a whole line in a sed script.
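The notes above can be tried out directly. This sketch gives the same two one-line commands in three ways: separated by a semicolon, as two -e options, and on two lines inside one quoted script. The sample text is invented.

```shell
text='one two'

# Two one-line commands separated by a semicolon.
with_semicolon=$(echo "$text" | sed -e 's/one/1/;s/two/2/')

# Each -e argument behaves like one line of a script.
with_two_e=$(echo "$text" | sed -e 's/one/1/' -e 's/two/2/')

# A newline inside the quoted script separates commands just as in a script file.
with_newline=$(echo "$text" | sed -e 's/one/1/
s/two/2/')

echo "$with_semicolon"   # 1 2
```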
Regular Expressions
Sed uses regular expressions to match patterns in the input text, and then
perform operations on those patterns.
^                        Matches the beginning of the line
$                        Matches the end of the line
.                        Matches any single character
\                        Escapes any metacharacter that follows, including itself
(character)*             Matches zero or more occurrences of (character)
(character)?             Matches 0 or 1 instance of (character)
(character)+             Matches 1 or more instances of (character)
[abcdef]                 Matches any character enclosed in [ ] (in this instance, a b c d e or f)
[^abcdef]                Matches any character NOT enclosed in [ ]
(character)\{m,n\}       Matches m to n repetitions of (character)
(character)\{m,\}        Matches m or more repetitions of (character)
(character)\{,n\}        Matches n or fewer (possibly 0) repetitions of (character)
(character)\{n\}         Matches exactly n repetitions of (character)
\{n,m\}                  Range of occurrences; n and m are integers
\(expression\)           Group operator
expression1|expression2  Matches expression1 or expression2
()                       Groups regular expressions
Note: ?, +, | and unescaped ( ) belong to extended regular expressions. In sed's default (basic)
syntax they are ordinary characters; GNU sed accepts the escaped forms \?, \+ and \|, and many
versions of sed have an option (-E or -r) that switches to the extended syntax.
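A few of these operators in action, as a small sketch that sticks to sed's basic syntax. The sample strings are made up.

```shell
# \{3\}: the first run of exactly three digits is replaced.
exact=$(echo 'room 12 and room 345' | sed 's/[0-9]\{3\}/XXX/')

# \( \) groups "ab" so that * applies to the whole group.
grouped=$(echo 'ababab!' | sed 's/\(ab\)*/X/')

# ^ and $ anchor the match to the whole line, so "used" is not printed.
anchored=$(printf '%s\n' 'sed' 'used' | sed -n '/^sed$/p')

# [^ ] matches any character except a space.
negated=$(echo 'a b' | sed 's/[^ ]/x/g')

echo "$exact"      # room 12 and room XXX
echo "$grouped"    # X!
echo "$anchored"   # sed
echo "$negated"    # x x
```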
Regular Expressions (character classes)
The following character classes are short-hand for matching special characters.
[:alnum:]   Alphanumeric characters
[:alpha:]   Alphabetic characters
[:blank:]   Space and tab characters
[:cntrl:]   Control characters
[:digit:]   Numeric characters
[:graph:]   Printable and visible (non-space) characters
[:lower:]   Lowercase characters
[:print:]   Printable characters (includes white space)
[:punct:]   Punctuation characters
[:space:]   Whitespace characters
[:upper:]   Uppercase characters
[:xdigit:]  Hexadecimal digits
A character class is written inside a bracket expression, for example [[:digit:]].
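The following sketch shows three of the classes inside bracket expressions. The sample strings are invented for the example.

```shell
# Every digit becomes a # sign.
digits_masked=$(echo 'Call 555-1234 now' | sed 's/[[:digit:]]/#/g')

# Every uppercase letter is removed.
upper_removed=$(echo 'Hello World' | sed 's/[[:upper:]]//g')

# A run of spaces and tabs is squeezed into one space.
squeezed=$(printf 'a \t b\n' | sed 's/[[:blank:]][[:blank:]]*/ /g')

echo "$digits_masked"   # Call ###-#### now
echo "$upper_removed"   # ello orld
echo "$squeezed"        # a b
```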
Regular Expressions (cont…)
/^M.*/
Line begins with capital M, 0 or more chars follow
/..*/
At least 1 character long (with extended regular expressions, /.+/ means the same thing)
/^$/
The empty line
ab|cd
Either ‘ab’ or ‘cd’
a(b*|c*)d
Matches any string beginning with the letter a, followed by either zero or
more of the letter b or zero or more of the letter c, followed by
the letter d.
[[:space:][:alnum:]]
Matches any character that is either a white space character or
alphanumeric.
Note:
Sed always tries to find the longest matching pattern in the
input. How would you match a tag in an HTML document?
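One common answer to the question, offered here as a sketch rather than the only solution: because .* grabs the longest possible match, a tag is usually matched with a negated character class that cannot run past the closing >. The sample HTML line is made up.

```shell
html='<b>bold</b> and <i>italic</i>'

# Greedy: <.*> stretches from the first < to the last > on the line.
greedy=$(echo "$html" | sed 's/<.*>//g')

# [^>]* cannot cross a >, so each tag is matched on its own.
per_tag=$(echo "$html" | sed 's/<[^>]*>//g')

echo "[$greedy]"    # []
echo "[$per_tag]"   # [bold and italic]
```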
Line Addresses
Each line read is counted, and one can use this information to absolutely select which
lines commands should be applied to.
1        first line
2        second line
...
$        last line
i,j      from the i-th to the j-th line, inclusive; j can be $
Examples:
sed '53!d'
deletes every line except line 53, so only line 53 is printed
sed -n '4,9p'
prints only lines 4 through 9
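Line addresses are easy to check on a short numbered input. The sketch below uses ten lines containing the numbers 1 to 10, so smaller line numbers than in the examples above are used.

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# Delete every line that is NOT line 3.
only_third=$(echo "$numbers" | sed '3!d')

# Print only lines 4 through 6.
four_to_six=$(echo "$numbers" | sed -n '4,6p')

# $ addresses the last line.
last=$(echo "$numbers" | sed -n '$p')

# A range may end at $.
from_eight=$(echo "$numbers" | sed -n '8,$p')

echo "$only_third"   # 3
echo "$last"         # 10
```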
Context Addresses
The second kind of addresses are context, or Regular Expression, addresses. Commands will be
executed on all pattern spaces matched by that RE.
Examples:
sed '/^$/d'
will delete all empty lines
sed '/./,$!d'
will delete all leading blank lines at the top of the file
Some Rules:
commands may take 0, 1 or 2 addresses
if no address is given, the command is applied to every pattern space
if 1 address is given, the command is applied to every pattern space that matches that address
if 2 addresses are given, the command is applied to every pattern space from the one that
matches the first address through the next one that matches the second address
Since a pattern space normally holds a single line, this can be put more simply: if 2 addresses are
given, the command is executed on all lines from the first address through the second (inclusive).
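The rules can be checked on a small made-up input with two leading blank lines and a block marked by BEGIN and END (both markers are assumptions for the example).

```shell
doc=$(printf '%s\n' '' '' 'title' '' 'BEGIN' 'body' 'END' 'tail')

# One context address: delete every empty line.
no_blank=$(printf '%s\n' "$doc" | sed '/^$/d')

# Range from the first non-empty line to the end, negated: only leading blank lines go.
no_leading=$(printf '%s\n' "$doc" | sed '/./,$!d')

# Two context addresses: print from the BEGIN line through the next END line.
block=$(printf '%s\n' "$doc" | sed -n '/BEGIN/,/END/p')

printf '%s\n' "$block"
```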
Sed Commands
We will go over only some basic sed commands.
a    append
c    change lines
d    delete lines
i    insert
p    print lines
s    substitute
Sed Commands (cont… )
APPEND
[address]a\
text
Append text following each line matched by address.
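In hello_world.script there is:
/<body/i\
Hello World
From the command prompt type:
$ cat <an html file> | sed -f hello_world.script
(Note that this script uses i\, the insert command, which has the same form as a\. With i\ the
text is placed before each matching line; with a\ it would follow the line.)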
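To see the difference between a\ and i\, the sketch below runs both on the same four-line input. The script is written to a temporary file, and the address /<body>/ and the sample HTML are assumptions for the example.

```shell
script=$(mktemp)
page=$(printf '%s\n' '<html>' '<body>' '</body>' '</html>')

# a\ : the text is output after each matching line.
printf '%s\n' '/<body>/a\' 'Hello World' > "$script"
appended=$(printf '%s\n' "$page" | sed -f "$script")

# i\ : the text is output before each matching line.
printf '%s\n' '/<body>/i\' 'Hello World' > "$script"
inserted=$(printf '%s\n' "$page" | sed -f "$script")

printf '%s\n' "$appended"
rm "$script"
```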
CHANGE
[address1[,address2]]c\
text
Replace (change) the lines selected by the address with text. The append and insert commands can be
applied only to a single line address, not a range of lines. The change command, however, can address
a range of lines. In this case, it replaces all addressed lines with a single copy of the text. In other words,
it deletes each line in the range but the supplied text is output only once.
In change_mail.script there is:
/^From /,/^$/c\
<Mail Header Removed>
From the command prompt type:
$ cat address_file | sed -f change_mail.script > new_mail.html
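A small run of the same change command on an invented three-line mail header. The message text and the address in it are made up; the point is that the whole range is replaced by a single copy of the text.

```shell
script=$(mktemp)
printf '%s\n' '/^From /,/^$/c\' '<Mail Header Removed>' > "$script"

mail=$(printf '%s\n' 'From amy@example.com Tue Jan 14' 'Subject: hiking' '' 'See you on the trail.')

# The range runs from the "From " line through the first empty line.
changed=$(printf '%s\n' "$mail" | sed -f "$script")

printf '%s\n' "$changed"
rm "$script"
```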
Sed Commands (cont…)
DELETE
[address1[,address2]]d
Delete line(s) from pattern space. Thus, the line is not passed to standard output. A new line of input is
read and editing resumes with first command in script.
What do these do ?
cat homework.html | sed -e '/[Hh]omework/d' >newhomework.html
cat homework.html | sed -e '1,20d' > newhomework2.html
INSERT
[address1]i\
text
Insert text before each line matched by address. (See append example)
PRINT
[address1[,address2]]p
Print the addressed line(s). Note that this can result in duplicate output unless default output is
suppressed by using "#n" or the -n command-line option.
sed -n '/regexp/,$p'
prints from the first line that matches regexp to the end of the file.
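The delete and print commands can be tried on a few invented lines. The sketch uses a shorter range (1,2) than the homework example so the effect is visible on four lines.

```shell
page=$(printf '%s\n' 'Homework 1 is due' 'Read chapter 2' 'homework 2 is optional' 'Office hours at 3')

# Delete every line that mentions Homework or homework.
without_homework=$(printf '%s\n' "$page" | sed -e '/[Hh]omework/d')

# Delete the first two lines.
without_first_two=$(printf '%s\n' "$page" | sed -e '1,2d')

# Without -n, p prints the matching line in addition to the automatic output.
doubled=$(printf '%s\n' "$page" | sed '/chapter/p')

# With -n, only the explicitly printed range appears.
from_match=$(printf '%s\n' "$page" | sed -n '/chapter/,$p')

printf '%s\n' "$doubled"
```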
Sed Commands (cont…)
SUBSTITUTE
[address1[,address2]]s/pattern/replacement/[flags]
Substitute replacement for pattern on each addressed line. If pattern addresses are used,
the pattern // represents the last pattern address specified. The following flags can be
specified:
n
Replace the nth instance of /pattern/ on each addressed line. n is any number in the range
1 to 512, and the default is 1.
g
Replace all instances of /pattern/ on each addressed line, not just the first
instance.
p
Print the line if a successful substitution is done. If several successful substitutions are
done, multiple copies of the line will be printed.
Substitute bad grades for better ones
Input file (old_transcript):
CLASS        GRADE
CSE 321      2.4
CSE 341      3.0
Speech Com   4.0
CSE 378      1.4
sed 's/1\..\|2\..\|3\../4.0/g' old_transcript > new_transcript
(\| for alternation is an extension that GNU sed accepts in basic regular expressions.)
Delete ONLY the regular expression and not the entire line
sed 's/ <a bad word>//g' somefile > censured_file
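The flags can be compared side by side on one invented line. The last command redoes the grade substitution with a bracket expression, [1-3]\.[0-9], which is an alternative spelling chosen for this sketch because it does not depend on alternation.

```shell
line='one fish, two fish, red fish'

first=$(echo "$line" | sed 's/fish/cat/')     # no flag: first instance only
second=$(echo "$line" | sed 's/fish/cat/2')   # numeric flag: second instance only
all=$(echo "$line" | sed 's/fish/cat/g')      # g: every instance

# -n plus the p flag: only lines where a substitution happened are printed.
printed=$(printf '%s\n' "$line" 'no match here' | sed -n 's/fish/cat/p')

# Grades from 1.x to 3.x become 4.0; the course numbers are left alone.
grades=$(printf '%s\n' 'CSE 321 2.4' 'CSE 341 3.0' 'Speech Com 4.0' 'CSE 378 1.4' |
  sed 's/[1-3]\.[0-9]/4.0/g')

echo "$second"   # one fish, two cat, red fish
printf '%s\n' "$grades"
```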
Shortcomings of Sed
The 'sed' program takes the 'infile' A LINE AT A TIME and applies to that ONE LINE ALL of
the commands in 'cmdfile', in the order in which they are written. If the problems with the file
depend on what is on other lines (as in LaTeX, which creates environments), 'sed' will not
be a quick fix. For example, you cannot simply take every 'times' (meaning mathematical
multiplication) in a troff file and substitute '\times', because a line-at-a-time substitution
does not know whether or not you are inside the equation environment. If you try, you will also
get '\times' in the ordinary text (ditto with 'sum', 'over', etc.).
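The problem is easy to reproduce. In the sketch below the input is an invented troff fragment with one equation between .EQ and .EN lines. The blind substitution also changes the ordinary sentence. The second command is only a partial workaround under the assumption that every equation is delimited by lines of its own, so that a range address can limit the substitution; equations written in-line with the text would still be missed.

```shell
troff=$(printf '%s\n' 'Three times we measured it.' '.EQ' 'a times b' '.EN')

# Blind substitution: the word in the ordinary sentence is changed as well.
blind=$(printf '%s\n' "$troff" | sed 's/times/\\times/g')

# Restricted to the lines from .EQ through .EN by a range address.
scoped=$(printf '%s\n' "$troff" | sed '/^\.EQ/,/^\.EN/s/times/\\times/g')

printf '%s\n' "$blind"
printf '%s\n' "$scoped"
```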