Lecture 6: Regular Expressions
Lecture 7: grep

Why Regular Expressions?
  Regular expressions are used to describe text patterns/filters.
  Unix commands/utilities that support regular expressions:
    grep (fgrep, egrep) - search a file for a string or regular expression
    sed - stream editor
    awk (nawk) - pattern scanning and processing language
  There are some minor differences between the regular expressions supported by these programs.
  We will cover the general matching operators first.

Character Class
  []         matches any one of the enclosed characters
  [abc]      matches a single a, b, or c
  [a-z]      matches any single lowercase letter from a through z
  [^A-Za-z]  matches a single character as long as it is not a letter
  Example: [Dd][Aa][Vv][Ee]
    Matches "Dave" or "dave" or "dAVE"
    Does not match "ave" or "da"

Regular Expression Operators
  Any character (except a metacharacter!) matches itself.
  .   Matches any single character except newline
  *   Matches 0 or more instances of the immediately preceding R.E.
  ?   Matches 0 or 1 instances of the immediately preceding R.E.
  +   Matches 1 or more instances of the immediately preceding R.E.
  ^   Anchors the following R.E. to the beginning of the line
  $   Anchors the preceding R.E. to the end of the line
  |   Matches the R.E. specified before or after this symbol
  \   Turns off the special meaning of the following character

Examples of R.E.
  x[abc]?x    matches "xax" or "xx"
  [abc]*      matches "aaaaa" or "acbca"
  0*10        matches "010" or "0000010" or "10"
  ^(dog)$     matches a line that consists of exactly "dog"
  [\t ]*
  (A|a)+b*c?

Grouping with parens
  If you put a subpattern inside parens, you can apply +, * and ? to the entire subpattern.
  a(bc)*d     matches "ad" and "abcbcd"
              does not match "abcxd" or "bcbcd"

Example
  Text:
    1. Christian Scott lives here and will put on a Christmas party
    2. There are around 30 to 35 people invited.
    3. They are:
    4. Tom
    5. Dan
    6. Rhonda Savage
    7. Nicky and Kimberly.
    8. Steve, Suzanne, Ginger and Larry
  Patterns:
    ^[A-Z]..$
    ^[A-Z][a-z]*3[0-5]
    ^ *[A-Z][a-z][a-z]$
    ^[A-Z][a-z]*[^,][A-Za-z]*$
    [a-z]*\.
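The operator and grouping examples above can be checked with grep itself. The following is a minimal sketch, not part of the lecture: it assumes a POSIX shell and a grep that accepts -E (extended regular expressions, the same syntax egrep uses), -x (the whole line must match) and -q (quiet). The helper names matches and check are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the WHOLE string $2 matches the
# extended regular expression $1 (-x anchors the match to the full line).
matches() {
    printf '%s\n' "$2" | grep -E -x -q -e "$1"
}

# Hypothetical helper: report one (pattern, string) pair.
check() {
    if matches "$1" "$2"; then
        printf '%-18s %-10s match\n' "$1" "$2"
    else
        printf '%-18s %-10s no match\n' "$1" "$2"
    fi
}

# ? makes the character class optional
check 'x[abc]?x' 'xax'
check 'x[abc]?x' 'xx'

# * allows zero or more leading zeros
check '0*10' '0000010'
check '0*10' '10'

# * applied to a parenthesized group
check 'a(bc)*d' 'ad'
check 'a(bc)*d' 'abcbcd'
check 'a(bc)*d' 'abcxd'    # no match: x breaks the run of "bc"
check 'a(bc)*d' 'bcbcd'    # no match: the leading a is missing

# character classes, one per letter
check '[Dd][Aa][Vv][Ee]' 'dAVE'
check '[Dd][Aa][Vv][Ee]' 'ave'   # no match: only three characters
```

Without -x, grep reports a line as soon as any part of it matches the pattern, so whole-string checks like these need either -x or explicit ^ and $ anchors.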
Review: Metacharacters for filename abbreviation
  *          Matches anything: ls Test*.doc
  ?          Matches any single character: ls Test?.doc
  [abc...]   Matches any of the enclosed characters: ls T[eE][sS][tT].doc
  [a-z]      Matches any character in a range: ls [a-zA-Z]*
  [!abc...]  Matches any character except those listed: ls [!0-9]*

Difference!!
  Although there are similarities to the metacharacters used in filename expansion, we are talking about something different!
  Filename expansion is done by the shell.
  Regular expressions are used by commands (programs).
  Because of this overlap, be careful when specifying an R.E. on the command line.
  It is a good idea to always quote an R.E. that contains special characters ('' or "") on the command line.
  Example: % grep '[a-z]*' chap[12]*
  Note: the unquoted filename mask chap[12]* is expanded by the shell; the quoted R.E. is passed to grep unchanged.

grep - search for a string
  grep [-bchilnsvw] PATTERN [filename...]
  Reads files or standard/redirected input
  Searches for the specified pattern in each line
  Sends results to the standard output
  Examples:
    % grep '^X11' *       - search all files for lines starting with the string "X11"
    % grep -v text file   - print lines of file that do not match "text"

Regular expressions for grep
  c        any non-special character c matches itself
  \c       turn off any special meaning of character c
  ^        beginning of line
  $        end of line
  .        any single character
  [...]    any one of the characters in ...
  [^...]   any single character not in ...
  r*       zero or more occurrences of r

Regular Expressions for grep
  \<       beginning of word anchor
           \<abc matches "abcd" but not "dabc"
  \>       end of word anchor
           abc\> matches "dabc" but not "abcd"
  \(...\)  stores the pattern ...
           \(abc\)def matches "abcdef" and stores abc in \1.
           So \(abc\)def\1 matches "abcdefabc".
           Up to 9 patterns can be stored (\1 through \9).

grep - options
  Some useful options:
  -c   count the number of matching lines
  -h   do not display the filename
  -l   list only the files with matching lines
  -v   display lines that do not match
  -n   print line numbers

File db
  northwest  NW  Charles Main   3.0  .98  3  34
  western    WE  Sharon Gray    5.3  .97  5  23
  southwest  SW  Lewis Dalsass  2.7  .8   2  18
  southern   SO  Suan Chin      5.1  .95  4  15
  southeast  SE  Patricia Heme  4.0  .7   4  17
  eastern    EA  TB Savage      4.4  .84  5  20
  northeast  NE  AM Main Jr.    5.1  .94  3  13
  north      NO  Margot Webber  4.5  .89  5  9
  central    CT  Ann Stephens   5.7  .94  5  13

grep with pipes
  Remember, we can use pipes when a file is expected:
  ls -l | grep '\<Feb.*3\>'

egrep
  Extended grep allows for more kinds of regular expressions.
  Unfortunately, egrep regular expressions are not a superset of grep regular expressions:
    some of grep's regular expressions are not available in egrep.

grep vs. egrep
  New to egrep:
    f+     matches one or more occurrences of f
    f?     matches zero or one occurrences of f
    f|g    matches f or g
    (ab)   groups characters a and b together
  Only in grep:
    \( ... \), \<, \>

Final Note:
  Different versions of grep/egrep may support different expressions.
  Make sure to check the man pages.

Recommended Reading
  Chapter 3
  Chapter 4, sections 4.1 - 4.5
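To tie the options slide to the db file, here is a small sketch that is not part of the lecture. It assumes a POSIX shell; the function name db is hypothetical and simply prints the table through a pipe, so no file has to exist on disk. The \< and \> word anchors in the last command are assumed to be supported by the installed grep (they are in GNU grep), which is why that command carries no expected output.

```shell
#!/bin/sh
# Hypothetical helper: print the sample db table from the lecture,
# with columns separated by single spaces.
db() {
    cat <<'EOF'
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Heme 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
EOF
}

# -c: count the lines that start with "north"   (prints 3)
db | grep -c '^north'

# -v with -c: count the lines that do NOT start with "north"   (prints 6)
db | grep -v -c '^north'

# -n: prefix each matching line with its line number (lines 1 and 7 here)
db | grep -n 'Main'

# \< and \>: match "north" only as a whole word, so "northwest" and
# "northeast" are skipped. Word anchors are an extension; check the man page.
db | grep '\<north\>'
```

The pattern is quoted in every command so that the shell passes it to grep unchanged, as recommended in the "Difference!!" slide.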