Chapter Five Advanced File Processing

Lesson A
Selecting, Manipulating, and
Formatting Information
2
Objectives
• Use the pipe operator to redirect the
output of one command to another
command
• Use the grep command to search for
a specified pattern in a file
• Use the uniq command to remove
duplicate lines from a file
3
Objectives
• Use the comm and diff commands to
compare two files
• Use the wc command to count words,
characters and lines in a file
• Use the manipulation and formatting
commands: sed, tr, and pr
4
Advancing Your
File Processing Skills – Selection
Commands
• The select commands extract data from files
5
Advancing Your
File Processing Skills – Manipulation and
Transformation Commands
• The manipulation and transformation commands alter
and transform data into useful and appealing formats
6
Using the Select Commands
Using Pipes – The pipe operator (|) redirects the output of one
command to the input of another command. On most US keyboards, the
pipe character is typed with <Shift> \, the key immediately above the
right <Enter> key (it looks like a vertical bar: | ).
– An example would be to redirect the output of the ls command to
the more command
• ls | more
– The pipe operator can connect several commands on the same
command line
• first_command | second_command | third_command
• The output of the first command goes into the second command as
input, and the output of the second command goes into the third
command as input.
• For example: cat products | cut -f2 -d: | sort
– Will cat the products file, cut out the description field, and sort by the
description field.
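The three-stage pipeline can be tried on a small sample file. This is a minimal sketch: the products file and its colon-delimited layout (number, description, price) are invented here for illustration.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical products file: product number, description, price.
printf '%s\n' '102:Widget:4.50' '101:Gadget:7.25' '103:Bracket:1.10' > products

# cat feeds the file to cut, cut keeps field 2, sort orders the result.
sorted=$(cat products | cut -f2 -d: | sort)
echo "$sorted"
```

Each command reads the previous command's output as its input, so no temporary files are needed between the stages.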
7
Using Pipes
Using pipe operators and connecting commands is useful when
viewing directory information
8
Using the grep Command
• Used to search one or more files for a specific
pattern, such as a word or phrase
• grep’s options and wildcard support allow for
powerful search operations
• You can increase grep’s usefulness by
combining it with other commands, such as head
or tail
9
Using the grep Command
grep can take input from other commands and also be directed to
provide input for other commands
10
grep
• grep can be extremely useful for finding files
based on text within the file.
– Options:
• -i ignores case
• -l lists only the names of files that contain a match
• -c counts the number of matching lines
• -r searches through all subdirectories beneath the
current directory
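The options above can be combined on one command line. The sketch below uses a hypothetical notes directory with invented log files; only the grep options themselves come from the slide.

```shell
# Build a small, made-up directory tree to search.
cd "$(mktemp -d)"
mkdir -p notes/archive
printf '%s\n' 'Error: disk full' 'all good' 'error: retry' > notes/today.log
printf '%s\n' 'nothing here' > notes/archive/old.log

# -i ignores case; -c counts matching lines instead of printing them.
count=$(grep -i -c 'error' notes/today.log)
echo "$count"

# -r searches subdirectories; -l lists only the names of matching files.
files=$(grep -r -l 'good' notes)
echo "$files"

# grep can also read piped input and feed another command such as head.
first=$(cat notes/today.log | grep -i 'error' | head -1)
echo "$first"
```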
11
grep
• grep can be used to search for text within
a command's output.
– For example, the ps (process status)
command displays all of the processes
currently running on the system. To see only
the processes running for a particular user,
type in:
• ps aux | grep username
12
Using the uniq Command
• Removes duplicate lines from a file
• It compares only consecutive lines, therefore
uniq requires sorted input
• uniq has an option (-d) that generates
output containing one copy of each line that has
a duplicate
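Because uniq only compares neighboring lines, it is usually run after sort. A minimal sketch, using an invented file named fruit:

```shell
cd "$(mktemp -d)"
printf '%s\n' pear apple pear fig apple pear > fruit

# Sort first so that identical lines become adjacent, then drop repeats.
distinct=$(sort fruit | uniq)
echo "$distinct"

# -d prints one copy of each line that occurs more than once.
repeated=$(sort fruit | uniq -d)
echo "$repeated"
```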
13
Using the comm Command
• Used to identify duplicate lines in sorted files
• Unlike uniq, it does not remove duplicates, and it
works with two files rather than one
• It compares lines common to file1 and file2, and
produces three column output
– Column one contains lines found only in file1
– Column two contains lines found only in file2
– Column three contains lines found in both files
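The three columns can be viewed together or selected individually by suppressing the others. The sketch below assumes two small sorted files whose contents are invented for illustration.

```shell
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2

# Full three-column output (columns are separated by tabs).
comm file1 file2

# -1, -2, and -3 suppress the corresponding column.
only1=$(comm -23 file1 file2)   # lines found only in file1
only2=$(comm -13 file1 file2)   # lines found only in file2
both=$(comm -12 file1 file2)    # lines found in both files
echo "only in file1: $only1"
echo "only in file2: $only2"
echo "in both: $both"
```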
14
Using the diff Command
• Attempts to determine the minimal
changes needed to convert file1 to file2
• The output displays the line(s) that differ
• The associated codes in the output
indicate that in order for the files to match,
specific lines must be added or deleted
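A short run shows what the codes look like. In this sketch the two files are invented; d marks a line to delete from file1 and a marks a line to add.

```shell
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry > file1
printf '%s\n' apple cherry date > file2

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script that checks exit codes from stopping here.
diff_out=$(diff file1 file2 || true)
echo "$diff_out"
# 2d1       line 2 of file1 must be deleted
# < banana
# 3a3       after line 3 of file1, add line 3 of file2
# > date
```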
15
Using the wc Command
• Used to count the number of lines, words, and
bytes or characters in text files
• You may specify all three options in one
issuance of the command
• If you don’t specify any options, you see counts
of lines, words, and characters (in that order)
• wc -c filename displays byte count in file
• wc -l filename displays line count in file
• wc -w filename displays word count in file
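The counts are easy to check on a file small enough to count by hand. A minimal sketch with an invented two-line file named sample:

```shell
cd "$(mktemp -d)"
printf 'one two\nthree\n' > sample

# Reading from standard input keeps the file name out of the output.
lines=$(wc -l < sample)
words=$(wc -w < sample)
bytes=$(wc -c < sample)
echo "lines=$lines words=$words bytes=$bytes"

# With no options, wc prints lines, words, and bytes, then the file name.
wc sample
```

The exact spacing of wc output varies between systems, so scripts that compare the numbers should treat them as integers rather than strings.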
16
Using the wc Command
The options for the wc command:
-l for lines
-w for words
-c for bytes (-m for characters)
17
Using the Manipulate and
Format Commands
• These commands are: sed, tr, pr
• Used to edit and transform the
appearance of data before it is
displayed or printed
18
Introducing sed
• sed is a UNIX editor that allows you to make
global changes to large files. It is a stream editor
that works line by line, which means that you specify
changes by line number or pattern and do not use the
arrow keys to move around within the file.
• Minimum requirements are an input file and a
command that lets sed know what actions to
apply to the file
• sed commands have two general forms
– Specify an editing command on the command line
– Specify a script file containing sed commands
19
Introducing sed
The many options of sed allow you to create new files
containing the data you specify
20
sed options
• Options and commands
– -n suppresses automatic printing, so only lines printed with p appear
• sed -n 3,4p file1 (p must be used after the range to
print the specified lines)
– d deletes the addressed lines (no hyphen)
– s substitutes specified text (no hyphen)
– -e specifies multiple commands per line
– a\ appends text (no hyphen)
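The options and commands listed above can be tried on a five-line file. This is a sketch only; file1 and its contents are invented for illustration.

```shell
cd "$(mktemp -d)"
printf '%s\n' one two three four five > file1

# -n turns off automatic printing; p prints only lines 3 through 4.
printed=$(sed -n 3,4p file1)
echo "$printed"

# d deletes the addressed line; every other line is printed as usual.
deleted=$(sed 2d file1)
echo "$deleted"

# s substitutes text; -e lets several commands share one command line.
multi=$(sed -e 's/one/1/' -e 's/two/2/' file1 | head -2)
echo "$multi"
```

sed writes its result to standard output and leaves file1 unchanged, so the output can be redirected to create a new file.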
21
sed examples
• sed [-options] [command] [file(s)]
– Can specify a command to be performed on
one or more files
• sed [-options] [-f scriptfile] [file(s)]
– Can use a scriptfile to specify a set of
commands to be performed on one or more
files
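The second form keeps the editing commands in a file. In the sketch below, the script name fixes.sed and the input file report.txt are hypothetical.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'draft report' 'scratch line' 'draft summary' > report.txt

# The script file holds one sed command per line.
cat > fixes.sed <<'EOF'
s/draft/final/
2d
EOF

# -f tells sed to read its commands from the script file.
edited=$(sed -f fixes.sed report.txt)
echo "$edited"
```

A script file is convenient when the same set of edits has to be applied to several files or reused later.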
22
Translating Characters
Using the tr command
• tr copies data from the standard input to the standard
output, substituting or deleting characters specified by
options and patterns
• Options
– -d deletes characters
– -s squeezes each run of a repeated character into one occurrence
• The patterns are strings and the strings are sets of
characters
• A popular use of tr is converting lowercase characters to
uppercase
– tr 'a-z' 'A-Z' < product1
Converts all lowercase characters in the file
to uppercase
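Because tr reads only standard input, it is normally fed by a pipe or by input redirection. A minimal sketch with an invented line of text:

```shell
text='hello   world'

# Translate every lowercase letter to its uppercase counterpart.
upper=$(printf '%s\n' "$text" | tr 'a-z' 'A-Z')
echo "$upper"

# -d deletes every occurrence of the listed characters.
no_l=$(printf '%s\n' "$text" | tr -d 'l')
echo "$no_l"

# -s squeezes each run of spaces down to a single space.
squeezed=$(printf '%s\n' "$text" | tr -s ' ')
echo "$squeezed"
```

Quoting the character sets keeps the shell from treating brackets or ranges as file name patterns.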
23
Using the pr Command to
Format Your Output
• pr prints specified files on the standard
output in paginated form
• By default, pr formats the specified files
into single-column pages of 66 lines
• Each page has a five-line header containing the file name, its latest
modification date, and the current page number, and a five-line
trailer consisting of blank lines
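A short page makes the header easier to see. In this sketch the products file and the title text are invented; -h and -l are standard pr options.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Gadget' 'Widget' > products

# -h replaces the file name in the header with a custom title;
# -l 20 shortens the page length so the whole page is easy to inspect.
page=$(pr -h 'Product List' -l 20 products)
echo "$page"
```

The header line also carries the date and page number, so its exact text changes from run to run.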
26
Lesson B
Using UNIX File-Processing Tools
to Create an Application
27
Objectives
• Design a new file-processing
application
• Design and create files to implement
the application
• Use awk to generate formatted output
28
Objectives
• Use cut, sort, and join to organize and
transform selected file information
• Develop customized shell scripts to extract
and combine file data
• Test individual shell scripts and combine
all scripts into a final shell program
29
Designing a New File-Processing Application
• The most important phase in developing a
new application is the design
• The design defines the information an
application needs to produce
• The design also defines how to organize
this information into files, records, and
fields, which are called logical structures
30
Designing Records
• The first task is to define the fields in the
records and produce a record layout
• A record layout identifies each field by
name and data type (numeric or
nonnumeric)
• Design the file record to store only those
fields relevant to the record’s primary
purpose
31
Linking Files with Keys
• Multiple files are joined by a key – a common
field that each of the linked files shares
• Another important task in the design phase is to
plan a way to join the files
• The flexibility to gather information from multiple
files comprised of simple, short records is the
essence of a relational database system. UNIX
provides several commands that offer this
flexibility
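One such command is join, which links two files on a shared key field. The sketch below assumes two hypothetical colon-delimited files, projects and programmers, that both begin with a project number; the layouts and names are invented for illustration.

```shell
cd "$(mktemp -d)"

# projects: project number, project name.
printf '%s\n' '1:Payroll' '2:Inventory' > projects
# programmers: project number (the key), programmer name.
printf '%s\n' '1:Alice' '1:Bob' '2:Carol' > programmers

# join matches records whose first field is equal in both files.
# Both files must already be sorted on that field; -t sets the delimiter.
joined=$(join -t: projects programmers)
echo "$joined"
```

Each output record starts with the key, followed by the remaining fields from the first file and then those from the second.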
33
Creating the Programmer
and Project Files
• With the basic design complete, you now
implement your application design
• UNIX file processing predominantly uses
flat files. Working with these files is easy,
because you can create and manipulate
them with text editors like vi and Emacs
35
Formatting Output
• The awk command is used to prepare
formatted output
• For the purposes of developing a new file-processing
application, we will focus primarily on the printf action
of the awk command
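The printf action lines fields up in fixed-width columns. A minimal sketch, assuming a hypothetical colon-delimited products file (number, description, price):

```shell
cd "$(mktemp -d)"
printf '%s\n' '101:Gadget:7.25' '102:Widget:4.50' > products

# -F: sets the field separator. %-10s left-justifies the description
# in 10 columns; %8.2f right-justifies the price with two decimals.
report=$(awk -F: '{ printf "%-10s %8.2f\n", $2, $3 }' products)
echo "$report"
```

Unlike print, printf adds no newline of its own, so the format string ends with \n.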
36
Formatting Output
awk provides a shortcut to other UNIX commands
37
Using a Shell Script to
Implement the Application
• Shell scripts should contain:
– The commands to execute
– Comments to identify and explain the script so
that users or programmers other than the
author can understand how it works
• Use the pound (#) character to mark
comments in a script file
38
Running a Shell Script
• You can run a shell script in virtually any
shell that you have on your system
• The Bash shell accepts more variations in
command structures than other shells
• Run the script by typing sh followed by the
name of the script, or make the script
executable and type ./ prior to the script
name
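Both ways of running a script can be compared on a tiny example. The script name report.sh and its contents are hypothetical; the # lines inside it are comments.

```shell
cd "$(mktemp -d)"

# Create a small commented script.
cat > report.sh <<'EOF'
#!/bin/sh
# report.sh - hypothetical example script
# Prints one line so the two ways of running it can be compared.
echo "report ready"
EOF

# Method 1: pass the script name to sh.
out1=$(sh report.sh)
echo "$out1"

# Method 2: make the script executable, then run it with ./
chmod +x report.sh
out2=$(./report.sh)
echo "$out2"
```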
39
Putting it all together to
Produce the Report
• An effective way to develop applications is
to combine many small scripts in a larger
script file
• Have the last script added to the larger
script print a report indicating script
functions and results
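The idea can be sketched with two small scripts and a final script that runs them in order. All file and script names below are invented for illustration.

```shell
cd "$(mktemp -d)"
printf '%s\n' '2:Inventory' '1:Payroll' > projects

# Small script 1: sort the project file by project number.
cat > sortproj.sh <<'EOF'
# sortproj.sh - writes a sorted copy of projects
sort -t: -k1,1n projects > projects.sorted
EOF

# Small script 2: format the sorted file as a report.
cat > report.sh <<'EOF'
# report.sh - prints a heading and one formatted line per project
echo "Project Report"
awk -F: '{ printf "%3d  %s\n", $1, $2 }' projects.sorted
EOF

# Final script: runs the tested pieces in order, ending with the report.
cat > main.sh <<'EOF'
# main.sh - combines the smaller scripts
sh sortproj.sh
sh report.sh
EOF

final=$(sh main.sh)
echo "$final"
```

Each small script can be run and checked on its own before it is added to main.sh.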
42
Chapter Summary
• The UNIX file-processing commands can be
organized into two categories: (1) select and
(2) manipulation and transformation
• The uniq command removes duplicate lines
from a sorted file
• The comm command compares lines common
to file1 and file2, and produces output that
shows the variances between the two
• The diff command attempts to determine the
minimal set of changes needed to convert file1
into file2
43
Chapter Summary
• The tr command copies data read from the
standard input to the standard output,
substituting or deleting characters specified
• The sed command is a stream editor designed to
make global changes to large files
• The pr command prints files on the standard output in
pages
• The design of a file-processing application
reflects what the application needs to produce
• Use a record layout to identify each field by
name and data type
44
Chapter Summary
• Shell programs should contain commands to
execute programs and comments to identify
and explain the programs. The pound (#)
character denotes comments
• Write shell scripts in stages so that you can
test each part before combining them into one
script. Using small shell scripts and combining
them in a final shell script file is an effective
way to develop applications