Chapter One - Clark College

Chapter Five Advanced File Processing

Guide To UNIX Using Linux

Fourth Edition Chapter 5 Unix (34 slides) CTEC 110 1

Objectives
• Use the pipe operator to redirect the output of one command to another command
• Use the grep command to search for a specified pattern in a file
• Use the uniq command to remove duplicate lines from a file
• Use the comm and diff commands to compare two files
• Use the wc command to count words, characters, and lines in a file

Objectives (continued)
• Use manipulation and transformation commands, which include sed, tr, and pr
• Design a new file-processing application by creating, testing, and running shell scripts

Advancing Your File-Processing Skills
• Selection commands focus on extracting specific information from files

Advancing Your File-Processing Skills (continued)
• Manipulation and transformation commands alter and transform extracted information into useful and appealing formats

Using the Selection Commands
• Using the Pipe Operator
– The pipe operator (|) redirects the output of one command to the input of another
– An example would be to redirect the output of the ls command to the more command
– The pipe operator can connect several commands on the same command line

Using the Pipe Operator
• Using pipe operators and connecting commands is useful when viewing directory information
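A minimal sketch of chaining several commands with pipes. The scratch directory and file names are made up for illustration; ls, grep, and wc are the standard utilities.

```shell
# Build a scratch directory so the listing is predictable (names are illustrative).
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/report.txt" "$workdir/script.sh"

# ls writes one name per line into the pipe, grep keeps names ending in .txt,
# and wc -l counts the lines that survive.
txt_count=$(ls "$workdir" | grep '\.txt$' | wc -l)
echo "text files:" $txt_count

rm -r "$workdir"
```

For interactive use, the same idea applies to paging a long listing: ls -l | more.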

Using the grep Command
• Used to search for a specific pattern in a file, such as a word or phrase
• grep’s options and wildcard support allow for powerful search operations
• You can increase grep’s usefulness by combining it with other commands, such as head or tail
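A short sketch of grep on its own and combined with head. The sample data is invented for illustration; -i (ignore case) and -c (count matching lines) are standard grep options.

```shell
# Sample data (illustrative only).
data=$(mktemp)
printf '%s\n' 'apple pie' 'banana split' 'Apple tart' 'cherry cake' 'apple crumble' > "$data"

# -i ignores case; head -n 2 keeps only the first two matching lines.
first_two=$(grep -i 'apple' "$data" | head -n 2)
echo "$first_two"

# -c prints the number of matching lines instead of the lines themselves.
# Without -i the search is case-sensitive, so 'Apple tart' is not counted.
match_count=$(grep -c 'apple' "$data")
echo "case-sensitive matches: $match_count"

rm "$data"
```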

Using the uniq Command
• Removes duplicate lines from a file
• Compares only consecutive lines, so uniq requires sorted input
• The -d option generates output that contains one copy of each line that has a duplicate
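A small sketch of why the input is sorted first, and of the duplicate-reporting option (-d). The list of names is invented for illustration.

```shell
# Unsorted sample data with repeats (illustrative only).
names=$(mktemp)
printf '%s\n' 'carol' 'alice' 'bob' 'alice' 'carol' 'alice' > "$names"

# uniq only collapses adjacent duplicates, so sort the input first.
unique_names=$(sort "$names" | uniq)
echo "$unique_names"

# -d prints one copy of each line that occurs more than once.
repeated_names=$(sort "$names" | uniq -d)
echo "$repeated_names"

rm "$names"
```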

Using the comm Command
• Used to identify duplicate lines in sorted files
• Unlike uniq, it does not remove duplicates, and it works with two files rather than one
• It compares lines common to file1 and file2, and produces three-column output
– Column one contains lines found only in file1
– Column two contains lines found only in file2
– Column three contains lines found in both files
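A brief sketch of the three-column output and of suppressing columns. The two sorted files are invented for illustration; -1, -2, and -3 are the standard comm options for hiding a column.

```shell
# Two small sorted files (illustrative only).
file1=$(mktemp)
file2=$(mktemp)
printf '%s\n' 'apple' 'banana' 'cherry' > "$file1"
printf '%s\n' 'banana' 'cherry' 'date' > "$file2"

# Default output: three tab-indented columns (only in file1, only in file2, in both).
comm "$file1" "$file2"

# -12 hides columns one and two, leaving the lines found in both files.
in_both=$(comm -12 "$file1" "$file2")
# -23 hides columns two and three, leaving the lines found only in file1.
only_in_first=$(comm -23 "$file1" "$file2")

rm "$file1" "$file2"
```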

Using the diff Command
• Attempts to determine the minimal changes needed to convert file1 to file2
• The output displays the line(s) that differ
• Codes in the output indicate which lines must be added, deleted, or changed for the files to match
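A minimal sketch of reading diff output. The file contents are invented for illustration. In the default output format, lines prefixed with < come from the first file and lines prefixed with > come from the second; the codes a, d, and c stand for add, delete, and change.

```shell
# Two versions of a short file (illustrative only).
old=$(mktemp)
new=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$old"
printf '%s\n' 'alpha' 'gamma' 'delta' > "$new"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
changes=$(diff "$old" "$new") || true
echo "$changes"

rm "$old" "$new"
```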

Using the wc Command
• Used to count the number of lines, words, and bytes or characters in text files
• You can combine all three options in a single command
• If you don’t specify any options, you see counts of lines, words, and bytes (in that order)

Using the wc Command (continued)
• The options for the wc command:
– -l for lines
– -w for words
– -c for bytes (use -m for characters)
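A quick sketch of the individual wc options on a two-line sample file (invented for illustration). Note that -c counts bytes; for plain ASCII text this equals the character count. Reading from standard input keeps the file name out of the output.

```shell
# A two-line sample file (illustrative only).
sample=$(mktemp)
printf 'one two three\nfour five\n' > "$sample"

line_count=$(wc -l < "$sample")   # lines
word_count=$(wc -w < "$sample")   # words
byte_count=$(wc -c < "$sample")   # bytes, including the two newlines

echo "lines:" $line_count "words:" $word_count "bytes:" $byte_count

rm "$sample"
```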

Using Manipulation and Transformation Commands
• These commands are sed, tr, and pr
• They are used to edit and transform the appearance of data before it is displayed or printed

Introducing the sed Command
• sed is a UNIX/Linux editor that allows you to make global changes to large files
• Minimum requirements are an input file and a command that lets sed know what actions to apply to the file
• sed commands have two general forms
– Specify an editing command on the command line
– Specify a script file containing sed commands
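A sketch of the two general forms, assuming a small invented input file. In both forms sed writes the edited text to standard output and leaves the input file unchanged.

```shell
# Sample input and an empty script file (illustrative only).
input=$(mktemp)
script=$(mktemp)
printf '%s\n' 'color: red' 'background color: blue' > "$input"

# Form 1: the editing command is given on the command line.
# s/old/new/g substitutes every occurrence on each line.
inline_result=$(sed 's/color/colour/g' "$input")
echo "$inline_result"

# Form 2: the editing commands live in a script file, passed with -f.
printf '%s\n' 's/red/crimson/' 's/blue/navy/' > "$script"
script_result=$(sed -f "$script" "$input")
echo "$script_result"

rm "$input" "$script"
```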

Translating Characters Using the tr Command
• tr copies data from the standard input to the standard output, substituting or deleting characters specified by options and patterns
• The patterns are strings and the strings are sets of characters
• A popular use of tr is converting lowercase characters to uppercase
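A short sketch of substituting and deleting characters with tr. The input strings are invented for illustration; the POSIX character classes [:lower:] and [:upper:] are used here as the two sets, and -d deletes every character in the given set.

```shell
# Substitute: map each lowercase letter to its uppercase counterpart.
upper=$(printf 'hello, unix\n' | tr '[:lower:]' '[:upper:]')
echo "$upper"

# Delete: -d removes every character that appears in the set.
no_digits=$(printf 'room 101\n' | tr -d '0-9')
echo "$no_digits"
```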

Using the pr Command to Format Your Output
• pr prints specified files on the standard output in paginated form
• By default, pr formats the specified files into single-column pages of 66 lines
• Each page has a five-line header containing the file name, its latest modification date, and current page, and a five-line trailer consisting of blank lines
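A minimal sketch of pr with and without its page header, assuming a three-line sample file invented for illustration. The -t option, which omits the header and trailer, is a standard pr option.

```shell
# A short sample file (illustrative only).
listing=$(mktemp)
printf '%s\n' 'line one' 'line two' 'line three' > "$listing"

# Default form: the page starts with a header carrying the date,
# the file name, and the page number. Show just the header lines.
paged=$(pr "$listing")
printf '%s\n' "$paged" | head -n 5

# -t omits the header and trailer, leaving only the file's own lines.
body_only=$(pr -t "$listing")
echo "$body_only"

rm "$listing"
```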

Designing a New File-Processing Application
• The most important phase in developing a new application is the design
• The design defines the information an application needs to produce
• The design also defines how to organize this information into files, records, and fields, which are called logical structures

Designing Records

• The first task is to define the fields in the records and produce a record layout
• A record layout identifies each field by name and data type (numeric or nonnumeric)
• Design the file record to store only those fields relevant to the record’s primary purpose

Linking Files with Keys
• Multiple files are joined by a key: a common field that each of the linked files shares
• Another important task in the design phase is to plan a way to join the files
• The flexibility to gather information from multiple files composed of simple, short records is the essence of a relational database system
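One way to sketch linking two flat files on a key is the standard join utility, which the slide does not name. The file layouts, field order, and sample records below are hypothetical. Both files must be sorted on the key field, here field one.

```shell
# Hypothetical flat files: colon-separated fields, project number first.
projects=$(mktemp)
programmers=$(mktemp)
printf '%s\n' '1:Payroll' '2:Inventory' > "$projects"
printf '%s\n' '1:Alice' '1:Bob' '2:Carol' > "$programmers"

# -t : sets the field separator; by default join matches on field one of each file.
linked=$(join -t : "$projects" "$programmers")
echo "$linked"

rm "$projects" "$programmers"
```

Each output line carries the key once, followed by the remaining fields from the first file and then from the second.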

Creating the Programmer and Project Files
• With the basic design complete, you now implement your application design
• UNIX/Linux file processing predominantly uses flat files
• Working with these files is easy, because you can create and manipulate them with text editors like vi and Emacs

Formatting Output
• The awk command is used to prepare formatted output
• For the purposes of developing a new file-processing application, we will focus primarily on the printf action of the awk command
• awk offers a shortcut to combining several other UNIX/Linux commands
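A small sketch of awk's printf action applied to a flat file. The record layout (number, name, project) and the column widths are hypothetical. -F sets the field separator, and %-6s means a left-justified string padded to six characters.

```shell
# Hypothetical colon-separated records: number, name, project.
records=$(mktemp)
printf '%s\n' '101:Alice:Payroll' '102:Bob:Inventory' > "$records"

# Print each record as fixed-width columns: $1, $2, and $3 are the fields.
report=$(awk -F: '{ printf "%-6s %-10s %s\n", $1, $2, $3 }' "$records")
echo "$report"

rm "$records"
```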

Using a Shell Script to Implement the Application
• Shell scripts should contain:
– The commands to execute
– Comments to identify and explain the script so that users or programmers other than the author can understand how it works
• Use the pound (#) character to mark comments in a script file

Running a Shell Script
• You can run a shell script in virtually any shell that you have on your system
• The Bash shell accepts more variations in command structures than other shells
• Run the script by typing sh followed by the name of the script, or make the script executable and type ./ prior to the script name
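A sketch of both ways to run a script. The script name hello.sh and its contents are made up for illustration, and the example assumes the temporary directory allows executing files.

```shell
# Write a tiny commented script into a scratch directory (illustrative only).
workdir=$(mktemp -d)
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
# hello.sh: print a greeting
echo "hello from a script"
EOF

# Option 1: pass the script name to sh.
via_sh=$(sh "$workdir/hello.sh")

# Option 2: make the script executable, then run it with ./ from its directory.
chmod +x "$workdir/hello.sh"
via_exec=$(cd "$workdir" && ./hello.sh)

echo "$via_sh"
echo "$via_exec"
rm -r "$workdir"
```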

Putting it All Together to Produce the Report
• An effective way to develop applications is to combine many small scripts in a larger script file
• Have the last script added to the larger script print a report indicating script functions and results

Chapter Summary
• UNIX/Linux file-processing commands are (1) selection and (2) manipulation and transformation commands
• uniq removes duplicate lines from a sorted file
• comm compares lines common to file1 and file2
• diff tries to determine the minimal set of changes needed to convert file1 into file2

Chapter Summary (continued)
• tr copies data read from the standard input to the standard output, substituting or deleting characters specified
• sed is a file editor designed to make global changes to large files
• pr prints files to the standard output in pages

Chapter Summary (continued)
• The design of a file-processing application reflects what the application needs to produce
• Use record layout to identify each field by name and data type
• Shell scripts should contain commands to execute programs and comments to identify and explain the programs

Chapter 5 Unix Exercises

• Work through the Hands-on Projects at the end of Chapter 5
• Canvas: Review Questions 5
– (Do not do questions 22, 23, 24, and 25)
• Quiz 5 Unix…