COS 131: Computing for Engineers
Chapter 8: File Input and Output
Douglas R. Sizemore, Ph.D.
Professor of Computer Science
This lecture was given in Fall 2008 by Professor Sizemore and refers to an older
version of MATLAB than R2011a.
July 17, 2015
1
Introduction
• Addresses three levels of capability for
reading and writing files in MATLAB
– Saving and restoring the workspace
– High-level functions for accessing files in
specific formats
– Low-level file access programs for general-purpose file processing
– Consider conditions under which each is
appropriate
Introduction
• Consider three types of activities that read and
write data files.
– MATLAB has a basic ability to save your workspace (or
parts of it) to a file and restore it later for further
processing
– Have high-level functions in MATLAB that take the
name of a file in any one of a number of popular
formats and produce an internal representation of the
data from that file in a form ready for processing
– Need to deal with lower-level capabilities for
manipulating text files that do not have recognizable
structures
Introduction
• Will consider files containing:
– Workspace variables
– Spreadsheet data
– Text files with delimited numbers
– Text files with plain text
• MATLAB also has the ability to access
binary files – files whose data are not in text
form; we will not consider binary files
here.
Concept: Serial Input and Output (I/O)
• Refer to the process of reading and writing data
files as Input/Output or I/O
• All computer file systems save and retrieve data as
a sequential stream of characters; remember that these
characters are small sets of ones and zeros (the binary
number system), which the hardware represents as
distinct voltage levels
• Input and output streams are depicted on
the next slide
Concept: Serial Input and Output (I/O)
• Input and Output Streams:
[Figure: an input stream and an output stream]
Concept: Serial Input and Output (I/O)
• Input and Output
– Control characters are mixed in with the regular data
characters so that we can make sense of the stream;
they specify the organization of the data
Concept: Serial Input and Output (I/O)
• File Processing Scenario for INPUT
– Program opens the file for reading
– Continually requests values from the file data
stream until the end of file (EOF) is reached
– As the data is received, the program uses the
delimiting characters included in the data
stream to reformat the data to reconstruct the
organization of the data as represented in the
file
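The input scenario above can be sketched as follows. This is only an illustration, assuming a small comma-delimited file; the sample data and the variable names (fname, data, fields) are made up for the example, and strsplit requires a newer MATLAB release than the one used in this lecture.

```matlab
% Create a small comma-delimited sample file (hypothetical data).
fname = [tempname '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, '1,2,3\n4,5,6\n');
fclose(fh);

% Open the file for reading.
fh = fopen(fname, 'r');
data = [];
while true
    ln = fgetl(fh);                       % request the next line from the stream
    if ~ischar(ln)                        % fgetl returns -1 at end of file (EOF)
        break;
    end
    fields = strsplit(ln, ',');           % the comma delimits the columns
    data(end+1, :) = str2double(fields);  % each new line becomes one row
end
fclose(fh);
delete(fname);
disp(data);
```

The new-line character separates the rows and the comma separates the columns, so the loop rebuilds the same 2-by-3 organization that was written to the file.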
MATLAB Workspace (I/O)
• MATLAB allows you to save your
workspace to a file with the SAVE
command; allows you to reload your
workspace from a file with the LOAD
command
• The file takes the name you give it, with a .mat
extension
• The default filename is matlab.mat
MATLAB Workspace (I/O)
• Can also identify specific variables that you want
to save by
– listing them explicitly
– Providing wildcard patterns (such as c*) to indicate the variable
names
• Example:
>> save mydata.mat a b c*
– The above example would save the variables a, b, and
any variable beginning with the letter c.
– Saving the workspace is often not practical, as it only saves
the results and not the code
– Almost always better to save the scripts and raw data
that created the workspace
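A minimal sketch of selective saving, using the function form of the save command shown above. The variable names and the temporary file name are assumptions for the example; loading into a struct (S) is used here only to make it easy to inspect what was stored.

```matlab
% Some workspace variables (hypothetical values).
a = 1; b = [2 3]; c1 = 'x'; c2 = pi; d = 99;

fname = [tempname '.mat'];
save(fname, 'a', 'b', 'c*');   % function form of: save <file> a b c*

S = load(fname);               % load into a struct rather than the workspace
disp(fieldnames(S));           % lists the variables that were saved
delete(fname);
```

The variable d does not match any of the listed names or the c* pattern, so it is not written to the file.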
High-Level I/O Functions
• Now examine the general case of file I/O
– Will need to load data from external sources
– Will need to process those data
– Will need to save those data back to the file
system
High-Level I/O Functions
• When we attempt to read or write data from an
external file this is extremely difficult without
knowing something of the
– Types of data contained in the file
– The organization of the data in the file
• Good habit:
– explore the data in a file by whatever tools you have at
your disposal
– Commit to processing the data according to your
observations
• Table in following slide shows the file readers and
writers available in MATLAB
High-Level I/O Functions
• File I/O Functions
File Content                             File Extension   Reader              Writer                Data Format
---------------------------------------  ---------------  ------------------  --------------------  ----------------------------------------
Plain text                               Any              textscan            fprintf               Specified in the function calls
Comma-separated numbers                  CSV              csvread             csvwrite              Double array
Tab-separated text                       TAB              dlmread             dlmwrite              Double array
General delimited text                   DLM              dlmread             dlmwrite              Double array
Excel worksheet                          XLS              xlsread             xlswrite              Double or cell array
Lotus 1-2-3 worksheet                    WK1              wk1read             wk1write              Double or cell array
Scientific data in Common Data Format    CDF              cdfread             cdfwrite              Cell array of CDF records
Flexible Image Transport System data     FITS             fitsread            (none)                Primary or extension table data
Data in Hierarchical Data Format         HDF              hdfread             (none)                HDF or HDF-EOS data set
Extensible Markup Language (XML)         XML              xmlread             xmlwrite              Document Object Model node
Image data                               Various          imread              imwrite               True color, grayscale, or indexed image
Audio file                               AU or WAV        auread or wavread   auwrite or wavwrite   Sound data and sample rate
Movie                                    AVI              aviread             (none)                MATLAB movie
High-Level I/O Functions
• Exploration
– Most common files encountered are text files and spreadsheets
– Delimited text files are presumed to contain numerical values
– Spreadsheet data may be either numerical data stored as
doubles (typically 64 bits or 8 bytes per number) or string data
stored in cell arrays.
– Text files are usually delimited by a special character:
• Comma
• Tab
• Space
• or another designated character
• The delimiter designates the column divider
• The new-line character designates the rows
High-Level I/O Functions
• Exploration
– Exception is the plain text reader that requires a format to
define columns and rows
– The file extension as in .txt gives you a significant clue to the
nature of the data
– For plain text files you can use a simple editor like Notepad in
Windows to examine the organization of the data and obtain
clues as to how to proceed
High-Level I/O Functions
• Excel spreadsheets
– Rectangular arrays containing labeled rows and columns of
cells
High-Level I/O Functions
• Excel spreadsheets
– MATLAB xlsread(…) function separates the text and
numerical portions of a spreadsheet
– The input parameter of xlsread(…) is the name of the file
– Can have up to three return variables
• First return variable will hold all numerical values in an array of
doubles
• Second return variable will hold all the text data in cell arrays
• Third return variable (optional) will hold both string and numerical
data in cell arrays
– Exercise 8.1: Reading Excel Data
– Smith text, pages 189-190, bottom-top
High-Level I/O Functions
• Excel spreadsheets
– Observations from Exercise 8.1
• Excel reader function determines the smallest rectangle on the
spreadsheet containing all of the numerical data; referred to as the
number rectangle
• First result is essentially this number rectangle; if there are any nonnumeric values within the rectangle, they are replaced by NaN, the
built-in MATLAB name for something that is not a number
• Second result is all character data as strings in a cell array; numbers
encountered are given as empty strings
• Third result consists of cell arrays of both numbers and character
strings; missing values are assumed to be numeric and are assigned the
value, NaN
High-Level I/O Functions
• Excel spreadsheets
– Will likely want to write back to the file or to another new or
existing file
– Excel spreadsheets can be written using:
• xlswrite(<filename>, <array>, <sheet>, <range>)
• <filename> is the name of the file
• <array> is the data source, a cell array
• <sheet> is the sheet name
• <range> is the range of cells in Excel cell notation (for example, A1:C3)
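A hedged sketch of writing a small sheet and reading it back with the three return variables of xlsread. It assumes a Windows machine with Excel installed, which these functions rely on for full behavior; the file name, sheet name, and sample data are made up for the example, and newer MATLAB releases recommend other functions for spreadsheet access.

```matlab
% Hypothetical sample data: one header row, then a name and a score per row.
scores = {'Name', 'Score'; 'Ann', 90; 'Bob', 85};

fname = [tempname '.xls'];
xlswrite(fname, scores, 'Sheet1', 'A1:B3');   % <filename>, <array>, <sheet>, <range>

% Read the sheet back: numbers, text, and the raw mix of both.
[num, txt, raw] = xlsread(fname, 'Sheet1');
disp(num);   % numerical values as an array of doubles
disp(txt);   % text data in a cell array
disp(raw);   % both strings and numbers in a cell array
delete(fname);
```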
High-Level I/O Functions
• Delimited Text Files – Numerical Data Only
– Data are frequently presented in text file form
– If data in a text file are all numerical values, MATLAB can
read the file directly into an array
– Necessary for data to be separated or delimited by commas,
spaces, or tab characters
– Numerical data of this type can be read using
• dlmread(file, delimiter)
• Delimiter is a single character that can be used to specify an unusual
delimiting character
• Function produces a numerical array containing the data values
• Array elements where data are not supplied are filled with zeros
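As an illustration of the points above, the sketch below reads a file that uses a semicolon as an unusual delimiter and has one value missing. The file contents and names are assumptions for the example.

```matlab
% Write a small semicolon-delimited file; the middle value of row 2 is missing.
fname = [tempname '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, '1;2;3\n4;;6\n');
fclose(fh);

M = dlmread(fname, ';');   % second argument names the delimiting character
disp(M);                   % the missing element is filled with zero
delete(fname);
```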
High-Level I/O Functions
• Delimited Text Files – Numerical Data Only
– Exercise 8.2: Reading delimited files
– Smith text, page 191, bottom
– Listing 8.1: Sample delimited text file:
– Delimited data files can be written using:
• dlmwrite(<filename>, <array>, <dlm>)
• <filename> is the name of the file
• <array> is the data source – a numerical array
• <dlm> is the delimiting character; if not specified, a comma is used (CSV)
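A short sketch of dlmwrite with and without the delimiter argument. The array and the temporary file names are made up for the example; fileread is used only to show the text that ends up in the file.

```matlab
A = [1 2 3; 4 5 6];                 % hypothetical numerical array

csvName = [tempname '.csv'];
tabName = [tempname '.txt'];
dlmwrite(csvName, A);               % no delimiter given: comma-separated (CSV)
dlmwrite(tabName, A, '\t');         % tab-separated

csvText = fileread(csvName);        % the file contents as one string
B = dlmread(tabName, '\t');         % read the tab-separated file back
disp(csvText);
disp(B);
delete(csvName);
delete(tabName);
```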
Lower-Level File I/O
• Introduction
– You may encounter text files that cannot be read or written by
the higher level functions defined above
– MATLAB includes functions for general purpose reading and
writing of data files
– Opening one of these files returns a file handle
– The file handle is used by any functions employed in reading
from and writing to the file
– Once the read and write activities have been completed, the
file must be closed
Lower-Level File I/O
• Opening and Closing Files
– To open a file for reading or writing:
• fh = fopen( <filename>, <purpose> )
• fh is a file handle used in subsequent function calls to identify the
particular I/O stream
• <filename> is the name of the file
• <purpose> is a string specifying the purpose for opening the file
– 'r' – open for reading; the file must already exist
– 'w' – open for writing; the file will be overwritten if it exists
– 'a' – open for appending; data will be appended to the file if it exists
– To close the file,
• fclose( fh )
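The three purposes can be tried out with a sketch like the one below. The file names are temporary names invented for the example; the check against -1 relies on fopen returning -1 when a file cannot be opened.

```matlab
fname = [tempname '.txt'];

% 'w': create the file (or overwrite it) and write one line.
fh = fopen(fname, 'w');
if fh == -1
    error('Could not open %s for writing', fname);
end
fprintf(fh, 'first line\n');
fclose(fh);

% 'a': reopen the same file and append a second line.
fh = fopen(fname, 'a');
fprintf(fh, 'second line\n');
fclose(fh);

% 'r': the file must already exist; a missing file yields -1, not a handle.
missing = fopen([tempname '.txt'], 'r');

contents = fileread(fname);
disp(contents);
delete(fname);
```

Checking the handle before using it is a cheap way to catch a misspelled file name early.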
Lower-Level File I/O
• Reading Text Files
– Three levels of support are provided when reading text files:
• Reading whole lines with or without the new line character
• Parsing into tokens with delimiters
• Parsing into cell arrays using a format string
– To read a whole line including the new line character, use:
• str = fgets( fh );
• Will return each line as a string until the end of the file (EOF), then return -1
• Use fgetl(…) to leave out each new line character
– To parse each line into tokens (elementary text strings)
separated by white space delimiters, use a combination of
fgetl(…) and the tokenizer function:
• [tk, rest] = strtok( ln ); where tk is a string token, rest is the remainder
of the line, and ln is a string to be parsed into tokens
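A minimal sketch of the fgetl(…) and strtok(…) combination described above. It is not the textbook's listing; the sample text and the variable tokens are assumptions for the example.

```matlab
% Write a small sample text file (hypothetical contents).
fname = [tempname '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'the quick brown fox\njumps over\n');
fclose(fh);

fh = fopen(fname, 'r');
tokens = {};
ln = fgetl(fh);                      % read a line without its new line character
while ischar(ln)                     % fgetl returns -1 (not a string) at EOF
    rest = ln;
    while true
        [tk, rest] = strtok(rest);   % split off the next white-space-delimited token
        if isempty(tk)
            break;                   % nothing left on this line
        end
        tokens{end+1} = tk;
    end
    ln = fgetl(fh);
end
fclose(fh);
delete(fname);
disp(tokens);
```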
Lower-Level File I/O
• Reading Text Files
– To parse a line according to a specific format string into a cell
array, use:
• ca = textscan( fh, <format> ); where ca is the resulting cell array, fh is
the file handle, and <format> is a format control string like the one we used for
sscanf(…) (Chapter 6)
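A short sketch of textscan with a format string, assuming a file whose lines each hold a name, an integer, and a real number. The data and variable names are made up for the example.

```matlab
% Write a small sample file: name, integer score, real-valued average.
fname = [tempname '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'Ann 90 3.5\nBob 85 2.75\n');
fclose(fh);

fh = fopen(fname, 'r');
ca = textscan(fh, '%s %d %f');   % one cell per conversion in the format string
fclose(fh);
delete(fname);

names  = ca{1};   % cell array of strings from %s
scores = ca{2};   % integer column from %d
gpas   = ca{3};   % double column from %f
disp(names);
disp(scores);
disp(gpas);
```

Each conversion in the format string produces one column, so the result has as many cells as the format has conversions.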
Lower-Level File I/O
• Examples of Reading Text Files
– Listing 8.2 shows a script that will list any text file in the
Command window
(Refer to the notations on Listing 8.2 on pages 193-194 of the Smith text.)
Lower-Level File I/O
• Examples of Reading Text Files
– Listing 8.3 shows the difference in output results between the
conventional listing script and the tokenizing lister
(Refer to the notations on Listing 8.2 on page 194 of the Smith text.)
Lower-Level File I/O
• Examples of Reading Text Files
– Exercise 8.3: Using file listers – illustrates both traditional and
tokenizer approaches to file listing
– Smith text, pages 194-195, bottom-top
Lower-Level File I/O
• Writing Text Files
– Must have file open
– The fprintf(…) function is used to write to it by including its file
handle as the first parameter
– Listing 8.4 alters Listing 8.2 so that it copies a text file instead of listing
it
(Refer to the notations on Listing 8.2 on page 195 of the Smith text.)
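The idea of copying a text file with fgets(…) and fprintf(…) can be sketched as below. This is an illustration only, not the textbook's Listing 8.4; the file names and sample text are assumptions.

```matlab
% Create a small source file to copy (hypothetical contents).
src = [tempname '.txt'];
dst = [tempname '.txt'];
fh = fopen(src, 'w');
fprintf(fh, 'line one\nline two\n');
fclose(fh);

in  = fopen(src, 'r');           % source opened for reading
out = fopen(dst, 'w');           % destination opened for writing
ln = fgets(in);                  % fgets keeps the new line character
while ischar(ln)                 % -1 signals the end of the file
    fprintf(out, '%s', ln);      % file handle is the first parameter
    ln = fgets(in);
end
fclose(in);
fclose(out);

same = isequal(fileread(src), fileread(dst));
disp(same);
delete(src);
delete(dst);
```

Passing the line through '%s' rather than using it as the format string keeps any percent signs or backslashes in the text from being interpreted.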
Engineering Example: Spreadsheet Data
• Adaptation of the structure assembly problem from
Chapter 7
• In this example the data are presented in a spreadsheet
as given here:
Engineering Example: Spreadsheet Data
• Start by considering the layout of the data
• Also consider the process necessary to extract what we
need
• Which of the three forms of data returned from
xlsread(…) is best for our use?
– Numerical data are not really important in this application
– Not exclusively a text processing problem either
– Will process the raw data provided by xlsread(…), giving both
the string and numerical data
• Create a function that will read this file and produce the
same model/structure as in Chapter 7
Engineering Example: Spreadsheet Data
• Listing 8.5: Reading structure data
Engineering Example: Spreadsheet Data
• Observations on Listing 8.5
– Note at line 2 the function reads the spreadsheet and only
keeps the raw data
– In traversing the array, note that we begin with an offset that
ignores column 1 and row 1
– As the function cycles through the rows, it is important to
empty the array CONN before each pass to avoid "inheriting"
data from a previous row
– You can test this function by replacing the structure array
construction in lines 1-11 of Listing 7.7 in Chapter 7 with the
following line:
– data = readStruct('Structure_array.xls');
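The traversal pattern in these observations can be sketched on an in-memory cell array, without reproducing Listing 8.5. Everything about the layout here is an assumption: the cell array raw stands in for the third result of xlsread(…), row 1 and column 1 are treated as labels, and the field names name and conn are hypothetical.

```matlab
% Stand-in for the raw data returned by xlsread (hypothetical layout):
% row 1 holds headings, column 1 holds component names, and the other
% cells hold connection names or NaN where a cell is empty.
raw = {'Name', 'Conn1', 'Conn2';
       'A1',   'B1',    'B2';
       'A2',   'B3',    NaN};

[rows, cols] = size(raw);
data = struct('name', {}, 'conn', {});   % empty structure array
for r = 2:rows                           % offset: skip the heading row
    conn = {};                           % empty before each pass
    for c = 2:cols                       % offset: skip the name column
        item = raw{r, c};
        if ischar(item)                  % keep strings, ignore NaN cells
            conn{end+1} = item;
        end
    end
    data(end+1).name = raw{r, 1};
    data(end).conn = conn;
end
disp(numel(data));
```

If conn were not reset inside the outer loop, the second entry would also carry the connections collected for the first row.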