Chap 5. Managing Files of Records
Chapter Objectives
Extend the file structure concepts of Chapter 4:
Search keys and canonical forms
Sequential search and Direct access
File access and file organization
Examine other kinds of file structures in terms of
Abstract data models
Metadata
Object-oriented file access
Extensibility
Examine issues of portability and standardization.
Contents
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization
Record Access
Record Key
Canonical form : a standard form of a key
– e.g. Ames or ames or AMES (need conversion)
Distinct keys : uniquely identify a single record
Primary keys, Secondary keys, Candidate keys
– Primary keys should be dataless (not updatable)
– Primary keys should be unchanging
Social Security number: good primary key
– but 999-99-9999 is used for all non-registered aliens
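The canonical-form idea above can be sketched in a few lines. This is only an illustration: the helper name CanonicalKey and the chosen rule (trim blanks, convert to uppercase) are assumptions, not something the slides prescribe.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: reduce a key to a canonical form by trimming
// surrounding whitespace and converting letters to uppercase, so that
// "Ames", "ames" and "AMES" all compare equal.
std::string CanonicalKey(const std::string& key)
{
    const std::string whitespace = " \t\r\n";
    const std::size_t first = key.find_first_not_of(whitespace);
    if (first == std::string::npos) return "";   // empty or all blanks
    const std::size_t last = key.find_last_not_of(whitespace);

    std::string canonical = key.substr(first, last - first + 1);
    for (char& c : canonical)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return canonical;
}
```

Both the key stored with a record and the key typed by the user would be passed through the same function before they are compared.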
Sequential Search (1)
O(n), n : the number of records
Use record blocking
• A block of several records
fields < records < blocks
• O(n), but blocking decreases the number of seeks
• e.g. 4000 records, each 512 bytes long
Unblocked (sector-sized buffers): 512-byte buffer
=> average of 2000 READ() calls
Blocked (16 records / block): 8K buffer
=> average of 125 READ() calls
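The arithmetic behind these two averages can be checked with a small function. The name AverageReads is hypothetical, and the sketch assumes a successful search in which every record is equally likely to be the target, so about half of the blocks are read.

```cpp
// Average number of READ() calls for a successful sequential search,
// assuming the target is equally likely to be any record.
// (Hypothetical helper; integer arithmetic only.)
long AverageReads(long recordCount, long recordsPerBlock)
{
    // number of blocks in the file, rounded up
    long blocks = (recordCount + recordsPerBlock - 1) / recordsPerBlock;
    return blocks / 2;   // on average, half of the blocks are read
}
```

With 4000 records, one record per read gives 4000 / 2 = 2000 calls, while 16 records per block gives 250 blocks and 250 / 2 = 125 calls. The 16-record block occupies 16 x 512 = 8192 bytes, which is the 8K buffer.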
Sequential Search (2)
UNIX tools for sequential processing
cat, wc, grep
When sequential search is useful
Searching for patterns in ASCII files
Processing files with few records
Searching records with a certain secondary key value
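A minimal sketch of the last case, searching on a secondary key, follows. It assumes one record per line with '|' as the field delimiter (the layout used in the examples of this chapter); the function name and the 0-based field numbering are assumptions made for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Scan every record in order and return the 0-based positions of the
// records whose field number fieldIndex equals value.
std::vector<int> SequentialSearch(std::istream& in, int fieldIndex,
                                  const std::string& value)
{
    std::vector<int> matches;
    std::string line;
    for (int recordNumber = 0; std::getline(in, line); ++recordNumber) {
        std::istringstream fields(line);
        std::string field;
        int index = 0;
        bool found = false;
        while (std::getline(fields, field, '|')) {
            if (index == fieldIndex) { found = (field == value); break; }
            ++index;
        }
        if (found) matches.push_back(recordNumber);
    }
    return matches;
}
```

Because a secondary key need not be unique, the scan cannot stop at the first hit; it visits all n records, which is why this use of sequential search is O(n) by nature.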
Direct Access
O(1) operation
RRN ( Relative Record Number )
It gives the relative position of the record
Byte offset = n × r
r : record size, n : RRN value
Applies to fixed-length records
Class IOBuffer includes
direct read (DRead)
direct write (DWrite)
Choosing a record length and structure
• Record length is related to the size of the fields
• Access vs. fragmentation vs. implementation
• Fixed-length record
(a) With fixed-length fields
(b) With variable-length fields
In C, the unused portion of the record is filled with null characters
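(a) Fixed-length fields:
Ames    John    123 Maple     Stillwater  OK74075
Mason   Alan    90 Eastgate   Ada         OK74820
(b) Variable-length (delimited) fields, with unused space at the end of each record:
Ames|John|123 Maple|Stillwater|OK|74075|    [unused space]
Mason|Alan|90 Eastgate|Ada|OK|74820|        [unused space]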
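Layout (b) can be sketched as a packing routine: delimited fields are written into a record of fixed size and the remainder is filled with null characters. The function name and the choice of '|' as delimiter are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Pack the fields, each followed by '|', into a record of exactly
// recordSize bytes; the unused tail is filled with null characters.
// Returns false if the fields do not fit in the record.
bool PackFixedRecord(const std::vector<std::string>& fields,
                     std::size_t recordSize, std::string& record)
{
    std::string packed;
    for (const std::string& field : fields) {
        packed += field;
        packed += '|';
    }
    if (packed.size() > recordSize) return false;   // record overflow
    packed.resize(recordSize, '\0');                // pad the unused space
    record = packed;
    return true;
}
```

The record length stays fixed, so direct access by RRN still works, while individual fields may vary in length as long as their total fits.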
Header Records
General information about the file
e.g. date and time of the most recent update, count of the number of records
Header record is often placed at the beginning of the file
Header records are a widely used, important file design tool
IO Buffer Class definition(1)
class IOBuffer
// Abstract base class for file buffers
{ public :
virtual int Read( istream & ) = 0; // read a buffer from the stream
virtual int Write( ostream &) const = 0; // write a buffer to the stream
// these are the direct access read and write operations
virtual int DRead( istream &, int recref ); // read specified record
virtual int DWrite( ostream &, int recref ) const; // write specified record
// these header operations return the size of the header
virtual int ReadHeader ( istream & );
virtual int WriteHeader ( ostream &) const;
protected :
int Initialized ; // TRUE if buffer is initialized
char *Buffer; // character array to hold field values
};
IO Buffer Class definition(2)
The full definition of the buffer class hierarchy
Write methods : add a header to a file and return the number of bytes in the header
Read methods : read the header and check it for consistency
WriteHeader method : writes the string IOBuffer at the beginning of the file.
ReadHeader method : reads the record size from the header and checks that its value is the same as that of the BufferSize member of the buffer object
DWrite/DRead methods : operate using the byte address of the record as the record reference. The DRead method begins by seeking to the requested spot.
ReadHeader, WriteHeader Member Functions
static const char headerStr[] = "IOBuffer";
static const int headerSize = sizeof (headerStr) - 1; // compile-time length, excluding the terminating null
int IOBuffer::ReadHeader (istream & stream)
{
char str[headerSize+1];
stream . seekg (0, ios::beg);
stream . read (str, headerSize);
if (! stream . good ()) return -1;
if (strncmp (str, headerStr, headerSize)==0) return headerSize;
else return -1;
}
int IOBuffer::WriteHeader (ostream & stream) const
{
stream . seekp (0, ios::beg);
stream . write (headerStr, headerSize);
if (! stream . good ()) return -1;
return headerSize;
}
WriteHeader Member Function – VariableLengthBuffer
int VariableLengthBuffer :: WriteHeader (ostream & stream) const
// write a buffer header to the beginning of the stream
// A header consists of the
//   IOBUFFER header
//   header string
//   Variable sized record of length fields
//   that describes the file records
{
int result;
// write the parent (IOBuffer) header
result = IOBuffer::WriteHeader (stream);
if (result == -1) return FALSE; // IOBuffer::WriteHeader returns -1 on failure
// write the header string
stream . write (headerStr, headerSize);
if (!stream . good ()) return FALSE;
// write the record description
return stream . tellp();
}
ReadHeader Member Function - VariableLengthBuffer
int VariableLengthBuffer :: ReadHeader (istream & stream)
// read the header and check for consistency
{
char str[headerSize+1];
int result;
// read the IOBuffer header
result = IOBuffer::ReadHeader (stream);
if (result == -1) return FALSE; // IOBuffer::ReadHeader returns -1 on failure
// read the header string
stream . read (str, headerSize);
if (!stream.good()) return FALSE;
if (strncmp (str, headerStr, headerSize) != 0) return FALSE;
// read and check the record description
return stream . tellg ();
}
Read Member Function - VariableLengthBuffer
int VariableLengthBuffer :: Read (istream & stream)
// read the next record: a length prefix followed by that many bytes of data
// the record length is represented by an unsigned short value
{
if (stream.eof()) return -1;
int recaddr = stream . tellg ();
Clear ();
unsigned short bufferSize;
stream . read ((char *)&bufferSize, sizeof(bufferSize));
if (! stream . good ()){stream.clear(); return -1;}
BufferSize = bufferSize;
if (BufferSize > MaxBytes) return -1; // buffer overflow
stream . read (Buffer, BufferSize);
if (! stream . good ()){stream.clear(); return -1;}
return recaddr;
}
Write Member Function - VariableLengthBuffer
int VariableLengthBuffer :: Write (ostream & stream) const
// write the length and buffer into the stream
{
int recaddr = stream . tellp ();
unsigned short bufferSize;
bufferSize = BufferSize;
stream . write ((char *)&bufferSize, sizeof(bufferSize));
if (!stream) return -1;
stream . write (Buffer, BufferSize);
if (! stream . good ()) return -1;
return recaddr;
}
DRead, DWrite Member Functions
int IOBuffer::DRead (istream & stream, int recref)
// read specified record
{
stream . seekg (recref, ios::beg);
if (stream . tellg () != recref) return -1;
return Read (stream);
}
int IOBuffer::DWrite (ostream & stream, int recref) const
// write specified record
{
stream . seekp (recref, ios::beg);
if (stream . tellp () != recref) return -1;
return Write (stream);
}
Encapsulating Record I/O Ops in a Single Class(1)
Good design for making objects persistent
provide operations to read and write objects directly
Write operation until now :
two operations : pack into a buffer + write the buffer to a file
Class ‘RecordFile’
supports a write operation that takes an object of some class and writes it to a file, and a matching read operation.
the use of buffers is hidden inside the class
problem with defining class ‘RecordFile’:
– how to make it possible to support files for different object types without
needing different versions of the class
BufferFile Class Definition
class BufferFile // file with buffers
{ public:
BufferFile (IOBuffer &); // create with a buffer
int Open(char * fname, int MODE); // open an existing file
int Create (char * fname, int MODE); // create a new file
int Close();
int Rewind(); // reset to the first data record
// Input and Output operations
int Read(int recaddr = -1);
int Write(int recaddr = -1);
int Append(); // write the current buffer at the end of file
protected:
IOBuffer & Buffer; // reference to the file’s buffer
fstream File; // the C++ stream of the file
};
Usage: DelimFieldBuffer buffer;
BufferFile file(buffer);
file.Open(myfile, ios::in);
file.Read();
buffer.Unpack(myobject);
Encapsulating Record I/O Ops in a Single Class(2)
Class ‘RecordFile’
uses C++ template features to solve the problem
definition of the template class RecordFile
template <class RecType>
class RecordFile : public BufferFile
{
public:
int Read(RecType& record, int recaddr = -1);
int Write(const RecType& record, int recaddr = -1 );
RecordFile(IOBuffer& buffer) : BufferFile(buffer) { }
};
// template method bodies
// (the default value of recaddr is given once, in the class definition)
template <class RecType>
int RecordFile<RecType>::Read (RecType & record, int recaddr)
{ int readAddr, result;
readAddr = BufferFile::Read (recaddr);
if (readAddr == -1) return -1; // read failed
result = record . Unpack (Buffer);
if (!result) return -1;
return readAddr;
}
template <class RecType>
int RecordFile<RecType>::Write (const RecType & record, int recaddr)
{ int result;
result = record . Pack (Buffer);
if (!result) return -1;
return BufferFile::Write (recaddr);
}
File Access and File Organization
There is a difference between file access and file organization.
File Organization
Variable-length records
Fixed-length records
File Access
Sequential access
Direct access
Variable-length records
Sequential access is suitable
Fixed-length records
Direct access and sequential access are possible
Abstract Data Model
Data objects such as documents, images, and sound
e.g. color raster images, FITS image file
An abstract data model does not view data as it appears on a particular medium.
application-oriented view
Headers and Self-describing files
Metadata
Data that describe the primary data in a file
A place to store metadata in a file is the header record
Standard format
FITS (Flexible Image Transport System) by the International Astronomical Union (see Figure 5.7)
Mixing object types in a file
Each field is identified using "keyword = value"
Index table with tags
Object-oriented file access
Separate the translation to and from the physical format from the application (representation-independent file access)
Program find_star
:
read_image("star1", image)
process image
:
end find_star
[Figure labels: star1 and star2 on Disk; image in RAM]
Extensibility
Advantage of using tags
When tags identify the objects within a file, we do not have to know a priori what
all of the objects will look like
When we encounter a new type of object, we implement methods for reading and
writing that object and add them to the program.
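One way to picture this kind of extensibility is a table that maps each tag to the routine that reads that kind of object. The class ReaderRegistry below is a hypothetical sketch, not part of the chapter's class library; a "reader" here simply turns a payload string into a description.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical registry: maps an object tag to the routine that knows
// how to read that kind of object.
class ReaderRegistry
{
public:
    using Reader = std::function<std::string(const std::string& payload)>;

    // Add (or replace) the reader for a tag.
    void Register(const std::string& tag, Reader reader)
    {
        readers_[tag] = reader;
    }

    // Dispatch on the tag. Returns false for a tag with no registered
    // reader, so unknown objects can be skipped rather than misread.
    bool Read(const std::string& tag, const std::string& payload,
              std::string& result) const
    {
        auto it = readers_.find(tag);
        if (it == readers_.end()) return false;
        result = it->second(payload);
        return true;
    }

private:
    std::map<std::string, Reader> readers_;
};
```

Supporting a new object type then means registering one more reader; the code that walks the file and dispatches on tags does not change.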
Factors Affecting Portability
Differences among operating systems
Differences among languages
Differences in machine architecture
Differences among platforms
EBCDIC and ASCII
Achieving Portability (1)
Standardization
Standard physical record format
– extensible, simple
Standard binary encoding for data elements
– IEEE, XDR
File structure conversion
Number and text conversion
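A standard binary encoding fixes, among other things, the byte order of integers. XDR stores a 32-bit integer as four bytes with the most significant byte first; the sketch below encodes and decodes an unsigned value in that order without depending on the byte order of the host machine. The function names are hypothetical and this is not the XDR library interface.

```cpp
#include <cstdint>

// Store a 32-bit unsigned integer as four bytes, most significant first.
void EncodeUInt32(std::uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Rebuild the integer from four bytes written by EncodeUInt32.
std::uint32_t DecodeUInt32(const unsigned char in[4])
{
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

Because the bytes are produced by shifts rather than by copying the integer's memory image, the file contents are the same on big-endian and little-endian machines.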
Achieving Portability (2)
File system differences
Block size is typically 512 bytes on UNIX systems
Block size may differ on non-UNIX systems (e.g. 2880 bytes)
UNIX and Portability
UNIX supports portability by being commonly available on a large
number of platforms
UNIX provides a utility called dd
– dd : facilitates data conversion
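Portability
File sharing
Files should be accessible from different computers and from different programs
Portability and standardization
Factors affecting portability
Two companies share files
– Company A: Sun computers, programming in C; Company B: IBM PCs, programming in Turbo Pascal
Differences among operating systems
– The ultimate physical format of a file can vary with differences among operating systems
Differences among programming languages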
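Portability
Achieving portability
Agree on a standard physical record format and follow it
– Physical standard: one that is represented physically in the same way regardless of language, machine, or operating system
– e.g. FITS
Agree on a standard binary encoding for data elements
– Basic data elements: text, numbers
– e.g. IEEE standard formats and XDR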
Portability
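Conversion 1: direct conversion
[Figure: IBM, VAX, Cray, Sun 3, and IBM PC on one side, each converted directly to IBM, VAX, Cray, Sun 3, and IBM PC on the other side]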
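Conversion 2: conversion through an intermediate standard form
[Figure: IBM, VAX, Cray, Sun 3, and IBM PC each convert to XDR, and XDR converts to IBM, VAX, Cray, Sun 3, and IBM PC]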
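The difference between the two conversion schemes can be counted. With n machine formats, direct conversion needs a converter for every ordered pair of different formats, while an intermediate standard form needs only one converter into and one out of the standard per format. The function names below are hypothetical.

```cpp
// Converters needed when every format is translated directly into
// every other format: one per ordered pair of different formats.
int DirectConverters(int n)
{
    return n * (n - 1);
}

// Converters needed with an intermediate standard form such as XDR:
// one to the standard and one from the standard for each format.
int StandardFormConverters(int n)
{
    return 2 * n;
}
```

For the five machines shown, direct conversion needs 5 x 4 = 20 converters and the intermediate form needs 2 x 5 = 10. Adding a sixth machine adds 10 converters in the first scheme but only 2 in the second.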
Let’s Review !!!
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization