Chap 5. Managing Files of Records
Chapter Objectives

Extend the file structure concepts of Chapter 4:
• Search keys and canonical forms
• Sequential search and direct access
• File access and file organization

Examine other kinds of file structures in terms of:
• Abstract data models
• Metadata
• Object-oriented file access
• Extensibility

Examine issues of portability and standardization.
Contents
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization
Record Access
• Record key
• Canonical form: a single standard form of a key
– e.g. Ames, ames, and AMES must be converted to one form before comparison
• Distinct keys: each uniquely identifies a single record
• Primary keys, secondary keys, candidate keys
– Primary keys should be dataless (they never need updating)
– Primary keys should be unchanging
• Social Security number: a good primary key
– but 999-99-9999 is used for all non-registered aliens
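The conversion to a canonical form can be sketched as a small function. This is only an illustration: the slide does not say which canonical form is used, so upper case with surrounding blanks removed is an assumption, and canonicalKey is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: converts a key to an assumed canonical form
// (upper case, no leading or trailing blanks).
std::string canonicalKey(const std::string& key) {
    // Locate the first and last non-blank characters.
    const std::size_t first = key.find_first_not_of(" \t");
    if (first == std::string::npos) return "";   // empty or all blanks
    const std::size_t last = key.find_last_not_of(" \t");
    assert(last >= first);

    std::string result = key.substr(first, last - first + 1);
    for (char& c : result) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return result;
}
```

With this sketch, Ames, ames, and AMES all compare equal once each has been passed through canonicalKey, so a search only ever compares keys in one form.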
Sequential Search (1)
• O(n), where n is the number of records
• Use record blocking
– A block holds several records: fields < records < blocks
– Still O(n), but blocking decreases the number of seeks
– e.g. 4000 records, each 512 bytes long
Unblocked (sector-sized buffers): 512-byte buffer
=> average 2000 READ() calls
Blocked (16 records per block): 8 KB buffer
=> average 125 READ() calls
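The arithmetic behind these figures can be checked with a short sketch. It assumes every record is equally likely to be the target, so a successful search reads about half of the blocks; the function names are hypothetical.

```cpp
#include <cassert>

// Average number of READ() calls for a successful sequential search,
// assuming about half of the blocks are read before the record is found.
long averageReads(long recordCount, long recordsPerBlock) {
    assert(recordCount >= 0 && recordsPerBlock > 0);
    // Number of blocks, rounding up for a partially filled last block.
    const long blocks = (recordCount + recordsPerBlock - 1) / recordsPerBlock;
    return blocks / 2;
}

// Buffer size in bytes needed to hold one block.
long blockBytes(long recordLength, long recordsPerBlock) {
    assert(recordLength > 0 && recordsPerBlock > 0);
    return recordLength * recordsPerBlock;
}
```

For 4000 records, averageReads(4000, 1) gives 2000 and averageReads(4000, 16) gives 125, while blockBytes(512, 16) gives 8192 bytes, the 8 KB buffer. The search is still O(n) in comparisons; only the number of read calls drops.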
Sequential Search (2)
• UNIX tools for sequential processing
• cat, wc, grep
• When sequential search is useful
• Searching for patterns in ASCII files
• Processing files with few records
• Searching for records with a certain secondary key value
Direct Access

O(1) operation

RRN (Relative Record Number)
• Gives the position of the record relative to the beginning of the file

Byte offset = n × r
• n: RRN value, r: record size
• Applies to fixed-length records

Class IOBuffer includes
• direct read (DRead)
• direct write (DWrite)
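A minimal sketch of direct access by RRN, assuming fixed-length records and no header record in front of the data. byteOffset and readByRRN are hypothetical names, not part of the IOBuffer classes, whose DRead takes a byte address rather than an RRN.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// Byte offset of the record with the given RRN: n * r.
long byteOffset(long rrn, long recordSize) {
    assert(rrn >= 0 && recordSize > 0);
    return rrn * recordSize;
}

// Seeks straight to the record and reads it in one step.
// Returns false if a complete record could not be read.
bool readByRRN(std::istream& in, long rrn, long recordSize,
               std::string& record) {
    in.clear();   // drop any end-of-file state left by an earlier read
    in.seekg(byteOffset(rrn, recordSize), std::ios::beg);
    record.assign(static_cast<std::size_t>(recordSize), '\0');
    in.read(&record[0], recordSize);
    return in.gcount() == recordSize;
}
```

The cost of readByRRN does not depend on how many records precede the target, which is what makes the operation O(1). If the file begins with a header record, the header size would have to be added to the offset.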
Choosing a record length and structure
• Record length is related to the size of the fields
• Trade-offs: access vs. fragmentation vs. implementation
• Fixed-length records
(a) With fixed-length fields
(b) With variable-length fields
Unused space in each record is filled with null characters (in C)
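(a) Fixed-length records with fixed-length fields:
Ames      John  123 Maple    Stillwater  OK74075
Mason     Alan  90 Eastgate  Ada         OK74820
(b) Fixed-length records with variable-length fields:
Ames|John|123 Maple|Stillwater|OK|74075|   (unused space)
Mason|Alan|90 Eastgate|Ada|OK|74820|       (unused space)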
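Layout (b) can be sketched as follows: the fields are written with a delimiter and the rest of the fixed-length record is filled with null characters. packFixedRecord and its recordLength parameter are hypothetical; the '|' delimiter follows the figure.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Packs delimited fields into one fixed-length record. Space not used by
// the fields is filled with '\0'. Returns an empty string if the fields
// (with their delimiters) do not fit into recordLength bytes.
std::string packFixedRecord(const std::vector<std::string>& fields,
                            std::size_t recordLength) {
    assert(recordLength > 0);
    std::string record;
    for (const std::string& field : fields) {
        record += field;
        record += '|';   // delimiter after every field, as in the figure
    }
    if (record.size() > recordLength) return std::string();
    record.resize(recordLength, '\0');   // pad the unused portion with nulls
    return record;
}
```

Every record produced this way has the same length, so direct access by RRN still works, while individual fields may vary in length as long as the total fits.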
Header Records
• General information about the file
• e.g. date and time of the most recent update, count of the number of records
• The header record is often placed at the beginning of the file
• Header records are a widely used, important file design tool
IO Buffer Class Definition (1)
class IOBuffer
// Abstract base class for file buffers
{
public:
    virtual int Read (istream &) = 0;          // read a buffer from the stream
    virtual int Write (ostream &) const = 0;   // write a buffer to the stream
    // these are the direct access read and write operations
    virtual int DRead (istream &, int recref);          // read specified record
    virtual int DWrite (ostream &, int recref) const;   // write specified record
    // these header operations return the size of the header
    virtual int ReadHeader (istream &);
    virtual int WriteHeader (ostream &) const;
protected:
    int Initialized;   // TRUE if buffer is initialized
    char *Buffer;      // character array to hold field values
};
IO Buffer Class Definition (2)
• The full definition of the buffer class hierarchy
• Header write method: adds a header to a file and returns the number of bytes in the header
• Header read method: reads the header and checks it for consistency
• WriteHeader method of IOBuffer: writes the string IOBuffer at the beginning of the file
• ReadHeader method: reads the record size from the header and checks that its value is the same as that of the BufferSize member of the buffer object
• DWrite/DRead methods: operate using the byte address of the record as the record reference. The DRead method begins by seeking to the requested spot.
ReadHeader, WriteHeader Member Functions
#include <cstring>   // strncmp
static const char headerStr[] = "IOBuffer";
static const int headerSize = sizeof(headerStr) - 1;   // constant length, without the '\0'
int IOBuffer::ReadHeader (istream & stream)
{
    char str[headerSize+1];
    stream . seekg (0, ios::beg);
    stream . read (str, headerSize);
    if (! stream . good ()) return -1;
    // the file must start with the expected header string
    if (strncmp (str, headerStr, headerSize)==0) return headerSize;
    else return -1;
}
int IOBuffer::WriteHeader (ostream & stream) const
{
    stream . seekp (0, ios::beg);
    stream . write (headerStr, headerSize);
    if (! stream . good ()) return -1;
    return headerSize;
}
WriteHeader Member Function – VariableLengthBuffer
int VariableLengthBuffer :: WriteHeader (ostream & stream) const
// write a buffer header to the beginning of the stream
// A header consists of the
//    IOBUFFER header
//    header string
//    Variable sized record of length fields
//    that describes the file records
{
    int result;
    // write the parent (IOBuffer) header; it returns -1 on failure
    result = IOBuffer::WriteHeader (stream);
    if (result == -1) return FALSE;
    // write the header string (headerStr here is this class's own string)
    stream . write (headerStr, headerSize);
    if (!stream . good ()) return FALSE;
    // write the record description
    return stream . tellp();
}
ReadHeader Member Function – VariableLengthBuffer
int VariableLengthBuffer :: ReadHeader (istream & stream)
// read the header and check for consistency
{
    char str[headerSize+1];
    int result;
    // read the IOBuffer header; it returns -1 on failure
    result = IOBuffer::ReadHeader (stream);
    if (result == -1) return FALSE;
    // read the header string
    stream . read (str, headerSize);
    if (!stream.good()) return FALSE;
    if (strncmp (str, headerStr, headerSize) != 0) return FALSE;
    // read and check the record description
    return stream . tellg ();
}
Read Member Function – VariableLengthBuffer
int VariableLengthBuffer :: Read (istream & stream)
// read the next record: a length prefix followed by the record contents
// the record length is represented by an unsigned short value
{
    if (stream.eof()) return -1;
    int recaddr = stream . tellg ();   // address of the record being read
    Clear ();
    unsigned short bufferSize;
    stream . read ((char *)&bufferSize, sizeof(bufferSize));
    if (! stream . good ()){stream.clear(); return -1;}
    BufferSize = bufferSize;
    if (BufferSize > MaxBytes) return -1; // buffer overflow
    stream . read (Buffer, BufferSize);
    if (! stream . good ()){stream.clear(); return -1;}
    return recaddr;
}
Write Member Function – VariableLengthBuffer
int VariableLengthBuffer :: Write (ostream & stream) const
// write the length and buffer into the stream
{
    int recaddr = stream . tellp ();
    unsigned short bufferSize;
    bufferSize = BufferSize;
    stream . write ((char *)&bufferSize, sizeof(bufferSize));
    if (!stream) return -1;
    stream . write (Buffer, BufferSize);
    if (! stream . good ()) return -1;
    return recaddr;
}
DRead, DWrite Member Functions
int IOBuffer::DRead (istream & stream, int recref)
// read specified record
{
    stream . seekg (recref, ios::beg);
    if (stream . tellg () != recref) return -1;
    return Read (stream);
}
int IOBuffer::DWrite (ostream & stream, int recref) const
// write specified record
{
    stream . seekp (recref, ios::beg);
    if (stream . tellp () != recref) return -1;
    return Write (stream);
}
Encapsulating Record I/O Ops in a Single Class (1)

Good design for making objects persistent
• provide operations to read and write objects directly

Write operation until now:
• two operations: pack the object into a buffer + write the buffer to a file

Class ‘RecordFile’
• supports a write operation that takes an object of some class and writes it to a file
• the use of buffers is hidden inside the class
• problem with defining class ‘RecordFile’:
– how to support files for different object types without needing different versions of the class
BufferFile Class Definition
class BufferFile // file with buffers
{
public:
    BufferFile (IOBuffer &);               // create with a buffer
    int Open (char * fname, int MODE);     // open an existing file
    int Create (char * fname, int MODE);   // create a new file
    int Close ();
    int Rewind ();                         // reset to the first data record
    // Input and Output operations
    int Read (int recaddr = -1);
    int Write (int recaddr = -1);
    int Append ();                         // write the current buffer at the end of file
protected:
    IOBuffer & Buffer;                     // reference to the file's buffer
    fstream File;                          // the C++ stream of the file
};
Usage:
    DelimFieldBuffer buffer;
    BufferFile file(buffer);
    file.Open(myfile, ios::in);
    file.Read();
    myobject.Unpack(buffer);
Encapsulating Record I/O Operations in a Single Class (2)

Class ‘RecordFile’
• uses C++ template features to solve the problem
• definition of the template class RecordFile
template <class RecType>
class RecordFile : public BufferFile
{
public:
    int Read (RecType & record, int recaddr = -1);
    int Write (const RecType & record, int recaddr = -1);
    RecordFile (IOBuffer & buffer) : BufferFile (buffer) { }
};
// template method bodies (default arguments appear only in the declarations)
template <class RecType>
int RecordFile<RecType>::Read (RecType & record, int recaddr)
{
    int readAddr, result;
    readAddr = BufferFile::Read (recaddr);
    if (readAddr == -1) return -1;     // read failed
    result = record . Unpack (Buffer);
    if (!result) return -1;            // unpack failed
    return readAddr;
}
template <class RecType>
int RecordFile<RecType>::Write (const RecType & record, int recaddr)
{
    int result;
    result = record . Pack (Buffer);
    if (!result) return -1;            // pack failed
    return BufferFile::Write (recaddr);
}
File Access and File Organization
There is a difference between file access and file organization.
File organization
• Variable-length records
• Fixed-length records
File access
• Sequential access
• Direct access
Variable-length records
• Sequential access is suitable
Fixed-length records
• Both direct access and sequential access are possible
Abstract Data Model

Data objects such as documents, images, and sound
• e.g. color raster images, FITS image files

An abstract data model does not view data as it appears on a particular medium.
• application-oriented view

Headers and self-describing files
Metadata

Data that describe the primary data in a file

A natural place to store metadata in a file is the header record

Standard format
• FITS (Flexible Image Transport System), endorsed by the International Astronomical Union (see Figure 5.7)
Mixing Object Types in a File

Each field is identified using “keyword = value”

Index table with tags
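Reading a field written as "keyword = value" can be sketched as below. This is a generic illustration, not a parser for FITS or any other specific format; parseField and trim are hypothetical names.

```cpp
#include <string>

// Strips leading and trailing blanks.
static std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits "keyword = value" at the first '='.
// Returns false if there is no '=' or the keyword is empty.
bool parseField(const std::string& line,
                std::string& keyword, std::string& value) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos) return false;
    keyword = trim(line.substr(0, eq));
    value = trim(line.substr(eq + 1));
    return !keyword.empty();
}
```

Because each field carries its own keyword, fields can appear in any order and a reader can skip keywords it does not recognize, which is what makes such a file self-describing.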
Object-oriented file access

Separate translation to and from the physical format from the application (representation-independent file access)
Program find_star
:
read_image(“star1”, image)
process image
:
end find_star
[Figure: the program holds image in RAM; the files star1 and star2 reside on disk]
Extensibility

Advantage of using tags to identify objects within files
• We do not have to know a priori what all of the objects will look like

When we encounter a new type of object, we implement methods for reading and
writing that object and add them to the program.
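One way to picture this is a table that maps each tag to the routine that reads objects of that type. The sketch below is an assumption about how such a table might look, not the design used in the course code; ReaderRegistry is a hypothetical class, and objects are represented as plain strings to keep it short.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: tag -> routine that reads objects with that tag.
// Supporting a new object type means registering one more reader;
// the code that walks the file stays unchanged.
class ReaderRegistry {
public:
    using Reader = std::function<std::string(const std::string& rawBytes)>;

    void add(const std::string& tag, Reader reader) {
        assert(reader != nullptr);
        readers_[tag] = std::move(reader);
    }

    // Returns true and fills `out` if a reader is registered for the tag.
    bool read(const std::string& tag, const std::string& rawBytes,
              std::string& out) const {
        const auto it = readers_.find(tag);
        if (it == readers_.end()) return false;   // unknown object type
        out = it->second(rawBytes);
        return true;
    }

private:
    std::map<std::string, Reader> readers_;
};
```

A program built this way can skip objects whose tags it does not know and still process the rest of the file.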
Factors Affecting Portability

Differences among operating systems

Differences among languages

Differences in machine architectures

Differences among platforms
• e.g. EBCDIC and ASCII character encodings
Achieving Portability (1)
• Standardization
• Standard physical record format
– extensible, simple
• Standard binary encoding for data elements
– IEEE, XDR
• File structure conversion
• Number and text conversion
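A standard binary encoding fixes the byte order in the file regardless of the machine that writes it. XDR stores a 32-bit integer as four bytes, most significant byte first; the sketch below writes and reads that layout with shifts, so it behaves the same on big-endian and little-endian hosts. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes a 32-bit unsigned integer as 4 bytes, most significant first.
std::string encodeBigEndian32(std::uint32_t value) {
    std::string bytes(4, '\0');
    for (int i = 0; i < 4; ++i) {
        bytes[i] = static_cast<char>((value >> (24 - 8 * i)) & 0xFF);
    }
    return bytes;
}

// Decodes the first 4 bytes of `bytes` as a big-endian 32-bit integer.
std::uint32_t decodeBigEndian32(const std::string& bytes) {
    assert(bytes.size() >= 4);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value = (value << 8) | static_cast<unsigned char>(bytes[i]);
    }
    return value;
}
```

Writing the raw bytes of an int instead, as the buffer classes do for the record length, ties the file to the byte order of the machine that produced it.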
Achieving Portability (2)

File system differences
• Block size is 512 bytes on UNIX systems
• Block size may differ on non-UNIX systems (e.g. 2880 bytes, the FITS block size)

UNIX and portability
• UNIX supports portability by being commonly available on a large number of platforms
• UNIX provides a utility called dd
– dd: facilitates data conversion
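Portability

File sharing
• Files can be accessed on different computers and by different programs
• Portability and standardization

Factors affecting portability
• Two companies share files
– Company A: Sun computers, programming in C; Company B: IBM PCs, programming in Turbo Pascal
• Differences among operating systems
– The ultimate physical format of a file can vary with differences among operating systems
• Differences among programming languages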
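Portability
• Achieving portability
• Agree on a standard physical record format and follow it
– Physical standard: one that is represented physically the same way regardless of language, machine, or operating system
– e.g. FITS
• Agree on a standard binary encoding for data elements
– Basic data elements: text, numbers
– e.g. the IEEE standard formats and XDR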
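Portability
• Conversion 1: direct conversion
[Figure: each of IBM, VAX, Cray, Sun 3, and IBM PC is connected directly to each of IBM, VAX, Cray, Sun 3, and IBM PC]
• Conversion 2: intermediate standard form
[Figure: IBM, VAX, Cray, Sun 3, and IBM PC each convert to XDR, and XDR converts to each of IBM, VAX, Cray, Sun 3, and IBM PC]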
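The difference between the two conversion schemes shows up when counting the converters that must be written. The sketch below assumes one converter per direction: direct conversion needs one for every ordered pair of distinct formats, while an intermediate form needs one into and one out of the standard for each format. The function names are hypothetical.

```cpp
#include <cassert>

// Direct conversion: one converter per ordered pair of distinct formats.
long directConverters(long formats) {
    assert(formats >= 0);
    return formats * (formats - 1);
}

// Intermediate standard form: one converter to the standard and one
// from the standard for each format.
long viaStandardConverters(long formats) {
    assert(formats >= 0);
    return 2 * formats;
}
```

For the five machines in the figures this gives 20 converters for direct conversion and 10 with XDR as the intermediate form, and the gap grows quadratically as more formats are added.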
Let’s Review !!!
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization