Manajemen Basis Data (Database Management), Session 3, Course: M0264/Manajemen Basis Data


Course: M0264/Manajemen Basis Data
Year: 2008
Manajemen Basis Data (Database Management)
Session 3
Objectives
• Data on External Storage
• File Organization
• Indexing
Bina Nusantara
Data on External Storage
• Disks
The most important external storage devices.
• Tapes
Sequential access devices that force us to read data
one page after the other.
File Organization
• Many alternatives exist, each ideal for some situations
and not so good in others:
– Heap files: Suitable when typical access is a file scan retrieving
all records.
– Sorted files: Best if records must be retrieved in some order, or
only a "range" of records is needed.
– Hashed files: Good for equality selections.
• File is a collection of buckets. Bucket = primary page plus
zero or more overflow pages.
• Hashing function h: h(r) = bucket in which record r belongs.
h looks at only some of the fields of r, called the search
fields.
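The bucket structure described above can be sketched in a few lines. This is a minimal illustration, not how any particular DBMS lays out a hashed file: the names `HashedFile`, `Bucket`, and `PAGE_CAPACITY`, the modulo hash function, and the assumption of integer search field values are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

PAGE_CAPACITY = 2  # records per page; hypothetical and tiny, for illustration


@dataclass
class Bucket:
    # pages[0] is the primary page; any further pages are overflow pages
    pages: list = field(default_factory=lambda: [[]])

    def insert(self, record):
        if len(self.pages[-1]) >= PAGE_CAPACITY:
            self.pages.append([])  # last page is full: chain an overflow page
        self.pages[-1].append(record)


class HashedFile:
    def __init__(self, num_buckets, search_field):
        self.buckets = [Bucket() for _ in range(num_buckets)]
        self.search_field = search_field

    def h(self, key):
        # h looks only at the search field value (assumed to be an integer)
        return key % len(self.buckets)

    def insert(self, record):
        self.buckets[self.h(record[self.search_field])].insert(record)

    def lookup(self, key):
        # Equality selection: only the pages of one bucket are examined
        bucket = self.buckets[self.h(key)]
        return [r for page in bucket.pages for r in page
                if r[self.search_field] == key]


f = HashedFile(num_buckets=4, search_field="id")
for i in [1, 5, 9, 13, 2]:
    f.insert({"id": i, "name": f"r{i}"})

print(f.lookup(9))
print(len(f.buckets[1].pages))  # primary page plus overflow pages in bucket 1
```

Because ids 1, 5, 9, and 13 all hash to the same bucket, that bucket grows an overflow page, while an equality lookup still touches only that one bucket.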
File Organization
• Cost Model for Our Analysis. We ignore CPU costs, for
simplicity:
– B: The number of data pages
– R: Number of records per page
– D: (Average) time to read or write disk page
– Measuring number of page I/O’s ignores gains of pre-fetching
blocks of pages; thus, even I/O cost is only approximated.
– Average-case analysis; based on several simplistic assumptions.
File Organization
• Single record insert and delete.
• Heap Files:
– Equality selection on key; exactly one match.
– Insert always at end of file.
• Sorted Files:
– Files compacted after deletions.
– Selections on sort field(s).
• Hashed Files:
– No overflow buckets, 80% page occupancy.
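To make the cost model concrete, the sketch below estimates scan and equality-search costs from B and D under the assumptions listed above. The formulas are not given on the slides; they are the usual textbook estimates and should be read as assumptions. A heap file is half-scanned on average when exactly one match exists, a sorted file is binary-searched, and a hashed file at 80% occupancy spreads its data over 1.25B pages. The function names and the sample values of B and D are hypothetical. R does not appear because CPU costs are ignored.

```python
import math


def heap_costs(B, D):
    # Scan reads every page; equality on a key stops, on average, halfway
    return {"scan": B * D, "equality": 0.5 * B * D}


def sorted_costs(B, D):
    # Equality on the sort field: binary search over B pages
    return {"scan": B * D, "equality": D * math.log2(B)}


def hashed_costs(B, D):
    # 80% occupancy means 1.25 * B pages; no overflow means one page per lookup
    return {"scan": 1.25 * B * D, "equality": D}


B = 1024   # number of data pages (sample value)
D = 0.015  # seconds per page I/O (sample value)

for name, fn in [("heap", heap_costs), ("sorted", sorted_costs), ("hashed", hashed_costs)]:
    print(name, fn(B, D))
```

Comparing the three rows shows the trade-off the slides describe: hashing wins on equality selections but pays extra on a full scan.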
Indexing
• There are three main alternatives for what to store as a
data entry in an index:
– A data entry k* is an actual data record (with search key value k).
– A data entry is a (k, rid) pair, where rid is the record id of a data
record with search key value k.
– A data entry is a (k, rid-list) pair, where rid-list is a list of record
ids of data records with search key value k.
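The three alternatives can be compared side by side on a toy data file. This is only an illustrative sketch: the sample records, the choice of `age` as the search key, and the representation of a rid as a (page, slot) tuple are assumptions made for the example.

```python
# A tiny "data file": rid -> record. Rids are (page, slot) tuples here.
records = {
    (0, 0): {"name": "Ana", "age": 22},
    (0, 1): {"name": "Budi", "age": 25},
    (1, 0): {"name": "Citra", "age": 22},
}

# Alternative 1: each data entry k* is the data record itself,
# so the index and the file are the same structure (ordered by age here).
alt1 = sorted(records.values(), key=lambda r: r["age"])

# Alternative 2: one (k, rid) pair per data record.
alt2 = sorted((rec["age"], rid) for rid, rec in records.items())

# Alternative 3: one (k, rid-list) pair per distinct search key value k.
alt3 = {}
for k, rid in alt2:
    alt3.setdefault(k, []).append(rid)

print(alt2)
print(alt3)
```

With duplicate key values, Alternative 3 stores each k once, whereas Alternative 2 repeats it for every matching record.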
Indexing
• An index on a file speeds up selections on the search
key fields for the index.
– Any subset of the fields of a relation can be the search key for an
index on the relation.
– Search key is not the same as key (minimal set of fields that
uniquely identify a record in a relation).
• An index contains a collection of data entries, and
supports efficient retrieval of all data entries k* with a
given key value k.
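The difference between a search key and a key can be seen in a small in-memory sketch. The `build_index` helper, the `employees` rows, and the use of list positions as rids are hypothetical; a real index would live on disk pages rather than in a dictionary.

```python
from collections import defaultdict


def build_index(rows, search_key):
    """Map each search key value k to the rids of rows having that value."""
    index = defaultdict(list)
    for rid, row in enumerate(rows):  # rid = position in the list (assumption)
        k = tuple(row[f] for f in search_key)
        index[k].append(rid)
    return index


employees = [
    {"eid": 1, "dept": "IT", "age": 30},
    {"eid": 2, "dept": "HR", "age": 30},
    {"eid": 3, "dept": "IT", "age": 30},
]

# Search key <dept, age>: any subset of fields works, and it is not a key,
# so one value of k may match several records.
by_dept_age = build_index(employees, ("dept", "age"))

# Search key <eid>: happens to be a key, so each k matches exactly one record.
by_eid = build_index(employees, ("eid",))

print(by_dept_age[("IT", 30)])
print(by_eid[(1,)])
```

Both indexes answer equality selections on their search key without scanning every row at query time; only the second guarantees a single match.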
Indexing
Clustered vs. Unclustered Index