w11_1_INF280_Record_Storage.ppt

Download Report

Transcript w11_1_INF280_Record_Storage.ppt

Record Storage
Index-Files Structures
1.
2.
3.
4.
External Memory
Files by structure and by access
Index files: B+ trees
Performance measures:
•
•
D. Christozov
storage size
access time
INF 280 Database Systems
Record Storage/Index Files
1
External Memory
External Memory:
Random Access Memory – Disks (pseudo)
Sequential Access Memory – Tapes/Disks
Memory size:
measured in blocks
Physical addresses: cylinders, tracks, blocks
(in multi-volume DB also volume)
Time:
D. Christozov
measured in I/O operation
(a whole block is transfer in one I/O operation)
INF 280 Database Systems
Record Storage/Index Files
2
External Memory: disk mechanics
disk volume
cylinder
track
plate
D. Christozov
RW heads
INF 280 Database Systems
Record Storage/Index Files
blocks
or
sectors
3
External Memory: disk elements forming
physical address
Volume:
pack of disks, mounted as a separate device
in the computer system
Cylinder:
set of tracks on all surfaces of the plates, accessible
simultaneously in one loop of the pack by all
Read/Write Heads
Track:
storage, located on a single surface of a plate,
accessible in one loop of the pack by a single
Read/Write Head
Block/Sector: amount of data, transferred from the disk to the
primary memory in a single I/O operation (may
vary depending on the track – more external tracks
may hold more data compared with the more
internal tracks)
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
4
Field/Record/File
Field:
single, atomic, data object taken from given domain
Special fields: Primary key, Unique, Not Null, etc.
Record:
Composition of different fields.
Schema/Type is represented by the sequence of
fields, with their names, domains and other
properties;
State refer to a particular values of fields in the
record.
File:
Collection of different records of given type/schema
Assumption: Records in the file are ordered
according to the primary key.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
5
Files by Structure
Files with fixed-length records
SSN
5
6
0
3
Name
1
2
D i
m i
t
a
Age
r
C
5
Salary
1
2
0
0
0
.
0
.
0
0
Files with variable-length records
SSN
5
6
0
3
Name
1
2
D i
m i
t
a
Salary
r
C h
2
0
0
0
Fields’ separator
Records’ terminator
Files, allowing random order of fields in records
SSN=5603126307
D. Christozov
Name=Dimitar Christozov
Age=51
INF 280 Database Systems
Record Storage/Index Files
Salary=2000.00
6
Files by access: Unspanned vs. Spaned
Unspanned
Block i
Block i+1
Record 1
Record 2
Record 4
Record 5
Record 3
Record 6
idle
idle
Spanned
Block i
Record 1
Block i+1
Record 4 (rest)
D. Christozov
Record 2
Record 5
INF 280 Database Systems
Record Storage/Index Files
Record 3
Record 6
Record 4 P
Record 7
P
7
Files by access: direct access, sequential
Sequential access files:
to access a record all previous
records must be accessed.
Magnetic
Tape
Direct accessed files: a record is accessed immediately by
its address.
Chip (flash) memory allows literally direct access.
Disks allows pseudo-direct access, but with reasonable
small time to locate and access a data.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
8
Master file vs. Index files
Master files: files holding useful data
Index files: files used to facilitate the access to the data in
the master file.
An Index-File record usually has much smaller size than the
Master-File record. The number of blocks need to be accessed (I/O
operations) to locate searched information is smaller.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
9
Index files: primary key
Master file
Index file
Block 0
KR1
KR4
KR7
Block 1
KR10
KR13
KR17
…
…
Block i
KR3i+1
KR3i+4
KR3i+7
KR3k+1 KR3k+4
R2
R3
Block 1
R4
R5
R6
Block 2
R7
R8
R9
Block 3
R10
R11
R12
Block 4
…
…
…
R3i+1
R3i+2
…
…
…
Block i+1
…
…
…
…
…
…
…
…
R3n+1
R3n+2
R3i
…
Block k
R1
KRk+7
Master file is ordered by primary key.
R3n
…
Block i
Block n
Every block of the index file contains values
of the primary key of the first record in a Number of entries in the index =
block and physical address of the block. number of blocks in the master file.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
10
Index files: secondary key (unique field)
Master file
Index file
Block 0
KR1
KR4
KR7
Block 1
KR10
KR13
KR17
…
…
Block i
KR3i+1
KR3i+4
KR3i+7
KR3k+1 KR3k+4
R2
R3
Block 1
R4
R5
R6
Block 2
R7
R8
R9
Block 3
R10
R11
R12
Block 4
…
…
…
R3i+1
R3i+2
…
…
…
Block i+1
…
…
…
…
…
…
…
…
R3n+1
R3n+2
R3i
…
Block k
R1
KRk+7
Master file is NOT ordered by secondary key.
R3n
…
Block i
Block n
Every block of the index file contains values of
the secondary key and physical address of the Number of entries in the index =
number of records in the master file.
block where the associated record resides.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
11
Index files: cluster (duplicate field)
Master file
Index file: categories
Block 0
KR1
KR4
KR7
Block 1
KR10
KR13
KR17
…
…
Block i
KR3i+1
KR3i+4
KR3i+7
…
Block k
KR3k+1 KR3k+4
Clusters
R2
R3
Block 1
R4
R5
R6
Block 2
R7
R8
R9
Block 3
Block 4
A1
A2
A3
R10
R11
R12
A4
A5
A6
…
…
…
A7
A8
A9
R3i+1
R3i+2
A10
A11
A12
…
…
…
Block i+1
…
…
…
…
…
…
…
…
R3n+1
R3n+2
KRk+7
Index file holds different values (categories)
of given domain and address of a block cluster, where addresses of the records in
master files are stored.
D. Christozov
R1
INF 280 Database Systems
Record Storage/Index Files
R3i
R3n
…
Block i
Block n
12
Index files: B+ tree – structure of a B+ tree node
Ptr1
key1
Ptr2
key2
...
PtrM-1
keyM-1
PtrM
Key1 < Key 2 < … < Key M-1
M – order of
B+ tree
Most-left sub-tree
keys < Key 1
Most-right sub-tree
keys > Key M-1
Key 1 < keys < Key 2
B+-tree
Ptr1
key1
B+-tree
Ptr2
Data,
associated
with Key 1
D. Christozov
key2
...
B+-tree
B+-tree
Next
Data,
associated
with Key 2
Ptri
keyi
Ptrj
keyj
...
Next
Data,
associated
with Key i
INF 280 Database Systems
Record Storage/Index Files
13
Index files: B+ tree – structure of the tree
Internal node
Leaf level
B+ tree combines B-tree structure for internal nodes and linked-list for leaf-level.
Keys, used in internal nodes are duplicated in the leaves and only leaf-nodes holds
links to master file. In this way, the order of B+ tree is higher than the order of
similar B-tree. Link-list on leaf-level allows better performance in ‘between’ queries.
Data,
associated
with Key 1
D. Christozov
Data,
associated
with Key 2
Data,
associated
with Key 3
INF 280 Database Systems
Record Storage/Index Files
14
Performance measures: storage size
Storage is measured in blocks.
For a given file, required storage includes:
- Number of blocks for the master file.
Consider:
unspanned or spanned;
fixed length or variable length records
- Number of blocks for index files
– for every field you can construct index file
– the index for primary key, usually is created by default
– index files are organized as B+ trees
- This is just a preliminary assessment. It is not precise! The size
of the actual storage used depends also on distribution of the data
stored.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
15
Performance measures: storage size
Let a file is described by:
Size of a record – Sr;
Number of records – Nr;
Number of indexed fields – Ni and size of these fields {Si,i=1,Ni}
And disk by:
size of the block Sb and size of the physical address Spa in bytes.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
16
Performance measures: storage size – fixed length, unspanned
The storage is assessed by the following steps:
1.
2.
3.
4.
5.
Calculate number of records in a block – Nrb
Number of blocks of the master file: Nb = Nr / Nrb;
Number of blocks for the index file for primary key – the number of entries
in the index file is equal to the number of blocks in the master file;
The order M of B+ tree is calculated according to Sb, Spa, Si:
M*Spa + (M-1)*Si <= Sb, assume also 50% full nodes;
Calculate the number of blocks in the leaf-level: Nbll = Nb / (0.5*M);
Assess the number of levels L in the B+ tree by counting how many
times Nbll may be divided by (0.5*M) and assess the number of blocks
needed for internal nodes;
Number of blocks for the index file for a secondary key (unique field) – the
number of entries in the index file is equal to the number of records in the
master file;
Number of blocks for the index file in case of cluster-index – the number of
entries in the index file is equal to the number of different categories of
values in the domain of the field. Additionally, assessment is required for the
clusters – blocks holding physical addresses associated with particular value;
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
17
Performance measures: storage size – fixed length, spanned
The storage is assessed by the following steps:
1.
2.
3.
4.
Number of blocks of the master file: Nb = Nr*Sr / (Sb + Spa);
Number of blocks for the index file for primary key – the number of entries
in the index file is equal to the number of blocks in the master file;
The order M of B+ tree is calculated according to Sb, Spa, Si:
M*Spa + (M-1)*Si <= Sb, assume also 50% full nodes;
Calculate the number of blocks in the leaf-level: Nbll = Nb / (0.5*M);
Assess the number of levels L in the B+ tree by counting how many
times Nbll may be divided by (0.5*M) and assess the number of blocks
needed for internal nodes;
Number of blocks for the index file for a secondary key (unique field) – the
number of entries in the index file is equal to the number of records in the
master file;
Number of blocks for the index file in case of cluster-index – the number of
entries in the index file is equal to the number of different categories of
values in the domain of the field. Additionally, assessment is required for the
clusters – blocks holding physical addresses associated with particular value;
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
18
Performance measures: storage size – variable length, unspanned
The storage is assessed by the following steps:
1.
2.
3.
4.
5.
6.
Evaluate the average size of a record – Sra. Include the delimiters for fields
and for records;
Calculate the expected number of records in a block: Nrb=Sb/Sra;
Number of blocks of the master file: Nb = Nr / Nrb;
Number of blocks for the index file for primary key – the number of entries
in the index file is equal to the number of blocks in the master file;
The order M of B+ tree is calculated according to Sb, Spa, Si:
M*Spa + (M-1)*Si <= Sb, assume also 50% full nodes;
Calculate the number of blocks in the leaf-level: Nbll = Nb / (0.5*M);
Assess the number of levels L in the B+ tree by counting how many
times Nbll may be divided by (0.5*M) and assess the number of blocks
needed for internal nodes;
Number of blocks for the index file for a secondary key (unique field) – the
number of entries in the index file is equal to the number of records in the
master file;
Number of blocks for the index file in case of cluster-index – the number of
entries in the index file is equal to the number of different categories of
values in the domain of the field. Additionally, assessment is required for the
clusters – blocks holding physical addresses associated with particular value;
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
19
Performance measures: storage size – variable length, spanned
The storage is assessed by the following steps:
1.
2.
3.
4.
5.
6.
Evaluate the average size of a record – Sra. Include the delimiters for fields and
for records;
Number of blocks of the master file: Nb = Nr*Sra / (Sb + Spa);
Number of blocks of the master file: Nb = Nr / Nrb;
Number of blocks for the index file for primary key – the number of entries in the
index file is equal to the number of blocks in the master file;
The order M of B+ tree is calculated according to Sb, Spa, Si:
M*Spa + (M-1)*Si <= Sb, assume also 50% full nodes;
Calculate the number of blocks in the leaf-level: Nbll = Nb / (0.5*M);
Assess the number of levels L in the B+ tree by counting how many times
Nbll may be divided by (0.5*M) and assess the number of blocks needed for
internal nodes;
Number of blocks for the index file for a secondary key (unique field) – the
number of entries in the index file is equal to the number of records in the master
file;
Number of blocks for the index file in case of cluster-index – the number of
entries in the index file is equal to the number of different categories of values in
the domain of the field. Additionally, assessment is required for the clusters –
blocks holding physical addresses associated with particular value;
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
20
Performance measures: access/update time
Time is measured with the number of I/O operations
The actual access/update time depends also on hardware.
Evaluation of the following four operations are required:
1.
2.
3.
4.
Search
Insert
Delete
Modify
Two update strategies:
1. Immediate update of master file
2. Collect a batch of records to be updated and perform
update once for the whole batch.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
21
Performance measures: access/update time – only in master file
There are two cases: search by primary key (the file is ordered)
and search by a secondary key (not ordered). For update
operations only primary key case is interesting:
1. Search:
•
•
Primary key: O(log2Nb)
Secondary key: O(Nb)
by applying binary search
linear search;
2. Insert:
Search the place where to insert the new record – O(log2Nb). All records that
follows it has to be moved by one position back – O(Nb) (two I/O operations
for every block that follows the block where the new record has to be placed);
3. Delete
Search the record to be deleted – O(log2Nb). All records that follows it has to
be moved by one position foreword – O(Nb) (two I/O operations for every
block that follows the block where the new record has to be placed);
4. Modify
Search the record to be modified – O(log2Nb) + 2; additionally 2 I/O
operations – to read and write the block with modified record.
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
22
Performance measures: access/update time – using index
Update the index file + update the master file
•
Search:
L + 1 (L – search in B+ tree + 1 I/O – to read the master file)
•
Insert:
O(log2L) + 2 (Insert the key in the B+ tree and insert the record in the last
block of master file)
•
Delete
O(log2L) + 2 (Delete the key from the B+ tree and delete the record from the
master file – this will leave the block in the master file not full)
•
Modify
L + 2 (Search in the B+ tree and read/write the block holding the record of the
master file)
D. Christozov
INF 280 Database Systems
Record Storage/Index Files
23