Transcript: Sections 13.1 – 13.3

1
SECONDARY STORAGE
MANAGEMENT
SECTIONS 13.1 – 13.3
Sanuja Dabade & Eilbroun Benjamin
CS 257 – Dr. TY Lin
Presentation Outline
2
 13.1 The Memory Hierarchy
  13.1.1 The Memory Hierarchy
  13.1.2 Transfer of Data Between Levels
  13.1.3 Volatile and Nonvolatile Storage
  13.1.4 Virtual Memory
 13.2 Disks
  13.2.1 Mechanics of Disks
  13.2.2 The Disk Controller
  13.2.3 Disk Access Characteristics
Presentation Outline (con’t)
3
 13.3 Accelerating Access to Secondary Storage
  13.3.1 The I/O Model of Computation
  13.3.2 Organizing Data by Cylinders
  13.3.3 Using Multiple Disks
  13.3.4 Mirroring Disks
  13.3.5 Disk Scheduling and the Elevator Algorithm
  13.3.6 Prefetching and Large-Scale Buffering
13.1.1 Memory Hierarchy
4
 Several components are available for data storage, with different capacities
 The cost per byte to store data also varies
 The devices with the smallest capacity offer the fastest speed and the highest cost per bit
Memory Hierarchy Diagram
5
[Figure: the memory hierarchy, from cache through main memory and disk to tertiary storage. Programs and main-memory DBMSs work in main memory; the disk serves as virtual memory and holds the file system.]
13.1.1 Memory Hierarchy
6
 Cache
  The lowest level of the hierarchy
  Data items are copies of certain locations of main memory
  Sometimes, values in the cache are changed and the corresponding changes to main memory are delayed
  The machine looks in the cache for instructions, as well as for the data those instructions use
  Holds a limited amount of data
13.1.1 Memory Hierarchy (con’t)
7
 In a single-processor computer, there is no need to update the data in main memory immediately
 With multiple processors, data is updated to main memory immediately; this is called write-through
Main Memory
8
 Everything that happens in the computer (instruction execution, data manipulation) works on information that is resident in main memory
 Main memories are random access: one can obtain any byte in the same amount of time
Secondary Storage
9
 Used to store data and programs when they are not being processed
 More permanent than main memory: data and programs are retained when the power is turned off
 E.g., magnetic (hard) disks
Tertiary Storage
10
 Holds data volumes measured in terabytes
 Used for databases much larger than what can be stored on disk
13.1.2 Transfer of Data Between Levels
11
 Data moves between adjacent levels of the hierarchy
 At the secondary or tertiary levels, accessing the desired data or finding the desired place to store the data takes a lot of time
 The disk is organized into blocks
 Entire blocks are moved to and from a region of main memory called a buffer
13.1.2 Transfer of Data Between Levels (cont’d)
12
 A key technique for speeding up database operations is to arrange the data so that when one item on a block is needed, it is likely that other data on the same block will be needed at the same time
 The same idea applies to the other hierarchy levels
13.1.3 Volatile and Nonvolatile Storage
13
 A volatile device forgets the data stored on it when the power goes off
 A nonvolatile device holds its data even when the device is turned off
 All secondary and tertiary storage devices are nonvolatile, while main memory is volatile
13.1.4 Virtual Memory
14
 Typical software executes in virtual memory
 The address space is typically 32 bits, i.e., 2^32 bytes or 4GB
 Transfer between memory and disk is in terms of blocks
13.2.1 Mechanics of Disks
15
 Use of secondary storage is one of the important characteristics of a DBMS
 A disk consists of 2 moving pieces:
  1. the disk assembly
  2. the head assembly
 The disk assembly consists of 1 or more platters
 Platters rotate around a central spindle
 Bits are stored on the upper and lower surfaces of the platters
13.2.1 Mechanics of Disks (con’t)
16
 The disk is organized into tracks
 The tracks that are at a fixed radius from the center form one cylinder
 Tracks are organized into sectors
 Sectors are segments of the circle, separated by gaps
13.2.2 The Disk Controller
18
 One or more disks are controlled by a disk controller
 Disk controllers are capable of:
  Controlling the mechanical actuator that moves the head assembly
  Selecting a sector from among all those in the cylinder at which the heads are positioned
  Transferring bits between the desired sector and main memory
  Possibly buffering an entire track
13.2.3 Disk Access Characteristics
19
 Accessing (reading/writing) a block requires 3 steps:
  The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the ‘seek time’
  The disk controller waits while the first sector of the block moves under the head. This is the ‘rotational latency’
  All the sectors and the gaps between them pass under the head, while the disk controller reads or writes data in these sectors. This is the ‘transfer time’
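The three components above add up to the block access time. A minimal sketch, with illustrative parameters (a 6.46ms average seek, 7200 RPM, and a block occupying 1/16 of a track are assumptions, not the book's exact Megatron 747 figures):

```python
def block_access_time_ms(avg_seek_ms, rpm, fraction_of_track):
    """Estimate the time to read one block: seek + rotational latency + transfer.

    avg_seek_ms:        average seek time in milliseconds (assumed parameter)
    rpm:                rotation speed of the platters (assumed parameter)
    fraction_of_track:  fraction of a track the block's sectors and gaps occupy
    """
    ms_per_rotation = 60_000 / rpm              # one full rotation, in ms
    rotational_latency = ms_per_rotation / 2    # on average, wait half a rotation
    transfer_time = ms_per_rotation * fraction_of_track
    return avg_seek_ms + rotational_latency + transfer_time

# Illustrative numbers give roughly the 11ms per 16K block quoted in Ex 13.3:
print(block_access_time_ms(6.46, 7200, 1 / 16))
```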
13.3 Accelerating Access to Secondary Storage
20
 Several approaches for more efficiently accessing data in secondary storage:
  Place blocks that are accessed together in the same cylinder
  Divide the data among multiple disks
  Mirror disks
  Use disk-scheduling algorithms
  Prefetch blocks into main memory
 Scheduling latency: the added delay in accessing data caused by a disk-scheduling algorithm
 Throughput: the number of disk accesses per second that the system can accommodate
13.3.1 The I/O Model of Computation
21
 The number of block accesses (disk I/O’s) is a good approximation of an algorithm’s running time
 This number should be minimized
 Ex 13.3: You want an index on R to identify the block on which the desired tuple appears, but not where on the block it resides
 For the Megatron 747 (M747) example, it takes 11ms to read a 16K block
  A standard microprocessor can execute millions of instructions in 11ms, making any delay in searching the block for the desired tuple negligible
13.3.2 Organizing Data by Cylinders
22
 If we read all blocks on a single track or cylinder consecutively, then we can neglect all but the first seek time and the first rotational latency
 Ex 13.4: We request 1024 blocks of the M747
  If the data is randomly distributed, the average latency is 10.76ms by Ex 13.2, making the total latency 11s
  If all blocks are consecutively stored on 1 cylinder:
   6.46ms (1 average seek) + 8.33ms (time per rotation) * 16 (# rotations) = 139ms
13.3.3 Using Multiple Disks
23
 If we have n disks, read/write performance increases by a factor of n
 Striping: distributing a relation across multiple disks following this pattern:
  Data on disk 1: blocks 1, 1+n, 1+2n, …
  Data on disk 2: blocks 2, 2+n, 2+2n, …
  …
  Data on disk n: blocks n, 2n, 3n, …
 Ex 13.5: We request 1024 blocks with n = 4
  6.46ms (1 average seek) + 8.33ms (time per rotation) * (16/4) (# rotations) = 39.8ms
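The striping pattern above can be sketched as a simple mapping from a logical block number to a disk and a slot on that disk (1-based block and disk numbering, as in the slide):

```python
def striped_location(block_num, n_disks):
    """Map a logical block number (1-based) to (disk, slot on that disk),
    following the striping pattern: disk i holds blocks i, i+n, i+2n, ..."""
    disk = (block_num - 1) % n_disks + 1    # disks numbered 1..n
    slot = (block_num - 1) // n_disks       # 0-based position on that disk
    return disk, slot

# Blocks 1..8 on 4 disks land on disks 1,2,3,4 and then wrap around:
print([striped_location(b, 4) for b in range(1, 9)])
# -> [(1, 0), (2, 0), (3, 0), (4, 0), (1, 1), (2, 1), (3, 1), (4, 1)]
```

Because consecutive blocks land on different disks, a scan of consecutive blocks can proceed on all n disks at once, which is where the factor-of-n speedup in Ex 13.5 comes from.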
13.3.4 Mirroring Disks
24
 Mirroring disks: having 2 or more disks hold identical copies of the data
 Benefit 1: if n disks are mirrors of each other, the system can survive the crash of up to n-1 disks
 Benefit 2: with n disks, read performance increases by a factor of n
 Performance increases further by having the controller select, for each read, the disk whose head is closest to the desired data block
13.3.5 Disk Scheduling and the Elevator Algorithm
25
 The disk controller runs this algorithm to select which of several pending requests to process first
 Pseudocode:
  requests[]  // array of all unprocessed data requests
  upon receiving a new data request:
   requests[].add(new request)
  while (requests[] is not empty)
   move head to next location
   if (head location is at data in requests[])
    retrieve data
    remove data from requests[]
   if (head reaches end)
    reverse head direction
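The pseudocode above can be turned into a runnable sketch. This version computes only the order in which pending requests are served (arrival times are ignored, and the starting cylinder and initial direction are illustrative assumptions):

```python
def elevator_order(start, requests, direction=+1):
    """Serve requests in elevator order: sweep in one direction, serve the
    nearest pending request in that direction, and reverse at the end."""
    pending = sorted(requests)
    served = []
    while pending:
        # requests at or beyond the head in the current direction
        ahead = [c for c in pending if (c - start) * direction >= 0]
        if not ahead:                    # nothing left this way: reverse
            direction = -direction
            continue
        nxt = min(ahead, key=lambda c: abs(c - start))
        served.append(nxt)
        pending.remove(nxt)
        start = nxt
    return served

# Head at cylinder 16000 sweeping inward, with the slide's first three requests:
print(elevator_order(16000, [8000, 24000, 56000], direction=-1))
# -> [8000, 24000, 56000]
```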
13.3.5 Disk Scheduling and the Elevator Algorithm (con’t)
26
Events, in time order:
 Head starting point
 Request data at 8000
 Request data at 24000
 Request data at 56000
 Get data at 8000
 Request data at 16000
 Get data at 24000
 Request data at 64000
 Get data at 56000
 Request data at 40000
 Get data at 64000
 Get data at 40000
 Get data at 16000
Resulting schedule:
  data     time
  8000..   4.3
  24000..  13.6
  56000..  26.9
  64000..  34.2
  40000..  45.5
  16000..  56.8
13.3.5 Disk Scheduling and the Elevator Algorithm (con’t)
27
  Elevator Algorithm      FIFO Algorithm
  data     time           data     time
  8000..   4.3            8000..   4.3
  24000..  13.6           24000..  13.6
  56000..  26.9           56000..  26.9
  64000..  34.2           16000..  42.2
  40000..  45.5           64000..  59.5
  16000..  56.8           40000..  70.8
13.3.6 Prefetching and Large-Scale Buffering
28
 If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed
29
13.4 Disk Failures and Their Mitigation
By Priya Gangaraju and Xiaqing He
Ways in Which Disks Can Fail
30
 Intermittent failure
 Media decay
 Write failure
 Disk crash
Intermittent Failures
31
 A read or write operation on a sector succeeds not on the first try, but after repeated tries
 The most common form of failure
 Parity checks can be used to detect this kind of failure
Media Decay
32
 A serious form of failure
 One or more bits are permanently corrupted
 It is impossible to read the sector correctly even after many trials
 The stable-storage technique for organizing a disk is used to cope with this failure
Write Failure
33
 An attempt to write a sector is not possible
 An attempt to retrieve a previously written sector is unsuccessful
 Possible reason: a power outage while the sector was being written
 The stable-storage technique can be used to cope with this
Disk Crash
34
 The most serious form of disk failure
 The entire disk becomes unreadable, suddenly and permanently
 RAID techniques can be used for coping with disk crashes
More on Intermittent Failures…
35
 When we try to read a sector, the correct content of that sector may not be delivered to the disk controller
 If the controller has a way to tell whether the sector is good or bad (checksums), it can reissue the read request when bad data is read
More on Intermittent Failures…
36
 The controller can attempt to write a sector, but the contents of the sector may not be what was intended
 The only way to check this is to let the disk go around again and read the sector
 One way to perform the check is to read the sector and compare it with the sector we intended to write
Continued
37
 Instead of performing the complete comparison at the disk controller, a simpler way is to read the sector and see whether a good sector was read
 If it is a good sector, then the write was correct; otherwise the write was unsuccessful and must be repeated
Checksums
38
 A technique used to determine the good/bad status of a sector
 Each sector has some additional bits, called the checksum, that are set depending on the values of the data bits in that sector
 If the checksum is not proper on reading, then there is an error in reading
More on Checksums…
39
 There is a small chance that the block was not read correctly even if the checksum is proper
 The probability of correctness can be increased by using many checksum bits
Checksum Calculation
40
 The checksum is based on the parity of all bits in the sector
 If there is an odd number of 1’s among a collection of bits, the bits are said to have odd parity; a parity bit ‘1’ is added
 If there is an even number of 1’s, the collection of bits is said to have even parity; a parity bit ‘0’ is added
Continued
41
 The number of 1’s among a collection of bits and their parity bit is always even
 During a write operation, the disk controller calculates the parity bit and appends it to the sequence of bits written in the sector
 Every sector will then have even parity
Examples…
42
 The sequence of bits 01101000 has an odd number of 1’s. The parity bit will be 1, so the sequence with the parity bit is 011010001
 The sequence of bits 11101110 has even parity, as it has an even number of 1’s. With the parity bit 0, the sequence is 111011100
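The parity rule above is a one-liner. A minimal sketch, using bit strings for readability:

```python
def parity_bit(bits):
    """Return the parity bit that makes the total number of 1's even:
    1 if the collection has odd parity, 0 if it already has even parity."""
    return bits.count('1') % 2

def add_parity(bits):
    """Append the parity bit, as the disk controller does on a write."""
    return bits + str(parity_bit(bits))

print(add_parity('01101000'))  # odd number of 1's  -> 011010001
print(add_parity('11101110'))  # even number of 1's -> 111011100
```

On a read, the controller recomputes the count: any one-bit error leaves the sector with odd parity, which is how the error is detected.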
Continued
43
 Any one-bit error in reading or writing the bits results in a sequence of bits that has odd parity
 The disk controller can count the number of 1’s and, by finding odd parity, determine that an error occurred
Odds…
44
 There is a chance that more than one bit is corrupted and the error goes unnoticed
 Increasing the number of parity bits increases the chance of detecting errors
 In general, if there are n independent bits serving as the checksum, the chance of missing an error is only one in 2^n
Stable Storage
45
 Checksums can detect an error but cannot correct it
 Sometimes we overwrite the previous contents of a sector and then cannot read the new contents correctly
 To deal with these problems, the stable-storage policy can be implemented on the disks
Continued
46
 Sectors are paired, and each pair represents one sector-contents X
 The left copy of the sector is represented as XL, and the right copy as XR
Assumptions
47
 We assume that the copies are written with a sufficient number of parity bits that the chance of a bad sector looking good when the parity checks are considered is negligible
 Also, if the read function returns a good value w for either XL or XR, then w is assumed to be the true value of X
Stable-Storage Writing Policy:
48
1. Write the value of X into XL. Check that the value has status “good”; i.e., the parity-check bits are correct in the written copy. If not, repeat the write. If after a set number of write attempts we have not successfully written X into XL, assume that there is a media failure in this sector. A fix-up, such as substituting a spare sector for XL, must be adopted.
2. Repeat (1) for XR.
Stable-Storage Reading Policy:
49
 The policy is to alternate trying to read XL and XR until a good value is returned
 If a good value is not returned after a prechosen number of tries, then X is assumed to be truly unreadable
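The two policies above can be sketched together. This is a toy model, not a device interface: the disk is a dict with keys 'XL' and 'XR', and `checksum_ok` stands in for the parity check on a copy (both are assumptions of this sketch):

```python
def stable_write(disk, value, checksum_ok, max_attempts=5):
    """Stable-storage writing policy: write X_L, verify it, then write X_R.
    After max_attempts failed writes of a copy, declare a media failure."""
    for copy in ('XL', 'XR'):
        for _ in range(max_attempts):
            disk[copy] = value
            if checksum_ok(disk[copy]):       # parity check on the written copy
                break
        else:
            raise IOError(f'media failure writing {copy}; substitute a spare sector')

def stable_read(disk, checksum_ok, max_tries=10):
    """Stable-storage reading policy: alternate X_L and X_R until a good
    value is returned; give up after a prechosen number of tries."""
    for i in range(max_tries):
        copy = 'XL' if i % 2 == 0 else 'XR'
        value = disk[copy]
        if value is not None and checksum_ok(value):
            return value
    raise IOError('X is truly unreadable')

disk = {'XL': None, 'XR': None}
stable_write(disk, 'payload', checksum_ok=lambda v: True)
disk['XL'] = None                      # simulate a media failure of the left copy
print(stable_read(disk, checksum_ok=lambda v: v is not None))  # -> payload
```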
Error-Handling Capabilities:
50
Media failures:
 If, after storing X in sectors XL and XR, one of them undergoes a media failure and becomes permanently unreadable, we can read from the other
 If both sectors have failed, then X cannot be read
 The probability of both failing is extremely small
51
Write failure:
 When writing X, if there is a system failure (like a power outage), the X in main memory is lost and the copy of X being written may be erroneous
 Half of the sector may be written with part of the new value of X, while the other half remains as it was
52
 The possible cases when the system becomes available again:
1. The failure occurred while writing XL. Then XL is considered bad. Since XR was never changed, its status is good. We can copy XR, which holds the old value of X, into XL.
2. The failure occurred after XL was written. Then XL has good status, and XR, which still holds the old value of X, has bad status. We can copy the new value of X from XL into XR.
Recovery from Disk Crashes
53
 To reduce data loss from disk crashes, schemes involving redundancy, extending the idea of parity checks or duplicated sectors, can be applied
 The term used for these strategies is RAID, or Redundant Arrays of Independent Disks
 In general, if the mean time to failure of disks is n years, then in any given year 1/n of the surviving disks fail
54
 Each of the RAID schemes has data disks and redundant disks
 Data disks are one or more disks that hold the data
 Redundant disks are one or more disks that hold information completely determined by the contents of the data disks
 When either kind of disk crashes, the other disks can be used to restore the failed disk, avoiding a permanent loss of information
Content
55
1) Focus: “How to recover from disk crashes,” using the common term RAID, “redundant arrays of independent disks”
2) Several schemes to recover from disk crashes:
 Mirroring (RAID level 1)
 Parity checks (RAID 4)
 An improvement (RAID 5)
 RAID 6
1) Mirroring
56
 The simplest scheme for recovery from disk crashes
 How does mirroring work?
  Make two or more copies of the data on different disks
 Benefits:
  Saves the data in case one disk fails
  Divides reads among several disks, allowing access to several blocks at once
1) Mirroring (con’t)
57
When can data be lost with mirroring?
 The only way data can be lost is if the second (mirror/redundant) disk crashes while the first (data) disk crash is being repaired.
Probability:
Suppose:
 One disk: mean time to failure = 10 years
 The first failure among the two disks: mean time = 5 years
 Replacing the failed disk takes 3 hours = 1/2,920 of a year
Then:
 The probability that the mirror disk fails during the repair = 1/10 * 1/2,920 = 1/29,200
 The probability of data loss with mirroring: 1/5 * 1/29,200 = 1/146,000
2) Parity Blocks
58
 Why a change?
  Disadvantage of mirroring: it uses so many redundant disks
 What’s new?
  RAID level 4: uses only one redundant disk
 How does this one redundant disk work?
  Modulo-2 sum
  The jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks
 Example
2) Parity Blocks (con’t): Example
59
Data disks:
 Disk 1: 11110000
 Disk 2: 10101010
 Disk 3: 00111000
Redundant disk:
 Disk 4: 01100010
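The modulo-2 sum is just a bitwise XOR across the data disks. A minimal sketch, with blocks given as bit strings:

```python
def parity_block(*blocks):
    """Modulo-2 sum: bit j of the redundant block is the XOR of bit j of
    every data block (all blocks are equal-length bit strings)."""
    return ''.join(str(sum(int(b[j]) for b in blocks) % 2)
                   for j in range(len(blocks[0])))

# The three data disks from the example yield exactly disk 4's contents:
print(parity_block('11110000', '10101010', '00111000'))  # -> 01100010
```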
2) RAID 4 (con’t)
60
 Reading
  Similar to reading a block from any single disk
 Writing
  1) Change the data disk
  2) Change the corresponding block of the redundant disk
 Why?
  The redundant disk must hold the parity checks for the corresponding blocks of all the data disks
2) RAID 4 (con’t): Writing
61
For a total of N data disks:
1) Naive way:
 Read the N data disks and compute the modulo-2 sum of the corresponding blocks
 Rewrite the redundant disk according to that modulo-2 sum
2) Better way:
 Take the modulo-2 sum of the old and new versions of the data block that was rewritten
 Flip the bits of the redundant disk in the positions where this modulo-2 sum has 1’s
2) RAID 4 (con’t): Writing Example
62
Data disks:
 Disk 1: 11110000
 Disk 2: 10101010  01100110
 Disk 3: 00111000
To do:
 Modulo-2 sum of the old and new versions of disk 2: 11001100
 So we need to flip positions 1, 2, 5, and 6 of the redundant disk.
Redundant disk:
 Disk 4: 01100010  10101110
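The “better way” above needs only two XORs, regardless of how many data disks there are. A minimal sketch reproducing the example:

```python
def xor_bits(a, b):
    """Bitwise modulo-2 sum of two equal-length bit strings."""
    return ''.join(str(int(x) ^ int(y)) for x, y in zip(a, b))

old_data, new_data = '10101010', '01100110'   # disk 2, before and after
old_redundant = '01100010'                    # disk 4, before

# Positions where old and new data differ are exactly the redundant bits to flip:
change_mask = xor_bits(old_data, new_data)
new_redundant = xor_bits(old_redundant, change_mask)
print(change_mask, new_redundant)  # -> 11001100 10101110
```

This touches only the rewritten data disk and the redundant disk, instead of reading all N data disks as in the naive way.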
2) RAID 4 (con’t): Failure Recovery
63
 Redundant disk crash:
  Swap in a new disk and recompute its data from all the data disks
 One of the data disks crashes:
  Swap in a new disk
  Recompute its data from the other disks, including the remaining data disks and the redundant disk
 How to recompute? (The same rule applies in both cases, which is why an improvement is possible)
  Take the modulo-2 sum of all the corresponding bits of all the other disks
3) An Improvement: RAID 5
64
 Why is an improvement needed?
  Shortcoming of RAID level 4: it suffers from a bottleneck (every update to a data disk must also read and write the one redundant disk)
 Principle of RAID level 5 (RAID 5):
  Treat each disk as the redundant disk for some of the blocks
 Why is this feasible?
  The rule for failure recovery is the same for the redundant disk and the data disks:
  “take the modulo-2 sum of all the corresponding bits of all the other disks”
  So there is no need to treat one disk as the redundant disk and the others as data disks
3) RAID 5 (con’t)
65
 How do we decide which blocks of each disk treat that disk as the redundant disk?
  If there are n+1 disks, labeled 0 to n, then we can treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1
 Example
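The rule above is a single modulus. A minimal sketch of the cylinder-to-redundant-disk assignment:

```python
def redundant_disk(cylinder, num_disks):
    """RAID 5 rule from the slide: with num_disks = n+1 disks labeled 0..n,
    cylinder i treats disk (i mod n+1) as its redundant disk."""
    return cylinder % num_disks

# With 4 disks (n = 3), the redundant role rotates evenly among all disks:
print([redundant_disk(i, 4) for i in range(8)])  # -> [0, 1, 2, 3, 0, 1, 2, 3]
```

Because the redundant role rotates, parity-update traffic is spread across all disks instead of hammering one dedicated redundant disk.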
66
3) RAID 5 (con’t): Example
n = 3 (4 disks, labeled 0 to 3):
 Disk 0 is redundant for cylinders 0, 4, 8, 12, …
 Disk 1 is redundant for cylinders 1, 5, 9, …
 Disk 2 is redundant for cylinders 2, 6, 10, …
 Disk 3 is redundant for cylinders 3, 7, 11, …
Suppose all 4 disks are equally likely to be written. For any one of the 4 disks, the probability of being involved in a given write is:
 1/4 + 3/4 * 1/3 = 1/2
 With m disks: 1/m + (m-1)/m * 1/(m-1) = 2/m
4) Coping with Multiple Disk Crashes
67
 RAID 6
  Deals with any number of disk crashes, if enough redundant disks are used
 Example: a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7)
 How do we set up the 3×7 matrix? (Why 3 rows? There are 3 redundant disks)
  1) Every column is a distinct sequence of three 0’s and 1’s, excluding the all-0’s column
  2) The column of each redundant disk has a single 1
  3) The column of each data disk has at least two 1’s
4) Coping with Multiple Disk Crashes (con’t)
68
 Reading:
  Read from the data disks and ignore the redundant disks
 Writing:
  Change the data disk
  Change the corresponding bits of all the redundant disks
4) Coping with Multiple Disk Crashes (con’t)
69
 In this system with 4 data disks and 3 redundant disks, how can up to 2 disk crashes be corrected?
 Suppose disks a and b have failed:
  Find some row r in the 3×7 matrix in which the columns for a and b differ (say a’s entry is 0 and b’s entry is 1)
  Compute the correct b by taking the modulo-2 sum of the corresponding bits from all the disks other than b that have 1 in row r
  After recovering b, compute the correct a, since all the other disks are now available
 Example
4) Coping with Multiple Disk Crashes (con’t): Example
70
The 3×7 matrix (disks 1-4 are data disks, disks 5-7 are redundant disks):

  disk number:  1  2  3  4  5  6  7
  row 1:        1  1  1  0  1  0  0
  row 2:        1  1  0  1  0  1  0
  row 3:        1  0  1  1  0  0  1
4) Coping with Multiple Disk Crashes (con’t): Example
71
First block of each disk:

  disk   contents
  1)     11110000
  2)     10101010
  3)     00111000
  4)     01000001
  5)     01100010
  6)     00011011
  7)     10001001
4) Coping with Multiple Disk Crashes (con’t): Example
72
Two disks crash:

  disk   contents
  1)     11110000
  2)     ????????
  3)     00111000
  4)     01000001
  5)     ????????
  6)     00011011
  7)     10001001
4) Coping with Multiple Disk Crashes (con’t): Example
73
In the 3×7 matrix, row 2 has different values for disks 2 and 5: disk 2’s entry is 1 and disk 5’s entry is 0.
 So: compute the first block of disk 2 as the modulo-2 sum of the corresponding bits of disks 1, 4, and 6 (the other disks with 1 in row 2);
 then compute the first block of disk 5 as the modulo-2 sum of the corresponding bits of disks 1, 2, and 3 (the disks with 1 in row 1).

  1)  11110000
  2)  ???????? => 10101010
  3)  00111000
  4)  01000001
  5)  ???????? => 01100010
  6)  00011011
  7)  10001001
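The recovery steps above can be sketched directly from the matrix. A minimal sketch, with the 3×7 matrix stored as three row strings and disks numbered 1-7:

```python
def xor_blocks(blocks):
    """Modulo-2 sum of equal-length bit strings."""
    return ''.join(str(sum(int(b[j]) for b in blocks) % 2)
                   for j in range(len(blocks[0])))

MATRIX = ['1110100', '1101010', '1011001']   # rows; column d-1 belongs to disk d

def recover_two(contents, a, b):
    """Recover crashed disks a and b, following the slide's steps: find a row
    where a's and b's entries differ, rebuild the disk whose entry is 1 from
    the other disks of that row, then rebuild the second disk."""
    row = next(r for r in MATRIX if r[a - 1] != r[b - 1])
    first, second = (a, b) if row[a - 1] == '1' else (b, a)
    others = [d for d in range(1, 8) if r_has_one(row, d) and d != first]
    contents[first] = xor_blocks([contents[d] for d in others])
    # with `first` restored, any row covering `second` has all other disks intact
    row2 = next(r for r in MATRIX if r[second - 1] == '1')
    others2 = [d for d in range(1, 8) if r_has_one(row2, d) and d != second]
    contents[second] = xor_blocks([contents[d] for d in others2])
    return contents

def r_has_one(row, disk):
    return row[disk - 1] == '1'

disks = {1: '11110000', 2: None, 3: '00111000', 4: '01000001',
         5: None, 6: '00011011', 7: '10001001'}
recover_two(disks, 2, 5)
print(disks[2], disks[5])  # -> 10101010 01100010
```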
Summary
74
 Disk failures and their mitigation:
  1. Intermittent failure: checksums
  2. Media decay: the stable-storage technique
  3. Write failure: the stable-storage technique
  4. Disk crashes: RAID techniques
 “How to recover from disk crashes”: by RAID
Material Taken From
75
Disk Failures (Chapter 13.4.1 to 13.4.9),
Database Systems: The Complete Book, Second Edition
76
13.5 Arranging Data on Disk
Meghna Jain
ID: 205
CS257
Prof: Dr. T.Y. Lin

Data elements are represented as records, which are stored in consecutive bytes in the same disk block.
77
Basic layout techniques for storing data:
Fixed-Length Records
 Allocation criterion: data should start at a word boundary.
Fixed-length record header:
 1. A pointer to the record schema.
 2. The length of the record.
 3. Timestamps indicating when the record was last modified or last read.
Example
78
CREATE TABLE employee(
  name CHAR(30) PRIMARY KEY,
  address VARCHAR(255),
  gender CHAR(1),
  birthdate DATE
);
Data should start at a word boundary and contain a header and four fields:
name, address, gender and birthdate.
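The word-boundary rule above can be sketched as a layout calculator. The 12-byte header, 4-byte words, and the field sizes chosen for the variable- and date-typed columns are illustrative assumptions, not values fixed by the schema:

```python
def field_offsets(field_sizes, header_size=12, align=4):
    """Lay out fixed-length fields so each starts at a word boundary.
    Returns (offsets by field name, total record length)."""
    def round_up(n):
        return (n + align - 1) // align * align
    offsets, pos = {}, round_up(header_size)
    for name, size in field_sizes:
        offsets[name] = pos
        pos = round_up(pos + size)        # next field starts at a word boundary
    return offsets, pos

# The employee table above, assuming VARCHAR(255) stored in 256 bytes and
# DATE in 10 bytes:
offsets, length = field_offsets(
    [('name', 30), ('address', 256), ('gender', 1), ('birthdate', 10)])
print(offsets, length)
```

Note how `gender` occupies a full 4-byte word despite being CHAR(1); alignment trades a little space for simpler, faster field access.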
79
Packing Fixed-Length Records into Blocks:
 Records are stored in the form of blocks on the disk, and they move into main memory when we need to update or access them.
 A block header is written first, and it is followed by a series of records.
The block header contains the following information:
80
 Links to one or more blocks that are part of a network of blocks.
 Information about the role played by this block in such a network.
 Information about the relation the tuples in this block belong to.
 A “directory” giving the offset of each record in the block.
 Timestamp(s) indicating the time of the block’s last modification and/or access.
81
Example
Along with the header, we pack as many records as fit in one block, as shown in the figure; the remaining space is unused.
82
13.6 REPRESENTING BLOCK AND RECORD ADDRESSES
Ramya Karri
CS257 Section 2
ID: 206
INTRODUCTION
83
 Addresses of blocks and records
  In main memory:
   The address of the block is the virtual-memory address of its first byte
   The address of a record within the block is the virtual-memory address of the record’s first byte
  In secondary memory:
   A sequence of bytes describes the location of the block in the overall system: the device ID for the disk, the cylinder number, etc.
ADDRESSES IN CLIENT-SERVER SYSTEMS
84
 The addresses in an address space are represented in two ways:
  Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found
  Logical addresses: arbitrary strings of bytes of some fixed length
 Physical address bits are used to indicate:
  The host to which the storage is attached
  An identifier for the disk
  The number of the cylinder
  The number of the track
  The offset of the beginning of the record
85
ADDRESSES IN CLIENT-SERVER SYSTEMS (CONTD..)
 A map table relates logical addresses to physical addresses.
 [Figure: the map table, pairing each Logical Address entry with its Physical Address entry.]
LOGICAL AND STRUCTURED ADDRESSES
86
 Purpose of a logical address?
 It gives more flexibility when we:
  Move the record around within the block
  Move the record to another block
 It also gives us an option of deciding what to do when a record is deleted
 [Figure: a block with a header and offset table at one end, unused space in the middle, and Records 4, 3, 2, 1 at the other end.]
POINTER SWIZZLING
87
 Having pointers is common in object-relational database systems
 It is important to learn about the management of pointers
 Every data item (block, record, etc.) has two addresses:
  A database address: its address on disk
  A memory address, if the item is currently in virtual memory
POINTER SWIZZLING (CONTD…)
88
 Translation table: maps database addresses to memory addresses
 All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table
 [Figure: the translation table, pairing each Dbaddr entry with its Mem-addr entry.]
POINTER SWIZZLING (CONTD…)
89
 A pointer consists of the following two fields:
  A bit indicating the type of address
  The database or memory address itself
 [Figure for Example 13.17: Block 1 exists on disk and in memory; a swizzled pointer leads to the memory copy of Block 1, while an unswizzled pointer leads to Block 2 on disk.]
EXAMPLE 13.7
90
 Block 1 has a record with pointers to a second record on the same block and to a record on another block
 If Block 1 is copied to memory:
  The first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record
  Since Block 2 is not in memory, we cannot swizzle the second pointer
POINTER SWIZZLING (CONTD…)
91
 Three types of swizzling:
  Automatic swizzling
   As soon as a block is brought into memory, swizzle all relevant pointers.
  Swizzling on demand
   Only swizzle a pointer if and when it is actually followed.
  No swizzling
   Pointers are not swizzled; they are accessed using the database address.
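Swizzling on demand, the middle option above, can be sketched in a few lines. The `Pointer` class, the dict-based translation table, and the `load_block` callback are all assumptions of this sketch, not a real DBMS interface:

```python
class Pointer:
    """A pointer with a type bit: either a database (disk) address or,
    once swizzled, a direct memory reference."""
    def __init__(self, db_addr):
        self.swizzled = False
        self.addr = db_addr

translation = {}      # translation table: database address -> in-memory object

def follow(ptr, load_block):
    """Swizzling on demand: translate (and cache) only when the pointer
    is actually followed; later follows are direct memory accesses."""
    if ptr.swizzled:
        return ptr.addr                      # already a memory reference
    if ptr.addr not in translation:          # item not yet in memory
        translation[ptr.addr] = load_block(ptr.addr)
    ptr.addr = translation[ptr.addr]         # swizzle in place
    ptr.swizzled = True
    return ptr.addr

p = Pointer('disk:block2:rec1')
obj = follow(p, load_block=lambda a: {'loaded from': a})
print(p.swizzled, obj)
```

The first `follow` pays the translation cost; every later `follow` of the same pointer skips the table entirely, which is the whole point of swizzling.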
PROGRAMMER CONTROL OF SWIZZLING
92
 Unswizzling
  When a block is moved from memory back to disk, all pointers must be changed back to database (disk) addresses
  The translation table is used again
  It is important to have an efficient data structure for the translation table
PINNED RECORDS AND BLOCKS
93
 A block in memory is said to be pinned if it cannot be written back to disk safely
 If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned
  To unpin a block, we must unswizzle any pointers to it
  Keep in the translation table the places in memory holding swizzled pointers to each item
  Unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses)
94
Eswara Satya Pavan Rajesh Pinapala
CS 257
ID: 221
Topics
95
 Records with Variable-Length Fields
 Records with Repeating Fields
 Variable-Format Records
 Records That Do Not Fit in a Block
 BLOBs
Example
96
[Fig 1: a movie-star record with four fields: name at offset 0, address at 30, gender at 286, birthdate at 287; total length 297.]
Records with Variable-Length Fields
97
An effective way to represent variable-length records is as follows:
 Fixed-length fields are kept ahead of the variable-length fields
 The record header contains:
  The length of the record
  Pointers to the beginnings of all the variable-length fields except the first one
98
[Figure 2: a movie-star record with name and address implemented as variable-length character strings; the header holds the record length and a pointer to the address field, followed by gender, birthdate, name, and address.]
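The layout in Figure 2 can be sketched as a packing routine. The 4 bytes per header entry, the string-based encoding, and the sample field values are assumptions of this sketch:

```python
def pack_record(fixed_fields, var_fields):
    """Pack a record as in Figure 2: fixed-length fields first, then
    variable-length fields, with a header holding the record length and
    offsets of every variable-length field except the first (whose position
    is implied by the end of the fixed-length part)."""
    header_len = 4 * (1 + max(len(var_fields) - 1, 0))  # 4 bytes per entry (assumed)
    body = ''.join(fixed_fields)
    offsets, pos = [], header_len + len(body)
    for v in var_fields:
        offsets.append(pos)                 # offset of this field in the record
        pos += len(v)
    record_length = pos
    header = [record_length] + offsets[1:]  # first var-field offset is implicit
    return header, body + ''.join(var_fields)

# Fixed fields gender and birthdate, then variable-length name and address
# (the sample values are purely illustrative):
header, payload = pack_record(['F', '1989-01-01'],
                              ['Carrie Fisher', '123 Maple St.'])
print(header)  # -> [45, 32]: record length 45, address field at offset 32
```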
Records with Repeating Fields
99
 A record may contain a variable number of occurrences of a field F
 All occurrences of field F are grouped together, and the record header contains a pointer to the first occurrence of field F
 L bytes are devoted to each instance of field F
 Locating an occurrence of field F within the record:
  Add to the offset of the first occurrence of F an integer multiple of L: 0, L, 2L, 3L, and so on
  We stop upon reaching the offset of the field following F
Records with Repeating Fields
100
[Figure 3: a record with a repeating group of references to movies; the header holds the record length, a pointer to the address field, and a pointer to the movie pointers, followed by name, address, and the pointers to movies.]
Records with Repeating Fields
101
[Figure 4: storing variable-length fields separately from the record; the record holds header information, pointers to and lengths of the name and address, and a pointer to the movie references along with the number of references.]
Records with Repeating Fields
102
Advantage:
 Keeping the record itself fixed-length allows records to be searched more efficiently, minimizes the overhead in block headers, and allows records to be moved within or among blocks with minimum effort.
Disadvantage:
 Storing variable-length components on another block increases the number of disk I/O’s needed to examine all components of a record.
Records with Repeating Fields
103
A compromise strategy is to allocate a fixed portion of the record for the repeating fields:
 If the number of repeating fields is smaller than the allocated space, some of the space is unused
 If the number of repeating fields is greater than the allocated space, the extra fields are stored in a different location, and
 a pointer to that location and a count of the additional occurrences are stored in the record
104
Variable-Format Records
 Records that do not have a fixed schema
 Variable-format records are represented by a sequence of tagged fields
 Each tagged field contains:
  The attribute or field name
  The type of the field
  The length of the field
  The value of the field
 Why use tagged fields?
  Information-integration applications
  Records with a very flexible schema
Variable-Format Records
105
[Fig 5: a record with tagged fields: N (code for name), S (code for string type), length 14, “Clint Eastwood”; then R (code for restaurant owned), S, length 16, “Hog’s Breath Inn”.]
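The tagged-field format of Fig 5 can be sketched as an encode/decode pair. The `'|'` separator and the string representation are assumptions of this sketch (a real format would use binary lengths rather than a separator):

```python
def encode_tagged(fields):
    """Encode a variable-format record as a sequence of tagged fields:
    (field code, type code, length, value), as in Fig 5."""
    out = []
    for code, type_code, value in fields:
        out += [code, type_code, str(len(value)), value]
    return '|'.join(out)

def decode_tagged(record):
    """Recover (field code, type code, value) triples from the encoding."""
    parts = record.split('|')
    return [(parts[i], parts[i + 1], parts[i + 3])
            for i in range(0, len(parts), 4)]

r = encode_tagged([('N', 'S', 'Clint Eastwood'),
                   ('R', 'S', "Hog's Breath Inn")])
print(decode_tagged(r))
```

Because each field carries its own name, type, and length, a reader needs no schema: it can skip fields it does not understand, which is what makes the format attractive for information integration.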
Records That Do Not Fit in a Block
106
 When the length of a record is greater than the block size, the record is divided and placed into two or more blocks
 The portion of the record in each block is referred to as a record fragment
 A record with two or more fragments is called a spanned record
 A record that does not cross a block boundary is called an unspanned record
107
Spanned Records
Spanned records require the following extra header information:
 A bit indicating whether it is a fragment or not
 A bit indicating whether it is the first or last fragment of a record
 Pointers to the next and/or previous fragment of the same record
Records That Do Not Fit in a Block
108
[Figure 6: storing spanned records across blocks: block 1 holds record 1 and fragment 2-a of record 2; block 2 holds fragment 2-b and record 3; each block has a block header and record headers.]
BLOBS
109
 Large binary objects are called BLOBs
  E.g., audio files, video files
 Storage of BLOBs
 Retrieval of BLOBs
110
Record Modifications
Chapter 13, Section 13.8
Neha Samant
CS 257 (Section II) ID: 222
Modification Types
111
 Insertion
 Deletion
 Update
Insertion
112
 Insertion of records without order:
  Records can be placed in a block with empty space or in a new block.
 Insertion of records in a fixed order:
  Space available in the block
  No space available in the block (outside the block)
 A structured address or a pointer may refer to a record from outside the block.
Insertion in fixed order
113
Space available within the block
 Use an offset table in the header of each block, with pointers to the location of each
record in the block.
 The records are slid within the block and the pointers in the offset table are adjusted.
Figure: block layout
[offset table | header | unused | Record 4 | Record 3 | Record 2 | Record 1]
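The slide's sliding-and-adjusting step can be sketched as follows. This is a simplified model (records as strings, offsets as character positions, capacity as a total length budget), not an actual block layout: inserting in sorted order slides later records and rebuilds the offset table.

```python
# Simplified sketch of a block with an offset table in its header.
# Outside pointers would go through the offsets, so records can move freely
# inside the block. The Block class and its field names are illustrative.

class Block:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []   # records kept in sorted order
        self.offsets = []   # offset table: position of each record

    def insert_sorted(self, rec):
        """Insert rec keeping sorted order; False if the block is full."""
        if sum(len(r) for r in self.records) + len(rec) > self.capacity:
            return False
        pos = 0
        while pos < len(self.records) and self.records[pos] < rec:
            pos += 1
        self.records.insert(pos, rec)     # "slide" the later records
        self.offsets, off = [], 0         # rebuild the offset table
        for r in self.records:
            self.offsets.append(off)
            off += len(r)
        return True

b = Block(capacity=20)
for r in ["aa", "cc", "dd"]:
    b.insert_sorted(r)
b.insert_sorted("bb")                     # lands between "aa" and "cc"
print(b.records, b.offsets)               # ['aa', 'bb', 'cc', 'dd'] [0, 2, 4, 6]
```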
Insertion in fixed order
114
No space available within the block (outside the block)
 Find space on a "nearby" block.
• If no space is available in the block, look at the following block in the sorted order of
blocks.
• If space is available in that block, move the highest records of block 1 to block 2
and slide the records around on both blocks.
 Create an overflow block.
• Records can be stored in the overflow block.
• Each block has a place in its header for a pointer to an overflow block.
• The overflow block can itself point to a second overflow block:
Block B → overflow block for B
Deletion
115
 Recover space after deletion
 When using an offset table, the records can be slid around the block so there will be
an unused region in the center that can be recovered.
 In case we cannot slide records, an available-space list can be maintained in the
block header.
 The list head goes in the block header and the available regions hold the links in the list.
Deletion
116
 Use of tombstones
 A tombstone is placed where the record was, so that pointers to the deleted
record do not end up pointing at new records.
 The tombstone is permanent until the entire database is reconstructed.
 If pointers go to fixed locations from which the location of the record is
found, then we put the tombstone in that fixed location.
 Where a tombstone is placed depends on the nature of the record pointers.
 A map table is used to translate logical record addresses to physical addresses.
Deletion
117
 Use of tombstones
 If we need to replace records by tombstones, place the bit that serves as the
tombstone at the beginning of the record.
 This bit remains at the record's location, and the subsequent bytes can be reused for
another record.
Record 1 | Record 2
Record 1 can be replaced, but its tombstone remains; record 2 has no
tombstone and can be seen when we follow a pointer to it.
Update
118
 Fixed-length update
No effect on the storage system, as the record occupies the same space as before the
update.
 Variable-length update
 Longer length
 Shorter length
Update
119
Variable-length update (longer length)
 Stored on the same block:
 Sliding records
 Creation of an overflow block
 Stored on another block:
 Move records around that block
 Create a new block for storing variable-length fields
Update
120
Variable-length update (shorter length)
 Same as deletion
 Recover space
 Consolidate space
B-Trees & Bitmap Indexes
14.2, 14.7
DATABASE SYSTEMS – The Complete Book
Presented by: Deepti Kundu, Maciej Kicinski
Under the supervision of: Dr. T.Y. Lin
122
B-Trees ►►
Structure
123
 A balanced tree, meaning that all paths from the root
to a leaf have the same length.
 There is a parameter n associated with each B-tree
block. Each block has space for n search keys
and n+1 pointers.
 The root may hold as little as one search key, but all other
blocks must be at least half full.
Structure
124
● A typical node holds n search keys and n+1 pointers.
● A typical interior node's pointers point to nodes at the
next level down, without associated record values.
● A typical leaf's pointers point to the records themselves.
Application
125
 The search key of the B-tree is the primary key for
the data file.
 The data file is sorted by its primary key.
 Or the data file is sorted by an attribute that is not a
key, and this attribute is the search key for the B-tree.
Lookup
126
If at an interior node, choose the correct pointer to follow. This
is done by comparing keys to the search value.
Lookup
127
If at a leaf node, choose the key that matches what
you are looking for, and follow its pointer
to the data.
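The per-node routing step can be sketched with the standard library's `bisect`. This is an illustrative fragment, not a full B-tree: an interior node with keys k1..kn and n+1 children routes a search value v to child i such that keys[i-1] ≤ v < keys[i].

```python
# Sketch of choosing the pointer to follow inside one B-tree interior node.
import bisect

def choose_child(keys, v):
    """Index of the child pointer to follow for search value v."""
    return bisect.bisect_right(keys, v)

# Interior node with keys [10, 20, 30] has 4 children:
# v < 10 -> child 0, 10 <= v < 20 -> child 1, and so on.
print(choose_child([10, 20, 30], 5))    # 0
print(choose_child([10, 20, 30], 20))   # 2
```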
Insertion
128
 When inserting, choose the correct leaf node to put
the pointer to the data.
 If the node is full, create a new node and split the keys
between the two.
 Recursively move up: if the parent cannot take a pointer
to the new node because it is full, split it as well.
 This can end with creating a new root node, if
the current root was full.
Deletion
129
Perform a lookup to find the node holding the key, and delete the key.
If the node is no longer half full, either join it with an
adjacent node and recursively delete up, or move a key over
from an adjacent node that is more than half full and
adjust the pointers up.
Efficiency
130
B-trees allow lookup, insertion, and deletion of
records using very few disk I/Os.
Each level of a B-tree requires one read; following
the pointer at the leaf gives the final read.
Efficiency
131
Three levels usually suffice for B-trees: with each block
holding 255 pointers, 255^3 is about 16.6 million records.
You can reduce disk I/Os further by keeping a level of the B-tree
in main memory. Keeping the root block with its 255 pointers
reduces the reads to 2, and it may even be possible to keep the
next level's 255 blocks in memory to reduce the reads to 1.
132
Bitmap Indexes ►►
Definition
133
A bitmap index for a field F is a collection of
bit-vectors of length n, one for each possible
value that may appear in that field F.[1]
What does that mean?
 Assume relation R with 2 attributes A and B.
 Attribute A is of type Integer and B is of type String.
 6 records, numbered 1 through 6, as shown:

  #  A   B
  1  30  foo
  2  30  bar
  3  40  baz
  4  50  foo
  5  40  bar
  6  30  baz
Example Continued…
135
 A bitmap index for attribute B of the relation above is:

  Value  Vector
  foo    100100
  bar    010010
  baz    001001
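Building such an index can be sketched directly from the definition: one bit-vector per distinct value, with bit i set when record i holds that value (records numbered from 1, bits written left to right).

```python
# Sketch of building a bitmap index for one attribute.
# Bit-vectors are plain strings here purely for readability.

def bitmap_index(values):
    """values[i] is the attribute value of record i+1."""
    index = {}
    n = len(values)
    for i, v in enumerate(values):
        bits = index.setdefault(v, ["0"] * n)
        bits[i] = "1"
    return {v: "".join(bits) for v, bits in index.items()}

# Attribute B of the six-record relation from the slides:
print(bitmap_index(["foo", "bar", "baz", "foo", "bar", "baz"]))
# {'foo': '100100', 'bar': '010010', 'baz': '001001'}
```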
Where do we reach?
136
 A bitmap index is a special kind of database index
that uses bitmaps.[2]
 Bitmap indexes have traditionally been considered
to work well for data such as gender, which has a
small number of distinct values, e.g., male and
female, but many occurrences of those values.[2]
A little more…
137
 A bitmap index for attribute A of relation R is:
 a collection of bit-vectors
 the number of bit-vectors = the number of distinct
values of A in R
 the length of each bit-vector = the cardinality of R
 the bit-vector for value v has 1 in position i if the ith
record has v in attribute A, and 0 there if not.[3]
 Records are allocated permanent numbers.[3]
 There is a mapping between record numbers and record
addresses.[3]
Motivation for Bitmap Indexes
138
 Very efficient when used for partial-match queries.[3]
 They offer the advantage of buckets,[2]
 where we find tuples with several specified attributes
without first retrieving all the records that matched on
each of the attributes.
 They can also help answer range queries.[3]
Another Example
139
Multidimensional Array of multiple types
{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}
5 = 100010
79 = 010100
4 = 001000
6 = 000001
d = 101100
t = 010010
a = 000001
Example Continued…
140
{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}
Searching for items is easy, just AND together.
To search for (5,d)
5 = 100010
d = 101100
100010 AND 101100 = 100000
The location of the
record has been traced!
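The partial-match query above can be sketched in a few lines: AND together the bit-vectors of the wanted values, and the surviving 1-bits locate the records.

```python
# Sketch of answering a partial-match query by ANDing bit-vectors.
# Bit-vectors are strings for readability, matching the slides' notation.

def bv_and(a, b):
    """Bitwise AND of two equal-length bit-vector strings."""
    return "".join("1" if x == "1" and y == "1" else "0"
                   for x, y in zip(a, b))

# Bit-vectors from the slides' example {(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}:
five = "100010"
d    = "101100"
print(bv_and(five, d))   # '100000' -> record 1 holds (5, d)
```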
Compressed Bitmaps
141
 Assume:
 the number of records in R is n
 attribute A has m distinct values in R
 The size of a bitmap index on attribute A is m*n bits.
 If m is large, then the fraction of 1's in each bit-vector will be
around 1/m.
 Opportunity to encode
 A common encoding approach is called run-length
encoding.[1]
Run-length encoding
142
 Represents runs:
 A run is a sequence of i 0's followed by a 1.
 A run of i 0's followed by a 1 is encoded by a suitable binary
encoding of the integer i:
 First compute how many bits are needed to represent i; say k.
 Then represent the run by k-1 1's and a single 0, followed by the k bits
of i in binary.
 The encoding for i = 1 is 01 (k = 1).
 The encoding for i = 0 is 00 (k = 1).
 We concatenate the codes for each run together, and the resulting sequence of bits is the
encoding of the entire bit-vector.
Understanding with an Example
143
 Let us decode the sequence 11101101001011
 Starting at the beginning (leftmost bit):
 First run: the first 0 is at position 4, so k = 4. The next 4 bits are
1101, so the first integer is i = 13.
 Second run: 001011 — the leading 0 gives k = 1, and the next bit is
0, so i = 0.
 Last run: 1011 — the 10 prefix gives k = 2, and the next 2 bits are
11, so i = 3.
 The run lengths are thus 13, 0, 3; hence our bit-vector is:
0000000000000110001
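The decoding walk-through above can be sketched as code: read k-1 1's and a 0 to learn k, then read k bits for the run length i, and emit i 0's followed by a 1.

```python
# Sketch of the decoder for the run-length code described in the slides.

def decode_runs(code):
    """Decode a concatenated run-length code into the original bit-vector."""
    out, pos = [], 0
    while pos < len(code):
        k = 1
        while code[pos] == "1":          # count the leading 1's: k-1 of them
            k += 1
            pos += 1
        pos += 1                         # skip the terminating 0
        i = int(code[pos:pos + k], 2)    # k bits give the run length i
        pos += k
        out.append("0" * i + "1")
    return "".join(out)

print(decode_runs("11101101001011"))     # '0000000000000110001'
```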
Managing Bitmap Indexes
144
1) How do you find a specific bit-vector for a
value efficiently?
2) After selecting results that match, how do you
retrieve the records efficiently?
3) When data is changed, how do you alter the bitmap
index?
1) Finding bit vectors
145
 Think of each bit-vector as the value associated with a key.[1]
 Any secondary storage technique will be
efficient in retrieving the values.[1]
 Create a secondary index with the attribute value
as a search key:[3]
B-tree
Hash
2) Finding Records
146
 Create a secondary index with the record number as a search
key.[3]
 Or in other words,
 once you learn that you need record k, you can use
a secondary index with the kth position as its search
key.[1]
3) Handling Modifications
147
Two things to remember:
Record numbers must remain fixed once assigned
Changes to data file require changes to bitmap index
148
Deletion
Tombstone replaces deleted record
Corresponding bit is set to 0
149
Insertion
The record is assigned the next record
number.
A bit is appended to each bit-vector: 1 for the
vector of the record's value, 0 for all others.
If the new record contains a new value of
the attribute, add a new bit-vector.
150
Modification
Change the bit corresponding to the old
value of the modified record to 0
Change the bit corresponding to the
new value of the modified record to 1
If the new value is a new value of A,
then insert a new bit-vector.
References
151
[1] Database Systems: The Complete Book – Hector Garcia-Molina, Jeffrey D. Ullman,
Jennifer Widom
[2] http://en.wikipedia.org/wiki/Bitmap_index#Example
[3] faculty.kfupm.edu.sa/ICS/adam/ICS541/L10-md-bitmap-indexing.ppt
[4] http://csis.bitspilani.ac.in/faculty/goel/Data%20Warehousing/Lecture%20Notes/Lecture%20%239%20-%20Bitmap%20Indexes%20in%20DW.doc
(a good document on the concepts of bitmap indexes in data warehouses)
Query Execution
152
Chapter 15
Section 15.1
Presented by
Khadke, Suvarna
CS 257
(Section II) Id 213
152
Agenda
153
 Query Processor and major parts of the Query
Processor
 Physical-Query-Plan Operators
 Scanning Tables
 Basic approaches to locate the tuples of a
relation R
 Sorting While Scanning Tables
 Computation Model for Physical Operators
 I/O Cost for Scan Operators
 Iterators
Query Processor
154
 The group of components of a DBMS that converts
user queries and data-modification commands into
a sequence of database operations
 It also executes those operations
 It must supply the detail of how the query is to be
executed
Major parts of Query processor
155
Query execution:
The algorithms that
manipulate the data of
the database.
Focus on the
operations of extended
relational algebra.
155
Outline of Query Compilation
156
Query compilation
 Parsing: a parse tree for the query is
constructed
 Query rewrite: the parse tree is
converted to an initial query plan
(an algebraic representation of the query)
and transformed into a logical query plan
that takes less time to execute
 Physical plan generation: the logical query
plan is converted into a physical query plan
by selecting algorithms and an order of
execution for these operators
156
Physical-Query-Plan Operators
157
 Physical operators are implementations of the
operators of relational algebra.
 They can also implement operations that are not in
relational algebra, like "scan", which scans tables.
Scanning Tables
158
 One of the most basic things we can do in a physical
query plan is to read the entire contents of a
relation R.
 A variation of this operator involves a simple predicate:
read only those tuples of the relation R that satisfy
the predicate.
159
Basic approaches to locate the
tuples of a relation R
Table Scan
 Relation R is stored in secondary memory with its
tuples arranged in blocks
 It is possible to get the blocks one by one
Index-scan
 If there is an index on any attribute of Relation R
 Can use this index to get all the tuples of R
 For Example: sparse index on R
159
Sorting While Scanning Tables
160
 There are a number of reasons to sort a relation:
 The query could include an ORDER BY clause, requiring
that a relation be sorted.
 Algorithms that implement relational-algebra
operations require one or both arguments to be
sorted relations.
 The physical-query-plan operator sort-scan takes a
relation R and the attributes on which the sort is to be made,
and produces R in that sorted order.
 Possible implementations: a B-tree index on the sort attribute, or a
multiway merge sort.
161
Computation Model for Physical
Operators
 Physical-plan operators should be selected wisely,
which is essential for a good query processor.
 The "cost" of each operator is estimated by the number
of disk I/O's for the operation.
 The cost of writing the final answer depends only on the size of
the answer, so it is not counted in comparing plans for
the query.
Parameters for Measuring Costs
162
 B: the number of blocks needed to hold all the
tuples of R.
Also written B(R).
 T: the number of tuples in R.
Also written T(R).
 V: the number of distinct values that appear in a
column of a relation.
V(R, a) is the number of distinct values of column
a in R.
I/O Cost for Scan Operators
163
 If relation R is clustered, then the number of disk
I/O's for the table-scan operator is approximately B
disk I/O's.
 If relation R is not clustered, then the number of
required disk I/O's is generally much higher.
 An index on a relation R occupies many fewer than
B(R) blocks.
 That means a scan of the entire index requires
fewer I/O's than the at-least-B disk I/O's needed to
scan R itself.
164
Iterators for Implementation of
Physical Operators
 Many physical operators can be implemented as an
iterator.
 The three methods forming the iterator for an operation
are:
1. Open():
 Starts the process of getting tuples
 Initializes any data structures needed to perform
the operation
165
Iterators for Implementation of
Physical Operators
2. GetNext():
 Returns the next tuple in the result
 Adjusts data structures as necessary to allow
subsequent tuples to be obtained
 If there are no more tuples to return, GetNext
returns a special value NotFound
3. Close():
 Ends the iteration after all tuples have been obtained
 Calls Close on any arguments of the operator
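The Open/GetNext/Close interface can be sketched for a table-scan operator. This is an illustrative sketch: an in-memory list stands in for the blocks of relation R, and NotFound is a sentinel object as in the text.

```python
# Sketch of the iterator interface for a table-scan physical operator.
NotFound = object()   # sentinel returned when the tuples are exhausted

class TableScan:
    def __init__(self, relation):
        self.relation = relation

    def Open(self):
        # initialize the data structure needed: a cursor into R
        self.pos = 0

    def GetNext(self):
        # return the next tuple, or NotFound when exhausted
        if self.pos >= len(self.relation):
            return NotFound
        t = self.relation[self.pos]
        self.pos += 1
        return t

    def Close(self):
        pass              # nothing to release in this sketch

scan = TableScan([("movie", 1960), ("star", 1975)])
scan.Open()
out = []
while (t := scan.GetNext()) is not NotFound:
    out.append(t)
scan.Close()
print(out)                # [('movie', 1960), ('star', 1975)]
```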
Reference
166
 GARCIA-MOLINA, H., ULLMAN, J. D. & WIDOM, J., DATABASE
SYSTEMS: THE COMPLETE BOOK, 2nd Edition, 2008.
167
Two-Pass Algorithms Based on Hashing
(Section 15.5)
By
Swathi Vegesna
At a glimpse
168
 Introduction
 Partitioning Relations by Hashing
 Algorithm for Duplicate Elimination
 Grouping and Aggregation
 Union, Intersection, and Difference
 Hash-Join Algorithm
 Sort-based Vs Hash-based
 Summary
Introduction
169
Hashing is used if the data is too big to store in main-memory
buffers.
 Hash all the tuples of the argument(s) using an
appropriate hash key.
 For all the common operations, there is a way to select
the hash key so that all the tuples that need to be
considered together when we perform the operation
have the same hash value.
 This reduces the size of the operand(s) by a factor
equal to the number of buckets.
Partitioning Relations by Hashing
170
Algorithm:
initialize M-1 buckets using M-1 empty buffers;
FOR each block b of relation R DO BEGIN
    read block b into the Mth buffer;
    FOR each tuple t in b DO BEGIN
        IF the buffer for bucket h(t) has no room for t THEN
        BEGIN
            copy the buffer to disk;
            initialize a new empty block in that buffer;
        END;
        copy t to the buffer for bucket h(t);
    END;
END;
FOR each bucket DO
    IF the buffer for this bucket is not empty THEN
        write the buffer to disk;
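The partitioning loop above can be sketched in Python. This is a simplified model: lists stand in for disk buckets and in-memory buffer blocks, and `buffer_size` plays the role of the block capacity.

```python
# Sketch of hash-partitioning a relation into M-1 buckets, flushing each
# bucket's one-block buffer to "disk" whenever it fills up.

def partition(relation, M, key, buffer_size=2):
    buckets = [[] for _ in range(M - 1)]   # on-"disk" buckets
    buffers = [[] for _ in range(M - 1)]   # one in-memory block per bucket
    for t in relation:
        h = hash(key(t)) % (M - 1)
        if len(buffers[h]) >= buffer_size:     # buffer full: flush the block
            buckets[h].extend(buffers[h])
            buffers[h] = []
        buffers[h].append(t)
    for h in range(M - 1):                     # flush the non-empty buffers
        buckets[h].extend(buffers[h])
    return buckets

parts = partition(range(10), M=4, key=lambda t: t)
# every tuple lands in exactly one of the M-1 = 3 buckets
print(sorted(sum(parts, [])))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```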
171
Duplicate Elimination
 For the operation δ(R), hash R into M-1 buckets.
(Note that two copies of the same tuple t will hash to the
same bucket.)
 Do duplicate elimination on each bucket Ri
independently, using the one-pass algorithm.
 The result is the union of the δ(Ri), where Ri is the portion of
R that hashes to the ith bucket.
Requirements
172
 Number of disk I/O's: 3*B(R)
 B(R) < M(M-1): only then will the two-pass, hash-based
algorithm work.
 In order for this to work, we need:
 the hash function h to distribute the tuples evenly among the
buckets
 each bucket Ri to fit in main memory (to allow the one-pass
algorithm)
 i.e., B(R) ≤ M²
Grouping and Aggregation
173
 Hash all the tuples of relation R into M-1 buckets, using a
hash function that depends only on the grouping
attributes.
(Note: all tuples in the same group end up in the same
bucket.)
 Use the one-pass algorithm to process each bucket
independently.
 Uses 3*B(R) disk I/O's; requires B(R) ≤ M²
Union, Intersection, and Difference
174
 For a binary operation we use the same hash function
to hash tuples of both arguments.
 R ∪ S: hash both R and S into M-1 buckets each.
 R ∩ S: hash both R and S into 2(M-1) buckets in total.
 R - S: hash both R and S into 2(M-1) buckets in total.
 Requires 3(B(R)+B(S)) disk I/O's.
 The two-pass hash-based algorithm requires
min(B(R),B(S)) ≤ M²
Hash-Join Algorithm
175
 Use the same hash function for both relations; the hash function
should depend only on the join attributes.
 Hash R into M-1 buckets R1, R2, …, RM-1.
 Hash S into M-1 buckets S1, S2, …, SM-1.
 Do a one-pass join of Ri and Si, for all i.
 3*(B(R) + B(S)) disk I/O's; requires min(B(R),B(S)) ≤ M²
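The bucket-by-bucket join can be sketched as follows. This is an in-memory illustration, not the disk-based algorithm: both relations are partitioned with the same hash function on the join attribute, then each bucket pair Ri, Si is joined one-pass style by building a dictionary on one side.

```python
# Sketch of the hash-join: partition both relations, then one-pass join
# each bucket pair. rkey/skey extract the join attribute.

def hash_join(R, S, rkey, skey, M=4):
    rb = [[] for _ in range(M - 1)]
    sb = [[] for _ in range(M - 1)]
    for t in R:
        rb[hash(rkey(t)) % (M - 1)].append(t)
    for t in S:
        sb[hash(skey(t)) % (M - 1)].append(t)
    out = []
    for ri, si in zip(rb, sb):              # one-pass join per bucket pair
        table = {}
        for r in ri:                        # build a hash table on Ri
            table.setdefault(rkey(r), []).append(r)
        for s in si:                        # probe with the tuples of Si
            for r in table.get(skey(s), []):
                out.append(r + s)
    return out

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y")]
print(sorted(hash_join(R, S, lambda t: t[0], lambda t: t[0])))
# [(2, 'b', 2, 'x'), (3, 'c', 3, 'y')]
```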
Sort-based Vs Hash-based
176
 For binary operations, hash-based algorithms limit the memory
requirement to the min of the argument sizes, not the sum.
 Sort-based algorithms can produce output in sorted order,
which can be helpful.
 Hash-based algorithms depend on the buckets being of equal
size.
 Sort-based algorithms can experience reduced
rotational latency or seek time.
Summary
177
 Partitioning Relations by Hashing
 Algorithm for Duplicate Elimination
 Grouping and Aggregation
 Union, Intersection, and Difference
 Hash-Join Algorithm
 Sort-based Vs Hash-based
178
Index-Based Algorithms
Chapter 15
Section 15.6
Presented by
Fan Yang
CS 257
Class ID218
178
Clustering and Nonclustering
Indexes
179
 Clustered relation: tuples are packed into roughly as
few blocks as can possibly hold those tuples.
 Clustering index: an index on an attribute such that all the
tuples with a fixed value for the search key of this
index appear on roughly as few blocks as can hold
them.
180
Clustering and Nonclustering
Indexes
 A relation that isn't clustered cannot have a
clustering index.
 A clustered relation can have nonclustering indexes.
Index-Based Selection
181
 For a selection σC(R), suppose C is of the form a=v,
where a is an attribute.
 For a clustering index on R.a:
the number of disk I/O's will be about B(R)/V(R,a).
Index-Based Selection
182
 The actual number may be higher:
1. the index is not kept entirely in main memory
2. the tuples may spread over more blocks
3. they may not be packed as tightly as possible into
blocks
Example
183
 B(R)=1000, T(R)=20,000. Number of I/O's required:
1. clustered, no index:            1000
2. not clustered, no index:        20,000
3. V(R,a)=100, clustering index:   10
4. V(R,a)=10, nonclustering index: 2,000
Joining by Using an Index
184
 Natural join R(X, Y) ⋈ S(Y, Z)
 Number of I/O's to get R:
clustered: B(R)
not clustered: T(R)
 Number of I/O's to get the matching tuples of S for every tuple t of R:
clustered: T(R)B(S)/V(S,Y)
not clustered: T(R)T(S)/V(S,Y)
Example
185
 R(X,Y): 1000 blocks; S(Y,Z): 500 blocks.
Assume 10 tuples in each block,
so T(R)=10,000 and T(S)=5000.
V(S,Y)=100.
If R is clustered, and there is a clustering index on Y
for S:
the number of I/O's for R is 1000;
the number of I/O's for S
is 10,000*500/100 = 50,000.
Joins Using a Sorted Index
186
 Natural join R(X, Y) ⋈ S(Y, Z) with an index on Y for
either R or S.
 Extreme case: zig-zag join.
 Example:
relations R(X,Y) and S(Y,Z) with an index on Y for both
relations;
search keys (Y-values) for R: 1,3,4,4,5,6
search keys (Y-values) for S: 2,2,4,6,7,8
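The zig-zag idea can be sketched over two sorted key lists standing in for the indexes: each side jumps forward past keys the other side cannot match, so tuples without partners are never retrieved. This sketch is simplified to advance both sides on a match, so duplicate keys (like the two 4's in R) pair only once.

```python
# Sketch of the zig-zag scan over two sorted indexes on Y.

def zigzag_matches(r_keys, s_keys):
    """Y-values found in both sorted key lists (one pairing per match)."""
    out, i, j = [], 0, 0
    while i < len(r_keys) and j < len(s_keys):
        if r_keys[i] < s_keys[j]:
            i += 1                    # zig: advance R past the unmatched key
        elif r_keys[i] > s_keys[j]:
            j += 1                    # zag: advance S
        else:
            out.append(r_keys[i])     # match: the join would fetch tuples here
            i += 1
            j += 1
    return out

# The slides' example keys:
print(zigzag_matches([1, 3, 4, 4, 5, 6], [2, 2, 4, 6, 7, 8]))   # [4, 6]
```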
187
The Query Compiler
16.1 Parsing and Preprocessing
Meghna Jain(205)
Dr. T. Y. Lin
188
Presentation Outline
16.1 Parsing and Preprocessing
16.1.1 Syntax Analysis and Parse Tree
16.1.2 A Grammar for Simple Subset of SQL
16.1.3 The Preprocessor
16.1.4 Processing Queries Involving Views
189
Query compilation is divided
into three steps
1. Parsing: parse the SQL query into a parse tree.
2. Logical query plan: transform the parse tree into an expression
tree of relational algebra.
3. Physical query plan: transform the logical query plan into a
physical query plan, which specifies:
. the operations performed
. the order of the operations
. the algorithms used
. the way in which stored data is obtained and passed from one
operation to another.
190
From a query to a logical query plan:
Query → Parser → Preprocessor → Logical query plan
generator → Query rewrite → Preferred logical query plan
Syntax Analysis and Parse Tree
191
The parser takes the SQL query and converts it to a parse
tree. Nodes of the parse tree:
1. Atoms: lexical elements such as keywords,
constants, parentheses, operators, and
other schema elements.
2. Syntactic categories: subparts that play a
similar role in a query, such as <Query>, <Condition>
Grammar for Simple Subset of SQL
<Query> ::= <SFW>
<Query> ::= (<Query>)
192
<SFW> ::= SELECT <SelList> FROM <FromList> WHERE <Condition>
<SelList> ::= <Attribute>,<SelList>
<SelList> ::= <Attribute>
<FromList> ::= <Relation>, <FromList>
<FromList> ::= <Relation>
<Condition> ::= <Condition> AND <Condition>
<Condition> ::= <Tuple> IN <Query>
<Condition> ::= <Attribute> = <Attribute>
<Condition> ::= <Attribute> LIKE <Pattern>
<Tuple> ::= <Attribute>
Atoms (constants), <syntactic categories> (variables),
::= (can be expressed/defined as)
Query and Parse Tree
193
StarsIn(title,year,starName)
MovieStar(name,address,gender,birthdate)
Query:
Give titles of movies that have at least one star born in 1960
SELECT title FROM StarsIn WHERE starName IN
(
SELECT name FROM MovieStar WHERE
birthdate LIKE '%1960%'
);
194
Another query equivalent
195
SELECT title
FROM StarsIn, MovieStar
WHERE starName = name AND
birthdate LIKE '%1960%' ;
Parse Tree
196
<Query>
  <SFW>
    SELECT <SelList>
      <Attribute> title
    FROM <FromList>
      <RelName> StarsIn
      <FromList>
        <RelName> MovieStar
    WHERE <Condition>
      <Condition> AND <Condition>
        <Attribute> starName = <Attribute> name
        <Attribute> birthdate LIKE <Pattern> '%1960'
The Preprocessor
197
Functions of the Preprocessor
. If a relation used in the query is a virtual view, then each use of this relation
in the from-list must be replaced by a parse tree that describes the view.
. It is also responsible for semantic checking:
1. Checks relation uses: every relation mentioned in a FROM
clause must be a relation or a view in the current schema.
2. Checks and resolves attribute uses: every attribute mentioned
in a SELECT or WHERE clause must be an attribute of some
relation in the current scope.
3. Checks types: all attributes must be of a type appropriate to
their uses.
198
StarsIn(title,year,starName)
MovieStar(name,address,gender,birthdate)
Query:
Give titles of movies that have at least one star born in 1960
SELECT title FROM StarsIn WHERE starName IN
(
SELECT name FROM MovieStar WHERE
birthdate LIKE '%1960%'
);
Preprocessing Queries Involving Views
199
When an operand in a query is a virtual view, the preprocessor needs to
replace the operand by a piece of parse tree that represents how the
view is constructed from base tables.
Base table: Movies(title, year, length, genre, studioName,
producerC#)
View definition: CREATE VIEW ParamountMovies AS
SELECT title, year FROM Movies
WHERE studioName = 'Paramount';
Example based on the view:
SELECT title FROM ParamountMovies WHERE year = 1979;
200
16.2 ALGEBRAIC LAWS FOR
IMPROVING QUERY PLANS
Ramya Karri
ID: 206
Optimizing the Logical Query Plan
201
 The translation rules converting a parse tree to a logical query
tree do not always produce the best logical query tree.
 It is often possible to optimize the logical query tree by applying
relational algebra laws to convert the original tree into a more
efficient logical query tree.
 Optimizing a logical query tree using relational algebra laws is
called heuristic optimization.
Relational Algebra Laws
202
These laws often involve the properties of:
 commutativity - the operator can be applied to operands independent
of order.
E.g. A + B = B + A - the "+" operator is commutative.
 associativity - the operator is independent of operand grouping.
E.g. A + (B + C) = (A + B) + C - the "+" operator is
associative.
Associative and Commutative Operators
203
 The relational algebra operators of cross-product (×), join
(⋈), union (∪), and intersection (∩) are all associative and
commutative.

Commutative          Associative
R × S = S × R        (R × S) × T = R × (S × T)
R ⋈ S = S ⋈ R        (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
R ∪ S = S ∪ R        (R ∪ S) ∪ T = R ∪ (S ∪ T)
R ∩ S = S ∩ R        (R ∩ S) ∩ T = R ∩ (S ∩ T)
Laws Involving Selection
204
 Complex selections involving AND or OR can be broken into two or more
selections (splitting laws):
σC1 AND C2 (R) = σC1( σC2 (R))
σC1 OR C2 (R) = σC1(R) ∪S σC2(R)
(where ∪S denotes set union)
 Example
 R = {a,a,b,b,b,c}
 p1 satisfied by a,b; p2 satisfied by b,c
 σp1 OR p2 (R) = {a,a,b,b,b,c}
 σp1(R) = {a,a,b,b,b}
 σp2(R) = {b,b,b,c}
 σp1(R) ∪S σp2(R) = {a,a,b,b,b,c}
Laws Involving Selection (Contd..)
205
 Selection is pushed through both arguments for
union:
σC(R ∪ S) = σC(R) ∪ σC(S)
 Selection is pushed to the first argument and
optionally the second for difference:
σC(R - S) = σC(R) - S
σC(R - S) = σC(R) - σC(S)
Laws Involving Selection (Contd..)
206


All other operators require selection to be pushed to only
one of the arguments.
For joins, may not be able to push selection to both if
argument does not have attributes selection requires.
σC(R × S) = σC(R) × S
σC(R ∩ S) = σC(R) ∩ S
σC(R ⋈ S) = σC(R) ⋈ S
σC(R ⋈D S) = σC(R) ⋈D S
Laws Involving Selection (Contd..)
207
Example
 Consider relations R(a,b) and S(b,c) and the
expression
σ(a=1 OR a=3) AND b<c (R ⋈ S)
 = σa=1 OR a=3(σb<c (R ⋈ S))
 = σa=1 OR a=3(R ⋈ σb<c (S))
 = σa=1 OR a=3(R) ⋈ σb<c (S)
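The equivalence in the example above can be checked on toy data. This sketch models R(a,b) and S(b,c) as tuple lists and the natural join on b as a nested loop; the particular tuples are invented for illustration.

```python
# Check that pushing the selections preserves the result of
# σ(a=1 OR a=3) AND b<c (R ⋈ S)  versus  σ(a=1 OR a=3)(R) ⋈ σ(b<c)(S).

def njoin(R, S):
    """Natural join of R(a,b) and S(b,c) on attribute b."""
    return [(a, b, c) for (a, b) in R for (b2, c) in S if b == b2]

R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 15), (20, 5)]

unpushed = [(a, b, c) for (a, b, c) in njoin(R, S)
            if a in (1, 3) and b < c]
pushed = njoin([(a, b) for (a, b) in R if a in (1, 3)],
               [(b, c) for (b, c) in S if b < c])
print(unpushed == pushed, unpushed)   # True [(1, 10, 15)]
```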
Laws Involving Projection
208

Like selections, it is also possible to push projections
down the logical query tree. However, the
performance gained is less than selections because
projections just reduce the number of attributes
instead of reducing the number of tuples.
Laws Involving Projection
209
 Laws for pushing projections through joins:
πL(R × S) = πL(πM(R) × πN(S))
πL(R ⋈ S) = πL(πM(R) ⋈ πN(S))
πL(R ⋈D S) = πL(πM(R) ⋈D πN(S))
Laws Involving Projection
210
 Laws for pushing projections through set operations:
 Projection can be performed entirely before bag union.
πL(R ∪B S) = πL(R) ∪B πL(S)
 Projection can be pushed below selection as long as we also
keep all attributes needed for the selection (M = L ∪
attr(C)):
πL ( σC (R)) = πL( σC (πM(R)))
Laws Involving Join
211

We have previously seen these important rules
about joins:
1. Joins are commutative and associative.
2. Selection can be distributed into joins.
3. Projection can be distributed into joins.
Laws Involving Duplicate Elimination
212
 The duplicate-elimination operator (δ) can be
pushed through many operators, but not all.
 E.g., if R has two copies of tuple t and S has one copy of t:
δ(R ∪ S) = one copy of t
δ(R) ∪ δ(S) = two copies of t
Laws Involving Duplicate Elimination
213
 Laws for pushing the duplicate-elimination operator (δ):
δ(R × S) = δ(R) × δ(S)
δ(R ⋈ S) = δ(R) ⋈ δ(S)
δ(R ⋈D S) = δ(R) ⋈D δ(S)
δ(σC(R)) = σC(δ(R))
Laws Involving Duplicate Elimination
214

The duplicate elimination operator (δ) can also be
pushed through bag intersection, but not across
union, difference, or projection in general.
δ(R ∩ S) = δ(R) ∩ δ(S)
Laws Involving Grouping
215
 The grouping operator (γ) laws depend on the aggregate operators used.
 There is one general rule, however: grouping subsumes duplicate
elimination, since γ produces at most one tuple per group:
δ(γL(R)) = γL(R)
 Note that some aggregate functions are unaffected by duplicates
(MIN and MAX) while others are affected (SUM, COUNT, and AVG).
The Query Compiler
(16.3 & 16.4)
DATABASE SYSTEMS – The Complete Book
Presented by: Deepti Kundu, Maciej Kicinski
Under the supervision of: Dr. T.Y. Lin
Topics to be covered
217
 From Parse Trees to Logical Query Plans
 Conversion to Relational Algebra
 Removing Subqueries From Conditions
 Improving the Logical Query Plan
 Grouping Associative/Commutative Operators
 Estimating the Cost of Operations
 Estimating Sizes of Intermediate Relations
 Estimating the Size of a Projection
 Estimating the Size of a Selection
 Estimating the Size of a Join
 Estimating Sizes for Other Operations
Review
218
Query → Parser → Preprocessor (Section 16.1)
→ Logical query plan generator → Query Rewriter (Section 16.3)
→ Preferred logical query plan
Two steps to turn a parse tree into a
Preferred Logical Query Plan
219
 Replace the nodes and structures of the parse tree, in
appropriate groups, by an operator or operators of relational
algebra.
 Take the relational algebra expression and turn it into an
expression that we expect can be converted to the most efficient
physical query plan.
Reference Relations
220
 StarsIn (movieTitle, movieYear, starName)
 MovieStar (name, address, gender, birthdate)
Conversion to Relational Algebra
221
 If we have a <Query> with a <Condition> that has no
subqueries, then we may replace the entire construct – the
select-list, from-list, and condition – by a relational-algebra
expression.
222
 The relational-algebra expression consists of the
following, from bottom to top:
 the product of all the relations mentioned in the
<FromList>, which is the argument of:
 a selection σC, where C is the <Condition> expression in
the construct being replaced, which in turn is the argument
of:
 a projection πL, where L is the list of attributes in the
<SelList>.
A query : Example
223

SELECT movieTitle
FROM Starsin, MovieStar
WHERE starName = name AND
birthdate LIKE ‘%1960’;
224
Translation to an algebraic expression
tree
225
Removing Subqueries From Conditions
226
 For parse trees with a <Condition> that has a
subquery:
 an intermediate operator – the two-argument selection – is used.
 It is intermediate between the syntactic
categories of the parse tree and the relational-algebra
operators that apply to relations.
Using a two-argument σ
227
πmovieTitle
 └ σ (two arguments)
    ├ StarsIn
    └ <Condition>: <Tuple> (<Attribute> starName) IN
        πname( σbirthdate LIKE '%1960' (MovieStar) )
Two-argument selection with a condition
involving IN
228
 Now say we have two arguments – some relation R and a
<Condition> of the form t IN S, where:
 t – a tuple composed of some attributes of R
 S – an uncorrelated subquery
 Steps to be followed:
1. Replace the <Condition> by the tree that is the expression for S (δ is
used to remove duplicates).
2. Replace the two-argument selection by a one-argument selection σC.
3. Give σC an argument that is the product of R and S.
Two-argument selection with a condition
involving IN
229
Before:  σ( R, <Condition>: t IN S )
After:   σC( R × δ(S) )
The effect
230
Improving the Logical Query Plan
231
 Algebraic laws to improve logical query plans:
 Selections can be pushed down the expression tree as
far as they can go.
 Similarly, projections can be pushed down the tree, or
new projections can be added.
 Duplicate eliminations can sometimes be removed, or
moved to a more convenient position in the tree.
 Certain selections can be combined with a product
below to turn the pair of operations into an equijoin.
Grouping Associative/Commutative
Operators
232
 An operator that is associative and commutative
may be thought of as having any number of operands.
 We need to reorder these operands so that a multiway join
is executed as a sequence of binary joins.
 It is more time-consuming to execute them in the order
suggested by the parse tree.
 For each portion of a subtree that consists of nodes with the
same associative and commutative operator (natural join,
union, or intersection), we group the nodes with these
operators into a single node with many children.
The effect of query rewriting
233
πmovieTitle
 └ ⋈starName = name
    ├ StarsIn
    └ σbirthdate LIKE '%1960' (MovieStar)
Final step in producing the logical query plan
234
A cascade of binary unions such as
((R ∪ S) ∪ (T ∪ V)) ∪ W
is grouped into a single union node with
children R, S, T, V, W.
An Example to summarize
235


“find movies where the average age of the stars was at most 40
when the movie was made”
SELECT DISTINCT m1.movieTitle, m1.movieYear
FROM StarsIn m1
WHERE m1.movieYear - 40 <= (
SELECT AVG (birthdate)
FROM StarsIn m2, MovieStar s
WHERE m2.starName = s.name AND
m1.movieTitle = m2.movieTitle AND
m1.movieYear = m2.movieYear
);
236
SELECT DISTINCT m1.movieTitle, m1.movieYear
FROM StarsIn m1
WHERE m1.movieYear - 40 <= (
SELECT AVG (birthdate)
FROM StarsIn m2, MovieStar s
WHERE m2.starName = s.name AND
m1.movieTitle = m2.movieTitle AND
m1.movieYear = m2.movieYear );
16.4 Estimating the Cost of Operations
Estimating the Cost of Operations
240



After getting to the logical query plan, we turn it into a
physical plan.
Consider all the possible physical plans and estimate their
costs – this evaluation is known as cost-based enumeration.
The one with the least estimated cost is the one selected to be
passed to the query-execution engine.
Selection for each physical plan
241

We select for each physical plan:
 An order and grouping for associative-and-commutative
operations like joins, unions, and intersections.
 An algorithm for each operator in the logical plan, for
instance, deciding whether a nested-loop join or hash-join
should be used.
 Additional operators – scanning, sorting etc. – that are
needed for the physical plan but that were not present
explicitly in the logical plan.
 The way in which the arguments are passed from one
operator to the next.
Estimating Sizes of Intermediate
Relations
242
Size estimates for intermediate relations should:
1. Give accurate estimates.
2. Be easy to compute.
3. Be logically consistent; that is, the size estimate for
an intermediate relation should not depend on how
that relation is computed.
Estimating the Size of a Projection
243



We should treat a classical, duplicate-eliminating
projection as a bag-projection.
The size of the result can be computed exactly.
There may be reduction in size (due to eliminated
components) or increase in size (due to new
components created as combination of attributes).
Estimating the Size of a Selection
244


While performing a selection, we may reduce the number
of tuples but the size of each tuple remains the same.
For S = σA=c(R),
where A is an attribute of R and c is a constant,
the recommended estimate is
T(S) = T(R) / V(R,A)
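In code, the recommended equality estimate (and the T(R)/3 rule of thumb for inequalities that appears later in this chapter) are one-liners, assuming values of A are uniformly distributed:

```python
def est_eq_selection(T_R, V_RA):
    # T(S) = T(R) / V(R, A) for sigma_{A=c}(R):
    # tuples spread evenly over the V(R, A) distinct values of A
    return T_R / V_RA

def est_ineq_selection(T_R):
    # rule of thumb for sigma_{A<c}(R): about one third of the tuples
    return T_R / 3

print(est_eq_selection(10_000, 50))   # -> 200.0
print(est_ineq_selection(300))        # -> 100.0
```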
Estimating Sizes of Other Operations
245





Union
Intersection
Difference
Duplicate Elimination
Grouping and Aggregation
246
Choosing an Order for Joins
Chapter 16.6 by:
Chiu Luk
ID: 210
Introduction
247

This section focuses on a critical problem in cost-based
optimization:
 Selecting an order for the natural join of three or more
relations

Compared to other binary operations, joins take
more time and therefore need effective
optimization techniques
Introduction
248
Significance of Left and Right Join
Arguments
249


The argument relations in joins determine the cost of
the join
The left argument of the join is
 Called
the build relation
 Assumed to be smaller
 Stored in main-memory
Significance of Left and Right Join
Arguments
250

The right argument of the join is
 Called
the probe relation
 Read a block at a time
 Its tuples are matched with those of build relation

The join algorithms which distinguish between the
arguments are:
 One-pass
join
 Nested-loop join
 Index join
Join Trees
251




Order of arguments is important for joining two
relations
Left argument, since stored in main-memory, should
be smaller
With two relations only two choices of join tree
With more than two relations, there are n! ways to
order the arguments and therefore n! join trees,
where n is the no. of relations
Join Trees
252
Join Trees
253





Total # of tree shapes T(n) for n relations is given by the
recurrence T(n) = Σi T(i)·T(n−i), with:
T(1) = 1
T(2) = 1
T(3) = 2
T(4) = 5 … etc.
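The recurrence can be computed directly; the values are the Catalan numbers:

```python
def tree_shapes(n):
    # T(n): number of binary tree shapes over n leaves
    # T(1) = 1, T(n) = sum_{i=1..n-1} T(i) * T(n-i)
    T = [0] * (n + 1)
    T[1] = 1
    for k in range(2, n + 1):
        T[k] = sum(T[i] * T[k - i] for i in range(1, k))
    return T[n]

print([tree_shapes(n) for n in range(1, 6)])  # -> [1, 1, 2, 5, 14]
```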
Left-Deep Join Trees
254

Consider 4 relations. Different ways to join them are
as follows
255




In fig (a) all the right children are leaves. This is a
left-deep tree.
In fig (c) all the left children are leaves. This is a
right-deep tree.
Fig (b) is a bushy tree.
Considering left-deep trees is advantageous for
deciding join orders.
Join order
256

Join order selection
 Left-deep join trees: join A1, A2, A3, …, An along a single
left spine, choosing which Ai to join in at each step
 Dynamic programming
 Best plan computed for each subset of relations

Best plan (A1, …, An) = min-cost plan of (
 Best plan(A2, …, An) ⋈ A1,
 Best plan(A1, A3, …, An) ⋈ A2,
 …,
 Best plan(A1, …, An-1) ⋈ An )
Dynamic Programming to Select a Join
Order and Grouping
257

Three choices to pick an order for the join of many relations are:
 Consider all of the relations
 Consider a subset
 Use a heuristic to pick one
Dynamic programming is used either to consider all or a subset
 Construct a table of costs based on relation size
 Remember only the minimum entry, which will be required to proceed

Dynamic Programming to Select a Join
Order and Grouping
258
Dynamic Programming to Select a Join
Order and Grouping
259
Dynamic Programming to Select a Join
Order and Grouping
260
Dynamic Programming to Select a Join
Order and Grouping
261
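The table-driven approach behind the figures above can be sketched as a dynamic program over subsets of relations. The size model here (product of the argument sizes divided by 10) is a stand-in assumption for illustration, not the book's estimation rules:

```python
from itertools import combinations

def best_join_order(sizes):
    # sizes: {relation_name: estimated tuple count}
    # cost of a plan = sum of the sizes of its intermediate results
    rels = sorted(sizes)
    best = {}                           # frozenset -> (cost, size, order)
    for r in rels:
        best[frozenset([r])] = (0, sizes[r], (r,))
    for k in range(2, len(rels) + 1):
        for subset in combinations(rels, k):
            s = frozenset(subset)
            choices = []
            for r in subset:            # left-deep: r joins the rest
                c_rest, sz_rest, order = best[s - {r}]
                sz = sz_rest * sizes[r] // 10   # illustrative size model
                choices.append((c_rest + sz, sz, order + (r,)))
            best[s] = min(choices)      # remember only the minimum entry
    return best[frozenset(rels)]

cost, size, order = best_join_order({"R": 1000, "S": 100, "T": 10})
print(order, cost)   # -> ('S', 'T', 'R') 10100
```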
A Greedy Algorithm for Selecting a
Join Order
262



It is expensive to use an exhaustive method like
dynamic programming.
A better approach is to use a join-order heuristic for
query optimization.
A greedy algorithm is an example of that:
 Make one decision at a time about the order of joins and
never backtrack on a decision once made.
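A greedy sketch of that idea, under an illustrative size model (join size taken as the product of the argument sizes divided by 10; this is an assumption for the example, not the book's cost model):

```python
def greedy_join_order(sizes):
    # pick the pair whose join is smallest, then repeatedly join in the
    # relation giving the smallest next result; never backtrack
    remaining = dict(sizes)
    pair = min(((a, b) for a in remaining for b in remaining if a < b),
               key=lambda p: remaining[p[0]] * remaining[p[1]])
    order = list(pair)
    size = remaining.pop(pair[0]) * remaining.pop(pair[1]) // 10
    while remaining:
        nxt = min(remaining, key=lambda r: size * remaining[r])
        size = size * remaining.pop(nxt) // 10
        order.append(nxt)
    return order, size

order, size = greedy_join_order({"R": 1000, "S": 100, "T": 10})
print(order)   # -> ['S', 'T', 'R']
```

On this input the greedy choice happens to match the dynamic-programming optimum, but in general it can miss it, which is the price of never backtracking.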
Completing the Physical-Query-Plan and
Chapter 16 Summary (16.7-16.8)
CS257 Spring 2009
Professor Tsau Lin
Student: Suntorn Sae-Eung
Donavon Norwood
Outline
264
16.7 Completing the Physical-Query-Plan
I. Choosing a Selection Method
II. Choosing a Join Method
III. Pipelining Versus Materialization
IV. Pipelining Unary Operations
V. Pipelining Binary Operations
VI. Notation for Physical Query Plan
VII. Ordering the Physical Operations
16.8 Summary of Chapter 16
Before completing the Physical-Query-Plan
265

A query previously has been
 Parsed and Preprocessed (16.1)
 Converted to Logical Query Plans (16.3)
 Had the Costs of its Operations Estimated (16.4)
 Had costs determined by Cost-Based Plan Selection (16.5)
 Had the costs of join operations weighed by choosing an Order
for Joins (16.6)
16.7 Completing the Physical-Query-Plan
266
3 topics related to turning the logical plan into a complete
physical plan:

1. Choosing physical implementations such as
Selection and Join methods
2. Decisions regarding intermediate results
(Materialized or Pipelined)
3. Notation for physical-query-plan operators
I. Choosing a Selection Method (A)
267

Algorithms for each selection operator:
1. Can we use an existing index on an attribute?
 If yes, index-scan; otherwise table-scan.
2. After retrieving all condition-satisfying tuples in (1),
filter them with the rest of the selection conditions.
Choosing a Selection Method(A) (cont.)
268

Recall: Cost of query = # disk I/O’s

How costs for various plans are estimated for a σC(R) operation:
1. Cost of table-scan algorithm
 a) B(R) if R is clustered
 b) T(R) if R is not clustered
2. Cost of a plan picking an equality term (e.g. a = 10) w/ index-scan
 a) B(R) / V(R, a) for a clustering index
 b) T(R) / V(R, a) for a nonclustering index
3. Cost of a plan picking an inequality term (e.g. b < 20) w/ index-scan
 a) B(R) / 3 for a clustering index
 b) T(R) / 3 for a nonclustering index
Example
269
Selection: σx=1 AND y=2 AND z<5 (R)
- Where the parameters of R(x, y, z) are:
T(R) = 5000, B(R) = 200, V(R, x) = 100, and V(R, y) = 500
- Relation R is clustered
- x and y have nonclustering indexes; only the index on z is clustering.
Example (cont.)
270
Selection options:
1. Table-scan  filter x, y, z. Cost is B(R) = 200 since R is
clustered.
2. Use index on x = 1  filter on y, z. Cost is 50 since T(R) /
V(R, x) is (5000/100) = 50 tuples; the index is not clustering.
3. Use index on y = 2  filter on x, z. Cost is 10 since T(R) /
V(R, y) is (5000/500) = 10 tuples using a nonclustering index.
4. Index-scan on the clustering index w/ z < 5  filter x, y. Cost is
about B(R)/3 = 67.
Example (cont.)
271
Costs
option 1 = 200
option 2 = 50
option 3 = 10 
option 4 = 67
The lowest cost is option 3.
Therefore, the preferred physical plan
1. retrieves all tuples with y = 2,
2. then filters for the remaining two conditions (x, z).
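The cost comparison above can be reproduced in a few lines (the option labels are mine; `round` stands in for the "about B(R)/3" estimate):

```python
T_R, B_R = 5000, 200
V = {"x": 100, "y": 500}

costs = {
    "table-scan, filter x,y,z": B_R,                 # R is clustered
    "index on x=1 (nonclustering)": T_R // V["x"],   # 50 tuples
    "index on y=2 (nonclustering)": T_R // V["y"],   # 10 tuples
    "clustering index on z<5": round(B_R / 3),       # about 67 blocks
}
best = min(costs, key=costs.get)
print(best, costs[best])   # -> index on y=2 (nonclustering) 10
```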
II. Choosing a Join Method
272

Determine costs associated with each join algorithm:
1. One-pass join and nested-loop join devote enough buffers to
the join.
2. Sort-join is preferred when attributes are pre-sorted or two or
more joins are on the same attribute, such as
(R(a, b) ⋈ S(a, c)) ⋈ T(a, d)
- where sorting R and S on a will produce the result of R ⋈ S
sorted on a and usable directly in the next join.
Choosing a Join Method (cont.)
273
3. Index-join for a join with a high chance of using an index
created on the join attribute, such as R(a, b) ⋈ S(b, c).
4. Hashing join is the best choice for unsorted or non-indexed
relations which need a multipass join.
III. Pipelining Versus Materialization
274

Materialization (naïve way)

 store the (intermediate) result of each operation on disk
Pipelining (more efficient way)

 Interleave the execution of several operations: the tuples produced by
one operation are passed directly to the operation that uses them

 store the (intermediate) result of each operation in buffers, which are
implemented in main memory
IV. Pipelining Unary Operations
275


Unary = a-tuple-at-a-time or full relation
Selection and projection are the best candidates
for pipelining.
[Figure: R is read through an input buffer into a unary
operation whose output buffer feeds the input buffer of the
next unary operation, using M-1 buffers.]
Pipelining Unary Operations (cont.)
276

Pipelining Unary Operations are implemented by iterators
V. Pipelining Binary Operations
277




Binary operations: ∪, ∩, −, ⋈, ×
The results of binary operations can also be
pipelined.
Use one buffer to pass the result to its consumer, one
block at a time.
The extended example shows tradeoffs and
opportunities.
Example
278

Consider the physical query plan for the expression
(R(w, x) ⋈ S(x, y)) ⋈ U(y, z)

Assumptions:
 R occupies 5,000 blocks; S and U each 10,000 blocks.
 The intermediate result R ⋈ S occupies k blocks for some k.
 Both joins will be implemented as hash-joins, either one-pass or
two-pass depending on k.
 There are 101 buffers available.
Example (cont.)
279
First consider the join R ⋈ S; neither relation
fits in the buffers.
 Needs a two-pass hash-join to partition
R into 100 buckets
(the maximum possible); each bucket has 50 blocks.
 The 2nd-pass hash-join uses 51 buffers, leaving the remaining 50
buffers for joining the result of R ⋈ S with U.
Example (cont.)
280
Case 1: suppose k ≤ 49; the result of R ⋈ S
occupies at most 49 blocks.
Steps:
1. Pipeline R ⋈ S into 49 buffers.
2. Organize them for lookup as a hash table.
3. Use the one buffer left to read each block of U in turn.
4. Execute the second join as a one-pass join.
Example (cont.)
281

The total number of I/O’s is 55,000:
 45,000 for the two-pass hash join
of R and S
 10,000 to read U for the one-pass
hash join of (R ⋈ S) ⋈ U.
Example (cont.)
282

Case 2: suppose k > 49 but ≤ 5,000; we can still
pipeline, but need another strategy in which the intermediate
result joins with U in a 50-bucket, two-pass hash-join.
Steps are:
1. Before starting on R ⋈ S, hash U into 50 buckets of 200 blocks
each.
2. Perform the two-pass hash join of R and S using 51 buffers as in
case 1, placing the results in the 50 remaining buffers to form
50 buckets for the join of R ⋈ S with U.
3. Finally, join R ⋈ S with U bucket by bucket.
Example (cont.)
283

The number of disk I/O’s is:
 20,000 to read U and write its tuples into buckets
 45,000 for the two-pass hash-join of R and S
 k to write out the buckets of R ⋈ S
 k + 10,000 to read the buckets of R ⋈ S and U in the
final join

The total cost is 75,000 + 2k.
Example (cont.)
284

Compare increasing I/O’s between case 1 and
case 2:
 k ≤ 49 (case 1)
 Disk I/O’s is 55,000
 49 < k ≤ 5000 (case 2)
 k=50, I/O’s is 75,000+(2*50) = 75,100
 k=51, I/O’s is 75,000+(2*51) = 75,102
 k=52, I/O’s is 75,000+(2*52) = 75,104
Notice: I/O’s jump discretely as k increases from 49 to 50.
Example (cont.)
285

Case 3: k > 5,000; we cannot perform the two-pass
join in the 50 buffers available if the result of R ⋈ S is
pipelined. Steps are:
1. Compute R ⋈ S using a two-pass join and store the result on
disk.
2. Join the result of (1) with U, using a two-pass join.
Example (cont.)
286

The number of disk I/O’s is:
 45,000 for the two-pass hash-join of R and S
 k to store R ⋈ S on disk
 30,000 + 3k for the two-pass join of U with R ⋈ S

The total cost is 75,000 + 4k.
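The three cases can be summarized as one cost function of k, a direct transcription of the totals derived above:

```python
def join_io_cost(k):
    # total disk I/O's for (R join S) join U as a function of
    # k = B(R join S), following the three cases above
    if k <= 49:                 # case 1: pipelined, one-pass second join
        return 45_000 + 10_000
    elif k <= 5_000:            # case 2: 50-bucket two-pass second join
        return 75_000 + 2 * k
    else:                       # case 3: materialize, then two-pass join
        return 75_000 + 4 * k

print(join_io_cost(49), join_io_cost(50), join_io_cost(6_000))
# -> 55000 75100 99000
```

The discontinuity at k = 50 is the jump from 55,000 to 75,100 I/O's noted earlier.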
Example (cont.)
287

In summary, costs of physical plan as function of R
S size.
VI. Notation for Physical Query Plans
288
Several types of operators:

1.
2.
3.
4.

Operators for leaves
(Physical) operators for Selection
(Physical) Sorts Operators
Other Relational-Algebra Operations
In practice, each DBMS uses its own internal
notation for physical query plan.
Notation for Physical Query Plans (cont.)
289
1. Operators for leaves
A leaf operand is replaced in the LQP tree by:
 TableScan(R): read all blocks
 SortScan(R, L): read in order according to L
 IndexScan(R, C): scan the index on attribute A using a
condition C of the form Aθc.
 IndexScan(R, A): scan the index on attribute R.A. This
behaves like TableScan but is more efficient if R is not
clustered.
Notation for Physical Query Plans (cont.)
290
2. (Physical) operators for Selection
The logical operator σC(R) is often combined with
access methods.
 If σC(R) is replaced by Filter(C), and there is no
index on R or on an attribute in condition C:
 Use TableScan or SortScan(R, L) to access R
 If condition C is of the form Aθc AND D for some condition D,
and there is an index on R.A, then we may
 Use the operator IndexScan(R, Aθc) to access R, and
 Use Filter(D) in place of the selection σC(R)
Notation for Physical Query Plans (cont.)
291
3. (Physical) Sort Operators
 Sorting can occur at any point in the physical plan, which
uses the notation SortScan(R, L).
 It is common to use an explicit operator Sort(L) to
sort a relation that is not stored.
 It can be applied at the top of the physical-query-plan tree if
the result needs to be sorted for an ORDER BY clause
(the τ operator).
Notation for Physical Query Plans (cont.)
292
4. Other Relational-Algebra Operations
Descriptive text and annotations elaborate:
 The operation performed, e.g. join or grouping.
 Necessary parameters, e.g. the theta-join condition or the list of
elements in a grouping.
 A general strategy for the algorithm, e.g. sort-based,
hash-based, or index-based.
 A decision about the number of passes to be used, e.g. one-pass,
two-pass, or multipass.
 An anticipated number of buffers the operation will
require.
Notation for Physical Query Plans (cont.)
293

Example of a physical-query-plan

A physical-query-plan in example 16.36 for the case k > 5000
 TableScan
 Two-pass hash join
 Materialize (double line)
 Store operator
Notation for Physical Query Plans (cont.)
294

Another example

A physical-query-plan in example 16.36 for the case k < 49
 TableScan
 (2) Two-pass hash join
 Pipelining
 Different buffer needs
 Store operator
Notation for Physical Query Plans (cont.)
295

A physical-query-plan in example 16.35


Use Index on condition y = 2 first
Filter with the rest condition later on.
296
VII. Ordering of Physical
Operations
The PQP is represented as a tree structure that
implies an order of operations.
Still, the order of evaluation of interior nodes may
not always be clear:
 Iterators are used in a pipelined manner
 The overlapped execution of various nodes can make
a strict “ordering” meaningless.
Ordering of Physical Operations (cont.)
297
3 rules summarize the ordering of events in a PQP
tree:

1. Break the tree into sub-trees at each edge that
represents materialization.
 Execute one subtree at a time.
2. Order the execution of the subtrees
 Bottom-up
 Left-to-right
3. All nodes of each sub-tree are executed
simultaneously.
Summary of Chapter 16
298
In this part of the presentation I will talk about the
main topics of Chapter 16.
COMPILATION OF QUERIES
299


Compilation means turning a query into a physical
query plan, which can be implemented by the query
engine.
Steps of query compilation:
 Parsing
 Semantic checking
 Selection of the preferred logical query plan
 Generation of the best physical plan
THE PARSER
300




The first step of SQL query processing.
Generates a parse tree.
Nodes in the parse tree correspond to SQL
constructs.
Similar to the compiler of a programming language.
VIEW EXPANSION
301



A very critical part of query compilation.
Expands the view references in the query tree to
the actual view.
Provides opportunities for the query optimization.
SEMANTIC CHECKING
302




Checks the semantics of a SQL query.
Examines a parse tree.
Checks :
 Attributes
 Relation names
 Types
Resolves attribute references.
CONVERSION TO A LOGICAL
QUERY PLAN
303



Converts a semantically checked parse tree to an algebraic
expression.
The conversion is straightforward, but subqueries need
to be optimized.
The two-argument selection approach can be used.
ALGEBRAIC TRANSFORMATION
304


Many different ways to transform a logical query plan to an actual plan
using algebraic transformations.
The laws used for this transformation:
 Commutative and associative laws
 Laws involving selection
 Pushing selection
 Laws involving projection
 Laws about joins and products
 Laws involving duplicate elimination
 Laws involving grouping and aggregation
ESTIMATING SIZES OF RELATIONS
305



The true running time is taken into consideration when
selecting the best logical plan.
The two factors that affect the estimated
sizes of relations the most:
 Size of relations (No. of tuples)
 No. of distinct values for each attribute of each
relation
Histograms are used by some systems.
COST BASED OPTIMIZING
306


The best physical query plan is the least costly
plan.
Factors that decide the cost of a query plan:
 Order and grouping of operations like joins, unions,
and intersections.
 The join algorithms used, e.g. nested-loop or hash join.
 Scanning and sorting operations.
 Storing intermediate results.
PLAN ENUMERATION STRATEGIES
307

Common approaches for searching the space for the
best physical plan:
 Dynamic programming: tabularizing the best plan
for each subexpression
 Selinger-style optimization: also keep the sort order of
results as part of the table
 Greedy approaches: making a series of locally
optimal decisions
 Branch-and-bound: start from a heuristically found plan
and use its cost to prune more expensive plans
LEFT-DEEP JOIN TREES
308



Left – Deep Join Trees are the binary trees with a
single spine down the left edge and with leaves as
right children.
This strategy reduces the number of plans to be
considered for the best physical plan.
Restrict the search to Left – Deep Join Trees when
picking a grouping and order for the join of several
relations.
PHYSICAL PLANS FOR SELECTION
309



Breaking a selection into an index-scan of relation,
followed by a filter operation.
The filter then examines the tuples retrieved by the
index-scan.
Allows only those to pass which meet the portions of
selection condition.
PIPELINING VERSUS
MATERIALIZING
310





The flow of data between operators can be controlled to implement
“pipelining”.
Intermediate results should be removed from main memory to save
space for other operators.
This technique is implemented using “materialization”.
Both pipelining and materialization should be considered by the
physical-query-plan generator.
With pipelining, an operator consumes the result of another operator
directly, passed through main memory.
Reference
311
[1] H. Garcia-Molina, J. Ullman, and J. Widom, “Database
Systems: The Complete Book,” second edition: p. 897–913,
Prentice Hall, New Jersey, 2008.
Concurrency Control
Chiu Luk
CS257 Database Systems Principles
Spring 2009
Concurrency Control
313



Concurrency control in database management systems (DBMS) ensures
that database transactions are performed concurrently without the
concurrency violating the data integrity of a database.
Executed transactions should follow the ACID rules. The DBMS must
guarantee that only serializable (unless Serializability is intentionally
relaxed), recoverable schedules are generated.
It also guarantees that no effect of committed transactions is lost, and no
effect of aborted (rolled back) transactions remains in the related
database.
Issues with Concurrency: Example
314
Bank database: 3 Accounts
A = 500
Account
Balances
B = 500
C = 500
Property: A + B + C = 1500
Money does not leave the system
Issues with Concurrency: Example
315
Transaction T1: Transfer 100 from A to B
A = 500, B = 500, C = 500
Read (A, t)
t = t - 100
Write (A, t)
Read (B, t)
t = t + 100
Write (B, t)
A = 400, B = 600, C = 500
Issues with Concurrency: Example
316
Transaction T2: Transfer 100 from A to C
Read (A, s)
s = s - 100
Write (A, s)
Read (C, s)
s = s + 100
Write (C, s)
Transaction T1 / Transaction T2 (interleaved)
317
A = B = C = 500 initially:
T1: Read (A, t); t = t - 100
T2: Read (A, s); s = s - 100; Write (A, s)   → A = 400
T1: Write (A, t)                             → A = 400
T1: Read (B, t); t = t + 100; Write (B, t)   → B = 600
T2: Read (C, s); s = s + 100; Write (C, s)   → C = 600
400 + 600 + 600 = 1600
Transaction T1 / Transaction T2 (interleaved)
318
A = B = C = 500 initially:
T1: Read (A, t); t = t - 100; Write (A, t)   → A = 400
T2: Read (A, s); s = s - 100; Write (A, s)   → A = 300
T1: Read (B, t); t = t + 100; Write (B, t)   → B = 600
T2: Read (C, s); s = s + 100; Write (C, s)   → C = 600
300 + 600 + 600 = 1500
Scheduler
319
Serial and Serializable Schedules
320
In the field of databases, a schedule is a list of actions (i.e. reading, writing,
aborting, committing) from a set of transactions.
 In this example, Schedule D consists of 3 transactions T1, T2, T3. The schedule
describes the actions of the transactions as seen by the DBMS. T1 reads and
writes object X, then T2 reads and writes object Y, and finally T3 reads
and writes object Z. This is an example of a serial schedule, because the actions
of the 3 transactions are not interleaved.
Serial and Serializable Schedules
321


A schedule that is equivalent to a serial schedule has the serializability property.
In schedule E, the order in which the actions of the transactions are executed is not the same as in D, but
in the end, E gives the same result as D.
Conflict actions
322
Two or more actions are said to be in conflict if:
 The actions belong to different transactions,
 At least one of the actions is a write operation, and
 The actions access the same object (read or write).
The following set of actions is conflicting:
 T1:R(X), T2:W(X), T3:W(X)
While the following sets of actions are not:
 T1:R(X), T2:R(X), T3:R(X)
 T1:R(X), T2:W(Y), T3:R(X)
Conflict Serializable
323



A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to one or more
serial schedules.
Another definition of conflict-serializability is that a schedule is conflict-serializable if and only if there
exists an acyclic precedence graph / serializability graph for the schedule.
The example schedule is conflict-equivalent to the serial schedule <T1,T2>, but not to <T2,T1>.
Serial Schedule
324
T1 first, then T2 (A = B = C = 500 initially):
T1: Read (A, t); t = t - 100; Write (A, t)   → A = 400
T1: Read (B, t); t = t + 100; Write (B, t)   → B = 600
T2: Read (A, s); s = s - 100; Write (A, s)   → A = 300
T2: Read (C, s); s = s + 100; Write (C, s)   → C = 600
300 + 600 + 600 = 1500
Serial Schedule
325
T2 first, then T1 (A = B = C = 500 initially):
T2: Read (A, s); s = s - 100; Write (A, s)   → A = 400
T2: Read (C, s); s = s + 100; Write (C, s)   → C = 600
T1: Read (A, t); t = t - 100; Write (A, t)   → A = 300
T1: Read (B, t); t = t + 100; Write (B, t)   → B = 600
300 + 600 + 600 = 1500
Serial Schedule
326
[Figure: S0 →T1→ S1 →T2→ S2 → … →Tn→ Sn; each Si is a
consistent state.]
Conflict Serializability
327



Two actions Ai and Aj executed on the same data object by
Ti and Tj conflict if at least one of them is a write operation.
Let Ai and Aj be consecutive non-conflicting actions that
belong to different transactions. We can swap Ai and Aj
without changing the result.
Two schedules are conflict-equivalent if they can be turned
one into the other by a sequence of non-conflicting swaps of
adjacent actions.
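The acyclic-precedence-graph test can be sketched directly; the schedule encoding as (transaction, action, element) triples is mine:

```python
def conflict_serializable(schedule):
    # schedule: list of (txn, action, element), action in {"R", "W"}.
    # Build the precedence graph: edge Ti -> Tj whenever an action of
    # Ti conflicts with a LATER action of Tj, then test for cycles.
    edges = {}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.setdefault(ti, set()).add(tj)

    WHITE, GRAY, BLACK = 0, 1, 2      # cycle detection via DFS colors
    color = {}
    def dfs(u):
        color[u] = GRAY
        for v in edges.get(u, ()):
            if color.get(v, WHITE) == GRAY:
                return True            # back edge -> cycle
            if color.get(v, WHITE) == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(color.get(u, WHITE) == WHITE and dfs(u) for u in nodes)

# the bad bank interleaving (slide 317): cyclic graph, not serializable
bad = [("T1","R","A"), ("T2","R","A"), ("T2","W","A"), ("T1","W","A"),
       ("T1","R","B"), ("T1","W","B"), ("T2","R","C"), ("T2","W","C")]
# the good interleaving (slide 318): acyclic, conflict-serializable
good = [("T1","R","A"), ("T1","W","A"), ("T2","R","A"), ("T2","W","A"),
        ("T1","R","B"), ("T1","W","B"), ("T2","R","C"), ("T2","W","C")]
print(conflict_serializable(bad), conflict_serializable(good))
# -> False True
```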
Conflict Serializability
328
The schedule below interleaves T1 and T2:
r1(A); w1(A); r2(A); r1(B); w2(A); w1(B); r2(B); w2(B)
Conflict Serializability
329
Swap the adjacent non-conflicting pair r2(A), r1(B):
r1(A); w1(A); r1(B); r2(A); w2(A); w1(B); r2(B); w2(B)
Conflict Serializability
330
Swap the adjacent non-conflicting pair w2(A), w1(B):
r1(A); w1(A); r1(B); r2(A); w1(B); w2(A); r2(B); w2(B)
Conflict Serializability
331
One more swap of r2(A), w1(B) yields the serial
schedule <T1, T2>:
r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B)
References
332

Database Systems: The Complete Book (2nd Edition) (Hardcover) by Hector Garcia-Molina (Author),
Jeffrey D. Ullman (Author), Jennifer Widom (Author). Publisher: Prentice Hall.

http://en.wikipedia.org/wiki/Concurrency_control

http://www.utdallas.edu/~mxk055100/db07files/serilizable-defs.ppt

http://en.wikipedia.org/wiki/Schedule_(computer_science)#Serializable

http://www.cs.duke.edu/~shivnath/courses/fall06/Lectures/11_serial.ppt
333
Concurrency Control
(18.3-18.4)
CS257 Spring/2009
Professor: Tsau Lin
Student: Donavon Norwood
Suntorn Sae-Eung
INTRODUCTION
334

Enforcing serializability by locks
 Locks
 Locking scheduler
 Two-phase locking
Locking systems with several lock modes
 Shared and exclusive locks
 Compatibility matrices
 Upgrading/updating locks
 Incrementing locks
18.3 Locks
335
It works as follows:
 A request arrives from a transaction
 The scheduler checks the lock table
 A serializable schedule of actions is generated
18.3.1 Consistency of transactions
336

Actions and locks must relate to each other
 Transactions can read & write an element only if they hold
a lock on it and have not released the lock.
 Unlocking every locked element is compulsory.

Legality of schedules
 No two transactions can acquire a lock on the same
element without the prior one releasing it.
18.3.2 Locking scheduler
337


Grants lock requests only if the result is a legal schedule.
The lock table stores information about the current locks on
the elements.
18.3.2 The locking scheduler
(contd.)
338

A legal schedule of consistent transactions that,
unfortunately, is not serializable.
18.3.2 The locking scheduler
(contd.)
339

The locking scheduler delays requests that would
result in an illegal schedule.
18.3.3 Two-phase locking (2PL)
340



Guarantees that a legal schedule of consistent
transactions is conflict-serializable.
All lock requests precede all unlock requests.
 The growing phase:
 Obtain all the locks; no unlocks allowed.
 The shrinking phase:
 Release the locks; no new locks allowed.
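The growing/shrinking condition can be checked mechanically; a sketch assuming a schedule is given as (transaction, "lock"/"unlock") pairs:

```python
def is_two_phase(actions):
    # In 2PL, every lock action of a transaction must precede every
    # unlock action of that transaction.
    shrinking = set()                  # txns that have started unlocking
    for txn, op in actions:
        if op == "unlock":
            shrinking.add(txn)
        elif txn in shrinking:         # a lock after an unlock: not 2PL
            return False
    return True

ok  = [("T1", "lock"), ("T1", "lock"), ("T1", "unlock"), ("T1", "unlock")]
bad = [("T1", "lock"), ("T1", "unlock"), ("T1", "lock")]
print(is_two_phase(ok), is_two_phase(bad))   # -> True False
```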
Working of Two-Phase locking
341


Assures serializability.
Two protocols for 2PL:
 Strict two-phase locking: a transaction holds all its
exclusive locks till commit / abort.
 Rigorous two-phase locking: a transaction holds all
locks till commit / abort.

For a non-2PL transaction Ti, it is possible to find a 2PL
transaction Tj and a schedule S of Ti and Tj that is not
conflict-serializable.
Failure of 2PL.
342

2PL fails to provide protection against deadlocks.
18.4 Locking Systems with Several
Lock Modes
343

In 18.3, a transaction must lock a database element
X whether it reads or writes it.
 There is no reason why several transactions could not read X
at the same time, as long as none writes X.

Introduce locking schemes:
 Shared/Read Lock (for reading)
 Exclusive/Write Lock (for writing)
18.4.1 Shared & Exclusive Locks
344

Consistency of Transactions
 Cannot write without an Exclusive Lock
 Cannot read without holding some lock

Consider the lock for writing to be “stronger” than for reading.
This basically works on 2 principles:
 A read action can proceed only under a shared or an exclusive
lock
 A write action can proceed only under an exclusive lock

All locks need to be unlocked before commit
18.4.1 Shared & Exclusive Locks
(cont.)
345

Two-phase locking of transactions
 Locking must precede unlocking
 Notation:
sli (X) – Ti requests a shared lock on DB element X
xli (X) – Ti requests an exclusive lock on DB element X
ui (X) – Ti relinquishes whatever lock it has on X
18.4.1 Shared & Exclusive Locks
(cont.)
346

Legality of Schedules

An element may be locked exclusively by one transaction or
by several in shared mode, but not both
18.4.2 Compatibility Matrices
347

A convenient way to describe lock-management
policies
 Rows correspond to a lock already held on an element by
another transaction
 Columns correspond to the mode of lock requested
 Example:

                 Lock requested
Lock held          S      X
   S              YES     NO
   X              NO      NO
18.4.3 Upgrading Locks
348



A transaction (T) taking a shared lock is friendly toward
other transactions.
When T wants to read X and later write a new value of X:
 At first, take a shared lock on X
 T performs operations on X (which may take a long time)
 When T is ready to write the new value, “upgrade” the lock
to an exclusive lock on X.
Transactions with unpredictable read/write needs can use
upgrading locks.
18.4.3 Upgrading Locks (cont.)
349

Observe the example

T1 cannot take an exclusive lock on B until all locks on B are released.
18.4.3 Upgrading Locks (cont.)
350

Upgrading can easily cause a “deadlock”.
 Both transactions want to upgrade on the same
element.
18.4.4 Update locks
351

A third lock mode, resolving the deadlock
problem with upgrading locks.
 Only an “update lock” can be upgraded to a write lock
later.
 An update lock can be granted on X when there are
already shared locks on X.
 Once there is an update lock, it prevents additional
locks of any kind, and later changes to an exclusive
lock.

 Notation: uli (X)
18.4.4 Update locks (cont.)
352
• Compatibility matrix (asymmetric)

                 Lock requested
Lock held          S      X      U
   S              YES     NO    YES
   X              NO      NO     NO
   U              NO      NO     NO
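Grant-checking against the asymmetric SXU matrix above is a table lookup; the dictionary encoding and function names here are mine:

```python
# rows: mode of a lock already held on the element,
# columns: mode requested (the asymmetric SXU matrix above)
COMPAT = {
    "S": {"S": True,  "X": False, "U": True},
    "X": {"S": False, "X": False, "U": False},
    "U": {"S": False, "X": False, "U": False},
}

def can_grant(requested, held_modes):
    # a request is granted only if it is compatible with every lock
    # currently held on the element (vacuously true if none are held)
    return all(COMPAT[h][requested] for h in held_modes)

print(can_grant("U", ["S", "S"]))   # -> True  (U alongside shared locks)
print(can_grant("S", ["U"]))        # -> False (the asymmetry)
```

The asymmetry is the point: an update lock slips in beside readers, but once held it blocks every new request, so two transactions can no longer deadlock by both trying to upgrade.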
18.4.4 Update locks (cont.)
353

Example
18.4.5 Increment Locks
354

A kind of lock which is useful for transactions that only
increment or decrement a value,
e.g. money transfers between bank accounts.
If 2 transactions (T1, T2) add constants to the same
database element (X),
 it doesn’t matter which goes first, but no reads are allowed
in between the two transactions' processing.
 See the following exhibits.
18.4.5 Increment Locks (cont.)
355
CASE 1
A = 5 → T1: INC (A,2) → A = 7 → T2: INC (A,10) → A = 17
CASE 2
A = 5 → T2: INC (A,10) → A = 15 → T1: INC (A,2) → A = 17
18.4.5 Increment Locks (cont.)
356
 What if the two increments interleave without locks?
Both read A = 5: T1 computes A = 7, T2 computes A = 15;
whichever writes last wins, leaving A = 7 or A = 15,
but A != 17.
18.4.5 Increment Locks (cont.)
357

INC (A, c):
 the increment action of adding constant ‘c’ to database element A
 stands for an atomic execution of
READ(A,t); t = t + c; WRITE(A,t);
Notation:
 ili (X) – action of Ti requesting an increment lock on X
 inci (X) – action of Ti incrementing X by some constant; we don’t
care about the value of the constant.
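The atomicity requirement can be sketched with a mutex standing in for the increment lock; because addition commutes, either ordering of the two INCs yields 17 (the helper names are illustrative):

```python
import threading

def inc(db, lock, A, c):
    # INC(A, c): atomic READ(A,t); t = t + c; WRITE(A,t)
    with lock:
        t = db[A]
        t = t + c
        db[A] = t

db, guard = {"A": 5}, threading.Lock()
t1 = threading.Thread(target=inc, args=(db, guard, "A", 2))
t2 = threading.Thread(target=inc, args=(db, guard, "A", 10))
t1.start(); t2.start(); t1.join(); t2.join()
print(db["A"])   # -> 17 regardless of which increment ran first
```

Without the lock, both threads could read 5 before either writes, reproducing the A != 17 anomaly from the exhibit above.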
18.4.5 Increment Locks (cont.)
358
• Compatibility matrix

                 Lock requested
Lock held          S      X      I
   S              YES     NO     NO
   X              NO      NO     NO
   I              NO      NO    YES
18.4.5 Increment Locks (cont.)
359

Example
References
360
 H. Garcia-Molina, J. Ullman, and J. Widom, “Database
Systems: The Complete Book,” second edition: chapter
18.3–18.4, p. 897–913, Prentice Hall, New Jersey, 2008.
Concurrency Control
361
Chapter 18
Section 18.5
Presented by
Khadke, Suvarna
CS 257
(Section II) Id 213
Overview
362

Assume knowledge of:
 Locks
 Two-phase locking
 Lock modes: shared, exclusive, update

A simple scheduler architecture based on the
following principles:
 Insert lock actions into the stream of reads, writes, and
other actions
 Release locks when the transaction manager tells it that
the transaction will commit or abort
Scheduler That Inserts Lock Actions
into the transaction's request stream
363
Scheduler That Inserts Lock Actions
364
Actions requested by a transaction are generally transmitted through
the scheduler and executed on the database. Under some circumstances
a transaction is delayed, waiting for a lock, and its requests are not
(yet) transmitted to the database. The scheduler has two parts:
 Part I: takes the stream of requests generated by the transaction and
inserts appropriate lock actions ahead of the database operations
(read, write, or update)
 Part II: takes actions (a lock or a db operation) from Part I and
executes them appropriately:
1. Determine the transaction (T) the action belongs to and the status
of T (delayed or not). If T is not delayed, then
the database-access action is transmitted to the database and executed.
Scheduler That Inserts Lock Actions
365
2. If a lock action is received by Part II, it checks the lock table to see whether the lock can be granted:
   i) if granted, the lock table is modified to include the granted lock;
   ii) if not granted, the lock table is updated to record the requested lock, and Part II delays transaction T.
3. When a transaction T commits or aborts, Part I is notified by the transaction manager and releases all of T's locks. If any transactions are waiting for those locks, Part I notifies Part II.
4. When Part II is notified that a lock on some database element is available, it determines the next transaction T' that can be given that lock and allows it to continue.
The Lock Table
366
- A relation that associates database elements with locking information about those elements
- Implemented with a hash table, using database elements as the hash key
- Its size is proportional to the number of locked elements only, not to the size of the entire database

DB element A → lock information for A
Lock Table Entries Structure
367
The sort of information found in a lock-table entry, for the SXU scheme on a typical database element A, is a tuple with the following components:
1) Group mode: a summary of the most stringent condition that a transaction requesting a new lock on A faces:
   - S: only shared locks are held
   - X: one exclusive lock and no other locks
   - U: one update lock and possibly one or more shared locks
2) Wait?: whether at least one transaction is waiting for a lock on A
3) A list of the transactions that currently hold locks on A or are waiting for a lock on A
Handling Lock Requests
368
- Suppose transaction T requests a lock on A
- If there is no lock-table entry for A, then there are no locks on A, so create the entry and grant the lock request
- If the lock-table entry for A exists, use the group mode to guide the decision about the lock request
Handling Lock Requests
369
If the group mode is U (update) or X (exclusive):
- no other lock can be granted
- deny the lock request by T
- place an entry on the list saying T requests a lock, with Wait? = 'yes'

If the group mode is S (shared):
- another shared or update lock can be granted
- grant the request for an S or U lock
- create an entry for T on the list with Wait? = 'no'
- change the group mode to U if the new lock is an update lock
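The grant/deny logic driven by the group mode can be sketched as follows. This is a hedged Python sketch; the lock-table entry representation (a dict with `group`, `holders`, and `waiting`) is our own assumption.

```python
# Sketch of SXU lock-request handling guided by the group mode.
def request_lock(table, elem, t, mode):
    """Return True if transaction t is granted a `mode` lock on elem."""
    entry = table.get(elem)
    if entry is None:
        # No entry: no locks on elem, so create the entry and grant.
        table[elem] = {"group": mode, "holders": [(t, mode)], "waiting": []}
        return True
    if entry["group"] in ("U", "X"):
        # No other lock can be granted: record the request and wait.
        entry["waiting"].append((t, mode))
        return False
    # Group mode S: another S or U lock may be granted; X must wait.
    if mode in ("S", "U"):
        entry["holders"].append((t, mode))
        if mode == "U":
            entry["group"] = "U"
        return True
    entry["waiting"].append((t, mode))
    return False

table = {}
print(request_lock(table, "A", "T1", "S"))   # True: no prior locks on A
print(request_lock(table, "A", "T2", "U"))   # True: S group admits a U lock
print(request_lock(table, "A", "T3", "S"))   # False: group mode is now U
```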
Handling Unlock Requests
370
Now suppose transaction T unlocks A:
- delete T's entry on the list for A
- if T's lock is not the same as the group mode, there is no need to change the group mode
- otherwise, examine the entire list to compute the new group mode:
  - S: new group mode is S, or there is no lock at all
  - U: new group mode is S, or there is no lock at all
  - X: no locks remain at all
Handling Unlock Requests
371
If the value of Wait? is 'yes', we need to grant one or more locks, using one of the following approaches:
- First-come-first-served: grant the lock to the longest-waiting request; no starvation (no transaction waits forever for a lock)
- Priority to shared locks: grant all waiting S locks, then one U lock; grant an X lock only if no others are waiting
- Priority to upgrading: if there is a U lock waiting to upgrade to an X lock, grant that first
Reference List
372
- H. Garcia-Molina, J. D. Ullman, and J. Widom, Database Systems: The Complete Book, 2nd edition, 2008.
373
SECTION 18.7
THE TREE PROTOCOL
By :
Saloni Tamotia (215)
374
BASICS
B-Trees
- tree data structures that keep data sorted
- allow searches, insertions, and deletions
- commonly used in databases and file systems

Lock
- enforces limits on access to resources
- a way of enforcing concurrency control

Lock granularity
- the level and type of information that a lock protects
TREE PROTOCOL
375
- A kind of graph-based protocol
- An alternative to two-phase locking (2PL)
- Database elements are disjoint pieces of data
- Nodes of the tree do NOT form a hierarchy based on containment
- The only way to get to a node is through its parent
- Example: B-tree
ADVANTAGES OF TREE
PROTOCOL
376
- Unlocking takes less time than under 2PL
- Freedom from deadlocks
18.7.1 MOTIVATION FOR
TREE-BASED LOCKING
377
- Consider a B-tree index, treating individual nodes as lockable database elements.
- Concurrent use of the B-tree is not possible with the standard set of locks and 2PL.
- Therefore, a protocol is needed that can assure serializability by allowing access to the elements all the way at the bottom of the tree, even though 2PL is violated.
18.7.1 MOTIVATION FOR
TREE-BASED LOCKING (cont.)
378
Reason why "concurrent use of a B-tree is not possible with the standard set of locks and 2PL":
- every transaction must begin by locking the root node
- a 2PL transaction cannot unlock the root until all the locks it requires have been acquired
379
18.7.2 ACCESSING TREE
STRUCTURED DATA
Assumptions:
- only one kind of lock
- consistent transactions
- legal schedules
- no 2PL requirement on transactions
380
18.7.2 RULES FOR ACCESSING
TREE STRUCTURED DATA
RULES:
1. The first lock may be at any node.
2. Subsequent locks may be acquired only if the parent node currently holds a lock.
3. Nodes may be unlocked at any time.
4. A node may not be relocked, even if the node's parent is still locked.
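The four rules are mechanical enough to check automatically. Here is a small Python sketch (the action/parent encoding is our own assumption) that verifies whether one transaction's sequence of lock and unlock actions obeys the tree protocol:

```python
# Checks the four tree-protocol rules for one transaction's actions.
def obeys_tree_protocol(actions, parent):
    """actions: list of ('lock'|'unlock', node); parent: child -> parent."""
    held, ever_locked, first = set(), set(), True
    for op, node in actions:
        if op == "lock":
            if node in ever_locked:
                return False          # rule 4: no relocking, ever
            if not first and parent.get(node) not in held:
                return False          # rule 2: parent must be locked
            held.add(node)
            ever_locked.add(node)
            first = False             # rule 1: only the first lock is free
        else:
            held.discard(node)        # rule 3: unlock at any time
    return True

parent = {"B": "A", "C": "A", "D": "B"}
print(obeys_tree_protocol(
    [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("lock", "D")], parent))  # True
print(obeys_tree_protocol(
    [("lock", "A"), ("unlock", "A"), ("lock", "A")], parent))                 # False
```

Note that unlocking A early and then locking D is legal, because D's parent B is still locked; this early unlocking is exactly what 2PL would forbid.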
18.7.3 WHY TREE
PROTOCOL WORKS?
381
- The tree protocol implies a serial order on the transactions in the schedule.
- Order of precedence: Ti <_S Tj if Ti locks the root before Tj.
- In that case, Ti locks every node it has in common with Tj before Tj does.
382
ORDER OF PRECEDENCE
383
SECTION 18.8
Timestamps
By :
Rupinder Singh (216)
What is Timestamping?
384
- The scheduler assigns each transaction T a unique number, its timestamp TS(T).
- Timestamps must be issued in ascending order, at the time a transaction first notifies the scheduler that it is beginning.
Timestamp TS(T)
385
Two methods of generating timestamps:
- use the value of the system clock as the timestamp
- use a logical counter that is incremented after each new timestamp has been assigned

Whichever method is used, the scheduler maintains a table of the currently active transactions and their timestamps.
386
Timestamps for database element X
and commit bit
- RT(X): the read time of X, the highest timestamp of any transaction that has read X.
- WT(X): the write time of X, the highest timestamp of any transaction that has written X.
- C(X): the commit bit for X, which is true if and only if the most recent transaction to write X has already committed.
Physically Unrealizable Behavior
387
Read too late:
- A transaction U that started after transaction T wrote a value for X before T reads X.
- Timeline: T starts; U starts; U writes X; T reads X.
Physically Unrealizable Behavior
388
Write too late:
- A transaction U that started after T read X before T got a chance to write X.
- Timeline: T starts; U starts; U reads X; T writes X.
- Figure: transaction T tries to write too late.
Dirty Read
389
- It is possible that after T reads the value of X written by U, transaction U will abort.
- Timeline: U starts; T starts; U writes X; T reads X; U aborts.
- T performs a dirty read if it reads X at that point.
390
Rules for Timestamps-Based scheduling
1. The scheduler receives a request rT(X):
   a) If TS(T) ≥ WT(X), the read is physically realizable.
      1. If C(X) is true, grant the request; if TS(T) > RT(X), set RT(X) := TS(T), otherwise do not change RT(X).
      2. If C(X) is false, delay T until C(X) becomes true, or the transaction that wrote X aborts.
   b) If TS(T) < WT(X), the read is physically unrealizable. Roll back T.
Rules for Timestamps-Based scheduling
(Cont.)
391
2. The scheduler receives a request wT(X):
   a) If TS(T) ≥ RT(X) and TS(T) ≥ WT(X), the write is physically realizable and must be performed:
      1. write the new value for X,
      2. set WT(X) := TS(T), and
      3. set C(X) := false.
   b) If TS(T) ≥ RT(X) but TS(T) < WT(X), the write is physically realizable, but there is already a later value in X:
      a. if C(X) is true, the previous writer of X has committed; simply ignore the write by T;
      b. if C(X) is false, we must delay T.
   c) If TS(T) < RT(X), the write is physically unrealizable, and T must be rolled back.
392
Rules for Timestamps-Based scheduling
(Cont.)
3. The scheduler receives a request to commit T. It must find all the database elements X written by T and set C(X) := true. If any transactions are waiting for X to be committed, those transactions are allowed to proceed.
4. The scheduler receives a request to abort T, or decides to roll back T. Then any transaction that was waiting on an element X that T wrote must repeat its attempt to read or write.
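The read and write rules above can be condensed into a small decision procedure. This Python fragment is a hedged sketch: the in-memory tables and function names are our own, and commit/abort bookkeeping (rules 3 and 4) is omitted for brevity.

```python
# Hedged sketch of timestamp-based scheduling decisions (rules 1 and 2).
RT, WT, C = {}, {}, {}   # read time, write time, commit bit per element

def read(ts, x):
    """Decision for r_T(X), where ts = TS(T)."""
    if ts >= WT.get(x, 0):
        if C.get(x, True):
            RT[x] = max(RT.get(x, 0), ts)       # rule 1(a)1
            return "grant"
        return "delay"                           # rule 1(a)2: writer uncommitted
    return "rollback"                            # rule 1(b): read too late

def write(ts, x):
    """Decision for w_T(X), where ts = TS(T)."""
    if ts >= RT.get(x, 0) and ts >= WT.get(x, 0):
        WT[x], C[x] = ts, False                  # rule 2(a): perform the write
        return "write"
    if ts >= RT.get(x, 0):                       # rule 2(b): a later value exists
        return "ignore" if C.get(x, True) else "delay"
    return "rollback"                            # rule 2(c): write too late

WT["A"], C["A"] = 5, True
print(read(7, "A"))    # grant; RT(A) becomes 7
print(read(3, "A"))    # rollback: TS(T) < WT(A)
print(write(10, "A"))  # write; WT(A) = 10, C(A) = False
print(read(12, "A"))   # delay: the writer has not yet committed
```

The "ignore" branch is the Thomas write rule: a committed later value makes the obsolete write unnecessary.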
Multiversion Timestamps
393
- Multiversion schemes keep old versions of data items to increase concurrency.
- Each successful write results in the creation of a new version of the data item written.
- Timestamps are used to label versions.
- When a read(X) operation is issued, select an appropriate version of X based on the timestamp of the transaction, and return the value of the selected version.
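Version selection for a read can be sketched in one function; the list-of-pairs representation of versions below is our own assumption, not the book's:

```python
# Sketch of a multiversion read: return the value of the version with
# the largest write timestamp not exceeding the reader's timestamp.
def read_version(versions, ts):
    eligible = [(wt, val) for (wt, val) in versions if wt <= ts]
    return max(eligible)[1] if eligible else None

versions_of_X = [(1, "a"), (5, "b"), (9, "c")]
print(read_version(versions_of_X, 7))   # 'b': the version written at time 5
print(read_version(versions_of_X, 9))   # 'c'
```

Because every reader finds the version that was current at its own timestamp, reads are never "too late" and need never be rejected.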
Timestamps and Locking
394
Generally, timestamping performs better than locking in situations where:
- most transactions are read-only
- it is rare that concurrent transactions try to read and write the same element

In high-conflict situations, locking performs better than timestamping.
395
By
Swathi Vegesna
At a Glance
396
- Introduction
- Validation-based scheduling
- Validation-based scheduler
- Expected exceptions
- Validation rules
- Example
- Comparisons
- Summary
- References
Introduction
397
What is optimistic concurrency control?
- Timestamp-based scheduling
- Validation-based scheduling
Validation based scheduling
398
- The scheduler keeps a record of what the active transactions are doing.
- It executes in three phases:
  1. Read
  2. Validate
  3. Write
Validation based Scheduler
399
- Contains an assumed serial order of the transactions.
- Maintains three sets:
  - START: the set of transactions that have started but not yet completed validation
  - VAL: the set of transactions that have validated but not yet finished the writing phase
  - FIN: the set of transactions that have finished
Expected exceptions
400
1. Suppose there is a transaction U such that:
   - U is in VAL or FIN; that is, U has validated;
   - FIN(U) > START(T); that is, U did not finish before T started;
   - RS(T) ∩ WS(U) ≠ ∅; let it contain database element X.
2. Suppose there is a transaction U such that:
   - U is in VAL; U has successfully validated;
   - FIN(U) > VAL(T); U did not finish before T entered its validation phase;
   - WS(T) ∩ WS(U) ≠ ∅; let X be in both write sets.
Validation rules
401
- Check that RS(T) ∩ WS(U) = ∅ for any previously validated U that did not finish before T started, i.e., FIN(U) > START(T).
- Check that WS(T) ∩ WS(U) = ∅ for any previously validated U that did not finish before T validated, i.e., FIN(U) > VAL(T).
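The two checks can be sketched directly. In this hedged Python sketch, the per-transaction record (read set `RS`, write set `WS`, and `start`/`val`/`fin` times, with `None` meaning "not yet finished") is our own assumed representation:

```python
# Sketch of the two validation checks against previously validated
# transactions.
def validate(t, others):
    for u in others:
        unfinished_at_start = u["fin"] is None or u["fin"] > t["start"]
        if unfinished_at_start and (t["RS"] & u["WS"]):
            return False      # check 1: u wrote something t read
        unfinished_at_val = u["fin"] is None or u["fin"] > t["val"]
        if unfinished_at_val and (t["WS"] & u["WS"]):
            return False      # check 2: write sets overlap
    return True

T = {"RS": {"A", "B"}, "WS": {"A", "C"}, "start": 1, "val": 2, "fin": None}
V = {"RS": {"B"}, "WS": {"D", "E"}, "start": 2, "val": 3, "fin": None}
W = {"RS": {"A", "D"}, "WS": {"A", "C"}, "start": 3, "val": 4, "fin": None}
print(validate(V, [T]))       # True: no overlaps with T
print(validate(W, [T, V]))    # False: RS(W) meets WS(T) in {A}
```

The sample transactions mirror the example on the next slide, where W fails validation.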
Example
402
Solution
403
Validation of U: nothing to check.

Validation of T:
- WS(U) ∩ RS(T) = {D} ∩ {A,B} = ∅
- WS(U) ∩ WS(T) = {D} ∩ {A,C} = ∅

Validation of V:
- RS(V) ∩ WS(T) = {B} ∩ {A,C} = ∅
- WS(V) ∩ WS(T) = {D,E} ∩ {A,C} = ∅
- RS(V) ∩ WS(U) = {B} ∩ {D} = ∅

Validation of W:
- RS(W) ∩ WS(T) = {A,D} ∩ {A,C} = {A}
- RS(W) ∩ WS(V) = {A,D} ∩ {D,E} = {D}
- WS(W) ∩ WS(V) = {A,C} ∩ {D,E} = ∅
(W is not validated, since the first two intersections are nonempty)
Comparison
404
Comparison of concurrency-control mechanisms:

Locks
- Storage utilization: space in the lock table is proportional to the number of database elements locked.
- Delays: delays transactions but avoids rollbacks.

Timestamps
- Storage utilization: space is needed for read and write times for every database element, whether or not it is currently accessed.
- Delays: does not delay transactions, but causes them to roll back unless interference is low.

Validation
- Storage utilization: space is used for timestamps and read/write sets for each currently active transaction, plus a few more transactions that finished after some currently active transaction began.
- Delays: does not delay transactions, but causes them to roll back unless interference is low.
Summary
405
- Concurrency control by validation
- The three phases
- Validation rules
- Comparison
References
406
- H. Garcia-Molina, J. D. Ullman, and J. Widom, Database Systems: The Complete Book, 2nd edition, 2008.
407
21.1 Introduction to Information
Integration
CS257 Fan Yang
Need for Information Integration
408
- Ideally, all the data in the world could be put in a single database.
- In the real world, a single database is impossible:
  - databases are created independently
  - it is hard to design a database to support future uses
University Database
409
- Registrar: records students and grades
- Bursar: records tuition payments by students
- Human Resources Department: records employees
- Other departments…
Inconvenient
410
- Record grades only for students who have paid tuition?
- Want to swim in the SJSU aquatic center for free during summer vacation?
- Neither can be accomplished while the data sits in separate, independent databases.
- Solution: one database
How to integrate
411
- Start over: build one database that contains all the legacy data, and rewrite all the applications. Result: painful.
- Build a layer of abstraction (middleware) on top of all the legacy databases; this layer is often defined by a collection of classes. BUT…
Heterogeneity Problem
412
What is the heterogeneity problem?
- Aardvark Automobile Co. has 1,000 dealers with 1,000 databases.
- To find a model at another dealer, can we use this command?
  SELECT * FROM Cars
  WHERE model = 'A6';
Type of Heterogeneity
413
- Communication heterogeneity
- Query-language heterogeneity
- Schema heterogeneity
- Data-type differences
- Value heterogeneity
- Semantic heterogeneity
Conclusion
414
- One database system would be perfect, but is impossible.
- Independent databases are inconvenient.
- Ways to integrate databases:
  1. start over
  2. middleware
- Both must face the heterogeneity problem.
Chapter 21.2
Modes of Information Integration
ID: 219
Name: Qun Yu
Class: CS257 219 Spring 2009
Instructor: Dr. T.Y.Lin
Content Index
416
21.2 Modes of Information Integration
21.2.1 Federated Database Systems
21.2.2 Data Warehouses
21.2.3 Mediators
417
Federations
- The simplest architecture for integrating several DBs
- One-to-one connections between all pairs of DBs
- If n DBs talk to each other, n(n-1) wrappers are needed
- Good when communication between DBs is limited
Wrapper
418
Wrapper: a software component that translates incoming queries and outgoing answers. As a result, it allows information sources to conform to some shared schema.
419
Federations diagram: DB1, DB2, DB3, and DB4, with a pair of wrappers on each of the six connections between them. A federated collection of 4 DBs needs 12 wrapper components to translate queries from one to another.
Example
420
Car dealers want to share their inventory. Each dealer queries the other's DB to find a needed car.
- Dealer 1's DB relation: NeededCars(model, color, autoTrans)
- Dealer 2's DB relations: Autos(serial, model, color) and Options(serial, option)
Each dealer's DB sits behind a wrapper.
Example…
421
For (each tuple (:m, :c, :a) in NeededCars) {
    if (:a = TRUE) { /* automatic transmission wanted */
        SELECT serial
        FROM Autos, Options
        WHERE Autos.serial = Options.serial AND Options.option = 'autoTrans'
            AND Autos.model = :m AND Autos.color = :c;
    }
    else { /* automatic transmission not wanted */
        SELECT serial
        FROM Autos
        WHERE Autos.model = :m AND Autos.color = :c AND
            NOT EXISTS (SELECT * FROM Options
                        WHERE serial = Autos.serial AND option = 'autoTrans');
    }
}
Dealer 1 queries Dealer 2 for needed cars.
Data Warehouse
422
- Source data are translated from their local schemas to a global schema and copied to a central DB.
- User-transparent: the user queries the data warehouse just like an ordinary DB.
- The user is not allowed to update the data warehouse.
Warehouse Diagram
423
Warehouse diagram: the user sends a query to the warehouse and receives a result; a combiner loads the warehouse from extractors running over Source 1 and Source 2.
Example
424
Construct a data warehouse from the source DBs of two car dealers:
- Dealer 1's schema: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
- Dealer 2's schema: Autos(serial, model, color) and Options(serial, option)
- Warehouse's schema: AutosWhse(serialNo, model, color, autoTrans, dealer)

Extractor — query to extract data from Dealer 1's data:
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars;
Example
425
Extractor — queries to extract data from Dealer 2's data:
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'yes', 'dealer2'
FROM Autos, Options
WHERE Autos.serial = Options.serial AND option = 'autoTrans';

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'no', 'dealer2'
FROM Autos
WHERE NOT EXISTS (SELECT * FROM Options
                  WHERE Options.serial = Autos.serial AND option = 'autoTrans');
Construct Data Warehouse
426
There are mainly three ways to construct the data in the warehouse:
1) Periodically reconstruct the warehouse from the current data in the sources, once a night or at even longer intervals.
   Advantages: simple algorithms.
   Disadvantages: 1) need to shut down the warehouse; 2) data can become out of date.
Construct Data Warehouse
427
2) Update the warehouse periodically (e.g., each night) based on the changes to the sources.
   Advantages: involves smaller amounts of data (important when the warehouse is large and must be modified in a short period).
   Disadvantages: 1) the process of calculating changes to the warehouse is complex; 2) data can become out of date.
Construct Data Warehouse
428
3) Change the warehouse immediately, in response to each change or a small set of changes at one or more of the sources.
   Advantages: data never becomes out of date.
   Disadvantages: requires too much communication and is therefore generally too expensive (practical only for warehouses whose underlying sources change slowly).
Mediators
429
- A mediator is a virtual warehouse: it supports a virtual view, or a collection of views, that integrates several sources.
- The mediator doesn't store any data.
- The mediator's tasks:
  1) receive the user's query,
  2) send queries to the wrappers,
  3) combine the results from the wrappers,
  4) send the final result to the user.
A Mediator diagram
430
Mediator diagram: the user query goes to the mediator; the mediator sends queries to a wrapper over Source 1 and a wrapper over Source 2, combines their results, and returns the final result to the user.
Example
431
Using the same data sources as in the data-warehouse example, the mediator integrates the two dealers' sources into a view with schema:
AutosMed(serialNo, model, color, autoTrans, dealer)
Suppose the user poses the query:
SELECT serialNo, model
FROM AutosMed
WHERE color = 'red';
Example
432
In this simple case, the mediator forwards the same query to each of the two wrappers.

Wrapper 1: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
SELECT serialNo, model
FROM Cars
WHERE color = 'red';

Wrapper 2: Autos(serial, model, color); Options(serial, option)
SELECT serial, model
FROM Autos
WHERE color = 'red';

The mediator interprets serial as serialNo and returns the union of these sets of data to the user.
Example
433
There may be different options for how the mediator forwards the user's query. For example, suppose the user asks whether there is a car of a specific model and color (say, a blue Gobi). The mediator can decide whether the second query is needed based on the result of the first: if Dealer 1 has the car, the mediator doesn't have to query Dealer 2.
434
INFORMATION
INTEGRATION
SECTIONS 21.4 – 21.5
Sanuja Dabade & Eilbroun Benjamin
CS 257 – Dr. TY Lin
Presentation Outline
435
21.4 Capability-Based Optimization
- 21.4.1 The Problem of Limited Source Capabilities
- 21.4.2 A Notation for Describing Source Capabilities
- 21.4.3 Capability-Based Query-Plan Selection
- 21.4.4 Adding Cost-Based Optimization

21.5 Optimizing Mediator Queries
- 21.5.1 Simplified Adornment Notation
- 21.5.2 Obtaining Answers for Subgoals
- 21.5.3 The Chain Algorithm
- 21.5.4 Incorporating Union Views at the Mediator
21.4 Capability Based Optimization
436
Introduction
- A typical DBMS estimates the cost of each query plan and picks the one it believes to be best.
- A mediator usually has no knowledge of how long its sources will take to answer.
- Optimization of mediator queries therefore cannot rely on a cost measure alone to select a query plan.
- Instead, optimization by the mediator follows capability-based optimization.
21.4.1 The Problem of Limited Source
Capabilities
437
- Many sources have only Web-based interfaces.
- Web sources usually allow querying only through a query form.
- E.g., Amazon.com's interface allows us to query about books in many different ways.
- But we cannot ask questions that are too general, e.g.:
  SELECT * FROM books;
21.4.1 The Problem of Limited Source
Capabilities (con’t)
438
Reasons why a source may limit the ways in which queries can be asked:
- The earliest databases did not use relational DBMSs that support SQL queries.
- Indexes on a large database may make certain queries feasible while others are too expensive to execute.
- Security: e.g., a medical database may answer queries about averages, but won't disclose the details of a particular patient's information.
21.4.2 A Notation for Describing
Source Capabilities
439
- For relational data, the legal forms of queries are described by adornments.
- Adornments are sequences of codes that represent the requirements on the attributes of the relation, in their standard order:
  - f (free): the attribute may be specified or not
  - b (bound): a value must be specified for the attribute, but any value is allowed
  - u (unspecified): no value may be specified for the attribute
21.4.2 A notation for Describing
Source Capabilities….(cont’d)
440
- c[S] (choice from set S): a value must be specified, and it must come from the finite set S.
- o[S] (optional from set S): either specify no value, or specify a value from the finite set S.
- A prime (e.g., f') indicates that the attribute is not part of the output of the query.
- A capabilities specification is a set of adornments.
- A query must match one of the adornments in the source's capabilities specification.
21.4.2 A notation for Describing
Source Capabilities….(cont’d)
441
E.g., Dealer 1 is a source of data in the form:
Cars(serialNo, model, color, autoTrans, navi)
The adornment for this query form is b'uuuu.
21.4.3 Capability-Based Query-Plan
Selection
442
- Given a query at the mediator, a capability-based query optimizer first considers which queries it can ask of the sources to help answer the mediator query.
- The process repeats until either:
  - enough queries have been asked of the sources to resolve all the conditions of the mediator query, so the query is answered; such a plan is called feasible; or
  - no more valid forms of source queries can be constructed, yet the mediator query still cannot be answered; the query is impossible.
21.4.3 Capability-Based Query-Plan
Selection (cont’d)
443
- The simplest form of mediator query for which we need the above strategy is a join of relations.
- E.g., we have the Dealer 2 sources:
  - Autos(serial, model, color)
  - Options(serial, option)
- Suppose that ubf is the sole adornment for Autos, and Options has two adornments, bu and uc[autoTrans, navi].
- The query: find the serial numbers and colors of Gobi models with a navigation system.
21.4.4 Adding Cost-Based
Optimization
444
- The mediator's query optimizer is not done when the capabilities of the sources have been examined.
- Having found the feasible plans, it must choose among them.
- Making an intelligent, cost-based query optimization requires that the mediator know a great deal about the costs of the queries involved.
- Since the sources are independent of the mediator, it is difficult to estimate those costs.
21.5 Optimizing Mediator Queries
445
The Chain Algorithm: a greedy algorithm that finds a way to answer the query by sending a sequence of requests to its sources.
- It will always find a solution, assuming at least one solution exists.
- The solution it finds may not be optimal.
21.5.1 Simplified Adornment Notation
446
- A query at the mediator is limited to b (bound) and f (free) adornments.
- We use the following convention for describing adornments:
  name^adornments(attributes)
  where name is the name of the relation, and the number of adornment characters equals the number of attributes.
21.5.2 Obtaining Answers for
Subgoals
447
Rules for matching subgoals to sources. Suppose we have the subgoal
R^{x1 x2 … xn}(a1, a2, …, an),
and the source adornments for R are y1 y2 … yn:
- If yi is b or c[S], then xi must be b.
- If xi = f, then yi must not be output-restricted (primed).
- Otherwise (yi is f, u, or o[S]), xi may be either b or f, and the adornment on the subgoal matches the adornment at the source.
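These matching rules can be sketched in Python. The encoding below (subgoal adornments as a string of 'b'/'f', source adornments as a list of codes, with primed positions flagged separately) is our own assumption:

```python
# Sketch of the subgoal/source adornment-matching rules. A source
# adornment position is 'b', 'f', 'u', ('c', S), or ('o', S); an
# output-restricted (primed) position index is listed in `primed`.
def matches(subgoal, source, primed=()):
    for i, (x, y) in enumerate(zip(subgoal, source)):
        code = y[0] if isinstance(y, tuple) else y
        if code in ("b", "c") and x != "b":
            return False          # source demands a binding we lack
        if x == "f" and i in primed:
            return False          # a free output position must not be primed
    return True

# Autos with adornment ubf: serial unspecified, model bound, color free.
print(matches("fbf", ["u", "b", "f"]))          # True
print(matches("fff", ["u", "b", "f"]))          # False: model must be bound
print(matches("fb", ["f", ("c", {"autoTrans", "navi"})]))  # True
```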
21.5.3 The Chain Algorithm
448
The Chain Algorithm maintains two kinds of information:
- an adornment for each subgoal;
- a relation X that is the join of the relations for all the subgoals that have been resolved.

Initially:
- the adornment for a subgoal has b exactly where the mediator query provides a constant binding for the corresponding argument of that subgoal;
- X is a relation over no attributes, containing just the empty tuple.
21.5.3 The Chain Algorithm (con’t)
449
1. First, initialize the adornments of the subgoals and X. Then repeatedly select a subgoal that can be resolved. Let R^α(a1, a2, …, an) be that subgoal. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X. Project X onto those of its variables that appear in R.
21.5.3 The Chain Algorithm (con’t)
450
2. For each tuple t in the projection of X, issue a query to the source, as follows (β is a source adornment for R):
   - If a component of β is b, then the corresponding component of α is b, and we can use the corresponding component of t in the source query.
   - If a component of β is c[S], and the corresponding component of t is in S, then the corresponding component of α is b, and we can use the corresponding component of t in the source query.
   - If a component of β is f, and the corresponding component of α is b, we may provide a constant value for the source query.
21.5.3 The Chain Algorithm (con’t)
451
   - If a component of β is u, then provide no binding for this component in the source query.
   - If a component of β is o[S], and the corresponding component of α is f, then treat it as if it were f.
   - If a component of β is o[S], and the corresponding component of α is b, then treat it as if it were c[S].
3. Every variable among a1, a2, …, an is now bound. For each remaining unresolved subgoal, change its adornment so that any position holding one of these variables is b.
21.5.3 The Chain Algorithm (con’t)
452
4. Replace X with X ⋈ πS(R), where S is the set of all variables among a1, a2, …, an. Then project out of X all components that correspond to variables that do not appear in the head or in any unresolved subgoal.
5. If every subgoal is resolved, then X is the answer. If some subgoal remains unresolved, the algorithm fails.
21.5.3 The Chain Algorithm Example
453
Mediator query:
Q: Answer(c) ← R^bf(1,a) AND S^ff(a,b) AND T^ff(b,c)

Example relations, their data, and their source adornments:
- R(w,x), adornment bf: (1,2), (1,3), (1,4)
- S(x,y), adornment c'[2,3,5]f: (2,4), (3,5), (5,8)
- T(y,z), adornment bu: (4,6), (5,7), (5,8)
21.5.3 The Chain Algorithm Example
(con’t)
454
- Initially, the adornments on the subgoals are the same as in Q, and X contains just the empty tuple.
- S and T cannot be resolved, because they have ff adornments while their sources require a b or c binding.
- R(1,a) can be resolved, because its adornment is matched by the source's adornment.
- Send R(w,x) with w = 1 to obtain the tuples shown on the previous slide.
21.5.3 The Chain Algorithm Example
(con’t)
455
- Project the subgoal's relation onto its second component, since only the second component of R(1,a) is a variable:
  a: 2, 3, 4
- This is joined with X, so X now equals this relation.
- Change the adornment on S from ff to bf.
21.5.3 The Chain Algorithm Example
(con’t)
456
Now we resolve S^bf(a,b):
- Project X onto a.
- Query S for tuples whose first component matches a value of a in X:
  (a,b): (2,4), (3,5)
- Join this relation with X, and remove column a because it appears neither in the head nor in any unresolved subgoal:
  b: 4, 5
21.5.3 The Chain Algorithm Example
(con’t)
457
Now we resolve T^bf(b,c):
  (b,c): (4,6), (5,7), (5,8)
- Join this relation with X and project onto the c attribute to obtain the relation for the head.
- The solution is {(6), (7), (8)}.
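The whole run can be reproduced in a few lines over the reconstructed tables. The set-comprehension encoding below is our own; the data comes from the example:

```python
# Chain-algorithm run for Answer(c) <- R(1,a) AND S(a,b) AND T(b,c).
R = {(1, 2), (1, 3), (1, 4)}   # R(w,x), source adornment bf
S = {(2, 4), (3, 5), (5, 8)}   # S(x,y), source adornment c'[2,3,5]f
T = {(4, 6), (5, 7), (5, 8)}   # T(y,z), source adornment bu

# Resolve R(1,a): only R is resolvable at first (S and T need bindings).
X = {x for (w, x) in R if w == 1}                       # {2, 3, 4}
# Resolve S(a,b): S only answers queries with a in its set {2, 3, 5}.
X = {y for (x, y) in S if x in X and x in (2, 3, 5)}    # {4, 5}
# Resolve T(b,c), then project onto the head variable c.
answer = {z for (y, z) in T if y in X}
print(sorted(answer))   # [6, 7, 8]
```

Each assignment to X corresponds to one join-and-project step of the algorithm.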
21.5.4 Incorporating Union Views at
the Mediator
458
- This version of the Chain Algorithm does not consider that several sources may contribute tuples to the same relation.
- If specific sources have tuples to contribute that other sources may not have, this adds complexity.
- To resolve this, we can either consult all sources, or make a best effort to return all the answers.
21.5.4 Incorporating Union Views at
the Mediator (con’t)
459
Consulting all sources:
- We can only resolve a subgoal when each source for its relation has an adornment matched by the current adornment of the subgoal.
- Less practical: it makes queries harder to answer, and impossible if any source is down.

Best effort:
- We need only one source with a matching adornment to resolve a subgoal.
- The Chain Algorithm must be modified to revisit each subgoal when that subgoal acquires new bound arguments.
460
Presenter: Namrata Buddhadev (104_224, 21.6.1 – 21.6.7)
Professor: Dr. T. Y. Lin
Index
461
21.6 Local-as-View Mediators
- 21.6.1 Motivation for LAV Mediators
- 21.6.2 Terminology for LAV Mediators
- 21.6.3 Expanding Solutions
- 21.6.4 Containment of Conjunctive Queries
- 21.6.5 Why the Containment-Mapping Test Works
- 21.6.6 Finding Solutions to a Mediator Query
- 21.6.7 Why the LMSS Theorem Holds
Local-as-View Mediators
462
- GAV (global-as-view) mediators: the global data behaves like a view; it doesn't exist physically, but pieces of it are constructed by the mediator by asking queries of the sources.
- LAV (local-as-view) mediators: the global predicates are defined at the mediator, but they are not defined as views of the source data.
- Instead, expressions are defined for each source, in terms of the global predicates, that describe the tuples the source is able to produce; queries are answered at the mediator by discovering all possible ways to construct the query using the views provided by the sources.
Motivation for LAV Mediators
463
- LAV mediators help us discover how and when a source can be used in a given query.
- Example: a GAV view of Par(c,p) gives information about child–parent pairs but not about grandparents.
- With LAV, Par(c,p) can be used to obtain child–parent and even grandparent information.
Terminology for LAV Mediation
464
- Views are defined in a form of logic that serves as the language for defining views.
- Datalog is used for both mediator queries and source descriptions; both are conjunctive queries.
- The global predicates of LAV are the subgoals of mediator queries.
- Each view is defined by a conjunctive query with a unique view predicate in its head and global predicates in its body; the view is associated with a particular source.
465
Example: Par(c,p) is a global predicate.
1. One source's view is defined by the conjunctive query:
   V1(c,p) ← Par(c,p)
2. Another source produces:
   V2(c,g) ← Par(c,p) AND Par(p,g)
A query at the mediator asks for great-grandparent facts:
   Q(w,z) ← Par(w,x) AND Par(x,y) AND Par(y,z)
Possible solutions:
   Q(w,z) ← V1(w,x) AND V2(x,z)
   Q(w,z) ← V2(w,y) AND V1(y,z)
466
Expanding Solutions
Given a query Q and a solution S with a subgoal V(a1, …, an) (the ai's may repeat), where the view V is defined by V(b1, …, bn) ← B (the entire body, with distinct variables b1, …, bn): we can replace V(a1, …, an) in solution S by a version of body B whose subgoals have their variables suitably renamed.

Rules:
1. Find the local variables of B, those that appear in the body but not in the head. A local variable may be renamed within the conjunctive query, as long as the new name does not appear elsewhere in the query.
467
2. If any local variables of B appear elsewhere in B or in S, replace each one by a distinct new variable that appears nowhere in the view definition or in S.
3. In the body B, replace each bi by ai, for i = 1, 2, …, n.

Example:
V(a,b,c,d) ← E(a,b,x,y) AND F(x,y,c,d)
Here x and y are local to V, so rename them, say x → e and y → f:
V(a,b,c,d) ← E(a,b,e,f) AND F(e,f,c,d)
With the substitution a → x, b → y, c → 1, d → x, the subgoal
V(x,y,1,x) expands to the two subgoals E(x,y,e,f) and F(e,f,1,x).
468
Containment of Conjunctive Queries
- Let the conjunctive query S be a solution to the mediator query Q.
- The expansion E of S produces the same answers that S produces, so we need E ⊆ Q.
- A containment mapping from Q to E is a function τ on variables such that, if x is the ith argument of the head of Q, then τ(x) is the ith argument of the head of E.
- Add to τ the rule that τ(c) = c for any constant c. If P(x1, …, xn) is a subgoal of Q, then P(τ(x1), …, τ(xn)) must be a subgoal of E.
Example:
469
Queries:
P1: H(x,y) ← A(x,z) AND A(z,y)
P2: H(a,b) ← A(a,c) AND A(c,d) AND A(d,b)
A containment mapping must have τ(x) = a and τ(y) = b:
1. τ(z) must be c, since A(x,z) can only map to A(a,c) of P2.
2. But τ(z) must also be d, since with τ(y) = b the subgoal A(z,y) of P1 must become A(d,b) in P2.
So no containment mapping from P1 to P2 exists.
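For tiny queries the test can be done by brute force. This Python sketch (our own encoding of conjunctive queries as a head tuple plus a list of subgoals) confirms the example above:

```python
# Brute-force containment-mapping search between small conjunctive
# queries, encoded as (head_args, [(predicate, args), ...]).
from itertools import product

def containment_mapping(q1, q2):
    """Return a mapping tau from vars(q1) to vars(q2) sending the head
    of q1 to the head of q2 and every subgoal of q1 to some subgoal of
    q2, or None if no such mapping exists."""
    (h1, body1), (h2, body2) = q1, q2
    vars1 = sorted({v for v in h1} | {v for _, a in body1 for v in a})
    vars2 = sorted({v for v in h2} | {v for _, a in body2 for v in a})
    goals2 = {(p, tuple(a)) for p, a in body2}
    for combo in product(vars2, repeat=len(vars1)):
        tau = dict(zip(vars1, combo))
        if tuple(tau[v] for v in h1) != tuple(h2):
            continue                      # head must map to head
        if all((p, tuple(tau[v] for v in a)) in goals2 for p, a in body1):
            return tau                    # every subgoal maps to a subgoal
    return None

P1 = (("x", "y"), [("A", ("x", "z")), ("A", ("z", "y"))])
P2 = (("a", "b"), [("A", ("a", "c")), ("A", ("c", "d")), ("A", ("d", "b"))])
print(containment_mapping(P1, P2))   # None: tau(z) cannot be both c and d
```

The exhaustive search mirrors the NP-completeness noted on the next slide: the number of candidate mappings grows exponentially with the number of variables.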
470
- Complexity of the containment-mapping test: it is NP-complete to decide whether there is a containment mapping from one conjunctive query to another.
- The importance of containment mappings is expressed by the theorem: if Q1 and Q2 are conjunctive queries, then Q2 ⊆ Q1 if and only if there is a containment mapping from Q1 to Q2.
471
Why the Containment Mapping Test Works

 Questions:
1. If there is a containment mapping, why must there be a containment
of conjunctive queries?
2. If there is containment, why must there be a containment mapping?
472
Finding Solutions to a Mediator Query

 Given query Q and solution S, the expansion E of S is contained
in Q.
 "If a query Q has n subgoals, then any answer produced by any
solution is also produced by a solution that has at most n subgoals."
This is known as the LMSS Theorem.
Example:
473

Q1: Q(w,z)<-Par(w,x) AND Par(x,y) AND Par(y,z)
S1: Q(w,z)<-V1(w,x) AND V2(x,z)
S2: Q(w,z)<-V1(w,x) AND V2(x,z) AND V1(t,u) AND
V2(u,v)
The expansion of S2 is E2: Q(w,z)<-Par(w,x) AND Par(x,p) AND
Par(t,u) AND Par(u,q) AND Par(q,v)
and E2 ⊆ E1, using the containment mapping that sends each variable
of E1 to the same variable in E2; so by the LMSS Theorem the shorter
solution S1 suffices.
Why the LMSS Theorem Holds
474

 Suppose query Q has n subgoals and solution S has more than n
subgoals. The expansion E of S must be contained in Q, so there is a
containment mapping from Q to E; the targets of this mapping lie in
the expansions of at most n subgoals of S.
 Let S' be the solution obtained by removing from S every subgoal
whose expansion contains no target of the containment mapping.
 Then E', the expansion of S', is also contained in Q.
 Moreover, S ⊆ S', by the identity mapping.
 Thus there is no need for S among the solutions to query Q; S',
with at most n subgoals, produces every answer that S does.
Information Integration
Entity Resolution – 21.7
Presented By:
Deepti Bhardwaj
Roll No: 223_103
Contents
476

 21.7 Entity Resolution
 21.7.1 Deciding Whether Records Represent a Common Entity
 21.7.2 Merging Similar Records
 21.7.3 Useful Properties of Similarity and Merge Functions
 21.7.4 The R-Swoosh Algorithm for ICAR Records
 21.7.5 Other Approaches to Entity Resolution
Introduction
477

Determining whether two records or tuples do or do
not represent the same person, organization, place
or other entity is called ENTITY RESOLUTION.
478
Deciding Whether Records Represent a Common Entity

 Two records represent the same individual if the two records have
similar values for each of the fields associated with those records.
 It is not sufficient to require that the values of corresponding
fields be identical, for the following reasons:
1. Misspellings
2. Variant Names
3. Misunderstanding of Names
479
Continue: Deciding Whether Records Represent a Common Entity
4. Evolution of Values
5. Abbreviations
 Thus, when deciding whether two records represent the same entity,
we need to look carefully at the kinds of discrepancies that occur
and use a test that measures the similarity of records.
Deciding Whether Records Represent a Common Entity - Edit Distance
480

 A first approach to measuring the similarity of records is edit
distance.
 Values that are strings can be compared by counting the number of
insertions and deletions of characters it takes to turn one string
into another.
 The records are then declared to represent the same entity if their
total edit distance is below a given threshold.
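This insert/delete-only distance can be sketched via the identity
dist(a, b) = len(a) + len(b) - 2 * LCS(a, b), where LCS is the length
of the longest common subsequence (a standard way to compute it, not
one the slides prescribe):

```python
def edit_distance(a, b):
    """Number of character insertions and deletions needed to turn
    string a into string b (no substitutions, as in the text)."""
    prev = [0] * (len(b) + 1)            # LCS row for the prefix of a
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb
                       else max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[len(b)]

print(edit_distance("Smith", "Smythe"))   # -> 3 (delete i; insert y, e)
```

Two records would then be compared field by field, summing these
distances and testing the sum against the chosen threshold.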
Deciding Whether Records Represent a Common Entity - Normalization
481

 A second approach is to normalize records by replacing certain
substrings with others. For instance, we can use a table of
abbreviations and replace each abbreviation by what it normally
stands for.
 Once normalized, we can use edit distance to measure the difference
between the normalized values in the fields.
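A minimal normalization sketch; the abbreviation table here is
hypothetical, standing in for whatever table an application would
supply:

```python
# Hypothetical abbreviation table (the text assumes such a table exists).
ABBREV = {"St.": "Street", "Ave.": "Avenue", "Rd.": "Road"}

def normalize(value):
    """Replace known abbreviations word by word before comparing."""
    return " ".join(ABBREV.get(word, word) for word in value.split())

print(normalize("456 Maple St."))   # -> 456 Maple Street
```

After normalization, edit distance between fields compares the
expanded forms, so "St." versus "Street" no longer counts as a
discrepancy.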
482
Merging Similar Records

 Merging means replacing two records that are similar enough by a
single record that contains the information of both.
 There are many possible merge rules, for example:
1. Set any field on which the records disagree to the empty string.
2. (i) Merge by taking the union of the values in each field.
(ii) Declare two records similar if at least two of the three fields
have a nonempty intersection.
Continue: Merging Similar Records
483

   Name     Address         Phone
1. Susan    123 Oak St.     818-555-1234
2. Susan    456 Maple St.   818-555-1234
3. Susan    456 Maple St.   213-555-5678

After Merging

   Name            Address                        Phone
(1-2-3) Susan      {123 Oak St., 456 Maple St.}   {818-555-1234, 213-555-5678}
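Merge rule 2 can be sketched as follows, assuming each record is a
dict mapping a field name to a set of values (a representation chosen
for this sketch, not mandated by the text):

```python
def merge(r, s):
    """Merge rule 2(i): take the union of the values in each field."""
    return {f: r.get(f, set()) | s.get(f, set())
            for f in r.keys() | s.keys()}

def similar(r, s, k=2):
    """Rule 2(ii): similar if at least k fields have a nonempty
    intersection of values."""
    return sum(1 for f in r.keys() & s.keys() if r[f] & s[f]) >= k

# Records 1 and 2 from the table share Name and Phone, so they merge.
r1 = {"name": {"Susan"}, "addr": {"123 Oak St."}, "phone": {"818-555-1234"}}
r2 = {"name": {"Susan"}, "addr": {"456 Maple St."}, "phone": {"818-555-1234"}}
print(similar(r1, r2))          # -> True
print(merge(r1, r2)["addr"])    # -> both addresses, as in the table
```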
Useful Properties of Similarity and Merge Functions
484
The following properties say that the merge operation is a
semilattice:
1. Idempotence: the merge of a record with itself should surely be
that record.
2. Commutativity: if we merge two records, the order in which we list
them should not matter.
3. Associativity: the order in which we group records for a merger
should not matter.
485
Continue: Useful Properties of Similarity and Merge Functions
There are some other properties that we expect the similarity
relationship to have:
• Idempotence for similarity: a record is always similar to itself.
• Commutativity of similarity: in deciding whether two records are
similar, it does not matter in which order we list them.
• Representability: if r is similar to some other record s, but s is
instead merged with some other record t, then r remains similar to
the merger of s and t and can be merged with that record.
486
The R-Swoosh Algorithm for ICAR Records

 Input: a set of records I, a similarity function, and a merge function.
 Output: a set of merged records O.
 Method:
    O := emptyset;
    WHILE I is not empty DO BEGIN
        let r be any record in I;
        find, if possible, some record s in O that is similar to r;
        IF no such record s exists THEN
            move r from I to O
        ELSE BEGIN
            delete r from I;
            delete s from O;
            add the merger of r and s to I;
        END;
    END;
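The loop above can be sketched in Python. The record representation
(dicts of value sets) and the example similarity and merge functions
are assumptions of this sketch; R-Swoosh itself only requires that
they satisfy the ICAR properties:

```python
def r_swoosh(records, similar, merge):
    """R-Swoosh: repeatedly take a record r from I; if some s in O is
    similar to r, merge them and put the merger back in I, otherwise
    move r to O."""
    I, O = list(records), []
    while I:
        r = I.pop()
        s = next((x for x in O if similar(r, x)), None)
        if s is None:
            O.append(r)                  # nothing similar: r is settled
        else:
            O.remove(s)                  # merger must be reconsidered
            I.append(merge(r, s))
    return O

# Example similarity/merge (rule 2 from the earlier slide).
def merge(r, s):
    return {f: r.get(f, set()) | s.get(f, set())
            for f in r.keys() | s.keys()}

def similar(r, s):
    return sum(1 for f in r.keys() & s.keys() if r[f] & s[f]) >= 2
```

On the three Susan records of slide 483, this collapses all three into
the single merged record shown there.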
487
Other Approaches to Entity Resolution
The other approaches to entity resolution are:
 Non-ICAR Datasets
 Clustering
 Partitioning
Other Approaches to Entity Resolution - Non-ICAR Datasets
488
Non-ICAR Datasets: we can define a dominance relation r <= s meaning
that record s contains all the information contained in record r.
If so, then we can eliminate record r from further consideration.
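A dominance check might look like this, again assuming records are
dicts mapping field names to sets of values (an illustrative
representation, not one fixed by the text):

```python
def dominated(r, s):
    """r <= s: every value of r appears in s, so r adds no information
    and can be dropped from further consideration."""
    return all(f in s and r[f] <= s[f] for f in r)
```

For example, a record holding only Susan's name and one phone number
is dominated by a fuller record that also lists her address.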
Other Approaches to Entity Resolution - Clustering
489
Clustering: sometimes we group the records into clusters such that
members of a cluster are in some sense similar to each other and
members of different clusters are not similar.
Other Approaches to Entity Resolution - Partitioning
490
Partitioning: we can group the records, perhaps several times, into
groups that are likely to contain similar records, and look only
within each group for pairs of similar records.