DATABASE SYSTEM PRINCIPLES
Ankit Patel
112_226
CHAPTER 13
SECTION 13.1-13.3
13.1.1 MEMORY HIERARCHY
• A computer system has several components for data storage, with different capacities.
• The cost per byte of storing data also varies across these levels.
• The devices with the smallest capacity offer the fastest access, at the highest cost per bit.

MEMORY HIERARCHY DIAGRAM

[Diagram: the memory hierarchy from fastest/smallest to slowest/largest: cache, main memory, disk (secondary storage), tertiary storage. Programs and the DBMS work in main memory, treated as virtual memory; the file system manages the disk.]
13.1.1 MEMORY HIERARCHY

Cache
• The lowest level of the hierarchy.
• Data items in the cache are copies of certain locations of main memory.
• Sometimes, values in the cache are changed, and the corresponding changes to main memory are delayed.
• The machine looks in the cache for instructions as well as for the data those instructions use.
• Limited data capacity.
MAIN MEMORY
• Essentially everything that happens in the computer (instruction execution, data manipulation) works on information that is resident in main memory.
• Main memories are random access: one can obtain any byte in the same amount of time.
• The cost per byte of storage in main memory is higher than in secondary or tertiary storage.
• Main memory is volatile.
SECONDARY STORAGE
• Used to store data and programs when they are not being processed.
• More permanent than main memory: data and programs are retained when the power is turned off.
• E.g., magnetic disks (hard disks).
• Main memory sometimes uses secondary storage as an extension (virtual memory).
TERTIARY STORAGE
• Holds data volumes measured in terabytes.
• Used for databases much larger than what can be stored on disk.
• The cheapest type of storage per byte.
• A comprehensive storage system that is usually very slow, so it is typically used to archive data that is not accessed frequently.
13.1.2 TRANSFER OF DATA BETWEEN LEVELS
• Data moves between adjacent levels of the hierarchy.
• At the secondary and tertiary levels, accessing the desired data, or finding the desired place to store data, takes a lot of time.
• The disk is organized into blocks.
• Entire blocks are moved to and from a region of main memory called a buffer.
• A key technique for speeding up database operations is to arrange the data so that when one piece of data on a block is needed, it is likely that other data on the same block will be needed at the same time.
13.1.3 VOLATILE AND NON-VOLATILE STORAGE
• A volatile device forgets the data stored on it when the power is turned off.
• A non-volatile device holds its data even when the device is turned off.
• All secondary and tertiary storage devices are non-volatile; main memory is volatile.
13.1.4 VIRTUAL MEMORY
• Typical software executes in virtual memory.
• The address space is typically 32 bits, i.e., 2^32 bytes or 4 GB.
• Transfer between memory and disk is in terms of blocks.
• The technique makes programming of large applications easier and uses real physical memory (e.g., RAM) more efficiently.
13.2.1 MECHANISM OF DISK
• Use of secondary storage is one of the important characteristics of a DBMS.
• A disk consists of 2 moving pieces:
  1. the disk assembly
  2. the head assembly
• The disk assembly consists of 1 or more platters.
• Platters rotate around a central spindle.
• Bits are stored on the upper and lower surfaces of the platters.
13.2.1 MECHANISM OF DISK
• Each disk surface is organized into tracks.
• The tracks that are at a fixed radius from the center, across all surfaces, form one cylinder.
• Tracks are organized into sectors.
• Sectors are the segments of the circle, separated by gaps.
13.2.2 DISK CONTROLLER
• One or more disks are controlled by a disk controller.
• Disk controllers are capable of:
  • controlling the mechanical actuator that moves the head assembly;
  • transferring bits between the desired sector and main memory;
  • possibly buffering an entire track.
13.2.3 DISK ACCESS CHARACTERISTICS

Accessing (reading/writing) a block requires 3 steps:
• The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the 'seek time'.
• The disk controller waits while the first sector of the block moves under the head. This is the 'rotational latency'.
• All the sectors of the block and the gaps between them pass under the head, while the disk controller reads or writes data in these sectors. This is the 'transfer time'.
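The three components simply add up. Below is a minimal Python sketch of the arithmetic; the seek time, RPM, and block-to-track ratio are illustrative assumptions chosen in the spirit of the Megatron 747 example used later, not real drive specifications.

def block_access_time_ms(avg_seek_ms=6.46, rpm=7200, block_fraction_of_track=1/16):
    rotation_ms = 60_000 / rpm                 # 8.33 ms per rotation at 7200 RPM
    rotational_latency_ms = rotation_ms / 2    # on average, wait half a rotation
    transfer_ms = rotation_ms * block_fraction_of_track
    return avg_seek_ms + rotational_latency_ms + transfer_ms

print(round(block_access_time_ms(), 2))        # ~11.15 ms, close to Ex 13.3's 11 ms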
13.3 ACCELERATING ACCESS TO SECONDARY STORAGE

Several approaches for more efficiently accessing data in secondary storage:
• Place blocks that are accessed together in the same cylinder.
• Divide the data among multiple disks.
• Mirror disks.
• Use disk-scheduling algorithms.
• Prefetch blocks into main memory.

Scheduling latency: the added delay in accessing data caused by a disk-scheduling algorithm.
Throughput: the number of disk accesses per second that the system can accommodate.
13.3.1 THE I/O MODEL OF COMPUTATION
• The number of block accesses (disk I/O's) is a good approximation of an algorithm's running time.
• Disk I/O's are proportional to the time taken, so they should be minimized.
• Ex 13.3: You want an index on R that identifies the block on which the desired tuple appears, but not where on the block it resides.
  • For the Megatron 747 (M747) example, it takes 11 ms to read a 16 KB block.
  • A standard microprocessor can execute millions of instructions in 11 ms, so any delay in searching the block for the desired tuple is negligible.
13.3.2 ORGANIZING DATA BY CYLINDERS
• If we read all blocks on a single track or cylinder consecutively, then we can neglect all but the first seek time and the first rotational latency.
• Ex 13.4: We request 1024 blocks of the M747.
  • If the data is randomly distributed, the average latency is 10.76 ms by Ex 13.2, making the total latency about 11 s.
  • If all blocks are stored consecutively on 1 cylinder:
    6.46 ms (1 average seek) + 8.33 ms (time per rotation) * 16 (# rotations) = 139 ms
13.3.3 USING MULTIPLE DISKS
• If we have n disks, read/write performance increases by a factor of n.
• Striping: distributing a relation across the n disks following this pattern:
  • Data on disk 1: blocks 1, 1+n, 1+2n, …
  • Data on disk 2: blocks 2, 2+n, 2+2n, …
  • …
  • Data on disk n: blocks n, 2n, 3n, …
• Ex 13.5: We request 1024 blocks with n = 4:
  6.46 ms (1 average seek) + 8.33 ms (time per rotation) * 16/4 (# rotations) = 39.8 ms
13.3.4 MIRRORING DISKS
• Mirroring disks: having 2 or more disks hold identical copies of the data.
• Benefit 1: If n disks are mirrors of each other, the system can survive a crash of n-1 disks.
• Benefit 2: If we have n disks, read performance increases by a factor of n.
• Performance increases further by having the controller select, for each read, the disk whose head is closest to the desired data block.
13.3.5 DISK SCHEDULING AND THE ELEVATOR PROBLEM
• The disk controller runs a scheduling algorithm to select which of several pending requests to process first.
• Pseudocode:

  requests[]  // array of all unprocessed data requests
  upon receiving a new data request:
      requests[].add(new request)
  while (requests[] is not empty):
      move head to next location
      if (head location is at data in requests[]):
          retrieve data
          remove data from requests[]
      if (head reaches end):
          reverse head direction
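The pseudocode can be made concrete. Here is a minimal runnable Python sketch of the elevator policy; the unit-speed head movement and the arrival times are illustrative assumptions (chosen so each request arrives just after the head has passed its cylinder, as in the timeline on the next slide), not part of the original slides.

def elevator(start, arrivals, max_cyl):
    """arrivals: (arrival_time, cylinder) pairs; head moves 1 cylinder per tick.
    Returns the cylinders in the order they are served."""
    arrivals = sorted(arrivals)
    pending, served, i = set(), [], 0
    head, direction, t = start, +1, 0
    while i < len(arrivals) or pending:
        while i < len(arrivals) and arrivals[i][0] <= t:  # admit arrived requests
            pending.add(arrivals[i][1]); i += 1
        if head in pending:                               # serve request under head
            pending.remove(head); served.append(head)
        if head >= max_cyl or head <= 0:                  # reverse at either end
            direction = -direction
        head += direction; t += 1
    return served

print(elevator(0, [(0, 8000), (0, 24000), (0, 56000),
                   (17000, 16000), (26000, 64000), (45000, 40000)], 65536))
# -> [8000, 24000, 56000, 64000, 40000, 16000], the order in the slides' table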
13.3.5 DISK SCHEDULING AND THE ELEVATOR PROBLEM (CON'T)

Events (head starts at cylinder 0):
• Request data at 8000
• Request data at 24000
• Request data at 56000
• Get data at 8000
• Request data at 16000
• Get data at 24000
• Request data at 64000
• Get data at 56000
• Request data at 40000
• Get data at 64000
• Get data at 40000
• Get data at 16000

[Chart: head position (cylinders 8000 to 64000) vs. current time (0 to 30+), showing the head sweeping up through 8000, 24000, 56000, 64000 and back down through 40000 and 16000.]

data     time
8000     4.3
24000    13.6
56000    26.9
64000    34.2
40000    45.5
16000    56.8
13.3.5 DISK SCHEDULING AND THE ELEVATOR PROBLEM (CON'T)

Elevator Algorithm        FIFO Algorithm
data     time             data     time
8000     4.3              8000     4.3
24000    13.6             24000    13.6
56000    26.9             56000    26.9
64000    34.2             16000    42.2
40000    45.5             64000    59.5
16000    56.8             40000    70.8
13.3.6 PREFETCHING AND LARGE-SCALE BUFFERING
• If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed. This reduces access cost and saves time.
13.4 DISK FAILURES

INTERMITTENT FAILURES
• A read or write operation on a sector succeeds not on the first try, but only after repeated tries.
• The most common form of failure.
• Parity checks can be used to detect this kind of failure.

MEDIA DECAY
• A serious form of failure: one or more bits are permanently corrupted.
• It becomes impossible to read a sector correctly, even after many tries.
• The stable storage technique for organizing a disk is used to cope with this failure.

WRITE FAILURE
• An attempt to write a sector fails, and the attempt to retrieve the previously written sector is also unsuccessful.
• A possible reason: a power outage while the sector was being written.
• The stable storage technique can be used to cope with this.
DISK CRASH
• The most serious form of disk failure.
• The entire disk becomes unreadable, suddenly and permanently.
• RAID techniques can be used for coping with disk crashes.
MORE ON INTERMITTENT FAILURES…
• On a read, the correct content of the sector is not delivered to the disk controller.
• If the controller has a way to tell whether the sector read is good or bad (checksums), it can reissue the read request when bad data is read.
• On a write, the controller can attempt to write a sector, but the contents of the sector may not be what was intended.
• The only way to check a write is to let the disk rotate around again and read the sector.
• One way to perform the check is to read the sector back and compare it with the sector we intended to write.
• If a good sector is read and matches, the write was correct; otherwise, the write was unsuccessful and must be repeated.
• A simpler way, instead of performing the complete comparison at the disk controller, is to read the sector back and check (via its checksum) that a good sector was read.
CHECKSUMS
• A technique used to determine the good/bad status of a sector.
• Each sector has some additional bits, called the checksum, that are set depending on the values of the data bits in that sector.
• If the checksum is not proper on reading, then there is an error in reading.
• There is a small chance that the block was not read correctly even if the checksum is proper.
• The probability of catching an error can be increased by using many checksum bits.
CHECKSUM CALCULATION
• The checksum is based on the parity of all the bits in the sector.
• If there is an odd number of 1's among a collection of bits, the bits are said to have odd parity; a parity bit of 1 is added.
• If there is an even number of 1's, the collection of bits is said to have even parity; a parity bit of 0 is added.
• Thus the number of 1's among a collection of bits and their parity bit is always even.
• During a write operation, the disk controller calculates the parity bit and appends it to the sequence of bits written in the sector.
• Every sector will then have even parity.
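The parity rule is easy to state in code. A minimal Python sketch (the function names are illustrative):

def parity_bit(bits):
    # Choose the parity bit so that the total number of 1's, including
    # the parity bit itself, is even.
    return sum(bits) % 2

def has_even_parity(bits_with_parity):
    # A correctly written sector (data bits + parity bit) has even parity.
    return sum(bits_with_parity) % 2 == 0

data = [1, 1, 0, 1]                     # odd number of 1's -> parity bit 1
sector = data + [parity_bit(data)]
print(sector, has_even_parity(sector))  # [1, 1, 0, 1, 1] True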
STABLE STORAGE
• Checksums can detect an error but cannot correct it.
• Sometimes we overwrite the previous contents of a sector and yet cannot read the new contents correctly.
• To deal with these problems, the stable storage policy can be implemented on the disks.
• Sectors are paired, and each pair represents one sector-contents X.
• The left copy of the sector is represented as XL, and the right copy as XR.
ASSUMPTIONS
• We assume that copies are written with a sufficient number of parity bits to make the chance that a bad sector looks good, when the parity checks are considered, negligibly small.
STABLE-STORAGE WRITING POLICY:
1. Write the value of X into XL. Check that the written value has status "good"; i.e., the parity-check bits are correct in the written copy. If not, repeat the write. If, after a set number of write attempts, we have not successfully written X into XL, assume that there is a media failure in this sector. A fix-up, such as substituting a spare sector for XL, must be adopted.
2. Repeat (1) for XR.
STABLE-STORAGE READING POLICY:
• The policy is to alternate trying to read XL and XR until a good value is returned.
• If a good value is not returned after a pre-chosen number of tries, then it is assumed that X is truly unreadable.
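A minimal Python sketch of these two policies, assuming hypothetical write_sector/read_sector primitives that report a parity-checked "good" status:

# Hypothetical primitives: write_sector(loc, value) -> bool (True if the
# written copy checks out); read_sector(loc) -> (value, good_flag).
MAX_TRIES = 5

def stable_write(write_sector, x, left, right):
    for loc in (left, right):                  # step 1 for XL, step 2 for XR
        for _ in range(MAX_TRIES):
            if write_sector(loc, x):
                break                          # copy written with "good" status
        else:
            raise IOError(f"media failure at {loc}; substitute a spare sector")

def stable_read(read_sector, left, right):
    for i in range(MAX_TRIES):
        value, good = read_sector((left, right)[i % 2])   # alternate XL, XR
        if good:
            return value
    raise IOError("X is truly unreadable")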
ERROR-HANDLING CAPABILITIES:
• Media failures:
  • If, after storing X in sectors XL and XR, one of them undergoes a media failure and becomes permanently unreadable, we can read X from the other one.
  • Only if both sectors have failed can X not be read.
• Write failure:
  • If there is a system failure (such as a power outage) while writing X, the X in main memory is lost and the copy of X being written may be erroneous; the other copy, however, still holds either the old or the new value of X.
RECOVERY FROM DISK CRASHES
• To reduce data loss from disk crashes, schemes that involve redundancy, extending the idea of parity checks or duplicated sectors, can be applied.
• The term used for these strategies is RAID, or Redundant Arrays of Independent Disks.
• When one of the disks crashes, the other disks can be used to restore the failed disk, avoiding permanent information loss.
• In general, if the mean time to failure of disks is n years, then in any given year 1/nth of the surviving disks fail.
• Each of the RAID schemes has data disks and redundant disks:
  • data disks are one or more disks that hold the data;
  • redundant disks are one or more disks that hold information that is completely determined by the contents of the data disks.
SECTION 13.4
MIRRORING
• The simplest scheme for recovery from disk crashes.
• Benefits:
  • data is saved if one disk fails;
  • data can be divided among several disks, allowing access to several blocks at once.
• For mirroring, when can data be lost?
  • The only way data can be lost is if the second (mirror/redundant) disk crashes while the first (data) disk crash is being repaired.
PARITY BLOCKS
• Why change?
  • Disadvantage of mirroring: it uses many redundant disks.
• What's new?
  • RAID level 4 uses only one redundant disk.
• How does this one redundant disk work?
  • Modulo-2 sum: the jth bit of the redundant disk is the modulo-2 sum (XOR) of the jth bits of all the data disks.
• Example:
  Data disks:
    Disk 1: 11110000
    Disk 2: 10101010
    Disk 3: 00111000
  Redundant disk:
    Disk 4: 01100010
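A minimal Python check of the modulo-2 sum above, treating each disk block as an integer of bits:

# XOR of the data blocks gives the redundant block, bit by bit.
d1, d2, d3 = 0b11110000, 0b10101010, 0b00111000
redundant = d1 ^ d2 ^ d3
print(f"{redundant:08b}")  # 01100010, as on the slide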
RAID 4 (CON'T): WRITING
For a total of N data disks:
1) Naive way:
  • read all N data disks and compute the modulo-2 sum of the corresponding blocks;
  • rewrite the redundant disk according to the modulo-2 sum of the data disks.
2) Better way:
  • take the modulo-2 sum of the old and new versions of the data block that was rewritten;
  • flip the bits of the redundant disk in the positions where this modulo-2 sum has 1's.
Example:
  Data disks:
    Disk 1: 11110000
    Disk 2: 10101010 → 01100110
    Disk 3: 00111000
  Redundant disk:
    Disk 4: 01100010 → 10101110
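The better way needs only the old and new data block plus the old redundant block; a minimal Python sketch:

# Incremental RAID 4 update: redundant_new = redundant_old XOR (old XOR new).
old_d2, new_d2 = 0b10101010, 0b01100110
old_redundant = 0b01100010
delta = old_d2 ^ new_d2                 # positions where disk 2 changed
new_redundant = old_redundant ^ delta   # flip exactly those positions
print(f"{new_redundant:08b}")           # 10101110, as on the slide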
AN IMPROVEMENT: RAID 5
• Why is an improvement needed?
  • Shortcoming of RAID level 4: it suffers from a bottleneck (every update to a data disk must also read and write the one redundant disk).
• Principle of RAID level 5 (RAID 5):
  • treat each disk as the redundant disk for some of the blocks.
• Why is this feasible?
  • The rule of failure recovery is the same for a redundant disk and a data disk:
    "take the modulo-2 sum of all the corresponding bits of all the other disks"
• How do we recognize which blocks of each disk treat that disk as redundant?
  • If there are n+1 disks, labeled 0 to n, then we can treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1.
• So there is no need to treat one disk as the redundant disk and the others as data disks.
COPING WITH MULTIPLE DISK CRASHES
• RAID 6: deals with any number of disk crashes, given enough redundant disks.
• Example: a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7).
• Reading:
  • read from the data disks and ignore the redundant disks.
• Writing:
  • change the data disk, and
  • change the corresponding bits of all the redundant disks.
COPING WITH MULTIPLE DISK CRASHES (CON'T): EXAMPLE

The 3x7 redundancy matrix (columns are disk numbers; disks 1-4 are data disks, disks 5-7 are redundant disks):

disk number:  1  2  3  4  5  6  7
row 1:        1  1  1  0  1  0  0
row 2:        1  1  0  1  0  1  0
row 3:        1  0  1  1  0  0  1
COPING WITH MULTIPLE DISK CRASHES (CON'T): EXAMPLE

Two disks crash (disks 2 and 5):
1) In the 3x7 matrix, find a row in which the two failed disks have different values: in row 2, disk 2's value is 1 and disk 5's value is 0.
2) So: compute the first block of disk 2 as the modulo-2 sum of the corresponding bits of the other disks with 1 in row 2, i.e., disks 1, 4, 6;
3) then compute the first block of disk 5 as the modulo-2 sum of the corresponding bits of the other disks with 1 in row 1, i.e., disks 1, 2, 3.

disk  contents
1)    11110000
2)    ???????? => 10101010
3)    00111000
4)    01000001
5)    ???????? => 01100010
6)    00011011
7)    10001001
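A minimal Python sketch of this recovery, using the 3x7 matrix above; the disk contents are the ones on the slide:

# Redundancy matrix rows: which disks participate in each parity group.
rows = [(1, 2, 3, 5), (1, 2, 4, 6), (1, 3, 4, 7)]
disks = {1: 0b11110000, 3: 0b00111000, 4: 0b01000001,
         6: 0b00011011, 7: 0b10001001}          # disks 2 and 5 have crashed

def recover(failed, row):
    # The failed disk is the XOR of the other disks in its parity row.
    value = 0
    for d in row:
        if d != failed:
            value ^= disks[d]
    disks[failed] = value
    return value

print(f"{recover(2, rows[1]):08b}")  # row 2 mentions disk 2 but not disk 5
print(f"{recover(5, rows[0]):08b}")  # then row 1 recovers disk 5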
SECTION 13.5
• Data elements are represented as records, which are stored in consecutive bytes in the same disk block.
• Basic layout technique for storing data: fixed-length records.
• Allocation criterion: data should start at a word boundary.
• The fixed-length record header contains:
  1. a pointer to the record schema;
  2. the length of the record;
  3. timestamps indicating when the record was last modified or last read.
EXAMPLE
CREATE TABLE employee(
    name CHAR(30) PRIMARY KEY,
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE
);
Each record starts at a word boundary and contains a header followed by the four fields: name, address, gender, and birthdate.
PACKING FIXED-LENGTH RECORDS INTO BLOCKS:
• Records are stored in the form of blocks on the disk, and they are moved into main memory when we need to update or access them.
• A block header is written first, and it is followed by a series of records.

EXAMPLE
Along with the header, we pack as many records as we can into one block, as shown in the figure; the remaining space is unused.
SECTION 13.6
INTRODUCTION
• Addresses of a block and of a record:
  • In main memory:
    • the address of the block is the virtual-memory address of its first byte;
    • the address of a record within the block is the virtual-memory address of the record's first byte.
  • In secondary memory: a sequence of bytes describes the location of the block in the overall system:
    • e.g., the device ID for the disk, the cylinder number, etc.
ADDRESSES IN CLIENT-SERVER SYSTEMS
• The addresses in an address space are represented in two ways:
  • Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found.
  • Logical addresses: arbitrary strings of bytes of some fixed length.
• Physical address bits are used to indicate:
  • the host to which the storage is attached,
  • an identifier for the disk,
  • the number of the cylinder,
  • the number of the track,
  • the offset of the beginning of the record.
ADDRESSES IN CLIENT-SERVER SYSTEMS
• A map table relates logical addresses to physical addresses.

[Figure: map table with columns "Logical" and "Physical"; each row pairs a logical address with a physical address.]
LOGICAL AND STRUCTURED ADDRESSES
• What is the purpose of a logical address?
• It gives more flexibility when we:
  • move the record around within the block,
  • move the record to another block.
• It also gives us the option of deciding what to do when a record is deleted.

[Figure: a block with a header and an offset table at one end, unused space in the middle, and Record 4, Record 3, Record 2, Record 1 at the other end.]
POINTER SWIZZLING
• Having pointers is common in object-relational database systems.
• It is important to learn about the management of pointers.
• Every data item (block, record, etc.) has two addresses:
  • its database address: the address on disk;
  • its memory address, if the item is currently in virtual memory.
POINTER SWIZZLING (CONTD…)
• Translation table: maps database addresses to memory addresses.

[Figure: translation table with columns "Dbaddr" and "Mem-addr"; each row pairs a database address with a memory address.]

• All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table.
POINTER SWIZZLING (CONTD…)
• A pointer consists of the following two fields:
  • a bit indicating the type of address,
  • the database or memory address itself.
• Example 13.17:

[Figure: Block 1 and Block 2 on disk, and Block 1 in memory; a swizzled pointer in the memory copy of Block 1 holds a memory address, while an unswizzled pointer still holds the database address of Block 2 on disk.]
POINTER SWIZZLING (CONTD…)
• Three types of swizzling:
  • Automatic swizzling:
    • as soon as a block is brought into memory, swizzle all relevant pointers.
  • No swizzling:
    • pointers are never swizzled; items are always accessed using the database address.
  • Swizzling on demand:
    • only swizzle a pointer if and when it is actually followed.
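A minimal Python sketch of swizzling on demand, assuming a hypothetical translation table and a pointer tagged with its address type (all names here are illustrative):

translation = {}            # db_addr -> in-memory object (the translation table)

def load_block(db_addr):    # stand-in for reading a block from disk
    obj = f"block@{db_addr}"
    translation[db_addr] = obj
    return obj

def follow(ptr):
    """Swizzle on demand: resolve a database pointer to a memory pointer
    the first time it is actually followed."""
    is_mem, addr = ptr
    if is_mem:
        return addr                               # already swizzled
    return translation.get(addr) or load_block(addr)

p = (False, 0xB2)           # unswizzled: holds a database address
obj = follow(p)
p = (True, obj)             # the pointer is now swizzled
print(p)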
PROGRAMMER CONTROL OF SWIZZLING
• Unswizzling:
  • When a block is moved from memory back to disk, all pointers must be converted back to database (disk) addresses.
  • The translation table is used again for this.
  • It is important to have an efficient data structure for the translation table.
PINNED RECORDS AND BLOCKS
• A block in memory is said to be pinned if it cannot be written back to disk safely.
• If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.
• To unpin a block, we must unswizzle any pointers to it:
  • use the translation table to replace the memory addresses with database (disk) addresses;
  • keep in the translation table the places in memory holding swizzled pointers to each item.
SECTION 13.7
RECORDS WITH VARIABLE FIELDS
An effective way to represent variable-length records is as follows:
• The record header contains:
  • the length of the record;
  • pointers to the beginnings of all variable-length fields except the first one.
• Fixed-length fields are kept ahead of the variable-length fields.
RECORDS WITH REPEATING FIELDS
• A record may contain a variable number of occurrences of a field F.
• All occurrences of field F are grouped together, and the record header contains a pointer to the first occurrence of field F.
• L bytes are devoted to each instance of field F.
• Locating an occurrence of field F within the record:
  • add to the offset of the first occurrence of F the integer multiples of L, starting with 0: 0, L, 2L, 3L, and so on;
  • we stop upon reaching the end of the occurrences of field F.

[Figure 4: Storing variable-length fields separately from the record. The record header holds pointers to the name (with the length of the name) and to the address (with the length of the address), plus the number of movie references and a pointer to them.]
Records with Repeating Fields
• Advantage:
  • Keeping the record itself fixed-length allows records to be searched more efficiently, minimizes the overhead in block headers, and allows records to be moved within or among blocks with minimum effort.
• Disadvantage:
  • Storing variable-length components on another block increases the number of disk I/O's needed to examine all components of a record.
VARIABLE-FORMAT RECORDS
• Records that do not have a fixed schema.
• Variable-format records are represented by a sequence of tagged fields.
• Why use tagged fields?
  • information-integration applications;
  • records with a very flexible schema.
• Each tagged field consists of:
  • the attribute or field name,
  • the type of the field,
  • the length of the field,
  • the value of the field.
VARIABLE-FORMAT RECORDS

[Fig 5: A record with tagged fields. First field: code N (for name), code S (string type), length 14, value "Clint Eastwood". Second field: code R (for restaurant owned), code S (string type), length 16, value "Hog's Breath Inn".]
RECORDS THAT DO NOT FIT IN A BLOCK
• When the length of a record is greater than the block size, the record is divided and placed into two or more blocks.
• The portion of the record in each block is referred to as a RECORD FRAGMENT.
• A record with two or more fragments is called a SPANNED RECORD.
• A record that does not cross a block boundary is called an UNSPANNED RECORD.
RECORDS THAT DO NOT FIT IN A BLOCK

[Figure 6: Storing spanned records across blocks. Block 1 holds record 1 and record fragment 2-a; block 2 holds record fragment 2-b and record 3. Each block has a block header, and each record or fragment has a record header.]
SECTION 13.8
INSERTION
• Insertion of records without order:
  • records can be placed in a block with empty space or in a new block.
  • Structured address: a pointer to a record from outside the block.
• Insertion of records in fixed order:
  • space available in the block;
  • no space available in the block (outside the block).
INSERTION IN FIXED ORDER
Space available within the block:
• Use an offset table in the header of each block, with pointers to the location of each record in the block.
• The records are slid within the block, and the pointers in the offset table are adjusted.

[Figure: a block with a header and offset table at one end, unused space in the middle, and Record 4, Record 3, Record 2, Record 1 at the other end.]
INSERTION IN FIXED ORDER
No space available within the block (outside the block):
• Create an overflow block:
  • records can be stored in the overflow block;
  • each block has a place in its header for a pointer to an overflow block;
  • the overflow block can point to a second overflow block, as shown below.
• Find space on a "nearby" block:
  • if no space is available in a block, look at the following block in the sorted order of blocks;
  • if space is available in that block, move the highest records of block 1 to block 2, and slide the records around on both blocks.

[Figure: block B with a pointer to its overflow block.]
DELETION
• Recovering space after deletion:
  • When an offset table is used, the records can be slid around the block so there is one unused region in the center that can be reclaimed.
  • If we cannot slide records, an available-space list can be maintained in the block header.
  • The list head goes in the block header, and the available regions hold the links of the list.
• Use of tombstones:
  • A tombstone is placed where the record was, so that pointers to the deleted record do not end up pointing at a new record.
  • The tombstone is permanent, until the entire database is reconstructed.
UPDATE
• Fixed-length update:
  • no effect on the storage system, as the record occupies the same space as before the update.
• Variable-length update: the record may become longer or shorter.
• Variable-length update (longer length):
  • stored in the same block:
    • slide the records;
    • create an overflow block.
  • stored on another block:
    • move records around that block;
    • create a new block for storing variable-length fields.
CHAPTER 14
SECTION 14.2
STRUCTURE
• A B-tree is a balanced tree, meaning that all paths from the root to a leaf have the same length.
• There is a parameter n associated with each B-tree block: each block has space for n search keys and n+1 pointers.
• The root may have as few as 1 search key, but all other blocks must be at least half full.

[Figure: a typical node holds n search keys and n+1 pointers; in a typical interior node the pointers point to nodes at the next lower level, while in a typical leaf the pointers point to records of the data file.]
APPLICATION
• The search key of the B-tree is the primary key of the data file.
• Or the data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree.
• Or the data file is sorted by its primary key.
LOOKUP
• At an interior node, choose the correct pointer to follow. This is done by comparing the search value to the node's keys.
INSERTION
• When inserting, choose the correct leaf node in which to put a pointer to the data.
• If the node is full, create a new node and split the keys between the two.
• Recursively move up: if a pointer to the new node cannot be added to the parent because it is full, split that node as well.
• This can end with the creation of a new root node, if the current root was full.
DELETION
• If a node falls below half full, either join it with an adjacent sibling and recursively delete up, or move a key over from a sibling that has keys to spare and recursively change the pointer up.
EFFICIENCY
• Three levels are sufficient for most B-trees: with each block holding 255 pointers, 255^3 is about 16.6 million.
• Disk I/O's can be reduced further by keeping a level of the B-tree in main memory. Keeping the root block (255 pointers) reduces a lookup to 2 reads, and it is even possible to keep the next level's 255 blocks in memory to reduce it to 1.
SECTION 14.7
DEFINITION
A bitmap index for a field F is a collection of bit-vectors of length n, one for each possible value that may appear in the field F.[1]
WHAT DOES THAT MEAN?
• Assume a relation R with:
  • 2 attributes, A and B;
  • attribute A of type Integer and B of type String;
  • 6 records, numbered 1 through 6, as shown.

record  A   B
1       30  foo
2       30  bar
3       40  baz
4       50  foo
5       40  bar
6       30  baz
EXAMPLE CONTINUED…
• A bitmap index for attribute B of the relation R on the previous slide is:

Value  Vector
foo    100100
bar    010010
baz    001001
WHERE DO WE REACH?
• A bitmap index is a special kind of database index that uses bitmaps.[2]
• Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values (e.g., male and female) but many occurrences of those values.[2]
• A bitmap index for attribute A of relation R is:
  • a collection of bit-vectors;
  • the number of bit-vectors = the number of distinct values of A in R;
  • the length of each bit-vector = the cardinality of R;
  • the bit-vector for value v has 1 in position i if the ith record has v in attribute A, and 0 there if not.[3]
• Records are allocated permanent numbers.[3]
• There is a mapping between record numbers and record addresses.[3]
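A minimal Python sketch that builds the bit-vectors of the earlier example from the relation's records (names are illustrative):

# Build a bitmap index on one attribute of a small relation.
records = [(30, "foo"), (30, "bar"), (40, "baz"),
           (50, "foo"), (40, "bar"), (30, "baz")]

def bitmap_index(records, attr):
    """attr: position of the indexed attribute. Returns {value: bit-vector}."""
    index = {}
    for i, rec in enumerate(records):
        vec = index.setdefault(rec[attr], ["0"] * len(records))
        vec[i] = "1"                    # record i has this value
    return {v: "".join(bits) for v, bits in index.items()}

print(bitmap_index(records, 1))
# {'foo': '100100', 'bar': '010010', 'baz': '001001'}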
MOTIVATION FOR BITMAP INDEXES
• Very efficient when used for partial-match queries.[3]
• They offer the advantage of buckets,[2] where we find tuples with several specified attributes without first retrieving all the records that match on each attribute separately.
• They can also help answer range queries.[3]
COMPRESSED BITMAPS
• Assume:
  • the number of records in R is n;
  • attribute A has m distinct values in R.
• The size of a bitmap index on attribute A is then m*n bits.
• If m is large, the fraction of 1's will be around 1/m, which is an opportunity to encode the vectors compactly.
• A common encoding approach is called run-length encoding.[1]
RUN-LENGTH ENCODING
• Represents runs: a run is a sequence of i 0's followed by a 1, represented by a suitable binary encoding of the integer i.
• A run of i 0's followed by a 1 is encoded by:
  • first computing how many bits are needed to represent i, say k;
  • then representing the run by k-1 1's and a single 0, followed by the k bits that represent i in binary.
  • The encoding for i = 1 is 01 (k = 1).
  • The encoding for i = 0 is 00 (k = 1).
• We concatenate the codes for each run together, and the resulting sequence of bits is the encoding of the entire bit-vector.
UNDERSTANDING WITH AN EXAMPLE
• Let us decode the sequence 11101101001011.
• Starting at the beginning (leftmost bit):
  • First run: the first 0 is at position 4, so k = 4. The next 4 bits are 1101, so the first integer is i = 13.
  • Second run: 001011 remains. It starts with 0, so k = 1, and the next bit is 0, so i = 0.
  • Last run: 1011. k = 2, and the next 2 bits are 11, so i = 3.
• Our run lengths are thus 13, 0, 3; hence our bit-vector is:
  0000000000000110001
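A minimal Python sketch of this encoding and its decoder (function names are illustrative):

def encode_run(i):
    # k-1 1's, a single 0, then i in k binary bits, where k = len(bin(i)) - 2.
    b = bin(i)[2:]
    k = len(b)
    return "1" * (k - 1) + "0" + b

def decode(bits):
    """Return the list of run lengths encoded in the bit string."""
    runs, pos = [], 0
    while pos < len(bits):
        k = 0
        while bits[pos + k] == "1":    # count the k-1 leading 1's
            k += 1
        k += 1                         # the terminating 0 completes k
        pos += k                       # skip past the k-1 1's and the 0
        runs.append(int(bits[pos:pos + k], 2))
        pos += k
    return runs

print(decode("11101101001011"))                    # [13, 0, 3], as above
print("".join(encode_run(i) for i in (13, 0, 3)))  # 11101101001011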
Deletion
• A tombstone replaces the deleted record.
• The corresponding bit is set to 0 in each bit-vector.
Insertion
• The record is assigned the next record number.
• A bit of value 0 or 1 is appended to each bit-vector.
• If the new record contains a new value of the attribute, add one bit-vector.
Modification
• Change the bit corresponding to the old value of the modified record to 0.
• Change the bit corresponding to the new value of the modified record to 1.
• If the new value is a new value of A, then insert a new bit-vector.
CHAPTER 15
SECTION 15.1
WHAT IS A QUERY PROCESSOR?
• The group of components of a DBMS that converts user queries and data-modification commands into a sequence of database operations.
• It also executes those operations.
• It must supply the details of how the query is to be executed.
MAJOR PARTS OF THE QUERY PROCESSOR
• Query execution: the algorithms that manipulate the data of the database.
• The focus is on the operations of the extended relational algebra.
OUTLINE OF QUERY COMPILATION
Query compilation:
• Parsing: a parse tree for the query is constructed.
• Query rewrite: the parse tree is converted to an initial query plan and transformed into a logical query plan that takes less time.
• Physical plan generation: the logical query plan is converted into a physical query plan by selecting algorithms and an order of execution for these operators.
PHYSICAL-QUERY-PLAN OPERATORS
• Physical operators are implementations of the operators of relational algebra.
• There are also physical operators for tasks outside relational algebra, like "scan", which scans a table, that is, brings each tuple of some relation into main memory.
SCANNING TABLES
• One of the most basic things we can do in a physical query plan is to read the entire contents of a relation R.
• A variation of this operator involves a simple predicate: read only those tuples of the relation R that satisfy the predicate.
SCANNING TABLES
Basic approaches to locate the tuples of a relation R:
• Table-scan:
  • relation R is stored in secondary memory with its tuples arranged in blocks;
  • it is possible to get the blocks one by one.
• Index-scan:
  • if there is an index on any attribute of relation R, we can use this index to get all the tuples of R.
SORTING WHILE SCANNING TABLES
• There are a number of reasons to sort a relation:
  • the query could include an ORDER BY clause, requiring that a relation be sorted;
  • algorithms that implement relational-algebra operations may require one or both arguments to be sorted relations.
• The physical-query-plan operator sort-scan takes a relation R and the attributes on which the sort is to be made, and produces R in that sorted order.
COMPUTATION MODEL FOR PHYSICAL OPERATORS
• Physical-plan operators should be selected wisely; this is essential for a good query processor.
• The "cost" of each operator is estimated by the number of disk I/O's for the operation.
• The total cost of an operation depends on the size of the answer, and includes the final write-back cost in the total cost of the query.
PARAMETERS FOR MEASURING COSTS
Parameters that affect the performance of a query:
• the buffer space available in main memory at the time the query executes;
• the size of the input and the size of the output generated;
• the block size on disk and the amount of main memory also affect performance.
PARAMETERS FOR MEASURING COSTS
• B: the number of blocks needed to hold all the tuples of relation R.
  • Also denoted B(R).
• T: the number of tuples in relation R.
  • Also denoted T(R).
• V: the number of distinct values that appear in a column of relation R.
  • V(R, a) is the number of distinct values of column a in relation R.
I/O COST FOR SCAN OPERATORS
• If relation R is clustered, then the number of disk I/O's for the table-scan operator is approximately B disk I/O's.
• If relation R is not clustered, then the number of required disk I/O's is generally much higher.
• An index on a relation R occupies many fewer than B(R) blocks. That means a scan of the entire index requires fewer I/O's than a scan of the entire relation R, which takes at least B disk I/O's.
ITERATORS FOR IMPLEMENTATION OF PHYSICAL OPERATORS
• Many physical operators can be implemented as iterators.
• The three methods forming the iterator for an operation are:
  1. Open():
    • starts the process of getting tuples;
    • initializes any data structures needed to perform the operation.
  2. GetNext():
    • returns the next tuple in the result;
    • if there are no more tuples to return, GetNext returns the special value NotFound.
  3. Close():
    • ends the iteration after all tuples have been obtained;
    • calls Close on any arguments of the operator.
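A minimal Python sketch of this iterator interface for a table-scan, with a NotFound sentinel (the in-memory list standing in for relation R is an illustrative assumption):

NotFound = object()   # sentinel returned when the result is exhausted

class TableScan:
    """Iterator-style table-scan over an in-memory list standing in for R."""
    def __init__(self, relation):
        self.relation = relation
    def Open(self):
        self.pos = 0                        # initialize the scan state
    def GetNext(self):
        if self.pos >= len(self.relation):
            return NotFound                 # no more tuples to return
        t = self.relation[self.pos]
        self.pos += 1
        return t
    def Close(self):
        pass                                # nothing to release for a list

scan = TableScan([("a", 1), ("b", 2)])
scan.Open()
while (t := scan.GetNext()) is not NotFound:
    print(t)
scan.Close()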
SECTION 15.2
ONE-PASS ALGORITHM METHODS
• Tuple-at-a-time, unary operations (selection & projection).
• Full-relation, unary operations.
• Full-relation, binary operations (set & bag versions of union, etc.).
ONE-PASS ALGORITHMS FOR TUPLE-AT-A-TIME OPERATIONS
• The tuple-at-a-time operations are selection and projection:
  • read the blocks of R one at a time into an input buffer;
  • perform the operation on each tuple;
  • move the selected tuples or the projected tuples to the output buffer.
• The disk I/O requirement for this process depends only on how the argument relation R is provided:
  • if R is initially on disk, the cost is whatever it takes to perform a table-scan or index-scan of R.

[Figure: a selection or projection being performed on a relation R, one input buffer and one output buffer.]
ONE-PASS ALGORITHMS FOR UNARY, FULL-RELATION OPERATIONS
• Duplicate elimination:
  • To eliminate duplicates, we read each block of R one at a time, but for each tuple we need to decide whether:
    1. it is the first time we have seen this tuple, in which case we copy it to the output, or
    2. we have seen the tuple before, in which case we must not output it.
  • One memory buffer holds one block of R's tuples, and the remaining M - 1 buffers are used to hold a single copy of every tuple seen so far.
DUPLICATE ELIMINATION
• We need a main-memory structure that allows each of these operations:
  • add a new tuple, and
  • tell whether a given tuple is already there.
• When a new tuple from R is considered, we compare it with all tuples seen so far:
  • if it is not equal to any of them, we copy it to the output and add it to the in-memory list of tuples we have seen;
  • if there are n tuples in main memory, each new tuple takes processor time proportional to n, so the complete operation takes processor time proportional to n².
• To avoid this quadratic cost, use a structure with constant-time lookup, such as a hash table.
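A minimal Python sketch of one-pass duplicate elimination using a hash-based set (the block structure is elided; tuples simply stream by):

def one_pass_distinct(tuples):
    """Yield each distinct tuple of R once, keeping the seen tuples in memory.
    Works as long as one copy of every distinct tuple fits in M-1 buffers."""
    seen = set()                 # hash table avoids the O(n^2) list scan
    for t in tuples:
        if t not in seen:        # first time we have seen this tuple
            seen.add(t)
            yield t              # copy it to the output

print(list(one_pass_distinct([(1, "a"), (2, "b"), (1, "a"), (3, "c")])))
# [(1, 'a'), (2, 'b'), (3, 'c')]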
ONE-PASS ALGORITHMS FOR UNARY, FULL-RELATION OPERATIONS
• Grouping:
  • The grouping operation gives us zero or more grouping attributes and presumably one or more aggregated attributes.
  • If we create in main memory one entry for each group, we can scan the tuples of R one block at a time.
  • The entry for a group consists of the values of the grouping attributes and an accumulated value or values for each aggregation.
ONE-PASS ALGORITHMS FOR BINARY OPERATIONS
• Binary operations include:
  • union,
  • intersection,
  • difference,
  • product,
  • join.
SET UNION
• We read S into M - 1 buffers of main memory and build a search structure whose search key is the entire tuple.
• All these tuples are also copied to the output.
• Read each block of R into the Mth buffer, one at a time.
• For each tuple t of R, see if t is in S; if not, we copy t to the output. If t is also in S, we skip t.
SET INTERSECTION
• Read S into M - 1 buffers and build a search structure with full tuples as the search key.
• Read each block of R, and for each tuple t of R, see if t is also in S. If so, copy t to the output; if not, ignore t.
SET DIFFERENCE
• Read S into M - 1 buffers and build a search structure with full tuples as the search key.
• To compute R -S S, read each block of R and examine each tuple t on that block. If t is in S, ignore t; if it is not in S, copy t to the output.
• To compute S -S R, read the blocks of R and examine each tuple t in turn. If t is in S, delete t from the copy of S in main memory; if t is not in S, do nothing.
• After considering each tuple of R, copy to the output those tuples of S that remain.
BAG INTERSECTION
• Read S into M - 1 buffers, but do not store multiple copies of a tuple t individually. Rather, store 1 copy of t and associate with it a count equal to the number of times t occurs.
• Next, read each block of R, and for each tuple t of R see whether t occurs in S. If not, ignore t; it cannot appear in the intersection. If t appears in S, and the count associated with t is still positive, output t and decrement the count by 1. If t appears in S, but its count has reached 0, do not output t; we have already produced as many copies of t in the output as there were copies in S.
BAG DIFFERENCE
• To compute S -B R, read the tuples of S into main memory and count the number of occurrences of each distinct tuple.
• Then read R; check each tuple t to see whether t occurs in S, and if so, decrement its associated count. At the end, copy to the output each tuple in main memory whose count is positive, as many times as that count.
• To compute R -B S, read the tuples of S into main memory and count the number of occurrences of each distinct tuple; then read R, and for each tuple t, output t only if its remaining count in S is zero (otherwise decrement the count and do not output t).
PRODUCT
• Read S into M - 1 buffers of main memory.
• Then read each block of R, and for each tuple t of R, concatenate t with each tuple of S in main memory.
• Output each concatenated tuple as it is formed.
• This algorithm may take a considerable amount of processor time per tuple of R, because each such tuple must be matched with M - 1 blocks full of tuples. However, the output size is also large, and the time per output tuple is small.
NATURAL JOIN
• Convention: R(X, Y) is being joined with S(Y, Z), where Y represents all the attributes that R and S have in common, X is all attributes of R that are not in the schema of S, and Z is all attributes of S that are not in the schema of R. Assume that S is the smaller relation.
• To compute the natural join, do the following:
  1. Read all the tuples of S and form them into a main-memory search structure with the attributes of Y as the search key. Hash tables and balanced trees are good examples of such structures. Use M - 1 blocks of memory for this purpose.
  2. Read each block of R into the 1 remaining main-memory buffer. For each tuple t of R, find the tuples of S that agree with t on all attributes of Y, using the search structure. For each matching tuple of S, form a tuple by joining it with t, and move the resulting tuple to the output.
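A minimal Python sketch of this one-pass natural join, assuming tuples are dicts and the common attributes Y are given explicitly:

def one_pass_join(R, S, Y):
    """One-pass natural join: S (the smaller relation) is loaded into a
    main-memory hash table keyed on the join attributes Y."""
    table = {}
    for s in S:                                   # step 1: build on S
        table.setdefault(tuple(s[a] for a in Y), []).append(s)
    for r in R:                                   # step 2: probe with R
        for s in table.get(tuple(r[a] for a in Y), []):
            yield {**r, **s}                      # joined tuple over X, Y, Z

R = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
S = [{"y": "a", "z": 10}, {"y": "a", "z": 20}]
print(list(one_pass_join(R, S, ["y"])))
# [{'x': 1, 'y': 'a', 'z': 10}, {'x': 1, 'y': 'a', 'z': 20}]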
SECTION 15.3
TUPLE-BASED NESTED-LOOP JOIN
• The simplest variation of the nested-loop join.
• The loops range over individual tuples.

TUPLE-BASED NESTED-LOOP JOIN
• Algorithm to compute the join R(X, Y) ⋈ S(Y, Z):

  FOR each tuple s in S DO
      FOR each tuple r in R DO
          IF r and s join to make tuple t THEN
              output t

• R and S are two relations, with r and s as their tuples.
• Carelessness in buffering the blocks causes this to use T(R)T(S) disk I/O's.
IMPROVEMENT & MODIFICATION
To decrease the cost:
• Method 1: use an algorithm for index-based joins.
  • We find the tuples of R that match a given tuple of S via the index.
  • We need not read the entire relation R.
• Method 2: use an algorithm for block-based joins.
  • Tuples of R & S are processed by blocks.
  • Uses enough memory to store blocks in order to reduce the number of disk I/O's.
BLOCK-BASED NESTED-LOOP JOIN ALGORITHM
• Access to the arguments is organized by block.
  • While reading tuples of the inner relation, we use fewer disk I/O's.
• Use as much main memory as we can to store tuples belonging to the relation of the outer loop.
  • This allows us to join each tuple of the inner relation with as many tuples as possible.
BLOCK-BASED NESTED-LOOP JOIN ALGORITHM
ALGORITHM:

  FOR each chunk of M-1 blocks of S DO
      FOR each block b of R DO
          FOR each tuple t of b DO
              find the tuples of S in memory that join with t
              output the join of t with each of these tuples
BLOCK-BASED NESTED-LOOP JOIN ALGORITHM
• Assumptions:
  • B(S) ≤ B(R)
  • B(S) > M
• This means that neither relation fits entirely in main memory.
ANALYSIS OF NESTED-LOOP JOIN
• Number of disk I/O's:
  [B(S)/(M-1)] * (M-1 + B(R))
  or
  B(S) + B(S)B(R)/(M-1)
  or approximately B(S)*B(R)/M
SECTION 15.4
TWO-PASS ALGORITHMS BASED ON SORTING
• Two-pass algorithms: data from the operand relations is read into main memory, processed in some way, written out to disk again, and then reread from disk to complete the operation.
BASIC IDEA
• Step 1: Read M blocks of R into main memory.
• Step 2: Sort these M blocks in main memory, using an efficient main-memory sorting algorithm, so we expect that the time to sort will not exceed the disk I/O time for step (1).
• Step 3: Write the sorted list (a "sorted sublist") into M blocks of disk.
DUPLICATE ELIMINATION USING SORTING δ(R)
• First we sort the tuples of R into sorted sublists.
• Then we use the available main memory to hold one block from each sorted sublist.
• We then repeatedly copy the smallest remaining tuple to the output and ignore all tuples identical to it.
• The total cost of this algorithm is 3B(R) disk I/O's.
• This algorithm requires only √B(R) blocks of main memory, rather than the B(R) blocks of the one-pass algorithm.
EXAMPLE
• Suppose that tuples are integers, and only two tuples fit on a block. Also, M = 3, and the relation R consists of 17 tuples:
  2,5,2,1,2,2,4,5,4,3,4,2,1,5,2,1,3
• After the first pass:

Sublist  Elements
R1       1,2,2,2,2,5
R2       2,3,4,4,4,5
R3       1,1,2,3,5
EXAMPLE
• Second pass:

Sublist  In memory  Waiting on disk
R1       1,2        2,2, 2,5
R2       2,3        4,4, 4,5
R3       1,1        2,3,5

After processing tuple 1 (output: 1):

Sublist  In memory  Waiting on disk
R1       2          2,2, 2,5
R2       2,3        4,4, 4,5
R3       2,3        5

Continue the same process with the next tuple.
GROUPING AND AGGREGATION USING SORTING γ(R)
• The two-pass algorithm for grouping and aggregation is quite similar to the previous algorithm.
• Step 1: Read the tuples of R into memory, M blocks at a time. Sort each group of M blocks, using the grouping attributes of L as the sort key. Write each sorted sublist to disk.
• Step 2: Use one main-memory buffer for each sublist, and initially load the first block of each sublist into its buffer.
• Step 3: Repeatedly find the least value of the sort key (the grouping attributes) present among the first available tuples in the buffers, and combine the tuples having that value.
• This algorithm takes 3B(R) disk I/O's, and will work as long as B(R) < M².
A SORT-BASED UNION ALGORITHM
• For bag union, the one-pass algorithm is used.
• For set union:
  • Step 1: Repeatedly bring M blocks of R into main memory, sort their tuples, and write the resulting sorted sublist back to disk.
  • Step 2: Do the same for S, to create sorted sublists for relation S.
  • Step 3: Use one main-memory buffer for each sublist of R and S. Initialize each with the first block from the corresponding sublist.
  • Step 4: Repeatedly find the first remaining tuple t among all the buffers. Copy t to the output, and remove from the buffers all copies of t (if R and S are sets, there should be at most two copies).
• This algorithm takes 3(B(R)+B(S)) disk I/O's, and will work as long as B(R)+B(S) < M².
SORT-BASED INTERSECTION AND DIFFERENCE
• For both the set and bag versions, the algorithm is the same as for set union, except in the way we handle the copies of a tuple t at the fronts of the sorted sublists:
  • for set intersection, output t if it appears in both R and S;
  • for bag intersection, output t the minimum of the number of times it appears in R and in S;
  • for set difference R-S, output t if and only if it appears in R but not in S;
  • for bag difference R-S, output t the number of times it appears in R minus the number of times it appears in S.
A SIMPLE SORT-BASED JOIN ALGORITHM
• When taking a join, the number of tuples from the two relations that share a common value of the join attribute(s), and therefore need to be in main memory simultaneously, can exceed what fits in memory.
• To avoid facing this situation, we can try to reduce main-memory use for other aspects of the algorithm, and thus make available a large number of buffers to hold the tuples with a given join-attribute value.
A SIMPLE SORT-BASED JOIN ALGORITHM
• Given relations R(X, Y) and S(Y, Z) to join, and M blocks of main memory for buffers:
  • Step 1: Sort R and S, using a two-phase, multiway merge sort, with Y as the sort key.
  • Step 2: Merge the sorted R and S. The following steps are done repeatedly:
    • find the least value y of the join attributes Y that is currently at the front of the blocks for R and S;
    • if y does not appear at the front of the other relation, then remove the tuple(s) with sort key y;
    • otherwise, identify all the tuples from both relations having sort key y;
    • output all the tuples that can be formed by joining tuples from R and S with the common Y-value y;
    • if either relation has no more unconsidered tuples in main memory, reload the buffer for that relation.
A MORE EFFICIENT SORT-BASED JOIN
• If we do not have to worry about very large numbers of tuples with a common value of the join attribute(s), then we can save two disk I/O's per block by combining the second phase of the sorts with the join itself.
• To compute R(X, Y) ⋈ S(Y, Z) using M main-memory buffers:
  • create sorted sublists of size M, using Y as the sort key, for both R and S;
  • bring the first block of each sublist into a buffer;
  • repeatedly find the least Y-value y among the first available tuples of all the sublists; identify all the tuples of both relations that have Y-value y; output the join of all tuples from R with all tuples from S that share this common Y-value.
SUMMARY OF SORT-BASED ALGORITHMS

Operators           Approximate M required   Disk I/O
γ, δ                √B                       3B
∪, ∩, −             √(B(R) + B(S))           3(B(R) + B(S))
⋈ (simple)          √(max(B(R), B(S)))       5(B(R) + B(S))
⋈ (more efficient)  √(B(R) + B(S))           3(B(R) + B(S))
SECTION 15.5
INTRODUCTION
• Hashing is used when the data is too big to store in main-memory buffers.
• Hash all the tuples of the argument(s) using an appropriate hash key.
• For all the common operations, there is a way to select the hash key so that all the tuples that need to be considered together when we perform the operation have the same hash value.
• This reduces the size of the operand(s) by a factor equal to the number of buckets.
PARTITIONING RELATIONS BY HASHING
Algorithm:

  initialize M-1 buckets using M-1 empty buffers;
  FOR each block b of relation R DO BEGIN
      read block b into the Mth buffer;
      FOR each tuple t in b DO BEGIN
          IF the buffer for bucket h(t) has no room for t THEN BEGIN
              copy the buffer to disk;
              initialize a new empty block in that buffer;
          END;
          copy t to the buffer for bucket h(t);
      END;
  END;
  FOR each bucket DO
      IF the buffer for this bucket is not empty THEN
          write the buffer to disk;
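A minimal Python sketch of this partitioning step, with lists standing in for disk blocks and buffers (the tiny block size and the built-in hash are illustrative assumptions):

BLOCK_SIZE = 2                     # tuples per block, kept tiny for illustration

def partition(R, M, h):
    """Hash the tuples of R into M-1 buckets; each bucket is a list of blocks."""
    buckets = [[] for _ in range(M - 1)]         # blocks already "on disk"
    buffers = [[] for _ in range(M - 1)]         # one in-memory buffer per bucket
    for t in R:                                  # read R one tuple at a time
        i = h(t) % (M - 1)
        if len(buffers[i]) == BLOCK_SIZE:        # buffer full: flush it to disk
            buckets[i].append(buffers[i])
            buffers[i] = []
        buffers[i].append(t)
    for i, buf in enumerate(buffers):            # flush the non-empty buffers
        if buf:
            buckets[i].append(buf)
    return buckets

print(partition([5, 12, 7, 3, 8, 14, 21], M=4, h=hash))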
DUPLICATE ELIMINATION
• For the operation δ(R), hash R into M-1 buckets. (Note that two copies of the same tuple t will hash to the same bucket.)
• Do duplicate elimination on each bucket Ri independently, using the one-pass algorithm.
• The result is the union of δ(Ri), where Ri is the portion of R that hashes to the ith bucket.
REQUIREMENTS
• Number of disk I/O's: 3*B(R).
• The two-pass, hash-based algorithm works only if B(R) ≤ M(M-1), i.e., roughly B(R) ≤ M².
• In order for this to work, we need:
  • a hash function h that evenly distributes the tuples among the buckets;
  • each bucket Ri to fit in main memory (to allow the one-pass algorithm).
GROUPING AND AGGREGATION
• Hash all the tuples of relation R into M-1 buckets, using a hash function that depends only on the grouping attributes.
  (Note: all tuples in the same group end up in the same bucket.)
• Use the one-pass algorithm to process each bucket independently.
• Uses 3*B(R) disk I/O's, and requires B(R) ≤ M².
UNION, INTERSECTION, AND DIFFERENCE
• For a binary operation, we use the same hash function to hash the tuples of both arguments.
• For R ∪ S, we hash both R and S to M-1 buckets.
• For R ∩ S, we hash both R and S to 2(M-1) buckets.
• For R-S, we hash both R and S to 2(M-1) buckets.
• Requires 3(B(R)+B(S)) disk I/O's.
• The two-pass, hash-based algorithm requires min(B(R), B(S)) ≤ M².
HASH-JOIN ALGORITHM
• Use the same hash function for both relations; the hash function should depend only on the join attributes.
• Hash R into M-1 buckets R1, R2, …, RM-1.
• Hash S into M-1 buckets S1, S2, …, SM-1.
• Do a one-pass join of Ri and Si, for all i.
• 3*(B(R) + B(S)) disk I/O's; requires min(B(R), B(S)) ≤ M².
SORT-BASED VS HASH-BASED
• For binary operations, hash-based algorithms limit the memory requirement only by the min of the argument sizes, not the sum.
• Sort-based algorithms can produce output in sorted order, which can be helpful.
• Hash-based algorithms depend on the buckets being of nearly equal size.
• Sort-based algorithms can experience reduced rotational latency or seek time.
SECTION 15.6
CLUSTERING AND NONCLUSTERING INDEXES
• Clustered relation: the tuples are packed into roughly as few blocks as can possibly hold those tuples.
• Clustering index: an index such that all the tuples with a fixed value for its search key appear on roughly as few blocks as can hold them.
CLUSTERING AND NONCLUSTERING INDEXES
• A relation that isn't clustered cannot have a clustering index.
• A clustered relation can have nonclustering indexes.
INDEX-BASED SELECTION
• For a selection σC(R), suppose C is of the form a = v, where a is an attribute.
• With a clustering index on R.a, the number of disk I/O's will be about B(R)/V(R, a).
INDEX-BASED SELECTION
• The actual number may be higher because:
  1. the index is not kept entirely in main memory;
  2. the matching tuples may spread over more blocks than the minimum;
  3. the blocks may not be packed as tightly as possible.
EXAMPLE
B(R) = 1000, T(R) = 20,000. Number of I/O's required:
• 1. Clustered, no index: 1000
• 2. Not clustered, no index: 20,000
• 3. If V(R, a) = 100 and the index is clustering: 10
• 4. If V(R, a) = 10 and the index is nonclustering: 2,000
JOINING BY USING AN INDEX
• Natural join R(X, Y) ⋈ S(Y, Z), with an index on Y for S.
• Number of I/O's to get R:
  • clustered: B(R)
  • not clustered: T(R)
• Number of I/O's to get the matching tuples of S, over all tuples t of R:
  • clustering index: T(R)B(S)/V(S, Y)
  • not clustering: T(R)T(S)/V(S, Y)
EXAMPLE
• R(X, Y): 1000 blocks; S(Y, Z): 500 blocks.
• Assume 10 tuples in each block, so T(R) = 10,000 and T(S) = 5,000.
• V(S, Y) = 100.
• If R is clustered, and there is a clustering index on Y for S:
  • the number of I/O's for R is 1000;
  • the number of I/O's for S is 10,000 * 500 / 100 = 50,000.
JOINS USING A SORTED INDEX
• Natural join R(X, Y) ⋈ S(Y, Z) with a sorted index on Y for either R or S.
• Extreme case: the zig-zag join.
• Example: relations R(X, Y) and S(Y, Z), with an index on Y for both relations:
  • search keys (Y-values) for R: 1, 3, 4, 4, 5, 6
  • search keys (Y-values) for S: 2, 2, 4, 6, 7, 8
SECTION 15.7
WHAT DOES A BUFFER MANAGER DO?
• Assume M main-memory buffers are needed by the operators on relations to store the data they work on.
• In practice:
  1) the buffers are rarely allocated in advance;
  2) the value of M may vary depending on system conditions.
• Therefore, a buffer manager is used to allow processes to get the memory they need, while minimizing delay and unsatisfiable requests.

[Figure 1: The role of the buffer manager: it responds to read/write requests for main-memory access to disk blocks, managing the pool of buffers.]
15.7.1 Buffer Management Architecture
Two broad architectures for a buffer manager:
1) The buffer manager controls main memory directly.
  • Typical of relational DBMS's.
2) The buffer manager allocates buffers in virtual memory, allowing the OS to decide how to use the buffers.
  • Typical of "main-memory" DBMS's and "object-oriented" DBMS's.
Buffer Pool

[Figure: page requests from higher levels are served from the buffer pool in main memory; disk pages are brought from the database on disk into free frames, with the choice of frame dictated by the replacement policy.]

• Data must be in RAM for the DBMS to operate on it!
• The buffer manager hides the fact that not all data is in RAM.
Buffer-replacement strategy -- LRU
Least-Recently Used (LRU):
• Throw out the block that has not been read or written for the longest time.
• Requires more maintenance, but is effective:
  • update the time table on every access;
  • least-recently used blocks are less likely to be accessed soon than other blocks.
Buffer-replacement strategy -- FIFO
First-In-First-Out (FIFO):
• The buffer that has been occupied the longest by the same block is emptied and used for the new block.
• Requires less maintenance (keep only the loading time), but it can make more mistakes:
  • being the oldest block doesn't mean it is less likely to be accessed.
  • Example: the root block of a B-tree index.
Buffer-replacement strategy -- "Clock"
The "Clock" Algorithm ("Second Chance"):
• Think of the buffers as arranged in a circle, as shown in Figure 3 (here, 8 buffers).
• Each buffer carries a flag, 0 or 1:
  • buffers with a 0 flag are OK to have their contents sent back to disk, i.e., OK to be replaced;
  • buffers with a 1 flag are not OK to be replaced.
Buffer-replacement strategy -- "Clock"

[Figure 3: the clock algorithm. The 8 buffers are arranged in a circle, each with a 0 or 1 flag. A "hand" sweeps from its start point looking for a 0 flag; the first buffer found with a 0 flag is replaced. As the hand passes a buffer with a 1 flag, that flag is set to 0; by the next time the hand reaches that buffer, if its content has not been accessed again (i.e., the flag is still 0), the buffer will be replaced. That's the "second chance".]
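A minimal Python sketch of the clock policy (buffer contents are elided; the flag handling follows Figure 3):

class ClockBuffers:
    """Second-chance replacement over n buffer frames."""
    def __init__(self, n):
        self.flags = [0] * n    # 1 = recently used, not OK to replace
        self.hand = 0
    def access(self, frame):
        self.flags[frame] = 1   # give the frame a second chance
    def pick_victim(self):
        # Sweep until a 0 flag is found, clearing 1 flags along the way.
        while self.flags[self.hand] == 1:
            self.flags[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.flags)
        victim = self.hand
        self.hand = (self.hand + 1) % len(self.flags)
        return victim

bufs = ClockBuffers(8)
for f in (0, 1, 3):
    bufs.access(f)
print(bufs.pick_victim())  # frame 2: first 0 flag after clearing frames 0 and 1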
System control helps the buffer-replacement strategy
• The query processor or other components of a DBMS can give advice to the buffer manager, in order to avoid some of the mistakes that would occur with a strict policy such as LRU, FIFO, or Clock.
• For example:
  • A "pinned" block means it can't be moved to disk without first modifying certain other blocks that point to it.
  • In FIFO, use "pinning" to force the root of a B-tree to remain in memory at all times.
15.7.3 The Relationship Between Physical Operator Selection and Buffer Management
Questions:
• Can an algorithm adapt to changes in M, the number of main-memory buffers available, when fewer buffers than expected are available and some blocks have to stay on disk instead of in memory?
• How does the buffer-replacement strategy impact performance (i.e., the number of additional I/O's)?
Example

  FOR each chunk of M-1 blocks of S DO BEGIN
      read these blocks into main-memory buffers;
      organize their tuples into a search structure whose
          search key is the common attributes of R and S;
      FOR each block b of R DO BEGIN
          read b into main memory;
          FOR each tuple t of b DO BEGIN
              find the tuples of S in main memory that join with t;
              output the join of t with each of these tuples;
          END;
      END;
  END;

Figure 15.8: The nested-loop join algorithm
Example
• The number of outer-loop iterations depends on the average number of buffers available at each iteration.
• The outer loop uses M-1 buffers, and 1 is reserved for a block of R, the relation of the inner loop.
• If we pin the M-1 blocks we use for S on one iteration of the outer loop, we shall not lose their buffers during the round.
• Also, more buffers may become available, and then we could keep more than one block of R in memory.
• Will these extra buffers improve the running time?
Example
CASE 1: NO
• Buffer-replacement strategy: LRU; buffers for R: k.
• We read each block of R in order into the buffers.
• By the end of one iteration of the outer loop, the last k blocks of R are in the buffers.
• However, the next iteration starts from the beginning of R again.
• Therefore, the k buffers holding R will need to be replaced.

CASE 2: YES
• Buffer-replacement strategy: LRU; buffers for R: k.
• We read the blocks of R in an order that alternates: first→last and then last→first.
• In this way, we save k disk I/O's on each iteration of the outer loop except the first iteration.
Other Algorithms and M Buffers
Other algorithms are also impacted by M and the buffer-replacement strategy.
Sort-based algorithms:
• If M shrinks, we can change the size of a sublist.
• Unexpected result: there may be too many sublists to allocate each sublist a buffer.
Hash-based algorithms:
• If M shrinks, we can reduce the number of buckets, as long as the buckets still fit in M buffers.
SECTION 15.8
Multi-pass Sort-based
Algorithms
Suppose we have M main-memory buffers available to sort a relation R, which we assume is stored clustered.
BASIS: If R fits in M blocks (i.e., B(R) <= M):
1. Read R into main memory.
2. Sort it using any main-memory sorting algorithm.
3. Write the sorted relation to disk.
INDUCTION: If R does not fit into main memory:
1. Partition the blocks holding R into M groups, which we shall call R1, R2, ..., RM.
2. Recursively sort Ri for each i = 1, 2, ..., M.
3. Merge the M sorted sublists.
• For a δ (duplicate elimination), output one copy of each distinct tuple, and skip over the copies.
• For a γ (grouping), sort on the grouping attributes only, and combine the tuples with a given value of these grouping attributes.
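A sketch of the recursive multiway sort in Python (treating each tuple as one "block" for simplicity; the function names are assumptions of this sketch):

    import heapq

    def multiway_sort(tuples, m, key=lambda t: t):
        """Recursive multiway merge sort: basis if the input fits in M,
        otherwise partition into M groups, sort each, and merge."""
        if len(tuples) <= m:                      # BASIS: fits in M
            return sorted(tuples, key=key)
        size = -(-len(tuples) // m)               # ceiling division: group size
        groups = [tuples[i:i + size] for i in range(0, len(tuples), size)]
        sorted_groups = [multiway_sort(g, m, key) for g in groups]
        # INDUCTION step 3: merge the (at most M) sorted sublists.
        return list(heapq.merge(*sorted_groups, key=key))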
Conclusion
The two-pass algorithms based on sorting or hashing have natural recursive analogs that take three or more passes and work for larger amounts of data.
CHAPTER 16
SECTION 16.1
Query compilation is divided
into three steps
1. Parsing: parse the SQL query into a parse tree.
2. Logical query plan: transform the parse tree into an expression tree of relational algebra.
3. Physical query plan: transform the logical query plan into a physical query plan, which indicates:
   • the operations performed
   • the order of operations
   • the algorithm used for each operation
   • the way in which stored data is obtained and passed from one operation to another
Forming a query into a logical query plan:
Query → Parser → Preprocessor → Logical query plan generator → Query rewrite → Preferred logical query plan
Syntax Analysis and Parse Tree
The parser takes the SQL query and converts it to a parse tree. Nodes of the parse tree:
1. Atoms: lexical elements such as keywords, constants, parentheses, operators, and other schema elements.
2. Syntactic categories: subparts that play a similar role in a query, such as <Query> and <Condition>.
QUERY AND PARSE TREE
StarsIn(title,year,starName)
MovieStar(name,address,gender,birthdate)
Query:
Give titles of movies that have at least one star born in
1960
SELECT title FROM StarsIn WHERE starName IN
(
SELECT name FROM MovieStar WHERE
birthdate LIKE '%1960%'
);
Parse Tree
[Figure: the parse tree for this query. The root <Query> expands to an <SFW> node with SELECT <SelList> (the attribute title), FROM <FromList> (the relation StarsIn), and WHERE <Condition>; the condition relates the attribute starName, via IN, to a nested <Query> selecting name from MovieStar where birthdate LIKE '%1960'.]
THE PREPROCESSOR
Functions of the preprocessor:
• If a relation used in the query is a virtual view, then each use of that relation in the from-list must be replaced by a parse tree that describes the view.
• It is also responsible for semantic checking:
1. Check relation uses: every relation mentioned in a FROM-clause must be a relation or a view in the current schema.
2. Check and resolve attribute uses: every attribute mentioned in the SELECT or WHERE clause must be an attribute of some relation in the current scope.
3. Check types: all attributes must be of a type appropriate to their uses.
PREPROCESSING QUERIES INVOLVING VIEWS
When an operand in a query is a virtual view, the preprocessor must replace the operand by a piece of parse tree that represents how the view is constructed from base tables.
Base table: Movies(title, year, length, genre, studioName, producerC#)
View definition:
CREATE VIEW ParamountMovies AS
SELECT title, year FROM Movies
WHERE studioName = 'Paramount';
Example query based on the view:
SELECT title FROM ParamountMovies WHERE year = 1979;
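After substituting the view definition for ParamountMovies and simplifying, the query above is equivalent to:
SELECT title FROM Movies WHERE studioName = 'Paramount' AND year = 1979;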
SECTION 16.2
OPTIMIZING THE LOGICAL QUERY PLAN
The translation rules converting a parse tree to a logical query tree do not always produce the best logical query tree.
It is often possible to optimize the logical query tree by applying relational algebra laws to convert the original tree into a more efficient one.
Optimizing a logical query tree using relational algebra laws is called heuristic optimization.
RELATIONAL ALGEBRA LAWS
These laws often involve the properties of:
• commutativity -- the operator can be applied to its operands independent of order. E.g., A + B = B + A; the "+" operator is commutative.
• associativity -- the operator is independent of operand grouping. E.g., A + (B + C) = (A + B) + C; the "+" operator is associative.
ASSOCIATIVE AND COMMUTATIVE OPERATORS
The relational-algebra operators cross-product (×), join (⋈), union (∪), and intersection (∩) are all associative and commutative.

Commutative            Associative
R × S = S × R          (R × S) × T = R × (S × T)
R ⋈ S = S ⋈ R          (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
R ∪ S = S ∪ R          (R ∪ S) ∪ T = R ∪ (S ∪ T)
R ∩ S = S ∩ R          (R ∩ S) ∩ T = R ∩ (S ∩ T)
LAWS INVOLVING SELECTION
Complex selections involving AND or OR can be broken into two or more selections (splitting laws):
σC1 AND C2 (R) = σC1(σC2(R))
σC1 OR C2 (R) = (σC1(R)) ∪S (σC2(R)), where ∪S denotes set union (this form of the law requires R to be a set).
Example
R = {a,a,b,b,b,c}
p1 is satisfied by a,b; p2 is satisfied by b,c
σp1 OR p2 (R) = {a,a,b,b,b,c}
σp1(R) = {a,a,b,b,b}
σp2(R) = {b,b,b,c}
σp1(R) ∪S σp2(R) = {a,a,b,b,b,c}
LAWS INVOLVING SELECTION (CONTD..)
Selection can be pushed through both arguments of a union:
σC(R ∪ S) = σC(R) ∪ σC(S)
Selection must be pushed to the first argument, and optionally the second, for difference:
σC(R - S) = σC(R) - S
σC(R - S) = σC(R) - σC(S)
LAWS INVOLVING SELECTION (CONTD..)
All other operators require selection to be pushed to only one of the arguments. For joins, we may not be able to push the selection to both arguments, since an argument may lack the attributes the selection requires. Assuming C involves only attributes of R:
σC(R × S) = σC(R) × S
σC(R ∩ S) = σC(R) ∩ S
σC(R ⋈ S) = σC(R) ⋈ S
σC(R ⋈D S) = σC(R) ⋈D S
LAWS INVOLVING SELECTION (CONTD..)
Example: consider relations R(a,b) and S(b,c) and the expression
σ(a=1 OR a=3) AND b<c (R ⋈ S)
We can transform it step by step:
σa=1 OR a=3(σb<c(R ⋈ S))
σa=1 OR a=3(R ⋈ σb<c(S))
σa=1 OR a=3(R) ⋈ σb<c(S)
LAWS INVOLVING PROJECTION
Like selections, it is also possible to push projections down the logical query tree. However, the performance gain is smaller than for selections, because projections reduce only the number of attributes, not the number of tuples.
LAWS INVOLVING PROJECTION
Laws for pushing projections through joins:
πL(R × S) = πL(πM(R) × πN(S))
πL(R ⋈ S) = πL(πM(R) ⋈ πN(S))
πL(R ⋈D S) = πL(πM(R) ⋈D πN(S))
where M and N are the attributes of R and S, respectively, that are either join attributes or appear in L.
LAWS INVOLVING PROJECTION
Laws for pushing projections through set operations:
• Projection can be performed entirely before a bag union:
πL(R ∪B S) = πL(R) ∪B πL(S)
• Projection can be pushed below selection as long as we also keep all attributes needed for the selection (M = L ∪ attr(C)):
πL(σC(R)) = πL(σC(πM(R)))
LAWS INVOLVING JOIN
We have previously seen these important rules about joins:
1. Joins are commutative and associative.
2. Selection can be distributed into joins.
3. Projection can be distributed into joins.
LAWS INVOLVING DUPLICATE ELIMINATION
The duplicate-elimination operator (δ) can be pushed through many operators, but not all. For example, suppose R has two copies of tuple t and S has one copy of t:
δ(R ∪ S) = one copy of t
δ(R) ∪ δ(S) = two copies of t
LAWS INVOLVING DUPLICATE ELIMINATION
Laws for pushing the duplicate-elimination operator (δ):
δ(R × S) = δ(R) × δ(S)
δ(R ⋈ S) = δ(R) ⋈ δ(S)
δ(R ⋈D S) = δ(R) ⋈D δ(S)
δ(σC(R)) = σC(δ(R))
LAWS INVOLVING DUPLICATE ELIMINATION
The duplicate-elimination operator (δ) can also be pushed through bag intersection, but not through union, difference, or projection in general:
δ(R ∩ S) = δ(R) ∩ δ(S)
LAWS INVOLVING GROUPING
The laws for the grouping operator (γ) depend on the aggregate operators used, because some aggregate functions (MIN and MAX) are unaffected by duplicates, while others (SUM, COUNT, and AVG) are affected.
There is one general rule, however: grouping subsumes duplicate elimination, since γ produces at most one tuple per group:
δ(γL(R)) = γL(R)
SECTION 16.3
TWO STEPS TO TURN PARSE TREE INTO PREFERRED LOGICAL QUERY PLAN
1. Replace the nodes and structures of the parse tree, in appropriate groups, by an operator or operators of relational algebra.
2. Take the relational-algebra expression and turn it into an expression that we expect can be converted to the most efficient physical query plan.
REVIEW
Query → Parser → Preprocessor (Section 16.1) → Logical query plan generator → Query rewriter (Section 16.3) → Preferred logical query plan
The relational-algebra expression consists of the following, from bottom to top:
• the product of all the relations mentioned in the <FromList>, which is the argument of:
• a selection σC, where C is the <Condition> expression in the construct being replaced, which in turn is the argument of:
• a projection πL, where L is the list of attributes in the <SelList>
A QUERY : EXAMPLE
SELECT movieTitle
FROM StarsIn, MovieStar
WHERE starName = name AND
      birthdate LIKE '%1960';
TRANSLATION TO AN ALGEBRAIC EXPRESSION
TREE
REMOVING SUBQUERIES FROM CONDITIONS
For parse trees with a <Condition> that has a subquery, we use an intermediate operator: the two-argument selection. It is intermediate between the syntactic categories of the parse tree and the relational-algebra operators that apply to relations.
USING A TWO-ARGUMENT σ
[Figure: πmovieTitle over a two-argument σ whose left child is the relation StarsIn and whose right child is the <Condition> "<Tuple> (<Attribute> starName) IN πname(σbirthdate LIKE '%1960'(MovieStar))".]
TWO-ARGUMENT SELECTION WITH CONDITION INVOLVING IN
Now suppose we have two arguments -- some relation R, and a <Condition> of the form t IN S, where:
• 't' is a tuple composed of some attributes of R
• 'S' is an uncorrelated subquery
Steps to be followed:
1. Replace the <Condition> by the tree that is the expression for S (δ is used to remove duplicates).
2. Replace the two-argument selection by a one-argument selection σC, where C equates t with the corresponding attributes of S.
3. Give σC an argument that is the product of R and S.
TWO-ARGUMENT SELECTION WITH CONDITION INVOLVING IN
[Figure: the two-argument selection σ with children R and <Condition> (t IN S) becomes a one-argument selection σC over the product R × δ(S).]
THE EFFECT
IMPROVING THE LOGICAL QUERY PLAN
Algebraic laws to improve logical query plans:
• Selections can be pushed down the expression tree as far as they can go.
• Similarly, projections can be pushed down the tree, or new projections can be added.
• Duplicate eliminations can sometimes be removed, or moved to a more convenient position in the tree.
• Certain selections can be combined with a product below them to turn the pair of operations into an equijoin.
GROUPING ASSOCIATIVE/COMMUTATIVE OPERATORS
• An operator that is associative and commutative may be thought of as having any number of operands.
• We need to reorder these operands so that the multiway join is executed as a sequence of binary joins; it can be more time-consuming to execute them in the order suggested by the parse tree.
• For each portion of the subtree that consists of nodes with the same associative and commutative operator (natural join, union, or intersection), we group those nodes into a single node with many children.
THE EFFECT OF QUERY REWRITING
[Figure: πmovieTitle over the join StarsIn ⋈(starName = name) σbirthdate LIKE '%1960'(MovieStar).]
FINAL STEP IN PRODUCING LOGICAL QUERY PLAN
[Figure: a cascade of binary union nodes over R, S, T, V, W is replaced by a single union node with the five children R, S, T, V, W.]
AN EXAMPLE TO SUMMARIZE
"Find movies where the average age of the stars was at most 40 when the movie was made."
SELECT DISTINCT m1.movieTitle, m1.movieYear
FROM StarsIn m1
WHERE m1.movieYear - 40 <= (
    SELECT AVG(birthdate)
    FROM StarsIn m2, MovieStar s
    WHERE m2.starName = s.name AND
          m1.movieTitle = m2.movieTitle AND
          m1.movieYear = m2.movieYear
);
SELECTIONS COMBINED WITH A PRODUCT TO TURN THE PAIR OF OPERATIONS INTO AN EQUIJOIN…
CONDITION PUSHED UP THE EXPRESSION TREE…
SELECTIONS COMBINED…
SECTION 16.4
ESTIMATING THE COST OF OPERATIONS
• After producing the logical query plan, we turn it into a physical plan.
• We consider all the possible physical plans and estimate their costs; this evaluation is known as cost-based enumeration.
• The plan with the least estimated cost is the one selected to be passed to the query-execution engine.
SELECTION FOR EACH PHYSICAL PLAN
We select, for each physical plan:
• An order and grouping for associative-and-commutative operations like joins, unions, and intersections.
• An algorithm for each operator in the logical plan -- for instance, deciding whether a nested-loop join or a hash-join should be used.
• Additional operators -- scanning, sorting, etc. -- that are needed for the physical plan but were not present explicitly in the logical plan.
• The way in which the arguments are passed from one operator to the next.
ESTIMATING SIZES OF INTERMEDIATE RELATIONS
Ideally, size estimates should:
1. Give accurate estimates.
2. Be easy to compute.
3. Be logically consistent; that is, the size estimate for an intermediate relation should not depend on how that relation is computed.
ESTIMATING THE SIZE OF A PROJECTION
• The projection here is a bag projection; a classical, duplicate-eliminating projection is treated as a bag projection followed by δ.
• The size of the result can be computed exactly.
• There may be a reduction in size (due to eliminated components) or an increase in size (due to new components created as combinations of attributes).
ESTIMATING THE SIZE OF A SELECTION
• A selection may reduce the number of tuples, but the sizes of the tuples remain the same.
• For a selection of the form S = σA=c(R), where A is an attribute of R and c is a constant, the recommended estimate is:
T(S) = T(R) / V(R,A)
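As an illustration (with made-up numbers): if T(R) = 10,000 tuples and A takes V(R,A) = 50 distinct values, the estimate is T(S) = 10,000 / 50 = 200 tuples.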
SECTION 16.6
INTRODUCTION
This section focuses on a critical problem in cost-based optimization: selecting an order for the natural join of three or more relations. Compared to other binary operations, joins take more time and therefore need effective optimization techniques.
SIGNIFICANCE OF LEFT AND RIGHT JOIN ARGUMENTS
The argument relations of a join determine the cost of the join.
The left argument of the join is:
• called the build relation
• assumed to be smaller
• stored in main memory
The right argument of the join is:
• called the probe relation
• read a block at a time
• its tuples are matched with those of the build relation
The join algorithms that distinguish between the arguments are:
• one-pass join
• nested-loop join
• index join
JOIN TREES
• The order of arguments is important when joining two relations; the left argument, since it is stored in main memory, should be smaller.
• With two relations there are only two choices of join tree.
• With more than two relations, there are n! ways to order the arguments -- and even more join trees once tree shapes are considered -- where n is the number of relations.
JOIN TREES
The total number of tree shapes T(n) for n relations is given by the recurrence T(n) = Σ over i from 1 to n-1 of T(i)·T(n-i), so:
• T(1) = 1
• T(2) = 1
• T(3) = 2
• T(4) = 5 ... etc.
LEFT-DEEP JOIN TREES
Consider 4 relations; the different ways to join them are as follows.
JOIN ORDER
Join order selection for relations A1, A2, ..., An is often restricted to left-deep join trees, where each join's right argument is a single relation Ai.
Dynamic programming:
• The best plan is computed for each subset of the relations.
• Best plan(A1, ..., An) = the minimum-cost plan among:
    Best plan(A2, ..., An) ⋈ A1
    Best plan(A1, A3, ..., An) ⋈ A2
    ...
    Best plan(A1, ..., An-1) ⋈ An
DYNAMIC PROGRAMMING TO SELECT A JOIN ORDER AND GROUPING
Three choices for picking an order for the join of many relations:
• consider all of the orders
• consider a subset of them
• use a heuristic to pick one
Dynamic programming is used either to consider all orders or a subset:
• construct a table of costs based on relation sizes
• remember only the minimum-cost entry for each subset, which is all that is required to proceed
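A sketch of dynamic programming over subsets of relations in Python. The cost model (cost = sum of intermediate-result sizes) and the function arguments join_size and scan_cost are assumptions of this sketch, not a fixed API:

    from itertools import combinations

    def best_join_order(relations, join_size, scan_cost):
        """For each subset of relations, remember only the cheapest plan."""
        best = {}   # frozenset of relations -> (cost, plan tree)
        for r in relations:
            best[frozenset([r])] = (scan_cost(r), r)
        for k in range(2, len(relations) + 1):
            for subset in combinations(relations, k):
                s = frozenset(subset)
                candidates = []
                # Split s into two parts and reuse the best plans of each part.
                for i in range(1, k):
                    for left in combinations(subset, i):
                        l = frozenset(left)
                        rgt = s - l
                        cost = best[l][0] + best[rgt][0] + join_size(s)
                        candidates.append((cost, (best[l][1], best[rgt][1])))
                best[s] = min(candidates)    # keep only the minimum entry
        return best[frozenset(relations)]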
A GREEDY ALGORITHM FOR SELECTING A JOIN ORDER
• It is expensive to use an exhaustive method like dynamic programming.
• A better approach is to use a join-order heuristic for query optimization; a greedy algorithm is an example.
• Make one decision at a time about the order of joins, and never backtrack on a decision once made.
SECTION 16.7
BEFORE THE COMPLETE PHYSICAL QUERY PLAN
A query previously has been:
• parsed and preprocessed (16.1)
• converted to a logical query plan (16.3)
• its operation costs estimated (16.4)
• costed by cost-based plan selection (16.5)
• its join costs weighed by choosing an order for joins (16.6)
16.7 COMPLETING THE PHYSICAL QUERY PLAN
Three topics related to turning the logical plan into a complete physical plan:
1. choosing physical implementations, such as selection and join methods
2. decisions regarding intermediate results (materialized or pipelined)
3. notation for physical-query-plan operators
I. CHOOSING A SELECTION METHOD (A)
Algorithm for each selection operator:
1. Can we use an existing index on an attribute? If yes, index-scan; otherwise, table-scan.
2. After retrieving all tuples satisfying the condition in (1), filter them with the remaining selection conditions.
CHOOSING A SELECTION METHOD (A) (CONT.)
Recall: cost of a query = # of disk I/O's.
How costs for various plans are estimated for a σC(R) operation:
1. Cost of the table-scan algorithm:
   a) B(R) if R is clustered
   b) T(R) if R is not clustered
2. Cost of a plan picking an equality term (e.g., a = 10) with an index-scan:
   a) B(R) / V(R, a) with a clustering index
   b) T(R) / V(R, a) with a nonclustering index
3. Cost of a plan picking an inequality term (e.g., b < 20) with an index-scan:
   a) B(R) / 3 with a clustering index
   b) T(R) / 3 with a nonclustering index
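A sketch of these cost rules in Python (the dictionary layout and field names are assumptions of the sketch, and integer division is used as a rough estimate):

    def selection_cost(term, stats):
        """Estimate disk I/O's for one selection term using rules 1-3 above."""
        B, T = stats["B"], stats["T"]
        if term["kind"] == "table-scan":
            return B if stats["clustered"] else T      # rule 1
        a = term["attr"]
        if term["kind"] == "equality":                 # rule 2, e.g. a = 10
            denom = stats["V"][a]
            return B // denom if stats["clustering_index"][a] else T // denom
        if term["kind"] == "inequality":               # rule 3, e.g. b < 20
            return B // 3 if stats["clustering_index"][a] else T // 3

Applied to the example that follows, this reproduces the costs 200, 50, 10, and about 67.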
EXAMPLE
Selection: σx=1 AND y=2 AND z<5 (R)
Parameters of R(x, y, z): T(R) = 5000, B(R) = 200, V(R,x) = 100, V(R,y) = 500.
Relation R is clustered; x and y have nonclustering indexes; only the index on z is clustering.
EXAMPLE (CONT.)
Selection options:
1. Table-scan, then filter on x, y, z. Cost is B(R) = 200, since R is clustered.
2. Use the index on x = 1, then filter on y, z. Cost is 50, since T(R) / V(R,x) = 5000/100 = 50 tuples and the index is not clustering.
3. Use the index on y = 2, then filter on x, z. Cost is 10, since T(R) / V(R,y) = 5000/500 = 10 tuples using a nonclustering index.
4. Index-scan on the clustering index with z < 5, then filter on x, y. Cost is about B(R)/3 = 67.
EXAMPLE (CONT.)
Costs: option 1 = 200; option 2 = 50; option 3 = 10; option 4 = 67.
The lowest cost is option 3. Therefore, the preferred physical plan:
1. retrieves all tuples with y = 2,
2. then filters on the two remaining conditions (x and z).
II. CHOOSING A JOIN METHOD
Determine the costs associated with each join algorithm:
1. One-pass join, or nested-loop join, when enough buffers can be devoted to the join.
2. Sort-join is preferred when the arguments are pre-sorted, or when two or more joins are on the same attribute, such as (R(a,b) ⋈ S(a,c)) ⋈ T(a,d), where sorting R and S on a produces the result of R ⋈ S sorted on a, usable directly in the next join.
3. Index-join, for a join with a high chance of using an index created on the join attribute, such as R(a,b) ⋈ S(b,c).
4. Hash join is the best choice for unsorted, non-indexed relations that need a multipass join.
III. PIPELINING VERSUS MATERIALIZATION
Materialization (the naive way):
• store the (intermediate) result of each operation on disk
Pipelining (the more efficient way):
• interleave the execution of several operations; the tuples produced by one operation are passed directly to the operation that uses them
• store the (intermediate) result of each operation in buffers, which are implemented in main memory
IV. PIPELINING UNARY OPERATIONS
• Unary operations work a tuple at a time or on a full relation.
• Selection and projection are the best candidates for pipelining.
[Figure: R flows through an input buffer into a unary operation and its output buffer, which feeds a second unary operation and its output buffer; M-1 buffers are available.]
PIPELINING UNARY OPERATIONS (CONT.)
Pipelined unary operations are implemented by iterators.
V. PIPELINING BINARY OPERATIONS
• Binary operations: ∪, ∩, -, ⋈, ×
• The results of binary operations can also be pipelined.
• Use one buffer to pass the result to its consumer, one block at a time.
• The extended example below shows the tradeoffs and opportunities.
EXAMPLE
Consider a physical query plan for the expression (R(w,x) ⋈ S(x,y)) ⋈ U(y,z).
Assumptions:
• R occupies 5,000 blocks; S and U each occupy 10,000 blocks.
• The intermediate result R ⋈ S occupies k blocks for some k.
• Both joins will be implemented as hash-joins, either one-pass or two-pass depending on k.
• There are 101 buffers available.
EXAMPLE (CONT.)
First consider the join R ⋈ S; neither relation fits in the buffers. We need a two-pass hash-join: partition R into 100 buckets (the maximum possible), each of 50 blocks. The second pass of the hash-join uses 51 buffers, leaving 50 buffers for joining the result of R ⋈ S with U.
EXAMPLE (CONT.)
Case 1: suppose k ≤ 49, so the result of R ⋈ S occupies at most 49 blocks. Steps:
1. Pipeline R ⋈ S into 49 buffers.
2. Organize those tuples for lookup as a hash table.
3. Use the one remaining buffer to read each block of U in turn.
4. Execute the second join as a one-pass join.
EXAMPLE (CONT.)
 The
total number of
I/O’s is 55,000
45,000 for two-pass hash
join of R and S
 10,000 to read U for onepass hash join of (R S)
U.

EXAMPLE (CONT.)
Case 2: suppose k > 49 but k < 5,000. We can still pipeline, but we need another strategy: the intermediate result joins with U in a 50-bucket, two-pass hash-join. Steps:
1. Before starting on R ⋈ S, hash U into 50 buckets of 200 blocks each.
2. Perform the two-pass hash-join of R and S using 51 buffers as in case 1, placing the results in the 50 remaining buffers to form 50 buckets for the join of R ⋈ S with U.
3. Finally, join R ⋈ S with U bucket by bucket.
EXAMPLE (CONT.)
The number of disk I/O's is:
• 20,000 to read U and write its tuples into buckets
• 45,000 for the two-pass hash-join of R and S
• k to write out the buckets of R ⋈ S
• k + 10,000 to read the buckets of R ⋈ S and U in the final join
The total cost is 75,000 + 2k.
EXAMPLE (CONT.)
Compare the I/O counts between case 1 and case 2:
• k ≤ 49 (case 1): disk I/O's = 55,000
• 50 ≤ k < 5,000 (case 2):
  k = 50: I/O's = 75,000 + (2*50) = 75,100
  k = 51: I/O's = 75,000 + (2*51) = 75,102
  k = 52: I/O's = 75,000 + (2*52) = 75,104
Notice: the I/O count jumps discontinuously as k grows from 49 to 50.
EXAMPLE (CONT.)
Case 3: k > 5,000. We cannot perform the second two-pass join in the 50 available buffers if the result of R ⋈ S is pipelined. Steps:
1. Compute R ⋈ S using a two-pass join and store the result on disk.
2. Join the result of (1) with U, using another two-pass join.
EXAMPLE (CONT.)
The number of disk I/O's is:
• 45,000 for the two-pass hash-join of R and S
• k to store R ⋈ S on disk
• 30,000 + 3k for the two-pass join of R ⋈ S with U
The total cost is 75,000 + 4k.
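The three cases can be summarized as a function of k; a small Python sketch (constants come from this example's assumptions):

    def total_disk_io(k):
        """Total disk I/O's for the example plan, as a function of k = B(R join S)."""
        if k <= 49:
            return 55_000            # case 1: pipelined, one-pass second join
        elif k < 5_000:
            return 75_000 + 2 * k    # case 2: pipelined, 50-bucket two-pass join
        else:
            return 75_000 + 4 * k    # case 3: materialize R join S, two-pass join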
SECTION 16.7-16.8
VI. NOTATION FOR PHYSICAL QUERY PLANS
Several types of operators:
1. operators for leaves
2. (physical) operators for selection
3. (physical) sort operators
4. other relational-algebra operations
In practice, each DBMS uses its own internal notation for physical query plans.
NOTATION FOR PHYSICAL QUERY PLANS (CONT.)
1. Operators for leaves: a leaf operand R is replaced in the logical-query-plan tree by:
• TableScan(R): read all blocks of R
• SortScan(R, L): read R in the order given by attribute list L
• IndexScan(R, C): scan the index on attribute A using a condition C of the form Aθc
• IndexScan(R, A): scan the index on attribute R.A; this behaves like TableScan but is more efficient if R is not clustered
NOTATION FOR PHYSICAL QUERY PLANS (CONT.)
2. (Physical) operators for selection: the logical operator σC(R) is often combined with access methods.
• If σC(R) is replaced by Filter(C), and there is no index on R or on an attribute of condition C, use TableScan(R) or SortScan(R, L) to access R.
• If condition C can be written as Aθc AND D for some condition D, and there is an index on R.A, then we may:
  - use the operator IndexScan(R, Aθc) to access R, and
  - use Filter(D) in place of the selection σC(R)
NOTATION FOR PHYSICAL QUERY PLANS (CONT.)
3. (Physical) sort operators:
• Sorting can occur at any point in the physical plan, using the notation SortScan(R, L).
• It is common to use an explicit operator Sort(L) to sort a relation that is not stored.
• Sort can be applied at the top of the physical-query-plan tree if the result needs to be sorted for an ORDER BY clause (τ).
NOTATION FOR PHYSICAL QUERY PLANS (CONT.)
4. Other relational-algebra operations: descriptive text and annotations elaborate:
• the operation performed, e.g., join or grouping
• necessary parameters, e.g., the theta-join condition or the list of elements in a grouping
• a general strategy for the algorithm, e.g., sort-based, hash-based, or index-based
• a decision about the number of passes to be used, e.g., one-pass, two-pass, or multipass
• an anticipated number of buffers the operation will require
NOTATION FOR PHYSICAL QUERY PLANS (CONT.)
Example of a physical query plan: the plan of Example 16.36 for the case k > 5,000:
• TableScan
• two-pass hash-join
• materialization (double line)
• Store operator
NOTATION FOR PHYSICAL QUERY PLANS (CONT.)
Another example: the plan of Example 16.36 for the case k ≤ 49:
• TableScan
• two (2) two-pass hash-joins
• pipelining
• different buffer needs
• Store operator
NOTATION FOR PHYSICAL QUERY PLANS (CONT.)
A physical query plan for Example 16.35:
• use the index on the condition y = 2 first
• filter with the remaining conditions afterwards
VII. ORDERING OF PHYSICAL OPERATIONS
• The physical query plan is represented as a tree structure, which suggests an order of operations.
• Still, the order of evaluation of interior nodes may not always be clear:
  - iterators are used in a pipelined manner
  - the overlapped execution of various nodes can make a single "ordering" meaningless
ORDERING OF PHYSICAL OPERATIONS (CONT.)
Three rules summarize the ordering of events in a physical-query-plan tree:
1. Break the tree into subtrees at each edge that represents materialization; execute one subtree at a time.
2. Order the execution of the subtrees bottom-up and left-to-right.
3. All nodes of each subtree are executed simultaneously (pipelined).
COMPILATION OF QUERIES
Compilation means turning a query into a physical query plan, which can be implemented by the query engine.
Steps of query compilation:
• parsing
• semantic checking
• selection of the preferred logical query plan
• generation of the best physical plan
THE PARSER
• The first step of SQL query processing.
• Generates a parse tree.
• Nodes in the parse tree correspond to SQL constructs.
• Similar to the parser in a compiler for a programming language.
SEMANTIC CHECKING
• Checks the semantics of an SQL query.
• Examines a parse tree.
• Checks: attributes, relation names, types.
• Resolves attribute references.
CONVERSION TO A LOGICAL QUERY PLAN
• Converts a semantically checked parse tree to an algebraic expression.
• The conversion is straightforward, but subqueries need extra care.
• The two-argument selection approach can be used.
ALGEBRAIC TRANSFORMATION
There are many different ways to transform a logical query plan using algebraic transformations. The laws used for this transformation:
• commutative and associative laws
• laws involving selection
• pushing selections
• laws involving projection
• laws about joins and products
• laws involving duplicate elimination
• laws involving grouping and aggregation
ESTIMATING SIZES OF RELATIONS
• Estimates of the true running time are taken into consideration when selecting the best logical plan.
• Two factors matter most when estimating the sizes of relations:
  - the size of relations (number of tuples)
  - the number of distinct values for each attribute of each relation
• Histograms are used by some systems.
COST-BASED OPTIMIZING
The best physical query plan is the least costly plan. Factors that decide the cost of a query plan:
• the order and grouping of operations like joins, unions, and intersections
• the algorithms used, e.g., nested-loop join vs. hash join
• scanning and sorting operations
• storing intermediate results
PLAN ENUMERATION STRATEGIES
Common approaches for searching the space of physical plans:
• Dynamic programming: tabulate the best plan for each subexpression.
• Selinger-style programming: like dynamic programming, but also retain plans whose results have a useful sort order.
• Greedy approaches: make a series of locally optimal decisions.
• Branch-and-bound: start from a heuristically found plan, and prune any plan whose estimated cost exceeds the current best.
LEFT-DEEP JOIN TREES
• Left-deep join trees are binary trees with a single spine down the left edge and with leaves as right children.
• This strategy reduces the number of plans to be considered for the best physical plan.
• Restrict the search to left-deep join trees when picking a grouping and order for the join of several relations.
PHYSICAL PLANS FOR SELECTION
• Break a selection into an index-scan of the relation, followed by a filter operation.
• The filter examines the tuples retrieved by the index-scan.
• It passes only those tuples that meet the remaining portions of the selection condition.
PIPELINING VERSUS MATERIALIZING
• The flow of data between operators can be controlled to implement "pipelining": an operator consumes the result of another operator directly through main memory, as it is produced.
• Alternatively, intermediate results can be moved out of main memory to save space for other operators; this technique is "materialization."
• Both pipelining and materialization should be considered by the physical-query-plan generator.
CHAPTER 18
SECTION 18.1-18.2
CONCURRENCY CONTROL
• Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without violating the data integrity of the database.
• Executed transactions should follow the ACID rules. The DBMS must guarantee that only serializable (unless serializability is intentionally relaxed), recoverable schedules are generated.
• It also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the database.
TRANSACTION ACID RULES
Atomicity: either the effects of all or none of a transaction's operations remain when it completes; to the outside world, the transaction appears indivisible, atomic.
Consistency: every transaction must leave the database in a consistent state.
Isolation: transactions cannot interfere with each other; providing isolation is the main goal of concurrency control.
Durability: the effects of successful transactions must persist through crashes.
SERIAL AND SERIALIZABLE SCHEDULES
• In the field of databases, a schedule is a list of actions (i.e., reading, writing, aborting, committing) from a set of transactions.
• In this example, schedule D is over the 3 transactions T1, T2, T3 and describes their actions as seen by the DBMS: T1 reads and writes object X, then T2 reads and writes object Y, and finally T3 reads and writes object Z. This is a serial schedule, because the actions of the 3 transactions are not interleaved.
• A schedule that is equivalent to a serial schedule has the serializability property.
• In schedule E, the order in which the actions are executed is not the same as in D, but in the end E gives the same result as D.
SERIAL SCHEDULE: T1 PRECEDES T2
T1: Read(A); A := A+100; Write(A); Read(B); B := B+100; Write(B);
T2: Read(A); A := A*2; Write(A); Read(B); B := B*2; Write(B);
Starting from A = 25, B = 25: T1 leaves A = 125, B = 125; T2 then leaves A = 250, B = 250.
SERIAL SCHEDULE: T2 PRECEDES T1
T2: Read(A); A := A*2; Write(A); Read(B); B := B*2; Write(B);
T1: Read(A); A := A+100; Write(A); Read(B); B := B+100; Write(B);
Starting from A = 25, B = 25: T2 leaves A = 50, B = 50; T1 then leaves A = 150, B = 150.
SERIALIZABLE, BUT NOT SERIAL, SCHEDULE
r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B)
T1: Read(A); A := A+100; Write(A); then T2: Read(A); A := A*2; Write(A);
T1: Read(B); B := B+100; Write(B); then T2: Read(B); B := B*2; Write(B);
Starting from A = 25, B = 25, the final state is A = 250, B = 250 -- the same as the serial schedule in which T1 precedes T2 -- so this schedule is serializable.
NONSERIALIZABLE SCHEDULE
T1: Read(A); A := A+100; Write(A); then T2: Read(A); A := A*2; Write(A);
T2: Read(B); B := B*2; Write(B); then T1: Read(B); B := B+100; Write(B);
Starting from A = 25, B = 25, the final state is A = 250 but B = 150, which matches neither serial order, so the schedule is not serializable.
SCHEDULE THAT IS SERIALIZABLE ONLY BECAUSE OF THE DETAILED BEHAVIOR OF THE TRANSACTIONS
Here T2 is replaced by T2', which multiplies by 1 instead of 2:
T1: Read(A); A := A+100; Write(A); then T2': Read(A); A := A*1; Write(A);
T2': Read(B); B := B*1; Write(B); then T1: Read(B); B := B+100; Write(B);
Starting from A = 25, B = 25, the final state is A = 125, B = 125. Regardless of the (consistent) initial state, the final state is consistent -- but only because of the detailed behavior of T2'.
NON-CONFLICTING ACTIONS
Two actions are non-conflicting if whenever they
occur consecutively in a schedule, swapping them
does not affect the final state produced by the
schedule. Otherwise, they are conflicting.
CONFLICTING ACTIONS: GENERAL RULES
• Two actions of the same transaction conflict (their relative order is fixed):
  r1(A) w1(B)
• Two actions over the same database element conflict if at least one of them is a write:
  r1(A) w2(A)
  w1(A) w2(A)
CONFLICTING ACTIONS
Two or more actions are said to be in conflict if:
• the actions belong to different transactions,
• at least one of the actions is a write operation, and
• the actions access the same object (read or write).
The following set of actions is conflicting:
• T1:R(X), T2:W(X), T3:W(X)
While the following sets of actions are not:
• T1:R(X), T2:R(X), T3:R(X)
• T1:R(X), T2:W(Y), T3:R(X)
CONFLICT SERIALIZABLE
We may take any schedule and make as many nonconflicting swaps as we wish, with the goal of turning the schedule into a serial schedule. If we can do so, then the original schedule is serializable, because its effect on the database state remains the same as we perform each of the nonconflicting swaps.
CONFLICT SERIALIZABLE
• A schedule is said to be conflict-serializable when it is conflict-equivalent to one or more serial schedules.
• An equivalent definition: a schedule is conflict-serializable if and only if its precedence (serializability) graph is acyclic.
• The example schedule below is conflict-equivalent to the serial schedule <T1,T2>, but not to <T2,T1>.
CONFLICT EQUIVALENT / CONFLICT-SERIALIZABLE
• Let Ai and Aj be consecutive non-conflicting actions that belong to different transactions. We can swap Ai and Aj without changing the result.
• Two schedules are conflict-equivalent if one can be turned into the other by a sequence of non-conflicting swaps of adjacent actions.
• We call a schedule conflict-serializable if it is conflict-equivalent to a serial schedule.
CONFLICT-SERIALIZABLE
The interleaved schedule
r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B)
can be converted, by a sequence of swaps of adjacent non-conflicting actions (e.g., moving r1(B) and w1(B) ahead of T2's actions on A), into the serial schedule
r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B)
so it is conflict-serializable.
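A small Python sketch of the acyclic-precedence-graph test mentioned above (the schedule encoding as (transaction, action, element) triples is an assumption of this sketch):

    def conflict_serializable(schedule):
        """Build the precedence graph of a schedule, e.g. [(1,'R','A'), ...],
        and report whether it is acyclic (i.e., conflict-serializable)."""
        edges = {}
        for i, (ti, ai, xi) in enumerate(schedule):
            for tj, aj, xj in schedule[i + 1:]:
                # Conflict: different transactions, same element, one is a write.
                if ti != tj and xi == xj and 'W' in (ai, aj):
                    edges.setdefault(ti, set()).add(tj)
        color = {}
        def dfs(u):                      # DFS cycle detection
            color[u] = "gray"
            for v in edges.get(u, ()):
                c = color.get(v)
                if c == "gray" or (c is None and dfs(v)):
                    return True
            color[u] = "black"
            return False
        return not any(dfs(u) for u in list(edges) if color.get(u) is None)

For the schedule above, the only edge is T1 → T2, the graph is acyclic, and the function returns True.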
SECTION 18.3-18.4
INTRODUCTION
Enforcing serializability by locks:
• locks
• the locking scheduler
• two-phase locking
Locking systems with several lock modes:
• shared and exclusive locks
• compatibility matrices
• upgrading/update locks
• increment locks
LOCKS
Locking works as follows:
• a transaction requests a lock
• the scheduler checks the lock table
• the scheduler generates a serializable schedule of actions
CONSISTENCY OF TRANSACTIONS
Actions and locks must relate to each other:
• a transaction can read or write an element only if it holds a lock on it and has not yet released that lock
• unlocking every locked element is compulsory
Legality of schedules:
• no two transactions may hold a lock on the same element at once; the first must release its lock before the second can acquire one
THE LOCKING SCHEDULER
• Grants lock requests only if the resulting schedule is legal.
• A lock table stores information about the current locks on elements.
A legal schedule of consistent transactions can, unfortunately, still fail to be serializable.
THE LOCKING SCHEDULER (CONTD.)
The locking scheduler delays requests that would result in an illegal schedule.
TWO-PHASE LOCKING
• Two-phase locking guarantees that a legal schedule of consistent transactions is conflict-serializable.
• All lock requests precede all unlock requests.
• The growing phase: obtain locks; no unlocks allowed.
• The shrinking phase: release locks; no new locks allowed.
WORKING OF TWO-PHASE LOCKING
• 2PL assures conflict-serializability.
• Two stricter protocols based on 2PL:
  - strict two-phase locking: a transaction holds all its exclusive locks until commit/abort
  - rigorous two-phase locking: a transaction holds all its locks until commit/abort
• For any transaction Ti that does not obey 2PL, it is possible to find a transaction Tj that does obey 2PL, and a schedule S of Ti and Tj, that is not conflict-serializable.
FAILURE OF 2PL
2PL does not protect against deadlocks.
SHARED & EXCLUSIVE LOCKS
Consistency of transactions:
• cannot write without holding an exclusive lock
• cannot read without holding some lock
This works on two principles:
• a read action can proceed only under a shared or an exclusive lock
• a write action can proceed only under an exclusive lock
All locks need to be unlocked before commit.
SHARED AND EXCLUSIVE LOCKS (CONT.)
• Two-phase locking of transactions: all locking must precede all unlocking.
• Legality of schedules: an element may be locked exclusively by one transaction, or by several in shared mode, but not both.
COMPATIBILITY MATRICES
• A compatibility matrix has a row and column for each lock mode.
• Rows correspond to a lock already held on an element by another transaction; columns correspond to the mode of the lock requested.
• Example:

               Lock requested
                S      X
Lock    S      YES     NO
held    X      NO      NO
UPGRADING LOCKS
Suppose a transaction wants to read as well as write an element:
• it acquires a shared lock on the element
• it performs its calculations on the element
• when it is ready to write, it is granted an exclusive lock
Transactions whose read/write needs are not known in advance can use upgrading locks this way.
UPGRADING LOCKS (CONT.)
Indiscriminate use of upgrading can produce a deadlock.
Example: both transactions want to upgrade on the same element; each waits for the other's shared lock to be released.
UPDATE LOCKS
• Update locks solve the deadlock that occurs with the upgrading method.
• A transaction holding an update lock can read but cannot write.
• An update lock can later be converted to an exclusive lock.
• An update lock can be granted only when the element holds no locks other than shared locks.
UPDATE LOCKS (CONT.)
An update lock behaves like a shared lock when you are requesting it, and like an exclusive lock once you hold it.
Compatibility matrix:

               Lock requested
                S      X      U
Lock    S      YES     NO     YES
held    X      NO      NO     NO
        U      NO      NO     NO
INCREMENT LOCKS
• Used for incrementing and decrementing stored values.
• E.g., transferring money between bank accounts, or ticket-selling transactions in which the number of seats is decremented by each transaction.
INCREMENT LOCKS (CONT.)
• An increment lock does not enable read or write actions on the element.
• Any number of transactions can hold increment locks on an element at once.
• Shared and exclusive locks cannot be granted while an increment lock is held on the element.

               Lock requested
                S      X      I
Lock    S      YES     NO     NO
held    X      NO      NO     NO
        I      NO      NO     YES
SECTION 18.4
18.4 LOCKING SYSTEMS WITH SEVERAL LOCK MODES
• In 18.3, a transaction must lock a database element X whether it reads or writes X.
• Yet there is no reason why several transactions could not read X at the same time, as long as none writes X.
• So we introduce a locking scheme with two modes:
  - shared/read lock (for reading)
  - exclusive/write lock (for writing)
18.4.1 SHARED & EXCLUSIVE LOCKS
Transaction consistency:
• cannot write without an exclusive lock
• cannot read without holding some lock
Consider the lock for writing to be "stronger" than the lock for reading.
This works on two principles:
1. A read action can proceed only under a shared or an exclusive lock.
2. A write action can proceed only under an exclusive lock.
All locks need to be unlocked before commit.
18.4.1 SHARED & EXCLUSIVE LOCKS (CONT.)
Two-phase locking (2PL) of transactions: each Ti proceeds lock → read/write → unlock.
Notation:
• sli(X) -- Ti requests a shared lock on DB element X
• xli(X) -- Ti requests an exclusive lock on DB element X
• ui(X) -- Ti relinquishes whatever lock it has on X
18.4.1 SHARED & EXCLUSIVE LOCKS (CONT.)
Legality of schedules: an element may be locked by one writing transaction, or by several reading transactions in shared mode, but not both.
18.4.2 COMPATIBILITY MATRICES
A convenient way to describe lock-management policies:
• rows correspond to a lock held on an element by another transaction
• columns correspond to the mode of the lock requested
Example:

               Lock requested
                S      X
Lock    S      YES     NO
held    X      NO      NO
18.4.3 UPGRADING LOCKS
A transaction T that takes a shared lock is friendly toward other transactions.
When T wants to read and then write a new value of X:
1. T takes a shared lock on X.
2. T performs its operations on X (which may take a long time).
3. When T is ready to write the new value, it "upgrades" its shared lock to an exclusive lock on X.
18.4.3 UPGRADING LOCKS (CONT.)
 Observe
the example
T1 retry and
succeed

‘B’ is released
T1 cannot take an exclusive lock on B until all locks on B
are released.
18.4.3 UPGRADING LOCKS (CONT.)
Upgrading can easily cause a deadlock: if both transactions want to upgrade on the same element, each waits for the other's shared lock to be released, and both wait forever.
18.4.4 UPDATE LOCKS
The update lock is a third lock mode that resolves the deadlock problem. Its rules are:
• only an update lock can be upgraded to a write (exclusive) lock later
• an update lock may be granted on X when there are already shared locks on X
• once there is an update lock on X, it prevents any additional locks of any kind from being granted, and is later changed to a write (exclusive) lock
Notation: uli(X)
18.4.4 UPDATE LOCKS (CONT.)
Example: (figure)
Compatibility matrix (asymmetric):

               Lock requested
                S      X      U
Lock    S      YES     NO     YES
held    X      NO      NO     NO
        U      NO      NO     NO
18.4.5 INCREMENT LOCKS
• A useful lock for transactions that increase or decrease a value, e.g., money transfers between two bank accounts.
• If two transactions (T1, T2) add constants to the same database element X, it doesn't matter which goes first, but no reads are allowed in between the two increments.
• See the following exhibits.
18.4.5 INCREMENT LOCKS (CONT.)
CASE 1: starting from A = 5, T1: INC(A,2) gives A = 7; then T2: INC(A,10) gives A = 17.
CASE 2: starting from A = 5, T2: INC(A,10) gives A = 15; then T1: INC(A,2) gives A = 17.
Either order yields A = 17.
18.4.5 INCREMENT LOCKS (CONT.)
What if the increments interleave? If T1 and T2 both read A = 5 before either writes, then T1 writes A = 7 and T2 writes A = 15; the final value is 7 or 15, but A != 17 in either case.
18.4.5 INCREMENT LOCKS (CONT.)
INC(A, c) -- an increment action writing database element A, executed atomically as:
1. READ(A,t);
2. t := t + c;
3. WRITE(A,t);
Notation:
• ili(X) -- the action of Ti requesting an increment lock on X
• inci(X) -- the action of Ti incrementing X by some constant; we don't care about the value of the constant
18.4.5 INCREMENT LOCKS (CONT.)
Example: (figure)
Compatibility matrix:

               Lock requested
                S      X      I
Lock    S      YES     NO     NO
held    X      NO      NO     NO
        I      NO      NO     YES
SECTION 18.5
OVERVIEW
Assumed knowledge:
• locks
• two-phase locking
• lock modes: shared, exclusive, update
A simple scheduler architecture based on the following principles:
• insert lock actions into the stream of reads, writes, and other actions
• release locks when the transaction manager signals that the transaction will commit or abort
SCHEDULER THAT INSERTS LOCK ACTIONS INTO THE TRANSACTIONS' REQUEST STREAM
If a transaction is delayed, waiting for a lock, the scheduler performs the following actions:
• Part I: takes the stream of requests generated by the transactions and inserts appropriate lock actions ahead of all database-access operations (read, write, or update).
• Part II: takes actions (lock actions or database operations) from Part I and executes them, determining the transaction T each action belongs to and T's status (delayed or not). If T is not delayed:
1. a database-access action is transmitted to the database and executed
SCHEDULER THAT INSERTS LOCK ACTIONS (CONT.)
2. If a lock action is received by Part II, it checks the lock table to see whether the lock can be granted:
   i. if granted, the lock table is modified to include the granted lock
   ii. if not granted, the lock table is updated to record the requested lock, and Part II delays transaction T
3. When T commits or aborts, Part I is notified by the transaction manager and releases all of T's locks. If any transactions are waiting for those locks, Part I notifies Part II.
4. Part II, when notified about a released lock on some database element, determines the next transaction T' to get the lock and allows it to continue.
THE LOCK TABLE
• A relation that associates database elements with locking information about each element.
• Implemented as a hash table using database elements as the hash key.
• Its size is proportional to the number of locked elements only, not to the size of the entire database.
[Figure: DB element A maps to the lock information for A.]
LOCK TABLE ENTRY STRUCTURE
Information found in the lock-table entry for an element A:
1. Group mode:
   - S: only shared locks are held
   - X: one exclusive lock and no other locks
   - U: one update lock, possibly with shared locks
2. Waiting: whether some transaction is waiting for a lock on A
3. A list of the transactions that currently hold locks on A or are waiting for a lock on A
HANDLING LOCK REQUESTS
• Suppose transaction T requests a lock on A.
• If there is no lock-table entry for A, then there are no locks on A, so create the entry and grant the lock request.
• If the lock-table entry for A exists, use its group mode to guide the decision about the lock request.
HANDLING LOCK REQUESTS (CONT.)
If the group mode is U (update) or X (exclusive):
• no other lock can be granted
• deny the lock request by T
• place an entry on the list saying T requests a lock, with Wait? = 'yes'
If the group mode is S (shared):
• another shared or update lock can be granted
• grant the request for an S or U lock
• create an entry for T on the list with Wait? = 'no'
• change the group mode to U if the new lock is an update lock
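A sketch of the group-mode decision in Python (the dictionary layout and field names are assumptions of the sketch, not the book's data structure):

    def request_lock(lock_table, element, txn, mode):
        """Grant or queue a lock request using the group-mode rules above."""
        entry = lock_table.setdefault(
            element, {"group": None, "waiting": False, "list": []})
        group = entry["group"]
        if group is None:                        # no locks on the element: grant
            entry["group"] = mode
            entry["list"].append((txn, mode, "granted"))
            return True
        if group in ("U", "X"):                  # update/exclusive: deny and wait
            entry["list"].append((txn, mode, "waiting"))
            entry["waiting"] = True
            return False
        if group == "S" and mode in ("S", "U"):  # shared group: S or U may join
            entry["list"].append((txn, mode, "granted"))
            if mode == "U":
                entry["group"] = "U"             # new group mode is U
            return True
        entry["list"].append((txn, mode, "waiting"))  # e.g. X requested under S
        entry["waiting"] = True
        return False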
HANDLING UNLOCK REQUESTS
• Now suppose transaction T unlocks A.
• Delete T's entry on the list for A.
• If T's lock mode is not the same as the group mode, there is no need to change the group mode.
• Otherwise, examine the entire list to determine the new group mode:
  - was S: new group mode is S, or no lock at all
  - was U: new group mode is S, or no lock at all
  - was X: no lock remains
HANDLING UNLOCK REQUESTS (CONT.)
If the value of Waiting is 'yes', grant one or more locks using one of the following approaches:
• First-come-first-served: grant the lock to the longest-waiting request; no starvation (no transaction waits forever for a lock).
• Priority to shared locks: grant all waiting S locks, then one U lock; grant an X lock only if no others are waiting.
• Priority to upgrading: if there is a U lock waiting to upgrade to an X lock, grant that first.
SECTION 18.6
MANAGING HIERARCHIES OF DATABASE ELEMENTS
Two problems arise with locks when there is a tree structure to the data:
• When the tree structure is a hierarchy of lockable elements: determine how locks are granted for both large elements (relations) and smaller elements (blocks containing tuples, or individual tuples).
• When the data itself is organized as a tree (e.g., B-tree indexes): this is discussed in the next section.
LOCKS WITH MULTIPLE GRANULARITY
• A database element can be a relation, a block, or a tuple.
• Different systems use different database-element sizes to determine the granularity of locks.
• Some applications call for small lock units such as tuples or blocks, others for large units such as relations.
EXAMPLE OF MULTIPLE-GRANULARITY LOCKS
Consider a database for a bank:
• Choosing relations as database elements means we would have one lock for an entire relation.
• For a relation holding account balances, this kind of lock is very inflexible and provides very little concurrency.
• Why? Because balance transactions require exclusive locks, so only one balance transaction could run at a time -- across all accounts.
• But since each account is independent of the others, we could perform transactions on different accounts simultaneously.
…(CONTD.)
• Thus it makes sense to use the block as the lock unit, so that two accounts on different blocks can be updated simultaneously.
• Another example is that of a document: by a similar argument, it is better to use a large lock unit (the complete document) in that case.
WARNING (INTENTION) LOCKS
• These are required to manage locks at different granularities.
• In the bank example, if a shared lock is obtained on the relation while there are exclusive locks on individual tuples, unserializable behavior can occur.
• The rules for managing locks on a hierarchy of database elements constitute the warning protocol.
DATABASE ELEMENTS ORGANIZED IN
HIERARCHY
RULES OF WARNING PROTOCOL
These involve both ordinary (S and X) and warning (IS and IX) locks. The rules are:
• Begin at the root of the hierarchy.
• Request the S/X lock if we are at the desired element.
• If the desired element is further down the hierarchy, place a warning lock (IS if S, IX if X).
• When the warning lock is granted, proceed to the appropriate child node and repeat the above steps until the desired node is reached.
COMPATIBILITY MATRIX FOR SHARED, EXCLUSIVE AND INTENTION LOCKS

            IS     IX     S      X
    IS      Yes    Yes    Yes    No
    IX      Yes    Yes    No     No
    S       Yes    No     Yes    No
    X       No     No     No     No

• The above matrix applies only to locks held by other transactions.
GROUP MODES OF INTENTION LOCKS
• A transaction can hold both S and IX locks on the same element (to read the entire element and then modify sub-elements).
• This combination can be considered another lock mode, SIX, having the restrictions of both locks, i.e., "No" for every request except IS.
• SIX then serves as the group mode.
EXAMPLE
Consider a transaction T1:
SELECT * FROM table WHERE attribute1 = 'abc'
Here, an IS lock is first acquired on the entire relation; then, moving to the individual tuples with attribute1 = 'abc', an S lock is acquired on each of them.
Consider another transaction T2:
UPDATE table SET attribute2 = 'def' WHERE attribute1 = 'ghi'
Here, T2 requires an IX lock on the relation, and since T1's IS lock is compatible, IX is granted.
On reaching the desired tuple ('ghi'), since there is no lock on it, T2 gets an X lock too.
If T2 were updating the same tuple as T1, it would have to wait until T1 released its S lock.
PHANTOMS AND HANDLING INSERTIONS CORRECTLY
• The phantom problem arises when transactions create new sub-elements of lockable elements.
• Since we can lock only existing elements, the new elements fail to be locked.
• The problem this creates is explained in the following example.
EXAMPLE
Consider a transaction T3:
SELECT SUM(length) FROM table WHERE attribute1 = 'abc'
This calculates the total length over all tuples having attribute1 = 'abc'; thus T3 acquires IS on the relation and S on the targeted tuples.
Now, if another transaction T4 inserts a new tuple having attribute1 = 'abc', the result of T3 becomes incorrect.
EXAMPLE (…CONTD.)
• This by itself is not a concurrency problem, since the serial order (T3, T4) is maintained.
• But if both T3 and T4 also write a common element X, it can lead to unserializable behavior:
r3(t1); r3(t2); w4(t3); w4(X); w3(L); w3(X)
where r3 and w3 are read and write actions of T3, w4 are write actions of T4, and L is the total length calculated by T3 (from t1 and t2).
• At the end, the result of T3 is the sum of the lengths of t1 and t2, and X has the value written by T3.
• This is not right: if the value of X is taken to be that written by T3, then for the schedule to be serializable, the sum of the lengths of t1, t2, and t3 should have been computed.
EXAMPLE (…CONTD.)
• Conversely, if the sum is the total length of t1 and t2 only, then for the schedule to be serializable, X should have the value written by T4.
• This problem arises because the relation has a phantom tuple (the newly inserted one), which should have been locked but wasn't, since it didn't exist when the locks were taken.
• The occurrence of phantoms can be avoided if all insertion and deletion transactions are treated as write operations on the whole relation.
SECTION 18.7
BASICS
• B-tree: a tree data structure that keeps data sorted and allows searches, insertions, and deletions; commonly used in databases and file systems.
• Lock: enforces limits on access to resources; a way of enforcing concurrency control.
• Lock granularity: the level and type of information that a lock protects.
TREE PROTOCOL
• A kind of graph-based protocol; an alternative to two-phase locking (2PL).
• Database elements are disjoint pieces of data.
• Nodes of the tree do NOT form a hierarchy based on containment.
• The only way to get to a node is through its parent.
• Example: a B-tree.
ADVANTAGES OF THE TREE PROTOCOL
• Unlocking takes less time as compared to 2PL.
• Freedom from deadlocks.
18.7.1 MOTIVATION FOR TREE-BASED LOCKING
• Consider a B-tree index, treating individual nodes as lockable database elements.
• Concurrent use of the B-tree is not possible with the standard set of locks and 2PL.
• Therefore, a protocol is needed that can assure serializability while allowing access to elements all the way at the bottom of the tree, even though 2PL is violated.
18.7.1 MOTIVATION FOR TREE-BASED LOCKING (CONT.)
Why is concurrent use of the B-tree impossible with a standard set of locks and 2PL?
• Every transaction must begin by locking the root node.
• A 2PL transaction cannot unlock the root until all the required locks are acquired.
18.7.2 ACCESSING TREE-STRUCTURED DATA
Assumptions:
• only one kind of lock
• consistent transactions
• legal schedules
• no 2PL requirement on transactions
18.7.2 RULES FOR ACCESSING TREE-STRUCTURED DATA
RULES:
• The first lock may be at any node.
• Subsequent locks may be acquired only while the parent node is locked.
• Nodes may be unlocked at any time.
• A node may not be relocked, even if its parent is still locked.
18.7.3 WHY THE TREE PROTOCOL WORKS
• The tree protocol implies a serial order on the transactions in the schedule.
• Order of precedence: Ti <S Tj if Ti and Tj lock a node in common in schedule S and Ti locks it first.
• If Ti locks the root before Tj, then Ti locks every node in common with Tj before Tj does.
ORDER OF PRECEDENCE (figure)
SECTION 18.8
WHAT IS TIMESTAMPING?
• The scheduler assigns each transaction T a unique number, its timestamp TS(T).
• Timestamps must be issued in ascending order, at the time a transaction first notifies the scheduler that it is beginning.
TIMESTAMP TS(T)
Two methods of generating timestamps:
• use the value of the system clock as the timestamp
• use a logical counter that is incremented after each new timestamp is assigned
Whichever method is used, the scheduler maintains a table of currently active transactions and their timestamps.
TIMESTAMPS FOR DATABASE ELEMENT X, AND THE COMMIT BIT
• RT(X): the read time of X -- the highest timestamp of any transaction that has read X.
• WT(X): the write time of X -- the highest timestamp of any transaction that has written X.
• C(X): the commit bit for X -- true if and only if the most recent transaction to write X has already committed.
PHYSICALLY UNREALIZABLE BEHAVIOR
Read too late:
A transaction U that started after transaction T wrote a value for X before T got to read X.
Timeline: T starts; U starts; U writes X; T tries to read X.
PHYSICALLY UNREALIZABLE BEHAVIOR
Write too late:
A transaction U that started after T read X before T got a chance to write X.
Timeline: T starts; U starts; U reads X; T tries to write X (too late).
DIRTY READ
It is possible that, after T reads the value of X written by U, transaction U aborts.
Timeline: U starts; U writes X; T starts; T reads X; U aborts -- T has performed a dirty read.
RULES FOR TIMESTAMP-BASED SCHEDULING
1. The scheduler receives a request rT(X):
a) If TS(T) ≥ WT(X), the read is physically realizable.
   1. If C(X) is true, grant the request; if TS(T) > RT(X), set RT(X) := TS(T), otherwise do not change RT(X).
   2. If C(X) is false, delay T until C(X) becomes true or the transaction that wrote X aborts.
b) If TS(T) < WT(X), the read is physically unrealizable. Roll back T.
RULES FOR TIMESTAMP-BASED SCHEDULING (CONT.)
2. The scheduler receives a request wT(X):
a) If TS(T) ≥ RT(X) and TS(T) ≥ WT(X), the write is physically realizable and must be performed:
   1. write the new value for X,
   2. set WT(X) := TS(T), and
   3. set C(X) := false.
b) If TS(T) ≥ RT(X) but TS(T) < WT(X), then the write is physically realizable, but there is already a later value in X:
   a. if C(X) is true, the previous writer of X has committed, so simply ignore the write by T;
   b. if C(X) is false, we must delay T.
c) If TS(T) < RT(X), then the write is physically unrealizable, and T must be rolled back.
RULES FOR TIMESTAMP-BASED SCHEDULING (CONT.)
3. The scheduler receives a request to commit T. It must find all the database elements X written by T and set C(X) := true. Any transactions waiting for such an X to be committed are allowed to proceed.
4. The scheduler receives a request to abort T, or decides to roll back T. Any transaction that was waiting on an element X that T wrote must repeat its attempt to read or write.
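A sketch of rules 1 and 2 in Python (the dictionaries ts, rt, wt, and c that hold TS(T), RT(X), WT(X), and C(X) are assumptions of this sketch):

    def handle_read(T, X, ts, rt, wt, c):
        """Rule 1: returns 'grant', 'delay', or 'rollback'."""
        if ts[T] >= wt[X]:                 # physically realizable
            if c[X]:                       # last writer has committed
                rt[X] = max(rt[X], ts[T])  # advance RT(X) if needed
                return "grant"
            return "delay"                 # wait for C(X) or the writer's abort
        return "rollback"                  # read too late

    def handle_write(T, X, ts, rt, wt, c):
        """Rule 2: returns 'grant', 'ignore', 'delay', or 'rollback'."""
        if ts[T] >= rt[X] and ts[T] >= wt[X]:
            wt[X] = ts[T]                  # perform the write
            c[X] = False
            return "grant"
        if ts[T] >= rt[X]:                 # a later value is already in X
            return "ignore" if c[X] else "delay"
        return "rollback"                  # write too late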
MULTIVERSION TIMESTAMPS
• Multiversion schemes keep old versions of data items to increase concurrency.
• Each successful write results in the creation of a new version of the data item written.
• Timestamps are used to label versions.
• When a read(X) operation is issued, select an appropriate version of X based on the timestamp of the transaction, and return the value of the selected version.
TIMESTAMPS AND LOCKING
Generally, timestamping performs better than locking in situations where:
• most transactions are read-only, or
• it is rare that concurrent transactions try to read and write the same element.
In high-conflict situations, locking performs better than timestamping.
SECTION 18.9
INTRODUCTION
What is optimistic concurrency control? It assumes that no unserializable behavior will occur, and allows T to access data without locks:
 Timestamp-based scheduling
 Validation-based scheduling
VALIDATION BASED SCHEDULING
 Scheduler keeps a record of what the active transactions are doing.
 A transaction executes in 3 phases:
1. Read: reads the elements in its read set RS(T) and computes in its local address space.
2. Validate: the scheduler compares the transaction's read and write sets with those of other transactions.
3. Write: writes the values of the elements in its write set WS(T).
VALIDATION BASED SCHEDULER
 Contains an assumed serial order of transactions.
 Maintains three sets:
 START( ): the set of transactions that have started but not yet completed validation.
 VAL( ): the set of transactions that have validated but not yet finished the writing phase.
 FIN( ): the set of transactions that have finished.
EXPECTED EXCEPTIONS
1. Suppose there is a transaction U such that:
 U is in VAL or FIN; that is, U has validated,
 FIN(U) > START(T); that is, U did not finish before T started, and
 RS(T) ∩ WS(U) ≠ φ; let the intersection contain database element X.
2. Suppose there is a transaction U such that:
 U is in VAL; that is, U has successfully validated,
 FIN(U) > VAL(T); that is, U did not finish before T entered its validation phase, and
 WS(T) ∩ WS(U) ≠ φ; let X be in both write sets.
VALIDATION RULES
 Check that RS(T) ∩ WS(U) = φ for any previously validated U that did not finish before T started, i.e. FIN(U) > START(T).
 Check that WS(T) ∩ WS(U) = φ for any previously validated U that did not finish before T validated, i.e. FIN(U) > VAL(T).
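A sketch of the two checks in Python, assuming each transaction record carries its read set, write set, and START/VAL/FIN times (the field names are assumptions; fin is None while U is still writing):

from dataclasses import dataclass

@dataclass
class Txn:
    read_set: set
    write_set: set
    start: int            # time T entered START
    val: int              # time T validated
    fin: int = None       # time T finished, or None if still writing

def validate(t, others):
    """Return True if t passes validation against previously validated
    transactions in `others`."""
    for u in others:
        # Rule 1: U did not finish before T started, and U wrote something T read.
        if (u.fin is None or u.fin > t.start) and (t.read_set & u.write_set):
            return False
        # Rule 2: U did not finish before T validated, and both wrote a common element.
        if (u.fin is None or u.fin > t.val) and (t.write_set & u.write_set):
            return False
    return True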
EXAMPLE
(Figure omitted.) Four transactions validate in the order U, T, V, W, with RS(T) = {A,B}, WS(T) = {A,C}, WS(U) = {D}, RS(V) = {B}, WS(V) = {D,E}, RS(W) = {A,D}, WS(W) = {A,C}.
SOLUTION
 Validation of U:
Nothing to check.
 Validation of T:
WS(U) ∩ RS(T) = {D} ∩ {A,B} = φ
WS(U) ∩ WS(T) = {D} ∩ {A,C} = φ
 Validation of V:
RS(V) ∩ WS(T) = {B} ∩ {A,C} = φ
WS(V) ∩ WS(T) = {D,E} ∩ {A,C} = φ
RS(V) ∩ WS(U) = {B} ∩ {D} = φ
 Validation of W:
RS(W) ∩ WS(T) = {A,D} ∩ {A,C} = {A}
RS(W) ∩ WS(V) = {A,D} ∩ {D,E} = {D}
WS(W) ∩ WS(V) = {A,C} ∩ {D,E} = φ
(W is not validated.)
COMPARISON
Concurrency-control mechanisms compared by storage utilization and delays:
 Locks — Storage: space in the lock table is proportional to the number of database elements locked. Delays: delays transactions but avoids rollbacks.
 Timestamps — Storage: space is needed for read and write times for every database element, whether or not it is currently accessed. Delays: does not delay transactions but causes them to roll back unless interference is low.
 Validation — Storage: space is used for timestamps and read/write sets for each currently active transaction, plus a few more transactions that finished after some currently active transaction began. Delays: does not delay transactions but causes them to roll back unless interference is low.
CHAPTER 21
SECTION 21.1
NEED FOR INFORMATION INTEGRATION
 Ideally, all the data in the world could be put in a single database.
 In the real world, a single database is impossible:
databases are created independently, and
it is hard to design a database to support future use.
INCONVENIENT
 Examples: recording grades only for students who have paid tuition, or swimming in the SJSU aquatic center for free during summer vacation.
 Neither can be achieved by any one of the independent databases alone.
 Solution: one integrated database.
HOW TO INTEGRATE
 Start over: build one database that contains all the legacy databases, and rewrite all the applications.
Result: painful.
 Build a layer of abstraction (middleware) on top of all the legacy databases.
This layer is often defined by a collection of classes.
BUT…
HETEROGENEITY PROBLEM
What is the heterogeneity problem?
 Aardvark Automobile Co. has 1000 dealers with 1000 databases.
 To find a model at another dealer, can we use this command everywhere?
SELECT * FROM CARS
WHERE MODEL = 'A6';
TYPES OF HETEROGENEITY
 Communication heterogeneity
 Query-language heterogeneity
 Schema heterogeneity
 Data-type differences
 Value heterogeneity
 Semantic heterogeneity
CONCLUSION
 One database system for everything would be perfect, but is impossible.
 Independent databases are inconvenient.
 To integrate databases:
1. start over, or
2. use middleware,
 but beware the heterogeneity problem.
CHAPTER 21.2
MODES OF INFORMATION INTEGRATION
ID: 219
Name: Qun Yu
Class: CS257 219 Spring 2009
Instructor: Dr. T.Y.Lin
SECTION 21.2
FEDERATIONS
 The simplest architecture for integrating several DBs.
 One-to-one connections between all pairs of DBs.
 For n DBs to talk to each other, n(n-1) wrappers are needed.
 Good when communications between DBs are limited.
WRAPPER
 Wrapper: software that translates incoming queries and outgoing answers. As a result, it allows information sources to conform to some shared schema.
FEDERATIONS DIAGRAM
Figure: four databases DB1-DB4, with 2 wrappers on each of the six connections between pairs.
A federated collection of 4 DBs needs 12 components to translate queries from one to another.
EXAMPLE
Car dealers want to share their inventory. Each dealer queries the other's DB to find the needed car.
Dealer-1's DB relation: NeededCars(model, color, autoTrans)
Dealer-2's DB relations: Autos(serial, model, color)
Options(serial, option)
Figure: Dealer-1's DB and Dealer-2's DB, each behind a wrapper.
EXAMPLE…
Dealer 1 queries Dealer 2 for needed cars:
FOR (each tuple (:m, :c, :a) in NeededCars) {
    IF (:a = TRUE) { /* automatic transmission wanted */
        SELECT serial
        FROM Autos, Options
        WHERE Autos.serial = Options.serial
          AND Options.option = 'autoTrans'
          AND Autos.model = :m AND Autos.color = :c;
    }
    ELSE { /* automatic transmission not wanted */
        SELECT serial
        FROM Autos
        WHERE Autos.model = :m AND Autos.color = :c
          AND NOT EXISTS (SELECT * FROM Options
                          WHERE serial = Autos.serial
                            AND option = 'autoTrans');
    }
}
DATA WAREHOUSE
 Sources are translated from their local schema to a global schema and copied to a central DB.
 User transparent: the user uses the data warehouse just like an ordinary DB.
 The user is not allowed to update the data warehouse.
WAREHOUSE DIAGRAM
Figure: the user sends a query to the Warehouse and gets a result; a Combiner loads the warehouse from Extractors, one per source (Source 1, Source 2).
EXAMPLE
Construct a data warehouse from the source DBs of 2 car dealers:
Dealer-1's schema: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
Dealer-2's schema: Autos(serial, model, color)
Options(serial, option)
Warehouse's schema:
AutosWhse(serialNo, model, color, autoTrans, dealer)
Extractor --- query to extract data from Dealer-1's data:
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars;
EXAMPLE
Extractor --- queries to extract data from Dealer-2's data:
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'yes', 'dealer2'
FROM Autos, Options
WHERE Autos.serial = Options.serial AND option = 'autoTrans';

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'no', 'dealer2'
FROM Autos
WHERE NOT EXISTS (SELECT * FROM Options
                  WHERE Options.serial = Autos.serial
                    AND option = 'autoTrans');
CONSTRUCT DATA WAREHOUSE
There are mainly 3 ways of constructing the data in the warehouse:
1) Periodically reconstruct the warehouse from the current data in the sources, once a night or at even longer intervals.
Advantages: simple algorithms.
Disadvantages:
1) need to shut down the warehouse;
2) data can become out of date.
CONSTRUCT DATA WAREHOUSE
2) Update the warehouse periodically based on the changes (e.g. each night) at the sources.
Advantages: involves smaller amounts of data (important when the warehouse is large and needs to be modified in a short period).
Disadvantages:
1) the process of calculating changes to the warehouse is complex;
2) data can become out of date.
CONSTRUCT DATA WAREHOUSE
3) Change the warehouse immediately, in response to each change or a small set of changes at one or more of the sources.
Advantages: data won't become out of date.
Disadvantages: requires too much communication, and is therefore generally too expensive (practical only for warehouses whose underlying sources change slowly).
MEDIATORS
 A mediator is a virtual warehouse: it supports a virtual view, or a collection of views, that integrates several sources.
 A mediator doesn't store any data.
 A mediator's tasks:
1) receive the user's query,
2) send queries to the wrappers,
3) combine the results from the wrappers,
4) send the final result to the user.
A MEDIATOR DIAGRAM
Figure: the user query goes to the Mediator; the mediator sends a query to each Wrapper and receives its result; each wrapper queries its source (Source 1, Source 2) and receives the result; the mediator returns the combined result to the user.
EXAMPLE
With the same data sources as in the data warehouse example, the mediator integrates the two dealers' sources into a view with schema:
AutosMed(serialNo, model, color, autoTrans, dealer)
When the user poses the query:
SELECT serialNo, model
FROM AutosMed
WHERE color = 'red';
EXAMPLE
In this simple case, the mediator forwards the same query to each of the two wrappers.
Wrapper 1: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
SELECT serialNo, model
FROM Cars
WHERE color = 'red';
Wrapper 2: Autos(serial, model, color); Options(serial, option)
SELECT serial, model
FROM Autos
WHERE color = 'red';
The mediator needs to interpret serial as serialNo, and then return the union of these sets of data to the user.
SECTION 21.3
WRAPPERS IN MEDIATOR-BASED SYSTEMS
 More complicated than those in most data warehouse systems.
 Must be able to accept a variety of queries from the mediator and translate them into the terms of the source.
 Must communicate the result back to the mediator.
Templates for Query Patterns:
 We use the notation T => S to express the idea that the template T is turned by the wrapper into the source query S.
Example 1
 Dealer 1's relation:
Cars(serialNo, model, color, autoTrans, navi, …)
For use by a mediator with schema:
AutosMed(serialNo, model, color, autoTrans, dealer)
 Denote the code representing a color by the parameter $c; then the template is:
SELECT *
FROM AutosMed
WHERE color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars
WHERE color = '$c';
(Template T => source query S)
There will be 2^n templates in total if we have the option of specifying n attributes.
WRAPPER GENERATORS
 The wrapper generator creates a table that holds the various query patterns contained in the templates, and the source queries associated with each.
 A driver is used in each wrapper; the task of the driver is to:
 accept a query from the mediator,
 search the table for a template that matches the query,
 send the source query to the source, again using a "plug-in" communication mechanism, and
 process the response at the wrapper.
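A sketch of the driver's table lookup in Python, with templates stored as patterns keyed by the set of bound attributes (a hypothetical representation; sending the query and processing the reply are omitted):

# Template table built by the wrapper generator: key = set of attributes
# the mediator query binds; value = the associated source-query pattern.
TEMPLATES = {
    frozenset({"color"}):
        "SELECT serialNo, model, color, autoTrans, 'dealer1' "
        "FROM Cars WHERE color = '{color}'",
}

def driver(bindings):
    """Accept mediator bindings, find a matching template, and build the
    source query text."""
    pattern = TEMPLATES.get(frozenset(bindings))
    if pattern is None:
        raise LookupError("no template matches this query pattern")
    return pattern.format(**bindings)

# Example: the mediator asks for blue cars.
print(driver({"color": "blue"}))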
Filter
 A wrapper filter lets the wrapper support more queries than its templates match directly.
Example 2
 Suppose the mediator asks for cars of a specific model and color. This would require a more complicated template, in which queries specify both model and color:
SELECT *
FROM AutosMed
WHERE model = '$m' AND color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars
WHERE model = '$m' AND color = '$c';
Solution:
1. Use the color template with $c = 'blue' to find all blue cars and store them in a temporary relation:
TempAutos(serialNo, model, color, autoTrans, dealer)
2. The wrapper then returns to the mediator the desired set of automobiles by executing the local query:
SELECT *
FROM TempAutos
WHERE model = 'Gobi';
SECTION 21.4-21.5
21.4 CAPABILITY BASED OPTIMIZATION
 Introduction
 A typical DBMS estimates the cost of each query plan and picks what it believes to be the best.
 A mediator has little or no knowledge of how long its sources will take to answer.
 Optimization of mediator queries therefore cannot rely on cost measures alone to select a query plan.
 Instead, the mediator follows capability-based optimization.
21.4.1 THE PROBLEM OF LIMITED SOURCE
CAPABILITIES
 Many sources have only Web-based interfaces.
 Web sources usually allow querying only through a query form.
 E.g., the Amazon.com interface allows us to query about books in many different ways.
 But we cannot ask questions that are too general, e.g.:
SELECT * FROM books;
21.4.1 THE PROBLEM OF LIMITED SOURCE
CAPABILITIES (CONT'D)
 Reasons why a source may limit the ways in which queries can be asked:
 The earliest databases did not use a relational DBMS that supports SQL queries.
 Indexes on a large database may make certain queries feasible while others are too expensive to execute.
 Security reasons.
 E.g., a medical database may answer queries about averages, but won't disclose the details of a particular patient's information.
21.4.2 A NOTATION FOR DESCRIBING
SOURCE CAPABILITIES
 For relational data, the legal forms of queries are described by adornments.
 Adornments: sequences of codes that represent the requirements for the attributes of the relation, in their standard order:
 f (free): the attribute can be specified or not.
 b (bound): we must specify a value for the attribute, but any value is allowed.
 u (unspecified): we are not permitted to specify a value for the attribute.
21.4.2 A NOTATION FOR DESCRIBING
SOURCE CAPABILITIES (CONT'D)
 c[S] (choice from set S): a value must be specified, and it must come from the finite set S.
 o[S] (optional from set S): we either do not specify a value, or we specify a value from the finite set S.
 A prime (e.g. f') indicates that the attribute is not part of the output of the query.
 A capabilities specification is a set of adornments.
 A query must match one of the adornments in the source's capabilities specification.
21.4.3 CAPABILITY-BASED QUERY-PLAN
SELECTION
 Given a query at the mediator, a capability-based query optimizer first considers what queries it can ask at the sources to help answer the query.
 The process is repeated until either:
 enough queries have been asked at the sources to resolve all the conditions of the mediator query, so the query is answered — such a plan is called feasible; or
 no more valid forms of source queries can be constructed, yet the mediator query still cannot be answered — the query is impossible.
21.4.3 CAPABILITY-BASED QUERY-PLAN
SELECTION (CONT'D)
 The simplest form of mediator query where we need to apply the above strategy is a join of relations.
 E.g., we have the sources for dealer 2:
 Autos(serial, model, color)
 Options(serial, option)
 Suppose that ubf is the sole adornment for Autos, and Options has two adornments, bu and uc[autoTrans, navi].
 The query: find the serial numbers and colors of Gobi models with a navigation system.
21.4.4 ADDING COST-BASED
OPTIMIZATION
 The mediator's query optimizer is not done when the capabilities of the sources have been examined.
 Having found the feasible plans, it must choose among them.
 Making an intelligent, cost-based choice requires that the mediator know a great deal about the costs of the queries involved.
 Since the sources are independent of the mediator, it is difficult to estimate these costs.
21.5 OPTIMIZING MEDIATOR QUERIES
 Chain algorithm: a greedy algorithm that finds a way to answer the query by sending a sequence of requests to its sources.
 It will always find a solution, assuming at least one solution exists.
 The solution may not be optimal.
21.5.1 SIMPLIFIED ADORNMENT
NOTATION
 A query at the mediator is limited to b (bound) and f (free) adornments.
 We use the following convention for describing adornments:
 name^adornment(attributes)
 where:
 name is the name of the relation, and
 the number of adornment letters equals the number of attributes.
21.5.2 OBTAINING ANSWERS FOR
SUBGOALS
 Rules for subgoals and sources:
 Suppose we have a subgoal R^{x1 x2 … xn}(a1, a2, …, an), and the source adornment for R is y1 y2 … yn:
 If yi is b or c[S], then xi must be b.
 If xi is f, then yi must not be output restricted (primed).
 The adornment on the subgoal matches the adornment at the source if, in every other position, yi is f, u, or o[S] and xi is either b or f.
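A sketch of the matching test in Python, encoding b/f/u letters as characters and c[S]/o[S] entries as ('c', S)/('o', S) pairs (a hypothetical encoding; the output-restriction check for primed positions is omitted):

def matches(subgoal, source):
    """Subgoal adornment (string of 'b'/'f') matches a source adornment
    (list of 'b', 'f', 'u', or ('c', S)/('o', S) pairs) if every position
    the source requires bound ('b' or c[S]) is 'b' in the subgoal."""
    for x, y in zip(subgoal, source):
        required = y == 'b' or (isinstance(y, tuple) and y[0] == 'c')
        if required and x != 'b':
            return False
    return True

# Example: S^bf matches source adornment c[2,3,5]f, but S^ff does not.
print(matches("bf", [('c', {2, 3, 5}), 'f']))   # True
print(matches("ff", [('c', {2, 3, 5}), 'f']))   # False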
21.5.3 THE CHAIN ALGORITHM
 Maintains 2 types of information:
 an adornment for each subgoal, and
 a relation X that is the join of the relations for all the subgoals that have been resolved.
 Initially, the adornment for a subgoal is b if and only if the mediator query provides a constant binding for the corresponding argument of that subgoal.
 Initially, X is a relation over no attributes, containing just an empty tuple.
21.5.3 THE CHAIN ALGORITHM (CON'T)
First, initialize the adornments of the subgoals and X. Then, repeatedly select a subgoal that can be resolved. Let R^α(a1, a2, …, an) be that subgoal:
1. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X.
 Project X onto its variables that appear in R.
21.5.3 THE CHAIN ALGORITHM (CON'T)
2. For each tuple t in the projection of X, issue a query to the source as follows (β is a source adornment):
 If a component of β is b, then the corresponding component of α is b, and we can use the corresponding component of t for the source query.
 If a component of β is c[S], and the corresponding component of t is in S, then the corresponding component of α is b, and we can use the corresponding component of t for the source query.
 If a component of β is f, and the corresponding component of α is b, we may provide the constant value for the source query.
21.5.3 THE CHAIN ALGORITHM (CON'T)
 If a component of β is u, then provide no binding for this component in the source query.
 If a component of β is o[S], and the corresponding component of α is f, then treat it as if it were f.
 If a component of β is o[S], and the corresponding component of α is b, then treat it as if it were c[S].
3. Every variable among a1, a2, …, an is now bound. For each remaining unresolved subgoal, change its adornment so that any position holding one of these variables is b.
21.5.3 THE CHAIN ALGORITHM (CON'T)
4. Replace X with X ⋈ π_S(R), where S is the set of all the variables among a1, a2, …, an.
5. Project out of X all components that correspond to variables that do not appear in the head or in any unresolved subgoal.
 If every subgoal is resolved, then X is the answer.
 If some subgoal remains unresolved, the algorithm fails.
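A simplified sketch of the algorithm in Python, restricted to b/f source adornments (the names and tuple-set representation are assumptions; constant passing for f positions, the c/o/u cases, and step 5's projection are omitted):

def _consistent(env, term, value):
    """Extend env so that term = value, if consistent; else return None."""
    if isinstance(term, str):                  # term is a variable
        if term in env and env[term] != value:
            return None
        out = dict(env)
        out[term] = value
        return out
    return env if term == value else None      # term is a constant

def chain(head_vars, subgoals, sources):
    """Greedy chain algorithm, simplified to b/f source adornments.
    subgoals: list of (relation, args); an arg is a constant or a variable
    name (str). sources: relation -> (adornment, set of tuples).
    Returns the answer tuples over head_vars, or None if no subgoal order works."""
    envs = [{}]                                # X, kept as variable bindings
    unresolved = list(subgoals)
    while unresolved:
        if not envs:
            return set()                       # X is empty, so the answer is empty
        bound = envs[0].keys()                 # all envs bind the same variables
        pick = next((sg for sg in unresolved
                     if all(ad == 'f' or not isinstance(a, str) or a in bound
                            for ad, a in zip(sources[sg[0]][0], sg[1]))),
                    None)
        if pick is None:
            return None                        # no subgoal can be resolved
        unresolved.remove(pick)
        name, args = pick
        new_envs = []
        for env in envs:                       # join X with the subgoal's relation
            for t in sources[name][1]:
                cur = env
                for a, v in zip(args, t):
                    cur = cur and _consistent(cur, a, v)
                if cur:
                    new_envs.append(cur)
        envs = new_envs
    return {tuple(e[v] for v in head_vars) for e in envs}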
21.5.3 THE CHAIN ALGORITHM EXAMPLE
 Mediator query:
Q: Answer(c) ← R^bf(1,a) AND S^ff(a,b) AND T^ff(b,c)
 Example data and source adornments:

Relation:   R          S              T
Data:       w  x       x  y           y  z
            1  2       2  4           4  6
            1  3       3  5           5  7
            1  4                      5  8
Adornment:  bf         c'[2,3,5]f     bu
21.5.3 THE CHAIN ALGORITHM EXAMPLE
(CON'T)
 Initially, the adornments on the subgoals are the same as in Q, and X contains just the empty tuple.
 S and T cannot be resolved: they ask for ff adornments, but the sources require either a b or a c.
 R(1,a) can be resolved, because its adornment is matched by the source's adornment.
 Send R(w,x) with w = 1 to the source to get the tuples shown in the table above.
21.5.3 THE CHAIN ALGORITHM EXAMPLE
(CON'T)
 Project the subgoal's relation onto its second component, since only the second component of R(1,a) is a variable:
a
2
3
4
 This is joined with X, so X now equals this relation.
 Change the adornment on S from ff to bf.
21.5.3 THE CHAIN ALGORITHM EXAMPLE
(CON'T)
 Now we resolve S^bf(a,b):
 Project X onto a (X is unchanged).
 Search S for tuples whose first component equals a value of attribute a in X:
a  b
2  4
3  5
 Join this relation with X, and remove a, because it appears neither in the head nor in any unresolved subgoal:
b
4
5
21.5.3 THE CHAIN ALGORITHM EXAMPLE
(CON'T)
 Now we resolve T^bf(b,c):
b  c
4  6
5  7
5  8
 Join this relation with X, and project onto the c attribute to get the relation for the head.
 The solution is {(6), (7), (8)}.
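Running the simplified sketch from 21.5.3 on this example's data reproduces the answer (approximating S's c'[2,3,5] and T's bu adornments as bf, which is all the sketch supports):

sources = {
    "R": ("bf", {(1, 2), (1, 3), (1, 4)}),
    "S": ("bf", {(2, 4), (3, 5)}),
    "T": ("bf", {(4, 6), (5, 7), (5, 8)}),
}
subgoals = [("R", (1, "a")), ("S", ("a", "b")), ("T", ("b", "c"))]
print(chain(["c"], subgoals, sources))   # {(6,), (7,), (8,)}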
21.5.4 INCORPORATING UNION VIEWS AT
THE MEDIATOR
 This form of the chain algorithm does not consider that several sources can contribute tuples to a relation.
 If specific sources have tuples to contribute that other sources may not have, this adds complexity.
 To resolve this, we can consult all sources, or make best efforts to return all the answers.
21.5.4 INCORPORATING UNION VIEWS AT
THE MEDIATOR (CON'T)
 Consulting all sources:
 We can only resolve a subgoal when each source for its relation has an adornment matched by the current adornment of the subgoal.
 Less practical, because it makes queries harder to answer, and impossible if any source is down.
 Best efforts:
 We need only one source with a matching adornment to resolve a subgoal.
 The chain algorithm must be modified to revisit each subgoal when that subgoal has new bound requirements.
SECTION 21.6
LOCAL-AS-VIEW MEDIATORS
 In a LAV mediator, the global predicates are not defined as views of the source data.
 Instead, for each source we define expressions, involving the global predicates, that describe the tuples the source is able to produce.
 Queries are answered at the mediator by discovering all possible ways to construct the query using the views provided by the sources.
MOTIVATION FOR LAV MEDIATORS
 Sometimes the relationship between what the mediator should provide and what the sources provide is more subtle.
 For example, consider the predicate Par(c, p), meaning that p is a parent of c, which represents the set of all child-parent facts that could ever exist.
 The sources will provide information about whatever child-parent facts they know.
MOTIVATION (CONTD..)
 There can be sources that provide child-grandparent facts but no child-parent facts at all.
 Such a source could never be used to answer a child-parent query under GAV mediation.
 LAV mediators allow us to say that a certain source provides grandparent facts.
 They help discover how and when to use that source in a given query.
TERMINOLOGY FOR LAV MEDIATION.
 The queries at the mediator, and the queries that describe the sources, are single Datalog rules.
 A query that is a single Datalog rule is often called a conjunctive query.
 The global predicates of the LAV mediator are used as the subgoals of mediator queries.
 There are also conjunctive queries that define views.
CONTD..
 Their heads each have a unique view predicate, which is the name of a view.
 Each view definition has a body consisting of global predicates, and is associated with a particular source.
 It is assumed that each view can be constructed with an all-free adornment.
EXAMPLE..
 Consider the global predicate Par(c, p), meaning that p is a parent of c.
 One source produces parent facts. Its view is defined by the conjunctive query:
V1(c, p) ← Par(c, p)
 Another source produces some grandparent facts. Its conjunctive query is:
V2(c, g) ← Par(c, p) AND Par(p, g)
EXAMPLE CONTD..
 The query at the mediator asks for great-grandparent facts that can be obtained from the sources. The mediator query is:
Q(w, z) ← Par(w, x) AND Par(x, y) AND Par(y, z)
 One solution is to use the parent predicate (V1) directly, three times:
Q(w, z) ← V1(w, x) AND V1(x, y) AND V1(y, z)
EXAMPLE CONTD..
 Another solution is to use V1 (parent facts) and V2 (grandparent facts):
Q(w, z) ← V1(w, x) AND V2(x, z)
or
Q(w, z) ← V2(w, y) AND V1(y, z)
EXPANDING SOLUTIONS.
 Consider a query Q and a solution S whose body's subgoals are views, where each view V is defined by a conjunctive query with that view as the head.
 The body of V's conjunctive query can be substituted for a subgoal in S that uses the predicate V, so that S's body comes to consist only of global predicates.
EXPANSION ALGORITHM
 A solution S has a subgoal V(a1, a2, …, an), where the ai's can be any variables or constants.
 The view V is of the form:
V(b1, b2, …, bn) ← B
where B represents the entire body.
 V(a1, a2, …, an) can be replaced in solution S by a version of body B that has all the subgoals of B, with variables possibly altered.
EXPANSION ALGORITHM CONTD..
The rules for altering the variables of B are:
1. First identify the local variables of B: the variables that appear in the body but not in the head.
2. If any local variable of B also appears in S, replace it by a distinct new variable that appears nowhere in the rule for V or in S.
3. In the body B, replace each bi by ai, for i = 1, 2, …, n.
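A sketch of the three rules in Python (hypothetical encoding: atoms are (predicate, args) pairs, variables are strings; used_vars is the set of variables already appearing in S):

import itertools

_fresh = itertools.count()

def expand(subgoal_args, view_head, view_body, used_vars):
    """Expand one view subgoal V(a1..an) into the view's body."""
    # Rule 1: local variables appear in the body but not in the head.
    local = {a for _, args in view_body for a in args
             if isinstance(a, str) and a not in view_head}
    # Rule 2: rename local variables that clash with variables of S.
    rename = {v: ("_x%d" % next(_fresh)) if v in used_vars else v
              for v in local}
    # Rule 3: substitute ai for bi.
    subst = dict(zip(view_head, subgoal_args))
    subst.update(rename)
    return [(p, tuple(subst.get(a, a) for a in args))
            for p, args in view_body]

# Expanding V2(x, z), where V2(c, g) <- Par(c, p) AND Par(p, g):
body = [("Par", ("c", "p")), ("Par", ("p", "g"))]
print(expand(("x", "z"), ("c", "g"), body, used_vars={"w", "x", "z"}))
# [('Par', ('x', 'p')), ('Par', ('p', 'z'))]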
EXAMPLE.
 Consider the view definitions:
V1(c, p) ← Par(c, p)
V2(c, g) ← Par(c, p) AND Par(p, g)
 One of the proposed solutions S is:
Q(w, z) ← V1(w, x) AND V2(x, z)
 The first subgoal, with predicate V1, expands to Par(w, x), as V1 has no local variables.
EXAMPLE CONTD.
 The V2 subgoal has a local variable p, which neither appears in S nor has been used as a local variable in another substitution, so p can be left as it is.
 Only x and z need to be substituted for the variables c and g.
 The solution S now becomes:
Q(w, z) ← Par(w, x) AND Par(x, p) AND Par(p, z)
CONTAINMENT OF CONJUNCTIVE QUERIES
 A containment mapping from Q to E is a function τ from the variables of Q to the variables and constants of E, such that:
1. If x is the ith argument of the head of Q, then τ(x) is the ith argument of the head of E.
2. If P(x1, x2, …, xn) is a subgoal of Q, then P(τ(x1), τ(x2), …, τ(xn)) is a subgoal of E.
 We extend τ with the rule τ(c) = c for any constant c.
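A brute-force sketch of the test in Python (hypothetical encoding: a query is (head, body), an atom is (predicate, args), variables are strings, everything else is a constant; it simply tries every mapping of Q's variables):

from itertools import product

def _atoms_vars(atoms):
    return {a for _, args in atoms for a in args if isinstance(a, str)}

def contains(q1, q2):
    """Is there a containment mapping from q1 to q2 (so that the answers
    of q2 are contained in those of q1)?"""
    h1, b1 = q1
    h2, b2 = q2
    v1 = sorted(_atoms_vars([h1] + b1))
    # Candidate targets: variables, then constants, appearing in q2.
    t2 = sorted(_atoms_vars([h2] + b2)) + \
         sorted({a for _, args in [h2] + b2 for a in args
                 if not isinstance(a, str)})
    def apply(m, atom):
        pred, args = atom
        return (pred, tuple(m.get(a, a) for a in args))
    for combo in product(t2, repeat=len(v1)):
        m = dict(zip(v1, combo))
        if apply(m, h1) == h2 and all(apply(m, g) in b2 for g in b1):
            return True
    return False

# The example below: tau(x)=a, tau(y)=b, tau(z)=d is found.
q1 = (("H", ("x", "y")), [("A", ("x", "z")), ("B", ("z", "y"))])
q2 = (("H", ("a", "b")),
      [("A", ("a", "c")), ("B", ("d", "b")), ("A", ("a", "d"))])
print(contains(q1, q2))   # True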
EXAMPLE.
 Consider two conjunctive queries:
Q1: H(x, y) ← A(x, z) AND B(z, y)
Q2: H(a, b) ← A(a, c) AND B(d, b) AND A(a, d)
 When we apply the substitution τ(x) = a, τ(y) = b, τ(z) = d, the head of Q1 becomes H(a, b), which is the head of Q2.
 So there is a containment mapping from Q1 to Q2.
EXAMPLE CONTD..
 The first subgoal of Q1 becomes A(a, d), which is the third subgoal of Q2.
 The second subgoal of Q1 becomes the second subgoal of Q2.
 There is also a containment mapping from Q2 to Q1, so the two conjunctive queries are equivalent.
WHY THE CONTAINMENT-MAPPING TEST
WORKS
 Suppose there is a containment mapping τ from Q1 to Q2.
 When Q2 is applied to the database, we look for substitutions σ for all the variables of Q2.
 The substitution for the head becomes a tuple t that is returned by Q2.
 If we compose τ and then σ, we have a mapping from the variables of Q1 to tuples of the database that produces the same tuple t for the head of Q1.
FINDING SOLUTIONS TO A MEDIATOR
QUERY
 There can be an infinite number of solutions built from the views, using any number of subgoals and variables.
 The LMSS theorem limits the search; it states that:
 if a query Q has n subgoals, then any answer produced by any solution is also produced by a solution that has at most n subgoals.
 If the conjunctive query that defines a view V has in its body a predicate P that doesn't appear in the body of the mediator query, then we need not consider any solution that uses V.
EXAMPLE.
 Recall the query
Q1: Q(w, z) ← Par(w, x) AND Par(x, y) AND Par(y, z)
 This query has three subgoals, so we don't have to look at solutions with more than three subgoals.
WHY THE LMSS THEOREM HOLDS
 Suppose we have a query Q with n subgoals, and there is a solution S with more than n subgoals.
 The expansion E of S must be contained in the query Q, which means that there is a containment mapping from Q to E.
 Remove from S all subgoals whose expansions were not the target of one of Q's subgoals under the containment mapping.
CONTD..
 We would have a new conjunctive query S' with at most n subgoals.
 If E' is the expansion of S', then E' is contained in Q.
 Also, S is contained in S', since there is an identity mapping.
 Thus S need not be among the solutions to query Q.
SECTION 21.7
INTRODUCTION
 ENTITY RESOLUTION: Entity resolution is a problem that arises in many information-integration scenarios.
 It refers to determining whether two records or tuples do or do not represent the same person, organization, place, or other entity.
DECIDING WHETHER RECORDS REPRESENT A
COMMON ENTITY
 Two records represent the same individual if the two records have similar values for each of the fields associated with those records.
 It is not sufficient to require that the values of corresponding fields be identical, because of the following reasons:
1. Misspellings
2. Variant names
3. Misunderstanding of names
CONTINUE: DECIDING WHETHER
RECORDS REPRESENT A COMMON
ENTITY
4. Evolution of values
5. Abbreviations
Thus, when deciding whether two records represent the same entity, we need to look carefully at the kinds of discrepancies that occur and use a test that measures the similarity of records.
DECIDING WHETHER RECORDS
REPRESENT A COMMON ENTITY - EDIT
DISTANCE
 The first approach to measuring the similarity of records is edit distance.
 Values that are strings can be compared by counting the number of insertions and deletions of characters it takes to turn one string into the other.
 The records represent the same entity if their similarity measure is below a given threshold.
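A sketch of this insert/delete-only edit distance in Python; with insertions and deletions only, the distance equals len(s) + len(t) − 2·LCS(s, t), where LCS is the longest common subsequence:

def edit_distance(s, t):
    """Insert/delete edit distance between strings s and t."""
    m, n = len(s), len(t)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]   # LCS length table
    for i in range(m):
        for j in range(n):
            lcs[i + 1][j + 1] = (lcs[i][j] + 1 if s[i] == t[j]
                                 else max(lcs[i][j + 1], lcs[i + 1][j]))
    return m + n - 2 * lcs[m][n]

# Two name fields may represent the same entity if the distance is
# at most a chosen threshold:
print(edit_distance("Smythe", "Smith"))   # 3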
DECIDING WHETHER RECORDS
REPRESENT A COMMON ENTITY -
NORMALIZATION
 We can normalize records by replacing certain substrings by others. For instance, we can use a table of abbreviations and replace each abbreviation by what it normally stands for.
 Once the records are normalized, we can use edit distance to measure the difference between the normalized values in the fields.
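A minimal sketch of such normalization in Python (the abbreviation table is an assumed example):

ABBREVIATIONS = {"St.": "Street", "Ave.": "Avenue"}   # assumed table

def normalize(field):
    """Replace abbreviations by what they normally stand for, so edit
    distance is measured between normalized values."""
    return " ".join(ABBREVIATIONS.get(w, w) for w in field.split())

print(normalize("123 Oak St."))   # 123 Oak Street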
MERGING SIMILAR RECORDS
 Merging refers to the removal of redundant data in two records.
 There are many possible merge rules, for example:
1. Set any field on which the records disagree to the empty string.
2. (i) Merge by taking the union of the values in each field, and
(ii) declare two records similar if at least two of the three fields have a nonempty intersection.
CONTINUE: MERGING SIMILAR RECORDS

Name       Address         Phone
1. Susan   123 Oak St.     818-555-1234
2. Susan   456 Maple St.   818-555-1234
3. Susan   456 Maple St.   213-555-5678

After merging:

Name            Address                        Phone
(1-2-3) Susan   {123 Oak St., 456 Maple St.}   {818-555-1234, 213-555-5678}
USEFUL PROPERTIES OF SIMILARITY
AND MERGE FUNCTIONS
The following properties say that the merge operation is a semilattice:
1. Idempotence: the merge of a record with itself yields the same record.
2. Commutativity: the order in which two records are merged does not matter.
3. Associativity: the order in which we group records for a merger should not matter.
CONTINUE: USEFUL PROPERTIES OF
SIMILARITY AND MERGE FUNCTIONS
There are some other properties that we expect the similarity relationship to have:
 Idempotence for similarity: a record is always similar to itself.
 Commutativity of similarity: in deciding whether two records are similar, it does not matter in which order we list them.
 Representability: if r is similar to some other record s, but s is instead merged with some other record t, then r remains similar to the merger of s and t and can be merged with that record.
R-SWOOSH ALGORITHM FOR ICAR RECORDS
 Input: a set of records I, a similarity function, and a merge function.
 Output: a set of merged records O.
 Method:
O := emptyset;
WHILE I is not empty DO BEGIN
    let r be any record in I;
    find, if possible, some record s in O that is similar to r;
    IF no such record s exists THEN
        move r from I to O
    ELSE BEGIN
        delete r from I;
        delete s from O;
        add the merger of r and s to I;
    END;
END;
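A direct Python rendering of the pseudocode above (the similarity and merge functions in the usage example are assumptions chosen to satisfy the ICAR properties):

def r_swoosh(I, similar, merge):
    """R-Swoosh for ICAR records: similar(r, s) is the similarity
    predicate, merge(r, s) the merge function."""
    I = list(I)
    O = []
    while I:
        r = I.pop()                           # let r be any record in I
        s = next((x for x in O if similar(r, x)), None)
        if s is None:
            O.append(r)                       # r matches nothing in O
        else:
            O.remove(s)
            I.append(merge(r, s))             # merged record re-enters I
    return O

# Example with set-valued fields: two records are similar if any field
# intersects, and merging takes the union of each field.
recs = [{"name": {"Susan"}, "phone": {"818-555-1234"}},
        {"name": {"Susan"}, "phone": {"213-555-5678"}}]
sim = lambda r, s: any(r[f] & s[f] for f in r)
mrg = lambda r, s: {f: r[f] | s[f] for f in r}
print(r_swoosh(recs, sim, mrg))   # one merged Susan record with both phones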
OTHER APPROACHES TO ENTITY RESOLUTION
The other approaches to entity resolution are:
 Non-ICAR datasets
 Clustering
 Partitioning
OTHER APPROACHES TO ENTITY
RESOLUTION - NON-ICAR DATASETS
 Non-ICAR datasets: we can define a dominance relation r ≤ s, meaning that record s contains all the information contained in record r.
 If so, then we can eliminate record r from further consideration.
OTHER APPROACHES TO ENTITY
RESOLUTION – CLUSTERING &
PARTITIONING
 Clustering: creating clusters whose members are similar to each other.
 Partitioning: grouping the records, perhaps several times, into groups that are likely to contain similar records, and looking only within each group for pairs of similar records.
REFERENCES
 H. Garcia-Molina, J. D. Ullman, and J. Widom, "Database Systems: The Complete Book," 2nd ed., pp. 897-913, Prentice Hall, New Jersey, 2008.