
CSL 771: Database Implementation
Transaction Processing
Maya Ramanath
All material (including figures) from:
Concurrency Control and Recovery in Database Systems
Phil Bernstein, Vassos Hadzilacos and Nathan Goodman
(http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)
Transactions
• Interaction with the DBMS through SQL
update Airlines set price = price - price*0.1,
status = 'cheap' where price < 5000
A transaction is a unit of interaction
ACID Properties
• Atomicity
• Consistency
• Isolation
• Durability
Database system must ensure ACID properties
Atomicity and Consistency
• Single transaction
– Execution of a transaction: “all-or-nothing”
Either a transaction completes in its entirety
Or it “does not even start”
– As if the transaction never existed
– No partial effects may be visible
2 outcomes: A transaction COMMITs or ABORTs
Consistency and Isolation
• Multiple transactions
– Concurrent execution can cause an inconsistent
database state
– Each transaction executed as if isolated from the
others
Durability
• If a transaction commits, its effects are permanent
• But, durability has a bigger scope
– Catastrophic failures (floods, fires, earthquakes)
What we will study…
• Concurrency Control
– Ensuring atomicity, consistency and isolation when
multiple transactions are executed concurrently
• Recovery
– Ensuring durability and consistency in case of
software/hardware failures
Terminology
• Data item
– A tuple, table, block
• Read (x)
• Write (x, 5)
• Start (T)
• Commit (T)
• Abort (T)
Active Transaction
– A transaction which has neither committed nor aborted
High level model
[Figure: Transactions 1..n → Transaction Manager → Scheduler → Recovery Manager → Cache Manager → Disk]
Recoverability (1/2)
• Transaction T Aborts
– T wrote some data items
– T’ read items that T wrote
T:  Read(x)  Write(x, k)  Read(y)                           Abort
T':                                Read(x)  Write(y, k')  Commit

• DBMS has to…
  – Undo the effects of T
  – Undo effects of T'
  – But, T' has already committed!
Recoverability (2/2)
• Let T1,…,Tn be a set of transactions
• Suppose Ti reads a value written by Tk, k < i
• An execution of the transactions is recoverable if Ti commits only after all such Tk commit

Not recoverable (T2 commits before T1):
  T1: Write(x,2)
  T2:             Read(x)  Write(y,2)  Commit

Recoverable (T1 commits before T2):
  T1: Write(x,2)                               Commit
  T2:             Read(x)  Write(y,2)                  Commit
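A small sketch of this condition in Python (not from the slides; the history encoding as (txn, action, item) tuples and the function name are my own):

def is_recoverable(history):
    # history: list of (txn, action, item); action is "r", "w", or "c" (commit).
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = {}    # txn -> set of transactions it read from
    commit_pos = {}    # txn -> position of its commit in the history
    for pos, (txn, action, item) in enumerate(history):
        if action == "w":
            last_writer[item] = txn
        elif action == "r" and last_writer.get(item) not in (None, txn):
            reads_from.setdefault(txn, set()).add(last_writer[item])
        elif action == "c":
            commit_pos[txn] = pos
    # A committed Ti must commit only after every Tk it read from has committed.
    return all(w in commit_pos and commit_pos[w] < commit_pos[t]
               for t, writers in reads_from.items() if t in commit_pos
               for w in writers)

# The two schedules above:
h1 = [("T1","w","x"), ("T2","r","x"), ("T2","w","y"), ("T2","c",None)]
h2 = [("T1","w","x"), ("T2","r","x"), ("T2","w","y"), ("T1","c",None), ("T2","c",None)]
print(is_recoverable(h1), is_recoverable(h2))   # False True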
Cascading Aborts (1/2)
• Because T was aborted, T1,…, Tk also have to
be aborted
T:   Read(x)  Write(x, k)  Read(y)                         Abort
T':                      Read(x)           Write(y, k')
T'':                                                              Read(y)

When T aborts, T' must also abort (it read x written by T), and then T'' must abort (it read y written by T')
Cascading Aborts (2/2)
• Recoverable executions do not prevent
cascading aborts
• How can we prevent them, then?
Allows cascading aborts (T2 reads x before T1 commits):
  T1: Write(x,2)                            Commit
  T2:             Read(x)  Write(y,2)               Commit

Prevents cascading aborts (T2 reads x only after T1 commits):
  T1: Write(x,2)  Commit
  T2:                       Read(x)  Write(y,2)  Commit
What we learnt so far…
The key events: when a value is read, and when the transaction that wrote it commits.

Not recoverable (T2 commits before T1):
  T1: Write(x,2)
  T2:             Read(x)  Write(y,2)  Commit

Recoverable, with cascading aborts (T2 reads x before T1 commits):
  T1: Write(x,2)                            Commit
  T2:             Read(x)  Write(y,2)               Commit

Recoverable, without cascading aborts (T2 reads x only after T1 commits):
  T1: Write(x,2)  Commit
  T2:                       Read(x)  Write(y,2)  Commit
Strict Schedule (1/2)
• “Undo”-ing the effects of a transaction
– Restore the before image of the data item
  T1: Write(x,1)  Write(y,3)               Commit
  T2:                          Write(y,1)          Read(x)  Abort

Equivalent to:

  T1: Write(x,1)  Write(y,3)  Commit

Final value of y: 3 (undoing T2 restores the before-image of y, which is 3)
Strict Schedule (2/2)
Initial value of x: 1
  T1: Write(x,2)              Abort
  T2:             Write(x,3)         Abort

Should x be restored to 1 or 3?

  T1: Write(x,2)                     Abort
  T2:             Write(x,3)  Abort

T1 restores x to 3? T2 restores x to 2?
Do not read or write a value which has been written by
an active transaction until that transaction has committed
or aborted
The Lost Update Problem
  T1: Read(x)                                       Write(x, 200)  Commit
  T2:           Read(x)  Write(x, 200,000)  Commit
Assume x is your account balance
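A tiny Python sketch of this interleaving (the amounts and variable names are illustrative): both transactions read the old balance, so the later write silently destroys the earlier update.

balance = 100          # x: your account balance
t1_seen = balance      # T1: Read(x)
t2_seen = balance      # T2: Read(x)   -- both saw the same old value
balance = 200000       # T2: Write(x, 200,000); Commit
balance = 200          # T1: Write(x, 200); Commit -- T2's update is lost
print(balance)         # 200: the 200,000 written by T2 has vanished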
Serializable Schedules
• Serial schedule
– Simply execute transactions one after the other
• A serializable schedule is one which is equivalent
to some serial schedule
SERIALIZABILITY THEORY
Serializable Schedules
T1: op11, op12, op13
T2: op21, op22, op23, op24
• Serial schedule
– Simply execute transactions one after the other
  T1 then T2: op11, op12, op13, op21, op22, op23, op24
  T2 then T1: op21, op22, op23, op24, op11, op12, op13
• Serializable schedule
– Interleave operations
– Ensure end result is equivalent to some serial
schedule
Notation
r1[x] = Transaction 1, Read (x)
w1[x] = Transaction 1, Write (x)
c1 = Transaction 1, Commit
a1= Transaction 1, Abort
r1[x], r1[y], w2[x], r2[y], c1, c2
Histories (1/3)
• Operations of transaction T can be represented
by a partial order.
[Figure: the operations of T1 — r1[x], r1[y], w1[z], c1 — arranged as a partial order]
Histories (2/3)
• Conflicting operations
– Of two ops operating on the same data item, if
one of them is a write, then the ops conflict
– An order has to be specified for conflicting
operations
Histories (3/3)
• Complete History
T1 = r1[x] → w1[x] → c1
T2 = r2[x] → w2[y] → w2[x] → c2
T3 = r3[y] → w3[x] → w3[y] → w3[z] → c3

A complete history over T = {T1, T2, T3}:

H1:
  r2[x] → w2[y] → w2[x] → c2
  r3[y] → w3[x] → w3[y] → w3[z] → c3
  r1[x] → w1[x] → c1

(plus the cross-transaction edges, drawn as vertical arrows in the original figure, that order the conflicting operations of different transactions)
Serializable Histories
• The goal: Ensure that interleaving the
operations still yields a serializable history.
• The method
– When are two histories equivalent?
– When is a history serial?
Equivalence of Histories (1/2)
H ≅ H’ if
1. they are defined over the same set of transactions
and they have the same operations
2. they order conflicting operations the same way
Equivalence of Histories (2/2)
Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
Serial History
• A complete history is serial if for every pair of
transactions Ti and Tk,
– all operations of Ti occur before all operations of Tk OR
– all operations of Tk occur before all operations of Ti
• A history is serializable if its committed
projection is equivalent to a serial history.
Serialization Graph
Let H be a history
SG(H) = (V, E)
where V = {T1, ..., Tn}
and
E = {(Ti, Tj) | some operation of Ti precedes and conflicts with some operation of Tj in H}

For H1 from before:
  r2[x] → w2[y] → w2[x] → c2
  r3[y] → w3[x] → w3[y] → w3[z] → c3
  r1[x] → w1[x] → c1

[Figure: SG(H1) with vertices T1, T2, T3 and an edge Ti → Tj whenever an operation of Ti precedes and conflicts with an operation of Tj]
Serializability Theorem
A history H is serializable if and only if its
serialization graph SG(H) is acyclic
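A Python sketch of the construction (the history encoding as (txn, action, item) tuples is my own; two operations conflict when they act on the same item and at least one is a write):

def serialization_graph(history):
    # Edge (Ti, Tj): some operation of Ti precedes and conflicts with one of Tj.
    edges = set()
    for i, (ti, pi, xi) in enumerate(history):
        for tj, pj, xj in history[i + 1:]:
            if ti != tj and xi == xj and "w" in (pi, pj):
                edges.add((ti, tj))
    return edges

def is_serializable(history):
    # Serializability Theorem: serializable iff SG(H) is acyclic.
    graph = {}
    for u, v in serialization_graph(history):
        graph.setdefault(u, set()).add(v)
    visited, on_stack = set(), set()
    def acyclic_from(u):
        visited.add(u); on_stack.add(u)
        for v in graph.get(u, ()):
            if v in on_stack or (v not in visited and not acyclic_from(v)):
                return False
        on_stack.discard(u)
        return True
    return all(acyclic_from(u) for u in list(graph) if u not in visited)

# r1[x] w2[x] w2[y] w1[y]: edges T1 -> T2 (on x) and T2 -> T1 (on y), a cycle.
h = [("T1","r","x"), ("T2","w","x"), ("T2","w","y"), ("T1","w","y")]
print(is_serializable(h))   # False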
On your own
How do recoverability, strict schedules,
cascading aborts fit into the big picture?
LOCKING
High level model
[Figure: Transactions 1..n → Transaction Manager → Scheduler → Recovery Manager → Cache Manager → Disk]
Transaction Management
[Figure: Transactions 1, 2, 3, …, n → Transaction Manager → Scheduler → Disk; example operations: Read1(x), Write2(y,k), Read2(x), Commit1]

• Transaction Manager
  – Receives transactions
  – Sends their operations to the scheduler
• Scheduler, for each operation, may
  – Execute the op
  – Reject the op
  – Delay the op
Locking
• Each data item x has a lock associated with it
• If T wants to access x
– Scheduler first acquires a lock on x
– Only one transaction can hold a lock on x
• T releases the lock after processing
Locking is used by the scheduler to ensure
serializability
Notation
• Read lock and write lock
rl[x], wl[x]
• Obtaining read and write locks
rli[x], wli[x]
• Lock table
– Entries of the form [x, r, Ti]
• Conflicting locks
– pli[x] and qlk[y] conflict if x = y and the operations p and q conflict
• Unlock
rui[x], wui[x]
Basic 2-Phase Locking (2PL)
RULE 1: When pi[x] is received, check whether a conflicting lock qlk[x] is already set.
  – YES: pi[x] is delayed
  – NO: acquire pli[x] and pi[x] is scheduled
RULE 2: pli[x] cannot be released until pi[x] is completed.
RULE 3 (2-Phase Rule): Once a lock is released, no other locks may be obtained.
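A minimal sketch of these three rules in Python (the class and method names are mine, not from the book; a real scheduler would queue delayed operations rather than just report them). The last call reproduces the violation shown in the history H on the next slide.

CONFLICTS = {("r", "w"), ("w", "r"), ("w", "w")}   # two read locks do not conflict

class BasicTwoPL:
    def __init__(self):
        self.locks = {}        # data item -> list of (txn, lock mode)
        self.shrinking = set() # txns that have already released some lock

    def lock(self, txn, mode, item):
        # RULE 1: delay if a conflicting lock is held by another transaction.
        for holder, held in self.locks.get(item, []):
            if holder != txn and (held, mode) in CONFLICTS:
                return "delayed"
        # RULE 3 (2-phase rule): no new locks once the txn has released one.
        if txn in self.shrinking:
            return "rejected (2-phase rule)"
        self.locks.setdefault(item, []).append((txn, mode))
        return "scheduled"

    def unlock(self, txn, item):
        # RULE 2 is the caller's responsibility: release only after the op completes.
        self.shrinking.add(txn)
        self.locks[item] = [(t, m) for t, m in self.locks[item] if t != txn]

s = BasicTwoPL()
print(s.lock("T1", "r", "x"))   # scheduled  (rl1[x])
print(s.lock("T2", "w", "x"))   # delayed    (wl2[x] conflicts with rl1[x])
s.unlock("T1", "x")
print(s.lock("T1", "w", "y"))   # rejected (2-phase rule)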
The 2-phase rule
Once a lock is released no other locks may be
obtained.
T1: r1[x] w1[y] c1
T2: w2[x] w2[y] c2

H = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y]
wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1

H violates the 2-phase rule: T1 obtains wl1[y] after releasing rl1[x]. The result is not serializable (T1 reads x before T2 writes it, but T2 writes y before T1 does).
Correctness of 2PL
2PL always produces serializable histories
Proof outline
STEP 1: Characterize properties of the
scheduler
STEP 2: Prove that any history with these
properties is serializable
(That is, SG(H) is acyclic)
Deadlocks (1/2)
T1: r1[x] w1[y] c1
T2: w2[y] w2[x] c2
Scheduler: rl1[x] wl2[y] r1[x] w2[y] <cannot proceed>
(T1 now needs a lock on y, held by T2; T2 needs a lock on x, held by T1)
Deadlocks (2/2)
Strategies to deal with deadlocks
• Timeouts
– Leads to inefficiency
• Detecting deadlocks
– Maintain a wait-for graph, cycle indicates
deadlock
– Once a deadlock is detected, break the cycle by
aborting a transaction
• New problem: Starvation
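A sketch of deadlock detection with a wait-for graph in Python (the graph encoding and names are mine): an edge Ti → Tk means Ti is waiting for a lock held by Tk, and a cycle means deadlock.

def find_deadlock(waits_for):
    # waits_for: dict mapping each txn to the set of txns it waits for.
    def dfs(txn, path, on_path):
        for other in waits_for.get(txn, ()):
            if other in on_path:
                return path[path.index(other):] + [other]   # found a cycle
            cycle = dfs(other, path + [other], on_path | {other})
            if cycle:
                return cycle
        return None
    for txn in waits_for:
        cycle = dfs(txn, [txn], {txn})
        if cycle:
            return cycle
    return None

# The deadlock from the previous slide: T1 waits for T2 (y), T2 waits for T1 (x).
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # ['T1', 'T2', 'T1']
# Breaking the cycle means aborting one transaction on it (the victim), which is
# where starvation can creep in if the same victim is chosen repeatedly.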
Conservative 2PL
• Avoids deadlocks altogether
– T declares its readset and writeset
– Scheduler tries to acquire all required locks
– If not all locks can be acquired, T waits in a queue
• T never “starts” until all locks are acquired
– Therefore, it can never be involved in a deadlock
On your own
Strict 2PL (2PL which ensures only strict schedules)
Extra Information
• Assumption: Data items are organized in a tree
Can we come up with a better (more efficient)
protocol?
Tree Locking Protocol (1/3)
RULE 1: When ai[x] (Ti's access to x) is received, check whether some alk[x] is already set.
  – YES: ai[x] is delayed
  – NO: acquire ali[x] and ai[x] is scheduled
RULE 2: If x is an intermediate (non-root) node and y is the parent of x, then ali[x] can be acquired only if ali[y] is currently held.
RULE 3: ali[x] cannot be released until ai[x] is completed.
RULE 4: Once a lock is released, the same lock may not be re-obtained.
Tree Locking Protocol (2/3)
• Proposition: If Ti locks x before Tk, then for
every v which is a descendant of x, if both Ti
and Tk lock v, then Ti locks v before Tk.
• Theorem: Tree Locking Protocol always
produces Serializable Schedules
Tree Locking Protocol (3/3)
• Tree Locking Protocol avoids deadlock
• Releases locks earlier than 2PL
BUT
• Needs to know the access pattern to be
effective
• Transactions should access nodes from root-to-leaf
Multi-granularity Locking (1/3)
• Granularity
– Refers to the relative size of the data item
– Attribute, tuple, table, page, file, etc.
• Efficiency depends on granularity of locking
• Allow transactions to lock at different
granularities
Multi-granularity Locking (2/3)
• Lock Instance Graph
• Explicit and Implicit Locks
• Intention read and intention
write locks
• An intention lock conflicts with an explicit read
or write lock when the corresponding operations
conflict, but intention locks do not conflict with
other intention locks
Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
Multi-granularity Locking (3/3)
• To set rli[x] or irli[x], first hold irli[y] or iwli[y],
such that y is the parent of x.
• To set wli[x] or iwli[x], first hold iwli[y], such that
y is the parent of x.
• To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or
wli[y]) where y = x, or y is an ancestor of x.
• To release irli[x] (or iwli[x]) no child of x can be
locked by Ti
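A sketch of these rules as a compatibility and parent-lock check in Python (the matrix is the standard one for the four modes r, w, ir, iw; function names are mine):

COMPATIBLE = {          # which modes may be held together on the same item
    "ir": {"ir", "iw", "r"},
    "iw": {"ir", "iw"},
    "r":  {"ir", "r"},
    "w":  set(),
}

PARENT_NEEDS = {        # lock required on the parent before locking a child
    "r": {"ir", "iw"}, "ir": {"ir", "iw"},
    "w": {"iw"},        "iw": {"iw"},
}

def can_grant(mode, held_by_others):
    # Grant only if the requested mode is compatible with every lock already held.
    return all(other in COMPATIBLE[mode] for other in held_by_others)

def parent_ok(mode, modes_held_on_parent):
    # The first two rules above: the requester must hold a suitable parent lock.
    return bool(PARENT_NEEDS[mode] & set(modes_held_on_parent))

# To write one tuple: take iw on the table (compatible with other intention
# locks), then w on the tuple itself.
print(can_grant("iw", {"ir", "iw"}), parent_ok("w", {"iw"}))   # True True
print(can_grant("w", {"ir"}))                                  # False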
The Phantom Problem
• How to lock a tuple, which (currently) does not
exist?
T1: r1[x1], r1[x2], r1[X], c1
T2: w2[x3], w2[X], c2

rl1[x1], r1[x1], rl1[x2], r1[x2], wl2[x3], wl2[X],
w2[x3], wu2[x3, X], c2, rl1[X], ru1[x1, x2, X], c1
NON-LOCK-BASED SCHEDULERS
Timestamp Ordering (1/3)
• Each transaction is associated with a timestamp
– Ti indicates Transaction T with timestamp i.
• Each operation in the transaction has the same
timestamp
Timestamp Ordering (2/3)
TO Rule
If pi[x] and qk[x] are conflicting operations, then
pi[x] is processed before qk[x] iff i < k
Theorem: If H is a history representing an
execution produced by a TO scheduler, then H is
serializable.
Timestamp Ordering (3/3)
• For each data item x, maintain: max-rt(x), max-wt(x), and the commit bit c(x) (true if the transaction that last wrote x has committed)
• Request ri[x]
– Grant request if TS (i) >= max-wt (x) and c(x), update max-rt (x)
– Delay if TS(i) > max-wt(x) and !c(x)
– Else abort and restart Ti
• Request wi[x]
– Grant request if TS (i) >= max-wt (x) and TS (i) >= max-rt (x),
update max-wt (x), set c(x) = false
– Else abort and restart Ti
ON YOUR OWN: Thomas write rule, actions taken when a transaction
has to commit or abort
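A sketch of these rules in Python, leaving out the commit bit c(x) and the Thomas write rule (both left "on your own"); class and method names are illustrative.

class TimestampOrdering:
    def __init__(self):
        self.max_rt = {}   # data item -> largest timestamp that has read it
        self.max_wt = {}   # data item -> largest timestamp that has written it

    def read(self, ts, item):
        # Grant ri[x] only if no younger transaction has already written x.
        if ts >= self.max_wt.get(item, 0):
            self.max_rt[item] = max(self.max_rt.get(item, 0), ts)
            return "grant"
        return "abort and restart with a new timestamp"

    def write(self, ts, item):
        # Grant wi[x] only if no younger transaction has read or written x.
        if ts >= self.max_wt.get(item, 0) and ts >= self.max_rt.get(item, 0):
            self.max_wt[item] = ts
            return "grant"
        return "abort and restart with a new timestamp"

to = TimestampOrdering()
print(to.read(2, "x"))    # grant: max-rt(x) becomes 2
print(to.write(1, "x"))   # abort: T1 tries to write x after the younger T2 read it
print(to.write(3, "x"))   # grant: max-wt(x) becomes 3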
Validation
• Aggressively schedule all operations
• Do not commit until the transaction is
“validated”
ON YOUR OWN
Summary
• Lock-based Schedulers
– 2-Phase Locking
– Tree Locking Protocol
– Multi-granularity Locking
– Locking in the presence of updates
• Non-lock-based Schedulers
– Timestamp Ordering
– Validation-based Concurrency Control (on your
own)
SOURCE: Database System: The complete book. Garcia-Molina, Ullman
and Widom
RECOVERY
Logging
• Log the operations in the transaction(s)
• Believe the log
– Does the log say transaction T has committed?
– Or does it say aborted?
– Or has only a partial trace (implicit abort)?
• In case of failures, reconstruct the DB from its
log
The basic setup
[Figure: Transactions T1, T2, T3, …, Tk each have buffer space of their own; a shared buffer holds the data pages and the LOG; pages and log records move between the buffer and The Disk]
Terminology
• Data item: an element which can be read or
written
– tuple, relation, B+-tree index, etc
Input x: fetch x from the disk to buffer
Read x,t: read x into local variable t
Write x,t: write value of t into x
Output x: write x to disk
Example
update Airlines set price = price - price*0.1, status =
'cheap' where price < 5000

  Read P, x        (P: the price; S: the status)
  x -= x*0.1
  Write P, x
  Read S, y
  y = "CHEAP"
  Write S, y
  Output P
  Output S

The system may fail at any point in this sequence: for example, before Output P, between Output P and Output S, or after Output S.
Logs
• Sequence of log records
• Need to keep track of
– Start of transaction
– Update operations (Write operations)
– End of transaction (COMMIT or ABORT)
• “Believe” the log, use the log to reconstruct a
consistent DB state
Types of logs
• Undo logs
– Ensure that uncommitted transactions are rolled
back (or undone)
• Redo logs
– Ensure that committed transactions are redone
• Undo/Redo logs
– Both of the above
All 3 logging styles ensure atomicity and
durability
Undo Logging (1/3)
• <START T>: Start of transaction T
• <COMMIT T>
• <ABORT T>
• <T, A, x>: Transaction T modified A whose
before-image is x.
Undo Logging (2/3)
  Operations           Log
  ------------------   --------------
                       <START T>
  Read P, x
  x -= x*0.1
  Write P, x           <T, P, x>   (before-image of P)
  Read S, y
  y = "CHEAP"
  Write S, y           <T, S, y>   (before-image of S)
  FLUSH LOG
  Output P
  Output S
                       <COMMIT T>
  FLUSH LOG

U1: <T, X, v> should be flushed before Output X
U2: <COMMIT T> should be flushed after all OUTPUTs
Undo Logging (3/3)
• Recovery with an Undo log
  1. If T has a <COMMIT T> entry, do nothing
  2. If T has a <START T> entry, but no <COMMIT T>
     – T is incomplete and needs to be undone
     – Restore old values from its <T, X, v> records
• There may be multiple transactions
  – Start scanning from the end of the log
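A sketch of this backward pass in Python (the log-record encoding is mine: ("START", T), ("COMMIT", T), ("ABORT", T), and ("UPDATE", T, item, before_image)):

def undo_recovery(log, db):
    finished = set()                    # transactions with a COMMIT or ABORT record
    for rec in reversed(log):           # scan from the end of the log
        if rec[0] in ("COMMIT", "ABORT"):
            finished.add(rec[1])
        elif rec[0] == "UPDATE" and rec[1] not in finished:
            _, txn, item, before = rec
            db[item] = before           # restore the before-image
    return db

# Crash after Output P and Output S, but before <COMMIT T> reached the log:
db  = {"P": 4500, "S": "CHEAP"}
log = [("START", "T"), ("UPDATE", "T", "P", 5000), ("UPDATE", "T", "S", "normal")]
print(undo_recovery(log, db))   # {'P': 5000, 'S': 'normal'}: T is undone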
Redo Logging (1/3)
• All incomplete transactions can be ignored
• Redo all completed transactions
• <T, A, x>: Transaction T modified A whose
after-image is x.
Redo Logging (2/3)
  Operations           Log
  ------------------   --------------
                       <START T>
  Read P, x
  x -= x*0.1
  Write P, x           <T, P, x>   (after-image of P)
  Read S, y
  y = "CHEAP"
  Write S, y           <T, S, y>   (after-image of S)
                       <COMMIT T>
  FLUSH LOG
  Output P
  Output S

R1: <T, X, v> and <COMMIT T> should be flushed before
Output X ("write-ahead logging")
Redo Logging (3/3)
• Recovery with Redo Logging
– If T has a <COMMIT T> entry, redo T
– If T is incomplete, do nothing (add <ABORT T>)
• For multiple transactions
– Scan from the beginning of the log
Undo/Redo Logging (1/3)
• Undo logging: Cannot COMMIT T unless all
updates are written to disk
• Redo logging: Cannot write modified data to disk
(i.e., release buffer memory) until the transaction commits
• Undo/Redo logs attempt to strike a balance
Undo/Redo Logging (2/3)
  Operations           Log
  ------------------   --------------
                       <START T>
  Read P, x
  x -= x*0.1
  Write P, x           <T, P, x, a>   (before- and after-image of P)
  Read S, y
  y = "CHEAP"
  Write S, y           <T, S, y, b>   (before- and after-image of S)
  FLUSH LOG
  Output P
  Output S
                       <COMMIT T>   (may be flushed before or after the OUTPUTs)

UR1: <T, X, a, b> should be flushed before Output X

Compare with the earlier rules:
U1: <T, X, v> should be flushed before Output X
U2: <COMMIT T> should be flushed after all OUTPUTs
R1: <T, X, v> and <COMMIT T> should be flushed before Output X
Undo/Redo Logging (3/3)
• Recovery with Undo/Redo Logging
– Redo all committed transactions (earliest-first)
– Undo all uncommitted transactions (latest-first)
What happens if there is a crash when you are
writing a log? What happens if there is a crash
during recovery?
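A sketch of this recovery pass in Python (the record encoding is mine: ("START", T), ("COMMIT", T), and ("UPDATE", T, item, before, after)):

def undo_redo_recovery(log, db):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:                       # redo committed transactions, earliest first
        if rec[0] == "UPDATE" and rec[1] in committed:
            db[rec[2]] = rec[4]           # re-apply the after-image
    for rec in reversed(log):             # undo uncommitted transactions, latest first
        if rec[0] == "UPDATE" and rec[1] not in committed:
            db[rec[2]] = rec[3]           # restore the before-image
    return db

# T committed; U was still active at the time of the crash.
db  = {"P": 9999, "S": "???"}             # whatever happened to reach the disk
log = [("START", "T"), ("UPDATE", "T", "P", 5000, 4500),
       ("UPDATE", "T", "S", "normal", "CHEAP"), ("COMMIT", "T"),
       ("START", "U"), ("UPDATE", "U", "P", 4500, 0)]
print(undo_redo_recovery(log, db))        # {'P': 4500, 'S': 'CHEAP'}: T redone, U undone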
Checkpointing
• Logs can be huge… can we throw away
portions of them?
• Can we avoid processing all of it when there is
a crash?
ON YOUR OWN