Concurrency control - OOP January Trimester 2011


Concurrency control contd…
Database recovery
17.02.2011
• RECAP
– Concurrency control concepts
• Motivation
• If we insist only one transaction can execute at a time, in
serial order, then performance will be quite poor.
• Concurrency Control is a method for controlling or
scheduling the operations of transactions in such a way
that concurrent transactions can be executed safely (i.e.,
without causing the database to reach an inconsistent state)
• If we do concurrency control properly, then we can maximize
transaction throughput while avoiding any chance of
corrupting the database
RECAP
• There are several categories of problems
encountered when concurrent access is
required:
– Lost Update Problem
– Dirty Read problem
– Incorrect Summary Problem
– Non-Repeatable Read problem
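The Lost Update Problem from the list above can be made concrete with a small sketch (illustrative Python, not from the slides; the interleaving of two transactions T1 and T2 is simulated deterministically rather than with real threads):

```python
# Deterministic simulation of the Lost Update Problem:
# T1 and T2 both read x before either writes, so T2's write
# silently overwrites T1's update.
x = 100

t1_read = x          # T1 reads 100
t2_read = x          # T2 reads 100 (before T1 has written)
x = t1_read + 50     # T1 writes 150
x = t2_read + 30     # T2 writes 130, clobbering T1's deposit

print(x)  # 130, not the expected 180: T1's update was lost
```

With proper concurrency control, T2 would be forced to read x only after T1's write, giving 180.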
RECAP
Implementing Concurrency Control
• Locking
– Locks seek to guarantee that concurrent
transactions can be serialized.
– Locks may be applied to data items in two ways:
• Implicit Locks are applied by the DBMS
• Explicit Locks are applied by application programs
– What is locking?
RECAP
• Lock granularity
– Locks may be applied to:
1. a single data item (value)
2. an entire row of a table
3. a page (memory segment) (many rows worth)
4. an entire table
5. an entire database
• Escalation of lock granularity
What do you remember from our discussions
on that?
RECAP
• Locks types:
– An Exclusive Lock
– A Shared Lock
depending on the requirements of the
transaction
• Downside of locks?
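A minimal sketch of the compatibility rule behind the two lock types (illustrative Python; the S/X mode names and the `can_grant` helper are assumptions, not from the slides): shared locks coexist with each other, while an exclusive lock is incompatible with everything.

```python
# Compatibility check for Shared (S) and Exclusive (X) locks:
# an S request is granted unless an X lock is held; an X request
# is granted only if no lock of any kind is held.
def can_grant(requested, held):
    """held is the set of lock modes currently held on the item."""
    if requested == "S":
        return "X" not in held
    if requested == "X":
        return len(held) == 0
    raise ValueError("unknown lock mode")

print(can_grant("S", {"S"}))   # True: readers coexist
print(can_grant("X", {"S"}))   # False: writer must wait for readers
```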
RECAP
• The basis of concurrency control is protocols
to maintain serialization in DBMSs
• E.g. protocols
– Two-Phase Locking (2PL)
– Timestamps Ordering
– Optimistic Concurrency Control (OCC)
– Multiversion Concurrency
– Multiversion based on Timestamp
– Multiversion 2PL with certify locks
• Two-Phase Locking (2PL)
– 2PL’s two phases: Growing and shrinking.
– Variations of 2PL: Conservative 2PL, Strict 2PL
RECAP
• Variations of 2PL
– In basic 2PL, transactions acquire locks as they
need them and release them as soon as possible.
The variations differ:
• the transaction pre-declares which items it will work
with and acquires all locks before any work is done
(Conservative 2PL)
• all of the locks are released only after the
transaction commits (Strict 2PL)
RECAP
• Example (Exercise)
– Consider T: Ra Wa Rb Wb Rc Wc Rd Wd
– Further, given the symbols: R = Read, W = Write,
L = Lock, U = Unlock.
– Generate example schedules (one implementing
conservative 2PL and another strict 2PL)
RECAP
• Conservative:
– TLa TLb TLc TLd TRa TWa TUa TRb TWb TUb TRc TWc
TUc TRd TWd TUd
• Strict:
– TLa TRa TWa TLb TLc TRb TWb TLd TRc TWc TRd TWd
TUa TUb TUc TUd
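Both schedules share the two-phase property: once any lock is released, no new lock may be acquired. That property can be checked mechanically; a sketch (illustrative Python; the `is_two_phase` helper is an assumption, not part of the course material):

```python
# Check the two-phase property for one transaction's schedule:
# a growing phase (locks only), then a shrinking phase (unlocks only).
def is_two_phase(schedule):
    """schedule: list of (op, item), op in {'L', 'U', 'R', 'W'}."""
    shrinking = False
    for op, item in schedule:
        if op == "U":
            shrinking = True
        elif op == "L" and shrinking:
            return False  # lock acquired after an unlock: violates 2PL
    return True

conservative = [("L", "a"), ("L", "b"), ("R", "a"), ("W", "a"),
                ("U", "a"), ("R", "b"), ("W", "b"), ("U", "b")]
bad = [("L", "a"), ("U", "a"), ("L", "b"), ("U", "b")]
print(is_two_phase(conservative))  # True
print(is_two_phase(bad))           # False: re-locked after unlocking
```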
• The basis of concurrency control is protocols to
maintain serialization in DBMSs
• E.g. protocols
– Two-Phase Locking (2PL)
– Timestamps Ordering
– Optimistic Concurrency Control (OCC)
– Multiversion Concurrency
– Multiversion based on Timestamp
– Multiversion 2PL with certify locks
• Timestamp Ordering
– The timestamp of a transaction T is the time at which
that transaction was initiated in the DBMS: TS(T)
– We can use clock time or an incremental identifier
(counter) for TS(T)
– Two timestamps are also associated with each data
item x.
• 1. read_TS(x) is the TS(T) of the last transaction T to read
from x.
• 2. write_TS(x) is the TS(T) of the last transaction T to write to
x.
RECAP
• Two simple rules to follow:
– 1. Before T issues a write(x), check to see if
• TS(T) < read_TS(x) or if
• TS(T) < write_TS(x)
• If so, then abort transaction T.
• If not, then perform write(x) and set write_TS(x) = TS(T).
Why Abort?
RECAP
• Two simple rules to follow:
– 2. Before T issues a read(x), check to see if
• TS(T) < write_TS(x) then ? transaction T.
• if TS(T) >= write_TS(x) then ? read(x) and set
read_TS(x) = TS(T) only if TS(T) is greater than the
current read_TS(x)
ABORT or EXECUTE?
RECAP
• Two simple rules to follow:
– 2. Before T issues a read(x), check to see if
• TS(T) < write_TS(x) Then abort transaction T.
• if TS(T) >= write_TS(x) then execute read(x) and set
read_TS(x) = TS(T) only if TS(T) is greater than the
current read_TS(x)
– When a transaction is aborted, it is then restarted
and issued a new TS(T).
– Note that with timestamp ordering, deadlock cannot
occur.
– However, starvation is possible i.e., a transaction
keeps getting aborted over and over.
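The two rules above can be sketched in a few lines (illustrative Python; the function names and the dict-based timestamp tables are assumptions, not from the slides):

```python
# Basic timestamp ordering: the write rule and read rule from the slides.
# read_ts / write_ts are dicts mapping item -> last timestamp; missing
# entries default to 0 (older than any transaction).
def to_write(ts, x, read_ts, write_ts):
    if ts < read_ts.get(x, 0) or ts < write_ts.get(x, 0):
        return "abort"            # a younger transaction already used x
    write_ts[x] = ts
    return "ok"

def to_read(ts, x, read_ts, write_ts):
    if ts < write_ts.get(x, 0):
        return "abort"            # x was overwritten by a younger transaction
    read_ts[x] = max(read_ts.get(x, 0), ts)  # only raise, never lower
    return "ok"

r, w = {}, {}
print(to_write(5, "x", r, w))  # ok: sets write_TS(x) = 5
print(to_read(3, "x", r, w))   # abort: TS(T) = 3 < write_TS(x) = 5
print(to_read(7, "x", r, w))   # ok: sets read_TS(x) = 7
```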
• Timestamp ordering can also produce cascading
rollbacks:
– Assume transaction T begins executing and performs some
read and write operations on data items a, b and c
– However, T then reaches a data item it cannot read or
write and T must then be aborted.
– Any effects of transaction T must then be rolled back.
– Before T aborts, however, other transactions (T1, T2 and
T3) have read and written data items a, b and c so these
other transactions must also be rolled back.
– There may be other transactions (T4 and T5) that worked
with data items read or written by T1, T2 and T3, etc.
• The basis of concurrency control is protocols to
maintain serialization in DBMSs
• E.g. protocols
– Two-Phase Locking (2PL)
– Timestamps Ordering
– Optimistic Concurrency Control (OCC)
– Multiversion Concurrency
– Multiversion based on Timestamp
– Multiversion 2PL with certify locks
• Optimistic Concurrency Control
– Two Phase Locking (2PL) and Timestamp Ordering
(TO) are pessimistic concurrency control protocols
- they assume transactions will conflict and take
steps to avoid it, i.e., they address concurrency
issues while the transaction is executing, before
the transaction commits.
– 2PL and TO are also syntactic concurrency control
protocols as they deal only with the syntax (set of
read and write operations) of the transactions.
• Optimistic Concurrency Control (2)
– In an optimistic concurrency control protocol, we
assume that most of the time, transactions will
not conflict thus all of the locking and timestamp
checking are not necessary.
– There are a number of different optimistic CC
protocols. Elmasri/Navathe describe an
optimistic/syntactic concurrency control protocol.
– No checking for serialization is done while the
transaction is executing
– During transaction execution, all updates are applied
to local copies of the data items that are kept for the
transaction
– During a validation phase the transaction's updates are
checked to see if they violate serializability
– The idea behind OCC is to do all the checks at
once
– If there is little interference between transactions,
most will be validated successfully.
– Extra requirements for OCC:
• Local Copy
• Transaction Timestamps
• Must keep track of write_set & read_set
• Optimistic Concurrency Control in three
stages:
1. Read Stage: Transactions can read any data item. Writes
are done to a local copy of the data item e.g., recorded in
a log.
2. Validation stage: Transactions containing Write
operations that are about to commit are validated to see
if the schedule meets the serializability requirements.
3. Write stage: If the transaction will not conflict with other
transactions, then it will be committed (writes to local
copy applied to the database). Otherwise, the
transaction will be rolled back.
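The validation stage can be sketched as a set-intersection test (illustrative Python; this shows one common backward-validation check against the tracked read_set and write_set, not necessarily the exact Elmasri/Navathe algorithm):

```python
# Backward validation sketch: transaction T is validated against the
# write sets of transactions that committed while T was running. If any
# of them wrote an item T read, T's reads may be stale and T must abort.
def validate(t_read_set, committed_write_sets):
    for ws in committed_write_sets:
        if t_read_set & ws:       # overlap => possible stale read
            return False          # fail validation: roll T back
    return True                   # pass: apply T's local writes

print(validate({"a", "b"}, [{"c"}, {"d"}]))   # True: no interference
print(validate({"a", "b"}, [{"b", "e"}]))     # False: b was overwritten
```

When conflicts are rare, almost every transaction passes this check, which is exactly the case OCC is designed for.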
– There are also a number of optimistic/semantic
concurrency control protocols that take into
account the semantics of the database application
and schedule transactions accordingly.
– Examples include databases that are "read
mostly" or "read only".
RECAP
END
Concurrency control contd…
Database recovery
Database back up
Database recovery
• Overview:
– Why DB Recovery
– Recovery techniques:
• Manual Reprocessing
• Automated Recovery
• Checkpoints
• Deferred Update Recovery
• Immediate Update Recovery
Database recovery
• Motivations/ rationale
– There are many situations in which a transaction
may not reach a commit or abort point.
1. An operating system crash can terminate the DBMS
processes
2. The DBMS can crash
3. The process issuing the transaction can abnormally
terminate
4. The system might lose power
5. A disk may fail or other hardware may fail.
6. Human error can result in deletion of critical data.
• In any of these situations, data in the database
may become inconsistent or lost.
• Two general classes of failures:
1. Process or Processor failures in which no data is
lost but the database must be restarted and
recovered.
2. Storage failure wherein data on disk is lost or
corrupted. Data must be restored from backup
and then the database must be restarted and
recovered.
• Types of Failure
• The database may become unavailable for use due to
• Transaction failure: Transactions may fail because of
incorrect input, deadlock, incorrect synchronization.
• System failure: System may fail because of addressing
error, application error, operating system fault, RAM
failure, etc.
• Media failure: Disk head crash, power disruption, etc.
• Definition:
– Database Recovery is the process of restoring the
database and the data to a consistent state. This
may include restoring lost data up to the point of
the event (e.g. system crash).
• There are many different approaches to
recover a database.
Database recovery
• Recovery approaches:
– Manual Reprocessing
– Automated Recovery
– Checkpoints
– Deferred Update Recovery
– Immediate Update Recovery
Manual Reprocessing
• Overview:
– The database is periodically backed up (a database save)
and all transactions applied since the last save are
recorded
– If the system crashes, the latest database save is restored
and all of the transactions are re-applied (by users) to
bring the database back up to the point just before the
crash.
• Weaknesses:
– Time required to re-apply transactions
– Transactions might have other (physical) consequences
– Re-applying concurrent transactions is not
straightforward.
Database recovery
• Recovery approaches:
– Manual Reprocessing
– Automated Recovery
– Checkpoints
– Deferred Update Recovery
– Immediate Update Recovery
Automated Recovery
• The DBMS is able to (re-)apply required changes in the
event of a system failure
• Recall, a transaction can be in one of the following
states:
– Active - when the transaction just begins
– Partially Committed - after the last operation has
completed (but before the commit point is reached)
– Failed - Normal operation is prevented (e.g., in the event
of a system crash)
– Aborted - Transaction is rolled back. That is, all of its
effects are undone
– Committed - Transaction completes all operations and
moves the database to the next consistent state
• Automated Recovery
– Automated recovery applies a similar technique to the
manual recovery, i.e. Make periodic saves of the
database (time consuming operation). However,
maintain a more intelligent log (called the system log
or transaction log) of the transactions that have been
applied. Based on the “intelligent log” we can then
recover in the event of failure
– NB: Recovery should be idempotent - that is, it should
not matter if we recover many times; the effect to the
database should be the same.
• System log entries:
– Each transaction writes the following information
to the log:
1. Start(T) - the fact that transaction T has started
2. Write(T, X, old_value, new_value) - the fact that
transaction T has written to item X with the
new_value. old_value is also maintained.
3. Read(T, X) - the fact that transaction T has read data
item X
4. Either: Commit(T) - transaction T committed, or
Abort(T) - transaction T was aborted
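As a sketch (illustrative Python; the tuple encoding is an assumption, not a DBMS format), the system log can be modelled as an append-only list of exactly these record types:

```python
# The system log as an append-only list of the four record types above.
log = []
log.append(("start", "T1"))
log.append(("write", "T1", "x", 100, 150))  # old_value=100, new_value=150
log.append(("read",  "T1", "y"))
log.append(("commit", "T1"))                # or ("abort", "T1")

# Recovery scans this list to decide what to REDO or UNDO.
print(len(log))  # 4
```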
• Checkpoints
– Checkpoint: A recovery point in the logs where all
current transactions have terminated and all
updates have been written to disk. Consists of 4
steps:
• 1. Cease accepting new transactions
• 2. Allow all unfinished transactions to complete
(commit or abort)
• 3. Write all pending changes to disk and to logs
• 4. Resume accepting new transactions
• Checkpoints (2)
– In many environments, it is possible to take
checkpoints each 15 minutes or half hour, etc.
– Recovery must then only be done from the time of
the last checkpoint (assuming no damage to
storage media).
What is the advantage of using time based checkpoints? Problem?
Database recovery
• Overview:
– Manual Reprocessing
– Automated Recovery
– Checkpoints
– Deferred Update Recovery
– Immediate Update Recovery
Deferred Update Recovery
• Overview
– During a transaction, only record the changes to data
items in the log.
– When the transaction commits, actually update the
data items on disk.
– Two main rules:
1. A transaction cannot change any items in the database
until it commits.
2. A transaction may not commit until all of the write
operations are successfully recorded in the log. This means
that we must check to see that the log is actually written
to disk.
• Consider the following 4 transactions
– T1: Ra Rd Wd C
– T2: Rb Wb Rd Wd C
– T3: Ra Wa Rc Wc C
– T4: Rb Wb Ra Wa C
• We make a schedule: T1, T4, T2, T3
• How do we recover based on this log?
• Recovery from the log file
– Since T1 and T4 committed, their changes were
written to disk. However, T2 and T3 did not
commit, hence their changes were not written to
disk. To recover, we simply ignore those
transactions that did not commit.
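That recovery rule, REDO the committed transactions and ignore the rest, can be sketched as follows (illustrative Python; the log encoding is an assumption, and deferred-update write records only need the new value since nothing uncommitted ever reaches disk):

```python
# Deferred-update recovery sketch: replay the log and REDO only the
# writes of transactions that reached their commit point. Uncommitted
# transactions never touched the database, so no UNDO is needed.
def recover(log):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = {}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new_value = rec
            db[item] = new_value  # REDO the committed write
    return db

log = [("start", "T1"), ("write", "T1", "d", 7), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "b", 9)]  # T2 never committed
print(recover(log))  # {'d': 7} -- T2's write is simply ignored
```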
• Advantages:
– 1. Recovery is made easier:
– Any transaction that reached the commit point (from
the log) has its writes applied to the database (REDO).
All other transactions are ignored.
– 2. Cascading rollback does not occur because no
other transaction sees the work of another until it is
committed (no stale reads).
• Disadvantages:
– Concurrency is limited: the scheme must employ
Strict 2PL, which holds locks until commit.
Immediate Update Recovery
• Overview
– Immediate Update applies the write operations to the
database as the transaction is executing (as opposed
to at the end as with deferred update).
– Writes are still saved in the log before being applied to
the database - a write-ahead log (WAL)
• Maintains two logs:
– REDO log: A record of each new data item in the
database.
– UNDO log: A record of each updated data item (old
values).
• Two rules:
1. Transaction T may not update the database until
all UNDO entries have been written to the UNDO
log.
2. Transaction T is not allowed to commit until all
REDO and UNDO log entries are written (force-written
to disk).
• 4 steps recovery:
– 1. Begin at the end of the log and read backwards to
the last checkpoint. Create two lists:
• C - transactions that have committed
• NC - transactions that did not commit
– 2. For each transaction T in C, make a third list of
transactions, RB, that have read items that were
updated by NC transactions during T.
– 3. UNDO: Undo the effects of the transactions in NC
and RB.
– 4. REDO: Apply the effects of all of the transactions
that did commit (all T in list C)
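The UNDO and REDO steps can be sketched as follows (illustrative Python; the log encoding and the simplified two-pass structure are assumptions: the checkpoint scan and the RB list of steps 1 and 2 are omitted for brevity):

```python
# Immediate-update recovery sketch: UNDO uncommitted writes using old
# values (backwards), then REDO committed writes using new values
# (forwards) -- steps 3 and 4 above.
def recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO: scan backwards, restoring old values of uncommitted writes
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, new = rec
            db[item] = old
    # REDO: scan forwards, re-applying new values of committed writes
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, old, new = rec
            db[item] = new
    return db

log = [("write", "T1", "a", 1, 2), ("commit", "T1"),
       ("write", "T2", "b", 5, 6)]           # T2 crashed before commit
print(recover(log, {"a": 2, "b": 6}))  # {'a': 2, 'b': 5}
```

Because this recovery only re-applies values recorded in the log, running it a second time leaves the database unchanged, which is the idempotence property noted earlier.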
• Advantages:
– Immediate update allows higher concurrency
because transactions write continuously to the
database rather than waiting until the commit
point.
• Disadvantages:
– Steps 2 and 3 during recovery can lead to
cascading rollbacks - time consuming and may be
problematic.
Database recovery
• Overview:
– Why DB Recovery
– Recovery techniques:
• Manual Reprocessing
• Automated Recovery
• Checkpoints
• Deferred Update Recovery
• Immediate Update Recovery
• Home work / self research
– Form groups for research
– In your group, research and report in the next class how
the following DBMSs on the market address recovery
1. MS Access (Boy)
2. Oracle (Winnie)
3. MS SQL (Server) (Esborn)
4. MySQL (John)
5. PostgreSQL (Oscar)
Due next week (a small write-up + 5-minute presentation)
• Thanks