wiki.epfl.ch

Download Report

Transcript wiki.epfl.ch

Positional Update Handling
in Column Stores
Sándor Héman
Marcin Zukowski
Niels Nes
Lefteris Sidirourgos
Peter Boncz
UPDATE IN PLACE:
A Poison Apple?
Jim Gray, 1981
“..for performance reasons, most disc-based systems have
been seduced into updating the data in place.”
30 years of hardware improvements in sequential/throughput
beating random/latency…. in-place less feasible every year.
alternative: differential approach.
In column stores, in-place updating is by now clearly infeasible
Problem: Column Store Updates
• I/O proportional to number of attributes
– I/O blocks large and compressed
– Sometimes even replicated
– Read-Optimized  Update-Unfriendly 
• Table often kept ordered on sort-key
(SK) attributes
– Uniform update load  scattered write access
Solution: Differential Structure
• Maintain updates (INS/DEL/MOD) in a
differential structure
– Merge with base table during scan
Solution: Differential Structure
• Maintain updates (INS/DEL/MOD) in a
differential structure
– Merge with base table during scan
• Challenges:
– Efficiently maintainable data-structure
– Minimize Merge impact for read-only queries
Naïve Approach: Delta Tables
• For each table,
maintain two update
friendly row-store
tables:
– INS(C1..Cn)
– DEL(SK1..SKm)
– MOD = DEL + INS
Base table: inventory
Sort-Key (SK): [store, prod]
store
prod
prod
new
qty
Berlin
chair
Y
5
Berlin
cloth
Y
20
qty
London
stool
N
10
London
table
N
20
Paris
rug
N
1
Paris
stool
N
5
Inserts table: INS
store
new
Deletes table: DEL
store
Paris
prod
rug
Naïve Approach: Delta Tables
• Rewrite table scans:
MergeUnion[store,prod](Scan(INS),
MergeDiff[store,prod](
Scan(Inventory), Scan(DEL)))
Base table: inventory
Sort-Key (SK): [store, prod]
store
prod
prod
new
qty
Berlin
chair
Y
5
Berlin
cloth
Y
20
qty
London
stool
N
10
London
table
N
20
Paris
rug
N
1
Paris
stool
N
5
Inserts table: INS
store
new
Deletes table: DEL
store
Paris
prod
rug
Naïve Approach: Delta Tables
• Rewrite table scans:
MergeUnion[store,prod](Scan(INS),
MergeDiff[store,prod](
Scan(Inventory), Scan(DEL)))
for up-to-date image
• Expensive!
Actual table: inventory
Sort-Key (SK): [store, prod]
store
prod
new
qty
Berlin
chair
Y
5
Berlin
cloth
Y
20
London
stool
N
10
– I/O to scan SK ‘merge’
London table
N
columns; also if query
Paris
stool
N
does not need SK cols
– Each query pays CPU effort to locate the same
change positions over and over again
20
5
The Idea: Positional Updates
• Remember the position of an update rather than
its SK values
– Merge once at write  Read-Optimized approach
– No need to scan SK columns
– Scan can skip  less CPU overhead
Notation:
• TABLEx state of TABLE at time x
• SID(t): StableID
– Position of tuple t in immutable base TABLE0  Stable
• RIDx(t): RowID
– Position of visible tuple t at time x VOLATILE!
– SID(t) = RID0(t)
SID/RID Example
SID
STORE
PROD
NEW
QTY
RID
0
London
chair
N
30
0
1
London
stool
N
10
1
2
London
table
N
20
2
3
Paris
rug
N
1
3
4
Paris
stool
N
5
4
INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)
INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20)
INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5)
TABLE0
SID
STORE
PROD
NEW
QTY
0
Berlin
chair
Y
5
0
0
Berlin
cloth
Y
20
1
0
Berlin
table
Y
10
2
0
London
chair
N
30
3
1
London
stool
N
10
4
2
London
table
N
20
5
3
Paris
rug
N
1
6
4
Paris
stool
N
5
7
TABLE1
RID
SIDs and RIDs
• RID(t)
•
∆(t)
= SID(t) + ∆(t)
= #inserts before t – #deletes before t
= RID(t) – SID(t)
• SID and RID are monotonically increasing 
– organize positional updates on SID in a counting B-Tree that
keeps track cumulative deltas (∆)
• Positional Delta Tree (PDT)
– SIDs are stable
– Only need to maintain cumulative ∆ on path root  leaf
PDT Example
SID
STORE
PROD
NEW
QTY
RID
0
London
chair
N
30
0
1
London
stool
N
10
1
2
London
table
N
20
2
3
Paris
rug
N
1
3
4
Paris
stool
N
5
4
SID
0
INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)
INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20)
INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5)
TABLE0
∆
2
Insert Value Table
1
STORE
SID
0
0
SID
0
ins
i0
type
ins
ins
type
value
i2
i1
value
PROD
NEW
QTY
i0
Berlin
table
Y
10
i1
Berlin
cloth
Y
20
i2
Berlin
chair
Y
5
PDT Example
SID
STORE
PROD
NEW
QTY
RID
0
Berlin
chair
Y
5
0
0
Berlin
cloth
Y
20
1
0
Berlin
table
Y
10
2
0
London
chair
N
30
3
1
London
stool
N
10
4
2
London
table
N
20
5
3
Paris
rug
N
1
6
4
Paris
stool
N
5
7
SID
0
DELETE FROM inventory WHERE
store = ‘Berlin’ AND prod = ‘table’
DELETE FROM inventory WHERE
store = ‘Paris’ AND prod = ‘rug’
TABLE1
∆
2
Insert Value Table
-1
STORE
SID
0
0
SID
3
del
d0
type
ins
ins
type
value
i2
i1
value
PROD
NEW
QTY
i0
Berlin
table
Y
10
i1
Berlin
cloth
Y
20
i2
Berlin
chair
Y
5
PDT Example
SID
STORE
PROD
NEW
QTY
RID
0
Berlin
chair
Y
5
0
0
Berlin
cloth
Y
20
1
0
London
chair
N
30
2
1
London
stool
N
10
3
2
London
table
N
20
4
4
Paris
stool
N
5
5
SID
0
INSERT INTO inventory VALUES
(‘Paris’, ‘rack’, Y, 4)
Insert at RID = 5
TABLE2
∆
2
Insert Value Table
-1
STORE
RID 5 > 0 + 2
SID
0
0
SID
3
3
ins
del
del
di00
d0
type
ins
ins
type
value
i2
i1
value
PROD
NEW
QTY
i0
Berlin
Paris
table
rack
Y
20
4
i1
Berlin
cloth
Y
5
20
i2
Berlin
chair
Y
10
5
PDT Example
INSERT INTO inventory VALUES
(‘London’, ‘rack’, Y, 4)
INSERT INTO inventory VALUES
(‘Berlin’, ‘rack’, Y, 4)
SID
∆
SID
∆
1
3
Separator SIDs
Subtree ∆
1
∆
3
Running ∆
RID
4
Separator RIDs
SID
0
2
1
∆
3
1
0
∆
2
∆
4
RID
2
RID
7
SID
0
0
SID
0
SID
1
SID
3
3
type
ins
ins
type
ins
type
ins
type
ins
del
value
i2
i1
value
i4
value
i3
value
i0
d0
∆
0
1
∆
2
∆
3
∆
4
5
RID
0
1
RID
2
RID
4
RID
7
8
Stacking PDTs
• Arbitrary number of layers: “deltas on deltas on ..”
– RID domain of child PDT = SID domain of parent PDT
generalization:
lo
• PDThi contains all differences in time [lo,hi]
PDT t2
t3
t0
vs PDT t1
are
consecutive

t2=t1
PDT
PDT
PDT
t2
PDT t3
t1
PDT
PDT
PDT
PDT t2
PDT
PDT
PDT
PDT t1
Table
t0
Stacking PDTs
• Arbitrary number of layers: “deltas on deltas on ..”
– RID domain of child PDT = SID domain of parent PDT
generalization:
lo
• PDThi contains all differences in time [lo,hi]
PDT t2
t3
t0
vs PDT t1
are
consecutive
aligned


t2=t1
t2=t0
“same base”
PDT
PDT
Table
Stacking PDTs
• Arbitrary number of layers: “deltas on deltas on ..”
– RID domain of child PDT = SID domain of parent PDT
generalization:
lo
• PDThi contains all differences in time [lo,hi]
PDT t2
t3
t0
vs PDT t1
are
consecutive
aligned


t2=t1
PDT
t2=t0
“same base”
overlapping

[t2,t3] overlaps [t0,t1]
PDT
PDT
“uncomparable” / “incompatible”
Table
Stacking for Isolation
• ‘lock’ PDT down for further updates
– Immutable read-PDT  BIG: main memory resident
• ‘stack’ empty PDT on top
– Updateable write-PDT  SMALL: L2 cache resident
– Note: PDTs are consecutive
• once in a while changes are propagated
– Propagate() operation
• Requires consecutive PDTs
TABLEx
Write-PDT
Propagate()
Read-PDT
Stable Table
Snapshot Isolation
Transaction
State
Propagate()
Trans
PDT
Copy
Write-PDT
TABLEx
Write-PDT
Read-PDT
Stable Table
• Transaction creates
snapshot copy of
write-PDT
• Updates go into transPDT
• On commit,
Propagate() trans-PDT
into write-PDT
Optimistic Concurrency Control
• Two concurrent transactions
Trans
A
Trans
PDT
Trans
B
Trans
PDT
Copy
Write-PDT
Copy
Write-PDT
TABLEx
Write-PDT
Read-PDT
Stable Table
Optimistic Concurrency Control
• Two concurrent transactions
• A commits before B
Trans
A
Trans
PDT
Trans
B
Trans
PDT
Copy
Write-PDT
Copy
Write-PDT
TABLEx
Write-PDT
Read-PDT
Stable Table
Optimistic Concurrency Control
Trans
A
Trans
PDT
Trans
B
Trans
PDT
Copy
Write-PDT
TABLEx
Write-PDT
Read-PDT
Stable Table
• Two concurrent transactions
• A commits before B
• Can not commit B into
modified write-PDT!
– A changed RID enumeration
Optimistic Concurrency Control
Trans
PDT
Trans
A
Trans
PDT
Consecutive!
Trans
B
Trans
PDT
Serialize()
• Two concurrent transactions
• A commits before B
• Can not commit B into
modified write-PDT!
– A changed RID enumeration
• Serialize(A, B)
TABLEx
Write-PDT
Read-PDT
Stable Table
– Makes aligned PDTs consecutive
– MAY FAIL!!  trans abort
= succeeds if no conflict
= write set intersection
Optimistic Concurrency Control
Trans
A
Trans
PDT
Trans
B
Trans
PDT
Serialize()
• Two concurrent transactions
• A commits before B
• Can not commit B into
modified write-PDT!
– A changed RID enumeration
• Serialize(A, B)
TABLEx
Write-PDT
Read-PDT
Stable Table
– Makes aligned PDTs consecutive
– MAY FAIL!!  trans abort
= succeeds if no conflict
= write set intersection
Extend to any number of concurrent transactions by serializing
against all PDTs of transactions that committed during its lifetime
(a.k.a. backward looking OCC)
Concluding..
• PDTs speed-up differential update merging
– Reduced I/O volume
– Reduced CPU merge overhead
• Tree structure
– logarithmic lookup & maintenance of volatile RIDs
– main operations: Merge(), Propagate(), Serialize()
• PDTs are stackable, and capture Write-Set
– Great structure for Snapshot Isolation
• Formal definitions, algorithms and
benchmarks in paper
Thank you!
Microbenchmarks
TPCH-30