DATA COORDINATION: SUPPORTING CONTINGENT UPDATES Michael Lawrence, Rachel Pottinger, Sheryl Staub-French The University of British Columbia.
Download
Report
Transcript DATA COORDINATION: SUPPORTING CONTINGENT UPDATES Michael Lawrence, Rachel Pottinger, Sheryl Staub-French The University of British Columbia.
DATA COORDINATION:
SUPPORTING CONTINGENT
UPDATES
Michael Lawrence, Rachel Pottinger, Sheryl Staub-French
The University of British Columbia
Scenario:
Architecture, Engineering and Construction
2
code
description
qty
unit
3310
Install column
formwork
20
ea
9250
metal stud partition
wall
120
sqft
…
…
…
…
Building Design
M. Lawrence, R. Pottinger, S. Staub-French
Cost Estimate
2011/08/31
Data Coordination:
General Problem
3
Related, independent data sources B, C
Keep C up to date with B
Base Source B
(building design)
B
B'
Contingent Source C
(cost estimate)
C
?
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Example:
Coordination Operations
4
ProjectItems
Component
id
0
1
2
type
Column
Wall
Wall
code
CH
9.12
D1
8.1
CS
9.06
CH
9.12
?
area
1
27
12
name
Concrete
Light concrete
Drywall
Concrete
Heavy concrete
Paint
27
27
12
12
27
ItemRates
Material
cid
1
1
2
1
qty
thickness
300
15
200
1
Building Design B
code
CH
9.12
CS
9.06
D1
8.1
?
category
Concrete
Concrete
Drywall
Paint
type
heavy
sealing
12mm
?
rate
25.00
6.45
3.50
?
Cost Estimate C
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Data Coordination Defining
Characteristics
5
Base-Contingent relationship
B
dictates changes to C
E.g. Weather Data (B) Road Network (C)
Autonomous sources
Domain
heterogeneous
Lack of system-wide collaboration
Batch updates
Goal: Final, unambiguous instance of C
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Data Coordination Related Work
6
Hyperion [Rodríguez-Gianolli et al. VLDB 05]
P2P
coordination with active rules (triggers)
ORCHESTRA [Green, Karvounarakis, Ives, Tannen VLDB 07]
P2P
with local querying
Update sharing, fine-grained trust management
Youtopia [Koch, Kot VLDB 09]
Collaborative
Data Integration system
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Outline
7
Overall Approach
Data Coordination Problem
View Differencing
Update Translation
Insertions
Deletions
Combining
Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Approach
8
Use mapping constraints qB = qC
VB(name, area) :− Component(id, type, area), Material(id, name, thickness), type = “Wall”
The set of wall areas and materials should equal the
=
join of project item quantities and categories
VC(category, qty) :− ItemRates(code, category, type, rate), ProjectItems(code, qty)
Class of queries for qC:
Conjunctive
Class of queries for qB:
Union,
negation, aggregation
C stores materialized view V
“Pull” coordination
M. Lawrence, R. Pottinger, S. Staub-French
Building Design (B)
Changes?
V
Cost Estimate (C)
2011/08/31
Data Coordination Problem
Formalization
9
Problem
Given
Find
Ct , Vt , Bt+1
Ct+1
Time
Bt+1
Vt
Base Source
(Building Design)
View (stored by C)
qC
Ct
Ct+1
M. Lawrence, R. Pottinger, S. Staub-French
Contingent Source
(Cost Estimate)
2011/08/31
Data Coordination Problem
Formalization
10
Approach
1.
2.
3.
Find (V+,V-) (view differencing)
(V+,V-) to all possible (C+,C-) (update translation)
User selects final (C+,C-)
Base Source
Bt+1
qB
Vt
-)
(Paint,
(V+,V12)
Vt+1
qC
Ct
(Building Design)
View (stored by C)
qC
(PB, Paint,
Beige,
2.25)
+
(?, Paint,
?,,C
?),
(C(PB,
)(?, 12)
12)
Ct+1
M. Lawrence, R. Pottinger, S. Staub-French
Contingent Source
(Cost Estimate)
2011/08/31
Outline
11
Overall Approach
Data Coordination Problem
View Differencing
Update Translation
Insertions
Deletions
Combining
Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
View Differencing
12
Find (V+, V-)
a)
b)
Materialize Vt+1 and compare with Vt
Incremental view maintenance [Gupta + Mumick 99]
Old Base Source
Bt
qB
View (stored by C)
Vt
Inputs +
(B ,
B-)
(V+, V-)
Bt+1
Updated Base Source
qB
VVt+1
t+1
Inputs
Output
Outputs
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Incremental View Maintenance
13
Counting Algorithm [Gupta + Mumick 99]
Tuple counts
Rewrite qB as 2k queries (delta rules)
k
= number of relations queried
Evaluates Vt+1 as additive union (U+)
New Extensions:
qB to extract tuple counts
Method for performing U+
Extract (V+, V-) in U+
Rewrite
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Outline
14
Overall Approach
Data Coordination Problem
View Differencing
Update Translation
Insertions
Deletions
Combining
Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation
15
Inputs
Existing Stored View
Vt
(V+, V-)
qC
Existing Contingent Source
Ct
(C+, C-)
Output
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation Example
16
ProjectItems
code
CH
9.12
D1
8.1
CS
9.06
CH
9.12
a
VC(category, qty) :− ProjectItems(code, qty),
ItemRates(code, category, type, rate)
category
Concrete
Drywall
Concrete
Paint
V+ Paint
qty
27
27
27
27
12
12
qty
27
27
12
12
12 ProjectItems+
ItemRates
a = CH are
V(Paint,
What
a, b, 27)
and c?
ItemRates+
code
CH
9.12
CS
9.06
D1
8.1
a
category
Concrete
Concrete
Drywall
Paint
M. Lawrence, R. Pottinger, S. Staub-French
type
heavy
sealing
12mm
b
2011/08/31
rate
25.00
6.45
3.50
c
Update Translation Example
17
ProjectItems
VC(category, qty) :− ProjectItems(code, qty),
ItemRates(code, category, type, rate)
category
Concrete
Drywall
V- Concrete
qty
27
27
12
code
CH
D1
CS
CH
qty
27
27
12
12
ItemRates
Not Minimal
Deletes
V(Concrete, 27)
code category type
CH
Concrete heavy
CS
Concrete sealing
D1
Drywall
12mm
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
rate
25.00
6.45
3.50
Update Translation Challenges
18
Ambiguities (many feasible solutions)
Exact solution
No
side-effects (spurious V insertions/deletions)
Only update C
additional
constraint
Sets of insertions/deletions (batch process)
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation Related Work
19
Translation by constant complement
[Bancilhon & Spyratos TODS 1981]
Data exchange [Fagin et al. 2003, Barceló 2009]
Generate
instance of target schema given source
schema/instance and mappings
Updates through views [Kotidis et al. 2006]
Relax
constraint
Add abstraction level
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Outline
20
Overall Approach
Data Coordination Problem
View Differencing
Update Translation
Insertions
Deletions
Combining
Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Insertions
21
ProjectItems
Chase [Fagin et al. ICDE 2003]
code
9.12
CH
8.1
D1
9.06
CS
9.12
CH
a
Generates incomplete instance
containing free variables
Constrain
Conditional
tables [Grahne 1991]
Find spurious insertions
V
category
Concrete
Drywall
Concrete
Paint
Paint
qty
27
27
12
12
qty
27
27
12
12
12
ItemRates
code
CH
9.12
CS
9.06
D1
8.1
a
M. Lawrence, R. Pottinger, S. Staub-French
category
category
Concrete
Concrete
Concrete
Concrete
Drywall
Drywall
Paint
2011/08/31
type
heavy
sealing
12mm
b
rate
25.00
6.45
3.50
c
Conditional Tables
22
Relation with free variables [Grahne 1991]
Tuple constraints φ
Our approach
student
course
φ
Calculate
spuriousMath
insertions
Sally
z=0
Sally
S=
qC(C
Sally
U C+) – CS
(V U V+)
Condition
is complement of the φs
Force S = Ø
x
z≠0
x ≠ physics
Tuples generated by chase
Sally takes Math or CS (but not both),
and possibly some other course which is not physics
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
23
C U C+
V U V+
category
Concrete
Drywall
Concrete
Paint
qC(C U C+)
ProjectItems
category qty φ
qty
Scode
(spurious
insertions)
Concrete
27
V U V+
qty
27
27
12
12
−
category
Concrete
Drywall
Concrete
Paint
qty
27
27
12
12
a cannot be
CH
27
Drywall
27 φ
category qty
D1
27
Concrete
12
CSPaint
12 27 a = D1
Paint
12
CHPaint
12 27 a = CH
Paint
12 a = CS
a Drywall
12 12 a = D1
Paint
12 a = CS
Paint
27 a = D1
ItemRates
Paint
27 a = CH
code
category
Concrete
12type
a = CH
CHConcrete
Concrete 12
heavy
a = CS
CSDrywall
Concrete 12
sealing
a = D1
Drywall
12mm
CH D1
or D1
a
Paint
b
=
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
rate
25.00
6.45
3.50
c
Outline
24
Overall Approach
Data Coordination Problem
View Differencing
Update Translation
Insertions
Deletions
Combining
Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Experiments
25
TPC-H Instance
Vary Database Size, Update Size, Query Size
View Differencing: C++/MySQL
Update Translation: C++/BerkeleyDB
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
View Differencing Results
26
• View Maintenance linear in
update size
Execution Time (sec)
• Materialize/Compare
decreases due to decreasing
view size
• Additional experiments show
view size and sort time
dominate
Materialize/Compare
performance.
Update Size (% of instance size)
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
View Differencing Results
27
• Instance: large hierarchy
Execution Time (sec) – log scale
• View Maintenance
exponential in number of joins
• Only if all relations are
updated
• Materialize/Compare
decreases due to decreasing
view size
• Evaluating qB (MySQL) takes
sharp rise at 23 joins
Number of Joins
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation Results
28
• Instance: TPC-H
Execution Time (sec) – log scale
• Insertions exponential due to
exponential number of
potentially spurious insertions
• Deletions perform well due
to hierarchy of many to one
relationships and large
pruning benefit
Number of Joins
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation Results
29
• Instance: TPC-H
• Insertions: high degree
polynomial
Execution Time (sec)
• Wasteful to consider
translations of little interest
• Static Tables Heuristic: Only
generate tuples/free
variables for a subset of
relations
• Deletions perform well due
to optimizations available due
to relational normalization
Number of Insertions/Deletions
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Conclusions
30
System for coordinating Base – Contingent data sources
with declarative mappings
Three stage approach to the data coordination
problem
View Differencing
Update Translation
User disambiguation
Adaptation of view maintenance for view differencing
Find all feasible update translations using incomplete
information
Insertions, deletions, and the combination
Implementation demonstrating feasibility and useful
optimizations/heuristics
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
View Differencing Summary
31
MAC – sort time dominates
IVM-VD – query size dominates
MAC
IVM-VD
Arbitrary queries (subqueries, recursion,
etc)
Conjunctive queries with union, negation,
aggregation
Requires Vt, Bt+1
Requires (B+, B-), Bt, Vt, Bt+1
Better for large updates (> 2.5%)
Better for small updates
Better for large queries
Better for small queries
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Tuple Generating Dependency
Formulation
32
V = qC(C) corresponds to 2 TGDs
V(x) QC(x,
y)
QC(x, y)
V(x)
Insertion TGD
(violated by V+(x))
Deletion TGD
(violated by V-(x))
(QC – Conjunction of relational predicates)
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Deletions
33
QC(x, y) V(x)
V-(x) !QC(x,
y)
e.g. V-(x1, x2) !C(x1, y) v !C(y, x2)
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Deletions
34
V-(0, 2) C-(0, y) or C-(y, 2) (for all y)
V-
x1
x2
0
2
0
3
y=1
8
a
b
0
1
1
2
0
8
8
2
1
3
M. Lawrence, R. Pottinger, S. Staub-French
C
OR
AND
OR
2011/08/31
Deletion Translation (Overview)
35
Use contrapositive of deletion TGD
V-(x)
!QC(x, y)
Formulate expression for minimal deletions
Recursive search w/pruning for feasible solutions
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Deletions
36
Build expression in conjunctive normal form
e.g.
(C(0, 1) or C(1, 2)) and (C(0, 8) or C(8, 2) …)
Recursively try every combination
Prune infeasible combinations
i.e.
causing spurious deletions
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Optimizations
37
Redundancy in constrain step
z
≠ 2 AND (z ≠ 2 OR z ≠ 3)
Redundancy in deletions
{C(0,
8), C(1, 2)} OR {C(0, 8), C(8, 2)}
Worse with multiple deleted tuples
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Generalizing
38
Arithmetic comparisons
V(x1,
x2) :- C(x1, y), C(y, x2), y > 4
Afrati, Li, Pavlaki EDBT 2008
Makes constrain step more difficult
Sets of constraints
Conflicting
updates
Approximate solutions
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Extending
39
Ranking
Heuristics
Semantics
Issues Arising over Time
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31