DATA COORDINATION: SUPPORTING CONTINGENT UPDATES Michael Lawrence, Rachel Pottinger, Sheryl Staub-French The University of British Columbia.

Download Report

Transcript DATA COORDINATION: SUPPORTING CONTINGENT UPDATES Michael Lawrence, Rachel Pottinger, Sheryl Staub-French The University of British Columbia.

DATA COORDINATION:
SUPPORTING CONTINGENT
UPDATES
Michael Lawrence, Rachel Pottinger, Sheryl Staub-French
The University of British Columbia
Scenario:
Architecture, Engineering and Construction
2
code
description
qty
unit
3310
Install column
formwork
20
ea
9250
metal stud partition
wall
120
sqft
…
…
…
…
Building Design
M. Lawrence, R. Pottinger, S. Staub-French
Cost Estimate
2011/08/31
Data Coordination:
General Problem
3


Related, independent data sources B, C
Keep C up to date with B
Base Source B
(building design)
B
B'
Contingent Source C
(cost estimate)
C
?
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Example:
Coordination Operations
4
ProjectItems
Component
id
0
1
2
type
Column
Wall
Wall
code
CH
9.12
D1
8.1
CS
9.06
CH
9.12
?
area
1
27
12
name
Concrete
Light concrete
Drywall
Concrete
Heavy concrete
Paint
27
27
12
12
27
ItemRates
Material
cid
1
1
2
1
qty
thickness
300
15
200
1
Building Design B
code
CH
9.12
CS
9.06
D1
8.1
?
category
Concrete
Concrete
Drywall
Paint
type
heavy
sealing
12mm
?
rate
25.00
6.45
3.50
?
Cost Estimate C
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Data Coordination Defining
Characteristics
5

Base-Contingent relationship
B
dictates changes to C
 E.g. Weather Data (B)  Road Network (C)

Autonomous sources
 Domain
heterogeneous
 Lack of system-wide collaboration
 Batch updates

Goal: Final, unambiguous instance of C
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Data Coordination Related Work
6

Hyperion [Rodríguez-Gianolli et al. VLDB 05]
 P2P

coordination with active rules (triggers)
ORCHESTRA [Green, Karvounarakis, Ives, Tannen VLDB 07]
 P2P
with local querying
 Update sharing, fine-grained trust management

Youtopia [Koch, Kot VLDB 09]
 Collaborative
Data Integration system
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Outline
7




Overall Approach
Data Coordination Problem
View Differencing
Update Translation
 Insertions
 Deletions
 Combining

Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Approach
8

Use mapping constraints qB = qC
VB(name, area) :− Component(id, type, area), Material(id, name, thickness), type = “Wall”
The set of wall areas and materials should equal the
=
join of project item quantities and categories
VC(category, qty) :− ItemRates(code, category, type, rate), ProjectItems(code, qty)
Class of queries for qC:
 Conjunctive

Class of queries for qB:
 Union,


negation, aggregation
C stores materialized view V
“Pull” coordination
M. Lawrence, R. Pottinger, S. Staub-French
Building Design (B)
Changes?

V
Cost Estimate (C)
2011/08/31
Data Coordination Problem
Formalization
9

Problem
 Given
 Find
Ct , Vt , Bt+1
Ct+1
Time
Bt+1
Vt
Base Source
(Building Design)
View (stored by C)
qC
Ct
Ct+1
M. Lawrence, R. Pottinger, S. Staub-French
Contingent Source
(Cost Estimate)
2011/08/31
Data Coordination Problem
Formalization
10

Approach
1.
2.
3.
Find (V+,V-) (view differencing)
(V+,V-) to all possible (C+,C-) (update translation)
User selects final (C+,C-)
Base Source
Bt+1
qB
Vt
-)
(Paint,
(V+,V12)
Vt+1
qC
Ct
(Building Design)
View (stored by C)
qC
(PB, Paint,
Beige,
2.25)
+
(?, Paint,
?,,C
?),
(C(PB,
)(?, 12)
12)
Ct+1
M. Lawrence, R. Pottinger, S. Staub-French
Contingent Source
(Cost Estimate)
2011/08/31
Outline
11




Overall Approach
Data Coordination Problem
View Differencing
Update Translation
 Insertions
 Deletions
 Combining

Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
View Differencing
12

Find (V+, V-)
a)
b)
Materialize Vt+1 and compare with Vt
Incremental view maintenance [Gupta + Mumick 99]
Old Base Source
Bt
qB
View (stored by C)
Vt
Inputs +
(B ,
B-)
(V+, V-)
Bt+1
Updated Base Source
qB
VVt+1
t+1
Inputs
Output
Outputs
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Incremental View Maintenance
13



Counting Algorithm [Gupta + Mumick 99]
Tuple counts
Rewrite qB as 2k queries (delta rules)
k


= number of relations queried
Evaluates Vt+1 as additive union (U+)
New Extensions:
qB to extract tuple counts
 Method for performing U+
 Extract (V+, V-) in U+
 Rewrite
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Outline
14




Overall Approach
Data Coordination Problem
View Differencing
Update Translation
 Insertions
 Deletions
 Combining

Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation
15
Inputs
Existing Stored View
Vt
(V+, V-)
qC
Existing Contingent Source
Ct
(C+, C-)
Output
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation Example
16
ProjectItems
code
CH
9.12
D1
8.1
CS
9.06
CH
9.12
a
VC(category, qty) :− ProjectItems(code, qty),
ItemRates(code, category, type, rate)
category
Concrete
Drywall
Concrete
Paint
V+ Paint
qty
27
27
27
27
12
12
qty
27
27
12
12
12 ProjectItems+
ItemRates
a = CH are
 V(Paint,
What
a, b, 27)
and c?
ItemRates+
code
CH
9.12
CS
9.06
D1
8.1
a
category
Concrete
Concrete
Drywall
Paint
M. Lawrence, R. Pottinger, S. Staub-French
type
heavy
sealing
12mm
b
2011/08/31
rate
25.00
6.45
3.50
c
Update Translation Example
17
ProjectItems
VC(category, qty) :− ProjectItems(code, qty),
ItemRates(code, category, type, rate)
category
Concrete
Drywall
V- Concrete
qty
27
27
12
code
CH
D1
CS
CH
qty
27
27
12
12
ItemRates
Not Minimal
Deletes
V(Concrete, 27)
code category type
CH
Concrete heavy
CS
Concrete sealing
D1
Drywall
12mm
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
rate
25.00
6.45
3.50
Update Translation Challenges
18


Ambiguities (many feasible solutions)
Exact solution
 No

side-effects (spurious V insertions/deletions)
Only update C
 additional

constraint
Sets of insertions/deletions (batch process)
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation Related Work
19

Translation by constant complement


[Bancilhon & Spyratos TODS 1981]
Data exchange [Fagin et al. 2003, Barceló 2009]
 Generate
instance of target schema given source
schema/instance and mappings

Updates through views [Kotidis et al. 2006]
 Relax
constraint
 Add abstraction level
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Outline
20




Overall Approach
Data Coordination Problem
View Differencing
Update Translation
 Insertions
 Deletions
 Combining

Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Insertions
21



ProjectItems
Chase [Fagin et al. ICDE 2003]
code
9.12
CH
8.1
D1
9.06
CS
9.12
CH
a
Generates incomplete instance
containing free variables
Constrain
 Conditional
tables [Grahne 1991]
 Find spurious insertions
V
category
Concrete
Drywall
Concrete
Paint
Paint
qty
27
27
12
12
qty
27
27
12
12
12
ItemRates
code
CH
9.12
CS
9.06
D1
8.1
a
M. Lawrence, R. Pottinger, S. Staub-French
category
category
Concrete
Concrete
Concrete
Concrete
Drywall
Drywall
Paint
2011/08/31
type
heavy
sealing
12mm
b
rate
25.00
6.45
3.50
c
Conditional Tables
22
Relation with free variables [Grahne 1991]
 Tuple constraints φ
Our approach
student
course
φ
 Calculate
spuriousMath
insertions
Sally
z=0


Sally
S=
qC(C
Sally
U C+) – CS
(V U V+)
 Condition
is complement of the φs
Force S = Ø
x
z≠0
x ≠ physics
Tuples generated by chase
Sally takes Math or CS (but not both),
and possibly some other course which is not physics
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
23
C U C+
V U V+
category
Concrete
Drywall
Concrete
Paint
qC(C U C+)
ProjectItems
category qty φ
qty
Scode
(spurious
insertions)
Concrete
27
V U V+
qty
27
27
12
12
−
category
Concrete
Drywall
Concrete
Paint
qty
27
27
12
12
a cannot be
CH
27
Drywall
27 φ
category qty
D1
27
Concrete
12
CSPaint
12 27 a = D1
Paint
12
CHPaint
12 27 a = CH
Paint
12 a = CS
a Drywall
12 12 a = D1
Paint
12 a = CS
Paint
27 a = D1
ItemRates
Paint
27 a = CH
code
category
Concrete
12type
a = CH
CHConcrete
Concrete 12
heavy
a = CS
CSDrywall
Concrete 12
sealing
a = D1
Drywall
12mm
CH D1
or D1
a
Paint
b
=
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
rate
25.00
6.45
3.50
c
Outline
24




Overall Approach
Data Coordination Problem
View Differencing
Update Translation
 Insertions
 Deletions
 Combining

Insertions + Deletions
Experimental Results
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Experiments
25




TPC-H Instance
Vary Database Size, Update Size, Query Size
View Differencing: C++/MySQL
Update Translation: C++/BerkeleyDB
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
View Differencing Results
26
• View Maintenance linear in
update size
Execution Time (sec)
• Materialize/Compare
decreases due to decreasing
view size
• Additional experiments show
view size and sort time
dominate
Materialize/Compare
performance.
Update Size (% of instance size)
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
View Differencing Results
27
• Instance: large hierarchy
Execution Time (sec) – log scale
• View Maintenance
exponential in number of joins
• Only if all relations are
updated
• Materialize/Compare
decreases due to decreasing
view size
• Evaluating qB (MySQL) takes
sharp rise at 23 joins
Number of Joins
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation Results
28
• Instance: TPC-H
Execution Time (sec) – log scale
• Insertions exponential due to
exponential number of
potentially spurious insertions
• Deletions perform well due
to hierarchy of many to one
relationships and large
pruning benefit
Number of Joins
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Update Translation Results
29
• Instance: TPC-H
• Insertions: high degree
polynomial
Execution Time (sec)
• Wasteful to consider
translations of little interest
• Static Tables Heuristic: Only
generate tuples/free
variables for a subset of
relations
• Deletions perform well due
to optimizations available due
to relational normalization
Number of Insertions/Deletions
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Conclusions
30


System for coordinating Base – Contingent data sources
with declarative mappings
Three stage approach to the data coordination
problem
View Differencing
 Update Translation
 User disambiguation



Adaptation of view maintenance for view differencing
Find all feasible update translations using incomplete
information


Insertions, deletions, and the combination
Implementation demonstrating feasibility and useful
optimizations/heuristics
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
View Differencing Summary
31


MAC – sort time dominates
IVM-VD – query size dominates
MAC
IVM-VD
Arbitrary queries (subqueries, recursion,
etc)
Conjunctive queries with union, negation,
aggregation
Requires Vt, Bt+1
Requires (B+, B-), Bt, Vt, Bt+1
Better for large updates (> 2.5%)
Better for small updates
Better for large queries
Better for small queries
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Tuple Generating Dependency
Formulation
32

V = qC(C) corresponds to 2 TGDs
V(x)  QC(x,
y)
QC(x, y) 
V(x)
Insertion TGD
(violated by V+(x))
Deletion TGD
(violated by V-(x))
(QC – Conjunction of relational predicates)
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Deletions
33
QC(x, y)  V(x)
V-(x)  !QC(x,
y)
e.g. V-(x1, x2)  !C(x1, y) v !C(y, x2)
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Deletions
34
V-(0, 2)  C-(0, y) or C-(y, 2) (for all y)
V-
x1
x2
0
2
0
3
y=1
8
a
b
0
1
1
2
0
8
8
2
1
3
M. Lawrence, R. Pottinger, S. Staub-French
C
OR
AND
OR
2011/08/31
Deletion Translation (Overview)
35

Use contrapositive of deletion TGD
 V-(x)


 !QC(x, y)
Formulate expression for minimal deletions
Recursive search w/pruning for feasible solutions
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Deletions
36

Build expression in conjunctive normal form
 e.g.


(C(0, 1) or C(1, 2)) and (C(0, 8) or C(8, 2) …)
Recursively try every combination
Prune infeasible combinations
 i.e.
causing spurious deletions
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Optimizations
37

Redundancy in constrain step
z

≠ 2 AND (z ≠ 2 OR z ≠ 3)
Redundancy in deletions
 {C(0,
8), C(1, 2)} OR {C(0, 8), C(8, 2)}
 Worse with multiple deleted tuples
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Generalizing
38

Arithmetic comparisons
 V(x1,
x2) :- C(x1, y), C(y, x2), y > 4
 Afrati, Li, Pavlaki EDBT 2008
 Makes constrain step more difficult

Sets of constraints
 Conflicting

updates
Approximate solutions
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31
Extending
39




Ranking
Heuristics
Semantics
Issues Arising over Time
M. Lawrence, R. Pottinger, S. Staub-French
2011/08/31