On the Inverse rules algorithm

It is guaranteed to compute the certain answers
But, what about its efficiency?
As presented, it computes tuples using views that cannot
contribute to the rewriting, and then discards these tuples
We show examples, and then how to address the problems
2005
lav-iv
1
Example :
A db: parenthood relation
par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren
The algorithm inverts the view:
par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
Given n tuples in the view, it produces 2n tuples, then joins them, then
discards the results that contain f(-,-)
The bucket algorithm will spend more time on rewriting, find:
Q’(X, Y) :- v(X, Y)
And then output the n results
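The wasted work in this example can be illustrated with a minimal Python sketch. It assumes the Skolem term f(C, G) is represented as a tagged tuple; the names invert_view, is_skolem and grandparents, and the sample view tuples, are hypothetical.

```python
def invert_view(view_tuples):
    """Apply the inverse rules par(C, f(C,G)) and par(f(C,G), G) to each v(C, G)."""
    par = set()
    for c, g in view_tuples:
        skolem = ("f", c, g)      # stands for the function term f(C, G)
        par.add((c, skolem))
        par.add((skolem, g))
    return par

def is_skolem(term):
    """True for the tagged tuples that represent f(-,-) terms."""
    return isinstance(term, tuple) and term[:1] == ("f",)

def grandparents(par):
    """Evaluate q(X, Y) :- par(X, Z), par(Z, Y); drop answers containing f(-,-)."""
    answers = set()
    for x, z in par:
        for z2, y in par:
            if z == z2 and not is_skolem(x) and not is_skolem(y):
                answers.add((x, y))
    return answers

view = {("ann", "carl"), ("bob", "dana")}   # hypothetical contents of v
facts = invert_view(view)
print(len(facts))                    # 4, i.e. 2n derived par facts
print(grandparents(facts) == view)   # True: the join only recovers v itself
```

The join over the derived facts returns exactly the view tuples, which is what the rewriting Q’(X, Y) :- v(X, Y) reads off directly.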
Example (university db) :
Views:
v1(s, c, q, t) :- registered(s, c, q), course(c, t), c>=500, q>=a98
v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)
v3(s, c) :- registered(s, c, q), q<=a94
v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q<=a97
Query:
q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
Inverting v3:
registered(s, c, f(s, c)) :- v3(s, c)
This may produce any number of facts for registered, but for this
query none can be used – why?
v3(s, c) :- registered(s, c, q), q<=a94
q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
• How should the constraint on q in v3 be represented?
Could export it as f(s, c) <= a94 – then notice the conflict with f(s, c)
>= a95 in the query (how is q in the query transformed to f(s, c)?)
But what if the view contained no constraint?
The view must export variables constrained in the query
• The query has a join on q with teaches; teaches facts are
derived only from other views, so q will be exported as a
different function symbol, or as q (which of these here?)
⇒ the join will fail (cannot join f1(-,-) with f2(-,-) or with a regular variable)
⇒ The view must export join variables of the query
The factors that determine usability of a view are the same as in
the bucket algorithm, but the inverse rules algorithm tries to
use all views anyway
Solution: compose the query with the inverse rules, to obtain a new query
that uses the views directly
Composition:
Consider the heads of the inverse rules as a db – a collection of facts
Look for valuations – mappings of query variables that map the query
atoms into this db
Then replace the query goals by views
Example :
A db: parenthood relation
par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren
The algorithm inverts the view:
par(C, f(C, G)), par(f(C, G), G) :- v(C, G)
(the heads of these inverse rules are the ‘db’)
Two candidate valuation mappings:
X → C, Z → f(C, G), Y → G
⇒ q(C, G) :- v(C, G), v(C, G)
X → f(C, G), Z → G, Y → f(C, G) (assuming we add C = G)
⇒ q(f(G, G), f(G, G)) :- v(G, G), v(G, G)
The 2nd is discarded: function symbols are not allowed in the result
Minimization of the 1st gives q(C, G) :- v(C, G), same as bucket
q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95
registered(s, c, f(s, c)), f(s, c)<=a94 :- v3(s, c)
Any valuation that uses this fact must map q → f(s, c)
• The constraint f(s, c) <= a94 conflicts with f(s, c) >= a95,
but what if there is no constraint to export?
• The mapping q → f(s, c) cannot be used to map teaches to any
fact derived from other views
⇒ v3 cannot be used
A mapping will fail to define a valuation if
• a view does not export a join variable, and does not contain the
join (why?)
• The view does not export a variable that is constrained in the
query (cannot ‘check’ the constraint in the ‘db’)
Thus, the results (for a CQ query, possibly with constraints) will be
the same as for bucket (assuming it is correct & complete)
The amount of work invested will probably be similar
Composition can be performed also for Datalog queries, but
weeding out useless mappings is more difficult
The MiniCon algorithm --- the final one?
• Motivation
• Preliminaries
• The MiniCon algorithm
Motivation
The previous algorithms (bucket, inverse rules) may be quite
expensive to use, especially for systems with many views.
The bucket algorithm has a narrow peephole in the 1st stage – each
bucket is for a single atom
⇒ global constraints are treated only in the 2nd stage
⇒ many useless combinations may be examined
The inverse rules algorithm, improved by composition, seems to
perform similar work
The motivation: find an algorithm that will do more work in
preliminary filtering, and will scale up to hundreds of views
Preliminaries
The idea
• Once a view is put in a bucket of a query atom, switch to
considering join variables – and find which other atoms are
necessarily covered by the view
• Along the way, find out also which view head variables need
to be equated
• Given coverage by views, combine views with disjoint covers
Expected gain:
• more filtering in the 1st stage,
• better representation of information
⇒ A smaller number of combinations, reduced number of
containment checks in the 2nd stage
Example :
A db: parenthood relation
par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
Bucket : one view in each bucket
par(X, Z): {v(X,G)}
par(Z, Y): {v(P, Y)}
When the two view atoms are combined, a containment check
discovers that G=Y ⇒ containment, & redundancy of the 2nd atom
Alternative: given par(X, Z): v(X,G), since Z (join var) occurs in the 2nd
atom of the query, add par(Z, Y) to the coverage of v(X,G), with G=Y
In the 2nd stage, just use v(X, Y)
Assumptions, terminology:
• CQ queries and views, for now: no constants / constraints in query/views
• View definitions use variables different from those in query or
other views (disjoint sets of variables)
• b(Q) – body atoms of Q, b(V) – body atoms of view V
• A mapping from vars(Q) to vars(V) is interesting only if it
maps a non-empty subset of b(Q) to b(V)
• Considered mappings always map Q head vars to V head vars –
head var preservation – (hvp)
• If h maps x in vars(Q) to an existential var in some V, then all
atoms of b(Q) that contain x must be mapped to same V:
join variable condition --- (jvc)
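The two conditions (hvp) and (jvc) can be stated as small checks. The following is a minimal Python sketch, assuming atoms are tuples of the form (predicate, arg1, arg2, ...), a mapping h is a dict from query variables to view variables, and no head variables are equated; the function names are hypothetical.

```python
def satisfies_hvp(h, q_head, v_head):
    """(hvp): every query head variable that h maps goes to a view head variable."""
    return all(h[x] in v_head for x in q_head if x in h)

def satisfies_jvc(h, covered, q_body, v_head):
    """(jvc): if h(x) is existential, every query atom containing x is covered."""
    for x, y in h.items():
        if y in v_head:
            continue  # head variables are exported, nothing to check
        if any(x in atom[1:] and atom not in covered for atom in q_body):
            return False
    return True

# Q: q(X, Y) :- par(X, Z), par(Z, Y);  V: v(C, G) :- par(C, P), par(P, G)
q_head, v_head = {"X", "Y"}, {"C", "G"}
q_body = [("par", "X", "Z"), ("par", "Z", "Y")]

partial = {"X": "C", "Z": "P"}          # covers only par(X, Z)
full = {"X": "C", "Z": "P", "Y": "G"}   # covers both atoms

print(satisfies_hvp(full, q_head, v_head))                   # True
print(satisfies_jvc(partial, {q_body[0]}, q_body, v_head))   # False: Z -> P, par(Z, Y) uncovered
print(satisfies_jvc(full, set(q_body), q_body, v_head))      # True
```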
Given Q(X), assume Q’ is a rewriting in terms of views
Q’: q(X) :- v1(X1), …, vn(Xn)
(some vi, vj may be occurrences of same view v)
There exists a containment mapping h from Q to exp(Q’) (it satisfies hvp)
Let
• Gi be the set of atoms of b(Q) mapped to b(exp(vi))
• h/i – h restricted to vars(Gi)
Then
Gi ∩ Gj = ∅ (for i ≠ j), ∪ Gi = b(Q)
And
Gi satisfies (jvc):
if h/i maps x of vars(Gi) to an existential variable of vi,
then every atom g in b(Q) that contains x is in Gi
The occurrence of vi in Q’ may have some head variables equated
Example :
the original head might be vi(A, B, C)
the head in Q’ : vi(X, X, Z)
These equalities are given by a unique least set of equality
constraints Ei
(v/E -- the view v, with head variables equated as specified by E)
Summary (so far): the containment mapping can be decomposed
into “disjoint” components (vi, Ei, h/i , Gi)
All we need to do is find such components, then combine them
What is the condition for successful combination?
Does a combination (s.t. Gi ∩ Gj = ∅, ∪ Gi = b(Q)) ever fail?
To find such components, we must use the given view definitions
(variables different from those of Q or exp(Q’)).
Answer : a component and its mapping can be expressed as a composition:
Gi --hi--> vi/E’i --h’i--> exp(vi(Xi)), with h/i = h’i ∘ hi
Here:
• hi is a mapping from Q to the given view definition for vi
• E’i – the least set of equalities that make hi a good mapping
• h’i is a variable renaming
E’i and hi depend only on Q and the definition of vi
⇒ We can find component mappings from Q to the view defs,
then combine & rename, possibly equating more head vars
One more step :
A component (vi, Ei, hi , Gi) may be further decomposed into
smaller components (vi, Ei1, hi1 , Gi1), (vi, Ei2, hi2 , Gi2)
provided
• each of Gi1, Gi2 satisfies (jvc), and they are disjoint
• Each of Ei1, Ei2 is a subset of Ei, least sets for the mappings
hi1, hi2 to be ok
When these are combined, Ei1 union Ei2 is augmented with the
remaining equalities of Ei
Minimal such components:
• Easier to find
• Can be re-used for different combinations.
What is a minimal component?
C = (vi, Ei, hi, Gi) is minimal if
• hi satisfies (hvp) + (jvc) (assuming the equalities in Ei)
• There is no component C1 whose last three components are
contained in C’s last three components (at least one is proper
containment)
A component: minicon (mini containment) description -- MCD
The algorithm constructs and combines minimal MCDs
The MiniCon Algorithm
Minimal MCD Construction Algorithm :
For each g in b(Q), each k in each b(vi)
    Let E(g,k) be the least set of equalities s.t. a mapping h(g,k)
    from g to k that satisfies (hvp) exists
    // E(g,k) and h(g,k), if they exist,
    // are uniquely determined by g, k
    If E(g,k) and h(g,k) exist
        find all minimal MCDs that extend them:
        (vi, Ei, hi, Gi) extends if
        Ei contains E(g,k), hi contains h(g,k), Gi contains g
For the final set of MCDs remove duplicates
How do we find minimal MCDs that extend a given mapping?
I. Extension to one more query atom, one view atom
extend (vi, E, h, g, k)   // E: equalities on head vars of vi
                          // h: vars(Q) → vars(vi), partial, hvp with E
                          // g in b(Q), k in b(vi)
    try to extend h to map g to k, with hvp, by adding equalities to E
    return fail, or the (uniquely determined) E’, h’
(The first step in the algorithm of the previous page is this one, given empty E and h)
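A minimal Python sketch of this one-atom extension step follows. It rests on several assumptions that are not in the slides: atoms are tuples (predicate, args...), h is a dict, E is a dict from a head variable to the variable it is equated with, and the head variables of the query and of the view are passed explicitly instead of the view vi itself. The helper find is hypothetical.

```python
def find(E, v):
    """Representative of view variable v under the equalities E (dict: var -> parent)."""
    while v in E:
        v = E[v]
    return v

def extend(q_head, v_head, E, h, g, k):
    """Try to extend h so that query atom g maps onto view atom k.

    Returns (E2, h2) on success and None on failure; E and h are not modified.
    """
    if g[0] != k[0] or len(g) != len(k):
        return None
    E2, h2 = dict(E), dict(h)
    for x, y in zip(g[1:], k[1:]):
        if x in q_head and y not in v_head:
            return None                  # (hvp) would be violated
        if x not in h2:
            h2[x] = y
            continue
        a, b = find(E2, h2[x]), find(E2, y)
        if a != b:
            if h2[x] not in v_head or y not in v_head:
                return None              # only head variables may be equated
            E2[a] = b                    # record the new equality
    return E2, h2

# Q: q(X, X) :- par(X, Z), par(Z, X);  V: v(C, G) :- par(C, P), par(P, G)
q_head, v_head = {"X"}, {"C", "G"}
step1 = extend(q_head, v_head, {}, {}, ("par", "X", "Z"), ("par", "C", "P"))
step2 = extend(q_head, v_head, *step1, ("par", "Z", "X"), ("par", "P", "G"))
print(step1)   # ({}, {'X': 'C', 'Z': 'P'})
print(step2)   # ({'C': 'G'}, {'X': 'C', 'Z': 'P'})
```

In the second step X is already mapped to C and now has to reach G as well; since both are head variables of the view, the equality C = G is added rather than failing.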
How do we find minimal MCDs that extend a given mapping?
II. Extend repeatedly, as long as needed and successful
Given vi, g, k, E(g,k) and h(g,k) :
Let C = {(vi, E(g,k), h(g,k), {g})}, MC = {}
// C – initial component, (jvc) possibly not satisfied
While C is not empty
    – remove some c = (vi, E, h, G) from C
    – if (jvc) is satisfied – put c in MC
    – if not, there exist x in vars(Q) s.t. h(x) is existential, and g’ that
      contains x, g’ not in G
        – for each k’ in b(vi)
            if extend(vi, E, h, g’, k’) succeeds, put the extension
            (with g’ added to G) in C
Remove duplicates from MC
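The worklist loop above can be sketched in Python for a single view. This is a sketch under assumptions, not the paper's implementation: atoms are tuples (predicate, args...), E is a dict of equated head variables, duplicate removal is purely syntactic, and the names find, extend, jvc_gap and form_mcds are hypothetical. A compact one-atom extension step is included so the block runs on its own.

```python
def find(E, v):
    """Representative of view variable v under equalities E (dict: var -> parent)."""
    while v in E:
        v = E[v]
    return v

def extend(q_head, v_head, E, h, g, k):
    """One-atom extension: map query atom g onto view atom k, or return None."""
    if g[0] != k[0] or len(g) != len(k):
        return None
    E2, h2 = dict(E), dict(h)
    for x, y in zip(g[1:], k[1:]):
        if x in q_head and y not in v_head:
            return None                      # (hvp) violated
        if x not in h2:
            h2[x] = y
            continue
        a, b = find(E2, h2[x]), find(E2, y)
        if a != b:
            if h2[x] not in v_head or y not in v_head:
                return None                  # only head variables may be equated
            E2[a] = b
    return E2, h2

def jvc_gap(h, G, q_body, v_head):
    """An uncovered query atom that (jvc) forces into G, or None if (jvc) holds."""
    for x, y in h.items():
        if y not in v_head:
            for atom in q_body:
                if x in atom[1:] and atom not in G:
                    return atom
    return None

def form_mcds(q_head, q_body, v_head, v_body):
    """Worklist formation of MCDs (E, h, G) for a single view."""
    mcds = []
    for g in q_body:
        for k in v_body:
            start = extend(q_head, v_head, {}, {}, g, k)
            work = [] if start is None else [(*start, frozenset([g]))]
            while work:
                E, h, G = work.pop()
                gap = jvc_gap(h, G, q_body, v_head)
                if gap is None:
                    mcd = (frozenset(E.items()), frozenset(h.items()), G)
                    if mcd not in mcds:      # remove duplicates
                        mcds.append(mcd)
                    continue
                for k2 in v_body:
                    ext = extend(q_head, v_head, E, h, gap, k2)
                    if ext is not None:
                        work.append((*ext, G | {gap}))
    return mcds

# Q: q(X, Y) :- par(X, Z), par(Z, Y), against two views with the same body
q_head = {"X", "Y"}
q_body = [("par", "X", "Z"), ("par", "Z", "Y")]
v_body = [("par", "C", "P"), ("par", "P", "G")]

grand = form_mcds(q_head, q_body, {"C", "G"}, v_body)    # view v(C, G)
parent = form_mcds(q_head, q_body, {"C", "P"}, v_body)   # view v(C, P)
print(len(grand), len(parent))   # 1 2
```

With head v(C, G) the single MCD covers both query atoms; with head v(C, P) there are two MCDs, each covering one atom.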
Example :
A db: parenthood relation
par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
• 1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = {}
need to extend to par(Z, Y), can only map to 2nd view atom
MCD: (v, E={}, h={X → C, Z → P, Y → G}, b(Q))
• 1st query atom, 2nd view atom: no mapping
…
The only MCD is the above
Comment :
In the paper, if (vi, Ei1, hi1, Gi1) and (vi, Ei2, hi2, Gi2) are both
minimal extensions, and Gi1 is contained in Gi2, then the 2nd
is thrown away (another minimization)
I do not know how to explain this optimization, or prove that
with it the algorithm is still complete
2nd phase: MCD combination, and variable renaming :
A set of MCDs {(vi, Ei, hi, Gi)} is a candidate if
Gi ∩ Gj = ∅ (for i ≠ j), ∪ Gi = b(Q)
For each candidate set:
Rename variables : for each view variable y :
If hi(x) = y for some query variable x, rename y to x
else rename y to a fresh distinct variable
Note : if x is in the domain of both hi, hj, then hi(x), hj(x) are head
variables of vi, vj (by def of MCD),
⇒ renaming makes them equal
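The combination phase can be sketched in Python as follows. The sketch assumes each MCD is a dict with hypothetical keys "view", "head", "h" and "G", that every Ei is empty, and that at most one query variable is mapped to each view head variable; fresh variables are named _F1, _F2, ... by convention of this example only.

```python
from itertools import combinations, count

def candidate_sets(mcds, q_body):
    """Yield tuples of MCDs whose covers are pairwise disjoint and union to b(Q)."""
    target = frozenset(q_body)
    for r in range(1, len(mcds) + 1):
        for combo in combinations(mcds, r):
            covers = [m["G"] for m in combo]
            # equal total size plus equal union means the covers partition b(Q)
            if (sum(len(c) for c in covers) == len(target)
                    and frozenset().union(*covers) == target):
                yield combo

def rewriting(combo):
    """Rename each view head variable to the query variable mapped to it, else a fresh one."""
    fresh = count(1)
    atoms = []
    for m in combo:
        inverse = {}
        for x, y in m["h"].items():
            inverse.setdefault(y, x)
        args = [inverse.get(y) or f"_F{next(fresh)}" for y in m["head"]]
        atoms.append((m["view"], *args))
    return atoms

# Q: q(X, Y) :- par(X, Z), par(Z, Y);  V: v(C, P) :- par(C, P), par(P, G)
q_body = [("par", "X", "Z"), ("par", "Z", "Y")]
a1 = {"view": "v", "head": ("C", "P"), "h": {"X": "C", "Z": "P"},
      "G": frozenset([q_body[0]])}
a2 = {"view": "v", "head": ("C", "P"), "h": {"Z": "C", "Y": "P"},
      "G": frozenset([q_body[1]])}

for combo in candidate_sets([a1, a2], q_body):
    print(rewriting(combo))   # [('v', 'X', 'Z'), ('v', 'Z', 'Y')]
```

Neither MCD covers b(Q) alone, so the only candidate set is the pair, and renaming gives the body v(X, Z), v(Z, Y).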
Example (cont’d):
A db: parenthood relation
par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCD: (v, E={}, h={X → C, Z → P, Y → G}, b(Q))
Rename in v: C to X, G to Y
Rewriting: q(X, Y) :- v(X, Y)
Example :
A db: parenthood relation
par(c, p)
A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren
A query: Q: q(X, X) :- par(X, Z), par(Z, X) // I am my own grandpa
MCDs:
• 1st query atom, 1st view atom: h(1,1) = {X → C, Z → P}, E(1,1) = {}
need to extend to par(Z, X), can only map to 2nd view atom
MCD: (v, {C=G}, {X → C, Z → P}, b(Q))
• 1st query atom, 2nd view atom: no mapping
…
The only MCD is the above
Example :
A db: parenthood relation
par(c, p)
A view: v(C, P) :- par(C, P), par(P, G) // parents where grandparents exist
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
• h(1,1) = {X → C, Z → P}, E(1,1) = {}
⇒ MCD A1 = ( v(C, P), {}, h(1,1), {par(X,Z)} )
• h(1,2) = {X → P, Z → G}, E(1,2) = {}, fails (why?)
• h(2,1) = {Z → C, Y → P}, E(2,1) = {}
⇒ MCD A2 = ( v(C, P), {}, h(2,1), {par(Z,Y)} )
• h(2,2) = {Z → P, Y → G}, fails (why?)
A view:
v(C, P) :- par(C, P), par(P, G)
A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)
MCDs:
A1 = ( v(C, P), {}, h(1,1), {par(X,Z)} )
A2 = ( v(C, P), {}, h(2,1), {par(Z,Y)} )
Rewritings: (rename views to have distinct vars)
A1+A2: X → C1, Z → P1, Z → C2, Y → P2 : add P1 (in 1st v) = C2 (in 2nd v)
rewriting: v(C1, P1), v(P1, P2)
renaming: v(X, Z), v(Z, Y) – a correct rewriting
When Q or views contain constants:
MCD formation:
• A constant a of Q must be mapped to a head variable of vi, or to itself
• If x is in headvar(Q), it can be mapped to headvar(vi) or to a constant a
• Whenever x is mapped to a constant a, hi records this fact
MCD combination:
If A1, A2 are defined on x, then allow also
• Both map x to a
• One maps x to a, the other to head var of view
• In either case, rename x to a in rewriting
When Q or views contain comparisons:
• If views contain comparisons, no change to algorithm
(it finds contained rewritings anyway)
• If Q contains comparisons, then there may be no Datalog
program that computes the certain answers (can express x != y)
But, we can expect that extending the algorithm for comparisons
will be a good heuristic, and will find certain answers in many
cases
When Q or views contain comparisons:
C(Q) – constraints of Q (closed under inference)
MCD formation: (vi, Ei, hi, Gi) (extend the join variable condition)
• If hi(x) is existential of vi, and c(x, y) in C(Q), then hi(y) is
defined
• C(vi) must imply all constraints in hi(C(Q)) that involve at
least one existential of vi
MCD combination:
Add all constraints of C(Q) not covered by those of the views