A DICHOTOMY ON THE COMPLEXITY OF
CONSISTENT QUERY ANSWERING FOR
ATOMS WITH SIMPLE KEYS
Paris Koutris
Dan Suciu
University of Washington
REPAIRS
• An uncertain instance I for a schema with key constraints
• A repair r of I is a subinstance of I that satisfies the key
constraints and is maximal
The 4 possible repairs
R(x, y)     Repair 1    Repair 2    Repair 3    Repair 4
(a1, b1)    (a1, b1)    (a1, b1)
(a1, b2)                            (a1, b2)    (a1, b2)
(a2, b2)    (a2, b2)    (a2, b2)    (a2, b2)    (a2, b2)
(a3, b3)    (a3, b3)                (a3, b3)
(a3, b4)                (a3, b4)                (a3, b4)
(a4, b4)    (a4, b4)    (a4, b4)    (a4, b4)    (a4, b4)
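A minimal sketch of how the repairs of this instance can be enumerated, assuming the key of R(x, y) is its first attribute x (the helper name `repairs` is hypothetical, not from the slides). A repair keeps exactly one tuple per key value, so the repairs are the Cartesian product of the key groups.

```python
from collections import defaultdict
from itertools import product

def repairs(instance):
    """Yield every repair of a relation whose key is the first attribute (assumption)."""
    blocks = defaultdict(list)
    for tup in instance:
        blocks[tup[0]].append(tup)  # group tuples that share a key value
    # A maximal consistent subinstance keeps exactly one tuple from each group.
    for choice in product(*blocks.values()):
        yield set(choice)

R = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"),
     ("a3", "b3"), ("a3", "b4"), ("a4", "b4")]
all_repairs = list(repairs(R))
```

Only the keys a1 and a3 have two candidate tuples, which is why the count is 2 x 2 = 4.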
2
CONSISTENT QUERY ANSWERING
• If Q is boolean, we say that I is certain for Q, I |= Q, if for
every repair r of I, Q(r) is true
R(x, y)     S(y, z)
(a1, b1)    (b1, c1)
(a1, b2)    (b2, c1)
(a2, b2)    (b2, c2)
(a3, b3)    (b3, c3)
(a3, b4)
(a4, b4)
• Q() = R(x, y), S(y, z)
• I |= Q
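The claim I |= Q on this slide can be checked by brute force. The sketch below assumes the key of each relation is its first attribute, and it enumerates all repairs, so it is exponential in general and only meant for tiny instances. The names `repairs`, `q` and `certain` are hypothetical.

```python
from collections import defaultdict
from itertools import product

def repairs(instance):
    """Yield every repair of a relation whose key is the first attribute (assumption)."""
    blocks = defaultdict(list)
    for tup in instance:
        blocks[tup[0]].append(tup)
    for choice in product(*blocks.values()):
        yield set(choice)

def q(r_repair, s_repair):
    """Q() = R(x, y), S(y, z): true if some R tuple joins some S tuple on y."""
    s_keys = {y for (y, _) in s_repair}
    return any(y in s_keys for (_, y) in r_repair)

def certain(R, S):
    """I |= Q: Q holds in every repair of the instance."""
    return all(q(r, s) for r in repairs(R) for s in repairs(S))

R = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"),
     ("a3", "b3"), ("a3", "b4"), ("a4", "b4")]
S = [("b1", "c1"), ("b2", "c1"), ("b2", "c2"), ("b3", "c3")]
is_certain = certain(R, S)
```

Whichever tuple a repair keeps for a1, its y value (b1 or b2) still has an S tuple in every repair of S, so the query is certain here.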
3
PROBLEM STATEMENT
CERTAINTY(Q): For a fixed boolean CQ Q, given an instance I as
input, does I |= Q?
• In general, CERTAINTY(Q) is in coNP
– Q1 = R(x, y), S(y, z) : expressible as a first-order query
– Q2 = R(x, y), S(z, y) : coNP-complete
– Q3 = R(x, y), S(y, x) : PTIME but not first-order expressible
Conjecture For every boolean conjunctive query Q,
CERTAINTY(Q) is either in PTIME or coNP-complete
4
PROGRESS SO FAR
• [Wijsen, 2010]
– Syntactic characterization of FO-expressible acyclic CQs w/o self-joins
• [Kolaitis and Pema, 2012]
– A trichotomy for CQs with 2 atoms and no self-joins
• [Wijsen, 2010 & 2013]
– PTIME algorithm for cyclic queries: Ck = R1(x1,x2), …, Rk(xk, x1)
– Further classification of acyclic CQs w/o self-joins
5
OUR CONTRIBUTION
A dichotomy for CQs w/o self-joins where atoms have either
• Simple keys (a single attribute): R(x, y, z)
• Keys that consist of all attributes: S(x, y, z)
Theorem For every boolean CQ Q w/o self-joins where,
for each atom, the key consists of either one attribute or
all attributes, CERTAINTY(Q) is either in PTIME or
coNP-complete
6
OUTLINE
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs
7
THE QUERY GRAPH
• We equivalently study boolean CQs consisting only of
binary relations where one attribute is the key: R(x, y)
• Relations can be consistent (Rc) or inconsistent (Ri)
Query Graph: a directed edge (u, v) for each atom R(u,v)
Q = Ri(x, y), Si(z, w), Tc(y, w)
G[Q]: edges R: x -> y, S: z -> w, T: y -> w
For the edge R: source node uR = x, end node vR = y
8
DEFINITIONS
• x+,R: the set of nodes reachable from node x through a directed
path once the edge R is removed
• R ~ S [source-equivalent]: the source nodes uR, uS are in the
same SCC
• [R]: the equivalence class of R w.r.t. ~
[Figure: example graph over the nodes u, v, w, x, y, z with edges R, S, T, U, V]
• x+,R = {x, v, w}
• R ~ T and [R] = {R, T}
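The definitions above can be sketched in a few lines. The example uses the query graph of Q = Ri(x, y), Si(z, w), Tc(y, w) and the cycle R(x, y), S(y, z), T(z, x), both taken from other slides. The function names are hypothetical, and "same SCC" is tested as mutual reachability of the source nodes.

```python
def reach(edges, start, removed=None):
    """x+,R: nodes reachable from `start` by a directed path, ignoring edge `removed`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for name, (u, v) in edges.items():
            if name != removed and u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def source_equivalent(edges, r, s):
    """R ~ S: the source nodes uR and uS are mutually reachable (same SCC)."""
    ur, us = edges[r][0], edges[s][0]
    return us in reach(edges, ur) and ur in reach(edges, us)

def equivalence_class(edges, r):
    """[R]: all edges that are source-equivalent to R."""
    return {s for s in edges if source_equivalent(edges, r, s)}

# Edge name -> (source node, end node)
G = {"R": ("x", "y"), "S": ("z", "w"), "T": ("y", "w")}
C3 = {"R": ("x", "y"), "S": ("y", "z"), "T": ("z", "x")}
```

In G no two source nodes share an SCC, so every class is a singleton. In the cycle C3 all three edges fall into one class.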
9
COUPLED EDGES
coupled+(R) = the edges in [R], plus every inconsistent edge S whose
source node uS is connected to the end node vR through an
(undirected) path that does not intersect uR+,R
[Figure: the example graph, with x = uR, y = vR, u = uV and the set uR+,R marked]
coupled+(R):
• contains R, T: [R] = {R, T}
• contains V: path from y (= vR) to u (= uV)
• does not contain U
10
SPLITTABLE GRAPHS
• Two inconsistent edges R, S are coupled if
– S in coupled+(R) & R in coupled+(S)
• A graph G[Q] is:
– unsplittable if it contains a pair of coupled edges that are not
source-equivalent.
– splittable otherwise
[Figure: the example graph with edges R, S, T, U, V]
coupled+(R) = {R, T, V}
coupled+(T) = {R, T, V}
coupled+(V) = {V}
coupled+(U) = {U, V, R, T}
Only R, T are coupled, and R ~ T
SPLITTABLE!
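A small sketch of the final check on this slide, taking the coupled+ sets and the class [R] = {R, T} as given rather than computing them from the graph. The variable and function names are hypothetical. Edges not listed in `classes` are treated as being alone in their class, which is an assumption that does not affect the result here.

```python
from itertools import combinations

# coupled+ sets as listed on the slide
coupled_plus = {
    "R": {"R", "T", "V"},
    "T": {"R", "T", "V"},
    "V": {"V"},
    "U": {"U", "V", "R", "T"},
}
# Source-equivalence classes stated on the slides
classes = [{"R", "T"}]

def coupled_pairs(cp):
    """Pairs {R, S} with S in coupled+(R) and R in coupled+(S)."""
    return {frozenset((r, s)) for r, s in combinations(sorted(cp), 2)
            if s in cp[r] and r in cp[s]}

def source_equivalent(r, s):
    return any(r in c and s in c for c in classes)

pairs = coupled_pairs(coupled_plus)
# Splittable: no coupled pair that fails to be source-equivalent.
is_splittable = all(source_equivalent(*sorted(p)) for p in pairs)
```

V is in coupled+(R) but R is not in coupled+(V), so that pair is not coupled. The same asymmetry rules out every pair involving U.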
11
THE DICHOTOMY CONDITION
Dichotomy Theorem
• If G[Q] is splittable, CERTAINTY(Q) is in PTIME
• If G[Q] is unsplittable, CERTAINTY(Q) is coNP-complete
[Figure: the example graph with edges R, S, T, U, V]
Splittable, so in PTIME
12
EXAMPLES
• R(x, y), S(y, z): PTIME
• R(x, y), S(y, z), Tc(x, z): coNP-complete
• R(x, y), S(z, y), Uc(y, z): coNP-complete
• R(x, y), S(y, z), Uc(z, y): PTIME
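The four examples can be checked mechanically. The sketch below encodes one reading of the definitions from the earlier slides: the undirected path may use any edge of G[Q], and none of its nodes, endpoints included, may lie in uR+,R. All function names are hypothetical, and this is not presented as the authors' implementation.

```python
def reach(edges, start, removed=None):
    """Nodes reachable from `start` by a directed path, ignoring edge `removed`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for name, (u, v, _) in edges.items():
            if name != removed and u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def connected_avoiding(edges, a, b, forbidden):
    """Is there an undirected path from a to b that uses no node in `forbidden`?"""
    if a in forbidden or b in forbidden:
        return False
    seen, stack = {a}, [a]
    while stack:
        node = stack.pop()
        for u, v, _ in edges.values():
            for here, there in ((u, v), (v, u)):
                if here == node and there not in seen and there not in forbidden:
                    seen.add(there)
                    stack.append(there)
    return b in seen

def source_equivalent(edges, r, s):
    ur, us = edges[r][0], edges[s][0]
    return us in reach(edges, ur) and ur in reach(edges, us)

def coupled_plus(edges, r):
    ur, vr, _ = edges[r]
    forbidden = reach(edges, ur, removed=r)  # the set uR+,R
    result = {s for s in edges if source_equivalent(edges, r, s)}  # [R]
    for s, (us, _, consistent) in edges.items():
        if not consistent and connected_avoiding(edges, us, vr, forbidden):
            result.add(s)
    return result

def splittable(edges):
    inconsistent = [e for e, (_, _, c) in edges.items() if not c]
    for r in inconsistent:
        for s in inconsistent:
            if (r != s and s in coupled_plus(edges, r)
                    and r in coupled_plus(edges, s)
                    and not source_equivalent(edges, r, s)):
                return False  # coupled but not source-equivalent
    return True

# Edge name -> (source node, end node, is_consistent)
ex1 = {"R": ("x", "y", False), "S": ("y", "z", False)}
ex2 = {"R": ("x", "y", False), "S": ("y", "z", False), "T": ("x", "z", True)}
ex3 = {"R": ("x", "y", False), "S": ("z", "y", False), "U": ("y", "z", True)}
ex4 = {"R": ("x", "y", False), "S": ("y", "z", False), "U": ("z", "y", True)}
```

Under this reading, `splittable` agrees with the four labels on the slide: splittable graphs are the PTIME cases, unsplittable graphs the coNP-complete ones.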
13
OUTLINE
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs
14
FRUGAL REPAIRS (1)
Definition A repair r of an instance I is frugal for a
boolean query Q if there is no other repair r’ of I such that Qf(r’) is
strictly contained in Qf(r)
Qf = the full query of Q: all body variables are moved to the head
R(x, y)     S(y, x)
(a1, b1)    (b1, a1)
(a1, b2)
(a2, b3)    (b3, a2)
(a3, b4)    (b4, a3)
(a4, b4)    (b4, a4)
repair r1 = { R(a1, b1), R(a2, b3), R(a3, b4), R(a4, b4), S(b1, a1), S(b3, a2), S(b4, a3) }
Qf(r1) = { (a1, b1), (a2, b3), (a3, b4) }: not frugal
repair r2 = { R(a1, b2), R(a2, b3), R(a3, b4), R(a4, b4), S(b1, a1), S(b3, a2), S(b4, a3) }
Qf(r2) = { (a2, b3), (a3, b4) }: frugal
15
FRUGAL REPAIRS (2)
• I |= Q if and only if every frugal repair satisfies Q
• We lose no generality if we study only frugal repairs!
R(x, y)     S(y, x)
(a1, b1)    (b1, a1)
(a1, b2)
(a2, b3)    (b3, a2)
(a3, b4)    (b4, a3)
(a4, b4)    (b4, a4)
Only two frugal repairs:
• Qf(r2) = {(a2, b3), (a3, b4)}
• Qf(r3) = {(a2, b3), (a4, b4)}
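A brute-force sketch of the frugal answer sets for this instance, assuming the key of each relation is its first attribute (x for R, y for S). It evaluates Qf on every repair and keeps the answer sets that have no strict subset among the others. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def repairs(instance):
    """Yield every repair of a relation whose key is the first attribute (assumption)."""
    blocks = defaultdict(list)
    for tup in instance:
        blocks[tup[0]].append(tup)
    for choice in product(*blocks.values()):
        yield set(choice)

def full_answers(r_repair, s_repair):
    """Qf for Q = R(x, y), S(y, x): all (x, y) with R(x, y) and S(y, x)."""
    return frozenset((x, y) for (x, y) in r_repair if (y, x) in s_repair)

def frugal_answer_sets(R, S):
    answers = {full_answers(r, s) for r in repairs(R) for s in repairs(S)}
    # Keep the answer sets that do not strictly contain another answer set.
    return {a for a in answers if not any(b < a for b in answers)}

R = [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4"), ("a4", "b4")]
S = [("b1", "a1"), ("b3", "a2"), ("b4", "a3"), ("b4", "a4")]
frugal = frugal_answer_sets(R, S)
# I |= Q iff every frugal repair satisfies Q, i.e. no frugal answer set is empty.
is_certain = all(len(a) > 0 for a in frugal)
```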
16
OR-SETS
• Efficiently represent all answer sets of frugal repairs
• We use or-sets: <1, 2, 3> means 1 or 2 or 3
– A = < {1, 3}, {1, 4}, {2, 3}, {2, 4} >
– We can “compress” A as B = {<1, 2>, <3, 4>}
– [Libkin and Wong, ‘93] “decompression” α operator: α(B) = A
• The or-set of answer sets for frugal repairs of I for Q:
– MQ(I) = < {(a2, b3), (a3, b4)}, {(a2, b3), (a4, b4)} >
• Compressed form (set of or-sets):
– AQ(I) = { < (a2, b3) >, < (a3, b4), (a4, b4) > }
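The decompression α can be read as a Cartesian product: pick one element from each or-set in every possible way. A minimal sketch, with or-sets modelled as Python lists and the function name `alpha` chosen for illustration:

```python
from itertools import product

def alpha(set_of_orsets):
    """Decompress a set of or-sets into an or-set of sets (one choice per or-set)."""
    return {frozenset(choice) for choice in product(*set_of_orsets)}

B = [[1, 2], [3, 4]]
A = alpha(B)

AQ = [[("a2", "b3")], [("a3", "b4"), ("a4", "b4")]]
MQ = alpha(AQ)
```

For an empty compression, `alpha([])` returns the or-set containing only the empty set, which corresponds to a frugal repair with no answers.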
17
REPRESENTABILITY (1)
• An or-set-of-sets S is representable if there exists a set-of-or-sets S0 (compression) such that:
– α(S0) = S
– For any distinct or-sets A, B in S0, the tuples in A and B use distinct
constants in all coordinates
• The compression of a representable set with active domain
of size n has size polynomial in n
< {(a2, b3), (a3, b4)}, {(a2, b3), (a4, b4)} >
compression: {< (a2, b3) >, < (a3, b4), (a4, b4) >}
< {(a2, b3), (a3, b4)}, {(a2, b2), (a4, b4)} >
not representable
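The second condition of representability can be sketched as a pairwise check. This assumes the reading that tuples taken from two different or-sets must differ in every coordinate. The function name is hypothetical.

```python
def uses_distinct_constants(compression):
    """True if tuples from different or-sets never share a constant in any coordinate."""
    for i, first in enumerate(compression):
        for second in compression[i + 1:]:
            for s in first:
                for t in second:
                    if any(x == y for x, y in zip(s, t)):
                        return False
    return True

good = [[("a2", "b3")], [("a3", "b4"), ("a4", "b4")]]
bad = [[("a2", "b3")], [("a2", "b2")]]
```

Tuples inside the same or-set may share constants, as (a3, b4) and (a4, b4) do. Only tuples from different or-sets are compared.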
18
REPRESENTABILITY (2)
• I |= Q iff the compression AQ(I) is not empty
• If we can compute AQ(I) in polynomial time, deciding
whether I |= Q is in PTIME
Theorem If G[Q] is a strongly connected graph, MQ(I) is
representable and its compression can be computed in
polynomial time in the size of I
19
OUTLINE
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs
20
CYCLES
• Ck = R1(x1, x2), R2(x2, x3), …, Rk(xk, x1)
• The purified instance contains a
collection of disjoint SCCs
• ALGORITHM FrugalC
– Find the SCCs that contain no directed
cycle of length > k
– For each such SCC i, create an or-set Ai
that contains all cycles of length k
– Output ACk(I) = {A1, A2, …}
R(x, y)     S(y, z)     T(z, x)
(a1, b1)    (b1, c1)    (c1, a1)
(a2, b2)    (b2, c2)    (c2, a2)
(a2, b3)    (b3, c2)
[Figure: the instance drawn as a graph over the values a1, b1, c1, a2, b2, b3, c2]
AC3(I) = {<(a1, b1, c1)>, <(a2, b2, c2), (a2, b3, c2)>}
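The three steps of FrugalC can be sketched by brute force. The sketch assumes the input is already purified, models each value as a node tagged with its position in the cycle, and enumerates simple cycles by depth-first search, so it is not the polynomial-time procedure itself. The name `frugal_c` and the node encoding are hypothetical.

```python
def frugal_c(relations):
    """relations: [R1, ..., Rk], each a set of pairs, in cycle order."""
    k = len(relations)
    succ = {}
    for i, rel in enumerate(relations):
        for a, b in rel:
            succ.setdefault((i, a), set()).add(((i + 1) % k, b))
            succ.setdefault(((i + 1) % k, b), set())

    def reach(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in succ[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # SCCs via mutual reachability (quadratic, fine for a sketch).
    reachable = {n: reach(n) for n in succ}
    sccs = {frozenset(m for m in reachable[n] if n in reachable[m]) for n in succ}

    def simple_cycles(start, scc):
        """Yield every simple cycle through `start` that stays inside `scc`."""
        def walk(path):
            for nxt in succ[path[-1]]:
                if nxt == start:
                    yield tuple(path)
                elif nxt in scc and nxt not in path:
                    yield from walk(path + [nxt])
        yield from walk([start])

    result = set()
    for scc in sccs:
        # Every cycle passes through a position-0 node, so start only there.
        cycles = [c for n in scc if n[0] == 0 for c in simple_cycles(n, scc)]
        # Assumption: SCCs without any cycle are skipped; SCCs with a cycle
        # longer than k contribute no or-set.
        if cycles and all(len(c) == k for c in cycles):
            result.add(frozenset(tuple(v for _, v in c) for c in cycles))
    return result

R = {("a1", "b1"), ("a2", "b2"), ("a2", "b3")}
S = {("b1", "c1"), ("b2", "c2"), ("b3", "c2")}
T = {("c1", "a1"), ("c2", "a2")}
AC3 = frugal_c([R, S, T])
```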
21
GENERAL CASE: SCCS (1)
• Recursively split an SCC G into an SCC G’ and a directed
path P that intersects G’ only at its start and end node
• The set AG’(I) can be recursively computed
[Figure: graph G over the nodes x, y, z, t with edges R, S, T, U, V, showing the sub-SCC G’]
The path P = y -> t -> z
AG’(I) = {A1, A2}, where A1 = <(a1, b1, c1)> and A2 = <(a2, b2, c2), (a2, b3, c2)>
22
GENERAL CASE: SCCS (2)
AG’(I) = {A1, A2}, where A1 = <(a1, b1, c1)> and A2 = <(a2, b2, c2), (a2, b3, c2)>
Any value belongs to a unique or-set
B(a, b)           B1c(b, y)         B2c(b, z)         B0c(z, a)
(A1, [a1b1c1])    ([a1b1c1], b1)    ([a1b1c1], c1)    (c1, A1)
(A2, [a2b2c2])    ([a2b2c2], b2)    ([a2b2c2], c2)    (c2, A2)
(A2, [a2b3c2])    ([a2b3c2], b3)    ([a2b3c2], c2)
Replacement of G’: a cycle C = a -> b -> y -> t -> z -> a, plus a
chord B2 that is a consistent relation
[Figure: the cycle C over the nodes a, b, y, t, z with edges B, B0c, B1c, B2c, U, V]
23
REST OF THE PROOF
• PTIME algorithm for splittable graphs
– Find a separator in G[Q] (always exists if a graph is splittable)
– The separator splits G[Q] into cases with fewer inconsistent edges,
which are solved recursively
– Base case: all edges are consistent (check whether Q(I) is true)
• coNP-hardness
– Reduction from the Monotone-3SAT problem
24
CONCLUSIONS
• Significant progress towards proving the dichotomy for the
complexity of Consistent Query Answering for Conjunctive
Queries
• Open problem: settle the dichotomy (or trichotomy), even for queries with
self-joins
25
Thank you!
26