Testing of Clustering


On Proximity Oblivious Testing
Oded Goldreich - Weizmann Institute of Science
Dana Ron – Tel Aviv University
Property Testing: informal definition
A relaxation of decision problems:
For a fixed property P and any object O,
determine whether O has property P
or is far from having property P
(i.e., O is far from every object that has property P).
Focus: sub-linear time algorithms – performing the
task by inspecting the object at few locations.
Property Testing:
The standard (one-sided error) definition
A property $P = \bigcup_n P_n$, where $P_n$ is a set of functions
with domain $D_n$.
The (standard) tester gets explicit input $n$ and $\epsilon$,
and oracle access to a function $f$ with domain $D_n$.
• If $f \in P_n$ then $\Pr[T^f(n,\epsilon) \text{ accepts}] = 1$.
• If $f$ is $\epsilon$-far from $P_n$ then $\Pr[T^f(n,\epsilon) \text{ rejects}] > 2/3$.
(Distance is defined as fraction of disagreements.)
Focus: query complexity $q(n,\epsilon)=q(\epsilon)$ ($\ll |D_n|$)
Terminology: $\epsilon$ is called the proximity parameter.
How does a tester use the proximity parameter?
Some testers use the proximity parameter $\epsilon$ merely
to determine the number of times that a basic test is
performed, where the basic test is oblivious of the
proximity parameter.
We call such basic tests Proximity Oblivious Testers.
Example: the [Blum,Luby,Rubinfeld] (BLR) linearity tester
On input $n,\epsilon$ (and access to $f$),
repeat the following basic test $\Theta(1/\epsilon)$ times:
1. Select uniformly $x,y \in D_n$.
2. If $f(x) + f(y) \neq f(x+y)$ then reject.
If any basic test rejects then Reject; otherwise, Accept.
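A minimal Python sketch of the BLR tester above, assuming (for illustration only) that $D_n = \{0,1\}^n$, so that addition is bitwise XOR, that points are encoded as n-bit integers, and that f takes values in {0,1}. The repetition constant `c` and the function names are hypothetical choices, not part of the original tester.

```python
import math
import random
from typing import Callable

Oracle = Callable[[int], int]  # f: {0,1}^n -> {0,1}, points encoded as n-bit ints


def blr_basic_test(f: Oracle, n: int, rng: random.Random) -> bool:
    """One proximity-oblivious round (3 queries); returns True iff it accepts."""
    x = rng.getrandbits(n)
    y = rng.getrandbits(n)
    # Over GF(2), the check "f(x) + f(y) != f(x+y)" becomes an XOR comparison.
    return (f(x) ^ f(y)) == f(x ^ y)


def blr_tester(f: Oracle, n: int, eps: float, c: float = 3.0, seed=None) -> bool:
    """Repeat the basic test ceil(c/eps) times; accept iff every round accepts."""
    rng = random.Random(seed)
    rounds = math.ceil(c / eps)
    return all(blr_basic_test(f, n, rng) for _ in range(rounds))


if __name__ == "__main__":
    n = 8
    mask = 0b10110001
    linear = lambda x: bin(x & mask).count("1") % 2  # inner product with mask
    print(blr_tester(linear, n, eps=0.1, seed=0))  # a linear f is always accepted
```

Only `blr_tester` looks at $\epsilon$; the basic test is the proximity-oblivious part.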
Proximity Oblivious Testing
A property $P = \bigcup_n P_n$, where $P_n$ is a set of functions
with domain $D_n$.
A P.O. Tester (POT) gets explicit input $n$ (but not $\epsilon$),
and oracle access to a function $f$ with domain $D_n$.
• If $f \in P_n$ then $\Pr[T^f(n) \text{ accepts}] = 1$.
• If $f \notin P_n$ then $\Pr[T^f(n) \text{ rejects}] \ge \rho(\delta_P(f))$,
where $\delta_P(f)$ denotes the distance of $f$ from $P$
and $\rho : (0,1] \to (0,1]$ is a monotonically non-decreasing “detection rate”.
Note: A standard tester can be obtained by
repeating the POT (i.e., on proximity parameter $\epsilon$, repeat it
$\Theta(1/\rho(\epsilon))$ times).
Focus: constant query complexity q(n)=q
Questions Concerning POTs
1. Which “testable” properties have POTs?
2. How does the complexity of the standard tester
obtained by repeating the POT compare to the
complexity of the best possible standard tester?
Motivational discussion:
Property testing relates local views to global properties; POTs take this to an extreme (how does a constant-size
view relate to the distance to the property?).
Study of this subclass of testers (those obtained from
POTs) may shed light on property testing at large.
POTs appeared (implicitly) mainly for Algebraic Properties
(e.g., linearity and low-degree polynomials). Here we
focus on Graph Properties (in two standard models).
Models used for Testing Graph Properties
Dense Graphs Model
(graph is represented by n x n adjacency matrix)
• Queries: Is $(u,v) \in E$?
• Distance: Fraction of matrix modifications
(among $n^2$ entries)
• Suitable: Dense graphs
[Figure: n x n adjacency matrix; the entry in row u, column v is 1 when (u,v) is an edge]
G=(V,E) is represented by a function $f_G : [n] \times [n] \to \{0,1\}$.
Bounded-Degree Graphs Model
(graph is represented by n incidence lists of size d)
• Queries: Who is i’th neighbor of v?
• Distance: Fraction of modifications in lists
(among $dn$ entries)
• Suitable: (Almost)-regular sparse graphs
(in particular, constant-degree graphs)
[Figure: n incidence lists, one per vertex 1 … n, each with d slots 1 … d]
G=(V,E) is represented by a function $f_G : [n] \times [d] \to [n]$.
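To make the two representations and distance measures concrete, here is a Python sketch under conventions of our own (not from the talk): vertices are 0..n-1 rather than 1..n, an adjacency matrix is a list of lists, and an unused incidence-list slot holds `None`, a case the representation $[n] \times [d] \to [n]$ leaves implicit. The functions measure the distance between two given representations; the distance to a property would be the minimum over all representations having the property.

```python
from typing import List, Optional

AdjMatrix = List[List[int]]                 # dense model: a[u][v] = f_G(u, v) in {0, 1}
IncidenceLists = List[List[Optional[int]]]  # bounded-degree: lists[v][i] = i-th neighbor of v


def dense_distance(a: AdjMatrix, b: AdjMatrix) -> float:
    """Fraction of the n^2 matrix entries on which the two graphs disagree."""
    n = len(a)
    diff = sum(a[u][v] != b[u][v] for u in range(n) for v in range(n))
    return diff / (n * n)


def bounded_degree_distance(a: IncidenceLists, b: IncidenceLists, d: int) -> float:
    """Fraction of the d*n incidence-list entries on which the two graphs disagree."""
    n = len(a)
    diff = sum(a[v][i] != b[v][i] for v in range(n) for i in range(d))
    return diff / (d * n)


if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0 - 1 - 2
    print(dense_distance(triangle, path))      # 2 of the 9 entries differ

    tri_lists = [[1, 2], [0, 2], [0, 1]]
    path_lists = [[1, None], [0, 2], [1, None]]
    print(bounded_degree_distance(tri_lists, path_lists, d=2))  # 3 of 6 entries differ
```

Note that in the bounded-degree model the order of neighbors inside a list affects the count, since the distance is measured over list entries.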
Our Results
This talk
Dense graphs model:
- Give constant-query POTs for several natural graph properties
and prove matching lower bounds.
- Give an example of a natural property that has no constant-query POT.
- Characterize the class of graph properties that have constant-query
POTs: show that it equals the class of properties that correspond to induced
subgraph freeness. (Note: this class is quite restricted compared to the class of
properties having standard testers, as characterized by [Alon,Fischer,Newman,Shapira].)
Bounded-degree graphs model:
- Characterize the class of graph properties that have constant-query
POTs: show that it equals the class of properties that correspond to a certain
generalized notion of subgraph freeness (which includes induced and non-induced subgraph freeness, but also degree regularity, which is non-hereditary).
The dense graphs model:
Two simple examples
Recall: in this model a graph G=(V,E) is
represented by a function $f_G:[n]\times[n]\to\{0,1\}$.
Example 1: Clique. The property of being a clique
has a trivial single-query POT with $\rho(\delta)=\delta$.
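A Python sketch of the trivial single-query POT for Clique, assuming (our conventions) vertices 0..n-1, an oracle `f(u, v)` with no self-loops, and that a sampled diagonal entry $u=v$ is simply accepted. Under these conventions the rejection probability equals the fraction of off-diagonal non-edge entries among the $n^2$ entries, which is exactly the distance to being a clique, matching $\rho(\delta)=\delta$.

```python
import random
from typing import Callable

DenseOracle = Callable[[int, int], int]  # f_G(u, v) in {0, 1}, no self-loops


def clique_pot(f: DenseOracle, n: int, rng: random.Random) -> bool:
    """Single query: accept iff a uniformly random entry is consistent with a clique."""
    u, v = rng.randrange(n), rng.randrange(n)
    return u == v or f(u, v) == 1  # diagonal entries carry no information


def clique_rejection_probability(f: DenseOracle, n: int) -> float:
    """Exact rejection probability: off-diagonal non-edges among the n^2 entries."""
    bad = sum(1 for u in range(n) for v in range(n) if u != v and f(u, v) == 0)
    return bad / (n * n)
```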
Example 2: BiClique. The property of being a
biclique has a three-query POT with $\rho(\delta)=\delta$.
Select $s\in[n]$ arbitrarily, and random $u,v\in[n]$.
Accept iff the induced subgraph is a biclique
(i.e., has an even number of edges).
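A Python sketch of this three-query POT, assuming vertices 0..n-1 and an oracle with $f_G(x,x)=0$. For a biclique whose sides are given by one bit per vertex, $f_G(x,y)$ is the XOR of the two bits, so the three queried values always sum to an even number; this also covers repeated samples such as $u=s$. The helper that computes the exact rejection probability by enumerating all ordered pairs (u, v) is our own addition, useful for checking small examples.

```python
import random
from typing import Callable

DenseOracle = Callable[[int, int], int]  # f_G(u, v) in {0, 1}, with f_G(u, u) = 0


def biclique_pot(f: DenseOracle, n: int, s: int, rng: random.Random) -> bool:
    """s is fixed, u and v are uniform; accept iff the number of edges
    among the pairs {s,u}, {s,v}, {u,v} is even."""
    u, v = rng.randrange(n), rng.randrange(n)
    return (f(s, u) + f(s, v) + f(u, v)) % 2 == 0


def biclique_rejection_probability(f: DenseOracle, n: int, s: int) -> float:
    """Exact rejection probability for a fixed s, by enumerating all (u, v)."""
    odd = sum((f(s, u) + f(s, v) + f(u, v)) % 2 for u in range(n) for v in range(n))
    return odd / (n * n)
```

For example, on the triangle $K_3$ (at distance $2/9$ from a biclique, since deleting one edge leaves a path, which is $K_{1,2}$) with $s=0$, exactly the ordered pairs $(1,2)$ and $(2,1)$ are rejected.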
Example 2 continued
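Analysis technique:
s induces a partition of [n] into $\Gamma(s)$ (the neighbors of s) and $[n] \setminus \Gamma(s)$;
u and v check it.
Suppose that the graph is at
distance $\delta$ from Biclique. Since fixing every pair that violates this partition yields a biclique,
#edges within the same side + #non-edges between the sides $\ge \delta n^2$,
so w.p. $\ge \delta$ over $u,v$ the pair $(u,v)$ is such a violation, and then:
- if u and v are on the same side, the induced subgraph has 1 or 3 edges;
- if u and v are on different sides, the induced subgraph has 1 edge.
Either way the POT rejects, so we get
$\rho(\delta)=\delta$.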
Example 3: Triangle-Freeness
[Alon,Fischer,Krivelevich,Szegedy], [Alon]
THM: $\triangle$-freeness has a 3-query POT with
$\rho(\delta)=1/\mathrm{Tower}(1/\delta)$, but no O(1)-query POT with
$\rho(\delta)=\mathrm{poly}(\delta)$.
The point is that being $\delta$-far from $\triangle$-freeness
means that $\delta n^2$ edges must be omitted to obtain a
$\triangle$-free graph, but this does not mean that the graph
has $\delta n^3$ (nor $\mathrm{poly}(\delta) n^3$) triangles.
Conclusion: easy testability and
POT-ness are not straightforward
(what seems easy is not necessarily so).
Example 4: testing bipartiteness
Recall that Bipartiteness is efficiently testable with
$\mathrm{poly}(1/\epsilon)$ queries.
Thm: Bipartiteness has no O(1)-query POT.
Pf: Consider an odd-length super-cycle
consisting of $\Theta(1/\epsilon^{1/2})$ (equal-sized)
independent sets, with complete bipartite
graphs between each adjacent pair.
The graph is $\epsilon$-far from bipartite, but no
O(1)-size subgraph gives evidence of this: any vertex set that misses one of the independent sets induces a bipartite graph.
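The proof idea can be checked on small instances. The following Python sketch (names and parameters are our own illustrative choices) builds a super-cycle of k independent sets of size m, confirms that the whole graph is not bipartite, and confirms by brute force that every induced subgraph on k - 1 vertices is bipartite, since such a set misses at least one independent set and therefore lies on a path of sets. A POT making q queries in the dense model sees at most 2q vertices, so for k > 2q it never sees an odd cycle.

```python
import itertools
from collections import deque
from typing import Dict, List, Set, Tuple

Vertex = Tuple[int, int]  # (index of independent set, position inside the set)


def super_cycle(k: int, m: int) -> Dict[Vertex, Set[Vertex]]:
    """k independent sets of size m in a cycle; consecutive sets are
    joined by complete bipartite graphs."""
    adj: Dict[Vertex, Set[Vertex]] = {(i, j): set() for i in range(k) for j in range(m)}
    for i in range(k):
        for a in range(m):
            for b in range(m):
                u, v = (i, a), ((i + 1) % k, b)
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_bipartite(adj: Dict[Vertex, Set[Vertex]], vertices: List[Vertex]) -> bool:
    """BFS 2-coloring of the subgraph induced by `vertices`."""
    inside = set(vertices)
    color: Dict[Vertex, int] = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x] & inside:
                if y not in color:
                    color[y] = 1 - color[x]
                    queue.append(y)
                elif color[y] == color[x]:
                    return False
    return True


if __name__ == "__main__":
    k, m = 7, 2  # odd number of sets
    g = super_cycle(k, m)
    print(is_bipartite(g, list(g)))  # the whole graph contains an odd cycle
    print(all(is_bipartite(g, list(s)) for s in itertools.combinations(g, k - 1)))
```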
Conclusion: easily testable properties
may not have POTs.
Characterization of graph properties
that have a POT
Defn: For a graph G and a set of graphs F, we say
that G is F-free if no induced subgraph of G belongs
to F.
Thm: Property P has an O(1)-query POT iff P equals
the set of F-free graphs for some F that is a fixed
set of O(1)-size graphs.
(To be precise, $P=\bigcup_n P_n$ and $P_n$ equals the set of $F_n$-free graphs.)
Proof builds on [Goldreich,Trevisan] and
[Alon,Fischer,Krivelevich,Szegedy].
Examples: Clique $\equiv$ $I_2$-free,
Bi-Clique $\equiv$ {3-vertex graph with a single edge, triangle}-free
Note: the detection function $\rho(\cdot)$ is not necessarily
polynomial; e.g., it may be the reciprocal of a tower function.
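A Python sketch of a natural POT for F-freeness, under assumptions of our own: the family is given explicitly as small graphs (number of vertices, edge set), the tester samples k distinct vertices where k is the largest size in the family (so n >= k is required), and isomorphism is checked by brute force, which is acceptable only because the graphs in F have O(1) size. It illustrates the tester side of the characterization and is not claimed to be the construction used in the proof. The usage encodes Bi-Clique as freeness from the 3-vertex graphs with one edge and with three edges (the odd cases of Example 2).

```python
import itertools
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

DenseOracle = Callable[[int, int], int]  # f_G(u, v) in {0, 1}
Edges = FrozenSet[FrozenSet[int]]
SmallGraph = Tuple[int, Edges]  # (number of vertices, edges {i, j} on 0..size-1)


def induced(f: DenseOracle, vs: Sequence[int]) -> Edges:
    """Edges of the subgraph induced by vs, relabelled as 0..len(vs)-1."""
    return frozenset(frozenset((i, j))
                     for i, j in itertools.combinations(range(len(vs)), 2)
                     if f(vs[i], vs[j]))


def isomorphic(e1: Edges, e2: Edges, size: int) -> bool:
    """Brute-force isomorphism test for graphs on `size` vertices."""
    if len(e1) != len(e2):
        return False
    for perm in itertools.permutations(range(size)):
        if frozenset(frozenset(perm[x] for x in e) for e in e1) == e2:
            return True
    return False


def f_free_pot(f: DenseOracle, n: int, family: List[SmallGraph],
               rng: random.Random) -> bool:
    """Sample k distinct vertices; reject iff some subset of them induces
    a graph isomorphic to a member of the family."""
    k = max(size for size, _ in family)
    sample = rng.sample(range(n), k)
    for size, edges in family:
        for sub in itertools.combinations(sample, size):
            if isomorphic(induced(f, sub), edges, size):
                return False
    return True


if __name__ == "__main__":
    one_edge: SmallGraph = (3, frozenset({frozenset((0, 1))}))
    triangle: SmallGraph = (3, frozenset({frozenset((0, 1)), frozenset((1, 2)),
                                          frozenset((0, 2))}))
    side = [0, 0, 1, 1, 1]
    biclique = lambda u, v: int(side[u] != side[v])
    rng = random.Random(0)
    print(all(f_free_pot(biclique, 5, [one_edge, triangle], rng) for _ in range(100)))
```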
Example 5: testing Clique Collection (CC)
A graph G belongs to CC if it consists of a
union of cliques (of any number and size).
CC is efficiently testable with $\tilde{O}(1/\epsilon)$ queries
(by a (std.) adaptive tester) and even
$\tilde{O}(\epsilon^{-4/3})$ non-adaptive queries suffice [GR].
Thm: CC has a 3-query POT with $\rho(\delta)=\Theta(\delta^2)$,
and no O(1)-query POT can do better.
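CC consists exactly of the graphs with no induced path on three vertices ($P_3$), a standard fact: without induced $P_3$, "adjacent or equal" is transitive, hence an equivalence relation whose classes are cliques with no edges between them. A natural 3-query POT therefore samples three vertices and rejects iff they induce $P_3$. The Python sketch below assumes vertices 0..n-1 and an oracle with no self-loops; it illustrates this natural tester and is not claimed to be the exact POT behind the theorem.

```python
import random
from typing import Callable

DenseOracle = Callable[[int, int], int]  # f_G(u, v) in {0, 1}, no self-loops


def induces_p3(f: DenseOracle, u: int, v: int, w: int) -> bool:
    """True iff u, v, w are distinct and span exactly two of the three possible edges."""
    if len({u, v, w}) < 3:
        return False
    return f(u, v) + f(v, w) + f(u, w) == 2


def cc_pot(f: DenseOracle, n: int, rng: random.Random) -> bool:
    """Accept iff three uniformly sampled vertices do not induce P3."""
    u, v, w = (rng.randrange(n) for _ in range(3))
    return not induces_p3(f, u, v, w)
```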
Conclusion: The (std.) tester obtained
by repeating the best POT may have
significantly higher complexity
than the best standard tester.
Example 6: Testing c-Clique Collection (c-CC)
A graph G belongs to c-CC if it consists of a
union of c cliques (of any size), for a constant c.
c-CC is efficiently testable with $\tilde{O}(1/\epsilon)$ queries
(by a (std.) non-adaptive tester) [GR].
Thm: For every $c \ge 2$, the property c-CC has a
(c+1)-query POT with $\rho(\delta)=\Theta(\delta^{c/2})$,
and no O(1)-query POT can do better.
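Reading "c cliques (of any size)" as at most c nonempty cliques, c-CC consists of the graphs with no induced $P_3$ and no independent set of c + 1 vertices. A Python sketch of one natural POT, under our own assumptions: sample c + 1 vertices and query every pair among them. This version therefore makes $\binom{c+1}{2}$ queries rather than c + 1, and it is not claimed to be the POT of the theorem.

```python
import itertools
import random
from typing import Callable, Sequence

DenseOracle = Callable[[int, int], int]  # f_G(u, v) in {0, 1}, no self-loops


def induces_at_most_c_cliques(f: DenseOracle, vs: Sequence[int], c: int) -> bool:
    """True iff the subgraph induced by the distinct vertices in vs is a
    disjoint union of at most c cliques."""
    vs = list(dict.fromkeys(vs))  # drop repeated samples, keep first occurrences
    for a, b, d in itertools.combinations(vs, 3):
        if f(a, b) + f(b, d) + f(a, d) == 2:  # induced P3: not a union of cliques
            return False
    # Adjacency is now transitive, so a vertex either joins the clique of a
    # representative it is adjacent to or starts a new clique.
    reps = []
    for x in vs:
        if not any(f(x, r) for r in reps):
            reps.append(x)
    return len(reps) <= c


def c_cc_pot(f: DenseOracle, n: int, c: int, rng: random.Random) -> bool:
    """Sample c + 1 vertices uniformly; accept iff they induce at most c cliques."""
    sample = [rng.randrange(n) for _ in range(c + 1)]
    return induces_at_most_c_cliques(f, sample, c)
```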
Conclusion: The (std.) tester obtained
by repeating the best POT may have
tremendously higher complexity
than the best standard tester.
Summary and Open Problems
• Initiate the study of Proximity Oblivious Testers in the context of
graph properties.
• Give positive and negative results in two standard models of
testing graph properties, and in particular provide a
characterization in each model.
• Several conclusions in the dense graphs model:
- Easy testability and POT-ness are not straightforward
(what seems easy is not necessarily so).
- Easily testable properties may not have POTs.
- The (std.) tester obtained by repeating the best POT may
have significantly higher complexity than the best
standard tester.
• In the dense graphs model: for which sets F does F-freeness have
$\mathrm{poly}(\delta)$ detection probability? (For a single graph F, the answer is given in
[Alon,Shapira].)
• In the bounded-degree model: the issue of “propagation” (Teaser…)
Thanks