Transcript Slide 1

Fast Regression Algorithms
Using Spectral Graph Theory
Richard Peng
OUTLINE
• Regression: why and how
• Spectra: fast solvers
• Graphs: tree embeddings
LEARNING / INFERENCE
Input: signal s
Output: find the (hidden) pattern
in the (noisy) data
REGRESSION
Minimize: |x|_p
Subject to: constraints on x
• p ≥ 1: convex
• Convex constraints,
e.g. linear equalities
APPLICATION 0: LASSO
[Tibshirani `96]:
Min |x|_1
s.t. Ax = s
Widely used in practice:
• Structured output
• Robust to noise
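The minimization above can be written as a linear program via the standard split x = x⁺ − x⁻ with x⁺, x⁻ ≥ 0. A minimal sketch with made-up A and s, assuming scipy is available (not from the slides):

```python
# Basis-pursuit form of LASSO:  min |x|_1  s.t.  Ax = s,
# solved as an LP over [x+; x-] with objective sum(x+) + sum(x-).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0]])   # one equation, two unknowns (toy data)
s = np.array([2.0])

n = A.shape[1]
c = np.ones(2 * n)           # |x|_1 = sum(x+) + sum(x-) at the optimum
A_eq = np.hstack([A, -A])    # A(x+ - x-) = s
res = linprog(c, A_eq=A_eq, b_eq=s, bounds=(0, None))

x = res.x[:n] - res.x[n:]    # recover x; sparsest solution is (0, 1)
```

The LP picks the sparse solution x = (0, 1) over denser feasible points, which is the "structured output" behavior the slide mentions.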
APPLICATION 1: IMAGES
Min Σ_{i~j∈E} (x_i - x_j - s_{i~j})^2
Poisson image processing
No bears were harmed in the making of these slides
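The objective above is a least squares problem: with B the edge-vertex incidence matrix, it is min |Bx − s|², and the normal equations BᵀBx = Bᵀs involve exactly the graph Laplacian BᵀB. A minimal numpy sketch on a made-up 4-vertex path (toy data, not from the slides):

```python
# Solve  min sum over edges (x_i - x_j - s_ij)^2  by least squares.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3)]         # a path on 4 vertices
x_true = np.array([0.0, 1.0, 3.0, 6.0])  # hidden signal (toy)
s = np.array([x_true[i] - x_true[j] for i, j in edges])

n = 4
B = np.zeros((len(edges), n))            # incidence matrix
for k, (i, j) in enumerate(edges):
    B[k, i], B[k, j] = 1.0, -1.0

# Least squares solution; defined only up to an additive constant
x, *_ = np.linalg.lstsq(B, s, rcond=None)
x -= x[0] - x_true[0]                    # fix the free constant
```

With exact difference data the recovery is exact; with noisy s the same solve returns the least squares fit.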
APPLICATION 2: MIN CUT
(figure: s-t graph with s-side vertices labeled 0 and t-side vertices labeled 1)
Min Σ_{ij∈E} |x_i - x_j|
s.t. x_s = 0, x_t = 1
Fractional solution
= integral solution
Remove fewest edges to
separate vertices s and t
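Since the slide notes that the fractional optimum equals the integral one, the LP above can be illustrated by enumerating 0/1 labelings on a toy graph (hypothetical example, not from the slides):

```python
# Min cut objective: sum over edges of |x_i - x_j|, x_s = 0, x_t = 1.
# By the fractional = integral fact, enumerating 0/1 labelings of the
# free vertices finds the LP optimum on this toy unit weight graph.
from itertools import product

# s = 0, a = 1, b = 2, t = 3; unit weights
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]

best = float('inf')
for xa, xb in product([0, 1], repeat=2):
    x = [0, xa, xb, 1]                   # x_s = 0, x_t = 1 fixed
    cut = sum(abs(x[i] - x[j]) for i, j in edges)
    best = min(best, cut)
# best == 2: removing the two edges into t separates s from t
```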
REGRESSION ALGORITHMS
Convex optimization
• 1940~1960: simplex, tractable
• 1960~1980: ellipsoid, poly time
• 1980~2000: interior point, efficient
• m = # non-zeros
• Õ hides log factors
Õ(m^{1/2}) interior point steps
EFFICIENCY MATTERS
• m > 10^6 for most images
• Even bigger (10^9):
• Videos
• 3D medical data
KEY SUBROUTINE
Each step of interior point
algorithms finds a step direction
via a linear system solve:
Õ(m^{1/2}) solves in total
MORE REASONS FOR FAST SOLVERS
[Boyd-Vandenberghe `04], Figure 11.20:
The growth in the average number
of Newton iterations (on randomly
generated SDPs)… is very small
LINEAR SYSTEM SOLVERS
• [1st century CE] Gaussian
Elimination: O(m^3)
• [Strassen `69] O(m^2.8)
• [Coppersmith-Winograd `90] O(m^2.3755)
• [Stothers `10] O(m^2.3737)
• [Vassilevska Williams `11] O(m^2.3727)
Total: > m^2
NOT FAST → NOT USED:
• Preferred in practice: coordinate methods
• Solution quality traded for time
FAST GRAPH BASED L2 REGRESSION
[SPIELMAN-TENG ‘04]
Input: linear system where A is
related to a graph, vector b
Output: Solution to Ax=b
Runtime: Nearly Linear, Õ(m)
GRAPHS USING ALGEBRA
Fast convergence
+ Low cost per step
= state of the art algorithms
[Daitch-Spielman `08]:
min-cost flow
[Christiano-Kelner-Mądry-Spielman-Teng `11]:
approx maximum flow / min cut
EXTENSION 1
[Chin-Mądry-Miller-P `12]: regression,
image processing, grouped L2
EXTENSION 2
[Kelner-Miller-P `12]:
k-commodity flow
Dual: k-variate labeling of graphs
EXTENSION 3
[Miller-P `13]:
faster for structured
images / separable graphs
NEED: FAST LINEAR SYSTEM SOLVERS
Implication of fast solvers:
• Fast regression routines
• Parallel, work efficient
graph algorithms
OTHER APPLICATIONS
• [Tutte `66]:
planar embedding
• [Boman-Hendrickson-Vavasis`04]: PDEs
• [Orecchia-Sachdeva-Vishnoi `12]:
balanced cut / graph separator
OUTLINE
• Regression: why and how
• Spectra: Linear system solvers
• Graphs: tree embeddings
PROBLEM
Given: matrix A, vector b
Size of A:
• n-by-n
• m non-zeros
SPECIAL STRUCTURE OF A
Graph Laplacian, with Deg = diag(degree):
A_ij = deg(i) if i = j
     = -w(ij) otherwise
[Gremban-Miller `96]:
extensions to SDD matrices
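A minimal sketch of building the matrix just described for a made-up weighted graph, checking the standard Laplacian properties:

```python
# Graph Laplacian: deg(i) on the diagonal, -w(ij) off the diagonal.
import numpy as np

n = 3
edges = [(0, 1, 2.0), (1, 2, 3.0)]   # (u, v, weight), toy graph

A = np.zeros((n, n))
for u, v, w in edges:
    A[u, u] += w; A[v, v] += w       # weighted degrees on the diagonal
    A[u, v] -= w; A[v, u] -= w       # -w(ij) off the diagonal

# Laplacian facts: symmetric, rows sum to zero, positive semidefinite
```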
UNSTRUCTURED GRAPHS
• Social networks
• Intermediate systems
of other algorithms are
also unstructured
NEARLY LINEAR TIME SOLVERS
[SPIELMAN-TENG ‘04]
Input: n by n graph Laplacian A
with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x’ s.t.
|x - x'|_A < ε|x|_A
Runtime: nearly linear,
O(m log^c n log(1/ε)) expected
• runtime is cost per bit of accuracy
• Error in the A-norm: |y|_A = √(y^T A y)
HOW MANY LOGS
Runtime: O(m log^c n log(1/ε))
Value of c: I don't know
[Spielman]: c ≤ 70
[Miller]: c ≤ 32
[Koutis]: c ≤ 15
[Teng]: c ≤ 12
[Orecchia]: c ≤ 6
When n = 10^6, log^6 n > 10^6
PRACTICAL NEARLY LINEAR TIME SOLVERS
[KOUTIS-MILLER-P `10]
Input: n by n graph Laplacian A
with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x’ s.t.
|x - x'|_A < ε|x|_A
Runtime: O(m log^2 n log(1/ε))
• runtime is cost per bit of accuracy
• Error in the A-norm: |y|_A = √(y^T A y)
PRACTICAL NEARLY LINEAR TIME SOLVERS
[KOUTIS-MILLER-P `11]
Input: n by n graph Laplacian A
with m non-zeros, vector b
Where: b = Ax for some x
Output: Approximate solution x’ s.t.
|x - x'|_A < ε|x|_A
Runtime: O(m log n log(1/ε))
• runtime is cost per bit of accuracy
• Error in the A-norm: |y|_A = √(y^T A y)
STAGES OF THE SOLVER
• Iterative Methods
• Spectral Sparsifiers
• Low Stretch Spanning Trees
ITERATIVE METHODS
Numerical analysis:
Can solve systems in
A by iteratively
solving spectrally
similar, but easier, B
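One concrete instance of this idea is preconditioned Richardson iteration, x ← x + B⁻¹(b − Ax): each step only requires solving in the easier B. The sketch below uses made-up 2×2 matrices with A ⪯ B; it illustrates the principle, not the talk's actual solver:

```python
# Solve Ax = b by repeatedly solving a spectrally similar, easier B.
import numpy as np

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
B = np.diag([3.0, 3.0])          # easier: B - A is PSD, so A ⪯ B

b = np.array([1.0, 0.0])
x = np.zeros(2)
for _ in range(200):
    # one easy solve per iteration; converges since A ⪯ B
    x = x + np.linalg.solve(B, b - A @ x)
```

The error contracts by I − B⁻¹A each step, so the closer B is to A spectrally, the fewer iterations are needed.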
WHAT IS SPECTRALLY SIMILAR?
A ≺ B ≺ kA for some small k
• Ideas from scalars hold!
• A ≺ B: for any vector x,
|x|_A^2 ≤ |x|_B^2
[Vaidya `91]:
Since A is a graph,
B should be too!
'EASIER' H
Ways of being easier:
• Fewer vertices
• Fewer edges
Can reduce vertex count
if edge count is small
Goal: H with fewer
edges that’s similar to G
GRAPH SPARSIFIERS
Sparse equivalents of graphs
that preserve something
• Spanners: distance, diameter.
• Cut sparsifier: all cuts.
• What we need: spectrum
WHAT WE NEED: ULTRASPARSIFIERS
• Given: G with n vertices,
m edges, parameter k
• Output: H with n vertices,
n-1+O(m log^p n / k) edges
• Goal: G ≺ H ≺ kG
[Spielman-Teng `04]:
ultrasparsifiers with n-1+O(m log^p n / k)
edges imply solvers with
O(m log^p n) running time.
EXAMPLE: COMPLETE GRAPH
O(n log n) random edges
(with scaling) suffice w.h.p.
GENERAL GRAPH SAMPLING MECHANISM
• For edge e, flip coin with Pr(keep) = P(e)
• Rescale kept edges to maintain expectation
Expected number of edges kept: ∑_e P(e)
Also need to prove concentration
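A Monte Carlo sketch of the mechanism (toy numbers, fixed seed): keeping an edge with probability P(e) and rescaling its weight to w(e)/P(e) keeps the expected weight at w(e).

```python
# Keep-with-probability-P, rescale-by-1/P sampling of a single edge.
import numpy as np

rng = np.random.default_rng(0)
w, p = 1.0, 0.3                  # edge weight and keep-probability (toy)
trials = 20000

kept = rng.random(trials) < p    # coin flips
sampled = np.where(kept, w / p, 0.0)
# mean of sampled weights ≈ w: expectation is preserved
```

The concentration step the slide mentions is exactly what this simulation does not prove: the mean is right, but one needs matrix concentration bounds to control the variance over the whole graph.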
EFFECTIVE RESISTANCE
• View the graph as a circuit
• R(u,v) = pass 1 unit of current from
u to v, measure resistance of circuit
Effective resistance in general:
solve Gx = e_uv, where e_uv is the
indicator vector; R(u,v) = x_u - x_v.
(REMEDIAL?) EE101
• Single edge: R(e) = 1/w(e)
• Two edges in series with weights
w1, w2: R(u, v) = 1/w1 + 1/w2
• Series in general:
R(u, v) = R(e1) + … + R(e_l)
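These rules can be checked numerically via the Laplacian pseudoinverse, R(u,v) = (e_u − e_v)ᵀ L⁺ (e_u − e_v); a sketch on a toy 3-vertex series circuit:

```python
# Effective resistance from the Laplacian pseudoinverse.
import numpy as np

w1, w2 = 2.0, 4.0
edges = [(0, 1, w1), (1, 2, w2)]     # a 2-edge series circuit
n = 3

L = np.zeros((n, n))
for u, v, w in edges:
    L[u, u] += w; L[v, v] += w
    L[u, v] -= w; L[v, u] -= w

Lp = np.linalg.pinv(L)

def R(u, v):
    # R(u, v) = (e_u - e_v)^T L^+ (e_u - e_v)
    e = np.zeros(n); e[u], e[v] = 1.0, -1.0
    return e @ Lp @ e

# single edge: R(0,1) = 1/w1; series: R(0,2) = 1/w1 + 1/w2
```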
SPECTRAL SPARSIFICATION BY
EFFECTIVE RESISTANCE
[Spielman-Srivastava `08]: setting P(e) to
w(e)R(e)O(log n) gives G ≺ H ≺ 2G*
[Foster `49]: ∑_e w(e)R(e) = n-1
Spectral sparsifier with O(n log n) edges
Ultrasparsifier? Solver???
*Ignoring probabilistic issues
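Foster's theorem above is easy to verify numerically; this sketch sums w(e)R(e) over a toy complete graph on 4 vertices:

```python
# Check [Foster `49]: sum over edges of w(e) * R(e) equals n - 1.
import numpy as np
from itertools import combinations

n = 4
edges = [(u, v, 1.0) for u, v in combinations(range(n), 2)]  # K4

L = np.zeros((n, n))
for u, v, w in edges:
    L[u, u] += w; L[v, v] += w
    L[u, v] -= w; L[v, u] -= w
Lp = np.linalg.pinv(L)

total = 0.0
for u, v, w in edges:
    e = np.zeros(n); e[u], e[v] = 1.0, -1.0
    total += w * (e @ Lp @ e)    # w(e) * R(e)
# total == n - 1 = 3
```

This is why sampling with P(e) ∝ w(e)R(e)·O(log n) keeps only O(n log n) edges in expectation.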
THE CHICKEN AND EGG PROBLEM
How to find effective resistance?
[Spielman-Srivastava `08]: use solver
[Spielman-Teng `04]: need sparsifier
OUR WORKAROUND
• Use upper bounds of
effective resistance, R’(u,v)
• Modify the problem
RAYLEIGH'S MONOTONICITY LAW
Rayleigh's Monotonicity Law: R(u, v) can
only increase when edges are removed
Calculate effective resistance w.r.t. a tree T
SAMPLING PROBABILITIES
ACCORDING TO TREE
Sample probability: edge weight
times effective resistance of
tree path = stretch
Goal: small total stretch
GOOD TREES EXIST
Every graph has a spanning tree
with total stretch O(m log n)*
∑_e w(e)R'(e) = O(m log n)
→ O(m log^2 n) edges, too many!
*Hiding loglog n factors
'GOOD' TREE???
Unit weight case:
stretch ≥ 1 for all edges
(figure: off-tree edge whose tree
path has two edges, stretch = 1 + 1 = 2)
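A minimal sketch of the unit weight stretch computation above, on a triangle whose spanning tree is a two-edge path (so the off-tree edge has stretch 1 + 1 = 2):

```python
# Unit weight stretch of an off-tree edge = length of its tree path.
from collections import deque

def tree_dist(adj, u, v):
    # BFS distance in the tree (unit weight edges)
    q, dist = deque([u]), {u: 0}
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return dist[v]

tree_adj = {0: [1], 1: [0, 2], 2: [1]}   # spanning tree: path 0-1-2
off_tree_edges = [(0, 2)]                # the third triangle edge
total_stretch = sum(tree_dist(tree_adj, u, v) for u, v in off_tree_edges)
# total_stretch == 2, matching the slide's example
```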
WHAT ARE WE MISSING?
Haven't used k!
• Need:
  • G ≺ H ≺ kG
  • n-1+O(m log^p n / k) edges
• Generated:
  • G ≺ H ≺ 2G
  • n-1+O(m log^2 n) edges
USE k, SOMEHOW
G ≺ G' ≺ kG
• Tree is good!
• Increase weights of tree
edges by a factor of k
RESULT
Stretch = 1/k + 1/k = 2/k
• Tree heavier by factor of k
• Tree effective resistances
decrease by factor of k
NOW SAMPLE?
Expected edges in H:
• Tree edges: n-1
• Off-tree edges: O(m log^2 n / k)
Total: n-1+O(m log^2 n / k)
BUT WE CHANGED G!
G ≺ G' ≺ kG
G' ≺ H ≺ 2G'
⇒ G ≺ H ≺ 2kG
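The first containment above can be checked numerically on a toy triangle: scaling the tree edges up by k gives G' with G ⪯ G' ⪯ kG, since both G' − G and kG − G' are Laplacians of nonnegatively weighted graphs (a sketch, not from the slides):

```python
# Verify G ⪯ G' ⪯ kG for a tree-scaled toy graph via eigenvalues.
import numpy as np

def lap(edges, n):
    # graph Laplacian of a weighted edge list
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

k, n = 4.0, 3
G  = lap([(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)], n)  # triangle
Gp = lap([(0, 1, k),   (1, 2, k),   (0, 2, 1.0)], n)  # tree edges x k

# G' - G is the Laplacian of the tree with weights k-1 (PSD), and
# kG - G' is the Laplacian of the off-tree edge with weight k-1 (PSD)
```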
WHAT WE NEED: ULTRASPARSIFIERS
• Given: G with n vertices,
m edges, parameter k
• Output: H with n vertices,
n-1+O(m log^p n / k) edges
• Goal: G ≺ H ≺ kG
What we got: G ≺ H ≺ 2kG with
n-1+O(m log^2 n / k) edges
[Spielman-Teng `04]:
ultrasparsifiers with n-1+O(m log^p n / k)
edges imply solvers
with O(m log^p n) running time.
PSEUDOCODE OF O(m log n) SOLVER
Input: graph Laplacian G
• Compute low stretch tree T of G
• T ← O(log^2 n) · T
• H ← G + T
• H ← Sample_T(H)
• Solve G by iterating on H and
solving H recursively, but reuse T
EXTENSIONS / GENERALIZATIONS
• [Koutis-Levin-P `12]: sparsify mildly
dense graphs in O(m) time
• [Miller-P `12]: general matrices:
find a 'simpler' matrix that's
similar, in O(m + n^{2.38+a}) time.
SUMMARY OF SOLVERS
• Spectral graph theory
allows one to find similar,
easier to solve graphs
• Backbone: good trees
SOLVERS USING GRAPH THEORY
Fast solvers for graph Laplacians
use combinatorial graph theory
OUTLINE
• Regression: why and how
• Spectra: linear system solvers
• Graphs: tree embeddings
LOW STRETCH SPANNING TREE
Sampling probability: edge weight
times effective resistance of tree path
Unit weight case: length of tree path
Low stretch spanning tree:
small total stretch
DIFFERENT THAN USUAL TREES
n^{1/2}-by-n^{1/2} unit weighted mesh
'haircomb' is both a shortest path
and a max weight spanning tree
stretch(e) = O(1) for comb edges,
stretch(e) = Ω(n^{1/2}) for the rest
total stretch = Ω(n^{3/2})
A BETTER TREE FOR THE GRID
Recursive ‘C’
LOW STRETCH SPANNING TREES
[Elkin-Emek-Spielman-Teng `05],
[Abraham-Bartal-Neiman `08]:
any graph has a spanning tree
with total stretch O(m log n)*
*Hiding loglog n factors
ISSUE: RUNNING TIME
Algorithms given by
[Elkin-Emek-Spielman-Teng `05],
[Abraham-Bartal-Neiman `08]
take O(n log^2 n + m log n) time
Reason: O(log n) shortest path computations
SPEED UP
Shortest path on graphs with k distinct
weights can run in O(m log_{m/n} k) time
[Koutis-Miller-P `11]:
• Round edge weights to powers of 2
• k = log n, total work = O(m log n)*
*Hiding loglog n; we actually improve these
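A small sketch of the rounding step (made-up weights): snapping each weight down to a power of 2 leaves at most log₂(max/min) + 1 distinct values.

```python
# Round edge weights down to powers of 2 to shrink the number of
# distinct weight classes.
import math

weights = [1.3, 2.9, 3.1, 7.5, 8.0, 100.0]   # toy weights
rounded = [2.0 ** math.floor(math.log2(w)) for w in weights]
distinct = sorted(set(rounded))
# distinct == [1.0, 2.0, 4.0, 8.0, 64.0]: few classes survive
```

Each rounded weight is within a factor of 2 of the original, so the tree quality only degrades by a constant factor while the shortest path routine sees few distinct weights.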
PARALLEL ALGORITHM?
• [Blelloch-Gupta-Koutis-Miller-P-Tangwongsan `11]:
current framework parallelizes
to O(m^{1/3+a}) depth
→ fast parallel graph algorithms
PARALLEL GRAPH ALGORITHMS?
• Before this work: parallel time >
state of the art sequential time
• Our result: parallel work close to
sequential, and O(m^{2/3}) time
FUNDAMENTAL PROBLEM
Long standing open problem:
theoretical speedups for BFS /
shortest path in directed graphs
Sequential algorithms are too fast!
PARALLEL ALGORITHM?
First step of the framework by
[Elkin-Emek-Spielman-Teng `05]
is a shortest path computation
PARALLEL TREE EMBEDDING
• Workaround: use earlier algorithm
by [Alon-Karp-Peleg-West `95]
• Idea: repeated clustering
• Based on ideas from [Cohen `93, `00]
for approximating shortest path
THE BIG PICTURE
• Need fast linear system
solvers for graph regression
• Need combinatorial graph
algorithms for fast solvers
ONGOING / FUTURE WORK
• Better regression?
• Faster/parallel solver?
• Sparse approximate
(pseudo) inverse?
• Other types of systems?
THANK YOU!
Questions?