GEOMETRIC GRAPH-BASED METHODS
IN IMAGE PROCESSING, NETWORKS,
AND MACHINE LEARNING
Andrea Bertozzi
University of California, Los Angeles
Students: Ekaterina Merkurjev, Tijana Kostic, Huiyi Hu, Cristina Garcia (CGU)
Postdocs: Yves van Gennip, Braxton Osting, Nestor Guillen
Navy Collaborator: Arjuna Flenner
Other collaborators: Allon Percus (CGU), Mason Porter (Oxford), Thomas Laurent
(Loyola Marymount)
Inspiration: earlier work of Stan Osher, Chris Anderson, Luminita Vese, and Tony
Chan
Thanks to NSF, ONR, AFOSR for support.
Variational Functionals for Image
Segmentation - sharp interfaces with penalty
function restricting regularity of interface
Mumford-Shah segmentation model 1989 CPAM
Terzopoulos snakes: a Lagrangian curve attracted to edges; F is
an environmental function that attracts the curve to edges. Kass-Witkin-Terzopoulos, IJCV 1987
Chan-Vese Segmentation – binary with sharp interface Gamma between regions,
IEEE Trans. Imag. Proc. 2001. Solved using level sets and the TV functional via a
gradient flow.
DIFFUSE INTERFACE METHODS
Total variation
Ginzburg-Landau functional
W is a double well potential with two minima
Total variation measures length of boundary between two constant regions.
GL energy is a diffuse interface approximation of TV for binary functionals
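For reference, the two functionals can be written in their standard textbook form. This is a sketch: the slide's own formulas did not survive extraction, so the normalization constants and the particular double well W below are assumptions.

```latex
\[
  TV(u) = \int_\Omega |\nabla u| \, dx ,
  \qquad
  GL_\varepsilon(u) = \frac{\varepsilon}{2} \int_\Omega |\nabla u|^2 \, dx
                    + \frac{1}{\varepsilon} \int_\Omega W(u) \, dx ,
  \qquad
  W(u) = \tfrac{1}{4}\,(u^2 - 1)^2 .
\]
```

With this choice W has its two minima at u = -1 and u = +1, and the interface between the two phases has width of order epsilon.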
DIFFUSE INTERFACE EQUATIONS AND THEIR
SHARP INTERFACE LIMIT
Allen-Cahn equation – L2 gradient flow of GL functional
Approximates motion by mean curvature; useful for image segmentation and
image deblurring.
Cahn-Hilliard equation – H-1 gradient flow of GL functional
Approximates Mullins-Sekerka problem (nonlocal): Pego; Alikakos, Bates, and
Chen. Conserves the mean of u.
Used in image inpainting – fourth order allows for two boundary conditions to be
satisfied for inpainting.
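A sketch of the two gradient flows, assuming the common scaling GL(u) = (eps/2) int |grad u|^2 + (1/eps) int W(u) (the slide's own equations were lost, so constants may differ):

```latex
\begin{align*}
  \text{Allen-Cahn } (L^2 \text{ flow}): \quad
    & u_t = -\frac{\delta GL}{\delta u}
          = \varepsilon \Delta u - \frac{1}{\varepsilon} W'(u), \\
  \text{Cahn-Hilliard } (H^{-1} \text{ flow}): \quad
    & u_t = \Delta \frac{\delta GL}{\delta u}
          = -\Delta\!\left( \varepsilon \Delta u - \frac{1}{\varepsilon} W'(u) \right).
\end{align*}
```

The Cahn-Hilliard right-hand side is a Laplacian of a function, so with no-flux or periodic boundary conditions its integral vanishes, which is why the mean of u is conserved.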
EXAMPLES OF ALLEN-CAHN EQUATION IN IMAGE
PROCESSING
Selim Esedoglu: blind deconvolution of bar codes, Inverse Problems 2004. K is
a blurring kernel which can be identified as part of the process.
Uses a gradient flow method to minimize E. Results in a modified Allen-Cahn
equation with forcing.
Threshold Dynamics
Merriman-Bence-Osher show that AC can be approximated by repeated thresholding
and solution of the heat equation; this leads to a numerical solution of Motion by Mean
Curvature.
Esedoglu-Tsai (JCP 2006) generalize this to the solution of the piecewise constant
Mumford-Shah problem.
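The classical threshold dynamics idea can be sketched in a few lines. This is an illustrative sketch only, under assumptions that are not on the slide: a periodic unit square, a heat solve done exactly in Fourier space, a threshold at 1/2, and the hypothetical function name `mbo_step`.

```python
import numpy as np

def mbo_step(indicator, tau):
    """One MBO step on a periodic n-by-n grid over the unit square:
    diffuse the 0/1 indicator for time tau, then threshold at 1/2."""
    n = indicator.shape[0]
    freq = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    kx, ky = np.meshgrid(freq, freq, indexing="ij")
    # Fourier symbol of the heat semigroup exp(tau * Laplacian)
    heat_symbol = np.exp(-4.0 * np.pi**2 * (kx**2 + ky**2) * tau)
    diffused = np.fft.ifft2(np.fft.fft2(indicator) * heat_symbol).real
    return (diffused >= 0.5).astype(float)

# Toy example: a disk of radius 0.25 should shrink under mean curvature motion.
n = 64
x = np.arange(n) / n
xx, yy = np.meshgrid(x, x, indexing="ij")
disk = ((xx - 0.5) ** 2 + (yy - 0.5) ** 2 <= 0.25 ** 2).astype(float)

state = disk.copy()
for _ in range(3):
    state = mbo_step(state, tau=0.005)

print("pixels before:", int(disk.sum()), "after:", int(state.sum()))
```

The time step tau has to be large enough that the interface moves by at least one grid cell per step, otherwise the thresholded set can get stuck on the grid.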
FAST CAHN-HILLIARD INPAINTING
US Patent No. 7,840,086
Bertozzi, Esedoglu, Gillette, IEEE Trans. Image Proc. 2007, SIAM MMS 2007
Transitioned to NGA for road inpainting.
Transitioned to InQtel for document exploitation.
Continue edges in the same direction – higher order method for local inpainting.
Fast method using convexity splitting and FFT
H-1 gradient flow for diffuse TV
L2 fidelity with known data
THE WAVELET LAPLACIAN AND DIFFUSE
INTERFACES – SHARPER INTERFACES
Total variation
Ginzburg-Landau functional
Dobrosotskaya and Bertozzi IEEE Trans. Imag. Proc. 2008
WAVELET ALLEN-CAHN IMAGE PROCESSING
• Dobrosotskaya, Bertozzi, IEEE Trans. Image Proc. 2008, Interfaces and Free Boundaries 2011.
• Transitioned to NGA for road inpainting. Transitioned to InQtel for document exploitation.
• Nonlocal wavelet basis replaces Fourier basis in classical diffuse interface method.
• Analysis theory in Besov spaces.
• Gamma convergence to anisotropic TV.
H-1 gradient flow for diffuse TV
L2 fidelity with known data
DIFFUSE INTERFACES ON GRAPHS
Joint work with Arjuna Flenner, China Lake
Paper MMS 2012
WEIGHTED GRAPHS FOR “BIG DATA”
In a typical application we have data supported on
the graph, possibly high dimensional. The above
weights represent comparison of the data.
Examples include:
voting records of US Congress – each person has
a vote vector associated with them.
Nonlocal means image processing – each pixel has
a pixel neighborhood that can be compared with
nearby and far away pixels.
GRAPH CUTS AND TOTAL VARIATION
Minimum cut
Maximum cut
Total Variation of function f defined on nodes of a weighted graph:
Min cut problems can be reformulated as a total variation minimization problem
for binary/multivalued functions defined on the nodes of the graph.
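A small check of the cut / total variation connection. This is a sketch: the convention TV(f) = (1/2) sum_ij w_ij |f_i - f_j| is an assumption (the slide's formula was lost), and the function names and the 4-node graph are made up for illustration.

```python
def graph_total_variation(weights, f):
    """TV(f) = 1/2 * sum_ij w_ij |f_i - f_j| for a symmetric weight matrix."""
    n = len(f)
    return 0.5 * sum(
        weights[i][j] * abs(f[i] - f[j]) for i in range(n) for j in range(n)
    )

def cut_value(weights, in_a):
    """Total weight of edges going from the set A to its complement."""
    n = len(in_a)
    return sum(
        weights[i][j]
        for i in range(n)
        for j in range(n)
        if in_a[i] and not in_a[j]
    )

# Toy symmetric graph on 4 nodes; A = {0, 1}.
W = [
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 0.0, 0.2],
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 0.2, 1.0, 0.0],
]
in_a = [True, True, False, False]
indicator = [1.0 if a else 0.0 for a in in_a]

tv = graph_total_variation(W, indicator)
cut = cut_value(W, in_a)
print(tv, cut)
```

For a 0/1 indicator function the two quantities coincide, which is the sense in which a min cut problem becomes a TV minimization over binary functions.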
NONLOCAL MEANS GRAPHS AND TOTAL
VARIATION
• Buades, Coll, and Morel (2006) introduced the NL Means
functional for imaging applications: patch comparisons
between pixels
• Osher and Gilboa (2007-8) developed the Nonlocal TV
functional for imaging applications; very effective for image
inpainting applications with texture
• Drawback with Osher-Gilboa is slowness of algorithm
• We will accomplish these results with much faster run time
and extend to general Machine Learning problems
• Suggests an alternative to the NL means calculus of Gilboa-Osher
DIFFUSE INTERFACE METHODS ON GRAPHS
Bertozzi and Flenner MMS 2012.
Arjuna Flenner, China Lake (Navy)
CONVERGENCE OF GRAPH GL FUNCTIONAL
van Gennip and ALB Adv. Diff. Eq. 2012
DIFFUSE INTERFACES ON GRAPHS
Joint work with Arjuna Flenner
Replaces Laplace operator with a weighted graph Laplacian in the Ginzburg-Landau functional
Allows for segmentation using L1-like metrics due to connection with GL
Comparison with Hein-Buehler 1-Laplacian 2010.
US HOUSE OF REPRESENTATIVES VOTING
RECORD CLASSIFICATION OF PARTY
AFFILIATION FROM VOTING RECORD
98th US Congress 1984
Assume knowledge of party affiliation of 5 of the 435 members of the House
Infer party affiliation of the remaining 430 members from voting records
Gaussian similarity weight matrix for vector of votes (1, 0, -1)
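A minimal sketch of a Gaussian similarity weight matrix on vote vectors. The form w_ij = exp(-|v_i - v_j|^2 / (2 sigma^2)), the value of sigma, and the three toy vote vectors are assumptions for illustration; the slide does not give the exact formula or scale, and this is not the Congress data.

```python
import numpy as np

def gaussian_weights(votes, sigma):
    """Pairwise weights w_ij = exp(-||v_i - v_j||^2 / (2 sigma^2))."""
    votes = np.asarray(votes, dtype=float)
    sq_norms = np.sum(votes ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped against round-off
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * votes @ votes.T
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Toy vote vectors with entries in {1, 0, -1}: members 0 and 1 vote
# identically, member 2 votes the opposite way on the first three bills.
votes = [
    [1, 1, -1, 0],
    [1, 1, -1, 0],
    [-1, -1, 1, 0],
]
W = gaussian_weights(votes, sigma=2.0)
print(np.round(W, 3))
```

The diagonal equals 1 here; whether self-weights are kept or zeroed before building the graph Laplacian is a separate modeling choice.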
MACHINE LEARNING IDENTIFICATION OF
SIMILAR REGIONS
IN IMAGES Training Region
Original Image
Image to Segment
Segmented Image
High dimensional fully connected graph: use Nystrom extension methods for fast
computation.
RECALL CONVEX SPLITTING SCHEMES
Schoenlieb and Bertozzi, Comm. Math. Sci. 2011
Basic idea:
Project onto Eigenfunctions of the gradient (first variation) operator
For the GL functional the operator is the graph Laplacian
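A sketch of how such a scheme could look for the graph GL functional, written without the fidelity term. The splitting constant c, the scaling with epsilon, and the notation a_k, b_k are assumptions; the formulas on the slide were lost in extraction.

```latex
\begin{align*}
  E(u) &= E_1(u) - E_2(u), \qquad
  E_1(u) = \frac{\varepsilon}{2}\langle u, L u\rangle + \frac{c}{2}\|u\|^2, \qquad
  E_2(u) = \frac{c}{2}\|u\|^2 - \frac{1}{\varepsilon}\sum_i W(u_i), \\
  \frac{u^{n+1}-u^n}{dt} &= -\nabla E_1(u^{n+1}) + \nabla E_2(u^n)
    = -\varepsilon L u^{n+1} - c\,u^{n+1} + c\,u^n - \frac{1}{\varepsilon} W'(u^n), \\
  u^n &= \sum_k a_k^n \phi_k, \quad L\phi_k = \lambda_k \phi_k
  \;\;\Longrightarrow\;\;
  a_k^{n+1} = \frac{(1 + c\,dt)\,a_k^n - \dfrac{dt}{\varepsilon}\, b_k^n}{1 + dt\,(\varepsilon\lambda_k + c)},
  \qquad b_k^n = \langle \phi_k, W'(u^n)\rangle .
\end{align*}
```

The convex part is treated implicitly and the concave part explicitly, so each step reduces to a diagonal update of the coefficients in the eigenbasis of the graph Laplacian.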
AN MBO SCHEME ON GRAPHS FOR
SEGMENTATION AND IMAGE PROCESSING

E. Merkurjev, T. Kostic and A.L. Bertozzi, SIAM J.
Imaging Sci. 2013
Instead of minimizing the GL functional,
apply an MBO scheme: a simple algorithm alternating the heat
equation with thresholding.
TWO-STEP MINIMIZATION PROCEDURE BASED ON
CLASSICAL MBO SCHEME FOR MOTION BY MEAN
CURVATURE (NOW ON GRAPHS)
1) Propagation by graph heat equation + forcing term
2) Thresholding
Simple! And often converges in just a few
iterations (e.g. 4 for MNIST dataset)
ALGORITHM
I) Create a graph from the data, choose a weight
function and then create the symmetric graph
Laplacian.
II) Calculate the eigenvectors and eigenvalues of the
symmetric graph Laplacian. It is only necessary to
calculate a portion of the eigenvectors*.
III) Initialize u.
IV) Iterate the two-step scheme described above until a
stopping criterion is satisfied.
*Fast linear algebra routines are necessary: either the
Rayleigh-Chebyshev procedure or the Nystrom extension.
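Steps I to IV can be sketched as follows for a binary problem. This is a minimal illustration under stated assumptions, not the authors' implementation: Gaussian weights, the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}, a semi-implicit diffusion step in the eigenbasis, thresholding to +1 / -1, a dense eigensolver in place of the fast routines mentioned above, and hypothetical names and parameter values throughout.

```python
import numpy as np

def symmetric_graph_laplacian(points, sigma):
    """Step I: Gaussian weights (assumed) and L_s = I - D^{-1/2} W D^{-1/2}."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                       # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(points)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def graph_mbo(L, u0, labeled, n_eig, dt=0.1, fidelity=10.0, max_iter=50):
    """Steps II-IV: keep the n_eig lowest eigenpairs, then alternate
    diffusion-with-forcing and thresholding until u stops changing."""
    eigvals, eigvecs = np.linalg.eigh(L)           # ascending eigenvalues
    lam, phi = eigvals[:n_eig], eigvecs[:, :n_eig]
    u = u0.astype(float).copy()                    # Step III: initialize u
    for _ in range(max_iter):
        forcing = fidelity * labeled * (u - u0)    # acts on labeled nodes only
        # semi-implicit step: (I + dt L) u_half = u - dt * forcing
        coeffs = (phi.T @ (u - dt * forcing)) / (1.0 + dt * lam)
        u_new = np.where(phi @ coeffs >= 0.0, 1.0, -1.0)   # thresholding
        if np.array_equal(u_new, u):               # stopping criterion
            break
        u = u_new
    return u

# Toy data: two well-separated 3x3 clusters, one labeled point in each.
grid = np.array([[0.1 * i, 0.1 * j] for i in range(3) for j in range(3)])
points = np.vstack([grid, grid + 5.0])
u0 = np.zeros(len(points))
u0[0], u0[-1] = 1.0, -1.0
labeled = (u0 != 0).astype(float)

L = symmetric_graph_laplacian(points, sigma=1.0)
labels = graph_mbo(L, u0, labeled, n_eig=2)
print(labels)
```

For this toy graph the two lowest eigenvectors already separate the clusters, which is why n_eig=2 is enough here; on real data the number of eigenvectors, dt, and the fidelity constant would all need tuning.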
TWO MOONS SEGMENTATION
Second eigenvector segmentation
Our method’s segmentation
IMAGE SEGMENTATION
Original image 1
Hand-labeled grass region
Original image 2
Grass label transferred
IMAGE SEGMENTATION
Hand-labeled cow region
Hand-labeled sky region
Cow label transferred
Sky label transferred
GENERALIZATION TO MULTICLASS MACHINE
LEARNING PROBLEMS (MBO)
Garcia, Merkurjev, Bertozzi, Percus, Flenner, IEEE TPAMI, 2014
Semi-supervised learning
Instead of a double well we have an N-class well with
minima on a simplex in N dimensions
MNIST DATABASE
Comparisons: semi-supervised learning vs. supervised learning
We do semi-supervised learning with only 3.6% of the digits as the known data.
Supervised learning uses 60000 digits for training and tests on 10000 digits.
PERFORMANCE ON COIL AND WEBKB
HYPERSPECTRAL VIDEO SEGMENTATION
Merkurjev, Sunu, and Bertozzi, 2014, preprint
Eigenfunctions computed using Nystrom
“Ground truth” obtained from thresholding eigenfunctions; random initialization otherwise
Four class hyperspectral pixel segmentation of gas plume, ground, mountain, and sky
NYSTROM EXTENSION
Fowlkes, Belongie, Chung, and Malik, IEEE PAMI 2004.
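The basic idea of the Nystrom extension can be sketched as below: eigendecompose the weight matrix on a small set of landmark points and extend the eigenvectors to the remaining points. This is a simplified sketch under assumptions (Gaussian kernel, hypothetical function names, toy 1D points); it omits the degree normalization and orthogonalization steps that a full spectral-clustering variant would include.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Matrix of weights exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def nystrom_eigs(points, landmark_idx, sigma):
    """Approximate eigenpairs of the full weight matrix from landmarks only.
    Returns eigenvalues, extended eigenvectors, and the row ordering used
    (landmarks first, then the remaining points)."""
    landmark_idx = np.asarray(landmark_idx)
    rest_idx = np.setdiff1d(np.arange(len(points)), landmark_idx)
    X, Y = points[landmark_idx], points[rest_idx]
    W_xx = gaussian_kernel(X, X, sigma)            # landmark-landmark block
    W_yx = gaussian_kernel(Y, X, sigma)            # rest-landmark block
    lam, U = np.linalg.eigh(W_xx)
    U_rest = (W_yx @ U) / lam                      # column j divided by lam[j]
    U_ext = np.vstack([U, U_rest])
    order = np.concatenate([landmark_idx, rest_idx])
    return lam, U_ext, order

# Toy example: 7 points on a line, 4 of them used as landmarks.
points = np.array([[0.0], [1.0], [2.0], [3.0], [0.5], [1.5], [2.5]])
lam, U_ext, order = nystrom_eigs(points, [0, 1, 2, 3], sigma=1.0)

W_approx = U_ext @ np.diag(lam) @ U_ext.T          # low-rank reconstruction
W_true = gaussian_kernel(points[order], points[order], sigma=1.0)
print(np.abs(W_approx - W_true).max())
```

Only the rows of the weight matrix belonging to the landmarks are ever computed, and those rows are reproduced exactly by the reconstruction; the block between non-landmark points is approximated.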
COMMUNITY DETECTION –
MODULARITY OPTIMIZATION
Joint work with Huiyi Hu (UCLA), Thomas Laurent (Loyola Marymount),
and Mason Porter (Oxford) to appear in SIAP
The modularity of a partition
Newman, Girvan, Phys. Rev. E 2004.
[w_ij] is the graph adjacency matrix
P is the probability null model (Newman-Girvan): P_ij = k_i k_j / (2m)
k_i = sum_j w_ij (strength of node i)
Gamma is the resolution parameter
g_i is the group assignment of node i
2m is the total volume of the graph: 2m = sum_i k_i = sum_ij w_ij
measures the fraction of total
edge weight within each
community minus the edge
weight expected if edges were
placed randomly using some
null model.
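Putting the listed ingredients together, the modularity can be written in its usual form. The formula itself did not survive extraction, so this is a reconstruction from the definitions above rather than a copy of the slide.

```latex
\[
  Q \;=\; \frac{1}{2m} \sum_{i,j} \left( w_{ij} - \gamma\, P_{ij} \right) \delta(g_i, g_j),
  \qquad
  P_{ij} = \frac{k_i k_j}{2m},
\]
```

Here delta(g_i, g_j) equals 1 when nodes i and j are assigned to the same group and 0 otherwise.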
This is an optimization (max) problem. Combinatorially complex – optimize over
all possible group assignments. Very expensive computationally.
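A direct evaluation of Q for a given group assignment can be sketched as follows, using the Newman-Girvan null model P_ij = k_i k_j / (2m). The function name and the toy graph (two triangles joined by a single edge) are illustrative assumptions.

```python
def modularity(weights, groups, gamma=1.0):
    """Q = (1/2m) * sum_ij (w_ij - gamma * k_i k_j / 2m) * [g_i == g_j]."""
    n = len(groups)
    strengths = [sum(row) for row in weights]      # k_i = sum_j w_ij
    two_m = sum(strengths)                         # total volume 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:
                q += weights[i][j] - gamma * strengths[i] * strengths[j] / two_m
    return q / two_m

# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    W[i][j] = W[j][i] = 1.0

q_split = modularity(W, [0, 0, 0, 1, 1, 1])        # the natural two communities
q_single = modularity(W, [0, 0, 0, 0, 0, 0])       # everything in one group
print(q_split, q_single)
```

Evaluating Q for one assignment is cheap; the expense comes from searching over all assignments, whose number grows exponentially with the number of nodes.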
BIPARTITION OF A GRAPH
Given a subset A of nodes on the graph, define
Vol(A) = sum_{i in A} k_i
Then maximizing Q is equivalent to minimizing
Given a binary function f on the graph taking values +1, -1, define A
to be the set where f = 1; we can then define:
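As a worked check of the two-group case, here is a derivation under assumed conventions (the slide's own expressions were lost, and its constants may be normalized differently). Write Cut(A, A^c) for the total weight of edges between A and its complement, take |f|_TV = (1/2) sum_ij w_ij |f_i - f_j|, and let f-bar = (1/2m) sum_i k_i f_i be the strength-weighted mean.

```latex
\begin{align*}
  Q &= \frac{1}{2m} \sum_{i,j} \Bigl( w_{ij} - \gamma \frac{k_i k_j}{2m} \Bigr) \delta(g_i, g_j) \\
    &= \frac{1}{2m} \Bigl[\, 2m - 2\,\mathrm{Cut}(A, A^c)
       - \frac{\gamma}{2m} \bigl( \mathrm{Vol}(A)^2 + \mathrm{Vol}(A^c)^2 \bigr) \Bigr] \\
    &= 1 - \gamma - \frac{1}{m} \Bigl[\, \mathrm{Cut}(A, A^c)
       - \frac{\gamma}{2m}\, \mathrm{Vol}(A)\, \mathrm{Vol}(A^c) \Bigr], \\[4pt]
  |f|_{TV} &= 2\,\mathrm{Cut}(A, A^c), \qquad
  \sum_i k_i \,(f_i - \bar f)^2 = \frac{2}{m}\, \mathrm{Vol}(A)\, \mathrm{Vol}(A^c)
  \quad \text{for } f = \pm 1, \\[4pt]
  \mathrm{Cut}(A, A^c) - \frac{\gamma}{2m}\, \mathrm{Vol}(A)\, \mathrm{Vol}(A^c)
    &= \frac{1}{2}\, |f|_{TV} - \frac{\gamma}{4} \sum_i k_i \,(f_i - \bar f)^2 .
\end{align*}
```

So, under these conventions, maximizing Q over two groups amounts to minimizing a graph total variation term minus a weighted L2 term that rewards balanced volumes.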
EQUIVALENCE TO L1 COMPRESSIVE SENSING
Thus modularity optimization restricted to two
groups is equivalent to
This generalizes to n class optimization quite naturally
Because the TV minimization problem involves functions with values on the
simplex we can directly use the MBO scheme to solve this problem.
MODULARITY OPTIMIZATION MOONS AND
CLOUDS
MNIST 4-9 DIGIT SEGMENTATION
13782 handwritten digits. Graph created based on similarity score
between each pair of digits. Weighted graph with 194816 connections.
Modularity MBO performs comparably to GenLouvain but in about a
tenth of the run time. The advantage of the MBO-based scheme will be for
very large datasets with moderate numbers of clusters.
4-9 MNIST SEGMENTATION
LFR BENCHMARK
Lancichinetti, Fortunato, Radicchi, Phys Rev E 2008
Synthetic graphs with power-law distribution of community size
Mixing parameter: fraction of edges shared with other communities vs. own
community
CONCLUSIONS AND FUTURE WORK
Yves van Gennip, Nestor Guillen, Braxton Osting, and Andrea L. Bertozzi,
Mean curvature, threshold dynamics, and phase field theory on finite graphs,
2013.
Diffuse interface formulation provides competitive algorithms for machine
learning applications including nonlocal means imaging
Extends PDE-based methods to a graphical framework
Future work includes community detection algorithms (very computationally
expensive)
Speedup includes fast spectral methods and the use of a small subset of
eigenfunctions rather than the complete basis
Competitive with or faster than split-Bregman methods and other L1-TV based
methods
Extension of work to convex optimization methods (joint work with Egil Bae
and Ekaterina Merkurjev)
PREPRINTS AND REPRINTS
A. L. Bertozzi and A. Flenner, Multiscale Modeling and Simulation, 10(3), 2012.
T. Kostic and A. L. Bertozzi, J. Sci. Comp., 2012.
Y. van Gennip and A. L. Bertozzi, Adv. Diff. Eq., 2012.
H. Hu, Y. van Gennip, B. Hunter, A. L. Bertozzi, and M. A. Porter, IEEE ICDM'12, 2012.
Y. van Gennip et al., SIAP (spectral clustering, gang data), 2013.
E. Merkurjev, T. Kostic, and A. L. Bertozzi, An MBO Scheme on Graphs for
Segmentation and Image Processing, accepted, SIAM J. Imaging Sci., 2013.
H. Hu, T. Laurent, M. A. Porter, and A. L. Bertozzi, A Method Based on
Total Variation for Network Modularity Optimization using the MBO Scheme, SIAM J.
Appl. Math., 2013.
C. Garcia-Cardona, E. Merkurjev, A. L. Bertozzi, A. Flenner, and A. G. Percus, Fast
Multiclass Segmentation Using Diffuse Interface Methods on Graphs, IEEE PAMI,
2014.
Y. van Gennip, N. Guillen, B. Osting, and A. L. Bertozzi, Mean curvature, threshold
dynamics, and phase field theory on finite graphs, Milan J. Math., 2014.