Transcript ppt

A Singular Value Thresholding
Algorithm for Matrix Completion
Jian-Feng Cai
Emmanuel J. Candès
Zuowei Shen
Presented by Changchun Zhang
Motivation
• Rapidly growing interest in the recovery of an unknown low-rank or approximately low-rank matrix from very limited information.
• Matrix completion – recovering a rectangular matrix from a sampling of its entries.
• Candès and Recht proved that most low-rank matrices can be recovered exactly from most sets of sampled entries by solving a simple convex optimization problem.
Motivation (cont.)
• An unknown matrix $M \in \mathbb{R}^{n_1 \times n_2}$ has $m$ available sampled entries $\{M_{ij} : (i,j) \in \Omega\}$, where $\Omega$ is a random subset of cardinality $m$. Most matrices $M$ of rank $r$ can be perfectly recovered by solving the optimization problem

    minimize $\|X\|_*$
    subject to $X_{ij} = M_{ij}, \ (i,j) \in \Omega$,

provided that the number of samples obeys $m \ge C\, n^{6/5} r \log n$ (for $n_1 = n_2 = n$) for some positive numerical constant $C$.
Paper Contribution
• Introduces a novel algorithm to approximate the matrix with minimum nuclear norm among all matrices obeying a set of convex constraints.
• Develops a simple, first-order, easy-to-implement algorithm that is extremely efficient at addressing problems in which the optimal solution has low rank.
• Provides a convergence analysis showing that the sequence of iterates converges.
Paper Contribution (cont.)
• Provides numerical results in which 1,000 × 1,000 matrices are recovered in less than 1 minute.
• The approach is amenable to very large-scale problems: it recovers matrices of rank about 10 with nearly a billion unknowns from just 0.4% of their sampled entries.
• Develops a framework in which one can understand these algorithms in terms of well-known Lagrange multiplier algorithms.
Algorithm Outline (1)
• Existing general-purpose solvers, such as SDPT3, cannot solve huge systems because they are unable to handle large matrices.
• None of these general-purpose solvers exploit the fact that the solution may have low rank.
Algorithm Outline (2)
• The singular value thresholding algorithm presented in this paper solves

    minimize $\tau \|X\|_* + \frac{1}{2}\|X\|_F^2$
    subject to $\mathcal{A}(X) = b$,

where $\mathcal{A}$ is a linear operator acting on the space of $n_1 \times n_2$ matrices and $b \in \mathbb{R}^m$.
• This algorithm is a simple first-order method, and is especially
well suited for problems of very large sizes in which the solution
has low rank.
Algorithm Outline (3)
• Sketching this algorithm in the special matrix completion setting, the problem can be expressed as

    minimize $\tau \|X\|_* + \frac{1}{2}\|X\|_F^2$
    subject to $\mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(M)$,

where $\mathcal{P}_\Omega$ is the orthogonal projector onto the span of matrices vanishing outside of $\Omega$, so that the $(i,j)$th component of $\mathcal{P}_\Omega(X)$ is equal to $X_{ij}$ if $(i,j) \in \Omega$ and zero otherwise. Here, $X$ is the optimization variable (a small numpy sketch of $\mathcal{P}_\Omega$ follows below).
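
A minimal numpy sketch of the sampling projector $\mathcal{P}_\Omega$, with $\Omega$ represented as a boolean mask (the function and variable names are illustrative, not from the paper):

    import numpy as np

    def P_Omega(X, mask):
        # Keep X[i, j] for (i, j) in Omega (mask True), zero out the rest.
        return np.where(mask, X, 0.0)

    # Example: observe about 40% of the entries of a random 5 x 5 matrix.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    mask = rng.random((5, 5)) < 0.4   # Omega as a boolean mask
    print(P_Omega(M, mask))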
Algorithm Outline (4)
• Fix $\tau > 0$ and a sequence $\{\delta_k\}_{k \ge 1}$ of scalar step sizes. Then, starting with $Y^0 = 0$, the algorithm inductively defines

    $X^k = \text{shrink}(Y^{k-1}, \tau)$
    $Y^k = Y^{k-1} + \delta_k \, \mathcal{P}_\Omega(M - X^k)$

until a stopping criterion is reached. Here, shrink$(Y, \tau)$ is a nonlinear function which applies a soft-thresholding rule at level $\tau$ to the singular values of the input matrix $Y$. The key property here is that for large values of $\tau$, the sequence $\{X^k\}$ converges to the solution (a runnable Python sketch follows below).
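
A compact Python sketch of this iteration for the matrix completion setting, using a full dense SVD per step for clarity (the paper's implementation computes partial SVDs with PROPACK); the parameter choices below follow the slides, everything else is illustrative:

    import numpy as np

    def shrink(Y, tau):
        # Soft-threshold the singular values of Y at level tau.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    def svt_complete(M, mask, tau, delta, n_iter=500, tol=1e-4):
        # X^k = shrink(Y^{k-1}, tau); Y^k = Y^{k-1} + delta * P_Omega(M - X^k).
        Y = np.zeros_like(M)
        norm_PM = np.linalg.norm(M[mask])
        for _ in range(n_iter):
            X = shrink(Y, tau)
            residual = np.where(mask, M - X, 0.0)   # P_Omega(M - X^k)
            if np.linalg.norm(residual) / norm_PM < tol:
                break
            Y += delta * residual
        return X

    # Toy run: complete a 50 x 50 rank-2 matrix from half of its entries.
    rng = np.random.default_rng(1)
    n, r = 50, 2
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    mask = rng.random((n, n)) < 0.5
    X = svt_complete(M, mask, tau=5 * n, delta=1.2 * n * n / mask.sum())
    print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))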
Algorithm Outline (5)
• Hence, at each step, the algorithm only needs to compute at most one singular value decomposition and perform a few elementary matrix additions.
• Two important remarks (see the sparse partial-SVD sketch below):
  – Sparsity: for each $k$, $Y^k$ vanishes outside of $\Omega$ and is therefore sparse, a fact that can be used to evaluate the shrink function rapidly.
  – Low-rank property: the matrices $X^k$ turn out to be of low rank, and so the algorithm has a minimum storage requirement since we only need to keep the principal factors in memory.
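
These remarks suggest storing $Y^k$ in sparse format and computing only a partial SVD. A minimal scipy sketch of that idea, assuming a known upper bound k on the rank of the iterate (illustrative; the paper's code uses PROPACK with an adaptive rank guess):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import svds

    def shrink_sparse(Y_sparse, tau, k):
        # Partial SVD of the sparse input, then soft-threshold the spectrum.
        # Returning the factors (U, s, Vt) keeps storage at O(n * k).
        U, s, Vt = svds(Y_sparse, k=k)
        return U, np.maximum(s - tau, 0.0), Vt

    # Y^k vanishes outside Omega, hence is sparse.
    rng = np.random.default_rng(2)
    n = 200
    rows, cols = np.nonzero(rng.random((n, n)) < 0.05)
    Y = sp.csr_matrix((rng.standard_normal(rows.size), (rows, cols)), shape=(n, n))
    U, s, Vt = shrink_sparse(Y, tau=1.0, k=10)
    print("retained rank:", int(np.count_nonzero(s)))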
General Formulation
• The singular value thresholding algorithm can be adapted to deal with other types of convex constraints, such as

    minimize $\tau \|X\|_* + \frac{1}{2}\|X\|_F^2$
    subject to $f_i(X) \le 0, \quad i = 1, \ldots, m$,

where each $f_i$ is a Lipschitz convex function.
SVT Algorithm Details (1)
• Singular value shrinkage operator: consider the SVD of a matrix $X \in \mathbb{R}^{n_1 \times n_2}$ of rank $r$,

    $X = U \Sigma V^*, \quad \Sigma = \mathrm{diag}(\{\sigma_i\}_{1 \le i \le r})$,

where $U$ and $V$ are respectively $n_1 \times r$ and $n_2 \times r$ matrices with orthonormal columns and the singular values $\sigma_i$ are positive. For each $\tau \ge 0$, the soft-thresholding operator $\mathcal{D}_\tau$ is defined as follows:

    $\mathcal{D}_\tau(X) := U \mathcal{D}_\tau(\Sigma) V^*, \quad \mathcal{D}_\tau(\Sigma) = \mathrm{diag}(\{(\sigma_i - \tau)_+\})$,

where $t_+$ is the positive part of $t$, namely $t_+ = \max(0, t)$.
SVT Algorithm Details (2)
• The singular value thresholding operator is the proximity operator associated with the nuclear norm.
• Firstly, we have: for each $\tau \ge 0$ and $Y \in \mathbb{R}^{n_1 \times n_2}$,

    $\mathcal{D}_\tau(Y) = \arg\min_X \left\{ \frac{1}{2} \|X - Y\|_F^2 + \tau \|X\|_* \right\}$

(a numerical check of this characterization follows below).
• Recast the SVT algorithm as a Lagrange multiplier algorithm known as Uzawa's algorithm: starting from $Y^0 = 0$, the shrinkage iteration inductively defines, for $k = 1, 2, \ldots$,

    $X^k = \mathcal{D}_\tau(Y^{k-1})$
    $Y^k = Y^{k-1} + \delta_k \, \mathcal{P}_\Omega(M - X^k)$.
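
A quick numerical check of the proximity characterization: $\mathcal{D}_\tau(Y)$ should achieve a lower value of $\frac{1}{2}\|X - Y\|_F^2 + \tau\|X\|_*$ than any perturbation of it (a self-contained sketch; names are illustrative):

    import numpy as np

    def shrink(Y, tau):
        # Singular value shrinkage D_tau.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    def prox_objective(X, Y, tau):
        # 0.5 * ||X - Y||_F^2 + tau * ||X||_*  (nuclear norm = sum of singular values).
        return 0.5 * np.linalg.norm(X - Y) ** 2 + tau * np.linalg.svd(X, compute_uv=False).sum()

    rng = np.random.default_rng(3)
    Y, tau = rng.standard_normal((8, 6)), 1.5
    X_star = shrink(Y, tau)
    best = prox_objective(X_star, Y, tau)
    print(all(prox_objective(X_star + 0.1 * rng.standard_normal(Y.shape), Y, tau) > best
              for _ in range(1000)))   # expected: True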
General Formulation Details (1)
• Linear equality constraints: for the problem minimize $\tau \|X\|_* + \frac{1}{2}\|X\|_F^2$ subject to $\mathcal{A}(X) = b$, the Uzawa iteration becomes $X^k = \mathcal{D}_\tau(\mathcal{A}^*(y^{k-1}))$, $y^k = y^{k-1} + \delta_k (b - \mathcal{A}(X^k))$.
General Formulation Details (2)
• General convex constraints $f_i(X) \le 0$: the multiplier update is projected onto the nonnegative orthant, $y^k = [y^{k-1} + \delta_k \mathcal{F}(X^k)]_+$ with $\mathcal{F}(X) = (f_1(X), \ldots, f_m(X))$.
General Formulation Details (3)
• This means that minimizing the proximal objective $\tau \|X\|_* + \frac{1}{2}\|X\|_F^2$ is the same as minimizing the nuclear norm in the limit of large $\tau$ (stated in symbols below).
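
In symbols, this limit can be stated as follows (a standard fact about Tikhonov-type regularization, written out here for completeness):

    % As tau -> infinity, the solution of the proximal problem converges to
    % the nuclear-norm minimizer of minimum Frobenius norm.
    X_\tau := \operatorname*{arg\,min}_{\mathcal{A}(X) = b}
        \ \tau \|X\|_* + \tfrac{1}{2}\|X\|_F^2,
    \qquad
    \lim_{\tau \to \infty} X_\tau
      = \operatorname*{arg\,min} \Big\{ \|X\|_F :
          X \ \text{minimizes} \ \|X\|_* \ \text{over} \ \mathcal{A}(X) = b \Big\}.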
Convergence Analysis
• For the matrix completion problem, the sequence $\{X^k\}$ converges to the unique solution of the proximal problem provided the step sizes obey $0 < \delta_k < 2$.
Implementation (Parameter Design)
• Use the Matlab- and Fortran-based package PROPACK to compute partial SVDs.
• Step size: select $\delta = 1.2\, n_1 n_2 / m$, i.e., 1.2 divided by the undersampling ratio $m / (n_1 n_2)$.
• Initial steps: due to $Y^0 = 0$, the early iterates obey $X^k = 0$ while $k \delta \|\mathcal{P}_\Omega(M)\|_2 \le \tau$. To save work, we can simply skip the first $k_0$ steps and start the iteration by computing $X^{k_0 + 1}$ from $Y^{k_0} = k_0 \delta \, \mathcal{P}_\Omega(M)$, where $k_0$ is the integer defined by obeying $\tau / (\delta \|\mathcal{P}_\Omega(M)\|_2) \in (k_0 - 1, k_0]$.
• Stopping criterion: $\|\mathcal{P}_\Omega(X^k - M)\|_F / \|\mathcal{P}_\Omega(M)\|_F \le \epsilon$, where $\epsilon$ is a fixed tolerance (a small sketch of both follows below).
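
A small Python sketch of the warm start and stopping test described above (variable names are illustrative):

    import numpy as np

    def svt_warm_start(M, mask, tau, delta):
        # Skip the first k0 steps, during which shrink(Y^k, tau) would return 0.
        PM = np.where(mask, M, 0.0)                  # P_Omega(M)
        spectral = np.linalg.norm(PM, 2)             # largest singular value
        k0 = int(np.ceil(tau / (delta * spectral)))  # tau/(delta*||PM||_2) in (k0-1, k0]
        return k0, k0 * delta * PM                   # Y^{k0} = k0 * delta * P_Omega(M)

    def should_stop(Xk, M, mask, eps=1e-4):
        # ||P_Omega(X^k - M)||_F / ||P_Omega(M)||_F <= eps
        return np.linalg.norm((Xk - M)[mask]) <= eps * np.linalg.norm(M[mask])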
Numerical Results (1)
• Linear equality constraints
  – 1.86 GHz CPU and 3 GB memory
  – Stopping criterion: $\|\mathcal{P}_\Omega(X^k - M)\|_F / \|\mathcal{P}_\Omega(M)\|_F < 10^{-4}$
  – Relative error of the reconstruction: $\|X^{\mathrm{opt}} - M\|_F / \|M\|_F$
  – $\tau = 5n$
• Easy to implement and surprisingly effective both in terms of computational cost and storage requirement (an experiment-setup sketch follows below).
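
A sketch of how such an experiment can be set up (rank-r test matrix built as a product of Gaussian factors, entries sampled uniformly at random; the values of $\tau$ and $\delta$ follow the slides, while the sample budget is an assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 1000, 10
    # Rank-r test matrix: product of two Gaussian factors.
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    # Sample m entries uniformly at random (here ~5x the degrees of freedom).
    m = 5 * r * (2 * n - r)
    mask = np.zeros(n * n, dtype=bool)
    mask[rng.choice(n * n, size=m, replace=False)] = True
    mask = mask.reshape(n, n)
    tau, delta = 5 * n, 1.2 * n * n / m   # parameter choices from the slides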
Numerical Results (2)
Numerical Results (3)
• The SVT algorithm performs extremely well.
• In all of the experiments, it takes fewer than 200 SVT iterations to reach convergence.
• A 1,000 × 1,000 matrix of rank 10 is recovered in less than a minute.
• 30,000 × 30,000 matrices of rank 10 are completed from about 0.4% of sampled entries in about 17 minutes.
• High-rank matrices are also efficiently completed.
• The overall relative errors reported are all of the order of $10^{-4}$.
Numerical Results (4)
• The rank of the iterates $X^k$ is nondecreasing, so the maximum rank is reached in the final steps.
• The low-rank property is crucial for making the algorithm run fast.
Numerical Results (5)
• Linear inequality constraints
Numerical Results (6)
Numerical Results (7)
• Figure 2 shows that the algorithm behaves just as well with linear inequality constraints.
• Before reaching the tolerance, the noiseless case takes about 150 iterations, while the noisy case takes about 200 iterations.
• The rank of the iterates is nondecreasing and quickly reaches the true rank of the unknown matrix to be recovered.
• There is no substantial difference between the total running times of the noiseless and noisy cases.
• The recovery of the matrix M from undersampled and noisy entries appears to be accurate, as the relative error can reach 0.0768 with a noise ratio of about 0.08.
Discussion
• Thanks
Appendix: Notations