Parallel Computation for SDPs Focusing on the Sparsity of
This talk is supported by Ewha University
High Performance Solvers for Semidefinite Programs
Makoto Yamashita @ Tokyo Tech
Katsuki Fujisawa @ Chuo Univ
Mituhiro Fukuda @ Tokyo Tech
Kazuhiro Kobayashi @ NMRI
Kazuhide Nakata @ Tokyo Tech
Maho Nakata @ RIKEN
KSIAM Annual Meeting @ Jeju 2011/11/25
(2011/11/25-2011/11/26)
Our interests & SDPA Family
How fast can we solve SDPs?
How large an SDP can we solve?
How accurately can we solve SDPs?
SDPA: base solver
SDPARA: parallel
SDPA-M: Matlab
SDPA-C: structural sparsity
SDPARA-C: parallel, structural sparsity
SDPA-GMP: multiple precision
SDPA Homepage http://sdpa.sf.net/
KSIAM 2011 @ Jeju
SDPA Online Solver
http://sdpa.sf.net/ ⇒ Online Solver
1. Log in to the online solver
2. Upload your problem
3. Push the 'Execute' button
4. Receive the result via Web/Mail
Outline
1. SDP Applications
2. Primal-Dual Interior-Point Methods
3. Inside of SDPARA (Large & Fast)
4. Inside of SDPA-GMP (Accurate)
5. Conclusion
SDP Applications
Control Theory
Quantum Chemistry
Sensor Network Localization Problem
Polynomial Optimization
SDP Applications
1. Control Theory
Against swing, we want to keep stability.
Stability Condition
⇒ Lyapunov Condition
⇒ SDP
INFORMS 2011 @ Charlotte
SDP Applications
2. Quantum Chemistry
Ground state energy
Locate electrons
Schrödinger Equation
⇒ Reduced Density Matrix
⇒ SDP
SDP Applications
3. Sensor Network Localization
Distance Information ⇒ Sensor Locations
Protein Structure
SDP Applications
4. Polynomial Optimization
min : Polynomial s.t. Polynomial constraints
For example,
$\min\ f(x) = \sum_{i=1}^{n-1} \left[ (1 - x_i)^2 + 100\,(x_{i+1} - x_i^2)^2 \right],\quad x \in \mathbb{R}^n$
NP-hard in general
Very good lower bound by SDP relaxation method
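The example objective on this slide reads as the generalized Rosenbrock function. The signs were lost in the transcript, so that reading is an assumption. A minimal Python sketch that evaluates it (the function name `rosenbrock` is illustrative, not from the talk):

```python
def rosenbrock(x):
    """Generalized Rosenbrock function:
    sum over i = 1..n-1 of (1 - x_i)^2 + 100 * (x_{i+1} - x_i^2)^2."""
    return sum(
        (1.0 - x[i]) ** 2 + 100.0 * (x[i + 1] - x[i] ** 2) ** 2
        for i in range(len(x) - 1)
    )

# At x = (1, ..., 1) every term vanishes, so this is a global minimizer.
value_at_ones = rosenbrock([1.0, 1.0, 1.0, 1.0])
# At x = (0, ..., 0) each of the n - 1 terms contributes (1 - 0)^2 = 1.
value_at_zeros = rosenbrock([0.0, 0.0, 0.0, 0.0])
print(value_at_ones, value_at_zeros)
```

An SDP relaxation does not evaluate f directly; it bounds the minimum of f from below, which is why a known minimizer like the all-ones vector is a handy check on the quality of the bound.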
SDP Applications
Control Theory
Quantum Chemistry
Polynomial Optimization
Sensor Network Localization Problem
Many Applications
How Large & How Fast & How Accurate
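Standard form
(P) $\min\ C \bullet X \quad \text{s.t.}\quad A_k \bullet X = b_k\ (k = 1, \ldots, m),\quad X \succeq O$
(D) $\max\ \sum_{k=1}^{m} b_k z_k \quad \text{s.t.}\quad \sum_{k=1}^{m} A_k z_k + Y = C,\quad Y \succeq O$
The variables are $(X, Y, z) \in \mathbb{S}^n \times \mathbb{S}^n \times \mathbb{R}^m$
Inner product is $X \bullet Y = \sum_{i,j=1}^{n} X_{ij} Y_{ij}$
The size is roughly determined by
m: the number of equality constraints in (P)
n: the size of X and Y
Ordinary solver: m on the order of 10,000
Our target: m on the order of 30,000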
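To make the standard form concrete, here is a small NumPy sketch on toy data (the matrices, the feasible points, and the helper `inner` are all hypothetical, not from the talk). It checks the identity $C \bullet X - \sum_k b_k z_k = X \bullet Y$, which holds for any primal feasible X and dual feasible (Y, z) and shows that the duality gap is nonnegative:

```python
import numpy as np

def inner(P, Q):
    """Inner product of symmetric matrices: sum_ij P_ij * Q_ij."""
    return float(np.sum(P * Q))

# Toy data (hypothetical): n = 2, m = 1
C = np.array([[2.0, 0.0], [0.0, 1.0]])
A = [np.eye(2)]                  # A_1 = I, so (P) requires trace(X) = 1
b = np.array([1.0])

# A primal feasible X: trace(X) = 1 and X is positive semidefinite
X = np.array([[0.25, 0.0], [0.0, 0.75]])

# A dual feasible (Y, z): Y = C - z_1 * A_1 must be positive semidefinite
z = np.array([0.5])
Y = C - z[0] * A[0]

primal_obj = inner(C, X)
dual_obj = float(b @ z)
gap = primal_obj - dual_obj      # equals X . Y for feasible points

print(primal_obj, dual_obj, gap, inner(X, Y))
```

At an optimal pair the gap is zero, which is the complementarity condition $X^* Y^* = O$ that the later slides on numerical accuracy rely on.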
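Primal-Dual Interior-Point Methods
[Figure: inside the feasible region, the iterates $(X^0, Y^0, z^0)$, $(X^1, Y^1, z^1)$, $(X^2, Y^2, z^2)$ follow the central path toward the optimal solution $(X^*, Y^*, z^*)$. At each iterate, a search direction $(dX, dY, dz)$ points to a target on the central path. $(X, Y, z) \in \mathbb{S}^n \times \mathbb{S}^n \times \mathbb{R}^m$.]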
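The slide does not define the central path. As background, and as an assumption about what the figure depicts, the standard definition in the notation of (P) and (D) is the set of points satisfying, for $\mu > 0$:

```latex
\begin{aligned}
A_k \bullet X(\mu) &= b_k \quad (k = 1, \ldots, m), \\
\sum_{k=1}^{m} A_k z_k(\mu) + Y(\mu) &= C, \\
X(\mu)\, Y(\mu) &= \mu I, \qquad X(\mu) \succ O,\ Y(\mu) \succ O .
\end{aligned}
```

As $\mu \to 0$ these points approach an optimal solution. Each interior-point step aims at a target on this path with a smaller $\mu$ than the current iterate.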
Schur Complement Matrix
Schur Complement Equation
$B\,dz = r$
$dY = D - \sum_{j=1}^{m} A_j\,dz_j$
$\widehat{dX} = (R - X\,dY)\,Y^{-1},\quad dX = (\widehat{dX} + \widehat{dX}^T)/2$
where the Schur Complement Matrix (SCM) is
$B_{ij} = (X A_i Y^{-1}) \bullet A_j$
1. ELEMENTS (Evaluation of SCM)
2. CHOLESKY (Cholesky factorization of SCM)
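The two steps can be sketched in NumPy on random toy data. This is a dense, unoptimized illustration of the formula $B_{ij} = (X A_i Y^{-1}) \bullet A_j$, not the SDPA implementation. The helper names and the random test matrices are assumptions made for the example.

```python
import numpy as np

def random_spd(n, rng):
    """Random symmetric positive definite n x n matrix (toy data)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

def random_sym(n, rng):
    """Random symmetric n x n matrix (toy data)."""
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

def schur_complement_matrix(X, Y, A):
    """ELEMENTS step: B[i, j] = (X A_i Y^{-1}) . A_j."""
    m = len(A)
    Y_inv = np.linalg.inv(Y)
    B = np.empty((m, m))
    for i in range(m):
        T = X @ A[i] @ Y_inv           # shared by every entry of row i
        for j in range(m):
            B[i, j] = np.sum(T * A[j])
    return B

rng = np.random.default_rng(0)
n, m = 5, 3
X, Y = random_spd(n, rng), random_spd(n, rng)
A = [random_sym(n, rng) for _ in range(m)]

B = schur_complement_matrix(X, Y, A)   # ELEMENTS
L = np.linalg.cholesky(B)              # CHOLESKY (lower factor, B = L L^T)
print(B.shape, np.allclose(L @ L.T, B))
```

The product `X @ A[i] @ Y_inv` depends only on the row index i, so each row of B can be computed without reference to any other row. NumPy returns the lower factor L, and the upper factor used on the later slides is its transpose.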
Computation time on single processor
Time unit is second, SDPA 7, Xeon 5460 (3.16GHz)

            Control     POP    Parallelization in SDPARA
ELEMENTS      22228     668    Row-wise distribution
CHOLESKY       1593    1992    Two-dimensional block-cyclic distribution
Total         23986    2713

ELEMENTS and CHOLESKY take more than 95% of the total time
SDPARA replaces these bottlenecks by parallel computation
Row-wise distribution
$B_{ij} = (X A_i Y^{-1}) \bullet A_j$
All rows are independent
Assign processors in a cyclic manner
Example: B is 8 × 8, with 4 processors
Row 1: Processor1
Row 2: Processor2
Row 3: Processor3
Row 4: Processor4
Row 5: Processor1
Row 6: Processor2
Row 7: Processor3
Row 8: Processor4
Simple idea ⇒ Very EFFICIENT
High scalability
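The cyclic assignment in the example can be written as a one-line rule: row r (counting from 0) goes to processor r mod P. A minimal sketch follows, where the function name `cyclic_rows` is illustrative and the code models the assignment only, not the MPI communication:

```python
def cyclic_rows(num_rows, num_procs):
    """Map each processor (0-based) to the rows (0-based) it owns
    under cyclic assignment."""
    return {p: list(range(p, num_rows, num_procs)) for p in range(num_procs)}

assignment = cyclic_rows(8, 4)
for p, rows in assignment.items():
    # Print with 1-based numbering to match the slide.
    print(f"Processor{p + 1}: rows {[r + 1 for r in rows]}")
```

Because every row of B can be evaluated independently, each processor only needs its own row list and no communication is required during the ELEMENTS step.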
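Block Algorithm for Cholesky factorization
Triangular Factorization: $B = U^T U$ (U: upper triangular matrix)
$$\begin{pmatrix} B_{11} & B_{12} \\ B_{12}^T & B_{22} \end{pmatrix} = \begin{pmatrix} U_{11}^T & O \\ U_{12}^T & U_{22}^T \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ O & U_{22} \end{pmatrix} = \begin{pmatrix} U_{11}^T U_{11} & U_{11}^T U_{12} \\ U_{12}^T U_{11} & U_{12}^T U_{12} + U_{22}^T U_{22} \end{pmatrix}$$
$B_{11} \in \mathbb{S}^p,\ B_{22} \in \mathbb{S}^{m-p}$ (e.g. p = 4)
1. $B_{11} = U_{11}^T U_{11}$ (small Cholesky factorization)
2. $U_{12} = U_{11}^{-T} B_{12}$ (block update)
3. $B_{22} \leftarrow B_{22} - U_{12}^T U_{12}$ (block update)
Block updates ⇒ Parallel Computing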
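The three steps can be sketched serially in NumPy as follows. This is an illustration of the block recursion only, under the assumption that B is symmetric positive definite. The function name `block_cholesky_upper` is hypothetical, and a general `np.linalg.solve` stands in for the triangular solve a real implementation would use.

```python
import numpy as np

def block_cholesky_upper(B, p=4):
    """Return upper triangular U with B = U^T U, handling p columns per step."""
    B = np.array(B, dtype=float)        # work on a copy; the input is untouched
    m = B.shape[0]
    U = np.zeros_like(B)
    for k in range(0, m, p):
        e = min(k + p, m)
        # Step 1: small Cholesky factorization, B11 = U11^T U11
        U11 = np.linalg.cholesky(B[k:e, k:e]).T
        U[k:e, k:e] = U11
        if e < m:
            # Step 2: U12 = U11^{-T} B12, i.e. solve U11^T U12 = B12
            U12 = np.linalg.solve(U11.T, B[k:e, e:])
            U[k:e, e:] = U12
            # Step 3: B22 <- B22 - U12^T U12, then repeat on the updated B22
            B[e:, e:] -= U12.T @ U12
    return U

rng = np.random.default_rng(1)
M = rng.standard_normal((10, 10))
B = M @ M.T + 10 * np.eye(10)           # toy positive definite matrix
U = block_cholesky_upper(B, p=4)
print(np.allclose(U.T @ U, B))
```

Steps 2 and 3 are matrix-matrix operations on large blocks, which is where most of the arithmetic sits and what a parallel library distributes across processors.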
Two-dimensional block-cyclic distribution (TDBCD)
ScaLAPACK library
Example: B is 8 × 8, distributed over 4 processors (each entry shows the owning processor)
1 1 2 2 1 1 2 2
1 1 2 2 1 1 2 2
3 3 4 4 3 3 4 4
3 3 4 4 3 3 4 4
1 1 2 2 1 1 2 2
1 1 2 2 1 1 2 2
3 3 4 4 3 3 4 4
3 3 4 4 3 3 4 4
Converting from the row-wise distribution to TDBCD requires network communication
Cholesky on TDBCD is much faster than on the row-wise distribution
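The ownership pattern in the example follows from a simple index rule. The sketch below assumes 2 × 2 blocks on a 2 × 2 processor grid, which is what the example layout suggests. The function name `block_cyclic_owner` is illustrative and is not a ScaLAPACK routine.

```python
def block_cyclic_owner(i, j, block=2, grid_rows=2, grid_cols=2):
    """Processor (numbered from 1) that owns entry (i, j), 0-based indices."""
    proc_row = (i // block) % grid_rows   # block row, wrapped around the grid
    proc_col = (j // block) % grid_cols   # block column, wrapped around the grid
    return proc_row * grid_cols + proc_col + 1

layout = [[block_cyclic_owner(i, j) for j in range(8)] for i in range(8)]
for row in layout:
    print(" ".join(str(p) for p in row))
```

Unlike the row-wise layout, every processor owns blocks spread across the whole matrix, so the work stays balanced as the factorization shrinks toward the lower-right corner.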
Numerical Results of SDPARA
Quantum Chemistry (m=7230, SCM=100%), middle size
SDPARA 7.3.1, Xeon X5460, 3.16GHz x2, 48GB memory
[Figure: time in seconds (log scale) against the number of servers]

Servers     ELEMENTS   CHOLESKY    Total
1              28678        548    29700
4               7192        131     7764
16              1826         47     2294

ELEMENTS 15x speedup
CHOLESKY 12x speedup
Total 13x speedup
Very FAST!!
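The quoted speedups can be recomputed from the plot labels. The assignment of the nine labels to the three series is an assumption here, chosen because it is the one consistent with the stated 15x, 12x and 13x figures.

```python
# Seconds, read off the plot labels (series assignment is an assumption).
times = {
    "ELEMENTS": {1: 28678, 4: 7192, 16: 1826},
    "CHOLESKY": {1: 548, 4: 131, 16: 47},
    "Total":    {1: 29700, 4: 7764, 16: 2294},
}

def speedup(series, servers):
    """Time on 1 server divided by time on the given number of servers."""
    return series[1] / series[servers]

for name, series in times.items():
    s = speedup(series, 16)
    print(f"{name}: {s:.1f}x speedup, {100 * s / 16:.0f}% efficiency on 16 servers")
```

Parallel efficiency (speedup divided by the number of servers) is highest for ELEMENTS, matching the earlier remark that the row-wise distribution needs no communication.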
Acceleration by Multiple Threading
Modern processors have multiple cores
Multiple threading is becoming common
Example: 2 processors x 2 threads on each processor
Rows of B are assigned cyclically to Processor1:Thread1, Processor2:Thread1, Processor1:Thread2, Processor2:Thread2
Two-level Parallel Computing
Comparison with PCSDP
developed by Ivanov & de Klerk
SDP: B.2P Quantum Chemistry (m = 7230, SCM = 100%)
Xeon X5460, 3.16GHz x2 (8core), 48GB memory
Time unit is second

Servers        1        2        4       8      16
PCSDP     53,768   27,854   14,273   7,995   4,050
SDPARA     5,983    3,002    1,680     901     565

SDPARA is 8x faster by MPI & Multi-Threading
(Two-level parallelization)
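As a quick check of the "8x" figure, the ratio of PCSDP time to SDPARA time can be computed for each server count from the numbers on this slide (the variable names are illustrative):

```python
servers = [1, 2, 4, 8, 16]
pcsdp_seconds = [53768, 27854, 14273, 7995, 4050]
sdpara_seconds = [5983, 3002, 1680, 901, 565]

# How many times faster SDPARA is than PCSDP at each server count.
ratios = [p / s for p, s in zip(pcsdp_seconds, sdpara_seconds)]
for n, r in zip(servers, ratios):
    print(f"{n:2d} servers: SDPARA is {r:.1f}x faster")
```

The ratio stays between roughly 7 and 9 across all server counts, so the advantage comes from the per-server multi-threading rather than from better scaling across servers.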
Extremely Large-Scale SDPs
Other solvers can handle only m up to about 30,000

Problem          m         SCM    Time
Esc32_b (QAP)    198,432   100%   129,186 seconds (1.5 days)

16 Servers [Xeon X5670 (2.93GHz), 128GB Memory]
The LARGEST solved SDP in the world
Numerical Accuracy
One weak point of PDIPM
$X^* Y^* = O,\quad \lim_{k \to \infty} (X^k, Y^k) = (X^*, Y^*)$, where $(X^*, Y^*, z^*)$ is optimal
PDIPM requires $(X^k)^{-1}$ & $(Y^k)^{-1}$,
for example, $B_{ij} = (X A_i Y^{-1}) \bullet A_j$
Eventually, numerical trouble
(often, Cholesky fails)
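A toy illustration of why the inverses become troublesome: if X and Y satisfy $XY = \mu I$ and $\mu$ shrinks toward zero, at least one of them approaches a singular matrix and its condition number blows up. The 2 × 2 diagonal pair below is a hypothetical example, not data from the talk.

```python
import numpy as np

def central_path_pair(mu):
    """Toy 2 x 2 pair with X @ Y = mu * I (hypothetical example)."""
    X = np.diag([1.0, mu])
    Y = np.diag([mu, 1.0])
    return X, Y

mus = [1e-2, 1e-8, 1e-14]
conds = []
for mu in mus:
    X, Y = central_path_pair(mu)
    conds.append(np.linalg.cond(Y))    # grows like 1 / mu
    print(f"mu = {mu:.0e}: cond(Y) = {conds[-1]:.1e}")
```

Double precision carries about 16 decimal digits, so once the condition number approaches 1e16 the computed $Y^{-1}$, and with it the Schur complement matrix, carries little reliable information.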
Numerical Precision
Ordinary double precision in C or C++
64 bits = 1 bit (sign a) + 11 bits (exponent b) + 52 bits (fraction c; 53 significant bits with the implicit leading 1)
value = $(-1)^a \times 2^b \times 1.c$, accuracy ≈ $10^{-16}$
Arbitrary precision in GMP library
We can set the number of bits of the fraction part arbitrarily
(for example, 200 bits ≈ $10^{-53}$)
SDPA-GMP: replace BLAS (Basic Linear Algebra Subprograms)
by MPLAPACK (Multiple precision LAPACK)
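The difference between the two formats can be seen with the Python standard library alone. Here `decimal` is only a stand-in for an arbitrary-precision type, since it works in decimal digits rather than bits and is not the GMP library that SDPA-GMP uses.

```python
import sys
from decimal import Decimal, getcontext

# IEEE 754 double: 53 significant bits (52 stored + 1 implicit leading bit)
print(sys.float_info.mant_dig)
print(sys.float_info.epsilon)          # 2**-52, about 2.2e-16

# In double precision, adding 1e-20 to 1 changes nothing.
double_lost = (1.0 + 1e-20) - 1.0

# With 60 significant decimal digits the small term survives.
getcontext().prec = 60
decimal_kept = (Decimal(1) + Decimal("1e-20")) - Decimal(1)

print(double_lost, decimal_kept)
```

Any quantity smaller than about 1e-16 relative to its neighbours is lost in double precision, which is the regime an interior-point method enters near the optimum.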
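Numerically Hard problem
Test Problem, with a small parameter written here as $\epsilon \ge 0$
$\min\ C \bullet X \quad \text{s.t.}\quad ee^T \bullet X = \epsilon,\quad e_i e_i^T \bullet X = 1\ (i = 1, \ldots, n),\quad X \succeq O$
PDIPM is stable if Slater's condition holds:
$\exists X \succ O \ \text{s.t.}\ A_k \bullet X = b_k\ (k = 1, \ldots, m)$
Graph Partition Problem ($\epsilon = 0$) has no interior:
$\{X : ee^T \bullet X = 0,\ e_i e_i^T \bullet X = 1,\ X \succ O\} = \emptyset$
Small $\epsilon$ ⇒ Numerically Hard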
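A short derivation, added here as a sketch, shows why the constraint $ee^T \bullet X = 0$ rules out an interior point. For any $X \succeq O$:

```latex
\begin{aligned}
ee^T \bullet X &= e^T X e = \| X^{1/2} e \|^2, \\
ee^T \bullet X = 0 &\;\Longrightarrow\; X^{1/2} e = 0 \;\Longrightarrow\; X e = 0 .
\end{aligned}
```

Since $e \neq 0$, every feasible X has a zero eigenvalue, so no feasible X is positive definite and Slater's condition fails.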
Numerical Results of SDPA-GMP
Small parameter ⇒ Numerically Hard

Parameter   Solver      Accuracy    Time (second)
1.0e-1      SDPA        1.08e-8          2.03
1.0e-1      SDPA-GMP    4.80e-48     77760.19
1.0e-15     SDPA        1.63e-7          2.26
1.0e-15     SDPA-GMP    2.97e-48     82115.52
0           SDPA        5.26e-9          2.36
0           SDPA-GMP    7.29e-24    105325.74

24 digits even for the no-interior case
SDPA-GMP uses 300 digits
Conclusion
SDPARA ⇒ How Fast & How Large: 100 times & m on the order of 200,000
SDPA-GMP ⇒ How Accurate: $10^{-48}$
http://sdpa.sf.net/ & Online solver
Thank you very much for your attention.