Advanced Optimization Techniques for Complex Problems TRACER:ULL - 2003
(Spanish title: Técnicas de Optimización Avanzadas para Problemas Complejos)
Barcelona, October 25th, 2003
http://www.tracer.ull.es
TIC2002-04498-C05-05
University of La Laguna
Outline
• Objectives
• Researchers
• Problems
• Branch and Bound and Divide and Conquer Skeletons
    Knapsack Problem
    Matrix Product
    Constrained two-dimensional cutting stock problem
• CALL and LLAC: tools for Complexity Analysis
    Symbolic regression problem
• An analytical model for Pipeline and Master-Slave algorithms over heterogeneous clusters
    Resource allocation problem
    Prediction of the RNA Secondary Structure problem
• Results
TRACER::ULL Objectives
• The main objective of TRACER::ULL is to solve the following complex problems efficiently by developing new optimization procedures:
    Constrained two-dimensional cutting stock problem
    Symbolic regression problem
    Prediction of the RNA secondary structure problem
• We propose the design, implementation and evaluation of solving tools based on exact techniques:
    Divide and Conquer
    Branch and Bound
    Dynamic Programming
• A further objective is to provide sequential, parallel and distributed implementations for academic problems:
    Resource allocation problem
    Knapsack problem
    Matrix Product
• A second research track concerns building a methodology, and the associated tool, for the complexity and performance analysis of both sequential and parallel algorithms.
• Another goal is the implementation of:
    An Internet execution system
    A problem repository
    Performance analysis
Web site: http://www.tracer.ull.es
Researchers
• ULL Staff
Coromoto León Hernández
Isabel Dorta González
Branch and Bound
Dynamic Programming
Daniel González Morales
Casiano Rodríguez León
Jesús Alberto González Martínez
• Foreign
Rumen Andonov
• Students
Performance Analysis Tools and Symbolic regression problem
Prediction of the RNA secondary structure problem
Juan Ramón González González
Gara Miranda Valladares
Divide and Conquer
María Dolores Medina Barroso
Grants
Two-dimensional cutting stock problem
Shared Memory Branch and Bound Skeletons
// shared variables {bqueue, bstemp, soltemp, data}
// private variables {auxSol, high, low}
// the initial subproblem is already inserted in the global shared queue
while (!bqueue.empty()) {
    nn = bqueue.getNumberOfNodes();
    nt = (nn > maxthread) ? maxthread : nn;
    data = new SubProblem[nt];
    for (int j = 0; j < nt; j++)
        data[j] = bqueue.remove();
    set.num.threads(nt);
    parallel forall (i = 0; i < nt; i++) {
        high = data[i].upper_bound(pbm, auxSol);
        if (high > bstemp) {
            low = data[i].lower_bound(pbm, auxSol);
            if (low > bstemp) {
                // critical region
                // only one thread can change the value at any time
                bstemp = low;
                soltemp = auxSol;
            }
            if (high != low) {
                // critical region
                // just one thread can insert subproblems in the queue at any time
                data[i].branch(pbm, bqueue);
            }
        }
    }
}
bestSol = bstemp;
sol = soltemp;
0-1 Knapsack Problem
The 0/1 Knapsack Problem can be stated as follows:
"We are given a knapsack of capacity C and a set of N objects; p[k] and w[k] are the profit and weight associated with object k. The objects must be inserted into the knapsack so as to obtain the maximum profit without exceeding the capacity of the knapsack."
$$\max \sum_{k=1}^{N} p_k x_k$$
subject to:
$$\sum_{k=1}^{N} w_k x_k \le C, \qquad x_k \in \{0, 1\}, \quad k = 1, \dots, N$$
Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons Ltd. (1990)
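As a rough illustration of how Branch and Bound applies to the 0/1 Knapsack Problem stated above, the following sequential sketch prunes subproblems with the fractional (linear relaxation) upper bound. It is not the skeleton's code: the names `Item`, `upperBound`, `branch` and `knapsackBnB` are hypothetical, and positive weights are assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical types and names for illustration; not part of the skeleton library.
struct Item { int profit; int weight; };

// Upper bound for the subproblem that still has to decide items k..N-1:
// fill the remaining capacity greedily and take a fraction of the first item
// that does not fit. Assumes items are sorted by decreasing profit/weight ratio.
static double upperBound(const std::vector<Item>& items, std::size_t k,
                         int capacityLeft, int profitSoFar) {
    double bound = profitSoFar;
    for (std::size_t i = k; i < items.size(); ++i) {
        if (items[i].weight <= capacityLeft) {
            capacityLeft -= items[i].weight;
            bound += items[i].profit;
        } else {
            bound += static_cast<double>(items[i].profit) * capacityLeft / items[i].weight;
            break;
        }
    }
    return bound;
}

// Depth-first exploration: x_k = 1 first (if the object fits), then x_k = 0.
static void branch(const std::vector<Item>& items, std::size_t k,
                   int capacityLeft, int profitSoFar, int& best) {
    best = std::max(best, profitSoFar);          // every partial choice is feasible
    if (k == items.size()) return;
    if (upperBound(items, k, capacityLeft, profitSoFar) <= best) return;  // bound: prune
    if (items[k].weight <= capacityLeft)
        branch(items, k + 1, capacityLeft - items[k].weight,
               profitSoFar + items[k].profit, best);
    branch(items, k + 1, capacityLeft, profitSoFar, best);
}

// Returns the maximum total profit that fits in the given capacity.
int knapsackBnB(std::vector<Item> items, int capacity) {
    std::sort(items.begin(), items.end(), [](const Item& a, const Item& b) {
        return static_cast<long long>(a.profit) * b.weight >
               static_cast<long long>(b.profit) * a.weight;
    });
    int best = 0;
    branch(items, 0, capacity, 0, best);
    return best;
}
```

For instance, calling `knapsackBnB` with the objects (profit, weight) = (60, 10), (100, 20), (120, 30) and capacity 50 selects the last two objects. The parallel skeletons distribute the subproblems produced by the branching step instead of exploring them recursively.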
Comparison between MPI and OpenMP skeletons
[Figure: KNP No Sol - N = 100,000. Speedup (0 to 7) versus number of processors (2, 3, 4, 8, 16, 24, 32) for the MPI and OpenMP skeletons. Origin 3000, CIEMAT]
Distributed Branch and Bound skeleton
• Initialization Phase
• Resolution Phase
Conditional Communication
Message Reception
Avoiding starvation
Compute
Best bound Propagation
Work querying
Ending resolution phase
• Solution Building
[Figure: Knapsack N = 50.000 ULL. Speedup (0 to 5) versus number of processors (0 to 10) for ULL 500 MHz, ULL 800 MHz, ULL 800-500 MHz and ULL 1400 MHz]
Matrix Product
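Let
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$
Definition:
$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
Strassen algorithm:
$$\begin{aligned}
P_1 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\
P_2 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
P_3 &= (A_{11} - A_{21})(B_{11} + B_{12}) \\
P_4 &= (A_{11} + A_{12})\,B_{22} \\
P_5 &= A_{11}\,(B_{12} - B_{22}) \\
P_6 &= A_{22}\,(B_{21} - B_{11}) \\
P_7 &= (A_{21} + A_{22})\,B_{11}
\end{aligned}$$
$$C = \begin{pmatrix} P_1 + P_2 - P_4 + P_6 & P_4 + P_5 \\ P_6 + P_7 & P_2 - P_3 + P_5 - P_7 \end{pmatrix}$$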
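To make the seven products concrete, here is a minimal sketch of one level of Strassen's scheme in which the four blocks of each matrix are plain scalars, using the standard signs of the formulas. The names `Mat2`, `strassen2x2` and `naive2x2` are illustrative assumptions; a real Divide and Conquer implementation would apply the same formulas recursively to submatrices.

```cpp
#include <array>

// Hypothetical 2x2 matrix type whose "blocks" are scalars.
using Mat2 = std::array<std::array<long long, 2>, 2>;

// One level of Strassen: 7 multiplications instead of 8.
Mat2 strassen2x2(const Mat2& A, const Mat2& B) {
    const long long p1 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1]);
    const long long p2 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1]);
    const long long p3 = (A[0][0] - A[1][0]) * (B[0][0] + B[0][1]);
    const long long p4 = (A[0][0] + A[0][1]) * B[1][1];
    const long long p5 = A[0][0] * (B[0][1] - B[1][1]);
    const long long p6 = A[1][1] * (B[1][0] - B[0][0]);
    const long long p7 = (A[1][0] + A[1][1]) * B[0][0];

    Mat2 C{};
    C[0][0] = p1 + p2 - p4 + p6;
    C[0][1] = p4 + p5;
    C[1][0] = p6 + p7;
    C[1][1] = p2 - p3 + p5 - p7;
    return C;
}

// Definition-based product, used only to cross-check the result.
Mat2 naive2x2(const Mat2& A, const Mat2& B) {
    Mat2 C{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}
```

Both functions should agree on any input; comparing them on a few matrices is a quick check that the signs of the seven products are right.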
Distributed Divide and Conquer skeleton
Two dimensional cutting stock Problem: User Interface
• In this problem we are given a large stock rectangle S of dimension $L \times W$ and n types of smaller rectangles (pieces), where the i-th type has dimension $l_i \times w_i$. The problem is to cut from the large rectangle a set of small rectangles such that:
All pieces have a fixed orientation, i.e., a piece of length l and width w is different from a piece of length w and width l (l≠w).
All applied cuts are of guillotine type, i.e., cuts that start from one edge and run parallel to the other two edges.
There are at most $b_i$ rectangles of type i in the cutting pattern (the demand constraint of the i-th piece).
The overall profit $\sum_{i=1}^{n} c_i x_i$, where $x_i$ denotes the number of rectangles of type i in the cutting pattern, is maximized.
Performance: CALL & LLAC
Parallel Architectures: processors, each with its own memory, connected by a communication network
Standard Libraries: MPI, PVM
We need a well accepted Parallel Computing Model: BSP, LogP, ...
CALL & LLAC Architecture
Performance: CALL & LLAC
$$C_0 + C_1 N + C_2 N^2 + C_3 N^3$$
#pragma cll mp mp[0] + mp[1]*N + mp[2]*N*N + mp[3]*N*N*N
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        sum = 0;
        for (k = 0; k < N; k++)
            sum += A(i,k) * B(k,j);
        C(i,j) = sum;
    }
}
#pragma cll end mp
Square matrix product: A, B and C of dimension N×N.
Measuring and Predicting Performance
while (!bqueue.empty()) {
    auxSp = bqueue.remove();  // pop a problem from the local queue
#pragma cll code numvis++;
    high = auxSp.upper_bound(pbm, auxSol);  // upper bound
    if (high > bestSol) {
        low = auxSp.lower_bound(pbm, auxSol);  // lower bound
        if (low > bestSol) {
            bestSol = low;
            sol = auxSol;
            outputPacket.send(MASTER, SOLVE_TAG, bestSol, sol);
        }
        if (high != low) {
            // calculate the number of required slaves
            rSlaves = bqueue.getNumberOfNodes();
            op.send(MASTER, BnB_TAG, high, rSlaves);
            inputPacket.recv(MASTER, nfSlaves, bestSol, rank {1,..., nfSlaves});
            if (nfSlaves >= 0) {
                auxSp.branch(pbm, bqueue);  // branch and save in the local queue
                for i=0, nfSlaves {
                    // send subproblems to the assigned slaves
                    auxSp = bqueue.remove();
#pragma cll code numvis++;
                    outputPacket.send(rank, PBM_TAG, auxSp, bestSol, sol);
                }
            }  // if nfSlaves == DONE the problem is bounded (cut)
        }
    }
}
How to compile?
[Diagram: compilation flow involving kpr.c, call, kpr.cll.c, kpr.cll.h, cc, the executable kpr and the data files kpr.c.dat, kpr.c.dat.1, ..., kpr.c.dat.n]
EXPERIMENT: "kps"
BEGIN_LINE: 115
END_LINE: 119
FORMULA: p 0 p 1 v 0 * +
INFORMULA: kps[0]+kps[1]*numvis
MAXTESTS: 131072
DIMENSION: 2
PARAMETERS:
NUMIDENTS: 1
IDENTS: numvis
OBSERVABLES: CLOCK
COMPONENTS: 1 numvis
POSTFIX_COMPONENT_0: 1
POSTFIX_COMPONENT_1: v 0
NUMTESTS: 1
SAMPLE:
CPU NCPUS numvis CLOCK
0 1 261134.0 0.16491100
kpr
Number of visited Nodes Study
Measuring and Predicting Performance
int main(int argc, char ** argv) {
    number sol;
    readKnap(data);
#pragma cll code double numvis = 0.0;
#pragma cll kps kps[0]*unknown(numvis) posteriori numvis
    /* arguments: next object, remaining capacity, profit */
    sol = knap(0, M, 0);
#pragma cll end kps
    printf("\nsol = ", sol);
#pragma cll report all
    return 0;
}
Symbolic Regression Problem
• Find the unknown complexity formula starting from the experimental data gathered by CALL.
• We can use Symbolic Regression: the induction of mathematical expressions on data. Rather than searching for the values of the regression constants, the object of the search is a symbolic description of the system.
• See Scientific Discovery using Genetic Programming by Maarten Keijzer, 2001. http://www.cs.vu.nl/~mkeijzer/publications/thesis/
• Currently we use a fitness function that measures the error of the predictions “on the asymptotic side” using linear regression on a small sub-sample.
Prediction of the RNA Secondary Structure Problem
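• RNA molecule: a string of n characters
$$R = r_1 r_2 \dots r_n \quad \text{such that} \quad r_i \in \{A, C, G, U\}$$
• Nucleotides join in pairs to free energy: AU, GU, CG
• The iteration space is n × n triangular
• Dependences are nonuniform: there are dependences among non-consecutive stages
$$E(S_{i,j}) = \min \left\{ E(S_{i+1,j-1}) + \alpha(r_i, r_j), \; \min_{i < k \le j} \left\{ E(S_{i,k-1}) + E(S_{k,j}) \right\} \right\}$$
where $\alpha(r_i, r_j)$ denotes the free-energy contribution of the pair $(r_i, r_j)$.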
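A minimal dynamic programming sketch of this recurrence follows. Several points are assumptions made only for illustration: the pairing term is taken to be -1 for the pairs AU, GU and CG (in either order) and 0 otherwise, E is 0 for substrings of length 0 or 1, and the names `pairEnergy` and `minFreeEnergy` are hypothetical. The table is filled diagonal by diagonal, which respects the triangular iteration space and its non-consecutive dependences.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed illustrative energies: -1 for AU, GU, CG (either order), 0 otherwise.
int pairEnergy(char a, char b) {
    auto is = [a, b](char x, char y) {
        return (a == x && b == y) || (a == y && b == x);
    };
    return (is('A', 'U') || is('G', 'U') || is('C', 'G')) ? -1 : 0;
}

// Minimum free energy of the whole string under the recurrence above.
int minFreeEnergy(const std::string& r) {
    const int n = static_cast<int>(r.size());
    std::vector<std::vector<int>> E(n, std::vector<int>(n, 0));
    // Substrings with fewer than two characters contribute no energy.
    auto at = [&E](int i, int j) { return (i >= j) ? 0 : E[i][j]; };

    for (int len = 2; len <= n; ++len) {            // one diagonal per length
        for (int i = 0; i + len - 1 < n; ++i) {
            const int j = i + len - 1;
            // First term: r_i pairs with r_j around the inner substring.
            int best = at(i + 1, j - 1) + pairEnergy(r[i], r[j]);
            // Second term: split into two independent substrings, i < k <= j.
            for (int k = i + 1; k <= j; ++k)
                best = std::min(best, at(i, k - 1) + at(k, j));
            E[i][j] = best;
        }
    }
    return at(0, n - 1);
}
```

With these unit energies the result is minus the largest number of nested or adjacent pairs that can be formed, so the string "GAUC" yields -2 (G with C around A with U).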
TRACER::ULL 2003 Results
• Journals:
Authors: Dorta, León, Rodríguez
Title: Comparing MPI and OpenMP Implementations of the 0-1 Knapsack Problem
Journal: Parallel and Distributed Computing Practices. ISSN 1097-2803
(Accepted)
Date: 2003
Authors: Blanco V., García L., González J.A., Rodríguez C., Rodríguez G.
Title: A Performance Model for the Analysis of OpenMP Programs
Journal: Parallel and Distributed Computing Practices. ISSN 1097-2803
(Accepted)
Date: 2003
• International Conferences:
Blanco V., González J.A., León C., Rodríguez C., Rodríguez G. “From Complexity Analysis to Performance Analysis”. Euro-Par 2003. International Conference on Parallel and Distributed Computing. Klagenfurt, Austria. 26-29 August, 2003.
Dorta I., León C., Rodríguez C., Rojas A. “Parallel Skeletons for Divide and Conquer and Branch and Bound Techniques”. 11th Euromicro Conference on Parallel and Network-Based Processing. ISSN 1066-6192. Genova, Italy. 5-7 February, 2003.
Dorta I., León C., Rodríguez C. “A comparison between MPI and OpenMP Branch-and-Bound Skeletons”. 8th International Workshop on High-Level Parallel Programming Models and Supportive Environments. ISBN 0-7695-1880-X. Nice, France. 22 April, 2003.
Dorta I., León C., Rodríguez C., Rojas A. “Parallel Skeletons. Branch-and-Bound and Divide-and-Conquer Techniques”. TAM User Group Meeting 2003. Barcelona, Spain. 16 May, 2003.
Dorta I., León C., Rodríguez C., Rojas A. “MPI and OpenMP implementations of Branch and Bound Skeletons”. ParCo2003. Dresden, Germany. 2-5 September, 2003.
Dorta I., León C., Rodríguez C. “Parallel Branch and Bound Skeletons: Message Passing and Shared Memory Implementations”. 5th International Conference on Parallel Processing and Applied Mathematics. Czestochowa, Poland. 7-10 September, 2003.
García L., González J.A., González J.C., León C., Rodríguez C., Rodríguez G. “Complexity Driven Performance Analysis”. 10th EuroPVM/MPI 2003. Venice, Italy. 29 September - 2 October, 2003.
• National Conferences:
Dorta I., León C., Rodríguez C., Rodríguez G., Rojas A. “Complejidad Algorítmica: de la Teoría a la Práctica”. JENUI’03 (Jornadas de Enseñanza Universitaria de la Informática). ISBN 84-283-2845-5. Cádiz. 9-11 July, 2003.
González J.R., León C., Rodríguez C. “Un esqueleto para Ramificación y Acotación Distribuido”. XIV Jornadas de Paralelismo. Leganés (Madrid). 15-17 September, 2003.
• PFC (final degree project):
González J.R. “Esqueletos Paralelos Distribuidos. Paradigmas de Ramificación y Acotación y Divide y Vencerás”. DEIOC Internal Working Document DT-03-07. July 2003.