PPT - Parallel Programming Laboratory

Charm++
Data-driven Objects
L. V. Kale
Parallel Programming
• Decomposition
– what to do in parallel
• Mapping:
– Which processor does each task
• Scheduling (sequencing)
– On each processor
• Machine dependent expression
– Express the above decisions for the particular parallel machine
The parallel objects model of Charm++ automates mapping, scheduling, and machine dependent expression
Shared objects model:
• Basic philosophy:
– Let the programmer decide what to do in parallel
– Let the system handle the rest:
• Which processor executes what, and when
• With some override control for the programmer, when needed
• Basic model:
– The program is a set of communicating objects
– Objects only know about other objects (not processors)
– System maps objects to processors
• And may remap the objects dynamically, for load balancing etc.
• Shared objects, not shared memory
– in-between “shared nothing” message passing, and “shared everything” of SAS
Charm++
• Charm++ programs specify parallel computations consisting of a number of “objects”
– How do they communicate?
• By invoking methods on each other, typically asynchronously
• Also by sharing data using “specifically shared variables”
– What kinds of objects?
• Chares: singleton objects
• Chare arrays: generalized collections of objects
• Advanced: Chare group (Used by library writers, system)
Data Driven Execution in Charm++
[Figure: two processors, each shown with its objects, a scheduler, and a message queue]
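The data-driven execution pictured on this slide can be sketched in plain C++. This is a minimal illustration only, not the Charm++ runtime: the names Scheduler, Counter, send, run and demoScheduler are hypothetical, and a single queue stands in for one processor. Each message is a deferred method invocation, and the scheduler runs whichever invocation is next in the queue.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical object with one method that plays the role of an entry method.
struct Counter {
    int total = 0;
    void add(int v) { total += v; }
};

// One "processor": a message queue plus a scheduler loop (hypothetical names).
class Scheduler {
public:
    // Enqueue a message, i.e. a deferred method invocation on some object.
    void send(std::function<void()> msg) { queue_.push_back(std::move(msg)); }

    // Deliver messages one at a time until the queue is empty.
    // A handler may enqueue further messages while it runs.
    int run() {
        int delivered = 0;
        while (!queue_.empty()) {
            std::function<void()> msg = std::move(queue_.front());
            queue_.pop_front();
            msg();
            ++delivered;
        }
        return delivered;
    }

private:
    std::deque<std::function<void()>> queue_;
};

// Two objects driven purely by queued messages; returns b.total at the end.
int demoScheduler() {
    Scheduler pe;
    Counter a, b;
    // First message: update a, then send a follow-up message to b.
    pe.send([&] { a.add(5); pe.send([&] { b.add(a.total); }); });
    // Second message: runs before the follow-up, because it was queued earlier.
    pe.send([&] { a.add(7); });
    int delivered = pe.run();
    assert(delivered == 3);
    return b.total;
}
```

Calling demoScheduler() delivers three messages. The follow-up to b runs last, so it sees both updates to a. No object ever calls another directly; everything is driven by the queue.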
Need for Proxies
• Consider:
– Object x of class A wants to invoke method f of obj y of class B.
– x and y are on different processors
– what should the syntax be?
• y->f( …)? : doesn’t work because y is not a local pointer
• Needed:
– Instead of “y” we must use an ID that is valid across processors
– Method invocation should use this ID
– Some part of the system must pack the parameters and send them
– Some part of the system on the remote processor must invoke the right method on the right object with the parameters supplied
Charm++ solution: proxy classes
• Classes with remotely invokeable methods
– inherit from “chare” class (system defined)
– entry methods can only have one parameter: a subclass of message
• For each chare class D which has methods that we want to remotely invoke
– The system will automatically generate a proxy class CProxy_D
– Proxy objects know where the real object is
– Methods invoked on this class simply put the data in an “envelope” and send it out to the destination
• Each chare object has a proxy
– CProxy_D thisProxy; // thisProxy inherited from “CBase_D”
– Also you can get a proxy for a chare when you create it:
• CProxy_D myNewChare = CProxy_D::ckNew(arg);
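The proxy idea above can be sketched in ordinary C++. This is an illustration under stated assumptions, not the code Charm++ generates: Runtime, Envelope and Proxy_D are hypothetical names, the "remote processor" is a local map, and delivery happens immediately rather than asynchronously. The point is only that the proxy holds a globally valid ID instead of a pointer, packs the parameters, and leaves the method call to the receiving side.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <vector>

// The "real" object; f stands in for a remotely invokable method.
struct D {
    int sum = 0;
    void f(int a, int b) { sum += a + b; }
};

// Hypothetical envelope: destination ID plus packed parameters.
struct Envelope {
    int objectId;
    std::vector<char> payload;
};

// Hypothetical stand-in for the remote side: owns objects, finds them by ID.
class Runtime {
public:
    int create() {
        int id = nextId_++;
        objects_[id] = D();
        return id;
    }
    void deliver(const Envelope& e) {
        int args[2];
        assert(e.payload.size() == sizeof(args));
        std::memcpy(args, e.payload.data(), sizeof(args));  // unpack parameters
        objects_.at(e.objectId).f(args[0], args[1]);        // invoke on the right object
    }
    const D& object(int id) const { return objects_.at(id); }

private:
    int nextId_ = 0;
    std::map<int, D> objects_;
};

// Hypothetical proxy: holds only the ID, never a pointer to D.
class Proxy_D {
public:
    Proxy_D(Runtime& rt, int id) : rt_(&rt), id_(id) {}
    void f(int a, int b) const {
        int args[2] = {a, b};
        Envelope e{id_, std::vector<char>(sizeof(args))};
        std::memcpy(e.payload.data(), args, sizeof(args));  // pack parameters
        rt_->deliver(e);  // "send"; synchronous here, asynchronous in a real system
    }
    int id() const { return id_; }

private:
    Runtime* rt_;
    int id_;
};
```

A caller creates an object with rt.create(), wraps the returned ID in a Proxy_D, and calls x.f(5, 7). Copies of the proxy refer to the same object, which is what makes it safe to pass proxies around.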
Chare creation and method invocation
CProxy_D x = CProxy_D::ckNew(25);
x.f(5,7);
Sequential equivalent:
y = new D(25);
y->f(5,7);
Chares (Data driven Objects)
• Regular C++ classes,
– with some methods designated as remotely invokable (called entry methods)
• Creation of an instance of chare class C:
– CProxy_C myChareProxy = CProxy_C::ckNew(args);
– To create an instance of C on a specified processor “pe”:
• CProxy_C::ckNew(args, pe);
– CProxy_C: a proxy class generated by Charm for chare class C declared by the user
Remote method invocation
• Proxy Classes:
– For each chare class C, the system generates a proxy class.
• (C : CProxy_C)
– Global: in the sense of being valid on all processors
– thisProxy (analogous to this) gets you your own proxy
– You can send proxies in messages
– Given a proxy p, you can invoke methods:
• p.method(msg);
CProxy_main mainProxy;
main::main(CkArgMsg * m) // execution begins here; m carries argc/argv
{
int i = 0;
for (i=0; i<100; i++)
CProxy_piPart::ckNew(); // create 100 piPart chares
responders = 100;
count = 0;
mainProxy = thisProxy; // readonly initialization
}
void main::results(int pcount)
{
count += pcount;
if (0 == --responders) {
cout << "pi=: " << 4.0*count/100000 << endl;
CkExit(); // exit the program
}
}
piPart::piPart()
{
int i, localCount = 0; // declarations..
double x, y;
srand48((long) this);
mySamples = 100000/100;
for (i= 0; i< mySamples; i++) { // exactly mySamples samples per chare
x = drand48();
y = drand48();
if ((x*x + y*y) <= 1.0) localCount++;
}
mainProxy.results(localCount);
delete this;
}
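The arithmetic behind the pi program can be checked with a sequential sketch. This is an assumption-laden illustration rather than the slide's code: countHits and estimatePi are hypothetical names, std::mt19937 replaces drand48, and the per-part seeds are arbitrary. Each call to countHits models the work of one piPart, and the final division uses the total number of samples, which is why every part must draw exactly the same number.

```cpp
#include <cassert>
#include <random>

// Count random points that fall inside the unit quarter circle.
// One call models the work done by one piPart (hypothetical helper).
int countHits(int samples, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    int hits = 0;
    for (int i = 0; i < samples; ++i) {
        double x = dist(gen);
        double y = dist(gen);
        if (x * x + y * y <= 1.0) ++hits;
    }
    return hits;
}

// Sum the per-part counts and scale by 4 / (total samples),
// mirroring what main::results does when the last part reports.
double estimatePi(int parts, int samplesPerPart) {
    long count = 0;
    for (int p = 0; p < parts; ++p) {
        count += countHits(samplesPerPart, 1000u + static_cast<unsigned>(p));
    }
    return 4.0 * count / (static_cast<double>(parts) * samplesPerPart);
}
```

With 100 parts of 1000 samples each, estimatePi(100, 1000) uses the same 100000 total samples as the slide's program. The estimate's statistical error at that sample count is on the order of 0.005.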
Generation of proxy classes
• How does Charm generate the proxy classes?
– Needs help from the programmer
– Name classes and methods that can be remotely invoked
– Declare this in a special “charm interface” file (pgm.ci)
– Include the generated code in your program
pgm.ci
mainmodule PiMod {
readonly CProxy_main mainProxy;
mainchare main {
entry main(CkArgMsg *m);
entry void results(int pc);
};
chare piPart {
entry piPart(void);
};
};
Generates: PiMod.decl.h, PiMod.def.h
pgm.h
#include "PiMod.decl.h"
..
pgm.C
…
#include "PiMod.def.h"
Charm++
• Data Driven Objects
• Message classes
• Asynchronous method invocation
• Prioritized scheduling
• Object Arrays
• Object Groups:
– global object with a “representative” on each PE
• Information sharing abstractions
– readonly data
– accumulators
– distributed tables
Object Arrays
• A collection of chares,
– with a single global name for the collection, and
– each member addressed by an index
– Mapping of element objects to processors handled by the system
[Figure: User’s view: A[0] A[1] A[2] A[3] A[..]; System view: A[0], A[3]]
Introduction
• Elements are parallel objects like chares
• Elements are indexed by a user-defined data type: [sparse] 1D, 2D, 3D, tree, ...
• Send messages to index, receive messages at element
• Reductions and broadcasts across the array
• Dynamic insertion, deletion, migration, and everything still has to work!
• Interfaces with automatic load balancer.
1D Declare & Use
In the interface (.ci) file:
module m {
array [1D] Hello {
entry Hello(void);
entry void SayHi(int HiData);
};
};
In the .C file:
//Create an array of Hello’s with 4 elements:
int nElements=4;
CProxy_Hello p = CProxy_Hello::ckNew(nElements);
//Have element 2 say “hi”
1D Definition
class Hello:public CBase_Hello{
public:
Hello(void) {
// thisProxy and thisIndex are inherited from ArrayElement1D
… thisProxy …
… thisIndex …
}
void SayHi(int m) {
if (m <1000)
thisProxy[thisIndex+1].SayHi(m+1);
}
3D Declare & Use
module m {
array [3D] Hello {
entry Hello(void);
entry void SayHi(int HiData);
};
};
CProxy_Hello p=CProxy_Hello::ckNew();
for (int i=0;i<800000;i++)
p(x(i),y(i),z(i)).insert();
p.doneInserting();
p(12,23,7).SayHi( 34);
3D Definition
class Hello:public CBase_Hello{
public:
Hello(void) {
... thisProxy ...
... thisIndex.x, thisIndex.y, thisIndex.z ...
}
void SayHi(int HiData) { ... }
Hello(CkMigrateMessage *m) {}
Pup Routine
void pup(PUP::er &p) {
// Call our superclass’s pup routine:
ArrayElement3D::pup(p);
p|myVar1;p|myVar2; ...
}
Generalized “arrays”: Declare & Use
module m{
array [Foo] Hello {
entry Hello(void);
entry void SayHi(int data);
};
};
CProxy_Hello p=CProxy_Hello::ckNew();
for (...)
p[CkArrayIndexFoo(..)].inse
General Definition
class CkArrayIndexFoo:
public CkArrayIndex
{ Bar b; //char b[8]; float b[2];..
public:
CkArrayIndexFoo(...) {...
nInts=sizeof(b)/sizeof(int);
}
};
class Hello:public CBase_Hello
{ public:
Hello(void) {
... thisIndex ...
Collective ops
Broadcast message SayHi:
p.SayHi(data);
Reduce x across all elements:
contribute(sizeof(x),&x,CkReduction::sum_int,cb);
Where do reduction results go?
To a “callback” function, named cb above:
// Call some function foo with fooData when done:
CkCallback cb(foo,fooData);
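The contribute-then-callback pattern can be sketched without the runtime. This is a minimal illustration under assumptions: SumReduction and reduceSquares are hypothetical names, the elements contribute from a simple loop instead of from separate processors, and a std::function stands in for CkCallback. The callback fires exactly once, when the last expected contribution arrives.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sum reduction: collects one value per element,
// then hands the total to a callback.
class SumReduction {
public:
    SumReduction(int expected, std::function<void(int)> cb)
        : remaining_(expected), callback_(std::move(cb)) {}

    // Called once by each element with its local value.
    void contribute(int x) {
        assert(remaining_ > 0);
        sum_ += x;
        if (--remaining_ == 0) callback_(sum_);  // last contribution triggers the callback
    }

private:
    int remaining_;
    int sum_ = 0;
    std::function<void(int)> callback_;
};

// Element i contributes i*i; the callback stores the reduced result.
int reduceSquares(int n) {
    int result = -1;  // stays -1 if the callback never fires
    SumReduction red(n, [&result](int total) { result = total; });
    for (int i = 0; i < n; ++i) red.contribute(i * i);
    return result;
}
```

For four elements, reduceSquares(4) sums 0 + 1 + 4 + 9. The contributing code never needs to know who consumes the result, which is the design point of routing reductions through a callback.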
Migration support
Delete element i:
p[i].destroy();
Migrate to processor destPe:
migrateMe(destPe);
Enable load balancer:
by creating a load balancing object
Provide pack/unpack functions:
Each object that needs this provides a “pup” method. (pup is a single abstraction that allows data traversal for determining size, packing and unpacking)
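The "single abstraction" behind pup can be sketched as follows. This is an illustration under assumptions, far simpler than the real PUP::er interface: Pupper, Element and migrateCopy are hypothetical names, and only trivially copyable fields are supported. One pup method, written once, is traversed three times: to measure the size, to pack into a buffer, and to unpack into a fresh object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical mini-PUP: the same traversal sizes, packs, or unpacks.
class Pupper {
public:
    enum Mode { Sizing, Packing, Unpacking };
    explicit Pupper(Mode m, std::vector<char>* buf = nullptr) : mode_(m), buf_(buf) {}

    void bytes(void* data, std::size_t n) {
        if (mode_ == Packing) {
            assert(buf_ != nullptr);
            const char* src = static_cast<const char*>(data);
            buf_->insert(buf_->end(), src, src + n);    // append to buffer
        } else if (mode_ == Unpacking) {
            assert(buf_ != nullptr && pos_ + n <= buf_->size());
            std::memcpy(data, buf_->data() + pos_, n);  // read from buffer
        }
        pos_ += n;  // in Sizing mode this is the only effect
    }
    std::size_t size() const { return pos_; }

private:
    Mode mode_;
    std::vector<char>* buf_;
    std::size_t pos_ = 0;
};

// p|x syntax, as on the slides; restricted here to trivially copyable types.
template <typename T>
Pupper& operator|(Pupper& p, T& v) {
    static_assert(std::is_trivially_copyable<T>::value, "sketch handles plain data only");
    p.bytes(&v, sizeof(T));
    return p;
}

struct Element {
    int myVar1 = 0;
    double myVar2 = 0.0;
    void pup(Pupper& p) { p | myVar1; p | myVar2; }  // written once, used three ways
};

// Size, pack, and unpack an element, as a migration would.
Element migrateCopy(Element& src) {
    Pupper sizer(Pupper::Sizing);
    src.pup(sizer);
    std::vector<char> buf;
    buf.reserve(sizer.size());
    Pupper packer(Pupper::Packing, &buf);
    src.pup(packer);
    assert(buf.size() == sizer.size());
    Element dst;
    Pupper unpacker(Pupper::Unpacking, &buf);
    dst.pup(unpacker);
    return dst;
}
```

Because the field list lives in one place, adding a member to Element means adding one line to pup. The size, pack, and unpack paths cannot drift out of sync.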
Object Groups
• A group of objects (chares)
– with exactly one representative on each processor
– A single proxy for the group as a whole
– invoke methods in a branch (asynchronously), all branches (broadcast), or in the local branch
– creation:
• agroup = CProxy_C::ckNew(msg)
– remote invocation:
• p.methodName(msg); // p.methodName(msg, peNum);
• p.ckLocalBranch()->f(….);
Information sharing abstractions
• Observation:
– Information is shared in several specific modes in parallel programs
• Other models support only a limited set of modes:
– Shared memory: everything is shared: sledgehammer approach
– Message passing: messages are the only method
• Charm++: identifies and supports several modes
– Readonly / writeonce
– Tables (hash tables)
– Accumulators
– Monotonic variables
Compiling Charm++ programs
• Need to define an interface specification file
– mod.ci for each module mod
– Contains declarations that the system uses to produce proxy classes
– These generated classes must be included in your mod.C file
– See examples provided on the class web site.
• More information:
– Manuals, example programs, papers
• http://charm.cs.uiuc.edu/
• These slides are currently at:
– http://charm.cs.uiuc.edu/presentations/charmTutorial/
Fortran 90 version
• Quick implementation on top of Charm++
• How to use:
– follow example program, with the same basic concepts
– Only use object arrays, for now
• Most useful construct
• Object groups can be implemented in C++, if needed
Further Reading
• More information:
– Manuals, example programs, papers
• http://charm.cs.uiuc.edu
• These slides are currently at:
– http://charm.cs.uiuc.edu/kale/cse320