Would you like some syntactic
sugar with your TBB?
Evan Driscoll
What is TBB?
• C++ library for writing concurrent programs
• Classes for:
– Loop-based concurrency
– Task-based concurrency
– Pipelines
– Concurrent data structures
– Atomic types
– …
TBB good?
• I’m excited
• Integrates “well” with C++
• Large library, many paradigms
– Broader than MapReduce, OpenMP
TBB syntax
• Low-level syntax
– templates, operator overloading, placement new
• High-level syntax
– Tons of boilerplate code
• main.C in parallel_while othello went from 206 to 424 lines
(doing 3 loops), >58 boilerplate
– Separation of code
• Loop bodies lexically moved
TBB syntax
while( moves.size() ) {
    b = board;
    b.applyMove( moves.front() );
    moves.pop();
    /* Recurse */
    quality = Lookahead( b, other_color, newdepth );
    if( quality > nBest ) {
        nBest = quality;
    }
}
parallel_while<LookaheadMoveEvaluator> w;
QueueGenerator g(moves);
LookaheadMoveEvaluator e(nBest, board,
                         other_color, newdepth);
w.run(g, e);
class QueueGenerator
{
    queue<OthelloMove> & _moves;
public:
    QueueGenerator
        (queue<OthelloMove> & moves)
        : _moves(moves) { }
    bool pop_if_present
        (OthelloMove & out_move)
    {
        if(_moves.empty()) return false;
        else {
            out_move = _moves.front();
            _moves.pop();
            return true;
        }
    }
};
class LookaheadMoveEvaluator
{
    atomic<int> & _nBest;
    OthelloBoard const & _board;
    char _otherColor;
    int _newDepth;
public:
    LookaheadMoveEvaluator
        (atomic<int> & nBest,
         OthelloBoard const & board,
         char otherColor,
         int newDepth)
        : _nBest(nBest)
        , _board(board)
        , _otherColor(otherColor)
        , _newDepth(newDepth)
    {}
    void operator() (OthelloMove & move) const
    {
        OthelloBoard b = _board;
        b.applyMove(move);
        int quality =
            Lookahead( b, _otherColor, _newDepth );
        int curNBest = _nBest;
        while(quality > curNBest &&
              _nBest.compare_and_swap
                  (quality, curNBest) != curNBest)
        {
            curNBest = _nBest;
        }
    }
    typedef OthelloMove argument_type;
};
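The retry loop in operator() is an atomic "raise to maximum" update. A rough standard-library analogue is sketched below. It rests on assumptions that are not in the slides: std::atomic stands in for tbb::atomic, compare_exchange_weak stands in for compare_and_swap, and the names atomic_max and atomic_max_demo are made up for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not part of TBB): raise `best` to `candidate` if the
// candidate is larger. Returns true if this call performed the update.
bool atomic_max(std::atomic<int>& best, int candidate) {
    int current = best.load();
    // Retry only while the candidate still beats the latest value seen.
    // When compare_exchange_weak fails, it reloads `current` with the value
    // actually stored, so the comparison is re-evaluated on every pass.
    while (candidate > current) {
        if (best.compare_exchange_weak(current, candidate)) {
            return true;
        }
    }
    return false;
}

// Small usage demo: a better score replaces the old one, a worse one does not.
void atomic_max_demo() {
    std::atomic<int> best(3);
    bool raised = atomic_max(best, 7);
    bool ignored = !atomic_max(best, 5);
    assert(raised && ignored && best.load() == 7);
}
```

The loop gives up as soon as another thread has stored a value at least as good, which is the same exit condition as the `quality > curNBest` test in the hand-written class.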
But these transformations
are all mechanical!
So do this automagically
• Extend the syntax of C++
• Implement a source-to-source
transformation
othello.c -> tbbetter -> gcc -> othello.o
So do this automagically
othello.c -> tbbetter (parsing, type checking, translation, pretty printing) -> gcc -> othello.o
New syntax
• tbb_shared: new qualifier
– Concept: variable is shared between threads
– Implementation: a reference is used in the class
New syntax
• concurrent_for(var, start, end [, grainsize])
– var can name an existing variable or declare a new
one; must be compatible with blocked_range
– Iterates over [ start, end )
– If grainsize is omitted, uses auto_partitioner()
– Can’t specify different increments
• Current status: parses (and typechecks), but
transformation incomplete
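Since the concurrent_for transformation is incomplete, the following is not tbbetter output. It is a minimal sketch of what the half-open range and the grainsize mean, assuming the blocked_range behaviour of halving a range until each piece holds at most grainsize iterations. The names Chunk, split_range and concurrent_for_sketch are hypothetical, and the chunks are visited serially here, whereas a real scheduler would hand them to worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<std::size_t, std::size_t> Chunk;  // half-open [first, second)

// Recursively halve [begin, end) until each piece has at most `grainsize`
// iterations, appending the pieces to `out` in ascending order.
void split_range(std::size_t begin, std::size_t end, std::size_t grainsize,
                 std::vector<Chunk>& out) {
    assert(grainsize > 0 && begin <= end);
    if (end - begin <= grainsize) {
        if (begin != end) out.push_back(Chunk(begin, end));
        return;
    }
    std::size_t middle = begin + (end - begin) / 2;
    split_range(begin, middle, grainsize, out);
    split_range(middle, end, grainsize, out);
}

// Serial stand-in for the loop: every index in [start, end) is passed to
// `body` exactly once, one chunk at a time.
template <typename Body>
void concurrent_for_sketch(std::size_t start, std::size_t end,
                           std::size_t grainsize, Body body) {
    std::vector<Chunk> chunks;
    split_range(start, end, grainsize, chunks);
    for (std::size_t c = 0; c < chunks.size(); ++c) {
        for (std::size_t i = chunks[c].first; i != chunks[c].second; ++i) {
            body(i);
        }
    }
}
```

For example, split_range(0, 10, 3, chunks) yields the pieces [0,2), [2,5), [5,7) and [7,10): the end index 10 is never visited, and no piece exceeds the grainsize.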
New syntax
• cwhile_iterator type itervar;
  concurrent_while(cond) {
      cwhile_generator {
          genstmts
      }
      bodystmts
  }
– Generates two classes
• Generator (stream) class checks cond; if true, runs genstmts
• Body class runs bodystmts
– Communicate via a cwhile_iterator
• genstmts should write to itervar
• bodystmts should read from itervar
– Current status: implemented
• Only minor restrictions: no nested concurrent_while, and the types of
variables used are restricted
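The split into a generator class and a body class can be sketched with a toy loop that sums squares from a queue. Everything named here (SquareGenerator, SquareBody, run_serially) is hypothetical and is not tbbetter output. run_serially is a serial stand-in for the scheduler, shown only to make the contract between the two classes concrete, and std::atomic is assumed in place of tbb::atomic.

```cpp
#include <atomic>
#include <cassert>
#include <queue>

// Generator class: checks cond and, if it holds, runs genstmts,
// which write the next work item into itervar.
class SquareGenerator {
    std::queue<int>& pending_;   // shared state, so held by reference
public:
    explicit SquareGenerator(std::queue<int>& pending) : pending_(pending) {}
    bool pop_if_present(int& itervar) {
        if (pending_.empty()) return false;  // cond is false: no more work
        itervar = pending_.front();          // genstmts
        pending_.pop();
        return true;
    }
};

// Body class: runs bodystmts, which read itervar.
class SquareBody {
    std::atomic<int>& total_;    // shared state, so held by reference
public:
    typedef int argument_type;
    explicit SquareBody(std::atomic<int>& total) : total_(total) {}
    void operator()(int& itervar) const {
        total_ += itervar * itervar;         // bodystmts
    }
};

// Serial stand-in for the scheduler: pull items from the generator one at
// a time and apply the body to each.
template <typename Generator, typename Body>
void run_serially(Generator& generator, const Body& body) {
    typename Body::argument_type item = typename Body::argument_type();
    while (generator.pop_if_present(item)) {
        body(item);
    }
}
```

Note that operator() is const yet can still update the total, because the shared variable is reached through a reference member.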
C++ elision
• #define:
– concurrent_while -> while
– cwhile_iterator -> (nothing)
– tbb_shared -> (nothing)
– concurrent_for(var, start, end, …) -> for(var=start; var != end; ++var)
• But doesn’t work if var declares a variable
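A minimal sketch of the elision as preprocessor macros follows. Two assumptions go beyond the slide: cwhile_generator is also defined away (the slide does not list it), and concurrent_for is given in its three-argument form only. The functions sum_queue and sum_range are made-up examples.

```cpp
#include <cassert>
#include <queue>

// Elision macros: each extension keyword disappears or becomes plain C++.
#define concurrent_while while
#define cwhile_iterator
#define tbb_shared
// Assumption: eliding cwhile_generator to nothing leaves an ordinary
// nested block, so the generator statements simply run inline.
#define cwhile_generator
// `var` must name an existing variable, because the macro pastes it into
// the for-header three times.
#define concurrent_for(var, start, end) \
    for (var = (start); var != (end); ++var)

// With the macros above, the extended syntax compiles as a serial loop.
int sum_queue(std::queue<int> work) {
    tbb_shared int total = 0;
    cwhile_iterator int item = 0;
    concurrent_while (!work.empty()) {
        cwhile_generator {
            item = work.front();
            work.pop();
        }
        total += item;
    }
    return total;
}

int sum_range(int first, int last) {
    int total = 0;
    int i;
    concurrent_for(i, first, last) {
        total += i;
    }
    return total;
}
```

Writing concurrent_for(int i, first, last) would expand to a header containing `int i != (last)` and `++int i`, which is why the elision breaks when var declares a variable.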
Othello revisited
concurrent_while( moves.size() > 0 ) {
    cwhile_generator {
        move = moves.front();
        moves.pop();
    }
    OthelloBoard b = board;
    b.applyMove(move);
    int quality =
        Lookahead( b, other_color, newdepth );
    if( quality > nBest ) {
        tbb::spin_mutex::scoped_lock
            l ((tbb::spin_mutex&)lock);
        if(quality > nBest) {
            nBest = quality;
        }
    }
}
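The test on quality appears twice on purpose: the first check skips the lock when the score cannot improve nBest, and the second check, made while holding the lock, guards against another thread having raised nBest in between. A minimal sketch of that pattern follows, with assumptions stated plainly: std::mutex stands in for tbb::spin_mutex, the score is a std::atomic<int> so that the unlocked read is well defined in standard C++ (the slide reads a plain nBest), and the class name BestScore is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical wrapper around a shared best score.
class BestScore {
    std::atomic<int> best_;
    std::mutex lock_;
public:
    explicit BestScore(int initial) : best_(initial) {}
    int get() const { return best_.load(); }

    // Returns true if `quality` became the new best score.
    bool offer(int quality) {
        // Cheap first check: most candidates are rejected without locking.
        if (quality <= best_.load()) return false;
        std::lock_guard<std::mutex> guard(lock_);
        // Re-check under the lock: another thread may have stored a better
        // score between the first check and acquiring the lock.
        if (quality <= best_.load()) return false;
        best_.store(quality);
        return true;
    }
};
```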
What changed?
while( moves.size() ) {
    b = board;
    b.applyMove( moves.front() );
    moves.pop();
    quality = Lookahead( b, other_color, newdepth );
    if( quality > nBest ) {
        nBest = quality;
    }
}
What changed?
while( moves.size() ) {
    b = board;
    OthelloMove move = moves.front();
    b.applyMove(move);
    moves.pop();
    quality = Lookahead( b, other_color, newdepth );
    if( quality > nBest ) {
        nBest = quality;
    }
}
What changed?
while( moves.size() ) {
    OthelloMove move = moves.front();
    moves.pop();
    b = board;
    b.applyMove(move);
    quality = Lookahead( b, other_color, newdepth );
    if( quality > nBest ) {
        nBest = quality;
    }
}
What changed?
while( moves.size() ) {
    OthelloMove move = moves.front();
    moves.pop();
    OthelloBoard b = board;
    b.applyMove(move);
    int quality = Lookahead( b, other_color, newdepth );
    if(quality > nBest) {
        nBest = quality;
    }
}
What changed?
concurrent_while( moves.size() ) {
    cwhile_generator {
        OthelloMove move = moves.front();
        moves.pop();
    }
    OthelloBoard b = board;
    b.applyMove(move);
    int quality = Lookahead( b, other_color, newdepth );
    if( quality > nBest ) {
        tbb::spin_mutex::scoped_lock l((tbb::spin_mutex&)lock);
        if(quality > nBest) {
            nBest = quality;
        }
    }
}
Done!
Anything else?
• #includes for TBB headers
• Removed declaration for OthelloBoard b and int
quality
• Added cwhile_iterator to definition of move
• Added tbb_shared to definition of nBest
• Added a declaration for the lock
• Added another declaration of OthelloBoard b in
another branch
• Created a task_scheduler_init object in main
Anything else?
• Changed CC=g++ and LD=g++ to use the tbbetter driver
• Added -I and -L flags for the TBB directories
• Added -ltbb to the linker flags
Interesting point: constness
• Loop-carried dependencies are wrong in a concurrent loop
• This translates to not modifying thread-local
variables
– That is, any variable not marked tbb_shared!
• Which iteration did the value come from?
– Also applies to reading the value of a variable
changed in the loop
• TBB enforces: operator() is const
Performance (othello, lookahead 7)
[Figure: two bar charts, execution time (seconds, axis 0 to 250) and speedup (axis 0 to 6), each with bars for Serial, Elision, TBB-1, Hand-8, TBB-8, Hand-auto, and TBB-auto]
Wrapping up
• Got some done… more to do
• Goals if this were a longer-term project (or if I hadn’t
procrastinated like mad):
– Finish concurrent_for
– Find a translation for parallel_reduce
– Make Cilk code compile
• Goals for a serious project
– Engineering on the parser:
• Bug squashing (still 7 or so outstanding; 5 “patched” by sed)
• Integrate with other compilers
– Improve edge cases
• “If you can do 80% of the job with 50% of the work, that is the right
way”
Questions?
Alternate syntax
• Alternate names:
– cwhile_generator?!
– Really easy to change
• Updating the test cases would probably take longer
• tbb_shared implies volatile?
– Seems useful, but already broke
Alternate syntax
• concurrent_while(iterator, cond) { }
– Explicit as to what iterator is being used
• Currently: you must use exactly one variable declared with
cwhile_iterator inside the body of each concurrent_while
• concurrent_while(…) {
cwhile_generator { … }
cwhile_body { … }
}
What really happens
[Diagram: pipeline from othello.c to othello.o through these stages: cc1plus -E (twice), modified elsa, sed, cc1plus, as]