Concurrency and Parallelism in Haskell


Semi-Explicit Parallel
Programming in Haskell
Satnam Singh
Microsoft Research Cambridge
Leeds 2009
public class ArraySummer
{
    private readonly double[] a; // Encapsulated array
    // Constructor requiring an initial value for the array
    public ArraySummer(double[] values)
    {
        a = values;
    }
    // Method to compute the sum of the segment [fromIndex, toIndex) of the array.
    // The running total is a local variable, so concurrent calls on the
    // same object do not overwrite each other's partial sums.
    public void SumArray(int fromIndex, int toIndex,
                         out double arraySum)
    {
        double sum = 0;
        for (int i = fromIndex; i < toIndex; i++)
            sum = sum + a[i];
        arraySum = sum;
    }
}
[Diagram: thread 1 creates and starts thread 2 (ThreadCreate, thread.Start) and later waits for it (thread.Join)]
using System;
using System.Diagnostics;
class Program
{
    static void Main(string[] args)
    {
        const int testSize = 100000000;
        double[] testValues = new double[testSize];
        for (int i = 0; i < testSize; i++)
            testValues[i] = (double)i / testSize; // cast avoids integer division
        ArraySummer summer = new ArraySummer(testValues);
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        double testSum;
        summer.SumArray(0, testSize, out testSum);
        stopWatch.Stop();
        Console.WriteLine("Sum duration (milliseconds) = " +
                          stopWatch.ElapsedMilliseconds);
        Console.WriteLine("Sum value = " + testSum);
        Console.ReadKey();
    }
}
using System;
using System.Diagnostics;
using System.Threading;
class Program
{
    static void Main(string[] args)
    {
        const int testSize = 100000000;
        double[] testValues = new double[testSize];
        for (int i = 0; i < testSize; i++)
            testValues[i] = (double)i / testSize; // cast avoids integer division
        ArraySummer summer = new ArraySummer(testValues);
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        double testSumA = 0;
        double testSumB;
        // Sum the first half of the array on a new thread
        Thread sumThread = new Thread(delegate()
            { summer.SumArray(0, testSize / 2,
                              out testSumA); });
        sumThread.Start();
        // Sum the second half on the main thread. toIndex is exclusive,
        // so the second half starts at testSize / 2, not testSize / 2 + 1.
        summer.SumArray(testSize / 2, testSize, out testSumB);
        sumThread.Join();
        stopWatch.Stop();
        Console.WriteLine("Sum duration (milliseconds) = " +
                          stopWatch.ElapsedMilliseconds);
        Console.WriteLine("Sum value = " + (testSumA + testSumB));
        Console.ReadKey();
    }
}
The Accidental Semi-colon
A;
B;
[Diagram: A runs to completion, then B runs]
createThread (A) ;
B;
[Diagram: A and B run at the same time]
Execution Model
“Thunk” for “fib 10”:
• Pointer to the implementation
• Values for free variables
• Storage slot for the result
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
wombat and numbat
import Data.Char (ord)
wombat :: Int -> Int -- pure function
wombat n = 42*n
numbat :: Int -> IO Int -- side-effecting function
numbat n
  = do c <- getChar -- computation inside a ‘monad’
       return (n + ord c)
IO (), pronounced “IO unit”
import Data.Char (chr, ord)
numbat :: IO ()
numbat
  = do c <- getChar
       putChar (chr (1 + ord c))
f (g + h) z!!2
– infer type: [Int] -> Bool
– pure function, deterministic
mapM f [a, b, ..., g]
– infer type: IO String
– stateful operation, may be non-deterministic
Functional Programming to the Rescue?
• Why not evaluate every sub-expression of
our pure functional programs in parallel?
– execute each sub-expression in its own
thread?
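• The 80s dream does not work:
– granularity: most sub-expressions are too small to be worth a thread each
– data-dependency: many sub-expressions need the results of others before they can run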
Infix Operators
• mod a b
mod 7 3 = 1
• Infix with backquotes:
a `mod` b
7 `mod` 3 = 1
x `par` y
• x is sparked for speculative evaluation
• a spark can potentially be instantiated on a
thread running in parallel with the parent
thread
• x `par` y = y
• typically x is used inside y
• blurRows `par` (mix blurCols blurRows)
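x `par` (y + x)
[Timeline: a single thread]
• x is sparked
• y is evaluated first
• x is evaluated second, by the parent thread as part of y + x
• x fizzles: the spark is discarded because x has already been evaluated
x `par` (y + x)
[Timeline: two processors, P1 and P2]
• x is sparked on P1
• y is evaluated on P1
• x is taken up for evaluation on P2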
par is Not Enough
• pseq :: a -> b -> b
• pseq is strict in its first argument but not in
its second argument
• Related function:
– seq :: a -> b -> b
– Strict in both arguments
– Compiler may transform seq x y to seq y x
– No good for controlling order of evaluation for
parallel programs
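A minimal sketch of how `par` and `pseq` are typically combined: spark one operand, force the other on the current thread, then combine the two. The names `expensive` and `parSum` are hypothetical and exist only for illustration. The sketch imports `par` and `pseq` from `GHC.Conc` in `base` so that it needs no extra package; `Control.Parallel` in the `parallel` package exposes the same two functions.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical workload: a deliberately naive sum of squares.
expensive :: Int -> Integer
expensive n = sum [fromIntegral i * fromIntegral i | i <- [1 .. n]]

-- Spark x, evaluate y on the current thread, then add.
-- pseq guarantees that y is evaluated before (x + y) is demanded,
-- which gives the spark for x a chance to run on another core.
parSum :: Int -> Int -> Integer
parSum a b = x `par` (y `pseq` (x + y))
  where
    x = expensive a
    y = expensive b

main :: IO ()
main = print (parSum 1000 2000)
```

The result is the same as `expensive a + expensive b`; only the evaluation order differs. To see any speedup the program has to be compiled with `-threaded` and run with `+RTS -N2` or more.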
Don Stewart Parallel fib with threshold
import Control.Monad (forM_)
import Control.Parallel (par, pseq)
import Text.Printf (printf)
cutoff :: Int
cutoff = 35 -- Threshold for parallel evaluation
-- Sequential fib
fib' :: Int -> Integer
fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n-1) + fib' (n-2)
-- Parallel fib with thresholding
fib :: Int -> Integer
fib n | n < cutoff = fib' n
      | otherwise  = r `par` (l `pseq` l + r)
  where
    l = fib (n-1)
    r = fib (n-2)
-- Main program
main :: IO ()
main = forM_ [0..45] $ \i ->
         printf "n=%d => %d\n" i (fib i)
Parallel fib performance
[Chart: parallel fib from 1 to 8 cores (2X Intel quad core). x-axis: number of cores (1 to 8); y-axis: speedup over 1 core (0 to 7); series: parfib]
Parallel quicksort (wrong)
quicksortN :: (Ord a) => [a] -> [a]
quicksortN [] = []
quicksortN [x] = [x]
quicksortN (x:xs)
  = losort `par` hisort `par` losort ++ (x:hisort)
  where
    losort = quicksortN [y|y <- xs, y < x]
    hisort = quicksortN [y|y <- xs, y >= x]
What went wrong?
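[Diagram: the spark evaluates losort only as far as its first cons cell; the rest of the list remains an unevaluated thunk]
• `par` evaluates its first argument only to weak head normal form, so almost all of the sorting is left for the parent thread.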
forceList
forceList :: [a] -> ()
forceList [] = ()
forceList (x:xs) = x `seq` forceList xs
Parallel quicksort (right)
quicksortF [] = []
quicksortF [x] = [x]
quicksortF (x:xs)
  = (forceList losort) `par`
    (forceList hisort) `par`
    losort ++ (x:hisort)
  where
    losort = quicksortF [y|y <- xs, y < x]
    hisort = quicksortF [y|y <- xs, y >= x]
-- seqSum and nrValues are defined elsewhere (not shown on the slide)
parSumArray :: Array Int Double -> Double
parSumArray matrix
  = lhs `par` (rhs `pseq` lhs + rhs)
  where
    lhs = seqSum 0 (nrValues `div` 2) matrix
    rhs = seqSum (nrValues `div` 2 + 1)
                 (nrValues-1) matrix
Strategies
• Haskell provides a collection of evaluation
strategies for controlling the evaluation order
of various data types.
• Users have to indicate how their own
types are evaluated to normal form.
• Algorithm + Strategy = Parallelism, P. W.
Trinder, K. Hammond, H.-W. Loidl and S. L.
Peyton Jones.
• http://www.macs.hw.ac.uk/~dsg/gph/papers/html/Strategies/strategies.html
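A minimal sketch of the idea behind strategies: the algorithm is written once as an ordinary `map`, and a strategy attached with `using` decides how the result is evaluated. This assumes the `parallel` package is installed and uses its current `Control.Parallel.Strategies` interface (`using`, `parList`, `rdeepseq`), which differs in detail from the interface available when the talk was given. The workload `collatzSteps` and the names `stepsSeq` and `stepsPar` are hypothetical.

```haskell
import Control.Parallel.Strategies (parList, rdeepseq, using)

-- Hypothetical per-element workload: number of Collatz steps needed to reach 1.
collatzSteps :: Int -> Int
collatzSteps n = length (takeWhile (/= 1) (iterate step n))
  where
    step k
      | even k    = k `div` 2
      | otherwise = 3 * k + 1

-- The algorithm on its own.
stepsSeq :: [Int] -> [Int]
stepsSeq = map collatzSteps

-- The same algorithm plus a strategy: spark one evaluation per list
-- element and evaluate each element fully (to normal form).
stepsPar :: [Int] -> [Int]
stepsPar xs = map collatzSteps xs `using` parList rdeepseq

main :: IO ()
main = print (sum (stepsPar [1 .. 10000]))
```

Both versions return the same list; the strategy changes only how and where the elements are evaluated.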
Explicitly Creating Threads
• forkIO :: IO () -> IO ThreadId
• Creates a lightweight Haskell thread, not
an operating system thread.
Inter-thread Communication
• putMVar :: MVar a -> a -> IO ()
• takeMVar :: MVar a -> IO a
MVars
[Diagram: mv is empty at first and holds 52 after the put]
Thread 1:
...
putMVar mv 52
...
Thread 2:
...
v <- takeMVar mv
...
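A minimal sketch of the hand-over pictured above, using only `forkIO`, `newEmptyMVar`, `putMVar` and `takeMVar` from `base`. The helper name `handOver` is hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Pass one value from a forked thread to the caller through an MVar.
handOver :: Int -> IO Int
handOver x = do
  mv <- newEmptyMVar            -- mv starts empty
  _ <- forkIO (putMVar mv x)    -- the forked thread fills mv
  takeMVar mv                   -- blocks until mv is full, then empties it

main :: IO ()
main = do
  v <- handOver 52
  print v                       -- prints 52
```

Because `takeMVar` blocks on an empty MVar, the caller cannot observe the value before the forked thread has put it there, whichever thread happens to run first.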
Rendezvous
import Control.Concurrent
threadA :: MVar Int -> MVar Float -> IO ()
threadA valueToSendMVar valueReceivedMVar
  = do -- some work
       -- now perform rendezvous by sending 72
       putMVar valueToSendMVar 72 -- send value
       v <- takeMVar valueReceivedMVar
       putStrLn (show v)
Rendezvous
threadB :: MVar Int -> MVar Float -> IO ()
threadB valueToReceiveMVar valueToSendMVar
  = do -- some work
       -- now perform rendezvous by waiting on value
       z <- takeMVar valueToReceiveMVar
       putMVar valueToSendMVar (1.2 * fromIntegral z)
       -- continue with other work
Rendezvous
main :: IO ()
main
  = do aMVar <- newEmptyMVar
       bMVar <- newEmptyMVar
       forkIO (threadA aMVar bMVar)
       forkIO (threadB aMVar bMVar)
       threadDelay 1000 -- BAD! main sleeps for 1000 microseconds instead of waiting for the threads
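One way to avoid the `threadDelay` is to let the main thread block on an MVar that is filled when the rendezvous is complete. The following is a sketch under that assumption, not the version from the talk: the extra `resultMVar` parameter and the `rendezvous` wrapper are hypothetical additions, and `threadA` hands its value back instead of printing it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Sends 72, waits for the reply, then publishes it on resultMVar.
threadA :: MVar Int -> MVar Float -> MVar Float -> IO ()
threadA valueToSendMVar valueReceivedMVar resultMVar = do
  putMVar valueToSendMVar 72
  v <- takeMVar valueReceivedMVar
  putMVar resultMVar v          -- doubles as the "finished" signal

-- Waits for a value and replies with 1.2 times that value.
threadB :: MVar Int -> MVar Float -> IO ()
threadB valueToReceiveMVar valueToSendMVar = do
  z <- takeMVar valueToReceiveMVar
  putMVar valueToSendMVar (1.2 * fromIntegral z)

rendezvous :: IO Float
rendezvous = do
  aMVar <- newEmptyMVar
  bMVar <- newEmptyMVar
  resultMVar <- newEmptyMVar
  _ <- forkIO (threadA aMVar bMVar resultMVar)
  _ <- forkIO (threadB aMVar bMVar)
  takeMVar resultMVar           -- blocks until threadA has finished

main :: IO ()
main = rendezvous >>= print
```

The main thread now waits exactly as long as the exchange takes, rather than for a fixed delay that may be too short or needlessly long.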
fib again
fib :: Int -> Int
-- As before
fibThread :: Int -> MVar Int -> IO ()
fibThread n resultMVar
  = putMVar resultMVar (fib n) -- stores an unevaluated thunk: fib n is not computed in this thread
sumEuler :: Int -> Int
-- As before
fib fixed
fibThread :: Int -> MVar Int -> IO ()
fibThread n resultMVar
  = do pseq f (return ()) -- force f in this thread before handing it over
       putMVar resultMVar f
  where
    f = fib n
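The same fix can be sketched with `evaluate` from `Control.Exception`, which forces its argument to weak head normal form inside the worker thread. This is a swapped-in alternative to the `pseq` version above, not the code from the talk; the helper `fibPair` is hypothetical and the result type is `Integer` rather than `Int`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

-- Compute the result in the worker thread, then publish it.
fibThread :: Int -> MVar Integer -> IO ()
fibThread n resultMVar = do
  f <- evaluate (fib n)   -- weak head normal form is a full result for Integer
  putMVar resultMVar f

-- Run two fib computations in separate threads and add the results.
fibPair :: Int -> Int -> IO Integer
fibPair a b = do
  ma <- newEmptyMVar
  mb <- newEmptyMVar
  _ <- forkIO (fibThread a ma)
  _ <- forkIO (fibThread b mb)
  (+) <$> takeMVar ma <*> takeMVar mb

main :: IO ()
main = fibPair 25 26 >>= print
```

For the two threads to run on separate cores, the program has to be compiled with `-threaded` and run with `+RTS -N2`.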
$ time fibForkIO +RTS -N1
real 0m40.473s
user 0m0.000s
sys 0m0.031s
$ time fibForkIO +RTS -N2
real 0m38.580s
user 0m0.000s
sys 0m0.015s
“STM”s in Haskell
data STM a
instance Monad STM
-- Monads support "do" notation and sequencing
-- Exceptions
throw :: Exception -> STM a
catch :: STM a -> (Exception->STM a) -> STM a
-- Running STM computations
atomically :: STM a -> IO a
retry :: STM a
orElse :: STM a -> STM a -> STM a
-- Transactional variables
data TVar a
newTVar :: a -> STM (TVar a)
readTVar :: TVar a -> STM a
writeTVar :: TVar a -> a -> STM ()
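A minimal sketch of these operations in use, assuming the `Control.Concurrent.STM` module from the `stm` package that ships with GHC. The `transfer` and `demo` names are hypothetical. Both updates in `transfer` happen inside one `atomically` block, so no other thread can observe the state in which the amount has left one variable but not yet arrived in the other.

```haskell
import Control.Concurrent.STM (STM, TVar, atomically, newTVar, readTVar, writeTVar)

-- Move an amount from one transactional variable to another.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  fromBalance <- readTVar from
  writeTVar from (fromBalance - amount)
  toBalance <- readTVar to
  writeTVar to (toBalance + amount)

demo :: IO (Int, Int)
demo = do
  a <- atomically (newTVar 100)
  b <- atomically (newTVar 0)
  atomically (transfer a b 30)                       -- one atomic step
  atomically ((,) <$> readTVar a <*> readTVar b)     -- consistent snapshot

main :: IO ()
main = demo >>= print   -- prints (70,30)
```

Because `transfer` has type `STM ()`, it can be combined with other STM actions into a larger transaction before `atomically` is applied.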
Transactional Memory
[Diagram: queues Q1 and Q2 feed into R]
void GetEither() {
    atomic {
        do { i = Q1.Get(); }
        orelse { i = Q2.Get(); }
        R.Put( i );
    }
}
• do {...this...} orelse {...that...} tries to run “this”
• If “this” retries, it runs “that” instead
• If both retry, the do-block retries. GetEither() will thereby
wait for there to be an item in either queue
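The behaviour described in these bullets can be sketched in Haskell with `retry` and `orElse`. This assumes the `Control.Concurrent.STM` module from the `stm` package; the one-slot `Slot` type, `getSlot` and `demo` are hypothetical stand-ins for the queues Q1, Q2 and R, and the put into `r` is simplified to overwrite whatever is there.

```haskell
import Control.Concurrent.STM
  (STM, TVar, atomically, newTVarIO, orElse, readTVar, readTVarIO, retry, writeTVar)

-- Hypothetical one-slot queue: Nothing means empty.
type Slot a = TVar (Maybe a)

-- Take the item from a slot, or retry (block) if the slot is empty.
getSlot :: Slot a -> STM a
getSlot slot = do
  contents <- readTVar slot
  case contents of
    Nothing -> retry
    Just x  -> do
      writeTVar slot Nothing
      return x

-- Try q1 first; if that retries, try q2; if both retry, the whole
-- transaction waits until one of the slots changes.
getEither :: Slot a -> Slot a -> Slot a -> STM ()
getEither q1 q2 r = do
  i <- getSlot q1 `orElse` getSlot q2
  writeTVar r (Just i)

demo :: IO (Maybe Int)
demo = do
  q1 <- newTVarIO Nothing       -- empty, so getSlot q1 retries
  q2 <- newTVarIO (Just 7)
  r  <- newTVarIO Nothing
  atomically (getEither q1 q2 r)
  readTVarIO r

main :: IO ()
main = demo >>= print           -- prints Just 7
```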
ThreadScope
• GHC run-time can generate eventlogs.
• Instrument:
– thread creation, start/stop, migration
– GCs
• ThreadScope graphical viewer
• Q: how to mine / understand the
information?
Lots Unsaid
• xperf / VTune correlation
• Verification
• Debugging
• Parallel garbage collection
Summary
• Three ways of writing parallel and concurrent
programs in Haskell:
– `par` and `pseq` (semi-explicit parallelism)
– MVars (explicit concurrency)
– STM (explicit concurrency with transactions)
• Implicit concurrency
• Pure functional programming has pros and cons for
parallel programming.
• Can mainstream languages take advantage of the
same techniques?
• How can visualization help with performance tuning?