POSH Python Object Sharing Steffen Viken Valvåg In collaboration with Kjetil Jacobsen & Åge Kvalnes University of Tromsø, Norway Sponsored by Fast Search & Transfer.

Download Report

Transcript POSH Python Object Sharing Steffen Viken Valvåg In collaboration with Kjetil Jacobsen & Åge Kvalnes University of Tromsø, Norway Sponsored by Fast Search & Transfer.

POSH
Python Object Sharing
Steffen Viken Valvåg
In collaboration with
Kjetil Jacobsen &
Åge Kvalnes
University of Tromsø,
Norway
Sponsored by
Fast Search & Transfer
Python Execution Model
Test.py
for x in "TEST PROGRAM":
if x not in "FORGET IT":
print x,
Byte-code compilation
Output
SPAM
Interpretation
Test.pyc
0
3
6
7
10
13
16
19
22
25
26
29
30
33
34
37
SETUP_LOOP
LOAD_CONST ('TEST PROGRAM')
GET_ITER
FOR_ITER
STORE_FAST (x)
LOAD_FAST (x)
LOAD_CONST ('FORGET IT')
COMPARE_OP (not in)
JUMP_IF_FALSE (to 33)
POP_TOP
LOAD_FAST (x)
PRINT_ITEM
JUMP_FORWARD (to 34)
POP_TOP
JUMP_ABSOLUTE
POP_BLOCK
Python Threading Model
Thread A
Thread B
GIL


Byte
codes
Each thread executes a separate sequence of byte codes
All threads must contend for one global interpreter lock
Example: Matrix Multiplication


Performs a matrix
multiplication A = B x C
The work is split between
several worker threads
The application runs on a
machine with 8 CPUs
500
400
Time

300
Ideal
Threads
200
100
0

Threads do not scale for
multiple CPUs due to lock
contention on the GIL
12345678
Number of workers
Workaround: Processes
Process A
Process B
IPC


Each process has its own interpreter lock
Requires inter-process communication, using e.g.
message passing by means of pipes
Matrix Multiplication using
Processes




A master process distributes the input matrices
to a set of worker processes
Each worker process computes some part of
the output matrix, and returns its result to the
master
The master process assembles the final result
matrix
More communication, and more complex
pattern than using threads
Ways Ahead




Communication through standard Python
container objects favors threads
Scalability on multiprocessor architectures
favors processes
The GIL is here to stay, so making threads
scale better is hard
However, there might be room for improvement
of inter-process communication mechanisms
Using Shared Memory for IPC
Process B
Process A
Shared Memory


Processes communicate by accessing a
shared memory region
Requires explicit synchronization and data
marshalling, imposes a flat data structure
Using POSH for IPC
Process B
Process A
Shared Memory
X.method1()
X
L
L.extend([X, Y])
Y



Allocates regular Python objects in shared memory
Shared objects are accessed transparently through
regular method calls
IPC is done by modifying shared, mutable objects
Complications



Processes must synchronize their access to
shared, mutable objects (just like threads)
Explicit synchronization of critical regions must
be possible, while implicit synchronization upon
accessing shared objects is desireable
Python’s regular garbage collection algorithm
is inadequate for shared objects, which may be
referenced by multiple processes
Proxy Objects
X.method1()
X.method2()
X
return value
return value
Proxy Object


Shared Object
Provides transparent access to a shared
object by forwarding all attribute accesses
and method calls
Provides a single entry point to a shared
object, where synchronization policies may
be enforced
Multi-Process Garbage Collection



Must account for references from all live
processes
Must stay up-to-date when processes fork, as
this may create new references to shared
objects
Should be able to handle abnormal process
termination without leaking shared objects
Garbage Collection in POSH
Process A
Shared Memory
X
M
Y
L
Shared Object
Process B
Proxy Object
Regular Python reference
Reference from a process to a shared object (type I)
Reference from one shared object to another (type II)
Garbage Collection Details





POSH creates at most one proxy object per process
for any given shared object
Shared objects are always referenced through their
proxy objects
A bitmap in each shared object records the processes
that have a corresponding proxy object. This tracks
references of type I (from a process)
A separate count in each shared object records the
number of references to the object from other shared
objects. This tracks references of type II
Shared objects are deleted when there are no
references to them of either type
Performance



Performing a matrix
multiplication A = B x C
using POSH
The work is split between
several worker processes
The application runs on a
machine with 8 CPUs
More overhead, but scales
for multiple CPUs
500
400
Time

300
Ideal
Threads
POSH
200
100
0
12345678
Number of workers
Summary





Python uses a global interpreter lock (GIL) to serialize
execution of byte codes
This entails a lack of scalability on multiprocessor
architectures for CPU-intensive multi-threaded apps
However, threads offer an attractive programming
model, with implicit communication
Processes + shared memory reduce IPC overheads,
but normally impose flat data structures and require
data marshalling
POSH uses processes + shared memory to offer a
programming model similar to threads, with the
scalability of processes
Availability




Open source, hosted at SourceForge
http://poshmodule.sf.net/
Still not very stable
Developers wanted
Example Usage
import posh
class Stuff(object):
pass
posh.allow_sharing(Stuff, posh.generic_init)
mystuff = posh.share(Stuff())
def worker1():
mystuff.money = 0
def worker2():
mystuff.debt = 100000
def worker3():
mystuff.balance = mystuff.money - mystuff.debt
for w in worker1, worker2, worker3:
posh.forkcall(w)
posh.waitall()