Transcript: Why Stackless is Cool
Stackless Python: programming the way Guido prevented it
IPC9 Developer's Day
Why Stackless is Cool
• Microthreads
• Generators (now obsolete)
• Coroutines
Microthreads
• Very lightweight (can support thousands)
• Locks need not be OS resources
• Not for blocking I/O
• A comfortable model for people used to real threads
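A rough feel for why microthreads are cheap, in plain modern Python. This is an illustrative sketch only, not the Stackless API; the scheduler and all names here are invented. Each "microthread" is an ordinary generator, so thousands can exist without OS threads, and switching needs no OS locks:

```python
# A toy round-robin scheduler: each "microthread" is a plain generator,
# so thousands can exist without OS threads or OS-level locks.
# (Illustration only -- not the Stackless API; names are made up.)
from collections import deque

def scheduler(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run until the task yields voluntarily
            ready.append(task)  # still alive: reschedule at the back
        except StopIteration:
            pass                # task finished

def counter(results, n):
    for _ in range(n):
        yield                   # voluntary switch point
    results.append(n)

results = []
# a thousand microthreads are cheap: each is just a generator frame
scheduler([counter(results, i) for i in range(1000)])
assert len(results) == 1000
```

Note the cooperative model: a task runs until it yields, so switching points are explicit, which is exactly why no OS locking is needed between them.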
Coroutines
Various ways to look at them
• Peer-to-peer subroutines
• Threads with voluntary swapping
• Generators on steroids (args in, args out)
What's so cool about them
• Both sides get to "drive"
• Often can replace a state machine with something more intuitive [1]
[1] Especially where the state machine features complex state but relatively simple events (or few events per state).
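The "args in, args out" idea can be sketched in today's plain Python with generator `.send()` (added later, in Python 2.5 via PEP 342, i.e. after this talk). Both sides "drive": the caller pushes values in, the coroutine pushes results out, and the state lives in local variables instead of a state machine:

```python
# "Args in, args out" with a plain generator coroutine: the running
# average's state (total, count) lives in locals across switches,
# replacing an explicit state machine.
def averager():
    total, count = 0.0, 0
    while True:
        value = yield (total / count if count else None)  # args out
        total += value                                    # args in
        count += 1

avg = averager()
next(avg)                    # prime the coroutine to the first yield
assert avg.send(10) == 10.0  # caller drives: push 10, get average back
assert avg.send(30) == 20.0
```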
Three Steps To Stacklessness
• Get Python data off the C stack
• Give each frame its own (Python) stack space
• Get rid of interpreter recursions
Result
• All frames are created equal
• Stack overflows become memory errors
• Pickling program state becomes conceivable (new: it *has* been done)
Getting rid of recursion is difficult
• Often there is "post" processing involved
• The C code (doing the recursing) may need its own "frame"
• Possible approaches
  – Tail-optimized recursion
  – Transformation to a loop
Either way, the "post" code needs to be separated from the "setup" code.
Ironic Note: This is exactly the kind of pain we seek to relieve the Python programmer of!
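The setup/post split can be sketched in plain Python (a hypothetical example for illustration, not Stackless code). In the recursive form, the "post" step runs after the call returns and therefore lives on the call stack; tail-optimizing moves it into the "setup" of the next step, after which the recursion collapses into a loop:

```python
def fact_rec(n):
    # "Post" code: the multiply happens AFTER the recursive call
    # returns, so it needs a stack frame to come back to.
    if n <= 1:
        return 1
    return n * fact_rec(n - 1)

def fact_iter(n):
    # Tail-optimized form: the "post" multiply is folded into the
    # "setup" of the next step (an accumulator), so no frame is
    # needed and the recursion becomes a loop.
    acc = 1
    while n > 1:
        acc *= n
        n -= 1
    return acc

assert fact_rec(10) == fact_iter(10) == 3628800
```

Doing this separation mechanically inside the C interpreter is the hard part the slide alludes to.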
Stackless Reincarnate
• A completely different approach
• Nearly no changes to the Python core
• Platform dependent
  – A few lines of assembly
• No longer fighting the Python implementation
• Orthogonal concepts
Platform Specific Code
__forceinline static int
slp_switch(void)
{
    int *stackref, stsizediff;
    __asm mov stackref, esp;
    SLP_SAVE_STATE(stackref, stsizediff);
    __asm {
        mov eax, stsizediff
        add esp, eax
        add ebp, eax
    }
    SLP_RESTORE_STATE();
}

Note: There are no arguments, in order to simplify the code.
Support Macros 1(2)
#define SLP_SAVE_STATE(stackref, stsizediff) \
{ \
    PyThreadState *tstate = PyThreadState_GET(); \
    PyCStackObject **cstprev = tstate->slp_state.tmp.cstprev; \
    PyCStackObject *cst = tstate->slp_state.tmp.cst; \
    int stsizeb; \
    if (cstprev != NULL) { \
        if (slp_cstack_new(cstprev, stackref) == NULL) return -1; \
        stsizeb = (*cstprev)->ob_size * sizeof(int*); \
        memcpy((*cstprev)->stack, (*cstprev)->startaddr - (*cstprev)->ob_size, stsizeb); \
        (*cstprev)->frame = tstate->slp_state.tmp.fprev; \
    } \
    else \
        stsizeb = (cst->startaddr - stackref) * sizeof(int*); \
    if (cst == NULL) return 0; \
    stsizediff = stsizeb - (cst->ob_size * sizeof(int*)); \

Note: Arguments are passed via the thread state for easy implementation.
Support Macros 2(2)
#define SLP_RESTORE_STATE() \
    tstate = PyThreadState_GET(); \
    cst = tstate->slp_state.tmp.cst; \
    if (cst != NULL) \
        memcpy(cst->startaddr - cst->ob_size, &cst->stack, (cst->ob_size) * sizeof(int*)); \
    return 0; \
}
Stacklessness via Stack Slicing
• Pieces of the C stack are captured
• Recursion is limited by heap memory only
• Stack pieces are attached to frame objects
• "One-shot continuations"
Tasklets
• Tasklets are the building blocks
• Tasklets can be switched
• They behave like tiny threads
• They communicate via channels
Tasklet Creation
# a function that takes a channel as argument
def simplefunc(chan):
    chan.receive()
# a factory for some tasklets
def simpletest(func, n):
    c = stackless.channel()
    gen = stackless.taskoutlet(func)
    for i in range(n):
        gen(c).run()
    return c
Inside Tasklet Creation
• Create the frame "before the call"
  – Abuse of the generator flag
• Use the "initial stub" as a blueprint
  – slp_cstack_clone()
• Parameterize it with a frame object
• Wrap it into a tasklet object
• Ready to run
Channels
• Known from Occam, Limbo, and Alef
• Channel.send(x)
  – Activates a waiting tasklet with the data
  – Blocks if none is waiting
• y = Channel.receive()
  – Activates a waiting tasklet, returns its data
  – Blocks if none is sending
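The rendezvous semantics above can be sketched in plain Python (NOT the Stackless API; the scheduler, the yielded op tuples, and all names here are invented for illustration). Tasklets are simulated with generators that yield `("send", chan, value)` or `("recv", chan)` requests to a tiny scheduler; each side blocks until a partner arrives:

```python
# Channel rendezvous, simulated: send blocks until a receiver waits,
# receive blocks until a sender waits. (Illustrative sketch only.)
from collections import deque

class Channel:
    def __init__(self):
        self.senders = deque()    # tasklets blocked in send, with data
        self.receivers = deque()  # tasklets blocked in receive

def run(tasklets):
    ready = deque((t, None) for t in tasklets)
    while ready:
        task, value = ready.popleft()
        try:
            op = task.send(value)        # resume tasklet with a value
        except StopIteration:
            continue                     # tasklet finished
        if op[0] == "send":
            _, chan, data = op
            if chan.receivers:
                ready.append((chan.receivers.popleft(), data))  # wake receiver
                ready.append((task, None))                      # sender goes on
            else:
                chan.senders.append((task, data))  # block: nobody waiting
        elif op[0] == "recv":
            _, chan = op
            if chan.senders:
                sender, data = chan.senders.popleft()
                ready.append((task, data))    # receive the pending value
                ready.append((sender, None))  # unblock the sender
            else:
                chan.receivers.append(task)   # block until a send arrives

def producer(chan, items):
    for item in items:
        yield ("send", chan, item)

def consumer(chan, out, n):
    for _ in range(n):
        out.append((yield ("recv", chan)))

chan = Channel()
out = []
run([producer(chan, [1, 2, 3]), consumer(chan, out, 3)])
assert out == [1, 2, 3]
```

The key design point matches the slide: a channel holds no buffer, only blocked tasklets, so every transfer is a direct hand-off that also decides which tasklet runs next.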
Planned Extensions
• Async I/O in a platform-independent way
• Prioritized scheduling
• High-speed tasklets with extra stacks
  – Quick monitors which run between tasklets
• Stack compression
• Thread pickling
• More channel features
  – Multiple wait on channel arrays
Thread pickling
• Has been implemented by TwinSun
  – Unfortunately, for the old Stackless
• Analysis of the C stack is necessary
  – Per platform only
  – Lots of work?
  – Only a few contexts need stack analysis
• Show it!
Stackless Sponsors
• IronPort
  – Email server with dramatic throughput
  – Integrating their code with the new Stackless
  – Async I/O
• CCP Games
  – Massively Multiplayer Online Game EVE
  – Porting their client code to the new Stackless next week