Why Stackless is Cool



Stackless Python: programming the way Guido prevented it


Why Stackless is Cool

• Microthreads
• Generators (now obsolete)
• Coroutines

Microthreads

• Very lightweight (can support thousands; see the sketch below)
• Locks need not be OS resources
• Not for blocking I/O
• A comfortable model for people used to real threads
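The "thousands" claim can be sketched against the present-day Stackless API (stackless.tasklet, stackless.schedule, stackless.run); the tasklet count and the trivial per-tasklet work are arbitrary illustration choices, not from the talk.

import stackless

def worker(wid, results):
    stackless.schedule()        # yield cooperatively, like a tiny thread
    results.append(wid)

results = []
for wid in range(10000):        # far more than OS threads would allow cheaply
    stackless.tasklet(worker)(wid, results)
stackless.run()                 # run every microthread to completion
print(len(results))             # 10000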

Coroutines

Various ways to look at them
• Peer-to-peer subroutines
• Threads with voluntary swapping
• Generators on steroids (args in, args out)

What's so cool about them
• Both sides get to "drive"
• Often can replace a state machine with something more intuitive [1] (see the sketch below)

[1] Especially where the state machine features complex state but relatively simple events (or few events per state).
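As a rough illustration of the state-machine point (not from the talk): a plain Python generator stands in for a coroutine here, since generator.send() arrived in Python well after this presentation, and the BEGIN/END event stream is invented. The explicit state variable disappears because the consumer's position in its own code is the state.

def collector():
    # Groups items between BEGIN and END markers; the nesting of the loops
    # replaces an explicit IDLE/COLLECTING state variable.
    groups = []
    while True:
        event = (yield groups)
        if event == "BEGIN":
            current = []
            event = (yield groups)
            while event != "END":
                current.append(event)
                event = (yield groups)
            groups.append(current)

coro = collector()
coro.send(None)                                  # prime the generator
for ev in ["BEGIN", 1, 2, "END", "BEGIN", 3, "END"]:
    groups = coro.send(ev)                       # "args in, args out"
print(groups)                                    # [[1, 2], [3]]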

Three Steps To Stacklessness

• Get Python data off the C stack
• Give each frame its own (Python) stack space
• Get rid of interpreter recursions

Result

• All frames are created equal
• Stack overflows become memory errors
• Pickling program state becomes conceivable (new: *has* been done)

Getting rid of recursion is difficult

• Often there is "post" processing involved
• The C code (doing the recursing) may need its own "frame"
• Possible approaches
  – Tail-optimized recursion
  – Transformation to a loop
Either way, the "post" code needs to be separated from the "setup" code (see the sketch below).
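A toy illustration of that split, in Python for readability (invented for this transcript; inside the interpreter the same surgery has to be done in C): the recursive version interleaves setup and post processing on the call stack, while the loop version has to pull them apart.

class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def tree_sum_recursive(node):
    # "setup" (the recursive calls) and "post" (adding up the results)
    # are interleaved implicitly on the call stack.
    return node.value + sum(tree_sum_recursive(c) for c in node.children)

def tree_sum_iterative(root):
    # The same computation with the recursion turned into loops: the
    # "post" step (folding a finished subtree into its parent) had to be
    # separated from the "setup" step (descending into the children).
    totals = {id(root): root.value}
    parent = {}
    stack, order = [root], []
    while stack:                      # "setup": walk down, remember parents
        node = stack.pop()
        order.append(node)
        for child in node.children:
            parent[id(child)] = node
            totals[id(child)] = child.value
            stack.append(child)
    for node in reversed(order):      # "post": fold children into parents
        p = parent.get(id(node))
        if p is not None:
            totals[id(p)] += totals[id(node)]
    return totals[id(root)]

tree = Node(1, [Node(2), Node(3, [Node(4)])])
assert tree_sum_recursive(tree) == tree_sum_iterative(tree) == 10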

Ironic Note: This is exactly the kind of pain we seek to relieve the Python programmer of!

Stackless Reincarnate

• Completely different approach:
  – Nearly no changes to the Python core
  – Platform dependent
  – Few lines of assembly
• No longer fighting the Python implementation
• Orthogonal concepts

Platform Specific Code

__forceinline static int slp_switch(void)
{
    int *stackref, stsizediff;
    __asm mov stackref, esp;
    SLP_SAVE_STATE(stackref, stsizediff);
    /* shift esp and ebp by the size difference between the two stacks */
    __asm {
        mov eax, stsizediff
        add esp, eax
        add ebp, eax
    }
    SLP_RESTORE_STATE();
}

Note: There are no arguments, in order to simplify the code.

Support Macros (1/2)

#define SLP_SAVE_STATE(stackref, stsizediff) \
{ \
    PyThreadState *tstate = PyThreadState_GET(); \
    PyCStackObject **cstprev = tstate->slp_state.tmp.cstprev; \
    PyCStackObject *cst = tstate->slp_state.tmp.cst; \
    int stsizeb; \
    if (cstprev != NULL) { \
        if (slp_cstack_new(cstprev, stackref) == NULL) return -1; \
        stsizeb = (*cstprev)->ob_size * sizeof(int*); \
        memcpy((*cstprev)->stack, \
               (*cstprev)->startaddr - (*cstprev)->ob_size, stsizeb); \
        (*cstprev)->frame = tstate->slp_state.tmp.fprev; \
    } \
    else \
        stsizeb = (cst->startaddr - stackref) * sizeof(int*); \
    if (cst == NULL) return 0; \
    stsizediff = stsizeb - (cst->ob_size * sizeof(int*));

Note: Arguments are passed via Threadstate for easy implementation.

Support Macros (2/2)

#define SLP_RESTORE_STATE() \
    tstate = PyThreadState_GET(); \
    cst = tstate->slp_state.tmp.cst; \
    if (cst != NULL) \
        memcpy(cst->startaddr - cst->ob_size, &cst->stack, \
               (cst->ob_size) * sizeof(int*)); \
    return 0; \
}

Stacklessness via Stack Slicing

• Pieces of the C stack are captured
• Recursion limited by heap memory only
• Stack pieces attached to frame objects
• "One-shot continuation"

Tasklets

• Tasklets are the building blocks
• Tasklets can be switched
• They behave like tiny threads
• They communicate via channels

Tasklet Creation

# a function that takes a channel as argument

def simplefunc(chan):
    chan.receive()

# a factory for some tasklets

def simpletest(func, n):
    c = stackless.channel()
    gen = stackless.taskoutlet(func)
    for i in range(n):
        gen(c).run()
    return c
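Hypothetical usage of the factory above (not on the original slide): each created tasklet runs until it blocks in chan.receive(), so sending on the returned channel wakes them one by one.

c = simpletest(simplefunc, 10)   # ten tasklets, all blocked on c
for i in range(10):
    c.send(None)                 # each send resumes one waiting tasklet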

Inside Tasklet Creation

• Create frame "before call"
  – Abuse of generator flag
• Use "initial stub" as a blueprint
  – slp_cstack_clone()
• Parameterize with a frame object
• Wrap into a tasklet object
• Ready to run

Channels

• Known from OCCAM, Limbo, Alef
• Channel.send(x)
  – Activates a waiting tasklet with data
  – Blocks if none is waiting
• y = Channel.receive()
  – Activates a waiting tasklet, returns data
  – Blocks if none is sending
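A minimal sketch of that send/receive rendezvous (using the later stackless.tasklet spelling rather than the taskoutlet factory shown earlier; the producer/consumer pair is invented for illustration):

import stackless

def producer(chan):
    for i in range(3):
        chan.send(i)                     # blocks until a receiver is waiting

def consumer(chan):
    for _ in range(3):
        print("got", chan.receive())     # blocks until a sender is waiting

ch = stackless.channel()
stackless.tasklet(producer)(ch)
stackless.tasklet(consumer)(ch)
stackless.run()                          # got 0 / got 1 / got 2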

Planned Extensions

• Async I/O in a platform-independent way
• Prioritized scheduling
• High-speed tasklets with extra stacks
  – Quick monitors which run between tasklets
• Stack compression
• Thread pickling
• More channel features
  – Multiple wait on channel arrays

Thread pickling

• Has been implemented by TwinSun
  – Unfortunately for old Stackless
• Analysis of the C stack necessary
  – By platform, only
  – Lots of work?
  – Only a few contexts need stack analysis
• Show it!!!
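"Show it": a sketch with today's Stackless API (not the TwinSun implementation mentioned above; stackless.schedule_remove and tasklet pickling are assumed to be available) in which a paused tasklet, including its frame and locals, survives a pickle round trip.

import pickle
import stackless

def counter():
    n = 0
    while True:
        n += 1
        stackless.schedule_remove()   # pause and leave the scheduler queue

t = stackless.tasklet(counter)()
stackless.run()             # counter runs once, then removes itself
blob = pickle.dumps(t)      # capture the paused tasklet: frame, locals, position
t2 = pickle.loads(blob)     # an independent, resumable copy
t2.insert()                 # put the copy back into the scheduler
stackless.run()             # it continues inside counter() with n preserved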

Stackless Sponsors

• IronPort
  – Email server with dramatic throughput
  – Integrating their code with the new Stackless
  – Async I/O
• CCP Games
  – Massively multiplayer online game EVE
  – Porting their client code to the new Stackless next week