Python as “Glue”

Download Report

Transcript Python as “Glue”

enthought ®
Scientific Computing
with Python
[Advanced Topics]
Eric Jones
[email protected]
Travis Oliphant
[email protected]
Enthought
www.enthought.com
Brigham Young University
http://www.ee.byu.edu/
Topics
•
•
•
•
Python as Glue
Wrapping Fortran Code
Wrapping C/C++
Parallel Programming
enthought ®
enthought ®
Python as “Glue”
Why Python for glue?
enthought ®
• Python reads almost like “pseudo-code” so it’s easy to
pick up old code and understand what you did.
• Python has dynamic typing and dynamic binding --allows very flexible coding.
• Python is object oriented.
• Python has high-level data structures like lists,
dictionaries, strings, and arrays all with useful methods.
• Python has a large module library (“batteries included”)
and common extensions covering internet protocols and
data, image handling, and scientific analysis.
• Python development is 5-10 times faster than C/C++ and
3-5 times faster than Java
Electromagnetics Example
(1) Parallel simulation
(2) Create plot
(3) Build HTML page
enthought ®
(4) FTP page to Web Server
(5) E-mail users that results
are available.
enthought ®
How is Python glue?
Legacy
App.
Parallel
Process.
Internet
Python
Users
Equipment
New
Algorithm
Why is Python good glue?
enthought ®
• Python can be embedded into any C or C++ application
Provides your legacy application with a powerful scripting language
instantly.
• Python can interface seamlessly with Java
– Jython www.jython.org
– JPE
jpe.sourceforge.net
• Python can interface with critical C/C++ and Fortran
subroutines
– Rarely will you need to write a main-loop again.
– Python does not directly call the compiled routines, it uses
interfaces (written in C or C++) to do it --- the tools for
constructing these interface files are fantastic (sometimes
making the process invisible to you).
enthought ®
Tools
• C/C++ Integration
–
–
–
–
–
SWIG
SIP
Pyrex
boost
weave
www.swig.org
www.riverbankcomputing.co.uk/sip/index.php
nz.cosc.canterbury.ac.nz/~greg/python/Pyrex
www.boost.org/libs/python/doc/index.html
www.scipy.org/site_content/weave
• FORTRAN Integration
– f2py
– PyFort
cens.ioc.ee/projects/f2py2e/
pyfortran.sourceforge.net
f2py
• Author: Pearu Peterson at Center for
Nonlinear Studies Tallinn, Estonia
• Automagically “wraps” Fortran 77/90/95
libraries for use in Python. Amazing.
• f2py is specifically built to wrap Fortran
functions using NumPy arrays.
enthought ®
enthought ®
Simplest f2py Usage
Fortran
File
Python Extension
Module
fcopy.f
fcopymodule.so
f2py –c fcopy.f –m fcopy
Compile code and build an
extension module
Name the extension
module fcopy.
Simplest Usage Result
enthought ®
Fortran file fcopy.f
C
SUBROUTINE FCOPY(AIN,N,AOUT)
C
>>> import fcopy
DOUBLE COMPLEX AIN(*)
>>> info(fcopy)
INTEGER N
This module 'fcopy' is auto-generated with f2py
DOUBLE COMPLEX AOUT(*)
(version:2.37.233-1545).
DO 20 J = 1, N
Functions:
fcopy(ain,n,aout)
AOUT(J) = AIN(J)
>>> info(fcopy.fcopy)
20
CONTINUE
fcopy - Function signature:
END
>>> a = rand(1000) +
1j*rand(1000)
fcopy(ain,n,aout)
Required arguments:
ain : input rank-1 array('D') with bounds (*)
n : input int
aout : input rank-1 array('D') with bounds (*)
>>> b = zeros((1000,),’D’)
>>> fcopy.fcopy(a,1000,b)
Looks exactly like the
Fortran --- but now in Python!
enthought ®
More Sophisticated
Fortran
File
Interface File
fcopy.f
hand edit
fcopy.pyf
Python Extension
Module
fcopymodule.so
f2py fcopy.f –h fcopy.pyf –m fcopy
f2py -c fcopy.pyf fcopy.f
enthought ®
More Sophisticated
Interface file fcopy.pyf
!
-*- f90 -*python module fcopy ! in
interface ! in :fcopy
subroutine fcopy(ain,n,aout) ! in :fcopy:fcopy.f
double complex dimension(n), intent(in) :: ain
integer, intent(hide),depend(ain) :: n=len(ain)
double complex dimension(n),intent(out) :: aout
end subroutine fcopy
end interface
end python module fcopy
! This file was auto-generated with f2py (version:2.37.233-1545).
! See http://cens.ioc.ee/projects/f2py2e/
fcopy - Function signature:
aout = fcopy(ain)
Required arguments:
ain : input rank-1 array('D') with
bounds (n)
Return objects:
aout : rank-1 array('D') with
bounds (n)
Give f2py
some hints as
to what these
variables are
used for and
how they may
be related in
Python.
>>> a = rand(100,’F’)
>>> b = fcopy.fcopy(a)
>>> print b.typecode()
‘D’
More Pythonic behavior
enthought ®
Simply Sophisticated
Fortran File
fcopy.f
hand edit
Python Extension
Module
fcopymodule.so
f2py –c fcopy.f –m fcopy
Compile code and build an
extension module
Name the extension
module fcopy.
Simply Sophisticated
enthought ®
Fortran file fcopy2.f
C
SUBROUTINE FCOPY(AIN,N,AOUT)
A few directives can help
C
CF2PY INTENT(IN), AIN
f2py interpret the source.
CF2PY INTENT(OUT), AOUT
CF2PY INTENT(HIDE), DEPEND(A), N=LEN(A)
DOUBLE COMPLEX AIN(*)
INTEGER N
DOUBLE COMPLEX AOUT(*)
DO 20 J = 1, N
>>> import fcopy
AOUT(J) = AIN(J)
>>> info(fcopy.fcopy)
20
CONTINUE
fcopy - Function signature:
END
aout = fcopy(ain)
>>> a = rand(1000)
>>> import fcopy
>>> b = fcopy.fcopy(a)
Much more Python like!
Required arguments:
ain : input rank-1 array('D') with bounds (n)
Return objects:
aout : rank-1 array('D') with bounds (n)
enthought ®
Saving the Module C-File
f2py –h alib.pyf –m alib *.f
compile
*.f
Library
libflib.a
Interface File
flib.pyf
hand edited
C-extension
Module
flibmodule.c
f2py –c alibmodule.c *.f
either one
Library of
Fortran Files
f2py alib.pyf
Shared extension
Module
flibmodule.so
f2py –c alibmodule.c –l alib
Multidimensional array issues
enthought ®
Python and Numeric use C conventions for array storage (row
major order). Fortran uses column major ordering.
Numeric:
A[0,0], A[0,1], A[0,2],…, A[N-1,N-2], A[N-1,N-1]
(last dimension varies the fastest)
Fortran:
A(1,1), A(2,1), A(3,1), …, A(N-1,N), A(N,N)
(first dimension varies the fastest)
f2py handles the conversion back and forth between the
representations if you mix them in your code. Your code will
be faster, however, if you can avoid mixing the representations
(impossible if you are calling out to both C and Fortran
libraries that are interpreting matrices differently).
scipy_distutils
How do I distribute this great new extension module?
Recipient must have f2py and scipy_distutils installed (both
are simple installs)
Create setup.py file
Distribute *.f files with setup.py file.
Optionally distribute *.pyf file if you’ve spruced up the
interface in a separate interface file.
Supported Compilers
g77, Compaq Fortran, VAST/f90 Fortran, Absoft F77/F90,
Forte (Sun), SGI, Intel, Itanium, NAG, Lahey, PG
enthought ®
Complete Example
enthought ®
In scipy.stats there is a function written entirely in Python
>>> info(stats.morestats._find_repeats)
_find_repeats(arr)
Find repeats in the array and return a list of the
repeats and how many there were.
Goal: Write an equivalent fortran function and link it in to
Python with f2py so it can be distributed with scipy_base
(which uses scipy_distutils) and be available for stats.
Python algorithm uses sort and so we will
need a fortran function for that, too.
Complete Example
Fortran file futil.f
C
Sorts an array arr(1:N) into
SUBROUTINE DQSORT(N,ARR)
CF2PY INTENT(IN,OUT,COPY), ARR
CF2PY INTENT(HIDE), DEPEND(ARR), N=len(ARR)
INTEGER N,M,NSTACK
REAL*8 ARR(N)
PARAMETER (M=7, NSTACK=100)
INTEGER I,IR,J,JSTACK, K,L, ISTACK(NSTACK)
REAL*8 A,TEMP
…
END
C
CF2PY
CF2PY
CF2PY
CF2PY
CF2PY
Finds repeated elements of ARR
SUBROUTINE DFREPS(ARR,N,REPLIST,REPNUM,NLIST)
INTENT(IN), ARR
INTENT(OUT), REPLIST
INTENT(OUT), REPNUM
INTENT(OUT), NLIST
INTENT(HIDE), DEPEND(ARR), N=len(ARR)
REAL*8 REPLIST(N), ARR(N)
REAL*8 LASTVAL
INTEGER REPNUM(N)
INTEGER HOWMANY, REPEAT, IND, NLIST, NNUM
…
END
enthought ®
#Lines added to setup_stats.py
#add futil module
sources = [os.path.join(local_path,
'futil.f']
name = dot_join(package,’futil’)
ext = Extension(name,sources)
config['ext_modules'].append(ext)
#Lines added to morestats.py
# (under stats)
import futil
def find_repeats(arr):
"""Find repeats in arr and
return (repeats, repeat_count)
"""
v1,v2, n = futil.dfreps(arr)
return v1[:n],v2[:n]
enthought ®
Complete Example
Try It Out!!
>>> from scipy import *
>>> a = stats.randint(1,30,size=1000)
>>> reps, nums = find_repeats(a)
>>> print reps
[
1.
12.
23.
2.
13.
24.
3.
14.
25.
4.
15.
26.
5.
16.
27.
6.
17.
28.
7.
8.
18. 19.
29.]
9.
20.
10.
21.
11.
22.
>>> print nums
[29 37 29 30 34 39 46 20 30 32 35 42 40 39 35 26 38 33 40
29 34 26 38 45 39 38 29 39 29]
New function is 25 times faster
than the plain Python version
enthought ®
Complete Example
Packaged for Individual release
#!/usr/bin/env python
# File: setup_futil.py
from scipy_distutils.core import Extension
ext = Extension(name = 'futil',
sources = ['futil.f'])
python setup_futil.py install
With futil.f in current directory this builds
and installs on any platform with a C
compiler and a fortran compiler that
scipy_distutils recognizes.
if __name__ == "__main__":
from scipy_distutils.core import setup
setup(name = 'futil',
description
= "Utility fortran functions",
author
= "Travis E. Oliphant",
author_email
= "[email protected]",
ext_modules = [ext]
)
# End of setup_futil.py
enthought ®
Weave
weave
enthought ®
• weave.blitz()
Translation of Numeric array expressions to C/C++ for fast
execution
• weave.inline()
Include C/C++ code directly in Python code for on-the-fly
execution
• weave.ext_tools
Classes for building C/C++ extension modules in Python
weave.inline
>>> import weave
>>> a=1
>>> weave.inline('std::cout << a << std::endl;',['a'])
sc_f08dc0f70451ecf9a9c9d4d0636de3670.cpp
Creating library <snip>
1
>>> weave.inline('std::cout << a << std::endl;',['a'])
1
>>> a='qwerty'
>>> weave.inline('std::cout << a << std::endl;',['a'])
sc_f08dc0f70451ecf9a9c9d4d0636de3671.cpp
Creating library <snip>
qwerty
>>> weave.inline('std::cout << a << std::endl;',['a'])
qwerty
enthought ®
Support code example
enthought ®
>>> import weave
>>> a = 1
>>> support_code = ‘int bob(int val) { return val;}’
>>> weave.inline(‘return_val = bob(a);’,['a'],support_code=support_code)
sc_19f0a1876e0022290e9104c0cce4f00c0.cpp
Creating library <snip>
1
>>> a = 'string'
>>> weave.inline(‘return_val = bob(a);’,['a'],support_code = support_code)
sc_19f0a1876e0022290e9104c0cce4f00c1.cpp
C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\sc_19f0a1876e0022290e9104c0cce4
f00c1.cpp(417) : error C2664: 'bob' : cannot convert parameter 1 from 'class Py:
:String' to 'int' No user-defined-conversion operator available that can
perform this conversion, or the operator cannot be called
Traceback (most recent call last):
<snip>
weave.build_tools.CompileError: error: command '"C:\Program Files\Microsoft Visu
al Studio\VC98\BIN\cl.exe"' failed with exit status 2
ext_tools example
import string
from weave import ext_tools
def build_ex1():
ext = ext_tools.ext_module('_ex1')
# Type declarations– define a sequence and a function
seq = []
func = string.upper
code = """
py::tuple args(1);
py::list result(seq.length());
for(int i = 0; i < seq.length();i++)
{
args[0] = seq[i];
result[i] = PyEval_CallObject(func,py::tuple(args[0]));
}
return_val = result;
"""
func = ext_tools.ext_function('my_map',code,['func','seq'])
ext.add_function(func)
ext.compile()
try:
from _ex1 import *
except ImportError:
build_ex1()
from _ex1 import *
if __name__ == '__main__':
print my_map(string.lower,['asdf','ADFS','ADSD'])
enthought ®
enthought ®
Efficiency Issues
PSEUDO C FOR STANDARD
NUMERIC EVALUATION
>>> c = a + b + c
FAST, IDIOMATIC C CODE
>>> c = a + b + c
tmp1
tmp2
// c code
// tmp1 = a + b
tmp1 = malloc(len_a * el_sz);
for(i=0; i < len_a; i++)
tmp1[i] = a[i] + b[i];
// tmp2 = tmp1 + c
tmp2 = malloc(len_c * el_sz);
for(i=0; i < len_c; i++)
tmp2[i] = tmp1[i] + c[i];
// c code
// 1. loops “fused”
// 2. no memory allocation
for(i=0; i < len_a; i++)
c[i] = a[i] + b[i] + c[i];
Finite Difference Equation
enthought ®
MAXWELL’S EQUATIONS: FINITE DIFFERENCE TIME DOMAIN (FDTD),
UPDATE OF X COMPONENT OF ELECTRIC FIELD
Ex
σ x t
1
dH y
dHz
2ε x
t
t

E 
–
σ x t x
σ x t dy
σ x t dz
εx 
εx 
1
2
2
2ε x
PYTHON VERSION OF SAME EQUATION, PRE-CALCULATED CONSTANTS
ex[:,1:,1:] =
ca_x[:,1:,1:]
* ex[:,1:,1:]
+ cb_y_x[:,1:,1:] * (hz[:,1:,1:] - hz[:,:-1,:])
- cb_z_x[:,1:,1:] * (hy[:,1:,1:] - hy[:,1:,:-1])
weave.blitz
enthought ®
weave.blitz compiles array
expressions to C/C++ code using
the Blitz++ library.
WEAVE.BLITZ VERSION OF SAME EQUATION
>>> from scipy import weave
>>> # <instantiate all array variables...>
>>> expr = “ex[:,1:,1:] =
ca_x[:,1:,1:]
* ex[:,1:,1:]”\
“+ cb_y_x[:,1:,1:] * (hz[:,1:,1:] - hz[:,:-1,:])”\
“- cb_z_x[:,1:,1:] * (hy[:,1:,1:] - hy[:,1:,:-1])”
>>> weave.blitz(expr)
< 1. translate expression to blitz++ expression>
< 2. compile with gcc using array variables in local scope>
< 3. load compiled module and execute code>
enthought ®
weave.blitz benchmarks
Equation
Numeric
(sec)
Inplace
(sec)
compiler
(sec)
Speed
Up
Float (4 bytes)
a=b+c
(512,512)
0.027
0.019
0.024
1.13
a = b + c + d (512x512)
0.060
0.037
0.029
2.06
5 pt. avg filter (512x512)
0.161
-
0.060
2.68
FDTD
0.890
-
0.323
2.75
(100x100x100)
Double (8 bytes)
a=b+c
(512,512)
0.128
0.106
0.042
3.05
a = b + c + d (512x512)
0.248
0.210
0.054
4.59
5 pt. avg filter (512x512)
0.631
-
0.070
9.01
FDTD
3.399
-
0.395
8.61
•
•
(100x100x100)
Pentium II, 300 MHz, Python 2.0, Numeric 17.2.0
Speed-up taken as ratio of scipy.compiler to standard Numeric runs.
weave and Laplace’s equation
enthought ®
Weave case study: An iterative solver for Laplace’s Equation
u u
 2 0
2
x
y
2
PURE PYTHON
2
1 Volt
2000 SECONDS
for i in range(1, nx-1):
for j in range(1, ny-1):
tmp = u[i,j]
u[i,j] = ((u[i-1, j] + u[i+1, j])*dy2 +
(u[i, j-1] + u[i, j+1])*dx2)
/ (2.0*(dx2 + dy2))
diff = u[i,j] – tmp
err = err + diff**2
Thanks to Prabhu Ramachandran for designing and running this example. His complete
write-up is available at: www.scipy.org/site_content/weave/python_performance.html
weave and Laplace’s equation
USING NUMERIC
enthought ®
29.0 SECONDS
old_u = u.copy() # needed to compute the error.
u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 +
(u[1:-1, 0:-2] + u[1:-1, 2:])*dx2)
* dnr_inv
err = sum(dot(old_u – u))
WEAVE.BLITZ
10.2 SECONDS
old_u = u.copy() # needed to compute the error.
expr = ””” \
u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 +
(u[1:-1, 0:-2] + u[1:-1, 2:])*dx2)
* dnr_inv
”””
weave.inline(expr,size_check=0)
err = sum((old_u – u)**2)
weave and Laplace’s equation
WEAVE.INLINE
enthought ®
4.3 SECONDS
code = """
#line 120 "laplace.py" (This is only useful for debugging)
double tmp, err, diff;
err = 0.0;
for (int i=1; i<nx-1; ++i) {
for (int j=1; j<ny-1; ++j) {
tmp = u(i,j);
u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2 +
(u(i,j-1) + u(i,j+1))*dx2)*dnr_inv;
diff = u(i,j) - tmp;
err += diff*diff;
}
}
return_val = sqrt(err);
"""
err = weave.inline(code, ['u','dx2','dy2','dnr_inv','nx','ny'],
type_converters = converters.blitz,
compiler = 'gcc',
extra_compile_args = ['-O3','-malign-double'])
enthought ®
Laplace Benchmarks
Method
Speed
Up
1897.0
 0.02
Numeric
29.0
1.00
weave.blitz
10.2
2.84
weave.inline
4.3
6.74
weave.inline (fast)
2.9
10.00
Python/Fortran (with f2py)
3.2
9.06
Pure C++ Program
2.4
12.08
Pure Python
•
•
•
Run Time
(sec)
Debian Linux, Pentium III, 450 MHz, Python 2.1, 192 MB RAM
Laplace solve for 500x500 grid and 100 iterations
Speed-up taken as compared to Numeric
enthought ®
SWIG
SWIG
enthought ®
• Author: David Beazley at Univ. of Chicago
• Automatically “wraps” C/C++ libraries for use in
Python. Amazing.
• SWIG uses interface files to describe library
functions
– No need to modify original library code
– Flexible approach allowing both simple and complex
library interfaces
• Well Documented
enthought ®
SWIG Process
Interface File
C Extension
File
lib.i
SWIG
Writing this is
your responsibility (kinda)
Library Files
lib_wrap.c
compile
Python Extension
Module
*.h files *.c files
compile
libmodule.so
enthought ®
Simple Example
fact.h
example.i
#ifndef FACT_H
#define FACT_H
// Define the modules name
%module example
int fact(int n);
// Specify code that should
// be included at top of
// wrapper file.
%{
#include “fact.h”
%}
#endif
fact.c
#include “fact.h”
int fact(int n)
{
if (n <=1) return 1;
else return n*fact(n-1);
}
// Define interface. Easy way
// out - Simply include the
// header file and let SWIG
// figure everything out.
%include “fact.h”
enthought ®
Building the Module
LINUX
# Create example_wrap.c file
[ej@bull ej]$ swig –python example.i
# Compile library and example_wrap.c code using
# “position independent code” flag
[ej@bull ej]$ gcc –c –fpic example_wrap.c fact.c
–I/usr/local/include/python2.1
–I/usr/local/lib/python2.1/config
# link as a shared library.
[ej@bull ej]$ gcc –shared example_wrap.o fact.o
-o examplemodule.so
# test it in Python
[ej@bull ej]$ python
...
>>> import example
>>> example.fact(4)
24
\
\
\
For notes on how to use SWIG with
VC++ on Windows, see
http://www.swig.org/Doc1.1/HTML/Python.html#n2
enthought ®
The Wrapper File
example_wrap.c
static PyObject *_wrap_fact(PyObject *self, PyObject *args) {
PyObject *resultobj;
int arg0 ;
int result ;
/* parse the Python input arguments and extract */
name of function to return in case of error
if(!PyArg_ParseTuple(args,"i:fact",&arg0)) return NULL;
first arg in args read into arg0 as int
/* call the actual C function with arg0 as the argument*/
result = (int )fact(arg0);
/* Convert returned C value to Python type and return it*/
resultobj = PyInt_FromLong((long)result);
return resultobj;
}
enthought ®
SWIG Example 2
vect.h
int* vect(int x,int y,int z);
int sum(int* vector);
vect.c
#include <malloc.h>
#include “vect.h”
int* vect(int x,int y, int z){
int* res;
res = malloc(3*sizeof(int));
res[0]=x;res[1]=y;res[2]=z;
return res;
}
int sum(int* v) {
return v[0]+v[1]+v[2];
}
example2.i
Identical to example.i if you replace
“fact” with “vect”.
TEST IN PYTHON
>>> from example2 import *
>>> a = vect(1,2,3)
>>> sum(a)
6 #works fine!
# Let’s take a look at the
# integer array a.
>>> a
'_813d880_p_int'
# WHAT THE HECK IS THIS???
Complex Objects in SWIG
• SWIG treats all complex objects as pointers.
• These C pointers are mangled into string
representations for Python’s consumption.
• This is one of SWIG’s secrets to wrapping
virtually any library automatically,
• But… the string representation is pretty
primitive and makes it “un-pythonic” to
observe/manipulate the contents of the
object.
• Enter typemaps
enthought ®
enthought ®
Typemaps
example_wrap.c
static PyObject *_wrap_sum(PyObject *self, PyObject *args) {
...
if(!PyArg_ParseTuple(args,"O:sum",&arg0))
return NULL;
...
result = (int )sum(arg0);
...
return resultobj;
}
• Typemaps allow you to insert “type
conversion” code into various location within
the function wrapper.
• Not for the faint of heart. Quoting David:
“You can blow your whole leg off,
including your foot!”
Typemaps
enthought ®
The result? Standard C pointers are mapped to
NumPy arrays for easy manipulation in Python.
YET ANOTHER EXAMPLE – NOW WITH TYPEMAPS
>>> import example3
>>> a = example3.vect(1,2,3)
>>> a
# a should be an array now.
array([1, 2, 3], 'i') # It is!
>>> example3.sum(a)
6
The typemaps used for example3 are included in the handouts.
Another example that wraps a more complicated C function used in
the previous VQ benchmarks is also provided. It offers more generic
handling 1D and 2D arrays.
enthought ®
Parallel Programming in Python
Parallel Computing Tools
•
•
•
•
Python has threads (sort’a)
pyMPI(pympi.sf.net/)
pyre (CalTech)
PyPAR
(datamining.anu.edu.au/~ole/pypar/)
• SCIENTIFIC
(starship.python.net/crew/hinsen)
• COW (www.scipy.org)
enthought ®
Cluster Computing with Python
enthought ®
• cow.py
• Pure Python Approach
• Easy to Use
• Suitable for “embarrassingly” parallel tasks
• pyMPI (Message Passing Interface)
• Developed by Patrick Miller, Martin Casado et al. at
Lawrence Livermore National Laboratories
• De-facto industry standard for high-performance computing
• Vendor optimized libraries on “Big Iron”
• Possible to integrate existing HPFortran and HPC codes
such as Scalapack (parallel linear algebra) into Python.
Threads
• Python threads are built on POSIX and
Windows threads (hooray!)
• Python threads share a “lock” that
prevents threads from invalid sharing
• Threads pass control to another thread
– every few instructions
– during blocking I/O (if properly guarded)
– when threads die
enthought ®
The “threading” module
• from threading import Thread
– a lower level thread library exists, but this
is much easier to use
• a thread object can “fork” a new
execution context and later be “joined”
to another
• you provide the thread body either by
creating a thread with a function or by
subclassing it
enthought ®
Making a thread
• we will work at the prompt!
>>> from threading import *
>>> def f(): print ‘hello’
>>> T = Thread(target=f)
>>> T.start()
enthought ®
Thread operations
• currentThread()
• T.start()
• T.join()
• T.getName() / T.setName()
• T.isAlive()
• T.isDaemon() / T.setDaemon()
enthought ®
Passing arguments to a thread
enthought ®
>>> from threading import *
>>> def f(a,b,c): print ‘hello’,a,b,c
>>> T = Thread(target=f,args=(11,22),kwargs={‘c’: )
>>> T.start()
Subclassing a thread
enthought ®
from threading import *
class myThread(Thread):
def __init__(self,x,**kw):
Thread.__init__(self,**kw) #FIRST!
self.x = x
def run():
print self.getName()
print ‘I am running’,self.x
T = myThread(100)
T.start()
NOTE: Only __init__ and run() are available for overload
CAUTION!
enthought ®
• Threads are really co-routines!
• Only one thread can operate on Python
objects at a time
• Internally, threads are switched
• If you write extensions that are intended
for threading, use
– PY_BEGIN_ALLOW_THREADS
– PY_END_ALLOW_THREADS
enthought ®
cow
enthought ®
Electromagnetic Scattering
Airborne Radar
Monostatic BackScatter from Buried
Landmine, Theta = 30, Phi = 0
0
RCS (dB)
-5
-10
-15
-20
-25
100
Buried Land Mine
Inputs
environment, target
mesh, and
multiple frequencies
Mem: KB to Mbytes
SMALL
Computation
N3 CPU
N2 storage
Time: a few seconds
to days
Mem: MB to GBytes
LARGE!
150
200
250
300
Frequency (MHz)
350
Outputs
Radar Cross Section
values
Mem: KB to MBytes
SMALL
400
enthought ®
cow.py
Master
Python
>>>
Socket
Node 0
Node 1
Node 2
Python
>>>
Python
>>>
Python
>>>
58
Cluster Creation
enthought ®
Master
>>> import scipy.cow
#
[name, port]
>>> machines = [['s0',11500],['s1',11500],['s2',11500]]
>>> cluster = scipy.cow.machine_cluster(machines)
>>>
Port numbers below 1024 are reserved by the OS and generally
must run as ‘root’ or ‘system’. Valid port numbers are between
1025-49151. Be sure another program is not using the port
you choose.
enthought ®
Starting remote processes
Master
>>> cluster = scipy.cow.cluster(machines)
>>> cluster.start()
start() uses ssh to
start an interpreter
listening on port 11500
on each remote machine
s0
s1
s2
Python
>>>
Python
>>>
Python
>>>
Dictionary Behavior of Clusters
Master
# Set a global variable on each of the machines.
>>> cluster['a'] = 1
s0
Python
>>> a = 1
s1
Python
>>> a = 1
s2
Python
>>> a = 1
enthought ®
Dictionary Behavior of Clusters
Master
# Set a global variable on each of the machines.
>>> cluster['a'] = 1
# Retrieve a global variable from each machine.
>>> cluster['a']
( 1, 1, 1)
#(s0,s1,s2)
s0
Python
>>> a = 1
>>> a
1
s1
Python
>>> a = 1
>>> a
1
s2
Python
>>> a = 1
>>> a
1
enthought ®
enthought ®
cluster.apply()
Master
# run a function on each remote machine
>>> import os
>>> cluster.apply(os.getpid)
(123,425,947)
s0
>>> os.getpid()
123
s1
>>> os.getid()
425
s2
>>> os.getpid()
947
enthought ®
cluster.exec_code()
Master
# run a code fragment on each remote machine
>>> cluster.exec_code('import os; pid = os.getpid()',
...
returns = ('pid',))
(123,425,947)
s0
>>> import os
>>> os.getpid()
123
s1
>>> import os
>>> os.getpid()
425
s2
>>> import os
>>> os.getpid()
947
enthought ®
cluster.loop_apply()
Master
# divide task evenly (as possible) between workers
>>> import string
>>> s = ['aa','bb','cc','dd']
>>> cluster.loop_apply(string.upper, loop_var=0, args=(s,) )
('AA','BB','CC','DD')
s0
>>> x=upper('aa')
>>> y=upper('bb')
>>> (x,y)
('AA','BB')
s1
>>> x=upper('cc')
>>> (x,)
('CC',)
s2
>>> x=upper('dd')
>>> (x,)
('DD',)
Cluster Method Review
enthought ®
• apply(function, args=(), keywords=None)
– Similar to Python’s built-in apply function. Call the given function
with the specified args and keywords on all the worker machines.
Returns a list of the results received from each worker.
• exec_code(code, inputs=None, returns=None)
– Similar to Python’s built-in exec statement. Execute the given code
on all remote workers as if it were typed at the command line.
inputs is a dictionary of variables added to the global namespace on
the remote workers. returns is a list of variable names (as strings)
that should be returned after the code is executed. If returns
contains a single variable name, a list of values is returned by
exec_code. If returns is a sequence of variable names,
exec_code returns a list of tuples.
Cluster Method Review
enthought ®
• loop_apply(function,loop_var,args=(),
keywords=None)
– Call function with the given args and keywords. One of the
arguments or keywords is actually a sequence of arguments. This
sequence is looped over, calling function once for each value in the
sequence. loop_var indicates which variable to loop over. If an
integer, loop_var indexes the args list. If a string, it specifies a
keyword variable. The loop sequence is divided as evenly as possible
between the worker nodes and executed in parallel.
• loop_code(code, loop_var, inputs=None,
returns=None)
– Similar to exec_code and loop_apply. Here loop_var
indicates a variable name in the inputs dictionary that should be
looped over.
Cluster Method Review
enthought ®
• ps(sort_by=‘cpu’,**filters)
– Display all the processes running on the remote machine much like
the ps Unix command. sort_by indicates which field to sort the
returned list. Also keywords allow the list to be filtered so that only
certain processes are displayed.
• info()
– Display information about each worker node including its name,
processor count and type, total and free memory, and current work
load.
enthought ®
Query Operations
>>> herd.cluster.info()
MACHINE
CPU
GHZ
s0
2xP3
0.5
s1
2xP3
0.5
s2
2xP3
0.5
MB TOTAL
960.0
960.0
960.0
MB FREE
930.0
41.0
221.0
>>> herd.cluster.ps(user='ej',cpu='>50')
MACHINE USER PID %CPU %MEM TOTAL MB RES MB
s0
ej
123 99.9
0.4
3.836
3.836
s1
ej
425 99.9
0.4
3.832
3.832
s2
ej
947 99.9
0.4
3.832
3.832
LOAD
0.00
1.00
0.99
CMD
python...
python...
python...
Simple FFT Benchmark
enthought ®
(1) STANDARD SERIAL APPROACH TO 1D FFTs
>>> b = fft(a) # a is a 2D array: 8192 x 512
(2) PARALLEL APPROACH WITH LOOP_APPLY
>>> b = cluster.loop_apply(fft,0,(a,))
(3) PARALLEL SCATTER/COMPUTE/GATHER APPROACH
>>> cluster.import_all(‘FFT’)
# divide a row wise amongst workers
>>> cluster.row_split('a',a)
# workers calculate fft of small piece of a and stores as b.
>>> cluster.exec_code('b=fft(a)')
# gather the b values from workers back to master.
>>> b = cluster.row_gather('b')
enthought ®
FFT Benchmark Results
Method
CPUs
Run Time
(sec)
Speed Efficiency
Up
(1) standard
1
2.97
-
-
(2) loop_apply
2
11.91
0.25
-400%
(3) scatter/compute/gather
2
13.83
0.21
-500%
Test Setup:
The array a is 8192 by 512. ffts are applied to each row
independently as is the default behavior of the FFT module.
The cluster consists of 16 dual Pentium II 450 MHz machines
connected using 100 Mbit ethernet.
enthought ®
FFT Benchmark Results
Method
CPUs
Run Time
(sec)
Speed Efficiency
Up
(1) standard
1
2.97
-
-
(2) loop_apply
2
11.91
0.25
-400%
(3) scatter/compute/gather
2
13.83
0.21
-500%
(3) compute alone
2
1.49
2.00
100%
(3) compute alone
4
0.76
3.91
98%
(3) compute alone
16
0.24
12.38
78%
(3) compute alone
32
0.17
17.26
54%
Moral:
If data can be distributed among the machines once and then
manipulated in place, reasonable speed-ups are achieved.
enthought ®
Electromagnetics
EM Scattering Problem
CPUs
Run Time
(sec)
Speed Efficiency
Up
Small Buried Sphere
64 freqs, 195 edges
32
8.19
31.40
98.0%
Land Mine
64 freqs, 1152 edges
32
285.12
31.96
99.9%
Serial vs. Parallel EM Solver
SERIAL VERSION
def serial(solver,freqs,angles):
results = []
for freq in freqs:
# single_frequency handles calculation details
res = single_frequency(solver,freq,angles)
results.append(res)
return results
PARALLEL VERSION
def parallel(solver,freqs,angles,cluster):
# make sure cluster is running
cluster.start(force_restart = 0)
# bundle arguments for loop_apply call
args = (solver,freqs,angles)
# looping handled by loop_apply
results = cluster.loop_apply(single_frequency,1,args)
return results
enthought ®
enthought ®
pyMPI
Simple MPI Program
# output is asynchronous
% mpirun -np 4 pyMPI
>>> import mpi
>>> print mpi.rank
3
0
2
1
# force synchronization
>>> mpi.synchronizedWrite(mpi.rank, ’\n’)
0
1
2
3
enthought ®
Broadcasting Data
import mpi
import math
if mpi.rank == 0:
data = [sin(x) for x in range(0,10)]
else:
data = None
common_data = mpi.bcast(data)
enthought ®
mpi.bcast()
enthought ®
• bcast() broadcasts a value from the
“root” process (default is 0) to all other
processes
• bcast’s arguments include the message
to send and optionally the root sender
• the message argument is ignored on all
processors except the root
Scattering an Array
# You can give a little bit to everyone
import mpi
from math import sin,pi
if mpi.rank == 0:
array = [sin(x*pi/99) for x in
range(100)]
else:
array = None
# give everyone some of the array
local_array = mpi.scatter(array)
enthought ®
mpi.scatter()
• scatter() splits an array, list, or tuple
evenly (roughly) across all processors
• the function result is always a [list]
• an optional argument can change the
root from rank 0
• the message argument is ignored on all
processors except the root
enthought ®
Gathering wandering data
enthought ®
# Sometimes everyone has a little data to bring
# together
import mpi
import math
local_data = [sin(mpi.rank*x*pi/99) for x in range(100)]
print local_data
root_data = mpi.gather(local_data)
print root_data
mpi.gather() / mpi.allgather()
• gather appends lists or tuples into a
master list on the root process
• if you want it on all ranks, use
mpi.allgather() instead
• every rank must call the gather()
enthought ®
Reductions
# You can bring data together in interesting ways
import mpi
x_cubed = mpi.rank**3
sum_x_cubed = mpi.reduce(x_cubed,mpi.SUM)
enthought ®
mpi.reduce() / mpi.allreduce()
• The reduce (and allreduce) functions
apply an operator across data from all
participating processes
• You can use predefined functions
– mpi.SUM, mpi.MIN, mpi.MAX, etc…
• you can define your own functions too
• you may optionally specify an initial
value
enthought ®
enthought ®
3D Visualization with VTK
Visualization with VTK
• Visualization Toolkit from Kitware
– www.kitware.com
• Large C++ class library
– Wrappers for Tcl, Python, and Java
– Extremely powerful, but…
– Also complex with a steep learning curve
enthought ®
VTK Gallery
enthought ®
enthought ®
VTK Pipeline
PIPELINE
Pipeline view from Visualization Studio
at http://www.principiamathematica.com
OUTPUT
Cone Example
enthought ®
SETUP
# VTK lives in two modules
from vtk import *
# Create a renderer
renderer = vtkRenderer()
# Create render window and connect the renderer.
render_window = vtkRenderWindow()
render_window.AddRenderer(renderer)
render_window.SetSize(300,300)
# Create Tkinter based interactor and connect render window.
# The interactor handles mouse interaction.
interactor = vtkRenderWindowInteractor()
interactor.SetRenderWindow(render_window)
Cone Example (cont.)
PIPELINE
# Create cone source with 200 facets.
cone = vtkConeSource()
cone.SetResolution(200)
# Create color filter and connect its input
# to the cone's output.
color_filter = vtkElevationFilter()
color_filter.SetInput(cone.GetOutput())
color_filter.SetLowPoint(0,-.5,0)
color_filter.SetHighPoint(0,.5,0)
# map colorized cone data to graphic primitives
cone_mapper = vtkDataSetMapper()
cone_mapper.SetInput(color_filter.GetOutput())
enthought ®
Cone Example (cont.)
DISPLAY
# Create actor to represent our
# cone and connect it to the
# mapper
cone_actor = vtkActor()
cone_actor.SetMapper(cone_mapper)
# Assign actor to
# the renderer.
renderer.AddActor(cone_actor)
# Initialize interactor
# and start visualizing.
interactor.Initialize()
interactor.Start()
enthought ®
Mesh Generation
enthought ®
POINTS AND CELLS
3
2
0
id
0
1
2
3
1
points
x y z temp
0 0 0 10
1 0 0 20
0 1 0 20
0 0 1 30
triangles
id x y z
0
0 1 3
1
0 3 2
2
1 2 3
3
0 2 1
# Convert list of points to VTK structure
verts = vtkPoints()
temperature = vtkFloatArray()
for p in points:
verts.InsertNextPoint(p[0],p[1],p[2])
temperature.InsertNextValue(p[3])
# Define triangular cells from the vertex
# “ids” (index) and append to polygon list.
polygons = vtkCellArray()
for tri in triangles:
cell = vtkIdList()
cell.InsertNextId(tri[0])
cell.InsertNextId(tri[1])
cell.InsertNextId(tri[2])
polygons.InsertNextCell(cell)
Mesh Generation
enthought ®
POINTS AND CELLS
# Create a mesh from these lists
mesh = vtkPolyData()
mesh.SetPoints(verts)
mesh.SetPolys(polygons)
mesh.GetPointData().SetScalars( \
...
temperature)
# Create mapper for mesh
mapper = vtkPolyDataMapper()
mapper.SetInput(mesh)
# If range isn’t set, colors are
# not plotted.
mapper.SetScalarRange( \
...
temperature.GetRange())
Code for temperature bar not shown.
enthought ®
VTK Demo