Literate programming
with SAS
- and other languages
Søren Højsgaard
Faculty of Agricultural Sciences
Aarhus University
Denmark
SASforum, May 2009, Copenhagen
Take-home message
Literate programming: Combining text, code and
results in one document
Supports text formats:
LaTeX / OpenOffice (OpenDocument Text)
In combination with the ’engines’
SAS, R, S-plus, Maple, Stata, …
Ensures reproducibility of analysis
Great help in ”recalling what I did 2 months ago”
StatWeave does all this – and is free…
This talk: Focus on StatWeave with OpenOffice
and SAS/R …
Overview – Combining code,
documentation and results
Source document
Writing
SAS statements
More writing
R statements
Even more writing
More SAS statements
More writing…
Final document
Writing
SAS statements
SAS output
SAS graphics
More writing
R statements
R output
Even more writing
SAS statements
SAS output
More writing…
Hello StatWeave World…
What is literate programming
Knuth (1984) coined the term literate
programming:
Create software as works of literature:
Embed source code into descriptive text (rather than the
opposite which is common practice)
Software should follow flow of thoughts and logic
Should be designed to be readable by humans (and not
only by compilers / programs).
Very useful idea in statistics…
Why literate programming?
Reproducible statistical analysis
Research, consulting
Document exactly what has been done
Possible to re-run if data change
Manuals, course notes etc.
Shown output guaranteed to be result of shown code
Some systems for literate programming
Comments inside code
WEB (Knuth 1984) and friends
Sweave (Leisch 2002)
R code in LaTeX documents
odfWeave (Kuhn and Coulter 2007)
R code in OpenOffice documents
SASweave (Lenth and Højsgaard 2007)
SAS / R code in LaTeX documents
StatWeave
SAS / R / Maple / S-plus / Stata … code in LaTeX and
OpenOffice documents
StatWeave
StatWeave created by Russ Lenth, University of Iowa, USA
Available: http://www.cs.uiowa.edu/~rlenth/StatWeave/
StatWeave is still under development, but is becoming "mature" and
stable.
StatWeave design goals
Support many languages
R, S-plus, SAS, Stata, Maple, …
Support different word processing systems, currently
LaTeX
OpenDocument Text (ODT) www.openoffice.org
Portability: Usable on all platforms (written in Java)
Extensible:
Add other languages
Under the hood of StatWeave
Source file is regular text document but with
code chunks added (with special tags)
Two basic operations
Weaving:
Process source file into single document with code
listings, output listings, graphs…
Tangling:
Extract code from source file to run later
Weaving is useful for reproducible statistical
analysis
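The two operations can be illustrated with a toy sketch in Python. The chunk markers (<<SAS>> ... <<end>>), the function names and the stand-in engine below are all invented for this illustration; StatWeave's real tags and internals are different, and a real weaver would call SAS or R instead of fake_run.

```python
import re

# Hypothetical chunk markers, invented for this sketch; not StatWeave's tags.
CHUNK = re.compile(r"<<(\w+)>>\n(.*?)\n<<end>>", re.DOTALL)

SOURCE = """Mean chicken weight by diet.
<<SAS>>
proc means data=chicks; class diet; var weight; run;
<<end>>
The table above is regenerated on every weave."""


def tangle(source):
    """Extract only the code, in document order, as (engine, code) pairs."""
    return [(m.group(1), m.group(2)) for m in CHUNK.finditer(source)]


def weave(source, run):
    """Replace each chunk by its code listing followed by its output.

    `run` is a callable (engine, code) -> output text.
    """
    def expand(match):
        engine, code = match.group(1), match.group(2)
        return f"[{engine} code]\n{code}\n[{engine} output]\n{run(engine, code)}"
    return CHUNK.sub(expand, source)


def fake_run(engine, code):
    # Stand-in for a real engine: only reports what it would have executed.
    return f"({engine} would run {len(code.splitlines())} line(s) here)"


code_only = tangle(SOURCE)
woven = weave(SOURCE, fake_run)
print(woven)
```

Tangling throws the text away and keeps the code; weaving keeps the text and puts each chunk's output next to its code.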
Running StatWeave
Command-line interface:
statweave SAS-HelloWorld-swv.odt
statweave --tangle SAS-HelloWorld-swv.odt
statweave --keepall SAS-HelloWorld-swv.odt
Graphical User Interface:
Generally, source xxx-swv.odt becomes output
xxx.odt
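The naming convention can be written down as a small Python helper. This is only a sketch of the rule stated on the slide; the function name woven_name is made up here, and StatWeave itself may handle edge cases differently.

```python
from pathlib import Path


def woven_name(source):
    """Map 'xxx-swv.odt' to 'xxx.odt', keeping the directory part."""
    path = Path(source)
    if not path.stem.endswith("-swv"):
        raise ValueError(f"expected a name ending in -swv: {source}")
    # Drop the '-swv' marker from the stem and keep the extension.
    return path.with_name(path.stem[: -len("-swv")] + path.suffix)


print(woven_name("SAS-HelloWorld-swv.odt"))
```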
Chicken weight data
Set global options (for SAS code)
Inline evaluation of expressions
… chicken weight data
Output can be saved for later use
- and display
Code reuse and argument substitution
Save code chunks for later execution
Pass arguments to code chunks
Simplest case: Not unlike a macro…
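The macro-like idea can be sketched in Python with string.Template. The placeholders $data and $var, the chunk name "summary" and the helper reuse are assumptions made for this illustration; StatWeave has its own syntax for saving chunks and passing arguments.

```python
from string import Template

# A saved chunk; $data and $var are hypothetical placeholders for this sketch.
saved_chunks = {
    "summary": Template("proc means data=$data; var $var; run;"),
}


def reuse(name, **args):
    """Return the saved chunk's code with the arguments filled in."""
    return saved_chunks[name].substitute(**args)


print(reuse("summary", data="chicks", var="weight"))
print(reuse("summary", data="chicks", var="age"))
```

The chunk is written once and then expanded with different arguments, much as a SAS macro would be.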
…code reuse and argument substitution
Customize display and output (tables) with a reusable code chunk
Multi-language example: SAS, R and DOS
together
Can use different engines in the same source file
Use SAS when appropriate; use R when appropriate; use
Maple when appropriate…
Weaving:
SAS/R/XX chunks assembled into separate code files.
Code files are processed in order of first appearance in
the source file
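The assembly step described above can be sketched in Python as follows. The (engine, code) pairs and the function name assemble are invented for the illustration; this shows the grouping rule only, not how StatWeave actually writes or runs the files.

```python
def assemble(chunks):
    """Group (engine, code) pairs into one code file per engine.

    A dict preserves insertion order, so the engines come out in
    order of first appearance in the source.
    """
    files = {}
    for engine, code in chunks:
        files.setdefault(engine, []).append(code)
    return files


chunks = [
    ("SAS", "data a; x = 1; run;"),
    ("R", "x <- 1"),
    ("SAS", "proc print data=a; run;"),
]
code_files = assemble(chunks)
for engine, codes in code_files.items():
    print(engine, "->", len(codes), "chunk(s)")
```

Note that both SAS chunks end up in the same file, which therefore runs before the R file even though the second SAS chunk comes after the R chunk in the document.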
…Multi-language example: SAS, R and
DOS together
Synchronization issue: SAS chunk depends on data from R chunk, which depends on data from SAS chunk…
Solution: The restart option will restart the engines
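Why a restart helps can be sketched in Python. The slide does not say how StatWeave implements the restart option, so the model below is a guess: a chunk flagged with restart closes all code files collected so far, and later chunks go into fresh ones. The triples and the name plan_runs are hypothetical.

```python
def plan_runs(chunks):
    """Turn (engine, code, restart) triples into an ordered list of runs.

    Without restarts each engine gets a single run. A chunk flagged
    restart=True closes all open runs, so it and later chunks start
    fresh runs.
    """
    runs = []        # (engine, [code, ...]) in execution order
    open_runs = {}   # engine -> index of its currently open run
    for engine, code, restart in chunks:
        if restart:
            open_runs = {}
        if engine not in open_runs:
            open_runs[engine] = len(runs)
            runs.append((engine, []))
        runs[open_runs[engine]][1].append(code)
    return runs


chunks = [
    ("SAS", "export data for R", False),
    ("R", "read SAS data, write results", False),
    ("SAS", "read the R results", True),
]
with_restart = plan_runs(chunks)
without_restart = plan_runs([(e, c, False) for e, c, _ in chunks])
print(with_restart)
print(without_restart)
```

Without the restart, the last SAS chunk is executed together with the first one, before R has produced its results; with it, the order becomes SAS, R, SAS.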
Code chunks are processed as a whole
Code chunks are processed as a "unit", so in
general one cannot split a call to proc xxxx over
several chunks:
Thus the following is illegal
… one exception in SAS: IML
Odds and ends – Maple
Differentiate y= sin(x) xx
x
Output is ugly, but it reads:
Odds and ends – calling the shell
Want to list all StatWeave / OpenOffice source
files: *-swv.odt
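The talk does this from a shell chunk. For comparison, the same listing can be sketched in Python with the standard library; the function name list_sources is made up for this example.

```python
from pathlib import Path


def list_sources(folder="."):
    """Return the names of all *-swv.odt files in `folder`, sorted."""
    return sorted(p.name for p in Path(folder).glob("*-swv.odt"))


print(list_sources())
```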
Summary
Reproducible statistical analyses
Integrate text, code and results in one document
Several text formats
Several languages
This talk and the examples are available at
http://genetics.agrsci.dk/~sorenh/misc/
All credit is due to Russ Lenth, the creator of
StatWeave. Thanks!!!!