Literate programming
with SAS
- and other languages
Søren Højsgaard
Faculty of Agricultural Sciences
Aarhus University
Denmark
SASforum, May 2009, Copenhagen
Take-home message
 Literate programming: Combining text, code and
results in one document
 Supports text formats:
 LaTeX / OpenOffice (OpenDocument Text)
 In combination with the ’engines’
 SAS, R, S-plus, Maple, Stata, …
 Ensures reproducibility of analysis
 Great help in ”recalling what I did 2 months ago”
 StatWeave does all this – and is free…
 This talk: Focus on StatWeave with OpenOffice
and SAS/R …
Overview – Combining code,
documentation and results
Source document
 Writing
 SAS statements
 More writing
 R statements
 Even more writing
 More SAS statements
 More writing…
Final document
 Writing
 SAS statements
 SAS output
 SAS graphics
 More writing
 R statements
 R output
 Even more writing
 SAS statements
 SAS output
 More writing…
Hello StatWeave World…
What is literate programming
 Knuth (1984) coined the term literate
programming:
 Create software as works of literature:
 Embed source code into descriptive text (rather than the
opposite which is common practice)
 Software should follow flow of thoughts and logic
 Should be designed to be readable by humans (and not
only by compilers / programs).
 Very useful idea in statistics…
Why literate programming?
 Reproducible statistical analysis
 Research, consulting
 Document exactly what has been done
 Possible to re-run if data change
 Manuals, course notes etc.
 Shown output guaranteed to be result of shown code
Some systems for literate programming
 Comments inside code
 WEB (Knuth 1984) and friends
 Sweave (Leisch 2002)
 R code in LaTeX documents
 odfWeave (Kuhn and Coulter 2007)
 R code in OpenOffice documents
 SASweave (Lenth and Højsgaard 2007)
 SAS / R code in LaTeX documents
 StatWeave
 SAS / R / Maple / S-plus / Stata … code in LaTeX and
OpenOffice documents
StatWeave
 StatWeave created by Russ Lenth, University of Iowa, USA
 Available: http://www.cs.uiowa.edu/~rlenth/StatWeave/
 StatWeave is still under development, but is becoming ”mature” and
stable.
 StatWeave design goals
 Support many languages
 R, S-plus, SAS, Stata, Maple, …
 Support different word processing systems, currently
 LaTeX
 OpenDocument Text (ODT) www.openoffice.org
 Portability: Usable on all platforms (written in Java)
 Extensible:
 Add other languages
Under the hood of StatWeave
 Source file is regular text document but with
code chunks added (with special tags)
 Two basic operations
 Weaving:
Process source file into single document with code
listings, output listings, graphs…
 Tangling:
Extract code from source file to run later
 Weaving is useful for reproducible statistical
analysis
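The two operations can be illustrated with a minimal Python sketch. The `<<code>>` / `<<end>>` markers and the `run` callable are assumptions made up for this illustration; they are not StatWeave's actual tag syntax or engine interface.

```python
import re

# Hypothetical chunk markers, chosen only for this sketch;
# they are not StatWeave's actual tag syntax.
CHUNK = re.compile(r"<<code>>\n(.*?)<<end>>\n?", re.DOTALL)


def tangle(source: str) -> str:
    """Tangling: return only the code chunks, in document order."""
    return "".join(CHUNK.findall(source))


def weave(source: str, run) -> str:
    """Weaving: replace each chunk by its listing followed by its output.

    `run` is any callable mapping chunk text to output text; a real
    weaver would hand the chunk to an engine such as SAS or R.
    """
    def expand(match):
        code = match.group(1)
        return code + run(code)

    return CHUNK.sub(expand, source)


source = (
    "Some writing\n"
    "<<code>>\n"
    "proc print data=chicks;\n"
    "run;\n"
    "<<end>>\n"
    "More writing\n"
)

print(tangle(source))
# The lambda is a stand-in engine that only reports the chunk length.
print(weave(source, lambda code: "[output of %d lines]\n" % code.count("\n")))
```

Tangling keeps only the code, while weaving keeps the surrounding text and places each chunk's output directly after its listing.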
Running StatWeave
 Command-line interface:
statweave SAS-HelloWorld-swv.odt
statweave --tangle SAS-HelloWorld-swv.odt
statweave --keepall SAS-HelloWorld-swv.odt
 Graphical User Interface:
 Generally, source xxx-swv.odt becomes output
xxx.odt
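The naming rule can be written down as a small Python helper. The function name `woven_name` is hypothetical, and the sketch only covers the xxx-swv pattern stated above.

```python
from pathlib import Path


def woven_name(source: str) -> str:
    """Map 'xxx-swv.odt' to 'xxx.odt', following the naming rule above."""
    path = Path(source)
    stem = path.stem
    if not stem.endswith("-swv"):
        raise ValueError(f"{source!r} does not follow the xxx-swv naming pattern")
    # Drop the '-swv' marker and keep the original file extension.
    return str(path.with_name(stem[: -len("-swv")] + path.suffix))


print(woven_name("SAS-HelloWorld-swv.odt"))
```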
Chicken weight data
 Set global options (for SAS code)
 Inline evaluation of expressions
… chicken weight data
 Output can be saved for later use
 - and display
Code reuse and argument substitution
 Save code chunks for later execution
 Pass arguments to code chunks
 Simplest case: Not unlike a macro…
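The macro analogy can be sketched in Python with the standard `string.Template` class. The chunk name `summary`, the `$dataset` / `$variable` placeholders, and the data set and variable names are illustrative assumptions, not StatWeave's own notation.

```python
from string import Template

# A saved chunk with placeholders. The chunk name, the placeholder
# syntax and the SAS code are assumptions made for this sketch.
saved_chunks = {
    "summary": Template("proc means data=$dataset;\n  var $variable;\nrun;\n"),
}


def reuse(name: str, **arguments: str) -> str:
    """Return the saved chunk with the given arguments substituted in."""
    return saved_chunks[name].substitute(**arguments)


print(reuse("summary", dataset="chicks", variable="weight"))
```

The same saved chunk can then be reused with different arguments, much like calling a macro with different parameters.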
…code reuse and argument substitution
 Customize display and
output (tables) with a
reusable code chunk
Multi-language example: SAS, R and DOS
together
 Can use different engines in the same source file
 Use SAS when appropriate; use R when appropriate; use
Maple when appropriate…
 Weaving:
 SAS/R/XX chunks assembled into separate code files.
 Code files are processed in order of first appearance in
the source file
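The assembly step can be pictured with a short Python sketch. The function `assemble` and the example chunks are hypothetical; the sketch shows only the grouping and ordering rule, not how StatWeave implements it.

```python
def assemble(chunks):
    """Group (engine, code) pairs into one code file per engine.

    Dicts preserve insertion order (Python 3.7+), so the engines come
    out in order of first appearance in the source.
    """
    files = {}
    for engine, code in chunks:
        files.setdefault(engine, []).append(code)
    return {engine: "\n".join(parts) for engine, parts in files.items()}


# Made-up chunks in document order: SAS, then R, then SAS again.
chunks = [
    ("SAS", "proc print data=a; run;"),
    ("R", "summary(b)"),
    ("SAS", "proc means data=a; run;"),
]

files = assemble(chunks)
print(list(files))   # engine order
print(files["SAS"])  # both SAS chunks end up in the same code file
```

Because all chunks of one engine are collected before that engine runs, a chunk cannot see results that another engine produces later in the processing order.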
…Multi-language example: SAS, R and
DOS together
 Synchronization issue: SAS chunk depends on data from R chunk
which depends on data from SAS chunk…
 Solution: The restart option will restart the engines
Code chunks are processed as a whole
 Code chunks are processed as a ”unit”, so in
general one cannot split a call to proc xxxx over
several chunks:
 Thus the following is illegal
… one exception in SAS: IML
Odds and ends – Maple
 Differentiate y= sin(x) xx
x
 Output is ugly, but it reads:
Odds and ends – calling the shell
 Want to list all StatWeave / OpenOffice source
files: *-swv.odt
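Outside StatWeave, the same listing can be done with a few lines of Python. The function name `statweave_sources` is hypothetical; only the `*-swv.odt` pattern comes from the slide.

```python
from pathlib import Path


def statweave_sources(directory="."):
    """Return the names of all *-swv.odt files in `directory`, sorted."""
    return sorted(p.name for p in Path(directory).glob("*-swv.odt"))


# Lists the StatWeave / OpenOffice source files in the current directory.
for name in statweave_sources():
    print(name)
```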
Summary
 Reproducible statistical analyses
 Integrate text, code and results in one document
 Several text formats
 Several languages
 This talk and the examples are available at
http://genetics.agrsci.dk/~sorenh/misc/
 All credit is due to Russ Lenth, the creator of
StatWeave. Thanks!!!!