Transcript Document

R: Statistics? Programme? And Who Are You?
-- An ABC introduction to R
For Fudan University
Presented by
Guohui Ding
R&D, SIBS, CAS
8 Sept, 2004
Main Topics Today
• What is R?
• How to administer R?
• How does R work?
• How to apply R to statistical problems?
• How to program your own R functions?
• ………
What is R?
A brief history of R
The legend of R
• R started in the early 1990’s as a project by Ross Ihaka
and Robert Gentleman at the University of Auckland,
New Zealand, intended to provide a statistical
environment in their teaching lab. The lab had
Macintosh computers, for which no suitable commercial
environment was available.
Ross Ihaka Robert Gentleman
R’s Parents(1)
• The S language
– S: an interactive environment for data analysis developed at Bell
Laboratories since 1976
– Exclusively licensed by AT&T/Lucent to Insightful Corporation, Seattle
WA. Product name: “S-plus”.
“My father is S, my mother is Scheme, but why is my name “R”?”
You can learn more from:
http://cm.bell-labs.com/cm/ms/departments/sia/S/history.html
R’s Parents(2)
• The Scheme language
Scheme is a statically scoped and properly tail-recursive dialect of the Lisp
programming language invented by Guy Lewis Steele Jr. and Gerald Jay
Sussman.
Learn more: http://swiss.csail.mit.edu/projects/scheme/
• Scheme’s underlying semantics + S’s syntax = R
“We have named our language R – in part to acknowledge the
influence of S and in part to celebrate our own efforts.”
-- Ihaka R. & Gentleman R., 1996
R Now
• Since mid-1997 there has been a core
group who can modify the R source
code CVS archive.
• The R package system: CRAN (the Comprehensive R Archive Network)
http://www.r-project.org
The characteristics of R
• R is “GNU S”: a language and environment for data
manipulation, calculation and graphical display.
– That is, R is Free Software (or open-source software). (Here,
Free refers to freedom, not price, although R is free in that
sense as well.)
• The core of R is an interpreted computer language.
– A mosaic of procedure-based programming and object-oriented
programming
– Good interface to procedures written in C, C++, FORTRAN and
other languages
– A flexible data exchange mechanism for accessing
relational databases: ODBC, PostgreSQL, MySQL and so on.
-- A negotiation between a thief and a robber
R and Statistics
• Most packages deal with statistics and data analysis.
• Powerful statistical graphics.
• Exchanges data well with other statistical software.
• Most R users are statistical experts. You can learn more
modern analysis methods from them by email.
• You can do it yourself when you come across a problem that nobody
has solved before.
Install and administrate R
Focus on Windows(MS)
How do I get R?
• The informational web site http://www.r-project.org/
• CRAN - the Comprehensive R Archive Network.
– The primary site is http://cran.r-project.org/. Mirror sites are available
for many countries.
– CRAN sites have binary distributions for Windows 95, 98, ME, NT4,
2000 and XP on Intel, for the Macintosh (System 8.6 to 9.1 and MacOS
X), and for several Linux distributions.
Download it!
• New releases occur frequently
– about every 3 months.
Be prepared to re-install frequently.
• You can also get it from your friends, teachers, etc.
It is about 20.6M in size.
Using Precompiled Binary Distributions
Installing R
• Double-click “rw1091.exe”. That is all: it installs like any
other standard Windows (MS) software.
R Console/RGui in Windows(MS)
(Screenshot of the RGui window, labelled: Menu; Icons; Command box; Graphics box)
Several concepts in Administrating R
• Workspace
– xxx.RData
• History
– xxx.Rhistory
• Package
• Object
• Session
• Console
(Screenshot labels: Run your R codes; Load/save workspace; Load/save History; Change your working directory)
Add a new package
• Commands:
– library(): attach an installed package from the library
– detach(package:xxx): detach a package
• All of this can be done in the GUI (except detach())
(Screenshot labels: Load a local package; Install packages from internet or local; Update the local package from internet)
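A minimal sketch of the two commands above. It assumes the "tools" package, which ships with R, as a stand-in for any installed package; the variable names are only illustrative.

```r
# Attach a package so that its functions can be called by name.
library(tools)
attached_after_library <- "package:tools" %in% search()

# file_ext() comes from 'tools'; it is usable because the package is attached.
ext <- file_ext("results.csv")

# Detach the package again; it disappears from the search path.
detach("package:tools")
attached_after_detach <- "package:tools" %in% search()

print(c(attached_after_library, attached_after_detach))
```

A package has to be installed before library() can attach it; that step is what the GUI menu items (or install.packages()) take care of.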
Packages in R Environment
• Basic packages
– "package:methods" "package:stats"
"package:graphics" "package:utils"
"package:base"
• Recommended packages
– grid; lattice; e1071…
• Contributed packages (more than 366 packages nowadays)
– ……
You can see which packages are currently
loaded with the command search().
Don’t lose your way!
• Three useful system commands
– getwd(): Get Working Directory
– setwd(): Set Working Directory
– list.files(): List the Files in a Directory/Folder
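The three commands can be tried together as below. This is only a sketch: the folder "r_intro_demo" and the file "notes.txt" are made-up names, created inside R's temporary directory so that nothing in your own folders is touched.

```r
# Remember where we are so we can come back afterwards.
old_wd <- getwd()

# Create a scratch folder (hypothetical name) and move into it.
work_dir <- file.path(tempdir(), "r_intro_demo")
dir.create(work_dir, showWarnings = FALSE)
setwd(work_dir)

# Create an empty file and list what the folder now contains.
file.create("notes.txt")
files_here <- list.files()
print(files_here)

# Return to the original working directory.
setwd(old_wd)
```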
Show the Demonstrations of the Packages/Functions
• Commands
– demo(): Demonstrations of R Functionality
– example(): Run an Examples Section from the Online Help
Getting Help
• Several commands
– help.start()
– help() or ?
– help.search()
– apropos()
• Internet searching
– I like it very much. It seems omnipotent.
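A short sketch of the help commands listed above. The search pattern and the topics (median, mean, "regression") are arbitrary choices for illustration; the calls that open help pages are wrapped in interactive() so the script also runs in batch mode.

```r
# apropos() returns the names of visible objects matching a pattern.
read_functions <- apropos("^read\\.")
print(read_functions)

# The calls below open help pages or a browser, so only run them
# when working at the console.
if (interactive()) {
  help(median)               # same as ?median
  help.search("regression")  # search the installed documentation
  example(mean)              # run the Examples section of ?mean
}
```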
Quit R
• Command
– q(): Terminate an R Session
How does R work?
Basic R Structure and data manipulation
Basic R working flow(Object orientation)
package
-- R for Beginners. Emmanuel Paradis
Object orientation
• Object: a collection of atomic variables and/or other
objects that belong together
• Parlance:
– class: the “abstract” definition of it
– object: a concrete instance
– method: another word for ‘function’
– slot: a component of an object
Types of Data in R
• The basic data object is a vector of elements of type:
– numeric numbers - either floating point or integer
– character each element is a character string
– logical each element is TRUE or FALSE
– list elements can be any type of object, including other lists
• Components of the S language, such as functions, are also
vectors.
• Any vector can include the missing data marker NA as an
element.
• All vectors have a length and a mode. The functions length
and mode return this information as does the str function.
• A structure consists of a data object plus additional
information. Matrices (or arrays, in general) and time series
are examples of structures.
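The points above can be checked at the console. A small sketch with made-up values:

```r
# A numeric vector containing the missing-data marker NA.
x <- c(1.5, NA, 3)
length(x)   # number of elements
mode(x)     # storage mode of the elements
is.na(x)    # which elements are missing
str(x)      # compact summary of length, type and contents

# The other basic modes.
mode(c("a", "b"))      # character
mode(c(TRUE, FALSE))   # logical
mode(list(1, "a"))     # list

# A matrix is a structure: a vector plus a 'dim' attribute.
m <- matrix(1:6, nrow = 2)
attributes(m)
```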
Operators
Vectors, Matrices and Arrays
• Command:
– array(data = NA, dim = length(data), dimnames = NULL)
– matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
dimnames = NULL)
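A minimal sketch of both commands, with arbitrary data. Note that matrix() fills column by column unless byrow = TRUE is given.

```r
# A 2 x 3 matrix filled row by row: first row 1 2 3, second row 4 5 6.
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
dim(m)
m[2, 3]      # element in row 2, column 3 -> 6

# A three-dimensional array of 2 x 3 x 4 = 24 elements.
a <- array(1:24, dim = c(2, 3, 4))
dim(a)
a[2, 3, 4]   # the last element -> 24
```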
Lists
• List vs. Vector
– list: an ordered collection of data of arbitrary types.
– vector: an ordered collection of data of the same type.
– Typically, vector elements are accessed by their index (an integer),
list elements by their name (a character string). But both types
support both access methods.
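A sketch of both access methods on a list and on a vector. The names and values are invented for illustration.

```r
# A list can mix types: a string, a numeric vector and a logical.
sample_info <- list(id = "S01", values = c(2.5, 3.1), passed = TRUE)
sample_info$id            # access by name
sample_info[["values"]]   # access by name, as a string
sample_info[[3]]          # access by index

# A vector holds one type only, but its elements can be named too.
weights <- c(apple = 0.2, pear = 0.3)
weights[2]                # access by index
weights["pear"]           # access by name
```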
Factors
• Factors: classification variables
• If the levels of a factor are numeric (e.g. the treatments
are labelled “1”, “2”, and “3”) it is important to ensure
that the data are actually stored as a factor and not as
numeric data. Always check this by using summary.
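The difference shows up immediately in summary(). A sketch with an invented treatment column:

```r
# Treatment labels entered as numbers.
treatment <- c(1, 2, 3, 1, 2, 3)
summary(treatment)         # numeric summary: min, quartiles, mean, max

# The same data stored as a factor.
treatment_f <- factor(treatment)
summary(treatment_f)       # counts per level
levels(treatment_f)
```

If summary() reports a mean, the variable is still numeric and would be treated as a continuous quantity in a model.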
Data frames
• data frame: represents the typical data table
that researchers come up with, like a spreadsheet.
– It is a rectangular table with rows and columns; data
within each column has the same type (e.g. number,
text, logical), but different columns may have different
types. (Internally it is a list of columns.)
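A small sketch of a data frame with three columns of different types. The column names and values are made up.

```r
patients <- data.frame(
  id     = 1:3,
  name   = c("Li", "Wang", "Zhang"),
  smoker = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep 'name' as character, not factor
)

dim(patients)            # rows and columns
sapply(patients, class)  # one type per column
is.list(patients)        # a data frame is a list of columns
patients$name            # a single column, accessed like a list element
```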
Subsetting
Individual elements of a vector, matrix, array or data frame are
accessed with “[ ]” by specifying their index, or their name
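Some typical uses of “[ ]”, as a sketch with invented data:

```r
# Vector: by index, by name, or by a logical condition.
v <- c(a = 10, b = 20, c = 30)
v[2]
v["c"]
v[v > 15]

# Matrix: row and column, by index or by dimnames.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("x", "y", "z")))
m[1, 2]        # row 1, column 2 -> 3
m["r2", "z"]   # -> 6

# Data frame: the same [row, column] pattern.
scores <- data.frame(name = c("a", "b", "c"), score = c(55, 70, 90))
scores[2, "score"]            # -> 70
passed <- scores[scores$score > 60, ]
passed
```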
Using R on Windows(MS)
Basic statistical analysis by R
Data Input
• From the keyboard, one by one
– c( ); scan( )
• From a file
– read.table(); read.csv(); read.csv2();
read.dta(); read.spss(); …
• By a spreadsheet
– data.entry()
– edit()
– fix()
– ……
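A sketch of keyboard-style input with c() and of file input with read.csv(). The file is a made-up three-row CSV written to a temporary location so that the example is self-contained.

```r
# Entering values directly.
heights_typed <- c(1.62, 1.75, 1.80)

# Write a small CSV file (hypothetical content) to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height", "1,1.62", "2,1.75", "3,1.80"), csv_path)

# Read it back as a data frame; the first line becomes the column names.
heights <- read.csv(csv_path)
str(heights)
```

read.table() works the same way for general delimited text; read.csv() is a variant with comma-separated defaults.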
Data Edit
• Commands
– edit()
– fix()
Tip: edit() can open a
Notepad-like editor in the RGui!
Data Description
• Commands
– summary()
– mean()
– sd()
– hist()
– boxplot()
– ……
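A sketch of the numerical summaries on a small invented sample. hist(x) and boxplot(x) take the same vector and draw to the graphics window, so they are left out here.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

summary(x)   # minimum, quartiles, median, mean, maximum
mean(x)      # arithmetic mean -> 5
sd(x)        # sample standard deviation (divides by n - 1)
```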
Probability Distribution
Four useful prefixes for probability distribution functions
• dxxx for the density
• pxxx for the CDF
• qxxx for the quantile function
• rxxx for simulation (random deviates)
Repeated rxxx calls give different values!
The seed is set by the system.
You can set the seed yourself with set.seed().
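For the normal distribution, xxx is "norm". A sketch of the four prefixes and of the effect of set.seed(); the seed value 42 is arbitrary.

```r
dnorm(0)        # density at 0, equal to 1 / sqrt(2 * pi)
pnorm(1.96)     # CDF: P(X <= 1.96)
qnorm(0.975)    # quantile: the value with 97.5% of the mass below it

# Random deviates differ from call to call ...
first_draw  <- rnorm(3)
second_draw <- rnorm(3)

# ... unless the seed is fixed before each call.
set.seed(42)
draw_a <- rnorm(3)
set.seed(42)
draw_b <- rnorm(3)
identical(draw_a, draw_b)
```

The same pattern applies to other distributions, for example dbinom/pbinom/qbinom/rbinom or dt/pt/qt/rt.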
Statistical Inference
• Commands
– qxxx() for the quantile function
– t.test()
– wilcox.test() (stats)
– kruskal.test() (stats)
– var.test(); shapiro.test(); qqnorm(); qqline()
– ……
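A sketch of a two-sample comparison using some of the commands above. The data are simulated, so the group means, sample sizes and seed are all assumptions made for the example.

```r
set.seed(1)
control <- rnorm(20, mean = 10, sd = 2)
treated <- rnorm(20, mean = 12, sd = 2)

# Check normality and equality of variances first.
shapiro.test(control)
var.test(treated, control)

# Welch two-sample t test (the default) and its rank-based alternative.
t_result <- t.test(treated, control)
w_result <- wilcox.test(treated, control)

t_result$p.value
t_result$conf.int   # confidence interval for the difference in means
```

Each test returns an object of class "htest"; printing it shows the full report, and components such as p.value can be extracted with $.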
Analysis of variance and Regression Analysis
• Commands
– anova()
– lm()
– ……
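A sketch of a simple linear regression with lm() followed by anova(). It uses the "cars" data set that ships with R (speed and stopping distance); the choice of data set is only for illustration.

```r
# Fit stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)

coef(fit)      # intercept and slope
summary(fit)   # estimates, standard errors, R-squared

# Analysis-of-variance table for the fitted model.
aov_table <- anova(fit)
aov_table
```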
Experiment Design
• Commands
– sample()
– power.t.test()
– ……
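A sketch of both commands: random allocation of subjects with sample(), and a sample-size calculation with power.t.test(). The subject labels, the seed, and the effect size (a difference of one standard deviation) are assumptions for the example.

```r
set.seed(2004)

# Randomly pick 5 of 10 subjects for the treatment group.
subjects <- paste0("S", 1:10)
treated  <- sample(subjects, 5)
control  <- setdiff(subjects, treated)

# Sample size per group needed to detect a difference of 1 sd
# with 80% power at the 5% significance level.
design <- power.t.test(delta = 1, sd = 1, power = 0.8, sig.level = 0.05)
ceiling(design$n)
```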
Save Object/Data
• Every R object can be stored into and restored from a file
with the commands “save” and “load”.
> save(x, file="x.Rdata")
> load("x.Rdata")
• Importing and exporting data with rectangular tables in the
form of tab-delimited text files.
> write.table(x, file="x.txt", sep="\t")
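A round-trip sketch of both approaches. It writes to temporary files rather than "x.Rdata" and "x.txt" so that nothing is left behind; the data frame is invented.

```r
x <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))

# Binary R format: save, remove the object, then restore it by name.
rdata_path <- tempfile(fileext = ".RData")
save(x, file = rdata_path)
rm(x)
load(rdata_path)   # 'x' exists again

# Tab-delimited text: export, then import into a new object.
txt_path <- tempfile(fileext = ".txt")
write.table(x, file = txt_path, sep = "\t", row.names = FALSE)
y <- read.table(txt_path, header = TRUE, sep = "\t")
y
```

save() and load() keep the object exactly as it was; text files are the better choice when the data must be opened in other programs.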
Graphics with R
A Friendly R Environment -- Rcmdr
If you don’t like a command
line environment, package
Rcmdr may be a good choice!
R programming (.R)
Program your own R code
Control Flow
• if(cond) expr
• if(cond) cons.expr else alt.expr
• for(var in seq) expr
• while(cond) expr
• repeat expr
• break
• next
Loops
• The main loop construct in R is for.
The commonest use, as in C and other
languages, is to count from 1 to n.
– for (i in 1:n) {
## do something
}
Leaving loops
• The break and next commands
allow the flow of a loop to be altered
– break jumps out of the loop
– next jumps to the next iteration of the
loop
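A small sketch combining for, if, next and break: it adds up the odd numbers from 1 to 10 but stops once the counter passes 7. The limits are arbitrary.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # even number: skip to the next iteration
  if (i > 7) break        # past 7: leave the loop entirely
  total <- total + i
}
total   # 1 + 3 + 5 + 7 = 16
```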
Avoiding Iteration
• The canonical bad R program looks like this
  ## multiply two vectors
  d <- numeric(n)   # d must exist before d[i] can be assigned
  for (i in 1:n) {
    d[i] <- a[i] * b[i]
  }
  ## compute the inner product
  s <- 0
  for (i in 1:n) {
    s <- s + d[i]
  }
• The right way to do this is
– s <- sum(a * b)
• apply(); lapply(); sapply()
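A sketch showing that the loop and the vectorised form agree, plus one call each of the apply family. The vectors and the matrix are made-up examples.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Loop version of the inner product.
s_loop <- 0
for (i in seq_along(a)) {
  s_loop <- s_loop + a[i] * b[i]
}

# Vectorised version: one line, no explicit loop.
s_vec <- sum(a * b)   # 1*4 + 2*5 + 3*6 = 32

# apply(): a function over the rows (1) or columns (2) of a matrix.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)                   # 9 and 12

# lapply() returns a list, sapply() simplifies to a vector.
squares_list <- lapply(1:3, function(k) k^2)
squares_vec  <- sapply(1:3, function(k) k^2)   # 1 4 9
```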
Write R function
A function definition looks like
median <- function(x, na.rm = FALSE)
{
  …lots of code...
  ## a return value
}
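To make the template concrete, here is a sketch of a complete function with the same arguments. my_median is a hypothetical name chosen so that it does not replace R's own median(), and the body is a simple illustration, not the code R actually uses.

```r
my_median <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]          # drop missing values on request
  } else if (any(is.na(x))) {
    return(NA_real_)           # otherwise a missing value gives NA
  }
  n <- length(x)
  if (n == 0) {
    return(NA_real_)
  }
  sorted <- sort(x)
  mid <- (n + 1) %/% 2
  if (n %% 2 == 1) {
    sorted[mid]                       # odd length: the middle element
  } else {
    mean(sorted[c(mid, mid + 1)])     # even length: mean of the two middle ones
  }
}

my_median(c(3, 1, 2))                 # 2
my_median(c(4, 1, 3, 2))              # 2.5
my_median(c(1, NA, 3), na.rm = TRUE)  # 2
```

The value of the last evaluated expression is what the function returns; return() is only needed to leave early.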
More
• Packages
• Objects and methods
• Debugging and optimisation
• Connecting to other packages
• Interfaces to other programming languages or databases
R++? ++R!
Some Resources
• A Course (The ppt is showed with R Development Core Group)
– http://faculty.washington.edu/tlumley/Rcourse/
• A Paper (citing R in a publication)
– Ihaka R. & Gentleman R. 1996. R: a language for data analysis and graphics.
Journal of Computational and Graphical Statistics 5: 299–314.
• Two URLs
– http://www.r-project.org
– http://www.ats.ucla.edu/stat/
• Several Books
– Using R for Data Analysis and Graphics: An Introduction. J.H. Maindonald
– An Introduction to R. The R Development Core Team
– simpleR: Using R for Introductory Statistics. John Verzani
– R for Beginners. Emmanuel Paradis
– The R Reference Manual Base Package. The R Development Core Team
Acknowledgements
PhD. Qi Liu
Prof. Gang Pei
Prof. Yixue Li
Prof. Naiqing Zhao
Everyone Here
Any Questions?