Smoking and Lung Cancer
Essential R
Xuhua Xia
[email protected]
http://dambe.bio.uottawa.ca
R language
• Help
• Basic utilities
• Data types and data structures
• File I/O
• Data manipulation/transformation
• Statistical analysis
• Graphic functions
• Installation of additional packages
R Help
• Command to get help
– help(funName), help(package=packageName), …
– help.search(), ??...
– vignette()
– args(funName)
– RSiteSearch(sTopic)
• R-related web sites:
– http://rseek.org
– http://stackoverflow.com
– http://stats.stackexchange.com
Utility command
• getwd()
• setwd(sSubDir), e.g., setwd("c:/users/xxia")
• save.image()
• history(), history(100), history(Inf)
• x <- .Last.value
• search()
• library(), library(MASS), detach(package:MASS)
• install.packages(), e.g., nortest, outliers
• head(x), tail(x)
• data()
• attach(dataName)
• chooseCRANmirror()
• Sys.getenv("R_HOME")
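A few of the utility commands above can be tried together in a short session. This is only a sketch: tempdir() and the built-in mtcars data set are stand-ins chosen for illustration, and the MASS step is skipped if that package is not installed.

```r
# Remember the current working directory so it can be restored afterwards
oldDir <- getwd()
setwd(tempdir())        # tempdir() is only a safe example location
setwd(oldDir)           # go back to where we started

# Attach a package, check the search path, then detach it again
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)
  print("package:MASS" %in% search())   # TRUE while MASS is attached
  detach(package:MASS)
}

# Look at the first and last rows of a built-in data set
firstRows <- head(mtcars, 3)
lastRows <- tail(mtcars, 3)
print(firstRows)

# Location of the R installation
rHome <- Sys.getenv("R_HOME")
print(rHome)
```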
Data types and data structures
• Data type and data structures
– Integer, Numeric, Logical/Boolean (TRUE, FALSE)
– Factor
– Character
– Sequence, e.g., 1:9, 2:(n-1), seq(from=1, to=100, by=2) or seq(from=1, to=100, length.out=10)
– Array (vector):
• myVec <- c(1,1,2,3,4,4,5,6), or c(TRUE, FALSE, FALSE, TRUE)
• myVec <- c("my name","your name")
• myVec[2], myVec[1:4], myVec[c(1,3,4)], myVec[-3], myVec[-(1:3)]
• v[v > mean(v)], v[!is.na(v) & !is.null(v)]
• names(v) <- c("Ottawa", "Toronto", "Kingston")
– Matrix: a vector with dimensions, A <- 1:6, dim(A) <- c(2,3), matrix(vec,2,3)
– List, Data frame:
• mydf <- data.frame()
• mydf <- edit(mydf)
• mydf <- data.frame(label=c("Low", "Mid", "Hi"), lb=c(1,2,3), ub=c(4,5,6))
• R commands to check data types and data structures
– class(x)
– mode(x)
• Data manipulation is better done with EXCEL or the like.
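The sequence, indexing, matrix and data frame idioms listed on this slide can be checked with a short script. This is a minimal sketch; the values assigned to v are hypothetical and only there to carry the city names.

```r
# Sequences: fixed step size versus fixed number of points
s1 <- seq(from = 1, to = 100, by = 2)            # 1, 3, 5, ..., 99
s2 <- seq(from = 1, to = 100, length.out = 10)   # 10 evenly spaced values

# Vector indexing
myVec <- c(1, 1, 2, 3, 4, 4, 5, 6)
myVec[2]                                 # second element
myVec[-(1:3)]                            # drop the first three elements
aboveMean <- myVec[myVec > mean(myVec)]  # elements larger than the mean

# Named vector (values are hypothetical)
v <- c(10, 20, 30)
names(v) <- c("Ottawa", "Toronto", "Kingston")
v["Toronto"]

# Matrix: a vector with a dim attribute, filled column by column
A <- 1:6
dim(A) <- c(2, 3)

# Data frame, then check its class and mode
mydf <- data.frame(label = c("Low", "Mid", "Hi"),
                   lb = c(1, 2, 3), ub = c(4, 5, 6))
class(mydf)   # "data.frame"
mode(mydf)    # "list"
```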
File I/O
• Input
– scan(), e.g., scan(sFileName,skip=3,comment.char="#")
– read.table(sFileName,header=TRUE), read.fwf(sFileName,widths=c(3,5,…)), read.csv()
– readLines(), e.g., readLines(sFileName,NumLines), readLines(sFileName,-1)
– load("myData.RData")
• Output
– writeLines(): writeLines(sText, sFile, sep="\n")
– write.table(): write.table(DFr, sFileName)
– cat(sText1,sText2, …, file="filename")
– sink("filename") … sink()
– print()
– save(myData, file="myData.RData")
• Related
– cat, print
– paste0()
– ls(), rm(), rm(list=ls(all=TRUE))
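A round trip through several of these input and output functions might look like the sketch below. The file names come from tempfile() and the data frame contents are made up for illustration; neither is from the original slides.

```r
# Write a small data frame to a text file and read it back
sFileName <- tempfile(fileext = ".txt")          # hypothetical scratch file
DFr <- data.frame(label = c("Low", "Mid", "Hi"),
                  lb = c(1, 2, 3), ub = c(4, 5, 6))
write.table(DFr, sFileName, row.names = FALSE)
DFr2 <- read.table(sFileName, header = TRUE)

# Read the same file as raw text lines (-1 means "all lines")
rawLines <- readLines(sFileName, -1)

# Write plain text, then scan the numbers while skipping the comment line
sTextFile <- tempfile(fileext = ".txt")
writeLines(c("# a comment line", "1 2 3", "4 5 6"), sTextFile, sep = "\n")
nums <- scan(sTextFile, comment.char = "#", quiet = TRUE)

# Save an object in R's binary format, remove it, and load it again
sRData <- tempfile(fileext = ".RData")
myData <- DFr
save(myData, file = sRData)
rm(myData)
load(sRData)                                     # restores myData

cat(paste0("Rows read back: ", nrow(DFr2)), "\n")
```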
Descriptive statistics
• mean(vector, na.rm=TRUE), median, sd, var, SE, CV, skewness, kurtosis, 95%CL, …
• Graphic:
– hist(x,n)
– plot(density(x))
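Base R provides mean, median, sd and var directly, but SE, CV, skewness, kurtosis and confidence limits have to be computed or taken from a package. The sketch below uses common textbook definitions, which are an assumption here: a t-based 95% interval and moment-based skewness and excess kurtosis. The sample x is hypothetical and contains no missing values.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)     # hypothetical sample without NAs

n  <- length(x)
m  <- mean(x, na.rm = TRUE)        # arithmetic mean
s  <- sd(x, na.rm = TRUE)          # sample standard deviation (n - 1 denominator)
se <- s / sqrt(n)                  # standard error of the mean
cv <- s / m                        # coefficient of variation

# 95% confidence limits for the mean, based on the t distribution
ci95 <- m + c(-1, 1) * qt(0.975, df = n - 1) * se

# Moment-based skewness and excess kurtosis (one of several definitions in use)
z    <- (x - m) / sqrt(mean((x - m)^2))
skew <- mean(z^3)
kurt <- mean(z^4) - 3

print(c(mean = m, median = median(x), sd = s, var = var(x), SE = se, CV = cv))
print(ci95)
print(c(skewness = skew, kurtosis = kurt))
```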
Distribution
• normal: rnorm, dnorm, pnorm, qnorm
• t: rt, dt, pt, qt
• ad.test(y) in nortest package.
• grubbs.test(y) in outliers package
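The r/d/p/q prefixes stand for random draws, density, cumulative probability and quantile. A brief sketch of how they relate follows; the sample y is simulated for illustration, and the two tests run only if the nortest and outliers packages happen to be installed.

```r
set.seed(1)
y <- rnorm(100, mean = 10, sd = 2)   # 100 random draws from N(10, 2)

d0  <- dnorm(0)        # density of the standard normal at 0, i.e. 1/sqrt(2*pi)
p   <- pnorm(1.96)     # P(Z <= 1.96), about 0.975
q   <- qnorm(0.975)    # inverse of pnorm, about 1.96
tq  <- qt(0.975, df = 10)   # t quantile with 10 degrees of freedom, about 2.228
tp  <- pt(tq, df = 10)      # back to 0.975

# Normality and outlier tests from add-on packages, if available
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::ad.test(y))         # Anderson-Darling normality test
}
if (requireNamespace("outliers", quietly = TRUE)) {
  print(outliers::grubbs.test(y))    # Grubbs test for a single outlier
}
```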
Graphics
• par(mfrow=c(2,2)): set a canvas for four graphs
• plot(x,y,xlab="",ylab="",type="l"): type="l" draws a line; the default, type="p", gives a scatterplot
• histogram:
– hist(inVec,xlab=""): y is frequency
– hist(inVec,xlab="",freq=FALSE): y is density instead of frequency
– curve(dnorm(x,mean(inVec),sd(inVec)),add=TRUE,col="blue"): overlay the expected normal curve on the density histogram
• qqnorm(y),qqline(y,col="blue")
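The graphics calls on this slide can be combined into one four-panel figure. In this sketch inVec is simulated data and the figure is written to a temporary PDF so the script also runs non-interactively; both choices are assumptions made for illustration. Note that curve() expects an expression in x, so the data enter only through the mean and sd arguments.

```r
set.seed(42)
inVec <- rnorm(200, mean = 50, sd = 10)   # hypothetical data

outFile <- tempfile(fileext = ".pdf")     # hypothetical output file
pdf(outFile)
par(mfrow = c(2, 2))                      # canvas for four graphs

# Panel 1: line plot instead of the default scatterplot
plot(seq_along(inVec), inVec, xlab = "Index", ylab = "Value", type = "l")

# Panel 2: histogram with frequency on the y axis
hist(inVec, xlab = "Value", main = "Frequency")

# Panel 3: histogram with density on the y axis plus the expected normal curve
hist(inVec, xlab = "Value", freq = FALSE, main = "Density")
curve(dnorm(x, mean = mean(inVec), sd = sd(inVec)), add = TRUE, col = "blue")

# Panel 4: normal quantile-quantile plot with reference line
qqnorm(inVec)
qqline(inVec, col = "blue")

dev.off()
```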