R Basics
Xudong Zou, Prof. Yundong Wu, Dr. Zhiqiang Ye
18th Dec. 2013
Outline
• History of R language
• How to use R
• Data type and data structure
• Data input
• R programming
• Summary
• Case study
History of R language
R was created by Ross Ihaka and Robert Gentleman.
2013-09-25: R version 3.0.2 released.
Number of R packages: 5088
What is R?
• R is a programming language and an environment for statistical analysis and graphics.
Why use R?
• R is free and open source. It currently has 5088 packages, which make R a powerful tool for financial analysis, bioinformatics, social network analysis, natural language processing, and more.
• A growing number of scientists are learning and using R.
# Bioconductor: bioinformatics analysis (e.g. microarrays)
# survival: survival analysis
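Packages such as these are loaded with library(). A minimal sketch, assuming a standard R installation in which survival ships as a "recommended" package; install.packages() is left commented out because it needs internet access, and Hmisc is only an example name.

```r
# survival is a "recommended" package bundled with standard R builds;
# requireNamespace() checks that it is installed without raising an error
if (requireNamespace("survival", quietly = TRUE)) {
  library(survival)   # attach it so its functions (e.g. Surv) can be called directly
}

# Packages that are not installed yet are downloaded from CRAN once
# (requires an internet connection), for example:
# install.packages("Hmisc")

# library() works the same way for packages shipped with base R
library(tools)

# Attached packages appear on the search path as "package:<name>"
print(search())
```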
How to use R
• Console: type commands here.
• Create or open an R script here.
• Use ? to get help (e.g. ?mean).
• Click here to add R packages.
Data type and Data structure
Data types in R:
• numeric: integer and double (R has no separate single-precision type; non-integer numbers are stored as double-precision floats)
• character
• complex
• logical
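typeof() reports how a value is stored, which makes these types easy to check. The values below are arbitrary examples.

```r
x_dbl <- 3.14      # numeric, stored as double
x_int <- 5L        # the L suffix makes an integer
x_chr <- "ACGT"    # character
x_cpx <- 2 + 3i    # complex
x_lgl <- TRUE      # logical

types <- sapply(list(x_dbl, x_int, x_chr, x_cpx, x_lgl), typeof)
print(types)   # "double" "integer" "character" "complex" "logical"

# Both integers and doubles count as numeric
print(c(is.numeric(x_dbl), is.numeric(x_int)))   # TRUE TRUE
```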
Data structures in R:
Object     | Classes allowed                                            | Mixed classes permitted?
Vector     | numeric, character, complex, logical                       | no
Factor     | numeric, character                                         | no
Array      | numeric, character, complex, logical                       | no
Matrix     | numeric, character, complex, logical                       | no
Data frame | numeric, character, complex, logical                       | yes
List       | numeric, character, complex, logical, function, expression, … | yes
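The "mixed classes permitted?" column can be seen directly: a vector coerces mixed input to a single class, while a list (and a data frame, column by column) keeps each component's class. The values are arbitrary examples.

```r
# A vector holds one class only: mixing values coerces them all
v <- c(1, "a", TRUE)
print(v)          # "1" "a" "TRUE"
print(class(v))   # "character"

# A list keeps each component's own class
l <- list(1, "a", TRUE)
l_classes <- sapply(l, class)
print(l_classes)  # "numeric" "character" "logical"

# A data frame mixes classes across columns (each column has one class);
# stringsAsFactors = FALSE keeps text as character in any R version
df <- data.frame(id = 1:2, name = c("x", "y"), stringsAsFactors = FALSE)
df_classes <- sapply(df, class)
print(df_classes) # id "integer", name "character"
```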
Vector and vector operation
A vector is the simplest data structure in R: a single entity containing an ordered collection of numbers, character strings, complex numbers or logical values.
Note the leftward arrow <-, which is R's assignment operator.
# Create two vectors:
# Check the attributes:
# Basic operations on a vector:
> max(vec1)
> min(vec1)
> mean(vec1)
> median(vec1)
> sum(vec1)
> summary(vec1)
> vec1
> vec1[1]               # first element (indexing starts at 1)
> x <- vec1[-1]; x      # all elements except the first
> vec1[7] <- 15; vec1   # set element 7; the vector grows, padded with NA if needed
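A runnable version of these operations needs concrete data. The values of vec1 and vec2 below are hypothetical (the slide's own vectors were not preserved), and the sketch also shows element-wise arithmetic between two vectors.

```r
# Hypothetical example values
vec1 <- c(2, 8, 5, 11, 3)
vec2 <- seq(1, 9, by = 2)        # 1 3 5 7 9

# Check the attributes
print(length(vec1))              # 5
print(class(vec1))               # "numeric"

# Arithmetic is element-wise: equal-length vectors combine position by position
vec_sum    <- vec1 + vec2        # 3 11 10 18 12
vec_scaled <- vec1 * 2           # 4 16 10 22 6

# Basic summary statistics
vec_stats <- c(max = max(vec1), min = min(vec1), mean = mean(vec1),
               median = median(vec1), sum = sum(vec1))
print(vec_stats)                 # max 11, min 2, mean 5.8, median 5, sum 29

# Indexing
first <- vec1[1]                 # 2
rest  <- vec1[-1]                # 8 5 11 3 (a negative index drops an element)
vec1[7] <- 15                    # position 6 did not exist, so it becomes NA
print(vec1)                      # 2 8 5 11 3 NA 15
```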
array and matrix
An array can be considered as a multiply subscripted collection of data entries.
> x <- 1:24
> dim(x) <- c(4, 6)      # create a 2D array with 4 rows and 6 columns
> dim(x) <- c(2, 3, 4)   # create a 3D array
array()
> x <- 1:24
> array(data = x, dim = c(4, 6))
> array(x, dim = c(2, 3, 4))
array indexing
> x <- 1:24
> y <- array(data = x, dim = c(2, 3, 4))
> y[1, 1, 1]     # a single element
> y[, , 2]       # the second 2 x 3 layer
> y[, , 1:2]     # the first two layers
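To make the fill order concrete, a short sketch showing that array() and dim<- build the same object and that R fills entries column by column (the first subscript varies fastest). The names m, x2 and layer2 are only illustrative.

```r
x <- 1:24

# array() and dim<- give the same object; values fill column by column
m <- array(data = x, dim = c(4, 6))
x2 <- x
dim(x2) <- c(4, 6)
same <- identical(m, x2)      # TRUE
print(m[1, 2])                # 5: column 2 starts after the 4 values of column 1

y <- array(x, dim = c(2, 3, 4))
print(y[1, 1, 1])             # 1
layer2 <- y[, , 2]            # 2 x 3 matrix holding 7:12
print(layer2)
print(dim(y[, , 1:2]))        # 2 3 2
```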
A matrix is an array with exactly two dimensions.
> class(potentials)      # "matrix" (R >= 4.0 prints "matrix" "array")
> dim(potentials)        # 20 20
> rownames(potentials)   # GLY ALA SER …
> colnames(potentials)   # GLY ALA SER …
> min(potentials)        # -4.4
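The same calls can be tried on a small stand-in matrix. The 3 x 3 values and the name pot below are invented for illustration (they are not the case-study potentials); the sketch also shows how which(..., arr.ind = TRUE) locates the minimum and how entries can be looked up by row and column name.

```r
# Invented 3 x 3 symmetric matrix labelled with residue names
res <- c("GLY", "ALA", "SER")
pot <- matrix(c(-1.2,  0.3, -0.5,
                 0.3, -2.0,  0.8,
                -0.5,  0.8, -0.1),
              nrow = 3, byrow = TRUE,
              dimnames = list(res, res))   # row names, column names

print(class(pot))       # "matrix" "array" in R >= 4.0
print(dim(pot))         # 3 3
print(rownames(pot))    # "GLY" "ALA" "SER"
print(min(pot))         # -2

# arr.ind = TRUE returns the row/column position instead of a flat index
idx <- which(pot == min(pot), arr.ind = TRUE)
print(idx)

# Entries can be looked up by name
print(pot["ALA", "SER"])   # 0.8
```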
list
A list is an object whose components are other objects, such as a numeric vector, a logical value, a character string or another list. The components of a list need not be of the same type; they can be mixed.
> Lst <- list(drugName = "warfarin", no.target = 3, price = 500,
+             symb.target = c("geneA", "geneB", "geneC"))
> length(Lst)          # 4
> attributes(Lst)
> names(Lst)
> Lst[[1]]
> Lst[["drugName"]]
> Lst$drugName
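One point worth seeing in practice is the difference between [[ ]] and [ ]. A minimal sketch reusing Lst; the indication component is a hypothetical extra field added only for illustration.

```r
Lst <- list(drugName = "warfarin", no.target = 3, price = 500,
            symb.target = c("geneA", "geneB", "geneC"))

# [[ ]] (or $) extracts the component itself ...
name1 <- Lst[[1]]
print(class(name1))          # "character"
# ... while [ ] returns a sub-list that still wraps the component
sub1 <- Lst[1]
print(class(sub1))           # "list"

# Index into a component's own elements
second_target <- Lst$symb.target[2]   # "geneB"

# Components can be added by name (hypothetical extra field)
Lst$indication <- "anticoagulant"
print(names(Lst))            # "drugName" "no.target" "price" "symb.target" "indication"
```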
Data Frame
A data frame is a list with some restrictions:
① The components must be vectors, factors, numeric matrices, lists or other data frames.
② Numeric vectors, logicals and factors are included as is. By default, character vectors are coerced to factors whose levels are the unique values appearing in the vector (this was the default up to R 3.6; since R 4.0.0 they are kept as character unless stringsAsFactors = TRUE).
③ Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.
> names(cars)
[1] "speed" "dist"
> length(cars)          # 2 (number of columns)
> cars[[1]]
> cars$speed            # recommended
> attach(cars)          # puts the columns on the search path, so speed and dist can be used directly
> detach(cars)          # undoes attach()
> summary(cars$dist)    # anything that works on a vector works on a column
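A sketch of common data-frame operations on the built-in cars data set (speed in mph, stopping distance dist in feet). with() is shown as an alternative to attach()/detach() that leaves the search path unchanged; the dist_m column name is hypothetical.

```r
df <- cars                              # built-in data set: 50 rows, 2 columns
print(dim(df))                          # 50 2

# Select rows with a logical condition and columns by name
fast <- df[df$speed > 20, c("speed", "dist")]

# Add a derived column: stopping distance converted from feet to metres
df$dist_m <- df$dist * 0.3048

# with() evaluates an expression using the columns directly, without attach()
mean_dist <- with(df, mean(dist))
print(mean_dist)
```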
Data Input
scan(file, what = double(), sep = "", …)
# scan() returns a vector whose type matches what (double by default)
read.table(file, header = FALSE, sep = "", row.names, col.names, …)
# read.table() returns a data.frame; sep = "" means any run of whitespace
# my_data.frame <- read.table("MULTIPOT_lu.txt", row.names = 1, header = TRUE)
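Since MULTIPOT_lu.txt is not available here, the sketch below writes a tiny made-up table to a temporary file and reads it back with read.table() (using the same header = TRUE, row.names = 1 pattern) and with scan(). The file contents and variable names are hypothetical.

```r
# Write a tiny whitespace-separated table to a temporary file (made-up values)
tmp <- tempfile(fileext = ".txt")
writeLines(c("res  GLY  ALA",
             "GLY -1.2  0.3",
             "ALA  0.3 -2.0"), tmp)

# header = TRUE: the first line holds column names;
# row.names = 1: the first column holds the row names
pot <- read.table(tmp, header = TRUE, row.names = 1)
pot_mat <- as.matrix(pot)      # convert the numeric data frame to a matrix
print(pot_mat)

# scan() returns a flat vector; what = character() because each row
# starts with a name, and skip = 1 skips the header line
tokens <- scan(tmp, what = character(), skip = 1, quiet = TRUE)
print(tokens)   # "GLY" "-1.2" "0.3" "ALA" "0.3" "-2.0"

# With the default what = double(), scan() parses numbers
nums <- scan(text = "1 2 3.5", quiet = TRUE)

unlink(tmp)     # remove the temporary file
```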
From other software
# from SPSS and SAS
library(Hmisc)    # load the package
mydata <- spss.get("test.file", use.value.labels = TRUE)
mydata <- sasxport.get("test.file")
# from Stata and SYSTAT
library(foreign)
mydata <- read.dta("test.file")
mydata <- read.systat("test.file")
# from Excel (RODBC's Excel driver is available on Windows only)
library(RODBC)
channel <- odbcConnectExcel("D:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)
Operators
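The content of the operators slide did not survive extraction. As a reminder rather than a reconstruction of that slide, here are some of R's common arithmetic, comparison, logical and membership operators, applied to arbitrary values.

```r
a <- 7                  # <- assigns a value to a name
b <- 2

arith <- c(a + b, a - b, a * b, a / b, a^b)
print(arith)            # 9 5 14 3.5 49
rem  <- a %% b          # 1 (remainder)
quot <- a %/% b         # 3 (integer division)

cmp <- c(a == b, a != b, a > b, a <= b)
print(cmp)              # FALSE TRUE TRUE FALSE

# & and | work element-wise; ! negates
logic <- c((a > 5) & (b > 5), (a > 5) | (b > 5), !(a > 5))
print(logic)            # FALSE TRUE FALSE

# %in% tests membership
found <- "geneB" %in% c("geneA", "geneB", "geneC")   # TRUE
```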
R Programming
Control Statements
# repeat {…}
# switch(EXPR, …)
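Only the repeat and switch lines survive from the control-statements slide. A minimal sketch of R's control statements (for, while, repeat with break, if/else and switch) follows; the amino-acid grouping in aa_size() is hypothetical and only illustrates switch() fall-through and its default branch.

```r
# for: sum of squares of 1..5
total <- 0
for (i in 1:5) {
  total <- total + i^2
}
print(total)            # 55

# while: smallest n such that 2^n exceeds 1000
n <- 0
while (2^n <= 1000) {
  n <- n + 1
}
print(n)                # 10

# repeat: loops until break is reached
k <- 0
repeat {
  k <- k + 3
  if (k > 10) break
}
print(k)                # 12

# if / else is an expression and returns a value
parity <- if (total %% 2 == 0) "even" else "odd"   # "odd"

# switch: an empty branch (GLY = ) falls through to the next value;
# the final unnamed argument is the default
aa_size <- function(res) {
  switch(res,
         GLY = ,
         ALA = "small",
         SER = "polar",
         "other")
}
print(aa_size("GLY"))   # "small"
print(aa_size("TRP"))   # "other"
```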
Function
Definition:
Example:
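matrix.axes <- function(data) {
  # Label the axes of image(data): image() places the rows and columns
  # at evenly spaced positions from 0 to 1, so compute the same positions
  x <- (seq_len(nrow(data)) - 1) / (nrow(data) - 1)
  axis(side = 1, at = x, labels = rownames(data), las = 2)  # row names on the x axis
  y <- (seq_len(ncol(data)) - 1) / (ncol(data) - 1)
  axis(side = 2, at = y, labels = colnames(data), las = 2)  # column names on the y axis
}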
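In general a function is defined as name <- function(arguments) { body }, and the value of the last evaluated expression is returned. A minimal sketch of a second, hypothetical helper, rescale01(), which also shows a default argument; the commented lines suggest how matrix.axes() might be paired with image(), assuming a numeric matrix such as potentials has been loaded.

```r
# Hypothetical helper: rescale a numeric vector linearly to [0, 1]
rescale01 <- function(v, na.rm = TRUE) {
  rng <- range(v, na.rm = na.rm)        # c(min, max), ignoring NA by default
  (v - rng[1]) / (rng[2] - rng[1])      # last expression is the return value
}

r1 <- rescale01(c(2, 4, 6))
print(r1)                               # 0.0 0.5 1.0
r2 <- rescale01(c(1, NA, 3))            # NA stays NA: 0 NA 1

# Possible use of matrix.axes(), assuming a numeric matrix
# `potentials` is loaded (not run here):
# image(potentials, axes = FALSE)
# matrix.axes(potentials)
```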
Summary
• Data type and data structure: numeric, character, complex, logical; vector, array/matrix, list, data frame
• Data input: scan, read.table; loading from other software: SPSS, SAS, Excel
• Operators: <-
• R programming: control statements, functions
Case study
Residue-based protein-protein interaction potential analysis:
Lu et al. (2003). Development of Unified Statistical Potentials Describing Protein-Protein Interactions. Biophysical Journal 84(3): 1895-1901.