Design and Data Analysis in Cancer Research


R Programming Language
林建甫 (C.F. Jeff Lin, MD. PhD.)
Assistant Professor, Department of Statistics, Taipei University
Biostatistics Consultant, Taipei Veterans General Hospital
PhD in Biostatistics, University of Michigan, USA
2015/7/20
R as an Object-Oriented Language
R Basics
• Naming and assigning objects
• Types of variables
• Missing values
• Data input and assignment
• Functions
• Workspace
• History
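Object Naming Convention
• A name must start with a letter (A-Z or a-z)
• The rest may contain letters, digits, and periods "."
• Names are case-sensitive
  – mydata and MyData are different objects
• The slide advises against the underscore "_": very old versions of R rejected it in names, although current versions accept it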
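The naming rules can be checked at the console. The sketch below assumes a current version of R, where the underscore is accepted in names; the candidate names are made up for illustration.

```r
# Names are case-sensitive: these are two different objects.
mydata <- 1
MyData <- 2
same_object <- identical(mydata, MyData)

# make.names() turns arbitrary strings into syntactically valid names.
candidates <- c("xyz.vector", "my_data", "2nd.try")   # illustrative names
valid <- make.names(candidates)
print(valid)
unchanged <- candidates == valid   # TRUE where the name was already valid
```

Only the name that starts with a digit is altered, which matches the rule that a name must start with a letter.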
Object Assignment
<- assigns a value
= also assigns (avoid using it)
> xyz.vector <- c(1, 2, 3)
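Incomplete Commands or Programs
• If you press Return and R shows "+", the command is incomplete.
• If you cannot find the error, keep entering input, as in the example below, until R shows the ">" prompt again; pressing Esc (Ctrl-C in a terminal) also cancels the unfinished command. Then re-enter the R command or program.
> sqrt(
+
+ )))))
Error in parse(text = txt): Syntax error: No opening parenthesis, before ")" at this point:
sqrt(
))
Dumped
> sqrt(100)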
Objects
• Object names
• Object types (Type of Variables):
  – vector, factor, array, matrix, data frame, time series, list (vector, factor, array, matrix, data.frame, ts, list)
• Attributes
  – Mode: logical, integer, double precision, single precision, complex, character (numeric, character, complex, logical)
  – Length: depends on the kind of object
• Object creation: assign a value, or create an empty object
Naming and Assigning Basic-Mode Variables
> a <- 49                            # numeric
> sqrt(a)
[1] 7
> b <- "The dog ate my homework"     # character string
> sub("dog","cat",b)
[1] "The cat ate my homework"
> x <- (1+1==3)                      # logical
> x
[1] FALSE
> as.character(x)
[1] "FALSE"
Naming and Assigning Basic-Mode Variables
• Character
> a <- "1"; b <- 1
> a; b
[1] "1"
[1] 1
> a <- "character"
> b <- "a"; x <- a
> a; b; x
[1] "character"
[1] "a"
[1] "character"
• Logical
> x <- T; y <- F
> x; y
[1] TRUE
[1] FALSE
• Numerical
> a <- 5; b <- sqrt(2)
> a; b
[1] 5
[1] 1.414214
Object Assignment
"<-" is used to indicate assignment
> x<-c(1,2,3,4,5,6,7)
> x<-c(1:7)
> x<-1:4
Arithmetic Operators
Simple operations
• Add: > 10 + 20
• Multiply: > 10 * 20
• Divide: > 10/20
• Raise to a power: > 10 ^ 20 (10 ** 20 also works)
• Modulo: > 10%%20
• Integer division: > 10%/%4
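The operators above give the following results; this is a small sketch, and the variable names are only for illustration.

```r
add      <- 10 + 20     # 30
multiply <- 10 * 20     # 200
divide   <- 10 / 20     # 0.5
power    <- 10 ^ 20     # 1e+20
modulo   <- 10 %% 20    # 10: remainder of 10 divided by 20
int_div  <- 10 %/% 4    # 2: quotient with the fraction discarded

# The last two are linked: (x %/% y) * y + x %% y gives x back.
check <- (10 %/% 4) * 4 + 10 %% 4
```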
R: Logical and Relational Operators
• ==          Equal to
• !=          Not equal to
• <           Less than
• >           Greater than
• <=          Less than or equal to
• >=          Greater than or equal to
• is.na(x)    Missing?
• &           Logical AND
• |           Logical OR
• !           Logical NOT
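These operators work element by element on vectors. A short sketch with made-up values, including one missing value:

```r
x <- c(3, NA, 8, 1)                          # illustrative data

gt2      <- x > 2                            # TRUE NA TRUE FALSE
missing  <- is.na(x)                         # FALSE TRUE FALSE FALSE
in_range <- !is.na(x) & x >= 1 & x <= 3      # TRUE FALSE FALSE TRUE
either   <- x < 2 | x > 5                    # FALSE NA TRUE TRUE
```

Testing `!is.na(x)` first keeps the missing value from turning the combined condition into NA.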
Missing Values: NA, NaN, and NULL
• NA or "Not Available"
  – Applies to many modes: character, numeric, etc.
• NaN or "Not a Number"
  – Applies only to numeric modes
• NULL
  – Lists with zero length
Missing Values
• NA, "not available"
> x <- c(1, 2, 3, NA)
> x + 3
[1] 4 5 6 NA
• NaN, "not a number"
> log(c(0, 1, 2))
[1]      -Inf 0.0000000 0.6931472
> 0/0
[1] NaN
R: Missing Values
• NA or "Not Available"
  – NA is not 0
  – NA is not " " (a blank or an empty string)
  – NA is not FALSE
• A computation involving NA may or may not produce NA
> 1+NA
[1] NA
> max(c(NA, 4, 7))
[1] NA
> max(c(NA, 4, 7), na.rm=T)
[1] 7
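A few more cases show when NA propagates and when it does not. This is a minimal sketch using base R only; the variable names are illustrative.

```r
x <- c(NA, 4, 7)

n_missing <- sum(is.na(x))            # 1
total     <- sum(x, na.rm = TRUE)     # 11

# NA does not propagate when the answer is known whatever NA stands for.
known_false <- NA & FALSE             # FALSE
known_true  <- NA | TRUE              # TRUE

# NaN counts as missing for is.na(), but NA is not NaN.
nan_checks <- c(is.na(0/0), is.nan(NA))   # TRUE FALSE

# NULL has length zero.
null_len <- length(NULL)              # 0
```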
R: Common Object Types for Statistics
Types of Objects
• Vector
• Matrix
• Array
• List
• Factor
• Time series
• Data frame
• Function
> typeof(x) returns the internal type of the object x
Mode (Object Structure)
• Mode
  – Atomic mode: logical, numeric, complex, or character
  – Recursive mode: list, graphics, function, expression, call, ...
> mode(x) shows the mode of the object x
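The difference between atomic and recursive modes can be seen with a few base R calls. A small sketch; the objects are made up for illustration.

```r
n <- 49
s <- "The dog ate my homework"
l <- list(name = "john", age = 28)

modes <- c(mode(n), mode(s), mode(l), mode(sqrt))
print(modes)   # "numeric" "character" "list" "function"

types <- c(typeof(n), typeof(1:3), typeof(s))
print(types)   # "double" "integer" "character"

atomic_flags    <- c(is.atomic(n), is.atomic(l))         # TRUE FALSE
recursive_flags <- c(is.recursive(n), is.recursive(l))   # FALSE TRUE
```

mode() groups integer and double together as "numeric", while typeof() reports the storage type.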
Object Length
• Length
  – vector: number of elements
  – matrix, array: product of dimensions
  – list: number of components
  – data frame: number of columns
Object Attributes
• Attributes
  – row names
  – column names
  – dimensions
> row.names()
> names()
> str()
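Object Class
• Class
  – An object's class makes programming in R more convenient.
  – The class tells R what is special about an object, so that R operates on it with special methods; for example, an object whose class is data frame is printed in a special format.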
Vector
• A vector is a sequence of elements that all have the same "mode".
• There are 6 basic modes: logical, integer, double, single, complex, and character.
• A vector corresponds roughly to a one-dimensional array in other languages.
• In R, a single value (scalar) can also be regarded as a vector of length 1.
Vector
> a <- c(1,2,3)
> a*2
[1] 2 4 6
The most common way to create a vector is the function c(), which combines several numbers or strings into one vector.
Vector Operations
• Arithmetic operators include +, -, *, /, ^, %%, %/%, %*%, %o%, %x%, etc.
• They usually act as unary or binary operators applied to every element of a vector, in the same way as ordinary arithmetic.
• A vector has a "length" but no "dimension".
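A short sketch of how these operators behave on vectors, using made-up values. Note that %*% and %o% are not element-wise: they return the matrix product and the outer product.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

neg         <- -a                        # unary minus: -1 -2 -3
elementwise <- a * b                     # 10 40 90
recycled    <- c(1, 2, 3, 4) + c(10, 20) # shorter vector is reused: 11 22 13 24
inner       <- a %*% b                   # 1 x 1 matrix holding 140
outer_ab    <- a %o% b                   # 3 x 3 outer product
no_dim      <- is.null(dim(a))           # TRUE: a plain vector has no dimension
```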
Vectors
> Mydata <- c(2,3.5,-0.2)              # numeric
> Colors <- c("Red","Green","Red")     # character
> x1 <- 25:30                          # numeric sequence
> x1
[1] 25 26 27 28 29 30
> Colors[2]                            # a single element
[1] "Green"
> x1[3:5]                              # several elements
[1] 27 28 29
Vectors
> x <- c(5.2, 1.7, 6.3)
> log(x)
[1] 1.6486586 0.5306283 1.8405496
> y <- 1:5
> z <- seq(1, 1.4, by = 0.1)
> y + z
[1] 2.0 3.1 4.2 5.3 6.4
> length(y)
[1] 5
> mean(y + z)
[1] 4.2
Vector Operations
> Mydata
[1]  2.0  3.5 -0.2
> Mydata > 0                 # logical test
[1]  TRUE  TRUE FALSE
> Mydata[Mydata>0]           # extract elements
[1] 2.0 3.5
> Mydata[-c(1,3)]            # drop elements
[1] 3.5
Vector Operations
> x <- c(5,-2,3,-7)
> y <- c(1,2,3,4)*10         # operates on every element
> y
[1] 10 20 30 40
> sort(x)                    # sorted values
[1] -7 -2  3  5
> order(x)                   # positions of the elements in sorted order
[1] 4 2 3 1
> y[order(x)]                # y rearranged by the order of x
[1] 40 20 30 10
> rev(x)                     # reversed
[1] -7  3 -2  5
c() & rev()
> c(1,3,5,7)
[1] 1 3 5 7
> rev(c(1,3,5,7))
[1] 7 5 3 1
length(), mode() & names()
> x<-c(1,3,5,7)
> length(x)
[1] 4
> mode(x)
[1] "numeric"
> names(x)
NULL
seq()
• seq() generates a sequence of numbers
> seq(1:5)
[1] 1 2 3 4 5
> seq(5,1,by=-1)
[1] 5 4 3 2 1
> seq(5)
[1] 1 2 3 4 5
seq()
> 1.1:5
[1] 1.1 2.1 3.1 4.1
> 4:-5
[1] 4 3 2 1 0 -1 -2 -3 -4 -5
> seq(-1,2,0.5)
[1] -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
> seq(1,by=0.5,length=5)
[1] 1.0 1.5 2.0 2.5 3.0
rep()
• rep() replicates elements
> rep(1,5)
[1] 1 1 1 1 1
> rep(1:2,3)
[1] 1 2 1 2 1 2
> rep(1:2,each=3)
[1] 1 1 1 2 2 2
> rep(1:2,each=3,len=4)
[1] 1 1 1 2
> rep(1:2,each=3,len=7)
[1] 1 1 1 2 2 2 1
> rep(1:2,each=3,time=2)
[1] 1 1 1 2 2 2 1 1 1 2 2 2
sort() & rank()
> x<-c(8,6,9,7)
> sort(x)
[1] 6 7 8 9
> rank(x)
[1] 3 1 4 2
> rank(x)[1]
[1] 3
> x[rank(x) == 1]
[1] 6
> x[rank(x)]
[1] 9 8 7 6
rank() & order()
> x<-c(8,6,9,7)
> order(x)
[1] 2 4 1 3
> rank(x)
[1] 3 1 4 2
> x[order(x)]
[1] 6 7 8 9
> x[rank(x)]
[1] 9 8 7 6
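The two functions answer different questions: order() says where the sorted values come from, while rank() says where each value ends up. A minimal sketch with the same x, assuming no ties:

```r
x <- c(8, 6, 9, 7)

ord <- order(x)    # 2 4 1 3: x[2] is smallest, then x[4], x[1], x[3]
rnk <- rank(x)     # 3 1 4 2: x[1] is 3rd smallest, x[2] is smallest, ...

sorted <- x[ord]   # 6 7 8 9, the same as sort(x)

# Without ties, rank() is the inverse permutation of order().
inverse_ok <- all(order(ord) == rnk)

# Indexing the sorted values by rank restores the original order.
restored <- sort(x)[rnk]
```

This is why x[order(x)] sorts the vector but x[rank(x)] generally does not.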
Matrix
• A matrix is a 2-dimensional data object whose elements all have the same mode
• It can be created with matrix()
> x<-matrix(data=0,nrow=2,ncol=2)
> x<-matrix(0,2,2)
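Vector and Matrix Subscripts
> x <- c("a", "b", "c", "d", "e", "f", "g", "h")
> x[1]                            # first element
> x[3:5]                          # elements 3 to 5
> x[-(3:5)]                       # all but elements 3 to 5
> x[c(T, F, T, F, T, F, T, F)]    # elements where the index is TRUE
> x[x <= "d"]                     # elements that sort at or before "d"
> m[,2]                           # column 2 of a matrix m
> m[3,]                           # row 3 of a matrix m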
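The matrix subscripts can be tried on a concrete matrix. The sketch below assumes a 4 x 3 matrix m filled by row, which is not defined on the slide itself.

```r
m <- matrix(1:12, nrow = 4, byrow = TRUE)   # assumed example matrix

col2  <- m[, 2]         # 2 5 8 11
row3  <- m[3, ]         # 7 8 9
cell  <- m[3, 2]        # 8
block <- m[1:2, 2:3]    # 2 x 2 sub-matrix

# drop = FALSE keeps the result a matrix instead of a plain vector.
keep_matrix <- m[, 2, drop = FALSE]

# A logical condition on one column selects whole rows.
big_rows <- m[m[, 1] > 4, ]
```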
Generate a Matrix
> xmat<-matrix(1:12,nrow=3,byrow=T)
> xmat
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> length(xmat)
[1] 12
> dim(xmat)
[1] 3 4
> mode(xmat)
[1] "numeric"
> names(xmat)
NULL
> dimnames(xmat)
NULL
Generate a Matrix
> matrix(0,3,3)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
Generate a Matrix
> dimnames(xmat)<-list(c("A","B","C"), c("W","X","Y","Z"))
> dimnames(xmat)
[[1]]
[1] "A" "B" "C"
[[2]]
[1] "W" "X" "Y" "Z"
> xmat
  W  X  Y  Z
A 1  2  3  4
B 5  6  7  8
C 9 10 11 12
Diagonal Element of a Matrix
> m <- matrix(1:12, 4, byrow = T)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
> diag(m)
[1] 1 5 9
Diagonal Element of a Matrix
> k <- 3
> diag(k)          # for a single number k, diag() returns the k x k identity matrix
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
Inverse of Matrices
> m <- matrix(c(1,3,5,9,11,13,15,19,21),3,byrow=T)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    9   11   13
[3,]   15   19   21
> solve(m)
        [,1]    [,2] [,3]
[1,] -0.5000  1.0000 -0.5
[2,]  0.1875 -1.6875  1.0
[3,]  0.1875  0.8125 -0.5
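One way to check an inverse is to multiply it back. The sketch below uses the same matrix; the right-hand side b is a made-up vector for illustration.

```r
m <- matrix(c(1, 3, 5, 9, 11, 13, 15, 19, 21), 3, byrow = TRUE)
m_inv <- solve(m)

# A matrix times its inverse is the identity, up to rounding error.
is_identity <- isTRUE(all.equal(m %*% m_inv, diag(3)))

# solve(a, b) solves a %*% x = b directly, without forming the inverse.
b <- c(1, 2, 3)   # illustrative right-hand side
x <- solve(m, b)
residual_ok <- isTRUE(all.equal(as.vector(m %*% x), b))
```

all.equal() is used instead of == because floating-point results are rarely exact.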
rbind() & cbind()
> x<-c(1,2,3)
> y<-matrix(0,3,3)
> rbind(y,x)
  [,1] [,2] [,3]
     0    0    0
     0    0    0
     0    0    0
x    1    2    3
> cbind(y,x)
           x
[1,] 0 0 0 1
[2,] 0 0 0 2
[3,] 0 0 0 3
Multiplication
> x<-matrix(1:4,2,byrow=T)
> y<-matrix(1:4,2,byrow=T)
> x*y              # element-wise product
     [,1] [,2]
[1,]    1    4
[2,]    9   16
> x%*%y            # matrix product
     [,1] [,2]
[1,]    7   10
[2,]   15   22
> x%o%x            # outer product: a 2 x 2 x 2 x 2 array
, , 1, 1
     [,1] [,2]
[1,]    1    2
[2,]    3    4
, , 2, 1
     [,1] [,2]
[1,]    3    6
[2,]    9   12
, , 1, 2
     [,1] [,2]
[1,]    2    4
[2,]    6    8
, , 2, 2
     [,1] [,2]
[1,]    4    8
[2,]   12   16
Array
Arrays generalize matrices by extending dim() to more than two dimensions.
> xarr<-array(c(1:8,11:18,111:118),dim=c(2,4,3)) # row, col, array
> xarr
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
, , 2
     [,1] [,2] [,3] [,4]
[1,]   11   13   15   17
[2,]   12   14   16   18
, , 3
     [,1] [,2] [,3] [,4]
[1,]  111  113  115  117
[2,]  112  114  116  118
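Elements of an array are reached with one subscript per dimension. A brief sketch using the same xarr:

```r
xarr <- array(c(1:8, 11:18, 111:118), dim = c(2, 4, 3))

one_cell  <- xarr[2, 3, 1]   # row 2, column 3 of the first layer: 6
last_cell <- xarr[1, 4, 3]   # row 1, column 4 of the third layer: 117
slice2    <- xarr[, , 2]     # the whole second layer, a 2 x 4 matrix
n_total   <- length(xarr)    # 24 = 2 * 4 * 3, the product of the dimensions
```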
Lists
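Lists
• A list is a special "vector" whose elements are themselves objects.
• A list is therefore an ordered collection of data objects.
• The "elements" of a list are called "components"; each component is an object in its own right.
• The components are ordered (ordered sequence).
• There is no restriction on the mode of the components: each component may have a different atomic mode.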
Lists
> doe <- list(name="john",age=28,married=F)
> doe$name
[1] "john"
> doe$age
[1] 28
> doe[[3]]
[1] FALSE
Components of a list are extracted with "$"
Lists
vector: a sequence of elements that all have the same "mode".
> a = c(7,5,1)
> a[2]
[1] 5
list: a special "vector" whose elements are objects.
> doe = list(name="john",age=28,married=F)
> doe$name
[1] "john"
> doe$age
[1] 28
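Besides "$", components can be reached with double and single brackets, which behave differently. A small sketch on the same list; the added component is a made-up value.

```r
doe <- list(name = "john", age = 28, married = FALSE)

by_name     <- doe$name            # "john"
by_position <- doe[[2]]            # 28, the component itself
by_string   <- doe[["married"]]    # FALSE
sub_list    <- doe[2]              # single brackets return a list of length 1

doe$children <- c("ann", "bob")    # add a component (hypothetical values)
n_components <- length(doe)        # 4
```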
Lists
> my.list <- list(c(5,4,-1),c("X1","X2","X3"))
> my.list
[[1]]
[1]  5  4 -1
[[2]]
[1] "X1" "X2" "X3"
> my.list[[1]]
[1]  5  4 -1
> my.list <- list(c1=c(5,4,-1),c2=c("X1","X2","X3"))
> my.list$c2[2:3]
[1] "X2" "X3"
Lists
> x.mat
     [,1] [,2]
[1,]    3   -1
[2,]    2    0
[3,]   -3    6
> dimnames(x.mat) <- list(c("L1","L2","L3"), c("R1","R2"))
> x.mat
   R1 R2
L1  3 -1
L2  2  0
L3 -3  6
Lists, Factors and Data Frames
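Factors and factor()
• Factors provide an efficient way to handle categorical data.
• They represent the discrete variables of statistics: nominal variables and ordinal variables.
• A factor is a special kind of character vector in which each element takes one of a set of discrete values.
• A factor object has a special attribute, levels, which holds the full set of these discrete values.
• A factor can be created simply with the function factor().
• A factor is entered as character strings; once it has been set as a factor, R prints it without the double quotes ".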
Factors and factor()
> gender<-c("male","female","male","male","female","female")
> gender
[1] "male"   "female" "male"   "male"   "female" "female"
> factor(gender)
[1] male   female male   male   female female
Levels: female male
factor() and levels()
> intensity <- factor(c("Hi","Med","Lo","Hi","Lo","Med","Lo","Hi","Med"))
> intensity
[1] Hi Med Lo Hi Lo Med Lo Hi Med
Levels: Hi Lo Med
factor() and levels()
> intensity <- factor(c("Hi","Med","Lo","Hi","Lo","Med","Lo","Hi","Med"),
    levels=c("Hi","Med","Lo"))
> intensity
[1] Hi Med Lo Hi Lo Med Lo Hi Med
Levels: Hi Med Lo
factor() and levels()
> intensity <- factor(c("Hi","Med","Lo","Hi","Lo","Med","Lo","Hi","Med"),
    levels=c("Hi","Med","Lo"),
    labels=c("HiDOse","MedDOse","LoDose"))
> intensity
[1] HiDOse MedDOse LoDose HiDOse LoDose MedDOse LoDose HiDOse MedDOse
Levels: HiDOse MedDOse LoDose
factor(), ordered() and levels()
> intensity <- ordered(c("Hi","Med","Lo","Hi","Lo","Med","Lo","Hi","Med"))
> intensity
[1] Hi Med Lo Hi Lo Med Lo Hi Med
Levels: Hi < Lo < Med
Oops! This is not what you want: without levels=, the levels are ordered alphabetically.
factor(), ordered() and levels()
> intensity <- ordered(c("Hi","Med","Lo","Hi","Lo","Med","Lo","Hi","Med"),
    levels=c("Lo","Med","Hi"))
> intensity
[1] Hi Med Lo Hi Lo Med Lo Hi Med
Levels: Lo < Med < Hi
Ordinal Variable!
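Once the levels are in the intended order, comparisons and summaries follow that order. A minimal sketch using the same data:

```r
intensity <- ordered(c("Hi", "Med", "Lo", "Hi", "Lo", "Med", "Lo", "Hi", "Med"),
                     levels = c("Lo", "Med", "Hi"))

codes        <- as.integer(intensity)       # 3 2 1 3 1 2 1 3 2
at_least_med <- intensity >= "Med"          # comparisons follow the level order
counts       <- table(intensity)            # three of each, listed as Lo, Med, Hi
highest      <- as.character(max(intensity))
```

With an unordered factor, the comparison `>=` would only produce NA and a warning, which is one practical reason to use ordered() for ordinal variables.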
Lists, Factors and Data Frames
Data Frames
data frame: represents a spreadsheet.
Rectangular table with rows and columns; data
within each column has the same type (e.g.
number, text, logical), but different columns may
have different types.
...
Data Frames
# R data ToothGrowth
# The Effect of Vit. C on Tooth Growth in Guinea Pigs
> ToothGrowth
    len supp dose
1   4.2   VC  0.5
2  11.5   VC  0.5
3   7.3   VC  0.5
4   5.8   VC  0.5
…………………
58 27.3   OJ  2.0
59 29.4   OJ  2.0
60 23.0   OJ  2.0
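Instead of printing all 60 rows, the structure of the data frame can be inspected directly. A short sketch using only base R and the built-in ToothGrowth data:

```r
n_rows      <- nrow(ToothGrowth)          # 60
col_names   <- names(ToothGrowth)         # "len" "supp" "dose"
col_classes <- sapply(ToothGrowth, class) # class of each column
supp_levels <- levels(ToothGrowth$supp)   # "OJ" "VC"

str(ToothGrowth)        # compact description of every column
head(ToothGrowth, 4)    # first four rows
```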
Data Frames
• A data frame is a list with class "data.frame".
• There are restrictions on lists that may be made into data frames.
  a. The components must be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames.
  b. Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.
Data Frames
  c. Numeric vectors and factors are included as is. Non-numeric vectors are coerced to factors, whose levels are the unique values appearing in the vector; this was the default before R 4.0.0, and since then character vectors stay character unless stringsAsFactors = TRUE.
  d. Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same row size.
Data Frame
• several modes allowed within a single data frame
• can be created using data.frame()
L<-LETTERS[1:4]    #A B C D
x<-1:4             #1 2 3 4
data.frame(x,L)    #create data frame
• attach() and detach()
  – the database is attached to the R search path so that the database is searched by R when it is evaluating a variable.
  – objects in the database can be accessed by simply giving their names
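The points above can be sketched as follows. The extra logical column `big` is made up for illustration, and with() is shown as an alternative to attach() that the slide does not mention.

```r
L <- LETTERS[1:4]
x <- 1:4

# Three columns with three different modes in one data frame.
df <- data.frame(x, L, big = x > 2, stringsAsFactors = FALSE)
col_classes <- sapply(df, class)   # integer, character, logical

# with() evaluates an expression inside the data frame without attach().
total <- with(df, sum(x[big]))     # 3 + 4 = 7
```

Setting stringsAsFactors explicitly makes the result the same on old and new versions of R.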
Data Elements
• select only one element
  – x[2]
• select range of elements
  – x[1:3]
• select all but one element
  – x[-3]
• slicing: including only part of the object
  – x[c(1,2,5)]
• select elements based on logical operator
  – x[x>3]
Subsetting
Individual elements of a vector, matrix, array or data frame are accessed with "[ ]" by specifying their index, or their name
> ToothGrowth[1:3,]
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
> ToothGrowth[1:2,1:2]
   len supp
1  4.2   VC
2 11.5   VC
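Rows can also be selected by name or by a logical condition. A brief sketch on the same data; subset() is a base R convenience not shown on the slide.

```r
# Select a column by name together with a row index.
first_len <- ToothGrowth[1:3, "len"]   # 4.2 11.5 7.3

# Select rows with a logical condition on two columns.
high_oj <- ToothGrowth[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2, ]
n_high_oj <- nrow(high_oj)

# subset() expresses the same selection without repeating the data frame name.
same <- subset(ToothGrowth, supp == "OJ" & dose == 2)
```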
Labels in Data Frames
> labels(ToothGrowth)
[[1]]
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
[13] "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24"
[25] "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36"
[37] "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48"
[49] "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60"
[[2]]
[1] "len" "supp" "dose"
Finding out about a data object
mode(): tells you the storage 'mode' of an object (i.e. whether it is a numeric vector, a list, etc.)
attributes(): provides information about the data object
class(): provides information about the object's class. The class of an object often determines how the data object is handled by a function.
You can also set the object's mode, attributes or class using the above functions,
e.g. mode(x) <- "numeric"
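A small sketch of querying and setting these properties; the objects x and m are made up for illustration.

```r
x <- c("1", "2", "3")
before <- mode(x)          # "character"
mode(x) <- "numeric"       # convert in place
after  <- mode(x)          # "numeric"
total  <- sum(x)           # 6

m <- matrix(1:6, nrow = 2)
attrs <- attributes(m)     # a list holding $dim

df_class <- class(ToothGrowth)   # "data.frame"
```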
What type is my data?
class                  Class from which object inherits (vector, matrix, function, logical, list, …)
mode                   Numeric, character, logical, …
storage.mode, typeof   Mode used by R to store object (double, integer, character, logical, …)
is.function            Logical (TRUE if function)
is.na                  Logical (TRUE if missing)
names                  Names associated with object
dimnames               Names for each dim of array
slotNames              Names of slots of BioC objects
attributes             Names, class, etc.
Data Import & Entry
Topics
• Datasets that come with R
• Inputting data from a file
• Writing data to a file
• Writing data to the clipboard
• Exchanging data between programs
• NB: saving the workspace
R comes with several pre-packaged datasets
You can access these datasets with the data function
data() gives you a list of all the datasets
data(Titanic) loads a dataset about passengers on the Titanic (for example)
summary(Titanic) provides some summary information about the dataset Titanic
attributes(Titanic) provides some more information
Typing the dataset name on its own (followed by Enter) will display the data
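As a quick illustration, the Titanic dataset can be loaded and inspected like this; only base R functions are used.

```r
data(Titanic)

titanic_class <- class(Titanic)            # "table"
titanic_dims  <- dim(Titanic)              # 4 2 2 2
factor_names  <- names(dimnames(Titanic))  # Class, Sex, Age, Survived
n_people      <- sum(Titanic)              # 2201 people in total

# Totals over the fourth dimension (Survived).
survived <- apply(Titanic, 4, sum)
```

Titanic is a four-way table of counts rather than a data frame, so dim() returns four numbers.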
Data
> summary(data)
> names(data)
> attributes(data)
Editing data
> fix(data) or > edit(data)
> data$var
> attach(data)     # in order to remove need of '$'
> detach(data)
Data Entry & Editing
• start editor and save changes
  – data.entry(x)
• start editor, changes not saved
  – de(x)
• start text editor
  – edit(x)
The attach and detach functions
The attach function makes all the objects in a list or
data frame accessible from outside the list or data frame.
E.g. instead of typing my_list$age to access the vector
‘age’ in the list my_list you can just type ‘age’
(provided there is no other vector called ‘age’ in the
main workspace).
The detach function undoes this.
Importing Data
• read.table()
  – reads in data from an external file
• data.entry()
  – create object first, then enter data
• c()
  – concatenate
• scan()
  – prompted data entry
• R has ODBC for connecting to other programs
Importing Data
> # Data management
> setwd("C:/temp/Rdata")
> DMTKRtable <- read.table("DMTKRcsv.csv",
    header=TRUE, row.names=NULL, sep=",", dec=".")
> DMTKRtable
Importing Data
> setwd("C:/temp/Rdata")
> DMTKRcsv <- read.csv("DMTKRcsv.csv",
    header = TRUE, sep = ",", dec=".")
> DMTKRcsv
> attach(DMTKRcsv)
> scan(file = "DMTKRcsv.csv", skip=1, sep = ",", dec = ".")
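The import calls can be tried without the original data file. The sketch below writes a small made-up CSV (hypothetical columns id, age, group) to a temporary file and reads it back both ways.

```r
# Create a small CSV file with invented contents.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,group", "1,63,A", "2,58,B", "3,71,A"), csv_path)

# read.csv() is read.table() with defaults suited to comma-separated files.
from_csv   <- read.csv(csv_path, header = TRUE, sep = ",", dec = ".")
from_table <- read.table(csv_path, header = TRUE, sep = ",", dec = ".")
same_data  <- isTRUE(all.equal(from_csv, from_table))

mean_age <- mean(from_csv$age)   # (63 + 58 + 71) / 3 = 64

unlink(csv_path)                 # remove the temporary file
```

Using tempfile() avoids depending on setwd() and on a particular directory layout.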
Loading
• Stata, SPSS, SAS files
  – library(foreign)
    • Stata: read.dta
    • SPSS: read.spss
    • SAS: read.xport (must first create export file in SAS)
• Excel files
  – Files must be saved as comma separated value or .csv
  – read.table, read.csv, read.csv2: identical except for defaults
• Watch the direction of '/'!
• > load(".Rdata")
• Loading and running R programs
  – > source(".R")
Writing data to a file (the write and write.table functions)
Change directory on the File menu, then
write(q, file = "filename", ncolumns = 2)
(for a vector, ncolumns specifies the number of columns in the output)
write.table(q, file = "filename")
(works quite well for a data frame)
As always, there are many optional arguments.
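Both functions can be tried with temporary files. In this sketch the vector q and the data frame df are made-up examples, and the files are read back to show what was written.

```r
out_vec <- tempfile()
out_df  <- tempfile()

# write() lays a vector out in rows of ncolumns values.
q <- 1:6
write(q, file = out_vec, ncolumns = 2)
vec_lines <- readLines(out_vec)          # "1 2" "3 4" "5 6"

# write.table() writes a data frame; read.table() reads it back.
df <- data.frame(x = 1:4, L = LETTERS[1:4])
write.table(df, file = out_df, row.names = FALSE)
df_back <- read.table(out_df, header = TRUE)

unlink(c(out_vec, out_df))               # remove the temporary files
```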
Exporting Data
> # Write data out
> cat("2 3 5 7", "11 13 17 19", file="ex.dat", sep="\n")
> # Read ex.dat in again
> scan(file="ex.dat", what=list(x=0, y="", z=0), flush=TRUE)
> df <- data.frame(a = I("a \" quote"))
> write.table(df)
> write.table(df, qmethod = "double")
> write.table(df, quote = FALSE, sep = ",")
Thanks !