Multiple regression model in R

Multiple Regression Model in R
盧蘇士 (Economics, 4th year)
蔡亞倫 (Land Economics, 4th year)
Contents
RStudio
Introduction to multiple regression
Related R commands
Assignment
RStudio
RStudio is an integrated development environment (IDE) for R.
It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management.
http://www.rstudio.com/products/rstudio/download/
Multiple Regression Model
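In practice, a dependent (explained) variable is often affected by more than one variable, and in that case we turn to multiple regression analysis.
A multiple regression model may be linear or nonlinear. Taking multiple linear regression as the example, the realized value of a dependent variable Y is determined by the values of several (say k) explanatory variables X1, X2, ..., Xk, and the relationship among them is linear:
$$y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + e_i, \quad i = 1, \ldots, n.$$
The main task of multiple regression analysis is to estimate the model parameters $\beta_1, \beta_2, \ldots, \beta_k$ from actual data.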
The least-squares estimator of the parameters of a linear multiple regression model is again obtained by minimizing the sample average of squared errors.
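To make the least-squares idea concrete, the sketch below minimizes the average squared error by solving the normal equations (X'X)b = X'y directly and compares the result with lm(). The data are simulated here; the names x1, x2 and the true coefficients are assumptions for illustration and do not come from the slides.

```r
# Minimal sketch: OLS by solving the normal equations (X'X) b = X'y.
# Simulated data; x1, x2 and the true coefficients are illustrative assumptions.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix with a constant column for the intercept
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# The minimizer of mean((y - X %*% b)^2) solves the normal equations
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() should return the same estimates up to floating-point error
fit_lm   <- lm(y ~ x1 + x2)
max_diff <- max(abs(drop(beta_hat) - coef(fit_lm)))
print(max_diff)
```

In everyday work lm() is the better choice because it uses a QR decomposition, which is numerically more stable than forming X'X explicitly.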
Model assumptions
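Related tests
For a single null hypothesis $H_0: \beta_j = b$, the test statistic is the t statistic $t = (\hat\beta_j - b) / \mathrm{se}(\hat\beta_j)$.
An F test can be used for a single hypothesis as well as for a joint hypothesis.
Joint test:
$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$$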
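The single-coefficient and joint tests described above can be sketched in base R as follows. The data are simulated, and the names x1, x2 and the hypothesized value b = 0.5 are assumptions chosen only for illustration.

```r
# Minimal sketch: t test for H0: beta_j = b and joint F test for all slopes = 0.
# Simulated data; variable names and b = 0.5 are illustrative assumptions.
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Single hypothesis H0: beta_x1 = b
b      <- 0.5
est    <- coef(fit)["x1"]
se     <- sqrt(diag(vcov(fit)))["x1"]
t_stat <- (est - b) / se
p_t    <- 2 * pt(-abs(t_stat), df = df.residual(fit))   # two-sided p-value

# Joint hypothesis H0: beta_x1 = beta_x2 = 0,
# tested by comparing the intercept-only model with the full model
fit0    <- lm(y ~ 1)
joint   <- anova(fit0, fit)
F_joint <- joint$F[2]

# summary() reports the same overall F statistic
F_summary <- summary(fit)$fstatistic["value"]
print(c(t = unname(t_stat), p = unname(p_t), F = F_joint))
```

The t value printed by summary() always tests b = 0, so a test against any other value has to be computed by hand as above.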
Multiple Regression Model in R
MRA in R (1)
Form example data
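> set.seed(666)                           # set the random seed
> FH=rnorm(100,mean=170,sd=15)            # generate 100 fathers' heights
> MH=rnorm(100,mean=160,sd=10)            # generate 100 mothers' heights
> GD=rbinom(100,size=1,prob=.45)          # generate 100 gender indicators (0/1)
> CH=0.4*FH+0.6*MH-50+rnorm(100,sd=8)     # generate 100 children's heights
> height=data.frame(MH,FH,GD,CH)          # combine the height data into one data set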
MRA in R (2)
Form example data
> names(height)
[1] "MH" "FH" "GD" "CH"
> cor(height)   # correlation coefficients among the variables
           MH          FH           GD         CH
MH  1.0000000 0.181048442 -0.108429291  0.5909038
FH  0.1810484 1.000000000  0.009871436  0.5151523
GD -0.1084293 0.009871436  1.000000000 -0.1600201
CH  0.5909038 0.515152289 -0.160020142  1.0000000
> summary(height)
       MH              FH              GD             CH
 Min.   :128.5   Min.   :125.9   Min.   :0.00   Min.   : 76.68
 1st Qu.:150.5   1st Qu.:158.4   1st Qu.:0.00   1st Qu.:104.26
 Median :158.4   Median :169.2   Median :0.00   Median :112.03
 Mean   :159.1   Mean   :169.0   Mean   :0.46   Mean   :111.77
 3rd Qu.:167.5   3rd Qu.:180.3   3rd Qu.:1.00   3rd Qu.:119.42
 Max.   :185.8   Max.   :202.3   Max.   :1.00   Max.   :134.66
MRA in R (3)
Form example data
> pairs(height)
MRA in R (4)
Linear model
> form = CH~FH+MH+GD
> lm(form,data=height[1:50,])
Call: lm(formula = form, data = height[1:50, ])
Coefficients:
(Intercept)           FH           MH           GD
   -24.6993       0.2933       0.5421      -1.4246
Fitted equation: CH = -24.6993 + 0.2933 FH + 0.5421 MH - 1.4246 GD
> lm(form,data=height[51:100,])
Call: lm(formula = form, data = height[51:100, ])
Coefficients:
(Intercept)           FH           MH           GD
   -21.1932       0.3516       0.4796      -3.3089
Fitted equation: CH = -21.1932 + 0.3516 FH + 0.4796 MH - 3.3089 GD
> model=lm(form,data=height)
> model
Call: lm(formula = form, data = height)
Coefficients:
(Intercept)           FH           MH           GD
   -21.9038       0.3178       0.5101      -2.5284
Fitted equation: CH = -21.9038 + 0.3178 FH + 0.5101 MH - 2.5284 GD
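The three fits above (first 50 rows, last 50 rows, full sample) can also be collected into one table for side-by-side comparison. This is a small sketch that rebuilds the example data with the same commands as the example-data slide; the list labels first_half, second_half, and full are names introduced here for illustration.

```r
# Recreate the example data with the same commands as the example-data slide
set.seed(666)
FH <- rnorm(100, mean = 170, sd = 15)
MH <- rnorm(100, mean = 160, sd = 10)
GD <- rbinom(100, size = 1, prob = .45)
CH <- 0.4 * FH + 0.6 * MH - 50 + rnorm(100, sd = 8)
height <- data.frame(MH, FH, GD, CH)

form <- CH ~ FH + MH + GD

# Row indices for each subsample (labels are illustrative assumptions)
rows <- list(first_half = 1:50, second_half = 51:100, full = 1:100)

# One column of coefficient estimates per subsample
coef_table <- sapply(rows, function(idx) coef(lm(form, data = height[idx, ])))
print(round(coef_table, 4))
```

Comparing the columns shows how much the estimates move between subsamples, which gives an informal sense of sampling variability before looking at standard errors.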
MRA in R (5)
Linear model
> summary(model)
Call:
lm(formula = form, data = height)

Residuals:
     Min       1Q   Median       3Q      Max
-19.2689  -4.6959  -0.5674   5.4403  16.3832

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -21.90383   13.34304  -1.642    0.104
FH            0.31778    0.05299   5.997 3.53e-08 ***
MH            0.51005    0.07249   7.036 2.92e-10 ***
GD           -2.52837    1.61594  -1.565    0.121
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.003 on 96 degrees of freedom
Multiple R-squared:  0.5333,  Adjusted R-squared:  0.5187
F-statistic: 36.57 on 3 and 96 DF,  p-value: 7.53e-16
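The quantities printed by summary(model) can also be pulled out programmatically or recomputed by hand. The sketch below rebuilds the example data with the same commands as the example-data slide and checks R-squared, adjusted R-squared, and the residual standard error against their textbook definitions; the names ending in _manual are introduced here for illustration.

```r
# Recreate the example data with the same commands as the example-data slide
set.seed(666)
FH <- rnorm(100, mean = 170, sd = 15)
MH <- rnorm(100, mean = 160, sd = 10)
GD <- rbinom(100, size = 1, prob = .45)
CH <- 0.4 * FH + 0.6 * MH - 50 + rnorm(100, sd = 8)
height <- data.frame(MH, FH, GD, CH)

model <- lm(CH ~ FH + MH + GD, data = height)
s <- summary(model)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
t_manual   <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# R-squared by hand: 1 - RSS / TSS
rss <- sum(residuals(model)^2)
tss <- sum((height$CH - mean(height$CH))^2)
r2_manual <- 1 - rss / tss

# Adjusted R-squared penalizes the number of estimated parameters p
n <- nrow(height)
p <- length(coef(model))
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p)

# Residual standard error: sqrt(RSS / residual degrees of freedom)
rse_manual <- sqrt(rss / df.residual(model))

print(c(R2 = r2_manual, adjR2 = adj_r2_manual, RSE = rse_manual))
```

The corresponding stored values are s$r.squared, s$adj.r.squared, and s$sigma.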
MRA in R (6)
Linear model
> library(car)
Warning message: package 'car' was built under R version 3.1.2
> Anova(model)
Anova Table (Type II tests)

Response: CH
          Sum Sq Df F value    Pr(>F)
FH        2303.3  1 35.9646 3.531e-08 ***
MH        3170.6  1 49.5081 2.924e-10 ***
GD         156.8  1  2.4481     0.121
Residuals 6148.1 96
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
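Because this model contains only main effects with one degree of freedom each, every F statistic in the table should equal the square of the corresponding t value from summary(model). The sketch below checks this with base R's drop1() rather than car::Anova(); that substitution is made here only to avoid the extra package and is not what the slides use.

```r
# Recreate the example data with the same commands as the example-data slide
set.seed(666)
FH <- rnorm(100, mean = 170, sd = 15)
MH <- rnorm(100, mean = 160, sd = 10)
GD <- rbinom(100, size = 1, prob = .45)
CH <- 0.4 * FH + 0.6 * MH - 50 + rnorm(100, sd = 8)
height <- data.frame(MH, FH, GD, CH)

model <- lm(CH ~ FH + MH + GD, data = height)
terms_tested <- c("FH", "MH", "GD")

# F test for dropping each term in turn while keeping the others
d1     <- drop1(model, test = "F")
F_drop <- d1[terms_tested, "F value"]

# Squared t values from the coefficient table
t_sq <- coef(summary(model))[terms_tested, "t value"]^2

print(cbind(F_drop = F_drop, t_squared = unname(t_sq)))
```

With interactions or multi-degree-of-freedom factors the equivalence no longer holds term by term, which is where the distinction between Type I, II, and III tests starts to matter.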
MRA in R (7)
Linear model
> confint(model, level=0.95)   # CIs
                  2.5 %    97.5 %
(Intercept) -48.3895545 4.5818848
FH            0.2125955 0.4229602
MH            0.3661621 0.6539446
GD           -5.7359740 0.6792422
> vcov(model)   # covariance
            (Intercept)            FH            MH          GD
(Intercept) 178.0366433 -0.3613914446 -0.7229975953 -2.85283569
FH           -0.3613914  0.0028078357 -0.0007037318 -0.00258388
MH           -0.7229976 -0.0007037318  0.0052547813  0.01312824
GD           -2.8528357 -0.0025838804  0.0131282356  2.61124999
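The two outputs above are linked: each confidence interval is the estimate plus or minus a t quantile times the standard error, and the standard errors are the square roots of the diagonal of vcov(model). A minimal sketch of that calculation, rebuilding the example data with the same commands as the example-data slide (ci_manual is a name introduced here):

```r
# Recreate the example data with the same commands as the example-data slide
set.seed(666)
FH <- rnorm(100, mean = 170, sd = 15)
MH <- rnorm(100, mean = 160, sd = 10)
GD <- rbinom(100, size = 1, prob = .45)
CH <- 0.4 * FH + 0.6 * MH - 50 + rnorm(100, sd = 8)
height <- data.frame(MH, FH, GD, CH)

model <- lm(CH ~ FH + MH + GD, data = height)

est   <- coef(model)
se    <- sqrt(diag(vcov(model)))                   # standard errors
tcrit <- qt(0.975, df = df.residual(model))        # two-sided 95% critical value

# Estimate +/- critical value * standard error
ci_manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
print(ci_manual)
```

An interval that contains zero, as for GD in the output above, corresponds to a coefficient that is not significant at the 5% level.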
MRA in R (8)
Model prediction
> fit=predict(model)
> fit   # predicted values
        1         2         3         4         5         6 ...
109.95707 111.53414 119.63329 127.56047  98.06113 120.20398 ...
> err=residuals(model)
> err   # residuals
          1           2           3           4           5 ...
-0.93857614  5.89653186  4.05236595 -0.72986074  3.64675180 ...
> range(err-(CH-fit))
[1] 2.753353e-14 4.023448e-13
# apart from rounding error, err and (CH-fit) are numerically identical
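predict() can also be applied to observations that were not used for fitting by passing a newdata data frame. The sketch below rebuilds the example data with the same commands as the example-data slide; the two rows in new_family are hypothetical values made up for illustration.

```r
# Recreate the example data with the same commands as the example-data slide
set.seed(666)
FH <- rnorm(100, mean = 170, sd = 15)
MH <- rnorm(100, mean = 160, sd = 10)
GD <- rbinom(100, size = 1, prob = .45)
CH <- 0.4 * FH + 0.6 * MH - 50 + rnorm(100, sd = 8)
height <- data.frame(MH, FH, GD, CH)

model <- lm(CH ~ FH + MH + GD, data = height)

# Hypothetical new observations (values are assumptions for illustration)
new_family <- data.frame(FH = c(175, 165), MH = c(160, 170), GD = c(0, 1))

# Point predictions with 95% prediction intervals (columns fit, lwr, upr)
pred <- predict(model, newdata = new_family, interval = "prediction", level = 0.95)
print(pred)

# The point prediction is just the fitted equation evaluated at the new values
b <- coef(model)
manual <- b["(Intercept)"] + b["FH"] * new_family$FH +
  b["MH"] * new_family$MH + b["GD"] * new_family$GD
```

Using interval = "confidence" instead gives the narrower interval for the mean response rather than for an individual child.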
MRA in R (9)
Plot
> par(mfrow=c(2,2))
> plot(model)
# visualize the four diagnostic plots
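By default, plot() on an lm object draws Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage. The quantities behind those panels can also be computed directly, as in the sketch below. It rebuilds the example data with the same commands as the example-data slide; the cutoffs used to flag observations are common rules of thumb chosen here as assumptions, not values from the slides.

```r
# Recreate the example data with the same commands as the example-data slide
set.seed(666)
FH <- rnorm(100, mean = 170, sd = 15)
MH <- rnorm(100, mean = 160, sd = 10)
GD <- rbinom(100, size = 1, prob = .45)
CH <- 0.4 * FH + 0.6 * MH - 50 + rnorm(100, sd = 8)
height <- data.frame(MH, FH, GD, CH)

model <- lm(CH ~ FH + MH + GD, data = height)

fitted_vals <- fitted(model)          # x-axis of Residuals vs Fitted
std_res     <- rstandard(model)       # standardized residuals (Q-Q, Scale-Location)
lev         <- hatvalues(model)       # leverage (Residuals vs Leverage)
cook        <- cooks.distance(model)  # influence contours in the leverage plot

# Flag observations worth a closer look (thresholds are rule-of-thumb assumptions)
flag <- which(abs(std_res) > 2 | lev > 2 * mean(lev))
print(flag)
```

A single panel can be drawn with the which argument, for example plot(model, which = 1) for Residuals vs Fitted.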
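Assignment
Use R's built-in data set (variable name mtcars).
Fit a multiple regression of the form y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4,
where y=mtcars$hp,
x1=mtcars$cyl, x2=mtcars$wt, x3=mtcars$gear, x4=mtcars$disp.
Report the model's regression equation and each β.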
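One possible way to set up the assignment, following the same steps used for the height example, is sketched below. The object names hw_model and hw_coef are introduced here for illustration; interpreting the output and writing out the regression equation is left to the reader.

```r
# Sketch of the assignment setup; hw_model and hw_coef are illustrative names.
data(mtcars)   # built-in data set

# y = hp, x1 = cyl, x2 = wt, x3 = gear, x4 = disp
hw_model <- lm(hp ~ cyl + wt + gear + disp, data = mtcars)

# Estimated beta_0 ... beta_4, in the order (Intercept), cyl, wt, gear, disp
hw_coef <- coef(hw_model)
print(hw_coef)

# Full table with standard errors, t values, and R-squared
print(summary(hw_model))
```

The regression equation is obtained by plugging the printed estimates into y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4.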