PSYC 3030
Review Session
Gigi Luk
December 7, 2004
Overview
- Matrix
- Multiple Regression
- Indicator variables
- Polynomial Regression
- Regression Diagnostics
- Model Building

Matrix: Basic Operation
- Addition / Subtraction: possible only when the dimensions of the two matrices are the same
- Multiplication: possible only when the inside dimensions are the same (e.g., 2x3 & 3x2)
- Inverse: exists only when |A| ≠ 0, i.e., A is non-singular and all rows (columns) are linearly independent
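
A minimal numpy sketch of these rules (the matrices here are made up for illustration; the course itself used SAS):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Addition/subtraction: dimensions must match exactly
print(A + B)                     # 2x2 + 2x2 -> fine
print(A - B)

# Multiplication: inside dimensions must match (2x3 times 3x2 -> 2x2)
C = np.arange(6).reshape(2, 3)   # 2x3
D = np.arange(6).reshape(3, 2)   # 3x2
print(C @ D)                     # result is 2x2

# Inverse: only for a square matrix with |A| != 0 (non-singular)
print(np.linalg.det(A))          # -2.0, nonzero, so A has an inverse
print(np.linalg.inv(A))
```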
Matrix: Inverse
Linearly independent columns (|A| ≠ 0), e.g.:
A = [ 3  2  4
      5  7  1
      2  8  9 ]
Linearly dependent columns (|A| = 0), e.g.:
A = [ 1  3  3
      1  2  2
      1  4  4 ]   (columns 2 and 3 are identical)

A 2x2 example:
A = [ 2  5
      4  9 ]
|A| = (2)(9) − (5)(4) = 18 − 20 = −2
A⁻¹ = (1/|A|) [ 9  −5 ;  −4  2 ] = [ 9/(−2)  −5/(−2) ;  −4/(−2)  2/(−2) ] = [ −4.5  2.5 ;  2  −1 ]
Some notations
- n = sample size
- p = number of parameters
- c = number of distinct values of x (cf. LOF, p. 85)
- g = number of family members in a Bonferroni test (cf. p. 92)
- J = matrix of ones, e.g., J = [ 1 1 ; 1 1 ]
- I = identity matrix, e.g., I = [ 1 0 ; 0 1 ]
- H = x(x'x)⁻¹x' (the hat matrix)
Matrix: estimates & residuals
LS estimates:
x'y = (x'x)b, so b = (x'x)⁻¹x'y

x'x = [  n     Σxi
         Σxi   Σxi²  ]

x'y = [ Σyi ; Σxiyi ]

(x'x)⁻¹ = 1/(nΣxi² − (Σxi)²) · [  Σxi²   −Σxi
                                  −Σxi    n    ]

Residuals:
e = y − ŷ = y − xb = [I − H]y
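
A minimal sketch of the normal equations and residuals in numpy, using made-up toy data (Python here is only for illustration):

```python
import numpy as np

# Toy data (made up) for simple regression y = b0 + b1*x
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

X = np.column_stack([np.ones(n), x])     # design matrix with intercept column

# Normal equations: (x'x) b = x'y  ->  b = (x'x)^(-1) x'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix and residuals: e = y - Xb = (I - H) y
H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ y
print(b, e)
```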
Matrix: Application in Regression

Source | SS                          | df    | MS
SSE    | e'e = y'y − b'x'y           | n − p | MSE = SSE/(n − p)
SSM    | n·ȳ² = (1/n) y'Jy           | 1     |
SSR    | b'x'y − SSM                 | p − 1 | MSR = SSR/(p − 1)
SST    | y'y                         | n     |
SSTO   | y'(I − J/n)y = y'y − SSM    | n − 1 |
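
These sums of squares follow directly from the matrix formulas; a numpy sketch with the same made-up toy data:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n, p = len(y), 2
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

J = np.ones((n, n))
SSM  = y @ J @ y / n                 # = n * ybar^2
SSE  = y @ y - b @ X.T @ y           # e'e = y'y - b'x'y,  df = n - p
SSR  = b @ X.T @ y - SSM             # df = p - 1
SSTO = y @ y - SSM                   # y'(I - J/n)y,  df = n - 1

MSE, MSR = SSE / (n - p), SSR / (p - 1)
print(SSE, SSR, SSTO, MSE, MSR)
```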
Matrix: Variance-Covariance

Var-cov(Y) = σ²(Y) =
[ var y1        cov(y1,y2)   cov(y1,y3)
  cov(y1,y2)    var y2       cov(y2,y3)
  cov(y1,y3)    cov(y2,y3)   var y3     ]

var-cov(b) = est. σ²(b) = s²(b) = MSE (x'x)⁻¹ =
[ s²{b0}      s{b0,b1}
  s{b0,b1}    s²{b1}   ]

For simple regression:
s²{b0}   = MSE Σxi² / (nΣxi² − (Σxi)²)
s{b0,b1} = −x̄ MSE / (Σxi² − (Σxi)²/n)
s²{b1}   = MSE / (Σxi² − (Σxi)²/n)
Matrix: Variance-Covariance (cont')
Estimated variance of a mean response:
s²(ŷh) = xh' s²(b) xh = MSE {xh'(x'x)⁻¹xh}
Estimated variance of a new observation:
s²{pred} = MSE + s²(ŷh)
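
A numpy sketch of s²(b), s²(ŷh), and s²{pred}, again with made-up data and an assumed xh = (1, 3):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n, p = len(y), 2
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
MSE = (e @ e) / (n - p)

# s^2(b) = MSE (x'x)^(-1): diagonal = s^2{b0}, s^2{b1}; off-diagonal = s{b0,b1}
s2_b = MSE * XtX_inv

# Estimated variance of the mean response at x_h = 3
xh = np.array([1., 3.])
s2_yhat = MSE * (xh @ XtX_inv @ xh)      # = xh' s^2(b) xh

# Estimated variance of a new observation at x_h
s2_pred = MSE + s2_yhat
print(s2_b, s2_yhat, s2_pred)
```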
Multiple Regression
Model with more than one independent variable:
y = β0 + β1Xi1 + β2Xi2 + εi

x'x = [  n      Σxi1       Σxi2
         Σxi1   Σxi1²      Σxi1xi2
         Σxi2   Σxi1xi2    Σxi2²    ]

x'y = [ Σyi ; Σxi1yi ; Σxi2yi ]
MR: R-square
Coefficient of multiple determination:
R² = SSR/SSTO, with 0 ≤ R² ≤ 1
alternative: R² = 1 − SSE/SSTO
Coefficients of partial determination:
r²(y3·12) = SSR(x3 | x1, x2) / SSE(x1, x2)
r²(y2·1)  = SSR(x2 | x1) / SSE(x1)
[Venn diagram: SSTO partitioned into SSR(X1), SSR(X2), SSR(X1,X2), SSR(X1|X2), SSR(X2|X1), SSE(X1), SSE(X2), and SSE(X1,X2)]
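
A numpy sketch of R² and a coefficient of partial determination, computed from SSEs of nested fits (data simulated for illustration):

```python
import numpy as np

# Made-up data with two predictors
rng = np.random.default_rng(0)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2*x1 + 0.5*x2 + rng.normal(size=n)

def sse(*predictors):
    """SSE after regressing y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(n), *predictors])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

SSTO = np.sum((y - y.mean())**2)
R2 = 1 - sse(x1, x2) / SSTO        # coefficient of multiple determination

# r^2(y2.1) = SSR(x2|x1)/SSE(x1) = (SSE(x1) - SSE(x1,x2)) / SSE(x1)
r2_y2_1 = (sse(x1) - sse(x1, x2)) / sse(x1)
print(R2, r2_y2_1)
```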
MR: Hypothesis testing
Test for regression relation (the overall test):
Ho: β1 = β2 = ... = βp-1 = 0
Ha: not all βk = 0
F* = MSR/MSE
If F* ≤ F(1−α; p−1, n−p), conclude Ho.

Test for βk:
Ho: βk = 0; Ha: βk ≠ 0
t* = bk/s(bk); if |t*| ≤ t(1−α/2; n−p), conclude Ho.
Note: (t*)² = F* = MSR(xk | all others)/MSE.
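
A sketch of both tests in numpy/scipy on simulated data (scipy.stats supplies the F and t critical values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 30, 3                                   # p parameters: b0, b1, b2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2*x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
SSE = e @ e
SSR = np.sum((X @ b - y.mean())**2)
MSE, MSR = SSE / (n - p), SSR / (p - 1)

# Overall test: Ho: b1 = b2 = 0, compare F* = MSR/MSE with F(1-a; p-1, n-p)
F_star = MSR / MSE
print(F_star, stats.f.ppf(0.95, p - 1, n - p))

# Test for a single b_k: t* = b_k / s(b_k), compare with t(1-a/2; n-p)
s_b = np.sqrt(MSE * np.diag(XtX_inv))
t_star = b / s_b
print(t_star, stats.t.ppf(0.975, n - p))
```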
MR: Hypothesis Testing (cont')
Test for LOF:
Ho: E{Y} = β0 + β1X1 + β2X2 + ... + βp-1Xp-1
Ha: E{Y} ≠ β0 + β1X1 + β2X2 + ... + βp-1Xp-1
F* = (SSLF/(c−p)) / (SSPE/(n−c))
If F* ≤ F(1−α; c−p, n−c), conclude Ho.

Test whether some βk = 0:
Ho: βh = βh+1 = ... = βp-1 = 0
F* = MSR(xh, ..., xp-1 | x1, ..., xh-1)/MSE
If F* ≤ F(1−α; p−h, n−p), conclude Ho.
MR: Extra SS (p. 141, CK)
Full:    y = β0 + β1X1 + β2X2  →  SSE(x1,x2), SSR(x1,x2)
Reduced: y = β0 + β1X1         →  SSE(x1), SSR(x1)
SSR(x2|x1) = SSR(x1,x2) − SSR(x1)
           = SSE(x1) − SSE(x1,x2)
           = effect of X2 adjusted for X1
General Linear Test
Ho: β2 = 0; Ha: β2 ≠ 0
F* = [ (SSE(R) − SSE(F)) / (dfR − dfF) ] / [ SSE(F) / dfF ]
(In simple regression, testing β1 = 0 gives SSE(R) = SSTO and dfR − dfF = (n−1) − (n−2) = 1.)
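
The extra-SS and general-linear-test ideas in one numpy/scipy sketch (simulated data; the full model has x1 and x2, the reduced model drops x2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2*x1 + rng.normal(size=n)

def fit_sse(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

X_full = np.column_stack([np.ones(n), x1, x2])   # full: b0 + b1*x1 + b2*x2
X_red  = np.column_stack([np.ones(n), x1])       # reduced: b0 + b1*x1

SSE_F, SSE_R = fit_sse(X_full), fit_sse(X_red)
df_F, df_R = n - 3, n - 2

# Extra SS: SSR(x2|x1) = SSE(x1) - SSE(x1,x2)
SSR_x2_given_x1 = SSE_R - SSE_F

# General linear test of Ho: b2 = 0
F_star = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
print(SSR_x2_given_x1, F_star, stats.f.ppf(0.95, df_R - df_F, df_F))
```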
Indicator variables
ŷ = b0 + b1X1 + b2X2
Y = expressive vocabulary; X = receptive vocabulary; X2 = indicator for girls vs. boys
[Figure: two parallel lines with common slope b1; girls' line has intercept b0 + b2, boys' line has intercept b0]

ŷ = b0 + b1X1 + b2X2 + b12X1X2
If b12 ≠ 0, then there is an interaction: boys and girls have different slopes in the relation of X and Y.
[Figure: two non-parallel lines, one for boys and one for girls]
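
A sketch of the interaction model with an indicator variable, on simulated vocabulary-style data (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
receptive = rng.uniform(10, 50, size=n)          # X = receptive vocabulary
girl = (rng.random(n) < 0.5).astype(float)       # indicator: 1 = girl, 0 = boy
# Made-up truth: girls get a higher intercept AND a steeper slope
expressive = 5 + 0.8*receptive + 3*girl + 0.3*girl*receptive + rng.normal(size=n)

# Model with interaction: y-hat = b0 + b1*X1 + b2*X2 + b12*X1*X2
X = np.column_stack([np.ones(n), receptive, girl, receptive * girl])
b, *_ = np.linalg.lstsq(X, expressive, rcond=None)
b0, b1, b2, b12 = b

# Boys' line: intercept b0, slope b1; girls' line: intercept b0+b2, slope b1+b12
print(b0, b1, b0 + b2, b1 + b12)   # b12 != 0 -> the slopes differ (interaction)
```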
Polynomial Regression
2nd order: Y = β0 + β1X + β2X² + εi
3rd order: Y = β0 + β1X + β2X² + β3X³ + εi
Interaction:
Y = β0 + β1X1 + β2X2 (linear) + β11X1² + β22X2² (quadratic) + β12X1X2 (interaction) + εi
PR: Partial F-test (p. 303, 5th ed.)
Test whether a 1st-order model would be sufficient:
Ho: β11 = β22 = β12 = 0
Ha: not all βs in Ho = 0
F* = [ SSR(x1², x2², x1x2 | x1, x2) / 3 ] / [ SSE / (n − p) ]
(3 = the number of βs being tested)
To obtain this SSR, you need sequential SS (see top of p. 304 in text). This test is a modified test for extra SS.
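
A numpy/scipy sketch of this partial F-test via extra SS, on simulated data where the first-order model is in fact sufficient:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + x1 + 2*x2 + rng.normal(size=n)           # truth here is first-order

def fit_sse(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

# Full second-order model: x1, x2, x1^2, x2^2, x1*x2 (p = 6 parameters)
X_full = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1*x2])
X_red  = np.column_stack([np.ones(n), x1, x2])   # first-order model

SSE_F, SSE_R = fit_sse(X_full), fit_sse(X_red)
p = 6

# Ho: b11 = b22 = b12 = 0 (3 terms tested), partial F via extra SS
F_star = ((SSE_R - SSE_F) / 3) / (SSE_F / (n - p))
print(F_star, stats.f.ppf(0.95, 3, n - p))
```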
Regression Diagnostics
Collinearity:
Effects:
(1) poor numerical accuracy
(2) poor precision of estimates
Danger signs:
- several large s(bk)
- determinant of x'x ≈ 0
- number of near-zero eigenvalues of the predictor correlation matrix = number of linear dependencies
- condition number: (λmax/λi)^(1/2)
  15-30: watch out; > 30: trouble; > 100: disaster

Regression Diagnostics
VIF (Variance Inflation Factor):
VIFi = 1/(1 − Ri²), where Ri² comes from regressing Xi on the other predictors
When to worry? When VIF ≈ 10 or more
TOL (Tolerance) = 1/VIFi
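
A numpy sketch of VIF and TOL, computing each Ri² by regressing Xi on the other predictors (simulated data):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)          # strongly related to x1
x3 = rng.normal(size=n)

def vif(target, others):
    """VIF_i = 1/(1 - R_i^2), R_i^2 from regressing X_i on the other predictors."""
    X = np.column_stack([np.ones(n)] + others)
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    e = target - X @ b
    r2 = 1 - (e @ e) / np.sum((target - target.mean())**2)
    return 1 / (1 - r2)

for name, t, o in [("x1", x1, [x2, x3]), ("x2", x2, [x1, x3]), ("x3", x3, [x1, x2])]:
    v = vif(t, o)
    print(name, v, 1/v)      # VIF and TOL = 1/VIF; worry when VIF is around 10+
```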
Model Building
Goals:
- Make R² large or MSE small
- Keep cost of data collection and s(b) small
Selection criteria:
- R² always increases as variables are added, so look at ∆R²
- MSE can increase or decrease as variables are added
Model Building (cont')
Cp ≈ p when the model is unbiased; Cp estimates
(1/σ²) Σ{ var(ŷi) + [E(ŷi) − μi]² }, i.e., random error + bias
Cp = SSEp/MSEall − (n − 2p)
   = p + (m + 1 − p)(Fp − 1)
m: # of available predictors
Fp: incremental F for the predictors omitted
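
A numpy sketch of Cp = SSEp/MSEall − (n − 2p) for a few candidate subsets (simulated data; the x1,x2 model should come out near Cp ≈ p):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 2*x1 + x2 + rng.normal(size=n)           # x3 is irrelevant

def sse(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

# MSE from the model with all available predictors
X_all = np.column_stack([np.ones(n), x1, x2, x3])
MSE_all = sse(X_all) / (n - 4)

# Cp close to p suggests little bias in the p-parameter subset model
for label, X, p in [("x1",       np.column_stack([np.ones(n), x1]),     2),
                    ("x1,x2",    np.column_stack([np.ones(n), x1, x2]), 3),
                    ("x1,x2,x3", X_all,                                 4)]:
    Cp = sse(X) / MSE_all - (n - 2*p)
    print(label, p, Cp)
```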
Model Building (cont')
Variable Selection Procedure:
- Choose min MSE & Cp ≈ p
- SAS tools: Forward, Backward, Stepwise
- Guided selection: key vars, promising vars, haystack
  - Substantive knowledge of the area
  - Examination of each var: expected sign & magnitude of coefficients
