PSYC 3030
Review Session
Gigi Luk
December 7, 2004
Overview
Matrix
Multiple Regression
Indicator variables
Polynomial Regression
Regression Diagnostics
Model Building
Matrix: Basic Operations
Addition / Subtraction: possible only when dimensions are the same
Multiplication: possible only when inside dimensions are the same (e.g., 2x3 & 3x2)
Inverse: possible only when |A| ≠ 0, i.e., A is non-singular (all rows/columns are linearly independent)
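The dimension rules above can be sketched in numpy; the matrices here are hypothetical examples, not from the slides.

```python
import numpy as np

# Hypothetical 2x3 and 3x2 matrices to illustrate the dimension rules.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2x3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # 3x2

# Addition/subtraction require identical dimensions: A + A works, A + B would not.
S = A + A

# Multiplication requires matching inside dimensions: (2x3)(3x2) -> 2x2.
P = A @ B

# The inverse exists only for a square matrix with nonzero determinant.
if np.linalg.det(P) != 0:          # P is non-singular
    P_inv = np.linalg.inv(P)
```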
Matrix: Inverse
Linearly independent columns, e.g.:
[3 2 4; 5 7 1; 2 8 9]
Linearly dependent columns, e.g. (columns 2 and 3 are equal):
[1 3 3; 1 2 2; 1 4 4]
Worked example:
A = [2 5; 4 9]
|A| = (2)(9) − (5)(4) = 18 − 20 = −2
A⁻¹ = (1/|A|) [9 −5; −4 2] = [9/−2  −5/−2; −4/−2  2/−2] = [−4.5  2.5; 2  −1]
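The worked 2x2 inverse above can be checked directly in numpy:

```python
import numpy as np

# The 2x2 example from the slide: |A| = 2*9 - 5*4 = -2, so A is non-singular.
A = np.array([[2.0, 5.0],
              [4.0, 9.0]])
det_A = np.linalg.det(A)

# Inverse via the 2x2 formula: (1/|A|) * [[d, -b], [-c, a]].
A_inv = (1.0 / det_A) * np.array([[9.0, -5.0],
                                  [-4.0, 2.0]])
```

Multiplying A by A_inv should recover the identity matrix.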
Some notations
n = sample size
p = number of parameters
c = number of distinct values of X (cf. LOF, p. 85)
g = number of family members in a Bonferroni test (cf. p. 92)
J = matrix of ones, e.g. [1 1; 1 1]
I = identity matrix, e.g. [1 0; 0 1]
H = X(X'X)⁻¹X' (the hat matrix)
Matrix: estimates & residuals
LS estimates: X'y = (X'X)b, so b = (X'X)⁻¹X'y
X'X = [n  Σxi; Σxi  Σxi²]
X'y = [Σyi; Σxiyi]
(X'X)⁻¹ = 1/(nΣxi² − (Σxi)²) · [Σxi²  −Σxi; −Σxi  n]
Residuals:
e = y − ŷ
  = y − Xb
  = [I − H]y
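The normal equations and the two residual formulas can be sketched in numpy; the data set is a small hypothetical one chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical data; the design matrix X has a leading column of 1s.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])
X = np.column_stack([np.ones_like(x), x])

# LS estimates from the normal equations: b = (X'X)^{-1} X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals two ways: e = y - Xb, and e = (I - H)y with H = X(X'X)^{-1}X'.
e = y - X @ b
H = X @ np.linalg.inv(X.T @ X) @ X.T
e_via_H = (np.eye(len(y)) - H) @ y
```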
Matrix: Application in Regression
Sum of squares                   df     MS
SSE  = e'e = y'y − b'X'y         n−p    MSE = SSE/(n−p)
SSM  = nȳ² = (1/n)y'Jy           1
SSR  = b'X'y − SSM               p−1    MSR = SSR/(p−1)
SST  = y'y                       n
SSTO = y'(I − J/n)y = y'y − SSM  n−1
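The sums of squares in this table can be verified numerically; the data below are a small hypothetical example, and the check is that SSR + SSE = SSTO.

```python
import numpy as np

# Small hypothetical data set to check the SS identities on this slide.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])
n = len(y)
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

J = np.ones((n, n))                # matrix of ones
SSM  = (y @ J @ y) / n             # = n * ybar^2
SSE  = y @ y - b @ (X.T @ y)       # e'e = y'y - b'X'y
SSR  = b @ (X.T @ y) - SSM
SSTO = y @ (np.eye(n) - J / n) @ y
```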
Matrix: Variance-Covariance
Var-cov(Y) = σ²(Y) =
[ var(y1)      cov(y1,y2)   cov(y1,y3) ]
[ cov(y1,y2)   var(y2)      cov(y2,y3) ]
[ cov(y1,y3)   cov(y2,y3)   var(y3)    ]
var-cov(b) = est. σ²(b) = s²(b) = MSE (X'X)⁻¹
s²(b) = [ s²(b0)    s(b0,b1) ]
        [ s(b0,b1)  s²(b1)   ]
For simple regression:
s²(b0)   = MSE Σxi² / (nΣxi² − (Σxi)²)
s(b0,b1) = −MSE Σxi / (nΣxi² − (Σxi)²)
s²(b1)   = n MSE / (nΣxi² − (Σxi)²) = MSE / Σ(xi − x̄)²
Matrix: Variance-Covariance (cont')
Estimated variance of a mean response:
s²(ŷh) = xh' s²(b) xh = MSE{xh'(X'X)⁻¹xh}
Estimated variance of a new observation:
s²{pred} = MSE + s²(ŷh)
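These variance formulas can be sketched with numpy; the data and the point x_h = 3 are hypothetical.

```python
import numpy as np

# Hypothetical data; s^2(b) = MSE * (X'X)^{-1}.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])
n, p = len(y), 2
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ b
MSE = (e @ e) / (n - p)
XtX_inv = np.linalg.inv(X.T @ X)
s2_b = MSE * XtX_inv                      # var-cov matrix of b

# Variance of the mean response at x_h, and of a new observation there.
xh = np.array([1.0, 3.0])                 # hypothetical x_h = 3
s2_yhat = MSE * (xh @ XtX_inv @ xh)       # s^2(yhat_h)
s2_pred = MSE + s2_yhat                   # s^2{pred}
```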
Multiple Regression
Model with more than one independent variable, e.g.:
y = β0 + β1Xi1 + β2Xi2 + εi
X'X = [ n     Σxi1     Σxi2    ]
      [ Σxi1  Σxi1²    Σxi1xi2 ]
      [ Σxi2  Σxi1xi2  Σxi2²   ]
X'y = [ Σyi; Σxi1yi; Σxi2yi ]
MR: R-square
Coefficient of multiple determination:
R² = SSR/SSTO, 0 ≤ R² ≤ 1
alternative: R² = 1 − SSE/SSTO
Coefficients of partial determination:
r²(Y3·12) = SSR(x3|x1,x2) / SSE(x1,x2)
r²(Y2·1)  = SSR(x2|x1) / SSE(x1)
[Venn diagram: SSTO partitioned into SSR(X1) and SSR(X2|X1) — equivalently SSR(X2) and SSR(X1|X2), together SSR(X1,X2) — plus SSE(X1,X2); SSE(X1) and SSE(X2) are the one-predictor error sums]
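Both kinds of determination coefficients can be computed from nested-model SSEs; the two-predictor data below are simulated, hypothetical values.

```python
import numpy as np

# Hypothetical simulated data with two predictors.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)

def sse(predictors):
    """SSE after regressing y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(n)] + list(predictors))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

SSTO = np.sum((y - y.mean()) ** 2)
R2 = 1 - sse([x1, x2]) / SSTO                       # multiple determination
# Partial determination: r^2(Y2.1) = SSR(x2|x1)/SSE(x1) = (SSE(x1)-SSE(x1,x2))/SSE(x1).
r2_y2_1 = (sse([x1]) - sse([x1, x2])) / sse([x1])
```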
MR: Hypothesis testing
Test for regression relation (the overall test):
Ho: β1 = β2 = ... = βp−1 = 0
Ha: not all βs = 0
F* = MSR/MSE
If F* ≤ F(1−α; p−1, n−p), conclude Ho.
Test for βk:
Ho: βk = 0   Ha: βk ≠ 0
t* = bk/s(bk); if |t*| ≤ t(1−α/2; n−p), conclude Ho.
Note that (t*)² = F* = MSR(xk|all others)/MSE.
MR: Hypothesis Testing (cont')
Test for LOF:
Ho: E{Y} = β0 + β1X1 + β2X2 + ... + βp−1Xp−1
Ha: E{Y} ≠ β0 + β1X1 + β2X2 + ... + βp−1Xp−1
F* = [SSLF/(c−p)] / [SSPE/(n−c)]
If F* ≤ F(1−α; c−p, n−c), conclude Ho.
Test whether some βk = 0:
Ho: βh = βh+1 = ... = βp−1 = 0
F* = MSR(xh,...,xp−1 | x1,...,xh−1)/MSE, with p−h βs tested
If F* ≤ F(1−α; p−h, n−p), conclude Ho.
MR: Extra SS (p. 141, CK)
Full:    y = β0 + β1X1 + β2X2  →  SSR(x1,x2)
Reduced: y = β0 + β1X1         →  SSR(x1)
SSR(x2|x1) = SSR(x1,x2) − SSR(x1)
           = effect of X2 adjusted for X1
           = SSE(x1) − SSE(x1,x2)
General Linear Test
Ho: β2 = 0
Ha: β2 ≠ 0
F* = [SSE(R) − SSE(F)] / (dfR − dfF) ÷ [SSE(F) / dfF]
When the reduced model is the intercept-only model, SSE(R) = SSTO, so
F* = [(SSTO − SSE(F)) / ((n−1) − (n−2))] ÷ [SSE(F) / dfF]
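The general linear test can be sketched numerically; the data are the same kind of small hypothetical example used earlier, with the reduced model taken as intercept-only.

```python
import numpy as np

# Hypothetical data for a general linear test of Ho: slope = 0.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])
n = len(y)

# Reduced model: intercept only, so SSE(R) = SSTO with df_R = n - 1.
SSE_R = np.sum((y - y.mean()) ** 2)

# Full model: intercept + slope, SSE(F) with df_F = n - 2.
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
SSE_F = e @ e

df_R, df_F = n - 1, n - 2
F_star = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
```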
Indicator variables
ŷ = b0 + b1X1 + b2X2, where X2 is an indicator (e.g., 1 = girls, 0 = boys)
[Plot: Y = expressive vocabulary vs. X = receptive vocabulary; two parallel lines with common slope b1, intercept b0 for boys and b0 + b2 for girls]
With an interaction term:
ŷ = b0 + b1X1 + b2X2 + b12X1X2
If b12 ≠ 0, then there is an interaction: boys and girls have different slopes in the relation of X and Y.
[Plot: non-parallel lines for boys and girls]
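The parallel-lines model can be sketched with made-up vocabulary scores; all numbers here are hypothetical and constructed so the girls' line sits exactly 3 points above the boys'.

```python
import numpy as np

# Hypothetical data: x1 = receptive score, x2 = 1 for girls, 0 for boys.
x1 = np.array([10.0, 12.0, 14.0, 10.0, 12.0, 14.0])
x2 = np.array([0.0,  0.0,  0.0,  1.0,  1.0,  1.0])
y  = np.array([20.0, 24.0, 28.0, 23.0, 27.0, 31.0])  # girls shifted up by 3

# Parallel-lines model: y-hat = b0 + b1*x1 + b2*x2.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
# b2 estimates the vertical gap between the girls' and boys' lines.
```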
Polynomial Regression
2nd order: Y = β0 + β1X + β2X² + εi
3rd order: Y = β0 + β1X + β2X² + β3X³ + εi
Interaction (two predictors):
Y = β0 + β1X1 + β2X2 (linear) + β11X1² + β22X2² (quadratic) + β12X1X2 (interaction) + εi
PR: Partial F-test (p. 303, 5th ed.)
Test whether a 1st order model would be sufficient:
Ho: β11 = β22 = β12 = 0
Ha: not all βs in Ho = 0
F* = [SSR(x1², x2², x1x2 | x1, x2)/3] ÷ [SSE/(n − p)]
(the numerator df is 3, the number of βs tested)
In order to obtain this SSR, you need sequential SS (see top of p. 304 in text). This test is a modified test for extra SS.
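This partial F-test can be sketched by fitting the first-order and full second-order models and comparing their SSEs; the data are simulated with a real interaction so the test should reject.

```python
import numpy as np

# Hypothetical data with a genuine interaction between x1 and x2.
rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = 1 + x1 + x2 + 2 * x1 * x2 + rng.normal(scale=0.3, size=n)

def sse(cols):
    """SSE and residual df for y regressed on an intercept plus cols."""
    X = np.column_stack([np.ones(n)] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e, n - X.shape[1]

SSE_R, df_R = sse([x1, x2])                            # 1st-order model
SSE_F, df_F = sse([x1, x2, x1**2, x2**2, x1 * x2])     # full 2nd-order model

# F* = [SSR(x1^2, x2^2, x1x2 | x1, x2)/3] / [SSE(F)/(n - p)]
F_star = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
```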
Regression Diagnostics
Collinearity:
Effects:
(1) poor numerical accuracy
(2) poor precision of estimates
Danger signs:
several large s(bk)
determinant of X'X ≈ 0
number of near-zero eigenvalues of X'X = number of linear dependencies
Condition number: (λmax/λi)^½
15–30: watch out
> 30: trouble
> 100: disaster
Regression Diagnostics (cont')
VIF (Variance Inflation Factor)
= 1/(1 − Rk²), where Rk² comes from regressing Xk on the other predictors
When to worry? When VIF ≈ 10 or more
TOL (Tolerance) = 1/VIF
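The VIF and condition-number diagnostics can be sketched on a deliberately collinear design; the data and the near-duplication of x1 are hypothetical.

```python
import numpy as np

# Hypothetical collinear design: x2 is nearly a copy of x1.
rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # almost collinear with x1

# VIF for x1: regress x1 on the other predictor(s), then 1/(1 - R^2).
X_other = np.column_stack([np.ones(n), x2])
b, *_ = np.linalg.lstsq(X_other, x1, rcond=None)
resid = x1 - X_other @ b
R2_1 = 1 - (resid @ resid) / np.sum((x1 - x1.mean()) ** 2)
VIF_1 = 1 / (1 - R2_1)
TOL_1 = 1 / VIF_1

# Condition number from the eigenvalues of the standardized X'X.
Z = np.column_stack([(x1 - x1.mean()) / x1.std(), (x2 - x2.mean()) / x2.std()])
lam = np.linalg.eigvalsh(Z.T @ Z)
cond = np.sqrt(lam.max() / lam.min())
```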
Model Building
Goals:
R² large or MSE small
Keep cost of data collection and s(b) small
Selection Criteria:
look at ∆R² as variables are added
R² never decreases, and MSE can go up or down, as variables are added
Model Building (cont')
Cp ≈ p when the model is unbiased; Cp estimates
(1/σ²) Σ{var(ŷ) + [ŷtrue − ŷp]²}
(random error + bias)
Cp = SSEp/MSEall − (n − 2p)
   = p + (m + 1 − p)(Fp − 1)
m: # of available predictors
Fp: incremental F for the predictors omitted
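Mallows' Cp can be sketched with simulated data; note that Cp computed for the full model equals p by construction, so the interesting comparison is for a candidate submodel. All data below are hypothetical.

```python
import numpy as np

# Hypothetical data: x1 matters, x2 is irrelevant.
rng = np.random.default_rng(3)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)

def fit_sse(cols):
    """SSE and parameter count p for y on an intercept plus cols."""
    X = np.column_stack([np.ones(n)] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e, X.shape[1]

SSE_all, p_all = fit_sse([x1, x2])
MSE_all = SSE_all / (n - p_all)

# Cp = SSE_p / MSE_all - (n - 2p); a good model has Cp close to p.
SSE_1, p_1 = fit_sse([x1])                       # candidate model with x1 only
Cp_1 = SSE_1 / MSE_all - (n - 2 * p_1)
Cp_all = SSE_all / MSE_all - (n - 2 * p_all)     # equals p_all identically
```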
Model Building (cont')
Variable Selection Procedure
min MSE & Cp ≈ p
SAS tools: choose Forward, Backward, or Stepwise
Guided selection: key vars, promising vars, haystack
Substantive knowledge of the area
Examination of each var: expected sign & magnitude of coefficients
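The forward-selection idea behind SAS's FORWARD option can be sketched in a few lines; the candidate pool, stopping rule (keep two variables), and data are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical pool of candidate predictors; only x1 and x3 affect y.
rng = np.random.default_rng(4)
n = 50
X_pool = {f"x{i}": rng.normal(size=n) for i in range(1, 5)}
y = 1 + 3 * X_pool["x1"] + 2 * X_pool["x3"] + rng.normal(size=n)

def sse(names):
    """SSE for y regressed on an intercept plus the named predictors."""
    X = np.column_stack([np.ones(n)] + [X_pool[v] for v in names])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

# Forward selection: at each step add the predictor that most reduces SSE.
selected = []
remaining = list(X_pool)
for _ in range(2):                 # hypothetical stopping rule: keep 2 vars
    best = min(remaining, key=lambda v: sse(selected + [v]))
    selected.append(best)
    remaining.remove(best)
```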