Fundamentals of Business Research (Liiketaloustieteellisen tutkimuksen perusteet)

GENERAL LINEAR MODELS
Oneway ANOVA, GLM Univariate (n-way ANOVA, ANCOVA)
BASICS
Dependent variable is continuous
Independent variables are nominal or categorical (factor, CLASS) or continuous (covariate)
Are the group means of the dependent variable different across the groups defined by the independent variables?
Main effects, interactions and nested effects
Often used for testing hypotheses with experimental data
BASICS
                     Factor A (industry)
Factor B (size)      Level 1 (manufact)    Level 2 (trade)
Level 1 (small)      cell                  cell
Level 2 (medium)     cell                  cell
Level 3 (large)      cell                  cell
3 x 2 full factorial design (full: each cell has observations)
Balanced design: each cell has an equal number of observations
ASSUMPTIONS
Enough observations in each group? (n > 20)
Independence of observations
Similarity of variance-covariance matrices (no problem if the largest group variance < 1.5 * the smallest group variance, or < 4 * the smallest if the design is balanced)
Normality
Linearity
No outlier observations
STEPS OF INTERPRETATION
Model significance?
F-test and R square
Welch, if the group variances are unequal (this can be tested using the Levene or Brown-Forsythe test)
Significance of effects? (F-test and partial eta squared)
Which group differences are significant? Post hoc or contrast tests
What are the group differences like? Estimated marginal means for groups
Oneway ANOVA
A continuous dependent variable (y) and one categorical independent variable (x) with at least 3 categories; k = number of categories
Assumptions: y is normally distributed with equal variance in each x category
H0: the mean of y is the same in all x categories
The variance of y is divided into two components: within groups (error) and between groups (model, treatment)
Test statistic = between mean square / within mean square, which follows an F-distribution with k-1 and n-k degrees of freedom
The F-test can be replaced by the Welch test if the variances are unequal
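Written out, the variance decomposition and test statistic described above take the following form. The symbols $n_j$, $\bar{y}_j$ and $\bar{y}$ (size of group j, mean of group j, grand mean) are notation introduced here for illustration and do not appear in the slides.

```latex
\begin{aligned}
SS_{\text{total}} &= \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}\right)^2
  = SS_{\text{between}} + SS_{\text{within}},\\
SS_{\text{between}} &= \sum_{j=1}^{k} n_j\left(\bar{y}_j-\bar{y}\right)^2,
\qquad
SS_{\text{within}} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_j\right)^2,\\
F &= \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(n-k)}
  \;\sim\; F(k-1,\; n-k) \quad \text{under } H_0 .
\end{aligned}
```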
Oneway ANOVA
If the F-test is significant, you can use post hoc tests for pairwise comparison of means across the groups
Alternatively (in experiments) you can define contrasts ex ante
Contrast Coefficients
            hair color (hiusten väri)
Contrast    blond (vaalea)    dark (tumma)    red (punainen)    bald (kalju)
1           1                 0               -1                0
2           0.5               0.5             -1                0
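As a minimal sketch of how such contrast coefficients are used, the Python snippet below checks that each row sums to zero and evaluates the contrast as a weighted sum of group means. The coefficients are read from the table above in the order blond (vaalea), dark (tumma), red (punainen), bald (kalju); the group means are hypothetical numbers invented only so that the example runs.

```python
# Contrast coefficients from the table above
# (group order: blond, dark, red, bald).
contrasts = {
    "contrast 1": [1.0, 0.0, -1.0, 0.0],  # blond vs. red
    "contrast 2": [0.5, 0.5, -1.0, 0.0],  # average of blond and dark vs. red
}

# HYPOTHETICAL group means, not taken from the slides.
group_means = [4.0, 5.0, 3.0, 6.0]


def contrast_estimate(coefficients, means):
    """Return the linear combination sum(c_j * mean_j)."""
    return sum(c * m for c, m in zip(coefficients, means))


for name, coefficients in contrasts.items():
    # A valid contrast has coefficients that sum to zero.
    assert abs(sum(coefficients)) < 1e-12
    print(name, contrast_estimate(coefficients, group_means))
```

Testing whether such an estimate differs from zero additionally needs its standard error, which depends on the error mean square and the group sizes; that step is left out here.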
SAS: oneway ANOVA
Welch: use this instead of F if variances are not equal
BF (Brown-Forsythe) or Levene, H0: group variances are equal
Post hoc tests
MODEL FIT
Class Level Information
Class              Levels    Values
class_popgrowth    4         1 2 3 4

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3    298.3992640       99.4664213     13.41      <.0001
Error              68    504.4139305        7.4178519
Corrected Total    71    802.8131944

R-Square    Coeff Var    Root MSE    deathrate Mean
0.371692    34.10981     2.723573    7.984722
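The fit statistics in this output all follow from the sums of squares and degrees of freedom. The Python sketch below recomputes them from the values printed above, which is a useful sanity check when reading a SAS table; the variable names are illustrative only.

```python
import math

# Values copied from the ANOVA table above.
ss_model, df_model = 298.3992640, 3
ss_error, df_error = 504.4139305, 68
mean_y = 7.984722  # "deathrate Mean"

ss_total = ss_model + ss_error         # corrected total sum of squares
ms_model = ss_model / df_model         # between-groups mean square
ms_error = ss_error / df_error         # within-groups mean square
f_value = ms_model / ms_error          # F statistic with (3, 68) df
r_square = ss_model / ss_total         # share of variance explained
root_mse = math.sqrt(ms_error)         # residual standard deviation
coeff_var = 100 * root_mse / mean_y    # coefficient of variation, in %

print(f"F = {f_value:.2f}, R-Square = {r_square:.6f}")
print(f"Root MSE = {root_mse:.6f}, Coeff Var = {coeff_var:.5f}")
```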
EQUALITY OF VARIANCES
Levene's Test for Homogeneity of deathrate Variance
ANOVA of Squared Deviations from Group Means
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
class_popgrowth     3     3004.3           1001.4         4.23       0.0084
Error              68    16110.6            236.9

Welch's ANOVA for deathrate
Source             DF         F Value    Pr > F
class_popgrowth     3.0000    13.00      <.0001
Error              18.9519
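Levene's test rejects equal variances here (p = 0.0084), so the Welch test is the one to read. As a sketch of what Welch's ANOVA computes, the Python function below implements the standard Welch formula from group sizes, means and standard deviations only, using the figures reported in the GROUP MEANS output of this document. The function name and layout are illustrative, not SAS internals.

```python
def welch_anova(groups):
    """Welch's F from (n, mean, std_dev) triples; returns (F, df1, df2)."""
    k = len(groups)
    # Each group is weighted by n / variance (precision of its mean).
    weights = [n / sd ** 2 for n, _, sd in groups]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, (_, m, _) in zip(weights, groups)) / total_weight

    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, (_, m, _) in zip(weights, groups)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, (n, _, _) in zip(weights, groups)
    )
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, df1, df2


# (N, Mean, Std Dev) for class_popgrowth levels 1-4, from the GROUP MEANS output.
deathrate_groups = [
    (27, 10.5666667, 3.01457996),
    (22, 6.9272727, 1.60064922),
    (17, 5.9705882, 1.87941637),
    (6, 5.9500000, 5.61809576),
]

welch_f, welch_df1, welch_df2 = welch_anova(deathrate_groups)
print(f"Welch F = {welch_f:.2f}, df = ({welch_df1}, {welch_df2:.4f})")
```

The result should agree with the F value and error degrees of freedom in the Welch table above up to rounding.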
GROUP MEANS
Level of                 deathrate
class_popgrowth    N     Mean          Std Dev
1                  27    10.5666667    3.01457996
2                  22     6.9272727    1.60064922
3                  17     5.9705882    1.87941637
4                   6     5.9500000    5.61809576
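The group table contains enough information to rebuild the between-groups and within-groups sums of squares and to apply the variance-ratio rule of thumb from the ASSUMPTIONS slide. The Python sketch below does both; variable names are illustrative.

```python
# (N, Mean, Std Dev) for class_popgrowth levels 1-4, from the table above.
groups = [
    (27, 10.5666667, 3.01457996),
    (22, 6.9272727, 1.60064922),
    (17, 5.9705882, 1.87941637),
    (6, 5.9500000, 5.61809576),
]

n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

# Between-groups SS: squared distance of each group mean from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
# Within-groups SS: (n - 1) times each group's sample variance.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

# Rule of thumb: largest variance should be below 1.5 times the smallest
# (this design is unbalanced, so the stricter limit applies).
variances = [sd ** 2 for _, _, sd in groups]
variance_ratio = max(variances) / min(variances)

print(f"grand mean = {grand_mean:.6f}")
print(f"SS between = {ss_between:.4f}, SS within = {ss_within:.4f}")
print(f"largest / smallest variance = {variance_ratio:.2f}")
```

The two sums of squares should match the Model and Error rows of the MODEL FIT table, and the variance ratio is far above 1.5, in line with the significant Levene test.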
POST HOC TEST
Comparisons significant at the 0.05 level are indicated by ***.
class_popgrowth    Difference Between    Simultaneous 95%
Comparison         Means                 Confidence Limits
1-2                 3.6394                1.5135     5.7653    ***
1-3                 4.5961                2.3044     6.8877    ***
1-4                 4.6167                1.2760     7.9573    ***
2-1                -3.6394               -5.7653    -1.5135    ***
2-3                 0.9567               -1.4335     3.3468
2-4                 0.9773               -2.4317     4.3862
3-1                -4.5961               -6.8877    -2.3044    ***
3-2                -0.9567               -3.3468     1.4335
3-4                 0.0206               -3.4942     3.5353
4-1                -4.6167               -7.9573    -1.2760    ***
4-2                -0.9773               -4.3862     2.4317
4-3                -0.0206               -3.5353     3.4942
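The *** flags correspond to simultaneous confidence intervals that exclude zero. The short Python sketch below applies that rule to the six distinct pairs from the output above (the mirrored rows such as 2-1 carry the same information with the sign flipped).

```python
# (difference, lower limit, upper limit) for each distinct pair, from the table above.
comparisons = {
    (1, 2): (3.6394, 1.5135, 5.7653),
    (1, 3): (4.5961, 2.3044, 6.8877),
    (1, 4): (4.6167, 1.2760, 7.9573),
    (2, 3): (0.9567, -1.4335, 3.3468),
    (2, 4): (0.9773, -2.4317, 4.3862),
    (3, 4): (0.0206, -3.4942, 3.5353),
}


def excludes_zero(lower, upper):
    """A difference is flagged when its interval does not contain zero."""
    return lower > 0 or upper < 0


significant_pairs = [
    pair for pair, (_, lower, upper) in comparisons.items()
    if excludes_zero(lower, upper)
]
print(significant_pairs)
```

Only the comparisons involving group 1 are flagged; groups 2, 3 and 4 cannot be separated from each other.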
BOXPLOTS
Multiway ANOVA, GLM
A continuous dependent variable y and two or more categorical independent variables (factorial design)
ANCOVA, if there are continuous independent variables (covariates)
Main effects and interaction effects can be modeled
A factor is fixed if all of its groups are present in the data, and random if only a random sample of its groups is represented in the data
Eta squared = SSK/SST (effect sum of squares / total sum of squares) expresses what percentage of the variance in y is explained by x (not available in the EG (SAS Enterprise Guide) menus; use SAS code: model y = x1 x2 / ss3 EFFECTSIZE;)
INTERACTION EFFECT
Synergy of two factors: the effect of one factor is different in the groups of the other factor
Crossing effect = interaction effect
Ordinal (the lines in the means plot have different slopes but do not cross)
Disordinal (the lines cross in the means plot)
NO INTERACTION
[Means plot: mean of profitability (y-axis, 0-40) by size (small, medium, large), with separate lines for manufact and trade]
Size and industry both have a significant main effect
No interaction, homogeneity of slopes
INTERACTIONS
[Two means plots: mean of profitability (y-axis, 0-50) by size (small, medium, large), with separate lines for manufact and trade]
Ordinal interaction (the effect of size is stronger in manufacturing than in trade)
Disordinal interaction (the effect of size has a different sign in manufacturing and trade)
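The distinction between no interaction, ordinal and disordinal interaction can be read off the cell means of a two-line means plot. The Python sketch below is a purely descriptive classifier under that definition; the function name and all cell means are hypothetical, since the plotted values are not recoverable from the slides.

```python
def classify_interaction(means_line_1, means_line_2, tol=1e-9):
    """Classify a two-line means plot from cell means at the same x levels."""
    # Vertical gap between the two lines at each level of the x-axis factor.
    gaps = [a - b for a, b in zip(means_line_1, means_line_2)]
    if max(gaps) - min(gaps) <= tol:
        return "no interaction"   # parallel lines, constant gap
    if min(gaps) < -tol and max(gaps) > tol:
        return "disordinal"       # gap changes sign, so the lines cross
    return "ordinal"              # different slopes, but no crossing


# HYPOTHETICAL cell means for size = small, medium, large.
parallel = classify_interaction([20, 30, 40], [10, 20, 30])
ordinal = classify_interaction([20, 35, 50], [10, 15, 20])
disordinal = classify_interaction([10, 30, 50], [40, 30, 20])
print(parallel, ordinal, disordinal)
```

This only describes the pattern in the sample means; whether the interaction is statistically significant is decided by the F-test of the interaction term.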
NESTED EFFECTS
Nested effect B(A): "B nested within A"
size(industry): the effect of size is estimated separately for each industry group
The difference between a nested effect and an interaction effect is that the main effect of B (size) is not included
The slope of B (size) is different in each category of A (industry)
ESTIMATED GROUP MEANS
Estimated marginal means or LS (least squares) means
Predicted group means are calculated using the estimated model coefficients
The effects of the other independent variables are controlled for
They are generally not equal to the raw group means in the sample
SUM OF SQUARES
Type I SS does not control for the effects of independent variables that are specified later in the model
Type II SS controls for the effects of all other independent variables
Types III and IV SS are better in unbalanced designs; use Type IV if there are empty cells
POST HOC TESTS
Multiple comparison procedures, mean separation tests
The idea is to control the inflated risk of Type I error that results from doing many pairwise tests, each at the 5% risk level
E.g. Bonferroni, Scheffe, Sidak, ...
Tukey-Kramer is usually the most powerful of these for pairwise comparisons
H0: equal group means -> rejection means that the group means are not equal, but failure to reject does not necessarily mean that they are equal (small sample size -> low power -> failure to reject the null)
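To illustrate why unadjusted pairwise tests are risky, the Python sketch below computes the familywise Type I error rate for all pairwise comparisons among 4 groups, together with the per-test levels used by the Bonferroni and Sidak corrections. The familywise formula assumes independent tests, which pairwise comparisons are not, so treat it as an approximation.

```python
from math import comb

alpha = 0.05                     # nominal risk level of each single test
n_groups = 4
n_tests = comb(n_groups, 2)      # number of pairwise comparisons

# Probability of at least one false rejection if the tests were independent.
familywise_error = 1 - (1 - alpha) ** n_tests

# Per-test significance levels that keep the familywise risk at alpha.
bonferroni_alpha = alpha / n_tests
sidak_alpha = 1 - (1 - alpha) ** (1 / n_tests)

print(f"{n_tests} tests, familywise error about {familywise_error:.3f}")
print(f"Bonferroni per-test alpha = {bonferroni_alpha:.5f}")
print(f"Sidak per-test alpha      = {sidak_alpha:.5f}")
```

With six comparisons the uncorrected familywise risk is roughly a quarter rather than 5%, and Sidak's per-test level is slightly less strict than Bonferroni's.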
ANCOVA
The model includes a covariate (= a continuous independent variable, often one whose effect you want to control for)
Regress y on the covariate -> then ANOVA with the factors explaining the residual
The relationship between the covariate and y must be linear, and the slope is assumed to be the same at all factor levels
The covariate and the factor should not be strongly related to each other
Do not include too many covariates: max 0.1*n - (k-1), where n is the sample size and k is the number of groups
SAS: Analyze - ANOVA - Linear Models
Effects to be estimated (for an interaction, first select both variables, then click Cross)
Sums of squares
Other options, defaults ok
Post hoc tests
Plots
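SAS code
PROC GLM DATA=libname.datafilename
    /* Diagnostic, residual and interaction plots */
    PLOTS(ONLY)=DIAGNOSTICS(UNPACK)
    PLOTS(ONLY)=RESIDUALS
    PLOTS(ONLY)=INTPLOT
    ;
    /* Categorical factors: life-cycle phase and family-firm indicator */
    CLASS Elinkaari Perheyr;
    /* Covariate ln_hlo, both main effects and their interaction */
    MODEL growthorient = ln_hlo Elinkaari Perheyr Elinkaari*Perheyr
        / SS3 SOLUTION SINGULAR=1E-07 EFFECTSIZE;
    /* LS-means with Bonferroni-adjusted pairwise p-values */
    LSMEANS Elinkaari Perheyr Elinkaari*Perheyr / PDIFF ADJUST=BON;
RUN;
QUIT;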
Model significance and fit
Class Level Information
Class                Levels    Values
Elinkaari (phase)    3         2 3 4
Perheyr (family)     2         0 1

Number of Observations Read    181
Number of Observations Used    132

Source             DF     Sum of Squares    Mean Square    F Value    Pr > F
Model                6    13.03085542       2.17180924     3.59       0.0026
Error              125    75.69810081       0.60558481
Corrected Total    131    88.72895623

R-Square    Coeff Var    Root MSE    growthorient Mean
0.146861    21.79382     0.778193    3.570707
Significance of predictors
Source                              DF    Type III SS    Mean Square    F Value    Pr > F
ln_hlo (employees)                  1     2.88693851     2.88693851     4.77       0.0309
Elinkaari (phase)                   2     9.52176337     4.76088169     7.86       0.0006
Perheyr (family)                    1     0.28960870     0.28960870     0.48       0.4905
Elinkaari*Perheyr (Phase*Family)    2     1.99071120     0.99535560     1.64       0.1974
EFFECT SIZE OF PREDICTORS
Total Variation Accounted For
                     Semipartial    Semipartial     Conservative 95%
Source               Eta-Square     Omega-Square    Confidence Limits
ln_hlo               0.0325          0.0255         0.0000    0.1112
Elinkaari            0.1073          0.0930         0.0219    0.2056
Perheyr              0.0033         -0.0035         0.0000    0.0488
Elinkaari*Perheyr    0.0224          0.0087         0.0000    0.0842

Partial Variation Accounted For
                     Partial       Partial         95%
Source               Eta-Square    Omega-Square    Confidence Limits
ln_hlo               0.0367         0.0277         0.0000    0.1158
Elinkaari            0.1117         0.0942         0.0225    0.2073
Perheyr              0.0038        -0.0040         0.0000    0.0503
Elinkaari*Perheyr    0.0256         0.0097         0.0000    0.0887
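The eta-square columns can be reproduced from the Type III sums of squares in the "Significance of predictors" output and the Error and Corrected Total rows of the model fit table. The Python sketch below shows the arithmetic; the dictionary layout and names are illustrative only.

```python
# From the model fit table.
ss_total = 88.72895623   # Corrected Total
ss_error = 75.69810081   # Error
ms_error = 0.60558481    # Error mean square

# Source: (DF, Type III SS), from the "Significance of predictors" table.
type3 = {
    "ln_hlo": (1, 2.88693851),
    "Elinkaari": (2, 9.52176337),
    "Perheyr": (1, 0.28960870),
    "Elinkaari*Perheyr": (2, 1.99071120),
}

effect_sizes = {}
for source, (df, ss) in type3.items():
    effect_sizes[source] = {
        "F": (ss / df) / ms_error,            # effect mean square / error mean square
        "eta_sq": ss / ss_total,              # semipartial eta-square
        "partial_eta_sq": ss / (ss + ss_error),
    }

for source, stats in effect_sizes.items():
    print(source, {name: round(value, 4) for name, value in stats.items()})
```

Partial eta-square is always at least as large as the semipartial version, because its denominator leaves out the variation attributed to the other effects.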
Parameter estimates
Parameter                  Estimate             Standard Error    t Value    Pr > |t|
Intercept                   3.196306815    B    0.49826714         6.41      <.0001
ln_hlo (employees)          0.161079578         0.07377500         2.18      0.0309
Elinkaari 2 (growth)        0.372704251    B    0.49030119         0.76      0.4486
Elinkaari 3 (mature)       -0.041166136    B    0.46224369        -0.09      0.9292
Elinkaari 4 (decline)       0.000000000    B    .                  .         .
Perheyr 0 (non family)     -0.862973482    B    0.92404272        -0.93      0.3522
Perheyr 1 (family)          0.000000000    B    .                  .         .
Elinkaari*Perheyr 2 0       1.250588328    B    0.98491805         1.27      0.2065
Elinkaari*Perheyr 2 1       0.000000000    B    .                  .         .
Elinkaari*Perheyr 3 0       0.654885600    B    0.94241380         0.69      0.4884
Elinkaari*Perheyr 3 1       0.000000000    B    .                  .         .
Elinkaari*Perheyr 4 0       0.000000000    B    .                  .         .
Elinkaari*Perheyr 4 1       0.000000000    B    .                  .         .
Prediction for 6 cells
Elinkaari=2 & Perheyr=0 (growth phase, non family): Growth = 3.20 + 0.16*ln_hlo + 0.37 - 0.86 + 1.25 = 3.96 + 0.16*ln_hlo
Elinkaari=3 & Perheyr=0 (mature phase, non family): Growth = 3.20 + 0.16*ln_hlo - 0.04 - 0.86 + 0.65 = 2.95 + 0.16*ln_hlo
Elinkaari=4 & Perheyr=0 (decline phase, non family): Growth = 3.20 + 0.16*ln_hlo + 0.00 - 0.86 + 0.00 = 2.34 + 0.16*ln_hlo
Elinkaari=2 & Perheyr=1 (growth phase, family): Growth = 3.20 + 0.16*ln_hlo + 0.37 + 0.00 + 0.00 = 3.57 + 0.16*ln_hlo
Elinkaari=3 & Perheyr=1 (mature phase, family): Growth = 3.20 + 0.16*ln_hlo - 0.04 + 0.00 + 0.00 = 3.16 + 0.16*ln_hlo
Elinkaari=4 & Perheyr=1 (decline phase, family): Growth = 3.20 + 0.16*ln_hlo + 0.00 + 0.00 + 0.00 = 3.20 + 0.16*ln_hlo
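The six prediction equations can be generated mechanically from the parameter estimates: add the intercept, the phase effect, the family effect and the matching interaction term, then add the covariate slope times ln_hlo. The Python sketch below does this with the unrounded estimates from the parameter table; the function names are illustrative only.

```python
# Parameter estimates copied from the SAS output; reference levels have estimate 0.
INTERCEPT = 3.196306815
SLOPE_LN_HLO = 0.161079578
PHASE = {2: 0.372704251, 3: -0.041166136, 4: 0.0}
FAMILY = {0: -0.862973482, 1: 0.0}
PHASE_BY_FAMILY = {
    (2, 0): 1.250588328, (3, 0): 0.654885600, (4, 0): 0.0,
    (2, 1): 0.0, (3, 1): 0.0, (4, 1): 0.0,
}


def cell_intercept(phase, family):
    """Constant part of the prediction for one phase-by-family cell."""
    return INTERCEPT + PHASE[phase] + FAMILY[family] + PHASE_BY_FAMILY[(phase, family)]


def predict(phase, family, ln_hlo):
    """Predicted growth orientation for a firm in the given cell."""
    return cell_intercept(phase, family) + SLOPE_LN_HLO * ln_hlo


for family in (0, 1):
    for phase in (2, 3, 4):
        print(f"Elinkaari={phase}, Perheyr={family}: "
              f"{cell_intercept(phase, family):.2f} + {SLOPE_LN_HLO:.2f}*ln_hlo")
```

With unrounded estimates the decline, non-family cell has a constant of about 2.33; the 2.34 in the equations above comes from adding the already rounded terms 3.20 and -0.86.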
Parameter estimates
The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
This note always appears when the model contains categorical independent variables (CLASS effects); SAS can nevertheless estimate the coefficients
[Diagnostic plots, annotated as: homoskedasticity, outlier diagnostics, residual distribution, model fit, influence diagnostics, residual vs. covariate]
Significance of group differences, main effects
Elinkaari (phase)    growthorient LSMEAN    LSMEAN Number
2 growth             4.14643211             1
3 mature             3.43471035             2
4 decline            3.14843369             3

Least Squares Means for effect Elinkaari
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: growthorient
i/j    1         2         3
1                0.0006    0.1225
2      0.0006              1.0000
3      0.1225    1.0000

Perheyr (family)    growthorient LSMEAN    H0: LSMean1=LSMean2, Pr > |t|
0                   3.46261763             0.4905
1                   3.69043314
Significance of group differences, interaction
Phase        Family    growthorient LSMEAN    LSMEAN Number
2 growth     0         4.34023953             1
2 growth     1         3.95262468             2
3 mature     0         3.33066641             3
3 mature     1         3.53875430             4
4 decline    0         2.71694695             5
4 decline    1         3.57992043             6

Least Squares Means for effect Elinkaari*Perheyr
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: growthorient
i/j    1         2         3         4         5         6
1                1.0000    0.0161    0.1052    0.8474    1.0000
2      1.0000              0.1040    0.8177    1.0000    1.0000
3      0.0161    0.1040              1.0000    1.0000    1.0000
4      0.1052    0.8177    1.0000              1.0000    1.0000
5      0.8474    1.0000    1.0000    1.0000              1.0000
6      1.0000    1.0000    1.0000    1.0000    1.0000

Non-family firms in the growth phase differ from non-family firms in the mature phase (LSMEAN 1 vs. 3, p = 0.0161)
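In this output the main-effect LS-means equal the unweighted averages of the six cell LS-means, which shows concretely why they differ from raw sample means that weight cells by their size. The Python sketch below reproduces the Elinkaari and Perheyr LS-means from the cell values; the names are illustrative only.

```python
# Cell LS-means keyed by (phase, family), from the interaction output above.
cell_lsmeans = {
    (2, 0): 4.34023953, (2, 1): 3.95262468,
    (3, 0): 3.33066641, (3, 1): 3.53875430,
    (4, 0): 2.71694695, (4, 1): 3.57992043,
}

phases = (2, 3, 4)
families = (0, 1)

# Each cell gets equal weight, regardless of how many firms it contains.
phase_lsmeans = {
    p: sum(cell_lsmeans[(p, f)] for f in families) / len(families) for p in phases
}
family_lsmeans = {
    f: sum(cell_lsmeans[(p, f)] for p in phases) / len(phases) for f in families
}

print(phase_lsmeans)
print(family_lsmeans)
```

The results should agree, up to rounding, with the LS-means reported for the main effects of Elinkaari and Perheyr.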
REPORTING GLM
Model fit: F + df + p and R square
Nature and significance of effects: parameter estimates B + s.e. + t + p, and F + p
Estimated group means (means plot)
Post hoc test results
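Means plot
[Means plot: growth orientation (kasvuhakuisuus, y-axis 1-5) by life-cycle phase (growth / kasvuvaihe, mature / vakiintunut, decline / loppumassa), with separate lines for family firms (perheyr) and non-family firms (ei-perheyr)]
Employees at its mean value (20)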