PowerPoint Slides for Unit 3


Covariance
[Figure: a point (x, y) plotted on x and y axes, above both averages: x - x̄ > 0 and y - ȳ > 0]
1
Covariance
[Figure: a point (x, y) plotted on x and y axes, below the average of x and above the average of y: x - x̄ < 0 and y - ȳ > 0]
2
Covariance
[Figure: x and y axes divided into four quadrants at the averages of x and y]
Upper left: below-average values of x occur with above-average values of y
Upper right: above-average values of x occur with above-average values of y
Lower left: below-average values of x occur with below-average values of y
Lower right: above-average values of x occur with below-average values of y
So what happens on balance?
3
Covariance
What happens on balance?
Calculate the average of the products of the deviations, (x - x̄)(y - ȳ).
4
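A minimal sketch of the "on balance" calculation, assuming the usual sample covariance: the products of the paired deviations are summed and divided by n - 1 (divide by n instead for the plain average). The data values and the function name are hypothetical and chosen only for illustration.

```python
def covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # A product is positive when x and y fall on the same side of their
    # averages and negative when they fall on opposite sides.
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s_xy = covariance(xs, ys)
print(s_xy)  # positive when the same-side quadrants dominate
```

A positive result means the "above with above" and "below with below" quadrants outweigh the other two; a negative result means the reverse.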
Covariance Example
[Figure: scatter plot of Wage (y) against Aptitude (x); Sxy = 1.999]
6
Correlation
[Figure: scatter plot of Wage (y) against Aptitude (x); rxy = 0.476]
7
Perfect Correlation
8
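The correlation coefficient rescales covariance by the two standard deviations, so it always lies between -1 and 1. The sketch below assumes the standard sample formulas; the data are hypothetical, and the second data set lies exactly on a line to illustrate perfect correlation.

```python
import math


def correlation(xs, ys):
    """Sample correlation: covariance divided by both standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
r_noisy = correlation(xs, [2, 4, 5, 4, 5])             # scattered points
r_perfect = correlation(xs, [2 * x + 1 for x in xs])   # points exactly on a line
print(round(r_noisy, 3), round(r_perfect, 3))
```

Unlike covariance, the result does not depend on the units of x or y.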
Fit That Line!
[Figure: scatter plot of Annual Income (y) against Education (years) (x), with three candidate lines]
y = 2,500 + 1,800x
y = 10,000 + 1,000x
y = 13,000 + 750x
9
Fit That Line!
[Figure: scatter plot of Annual Income (y) against Education (years) (x), with the fitted line]
y = 8,135 + 1,233x minimizes the sum of squared errors
10
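A short sketch of how a least-squares line is obtained and why it beats any hand-drawn candidate. It assumes the textbook closed-form solution for one explanatory variable; the education and income values are hypothetical, so the fitted coefficients differ from those on the slide.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the data and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the slides.
education = [10, 12, 14, 16, 18]
income = [20000, 24000, 25000, 29000, 30000]

intercept, slope = fit_line(education, income)
best = sum_squared_errors(education, income, intercept, slope)

# The three candidate lines shown on the earlier slide, as (intercept, slope).
candidates = [(2500, 1800), (10000, 1000), (13000, 750)]
for a, b in candidates:
    print(a, b, sum_squared_errors(education, income, a, b) >= best)
```

No other straight line can produce a smaller sum of squared errors on the same data than the one returned by `fit_line`.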
Word Problem
• Students in a small class were polled by a researcher
attempting to establish a relationship between hours of
study in a week preceding a test and the result of the
test.
• If you get data on hours studied and exam results,
which variable is the dependent variable? Why?
11
Word Problem
[Figure: scatter plot of Exam Score (y) against Hours Studied (x), with the fitted line y = 39.401 + 2.122x]
12
Word Problem
Excel Regression Output (Data Analysis Add-In)

Regression Statistics
Multiple R        0.770
R Squared         0.594
Adj. R Squared    0.543
Standard Error   10.710
Obs.                 10

ANOVA
             df         SS         MS        F   Significance
Regression    1   1340.452   1340.452   11.686          0.009
Residual      8    917.648    114.706
Total         9   2258.100

             Coeff.   Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept    39.401       12.153    3.242     0.012      11.375      67.426
hours         2.122        0.621    3.418     0.009       0.691       3.554
13
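The summary statistics in the Excel output all follow from the ANOVA sums of squares and degrees of freedom. The sketch below recomputes them as a cross-check, assuming the standard definitions; the variable names are illustrative only.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_reg, ss_res, ss_tot = 1340.452, 917.648, 2258.100
df_reg, df_res = 1, 8
df_tot = df_reg + df_res

ms_reg = ss_reg / df_reg          # mean square = SS / df
ms_res = ss_res / df_res

r_squared = ss_reg / ss_tot                             # share of variation explained
adj_r_squared = 1 - (ss_res / df_res) / (ss_tot / df_tot)
std_error = ms_res ** 0.5                               # standard error of the estimate
f_stat = ms_reg / ms_res

# t statistic for the slope: coefficient divided by its standard error.
# With the rounded inputs shown in the table this comes out near, not exactly at, 3.418.
t_hours = 2.122 / 0.621

print(round(r_squared, 3), round(adj_r_squared, 3),
      round(std_error, 3), round(f_stat, 3), round(t_hours, 3))
```

With a single explanatory variable, the F statistic is the square of the slope's t statistic, up to rounding.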
Word Problem
Excel Regression Output (StatPad Add-In)

Regression analysis to predict score from hours.
The prediction equation is:
Score = 39.401 + 2.122 hours

R squared                     0.594
Standard error of estimate   10.710
Number of observations           10
F statistic                  11.686
P value                       0.009

            Coeff   95% LowerCI   95% UpperCI   StdErr       t       p   Significant
Constant   39.401        11.375        67.426   12.153   3.242   0.012   Yes (p<0.05)
hours       2.122         0.691         3.554    0.621   3.418   0.009   Yes (p<0.05)
14
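The prediction equation can be applied directly. This is a minimal sketch; the function name and the hours values passed to it are hypothetical.

```python
INTERCEPT = 39.401    # predicted score at zero hours of study
SLOPE_HOURS = 2.122   # predicted change in score per additional hour


def predict_score(hours):
    """Predicted exam score from the fitted equation Score = 39.401 + 2.122 hours."""
    return INTERCEPT + SLOPE_HOURS * hours


print(round(predict_score(10), 3))
print(round(predict_score(11) - predict_score(10), 3))  # one extra hour adds the slope
```

Predictions are most trustworthy for hours within the range observed in the data.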
The Nine Lives of Goldfish

Regression Statistics
Multiple R        0.671
R Squared         0.450
Adj. R Squared    0.340
Standard Error   45.214
Obs.                  7

ANOVA
             df          SS         MS       F   Significance
Regression    1    8360.048   8360.048   4.089          0.099
Residual      5   10221.667   2044.333
Total         6   18581.714

             Coeff.   Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept    91.500       22.607    4.047     0.010      33.387     149.613
filter      -69.833       34.533   -2.022     0.099    -158.603      18.936
15
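The filter coefficient is large but not statistically significant at the 5% level: its 95% confidence interval includes zero. The sketch below rebuilds the t statistic and the interval from the reported coefficient and standard error. The critical value 2.5706 (Student's t, 5 residual degrees of freedom, two-sided 95%) is supplied by hand as an assumption rather than computed.

```python
coef, std_err = -69.833, 34.533   # filter row of the output
T_CRIT = 2.5706                   # assumed: t critical value for 5 df, 95% two-sided

t_stat = coef / std_err
lower = coef - T_CRIT * std_err
upper = coef + T_CRIT * std_err

# The estimate is significant at the 5% level only if the interval excludes zero.
significant = not (lower <= 0 <= upper)

print(round(t_stat, 3), round(lower, 3), round(upper, 3), significant)
```

With only 7 observations the interval is wide, which is why a p value of 0.099 goes with such a large point estimate.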
Predicting Job Performance

Regression Statistics
R Squared        0.107
Adj. R Squared   0.107
Standard Error   1.955
Obs.              3525

ANOVA
               df          SS        MS         F   Significance
Regression      3    1620.806   540.269   141.287          0.000
Residual     3521   13463.982     3.824
Total        3524   15084.788

             Coeff.   Std. Error    t stat   p value   Lower 95%   Upper 95%
Intercept     4.865        0.171    28.423     0.000       4.529       5.200
Age          -0.037        0.002   -20.263     0.000      -0.041      -0.034
Seniority     0.011        0.003     3.325     0.001       0.004       0.017
Cognitive    -0.032        0.033    -0.983     0.326      -0.097       0.032

Simple Regression: Perform = 3.956 - 0.022 age
16
Predicting Job Performance
Perform = 4.865 - 0.037 age + 0.011 seniority - 0.032 cognitive

Age                        35       36       45       46
Seniority                  10       10       10       10
Cognitive                   1        1        1        1
Predicted Performance   3.626    3.589    3.251    3.214
Net Difference                  -0.037            -0.037

Age                        35       35
Seniority                  20       21
Cognitive                   1        1
Predicted Performance   3.731    3.742
Net Difference                   0.011

Note importance of ceteris paribus (all else constant)
17
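The ceteris paribus reading of a coefficient can be checked by changing one variable while holding the others fixed. The sketch below uses the rounded coefficients from the equation, so its predicted levels differ slightly from the table (about 3.648 rather than 3.626 at age 35), presumably because the table was built from unrounded coefficients; that explanation is an assumption. The function name is hypothetical.

```python
def perform(age, seniority, cognitive):
    """Predicted performance from the rounded multiple-regression equation."""
    return 4.865 - 0.037 * age + 0.011 * seniority - 0.032 * cognitive


# One more year of age, all else constant.
age_effect = perform(36, 10, 1) - perform(35, 10, 1)

# The same comparison at different fixed values of the other variables.
age_effect_elsewhere = perform(36, 20, -1) - perform(35, 20, -1)

# One more year of seniority, all else constant.
seniority_effect = perform(35, 21, 1) - perform(35, 20, 1)

print(round(age_effect, 3), round(age_effect_elsewhere, 3), round(seniority_effect, 3))
```

In a linear model the net difference equals the coefficient regardless of where the other variables are held, as long as they are held constant.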
Predicting Job Performance
Perform = 4.865 - 0.037 age + 0.011 seniority - 0.032 cognitive
And holding seniority constant at 10 and cognitive constant at 1
[Figure: line plot of predicted Performance against Age, ages 30 to 50]
18
Predicting Job Performance
Perform = 4.865 - 0.037 age + 0.011 seniority - 0.032 cognitive
And holding seniority constant at 20 and cognitive constant at -1
[Figure: line plot of predicted Performance against Age, ages 30 to 50]
With linear models, the particular values of the other variables don't matter, only that all else is held constant
19
Predicting Job Perf. With a Dummy Variable
Structured Interview Dummy Variable: 1=yes, 0=no

Regression Statistics
R Squared        0.110
Adj. R Squared   0.109
Standard Error   1.953
Obs.              3525

ANOVA
               df          SS        MS         F   Significance
Regression      4    1657.286   414.321   108.614          0.000
Residual     3520   13427.502     3.815
Total        3524   15084.788

                  Coeff.   Std. Error    t stat   p value   Lower 95%   Upper 95%
Intercept          4.820        0.172    28.096     0.000       4.484       5.156
Age               -0.037        0.002   -20.231     0.000      -0.041      -0.034
Seniority          0.010        0.003     3.271     0.001       0.004       0.017
Cognitive         -0.025        0.033    -0.756     0.450      -0.090       0.040
Structured int.    2.850        0.922     3.092     0.002       1.043       4.658
20
Predicting Job Perf. With a Dummy Variable
Perform = 4.820 - 0.037 age + 0.010 seniority - 0.025 cognitive + 2.850 structured interview

Age                        35       35       45       45
Seniority                  10       10        5        5
Cognitive                   1        1        2        2
Structured Interview        0        1        0        1
Predicted Performance   3.600    6.450    3.155    6.005
Net Difference                   2.850             2.850

Dummy variable turns “on” and “off” with all else constant.
21
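The four predictions in the table can be reproduced from the rounded equation. This is a small sketch; the function and variable names are hypothetical.

```python
def perform(age, seniority, cognitive, structured_interview):
    """Predicted performance; structured_interview is a 0/1 dummy variable."""
    return (4.820 - 0.037 * age + 0.010 * seniority
            - 0.025 * cognitive + 2.850 * structured_interview)


# The four columns of the table: (age, seniority, cognitive, structured interview).
cases = [(35, 10, 1, 0), (35, 10, 1, 1), (45, 5, 2, 0), (45, 5, 2, 1)]
predictions = [round(perform(*case), 3) for case in cases]
print(predictions)

# Switching the dummy from 0 to 1 shifts the prediction by its coefficient.
print(round(predictions[1] - predictions[0], 3),
      round(predictions[3] - predictions[2], 3))
```

The dummy adds the same amount for both workers because nothing else changes when it switches from 0 to 1.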
Predicting Job Perf. With a Dummy Variable
Perform = 4.820 - 0.037 age + 0.010 seniority - 0.025 cognitive + 2.850 structured interview
And holding seniority constant at 10 and cognitive constant at 1
[Figure: line plot of predicted Performance against Age, ages 30 to 50]
22
Predicting Job Perf. With a Dummy Variable
[Figure: line plot of predicted Performance against Age, ages 0 to 65; Seniority=20, Cognitive=0]
Note new y-intercept
23
Multiple Dummy Variables

  Source |         SS     df          MS        Number of obs =    3525
---------+--------------------------------      F( 14,  3510) =  125.63
   Model | 5035.58483     14  359.684631        Prob > F      =  0.0000
Residual | 10049.2032   3510  2.86302087        R-squared     =  0.3338
---------+--------------------------------      Adj R-squared =  0.3312
   Total | 15084.7881   3524  4.28058685        Root MSE      =   1.692

-----------------------------------------------------------------------------
 perform |      Coef.  Std. Err.         t   P>|t|       [95% Conf. Interval]
---------+-------------------------------------------------------------------
     age |  -.0301543   .0016933   -17.808   0.000      -.0334742   -.0268344
seniorty |   .0016888    .002762     0.611   0.541      -.0037265     .007104
cognitve |   .0119113   .0286362     0.416   0.677      -.0442339    .0680565
strucint |   3.665569   .7995184     4.585   0.000       2.098001    5.233137
    job1 |   1.928286   .1277788    15.091   0.000       1.677758    2.178814
    job2 |    .426524   .1260009     3.385   0.001       .1794815    .6735664
    job3 |   .1407506   .1306411     1.077   0.281      -.1153896    .3968908
    job4 |   .2921016   .1347211     2.168   0.030       .0279621    .5562411
    job5 |  -1.069262   .1331017    -8.033   0.000      -1.330227   -.8082974
    job6 |  -1.179162   .1377497    -8.560   0.000      -1.449239   -.9090839
    job7 |  -1.304191   .1406734    -9.271   0.000      -1.580001   -1.028381
    job8 |  -.8530246   .1381293    -6.176   0.000      -1.123846   -.5822027
    job9 |  -.6652395   .1501504    -4.430   0.000      -.9596304   -.3708487
   job10 |  -1.012177   .1420816    -7.124   0.000      -1.290748   -.7336058
   _cons |   5.021799   .1643372    30.558   0.000       4.699593    5.344005
-----------------------------------------------------------------------------
Note: job1-job10 are dummy variables representing 10 different job classes
(job11 is the omitted reference category)
24
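With a set of dummy variables, each coefficient measures the gap between that job class and the omitted reference class (job11), all else constant. The sketch below copies the coefficients from the output; the function name and the example worker are hypothetical.

```python
COEF = {"_cons": 5.021799, "age": -0.0301543, "seniorty": 0.0016888,
        "cognitve": 0.0119113, "strucint": 3.665569}

# job1-job10 dummies; job11 is the omitted reference category (effect 0).
JOB = {1: 1.928286, 2: 0.426524, 3: 0.1407506, 4: 0.2921016, 5: -1.069262,
       6: -1.179162, 7: -1.304191, 8: -0.8530246, 9: -0.6652395, 10: -1.012177}


def perform(age, seniority, cognitive, strucint, job_class):
    """Predicted performance for a worker in job class 1 through 11."""
    if job_class not in range(1, 12):
        raise ValueError("job_class must be between 1 and 11")
    base = (COEF["_cons"] + COEF["age"] * age + COEF["seniorty"] * seniority
            + COEF["cognitve"] * cognitive + COEF["strucint"] * strucint)
    # Exactly one dummy is 1 for classes 1-10; none is for class 11.
    return base + JOB.get(job_class, 0.0)


# Hypothetical worker: age 35, seniority 10, cognitive 0, no structured interview.
gap_job1_vs_ref = perform(35, 10, 0, 0, 1) - perform(35, 10, 0, 0, 11)
gap_job5_vs_job1 = perform(35, 10, 0, 0, 5) - perform(35, 10, 0, 0, 1)
print(round(gap_job1_vs_ref, 6), round(gap_job5_vs_job1, 6))
```

Comparing two included classes means taking the difference of their coefficients, since both are measured against job11.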
Interaction Variables

  Source |         SS     df          MS        Number of obs =    3525
---------+--------------------------------      F(  6,  3518) =  121.08
   Model | 2581.89927      6  430.316544        Prob > F      =  0.0000
Residual | 12502.8888   3518  3.55397635        R-squared     =  0.1712
---------+--------------------------------      Adj R-squared =  0.1697
   Total | 15084.7881   3524  4.28058685        Root MSE      =  1.8852

-----------------------------------------------------------------------------
 perform |      Coef.  Std. Err.         t   P>|t|       [95% Conf. Interval]
---------+-------------------------------------------------------------------
     age |      -.006   .0034204    -1.705   0.088      -.0125379    .0008743
seniorty |       .011   .0030589     3.559   0.000       .0048879    .0168827
cognitve |      -.005   .0318774    -0.167   0.867      -.0678283    .0571719
strucint |      2.129   .8937022     2.383   0.017       .3770909    3.881545
  manual |     -1.513   .2391962    -6.327   0.000      -1.982442   -1.044488
manl_age |      -.042    .004011   -10.439   0.000      -.0497349   -.0340066
   _cons |      6.009   .2354444    25.526   0.000       5.548275    6.471517
-----------------------------------------------------------------------------
Note: manual is a dummy variable indicating a manual
occupation; manl_age is age interacted with manual (i.e.
manl_age = manual*age)
25
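An interaction term lets the slope on age differ between groups. A minimal sketch using the reported coefficients on age, manual, and manl_age; the function names are hypothetical.

```python
COEF_AGE = -0.006        # age slope when manual = 0
COEF_MANUAL = -1.513     # shift in the intercept when manual = 1
COEF_MANL_AGE = -0.042   # extra age slope when manual = 1


def age_slope(manual):
    """Change in predicted performance per year of age, for manual = 0 or 1."""
    return COEF_AGE + COEF_MANL_AGE * manual


def intercept_shift(manual):
    """Shift in the intercept relative to non-manual occupations."""
    return COEF_MANUAL * manual


print(round(age_slope(0), 3), round(age_slope(1), 3), intercept_shift(1))
```

The manual dummy alone moves the line up or down; the interaction term is what changes its slope.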
Interaction Variables
[Figure: line plot of predicted Performance against Age, ages 0 to 65; Seniority=20, Cognitive=0, StrucInt=0]
Note different slopes, too.
26
Another Interaction Variable Example

      Source |         SS      df          MS        Number of obs =   15321
-------------+----------------------------------      F(  5, 15315) =  800.50
       Model |  804247599       5   160849520        Prob > F      =  0.0000
    Residual | 3.0773e+09   15315  200936.252        R-squared     =  0.2072
-------------+----------------------------------      Adj R-squared =  0.2069
       Total | 3.8816e+09   15320  253367.252        Root MSE      =  448.26

--------------------------
    earnwkly |      Coef.
-------------+------------
     married |    136.003
      female |   -169.837
       exper |      2.946
    parttime |   -227.716
      exp_pt |     -1.896
       _cons |    700.802
--------------------------
exper is potential labor market experience (age-educ-6)
parttime is a dummy variable indicating a part-time worker
exp_pt is exper interacted with parttime
(i.e. exp_pt = exper*parttime)
27
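The earnings equation implies two different experience profiles, one for full-time and one for part-time workers. The sketch below evaluates it for a married woman (married = 1, female = 1) using the reported coefficients; the function name is hypothetical.

```python
def weekly_earnings(exper, parttime, married=1, female=1):
    """Predicted weekly earnings from the reported coefficients."""
    return (700.802 + 136.003 * married - 169.837 * female
            + 2.946 * exper - 227.716 * parttime
            - 1.896 * exper * parttime)   # exp_pt = exper * parttime


full_time_start = weekly_earnings(0, parttime=0)
part_time_start = weekly_earnings(0, parttime=1)

# Return to one more year of potential experience in each group.
full_time_slope = weekly_earnings(1, 0) - weekly_earnings(0, 0)
part_time_slope = weekly_earnings(1, 1) - weekly_earnings(0, 1)

print(round(full_time_start, 3), round(part_time_start, 3),
      round(full_time_slope, 3), round(part_time_slope, 3))
```

Part-time workers start lower (the parttime coefficient) and gain less per year of experience (the exp_pt coefficient).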
Interaction Variables
[Figure: line plot of predicted Weekly Earnings against Potential Experience, 0 to 30 years; Married=1, Female=1]
28
Adjusted R2

  Source |         SS     df          MS        Number of obs =    3525
---------+--------------------------------      F( 14,  3510) =  125.63
   Model | 5035.58483     14  359.684631        Prob > F      =  0.0000
Residual | 10049.2032   3510  2.86302087        R-squared     =  0.3338
---------+--------------------------------      Adj R-squared =  0.3312
   Total | 15084.7881   3524  4.28058685        Root MSE      =   1.692

-----------------------------------------------------------------------------
 perform |      Coef.  Std. Err.         t   P>|t|       [95% Conf. Interval]
---------+-------------------------------------------------------------------
     age |  -.0301543   .0016933   -17.808   0.000      -.0334742   -.0268344
seniorty |   .0016888    .002762     0.611   0.541      -.0037265     .007104
cognitve |   .0119113   .0286362     0.416   0.677      -.0442339    .0680565
strucint |   3.665569   .7995184     4.585   0.000       2.098001    5.233137
    job1 |   1.928286   .1277788    15.091   0.000       1.677758    2.178814
    job2 |    .426524   .1260009     3.385   0.001       .1794815    .6735664
    job3 |   .1407506   .1306411     1.077   0.281      -.1153896    .3968908
    job4 |   .2921016   .1347211     2.168   0.030       .0279621    .5562411
    job5 |  -1.069262   .1331017    -8.033   0.000      -1.330227   -.8082974
    job6 |  -1.179162   .1377497    -8.560   0.000      -1.449239   -.9090839
    job7 |  -1.304191   .1406734    -9.271   0.000      -1.580001   -1.028381
    job8 |  -.8530246   .1381293    -6.176   0.000      -1.123846   -.5822027
    job9 |  -.6652395   .1501504    -4.430   0.000      -.9596304   -.3708487
   job10 |  -1.012177   .1420816    -7.124   0.000      -1.290748   -.7336058
   _cons |   5.021799   .1643372    30.558   0.000       4.699593    5.344005
-----------------------------------------------------------------------------
Note: job1-job10 are dummy variables representing 10 different job classes
(job11 is the omitted reference category)
29
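Adjusted R-squared penalizes R-squared for the number of explanatory variables. The sketch below recomputes both from the sums of squares in the output, assuming the standard formula; the variable names are illustrative.

```python
ss_model, ss_resid, ss_total = 5035.58483, 10049.2032, 15084.7881
n, k = 3525, 14          # observations and explanatory variables

r_squared = ss_model / ss_total

# Adjusted R-squared compares mean squares rather than sums of squares.
adj_r_squared = 1 - (ss_resid / (n - k - 1)) / (ss_total / (n - 1))

# Equivalent form written in terms of R-squared itself.
adj_from_r2 = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

root_mse = (ss_resid / (n - k - 1)) ** 0.5

print(round(r_squared, 4), round(adj_r_squared, 4), round(root_mse, 3))
```

Adding a variable always raises R-squared, but it raises adjusted R-squared only if the variable reduces the residual mean square.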
Causality?
Workforce Optimization
Sue Bostrom: Leadership on IT—What’s It Worth?
September 10, 2001
“For those who still doubt that Internet-related
investments will pay off, consider this: A
PricewaterhouseCoopers study released earlier this year
found that productivity gains in 2000 were 2.7 times
greater for Internet-enabled companies than for
businesses that have not leveraged the Web.”
http://business.cisco.com/prod/tree.taf%3Fpublic_view=true&kbns=1&asset_id=66966.html
30
Causality
Reasons for an estimated statistical relationship
1. The explanatory variable is the direct cause of the
response (dependent) variable
2. The response variable is causing a change in the
explanatory variable (reverse causality)
3. The explanatory variable is a contributing, but not
sole, cause of the response variable
4. Confounding variables may exist
5. Both variables may stem from a common cause
6. Both variables are changing over time
7. Coincidence
Source: Jessica M. Utts (1999) Seeing Through Statistics,
2nd ed., Pacific Grove, CA: Duxbury, p. 186.
31