Transcript Lecture 6

Regression Concepts & Examples, Latent Variables, & Partial Least Squares (PLS)
Simple Regression Model
• Make a prediction about the starting salary of a current college
graduate
• Data set of starting salaries of recent college graduates
Data Set
Compute Average Salary
How certain are we of this prediction?
There is variability in the data.
Simple Regression Model
• Use the total variation (the sum of squared deviations from the average salary) as an index of uncertainty about our prediction
Compute Total Variation
• The smaller the total variation, the more accurate
(certain) our prediction will be.
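The two quantities on these slides, the average salary and the total variation, can be computed directly. This is a minimal Python sketch: the slide's own data set is not reproduced in this transcript, so it uses the ten salaries from the detailed example later in the lecture, and the helper names `mean` and `total_variation` are illustrative, not from the source.

```python
# Salaries from the ten-observation detailed example in this lecture
salaries = [20000, 24500, 23000, 25000, 20000, 22500, 27500, 19000, 24000, 28500]


def mean(values):
    """Average value: the naive prediction when no other information is used."""
    return sum(values) / len(values)


def total_variation(values):
    """Sum of squared deviations from the mean (the 'Total' SS in an ANOVA table)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


print(mean(salaries))             # 23400.0
print(total_variation(salaries))  # 90400000.0
```

For these salaries the total variation equals the "Total" SS (90,400,000) in the Excel ANOVA tables later in the lecture.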
Simple Regression Model
• How can we “explain” the variability? Perhaps it depends on
the student’s GPA.
(Table: Salary and GPA for each graduate)
Simple Regression Model
• Find a linear relationship between GPA and starting salary
• As GPA increases (decreases), starting salary increases (decreases)
Simple Regression Model
• Use the least squares method to find the regression model
– Choose a and b in the regression equation (Y-hat = a + bX) so that they minimize the sum
of the squared deviations, where each deviation is the actual Y value minus the predicted Y value (Y-hat)
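Written out, the least squares criterion and its standard closed-form solution for a single predictor X are (this is the textbook derivation, obtained by setting the partial derivatives with respect to a and b to zero; the slides themselves only state the criterion):

```latex
\min_{a,\,b}\ \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2
  = \min_{a,\,b}\ \sum_{i=1}^{n}\left(Y_i - a - bX_i\right)^2

b = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2},
\qquad a = \bar{Y} - b\,\bar{X}
```

For the GPA data in the detailed example later in this lecture, these formulas give b ≈ 6,428.57 and a ≈ 1,928.57, the coefficients Excel reports for Salary = f(GPA).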
Simple Regression Model
• How good is the model?
a = 4,779 and b = 5,370
A computer program computed these values
u-hat is a “residual”: the actual Y value minus the predicted Y value
The sum of all u-hats is zero
The sum of all squared u-hats is the total variation not explained by the model
This “unexplained variation” is 7,425,926
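A minimal Python sketch of fitting the line and checking the two residual properties above. Because the slide's own data set is not reproduced here, it uses the GPA and salary columns from the detailed example later in the lecture; `fit_line` is an illustrative helper, not from the source. The unexplained variation it produces (17,500,000) is therefore the Residual SS from the Excel output for Salary = f(GPA), not the 7,425,926 on this slide.

```python
# Detailed-example data from this lecture (GPA predicting starting salary)
gpa = [2.8, 3.4, 3.2, 3.8, 3.2, 3.4, 4.0, 2.6, 3.2, 3.8]
salary = [20000, 24500, 23000, 25000, 20000, 22500, 27500, 19000, 24000, 28500]


def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(gpa, salary)
# u-hat: actual Y minus predicted Y (Y-hat) for each observation
residuals = [yi - (a + b * xi) for xi, yi in zip(gpa, salary)]

print(sum(residuals))                         # approximately 0 (floating-point noise only)
print(round(sum(u ** 2 for u in residuals)))  # unexplained variation: 17500000
```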
Simple Regression Model
Total Variation = 23,000,000
Simple Regression Model
Total Unexplained Variation = 7,425,726
Simple Regression Model
• Relative Goodness of Fit
– Summarize the improvement in prediction gained by using the regression model
• Compute R², the coefficient of determination
The regression model (equation) is a better predictor than guessing the average salary
Using GPA predicts starting salary more accurately than guessing the average
R² is the “performance measure” for the model.
Predicted Starting Salary = 4,779 + 5,370 * GPA
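A minimal Python sketch of the R² calculation, assuming the standard definition R² = 1 - (unexplained variation / total variation), applied to this example's figures together with the fitted equation. The names `r_squared` and `predicted_salary` are illustrative. The transcript gives the unexplained variation as 7,425,926 on one slide and 7,425,726 on another; both give R² ≈ 0.677.

```python
def r_squared(unexplained, total):
    """Coefficient of determination: share of total variation explained by the model."""
    return 1 - unexplained / total


def predicted_salary(gpa):
    """Regression equation from the slide: 4,779 + 5,370 * GPA."""
    return 4779 + 5370 * gpa


print(round(r_squared(7_425_926, 23_000_000), 3))  # 0.677
print(predicted_salary(3.0))                       # 20889.0
```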
Detailed Regression Example
Data Set
Obs #   Salary   GPA   Months Work
1       20000    2.8   48
2       24500    3.4   24
3       23000    3.2   24
4       25000    3.8   24
5       20000    3.2   48
6       22500    3.4   36
7       27500    4.0   20
8       19000    2.6   48
9       24000    3.2   36
10      28500    3.8   12
Scatter Plot - GPA vs Salary
(figure: GPA, 0.0 to 4.5, plotted against Salary, 0 to 30,000)
Scatter Plot - Work vs Salary
(figure: Months Work, 0 to 60, plotted against Salary, 0 to 30,000)
Pearson Correlation Coefficients
-1 <= r <= 1
              Salary      GPA         Months Work
Salary        1
GPA           0.898007    1
Months Work   -0.93927    -0.82993    1
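The correlation matrix above can be reproduced from the data set with the standard Pearson formula (covariance divided by the product of the standard deviations). A minimal pure-Python sketch; `pearson_r` is an illustrative helper name, not from the source.

```python
from math import sqrt

# Detailed-example data from this lecture
salary = [20000, 24500, 23000, 25000, 20000, 22500, 27500, 19000, 24000, 28500]
gpa = [2.8, 3.4, 3.2, 3.8, 3.2, 3.4, 4.0, 2.6, 3.2, 3.8]
months_work = [48, 24, 24, 24, 48, 36, 20, 48, 36, 12]


def pearson_r(x, y):
    """Pearson correlation: centered cross-products scaled so that -1 <= r <= 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


print(round(pearson_r(gpa, salary), 5))          # 0.89801
print(round(pearson_r(months_work, salary), 5))  # -0.93927
print(round(pearson_r(gpa, months_work), 5))     # -0.82993
```

With a single predictor, the absolute value of r equals the Multiple R that Excel reports for that regression (0.898 for GPA, 0.939 for Months Work).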
Three Regressions
• Salary = f(GPA)
• Salary = f(Work)
• Salary = f(GPA, Work)
• Interpret Excel Output
Interpreting Results
• Regression Statistics
– Multiple R
– R²
– Adjusted R² (R²adj)
– Standard Error Sy
• Statistical Significance
– t-test
– p-value
– F test
Regression Statistics Table
• Multiple R
– R = square root of R²
• R²
– Coefficient of Determination
• Adjusted R² (R²adj)
– Used when there is more than one x variable; it adjusts R² for the number of predictors
• Standard Error Sy
– The sample estimate of the standard
deviation of the error (actual minus predicted)
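The slide names these statistics without giving formulas. The sketch below assumes the standard textbook definitions, with n observations and k predictors: Multiple R = sqrt(R²), Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), and Standard Error = sqrt(SSE / (n - k - 1)). The function names are illustrative; the inputs are taken from the Excel output for Salary = f(GPA).

```python
from math import sqrt


def adjusted_r_squared(r2, n, k):
    """Penalizes R^2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def standard_error(sse, n, k):
    """Estimated standard deviation of the errors: sqrt(SSE / (n - k - 1))."""
    return sqrt(sse / (n - k - 1))


# Salary = f(GPA): n = 10 observations, k = 1 predictor
r2, sse = 0.806415929, 17_500_000
print(round(sqrt(r2), 6))                        # Multiple R: 0.898007
print(round(adjusted_r_squared(r2, 10, 1), 6))   # 0.782218
print(round(standard_error(sse, 10, 1), 2))      # 1479.02
```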
ANOVA Table
• Table 1 gives the F statistic
• Tests the claim
– there is no significant relationship between any of your
independent variables and the dependent variable
• The Significance F value is a p-value
• You should reject the claim
– of NO significant relationship between your
independent and dependent variables if p < α
– Generally α = 0.05
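A minimal sketch of where the F statistic in the ANOVA table comes from, assuming the standard definition F = (Regression SS / k) / (Residual SS / (n - k - 1)). Turning F into a p-value requires the F distribution's CDF (for example from a statistics library), so this sketch takes Excel's Significance F as given. The function names are illustrative.

```python
def f_statistic(ss_regression, ss_residual, n, k):
    """ANOVA F = (regression mean square) / (residual mean square)."""
    ms_regression = ss_regression / k        # df = k predictors
    ms_residual = ss_residual / (n - k - 1)  # df = n - k - 1
    return ms_regression / ms_residual


def reject_no_relationship(p_value, alpha=0.05):
    """Reject the claim of no significant relationship when p < alpha."""
    return p_value < alpha


# Salary = f(GPA, Work): SS values and Significance F from the Excel output
print(round(f_statistic(83_830_499, 6_569_501, 10, 2), 2))  # 44.66
print(reject_no_relationship(0.00010346))                    # True
```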
Regression Coefficients Table
• The Coefficients column gives
– the b0, b1, b2, …, bn values for the regression equation
– b0 is the intercept
– b1 is next to your independent variable x1
– b2 is next to your independent variable x2
– b3 is next to your independent variable x3
Regression Coefficients Table
• p-values for the individual t-tests, one for each independent
variable
• t-test: tests the claim that there is no relationship
between the independent variable (in the
corresponding row) and your dependent variable
• You should reject the claim
– of NO significant relationship between your independent variable (in
the corresponding row) and the dependent variable if p < α
Salary = f(GPA)
Regression Statistics
Multiple R           0.898006642
R Square             0.806415929
Adjusted R Square    0.78221792
Standard Error       1479.019946
Observations         10

ANOVA
             df   SS         MS         F          Significance F
Regression   1    72900000   72900000   33.32571   0.00041792
Residual     8    17500000   2187500
Total        9    90400000

             Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
Intercept    1928.571429    3748.677         0.514467   0.620833   -6715.89326   10573.04
GPA          6428.571429    1113.589         5.772843   0.000418   3860.63173    8996.511
Salary = f(Work)
Regression Statistics
Multiple R           0.939265177
R Square             0.882219073
Adjusted R Square    0.867496457
Standard Error       1153.657002
Observations         10

ANOVA
             df   SS            MS         F          Significance F
Regression   1    79752604.17   79752604   59.92271   5.52993E-05
Residual     8    10647395.83   1330924
Total        9    90400000

              Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept     30691.66667    1010.136344      30.38369   1.49E-09   28362.28808    33021.0453
Months Work   -227.864583    29.43615619      -7.74098   5.53E-05   -295.7444812   -159.98469
Salary = f(GPA, Work)
Regression Statistics
Multiple R           0.962978985
R Square             0.927328525
Adjusted R Square    0.906565246
Standard Error       968.7621974
Observations         10

ANOVA
             df   SS         MS         F          Significance F
Regression   2    83830499   41915249   44.66195   0.00010346
Residual     7    6569501    938500.2
Total        9    90400000

              Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept     19135.92896    5608.184         3.412144   0.011255   5874.682112    32397.176
GPA           2725.409836    1307.468         2.084495   0.075582   -366.2602983   5817.08
Months Work   -151.2124317   44.30826         -3.41274   0.011246   -255.9848174   -46.440046
Compare Three “Models”
Regression Statistics   f(GPA)        f(Work)       f(GPA, Work)
Multiple R              0.898006642   0.939265177   0.962978985
R Square                0.806415929   0.882219073   0.927328525
Adjusted R Square       0.78221792    0.867496457   0.906565246
Standard Error          1479.019946   1153.657002   968.7621974
Observations            10            10            10
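As a cross-check on the Excel output, all three models can be fitted in a few lines of pure Python. This is a sketch, not the lecture's method (the lecture uses Excel): the one-predictor fits use the closed-form slope and intercept, and the two-predictor fit solves the 2×2 normal equations in centered form. The helper names `s`, `fit_one`, and `fit_two` are illustrative.

```python
# Detailed-example data from this lecture
salary = [20000, 24500, 23000, 25000, 20000, 22500, 27500, 19000, 24000, 28500]
gpa = [2.8, 3.4, 3.2, 3.8, 3.2, 3.4, 4.0, 2.6, 3.2, 3.8]
work = [48, 24, 24, 24, 48, 36, 20, 48, 36, 12]


def mean(v):
    return sum(v) / len(v)


def s(u, v):
    """Centered cross-product sum: sum of (u - u_bar) * (v - v_bar)."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def fit_one(x, y):
    """Intercept, slope and R^2 for y = b0 + b1*x."""
    b1 = s(x, y) / s(x, x)
    b0 = mean(y) - b1 * mean(x)
    r2 = b1 * s(x, y) / s(y, y)  # explained SS / total SS
    return b0, b1, r2


def fit_two(x1, x2, y):
    """Intercept, two slopes and R^2 for y = b0 + b1*x1 + b2*x2."""
    s11, s22, s12 = s(x1, x1), s(x2, x2), s(x1, x2)
    s1y, s2y = s(x1, y), s(x2, y)
    det = s11 * s22 - s12 ** 2  # determinant of the 2x2 normal-equation matrix
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    r2 = (b1 * s1y + b2 * s2y) / s(y, y)  # explained SS / total SS
    return b0, b1, b2, r2


results = {
    "f(GPA)": fit_one(gpa, salary),
    "f(Work)": fit_one(work, salary),
    "f(GPA, Work)": fit_two(gpa, work, salary),
}
for label, values in results.items():
    # Coefficients first (b0, b1, and b2 if present), R^2 last
    print(label, [round(v, 4) for v in values])
```

Rounded, the printed values reproduce the Coefficients and R Square entries in the three Excel outputs above.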
Latent Variables
(Theoretical Entities)
Latent Variables
• Latent Variables
– Explanatory variables that are not directly
measured
– Identified by “Exploratory Factor Analysis”
– Confirmed by “Confirmatory Factor Analysis”
• Statistical Methods for Latent Variables
– Principal Components Analysis
– PLS (Partial Least Squares)
– SEM (Structural Equation Modeling)
Example:
Confirmatory Factor Analysis
Intention to Use Travelocity Website
Research Instrument