Regression Analysis


More about Correlations
Spearman rank-order correlation
• Does the same type of analysis as a
Pearson r, but with data that represent
only order.
– Ordinal data run from highest to lowest but
give no indication of the distance between
ranks.
Spearman correlation cont.
– With a Spearman rank-order correlation, both
variables (x and y) are ranked.
– The correlation then measures the
relationship between the rankings.
– Easier to calculate, but a less powerful
statistical test than Pearson r.
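The ranking step can be sketched in code. This is a minimal illustration, assuming tied values share the average of their ranks (a common convention, not stated on the slides). The function names and the data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman correlation: a Pearson r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
rho = spearman_rho(x, y)  # uses only the order of the values
r = pearson_r(x, y)       # uses the distances between values as well
print(rho, r)
```

Because only order matters, rho reaches 1 whenever the two rankings agree perfectly, while Pearson r on the raw values stays below 1 here.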
Multiple Regression
Correlation: the relationship between two
variables.
Regression: what would you predict about
the dependent variable (DV), given the
independent variable(s) (IVs)?
Since you can have several variables:
• One or more are designated as dependent while
all others are independent.
– The DV is identified based on prior knowledge or
expectations.
– The IVs can be continuous measurements (unlike
in an ANOVA).
– This analysis still does not show causation.
Relationship is defined by:
$Y' = a + B_1 x_1 + B_2 x_2 + \dots + B_k x_k$
– Where:
• a is the intercept
• Each x is an IV
• Each B is a regression coefficient for a particular IV
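A short sketch of how the equation produces a predicted value once the intercept and coefficients are known. The function name, the coefficients, and the x values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def predict(intercept, coefficients, x_values):
    """Return the predicted Y: intercept plus the sum of each B times its x."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one x value per regression coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))

# Hypothetical model with three IVs (not taken from the slides).
a = 10.0
B = [2.0, -0.5, 1.5]
x = [3.0, 4.0, 2.0]

y_pred = predict(a, B, x)  # 10 + (2)(3) + (-0.5)(4) + (1.5)(2)
print(y_pred)
```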
Looking at the output
Correlation overall is evaluated with F.
$F = \frac{MS_{reg}}{MS_{res}}$
[Figure: diagram of IV1, IV2, IV3, and the DV]
• R - the multiple correlation coefficient is the
correlation between the predicted y
and the obtained y.
• R² - the proportion of the variation in the DV that is
predictable from the regression equation.
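As a check on these definitions, F, R², and R can be recomputed from the sums of squares and degrees of freedom in the first ANOVA output in this transcript (cancer deaths, % of smokers, and % over 75 predicting health care costs). This is a sketch; the variable names are illustrative.

```python
from math import sqrt

# Values copied from the first ANOVA output in this transcript.
ss_reg, df_reg = 4944948.0, 3   # regression sum of squares and df
ss_res, df_res = 7119704.0, 39  # residual sum of squares and df

ms_reg = ss_reg / df_reg        # mean square, regression
ms_res = ss_res / df_res        # mean square, residual
f_ratio = ms_reg / ms_res       # F = MSreg / MSres

r_squared = ss_reg / (ss_reg + ss_res)  # share of DV variation predicted
multiple_r = sqrt(r_squared)            # multiple correlation coefficient

print(round(f_ratio, 3), round(r_squared, 3), round(multiple_r, 3))
```

The rounded results agree with the reported F of 9.029, R Square of .410, and R of .640.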
Output cont.
• Each IV can be evaluated with a t
test on its regression coefficient.
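In the output, each t value is the coefficient B divided by its standard error. A sketch using the B and Std. Error values from the first Coefficients output in this transcript. The dictionary layout is illustrative, and the Sig. values are not recomputed here because that needs the t distribution.

```python
# (B, Std. Error) pairs copied from the first Coefficients output.
coefficients = {
    "(Constant)": (-473.998, 1067.026),
    "% of population over 75": (148.853, 61.315),
    "% of smokers": (2.186, 21.909),
    "deaths due to cancer/100,000": (7.484, 2.416),
}

# t = B / Std. Error for each row.
t_values = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```

The results agree with the t column of that output to three decimals (-.444, 2.428, .100, 3.098).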
If:
• cancer deaths
• % of smokers and
• % of the population over 75
are used to predict median health care
costs…
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .640(a)   .410       .364                **********
a. Predictors: (Constant), deaths due to cancer/100,000, % of smokers, % of population over 75
ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   4944948          3    1648316.001   9.029    .000(a)
Residual     7119704          39   182556.515
Total        12064652         42
a. Predictors: (Constant), deaths due to cancer/100,000, % of smokers, % of population over 75
b. Dependent Variable: health cost spent/person - 2000
Coefficients(a)
                                Unstandardized           Standardized
                                Coefficients             Coefficients
Model 1                         B           Std. Error   Beta           t       Sig.
(Constant)                      -473.998    1067.026                    -.444   .659
% of population over 75         148.853     61.315       .339           2.428   .020
% of smokers                    2.186       21.909       .013           .100    .921
deaths due to cancer/100,000    7.484       2.416        .418           3.098   .004
a. Dependent Variable: health cost spent/person - 2000
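Adjusted R Square appears in the Model Summary but is not defined on the slides. A minimal sketch, assuming the usual definition (1 minus the ratio of the residual mean square to the total mean square), recomputed from the ANOVA figures for this example. The variable names are illustrative.

```python
# Values copied from the ANOVA output for this example.
ss_res, df_res = 7119704.0, 39       # residual sum of squares and df
ss_total, df_total = 12064652.0, 42  # total sum of squares and df

# Assumed definition: 1 - (SSres / df_res) / (SStotal / df_total).
# Dividing by df penalizes models that use more predictors.
adjusted_r_squared = 1 - (ss_res / df_res) / (ss_total / df_total)

print(round(adjusted_r_squared, 3))
```

The rounded result agrees with the reported Adjusted R Square of .364.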
If:
• # of hospitals and
• # of MDs
are used to predict median health care
costs…
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .758(a)   .575       .557                **********
a. Predictors: (Constant), # of MD's/100,000 people, number of hospitals/100000
ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   7566999          2    3783499.516   31.763   .000(a)
Residual     5598500          47   119117.014
Total        13165499         49
a. Predictors: (Constant), # of MD's/100,000 people, number of hospitals/100000
b. Dependent Variable: health cost spent/person - 2000
Coefficients(a)
                                Unstandardized           Standardized
                                Coefficients             Coefficients
Model 1                         B           Std. Error   Beta           t       Sig.
(Constant)                      1746.150    289.435                     6.033   .000
number of hospitals/100000      94.256      35.635       .273           2.645   .011
# of MD's/100,000 people        7.223       .908         .822           7.955   .000
a. Dependent Variable: health cost spent/person - 2000