SW388R7 Data Analysis & Computers II Slide 1 Multinomial Logistic Regression Basic Relationships Multinomial Logistic Regression Describing Relationships Classification Accuracy Sample Problems.


SW388R7
Data Analysis &
Computers II
Slide 1
Multinomial Logistic Regression
Basic Relationships
Multinomial Logistic Regression
Describing Relationships
Classification Accuracy
Sample Problems
Multinomial logistic regression
Slide 2




Multinomial logistic regression is used to analyze relationships
between a non-metric dependent variable and metric or
dichotomous independent variables.
Multinomial logistic regression compares multiple groups
through a set of binary logistic comparisons that are estimated
simultaneously.
The group comparisons are equivalent to the comparisons for a
dummy-coded dependent variable, with the group with the
highest numeric score used as the reference group.
For example, if we wanted to study differences in BSW, MSW,
and PhD students using multinomial logistic regression, the
analysis would compare BSW students to PhD students and MSW
students to PhD students. For each independent variable, there
would be two comparisons.
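The comparison structure can be sketched in Python. This is a minimal illustration, not SPSS code: the helper `reference_comparisons` and the coding 1 = BSW, 2 = MSW, 3 = PhD are hypothetical. The only rule the sketch encodes is the one stated above, that the group with the highest numeric code is the reference.

```python
def reference_comparisons(groups: dict[int, str]) -> list[tuple[str, str]]:
    """Pair each non-reference group with the reference group.

    `groups` maps the numeric codes of the dependent variable to labels;
    the group with the highest code is treated as the reference.
    """
    reference_code = max(groups)
    return [
        (groups[code], groups[reference_code])
        for code in sorted(groups)
        if code != reference_code
    ]


# Hypothetical coding of the degree-program example
degree_program = {1: "BSW", 2: "MSW", 3: "PhD"}
for group, reference in reference_comparisons(degree_program):
    print(f"{group} compared to {reference}")
```

With three groups there are two comparisons, and each independent variable receives one coefficient per comparison.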
What multinomial logistic regression predicts
Slide 3




Multinomial logistic regression provides a set of coefficients for
each of the two comparisons. The coefficients for the
reference group are all zeros, similar to the coefficients for the
reference group for a dummy-coded variable.
Thus, there are three equations, one for each of the groups
defined by the dependent variable.
The three equations can be used to compute the probability
that a subject is a member of each of the three groups. A case
is predicted to belong to the group associated with the highest
probability.
Predicted group membership can be compared to actual group
membership to obtain a measure of classification accuracy.
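To make the three equations concrete, here is a minimal Python sketch. It plugs a case into the two non-reference equations, gives the reference group a logit of zero, converts the logits into probabilities, and picks the group with the highest probability. The coefficients are the rounded B values from the 'Parameter Estimates' table on Slide 17, so the probabilities are approximate; the respondent's scores and the function names are hypothetical.

```python
import math

# Rounded B values from the Parameter Estimates table (Slide 17).
# natroad: 1 = too little, 2 = about right, 3 = too much (reference).
COEFFICIENTS = {
    1: {"intercept": 3.240, "age": 0.019, "educ": 0.071, "conlegis": -1.373},
    2: {"intercept": 3.639, "age": 0.003, "educ": 0.172, "conlegis": -1.657},
}
REFERENCE = 3


def group_probabilities(case: dict[str, float]) -> dict[int, float]:
    """Convert the k-1 logit equations plus the all-zero reference equation into probabilities."""
    logits = {REFERENCE: 0.0}  # every coefficient of the reference group is zero
    for group, b in COEFFICIENTS.items():
        logits[group] = b["intercept"] + sum(b[name] * value for name, value in case.items())
    denominator = sum(math.exp(v) for v in logits.values())
    return {group: math.exp(v) / denominator for group, v in logits.items()}


def predicted_group(case: dict[str, float]) -> int:
    """A case is assigned to the group with the highest probability."""
    probabilities = group_probabilities(case)
    return max(probabilities, key=probabilities.get)


# Hypothetical respondent, not a case from the data set
respondent = {"age": 45, "educ": 12, "conlegis": 2}
print(group_probabilities(respondent))
print(predicted_group(respondent))
```

Comparing `predicted_group` with each case's actual group, over all cases, is what produces the classification accuracy rate.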
Level of measurement requirements
Slide 4




Multinomial logistic regression analysis requires that the
dependent variable be non-metric. Dichotomous, nominal, and
ordinal variables satisfy the level of measurement requirement.
Multinomial logistic regression analysis requires that the
independent variables be metric or dichotomous. Because SPSS
automatically dummy-codes nominal-level variables, they can also
be included; they are dichotomized in the analysis.
In SPSS, non-metric independent variables are included as
“factors,” which SPSS dummy-codes.
In SPSS, metric independent variables are included as
“covariates.” If an independent variable is ordinal, we will
attach the usual caution.
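A minimal sketch of what dummy-coding a nominal independent variable means. It assumes the last category serves as the reference, which matches the [SEX=2] row fixed at 0 in the NATCHLD 'Parameter Estimates' table on Slide 21; SPSS does this internally, and the helper `dummy_code` and the marital-status codes are hypothetical.

```python
def dummy_code(values: list[int], categories: list[int]) -> list[list[int]]:
    """Indicator-code a nominal variable, using the last category as the reference.

    Each row gets one 0/1 column per non-reference category; a case in the
    reference category is a row of all zeros.
    """
    *coded, _reference = categories
    return [[1 if value == category else 0 for category in coded] for value in values]


# Hypothetical marital-status codes: 1 = married, 2 = never married, 3 = other
print(dummy_code([1, 3, 2, 3], categories=[1, 2, 3]))  # [[1, 0], [0, 0], [0, 1], [0, 0]]
```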
Assumptions and outliers
Slide 5



Multinomial logistic regression does not make any assumptions
of normality, linearity, or homogeneity of variance for the
independent variables.
Because it does not impose these requirements, it is preferred
to discriminant analysis when the data do not satisfy these
assumptions.
SPSS does not compute any diagnostic statistics for outliers. To
evaluate outliers, the advice is to run multiple binary logistic
regressions and use those results to test the exclusion of
outliers or influential cases.
Sample size requirements
Slide 6


The minimum number of cases per independent variable is 10,
using a guideline provided by Hosmer and Lemeshow, authors of
Applied Logistic Regression, one of the main resources for
logistic regression.
For the preferred case-to-variable ratio, we will use 20 to 1.
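The two guidelines can be checked with a short Python sketch. It assumes each independent variable counts once toward the ratio (how to count a factor that SPSS expands into several dummy variables is not addressed here), and `rate_sample_size` is a hypothetical helper. The example uses the 167 valid cases from the 'Case Processing Summary' on Slide 12 and the three independent variables of the sample problem.

```python
def rate_sample_size(valid_cases: int, n_independent_variables: int,
                     minimum: int = 10, preferred: int = 20) -> str:
    """Compare the case-to-variable ratio with the minimum and preferred guidelines."""
    ratio = valid_cases / n_independent_variables
    if ratio >= preferred:
        return "preferred"
    if ratio >= minimum:
        return "minimum"
    return "insufficient"


# 167 valid cases and 3 independent variables: about 55.7 cases per variable
print(rate_sample_size(167, 3))  # preferred
```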
Methods for including variables
Slide 7

The only method for selecting independent variables in SPSS is
simultaneous or direct entry.
Overall test of relationship - 1
Slide 8



The overall test of relationship among the independent
variables and groups defined by the dependent variable is based
on the reduction in -2 log-likelihood from a model that does not
contain any independent variables (the intercept-only model) to
the model that contains the independent variables.
This difference in -2 log-likelihood approximately follows a
chi-square distribution, with degrees of freedom equal to the
number of parameters added, and is referred to as the model
chi-square.
The significance test for the final model chi-square (after the
independent variables have been added) is our statistical
evidence of the presence of a relationship between the
dependent variable and the combination of the independent
variables.
Overall test of relationship - 2
Slide 9
Model Fitting Information
Model             -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only    284.429
Final             265.972                 18.457    6   .005
The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".
In this analysis, the probability of the model chi-square
(18.457) was 0.005, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables was
rejected. The existence of a relationship between the
independent variables and the dependent variable was
supported.
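The arithmetic behind this table can be reproduced with a minimal Python sketch that uses only the standard library. It relies on the closed form of the chi-square upper tail for an even number of degrees of freedom, which covers df = 6 here; the helper name is hypothetical, and for odd df a statistics library (for example `scipy.stats.chi2.sf`) would be the usual choice.

```python
import math


def chi_square_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


intercept_only = 284.429  # -2 Log Likelihood, intercept-only model
final = 265.972           # -2 Log Likelihood, final model
model_chi_square = intercept_only - final
df = 6                    # 3 independent variables x 2 comparisons
p_value = chi_square_sf_even_df(model_chi_square, df)
print(f"chi-square = {model_chi_square:.3f}, p = {p_value:.3f}")  # chi-square = 18.457, p = 0.005
```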
Strength of multinomial logistic regression relationship
Slide 10


While multinomial logistic regression does compute correlation
measures to estimate the strength of the relationship (pseudo R
square measures, such as Nagelkerke's R²), these correlation
measures do not really tell us much about the accuracy or
errors associated with the model.
A more useful measure to assess the utility of a multinomial
logistic regression model is classification accuracy, which
compares predicted group membership based on the logistic
model to the actual, known group membership, which is the
value for the dependent variable.
Evaluating usefulness for logistic models
Slide 11



The benchmark that we will use to characterize a multinomial
logistic regression model as useful is a 25% improvement over
the rate of accuracy achievable by chance alone.
Even if the independent variables had no relationship to the
groups defined by the dependent variable, we would still
expect to be correct in our predictions of group membership
some percentage of the time. This is referred to as by chance
accuracy.
The estimate of by chance accuracy that we will use is the
proportional by chance accuracy rate, computed by summing
the squared proportion of cases in each group. The only
difference between by chance accuracy for binary logistic
models and by chance accuracy for multinomial logistic models
is the number of groups defined by the dependent variable.
Computing by chance accuracy
Slide 12
The percentage of cases in each group defined by the dependent
variable is found in the ‘Case Processing Summary’ table.
Case Processing Summary
                                   N     Marginal Percentage
HIGHWAYS AND BRIDGES   1          62     37.1%
                       2          93     55.7%
                       3          12      7.2%
Valid                            167    100.0%
Missing                          103
Total                            270
Subpopulation                    153(a)
a. The dependent variable has only one value observed
in 146 (95.4%) subpopulations.
The proportional by chance accuracy rate was
computed by calculating the proportion of cases for
each group based on the number of cases in each
group in the 'Case Processing Summary', and then
squaring and summing the proportion of cases in each
group (0.371² + 0.557² + 0.072² = 0.453).
The proportional by chance accuracy criterion is 56.6%
(1.25 x 45.3% = 56.6%).
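The same calculation as a minimal Python sketch, using the group sizes from the 'Case Processing Summary' above; the function name is hypothetical. Working from the counts rather than the rounded percentages gives the same 45.3% and 56.6%.

```python
def proportional_by_chance(group_counts: list[int]) -> float:
    """Sum of the squared proportions of cases in each group."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


counts = [62, 93, 12]  # HIGHWAYS AND BRIDGES groups 1, 2, 3
by_chance = proportional_by_chance(counts)
criterion = 1.25 * by_chance  # 25% improvement over chance
print(f"by chance = {by_chance:.1%}, criterion = {criterion:.1%}")  # by chance = 45.3%, criterion = 56.6%
```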
Comparing accuracy rates
Slide 13

To characterize our model as useful, we compare the overall
percentage accuracy rate produced by SPSS for the final model to
25% more than the proportional by chance accuracy. (Note: SPSS
does not compute a cross-validated accuracy rate for multinomial
logistic regression.)
Classification
                               Predicted
Observed                 1        2        3    Percent Correct
1                       15       47        0        24.2%
2                        7       86        0        92.5%
3                        5        7        0          .0%
Overall Percentage   16.2%    83.8%      .0%        60.5%
The classification accuracy rate was 60.5%, which was greater
than or equal to the proportional by chance accuracy criterion
of 56.6% (1.25 x 45.3% = 56.6%).
The criterion for classification accuracy is satisfied in this
example.
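A minimal Python sketch of the comparison, using the counts from the Classification table on Slide 13 (rows are observed groups, columns are predicted groups). The helper names are hypothetical, and the by chance criterion is recomputed from the observed group sizes (the row totals 62, 93, and 12).

```python
# Classification table: rows = observed group 1-3, columns = predicted group 1-3
classification = [
    [15, 47, 0],
    [7, 86, 0],
    [5, 7, 0],
]


def overall_accuracy(table: list[list[int]]) -> float:
    """Share of cases on the diagonal (predicted group == observed group)."""
    correct = sum(table[i][i] for i in range(len(table)))
    return correct / sum(sum(row) for row in table)


def percent_correct_by_group(table: list[list[int]]) -> list[float]:
    return [row[i] / sum(row) for i, row in enumerate(table)]


def by_chance_criterion(table: list[list[int]]) -> float:
    """1.25 times the proportional by chance rate, from the observed group sizes."""
    sizes = [sum(row) for row in table]
    total = sum(sizes)
    return 1.25 * sum((n / total) ** 2 for n in sizes)


accuracy = overall_accuracy(classification)
criterion = by_chance_criterion(classification)
print(f"accuracy = {accuracy:.1%}, criterion = {criterion:.1%}, useful = {accuracy >= criterion}")
# accuracy = 60.5%, criterion = 56.6%, useful = True
```

Note that the model never predicts group 3, which is why that group's percent correct is 0% even though the overall criterion is met.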
Numerical problems
Slide 14




The maximum likelihood method used to calculate multinomial
logistic regression is an iterative fitting process that repeats
its calculations until the estimates stop changing (converge).
Sometimes, the method will break down and not be able to
converge or find an answer.
Sometimes the method will produce wildly improbable results,
reporting that a one-unit change in an independent variable
increases the odds of the modeled event by hundreds of
thousands or millions. These implausible results can be
produced by multicollinearity, categories of predictors having
no cases or zero cells, and complete separation, whereby the
two groups are perfectly separated by the scores on one or
more independent variables.
The clue that we have numerical problems and should not
interpret the results is a standard error larger than 2.0 for
one or more independent variables.
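The standard-error check can be sketched in Python. Because the guideline refers to independent variables, this sketch skips the intercept rows (an interpretive assumption; the intercepts in this example have standard errors near 2.5). The helper name is hypothetical; the standard errors come from the 'Parameter Estimates' table on Slide 17.

```python
def flag_numerical_problems(estimates: dict[int, dict[str, float]],
                            threshold: float = 2.0) -> list[tuple[int, str]]:
    """Return (comparison, variable) pairs whose standard error exceeds the threshold.

    Intercepts are skipped because the guideline concerns independent variables.
    """
    return [
        (comparison, variable)
        for comparison, rows in estimates.items()
        for variable, std_error in rows.items()
        if variable != "Intercept" and std_error > threshold
    ]


std_errors = {
    1: {"Intercept": 2.478, "AGE": 0.020, "EDUC": 0.108, "CONLEGIS": 0.620},
    2: {"Intercept": 2.456, "AGE": 0.020, "EDUC": 0.110, "CONLEGIS": 0.613},
}
print(flag_numerical_problems(std_errors))  # [] : no sign of numerical problems
```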
Relationship of individual independent variables and the dependent variable
Slide 15


There are two types of tests for individual independent
variables:
• The likelihood ratio test evaluates the overall relationship
  between an independent variable and the dependent
  variable.
• The Wald test evaluates whether or not the independent
  variable is statistically significant in differentiating between
  the two groups in each of the embedded binary logistic
  comparisons.
If an independent variable has an overall relationship to the
dependent variable, it might or might not be statistically
significant in differentiating between pairs of groups defined by
the dependent variable.
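A minimal Python sketch of the Wald test for one coefficient: the statistic is (B / SE)², compared with a chi-square distribution with 1 degree of freedom, whose upper tail equals `erfc(sqrt(W / 2))`. The helper name is hypothetical. Because it starts from the rounded B and standard error on Slide 17, it gives Wald values near, not identical to, the 4.913 and 7.298 SPSS reports, while the p-values round to the same .027 and .007.

```python
import math


def wald_test(b: float, std_error: float) -> tuple[float, float]:
    """Wald chi-square (df = 1) and its p-value for one coefficient."""
    wald = (b / std_error) ** 2
    # Upper tail of a chi-square with 1 df: erfc(sqrt(wald / 2))
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value


# CONLEGIS in the two comparisons (B and Std. Error from Slide 17)
for comparison, b, se in [("1 vs 3", -1.373, 0.620), ("2 vs 3", -1.657, 0.613)]:
    wald, p = wald_test(b, se)
    print(f"{comparison}: Wald = {wald:.3f}, p = {p:.3f}")
```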
Relationship of individual independent variables and the dependent variable
Slide 16



The interpretation for an independent variable focuses on its
ability to distinguish between pairs of groups and the
contribution which it makes to changing the odds of being in
one dependent variable group rather than the other.
We should not interpret the significance of an independent
variable’s role in distinguishing between pairs of groups unless
the independent variable also has an overall relationship to the
dependent variable in the likelihood ratio test.
The interpretation of an independent variable’s role in
differentiating dependent variable groups is the same as we
used in binary logistic regression. The difference in
multinomial logistic regression is that we can have multiple
interpretations for an independent variable in relation to
different pairs of groups.
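The rule that pairwise results are interpreted only after a significant likelihood ratio test can be written as a small decision function. This is a sketch with hypothetical names. It treats p ≤ alpha as significant, following the "less than or equal to" wording on Slide 9, and uses the Sig. values for CONLEGIS and AGE from Slides 17 and 18.

```python
ALPHA = 0.05


def interpretable_comparisons(lr_p: float, comparison_p: dict[str, float],
                              alpha: float = ALPHA) -> dict[str, bool]:
    """Decide which pairwise comparisons may be interpreted for one predictor.

    Returns {} when the likelihood ratio test is not significant; otherwise
    maps each comparison to whether its Wald test is significant.
    """
    if lr_p > alpha:
        return {}
    return {name: p <= alpha for name, p in comparison_p.items()}


print(interpretable_comparisons(0.010, {"1 vs 3": 0.027, "2 vs 3": 0.007}))  # CONLEGIS
print(interpretable_comparisons(0.265, {"1 vs 3": 0.341, "2 vs 3": 0.897}))  # AGE
```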
Relationship of individual independent variables and the dependent variable
Slide 17
SPSS identifies the comparisons it makes for the groups defined by
the dependent variable in the table of 'Parameter Estimates,' using
either the value codes or the value labels, depending on the
options settings for pivot table labeling.
The reference category is identified in the footnote to the table.
Parameter Estimates
                                                                   95% CI for Exp(B)
HIGHWAYS AND BRIDGES(a)       B  Std. Error    Wald  df  Sig.  Exp(B)   Lower   Upper
1  Intercept              3.240       2.478   1.709   1  .191
   AGE                     .019        .020    .906   1  .341   1.019    .980     1.…
   EDUC                    .071        .108    .427   1  .514   1.073    .868     1.…
   CONLEGIS              -1.373        .620   4.913   1  .027    .253    .075      .…
2  Intercept              3.639       2.456   2.195   1  .138
   AGE                     .003        .020    .017   1  .897   1.003    .963     1.…
   EDUC                    .172        .110   2.463   1  .117   1.188    .958     1.…
   CONLEGIS              -1.657        .613   7.298   1  .007    .191    .057      .…
a. The reference category is: 3.
When value labels are used instead of codes, the same table labels
the two comparisons TOO LITTLE and ABOUT RIGHT, and its footnote
reads: a. The reference category is: TOO MUCH.
In this analysis, two comparisons will be made:
• the TOO LITTLE group (coded 1) will be compared to the TOO MUCH
  group (coded 3)
• the ABOUT RIGHT group (coded 2) will be compared to the TOO MUCH
  group (coded 3).
The reference category plays the same role in multinomial logistic
regression that it plays in the dummy-coding of a nominal variable:
it is the category that would be coded with zeros for all of the
dummy-coded variables, and all other categories are interpreted
against it.
Relationship of individual independent variables and the dependent variable
Slide 18
Likelihood Ratio Tests
Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   268.323                                   2.350    2   .309
AGE         268.625                                   2.652    2   .265
EDUC        270.395                                   4.423    2   .110
CONLEGIS    275.194                                   9.221    2   .010
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
In this example, there is a statistically significant relationship
between the independent variable CONLEGIS and the dependent
variable (0.010 < 0.05).
Parameter Estimates
                                                                   95% CI for Exp(B)
HIGHWAYS AND BRIDGES(a)       B  Std. Error    Wald  df  Sig.  Exp(B)   Lower   Upper
1  Intercept              3.240       2.478   1.709   1  .191
   AGE                     .019        .020    .906   1  .341   1.019    .980     1.…
   EDUC                    .071        .108    .427   1  .514   1.073    .868     1.…
   CONLEGIS              -1.373        .620   4.913   1  .027    .253    .075      .…
2  Intercept              3.639       2.456   2.195   1  .138
   AGE                     .003        .020    .017   1  .897   1.003    .963     1.…
   EDUC                    .172        .110   2.463   1  .117   1.188    .958     1.…
   CONLEGIS              -1.657        .613   7.298   1  .007    .191    .057      .…
a. The reference category is: 3.
As well, the independent variable CONLEGIS is significant in
distinguishing category 1 of the dependent variable from category 3
of the dependent variable (0.027 < 0.05).
And the independent variable CONLEGIS is significant in
distinguishing category 2 of the dependent variable from category 3
of the dependent variable (0.007 < 0.05).
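The likelihood ratio tests in this table can be reproduced from the -2 log-likelihoods with a minimal Python sketch. Each effect has one parameter per comparison, so df = 2, and for df = 2 the chi-square upper tail is simply exp(-x / 2). The function name is hypothetical; because the inputs are rounded, a chi-square may differ from SPSS in the third decimal (2.351 versus 2.350 for the intercept).

```python
import math

FINAL_MINUS_2LL = 265.972  # -2 Log Likelihood of the final model (Slide 9)
REDUCED_MINUS_2LL = {       # -2 Log Likelihood of each reduced model
    "Intercept": 268.323,
    "AGE": 268.625,
    "EDUC": 270.395,
    "CONLEGIS": 275.194,
}


def likelihood_ratio_tests(final: float,
                           reduced: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Chi-square and p-value for omitting each effect (df = 2: one parameter per comparison)."""
    results = {}
    for effect, minus_2ll in reduced.items():
        chi_square = minus_2ll - final
        p_value = math.exp(-chi_square / 2)  # chi-square upper tail when df = 2
        results[effect] = (chi_square, p_value)
    return results


for effect, (chi_square, p) in likelihood_ratio_tests(FINAL_MINUS_2LL, REDUCED_MINUS_2LL).items():
    print(f"{effect:10s} chi-square = {chi_square:.3f}  p = {p:.3f}")
```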
Interpreting relationship of individual independent variables to the dependent variable
Slide 19
Likelihood Ratio Tests
Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   268.323                                   2.350    2   .309
AGE         268.625                                   2.652    2   .265
EDUC        270.395                                   4.423    2   .110
CONLEGIS    275.194                                   9.221    2   .010
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
Parameter Estimates
                                                                   95% CI for Exp(B)
HIGHWAYS AND BRIDGES(a)       B  Std. Error    Wald  df  Sig.  Exp(B)   Lower   Upper
1  Intercept              3.240       2.478   1.709   1  .191
   AGE                     .019        .020    .906   1  .341   1.019    .980     1.…
   EDUC                    .071        .108    .427   1  .514   1.073    .868     1.…
   CONLEGIS              -1.373        .620   4.913   1  .027    .253    .075      .…
2  Intercept              3.639       2.456   2.195   1  .138
   AGE                     .003        .020    .017   1  .897   1.003    .963     1.…
   EDUC                    .172        .110   2.463   1  .117   1.188    .958     1.…
   CONLEGIS              -1.657        .613   7.298   1  .007    .191    .057      .…
a. The reference category is: 3.
Survey respondents who had less confidence in Congress (higher
values correspond to lower confidence) were less likely to be in
the group of survey respondents who thought we spend too little
money on highways and bridges (DV category 1), rather than the
group of survey respondents who thought we spend too much money on
highways and bridges (DV category 3).
For each unit increase in the confidence in Congress score (that
is, each step toward less confidence), the odds of being in the
group of survey respondents who thought we spend too little money
on highways and bridges decreased by 74.7% (0.253 - 1.0 = -0.747).
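The percent change in odds is Exp(B) minus 1. A minimal Python sketch with a hypothetical helper, applied to the CONLEGIS coefficients for both comparisons, recovers the 74.7% and 80.9% decreases quoted on these slides.

```python
import math


def odds_change(b: float) -> tuple[float, float]:
    """Return Exp(B) and the proportional change in the odds per one-unit increase."""
    odds_ratio = math.exp(b)
    return odds_ratio, odds_ratio - 1.0


for label, b in [("too little vs too much", -1.373), ("about right vs too much", -1.657)]:
    ratio, change = odds_change(b)
    print(f"{label}: Exp(B) = {ratio:.3f}, change in odds = {change:.1%}")
# too little vs too much: Exp(B) = 0.253, change in odds = -74.7%
# about right vs too much: Exp(B) = 0.191, change in odds = -80.9%
```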
Interpreting relationship of individual independent variables to the dependent variable
Slide 20
Likelihood Ratio Tests
Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   268.323                                   2.350    2   .309
AGE         268.625                                   2.652    2   .265
EDUC        270.395                                   4.423    2   .110
CONLEGIS    275.194                                   9.221    2   .010
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
Parameter Estimates
                                                                   95% CI for Exp(B)
HIGHWAYS AND BRIDGES(a)       B  Std. Error    Wald  df  Sig.  Exp(B)   Lower   Upper
1  Intercept              3.240       2.478   1.709   1  .191
   AGE                     .019        .020    .906   1  .341   1.019    .980     1.…
   EDUC                    .071        .108    .427   1  .514   1.073    .868     1.…
   CONLEGIS              -1.373        .620   4.913   1  .027    .253    .075      .…
2  Intercept              3.639       2.456   2.195   1  .138
   AGE                     .003        .020    .017   1  .897   1.003    .963     1.…
   EDUC                    .172        .110   2.463   1  .117   1.188    .958     1.…
   CONLEGIS              -1.657        .613   7.298   1  .007    .191    .057      .…
a. The reference category is: 3.
Survey respondents who had less confidence in Congress (higher
values correspond to lower confidence) were less likely to be in
the group of survey respondents who thought we spend about the
right amount of money on highways and bridges (DV category 2),
rather than the group of survey respondents who thought we spend
too much money on highways and bridges (DV category 3).
For each unit increase in the confidence in Congress score (that
is, each step toward less confidence), the odds of being in the
group of survey respondents who thought we spend about the right
amount of money on highways and bridges decreased by 80.9%
(0.191 - 1.0 = -0.809).
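The confidence interval columns follow from the same B and standard error: the interval for Exp(B) is exp(B ± z·SE) with z ≈ 1.96 for 95%. The sketch below is a minimal illustration with a hypothetical helper; it uses the standard library's `statistics.NormalDist` for z and checks the lower bounds shown in the table.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def exp_b_interval(b: float, std_error: float, z: float = Z_95) -> tuple[float, float]:
    """95% confidence interval for Exp(B): exponentiate B +/- z * SE."""
    return math.exp(b - z * std_error), math.exp(b + z * std_error)


lower, upper = exp_b_interval(-1.657, 0.613)  # CONLEGIS, comparison 2 vs 3
print(f"Exp(B) = {math.exp(-1.657):.3f}, lower bound = {lower:.3f}")  # Exp(B) = 0.191, lower bound = 0.057
```

An interval that lies entirely below (or above) 1 corresponds to a Wald test that is significant at 0.05.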
Relationship of individual independent variables and the dependent variable
Slide 21
Likelihood Ratio Tests
Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   327.463(a)                                 .000    0      .
AGE         333.440                                   5.976    2   .050
EDUC        329.606                                   2.143    2   .343
POLVIEWS    334.636                                   7.173    2   .028
SEX         338.985                                  11.521    2   .003
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model
is formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because
omitting the effect does not increase the degrees of freedom.
In this example, there is a statistically significant
relationship between SEX and the dependent variable,
spending on childcare assistance (0.003 < 0.05).
Parameter Estimates
                                                                        95% CI for Exp(B)
NATCHLD(a)                         B  Std. Error    Wald  df  Sig.  Exp(B)   Lower   Upper
TOO LITTLE   Intercept         8.434       2.233  14.261   1  .000
             AGE               -.023        .017   1.756   1  .185    .977    .944     1.…
             EDUC              -.066        .102    .414   1  .520    .936    .766     1.…
             POLVIEWS          -.575        .251   5.234   1  .022    .563    .344      .…
             [SEX=1]          -2.167        .805   7.242   1  .007    .115    .024      .…
             [SEX=2]            0(b)           .       .   0     .       .       .       .
ABOUT RIGHT  Intercept         4.485       2.255   3.955   1  .047
             AGE               -.001        .018    .003   1  .955    .999    .965     1.…
             EDUC               .011        .104    .011   1  .916   1.011    .824     1.…
             POLVIEWS          -.397        .257   2.375   1  .123    .673    .406     1.…
             [SEX=1]          -1.606        .824   3.800   1  .051    .201    .040     1.…
             [SEX=2]            0(b)           .       .   0     .       .       .       .
a. The reference category is: TOO MUCH.
As well, SEX plays a statistically significant role in
differentiating the TOO LITTLE group from the TOO MUCH
(reference) group (0.007 < 0.05).
However, SEX does not differentiate the ABOUT RIGHT group
from the TOO MUCH (reference) group (0.051 > 0.05).
Interpreting relationship of individual independent variables and the dependent variable
Slide 22
Likelihood Ratio Tests
Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   327.463(a)                                 .000    0      .
AGE         333.440                                   5.976    2   .050
EDUC        329.606                                   2.143    2   .343
POLVIEWS    334.636                                   7.173    2   .028
SEX         338.985                                  11.521    2   .003
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model
is formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because
omitting the effect does not increase the degrees of freedom.
Parameter Estimates
                                                                        95% CI for Exp(B)
NATCHLD(a)                         B  Std. Error    Wald  df  Sig.  Exp(B)   Lower   Upper
TOO LITTLE   Intercept         8.434       2.233  14.261   1  .000
             AGE               -.023        .017   1.756   1  .185    .977    .944     1.…
             EDUC              -.066        .102    .414   1  .520    .936    .766     1.…
             POLVIEWS          -.575        .251   5.234   1  .022    .563    .344      .…
             [SEX=1]          -2.167        .805   7.242   1  .007    .115    .024      .…
             [SEX=2]            0(b)           .       .   0     .       .       .       .
ABOUT RIGHT  Intercept         4.485       2.255   3.955   1  .047
             AGE               -.001        .018    .003   1  .955    .999    .965     1.…
             EDUC               .011        .104    .011   1  .916   1.011    .824     1.…
             POLVIEWS          -.397        .257   2.375   1  .123    .673    .406     1.…
             [SEX=1]          -1.606        .824   3.800   1  .051    .201    .040     1.…
             [SEX=2]            0(b)           .       .   0     .       .       .       .
a. The reference category is: TOO MUCH.
Survey respondents who were male (code 1 for sex) were less likely
to be in the group of survey respondents who thought we spend too
little money on childcare assistance (DV category 1), rather than
the group of survey respondents who thought we spend too much
money on childcare assistance (DV category 3).
For survey respondents who were male, the odds of being in the
group of survey respondents who thought we spend too little money
on childcare assistance, rather than the TOO MUCH group, were
88.5% lower than for respondents coded 2 for sex (female)
(0.115 - 1.0 = -0.885).
Interpreting relationships for an independent variable in problems
Slide 23

In the multinomial logistic regression problems, the problem
statement will ask about only one of the independent variables.
The answer will be true or false based only on the relationship
between the specified independent variable and the dependent
variable. The individual relationships between the other
independent variables and the dependent variable are not used
in determining whether or not the answer is true or false.
Problem 1
Slide 24
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 1
Slide 25
For these problems, we will assume that there is no problem with
missing data, outliers, or influential cases, and that the
validation analysis will confirm the generalizability of the
results.
In this problem, we are told to use 0.05 as alpha for the
multinomial logistic regression.
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 2
Slide 26
The variables listed first in the problem statement are the
independent variables (IVs): "age" [age], "highest year of school
completed" [educ] and "confidence in Congress" [conlegis].
The variable used to define groups is the dependent variable (DV):
"opinion about spending on highways and bridges" [natroad].
SPSS only supports direct or simultaneous entry of independent
variables in multinomial logistic regression, so we have no choice
of method for entering variables.
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
of a statistic? Assume that there is no problem with missing data, outliers, or influential cases,
and that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
SW388R7
Data Analysis &
Computers II
Dissecting problem 1 - 3
Slide 27
SPSS multinomial logistic regression models the relationship by
comparing each of the groups defined by the dependent variable to the
group with the highest code value.
11. In the dataset GSS2000, is the following statement true, false, or an incorrect application
TheAssume
responses
opinion
about
spending
highways
and
bridgesor
were:
of a statistic?
thattothere
is no
problem
withon
missing
data,
outliers,
influential cases,
and that the
validation
willright,
confirm the generalizability of the results. Use a level of significance of 0.05 for evaluating the
statistical relationships.
The variables "age" [age], "highest year of school completed" [educ] and "confidence in
Congress" [conlegis] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on highways and bridges" [natroad]. These predictors
differentiate survey respondents who thought we spend too little money on highways and
bridges from survey respondents who thought we spend too much money on highways and
bridges and survey respondents who thought we spend about the right amount of money on
highways and bridges from survey respondents who thought we spend too much money on
highways and bridges.
Among this set of predictors, confidence in Congress was helpful in distinguishing among the
groups defined by responses to opinion about spending on highways and bridges. Survey
respondents who had less confidence in congress were less likely to be in the group of survey
respondents who thought we spend too little money on highways and bridges, rather than the
group of survey respondents who thought we spend too much money on highways and bridges.
For each unit increase in confidence in Congress, the odds of being in the group of survey
respondents who thought we spend too little money on highways and bridges decreased by
74.7%. Survey respondents who had less confidence in congress were less likely to be in the
group of survey respondents who thought we spend about the right amount of money on
highways and bridges, rather than the group of survey respondents who thought we spend too
much money on highways and bridges. For each unit increase in confidence in Congress, the
odds of being in the group of survey respondents who thought we spend about the right amount
of money on highways and bridges decreased by 80.9%.
1 = Too little, 2 = About right, 3 = Too much
The analysis will result in two comparisons:
• survey respondents who thought we spend too little money on highways and bridges versus
survey respondents who thought we spend too much money on highways and bridges
• survey respondents who thought we spend about the right amount of money on highways and
bridges versus survey respondents who thought we spend too much money on highways and
bridges
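Because category 3 (too much) is the reference group, the two comparisons correspond to two logit equations, and the reference group's equation has all of its coefficients fixed at zero. The following Python sketch, assuming the rounded B coefficients reported in the Parameter Estimates table of this output and a hypothetical respondent, shows how the three equations yield group probabilities and a predicted group (the group with the highest probability):

```python
import math

# Rounded B coefficients from the "Parameter Estimates" table for natroad.
# Group 3 (too much) is the reference category, so its coefficients are all zero.
COEFFICIENTS = {
    1: {"intercept": 3.240, "age": 0.019, "educ": 0.071, "conlegis": -1.373},  # too little
    2: {"intercept": 3.639, "age": 0.003, "educ": 0.172, "conlegis": -1.657},  # about right
    3: {"intercept": 0.0, "age": 0.0, "educ": 0.0, "conlegis": 0.0},          # too much (reference)
}


def group_probabilities(age: float, educ: float, conlegis: float) -> dict:
    """Probability of membership in each natroad group for one case."""
    logits = {
        group: b["intercept"] + b["age"] * age + b["educ"] * educ + b["conlegis"] * conlegis
        for group, b in COEFFICIENTS.items()
    }
    denominator = sum(math.exp(value) for value in logits.values())
    return {group: math.exp(value) / denominator for group, value in logits.items()}


def predicted_group(age: float, educ: float, conlegis: float) -> int:
    """A case is assigned to the group with the highest predicted probability."""
    probs = group_probabilities(age, educ, conlegis)
    return max(probs, key=probs.get)


# Hypothetical respondent: age 40, 12 years of school, conlegis score of 2.
probs = group_probabilities(40, 12, 2)
print({group: round(p, 3) for group, p in probs.items()})
print("predicted group:", predicted_group(40, 12, 2))
```

The log of the ratio of a group's probability to the reference group's probability recovers that group's linear equation, which is why each comparison is read against group 3.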
Dissecting problem 1 - 4
Slide 28
Each problem includes a statement about the relationship between one independent variable and
the dependent variable. The answer to the problem is based on the stated relationship, ignoring
the relationships between the other independent variables and the dependent variable.
This problem identifies a difference for both of the comparisons among groups modeled by the
multinomial logistic regression.
Dissecting problem 1 - 5
Slide 29
In order for the multinomial logistic regression question to be true, the overall relationship
must be statistically significant, there must be no evidence of numerical problems, the
classification accuracy rate must be substantially better than could be obtained by chance
alone, and the stated individual relationship must be statistically significant and interpreted
correctly.
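The four criteria can be written as a single check. A minimal Python sketch, in which the function and parameter names are hypothetical and the inputs are the values reported later in this output (model Sig. .005, largest predictor standard error .620, 101 of 167 cases classified correctly, proportional by chance accuracy .453, and Sig. .027 and .007 for confidence in Congress in the two comparisons):

```python
def supports_statement(model_sig: float, max_se: float, accuracy: float,
                       chance_accuracy: float, stated_sigs: list,
                       interpreted_correctly: bool, alpha: float = 0.05) -> bool:
    """Return True only if all four criteria for a 'true' answer hold."""
    overall_relationship = model_sig <= alpha
    no_numerical_problems = max_se <= 2.0              # no b coefficient SE above 2.0
    accuracy_ok = accuracy >= 1.25 * chance_accuracy    # at least 25% above chance
    individual_ok = all(sig <= alpha for sig in stated_sigs) and interpreted_correctly
    return overall_relationship and no_numerical_problems and accuracy_ok and individual_ok


met = supports_statement(model_sig=0.005, max_se=0.620, accuracy=101 / 167,
                         chance_accuracy=0.453, stated_sigs=[0.027, 0.007],
                         interpreted_correctly=True)

# conlegis is ordinal and is treated as metric by convention, so a caution is added.
uses_ordinal_as_metric = True
if met:
    verdict = "True with caution" if uses_ordinal_as_metric else "True"
else:
    verdict = "Not true"
print(verdict)
```

The sketch only separates "true" from "not true"; it does not decide between "False" and "Inappropriate application of a statistic", which depends on the level of measurement and sample size checks.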
Request multinomial logistic regression
Slide 30
Select the Regression |
Multinomial Logistic…
command from the
Analyze menu.
Selecting the dependent variable
Slide 31
First, highlight the
dependent variable
natroad in the list
of variables.
Second, click on the right
arrow button to move the
dependent variable to the
Dependent text box.
Selecting metric independent variables
Slide 32
Metric independent variables are specified as covariates
in multinomial logistic regression. Metric variables can
be either interval or, by convention, ordinal.
Move the metric
independent variables,
age, educ and conlegis to
the Covariate(s) list box.
In this analysis, there are no nonmetric independent variables. Nonmetric independent variables would be
moved to the Factor(s) list box.
Specifying statistics to include in the output
Slide 33
While we will accept most of
the SPSS defaults for the
analysis, we need to specifically
request the classification table.
Click on the Statistics… button
to make a request.
Requesting the classification table
Slide 34
First, keep the SPSS
defaults for Summary
statistics, Likelihood
ratio test, and
Parameter estimates.
Second, mark the
checkbox for the
Classification table.
Third, click
on the
Continue
button to
complete the
request.
Slide 35
Completing the multinomial
logistic regression request
Click on the OK
button to request
the output for the
multinomial logistic
regression.
The multinomial logistic procedure supports
additional commands to specify the model
computed for the relationships (we will use the
default main effects model), additional
specifications for computing the regression,
and saving classification results. We will not
make use of these options.
LEVEL OF MEASUREMENT - 1
Slide 36
Multinomial logistic regression requires that the dependent variable be non-metric and the
independent variables be metric or dichotomous.
"Opinion about spending on highways and bridges" [natroad] is ordinal, satisfying the
non-metric level of measurement requirement for the dependent variable.
It contains three categories: survey respondents who thought we spend too little money, about
the right amount of money, and too much money on highways and bridges.
LEVEL OF MEASUREMENT - 2
Slide 37
"Age" [age] and "highest year of school completed" [educ] are interval, satisfying the metric
or dichotomous level of measurement requirement for independent variables.
"Confidence in Congress" [conlegis] is ordinal. If we follow the convention of treating ordinal
level variables as metric variables, it satisfies the metric or dichotomous level of measurement
requirement for independent variables, and the level of measurement requirement for the
analysis is satisfied. Since some data analysts do not agree with this convention, a note of
caution should be included in our interpretation.
Sample size – ratio of cases to variables
Slide 38
Case Processing Summary
                                  N      Marginal Percentage
HIGHWAYS AND BRIDGES   1          62     37.1%
                       2          93     55.7%
                       3          12     7.2%
Valid                             167    100.0%
Missing                           103
Total                             270
Subpopulation                     153a
a. The dependent variable has only one value observed in 146 (95.4%) subpopulations.
Multinomial logistic regression requires that the minimum ratio of valid cases to independent
variables be at least 10 to 1. The
ratio of valid cases (167) to number of independent variables
(3) was 55.7 to 1, which was equal to or greater than the
minimum ratio. The requirement for a minimum ratio of cases
to independent variables was satisfied.
The preferred ratio of valid cases to independent variables is 20
to 1. The ratio of 55.7 to 1 was equal to or greater than the
preferred ratio. The preferred ratio of cases to independent
variables was satisfied.
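The ratio check is simple arithmetic; a minimal Python sketch using the valid case count (167) and the three independent variables from this problem (the function name is illustrative only):

```python
def cases_per_predictor(valid_cases: int, n_predictors: int) -> float:
    """Ratio of valid cases to independent variables."""
    return valid_cases / n_predictors


ratio = cases_per_predictor(valid_cases=167, n_predictors=3)
print(f"{ratio:.1f} to 1")  # 55.7 to 1
print("minimum 10 to 1 satisfied:", ratio >= 10)
print("preferred 20 to 1 satisfied:", ratio >= 20)
```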
Slide 39
OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES
Model Fitting Information
                   -2 Log
Model              Likelihood    Chi-Square    df    Sig.
Intercept Only     284.429
Final              265.972       18.457        6     .005
The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".
In this analysis, the probability of the model chi-square
(18.457) was 0.005, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables was
rejected. The existence of a relationship between the
independent variables and the dependent variable was
supported.
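The model chi-square is the drop in -2 log likelihood from the intercept-only model to the final model, tested with 6 degrees of freedom (3 predictors times 2 comparisons). A minimal Python sketch, using only the standard library and the closed-form chi-square tail probability that holds for even degrees of freedom (a shortcut chosen here for illustration, not how SPSS computes it):

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square with even df.

    For df = 2k the survival function is exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


neg2ll_intercept_only = 284.429
neg2ll_final = 265.972
model_chi_square = neg2ll_intercept_only - neg2ll_final
p = chi2_sf_even_df(model_chi_square, df=6)
print(round(model_chi_square, 3), round(p, 3), p <= 0.05)
```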
NUMERICAL PROBLEMS
Slide 40
Parameter Estimates
                                                                    95% CI for Exp(B)
HIGHWAYS AND BRIDGES(a)  B        Std. Error  Wald    df  Sig.  Exp(B)  Lower Bound
1   Intercept            3.240    2.478       1.709   1   .191
    AGE                  .019     .020        .906    1   .341  1.019   .980
    EDUC                 .071     .108        .427    1   .514  1.073   .868
    CONLEGIS             -1.373   .620        4.913   1   .027  .253    .075
2   Intercept            3.639    2.456       2.195   1   .138
    AGE                  .003     .020        .017    1   .897  1.003   .963
    EDUC                 .172     .110        2.463   1   .117  1.188   .958
    CONLEGIS             -1.657   .613        7.298   1   .007  .191    .057
a. The reference category is: 3.
Multicollinearity and other numerical problems in the multinomial logistic regression solution
are detected by examining the standard errors for the b coefficients. A standard error larger
than 2.0 indicates numerical problems, such as multicollinearity among the independent
variables, zero cells for a dummy-coded independent variable because all of the subjects have
the same value for the variable, and 'complete separation', whereby the groups in the
dependent variable can be perfectly separated by scores on one of the independent variables.
Analyses that indicate numerical problems should not be interpreted.
None of the independent variables in
this analysis had a standard error
larger than 2.0. (We are not
interested in the standard errors
associated with the intercept.)
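The standard error screen can be sketched in a few lines of Python, assuming the standard errors from the Parameter Estimates table (intercepts excluded) and the 2.0 threshold stated above; the dictionary layout is illustrative:

```python
# Std. Error of each b coefficient, keyed by (comparison, variable).
# Comparison 1 = too little vs too much; comparison 2 = about right vs too much.
STANDARD_ERRORS = {
    (1, "AGE"): 0.020, (1, "EDUC"): 0.108, (1, "CONLEGIS"): 0.620,
    (2, "AGE"): 0.020, (2, "EDUC"): 0.110, (2, "CONLEGIS"): 0.613,
}


def numerical_problems(ses: dict, threshold: float = 2.0) -> list:
    """Return the coefficients whose standard error exceeds the threshold."""
    return [key for key, se in ses.items() if se > threshold]


flagged = numerical_problems(STANDARD_ERRORS)
print(flagged or "no standard error larger than 2.0")
```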
Slide 41
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 1
Likelihood Ratio Tests
             -2 Log Likelihood of
Effect       Reduced Model           Chi-Square    df    Sig.
Intercept    268.323                 2.350         2     .309
AGE          268.625                 2.652         2     .265
EDUC         270.395                 4.423         2     .110
CONLEGIS     275.194                 9.221         2     .010
The chi-square statistic is the difference in -2 log-likelihoods
between the final model and a reduced model. The reduced model is
formed by omitting an effect from the final model. The null hypothesis
is that all parameters of that effect are 0.
The statistical significance of the relationship between
confidence in Congress and opinion about spending on
highways and bridges is based on the statistical significance of
the chi-square statistic in the SPSS table titled "Likelihood
Ratio Tests".
For this relationship, the probability of the chi-square statistic
(9.221) was 0.010, less than or equal to the level of
significance of 0.05. The null hypothesis that all of the b
coefficients associated with confidence in Congress were equal
to zero was rejected. The existence of a relationship between
confidence in Congress and opinion about spending on
highways and bridges was supported.
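Each likelihood ratio chi-square is the reduced model's -2 log likelihood minus the final model's (265.972), with 2 degrees of freedom because each predictor has one b coefficient per comparison. For df = 2 the tail probability is exactly exp(-x/2), so a minimal Python sketch needs only the standard library. Because the -2 log likelihoods shown are rounded, the recomputed chi-squares can differ from the table in the third decimal (for example 9.222 versus 9.221):

```python
import math

FINAL_NEG2LL = 265.972  # -2 log likelihood of the final model
REDUCED_NEG2LL = {"AGE": 268.625, "EDUC": 270.395, "CONLEGIS": 275.194}


def likelihood_ratio_test_df2(reduced: float, final: float) -> tuple:
    """Chi-square for dropping one effect (2 parameters) and its p-value."""
    chi_square = reduced - final
    # With df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
    return chi_square, math.exp(-chi_square / 2)


results = {effect: likelihood_ratio_test_df2(value, FINAL_NEG2LL)
           for effect, value in REDUCED_NEG2LL.items()}
for effect, (chi_square, p) in results.items():
    print(f"{effect}: chi-square = {chi_square:.3f}, p = {p:.3f}, significant: {p <= 0.05}")
```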
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 2
Slide 42
In the comparison of survey respondents who thought we spend
too little money on highways and bridges to survey respondents
who thought we spend too much money on highways and
bridges, the probability of the Wald statistic (4.913) for the
variable confidence in Congress [conlegis] was 0.027. Since the
probability was less than or equal to the level of significance of
0.05, the null hypothesis that the b coefficient for confidence in
Congress was equal to zero for this comparison was rejected.
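The Wald statistic for a single coefficient is (B / SE)², referred to a chi-square distribution with 1 degree of freedom, whose tail probability equals erfc(sqrt(W / 2)). A minimal Python sketch for confidence in Congress in both comparisons; since it starts from the rounded B and standard error values displayed in the table, the recomputed Wald values differ slightly from SPSS (about 4.904 versus 4.913), while the significance conclusions are the same:

```python
import math


def wald_test(b: float, se: float) -> tuple:
    """Wald chi-square (df = 1) and its p-value."""
    wald = (b / se) ** 2
    # For df = 1, P(chi-square >= W) = P(|Z| >= sqrt(W)) = erfc(sqrt(W / 2)).
    p = math.erfc(math.sqrt(wald / 2))
    return wald, p


COMPARISONS = {
    "too little vs too much": (-1.373, 0.620),
    "about right vs too much": (-1.657, 0.613),
}
results = {label: wald_test(b, se) for label, (b, se) in COMPARISONS.items()}
for label, (wald, p) in results.items():
    print(f"{label}: Wald = {wald:.3f}, p = {p:.3f}, reject H0: {p <= 0.05}")
```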
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 3
Slide 43
The value of Exp(B) was 0.253, which implies that for each unit increase in confidence in
Congress the odds decreased by 74.7% (0.253 - 1.0 = -0.747).
The relationship stated in the problem is supported. Survey
respondents who had less confidence in congress were less likely
to be in the group of survey respondents who thought we spend
too little money on highways and bridges, rather than the group of
survey respondents who thought we spend too much money on
highways and bridges. For each unit increase in confidence in
Congress, the odds of being in the group of survey respondents
who thought we spend too little money on highways and bridges
decreased by 74.7%.
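Exp(B) is the odds ratio for a one-unit increase in the predictor, and (Exp(B) - 1) x 100 gives the percentage change in the odds. A minimal Python sketch for confidence in Congress in both comparisons, computed directly from the B values in the table rather than from the rounded Exp(B) values (the results agree to one decimal place):

```python
import math


def odds_ratio_and_percent_change(b: float) -> tuple:
    """Exp(B) and the implied percentage change in the odds per unit increase."""
    odds_ratio = math.exp(b)
    return odds_ratio, (odds_ratio - 1.0) * 100.0


for label, b in {"too little vs too much": -1.373,
                 "about right vs too much": -1.657}.items():
    odds_ratio, pct = odds_ratio_and_percent_change(b)
    print(f"{label}: Exp(B) = {odds_ratio:.3f}, change in odds = {pct:.1f}%")
```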
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 4
Slide 44
In the comparison of survey respondents who thought we spend
about the right amount of money on highways and bridges to
survey respondents who thought we spend too much money on
highways and bridges, the probability of the Wald statistic
(7.298) for the variable confidence in Congress [conlegis] was
0.007. Since the probability was less than or equal to the level
of significance of 0.05, the null hypothesis that the b coefficient
for confidence in Congress was equal to zero for this comparison
was rejected.
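The 95% confidence interval for Exp(B) in the Parameter Estimates table is consistent with this test: it is exp(B ± 1.96 x SE), and when the whole interval lies below 1 the coefficient is significant at 0.05. A minimal Python sketch for confidence in Congress in the about right versus too much comparison; the computed lower bound matches the .057 shown in the table, and the upper bound (not visible in this transcript) stays below 1:

```python
import math


def exp_b_confidence_interval(b: float, se: float, z: float = 1.96) -> tuple:
    """Approximate 95% confidence interval for Exp(B)."""
    return math.exp(b - z * se), math.exp(b + z * se)


low, high = exp_b_confidence_interval(-1.657, 0.613)
print(f"95% CI for Exp(B): {low:.3f} to {high:.3f}")
print("interval excludes 1:", high < 1 or low > 1)
```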
Slide 45
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 5
The value of Exp(B) was 0.191, which implies that for each unit increase in
confidence in Congress the odds decreased by 80.9% (0.191 - 1.0 = -0.809).
The relationship stated in the problem is supported. Survey respondents
who had less confidence in congress were less likely to be in the group of
survey respondents who thought we spend about the right amount of
money on highways and bridges, rather than the group of survey
respondents who thought we spend too much money on highways and
bridges. For each unit increase in confidence in Congress, the odds of
being in the group of survey respondents who thought we spend about the
right amount of money on highways and bridges decreased by 80.9%.
Slide 46
CLASSIFICATION USING THE MULTINOMIAL LOGISTIC
REGRESSION MODEL: BY CHANCE ACCURACY RATE
The independent variables could be characterized as useful
predictors distinguishing survey respondents who thought we
spend too little money on highways and bridges, survey
respondents who thought we spend about the right amount
of money on highways and bridges and survey respondents
who thought we spend too much money on highways and
bridges if the classification accuracy rate was substantially
higher than the accuracy attainable by chance alone.
Operationally, the classification accuracy rate should be at least 25% higher than the
proportional by chance accuracy rate, that is, at least 1.25 times that rate.
The proportional by chance accuracy rate was computed by calculating the proportion of cases
for each group based on the number of cases in each group in the 'Case Processing Summary',
and then squaring and summing the proportion of cases in each group
(0.371² + 0.557² + 0.072² = 0.453).
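The same computation as a minimal Python sketch, using the group counts from the Case Processing Summary (62, 93 and 12 valid cases); the 1.25 multiplier is the operational criterion stated above:

```python
GROUP_COUNTS = {1: 62, 2: 93, 3: 12}  # too little, about right, too much


def proportional_by_chance(counts: dict) -> float:
    """Sum of squared group proportions."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


chance = proportional_by_chance(GROUP_COUNTS)
print(f"proportional by chance accuracy = {chance:.3f}")
print(f"criterion (1.25 x chance) = {1.25 * chance:.3f}")
```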
Slide 47
CLASSIFICATION USING THE MULTINOMIAL LOGISTIC
REGRESSION MODEL: CLASSIFICATION ACCURACY
Classification
                       Predicted
Observed               1         2         3         Percent Correct
1                      15        47        0         24.2%
2                      7         86        0         92.5%
3                      5         7         0         .0%
Overall Percentage     16.2%     83.8%     .0%       60.5%
The classification accuracy rate was 60.5%, which was greater than or equal to the
proportional by chance accuracy criterion of 56.6% (1.25 x 45.3% = 56.6%).
The criterion for classification accuracy is satisfied.
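The overall accuracy is the diagonal of the classification table divided by the number of valid cases. A minimal Python sketch, assuming the counts from the table above and the proportional by chance rate of 0.453:

```python
# Classification table: rows = observed group, columns = predicted group (1, 2, 3).
CLASSIFICATION = [
    [15, 47, 0],  # observed 1: too little
    [7, 86, 0],   # observed 2: about right
    [5, 7, 0],    # observed 3: too much
]
PROPORTIONAL_BY_CHANCE = 0.453

correct = sum(CLASSIFICATION[i][i] for i in range(len(CLASSIFICATION)))
total = sum(sum(row) for row in CLASSIFICATION)
accuracy = correct / total
criterion = 1.25 * PROPORTIONAL_BY_CHANCE

for group, row in enumerate(CLASSIFICATION, start=1):
    print(f"group {group}: {row[group - 1] / sum(row):.1%} correct")
print(f"overall accuracy {accuracy:.1%} vs criterion {criterion:.3f}: met = {accuracy >= criterion}")
```

No case is predicted into group 3, which is common when one group is small (12 of 167 cases); the overall rate still clears the criterion because group 2 is classified well.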
Answering the question in problem 1 - 1
Slide 48
We found a statistically significant overall relationship between the combination of
independent variables and the dependent variable.
There was no evidence of numerical problems in the solution.
Moreover, the classification accuracy surpassed the proportional by chance accuracy criterion,
supporting the utility of the model.
Answering the question in problem 1 - 2
Slide 49
We verified that each statement about the relationship between an independent variable and the
dependent variable was correct in both the direction of the relationship and the change in
likelihood associated with a one-unit change of the independent variable, for both of the
comparisons between groups stated in the problem.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
The answer to the question is true
with caution.
A caution is added because of the
inclusion of ordinal level variables.
Problem 2
Slide 50
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 2 - 1
Slide 51
For these problems, we will assume that there is no problem with missing data, outliers, or
influential cases, and that the validation analysis will confirm the generalizability of the
results.
In this problem, we are told to use 0.05 as alpha for the multinomial logistic regression.
Dissecting problem 2 - 2
Slide 52
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
The variables listed first in the problem statement are the independent variables (IVs):
"highest year of school completed" [educ], "sex" [sex] and "total family income" [income98].
The variable used to define groups is the dependent variable (DV): "opinion about spending on
space exploration" [natspac].
SPSS only supports direct or simultaneous entry of independent variables in multinomial logistic
regression, so we have no choice of method for entering variables.
Dissecting problem 2 - 3
Slide 53
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration
from survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
SPSS multinomial logistic regression models the relationship by comparing each of the groups
defined by the dependent variable to the group with the highest code value.
The responses to opinion about spending on the space program were: 1 = Too little,
2 = About right, and 3 = Too much.
The analysis will result in two comparisons:
• survey respondents who thought we spend too little money on space exploration versus
survey respondents who thought we spend too much money on space exploration
• survey respondents who thought we spend about the right amount of money on space
exploration versus survey respondents who thought we spend too much money on space
exploration.
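The reference-group rule can be sketched in a few lines of Python. This is not SPSS syntax. `comparisons` is a hypothetical helper, and the sketch assumes the codes 1 = Too little, 2 = About right, 3 = Too much given on this slide. It pairs every response code with the highest code, which yields the two comparisons listed above.

```python
# Response codes for natspac as listed on this slide.
NATSPAC_CODES = {1: "Too little", 2: "About right", 3: "Too much"}


def comparisons(codes):
    """Pair each non-reference group with the reference group (highest code)."""
    reference = max(codes)
    return [(codes[c], codes[reference]) for c in sorted(codes) if c != reference]


for group, ref in comparisons(NATSPAC_CODES):
    print(f"{group} versus {ref}")
```

With k groups, this rule always produces k - 1 comparisons. Here, three response categories give two comparisons.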
Dissecting problem 2 - 4
Slide 54
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the
groups defined by responses to opinion about spending on space exploration. Survey
respondents who had higher total family incomes were more likely to be in the group of
survey respondents who thought we spend about the right amount of money on space
exploration, rather than the group of survey respondents who thought we spend too much
money on space exploration. For each unit increase in total family income, the odds of
being in the group of survey respondents who thought we spend about the right amount of
money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Each problem includes a statement about the relationship between one independent variable
and the dependent variable. The answer to the problem is based on the stated relationship,
ignoring the relationships between the other independent variables and the dependent variable.
This problem identifies a difference for only one of the two comparisons based on the three
values of the dependent variable. Other problems will specify both of the possible comparisons.
Dissecting problem 2 - 5
Slide 55
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
For the answer to a multinomial logistic regression problem to be true, four conditions must hold:
the overall relationship must be statistically significant; there must be no evidence of numerical
problems; the classification accuracy rate must be substantially better than could be obtained by
chance alone (at least 25% greater than the proportional by chance accuracy rate); and the stated
individual relationship must be statistically significant and interpreted correctly.
LEVEL OF MEASUREMENT - 1
Slide 56
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate
survey respondents who thought we spend too little money on space exploration from
survey respondents who thought we spend too much money on space exploration and
survey respondents who thought we spend about the right amount of money on space
exploration from survey respondents who thought we spend too much money on space
exploration.
Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Multinomial logistic regression requires that the dependent variable be non-metric and the
independent variables be metric or dichotomous.
"Opinion about spending on space exploration" [natspac] is ordinal, satisfying the non-metric
level of measurement requirement for the dependent variable.
It contains three categories: survey respondents who thought we spend too little money, about
the right amount of money, and too much money on space exploration.
LEVEL OF MEASUREMENT - 2
Slide 57
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family
income" [income98] were useful predictors for distinguishing between groups based on
responses to "opinion about spending on space exploration" [natspac]. These predictors
differentiate survey respondents who thought we spend too little money on space exploration
from survey respondents who thought we spend too much money on space exploration and
survey respondents who thought we spend about the right amount of money on space
exploration from survey respondents who thought we spend too much money on space
exploration.
Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
"Highest year of school completed" [educ] is interval, satisfying the metric or dichotomous level
of measurement requirement for independent variables.
"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement
requirement for independent variables.
"Total family income" [income98] is ordinal. If we follow the convention of treating ordinal level
variables as metric variables, it satisfies the metric or dichotomous level of measurement
requirement for independent variables, and the level of measurement requirement for the
analysis is satisfied. Since some data analysts do not agree with this convention, a note of
caution should be included in our interpretation.
Request multinomial logistic regression
Slide 58
Select the Regression | Multinomial Logistic… command from the Analyze menu.
Selecting the dependent variable
Slide 59
First, highlight the dependent variable natspac in the list of variables.
Second, click on the right arrow button to move the dependent variable to the Dependent text box.
Selecting non-metric independent variables
Slide 60
Non-metric independent variables are specified as factors in multinomial logistic regression.
Non-metric variables can be dichotomous, nominal, or ordinal. These variables will be dummy
coded as needed, and each value will be listed separately in the output.
Select the dichotomous variable sex.
Move the non-metric independent variables listed in the problem to the Factor(s) list box.
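The dummy coding of a factor can be illustrated with a minimal Python sketch. It assumes that the highest category of a factor serves as the reference. That matches the [SEX=2] parameter being set to zero in the SPSS parameter estimates for this analysis. `dummy_code` is a hypothetical helper, not an SPSS feature, and the four cases in the usage are made up.

```python
def dummy_code(values):
    """Indicator-code a factor, treating its highest value as the reference.

    Returns the non-reference categories and, for each case, a list of 0/1
    indicators (one per non-reference category). Cases in the reference
    category receive all zeros.
    """
    categories = sorted(set(values))
    reference = categories[-1]
    kept = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# sex is coded 1 and 2 in this dataset; these four cases are hypothetical.
kept, rows = dummy_code([1, 2, 2, 1])
print(kept)  # [1]
print(rows)  # [[1], [0], [0], [1]]
```

A dichotomous factor needs only one indicator. That is why SPSS reports the second category as redundant.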
Selecting metric independent variables
Slide 61
Metric independent variables are specified as covariates in multinomial logistic regression.
Metric variables can be either interval or, by convention, ordinal.
Move the metric independent variables, educ and income98, to the Covariate(s) list box.
Specifying statistics to include in the output
Slide 62
While we will accept most of the SPSS defaults for the analysis, we need to specifically request
the classification table.
Click on the Statistics… button to make a request.
Requesting the classification table
Slide 63
First, keep the SPSS defaults for Summary statistics, Likelihood ratio test, and Parameter
estimates.
Second, mark the checkbox for the Classification table.
Third, click on the Continue button to complete the request.
Slide 64
Completing the multinomial logistic regression request
Click on the OK button to request the output for the multinomial logistic regression.
The multinomial logistic procedure supports additional commands to specify the model
computed for the relationships (we will use the default main effects model), additional
specifications for computing the regression, and saving classification results. We will not
make use of these options.
Sample size – ratio of cases to variables
Slide 65
Case Processing Summary
                                      N     Marginal Percentage
SPACE EXPLORATION PROGRAM   1        33     15.9%
                            2        90     43.3%
                            3        85     40.9%
RESPONDENTS SEX             1        94     45.2%
                            2       114     54.8%
Valid                               208    100.0%
Missing                              62
Total                               270
Subpopulation                      138a
a. The dependent variable has only one value observed in 112 (81.2%) subpopulations.
Multinomial logistic regression requires that the minimum ratio of valid cases to independent
variables be at least 10 to 1. The ratio of valid cases (208) to number of independent variables
(3) was 69.3 to 1, which was equal to or greater than the minimum ratio. The requirement for a
minimum ratio of cases to independent variables was satisfied.
The preferred ratio of valid cases to independent variables is 20 to 1. The ratio of 69.3 to 1 was
equal to or greater than the preferred ratio. The preferred ratio of cases to independent
variables was satisfied.
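The ratio arithmetic above can be checked with a few lines of Python. This is a minimal sketch. The constant and function names are hypothetical, and the 10 to 1 and 20 to 1 thresholds come from this slide.

```python
MIN_RATIO = 10        # minimum ratio of valid cases to independent variables
PREFERRED_RATIO = 20  # preferred ratio of valid cases to independent variables


def cases_to_iv_ratio(valid_cases, n_ivs):
    """Ratio of valid cases to independent variables."""
    return valid_cases / n_ivs


ratio = cases_to_iv_ratio(208, 3)  # 208 valid cases; educ, sex, income98
print(f"{ratio:.1f} to 1")  # 69.3 to 1
print("minimum satisfied:", ratio >= MIN_RATIO)
print("preferred satisfied:", ratio >= PREFERRED_RATIO)
```

The count of independent variables here is the three variables named in the problem. It is not the number of dummy-coded columns.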
Slide 66
OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES
Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   354.268
Final            334.967             19.301       6    .004
The presence of a relationship between the dependent
variable and combination of independent variables is
based on the statistical significance of the final model
chi-square in the SPSS table titled "Model Fitting
Information".
In this analysis, the probability of the model chi-square
(19.301) was 0.004, less than or equal to the level of
significance of 0.05. The null hypothesis that there was
no difference between the model without independent
variables and the model with independent variables was
rejected. The existence of a relationship between the
independent variables and the dependent variable was
supported.
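The model chi-square is the drop in -2 log likelihood from the intercept-only model to the final model (354.268 - 334.967 = 19.301). The minimal sketch below uses only the Python standard library. For even degrees of freedom, the upper-tail chi-square probability has the closed form exp(-x/2) · Σ (x/2)^k / k! for k = 0 … df/2 - 1, so no statistics package is needed. If SciPy is available, `scipy.stats.chi2.sf` would be the usual choice. The helper name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, which holds only
    for positive even degrees of freedom.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


intercept_only = 354.268  # -2 log likelihood, intercept-only model
final = 334.967           # -2 log likelihood, final model
model_chi2 = intercept_only - final
p = chi2_sf_even_df(model_chi2, 6)
print(f"chi-square = {model_chi2:.3f}, p = {p:.3f}")  # chi-square = 19.301, p = 0.004
```

The 6 degrees of freedom are the three predictors times the two comparisons.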
NUMERICAL PROBLEMS
Slide 67
Parameter Estimates
SPACE EXPLORATION                                                               95% Confidence Interval
PROGRAM(a)               B        Std. Error   Wald     df   Sig.   Exp(B)      for Exp(B): Lower Bound
1     Intercept         -4.136    1.157        12.779   1    .000
      EDUC                .101     .089         1.276   1    .259   1.106       .929
      INCOME98            .097     .050         3.701   1    .054   1.102       .998
      [SEX=1]             .672     .426         2.488   1    .115   1.959       .850
      [SEX=2]            0b        .            .       0    .      .           .
2     Intercept         -2.487     .840         8.774   1    .003
      EDUC                .108     .068         2.521   1    .112   1.114       .975
      INCOME98            .058     .034         2.932   1    .087   1.060       .992
      [SEX=1]             .501     .317         2.492   1    .114   1.650       .886
      [SEX=2]            0b        .            .       0    .      .           .
a. The reference category is: 3.
b. This parameter is set to zero because it is redundant.
Numerical problems in the multinomial logistic regression solution are detected by examining the
standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical
problems, such as multicollinearity among the independent variables, zero cells for a
dummy-coded independent variable because all of the subjects have the same value for the
variable, and 'complete separation' whereby the groups defined by the dependent variable can be
perfectly separated by scores on one of the independent variables. Analyses that indicate
numerical problems should not be interpreted.
None of the independent variables in this analysis had a standard error larger than 2.0.
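The standard error screen and the odds interpretation can be sketched in Python using the B and standard error values copied from the table. The helper names are hypothetical. The Wald p-value uses the identity that the upper tail of a 1-df chi-square equals erfc(sqrt(x/2)). Exp(B) - 1 gives the proportional change in the odds of being in a group, rather than the reference group (3 = Too much), per unit increase in the predictor.

```python
import math

# (B, Std. Error) copied from the Parameter Estimates table; reference category is 3 (Too much).
ESTIMATES = {
    ("Too little", "Intercept"): (-4.136, 1.157),
    ("Too little", "EDUC"): (0.101, 0.089),
    ("Too little", "INCOME98"): (0.097, 0.050),
    ("Too little", "[SEX=1]"): (0.672, 0.426),
    ("About right", "Intercept"): (-2.487, 0.840),
    ("About right", "EDUC"): (0.108, 0.068),
    ("About right", "INCOME98"): (0.058, 0.034),
    ("About right", "[SEX=1]"): (0.501, 0.317),
}
SE_LIMIT = 2.0  # standard errors above this suggest numerical problems


def numerical_problems(estimates, limit=SE_LIMIT):
    """Return the coefficients whose standard error exceeds the limit."""
    return [key for key, (_, se) in estimates.items() if se > limit]


def wald_p_value(wald):
    """Upper-tail p-value of a Wald chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(wald / 2))


def percent_change_in_odds(b):
    """Percent change in the odds per one-unit increase in the predictor."""
    return (math.exp(b) - 1) * 100


b, se = ESTIMATES[("About right", "INCOME98")]
print(numerical_problems(ESTIMATES))                         # []
print(f"Exp(B) = {math.exp(b):.3f}")                         # Exp(B) = 1.060
print(f"change in odds = {percent_change_in_odds(b):.1f}%")  # change in odds = 6.0%
print(f"Wald p = {wald_p_value(2.932):.3f}")                 # uses the table's Wald value
```

The Wald statistic is passed in from the table rather than recomputed as (B / SE)², because the rounded B and SE shown in the table give a slightly different value.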
Slide 68
RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1
Likelihood Ratio Tests
            -2 Log Likelihood of
Effect      Reduced Model          Chi-Square   df   Sig.
Intercept   334.967a               .000         0    .
EDUC        337.788                2.821        2    .244
INCOME98    340.154                5.187        2    .075
SEX         338.511                3.544        2    .170
The chi-square statistic is the difference in -2 log-likelihoods between the final model and a
reduced model. The reduced model is formed by omitting an effect from the final model. The null
hypothesis is that all parameters of that effect are 0.
a. This reduced model is equivalent to the final model because omitting the effect does not
increase the degrees of freedom.
The statistical significance of the relationship between total family income and opinion about
spending on space exploration is based on the statistical significance of the chi-square statistic
in the SPSS table titled "Likelihood Ratio Tests".
For this relationship, the probability of the chi-square statistic (5.187) was 0.075, greater than
the level of significance of 0.05. The null hypothesis that all of the b coefficients associated with
total family income were equal to zero was not rejected. The existence of a relationship between
total family income and opinion about spending on space exploration was not supported.
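Each likelihood ratio test compares the reduced model's -2 log likelihood with the final model's. Every effect here has 2 degrees of freedom, one coefficient for each of the two comparisons. For 2 df, the upper-tail chi-square probability reduces to exp(-x/2). The minimal Python sketch below relies on that identity, and its names are hypothetical.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-x / 2)


FINAL_MINUS_2LL = 334.967  # -2 log likelihood of the final model
REDUCED_MINUS_2LL = {"EDUC": 337.788, "INCOME98": 340.154, "SEX": 338.511}
ALPHA = 0.05

results = {}
for effect, reduced in REDUCED_MINUS_2LL.items():
    chi2 = reduced - FINAL_MINUS_2LL  # increase in -2LL when the effect is omitted
    p = chi2_sf_df2(chi2)             # each effect has df = 2 in this analysis
    results[effect] = (chi2, p)
    verdict = "significant" if p <= ALPHA else "not significant"
    print(f"{effect}: chi-square = {chi2:.3f}, p = {p:.3f}, {verdict}")
```

None of the three effects reaches the 0.05 level, even though the overall model chi-square does.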
Answering the question in problem 2
Slide 69
1. In the dataset GSS2000, is the following statement true, false, or an incorrect application of
a statistic? Assume that there is no problem with missing data, outliers, or influential cases, and
that the validation analysis will confirm the generalizability of the results. Use a level of
significance of 0.05 for evaluating the statistical relationships.
The variables "highest year of school completed" [educ], "sex" [sex] and "total family income"
[income98] were useful predictors for distinguishing between groups based on responses to
"opinion about spending on space exploration" [natspac]. These predictors differentiate survey
respondents who thought we spend too little money on space exploration from survey
respondents who thought we spend too much money on space exploration and survey
respondents who thought we spend about the right amount of money on space exploration from
survey respondents who thought we spend too much money on space exploration.
Among this set of predictors, total family income was helpful in distinguishing among the groups
defined by responses to opinion about spending on space exploration. Survey respondents who
had higher total family incomes were more likely to be in the group of survey respondents who
thought we spend about the right amount of money on space exploration, rather than the group
of survey respondents who thought we spend too much money on space exploration. For each
unit increase in total family income, the odds of being in the group of survey respondents who
thought we spend about the right amount of money on space exploration increased by 6.0%.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
We found a statistically significant overall relationship between the combination of independent
variables and the dependent variable.
There was no evidence of numerical problems in the solution.
However, the individual relationship between total family income and opinion about spending on
space exploration was not statistically significant.
The answer to the question is false.
Slide 70
Steps in multinomial logistic regression:
level of measurement and initial sample size
The following is a guide to the decision process for answering
problems about the basic relationships in multinomial logistic
regression:
Dependent non-metric? Independent variables metric or dichotomous?
  No: Inappropriate application of a statistic
  Yes: Ratio of cases to independent variables at least 10 to 1?
    No: Inappropriate application of a statistic
    Yes: Run multinomial logistic regression
Slide 71
Steps in multinomial logistic regression:
overall relationship and numerical problems
Overall relationship statistically significant? (model chi-square test)
  No: False
  Yes: Standard errors of coefficients indicate no numerical problems (s.e. <= 2.0)?
    No: False
    Yes: continue
Slide 72
Steps in multinomial logistic regression:
relationships between IVs and DV
Overall relationship between specific IV and DV is statistically significant?
(likelihood ratio test)
  No: False
  Yes: Role of specific IV and DV groups statistically significant and interpreted correctly?
  (Wald test and Exp(B))
    No: False
    Yes: continue
Slide 73
Steps in multinomial logistic regression:
classification accuracy and adding cautions
Overall accuracy rate is at least 25% greater than the proportional by chance accuracy rate?
  No: False
  Yes: Satisfies preferred ratio of cases to IVs of 20 to 1?
    No: True with caution
    Yes: One or more IVs are ordinal level treated as metric?
      Yes: True with caution
      No: True
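The decision process in these four flowcharts can be sketched as a single Python function. This is an illustrative sketch, not part of SPSS, and it rests on a few assumptions:
- The function and parameter names are hypothetical.
- The 25% accuracy criterion is read as an accuracy rate of at least 1.25 times the proportional by chance rate.
- The proportional by chance rate is computed as the sum of squared group proportions.

The usage applies the function to problem 2, with values taken from the output above. No classification accuracy rate is passed, because the answer is settled at the likelihood ratio test.

```python
def proportional_by_chance(group_counts):
    """Proportional by chance accuracy rate: the sum of squared group proportions."""
    total = sum(group_counts)
    return sum((n / total) ** 2 for n in group_counts)


def answer_problem(dv_nonmetric, ivs_metric_or_dichotomous, cases_to_ivs,
                   model_p, max_se, iv_lr_p, iv_role_ok, ordinal_ivs_as_metric,
                   accuracy_rate=None, by_chance_rate=None, alpha=0.05):
    """Walk the decision steps in order and return the answer to the problem."""
    # Level of measurement and initial sample size
    if not (dv_nonmetric and ivs_metric_or_dichotomous) or cases_to_ivs < 10:
        return "Inappropriate application of a statistic"
    # Overall relationship (model chi-square) and numerical problems (s.e. <= 2.0)
    if model_p > alpha or max_se > 2.0:
        return "False"
    # Individual relationship: likelihood ratio test, then Wald test and Exp(B)
    if iv_lr_p > alpha or not iv_role_ok:
        return "False"
    # Classification accuracy (assumed: at least 1.25 times the by-chance rate)
    if accuracy_rate is None or by_chance_rate is None:
        raise ValueError("classification accuracy rates are needed for this step")
    if accuracy_rate < 1.25 * by_chance_rate:
        return "False"
    # Cautions: preferred sample size ratio and ordinal IVs treated as metric
    if cases_to_ivs < 20 or ordinal_ivs_as_metric:
        return "True with caution"
    return "True"


# Group sizes 33, 90, 85 come from the Case Processing Summary.
by_chance = proportional_by_chance([33, 90, 85])
print(f"proportional by chance = {by_chance:.3f}, criterion = {1.25 * by_chance:.3f}")

result = answer_problem(
    dv_nonmetric=True, ivs_metric_or_dichotomous=True, cases_to_ivs=208 / 3,
    model_p=0.004, max_se=1.157, iv_lr_p=0.075, iv_role_ok=False,
    ordinal_ivs_as_metric=True,
)
print(result)  # False
```

Here `iv_role_ok=False` reflects the Wald significance of .087 for income98 in the "about right" versus "too much" comparison. The function would already return "False" at the likelihood ratio test (.075) before it checks that argument.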