Resolving the Goldilocks problem: Model specification
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.

Overview
• Model specification approaches to resolving
the Goldilocks problem include
– Standardized coefficients
– Logarithmic transformation
– Other specification issues
Standardized coefficients
Unstandardized coefficients
• Unstandardized βs estimate the effect of a 1-unit increase in Xi on Y, with the effect size
measured in the original units of Y.
• A “one-size-fits-all” approach to interpreting
βs can be misleading because variables
– Represent different levels of measurement,
– Have different units of measurement,
– Have varying distributions of values,
– Occur in different real-world circumstances.
Standardized coefficients
• A standardized coefficient estimates the effect of a
one-standard-deviation increase in Xi on Y
– Measured in standard deviation units of Y
• e.g., an effect size of 0.3 would mean 30% of a standard deviation
in the dependent variable
– Similar to standardized scores or z-scores
• Standardized βs provide a consistent metric in which
to compare the relative sizes of the βs on continuous
independent variables with different ranges and
scales.
– The contrast for each independent variable (IV) is its own standard deviation.
Using standardized coefficients
• Commonly used for psychological or
attitudinal scales for which the units have no
inherent meaning.
• Should not be used for variables for which a
one-standard-deviation increase lacks an
intuitive interpretation. E.g.,
– dummy variables
– interaction terms
Specifying a model with
standardized coefficients
• Easily specified as an option to an OLS model
in most statistical packages.
• Identify the dependent and independent
variables as usual.
– Enter them in the model specification in their
original, untransformed versions.
• Do not create versions in the metric of standard
deviations. The software will do that for you!
• Request “standardized betas”
Descriptive statistics to report if
you use standardized coefficients
• In table of descriptive statistics, report the
mean, minimum and maximum values and
standard deviation in the original units for
– each independent variable (IV)
– the dependent variable (DV)
Describing standardized
coefficients in prose
• In the results section, interpret the effect sizes
for different IVs in terms of multiples or
percentages of the standard deviation in the
DV
– E.g., “A one-standard-deviation increase in the
income-to-poverty ratio (IPR) is associated with an
increase of 19.6% of a standard deviation in birth
weight (about 38 grams), roughly twice the size of
the corresponding standardized coefficient on
mother’s age (9.7%).”
Reporting the effect size
in original units
“A one-standard-deviation increase in the income-to-poverty ratio (IPR) is associated with an increase of
19.6% of a standard deviation in birth weight (about
38 grams), roughly twice the size of the
corresponding standardized coefficient on mother’s
age (9.7%).”
• Note that the effect size is also reported back in the original
units of the DV (grams in this case), to facilitate intuitive
understanding in the context of the specific research question
and variables.
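Converting back to grams is a single multiplication: effect in original units = standardized β × SD(Y). The Python sketch below uses only the numbers quoted in the example. Because the standard deviation of birth weight is not given on the slide, it backs out the value implied by "about 38 grams" (38 ÷ 0.196); that implied SD is approximate, since 38 grams is itself rounded.

```python
def effect_in_y_units(beta_std, sd_y):
    """Convert a standardized coefficient (in SDs of Y) to Y's original units."""
    return beta_std * sd_y

beta_ipr = 0.196  # standardized beta on IPR, from the example
beta_age = 0.097  # standardized beta on mother's age, from the example

# SD of birth weight implied by "about 38 grams"; approximate, not reported in the source
implied_sd_grams = 38 / beta_ipr

print(f"implied SD of birth weight: about {implied_sd_grams:.0f} g")
print(f"IPR effect: about {effect_in_y_units(beta_ipr, implied_sd_grams):.0f} g")
# "roughly twice the size": ratio of the two standardized coefficients
print(f"ratio of IPR to mother's age coefficient: {beta_ipr / beta_age:.2f}")
```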
Logarithmic specifications
Logarithmic specifications
• Another approach to comparing βs across
variables with different ranges and scales is to
take logarithms of the
– dependent variable (Y),
– independent variable(s) (Xis),
– or both.
• The βs on the transformed variable(s) lend
themselves to straightforward interpretations
such as percentage change.
Types of logarithmic specifications
• Lin-lin
• Lin-log
• Log-lin
• Log-log
– Also known as “double log”
Lin-lin specifications
• Review: For OLS models in which neither the
IV nor the DV is logged, β measures the
change in Y for a 1-unit increase in X1,
– the changes are measured in the respective units
of the IV and DV.
• In the lingo of logarithmic specifications, these
models are termed “lin-lin” models because
they are linear in both the IV and DV
Y = β0 + β1X1
Lin-log specifications
• Lin-log models are of the form Y = β0 + β1 lnX1,
where lnX1 is the natural log (base e) of X1.
• For such models, β1 ÷ 100 gives the approximate change in
the original units of the DV for a 1 percent
increase in the IV.
• E.g., in a model of earnings, the β on ln(hours worked) is
5,905.3:
– “Each 1 percent increase in monthly hours worked
is associated with an NT$59 increase in monthly
earnings.”
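The β1 ÷ 100 rule is an approximation; the exact change in Y for a p percent increase in X is β1 × ln(1 + p/100). A short Python check using the earnings coefficient from the example:

```python
import math

def linlog_change(beta1, pct=1.0):
    """Change in Y (original units) for a pct-percent increase in X, given Y = b0 + b1*ln(X).

    Returns (rule_of_thumb, exact): beta1 * pct / 100 and beta1 * ln(1 + pct / 100).
    """
    rule_of_thumb = beta1 * pct / 100
    exact = beta1 * math.log(1 + pct / 100)
    return rule_of_thumb, exact

beta_ln_hours = 5905.3  # coefficient on ln(hours worked), from the example
approx, exact = linlog_change(beta_ln_hours)
print(f"approx NT${approx:.2f}, exact NT${exact:.2f}")  # approx NT$59.05, exact NT$58.76
```

The two agree closely for a 1 percent change (about NT$59 either way) but drift apart as the percentage change grows.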
Log-lin specifications
• Log-lin models are of the form lnY = β0 + β1X1.
• For such models, 100  (eβ – 1) gives the
percentage change in Y for a 1-unit increase in
X1,
– Where the increase in X1 is in its original units.
• E.g., “For each additional child a woman has,
her monthly earnings are reduced by 3.6
percent.”
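A Python sketch of the log-lin formula. The slide reports only the resulting 3.6 percent reduction, not β, so the coefficient below is a hypothetical value chosen to reproduce that figure (ln(1 – 0.036) ≈ –0.0367).

```python
import math

def loglin_pct_change(beta1, delta_x=1.0):
    """Percent change in Y for a delta_x-unit increase in X, given ln(Y) = b0 + b1*X."""
    return 100 * (math.exp(beta1 * delta_x) - 1)

# Hypothetical coefficient on number of children: the source reports only the
# resulting percentage, so this value was chosen to reproduce it.
beta_children = -0.0367
print(f"{loglin_pct_change(beta_children):.1f}%")  # -3.6%
```

For small β, 100 × β is a close approximation (here about –3.67 percent); the exponentiated form matters more as β or the contrast in X grows.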
Log-log specifications
• Log-log models are of the form lnY = β0 + β1lnX1.
• For such models, β1 estimates the (approximate) percentage
change in Y for a 1 percent increase in
X1.
– This measure is known in economics as the
elasticity (Gujarati 2002).
• E.g., “A 1 percent increase in monthly hours
worked is associated with a 0.6% increase in
monthly earnings.”
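For a 1 percent increase in X1 the elasticity reads almost directly as a percentage; for larger increases, the exact percentage change in Y is 100 × ((1 + p/100)^β1 – 1). The Python sketch below assumes β1 = 0.6, the value implied by the example sentence.

```python
def loglog_pct_change(beta1, pct_x=1.0):
    """Percent change in Y for a pct_x-percent increase in X, given ln(Y) = b0 + b1*ln(X)."""
    return 100 * ((1 + pct_x / 100) ** beta1 - 1)

elasticity = 0.6  # assumed: the value implied by the example sentence
print(f"1% more hours:  {loglog_pct_change(elasticity):.3f}% higher earnings")  # 0.599%
print(f"10% more hours: {loglog_pct_change(elasticity, 10):.2f}% higher earnings")
```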
Choice of contrast size for
logarithmic models
• Caveat: The scale of the logged variable must
be taken into account when choosing an
appropriate-sized contrast.
• E.g., a 1-unit increase in ln(monthly hours
worked) from 5.3 to 6.3 is equivalent to an
increase from about 200 to about 545 hours per month.
– That contrast is a 2.7-fold increase in hours (a factor of e ≈ 2.72).
– It implies working about three-quarters of all day- and
night-time hours, 7 days a week.
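The arithmetic behind this caveat, as a Python sketch: exponentiating back to hours shows that any 1-unit increase in a natural log multiplies the original variable by e ≈ 2.72 (here e^5.3 ≈ 200.3 and e^6.3 ≈ 544.6). The figure of about 730.5 hours in an average month (24 × 365.25 ÷ 12) is an assumption used only to express the upper value as a share of all hours.

```python
import math

low, high = math.exp(5.3), math.exp(6.3)  # ln(hours) of 5.3 and 6.3, converted back to hours
ratio = high / low                        # always e for a 1-unit change in ln(X)
hours_per_month = 24 * 365.25 / 12        # about 730.5 hours in an average month (assumption)

print(f"{low:.0f} -> {high:.0f} hours/month; {ratio:.2f}-fold; "
      f"{high / hours_per_month:.0%} of all hours in a month")
# 200 -> 545 hours/month; 2.72-fold; 75% of all hours in a month
```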
Review: Assess whether a 1-unit increase
in the variable is the right sized contrast
• Always consider whether a 1-unit increase in
the variable as specified in the model makes
sense in its real world context!
– Topic
– Distribution in the data
• If not, use theoretical and empirical criteria for
choosing an appropriately sized contrast.
– See podcast on measurement and variables
approaches to resolving the Goldilocks problem
Descriptive statistics to report if
you use a logarithmic specification
• In a table of descriptive statistics, report the
mean and range on both scales:
– In the original, untransformed units, such as
income in dollars, which are
• more intuitively understandable
• easier than the logged version to compare with values
from other samples.
– In the logged units, so readers know the range and
scale of values to apply to the estimated
coefficients.
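A Python sketch of a two-scale descriptive table, using hypothetical income values. It also illustrates why both scales are worth reporting: the mean of the logged values, converted back with exp(), is the geometric mean, which is smaller than the arithmetic mean in dollars, so readers cannot simply exponentiate one summary to recover the other.

```python
import math
import statistics

def describe(values):
    """Mean, minimum, maximum, and sample SD of a list of numbers."""
    return {"mean": statistics.mean(values), "min": min(values),
            "max": max(values), "sd": statistics.stdev(values)}

# Hypothetical incomes in dollars, for illustration only
income = [12_000, 25_000, 38_000, 52_000, 90_000, 240_000]
ln_income = [math.log(v) for v in income]

for label, values in [("income ($)", income), ("ln(income)", ln_income)]:
    d = describe(values)
    print(f"{label:<11} mean={d['mean']:,.2f} min={d['min']:,.2f} "
          f"max={d['max']:,.2f} sd={d['sd']:,.2f}")

# exp(mean of logs) is the geometric mean, not the arithmetic mean in dollars
geometric_mean = math.exp(statistics.mean(ln_income))
print(f"exp(mean ln income) = {geometric_mean:,.0f} vs mean income = {statistics.mean(income):,.0f}")
```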
Interpreting coefficients from
logarithmic specifications
• Taking logs of the IV(s) and/or DV affects
interpretation of the estimated coefficients.
• If your models include any logged variables,
report the pertinent units as you write about
the βs, especially if
– your specifications include a mixture of logged
and non-logged variables;
– you are testing the sensitivity of your findings to
different logarithmic specifications.
Goldilocks issues for
other types of specifications
Polynomial: Quadratic specification
of IPR/birth weight pattern
[Figure: Predicted birth weight by income-to-poverty ratio. Vertical axis: birth weight (grams), 2,950 to 3,250; horizontal axis: income-to-poverty ratio (IPR), 0.0 to 4.0.]
Goldilocks issues for polynomials
• In models involving polynomials such as Xi and Xi²,
the effect of a 1-unit increase in Xi on Y varies for
different values of Xi.
– I.e., a single number cannot summarize the effect of Xi on Y
across all values of Xi.
• To convey the shape of the association between Xi and Y:
– In the text, present the change in Y for each of several
contrasts in values of Xi.
– Create a graph.
• See podcast on polynomials for more information.
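To see why a single β cannot summarize a polynomial, here is a Python sketch for a quadratic, Y = β0 + β1X + β2X². The change in Y for a 1-unit increase in X works out to β1 + β2(2X + 1), so it depends on the starting value of X. The coefficients below are hypothetical and are not the estimates behind the IPR/birth weight graph.

```python
def quadratic_change(b1, b2, x_from, x_to):
    """Change in predicted Y when X moves from x_from to x_to, given Y = b0 + b1*X + b2*X**2.

    The intercept b0 cancels out; the result depends on where the step starts, not just its size.
    """
    return b1 * (x_to - x_from) + b2 * (x_to**2 - x_from**2)

# Hypothetical coefficients (not the estimates behind the graph): the curve rises,
# flattens, and peaks at X = -b1 / (2 * b2) = 3.75
b1, b2 = 150.0, -20.0
for start in (0.5, 1.5, 2.5, 3.5):
    print(f"X {start} -> {start + 1}: change in Y = {quadratic_change(b1, b2, start, start + 1):+.0f}")
```

Reporting the change for several starting values, as here (+110, +70, +30, then –10), is one way to present "several contrasts in values of Xi" in the text.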
Goldilocks issues for interactions
• In models involving interactions, βs on main
effect and interaction terms for two or more
IVs must be combined to calculate the overall
effect on the DV.
• Cannot examine the effect of a 1-unit change
in only one of those variables based on its β
alone.
• See chapter and podcasts on interactions.
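A minimal Python sketch of combining terms, assuming a model with one two-way interaction, Y = β0 + β1X1 + β2X2 + β3X1X2. The effect of a 1-unit increase in X1 is β1 + β3X2, so it has to be reported separately for different values of X2. The coefficients and the dummy coding of X2 are hypothetical.

```python
def effect_of_x1(b1, b3, x2, delta_x1=1.0):
    """Change in Y for a delta_x1 increase in X1, given Y = b0 + b1*X1 + b2*X2 + b3*X1*X2.

    Combines the main-effect and interaction coefficients; b2 drops out because X2 is held fixed.
    """
    return (b1 + b3 * x2) * delta_x1

# Hypothetical coefficients, for illustration only; X2 is a dummy variable (0 or 1)
b1, b3 = 40.0, -25.0
for x2 in (0, 1):
    print(f"X2 = {x2}: a 1-unit increase in X1 changes Y by {effect_of_x1(b1, b3, x2):+.1f}")
```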
Summary
• Certain model specifications can help reduce
Goldilocks problems by imposing a consistent metric
to facilitate comparison of βs across independent
variables with different levels and ranges. E.g.,
– A 1-standard-deviation increase, for standardized
coefficients
– A 1% increase, for log-log coefficients.
• Models involving non-linear functions or interactions
complicate the Goldilocks issue because the effect of
each variable involves several terms.
Suggested resources
• Miller, J. E. 2013. The Chicago Guide to
Writing about Multivariate Analysis. 2nd ed.
Chicago: University of Chicago Press.
– Chapter 10, on the Goldilocks problem, standardized
coefficients, and polynomials
– Chapter 8, on standardized scores and z-scores
– Chapter 16, on interactions
More suggested resources
• Miller, J. E., and Y. V. Rodgers. 2008. “Economic
Importance and Statistical Significance: Guidelines
for Communicating Empirical Research.” Feminist
Economics 14 (2): 117–49.
• Kachigan, Sam Kash. 1991. Multivariate Statistical
Analysis: A Conceptual Introduction. 2nd ed.
New York: Radius Press. On standardized coefficients.
• Gujarati, Damodar N. 2002. Basic Econometrics. 4th
ed. New York: McGraw-Hill/Irwin. On logarithmic
specifications.
Supplemental online resources
• Podcasts on
– Defining the Goldilocks problem
– Resolving the Goldilocks problem
• Measurement and variables
• Presenting results
– Calculating the shape of a polynomial
– Calculating the shape of an interaction pattern
• Online appendix on interpreting coefficients
from logarithmic specifications.
Suggested practice exercises
• Study guide to The Chicago Guide to Writing
about Multivariate Analysis, 2nd Edition.
– Suggested course extensions for chapter 10
• “Applying statistics and writing” question #5.
• “Revising” questions #1, 2, 3, and 9.
Contact information
Jane E. Miller, PhD
Online materials available at
http://press.uchicago.edu/books/miller/multivariate/index.html