Model Identification & Model Selection


With a focus on Mark/Recapture Studies
Overview
• Basic inference from an evidentialist perspective
• Model selection tools for mark/recapture
– AICc & SIC/BIC
– Overdispersed data
– Model set size
– Multimodel inference
DATA
/* 01 */ 1100000000000000 1 1 1.16 27.7 4.19;
/* 04 */ 1011000000000000 1 0 1.16 26.4 4.39;
/* 05 */ 1011000000000000 1 1 1.08 26.7 4.04;
/* 06 */ 1010000000000000 1 0 1.12 26.2 4.27;
/* 07 */ 1010000000000000 1 1 1.14 27.7 4.11;
/* 08 */ 1010110000000000 1 1 1.20 28.3 4.24;
/* 09 */ 1010000000000000 1 1 1.10 26.4 4.17;
/* 10 */ 1010110000000000 1 1 1.42 27.0 5.26;
/* 11 */ 1010000000000000 1 1 1.12 27.2 4.12;
/* 12 */ 1010101100000000 1 1 1.11 27.1 4.10;
/* 13 */ 1010101100000000 1 0 1.07 26.8 3.99;
/* 14 */ 1010101100000000 1 0 0.94 25.2 3.73;
/* 15 */ 1010101100000000 1 0 1.24 27.1 4.58;
/* 16 */ 1010101100000000 1 0 1.12 26.5 4.23;
/* 17 */ 1010101000000000 1 1 1.34 27.5 4.87;
/* 18 */ 1010101011000000 1 0 1.01 27.2 3.71;
/* 19 */ 1010101011000000 1 0 1.04 27.0 3.85;
/* 20 */ 1010101000000000 1 1 1.25 27.6 4.53;
/* 21 */ 1010101011000000 1 0 1.20 27.6 4.35;
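The column layout is not labeled on the slide. A minimal parsing sketch in Python, assuming (my guess, not stated in the transcript) that each row holds a 16-occasion capture history followed by a frequency, a group indicator, and three individual covariates:

import re

def parse_capture_line(line):
    # Strip the "/* nn */" comment and the trailing semicolon, then split fields.
    line = re.sub(r"/\*.*?\*/", "", line).strip().rstrip(";")
    fields = line.split()
    history = [int(ch) for ch in fields[0]]          # 16-occasion capture history
    freq, group = int(fields[1]), int(fields[2])     # assumed meanings; columns are unlabeled
    covariates = [float(x) for x in fields[3:]]      # remaining columns as continuous covariates
    return history, freq, group, covariates

print(parse_capture_line("/* 01 */ 1100000000000000 1 1 1.16 27.7 4.19;"))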
Models carry the meaning in science
• Model
– Organized thought
• Parameterized Model
– Organized thought connected to reality
Science is a cyclic process of model
reconstruction and model reevaluation
• Comparison of predictions with
observations/data
• Relative comparisons are evidence
All models are false, but some
are useful.
George Box
Statistical Inferences
• Quantitative measures of the validity and
utility of models
• Social control on the behavior of scientists
Scientific Model Selection Criteria
• Illuminating
• Communicable
• Defensible
• Transferable
Common Information Criteria
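The body of this slide is not reproduced in the transcript. For reference, the criteria referred to later in the talk are commonly written as follows (standard definitions, not taken from the slide):

\[
\mathrm{AIC} = -2\ln L + 2K, \qquad
\mathrm{AIC_c} = -2\ln L + 2K + \frac{2K(K+1)}{n - K - 1}, \qquad
\mathrm{SIC/BIC} = -2\ln L + K\ln n
\]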
Statistical Methods are Tools
• All statistical methods exist in the mind only,
but some are useful.
– Mark Taper
Classes of Inference
• Frequentist Statistics – Bayesian Statistics
• Error Statistics – Evidential Statistics – Bayesian Statistics
Two key frequencies in frequentist
statistics
• Frequency definition of probability
• Frequency of error in a decision rule
Null H tests with Fisherian P-values
• Single model only
• P-value = probability of a discrepancy at least as great as that observed, arising by chance.
• Not terribly useful for model selection
Neyman-Pearson Tests
• 2 models
• Null model test along a maximally sensitive axis.
• Binary response: accept the Null or reject the Null
• Size of test (α) describes the frequency of rejecting the null in error.
– Not about the data; it is about the test.
– You support your decision because you made it with a reliable procedure.
• N-P tests tell you very little about relative support for alternative models.
Decisions vs. Conclusions
• Decision-based inference is reasonable within a regulatory framework.
– Not so appropriate for science
• John Tukey (1960) advocated seeking to reach conclusions rather than making decisions.
– Accumulate evidence until a conclusion is very strongly supported.
– Treat it as true.
– Revise if new evidence contradicts it.
In a conclusions framework, multiple statistical metrics are not “incompatible”.
All are tools for aiding scientific thought.
Statistical Evidence
• A data-based estimate of the relative distance between two models and “truth”
Common Evidence Functions
• Likelihood ratios
• Differences in information criteria
• Others available
– E.g. Log(Jackknife prediction likelihood ratio)
Model Adequacy
• Bruce Lindsay
• The discrepancy of a model from truth
• Truth is represented by an empirical distribution function.
• A model is “adequate” if the estimated discrepancy is less than some arbitrary but meaningful level.
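A compact way to state this (my notation, not from the slide): with the empirical distribution \(\hat{F}_n\) standing in for truth, a fitted model \(M_{\hat{\theta}}\), a discrepancy measure \(d\), and a chosen adequacy level \(\delta\),

\[
M \text{ is adequate} \iff d\bigl(\hat{F}_n, M_{\hat{\theta}}\bigr) \le \delta .
\]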
Model Adequacy and Goodness of
Fit
• Estimation framework rather than testing
framework
• Confidence intervals rather than testing
• Rejection of “true model formalism”
Model Adequacy, Goodness of Fit,
and Evidence
• Adequacy does not explicitly compare
models
• Implicit comparison
• Model adequacy interpretable as bound on
strength of evidence for any better model
• Unifies Model Adequacy and Evidence in a
common framework
Model adequacy interpreted as a bound on
evidence for a possibly better model
[Figure: the empirical distribution (“truth”), Model 1, and a potentially better model, with the model adequacy measure and the evidence measure shown as distances between them.]
Goodness of fit misnomer
• Badness of fit measures & goodness of fit
tests
• Comparison of model to a nonparametric
estimate of true distribution.
– G²-statistic
– Hellinger distance
– Pearson χ²
– Neyman χ²
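For reference, standard forms of these measures (not written out on the slide), with observed counts \(O_i\), expected counts \(E_i\), and cell probabilities \(p_i\) (empirical) versus \(\pi_i\) (model):

\[
G^2 = 2\sum_i O_i \ln\frac{O_i}{E_i},\qquad
\chi^2_{\mathrm{Pearson}} = \sum_i \frac{(O_i - E_i)^2}{E_i},\qquad
\chi^2_{\mathrm{Neyman}} = \sum_i \frac{(O_i - E_i)^2}{O_i},\qquad
H^2 = \tfrac{1}{2}\sum_i \bigl(\sqrt{p_i} - \sqrt{\pi_i}\bigr)^2
\]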
Points of interest
• Badness of fit is the scope for improvement
• Evidence for one model relative to another is the difference of their badness-of-fit values.
ΔIC estimates differences of Kullback-Leibler discrepancies
• ΔIC = log(likelihood ratio), up to the factor of −2, when the numbers of parameters are equal
• The complexity penalty is a bias correction that adjusts for the increase in apparent precision with an increase in the number of parameters.
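In symbols (using AIC as the example; L_1, L_2 are the maximized likelihoods and K_1, K_2 the parameter counts):

\[
\Delta\mathrm{AIC}_{12} = \mathrm{AIC}_1 - \mathrm{AIC}_2 = -2\ln\frac{L_1}{L_2} + 2(K_1 - K_2),
\]

which reduces to \(-2\ln(L_1/L_2)\), i.e. the log likelihood ratio scaled by −2, when \(K_1 = K_2\).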
Evidence Scales

                 L/R        Log2      ln        Log10
  Weak           <8         <3        <2        <1
  Strong         8 - <32    3 - <5    2 - <7    1 - <2
  Very Strong    >32        >5        >7        >2

Note: cutoffs are arbitrary and vary with scale.
Which Information Criterion?
• AIC? AICc? SIC/BIC?
• Don’t use AIC
• AICc versus SIC/BIC: 5.9 of one versus 6.1 of the other
What is sample size for
complexity penalty?
• Mark/Recapture based on multinomial
likelihoods
• Observation is a capture history not a session
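Stated in symbols (the AICc form is standard; taking the effective sample size to be the number of capture histories is the point of this slide):

\[
\mathrm{AIC_c} = -2\ln L + 2K + \frac{2K(K+1)}{n_e - K - 1},
\qquad n_e = \text{number of capture histories (individuals), not capture sessions.}
\]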
To Q or not to Q?
• IC-based model selection assumes a good model is in the set.
• Over-dispersion is common in Mark/Recapture data
– So we may not have a good model in the set
– Due to lack of independence among observations
– Parameter estimate bias is generally not affected
– But the fit will appear too good!
– Model selection will choose more highly parameterized models than appropriate
Quasi-likelihood approach
1) χ² goodness-of-fit test for the most general model
2) If H₀ is rejected, estimate the variance inflation factor ĉ
3) ĉ = χ²/df
4) Correct the fit component of the IC and redo the selection
QICs

\[
\mathrm{QAIC_c} = \frac{-2\ln L(\hat{\theta}\mid y)}{\hat{c}} + 2K + \frac{2K(K+1)}{n_e - K - 1}
\]

\[
\mathrm{QBIC} = \frac{-2\ln L(\hat{\theta}\mid y)}{\hat{c}} + \ln(n_e)\,K
\]
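A minimal sketch in Python of the quasi-likelihood correction, computing ĉ from the general model's goodness-of-fit test and then the criteria defined above (function names and the illustrative numbers are mine, not from the talk):

import math

def c_hat(chi2, df):
    # Variance inflation factor from the GOF test of the most general model;
    # values below 1 are commonly truncated to 1.
    return max(chi2 / df, 1.0)

def qaicc(log_lik, K, n_e, chat):
    # Quasi-likelihood fit term plus the AICc-style complexity penalty.
    return -2.0 * log_lik / chat + 2 * K + 2.0 * K * (K + 1) / (n_e - K - 1)

def qbic(log_lik, K, n_e, chat):
    # Quasi-likelihood fit term plus the ln(n_e) * K penalty.
    return -2.0 * log_lik / chat + math.log(n_e) * K

chat = c_hat(chi2=213.4, df=140)                      # hypothetical GOF results
print(qaicc(log_lik=-512.3, K=8, n_e=180, chat=chat),
      qbic(log_lik=-512.3, K=8, n_e=180, chat=chat))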
Problems with the quasi-likelihood correction
• ĉ is essentially a variance estimate.
– Variance estimates are unstable without a lot of data
• ln L / ĉ is a ratio statistic
– Ratio statistics are highly unstable if the uncertainty in the denominator is not trivial
• Unlike AICc, the bias correction is estimated.
– Estimating a bias correction inflates variance!
Fixes
• Explicitly include a random component in the model
– Then redo model selection
• Bootstrapped median ĉ
• Model selection with jackknifed prediction likelihood
Large or small model sets?
• Problem: model selection bias
– When the # of models is large relative to the data size, some models will fit well just by chance
• Small
– Burnham & Anderson strongly advocate small model sets representing well-thought-out science
– Large model sets = “data dredging”
• Large
– The science may not be mature
– Small model sets may risk missing important factors
Model Selection from Many Candidates (Taper 2004)

SIC(x) = -2 ln(L) + (ln(n) + x) k

(x = 0 recovers the usual SIC/BIC; larger x penalizes complexity more heavily.)
Performance of SIC(x) with a small data set
N = 50, true covariates = 10, spurious covariates = 30, all models of order ≤ 20, 1.141 × 10^14 candidate models
Chen & Chen (2009)
• M = subset size, P = # of possible terms
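The criterion itself is not reproduced on the slide. From memory, the extended BIC of Chen & Chen has a form like the following (a sketch using the slide's M and P; γ is a tuning constant between 0 and 1):

\[
\mathrm{EBIC}_{\gamma} = -2\ln L + M\ln n + 2\gamma \ln\binom{P}{M}
\]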
Explicit Tradeoff
• Small model sets
– Allows exploration of fine structure and small effects
– Risks missing unanticipated large effects
• Large model sets
– Will catch unknown large effects
– Will miss fine structure
• Large or small model sets is a principled choice
that data analysts should make based on their
background knowledge and needs
Akaike Weights & Model Averaging
Beware, there be dragons here!
Akaike Weights

\[
w_i = \frac{\exp(-\Delta_i/2)}{\sum_{m=1}^{R}\exp(-\Delta_m/2)}
\]
• “Relative likelihood of model i given the
data and model set”
• “Weight of evidence that model i most
appropriate given data and model set”
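A minimal sketch in Python of the weight calculation above (helper name and the AICc values are mine, for illustration only):

import math

def akaike_weights(ic_values):
    best = min(ic_values)
    deltas = [ic - best for ic in ic_values]       # Δi relative to the best model
    terms = [math.exp(-d / 2.0) for d in deltas]
    total = sum(terms)
    return [t / total for t in terms]

print(akaike_weights([210.4, 212.1, 218.9]))       # hypothetical AICc values for 3 models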
Model Averaging

• “Conditional” variance
– Conditional on the selected model
• “Unconditional” variance
– Actually conditional on the entire model set

\[
\bar{\hat{\theta}} = \sum_{i=1}^{R} w_i \hat{\theta}_i
\]

\[
\widehat{\operatorname{Var}}\bigl(\bar{\hat{\theta}}\bigr)
  = \left[\,\sum_{i=1}^{R} w_i \sqrt{\widehat{\operatorname{Var}}\bigl(\hat{\theta}_i \mid m_i\bigr) + \bigl(\hat{\theta}_i - \bar{\hat{\theta}}\bigr)^{2}}\,\right]^{2}
\]
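A minimal sketch in Python applying the two formulas above, returning the model-averaged estimate and the “unconditional” variance (helper name and numbers are mine, for illustration only):

import math

def model_average(weights, estimates, cond_variances):
    theta_bar = sum(w * t for w, t in zip(weights, estimates))
    se = sum(w * math.sqrt(v + (t - theta_bar) ** 2)
             for w, t, v in zip(weights, estimates, cond_variances))
    return theta_bar, se ** 2                      # averaged estimate, unconditional variance

print(model_average([0.6, 0.3, 0.1], [0.82, 0.79, 0.90], [0.004, 0.006, 0.010]))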
Good impulse with Huge Problems
• I do not recommend Akaike weights
• I do not recommend model averaging in this
fashion
• Importance of good models is diminished by
adding bad models
• Location of average influenced by adding
redundant models
Model Redundancy
• Model Space is not filled uniformly
• Models tend to be developed in highly
redundant clusters.
• Some points in model space allow few models
• Some points allow many
Redundant models do not add much information
[Figure: two panels plotting model adequacy against model dimension.]
A more reasonable approach
Repeat within time constraints:
1) Bootstrap the data
2) Fit the model set & select the best model
3) Estimate the derived parameter θ from the best model
4) Accumulate θ
Report the mean or median θ with percentile confidence intervals.
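A minimal sketch in Python of this bootstrap-selection loop. The fit_model and estimate_theta helpers are hypothetical placeholders (not from the talk), and fit_model is assumed to return a dict containing an "aicc" entry:

import random
import statistics

def bootstrap_selection(data, model_set, fit_model, estimate_theta, n_boot=1000):
    thetas = []
    for _ in range(n_boot):
        resample = [random.choice(data) for _ in data]        # 1) bootstrap the data
        fits = [fit_model(m, resample) for m in model_set]    # 2) fit the model set
        best = min(fits, key=lambda f: f["aicc"])             #    select the best model
        thetas.append(estimate_theta(best))                   # 3) derived parameter θ
    thetas.sort()                                             # 4) accumulate θ
    lo, hi = thetas[int(0.025 * n_boot)], thetas[int(0.975 * n_boot)]
    return statistics.median(thetas), (lo, hi)                # median θ and 95% percentile CI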