Review of Basic Statistics, Model Selection, and Inference


BRIEF REVIEW OF STATISTICAL CONCEPTS AND METHODS
Mathematical expectation
The mean (x̄) of random variable x is:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

where n is the number of observations. The variance (s²) is:

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

The standard deviation (s) is:

$s = \sqrt{s^2}$

The coefficient of variation is:

$CV = \frac{s}{\bar{x}}$
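A minimal sketch of these summary statistics in Python; the sample values are invented for illustration:

```python
import numpy as np

# Hypothetical sample of fish lengths (mm); values invented for illustration
x = np.array([212.0, 248.0, 230.0, 265.0, 241.0, 227.0])

n = x.size
mean = x.sum() / n                        # sample mean
var = ((x - mean) ** 2).sum() / (n - 1)   # sample variance (n - 1 denominator)
sd = var ** 0.5                           # standard deviation
cv = sd / mean                            # coefficient of variation

print(f"mean = {mean:.1f}, s2 = {var:.1f}, s = {sd:.1f}, CV = {cv:.3f}")
```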
Precision, bias, and accuracy
Basic probability
The probability of an event occurring is expressed as: P(event)
The probability of the event not occurring is 1 - P(event), or P(~event).
If events A and B are independent, the probability of both occurring is
estimated as: P(A) * P(B).
Detection and capture probabilities
The probability of capturing a single fish, given 1 is present: p(capture)
The probability of capturing 2 fish, given 2 are present:
p(capture)*p(capture) = p(capture)^2,
the probability of catching at least 1, given 2 are present:
p(capture)*(1 - p(capture)) + (1 - p(capture))*p(capture) + p(capture)^2
or:
1 - (1 - p(capture))^N
where N = number of fish present
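A minimal sketch of these capture probabilities in Python; the capture probability used here is an assumed, illustrative value:

```python
# Assumed single-fish capture probability (illustrative value only)
p_capture = 0.6

# Probability of capturing both fish when 2 are present (independent captures)
p_both = p_capture ** 2

def p_at_least_one(p, n):
    """Probability of catching at least 1 of n fish present:
    the complement of missing every one of them."""
    return 1 - (1 - p) ** n

print(f"both of 2:       {p_both:.3f}")
print(f"at least 1 of 2: {p_at_least_one(p_capture, 2):.3f}")
print(f"at least 1 of 5: {p_at_least_one(p_capture, 5):.3f}")
```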
Probability example
The probability of detecting a fish during a single event: p(detect)
The probability of detecting it on all three sampling occasions is:
p(detect)*p(detect)*p(detect) = p(detect)^3,
the probability of not catching it during any of the 3 occasions is:
(1 - p(detect))*(1 - p(detect))*(1 - p(detect)) = (1 - p(detect))^3,
and the probability of catching it on at least 1 occasion is the complement of not
catching it during any of the occasions:
1 - (1 - p(detect))^3.
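As a quick worked example, assuming an illustrative p(detect) = 0.8:

```latex
% Worked example with an assumed detection probability of 0.8
p(\text{all three occasions}) = 0.8^3 = 0.512
p(\text{missed on all three}) = (1 - 0.8)^3 = 0.2^3 = 0.008
p(\text{at least one occasion}) = 1 - 0.008 = 0.992
```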
Conditional probability
The probability a fish is present: p(present)
The probability of detecting a fish, given it is present: p(detect | present)
What is the probability of detecting a fish, given it is not present?
The probability a fish is present and detected:
p(detect | present) * p(present)
Conditional probability
The probability N fish are present: p(N)
The probability of detecting at least 1 fish, given N are present: p(detect | N)
The probability that N fish are present and at least 1 is detected:
p(detect | N) * p(N)
Question: if we sampled but did not detect a fish species, what are the chances it was present?
p(present | not detected)
The probability the fish species is present: p(present)
The probability it is not present: 1 - p(present)
The probability of detection, given present: p(detect | present)
The probability of detection, given not present: p(detect | not present) = 0,
so p(not detected | not present) = 1
Total probability of the event of not detecting the species:
Two possibilities: (1) present but not detected and (2) not present
P(not detected | present)*P(present) + P(not detected | not present)*P(not present)
Bayes rule
$p(\text{present} \mid \text{not detected}) = \dfrac{p(\text{not detected} \mid \text{present})\, p(\text{present})}{p(\text{not detected} \mid \text{present})\, p(\text{present}) + p(\text{not detected} \mid \text{not present})\, p(\text{not present})}$
Assume 80% probability of detection:
p(not detected| present) = 1- 0.80 = 0.20
Assume 40% probability of bull trout present:
p(present) = 0.40, p(not present) = 0.60
p(not detected| not present) = 1
Now calculate:

$p(\text{present} \mid \text{not detected}) = \dfrac{0.20 \times 0.40}{0.20 \times 0.40 + 1 \times 0.60} = 0.118 \text{ or } 11.8\%$
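A minimal sketch of this Bayes-rule calculation in Python, using the same assumed values (0.80 detection probability, 0.40 prior probability of presence):

```python
# Assumed inputs from the example above
p_present = 0.40               # prior probability bull trout are present
p_detect_given_present = 0.80  # detection probability when present

p_nd_given_present = 1 - p_detect_given_present  # 0.20
p_nd_given_absent = 1.0                          # can't detect what isn't there

# Bayes rule: p(present | not detected)
numerator = p_nd_given_present * p_present
denominator = numerator + p_nd_given_absent * (1 - p_present)
print(f"p(present | not detected) = {numerator / denominator:.3f}")  # 0.118
```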
Models and fisheries management
“True” Models
• Fundamental assumption: there is no “true” model that generates biological data
• Truth in biological sciences has essentially infinite dimension; hence,
full reality cannot be revealed with finite samples.
• Biological systems are complex with many small effects, interactions, individual
heterogeneity, and environmental covariates.
• Greater amounts of data are required to model smaller effects.
• Thus, all models are approximations of reality.
Models and hypotheses
Models = hypotheses
• Hypotheses are unproven theories, suppositions that are tentatively
accepted to explain facts or as the basis for further investigation
• Models are very explicit representations of hypotheses
• Several models can represent a single hypothesis
• Models are tools for evaluating hypotheses
Models and hypotheses: example
Hypothesis: shoal bass reproductive success is greater when there are more
reproductively active adults

Y = aN
Number of young is proportional to the number of adults

Y = aN/(1+bN)
Number of young increases with the number of adults
until nesting areas are saturated

Y = aN*e^(-bN)
Number of young increases until the carrying capacity of
nesting and rearing areas is reached
[Figure: number of YOY plotted against number of shoal bass for the three candidate models Y = aN, Y = aN/(1+bN), and Y = aN*e^(-bN)]
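A minimal sketch of the three candidate models in Python; the parameter values a and b are invented for illustration, not estimates from real shoal bass data:

```python
import numpy as np

# Illustrative parameter values; not fitted to real data
a, b = 2.0, 0.05
N = np.linspace(0, 100, 5)  # number of reproductively active adults

young_linear = a * N                     # Y = aN: proportional
young_saturating = a * N / (1 + b * N)   # Y = aN/(1+bN): saturating (Beverton-Holt form)
young_dome = a * N * np.exp(-b * N)      # Y = aN*e^(-bN): dome-shaped (Ricker form)

for n, y1, y2, y3 in zip(N, young_linear, young_saturating, young_dome):
    print(f"N={n:5.1f}  aN={y1:6.1f}  aN/(1+bN)={y2:6.1f}  aN*e^(-bN)={y3:6.1f}")
```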
Tapering Effect Sizes
• In biological systems there are often large, important effects, followed by smaller
effects, and then yet smaller effects.
• These effects might be sequentially revealed as sample size increases,
because information content increases.
• Rare events are yet more difficult to study (e.g., fire, flood, volcanism).
[Figure: tapering effect sizes, from big effects down to small effects]
Model selection
• Determine what is the best explanation given the data
• Determine what is the best model for predicting the response
• Two approaches in fisheries/ecology:
Null hypothesis testing
Information-theoretic approaches
Null hypothesis testing
Develop an a priori hypothesis
Deduce testable predictions (i.e., models)
Carry out suitable test (experiment)
Compare test results with predictions
Retain or reject hypothesis
Hypothesis testing example:
Density independence for lake sturgeon populations
Hypothesis: lake sturgeon reproduction is density independent
Prediction: there is no relation between adult density and age 0 density
Model: Y = b0
Test: measure age 0 density for various adult densities over time
Compare: linear regression between age 0 and adult
sturgeon densities, P-value = 0.1839
Using a critical α-level = 0.05, we conclude there is no significant relationship
Result: retain the hypothesis that lake sturgeon reproduction is density
independent
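A minimal sketch of this kind of test in Python using scipy.stats.linregress; the density values are invented for illustration and chosen to give a non-significant slope:

```python
from scipy.stats import linregress

# Hypothetical paired densities (fish/ha), invented for illustration
adult_density = [1.2, 2.5, 3.1, 4.0, 5.2, 6.8, 7.5, 8.9]
age0_density = [10.3, 8.7, 12.1, 9.5, 11.8, 10.2, 13.0, 9.9]

result = linregress(adult_density, age0_density)
print(f"slope = {result.slope:.3f}, p-value = {result.pvalue:.4f}")

# Null hypothesis test at a critical alpha-level of 0.05
if result.pvalue >= 0.05:
    print("No significant relationship: retain the density-independence hypothesis")
```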
Model selection based on p-values
• No theoretical basis for model selection
• P-values ~ precision of estimate
• P-values strongly dependent on sample size
P(the data (or more extreme data)| Model)
vs.
L(model | the data)
JUST SAY NO TO STATISTICAL SIGNIFICANCE TESTING
FOR MODEL SELECTION
Information theory
If full reality cannot be included in a model, how do we tell how close we are to truth?
The Kullback-Leibler distance, based on information theory, measures how much
information is accounted for in a model.
Entropy is synonymous with uncertainty.
Information theory
K-L distance (information) is represented by: I(truth | model)
It represents the information lost when the candidate model is used to
approximate truth; thus SMALL values mean better fit.
AIC is based on the concept of minimizing K-L distance.
Akaike noticed that the maximum log likelihood,
log(L(model or parameter estimate | the data)), was related to K-L distance.
What is a maximum likelihood estimate?
It is the set of parameter values that maximizes the value of the likelihood,
given the data.
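A minimal sketch of maximum likelihood estimation in Python, assuming a normal model with known spread purely for illustration; note that for normal errors, maximizing the log likelihood is equivalent to minimizing the sum of squared deviations:

```python
import numpy as np

# Hypothetical observations, invented for illustration
y = np.array([4.1, 5.3, 4.8, 6.0, 5.1])
sigma = 1.0  # assume a known standard deviation to keep the sketch simple

def log_likelihood(mu):
    """Normal log likelihood of the data, given candidate mean mu."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (y - mu) ** 2 / (2 * sigma**2))

# Evaluate the log likelihood over a grid of candidate parameter values
grid = np.linspace(3, 7, 401)
ll = np.array([log_likelihood(mu) for mu in grid])
mle = grid[np.argmax(ll)]
print(f"MLE of mu = {mle:.2f} (the sample mean is {y.mean():.2f})")
```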
The sum of squares in regression is also a measure of the relative fit of a model:

$SSE = \sum (\text{deviations})^2$

[Figure: regression example illustrating SSE as the sum of squared deviations from the fitted line]
The maximum log likelihood (and SSE) is a biased estimate of K-L distance
Akaike’s contribution was that he showed that:
AIC = -2ln(L (model | the data)) + 2K
It is based on the principle of parsimony.

[Figure: bias² and variance plotted against the number of parameters (few to many), illustrating the trade-off behind parsimony]
Heuristic interpretation
AIC = -2ln(likelihood) + 2*K
The -2ln(likelihood) term measures model lack of fit; the 2*K term is a penalty
for increasing model size (it enforces parsimony).
AIC: Small sample bias adjustment
If the ratio n/K is < 40, then use AICc:

AICc = -2*ln(likelihood | data) + 2*K + (2*K*(K+1))/(n-K-1)

As n gets big, the correction term (2*K*(K+1))/(n-K-1) approaches 0,
so AICc converges to AIC.
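A minimal sketch of these formulas in Python; the log-likelihood value, K, and n are placeholders for illustration:

```python
def aic(log_lik, k):
    """Akaike's information criterion."""
    return -2 * log_lik + 2 * k

def aicc(log_lik, k, n):
    """AIC with the small-sample bias adjustment."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)

# Placeholder values for illustration
log_lik, k, n = -123.4, 3, 25
print(f"AIC  = {aic(log_lik, k):.2f}")
print(f"AICc = {aicc(log_lik, k, n):.2f}")  # n/K < 40 here, so AICc is preferred
```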
Model selection with AIC
What is model selection?
AIC by itself is relatively meaningless.
Recall that we find the best model by comparing various models and examining
their relative distance to the "truth".
We do this by calculating the difference between the best-fitting model (lowest
AIC) and each of the other models.
Model selection uncertainty
Which model is the best?
What if you collect data at the same spot next year, next week, next door?
AIC weights: long-run interpretation vs. Bayesian.
A confidence set of models is analogous to a confidence interval.
Where do we get AIC? From standard model output: K (the number of estimated
parameters) and -2ln(L(model | the data)).
Interpreting AIC
The best model is the one with the lowest AICc.
The difference between the lowest AIC and each model's AIC is that model's
relative distance from truth.
Interpreting AIC
The AICc weight ranges from 0 to 1, with 1 = best model.
It is interpreted as the relative likelihood that a model is the best, given the data
and the other models in the set.
Interpreting AIC
The ratio of 2 weights is interpreted as the strength of evidence for one model over another.
Here the best model is 0.86748/0.13056 = 6.64 times more likely to be
the best model for estimating striped bass population size.
Confidence model set
Analogous to a confidence interval for a parameter estimate.
Using a 1/8 (0.125) rule for weight of evidence, my confidence set includes the
top two models (both model likelihoods > 0.125).
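A minimal sketch tying these pieces together in Python; the AICc values are invented for illustration, but the delta, weight, evidence-ratio, and confidence-set calculations follow the formulas above:

```python
import numpy as np

# Hypothetical AICc values for a candidate model set (invented for illustration)
aicc = np.array([210.4, 214.2, 219.9, 225.1])

delta = aicc - aicc.min()         # distance from the best (lowest-AICc) model
rel_lik = np.exp(-0.5 * delta)    # relative likelihood of each model
weights = rel_lik / rel_lik.sum() # Akaike weights (sum to 1)

print("delta:  ", np.round(delta, 2))
print("weights:", np.round(weights, 3))

# Evidence ratio: how much more likely is the best model than the runner-up?
print(f"evidence ratio = {weights[0] / weights[1]:.2f}")

# Confidence set via the 1/8 rule on relative likelihoods
in_set = rel_lik > 1 / 8
print("models in confidence set:", np.where(in_set)[0])
```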
Linear models review
Y: response variable (dependent variable)
X: predictor variable (independent variable)
Y = b0 + b1*X + e
b0 is the intercept
b1 is the slope (parameter) associated with X
e is the residual error
Linear models review
When Y is a probability, it is bounded by 0 and 1.
Y = b0 + b1*X
can produce values < 0 and > 1, so we need to transform
or use a link function.
For probabilities, the logit link is the most useful.
Logit link
$h = \ln\left(\frac{p}{1-p}\right)$
h is the log odds
p is the probability of an event
Logit-linear models
(logistic regression)
h = b0 + b1*X
h is the log odds
b0 is the intercept
b1 is the slope (parameter) associated with X
Betas are on a logit scale, and the log odds need to be back-transformed
Back transformation:
Inverse logit link
$p = \frac{1}{1+\exp(-h)}$
h is the log odds
p is the probability of an event
Back transformation example
h = b0 + b1*X
b0 = - 2.5
b1 = 0.5
X=2
Back transformation example
h = -2.5 + 0.5*2
h = -1.5
$p = \frac{1}{1+\exp(1.5)} = 0.18 \text{ or } 18\%$
Interpreting beta estimates
Betas are on a logit scale; to interpret them, calculate odds ratios
using the exponential function:
b1 = 0.5
exp(0.5) = 1.65
Interpretation: for each 1-unit increase in X, the odds of the event occurring increase by a factor of 1.65.
For example, for each 1-inch increase in length, the odds that a fish is
caught increase by a factor of 1.65.
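A minimal sketch of the back-transformation and odds-ratio interpretation in Python, using the example values above:

```python
import math

# Coefficients from the example above
b0, b1 = -2.5, 0.5
x = 2

# Linear predictor on the logit (log-odds) scale
h = b0 + b1 * x  # -1.5

# Inverse logit link: back-transform the log odds to a probability
p = 1 / (1 + math.exp(-h))
print(f"p = {p:.2f}")  # 0.18, or 18%

# Odds ratio for a 1-unit increase in X
print(f"odds ratio = {math.exp(b1):.2f}")  # 1.65
```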