Revision of Basic Statistical Concepts
Basic Statistical Concepts
M. Burgman & J. Carey 2002
Statistical Population
• The entire underlying set of individuals
from which samples are drawn.
e.g. 0.25 m² quadrats are used to count
barnacles on a sea shore.
• The population is defined implicitly by the
sampling frame.
Strategies
• Define survey objectives
• Define population parameters to estimate
• Implement sampling strategy
i) measure every individual (cost, time,
practicality especially if destructive)
ii) measure a representative portion of
the population (a sample)
Statistical Sample
• An aggregate of objects from which
measurements are taken.
• A representative subset of a population.
Simple Random Sampling
• Every unit and combination of units in the
population has an equal chance of selection.
a) with replacement
b) without replacement
c) finite and infinite populations
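A minimal sketch of cases (a) and (b), assuming Python's standard `random` module. The sampling frame of 100 numbered quadrats and the seed are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical sampling frame: 100 numbered quadrats (not from the slides).
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# (a) With replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=10)

# (b) Without replacement: each unit can appear at most once.
without_replacement = rng.sample(population, k=10)

print(with_replacement)
print(without_replacement)
```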
Sampling Objectives
• To obtain an unbiased estimate of a
population mean
• To assess the precision of the estimate (i.e.
calculate the standard error of the mean)
• To obtain as precise an estimate of the
parameters as possible for time and money
spent
Statistics of Dispersion
Population variance
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$
Sample variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Statistics of Dispersion
Standard error of the mean
$s_{\bar{x}} = \sqrt{\frac{s^2}{n}}$
Coefficient of variation
$CV = \frac{s}{\bar{x}}$
Covariance
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
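The dispersion statistics above can be sketched in a few lines of Python. The data are the male golden jackal jaw lengths from the Randomization Tests example later in these slides. The function names are illustrative, not part of the original material.

```python
import math

# Male golden jackal jaw lengths (mm), from the Randomization Tests slides.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    # s^2 = sum((x_i - xbar)^2) / (n - 1)
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def standard_error(xs):
    # s_xbar = sqrt(s^2 / n)
    return math.sqrt(sample_variance(xs) / len(xs))

def coefficient_of_variation(xs):
    # CV = s / xbar (multiply by 100 to express it as a percentage)
    return math.sqrt(sample_variance(xs)) / mean(xs)

def covariance(xs, ys):
    # s_xy = sum((x_i - xbar)(y_i - ybar)) / (n - 1); xs and ys must be paired.
    xbar, ybar = mean(xs), mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (len(xs) - 1)

print("mean:", mean(males))
print("s^2:", sample_variance(males))
print("s:", math.sqrt(sample_variance(males)))
print("SE:", standard_error(males))
print("CV:", coefficient_of_variation(males))
```

As a quick check, the covariance of a variable with itself, `covariance(males, males)`, equals its sample variance.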
Expectations and Variances
$E(X + b) = E(X) + b$
$E(aX) = aE(X)$
$E(X + Y) = E(X) + E(Y)$
$V(X + b) = V(X)$
$V(aX) = a^2 V(X)$
$V(X + Y) = V(X) + V(Y) + 2\,Cov(X, Y)$
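These rules also hold exactly for sample means, variances and covariances, so they can be checked numerically. The sketch below uses hypothetical paired observations `x` and `y` and arbitrary constants `a` and `b`, none of which come from the slides.

```python
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical paired observations and constants, for illustration only.
x = [2.0, 4.0, 4.0, 5.0, 7.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0]
a, b = 3.0, 10.0

shifted = [xi + b for xi in x]               # X + b
scaled = [a * xi for xi in x]                # aX
total = [xi + yi for xi, yi in zip(x, y)]    # X + Y

# Each pair printed below should agree (up to floating-point rounding).
print(mean(shifted), mean(x) + b)
print(mean(scaled), a * mean(x))
print(mean(total), mean(x) + mean(y))
print(var(shifted), var(x))
print(var(scaled), a ** 2 * var(x))
print(var(total), var(x) + var(y) + 2 * cov(x, y))
```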
Confidence Limits
• For the mean
$\mu = \bar{x} \pm t_{[\alpha,\, n-1]} \frac{s}{\sqrt{n}}$
• This formula sets confidence limits to means
of samples from a normally distributed
population.
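As a worked sketch, the formula can be applied to the male golden jackal jaw lengths from the Randomization Tests example later in these slides. The critical value 2.262 is the tabulated two-tailed Student's t for α = 0.05 with 9 degrees of freedom. It is entered by hand here as an assumption, not computed.

```python
import math

# Male golden jackal jaw lengths (mm), from the Randomization Tests slides.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]

# Assumed table value: two-tailed t for alpha = 0.05, n - 1 = 9 degrees of freedom.
T_CRIT = 2.262

n = len(males)
xbar = sum(males) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in males) / (n - 1))

# Confidence limits: xbar +/- t * s / sqrt(n)
half_width = T_CRIT * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"mean = {xbar:.1f} mm, 95% limits = ({lower:.2f}, {upper:.2f}) mm")
```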
Confidence Limits
• Confidence limits of the mean define a region
that we expect will enclose the true mean.
• The likelihood that this is true is determined
by α. If we set α at 5% (hence specifying
95% confidence intervals), then the region
enclosed by the confidence intervals will
capture the true mean 95 times out of 100.
Confidence Limits
• The same formula may be used to set
confidence limits to any statistic as long as it
follows the normal distribution,
e.g. the median,
the average (absolute) deviation,
standard deviation (s),
coefficient of variation, or
skewness.
How many samples?
$n = \frac{t^2 \, CV^2}{E^2}$
where:
• n is the number of samples required
• CV is the coefficient of variation (expressed as a
%) of samples in a pilot survey
• t is Student's t value for a specified degree of
certainty and the number of samples used to
estimate the parameters
• E is specified error limits (expressed as a %
of the mean)
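A small sketch of the sample-size formula. The function name and the pilot-survey values are hypothetical, and rounding up to a whole sample is an added assumption rather than something stated on the slide.

```python
import math

def samples_required(t, cv_percent, error_percent):
    # n = t^2 * CV^2 / E^2, with CV and E both expressed as percentages.
    n = (t ** 2) * (cv_percent ** 2) / (error_percent ** 2)
    # Assumption: round up, since a fraction of a sample cannot be taken.
    return math.ceil(n)

# Hypothetical pilot survey: t = 2, CV = 30%, error limits E = 10% of the mean.
print(samples_required(2.0, 30.0, 10.0))  # → 36
```

Because t itself depends on the number of samples used to estimate the parameters, the calculation may need to be repeated with an updated t value.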
Measurement Error
• Measured variation may be decomposed into
natural variation + measurement error
• Measurement error may be reduced by
improving sampling protocols and
instrumentation
• Reducing measurement error increases
confidence in estimates without increasing
the number of samples.
• Precision (variation) v. accuracy (bias)
Components of Measurement Error
• Systematic errors
• Random errors
Causes
• Measurement assumptions
(shape, size, allometry)
• Instrument error
• Operator error
Kinds of Uncertainty
1. Epistemic Uncertainty
• inherent environmental variation
• variation in population responses due to
demographic structure
• imperfect knowledge
• model mis-specification
• measurement error (assessment error)
• ignorance
Kinds of Uncertainty
2. Semantic Uncertainty
• Ambiguity - interpretation of a phrase in two
or more distinct ways.
“Juvenile Court to Try Shooting Defendant”
“Local High School Dropouts Cut in Half”
• Vagueness - leads to borderline cases.
e.g. tall; endangered; adult
Kinds of Uncertainty
More examples of vagueness:
• Tree crown
tree foliage bounded by the first healthy
branch forming part of the main crown and
extending as far or further than any branch
above it.
forked trees?
dead branches?
Kinds of Uncertainty
More examples of vagueness:
• Epilimnion
the upper layer of water in a lake, bounded by
a thermocline
• Soil horizon
a relatively uniform soil layer, differentiated by
contrasts in mineral or organic properties.
Sampling Design Criteria
• Operational simplicity
• Unambiguous interpretation
Null-Hypothesis Tests
An example of hypothesis testing in which
management alternatives are judged on the
basis of the outcome of the test.
Hypothesis               Symbol   Description
Null hypothesis          H0       The strategy has no effect.
Alternative hypothesis   H1       The strategy is effective.
Statistical Outcomes in Null Hypothesis Testing
                           Test result
Reality                    Significant          Not significant
                           (H0 rejected)        (H0 not rejected)
Difference (H0 false)      correct              Type II error (β)
No difference (H0 true)    Type I error (α)     correct
The Character of Error Types
Type I errors
• Alarmism/Over-reaction
• Incorrectly accepting a (false) alternative
hypothesis
• Concluding (incorrectly) that there is an impact
Type II errors
• False confidence/Cornucopia
• Incorrectly "accepting" a (false) null hypothesis
• Concluding (incorrectly) that there is no impact
t-tests
A t-test of the hypothesis that two sample
means come from a population with equal μ,
i.e. H0: μ1 = μ2
$t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{1}{n}\left(s_1^2 + s_2^2\right)}}$
where n is the size of each sample.
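A sketch of the t statistic for two samples of equal size, applied to the golden jackal jaw lengths from the Randomization Tests example later in these slides. Only the statistic is computed. Looking it up against Student's t distribution is left out.

```python
import math

# Golden jackal jaw lengths (mm), from the Randomization Tests slides.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
females = [110, 111, 107, 108, 110, 105, 107, 106, 111, 111]

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def t_statistic(sample_1, sample_2):
    # t = (Y1bar - Y2bar) / sqrt((s1^2 + s2^2) / n), for equal sample sizes n.
    if len(sample_1) != len(sample_2):
        raise ValueError("this form of the test assumes equal sample sizes")
    n = len(sample_1)
    pooled = (sample_variance(sample_1) + sample_variance(sample_2)) / n
    return (mean(sample_1) - mean(sample_2)) / math.sqrt(pooled)

t = t_statistic(males, females)
print(f"t = {t:.2f}")
```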
Distributions of Test Statistics
[Figure: two distributions plotted against P(statistic): the distribution of the null hypothesis, assumed to be true until rejected, and the distribution of the mean of the actual population, with the critical value marked.]
Assumptions
The assumption of independence: correlation
and autocorrelation
1. If error in one object is related to error in
others, there will be bias, e.g. measure one
and compare others.
2. The effective sample size may be less than
the number of samples if measurements are
correlated in space or time.
The effects of the non-independence of data
on errors of interpretation of statistical tests
               Non-independence
Correlation    Among treatments     Within treatments
Positive       Increased Type II    Increased Type I
Negative       Increased Type I     Increased Type II
Randomization Tests
Jaw lengths of Golden Jackals:
Males: 120, 107, 110, 116, 114, 111, 113, 117, 114, 112
Females: 110, 111, 107, 108, 110, 105, 107, 106, 111, 111
Is there a difference in jaw length between
males and females?
Randomization Tests
1. Calculate means for males and for females.
2. Calculate the difference between the
means, $D_0 = \bar{x}_m - \bar{x}_f = 4.8$
3. Randomly allocate 10 sample lengths to
each of 2 groups
4. Calculate Di , the difference between
means for these 2 groups
5. Repeat Steps 3 & 4 many times
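The five steps can be sketched as follows, using the jaw-length data from the slides. The function name, the number of runs and the seed are illustrative choices. Because the allocation is random, the count of large differences will vary a little from run to run and from the figures quoted in the slides.

```python
import random

# Golden jackal jaw lengths (mm), from the slides.
males = [120, 107, 110, 116, 114, 111, 113, 117, 114, 112]
females = [110, 111, 107, 108, 110, 105, 107, 106, 111, 111]

def mean(xs):
    return sum(xs) / len(xs)

def randomization_test(group_a, group_b, runs=5000, seed=1):
    rng = random.Random(seed)                    # seed is an arbitrary choice
    d0 = mean(group_a) - mean(group_b)           # Steps 1-2: observed difference
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(runs):                        # Step 5: repeat many times
        rng.shuffle(pooled)                      # Step 3: random allocation
        d_i = mean(pooled[:n_a]) - mean(pooled[n_a:])  # Step 4: difference Di
        if d_i >= d0 - 1e-9:                     # small tolerance for float rounding
            count += 1
    return d0, count / runs

d0, proportion = randomization_test(males, females)
print(f"D0 = {d0:.1f}; proportion of Di >= D0 = {proportion:.4f}")
```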
Randomization Tests
• If D0 is unusually large, the observed data are
unlikely to have arisen if there was no
difference between males and females.
[Figure: frequency distribution of the randomized differences in jaw length (mm), with the observed D0 = 4.8 marked.]
Randomization Tests
• From 5000 runs,
only 9 values of Di were greater than or equal to 4.8.
• 9/5000 = 0.0018.
(t-test: p = 0.0013)
Confidence Limits by
Randomization
• For 95% confidence limits, the upper and
lower limits, U and L, are such that they
enclose 95% of the randomization
distribution.
• For 99% confidence, L and U must give
values at the 0.5% and 99.5% points on the
distribution.
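One simple way to read such limits from a set of randomization values is to sort them and cut off an equal share from each tail. The sketch below takes that reading. The function name and the stand-in values are hypothetical, and other percentile conventions would give slightly different limits.

```python
def percentile_limits(values, confidence=0.95):
    # Sort the values and drop an equal number from each tail, so that the
    # remaining middle portion holds roughly `confidence` of the distribution.
    ordered = sorted(values)
    n = len(ordered)
    k = round((1.0 - confidence) / 2.0 * n)  # values cut from each tail
    return ordered[k], ordered[n - 1 - k]

# Hypothetical stand-in for a randomization distribution: the integers 1..1000.
values = list(range(1, 1001))
print(percentile_limits(values, 0.95))  # → (26, 975)
print(percentile_limits(values, 0.99))  # → (6, 995)
```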
Randomization Tests
Can do randomization tests in lieu of:
• paired comparisons
• ANOVA
• multiple regression