Review of Previous Lecture
• Range
– The difference between the largest and smallest values
• Interquartile range
– The difference between the 25th and 75th percentiles
• Variance
– The sum of squares divided by the population size or the sample size minus one
• Standard deviation
– The square root of the variance
• Z-scores
– The number of standard deviations an observation is away from the mean
Outline of Session
• Another Measure of Dispersion – Coefficient of Variation (CV)
• Histograms
• Skewness
• Kurtosis
• Other Descriptive Summary Measures
Measures of Dispersion – Coefficient of Variation
• Coefficient of variation (CV) measures the spread of a set of data as a proportion of its mean.
• It is the ratio of the sample standard deviation to the sample mean:
$CV = \frac{s}{\bar{x}} \times 100\%$
• It is sometimes expressed as a percentage
• There is an equivalent definition for the coefficient of variation of a population
Measures of Dispersion – Coefficient of Variation
• A standard application of the Coefficient of Variation (CV) is to characterize the variability of geographic variables over space or time
• Coefficient of Variation (CV) is particularly applied to characterize the interannual variability of climate variables (e.g., temperature or precipitation) or biophysical variables (leaf area index (LAI), biomass, etc.)
Site              Mean      Standard Deviation   Coefficient of Variation (CV)
Chapel Hill (A)   1198.10   191.80               0.16 (16%)
Bend (B)          298.07    82.08                0.28 (28%)
Coefficient of Variation (CV)
• It is a dimensionless number that can be used to compare the amount of variance between populations with different means
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$CV = \frac{s}{\bar{x}} \times 100\%$
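The CV definition can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `coefficient_of_variation` and the `sample` values are hypothetical, while the Chapel Hill and Bend means and standard deviations are the ones listed earlier.

```python
import math

def coefficient_of_variation(values):
    """Return the sample CV (s / mean) as a proportion, not a percentage."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Sample variance: sum of squared deviations divided by n - 1
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance) / mean

# Hypothetical observations, for illustration only
sample = [10.0, 12.0, 14.0, 16.0, 18.0]
cv_sample = coefficient_of_variation(sample)

# CV computed directly from the summary statistics (standard deviation / mean)
cv_chapel_hill = 191.80 / 1198.10
cv_bend = 82.08 / 298.07

print(f"sample CV: {cv_sample:.3f}")
print(f"Chapel Hill CV: {cv_chapel_hill:.2f}, Bend CV: {cv_bend:.2f}")
```

Because the CV is dimensionless, the two sites can be compared even though their means differ by a factor of about four.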
[Figure: panels for NDVI, Temperature, and Precipitation, 1990 - 2000, comparing the CV of NDVI with the CV of temperature and the CV of NDVI with the CV of precipitation. Sources: http://www.daymet.org; Xiao & Moody, 2004]
Measures of Skewness and Kurtosis
• A fundamental task in many statistical analyses is to characterize the location and variability of a data set (measures of central tendency vs. measures of dispersion)
• Neither kind of measure tells us anything about the shape of the distribution
• A further characterization of the data includes skewness and kurtosis
• The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set
Histograms
Fig. 3. Histogram of crown width (m) measured in situ for a random sample of Quercus robur trees in Frame Wood (n = 63; mean = 9.3 m; SD = 4.64 m). Source: Koukoulas & Blackburn, 2005. Journal of Vegetation Science: Vol. 16, No. 5, pp. 587–596
Frequency & Distribution
• A histogram is one way to depict a frequency distribution
• Frequency is the number of times a variable takes on a particular value
• Note that any variable has a frequency distribution
• e.g., roll a pair of dice several times and record the resulting values (constrained to being between 2 and 12), count the number of times each value occurs (the frequency of that value), and take these counts together to form a frequency distribution
Frequency & Distribution
• Frequencies can be absolute (when the frequency provided is the actual count of the occurrences) or relative (when they are normalized by dividing the absolute frequency by the total number of observations, so that each value lies in [0, 1])
• Relative frequencies are particularly useful if you want to compare distributions drawn from two different sources (e.g., when the numbers of observations from each source differ)
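The dice example and the absolute/relative distinction can be sketched as follows. This is an illustrative simulation only: the function name `dice_frequencies`, the number of rolls, and the random seed are assumptions, not values from the lecture.

```python
import random
from collections import Counter

def dice_frequencies(n_rolls, seed=0):
    """Roll a pair of dice n_rolls times; return absolute and relative frequencies."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]
    absolute = Counter(totals)  # count of each observed total
    # Relative frequency: absolute count divided by the number of observations
    relative = {value: count / n_rolls for value, count in absolute.items()}
    return absolute, relative

absolute, relative = dice_frequencies(1000)
for value in sorted(absolute):
    print(value, absolute[value], f"{relative[value]:.3f}")
```

The relative frequencies sum to 1, which is what makes two runs with different numbers of rolls directly comparable.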
Histograms
• We may summarize our data by constructing histograms, which are vertical bar graphs
• A histogram is used to graphically summarize the distribution of a data set
• A histogram divides the range of values in a data set into intervals
• Over each interval is placed a bar whose height represents the frequency of data values in the interval.
Building a Histogram
• To construct a histogram, the data are first grouped into categories
• The histogram contains one vertical bar for each category
• The height of the bar represents the number of observations in the category (i.e., frequency)
• It is common to note the midpoint of the category on the horizontal axis
Building a Histogram – Example
• 1. Develop an ungrouped frequency table
– That is, we build a table that counts the number of occurrences of each variable value from lowest to highest:
TMI Value         4.16   4.17   4.18   …   13.71
Ungrouped Freq.   2      4      0      …   1
• We could attempt to construct a bar chart from this table, but it would have too many bars to really be useful
Building a Histogram – Example
• 2. Construct a grouped frequency table
– Select an appropriate number of classes
Class           Frequency   Percentage
4.00 - 4.99     120
5.00 - 5.99     807
6.00 - 6.99     1411
7.00 - 7.99     407
8.00 - 8.99     87
9.00 - 9.99     33
10.00 - 10.99   17
11.00 - 11.99   22
12.00 - 12.99   43
13.00 - 13.99   19
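As a sketch of how the percentage for each class could be derived, the class frequencies listed above can be divided by their total. The percentages printed here are computed from those frequencies, not copied from the slides, and the helper `class_lower_bound` is a hypothetical illustration of how a raw TMI value maps to a unit-width class.

```python
import math

classes = [
    "4.00 - 4.99", "5.00 - 5.99", "6.00 - 6.99", "7.00 - 7.99", "8.00 - 8.99",
    "9.00 - 9.99", "10.00 - 10.99", "11.00 - 11.99", "12.00 - 12.99", "13.00 - 13.99",
]
frequencies = [120, 807, 1411, 407, 87, 33, 17, 22, 43, 19]

def class_lower_bound(value):
    """Lower bound of the unit-width class containing value (4.16 falls in 4.00 - 4.99)."""
    return math.floor(value)

total = sum(frequencies)
# Percentage = relative frequency x 100
percentages = [100 * f / total for f in frequencies]

for label, f, p in zip(classes, frequencies, percentages):
    print(f"{label:<15}{f:>6}{p:>8.1f}%")
print(f"{'Total':<15}{total:>6}")
```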
Building a Histogram – Example
• 3. Plot the frequencies of each class
– All that remains is to create the bar graph
[Figure: Pond Branch TMI Histogram. Horizontal axis: Topographic Moisture Index (a proxy for soil moisture).]
Further Moments of the Distribution
• While measures of dispersion are useful for helping us describe the width of the distribution, they tell us nothing about the shape of the distribution
Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 91.
Further Moments of the Distribution
• There are further statistics that describe the shape of the distribution, using formulae that are similar to those of the mean and variance
• 1st moment - Mean (describes central value)
• 2nd moment - Variance (describes dispersion)
• 3rd moment - Skewness (describes asymmetry)
• 4th moment - Kurtosis (describes peakedness)
Further Moments – Skewness
• Skewness measures the degree of asymmetry exhibited by the data
$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$
• If skewness equals zero, the histogram is symmetric about the mean
• Positive skewness vs negative skewness
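A minimal sketch of the skewness statistic, assuming that s is the sample standard deviation (n - 1 in the denominator) as defined for the variance earlier. The function name and the three small data sets are hypothetical and chosen only to show the sign of the result.

```python
import math

def skewness(values):
    """Sum of cubed deviations from the mean, divided by n * s**3."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Assumption: s is the sample standard deviation (divide by n - 1)
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - mean) ** 3 for x in values) / (n * s ** 3)

symmetric = [1, 2, 3, 4, 5]           # mirror image about the mean
right_tailed = [1, 1, 2, 2, 3, 10]    # a few high values pull the mean upward
left_tailed = [-x for x in right_tailed]

print(skewness(symmetric))     # zero for a symmetric data set
print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
```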
Further Moments – Skewness
Source: http://library.thinkquest.org/10030/3smodsas.htm
Further Moments – Skewness
• Positive skewness
– There are more observations below the mean than above it
– When the mean is greater than the median
• Negative skewness
– There are a small number of low observations and a large number of high ones
– When the median is greater than the mean
Further Moments – Kurtosis
• Kurtosis measures how peaked the histogram is
$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$
• The kurtosis of a normal distribution is 0
• Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution
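The kurtosis statistic can be sketched the same way, again assuming that s is the sample standard deviation and that 3 is subtracted so that a normal distribution scores 0. The function name and both data sets are hypothetical; with small samples the value for normally distributed data will only be approximately 0.

```python
import math

def kurtosis(values):
    """Sum of fourth-power deviations divided by n * s**4, minus 3."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Assumption: s is the sample standard deviation (divide by n - 1)
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return sum((x - mean) ** 4 for x in values) / (n * s ** 4) - 3

flat = list(range(1, 11))    # evenly spread values
peaked = [5] * 8 + [0, 10]   # most values at the center, two far out in the tails

print(kurtosis(flat))    # negative
print(kurtosis(peaked))  # positive
```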
Further Moments – Kurtosis
• Platykurtic – When the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is more flat and wide)
• Thus, negative kurtosis indicates a relatively flat distribution
• Leptokurtic – When the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked)
• Thus, positive kurtosis indicates a relatively peaked distribution
Further Moments – Kurtosis
[Figure: platykurtic and leptokurtic curves] Source: http://www.riskglossary.com/link/kurtosis.htm
• Kurtosis is based on the size of a distribution's tails.
• Negative kurtosis (platykurtic) – distributions with short tails
• Positive kurtosis (leptokurtic) – distributions with relatively long tails
Why Do We Need Kurtosis?
• These two distributions have the same variance, approximately the same skew, but differ markedly in kurtosis.
Source: http://davidmlane.com/hyperstat/A53638.html
How to Graphically Summarize Data?
• Histograms
• Box plots
Functions of a Histogram
• The function of a histogram is to graphically summarize the distribution of a data set
• The histogram graphically shows the following:
1. Center (i.e., the location) of the data
2. Spread (i.e., the scale) of the data
3. Skewness of the data
4. Kurtosis of the data
5. Presence of outliers
6. Presence of multiple modes in the data.
Functions of a Histogram
• The histogram can be used to answer the following questions:
1. What kind of population distribution do the data come from?
2. Where are the data located?
3. How spread out are the data?
4. Are the data symmetric or skewed?
5. Are there outliers in the data?
Source: http://www.robertluttman.com/vms/Week5/page9.htm
http://office.geog.uvic.ca/geog226/frLab1.html
(Last) (First three)
Box Plots
• We can also use a box plot to graphically summarize a data set
• A box plot represents a graphical summary of what is sometimes called a “five-number summary” of the distribution
– Minimum
– Maximum
– 25th percentile
– 75th percentile
– Median
[Figure: box plot with labels max., 75th %-ile, median, 25th %-ile, min., and Interquartile Range (IQR). Source: Rogerson, p. 8.]
Box Plots
• Example – Consider the first 9 Commodore prices (in $,000): 6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0
• Arrange these in order of magnitude: 3.8, 5.8, 5.99, 6.0, 6.7, 7.0, 9.975, 10.5, 20.0
• The median is Q2 = 6.7 (there are 4 values on either side)
• Q1 = 5.9 (median of the 4 smallest values)
• Q3 = 10.2 (median of the 4 largest values)
• IQR = Q3 – Q1 = 10.2 - 5.9 = 4.3
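The quartile steps in this example can be sketched in Python using the same "median of each half" approach. The function name `five_number_summary` is hypothetical. The unrounded results are Q1 = 5.895 and Q3 = 10.2375, which the example reports to one decimal place as 5.9 and 10.2.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles taken as medians of each half."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    # For an odd number of values the median itself is left out of both halves
    upper = data[half + 1:] if n % 2 else data[half:]
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }

prices = [6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0]
summary = five_number_summary(prices)
iqr = summary["q3"] - summary["q1"]

print(summary)
print(f"IQR = {iqr:.4f}")
```

Other quartile conventions exist (for example interpolation at rank (n+1)/4), so different software may report slightly different values for the same data.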
Box Plots
Example: Table 1.1 Commuting data (Rogerson, p. 5)
Ranked commuting times: 5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21, 22, 23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47
25th percentile is represented by observation (30+1)/4 = 7.75
75th percentile is represented by observation 3(30+1)/4 = 23.25
25th percentile: 11.75
75th percentile: 26
Interquartile range: 26 – 11.75 = 14.25
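The percentile rule used here, taking the observation at rank (n+1)p and interpolating between neighbors when the rank is fractional, can be sketched as follows. The function name `percentile` is hypothetical, and clamping at the ends of the data is an added assumption for ranks that fall outside 1..n.

```python
def percentile(sorted_values, p):
    """Value at 1-based rank (n + 1) * p, interpolating linearly between neighbors."""
    n = len(sorted_values)
    position = (n + 1) * p         # e.g. (30 + 1) * 0.25 = 7.75
    lower_rank = int(position)     # whole part of the rank
    fraction = position - lower_rank
    # Assumption: clamp to the smallest or largest value when the rank is out of range
    if lower_rank < 1:
        return sorted_values[0]
    if lower_rank >= n:
        return sorted_values[-1]
    lower = sorted_values[lower_rank - 1]
    upper = sorted_values[lower_rank]
    return lower + fraction * (upper - lower)

times = [5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21,
         21, 21, 21, 22, 23, 24, 24, 26, 26, 31, 31, 36, 42, 44, 47]

q1 = percentile(times, 0.25)
q3 = percentile(times, 0.75)
print(q1, q3, q3 - q1)
```

Rank 7.75 lies three quarters of the way from the 7th observation (11) to the 8th (12), which gives 11.75; rank 23.25 lies between two observations that both equal 26.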
Other Descriptive Summary Measures
• Descriptive statistics provide an organization and summary of a dataset
• A small number of summary measures replaces the entirety of a dataset
• We’ll briefly talk about other simple descriptive summary measures
Other Descriptive Summary Measures
• You're likely already familiar with some simple descriptive summary measures
– Ratios
– Proportions
– Percentages
– Rates of Change
– Location Quotients
Other Descriptive Summary Measures
• Ratios – (# of observations in A) / (# of observations in B), e.g., A - 6 overcast, B - 24 mostly cloudy days
• Proportions – Relates one part or category of data to the entire set of observations, e.g., a box of marbles that contains 4 yellow, 6 red, 5 blue, and 2 green gives a yellow proportion of 4/17, or, with color counts a = {yellow, red, blue, green} = {4, 6, 5, 2}:
$\text{proportion}_i = \frac{a_i}{\sum_j a_j}$
Other Descriptive Summary Measures
• Proportions - Sum of all proportions = 1. These are useful for comparing two sets of data with different sizes and category counts, e.g., a different box of marbles gives a yellow proportion of 2/23, and in order for this to be a reasonable comparison we need to know the totals for both samples
• Percentages - Calculated as proportion x 100, e.g., 2/23 x 100% = 8.696%. Use of these should be restricted to larger sample sizes, perhaps 20+ observations
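A short sketch of the proportion and percentage calculations, using the marble counts from the first box. The function name `proportions` is hypothetical. For the second box only the yellow count (2) and the total (23) are given, so only that single proportion is computed.

```python
def proportions(counts):
    """Divide each category count by the total number of observations."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

box_one = {"yellow": 4, "red": 6, "blue": 5, "green": 2}
props = proportions(box_one)
percentages = {category: 100 * p for category, p in props.items()}

# Second box: only the yellow count and the total are known
yellow_box_two = 2 / 23

for category in box_one:
    print(f"{category:<7}{props[category]:.3f}{percentages[category]:>8.1f}%")
print(f"box two yellow: {yellow_box_two:.5f} = {100 * yellow_box_two:.3f}%")
```

Comparing 4/17 with 2/23 is meaningful because both are expressed relative to their own totals, which is the point of using proportions rather than raw counts.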
Other Descriptive Summary Measures
• Location Quotients - An index of relative concentration in space, a comparison of a region's share of something to the total
• Example – Suppose we have a region of 1000 km² which we subdivide into three smaller areas of 200, 300, and 500 km² (labeled A, B, & C)
• The region has an influenza outbreak with 150 cases in A, 100 in B, and 350 in C (a total of 600 flu cases):
                      A                B                C
Proportion of Area    200/1000=0.2     300/1000=0.3     500/1000=0.5
Proportion of Cases   150/600=0.25     100/600=0.17     350/600=0.58
Location Quotient     0.25/0.2=1.25    0.17/0.3=0.57    0.58/0.5=1.17
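The location quotient calculation can be sketched as below, using the areas and case counts from the example. The function name `location_quotients` is hypothetical. Note that this sketch divides the unrounded proportions, so B comes out at about 0.56 rather than the 0.57 obtained from the rounded 0.17.

```python
def location_quotients(cases, areas):
    """Share of cases divided by share of area, for each sub-region."""
    total_cases = sum(cases.values())
    total_area = sum(areas.values())
    return {
        region: (cases[region] / total_cases) / (areas[region] / total_area)
        for region in areas
    }

areas = {"A": 200, "B": 300, "C": 500}   # km^2
cases = {"A": 150, "B": 100, "C": 350}   # influenza cases

lq = location_quotients(cases, areas)
for region, value in lq.items():
    print(f"{region}: {value:.2f}")
```

A quotient above 1 (A and C) means the sub-region holds a larger share of cases than of area; a quotient below 1 (B) means cases are relatively less concentrated there.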
Assignment II
• Due by Thursday (02/09/2006)
• Downloadable from Course website:
– http://www.unc.edu/courses/2006spring/geog/090/001/www/