18 - Academics

Economics 105: Statistics
• http://www.davidson.edu/academic/economics/foley/105/index.html
•Powerpoint slides
– meant to help you listen in class
– print out BEFORE you do the day’s reading!
– things will seem fast if you do the reading AFTER lecture
– I expect you to do the reading prior to class. Everyone does
for the first few weeks; the hard part is continuing to do so
(before class). If you don’t, things will seem to go fast.
– Stats is one of those classes where you can teach yourself
quite a bit via the reading. I expect you to do so.
– yes, these are high expectations.
– Not everyone likes Powerpoint ...
Economics 105: Statistics
• Today
– What is Statistics?
– Presenting data
• For next time: Read Chapters 1 – 3.5
Organizing and Presenting
Data Graphically
• Data in raw form are usually not easy to use for decision
making
–Some type of organization is needed
•Table
•Graph
•Techniques reviewed in Chapter 2:
Bar charts and pie charts
Pareto diagram
Ordered array
Stem-and-leaf display
Frequency distributions, histograms and polygons
Cumulative distributions and ogives
Contingency tables
Scatter diagrams
Raw Form of Data
Example: A manufacturer of insulation
randomly selects 20 winter days and records
the daily high temperature
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Tabulating Numerical Data:
Frequency Distributions
•What is a Frequency Distribution?
– A frequency distribution is a list or a table …
– containing class groupings (ranges within
which the data fall) ...
– and the corresponding frequencies with which
data fall within each grouping or category
Why Use a Frequency Distribution?
•It is a way to summarize numerical data
•It condenses the raw data into a more useful
form
•It allows for a quick visual interpretation of
the data
Class Intervals
and Class Boundaries
• Each class grouping has the same width
• Determine the width of each interval by
$\text{Width of interval} \approx \dfrac{\text{range}}{\text{number of desired class groupings}}$
• Usually at least 5 but no more than 15
groupings
• Class boundaries never overlap
• Round up the interval width to get desirable
endpoints
Frequency Distribution Example
Example: A manufacturer of insulation
randomly selects 20 winter days and records
the daily high temperature
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example
(continued)
• Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
• Compute class interval (width): 10 (46/5 then round up)
• Determine class boundaries (limits):
– 10, 20, 30, 40, 50, 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes
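The steps above can be sketched in Python. This is a minimal illustration, not the textbook's procedure verbatim: rounding the width up to the next multiple of 10 and starting the first class at the multiple of the width at or below the minimum are assumptions chosen to reproduce the boundaries on the slide.

```python
import math

# Daily high temperatures from the example
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5                                  # chosen number of classes
data_range = max(temps) - min(temps)             # 58 - 12 = 46
raw_width = data_range / num_classes             # 46 / 5 = 9.2

# Assumption: "round up" means up to the next multiple of 10
width = math.ceil(raw_width / 10) * 10           # 10
# Assumption: first boundary is the multiple of the width at or below the minimum
start = (min(temps) // width) * width            # 10

boundaries = [start + i * width for i in range(num_classes + 1)]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Each class is "lo but less than hi", so boundaries never overlap
frequencies = [sum(lo <= t < hi for t in temps)
               for lo, hi in zip(boundaries, boundaries[1:])]

for lo, hi, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lo} but less than {hi}: {f}")
```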
Frequency Distribution Example
(continued)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20        3              .15               15
20 but less than 30        6              .30               30
30 but less than 40        5              .25               25
40 but less than 50        4              .20               20
50 but less than 60        2              .10               10
Total                     20             1.00              100
Tabulating Numerical Data:
Cumulative Frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                 3                     15
20 but less than 30        6           30                 9                     45
30 but less than 40        5           25                14                     70
40 but less than 50        4           20                18                     90
50 but less than 60        2           10                20                    100
Total                     20          100
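The percentage and cumulative columns follow directly from the class frequencies. A short Python sketch (variable names are illustrative, not from the slides):

```python
from itertools import accumulate

# Class frequencies from the daily high temperature example
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)                                # 20

percentages = [100 * f / total for f in frequencies]    # share of each class
cum_freq = list(accumulate(frequencies))                # running total of counts
cum_pct = [100 * c / total for c in cum_freq]           # running total as a percentage

for f, p, cf, cp in zip(frequencies, percentages, cum_freq, cum_pct):
    print(f, p, cf, cp)
```

The last cumulative frequency always equals the sample size, and the last cumulative percentage is always 100.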
Graphing Numerical
Data: The Histogram
• A graph of the data in a frequency
distribution is called a histogram
• The class boundaries (or class midpoints)
are shown on the horizontal axis
• the vertical axis is either frequency, relative
frequency, or percentage
• Bars of the appropriate heights are used to
represent the number of observations within
each class
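As a rough illustration, the sketch below prints a text version of the histogram using only the Python standard library. It is an assumption-laden stand-in for a real chart: the bars run horizontally (classes on the vertical axis), unlike the slide's layout, and the function name `text_histogram` is hypothetical.

```python
labels = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
          "40 but less than 50", "50 but less than 60"]
frequencies = [3, 6, 5, 4, 2]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:<{width}} | {symbol * count} {count}"
            for label, count in zip(labels, counts)]

lines = text_histogram(labels, frequencies)
print("\n".join(lines))
```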
Histogram
Class                  Midpoint   Frequency
10 but less than 20       15          3
20 but less than 30       25          6
30 but less than 40       35          5
40 but less than 50       45          4
50 but less than 60       55          2
(No gaps between bars)
[Figure: Histogram: Daily High Temperature. Horizontal axis: Class Midpoints; vertical axis: Frequency.]
Graphing Numerical Data:
The Frequency Polygon
Class                  Midpoint   Frequency
10 but less than 20       15          3
20 but less than 30       25          6
30 but less than 40       35          5
40 but less than 50       45          4
50 but less than 60       55          2
(In a percentage polygon the vertical axis would be defined to
show the percentage of observations per class)
[Figure: Frequency Polygon: Daily High Temperature. Horizontal axis: Class Midpoints; vertical axis: Frequency.]
Graphing Cumulative Frequencies:
The Ogive (Cumulative % Polygon)
Class                  Lower class boundary   Cumulative Percentage
Less than 10                    0                       0
10 but less than 20            10                      15
20 but less than 30            20                      45
30 but less than 40            30                      70
40 but less than 50            40                      90
50 but less than 60            50                     100
[Figure: Ogive: Daily High Temperature. Horizontal axis: Class Boundaries (Not Midpoints); vertical axis: Cumulative Percentage.]
Summary Measures
Describing Data Numerically
• Central Tendency
– Arithmetic Mean
– Median
– Mode
– Geometric Mean
• Quartiles
• Variation
– Range
– Interquartile Range
– Variance
– Standard Deviation
– Coefficient of Variation
• Shape
– Skewness
Measures of Central Tendency
Overview
Central Tendency
• Arithmetic Mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
• Median: midpoint of ranked values
• Mode: most frequently observed value
• Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Arithmetic Mean
• The arithmetic mean (mean) is the most
common measure of central tendency
– For a sample of size n:
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
where $n$ is the sample size and $X_1, \ldots, X_n$ are the observed values
Arithmetic Mean
(continued)
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
– Values 1, 2, 3, 4, 5: Mean = $\dfrac{1 + 2 + 3 + 4 + 5}{5} = \dfrac{15}{5} = 3$
– Values 1, 2, 3, 4, 10: Mean = $\dfrac{1 + 2 + 3 + 4 + 10}{5} = \dfrac{20}{5} = 4$
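A quick Python check of the two examples, using the standard library's `statistics.mean`. Replacing the single value 5 with the outlier 10 moves the mean from 3 to 4.

```python
from statistics import mean

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]    # same data, largest value replaced by an outlier

mean_without = mean(without_outlier)   # 15 / 5
mean_with = mean(with_outlier)         # 20 / 5

print(mean_without, mean_with)
```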
Median
• In an ordered array, the median is the “middle”
number (50% above, 50% below)
[Figure: two number lines from 0 to 10, each labeled Median = 3]
• Not affected by extreme values
Finding the Median
• The location of the median:
Median position = $\dfrac{n+1}{2}$ position in the ordered data
– If the number of values is odd, the median is the middle
number
– If the number of values is even, the median is the
average of the two middle numbers
• Note that $\dfrac{n+1}{2}$ is not the value of the median,
only the position of the median in the ranked data
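The position rule can be sketched in Python as follows. The function name `median_by_position` is illustrative; the standard library's `statistics.median` gives the same results.

```python
import math

def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the ranked data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                     # 1-based position, may end in .5
    lower = ordered[math.floor(position) - 1]  # value at or just below the position
    upper = ordered[math.ceil(position) - 1]   # value at or just above the position
    return (lower + upper) / 2                 # equal when n is odd, averaged when even

print(median_by_position([1, 2, 3, 4, 5]))    # odd n: the middle value
print(median_by_position([1, 2, 3, 4, 10]))   # the outlier does not move the median
print(median_by_position([1, 2, 3, 4]))       # even n: average of the two middle values
```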
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical
(nominal) data
• There may be no mode
• There may be several modes
[Figure: two number lines. On the first (0 to 14), Mode = 9; on the second (0 to 6), No Mode]
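A small Python sketch that returns every mode of a data set. Two things here are assumptions rather than content from the slides: the sample data are made up, and "no mode" is taken to mean that no value occurs more than once.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:               # assumption: every value occurs once -> no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))              # one mode
print(modes([1, 1, 2, 2, 3]))           # several modes
print(modes([0, 1, 2, 3, 4, 5, 6]))     # no mode
print(modes(["red", "blue", "red"]))    # also works for categorical data
```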
Review Example
• Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
[Figure: five houses on a hill, labeled $2,000 K, $500 K, $300 K, $100 K, $100 K]
Review Example:
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum $3,000,000
• Mean: ($3,000,000/5) = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
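The three summary statistics for the house prices can be checked with Python's standard `statistics` module:

```python
from statistics import mean, median, mode

# House prices from the review example, in dollars
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

price_mean = mean(prices)       # sum of values divided by the number of values
price_median = median(prices)   # middle value of the ranked data
price_mode = mode(prices)       # most frequent value

print(price_mean, price_median, price_mode)
```

The single $2,000,000 house pulls the mean well above four of the five prices, while the median stays at the middle house.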
Which measure of location
is the “best”?
• Mean is generally used, unless
extreme values (outliers) exist
• Then median is often used, since
the median is not sensitive to
extreme values.
– Example: Median home prices may
be reported for a region – less
sensitive to outliers
Geometric Mean
• Geometric mean
– Used to measure the rate of change of a variable
over time
$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
• Geometric mean rate of return
– Measures the status of an investment over time
$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$
– Where $R_i$ is the rate of return in time period $i$
Example
An investment of $100,000 declined to $50,000
at the end of year one and rebounded to
$100,000 at end of year two:
X1 = $100,000
X2 = $50,000 (50% decrease)
X3 = $100,000 (100% increase)
The overall two-year return is zero, since it started and
ended at the same level.
Example
(continued)
Use the 1-year returns to compute the arithmetic
mean and the geometric mean:
Arithmetic mean rate of return:
$\bar{X} = \dfrac{(-50\%) + (100\%)}{2} = 25\%$ (misleading result)
Geometric mean rate of return:
$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$
$= [(1 + (-50\%)) \times (1 + (100\%))]^{1/2} - 1$
$= [(.50) \times (2)]^{1/2} - 1 = 1^{1/2} - 1 = 0\%$ (more accurate result)
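The same comparison as a Python sketch. Returns are written as decimals (-0.50 for a 50% decrease), and the function name `geometric_mean_return` is illustrative.

```python
import math

def geometric_mean_return(returns):
    """Geometric mean rate of return: [(1 + R1) x ... x (1 + Rn)]^(1/n) - 1."""
    growth = math.prod(1 + r for r in returns)   # overall growth factor
    return growth ** (1 / len(returns)) - 1

returns = [-0.50, 1.00]    # year one: 50% decrease; year two: 100% increase

arithmetic = sum(returns) / len(returns)      # simple average of the yearly returns
geometric = geometric_mean_return(returns)    # average compound rate per year

print(f"Arithmetic mean return: {arithmetic:.0%}")
print(f"Geometric mean return:  {geometric:.0%}")
```

Only the geometric result agrees with the fact that the investment ended where it started.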
Quartiles
• Quartiles split the ranked data into 4 segments
with an equal number of values per segment
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
• The first quartile, Q1, is the value for which 25% of
the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50%
are larger)
• Only 25% of the observations are greater than the
third quartile
Quartile Formulas
Find a quartile by determining the value in the
appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
Quartiles
• Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked
data, so use the value halfway between the 2nd and 3rd
values: Q1 = 12.5
Q1 and Q3 are measures of noncentral location
Q2 = median, a measure of central tendency
Quartiles
• Example:
(continued)
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = 12.5
Q2 is in the (9+1)/2 = 5th position of the ranked data,
so Q2 = median = 16
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,
so Q3 = 19.5
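The position formulas can be sketched in Python as below. The slides only show positions ending in .5, so the linear interpolation used here for other fractional positions is an assumption; it matches the halfway rule in the examples above. The function name `quartile` is illustrative, and the sketch assumes enough data that the position falls between 1 and n.

```python
import math

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4 in the ranked data."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / 4    # 1-based position, may be fractional
    below = math.floor(position)
    fraction = position - below
    if fraction == 0:
        return ordered[below - 1]            # position lands exactly on a value
    # Assumption: interpolate linearly between the two neighbouring values
    return ordered[below - 1] + fraction * (ordered[below] - ordered[below - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
print(q1, q2, q3)
```

Python's built-in `statistics.quantiles(data, n=4)` uses the same (n+1)-based positions by default (`method='exclusive'`) and returns the same three values for this sample.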