14.3 Numerical Summaries of Data


§ 14.3 Numerical Summaries of Data
Numerical Summaries of a Data Set
• In the last section we looked at ways to graphically represent a data set; today we will look at numerical ways to summarize similar information.
• There are two major types of numerical summary:
1. Measures of location.
2. Measures of spread.
Examples: the average/mean is a measure of location; the range is a measure of spread.
The Average / Mean
• The average or mean of a data set of size N is found by adding the numbers and dividing by N.
• Or more formally, if the data set is {x1, x2, x3, . . . , xN}, then the mean is given by:
$$\frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$$
The Average / Mean
• What about when we are given a frequency table?
• Let’s look at the test scores from yesterday:
Score:       4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency:   1   1   2   6  10  16  13   9   1   2   1   8   4   1
The Average / Mean From a Frequency Table
Data:        x1  x2  . . .  xk
Frequency:   f1  f2  . . .  fk
• Step 1: Calculate the total of the data.
Total = x1 * f1 + x2 * f2 + x3 * f3 + . . . + xk * fk
• Step 2: Calculate N.
N = f1 + f2 + f3 + . . . + fk
• Step 3: Calculate the average.
Entering Data and Finding the Mean on the TI-83:
1. Hit [Stat]
2. Select “1: Edit…”
3. Enter data into L1. If you are working from a frequency table enter the corresponding frequencies into L2.
4. Go to the “List” menu ([2nd], [Stat])
5. Select “3: mean(”
6. You should now be on the ‘main’ screen. Proceed as follows:
(a) If you are working from just a list of data, type “L1” ([2nd], [1]), close the parentheses and hit [Enter].
(b) If you are working from a freq. table type “L1” followed by “,” and “L2” ([2nd], [2]). Then close the parentheses and hit [Enter].
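The three frequency-table steps (total, N, average) can be sketched in Python for the test-score table. This is only an illustration; the variable names are made up here and are not part of the lecture.

```python
# Test scores and their frequencies, copied from the frequency table.
scores      = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]
frequencies = [1,  1,  2,  6, 10, 16, 13,  9,  1,  2,  1,  8,  4,  1]

# Step 1: total of the data (each score counted as often as it occurs).
total = sum(x * f for x, f in zip(scores, frequencies))

# Step 2: N is the sum of the frequencies.
n = sum(frequencies)

# Step 3: the average is the total divided by N.
average = total / n

print(total, n, round(average, 2))  # 3496 75 46.61
```

The result, about 46.61, is the mean used later when the deviations from the mean are tabulated.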
Example: Average Salary
• The average salary at a local computer manufacturer with 50 employees is $42,000.
• The owner draws a yearly salary of $800,000.
• What is the average salary of the other 49 employees?
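One way to work this example is to recover the total payroll from the average, remove the owner's salary, and average what is left. The sketch below assumes the owner is one of the 50 employees, as the wording of the example suggests.

```python
employees = 50
average_salary = 42_000
owner_salary = 800_000

# Average = Total / N, so Total = Average * N.
total_payroll = employees * average_salary        # 2,100,000

# Remove the owner's salary, then average over the remaining 49 people.
others_total = total_payroll - owner_salary       # 1,300,000
others_average = others_total / (employees - 1)

print(round(others_average, 2))  # 26530.61
```

A single very large salary pulls the mean of all 50 up to $42,000 even though the other 49 average only about $26,531.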
Example: 105 Exam Scores
• Suppose you have averaged a 132 out of 150 on the first 3 exams in Math 105. What score would you need on the fourth exam to have an average of 135?
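The same "Total = Average * N" idea answers this question. The sketch below assumes all four exams count equally and are each out of 150 points.

```python
exams_so_far = 3
current_average = 132
target_average = 135

# Points earned so far, and points needed after four exams.
points_so_far = exams_so_far * current_average           # 396
points_needed = (exams_so_far + 1) * target_average      # 540

# The fourth exam has to make up the difference.
fourth_exam = points_needed - points_so_far

print(fourth_exam)  # 144
```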
Percentiles
• The pth percentile of a data set is the value such that p percent of the numbers fall at or below the value.
• The rest of the data falls at or above the value.
• We will call p percent of N the locator, and write it as L.
Example: Height
Sorting Data on the TI-83:
1. Enter data into L1 as before.
2. Hit [Stat]
3. Select “2: SortA(”
4. You should now be on the ‘main’ screen. Type L1 ([2nd], [1]).
5. Close the parentheses and hit [Enter].
Finding the pth Percentile
• Step 1: Sort the original data set by size.
(Suppose {d1, d2, d3, . . . , dN} is the sorted set.)
• Step 2: Compute the value of the locator.
L = (p / 100)(N)
• Step 3: The pth percentile is:
(a) The average of dL and dL+1 if L is a whole number.
(b) dL+ if L is not a whole number, where L+ is L rounded up.
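The locator rule above can be sketched as a small Python function. The function name and the sample data are made up for illustration, and the sketch only handles 0 < p < 100. Other textbooks and software use slightly different percentile conventions, so results may not match a calculator exactly.

```python
import math

def percentile(data, p):
    """Return the p-th percentile using the locator rule (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("this sketch only handles 0 < p < 100")
    d = sorted(data)                      # Step 1: sort the data
    n = len(d)
    locator = p * n / 100                 # Step 2: L = (p/100) * N
    if locator.is_integer():              # Step 3(a): L is a whole number
        k = int(locator)
        return (d[k - 1] + d[k]) / 2      # average of d_L and d_(L+1)
    k = math.ceil(locator)                # Step 3(b): round L up
    return d[k - 1]                       # lists are 0-indexed, hence k - 1

data = [3, 1, 4, 1, 5, 9, 2, 6]           # made-up data, N = 8
print(percentile(data, 25))  # 1.5
print(percentile(data, 50))  # 3.5
print(percentile(data, 90))  # 9
```

With N = 8, the 25th and 50th percentiles have whole-number locators (2 and 4), so they are averages of two neighbors; the 90th has locator 7.2, which rounds up to the 8th value.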
Percentiles: The Median and Quartiles
• The 50th percentile, called the median, is the percentile that is most commonly used. The median will be written M.
• The other two commonly used percentiles are the quartiles:
The first quartile, written as Q1, is the 25th percentile.
The third quartile, denoted Q3, is the 75th percentile.
Example: Let’s examine the test scores again. . .
Score:       4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency:   1   1   2   6  10  16  13   9   1   2   1   8   4   1
Find the quartiles and the median.
The Five-Number Summary
• One way to give a nice profile of a data set is the “five-number summary,” which consists of:
1. The lowest value, called the Min.
2. The first quartile, Q1.
3. The median, M.
4. The third quartile, Q3.
5. The highest value, called the Max.
Example: The Five-Number Summary for our test score example would look like this:
Score:       4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency:   1   1   2   6  10  16  13   9   1   2   1   8   4   1
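As a rough check, the five-number summary for the test scores can be computed by expanding the frequency table into a list of 75 scores and applying the locator rule for percentiles. The helper names below are illustrative, not from the lecture.

```python
import math

scores      = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]
frequencies = [1,  1,  2,  6, 10, 16, 13,  9,  1,  2,  1,  8,  4,  1]

# Expand the table: each score appears as many times as its frequency.
data = sorted(x for x, f in zip(scores, frequencies) for _ in range(f))

def locate(d, p):
    """p-th percentile of sorted list d via the locator L = (p/100) * N."""
    locator = p * len(d) / 100
    if locator.is_integer():
        k = int(locator)
        return (d[k - 1] + d[k]) / 2
    return d[math.ceil(locator) - 1]

def five_number_summary(d):
    """Return (Min, Q1, M, Q3, Max) for sorted list d."""
    return (d[0], locate(d, 25), locate(d, 50), locate(d, 75), d[-1])

print(five_number_summary(data))  # (4, 36, 44, 48, 96)
```

Here N = 75, so the locators are 18.75, 37.5 and 56.25; none is a whole number, so each is rounded up and Q1, M and Q3 are the 19th, 38th and 57th sorted scores.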
The Five-Number Summary: Box Plots
• We can also represent the Five-Number Summary graphically in what is called a box plot or a box-and-whiskers plot.
[Box plot diagram, labeled from left to right: Min, Q1, M, Q3, Max]
Example: Here is the box plot for our test score example:
Score:       4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency:   1   1   2   6  10  16  13   9   1   2   1   8   4   1
[Box plot: Min = 4, Q1 = 36, M = 44, Q3 = 48, Max = 96]
§ 14.4 Measures of Spread
Example - Find the average
and median of the following
data sets:
• Set 1 = {45, 46, 47, 48, 49, 51, 52, 53,
54, 55}
• Set 2 = {1, 12, 20, 31, 41, 59, 70, 78,
89, 99}
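A quick way to check this example is with Python's standard `statistics` module. For a data set of even size its `median` averages the two middle values, which agrees with the locator rule here (L = 5 is a whole number).

```python
from statistics import mean, median

set1 = [45, 46, 47, 48, 49, 51, 52, 53, 54, 55]
set2 = [1, 12, 20, 31, 41, 59, 70, 78, 89, 99]

# Both sets have the same mean and the same median, despite looking very different.
for name, data in (("Set 1", set1), ("Set 2", set2)):
    print(name, "mean:", mean(data), "median:", median(data))
```

Both sets have mean 50 and median 50, so measures of location alone cannot tell them apart; that is the motivation for measures of spread.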
The Range
• One way to measure the spread of data is to examine the range, given by
R = Max - Min
• The problem with using the range is that outliers can severely affect it.
Example: Looking again at our ‘test score’ example. . .
Score:       4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency:   1   1   2   6  10  16  13   9   1   2   1   8   4   1
We see that the range with the outliers (4 and 96) would be
R = 96 - 4 = 92.
However, without those pieces of data we would have
R = 76 - 24 = 52.
The Interquartile Range
• In order to eliminate the problems caused by outliers, we could make use of the interquartile range, the difference between the third and first quartile:
IQR = Q3 - Q1
• This measure tells us how spread out the middle 50% of the data is.
Example: Your instructor didn’t feel like making a different
example. . .
Score:       4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency:   1   1   2   6  10  16  13   9   1   2   1   8   4   1
The IQR for this set of data is:
IQR = Q3 - Q1 = 48 - 36 = 12
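The range and the IQR for the test scores can be compared side by side in a few lines of Python. The values are taken from the examples above; the variable names are illustrative.

```python
# Five-number summary of the test scores, from the earlier examples.
minimum, q1, q3, maximum = 4, 36, 48, 96

data_range = maximum - minimum      # R = Max - Min, includes the outliers 4 and 96
trimmed_range = 76 - 24             # range after dropping the outliers 4 and 96
iqr = q3 - q1                       # IQR = Q3 - Q1

print(data_range, trimmed_range, iqr)  # 92 52 12
```

Dropping just two scores changes the range from 92 to 52, while the IQR depends only on the quartiles and is not affected by how extreme the lowest and highest scores are.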
The Standard Deviation
• The idea: Measure how spread out your data set is by examining how far each piece of information is from some fixed reference point.
• The reference point we will use is the mean (average).
The Standard Deviation
• We could try to average the Deviations from the Mean:
(Data value - Mean)
Example: Once again, the test score data. . .
Score (x)   x - 46.61   Frequency
    4        -42.61         1
   24        -22.61         1
   28        -18.61         2
   32        -14.61         6
   36        -10.61        10
   40         -6.61        16
   44         -2.61        13
   48          1.39         9
   56          9.39         1
   60         13.39         2
   64         17.39         1
   72         25.39         8
   76         29.39         4
   96         49.39         1
The Standard Deviation
• We could try to average the Deviations from the Mean:
(Data value - Mean)
• However, negative deviations and positive deviations will cancel each other out. In fact (assuming we don’t round off any of our figures) the average of the deviations from the mean will always be 0!
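The cancellation can be checked on the test scores. The sketch below uses Python's `fractions.Fraction` so that the mean (3496/75) is kept exact rather than rounded to 46.61; with rounded figures the total would be close to 0 but not exactly 0.

```python
from fractions import Fraction

scores      = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]
frequencies = [1,  1,  2,  6, 10, 16, 13,  9,  1,  2,  1,  8,  4,  1]

n = sum(frequencies)
mean = Fraction(sum(x * f for x, f in zip(scores, frequencies)), n)  # exactly 3496/75

# Add up every deviation (x - mean), counting each score as often as it occurs.
total_deviation = sum((x - mean) * f for x, f in zip(scores, frequencies))

print(total_deviation)  # 0
```

Since the deviations add up to 0, their average is 0 as well, which is why the plain average of deviations is useless as a measure of spread.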
The Standard Deviation
• What would happen if we squared the deviations from the mean?
• The squared deviations are always non-negative, so there would be no canceling.
• The average of these squared deviations is called the variance, V.
The Standard Deviation
• Unfortunately, there is a problem with using the variance as well: the units of measure.
For instance, if we were studying people’s height in inches (in), the variance would be in units of in².
The Standard Deviation
• The solution to our dilemma is simple: we will just take the square root of the variance to get what is called the standard deviation, σ.
Finding The Standard Deviation
• Step 1: Find the average/mean of the data set. Call it A.
• Step 2: For each number x in the data set find the deviation from the mean, x - A.
• Step 3: Square each of the deviations found in Step 2.
• Step 4: Find the average of the squared deviations found in Step 3. This is the variance, V.
• Step 5: Take the square root of the variance. This is the standard deviation, σ.
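The five steps can be sketched as a short Python function and tried on Set 1 and Set 2 from the start of this section. The function name is illustrative. Note that the steps divide by N (the "population" standard deviation), not by N - 1.

```python
import math

def standard_deviation(data):
    """Standard deviation following Steps 1-5 (divides by N, not N - 1)."""
    n = len(data)
    a = sum(data) / n                          # Step 1: the mean, A
    deviations = [x - a for x in data]         # Step 2: deviations x - A
    squared = [d ** 2 for d in deviations]     # Step 3: squared deviations
    variance = sum(squared) / n                # Step 4: the variance, V
    return math.sqrt(variance)                 # Step 5: the standard deviation

set1 = [45, 46, 47, 48, 49, 51, 52, 53, 54, 55]
set2 = [1, 12, 20, 31, 41, 59, 70, 78, 89, 99]

print(round(standard_deviation(set1), 2))  # 3.32
print(round(standard_deviation(set2), 2))  # 32.21
```

Both sets have mean 50, but the standard deviation of Set 2 is roughly ten times that of Set 1, which matches how much more spread out it looks.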
Finding The Standard Deviation
• Another way to find the Standard Deviation by hand is to use the following formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - A)^2}{N}}$$
Finding The Standard Deviation
Finding all of the information from 14.2-14.3 on the TI-83:
1. Enter data as shown previously. Quit to the main screen.
2. Hit [Stat]
3. Move right to the “CALC” menu.
4. Select “1-Var Stats”.
5. Now on the main screen, type “L1”. (If you are using data from a frequency table also type “,” and “L2”.) Hit [Enter].
6. Interpret the information as follows:
x̄ is the mean/average, A;
σx is the Standard Deviation;
n is the size of your data set;
If you arrow down the Min, Max, Median and
Example:
Find the standard deviation for the following data set.
{1, 6, 14, 19}
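One possible worked solution, following the five steps for finding the standard deviation (mean, deviations, squared deviations, variance, square root) and dividing by N = 4:

```latex
\begin{align*}
A &= \frac{1 + 6 + 14 + 19}{4} = \frac{40}{4} = 10 \\
\text{deviations: } & -9,\ -4,\ 4,\ 9 \\
\text{squared deviations: } & 81,\ 16,\ 16,\ 81 \\
V &= \frac{81 + 16 + 16 + 81}{4} = \frac{194}{4} = 48.5 \\
\sigma &= \sqrt{48.5} \approx 6.96
\end{align*}
```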