Chapter Title - Mathematical sciences

14 Descriptive Statistics
14.1 Graphical Descriptions of Data
14.2 Variables
14.3 Numerical Summaries
14.4 Measures of Spread
Copyright © 2010 Pearson Education, Inc.
Excursions in Modern Mathematics, 7e: 14.2 - 2
Variable
Before we continue with our discussion of
graphs, we need to discuss briefly the
concept of a variable. In statistical usage, a
variable is any characteristic that varies with
the members of a population. The students
in Dr. Blackbeard’s Stat 101 course (the
population) did not all perform equally on the
exam. Thus, the test score is a variable,
which in this particular case is a whole
number between 0 and 25. In some
instances, such as when the instructor gives
partial credit, a test score may take on a
fractional value, such as 18.5 or 18.25.
Even in these cases, however, the possible
increments for the values of the variable are
given by some minimum amount: a quarter-point, a half-point, whatever. In contrast to
this situation, consider a different variable:
the amount of time each student studied for
the exam. In this case the variable can take
on values that differ by any amount: an
hour, a minute, a second, a tenth of a
second, and so on.
Numerical Variable
A variable that represents a measurable
quantity is called a numerical (or
quantitative) variable. When the difference
between the values of a numerical variable
can be arbitrarily small, we call the variable
continuous (person’s height, weight, foot
size, time it takes to run one mile); when
possible values of the numerical variable
change by minimum increments, the
variable is called discrete (person’s IQ,
SAT score, shoe size, score of a basketball
game).
Categorical Variable
Variables can also describe characteristics
that cannot be measured numerically:
nationality, gender, hair color, and so on.
Variables of this type are called categorical
(or qualitative) variables.
In some ways, categorical variables must be
treated differently from numerical variables–
they cannot, for example, be added,
multiplied, or averaged. In other ways,
categorical variables can be treated much
like discrete numerical variables, particularly
when it comes to graphical descriptions,
such as bar graphs and pictograms.
Example 14.4 Enrollments at Tasmania
State University
Table 14-3 shows undergraduate enrollments
in each of the five schools at Tasmania State
University. A sixth category
(“other”) includes
undeclared students,
interdisciplinary majors,
and so on.
Vertical and horizontal bar graphs displaying
the data in Table 14-3.
When the number of categories is small, as is
the case here, another common way to
describe the relative frequencies of the
categories is by using a pie chart. In a pie
chart the “pie” represents the entire
population (100%), and the “slices” represent
the categories (or classes), with the size
(angle) of each slice being proportional to the
relative frequency of the corresponding
category.
Some relative frequencies, such as 50% and
25%, are very easy to sketch, but how do we
accurately draw the slice corresponding to a
more complicated frequency, say, 32.47%?
Here, a little elementary geometry comes in
handy. Since 100% equals 360°, 1%
corresponds to an angle of 360°/100 = 3.6°. It
follows that the frequency 32.47% is given by
32.47 × 3.6° ≈ 117° (rounded to the nearest
degree, which is generally good enough for
most practical purposes).
This figure shows an accurate pie chart for
the school-enrollment
data given in
Table 14-3.
PIE CHARTS
The general rule in drawing pie charts is
that a slice representing x% is given by an
angle of (3.6)x degrees.
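
As a quick check of this rule, here is a minimal Python sketch. The function name slice_angle is an illustrative choice, not something from the text; the percentages are the ones mentioned in Example 14.4 (50%, 25%, and 32.47%).

```python
def slice_angle(percent):
    """Return the central angle, in degrees, of a pie slice representing `percent`%."""
    # 100% corresponds to 360 degrees, so 1% corresponds to 3.6 degrees.
    return 3.6 * percent


for pct in (50, 25, 32.47):
    # Round to the nearest degree, as the text suggests for practical purposes.
    print(f"{pct}% -> {round(slice_angle(pct))} degrees")
```

Rounding is applied only when displaying the angle, so the unrounded value stays available if more precision is ever needed.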
Example 14.5 Who’s Watching the
Boob Tube Tonight?
According to Nielsen Media Research data,
the percentages of the TV audience watching
TV during prime time (8 P.M. to 11 P.M.),
broken up by age group, are as follows:
adults (18 years and older), 63%; teenagers
(12–17 years), 17%; children (2–11 years),
20%.
The pie chart shows this breakdown of
audience composition
by age group. A pie
chart such as this one
might be used to make
the point that children
and teenagers really do
not watch as much TV
as is generally
believed.
The problem with this conclusion is that
children make up only 15% of the population
at large and teens only 8%. In relative terms,
a higher percentage of teenagers (taken out
of the total teenage population) watch prime-time TV than any other group, with children
second and adults last. Using absolute
percentages can be quite misleading. When
comparing characteristics of a population that
is broken up into categories, it is essential to
take into account the relative sizes of the
various categories.
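
The comparison can be sketched numerically. The audience shares, and the population shares for children (15%) and teenagers (8%), come from the example. The adult population share is an assumption made here: it is taken to be whatever remains of 100%, which treats the three groups as the whole population. Dividing each group's audience share by its population share gives a rough measure of how over- or under-represented that group is in the prime-time audience.

```python
# Share of the prime-time TV audience, in percent (from Example 14.5).
audience_share = {"adults": 63, "teenagers": 17, "children": 20}

# Share of the population at large, in percent. The children and teenager
# figures are from the text; the adult figure is an ASSUMPTION (the remainder).
population_share = {"children": 15, "teenagers": 8}
population_share["adults"] = 100 - sum(population_share.values())

# A ratio above 1 means the group makes up a larger part of the audience
# than it does of the population.
representation = {
    group: audience_share[group] / population_share[group]
    for group in audience_share
}

for group, ratio in sorted(representation.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group}: {ratio:.2f}")
```

Under that assumption the ordering comes out as teenagers first, children second, and adults last, which agrees with the conclusion stated in the example.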
How Many Categories
When it comes to deciding how best to
display graphically the frequencies of a
population, a critical issue is the number of
categories into which the data can fall. When
the number of categories is too big (say, in
the dozens), a bar graph or pictogram can
become muddled and ineffective. This
happens more often than not with numerical
data–numerical variables can take on
infinitely many values, and even when they
don’t, the number of values can be too large
for any reasonable graph.
Example 14.6 2007 SAT Math Scores
The college dreams and aspirations of
millions of high school seniors often ride on
their SAT scores. The SAT consists of three
sections: a math section, a writing section,
and a critical reading section, with the scores
for each section ranging from a minimum of
200 to a maximum of 800 and going up in
increments of 10 points. In 2007, there were
1,494,531 college-bound seniors who took
the SAT. How do we describe the math
section results for this group of students?
We could set up a frequency table (or a bar
graph) with the number of students scoring
each of the possible scores: 200, 210, 220, ...,
790, 800. The problem is that there are 61
different possible scores between 200 and
800, and this number is too large for an
effective bar graph.
In situations such as this one it is customary
to present a more compact picture of the data
by grouping together, or aggregating, sets of
scores into categories called class intervals.
The decision as to how the class intervals are
defined and how many there are will depend
on how much or how little detail is desired,
but as a general rule of thumb, the number of
class intervals should be somewhere
between 5 and 20.
SAT scores are usually
aggregated into 12 class
intervals of essentially the
same size:
200–249,
250–299,
300–349,
...,
700–749,
750–800.
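
The aggregation can be sketched in a few lines of Python. The names scores, intervals, and class_interval are illustrative, and the code assumes the intervals run in steps of 50 from 200–249 up to 700–749, with a slightly wider final interval 750–800, as the listing above suggests.

```python
# Possible section scores: 200 to 800 in increments of 10.
scores = list(range(200, 801, 10))
print(len(scores))  # number of distinct possible scores

# Class intervals 200-249, 250-299, ..., 700-749, plus a final 750-800.
# The last interval is slightly wider so that a score of 800 is included.
intervals = [(low, low + 49) for low in range(200, 750, 50)] + [(750, 800)]
print(len(intervals))  # number of class intervals


def class_interval(score):
    """Return the (low, high) class interval that contains `score`."""
    for low, high in intervals:
        if low <= score <= high:
            return (low, high)
    raise ValueError(f"score {score} is outside the 200-800 range")


print(class_interval(680))
```

Grouping this way replaces the 61 possible scores with 12 categories, which is within the rule of thumb of 5 to 20 class intervals.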
Here is the associated bar graph.
Example 14.7 Stat 101 Test Scores:
Part 3
The process of converting test scores (a
numerical variable) into grades (a categorical
variable) requires setting up class intervals for
the various letter grades. Typically, the
professor has the latitude to decide how to do
this. One standard approach is to use an
absolute grading scale, usually with class
intervals of (almost) equal length for all
grades except F (e.g., A = 90-100%, B = 80-89%, C = 70-79%, D = 60-69%, F = 0-59%).
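
An absolute scale of this kind amounts to a simple lookup. The sketch below uses the cutoffs from the example scale; the function name letter_grade is hypothetical, and the handling of fractional percentages (89.5 counts as a B, since it is below 90) is an assumption, because the scale as written lists only whole-number ranges.

```python
def letter_grade(percent):
    """Map a percentage score (0 to 100) to a letter grade on an absolute scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Check the cutoffs from the highest grade down; the first one met wins.
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"


for pct in (95, 89, 72, 60, 41):
    print(pct, letter_grade(pct))
```

Note that the class interval for F (0 to 59) is far longer than the others, which is why the text says the intervals are of almost equal length for all grades except F.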
Another frequently used approach is to use a
relative grading scale. Here the professor fits
the class intervals for the grades to the
performance of the class on the test, often
using class intervals of varying lengths. Some
people call this “grading on the curve,”
although this terminology is somewhat
misused. To illustrate relative grading in
action, let’s revisit the Stat 101 midterm
scores discussed in Example 14.1.
After looking at the overall class performance,
Dr. Blackbeard chooses to “curve” the test
scores using class intervals of his own
creation.
The grade
distribution in the
Stat 101 midterm
can now be best
seen by means of a
bar graph. The
picture speaks for
itself–this was a
very tough exam!
Histograms
When a numerical variable is continuous, its
possible values can vary by infinitesimally
small increments. As a consequence, there
are no gaps between the class intervals,
and our old way of doing things (using
separated columns or stacks) will no longer
work. In this case we use a variation of a
bar graph called a histogram.
Example 14.8 Starting Salaries of TSU
Graduates
Suppose we want to use a graph to display
the distribution of starting salaries for last
year’s graduating class at Tasmania State
University.
The starting salaries of the N = 3258
graduates range from a low of $40,350 to a
high of $74,800. Based on this range and the
amount of detail we want to show, we must
decide on the length of the class intervals. A
reasonable choice would be to use class
intervals defined in increments of $5000.
Here is the histogram
showing the relative
frequency of each
class interval. As we
can see, a histogram
is very similar to a bar
graph.
Several important distinctions must be made,
however. To begin with, because a histogram
is used for continuous variables, there can be
no gaps between the class intervals, and it
follows, therefore, that the columns of a
histogram must touch each other. Among
other things, this forces us to make an
arbitrary decision as to what happens to a
value that falls exactly on the boundary
between two class intervals.
Should it always belong to the class interval
to the left or to the one to the right? This is
called the endpoint convention. The
superscript “plus” marks in Table 14-6 indicate
how we chose to deal with the endpoint
convention in Fig. 14-11. A starting salary of
exactly $50,000, for example, would be listed
under the 45,000⁺–50,000 class interval
rather than the 50,000⁺–55,000 class interval.
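
This endpoint convention (a boundary value goes to the interval on its left) can be sketched as follows. The $5000 interval length is from the example; the starting point of $40,000 is an assumption, chosen because the lowest salary is $40,350, and the function name salary_interval is hypothetical.

```python
import math

WIDTH = 5000   # length of each class interval, in dollars (from the example)
START = 40000  # left end of the first interval (ASSUMED, not stated in the text)


def salary_interval(salary):
    """Return (low, high) with low < salary <= high, so boundary values go left."""
    # Rounding up puts a salary that sits exactly on a boundary into the
    # interval that ends there, not the one that begins there.
    k = math.ceil((salary - START) / WIDTH)
    high = START + k * WIDTH
    return (high - WIDTH, high)


print(salary_interval(50000))  # exactly on a boundary
print(salary_interval(50001))  # just past the boundary
print(salary_interval(74800))  # the highest salary in the example
```

Choosing the opposite convention would mean rounding down instead and shifting the interval to the right; either works as long as it is applied consistently.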
Use Class Intervals of Equal Length
When creating histograms, we should try,
as much as possible, to define class
intervals of equal length. When the class
intervals are of unequal length, the rules for
creating a histogram are considerably more
complicated, since it is no longer
appropriate to use the heights of the
columns to indicate the frequencies of the
class intervals.
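
The text does not spell out those rules. One standard approach, which is not taken from the text, is to make the area of each column, rather than its height, proportional to the frequency, so that the height becomes the frequency divided by the interval length. The sketch below uses made-up class intervals and frequencies purely for illustration.

```python
# HYPOTHETICAL data: (low, high, relative frequency in percent).
# The last class interval is three times as long as the others.
classes = [
    (40000, 45000, 20),
    (45000, 50000, 30),
    (50000, 55000, 20),
    (55000, 70000, 30),
]

UNIT = 5000  # heights are expressed as "percent per $5000 of salary"


def column_height(low, high, freq):
    """Height that makes the column's area proportional to its frequency."""
    length_in_units = (high - low) / UNIT
    return freq / length_in_units


heights = [column_height(*c) for c in classes]
print(heights)
```

In this made-up data the second and fourth intervals have the same frequency, yet the fourth column is drawn at one third of the height because it is three times as wide.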