Introduction to Probabilities and Statistics Tenth Edition

Chapter 1
Describing Data with Graphs
General Objectives:
Many sets of measurements are samples selected
from larger populations. Other sets constitute the
entire population, as in a national census.
In this chapter, you will learn what a variable is, how
to classify variables into several types, and how
measurements or data are generated.
You will then learn how to use graphs to describe
data sets.
©1998 Brooks/Cole Publishing/ITP
Specific Topics
1. Variables, experimental units, samples and populations, data
2. Univariate and bivariate data
3. Qualitative and quantitative variables—discrete and continuous
4. Data distributions and their shapes
5. Pie charts, bar charts, line charts
6. Dotplots
7. Stem and leaf plots
8. Relative frequency histograms
1.1 Variables and Data
Definition: A variable is a characteristic that changes or varies over
time and/or for different individuals or objects under consideration.
Examples: Weight, stock prices, height, price of gasoline
Definition: An experimental unit is the individual or object on which
a variable is measured.
 A single measurement or data value results when a variable is
actually measured on an experimental unit, e.g., the price of gas
of a particular type at a particular station on a particular day, the
price of a particular stock issue on a particular day at closing.
Definition: A population is the set of all measurements of interest
to the investigator.
Definition: A sample is a subset of measurements selected from
the population of interest.
Example: Select n = 20 volunteers for a medical study taken
from the population of a given city. See Example 1.1 for an
example of experimental unit, variables, sample, and population.
Definition: Univariate data result when a single variable is
measured on a single experimental unit, e.g., record a person’s
height.
Definition: Bivariate data result when two variables are measured
on a single experimental unit, e.g., record a person’s height and
weight.
 Multivariate data result when more than two variables are
measured, e.g., record a person’s height, weight, gender, and
age.
Example 1.1
A set of five students is selected from all undergraduates at a
large university, and measurements are recorded as in Table
1.1. Identify the various elements involved in generating this set
of measurements.
Table 1.1
Student   GPA   Gender   Year   Major         Current Number of Units Enrolled
1         2.0   F        Fr     Psychology    16
2         2.3   F        So     Mathematics   15
3         2.9   M        So     English       17
4         2.7   M        Fr     English       15
5         2.6   F        Jr     Business      14
Solution
There are several variables in this example. The experimental
unit on which the variables are measured is a particular
undergraduate student on the campus. Five variables are
measured for each student: grade point average (GPA), gender,
year in college, major, and current number of units enrolled.
Each of these characteristics varies from student to student.
If we consider the GPAs of all students at this university to be
the population of interest, the five GPAs represent a sample
from this population. If the GPA of each undergraduate student
at the university had been measured, we would have generated
the entire population of measurements for this variable.
The second variable measured on the students is gender, which
can fall into one of two categories—male and female. It is not a
numerically valued variable and hence is somewhat different
from GPA. The population, if it could be enumerated, would
consist of a set of Ms and Fs, one for each student at the
university.
Similarly, the third and fourth variables, year and major,
generate nonnumerical data. Year has four categories
(Fr, So, Jr, Sr), and major has one category for each
undergraduate program major on campus. The last variable,
current number of units enrolled, is numerically valued,
generating a set of numbers rather than a set of qualities or
characteristics.
Although we have discussed each variable individually,
remember that we have measured each of these five variables
on a single experimental unit: the student. Therefore, in the
example, a “measurement” really consists of five observations,
one for each of the five measured variables.
For example, the measurement taken on student 2 produces
this observation:
(2.3, F, So, Mathematics, 15)
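One way to picture a multivariate measurement is as a single record per experimental unit. The sketch below stores the five students of Table 1.1 this way; the class name StudentRecord and its field names are illustrative choices, not part of the text.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class StudentRecord:
    """One measurement: five variables observed on one student."""
    gpa: float      # quantitative
    gender: str     # qualitative
    year: str       # qualitative
    major: str      # qualitative
    units: int      # quantitative


# Keys are the student numbers from Table 1.1.
table_1_1 = {
    1: StudentRecord(2.0, "F", "Fr", "Psychology", 16),
    2: StudentRecord(2.3, "F", "So", "Mathematics", 15),
    3: StudentRecord(2.9, "M", "So", "English", 17),
    4: StudentRecord(2.7, "M", "Fr", "English", 15),
    5: StudentRecord(2.6, "F", "Jr", "Business", 14),
}

# The measurement taken on student 2, as one tuple of five observations.
student_2 = astuple(table_1_1[2])
print(student_2)

# The sample of GPAs is the univariate data set for one variable.
gpa_sample = [record.gpa for record in table_1_1.values()]
print(gpa_sample)
```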
1.2 Types of Variables
Definition: Qualitative variables measure a quality or characteristic
on each experimental unit.
Examples: Eye color, state of residence, gender, degree
objective (BS/BA, MA, MBA, Ph.D.)
Definition: Quantitative variables measure a numerical quantity
or amount on each experimental unit.
Examples: Height, weight, systolic blood pressure, number of
weeds/square meter in a wheat field, number of defective items
in a carton of 20
 Qualitative variables produce data that can be categorized
according to similarities or differences in kind; they are often
called categorical data, e.g., eye color, state of residence.
Definition: A discrete variable can assume only a finite or countable
number of values, e.g., number of weeds, number of defective items.
Definition: A continuous variable can assume the infinitely many
values corresponding to the points on a line interval, e.g.,
height, weight.
 See Example 1.2 for examples of qualitative versus
quantitative variables.
 Also, see Figure 1.1 for a diagram of the types of data we have
defined.
Example 1.2
Identify each of the following variables as qualitative or
quantitative:
1. The most frequent use of your microwave oven (reheating,
defrosting, warming, other)
2. The number of consumers who refuse to answer a
telephone survey
3. The door chosen by a mouse in a maze experiment
(A, B, or C)
4. The winning time for a horse running in the Kentucky Derby
5. The number of children in the fifth-grade class who are
reading at or above grade level.
Figure 1.1

1.3 Graphs for Categorical Data


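A statistical table summarizes the data and can be used to display
them graphically as a data distribution.
For qualitative data, the table lists the categories along with one
of three measures of how often each category occurs:
– the frequency, or number of measurements in the category
– the relative frequency, or proportion (frequency divided by the
total number of measurements, n)
– the percentage (100 × relative frequency)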
• A pie chart is the familiar circular graph that shows how
the measurements are distributed among the categories.
• A bar chart shows the same distribution of measurements in categories, with the height of the bar measuring
how often a particular category was observed.
• A bar chart in which the bars are ordered from largest to
smallest is called a Pareto chart.

See Example 1.3 for an example of the use of pie and bar charts.
See Example 1.4 for an example of a Pareto chart, where the bars
are ordered from largest to smallest.
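The three measures for categorical data, and the ordering used in a Pareto chart, can be sketched in a few lines. The example below reuses the majors of the five students in Table 1.1 as its data; the variable names are illustrative.

```python
from collections import Counter

# Majors of the five students in Table 1.1.
majors = ["Psychology", "Mathematics", "English", "English", "Business"]

counts = Counter(majors)
n = len(majors)

# most_common() lists categories from largest to smallest frequency,
# which is the bar order of a Pareto chart.
pareto_rows = [
    (category, freq, freq / n, 100 * freq / n)
    for category, freq in counts.most_common()
]

print(f"{'Category':<12} {'Freq':>4} {'Rel. freq':>9} {'Percent':>7}")
for category, freq, rel, pct in pareto_rows:
    print(f"{category:<12} {freq:>4} {rel:>9.2f} {pct:>6.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any such table.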

Figure 1.2
Figure 1.3
1.4 Graphs for Quantitative Data






Sometimes information is collected for a quantitative variable
measured on different segments of the population, or for
different categories of classification.
If the variable can take only a finite or countable number of
values, it is a discrete variable.
A variable that can assume an infinite number of values
corresponding to points on a line interval is called continuous.
The pie chart displays how the total quantity is distributed
among the categories.
The bar chart uses the height of the bar to display the amount in
a particular category.
See Example 1.5 for examples of both pie and bar charts.
Example 1.5
The amount of money expended in fiscal year 1995 by the U.S.
Department of Defense in various categories is shown in Table
1.6. Construct both a pie chart and a bar chart to describe the
data. Compare the two forms of presentation.
Table 1.6
Category                     Amount (in billions)
Military personnel           $70.8
Operation and maintenance     90.9
Procurement                   55.0
Research and development      34.7
Military construction          6.8
Total                        $258.2
Solution
Two variables are being measured: the category of expenditure
(qualitative) and the amount of the expenditure (quantitative).
The bar chart in Figure 1.6 displays the categories on the
horizontal axis and the amounts on the vertical axis. For the pie
chart in Figure 1.5, each “pie slice” represents the proportion of
the total expenditures ($258.2 billion) corresponding to its
particular category. For example, for the research and development
category, the angle of the sector is
(34.7 / 258.2) × 360° = 48.4°
Figure 1.5
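The sector-angle arithmetic for the research and development slice can be checked with a short sketch. The function name sector_angle is an illustrative choice; the two numbers come from Table 1.6.

```python
def sector_angle(amount, total):
    """Angle in degrees of the pie slice for one category."""
    return amount / total * 360


# Research and development: $34.7 billion of the $258.2 billion total.
angle = sector_angle(34.7, 258.2)
print(f"{angle:.1f} degrees")
```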
Figure 1.6
Line charts:
 A quantitative variable recorded over time at equally spaced
intervals produces a time series.
 Time series are most effectively presented on a line chart with
time on the x axis.
 The aim is to discern a pattern or trend that is likely to
continue into the future.
 See Example 1.6 for an example of a line chart.

Dotplots: A dotplot plots the measurements as points on the x axis,
stacking points that share the same value. See Figure 1.8
for a dotplot.
Figure 1.8 Character Dotplots
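A character dotplot of the kind Minitab prints can be imitated for small integer data sets. The sketch below is not Minitab's algorithm; it is a minimal illustration that assumes non-negative integer values, and it uses the units-enrolled column of Table 1.1 as data.

```python
from collections import Counter


def character_dotplot(values):
    """Return text lines: stacked dots above an integer axis."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    ticks = range(lo, hi + 1)          # one axis position per integer
    cell = len(str(hi)) + 1            # column width for each position
    lines = []
    # Build rows from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            ("." if counts[v] >= level else " ").rjust(cell) for v in ticks
        )
        lines.append(row.rstrip())
    lines.append("".join(str(v).rjust(cell) for v in ticks))
    return lines


# Current number of units enrolled, from Table 1.1.
units = [16, 15, 17, 15, 14]
dotplot_lines = character_dotplot(units)
print("\n".join(dotplot_lines))
```

The two students enrolled in 15 units appear as two stacked dots above 15.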
Stem and leaf plots:
 This plot presents a graphical display of the data using the
actual numerical values of each data point.
Constructing a Stem and Leaf Plot:
1. Divide each measurement into two parts: the stem and
the leaf.
2. List the stems in a column, with a vertical line to their right.
3. For each measurement, record the leaf portion in the same
row as its matching stem.
4. Order the leaves from lowest to highest in each stem.
5. Provide a key to your stem and leaf coding so that the
reader can recreate the actual measurements if necessary.

See Examples 1.7 and 1.8 for examples of the construction of
stem and leaf plots.
Example 1.7
Table 1.8 lists the prices (in dollars) of 19 different brands of
walking shoes. Construct a stem and leaf plot to display the
distribution of the data.
Table 1.8
90 70 70 70 75 70
65 68 60 74 70 95
75 70 68 65 40 65
70
Solution
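The five construction steps can be applied to the prices in Table 1.8 as in the sketch below. It assumes the tens digit is the stem and the ones digit is the leaf, which is one reasonable coding for two-digit prices; the function name is illustrative.

```python
from collections import defaultdict

# Prices (in dollars) of the 19 brands of walking shoes in Table 1.8.
prices = [90, 70, 70, 70, 75, 70,
          65, 68, 60, 74, 70, 95,
          75, 70, 68, 65, 40, 65,
          70]


def stem_and_leaf(values):
    """Return one text row per stem, leaves ordered low to high."""
    leaves = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)   # step 1: split into stem and leaf
        leaves[stem].append(leaf)        # step 3: record leaf beside its stem
    rows = []
    # Step 2: list every stem in the range, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        ordered = sorted(leaves.get(stem, []))   # step 4: order the leaves
        row = " ".join(str(leaf) for leaf in ordered)
        rows.append(f"{stem} | {row}".rstrip())
    return rows


plot_rows = stem_and_leaf(prices)
print("\n".join(plot_rows))
print("Key: 6 | 5 represents a price of 65 dollars")   # step 5
```

Empty stems (50s and 80s) are kept so that the gaps in the distribution stay visible.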
Interpreting Graphs with a Critical Eye:
 What to look for as you describe the data:
- scales
- location
- shape
- outliers

Distributions are often described by their shapes:
- symmetric
- skewed to the right (long tail goes right)
- skewed to the left (long tail goes left)
- unimodal, bimodal, multimodal (one peak, two peaks,
many peaks)

See Examples 1.9 and 1.10 for examples of describing
distributions in terms of their locations and shapes.
Example 1.9
Examine the three dotplots generated by Minitab and shown in
Figure 1.11. Describe these distributions in terms of their
locations and shapes.
Figure 1.11 Character Dotplots
1.5 Relative Frequency Histograms
Definition: A relative frequency histogram for a quantitative data
set is a bar graph in which the height of the bar represents the
proportion or relative frequency of occurrence for a particular
class or subinterval of the variable being measured.

The class or subintervals are plotted along the x axis.

See the unnumbered example on pages 25–26 for an example
of the construction of a relative frequency histogram.

Also see Example 1.11.
Example 1.11
Twenty-five households are polled in a marketing survey, and
Table 1.12 lists the numbers of quarts of milk purchased during
a particular week. Construct a relative frequency histogram to
describe the data.
Table 1.12
0 3 5 4 3
2 1 3 1 2
1 1 2 0 1
4 3 2 2 2
2 2 2 3 4
Solution
The variable being measured is “number of quarts of milk,”
which is a discrete variable that takes on only integer values.
In this case, it is simplest to choose the classes or intervals as
the integer values over the range of observed values: 0, 1, 2,
3, 4, and 5.
Table 1.13 shows the classes and their corresponding
frequencies and relative frequencies. The relative frequency
histogram, generated using Minitab, is shown in Figure 1.14.
Table 1.13
Number of Quarts   Frequency   Relative Frequency
0                  2           .08
1                  5           .20
2                  9           .36
3                  5           .20
4                  3           .12
5                  1           .04
Figure 1.14
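The frequencies and relative frequencies for the milk data can be reproduced with a short tally. This is only a sketch of the arithmetic, not the Minitab procedure; the 25 values are those of Table 1.12 and the variable names are illustrative.

```python
from collections import Counter

# Quarts of milk purchased by the 25 households in Table 1.12.
quarts = [0, 3, 5, 4, 3, 2, 1, 3, 1, 2, 1, 1, 2,
          0, 1, 4, 3, 2, 2, 2, 2, 2, 2, 3, 4]

counts = Counter(quarts)
n = len(quarts)

# One class per integer value over the observed range.
milk_table = [
    (value, counts[value], counts[value] / n)
    for value in range(min(quarts), max(quarts) + 1)
]

print("Quarts  Frequency  Relative frequency")
for value, freq, rel in milk_table:
    print(f"{value:>6}  {freq:>9}  {rel:>18.2f}")
```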

Process: Divide the interval from the smallest to the largest
measurements into an arbitrary number of subintervals or
classes of equal length.

Method of left inclusion: Include the left class boundary point
but not the right boundary point in the class.

Relative frequency can give us further information:
- the proportion of measurements that fall in a particular class
or group of classes
- the probability that a measurement drawn at random from a
set will fall in a particular class or group of classes

Different samples from the same population will produce
different histograms.
Constructing a relative frequency histogram:
1. Choose the number of classes, usually between 5 and 10.
2. Calculate the approximate class width by dividing the
difference between the largest and smallest values by the
number of classes.
3. Round the approximate class width up to a convenient
number.
4. If the data are discrete, assign one or more integers to a class.
5. Locate the class boundaries.
6. Construct a statistical table containing the classes, their
boundaries, and their relative frequencies.
7. Construct the histogram like a bar graph.
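The steps above, together with the method of left inclusion, can be sketched as follows. The ten data values are hypothetical, invented only for illustration, and the function names, starting point, and rounded width are assumptions of this example rather than anything prescribed by the text.

```python
def approximate_width(data, num_classes):
    """Step 2: (largest - smallest) / number of classes."""
    return (max(data) - min(data)) / num_classes


def relative_frequency_table(data, start, width, num_classes):
    """Tabulate classes [left, right) using the method of left inclusion."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - start) // width)   # left boundary belongs to the class
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[index] += 1
    n = len(data)
    return [
        (start + i * width, start + (i + 1) * width, count, count / n)
        for i, count in enumerate(counts)
    ]


# Hypothetical measurements, invented for illustration only.
sample = [12, 15, 15, 18, 20, 21, 24, 25, 29, 33]

print(approximate_width(sample, 5))   # about 4.2, rounded up to 5 below
hist_table = relative_frequency_table(sample, start=10, width=5, num_classes=5)
for left, right, freq, rel in hist_table:
    print(f"{left} to < {right}: frequency {freq}, relative frequency {rel:.2f}")
```

Because the left boundary is included and the right is not, the value 15 is counted in the class from 15 to under 20, and 25 in the class from 25 to under 30.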
Key Concepts
I. How Data Are Generated
1. Experimental units, variables, measurements
2. Samples and populations
3. Univariate, bivariate, and multivariate data
II. Types of Variables
1. Qualitative or categorical
2. Quantitative
a. Discrete
b. Continuous
III. Graphs for Univariate Data Distributions
1. Qualitative or categorical data
a. Pie charts
b. Bar charts
2. Quantitative data
a. Pie and bar charts
b. Line charts
c. Dotplots
d. Stem and leaf plots
e. Relative frequency histograms
3. Describing data distributions
a. Shapes—symmetric, skewed left, skewed right,
unimodal, bimodal
b. Proportion of measurements in certain intervals
c. Outliers
Introduction to MINITAB


Minitab 12 for Windows 95
Use the desktop icon or the start button to start Minitab.
See Figure 1.15.
Figure 1.15




The main Minitab screen contains two windows: the Data
window and the Session window.
Click anywhere in a window to make it active for entering data or
typing commands.
Select File Worksheet to retrieve a “worksheet.”
To close the program, select File Exit.
Auto Rental Example


Twenty auto rental locations at airports throughout the U.S.
were randomly selected and the daily rental cost of a midsize
car at each location was determined.
The data were as follows (assume only integer values):
22, 39, 33, 42, 49, 19, 44, 47, 50, 42,
28, 36, 25, 26, 31, 29, 19, 29, 26, 31 (all in dollars)
Questions:
a) What is the probability that a midsize car at a randomly chosen
rental site in the true population rents for between $25 and
$29 daily?
b) What is the probability that a midsize car at a site rents for
$30 or more?
c) $29 or less?
Use a graphic representation to support your conclusions.
Describe this distribution:
Answers (estimated by the sample relative frequencies, n = 20):
a) probability = 6/20 = .30
b) probability = 11/20 = .55
c) probability = 9/20 = .45
[Histogram of daily rental cost drawn with X marks: frequency from 1 to 5 on the vertical axis; classes <19, 20–24, 25–29, 30–34, 35–39, 40–44, 45–49, 50> on the horizontal axis]
The distribution is skewed slightly right; it is unimodal, nearly
symmetric, and nearly bimodal.
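Questions a) to c) ask for probabilities that can be estimated by sample relative frequencies. The sketch below tallies them directly from the twenty rental costs listed above; the helper name relative_frequency is an illustrative choice.

```python
# Daily rental costs (in dollars) at the twenty sampled locations.
costs = [22, 39, 33, 42, 49, 19, 44, 47, 50, 42,
         28, 36, 25, 26, 31, 29, 19, 29, 26, 31]


def relative_frequency(data, condition):
    """Proportion of the measurements that satisfy the condition."""
    return sum(1 for x in data if condition(x)) / len(data)


p_25_to_29 = relative_frequency(costs, lambda c: 25 <= c <= 29)
p_30_or_more = relative_frequency(costs, lambda c: c >= 30)
p_29_or_less = relative_frequency(costs, lambda c: c <= 29)

print(f"a) {p_25_to_29:.2f}  b) {p_30_or_more:.2f}  c) {p_29_or_less:.2f}")
```

Because the costs are integers, events b) and c) are complementary, so their estimates must sum to 1.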
Questions:
Moving what one data point would make the distribution bimodal
and symmetric? Where would the point need to be moved? Note
that symmetry does not require the histogram to be unimodal.