Transcript Chapter 1.1

CHAPTER 1.1
Analyzing Categorical Data
INDIVIDUALS AND VARIABLES
Individuals are the objects described by a set of
data. Individuals may be people, animals, or
things.
A variable is any characteristic of an
individual. A variable can take different values
for different individuals.

Categorical Variable: Places an individual into one
of several groups or categories. It does NOT make
sense to find the “average” of the variable.
Quantitative Variable: Takes numerical values for
which it makes sense to find an average.
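The distinction above can be sketched in a few lines of Python. The student records and the helper name variable_type are hypothetical, made up for illustration. The rule used here (values stored as numbers count as quantitative) is only a rough assumption: numeric codes such as zip codes are still categorical, because averaging them makes no sense.

```python
from statistics import mean

# Hypothetical data: each dictionary describes one individual (a student).
students = [
    {"gender": "F", "height_cm": 160.0, "hand": "right"},
    {"gender": "M", "height_cm": 175.0, "hand": "left"},
    {"gender": "F", "height_cm": 168.0, "hand": "right"},
]

def variable_type(values):
    """Rule of thumb only: values stored as numbers are called quantitative."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"

# Classify each variable by looking at its values across all individuals.
types = {}
for name in students[0]:
    column = [s[name] for s in students]
    types[name] = variable_type(column)

# An average only makes sense for the quantitative variable.
average_height = mean(s["height_cm"] for s in students)

print(types)
print(average_height)
```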



In practice, any set of data is accompanied by
background information that helps us
understand the data. When you first meet a new
data set, ask yourself the following questions:
Who are the individuals described by the data? How
many individuals are there?
What are the variables? In what units is each
variable recorded? Weights, for example, might be
recorded in grams, pounds, thousands of pounds, or
kilograms.
We could follow a newspaper reporter’s lead and
extend our list of questions to include Why, When,
Where, and How were the data produced? For
now, we’ll focus on the first two questions.

We used the site censusatschool.com's “Random Data Selector” to choose 10
Canadian students who completed a survey in a recent year. The table below
displays the data.
(a) Who are the individuals in this data set?
(b) What variables were measured? Identify each as categorical or
quantitative. In what units were the quantitative variables
measured?
(c) Describe the individual in the highlighted row.

The distribution of a variable tells us what
values the variable takes and how often it takes
these values.

Graphs are often used to show the “shape” of a
distribution, which helps us identify patterns and
deviations from those patterns in the data.
HOW TO EXPLORE DATA
Begin by examining each variable by itself. Then
move on to study relationships among the variables.

Start with a graph or graphs. Then add numerical
summaries.
CHECK YOUR UNDERSTANDING



Jake is a car buff who wants to find out more about
the vehicles that students at his school drive. He gets
permission to go to the student parking lot and record
some data. Later, he does some research about each
model of car on the Internet. Finally, Jake makes a
spreadsheet that includes each car’s
model, year, color, number of cylinders, gas
mileage, weight, and whether it has a navigation
system.
Who are the individuals in Jake’s study?
What variables did Jake measure? Identify each as
categorical or quantitative.
ANALYZING CATEGORICAL DATA

Frequency vs. Relative Frequency

Frequency is the count of observations that lie in each
category. For the counts to be meaningful, the total
should also be given so they can be put into perspective.

Relative Frequency is the percent of the total that lies in
each category. Knowing the total number of individuals in
the data is helpful but not necessary to understand the
meaning of the data.
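As a minimal sketch of the two definitions above, the following Python computes both the frequency and the relative frequency of each category. The hair color data are hypothetical, invented only for illustration.

```python
from collections import Counter

# Hypothetical data: hair color recorded for 10 students.
hair = ["brown", "black", "brown", "blond", "brown",
        "black", "red", "brown", "blond", "black"]

# Frequency: the count of observations in each category.
frequency = Counter(hair)
total = sum(frequency.values())

# Relative frequency: each count as a percent of the total.
relative_frequency = {c: 100 * n / total for c, n in frequency.items()}

for category, count in frequency.most_common():
    print(f"{category:<6} {count:>2} {relative_frequency[category]:5.1f}%")
```

Note that the relative frequencies add to 100% because every individual falls into exactly one category.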
GRAPHICAL DISPLAYS FOR CATEGORICAL DATA
PIE CHART
BAR GRAPH

Pie charts are used to show the parts of a single group
that fall into different categories. If the data describe
separate groups, then a pie chart is NOT appropriate.
A pie chart of the percentage of OHS students with
different colored hair would be appropriate because the
categories come from one group (OHS students).
A pie chart of the number of people in different age
groups who owned an iPod would NOT be appropriate
because the age groups are different groups.


Bar graphs are always appropriate for categorical
data and are usually easier to read and compare than
pie charts. When choosing your graphical display for
categorical data, always use a bar graph unless
instructed to make a pie chart.
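To illustrate what a bar graph encodes, here is a small Python sketch that draws one as plain text, with one bar per category and bar length equal to the count. The counts and the helper name text_bar_graph are hypothetical. A real graph would normally be made with plotting software.

```python
# Hypothetical counts for one categorical variable (hair color).
counts = {"brown": 4, "black": 3, "blond": 2, "red": 1}

def text_bar_graph(counts):
    """Return one text line per category: label, bar of '#', then the count."""
    width = max(len(category) for category in counts)
    return [
        f"{category:<{width}} | {'#' * n} {n}"
        for category, n in counts.items()
    ]

lines = text_bar_graph(counts)
print("\n".join(lines))
```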
TWO-WAY TABLES

A two-way table shows the counts of individuals
for each combination of values of two categorical variables.
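A two-way table can be built by counting how many individuals fall into each combination of the two variables. The sketch below assumes a hypothetical survey in which each individual has a gender and a yes/no opinion. None of these values come from the data discussed in this chapter.

```python
from collections import Counter

# Hypothetical survey: (gender, opinion) recorded for each individual.
responses = [
    ("female", "yes"), ("female", "no"), ("male", "yes"),
    ("male", "yes"), ("female", "yes"), ("male", "no"),
    ("female", "yes"), ("male", "no"),
]

# Count the individuals in each (gender, opinion) combination.
cell_counts = Counter(responses)
rows = sorted({opinion for _, opinion in responses})
cols = sorted({gender for gender, _ in responses})

# table[opinion][gender] is the count in that cell of the two-way table.
table = {r: {c: cell_counts[(c, r)] for c in cols} for r in rows}

print("opinion", *cols)
for r in rows:
    print(r, *(table[r][c] for c in cols))
```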
MARGINAL DISTRIBUTIONS

The marginal distribution of one of the
categorical variables in a two-way table of counts
is the distribution of values of that variable
among all individuals described by the table.

In other words, divide each row or column total for that
variable by the table total.
For example, the row for the response “almost certain”
has a total of 1083, and 4826 individuals were counted
in the data, so the marginal proportion for “almost
certain” is 1083/4826, or about 22.4%.
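The arithmetic for this row can be checked with a short Python sketch. The two totals are the ones quoted in the text. The variable names are chosen here for illustration.

```python
row_total = 1083     # total for the "almost certain" row (from the text)
table_total = 4826   # all individuals counted in the table (from the text)

# Marginal proportion: row total divided by the table total.
marginal_proportion = row_total / table_total
marginal_percent = round(100 * marginal_proportion, 1)

print(f"{marginal_percent}%")
```

Repeating the same division for every row total gives the full marginal distribution of the row variable.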

If we calculate the marginal proportion for every row
category, we get the following results. A bar graph of
the resulting marginal distribution is also shown:
CONDITIONAL DISTRIBUTION

A conditional distribution of a variable
describes the values of that variable among
individuals who have a specific value of another
variable. There is a separate conditional
distribution for each value of the other variable.
Women are the condition here. This is the conditional
distribution of responses ONLY AMONG WOMEN. Therefore the
“total” becomes the total number of WOMEN (2367).
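The idea that the “total” becomes the group total can be sketched in Python. The table of counts below is hypothetical (it is not the table from this example), and the function name conditional_distribution is chosen here for illustration.

```python
# Hypothetical two-way table of counts: table[response][gender].
table = {
    "agree":    {"women": 30, "men": 20},
    "neutral":  {"women": 10, "men": 20},
    "disagree": {"women": 10, "men": 10},
}

def conditional_distribution(table, condition):
    """Distribution of the row variable among individuals in one column."""
    # The denominator is the total for the chosen group, not the table total.
    column_total = sum(row[condition] for row in table.values())
    return {
        response: row[condition] / column_total
        for response, row in table.items()
    }

among_women = conditional_distribution(table, "women")
for response, proportion in among_women.items():
    print(f"{response:<8} {100 * proportion:.1f}%")
```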

We say that there is an association between two
variables if specific values of one variable tend to occur
together with specific values of the other variable.

For example, if higher values of variable x seem to
occur with higher values of variable y, we say there
is an association between x and y.
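For two categorical variables, one common way to look for an association is to compare the conditional distributions of one variable across the groups of the other. The sketch below does this with hypothetical counts. The "largest gap" it reports is only an informal summary chosen for illustration, not a formal statistical test.

```python
# Hypothetical counts: counts[group][response].
counts = {
    "women": {"agree": 30, "neutral": 10, "disagree": 10},
    "men":   {"agree": 20, "neutral": 20, "disagree": 10},
}

def to_proportions(group_counts):
    """Convert one group's counts into its conditional distribution."""
    total = sum(group_counts.values())
    return {response: n / total for response, n in group_counts.items()}

conditional = {group: to_proportions(c) for group, c in counts.items()}

# Largest gap between the two groups, in percentage points.
largest_gap = max(
    abs(conditional["women"][r] - conditional["men"][r]) * 100
    for r in counts["women"]
)

print(conditional)
print(f"largest gap: {largest_gap:.1f} percentage points")
```

If the conditional distributions were nearly identical in every group, the data would show little evidence of an association.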
4 STEPS TO ORGANIZING A STATS PROBLEM
1. State: What’s the question that you’re trying to
answer?
2. Plan: How will you go about answering the
question? What statistical techniques does this
problem call for?
3. Do: Make graphs and carry out needed
calculations.
4. Conclude: Give your practical conclusion in the
setting of the real-world problem.
To keep the four steps straight, just remember:
Statistics Problems Demand Consistency!