Transcript Document
Section 1-1
Overview
Created by Tom Wegleitner, Centreville, Virginia
A common goal of surveys and other data-collection
tools is to collect data from a smaller part of a larger
group so we can learn something about the larger
group.
In this section we will look at some ways to describe
data.
Definitions
Data
observations (such as measurements,
genders, survey responses) that have
been collected.
Statistics
a collection of methods for planning
experiments, obtaining data, and then
organizing, summarizing, presenting,
analyzing, interpreting, and drawing
conclusions based on the data.
Population
the complete collection of all
elements (scores, people,
measurements, and so on) to be
studied. The collection is complete
in the sense that it includes all
subjects to be studied.
Census
the collection of data from every
member of the population.
Sample
a sub-collection of elements
drawn from a population.
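To make the census/sample distinction concrete, here is a minimal Python sketch. It assumes a hypothetical population of 1,000 members identified only by ID numbers; the function names `census` and `simple_random_sample` are illustrative, not from the source. The sample is drawn with the standard library's `random.Random.sample`, which selects without replacement.

```python
import random

# Hypothetical population: ID numbers for every member of the group under study.
population = list(range(1, 1001))  # 1,000 members, labeled 1..1000

def census(pop):
    """A census collects data from every member of the population."""
    return list(pop)

def simple_random_sample(pop, n, seed=None):
    """A sample is a sub-collection of elements drawn from the population.
    Members are drawn without replacement, so no one is chosen twice."""
    rng = random.Random(seed)
    return rng.sample(pop, n)

all_members = census(population)
sample = simple_random_sample(population, 50, seed=1)

print(len(all_members))                # 1000
print(len(sample))                     # 50
print(set(sample) <= set(population))  # True: every sampled element belongs to the population
```

The census touches all 1,000 members; the sample touches only 50, which is why samples are cheaper but can only estimate what a census would measure directly.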
A researcher wants to study the effects of
smoking on cholesterol level in Jackson County.
What would be his population?
All adults in Jackson County who smoke at
least one pack per day.
What could his sample be?
Some reasonable number of smokers in
Jackson County who smoke at least one pack
per day.
A sociologist hypothesizes that the average
annual income of households in Marianna is
less than $25,000. To test her
hypothesis, she samples 500 households in
the city and determines the income of each.
Describe the population.
The set of all households in Marianna.
Describe the sample.
The sample must be a subset of the
population. In this case, it is the 500
households selected by the sociologist.
“Cola War” is the popular term for the
intense competition between Coca-Cola and
Pepsi displayed in their marketing
campaigns. Their campaigns have featured
movie and television stars, rock videos,
athletic endorsements, and claims of
consumer preference based on taste tests.
Suppose, as part of a Pepsi marketing
campaign, 1,000 cola consumers are given a
blind taste test. Each consumer is asked to
state a preference for Brand A or Brand B.
What is the population?
The population of interest is the set of all
consumers of “cola” products.
What is the sample?
The sample is the 1,000 cola consumers
selected from the population of all cola
consumers.
Section 1-2
Types of Data
Definitions
Parameter
a numerical measurement describing
some characteristic of a population.
[Figure: population and parameter]
Statistic
a numerical measurement describing
some characteristic of a sample.
[Figure: sample and statistic]
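The parameter/statistic distinction can be sketched in Python with the standard `statistics` module. The ten population values below are hypothetical, chosen only for illustration, and the mean is just one example of a numerical measurement; the seeds are arbitrary.

```python
import random
from statistics import mean

# Hypothetical population of 10 measurements (values chosen only for illustration).
population = [12, 15, 11, 18, 20, 14, 16, 13, 19, 17]

# Parameter: a numerical measurement describing the whole population.
population_mean = mean(population)

# Statistic: the same kind of measurement computed from a sample.
rng = random.Random(7)  # fixed seed so the run is repeatable
sample = rng.sample(population, 4)
sample_mean = mean(sample)

print(population_mean)  # 15.5, fixed for this population
print(sample_mean)      # depends on which 4 members were drawn

# A different sample usually gives a different statistic; the parameter does not change.
other_sample = random.Random(8).sample(population, 4)
print(mean(other_sample))
```

The parameter is a single fixed number for a given population, while a statistic varies from sample to sample, which is why statistics are used to estimate parameters rather than to replace them.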
Quantitative data
numbers representing counts or
measurements.
Example: weights of supermodels.
Qualitative (or categorical or
attribute) data
can be separated into different categories
that are distinguished by some nonnumeric
characteristics.
Example: genders (male/female) of
professional athletes.
Classify each variable as qualitative or quantitative.
• Colors of automobiles in a dealer’s
showroom.
• Number of seats in movie theaters.
• Classification of patients based on nursing
care needed (complete, partial, or self-care).
• Lengths of newborn cats of a certain
species.
• Number of complaint letters received by an
airline per month.
Working with
Quantitative Data
Quantitative data can be further
divided into discrete and
continuous types.
Definitions
Discrete
data result when the number of possible
values is either a finite number or a
‘countable’ number of possible
values (0, 1, 2, 3, ...).
Example: The number of eggs that hens
lay.
Continuous
(numerical) data result from infinitely many
possible values that correspond to some
continuous scale that covers a range of
values without gaps, interruptions,
or jumps.
Example: The amount of milk that a cow produces;
e.g. 2.343115 gallons per day.
Classify each variable as discrete or continuous.
• Number of cartons of milk manufactured
each day.
• Temperatures of airplane interiors at a given
airport.
• Incomes of college students on work-study
programs.
• Weights of newborn calves.
• Number of tomatoes on each plant in a
field.
Levels of Measurement
Another way to classify data is to
use levels of measurement.
Four of these levels are
discussed in the following slides.
Definitions
nominal level of measurement
characterized by data that consist of names,
labels, or categories only. The data cannot
be arranged in an ordering scheme (such as
low to high).
Example: survey responses such as yes and no.
ordinal level of measurement
involves data that may be arranged in some
order, but differences between data values
either cannot be determined or are
meaningless
Example: Course grades A, B, C, D, or F
interval level of measurement
like the ordinal level, with the additional property
that the difference between any two data values is
meaningful. However, there is no natural zero
starting point (where none of the quantity is
present)
Example: Years 1000, 2000, 1776, and 1492
ratio level of measurement
the interval level modified to include the
natural zero starting point (where zero
indicates that none of the quantity is
present). For values at this level, differences
and ratios are meaningful.
Summary: Levels of Measurement
Nominal - categories only
Ordinal - categories with some order
Interval - differences but no natural
starting point
Ratio - differences and a natural starting
point
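The summary above implies that each level supports every comparison of the levels beneath it, plus one more. The following Python sketch encodes that idea; the `Level` enum and `meaningful_operations` function are hypothetical names, and the calendar that starts 500 years later is an invented illustration. It reuses the years example from the interval-level definition to show why ratios require a natural zero, and a unit change (1 inch = 2.54 cm) to show that ratio-level data keep their ratios.

```python
from enum import IntEnum

class Level(IntEnum):
    """The four levels of measurement, ordered from least to most structure."""
    NOMINAL = 1   # categories only
    ORDINAL = 2   # categories with some order
    INTERVAL = 3  # differences meaningful, no natural zero starting point
    RATIO = 4     # differences and ratios meaningful, natural zero starting point

def meaningful_operations(level: Level) -> list[str]:
    """Return the comparisons that make sense at the given level.
    Each level keeps every operation of the levels below it."""
    ops = ["same or different category"]
    if level >= Level.ORDINAL:
        ops.append("order (low to high)")
    if level >= Level.INTERVAL:
        ops.append("differences")
    if level >= Level.RATIO:
        ops.append("ratios")
    return ops

for level in Level:
    print(level.name, meaningful_operations(level))

# Interval level: years are counted from an arbitrary starting point.
# Re-counting them in a hypothetical calendar that starts 500 years later
# keeps differences but changes ratios, so ratios of years carry no meaning.
years = [1000, 2000]
shifted = [y - 500 for y in years]                   # [500, 1500]
print(years[1] - years[0], shifted[1] - shifted[0])  # 1000 1000
print(years[1] / years[0], shifted[1] / shifted[0])  # 2.0 3.0

# Ratio level: zero means "none of the quantity", so changing units
# rescales the values without moving zero, and ratios stay the same.
lengths_in = [10.0, 20.0]
lengths_cm = [x * 2.54 for x in lengths_in]
print(lengths_in[1] / lengths_in[0], lengths_cm[1] / lengths_cm[0])
```

Moving the zero point is exactly what distinguishes an interval scale from a ratio scale: differences survive the shift, ratios do not.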
Classify each as nominal, ordinal, interval, or ratio
level data.
• Horsepower of motorcycle engines.
• Ratings of newscasts in Houston (poor,
fair, good, excellent).
• Temperature of automatic popcorn poppers.
• Time required by drivers to complete a
course.
• Marital status of respondents to a survey of
savings accounts.
Recap
In Sections 1-1 and 1-2 we have looked at:
Basic definitions and terms describing data
Parameters versus statistics
Types of data (quantitative and qualitative)
Levels of measurement
Key Concepts
Sample data must be collected in an
appropriate way, such as through a
process of random selection.
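A small simulation can illustrate why the method of collection matters. This Python sketch assumes a hypothetical population of 10,000 values stored in sorted order (think of a roster sorted by income) and compares a convenience sample, the first 500 entries on the list, with a random sample of the same size. The population, sample size, and seed are assumptions chosen for illustration.

```python
import random
from statistics import mean

# Hypothetical population: 10,000 values listed in increasing order,
# like a roster sorted by income.
population = list(range(1, 10_001))
true_mean = mean(population)  # parameter: 5000.5

n = 500
# Inappropriate collection: take whoever is easiest to reach,
# here simply the first 500 entries on the sorted list.
convenience_sample = population[:n]

# Random selection: every member has the same chance of being chosen.
rng = random.Random(2024)
random_sample = rng.sample(population, n)

print(mean(convenience_sample))  # 250.5, far from the population mean
print(mean(random_sample))       # near 5000.5; the exact value depends on the seed
```

Because the list is sorted, the convenience sample captures only the lowest values, and no calculation on those 500 numbers can recover the population mean; the random sample has no such built-in bias.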
If sample data are not collected in an
appropriate way, the data may be
so completely useless that no
amount of statistical torturing can
salvage them.