Chapter 5: Producing Data

“An approximate answer to the right question is worth a good deal more than the exact answer to an approximate question.”
John Tukey
5.1 Designing Samples (p. 245-261)
(Overview)



The sampling process must be designed very carefully in order to obtain reliable statistical information.
Good sampling techniques, many of which involve the use of chance, produce meaningful and useful results.
Bad sampling techniques produce worthless data.
Definitions

Voluntary response sample: consists of people who choose themselves by responding to a general appeal.
Example: listeners who call in to respond to a talk show question.
Confounding: two variables are confounded when their effects on a response variable cannot be distinguished from one another.

See Example 5.2 in the textbook, in which the explanatory variable (reading favorable propaganda) and the events of history are confounded.
Definitions (cont’d.)



Statistical inference: provides ways to give reasonable answers to specific questions by examining data.
Population: the entire group of individuals about which information is desired.
Sample: the part of the population that is actually examined in order to gather information about the population.
Definitions (cont’d.)

Sampling frame: the list of individuals from which a sample is actually selected.
Example:
Population: adult residents of Delaware County
Sampling frame: voter registration roll
Design: the method used to select the sample.
Definitions (cont’d.)

Convenience sample: a sample made up of the individuals who are easiest to reach.
Examples:
Opinions offered by shoppers entering or leaving a WaWa or Borders in Springfield (used by the Daily Times)
Opinions offered by students at a Catholic school (used by the Catholic Standard and Times)
Biased sample: a sample chosen by a design that systematically favors certain outcomes.
Definitions (cont’d.)

Simple random sample (SRS) of size n: a sample chosen in such a way that every set of n individuals in the population has an equal chance to be the sample actually selected.
Sometimes this is easier said than done! It can be tricky to obtain an SRS.
Probability sample: a sample in which each member of the population has a known chance of being chosen.
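A computer can do the job of a table of random digits when choosing an SRS. The sketch below assumes a hypothetical population of 500 individuals labeled 001 to 500; it uses Python's standard `random.sample`, which draws without replacement so that every set of n labels is equally likely to be the sample selected. The seed is only there so the same sample can be reproduced.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: every set of n individuals is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    # random.sample draws without replacement, so each subset of size n
    # has the same chance of being the one selected.
    return rng.sample(population, n)

# Hypothetical population: individuals labeled 001 through 500.
population = [f"{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```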
Definitions (cont’d.)

Stratified random sample:
Steps:
1. Divide the population into groups of similar individuals, called strata.
2. Choose a separate SRS from each stratum.
3. Combine these SRSs to form the full sample.
Reasons:
To reduce the variation of the estimators
Administrative convenience
Lower cost
Separate estimates are needed for subgroups of the population
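The three steps above can be sketched in a few lines of Python. The strata (students grouped by class year) and the number drawn from each are hypothetical; here the sizes are set to 10% of each stratum, which is one common choice (proportional allocation) but not the only one.

```python
import random

def stratified_random_sample(strata, sizes, seed=None):
    """Draw a separate SRS from each stratum and combine them into one sample.

    strata: dict mapping stratum name -> list of individuals in that stratum
    sizes:  dict mapping stratum name -> number to draw from that stratum
    """
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        # Step 2: an SRS chosen within this stratum only
        combined.extend(rng.sample(members, sizes[name]))
    # Step 3: the SRSs are combined into the full sample
    return combined

# Step 1 (hypothetical): students divided into strata by class year
strata = {
    "freshman": [f"F{i}" for i in range(1, 201)],
    "sophomore": [f"S{i}" for i in range(1, 151)],
    "junior": [f"J{i}" for i in range(1, 101)],
}
sizes = {"freshman": 20, "sophomore": 15, "junior": 10}
sample = stratified_random_sample(strata, sizes, seed=2)
print(len(sample))  # 45
```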
Definitions (cont’d.)




Multistage sample design: selects successively smaller groups within the population in stages.
Undercoverage: occurs when some groups in the population are left out of the process of choosing the sample.
Nonresponse: occurs when an individual chosen for the sample cannot be contacted or refuses to cooperate.
Response bias: refers to a variety of influences, such as the behavior of the respondent or the interviewer, that can lead to an incorrect or false response.
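The multistage design described above can be sketched as two stages of SRS. The frame here is hypothetical (10 counties with 50 households each); real multistage designs often have more stages and may not use an SRS at every stage.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sample: choose groups at random, then individuals within them.

    clusters: dict mapping group name (e.g. a county) -> list of individuals
    """
    rng = random.Random(seed)
    # Stage 1: SRS of the larger groups (hypothetical counties)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        # Stage 2: SRS of individuals within each chosen group
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return chosen, sample

# Hypothetical frame: 10 counties, 50 households in each
clusters = {f"County{c}": [f"C{c}-H{h}" for h in range(1, 51)] for c in range(1, 11)}
chosen, sample = multistage_sample(clusters, n_clusters=3, n_per_cluster=5, seed=3)
print(chosen)
print(len(sample))  # 15
```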
Final Thoughts:
The wording of a question can greatly influence the response.
A poorly worded question can confuse those who are attempting to answer it.
5.2 Designing Experiments (p. 265-284)
An Overview
There are good and bad techniques for producing data.
Random sampling and randomized comparative experiments are among the most important and effective statistical practices.
The use of chance is vital in statistical design.

Concepts and Definitions


In an observational study, NO treatment is imposed on the individuals in the study.
Variables of interest are measured, usually over a period of time.
In an experiment, a treatment is deliberately imposed on the individuals in the study.
Responses to the treatment are observed.
Definitions (cont’d.)



Experimental units are the individuals on which the experiment is performed.
When the units are people, they are called subjects (the participants in the experiment).
A treatment is a specific experimental condition that is applied to the experimental units.
A placebo is a dummy treatment that has no physical effect on an experimental unit.
Commonly called a “sugar pill.”
Definitions (cont’d.)

The control group receives the placebo (or another baseline condition, such as an existing standard treatment).
This group helps the experimenter control the effects of lurking variables.
The treatment group receives the treatment.
Definitions (cont’d.)


Completely randomized experimental design: all experimental units are allocated at random among the treatments.
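A completely randomized design can be carried out by shuffling the list of units and dealing them out to the treatments in turn. This sketch assumes 20 hypothetical subjects and two treatments (a treatment and a placebo); dealing in turn keeps the group sizes within one of each other, which is a design choice rather than a requirement.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Allocate all units at random among the treatments in (nearly) equal groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # a random order gives a random allocation
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # deal the shuffled units out in turn, like cards
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical: 20 subjects split between a treatment and a placebo group
subjects = [f"Subject{i}" for i in range(1, 21)]
groups = completely_randomized(subjects, ["treatment", "placebo"], seed=4)
for name, members in groups.items():
    print(name, len(members))
```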
Statistically significant observation: an observed effect so large that it would rarely occur by chance alone.
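One way to make “would rarely occur by chance alone” concrete is a randomization (permutation) test, which these notes do not describe, so treat this as an illustration only. The response values are made up. The idea: if the treatment had no effect, the group labels are arbitrary, so we re-shuffle the labels many times and count how often chance alone produces a difference in means at least as large as the one observed.

```python
import random

def randomization_p_value(group_a, group_b, reps=10_000, seed=None):
    """Estimate how often re-randomized labels give a difference in means
    at least as large (in absolute value) as the observed difference."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # assume no treatment effect: labels are arbitrary
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / reps

# Made-up responses, for illustration only
treatment = [12, 15, 14, 16, 18, 17, 15, 16]
control = [10, 11, 13, 12, 11, 14, 12, 10]
p = randomization_p_value(treatment, control, seed=5)
print(p)
```

A small result means the observed difference would rarely occur by chance alone, which is what “statistically significant” expresses.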
Three Principles of Experimental Design
1. CONTROL
Needed to counter the effects of lurking variables.
Comparison is the simplest form of control.
Experiments should compare two or more treatments in order to avoid confounding the effect of the treatment with some other influence.
2. RANDOMIZATION
Subjects are assigned to treatments by pure chance.
Randomization creates groups that are similar (except for chance variation).
A table of random digits can be used to choose the units for each group.
3. REPLICATION
The experiment should be performed on many subjects to reduce chance variation in the results.
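The effect of replication can be seen in a small simulation. The sketch assumes a hypothetical response that is normally distributed with mean 50 and standard deviation 10 in both groups, so the treatment has no effect and any difference between group means is pure chance variation. With more subjects per group, that chance difference spreads out less.

```python
import random
from statistics import stdev

def chance_spread(n_per_group, reps=2_000, seed=None):
    """Standard deviation of the difference in group means when the
    treatment has NO effect (both groups drawn from the same distribution)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        a = [rng.gauss(50, 10) for _ in range(n_per_group)]
        b = [rng.gauss(50, 10) for _ in range(n_per_group)]
        diffs.append(sum(a) / n_per_group - sum(b) / n_per_group)
    return stdev(diffs)

for n in (5, 20, 80):
    print(n, round(chance_spread(n, seed=6), 2))
```

Under these assumptions the spread is about 10 times the square root of 2/n, so quadrupling the number of subjects roughly halves the chance variation.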
Definitions (cont’d.)


In a double-blind experiment, neither the subjects nor the people who have contact with them know which treatment a subject is receiving.
Block design:
Minimizes variation.
Block: a group of experimental units or subjects that are similar in ways that are expected to affect the response to the treatments.
Treatments are assigned at random within each block.
Blocking is a form of control.
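In a block design the random assignment is carried out separately inside each block. The blocks below (subjects grouped by gender) and the two treatments are hypothetical; each block is shuffled on its own and its units are dealt out to the treatments in turn.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomization happens only inside this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks formed before the experiment
blocks = {
    "women": ["W1", "W2", "W3", "W4", "W5", "W6"],
    "men": ["M1", "M2", "M3", "M4"],
}
assignment = randomized_block_design(blocks, ["treatment", "placebo"], seed=7)
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```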
Definitions (cont’d.)

Matched pairs:
A common form of blocking that compares two treatments.
The two members of each pair are “alike” in ways that are expected to affect the response.
Common forms:
Paired subjects: using a random process, one member of each pair receives the treatment and the other receives the placebo. The pairs are observed at a later time to see whether the treatment had any effect.
Before-after test scores: each individual takes a before-test, receives some type of treatment, and then takes an after-test. Purpose: to see whether the treatment improves test performance.
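Both forms of matched pairs can be sketched briefly. The pair labels and the before/after scores are made up. A simulated coin flip stands in for the “random process” within each pair, and the before-after data are summarized through the difference for each individual, the usual way paired data are analyzed.

```python
import random
from statistics import mean

def assign_within_pairs(pairs, seed=None):
    """For each matched pair, a fair coin flip decides who gets the treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:
            plan.append({"treatment": first, "placebo": second})
        else:
            plan.append({"treatment": second, "placebo": first})
    return plan

def mean_improvement(before, after):
    """Before-after design: average the after-minus-before difference per individual."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs)

pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]  # hypothetical pairs of similar subjects
print(assign_within_pairs(pairs, seed=8))

before = [70, 65, 80, 72]  # made-up before-test scores
after = [75, 66, 84, 78]   # made-up after-test scores for the same individuals
print(mean_improvement(before, after))  # mean of the differences 5, 1, 4, 6
```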


5.3 Simulation Experiments (p. 286-296)
An Overview


Simulation can be used to obtain empirical probabilities for real-life situations.
Chance outcomes can be imitated by using:
Random number generators (tables, calculators, computers)
Dice
Cards
Spinners
Simulation: the imitation of chance behavior in an attempt to gain information about a real-life situation.
The randInt( command on the TI-84 Plus can be used to generate random integers.
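For readers working on a computer instead of a TI-84 Plus, a rough Python analogue of randInt(lower, upper, n) is sketched below. The function name `rand_int` is hypothetical; it simply wraps the standard `random.randint`, which, like randInt(, includes both endpoints. Seeding the generator makes a run reproducible.

```python
import random

def rand_int(lower, upper, n=1, rng=random):
    """Hypothetical analogue of randInt(lower, upper, n):
    returns n random integers from lower to upper, both ends included."""
    return [rng.randint(lower, upper) for _ in range(n)]

rng = random.Random(9)           # a seeded generator makes the run reproducible
rolls = rand_int(1, 6, 10, rng)  # ten rolls of a fair die
print(rolls)
```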
Steps in Creating a Simulation Model
1. State the problem or describe the experiment.
2. State the assumptions.
3. Assign digits to represent outcomes.
4. Simulate many repetitions.
5. State your conclusions.
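The five steps can be followed in a short Python simulation. The problem and its assumptions are hypothetical examples: estimate the probability that a family with three children has at least two girls, assuming each child is equally likely to be a girl or a boy and that children are independent. Digits 0 to 4 represent a girl and 5 to 9 a boy, just as one would assign digits from a table of random digits.

```python
import random

# Step 1 (problem): estimate P(a family with three children has at least two girls).
# Step 2 (assumptions, hypothetical): girl and boy equally likely; children independent.
# Step 3 (assign digits): 0-4 = girl, 5-9 = boy.

def one_family(rng):
    """One repetition: draw three random digits, one per child."""
    digits = [rng.randint(0, 9) for _ in range(3)]
    girls = sum(1 for d in digits if d <= 4)
    return girls >= 2

# Step 4 (simulate many repetitions)
def simulate(trials, seed=None):
    rng = random.Random(seed)
    successes = sum(one_family(rng) for _ in range(trials))
    return successes / trials

# Step 5 (state conclusions)
estimate = simulate(10_000, seed=10)
print(f"Estimated P(at least 2 girls) = {estimate:.3f}")
```

Under these assumptions the exact answer is 4/8 = 0.5, so the estimate should land close to that value.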
When Trials are Completed
Determine the empirical probability by calculating the ratio:
empirical probability = (number of trials in which the outcome of interest occurs) / (total number of trials)
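The ratio above translates directly into code. The example checked here (the sum of two fair dice equals 7) is chosen only because its exact probability, 6/36 = 1/6, is known, so the empirical value can be compared against it.

```python
import random

def empirical_probability(outcomes, is_of_interest):
    """(number of trials with the outcome of interest) / (total number of trials)"""
    outcomes = list(outcomes)
    hits = sum(1 for o in outcomes if is_of_interest(o))
    return hits / len(outcomes)

# Example: 20,000 simulated rolls of two fair dice; outcome of interest: a sum of 7
rng = random.Random(11)
trials = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(20_000)]
p_hat = empirical_probability(trials, lambda total: total == 7)
print(round(p_hat, 3))
```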