Lecture 4 Outline - University of Pennsylvania

Lecture 4 Outline
• Chapter 1.3, 1.5
• Control in Experimental Design
• Causal Inference in Observational Studies
• Summarizing Data
– Numerical methods
– Graphical methods
The Meaning of the Causal Inference
• In the motivation-creativity study, we concluded
that there is strong evidence that the “intrinsic
questionnaire” treatment caused a difference in
creativity compared to the “extrinsic
questionnaire” treatment.
• This difference could be caused by anything that
differs between the two treatments, e.g., the actual
questionnaire, the order in which the poems were
judged, or the relative preferences of the judges for
the two treatments.
Control in Experimental Design
• The principle of control in experimental design is
to make sure that all other factors besides the
intended treatments are kept the same in the
different groups. Then we can conclude that the
intended treatment causes a difference between the
groups.
• Examples of control:
– Use a placebo for the control group.
– Use double blinding.
– Judge poems in random order.
Causal Inference in an Observational Study
• In an observational study, we could assume that,
unbeknownst to us, the subjects were randomly
assigned to treatments (i.e., there are no
confounding variables). Then we could use the
randomization test p-value to make inferences.
• But this is a “fictitious” probability model which
might or might not be valid.
• Inferences based on a randomized experiment are
much stronger because the probability model on
which they are based (that of random assignment)
is known to be correct.
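The randomization test mentioned above can be sketched in a few lines. This is a minimal illustration, not the course's own procedure: the scores are made-up numbers, the helper names `mean` and `randomization_test` are hypothetical, and the p-value is approximated by random reshuffling rather than by enumerating every possible assignment.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def randomization_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Approximate two-sided randomization-test p-value for a difference in means.

    Returns (observed difference, fraction of reshuffled assignments whose
    difference is at least as large in absolute value as the observed one).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Re-assign the pooled outcomes to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_shuffles


# Hypothetical creativity scores, invented for illustration only.
intrinsic = [22.1, 19.3, 24.0, 20.5, 23.7, 21.2]
extrinsic = [15.0, 18.2, 17.4, 12.9, 19.8, 16.3]

observed_diff, p_value = randomization_test(intrinsic, extrinsic)
print(f"observed difference = {observed_diff:.2f}, approximate p-value = {p_value:.4f}")
```

In a randomized experiment the reshuffling mirrors how subjects were actually assigned. In an observational study the same arithmetic goes through, but the p-value is only as trustworthy as the assumption that assignment behaved like random assignment.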
Meaningful Comparisons
• Main lesson of chapter: The best way to compare
two (or more) groups is to do a randomized
experiment or take a random sample. This avoids
systematic bias due to confounding variables and
selection bias.
• But if this is not possible, we should generally try
to make the groups as “comparable” as possible by
adjusting for known confounding variables and
selection biases. Often, important first steps
are to use an appropriate control group and to
compare appropriate rates rather than absolute
numbers.
Control Group
• In a randomized experiment, we want the treatment and
control groups to be similar in every way except that one
takes the treatment and the other doesn’t; to achieve this, we
use a placebo and double blinding.
• Similarly in an observational study, we want to compare
the treatment group to a control group that is as similar as
possible.
• Explain the need for a control group by criticizing the
statement “A study on the benefits of vitamin C showed
that 90% of the people suffering from a cold who take
vitamin C get over their cold within a week.”
Use of Rates
• An article in This Week magazine says that if you
went “hurtling down the highway at 70 miles an
hour, careening from side to side,” you would
have four times as good a chance of staying alive
if the time were seven in the morning than seven
at night.
• The evidence: “Four times more fatalities occur on
the highways at 7 p.m. than 7 a.m.”
• Does the conclusion follow from the evidence?
• More accidents occur in clear weather than foggy
weather. Is clear weather safer to drive in?
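A small calculation shows why counts alone can mislead. The numbers below are invented for illustration (the slide gives no traffic volumes), and `rate_per_million` is a hypothetical helper. The point is only that four times as many fatalities can coexist with a lower fatality rate when many more vehicles are on the road.

```python
def rate_per_million(events, exposure):
    """Events per one million units of exposure (here, vehicles on the road)."""
    return events * 1_000_000 / exposure


# Hypothetical figures: fatalities and vehicles on the road at each hour.
fatalities = {"7 a.m.": 10, "7 p.m.": 40}
vehicles_on_road = {"7 a.m.": 50_000, "7 p.m.": 400_000}

rates = {
    hour: rate_per_million(fatalities[hour], vehicles_on_road[hour])
    for hour in fatalities
}

for hour in fatalities:
    print(f"{hour}: {fatalities[hour]} fatalities, "
          f"{rates[hour]:.0f} per million vehicles")
```

With these made-up inputs, 7 p.m. has four times the fatalities but half the rate per vehicle, so the count by itself says nothing about which hour is riskier for an individual driver.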
Experimental Design Example: Salk Vaccine Field Trial
• In the first half of the 20th century, polio was one
of the most frightening diseases, striking hardest at
young children and leaving many helpless
cripples.
• By the 1950s, Jonas Salk had developed a vaccine for
polio that had proved promising in laboratory
experiments, but it was necessary to try it in the
real world before releasing it for general use.
Designs for Salk Vaccine Field Trial
• Historical Control Approach: Distribute the vaccine as
widely as possible, through the schools, to see whether the
rate of reported polio was appreciably less than usual
during the subsequent season.
• Observed Control Approach: Offer vaccination to all
children in the second grade of participating schools and
follow the polio experience not only in these children but
in the first and third grade children.
• Placebo Control Approach: Choose the control group from
the same population as the treatment group – children
whose parents consented to vaccination. Assign the
treatment randomly. Give a placebo to the control group. Do
not tell doctors which group children belong to.
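The random assignment step of the placebo control approach can be sketched as follows. This is an illustration under stated assumptions, not a description of how the field trial was actually administered: the child identifiers are made up, `assign_groups` is a hypothetical helper, and an even split between the two groups is assumed.

```python
import random


def assign_groups(subject_ids, seed=None):
    """Randomly split consenting subjects into vaccine and placebo groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"vaccine": shuffled[:half], "placebo": shuffled[half:]}


# Hypothetical identifiers for children whose parents consented.
consenting_children = [f"child_{i:03d}" for i in range(1, 21)]
groups = assign_groups(consenting_children, seed=42)

print("vaccine:", sorted(groups["vaccine"]))
print("placebo:", sorted(groups["placebo"]))
```

Because both groups are drawn from the same consenting population and chance alone decides who gets the vaccine, any factor related to consent is balanced between the groups on average.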
Polio Example
• Using Figure 1 as an example, explain why a
contemporaneous control group is needed in experiments
where the effectiveness of a drug or vaccine is being
tested.
• Comment on the use of the number of cases. What would
be a more appropriate indicator of whether polio incidence
was increasing?
Summarizing Data
• Numerical summaries
– Measures of center: mean, median, mode
– Measures of spread: sample standard deviation
( $s = \sqrt{\frac{(X_1 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1}}$ ), interquartile range
• Graphical Methods
– Relative frequency histograms
– Stem and leaf diagrams
– Box plots
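The numerical summaries listed above can be computed with Python's standard library. This is a sketch on made-up data, and `summarize` is a hypothetical helper. Quartile conventions differ between packages, so the interquartile range from `statistics.quantiles` (default "exclusive" method) may not match what JMP reports for the same data.

```python
import statistics


def summarize(values):
    """Measures of center and spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Sample standard deviation: divides by n - 1, as in the formula for s.
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
    }


# Hypothetical measurements, invented for illustration.
data = [2, 4, 4, 5, 7, 9, 11]
summary = summarize(data)

for name, value in summary.items():
    print(f"{name}: {value}")
```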
Relative Frequency Histograms
• A histogram is a graph that shows the relative frequency
per unit of measurement.
• The areas of blocks represent the percentage of
observations in the blocks.
• The heights of the blocks represent relative frequency per
unit of measurement, i.e., crowding – percentage per unit
of measurement
• Histograms show broad features – particularly the center,
spread and shape of the distribution (symmetric or
skewed, light tailed or heavy tailed).
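The relationship between block area and block height is easiest to see with class intervals of unequal width. The sketch below uses invented salary figures (in thousands) and a hypothetical helper `histogram_blocks`. Each block's height is its percentage divided by its width, so the areas add up to 100%.

```python
def histogram_blocks(values, edges):
    """Return (left, right, percent, height) for each class interval.

    Intervals include their left endpoint and exclude the right one,
    except the last interval, which includes both endpoints.
    """
    n = len(values)
    last = len(edges) - 2
    blocks = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        if i == last:
            count = sum(left <= v <= right for v in values)
        else:
            count = sum(left <= v < right for v in values)
        percent = 100 * count / n            # area of the block
        height = percent / (right - left)    # percent per unit of measurement
        blocks.append((left, right, percent, height))
    return blocks


# Hypothetical salaries in thousands of dollars.
salaries = [31, 33, 34, 36, 37, 38, 39, 41, 44, 58]
blocks = histogram_blocks(salaries, edges=[30, 35, 40, 60])

for left, right, percent, height in blocks:
    print(f"{left}-{right}: {percent:.0f}% of observations, "
          f"height {height:.1f}% per thousand")
```

With these inputs the first and last intervals each hold 30% of the observations, yet the last block is much shorter because the same area is spread over a wider interval.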
Histograms in JMP
• Click Analyze, then Distribution
• Click the red triangle next to Distributions, then click Stack to see
a horizontal layout
• Click tools, hand (grabber in Version 5) and click on
histogram, drag to change position of bars.
• To make histograms by group (e.g., sex discrimination),
put Salaries in Y and Sex in the By box. Click the red triangle
next to Distributions and click Stack to display horizontally.
For both groups, click the red triangle next to Distributions and
click Uniform Scaling to display the histograms on the same scale.
Stem and leaf diagrams
• Cross between graph and table
• Gives quick idea of distribution
• Shows center, spread and shape, as a histogram does,
but also shows exact values; it is easy to
construct by hand, and the median can be computed from it.
• Stem and leaf plots in JMP
– Click Analyze, Distribution
– Put variable of interest in Y and click OK
– Click red triangle next to variable of interest (e.g.,
salaries) and click Stem and Leaf
– Back to back stem and leaf plots are not available in
JMP but are useful (see page 17)
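Constructing a stem and leaf diagram by hand amounts to sorting the values and splitting each into a stem and a leaf. The sketch below assumes non-negative whole numbers with the tens digit (or digits) as the stem and the units digit as the leaf; the data are invented and `stem_and_leaf` is a hypothetical helper.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Rows of a stem-and-leaf display: stem = tens, leaf = units digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Include empty stems so gaps in the distribution remain visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {digits}".rstrip())
    return rows


# Hypothetical measurements, invented for illustration.
measurements = [23, 12, 51, 30, 28, 15, 21, 34, 23]
rows = stem_and_leaf(measurements)
print("\n".join(rows))
```

Because every leaf is an actual digit from the data, the exact values can be read back from the display, and the median can be found by counting leaves from either end.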
Box plots
• Middle 50% of a group of measurements is
represented by a box.
– Line in middle of box is the median
• Various features of the upper and lower 25% are shown by other
symbols
– The whiskers extend to the farthest point that is within
1.5 interquartile ranges of upper and lower quartiles.
(IQR=third quartile – first quartile)
– Points farther away are shown individually as outliers.
– Width of a box plot is chosen to make the box look
nice; it does not represent any aspect of data.
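The quantities a box plot displays can be computed directly. This is a sketch on made-up data with a hypothetical helper `box_plot_summary`. It uses the default quartile method of `statistics.quantiles`, which may differ slightly from the convention JMP uses.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the farthest data points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical measurements with one unusually large value.
sample = [2, 4, 4, 5, 7, 9, 30]
box = box_plot_summary(sample)

for name, value in box.items():
    print(f"{name}: {value}")
```

Here the value 30 lies more than 1.5 interquartile ranges above the third quartile, so it would be drawn as an individual point while the upper whisker stops at 9.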
Box plots in JMP
• To draw one box plot
– Click Analyze, Distribution.
• To draw side by side box plots
– Click Analyze, Fit Y by X, putting outcome in Y and
group variable in X
– Click red triangle next to One Way Analysis, click
Display Options and then click Box Plot (this produces
box plots that display the box, the whiskers and all of
the data points individually).
• Display 1.13 shows histograms and box plots for
four types of distributions.