Threats to Internal Validity, continued


Research Methods in Psychology

Quasi-Experimental Designs and Program Evaluation

Applied Research

• Goal
  • to improve the conditions in which people live and work
• Natural settings
  • messy, “real world,” hard to establish experimental control
• Quasi-experiments
  • procedures that approximate the conditions of highly controlled laboratory experiments
• Program evaluation
  • applied research to learn whether real-world treatments work

Characteristics of True Experiments

• manipulate an Independent Variable (IV)
  • treatment, control conditions
• high degree of control
  • especially random assignment to conditions
• unambiguous outcome regarding effect of IV on the dependent variable (DV)
  • internal validity

Obstacles to Conducting True Experiments in Natural Settings

• Permission
  • difficult to gain permission to conduct true experiments in natural settings
  • difficult to gain access to participants
• Random assignment perceived as unfair
  • people want a “treatment”
  • random assignment is the best way to determine whether a treatment is effective
  • use “waiting-list” control group

Advantage of True Experiments

• Threats to internal validity are controlled
  • confoundings (alternative explanations for findings) are controlled
  • rule out alternative explanations to make a causal inference about effect of IV on DV
  • 8 general classes of threats to internal validity
    • History
    • Maturation
    • Testing
    • Instrumentation
    • Regression
    • Selection
    • Subject attrition
    • Additive effects with selection

Threats to Internal Validity

• History
  • When an event occurs at the same time as the treatment and changes participants’ behavior
  • participants’ “history” includes events other than treatment
  • difficult to distinguish whether treatment has an effect

History Threat, continued

[Figure: time-series plot across Weeks 1 to 8, with the treatment (X) introduced after Week 4]

• Does an AIDS awareness campaign on campus influence condom sales in campus vending machines?
• History threat: Suppose at week 4 (X = treatment) a celebrity announces he is HIV+.
• Can you conclude the awareness campaign was effective?

Threats to Internal Validity, continued

• Maturation
  • Participants naturally change over time.
  • These maturational changes, not treatment, may explain any changes in participants during an experiment.

Maturation Threat, continued

[Figure: plot comparing Pre and Post scores]

• Does a new reading program improve 2nd graders’ reading comprehension?
• Reading comprehension improves naturally as children mature over the year.
• Can you conclude the reading program was effective?

Threats to Internal Validity, continued

• Testing
  • Taking a test generally affects subsequent testing
  • Participants’ performance on a measure at the end of a study may differ from an initial testing because of their familiarity with the measures

Testing Threat, continued

[Figure: plot comparing pre and post scores]

• Does teaching people a new problem-solving strategy influence their ability to solve problems quickly?
• If similar problems are used in the pretest, faster problem solving may be due to familiarity with the test.
• Can we conclude that the new strategy improves problem-solving ability?

Threats to Internal Validity, continued

• Instrumentation
  • Instruments used to measure participants’ performance may change over time
    • example: observers may become bored or tired
  • Changes in participants’ performance may be due to changes in instruments used to measure performance, not to a treatment

Instrumentation, continued

[Figure: time-series plot across Months 1 to 8, with the treatment (X) introduced after Month 4]

• Suppose that a police protection program is implemented to decrease the incidence of rape.
• At the same time the program is implemented (X), reporting laws change such that what constitutes rape is broadened.
• Can we conclude the program was effective (or ineffective)?

Threats to Internal Validity, continued

• Regression
  • Participants sometimes perform very well or very poorly on a measure because of chance factors (e.g., luck).
  • These chance factors are not likely to be present during a second testing, so their scores will not be so extreme.
  • The scores will “regress” (go toward) the mean.
  • Regression effects, not treatment, may account for changes in participants’ performance over time.

Regression, continued

• A test score = true score + error (e.g., chance)
• definition of an unreliable test or measure:
  • it measures with a lot of error
• If people score very high or very low on a test, it’s possible that chance factors produced the extreme score.
• On a second testing, those chance factors are less likely to be present (that’s why they’re “chance”).
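The idea that a test score is a true score plus chance error can be illustrated with a small simulation. The following is a minimal sketch and is not part of the original slides: the function name `simulate_regression`, the population mean of 100, both standard deviations, and the top-5% selection rule are assumptions chosen only for illustration. No treatment is applied between the two testings.

```python
import random
import statistics


def simulate_regression(n=10_000, true_sd=10.0, error_sd=10.0,
                        top_fraction=0.05, seed=1):
    """Simulate two testings where score = true score + chance error.

    All parameter values are illustrative assumptions, not study data.
    Returns (test 1 mean, test 2 mean) for the people selected because
    they scored in the top `top_fraction` on the first test.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n)]
    # Each testing adds fresh, independent chance error to the same true score.
    test1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    test2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Select the extreme scorers using the first test only.
    n_selected = max(1, int(n * top_fraction))
    ranked = sorted(range(n), key=lambda i: test1[i], reverse=True)
    selected = ranked[:n_selected]

    pre_mean = statistics.mean(test1[i] for i in selected)
    post_mean = statistics.mean(test2[i] for i in selected)
    return pre_mean, post_mean


if __name__ == "__main__":
    pre, post = simulate_regression()
    print(f"selected group, test 1 mean: {pre:.1f}")
    print(f"selected group, test 2 mean: {post:.1f}")
```

Under these assumptions the selected group's second-test mean falls back toward the population mean even though nothing was done to them, which is the pattern a regression threat produces.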

Regression, continued

[Figure: plot comparing Pre and Post scores]

• Suppose that students were selected for an accelerated enrichment program because of their very high scores on a brief test.
• Regression: to the extent the test is an unreliable measure of ability, we can expect their scores to regress to the mean at the 2nd testing.
• Can we conclude the enrichment program was effective?

Threats to Internal Validity, continued

• Subject attrition
  • When participants are lost from the study (attrition), the group equivalence formed at the start of the study may be destroyed.
  • Differences between treatment and control groups at the end of the study may be due to differences in those who remain in each group.

Subject Attrition, continued

[Figure: plot of weight at Time1 and Time2]

• Suppose that an exercise program is offered to employees who would like to lose weight.
• At Time 1, N = 50, M weight = 225 pounds
• At Time 2, N = 25 (25 drop out of study)
• Suppose the 25 who stayed in the program weighed, on average, 150 pounds at Time 1.
• Did the exercise program help people to lose weight?
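The arithmetic behind this example can be worked through directly. The sketch below uses the figures given above (N = 50, M = 225 at Time 1; 25 stayers who averaged 150 at Time 1). The example gives no Time 2 mean, so `assumed_time2_mean` is a hypothetical value, set here so that the stayers do not change at all. The helper name `dropout_mean` is also an illustrative assumption.

```python
def dropout_mean(n_start, mean_start, n_stayers, stayers_mean_start):
    """Time 1 mean of the participants who later dropped out."""
    total_start = n_start * mean_start
    total_stayers = n_stayers * stayers_mean_start
    return (total_start - total_stayers) / (n_start - n_stayers)


# Figures from the example above.
N_START, MEAN_START = 50, 225.0
N_STAYERS, STAYERS_MEAN_START = 25, 150.0

# HYPOTHETICAL: no Time 2 mean is given. Assume the stayers did not change,
# to show what attrition alone can produce.
assumed_time2_mean = 150.0

dropouts = dropout_mean(N_START, MEAN_START, N_STAYERS, STAYERS_MEAN_START)
# Naive comparison: everyone at Time 1 versus only the stayers at Time 2.
naive_change = assumed_time2_mean - MEAN_START
# Fair comparison: the same 25 people at both times.
fair_change = assumed_time2_mean - STAYERS_MEAN_START

print(f"Time 1 mean of dropouts: {dropouts:.0f} pounds")
print(f"Naive change: {naive_change:.0f} pounds")
print(f"Change among stayers only: {fair_change:.0f} pounds")
```

The dropouts must have averaged 300 pounds at Time 1. Under the no-change assumption, the naive comparison shows a 75-pound "loss" while the stayers themselves lost nothing, so the apparent effect is produced entirely by who left the study.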

Threats to Internal Validity, continued

• Selection
  • occurs when differences exist between individuals in treatment and control groups at the start of a study
  • these differences become alternative explanations for any differences observed at the end of the study
  • random assignment controls the selection threat
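Random assignment, mentioned above as the control for the selection threat, can be sketched in a few lines. This is an illustrative sketch only: the function name `randomly_assign`, the participant labels, and the simulated "interest" trait are all hypothetical and are not taken from the slides.

```python
import random
import statistics


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups.

    Returns (treatment, control). With an odd count, the control
    group receives the extra person.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    # HYPOTHETICAL pre-existing trait, e.g. interest in a program (0-10 scale).
    rng = random.Random(0)
    interest = {f"p{i}": rng.uniform(0, 10) for i in range(200)}

    treatment, control = randomly_assign(interest, seed=42)
    t_mean = statistics.mean(interest[p] for p in treatment)
    c_mean = statistics.mean(interest[p] for p in control)
    print(f"treatment mean interest: {t_mean:.2f}")
    print(f"control mean interest:   {c_mean:.2f}")
```

Because chance alone decides who lands in each group, pre-existing traits tend to balance out across groups on average, which is why self-selected groups do not offer the same protection.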

Selection, continued

[Figure: plot comparing program participants (Recyc.) and nonparticipants (Not)]

• Suppose that a community recycling program is tested. Individuals who are interested in recycling are encouraged to participate.
• Evaluation: Compare the weight of garbage from participants in the program with the weight of garbage from those not in the new program.
• Can we tell if the new recycling program is effective?

Threats to Internal Validity, continued

• Additive effects with selection
  • When one group of participants in an experiment
    • responds differently to an external event (history)
    • matures at a different rate (maturation)
    • is measured more sensitively by a test (instrumentation)
  • these threats (rather than treatment) may account for any group differences at the end of a study

Additive Effects with Selection, continued

[Figure: time-series plot across Weeks 1 to 8 for School A and School B, with the treatment (X) introduced after Week 4]

• Does an AIDS awareness campaign at School A affect condom sales compared to a control (no awareness campaign, School B)?
• History threat: Suppose a celebrity announces at week 4 that he is HIV+.
• Can you conclude the awareness campaign at School A is effective?
• Yes, both groups should have experienced the same history threat equally.

Additive Effects with Selection, continued

[Figure: time-series plot across Weeks 1 to 8 for School A and School B, with the treatment (X) introduced after Week 4]

• Does an AIDS awareness campaign at School A affect condom sales compared to a control (no awareness campaign, School B)?
• Additive effect of Selection and History: Suppose at week 4 (X), the student newspaper at School A reports about students who are HIV+ (not part of the awareness campaign).
• Can you conclude the awareness campaign was effective?

Additive Effects with Selection, continued

[Figure: time-series plot across Weeks 1 to 8 for School A and School B, with the treatment (X) introduced after Week 4]

• Does an AIDS awareness campaign at School A affect condom sales compared to a control (no awareness campaign, School B)?
• Additive effect of Selection and History: Suppose at week 4 (X), the student newspaper at School B reports about students who are HIV+.
• Can you conclude whether the awareness campaign at School A was effective?

Threats to Internal Validity, continued

• Important points to remember
  • When there is no comparison group in a study, the following threats to internal validity must be ruled out:
    • history, maturation, testing, instrumentation, regression, subject attrition, selection
  • When a comparison group is added, the following threats must be ruled out:
    • selection, additive effects with selection
  • Adding a comparison group helps researchers to rule out many threats to internal validity

Threats to Internal Validity, continued

• Threats that even true experiments may not eliminate
  • contamination
  • experimenter expectancy effects
  • novelty effects (including Hawthorne effect)
• Threats to external validity
  • occur when treatment effects may not be generalized beyond the particular people, setting, treatment, and outcome of an experiment
  • best way to assess external validity: replication

Threats to Internal Validity, continued

• Contamination
  • occurs when there is communication about the experiment between groups of participants
  • three possible outcomes
    • resentment
    • rivalry
    • diffusion of treatments

Threats to Internal Validity, continued

• Expectancy effects
  • occur when an experimenter unintentionally influences the results of an experiment
  • two types
    • expectations lead to systematic errors in interpretation of participants’ performance
    • expectations lead to errors in recording data

Threats to Internal Validity, continued

• Novelty effects
  • refer to changes in people’s behaviors simply because an innovation (e.g., a treatment) produces excitement, energy, enthusiasm
  • Hawthorne effect: a special case
    • performance changes when people know “significant others” (e.g., researchers, employers) are interested in them or care about their living or work conditions
• Because of contamination, expectancy, and novelty effects, researchers may have trouble concluding that a treatment was effective

Quasi-Experiments

• “Quasi-” (resembling) experiments
  • an important alternative when true experiments are not possible
  • lack the high degree of control found in true experiments
  • researchers must seek additional evidence to eliminate threats to internal validity

The One-Group Pretest-Posttest Design

• “bad experiment” or “preexperimental design”
  • an intact group is selected to receive a treatment
    • e.g., a classroom of children, a group of employees
  • pretest records participants’ performance before treatment
    • observation 1 (O1)
  • treatment is implemented (X)
  • posttest records performance following treatment (O2)

O1 X O2

One-Group Pretest-Posttest Design, cont.

O1 X O2

• None of the threats to internal validity are controlled.
• Any change between pretest (O1) and posttest (O2) may be due to treatment (X) or
  • history (some other event coincided with treatment)
  • maturation (natural changes in participants over time)
  • testing (effects of repeated testing)
  • or instrumentation, regression, subject attrition

Quasi-Experimental Designs

• Nonequivalent Control Group Design
  • a group similar to the treatment group serves as a comparison group
  • obtain pretest and posttest measures for individuals in both groups
  • random assignment to groups is not used
  • pretest scores are used to determine whether the groups are equivalent
    • equivalent only on this dimension

Nonequivalent Control Group Design, continued

                              pretest   treatment   posttest
treatment group                 O1          X          O2
-------------------------------------------------------------
nonequivalent control group     O1                     O2

Nonequivalent Control Group Design, continued

• Example: Does taking a research methods course improve reasoning ability?
• Compare students in research methods and developmental psychology courses
• DV: 7-item test of methodological and statistical reasoning ability
• Suppose group differences are observed at the posttest

Nonequivalent Control Group Design, continued

[Figure: plot of Pre and Post scores for the Methods and Develop (developmental psychology) groups]

• By adding a comparison group, rule out these threats to internal validity:
  • history
  • maturation
  • testing
  • instrumentation
  • regression
• Assume that these threats happen the same to both groups and, therefore, can’t be used to explain the posttest differences
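One simple way to use the pretest and posttest measures from both groups is to compare how much each group changed. The sketch below is an assumption added for illustration, not an analysis taken from the slides: the function names `gain` and `difference_in_gains` are hypothetical, and the scores are invented values on a 7-item test.

```python
import statistics


def gain(pre_scores, post_scores):
    """Mean posttest score minus mean pretest score for one group."""
    return statistics.mean(post_scores) - statistics.mean(pre_scores)


def difference_in_gains(treat_pre, treat_post, control_pre, control_post):
    """Treatment-group gain minus comparison-group gain.

    Changes shared by both groups (e.g., a common history or testing
    effect) cancel out of this difference.
    """
    return gain(treat_pre, treat_post) - gain(control_pre, control_post)


if __name__ == "__main__":
    # HYPOTHETICAL scores on a 7-item test; not data from any study.
    methods_pre, methods_post = [3, 4, 2, 3, 4], [5, 6, 4, 5, 5]
    develop_pre, develop_post = [3, 3, 4, 2, 4], [3, 4, 4, 3, 4]

    print(f"methods gain: {gain(methods_pre, methods_post):.2f}")
    print(f"develop gain: {gain(develop_pre, develop_post):.2f}")
    diff = difference_in_gains(methods_pre, methods_post,
                               develop_pre, develop_post)
    print(f"difference in gains: {diff:.2f}")
```

This comparison only removes influences that affect both groups equally. Because the groups were not formed by random assignment, a larger gain in one group can still reflect preexisting differences between the groups rather than the treatment.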

Nonequivalent Control Group Design, continued

• What threats are not ruled out?
  • Selection
    • Without random assignment to conditions, the two groups are probably not equivalent on many dimensions
    • These preexisting differences may account for group differences at the posttest

Nonequivalent Control Group Design, continued

• Additive effects with selection
  • The two groups
    • may have different experiences (selection × history)
    • may mature at different rates (selection × maturation)
    • may be measured more or less sensitively by the instrument (selection × instrumentation)
    • may drop out of the study (courses) at different rates (differential subject attrition)
    • may differ in terms of regression to the mean (differential regression)

Quasi-Experiments, continued

• Simple Interrupted Time-Series Design
  • Observe a DV for some time before and after a treatment is introduced.
  • Archival data are often used.
  • Look for clear discontinuity in the time-series data for evidence of treatment effectiveness.

O1 O2 O3 O4 X O5 O6 O7 O8

Simple Interrupted Time-Series Design, continued

• Example: Study habits
  • intervention: An instructional course to change students’ study habits
    • implemented during the summer following the sophomore year (after semester 4)
  • DV: semester GPA
  • Suppose that a discontinuity is observed when the treatment (X) is introduced

Simple Interrupted Time-Series Design, continued

[Figure: time-series plot (vertical axis 0 to 4) across semesters 1 to 8]

• What threats can be ruled out?
  • maturation: assume maturational changes are gradual, not abrupt
  • testing (GPA): if testing influences performance, these effects are likely to show up in initial observations (before X)
    • testing effects less likely with archival data
  • regression: if scores regress to the mean, they will do so in initial observations
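A rough numerical check for a discontinuity can be sketched as follows. This is an illustrative assumption rather than a method given in the slides: the function names `level_change` and `largest_pre_step` are hypothetical, and the GPA values are invented. The sketch compares the shift in average level at X against the ordinary observation-to-observation fluctuation seen before X.

```python
import statistics


def level_change(observations, n_before):
    """Mean of observations after the treatment minus the mean before it.

    `observations` is ordered in time; the treatment (X) falls after
    the first `n_before` observations.
    """
    before = observations[:n_before]
    after = observations[n_before:]
    return statistics.mean(after) - statistics.mean(before)


def largest_pre_step(observations, n_before):
    """Largest absolute change between adjacent pre-treatment observations."""
    before = observations[:n_before]
    return max(abs(b - a) for a, b in zip(before, before[1:]))


if __name__ == "__main__":
    # HYPOTHETICAL semester GPAs for semesters 1-8; X falls after semester 4.
    gpa = [2.4, 2.5, 2.4, 2.5, 3.1, 3.0, 3.1, 3.2]
    print(f"level change at X: {level_change(gpa, 4):.2f}")
    print(f"largest pre-treatment step: {largest_pre_step(gpa, 4):.2f}")
```

A level change that is large relative to the pre-treatment fluctuation is consistent with an abrupt discontinuity rather than gradual change. It does not by itself rule out a history threat, since another event could have coincided with X.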

Quasi-Experiments, continued

• Time-Series with Nonequivalent Control Group Design
  • Add a comparison group to the simple time-series design

O1 O2 O3 O4 X O5 O6 O7 O8
-------------------------
O1 O2 O3 O4   O5 O6 O7 O8

Time Series with Nonequivalent Control Group Design, continued

[Figure: time-series plot (vertical axis 0 to 4) across semesters 1 to 8 for the Study and Control groups]

• Example: Study habits
  • Suppose that a nonequivalent control group is added; these students don’t participate in the study habits course
  • Who could be in the comparison group?
  • What threats would you be able to rule out?

Program Evaluation

• Goal
  • provide feedback to administrators of human service organizations in order to help them decide
    • what services to provide
    • who to provide services to
    • how to provide services most effectively and efficiently
• Big growth area (especially health care)
• Program evaluators assess
  • needs, process, outcomes, efficiency of social services

Four Questions of Program Evaluation

• Needs
  • Is an agency or organization meeting the needs of the people it serves?
    • survey research designs
• Process
  • How is a program being implemented (is it going as planned)?
    • observational research designs

Four Questions of Program Evaluation, cont.

• Outcome
  • Has a program been effective in meeting its stated goals?
    • experimental, quasi-experimental research designs; archival data
• Efficiency
  • Is a program cost-efficient relative to alternative programs?
    • experimental, quasi-experimental research designs; archival data

Basic Research and Applied Research

• Program evaluation is the most extreme case of applied research
  • goal is practical, not theoretical
• Relationship between basic and applied research is reciprocal
  • basic research provides scientifically based principles about behavior and mental processes
  • these principles are applied in the complex, real world
  • new complexities are recognized and new hypotheses must be tested using basic research