Transcript Document

Week 2
Research Design
Human Factors
PSYC 2200
Michael J. Kalsher
Department of
Cognitive Science
PART 1
Scientific Theory
- Its Nature and Utility
- Its Elements: Concepts and Definitions
Naive Science and Theory
– People regularly observe events around them
and speculate about their causes.
– Personal observations frequently form the
basis of people’s explanations of these events.
– In these instances, people are behaving like
scientists—in part. They are trying to
understand and explain events and predict
outcomes.
– But …they are doing so without awareness of
the rules of science—hence the term “naïve
science”.
Naive Science and Theory
As naïve scientists, we try to understand some interesting
situation in a way that will predict or explain its operation.
–A Definition of Theory:
A set of interrelated constructs (concepts),
definitions, and propositions that present a systematic view
of phenomena by specifying relations among variables,
with the purpose of predicting and explaining the
phenomena. (Kerlinger, Foundations of Behavioral Research,
1986).
Naïve science/understanding is a “kind” of theory, but
it could be considered “mere speculation.” We’ll use the
term theory to mean a simplified explanation of reality.
Theory: Its Purpose & Components
The goal: to predict and explain events.
– Important practical ramifications.
– A theory achieves prediction and explanation by stating
relationships between concepts, when they are
operationalized as variables.
– Variables = things that vary (take on different
intensities, values, or states).
– Concepts (or constructs) = the mental image of the
thing which varies.
• Example: “Fire” is the concept; size, heat or other details about
the fire are the variables based on the concept.
Naive Theory Building:
An Example
Jill decides to vacation at an ocean resort. The first day
at the beach, the water is warm and great for
swimming. The second day, it is very cold. The next
day, the water is again very warm. This phenomenon
(variability of the water temperature) interests her,
because she likes to swim, but doesn’t like cold water.
What is the cause of this
day-to-day variation?
Potential Contributing Factors
The sun has been out each day. Jill reasons that the
sun can’t be the cause of differing water temperatures.
She therefore doesn’t include the sun in her naïve
theory.
She observes the water very carefully each day and
notices the water is clearer on the days it is cold, and
murkier on the days that are better for swimming.
Jill can now predict whether swimming will be good by
observing the clarity of the water. We can stop now if
the goal is merely to pick the best days to swim!
But … the identification of the pattern doesn’t explain
why the water temperature should shift.
Additional Factors to Consider
Jill next notices a relationship between variations in the
prevailing wind direction on the previous day and the water
temperature the next day. Days with winds out of the
Northeast are followed by days with cold water. Days with
winds from another direction are followed by warm water.
Why should wind direction affect water temperature? She
consults a map and improves her naïve theory by adding
some process or mechanism to explain these events.
Open ocean lies to the Northeast, while the bay she swims in
is protected on all other sides by land. Thus, one possibility is
that the Northeast winds may blow colder deep ocean waters
(which are clearer, as less algae grow in cold temperatures)
into the bay.
The Beginnings of Theory Development
Jill has identified variables (bay water temperature, bay water
clarity, and wind direction) and specified relationships
among them.
She is likely to call one of these variables the cause,
and the other two variables the effects.
Jill now has an intuitive idea of what constitutes a
causal relationship. It is:
A specific condition of a variable (Northeast wind) which
occurs earlier in time than a corresponding condition of
another variable (cold water), combined with some
reasonable explanation for the relationship between
these two variables (the nature of the geography of the region).
Is Jill finished?
Given the data thus far, it is too early to accept the
proposed causal explanation. So what’s next?
1.Jill should collect more data, so she extends her vacation for a
month and continues her observations.
2.If the pattern continues, we can be increasingly certain that the
relationship accurately reflects reality. More evidence can improve
the probability that Jill’s theory is true.
3.Naïve scientists will consider their personal observations to be
sufficient to construct a completed theory. For the true scientist,
personal observations are only the beginning.
4.The scientific method: a highly formalized, systematic and
controlled approach to theory development and testing.
Testing Theories: Naïve Science vs. Science.
1. Naive scientists are likely to be satisfied with Jill’s
evidence because it is “self-evident”, “common sense”, “is
what any reasonable person would conclude.”
2. It is important to rule out alternative explanations
(competing causes of the phenomena) by building
controls into the experimental design.
3. There are many procedures to guard against biased
testing of theories:
1. Randomization (random selection; random assignment).
2. Appropriate research design and methodology.
3. Valid and reliable instrumentation.
4. Statistical procedures.
Methods of Knowing (fixing belief)
1. Method of Tenacity:
Least sophisticated, but commonly used.
Establishes explanations by asserting that something is true because it is
commonly known to be true. Occurs entirely within a given individual and is
therefore subject to their beliefs, values and idiosyncrasies. Surprisingly
resistant to contrary evidence.
2. Method of Authority: Truth established when something or someone
held in high regard states “the truth.” Relies on the actual “truth” of the
expert or source. Widespread in marketing. Potentially dangerous.
3. Method of Reasonable Men (a priori method): Relies on the
idea that the propositions are self-evident or reasonable. Criterion for “fixing
belief” lies in the reasonableness of the argument and how reasonable is
defined. May agree with “reason” but not the observable facts.
4. Scientific Method: Critical shift; all three previous methods are
focused inward. Science shifts the locus of truth from single individuals to
groups, by establishing mutually agreed-upon rules for establishing truth.
Basic Requirements of the Scientific Method
1. The Use and Selection of Concepts
2. Linking Concepts by Propositions
3. Testing Theories with Observable Evidence
4. Defining Concepts
5. Publication of Definitions and Procedures
6. Control of Alternative Explanations
7. Unbiased Selection of Evidence
8. Reconciliation of Theory and Observation
9. Limitations of the Scientific Method
1. The Use and Selection of Concepts: First, develop a
verbal (conceptual) description or name for the events. Here
we seek to explain events by linking two concepts: a “cause” to
an “effect.” Scientists arrive at causally related concepts
through a thorough review of previous research, by using
logical deduction, and by insight and personal observations.
2. Linking Concepts by Propositions: To explain a
phenomenon, we must specify the functional mechanism
whereby changes in variable “A” (a cause) should lead to
changes in some variable “B” (an effect). Such a functional
statement distinguishes between causal relationships (that
have such an explanation) and covariance relationships (that
do not).
3. Testing Theories with Observable Evidence: No theory is
regarded as “probable” truth until it has been empirically
tested against some observable reality.
4. Defining Concepts: Testing theory against observable
evidence generates this requirement: we must bridge the gap
between theory (stated at a high level of abstraction) and
observation (which occurs at a very concrete level).
The gap is bridged by defining both the meanings of concepts and
the indicators or measures used to capture those meanings, a
process that produces an operational definition.
An operational definition adds three things to the theoretical
definition:
- Describes the unit of measurement
- Specifies the level of measurement
- Provides a mathematical or logical statement that clearly states
how measurements are to be made and combined to create a
single value for the abstract concept.
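As a concrete illustration, here is a minimal Python sketch of an operational definition for a hypothetical concept, "perceived water comfort." The concept, items, and scoring rule are invented for illustration; the point is only that the definition states a unit of measurement, a level of measurement, and a rule for combining indicators into one value.

```python
# Hypothetical operational definition of the abstract concept "perceived water comfort":
#   unit of measurement  = a 1-5 rating on each questionnaire item
#   level of measurement = ordinal ratings treated as interval for averaging
#   combination rule     = the mean of the three item ratings yields one score

def water_comfort_score(item_ratings):
    """Combine three 1-5 item ratings into a single comfort score."""
    if len(item_ratings) != 3 or not all(1 <= r <= 5 for r in item_ratings):
        raise ValueError("Expected three ratings between 1 and 5.")
    return sum(item_ratings) / len(item_ratings)

print(water_comfort_score([4, 5, 3]))  # -> 4.0
```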
5. Publication of Definitions/Procedures: The scientific
method is public. All other researchers need to have the ability
to carry out the same procedures to arrive at the same
conclusions. Requires that we be as explicit and objective as
possible in stating and publicizing definitions/procedures.
6. Control of Alternative Explanations: Scientific studies
must be designed to rule out alternative causes. Isolating a
true causal variable means that these other confounding
variables have to be identified and their effects eliminated or
controlled.
7. Unbiased Selection of Evidence: Decision to accept a
theory as probably true or probably false will be based on
observations of limited evidence (e.g., a few hundred college
students). Generalizing results beyond the (limited) study
sample requires the evidence to be selected so as to eliminate
biases and be representative of some broader population.
8. Reconciliation of Theory and Observation: The degree of
agreement between what theory predicts we should observe
and what we actually do observe is the basis of the self-correcting
nature of this iterative approach.
9. Limitations of the Scientific Method: Scientific method
cannot be used when objective observation is not possible
(e.g., determining whether a social policy is good or bad, if
objective measurement of “good” and “bad” is not possible).
Basic beliefs or assumptions are not testable propositions, as
they can never be disproved, and thus cannot be investigated
scientifically.
PART 2
• Types of Relationships
• Testing Hypotheses: Confounds & Controls
Types of Relationships: Null, Covariance,
Causal
1. Null relationship: No relationship at all. Concepts
operate independently of each other.
2. Covariance relationship: Concepts vary together
(directly or inversely).
3. Causal relationship: Concepts covary (are related),
changes in one concept precede changes in the
other concept, and a causal relationship between the
two (a cause and an effect) can be justified logically.
• Covariance relationships can provide prediction, but not
a (necessarily) valid explanation of the relationship.
• Accepting covariance relationships as “true” without
empirical testing fails to identify spurious relationships.
Two variables may covary because they are both the
effects of a common cause.
The unobserved, but real, causal variable (e.g., Amount of Education) is termed a
confounding variable, since it may mislead us by creating the appearance of
a relationship between the observed variables.
Covariation vs. Causality:
Key Differences
• Covariance alone does not imply causality.
- Covariance merely means that a change in one variable
is associated with a change in the other variable.
- Causality requires that a change in one variable (IV)
creates the change in the other (DV).
• Covariance is 1 of 4 conditions that must be met:
- Spatial Contiguity (connected in the same time and space).
- Temporal Ordering (change in the IV occurs before the
change in the DV).
- Necessary Connection (statement specifying why the
cause can bring about a change in the effect).
Covariation vs. Causality:
An Example
• Consider the example of an observed relationship
between first letter of a person’s last name and the
person’s exam grade.
- Spatial Contiguity requirement? Yes
The name and the exam score both exist within the same person.
- Covariance requirement? Yes
Last names A through M scored lower than the others.
- Temporal Ordering requirement? Yes
A person’s last name was established before exams were taken.
- Necessary Connection requirement? Not so fast
Is there a sensible reason why a person’s last name should create
different levels of performance on an exam?
Covariation vs. Causality:
An Example
We expect persons with higher incomes to read more
newspapers (covariation) because their income provides the
purchasing power and leisure time for such readership
(necessary connection).
We expect that older persons will read more newspapers
(covariation) for two reasons: they have fewer children at
home and thus more leisure time, and they developed the
habit of reading before the dominance of TV and the Internet
(necessary connection).
Spurious Relationships
[Diagram: a Heat Wave as the common cause of both Ice Cream Sales and Swimming Pool Drownings]
A city's ice cream sales are found to be highest when the rate of drownings in the city’s swimming
pools is highest. To allege that ice cream sales cause drowning, or vice-versa, would be to imply a
spurious relationship between the two. In reality, a third variable, in this instance a heat wave, more
likely caused both.
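A small simulation can make the point with made-up numbers: when a third variable (daily heat) drives two other variables, those two will covary even though neither causes the other.

```python
# Minimal sketch of a spurious relationship: a confound (daily heat) drives both
# ice cream sales and drowning counts, so the two covary with no causal link.
# All numbers are illustrative only.
import random

random.seed(1)
heat = [random.uniform(15, 40) for _ in range(365)]          # daily temperature (C)
ice_cream = [2.0 * h + random.gauss(0, 5) for h in heat]      # sales rise with heat
drownings = [0.1 * h + random.gauss(0, 1) for h in heat]      # drownings rise with heat

def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

print(round(corr(ice_cream, drownings), 2))  # strong positive r, despite no causal link
```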
Testing Hypotheses: Confounds and Controls
Life would be simpler if every effect variable (DV)
had only one cause!
– Hardly ever the case; Becomes difficult to sort out how
variables affect each other.
– An observed covariance relationship between two
variables could occur because of some real relationship
or due to the spurious effect of a third confounding
variable.
– Suppose we are interested in determining whether
there is a real relationship between Exposure to Movie
Violence and the Number of Violent Acts committed
by adolescents.
– If we ignore, or are unaware of, the confounding variable
(Predisposition to Violence) we may erroneously conclude
that all change in the number of Acts of Violence is due to
the direct action of level of Exposure to Movie Violence.
Controlling for Confounding Variables
• Identifying Control Variables
• Internal Validity, External Validity, and
Information
• Methods for Controlling Confounding
Variables
Identifying Control Variables
Internal Validity, External Validity,
and Information
• Internal Validity: the extent to which we can be sure
that no confounding variables have obscured the true
relationship between the variables in the hypothesis test.
That a change in the IV causes a change in the DV.
• External Validity: the ability to generalize from
results of a study to the real world.
• Information: pertains to the amount of information we
can obtain about any confounding variable and its
relationship with the relevant variables.
Methods of Controlling Confounding
Variables
• Manipulated Control: we eliminate the effect of a
confounding variable by not allowing it to vary (e.g.,
selecting and/or matching subjects on potentially
important confounding variables).
• Statistical Control: we build the confounding
variable(s) into the research design as additional
measured variables.
• Randomization: randomly assign study participants
to the experimental groups or conditions so that the
potential effects of confounding variables are distributed
equally among the groups.
Manipulated Control:
Eliminating effects of confounding variables
through research design and sampling decisions
Example:
A researcher investigating the effects of seeing
justified violence in video games on children
knows that young children cannot interpret the
motives of characters accurately. She decides to
limit her study to older children only, to eliminate
random responses or unresponsiveness of
younger children.
Statistical Control:
Confounding variables measured; mathematical
procedures used to remove their effects
Example:
A political communication researcher interested in
studying emotional appeals versus rational
appeals in political commercials suspects that the
effects vary with the age of the viewer. She
measures age, and uses it as an independent
predictor to isolate, describe, and remove its
effect.
Randomization:
Unknown sources of error are equalized by
randomly assigning subjects to research
conditions
Example:
Many different factors are known to affect the
amount of use of Internet social networking sites.
A researcher wants to test two different site
designs. He randomly assigns subjects to work
with each of the two designs. This approach aims
to distribute the amount of confounding error from
unknown factors equally across groups.
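A minimal sketch of random assignment for the site-design example above, using 20 hypothetical participant IDs; shuffling and then dealing participants into the two conditions spreads unknown confounds evenly across groups on average.

```python
# Randomly assign hypothetical participants to the two site-design conditions.
import random

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 invented subject IDs
random.shuffle(participants)                         # scramble the order

groups = {"design_A": participants[::2],             # every other participant
          "design_B": participants[1::2]}
for condition, members in groups.items():
    print(condition, members)
```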
Methods of Controlling Confounding
Variables: A summary
• Manipulated and statistical control give high internal validity,
while randomization is a bit weaker.
• Statistical control and randomization give high external
validity, while manipulated control is weaker.
• Key difference between randomization and the other
techniques is that randomization doesn’t involve
identifying/measuring the confounding variables.
• A major advantage of randomization is that we can assume
that all confounding variables have been controlled to a certain
extent—but any random process will result in disproportionate
outcomes occasionally. Randomization also provides little
information about the action of any confounding variables.
PART 3
• Classes of Research Variables
• Measurement: The Foundation of Scientific
Inquiry
• Essential Elements of Research: Reliability,
Validity, Control and Importance
Classes of Research Variables:
Variables defined by their use in research
Independent variable: A variable that is actively manipulated by the
researcher to see what its impact will be on other variables.
Dependent variable: A variable that is hypothesized to be affected by the
independent-variable manipulation.
Extraneous variable: Any variable (usually unplanned or uncontrolled
factors), other than the independent variable, that might affect the
dependent measure in a study.
A constant: Any variable prevented from varying (by holding variables
constant, they do not affect the outcome of the research).
Classes of Research Variables:
Levels of Measurement
Depending on our operational definition, a measurement can
give us differing kinds of information about a theoretical
concept.
1. Nominal. A variable made up of discrete, unordered categories. Each
category is either present or absent, and categories are mutually exclusive
and exhaustive (e.g., gender).
2. Ordinal. A variable for which different values indicate a difference in the
relative amount of the characteristic being measured. Not always possible
to determine the absolute distance between adjacent categories.
3. Interval. A variable for which equal intervals between variable values
indicate equal differences in amount of the characteristic being measured.
4. Ratio. Ratios between measurements as well as intervals are meaningful
because there is a starting point (zero).
Nominal Measurement: An Example
A nominal measurement makes a simple distinction between the
presence or absence of the theoretical concept within the unit of
analysis. Theoretical concepts can have more than two nominal
response categories (nominal factors), e.g., political or religious affiliation.
Ordinal Measurement: An Example
Categories of a nominal level variable cannot be arranged in any
order of magnitude. By adding ordering by quantity to the
definition of the categories, the sensitivity of our observations is
improved.
Example: Subjects in a study are asked to sort a stack of photographs
according to their physical attractiveness so that the most attractive
photo is on top and the least attractive photo is on the bottom.
This introduces the general idea of comparative similarity in
observations. We can now say that the 2nd photo in the stack is more
attractive to the subject than all the photos below it, but less attractive
than the photo on top of the pile.
We can assign an “attractiveness” score to each photo by numbering,
starting at the top of the pile (1=most attractive; 2=second most
attractive, etc.). This is called a rank order measurement.
With ordinal measurement, we cannot determine the absolute distance
between adjacent categories. Suppose we knew the “real” attractiveness
scores of the photos for two subjects. Although their “real” evaluations of the
photos are quite different, they rank the comparative attractiveness identically.
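The following sketch uses invented "real" attractiveness scores for two hypothetical subjects to show how rank ordering discards the distance information between observations.

```python
# Two hypothetical subjects' underlying scores for five photos differ considerably,
# yet their rank orders (1 = most attractive) are identical: ordinal measurement
# keeps the ordering but loses the size of the differences.
def ranks(scores):
    """Return the rank of each score (1 = highest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

subject_1 = [9.5, 9.4, 5.0, 4.9, 1.0]   # large gaps between some photos
subject_2 = [6.0, 5.0, 4.0, 3.0, 2.0]   # evenly spaced scores
print(ranks(subject_1))  # [1, 2, 3, 4, 5]
print(ranks(subject_2))  # [1, 2, 3, 4, 5]
```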
Interval Measurement: An Example
If we can rank order observations and assign them numerical
scores that register the degree of distance between observations
or points on the measurement scale, we have improved the level
of measurement to interval-level.
Interval scales are numerical scales in which intervals have the
same interpretation throughout. As an example, consider the
Fahrenheit scale of temperature. The difference between 30
degrees and 40 degrees represents the same temperature
difference as the difference between 80 degrees and 90
degrees. This is because each 10-degree interval has the same
physical meaning (in terms of the kinetic energy of molecules).
Interval scales are not perfect, however. In particular, they do not
have a true zero point.
Scales of Measurement

Nominal. Examples: diagnostic categories; brand names; political or religious
affiliation. Properties: identity. Mathematical operations: none. Type of data:
nominal. Typical statistics: chi-square.

Ordinal. Examples: socioeconomic class; ranks. Properties: identity; magnitude.
Mathematical operations: rank order. Type of data: ordered. Typical statistics:
Mann-Whitney U-test.

Interval. Examples: test scores; personality and attitude scales. Properties:
identity; magnitude; equal intervals. Mathematical operations: add; subtract.
Type of data: score. Typical statistics: t-test; ANOVA.

Ratio. Examples: weight; length; reaction time; # of responses. Properties:
identity; magnitude; equal intervals; true zero point. Mathematical operations:
add; subtract; multiply; divide. Type of data: score. Typical statistics: t-test;
ANOVA.
Evaluating Measures:
Effective Range
Effective Range:
Scales sensitive enough to detect differences among one
group of subjects may be insensitive to detect differences
among another.
Scale Attenuation (or range restriction).
A problem associated with scales not ranging high enough,
low enough, or both.
Leads to “ceiling” effects and “floor” effects that distort data
by not measuring the full range of a variable.
Essential Elements of Measurement:
Reliability, Validity, Control and Importance
Reliability
Getting the same result when a measurement device is applied to the
same quantity repeatedly.
Validity
The extent to which a measurement tool (test,
device) measures what it purports to measure.
Control
Behavior can be influenced by many factors, some known and others
unknown to the researcher. Control refers to the systematic methods
employed by a researcher to reduce threats to the validity of the study
posed by extraneous influences on the behavior of both the participants
and the observer.
Importance
Does the research question we are trying to answer warrant the
expenditure of resources (i.e., time, money, effort) that will be
required to complete the study?
Types of Reliability
Test-retest Reliability
Consistency of measurement over time
Internal Consistency
Inter-item correlation
Interrater Reliability
Level of agreement between independent observers of behavior(s).
Assessed via correlation or by computing percent agreement:
Percent agreement = Agreement / (Agreement + Disagreement) x 100
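A quick sketch of the percent-agreement calculation, using invented behavior codes from two hypothetical observers.

```python
# Percent agreement between two hypothetical observers coding the same
# 10 behaviors: Agreement / (Agreement + Disagreement) x 100.
observer_1 = ["hit", "push", "none", "hit", "none", "push", "hit", "none", "none", "hit"]
observer_2 = ["hit", "push", "none", "push", "none", "push", "hit", "none", "hit", "hit"]

agreements = sum(a == b for a, b in zip(observer_1, observer_2))
disagreements = len(observer_1) - agreements
percent_agreement = agreements / (agreements + disagreements) * 100
print(percent_agreement)  # -> 80.0
```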
Types of Validity
Face validity. The (non-empirical) degree to which a test appears to be a
sensible measure.
Content validity. The extent to which a test adequately samples the domain of
information, knowledge, or skill that it purports to measure.
Criterion validity.
Now (concurrent) and Later (predictive). Involves
determining the relationship (correlation) between the predictor (IV) and the criterion
(DV).
Construct validity. The degree to which the theory or
theories behind the research study provide(s) the best
explanation for the results observed.
Internal vs. External Validity
Internal Validity
Extent to which causal/independent variable(s) and no
other extraneous factors caused the change being
measured.
External Validity (generalizability)
Degree to which the results and conclusions of your study
would hold for other persons, in other places, and at other
times.
Threats to Internal Validity:
Factors that reduce our ability to draw valid conclusions
Selection
History
Maturation
Repeated Testing
Instrumentation
Regression to the mean
Subject mortality
Selection-interactions
Experimenter bias
Reducing Threats to Internal Validity
The role of Control
Behavior is influenced by many factors, termed confounding
variables, that tend to distort the results of a study, thereby
making it impossible for the researcher to draw meaningful
conclusions. Some of these factors may be unknown to the researcher.
Control refers to the systematic methods (e.g., research
designs) employed to reduce threats to the validity of the study
posed by extraneous influences on both the participants and the
observer (researcher).
Group/Selection threat
Occurs when nonrandom procedures are used to assign
subjects to conditions or when random assignment fails
to balance out differences among subjects across the
different conditions of the experiment.
Example:
A researcher is interested in determining the factors most likely to
elicit aggressive behavior in male college students. He exposes
subjects in the experimental group to stimuli thought to provoke
aggression and subjects in the control group to stimuli thought to
reduce aggression and then measures aggressive behaviors of the
students. How would the selection threat operate in this instance?
History threat
Events that happen to participants during the
research which affect results but are not linked to
the independent variable.
Example:
The reported effects of a program designed to improve
medical residents’ prescription writing practices by the
medical school may have been confounded by a self-directed
continuing education series on medication errors provided to
the residents by a pharmaceutical firm's medical education
liaison.
Maturation threat
Can operate when naturally occurring biological or
psychological changes occur within subjects and
these changes may account in part or in total for
effects discerned in the study.
Example:
A reported decrease in emergency room visits in a long-term
study of pediatric patients with asthma may be due to subjects
outgrowing childhood asthma rather than to any treatment
regimen introduced to treat the asthma.
Repeated testing threat
May occur when changes in test scores occur not
because of the intervention but rather because of
repeated testing. This is of particular concern when
researchers administer identical pretests and
posttests.
Example:
A reported improvement in medical resident prescribing
behaviors and order-writing practices in the study previously
described may have been due to repeated administration of the
same short quiz. That is, the residents simply learned to provide
the right answers rather than truly achieving improved
prescribing habits.
Instrumentation threat
When study results are due to changes in instrument
calibration or observer changes rather than to a true
treatment effect, the instrumentation threat is in
operation.
Example:
In Kalsher’s Experimental Methods and Statistics course, he
evaluates students’ progress in understanding principles of research
design at week 3 of the semester. A graduate T.A. evaluates the
students at the conclusion of the course. If the evaluators are
dissimilar enough in their approach, perhaps because of lack of
training, this difference may contribute to measurement error in
trying to determine how much learning occurred over the semester.
Statistical Regression threat
The regression threat can occur when subjects
have been selected on the basis of extreme
scores, because extreme (low and high) scores in
a distribution tend to move closer to the mean (i.e.,
regress) in repeated testing.
Example:
If a group of subjects is recruited on the basis of extremely high
stress scores and an educational intervention is then implemented,
any improvement seen could be due partly, if not entirely, to
regression to the mean rather than to the coping techniques
presented in the educational program.
Experimental Mortality threat
Experimental mortality—also known as attrition,
withdrawals, or dropouts—is problematic when there
is a differential loss of subjects from comparison
groups subsequent to randomization, resulting in
unequal groups at the end of a study.
Example:
Suppose a researcher conducts a study to compare the effects of a
corticosteroid nasal spray with a saline nasal spray in alleviating
symptoms of allergic rhinitis (irritation and inflammation of the nasal
passages). If subjects with the most severe symptoms preferentially
drop out of the active treatment group, the treatment may appear
more effective than it really is.
Selection Interaction threats
A family of threats to internal validity produced
when a selection threat combines with one or
more of the other threats to internal validity.
When a selection threat is already present, other
threats can affect some experimental groups,
but not others.
Example:
If one group is dominated by members of one fraternity
(selection threat), and that fraternity has a party the night
before the experiment (history threat), the results may be
altered for that group.
Threats to External Validity:
Ways you might be wrong in making generalizations
People, Places, and Times
Demand Characteristics
Hawthorne Effects
Order Effects (or carryover effects)
People threat:
Are the results due to the unusual
type of people in the study?
Example:
You learn that the grant you submitted to assess average
drinking rates among college students in the U.S. has been
funded. In late November, you post an announcement
about the study on campus to get subjects for the study.
100 students sign up for the study. Of these, 78 are
members of campus fraternities; the other 22 are members
of the school’s football team.
Places threat:
Did the study work because of the
unusual place you did the study in?
Example:
Suppose that you conduct an “educational” study in a college town
with lots of high-achieving, educationally oriented kids.
Time threat:
Was the study conducted at a peculiar time?
Example:
Suppose that you conducted a smoking cessation study
the week after the U.S. Surgeon General issued the well-publicized
results of the latest smoking and cancer studies.
In this instance, you might get different results than if you
had conducted the study the week before.
Demand Characteristics
Participants are often provided with cues to the
anticipated results of a study.
Example:
When asked a series of questions about depression, participants
may become wise to the hypothesis that certain treatments may
work better in treating mental illness than others. When participants
become wise to anticipated results (termed a placebo effect), they
may begin to exhibit performance that they believe is expected of
them.
Making sure that subjects are not aware of anticipated outcomes
(termed a blind study) reduces the possibility of this threat.
Hawthorne Effects
Similar to the placebo effect, research has found that the mere
presence of others watching a person’s performance causes a
change in that performance. If this change is significant, can we be
reasonably sure that it will also occur when no one is watching?
Addressing this issue can be tricky, but employing a control group
that receives no treatment to gauge the Hawthorne effect can be
very helpful. In this sense, the control group is also being observed
and will exhibit changes in behavior similar to those of the
experimental group, thereby negating the Hawthorne effect.
Order Effects (carryover effects)
Order effects refer to the order in which treatment
is administered and can be a major threat to
external validity if multiple treatments are used.
Example:
If subjects are given medication for two months, therapy for another
two months, and no treatment for another two months, it would be
possible, and even likely, that the level of depression would be least
after the final no treatment phase. Does this mean that no treatment
is better than the other two treatments? It likely means that the
benefits of the first two treatments have carried over to the last phase,
artificially elevating the no treatment success rates.
PART 4
• Describing data: Measures of Central
Tendency and Dispersion
• The Role of Variance
Describing Data
Measures of Central Tendency
- Mean (the average)
- Median (the middle number)
- Mode (the most frequently occurring number)
Measures of Dispersion
- Range
- Standard Deviation (square root of the variance)
- Variance (the average squared deviation from the mean)
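A short sketch computing these descriptive statistics for a made-up set of scores, using Python's standard statistics module.

```python
# Descriptive statistics for a small invented sample of scores.
# Note: pvariance/pstdev treat the data as a full population; use
# statistics.variance/stdev for the sample (n - 1) versions instead.
import statistics

scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

print("mean   ", statistics.mean(scores))        # the average
print("median ", statistics.median(scores))      # the middle number
print("mode   ", statistics.mode(scores))        # most frequent value -> 8
print("range  ", max(scores) - min(scores))      # 9 - 3 = 6
print("var    ", statistics.pvariance(scores))   # average squared deviation from the mean
print("sd     ", statistics.pstdev(scores))      # square root of the variance
```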
The Role of Variance
- In an experiment, IV(s) are manipulated to cause variation between
experimental and control conditions.
- Experimental design helps control extraneous variation--the variance
due to factors other than the manipulated variable(s).
Sources of Variance
- Systematic between-subjects variance
Experimental variance due to manipulation of the IV(s) [The Good Stuff]
Extraneous variance due to confounding variables.
[The Not-So-Good Stuff]
Natural variability due to sampling error
- Non-systematic within-groups variance
Error variance due to chance factors (individual differences) that affect some
participants more than others within a group
Separating Out The Variance
SST = Sums of Squares Total
SSM = Sums of Squares Model
SSR = Sums of Squares Error (residual)
The total variability partitions into the part the model explains and the
part left over as error: SST = SSM + SSR.
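A sketch of this partition for two small made-up groups; the last line also previews the effect-size measure r² = SSM/SST discussed later in this section.

```python
# Partition the variability of two invented groups (e.g., experimental vs. control)
# and check that SST = SSM + SSR.
group_a = [6.0, 7.0, 8.0, 9.0]   # hypothetical experimental-group scores
group_b = [3.0, 4.0, 5.0, 6.0]   # hypothetical control-group scores
all_scores = group_a + group_b

grand_mean = sum(all_scores) / len(all_scores)
sst = sum((x - grand_mean) ** 2 for x in all_scores)                 # total
ssm = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2             # model (between groups)
          for g in (group_a, group_b))
ssr = sum((x - sum(g) / len(g)) ** 2                                 # error (within groups)
          for g in (group_a, group_b) for x in g)

print(sst, ssm, ssr)   # 28.0 18.0 10.0 -> SST equals SSM + SSR
print(ssm / sst)       # r-squared: proportion of variance explained by the model
```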
Controlling Variance in Experiments
In experimentation, each study is designed to:
1. Maximize experimental variance.
2. Control extraneous variance.
3. Minimize error variance.
• Good measurement
• Manipulated and Statistical control
Test Statistics
Essentially, most test statistics are of the following
form:
Test statistic = Systematic variance / Unsystematic variance
Test statistics are used to estimate the likelihood that an observed difference is real (not
due to chance) and are usually accompanied by a “p” value (e.g., p < .05, p < .01, etc.).
A Very Simple Statistical Model
outcome_i = (model) + error_i
• model – an equation made up of variables and parameters
• variables – measurements from our research (X)
• parameters – estimates based on our data (b)
outcome_i = (b * X_i) + error_i
outcome_i = (b1 * X1_i + b2 * X2_i + b3 * X3_i) + error_i
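A minimal sketch of fitting the one-predictor version of this model to invented data, assuming NumPy is available; the parameter estimates and residuals correspond to the b and error terms above.

```python
# Fit outcome_i = b0 + b1*X_i + error_i to made-up data by ordinary least squares.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])           # variables: measurements from our research
outcome = np.array([2.1, 3.9, 6.2, 8.1, 9.8])     # observed outcome values

b1, b0 = np.polyfit(X, outcome, deg=1)             # parameters: estimates based on our data
predicted = b0 + b1 * X
error = outcome - predicted                        # what the model leaves unexplained

print(round(b0, 2), round(b1, 2))                  # intercept and slope estimates
print(np.round(error, 2))                          # residuals (the error_i terms)
```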
Types of Mistakes

                          True state of null hypothesis
Statistical decision      Ho true          Ho false
Reject Ho                 Type I error     Correct
Don’t reject Ho           Correct          Type II error
Statistical Power
• A measure of how well Type II errors
have been avoided (i.e. how well a test
is able to find an effect)
• = 1 – type II error rate
• Power should be 0.8 or higher, so Type
II error rate should not exceed .20.
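Power can also be estimated by simulation. Below is a rough sketch assuming NumPy and SciPy are available; the effect size, group size, and number of simulated experiments are arbitrary illustrative choices.

```python
# Estimate power by simulation: repeatedly draw two samples that really do differ,
# run a t-test each time, and count how often p < .05. The proportion of
# rejections estimates power (1 - Type II error rate).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_per_group, n_sims = 0.05, 30, 2000
rejections = 0
for _ in range(n_sims):
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    treated = rng.normal(loc=0.7, scale=1.0, size=n_per_group)   # true effect of 0.7 SD
    _, p = stats.ttest_ind(treated, control)
    rejections += p < alpha

print(rejections / n_sims)   # estimated power; aim for 0.8 or higher
```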
Effect Sizes:
The Correlation coefficient
The statistical test only tells us whether it is safe to
conclude that the means come from different populations.
It doesn’t tell us anything about how strong these
differences are. So, we need a standard metric to gauge
the strength of the effects.
The correlation coefficient (r) is one metric for gauging
effect size.
• Ranges from 0 – 1 (no effect to perfect effect)
• Rough cutoffs (nonlinear; that is, twice the r value
doesn’t necessarily mean twice the effect):
– 0.10 – small effect (explains 1% of the variance)
– 0.30 – medium effect (explains 9% of the variance)
– 0.50 – large effect (explains 25% of the variance)
Effect Sizes:
The coefficient of determination
The statistical test only tells us whether it is safe to conclude that
the means come from different populations. It doesn’t tell us
anything about how strong these differences are. So, we need a
standard metric to gauge the strength of the effects.
r2 (r-Square), or the “Coefficient of Determination”, is one metric
for gauging effect size.
Rules of Thumb regarding effect sizes:
Small effect: 1-3% of the total variance
Medium effect: 10% of the total variance
Large effect: 25% of the variance
r² = SSM / SST
Reporting Statistical Models
• APA recommends exact p-values for all reported
results; best to include an effect size, too
– Effect “x” was not statistically significant in condition y, p =
.24, d = .21
• Report a mean and the upper and lower boundaries
of the confidence interval as M = 30, 95% CI [20,40]
– If all confidence intervals you are reporting are 95%, it’s
acceptable to say so and then later say something like:
In this condition, effect x increased, M = 30 [20,40].
A Model of the Research Process:
Levels of Constraint
(Model used to illustrate the continuum of demands placed on the adequacy of
the information used in research and on the nature of the processing of that
information.)
High constraint → Low constraint:
Experimental Research
Differential Research
Correlational Research
Case-study Research
Naturalistic Observation
Exploratory Research
At the high-constraint end, the research plan becomes increasingly
detailed (e.g., precise hypotheses and analyses) but less flexible. At the
low-constraint end, the research plan may be general, with ideas,
questions, and procedures relatively unrefined.
Observational Methods
No direct manipulation of variables by the
researcher. Behavior is merely recorded--but
systematically and objectively so that the
observations are potentially replicable.
Advantages
• Reveals how people normally behave.
• Experimentation without prior careful observation can lead to a
distorted or incomplete picture.
Disadvantages
• Generally more time-consuming.
• Doesn’t allow identification of cause and effect.
Quasi-Experimental Design
In a quasi-experimental study, the experimenter
does not have complete control over manipulation
of the independent variable or how participants
are assigned to the different conditions of the
study.
Advantages
• Natural setting
• Higher face validity (from practitioner viewpoint)
Disadvantages
• Not possible to isolate cause and effect as conclusively as with a
“true” experiment.
Types of Quasi-Experimental Designs
One Group Post-Test Design
Design: Treatment → Measurement (time →)
Change in participants’ behavior may or may not be
due to the intervention.
Prone to time effects, and lacks a baseline against
which to measure the strength of the intervention.
One Group Pre-test Post-test Design
Design: Measurement → Treatment → Measurement (time →)
Comparison of pre- and post-intervention scores
allows assessment of the magnitude of the
treatment’s effects.
Prone to time effects, and it is not possible to
determine whether performance would have
changed without the intervention.
Interrupted Time-Series Design
Design: Measurement → Measurement → Measurement → Treatment →
Measurement → Measurement → Measurement (time →)
Don’t have full control over manipulations of the IV. No way of
ruling out other factors. Potential changes in measurement.
Static Group Comparison Design
Group A (experimental group): Treatment → Measurement
Group B (control group): No Treatment → Measurement
(time →)
Participants are not assigned to the conditions randomly.
Observed differences may be due to other factors.
Strength of conclusions depends on the extent to which
we can identify and eliminate alternative explanations.
Experimental Research:
Between-Groups and Within-Groups Designs
Between-Groups Designs
Separate groups of participants are used for each
condition of the experiment.
Within-Groups (Repeated Measures) Designs
Each participant is exposed to each condition of
the experiment (requires fewer participants than a
between-groups design).
Between-Groups Designs
Advantages
• Simplicity
• Less chance of practice and fatigue effects
• Useful when it is not possible for an individual to
participate in all of the experimental conditions
Disadvantages
• Can be expensive in terms of time, effort, and number of
participants
• Less sensitive to experimental manipulations
Examples of Between-Groups Designs
Post-test Only / Control Group Design
Random allocation:
Group A (experimental group): Treatment → Measurement
Group B (control group): No Treatment → Measurement
(time →)
If randomization fails to produce equivalence, there is no way of knowing
that it has failed. Experimenter cannot be certain that the two groups
were comparable before the treatment.
Pre-test / Post-test Control Group Design
Random allocation:
Group A (experimental group): Measurement → Treatment → Measurement
Group B (control group): Measurement → No Treatment → Measurement
(time →)
Pre-testing allows the experimenter to determine the equivalence
of the groups prior to the intervention. However, pre-testing may affect
participants’ subsequent performance.
Solomon Four-Group Design
Random allocation:
Group A: Measurement → Treatment → Measurement
Group B: Measurement → No Treatment → Measurement
Group C: Treatment → Measurement
Group D: No Treatment → Measurement
(time →)
Within-Groups Designs:
Repeated Measures
Advantages
• Economy
• Sensitivity
Disadvantages
• Carry-over effects from one condition to another
• The need for conditions to be reversible
Repeated-Measures Design
Random allocation (to order of conditions):
Order 1: Treatment → Measurement → No Treatment → Measurement
Order 2: No Treatment → Measurement → Treatment → Measurement
(time →)
Potential for carryover effects can be avoided by randomizing
the order of presentation of the different conditions or
counterbalancing the order in which participants experience
them.
Latin Squares Design
Three conditions or trials; order of conditions or trials:
One group of participants: A → B → C
Another group of participants: B → C → A
Yet another group of participants: C → A → B
Order of presentation of conditions in a within-subjects design can be
counterbalanced so that each possible order of conditions occurs just once.
Problem not completely eliminated because A precedes B twice, but B precedes
A only once. Same with C and A.
Balanced Latin Squares Design
Four conditions or trials; order of conditions or trials:
One group of participants: A → B → C → D
Another group of participants: B → D → A → C
Yet another group of participants: D → C → B → A
And yet another group of participants: C → A → D → B
Note: This approach works only for experiments with an even number of
conditions. For additional help with more complex multi-factorial designs,
see: http://www.jic.bbsrc.ac.uk
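A sketch of the standard balanced-Latin-square construction for an even number of conditions: the first row follows the pattern 0, 1, n-1, 2, n-2, ..., and each later row adds 1 (mod n) to every entry. This yields a balanced square (each condition appears once in every position and immediately follows every other condition exactly once), though not necessarily the exact ordering shown above.

```python
# Generate a balanced Latin square for counterbalancing an even number of conditions.
def balanced_latin_square(n):
    if n % 2 != 0:
        raise ValueError("This construction requires an even number of conditions.")
    first, lo, hi, take_low = [0], 1, n - 1, True   # first row: 0, 1, n-1, 2, n-2, ...
    while len(first) < n:
        first.append(lo if take_low else hi)
        lo, hi = (lo + 1, hi) if take_low else (lo, hi - 1)
        take_low = not take_low
    # Each subsequent row shifts every entry up by one (mod n).
    return [[(c + shift) % n for c in first] for shift in range(n)]

labels = "ABCD"
for row in balanced_latin_square(4):
    print(" ".join(labels[i] for i in row))
# A B D C
# B C A D
# C D B A
# D A C B
```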
Factorial Designs
• include multiple independent variables
• allow for analysis of interactions
between variables
• facilitate increased generalizability