The Problem of Generalization from Empirical Studies in SE

Generalization from empirical studies
- Tore Dybå: Session introduction (~20 min.)
- Erik Arisholm: Generalizing results through a series of replicated experiments on software maintainability (~20 min.)
- Jeff Carver: Methods and tools for supporting generalization (~20 min.)
- Mini-group discussions (~10 min.)
- Plenary discussion (~20 min.)
ISERN Meeting, Noosa Heads, Queensland, Australia
14–15 November, 2005
ICT
Generalization from Empirical Studies in SE: Session Introduction
Tore Dybå
SINTEF ICT
(Some of) the problem
- Empirical SE research often generalizes about software organizations as if they were all alike, or refrains from generalizing at all, as if they were all unique:
  - In the first case, it is never really clear whether findings about the organizations actually sampled apply to organizations not sampled.
  - In the second case, is there really any point in studying software organizations if one does not believe that common denominators exist among relatively large classes of organizations?
We must become more concerned about the conditions under which
our research findings are valid if our work is to be applied more widely.
Generalization is closely related to construct validity and external validity
Construct validity:
- the degree to which inferences are warranted from the observed persons, settings, and cause and effect operations included in a study to the constructs that these instances might represent.*
External validity:
- the validity of inferences about whether the causal relationship holds over variations in persons, settings, treatment variables, and measurement variables.*
*W.R. Shadish, T.D. Cook, and D.T. Campbell (2002) Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Houghton Mifflin Company.
Statistical, sampling-based generalization
- The statistician's traditional two-step ideal of
  - the random selection of units to enhance generalization, and
  - the random assignment of those units to different treatments to promote causal inference,
  is often advocated as the gold standard for empirical studies.
However, this model is of limited utility for generalized causal inference in empirical SE because
- it assumes that random selection and its goals do not conflict with random assignment and its goals;
- it is rarely relevant for making generalizations about systems, tasks, settings, treatments, and outcome variables;
- ethical, political, logistical, and economic constraints often limit random selection to less meaningful populations.
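The two-step ideal can be sketched in a few lines of Python. This is an illustrative sketch, not a procedure from the talk: the function `two_step_design`, the organization identifiers, and the equal-size round-robin allocation are all hypothetical. Note that step 1 presupposes a complete, enumerable sampling frame (`frame`); the points above argue that such frames rarely exist for systems, tasks, settings, or treatments in SE, which is where the model breaks down.

```python
import random


def two_step_design(population, n_sample, treatments, seed=None):
    """Hypothetical sketch of the two-step ideal: random selection, then random assignment."""
    rng = random.Random(seed)

    # Step 1: random selection of units from the sampling frame,
    # intended to support generalization to the population.
    sample = rng.sample(population, n_sample)

    # Step 2: random assignment of the selected units to treatments,
    # intended to support causal inference. Shuffling and then dealing
    # units round-robin keeps group sizes as equal as possible.
    rng.shuffle(sample)
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(sample):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical sampling frame of 100 software organizations.
frame = [f"org{i:03d}" for i in range(100)]
groups = two_step_design(frame, n_sample=20, treatments=["A", "B"], seed=1)
print({t: len(units) for t, units in groups.items()})  # {'A': 10, 'B': 10}
```

The two random steps are kept separate on purpose: each serves a different goal (generalization versus causal inference), which is also why the first bullet above notes that the two goals can conflict in practice.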
The “painful” problem of induction
- Hume’s truism:
  - In past experience, all tests have confirmed Theory 1.
  - Therefore, the next test will confirm Theory 1, or all tests will confirm Theory 1.
“… induction or generalization is never fully justified logically. Whereas the problems of internal validity are solvable within the limits of the logic of probability statistics, the problems of external validity are not logically solvable in any neat, conclusive way. Generalization always turns out to involve extrapolation into a realm not represented in one’s sample. Such extrapolation is made by assuming one knows the relevant laws.”*
*D.T. Campbell and J.C. Stanley (1963) Experimental and Quasi-Experimental Designs for Research, Houghton Mifflin Company, p. 17.
Yin’s conception of generalization*
[Figure: Level-1 inference (statistical) generalizes from sample subjects to population characteristics; Level-2 inference (analytical) generalizes from case study findings and experimental findings to theory (and rival theory).]
*R.K. Yin (2003) Case Study Research: Design and Methods, Third Edition, Sage Publications.
Lee and Baskerville’s framework*
- EE, from empirical statements to empirical statements: generalizing from data to description
- ET, from empirical statements to theoretical statements: generalizing from description to theory
- TE, from theoretical statements to empirical statements: generalizing from theory to description
- TT, from theoretical statements to theoretical statements: generalizing from concepts to theory
*A.S. Lee and R.L. Baskerville (2003) Generalizing Generalizability in Information Systems Research, Information Systems Research, 14(3):221-243.
Shadish, Cook, and Campbell*
Five principles of generalized causal inference
- Surface similarity: judging the apparent similarities between what was studied and the targets of generalization.
- Ruling out irrelevancy: identifying those attributes of persons, settings, treatments, and outcome measures that are irrelevant because they do not change a generalization.
- Making discriminations: identifying discriminations that limit generalization (e.g., from the lab to the field).
- Interpolation and extrapolation: interpolating to unsampled values within the range of the sampled persons, settings, treatments, and outcomes, and extrapolating beyond the sampled range.
- Causal explanation: developing and testing explanatory theories about the target of generalization.
*W.R. Shadish, T.D. Cook, and D.T. Campbell (2002) Experimental and Quasi-Experimental Designs for Generalized Causal Inference,
Houghton Mifflin Company.
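The interpolation and extrapolation principle can be made concrete for a single numeric attribute. The Python below is a minimal sketch under stated assumptions, not a method taken from Shadish, Cook, and Campbell: it assumes the attribute is one-dimensional and numeric (a hypothetical team size), and the function name `classify_target` is invented for this example. It only locates a target relative to the sampled range; whether the generalization actually holds still depends on the other principles, such as ruling out irrelevancy and causal explanation.

```python
def classify_target(sampled_values, target):
    """Hypothetical helper: locate a target value relative to the values actually sampled.

    Returns "sampled" if the target was observed, "interpolation" if it lies
    inside the sampled range, and "extrapolation" if it lies outside it.
    """
    if not sampled_values:
        raise ValueError("need at least one sampled value")
    low, high = min(sampled_values), max(sampled_values)
    if target in sampled_values:
        return "sampled"
    if low <= target <= high:
        return "interpolation"
    return "extrapolation"


# Hypothetical example: team sizes observed across a set of studies.
observed_team_sizes = [3, 5, 8, 12]
print(classify_target(observed_team_sizes, 6))   # interpolation
print(classify_target(observed_team_sizes, 40))  # extrapolation
```

With several attributes (persons, settings, treatments, outcomes) the sampled region is no longer a simple interval, so a target can fall inside every per-attribute range and still be an extrapolation for the combination.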
Summary
- Formal sampling-based methods are of limited use for generalizing from empirical SE studies,
  - particularly for tasks, settings, treatments, and outcome measures.
- Additionally, there is a dilemma between scientific validity (complying with Hume’s truism) and practical impact (applying a theory in a new organizational setting).
- Although we should advocate the two-step model of random sampling followed by random assignment when it is feasible, we cannot advocate it as the model for generalized causal inference in SE.
- So, SE researchers must use other concepts and methods to explore generalization from empirical SE studies.
- In fact, most SE researchers routinely make such generalizations without using formal sampling theory.
- In the rest of this session we will attempt to make explicit the concepts and methods used in such work.
- We now turn to examples of such alternative methods …
Mini-group and plenary discussions
- Form mini-groups of three without leaving your chairs (first three, next three, etc.).
- Discuss the following two questions in the mini-groups for ~10 minutes:
  - How do you generalize the results from YOUR studies?
  - How can you improve the validity of these generalizations?
- Plenary discussion based on viewpoints from the groups