
GSSR
Research Methodology and Methods of Social Inquiry
www.socialinquiry.wordpress.com
October 25, 2010
Assessing Measurement
The measure needs to be:
Valid
Reliable
Exhaustive
Mutually Exclusive
VALIDITY
- Are the claims of having appropriately measured the DV and the IVs valid?
Validity of measurement
- Assuming that there is a relationship in this study, if we claim causality, is the relationship causal?
Internal validity of the causal argument
- Can we generalize from the results of our study to other units of observation (e.g., persons), in other places and at other times?
External validity of our conclusions
Validity of Measurement
An empirical measure is valid to the extent that it
adequately captures the real meaning of the concept
under consideration
(how well it measures the concept it is intended to
measure)
Face validity
Content validity
Criterion-related Validity
Construct Validity
Face Validity
- look at the operationalization and assess whether, "on its
face", it seems like a good translation of the construct.
To improve the quality of a face-validity assessment, make it
more systematic (e.g., have several judges review the items
independently).
Content Validity
- concerns the extent to which a measure represents all
facets of a concept;
- Identify clearly the components of the total content
‘domain’; then show that the items adequately represent
the components;
Ex: knowledge tests
- assumes a good, detailed description of the content
domain, which may not always be available;
Criterion-related Validity I
Applies to measures (‘tests’) that should
indicate a person’s present or future standing
on a specific behavior (trait).
The behavior (trait) is the criterion.
Validation is a matter of how well scores on
the measure correlate with the criterion of
interest.
Criterion-related Validity II
Predictive Validity
- assess how well the measure is able to predict something
it should theoretically be able to predict.
Concurrent Validity
- assess how well the measure is able to distinguish
between groups that it should theoretically be able to
distinguish between.
Construct Validity I
- based upon accumulation of research evidence (lit. rev)
- ‘construct’ = the theoretical ‘concept’ being measured
Assumption: the meaning of any concept is implied by
statements of its theoretical relation to other concepts
Hence:
- Examine theory;
- Hypotheses about variables that should be related to
measure(s) of the concept;
- Hypotheses about variables that should NOT be related
to measure(s) of the concept;
- Gather ‘evidence’
Construct Validity II
Convergent Validity
- examine the degree to which the measure is similar to
(converges on) other measures that it theoretically
should be similar to.
Discriminant Validity
- examine the degree to which the measure is not similar to
(diverges from) other measures that it theoretically
should not be similar to.
To estimate the degree to which any two measures are
related to each other, one typically uses the correlation
coefficient.
http://www.socialresearchmethods.net/kb/constval.php - construct
validity
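Both the convergent and the discriminant check come down to computing a correlation coefficient between two sets of scores. A minimal sketch in plain Python, using invented scores for two hypothetical scales that should converge:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented scores on two measures that should converge
# (e.g., two different self-esteem scales given to the same six people):
scale_a = [12, 15, 9, 20, 17, 11]
scale_b = [14, 16, 10, 21, 18, 13]
print(round(pearson_r(scale_a, scale_b), 3))  # close to +1: evidence of convergence
```

For discriminant validity the same computation is run against a measure of a theoretically unrelated concept, where a correlation near zero is the desired result.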
RELIABILITY
(consistency of measurement)
-deals with the quality of measurement
A measure is considered reliable if it would give the same
result over and over again
(Assumption: what we are measuring is not changing!)
True Score Theory
Every measurement has two additive components: the true
ability (or the true level) of the respondent on that
measure, PLUS random error.
- foundation of reliability theory:
A measure that has no random error (i.e., is all true score) is perfectly
reliable; a measure that has no true score (i.e., is all random error)
has zero reliability.
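The true-score model can be illustrated with a small simulation. Under the standard definition, reliability is the share of observed-score variance that is true-score variance; the distributions and sample size below are illustrative assumptions, not from the slides:

```python
import random
import statistics

random.seed(0)

# Assumed population: true scores ~ N(50, 10), random error ~ N(0, 5).
true_scores = [random.gauss(50, 10) for _ in range(10_000)]
errors = [random.gauss(0, 5) for _ in range(10_000)]
observed = [t + e for t, e in zip(true_scores, errors)]  # observed = true + error

# Reliability = var(true) / var(observed); all error -> 0, no error -> 1.
reliability = statistics.variance(true_scores) / statistics.variance(observed)
# Expected near var_T / (var_T + var_E) = 100 / 125 = 0.8
print(round(reliability, 2))
```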
Assessing Reliability
Inter-coder reliability:
- check the degree to which different
interviewers/observers/raters/coders give consistent
estimates of the same phenomenon.
A. Nominal measure & raters are checking off which
category each observation falls in: calculate % of
agreement between raters.
Ex: One measure, with 3 categories; N = 100 observations,
rated by two raters.
On 86 of the 100 observations the raters checked the
same category (i.e., 86% inter-rater agreement)
B. If the measure is continuous: calculate the correlation
between the ratings of the two raters/observers.
Ex: rating the overall level of activity in a classroom on a
1-to-7 scale.
Ask raters to give their rating at regular time intervals (e.g.,
every 60 seconds). The correlation between these
ratings gives an estimate of the consistency
between the raters.
Test-retest reliability
- administer the same test to the same sample on two
different occasions;
- calculate the correlation between repeated applications
of the measure through time.
Problems:
- people remember answers;
- Real change in attitudes may occur
- First application of measure may have produced change
in the subject
Internal consistency reliability
Examines the consistency of responses across all items
(simultaneously) in a composite measure (uses a single
measurement instrument administered to a group of
people on one occasion to estimate reliability).
How consistent are the results for different items for the
same construct within the measure.
various stats. procedures;
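The slides do not name a specific statistic, but one of the most common of these procedures is Cronbach's alpha, sketched below on invented item data:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha. items: one score list per item, same respondents
    in the same order in every list."""
    k = len(items)
    item_vars = [statistics.variance(item) for item in items]
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    return (k / (k - 1)) * (1 - sum(item_vars) / statistics.variance(totals))

# Invented data: four items of one composite measure, six respondents.
items = [
    [3, 4, 2, 5, 4, 3],
    [3, 5, 2, 4, 4, 2],
    [2, 4, 3, 5, 5, 3],
    [3, 4, 2, 5, 3, 3],
]
print(round(cronbach_alpha(items), 2))  # values near 1 indicate high consistency
```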
Split-half reliability
- calculate the correlation between responses to subsets of
items from the same measure (administer the scale to a
sample; randomly divide the items into two halves; score
each half separately; correlate the two half-scores; the
higher the correlation, the better)
Measurement Error
- Systematic measurement error: when factors
systematically influence the process of measurement, or
the concept we measure;
- Random error: when temporary, chance factors affect
measurement; its presence, extent and direction are
unpredictable from one question to the next, or from one
respondent to the next.
See:
www.socialresearchmethods.net
Relation Validity – Reliability – Measurement Error
Systematic error affects distance from center;
Random error affects tightness of pattern AND distance
from center
Target metaphor
- A tight pattern, irrespective of its location, reflects
RELIABLE measure, because it’s consistent.
- How closely the shots cluster around the center indicates
VALIDITY
Appropriate Measurement
Valid
Reliable
Exhaustive
It should exhaust the possibilities of what it is intended to
measure.
There must be sufficient categories so that virtually all units
of observation being classified will fit into one of the
categories.
Mutually exclusive
each observation fits one and only one of the scale values
(categories).
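A quick practical check of both properties is to verify that every observation matches exactly one category. A sketch using invented age brackets:

```python
def categories_for(age):
    """Return every label of the invented age-bracket scheme that matches."""
    brackets = {
        "under 30": age < 30,
        "30-59": 30 <= age < 60,
        "60+": age >= 60,
    }
    return [label for label, hit in brackets.items() if hit]

ages = [12, 29, 30, 45, 59, 60, 88]
for age in ages:
    matches = categories_for(age)
    # Zero matches -> not exhaustive; two or more -> not mutually exclusive.
    assert len(matches) == 1, f"{age} fits {len(matches)} categories"
print("every observation fits exactly one category")
```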