Chapter 10: Validity and Reliability of Measurement


Chapter 10

Validity and Reliability

Copyright © 2011 Wolters Kluwer Health | Lippincott Williams & Wilkins

Chapter Overview

• The difference between internal and external validity.

• Participant blinding and how it is utilized in experimental design.

• Different research experimental designs.

• The concept of objective measurement.

• A key to quantitative inquiry is unbiased and objective measurement of the dependent variables.

• Validity is an inherent principle in research design that has many components.

• Reliability and agreement of measures are key components of validity.


Types of Quantitative Data

• Measures may be obtained via:

– instrumented devices
– clinician measurement
– clinician observation
– patient self-report

• Measurement data can usually be classified into one of three types:

1. Categorical
2. Ordinal
3. Continuous

Categorical, Continuous, and Ordinal Data

Categorical or nominal data involve a finite number of classifications for observations.

– A numeric value must be assigned to each category.
– The order of the numbers assigned to the categories is inconsequential.

• Continuous data are measured on a scale that can continuously be broken down into smaller and smaller increments.

• Ordinal data use categories in which the order of the numeric classification is of consequence.

– Example: Likert scales, in which a numeric value is assigned to each possible response.
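The three data types can be sketched in code; all variable names and values below are invented for illustration.

```python
# Illustrative sketch of the three data types; all names and values are invented.

# Categorical (nominal): codes are arbitrary labels, so their order is
# inconsequential -- swapping 1 and 2 would change nothing.
sex_codes = {"female": 1, "male": 2}

# Ordinal: a 5-point Likert item, where the order of the codes matters.
likert = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
          "agree": 4, "strongly agree": 5}

# Continuous: measured on a scale that can be subdivided indefinitely.
knee_flexion_deg = 132.5
```

Ordinal codes support meaningful comparisons (`likert["agree"] > likert["disagree"]`), whereas comparing nominal codes numerically would be meaningless.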


Validity: Internal Validity

• Validity may be discussed in relation to the structure of the overall design of an experiment, the intervention assessed in an experiment, or the measurements performed in an experiment.

• Internal validity refers to the validity of a study’s experimental design.

– If an experiment can conclusively demonstrate that the independent variable has a definite effect on the dependent variable, then the study is internally valid.
– If other factors influence the dependent variable and these factors are not controlled for in the experimental design, then the study’s internal validity may be questioned.


Validity: Extraneous Factors

• Internal validity should be thought of as lying along a continuum rather than as a dichotomous property.

• In laboratory experiments, it is easier to control for confounding factors—and thus to enhance internal validity—than it is in clinical trials.

• Confounding variables are extraneous factors that may result in false relationships.

• Extraneous factors must be either controlled or quantified.


Validity: Bias

• Potential threats to internal validity often involve some sort of bias.

• Bias may be inherent to either the subjects in the study or the experimenters themselves.

• Selection bias: The characteristics that subjects have before they enroll in a study (e.g., age, maturation, sex, medical history, injury or illness severity, and motivation) may ultimately influence the results of the study.

• Delimitations: Decisions that investigators make to improve the internal validity of their studies.

– Make sure that the subjects in different groups have similar characteristics, by either randomly assigning subjects to treatment groups or utilizing some type of matching procedure.


Validity: Blinding

• Blinding is important to the internal validity of a study. There are three entities that may be blinded in a study:

1. The subjects may be blinded to whether they are receiving an experimental treatment or a control treatment.

2. Members of the experimental team who are performing outcome measures should be blinded to the group assignment of individual subjects and to the values of previous measurements for individual subjects.

3. Clinicians who are treating patients in clinical trials should be blinded to the group assignments of individual subjects.

• If one of these entities is blinded, a study is referred to as being “single-blinded”; if two entities are blinded, the study is “double-blinded”; and if all three are blinded, it is “triple-blinded.”

Validity: External Validity

• External validity relates to the degree to which the results of a study are generalizable to the real world.

• The more tightly controlled a study is in terms of:

– subject selection
– administration of interventions
– control of confounding factors

the less generalizable the study results are to the general population.

• Ecological validity is an important issue in terms of translating treatments from controlled laboratory studies to typical clinical practice settings.

Validity of Measures

Face validity:

• Refers to the property of whether a specific measure actually assesses what it is designed to measure.

• Is an important issue in the development of “functional tests” for patients in the rehabilitation sciences.

• Is determined subjectively and most often by expert opinion.

Content validity:

• Refers to the extent to which a particular measure represents all facets of the construct it is supposed to measure.

• Is similar to face validity but is more scientifically rigorous.

Validity of Measures

Accuracy:

• Is defined as the closeness of a measured value to the true value of what is being assessed.

• Should not be confused with precision of measurement.

Concurrent validity:

• Refers to how well one measure is correlated with an existing gold standard measure.

• Is an important property to be established for new measures aiming to assess the same properties as an existing test.

Validity of Measures

Construct validity:

• Stems from psychology but is applicable to other areas of study, such as the health sciences.

Convergent validity:

• Is the measurement property demonstrating whether a measure correlates with existing measures of the same construct.

Discriminative validity:

• Is indicative of a given measure’s lack of correlation with, or divergence from, existing measures that it should not be related to.


Reliability, Agreement, and Precision of Measures

• Reliability refers to the consistency of a specific measurement.

• Intratester reliability is the ability of the same tester to produce consistent, repeated measures of a test (also known as intrarater reliability and test–retest reliability).

• Intertester reliability is the ability of different testers to produce consistent, repeated measures of a test (also called interrater reliability).

• Estimates of reliability for measures of continuous data are often reported as intraclass correlation coefficients (ICCs).

• ICCs are reported on a scale of 0 to 1.

• The ICC is lower when systematic error is present.


Reliability, Agreement, and Precision of Measures (continued)

• Pearson’s r assesses the association between two continuous measures across a sample of subjects.

• If, as one measure increases in value, the second measure also increases incrementally, then Pearson’s r will approach 1.

• Pearson’s r does not assess for systematic error.

• A high Pearson’s r indicates that scores on the two measures are highly correlated; it will not show that the scores of the two measures are systematically diverging from each other.
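A minimal sketch of this limitation, using invented data: a second rater who reads a constant 5 units high still correlates perfectly with the first rater, so Pearson’s r alone cannot reveal the systematic error.

```python
import numpy as np

# Invented data: rater B reads a constant 5 units higher than rater A.
rater_a = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
rater_b = rater_a + 5.0  # constant (systematic) error

r = np.corrcoef(rater_a, rater_b)[0, 1]        # perfect positive correlation
mean_diff = float(np.mean(rater_b - rater_a))  # but the bias is 5.0 units
```

Despite the 5-unit bias, r equals 1; this is why agreement statistics (below) are needed alongside correlation.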

Reliability, Agreement, and Precision of Measures (continued)

• The total variance can be due to true variance (the variability between subjects) and error variance (the difference between the true score and the observed score).

• Sources of error include error or biological variability by the subject, or error by the tester or the instrumentation used to take the measure.

• Reliability may be thought of as the proportion of total variability (s_T²) that stems from between-subject variability (s_t²); the remainder of the variability is attributable to error (s_e²). This may be expressed with this formula:

Reliability = s_t² / (s_t² + s_e²)

Reliability, Agreement, and Precision of Measures (continued)

• The error term can be further divided into systematic error (s_se²) and random error (s_re²).

• Systematic error may include constant error and bias.

– Constant error affects all measures in the same manner, whereas bias affects certain types of scores in specific ways.

• The reliability formula can thus be expressed more robustly as:

Reliability = s_t² / (s_t² + s_se² + s_re²)
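A worked numeric example of the extended formula, with invented variance components:

```python
# Invented variance components for illustration.
s_t2 = 9.0   # true between-subject variance
s_se2 = 0.5  # systematic error variance
s_re2 = 0.5  # random error variance

# Reliability = s_t^2 / (s_t^2 + s_se^2 + s_re^2)
reliability = s_t2 / (s_t2 + s_se2 + s_re2)  # 9.0 / 10.0 = 0.90
```

Here 90% of the total variability reflects real differences between subjects, so the measure would be considered highly reliable.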

ANOVAs

• The calculation of the ICC (intraclass correlation coefficient) stems from the parsing of the variability contributions from an analysis of variance (ANOVA).

• There are several types of ANOVAs, but ICC estimates are calculated specifically from the single within-factor ANOVA model (also called a “repeated measures ANOVA”).

• This analysis is performed to determine if two (or more) sets of measurements are significantly different from each other.
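As a sketch of how an ICC falls out of the repeated-measures ANOVA table, the snippet below computes ICC(3,1), one common form of the ICC, from the subject and error mean squares; the 5-subject, 2-trial data set is invented.

```python
import numpy as np

# Invented data: 5 subjects measured on 2 trials each.
scores = np.array([[10.0, 11.0],
                   [14.0, 15.0],
                   [18.0, 17.0],
                   [22.0, 23.0],
                   [26.0, 25.0]])
n, k = scores.shape

# Partition the total sum of squares, as in a repeated-measures ANOVA.
grand_mean = scores.mean()
ss_subjects = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2)
ss_trials = n * np.sum((scores.mean(axis=0) - grand_mean) ** 2)
ss_total = np.sum((scores - grand_mean) ** 2)
ss_error = ss_total - ss_subjects - ss_trials  # residual variability

ms_subjects = ss_subjects / (n - 1)
ms_error = ss_error / ((n - 1) * (k - 1))

# ICC(3,1): between-subject variability relative to total variability.
icc = (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
```

Because the between-subject differences dwarf the trial-to-trial noise in this invented data set, the ICC comes out close to 1.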

Estimating the Precision of Measures

Precision of measurement: how confident one is in the reproducibility of a measure.

• Precision is reported as the standard error of measurement (SEM), in the unit of the measure.

• Precision takes into account the ICC of the measure as well as the standard deviation (s) of the data set.

• The formula is:

SEM = s × √(1 − ICC)
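A minimal sketch of the SEM formula, with invented values for the standard deviation and ICC:

```python
import math

# Invented values for illustration.
s = 4.0      # standard deviation of the sample's scores
icc = 0.91   # reliability (ICC) estimate for the measure

sem = s * math.sqrt(1 - icc)  # SEM in the unit of the measure (about 1.2)
```

Note that a higher ICC shrinks the SEM: a perfectly reliable measure (ICC = 1) would have an SEM of zero.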

Limits of Agreement

• Bland and Altman recommend that the limits of agreement (LOA) be calculated when two measurement techniques (or two raters) are being compared to each other.

• This technique compares the absolute differences between two measurement techniques and specifically looks for systematic error.

• If the difference is zero for every subject, the two techniques are identical.

• The LOA represent a 95% confidence interval of the difference between the two measures.

• The LOA for the entire data set are computed as:

LOA = (1/n) Σ(x₁ − x₂) ± 1.96 × s_diff

that is, the mean difference between the paired measures, plus or minus 1.96 times the standard deviation of the differences.

• In addition to calculating the LOA, a Bland–Altman plot should also be constructed and analyzed when comparing two measurement techniques.
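The LOA calculation can be sketched as follows; the paired measurements are invented:

```python
import numpy as np

# Invented paired measurements from two techniques on the same subjects.
method_1 = np.array([10.0, 12.0, 15.0, 20.0, 24.0, 30.0])
method_2 = np.array([10.5, 11.5, 15.5, 20.5, 24.5, 29.5])

diff = method_1 - method_2
mean_diff = diff.mean()    # systematic error (bias) between techniques
s_diff = diff.std(ddof=1)  # standard deviation of the differences

loa_lower = mean_diff - 1.96 * s_diff
loa_upper = mean_diff + 1.96 * s_diff
```

Roughly 95% of subject-by-subject differences are expected to fall between the lower and upper limits; a Bland–Altman plot graphs each pair’s difference against its mean with these limits drawn as horizontal lines.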

Agreement

Agreement: Estimates of the consistency or reproducibility of categorical data.

• Intrarater and interrater agreement are defined the same as with reliability measures.

• Estimates of agreement are reported with the kappa statistic, which also ranges from 0 to 1, with 1 indicating perfect agreement.
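Cohen’s kappa for two raters can be computed from the observed agreement and the agreement expected by chance; the ratings below are invented:

```python
# Invented ratings from two raters classifying the same 8 cases.
rater_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
rater_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

n = len(rater_1)
categories = sorted(set(rater_1) | set(rater_2))

# Observed agreement: proportion of cases where the raters match.
p_o = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Chance agreement, from each rater's marginal proportions per category.
p_e = sum((rater_1.count(c) / n) * (rater_2.count(c) / n) for c in categories)

# Kappa corrects the observed agreement for agreement expected by chance.
kappa = (p_o - p_e) / (1 - p_e)
```

Here the raters match on 6 of 8 cases (p_o = 0.75), but half of that agreement would be expected by chance alone (p_e = 0.5), so kappa is 0.5, a more conservative figure than raw percent agreement.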


Chapter Summary and Key Points

• A key to quantitative inquiry is unbiased and objective measurement of the dependent variables.

• Validity is an inherent principle in research design; this important concept has many components.

• Reliability and agreement of measures are key components of validity.

• To ensure objective results, the design of an experiment to test the hypothesis must be done in an unbiased way.

• Blinding is pertinent to the internal validity of a study.

• Agreement is reported with the kappa statistic, which ranges from 0 to 1.

• The randomized pretest–posttest design is the gold standard for most experiments.

• Estimates of the consistency or reproducibility of categorical data are called agreement.