Reliability & Validity - Monroe County Community College



Jamie DeLeeuw, Ph.D.

5/7/13

Reliability

Consistency of measurement. The measure itself is dependable.

***A measure must be reliable to be valid!***
High reliability = greater consistency = less randomness (error). Example: a weight scale.
Low reliability = less consistency = more error.
Error can come from the observer, the way an item is phrased, the time of day, etc.
Reliability coefficients run on a 0 to 1.0 scale.
Solution? Measure the construct multiple ways (this helps cancel out error).
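A rough way to see why combining several measures helps is the Spearman-Brown prophecy formula, which is not on the slide: if one measure has reliability r, a composite of k parallel measures has reliability kr / (1 + (k - 1)r). "Parallel" is an assumption (same true score, equal and independent error) that real measures rarely meet exactly, so treat this as a minimal sketch:

```python
def spearman_brown(r, k):
    """Predicted reliability of a composite of k parallel measures,
    each with reliability r (assumes equal, independent error)."""
    if not 0 <= r <= 1 or k <= 0:
        raise ValueError("r must be in [0, 1] and k must be positive")
    return k * r / (1 + (k - 1) * r)


# One so-so measure (hypothetical r = .50) vs. combining several of them.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.50, k), 2))  # 1 0.5, 2 0.67, 4 0.8, 8 0.89
```

The gain shrinks as more measures are added, and the formula says nothing about validity: combining many measures of the wrong construct only gives a more consistent wrong answer.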

Types of Reliability

1.) Internal reliability/consistency: Consistency within a set of items intended to measure the same construct.
Multiple questions assess one construct.
Highly reliable scale = people’s responses to the items are highly intercorrelated/consistent.

Measured with Cronbach’s alpha (KR-20 for items scored right/wrong) or split-half reliability.
An alpha of .7 is acceptable for research; standards may be more lenient for grading purposes (the construct of “knowledge”).

Example: Animal Attitudes Scale (partial)

Wild animals, such as mink and raccoon, should not be trapped and their skins made into fur coats.

There is nothing morally wrong with hunting wild animals for food.

I think people who object to raising animals for meat are too sentimental.

Much of the scientific research done with animals is unnecessary and cruel.

Basically, humans have the right to use animals as we see fit.

Continued research with animals will be necessary if we are to ever conquer diseases such as cancer, heart disease, and AIDS.

It is unethical to breed purebred dogs for pets when millions of dogs are killed in animal shelters each year.

The production of inexpensive meat, eggs, and dairy products justifies maintaining animals under crowded conditions.


2.) Inter-rater reliability: Consistency in judgments across multiple raters.
Example: Olympic judging
Measured with % agreement or Fleiss’ kappa, which controls for chance agreement; kappa > .6 is “good”
Rubrics are a step in the right direction
Writing vs. content

3.) Test-retest reliability: Consistency or stability of the test across time (multiple administrations).
CJ performance
Magazine quizzes = low reliability
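Percent agreement and Fleiss’ kappa can both be computed from a table counting how many raters put each subject into each category; kappa then subtracts the agreement expected by chance, given how often each category is used overall. A minimal sketch, assuming every subject is rated by the same number of raters (at least two) into categories such as hypothetical "pass"/"fail" judgments. Here "percent agreement" means the average share of rater pairs that agree on a subject, which is the observed-agreement term inside Fleiss’ kappa.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put subject i in category j.
    Returns (observed pairwise agreement, Fleiss' kappa)."""
    N = len(counts)              # subjects (e.g., performances)
    n = sum(counts[0])           # raters per subject
    k = len(counts[0])           # categories
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Observed agreement: share of agreeing rater pairs per subject, averaged.
    per_subject = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_obs = sum(per_subject) / N

    # Chance agreement: based on how often each category is used overall.
    p_cat = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_chance = sum(p * p for p in p_cat)
    if p_chance == 1:
        raise ValueError("kappa is undefined when every rating is the same category")

    return p_obs, (p_obs - p_chance) / (1 - p_chance)


# Hypothetical: 3 judges rate 4 performances as [pass, fail].
p_obs, kappa = fleiss_kappa([[3, 0], [3, 0], [3, 0], [2, 1]])
print(round(p_obs, 2), round(kappa, 2))  # 0.83 -0.09
```

In this example 83% of rater pairs agree, yet kappa is slightly below zero: nearly every rating is "pass," so that much agreement is expected by chance alone.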

Which type of reliability seems easiest to establish?

Types of Validity

Does it measure what it’s supposed to measure? Accuracy of the inferences, interpretations, or actions made on the basis of test scores (Messick, 1989).

Construct: The accuracy with which a measure reflects the underlying construct.
*Content: Whether items/questions represent the construct.

Face: Does the scale look like it measures what it’s supposed to?

Criterion: Examines how well a measure correlates with a standard of comparison (criterion) or predicted behavior.

Predictive: The extent to which a measure correlates with an individual’s future behavior.

Concurrent: The extent to which a measure correlates with an individual’s current behavior.

Discriminant: The degree to which a scale does NOT measure unintended qualities.

Construct Validity

The accuracy with which a measure reflects the underlying construct (e.g., personality, love, need for cognition)
Indicates a match between conceptual and operational definitions
Researchers try to identify the critical components of the conceptual definition and include them in the measure.

Many potential operational definitions per concept.

Ex.: Empathy, poverty, aggression
Most important type of validity for hypothesis testing
Other types of validity help establish construct validity.

Criterion Validity

Examines how well a measure correlates with a standard of comparison (criterion) or predicted behavior.

Concurrent and predictive
Ex: Does a measure of math ability predict how well a person will do in an engineering-based profession (predictive)?
Ex: Does a depression scale correlate with behavioral observations of depressed individuals (concurrent)?
Ex: Does the self-esteem scale correlate with who volunteers answers in class (concurrent)?
Issue: Need to make sure the criterion is a good reflection of the construct!
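Criterion validity is summarized as a correlation between two sets of scores from the same people, and test-retest reliability is commonly computed the same way (time 1 vs. time 2). A from-scratch Pearson’s r, with hypothetical scores standing in for a depression scale and an observer’s count of depressive behaviors:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical concurrent-validity check for six people.
scale = [12, 25, 8, 30, 18, 22]      # depression scale scores
observed = [2, 6, 1, 7, 3, 5]        # observer-counted depressive behaviors
print(round(pearson_r(scale, observed), 2))

# For test-retest, pass time-1 and time-2 scores instead.
```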

Discriminant Validity

Indicates that a scale does NOT correlate with other assessment devices presumed to measure conceptually dissimilar constructs.

Self-esteem vs. narcissism (r = .26)
Also helps alleviate the third-variable issue
Ex: Kids with bigger feet (shoe size) have stronger reading skills.
If age isn’t correlated with either…
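A common way to examine a suspected third variable such as age, not named on the slide, is the first-order partial correlation, which estimates the x–y correlation with z held constant: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). A minimal sketch with made-up correlations:

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant (first-order partial)."""
    if abs(r_xz) >= 1 or abs(r_yz) >= 1:
        raise ValueError("r_xz and r_yz must be strictly between -1 and 1")
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations for schoolchildren:
# x = shoe size, y = reading score, z = age.
print(round(partial_r(r_xy=0.50, r_xz=0.70, r_yz=0.70), 2))  # 0.02
print(partial_r(r_xy=0.50, r_xz=0.0, r_yz=0.0))  # z unrelated to both: 0.5
```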

Content Validity

Judgment by experts of the degree to which items, tasks, or questions on a test adequately represent the construct.

Ex: Grief

Matches study guide? Course objectives/outcomes?

Includes “face validity”

Construct     Item
Depression    Do you often feel sad or blue?
Optimism      Do you generally expect good things to happen?

Classic Representation of Reliability and Validity

[Figure: three panels labeled “Not reliable, not valid,” “Reliable, not valid,” and “Reliable, valid”]

Must be reliable to be valid!

Culture and Validity

Important questions:
Does the construct exist in all cultures?
Are items interpreted the same in each culture?

Language, translation
Essay vs. multiple choice (MC)

Ex: ‘I’d rather vacation at a popular beach than an isolated cabin in the woods’ (interpretation may depend on SES)

Challenges to Validity (in Research)

Response sets
Acquiescence: tendency to say ‘yes’
Dealt with by using both positively and negatively worded items: “I tend to be alert,” “I usually don’t feel very energetic”
Social desirability: tendency to portray oneself positively
Dealt with by:
Making social desirability less salient (phrasing, experiment)
Measuring and correcting for social desirability
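When a scale mixes positively and negatively worded items, the negatively worded ones are reverse-scored before items are combined, so a respondent who simply agrees with everything lands near the middle rather than at an extreme. A minimal sketch assuming a 1-5 agreement scale; the function names and the scale range are illustrative:

```python
def reverse_score(response, low=1, high=5):
    """Flip a response on a low..high scale (on 1-5: 5 -> 1, 4 -> 2, 3 -> 3)."""
    return low + high - response


def scale_score(responses, reverse_keyed, low=1, high=5):
    """Average item score after reverse-scoring negatively worded items."""
    if len(responses) != len(reverse_keyed):
        raise ValueError("need one reverse-key flag per response")
    adjusted = [reverse_score(r, low, high) if rev else r
                for r, rev in zip(responses, reverse_keyed)]
    return sum(adjusted) / len(adjusted)


# Items: "I tend to be alert" (positive), "I usually don't feel very energetic" (negative).
keys = [False, True]
print(scale_score([5, 5], keys))  # yes-sayer agrees with both: 3.0 (midpoint)
print(scale_score([5, 1], keys))  # consistently alert respondent: 5.0
```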