Educational Research
Chapter 6
Selecting Measuring Instruments
Gay, Mills, and Airasian
10th Edition
Topics Discussed in this Chapter

- Data collection
  - Terminology
- Measuring instruments
  - Interpreting data
  - Types of instruments
- Technical issues
  - Validity
  - Reliability
- Selection of a test
Data Collection

- Scientific inquiry requires the collection, analysis, and interpretation of data
- Data – the pieces of information collected to examine the research topic
- Issues related to the collection of this information are the focus of this chapter
Data Collection

- Terminology related to data
  - Constructs – abstractions that cannot be observed directly but are helpful when trying to explain behavior, e.g.:
    - Intelligence
    - Teacher effectiveness
    - Self-concept
Data Collection

- Data terminology (continued)
  - Operational definition – the way a construct is observed and measured, e.g.:
    - Wechsler IQ test
    - Virgilio Teacher Effectiveness Inventory
    - Tennessee Self-Concept Scale
  - Variable – a construct that has been operationalized and has two or more values
Data Collection

- Measurement scales (illustrated in the sketch below)
  - Nominal – categories
    - E.g., gender, ethnicity, etc.
  - Ordinal – ordered categories
    - E.g., rank in class, order of finish, etc.
  - Interval – equal intervals on a scale
    - E.g., test scores, attitude scores, etc.
  - Ratio – absolute zero
    - Same as all of the others, but with a true zero point on the scale
    - E.g., time, height, weight; also percent correct
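The scale type determines which summaries are meaningful. A minimal Python sketch with made-up values for each scale type (the variable names are illustrative only):

```python
from collections import Counter
from statistics import mean, median

ethnicity = ["A", "B", "A", "C"]          # nominal: unordered categories
class_rank = [1, 2, 3, 4, 5]              # ordinal: order matters, gaps may not be equal
attitude_score = [12, 15, 15, 20]         # interval: equal units, but no true zero
time_on_task = [8.0, 12.5, 16.0, 25.0]    # ratio: true zero, so ratios are meaningful

print(Counter(ethnicity).most_common(1))  # nominal supports counts and the mode
print(median(class_rank))                 # ordinal supports medians and percentiles
print(mean(attitude_score))               # interval adds means and standard deviations
print(time_on_task[3] / time_on_task[0])  # ratio supports "x times as much" comparisons
```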
Data Collection

- Types of variables
  - Categorical or quantitative
    - Categorical variables reflect nominal scales and measure the presence of different qualities (e.g., gender, ethnicity, etc.)
    - Quantitative variables reflect ordinal, interval, or ratio scales and measure different quantities of a variable (e.g., test scores, self-esteem scores, etc.)
Data Collection

- Types of variables
  - Independent or dependent
    - Independent variables are purported causes
    - Dependent variables are purported effects
    - Example (sketched in code below): two instructional strategies, cooperative groups and traditional lectures, were used during a three-week social studies unit, and students' exam scores were analyzed for differences between the groups
      - The independent variable is the instructional approach (of which there are two levels)
      - The dependent variable is the students' achievement
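One common way to analyze that example is an independent-samples t-test. A minimal sketch, assuming SciPy is available; the scores are invented purely for illustration:

```python
from scipy import stats

# Hypothetical exam scores for the two levels of the independent variable.
cooperative = [78, 85, 92, 88, 81, 90]   # instructional approach, level 1
lecture = [72, 80, 75, 83, 79, 76]       # instructional approach, level 2

# Compare the dependent variable (exam score) across the two groups.
t, p = stats.ttest_ind(cooperative, lecture)
print(f"t = {t:.2f}, p = {p:.3f}")       # small p suggests the group means differ
```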
Measurement Instruments

- Important terms
  - Instrument – a tool used to collect data
  - Test – a formal, systematic procedure for gathering information
  - Assessment – the general process of collecting, synthesizing, and interpreting information
  - Measurement – the process of quantifying or scoring a subject's performance
Measurement Instruments

- Important terms (continued)
  - Cognitive tests – examine subjects' thoughts and thought processes
  - Affective tests – examine subjects' feelings, interests, attitudes, beliefs, etc.
  - Standardized tests – administered, scored, and interpreted in a consistent manner
Measurement Instruments

- Important terms (continued)
  - Selected-response item format – respondents select answers from a set of alternatives
    - Multiple choice
    - True-false
    - Matching
  - Supply-response item format – respondents construct answers
    - Short answer
    - Completion
    - Essay
Measurement Instruments

- Important terms (continued)
  - Individual tests – administered on an individual basis
  - Group tests – administered to a group of subjects at the same time
  - Performance assessments – assessments that focus on processes or products that have been created
Measurement Instruments

- Interpreting data
  - Raw scores – the actual score made on a test
  - Standard scores – statistical transformations of raw scores (conversions sketched below)
    - Percentiles (0.0 – 99.9)
    - Stanines (1 – 9)
    - Normal curve equivalents (1 – 99)
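A minimal sketch of these transformations, assuming roughly normally distributed scores and the standard definitions (stanines: mean 5, SD 2; NCEs: mean 50, SD 21.06). The raw scores are hypothetical:

```python
from statistics import NormalDist, mean, stdev

raw = [52, 61, 70, 75, 83, 90, 95]        # hypothetical raw test scores
m, s = mean(raw), stdev(raw)

for x in raw:
    z = (x - m) / s                               # z-score: SDs above/below the mean
    pct = NormalDist().cdf(z) * 100               # percentile rank
    stanine = min(9, max(1, round(z * 2 + 5)))    # nine bands, mean 5, SD 2
    nce = 50 + 21.06 * z                          # normal curve equivalent
    print(f"raw={x:3d} z={z:+.2f} pct={pct:5.1f} stanine={stanine} NCE={nce:5.1f}")
```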
Measurement Instruments

- Interpreting data (continued)
  - Norm-referenced – scores are interpreted relative to the scores of others taking the test
  - Criterion-referenced – scores are interpreted relative to a predetermined level of performance
  - Self-referenced – scores are interpreted relative to changes over time
Measurement Instruments

- Types of instruments
  - Cognitive – measure intellectual processes such as thinking, memorizing, problem solving, analyzing, or reasoning (e.g., most classroom-based tests that require thinking)
  - Achievement – measure what students already know (e.g., most tests that require recalling information)
  - Aptitude – measure general mental ability, usually to predict future performance (e.g., IQ testing)
Measurement Instruments

- Types of instruments (continued)
  - Affective – assess individuals' feelings, values, attitudes, beliefs, etc.
    - Typical affective characteristics of interest:
      - Values – deeply held beliefs about ideas, persons, or objects
      - Attitudes – dispositions that are favorable or unfavorable toward things
      - Interests – inclinations to seek out or participate in particular activities, objects, ideas, etc.
      - Personality – characteristics that represent a person's typical behaviors
Measurement Instruments

- Types of instruments (continued)
  - Affective (continued)
    - Scales used for responding to items on affective tests (scoring sketched below):
      - Likert – positive or negative statements to which subjects respond on scales such as strongly disagree, disagree, neutral, agree, or strongly agree
      - Semantic differential – bipolar adjectives (i.e., two opposite adjectives) with a scale between them, e.g., Dislike: ___ ___ ___ ___ ___ :Like
      - Rating scales – ratings of the trait of interest along a continuum (always do _ _ _ _ never do)
      - There are other types as well
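A minimal sketch of scoring a five-point Likert scale. The items and responses are hypothetical; the key point is that negatively worded items are reverse-scored so a high total always means a more favorable attitude:

```python
responses = {                                  # 1 = strongly disagree ... 5 = strongly agree
    "I enjoy social studies": 4,
    "Social studies is a waste of time": 2,    # negatively worded item
    "I look forward to this class": 5,
}
reverse_keyed = {"Social studies is a waste of time"}

total = sum(6 - v if item in reverse_keyed else v   # reverse-score: 1<->5, 2<->4
            for item, v in responses.items())
print(f"attitude score: {total} (possible range 3-15)")
```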
Measurement Instruments

- Issues for cognitive, aptitude, or affective tests
  - Problems inherent in the use of self-report measures
    - Bias – distortions of a respondent's performance or responses based on ethnicity, race, gender, language, etc.
    - Problems with responses to affective test items:
      - Socially acceptable responses
      - Accuracy of responses
      - Response sets (responding the same way to every item)
  - Alternatives include the use of projective tests
Technical Issues

- Two concerns
  - Validity
  - Reliability
Technical Issues

- Validity – the extent to which interpretations made from a test score are appropriate
  - Characteristics
    - The most important technical characteristic
    - Situation specific
    - Does not refer to the instrument itself, but to the interpretations of scores on the instrument
    - Best thought of in terms of degree
Technical Issues

- Validity (continued)
  - Four types
    - Content – to what extent does the test measure what it is supposed to measure?
      - Item validity (are the test items relevant to the topic?)
      - Sampling validity (do the items sample the full content domain?)
      - Determined by expert judgment
Technical Issues

- Validity (continued)
  - Criterion-related
    - Predictive – to what extent does the test predict a future performance?
    - Concurrent – to what extent does the test predict a performance measured at the same time?
    - Estimated by a correlation between the test and the criterion measure (see the sketch below)
  - Construct – the extent to which a test measures the construct it represents
    - Complicated by the underlying difficulty of defining constructs
    - Estimated in many ways
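A minimal sketch of a predictive validity estimate: correlate test scores with a criterion measured later. The data are invented, and statistics.correlation requires Python 3.10+:

```python
from statistics import correlation  # Python 3.10+

aptitude = [95, 110, 102, 130, 88, 120]      # predictor: aptitude test scores
later_gpa = [2.6, 3.1, 2.9, 3.8, 2.4, 3.5]   # criterion: performance measured later

r = correlation(aptitude, later_gpa)
print(f"validity coefficient r = {r:.2f}")   # nearer 1.0 = stronger prediction
```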
Technical Issues

- Validity (continued)
  - Consequential – to what extent are the consequences that follow from the test harmful?
    - E.g., could interpreting an English-only test score for a bilingual child possibly hurt the child?
    - Estimated by empirical evidence and expert judgment
- Factors affecting validity
  - Unclear test directions
  - Confusing and ambiguous test items
  - Vocabulary that is too difficult for test takers
Technical Issues

- Factors affecting validity (continued)
  - Overly difficult and complex sentence structure
  - Inconsistent and subjective scoring
  - Untaught items
  - Failure to follow standardized administration procedures
  - Cheating, whether by participants or by someone teaching the test items
Technical Issues

- Reliability – the degree to which a test consistently measures whatever it is measuring
  - Characteristics
    - Expressed as a coefficient ranging from 0 to 1
    - A necessary but not sufficient condition for valid measurement
Technical Issues

- Reliability (continued)
  - Six reliability coefficients
    - Stability – consistency over time with the same instrument
      - Test-retest
      - Estimated by a correlation between the two administrations of the same test (see the sketch below)
    - Equivalence – consistency between two parallel tests administered at the same time
      - Parallel forms
      - Estimated by a correlation between the parallel tests
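A minimal sketch of a stability (test-retest) estimate; the scores are hypothetical, and statistics.correlation requires Python 3.10+. An equivalence estimate looks the same, with the second administration replaced by a parallel form:

```python
from statistics import correlation  # Python 3.10+

first_admin = [70, 82, 90, 65, 78, 88]    # same test, first administration
second_admin = [72, 80, 91, 68, 75, 90]   # same students, two weeks later

print(f"test-retest reliability = {correlation(first_admin, second_admin):.2f}")
```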
Technical Issues

- Reliability (continued)
  - Six reliability coefficients (continued)
    - Equivalence and stability – consistency over time with parallel forms of the test
      - Combines the attributes of stability and equivalence
      - Estimated by a correlation between the parallel forms
    - Internal consistency – consistency within a single test, estimated by artificially splitting the test into halves or by comparing responses across items
      - Several coefficients: split-half, KR-20, KR-21, Cronbach's alpha (sketched below)
      - All provide estimates ranging from 0 to 1
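A minimal sketch of Cronbach's alpha from its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The item-response matrix is hypothetical (rows are examinees, columns are items):

```python
from statistics import pvariance

scores = [        # hypothetical responses: 5 examinees x 4 items
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [4, 4, 5, 4],
]
k = len(scores[0])                     # number of items
items = list(zip(*scores))             # column-wise item scores
totals = [sum(row) for row in scores]  # each examinee's total score

alpha = (k / (k - 1)) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))
print(f"Cronbach's alpha = {alpha:.2f}")   # ~0.93 for this toy data
```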
Technical Issues

- Reliability (continued)
  - Six reliability coefficients (continued)
    - Scorer/rater – consistency of observations between raters
      - Inter-judge – two observers
      - Intra-judge – one judge on two occasions
      - Estimated by the percent agreement between observations (see the sketch below)
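A minimal sketch of percent agreement between two raters coding the same set of observations; the codes are hypothetical:

```python
rater_a = ["on-task", "off-task", "on-task", "on-task", "off-task"]
rater_b = ["on-task", "off-task", "on-task", "off-task", "off-task"]

agree = sum(a == b for a, b in zip(rater_a, rater_b))            # matching codes
print(f"percent agreement = {100 * agree / len(rater_a):.0f}%")  # 4/5 = 80%
```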
Technical Issues

- Reliability (continued)
  - Six reliability coefficients (continued)
    - Standard error of measurement (SEM) – an estimate of how much difference there is between a person's obtained score and his or her true score
      - A function of the variation of the test scores and the reliability coefficient (e.g., KR-20, Cronbach's alpha, etc.)
      - Used to report an interval rather than a point estimate of a person's score
      - The SEM supports building confidence intervals around obtained scores (see the sketch below)
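A minimal sketch of the SEM and the score band it implies, using the standard formula SEM = SD * sqrt(1 - reliability). The values are hypothetical:

```python
import math

sd = 10.0            # standard deviation of the test's scores
reliability = 0.91   # e.g., a KR-20 or Cronbach's alpha estimate
obtained = 84        # a single examinee's obtained score

sem = sd * math.sqrt(1 - reliability)
low, high = obtained - 1.96 * sem, obtained + 1.96 * sem
print(f"SEM = {sem:.1f}; 95% band for the true score: {low:.1f} to {high:.1f}")
```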
Selection of a Test

- Sources of test information
  - Mental Measurements Yearbooks (MMY), from the Buros Institute
    - Provide factual information on all known tests
    - Provide objective test reviews
    - Comprehensive bibliographies for specific tests
    - Indices: titles, acronyms, subjects, publishers, developers
    - MMY reviews are most easily accessed through your university library and the services to which it subscribes (e.g., EBSCO)
Selection of a Test

- Sources (continued)
  - Tests in Print, also from the Buros Institute
    - A bibliography of all known commercially produced tests currently available
    - Very useful for determining availability
    - Most easily accessed through your university library and the services to which it subscribes (e.g., EBSCO)
Selection of a Test

- Sources (continued)
  - ETS Test Collection
    - Covers published and unpublished tests
    - Includes test title, author, publication date, target population, publisher, and a description of purpose
    - Annotated bibliographies on achievement, aptitude, attitude and interests, personality, sensory-motor, special populations, vocational/occupational, and miscellaneous tests
Selection of a Test

- Sources (continued)
  - Professional journals
  - Test publishers and distributors
- Issues to consider when selecting tests
  - Psychometric properties
    - Validity
    - Reliability
    - Length of test
    - Scoring and score interpretation
Selection of a Test

- Issues to consider when selecting tests (continued)
  - Non-psychometric issues
    - Cost
    - Administration time
    - Objections to content by parents or others
    - Duplication of testing
Selection of a Test

- Designing your own tests
  - Get help from others with experience in developing tests
  - Item-writing guidelines
    - Avoid ambiguous and confusing wording and sentence structure
    - Use appropriate vocabulary
    - Write items that have only one correct answer
    - Give information about the nature of the desired answer
    - Do not provide clues to the correct answer
Selection of a Test

- Test administration guidelines
  - Plan ahead
  - Be certain that there is consistency across testing sessions
  - Be familiar with any and all procedures necessary to administer the test