Educational Research
Chapter 6
Selecting Measuring Instruments
Gay, Mills, and Airasian
10th Edition
Topics Discussed in this Chapter
Data collection
Measuring instruments
Technical issues
Terminology
Interpreting data
Types of instruments
Validity
Reliability
Selection of a test
Data Collection
Scientific inquiry requires the collection,
analysis, and interpretation of data
Data – the pieces of information that are
collected to examine the research topic
Issues related to the collection of this
information are the focus of this chapter
Data Collection
Terminology related to data
Constructs – abstractions that cannot be
observed directly but are helpful when
trying to explain behavior
Intelligence
Teacher effectiveness
Self-concept
Data Collection
Data terminology (continued)
Operational definition – the ways by which
constructs are observed and measured
Wechsler IQ test
Virgilio Teacher Effectiveness Inventory
Tennessee Self-Concept Scale
Variable – a construct that has been
operationalized and has two or more
values
Data Collection
Measurement scales
Nominal – categories
E.g., Gender, ethnicity, etc.
Ordinal – ordered categories
E.g., Rank in class, order of finish, etc.
Interval – equal intervals on a scale
E.g., Test scores, attitude scores, etc.
Ratio – absolute zero
Same as interval, but with a true zero point on the
scale.
E.g., Time, height, weight, etc.
Also, percent correct
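The four scale levels above determine which summaries are meaningful. A minimal sketch (the variables and values are illustrative assumptions, not from the text):

```python
from statistics import mode, median, mean

ethnicity = ["A", "B", "A", "C", "A"]   # nominal: categories only
class_rank = [1, 2, 3, 4, 5]            # ordinal: order, but unequal gaps
test_scores = [72, 85, 85, 90, 98]      # interval: equal intervals, no true zero
reaction_ms = [210.0, 305.5, 250.0]     # ratio: true zero point

print(mode(ethnicity))                   # nominal supports only the mode
print(median(class_rank))                # ordinal adds the median
print(mean(test_scores))                 # interval adds the mean
print(max(reaction_ms) / min(reaction_ms))  # only ratio scales support ratios
```

Each level permits every operation allowed at the levels below it, plus one more.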
Data Collection
Types of variables
Categorical or quantitative
Categorical variables reflect nominal scales and
measure the presence of different qualities
(e.g., gender, ethnicity, etc.)
Quantitative variables reflect ordinal, interval,
or ratio scales and measure different quantities
of a variable (e.g., test scores, self-esteem
scores, etc.)
Data Collection
Types of variables
Independent or dependent
Independent variables are purported causes
Dependent variables are purported effects
Two instructional strategies, co-operative groups and
traditional lectures, were used during a three week social
studies unit. Students’ exam scores were analyzed for
differences between the groups.
The independent variable is the instructional approach (of
which there are two levels)
The dependent variable is the students’ achievement
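The social studies example can be sketched as follows; the exam scores are assumed data used only to show how the dependent variable is compared across the two levels of the independent variable:

```python
from statistics import mean

# Independent variable: instructional approach (two levels).
# Dependent variable: exam score. All scores are hypothetical.
scores = {
    "cooperative": [84, 88, 79, 91, 86],
    "lecture":     [78, 82, 75, 80, 77],
}

for level, vals in scores.items():
    print(level, mean(vals))

# The analysis asks whether the dependent variable differs by level.
difference = mean(scores["cooperative"]) - mean(scores["lecture"])
print("mean difference:", difference)
```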
Measurement Instruments
Important terms
Instrument – a tool used to collect data
Test – a formal, systematic procedure for
gathering information
Assessment – the general process of
collecting, synthesizing, and interpreting
information
Measurement – the process of quantifying
or scoring a subject’s performance
Measurement Instruments
Important terms (continued)
Cognitive tests – examining subjects’
thoughts and thought processes
Affective tests – examining subjects’
feelings, interests, attitudes, beliefs, etc.
Standardized tests – tests that are
administered, scored, and interpreted in a
consistent manner
Measurement Instruments
Important terms (continued)
Selected response item format – respondents
select answers from a set of alternatives
Multiple choice
True-false
Matching
Supply response item format – respondents
construct answers
Short answer
Completion
Essay
Measurement Instruments
Important terms (continued)
Individual tests – tests administered on an
individual basis
Group tests – tests administered to a
group of subjects at the same time
Performance assessments – assessments
that focus on processes or products that
have been created
Measurement Instruments
Interpreting data
Raw scores – the actual score made on a
test
Standard scores – statistical
transformations of raw scores
Percentiles (1 – 99)
Stanines (1 – 9)
Normal Curve Equivalents (1 – 99)
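The standard scores above are all transformations of a raw score's distance from the mean. A sketch, assuming a raw score of 115 on a test with mean 100 and SD 15 (illustrative values):

```python
import math

def z_score(raw, mean, sd):
    # distance from the mean in standard deviation units
    return (raw - mean) / sd

def percentile(z):
    # percent of the normal curve falling below z (normal CDF x 100)
    return 50 * (1 + math.erf(z / math.sqrt(2)))

def stanine(z):
    # stanines: mean 5, SD 2, clipped to the 1-9 range
    return max(1, min(9, round(5 + 2 * z)))

def nce(z):
    # normal curve equivalents: mean 50, SD 21.06
    return 50 + 21.06 * z

z = z_score(raw=115, mean=100, sd=15)   # z = 1.0
print(round(percentile(z), 1))
print(stanine(z))
print(round(nce(z), 2))
```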
Measurement Instruments
Interpreting data (continued)
Norm-referenced – scores are interpreted
relative to the scores of others taking the
test
Criterion-referenced – scores are
interpreted relative to a predetermined
level of performance
Self-referenced – scores are interpreted
relative to changes over time
Measurement Instruments
Types of instruments
Cognitive – measuring intellectual processes such
as thinking, memorizing, problem solving,
analyzing, or reasoning (e.g., most
classroom-based tests that require thinking).
Achievement – measuring what students already
know (e.g., most tests that require recalling info).
Aptitude – measuring general mental ability,
usually for predicting future performance (e.g., IQ
testing)
Measurement Instruments
Types of instruments (continued)
Affective – assessing individuals’ feelings,
values, attitudes, beliefs, etc.
Typical affective characteristics of interest
Values – deeply held beliefs about ideas, persons, or
objects
Attitudes – dispositions that are favorable or unfavorable
toward things
Interests – inclinations to seek out or participate in
particular activities, objects, ideas, etc.
Personality – characteristics that represent a person’s
typical behaviors
Measurement Instruments
Types of instruments (continued)
Affective (continued)
Scales used for responding to items on affective tests
Likert
Positive or negative statements to which subjects
respond on scales such as strongly disagree, disagree,
neutral, agree, or strongly agree
Semantic differential
Bipolar adjectives (i.e., two opposite adjectives) with a
scale between each adjective
Dislike: ___ ___ ___ ___ ___ :Like
Rating scales – ratings of how strongly the subject
exhibits the trait of interest (always do _ _ _ _ never do)
There are other types as well.
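Scoring a Likert scale typically means summing item responses after reverse-coding the negatively worded statements. A minimal sketch with assumed responses (1 = strongly disagree through 5 = strongly agree):

```python
responses = [5, 4, 2, 5]   # one subject's answers to four statements
negative_items = {2}       # index of a negatively worded statement (assumed)

def likert_total(responses, negative_items, points=5):
    total = 0
    for i, r in enumerate(responses):
        # reverse-code negative statements so a high total always
        # means a favorable attitude
        total += (points + 1 - r) if i in negative_items else r
    return total

print(likert_total(responses, negative_items))
```

Without reverse coding, agreement with a negative statement would cancel out agreement with a positive one.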
Measurement Instruments
Issues for cognitive, aptitude, or affective
tests
Problems inherent in the use of self-report
measures
Bias – distortions of a respondent’s performance or
responses based on ethnicity, race, gender, language,
etc.
Responses to affective test items
Socially acceptable responses
Accuracy of responses
Response sets (responding the same way to every item)
Alternatives include the use of projective tests
Technical Issues
Two concerns
Validity
Reliability
Technical Issues
Validity – extent to which
interpretations made from a test score
are appropriate
Characteristics
The most important technical characteristic
Situation specific
Does not refer to the instrument but to the
interpretations of scores on the instrument
Best thought of in terms of degree
Technical Issues
Validity (continued)
Four types
Content – the extent to which the test covers
the content it is intended to measure
Item validity (are the test items relevant to the topic?)
Sampling validity (do the items sample the full content
domain?)
Determined by expert judgment
Technical Issues
Validity (continued)
Criterion-related
Predictive – to what extent does the test predict a
future performance
Concurrent – to what extent does the test relate to a
criterion measured at the same time
Estimated by the correlation between the test and the
criterion measure
Construct – the extent to which a test
measures the construct it represents
Underlying difficulty defining constructs
Estimated in many ways
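A criterion-related validity coefficient is just a Pearson correlation between test scores and a criterion. A sketch of predictive validity with assumed aptitude scores and later GPAs:

```python
import math

def pearson_r(x, y):
    # Pearson product-moment correlation between two score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

aptitude = [95, 110, 100, 120, 105]    # predictor scores (assumed)
gpa      = [2.8, 3.4, 3.0, 3.8, 3.1]   # criterion measured later (assumed)

print(round(pearson_r(aptitude, gpa), 3))   # the validity coefficient
```

For concurrent validity the same computation is used, but the criterion is measured at the same time as the test.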
Technical Issues
Validity (continued)
Consequential – the extent to which the
consequences of using the test are harmful
E.g., would interpreting an English-only test score from a
bilingual child possibly hurt the child?
Estimated by empirical and expert judgment
Factors affecting validity
Unclear test directions
Confusing and ambiguous test items
Vocabulary that is too difficult for test takers
Technical Issues
Factors affecting validity (continued)
Overly difficult and complex sentence
structure
Inconsistent and subjective scoring
Untaught items
Failure to follow standardized
administration procedures
Cheating by the participants or someone
teaching to the test items
Technical Issues
Reliability – the degree to which a test
consistently measures whatever it is
measuring
Characteristics
Expressed as a coefficient ranging from 0 to 1
A necessary but not sufficient characteristic of
a test
Technical Issues
Reliability (continued)
Six reliability coefficients
Stability – consistency over time with the same
instrument
Test-retest
Estimated by a correlation between the two
administrations of the same test
Equivalence – consistency with two parallel
tests administered at the same time
Parallel forms
Estimated by a correlation between the parallel tests
Technical Issues
Reliability (continued)
Six reliability coefficients (continued)
Equivalence and stability – consistency over
time with parallel forms of the test
Combines attributes of stability and equivalence
Estimated by a correlation between the parallel forms
Internal consistency – consistency among the
items within a single test
Several coefficients – split-half (artificially splitting the
test into halves), KR-20, KR-21, Cronbach's alpha
All coefficients provide estimates ranging from 0 to 1
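Cronbach's alpha, for example, compares the variance of the item scores with the variance of the total scores. A sketch using an assumed person-by-item score matrix:

```python
from statistics import variance

# rows = persons, columns = items on one test (assumed data)
data = [
    [4, 5, 4, 5],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
]

k = len(data[0])                                   # number of items
item_vars = [variance(col) for col in zip(*data)]  # sample variance per item
total_var = variance([sum(row) for row in data])   # variance of total scores

# Cronbach's alpha: high when items vary together rather than separately
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))
```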
Technical Issues
Reliability (continued)
Six reliability coefficients (continued)
Scorer/rater – consistency of observations
between raters
Inter-judge – two observers
Intra-judge – one judge over two occasions
Estimated by percent agreement between
observations
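Percent agreement between raters is the simplest scorer/rater estimate. A sketch with assumed observation codes from two raters:

```python
# Two raters' codes for the same six observations (assumed data)
rater_a = ["on", "off", "on", "on", "off", "on"]
rater_b = ["on", "off", "off", "on", "off", "on"]

# Percent agreement: proportion of observations coded identically
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(round(percent_agreement, 1))
```

The same computation serves intra-judge reliability if the two lists come from one judge on two occasions.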
Technical Issues
Reliability (continued)
Six reliability coefficients (continued)
Standard error of measurement (SEM) – an estimate of
how much difference there is between a person’s
obtained score and his or her true score
Function of the variation of the test and the reliability
coefficient (e.g., KR 20, Cronbach alpha, etc.)
Used to construct an interval rather than a point
estimate of a person's true score
The SEM yields a confidence interval around the
obtained score.
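The SEM is computed from the test's standard deviation and its reliability coefficient. A sketch, assuming an SD of 15, a reliability of .91, and an obtained score of 104 (all illustrative values):

```python
import math

sd = 15             # standard deviation of the test's scores (assumed)
reliability = 0.91  # e.g., a KR-20 or Cronbach's alpha coefficient (assumed)

# SEM = SD * sqrt(1 - reliability)
sem = sd * math.sqrt(1 - reliability)
obtained = 104

# ~68% confidence interval: obtained score plus or minus one SEM
low, high = obtained - sem, obtained + sem
print(round(sem, 2))
print(round(low, 1), round(high, 1))
```

Note that as reliability rises toward 1, the SEM shrinks toward 0 and the interval tightens around the obtained score.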
Selection of a Test
Sources of test information
Mental Measurement Yearbooks (MMY)
The reviews in MMY are most easily accessed through
your university library and the services to which they
subscribe (e.g., EBSCO)
Provides factual information on all known tests
Provides objective test reviews
Comprehensive bibliography for specific tests
Indices: titles, acronyms, subject, publishers, developers
Buros Institute
Selection of a Test
Sources (continued)
Tests in Print
Tests in Print is published by the Buros Institute
The reviews in it are most easily accessed through your
university library and the services to which they
subscribe (e.g., EBSCO)
Bibliography of all known commercially produced tests
currently available
Very useful to determine availability
Tests in Print
Selection of a Test
Sources (continued)
ETS Test Collection
Published and unpublished tests
Includes test title, author, publication date, target
population, publisher, and description of purpose
Annotated bibliographies on achievement, aptitude,
attitude and interests, personality, sensory motor, special
populations, vocational/occupational, and miscellaneous
ETS Test Collection
Selection of a Test
Sources (continued)
Professional journals
Test publishers and distributors
Issues to consider when selecting tests
Psychometric properties
Validity
Reliability
Length of test
Scoring and score interpretation
Selection of a Test
Issues to consider when selecting tests
Non-psychometric issues
Cost
Administrative time
Objections to content by parents or others
Duplication of testing
Selection of a Test
Designing your own tests
Get help from others with experience in
developing tests
Item writing guidelines
Avoid ambiguous and confusing wording and sentence
structure
Use appropriate vocabulary
Write items that have only one correct answer
Give information about the nature of the desired answer
Do not provide clues to the correct answer
Selection of a Test
Test administration guidelines
Plan ahead
Be certain that there is consistency across
testing sessions
Be familiar with any and all procedures
necessary to administer a test