Validity of Test Score Interpretation and Use

Understanding the Construct to be Assessed
Stephen N. Elliott, PhD
Learning Science Institute &
Dept. of Special Education
Vanderbilt University
Elliott / October 2007
Construct: Dictionary Definition

To form by assembling parts; build.

To create by systematically arranging
ideas or expressions.

Something, especially a concept, that
is synthesized or constructed from
simple elements.
Basic Premises

Understanding the construct to be assessed
is critical to developing a plan to validate the
resulting test score interpretation.

Understanding what is meant by the term
construct is important to facilitating
communication with test score users and
others interested in student achievement.
Constructs & Test Score Validity: Some History

The term construct – logical or hypothetical – originated in Bertrand Russell's 1929 maxim that, wherever possible, logical constructions are to be substituted for inferred entities.

MacCorquodale & Meehl (1948) distinguished hypothetical constructs (unobservable, inferred entities) from intervening variables (abstractions from observations).

The term has appeared in the Test Standards since the 1954 edition published by APA, which defined construct validity as "the degree to which the individual possesses some hypothetical trait or quality [construct] presumed to be reflected in the test performance."
More History: Construct as Attribute

The concept of validating a construct was more fully developed by Cronbach & Meehl (1955), who referred to a construct as an attribute. They went on to list construct validation procedures: (a) criterion-group differences, (b) factor analysis, (c) item analysis, (d) experimental studies, and (e) studies of process (two of these are sketched in code below).

Through the work of Cronbach, with contributions from Messick (1980, 1989), the common view became a single conception of validity referred to as construct validity. Thus, the validation of a test score can be taken to include every form of evidence that the score, to some acceptable extent, measures a specified attribute – a quantifiable property or quality – of a respondent.
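To make two of these procedures concrete, here is a minimal sketch of (a) criterion-group differences and (c) item analysis. Everything in it is hypothetical: the item scores, group labels, and values are invented for illustration and come from no actual assessment.

# Hypothetical sketch of two of Cronbach & Meehl's validation procedures:
# (a) criterion-group differences and (c) item analysis (item-total correlation).
# All data below are invented for illustration only.
from statistics import mean
from math import sqrt

# Each row holds 0/1 item scores for one student; group labels are a known criterion.
scores = [
    [1, 1, 1, 0, 1],  # students expected to stand high on the construct
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],  # students expected to stand low on the construct
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
]
groups = ["high", "high", "high", "low", "low", "low"]
totals = [sum(row) for row in scores]

# (a) Criterion-group differences: do the expected groups differ in mean total score?
high_mean = mean(t for t, g in zip(totals, groups) if g == "high")
low_mean = mean(t for t, g in zip(totals, groups) if g == "low")
print(f"criterion-group difference in mean totals: {high_mean - low_mean:.2f}")

# (c) Item analysis: correlation of each item with the total score.
# Items that do not track the total may not reflect the intended attribute.
def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

for i in range(len(scores[0])):
    item = [row[i] for row in scores]
    print(f"item {i + 1} item-total correlation: {pearson(item, totals):.2f}")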
Nature of Attributes

Observable and unobservable

Achievements and aptitudes

Levels of Inference: Abstractive to existential
Thank goodness for test items that yield scores! Items help define the content from which we make attributions. These attributions often take the form of a test score interpretation.
Test Score Interpretation

The proposed interpretation refers to the construct or
concepts the test is intended to measure. Examples
of constructs are mathematics achievement,
performance as a computer technician, …. To support
test development, the proposed interpretation is
elaborated by describing its scope and extent and by
delineating the aspects of the construct that are to be
represented. The detailed description provides a
conceptual framework for the test, delineating the
knowledge, skills, abilities, …to be assessed.
(AERA, APA, & NCME, 1999, p. 9)
Our World: Student Achievement

We are interested in understanding student achievement. That is, the knowledge and skills students possess at a given point in time in the content domains of language arts, mathematics, and science.

We gain insights into student achievement by observing the amount or quantity of knowledge and skills students possess in these defined content domains. This amount or quantity of the measured attribute takes the form of a test score (a simple sketch follows this list).

We attribute more knowledge or skills to samples of behavior or work in which students demonstrate "correct" responses to a correspondingly larger number of items or to more complex types of items.

Our interpretations about student attributes are situated within broad academic content domains and framed by performance level descriptors.
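A minimal sketch of that logic follows: item-level points are summed into a test score, and the total is then framed by performance level descriptors. All item points, cut scores, and level names are hypothetical placeholders rather than actual state standards.

# Hypothetical sketch: item-level responses -> total test score -> performance level.
# Every item value, cut score, and level name below is invented for illustration.

item_scores = {"item_1": 1, "item_2": 0, "item_3": 2, "item_4": 1}  # points earned
total_score = sum(item_scores.values())

# Performance level descriptors frame the interpretation of the total score.
performance_levels = [  # (minimum score, level label), in ascending order
    (0, "Below Basic"),
    (2, "Basic"),
    (4, "Proficient"),
    (6, "Advanced"),
]

def performance_level(score, cuts):
    """Return the highest level whose cut score the total score meets."""
    label = cuts[0][1]
    for cut, name in cuts:
        if score >= cut:
            label = name
    return label

print(f"total score: {total_score}")                                    # 4
print(f"level: {performance_level(total_score, performance_levels)}")   # Proficient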
Construct Logic Simplified
Observed & Inferred Performances on Item/Task → Test Score → Test Score Interpretation & Abstracted Attribution
Unified View of Validity

The 1985 Test Standards and Messick's epic chapter united all types of validity under construct validity. As described by Messick (1989), "construct validity is … the unifying concept of validity that integrates content and criterion considerations into a common framework for testing rational hypotheses about theoretically relevant relationships."
Information Sources for the Constructs Assessed with Alternate Assessments (AAs)

State’s academic content standards,

State’s academic achievement standards, in
particular, the Performance Level Descriptors for each
content area,

Validity & alignment studies as reported in Alternate
Assessment Technical Manuals, and

Reports to consumers of the assessment results.
Sample Content Framework
Sample Performance Level Descriptors
Sample Evidence-Based Support for Construct Claims
Another Sample of Evidence to Support Construct Claims
More on Validity & Test Score Interpretation
As we investigate the constructs measured by alternate assessments, we are confronted with a number of issues that affect the validity of the test score interpretation. For example (two of these are sketched in code after this list):
 Teachers’ support and prompting,
 Tests with items or tasks that are non-academic,
 Assessments that sample a limited portion of the
intended domain, and
 Item or task rubrics that score for more than
achievement.
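To make the last two issues concrete, here is a minimal hypothetical sketch: a coverage check suggesting construct underrepresentation, and a flag for rubric criteria that introduce construct-irrelevant variance. The standards, item-to-standard mappings, and rubric criteria are invented for illustration and are not drawn from any actual alternate assessment.

# Hypothetical sketch of two validity checks; all standards, item mappings,
# and rubric criteria below are invented for illustration.

# Construct underrepresentation: how much of the intended content domain
# do the assessment's items actually sample?
intended_domain = {"number_sense", "algebra", "geometry", "measurement", "data_analysis"}
items_by_standard = {
    "task_1": "number_sense",
    "task_2": "number_sense",
    "task_3": "algebra",
}
sampled = set(items_by_standard.values())
coverage = len(sampled & intended_domain) / len(intended_domain)
print(f"domain coverage: {coverage:.0%}")
print(f"unsampled strands: {sorted(intended_domain - sampled)}")

# Construct-irrelevant variance: rubric criteria that score something other
# than the intended academic achievement distort the test score.
rubric_criteria = {
    "accuracy_of_student_response": "achievement",
    "level_of_teacher_prompting": "irrelevant",  # scores support, not achievement
    "student_engagement": "irrelevant",          # scores behavior, not achievement
}
flagged = [name for name, kind in rubric_criteria.items() if kind == "irrelevant"]
print(f"criteria adding construct-irrelevant variance: {flagged}")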
Construct Underrepresentation & Construct-Irrelevant Variance
Understanding the construct assessed is foundational
"A validity argument provides an overall evaluation of the plausibility of the proposed interpretations and uses of test scores…. To evaluate … a test score interpretation, it is necessary to be clear about what the interpretation claims."
(Kane, 2002)
Thanks & more

Key References

AERA, APA, & NCME (1999). Standards for educational and psychological testing. Washington, DC: Authors.

Kane, M. (2002). Validating high-stakes testing programs. Educational Measurement: Issues and Practice, 21(1), 31-41.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates.

Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5-11.
Contact Information

[email protected]