Transcript Chapter 1

CHAPTER 1
Assessment in Social and Educational
Contexts (Salvia, Ysseldyke & Bolt, 2012)
Dr. Julie Esparza Brown
SPED 512: Diagnostic Assessment
Winter 2013
Chapters 1, 11, 12, 13, and 14 are included in this
presentation
AGENDA – Week 3
• Questions for the Good of the Group
• Instruction and Lab Time: Continue
WJ-III
• Break
• Group activity to process Chapters 1, 11, 12, 13, and 14
• PowerPoint overview of Chapters 1, 11, 12, 13, and 14
Individualized Support
• Schools must provide support as a function of individual
student need
• To what extent is the current level of instruction working?
• How much instruction is needed?
• What kind of instruction is needed?
• Are additional supports necessary?
Assessment Defined
• Assessment is the process of collecting information
(data) for the purpose of making decisions about students
• E.g., what to teach, how to teach, whether the student is eligible for
special services
How Are Assessment Data Collected?
• Assessment extends beyond testing and may include:
• Record review
• Observations
• Tests
• Professional judgments
• Recollections
Why Care About Assessment?
• A direct link exists between assessment and the decisions
that we make. Sometimes these decisions are critically
important.
• Thus, the procedures for gathering data are of interest to
many people – and rightfully so.
• Why might students, parents, and teachers care?
• The general public?
• Certification boards?
Common Themes Moving Forward
• Not all tests are created equal
• Differences in content, reliability, validity, and utility
• Assessment practices are dynamic
• Changes in the political, technological, and cultural landscape drive
a continuous process of revision
Common Themes Moving Forward
• The importance of assessment in education
• Educators are faced with difficult decisions
• Effective decision-making will require knowledge of
effective assessment
• Assessment can be intimidating, but significant
improvements have happened and continue to
happen
• More confidence in the technical adequacy of
instruments
• Improvements in the utility and relevance of
assessment practices
• MTSS framework
CHAPTER 11
Assessment of Academic Achievement with
Multiple-Skill Devices
Achievement Tests
• Achievement Tests
• Norm-referenced
• Allow for comparisons between students
• Criterion-referenced
• Allow for comparisons between individual students and a
skill benchmark.
• Why do we use achievement tests?
• Assist teachers in determining skills students do
and do not have
• Inform instruction
• Academic screening
• Progress evaluation
Classifying Achievement Tests
Tests vary along two dimensions: diagnostic power (high vs. low) and the number of students who can be tested at one time (low vs. high).
• High diagnostic power: less efficient administration – dense content and numerous items allow teachers to uncover specific strengths and weaknesses
• Low diagnostic power: more efficient administration – comparisons between students can be made, but very little power in determining strengths and weaknesses
• Few students at a time: less efficient administration – allows for more qualitative information about the student
• Many students at a time: efficient administration – typically only quantitative data are available
Considerations for Selecting a Test
• Four Factors
• Content validity
• What the test actually measures should match its intended
use
• Stimulus-response modes
• Students should not be hindered by the manner of test
administration or required response
• Standards used in state
• Relevant norms
• Does the student population being assessed match the
population from which the normative data were acquired?
Tests of Academic Achievement
• Peabody Individual Achievement Test (PIAT-R/NU)
• Wide Range Achievement Test 4 (WRAT4)
• Wechsler Individual Achievement Test 3 (WIAT-III)
Peabody Individual Achievement Test–Revised/Normative Update (PIAT-R/NU)
• In general…
• Individually administered; norm-referenced for K-12 students
• Norm population
• Most recent update was completed in 1998
• Representative of each grade level
• No changes to test structure
PIAT-R/NU
Subtests
• Mathematics: 100 multiple-choice items assess students’ knowledge and application of math concepts and facts
• Reading recognition: 100 multiple-choice items require students to match and name letters and words
• Reading comprehension: 81 multiple-choice items require students to select an appropriate answer following a reading passage
• General information: 100 questions presented orally; content areas include social studies, science, sports, and fine arts
• Spelling: 100 items ranging in difficulty from kindergarten (letter naming) to high school (multiple choice following verbal presentation)
• Written expression: split into two levels; Level I assesses prewriting skills, and Level II requires story writing following a picture prompt
PIAT-R/NU
• Scores
• For all but one subtest (written expression), response to each item is scored pass/fail
• Raw scores converted into:
• Standard scores
• Percentile ranks
• Normal curve equivalents
• Stanines
• 3 composite scores
• Total reading
• Total test
• Written language
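Several tests in this presentation convert raw scores into the same family of normal-curve derived scores (standard scores, percentile ranks, normal curve equivalents, and stanines). As a rough illustration of how these scales relate, here is a minimal sketch; it assumes a standard-score scale with mean 100 and SD 15 and a normal distribution, not the PIAT-R/NU's actual norm tables.

```python
from math import erf, sqrt

def derived_scores(standard_score, mean=100.0, sd=15.0):
    """Convert a standard score into other normal-curve derived scores.

    Assumes the given mean/SD (100/15 is typical for achievement-test
    composites) and normally distributed scores -- a textbook
    simplification, not a published test's conversion tables.
    """
    z = (standard_score - mean) / sd
    percentile = 50.0 * (1.0 + erf(z / sqrt(2.0)))   # normal CDF * 100
    nce = 50.0 + 21.06 * z                            # normal curve equivalent
    stanine = max(1, min(9, round(z * 2.0 + 5.0)))    # half-SD bands, clamped 1-9
    return {"z": z, "percentile": percentile, "nce": nce, "stanine": stanine}

print(derived_scores(115))  # one SD above the mean
```

Note that all four derived scores are deterministic transformations of the same z score, which is why norm-referenced tests can report several of them from one raw-score conversion.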
PIAT-R/NU
• Reliability and Validity
• Despite new norms, reliability and validity data are only available
for the original PIAT-R (1989)
• Previous reliability and validity data are likely outdated
• Outdated tests may not be relevant in the current educational context
Wide Range Achievement Test 4
(WRAT4)
• In general…
• Individually administered
• 15-45 minute test length depending on age (5-94 age range)
• Norm-referenced, but covers a limited sample of behaviors in 4
content areas
• Norm population
• Stratified across age, gender, ethnicity, geographic region, and
parental education
WRAT4
Subtests
• Word Reading: the student is required to name letters and read words
• Sentence Comprehension: the student is shown sentences and fills in missing words
• Spelling: the student writes down words as they are read aloud
• Math Computation: the student solves basic computation problems
• Scores
• Raw scores converted to:
• Standard scores, confidence
intervals, percentiles, grade
equivalents, and stanines
• Reading composite available
• Reliability
• Internal consistency and
alternate-form data are sufficient
for screening purposes
• Validity
• Performance increases with age
• WRAT4 is linked to other tests
that have since been updated;
additional evidence is necessary
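The confidence intervals the WRAT4 reports around standard scores come from the standard error of measurement (SEM). A minimal sketch of the usual computation follows, using an illustrative reliability coefficient of .91 rather than a published WRAT4 value.

```python
from math import sqrt

def score_confidence_interval(score, reliability, sd=15.0, z_crit=1.96):
    """95% confidence interval around an observed standard score.

    SEM = SD * sqrt(1 - reliability). The reliability value used in the
    example is an illustrative assumption, not a published WRAT4
    coefficient.
    """
    sem = sd * sqrt(1.0 - reliability)
    return (score - z_crit * sem, score + z_crit * sem)

low, high = score_confidence_interval(100, reliability=0.91)
print(f"95% CI: {low:.1f} to {high:.1f}")  # SEM = 4.5, so 91.2 to 108.8
```

The sketch makes the practical point concrete: the lower a test's reliability, the wider the interval around any observed score, which is why screening-level reliability is not sufficient for high-stakes individual decisions.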
Wechsler Individual Achievement Test–Third Edition (WIAT-III)
• General
• Diagnostic, norm-referenced achievement test
• Reading, mathematics, written expression, listening, and speaking
• Ages 4-19
• Norm Population
• Stratified sampling was used to sample within several common
demographic variables:
• Pre K – 12, age, race/ethnicity, sex, parent education, geographic region
WIAT-III
• Subtests and scores
• 16 subtests arranged into 7 domain composite scores and one total
achievement score (structure provided on next slide)
• Raw scores converted to:
• Standard scores, percentile ranks, normal curve equivalents, stanines,
age and grade equivalents, and growth scale value scores.
WIAT-III Subtests (by composite)
• Basic Reading: Word Reading; Pseudoword Decoding
• Reading Comprehension and Fluency: Reading Comprehension; Oral Reading Fluency; Early Reading Skills
• Mathematics: Math Problem Solving; Numerical Operations
• Math Fluency: Math Fluency – Addition, Subtraction, & Multiplication
• Written Expression: Alphabet Writing Fluency; Spelling; Sentence Composition; Essay Composition
• Oral Language: Listening Comprehension; Oral Expression
WIAT-III
• Reliability
• Adequate reliability evidence
• Split-half
• Test-retest
• Interrater agreement
• Validity
• Adequate validity evidence
• Content
• Construct
• Criterion
• Clinical utility
• Stronger reliability and validity evidence increases the relevance of information derived from the WIAT-III
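Split-half reliability, one of the coefficients reported for the WIAT-III, is typically computed by correlating odd- and even-numbered item totals and then applying the Spearman–Brown correction for the full-length test. A sketch with hypothetical pass/fail item data (not actual WIAT-III data):

```python
def pearson_r(x, y):
    """Plain Pearson correlation, no external libraries."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def split_half_reliability(item_scores):
    """Correlate odd- vs. even-item half-test totals, then apply the
    Spearman-Brown correction: r_full = 2r / (1 + r).

    item_scores: one row of item scores per examinee (hypothetical data).
    """
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd, even)
    return (2.0 * r_half) / (1.0 + r_half)

# Hypothetical pass/fail (1/0) responses to six items for five examinees:
data = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(round(split_half_reliability(data), 3))
```

The correction step matters because the raw half-test correlation underestimates the reliability of a test twice as long.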
Getting the Most Out of an Achievement Test
• Helpful but not sufficient – most tests allow teachers to
find an appropriate starting point
• What is the nature of the behaviors being sampled by the
test?
• Need to seek out additional information concerning student
strengths and weaknesses
• Which items did the student excel on? Which did he or she struggle
with?
• Were there patterns of responding?
CHAPTER 12
Using Diagnostic Reading Tests
Why Do We Assess Reading?
• Reading is fundamental to success in our society, and
therefore reading skill development should be closely
monitored
• Diagnostic tests can help to plan appropriate intervention
• Diagnostic tests can help determine a student’s continuing
need for special services
The Ways in Which Reading is Taught
• The effectiveness of different approaches is heavily
debated
• Whole-word vs. code-based approaches
• Over time, research has supported the importance of
phonemic awareness and phonics
Skills Assessed by Diagnostic
Approaches
• Oral Reading
• Rate of Reading
• Oral Reading Errors
• Teacher pronunciation/aid
• Hesitation
• Gross mispronunciation
• Partial mispronunciation
• Omission of a word
• Insertion
• Substitution
• Repetition
• Inversion
Skills Assessed by Diagnostic
Approaches (cont.)
• Reading Comprehension
• Literal comprehension
• Inferential comprehension
• Critical comprehension
• Affective comprehension
• Lexical comprehension
Skills Assessed by Diagnostic
Approaches (cont.)
• Word-Attack Skills (i.e., word analysis skills) – use of
letter-sound correspondence and sound blending to
identify words
• Word Recognition Skills – “sight vocabulary”
Diagnostic Reading Tests
• See Table 12.1
• Group Reading Assessment and Diagnostic Evaluation
(GRADE)
• DIBELS Next
• Test of Phonemic Awareness – 2 Plus (TOPA 2+)
GRADE (Williams, 2001)
• Pre-school to 12th grade
• 60 to 90 minutes
• Assesses pre-reading, reading readiness, vocabulary,
comprehension, and oral language
• Missing some important demographic information for
norm group, high total reliabilities (lower subscale
reliabilities), adequate information to support validity of
total score.
DIBELS Next
(Good and Kaminski, 2010)
• Kindergarten-6th grade
• Very brief administration (used for screening and
monitoring)
• First Sound Fluency, Letter Naming Fluency,
Phoneme Segmentation Fluency, Nonsense Word
Fluency, Oral Reading Fluency, and DAZE
(comprehension)
• Use of benchmark expectations or development of
local norms
• Multiple administrations necessary for making
important decisions
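The DIBELS fluency measures above are rates. A minimal sketch of the words-correct-per-minute computation and a benchmark comparison; the benchmark value in the example is hypothetical, not a published DIBELS Next goal.

```python
def words_correct_per_minute(words_read, errors, seconds):
    """Oral reading fluency expressed as a rate (words correct per minute)."""
    return (words_read - errors) / (seconds / 60.0)

def at_or_above_benchmark(wcpm, benchmark):
    """Compare a student's rate to a benchmark expectation.

    DIBELS timings are typically one minute; the benchmark used in the
    example below is an illustrative assumption, not a DIBELS Next goal.
    """
    return wcpm >= benchmark

rate = words_correct_per_minute(words_read=104, errors=4, seconds=60)
print(rate, at_or_above_benchmark(rate, benchmark=90))  # 100.0 True
```

Because any single one-minute sample is noisy, this kind of comparison is exactly why the slide notes that multiple administrations are necessary before making important decisions.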
TOPA 2+ (Torgesen & Bryant, 2004)
• Ages 5 to 8
• Phonemic awareness and letter-sound correspondence
• Good norms description
• Reliability better for kindergarteners than for more
advanced students
• Adequate overall validity
CHAPTER 13
Using Diagnostic Mathematics Measures
Why Do We Assess Mathematics?
• Multiple-skill assessments provide broad levels of information,
but lack specificity when compared to diagnostic assessments
• More intensive assessment of mathematics helps educators:
• Assess the extent to which current instruction is working
• Plan individualized instruction
• Make informed eligibility decisions
Ways to Teach Mathematics
• Before 1960: emphasis on basic facts and algorithms, deductive reasoning, and proofs
• 1960s: New Math; movement away from traditional approaches to mathematics instruction
• 1980s: constructivist approach – standards-based math; students construct knowledge with little or no help from teachers
• After 2000: evidence supports explicit and systematic instruction (most similar to “traditional” approaches to instruction)
Behaviors Sampled by Diagnostic
Mathematics Tests
• National Council of Teachers of Mathematics (NCTM)
• Content Standards
– Number and
operations
– Algebra
– Geometry
– Measurement
– Data analysis and
probability
• Process Standards
– Problem solving
– Reasoning and
proof
– Communication
– Connections
– Representation
Specific Diagnostic Math Tests
• Group Mathematics Assessment and Diagnostic
Evaluation (G●MADE)
• KeyMath-3 Diagnostic Assessment (KeyMath-3 DA)
G●MADE
• General
• Group administered, norm-referenced, standards-based test
• Used to identify specific math skill strengths and weaknesses
• Students K-12
• 9 levels of difficulty teachers may select from
G●MADE
• Subtests
• Concepts and communication
• Language, vocabulary, and representations of math
• Operations and computation
• Addition, subtraction, multiplication, and division
• Process and applications
• Applying appropriate operations and computations to solve word problems
G●MADE
• Scores
• Raw scores converted to:
• Standard scores, grade scores, stanines, percentiles, normal curve
equivalents, and growth scale values
• Norm population
• 2002 and 2003; nearly 28,000 students
• Selected based on geographic region, community type, socioeconomic
status, and students with disabilities
G●MADE
• Reliability
• Acceptable levels of split-half and alternate-form reliability
• Validity
• Based on NCTM standards (content validity)
• Strong criterion-related evidence
KeyMath-3 Diagnostic Assessment (KeyMath-3 DA)
• General
• Comprehensive assessment of math skills and concepts
• Untimed, individually administered, norm-referenced test; 30-40 minutes
• 4 years 6 months through 21 years
KeyMath-3 DA
Subtests
• Numeration
• Algebra
• Geometry
• Measurement
• Data analysis and probability
• Mental computation and estimation
• Addition and subtraction
• Multiplication and division
• Foundations of problem solving
• Applied problem solving
KeyMath-3 DA
• Scores
• Raw scores converted to:
• Standard scores, scaled scores, percentile rank, grade and age
equivalents, growth scale values
• Composite scores
• Operations, basic concepts, and application
• Norm population
• 3,630 individuals
• Ages 4 years 6 months through 21 years – demographic distribution
approximates data reported in the 2004 census
KeyMath-3 DA
• Reliability
• Internal consistency, alternate-form, and test-retest reliability
• Adequate for screening and diagnostic purposes
• Validity
• Adequate content and criterion-related validity evidence for all
composite scores
CHAPTER 14
Using Measures of Oral and Written
Language
Assessing Language Competence
• When assessing language skills, it is important to break
language down into processes and measure each one
– Language appears in written and verbal format
• Comprehension
• Expression
– Normal levels of comprehension ≠ normal expression
– Normal levels of expression ≠ normal comprehension
Terminology: Language as Code
• Phonology:
• Hearing and discriminating word sounds
• Semantics:
• Understanding vocabulary, meaning, and concepts
• Morphology and syntax:
• Understanding the grammatical structure of language
• Supralinguistics and pragmatics:
• Understanding a speaker’s or writer’s intentions
Assessing Oral and Written Language
• Why?
• Ability to converse and express thoughts is
desirable
• Basic oral and written language skills underlie
higher-order skills
• Considerations in assessing oral language
• Cultural diversity
• Differences in dialect are differences, not errors
• Disordered production of primary language or dialect should be
considered when evaluating oral language
• Are the norms and materials appropriate?
• Developmental considerations
• Be aware of developmental norms for language acquisition
Assessing Oral and Written Language
• Considerations in assessing written language
• Form and Content
• Penmanship
• Spelling
• Style
• May be best assessed by evaluating students’ written work and
developing tests (vocabulary, spelling, etc.) that parallel the
curriculum
Methods for Observing Language
Behavior
• Spontaneous language
– Record what child says while talking to an adult or playing with toys
– Prompts may be used for older children
– Analyze phonology, semantics, morphology, syntax, and
pragmatics
• Imitation
– Require children to repeat words, phrases, or sentences produced
by the examiner
– Valid predictor of spontaneous production
– Standardized imitation tasks often used in oral language
assessment instruments
• Elicited language
– A picture stimulus is used to elicit language
Methods for Observing Language Behavior
Advantages and disadvantages of each method
• Spontaneous language
• Advantages: most natural indicator of everyday language performance; informal testing environment
• Disadvantages: not a standardized procedure (more variability); time-intensive
• Imitation
• Advantages: comprehensive; structured and efficient administration
• Disadvantages: auditory memory may affect results; hard to draw conclusions from accurate imitations; boring for the child
• Elicited language
• Advantages: interesting and efficient; comprehensive
• Disadvantages: difficult to create valid measurement tools
Specific Oral and Written Language Tests
• Test of Written Language – Fourth Edition (TOWL-4)
• Test of Language Development: Primary – Fourth Edition
(TOLD-P:4)
• Test of Language Development: Intermediate – Fourth
Edition (TOLD-I:4)
• Oral and Written Language Scales (OWLS)
Test of Written Language – Fourth Edition
(TOWL-4)
• General
• Norm-referenced
• Designed to assess written language competence of students
between the ages of 9 and 17
• Two formats
• Contrived
• Spontaneous
TOWL-4
Subtests
• Contrived
– Vocabulary
– Spelling
– Punctuation
– Logical sentences
– Sentence combining
• Spontaneous
• Contextual conventions
• Story composition
TOWL-4
• Scores
• Raw scores can be converted to percentile or standard scores
• Three composite scores and one overall score
• Contrived writing
• Logical sentences
• Spontaneous writing
• Overall writing
TOWL-4
• Norms
– Three age ranges: 9-11, 12-14, and 15-17
– Distribution approximates nationwide school-age population for
2005; however, insufficient data are presented to confirm this
• Reliability
– Variable data for internal consistency, stability, and inter-scorer
agreement
– 2 composites reliable for making educational decisions about
students
• Validity
– Content, construct, and predictive validity evidence is presented
– Validity of inferences drawn from data is somewhat unclear
Test of Language Development: Primary –
Fourth Edition (TOLD-P:4)
• General
• Norm-referenced, untimed, individually administered test
• 4-8 years of age
• Used to:
• Identify children significantly below their peers in oral language
• Determine specific strengths and weaknesses
• Document progress in remedial programs
• Measure oral language in research studies
TOLD-P:4
• Subtests
• Picture vocabulary
• Relational vocabulary
• Oral vocabulary
• Syntactic understanding
• Sentence imitation
• Morphological completion
• Word discrimination
• Word analysis
• Word articulation
• Scores
• Raw scores converted to: age equivalents, percentile ranks, subtest scaled scores, and composite scores
• Composite scores: listening, organizing, speaking, grammar, semantics, and spoken language
TOLD-P:4
• Norm population
• 1,108 individuals across 4 geographic regions
• Sample partitioned according to the 2007 census
• Reliability
• Adequate estimates of reliability
• Coefficient alpha
• Test-retest
• Scorer difference
• Validity
• Adequate content, construct, and criterion-related
validity evidence
Test of Language Development: Intermediate –
Fourth Edition (TOLD-I:4)
• General
• Norm-referenced, untimed, individually administered test
• 8-17 years of age
• Used to:
• Identify children significantly below their peers in oral language
• Determine specific strengths and weaknesses
• Document progress in remedial programs
• Measure oral language in research studies
TOLD-I:4
• Subtests
• Sentence combining
• Picture vocabulary
• Word ordering
• Relational vocabulary
• Morphological comprehension
• Multiple meanings
• Norm population
• 1,097 students from 4 geographic regions
• Sample partitioned according to the 2007 census
• Scores
• Raw scores converted to: age equivalents, percentile ranks, subtest scaled scores, and composite scores
• Composite scores: listening, organizing, speaking, grammar, semantics, and spoken language
TOLD-I:4
• Reliability
• Adequate estimates of
reliability
• Coefficient alpha
• Test-retest
• Scorer difference
• Validity
• Adequate content, construct, and criterion-related validity evidence
Oral and Written Language Scales
(OWLS)
• General
• Norm-referenced, individually administered assessment of
receptive and expressive language
• 3-21 years of age
• Subtests
• Listening comprehension
• Oral expression
• Written expression
OWLS
• Norm population
• 1,985 students matched to 1991 census data
• Scores
• Raw scores converted to:
• Standard scores, age equivalents, normal-curve equivalents,
percentiles, and stanines
• Scores generated for each subtest, an oral language composite, and for
a written language composite
OWLS
• Reliability
• Sufficient internal and test-retest reliability for screening, but not for
making important decisions about individual students
• Validity
• Adequate criterion-related validity