Chapter 12: Using Diagnostic Mathematics Measures

Diagnostic Mathematics
Assessments: Main Ideas
 Now typically assess knowledge and skills on
subsets of the 10 standards specified by the National
Council of Teachers of Mathematics
Designed to identify specific strengths and weaknesses
in skill development
Attempt to assess a wide variety of skills
There are fewer diagnostic math assessments than reading
assessments, since math is more clear-cut
Purpose for Assessing Math
 Provide detailed information so that teachers and
interventionists can determine a student’s mastery of
skills and plan individualized math instruction
 Provide teachers with specific information on the
kinds of items that students pass or fail


Gives insight into how curriculum and instruction are
working in the class
Also allows for modification of the curriculum
Purpose for Assessing Math
 Teachers need to know if students have mastered
facts and concepts
 Occasionally used to make exceptionality and
eligibility decisions

Often used to establish special learning needs and
eligibility for programs for children with learning
disabilities in math
National Council of Teachers of
Mathematics
Suggests that a curriculum follow these standards in every
grade, just at different levels.
 Content Standards
 Process Standards
National Council of Teachers of
Mathematics
Content Standards- followed at all grades
 Numbers and Operations
 Algebra
 Geometry
 Measurement
 Data Analysis and Probability
National Council of Teachers of
Mathematics
So, you ask, what would these look like in First grade?
 Numbers and Operations- 3 + 1+
 Algebra- 3 + ☐= 4
 Geometry- What shape is + __________
 Measurement- measure the temperature, time etc.
 Data Analysis and Probability- Graph how many people
have teddy bears and how many have teddy dogs, teddy
rabbits
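The first-grade Data Analysis and Probability item above amounts to tallying categories and drawing a simple graph. A minimal sketch in Python, assuming made-up class responses (the data and the function names below are hypothetical, not from the chapter):

```python
from collections import Counter

# Hypothetical responses: which stuffed animal each student has.
responses = ["bear", "dog", "bear", "rabbit", "bear", "dog"]

def tally(items):
    """Count how many times each category appears."""
    return Counter(items)

def text_bar_graph(counts):
    """Return one line per category, with one 'X' per student."""
    return [f"{name:<7}{'X' * n}" for name, n in sorted(counts.items())]

counts = tally(responses)
for line in text_bar_graph(counts):
    print(line)
```

Each row of the printed graph is one category, so a student can compare the lengths of the rows to answer "which has more?"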
National Council of Teachers of
Mathematics
 Process Standards
 Problem Solving
 Reasoning and Proof
 Communication
 Connections
 Representation
National Council of Teachers of
Mathematics
 What does it look like in first grade for Process
Standards
 Reasoning and Proof

Complete the pattern …
Group Mathematics Assessment and
Diagnostic Evaluation (G-MADE)
 Group administered, norm-referenced, standard
based test for assessing the math skills of students in
K-12
 Purpose: to identify specific math skill development
strengths and weaknesses and to lead to teaching
strategies
 Test materials include a CD that provides a cross-reference
between specific math skills and teaching resources
 Diagnosis of skills is broad
G-MADE Subtests
Concepts and Communication
 Measures student knowledge of the language,
vocabulary, and representations of math
Operation and Computation
 Measures skills in using the basic operations of
addition, subtraction, multiplication, and division
Process and Application
 Measures skill in taking in the language and concepts of
math and applying the appropriate operations and
computations to solve a word problem
G-MADE Scores
 Raw scores can be converted to standard scores with a
mean of 100 and a standard deviation of 15
 Growth Scale Values are provided to track growth of
math skills
 Can track growth over one year or from year to year
Test Materials
Teacher’s Manual
Student Booklets
Answer Sheets
Hand-Scoring Template
Technical Manual
Age-Based Norms and Grade-Based Out of Level
Norms Supplement
Scoring and Reporting Software
Reliability
 All reliabilities exceed .74 with more than 90%
exceeding .80
 Only low reliabilities are 7th grade Concepts and
Communications and Process and Applications at all
grades beyond 4th
 Internal consistency and stability are sufficient for
using the test to make decisions about individuals
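One way to see why the size of a reliability coefficient matters for decisions about individuals is the standard error of measurement (SEM) from classical test theory, SEM = SD × sqrt(1 − reliability). The sketch below is illustrative only: the formula is the standard textbook one and is not taken from the G-MADE manual, the SD of 15 is assumed from the standard-score scale described above, and .90 is included only as a comparison value.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# SD of 15 is an assumption taken from the standard-score scale.
# .74 and .80 are the thresholds quoted above; .90 is for comparison.
for r in (0.74, 0.80, 0.90):
    sem = standard_error_of_measurement(15.0, r)
    print(f"reliability {r:.2f} -> SEM about {sem:.1f} standard-score points")
```

The lower the reliability, the wider the band of uncertainty around any one student's obtained score.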
Validity
 Content is based on NCTM standards
 Created based on year long study of standards,
curriculum benchmarks, scope and sequence
commonly used in math textbooks, and review of
research based on best math practices for teaching
concepts and skills
 Many studies support criterion related validity of test
 In comparison with KeyMath, all correlations were in
excess of .80, making the 2 tests highly comparable
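A correlation can be read as shared variance by squaring it. The sketch below is a general statistical illustration, not a calculation reported by the test authors; the function name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must be between -1 and 1")
    return r * r

# .80 is the lower bound quoted for the G-MADE / KeyMath correlations.
print(f"r = .80 -> about {shared_variance(0.80):.0%} shared variance")
```

So a correlation of .80 means the two tests overlap substantially, but they are not interchangeable.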
Other Information
 Test is not timed since it is meant to test power not
speed
 Older students can complete the test in a one-hour
session; most students finish in about 45
minutes
 With younger students, multiple, short testing
sessions are recommended
KeyMath-3 Diagnostic Assessment
(KeyMath-3 DA)
 An untimed, individually administered, norm-
referenced test designed to provide a comprehensive
assessment of essential math concepts and skills in
individuals ages 4 years, 6 months through 21 years
 Time: 30-40 minutes in lower elementary and 70-90
minutes for older students
 Provides a means of monitoring individual’s progress
over time with 2 parallel forms that can be
administered in alternating sequence every 3 months
 Also provides Growth Scale Values (GSVs), a type of
developmental scale score
Uses for KeyMath-3 DA
 Assess math proficiency by providing comprehensive
coverage of concepts and skills taught in regular math
instruction
 Assess student progress in math
 Support instructional planning
 Support educational placement decisions
KeyMath-3 DA
 2 parallel forms (A and B) of the test
 Each test has 372 items divided into the following subtests:
 Numeration
 Algebra
 Geometry
 Measurement
 Data Analysis and Probability
 Mental Computation and Estimation
 Addition and Subtraction
 Multiplication and Division
 Foundations of Problem Solving
 Applied Problem Solving
KeyMath-3 DA Resources
 Manual
 Two free standing easels for either Form A or B
 25 record forms with detachable Written
Computation Examinee Booklets
 Two additional products that are available:
 ASSIST Scoring and Reporting Software Program
 KeyMath-3 DA Essential Resources Instructional
Program
KeyMath-3 DA Scores
 Can be scored by hand or with software
 Relative Standing: scale scores, standard scores,
percentile rank
 Developmental Scores: grade and age equivalents,
growth scale values
 Composite Scores: basic concepts, operations,
application
 Software can produce progress reports, narrative
summaries, export scores to Excel, parent reports
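The relative-standing scores listed above can be related to one another under a normal-curve assumption. A minimal sketch, assuming a standard-score scale with mean 100 and SD 15 (the scale described earlier for G-MADE) and normally distributed scores; published tests use norm tables instead, so this is an approximation and the function names are hypothetical.

```python
from statistics import NormalDist

def z_to_standard_score(z, mean=100.0, sd=15.0):
    """Convert a z-score to a standard score on the assumed scale."""
    return mean + sd * z

def percentile_rank(score, mean=100.0, sd=15.0):
    """Approximate percentile rank of a standard score (normal assumption)."""
    return 100.0 * NormalDist(mean, sd).cdf(score)

for score in (85, 100, 115):
    print(f"standard score {score}: percentile rank about {percentile_rank(score):.0f}")
```

Under these assumptions, a score one standard deviation below the mean falls well below the middle of the distribution even though it is only 15 points from average.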
Reliability
 Internal Consistency – low in K and 1st but in other
ages exceed .80
 Alternate Form – exceed .80 with exception of
different forms for Geometry and Data Analysis and
Probability
 Adjusted Test-Retest – based on 103 students, grades
K-12 generally exceed .80 with exception of
Foundations of Problem Solving (.70) and Geometry
(.78) subtests
 Adequate for screening and diagnostic purposes
Validity
 Correlates very highly with scores on KeyMath-
Revised normative update and scores on Kaufman Test
of Educational Achievement, Measures of Academic
Progress (MAP), and G-MADE
 Evidence for content validity is good based on
alignment with state and NCTM standards
Weaknesses for Diagnostic Math
Assessments
 Recurring issue of curriculum match
 Selecting appropriate test for the type of decision to
be made
 Do not test a sufficiently detailed sample of math
concepts and facts – must generalize
 Due to weaknesses, tests are not very useful in
assessing readiness or strengths and weaknesses in
order to plan instructional programs
 Preferred practice is for teachers to develop
curriculum-based achievement tests that exactly
parallel curriculum being taught
Goal of Oral and Written
Language Assessments
“The assessment of language
competence should include
evaluation of a student’s ability to
process, both in comprehension
and in expression, language in a
spoken or written format.”
Major Communication
Processes
1. Oral Comprehension – listening
and comprehending speech
2. Written Comprehension –
reading
3. Oral Expression – speaking
4. Written Expression – writing
Related Terminology
Language Component | Reception/Comprehension | Expression/Production
Phonology | Hearing and discriminating speech sounds | Articulating speech sounds
Morphology and Syntax | Understanding the grammatical structure of language | Using the grammatical structure of language
Semantics | Understanding vocabulary, meaning, and concepts | Using vocabulary, meaning, and concepts
Pragmatics and Supralinguistics | Understanding a speaker’s or writer’s intentions | Using awareness of social aspects of language
Assessing Oral
Language
 Cultural Diversity
 Birth place, pronunciations,
comparing with the same
language community
 Developmental Considerations
 Sounds, linguistic structures,
and some semantic elements are
developmental
Considerations in Assessing
Written Language
 Content – Production
 Formulating, elaborating,
sequencing, clarifying, and
precise word choice to convey
meaning
 Form
 Penmanship, spelling,
and style rules
Observing Language Behavior
The following are the three main
procedures for gathering a
sample of a student’s language
behavior.
 Spontaneous Language
 Imitation
 Elicited Language
Observing Language Behavior
Advantages to Spontaneous
Language
 Spontaneity is the best and most
natural indicator of everyday
language performance.
 Informality makes assessment
easy, no formal testing
atmosphere.
Observing Language Behavior
Disadvantages of Spontaneous
Language
 There is a non-standard nature
to the data collected by this type
of test.
 This test can take a very long
time to collect data.
Observing Language Behavior
Advantages of Imitation
 Overcomes many of the problems
associated with the spontaneous
approach.
 Assesses many different language
elements to give a representative
view of child’s language system
 Structure of the test allows
examiner to know all elements of
language being assessed.
 Test can be administered much
more quickly than with
spontaneous tests.
Observing Language Behavior
Disadvantages of Imitation



Children’s auditory memory may affect the
results: a child can score well by imitation
without demonstrating productive
knowledge of the language structures
being tested.
A child can repeat exactly what is said if
the utterance or sentence is too short,
requiring no memory processing.
Children become very bored and can’t sit
still. There are no stimuli like pictures or
toys present, just the repetition of
50 to 100 sentences after the
examiner.
Observing Language Behavior
Advantages to Elicited Language
 Pictures can be structured to test
desired language elements while
retaining some of the
spontaneous language samples.
 Allows children to create
language on their own.
 There is no time limit so results
do not depend on child’s word
retention ability.
Observing Language Behavior
Disadvantages of
Elicited Language
 Difficult to find pictures to
guarantee exact word or sentence
response.
 Child may not produce or
attempt to produce the desired
language structure.
Tests

Test of Written Language – 4th edition (TOWL-4)
Test of Language Development: Primary – 4th edition (TOLD-P:4)
Test of Language Development: Intermediate – 4th edition (TOLD-I:4)
Oral and Written Language Scales (OWLS)
Test of Auditory Reasoning and Processing Skills (TARPS)
Six Subtests






Sentence combining. The child is
required to form one compound or
complex sentence from two or more simple
sentences spoken by the examiner.
Picture vocabulary. The child points to
the picture that best represents a series of
two-word items.
Word ordering. The child forms a
complete, correct sentence from a
randomly-ordered string of words, ranging
from three to seven in length.
Relational vocabulary. The child tells
how three words, spoken by the examiner,
are alike.
Morphological comprehension. The
child distinguishes between
grammatically correct and incorrect
sentences.
Multiple meanings. The examiner says a
word and the student responds by
saying as many different meanings for that
word as he/she can think of.
Reliability and Validity
TOLD-I:4 appears to meet and often
exceed the standards for reliability for
making screening and diagnostic
decisions.
The coefficients for reliability exceed
0.90
Unlike the TOLD-P:4, there is good
evidence for the construct validity of this
test. It is based on oral language
ability, which is known to be related
to literacy, and it correlates highly
with reading and writing
abilities.
Oral and Written Language Scales
(OWLS)
Individually administered assessment of
receptive and expressive language.
Test includes three scales:
- Listening Comprehension
- Oral Expression
- Written Expression
Ages: 3 – 21
Recommended uses:
To determine broad levels of language skills
and specific performance in listening,
speaking, and writing.
To create intervention plans and monitor
student progress.
Scores can be converted to
obtain age equivalents, percentiles, etc.
Listening Comprehension Scale
Measures understanding of
spoken language.
111 items – the examiner reads aloud
a verbal stimulus. The student
has to identify which of 4 pictures
is the best response to the
stimulus.
Oral Expression Scale
Measures understanding and
use of spoken language.
96 items – the examiner reads aloud a
verbal stimulus and shows a
picture.
The student responds orally by either
answering a question, completing
a sentence, or generating one or
more sentences.
Written Expression Scale
Measures the ability of students 5–21 years old to use
spelling, punctuation, and syntax
(sentence structure, phrases,
etc.) and to communicate with
appropriate content, coherence,
organization, etc.
The student responds to direct
writing prompts given by the
examiner.
 There are wide ranges in reliability
coefficients for this test.
 Results of this test are sufficient to
use as a screening device but are not
sufficient to use in making important
decisions about individual students.
 Authors of this test report that the
validity studies comparing these
subtests to established criterion
measured tests were similar in
performance and within the
expected range of validity.
 Theory of multiple intelligences
 Heredity
 Learn through experiences
 Today most theorists recognize the importance of both
heredity and experience.
 Intelligence test results are used to determine
eligibility for special services.
 School Psychologists are trained professionals who
administer Intelligence Tests.
 IQ tests are helpful in providing general information
as to how to pace instruction.
• An inferred ability, used to explain differences in
present behavior and to predict differences in
future behavior.
• It is a general ability that enables people to do
many different things.
 A child’s background experiences and learning
opportunities that they already have.
 Culture
 Experiences available in one’s environment
 Age
…..that may influence the psychological demands presented by the
test.
***Failure is NOT due to an inability to comprehend or solve a
problem, but a deficiency in background experience***
 Discrimination: identify the item that is different from
the others
 Generalization: given a stimulus, identify from a group
the one that goes with the stimulus
 Motor Behavior: requires a motor response in duplicating a
geometric design using blocks, tracing a path through a
maze, or reconstructing designs from memory.
 General Knowledge: factual questions
 Vocabulary: naming pictures or reading a definition and
selecting a picture (depending on age)
 Induction: State a rule or principle from a series of objects
 Comprehension: 3 types: those related to directions, to
printed material, or to social customs and mores.
 Sequencing: identify the response that continues a series
 Detail Recognition: identify the missing parts of a picture
 Analogical Reasoning: How things are related to each
other, e.g., “A : B :: C : _____?”
 Pattern Completion: completing a pattern or identifying
a missing part of a pattern
 Abstract Reasoning: identify the absurdity in a picture or
verbal statement
 Memory: many different assessments are used to measure
memory, ex. verbatim repetition of a sentence or series of
numbers
Three types of Intelligence Tests
 Individual Tests: given one on one by a certified
evaluator; most commonly used for educational
placement decisions.
 Group Tests: may be used as a screening tool for
individual students, or to gain information about
groups of students.
 Nonverbal Intelligence Tests:
Picture-Vocabulary test;
administered to non-readers, ELLs, and hearing-impaired
students.
* This test measures only one aspect of intelligence
(receptive vocabulary) and should not be used to
determine eligibility for special services.
 Developed by David Wechsler in 1949; it has since had
several revisions.
 Wechsler states, “intelligence is the overall capacity of
an individual to understand and cope with the world
around him.”
 The test is a measure of the cognitive ability and
problem-solving process of a person ages 6 years to 16
years, 11 months.
 Subtests; Core and Supplemental*:

Verbal Comprehension Index (VCI)
 Similarities
 Vocabulary
 Comprehension
 Information*
 Word Reasoning*
Wechsler Intelligence Scale for Children-IV (WISC-IV)
 Subtests; Core and Supplemental*:
 Perceptual Reasoning Index (PRI)




Block Design
Picture Concepts
Matrix Reasoning
Picture Completion*
Wechsler Intelligence Scale for Children-IV (WISC-IV)
 Subtests; Core and Supplemental*:
 Working Memory Index (WMI)



Digit Span
Letter-Number Sequencing
Arithmetic*
Wechsler Intelligence Scale for Children-IV (WISC-IV)
 Subtests; Core and Supplemental*:
 Processing Speed Index (PSI)



Coding
Symbol Search
Cancellation*
 The full-scale IQ (FSIQ) is reliable enough to make
important educational decisions. There is not
enough information gathered from the subtests
alone to make the educational decisions.
 When using the WISC-IV to determine
educational needs for a student, examiners should
only use the FSIQ.
 timed test
 sample
 2 minutes
 9 blocks
Pick one picture from each row with common characteristics
Look at this picture. What part is missing?
 Measures general intellectual ability, specific cognitive abilities,
scholastic aptitudes, oral language, and achievement.
 Individually administered and norm-referenced
 For ages 2-90+
 Computer scored
 Each Test Record contains a seven-category Test Session
Observation Checklist to rate a student’s conversational
proficiency, cooperation, activity, attention and concentration,
self-confidence, care in responding, and response to difficult tasks.
 20 subtests measuring broad and narrow abilities
 Comprehension-knowledge, long-term retrieval, visual-spatial
thinking, auditory processing, fluid reasoning, processing speed, short-term memory.
 Subtests can be combined to create additional clusters for verbal ability,
thinking ability, cognitive efficiency, phonemic awareness and working
memory.
 Additional supplemental subtests create more clusters, broad
attention, cognitive fluency and executive processes
 22 tests can be combined to form several clusters.
 Subtests and clusters from the standard battery
can be combined to form scores for broad areas in
reading, math and writing.

Oral expression, listening comprehension, basic reading skills,
reading comprehension, phoneme/grapheme knowledge, math
calculation skills, math reasoning, written expression
 Individual tests are combined to provide clusters for
educational decision making
 Cluster reliabilities for some age groups are less than
.90, but all median reliabilities across age groups for
the standard and broad cognitive and achievement
clusters exceed .90
 Careful item selection is consistent with claims for the content validity
of both tests
 Studies using a broad range of individuals provide evidence for
validity
 For the Cognitive Ability Tests, the correlations between the WJ-III
General Intellectual Ability score and the WISC-III Full-Scale IQ range
from .69 to .73
 For the Achievement Tests, the pattern and magnitude of correlations
between the Wechsler Individual tests suggest that the WJ-III measures
skills similar to those measured by other achievement tests.
 An untimed test primarily given to younger children and ELLs
 Assesses the receptive (hearing) vocabulary of examinees
 It consists of stimuli sets of 12 and examinees are tested at their ability or age
level
 As part of a broader assessment, can be useful in evaluating language
competence, selecting the level and content of instruction and measuring
learning
 The assessment of vocabulary is also useful when evaluating the effects of
injury or disease
 It is individually administered using an easel
 Available in Spanish
 Examinees earn a raw score based on the number of
pictures correctly identified between basal and ceiling
items
 Basal - the lowest set administered that contains one or no
errors
 Ceiling – the highest set administered that contains eight
or more errors
 Testing is discontinued once a ceiling is established
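The basal and ceiling rules above can be sketched in a few lines of Python. This is a minimal illustration, not the publisher's scoring procedure: the session data and function names are hypothetical, and the count below covers only items in the sets from basal through ceiling, as the text describes (any credit the published manual gives for items below the basal set is not modeled).

```python
ITEMS_PER_SET = 12        # sets of 12, as stated above
BASAL_MAX_ERRORS = 1      # basal: one or no errors
CEILING_MIN_ERRORS = 8    # ceiling: eight or more errors

def find_basal_and_ceiling(errors_by_set):
    """errors_by_set maps set number -> errors made in that set.

    Returns (basal_set, ceiling_set); either is None if not established.
    """
    basal = None
    ceiling = None
    for set_number in sorted(errors_by_set):
        errors = errors_by_set[set_number]
        if errors <= BASAL_MAX_ERRORS and basal is None:
            basal = set_number       # lowest set meeting the basal rule
        if errors >= CEILING_MIN_ERRORS:
            ceiling = set_number     # highest set meeting the ceiling rule
    return basal, ceiling

def correct_between(errors_by_set, basal, ceiling):
    """Count correct responses in sets from basal through ceiling."""
    return sum(ITEMS_PER_SET - errors
               for set_number, errors in errors_by_set.items()
               if basal <= set_number <= ceiling)

# Hypothetical session: sets 4 through 7 were administered.
session = {4: 0, 5: 2, 6: 5, 7: 9}
basal, ceiling = find_basal_and_ceiling(session)
print(basal, ceiling, correct_between(session, basal, ceiling))
```

In this made-up session, set 4 is the basal (no errors) and set 7 is the ceiling (nine errors), so testing would stop after set 7.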
 Multiple kinds of reliability are reported
 The scores of a PPVT-4 test are very precise and
consistent
 Data also included on the testing and performance of
students with disabilities
 Five studies were conducted and indicate that there is adequate validity
 Slightly lower correlations were found on assessments that measured
broader areas of language than primarily vocabulary
 Data is also provided on how students with speech and language
impairments, hearing impairments, specific learning disabilities,
mental retardation, giftedness, emotional/behavioral disturbances and
ADHD, perform in relation to the general population
 Results indicate the value of the PPVT-4 in assessing these special
populations
 Assessing children’s IQ is controversial
 Intelligence tests assess samples of behavior
 Different intelligence tests sample different behaviors
 Educators must always ask “IQ on what test?”
 Test authors have their own definitions of intelligence and therefore test those
items/behaviors they feel represent their definition
 When interpreting intelligence scores, avoid making judgments that suggest
that the score represents much more than the specific behaviors sampled
 The quality of measurement can be affected by several different types
of student characteristics and therefore must be taken into
consideration
 “Many of the behaviors sampled on intelligence tests are more
indicative of actual achievement than ability to achieve.”
 For example, “students who have had more opportunities to learn and
achieve are likely to perform better than those who have had less
exposure to information, even if they both have the same overall
potential to learn.”
 “Intelligence tests are by no means a pure representation of a student’s
ability to learn.”