Chapter 12: Using Diagnostic Mathematics Measures
Diagnostic Mathematics
Assessments: Main Ideas
Now typically assess knowledge and skills on
subsets of the 10 standards specified by the National
Council of Teachers of Mathematics (NCTM)
Designed to identify specific strengths and weaknesses
in skill development
Attempt to assess a wide variety of skills
There are fewer diagnostic math assessments than
diagnostic reading assessments, since math is more clear-cut
Purpose for Assessing Math
Provide detailed information so that teachers and
interventionists can determine a student’s mastery of
skills and plan individualized math instruction
Provide teachers with specific information on the
kinds of items that students pass or fail
Gives insight into how curriculum and instruction are
working in the class
Also allows for modification of the curriculum
Purpose for Assessing Math
Teachers need to know if students have mastered
facts and concepts
Occasionally used to make exceptionality and
eligibility decisions
Often used to establish special learning needs and
eligibility for programs for children with learning
disabilities in math
National Council of Teachers of
Mathematics
Suggests that a curriculum follow these standards in
all grades, just at different levels.
Content Standards
Process Standards
National Council of Teachers of
Mathematics
Content Standards- followed at all grades
Numbers and Operations
Algebra
Geometry
Measurement
Data Analysis and Probability
National Council of Teachers of
Mathematics
So, you ask, what would these look like in First grade?
Numbers and Operations- 3 + 1+
Algebra- 3 + ☐= 4
Geometry- What shape is + __________
Measurement- measure the temperature, time etc.
Data Analysis and Probability- Graph how many people
have teddy bears and how many have teddy dogs, teddy
rabbits
National Council of Teachers of
Mathematics
Process Standards
Problem Solving
Reasoning and Proof
Communication
Connections
Representation
National Council of Teachers of
Mathematics
What does it look like in first grade for Process
Standards
Reasoning and Proof
Complete the patter …
Group Mathematics Assessment and
Diagnostic Evaluation (G-MADE)
Group-administered, norm-referenced, standards-based
test for assessing the math skills of students in
K-12
Purpose: to identify specific math skill development
strengths and weaknesses and to lead to teaching
strategies
Test materials include a CD that provides a cross-reference
between specific math skills and teaching resources
Diagnosis of skills is broad
G-MADE Subtests
Concepts and Communication
Measures student knowledge of the language,
vocabulary, and representations of math
Operation and Computation
Measures skills in using the basic operations of
addition, subtraction, multiplication, and division
Process and Application
Measures skill in taking in the language and concepts of
math and applying the appropriate operations and
computations to solve a word problem
G-MADE Scores
Raw scores can be converted to standard scores with a
mean of 100 and a standard deviation of 15
Growth Scale Values are provided to track growth of
math skills
Can track growth over one year or from year to year
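The conversion to a mean of 100 and a standard deviation of 15 can be illustrated with a short sketch. Published tests such as G-MADE use norm tables from the manual, so this linear formula is only an illustration of the scale. The function name and the norm-group mean and standard deviation below are hypothetical values, not figures from the test.

```python
def standard_score(raw, norm_mean, norm_sd, mean=100.0, sd=15.0):
    """Linearly rescale a raw score to a standard-score metric.

    norm_mean and norm_sd describe the raw-score distribution of the
    norm group (hypothetical values here, not taken from any manual).
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd   # distance from the norm mean in SD units
    return round(mean + sd * z)       # re-express on the mean-100, SD-15 scale


# A raw score one SD above the (hypothetical) norm mean
print(standard_score(34, norm_mean=28, norm_sd=6))
```

A raw score equal to the norm-group mean maps to 100, and each norm-group standard deviation above or below the mean adds or subtracts 15 points.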
Test Materials
Teacher’s Manual
Student Booklets
Answer Sheets
Hand-Scoring Template
Technical Manual
Age-Based Norms and Grade-Based Out of Level
Norms Supplement
Scoring and Reporting Software
Reliability
All reliabilities exceed .74 with more than 90%
exceeding .80
The only low reliabilities are for Concepts and
Communication in 7th grade and for Process and
Applications at all grades beyond 4th
Internal consistency and stability are sufficient for
using the test to make decisions about individuals
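One way to see what a reliability coefficient means for an individual score is the standard error of measurement, SEM = SD × sqrt(1 − reliability). The sketch below is a general psychometric illustration, not a procedure taken from the G-MADE manual. It assumes the standard-score scale with a standard deviation of 15, and the function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), for reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(score, sd, reliability, z=1.96):
    """Approximate 95% band around an obtained score (assumes normal errors)."""
    sem = standard_error_of_measurement(sd, reliability)
    return (score - z * sem, score + z * sem)


# Compare the two reliability levels mentioned on the slide
for r in (0.74, 0.80):
    print(r, round(standard_error_of_measurement(15, r), 2))
```

The lower the reliability, the wider the band around a student's obtained score, which is why higher coefficients are expected when scores are used for decisions about individuals.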
Validity
Content is based on NCTM standards
Created based on a year-long study of standards,
curriculum benchmarks, the scope and sequence
commonly used in math textbooks, and a review of
research on best practices for teaching math
concepts and skills
Many studies support the criterion-related validity of the test
In comparison with KeyMath, all correlations were in
excess of .80, making the 2 tests highly comparable
Other Information
Test is not timed since it is meant to test power, not
speed
Older students can complete the test in a single one-hour
session; most students finish in about 45
minutes
With younger students, multiple, short testing
sessions are recommended
KeyMath-3 Diagnostic Assessment
(KeyMath-3 DA)
An untimed, individually administered, norm-
referenced test designed to provide a comprehensive
assessment of essential math concepts and skills in
individuals ages 4 years, 6 months through 21 years
Time: 30-40 minutes in lower elementary and 70-90
minutes for older students
Provides a means of monitoring individual’s progress
over time with 2 parallel forms that can be
administered in alternating sequence every 3 months
Also provides Growth Scale Values (GSVs), a type of
developmental scale score
Uses for KeyMath-3 DA
Assess math proficiency by providing comprehensive
coverage of concepts and skills taught in regular math
instruction
Assess student progress in math
Support instructional planning
Support educational placement decisions
KeyMath-3 DA
2 parallel forms (A and B) of the test
Each test has 372 items divided into the following subtests:
Numeration
Algebra
Geometry
Measurement
Data Analysis and Probability
Mental Computation and Estimation
Addition and Subtraction
Multiplication and Division
Foundations of Problem Solving
Applied Problem Solving
KeyMath-3 DA Resources
Manual
Two free standing easels for either Form A or B
25 record forms with detachable Written
Computation Examinee Booklets
Two additional products that are available:
ASSIST Scoring and Reporting Software Program
KeyMath-3 DA Essential Resources Instructional
Program
KeyMath-3 DA Scores
Can be scored by hand or with software
Relative Standing: scale scores, standard scores,
percentile rank
Developmental Scores: grade and age equivalents,
growth scale values
Composite Scores: basic concepts, operations,
application
Software can produce progress reports, narrative
summaries, export scores to Excel, parent reports
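The relationship between a standard score and a percentile rank can be sketched as follows. The actual percentile ranks for KeyMath-3 DA come from the publisher's norm tables or software. This sketch assumes normally distributed standard scores with a mean of 100 and a standard deviation of 15, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_rank(score, mean=100.0, sd=15.0):
    """Approximate percentile rank of a standard score.

    Assumes scores in the norm group follow a normal distribution;
    real tests report percentile ranks from empirical norm tables.
    """
    return round(100 * NormalDist(mean, sd).cdf(score))


for s in (85, 100, 115):
    print(s, percentile_rank(s))
```

Under this assumption a score at the mean falls at the 50th percentile, and scores one standard deviation below or above the mean fall near the 16th and 84th percentiles.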
Reliability
Internal Consistency – low in K and 1st grade, but
exceeds .80 at other ages
Alternate Form – exceeds .80 with the exception of
Geometry and Data Analysis and
Probability
Adjusted Test-Retest – based on 103 students in grades
K-12; generally exceeds .80 with the exception of the
Foundations of Problem Solving (.70) and Geometry
(.78) subtests
Adequate for screening and diagnostic purposes
Validity
Correlates very highly with scores on KeyMath-
Revised normative update and scores on Kaufman Test
of Educational Achievement, Measures of Academic
Progress (MAP), and G-MADE
Evidence for content validity is good based on
alignment with state and NCTM standards
Weaknesses for Diagnostic Math
Assessments
Recurring issue of curriculum match
Selecting appropriate test for the type of decision to
be made
Do not test a sufficiently detailed sample of math
concepts and facts – must generalize
Due to weaknesses, tests are not very useful in
assessing readiness or strengths and weaknesses in
order to plan instructional programs
Preferred practice is for teachers to develop
curriculum-based achievement tests that exactly
parallel curriculum being taught
Goal of Oral and Written
Language Assessments
“The assessment of language
competence should include
evaluation of a student’s ability to
process, both in comprehension
and in expression, language in a
spoken or written format.”
Major Communication
Processes
1. Oral Comprehension – listening
and comprehending speech
2. Written Comprehension –
reading
3. Oral Expression – speaking
4. Written Expression – writing
Related Terminology
Language Component | Reception/Comprehension | Expression/Production
Phonology | Hearing and discriminating speech sounds | Articulating speech sounds
Morphology and Syntax | Understanding the grammatical structure of language | Using the grammatical structure of language
Semantics | Understanding vocabulary, meaning, and concepts | Using vocabulary, meaning, and concepts
Pragmatics and Supralinguistics | Understanding a speaker’s or writer’s intentions | Using awareness of social aspects of language
Assessing Oral
Language
Cultural Diversity
Birth place, pronunciations,
comparing with the same
language community
Developmental Considerations
Sounds, linguistic structures,
and some semantic elements are
developmental
Considerations in Assessing
Written Language
Content – Production
Formulating, elaborating,
sequencing, clarifying, and
precise word choice to convey
meaning
Form
Penmanship, spelling,
and style rules
Observing Language Behavior
The following are the three main
procedures for gathering a
sample of a student’s language
behavior.
Spontaneous Language
Imitation
Elicited Language
Observing Language Behavior
Advantages to Spontaneous
Language
Spontaneity is the best and most
natural indicator of everyday
language performance.
Informality makes assessment
easy, no formal testing
atmosphere.
Observing Language Behavior
Disadvantages of Spontaneous
Language
The data collected with this approach
are non-standardized.
Collecting a spontaneous language sample
can take a very long time.
Observing Language Behavior
Advantages of Imitation
Overcomes many of the problems
associated with the spontaneous
approach.
Assesses many different language
elements to give a representative
view of child’s language system
Structure of the test allows
examiner to know all elements of
language being assessed.
Test can be administered much
more quickly than with
spontaneous tests.
Observing Language Behavior
Disadvantages of Imitation
Children’s auditory memory may affect the
results – a child can score well by imitation
without demonstrating productive
knowledge of the language structures
being tested.
A child can repeat exactly what is said if
the utterance or sentence is too short to
require memory processing.
Children can become bored and restless. There
are no stimuli such as pictures or toys
present, just the repetition of 50 to 100
sentences after the examiner.
Observing Language Behavior
Advantages to Elicited Language
Pictures can be structured to test
desired language elements while
retaining some of the
spontaneous language samples.
Allows children to create
language on their own.
There is no time limit so results
do not depend on child’s word
retention ability.
Observing Language Behavior
Disadvantages of
Elicited Language
Difficult to find pictures to
guarantee exact word or sentence
response.
Child may not produce or
attempt to produce the desired
language structure.
Tests
Test of Written Language – 4th edition (TOWL-4)
Test of Language Development: Primary – 4th edition (TOLD-P:4)
Test of Language Development: Intermediate – 4th edition (TOLD-I:4)
Oral and Written Language Scales (OWLS)
Test of Auditory Reasoning and Processing Skills (TARPS)
Six Subtests
Sentence combining. The child is
required to form one compound or
complex sentence from two or more simple
sentences spoken by the examiner.
Picture vocabulary. The child points to
the picture that best represents a series of
two-word items.
Word ordering. The child forms a
complete, correct sentence from a
randomly-ordered string of words, ranging
from three to seven in length.
Relational vocabulary. The child tells
how three words, spoken by the examiner,
are alike.
Morphological comprehension. The
child distinguishes between
grammatically correct and incorrect
sentences.
Multiple meanings. The examiner says a
word and the student responds by
saying as many different meanings for that
word as he/she can think of.
Reliability and Validity
TOLD-I:4 appears to meet and often
exceed the standards for reliability for
making screening and diagnostic
decisions.
The coefficients for reliability exceed
0.90
Unlike the TOLD-P:4, there is good
evidence for the construct validity of this
test. It is based on oral language
ability, which is known to be related
to literacy, and it correlates highly
with reading and writing abilities.
Oral and Written Language Scales
(OWLS)
Individually administered assessment of
receptive and expressive language.
Test includes three scales:
- Listening Comprehension
- Oral Expression
- Written Expression
Recommended uses: Ages 3 – 21
To determine broad levels of language skills
and specific performance in listening,
speaking, and writing.
To create intervention plans and monitor
student progress. Scores can be converted to
obtain age equivalents, percentiles, etc.
Measures understanding of
spoken language
111 items – examiner reads aloud
a verbal stimulus. The student
has to identify which of 4 pictures
is the best response to the
stimulus.
Measures understanding of and
use of spoken language.
96 items – examiner reads aloud a
verbal stimulus and shows a
picture.
Student responds orally by either
answering a question, completing
a sentence, or generating one or
more sentences.
Measures the ability of students 5-21 yrs old to use
spelling, punctuation, and syntax
(sentence structure, phrases,
etc.) and to communicate with
appropriate content, coherence,
organization, etc.
The student responds to direct
writing prompts by the
examiner.
There are wide ranges in reliability
coefficients for this test.
Results of this test are sufficient to
use as a screening device but are not
sufficient to use in making important
decisions about individual students.
The authors of this test report that, in
validity studies comparing these
subtests with established criterion
measures, performance was similar
and within the
expected range of validity.
Theory of multiple intelligences
Heredity
Learn through experiences
Today most theorists recognize the importance of both
heredity and experience.
Intelligence test results are used to determine
eligibility for special services.
School Psychologists are trained professionals who
administer Intelligence Tests.
IQ tests are helpful in providing general information
as to how to pace instruction.
• An inferred ability; to explain differences in
present behavior and to predict differences in
future behavior.
• It is a general ability that enables people to do
many different things.
A child’s background experiences and learning
opportunities that they already have.
Culture
Experiences available in one’s environment
Age
…..that may influence the psychological demands presented by the
test.
***Failure is NOT due to an inability to comprehend or solve a
problem, but a deficiency in background experience***
Discrimination: identify the item that is different from
the others
Generalization: given a stimulus, identify from a group
the one that goes with the stimulus
Motor Behavior: requires motor response in duplicating a
geometric design using blocks, tracing a path through a
maze, or reconstructing designs from memory.
General Knowledge: factual questions
Vocabulary: naming pictures or reading a definition and
selecting a picture (depending on age)
Induction: State a rule or principle from a series of objects
Comprehension: 3 types: those related to directions, to
printed material, or to social customs and mores.
Sequencing: identify the response that continues a series
Detail Recognition: identify the missing parts of a picture
Analogical Reasoning: How things are related to each
other, “A : B :: C : _____?”
Pattern Completion: completing a pattern or identifying
a missing part of a pattern
Abstract Reasoning: identify the absurdity in a picture or
verbal statement
Memory: many different assessments are used to measure
memory, ex. verbatim repetition of a sentence or series of
numbers
Individual Tests: given one on one by a certified
evaluator; most commonly used for educational
placement decisions.
Three types of Intelligence Tests
Group Tests: may be used as a screening tool for
individual students, or to gain information about
groups of students.
Nonverbal Intelligence Tests:
Picture- Vocabulary test;
Administered to non-readers, ELL’s and hearing
impaired students.
* This test measures only one aspect of intelligence
(receptive vocabulary,) and should not be used to
determine eligibility for special services.
Developed by David Wechsler in 1949, it has since had
several revisions.
Wechsler states, “intelligence is the overall capacity of
an individual to understand and cope with the world
around him.”
The test is a measure of the cognitive ability and
problem-solving process of a person ages 6 years to 16
years, 11 months.
Subtests; Core and Supplemental*:
Verbal Comprehension Index (VCI)
Similarities
Vocabulary
Comprehension
Information*
Word Reasoning*
Wechsler Intelligence Scale for Children-IV (WISC-IV)
Subtests; Core and Supplemental*:
Perceptual Reasoning Index (PRI)
Block Design
Picture Concepts
Matrix Reasoning
Picture Completion*
Wechsler Intelligence Scale for Children-IV (WISC-IV)
Subtests; Core and Supplemental*:
Working Memory Index (WMI)
Digit Span
Letter-Number Sequencing
Arithmetic*
Wechsler Intelligence Scale for Children-IV (WISC-IV)
Subtests; Core and Supplemental*:
Processing Speed Index (PSI)
Coding
Symbol Search
Cancellation*
The full-scale IQ (FSIQ) is reliable enough to make
important educational decisions. There is not
enough information gathered from the subtests
alone to make the educational decisions.
When using the WISC-IV to determine
educational needs for a student, examiners should
only use the FSIQ.
Block Design sample: timed test, 2 minutes, 9 blocks
Pick one picture from each row with common characteristics
Look at this picture. What part is missing?
Measures general intellectual ability, specific cognitive abilities,
scholastic aptitudes, oral language, and achievement.
Individually administered and norm-referenced
For ages 2-90+
Computer scored
Each Test Record contains a seven-category Test Session
Observation Checklist to rate a student’s conversational
proficiency, cooperation, activity, attention and concentration,
self-confidence, care in responding and response to difficult tasks.
20 subtests measuring broad and narrow abilities
Comprehension-knowledge, long-term retrieval, visual-spatial
thinking, auditory processing, fluid reasoning, processing speed,
short-term memory.
Subtests can be combined to create additional clusters for verbal ability,
thinking ability, cognitive efficiency, phonemic awareness and working
memory.
Additional supplemental subtests create more clusters, broad
attention, cognitive fluency and executive processes
22 tests can be combined to form several clusters.
Subtests and clusters from the standard battery
can be combined to form scores for broad areas in
reading, math and writing.
Oral expression, listening comprehension, basic reading skills,
reading comprehension, phoneme/grapheme knowledge, math
calculation skills, math reasoning, written expression
Individual tests are combined to provide clusters for
educational decision making
Cluster reliabilities for some age groups are less than
.90, but all median reliabilities across age groups for
the standard and broad cognitive and achievement
clusters exceed .90
Careful item selection is consistent with claims for the content validity
of both tests
Studies using a broad range of individuals provide evidence for
validity
For the Cognitive Ability Tests, the correlations between the WJ-III
General Intellectual Ability score and the WISC-III Full-Scale IQ range
from .69 to .73
For the Achievement Tests, the pattern and magnitude of correlations
with the Wechsler individual tests suggest that the WJ-III measures
skills similar to those measured by other achievement tests.
An untimed test primarily given to younger children and ELLs
Assesses the receptive (hearing) vocabulary of examinees
It consists of stimulus sets of 12 items, and examinees are tested at their ability or age
level
As part of a broader assessment, can be useful in evaluating language
competence, selecting the level and content of instruction and measuring
learning
The assessment of vocabulary is also useful when evaluating the effects of
injury or disease
It is individually administered using an easel
Available in Spanish
Examinees earn a raw score based on the number of
pictures correctly identified between basal and ceiling
items
Basal - the lowest set administered that contains one or no
errors
Ceiling – the highest set administered that contains eight
or more errors
Testing is discontinued once a ceiling is established
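The basal and ceiling rules above can be sketched in code. This is a minimal illustration of the rules as stated on this slide, assuming sets of 12 items. It is not the publisher's scoring procedure, and the manual's exact raw-score rule (for example, how items below the basal are credited) should be consulted. The function name and the sample error counts are hypothetical.

```python
SET_SIZE = 12            # items per set, as described above
BASAL_MAX_ERRORS = 1     # basal: one or no errors in the set
CEILING_MIN_ERRORS = 8   # ceiling: eight or more errors in the set


def score_sets(errors_by_set):
    """Return (basal_set, ceiling_set, items_correct).

    errors_by_set maps each administered set number to its error count.
    items_correct counts correct responses from the basal set through
    the ceiling set, or is None if either one was not established.
    """
    basal_sets = [s for s, e in errors_by_set.items() if e <= BASAL_MAX_ERRORS]
    ceiling_sets = [s for s, e in errors_by_set.items() if e >= CEILING_MIN_ERRORS]
    basal = min(basal_sets) if basal_sets else None        # lowest qualifying set
    ceiling = max(ceiling_sets) if ceiling_sets else None  # highest qualifying set
    if basal is None or ceiling is None:
        return basal, ceiling, None
    correct = sum(SET_SIZE - e for s, e in errors_by_set.items()
                  if basal <= s <= ceiling)
    return basal, ceiling, correct


# Hypothetical session: sets 5 through 8 were administered
print(score_sets({5: 1, 6: 3, 7: 5, 8: 9}))
```

In this hypothetical session, set 5 is the basal (one error), set 8 is the ceiling (nine errors), and testing would stop after set 8.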
Multiple kinds of reliability are reported
The scores of a PPVT-4 test are very precise and
consistent
Data also included on the testing and performance of
students with disabilities
Five studies were conducted and indicate that there is adequate validity
Slightly lower correlations were found on assessments that measured
broader areas of language than primarily vocabulary
Data is also provided on how students with speech and language
impairments, hearing impairments, specific learning disabilities,
mental retardation, giftedness, emotional/behavioral disturbances and
ADHD, perform in relation to the general population
Results indicate the value of the PPVT-4 in assessing these special
populations
Assessing children’s IQ is controversial
Intelligence tests assess samples of behavior
Different intelligence tests sample different behaviors
Educators must always ask “IQ on what test?”
Test authors have their own definitions of intelligence and therefore test those
items/behaviors they feel represent their definition
When interpreting intelligence scores, avoid making judgments that suggest
that the score represents much more than the specific behaviors sampled
The quality of measurement can be affected by several different types
of student characteristics and therefore must be taken into
consideration
“Many of the behaviors sampled on intelligence tests are more
indicative of actual achievement than ability to achieve.”
For example, “students who have had more opportunities to learn and
achieve are likely to perform better than those who have had less
exposure to information, even if they both have the same overall
potential to learn.”
“Intelligence tests are by no means a pure representation of a student’s
ability to learn.”