
Technical Assessment Development and Validation: Methods for Ensuring the Utility, Validity, and Reliability of Technical Skill Assessment Systems
Session Outcomes
• Build understanding of key criteria for technical skill assessments:
  – Utility
  – Validity
  – Reliability
• Understand a state-led process for developing a technical assessment system that meets such criteria.
Carl D. Perkins Career and Technical Education Act of 2006
• Each state established a performance accountability system with multiple measures of student learning, program completion, and transitions to further education, employment, and the military.
• Perkins III allowed wide flexibility in how to measure “technical skill attainment.”
• Perkins IV requires a more focused assessment approach for technical skill attainment.
Technical Skill Attainment – Secondary
• Sec. 113(b)(2)(A)
• “…core indicators of performance…that are valid and reliable…measures of each of the following:”
• “Student attainment of career and technical skill proficiencies, including student achievement on technical assessments, that are aligned with industry-recognized standards, if available and appropriate.”
Technical Skill Attainment – Postsecondary
• Sec. 113(b)(2)(B)
• “…core indicators of performance…that are valid and reliable…measures of each of the following:”
• “Student attainment of career and technical skill proficiencies, including student achievement on technical assessments, that are aligned with industry-recognized standards, if available and appropriate.”
• “Student attainment of an industry-recognized credential, a certificate, or a degree.”
Critical Features …
a) Measure what is important
b) Useful and timely feedback to stakeholders
c) Fair, consistent and accurate measures (e.g., reliable and valid assessments)
Key Concepts
• Utility
• Validity
• Reliability
Utility: Something useful or designed for use
Some Core Assumptions
1. We have to do this, so let’s do it in a way that is going to be maximally useful to our stakeholders.
2. Assessment systems should ultimately influence and reflect what is occurring in the educational setting(s).
3. Without buy-in, items 1 & 2 will never happen.
4. A systematic process for stakeholder involvement and communication must be explicitly planned and built in.
Validity: To what extent does the assessment measure what it is supposed to measure?
Commonly used methods…
• Face and content validity
• Construct validity (convergent, divergent, factor analytic techniques, etc.)
• Criterion-related validity (concurrent validity, predictive validity)
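To make criterion-related validity concrete: it is often estimated by correlating scores on the new assessment with an external criterion, such as an established industry exam taken by the same students. The sketch below is purely illustrative; the scores are made up and the plain Pearson correlation is just one simple way to express the relationship.

```python
# Illustrative sketch of criterion-related (concurrent) validity: correlate
# scores on a new technical assessment with an external criterion, such as an
# established industry exam taken by the same students. All data are made up.

from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

new_assessment = [72, 85, 90, 64, 77, 93, 58, 81, 69, 88]   # hypothetical scores
industry_exam  = [70, 88, 86, 60, 75, 95, 55, 79, 72, 90]   # hypothetical criterion

print(f"Concurrent validity estimate: r = {pearson(new_assessment, industry_exam):.2f}")
```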
Reliability refers to the stability or consistency of assessment results: does the assessment yield consistent results across different raters, different periods of time, different samples of tasks, and so forth?
Commonly used methods
• Internal consistency reliability
• Test-retest reliability
• Inter-rater reliability
• Others: equivalency/parallel forms reliability, expert-rater reliability, etc.
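As a concrete illustration of one of these methods, internal consistency is commonly summarized with Cronbach's alpha, computed from students' item-level scores. The sketch below is a minimal illustration with invented 0/1 item scores; it is not part of any state's system.

```python
# Illustrative sketch of internal consistency reliability (Cronbach's alpha).
# Each row is one student's scores on the items of an assessment
# (1 = correct, 0 = incorrect). The data are invented for illustration.

def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a list of per-student item-score lists."""
    n_items = len(score_matrix[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [variance([row[i] for row in score_matrix]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in score_matrix])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
]
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```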
Wyoming CTE Assessment Project Goals
• Establish shared expectations as to what students should know and be able to do in Wyoming’s CTE programs
• Develop a valid and reliable CTE assessment system
• Ensure the system provides useful, timely and accurate feedback to teachers, administrators, students and employers
Options to fulfill Perkins IV are:
Use Industry-Based Certifications (IBCs) or other standardized assessments
AND/OR
Develop valid and reliable assessments through a statewide collaborative process
Challenges and Considerations
• Access to assessment data to improve classroom instruction
• Dealing with the expense of buying IBCs
• Making sure the IBCs match up to the course content
• Making sure the IBC is valuable to employers and the job market
• Getting data from externally administered exams
• Deciding when to assess (end-of-program or course by course)
• Assessments that are appropriate to various program structures and goals
• Assessing CTE skills AND employability skills
Putting First Things First
FIRST, decide WHAT to assess.
THEN decide HOW to assess.
Source of Standards
[Chart: sources of standards at each level of a program of study – state standards throughout, supplemented by SCANS for employability skills, SCCI Knowledge & Skills statements for cluster- and pathway-level skills, industry standards, and de facto standards drawn from texts and tests.]
Courtesy of Steve Klein & MPR Associates
Can One Assessment Measure It All?
[Diagram: a single program of study mapped against an industry certification test, a state test built from national item banks, and a commercial employability skills test.]
No matter the approach, a program of study will inherently include applied academic skills, employability skills, cluster- and pathway-level skills, and program/occupation skills. When considering assessments, ask whether one assessment can adequately measure all of those skills.
Setting up the Structure
Assessment Project Advisory Group
• 20-25 participants: CTE administrators, community college administrators, teachers from various clusters, and state agency staff
• Provide general input on the development process
• Liaison to the education communities at the secondary and postsecondary levels
• Identify and prioritize clusters for development in the remainder of the project
• Meet in person and through webinars, 2-3 times per year
Setting up the Structure
Business/Industry Advisory Group
• Cross-section of business/industry representatives
• Should include representatives from each of the three initial clusters (Agriculture, Construction, Manufacturing)
• Provide general input on the development process from a business/industry perspective
• Liaison to the business communities across the state
• Review, provide input on, and affirm content developed by the Cluster/Pathway Work Groups
• Advise on raising the value of CTE and the CTE assessment system within Wyoming business/industry
• Meet in person and through webinars, 2-3 times per year
Setting up the Structure
Cluster/Pathway Work Groups
• 7-10 content experts in each of three clusters: Agriculture, Construction, Manufacturing
• Provide input on the priority competencies to include in the assessment system
• Assist in identifying the relative usefulness and applicability of existing assessments
• Provide input on any state-developed assessments that are determined to be necessary
• Kick-off briefings on March 7, 2008
• Work sessions March 16-20, 2008
• In-person and webinar follow-up sessions through June 2008
• Optional involvement in the assessment pilot phase
Identifying Competencies and Objectives
March 08
• Convene initial Cluster/Pathway Working Groups (CPWG).
• Each CPWG identifies core competencies (technical, academic, employability) that need to be assessed in each Cluster/Pathway.
April-May 08
• Draft Competencies are completed by CPWG and posted online for review.
• Other WY teachers and faculty invited to review and comment on Draft Competencies.
• Cluster/Pathway Competencies finalized.
Identifying Test Items and Assessment Options
May-June 08
• 5/12/08: Manufacturing and Arch/Construction CPWGs meet to review sample test items for the Competencies.
June 08
• Agriculture/Natural Resources CPWG meets to review sample test items for competencies.
• Consultant team gathers information on assessment resources (NOCTI, SkillsUSA, industry groups) and delivery system options.
• Consultant team completes feasibility report for the assessment development phase.
Pilot Testing Assessments and Next Steps
Fall 2008 – Spring 2009
• Development and pilot testing of first-phase assessments for initial clusters
• Possibly begin to work with additional Cluster/Pathway Work Groups to identify Essential Core competencies for other areas
Example – the Utah CTE assessment system
Overview of Assessment Development Process
• Identify competencies and objectives.
• Decide what to assess and how (e.g., develop an assessment blueprint).
• Feasibility phase: examine existing options.
• Pilot assessments and conduct the analyses necessary to document the technical quality of the assessments.
• Finalize assessments, delivery, and the key features the system must possess.
What is an Assessment Blueprint?
An assessment blueprint helps us determine what should be covered in the assessment(s) as well as the number of test items that should be included in each category. You can also use it to help determine the total length of the test as well as the types of items to be included.
Example: NOCTI Experienced Worker Assessment Blueprint
Areas covered in the Building Construction Occupations Written Assessment:
• Carpentry – 29%
• Electrical – 17%
• Plumbing – 7%
• Math – 4%
• Metal Work/Guttering – 6%
• Painting & Decorating – 10%
• Building Code & Safety – 8%
• Masonry – 19%
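Once a blueprint expresses coverage as percentage weights like these, the weights can be turned into per-area item counts for whatever total test length is chosen. The sketch below reuses the percentages listed above; the 180-item total length is an invented figure used only for illustration.

```python
# Illustrative only: turn blueprint percentage weights into per-area item counts.
# The percentages are those listed above; the 180-item total is an assumed figure.

blueprint = {
    "Carpentry": 29, "Electrical": 17, "Plumbing": 7, "Math": 4,
    "Metal Work/Guttering": 6, "Painting & Decorating": 10,
    "Building Code & Safety": 8, "Masonry": 19,
}
total_items = 180  # assumed total test length

for area, pct in blueprint.items():
    print(f"{area:<24} {round(total_items * pct / 100):>4} items")
```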
Example: Texas Education Agency, TAKS 9th Grade Reading
Objective 1: Reading and basic understanding
  9 multiple-choice items
Objective 2: Reading – literary elements and techniques
  12 multiple-choice items, 1 short-answer item
Objective 3: Reading – analysis and critical evaluation
  12 multiple-choice items, 2 short-answer items
Total number of items
  33 multiple-choice items, 3 short-answer items
One last example …
A sample assessment blueprint from a test on human geography is provided below:

Area     | Knows common terms | Knows specific facts | Understands principles | Applies principles | Interprets charts and graphs | Total
Food     | 1                  | 2                    | 1                      | 2                  | 2                            | 8
Clothing | 2                  | 1                    | 0                      | 1                  | 1                            | 5
Shelter  | 1                  | 0                    | 1                      | 2                  | 1                            | 5
The construction of the blueprint is a useful process as it helps to ensure that items
are produced that cover both the content of the program and the educational
objectives. It can also allow the balance of 'worth' of individual items to be
determined.
Developing an Assessment Blueprint
• What is the relative emphasis you want to place on cluster-level competencies versus pathway-level competencies?
• What is the relative emphasis you want to place on areas within the pathway?
• Within a cluster/pathway, are there objectives that are a greater priority for you to measure than others?
• What parameters do you wish to set for the total length and duration of the test?
• What are your thoughts regarding the distribution and use of different types of assessment items across the areas? How many multiple-choice, short-answer/constructed-response, or performance tasks? (Note: Bloom's Taxonomy, etc.)
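One way to work through these questions is to record the decisions as an explicit draft blueprint and let the item counts follow from the chosen weights. The sketch below is hypothetical: the area names, weights, item-type mix, and 60-item total are assumptions made up for illustration, not recommendations from the project.

```python
# Hypothetical draft blueprint: relative emphasis by content area and item type.
# All names, weights, and the total test length are invented for illustration.

pathway_weights = {
    "Cluster-level competencies": 0.30,
    "Pathway area A": 0.45,
    "Pathway area B": 0.25,
}
item_type_mix = {
    "multiple choice": 0.70,
    "short answer / constructed response": 0.20,
    "performance task": 0.10,
}
total_items = 60  # assumed total test length

for area, area_weight in pathway_weights.items():
    counts = {item_type: round(total_items * area_weight * type_weight)
              for item_type, type_weight in item_type_mix.items()}
    print(area, counts)
```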
Some factors to consider when examining potential existing assessments
• Alignment: Do the items align or match up with the competencies/objectives we’ve identified as important? (e.g., Is the assessment measuring what we want it to measure?)
• Ease of use: Is it manageable for teachers and students? (e.g., administration method, clear directions, resource requirements, etc.)
• Administration time
• Cost
• Flexibility
• Fairness
• Content in terms of assessment items (see next page)
Some characteristics of good assessment items …
• Each item has a specific purpose and is designed to test a significant learning outcome.
• Items are clear (avoid irrelevant material; language should be clear and easy to understand, or else you may be measuring students' comprehension of English rather than the trait you wish to measure).
• Items contain plausible distractors.
• Do the items contained within the assessment employ multiple methods to provide a more complete picture of student knowledge and/or skills?
• Do questions discriminate between the more able and less able students? Do they allow students to go well beyond the threshold requirements if they are able to?
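Whether items actually discriminate between more and less able students can be checked once pilot data are in hand. One common rough check is the upper-group minus lower-group discrimination index (item-total correlations serve a similar purpose); the sketch below uses invented pilot data and is only an illustration of the idea.

```python
# Illustrative item discrimination check using invented pilot data: compare how
# often the top- and bottom-scoring groups answered a given item correctly.

def discrimination_index(item_correct, total_scores, group_fraction=0.27):
    """item_correct: 1/0 per student for one item; total_scores: each student's total."""
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    n_group = max(1, int(len(ranked) * group_fraction))
    upper, lower = ranked[:n_group], ranked[-n_group:]
    p_upper = sum(item_correct[i] for i in upper) / n_group
    p_lower = sum(item_correct[i] for i in lower) / n_group
    return p_upper - p_lower  # values near zero or negative flag weak items

item_responses = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1]              # invented responses to one item
total_scores   = [48, 45, 22, 40, 25, 44, 20, 18, 39, 47, 23, 36]  # invented total test scores

print(f"Discrimination index: {discrimination_index(item_responses, total_scores):.2f}")
```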
Key features of the assessment system?
• Timing of assessments
• Types of scores produced
• Reporting (ongoing documentation? Access?)
• Order of presentation (items presented in a specific order, randomly, etc.)
• If online, desired features? (security/access, timed access, etc.)
• Other things you want to be sure the system has or is able to do???
As a teacher, what are the key features this CTE assessment system should have in order to make it really useful for you and your students?
For more information, contact:
• Mariam Azin: [email protected]
www.presassociates.com
• Hans Meeder: [email protected]
www.MeederConsulting.com