Transcript: Making Assessments Matter to Students
Making Assessments Matter to Students
Andrew Jones | Donna Sundre | Peter Swerdzewski | Carol Barry | Abby Lau
Low-Stakes Assessment, Student Motivation, and the Validity of Scores: A General Introduction
Making Assessments Matter to Students
• Overview
Provide a general introduction to the concepts of low-stakes testing, student motivation, and the validity of scores
Provide an overview for the remainder of the session
The Need for Assessment
• Accountability has led to an increased demand for universities to engage in assessment
Spellings Commission
Accreditation agencies
State councils
The Need for Assessment
• Universities and stakeholders are using these assessments to make decisions about the effectiveness of programs • Should we be concerned with how these data are collected and, in turn, how these scores are used?
Motivation and Student Scores
• Often, it is assumed that students put forth effort in an assessment situation
This is typically the case when stakes are attached to the testing situation
Classroom tests, quizzes
SAT, GRE
If a student does poorly, there will be negative consequences
Not admitted to college or graduate school
Conversely, there are rewards for doing well
Motivation and Assessment
• Students can also be tested in situations where they do not perceive personal consequences • This results in a “low-stakes” testing situation
Motivation and Assessment
• What is a low-stakes testing situation?
Scores on the assessment do not impact the student
No gain as a result of doing well
No punishment for doing poorly
Students may not even receive feedback as to how they performed on the assessment
Motivation and Assessment
• Many assessments in higher ed contexts can be categorized as “low-stakes” assessments
Especially the case for assessments administered for program evaluation or accountability
Motivation and Assessment
• How do these low-stakes settings impact scores?
Students may not put forth as much effort as they would in a “high-stakes” setting
Unmotivated students will not score as high on achievement tests (Wise & DeMars, 2005)
Why Does Reduced Effort Matter?
• If students are not putting forth effort, then they are not truly representing what they know or who they are
Students may have increased greatly in a specific competency, but their scores would not reflect this due to a lack of effort on the items
Students’ attitudes may have changed, but their scores would not reflect this because they have not attended properly to the instrument
Why Does Reduced Effort Matter?
• Impacts the validity of scores from an assessment • What do we mean by validity?
Conceptually: we may not be measuring what we say we are measuring
Essentially, scores are not as useful
Scores are not providing stakeholders with the information that they intended to gather!
Motivation and Assessment
• Why not just make all higher ed assessments high stakes tests?
Test anxiety
Legal issues
Creates a much more political environment
Typically, higher ed institutions are not interested in assessment at the individual level
The structure of universities is not geared toward high-stakes assessment
Motivation and Assessment
• Given that low-stakes assessment will continue to be used in higher ed, we need to be concerned about this issue of motivation • Critical to assess motivation and find ways to improve motivation, as this impacts the inferences we make about programs
Overview of Presentations Today
• An overview of JMU’s model of assessment, including one method to measure motivation (Donna Sundre)
• What do students think of assessment and how does this impact motivation? (Peter Swerdzewski)
• How does control over the assessment environment impact student scores? (Carol Barry)
• What impact can proctors have on student motivation in low-stakes settings? (Abby Lau)
• General recommendations to increase motivation and the usefulness of test scores in low-stakes settings
• Questions and discussion
Assessment at JMU
Donna Sundre
The Assessment Culture at JMU
JMU requires students to take a series of student outcomes assessments prior to their graduation. These assessments are held at four stages of students’ academic careers:
as entering first-year students
at the mid-undergraduate point when they have earned 45 to 70 credit hours, typically the sophomore year
as graduating seniors in their academic major(s)
Students will also complete an alumni survey after graduation
-JMU Undergraduate Catalog
The Assessment Culture at JMU
• CARS supports all general education assessment
• CARS administers all JMU alumni surveys
• CARS supports assessment for every academic program
• CARS supports assessment for the Division of Student Affairs
• All programs must collect and report on assessment data annually
• Academic Program Reviews are scheduled every 6 years for every degree program, graduate and undergraduate
The Assessment Culture at JMU
• Long-standing and pervasive expectation at JMU that assessment findings will guide decision making.
Annual reports, Assessment Progress Templates, program change proposals, and all academic program review self-study documents require substantial descriptions of how assessment guides decision-making
• The Center for Assessment and Research Studies (CARS) is the largest higher education assessment center in the US, with 10 faculty, 3 support staff, and 15 graduate assistants
Data Collection Strategies
Two institution-wide Assessment Days
Fall (August):
Incoming freshmen tested at orientation
Spring (February):
Students with 45-70 credits; typically the sophomore year
• Classes are cancelled on this day
All students are required to participate, or their course registration is blocked
Students are randomly assigned, using the last two digits of their JMU ID number, to testing rooms where a particular series of instruments is administered
This results in large, representative samples of students
Student ID numbers do not change; therefore, we can assure that students complete the same instruments at time 2 as they did at time 1
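The last-two-digits assignment rule can be sketched as a simple deterministic mapping. This is a minimal illustration, not JMU's actual room table: the function name, the modulo rule, and the room count are all assumptions made for the example.

```python
def assign_room(student_id: str, n_rooms: int = 30) -> int:
    """Map the last two digits of a student ID (00-99) to one of n_rooms rooms.

    Hypothetical sketch of a last-two-digits assignment scheme: because the
    rule depends only on the (unchanging) ID, the same student lands in the
    same instrument battery at time 1 and time 2.
    """
    last_two = int(student_id[-2:])
    return last_two % n_rooms  # rooms numbered 0 .. n_rooms - 1


# Deterministic: retesting the same student yields the same assignment.
assert assign_room("123456789") == assign_room("123456789")
print(assign_room("123456789"))  # 89 % 30 -> 29
```

Because the last two digits are effectively arbitrary with respect to ability, a rule like this yields approximately random (and therefore representative) room samples while still being reproducible across administrations.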
JMU just completed its 23rd Spring Assessment Day
The Spring Assessment Day is also used by many majors to collect data on their graduating seniors
Data Collection Scheme: Repeated Measures
[Figure: three cohorts tracked across administrations from Spring 2005 through Spring 2008]
Students in each cohort are tested twice on the same instrument: once as incoming freshmen and again in the second semester of the sophomore year.
What is Learning Assessment?
Assessment is the systematic basis for making inferences about the learning and development of students.
Stages of the Assessment Process
A continuous cycle:
Establishing Objectives → Selecting/Designing Instruments → Collecting Information → Analyzing/Maintaining Information → Using Information
Not Just Any Data Will Do…
• If we want faculty to pay attention to the results, we need credible evidence
• To obtain credible evidence:
We need a representative sample or a census
We need good instrumentation
The tasks demanded must represent the content domain
Reliability and validity
We need students who are motivated to perform
Prerequisites for Quality Assessment
• We must have three important components
Excellence in sampling of students
Either large, representative student samples or a census
Sound assessment instrumentation
Reliable, valid assessment methods Instruments that faculty find meaningful
Students motivated to participate in assessment activities
Can we tell if students are motivated?
Can we influence examinee motivation?
Fulfilling the Prerequisites
Excellence in sampling of students
Using our Assessment Day design, we can achieve this
Sound assessment instrumentation
Working collaboratively with departmental faculty, our CARS liaisons can facilitate the identification or development of sound tools
Students motivated to participate in assessment activities
Can we tell if students are motivated? YES!
Can we influence examinee motivation? YES!
Student Opinion Scale (SOS)
• This is a 10-item instrument that provides two scores
Importance: perceived importance of the task(s)
Effort: examinee self-report of the level of effort expended in task completion
Both subscales yield reliability estimates in the mid-.80s
SOS scores are NOT correlated with SAT scores!
This instrument, scoring instructions, and manual are freely available and downloadable from www.jmu.edu/assessment/
Student Opinion Scale (SOS)
• Responses are on a 5-point Likert scale (Strongly Disagree to Strongly Agree)
• Sample items
Effort: “I gave my best effort on these tests”
Importance: “Doing well on these tests was important to me”
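Subscale scoring for an instrument like this can be sketched as summing the 1-5 ratings within each subscale. The item-to-subscale assignment below is hypothetical; the actual SOS manual (available at www.jmu.edu/assessment/) specifies the real item keying, including any reverse-scored items, which this sketch omits.

```python
# Hypothetical item positions (0-indexed); the real SOS keying may differ.
EFFORT_ITEMS = [0, 2, 4, 6, 8]
IMPORTANCE_ITEMS = [1, 3, 5, 7, 9]


def score_sos(responses):
    """Score a 10-item, 5-point Likert instrument into two subscale sums.

    responses: list of ten ratings, each 1 (Strongly Disagree) to 5
    (Strongly Agree). Returns (effort, importance) sums, each 5-25.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    effort = sum(responses[i] for i in EFFORT_ITEMS)
    importance = sum(responses[i] for i in IMPORTANCE_ITEMS)
    return effort, importance


print(score_sos([5, 4, 5, 4, 5, 4, 5, 4, 5, 4]))  # (25, 20)
```

Separate Effort and Importance scores are useful precisely because the two can diverge: a student may report trying hard on a task they did not consider important.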
Using the SOS Scores, we have
• Described and quantified the level of our students’ motivation • Shared this information with our faculty • Included SOS scores in our data analysis • Positively impacted student motivation levels • Improved our proctor selection and training
My colleagues will provide details on this work!
Making Assessment Matter to Students: Exploring Examinees’ Perceptions of Assessment
Peter Swerdzewski
The Need to Consider Perception
• Students are highly tested by the time they reach college… “America's public schools administer more than 100 million standardized exams each year, including IQ, achievement, screening, and readiness tests.” (FairTest, 2007) • Students’ perceptions about testing do influence the validity of the inferences we can make about their test scores.
Questions of Interest
1. What do students know and how do they feel about low-stakes institution-wide learning outcomes assessment?
2. How does this knowledge and affect contribute to test-taking motivation?
3. What do students suggest can be done to increase the validity of the inferences that college administrators can make from low-stakes assessments?
Methods
• Focused on JMU’s Assessment Day tests Data used to assess general education and student affairs programs All freshmen and sophomores must participate in Assessment Day • Used two approaches to collect qualitative data: Constructed responses from a Web-based survey Focus groups • A modified grounded-theory mixed-methods approach was used to evaluate responses
Results: Question #1
1. What do students know and how do they feel about low-stakes institution-wide learning outcomes assessment?
Is Assessment Day Valuable?
Yes: 45% | Yes, but…: 29% | No: 23% | [N/A]: 3%
“Yes”
• “Yes. Assessment Day gives the faculty of JMU the opportunity for insight into what their students are learning and thinking. It also provides a social blueprint in terms of how students think of themselves and others. If there were no assessments, JMU administrators would have nothing off of which to base the structuring of courses provided here or the way in which they are taught.” “Justin” • “It is always helpful to reflect on the past and to see how I can stay focused on my goals.” “Abigail”
“Yes, but…”
• “It probably is, so I put forth the effort, but to me personally I doubt it will affect me at all.” “Neil” • “I guess it's valuable in the long run. It doesn't seem very valuable because we do not hear much about it after we finish taking the tests. If the results and data were actually shown to us in an interesting way then it would seem more valuable.” “Rachel”
“No”
• “No. It has no bearing on anything.” “Kevin” • “No. We have been taking assesments all our lives. Enough is enough.” “Timothy” • “Because of our individualist society people feel the need to compare themselves to everyone else. JMU obviously wants to see how well their students are doing so that they may compare its overall achievement as a university to other universities. Because this is the accepted way to measure acheivement[,] Assesment day is valuable. But in [no] way does it enrich the soul, which is the true purpose of education.” “Ben”
In your opinion, is Assessment Day valuable? Why or why not?
[Bar chart of response themes, N = 200 regular Assessment Day attendees: Describes Student Growth; Used for Program Change; Motivation Problems; Bad; Improve Content Validity; Valuable to JMU, but Not to Student; Describes Student Snapshot; Remove Redundant Items; Takes Too Long; Lack of Use of Data / Hearing About Results; Personal Reflection; Compares JMU to Other Schools; GenEd Courses Not Yet Complete; Too Opinion-Based; Not Major-Specific; Miscellaneous. Percentages range from 0% to 25%.]
Results: Question #2
2. How does this knowledge and affect contribute to test-taking motivation?
“Everyone is slacking except for me”
• Students claim others are not trying, but they themselves are putting forth effort
• Problematic items:
Lengthy items
Essays, complex items, multimedia items
Repetitive items
• Students claim to put forth less effort if they do not see the benefit of the assessment
• Students put forth effort on items that interest them
A student’s effort on a low-stakes test is a function of his or her:
1. perceived success on the test,
2. perceived level of effort the test will consume,
3. perceived importance of the test, and
4. affective and emotional reaction to the various test items.
(Wise & DeMars, 2005)
Results: Question #3
3. What do students suggest can be done to increase the validity of the inferences that college administrators can make from low-stakes assessments?
According to Our Focus Groups:
• Shorten test time • Provide more explanation about the uses of the assessments • Give scores to students • Ensure the tests assess student growth • Ensure the tests assess teacher quality • Further assess learning styles to improve the classroom experience
How could we have changed Assessment Day so that everyone would attend (and not choose to attend a make-up session)?
[Bar chart of suggestions, N = 200 regular Assessment Day attendees: Shorten Testing Time (30%); Hold on More Convenient Day; No Changes Needed; Administer at Home; Hold Later in Day; Make Self-Paced; Provide Incentive; Make Less Repetitive; Increase Penalty; Provide Food; Make More Important; Make Make-Up Worse; Get Rid of Testing; Improve Proctors; [N/A]; Miscellaneous. Remaining categories range from 0% to 12.5%.]
A Selection of Suggestions
• Shorten Testing Time (30.0%) “not make it as long of a session. a hour and a half max, but 2.5-3 hours makes kids want to skip and not care about it.” (Chelsea) • Administer at Home (9.00%) “MAKE IT ONLINE. It would be a thousand times easier. It's extremely annoying to have to wait for everyone to finish a test when you have been done for 30 minutes and there are still people who are at the beginning. Mandatory online tests to be taken on the assessment date would be so much better.” (Leah) • Provide Incentives (8.00%) “I believe Assessment Day attendance would greatly improve if some incentive were offered beyond "personal satisfaction" and the threat of a Saturday make-up.” (Emma) “Give a reward, ice cream/t-shirt/etc.” (Carrie) “Pay us” (Mitchell) “Give out free puppies” (Jackie)
Final Thoughts
• We do need to consider students’ perceptions of assessment • Qualitative assessment reveals interesting insight into students’ perceptions and suggestions regarding assessment • In general, students do believe assessment is valuable, but there are areas of improvement that could further increase the value of our assessments
The Impact of Control in Low Stakes Testing
Carol L. Barry
Low-Stakes Testing and Motivation
• The goal is to get scores on the trait of interest; however, we may end up measuring noise
This has implications for the functioning of instruments AND the decisions we can make based on scores
• How are scores on instruments impacted when students are unmotivated?
Scores on instruments ultimately impact the usefulness of the decisions that are made about a program
Without valid scores, useful decisions cannot be made about the program
We must exercise caution when making decisions about program effectiveness
Sundre (1999); Sundre & Kitsantas (2004); Wise & DeMars (2005)
How do we improve the quality of low stakes data?
• Some possibilities:
Training proctors
Providing incentives for students
Increasing the level of control within the testing condition
• This research focused on increasing the level of control within the testing condition
Purpose of the Current Research
• Problem: Low-stakes testing environments are unavoidable and may impact the psychometric properties of scores and the decisions made based on them.
• Questions:
How can useful and valid data be collected when the stakes are low or nonexistent for participants?
Will changes in the testing context impact student motivation and responses?
Would a more controlled testing environment improve the quality of low-stakes data?
Purpose of the Current Research
• Given these questions, the purpose of the current research was to examine how different levels of control impacted student motivation and the quality of data gathered in low-stakes situations
Purpose of Current Research
• How was control defined?
Uncontrolled
Unproctored assessment administered over the internet
Students completed the survey at their own pace under their preferred conditions
Somewhat controlled
Proctored assessment on JMU’s Assessment Day
• Proctor attentiveness and emphasis on importance may vary
Room sizes may be quite large (potentially over 200 students)
Students may have more of a feeling of anonymity
Very controlled
Proctored assessment
Proctor was attentive and emphasized the importance of the assessment
Room sizes were small (fewer than 25 students)
Methods
• Instrument used: College Self-Efficacy Inventory (CSEI; Solberg, O’Brien, Villareal, Kennel, & Davis, 1993)
• Levels of control
Uncontrolled environment
Somewhat controlled environment
Very controlled environment
Methods
• Motivation was assessed indirectly
• The assumption is that poor-quality data are an indicator that students were unmotivated as they responded to items
• Unmotivated students may be inattentive to items
How much noise or error is measured at each level of control?
Expect higher quality data when level of control increases
Data Analysis
• Data quality was judged by examining the item functioning (i.e., model fit) across the three different testing conditions.
• Model fit?
[Path diagram: a Social Self-Efficacy factor measured by items 1-5]
Results
• Model-fit improved as the level of control increased.
• As the level of control increased:
Students appeared to attend more to the items
An item-order effect diminished: items were related to each other simply because they were adjacent on the instrument, despite being written to assess distinct constructs; in the very controlled condition, this effect was no longer present
Students appeared to respond such that scores indicated truer levels of the construct
The quality of the data appeared to increase
Visual Representation of Results
[Bar chart across Samples 1-5 comparing the Uncontrolled, Somewhat Controlled, and Very Controlled conditions; larger numbers = more misfit]
Implications and Conclusion
• We gather data used for accountability and instrument development purposes in low-stakes environments
• The quality of the data, and ultimately the inferences we make based upon those data, depend upon how controlled the testing context is
Data collected in an uncontrolled testing condition might lead individuals to make unnecessary changes to an instrument or program, or to not make necessary changes
• A high degree of control seems to increase student motivation even though the test is low-stakes
Implications and Conclusion
• Control of the testing environment can be an effective way to increase student motivation
Smaller rooms
Attentive proctors
Emphasis on the importance of assessment
• Sound assessment practice begins with data that we can trust
• Before making inferences regarding program effectiveness, we must be able to trust the data upon which those inferences are based
The Proctor Role In Motivating Students to do Assessment
Traditional Proctor Role
• Test Administrator
Follow standardized procedures
Distribute test materials
Keep track of time for each test segment
Monitor students for cheating
Answer students’ questions
Handle unexpected situations
Assessment Day Proctors at JMU
• Why do we have proctors?
Testing ~3,000 students on one day in 30+ different classrooms across campus
• Who do we hire?
Retired teachers (external to JMU)
University administrative staff
JMU graduate students
JMU undergraduate student leaders
• Why are we talking about proctors?
Effort Scores by Test Room, Spring 2007
[Chart: mean effort scores by test room; room means varied considerably from room to room]
Low-Stakes Test Proctor Roles: Administrator, Officer, Coach
The proctor as a “test officer”
• Why?
Students’ behavior is less predictable in low-stakes contexts
Students arrive at the testing situation with different attitudes and perspectives about the assessment
• How?
Monitoring students’ behavior and identifying and responding to inappropriate behaviors: rapid guessing, sleeping, talking, cheating
Disciplinary action steps:
1. Ask the student to correct the behavior
2. Provide a warning about being asked to leave
3. Ask the continually disruptive student to leave
The proctor as “test coach”
• Communicating the importance of the tests
• Showing appreciation of students
• Acknowledging good efforts when seen
• Encouraging good efforts when lacking
• Helping students stay motivated
• Showing enthusiasm and creating a positive atmosphere for the testing
• Treating students with utmost respect
Implementing the Proctor Model
1. Proctor job description updated
2. Proctor Training Workshop
3. Standardization of proctor tasks
Proctor Training Workshop
• Format: 1 hour right before testing begins
Provided breakfast, meet other proctors, get materials
• Content:
The reason for the assessment testing
The issue of examinee motivation
The impact proctors can have on motivation
The expectations for proctors’ roles
Example scenarios and how to respond
Standardization of Proctor Tasks
• Proctor Scripts: Provide standardized message to students that reflects the proctor roles • Proctor Checklist: Detailed instructions on how the testing session should proceed Standardizes the test administration procedures to maximize score comparability
Evaluation of the Model
• Proctor model successfully implemented:
Students reported that proctors implemented the model
Proctors self-reported that they implemented the model
• Proctor model positively impacted motivation:
Students reported putting forth more effort on this Assessment Day than the previous one
Less variability in effort across test rooms
Effort by Room, Spring 2008
[Chart: mean effort scores by test room; effort was more uniform across rooms than in Spring 2007]
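The claim of "less variability in effort across test rooms" can be quantified by comparing the spread of room-mean effort scores between two administrations. The room means below are invented purely for illustration; they are not the actual JMU data.

```python
from statistics import pstdev

# Hypothetical room-mean effort scores (NOT the actual JMU values).
room_means_2007 = [6.1, 9.8, 7.4, 11.0, 5.9, 8.7]
room_means_2008 = [8.9, 9.2, 9.0, 9.5, 8.8, 9.1]

# Population standard deviation of the room means summarizes
# room-to-room variability within each administration.
spread_2007 = pstdev(room_means_2007)
spread_2008 = pstdev(room_means_2008)

print(f"SD of room means, 2007: {spread_2007:.2f}")
print(f"SD of room means, 2008: {spread_2008:.2f}")
# A smaller SD in 2008 would be consistent with the proctor model
# reducing room-to-room differences in examinee effort.
```

In practice one would also want a formal test (for example, a test of variance homogeneity) before attributing the change to the proctor intervention rather than sampling fluctuation.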
Conclusions
• Conducting program assessment often requires collecting data from students under conditions in which they are not motivated to perform to the best of their ability • When assessments matter to students, they try harder, which increases the validity of the assessment scores
How to Make Assessments Matter to Students
• Assessment Policy
Provide feedback to students about their performance
Make testing reasonable (timing/length)
Provide information about why the tests are important
• Controlling Administration Conditions
Avoid giving low-stakes assessments online
Use small administration rooms
• Proctors Who Care
Encouragement & appreciation