Making Assessments Matter to Students


Andrew Jones | Donna Sundre | Peter Swerdzewski | Carol Barry | Abby Lau

Low-Stakes Assessment, Student Motivation, and the Validity of Scores: A General Introduction

Making Assessments Matter to Students

• Overview
  ◦ Provide a general introduction to the concepts of low-stakes testing, student motivation, and the validity of scores
  ◦ Provide an overview for the remainder of the session

The Need for Assessment

• Accountability has led to an increased demand for universities to engage in assessment
  ◦ Spellings Commission
  ◦ Accreditation agencies
  ◦ State councils

The Need for Assessment

• Universities and stakeholders are using these assessments to make decisions about the effectiveness of programs
• Should we be concerned with how these data are collected and, in turn, how the scores are used?

Motivation and Student Scores

• Often, it is assumed that students put forth effort in an assessment situation
  ◦ This is typically the case when stakes are attached to the testing situation
    - Classroom tests, quizzes
    - SAT, GRE
  ◦ If a student does poorly, there will be negative consequences
    - Not admitted to college or graduate school
  ◦ Conversely, there are rewards for doing well

Motivation and Assessment

• Students can also be tested in situations where they do not perceive personal consequences
• This results in a “low-stakes” testing situation

Motivation and Assessment

• What is a low-stakes testing situation?
  ◦ Scores on the assessment do not impact the student
    - No gain as a result of doing well
    - No punishment for doing poorly
  ◦ Students may not even receive feedback on how they performed on the assessment

Motivation and Assessment

• Many assessments in higher ed contexts can be categorized as “low-stakes” assessments
  ◦ This is especially the case for assessments administered for program evaluation or accountability

Motivation and Assessment

• How do these low-stakes settings impact scores?
  ◦ Students may not put forth as much effort as they would in a “high-stakes” setting
  ◦ Unmotivated students will not score as high on achievement tests (Wise & DeMars, 2005)

Why Does Reduced Effort Matter?

• If students are not putting forth effort, then they are not truly representing what they know or who they are
  ◦ Students may have increased greatly in a specific competency, but their scores would not reflect this due to a lack of effort on the items
  ◦ Students' attitudes may have changed, but their scores would not reflect this because they have not attended properly to the instrument

Why Does Reduced Effort Matter?

• Reduced effort impacts the validity of scores from an assessment
• What do we mean by validity?
  ◦ Conceptually: we may not be measuring what we say we are measuring
  ◦ Essentially, the scores are not as useful
  ◦ The scores are not providing stakeholders with the information they intended to gather!

Motivation and Assessment

• Why not just make all higher ed assessments high-stakes tests?
  ◦ Test anxiety
  ◦ Legal issues
  ◦ Creates a much more political environment
  ◦ Typically, higher ed institutions are not interested in assessment at the individual level
  ◦ The structure of universities is not geared toward high-stakes assessment

Motivation and Assessment

• Given that low-stakes assessment will continue to be used in higher ed, we need to be concerned about motivation
• It is critical to assess motivation and to find ways to improve it, as this impacts the inferences we make about programs

Overview of Presentations Today

• Providing an overview of JMU's model of assessment (Donna Sundre)
  ◦ Including one method to measure motivation
• What do students think of assessment, and how does this impact motivation? (Peter Swerdzewski)
• How does control over the assessment environment impact student scores? (Carol Barry)
• What impact can proctors have on student motivation in low-stakes settings? (Abby Lau)
• General recommendations to increase motivation and the usefulness of test scores in low-stakes settings
• Questions and discussion

Assessment at JMU

Donna Sundre

The Assessment Culture at JMU

JMU requires students to take a series of student outcomes assessments prior to their graduation. These assessments are held at four stages of students’ academic careers:

• as entering first-year students
• at the mid-undergraduate point, when they have earned 45 to 70 credit hours (typically the sophomore year)
• as graduating seniors in their academic major(s)
• after graduation, when students complete an alumni survey

– JMU Undergraduate Catalog

The Assessment Culture at JMU

• CARS supports all general education assessment
• CARS administers all JMU alumni surveys
• CARS supports assessment for every academic program
• CARS supports assessment for the Division of Student Affairs
• All programs must collect and report on assessment data annually
• Academic Program Reviews are scheduled every 6 years for every major degree program
  ◦ Graduate and undergraduate

The Assessment Culture at JMU

• Long-standing and pervasive expectation at JMU that assessment findings will guide decision-making
  ◦ Annual reports, Assessment Progress Templates, program change proposals, and all academic program review self-study documents require substantial descriptions of how assessment guides decision-making
• The Center for Assessment and Research Studies (CARS) is the largest higher education assessment center in the US
  ◦ 10 faculty, 3 support staff, and 15 graduate assistants

Data Collection Strategies

• Two institution-wide Assessment Days
  ◦ Fall (August): incoming freshmen tested at orientation
  ◦ Spring (February): students with 45-70 credits, typically the sophomore year
• Classes are cancelled on Assessment Day
• All students are required to participate; otherwise, course registration is blocked
• Students are randomly assigned to testing rooms, based on the last two digits of their JMU ID number, where a particular series of instruments is administered
• This results in large, representative samples of students
• Student ID numbers do not change; therefore, we can ensure that students complete the same instruments at time 2 as they did at time 1
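Because room assignment is a deterministic function of the ID's last two digits, the routing is easy to reproduce. Below is a minimal sketch in Python; the 30-room modulo mapping is a hypothetical illustration, not JMU's actual room table:

```python
# Minimal sketch of ID-based room assignment (hypothetical mapping,
# not JMU's actual room table). Because the last two digits of a
# student's ID never change, the same student is routed to the same
# instrument series at time 1 and time 2.

def assign_room(student_id: str, num_rooms: int = 30) -> int:
    """Map the last two digits of an ID to one of num_rooms rooms."""
    last_two = int(student_id[-2:])   # e.g. "910012389" -> 89
    return last_two % num_rooms       # rooms numbered 0..num_rooms-1

if __name__ == "__main__":
    for sid in ["910012345", "910012389", "910099889"]:
        print(sid, "-> room", assign_room(sid))
    # IDs ending in the same two digits always land in the same room:
    assert assign_room("910012389") == assign_room("910099889")
```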

• JMU just completed its 23rd Spring Assessment Day
• The Spring Assessment Day is also used by many majors to collect data on their graduating seniors

Data Collection Scheme: Repeated Measures

[Table: Cohorts 1-3 tracked across the Spring 2005 through Spring 2008 semesters, each cohort tested at two time points]

Students in each cohort are tested twice on the same instruments: once as incoming freshmen and again in the second semester of the sophomore year.

What is Learning Assessment?

Assessment is the systematic basis for making inferences about the learning and development of students.

Stages of the Assessment Process

A continuous cycle:
• Establishing objectives
• Selecting/designing instruments
• Collecting information
• Analyzing/maintaining information
• Using information

Not Just Any Data Will Do…

• If we want faculty to pay attention to the results, we need credible evidence
• To obtain credible evidence:
  ◦ We need a representative sample or a census
  ◦ We need good instrumentation
    - The tasks demanded must represent the content domain
    - Reliability and validity
  ◦ We need students who are motivated to perform

Prerequisites for Quality Assessment

• We must have three important components:
  1. Excellence in sampling of students
     ◦ Either large, representative student samples or a census
  2. Sound assessment instrumentation
     ◦ Reliable, valid assessment methods
     ◦ Instruments that faculty find meaningful
  3. Motivated students who participate in assessment activities
     ◦ Can we tell if students are motivated?
     ◦ Can we influence examinee motivation?

Fulfilling the Prerequisites

• Excellence in sampling of students
  ◦ Using our Assessment Day design, we can achieve this
• Sound assessment instrumentation
  ◦ Working collaboratively with departmental faculty, our CARS liaisons can facilitate the identification or development of sound tools
• Motivated students who participate in assessment activities
  ◦ Can we tell if students are motivated? YES!
  ◦ Can we influence examinee motivation? YES!

Student Opinion Scale (SOS)

• The SOS is a 10-item instrument that provides two scores:
  ◦ Importance: perceived importance of the task(s)
  ◦ Effort: examinee self-report of the level of effort expended in task completion
  ◦ Both subscales yield reliability estimates in the mid-.80s
• SOS scores are NOT correlated with SAT scores!
• The instrument, scoring instructions, and manual are freely available for download at www.jmu.edu/assessment/

Student Opinion Scale (SOS)

• Responses are on a 5-point Likert scale (Strongly Disagree to Strongly Agree)
• Sample items:
  ◦ Effort: “I gave my best effort on these tests.”
  ◦ Importance: “Doing well on these tests was important to me.”
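To make the scoring concrete, here is a minimal sketch of computing the two SOS subscores and a Cronbach's alpha reliability estimate. The item-to-subscale split (items 1-5 Effort, items 6-10 Importance) is an assumption for illustration; the authoritative scoring instructions are in the downloadable manual.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Simulated 1-5 Likert responses: 500 students x 10 items. Random data,
# so alpha will be near zero here; real SOS subscales run in the mid-.80s.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(500, 10))

effort = responses[:, :5].sum(axis=1)       # assumed Effort items 1-5
importance = responses[:, 5:].sum(axis=1)   # assumed Importance items 6-10

print("Mean Effort:", effort.mean(), "Mean Importance:", importance.mean())
print("Effort alpha:", round(cronbach_alpha(responses[:, :5]), 2))
```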

Using the SOS Scores, we have

• Described and quantified the level of our students' motivation
• Shared this information with our faculty
• Included SOS scores in our data analyses
• Positively impacted student motivation levels
• Improved our proctor selection and training

My colleagues will provide details on this work!

Making Assessment Matter to Students: Exploring Examinees’ Perceptions of Assessment

Peter Swerdzewski


The Need to Consider Perception

• Students are highly tested by the time they reach college: “America's public schools administer more than 100 million standardized exams each year, including IQ, achievement, screening, and readiness tests.” (FairTest, 2007)
• Students' perceptions about testing do influence the validity of the inferences we can make about their test scores.

Questions of Interest

1. What do students know and how do they feel about low-stakes, institution-wide learning outcomes assessment?
2. How do this knowledge and affect contribute to test-taking motivation?
3. What do students suggest can be done to increase the validity of the inferences that college administrators can make from low-stakes assessments?

Methods

• Focused on JMU's Assessment Day tests
  ◦ Data are used to assess general education and student affairs programs
  ◦ All freshmen and sophomores must participate in Assessment Day
• Used two approaches to collect qualitative data:
  ◦ Constructed responses from a Web-based survey
  ◦ Focus groups
• A modified grounded-theory, mixed-methods approach was used to evaluate responses

Results: Question #1

1. What do students know and how do they feel about low-stakes, institution-wide learning outcomes assessment?

Is Assessment Day Valuable?

• Yes: 45%
• Yes, but…: 29%
• No: 23%
• [N/A]: 3%

“Yes”

• “Yes. Assessment Day gives the faculty of JMU the opportunity for insight into what their students are learning and thinking. It also provides a social blueprint in terms of how students think of themselves and others. If there were no assessments, JMU administrators would have nothing off of which to base the structuring of courses provided here or the way in which they are taught.” (“Justin”)
• “It is always helpful to reflect on the past and to see how I can stay focused on my goals.” (“Abigail”)

“Yes, but…”

• “It probably is, so I put forth the effort, but to me personally I doubt it will affect me at all.” (“Neil”)
• “I guess it's valuable in the long run. It doesn't seem very valuable because we do not hear much about it after we finish taking the tests. If the results and data were actually shown to us in an interesting way then it would seem more valuable.” (“Rachel”)

“No”

• “No. It has no bearing on anything.” (“Kevin”)
• “No. We have been taking assesments all our lives. Enough is enough.” (“Timothy”)
• “Because of our individualist society people feel the need to compare themselves to everyone else. JMU obviously wants to see how well their students are doing so that they may compare its overall achievement as a university to other universities. Because this is the accepted way to measure acheivement[,] Assesment day is valuable. But in [no] way does it enrich the soul, which is the true purpose of education.” (“Ben”)

In your opinion, is Assessment Day valuable? Why or why not?

[Chart: coded reasons, including Describes Student Growth, Used for Program Change, Motivation Problems, Bad, Improve Content Validity, Valuable to JMU but Not to Student, Describes Student Snapshot, Remove Redundant Items, Takes Too Long, Lack of Use of Data / Hearing About Results, Personal Reflection, Compares JMU to Other Schools, GenEd Courses Not Yet Complete, Too Opinion-Based, Not Major-Specific, and Miscellaneous, with frequencies ranging from 0% to 25%]

N = 200 regular Assessment Day attendees

Results: Question #2

2. How do this knowledge and affect contribute to test-taking motivation?

“Everyone is slacking except for me”

• Students claim others are not trying, but that they themselves are putting forth effort
• Problematic items:
  ◦ Lengthy items
  ◦ Essays, complex items, multimedia items
  ◦ Repetitive items
• Students claim to put forth less effort if they do not see the benefit of the assessment
• Students put forth effort on items that interest them

A student’s effort on a low-stakes test is a function of his or her:

1. perceived success on the test,
2. perceived level of effort the test will consume,
3. perceived importance of the test, and
4. affective and emotional reaction to the various test items.

(Wise & DeMars, 2005)

Results: Question #3

3. What do students suggest can be done to increase the validity of the inferences that college administrators can make from low-stakes assessments?

According to Our Focus Groups:

• Shorten test time
• Provide more explanation about the uses of the assessments
• Give scores to students
• Ensure the tests assess student growth
• Ensure the tests assess teacher quality
• Further assess learning styles to improve the classroom experience

How could we have changed Assessment Day so that everyone would attend (and not choose to attend a make-up session)?

[Chart: suggested changes, led by Shorten Testing Time (30%); other categories include Hold on More Convenient Day, No Changes Needed, Administer at Home, Hold Later in Day, Make Self-Paced, Provide Incentive, Make Less Repetitive, Increase Penalty, Provide Food, Make More Important, Make Make-Up Worse, Get Rid of Testing, Improve Proctors, [N/A], and Miscellaneous, with frequencies ranging from 0% to 30%]

N = 200 regular Assessment Day attendees

A Selection of Suggestions

• Shorten Testing Time (30.0%)
  ◦ “not make it as long of a session. a hour and a half max, but 2.5-3 hours makes kids want to skip and not care about it.” (Chelsea)
• Administer at Home (9.0%)
  ◦ “MAKE IT ONLINE. It would be a thousand times easier. It's extremely annoying to have to wait for everyone to finish a test when you have been done for 30 minutes and there are still people who are at the beginning. Mandatory online tests to be taken on the assessment date would be so much better.” (Leah)
• Provide Incentives (8.0%)
  ◦ “I believe Assessment Day attendance would greatly improve if some incentive were offered beyond "personal satisfaction" and the threat of a Saturday make-up.” (Emma)
  ◦ “Give a reward, ice cream/t-shirt/etc.” (Carrie)
  ◦ “Pay us” (Mitchell)
  ◦ “Give out free puppies” (Jackie)

Final Thoughts

• We do need to consider students' perceptions of assessment
• Qualitative assessment reveals interesting insight into students' perceptions of and suggestions regarding assessment
• In general, students do believe assessment is valuable, but there are areas for improvement that could further increase the value of our assessments

The Impact of Control in Low Stakes Testing

Carol L. Barry

Low-Stakes Testing and Motivation

• The goal is to get scores on the trait of interest; however, we may end up measuring noise
  ◦ This has implications for the functioning of instruments AND for the decisions we can make based on scores
• How are scores on instruments impacted when students are unmotivated?
  ◦ Scores on instruments ultimately impact the usefulness of the decisions that are made about a program
  ◦ Without valid scores on instruments, useful decisions cannot be made about the program
  ◦ We must exercise caution when making decisions about program effectiveness

Sundre (1999); Sundre & Kitsantas (2004); Wise & DeMars (2005)

How do we improve the quality of low stakes data?

• Some possibilities:
  ◦ Training proctors
  ◦ Providing incentives for students
  ◦ Increasing the level of control within the testing condition
• This research focused on increasing the level of control within the testing condition

Purpose of the Current Research

• Problem: low-stakes testing environments are unavoidable and may undermine the decisions made based on scores.

• Questions:
  ◦ How can useful and valid data be collected when the testing has no or low stakes for the participants?
  ◦ Will changes in the testing context impact student motivation and responses?
  ◦ Would a more controlled testing environment improve the quality of low-stakes data?

Purpose of the Current Research

• Given these questions, the purpose of the current research was to examine how different levels of control impacted student motivation and the quality of data gathered in low-stakes situations

Purpose of Current Research

• How was control defined?
  ◦ Uncontrolled
    - Unproctored assessment administered over the Internet
    - Students completed the survey at their own pace under their preferred conditions
  ◦ Somewhat controlled
    - Proctored assessment on JMU's Assessment Day
    - Proctor attentiveness and emphasis on importance may vary
    - Room sizes may be quite large (potentially over 200 students)
    - Students may have more of a feeling of anonymity
  ◦ Very controlled
    - Proctored assessment
    - Proctor was attentive and emphasized the importance of the assessment
    - Room sizes were small (fewer than 25 students)

Methods

• Instrument used: the College Self-Efficacy Inventory (CSEI; Solberg, O'Brien, Villareal, Kennel, & Davis, 1993)
• Levels of control:
  ◦ Uncontrolled environment
  ◦ Somewhat controlled environment
  ◦ Very controlled environment

Methods

• Motivation was assessed indirectly
• The assumption is that poor-quality data are an indicator that students were unmotivated as they responded to the items
• Unmotivated students may be inattentive to items
  ◦ How much noise or error is measured, depending on the level of control?
  ◦ We expect higher-quality data as the level of control increases

Data Analysis

• Data quality was judged by examining the item functioning (i.e., model fit) across the three different testing conditions.

• Model fit?

[Diagram: a factor model in which a Social Self-Efficacy factor underlies items 1 through 5]

Results

• Model fit improved as the level of control increased.
• As the level of control increased:
  ◦ Students appeared to attend more to the items
  ◦ The item order effect diminished
    - Items were related to each other simply because they were adjacent on the instrument, despite being written to assess distinct constructs
    - In the very controlled condition, this effect was no longer present
  ◦ Students appeared to respond such that scores indicated true(r) levels of the construct
  ◦ The quality of the data appeared to increase

[Chart: visual representation of results, comparing model misfit (larger numbers = more misfit) for Samples 1-5 across the uncontrolled, somewhat controlled, and very controlled conditions]
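The study judged data quality through model fit; as a rough, hypothetical illustration of the item order effect described above, the sketch below compares correlations between adjacent and non-adjacent items. A clearly positive index suggests adjacency-driven responding; this is not the analysis the authors ran, just one way to see the phenomenon.

```python
import numpy as np

def order_effect_index(responses: np.ndarray) -> float:
    """Mean adjacent-pair correlation minus mean non-adjacent-pair correlation."""
    r = np.corrcoef(responses, rowvar=False)
    n = r.shape[0]
    adjacent = [r[i, i + 1] for i in range(n - 1)]
    non_adjacent = [r[i, j] for i in range(n) for j in range(i + 2, n)]
    return float(np.mean(adjacent) - np.mean(non_adjacent))

# Simulate 400 respondents x 12 items, then contaminate each item with
# its neighbor to mimic inattentive, adjacency-driven responding.
rng = np.random.default_rng(1)
data = rng.normal(size=(400, 12))
data[:, 1:] += 0.5 * data[:, :-1]

print("Order-effect index:", round(order_effect_index(data), 3))  # > 0 here
```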

Implications and Conclusion

• We gather data used for accountability and instrument-development purposes in low-stakes environments
• The quality of the data, and ultimately the inferences we make based on those data, depend on how controlled the testing context is
  ◦ Data collected in an uncontrolled testing condition might lead individuals to make unnecessary changes to an instrument or program, or to fail to make necessary changes
• A high degree of control seems to increase student motivation even though the test is low-stakes

Implications and Conclusion

• Control of the testing environment can be an effective way to increase student motivation
  ◦ Smaller rooms
  ◦ Attentive proctors
  ◦ Emphasis on the importance of assessment
• Sound assessment practice begins with data that we can trust
• Before making inferences regarding program effectiveness, we must be able to trust the data upon which those inferences are based

The Proctor Role in Motivating Students to Do Assessment

Abby Lau

Traditional Proctor Role

• Test administrator
  ◦ Follow standardized procedures
  ◦ Distribute test materials
  ◦ Keep track of time for each test segment
  ◦ Monitor students for cheating
  ◦ Answer students' questions
  ◦ Handle unexpected situations

Assessment Day Proctors at JMU

• Why do we have proctors?
  ◦ Testing ~3,000 students on one day in 30+ different classrooms across campus
• Who do we hire?
  ◦ Retired teachers (external to JMU)
  ◦ University administrative staff
  ◦ JMU graduate students
  ◦ JMU undergraduate student leaders
• Why are we talking about proctors?

Effort Scores by Test Room Spring 2007

[Chart: mean SOS Effort scores by test room for Spring 2007, varying considerably from room to room]

Low-Stakes Test Proctor Roles

[Diagram: three proctor roles in low-stakes testing: Administrator, Officer, and Coach]

The proctor as a “test officer”

• Why?
  ◦ Students' behavior is less predictable in low-stakes contexts
  ◦ Students arrive at the testing situation with different attitudes and perspectives about the assessment
• How?
  ◦ Monitor students' behavior and identify and respond to inappropriate behaviors: rapid guessing, sleeping, talking, cheating (a sketch of flagging rapid guessing from response times follows this list)
  ◦ Disciplinary action steps:
    1. Ask the student to correct the behavior
    2. Provide a warning about being asked to leave
    3. Ask the continually disruptive student to leave
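In computer-based administrations, rapid guessing can also be flagged from logged item response times rather than by observation alone. The sketch below is a minimal illustration; the 3-second threshold and 10% flag rate are assumed values, not figures from this presentation.

```python
# Flag examinees whose logged response times suggest rapid guessing.
# Threshold and flag rate are illustrative assumptions.
RAPID_THRESHOLD_SECONDS = 3.0
FLAG_RATE = 0.10

def rapid_guess_rate(response_times: list[float]) -> float:
    """Fraction of item responses faster than the rapid-guessing threshold."""
    rapid = sum(1 for t in response_times if t < RAPID_THRESHOLD_SECONDS)
    return rapid / len(response_times)

times = [12.4, 1.1, 0.8, 15.0, 9.7, 2.0, 22.3, 1.5, 11.8, 14.2]
rate = rapid_guess_rate(times)
print(f"Rapid-guess rate: {rate:.0%}")  # 40% for this examinee
if rate > FLAG_RATE:
    print("Flag examinee: responses may not reflect true proficiency")
```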

The proctor as “test coach”

• Communicating the importance of the tests
• Showing appreciation of students
• Acknowledging good effort when seen
• Encouraging good effort when lacking
• Helping students stay motivated
• Showing enthusiasm and creating a positive atmosphere for the testing
• Treating students with the utmost respect

Implementing the Proctor Model

1. Proctor job description updated
2. Proctor Training Workshop
3. Standardization of proctor tasks

Proctor Training Workshop

• Format:
  ◦ One hour, right before testing begins
  ◦ Breakfast provided; meet other proctors; pick up materials
• Content:
  ◦ The reason for the assessment testing
  ◦ The issue of examinee motivation
  ◦ The impact proctors can have on motivation
  ◦ The expectations for proctors' roles
  ◦ Example scenarios and how to respond

Standardization of Proctor Tasks

• Proctor scripts:
  ◦ Provide a standardized message to students that reflects the proctor roles
• Proctor checklist:
  ◦ Detailed instructions on how the testing session should proceed
  ◦ Standardizes the test administration procedures to maximize score comparability

Evaluation of the Model

• Proctor model successfully implemented:
  ◦ Students reported that proctors implemented the model
  ◦ Proctors self-reported that they implemented the model
• Proctor model positively impacted motivation:
  ◦ Students reported putting forth more effort on this Assessment Day than on the previous one
  ◦ There was less variability in Effort across test rooms

Effort by Room, Spring 2008

[Chart: mean SOS Effort scores by test room for Spring 2008, across focus group, ISAT, HHS, Showker, and other campus testing rooms, showing less room-to-room variability than in Spring 2007]
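The reduced room-to-room variability is straightforward to quantify. A minimal sketch with made-up per-room Effort means (not the actual Spring 2007 / Spring 2008 data):

```python
import statistics

# Hypothetical mean SOS Effort scores per test room; illustration only.
room_means_2007 = [14.2, 21.5, 17.8, 23.0, 15.1, 19.9, 22.4, 16.3]
room_means_2008 = [19.8, 20.5, 21.1, 20.2, 19.5, 20.9, 21.4, 20.0]

for year, means in [("2007", room_means_2007), ("2008", room_means_2008)]:
    print(f"Spring {year}: mean effort {statistics.mean(means):.1f}, "
          f"room-to-room SD {statistics.stdev(means):.1f}")
# A smaller room-to-room SD in 2008 is consistent with the proctor model
# evening out examinee effort across rooms.
```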

Conclusions

• Conducting program assessment often requires collecting data from students under conditions in which they are not motivated to perform to the best of their ability
• When assessments matter to students, they try harder, which increases the validity of the assessment scores

How to Make Assessments Matter to Students

• Assessment policy
  ◦ Provide feedback to students about their performance
  ◦ Keep testing reasonable (timing/length)
  ◦ Provide information about why the tests are important
• Controlling administration conditions
  ◦ Avoid giving low-stakes assessments online
  ◦ Use small administration rooms
• Proctors who care
  ◦ Encouragement and appreciation

Questions?