Transcript Meital2004
Innovations in Educational Measurement
Michal Beller, Director-General of RAMA
[email protected]
Study day in honor of Yitzhak Friedman, 23.5.2006
Outline: Introduction, Assessment Design, Item Development, Delivery, Scoring, Reporting, New Construct, Assessment for Learning, Conclusions & Challenges
RAMA (ראמ"ה), National Authority for Measurement and Evaluation in Education © 2006
# 1

Introduction
# 2

The field of measurement and evaluation in education, in Israel and worldwide, has in recent years had to meet a steadily rising level of demands and expectations. Measurement and evaluation in education is not an end in itself, nor is it a magic solution to the difficulties, problems, and challenges of educational practice. The measurement and evaluation processes themselves will be required to meet clear standards of quality and of cost-benefit, including validity, reliability, availability, contribution, and economic efficiency. We need to do more, faster, better, and on a smaller budget. How is this done?

This lecture focuses on a review of some of the innovations in test construction, administration, scoring, and reporting. These innovations require a judicious combination of good familiarity with the content domain being measured, test theory, the cognitive sciences, computer science, and current developments in technology.

Thanks
My colleagues at ETS: Henry Braun, Bob Mislevy, Randy Bennett, Isaac Bejar, Pat Kyllonen, Dylan Wiliam, et al.
# 3

Functions of assessment
• For evaluating institutions
• For describing individuals
• For supporting learning
Monitoring learning
• Whether learning is taking place
Diagnosing (informing) learning
• What is not being learnt
Forming learning
• What to do about it
# 4

New Capabilities – New Challenges
What sort of tests, in which setting, contribute to learning?
New modes of learning – new modes of assessment?
Scalability and sustainability
# 5

Roles of Technology in Assessment
From improving the efficiency and quality of existing practices to expanding the domain of testing:
• Test Creation Assistant
• Computer delivery of tests (CAT)
• Automated scoring of complex items (e-rater)
• Simulation-based assessment tasks
• Intelligent tutoring systems
• Virtual reality systems
# 6

Improving Efficiency and Quality
• Assessment Design
• Item Development
• Assessment Delivery
• Automated Scoring
• Score Reporting
# 7

Assessment Design
# 8

The Need for Assessment Design
• More ambitious goals and more constraints!
• More stakes are involved
• Beyond psychometrics – cognitive sciences, content expertise
• Team work
A need for a disciplined approach to assessment design
# 9

Assessment Design
From Art To Science
# 10

What is Needed?
• Establish a science of assessment design and build the infrastructure to support it
• Strengthen the linkages between assessment design and instructional design in technology-rich environments
• Develop measurement models to better assess learning in naturalistic settings over extended time periods
# 11

Evidence-Centered Assessment Design (ECD)
[R. Mislevy, L. Steinberg, R. Almond & Associates]
ECD uses general principles of evidentiary reasoning to create a framework that supports a disciplined approach to assessment design.
http://www.education.umd.edu/EDMS/mislevy/papers/
# 13

ECD is an attempt to obtain clear answers to three basic assessment questions:
• What do you want to say about persons taking the assessment?
• What observations (behaviors or work products) would provide the best evidence for what you want to say?
• What kinds of tasks allow you to make the necessary observations or collect pertinent evidence?
# 14

To apply the ECD framework in the design of assessment tasks, a subject matter expert (e.g., a teacher or test developer) begins by creating three models:
• student model, defining the range and relationships of the knowledge and skills to be measured;
• evidence model, specifying the performance data associated with this knowledge and these skills, for varying levels of mastery; and
• task model, spelling out the features of task performance situations that will elicit relevant evidence.
# 15

The Target Models
[Diagram: Student Model, Evidence Model(s) (statistical model and evidence rules), Task Model(s) (numbered placeholder features)]
# 18

Item Development
# 22

Item Development
Computer-assisted item creation:
• Fully automated real-time item generation (with specified psychometric properties)
• Intelligent text search for stimulus materials
• Multimedia content
# 23

Item Models: Definitions
An item model is a set of specifications that defines a class of items with respect to a set of properties.
A class of items may share:
• similar content (LaDuca, Staples, Templeton, & Holzman, 1986)
• similar psychometric properties (Bejar, 2002)
• similar format.
Once an item model is defined, it can be programmed into software that automatically generates instances (Singley & Bennett, 2002).
# 24

Item Model Development Loop
[Diagram with the steps: Design Model, Generate Instances, Sample Instances, Pretest Instances, Analyze Instances, Revise Model, Finalize Model]
# 25

Roles of Technology in Assessment
From improving the efficiency of existing practices to expanding the domain of testing:
• Test Creation Assistant
• Computer delivery of tests (CAT)
• Automated scoring of complex items (e-rater)
• Simulation-based assessment tasks
• Intelligent tutoring systems
• Virtual reality systems
# 26

Expanding the Domain
Framework content areas that can be better measured with technology
Apply:
• Simulations
• Web searches
• Video clips
• Virtual reality
• …
# 27

Simulation Based Assessment
# 28

The ICT construct
Define ICT Literacy (diagram labels): Access, Manage, Cognitive, Integrate,
Ethical, Evaluate, Technical, Create, Communicate
# 30

ECD Phase I: Defining the Claims
• Domain analysis (What are we trying to measure?)
• Rationales (Why do we want to test it?)
• Claims: What do we want to assert about test takers at different score levels?
# 31

Example of a claim for the Higher Education ICT Literacy Assessment
“The student uses ICT tools to collect and/or retrieve information effectively and efficiently, identify likely information sources, and extract the information from the sources.”
# 32

ECD Phase II: Defining the Evidence
• What evidence would give us the confidence to make our claims?
• How much evidence is needed?
• How many different kinds of evidence?
• What would be authentic evidence in this field, for the claims we want to make?
# 33

ECD Phase III: Designing the Tasks and Blueprint
• What tasks could be designed to gather the needed evidence?
• How to convert the evidence into scores? What psychometric model?
• What operational constraints must be considered as tasks are designed?
• What will be the overall design of the assessment (“blueprint”)?
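The three ECD phases above (claims, evidence, tasks and blueprint) can be read as a chain of links that a design team could check mechanically. The following is only a minimal sketch of that idea, not part of ECD itself or of any ETS tool: the class names, field names, and the `unsupported_claims` check are all hypothetical, and the evidence and task descriptions are invented for illustration. Only the first claim statement is taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """Phase I: something we want to assert about test takers."""
    claim_id: str
    statement: str


@dataclass
class Evidence:
    """Phase II: an observation that would support one claim."""
    evidence_id: str
    supports: str  # claim_id of the claim this evidence supports
    description: str


@dataclass
class Task:
    """Phase III: a task designed to elicit specific evidence."""
    task_id: str
    elicits: List[str]  # evidence_ids this task is meant to produce


@dataclass
class Blueprint:
    """Hypothetical container for the overall assessment design."""
    claims: List[Claim] = field(default_factory=list)
    evidence: List[Evidence] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)

    def unsupported_claims(self) -> List[str]:
        """Return ids of claims for which no task elicits supporting evidence."""
        elicited = {eid for task in self.tasks for eid in task.elicits}
        supported = {ev.supports for ev in self.evidence
                     if ev.evidence_id in elicited}
        return [c.claim_id for c in self.claims if c.claim_id not in supported]


# Illustrative data: the "access" claim quotes the slide; everything else is assumed.
blueprint = Blueprint(
    claims=[
        Claim("access", "The student uses ICT tools to collect and/or retrieve "
                        "information effectively and efficiently."),
        Claim("evaluate", "Hypothetical second claim about evaluating sources."),
    ],
    evidence=[
        Evidence("e1", "access", "Search queries issued and sources selected."),
        Evidence("e2", "evaluate", "Written justification for trusting a source."),
    ],
    tasks=[
        Task("t1", elicits=["e1"]),  # no task yet elicits e2
    ],
)

print(blueprint.unsupported_claims())
```

A check of this kind makes the Phase III question about the overall design concrete: a claim that no task can provide evidence for is a gap in the blueprint, whatever psychometric model is later chosen.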
# 34

Delivery
# 35

Modes of Delivery
• Online
• Adaptive
• Linear
• Intelligent Tutoring
# 36

Adaptive Computer Delivery of Tests
Computerized (Adaptive) Testing:
• GRE
• GMAT
• CLEP
• Amiram (Israel)
Main advantages:
• Administration and scoring
• Accuracy
• Multimedia integration
• Time saving
Main disadvantages:
• Cost (due to security issues)
# 37

Adaptive Testing
1st Generation: Item-level adaptation
High-stakes testing applications have proven somewhat disappointing:
• Time savings modest
• CBT delivery costs substantial
• Item library costs substantial
• Psychometric properties of scores for certain groups problematic
# 38

Adaptive Testing
2nd Generation: Stage-level adaptation
• Using “testlets” as building blocks
• Better control of item context and item exposure
• More economical in use of items
• Particularly useful for low-stakes or diagnostic testing
# 39

Linear Administration
http://nces.ed.gov/nationsreportcard/pdf/studies/2005457_1.pdf
# 40

Intelligent Tutoring - ALEKS
http://www.highed.aleks.com/
• Adaptive assessment
• Directs students to an appropriate class
• Developed with a multimillion-dollar NSF grant
• The theory behind ALEKS is a specialized field of mathematical cognitive science called "Knowledge Spaces."
# 41

Online Test Delivery Lifecycle
Stages: Test Design; Item Authoring; Test Packaging; Scheduling, Eligibility, Registration; Test Delivery; Scoring; Reporting & Analysis
Test Design: Item Models, Test Models, Test Layout, Test Behavior
Item Authoring: Item Creation, Item Banking & Management, Version Control
Test Packaging: Test Creation & Structure, Content Creation (e.g., Online Help, Instructions)
Scheduling, Eligibility, Registration: Scheduling, Eligibility, Registration, Payment Processing
Test Delivery: Test Delivery, Item Display, Results Capture
Scoring: Automated Scoring, Interactive Scoring
Reporting & Analysis: Score Reports, Results Analysis
Systems named on the slide: TCS, TestPrep, iSER, iBT, eRater/OSN, Genasys
iBT is ETS's new Internet-based test delivery platform
# 42

Scoring
# 44

OSN - Online Constructed Response Scoring Process
• Test Delivery
• Scan MC responses
• Prepare samples/benchmarks
• Train scorers
• Separate/scan essays
• Score essays
• Combine MC and essay scores
• Create Score Report
• Receive Score Report
# 45

Automated Scoring
• General mathematical expressions
• Graphics
• Short answer responses
• Essays
• Speech
• Professional practice (architecture, medicine, information technology)
# 46

Automated Essay Scoring Capabilities
• e-rater®
• Critique℠ Writing Analysis Tools
• Grammar, Usage, and Mechanics Score
• c-rater™
Faster, Better, More Cost Effective
# 48

Criterion℠ Online Writing Evaluation Service
# 49

c-rater
• Measure performance: content and reasoning
• Provide instant feedback
• Identify missing or flawed elements of conceptual understanding
• Reduce costs
# 54

Reporting
# 55

The Dual Challenge
• High Quality Assessments
• Meaningful and Interpretable Results
# 56

Data Challenges
• Too much data
• Not enough useful information
• Decision support
Turning data into action …
# 57

NAEP Data Tools – Select Criteria
# 58

NAEP Data Tools – View Results
# 59

Score Reports
• Richer, diagnostic information
• Online, interactive delivery
• Formats and content in plain language targeted to specific audiences: teacher, parent, student
• Clear connections between results, standards, tasks, and next steps
# 60

Challenges
• New Constructs
• Assessment for Learning
# 61

New Constructs
# 62

“New constructs”
Development of measures covering a broad range of cognitive and non-cognitive characteristics, such as:
• learning ability
• critical reasoning in context
• creativity
• practical intelligence
• motivation
• persistence
• teamwork
• self-concept
• confidence
• communication skills
# 63

Listening
Communication Skills
# 64

MOR
Constructs measured:
• Interpersonal communication
• Ability to cope with pressure
• Initiative and leadership
• Conscientiousness and ethical attitude
# 66

Assessment for Learning
# 68

In analyses of the role of national educational assessment, insufficient attention has been paid to the central place of the classroom. Rather than encouraging a two-way flow of information, today's "standards-based" frameworks tend to direct the flow of accountability from the outside into the classroom.

The authors of this volume emphasize that assessment, as it exists in schools today, consists mainly of the measurements that teachers themselves design, evaluate, and act upon every day.
Improving the usefulness of assessment in schools primarily requires assisting and harnessing this flood of assessment information, both as a means of learning within the classroom and as the source of crucial information flowing out of classrooms.

This volume aims to encourage debate and reflection among educational researchers, professionals, and policymakers. Five source chapters describe successful classroom assessment models developed in partnership with teachers, while additional commentaries give a range of perspectives on the issues of classroom assessment, standardized testing, and accountability.
# 69

Analysis of responses, Meitzav (מיצ"ב), Grade 5, 5766 (2005-06)
From a booklet prepared by Tami Giron and her team
# 71

NGfL
# 72

Main conclusions
• The impact of technologies on measurement and testing is significant and ongoing
• Judicious use of technologies can shift the balance point between validity and efficiency
• New measurement methods can have important implications for utility and equity
# 73

The measurement and evaluation processes themselves must meet clear standards of quality and of cost-benefit, including validity, reliability, availability, contribution, and economic efficiency.
Measurement and evaluation in education is not an end in itself, nor is it a magic solution to the difficulties, problems, and challenges
of educational practice.

Challenges
# 74

What are the main challenges facing the measurement and evaluation community?
• Integrating research on measurement, from test design through to test administration
• Anticipating the implications of technologies for the "real world" and for the world of education
• Greater involvement in the role of measurement and evaluation in setting education policy

Thank you
Questions? Comments?
# 75