Predicting Course Withdrawal Using Data Mining


RioPACE
March 30, 2012
Relentless Improvement
Purpose of Project
• Tremendous growth in online community college enrollment (Allen & Seaman, 2008). Need practical institutional responses to the challenges of online retention and success.
• Develop predictive models to identify at-risk online students and drive early alert systems.
• Facilitate and strengthen linkages between instructors and at-risk students within a dynamic online environment.
Predictive Analytics
• Purdue – Signals
• PAR Framework
• NAU – GPS
• Rio Salado College – Rio PACE
Building Analytics Capacity
Questions
• Which factors are effective as early/point-in-time predictors of course outcome* in an online environment?
• Can we predict course outcomes using data retrieved from our SIS and LMS? If so, can we generate early and/or rolling (i.e., daily) predictions?
• How do we respond to students flagged as at-risk?
* Successful = ‘C’ grade or higher.
Previous Research
• Significant correlation between LMS activity markers and course outcomes. (Macfadyen & Dawson, 2010)
• Online community college instructors tend to agree that ‘social presence’ is a predictor of student success. (Hernandez, 2009)
• Small but growing trend toward data-driven at-risk alerting for online students. (Kolowich, 2009); (Iten, Arnold, & Pistilli, 2008)
→ Gap in literature related to practical models for point-in-time prediction of course outcome.
LMS Activity Factors = LSP
• Log ins
  o Frequency of log ins to the course section homepage.
• Site engagement
  o Frequency of LMS activities that suggest at least some level of course engagement (e.g., opening a lesson, viewing assignment feedback).
• Pace
  o Measured using total points submitted for grading.
What was not included
A decision was made from the beginning to include only variables that were the result of student behavior and that could be affected by instructor/support staff intervention if needed.
Log ins
Chemistry 151, Summer & Fall 2009 (N at unit 3 = 159)
Measurement is taken as of the beginning of the “week.” Weekly measurements only include students who are ungraded and still enrolled as of the beginning of the week.
Course weeks are converted to a 16-unit scale to ensure homogeneous comparisons across variable-length courses. (Ex: the 2nd week of an 8-week course is the 4th unit on a 16-unit scale. See the sketch below.)
Successful = ‘C’ grade or higher.
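
To make the normalization concrete, here is a minimal sketch of the week-to-unit conversion. The function name and the integer rounding are our assumptions; the slide only states the mapping itself.

# Minimal sketch of the 16-unit normalization (our illustration, not
# Rio Salado's code). Maps a week of a variable-length course onto a
# common 16-unit scale so courses of different lengths are comparable.
def to_unit_scale(week: int, course_weeks: int, scale: int = 16) -> int:
    if not 1 <= week <= course_weeks:
        raise ValueError("week must fall within the course length")
    return week * scale // course_weeks

# The slide's example: the 2nd week of an 8-week course is the 4th unit.
assert to_unit_scale(2, 8) == 4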
Site engagement
Chemistry 151, Summer & Fall 2009 (N at unit 3 = 159)
Pace
Chemistry 151, Summer & Fall 2009 (N at unit 3 = 159)
Activity weighting
• In a practical application, recent behavior is most relevant.
• Log in and site engagement factors are weighted based on when the event occurred relative to the course start and end dates (see the sketch below).
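
The presentation does not give the exact weighting function, so the sketch below assumes a simple linear recency weight (0 at the course start, 1 at the end); recency_weight and weighted_frequency are hypothetical names.

from datetime import date

# Hypothetical linear recency weight: events near the course start count
# little, events near the end count fully. The actual Rio PACE weighting
# scheme is not published in this presentation.
def recency_weight(event_day: date, start: date, end: date) -> float:
    span = (end - start).days
    elapsed = (event_day - start).days
    return max(0.0, min(1.0, elapsed / span))

def weighted_frequency(event_days: list[date], start: date, end: date) -> float:
    # A log-in or engagement count that favors recent activity.
    return sum(recency_weight(d, start, end) for d in event_days)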
Activity weighting
[Charts: weighted log ins and weighted site engagement — Chemistry 151, Summer & Fall 2009 (N at unit 3 = 159)]
Predictive Model #1 (8th day at-risk)
• Purpose
  o Run only on the 8th day of class. Derive estimated probability of success and generate warning levels: Low, Moderate, High.
• Factors
  o 30 factors selected covering a broad spectrum of LMS behavioral data and enrollment information.
• Methodology – naïve Bayes classification model (a sketch follows below)
  o Accurate, robust, fast, easy to implement. (Lewis, 1998); (Domingos & Pazzani, 1997)
• Accuracy**
  o 70% of unsuccessful* students correctly predicted for 6 participating disciplines.
  o Warning levels correlated with course outcome.
* Success = ‘C’ or higher
** Tested using random sub-sampling cross-validation (10 repetitions)
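
As a rough illustration of the methodology named above, the sketch below trains a naïve Bayes classifier and scores it with random sub-sampling cross-validation (10 repetitions). The Gaussian variant, the synthetic data, and the exact feature matrix are assumptions; the slide specifies only “naïve Bayes,” 30 factors, and the evaluation scheme.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Stand-in data: one row per student, 30 LMS/enrollment factors.
# y = 1 marks an unsuccessful outcome (below 'C' or withdrawal).
rng = np.random.default_rng(0)
X = rng.random((500, 30))
y = (X[:, :5].sum(axis=1) < 2.0).astype(int)

# Random sub-sampling cross-validation, 10 repetitions as on the slide.
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
scores = cross_val_score(GaussianNB(), X, y, cv=cv, scoring="recall")
# With y = 1 = unsuccessful, recall is the share of unsuccessful
# students correctly flagged (the slide reports ~70%).
print(f"mean recall over 10 repetitions: {scores.mean():.2f}")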
Predictive Model #2 (Rio PACE)
• Rio Progress And Course Engagement
• Next stage of predictive modeling at Rio Salado
• Pilot phase started in April 2010 with an instructor-only view; went live with a student-facing view in Fall 2011
• In the pilot, student statuses were updated weekly; going into production, this was changed to daily updates
• Integrated within RioLearn course rosters
Predictive Model #2 (Rio PACE)
[Screenshot] Color indicates warning level: Green = no risk; Yellow = Moderate; Red = High risk. Pop-up showing activity metrics.
Predictive Model #2 (Rio PACE)
• Warning levels
  o Generated using a naïve Bayes model with 5 input factors:
    • Weighted log-in frequency
    • Weighted site engagement
    • Points earned
    • Points submitted
    • Current credit load
  o ‘High’ warning = student has a low probability of success if his/her current trajectory does not change. (See the sketch below.)
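
A minimal sketch of how a 5-factor naïve Bayes model could yield the three warning levels. The probability cutoffs and the synthetic data are invented for illustration; the presentation does not publish them.

import numpy as np
from sklearn.naive_bayes import GaussianNB

FACTORS = ["weighted_login_frequency", "weighted_site_engagement",
           "points_earned", "points_submitted", "current_credit_load"]

def warning_level(p_success: float) -> str:
    # Hypothetical cutoffs; Rio PACE's actual thresholds are not given.
    if p_success < 0.4:
        return "High"      # low probability of success on current trajectory
    if p_success < 0.7:
        return "Moderate"
    return "Low"

# Stand-in historical data: one row per student, one column per factor.
rng = np.random.default_rng(1)
X_hist = rng.random((200, len(FACTORS)))
y_hist = (X_hist.sum(axis=1) > 2.5).astype(int)    # 1 = successful

model = GaussianNB().fit(X_hist, y_hist)
p_success = model.predict_proba(X_hist[:5])[:, 1]  # column 1 = P(success)
print([warning_level(p) for p in p_success])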
Predictive Model #2 (Rio PACE)
• Activity metrics (weighted measurements used)
  o Log in: Excellent, Good, or Below average
  o Site engagement: Excellent, Good, or Below average
  o Pace: Working ahead, Keeping pace, or Falling behind
• Calculated using historical daily means and standard deviations for each metric in each course. Derived using previously successful students only. (A sketch follows the example below.)
[Figure: example for log-in activity — a scale marked −σ, μ, +σ with bands labeled Below average, Good, and Excellent]
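
Here is a small sketch of the banding just described, computing μ and σ from previously successful students’ historical daily values. The cutoffs at μ − σ and μ + σ are our reading of the slide’s diagram, not a published specification.

import statistics

def activity_band(value: float, successful_history: list[float]) -> str:
    # Band today's metric against historical daily values of previously
    # successful students in the same course. Cutoffs at mu - sigma and
    # mu + sigma are our assumption based on the slide's diagram.
    mu = statistics.mean(successful_history)
    sigma = statistics.stdev(successful_history)
    if value < mu - sigma:
        return "Below average"
    if value <= mu + sigma:
        return "Good"
    return "Excellent"

# Example: log-in counts as of a given day for previously successful students.
history = [4, 5, 6, 5, 7, 6, 5, 4, 6, 5]          # mu = 5.3, sigma = 0.95
print(activity_band(2, history))   # Below average
print(activity_band(5, history))   # Good
print(activity_band(8, history))   # Excellent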
Predictive Model #2 (Rio PACE)
• Warning level distribution
  o Distribution is approximately uniform at the beginning of class.
  o ‘Moderate’ decreases and ‘Low’/‘High’ increase over time.
• Accuracy*
  o Correlation between warning level and success rate: Success(Low) > Success(Moderate) > Success(High). (Chemistry 151, Summer & Fall 2009)
*Obtained using random sub-sampling cross-validation (50 repetitions)
Predictive Model #2 (Rio PACE)
Accuracy* for other courses: Sociology 101, Summer & Fall 2009 (N = 731); Accounting 111, Summer & Fall 2009 (N = 539).
*Obtained using random sub-sampling cross-validation (50 repetitions)
At-risk interventions
• Course welcome emails
  o Encourage students to engage early.
  o Gen-ed students who log in on the 1st day of class succeed 21% more often than students who do not.*
  o A small trial in Fall ’09 showed a 40% decrease in drop rate.
*Obtained from Spring 2009 online general education courses at Rio Salado College.
Implementing Interventions
• Student success interventions designed by the Faculty Chair.
• Examples:
  o Communication – Phone call from instructor during the 2nd week and a follow-up 7 days before the mid-point.
  o Mathematics – Phone call from the Instructional Helpdesk.
  o Physical Science – Phone call and email.
What we learned from the pilot
• Student statuses need to be updated more frequently to keep up with students’ activities in their courses.
• While we have a lot of data points available, we want more. We are currently designing the next version of our LMS and are building in more data-capture points, which we hope to use to further refine our model.
• We need to integrate PACE into Rio’s systems approach.
Rio’s Systems Approach
What we learned from the pilot
And, perhaps most importantly: while we have a new tool to help instructors identify students at risk and to communicate that status to students, without integrating this tool into Rio Salado’s support systems it will never achieve its maximum potential.
Future Directions
• Integrating Rio PACE into Rio’s support systems
• Developing at-risk models that look beyond the course
• Fraud detection models
• Better analytics, so that actionable information is available to those who can act on it
Conclusions
• LSP matters! Log ins, site engagement, and pace are correlated with course outcome, even at an early point in the course.
• Colleges can build predictive models using existing LMS data to identify online students who are struggling or are likely to struggle.
• Simple metrics can be used to assess activity performance, which might help instructors launch more customized interventions.
• More research is needed on the intervention side.
Thank You!