Transcript Document

Program Review
Support Tool
Nathan Pellegrin
Research Analyst
Goals

Background and purpose of the tool

Demonstration

Cal-PASS update

OLAP Development at Cal-PASS

OLAP Success Story: SSPIRE Cube

Future development
The Program Review Support Tool
•
•
•
•
Funded by Hewlett.
Currently being tested by several colleges.
All data is from MIS.
Caveat: figures may not match what is found in locally produced
reports due to differences in master data sources and formulae used
to derive figures.
• Like a “data smorgasbord” and includes
– student demographics
– course grades
– TOP code course hierarchy
… the menu will be expanding !
Purpose
• Not the product of a mandate or requirement from
the Chancellors office.
• Not intended to take the place of local tools.
• Not intended to drive evaluation activities.
• Intended for use by colleges as an optional FREE
tool in their program review process.
• Obtain feedback from users to scale and improve
our data model and OLAP infrastructure.
Cal-PASS Statistics
 Over 300,000,000 records
 Up to 15 years of data in some regions
 Over 7,000 schools, colleges and university members
 Over 150 research studies conducted in the last two years
 Sixty-six Professional Learning Councils (1,200+ faculty)
Universities (23)
CSU
.
•Channel Islands
•Dominguez Hills
•Fresno
•Long Beach
•Los Angeles
•Monterey Bay
•Pomona
•Sacramento
•San Bernardino
•San Marcos
•Stanislaus
•San Francisco
•San Jose
•Sonoma
UC
.
•Davis
•Merced
•Riverside
•San Diego
•Santa Barbara
•Santa Cruz
PRIVATE
.
•Otis College of Art and Design
•National University
•University of the Pacific
Changing the Paradigm:
OLAP Applications
• OLAP = On-Line Analytical Processing
• Like Excel pivot tables, except Excel handles only two
dimensional data.
• Stores pre-computed aggregations of data with B-Tree indexing
for delivering fast retrieval times and fast calculation.
• Enables users to perform analysis of data quickly with dragand-drop manipulation of variables and dynamic visualization.
• Web-based for easy access – all processing is performed on the
server so it does not tie up your work station (zero footprint).
• Big time savings!
• Ideal for the action research paradigm and design research.
3 Layers of the Application
User Interface - Dundas
MDX
OLAP Cube - SSAS
SQL
Database(s) – SQL Server
Development Process of the OLAP project is a
technical collaboration between IT and Research …
•
•
•
•
Server Architecture/O.R. – Alex Zakharenkov (IT)
Submission Processes/User Interface – Nick Wade (IT)
Data Model/ETL – Nathan Pellegrin (Research)
Design/Feedback of OLAP cubes - All IT and Research
Staff, including Terrence Willett and Mary Kay Patton
Development times
• Development of initial Dim Model started in July
2008 … incremental additions/changes congealed
into a (basic) model by February, 2009.
• Initial development of Program Review, including
feedback and changes ≈ 8 weeks.
• Dim model ETL execution ≈ 15 hours.
• Processing of OLAP cube ≈ 20 min./300K rows .
• Initial deployment of UI ≈ 3 weeks. Several changes
since then.
• UI required tweaks to OLAP cube design.
Development Tools
• .NET
• SQL Server
– storage
– Integration Services
– Analysis Services
• BIDS
• Dundas
Cal-PASS
Data Flow
Dimensional
Model
•Analytical
•Integrated K-12/CC/Univ
•Time-dependent
•2NF (Redundant CK)
•Optimized Indexing
ETL
Object
Repository
0607,01612590000001,0000179441,U,,12,ABARCA,CARL
OS,09091988,M,500,,,,,,,,,,,,,,,,,,,01,15,1,,,,,,N,010,Y,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,275,0,0,4,24,0,0,0,0,0,0,,1
0607,01612590000001,000
0032006,X,,,,,,,,,,,,,,,,,,,,,,,N,Y,,,8,,,,,,,,,,
0179441,U,,12,ABARCA,CARLO
0607,01612590000001,0000154281,9107510861,,11,BLAC
S,09091988,M,500,,,,,,
K,BRITNI,11291990,F,600,,,,,,,,,,,,,,,,,,,00,13,1,,,,,,N,000,N,,,
,,,,,,,,,,,,,01,15,1,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,302,1,8,6,35,3,15,5
,,,,,N,010,Y,,,,,,,,,
,28,4,33,01,10032006,N,340,5,71,12,67,18,90,0,0,7,47,2.5,,,,
,,,,,,,,,,,,,,,,,,,,,
,,,,,,,N,N,U,72,80,,,,,,,,,,
,,,,,,,,,,,,,,
0607,01612590000001,0000159553,U,,11,BOWIE,EARLIS
9107510861,,11,BLACK,BRITNI,11291990,F,600,,,,,,,,,,,,,,,,
HA,10231988,F,999,,,,,,,,,,,,,,,,,,,00,14,1,,,,,,Y,060,Y,,Y,,,,,,,,,,
,,,00,13,1,,,,,,N,000,N,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,1003200
,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,01,10032006,N,278,3,23,4,24,4,20,0,0,
6,N,302,1,8,6,35,3,15,5,28,4,33,01,10032006,N,340,5,71,12,
0,0,,10032006,A,,,,,,,,,,,,,,,,,,,,,,,N,N,,,40,,,,,,,,,,
67,18,90,0,0,7,47,2.5,,,,,,,,,,,N,N,U,72,80,,,,,,,,,,
0607,01612590000001,0000161233,U,,
0607,01612590000001,0000159553,U,,11,BOWIE,EARLIS
HA,10231988,F,999,,,,,,,,,,,,,,,,,,,00,14,1,,,,,,Y,060,Y,,Y,,,,,,,,,,
,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,01,10032006,N,278,3,23,4,24,4,20,0,0,
0,0,,10032006,A,,,,,,,,,,,,,,,,,,,,,,,N,N,,,40,,,,,,,,,,
0607,01612590000001,0000161233,U,,
CUSTOM
FILES
ETL
Cal-PASS
Submission
Universities
UC
UC UC
•Semistructured data
•Format/value Validation
K-12
CSU
CSU
Private
•Storage
•Application Integration
•Key-value pairs (KVP) design
CCCCO
MIS
LA
What does a dimensional data model do for Cal-PASS?
UNIFY:
Data from across segments is integrated into a unified dataset.
STANDARDIZE:
Table and field names, data types and value coding systems are
standardized to be the same for all segments.
SIMPLIFY:
The number of tables and fields used to store the data is reduced.
Granularity of tables are at the units of analysis. Table relationships reflect
analytical relationships between entities.
IMPROVE DATA QUALITY:
Merging data brings data quality issues to light so
they can be noted and/or resolved. Establishing primary and foreign key
relationships enforces referential integrity. Multiple student identifiers are
unified to produce a single “metakey” Missing course CBEDS classifications
imputed using machine learning.
REDUCE RISK: Without it, in order to produce one metric for all segments
separate analytical data processing pathways are required for each segment,
which means more maintenance and increased risk of inconsistent results. Using a
dimensional model the analytical computations and services are centralized.
BOOST PRODUCTIVITY:
The simpler, cleaner data model makes it easier to
develop cubes with re-usable components, generalized for all segments. Currently,
analytical data processing must be developed separately for each segment. Using
a dimensional model, only one pathway needs to be developed that applies to all
segments.
Organization
Cal-PASS Unified
Dimensional Data Model
(Selected Tables)
Student
Status
Student
Course
Outcome
Course
Taxonomy
Course
Award
= Fact Table
= Dimension Table
Term
= Foreign Key
Relationship
The Ideal: Centralization of
analytical query processing
Presentation & User
Engagement
OLAP
Each statistic can
emerge at
multiple
presentation
points, but there
is only one logical
control point.
Views and
Stored Proc’s
Dimensional
Model
Tables
User-defined
cohorts;
model outputs
Student identifiers from each source system are mapped to
a new identifier through transitive closure of all connected
values (using a modified version of the Floyd–Warshall
algorithm).
Local district student id
Name + gender + DOB
CSIS SSID
CCCCO SID (SB00)
n1
d1
d2
o1
n2
c1
d3
Each edge represents a record linking two values of different
identifiers in submitted student records.
m1
OLAP Success Story: SSPIRE Cube
•
•
•
•
•
Funded by Irvine Foundation.
Currently used by nine colleges.
Incorporates MIS data with data submitted by colleges (custom files).
Tracks cohorts of students.
Demonstrate using Merced college (thank you Dr. Duran!)
Program
Review
Support Tool
This is only the beginning…
• Provide access to K-12 districts and Universities
• Inter-segmental OLAP Cubes
• Link non-academic outcomes (Employment
Development Department, Child Welfare Data
System)
“Success at Every Level”
Education Data and Information Act of 2008
SB 1298
1. convene a high-level working group to decide the best the governance
structure for the comprehensive education data system;
2. directs the State Chief Information Office (CIO), in consultation with educators
and education policymakers, to prepare a strategy plan outlining a clear path for
technical implementation; and
3. requires the various education segments to begin using a common student
identifier, so that once a governance structure and technical architecture are in
place we can begin linking records from pre-k through the university with relative
ease and speed.
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
High level cross-agency systems map of key collections
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
CDE Data Systems
In development
Existing
WORKING DRAFT
NOT EXHAUSTIVE
Direct data sharing*
Data sharing through local
agencies
Non CDE Data Systems
National Student
Clearinghouse
AYP/API
CASAS
TOPS Pro
CASEMIS
CSU ERS
CCC COMIS
CALTIDES**
Assessments
CPEC
CALPADS**
Early
Childcare
Migrant
Prisons,
Census
CALPASS
CCTC CASE
SACS
Other CDE systems/
units including
CDS, Charter
schools
Source:
Planned/
potential
UC CSS
ConApps
Other CDE units
including Homeless,
CALSAFE, Title 3,
Private Schools etc.
CDPH
EDD
From Franchise tax,
benefits system etc.
* Does not imply direct data linkages. Only state system linkages shown
** CALPADS is envisioned to replace much of the CBEDS, Language Census, Student National Origin Report and select Consolidated Application data
*** CALTIDES is envisioned to collect data primarily from CALPADS and Commission on Teacher Credentialing’s CCTC’s Credential Automation System Enterprise
CASE system
Interviews with respective agencies, RAND, team analysis
High level system profiles of key CDE collections (1/2)
DRAFT
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
System name
Description
Key identifier
Data categories
Granularity
Data sharing
CALPADS
California Longitudinal Pupil
Achievement Data System. System
(under development) for tracking
K12 students longitudinally, that
will replace CBEDS collections
SSID
Student demographic, program
participation, grade level, enrollment,
course enrollment and completion,
discipline, and statewide assessment
Student
Planned
includeAssessments,
API/AYP,
Migrant,
ConApps,
CALTIDES
California Longitudinal Teacher
Integrated Data Education System.
Iintegrated data system for teacher
data based on unique SEID
SEID
Teacher credentials, authorizations,
teacher participation program, alternative
routes, participation in Beginning Teacher
Support and intern program, SEID, Salary
Student
Planned
includeCALPADS,
CCTC CASE
CASEMIS
California Special Education
Management Information System.
Integrated data system for special
education students on students,
services and provider programs
SSID
Attendance/Enrollment, Disciplinary,
Education Agency, Mobility, Special
Education, Staffing Data, Student
Demographic, Other (services, age,
gender, race/ethnicity)
Student, School None at state
district, School, level
county, region
Assessments
California High School Exit Exam
SSID
CAHSEE, Standardized Testing and
Reporting STAR and CELDT
Attendance/Enrollment, Education
Agency, Food and Nutrition, Parent Data,
Special Education, Student Demographic
Student, School CASAS,
District,
Migrant,
School, County AYP/API,
CALPADS
(planned)
API/ AYP
Accountability related information
CDS code
based on California's Public Schools
Accountability Act of 1999 as well
as No Child Left Behind Act of
2001
AYP/API score by student characteristics
School
Source: Respective CDE departments
Assessments
High level system profiles of key CDE collections (2/2)
DRAFT
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
System name
Description
Key identifier
Data categories
Granularity
Data sharing
Migrant
Student enrollments in migrant
education programs. Includes migrant
education forms and a directory of
offices providing services
Migrant ID,
COE number,
CDS code
Student demographics, educational programs,
counseling, health and support services,
emergency health, clothing, food,
transportation
Student
Assessments,
CALPADS
(planned)
SACS
Standardized Account Code Structure.
Offers LEAs with a means of reporting
financial information
CDS code
For every general ledger accounting
transaction- information on funds, resources,
project year, goal, function, and object.
Includes information on
Attendance/Enrollment, Education Agency,
Fiscal, Transportation
School, District
CDS, Charter
schools
ConAPPS
Consolidated applications. Includes
information on categorical programs
e.g., Title I, II, V etc.
CDS code
Student demographic, Title I, III, V, Part A,
Immigrant, LEP, funding model, charter
status, Gradespan, participants
School, District,
County
CALPADS
(planned)
Early Childcare
Systems
CD-801A,B, CDMIS, Special Education SSID
Desired Result System SEDRS, and CD
9600
Child demographics, IEP flag, family
identification/case number, household name,
type of program, DRDP Desired Result
Development Profile, Early Childhood
Environment Rating Scale ECERS
Student
None
CASAS
TOPSPro
Comprehensive Adult Student
Assessment Systems. System for
tracking Students in Adult Education
Programs
Student demographics, Agency, instruction
Student
level and program, assessment scores, date of
entry, reason for exit, class number, attainable
goal within program year
Source: Respective CDE departments
ADA ID, SSID,
CASAS no
Assessments
2
High level system profiles of non-CDE collections
DRAFT
Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
System name
Description
Key identifier
Data categories
Granularity
Data sharing
CPEC
California Post Secondary Education
Commission. Data system for Higher
Ed- post secondary systems
Student ID
based of SSN
Demographic, IEP, grade level, program,
Graduation rate, teacher, institution
Student
CDE, CSU,
UC,CCC,
prison, census
UC CSS
Corporate Student System provides
SSN
information on student enrollment and
performance for University of
California campuses
Student demographic, income, financial aid, Student
education history, assessment
CDE, CCC,
CALPASS
CCC COMIS
California Community Colleges
SSN, Student
Management Information System.
ID
COMIS data is used to prepare reports
for Federal and State reports including
Integrated Postsecondary Education
Data System (IPEDS) and to track
student outcomes
student demographic, income, financial aid,
education history, assessment, teacher,
institution
Student
CALPASS,
CPEC, CSU,
EDD, National
Student
Clearinghouse
CSU ERS
Enrollment Recording
System is used by Cal State to
track student retention and graduation
to support regular term reports,
IPEDS, and state budget requests
SSN
Student demographic, financial aid,
education history, assessment
Student
CPEC,
CALPASS,
CCC
CDPH
California Department of Public
Health. System use to track
CDPH ID
Case ID and demographics, clinical and
diagnostic data
Case
None
EDD
Employment Development Database
ID based of
SSN
Wages, payroll taxes, unemployment
tracking, job matching, job training
Employee
Franchise tax,
benefits system,
CCC
Source: Respective agencies, RAND
2
Thank you!
Have fun ….
HAPPY DATA!
Data
Data