Transcript Document

USM Oracle/PeopleSoft Mid Atlantic
Regional Conference
Gettysburg College – June 2, 2009
Understanding Dimensional Data Models
Mark Max
[email protected]
www.istrategysolutions.com
About iStrategy Solutions
• Founded in 1999, iStrategy is 100% focused on analytic
applications for Higher Education
• In 2003, we released the PeopleSoft edition of HigherEd
AnalyticsTM, the first packaged student “analytic
application” for colleges and universities
• Products: Student, Finance, HR, Advancement, Financial
Aid, NCATE
• ERP Integration: PeopleSoft, Datatel, Banner
• Diverse Customer Base: Large Publics, Privates, For
Profit, Community Colleges, State Systems (USM, USG,
CSU)
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Agenda
1. The Reporting Paradox (i.e., why we do what
we do)
2. Introduction to Dimensional Data Models
3. Relational vs. OLAP databases
4. Examples – Derived Student Term Information
5. Q&A
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
1
Why is it so difficult getting information
from Higher Education ERP Systems?
SELECT b.EMPLID, b.ADMIT_TERM
b.PROG_ACTION, b.PROG_REASON,
b.ACAD_LOAD_APPR
FROM PS_ADM_APPL_DATA a inner join
PS_ADM_APPL_PROG b
on a. EMPLID = b.EMPLID and
a. ACAD_CAREER = b.ACAD_CAREER and
a. ADMIT_TERM = b.ADMIT_TERM
WHERE b.ACTION_DT = (SELECT
max(ACTION_DT) as ACTION_DT FROM
PS_ADM_APPL_PROG
WHERE EMPLID = b.EMPLID and
ACAD_CAREER = b.ACAD_CAREER and
ADMIT_TERM = b.ADMIT_TERM) and
b.EFFSEQ = (SELECT MAX(EFFSEQ) FROM
PS_ADM_APPL_PROG
WHERE EMPLID = b.EMPLID and
ACAD_CAREER = b.ACAD_CAREER and
ADMIT_TERM = b.ADMIT_TERM and
ACTION_DT = b.ACTION_DT)
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Academic Analytics & Derived Information
The vast majority of management information is
derived by applying business rules or algorithms
(i.e., turning data into information!
Dimensional Attributes
• Start Term
• First Term Indicator
• Degree Seeking Indicator
• Enrolled Indicator
• Prior Major
• GPA Band
• SAT Band
• Student Term Status
Measures
• Admit Count
• Enrolled Applicant Count
• % Admits Enrolled
• Enrolled Student Count
• Major Count
• Class Utilization %
• Retention %
• Graduation Rates
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
The Impact of Distributed Business Rules
Reports/Queries
The same logic must be continually repeated
over and over in various similar reports
resulting in inconsistencies
and data integrity
Centralized
issues.
Business Logic
ETL
ERP
Database
ERP
Database
Distributed
Business Logic
 Information derived
in each report
•
•
•
•
Improved
Improved
Improved
Improved
Dimensional
Data
Warehouse
Data Integrity
Productivity
Access to Information
Decision Making
 Information (measures and dimensions)
derived and stored in DW
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Definition of Paradox
A paradox is a statement or group of
statements that leads to a
contradiction or a situation which
defies intuition.
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
The “Reporting Paradox”
• Truth #1 – There is no way to provide analytics or
achieve self service reporting directly from ERP
databases.
• Truth #2 – Analytics and self service reporting requires
transforming “data into information”, and storing this
information in a data model optimized for reporting (e.g.,
dimensional data model).
• Truth #3 – There’s no way to predict every possible
information need to incorporate into the optimized data
model (…so how do you design the data model).
What can we do?
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Applying Pareto’s Law to Reporting
Pareto’s Law (the 80/20 rule) - 80
% of the effects are achieved with
20 % of the means. Subsequently,
it takes 80 % of the means to
achieve the remaining 20 % of the
objectives.
In other words, the cost of the last
20 % is four times greater than the
first 80 %.
HigherEd Analytics applies the Pareto principle to reporting…
 80+% of the reporting requirements can be met very quickly
and easily;
 the 20% (or less) can be added over time, or on an exception
basis directly from ERP data using Excel, hard-coded reports,
SQL queries, etc.
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
2
Dimensional Data Model
• Data model optimized for reporting, not
transaction processing.
– Two types of primary objects: measures and
dimensions
– Derived information
– Optimized key relationships to address nulls and many
to many join scenarios
• Dimensional modeling is not a physical
database implementation – it’s a logical
framework and a methodology
• Database implementation options:
– Relational
– OLAP (Cubes)
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Relational Dimensional Data Model
Star Schema Design
Dimension
Dimension
Dimension
Dimension
Fact Table
Dimension
Dimension
Measures
Dimension
Dimension
Keys
Secondary
Information
Confidential and Proprietary
Dimension
All Rights Reserved © 2003-2009, iStrategy, LLC
Star Schema — Simple Example
DimStudent
Student Key
EMPLID
Name
Gender
Birthdate
Citizenship
Ethnicity
….etc
12
FactRegistration
Student Key
Term Key
Class Key
Credits taken
Credits Earned
Grade (numerical)
Enrolled count (0/1)
….etc
Confidential and Proprietary
DimTerm
Term Key
STRM
Description
Acad Year
Start Date
….etc
DimClass
Class Key
Subject
Catalog Number
Class Section
Description
….etc
All Rights Reserved © 2003-2009, iStrategy, LLC
How is Reporting Optimized?
•
Surrogate keys
– Unique primary key in dimension tables
– Dummy integer value
•
Fact Tables foreign (dimension) keys
– Key constraints w/No null foreign keys
– 100% one to many relationships (Dim to Fact)
i.e., only one dimension row for a given Fact
foreign key [why is the important!]
– Highly efficient storage and performance
•
ERP Nulls and Invalid Keys
– Dimension “placeholder” row for null (i.e.,
Unknown) and invalid keys
– Transform fact row to valid placeholder
dimension key
•
Derived Information
– Measures (e.g., Withdraw Count)
– Dimensions (e.g., First Term Indicator)
•
No “Snowflake” dimensions
– Dimension attributes are de-normalized
– No 2 tier joins
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Student Model
Dimensions
Admissions:
Application Method
Applicant Zip Code
Applicant Fin Aid Interest
Applicant Housing Interest
Applicant High School
Recruiting Category
Applicant Status
Admit Category
Applicant SAT Band
Applicant HS GPA Band
Applicant HS Rank Band
Applicant Age Band
Faculty Attributes:
Faculty
Faculty Rank
Highest Education Level
Tenured Status
Graduates:
Graduate Apply Status
Degree
Years to Graduate Band
Fact Areas
Admissions
Student
Term
Student
Plan
Class
Schedule
Registration
Faculty
Term
Degree
Awards
Student
Financials
Confidential and Proprietary
Dimensions
Institutional:
Term
Career/Plan
Academic Org
Student Term:
Academic Level
Academic Standing
Cohort/Cohort Type
Student Term Status
FT/PT; Credit Hour Band
Class/Grade:
Subject/Class
Course Level
Class Type
Grade
GPA Band
Student Attributes:
Student
Student Citizenship
Student Ethnicity
Student Gender
Student Geography
Student Age Band
All Rights Reserved © 2003-2009, iStrategy, LLC
Financial Model
Dimensions
Account
Business Unit
Calendar Budget
Calendar Fiscal
Class
Department
Fund
Program
Project
Dimensions
Fact Areas
GL Actual
Budget
PO’s /
Encumbrances
Requisitions /
Pre-Encumbrances
Pre-Encumbrance
Liquidation
Budget Check Status
Budget Reference
Budget Transaction
Type
Closed Status
Status Date
Invoice Date
Days Open Band
PO Status
Requisition Status
Vendor
Encumbrance
Liquidation
AP Vouchers
Accounts
Receivable
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Human Resource Model
Dimensions
Fact Areas
Account
Age Band
Calendar
Days Open Band
Department
Employee
Employment Category
Employment Status
End of Period Indicator
FLSA Status
FTE Band
Funding Department
Hire Type
Initial Hire Date
Job Code
Job Family
Job Hire Date
Job Analytics:
-
Employee Profile
Compensation
Mobility
Empl Value
Effectiveness
Development
Financial FTE
Requisitions
Confidential and Proprietary
Dimensions
Location
Minority Indicator
Open Requisition
Date
Performance Rating
Requisition Status
Salary Range
Separation Reason
Start of Period
Indicator
Supervisor
Termination Date
Years in Position
Band
Years of Service Band
Years to Retirement
Eligibility Band
All Rights Reserved © 2003-2009, iStrategy, LLC
3
Tabular vs. Crosstab Orientation
• Excel Example
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Why OLAP Technology?
• Intuitive
– Multi-dimensional presentation is the natural
orientation for business information and
analysis
– Intuitive and easy to use
– Hides user from underlying relational data
model
• Fast
– OLAP Technology is very fast – 98% of reports
in less than 1-3 seconds
– Speed advantage substantial in highly
aggregated reports such as multi-year trends
• Powerful
ECAR Study on Academic
Analytics substantiates greater
analytic effectiveness and
results using OLAP technology
– Calculated Measures - enables calculations that
are difficult using relational technology (e.g.,
prior period % change, retention %,
graduation rate)
– Rankings and Exceptions (e.g., Top 10)
• Produces Consistent Information
– Logic is in the cube, not the reports
– Calculated metrics repository
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Database Implementation: Relational or OLAP?
• At iStrategy, we use both!
• Relational for the complex transformations
(turning data into information).
• OLAP for the speed, power and ease of use.
– Quick OLAP Demo
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
HigherEd Analytics Overview
Analytic Application
Personalized
Dashboard
Guided
Analysis
Interactive
Reports
• Intuitive self service access to reliable
information (“single version of the
truth”)
• Single DW solution for casual and power
users
OLAP Cubes
Dimensional
Data Warehouse
ETL Modules
• Rich library of higher education
performance metrics
• Powerful analytic capability to support
decision making and management
information needs
• Personalized Dashboard for executive
users
• “Out-of-the-box integration to leading
ERP vendors
ERP
Other
Data
Sources
• An open technology platform that is
easily extensible
• Rapid deployment capability
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Key Terminology
• Fact Table – a central table in a data warehouse schema that
contains numerical measures and keys relating facts to
dimension tables. A fact table contains data that describes a
specific event and has a defined grain, or level of detail
• Cube – multi-dimensional data structure consisting of measures
and dimensions
• Measures - the facts we are measuring within a subject area
(e.g., Admit Count, Avg. GPA, Retention %)
– Stored Measures - based on stored data in the relational fact tables
(e.g., Admit Count, Registration Count)
– Calculated measures – calculated on the fly based on specified
algorithm ranging from simple to highly complex (e.g., Avg GPA,
Course Utilization, Retention %)
• Dimensions – defines how the measures are segmented
– e.g., Term, Admit Type, Course, Career, etc.
– Dimension Members – the list of values
– Hierarchies – parent-child relationships defining how structures roll
up (e.g., Year – Quarter – Month - Day)
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Derived Information Examples
• Admissions Funnel (grain: one record per application)
– Applications, Admits, Matriculated, Enrolled, Yield %’s
• Student Term (grain: one record per student, term,
career)
–
–
–
–
–
First Term in Career (First Term vs. Returning)
Returned Next Term Indicator
Start Term for Career
Admit Type
Enrolled Count
• Human Resources (grain: one record per employee
job/position per month)
– Start of Month/ End of Month Count
– Activity (New Hires, Transfers, Terminations, Promotions,
etc.)
– Internal vs. External Hires
– Voluntary vs. Involuntary Terminations
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
“Factless” Fact Tables (Relationship Tables)
•
•
•
•
Student Groups
Service Indicators
Class Attributes
Recruiting Categories
Student
Groups
•Student
•Term
•Group
Student
Term
•Student
•Term
•Service Ind
Confidential and Proprietary
Service
Indicators
All Rights Reserved © 2003-2009, iStrategy, LLC
Types of Reporting
Operational Reporting
• Tends to be tabular
transactional reporting (e.g.,
student level)
• Some “Real time” reporting
requirements (e.g., student
registration inquiry, billing
report, transcript)
• Near time data may be
sufficient for many
requirements
• Static reporting (i.e., not
interactive)
• Available operational
reporting tools:
Analytic / Management Reporting
• Used for performance
measurement, analysis and
decision making
• Generally multi-dimensional
presentation (instead of
tabular)
• Tends to be aggregate
information, often time series
data
• Advanced filtering: Exceptions
and Rankings
• Interactive reporting
requirements (as compared to
static transactional reports)
– ERP On-line Inquiry
– SQL Queries
– Third party relational
reporting tools
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Operational vs. Analytics:
Matching Reporting Objectives to the Optimum Source
Operational
(Real-time)
Operational
(Near-time)
Analytics/
Census
Trans
Source
Dim
DW
hours to days
5 - 30 minutes
OLAP
1- 5 minutes
Effort
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Flexibility vs. Simplicity
• Dimensional modeling is about creating
leverage. Like many things, the challenge is
managing the trade-offs: flexibility vs. simplicity
• Key Factors:
– How many measures and dimensions need to be
included in the model?
• Fine line between robust and overly complicated
• Too many choices will be too confusing for the average
user
– What complex questions need to be supported by the
model on a recurring basis
– How complex is the transformation logic to derive any
given information
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC
Questions?
Confidential and Proprietary
All Rights Reserved © 2003-2009, iStrategy, LLC