Transcript Document
USM Oracle/PeopleSoft Mid Atlantic Regional Conference Gettysburg College – June 2, 2009 Understanding Dimensional Data Models Mark Max [email protected] www.istrategysolutions.com About iStrategy Solutions • Founded in 1999, iStrategy is 100% focused on analytic applications for Higher Education • In 2003, we released the PeopleSoft edition of HigherEd AnalyticsTM, the first packaged student “analytic application” for colleges and universities • Products: Student, Finance, HR, Advancement, Financial Aid, NCATE • ERP Integration: PeopleSoft, Datatel, Banner • Diverse Customer Base: Large Publics, Privates, For Profit, Community Colleges, State Systems (USM, USG, CSU) Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Agenda 1. The Reporting Paradox (i.e., why we do what we do) 2. Introduction to Dimensional Data Models 3. Relational vs. OLAP databases 4. Examples – Derived Student Term Information 5. Q&A Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC 1 Why is it so difficult getting information from Higher Education ERP Systems? SELECT b.EMPLID, b.ADMIT_TERM b.PROG_ACTION, b.PROG_REASON, b.ACAD_LOAD_APPR FROM PS_ADM_APPL_DATA a inner join PS_ADM_APPL_PROG b on a. EMPLID = b.EMPLID and a. ACAD_CAREER = b.ACAD_CAREER and a. ADMIT_TERM = b.ADMIT_TERM WHERE b.ACTION_DT = (SELECT max(ACTION_DT) as ACTION_DT FROM PS_ADM_APPL_PROG WHERE EMPLID = b.EMPLID and ACAD_CAREER = b.ACAD_CAREER and ADMIT_TERM = b.ADMIT_TERM) and b.EFFSEQ = (SELECT MAX(EFFSEQ) FROM PS_ADM_APPL_PROG WHERE EMPLID = b.EMPLID and ACAD_CAREER = b.ACAD_CAREER and ADMIT_TERM = b.ADMIT_TERM and ACTION_DT = b.ACTION_DT) Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Academic Analytics & Derived Information The vast majority of management information is derived by applying business rules or algorithms (i.e., turning data into information! Dimensional Attributes • Start Term • First Term Indicator • Degree Seeking Indicator • Enrolled Indicator • Prior Major • GPA Band • SAT Band • Student Term Status Measures • Admit Count • Enrolled Applicant Count • % Admits Enrolled • Enrolled Student Count • Major Count • Class Utilization % • Retention % • Graduation Rates Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC The Impact of Distributed Business Rules Reports/Queries The same logic must be continually repeated over and over in various similar reports resulting in inconsistencies and data integrity Centralized issues. Business Logic ETL ERP Database ERP Database Distributed Business Logic Information derived in each report • • • • Improved Improved Improved Improved Dimensional Data Warehouse Data Integrity Productivity Access to Information Decision Making Information (measures and dimensions) derived and stored in DW Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Definition of Paradox A paradox is a statement or group of statements that leads to a contradiction or a situation which defies intuition. Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC The “Reporting Paradox” • Truth #1 – There is no way to provide analytics or achieve self service reporting directly from ERP databases. • Truth #2 – Analytics and self service reporting requires transforming “data into information”, and storing this information in a data model optimized for reporting (e.g., dimensional data model). • Truth #3 – There’s no way to predict every possible information need to incorporate into the optimized data model (…so how do you design the data model). What can we do? Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Applying Pareto’s Law to Reporting Pareto’s Law (the 80/20 rule) - 80 % of the effects are achieved with 20 % of the means. Subsequently, it takes 80 % of the means to achieve the remaining 20 % of the objectives. In other words, the cost of the last 20 % is four times greater than the first 80 %. HigherEd Analytics applies the Pareto principle to reporting… 80+% of the reporting requirements can be met very quickly and easily; the 20% (or less) can be added over time, or on an exception basis directly from ERP data using Excel, hard-coded reports, SQL queries, etc. Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC 2 Dimensional Data Model • Data model optimized for reporting, not transaction processing. – Two types of primary objects: measures and dimensions – Derived information – Optimized key relationships to address nulls and many to many join scenarios • Dimensional modeling is not a physical database implementation – it’s a logical framework and a methodology • Database implementation options: – Relational – OLAP (Cubes) Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Relational Dimensional Data Model Star Schema Design Dimension Dimension Dimension Dimension Fact Table Dimension Dimension Measures Dimension Dimension Keys Secondary Information Confidential and Proprietary Dimension All Rights Reserved © 2003-2009, iStrategy, LLC Star Schema — Simple Example DimStudent Student Key EMPLID Name Gender Birthdate Citizenship Ethnicity ….etc 12 FactRegistration Student Key Term Key Class Key Credits taken Credits Earned Grade (numerical) Enrolled count (0/1) ….etc Confidential and Proprietary DimTerm Term Key STRM Description Acad Year Start Date ….etc DimClass Class Key Subject Catalog Number Class Section Description ….etc All Rights Reserved © 2003-2009, iStrategy, LLC How is Reporting Optimized? • Surrogate keys – Unique primary key in dimension tables – Dummy integer value • Fact Tables foreign (dimension) keys – Key constraints w/No null foreign keys – 100% one to many relationships (Dim to Fact) i.e., only one dimension row for a given Fact foreign key [why is the important!] – Highly efficient storage and performance • ERP Nulls and Invalid Keys – Dimension “placeholder” row for null (i.e., Unknown) and invalid keys – Transform fact row to valid placeholder dimension key • Derived Information – Measures (e.g., Withdraw Count) – Dimensions (e.g., First Term Indicator) • No “Snowflake” dimensions – Dimension attributes are de-normalized – No 2 tier joins Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Student Model Dimensions Admissions: Application Method Applicant Zip Code Applicant Fin Aid Interest Applicant Housing Interest Applicant High School Recruiting Category Applicant Status Admit Category Applicant SAT Band Applicant HS GPA Band Applicant HS Rank Band Applicant Age Band Faculty Attributes: Faculty Faculty Rank Highest Education Level Tenured Status Graduates: Graduate Apply Status Degree Years to Graduate Band Fact Areas Admissions Student Term Student Plan Class Schedule Registration Faculty Term Degree Awards Student Financials Confidential and Proprietary Dimensions Institutional: Term Career/Plan Academic Org Student Term: Academic Level Academic Standing Cohort/Cohort Type Student Term Status FT/PT; Credit Hour Band Class/Grade: Subject/Class Course Level Class Type Grade GPA Band Student Attributes: Student Student Citizenship Student Ethnicity Student Gender Student Geography Student Age Band All Rights Reserved © 2003-2009, iStrategy, LLC Financial Model Dimensions Account Business Unit Calendar Budget Calendar Fiscal Class Department Fund Program Project Dimensions Fact Areas GL Actual Budget PO’s / Encumbrances Requisitions / Pre-Encumbrances Pre-Encumbrance Liquidation Budget Check Status Budget Reference Budget Transaction Type Closed Status Status Date Invoice Date Days Open Band PO Status Requisition Status Vendor Encumbrance Liquidation AP Vouchers Accounts Receivable Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Human Resource Model Dimensions Fact Areas Account Age Band Calendar Days Open Band Department Employee Employment Category Employment Status End of Period Indicator FLSA Status FTE Band Funding Department Hire Type Initial Hire Date Job Code Job Family Job Hire Date Job Analytics: - Employee Profile Compensation Mobility Empl Value Effectiveness Development Financial FTE Requisitions Confidential and Proprietary Dimensions Location Minority Indicator Open Requisition Date Performance Rating Requisition Status Salary Range Separation Reason Start of Period Indicator Supervisor Termination Date Years in Position Band Years of Service Band Years to Retirement Eligibility Band All Rights Reserved © 2003-2009, iStrategy, LLC 3 Tabular vs. Crosstab Orientation • Excel Example Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Why OLAP Technology? • Intuitive – Multi-dimensional presentation is the natural orientation for business information and analysis – Intuitive and easy to use – Hides user from underlying relational data model • Fast – OLAP Technology is very fast – 98% of reports in less than 1-3 seconds – Speed advantage substantial in highly aggregated reports such as multi-year trends • Powerful ECAR Study on Academic Analytics substantiates greater analytic effectiveness and results using OLAP technology – Calculated Measures - enables calculations that are difficult using relational technology (e.g., prior period % change, retention %, graduation rate) – Rankings and Exceptions (e.g., Top 10) • Produces Consistent Information – Logic is in the cube, not the reports – Calculated metrics repository Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Database Implementation: Relational or OLAP? • At iStrategy, we use both! • Relational for the complex transformations (turning data into information). • OLAP for the speed, power and ease of use. – Quick OLAP Demo Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC HigherEd Analytics Overview Analytic Application Personalized Dashboard Guided Analysis Interactive Reports • Intuitive self service access to reliable information (“single version of the truth”) • Single DW solution for casual and power users OLAP Cubes Dimensional Data Warehouse ETL Modules • Rich library of higher education performance metrics • Powerful analytic capability to support decision making and management information needs • Personalized Dashboard for executive users • “Out-of-the-box integration to leading ERP vendors ERP Other Data Sources • An open technology platform that is easily extensible • Rapid deployment capability Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Key Terminology • Fact Table – a central table in a data warehouse schema that contains numerical measures and keys relating facts to dimension tables. A fact table contains data that describes a specific event and has a defined grain, or level of detail • Cube – multi-dimensional data structure consisting of measures and dimensions • Measures - the facts we are measuring within a subject area (e.g., Admit Count, Avg. GPA, Retention %) – Stored Measures - based on stored data in the relational fact tables (e.g., Admit Count, Registration Count) – Calculated measures – calculated on the fly based on specified algorithm ranging from simple to highly complex (e.g., Avg GPA, Course Utilization, Retention %) • Dimensions – defines how the measures are segmented – e.g., Term, Admit Type, Course, Career, etc. – Dimension Members – the list of values – Hierarchies – parent-child relationships defining how structures roll up (e.g., Year – Quarter – Month - Day) Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Derived Information Examples • Admissions Funnel (grain: one record per application) – Applications, Admits, Matriculated, Enrolled, Yield %’s • Student Term (grain: one record per student, term, career) – – – – – First Term in Career (First Term vs. Returning) Returned Next Term Indicator Start Term for Career Admit Type Enrolled Count • Human Resources (grain: one record per employee job/position per month) – Start of Month/ End of Month Count – Activity (New Hires, Transfers, Terminations, Promotions, etc.) – Internal vs. External Hires – Voluntary vs. Involuntary Terminations Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC “Factless” Fact Tables (Relationship Tables) • • • • Student Groups Service Indicators Class Attributes Recruiting Categories Student Groups •Student •Term •Group Student Term •Student •Term •Service Ind Confidential and Proprietary Service Indicators All Rights Reserved © 2003-2009, iStrategy, LLC Types of Reporting Operational Reporting • Tends to be tabular transactional reporting (e.g., student level) • Some “Real time” reporting requirements (e.g., student registration inquiry, billing report, transcript) • Near time data may be sufficient for many requirements • Static reporting (i.e., not interactive) • Available operational reporting tools: Analytic / Management Reporting • Used for performance measurement, analysis and decision making • Generally multi-dimensional presentation (instead of tabular) • Tends to be aggregate information, often time series data • Advanced filtering: Exceptions and Rankings • Interactive reporting requirements (as compared to static transactional reports) – ERP On-line Inquiry – SQL Queries – Third party relational reporting tools Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Operational vs. Analytics: Matching Reporting Objectives to the Optimum Source Operational (Real-time) Operational (Near-time) Analytics/ Census Trans Source Dim DW hours to days 5 - 30 minutes OLAP 1- 5 minutes Effort Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Flexibility vs. Simplicity • Dimensional modeling is about creating leverage. Like many things, the challenge is managing the trade-offs: flexibility vs. simplicity • Key Factors: – How many measures and dimensions need to be included in the model? • Fine line between robust and overly complicated • Too many choices will be too confusing for the average user – What complex questions need to be supported by the model on a recurring basis – How complex is the transformation logic to derive any given information Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC Questions? Confidential and Proprietary All Rights Reserved © 2003-2009, iStrategy, LLC