TEACHING THE DATA WAREHOUSE COURSE

Download Report

Transcript TEACHING THE DATA WAREHOUSE COURSE

TEACHING THE DATA WAREHOUSE COURSE

Paul Gray ISECON 2001 1

Outline

   Overview of what is Data Warehousing The 5 and 10 week courses The Indiana University of Pennsylvania course 2

ORIGINS

  Data warehouses are the results of two software solutions needing and finding one another:   Data base firms developed data warehouses and were looking for applications EIS and DSS software developers and vendors needed to deal with ever-increasing data bases About 10 years ago, the two groups started interacting with the results described here. 3

ORIGINS

  Database developers long understood that their software was required for both transactional and analytic processing However, their principal developments were directed to ever-larger transactional data bases. This process occurred even through operational and analytic data are separate with different requirements and different user communities.

4

ORIGINS

  Once these differences were understood, new data bases were created specifically for analysis use.

Today, data warehouses have 3 major applications   On-line analytic processing for business intelligence Data Mining  Customer Relationship Management 5

WHAT IS DATA WAREHOUSING

    A data warehouse is typically a dedicated data base system for decision making that is separate from the production data base(s) used operationally. It differs from production system in that: it covers a much longer time horizon than transaction systems it includes multiple data bases that have been processed so that the warehouse’s data are defined uniformly (i.e., ‘clean’ data) it is optimized for answering complex queries from managers and analysts.

6

WHAT IS DATA WAREHOUSING?

  In the last 5 years, data warehousing has become a major industry within computing which has brought together the ideas of data bases and decision support. It has also been the foundation for efforts in data mining and in CRM Data mining refers to finding answers about an organization from the information in the data warehouse that the executive or the analyst had not thought to ask. Data mining is made possible by the very presence of large databases in the data warehouse. It provides techniques that allow managers to obtain managerial information from their legacy systems. Its objective is to identify valid, novel, potentially useful, and understandable patterns in data.

7

WHAT IS DATA WAREHOUSING?

 The objective of a data warehouse is to create a “single truth”  Data warehousing is a major new application area. It rates extremely high salaries (up to $100,000 for specialists, $300,000 for consultants). 8

DEFINITION

 A data warehouse is a:    Subject oriented Integrated Time-variant  Non-volatile Collection of data in support of management decision processes 9

NOTE:

  Data warehouse is physically separated from operational systems and operational data bases Data warehouses hold both aggregated and detailed data for management separate from the databases used for On-Line Transaction Processing (OLTP) 10

CHARACTERISTICS

Subject oriented Data are organized by how users refer to it Integrated Non-volatile Inconsistencies are removed in both nomenclature and conflicting information; (i.e. data are ‘clean’) Read-only data. Data do not change over time. Time series Data are time series, not current status 11

CHARACTERISTICS

Summarized Larger Non normalized Operational data are mapped into decision usable form Time series implies much more data is retained Data can be redundant Metadata Input =Data about data Unintegrated, operational en vironment (‘legacy systems’) 12

SUBJECT ORIENTATION

  Data is organized around major subjects of the enterprise Example: OPERATIONAL •Loans DATA WAREHOUSE Customer •Savings •Bank card Vendor Product •Trust Activity An application A subject orientation orientation 13

USING THE WAREHOUSE

   The higher the level of summarization, the more the data is used The more summarized the data, the quicker it is to retrieve However, the higher the level of summarization, the lower the level of detail 14

DATA MARTS

    Data Mart: A scaled-down version of the data warehouse A data mart is a small warehouse designed for the SBU or department level.

It is often a way to gain entry and provide an opportunity to learn Major problem: if they differ from department to department, they can be difficult to integrate enterprise-wide 15

COST

    Data warehouses are not cheap Median cost to create (does not include operating cost) = $2.2M

Multimillion dollar costs are common Their design and implementation is still an art and they require considerable time to create 16

SIZE

  Being designed for the enterprise so that everyone has a common data set, they are large and increase in size with time. Typical storage sizes run from 50 Gigabytes to several Terabytes 17

SIZE OF INDUSTRY

   Data warehouses are a major industry within information systems. 6B$/year Estimates vary but it is clear that many more than 90% of Fortune 1000 have data warehouse projects Major players include:   Oracle IBM+Informix Sybase NCR + BI companies (Brio, Cognos, Pilot,….) 18

MARKET COMPONENTS

      High end business intelligence (OLAP) Low end query tools Data cleansing Data marts Data mining Customer relationship management (CRM) 19

APPLICATION – OLAP/BI

   OLAP = On Line analytic processing Basic idea of OLAP: managers should be able to manipulate enterprise data models across many dimensions to understand changes that are occurring Vendors claim they are OLAP compliant even if they are not 20

APPLICATION - DATA MINING

  Also known as Knowledge Data Discovery (KDD) Mining terminology refers to finding answers about a business from the data warehouse that the executive or analyst had not thought to ask 21

APPLICATION - DATA MINING

   KDD applies techniques mostly from artificial intelligence and statistics to discover new information.

It is designed to find information that queries and reports don’t reveal effectively KDD uses AI and statistics to find pattern in data and to infer rules.

22

APPLICATION - DATA MINING

 Some successes:      People who buy scuba gear take Australian vacations Fraud detection, consumer loan analysis Optimizing production lines IBM’s SCOUT (apocryphal?) Men who buy diapers buy beer 23

APPLICATION - DATA MINING

Associations Sequences Classifications Clusters Forecasting Things done together (buy groceries) Events over time (house.refrigerator) Pattern recognition (rules) Define new groups Predictions from time series 24

CUSTOMER RELATIONSHIP MANAGEMENT (CRM)

    Successor to data base marketing Implies marketing to customers on a 1:1 basis Requires data granularity at the level of the individual customer large amounts of data Data warehouse is only a part of the CRM concept. 25

History of Course I

    First given as PhD Seminar on Management of Information Systems 1996 Overview lecture by instructor Students present topics each week – typically 2 or 3 Great source of material!

26

History of Course II

   Attend Data Warehouse Institute conference in San Diego. Agree to write book with Watson. January 1997 Complete book in late 1997. Desk-top published. Used sabbatical at UCI to do the writing. Move course to regular MS course in Spring 1997.

27

Two Versions:

    14 week for Claremont Graduate University 5 Week for UC Irvine Taught 1/year for last 5 years WHY DO STUDENTS TAKE COURSE?

 High salaries (100K specialist, $300K for consultant) 28

Both Versions

    Text: Decision Support in the Data Warehouse by P. Gray and H.J. Watson Prentice Hall 1998 PLUS readings PLUS hands-on exercises PLUS term paper 29

5 WEEK VERSION

3 4 5 2 1 Wk Part 1 Part 2 Overview of DW OLAP, applications Demo Startracker Strategic use, software Framework Data sources, Planning and cleansing,metadata operating the DW Data marts,ODS Industry, economics Mining, BI CRM, Developments 30

Date 9-9 9-16 9-23 9-30 10-7 10-14 10-21 10-28 11-7 11-14 14-WEEK COURSE Introduction to DW Architecture, Data Input Metadata, Multiple Dimensions Interface, Strategic Uses Design Methods; Data Marts Progress Report; ODS; Maintaining the DW Midterm Economics of DW; Personnel Querying, Data Mining Progress Report; CRM 11-21 11-28 12-2 12-9 Business Intelligence ; Future Developments Student Reports Student Reports Final Exam

31

Using Software—Star Tracker

   Simple DW The Data Warehouse Toolkit : Practical Techniques for Building Dimensional Data Warehouses by Ralph Kimball (Wiley 1996) Commercial version “Synchrony” from If… no longer available 32

Using Software – Commercial

   Red Brick (now part of Informix which is now owned by IBM) Gift from Red Brick Mostly used with student projects 33

Data Sources

   Journal of Data Warehousing DM Review   On-line (dmreview.com) Hard copy Lots and lots on Internet e.g.,    www.Dw-institute.com

Pwp.starnetinc.com/larryg www.datawarehouse.org

34

Assignments

   Find 10 articles on Internet  teaches search techniques, nomenclature, rapid course immersion Exercise with software The DW industry 35

Term Projects for DW

 Next chart shows 22 of the 30 suggested topics. Last topic allows student to choose topic subject to instructor approval. 36

DW Architecture Data sources, loading, cleaning, summarizing, granularity Metadata in a DW Data Modeling and the DW Multiple Dimensions in DWs via ROLAP and MOLAP Indexing in DWs DW Interfaces –conventional, window, browser Design principles for DWs Strategic use of the DW Justification of the DW including C/B analysis DW Security Data Marts Operational Data Stores Maintaining the DW Economics of DWs Querying and reporting in a DW OLAP and the DW Decision Support Systems and the DW Database marketing and the DW Data Mining and the DW Personnel considerations for building the DW and for maintaining the DW Organizing the DW Project (Project management etc.) 37

Elizabeth Pierce’s Course at Indiana University of Penn.

   Paper published in Communications of AIS (CAIS) Sept. 1999 (Vol. 2 Article 16)

Developing and Delivering a Data Warehousing and Mining Course

“Introduces students to the strategies, technologies, and techniques” 38

Elizabeth Pierce’s Course at Indiana University of Penn.

 “Students learn what is involved in planning, designing, building, using, and managing a data warehouse. Students also learn about how a data warehouse must fit into an over all corporate data architecture that may include legacy systems, operational data stores, enterprise data warehouses, and data marts. In addition, students are exposed to the different data mining techniques used by organizations to derive information from the data warehouse for strategic and long-term business decision making.” (abstract) 39

U. Of Indiana Course

  14 weeks –7 weeks on DW Challenges:  Evolving nature of the subject   Hands on experience for students Textbook selection—mostly tradebooks 40

Useful Tables in Pierce

    List of Internet sources Available software demos Some on-line data warehouses with public access Schools that offered data mining courses in 1999 41

Conclusions

    Students come because it is a job source A good 2 nd course for data base (can be taken w/o database course) Allows covering the new topics of the 90’s and 00’s—OLAP,CRM,Mining, in more depth than Intro to IS course.

It’s a fun topic and you can even do research on it. 42