The Data Warehouse

Download Report

Transcript The Data Warehouse

Lecture 4

Themes in this session • How OLAP really works • Enterprise data models for data warehousing • Metadata

How OLAP really works

see demonstration….

Enterprise data models for data warehousing

From 2-layered to 3-layer information architecture

• Before (2 layered architecture): • Layer 1 - real-time data - run the business • Layer 2 - derived data - manage the business • Suggested (3 layered architecture): • Layer 1 - real time data • Layer 2 - reconciled data • Layer 3 - derived data

The failings of traditional data modelling in enterprises

Used only in business applications with well defined boundaries and roles. This means that: – entities are generalised only to the extent needed within the boundaries – provides no support for integrating applications – provides no support for combining data from different sources

Enterprise data modelling

Modelling at the enterprise level and not the operational level aids the understanding needed for the reconciliation of operational data demands by: – showing how different data sets interrelate – showing the role of the different data sets in the business This is achieved by: – treating and modelling data entities at their most general level – making all commonalties in business data visible and usable

Aims for enterprise data modelling

• Providing a

single systems development base

promoting the integration of existing applications • Supporting the

sharing of data

and between different areas of the business • Enabling

effective management of data resources

by providing a single set of consistent data divisions • Supporting the establishment and maintenance of a company-wide

comprehensive management information system

• Providing a

structured methodology

business users to be involved in the implementation of business information strategies which allows

The structure of the enterprise data model

The enterprise data model has 5 distinct layers: – scope and architecture layer – business data classifications – generic entity relationship model – logical applications view – physical data design

The challenges of enterprise data modelling

• Very wide scope, is always very complex • Demands input from all areas of the business • Is very time consuming and constitutes a moving target • Requires good management • Must be planned so as to deliver value and gain momentum • Requires access to skilled and knowledgeable business users, this is always in direst competition to operational priorities • Difficult to apply in application development situations

A strategy for enterprise data modelling

• Tackle problems by breaking them down and dealing with them piece by piece • Use a layered structure with vertical subdivisions into business subject areas • Employ a stages definition approach • Employ a staged implementation approach

General steps in the creation of the enterprise data model

• Obtain a unified view of the data needed to run the company • should be sufficiently generic for all the sections of the business to accept • should be sufficiently detailed to allow reasonably independent subsets to be identified as the basis for further work • All key entities that are commonly used across the enterprise must be identified with certainty • Local key attributes must be given initial definitions • Important relationships between key entities must be identified Note: for many industries pre-constructed industry models can be bought and customised

Modelling the Business Data Warehouse (reconciliation layer)

• Determine the vertical segment (depth and breadth) of the subject area which the BDW data model is to cover • this should include a set of strongly interrelated entities • Choose which section of the Generic ERM to model on the basis of the business units’ needs • Develop a logical application view in which the entities in the GERM are customised in order fit the purposes of the application • Generate an optimal physical data design for the logical application view.

• Note: the physical data design will nearly always differ form the actual design of the BDW because the legacy systems compromise the physical design of the BDW

Modelling the Business Information Warehouse (derivation layer)

• Identify end-user groups and ascertain their intentions and requirements for the use of information • Select the relevant subset of the GERM from within the bounds of the BDW segment • Identify any isolated data needs that fall outside the bounds of the BDW • Create the Logical Application view for the BIW • Create the Physical Data Design for the BIW • Map the transformations between the reconciled and derived data models stipulated how the data will be moved from the physical sources in the BDW to the BIW (Metadata) Note: Sometimes it will be necessary to bypass the BDW and collect data directly from the real-time systems. This must be seen as a short term solution and eliminated ASAP.

Retrofitting the model

• Employ an approach of modifying operational applications at the same time as the BDW is evolved.

• Try and steer both the operational applications and the BDW towards the optimal form dictated by the model.

• The model serves as a goal towards which both the operational applications and BDW are fitted

The staged implementation of a data warehouse

• Stage 1 - define the high level enterprise model – (1-3 months) • Stage 2 - model the subset intended for the BDW – (6-9 months) • Stage 3 - model the first BIW – (1-2 months) • Stage 4 release initial versions and continue with BDW/BIW evolution Note: the whole process of modelling, in parallel with an implementation program may take a year to get off the ground!

Using pilot applications for Data Warehousing

In order to achieve quicker ROI a scaled down version of the BDW can be released. The size of the BDW can coincide exactly with that of the first BIW.

• In order to prevent a fragmented evolution of the BDW several guidelines must be followed: – Use a pilot only once in any given business area – The structure of the BDW/BIW should not be too highly optimised for performance – A plan should for the migration to a full-blown three layer architecture should be be delivered and approved before the delivery of the pilot

Approaches to building the data warehouses and data marts with enterprise models

• The top-down approach • first develop an enterprise data warehouse from the enterprise data model. Follow this up with data marts until a multi-tier architecture is obtained • The bottom-up approach • random growth of data marts, hopefully the evolution of a enterprise data warehouse on top of these after a while • The hybrid approach • develop the enterprise model first. When the model is in place begin building the enterprise warehouse and data marts in parallel

IFW - an example of an enterprise data model

Structure Strategy Skills Data Function Work flow Solution Applica tion Network System Organization View Business View Technical View

IFW - the business view

Level Data Functi on Workfl ow A B C

Data Concepts Workflow Concepts Function Hierarchies Classification Hierarchies Workflow Structures Business Object ER Models Business Object Mo State Transmission dels

C´ D

The nine key data concepts on the FSDM A level

• Involved Party • Location • Arrangement • Business Direction Item • Classification • Condition • Product • Event • Resource Item ex. customer, supplier ex. address e x. contract ex. goal, method ex. account, ex. price, interest ex. article, service ex. payment ex. price list, document

Classification hierarchies on the B-level of the FSDM

Value SCHEME Location LOCATION TYPE Value Geographic Area SCHEME GEOGRAPHIC AREA TYPE Value Country Postcode Area Time Zone

: :

Address ADDRESS TYPE Legal Address Internal Address Postal Address

: :

Geographic Area Name Type

Generic ERD on the C-Level

Location Geographic Area Name Geographic Area ID includes is_included_in includes is_included_in Geographic Area is_classified_by classifies Geographic Area Type is_included_in includes Geographic Area Time Zone Difference Geographic Area ID Type Time Zone

Metadata

What is metadata?

Data about data Main functions are to give...

• data definitions • the origin of data • the structure of data • rules for the selection and transfer of data • qualitative and quantitative data about data

Why is metadata needed?

• Increasing functionality of data warehouses • Increasing size and complexity of data warehouses • Increasing number of varied user groups • Evolution of data warehouses and historical data analysis requirements Users and developers need a better, more standardised, way to document and communicate their knowledge of the warehouse, its rules and data sources

The metadata repository

A specialised database designed to maintain metadata together with tools and interfaces which allow the company to collect and distribute the data • Is a combination of shared and local data about data • Is the vital component in a distributed metadata architecture – supports the distribution of sharable components – supports the autonomy and control of unshared local components

The life cycle of metadata

• Collection • identify and capture metadata in a central repository • Maintenance • establish processes to synchronise metadata automatically with the changing data structure • Deployment • provide metadata to users in the right form and with the right tools

Focus areas for the collection of metadata

• Warehouse data sources • physical data structures • business definitions of all data elements • platforms,data formats, update frequencies • Data models • the logical and physical enterprise data model • Warehouse data models • the logical and physical schemas for the data warehouse • Warehouse mappings • between warehouse and operational data structures • Warehouse usage information • who’s using the warehouse and how they’re using it • try and relate business problems and specific queries

Target groups for metadata deployment

• Warehouse developers • physical structure models for data sources • target physical data structures as they evolve • evolving mapping schemas • Warehouse maintenance staff • monitor changes in the provision and utilisation environment and manage the effects of these changes on the DW • responsible for updating the metadata when the Dw architecture is affected • ensure the capability of tracing changes • End-users • aid exploration and understandability of information • validate information on the basis of source and quality • standard queries for specific business problems

Integration with data access tools

4 levels of possible integration are suggested • Side-by-side access • Use a query tool to provide context sensitive help texts • Query tools specifically suited for accessing metadata itself • Full interconnectivity between metadata tool and query tool • access to business query tools through metadata • transparent move from business query tools to metadata

Versioning of metadata and metadata maintenance

• DW always contains a long history of data in order to support analysis • The time-specific context of the information has to be saved in order to explain the content • Changes in the DW demand a new version of the metadata • Parallel version management of the DW and metadata