Warehousing on the Web Webhouse


Warehousing on the Web

Webhouse

Why Utilize the Web?

What is the data Webhouse?

Managing clickstreams

WWW today

ROI

DSS

Data Webhouse

Defined by Ralph Kimball

Two distinct focuses

Bringing the web to the warehouse

Clickstream data as a source of information

Bringing existing data warehouses to web

Fully distributed environment

Required Capabilities

Capture clickstream logs and convert to tables for analysis

Merge customer demographic and account info with above

Interpret customer paths in website

Identify abandoned sessions

Use DW to drive customer responses appearing on your website

DW querying and reporting available through web browsers

Attach multimedia to DW

DW security
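The path-interpretation and abandoned-session capabilities above can be sketched in a few lines. This is only an illustration under stated assumptions, not a method from the slides: the 30-minute inactivity timeout, the visitor IDs, the page names, and the rule that a session counts as abandoned when it reaches /cart but never /checkout are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical page events: (visitor_id, timestamp, page).
EVENTS = [
    ("v1", datetime(2000, 1, 1, 9, 0), "/home"),
    ("v1", datetime(2000, 1, 1, 9, 2), "/cart"),
    ("v1", datetime(2000, 1, 1, 9, 5), "/checkout"),
    ("v2", datetime(2000, 1, 1, 9, 1), "/home"),
    ("v2", datetime(2000, 1, 1, 9, 3), "/cart"),
    ("v2", datetime(2000, 1, 1, 11, 0), "/home"),
]

TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff


def sessionize(events, timeout=TIMEOUT):
    """Group events into sessions; a new session starts after `timeout` of inactivity."""
    sessions = []
    latest = {}  # visitor_id -> index of that visitor's most recent session
    for visitor, ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        idx = latest.get(visitor)
        if idx is None or ts - sessions[idx]["end"] > timeout:
            sessions.append({"visitor": visitor, "start": ts, "end": ts, "path": [page]})
            latest[visitor] = len(sessions) - 1
        else:
            sessions[idx]["end"] = ts
            sessions[idx]["path"].append(page)
    return sessions


def is_abandoned(session):
    """Assumed rule: the visitor reached the cart but never reached checkout."""
    return "/cart" in session["path"] and "/checkout" not in session["path"]


sessions = sessionize(EVENTS)
abandoned = [s for s in sessions if is_abandoned(s)]
```

Each session keeps its ordered page path, so the same structure serves both path analysis and the abandonment check.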

Architecture – Web to Warehouse

Beyond a comprehensive real-time snapshot of the business, we also want knowledge of customer behavior

Extended design factors

• Timeliness – real-time
• Data volume – no upper limit
• Response time – less than 10 seconds

Hot Response Cache

A file server holding complex file objects

As a file server it is an I/O engine (bandwidth)

Must hold objects which will be requested

Security responsibility of requesting server

Extension of original operational data store (ODS)

Does not physically speed up the database; creates the illusion of speed by storing predictable answers
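A minimal sketch of the idea, assuming a simple in-memory key/value store: predictable answers are computed ahead of time and served without touching the database. The class name, the keys, and the stand-in query function are hypothetical; the cache performs no access checks, because security is the responsibility of the requesting server.

```python
class HotResponseCache:
    """Serves answers that were computed ahead of time; does no access checks."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def preload(self, key, compute):
        """Compute a predictable answer in advance (e.g. in a batch job) and store it."""
        self._store[key] = compute()

    def get(self, key, fallback):
        """Return the stored answer if present; otherwise take the slow path."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return fallback()


def slow_sales_report():
    # Stand-in for an expensive warehouse query (hypothetical).
    return {"region": "west", "total": 1250}


cache = HotResponseCache()
cache.preload("sales/west", slow_sales_report)
answer = cache.get("sales/west", slow_sales_report)
```

The database is no faster than before; only requests that were anticipated and preloaded come back quickly.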

Who are our users?

Traditional

Power users

need database connectivity

Analysts

want to manipulate existing data

Report viewers

view standardized reports

Web

• Our customers
• Our business partners
• Our employees

Clickstreams

Clickstream is not just another data source

Distributed nature leads to multiple data sources which require synchronization

• Multiple parties
• More than a dozen log file formats for capturing clickstream data

Search specification

Basic form of clickstream data is stateless

Log shows isolated page retrieval event

Clickstream data is anonymous
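To make the stateless and anonymous points concrete, here is a sketch that parses one line in the NCSA Common Log Format, one of the many formats mentioned above. The sample line and field names are illustrative assumptions. Note what the parsed record lacks: there is no session identifier, and the user field is just "-".

```python
import re
from datetime import datetime

# host ident authuser [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    row = m.groupdict()
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    row["status"] = int(row["status"])
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


row = parse_line(
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
)
```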

Today's Promotions

Clickthroughs and referrals as a revenue source

Clickstreams

Clickstream post-processor – receives raw log data from the web server and normalizes it into a format that can be combined with application-derived data for insertion into the DW
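A rough sketch of such a post-processor, assuming the raw hits have already been parsed into dictionaries. The field names, the cookie-to-customer mapping (standing in for application-derived data), and the "UNKNOWN" placeholder key are all hypothetical.

```python
from datetime import datetime


def post_process(raw_hits, cookie_to_customer):
    """Normalize raw hits and attach an application-derived customer key where known."""
    rows = []
    for hit in raw_hits:
        rows.append({
            "event_time": hit["time"].isoformat(),
            "page": hit["path"].split("?", 1)[0],  # drop the query string
            "status": int(hit["status"]),
            # Hits without a recognized cookie stay anonymous.
            "customer_key": cookie_to_customer.get(hit.get("cookie"), "UNKNOWN"),
        })
    return rows


raw_hits = [
    {"time": datetime(2000, 1, 1, 9, 0), "path": "/cart?item=42", "status": "200", "cookie": "abc"},
    {"time": datetime(2000, 1, 1, 9, 1), "path": "/home", "status": "200"},
]
cookie_to_customer = {"abc": "C-1001"}
rows = post_process(raw_hits, cookie_to_customer)
```

Every output row has the same columns regardless of the source log format, which is what makes it loadable into a single warehouse table.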


Why Bring DW to Web?

Primary function of the DW is to publish information – the web is a good partner

Need a distributed DW – the web provides universal connectivity

Universal front-end – web browser

Web Pushes Data Warehouse

User interface effectiveness measurable

Queries and updates mixed

Speed expected – 10 second rule

Global

24 x 7 expected

International characters, dates, addresses

Expanded multimedia

• Animation, zoomable images, maps, video clips
• Need material in digital form
• Enterprise information portal will require items to be searchable

Web Pushes Data Warehouse

Mass customization

Dynamically created web pages – XML

Fully distributed

Linking together all the data marts

Security and Privacy

• Publish only to those who need to know
• User profiles and access profiles defined in one place

Full-time expert security person

Second Generation User Interface Guidelines

Near-instantaneous performance

Website Design

Design for lowest common denominator

• Measure page performance on a continuous basis
• Paint navigation buttons immediately
• Disclose content progressively
• Implement page caching
• Cache data, reports
• Improve web server bandwidth
• Improve server throughput

Second Generation User Interface Guidelines

Data Webhouse design

• Adapt all web design responses
• Select appropriate DBMS software – dimensional models, OLAP
• Use indexes, aggregations
• Partition files
• Increase RAM
• Use parallel processing
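As a small illustration of the "indexes, aggregations" point, the sketch below uses Python's built-in sqlite3 module. The table names, columns, and data are assumptions made for the example; a production webhouse would use its own DBMS and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_hits (hit_date TEXT, page TEXT, visitor TEXT);
    INSERT INTO page_hits VALUES
        ('2000-01-01', '/home', 'v1'),
        ('2000-01-01', '/home', 'v2'),
        ('2000-01-01', '/cart', 'v1'),
        ('2000-01-02', '/home', 'v1');

    -- Index the column used for filtering.
    CREATE INDEX idx_hits_date ON page_hits (hit_date);

    -- Aggregate: one row per day and page, built once and queried many times.
    CREATE TABLE daily_page_hits AS
        SELECT hit_date, page, COUNT(*) AS hits
        FROM page_hits
        GROUP BY hit_date, page;
""")

# Reports read the small aggregate table instead of scanning every hit.
rows = conn.execute(
    "SELECT page, hits FROM daily_page_hits WHERE hit_date = ? ORDER BY page",
    ("2000-01-01",),
).fetchall()
```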

Meet User Expectations

Website design

• Site navigation choices
• Help choices
• Communication with various groups – response must be assured
• Headlines serious and define content
• Indicate off-screen material
• Survey customer needs and wants

Meet User Expectations

Data Webhouse design

• Report library
• Folder of previous queries, reports …
• Dimension browser – viewing dimension can assist report creation
• Business metadata interface – understand the organization's data assets

Streamline Process

Business processes designed from ground up to work seamlessly on web

Website design

• Reengineer to streamline process and make navigation easier, uniform interfaces
• Remove barriers to reaching page
• Minimize clicks and new windows
• Allow interruption and return

Streamline Process

Data Webhouse design

Build an explicit value chain for reporting and analysis around the application suite using conformed dimensions and facts

• Drill across functions
• Single user interface for reporting against all parts of business
• Master report library and FAQs
• Single login and single console access to webhouse
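Drilling across works because the separate business processes share conformed dimensions. A minimal sketch, assuming two already-summarized result sets keyed by the same product dimension (the product names, measures, and function name are hypothetical):

```python
# Two hypothetical fact summaries from different business processes,
# both keyed by the same conformed product dimension.
orders_by_product = {"widget": 120, "gadget": 45}
shipments_by_product = {"widget": 110, "gizmo": 8}


def drill_across(left, right, left_name, right_name):
    """Combine two result sets row by row on their shared dimension key."""
    report = {}
    for key in sorted(set(left) | set(right)):
        report[key] = {left_name: left.get(key, 0), right_name: right.get(key, 0)}
    return report


report = drill_across(orders_by_product, shipments_by_product, "ordered", "shipped")
```

If the two sources used different product keys, the rows could not be lined up this way, which is the practical argument for conforming dimensions first.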

Reassure Users

Website Design

Map of processes

Data Webhouse design

Provide status and lineage of current data

• Provide status of running reports
• Active notification
• Allow for entry of NA if data not available
• Time stamped dimensions
• Time stamped reports

Allow Problem Resolution

Website design

• Allow backtracking, rollback, play forward
• Keep old transactions
• Easy error reporting
• Acknowledge, track and follow-up all user inputs, show wait time

Assist searching

Data Webhouse design

• Provide adequate end user support
• Show aggregates in use and available
• Show system load and percent completed

Build Trust

Clearly state and observe website’s policies for using customer’s identity

Website design

• Do not abuse privacy
• Link to privacy statement
• Use friendly pictures of people
• Distinguish between ad content and editorial content

Build Trust

Data Webhouse design

Two-factor security

What you know – password

What you possess – token

Track changes in employee and contractor status

Create and enforce roles for employees, contractors and customers

Manage webhouse security directly
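The two-factor idea above (something you know plus something you possess) can be sketched with the standard library alone. This is an illustration under assumptions, not the scheme the slides had in mind: the password is checked against a PBKDF2 hash, and the token is modeled as a time-based one-time code in the style of RFC 6238. The salt, secret, password, and function names are hypothetical demo values.

```python
import hashlib
import hmac
import struct
import time


def hash_password(password, salt):
    """Derive a password hash (what you know)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def totp(secret, at_time, step=30, digits=6):
    """Time-based one-time code in the style of RFC 6238, HMAC-SHA1 (what you possess)."""
    counter = struct.pack(">Q", int(at_time) // step)
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def authenticate(password, code, stored_hash, salt, secret, now=None):
    """Grant access only if both factors check out."""
    now = time.time() if now is None else now
    knows = hmac.compare_digest(hash_password(password, salt), stored_hash)
    possesses = hmac.compare_digest(code, totp(secret, now))
    return knows and possesses


# Demo values only.
salt = b"demo-salt"
secret = b"12345678901234567890"
stored = hash_password("s3cret", salt)
ok = authenticate("s3cret", totp(secret, 59), stored, salt, secret, now=59)
```

A real deployment would also tolerate small clock drift and rate-limit attempts; both are omitted here for brevity.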

Provide Communication Hooks

Website design

• Provide useful links to others – internal and external
• Remove links that invalidate the “back” button
• Use copyable URLs
• Use URL as medium of distribution

Advantages of Web Today (1998 vs. 2000)

Immediate worldwide access

Centralized management (1998) – Decentralized (2000)

Thin client

Multi-platform (client and server) (1998) – Distributed (2000)

Little or no software distribution (1998) – Downloads (2000)

A+

Disadvantages of Web Today (1998 vs. 2000)

Immature technology (1998) – Teenager (2000)

Security (1998) – Solutions (2000)

Speed restricted by bandwidth - data and logic must both travel across internet

Design limited to least common denominator or access restricted to specific browser

Vulnerabilities

Physical assets

Information assets

• theft
• modification

Software assets

Ability to conduct business

Web Architecture

Thin Client

• Browser
• Applets/ActiveX
• Email
• Spreadsheet
• Word-processing

Communication layer (network/internet)

Internet Server

Application

• Analysis/statistics
• Graphics
• Report Writer
• SQL Query

Database Servers

• OLAP Server – Multidimensional Database
• Summary/Alternative Relational Tables
• Data Warehouse - Relational Database

Business Management through Information

Analysis of historical records

order processing, inventory levels, shipments, receivables, customer history, etc.

Goals include:

• Measures of efficiency
• Anticipate changes (planning and forecasting)
• Make adjustments
• Integration of model and control function

Rule-Based Management

Create Strategic rules

IF market demand increases THEN implement marketing campaign A3

IF profit margin drops below value X THEN adjust overhead by …

Must not forget alert rules

IF unanticipated condition THEN notify CFO

Must not be too reactive

would cause thrashing
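The rules on this slide can be expressed directly as code. The sketch below is one possible reading, with several assumptions: "value X" is set to an arbitrary 10% margin floor, a small dead band around that floor is used to keep the overhead rule from flipping back and forth (thrashing), and any metric the rules do not recognize counts as an unanticipated condition.

```python
MARGIN_FLOOR = 0.10  # stands in for "value X" (assumed)
DEAD_BAND = 0.02     # margin must recover past floor + band before the rule re-arms

KNOWN_METRICS = {"market_demand_change", "profit_margin"}


def evaluate(metrics, state):
    """Apply the strategic rules, then the alert rule; return the actions to take."""
    actions = []

    # Strategic rule: IF market demand increases THEN implement marketing campaign A3.
    if metrics.get("market_demand_change", 0) > 0:
        actions.append("implement marketing campaign A3")

    # Strategic rule: IF profit margin drops below value X THEN adjust overhead.
    # The dead band keeps small swings around the floor from re-triggering it.
    margin = metrics.get("profit_margin")
    if margin is not None:
        if margin < MARGIN_FLOOR and not state["overhead_adjusted"]:
            state["overhead_adjusted"] = True
            actions.append("adjust overhead")
        elif margin > MARGIN_FLOOR + DEAD_BAND:
            state["overhead_adjusted"] = False

    # Alert rule: IF unanticipated condition THEN notify CFO.
    if set(metrics) - KNOWN_METRICS:
        actions.append("notify CFO")
    return actions


state = {"overhead_adjusted": False}
first = evaluate({"profit_margin": 0.09}, state)    # rule fires
second = evaluate({"profit_margin": 0.095}, state)  # still low, but no repeat action
```

Without the `state` flag and dead band, a margin hovering around the floor would trigger a new overhead adjustment on every evaluation.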

OLDM Decision Process

Simultaneous capture of:

Decision support information

Surveyed customer on-line in exchange for an additional discount

with business function inputs

Immediate computation or estimation of secondary information

based on planning and forecasting rules

Decision support information is:

• available on-line
• ready to use “as is”

Management Defined !

OLDM Decision Process

Derived data becomes control information

Automation of analysis and decision support

immediately available to management

Problems documented on-line

Classes of problem and corrective action codified

• problem recognition
• decision rules

OLDM Decision Process

Requires four types of information

Characteristics which identify a class of problem

Corrective action (management responses by problem class)

• Rules to implement actions
• Record of result

Potential of OLDM

Better managed business

• knowledge asset capture and retention
• consistency across enterprise
• flexible, highly responsive

Close loop with customer

event and market driven but controlled

Direct customer interaction

via web, telephone, remote connection

Improved systems capacity planning and system management

Re-alignment of business and IT