Warehousing on the Web
Webhouse
Why Utilize the Web?
What is the data Webhouse?
Managing clickstreams
WWW today
ROI
DSS
Data Webhouse
Defined by Ralph Kimball
Two distinct focuses
• Bringing the web to the warehouse
  – Clickstream data as a source of information
• Bringing existing data warehouses to the web
  – Fully distributed environment
Required Capabilities
Capture clickstream logs and convert to tables for analysis
Merge customer demographic and account info with above
Interpret customer paths in website
Identify abandoned sessions
Use the DW to drive customer responses appearing on your website
DW querying and reporting available through web browsers
Attach multimedia to DW
DW security
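The path-interpretation and abandoned-session capabilities above can be sketched as follows. This is a minimal illustration, not the method from the slides: the 30-minute inactivity timeout, the `/cart` and `/checkout/complete` paths, and the function names are all assumptions, and the hits are assumed to be already parsed into (visitor, timestamp, path) rows.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff


def build_sessions(hits):
    """Group (visitor_id, timestamp, path) hits into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for visitor_id, ts, path in hits:
        by_visitor[visitor_id].append((ts, path))

    sessions = []
    for visitor_id, events in by_visitor.items():
        events.sort()  # order each visitor's hits by time
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > SESSION_TIMEOUT:
                # Long gap: close the running session and start a new one.
                sessions.append((visitor_id, current))
                current = []
            current.append(event)
        sessions.append((visitor_id, current))
    return sessions


def is_abandoned(session_events, start="/cart", goal="/checkout/complete"):
    """A session is abandoned if it reached `start` but never reached `goal`."""
    paths = [path for _, path in session_events]
    return start in paths and goal not in paths


# Hypothetical sample data.
t0 = datetime(2000, 1, 1, 12, 0)
hits = [
    ("v1", t0, "/home"),
    ("v1", t0 + timedelta(minutes=2), "/cart"),
    ("v2", t0, "/cart"),
    ("v2", t0 + timedelta(minutes=5), "/checkout/complete"),
    ("v1", t0 + timedelta(hours=3), "/home"),
]
sessions = build_sessions(hits)
abandoned = [(v, [p for _, p in ev]) for v, ev in sessions if is_abandoned(ev)]
```

Here visitor `v1` produces two sessions because of the three-hour gap, and only the first one (which reached the cart) counts as abandoned.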
Architecture – Web to Warehouse
Beyond a comprehensive real-time snapshot of the business, we also want knowledge of customer behavior
Extended design factors
• Timeliness – real-time
• Data volume – no upper limit
• Response time – less than 10 seconds
Hot Response Cache
A file server holding complex file objects
As a file server it is an I/O engine (bandwidth)
Must hold objects which will be requested
Security responsibility of requesting server
Extension of original operational data store (ODS)
Does not physically speed up the database; creates the illusion of speed by storing predictable answers
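The idea of storing predictable answers can be sketched as a small keyed cache. This is an illustrative sketch only: the class name, the time-to-live policy, and the `answer` helper are assumptions, and a real hot response cache would typically be preloaded with the objects expected to be requested rather than filled lazily. Authorization is deliberately absent, since the slides assign security to the requesting server.

```python
import time


class HotResponseCache:
    """Holds precomputed answers keyed by a request signature."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, payload)

    def put(self, key, payload):
        """Store (or preload) an answer; it expires after the TTL."""
        self._store[key] = (self._clock() + self._ttl, payload)

    def get(self, key):
        """Return the cached answer, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return payload


def answer(cache, key, run_query):
    """Serve from the cache when possible; otherwise run the slow query once."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = run_query()
    cache.put(key, result)
    return result


# Hypothetical usage: the second request never touches the database.
calls = []


def slow_query():
    calls.append(1)
    return {"top_products": ["A", "B"]}


cache = HotResponseCache(ttl_seconds=60)
first = answer(cache, "top_products:today", slow_query)
second = answer(cache, "top_products:today", slow_query)
```

The database is no faster than before; the second caller simply receives the stored answer.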
Who are our users?
Traditional
• Power users
  – need database connectivity
• Analysts
  – want to manipulate existing data
• Report viewers
  – view standardized reports
Web
• Our customers
• Our business partners
• Our employees
Clickstreams
Clickstream is not just another data source
• Distributed nature leads to multiple data sources which require synchronization
• Multiple parties
• More than a dozen log file formats for capturing clickstream data
• Search specification
Basic form of clickstream data is stateless
• Log shows isolated page retrieval events
Clickstream data is anonymous
Today's Promotions
• Clickthroughs and referrals as a revenue source
Clickstreams
Clickstream post-processor – receives raw log data from the web server and normalizes it into a format that can be combined with application-derived data for insertion into the DW
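A post-processor of this kind might start by turning each raw log line into a flat record. The sketch below assumes the web server writes the Common Log Format, which is only one of the many formats mentioned earlier; the field names in the output record and the `normalize` function are illustrative choices, not part of the slides.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def normalize(line):
    """Turn one Common Log Format line into a flat record, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "host": fields["host"],
        "user": None if fields["user"] == "-" else fields["user"],
        "timestamp": datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": fields["method"],
        "path": fields["path"],
        "status": int(fields["status"]),
        "bytes": 0 if fields["size"] == "-" else int(fields["size"]),
    }


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = normalize(sample)
```

Each record is one isolated page retrieval event; joining records to customer or session data is a later step that needs information from the application.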
Why Bring DW to Web?
Primary function of the DW is to publish information – the web is a good partner
Need a distributed DW – the web provides universal connectivity
Universal front-end – web browser
Web Pushes Data Warehouse
User interface effectiveness measurable
Queries and updates mixed
Speed expected – 10 second rule
Global
• 24 x 7 expected
• International characters, dates, addresses
Expanded multimedia
• Animation, zoomable images, maps, video clips
• Need material in digital form
• Enterprise information portal will require items to be searchable
Web Pushes Data Warehouse
Mass customization
• Dynamically created web pages – XML
Fully distributed
• Linking together all the data marts
Security and Privacy
• Publish only to those who need to know
• User profiles and access profiles defined in one place
• Full-time expert security person
Second Generation User Interface Guidelines
Near-instantaneous performance
Website Design
• Design for lowest common denominator
• Measure page performance on a continuous basis
• Paint navigation buttons immediately
• Disclose content progressively
• Implement page caching
• Cache data, reports
• Improve web server bandwidth
• Improve server throughput
Second Generation User Interface Guidelines
Data Webhouse design
• Adapt all web design responses
• Select appropriate DBMS software – dimensional models, OLAP
• Use indexes, aggregations
• Partition files
• Increase RAM
• Use parallel processing
Meet User Expectations
Website design
• Site navigation choices
• Help choices
• Communication with various groups – response must be assured
• Headlines serious and define content
• Indicate off-screen material
• Survey customer needs and wants
Meet User Expectations
Data Webhouse design
• Report library
• Folder of previous queries, reports …
• Dimension browser – viewing dimensions can assist report creation
• Business metadata interface – understand the organization's data assets
Streamline Process
Business processes designed from the ground up to work seamlessly on the web
Website design
• Reengineer to streamline process and make navigation easier, uniform interfaces
• Remove barriers to reaching page
• Minimize clicks and new windows
• Allow interruption and return
Streamline Process
Data Webhouse design
• Build an explicit value chain for reporting and analysis around the application suite using conformed dimensions and facts
• Drill across functions
• Single user interface for reporting against all parts of business
• Master report library and FAQs
• Single login and single console access to webhouse
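Drilling across functions depends on the fact tables sharing conformed dimensions. As a rough sketch, assume each business function has already been summarized by the same product key; the function name `drill_across` and the sample figures are hypothetical.

```python
def drill_across(dimension, *fact_summaries):
    """Merge per-key summaries from several fact tables on a shared (conformed) key.

    Each fact summary is a (measure_name, {key: value}) pair. Keys missing from
    one summary appear with None for that measure.
    """
    keys = sorted(set().union(*(summary.keys() for _, summary in fact_summaries)))
    return [
        {dimension: key, **{name: summary.get(key) for name, summary in fact_summaries}}
        for key in keys
    ]


# Hypothetical summaries from two separate business processes.
orders_by_product = {"widget": 120, "gadget": 45}
shipments_by_product = {"widget": 110, "gizmo": 7}

report = drill_across(
    "product",
    ("units_ordered", orders_by_product),
    ("units_shipped", shipments_by_product),
)
```

The merge only makes sense because both summaries use the same product identifiers; without conformed dimensions the rows could not be lined up.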
Reassure Users
Website Design
• Map of processes
Data Webhouse design
• Provide status and lineage of current data
• Provide status of running reports
• Active notification
• Allow for entry of NA if data not available
• Time stamped dimensions
• Time stamped reports
Allow Problem Resolution
Website design
• Allow backtracking, rollback, play forward
• Keep old transactions
• Easy error reporting
• Acknowledge, track and follow up all user inputs, show wait time
• Assist searching
Data Webhouse design
• Provide adequate end user support
• Show aggregates in use and available
• Show system load and percent completed
Build Trust
Clearly state and observe website’s policies for using customer’s identity
Website design
• Do not abuse privacy
• Link to privacy statement
• Use friendly pictures of people
• Distinguish between ad content and editorial content
Build Trust
Data Webhouse design
• Two-factor security
  – What you know – password
  – What you possess – token
• Track changes in employee and contractor status
• Create and enforce roles for employees, contractors and customers
• Manage webhouse security directly
Provide Communication Hooks
Website design
• Provide useful links to others – internal and external
• Remove links that invalidate the “back” button
• Use copyable URLs
• Use URL as medium of distribution
Advantages of Web Today (1998, with 2000 status in parentheses)
• Immediate worldwide access
• Centralized management (2000: Decentralized)
• Thin client
• Multi-platform (client and server) (2000: Distributed)
• Little or no software distribution (2000: Downloads)
A+
Disadvantages of Web Today (1998, with 2000 status in parentheses)
• Immature technology (2000: Teenager)
• Security (2000: Solutions)
• Speed restricted by bandwidth - data and logic must both travel across the internet
• Design limited to least common denominator or access restricted to specific browser
Vulnerabilities
Physical assets
Information assets
• theft
• modification
Software assets
Ability to conduct business
Web Architecture
[Figure: web architecture in layers. Thin client (browser, applets/ActiveX, email, spreadsheet, word processing); communication layer (network/internet); internet server; applications (analysis/statistics, graphics, report writer, SQL query); database servers (OLAP server, multidimensional database, summary/alternative relational tables); data warehouse (relational database)]
Business Management through Information
Analysis of historical records
• order processing, inventory levels, shipments, receivables, customer history, etc.
Goals include:
• Measures of efficiency
• Anticipate changes (planning and forecasting)
• Make adjustments
• Integration of model and control function
Rule-Based Management
Create strategic rules
• IF market demand increases THEN implement marketing campaign A3
• IF profit margin drops below value X THEN adjust overhead by …
Must not forget alert rules
• IF unanticipated condition THEN notify CFO
Must not be too reactive
• would cause thrashing
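The rules above, including the warning about thrashing, can be sketched as a small rule table with a cooldown. Everything specific here is an assumption made for illustration: the metric names, the `MARGIN_FLOOR` stand-in for "value X", the cooldown of two periods, and the `unanticipated` flag used to trigger the alert rule. The overhead adjustment amount is left unspecified, as in the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MARGIN_FLOOR = 0.15  # hypothetical stand-in for "value X"


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]
    action: str
    cooldown: int = 0  # periods to stay quiet after firing
    last_fired: Optional[int] = field(default=None, repr=False)

    def evaluate(self, period: int, metrics: Dict[str, float]) -> Optional[str]:
        """Return the action if the rule fires in this period, else None."""
        if not self.condition(metrics):
            return None
        if self.last_fired is not None and period - self.last_fired <= self.cooldown:
            return None  # fired recently: suppress to avoid thrashing
        self.last_fired = period
        return self.action


RULES: List[Rule] = [
    Rule("demand", lambda m: m["demand_growth"] > 0,
         "implement marketing campaign A3", cooldown=2),
    Rule("margin", lambda m: m["profit_margin"] < MARGIN_FLOOR,
         "adjust overhead", cooldown=2),
    # Alert rule: no cooldown, so every unanticipated condition is reported.
    Rule("alert", lambda m: m.get("unanticipated", 0) > 0, "notify CFO"),
]


def run_period(period: int, metrics: Dict[str, float]) -> List[str]:
    """Evaluate every rule for one reporting period and collect the actions."""
    fired = (rule.evaluate(period, metrics) for rule in RULES)
    return [action for action in fired if action is not None]


# Demand keeps growing for four periods; the campaign fires in periods 1 and 4 only.
steady_growth = {"demand_growth": 0.1, "profit_margin": 0.2}
history = {period: run_period(period, steady_growth) for period in range(1, 5)}
```

Without the cooldown, the campaign rule would fire in every period in which demand is still rising, which is the over-reactive behavior the slide warns against.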
OLDM Decision Process
Simultaneous capture of:
• Decision support information
  – Surveyed customer on-line in exchange for an additional discount
• with business function inputs
Immediate computation or estimation of secondary information
• based on planning and forecasting rules
Decision support information is:
• available on-line
• ready to use “as is”
Management Defined!
OLDM Decision Process
Derived data becomes control information
Automation of analysis and decision support
• immediately available to management
Problems documented on-line
Classes of problem and corrective action codified
• problem recognition
• decision rules
OLDM Decision Process
Requires four types of information
• Characteristics which identify a class of problem
• Corrective action (management responses by problem class)
• Rules to implement actions
• Record of result
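One way to picture these four types of information is as the fields of a single record per problem class. The sketch below is an assumption about how they might fit together, not a structure defined in the slides; the names `ProblemClass` and `handle`, and the low-stock example, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProblemClass:
    name: str
    recognizes: Callable[[Dict[str, float]], bool]     # identifying characteristics
    corrective_action: str                             # management response for this class
    implement: Callable[[Dict[str, float]], str]       # rule that carries out the action
    results: List[str] = field(default_factory=list)   # record of result


def handle(event, problem_classes):
    """Match an event to the first problem class that recognizes it and apply its rule."""
    for problem in problem_classes:
        if problem.recognizes(event):
            outcome = problem.implement(event)
            problem.results.append(outcome)
            return problem.corrective_action, outcome
    return None, "escalate: unrecognized problem"


# Hypothetical problem class: stock has fallen below the reorder point.
low_stock = ProblemClass(
    name="low stock",
    recognizes=lambda e: e["on_hand"] < e["reorder_point"],
    corrective_action="replenish inventory",
    implement=lambda e: f"reorder {2 * e['reorder_point'] - e['on_hand']} units",
)

action, outcome = handle({"on_hand": 3, "reorder_point": 10}, [low_stock])
```

Keeping the results on the record lets later analysis check whether a corrective action actually worked for that class of problem.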
Potential of OLDM
Better managed business
• knowledge asset capture and retention
• consistency across enterprise
• flexible, highly responsive
Close loop with customer
• event and market driven but controlled
Direct customer interaction
• via web, telephone, remote connection
Improved systems capacity planning and system management
Re-alignment of business and IT