Generic Information Builders' Presentation Template

Download Report

Transcript Generic Information Builders' Presentation Template

Maximize WebFOCUS Performance
with Hyperstage
Apr, 2012
Agenda
 Introduction to Hyperstage
 How does it work
 Recent results
 Demonstration
 Wrap Up and Q&A
Introducing Hyperstage
Copyright 2007, Information Builders. Slide 3
WebFOCUS Hyperstage
Why?
Why Do BI Applications Fail? Typically 3 Reasons….
1. Too Complicated
Self-Service, Guided Ad hoc
2. Bad Data
Data Quality
3. Too Slow
Hyperstage
Hyperstage will improve
database performance for
WebFOCUS applications with
less hardware, no database
tuning and easy migration.
What is WebFOCUS Hyperstage
 Embedded, columnar data store that can dramatically increase the





performance of WebFOCUS applications
 Columnar = reduced I/O (vs relational)
Easily implemented without the need for database administration
Disk footprint is reduced with a powerful compression algorithm
Includes embedded ETL for seamless migration of existing analytical
databases
 No change in query or application required
 Data migrations are seamless and easy
WF 7.7.03M and higher includes optimized Hyperstage Adapter
Runs on commodity hardware (Intel based)
 Windows 64
 Linux (Redhat, Centos, Suse, Debian)
5
Introducing WebFOCUS Hyperstage ….
Hyperstage is an integrated columnar oriented data store that
helps WebFOCUS applications achieve outstanding query
performance.
WebFOCUS Hyperstage Engine
How does it work?
Column Orientation
Smarter
Architecture
Knowledge Grid – statistics
and metadata “describing”
the super-compressed data
 No maintenance
 No query planning
 No partition schemes
 No DBA
Data Packs – data
stored
in manageably sized,
highly compressed
data packs
Data compressed
using algorithms
tailored to
data type
Data Organization and the
Knowledge Grid …
Copyright 2007, Information Builders. Slide 8
Pivoting Your Perspective: Columnar Technology
Employee Id
Name
Location
Sales
1
Smith
New York
50,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
Data stored in rows
Data stored in columns
1
Smith
New York
50,000
1
Smith
New York
50,000
2
Jones
New York
65,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
4
Fraser
Boston
70,000
Data Organization and the Knowledge Grid ….
Data Packs - The data within each column is stored
in groupings of 65,536 values called Data Packs
 Data Packs improves data compression as the optimal
compression algorithm is applied based on the data contents
 An average compression ratio of 10:1 is achieved after
loading data into Hyperstage. For example 1TB of raw data
can be stored in about 100GB of space.
Data Pack
Data Pack
Data Pack
Data Pack
Data Pack
Data Pack
Data Organization and the Knowledge Grid ….
Data Packs and Compression
Data Packs
64K
 Each data pack contains 65, 536 data values
 Compression is applied to each individual data pack
 The compression algorithm varies depending on data
64K
type and data distribution
Compression
 Results vary depending on the
64K
64K
Patent Pending
Compression
Algorithms
distribution of data among data
packs
 A typical overall compression
ratio seen in the field is 10:1
 Some customers have seen
results have been as high as 40:1
Data Organization and the Knowledge Grid ….
The Knowledge Grid
Column A
Knowledge Nodes
Column B
Pack Row 1
Global Knowledge
Pack Row 2
String and character data
Pack Row 3
Pack Row 4
Numeric data
Pack Row 5
Distributions
Built during
LOAD
Pack Row 6
Dynamic Knowledge
Built per-query
e.g. for
aggregates, joins
Data Organization and the Knowledge Grid ….
Data Pack Nodes (DPN)
A separate DPN is created for every data pack created
in the database to store basic statistical information
Character Maps (CMAPs)
Every Data Pack that contains text creates a matrix
that records the occurrence of every possible ASCII
character
Histograms
Histograms are created for every Data Pack that
contains numeric data and creates 1024 MIN-MAX
intervals.
Pack-to-Pack Nodes (PPN)
PPNs track relationships between Data Packs when
tables are joined. Query performance gets better as
the database is used.
This metadata layer = 1% of the compressed volume
How does it work …
Copyright 2007, Information Builders. Slide
14
WebFOCUS Hyperstage Example: Query and Knowledge Grid
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
salary age
job city
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: salary > 50000
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
salary age
job city
1. Find the Data Packs with salary > 50000
All values match
Completely Irrelevant
WebFOCUS Hyperstage Example: age<65
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
salary age
job city
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: job = ‘shipping
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
salary age
job city
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = ‘shipping’
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: city = ‘Toronto
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
1.
2.
3.
4.
salary age
job city
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Toronto’
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: Eliminate Pack Rows
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
1.
2.
3.
4.
5.
salary age
job city
All packs
ignored
All packs
ignored
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Toronto’
Eliminate All rows that have been flagged as
irrelevant
All packs
ignored
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: Decompress and scan
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Toronto’
Eliminate All rows that have been flagged as
irrelevant
6. Finally we identify the pack that needs to be
decompressed
salary age
job city
All packs
ignored
All packs
ignored
1.
2.
3.
4.
5.
All packs
ignored
Only this pack will
be de-compressed
All values match
Completely Irrelevant
Suspect
POC Results (Internal Use Only)



Insurance Company
 Query performance issues with SQL Server - Insurance claims analysis
 3 day POC - Compression achieved 40:1
 Most queries running 3X faster in Hyperstage
Large Bank
 Query performance issues with SQL Server - Web traffic analysis
 3 day POC -Compression achieved 10:1
 Queries than ran for 10 to 15 mins in SQL Server ran sub-second in
Hyperstage
Government Application
 Query performance issues with Oracle – Federal Loan/Grant Tracking
 3 day POC -Compression achieved 15:1
 Queries than ran for 10 to 15 mins in Oracle ran in 30 secs in
Hyperstage
POCs can typically be completed with 3 days
22





WF Service
WF Connector
.Net
Java
WebFOCUS
Client
WebFOCUS
Reporting
Server
WF Hyperstage Adapter
Beyond WebFOCUS
WebFOCUS
Hyperstage
Server
Generic App
Java
C
.Net
PHP
Perl
Hyperstage is integrated in the WebFOCUS BI Architecture through the reporting server and
is administered using the WebFOCUS console
WebFOCUS client applications communicate directly through the reporting server
Custom applications developed via Java or .Net can access the reporting server via
WebFOCUS services and a supplied WebFOCUS connector
Hyperstage also supports connections from any application via industry standard JDBC or
ODBC connections. There are also native drivers for .NET, C, or PHP applications to connect
directly to the Hyperstage engine.
Data can be loaded and maintained in Hyperstage using iWay Data Integration or using any
commercial ETL tool.
23
Hyperstage vs. OLAP
 Many companies are looking to migrate from legacy OLAP solutions
 Hyperstage can offer excellent query performance with a commonly


understood star pattern database
WebFOCUS can offer navigation and drill path navigation
Hyperstage can support large numbers of dimensional attributes and
can be easily updated
OLAP
WebFOCUS HyperStage

Limited number of dimensions




Difficult to add new dimensions

Rebuilding cubes can be slow

Up to 10X raw data size to amount 
of disk consumed
Supports up to 4096 columns on a
single table
Dimension tables can be updated
Bulk loads of up to 500GB per hour
Typically 10:1 compression
24
Hyperstage vs. In-Memory
 WebFOCUS Hyperstage is a viable alternative to BI tools that utilize



an in-memory architecture like QlikView, Tableau, Cognos TM1 and
Tibco/Spotfire
In-memory is limited to the amount of data you can store in RAM.
Hyperstage is a hybrid approach that efficiently uses disk I/O without
sacrificing the performance achieved by in-memory
Tableau for example has approximately a 100GB limit on its inmemory cache.
In Memory Solutions
WebFOCUS HyperStage
Storage: RAM
 Expensive
 Short term
 Requires additional hardware
Storage: RAM/Disk
 Cheap
 Long Term
 Leverage existing hardware
25
Demonstration …
Copyright 2007, Information Builders. Slide
26
NYSE Daily Stock Price History
 Downloaded from internet daily history from 1970 to 2006 for 7000
stocks
 14 million rows
 1.4GB of raw data
 Compressed to 70MB
 Test query summarizes stock information for top tech companies in
March 2000 and compares the information for the same period in
March 2002 (dot com collapse)
 Note: Hyperstage running on a Dell laptop 1 duo core processor with
4GB of RAM
NYSE Daily Stock Price History (exploded)
 Simulated additional stock prices up to 2043
 2 billion rows
 200GB of raw data
 Compressed to 17GB
 Test query summarizes stock information for top tech companies in
March 2000 and compares the information for the same period in
March 2002 (dot com collapse)
WebFOCUS Hyperstage
The Big Deal…
 No indexes
 No partitions
 No views
 No materialized aggregates
 Value proposition
Low IT overhead
Allows for autonomy from IT
Ease of implementation
Fast time to market
Less Hardware
Lower TCO
No DBA
Required!
Q&A
Copyright 2007, Information Builders. Slide
30