Generic Information Builders' Presentation Template
Download
Report
Transcript Generic Information Builders' Presentation Template
Maximize WebFOCUS Performance
with Hyperstage
Apr, 2012
Agenda
Introduction to Hyperstage
How does it work
Recent results
Demonstration
Wrap Up and Q&A
Introducing Hyperstage
Copyright 2007, Information Builders. Slide 3
WebFOCUS Hyperstage
Why?
Why Do BI Applications Fail? Typically 3 Reasons….
1. Too Complicated
Self-Service, Guided Ad hoc
2. Bad Data
Data Quality
3. Too Slow
Hyperstage
Hyperstage will improve
database performance for
WebFOCUS applications with
less hardware, no database
tuning and easy migration.
What is WebFOCUS Hyperstage
Embedded, columnar data store that can dramatically increase the
performance of WebFOCUS applications
Columnar = reduced I/O (vs relational)
Easily implemented without the need for database administration
Disk footprint is reduced with a powerful compression algorithm
Includes embedded ETL for seamless migration of existing analytical
databases
No change in query or application required
Data migrations are seamless and easy
WF 7.7.03M and higher includes optimized Hyperstage Adapter
Runs on commodity hardware (Intel based)
Windows 64
Linux (Redhat, Centos, Suse, Debian)
5
Introducing WebFOCUS Hyperstage ….
Hyperstage is an integrated columnar oriented data store that
helps WebFOCUS applications achieve outstanding query
performance.
WebFOCUS Hyperstage Engine
How does it work?
Column Orientation
Smarter
Architecture
Knowledge Grid – statistics
and metadata “describing”
the super-compressed data
No maintenance
No query planning
No partition schemes
No DBA
Data Packs – data
stored
in manageably sized,
highly compressed
data packs
Data compressed
using algorithms
tailored to
data type
Data Organization and the
Knowledge Grid …
Copyright 2007, Information Builders. Slide 8
Pivoting Your Perspective: Columnar Technology
Employee Id
Name
Location
Sales
1
Smith
New York
50,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
Data stored in rows
Data stored in columns
1
Smith
New York
50,000
1
Smith
New York
50,000
2
Jones
New York
65,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
4
Fraser
Boston
70,000
Data Organization and the Knowledge Grid ….
Data Packs - The data within each column is stored
in groupings of 65,536 values called Data Packs
Data Packs improves data compression as the optimal
compression algorithm is applied based on the data contents
An average compression ratio of 10:1 is achieved after
loading data into Hyperstage. For example 1TB of raw data
can be stored in about 100GB of space.
Data Pack
Data Pack
Data Pack
Data Pack
Data Pack
Data Pack
Data Organization and the Knowledge Grid ….
Data Packs and Compression
Data Packs
64K
Each data pack contains 65, 536 data values
Compression is applied to each individual data pack
The compression algorithm varies depending on data
64K
type and data distribution
Compression
Results vary depending on the
64K
64K
Patent Pending
Compression
Algorithms
distribution of data among data
packs
A typical overall compression
ratio seen in the field is 10:1
Some customers have seen
results have been as high as 40:1
Data Organization and the Knowledge Grid ….
The Knowledge Grid
Column A
Knowledge Nodes
Column B
Pack Row 1
Global Knowledge
Pack Row 2
String and character data
Pack Row 3
Pack Row 4
Numeric data
Pack Row 5
Distributions
Built during
LOAD
Pack Row 6
Dynamic Knowledge
Built per-query
e.g. for
aggregates, joins
Data Organization and the Knowledge Grid ….
Data Pack Nodes (DPN)
A separate DPN is created for every data pack created
in the database to store basic statistical information
Character Maps (CMAPs)
Every Data Pack that contains text creates a matrix
that records the occurrence of every possible ASCII
character
Histograms
Histograms are created for every Data Pack that
contains numeric data and creates 1024 MIN-MAX
intervals.
Pack-to-Pack Nodes (PPN)
PPNs track relationships between Data Packs when
tables are joined. Query performance gets better as
the database is used.
This metadata layer = 1% of the compressed volume
How does it work …
Copyright 2007, Information Builders. Slide
14
WebFOCUS Hyperstage Example: Query and Knowledge Grid
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
salary age
job city
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: salary > 50000
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
salary age
job city
1. Find the Data Packs with salary > 50000
All values match
Completely Irrelevant
WebFOCUS Hyperstage Example: age<65
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
salary age
job city
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: job = ‘shipping
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
salary age
job city
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = ‘shipping’
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: city = ‘Toronto
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
1.
2.
3.
4.
salary age
job city
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Toronto’
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: Eliminate Pack Rows
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
1.
2.
3.
4.
5.
salary age
job city
All packs
ignored
All packs
ignored
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Toronto’
Eliminate All rows that have been flagged as
irrelevant
All packs
ignored
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: Decompress and scan
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Toronto’;
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Toronto’
Eliminate All rows that have been flagged as
irrelevant
6. Finally we identify the pack that needs to be
decompressed
salary age
job city
All packs
ignored
All packs
ignored
1.
2.
3.
4.
5.
All packs
ignored
Only this pack will
be de-compressed
All values match
Completely Irrelevant
Suspect
POC Results (Internal Use Only)
Insurance Company
Query performance issues with SQL Server - Insurance claims analysis
3 day POC - Compression achieved 40:1
Most queries running 3X faster in Hyperstage
Large Bank
Query performance issues with SQL Server - Web traffic analysis
3 day POC -Compression achieved 10:1
Queries than ran for 10 to 15 mins in SQL Server ran sub-second in
Hyperstage
Government Application
Query performance issues with Oracle – Federal Loan/Grant Tracking
3 day POC -Compression achieved 15:1
Queries than ran for 10 to 15 mins in Oracle ran in 30 secs in
Hyperstage
POCs can typically be completed with 3 days
22
WF Service
WF Connector
.Net
Java
WebFOCUS
Client
WebFOCUS
Reporting
Server
WF Hyperstage Adapter
Beyond WebFOCUS
WebFOCUS
Hyperstage
Server
Generic App
Java
C
.Net
PHP
Perl
Hyperstage is integrated in the WebFOCUS BI Architecture through the reporting server and
is administered using the WebFOCUS console
WebFOCUS client applications communicate directly through the reporting server
Custom applications developed via Java or .Net can access the reporting server via
WebFOCUS services and a supplied WebFOCUS connector
Hyperstage also supports connections from any application via industry standard JDBC or
ODBC connections. There are also native drivers for .NET, C, or PHP applications to connect
directly to the Hyperstage engine.
Data can be loaded and maintained in Hyperstage using iWay Data Integration or using any
commercial ETL tool.
23
Hyperstage vs. OLAP
Many companies are looking to migrate from legacy OLAP solutions
Hyperstage can offer excellent query performance with a commonly
understood star pattern database
WebFOCUS can offer navigation and drill path navigation
Hyperstage can support large numbers of dimensional attributes and
can be easily updated
OLAP
WebFOCUS HyperStage
Limited number of dimensions
Difficult to add new dimensions
Rebuilding cubes can be slow
Up to 10X raw data size to amount
of disk consumed
Supports up to 4096 columns on a
single table
Dimension tables can be updated
Bulk loads of up to 500GB per hour
Typically 10:1 compression
24
Hyperstage vs. In-Memory
WebFOCUS Hyperstage is a viable alternative to BI tools that utilize
an in-memory architecture like QlikView, Tableau, Cognos TM1 and
Tibco/Spotfire
In-memory is limited to the amount of data you can store in RAM.
Hyperstage is a hybrid approach that efficiently uses disk I/O without
sacrificing the performance achieved by in-memory
Tableau for example has approximately a 100GB limit on its inmemory cache.
In Memory Solutions
WebFOCUS HyperStage
Storage: RAM
Expensive
Short term
Requires additional hardware
Storage: RAM/Disk
Cheap
Long Term
Leverage existing hardware
25
Demonstration …
Copyright 2007, Information Builders. Slide
26
NYSE Daily Stock Price History
Downloaded from internet daily history from 1970 to 2006 for 7000
stocks
14 million rows
1.4GB of raw data
Compressed to 70MB
Test query summarizes stock information for top tech companies in
March 2000 and compares the information for the same period in
March 2002 (dot com collapse)
Note: Hyperstage running on a Dell laptop 1 duo core processor with
4GB of RAM
NYSE Daily Stock Price History (exploded)
Simulated additional stock prices up to 2043
2 billion rows
200GB of raw data
Compressed to 17GB
Test query summarizes stock information for top tech companies in
March 2000 and compares the information for the same period in
March 2002 (dot com collapse)
WebFOCUS Hyperstage
The Big Deal…
No indexes
No partitions
No views
No materialized aggregates
Value proposition
Low IT overhead
Allows for autonomy from IT
Ease of implementation
Fast time to market
Less Hardware
Lower TCO
No DBA
Required!
Q&A
Copyright 2007, Information Builders. Slide
30