Star Schema Optimization - CSCI 6442

Data Warehouse and the Star Schema
CSCI 242
©Copyright 2015, David C. Roberts, all rights reserved
Red Brick
Red Brick invented the data warehouse; they sold a hardware product with a star schema database
You loaded the Red Brick Warehouse and then queried it for analysis (OLAP rather than OLTP)
It featured new optimizations for star schemas and was very fast
Enter Sybase
Sybase learned the optimization and developed their own product
The Sybase product was a stand-alone software data warehouse product
It couldn’t do general-purpose database work; it was just a data warehouse
They appear to have copied the Red Brick idea, without selling hardware
Enter Oracle
Later, Oracle also copied the same optimization
They added a bitmap index to their database product, along with the star schema optimization
Now their product could do data warehouse work as well as general-purpose database work
Status Today
Oracle dominates the field today
IBM eventually acquired Red Brick (through its purchase of Informix), so it still offers some sort of Red Brick product
Sybase offers their OLTP product, now as an offering of SAP
So what is this algorithm that is so copied?
THE ALGORITHM
Optimizing Star Queries
Build a bitmap index on each foreign key column of the fact table
The index is a two-dimensional bit array: one column for each row being indexed, one row for each distinct value of the indexed column
Bitmap indexes are typically much smaller than B-tree indexes, which can be larger than the data itself
Bitmap Index Example
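A minimal sketch of the layout described above, not any vendor's implementation. It assumes a hypothetical six-row fact table with a store_key column, and stores each bitmap as a Python integer in which bit i stands for fact row i. The names build_bitmap_index and rows_of are illustrative only.

```python
from collections import defaultdict

def build_bitmap_index(column_values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column_values):
        index[value] |= 1 << row_id
    return dict(index)

def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical store_key column of a six-row fact table (rows 0..5).
store_key = [10, 20, 10, 30, 20, 10]
index = build_bitmap_index(store_key)

# One printed line per distinct value, one character per fact row (row 0 first).
for value in sorted(index):
    bits = "".join("1" if (index[value] >> i) & 1 else "0"
                   for i in range(len(store_key)))
    print(value, bits)
# 10 101001
# 20 010010
# 30 000100
```

Each printed line is one row of the two-dimensional array; each character position is one fact-table row.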
Query Processing
The typical query joins the fact table to the dimension tables on the fact table's foreign keys
This is processed in two phases:
1. From the fact table, retrieve all rows that are part of the result, using bitmap indexes
2. Join the result of step 1 to the dimension tables
Example Query
Find sales and profits from the grocery departments of stores in the West and Southwest districts over the last three quarters
SELECT
    store.sales_district,
    time.fiscal_period,
    SUM(sales.dollar_sales) AS revenue,
    SUM(sales.dollar_sales) - SUM(sales.dollar_cost) AS income
FROM
    sales, store, time, product
WHERE
    sales.store_key = store.store_key AND
    sales.time_key = time.time_key AND
    sales.product_key = product.product_key AND
    time.fiscal_period IN ('3Q95', '4Q95', '1Q96') AND
    product.department = 'Grocery' AND
    store.sales_district IN ('West', 'Southwest')
GROUP BY
    store.sales_district, time.fiscal_period;
Phase 1
Finding the rows in the SALES table (using bitmap indexes):
SELECT ... FROM sales
WHERE
    store_key IN (SELECT store_key FROM store
                  WHERE sales_district IN ('West', 'Southwest')) AND
    time_key IN (SELECT time_key FROM time
                 WHERE fiscal_period IN ('3Q95', '4Q95', '1Q96')) AND
    product_key IN (SELECT product_key FROM product
                    WHERE department = 'Grocery');
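One way to picture how bitmap indexes answer this phase: each subquery yields a set of qualifying dimension keys, the bitmaps for those keys are ORed together within a dimension, and the per-dimension results are ANDed across dimensions. The sketch below assumes a hypothetical six-row sales table and made-up key values; matching_rows is an illustrative name, and a real system would use prebuilt, compressed indexes rather than building them per query.

```python
def build_bitmap_index(column_values):
    """Map each distinct key to an int whose bit i is set when row i holds that key."""
    index = {}
    for row_id, value in enumerate(column_values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(fact_columns, qualifying_keys):
    """Return fact-table row numbers that satisfy every dimension filter."""
    n_rows = len(next(iter(fact_columns.values())))
    result = (1 << n_rows) - 1          # start with every row selected
    for column, keys in qualifying_keys.items():
        # Built here for brevity; assumed to exist already in a real warehouse.
        index = build_bitmap_index(fact_columns[column])
        merged = 0
        for key in keys:                # OR: any qualifying key of this dimension
            merged |= index.get(key, 0)
        result &= merged                # AND: every dimension must match
    return [i for i in range(n_rows) if (result >> i) & 1]

# Hypothetical foreign key columns of the sales fact table (rows 0..5).
fact_columns = {
    "store_key":   [1, 2, 3, 1, 2, 1],
    "time_key":    [1, 2, 2, 3, 3, 2],
    "product_key": [7, 7, 7, 8, 7, 7],
}
# Hypothetical results of the three subqueries.
qualifying_keys = {
    "store_key": {1, 2},     # stores in the chosen districts
    "time_key": {2, 3},      # chosen fiscal periods
    "product_key": {7},      # grocery products
}
print(matching_rows(fact_columns, qualifying_keys))  # [1, 4, 5]
```

Only bit operations touch the fact table's indexes here; no fact row is read until the final list of row numbers is known.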
Phase 2
Now the rows retrieved from the fact table are joined to the dimension tables. For a dimension table of small cardinality, a full-table scan may be used. For one of large cardinality, a hash join could be used.
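A rough sketch of the hash join option, under stated assumptions: the dimension tables are represented as Python dicts (the hash tables, keyed by hypothetical surrogate keys), and the fact rows are made-up rows that survived phase 1. Only store and time are joined because the example query selects columns from those two dimensions alone. The function name hash_join_and_aggregate is illustrative.

```python
from collections import defaultdict

# Build side: hypothetical dimension tables hashed on their primary keys.
store = {1: "West", 2: "Southwest"}
time = {2: "4Q95", 3: "1Q96"}

# Probe side: hypothetical fact rows left after phase 1,
# as (store_key, time_key, dollar_sales, dollar_cost).
fact_rows = [
    (2, 2, 100.0, 60.0),
    (2, 3, 50.0, 30.0),
    (1, 2, 80.0, 50.0),
    (2, 2, 20.0, 15.0),
]

def hash_join_and_aggregate(fact_rows, store, time):
    """Probe each dimension hash table per fact row, then group and sum."""
    totals = defaultdict(lambda: [0.0, 0.0])   # group -> [revenue, income]
    for store_key, time_key, sales, cost in fact_rows:
        group = (store[store_key], time[time_key])   # two hash lookups
        totals[group][0] += sales
        totals[group][1] += sales - cost
    return {group: tuple(sums) for group, sums in totals.items()}

for group, (revenue, income) in sorted(hash_join_and_aggregate(fact_rows, store, time).items()):
    print(group, revenue, income)
# ('Southwest', '1Q96') 50.0 20.0
# ('Southwest', '4Q95') 120.0 45.0
# ('West', '4Q95') 80.0 30.0
```

Because phase 1 has already discarded non-matching fact rows, every probe in this sketch finds its dimension row.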
The Star Transformation
Use bitmap indexes to retrieve all relevant rows from the fact table, based on foreign key values
  – This happens very fast
Join this result set to the dimension tables
  – If there are many values, a hash join may be used
  – If there are fewer values, a B-tree driven join may be used