eCommerce&Security

Download Report

Transcript eCommerce&Security

Chapter 4 - Data Warehouses
• 4.1 Data Warehouse Introduction
• 4.2 Conceptual Structure & Architecture
• 4.3 Data Modelling for Data Warehouses
• 4.4 Functionality of a Data Warehouse
• 4.5 Views and Decision Support
• 4.6 Building a Data Warehouse
• 4.7 Implementation Techniques
4.1 Data Warehouse Introduction (1)
• Traditional database usage:
+ databases are used by organisations for maintaining data that documents their
everyday operations
+ transactions typically make small changes to this operational data
+ example:
• adding a reservation
• logging sales of goods
+ these systems are called online transaction processing (OLTP) applications
+ DBMS have been optimised extensively to perform well in such applications
• Recently, there has been an increased emphasis on analysis and
exploration of data
CA306 Data Warehouses
4-2
Data Warehouse Introduction (2)
• The key functionality of data warehouses is data analysis. They
provide
+ storage of data
+ query functionality
for queries and analyses beyond the capabilities of traditional transactionoriented databases (OLTP)
• When these queries and analyses become recurrent and
predictable, then the development of special read-only databases
with efficient data access becomes sensible.
• These data warehouses are mainly used as decision support
systems in organisations.
CA306 Data Warehouses
4-3
Terminology and Definitions
• Data warehouses provide access to data for
+ complex analysis
+ knowledge discovery
+ decision making
• Types of applications:
+ OLAP (on-line analytical processing) - extraction, processing, and presentation
of complex data for analytic and decision-making purposes.
+ DSS (decision support systems) - support for an organisation’s decision makers
with higher-level data.
+ data mining - knowledge discovery, search for unexpected knowledge.
CA306 Data Warehouses
4-4
Characteristics of Data Warehouses
• Data sources: Data warehouses usually store integrated data from
multiple sources - multidatabases on the other hand provide
uniform access to disjoint and heterogeneous databases.
• Time and historical data: Data warehouses support time-series and
trend analysis - transactional databases usually do not offer specific
support for historical data.
• Updates: Data warehouses are updated incrementally - in
transactional databases single transactions are the unit of change.
• A data warehouse is a collection of decision support technologies,
aimed at enabling the knowledge worker to make better and faster
decisions. (Chaudhuri and Dayal, in ACM SIGMOD Record 26(1),
1997)
CA306 Data Warehouses
4-5
Characteristics of Data Warehouses
•
•
•
•
•
•
Multi-dimensional conceptual view
generic dimensionality
unlimited dimensions and
aggregation levels
unrestricted cross-dimensional
operations
dynamic sparse matrix handling
client-server architecture
•
•
•
•
•
•
Multi-user support
accessibility
transparency
intuitive data manipulation
Consistent reporting performance
Flexible reporting
after E. Codd et.al. (1993): Providing
OLAP (on-line analytical processing)
to user-analysts: An IT mandate.
White paper, 1993.
see http://www.hyperion.com/
CA306 Data Warehouses
4-6
Data Analysis
• Inadequacies of SQL for data analysis:
+ queries with extensive use of AND and in particular OR do not perform well in
some DBMS implementations of SQL
+ extensive use of statistical functions is not well supported
+ time series analysis (e.g. aggregating values over time) is poorly supported
• Systems/tools that support these specific forms of queries are
categorised as OLAP-tools
• Data mining tools and other forms of exploratory data analysis are
concerned with finding trends or patterns, rather than complex
queries.
CA306 Data Warehouses
4-7
Summary
• Data Warehousing: A subject-oriented, integrated, time-variant,
and non-volatile collection of data in support of management’s
decision-making process.
• Subject-Oriented: The warehouse is organised around the major
subjects of the enterprise (customer, products, sales) rather than
the major application areas (invoicing, stock control). This is as a
result of the need to store decision-support data rather than
application-oriented data.
• Integrated: The integration of source data from different
enterprise-wide application systems. As source data will reside in
different formats, it must be represented in a canonical formal for
users.
CA306 Data Warehouses
4-8
• Time-Variant: Data in the warehouse in only accurate and valid at
some point in time, or over a specified time-interval. The timeinvariance of the D.W. is due to the implicit or explicit association of
time with all data, and the fact that data represents a series of
snapshots.
• Non-Volatile: Data is not updated in real-time but is refreshed
from operational systems on a regular basis. New data is always
added as a supplement to the database (not a replacement). The
D.W. continually absorbs new data, integrating it with previous
data.
CA306 Data Warehouses
4-9
Sections Covered
 4.1 Data Warehouse Introduction
• 4.2 Conceptual Structure & Architecture
• 4.3 Data Modelling for Data Warehouses
• 4.4 Functionality of a Data Warehouse
• 4.5 Views and Decision Support
• 4.6 Building a Data Warehouse
• 4.7 Implementation Techniques
CA306 Data Warehouses
4-10
4.2 Conceptual Structure and Architecture
of a Data Warehouse
• The overall process of data warehousing.
+ data acquisition: populating and updating the data warehouse
+ analysis of warehouse data
+ feedback of analysis results
CA306 Data Warehouses
4-11
EXTERNAL DATA SOURCES
Data Warehouse
Architecture
• External data sources provide
input for data warehouse
• The data warehouse stores
data and metadata
• Data warehouses underlie data
analyses such as
+ OLAP
+ data mining
EXTRACT
TRANSFORM
LOAD
REFRESH
Metadata
Repository
DATA
WAREHOUSE
SUPPORTS
CA306 Data Warehouses
DATA
MINING
OLAP
4-12
OLTP v Data Warehousing
• A DBMS built for OnLine Transaction Processing (OLTP) is generally
regarded as unsuitable for data mining, as both have different
requirements.
• OLTP are designed to maximise the transaction processing capacity,
while data warehouses are designed to support ad hoc query
processing.
• Organisations have a large number of OLTP systems which are
detailed, current, and subject to change. These systems are
optimised for a high volume number of transactions, that are
predictable, repetitive and update intensive.
• An organisation will have a single data warehouse, with data that is
historical, detailed, summarised to various levels, and rarely subject
to change. It supports a relatively low number of transactions that
are unpredictable in nature, unstructured, and heuristic.
CA306 Data Warehouses
4-13
Comparison
OLTP SYSTEMS
DATA WAREHOUSING SYSTEMS
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Holds Current Data
Stores Detailed Data
Data is dynamic
Repetitive Processing
High volume of transactions
Predictable pattern of usage
Transaction-driven
Application-oriented
Supports day-to-day decisions
Serves a large number of users
CA306 Data Warehouses
Holds historical data
Stores detailed and summarised data
Data is generally static
Ad hoc, unstructured, heuristic proc.
Medium to low volume of transactions
Unpredictable pattern of usage
Analysis Driver
Subject-oriented
Supports strategic decisions
Serves low number of managerial users
4-14
Sections Covered
 4.1 Data Warehouse Introduction
 4.2 Conceptual Structure & Architecture
• 4.3 Data Modelling for Data Warehouses
• 4.4 Functionality of a Data Warehouse
• 4.5 Views and Decision Support
• 4.6 Building a Data Warehouse
• 4.7 Implementation Techniques
CA306 Data Warehouses
4-15
4.3 Data Modelling for Data Warehouses
• The most typical OLAP-queries are complex ad-hoc queries.
• In SQL-terms these are queries with group-by and aggregation
operators.
• A multidimensional data model is the most natural model for OLAP.
• Multidimensional modelling can enhance query performance.
• Principle:
+ In the multidimensional data model, data is described as a collection of
numeric measures.
+ Each measure depends on a set of dimensions
CA306 Data Warehouses
4-16
Data Warehouses - Data Modelling
• Example: sales revenues in a corporate data warehouse is a
measure that depends on several dimensions (e.g. product, region,
fiscal period, etc)
+ two-dimensional: a two-dimensional matrix based on product and region
CA306 Data Warehouses
4-17
3D Modelling
+ three-dimensional: a three-dimensional matrix (data cube) based on product,
region, and fiscal period
+ for a given product, region and period there is at most one sales value
+ higher dimensions: hypercubes …difficult to present graphically
• Not all data lends itself to be represented in a multidimensional
format.
How did product P125 do in
The 3rd quarter for region 2 ?
CA306 Data Warehouses
4-18
Storage of Multidimensional Models
• The storage of multidimensional models involves two types of
tables: dimension tables and fact tables.
• A dimension table is a relation that contains tuples of attributes of a
dimension.
+ example: a product dimension table with tuples containing values for
ProductNo, ProductName, etc.
• A fact table is where each tuple in the table contains a fact, which is
collection of measured/observed variables. It represents a multidimensional array as a relation - relating one value for each
dimension with the measured value for the input dimensions. It
might also contain pointers to dimension tables.
+ example: a SALES fact table with attributes regionID, productID, periodID,
and sales.
CA306 Data Warehouses
4-19
Dimension and Fact Tables
Dimension table:
Fact table:
product ID
weight
11
12
13
22
20
15
13
8
10
10
11 12
productID
Data cube:
30
20
50
25
8
15
1
2
3
CA306 Data Warehouses
periodID
regionID
(ID = 1
displayed)
product
ID
11
11
11
12
12
12
13
13
13
11
period
ID
1
2
3
1
2
3
1
2
3
1
region
ID
1
1
1
1
1
1
1
1
1
2
sales
25
8
15
30
20
50
8
10
10
35
4-20
Dimension Hierarchies
• For each dimension, the set of values can be organized in a
hierarchy:
PRODUCT
PERIOD
REGION
year
quarter
category
pname
CA306 Data Warehouses
week
date
month
country
state
city
4-21
Multidimensional Schemas: Star
• Star Schema: one fact table with a single table for each dimension.
The fact table is the central entity; it represents the facts on which
the analysis is based
• Example: a central sales-table with dimension tables product,
region and period.
CA306 Data Warehouses
4-22
Multidimensional Schemas: Snowflake
• Snowflake Schema: special form of a star schema, where dimension
tables are organised into a hierarchy (normalisation).
• Example: a period attribute is structured into beginDate and
endDate.
CA306 Data Warehouses
4-23
Star Schema
PERIOD
periodID date week month quarter year holiday_flag
productID periodID regionID
sales
PRODUCTS
productID pname category price
(Fact table)
SALES
REGION
regionID
city state country
• Fact table in BCNF; dimension tables not normalized.
+ Dimension tables are small; updates/inserts/deletes are rare. So, anomalies
less important than good query performance.
• This kind of schema is very common in OLAP applications, and is
called a star schema.
• Computing the join of all these relations is called a star join.
CA306 Data Warehouses
4-24
Storage Strategies
• In OLAP applications, data can be represented in multidimensional
arrays (similar to programming languages)
• These OLAP systems are called multidimensional OLAP (MOLAP)
systems
• Other OLAP systems represent data in the form of fact and
dimension tables, i.e. as relations.
• These OLAP systems are called relational OLAP (ROLAP) systems
CA306 Data Warehouses
4-25
Sections Covered
 4.1 Data Warehouse Introduction
 4.2 Conceptual Structure & Architecture
 4.3 Data Modelling for Data Warehouses
• 4.4 Functionality of a Data Warehouse
• 4.5 Views and Decision Support
• 4.6 Building a Data Warehouse
• 4.7 Implementation Techniques
CA306 Data Warehouses
4-26
4.4
•
•
•
•
•
Functionality of Data Warehouses
Queries, Aggregation
Pivoting
Comparison with SQL Queries
Hierarchical Views
Cube Operator
CA306 Data Warehouses
4-27
Functionality
• Typical functionality of data warehouses is to facilitate complex,
data-intensive and frequent ad-hoc queries
+
+
+
+
+
enhanced spreadsheet functionality
efficient query processing
structured queries
ad-hoc queries
data mining
CA306 Data Warehouses
4-28
OLAP Queries
• Operations supported by OLAP systems are influenced by
spreadsheet programs. Some examples on how multidimensional
data can be queried:
+ find the total sales (for each city, for each state)
+ find the top five products ranked by total sales
The first query can be expressed using SQL (query fact and
dimension tables), the second can’t (only a complete ordered list
can be produced).
• Aggregation is an important construct: measures on one or more
dimensions can be aggregated.
• Example: total sales by city depends only on region, whereas the
original sales measure depends on product, region and period.
CA306 Data Warehouses
4-29
Pre-programmed Functionality
• Pre-programmed functionality:
+ roll-up - data is summarised with increasing generalisation (higher level of
abstraction)
+ drill-down - increasing levels of data details are revealed
+ pivoting - cross-tabulation (rotation)
+ slice and dice - perform projection operation on the dimensions
+ sorting - data is sorted by ordinal value
+ selection - data is available by value or range
+ derived attributes - attribute values are computed
• Users:
+ knowledge workers
+ decision makers
CA306 Data Warehouses
4-30
Pivoting
• Data can be queried in any combination of dimensions.
• Tools exist for viewing data according to the user’s choice of
dimensions.
• Given a fact table - such as the sales table - the user can pivot on
specific dimensions
• Pivoting is also called rotation.
WI CA Total
(see next slide & compare with 4.18)
• Pivoting can be combined with aggregation 1995 63 81 144
on selected dimensions.
+ Example:
Pivoting on Region and Period
yields a cross-tabulation
1996
38 107 145
1997
75
35 110
Total 176 223 339
CA306 Data Warehouses
4-31
CA306 Data Warehouses
4-32
Comparison with SQL Queries
• The cross-tabulation obtained by pivoting can also be computed
using a collection of SQL-queries:
SELECT SUM(S.sales)
FROM Sales S, Period P, Region R
WHERE S.periodID=P.periodID AND S.regionID=R.regionID
GROUP BY P.year, R.state
SELECT SUM(S.sales)
FROM Sales S, Period P
WHERE S.periodID=P.periodID
GROUP BY P.year
CA306 Data Warehouses
SELECT SUM(S.sales)
FROM Sales S, Region R
WHERE S.regionID=R.regionID
GROUP BY R.state
4-33
Hierarchical Views - Roll-up
• Roll-up display: elements of one dimension are grouped into
classes/categories, e.g. creating product groups or grouping weeks
into months.
Move from individual products to (the coarser grained of) product categories.
CA306 Data Warehouses
4-34
Hierarchical Views - Drill-Down
• Drill-down display: an element of one dimension is split up into
several elements, e.g. a region is split into sub-regions.
The drill-down operation offers a finer-grained view: dis-aggregating country
sales by region, and subsequently by sub-region, and breaking products into
styles.
CA306 Data Warehouses
4-35
The CUBE Operator
• Generalizing the previous example, if there are k dimensions, we
have 2k possible SQL GROUP BY queries that can be generated
through pivoting on a subset of dimensions.
• CUBE productID, regionID, periodID BY SUM Sales
+ Equivalent to rolling up Sales on all eight subsets of the set {productID,
regionID, periodID}
+ each roll-up corresponds to an SQL query of the form:
SELECT SUM(S.sales)
FROM Sales S
GROUP BY grouping-list
CA306 Data Warehouses
4-36
Grouping List
• The eight SQL-queries only differ on the grouping list (GROUP BY)
• They can be arranged in form of a lattice (subset-based)
(productID, periodID, regionID)
(productID, periodID)
(productID)
CA306 Data Warehouses
(productID, regionID)
(periodID)
()
(periodID, regionID)
(regionID)
4-37
Sections Covered
 4.1 Data Warehouse Introduction
 4.2 Conceptual Structure & Architecture
 4.3 Data Modelling for Data Warehouses
 4.4 Functionality of a Data Warehouse
• 4.5 Views and Decision Support
• 4.6 Building a Data Warehouse
• 4.7 Implementation Techniques
CA306 Data Warehouses
4-38
4.5 Views and Decision Support
• OLAP queries are typically aggregate queries.
+ Precomputation is essential for reduced interactive response times.
+ The CUBE is in fact a collection of aggregate queries, and precomputation is
especially important: what is best to precompute given a limited amount of
space to store precomputed results?
+ The idea is to choose a subset of aggregate queries for (view) materialisation
(precomputation)
• Warehouses can be thought of as a collection of asynchronously
replicated tables and periodically maintained materialised views.
CA306 Data Warehouses
4-39
Query Modification (Evaluate On Demand)
• View: sales of products by category and state
CREATE VIEW RegionalSales(category,sales,state)
AS SELECT P.category, S.sales, R.state
FROM Product P, Sales S, Region R
WHERE P.productID=S.productID AND S.regionID=R.regionID
• and query: total sales for each category by state
SELECT RS.category, RS.state, SUM(RS.sales)
FROM RegionalSales AS RS GROUP BY RS.category, RS.state
• Query modification: replace occurrence of RegionalSales by its definition
SELECT RS.category, RS.state, SUM(RS.sales)
FROM (SELECT P.category, S.sales, R.state
FROM Product P, Sales S, Region R
WHERE P.productID=S.productID AND S.regionID=R.regionID) AS RS
GROUP BY RS.category, RS.state
CA306 Data Warehouses
4-40
View Materialisation (Precomputation)
• Computing on demand can cause unacceptable response times in
case of complex view definitions.
• Suppose we precompute the view RegionalSales and store it with a
clustered B+ tree index on search key [category,state,sales].
• Then, the previous query can be answered by an index-only scan.
• The disadvantage is that these views have to be maintained.
SELECT RS.state, SUM(RS.sales)
FROM RegionalSales RS
WHERE RS.category=“Laptop”
GROUP BY RS.state
Index on precomputed view
is useful: locate first index leaf entry
for “Laptop”, then scan leaf level
CA306 Data Warehouses
SELECT RS.state, SUM(RS.sales)
FROM RegionalSales RS
WHERE RS. state=“Wisconsin”
GROUP BY RS.category
Index is less useful: must
scan entire leaf level.
4-41
Issues in View Materialisation
• What views should we materialise, and what indexes should we
build on the pre-computed results? The previous example illustrates
this problem.
+ A small set of views should be chosen that support the most important queries.
• Given a query and a set of materialised views, can we use the
materialised views to answer the query?
+ It needs to be made sure that the materialised views are consistent with
underlying tables
• How frequently should we refresh materialised views to make them
consistent with the underlying tables? (And how can we do this
incrementally?)
+ A view maintenance policy determines the refreshment approach (e.g.
immediate vs. deferred)
CA306 Data Warehouses
4-42
Interactive Queries: Beyond Materialisation
• Top N Queries: If you want to find the 10 (or so) cheapest cars, it
would be nice if the DB could avoid computing the costs of all cars
before sorting to determine the 10 cheapest.
• The user wants to see answers quickly (compare Internet queries)
• Idea: Guess at a cost c such that the 10 cheapest all cost less than
c, and that not too many more cost less - guess is possible if
distribution of sales values is known. Then add the selection cost<c
and evaluate the query.
+ If the guess is right, we avoid computation for cars that cost more than c.
+ If the guess is wrong, we need to reset the selection and recompute the
original query.
CA306 Data Warehouses
4-43
Top N Queries
SELECT P.productID, P.productName, S.sales
FROM Sales S, Products P
WHERE S.productID=P.productID AND S.regionID=1 AND S.periodID=3
ORDER BY S.sales DESC
OPTIMIZE FOR 10 ROWS
• OPTIMIZE FOR construct is not in SQL:1999
SELECT P.productID, P.productName, S.sales
FROM Sales S, Products P
WHERE S.productID=P.productID AND S.regionID=1 AND
S.periodID=3
AND S.sales > c
ORDER BY S.sales DESC
• Cut-off value c is chosen by optimizer.
CA306 Data Warehouses
4-44
Interactive Queries: Online Aggregation
• Online Aggregation: Consider an aggregate query, e.g. finding
the average sales by state. Can we provide the user with some
information before the exact average is computed for all states?
+ Can show the current “running average” for each state as the computation
proceeds.
+ Even better, if we use statistical techniques and sample tuples to aggregate
instead of simply scanning the aggregated table, we can provide bounds such as
“the average for Wisconsin is 2000102 with 95% probability.
• Non-blocking algorithms should be used for implementation - an
algorithm is said to block if it does not produce output tuples until it
has consumed all input tuples.
CA306 Data Warehouses
4-45
Sections Covered
 4.1 Data Warehouse Introduction
 4.2 Conceptual Structure & Architecture
 4.3 Data Modelling for Data Warehouses
 4.4 Functionality of a Data Warehouse
 4.5 Views and Decision Support
• 4.6 Building a Data Warehouse
• 4.7 Implementation Techniques
CA306 Data Warehouses
4-46
4.6 Building a Data Warehouse
• Constructing a data warehouse
+ analysis of requirements
+ design of data model
+ acquisition of data
• Design criteria:
+ an appropriate schema should reflect the anticipated usage - might change
from organisation to organisation
+ the design of a data warehouse should specifically support ad-hoc, complex
querying (OLAP) of the fact and dimension tables
• Not all possible queries and analyses can be anticipated.
CA306 Data Warehouses
4-47
Data Acquisition
• Acquisition of data:
+ Sources: data is usually extracted from multiple, heterogeneous sources databases and other data sources
+ Consistency: data must be formatted for consistency - reconciliation of names,
meanings and domains
+ Validity: data must be cleaned to ensure validity - identification of erroneous
and incomplete data
+ Data model: data must be fitted (transformed) into the data model of the
data warehouse - conversion from relational/object-oriented/legacy data
model into multidimensional data model
+ Loading: data must be loaded into the warehouse - the huge volume of data can
cause problems; a refresh policy is needed
CA306 Data Warehouses
4-48
Requirements
• How up-to-date must the warehouse data be?
+
+
+
+
+
Can the warehouse go offline for loading? For how long?
What is the loading time (including cleaning, formatting, etc.)?
What are the data interdependencies?
What is the storage availability?
What are the distribution requirements (consider replication and
partitioning)?
CA306 Data Warehouses
4-49
Building Process
• Data storage in a warehouse involves the following processes:
+
+
+
+
+
+
+
storing the data according to the data model of the warehouse
creating and maintaining the required data storage structures
creating and maintaining appropriate access paths
providing for time-variant data as new data are added
supporting the updating of warehouse data
refreshing the data periodically
purging the data (perhaps onto archival media)
CA306 Data Warehouses
4-50
Specific Architectures (1)
• Metadata repository
+ includes both technical and business metadata
• technical: details of storage structures, acquisition processing, etc
• business: business rules, organisational details that support the warehouse
+ due to its size metadata might be stored in a separate database
+ information that is stored in addition to table descriptions includes
• the sources of each data warehouse table
• the last time the warehouse data was refreshed
CA306 Data Warehouses
4-51
Specific Architectures (2)
• Distributed architectures:
+ distributed warehouse:
• problems: replication, partitioning, communications, consistency
• benefits: load balancing, scalability of performance, higher availability
+ federated warehouse:
• decentralised confederation of autonomous data warehouses
• each with its own metadata repository
• Note that the problem of refreshing data warehouse tables and
asynchronously maintaining replicas of tables in a distributed DBMS
is similar.
CA306 Data Warehouses
4-52
Operational Difficulties
• Operational issues: construction, administration and quality control
• Construction:
+ project management (design, construction, implementation) is important
• Administration:
+ even though a read-only system, a data warehouse is not static and evolves
over time
• Quality control:
+ quality and consistency must remain after acquisition and refreshing
CA306 Data Warehouses
4-53
Sections Covered
 4.1 Data Warehouse Introduction
 4.2 Conceptual Structure & Architecture
 4.3 Data Modelling for Data Warehouses
 4.4 Functionality of a Data Warehouse
 4.5 Views and Decision Support
 4.6 Building a Data Warehouse
• 4.7 Implementation Techniques
CA306 Data Warehouses
4-54
4.7 Implementation Techniques
• Response time in data warehouses is essential
• Maintenance costs of indexes can be neglected/forgotten
• Indexing:
+ B+ trees from the basis of supporting the execution of complex queries
+ Other forms of support (e.g. in systems such as Oracle 8) include
• bitmap indexes
• join indexes
CA306 Data Warehouses
4-55
Indexing (1)
• Bitmap indexing
+ constructs a bit vector for each
value in a domain (column) being
indexed
+ 1-bit placed in the j-th position in
the vector if the j-th row
contains the value
+ example: car size classifications
(compact, midsize, etc) as vectors
CA306 Data Warehouses
CAR-table
compact-Vector
0
1
0
1
Name
Golf
Polo
Astra
Corsa
Size
midsize
compact
midsize
compact
midsize-Vector
1
0
1
0
4-56
Indexing (2)
• Join indexing
+ the efficiency of executing join queries can be improved by index structures
+ a join index is a traditional index relating primary and foreign key values - a
fact table is likely to be joined with one or more dimension tables
+ a value of a dimension is related to a row in the fact table
+ example: a sales table with dimensions city and period
• support queries involving a city: a join index on city maintains the list of tuple
ids t1, .. , tn of the fact table that contain the city c
=> index contains
<c, t1>, ... , <c, tn>
• support queries involving a city c and a period p
=> the index contains
<c,p, t1>, ... , <c,p, tn>
CA306 Data Warehouses
4-57