Stream Processing in Emerging Distributed Applications

Download Report

Transcript Stream Processing in Emerging Distributed Applications

Overview of Database Systems
Yanlei Diao
University of Massachusetts Amherst
Outline
• An outside look: DB Application
• An inside look: Anatomy of DBMS
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
An Outside Look: DB Application
High-level, declarative interface
DBMS
• Persistent storage
• Performance
• Concurrency
• Automatic recovery
• Security…
Database Management System (DBMS): a software package
designed to store and manage a large amount of data
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Case Study: The Internet Shop*
• DBDudes Inc.: a well-known database consulting
firm
• Barns and Nobble (B&N): a large bookstore
specializing in books on horse racing
• B&N decides to go online, asks DBDudes to help
with the database design and implementation
• Step 0: DBDudes makes B&N agree to
– pay steep fees and
– schedule a lunch meeting for requirements analysis
* The example and all related material was taken from “Database Management Systems” Edition 3.
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Step 1: Requirements Analysis
• “I’d like my customers to be able to browse my catalog of
books and place orders online.”
– Books:
• For each book, B&N’s catalog contains its ISBN number, title, author,
price, year of publication, …
– Customers:
• Most customers are regulars with names and addresses registered
with B&N.
• New customers must first call and establish an account.
– On the new website:
• Customers identify themselves before browsing and ordering.
• Each order contains the ISBN of a book and a quantity.
– Shipping:
• For each order, B&N ships all copies of a book together once they
become available.
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Step 2: Conceptual Design
• A high level description of the data in terms of the EntityRelationship (ER) model.
ordernum
author
qty_in_stock
title
price
isbn
order_date
qty
cardnum
Year
Books
cname
cid
address
ship_date
Orders
Customers
• Design review:
– What if a customer places two orders of the same book in one day?
– Modification: add “ordernum” to Orders.
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Step 3: Logical Design
• Mapping the ER diagram to the relational model
CREATE TABLE Books
(isbn
CHAR(10),
title
CHAR(80),
author CHAR(80),
qty_in_stock INTEGER,
price
REAL,
year
INTEGER,
PRIMARY KEY(isbn))
CREATE TABLE Customers
(cid
INTEGER,
cname CHAR(80),
address CHAR(200),
PRIMARY KEY(cid))
CREATE TABLE Orders
(ordernum INTEGER,
isbn
CHAR(10),
cid
INTEGER,
cardnum CHAR(16),
qty
INTEGER,
order_date DATE,
ship_date DATE,
PRIMARY KEY(ordernum, isbn),
FOREIGN KEY (isbn) CREATE VIEW OrderInfo
REFERENCES Books,
(isbn, cid, qty, order_date, ship_date)
FOREIGN KEY (cid) AS SELECT O.isbn, O.cid, O.qty,
REFERENCES Customers)
O.order_date, O.ship_date
FROM
Orders O
• Access control: use views to restrict the access of
certain employees to customer sensitive information
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Step 4: Schema Refinement
Orders
ordernum
isbn
cid
cardnum
qty
order_date
ship_date
120
0-07-11
123
40241160
2
Jan 3, 2006
Jan 6, 2006
120
1-12-23
123
40241160
1
Jan 3, 2006
Jan 11, 2006
120
0-07-24
123
40241160
3
Jan 3, 2006
Jan 26, 2006
Redundant Storage!
Orders
ordernum cid cardnum order_date
120
123
40241160 Jan 3, 2006
Yanlei Diao, University of Massachusetts Amherst
Orderlists
ordernum isbn
qty ship_date
120
0-07-11
2
Jan 6, 2006
120
1-12-23
1
Jan 11, 2006
120
0-07-24
3
Jan 26, 2006
7/20/2015
Step 5: Internet Application Development
Presentation tier
• Interface to the user
• Adapt to display devices
Client Program
(Web Browser)
HTML,
Javascript,
Cookies
B&N Client:
• User input
• Session state
HTTP
Application logic tier
• Business logic (actions,
state between steps)
• Access multiple sources
Data management tier
• One/multiple DBMS(s)
JSP,
Servlets,
XSLT
Application Server
(Apache Tomcat…)
JDBC
Database System
(DB2, MySQL…)
Yanlei Diao, University of Massachusetts Amherst
XML,
stored
procedures
7/20/2015
B&N Business logic:
• Home page
• Login page
• Search page
• Cart page
• Confirm page
B&N Data:
• Books
• Customers
(User login)
• Orders
• Orderlists
An Example Internet Store
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Example SQL Queries
Search Page
Confirm Page
SELECT isbn, title, author, price
FROM
Books
WHERE isbn = '%<SearchString>%'
ORDER BY title
INSERT INTO Orders
(cid, cardnum, order_date)
VALUES
(<Cid>,<CreditCardNumber>,<OrderDate>)
Login Page
SELECT cid, username, password
FROM
Customers
WHERE username = '<SpecifiedUsername>'
Yanlei Diao, University of Massachusetts Amherst
SELECT ordernum
FROM Orders
WHERE CID = <Cid>
ORDER BY ordernum DESC
INSERT INTO Orderlists
(ordernum, isbn, qty)
VALUES
(<OrderNumber>,<ISBN>,<Quantity>)
7/20/2015
Step 6: Physical Design
• Good performance for typical workloads
• Auxiliary data structures (indices) to speed up searches
Books
Hash Index on Books.isbn
isbn
title
0-07-11
Legacies of
the Turf
1-12-23
author price year qty
Edward L.
Bowen
29.95 2003
10
Seattle Slew Dan Mearns
24.95 2000
0
0-07-24
Spectacular
Bid
Timothy
Capps
16.95 2001
3
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
Yanlei Diao, University of Massachusetts Amherst
isbn number
H1
1 2 3 … … … … … … N-1
7/20/2015
Outline
• An outside look: DB Application
• An inside look: Anatomy of DBMS
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
An Inside Look: Anatomy of DBMS
Query Parser
Query
Processor
Query Rewriter
DBMS
Query Optimizer
Query Executor
Lock Manager
Access Methods
Buffer Manager
Log Manager
Transactional
Storage Manager
Disk Space Manager
Disk
Manager
DB
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Query Processor
• Syntax checking
• Internal representation
• Handling views
• Logical/semantic rewriting
• Flattening subqueries
• Building a query execution plan
• Efficient, if not optimal
− Define plan space
SELECT C.cname, F.ordernum,
F.order_date
FROM
Customers C, OrderInfo F
WHERE C.cname = “John”
C.cid = F.cid
Query Parser
Query Rewriter
Query Optimizer
SELECT C.cname, O.ordernum,
O.order_date
FROM
Customers C, Orders O
WHERE C.cname = “John”
C.cid = O.cid
(On-the-fly)
C.cname,
O.ordernum,
O.order_date
− Cost estimation for each
− Search algorithm
• Pull-based execution of a plan
• Each operator is an Iterator:
init(), next()*, close()
Query Executor
(Indexed Join)
cid=cid
IndexScan
Customers
Yanlei Diao, University of Massachusetts Amherst
cname=“John”
7/20/2015
IndexScan
Orders
Transactional Storage Manager
(On-the-fly)
C.cname,
O.ordernum,
O.order_date
(Indexed Join)
cid=cid
IndexScan
Customers
cname=“John”
IndexScan
Orders
Access Methods
Lock Manager
Heap file, B+tree, Hash
Concurrency:
2PL
Buffer Manager
Log Manager
Recovery:
WAL
Replacement policy,
Support for Concurrency & Recovery
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Disk Manager
Buffer Manager
Allocate/Deallocate a page
Read/Write a page
Contiguous seq. of pages
Disk Space Manager
Database
Data
Indices
Yanlei Diao, University of Massachusetts Amherst
Log
Catalog
7/20/2015
Heap file
Page format
Record format
DBMS: Theory + Systems
Query Parser
Theory!
Query Rewriter
Query Optimizer
Query Executor
Lock Manager
Access Methods
Log Manager
Buffer Manager
Disk Space Manager
Systems!
DB
Yanlei Diao, University of Massachusetts Amherst
7/20/2015
Questions
Yanlei Diao, University of Massachusetts Amherst
7/20/2015