Insert Title Here

Download Report

Transcript Insert Title Here

The Relational Data Model and SQL

10:30AM—12noon Monday, July 18 th , 2005 CSIG05 Chaitan Baru 1

OUTLINE

• Foundations of the relational data model – Data models and database systems – Relations, attributes, keys • Introduction to SQL • “Hands-on” with SQL – Vishwanath Nandigam 2

Historical note: IBM’s role in database systems

• IBM’s early database systems were based on the hierarchical data model – IMS (Information Management System). – IMS serves more than 95 percent of Fortune 1000 companies – Manages 15 petabytes of production data – Supports more than two hundred million users per day S1 S2 Student C1 C2 C2 C3 C4 Course Instructor I1 I2 I2 I3 I1 3

IBM and Relational Database Systems

• Relational model was introduced to provide separation of application logic from the data representations • DB2 –  IBM’s second database product!

• Foundations of relational model were invented at IBM by E.F.Codd, IBM A Relational Model of Data for Large Shared Data Banks ,

E. F. Codd, June 1970

• First prototype, System R, was developed at IBM Almaden in mid-70’s – Introduced SQL – Provided as SQL/DS on IBM mainframe systems – Oracle was based on early System R work 4

Why the Relational Model?

• Deals with the application logic / data separation in business data processing, unlike the earlier “network” and “hierarchical” data model • Plus, an algebra for manipulating relations  the key innovation Student

N

takes

M

Course

N

teaches

1

Instructor S# student info C# course info S# C# 5

What is a Database System?

Database

(system) = – –

Database Instance

(e.g. set of tables of rows)

Database Management System

(DBMS) • • Origins in the commercial world: – to organize, query, and manipulate data more effectively, efficiently, and independently

Scientific databases

– often special features: • spatial, temporal, spatiotemporal, GIS, units, uncertainty, raw & derived data, … 6

Why not just use files as “databases”?

• Works for some applications…

But

: – scanning & ‘grep’ing large files can be

very inefficient

– no language support for selecting desired data, joining them, etc.

• cannot express the kinds of questions/queries you’d like to ask • ‘grep’ is no substitute for a query language – redundant and/or inconsistent storage of data – no transaction management and concurrency control among multiple users – no security – no recovery – no data independence (application  – no data modeling support – … data) 7

Features of a Database System

• A

data model

(

relational

, object-oriented, prescribes how data can be organized:

XML

) – as relations (tables) of tuples (rows) – as classes of (linked) objects – as XML trees • A

(database) schema

(stored in the “data dictionary”) defines the structure of a specific database instance: – Relational schema – OO schema – XML Schema (or XML DTD) • A

query language

– Allows

ad hoc

, declarative (non-procedural) queries on schema 8

Features of a Database System

• • • • • Data is treated

uniformly

application

Efficient

data access

Queries

and

Transactions views Integrity constraints

and

separately

from the are expressed over the schema (checking and enforcement) combine sets of operations into logical units (

all-or-nothing

) • •

Synchronization

of concurrent user transactions

Recovery

(after system crash) – not to be confused w/ backup – instead: guarantee consistency by “roll-back” of partially executed transactions (how? Hint: logging) • … 9

DB features: E.g. Concurrency Control

• • Concurrent execution of simultaneous requests – long before web servers where around... – transaction management guarantees consistency despite concurrent/interleaved execution

Transaction

(= sequence of read/write operations) – –

Atomicity

: a transaction is executed completely or not at all

Consistency

: a transaction creates a new consistent DB state, i.e., in which all integrity constraints are maintained – –

Isolation

: to the user, a transaction seems to run in isolation

Durability

: the effect of a successful (“committed”) transaction remains even after system failure 10

11

Levels of Abstraction: Architecture Overview

User Conceptual … View 1 View 2 View n

Export schemas

Level

logical data independence ER-Model (Entity-Relationship) OO Models (Classes…)

part of DB design 

conceptual design

Logical (“conceptual”) level

Tables physical data independence

Physical level

Index structures

… often lost in the process… DB instances

Database Design: Entity Relationship (ER) Model

Name Salary Name Manager

12 Employee

works-for

Department • Entities: • Relationships: • Attributes: • ER Model:

since

– initial, high-level DB design (conceptual model) – easy to map to a relational schema (database tables) – comes with more

constraints extensions

: EER (

is-a

(cardinalities, aggregation) and => class hierarchies) – related: UML (Unified Modeling Language) class diagrams

Employee

The Relational Model

Emp

tom tim sally carol carol ….

Salary DNo

60k 57k 45k 30k 1 1 3 1 35k 2

FK: foreign key, pointing to another key

Department

DNo

1 2 3

Name Mgr

Toys carol Comp. carol Shoes sam • Relation/Table Name: –

employee, dept

Attributes = Column Names: –

Emp, Salary, DeptNo, Name, Mgr

• Relational Schema: – employee(Emp:string, Salary:integer, DeptNo:integer), ... • Tuple = Row of the table: – (“tom”, “60000”, “1”) • Relation = Set of tuples: – {(...), (...), ...} 13

14

Creating a Relational Database in SQL

CREATE TABLE employee ( ssn CHAR(11), name VARCHAR(30), deptNo INTEGER, PRIMARY KEY (ssn), FOREIGN KEY (deptNo) REFERENCES department ) CREATE TABLE

department

( deptNo name INTEGER, VARCHAR(20), manager CHAR(11), PRIMARY KEY (deptNo), FOREIGN KEY (manager) REFERENCES employee(ssn) )

What is a Query?

• Intuitively: – An

“executable question”

in terms of a database schema – Evaluating a query Q against a database instance D yields a set of answer objects: • Relational tuples or XML elements • Example: – Who are the employees in the ‘Toys’ dept.?

– Who is (are) the manager(s) of ‘Tom’?

– Show all pairs (Employee, Mgr) • Technically: – A

mapping

from an

input schema

(the given table schemas) to a

result schema

(the new columns you are interested in) defined in some

query language

15

Why (Declarative) Query Languages?

• Things we talk and think about in PLs and QLs – Assembly languages: • registers, memory locations, jumps, ... – C and the likes: • if-then-else, for, while, memory (de-)allocation, pointers, ...

– Object-oriented languages: • C++: C plus objects, methods, classes, ...

• Java: objects, methods, classes, references, ... • Smalltalk: objects, objects, objects, ... • OQL: object-query language 16

Why (Declarative) Query Languages?

• Things we talk and think about in PLs and QLs – Functional languages (Haskell, ML): • (higher-order) functions, fold(l|r), recursion, patterns, ...

=> Relational languages (SQL, Datalog) • relations (tables), tuples (rows); conceptual level: ER • relational operations:  ,  ,  ,  , ...,  ,  ,  ,  ,  ,...,  ,  ,

|X|

=> Semistructured/XML (Tree) & Graph Query Languages • trees, graphs, nodes, edges, children nodes, siblings, … • XPath, XQuery, … • Also: Focus on

what

, and not

how

!

17

Example: Querying a Relational

input tables

Employee

Database

Emp

anne john

Salary

62k 60k

DeptNo

2 1 Department

DeptNo

Mgr

1 anne 2 anne

SQL query (or view def.) answer (or view)

join

SELECT

e .Emp

, d .

Mgr

FROM Employee e, Department d WHERE e .

DeptNo

= d .

DeptNo

result

Emp

john anne

Mgr

anne anne

we don’t say how to evaluate this expression

18

Example Query: SQL vs DATALOG

• “List all employees and their managers” • In SQL: SELECT e.name, d.manager

FROM Employee e, Department d WHERE e.deptNo = d.deptNo

• In DATALOG:

a “join” operation

q(E, M) :- employee(E, S, D), department(D, N, M).

19

Important Relational Operations

• • • • • •

select

(R, Condition) – filter

rows

of a table wrt. a condition

project

(R, Attr) – remove unwanted

columns

; keep rest

join

(R1, A2, R2, A2, Condition) – find “matches” in a “related” table – e.g. match R1.

foreign key

= R2.

primary key

cartesian product

(R1, R2)

union

(“OR”),

intersection

(“AND”)

set-difference

(“NOT IN”) 20

Queries, Views, Integrity Constraints

• … can all be seen as “special queries” 21 • • •

Query

q(…) :- … ad-hoc queries

View

v(…) :- … exported views;

Integrity Constraints

– ic (…) :- …. MgrSal < EmpSal … – say what shouldn’t happen – if it does: alert the user (or refuse an update, …)

Query Evaluation vs Reasoning

Query evaluation

– Given a database instance D and a query Q, run Q(D) – What databases do all the time •

Reasoning

(aka “Semantic Query Optimization”) – Given a query Q and a constraint C, “optimize” Q&C (e.g., given C, Q might be

unsatisfiable

) – Given Q1 and Q2 decide whether Q1  Q2 – Given Q1,Q2, C decide whether Q1  Q2 | C – Note: we are NOT given a database instance D here; just the schema and the query/IC expressions 22

Summary QLs for Relational Databases

23

Natural Join: same attribute name

add condition that values must match

Relational Algebra

24

Relational Algebra

25

Relational Algebra

26

Relational Algebra

27

Relational Algebra

28