Introduction to Database System

Download Report

Transcript Introduction to Database System

The Relational Model

What DB models exist?

   Most widely current model: Relational Model  Vendors: IBM DB2, Microsoft, Oracle, Sybase, etc… “Legacy systems” have older models  E.g., IBM’s IMS (hierarchical), CODASYL network model Recent and future competitors:  object-oriented model: ObjectStore, Versant  Object-relational model: Oracle, DB2  Semi-structured: XML   Integrated in all major relational database systems Native XML database systems 421B: Database Systems - The Relational Model 2

Definitions

     Relational Database: a set of relations Relation: Consists of two parts:  Schema: specifies name of relation, plus a set of attributes, plus the domain/type of each attribute  E.g., Students(sid:

string

, name:

string

, login:

string

, faculty:string, gpa:

real

)  Instance: set of tuples (all tuples are distinct). A relation can be seen as a table:  Column headers = attribute names, rows = tuples/records, columns/fields = attribute values   #Rows = cardinality #Fields = degree / arity If clear from context, we say instead of “instance of a relation” simply “relation” Database Schema: collection of relation schemas 421B: Database Systems - The Relational Model 3

Example of Students Relation

sid name login faculty gpa 53666 Bartoli bartoli@cs Science 3.4

53688 Chang chang@eecs Eng 3.2

53650 Chang chang@math Science 3.8

...

Column headers = attributes    All rows are distinct (set-oriented) Rows are not ordered (a permutation of rows represents still the same table) Columns are per definition not ordered but in practice we often assume a fixed order  with this, a single tuple can be represented as (53666, Bartoli, bartoli@cs, Science, 3.4) 421B: Database Systems - The Relational Model 4

Relational DDL and DML

   Data Definition Language (DDL): defines the schema of a database Data Manipulation Language (DML): “manipulates” the data, i.e., the instances of the relations   Insert, update, delete tuples “Query” the relations: retrieve tuples that fulfill certain criteria (hence, often called “query language”) The Relational Model offers simple and powerful querying of data with precise semantics independent of how data is stored or whether changes in the physical structure are made (physical data independence) 421B: Database Systems - The Relational Model 5

The SQL Query Language

   Developed by IBM (system R) in the 1970s Need for a standard since it is used by many vendors Standards:  SQL-86    … SQL-99 / SQL3 (adds object-relational features) SQL:2003 (adds XML features) 421B: Database Systems - The Relational Model 6

SQL Data Types

    All attributes must have a data type.

SQL supports several basic data types Character and string types   CHAR(n) denotes a character string of fixed length (containing trailing blanks for padding if necessary).

VARCHAR(n) denotes a string of up to n characters (between 0 and n characters).  SQL permits reasonable coercion between values of character string types Integer Types   INT or INTEGER (names are synonyms) SHORTINT 421B: Database Systems - The Relational Model 7

Data Types (contd.)

    Floating point numbers    FLOAT or REAL (names are synonyms) DOUBLE PRECISION DECIMAL(n,d): real number with fixed decimal point. Value consists of n digits, with the decimal point d positions from the right.

Dates and time:    DATE: has the form ‘YYYY-MM-DD’ TIME: has the form ‘15:00:02’ or ‘15:00:02.5’ May be compared and converted to string types Bit strings User defined domains   New name for a data type Possibility to define restrictions on values of domain (< 10) 421B: Database Systems - The Relational Model 8

Data Definition: Table Creation

      Defines all attributes of the relation The type/domain of each attribute is specified DBMS enforce correct type whenever a tuple is added or modified SQL is case insensitive It is possible to define default values Special NULL value: ‘unknown’

CREATE TABLE Students (sid CHAR(20), name VARCHAR2(20), login CHAR(10), faculty VARCHAR(20), gpa REAL DEFAULT 0.0) CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))

421B: Database Systems - The Relational Model 9

DROP TABLE Students

 Destroys the relation Students.

The schema information and the tuples are deleted.

ALTER TABLE Students ADD COLUMN firstYear:integer

 The schema of students is altered by adding a new field; every existing tuple in the current instance is extended with a

null

value in the new field.

421B: Database Systems - The Relational Model 10

 Insert a single tuple using:

INSERT INTO Students (sid,name,login,faculty,gpa) VALUES(53688,’Chang’,’cheng@eecs’,’Eng’,3.2)

 

INSERT INTO Students (sid,name,login,faculty) VALUES(53688,’Chang’,NULL,’Eng’)

Can delete all tuples satisfying some condition

DELETE FROM Students WHERE name = ‘Chang’

Can update all tuples satisfying some condition

UPDATE Students SET gpa = 3.4

WHERE sid = 53688

421B: Database Systems - The Relational Model 11

Querying the Data

 than 3.5

SELECT name, gpa FROM Students WHERE gpa < 3.5

sid name login faculty gpa 53666 Bartoli bartoli@cs Science 3.4

53688 Chang chang@eecs Eng 3.2

53650 Chang chang@math Science 3.8

name gpa Bartoli 3.4

Chang 3.2

421B: Database Systems - The Relational Model 12

Integrity Constraints (ICs)

  Integrity Constraints must be true for any instance of the database;    e.g., domain constraints ICs are specified when schema is defined ICs are checked when relations are modified A legal instance of a relation is one that satisfies all specified ICs.

   DBMS should not allow illegal instances If DBMS checks ICs, stored data is more faithful to real-world meaning (also checks for entry errors) Of course, DBMS can only check what is specified in the schema 421B: Database Systems - The Relational Model 13

Primary Key Constraints

     A set of fields is a key for a relation if   No two distinct tuples can have same values in all key fields, and This is not true for any subset of the key.

If there are two or more keys, one of the candidates is chosen to be the primary key. The primary key attributes of a tuple may not be NULL.

A set of fields that contains a subset of fields fulfilling the key constraint is called a superkey E.g.

sid

is a key for

Students

. (What about

name

?). The set

(sid,gpa)

is a superkey 421B: Database Systems - The Relational Model 14

Primary and Candidate Keys in SQL

   Possibly many candidate keys exist, one of which is chosen as the primary key  Each student has a unique id.

CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name VARCHAR2(20),

For a given student and course, there is a single grade; further, no two students in a course receive the same grade

… CREATE TABLE Enrolled

Application dependent Defined carelessly, an IC can prevent the storage of database instances that arise in practice!

(sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), UNIQUE (cid,grade))

421B: Database Systems - The Relational Model 15

Foreign Key

  Foreign Key: Set of fields in one relation that is used to “refer” to a tuple in another relation.

  Must correspond to the primary key of the second relation. Represents a “logical pointer”.

Examples  in relation

Enrolled, sid

is a foreign key referring to Students:

Students(sid:CHAR(20),name:VARCHAR(20),login:CHAR(10), faculty:VARCHAR(20), gpa:REAL) Enrolled(sid:CHAR(20),cid:CAHR(20),grade:CHAR(2))

421B: Database Systems - The Relational Model 16

Referential Integrity

  Foreign Key Constraint: the foreign key value of a tuple must represent an existing tuple in the referred relation  Enrollment may only contain a tuple of a student who exists in the Students relation If all foreign key constraints are enforced, referential integrity is achieved, i.e., no dangling references 421B: Database Systems - The Relational Model 17

Foreign Keys in SQL

 Only students listed in the Students relation should be allowed to enroll for courses

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid) FOREIGN KEY (sid) REFERENCES Students)

sid name login faculty gpa 53666 Bartoli bartoli@cs Science 3.4

53688 Chang chang@eecs Eng 3.2

53650 Chang chang@math Science 3.8

sid cid grade 53666 Topology112 C 53666 Reggae203 B 53650 Topology112 A 53668 History105 B 421B: Database Systems - The Relational Model 18

Enforcing Referential Integrity

   An

Enrolled

tuple with a sid is inserted such that no tuple with this sid exists in

Students

 A

Students

    Disallow insertion tuple is deleted Delete all

Enrolled

tuples that refer to it Disallow the deletion of a

Students

tuple to which

Enrolled

tuples point Set sid in

Enrolled

tuples that refer to it to “ (in SQL set sid in

Enrolled default sid”

tuples that refer to it to NULL value) The primary key of a

Students

  tuple is changed Update the sid of all

Enrolled

value Further options similar to delete tuples that refer to the original 421B: Database Systems - The Relational Model 19

Referential Integrity in SQL/92

 SQL standard supports all 4 options on deletes and updates  Default is

NO ACTION

 delete/update is rejected  

CASCADE

 also delete/update all tuples that refer to the deleted/updated tuple

SET NULL / SET DEFAULT

 Set foreign key value of referencing tuple to NULL / given default

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid) FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE ON UPDATE SET NULL)

421B: Database Systems - The Relational Model 20

Where do ICs come from?

   ICs are based on the semantics of the real-world enterprise that is being described in the database relations We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance  An IC is a statement about all possible instances  For example, we might think that the student name is a key because in a given instance there do not exist two tuples with the same name; but it might happen in the future.

Key and foreign key ICs are the most common; more general ICs supported, too. (Discussed later in the course) 421B: Database Systems - The Relational Model 21

Logical Design: ER to Relational

 Entity sets to tables

eid name salary Employees Departments did dname budget

Employees(eid, name, salary) CREATE TABLE Employees (eid CHAR(11), name VARCHAR(20), salary REAL, PRIMARY KEY (eid))

Departments(did, dname, budget) CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL, PRIMARY KEY (did))

421B: Database Systems - The Relational Model 22

Many-to-many Relationship Sets

 Map relationship set to table. Attributes of the table  Keys for each participating entity set (as foreign keys)   This set of attributes forms the key for the relation All descriptive attributes

since eid name Employees salary Works_in Departments did dname budget

421B: Database Systems - The Relational Model 

Works_in(eid, did, since) CREATE TABLE Works_In (eid CHAR(11), did INTEGER, since DATE, PRIMARY KEY (eid,did), FOREIGN KEY (eid) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments )

23

Relationships Sets with Key Constraints

 Alternative 1: map relationship set to table  Many-one from entity set E1 to entity set E2: key of E1    i.e., key of entity-set with the key constraint is the key for the new relationship table (

did

is now the key) One-one: key of either entity set Separate tables for entity sets (

Employees

and

Departments

)

eid name Employees salary Manages Departments since did dname budget

421B: Database Systems - The Relational Model 

Manages(eid, did, since) CREATE TABLE Manages (eid CHAR(11), did INTEGER, since DATE,

PRIMARY KEY (did),

FOREIGN KEY (eid) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments )

24

 Alternative II: include relationship set in table of the entity set with the key constraint   Possible because there is at most one relationship per entity Not useful if many entities do not have a relationship (wasted space, many not filled values) 

eid name salary Employees Manages Departments since did dname budget

421B: Database Systems - The Relational Model

CREATE TABLE DepartmentsM (did INTEGER, dname CHAR(20), budget REAL,

eidmgr CHAR(11),

since DATE, PRIMARY KEY (did),

FOREIGN KEY (eidmgr)

REFERENCES EMPLOYEES(eid))

25

 Include relationship set in table of the entity set with the key constraint and the participation constraint  We can capture participation constraints involving one entity set in a binary relationship if it also has a key constraint, but little else (at least within the table definitions)

eid name salary

DepartmentsM(did, dname, budget, eidmgr, since) Employees Manages Departments since did dname budget

421B: Database Systems - The Relational Model

CREATE TABLE DepartmentsM (did INTEGER, dname CHAR(20), budget REAL, eidmgr CHAR(11) NOT NULL, since DATE, PRIMARY KEY (did), FOREIGN KEY (eidmgr)

Renaming

 In the case the keys of the participating entity sets have the same names we must rename attributes accordingly

name

Reports_To(super-eid, sub-eid) eid salary Employees

supervisor subordinate

Reports_To CREATE TABLE Reports_To (supervisor_eid CHAR(11), subordinate_eid CHAR(11), PRIMARY KEY (supervisor_eid, subordinate_eid), FOREIGN KEY (supervisor_eid) REFERENCES Employees(eid), FOREIGN KEY (subordinate_eid) REFERENCES Employees(eid))

421B: Database Systems - The Relational Model 27

Translating Weak Entity Sets

 Weak entity set and identifying relationship set are translated into a single table  When the owner entity is deleted, all owned weak entities must also be deleted

eid name salary

Dependants_Policy(pname, dob, cost, eid) Employees Policy Dependants pname cost dateofbirth CREATE TABLE Dependants_Policy (pname CHAR(20), dob DATE, cost REAL, eid CHAR(11), PRIMARY KEY (pname,eid), FOREIGN KEY (eid) REFERENCES Employees, ON DELETE CASCADE)

421B: Database Systems - The Relational Model 28

Translating ISA Hierarchies

 General Approach: distribute information among relations  Relation of superclass stores the general attributes and defines key   Relations of subclasses have key of superclass and addit. attributes sub-relation must be deleted  

Employees(eid, name, salary) Contract_Emps(contract_id,eid)

eid name

salary CREATE TABLE Employees (eid CHAR(20), name CHAR(20),salary REAL, PRIMARY KEY (ssn))

hours_worked hourly_wages Employees Hourly_Emps ISA Contract_id Contract_Emps

CREATE TABLE Contract_Emps (contract_id INTEGER, eid CHAR(11), PRIMARY KEY (eid), FOREIGN KEY (eid) REFERENCES Employees, ON DELETE CASCADE)

421B: Database Systems - The Relational Model 29

Translating ISA Hierarchies (contd.)

  Object-oriented approach:   Sub-classes have all attributes; Pro/Contra: + A query asking for all hourly employees only has to go to one relation (in general approach it has to read two relations) - Query on general attributes of all employees has to read all three tables - If an entity is both Hourly_emps and Contract_emps, name and salary are stored twice => undesired redundancy  if an entity is in a sub-class it does not appear in the super-class relation; 

Employees(eid, name, salary) Hourly_Emps(eid,name,salary, hourly_wages,hours_worked) Contract_Emps(eid, name, salary,contract_id) CREATE TABLE Employees (eid CHAR(20) PRIMARY KEY, name CHAR(20),salary REAL) CREATE TABLE Hourly_Emps (eid CHAR(20) PRIMARY KEY, name CHAR(20),salary REAL, hourly_wages REAL, hours_worked REAL) CREATE TABLE Contract_Emps (eid CHAR(20) PRIMARY KEY, name CHAR(20),salary REAL, contract_id INTEGER)

421B: Database Systems - The Relational Model 30

Translating ISA Hierarchies (contd.)

 Last Alternative: one big relation   Create only one relation for the root entity set with all attributes found anywhere in its network of subclasses. Put NULL in attributes not relevant to a given entity 

Employees(eid,name,salary, hourly_wages,hours_worked, CREATE TABLE Employees (eid CHAR(20), name CHAR(20), salary REAL, hourly_wages REAL, hourly_worked REAL, contract_id INTEGER, PRIMARY KEY (eid))

421B: Database Systems - The Relational Model 31

Translating Aggregation

  No key constraints      Projects(pid,started_on,pbudget) Departments(did,dname,budget) Employees(eid,name,salary) Sponsors(pid,did,since) Monitors(pid,did,eid,until)  Key constraint from Sponsors to Employees

name

  Sponsors(pid,did,eid,since,until) No Monitors Key constraint from  Projects(pid, started_on, pbudget, did, since)   No Sponsors Monitors(pid, eid, until)

Employees eid salary Monitors until pid Started_on pbudget Projects since Sponsors did dname budget Departments

421B: Database Systems - The Relational Model 32

Relational Model: Summary

     A tabular representation of data Simple and intuitive, currently the most widely used model.

Integrity constraints can be specified based on application semantics (up to a certain degree). DBMS checks for violations   Two important ICs : primary and foreign keys In addition, we always have domain constraints Powerful and natural query languages exist Rules to translate ER to relational model 421B: Database Systems - The Relational Model 33

Review: Binary vs. Ternary

  Putting any constraints in upper picture:  Key constraint on Policies (to guarantee that each policy only owned by one employee), would also mean that the policy can only cover one dependent

ssn name Employees pid

Constraints of lower picture:   Each dependant determined by one policy Each policy is owned by one employee

ssn name lot Policies Employees covers lot purchaser pname dob Dependants cost pname dob Dependants beneficiary cost Policies pid

421B: Database Systems - The Relational Model 34

Binary vs. Ternary

   Key constraints allow us to combine Purchaser with Policies and Beneficiary with Dependants Participation constraints lead to NOT NULL constraints (or primary key in case of weak entity) What if policy is a weak entity?

  421B: Database Systems - The Relational Model

Policies(pid, cost, eid) Dependants(pname, dob, pid) CREATE TABLE Policies (pid INTEGER, cost REAL, eid CHAR(11) NOT NULL, PRIMARY KEY (pid), FOREIGN KEY (eid) REFERENCES Employees ON DELETE CASCADE) CREATE TABLE Dependants (pname CHAR(20), dob DATE, pid INTEGER, PRIMARY KEY (pname,pid), FOREIGN KEY (pid) REFERENCES Policies, ON DELETE CASCADE)

35