Introduction to Database Systems Yuri Breitbart MW 4:45 – 6:00pm Fall 2004

Download Report

Transcript Introduction to Database Systems Yuri Breitbart MW 4:45 – 6:00pm Fall 2004

Introduction to Database Systems
Yuri Breitbart
MW 4:45 – 6:00pm
Fall 2004
Course Goals
This course is an introduction to the design, use and internal
workings of database systems. We consider here systems that
are based on relational model – that is, users data is
represented as a set of two dimensional tables. During the
class we learn the ways to organize the data so that the user
applications may work concurrently and get data from
database quickly and reliably. We first study the data
modeling techniques and how to convert a data model into a
set of relations. We then study SQL query language. Finally,
we study database internal organization of data a
concurrency control and recovery issues in database systems.
References
• A. Silberschatz, H. F. Korth, S Sudarshan,
Database System Concepts, 4th Ed., McGrow Hill, 2002
http://www.bell-labs.com/topic/books/db-book
• Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom,
Database Systems, The Complete Book, Prentice Hall, 2002
http://www-db.stanford.edu/~ullman/dscb.html
• Class notes
Prerequisites
•
•
•
•
CS 33001 – Data Structures
CS31011 – Discrete Structures
Structured Programming Language (C++)
Software engineering topics related to project
documentation and project design
Workload & Requirements
•
•
•
•
4 Homeworks
Project
Midterm and Final Exams
Homeworks: 20% of the final grade
Project
35% of the final grade
Midterm
20% of the final grade
Final
25% of the final grade
• No late homeworks and/or projects are accepted
• A – 91 – 100; B – 80-90; C – 70–79; D - >64
Exam, Project, and Homeworks Due Dates
• Homework 1 – Out: 9/08/2004 In: 9/13/2004
Homework 2 – Out: 9/13/2004 In: 9/17/2004
Homework 3 – Out: 9/20/2004 In: 9/24/2004
Homework 4 - Out: 9/27/2004 In: 10/04/2004
• Midterm 10/13/2004
• Final 12/13/2004
• Project – Out: 10/18/2004 In: 12/01/2004
• Exams are all inclusive
Class Schedule
•
•
•
•
•
•
•
Week 1 – Database Overview
Week 2 – ER model
Week 3 – Relational Model
Week 4 – Relational Model
Week 5 – OO Model and XML
Week 6 – OO Model and XML
Week 7 – Relational Algebra
•
•
•
•
•
•
•
Week 8 - SQL
Week 9 – Constraints & Triggers
Week 10 –Data Storage
Week 11 – Indexes
Week 12 - Query Processing
Week 13 – Transactions
Week 14 – Recovery
Project
You will build a database application using Oracle. The project
consists of several parts:
– ER diagram for the application,
– conversion of the ER diagram into a set of relations,
– normalizing a set of relations,
– creating database under ORACLE,
– loading database,
– creating ORACLE application programs,
– testing the system.
Programming should be done in C++. Project documentation should include:
– application description,
– ER diagram,
– normalized set of relation,
– description of each application program,
– sample data for each relation, and
– description on how to install and run your application.
Database Overview
• File Management vs Database Management
• Advantages of Database systems: storage persistence,
programming interface, transaction management
• Three level Data Model
• DBMS Architecture
• Database System Components
• Users classification
File Management Systems
• File is uninterpreted, unstructured collection of information
• File operations: delete, catalog, create, rename, open,
close, read, write, find, …
• Access methods: Algorithms to implement operations
along with internal file organization
• Examples: File of Customers, File of Students; Access
method: implementation of a set of operations on a file of
students or customers.
File Management System Problems
•
•
•
•
•
•
•
Data redundancy
Data Access: New request-new program
Data is not isolated from the access implementation
Format incompatible data
Concurrent program execution on the same file
Difficulties with security enforcement
Integrity issues
Concurrent Program Execution
What is the final value of the account AC?
Program1
AC=AC-50
AC
Program2
AC=AC-100
#103 450
Security Problems
•
•
•
•
•
Allow access to the file only to the authorized personnel
Ability to restrict access to parts of the record
Ability to control operation usage by different users
Protection from unauthorized use
Protection from the derivation of unauthorized information
Data Integrity
• A database constraint is a logical constraint about the data
expressed in a logical language.
– STUDENT.AGE >15
– If (STUDENT.CLASS ==cs43005) then
(STUDENT.PRIOR_CLASS ==cs31001)
• Database is consistent if data at each time satisfies all
integrity constraints.
• Input to any application is a set of consistent data. An
application output is a set of consistent data.
Collection of Files
60’s
70's
Hierarchical
80's
Relational
Network
Choice for most new
applications
90’s
Object Bases
Knowledge Bases
now
Advantages of Databases
• Persistent Storage – Database not only provides persistent
storage but also efficient access to large amounts of data
• Programming Interface – Database allows users to access
and modify data using powerful query language. It
provides flexibility in data management
• Transaction Management – Database supports a
concurrent access to the data
Early Database Applications
• Airline Reservation Systems – Data items are: single passenger
reservations; Information about flights and airports; Information
about ticket prices and tickets restrictions.
• Banking Systems – Data items are accounts, customers, loans,
mortgages, balances, etc. Failures are not tolerable. Concurrent
access must be provided
• Corporate Records – Data items are: sales, accounts, bill of
materials records, employee and their dependents
Modern Database Applications
• Client – Server architecture
– DBMS serves as a server and client queries are sent to servers
– Where to locate servers
• Multimedia Applications
• Multidatabase Applications
• Data Warehouses
Three Aspects to Studying DBMS's
1. Modeling and design of databases.
– Allows exploration of issues before committing to an
implementation.
2. Programming: queries and DB operations like update.
– SQL = “intergalactic dataspeak.”
3. DBMS implementation.
.
Definitions
• A database is a collection of stored operational data used
by various applications and/or users by some particular
enterprise or by a set of outside authorized applications and
authorized users
• A DataBase Management System (DBMS) is a software
system that manages execution of users applications to
access and modify database data so that the data security,
data integrity, and data reliability is guaranteed for each
application and each application is written with an
assumption that it is the only application active in the
database.
What Is Data ?
• Different view points:
– A sequence of characters stored in computer memory or
storage
– Interpreted sequence of characters stored in computer
memory or storage
– Interpreted set of objects
Data Levels and their Roles
• Physical – corresponds to the first view of data: How data
is stored, how is it accessed, how data is modified, is data
ordered, how data is allocated to computer memory and/or
peripheral devices, how data items are actually represented
(ASCI, EBCDIC,…)
• Conceptual – corresponds to the second view of data:
What we want the data to express and what relationships
between data we must express, what “ story” data tells, are
all data necessary for the “story’ are discussed.
• View – corresponds to the third view of data:What part of
the data is seen by a specific application
Physical Data - Example
• Physical
10
benjamin
3
63
6
10
J
0000035000 james
3
3
6
000375
.
.
.
.
.
.
.
.
.
Examples
• Conceptual
1 TA
2 Name char(10),
2 Age char (3),
2 Salary Fixed Dec(6);
1 Student
2 Name char(10),
2 Year-of_study char(3)
2 GPA Fixed Dec(5,2);
Examples
1
STUDENTS-TA
2 Name char(25),
2 Age char (3),
2 Salary Fixed Dec(8,2),
2 Year-of_study char(3)
2 GPA Fixed Dec(3,2);
A view
Three Level Data View –
Data Abstractions
View1
. . .
Conceptual
View
Of Data
Phyisal
Data
Storage
.
.
View k
DBMS Architecture
Logical and Physical Database Components
• Data Definition Language (DDL)
•
•
•
•
•
Data Manipulation Language (DML)
Host Language Interface
Data Administrator
Users
Query Processor
– Compiler
– Optimizer
• Management
– Transaction Manager
– File Manager
– Buffer Manager
– Authorization and Integrity Manager
Logical
Physical
Database Languages
Department
Faculty
Name
Dept
Dept
Chair
SQL
SELECT Chair
FROM Faculty, Department
WHERE Faculty.name = “Ken Noname”
AND Faculty.Dept = Department.Dept
Data definition language (DDL) ~ like type definitions in C or C++
Data Manipulation Language (DML)
Query (SELECT)
UPDATE < relation name >
SET <attribute> = < new-value>
WHERE <condition>
Data Definition Language
• Specification notation for defining the database
schema
– E.g.
create table account (
account-number char(10),
balance
integer)
• DDL compiler generates a set of tables stored in a
data dictionary
• Data dictionary contains metadata (i.e., data about
data)
– database schema
– Data storage and definition language
• language in which the storage structure and access methods
used by the database system are specified
• Usually an extension of the data definition language
Data Manipulation Language
• Language for accessing and manipulating the data
organized by the appropriate data model
• Two classes of languages
– Procedural – user specifies what data is required and
how to get those data
– Nonprocedural – user specifies what data is required
without specifying how to get those data
• SQL is the most widely used query language
Database Host Languages
C, C++, Fortran, Lisp, COBOL
Application prog.
Calls to
DB
DBMS
Local Vars
(Memory)
(Storage)
Host language is completely general
Query language—less general "non procedural" and
optimizable
Data Administrator
• Coordinates all the activities of the database
system; the database administrator has a good
understanding of the enterprise’s information
resources and needs.
• Database administrator's duties include:
–
–
–
–
–
–
–
Schema definition
Storage structure and access method definition
Schema and physical organization modification
Granting user authority to access the database
Specifying integrity constraints
Acting as liaison with users
Monitoring performance and responding to
changes in requirements
Database Users
• Naïve – do not know about database too much, invoke
application programs that are prepared already
• Application Programmers – know how to interact with the
system but may not know how DBMS is designed
• Sophisticated users that know advanced use of the system
and can use the system and packages on the top of the
system
• DBMS system users – write specialized database
applications that do not fit into the traditional data
processing framework
Query Processor
• Compiler – verifies whether a program or query is written
in accordance with DDL and DML rules
• Optimizer – Finds the most effective way to access the
required data and supply it in a user requested form.
Monitors the query execution and modifies a query
evaluation plan if necessary.
Transaction Management
• A transaction is a collection of operations that performs a
single logical function in a database application
• Transaction-management component ensures that the
database remains in a consistent (correct) state despite
system failures (e.g., power failures and operating system
crashes) and transaction failures.
• Concurrency-control manager controls the interaction
among the concurrent transactions, to ensure the
consistency of the database.
Storage Management
• Storage manager is a program module that
provides the interface between the low-level data
stored in the database and the application
programs and queries submitted to the system.
• The storage manager is responsible to the
following tasks:
– interaction with the file manager
– efficient storing, retrieving and updating of data
File Manager
• File Manager is responsible for mapping logical database
units (objects, relations, etc.) into a set of low level files.
• It is responsible for maintenance of files and indexes on
them. It should be able to create and destroy index and
collect unused storage space to eliminate an unneeded gaps
on disks.
Buffer Manager
• Buffer Manager is responsible for the allocation and
maintenance buffer space in a memory to facilitate
processing database data by several concurrent
applications.
• Buffer Manager decides when to load data from a buffer to
a database or discard the data and under what conditions a
new data should be put into a buffer
Authorization and Integrity Manager
• This manager is responsible for granting an access to
database or portions thereof only to authorized users and
preventing the access to unauthorized users
• Integrity manager must assure data integrity during normal
database operations as well as during the database failures
The DBMS Marketplace
• Relational DBMS companies – Oracle, Sybase – are among the
largest software companies in the world.
• IBM offers its relational DB2 system. With IMS, a nonrelational
system, IBM is by some accounts the largest DBMS vendor in the
world.
• Microsoft offers SQL-Server, plus Microsoft Access for the cheap
DBMS on the desktop, answered by “lite” systems from other
competitors.
• Relational companies also challenged by “object-oriented DB”
companies.
• But countered with “object-relational” systems, which retain the
relational core while allowing type extension as in OO systems.
Logical Data Models
• A collection of tools for describing
– data
– data relationships
– data semantics
– data constraints
• Value based models: ER Model, OO Model
• Record Based Models: Relational Model
Entity-Relationship Model
• The enterprise data can be described as a set of entities and
a set of relationships between them.
• Entity – a data that pertains to, or describes some
component of the enterprise
• Each entity is characterized by a set of attributes
• Relationship – describes an interconnection between
different entities
• Entity Set – a set of entities that are characterized by the
same entity definition
• Relationship Set – a set of relationships of the same type
Entity-Relationship Model
Example of schema in the entity-relationship model
Object – Oriented Model
• An enterprise is described as a collection of objects and a
collection of algorithms that work with objects
• Example: Person is an object.
• Object is characterized by a set of public attributes.
Applications may refer only to public attributes; private
attributes . Algorithms that implement the object may refer
to private attributes; a set of protected attributes and a set
of methods
• Attribute of an object can be another object
• Objects are nested into a hierarchy and can inherit
attributes of their parents
Object Oriented Model
OBJECT DATA MODEL
1.
2.
3.
4.
Complex Objects – Nested Structure (pointers or
references)
Encapsulation, set of Methods/Access functions
Object Identity
Inheritance – Defining new classes like old classes
Object model: usually find objects via explicit navigation
Also query language in some systems
Example
• Class Person{
public:
Person();
~Person();
float GetSalary();
float PutSalary(float&);
string Name;
int SSN;
date BirthDate;
private:
float salary;
}
Object-Oriented Model
Data Encapsulation
• An object contains both data and methods to work with the
data
• The physical data representation is visible only to the
object creator.
• The implementation details of methods are not visible to
object users
• An interface of the object consists of public attributes and
methods
• Each object is characterized by an object identity
Relational Model
• An enterprise is represented as a set of relations
• Domain – is a set of atomic values. Each domain has a
NULL value.
• Data type – Description of a form that domain values can
be represented.
• Relation is a subset of a cartesian product of one or more
domains
• The elements of relations are called tuples. Each element
in the cartesian product is called attribute.
Relational model is good for:
Large amounts of data —> simple operations
Navigate among small number of relations
Difficult Applications for relational model:
• VLSI Design (CAD in general)
• CASE
• Graphical Data
ALU
ADDER
A
FA
CPU
Adder
ALU
ADDER
Bill of Materials or
transitive closure
Attributes
Relational Model
Student-id
Name
192-83-7465
Johnson
019-28-3746
Smith
192-83-7465
Johnson
321-12-3123
Jones
019-28-3746
Smith
Street
City
gpa
• Example of tabular data in the relational model
Alma
Palo Alto
3.6
North
Rye
2.7
Alma
Palo Alto
3.2
Main
Harrison
4.0
North
Rye
3.45