Running 1996 - University of Houston

Download Report

Transcript Running 1996 - University of Houston

Introduction File and Database Systems

Christoph F. Eick

First 4 Weeks

1.

Introduction to Databases

2.

Course Information

3.

Grading and Other Things

4.

5.

6.

7.

8.

Week1/2 Questionnaire The Relational Data Model

Relational Algebra / SQL Part1 The E/R Data Model SQL Part2

Weeks 2-4

Introduction File and Database Systems

Christoph F. Eick

Textbooks for COSC 6340

Required Text: Raghu Ramakrishnan and Johannes Gehrke, Data Management Systems, McGraw Hill, Third Edition, 2002

Other books with relevant material: Ramez Elmasri and Shamkant Navathe, Fundamentals of Database Systems, Fourth Edition

Introduction File and Database Systems

Christoph F. Eick

Lectures in COSC 3480

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

11.

Basic Concepts of Database Management (2 classes; Chapter 1, 2.1,2.2, 2.3; instructor teaching material) Introduction to the Relational Data Model (1.5 classes; Chapter 3.1, 3.2, 3.3, 3.4, 3.6) Introduction to the Relational Algebra and SQL (3 classes; Chapter 4.2, Chapter 5) Conceptual Schema Design using the Entity Relationship Data Model (2-3 classes; instructor material; Chapters 2.4, 2.5) Relational Database Design and Normalization (2 classes; instructor material, Chapter 19) Introduction to KDD and Data Warehousing (1.5 classes; instructor material, Chapters 25 and 26) Disks, Files, Storage Structures, Index Structures and Physical Database Design (4 classes, Chapter 8, 9, 10, 11.1, 11.2, 13, 20) Spatial Data Management (1,5 classes, Chapter 28) Internet Databases and XML (2.5 classes; Chapter 7 and 27) Query Optimization (1 class; Chapter 12, only if enough time left) Summary: Where Do We Stand? (1 class; instructor material)

Introduction File and Database Systems

Christoph F. Eick

Other News

     The lab will start on Th, January 26, 2006, 8:30a . More details about the lab will be discussed next week. There will be three exams in Spring 2006: Tu., Feb. 28, Th., April 6, and ??, May ??.

All important information about the course can be found in the webpage associated with this course: http://www2.cs.uh.edu/~ceick/3480.html

Please inspect the webpage regularly.

The webpage is an evolving information document. Most of the information still refers to the Fall 2005 teaching of the course. The webpage will be updated during the course of the semester!

Introduction File and Database Systems

Christoph F. Eick UH Data Mining and Machine Learning Group (UH-DMML)

Christoph F. Eick and Ricardo Vilalta

http://www.tlc2.uh.edu/dmmlg Goal: Development of data analysis and data mining techniques and the application of these techniques to challenging problems in physics, geology, astronomy, environmental sciences, and medicine.

Topics investigated:

Meta Learning

Classification and Learning from Examples

Clustering

Distance Function Learning

Using Reinforcement Learning for Data Mining

Spatial Data Mining

Knowledge Discovery

Introduction File and Database Systems

Christoph F. Eick

Databases

Definition: A database is a collection of data with the following properties:

It represents certain aspect of the real-world.

Its data are logically related.

It is created for a specific purpose.

Introduction File and Database Systems

Christoph F. Eick

DBMS

Definition: A database management system (DBMS) is a set of software that are used to define, store, manipulate and control the data in a database.

 

define --- define data types, structures and constraints.

store --- store data; provide efficient access.

 

manipulate --- perform retrieval and update operations using a query language.

control --- control access to data.

Database System = Database + DBMS

Introduction File and Database Systems

Christoph F. Eick

A Brief History Note

 

Database technology has a history of about 40 years.

Database technology has gone through several generations .

First Generation : File systems, 50's -- 60's

A typical file system consists of a set of independent files, and a number of application programs

Definition: A file stores a set of record (on a disk drive) all of which have the same format.

Christoph F. Eick

Introduction File and Database Systems

An Example File System

A banking system may have

files for customers, saving accounts and checking accounts;

application programs to deposit and withdraw money, to find balance, etc.

different files are used for customers, saving and checking accounts

Introduction File and Database Systems

Christoph F. Eick

Problems of File Systems (1)

It is difficult to support new applications. Two existing application programs: (i) find customers who have a checking account

(ii) find customers who have a saving account Need a new program to find the customers who have a checking account and a saving account.

Introduction File and Database Systems

Christoph F. Eick

Problems of File Systems (2)

It has no centralized control of all data.

Files are often created for a particular application.

Files are created and managed independently.

Introduction File and Database Systems

Christoph F. Eick

Problems of File Systems (3)

There often exists severe data redundancy and inconsistency. Checking-Account: Acct#, Owner-name, Owner-SSN, Owner Addr, Balance, ...

Saving-Account: Acct#, Owner-name, Owner-SSN, Owner Addr, Balance, Interest, …

Introduction File and Database Systems

Christoph F. Eick

Problems of File Systems (4)

It lacks concurrency control.

Concurrency control: prevent mutual interference of concurrent requests.

Example (Airplane ticket reservation): Consider the situation when two customers are trying to book the only ticket left for a flight through two operators at about the same time.

Introduction File and Database Systems

Christoph F. Eick

Problems of File Systems (5)

Weak security

Can not provide multiple views of the same data

Lack isolation between program and data

Lack self-describing feature

Introduction File and Database Systems

Christoph F. Eick Database History (Continued) Second Generation : Hierarchical database systems (HDBS), late 60's -- early 70's

   

Best known HDBS: IMS (Information Management System of IBM).

One-to-many relationships between parent records and child records which can have different types.

Data are organized in trees Records are connected by pointers.

Introduction File and Database Systems

Christoph F. Eick An IMS Query Query: find all Binghamton University students whose major is computer science and whose GPA is higher than 3.5.

GU University (Name = `Binghamton University') Department (Name = `Computer Science') Student (GPA > 3.5) L1: GNP Student (GPA > 3.5) Goto L1

Christoph F. Eick

Introduction File and Database Systems

History of Database (Continued) Third Generation : Network database systems (NDBS), late 60's -- early 70's

Some commercial NDBSs: IDS II (Honeywell), DMS II (UNISYS).

In NDBS, record types are organized into an acyclic graph.

Main problem with HDBS and NDBS: difficult to use.

Christoph F. Eick

Introduction File and Database Systems

History of Database (Continued) Fourth Generation : Relational database systems (RDBS), early 70's -- now

 

Example relational DBSs: Oracle 7, Sybase, Informax, DB2, Ingres, ... In RDBS, data are organized into tables (relations).

Introduction File and Database Systems

Christoph F. Eick

History of Database (Continued)

Fifth Generation : Object-oriented and Object-Relational database systems (OODBS), 80's -- now

Example OODBSs: O2, Objectivity, ObjectStore, Versant, …

Example ORDBSs: Oracle 8, Informix, UniSQL/X.

Introduction File and Database Systems

Christoph F. Eick

Database Languages

Data Definition Language (DDL): used by DBA or database designer to define database schemas.

Data Manipulation Language (DML): used by database users to retrieve, insert, delete and update data in the database.

Query language : The part of DML that is used to retrieve data.

Data Control Language (DCL): used by database owners and DBA to control the access of data.

Introduction File and Database Systems

Christoph F. Eick

Persons Involving DBS (1)

DBMS developers : Those who design and implement DBMS software: buffer manager, query processor, transaction manager, interface, ...

Database designers : Those who are responsible for determining

what data should be stored in the database;

how data in the database should be organized;

the design of customized views;

the design of special data structures to improve the performance of the system.

Christoph F. Eick Persons Involving DBS (2)

Introduction File and Database Systems

Database administrator (DBA): Those who manage and monitor the daily operation of a database system.

authorization for database access, e.g., who can access what data in what mode.

routine maintenance: backup, install new tools, ...

modification to existing database design.

Introduction File and Database Systems

Christoph F. Eick

Persons Involving DBS (3)

End-users :

Casual users: those who access the database using SQL directly.

Naive users: those who access the database using pre-prepared packages.

Application programmers : Those who write menu applications for naive users, typically, through database calls embedded in a program.

Christoph F. Eick

Introduction File and Database Systems

After This Course, You Will Be

       

familiar with the relational data model; a decent database designer; a sophisticated casual user; a good application programmer; knowledgeable with major aspects on how to use a DBMS knowledge in a few advanced topics with respect to database systems This course will not teach you to become a database administrator The implementation of DBMS will only be partially covered.

Introduction File and Database Systems

Christoph F. Eick

Popular Topics in

Databases

         Efficient algorithms for data collections that reside on disks (or which are distributed over multiple disk drives, multiple computers or over the internet).

Study of data models (knowledge representation, mappings, theoretical properties) Algorithms to run a large number of transactions on a database in parallel; finding efficient implementation for queries that access large databases; database backup and recovery,… Database design How to use database management systems as an application programmer / end user.

How to use database management systems as database administrator How to implement database management systems Data summarization, knowledge discovery, and data mining Special purpose databases (genomic, geographical, internet,…)

Introduction File and Database Systems

Christoph F. Eick

Review: Why are integrated databases popular?

Bookkeeping Device Car Salesman Integrated Database

Introduction File and Database Systems

Christoph F. Eick

Review: Why are integrated databases popular?

     Avoidance of uncontrolled redundancy Making knowledge accessible that would otherwise not be accessible Standardization --- uniform representation of data facilitating import and export Reduction of software development (though the availability of data management systems) Support for Parallel Access and Data Security

Bookkeeping Device Integrated Database Car Salesman

Christoph F. Eick

Data Model

Data Model

is used to define

Schema

(defines a set of database states)

Current Database State

Introduction File and Database Systems

Introduction File and Database Systems

Christoph F. Eick

Schema for the Library Example using the E/R Data Model

ssn name phone when B# title author Person

(0,35)

Check_out

(0,1)

Book 1-to-1 1-to Many Many-to-1 Many-to-Many

Introduction File and Database Systems

Christoph F. Eick

Relational Schema for Library Example in SQL/92

CREATE TABLE Person (ssn CHAR(9), name CHAR(30), phone INTEGER, PRIMARY KEY (ssn)); CREATE TABLE Book (B# INTEGER, title CHAR(30), author CHAR(20), PRIMARY KEY (B#)); CREATE TABLE Checkout( book INTEGER, person CHAR(9), since DATE , PRIMARY KEY (book) , FOREIGN KEY (book) REFERENCES Book, FOREIGN KEY (person) REFERENCES Person ));

Introduction File and Database Systems

Christoph F. Eick

Example Instances

Person(name, ssn, phone): (Eick,111111111.33345), (Miller, 222222222,33337) Book(B#,title, author): (1,Today,Yu), (2, Today, Yu), (7, Blue, Xu) Checkout(book,person,since): (2,222222222.8/8/05), (7,222222222,8/8/05)

Christoph F. Eick

Introduction File and Database Systems

Referential Integrity in SQL/92

 SQL/92 supports all 4 options on deletes and updates.

   Default is NO ACTION (

delete/update is rejected

) CASCADE that refer to deleted tuple) SET NULL / (also delete all tuples SET DEFAULT (sets foreign key value of referencing tuple) CREATE TABLE Enrolled (sid CHAR (20), cid CHAR(20) , grade CHAR (2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE ON UPDATE SET DEFAULT )

Christoph F. Eick

Introduction File and Database Systems

Example of an Internal Schema for the Library Example INTERNAL Schema Library12 references Library.

Book is stored sequentially, index on B# using hashing, index on Author using hashing.

Person is stored using hashing on ssn.

Check_out is stored sequentially, index on since using B+-tree.

Introduction File and Database Systems

Christoph F. Eick

Example: Stored Database

0 Index on B# Block#= B# mod 10 20 30 1 1 11 51 Relation Book (1, C,W) (20, Y,W) (51, C, B) (11, Y,W) (30, Z, B) Index on Author W, … 0 Relation Person Block#= sss mod 10 1 (200,…) (500,…) (101,…) Index on since Relation Checkout

Christoph F. Eick

External Schema

3 Schema Architecture

Introduction File and Database Systems

External Schema Conceptual Schema Internal Schema External Schema How users see the Data What the database contains and what constraints hold with respect to the database How the data are physically stored

Christoph F. Eick

Data Independence

Introduction File and Database Systems

Data Independence : the ability to modify the lower level descriptions of a database without causing application programs to be rewritten.

 

Logical Data Independence : the ability to modify the conceptual schema without causing application programs to be rewritten.

Physical Data Independence : the ability to modify the internal schema without causing application programs to be rewritten.

Data independence is achieved through proper manipulation of the above two mappings.

Modern Relational DBMS

Transaction Concepts

; capability of running many transactions in parallel; support for backup and recovery.

Support for

Web-Interfaces

, XML, and Data Exchange Support for

OO

; capability to store operations Support for

data driven computing Efficient Implementation of Queries

(Query Optimization, Join & Selection & Indexing techniques)

Modern DBMS

Support for

Data Mining

operations Support for

OLAP

and

Data Warehousing

Support for

special Data-types

: long fields, images, html-links, DNA-sequences, spatial information,… Support for

higher level user interfaces

: graphical, natural language, form-based,…

Introduction File and Database Systems

Christoph F. Eick

Disks and Files

 DBMS stores information on (“hard”) disks.

 This has major implications for DBMS design!

 READ: transfer data from disk to main memory (RAM).

 WRITE: transfer data from RAM to disk.

 Both are high-cost operations, relative to in-memory operations, so must be planned carefully!

Introduction File and Database Systems

Christoph F. Eick

Why Not Store Everything in Main Memory?

  

Costs too much

. $100 will buy you either 512MB of RAM or 50GB of disk today --- that is disk storage 100 times cheaper slower) .

(but it is approx. 10000 times

Main memory is volatile

. We want data to be saved between runs. (Obviously!) Typical storage hierarchy:  Main memory (RAM) for currently used data.

 Disk for the main database (secondary storage).

 Tapes for archiving older versions of the data (tertiary storage).

Remark

: All reported disk performance/prize data are as of middle of 2003

Components of a Disk

 Disk head The platters spin (say, 90rps).

 a The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make

cylinder

(imaginary!).

 Only one head reads/writes at any one time.

 Arm assembly

Block size

is a multiple of

sector size

(which is fixed).

Arm movement Spindle Tracks Sector Platters

Christoph F. Eick

Introduction File and Database Systems

Accessing a Disk Page

  Time to access (read/write) a disk block: 

seek time

(moving arms to position disk head on track) 

rotational delay

(waiting for block to rotate under head) 

transfer time

(actually moving data to/from disk surface) Seek time and rotational delay dominate.

 Seek time varies from about 1 to 20msec  Rotational delay varies from 0 to 10msec  Transfer rate is about 1msec per 32KB page

Christoph F. Eick

Introduction File and Database Systems

DBMS Support Transactions

 Database management systems provide powerful transaction concepts that “guarantee” ACID properties  Transaction: Begin_Transaction End_Transaction  Usually 2 commands are available to terminate a transaction: Abort and Commit

Introduction File and Database Systems

Christoph F. Eick

Review: The ACID properties

 

A tomicity

: All actions of the transaction happen, or none happen.

C onsistency

: If each transaction is consistent, and the DB starts consistent, it ends up consistent.

I solation

: Execution of one transaction is isolated from that of other transaction s.

D urability

: If a transaction commits, its effects persist.

 The

Recovery Manager

guarantees Atomicity & Durability.

Introduction File and Database Systems

Christoph F. Eick

Example

 Consider two transactions (

Xacts

): T1: T2: BEGIN A=A+100, B=B-100 END BEGIN A=1.06*A, B=1.06*B END   Intuitively, the first transaction is transferring $100 from B’s account to A’s account. The second is crediting both accounts with a 6% interest payment.

There is no guarantee that T1 will execute before T2 or vice-versa, if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order.

Introduction File and Database Systems

Christoph F. Eick

Atomicity of Transactions

 A transaction might could

abort commit

after completing all its actions, or it (or be aborted by the DBMS) after executing some actions.

 A very important property guaranteed by the DBMS for all transactions is that they are

atomic

.

DBMS

logs

all actions so that it can

undo

the actions of aborted transactions and

redo

the actions of successful transactions.

Introduction File and Database Systems

Christoph F. Eick

Concurrency in a DBMS

 Users submit transactions, and can think of each transaction as executing by itself.

  Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions.

 Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins.

 DBMS will enforce some ICs, depending on the ICs declared in CREATE TABLE statements.

 Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed).

Issues:

Effect of

interleaving

transactions, and

crashes

.

Introduction File and Database Systems

Christoph F. Eick

Example (Contd.)

 Consider a possible interleaving (

schedule

) : T1: T2: A=A+100, A=1.06*A, B=B-100 B=1.06*B  This is OK. But what about: T1: T2: A=A+100, A=1.06*A, B=1.06*B B=B-100  The DBMS’s view of the second schedule: T1: T2: R(A), W(A), R(A), W(A), R(B), W(B) R(B), W(B)

Introduction File and Database Systems

Christoph F. Eick

Summary

 Concurrency control and recovery are among the most important functions provided by a DBMS.

 Users need not worry about concurrency.

 System automatically inserts lock/unlock requests and schedules actions of different transactions in such a way as to ensure that the resulting execution is equivalent to executing the transactions one after the other in some order.

 Write-ahead logging (WAL) is used to undo the actions of aborted transactions and to restore the system to a consistent state after a crash.