Chapter 1 - Introduction

Download Report

Transcript Chapter 1 - Introduction

1. Introduction 2. ER Model 3. Relational Model 4. SQL 5. Integrity and Security

Syllabus

6. Relational Database Design 7. File Structure, Indexing and Hashing 8. Transaction 9. Concurrency control 10. Recovery System

Chapter 1: Introduction

• What is Database?

• Purpose of Database Systems • View of Data • Data Models • Data Definition Language • Data Manipulation Language • Transaction Management • Storage Management • Database Administrator • Database Users • Overall System Structure

Definition of Database

• A shared collection of logically related data, designed to meet the information needs of multiple users in an organization.

• The term database is often erroneously referred to as a synonym for a “DataBase Management System (DBMS)"

Contd…

A collection of data: part numbers, product codes, customer information, etc.

It usually refers to data organized and stored on a computer that can be searched and retrieved by a computer program.

• A data structure that stores metadata, i.e. data about data. More generally we can say an organized collection of information .

• A collection of information organized and presented to serve a specific purpose.

(A telephone book is a common database.) A computerized database is an updated, organized file of machine readable information that is rapidly searched and retrieved by computer.

• An organized collection of information in computerized format .

• A collection of related information about a subject organized in a useful manner that provides a base or foundation for procedures such as retrieving information, drawing conclusions, and making decisions.

Example

Thing

Cricket Player Scholars Food Vehicle

Data (Facts/Figures)

Country, name, date of birth, specialty, matches played, runs etc.

Name, data of birth, age, country, field, books published etc.

Name, ingredients, taste, preferred time, origin, etc.

Registration number, make, owner, type, price, etc.

Purpose / Need of Database Systems

Let us discuss an example

Example: Personal Calendar

• We might start by building a file with the following structure:

What Day When Who Where

Lunch 10/24 1pm CS123 10/25 9am Biking 10/26 9am Dinner 10/26 6PM Rick Dr. Egghead Jane Jane Joe’s Diner Morris234 Jane’s house Café Le Boeuf • This text file is easy to deal with. So there's no need for a DBMS!

Problem 1: Data Organization

• Consider the all-important ``who'' field. Do we also want to keep e-mail addresses, telephone numbers etc?

• Expand our file to look like:

What When Who-name Who-email … Who-tel …. Where

• Now we are keeping our address book in our calendar and doing so redundantly.

“Link” Calendar with Address Book?

• Two conceptual “entities” -- contact information and calendar -- with a relationship between them, linking people in the calendar to their contact information. • This link could be based on something as simple as the person's name.

Problem 2: Efficiency

• Size of personal address book is probably less than one hundred entries, but there are things we'd like to do quickly and efficiently. – “Give me all appointments on 10/28” – “When am I next meeting Jim?” • “Program” these as quickly as possible. • Have these programs executed efficiently. • What would happen if you were using a “corporate” calendar with hundreds of thousands of entries?

Problem 3. Concurrency and Reliability

• Suppose other people are allowed access to your calendar and are allowed to modify it? How do we stop two people changing the file at the same time and leaving it in a physical (or logical) mess?

• Suppose the system crashes while we are changing the calendar. How do we recover our work?

Example

Suppose a manager schedule a meeting with his staff today (3:00pm) and at the same time his secretary schedules him to meet with the Chairman. They both see that the time is open, but presumably only one of the two meetings will show on the calendar later.

What is a DBMS?

• A database (DB) is a large, integrated collection of data.

• A DB models a real-world enterprise / DBMS contains information about a particular enterprise. • A database management system (DBMS) is a software package designed to store and manage databases / set of programs to access the data.

• DBMS provides an environment that is simultaneously convenient, secure and efficient to use.

• Is the software or tool that is used to manage the database and its users.

• A DBMS consist of different components or subsystem .

• Each subsystem or component of the DBMS performs different function(s).

• So a DBMS is collection of different programs but they all work jointly to manage the data stored in the database and its users .

• Database is collection of data, DBMS is tool to manage this data, and both jointly are called database system .

What the DBMS is about

• Organization of data • Efficient retrieval of data • Reliable storage of data • Maintaining consistent data • All these topics are interrelated.

Drawbacks of file systems

• In the early days, database applications were built directly on top of file systems • Drawbacks of using file systems to store data: – Data redundancy and inconsistency • Multiple file formats, duplication of information in different files – Difficulty in accessing data • Need to write a new program to carry out each new task – Data isolation — multiple files and formats

Drawbacks of file systems

stated explicitly

(Cont.)

– Integrity problems • Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being • Hard to add new constraints or change existing ones – Atomicity of updates • Failures may leave database in an inconsistent state with partial updates carried out • Example: Transfer of funds from one account to another should either complete or not happen at all

Drawbacks of file systems

performance

(Cont.)

• Concurrent access by multiple users • Concurrent accessed needed for • Uncontrolled concurrent accesses can lead to inconsistencies – Example: Two people reading a balance problems and updating it at the same time • Security problems • Hard to provide user access to some, but not all, data • Database systems offer solutions to all the above

Database Applications

– Banking: all business transactions – Airlines: reservations, schedules – Universities: registration, grades – Sales: customers, products, purchases – Manufacturing: production, inventory, orders, supply chain – Human resources: employee records, salaries, tax deductions

Data and Information

• Data is the collection of raw facts collected from any specific environment for a specific purpose.

• Data in itself does not show anything about its environment.

• So to get desired types of results from the data we transform it into information by applying certain processing on it.

• Once we have processed data using different methods data is converted into meaningful form and that form of the Data is called information

Levels of Abstraction

DBMS users are not computer trained, developers hide the complexity from users thro’ levels of abstraction, to simplify user’s interactions with the system

Physical level: lowest level describes how a record (e.g., customer) is stored.

Logical level

:

next higher level describes what data stored in database, and the relationships among the data.

type customer = record customer_id : string; customer_name : string; customer_street : string; customer_city : integer; end;

View level:

highest level describes only part of the entire database. DBMS may provide many views for the same database.

View of Data

An architecture for a database system

Data Abstraction

What data users and application programs see ?

View 1

View Level

View 2 … View n

What data is stored ?

describe data properties such as data semantics, data relationships

How data is actually stored ?

e.g. are we using disks ? Which file system ?

Logical Level Physical Level

Instances and Schemas

Similar to types and variables in programming languages • Schema level – the logical structure / overall design of the database – Example: The database consists of information about a set of customers and accounts and the relationship between them – Analogous to type information of a variable in a program – Physical schema: database design at the physical – Logical schema: database design at the logical level

Instance – the actual content of the database at a particular point in time – Analogous to the value of a variable

Data Independence

• Physical Data Independence

schema others.

• Logical Data Independence –

– the ability to modify the physical schema without changing the logical – Applications depend on the logical schema – In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence the ability to modify the logical schema without causing application programs to be rewritten. It is difficult to achieve since application programs are heavily dependent on logical structure of data that they access.

Database Language

Language for accessing and manipulating the data organized by the appropriate data model • One to specify database schema, storage structure and access methods (DDL) and • other to express database queries and updates (DML) Data Definition Language (DDL) • Specification notation for defining the database schema data dictionary, a file that contains metadata, i.e. before reading or modifying the actual data

Data Manipulation Language (DML)

• Language for accessing and manipulating the data organized by the appropriate data model – DML also known as query language • Two classes of languages – Procedural – user specifies what data is required and how to get those data – Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data • SQL is the most widely used query language

Data Modeling

• A

data model

is a collection of concepts for describing data properties and domain knowledge: – Data relationships – Data semantics – Data constraints – Relational Model • Only one abstract concept • Closer to the physical representation on disk •

Normalization

• Entity-Relationship data model (mainly for database design) • Relational model • Object-based data models (Object oriented and Object-relational) • Semistructured data model (XML) • Other older models: – Network model – Hierarchical model

Entity-Relationship Model

• Models an enterprise as a collection of entities and relationships – Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects • Described by a set of attributes – Relationship: an association among several entities e.g. each employee is an entity described by empno, empname, designation etc.

– Entity-relationship Model • Diagrammatic representation • Easier to work with • Syntax not important, but remember the “meaning” • Remember

what

you can model

• Entity set – set of all entities of the same type • Relationship set – set of all relationships of same type • Mapping cardinality – number of entities to which another entity can be associated via relationship set

• The overall logical design of database can be expressed graphically by an E-R diagram, which has following components: • Rectangles - entity set • Ellipses –attributes of an entity • Diamonds –relationship among entity sets • Lines – link attributes to entity sets and entity sets to relationship sets

Each component of E-R diagram is labeled with entity or relationship that it represents Example of schema in the entity-relationship model

Relational Model

• It uses collection of tables to represent data as well as relationship among those data.

• Each table has multiple columns, each column has unique name.

Other Models Network Model

• Data are represented by collection of records, and relationships among those data are represented by links, which are viewed as pointers.

• Records are organized as a collection of arbitrary graph

Other Models Cont.

Hierarchical Model

• Data are represented by collection of records, and relationships among those data are represented by links, which are viewed as pointers.

• Records are organized as a collection of trees rather than arbitrary graph

Database Users

Users are differentiated by the way they expect to interact with the system •

End Users

access to the database for querying, updating, and generating reports

Casual end users

: occasionally access the database need different information each time learn only a few facilities that they may use repeatedly.

use a sophisticated database query language to specify their requests typically middle- or high-level managers or other occasional browsers • Application programmers – through DML calls • Sophisticated users – query language interact with system form requests in a database

Database Users

• Specialized users – write specialized database applications that do not fit into the traditional data processing framework • Naïve users – application programs that have been written previously invoke one of the permanent – E.g. people accessing database over the web, bank tellers, clerical staff

Database Users

• •

System Analysts and Application Programmers

– Determine the requirements of end users, especially naive and parametric end users, and develop specifications for canned transactions that meet these requirements – Application programmers implement these specifications as programs; then they test, debug, document, and maintain these canned transactions

Workers behind the Scene

– Typically do not use the database for their own purposes – DBMS system designers and implementers – design and implement the DBMS modules (for implementing the catalog, query language, interface processors, data access, concurrency control, recovery, and security. ) and interfaces as a software package

Database Users

• Tool developers – Tools are optional packages that are often purchased separately – include packages for database design, performance monitoring, natural language or graphical interfaces, prototyping, simulation, and test data generation.

• Operators and maintenance personnel – system administration personnel who are responsible for the actual running and maintenance of the hardware and software environment for the database system

Database Administrator

• Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.

• Database administrator's duties include: – Schema definition – Storage structure and access method definition – Schema and physical organization modification – Granting user authority to access the database – Specifying integrity constraints – Monitoring performance and responding to changes in requirements

Database Actors

• Database Administrators – In a database environment, the primary resource is the database itself and the secondary resource is the DBMS and related software – authorizing access to the database – coordinating and monitoring its use – acquiring software and hardware resources as needed

Database Actors

• Database Designers – identifying the data to be stored in the database – choosing appropriate structures to represent and store this data undertaken before the database is actually implemented and populated with data – communicate with all prospective database users, in order to understand their requirements – develop a view of the database that meets the data and processing requirements for each group of users – These views are then analyzed and integrated with the views of other user groups. The final database design must be capable of supporting the requirements of all user groups

Overall Database System Structure

Query Processor Components • DML Compiler – translates DML statements in a query language into low level instructions • DDL interpreter – metadata.

interprets DDL statements and records them in a set of tables containing • Query evaluation engine – executes low-level instructions generated by DML compiler

Storage Manager Components • Authorization & integrity manager – for satisfaction of integrity constraints and checks the authority of user to access the data tests • Transaction manager – ensures the consistency of the database despite system failures, and concurrent transaction executions proceed with conflicting • File Manager – disk storage and the data structures used to represent information on disk Manages the allocation of space on • Buffer Manager – from disk storage into main memory, and deciding what data to catch in memory responsible for fetching data

Data structures are required as a part of physical implementation • Data files – stores database itself • Data dictionary - stores meta data about the structure of database • Indices – which provides fast access to data items that hold particular values • Statistical data – which stores statistical information about the data in the database, used by strategy selector

DATABASE SYSTEM ARCHITECTURE

Storage Management

Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

• The storage manager is responsible to the following tasks: – Interaction with the file manager – Efficient storing, retrieving and updating of data • Issues: – Storage access – File organization – Indexing and hashing

Transaction Management

• A

transaction

performs a single logical function in a database application is a collection of operations that • Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.

Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.