Transcript Document

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

WFM 5201: Data Management and Statistical Analysis

Lecture-07: Database Management System

Akm Saiful Islam Institute of Water and Flood Management (IWFM) Bangladesh University of Engineering and Technology (BUET)

June, 2008 Slide

1

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Outline

Database Management System

     Introduction to Databases File System Vs. Databases Advantages of using databases Data Models – Hierarchical, network, relational, object oriented Overview of Relational Database

Slide

2

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Introduction to Databases

     Information Systems process and manage data.

Data Management involves “Capturing”, “Retrieval,” and “Storage” of data.

Database Management Systems (DBMSs) are Computer systems that manage data in databases.

Today’s DBMSs are based on sophisticated software and powerful computer hardware.

Well known DBMS software includes ORACLE, Microsoft SQL Server, Sybase and MySQL(free download) among others.

Slide

3

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

File Organisation

Sequential Files

  records are stored in a fixed sequence records can only be read in that sequence, starting from the first record   records can only be added at the end of the file (append) sequential files are not efficient

Indexed Files

 Use an index to access records in a random fashion.

  Records can be sorted according to an attribute or preference. (e.g Alphabetically, Ascending, Descending, etc.) Indexed files are efficient, and faster to access.

Slide

4

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

The File Systems Approach

General Ledger File Production Planning File Invoicing File Despatch File Personnel File Payroll File Inventory File Order Entry File      Redundant Data Storage.

One file is used in each application.

No data sharing.

Cross-application transfers are difficult to manage and achieve.

File Systems are rarely used for data processing anymore.

Slide

5

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

The Database Approach

General Ledger Personnel Production Planning Invoicing Despatch Payroll Inventory Order Entry     Compactness. Data is stored in a single logical “place.” Data can be shared and related between applications Data transfer between applications is easier Used for a wide range of applications.

Slide

6

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Database Characteristics

• Amount  Database size depends on the number of records or files it contains.

 Complexity  Database complexity depends on the number of relations between the files.

 Volatility  A measure of the changes typically required in a given period of time.

 Immediacy  A measure of how rapidly changes must be made to data.

Slide

7

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Advantages of using a Database Approach

Flexible Data Access

. DBMSs have various tools to manipulate, query, or report data, such as Structured Query Language (SQL), and Report Generators. Hence:  Selected data is easily retrieved  A DBMS can accommodate different data views for different users 

Improved Data Integrity

. Modern DBMSs consist of various tools and methods to:  ensure that data is correct, consistent, and current  verify data input and check whether data is ‘reasonable’.

Slide

8

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Advantages of DBs (continued)

  

Improved Data Security

. Tools such as password access, and encryption, ensure that data is not:  deliberately or accidentally damaged or changed  accessed without proper authorisation

Data Independence

.

 Problems arising from the interdependence of data and programs are kept to a minimum.

Reduced Data Redundancy.

 Single version of the truth.

 Efficient data storage.

 Efficient time management of Hardware (CPU), programmer(s), analyst(s) and user(s).

 Relational DBs use Normalisation to reduce data redundancy.

Slide

9

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Advantages of DBs (continued)

  

Ability to Share and Relate Data

.

 Different user groups can use the same data.

 Data in different (physical or logical) parts of the system can be related for a certain application.

Standardisation of Data

.

 In general data items have common names and storage format.

Increased Productivity

.  The various tools reduce the complexity that is otherwise associated with DB maintenance when changes are required to the system. For example Law changes, Economy Changes, User Changes.

Slide

10

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Costs of Database Approach

The implementation and use of DBMSs is normally associated with various costs. Such as:  Initial expenses involve planning costs, and consultancy fees.

 Computer hardware costs.

 Software costs.

 Database Administrator costs, and staff training costs.

 Conversion costs of an existing system.

 Various operational costs.

Slide

11

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Data Models

1. Hierarchical 2. Network 3.

Relational 4. Object

Slide

12

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

1. Hierarchical Model

Stores data as hierarchically related to each other. Record shape are tree structure.

BUET CE Faculty of Civil Engineering WRE URP Faculty of Architectural Archit.

Slide

13

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Hierarchical Database Model

Slide

14

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Hierarchical Database Model

Logically represented by an upside down tree

 Each parent can have many children  Each child has only one parent

Slide

15

WFM 5201: Data Management and Statistical Analysis © © Dr. Akm Saiful Islam

Hierarchical Model

 Several records or files are hierarchically related with each other. For example, an organization has several departments, each of which has attributes such as name of director, number of staffs, annual products etc.

 Each department has several divisions with attributes of name of manager, number of staffs, annual products etc.  Then each division has several sections with attributes such as name of head, number of staff, number of PCs etc.

Slide

16

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Advantage and Disadvantages of Hierarchical Model

 Advantages  High speed access to large databases  Easy to update- (to add or delete new nodes)  Disadvantages  Links are only possible in Vertical Direction (from top to bottom) but not for horizontal or diagonal unless they have same parents.

 For example, it is hard to find what is the relation between URP and DCE from this data model.

Slide

17

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

2. Network Database Model

 Doesn’t force data into hierarchical levels  Owner/Member relationships:  Owner record type  Member record type  Each owner may have one or more member types  Each member type and corresponding owner record type form set, which represents relationship

Slide

18

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Network Database Model

Slide

19

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam 

Network Database Model

Each record can have multiple parents  Composed of sets - relationships  Each set has owner record and member record  Member may have several owners  A set represents a 1:M relationship between the owner and the member Figure 1.10

Slide

20

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

3. Relational Model

 Based on two important concepts:

Student ID Student

Table

Name

1 Mr. X 

Key of relation

- one to one, one to many, many to many 

Primary attribute

– which can’t be duplicate 2 3

Cour seID

Mr. X Mr. Y

Course

table

Title

Student Table * * Course Table Many to many relationship 001 002 003

CourseID

001 002 003

Cre dit

RS & GIS in WM 3 Watershed Hydrology 3 Risk Management 3

Slide

21

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Relational Database

   Relational database is the most popular model for GIS. For example, the following relational database softwares are widely used. - INFO in ARC/INFO - DBASE III for several PC-based GIS - ORACLE for several GIS uses In a relational model, the following two important concepts should be defined.  

Key of relation ;

a subset of attributes Unique identification ; e.g. the key attributes is a phone directory in a set of last name, first name and address. non redundancy ; any key attribute selected and tabulated should keep the key's uniqueness. e.g. address can not be dropped from telephone address, because there may be many with the same names. 

Prime attribute :

an attribute listed in at least one key.

The most important point of the relational database design is to build a set of key attributes with a prime attribute, so as to allow dependence between attributes as well as to avoid loss of general information when records are inserted or deleted.

Slide

22

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Relational Database Model

Slide

23

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Relational Database Model

Figure 1.11

Slide

24

SQL

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

What is it?

S

tructured

Q

uery

L

anguage  Used in ORACLE and other DB systems  Non-procedural - i.e. Specify what you want not how to get it  SQL - (also pronounced SEQUEL) directly related to the development of the RELATIONAL MODEL by E.F.Codd.

Slide

25

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

SQL

 SQL is used to perform query in relations databases.

 For example, find the name of the student who took more than or equal to 6 credit hour in this term

SELECT Student.Name, Course.Credit FROM Student, Course WHERE Student.CourseID = Course.CourseID AND Credit >= 6

 The answer is : Mr. X 6

Slide

26

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Find the relationship between this two tables in the BUET Library

Book Table

ISBN 050 060 Title Applied Hydrology Irrigation Author David Maidmen Cheng ID

Borrow Table

Name 1 2 3 Mr. P Mr. Q Mr. R ISBN 050 060 070 One to one Many to Many One to Many

?

Slide

27

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Normalization of an Un-normalized Table to relational database

Slide

28

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Advantage of Relational Database

 Advantages  there is no redundancy.

  type of building of an owner can be changed without destroying the relation between type and rate.

a new type of building for example "Clay" can be inserted. (row insert is easy).

 Disadvantages   Require a number of tables and relationship Its difficult to add a new column in the table.

Slide

29

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

4. Object Databases

 Current generation systems have a need to handle complex data for complex applications such as  computer aided design  computer aided software engineering  geographic information systems  interactive web sites  Relational systems are inadequate for these systems  Why do you think this is?

Slide

30

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Object Database Types

Object-oriented

 extend a programming language such as Java with persistency and a query language 

Object-relational

 extend a current RDBMS (e.g. Oracle) with object-oriented extensions

Slide

31

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Object Oriented Model

BUET Is a CE Part of Departments Is a Is a URP WRE Is a = Inheritance Part of = association Part of DCE Institutes IWFM AIT Attributes: Faculty, Staff, Students

Slide

32

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Object Oriented Database

 An Object Oriented model uses functions to model spatial and non-spatial relationships of geographic objects and the attributes.

 An object is an encapsulated unit which is characterized by attributes, a set of orientations and rules. An object oriented model has the following characteristics. 

generic properties

relationship.

: there should be an inheritance 

abstraction

generated by classification, generalization, association and aggregation.

: objects, classes and super classes are to be 

adhoc queries

language.

: users can order spatial operations to obtain spatial relationships of geographic objects using a special

Slide

33

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Example of Object Oriented Model

Slide

34

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

5. Object-Relational Database Model

 Object-relational database management systems (ORDBMS):  Combine:  Ability of object technology to handle advanced relationship types  Data integrity, reliability, and recovery features of relational models  Most popular and powerful of modern database system applications  Oracle, Microsoft SQL Server

Slide

35

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Object-Relational Database Table

Slide

36

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Overview of Relational Database

Slide

37

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

What is a Relational Database?

 A database is more than just a collection of information - such as student and course information, faculty and grades.

 A database is a representation of the people and things your business needs to operate, and the way those people and things relate to each other.

 A database system supports the business rules defined by the customer.

Slide

38

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Logical to Physical Database Design

 The

Entities

in the

Logical Data Model

are translated into

Tables

database design in the physical  The entity

attributes

become

columns

of each table in the database  Data type (numeric, character, date)  Business rules for the legal values for the column (the domain of the column)

Slide

39

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Data Models

A

data model

is a collection of concepts for describing data.

A

schema

is a description of a particular collection of data, using the given data model.

The

relational model of data

is the most widely used model today.

 Main concept:

relation

, basically a table with rows and columns.

 Every relation has a

schema

, which describes the columns, or fields.

Slide

40

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Example: University Database

Conceptual schema:

 Students(sid: string, name: string, login: string, age: integer, gpa:real)  Courses(cid: string, cname:string, credits:integer)  Enrolled(sid:string, cid:string, grade:string) 

Physical schema:

 Relations stored as unordered files.  Index on first column of Students.

External Schema (View):

 Course_info(cid:string,enrollment:integer)

Slide

41

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Instance of Students Relation

Students

(

sid

: string,

name:

string,

login:

string,

age:

integer,

gpa:

real )

sid

53666 53688 53650

name

Jones Smith Smith

login

jones@cs

age

18 smith@ee 18 smith@math 19

gpa

3.4

3.2

3.8

Slide

42

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Levels of Abstraction

 Many

external schemata,

single

conceptual(logical) schema

and

physical schema

.

External Schema 1 External Schema 2 External Schema 3 Conceptual Schema  External schemata describe Physical Schema how users see the data.  Conceptual schema defines logical structure  Physical schema describes the files and indexes used.

* Schemas are defined using DDL; data is modified/queried using DML .

Slide

43

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Database Terminology

Tables

within a relational database hold sets of data using rows and columns 

Rows

(records) appear horizontally in a report, and contain one or more columns 

Columns

(fields) are named data elements and appear vertically in a report 

Primary Keys

identify uniqueness in a row 

Indexes

are created for faster access to the data in the database

Slide

44

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Basic Database Concepts

u 

Table

A set of related records

Record

A collection of data about an individual item Name: Barry Harris College: Medicine Tel: 392-5555 Name: Barry Harris College: Medicine Tel: 392-5555

u

Field

A single item of data common to all records Name: Barry Harris Slide

45

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

An Example of a Table

Fields Records Name

Graff Harris Ipswich

GatorLink

rgraff bharris zipswich

Phone

392-3900 392-5555 846-5656

College

Pharmacy Medicine PHHP

Slide

46

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Different parts of a database

Fields –

different types of data (number or text) 

Records

Queries

Reports

Slide

47

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Concepts pf Relational Database

 Based on two important concepts: 

Key of relation

- one to one, one to many, many to many.

Primary attribute

be duplicate – which can’t

Slide

48

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Primary Key

 The column or set of columns that provide the

uniqueness

for the row.

 A table can have only

one primary key.

 Existing values in primary key columns may not be modified (insert new value and then delete old value)  The table of a relationship containing the primary key is called the Parent Table.

Slide

49

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Foreign Keys

 A primary key referenced from another table is called a foreign key  For each foreign key value, there must be a row in a table whose primary key has the

same value

.

 The foreign key can be made up of one or more columns of a table but must match the primary key it is referencing  A table can have any number of foreign keys.

Slide

50

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Primary Keys & Foreign Keys

Name

Graff Harris Ipswich

User

rgraff bharris zipswich

Phone

392-3900 392-5555 846-5656

College

Pharmacy Medicine PHHP To ensure that each record is unique in each table, we can set one field to be a

Primary Key

field.

A Primary Key is a field that that will contain

no duplicates

and

no blank values

.

Foreign Keys link to data in other tables Slide

51

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Relationship Types

One-to-One

: relationship is single valued in both directions  A manager manages one department; a department has only one manager.

One-to-Many

in the parent table is associated with many rows in the dependent table. : relationship is multi-valued in one direction - one row  One department has many employees.

Many-to-Many

: relationships are multi-valued in both directions. This type of relationship can be expressed in a table with a column for each entity. (crosswalk table)  An employee can work on more than one project, and a project can have more than one employee assigned. Employee, Project, and Employee/Project tables.

Slide

52

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Data Integrity

   For a table to have

Domain Integrity

 the value of each column of data is meaningful and acceptable in the business environment, and passes all the edits we impose on it.

For a table to have

Association Integrity

 the relationship between two or more columns in that table satisfies a pre-defined business association.

For a table to have

Referential Integrity

 referential constraints between tables must be enforced at all times by the Relational Database Management System

Slide

53

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Relational Database Referential

Student *Student_ID

Integrity

Student_Course *Student_ID (FK) *Course_Number *Course_Ind For

Referential Integrity

- The foreign key must match a value in the primary key of the parent table, at all times.

In this example, the Student table has a *Primary Key - Student_ID. The Student_Course table has a 3 column *Primary Key, and also has a Foreign Key (FK) of Student_ID that references the Student table. There must never be a Student_ID in the Student_Course table that does not exist in the Student table first.

Slide

54

WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam

Database Options

Consumer    Flat Files Microsoft Excel - Limit of 65,536 Rows Microsoft Access    FileMaker Pro MySQL (Open Source) Postgres (Open Source) Enterprise RDMS  Oracle   IBM/DB2 MS SQL-server      Sybase Informix Lotus Notes MySQL (Open Source) Postgres (Open Source)

Slide

55