Transcript Document
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
WFM 5201: Data Management and Statistical Analysis
Lecture-07: Database Management System
Akm Saiful Islam Institute of Water and Flood Management (IWFM) Bangladesh University of Engineering and Technology (BUET)
June, 2008 Slide
1
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Outline
Database Management System
Introduction to Databases File System Vs. Databases Advantages of using databases Data Models – Hierarchical, network, relational, object oriented Overview of Relational Database
Slide
2
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Introduction to Databases
Information Systems process and manage data.
Data Management involves “Capturing”, “Retrieval,” and “Storage” of data.
Database Management Systems (DBMSs) are Computer systems that manage data in databases.
Today’s DBMSs are based on sophisticated software and powerful computer hardware.
Well known DBMS software includes ORACLE, Microsoft SQL Server, Sybase and MySQL(free download) among others.
Slide
3
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
File Organisation
Sequential Files
records are stored in a fixed sequence records can only be read in that sequence, starting from the first record records can only be added at the end of the file (append) sequential files are not efficient
Indexed Files
Use an index to access records in a random fashion.
Records can be sorted according to an attribute or preference. (e.g Alphabetically, Ascending, Descending, etc.) Indexed files are efficient, and faster to access.
Slide
4
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
The File Systems Approach
General Ledger File Production Planning File Invoicing File Despatch File Personnel File Payroll File Inventory File Order Entry File Redundant Data Storage.
One file is used in each application.
No data sharing.
Cross-application transfers are difficult to manage and achieve.
File Systems are rarely used for data processing anymore.
Slide
5
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
The Database Approach
General Ledger Personnel Production Planning Invoicing Despatch Payroll Inventory Order Entry Compactness. Data is stored in a single logical “place.” Data can be shared and related between applications Data transfer between applications is easier Used for a wide range of applications.
Slide
6
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Database Characteristics
• Amount Database size depends on the number of records or files it contains.
Complexity Database complexity depends on the number of relations between the files.
Volatility A measure of the changes typically required in a given period of time.
Immediacy A measure of how rapidly changes must be made to data.
Slide
7
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Advantages of using a Database Approach
Flexible Data Access
. DBMSs have various tools to manipulate, query, or report data, such as Structured Query Language (SQL), and Report Generators. Hence: Selected data is easily retrieved A DBMS can accommodate different data views for different users
Improved Data Integrity
. Modern DBMSs consist of various tools and methods to: ensure that data is correct, consistent, and current verify data input and check whether data is ‘reasonable’.
Slide
8
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Advantages of DBs (continued)
Improved Data Security
. Tools such as password access, and encryption, ensure that data is not: deliberately or accidentally damaged or changed accessed without proper authorisation
Data Independence
.
Problems arising from the interdependence of data and programs are kept to a minimum.
Reduced Data Redundancy.
Single version of the truth.
Efficient data storage.
Efficient time management of Hardware (CPU), programmer(s), analyst(s) and user(s).
Relational DBs use Normalisation to reduce data redundancy.
Slide
9
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Advantages of DBs (continued)
Ability to Share and Relate Data
.
Different user groups can use the same data.
Data in different (physical or logical) parts of the system can be related for a certain application.
Standardisation of Data
.
In general data items have common names and storage format.
Increased Productivity
. The various tools reduce the complexity that is otherwise associated with DB maintenance when changes are required to the system. For example Law changes, Economy Changes, User Changes.
Slide
10
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Costs of Database Approach
The implementation and use of DBMSs is normally associated with various costs. Such as: Initial expenses involve planning costs, and consultancy fees.
Computer hardware costs.
Software costs.
Database Administrator costs, and staff training costs.
Conversion costs of an existing system.
Various operational costs.
Slide
11
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Data Models
1. Hierarchical 2. Network 3.
Relational 4. Object
Slide
12
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
1. Hierarchical Model
Stores data as hierarchically related to each other. Record shape are tree structure.
BUET CE Faculty of Civil Engineering WRE URP Faculty of Architectural Archit.
Slide
13
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Hierarchical Database Model
Slide
14
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Hierarchical Database Model
Logically represented by an upside down tree
Each parent can have many children Each child has only one parent
Slide
15
WFM 5201: Data Management and Statistical Analysis © © Dr. Akm Saiful Islam
Hierarchical Model
Several records or files are hierarchically related with each other. For example, an organization has several departments, each of which has attributes such as name of director, number of staffs, annual products etc.
Each department has several divisions with attributes of name of manager, number of staffs, annual products etc. Then each division has several sections with attributes such as name of head, number of staff, number of PCs etc.
Slide
16
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Advantage and Disadvantages of Hierarchical Model
Advantages High speed access to large databases Easy to update- (to add or delete new nodes) Disadvantages Links are only possible in Vertical Direction (from top to bottom) but not for horizontal or diagonal unless they have same parents.
For example, it is hard to find what is the relation between URP and DCE from this data model.
Slide
17
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
2. Network Database Model
Doesn’t force data into hierarchical levels Owner/Member relationships: Owner record type Member record type Each owner may have one or more member types Each member type and corresponding owner record type form set, which represents relationship
Slide
18
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Network Database Model
Slide
19
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Network Database Model
Each record can have multiple parents Composed of sets - relationships Each set has owner record and member record Member may have several owners A set represents a 1:M relationship between the owner and the member Figure 1.10
Slide
20
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
3. Relational Model
Based on two important concepts:
Student ID Student
Table
Name
1 Mr. X
Key of relation
- one to one, one to many, many to many
Primary attribute
– which can’t be duplicate 2 3
Cour seID
Mr. X Mr. Y
Course
table
Title
Student Table * * Course Table Many to many relationship 001 002 003
CourseID
001 002 003
Cre dit
RS & GIS in WM 3 Watershed Hydrology 3 Risk Management 3
Slide
21
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Relational Database
Relational database is the most popular model for GIS. For example, the following relational database softwares are widely used. - INFO in ARC/INFO - DBASE III for several PC-based GIS - ORACLE for several GIS uses In a relational model, the following two important concepts should be defined.
Key of relation ;
a subset of attributes Unique identification ; e.g. the key attributes is a phone directory in a set of last name, first name and address. non redundancy ; any key attribute selected and tabulated should keep the key's uniqueness. e.g. address can not be dropped from telephone address, because there may be many with the same names.
Prime attribute :
an attribute listed in at least one key.
The most important point of the relational database design is to build a set of key attributes with a prime attribute, so as to allow dependence between attributes as well as to avoid loss of general information when records are inserted or deleted.
Slide
22
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Relational Database Model
Slide
23
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Relational Database Model
Figure 1.11
Slide
24
SQL
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
What is it?
S
tructured
Q
uery
L
anguage Used in ORACLE and other DB systems Non-procedural - i.e. Specify what you want not how to get it SQL - (also pronounced SEQUEL) directly related to the development of the RELATIONAL MODEL by E.F.Codd.
Slide
25
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
SQL
SQL is used to perform query in relations databases.
For example, find the name of the student who took more than or equal to 6 credit hour in this term
SELECT Student.Name, Course.Credit FROM Student, Course WHERE Student.CourseID = Course.CourseID AND Credit >= 6
The answer is : Mr. X 6
Slide
26
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Find the relationship between this two tables in the BUET Library
Book Table
ISBN 050 060 Title Applied Hydrology Irrigation Author David Maidmen Cheng ID
Borrow Table
Name 1 2 3 Mr. P Mr. Q Mr. R ISBN 050 060 070 One to one Many to Many One to Many
?
Slide
27
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Normalization of an Un-normalized Table to relational database
Slide
28
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Advantage of Relational Database
Advantages there is no redundancy.
type of building of an owner can be changed without destroying the relation between type and rate.
a new type of building for example "Clay" can be inserted. (row insert is easy).
Disadvantages Require a number of tables and relationship Its difficult to add a new column in the table.
Slide
29
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
4. Object Databases
Current generation systems have a need to handle complex data for complex applications such as computer aided design computer aided software engineering geographic information systems interactive web sites Relational systems are inadequate for these systems Why do you think this is?
Slide
30
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Object Database Types
Object-oriented
extend a programming language such as Java with persistency and a query language
Object-relational
extend a current RDBMS (e.g. Oracle) with object-oriented extensions
Slide
31
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Object Oriented Model
BUET Is a CE Part of Departments Is a Is a URP WRE Is a = Inheritance Part of = association Part of DCE Institutes IWFM AIT Attributes: Faculty, Staff, Students
Slide
32
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Object Oriented Database
An Object Oriented model uses functions to model spatial and non-spatial relationships of geographic objects and the attributes.
An object is an encapsulated unit which is characterized by attributes, a set of orientations and rules. An object oriented model has the following characteristics.
generic properties
relationship.
: there should be an inheritance
abstraction
generated by classification, generalization, association and aggregation.
: objects, classes and super classes are to be
adhoc queries
language.
: users can order spatial operations to obtain spatial relationships of geographic objects using a special
Slide
33
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Example of Object Oriented Model
Slide
34
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
5. Object-Relational Database Model
Object-relational database management systems (ORDBMS): Combine: Ability of object technology to handle advanced relationship types Data integrity, reliability, and recovery features of relational models Most popular and powerful of modern database system applications Oracle, Microsoft SQL Server
Slide
35
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Object-Relational Database Table
Slide
36
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Overview of Relational Database
Slide
37
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
What is a Relational Database?
A database is more than just a collection of information - such as student and course information, faculty and grades.
A database is a representation of the people and things your business needs to operate, and the way those people and things relate to each other.
A database system supports the business rules defined by the customer.
Slide
38
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Logical to Physical Database Design
The
Entities
in the
Logical Data Model
are translated into
Tables
database design in the physical The entity
attributes
become
columns
of each table in the database Data type (numeric, character, date) Business rules for the legal values for the column (the domain of the column)
Slide
39
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Data Models
A
data model
is a collection of concepts for describing data.
A
schema
is a description of a particular collection of data, using the given data model.
The
relational model of data
is the most widely used model today.
Main concept:
relation
, basically a table with rows and columns.
Every relation has a
schema
, which describes the columns, or fields.
Slide
40
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Example: University Database
Conceptual schema:
Students(sid: string, name: string, login: string, age: integer, gpa:real) Courses(cid: string, cname:string, credits:integer) Enrolled(sid:string, cid:string, grade:string)
Physical schema:
Relations stored as unordered files. Index on first column of Students.
External Schema (View):
Course_info(cid:string,enrollment:integer)
Slide
41
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Instance of Students Relation
Students
(
sid
: string,
name:
string,
login:
string,
age:
integer,
gpa:
real )
sid
53666 53688 53650
name
Jones Smith Smith
login
jones@cs
age
18 smith@ee 18 smith@math 19
gpa
3.4
3.2
3.8
Slide
42
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Levels of Abstraction
Many
external schemata,
single
conceptual(logical) schema
and
physical schema
.
External Schema 1 External Schema 2 External Schema 3 Conceptual Schema External schemata describe Physical Schema how users see the data. Conceptual schema defines logical structure Physical schema describes the files and indexes used.
* Schemas are defined using DDL; data is modified/queried using DML .
Slide
43
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Database Terminology
Tables
within a relational database hold sets of data using rows and columns
Rows
(records) appear horizontally in a report, and contain one or more columns
Columns
(fields) are named data elements and appear vertically in a report
Primary Keys
identify uniqueness in a row
Indexes
are created for faster access to the data in the database
Slide
44
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Basic Database Concepts
u
Table
A set of related records
Record
–
A collection of data about an individual item Name: Barry Harris College: Medicine Tel: 392-5555 Name: Barry Harris College: Medicine Tel: 392-5555
u
Field
–
A single item of data common to all records Name: Barry Harris Slide
45
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
An Example of a Table
Fields Records Name
Graff Harris Ipswich
GatorLink
rgraff bharris zipswich
Phone
392-3900 392-5555 846-5656
College
Pharmacy Medicine PHHP
Slide
46
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Different parts of a database
Fields –
different types of data (number or text)
Records
Queries
Reports
Slide
47
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Concepts pf Relational Database
Based on two important concepts:
Key of relation
- one to one, one to many, many to many.
Primary attribute
be duplicate – which can’t
Slide
48
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Primary Key
The column or set of columns that provide the
uniqueness
for the row.
A table can have only
one primary key.
Existing values in primary key columns may not be modified (insert new value and then delete old value) The table of a relationship containing the primary key is called the Parent Table.
Slide
49
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Foreign Keys
A primary key referenced from another table is called a foreign key For each foreign key value, there must be a row in a table whose primary key has the
same value
.
The foreign key can be made up of one or more columns of a table but must match the primary key it is referencing A table can have any number of foreign keys.
Slide
50
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Primary Keys & Foreign Keys
Name
Graff Harris Ipswich
User
rgraff bharris zipswich
Phone
392-3900 392-5555 846-5656
College
Pharmacy Medicine PHHP To ensure that each record is unique in each table, we can set one field to be a
Primary Key
field.
A Primary Key is a field that that will contain
no duplicates
and
no blank values
.
Foreign Keys link to data in other tables Slide
51
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Relationship Types
One-to-One
: relationship is single valued in both directions A manager manages one department; a department has only one manager.
One-to-Many
in the parent table is associated with many rows in the dependent table. : relationship is multi-valued in one direction - one row One department has many employees.
Many-to-Many
: relationships are multi-valued in both directions. This type of relationship can be expressed in a table with a column for each entity. (crosswalk table) An employee can work on more than one project, and a project can have more than one employee assigned. Employee, Project, and Employee/Project tables.
Slide
52
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Data Integrity
For a table to have
Domain Integrity
the value of each column of data is meaningful and acceptable in the business environment, and passes all the edits we impose on it.
For a table to have
Association Integrity
the relationship between two or more columns in that table satisfies a pre-defined business association.
For a table to have
Referential Integrity
referential constraints between tables must be enforced at all times by the Relational Database Management System
Slide
53
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Relational Database Referential
Student *Student_ID
Integrity
Student_Course *Student_ID (FK) *Course_Number *Course_Ind For
Referential Integrity
- The foreign key must match a value in the primary key of the parent table, at all times.
In this example, the Student table has a *Primary Key - Student_ID. The Student_Course table has a 3 column *Primary Key, and also has a Foreign Key (FK) of Student_ID that references the Student table. There must never be a Student_ID in the Student_Course table that does not exist in the Student table first.
Slide
54
WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
Database Options
Consumer Flat Files Microsoft Excel - Limit of 65,536 Rows Microsoft Access FileMaker Pro MySQL (Open Source) Postgres (Open Source) Enterprise RDMS Oracle IBM/DB2 MS SQL-server Sybase Informix Lotus Notes MySQL (Open Source) Postgres (Open Source)
Slide
55