Document 7124488

Download Report

Transcript Document 7124488

Database Management System (DBMS)
By: Dr. Mohamed Yagoub Mohamed
E-mail: [email protected]
URL: http://www.angelfire.com/mo/yagoub
Overview
 Why we need database
 Database Concept
 Types of database models
 Relational database
DBMS
• DBMS is a collection of data
(database) and programs to access
that data. The goal of DBMS is to
store, retrieve, and display
information
• Key characteristics of DBMS are:
performance, store large volume of
database, share data (access),
provide security (authorization),
remove redundancy (normalization)
and provide concurrent access
(different users at the same time).
Why we need database?
 Without database GIS is cartography
(electronic map)
 No database No spatial analysis
Why we need database?
Principles of GIS
Hardware
Information
GIS
People
Software
Database and GIS
GIS database
Attribute
DBMS
Text, Images
Sound, movie
Multimedia
Spatial data
location
Data abstraction
• Physical level: Describe how the
data are actually store (word or
bytes)
• Conceptual level: Describe what
data are actually stored in the
database (Structure). it gives
Schematic representation of
phenomena
Data abstraction-Continue
• View level: Describe only a part of the
entire database. Many users of the database
may be concerned with a subset of
information. The system may provide many
views for the same database
Data abstraction
View Level
Conceptual level
Logical Level
Different users for
the same database
Instance of schemes
• The collection of database at a particular
moment is called the instance of the
database
• The overall design of the database is called
the database scheme
Types of database models
Data model is a collection of
conceptual tools for describing data,
data relationship, data semantics, and
consistency constraints. There are
mainly three types of models
• Object-based logical models
Are used to describe data at the
conceptual and view level. Example
of these the Entity-Relationship
model and object-oriented model
Data models- Continue
• Record-based logical models
Are used to describe data at the conceptual
and view level. Example of these are:
Network model, Hierarchical model, and
relational model.
• Physical data models
Are used to describe data at the physical
level (bytes and words). It is mainly deal
with hardware.
Entity-Relationship (E-R) model
• It is based on simulation of the real world
which consists of basic objects called
entities and relationship among these
objects
• The overall logical data structure of a
database can be expressed graphically by
an E-R diagram. Which consists of
rectangle (entity), ellipse (attribute),
diamond(relationship), and lines.
Street
Number
Balance
SSN
Name
Date
Customer
1
Custom
Acct
Account
3
2
3 Tables can represent the above relation
E-R Diagram
for customer CustomAcct relationship associate
a customer with each account he has
Aggregation
Aggregation is an abstraction through which
relationships are treated as higher-level
entities i.e. express relationships among
relationships
Name
Employee
SSN
Hours
Numbe
r
Project
Work
Users
5 tables can
represent the
diagram
Type
Machinery
E-R diagram with aggregation
Generalization
In E-R diagram generalization is depicted
through a triangle labeled ISA (is a). The
attribute of higher level entity are said to be
inherited by lower level entity. e.g. both
saving and checking account inherit the
attributes of account
Balance
Account number
Account
3 tables can
represent the
diagram
Saving account
Interest rate
ISA
Checking account
Over draft
E-R diagram with generalization
E-R diagram Table
• An Entity (E) with attributes a1..an can be
represented by a table called E with n
columns for each attribute.
• Each row in this table corresponds to one
entity of the entity set E
Let D1 ==> set of all account number
D2===> set of all balance
Any row consists of 2 tuples (v1,v2) e.g. (255,3000)
The set of all possible rows is the Cartesian product
of D1 and D2 i.e. = D1 X D2
For a table with n columns the
total number of rows = D1 X D2 X ..X Dn-1 X Dn
Account_no
255
452
560
323
215
balance
3000
3222
34555
21000
456780
Mapping constraints
• Mapping cardinalities express the number
of entities to which another entity can be
associated via a relationship
• For a binary relationship set R between
entity set A and B the mapping can be oneone (1-1), one-many(1-M), many-one (M1), and many-many(M-M)
a
1
a
2
b1
b2
b3
1-M relation
Object oriented Model
The basic unit that an object-oriented (OODBMS) manages is the object. It is based on
four basic concepts of abstraction:
• Classification: Mapping of several objects
(instances) to common class
• Generalization: Group several classes which
have the same properties in common (roads,
railway)-transportation network
Object oriented Model-Continue
• Association: Relation between similar
objects is considered a higher level set
object
• Aggregation: Objects which consist of
several other objects (Composed objects)
Object oriented Model-Continue
• OO model uses objects rather than records to
manage data
• An object is a collection of data elements and
operations that together are considered a single
entity
• An object has associated with it a set of variables
that contain the data for the object, a set of
messages to which the object respond, and a
method which response to the message
Object Oriented Model-Continue
• Once the structure is setup, the details of it
need not be user visible
• This approach has the attraction that query is
very natural
• A geographic data handling systems employ
this model are:
TIGRIS, DAPLEX, and PROBE
• It is application in GIS is recommended
Object oriented Model-Continue
• Objects are typed and the format and
operations of an object instance are the same
as some object prototype
• Example of an object might be a lake:
• List of border chain: C1, C2, C3, Cn
• List of nodes: N1, N2, N3, Nn
• Attribute: Depth, soil type
Object oriented Model-Continue
For example student can be a superclass. First and second
year student may represented by a classes that are
specialization of a student class variables and methods
specific to first year students are associated with fist year
student class.
Variables and methods that apply both to first and second
year students are associated with student class.
The variables associated with each class may be:
Student: Name, ID, address
Student
First year student: Subject
Second year student:Practical course
First year
Second year
Hierarchical Model
• Based on Tree structure (child-parent)
• No element can have more than one parent
• Requires knowledge by the user of the
actual storage scheme used by the DBMS
• Examples of database are:
System2000 and IMS
• Not commonly applicable in GIS
Network Model
• Organized data in a network or plex
structure (child-parent)
• Children may have more than one parent
• The query language is procedural
• Examples of database are:
DBMS-10, DMS1100, IDMS
• Not commonly applicable in GIS
Database tree structure (Hierarchical, Network)
Root (Parent)
A1
B2
B1
C1
Child
C2
Child
Parent
C3
C4
Child
C5
C6
Relational model
• A relational database consists of a collection
of tables, each of which is assigned a unique
name
• The relational models differs from network
and hierarchical models in that it does not
use pointers or links. Instead , the relational
model relate records by the value they
contain.This freedom from the use of
pointers allows formal mathematical
foundation to be defined
• Examples of RDBMS are Oracle, Informix,
and Sybase
Reasons to use Relational Model
• Independence of the physical data storage and
logical database structure. Results in users do not
need to understand the underlying physical layout
of the data to access data from a logical structure,
such as a table
• Variable and easy access to all data. Results in
access to data is not predefined as in hierarchical
databases in which users must understand and
navigate through the hierarchy to retrieve data
• Flexible in database design. i.e complex objects
are expressed as simple tables and relationships
• Applying relational design methods reduces data
redundancy (Normalization) and storage
requirements
Relational DBMS
• Aspects of an RDBMS
– Structures: Well defined objects
– Operations: Clearly defined actions
– Integrity Rules: Rules that control which
operations are allowed on the data and
structures of the database
Relational DBMS
• Components of a Relational Database
– Table: collection of rows all containing the
same columns
– Row: Horizontal components of a table.
Consists of values for each column. Each row is
equivalent to a record
– Column: Vertical component of a table. Each
column in the record is often referred to as a
field
Relational DBMS
• Relational Database Rules
– Each column in a table must be unique
– The order of the rows in a table is not
meaningful
– The order of the columns in a table is not
meaningful
– All data in a column must be the same type
– Every table has a primary key, each column in
the primary key must have a value
Relational DBMS
• Primary Key and Foreign Key
– Relational database use primary keys and
foreign keys to allow mapping of information
from one table to another
– A foreign key is column or group of columns in
a table whose value matches those of the
primary key of another table
– Values in primary key column must be unique
e.g. social security number (SSN)
Relational DBMS
• Primary Key and Foreign Key
– Referential Integrity refers to the integrity of
the reference from the primary key in one table
to a foreign key in another table.
Relationships between Tables
•
•
•
•
One-to-One
One-to-Many
Many-to-One
Many-to-Many
Relational DBMS
• Relational Database Example (1-1)
Weather table
city_name
Washington
Amsterdam
Warsaw
Tokyo
Washington
measurement_dt avg_temp
05-01-94
70
05-01-94
47
05-01-94
43
05-01-94
60
05-01-94
55
Foreign Key
Primary Key
Location table
city_name
Seattle
Amsterdam
Warsaw
Tokyo
Washington
Primary Key
country_name
United States
Neatherlands
Poland
Japan
05-01-94
One-to-One
NAME
Caroline
Talbot
Dorchester
Wicomico
Worcester
Somerset
AREA
POP1990
331.8066
27035
238.2847
30549
534.1747
30236
383.3481
74339
457.6503
35028
269.267
23440
NAME
Caroline
Talbot
Dorchester
Wicomico
Worcester
Somerset
NAME
POP1997
Caroline
29424
Talbot
32565
Dorchester
29953
Wicomico
79716
Worcester
41885
Somerset
24251
AREA
POP1990 POP1997
331.8066
27035
29424
238.2847
30549
32565
534.1747
30236
29953
383.3481
74339
79716
457.6503
35028
41885
269.267
23440
24251
Relational DBMS
• Relational Database Example (1-M)
Complexes table
comp_name
Kotraya
H.Plaza
Komtar
Primary Key
N.Shops
444
555
622
N.Banks
70
47
43
Owner table
comp_name
Kotraya
Kotraya
Kotraya
Kotraya
Kotraya
Foreign Key
shop_owner
ALi
Tan
Lee
Raman
Nora
One-to-Many
Dept.
Agri
Agri
Nat Sci
Dept.
Agri
Nat Sci
Agri
Agri
Agri
Nat Sci
Agri
Agri
Advisor
Shorter
Ruby
Mollett
Shorter
Shorter
Ruby
Mollett
Mollett
Advisor
Shorter
Mollett
Ruby
Name
Aaron
Brian
Devin
Eric
Julie
Leigh
Ronnie
Ryan
Advisor
Shorter
Ruby
Mollett
Shorter
Shorter
Ruby
Mollett
Mollett
Name
Aaron
Brian
Devin
Eric
Julie
Leigh
Ronnie
Ryan
Many-to-One
SITE
WV1
WV2
MD1
MD2
SITE
WV1
WV2
MD1
MD2
AREA
331.8066
238.2847
534.1747
383.3481
Veg Type
D1
C2
D1
S3
AREA
331.8066
238.2847
534.1747
383.3481
Veg Type Name
C2
Forest Coniferous
D1
Forest Deciduous
S3
Wetland Swamp
Dominant
pH of Soil
Scrub Pine
6.5
Maple
8
Skunk Cabbage
5
Veg Type Name
Dominant
pH of Soil
D1
Forest Deciduous Maple
8
C2
Forest Coniferous Scrub Pine
6.5
D1
Forest Deciduous Maple
8
S3
Wetland Sw am p Skunk Cabbage
5
Data Definition Language (DDL)
and Data Manipulation Language (DML)
• DDL store files that contains data about data
(metadata). For example storage of structure
in data dictionary
• DML enable users to access or manipulate
data (retrieval, insertion, deletion). The part
of DML that involves information retrieval
is called a query language (QL)
Types of DML
• There are two types of DML, procedural
and non procedural
• Procedural DML: Require the user to
specify what data is needed and how to get
it
• Non-procedural DML: Require the user to
specify what data is needed without specify
how to get it
Query language (QL)
• QL is the language in which a user requests
information from the database. Example of
QL are relational algebra (procedural) and
tuple relational calculus (nonprocedural)
• The most common query languages are
Structured Query Language (SQL),Query
By Example (QBE), and Quel
• SQL has gain wide acceptance in
commercial products
Relational algebra (RA)
• RA is a procedural query language. It
consists of a set of operations that take one
or two relations as input and produce a new
relation as a result
• The fundamental operations in RA are
select(6), project(II), Cartesian product(X) ,
rename, union(U), and set difference(-)
• Other operations include intersection,
natural join, division, and assignment
• To select those tuples (rows) of the AccCust
relation where customer name is “John” it
could be written as:
6name = “John” (AccCust)
The results may be one or more records
or street = “ spring”

6street = “Spring” (AccCust)
• Comparisons can done using =, >, <, >=,etc.
• Let E1 and E2 be relational algebra
expressions. Then the following are all
relational algebra expressions:
• E1 U E2
union
• E1 - E2
set difference(-)
• E1 X E2
Cartesian product
Structured Query Language (SQL)
• SQL is the standard relational database
language
• SQL include commands not only restricted
to query but to other functions such as
defining relation, deleting relations, creating
indices, and modifying relation scheme,
access right, integrity, and transaction
control
• Basic structure of an SQL expression
consists of three clauses: Select, from, and
where
• A typical SQL query has the form
select A1, A2, ..., An
from r1, r2, .., rm
where P
Each Ai represents an attribute and each
ri a relation. P is a predicate (selection). The
list of attributes A1..An can be replaced by
(*) to select all attributes
• The result of an SQL query is a relation
(table)
• The keyword “distinct” is used after select
to force the elimination of duplicates. For
example, find all customer names having a
balance equal $ 6000
select distinct customer-name
from AccCust
where balance = 6000
• SQL supports union, intersect, and minus
• SQL uses logical connectives and, or, and
not and operators (+, -, *, /) on values from
tuples
e.g.
select distinct customer-name
from AccCust
where balance between 600 and 700
Setting up User Accounts and Privileges
• Typical Roles and Privileges
User 1
User 2
User 3
User 4
Role 1
create table create view alter table
select
update
Creating Tables
– In a relational database, data is stored in
tables
– Users with connect privileges can create
tables
– The creator of the table becomes the owner
– Users cannot access the tables unless
permission is explicitly given
– Users can modify the database by deletion
i.e. remove selected tuples, insertion i.e.
insert tuples, or update i.e change a value in
a tuple without changing all values in the
tuple
Field types
•
•
•
•
•
INTEGERS e.g. number of population, age
REAL (DECIMAL) e.g. income, salary
CHARACTER e.g. names, description
DATES e.g. date of flood
IMAGES (Multimedia) e.g. image of flood
(*.tif)
• SOUND e.g. Sound of thunder (*.wav)
• MOVIE e.g. recording film (*.avi, *.mov)
• Granting Object Privileges
– Only owners of a database object can grant
object privileges to other users
– Only users with DBA privileges can grant
object privileges to roles
Creating and Displaying Indexes
• Indexes are optional structures that can be
created on any column or set of columns in
a table to speed up viewing and retrieval of
data rows.
• An index is a list of keywords with the
location of the keyword information
• Users can create indexes
Index Concept
– Index is conceptually similar to an index at
the end of a book or cataloging books in a
library
– Database index is not a physical structure but
a logical structure. The RDBMS, not the user
uses the database index. The DBMS chooses
the best index to use in the table.
– Index can dramatically speed up data access
but will require additional space and may
slow down certain SQL operations
Exporting and Importing Data
• Export and Import utilities allows data to be
moved in and out of the native database. For
Example, converting database tables to
other format such Comma Delimited
Format (CDF), Space Delimited Format
(SDF), *.dbf, and ASCII flat files (Text)
Relational Database design
In the database design considerations must be
paid to:
• Repetition of information (Normalization)
• Loss of information
• Number of fields and records required,
some RDBMS supports limited number of
fields and records (e.g. 255 fields)
• Nature of database, some RDMBS supports
only textual data, some supports multimedia
(sound, image, etc.)
Database quality
Check accuracy of attribute database by
Performing quality assurance / quality
control (QA/QC) on the database by:
• Check correctness of data
• Check mis-typing errors
Storage Capacity Terminology
Symbol Term
K
Kilobyte
M
Megabyte
G
Gigabyte
Bytes
Actual Bytes
1000
1024
1,000,000
1,048,576
1,000,000,000
1,073,741,824
NCGIA
Boolean expressions
• AND
• OR
• NOT
AND
A AND B =
T
T
T
F
F
T
F
F
Result
T
F
F
F
Boolean Operators
both expressions are true
Elevation >= 100 AND Veg Type = conif
Elevation Veg Type Rainfall
100
Mixed
1
250
conif
3
50
conif
2
225
decid
1
pH
7.0
6.5
4.0
8.0
OR
A OR B =
T
T
T
F
F
T
F
F
Result
T
T
T
F
Boolean Operators
at least one expression is true
Rainfall > 2 OR Veg type = decid
Elevation Veg Type Rainfall
100
Mixed
1
250
conif
3
50
conif
2
225
decid
1
pH
7.0
6.5
4.0
8.0
NOT
Negates the Boolean value.
Returns TRUE if the Boolean is FALSE
Returns FALSE if the Boolean is TRUE
Boolean Operators
Excludes information
NOT pH <= 6.5
Elevation Veg Type Rainfall
100
Mixed
1
250
conif
3
50
conif
2
225
decid
1
pH
7.0
6.5
4.0
8.0
Operators
- equals
- greater than
- less than
- not equal to
Operators
- greater than or equal to
- less than or equal to
- expressions enclosed in
parentheses are evaluated first
Query Syntax
[]
around field names
_
in place of spaces in field names
“ ” around string values
()
grouping and evaluation order
*
multiple character wildcard
?
Single character wildcard
Date format
yyyymmdd
Query Syntax Examples
([Roof_type] = "tile”)
([Owner_age] > 65 and [Income] < 20000 )
([Type_use] = "res*" and [Const_date] < 1955)
([State_name] = “A*”)
([Name] = “?athy”)
([Date] = 19991027)
Common Query Syntax Errors
[Roof_type] = "tile”)
([Owner_age] > 65) and [Income] < 20000 )
([Type_use] = res*)
([Const_date] < “1955”)
Query Syntax
Salisbury
Hebron
Princess Anne
Willards
Correct Query Syntax
([City] = “Salisbury” OR [City] = “Princess Anne” OR
[City] = “Hebron” OR [City] = “Willards”)
Incorrect Query Syntax
([City] = “Salisbury” OR “Princess Anne” OR
“Hebron” OR “Willards”)
References
• Bailey, T. C. (1994). “A review of statistical spatial
analysis in geographical
information systems.”
In: Fotheringham, A. S. and P. A. Rogerson (eds.).
Spatial analysis and GIS. Taylor & Francis Ltd.,
London, UK.
• Batini, C. S. and S. B. Navathe (1992). Conceptual
database design. The Benjamin/ Cummings
Publishing, California, USA.
• Hoffer, J. A., J. S. Valacich and J. F. George (1996).
Modern systems analysis and design. The
Benjamin/Cummings Publishing, California, USA.
• Michael, A. and C. Smith (1996). Teach yourself
database programming with Visual Basic 4 in 21
days. Sams Publishing, Indiana, USA