Databases and Web-programming RW 334 Database Systems

Download Report

Transcript Databases and Web-programming RW 334 Database Systems

Databases and Web-programming RW 334 Database Systems

Willem Visser [email protected]

or [email protected]

Main Engineering A517 Computer Science Stellenbosch University

Overview

• • • • • Why and How?

Historical View of Database systems Data Models Relational Model Much of the source for these notes comes from “

Database Systems: Design, Implementation & Management

– Rob, Coronel and Crockett

• •

Data versus Information

Data are raw facts Information is the result of processing raw data to reveal its meaning

DATA INFORMATION

• • • • •

Business Survival

Data is required to derive knowledge Information comes from processing data Information reveal the meaning of data Decision making requires good information Good decision making is key to business survival Data Management is the practice of proper generation, storage and retrieval of data and is the cornerstone on which any successful business is built

Database Systems

• • Database is a shared and integrated computer structure that stores: –

Raw data

Metadata

• i.e. data about data, or a description of the structure of the raw data A database management systems (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database

Advantages of DBMS

• • • •

Better data integration

– If designed correctly!

– One part of the company knows what the other is doing

Minimizes data inconsistency

– If designed correctly!

– If one part of the company knows what the other is doing

Improved data access

– If designed correctly!

– Efficient data queries Good database systems leads to –

Improved decision making

Increased productivity

• •

Types of Databases

Many dimensions – Such as single user, multiple user, centralized or distributed, etc.

• From a DB design point we will consider multiple user and centralized Two fundamental types –

Operational

• Supports companies day to day business • We will focus almost exclusively on these • Also called production databases –

Analytical

• Required to make business decisions • Sometimes these are called data warehouses (more about that later in the course) and they store aggregations of data from various operational DBs typically over time • Of course sometimes the operational DB is used as if it is an analytical DB, but that can be very dangerous (more about that later as well)

Database Design

• Is more important than you can ever imagine • • • Bad design leads to bad data Bad data leads to bad information Bad information leads to bankruptcy • That is why database designers earn BIG bucks

• • • • •

History of Databases

Manual File systems to Computer File systems File System Shortcomings – Extensive programming!

– Difficult to make any changes or ad hoc queries – No structural independence • Change the structure and lots of dependents need to change – No data independence • Change the data and lots of dependents need to change Biggest Issue – must tell system not just what to do, but also how to do it Fosters data redundancy – Same data in many places Causes data inconsistency – Changing some versions of the data but not others

Database Systems

• • • • •

Hardware Software

– OS, DBMS, Application programs

People

– System administrators • Oversee general operations – Database administrators (DBA) • Manages DBMS – DB Designers • Design the structure of the DB – Programmers • Build the applications that use the DB – End Users

Procedures

– How to use the DB in the organization

Data

Database Management Systems

• • • Level of abstraction to hide the complexities of the data storage and retrieval –

Ensures data and structural independence

Hides difference between logical (what user sees) and physical (what computer sees) data format

Data storage management

– Data is stored in very complex ways under the hood and is hidden from the user, until you need to optimize the performance!

Security management

– Allow users to have different access

Database Management Systems

• • • •

Concurrency control

– Important in multi-user DBs to keep consistency

Backup and Recovery

– Disks fail!!!

Data Integrity

– Minimizes redundancy and maximizes consistency

Easy Access

– Efficient Query languages and programming language access support – Good connectivity to other environments

DBMS The Dark Side

• • • •

Increased Costs

– Some of these DBMS systems are VERY expensive and run on expensive hardware

Management Complexities

– DBAs are very well paid, for good reason!

Maintenance

– Must keep the system up to date and patches are frequent

Vendor dependence

– A killer! – Can become very expensive • How many boats do Larry Ellison own again?

Data Models

• • • How to represent real world entities in computer accessible form –

Abstraction of complex object or event

Database Environment – Data structures and their characteristics – – Relationships Constraints – Transformations

Models help one see the big picture

– Always true not just in database design

Data Models Consists of

• • • •

Entities

– About which data is to be collected – People, places, objects, events,…

Characteristics

– Of the entity

Relationships

– Between entities – – One to one, 1-1 One to Many, 1-* – Many to Many, *-*

Constraints

– On the entities, characteristics and relationships

Business Rules

• •

Is a brief, precise and unambiguous description of a policy, procedure or principle within an organization

– “ Students may use many books ” only one student ” • • Two entities: STUDENT and BOOK and 1-* Relationship (1 = student, * = book) “ A book is a used by – “ A class must have at least 5 students and no more than 30 ” • Constraint , two entities and a relationship Business Rules can be most easily discovered through targeted interviews – Iteratively! Not everyone will agree on the rules…at first

• • • • • • •

Rules into Data

Noun becomes Entity Verb becomes a Relationship “ A customer may generate many invoices ” – – Nouns: Entities CUSTOMER and INVOICE Verb: Relationship “ GENERATE ” Relationships are bidirectional – How many instances of B are related to one instance of A?

– How many instances of A are related to one instance of B?

How many classes can one student enroll in?

– Many How many students can be enrolled in one class?

– Many Relationship between Student and Class is: – Many to Many, *-*

History of Data Models

• • • • •

File System

– 1960s

Hierarchical and Network Models

– 1970s

Relational

– 1970-present

Object-Oriented Extended Relational

– Mid 1980s-present

Unstructured (XML format)

– Extended Relational – 2000s - future

Hierarchical Model

Parent Children

• • • •

Parent to Children is a One to Many (1-*) Relationship

Children refine the Parent Entity • AGENT (parent) represents various ENTERTAINERS (children) Benefits: • Fast retrieval and referential integrity (every child must be linked to parent) Problems: • Rigid structure that is hard to change, i.e. no structural independence

Not easy to present *-* relationships, can lead to redundant and inconsistent data

Network Model

• • •

A child can have more than one parent

Major drawback of both Hierarchical and Network model –

No structurally independent way of extracting information!

– And of course any change in the systems was a major headache These models were abandoned in favor of the relational model in the 1980s

Relational Model

Edgar F. Codd (IBM) published the following paper in 1970 – “ A Relational Model of Data for Large Shared Databanks ” – Considered Ingenious but impractical…too much machine overhead…computers became faster and the problem went away

Relational Model

• •

Collection of Tables

– Also called a Relation – – – – Row/Column matrix Each column is called a field Each row entry is called a record (or tuple)

Tables can have common fields that relate their entries

• STUDENT ’ s Number and LIBRARYBOOK ’ s Number can show which student has a library book checked out Relational Database Management System (RDBMS) hides the real complexity of the model from the user –

User sees the logical view and the physical view is completely hidden

This is what made it so popular…and the query language, SQL, of course

– SQL is called declarative, since it says what to do, but not how to do it

• • • • •

Entity Relationship (ER) Model

Designing a database can be complex and a notation for doing it graphically was required Peter Chen introduced the ER data model in 1976 – Shown in an Entity Relationship diagram Unified Modeling Language (UML) can be used – Specifically the class diagram part without method descriptions Entities Relationships – 1 (one) – 0..* (many)

Object Oriented Data Model

• • • You all know this one!

Some databases that use this as its data model Mostly this model makes sense when combined with the Relational Model – Also called the Extended Relational Model

Summary

• • •

Data Models are abstractions

– Get them from business rules – Consists of entities, attributes, relationships and constraints Relational Model is most popular data model in the database world – Tables are related through common attributes – ER models are popular for DB design Next, Relational Model in detail