Transcript Databases and Web-programming RW 334 Database Systems
Databases and Web-programming RW 334 Database Systems
Willem Visser [email protected]
Main Engineering A517 Computer Science Stellenbosch University
Overview
• • • • • Why and How?
Historical View of Database systems Data Models Relational Model Much of the source for these notes comes from “
Database Systems: Design, Implementation & Management
”
– Rob, Coronel and Crockett
• •
Data versus Information
Data are raw facts Information is the result of processing raw data to reveal its meaning
DATA INFORMATION
• • • • •
Business Survival
Data is required to derive knowledge Information comes from processing data Information reveal the meaning of data Decision making requires good information Good decision making is key to business survival Data Management is the practice of proper generation, storage and retrieval of data and is the cornerstone on which any successful business is built
Database Systems
• • Database is a shared and integrated computer structure that stores: –
Raw data
–
Metadata
• i.e. data about data, or a description of the structure of the raw data A database management systems (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database
Advantages of DBMS
• • • •
Better data integration
– If designed correctly!
– One part of the company knows what the other is doing
Minimizes data inconsistency
– If designed correctly!
– If one part of the company knows what the other is doing
Improved data access
– If designed correctly!
– Efficient data queries Good database systems leads to –
Improved decision making
–
Increased productivity
• •
Types of Databases
Many dimensions – Such as single user, multiple user, centralized or distributed, etc.
• From a DB design point we will consider multiple user and centralized Two fundamental types –
Operational
• Supports companies day to day business • We will focus almost exclusively on these • Also called production databases –
Analytical
• Required to make business decisions • Sometimes these are called data warehouses (more about that later in the course) and they store aggregations of data from various operational DBs typically over time • Of course sometimes the operational DB is used as if it is an analytical DB, but that can be very dangerous (more about that later as well)
Database Design
• Is more important than you can ever imagine • • • Bad design leads to bad data Bad data leads to bad information Bad information leads to bankruptcy • That is why database designers earn BIG bucks
• • • • •
History of Databases
Manual File systems to Computer File systems File System Shortcomings – Extensive programming!
– Difficult to make any changes or ad hoc queries – No structural independence • Change the structure and lots of dependents need to change – No data independence • Change the data and lots of dependents need to change Biggest Issue – must tell system not just what to do, but also how to do it Fosters data redundancy – Same data in many places Causes data inconsistency – Changing some versions of the data but not others
Database Systems
• • • • •
Hardware Software
– OS, DBMS, Application programs
People
– System administrators • Oversee general operations – Database administrators (DBA) • Manages DBMS – DB Designers • Design the structure of the DB – Programmers • Build the applications that use the DB – End Users
Procedures
– How to use the DB in the organization
Data
Database Management Systems
• • • Level of abstraction to hide the complexities of the data storage and retrieval –
Ensures data and structural independence
– Hides difference between logical (what user sees) and physical (what computer sees) data format
Data storage management
– Data is stored in very complex ways under the hood and is hidden from the user, until you need to optimize the performance!
Security management
– Allow users to have different access
Database Management Systems
• • • •
Concurrency control
– Important in multi-user DBs to keep consistency
Backup and Recovery
– Disks fail!!!
Data Integrity
– Minimizes redundancy and maximizes consistency
Easy Access
– Efficient Query languages and programming language access support – Good connectivity to other environments
DBMS The Dark Side
• • • •
Increased Costs
– Some of these DBMS systems are VERY expensive and run on expensive hardware
Management Complexities
– DBAs are very well paid, for good reason!
Maintenance
– Must keep the system up to date and patches are frequent
Vendor dependence
– A killer! – Can become very expensive • How many boats do Larry Ellison own again?
Data Models
• • • How to represent real world entities in computer accessible form –
Abstraction of complex object or event
Database Environment – Data structures and their characteristics – – Relationships Constraints – Transformations
Models help one see the big picture
– Always true not just in database design
Data Models Consists of
• • • •
Entities
– About which data is to be collected – People, places, objects, events,…
Characteristics
– Of the entity
Relationships
– Between entities – – One to one, 1-1 One to Many, 1-* – Many to Many, *-*
Constraints
– On the entities, characteristics and relationships
Business Rules
• •
Is a brief, precise and unambiguous description of a policy, procedure or principle within an organization
– “ Students may use many books ” only one student ” • • Two entities: STUDENT and BOOK and 1-* Relationship (1 = student, * = book) “ A book is a used by – “ A class must have at least 5 students and no more than 30 ” • Constraint , two entities and a relationship Business Rules can be most easily discovered through targeted interviews – Iteratively! Not everyone will agree on the rules…at first
• • • • • • •
Rules into Data
Noun becomes Entity Verb becomes a Relationship “ A customer may generate many invoices ” – – Nouns: Entities CUSTOMER and INVOICE Verb: Relationship “ GENERATE ” Relationships are bidirectional – How many instances of B are related to one instance of A?
– How many instances of A are related to one instance of B?
How many classes can one student enroll in?
– Many How many students can be enrolled in one class?
– Many Relationship between Student and Class is: – Many to Many, *-*
History of Data Models
• • • • •
File System
– 1960s
Hierarchical and Network Models
– 1970s
Relational
– 1970-present
Object-Oriented Extended Relational
– Mid 1980s-present
Unstructured (XML format)
– Extended Relational – 2000s - future
Hierarchical Model
Parent Children
• • • •
Parent to Children is a One to Many (1-*) Relationship
Children refine the Parent Entity • AGENT (parent) represents various ENTERTAINERS (children) Benefits: • Fast retrieval and referential integrity (every child must be linked to parent) Problems: • Rigid structure that is hard to change, i.e. no structural independence •
Not easy to present *-* relationships, can lead to redundant and inconsistent data
Network Model
• • •
A child can have more than one parent
Major drawback of both Hierarchical and Network model –
No structurally independent way of extracting information!
– And of course any change in the systems was a major headache These models were abandoned in favor of the relational model in the 1980s
•
Relational Model
Edgar F. Codd (IBM) published the following paper in 1970 – “ A Relational Model of Data for Large Shared Databanks ” – Considered Ingenious but impractical…too much machine overhead…computers became faster and the problem went away
Relational Model
• •
Collection of Tables
– Also called a Relation – – – – Row/Column matrix Each column is called a field Each row entry is called a record (or tuple)
Tables can have common fields that relate their entries
• STUDENT ’ s Number and LIBRARYBOOK ’ s Number can show which student has a library book checked out Relational Database Management System (RDBMS) hides the real complexity of the model from the user –
User sees the logical view and the physical view is completely hidden
–
This is what made it so popular…and the query language, SQL, of course
– SQL is called declarative, since it says what to do, but not how to do it
• • • • •
Entity Relationship (ER) Model
Designing a database can be complex and a notation for doing it graphically was required Peter Chen introduced the ER data model in 1976 – Shown in an Entity Relationship diagram Unified Modeling Language (UML) can be used – Specifically the class diagram part without method descriptions Entities Relationships – 1 (one) – 0..* (many)
Object Oriented Data Model
• • • You all know this one!
Some databases that use this as its data model Mostly this model makes sense when combined with the Relational Model – Also called the Extended Relational Model
Summary
• • •
Data Models are abstractions
– Get them from business rules – Consists of entities, attributes, relationships and constraints Relational Model is most popular data model in the database world – Tables are related through common attributes – ER models are popular for DB design Next, Relational Model in detail