SIMS 202: Information Organization and Retrieval Lecture 11: Intro to Database Design
Download ReportTranscript SIMS 202: Information Organization and Retrieval Lecture 11: Intro to Database Design
Lecture 11: Intro to Database Design
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003
http://www.sims.berkeley.edu/academics/courses/is202/f03/
IS 202 – FALL 2003 2003.09.30 - SLIDE 1
Lecture Overview
• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings
IS 202 – FALL 2003 2003.09.30 - SLIDE 2
Lecture Overview
• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings
IS 202 – FALL 2003 2003.09.30 - SLIDE 3
Streams vs. Clips
IS 202 – FALL 2003 The Stream of 100 Frames of Video with 6 Annotations Resulting in Many Possible Segmentations of the Stream Stream of 100 Frames of Video 2003.09.30 - SLIDE 4
Stream-Based Representation
• Makes annotation pay off – The richer the annotation, the more numerous the possible segmentations of the video stream • Clips – Change from being fixed segmentations of the video stream, to being the results of retrieval queries based on annotations of the video stream • Annotations – Create representations which make clips, not representations of clips
IS 202 – FALL 2003 2003.09.30 - SLIDE 5
Keywords vs. Semantic Descriptors
IS 202 – FALL 2003
dog, biting, Steve
2003.09.30 - SLIDE 6
Why Keywords Don’t Work
• Are not a semantic representation • Do not describe relations between descriptors • Do not describe temporal structure • Do not converge • Do not scale
IS 202 – FALL 2003 2003.09.30 - SLIDE 7
Natural Language vs. Visual Language
Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking. IS 202 – FALL 2003 2003.09.30 - SLIDE 8
After Capture: Media Streams
IS 202 – FALL 2003 2003.09.30 - SLIDE 9
Media Streams Features
• Key features – Stream-based representation (better segmentation) – Semantic indexing (what things are similar to) – Relational indexing (who is doing what to whom) – Temporal indexing (when things happen) – Iconic interface (designed visual language) – Universal annotation (standardized markup schema) • Key benefits – More accurate annotation and retrieval – Global usability and standardization – Reuse of rich media according to content and structure
IS 202 – FALL 2003 2003.09.30 - SLIDE 10
Video Retrieval In Media Streams
• Same interface for annotation and retrieval • Assembles responses to queries as well as finds them • Query responses use semantics to degrade gracefully
IS 202 – FALL 2003 2003.09.30 - SLIDE 11
Lecture Overview
• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings
IS 202 – FALL 2003 2003.09.30 - SLIDE 12
What is a Database?
IS 202 – FALL 2003 2003.09.30 - SLIDE 13
Files and Databases
• File: A collection of records or documents dealing with one organization, person, area or subject (Rowley) – Manual (paper) files – Computer files • Database: A collection of similar records with relationships between the records (Rowley) – Bibliographic, statistical, business data, images, etc.
IS 202 – FALL 2003 2003.09.30 - SLIDE 14
Database
• A Database is a collection of stored operational data used by the application systems of some particular enterprise (C.J. Date) – Paper “Databases” • Still contain a large portion of the world’s knowledge – File-Based Data Processing Systems • Early batch processing of (primarily) business data – Database Management Systems (DBMS)
IS 202 – FALL 2003 2003.09.30 - SLIDE 15
Why DBMS?
• History – 50’s and 60’s all applications were custom built for particular needs – File based – Many similar/duplicative applications dealing with collections of business data – Early DBMS were extensions of programming languages – 1970 - E.F. Codd and the Relational Model – 1979 - Ashton-Tate and first Microcomputer DBMS
IS 202 – FALL 2003 2003.09.30 - SLIDE 16
File Based Systems
Application Delivery List Coal Estimation Just what asked for File Naughty Nice Toys
IS 202 – FALL 2003 2003.09.30 - SLIDE 17
From File Systems to DBMS
• Problems with file processing systems – Inconsistent data – Inflexibility – Limited data sharing – Poor enforcement of standards – Excessive program maintenance
IS 202 – FALL 2003 2003.09.30 - SLIDE 18
DBMS Benefits
• Minimal data redundancy • Consistency of data • Integration of data • Sharing of data • Ease of application development • Uniform security, privacy, and integrity controls • Data accessibility and responsiveness • Data independence • Reduced program maintenance
IS 202 – FALL 2003 2003.09.30 - SLIDE 19
Terms and Concepts
• Data independence – Physical representation and location of data and the use of that data are separated • The application doesn’t need to know how or where the database has stored the data, but just how to ask for it • Moving a database from one DBMS to another should not have a material effect on application program • Recoding, adding fields, etc. in the database should not affect applications
IS 202 – FALL 2003 2003.09.30 - SLIDE 20
Database Environment
IS 202 – FALL 2003
CASE Tools User Interface Application Programs DBMS Repository Database
2003.09.30 - SLIDE 21
Database Components
Database Database contains:
User’s Data Metadata Indexes Application Metadata
DBMS
===============
Design tools
Table Creation Form Creation Query Creation Report Creation Procedural language compiler (4GL) =============
Run time
Form processor Query processor Report Writer Language Run time
IS 202 – FALL 2003
Application Programs User Interface Applications
2003.09.30 - SLIDE 22
Types of Database Systems
• PC databases • Centralized database • Client/server databases • Distributed databases • Database models
IS 202 – FALL 2003 2003.09.30 - SLIDE 23
PC Databases
E.g.: Access FoxPro Dbase Etc.
2003.09.30 - SLIDE 24 IS 202 – FALL 2003
Centralized Databases
Central Computer
IS 202 – FALL 2003 2003.09.30 - SLIDE 25
Client Server Databases
Client Client
IS 202 – FALL 2003
Client Network Database Server
2003.09.30 - SLIDE 26
Distributed Databases
IS 202 – FALL 2003
Location C computer computer Location A Location B computer
Homogeneous Databases 2003.09.30 - SLIDE 27
Distributed Databases
Heterogeneous Or Federated Databases
Database Server
IS 202 – FALL 2003
Client Remote Comp.
Local Network Client Comm Server Remote Comp.
2003.09.30 - SLIDE 28
Terms and Concepts
• A “database application” is an application program (or set of related programs) that is used to perform a series of database activities: – Create • Add new data to the database – Read • Read current data from the database – Update • Update or modify current database data – Delete • Remove current On behalf of database users
IS 202 – FALL 2003 2003.09.30 - SLIDE 29
Terms and Concepts
• • • •
Enterprise
– Organization
Entity
– Person, Place, Thing, Event, Concept...
Attributes
– Data elements (facts) about some entity – Also sometimes called fields or items or domains
Data values
– Instances of a particular attribute for a particular entity
IS 202 – FALL 2003 2003.09.30 - SLIDE 30
Terms and Concepts
• •
Key
– An attribute or set of attributes used to identify or locate records in a file
Primary Key
– An attribute or set of attributes that
uniquely
identifies each record in a file
IS 202 – FALL 2003 2003.09.30 - SLIDE 31
Terms and Concepts
•
Models
– (1) Levels or views of the Database • Conceptual, logical, physical – (2) DBMS types • Relational, Hierarchic, Network, Object-Oriented, Object-Relational
IS 202 – FALL 2003 2003.09.30 - SLIDE 32
Models (1)
Application 1 External Model Application 2 External Model Application 3 External Model Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Application 4 Conceptual requirements Conceptual Model Logical Model Internal Model
IS 202 – FALL 2003
More later on this…
2003.09.30 - SLIDE 33
Data Models(2): History
• Hierarchical Model (1960’s and 1970’s) – Similar to data structures in programming languages Books (id, title) Authors (first, last) Publisher Subjects
IS 202 – FALL 2003 2003.09.30 - SLIDE 34
Data Models(2): History
• Network Model (1970’s) – Provides for single entries of data and navigational “ links ” through chains of data.
Authors Subjects Books Publishers
2003.09.30 - SLIDE 35 IS 202 – FALL 2003
Data Models(2): History
• Relational Model (1980’s) – Provides a conceptually simple model for data as relations (typically considered “tables”) with all data visible Book ID Title 1 Introductio pubid 2 The history 3 New stuff ab 4 Another title 5 And yet more 2 Author id 1 4 3 2 3 2 1 4 5 pubid pubname 1 Harper 2 Addison 3 Oxford 4 Que Book ID 1 2 3 4 4 Subid 2 1 3 2 3 Authorid Author name 1 Smith 2 Wynar 3 Jones 4 Duncan 5 Applegate Subid Subject 1 cataloging 2 history 3 stuff
2003.09.30 - SLIDE 36 IS 202 – FALL 2003
Data Models(2): History
• Object Oriented Data Model (1990’s) – Encapsulates data and operations as “Objects” Books (id, title) Authors (first, last) Publisher Subjects
IS 202 – FALL 2003 2003.09.30 - SLIDE 37
Data Models(2): History
• Object-Relational Model (1990’s) – Combines the well-known properties of the Relational Model with such OO features as: • User-defined datatypes • User-defined functions • Inheritance and sub-classing • All of the major enterprise DBMS systems are now Object-Relational or incorporate Object-Relational features
IS 202 – FALL 2003 2003.09.30 - SLIDE 38
Lecture Overview
• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings
IS 202 – FALL 2003 2003.09.30 - SLIDE 39
Database System Life Cycle
Physical Creation 2 Design 1 Conversion 3 IS 202 – FALL 2003 Growth, Change, & Maintenance 6 Operations 5 Integration 4 2003.09.30 - SLIDE 40
Design
• Determination of the needs of the organization • Development of the Conceptual Model of the database – Typically using Entity-Relationship diagramming techniques • Construction of a Data Dictionary • Development of the Logical Model
IS 202 – FALL 2003 2003.09.30 - SLIDE 41
Physical Creation
• Development of the Physical Model of the Database – Data formats and types – Determination of indexes, etc.
• Load a prototype database and test • Determine and implement security, privacy and access controls • Determine and implement integrity constraints
IS 202 – FALL 2003 2003.09.30 - SLIDE 42
Conversion
• Convert existing data sets and applications to use the new database – May need programs, conversion utilities to convert old data to new formats
IS 202 – FALL 2003 2003.09.30 - SLIDE 43
Integration
• Overlaps with Phase 3 • Integration of converted applications and new applications into the new database
IS 202 – FALL 2003 2003.09.30 - SLIDE 44
Operations
• All applications run full-scale • Privacy, security, access control must be in place • Recovery and Backup procedures must be established and used
IS 202 – FALL 2003 2003.09.30 - SLIDE 45
Growth, Change, and Maintenance
• Change is a way of life – Applications, data requirements, reports, etc. will all change as new needs and requirements are found – The Database and applications and will need to be modified to meet the needs of changes
IS 202 – FALL 2003 2003.09.30 - SLIDE 46
Another View of the Life Cycle
Integration 4 Operations 5 Design Physical Creation 2 1 Conversion Growth, 3 Change 6
IS 202 – FALL 2003 2003.09.30 - SLIDE 47
Lecture Overview
• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings
IS 202 – FALL 2003 2003.09.30 - SLIDE 48
Database Design Process
Application 1 External Model Application 2 External Model Application 3 External Model Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Application 4 Conceptual requirements Conceptual Model Logical Model Internal Model
IS 202 – FALL 2003 2003.09.30 - SLIDE 49
Entity
• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information – Persons (e.g.: customers in a business, employees, authors) – Things (e.g.: purchase orders, meetings, parts, companies) Employee
IS 202 – FALL 2003 2003.09.30 - SLIDE 50
Attributes
• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (this is the Metadata for the entities) Birthdate First Age Middle Name Employee SSN Last Projects
IS 202 – FALL 2003 2003.09.30 - SLIDE 51
Relationships
• Relationships are the associations between entities • They can involve one or more entities and belong to particular relationship types
IS 202 – FALL 2003 2003.09.30 - SLIDE 52
Relationships
IS 202 – FALL 2003
Student Attends Supplier Project Supplies project parts Class Part
2003.09.30 - SLIDE 53
Types of Relationships
• Concerned only with
cardinality
of relationship Employee 1 Assigned 1 Truck Employee n Assigned 1 Project
IS 202 – FALL 2003
Employee m Assigned n Project Chen ER notation
2003.09.30 - SLIDE 54
Other Notations
Employee Assigned Truck
IS 202 – FALL 2003
Employee Employee Assigned Project Assigned Project “Crow’s Foot”
2003.09.30 - SLIDE 55
Other Notations
Employee Assigned Truck
IS 202 – FALL 2003
Employee Employee Assigned Project Assigned Project IDEFIX Notation
2003.09.30 - SLIDE 56
More Complex Relationships
Manager 1/1/1 Employee 1/n/n Evaluation n/n/1 Project SSN Date Project
IS 202 – FALL 2003
Employee Employee 4(2-10) Assigned 1 Project Manages Is Managed By 1 Manages n
2003.09.30 - SLIDE 57
Weak Entities
• Owe existence entirely to another entity Invoice # Part# Invoice# Quantity Order Contains Order-line Rep#
IS 202 – FALL 2003 2003.09.30 - SLIDE 58
Supertype and Subtype Entities
IS 202 – FALL 2003
Employee Sales-rep Sold Invoice Is one of Other Manages Clerk
2003.09.30 - SLIDE 59
Many to Many Relationships
Proj# SSN Hours Project Assignment Assigned Is Assigned Employee SSN
IS 202 – FALL 2003
Proj# Project
2003.09.30 - SLIDE 60
Lecture Overview
• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings
IS 202 – FALL 2003 2003.09.30 - SLIDE 61
Questions: Brooke Maury
• Discussion Questions on Hoffer & McFadden: • The relational database model has remained fairly static since its inception in the 1970’s. Is this evidence of its strength as an organizational model or an indication of its inflexibility?
IS 202 – FALL 2003 2003.09.30 - SLIDE 62
Questions: Brooke Maury
• If the goal of the relational database model is to encode a ‘conceptual’ design into a logical design, is it possible that improved technology and the development of new modeling techniques will supplant the RDBMS? Specifically, what impact will XML and the development of document engineering have on organizing information in multiple normalized tables? • Conversely, what does the relational model have that would be lost if a conceptual design was encoded in another model?
IS 202 – FALL 2003 2003.09.30 - SLIDE 63
Questions: Brooke Maury
• (Next time?) The drive to develop the RDBM was in part motivated by a need to minimize the space required and improve the performance of database systems by removing redundancies. What impact will very inexpensive data storage and computing power have on the relational database model and the third normal form especially?
IS 202 – FALL 2003 2003.09.30 - SLIDE 64
Questions: Shane Ahern
• Discussion Questions for "Logical Database Design and the Relational Model" • Is the normalization process described really necessary? When I design a database schema, I find that by thinking of tables in terms of they entities they represent (employees, sales, events), I avoid most of the problems of normalization that the process seeks to address (i.e. salesperson and region in Sales table, salesperson is clearly a distinct entity from sales). If the formal process described in the article is not followed, are there potential pitfalls that might lead to problems with your database schema?
IS 202 – FALL 2003 2003.09.30 - SLIDE 65
Questions: Shane Ahern
• The article points out that "the relational model does not yet directly support supertype/subtype relationships." Once the tables in a relational database have been decomposed to third normal form, the database is efficient from systems point-of-view, but the tables no longer represent a representation of the data that is intuitive to humans. The object-oriented model more accurately mirrors the way we think about the concepts that we wish to store in databases. So perhaps object-oriented database systems are worth considering. What about XML databases?
IS 202 – FALL 2003 2003.09.30 - SLIDE 66
Questions: Arthur Law
• The three models that we have been presented with, Entity Relationship Model, NIAM Model, and Object Oriented Model all enforce a specific thought process in the organization and relationship between items in a database. With all of our recent discussion of computers understanding natural language are these methods now out of date with how we should be organizing information? Should we use artificial intelligence or learning algorithms to statistically determine the relationship between entities or is there still value in using these models?
IS 202 – FALL 2003 2003.09.30 - SLIDE 67
Questions: Arthur Law
• Each model is approximately one decade apart in development and a quick Google search shows that companies are using databases with one of the three models. However, as new models arise there doesn't seem too much interest in migrating from one data model to another. Which makes sense given that an organization using a given model probably finds that it works. Now with the proliferation of XML, we see more information being shared between organizations, so are we fated for an expensive and lengthy translation process between databases? Or should all DB administrators be responsible for upgrading to the latest model?
IS 202 – FALL 2003 2003.09.30 - SLIDE 68
Lecture Overview
• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Database Design • Discussion • Next Time/Readings
IS 202 – FALL 2003 2003.09.30 - SLIDE 69
Next Time
• Database Design – Normalization and SQL • Readings (no additional DBMS readings) • Additional Questions/ or revisit some of today’s discussion questions in the light of the next lecture?
IS 202 – FALL 2003 2003.09.30 - SLIDE 70