SIMS 202: Information Organization and Retrieval Lecture 11: Intro to Database Design

Download Report

Transcript SIMS 202: Information Organization and Retrieval Lecture 11: Intro to Database Design

Lecture 11: Intro to Database Design

SIMS 202: Information Organization and Retrieval

Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003

http://www.sims.berkeley.edu/academics/courses/is202/f03/

IS 202 – FALL 2003 2003.09.30 - SLIDE 1

Lecture Overview

• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings

IS 202 – FALL 2003 2003.09.30 - SLIDE 2

Lecture Overview

• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings

IS 202 – FALL 2003 2003.09.30 - SLIDE 3

Streams vs. Clips

IS 202 – FALL 2003 The Stream of 100 Frames of Video with 6 Annotations Resulting in Many Possible Segmentations of the Stream Stream of 100 Frames of Video 2003.09.30 - SLIDE 4

Stream-Based Representation

• Makes annotation pay off – The richer the annotation, the more numerous the possible segmentations of the video stream • Clips – Change from being fixed segmentations of the video stream, to being the results of retrieval queries based on annotations of the video stream • Annotations – Create representations which make clips, not representations of clips

IS 202 – FALL 2003 2003.09.30 - SLIDE 5

Keywords vs. Semantic Descriptors

IS 202 – FALL 2003

dog, biting, Steve

2003.09.30 - SLIDE 6

Why Keywords Don’t Work

• Are not a semantic representation • Do not describe relations between descriptors • Do not describe temporal structure • Do not converge • Do not scale

IS 202 – FALL 2003 2003.09.30 - SLIDE 7

Natural Language vs. Visual Language

Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking. IS 202 – FALL 2003 2003.09.30 - SLIDE 8

After Capture: Media Streams

IS 202 – FALL 2003 2003.09.30 - SLIDE 9

Media Streams Features

• Key features – Stream-based representation (better segmentation) – Semantic indexing (what things are similar to) – Relational indexing (who is doing what to whom) – Temporal indexing (when things happen) – Iconic interface (designed visual language) – Universal annotation (standardized markup schema) • Key benefits – More accurate annotation and retrieval – Global usability and standardization – Reuse of rich media according to content and structure

IS 202 – FALL 2003 2003.09.30 - SLIDE 10

Video Retrieval In Media Streams

• Same interface for annotation and retrieval • Assembles responses to queries as well as finds them • Query responses use semantics to degrade gracefully

IS 202 – FALL 2003 2003.09.30 - SLIDE 11

Lecture Overview

• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings

IS 202 – FALL 2003 2003.09.30 - SLIDE 12

What is a Database?

IS 202 – FALL 2003 2003.09.30 - SLIDE 13

Files and Databases

• File: A collection of records or documents dealing with one organization, person, area or subject (Rowley) – Manual (paper) files – Computer files • Database: A collection of similar records with relationships between the records (Rowley) – Bibliographic, statistical, business data, images, etc.

IS 202 – FALL 2003 2003.09.30 - SLIDE 14

Database

• A Database is a collection of stored operational data used by the application systems of some particular enterprise (C.J. Date) – Paper “Databases” • Still contain a large portion of the world’s knowledge – File-Based Data Processing Systems • Early batch processing of (primarily) business data – Database Management Systems (DBMS)

IS 202 – FALL 2003 2003.09.30 - SLIDE 15

Why DBMS?

• History – 50’s and 60’s all applications were custom built for particular needs – File based – Many similar/duplicative applications dealing with collections of business data – Early DBMS were extensions of programming languages – 1970 - E.F. Codd and the Relational Model – 1979 - Ashton-Tate and first Microcomputer DBMS

IS 202 – FALL 2003 2003.09.30 - SLIDE 16

File Based Systems

Application Delivery List Coal Estimation Just what asked for File Naughty Nice Toys

IS 202 – FALL 2003 2003.09.30 - SLIDE 17

From File Systems to DBMS

• Problems with file processing systems – Inconsistent data – Inflexibility – Limited data sharing – Poor enforcement of standards – Excessive program maintenance

IS 202 – FALL 2003 2003.09.30 - SLIDE 18

DBMS Benefits

• Minimal data redundancy • Consistency of data • Integration of data • Sharing of data • Ease of application development • Uniform security, privacy, and integrity controls • Data accessibility and responsiveness • Data independence • Reduced program maintenance

IS 202 – FALL 2003 2003.09.30 - SLIDE 19

Terms and Concepts

• Data independence – Physical representation and location of data and the use of that data are separated • The application doesn’t need to know how or where the database has stored the data, but just how to ask for it • Moving a database from one DBMS to another should not have a material effect on application program • Recoding, adding fields, etc. in the database should not affect applications

IS 202 – FALL 2003 2003.09.30 - SLIDE 20

Database Environment

IS 202 – FALL 2003

CASE Tools User Interface Application Programs DBMS Repository Database

2003.09.30 - SLIDE 21

Database Components

Database Database contains:

User’s Data Metadata Indexes Application Metadata

DBMS

===============

Design tools

Table Creation Form Creation Query Creation Report Creation Procedural language compiler (4GL) =============

Run time

Form processor Query processor Report Writer Language Run time

IS 202 – FALL 2003

Application Programs User Interface Applications

2003.09.30 - SLIDE 22

Types of Database Systems

• PC databases • Centralized database • Client/server databases • Distributed databases • Database models

IS 202 – FALL 2003 2003.09.30 - SLIDE 23

PC Databases

E.g.: Access FoxPro Dbase Etc.

2003.09.30 - SLIDE 24 IS 202 – FALL 2003

Centralized Databases

Central Computer

IS 202 – FALL 2003 2003.09.30 - SLIDE 25

Client Server Databases

Client Client

IS 202 – FALL 2003

Client Network Database Server

2003.09.30 - SLIDE 26

Distributed Databases

IS 202 – FALL 2003

Location C computer computer Location A Location B computer

Homogeneous Databases 2003.09.30 - SLIDE 27

Distributed Databases

Heterogeneous Or Federated Databases

Database Server

IS 202 – FALL 2003

Client Remote Comp.

Local Network Client Comm Server Remote Comp.

2003.09.30 - SLIDE 28

Terms and Concepts

• A “database application” is an application program (or set of related programs) that is used to perform a series of database activities: – Create • Add new data to the database – Read • Read current data from the database – Update • Update or modify current database data – Delete • Remove current On behalf of database users

IS 202 – FALL 2003 2003.09.30 - SLIDE 29

Terms and Concepts

• • • •

Enterprise

– Organization

Entity

– Person, Place, Thing, Event, Concept...

Attributes

– Data elements (facts) about some entity – Also sometimes called fields or items or domains

Data values

– Instances of a particular attribute for a particular entity

IS 202 – FALL 2003 2003.09.30 - SLIDE 30

Terms and Concepts

• •

Key

– An attribute or set of attributes used to identify or locate records in a file

Primary Key

– An attribute or set of attributes that

uniquely

identifies each record in a file

IS 202 – FALL 2003 2003.09.30 - SLIDE 31

Terms and Concepts

Models

– (1) Levels or views of the Database • Conceptual, logical, physical – (2) DBMS types • Relational, Hierarchic, Network, Object-Oriented, Object-Relational

IS 202 – FALL 2003 2003.09.30 - SLIDE 32

Models (1)

Application 1 External Model Application 2 External Model Application 3 External Model Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Application 4 Conceptual requirements Conceptual Model Logical Model Internal Model

IS 202 – FALL 2003

More later on this…

2003.09.30 - SLIDE 33

Data Models(2): History

• Hierarchical Model (1960’s and 1970’s) – Similar to data structures in programming languages Books (id, title) Authors (first, last) Publisher Subjects

IS 202 – FALL 2003 2003.09.30 - SLIDE 34

Data Models(2): History

• Network Model (1970’s) – Provides for single entries of data and navigational “ links ” through chains of data.

Authors Subjects Books Publishers

2003.09.30 - SLIDE 35 IS 202 – FALL 2003

Data Models(2): History

• Relational Model (1980’s) – Provides a conceptually simple model for data as relations (typically considered “tables”) with all data visible Book ID Title 1 Introductio pubid 2 The history 3 New stuff ab 4 Another title 5 And yet more 2 Author id 1 4 3 2 3 2 1 4 5 pubid pubname 1 Harper 2 Addison 3 Oxford 4 Que Book ID 1 2 3 4 4 Subid 2 1 3 2 3 Authorid Author name 1 Smith 2 Wynar 3 Jones 4 Duncan 5 Applegate Subid Subject 1 cataloging 2 history 3 stuff

2003.09.30 - SLIDE 36 IS 202 – FALL 2003

Data Models(2): History

• Object Oriented Data Model (1990’s) – Encapsulates data and operations as “Objects” Books (id, title) Authors (first, last) Publisher Subjects

IS 202 – FALL 2003 2003.09.30 - SLIDE 37

Data Models(2): History

• Object-Relational Model (1990’s) – Combines the well-known properties of the Relational Model with such OO features as: • User-defined datatypes • User-defined functions • Inheritance and sub-classing • All of the major enterprise DBMS systems are now Object-Relational or incorporate Object-Relational features

IS 202 – FALL 2003 2003.09.30 - SLIDE 38

Lecture Overview

• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings

IS 202 – FALL 2003 2003.09.30 - SLIDE 39

Database System Life Cycle

Physical Creation 2 Design 1 Conversion 3 IS 202 – FALL 2003 Growth, Change, & Maintenance 6 Operations 5 Integration 4 2003.09.30 - SLIDE 40

Design

• Determination of the needs of the organization • Development of the Conceptual Model of the database – Typically using Entity-Relationship diagramming techniques • Construction of a Data Dictionary • Development of the Logical Model

IS 202 – FALL 2003 2003.09.30 - SLIDE 41

Physical Creation

• Development of the Physical Model of the Database – Data formats and types – Determination of indexes, etc.

• Load a prototype database and test • Determine and implement security, privacy and access controls • Determine and implement integrity constraints

IS 202 – FALL 2003 2003.09.30 - SLIDE 42

Conversion

• Convert existing data sets and applications to use the new database – May need programs, conversion utilities to convert old data to new formats

IS 202 – FALL 2003 2003.09.30 - SLIDE 43

Integration

• Overlaps with Phase 3 • Integration of converted applications and new applications into the new database

IS 202 – FALL 2003 2003.09.30 - SLIDE 44

Operations

• All applications run full-scale • Privacy, security, access control must be in place • Recovery and Backup procedures must be established and used

IS 202 – FALL 2003 2003.09.30 - SLIDE 45

Growth, Change, and Maintenance

• Change is a way of life – Applications, data requirements, reports, etc. will all change as new needs and requirements are found – The Database and applications and will need to be modified to meet the needs of changes

IS 202 – FALL 2003 2003.09.30 - SLIDE 46

Another View of the Life Cycle

Integration 4 Operations 5 Design Physical Creation 2 1 Conversion Growth, 3 Change 6

IS 202 – FALL 2003 2003.09.30 - SLIDE 47

Lecture Overview

• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings

IS 202 – FALL 2003 2003.09.30 - SLIDE 48

Database Design Process

Application 1 External Model Application 2 External Model Application 3 External Model Application 4 External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Application 3 Conceptual requirements Application 4 Conceptual requirements Conceptual Model Logical Model Internal Model

IS 202 – FALL 2003 2003.09.30 - SLIDE 49

Entity

• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information – Persons (e.g.: customers in a business, employees, authors) – Things (e.g.: purchase orders, meetings, parts, companies) Employee

IS 202 – FALL 2003 2003.09.30 - SLIDE 50

Attributes

• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (this is the Metadata for the entities) Birthdate First Age Middle Name Employee SSN Last Projects

IS 202 – FALL 2003 2003.09.30 - SLIDE 51

Relationships

• Relationships are the associations between entities • They can involve one or more entities and belong to particular relationship types

IS 202 – FALL 2003 2003.09.30 - SLIDE 52

Relationships

IS 202 – FALL 2003

Student Attends Supplier Project Supplies project parts Class Part

2003.09.30 - SLIDE 53

Types of Relationships

• Concerned only with

cardinality

of relationship Employee 1 Assigned 1 Truck Employee n Assigned 1 Project

IS 202 – FALL 2003

Employee m Assigned n Project Chen ER notation

2003.09.30 - SLIDE 54

Other Notations

Employee Assigned Truck

IS 202 – FALL 2003

Employee Employee Assigned Project Assigned Project “Crow’s Foot”

2003.09.30 - SLIDE 55

Other Notations

Employee Assigned Truck

IS 202 – FALL 2003

Employee Employee Assigned Project Assigned Project IDEFIX Notation

2003.09.30 - SLIDE 56

More Complex Relationships

Manager 1/1/1 Employee 1/n/n Evaluation n/n/1 Project SSN Date Project

IS 202 – FALL 2003

Employee Employee 4(2-10) Assigned 1 Project Manages Is Managed By 1 Manages n

2003.09.30 - SLIDE 57

Weak Entities

• Owe existence entirely to another entity Invoice # Part# Invoice# Quantity Order Contains Order-line Rep#

IS 202 – FALL 2003 2003.09.30 - SLIDE 58

Supertype and Subtype Entities

IS 202 – FALL 2003

Employee Sales-rep Sold Invoice Is one of Other Manages Clerk

2003.09.30 - SLIDE 59

Many to Many Relationships

Proj# SSN Hours Project Assignment Assigned Is Assigned Employee SSN

IS 202 – FALL 2003

Proj# Project

2003.09.30 - SLIDE 60

Lecture Overview

• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Discussion • Next Time/Readings

IS 202 – FALL 2003 2003.09.30 - SLIDE 61

Questions: Brooke Maury

• Discussion Questions on Hoffer & McFadden: • The relational database model has remained fairly static since its inception in the 1970’s. Is this evidence of its strength as an organizational model or an indication of its inflexibility?

IS 202 – FALL 2003 2003.09.30 - SLIDE 62

Questions: Brooke Maury

• If the goal of the relational database model is to encode a ‘conceptual’ design into a logical design, is it possible that improved technology and the development of new modeling techniques will supplant the RDBMS? Specifically, what impact will XML and the development of document engineering have on organizing information in multiple normalized tables? • Conversely, what does the relational model have that would be lost if a conceptual design was encoded in another model?

IS 202 – FALL 2003 2003.09.30 - SLIDE 63

Questions: Brooke Maury

• (Next time?) The drive to develop the RDBM was in part motivated by a need to minimize the space required and improve the performance of database systems by removing redundancies. What impact will very inexpensive data storage and computing power have on the relational database model and the third normal form especially?

IS 202 – FALL 2003 2003.09.30 - SLIDE 64

Questions: Shane Ahern

• Discussion Questions for "Logical Database Design and the Relational Model" • Is the normalization process described really necessary? When I design a database schema, I find that by thinking of tables in terms of they entities they represent (employees, sales, events), I avoid most of the problems of normalization that the process seeks to address (i.e. salesperson and region in Sales table, salesperson is clearly a distinct entity from sales). If the formal process described in the article is not followed, are there potential pitfalls that might lead to problems with your database schema?

IS 202 – FALL 2003 2003.09.30 - SLIDE 65

Questions: Shane Ahern

• The article points out that "the relational model does not yet directly support supertype/subtype relationships." Once the tables in a relational database have been decomposed to third normal form, the database is efficient from systems point-of-view, but the tables no longer represent a representation of the data that is intuitive to humans. The object-oriented model more accurately mirrors the way we think about the concepts that we wish to store in databases. So perhaps object-oriented database systems are worth considering. What about XML databases?

IS 202 – FALL 2003 2003.09.30 - SLIDE 66

Questions: Arthur Law

• The three models that we have been presented with, Entity Relationship Model, NIAM Model, and Object Oriented Model all enforce a specific thought process in the organization and relationship between items in a database. With all of our recent discussion of computers understanding natural language are these methods now out of date with how we should be organizing information? Should we use artificial intelligence or learning algorithms to statistically determine the relationship between entities or is there still value in using these models?

IS 202 – FALL 2003 2003.09.30 - SLIDE 67

Questions: Arthur Law

• Each model is approximately one decade apart in development and a quick Google search shows that companies are using databases with one of the three models. However, as new models arise there doesn't seem too much interest in migrating from one data model to another. Which makes sense given that an organization using a given model probably finds that it works. Now with the proliferation of XML, we see more information being shared between organizations, so are we fated for an expensive and lengthy translation process between databases? Or should all DB administrators be responsible for upgrading to the latest model?

IS 202 – FALL 2003 2003.09.30 - SLIDE 68

Lecture Overview

• Review – MediaStreams • Databases and Database Design • Database Life Cycle • ER Diagrams • Database Design • Discussion • Next Time/Readings

IS 202 – FALL 2003 2003.09.30 - SLIDE 69

Next Time

• Database Design – Normalization and SQL • Readings (no additional DBMS readings) • Additional Questions/ or revisit some of today’s discussion questions in the light of the next lecture?

IS 202 – FALL 2003 2003.09.30 - SLIDE 70