Transcript Slide 1

Slide 1
Chapter
5
Data Resource Management
Well, Sort-of
Slide 2
Data, Data Everywhere *
• The Sloan Digital Sky Survey started in 2000. In its first few
weeks it collected more data than had been amassed in the entire
history of astronomy
• By 2010, it had collected 140 terabytes of data
• Its replacement, scheduled for 2016, will collect that amount of
data every 5 days
• In 2010, Walmart processed 1M customer transactions every
hour
• These transactions feed databases estimated at more than 2.5
petabytes, the equivalent of 167 times the books in the U.S.
Library of Congress
• Facebook houses more than 40 billion photos
* Excerpted from a Feb. 27th, 2010, Economist article
Slide 3
Data, Data Everywhere *
• Decoding the human genome involves 3 billion base pairs.
• The first time it was attempted, it took 10 years
• It can now be accomplished in 1 week.
• It is estimated that within the next few years, the amount of global
data created will approach 2,000 Exabytes per year
(1 Exabyte = 1,000 Petabytes)
• Problem: It is estimated that the total amount of storage available
will be approximately 100 Exabytes
* Excerpted from a Feb. 27th, 2010, Economist article
Slide 4
Data, Data Everywhere *
Kilobyte = 2^10 bytes = 1,024 bytes
• One page of typed text typically requires 2 KB
Megabyte = 2^20 bytes = 1,048,576 bytes
• Storing the complete works of Shakespeare requires 5 MB
Gigabyte = 2^30 bytes = 1,073,741,824 bytes
• A 2-hour film requires 1-2 GB
Tera(trillion)byte = 2^40 bytes = 1,099,511,627,776 bytes
• All of the books in the Library of Congress require 15 TB
Peta(quadrillion)byte = 2^50 bytes = 1,125,899,906,842,624 bytes
• Google processes about 1 PB every hour
Exa(quintillion)byte = 2^60 bytes = 1,152,921,504,606,846,976 bytes
• Equivalent to 10 billion copies of The Economist
Zetta(sextillion)byte = 2^70 bytes = 1,180,591,620,717,411,303,424 bytes
• The total amount of information in existence is estimated at 1.2 ZB
Yotta(septillion)byte = 2^80 bytes = 1,208,925,819,614,629,174,706,176 bytes
* Excerpted from a Feb. 27th, 2010, Economist article
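The unit sizes on this slide are successive powers of two, each 2^10 times the one before. A minimal Python sketch that recomputes them; the `UNITS` list and the `unit_sizes` helper are illustrative names, not part of the slides:

```python
# Binary storage units: each step multiplies the previous one by 2**10.
UNITS = ["Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]


def unit_sizes():
    """Return {unit name: size in bytes} using the 2**(10*n) convention."""
    return {name: 2 ** (10 * (i + 1)) for i, name in enumerate(UNITS)}


if __name__ == "__main__":
    for name, size in unit_sizes().items():
        exponent = size.bit_length() - 1  # exact for powers of two
        print(f"{name:<10} = 2^{exponent:<2} bytes = {size:,} bytes")
```

Note that the previous slide's "1 Exabyte = 1,000 Petabytes" uses the rounder decimal convention; under the binary convention shown here the factor is 1,024.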
Slide 5
What is Data Resource Management??
▪ A managerial activity that applies information systems technologies to the task of managing an organization’s data resources to meet the information needs of its business stakeholders
What does that mean??
▪ It’s a very fancy way of saying that we are going to talk about databases
Slide 6
What is a Database??
• A large, integrated collection of Data and Metadata
• A way we can model (parts of) the real world (well, Sort-of)
• Entities (i.e., a person, place, object or event we wish to have information about).
  • Students
  • Physicians
  • Patients
  • Customers
• The Attributes of that entity (i.e., characteristics).
  • GPA
  • Specialty
  • Illness
  • Balance Due
• The Relationships between entities (i.e., how do entities interact).
  • One Physician has many Patients
  • A Patient has only one Physician
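One way to picture entities, attributes, and relationships is as tables, columns, and foreign keys. A minimal sketch using Python's built-in sqlite3 module; the table names, column names, and sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Entity: Physician, with the attribute Specialty.
conn.execute("""
    CREATE TABLE physician (
        physician_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        specialty    TEXT
    )""")

# Entity: Patient, with the attribute Illness.
# Relationship: each patient row points at exactly one physician,
# while one physician may be referenced by many patient rows.
conn.execute("""
    CREATE TABLE patient (
        patient_id   INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        illness      TEXT,
        physician_id INTEGER NOT NULL REFERENCES physician(physician_id)
    )""")

conn.execute("INSERT INTO physician VALUES (1, 'Dr. A', 'Cardiology')")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?, ?)",
    [(10, "Patient X", "Arrhythmia", 1), (11, "Patient Y", "Hypertension", 1)],
)


def patients_of(physician_id):
    """Return the names of all patients assigned to one physician."""
    rows = conn.execute(
        "SELECT name FROM patient WHERE physician_id = ? ORDER BY patient_id",
        (physician_id,),
    )
    return [name for (name,) in rows]


print(patients_of(1))
```

With foreign keys enabled, an attempt to add a patient whose physician does not exist is rejected, which is how the database itself protects the relationship.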
Slide 7
What is it, really??
▪ Consider some information the University maintains:
▪ Name
▪ Address
▪ SSN
▪ Major
▪ Tuition Paid
▪ Courses Taken
▪ Tuition Owed
▪ Grades Received
▪ Grants/Scholarships
▪ HOW is this information stored?
You are an entity with attributes which vary. Within the
University, different areas have different interests in
you (i.e., the Registrar, the Bursar, etc.). Nonetheless,
you are still part of the University as a whole.
Slide 8
How does this relate to a database?
You are an entity class (student) → Table
with attributes → Fields
which vary → Your attributes can be different
Within the University, different areas have different interests in you (i.e., the Registrar, the Bursar, etc.) → Files
Nonetheless, you are still part of the University → Database
Slide 9
HOW does this relate to a database?
Hierarchically:
A Database consists of
Files, which contain
Records, which contain
Fields, which may consist of a variety of data types
Example records:
Hernandez, Juan   123456789   72    2.42
Jones, Mary       234567890   102   3.87
Notice that there should always be a Key (Unique) Field
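The hierarchy on this slide, and the role of the key field, can be sketched with plain Python structures. The field names (`name`, `credits`, `gpa`) are guesses at what the sample columns mean, and `add_record` is a hypothetical helper:

```python
# A database is a collection of files; a file is a collection of records;
# a record is a collection of fields.
database = {"students": {}}  # file name -> {key field value -> record}


def add_record(file_name, key, record):
    """Store a record under its key field, refusing duplicate keys."""
    file = database[file_name]
    if key in file:
        raise KeyError(f"duplicate key {key!r} in file {file_name!r}")
    file[key] = record


# Field meanings are assumed for illustration (name, credits, GPA).
add_record("students", "123456789",
           {"name": "Hernandez, Juan", "credits": 72, "gpa": 2.42})
add_record("students", "234567890",
           {"name": "Jones, Mary", "credits": 102, "gpa": 3.87})

# Fields hold different data types: text, integer, floating point.
record = database["students"]["234567890"]
print({field: type(value).__name__ for field, value in record.items()})
```

Because every record is stored under its key, a second record with the same ID cannot slip in unnoticed.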
Slide 10
Alternatively (from smallest to largest component):
• Character: A single alphabetic, numeric or other symbol
• Field: A group of related characters
• Entity: A person, place, object or event
• Attribute: A characteristic of an entity
• Record: A collection of attributes that describe an entity
• File: A group of related records
• Database: An integrated collection of logically related data elements
Slide 11
Logical Data Elements:
Slide 12
Why Databases??
Databases were not always commonplace
▪ Initially, there were no databases or DataBase Management Systems (DBMS)
▪ Individual Applications were written to meet specific user needs (File Processing or Traditional File Processing Systems)
▪ As business applications became more complex, it became apparent that there were too many problems associated with Traditional File Processing Systems
Slide 13
What Problems??
▪ Single Applications
• A program was written for (generally) one and only one application (The user would specify their individual needs)
▪ Program-Data Dependence
• Since each program was written for a specific data set, a change in the data, or data format, required a change in the program which uses the data
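Program-data dependence is easiest to see with a fixed-width file. In this sketch (the record layout, sample values, and function names are all invented), the first parser hard-codes column positions, so widening one field makes it misread the data:

```python
# Original layout: name in columns 0-14, balance in columns 15-22.
OLD_RECORD = "Jones, Mary".ljust(15) + "00150.25"
# New layout after the name field is widened to 20 characters.
NEW_RECORD = "Jones, Mary".ljust(20) + "00150.25"


def parse_old_layout(line):
    """Parser written for the original layout only (positions are hard-coded)."""
    return {"name": line[0:15].strip(), "balance": line[15:23].strip()}


print(parse_old_layout(OLD_RECORD))  # both fields are read correctly
print(parse_old_layout(NEW_RECORD))  # balance is cut off and misread


def parse_with_layout(line, layout):
    """Parse using a layout of {field name: (start, end)} supplied as data."""
    return {field: line[start:end].strip()
            for field, (start, end) in layout.items()}


# Only this description changes when the format changes, not the program.
NEW_LAYOUT = {"name": (0, 20), "balance": (20, 28)}
print(parse_with_layout(NEW_RECORD, NEW_LAYOUT))
```

Keeping the layout description outside the parsing code is a small-scale version of what a DBMS does when it stores data definitions separately from application programs.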
Slide 14
What Problems??
▪ Data Redundancy
• Duplicate data requires an update to be made to all files storing that data
▪ Lack of Data Integration
• Data stored in separate files requires special programs for output, making ad hoc reporting difficult
▪ Data Input Errors
• If more people are required to enter data, the likelihood that errors or mis-entered data will be stored increases
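The redundancy problem can be shown in a few lines. Here two hypothetical departmental files (plain dictionaries with invented data) each keep their own copy of a student's address, and only one copy gets updated:

```python
# Each office keeps its own file, so the address is stored twice.
registrar_file = {"123456789": {"name": "Saenz, Lupe", "address": "123 Mesa"}}
bursar_file = {"123456789": {"name": "Saenz, Lupe", "address": "123 Mesa"}}

# The student moves, but only the Registrar's copy is updated.
registrar_file["123456789"]["address"] = "900 Oak Ave."


def inconsistent_keys(file_a, file_b, field):
    """Return keys whose value for `field` differs between the two files."""
    return [key for key in file_a
            if key in file_b and file_a[key][field] != file_b[key][field]]


print(inconsistent_keys(registrar_file, bursar_file, "address"))
```

Finding such mismatches after the fact takes a special-purpose program like `inconsistent_keys`, which is the kind of extra work the slide's "Lack of Data Integration" bullet describes.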
Slide 15
How did this work??
Slide 16
How did databases come about??
▪ 1960s: North American Rockwell’s Moon Project
• > 60% of all data used was duplicated in multiple data sets (redundancy)
▪ By the mid-1960s:
• Rockwell/IBM Joint Venture to develop a DataBase Management System (DBMS)
• Hierarchical in Nature
▪ Later:
• IBM’s Information Management System (IMS)
Slide 17
How are databases different??
▪ Database Management Approach
• Consolidates data records into one database that can be accessed by many different application programs.
• Software interface between users and databases
• Data definition is stored once, separately from application programs
Slide 18
How are databases different??
▪ Database Management Approach
Slide 19
What is a DBMS??
▪ Software that controls the creation, maintenance, and use of databases
Slide 20
What does a DBMS consist of??
Slide 21
What are the major functions of a DBMS ???
▪ Database Development:
• Defining and organizing the content, relationships and structure of the data needed to build the database
• Specifying integrity constraints
• Fixing of Access Rights (Authorization)
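In a relational DBMS, database development is typically expressed as data definition statements. A small sketch with Python's built-in sqlite3 module; the table, columns, constraint choices, and sample values are assumptions made for illustration, and access rights are not shown because SQLite has no user accounts:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Define the structure of the data and attach integrity constraints to it.
db.execute("""
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,                    -- key (unique) field
        name       TEXT NOT NULL,                       -- must be supplied
        gpa        REAL CHECK (gpa BETWEEN 0.0 AND 4.0) -- valid range only
    )""")

db.execute("INSERT INTO student VALUES ('123456789', 'Saenz, Lupe', 3.1)")


def try_insert(row):
    """Attempt an insert; return True if accepted, False if a constraint rejects it."""
    try:
        db.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(("123456789", "Duplicate Key", 2.0)))  # same key as an existing row
print(try_insert(("234567890", "Chung, Mei", 5.7)))     # GPA outside the CHECK range
print(try_insert(("234567890", "Chung, Mei", 3.9)))     # satisfies every constraint
```

The point of declaring constraints in the database is that every application using the table gets the same checks without re-implementing them.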
Slide 22
What are the major functions of a DBMS ???
▪ Database Development:
Entity Relationship Diagrams
▪ Consider the following situation:
A customer places an order. The order consists of parts.
• Entity: An Organization about which we wish to maintain information
• Relationship: An Association between Entities
Diagram: Customer (Entity) -> Places (Relationship) -> Orders (Entity) -> Contain (Relationship) -> Parts (Entity)
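An entity relationship diagram like this one is usually translated into tables. The sketch below is one possible translation using Python's built-in sqlite3 module; every table name, column name, and sample row is invented, and modeling "Contain" as a many-to-many link table is an assumption the slide does not state:

```python
import sqlite3

er = sqlite3.connect(":memory:")
er.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE part     (part_id     INTEGER PRIMARY KEY, description TEXT);

    -- Relationship "Places": each order row references the customer who placed it.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id)
    );

    -- Relationship "Contain": one row per (order, part) pairing.
    CREATE TABLE order_part (
        order_id INTEGER REFERENCES orders(order_id),
        part_id  INTEGER REFERENCES part(part_id),
        quantity INTEGER,
        PRIMARY KEY (order_id, part_id)
    );

    INSERT INTO customer VALUES (1, 'Acme Co.');
    INSERT INTO part VALUES (100, 'Bolt'), (101, 'Nut');
    INSERT INTO orders VALUES (500, 1);
    INSERT INTO order_part VALUES (500, 100, 20), (500, 101, 20);
""")


def parts_ordered_by(customer_name):
    """Follow Customer -> Places -> Orders -> Contain -> Parts for one customer."""
    rows = er.execute("""
        SELECT part.description
        FROM customer
        JOIN orders     ON orders.customer_id = customer.customer_id
        JOIN order_part ON order_part.order_id = orders.order_id
        JOIN part       ON part.part_id = order_part.part_id
        WHERE customer.name = ?
        ORDER BY part.part_id""", (customer_name,))
    return [description for (description,) in rows]


print(parts_ordered_by("Acme Co."))
```

Each entity becomes a table, and each relationship becomes either a foreign key column (Places) or a link table (Contain).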
Slide 23
What are the major functions of a DBMS ???
▪ Database Maintenance:
• Updating a database continually to reflect new business transactions and other events
• Updating a database to correct data and ensure accuracy of the data
Slide 24
What are the major functions of a DBMS ???
▪ Database Interrogation:
• Capability of a DBMS to report information from the database in response to end users’ requests
• Query Language: allows easy, immediate access to ad hoc data requests
• Report Generator: allows quick, easy specification of a report format for information users have requested
Slide 25
What are the major functions of a DBMS ???
▪ Database Interrogation:
• Natural Language vs. SQL Queries
Slide 26
What are the major functions of a DBMS ???
▪ Application Development:
• End users, systems analysts, and other application developers can use the internal 4GL programming language and built-in software development tools provided by many DBMS packages to develop custom application programs.
Slide 27
What are the forms of a DBMS ???
Hierarchical: relationships between records form a hierarchy or treelike structure
Network: data can be accessed by one of several paths because any data element or record can be related to any number of other data elements
Relational: All data elements within the database are viewed as being stored in the form of simple tables
Slide 28
What are the forms of a DBMS ???
RDBMS
Table Student
StudentID    Name           Address         Major
123456789    Saenz, Lupe    123 Mesa        Finance
234567890    Chung, Mei     37 5th St.      INFOSYS
345678901    Adams, John    54B Hague       Accounting
456789012    Elam, Mary     123-22 E St.    INFOSYS
••••••       ••••••         ••••••          ••••••
Diagram labels: Field Names (the column headings), Record (one row), Field (one value within a row)
Slide 29
What are the forms of a DBMS ???
RDBMS
Table Student
StudentID    Name           Address         Major
123456789    Saenz, Lupe    123 Mesa        Finance
234567890    Chung, Mei     37 5th St.      INFOSYS
345678901    Adams, John    54B Hague       Accounting
456789012    Elam, Mary     123-22 E St.    Accounting
••••••       ••••••         ••••••          ••••••
Table Balance
Table Department
Student
Owed
Department
Faculty
••••••
Depart
103456678
1,502.36
Marketing
987654321
••••••
Finance
123456789
COBA219
Finance
876543210
••••••
INFOSYS
456789012
COBA232
Accounting
765432109
••••••
Accounting
••••••
••••••
••••••
••••••
••••••
••••••
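In the relational model, tables like these are linked through shared field values. A sketch with Python's built-in sqlite3 module that joins a Student table to a Balance table; the column names follow the slide's headings, while the rows loaded here are a small illustrative subset and the balance amount is invented:

```python
import sqlite3

rdb = sqlite3.connect(":memory:")
rdb.executescript("""
    CREATE TABLE Student (StudentID TEXT PRIMARY KEY, Name TEXT, Address TEXT, Major TEXT);
    CREATE TABLE Balance (Student TEXT REFERENCES Student(StudentID), Owed REAL);

    INSERT INTO Student VALUES
        ('123456789', 'Saenz, Lupe', '123 Mesa',   'Finance'),
        ('234567890', 'Chung, Mei',  '37 5th St.', 'INFOSYS');
    -- Illustrative amount, not taken from the slide.
    INSERT INTO Balance VALUES ('123456789', 250.00);
""")


def balances():
    """Ad hoc query: every student with the amount owed (0 if no Balance row)."""
    rows = rdb.execute("""
        SELECT s.Name, COALESCE(b.Owed, 0)
        FROM Student AS s
        LEFT JOIN Balance AS b ON b.Student = s.StudentID
        ORDER BY s.StudentID""")
    return rows.fetchall()


print(balances())
```

The join condition `b.Student = s.StudentID` is the only thing tying the two tables together, which is why a shared key field matters so much in this model.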
Slide 30
What are the forms of a DBMS ???
Multidimensional Database Structure
• Variation of the relational model that uses multidimensional structures to organize data and express the
relationships between data
Slide 31
What are the forms of a DBMS ???
Object-Oriented Database Structure
• Can accommodate more complex data types including
graphics, pictures, voice and text
Slide 32
What are the forms of a DBMS ???
Object-Oriented Database Structure
▪ Encapsulation:
• Data values and operations that can be performed on them are stored as a unit
• Conceals the exact details of how a particular class works from objects that use its code or send messages to it
▪ Inheritance:
• Automatically creating new objects by replicating some or all of the characteristics of one or more existing objects
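Encapsulation and inheritance are easier to see in code. The sketch below uses ordinary in-memory Python classes rather than an actual object-oriented DBMS, and the class names, attributes, and values are invented for illustration:

```python
class Account:
    """Encapsulation: the balance and the operations on it live in one unit."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self._balance = balance  # leading underscore: internal detail by convention

    def deposit(self, amount):
        """The only supported way to raise the balance; callers never touch _balance."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance


class StudentAccount(Account):
    """Inheritance: reuses everything in Account and adds one new attribute."""

    def __init__(self, owner, major, balance=0.0):
        super().__init__(owner, balance)
        self.major = major


acct = StudentAccount("Elam, Mary", "Accounting")
acct.deposit(125.0)  # method inherited from Account
print(acct.balance())
```

An object-oriented database would store objects like `acct` together with their methods, instead of flattening them into rows and columns.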
Slide 33
How do the DBMS structures compare ???
(These are your authors’ viewpoints)
Hierarchical: best for structured, routine types of transaction
processing.
Network: best when many-to-many relationships are needed
Relational: best when ad hoc reporting is required.
Slide 34
How are databases developed ???
Database Development: Enterprise-wide database development is usually controlled by database administrators (DBAs)
▪ Data Planning:
• Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise.
Slide 35
How are databases developed ???
▪ Logical Schema:
• data elements and relationships among them
▪ Physical Schema:
• describes how data are to be stored and accessed on the storage devices of a computer system
• Data Dictionary: catalog or directory containing metadata
Slide 36
How are databases developed ???
Logical vs. Physical Designs:
Slide 37
How are databases used???
Types of Databases:
Slide 38
How are databases used???
Types of Databases:
• Operational: store detailed data needed to support the business processes and operations of a company
▪ Subject Area DataBases (SADB), Transaction Databases, Production Databases
▪ Customer databases
▪ Inventory databases
▪ Human Resources databases
Slide 39
How are databases used???
Types of Databases:
• Distributed: databases that are replicated and distributed in whole or in part to network servers at a variety of sites
▪ A single logical database that is spread across computers at multiple locations
▪ Replicated databases
▪ Partitioned databases
▪ Challenges: ensuring that data is constantly, consistently and concurrently updated
Slide 40
How are databases used???
Types of Databases:
• External: contain a wealth of information available from commercial online services and from many sources on the World Wide Web
▪ Commercial/Shareware/Freeware
▪ Internet dominated
Slide 41
How are databases used???
Types of Databases:
• Hypermedia: consist of hyperlinked pages of multimedia
Slide 42
How are databases used???
Types of Databases:
Data Warehouses
• Large database that stores data that have been extracted from
the various operational, external, and other databases of an
organization
Slide 43
How are databases used???
Types of Databases:
Data Marts
• Databases that hold subsets of data from a data warehouse that focus on specific aspects of a company, such as a department or a business process
Slide 44
How are databases used???
Types of Databases:
Data Mining Uses:
• Perform “market-basket analysis” to identify new product bundles
• Find root causes of quality or manufacturing problems
• Prevent customer attrition and acquire new customers
• Cross-sell to existing customers
• Profile customers with more accuracy
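The simplest form of market-basket analysis just counts which items are bought together. A minimal sketch in Python; the transactions, the threshold of two baskets, and the `pair_counts` helper are all hypothetical, and real data mining tools use more refined measures than raw counts:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]


def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
        counts.update(combinations(sorted(basket), 2))
    return counts


counts = pair_counts(baskets)
# Pairs bought together in at least two baskets are candidate bundles.
print(sorted(pair for pair, n in counts.items() if n >= 2))
```

Pairs that clear the threshold are the candidates a retailer might consider bundling or cross-selling.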
Slide 45
QUESTIONS???