Transcript
Slide 1
Chapter 5 Data Resource Management
Well, Sort-of

Slide 2
Data, Data Everywhere *
• The Sloan Digital Sky Survey started in 2000. In its first few weeks it collected more data than had been amassed in the entire history of astronomy
• By 2010, it had collected 140 terabytes of data
• Its successor, scheduled for 2016, will collect that amount of data every 5 days
• In 2010, Walmart processed 1 million customer transactions every hour
• These feed databases estimated at more than 2.5 petabytes, the equivalent of 167 times the books in the U.S. Library of Congress
• Facebook houses more than 40 billion photos
* Excerpted from a Feb. 27th, 2010, Economist article

Slide 3
Data, Data Everywhere *
• Decoding the human genome involves analyzing 3 billion base pairs
• The first time it was attempted, it took 10 years
• It can now be accomplished in 1 week
• It is estimated that within the next few years, the amount of global data created will approach 2,000 Exabytes per year (1 Exabyte ≈ 1,000 Petabytes)
• Problem: It is estimated that the total amount of storage available will be approximately 100 Exabytes
* Excerpted from a Feb. 27th, 2010, Economist article

Slide 4
Data, Data Everywhere *
• Kilobyte = 2^10 bytes = 1,024 bytes
  • One page of typed text typically requires 2 KB
• Megabyte = 2^20 bytes = 1,048,576 bytes
  • Storing the complete works of Shakespeare requires 5 MB
• Gigabyte = 2^30 bytes = 1,073,741,824 bytes
  • A 2-hour film requires 1-2 GB
• Tera(trillion)byte = 2^40 bytes = 1,099,511,627,776 bytes
  • All of the books in the Library of Congress require 15 TB
• Peta(quadrillion)byte = 2^50 bytes = 1,125,899,906,842,624 bytes
  • Google processes about 1 PB every hour
• Exa(quintillion)byte = 2^60 bytes = 1,152,921,504,606,846,976 bytes
  • Equivalent to 10 billion copies of The Economist
• Zetta(sextillion)byte = 2^70 bytes = 1,180,591,620,717,411,303,424 bytes
  • The total amount of information in existence is estimated at 1.2 ZB
• Yotta(septillion)byte = 2^80 bytes = 1,208,925,819,614,629,174,706,176 bytes
* Excerpted from a Feb. 27th, 2010, Economist article

Slide 5
What is Data Resource Management??
A managerial activity that applies information systems technologies to the task of managing an organization’s data resources to meet the information needs of its business stakeholders
What does that mean??
It’s a very fancy way of saying that we are going to talk about databases

Slide 6
What is a Database??
• A large, integrated collection of Data and Metadata
• A way we can model (parts of) the real world (well, Sort-of)
• Entities (i.e., a person, place, object or event we wish to have information about)
  • Students
  • Physicians
  • Patients
  • Customers
• The Attributes of that entity (i.e., characteristics)
  • GPA
  • Specialty
  • Illness
  • Balance Due
• The Relationships between entities (i.e., how entities interact)
  • One Physician has many Patients
  • A Patient has only one Physician

Slide 7
What is it, really??
Consider some information the University maintains:
Name, Address, SSN, Major, Tuition Paid, Courses Taken, Tuition Owed, Grades Received, Grants/Scholarships
HOW is this information stored?
You are an entity with attributes which vary. Within the University, different areas have different interests in you (i.e., the Registrar, the Bursar, etc.). Nonetheless, you are still part of the University as a whole.

Slide 8
How does this relate to a database?
• You are an entity class (student) → Table
• with attributes which vary → Fields (Your attributes can be different)
• Within the University, different areas have different interests in you (i.e., the Registrar, Bursar, etc.) → Files ()
• Nonetheless, you are still part of the University → Database

Slide 9
HOW does this relate to a database?
Hierarchically:
A Database consists of Files, which contain
Records, which contain
  Hernandez, Juan   123456789    72   2.42
  Jones, Mary       234567890   102   3.87
Fields, which may consist of a variety of data types
Notice that there should always be a Key (Unique) Field

Slide 10
Alternatively (from smallest to largest component):
• Character: A single alphabetic, numeric or other symbol
• Field: A group of related characters
• Entity: A person, place, object or event
• Attribute: A characteristic of an entity
• Record: A collection of attributes that describe an entity
• File: A group of related records
• Database: An integrated collection of logically related data elements

Slide 11
Logical Data Elements:

Slide 12
Why Databases??
Databases were not always commonplace
Initially, there were no databases or DataBase Management Systems (DBMS)
Individual Applications were written to meet specific user needs (File Processing or Traditional File Processing Systems)
As business applications became more complex, it became apparent that there were too many problems associated with Traditional File Processing Systems

Slide 13
What Problems??
Single Applications
• A program was written for (generally) one and only one application (the user would specify their individual needs)
Program-Data Dependence
• Since each program was written for a specific data set, a change in the data, or data format, required a change in the program that uses the data

Slide 14
What Problems??
Data Redundancy
• Duplicate data requires an update to be made to all files storing that data
Lack of Data Integration
• Data stored in separate files requires special programs for output, making ad hoc reporting difficult
Data Input Errors
• If more people are required to enter data, the likelihood that errors/mis-entered data will be stored is increased

Slide 15
How did this work??

Slide 16
How did databases come about??
1960s: North American Rockwell’s Moon Project
• > 60% of all data used was duplicated in multiple data sets (redundancy)
By the Mid-1960s:
• Rockwell/IBM Joint Venture to develop a DataBase Management System (DBMS)
• Hierarchical in Nature
Later:
• IBM’s Information Management System (IMS)

Slide 17
How are databases different??
Database Management Approach
• Consolidates data records into one database that can be accessed by many different application programs
• Software interface between users and databases
• Data definition is stored once, separately from application programs

Slide 18
How are databases different??
Database Management Approach

Slide 19
What is a DBMS??
Software that controls the creation, maintenance, and use of databases

Slide 20
What does a DBMS consist of??

Slide 21
What are the major functions of a DBMS???
Database Development:
• Defining and organizing the content, relationships and structure of the data needed to build the database
• Specifying integrity constraints
• Setting Access Rights (Authorization)

Slide 22
What are the major functions of a DBMS???
Database Development: Entity Relationship Diagrams
Consider the following situation: A customer places an order. The order consists of parts.
Customer → Places → Orders → Contain → Parts
• Entity (Customer, Orders, Parts): An Organization about which we wish to maintain information
• Relationship (Places, Contain): An Association between Entities

Slide 23
What are the major functions of a DBMS???
Database Maintenance:
• Updating a database continually to reflect new business transactions and other events
• Updating a database to correct data and ensure accuracy of the data

Slide 24
What are the major functions of a DBMS???
Database Interrogation:
• Capability of a DBMS to report information from the database in response to end users’ requests
• Query Language: allows easy, immediate access to ad hoc data requests
• Report Generator: allows quick, easy specification of a report format for information users have requested

Slide 25
What are the major functions of a DBMS???
Database Interrogation:
• Natural Language vs. SQL Queries

Slide 26
What are the major functions of a DBMS???
Application Development:
• End users, systems analysts, and other application developers can use the internal 4GL programming language and built-in software development tools provided by many DBMS packages to develop custom application programs.

Slide 27
What are the forms of a DBMS???
Hierarchical: relationships between records form a hierarchy or treelike structure
Network: data can be accessed by one of several paths because any data element or record can be related to any number of other data elements
Relational: all data elements within the database are viewed as being stored in the form of simple tables

Slide 28
What are the forms of a DBMS???
RDBMS
Table: Student
  StudentID   Name          Address        Major
  123456789   Saenz, Lupe   123 Mesa       Finance
  234567890   Chung, Mei    37 5th St.     INFOSYS
  345678901   Adams, John   54B Hague      Accounting
  456789012   Elam, Mary    123-22 E St.   INFOSYS
  ••••••      ••••••        ••••••         ••••••
Callouts: Field Names (the column headings), Record (a row), Field (a single value)

Slide 29
What are the forms of a DBMS???
RDBMS
Table: Student
  StudentID   Name          Address        Major
  123456789   Saenz, Lupe   123 Mesa       Finance
  234567890   Chung, Mei    37 5th St.     INFOSYS
  345678901   Adams, John   54B Hague      Accounting
  456789012   Elam, Mary    123-22 E St.   Accounting
  ••••••      ••••••        ••••••         ••••••
Table Balance Table Department Student Owed Department Faculty •••••• Depart 103456678 1,502.36 Marketing 987654321 •••••• Finance 123456789 COBA219 Finance 876543210 •••••• INFOSYS 456789012 COBA232 Accounting 765432109 •••••• Accounting •••••• •••••• •••••• •••••• •••••• ••••••

Slide 30
What are the forms of a DBMS???
Multidimensional Database Structure
• Variation of the relational model that uses multidimensional structures to organize data and express the relationships between data

Slide 31
What are the forms of a DBMS???
Object-Oriented Database Structure
• Can accommodate more complex data types including graphics, pictures, voice and text

Slide 32
What are the forms of a DBMS???
Object-Oriented Database Structure
Encapsulation:
• Data values and the operations that can be performed on them are stored as a unit
• Conceals the exact details of how a particular class works from objects that use its code or send messages to it
Inheritance:
• Automatically creating new objects by replicating some or all of the characteristics of one or more existing objects

Slide 33
How do the DBMS structures compare???
(These are your authors’ viewpoints)
Hierarchical: best for structured, routine types of transaction processing
Network: best when many-to-many relationships are needed
Relational: best when ad hoc reporting is required

Slide 34
How are databases developed???
Database Development:
Enterprise-wide database development is usually controlled by database administrators (DBA)
Data Planning:
• Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise.

Slide 35
How are databases developed???
Logical Schema:
• data elements and relationships among them
Physical Schema:
• describes how data are to be stored and accessed on the storage devices of a computer system
• Data Dictionary: catalog or directory containing metadata

Slide 36
How are databases developed???
Logical vs. Physical Designs:

Slide 37
How are databases used???
Types of Databases:

Slide 38
How are databases used???
Types of Databases:
• Operational: store detailed data needed to support the business processes and operations of a company
  • Subject Area DataBases (SADB), Transaction Databases, Production Databases
  • Customer databases
  • Inventory databases
  • Human Resources databases

Slide 39
How are databases used???
Types of Databases:
• Distributed: databases that are replicated and distributed in whole or in part to network servers at a variety of sites
  • A single logical database that is spread across computers at multiple locations
  • Replicated databases
  • Partitioned databases
  • Challenges: ensuring that data is constantly, consistently and concurrently updated

Slide 40
How are databases used???
Types of Databases:
• External: contain a wealth of information available from commercial online services and from many sources on the World Wide Web
  • Commercial/Shareware/Freeware
  • Internet dominated

Slide 41
How are databases used???
Types of Databases:
• Hypermedia: consist of hyperlinked pages of multimedia

Slide 42
How are databases used???
Types of Databases:
Data Warehouses
• Large database that stores data that have been extracted from the various operational, external, and other databases of an organization

Slide 43
How are databases used???
Types of Databases:
Data Marts
• Databases that hold subsets of data from a data warehouse that focus on specific aspects of a company, such as a department or a business process

Slide 44
How are databases used???
Types of Databases:
Data Mining Uses:
• Perform “market-basket analysis” to identify new product bundles.
• Find root causes to quality or manufacturing problems.
• Prevent customer attrition and acquire new customers
• Cross-sell to existing customers
• Profile customers with more accuracy

Slide 45
QUESTIONS???
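
Supplementary sketch (not part of the original slides): the relational ideas from the Student table slides, a key (unique) field, and the ad hoc SQL queries mentioned under Database Interrogation can be tried out with Python's built-in sqlite3 module. The Student rows come from the slides; the Balance amounts, the two-column Balance layout, and the function names build_demo_db and owed_by_major are assumptions made up for illustration only.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Student (
            StudentID TEXT PRIMARY KEY,  -- the key (unique) field
            Name      TEXT,
            Address   TEXT,
            Major     TEXT
        );
        CREATE TABLE Balance (
            StudentID TEXT REFERENCES Student(StudentID),  -- links to Student
            Owed      REAL
        );
    """)
    # Records taken from the Student table shown on the slides.
    conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
        ("123456789", "Saenz, Lupe", "123 Mesa", "Finance"),
        ("234567890", "Chung, Mei", "37 5th St.", "INFOSYS"),
        ("345678901", "Adams, John", "54B Hague", "Accounting"),
        ("456789012", "Elam, Mary", "123-22 E St.", "INFOSYS"),
    ])
    # Hypothetical balances, invented for this example.
    conn.executemany("INSERT INTO Balance VALUES (?, ?)", [
        ("123456789", 100.0),
        ("234567890", 250.0),
        ("456789012", 75.5),
    ])
    return conn


def owed_by_major(conn, major):
    """Ad hoc query: join the two tables on the shared StudentID field."""
    return conn.execute(
        """
        SELECT s.Name, b.Owed
        FROM Student AS s
        JOIN Balance AS b ON b.StudentID = s.StudentID
        WHERE s.Major = ?
        ORDER BY s.Name
        """,
        (major,),
    ).fetchall()


conn = build_demo_db()
print(owed_by_major(conn, "INFOSYS"))

# The key field must be unique, so a second record with an existing
# StudentID is rejected by the DBMS rather than stored twice.
try:
    conn.execute(
        "INSERT INTO Student VALUES (?, ?, ?, ?)",
        ("123456789", "Duplicate, Key", "n/a", "n/a"),
    )
except sqlite3.IntegrityError as err:
    print("rejected duplicate key:", err)
```

Each student's name and address is stored once in Student, and Balance refers to it by StudentID, which is how the relational approach avoids the data redundancy of separate files. The query itself is written only when the question comes up, which is the sense in which relational structures suit ad hoc reporting.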