Chapter 12: Files and Databases

Files and Databases

Chapter 12 Organizing and Managing Digital Data

© The McGraw-Hill Companies, Inc., 2000

Overview

• Database and database administrator
• Hierarchy
• File handling
• File management systems
• DBMS
• Ethics

Databases

• Collection of related files
• Range in size from those on your PC to terabytes of digital photographs of the world on a large series of servers
  – http://teraserver.microsoft.com
• Examples
  – online services, virtual art museums, libraries

Database Administration

• Managing a database
• Database administrator (DBA)
  – design, implementation, integration
  – coordination with users
  – system security
  – backup and recovery
  – performance monitoring

Hierarchy and Key Field

• Data storage hierarchy
  – levels of data
    • bits, bytes, fields, records, files, and databases
  – definitions
    • character, field, record, file, database
  – key field
    • unique data used to identify a record
    • used for sorting
    • often numerically generated
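
The hierarchy and the role of a key field can be sketched in a few lines of Python. This is only an illustration: the `StudentRecord` type, its field names, and the sample values are assumptions made for the example, not part of the chapter.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """One record: a group of related fields (hypothetical example type)."""
    student_id: int  # key field: unique, numerically generated
    name: str        # field made up of characters
    major: str


# A file is a collection of related records.
student_file = [
    StudentRecord(1003, "Lee", "Biology"),
    StudentRecord(1001, "Garcia", "History"),
    StudentRecord(1002, "Chen", "Physics"),
]

# The key field is used for sorting ...
sorted_file = sorted(student_file, key=lambda record: record.student_id)


# ... and for identifying exactly one record.
def find_by_key(records, key):
    for record in records:
        if record.student_id == key:
            return record
    return None
```

Calling `find_by_key(student_file, 1002)` returns the single record whose key matches, which is why the key must be unique.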

Basic Concepts

• Types of files
  – program files
    • software instructions
  – data files
    • data order and organization should be logical and consistent

Types of Data Files

• Master
  – relatively permanent records updated periodically
  – currently accurate
• Transaction
  – temporary holding file used for additions, deletions, modifications
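
A minimal sketch of how a transaction file might be applied to a master file. The dictionary layout, the action names (`add`, `modify`, `delete`), and the sample data are assumptions chosen for illustration.

```python
def apply_transactions(master, transactions):
    """Return an updated copy of the master file (keyed by record key)."""
    updated = dict(master)  # leave the old master untouched
    for action, key, data in transactions:
        if action == "add":
            updated[key] = data
        elif action == "modify":
            # Merge the changed fields into a new copy of the record.
            # A missing key raises KeyError in this sketch.
            updated[key] = {**updated[key], **data}
        elif action == "delete":
            updated.pop(key, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return updated


# Hypothetical master file and transaction file.
master = {
    101: {"name": "Ada", "balance": 50},
    102: {"name": "Bo", "balance": 20},
}
transactions = [
    ("add", 103, {"name": "Cy", "balance": 0}),
    ("modify", 101, {"balance": 75}),
    ("delete", 102, None),
]

new_master = apply_transactions(master, transactions)
```

Keeping the old master unchanged means the previous generation is still available as a backup if the update has to be repeated.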

Batch vs. Online

• Batch
  – collect data, then process all at once
  – advantage
    • very efficient processing; data validity is checked at the originating batch site
• Online
  – real-time processing
  – example: an airline reservation system books in real time, which prevents duplicate reservations

Offline vs. Online

• Offline
  – not directly accessible to the CPU, such as tapes or disks that need to be loaded
• Online
  – storage is direct and fast
  – generally disk

File Organization

• Sequential access storage
  – stores one record after another
  – alphabetic or numeric
• Direct access storage
  – can access the data using direct methods such as addressing
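
The difference between the two access methods can be shown with fixed-length records in an in-memory byte stream. This is a sketch under assumptions: the 20-byte record layout and the helper names are invented for the example, and `io.BytesIO` stands in for a disk file.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte integer id + 16-byte name.
RECORD = struct.Struct("<i16s")


def write_records(stream, rows):
    for rec_id, name in rows:
        stream.write(RECORD.pack(rec_id, name.encode("ascii")))


def decode(chunk):
    rec_id, raw = RECORD.unpack(chunk)
    return rec_id, raw.rstrip(b"\x00").decode("ascii")


def read_sequential(stream):
    """Sequential access: read every record, one after another."""
    stream.seek(0)
    rows = []
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            break
        rows.append(decode(chunk))
    return rows


def read_direct(stream, position):
    """Direct access: compute the record's address and jump to it."""
    stream.seek(position * RECORD.size)
    return decode(stream.read(RECORD.size))


storage = io.BytesIO()
write_records(storage, [(1, "Ada"), (2, "Bo"), (3, "Cy")])
```

`read_direct(storage, 2)` goes straight to the third record, while `read_sequential(storage)` has to pass over every record in stored order.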

Organizing Methods

• Sequential file organization
  – records can be retrieved only in the sequence in which they were stored
  – useful when most of the records need to be accessed each time
  – example: catalog mailing

Organizing Methods (continued)

• Direct file organization
  – also called random file organization
  – records stored in no particular sequence
  – hashing algorithm applied to the key field generates a number that identifies the record's storage location
  – faster for finding a specific record
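
A minimal sketch of direct file organization using a division-remainder hash. The slot count, function names, and sample keys are assumptions for illustration. In practice two different keys can hash to the same number, so this sketch keeps a small bucket per slot to handle such collisions.

```python
NUM_SLOTS = 7  # hypothetical number of storage locations


def slot_for(key):
    """Division-remainder hashing: key field -> storage location."""
    return key % NUM_SLOTS


def store(table, key, record):
    bucket = table[slot_for(key)]
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:
            bucket[i] = (key, record)  # overwrite the existing record
            return
    bucket.append((key, record))


def fetch(table, key):
    # Only the one bucket chosen by the hash is searched.
    for existing_key, record in table[slot_for(key)]:
        if existing_key == key:
            return record
    return None


table = [[] for _ in range(NUM_SLOTS)]
store(table, 1001, "Garcia")
store(table, 1008, "Chen")  # hashes to the same slot as 1001
store(table, 1003, "Lee")
```

Finding a specific record is fast because `fetch` never scans the whole file, only the location the hash points to.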

Organizing Methods (continued)

• Indexed-sequential file organization
  – also called indexed file organization
  – records stored in sequential order
  – an index locates records according to key field
  – requires magnetic or optical disk
  – slower overall than direct access
  – example: a bank keeps up-to-date account records for direct lookup, but prints monthly statements sequentially
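
One way to picture indexed-sequential organization is a sorted file plus a small index of the first key in each block. This is a sketch under assumptions: the block size, the sample records, and the function names are invented for the example.

```python
import bisect

# Records stored in key order (hypothetical account file).
records = [
    (1001, "Garcia"), (1002, "Chen"),
    (1003, "Lee"), (1004, "Okafor"),
    (1005, "Patel"), (1006, "Sato"),
]
BLOCK = 2  # records per block

# Index: the first key of every block.
index_keys = [records[i][0] for i in range(0, len(records), BLOCK)]


def lookup(key):
    """Direct-style access: use the index to pick a block, then scan it."""
    block = bisect.bisect_right(index_keys, key) - 1
    if block < 0:
        return None
    start = block * BLOCK
    for record_key, name in records[start:start + BLOCK]:
        if record_key == key:
            return name
    return None


def statement_run():
    """Sequential access: process every record in stored order."""
    return [name for _, name in records]
```

`lookup` serves a single up-to-date inquiry, while `statement_run` corresponds to the sequential monthly print run.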

File Management System

• Disadvantages of file management systems
  – data redundancy
  – lack of data integrity
  – lack of program independence

Database Management Systems

• DBMS
• Controls the structure of the database and access to the data

Advantages of DBMS

• Reduced data redundancy
• Improved data integrity
• More program independence
• Increased user productivity
• Increased security

Disadvantages of DBMS

• Cost issues
• Data vulnerability issues
• Privacy issues

Database Organization

• Hierarchical
• Network
• Relational
• Object-oriented

Hierarchical

• Records are arranged in related groups that form a tree
• A lower-level record is called a child
• The parent record at the top of the tree is called the root record
• A parent may have more than one child, but a child has only one parent (a one-to-many relationship)
• Simple and fast
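
The parent and child rules above can be sketched as a small tree in Python. The `Node` class and the sample names are assumptions made for illustration.

```python
class Node:
    """A record in a hierarchical database (hypothetical example class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent    # exactly one parent (None for the root)
        self.children = []      # a parent may have many children
        if parent is not None:
            parent.children.append(self)


def path_from_root(node):
    """Follow the single parent link upward to reach the root record."""
    path = []
    while node is not None:
        path.append(node.name)
        node = node.parent
    return list(reversed(path))


root = Node("University")
science = Node("Science", root)
arts = Node("Arts", root)
physics = Node("Physics", science)
```

Because each child has exactly one parent, there is only one path to any record, which is part of what makes this model simple and fast.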

Network Database

• A type of hierarchical database, but children can have more than one parent
• More flexible because relationships can be established across different parents
• Limits to the number of links
• Retains some of the speed of access of a hierarchical database
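
A minimal sketch of the network model, in which a child record may be linked to several parents. The `link` helper and the sample names are assumptions for illustration.

```python
from collections import defaultdict

# Links are kept in both directions so either side can be followed.
parents_of = defaultdict(set)
children_of = defaultdict(set)


def link(parent, child):
    """Record one parent-child link (hypothetical helper)."""
    parents_of[child].add(parent)
    children_of[parent].add(child)


link("Prof. Garcia", "Student Lee")
link("Prof. Chen", "Student Lee")   # the same child gets a second parent
link("Prof. Chen", "Student Sato")
```

Here `parents_of["Student Lee"]` holds two parents, something a strict tree would not allow.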

Relational Database

• Relates data through a key field
• More flexible
• Advantage
  – user does not have to be aware of structure
  – easily add, modify, delete records
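
Relating data through a key field can be sketched with Python's built-in `sqlite3` module. The table names, column names, and rows are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- key field
        name        TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        item        TEXT
    );
    INSERT INTO customers VALUES (1, 'Garcia'), (2, 'Chen');
    INSERT INTO orders VALUES
        (10, 1, 'modem'), (11, 2, 'monitor'), (12, 1, 'printer');
""")

# The two tables are related only through the shared key field.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
```

The query names the data it wants, not the path to it, which is the sense in which the user does not have to be aware of the structure.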

Relational Database (continued)

• Disadvantage
  – searching for data can be time consuming

Object-Oriented Database

• OODBMS
• Numeric, text, graphics, audio
• Important part of technology merge
• Uses
  – medical information systems
  – engineering information systems
  – geographic databases
  – training and education

DBMS Features

• Data dictionary
  – also called encyclopedia and repository
  – stores data definitions
• Utilities
  – assist in maintaining databases by filtering acceptable data for input, editing data, and monitoring

Query Language

• Data manipulation language
• Used to make database queries that do not require command language
• Most popular is SQL (pronounced “sequel”), or Structured Query Language

SQL

• Used in Oracle, Sybase, dBase, Paradox, and Microsoft Access
• Some query languages accept natural-language or spoken English requests
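
A short sketch of an SQL query, run here through Python's built-in `sqlite3` module rather than any of the products named above. The `employees` table, its columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Garcia", "Sales", 42000),
        (2, "Chen", "Sales", 51000),
        (3, "Lee", "Support", 39000),
    ],
)

# The query states WHAT is wanted; the DBMS decides how to find it.
query = (
    "SELECT name, salary FROM employees "
    "WHERE dept = ? AND salary > ? "
    "ORDER BY salary DESC"
)
result = conn.execute(query, ("Sales", 40000)).fetchall()
```

The `?` placeholders keep the values separate from the query text, which is the usual way to pass user input to a query safely.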

Report Generator

• Produces on-screen or printed reports
• User can customize appearance

Access Security

• Can be tailored for group access or individual access
• Physical security is as critical as data security

System Recovery

• Recovery types
  – full and partial
  – match backup techniques
• Techniques
  – mirroring
  – reprocessing
  – rollforward
  – rollback

Types of Recovery

• Mirroring
  – frequent simultaneous copying of data to two or more places
• Reprocessing
  – goes back to a point of database activity where the database was correct and reprocesses data to bring it up to date
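
Reprocessing can be sketched as restoring a known-correct copy and replaying the logged activity. The dictionary layout, the log format, and the sample values are assumptions for illustration.

```python
import copy


def reprocess(backup, logged_transactions):
    """Start from the last correct copy and replay every logged change."""
    database = copy.deepcopy(backup)  # never modify the backup itself
    for key, delta in logged_transactions:
        database[key] = database.get(key, 0) + delta
    return database


# Hypothetical backup taken when the database was known to be correct,
# plus the transactions recorded since then.
backup = {"acct1": 100, "acct2": 50}
log = [("acct1", -30), ("acct2", 30), ("acct3", 10)]

recovered = reprocess(backup, log)
```

The cost of this technique grows with the length of the log, since every transaction since the backup is processed again.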

Types of Recovery (continued)

• Rollforward: forward recovery
  – recreates current database using a previously stored database
  – uses after-image records with processing information
• Rollback: backward recovery
  – undoes unwanted changes, for example, if only half a transaction was processed
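
Rollback of a half-processed transaction can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the simulated failure are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between accounts; undo everything if a step fails."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE acct_id = ?",
            (amount, src),
        )
        if fail_midway:
            # Simulated failure after only half the transaction ran.
            raise RuntimeError("simulated crash")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE acct_id = ?",
            (amount, dst),
        )
        conn.commit()
        return True
    except RuntimeError:
        conn.rollback()  # backward recovery: discard the partial change
        return False


def balances(conn):
    return dict(conn.execute("SELECT acct_id, balance FROM accounts"))


transfer(conn, 1, 2, 30, fail_midway=True)
after_failure = balances(conn)
```

After the failed transfer, `after_failure` shows the original balances, because the withdrawal that had already run was undone.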

Mining, Warehouses, “Siftware”

• Data mining
  – DM, or knowledge discovery
  – sifts through large databases to uncover patterns and predict future trends
  – helps in marketing, health, and science

Data Warehousing

• Requires data preparation
  – identification of all data sources
  – fuse data and clean or scrub data to ensure accuracy
  – metadata shows the origins of data, the transformations, and summary data
• Data warehouse
  – combination of cleaned data and metadata
  – often uses massively parallel processing (MPP)

Siftware for Finding and Analyzing

• Query-and-reporting tools
  – Focus Reporter, Seagate Crystal Reports, Esperant
  – specific questions to verify hypotheses
• Multidimensional-analysis tools
  – MDA
  – Essbase, Lightship
  – data surfing to explore dimensions of subset

Siftware for Finding and Analyzing (continued)

• Intelligent agents
  – roam networks and perform complex tasks
  – DataEngine, Data, Logic
  – help turn up unexpected relationships and patterns
• Data mining
  – combines facts from all parts of a business
  – cash registers, shipping documents, credit-card files

Ethics of Using Databases

• Misinformation explosion
  – data is found, but little effort is made to ensure that the data is updated
  – reliance on anecdotal evidence
  – causes inaccuracies that can be harmful

Information Accuracy

• More facts, faster facts, but not necessarily better facts
• Database is not necessarily updated with current information
• Computer sources are not necessarily accurate

Information Completeness

• Know the boundaries, as no information service has it all
• Know the complete iterations of key words
• History is limited
  – most databases go back only to 1980
  – frequently an assessment is unthinkingly extended to years before 1980

Privacy Issues

• Right not to reveal information about oneself
• Credit card, shopping habits, harassment
• Fair Information Practices
  – U.S. Department of Health, Education, and Welfare

Privacy Enactment

• Privacy Act of 1974
  – limits the government and its contractors
  – right to see and correct inaccurate data about oneself
• Freedom of Information Act
  – personal access to data gathered on oneself
• Computer Matching and Privacy Protection Act
  – prevents government from comparing some records to other records of individuals

Finance Privacy

• Fair Credit Reporting Act of 1970
  – right to access and challenge credit records
  – if credit is denied, the records must be provided free of charge
• Right to Financial Privacy Act of 1978
  – restrictions on federal agencies wanting to search records in banks

Health Privacy

• No federal laws protect medical records in the United States
  – except records of drug and alcohol abuse and psychiatric care
• A strategy is to decline to fill out medical histories or questionnaires unless there is a clear need for them
• You can always ask for a copy of your medical records

Employment Privacy

• Nongovernmental employers are the least regulated by privacy legislation
• Employers may verify
  – education
  – employment
  – credit
  – driving record
  – workers’ compensation claims
  – criminal record, if any

Commerce Issues

• Marketing gathers data about age, buying habits, favorite charities
• No prohibition on gathering data for one reason and using it for another
  – except Video Privacy Protection Act of 1988
    • prevents giving out video rental records without a court order or the individual’s consent

Communications Privacy

• Some constraints on acquiring and disseminating information, listening, and encryption use
• Some argue that you must be willing to give up some privacy for safety and security