Transcript Chapter 12: Files and Databases
Files and Databases
Chapter 12 Organizing and Managing Digital Data
© The McGraw-Hill Companies, Inc., 2000
Ch 12
Overview
• Database and database administrator
• Hierarchy
• File handling
• File management systems
• DBMS
• Ethics
Databases
• Collection of related files
• Range in size from those on your PC to terabytes of digital photographs of the world stored on a large series of servers
  – http://teraserver.microsoft.com
• Examples
  – online services, virtual art museums, libraries
Database Administration
• Managing a database
• Database administrator (DBA)
  – design, implementation, integration
  – coordination with users
  – system security
  – backup and recovery
  – performance monitoring
Hierarchy and Key Field
• Data storage hierarchy
  – levels of data
    • bits, bytes, fields, records, files, and databases
  – definitions
    • character, field, record, file, database
  – key field
    • unique data used to identify a record
    • used for sorting
    • often numerically generated
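The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the `StudentRecord` type, its field names, and the sample values are assumptions made for the example and do not come from the chapter.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:      # a record: a group of related fields
    student_id: int       # key field: unique, numerically generated
    name: str             # a field: a group of characters
    major: str


# A file: a collection of related records.
student_file = [
    StudentRecord(1003, "Lee", "Biology"),
    StudentRecord(1001, "Garcia", "History"),
    StudentRecord(1002, "Okafor", "Physics"),
]

# The key field identifies exactly one record and gives a sort order.
by_key = {rec.student_id: rec for rec in student_file}
sorted_file = sorted(student_file, key=lambda rec: rec.student_id)
```

Looking up `by_key[1002]` returns a single record because the key field is unique, and `sorted_file` shows the same key being used for sorting.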
Basic Concepts
• Types of files
  – program files
    • software instructions
  – data files
    • data order and organization should be logical and consistent
Types of Data Files
• Master file
  – relatively permanent records, updated periodically
  – currently accurate
• Transaction file
  – temporary holding file used for additions, deletions, and modifications
Batch vs. Online
• Batch
  – collect data, then process all at once
  – advantage
    • very efficient processing; data validity is checked at the originating batch site
• Online
  – real-time processing
  – example: an airline reservation system books in real time, which prevents duplicate reservations
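A batch run can be pictured as applying a whole transaction file to a master file in one pass. The sketch below is a minimal illustration under stated assumptions: the `apply_batch` function, the `(action, key, data)` transaction layout, and the sample accounts are all hypothetical.

```python
def apply_batch(master, transactions):
    """Return an updated copy of the master file after applying every transaction."""
    updated = dict(master)
    for action, key, data in transactions:
        if action == "add":
            updated[key] = data
        elif action == "modify":
            updated[key] = {**updated[key], **data}
        elif action == "delete":
            del updated[key]
        else:
            raise ValueError(f"unknown action: {action}")
    return updated


# Master file: relatively permanent records, keyed by account number.
master = {
    101: {"name": "Lee", "balance": 50},
    102: {"name": "Garcia", "balance": 75},
}

# Transaction file: changes collected during the day, processed all at once.
transactions = [
    ("modify", 101, {"balance": 20}),
    ("delete", 102, None),
    ("add", 103, {"name": "Okafor", "balance": 0}),
]

new_master = apply_batch(master, transactions)
```

An online system would instead apply each change the moment it arrives, which is what lets a reservation system reject a duplicate booking immediately.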
Offline vs. Online
• Offline
  – not directly accessible to the CPU, such as tapes or disks that need to be loaded
• Online
  – storage is direct and fast
  – generally disk
File Organization
• Sequential access storage
  – stores one record after another
  – alphabetic or numeric
• Direct access storage
  – can access the data using direct methods such as addressing
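The difference between the two access styles can be sketched with fixed-length records in an in-memory byte stream. This is an illustration only: the 20-byte record layout, the function names, and the sample data are assumptions, and `io.BytesIO` stands in for a real disk file.

```python
import io
import struct

# Assumed layout: 4-byte integer id followed by a 16-byte name (20 bytes per record).
RECORD = struct.Struct("<i16s")


def write_records(stream, rows):
    for rec_id, name in rows:
        stream.write(RECORD.pack(rec_id, name.encode("ascii")))


def read_sequential(stream, wanted_id):
    """Scan from the start, one record after another, until the id matches."""
    stream.seek(0)
    while chunk := stream.read(RECORD.size):
        rec_id, raw = RECORD.unpack(chunk)
        if rec_id == wanted_id:
            return raw.rstrip(b"\x00").decode("ascii")
    return None


def read_direct(stream, position):
    """Jump straight to a record by computing its byte address."""
    stream.seek(position * RECORD.size)
    rec_id, raw = RECORD.unpack(stream.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("ascii")


disk = io.BytesIO()
write_records(disk, [(1, "Lee"), (2, "Garcia"), (3, "Okafor")])
```

`read_sequential` has to pass over every earlier record, while `read_direct` computes an address and reads one record, which is the idea behind direct access by addressing.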
Organizing Methods
• Sequential file organization
  – records are retrieved in the sequence in which they were stored
  – useful when a large group of records needs to be accessed most of the time
  – example: catalog mailing
Organizing Methods (continued)
• Direct file organization
  – also called random file organization
  – records stored in no particular sequence
  – hashing algorithm converts the key field into a number that identifies where the record is stored
  – faster for finding a specific record
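A hashed lookup can be sketched as follows. The chapter does not say which hashing algorithm is used, so this example assumes the simplest one (key modulo the number of slots) and assumes linear probing to handle two keys that hash to the same slot. All names and sample data are hypothetical.

```python
NUM_SLOTS = 7  # assumed size of the storage area


def home_slot(key):
    """Hash the key field to a slot number (assumption: simple modulo hashing)."""
    return key % NUM_SLOTS


def store(slots, key, data):
    """Place the record at its home slot, or the next free slot on a collision."""
    start = home_slot(key)
    for step in range(NUM_SLOTS):
        i = (start + step) % NUM_SLOTS
        if slots[i] is None or slots[i][0] == key:
            slots[i] = (key, data)
            return i
    raise RuntimeError("file is full")


def fetch(slots, key):
    """Go to the home slot and probe forward until the key or an empty slot is found."""
    start = home_slot(key)
    for step in range(NUM_SLOTS):
        i = (start + step) % NUM_SLOTS
        if slots[i] is None:
            return None
        if slots[i][0] == key:
            return slots[i][1]
    return None


slots = [None] * NUM_SLOTS
store(slots, 1001, "Lee")      # 1001 % 7 == 0
store(slots, 1008, "Garcia")   # 1008 % 7 == 0 as well, so it lands in slot 1
```

The collision between 1001 and 1008 shows why a hashed number is not automatically unique and why a real system needs some collision rule.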
Organizing Methods (continued)
• Indexed-sequential file organization
  – also called indexed file organization
  – records stored in sequential order
  – an index locates records according to key field
  – requires magnetic or optical disk
  – slower overall than direct access
  – example: a bank keeps up-to-date record information for individual lookups, but prints monthly statements sequentially
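The bank example can be sketched as one sorted file that supports both kinds of access. This is a simplified illustration: it assumes an in-memory list and a dense index searched with `bisect`, whereas a real indexed-sequential file keeps its index on disk. The account data and function names are hypothetical.

```python
import bisect

# Records stored in key order: (account number, name, balance).
records = [
    (1001, "Lee", 50),
    (1002, "Garcia", 75),
    (1005, "Okafor", 20),
]

# The index: key field values in the same order as the records.
index_keys = [rec[0] for rec in records]


def lookup(account):
    """Direct-style access: search the index, then fetch one record."""
    pos = bisect.bisect_left(index_keys, account)
    if pos < len(index_keys) and index_keys[pos] == account:
        return records[pos]
    return None


def statements():
    """Sequential access: walk every record in stored order."""
    return [f"{acct} {name} balance {bal}" for acct, name, bal in records]
```

`lookup` serves a teller checking one account, and `statements` serves the monthly print run over the whole file.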
File Management System
• Disadvantages of file management systems
  – data redundancy
  – lack of integrity
  – lack of program independence
Database Management Systems
• DBMS
• Controls the structure of the database and access to the data
Advantages of DBMS
• Reduced data redundancy
• Improved data integrity
• More program independence
• Increased user productivity
• Increased security
Disadvantages of DBMS
• Cost issues
• Data vulnerability issues
• Privacy issues
Database Organization
• Hierarchical
• Network
• Relational
• Object-oriented
Hierarchical
• Records arranged in related groups, or a tree
• Lower-level record is called a child
• Parent record at the top of the tree is called the root record
• A parent may have more than one child, but a child has only one parent (a one-to-many relationship)
• Simple and fast
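The parent and child rules can be sketched with a small tree class. The `Node` class and the college example are assumptions made for illustration; they are not from the chapter.

```python
class Node:
    """A record in a hierarchical database: one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # exactly one parent (None for the root record)
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Follow parent links up to the root and return the full path."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


root = Node("College")                 # root record
dept = Node("Biology", parent=root)    # child of the root
course = Node("BIO101", parent=dept)   # child of Biology
Node("Physics", parent=root)           # a parent may have several children
```

Because every child has a single parent, there is exactly one path from the root to any record, which is part of why this organization is simple and fast.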
Network Database
• A type of hierarchical database, but children can have more than one parent
• More flexible because it can establish relationships between different parents
• Limits to the number of links
• Retains some of the access speed of a hierarchical database
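The one change from the hierarchical model, a child with several parents, can be sketched with two link tables. The `link` helper and the sample records are hypothetical and only illustrate the relationship.

```python
from collections import defaultdict

parents_of = defaultdict(set)    # child -> set of parents
children_of = defaultdict(set)   # parent -> set of children


def link(parent, child):
    """Record one parent-child link; a child may be linked to many parents."""
    children_of[parent].add(child)
    parents_of[child].add(parent)


link("Biology Dept", "Student: Lee")
link("Chess Club", "Student: Lee")       # second parent for the same child
link("Biology Dept", "Student: Garcia")
```

Here the same student record is reachable from both the department and the club, something a strict tree cannot express without duplicating the record.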
Relational Database
• Relates data through a key field
• More flexible
• Advantage
  – user does not have to be aware of structure
  – easy to add, modify, and delete records
Relational Database (continued)
• Disadvantage
  – searches can be time consuming
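Relating data through a key field can be sketched with two small tables held as Python lists. The table names, the `student_id` key, and the sample rows are assumptions for illustration only.

```python
# Two tables that share the key field student_id.
students = [
    {"student_id": 1, "name": "Lee"},
    {"student_id": 2, "name": "Garcia"},
]
enrollments = [
    {"student_id": 1, "course": "BIO101"},
    {"student_id": 1, "course": "PHY201"},
    {"student_id": 2, "course": "BIO101"},
]


def courses_for(name):
    """Find the student's key, then match it against the enrollment table."""
    ids = {s["student_id"] for s in students if s["name"] == name}
    return sorted(e["course"] for e in enrollments if e["student_id"] in ids)
```

The caller asks by name and never needs to know how the tables are linked. The function also scans both tables in full, which hints at why relating tables can take time on large data.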
Object-Oriented Database
• Object-oriented database management system (OODBMS)
• Handles numeric, text, graphics, and audio data
• Important part of the technology merge
• Uses
  – medical information systems
  – engineering information systems
  – geographic databases
  – training and education
DBMS Features
• Data dictionary
  – also called encyclopedia and repository
  – stores data definitions
• Utilities
  – assist in maintaining databases by filtering acceptable data for input, editing data, and monitoring
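One way to picture how stored data definitions filter acceptable input is the sketch below. The dictionary layout, the field names, and the `validate` function are hypothetical; real DBMS products keep much richer definitions.

```python
# Assumed data dictionary: one definition per field.
DATA_DICTIONARY = {
    "student_id": {"type": int, "required": True},
    "name": {"type": str, "required": True},
    "gpa": {"type": float, "required": False},
}


def validate(record):
    """Return a list of problems found when checking a record against the definitions."""
    errors = []
    for field, rule in DATA_DICTIONARY.items():
        if field not in record:
            if rule["required"]:
                errors.append(f"missing field: {field}")
        elif not isinstance(record[field], rule["type"]):
            errors.append(f"wrong type for {field}")
    for field in record:
        if field not in DATA_DICTIONARY:
            errors.append(f"unknown field: {field}")
    return errors
```

An empty list means the record matches the definitions. A record such as `{"student_id": "1", "name": "Lee", "age": 20}` would be rejected for a wrong type and an undefined field.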
Query Language
• Data manipulation language
• Used to make database queries without requiring a command language
• Most popular is SQL (“sequel”), or Structured Query Language
SQL
• Used in Oracle, Sybase, dBase, Paradox, and Microsoft Access
• Some query languages accept natural-language or spoken English requests
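A small SQL query can be tried with Python's built-in `sqlite3` module. SQLite is not one of the products listed above; it is used here only because it ships with Python. The `student` table and its rows are assumptions made for the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Lee", "Biology"), (2, "Garcia", "History"), (3, "Okafor", "Biology")],
)

# The query states what is wanted, not how to find it.
rows = conn.execute(
    "SELECT name FROM student WHERE major = ? ORDER BY name", ("Biology",)
).fetchall()
```

The `SELECT` statement names the fields, the table, and the condition, and the DBMS works out how to retrieve the matching records.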
Report Generator
• Produces on-screen or printed reports
• User can customize appearance
Access Security
• Can be tailored for group access or individual access
• Physical security is as critical as data security
System Recovery
• Recovery types
  – full and partial
  – match backup techniques
• Techniques
  – mirroring
  – reprocessing
  – rollforward
  – rollback
Types of Recovery
• Mirroring
  – frequent simultaneous copying of data to two or more places
• Reprocessing
  – goes back to a point of database activity where the database was correct and reprocesses data to bring it up to date
Types of Recovery (continued)
• Rollforward: forward recovery
  – recreates the current database from a previously stored database
  – uses after-image records with processing information
• Rollback: backward recovery
  – undoes unwanted changes, for example, if only half a transaction was processed
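Both directions of recovery can be sketched with plain dictionaries. This is a simplified model under stated assumptions: the log formats, function names, and account values are hypothetical, and the "before-image" log used for rollback is a standard term that this transcript itself does not mention.

```python
def rollforward(backup, after_images):
    """Forward recovery: start from a stored copy and reapply logged after-images."""
    db = dict(backup)
    for key, after in after_images:
        db[key] = after
    return db


def rollback(db, before_images):
    """Backward recovery: undo changes by restoring before-images, newest first."""
    restored = dict(db)
    for key, before in reversed(before_images):
        if before is None:
            restored.pop(key, None)   # the record did not exist before the change
        else:
            restored[key] = before
    return restored


# Rollforward: rebuild the current state from last night's backup.
backup = {"A": 100, "B": 50}
after_images = [("A", 70), ("B", 80)]
current = rollforward(backup, after_images)

# Rollback: a transfer debited A but failed before crediting B.
half_done = dict(current)
half_done["A"] = 50
before_images = [("A", 70)]
recovered = rollback(half_done, before_images)
```

After the rollback, account A is back at its value from before the interrupted transfer, so the database no longer reflects half a transaction.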
Mining, Warehouses, “Siftware”
• Data mining
  – DM, or knowledge discovery
  – sifts through large databases to uncover trends and predict future trends
  – helps in marketing, health, and science
Data Warehousing
• Requires data preparation
  – identification of all data sources
  – fuse data and clean or scrub data to ensure accuracy
  – metadata shows the origins of data, the transformations, and summary data
• Data warehouse
  – combination of cleaned data and metadata
  – often uses massively parallel processing (MPP)
Siftware for Finding and Analyzing
• Query-and-reporting tools
  – Focus Reporter, Seagate Crystal Reports, Esperant
  – specific questions to verify hypotheses
• Multidimensional-analysis (MDA) tools
  – Essbase, Lightship
  – data surfing to explore dimensions of a subset
Siftware for Finding and Analyzing (continued)
• Intelligent agents
  – roam networks and perform complex tasks
  – DataEngine, Data, Logic
  – help turn up unexpected relationships and patterns
• Data mining
  – combines facts from all parts of a business
  – cash registers, shipping documents, credit-card files
Ethics of Using Databases
• Misinformation explosion
  – data is found, but little effort is made to ensure that the data is updated
  – reliance on anecdotal evidence
  – causes inaccuracies that can be harmful
Information Accuracy
• More facts, faster facts, but not necessarily better facts
• Database is not necessarily updated with current information
• Computer sources not necessarily accurate
Information Completeness
• Know the boundaries, as no information service has it all
• Know the complete iterations of key words
• History is limited
  – most databases go back only to 1980
  – frequently assessment is unthinkingly extended to years beyond 1980
Privacy Issues
• Right not to reveal information about oneself
• Credit card, shopping habits, harassment
• Fair Information Practices
  – U.S. Department of Health, Education, and Welfare
Privacy Enactment
• Privacy Act of 1974
  – limits government agencies and their contractors
  – right to see and correct inaccurate data about oneself
• Freedom of Information Act
  – personal access to data gathered about oneself
• Computer Matching and Privacy Protection Act
  – prevents government from comparing some records to other records of individuals
Finance Privacy
• Fair Credit Reporting Act of 1970
  – right to access and challenge credit records
  – if denied credit, the credit record must be provided free of charge
• Right to Financial Privacy Act of 1978
  – restrictions on federal agencies wanting to search records in banks
Health Privacy
• No federal laws protect medical records in the United States
  – except drug and alcohol abuse and psychiatric care
• A strategy is to decline to fill out medical history forms or questionnaires unless there is a clear need for them
• Can always ask for a copy of your medical records
Employment Privacy
• Nongovernmental employers are the least regulated by privacy legislation
• Employers may verify
  – education
  – employment
  – credit
  – driving record
  – workers’ compensation claims
  – criminal record, if any
Commerce Issues
• Marketing gathers data about age, buying habits, favorite charities
• No prohibition on gathering data for one reason and using it for another
  – except Video Privacy Protection Act of 1988
    • prevents giving out video rental records without a court order or the individual’s consent
Communications Privacy
• Some constraints in acquiring and disseminating information, listening, and encryption use
• Some argue that you must be willing to give up some privacy for safety and security