Transcript Database Management Systems - Health Information Technology
Database Management Systems
Pierce College
CIS260
• Course Description: Concepts and theory of relational database management systems (RDBMS) including the analysis and design of relational database systems. This is a project-based class. Entity Relationship modeling and advanced Microsoft Access techniques, in preparation for the Microsoft Office Specialist exam, will be covered. Course discussion and hands-on case studies in the healthcare industry with comparison to other industries, as applicable, provides practical knowledge and experience.
• Course Content Outline/Topics: A. Database Management Systems B. Database Development C. Business rules and user requirements D. Entity Relationship Diagrams E. Normalization F. Database Design Patterns G. Validate and manage data H. Data Queries I. Database Forms, Views and Reports J. Database Security and Administration K. Software, Data and User Testing 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 2
Learning Objectives
A1. Discuss database management systems and database administration. B1. Analyze, design, and create relational databases to meet industry and customer needs using current relational database management system software.
C1. Identify the business rules and customer requirements to be included in the data model.
D1. Differentiate and create conceptual data models, logical data models and physical data models.
E1. Normalize relationships in tables.
F1. Use database design patterns in data modeling.
F2. Use modeling/diagramming software to model data.
G1. Validate, import, convert, and export data from one application to another H1. Create data queries that sort, filter, manipulate and calculate data I1. Develop effective queries, forms, reports and custom user interfaces for databases.
J1. Discuss ethics and security issues and regulations surrounding data and databases.
K1. Test the integrity of the database design.
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 3
CAHIMS Requirements
• Case studies, critiques/peer reviews, diagrams/models, group or team discussions/debates, individual projects/assignments, online research, team projects, computer labs (CAHIMS 3.3, 5.2, 6.1, 6.2, 7.1, 7.3, 7.4, 9.4) 4 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems
Module 1: Database Management Systems
Lecture 1: Define and Discuss Databases and Database Management Systems (DBMS) 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 5
Database versus Database Management System(DBMS)
• Database “a comprehensive collection of related data computer.”, organized for convenient access, generally in a ~Dictionary.com, “Database” • Database System aka Database Management System (DBMS) “a software system designed to allow the definition, creation, querying, update, and administration of databases.” ~Wikipedia.com, “Database” 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 6
Variations of Database Management Systems
• Relational Database and Relational Database Management System (RDBMS) • The most common type of DBMS.
• “A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model . The software used in a relational database is called a relational database management system (RDBMS). “ • “First defined in June 1970 by Edgar Codd, of IBM's San Jose Research Laboratory. Codd's view of what qualifies as an RDBMS is summarized in Codd's 12 rules.” Wikipedia.com, “Relational Database” 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 7
Variations of Database Management Systems
• • Object Relational Database Management System (ORDBMS) • “An object-relational database (ORD), or object-relational database management system (ORDBMS), is … similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. • An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language;…” • Object Oriented Database Management System (OODBMS) “…at the other extreme are OODBMSes in which the database is essentially a persistent object store for software written in an object-oriented programming language , with a programming API for storing and retrieving objects, and little or no specific support for querying.” http://en.wikipedia.org/wiki/ORDBMS , 3/26/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 8
Common RDBMS
According to research company Gartner , the five leading commercial relational database vendors by revenue in 2011 were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%).
According to Gartner, in 2008, the percentage of database sites using any given technology were (a given site may deploy multiple technologies): • Oracle Database - 70% • Microsoft SQL Server - 68% • MySQL (Oracle Corporation) - 50% • • IBM DB2 - 39% IBM Informix - 18% • Adaptive Server Enterprise (Sybase Corporation) - 15% • • Sybase IQ Teradata - 14% - 11% • Amazon Relational Database Service engines.
is a database as a service offering MySQL and Oracle database 4/19/2013 5:00 PM ~ Source Wikipedia.com “RDBMS”, 3/22/2013 Pierce College - CIS260 Database Management Systems 9
Data Warehouse
“A database designed to support decision making in an organization. Data from the production databases are copied to the data warehouse so that queries can be performed without disturbing the performance or the stability of the production systems.” http://www.pcmag.com/encyclopedia/term/40866/data-warehouse , 4/18/13 4/19/2013 5:00 PM https://upload.wikimedia.org/wikipedia/commons/4/46/Data_warehouse_overview.JPG
, 4/18/2013 Pierce College - CIS260 Database Management Systems 10
Data Marts, Cubes and Mining
Data Marts
“Data warehouses can become enormous with hundreds of gigabytes of transactions. As a result, subsets, known as "data marts," are often created for just one department or product line.” http://www.pcmag.com/encyclopedia/term/40866/data-warehouse , 4/18/13
Data Cubes
“An OLAP cube is an array of data understood in terms of its 0 or more dimensions…A cube can be considered a generalization of a three-dimensional spreadsheet.” http://en.wikipedia.org/wiki/OLAP_cube , 4/18/13
Data Mining
“Data mining …is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
” http://en.wikipedia.org/wiki/Data_mining , 4/18/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 11
Business Intelligence/Business Analytics
“
Business intelligence
, or BI, is an umbrella term that refers to a variety of software applications used to analyze an organization’s raw data. BI as a discipline is made up of several related activities, including data mining, online analytical processing, querying and reporting.
Companies use BI to improve decision making, cut costs and identify new business opportunities. BI is more than just corporate reporting and more than a set of tools to coax data out of enterprise systems. CIOs use BI to identify inefficient business processes that are ripe for re-engineering.” http://www.cio.com/article/40296/Business_Intelligence_Definition_and_Solutions , 4/18/13 “Thomas Davenport [an American academic and author specializing in analytics] argues that business intelligence should be divided into querying, reporting, OLAP, an "alerts" tool, and business analytics. In this definition,
business analytics
is the subset of BI based on statistics, prediction, and optimization.” http://en.wikipedia.org/wiki/Business_intelligence#Business_intelligence_and_data_warehousing , 4/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 12
Big Data
“Every minute, 48 hours of video are uploaded onto YouTube. 204 million e-mail messages are sent and 600 new websites generated. 600,000 pieces of content are shared on Facebook, and more than 100,000 tweets are sent. And that does not even begin to scratch the surface of data generation, which spans to sensors, medical records, corporate databases, and more.” http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ , 4/18/13 “By 2015, the average hospital will have two-thirds of a petabyte (665 terabytes) of patient data, 80% of which will be unstructured data like CT scans and X-rays.” http://www.forbes.com/sites/netapp/2013/04/17/healthcare-big-data/ , 4/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 13
Module 2: Database Administration
Lecture 1: Database Administration 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 14
Database Administration
Wikipedia defines database administration as “ the function of managing and maintaining database management systems (DBMS) software” and lists the following database administrator (DBA) responsibilities: “DBA Responsibilities • Installation, configuration and upgrading of Database server software and related products.
• Evaluate Database features and Database related products.
• Establish and maintain sound backup and recovery policies and procedures.
• Take care of the Database design and implementation.
• Implement and maintain database security (create and maintain users and roles, assign privileges).
• Database tuning and performance monitoring.
• Application tuning and performance monitoring.
• Setup and maintain documentation and standards.
• Plan growth and changes (capacity planning).
• Work as part of a team and provide 24x7 support when required.
• Do general technical troubleshooting and give cons.
• Database recovery.” Source: Wikipedia, 3/18/13, http://en.wikipedia.org/wiki/Database_administration_and_automation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 15
Module 2: Database Administration
Lecture 2: Governance, Policies, and Procedures 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 16
Governance, Policies, and Procedures
• To perform DBA responsibilities, there needs to be an understanding of expectations and consequences. • Who sets those expectations, who enforces those expectations, and who implements those expectations is the purpose behind governance, policies and procedures. Who is responsible for those expectations? EVERY stakeholder.
• Even the DBA has specific roles for helping set these expectations as shown in the list of responsibilities: • • “Establish and maintain sound backup and recovery
policies
and
procedures
.
Setup and maintain
documentation and standards
.” Source: Wikipedia, 3/18/13, http://en.wikipedia.org/wiki/Database_administration_and_automation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 17
Example: Why?
Our college will comply with FERPA regulations and protect the privacy of student education records.
What?
Grades will not be posted.
How?
Student grades will be mailed to the student three days after final exams.
4/19/2013 5:00 PM
Governance Policies Procedures
Pierce College - CIS260 Database Management Systems 18
Governance
• Governance addresses all levels of the organization and answers the question “
why
are we setting these expectations?” •
“IT governance enables an organization to attain three vital objectives: regulatory and legal compliance, operational excellence, and risk optimization.” [1]
•
“An IT governance framework should not exist in isolation from either the overarching corporate governance model or the ERM [enterprise risk management] model.” [2]
[1], [2]“
IT Excellence Starts with Governance”, An Ernst & Young White Paper, By Nick
Robinson, Manager, Technology & Security Risk Services,
http://www.technologyexecutivesclub.com/Articles/itgovernance/excellence.php
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 19
Policies and Procedures
“A set of
policies
are principles, rules, and guidelines formulated or adopted by an organization to reach its long-term goals and typically published in a booklet or other form that is widely accessible.
Policies and procedures are designed to influence and determine all major decisions and actions, and all activities take place within the boundaries set by them.
Procedures
are the specific methods employed to express policies in action in day-to-day operations of the organization. Together, policies and procedures ensure that a point of view held by the governing body of an organization is translated into steps that result in an outcome compatible with that view. “ Businessdictionary.com, http://www.businessdictionary.com/definition/policies-and-procedures.html
, 3/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 20
Module 2: Database Administration
Lecture 3: Database Career Paths and Certification 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 21
Healthcare Database Career Paths and Certification
Career Paths
• • • • • • database administrator database developer database analyst data analyst business analyst application analyst
Certifications
• • • • • HIMSS CAHIMS Microsoft MTA Microsoft MOS Microsoft MCSA CompTIA HIT 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 22
Module 2: Database Administration
Lecture 4: CAHIMS 9.4 Staying Current with Technology and Industry 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 23
Module 3: Data Management
Lecture 1: What is Data Management?
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 24
Data Management
“Data management (DM) is the business function of planning for, controlling, and delivering data and information assets. This function includes • The disciplines of development, execution and supervision • of plans, policies, programs, projects, processes, practices and procedures • that control, protect, deliver and enhance • the value of data and information assets.“ “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 4 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 25
Data, Information, Knowledge
• The DAMA-DMBOK Guide defines
data
as the “representation of facts as text, numbers, graphics, images, sound or video” •
information
as “data in context”. “This context includes …definition…format…timeframe…relevance” •
knowledge
as “information in perspective, integrated into a viewpoint based on the recognition and interpretation of patterns, such as trends, formed with other information and experience.” “We gain in knowledge when we understand the significance of information.” “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 4 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 26
Adding Value
“The Business Value of Data What's the value of your organization's data?
The ability of business and IT managers to answer that question directly correlates to the success of their company's business continuity and data recovery efforts.” “The Business Value of Data”, by Michael Croy , Forsythe Solutions Group, Inc., , http://www.technologyexecutivesclub.com/Articles/management/artBusinessValueofData.php
, 03/25/13 “Data has value only when it is actually used, or can be useful in the future. All data lifecycle stages have associated costs and risks, but only the “use” stage adds business value.” “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 3 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 27
Big Data
“is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” “Examples include Big Science, web logs, RFID, sensor networks, social networks, social data (due to the social data revolution), Internet text and documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, forecasting drive times for new home buyers, medical records, photography archives, video archives, and large-scale e-commerce.” http://en.wikipedia.org/wiki/Big_data , 3/26/13 “A recent McKinsey report found that value gained from data in the US health care sector alone could be more than US $300 billion every year. But traditional tools aren’t enough to manage these extremely large amounts of fast-changing information.” http://www.net-security.org/secworld.php?id=14594 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 28
Module 3: Data Management
Lecture 2: Data Governance 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 29
Data Governance
“Data Governance is "a system of decision rights and accountabilities for information related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.
“ ” Datagovernance.org, http://www.datagovernance.com/adg_data_governance_basics.html
, 03/18/13 “Data governance is an approach that public and private entities can use to organize one or more aspects of their data management efforts, including business intelligence (BI), data security and privacy, master data management (MDM), and data quality (DQ) management.” Microsoft Corporation, http://www.microsoft.com/en-us/download/details.aspx?id=10985 , 03/18/13 “A Governance Plan typically includes your objectives, administration and maintenance, policies, support, stakeholder teams and other elements.” “Governance”, http://www.webopedia.com/TERM/V/validation.html
, 03/15/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 30
Parts of Data Governance
4/19/2013 5:00 PM © DAMA International 2010 Pierce College - CIS260 Database Management Systems 31
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 32
Module 3: Data Management
Lecture 3: Data Validation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 33
Data Validation
Validation (n.) Verification that something is correct or conforms to a certain standard. In data collection or data entry, it is the process of ensuring that the data that are entered fall within the accepted boundaries of the application collecting the data. “Validation”, http://www.webopedia.com/TERM/V/validation.html
, 03/15/13 “…data validation is the process of ensuring that a program operates on clean, correct and useful data.” “Data validation”, http://en.wikipedia.org/wiki/Data_validation , 3/13/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 34
Module 3: Data Management
Lecture 4: The Data Dictionary 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 35
Data Dictionary
“Database about a database. A data dictionary defines the structure of the database itself (not that of the data held in the database) and is used in control and maintenance of large databases. Among other items of information, it records (1) (2) what data is stored, name, description, and characteristics of each data element, (3) types of relationships between data elements, (4) access rights and frequency of access. Also called system dictionary when used in the context of a system design.” “Data dictionary”, http://www.businessdictionary.com/definition/data-dictionary.html#ixzz2NuokM0Dn , 3/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 36
Reasons for a data dictionary
• ““High-Availability and Disaster Recovery (HA/DR) or what is often called “Business Continuity.”” • “to determine authoritative sources” • “to gather the disparate elements and cull them into a single reporting entity” • “for systems optimization” • “No one group knows where all of the data is, which of it is authoritative, and which you should track. “ “Developing a Data Dictionary”, Apr 8, 2011, http://www.informit.com/guides/content.aspx?g=sqlserver&seqNum=382 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 37
Guidelines for Developing the Data Dictionary
Recommended guidelines from the e-HIM Workgroup of the American Health Information Management Association (AHIMA), 2006; 1. Design a plan: Preplan the development, implementation, and maintenance of the data dictionary.
2. Develop an enterprise data dictionary; integrate common data elements across the entire institution to ensure consistency.
3. Ensure collaborative involvement: Make sure there is support from all key stakeholders.
4. Develop an approvals process: Ensure a documentation tail for all decision, updates, and maintenance.
5. Identify and retain details of data versions: Version control is important.
6. Design for flexibility and growth.
7. Design room for expansion of field values.
8. Follow established ISO/International Electro technical Commission (IEC) 11179 guidelines for metadata registry: to promote interoperability follow standards.
9. Adopt nationally recognized standards.
10. Beware of differing standards for the same concepts.
11. Use geographic codes and conform to the National Spatial Data Infrastructure and the Federal Geographic Data Committee.
12. Test the information system: Develop a test plan to ensure the system supports the data dictionary.
13. Provide ongoing education and training.
14. Assess the extend to which the data elements maintain consistency and avoid duplication.
Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2.
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 38
Key Points in the Guidelines: Involve Stakeholders
• Involve all stakeholders in the discussion of creating the data dictionary.
• Stakeholders may include data creators, owners, users which will affect: -Departments (represented across either facility or enterprise) -Outside collaborating agencies/facilities -Public health agencies -Clinical providers including all specialties -HIM administrative support services -Reimbursement support service -Legal support services -IT support services Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2 Pierce College - CIS260 Database Management Systems 39 4/19/2013 5:00 PM
Key Points in Guidelines: Train Employees
• Ongoing education of long term employees to assure compliance is an important tactic for maintaining quality • New employees should also receive appropriate training to involve them in assuring the quality of data being collected Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2 Pierce College - CIS260 Database Management Systems 40 4/19/2013 5:00 PM
Data Dictionary Examples and Best Practices
California Department of Pesticide Regulation
“Data Dictionary”
http://www.cdpr.ca.gov/docs/enforce/residue/datadict.htm
Northwest Environmental Data Network (NED)
“Best practices for data dictionary definitions and usage”
http://www.pnamp.org/sites/default/files/best_practices_for_data_dictionary_definitions_and_usage_version_1.1_2006-11-14.pdf
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 41
Module 4: Stakeholders & Requirements
Lecture 1: Business, Functional, and Technical Requirements 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 42
Define Stakeholders
• Sponsor(s) – person(s) with decision and funding authority • Subject Matter Expert(s) – person(s) who know the content • End User(s)- person(s), or representatives of user groups, who will utilize or benefit from the new product or process • Support Staff – all other persons needed to implement and/or maintain the product or process, i.e. training, IT, HR 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 43
Documenting Requirements
B USINESS , F UNCTIONAL , AND T ECHNICAL R EQUIREMENTS
• Variety of titles exist for these requirements documents, but the objectives are the same. • Business Requirements • Answers the question “how does this project fill an organizational/department need?” • High level. Used for management approval and resource allocation.
• Functional Requirements • Answers the question “what do the users need?” • End User level. Used for capturing and verifying the users’ requirements.
• Technical Requirements • Answers the question “what does the technology (hardware/software) need?” • Highly detailed. Used for programmers, developers, networking, DBA’s, testers, and all other needed IT staff. • For small deliverables, all three may be described in one document, for complex projects, may need many versions and levels.
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 44
Requirements Tips and Templates
•
“Top 10 Writing good requirements tips”,
http://www.requirementone.com/Blog/2012/01/15/Top-10-Writing-good-requirements-tips , 3/18/13 •
“Business Requirements Document: A High-level Review”,
http://www.isixsigma.com/implementation/project-selection-tracking/business-requirements-document-high-level-review/ , 3/25/13 •
Centers for Medicare & Medicaid Services, Requirements Document (template)
http://www.hsd.state.nm.us/pdf/hcr/HIX/CMS%20Requirements%20Document.pdf
, 3/25/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 45
Module 4: Stakeholders & Requirements
Lecture 2: CAHIMS 3.3 Business, User and Technical Requirements 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 46
Module 4: Stakeholders & Requirements
Lecture 3: CAHIMS 9.1 Business Communication and Ethics 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 47
Module 4: Stakeholders & Requirements
Lecture 4: CAHIMS 9.3 Professionalism and Customer Service 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 48
Module 5: Data Modeling
Lecture 1: Data Modeling 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 49
Why Model?
House 4/19/2013 5:00 PM Earth Pierce College - CIS260 Database Management Systems Molecule 50
Data Modeling
“Communication and precision are the two key benefits that make a data model so important.”
Data Modeling Made Simple: A Practical Guide for Business and IT Professionals, Technics Publications, LLC , 2nd Edition, Steve Hoberman, pg. 37 51 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems
Transfer User Requirements into Data Models
Three types of data models: Conceptual Model–simplest model; used to communicate and validate requirements with stakeholders
• A Conceptual data model is the most abstract form of data model. It is helpful for communicating ideas to a wide range of stakeholders because of its simplicity. Therefore platform-specific information, such as data types, is omitted from a Conceptual data model.” http:// www.sparxsystems.com/enterprise_architect_user_guide/9.3/domain_based_models/conceptual_data_model.html
, 3/25/13
Logical Model-more complex model; follows formal database design rules
• “Logical data modeling is the process of documenting the comprehensive business information requirements in an accurate and consistent format. “
“Data modeling”,
http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.intro%2Fsrc%2Ftpc%2Fdb2z_datamodeling.htm
, 3/25/13
Physical Model-the most complex model; performance specific and vendor specific model
• “The physical design of your database optimizes performance while ensuring data integrity by avoiding unnecessary data redundancies. During physical design, you transform the entities into tables, the instances into rows, and the attributes into columns.” “Physical database design”,
http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.intro%2Fsrc%2Ftpc%2Fdb2z_datamodeling.htm
, 3/25/13
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 52
“In Summary The conceptual model is concerned with the real world view and understanding of data; the logical model is a generalized formal structure in the rules of information science; the physical model specifies how this will be executed in a particular DBMS instance.” http://www.aisintl.com/case/CDM-PDM.html
, 3/25/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 53
Data Modeling Standards
• Several notation standards exist for data modeling • Peter Chen’s • • • Bachman notation Barker's Notation EXPRESS • • • IDEF1X Martin notation (min, max)-notation of Jean-Raymond Abrial in 1974 • • • UML class diagrams Merise Object-Role Modeling) http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model , 3/13/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 54
Module 5: Data Modeling
Lecture 2: Entity Relationship Modeling 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 55
The Entity Relationship(ER) Model
• The ER model shows information to be collected in the database (entity) and its relationship with other information collected.
• Peter Chen, 1976, originator of the entity relationship model 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 56
Identify the “Entities”
Identify the “entities” – the nouns – the title of the information being collected (patient, appointment, prescription, physician, etc.).
1. Draw a box for each entity and label with the entity name.
2. Label using the singular spelling of the noun and capitalize the noun.* 4/19/2013 5:00 PM
Figure 1: The design of the "box" will depend on the software used to create it.
Pierce College - CIS260 Database Management Systems 57
Identify the “Relationships”
Identify the “relationships” – the verbs; how one piece of data or information interacts/relates with another piece of information Draw a line between entities to show relationship.
1.Label the line with verbs that describe the relationship. 2.The first verb is for reading left to right; the second verb is for reading right to left.
4/19/2013 5:00 PM
Figure 2: The Patient has an Appointment. The Appointment is for a Patient.
Pierce College - CIS260 Database Management Systems 58
Identify the “Cardinality”
Identify the “cardinality” – the
number
of entities allowed in the relationship.
1.A single line touching an entity means “ONE”.
2.A line ending with three small lines, referred to as “crow’s feet”, means “Zero or more”. Once created, this can be set to a different minimum such as “One or more” or “Three or more”.
4/19/2013 5:00 PM
Figure 3: Single point at line end means "ONE"; crow’s feet, means "many" or "one or more".
Pierce College - CIS260 Database Management Systems 59
Identify the “Optional/Optionality”
Identify “optional/optionality” – whether the relationship is required or not.
1.
2.
Microsoft VISIO uses the “O” to show “optional” as seen by the entity Appointment.
The “||” means one required.
4/19/2013 5:00 PM
Figure 4: The Patient may have "one or more“ Appointments; the Appointment must have one Patient.
Pierce College - CIS260 Database Management Systems 60
Add the “Attributes”
Add the “attributes” – descriptors of the entity.
1. Label the attributes as singular tense.
2. Don’t put spaces or symbols between words if more than one is needed for clarity.
3. Type or write as “camel case” – first letter of each word is upper case, all other letters are lower case.
Figure 5: We identify the Patient by his/her name; we identify the Appointment by the date, time and Physician .
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 61
Add “Primary Key (PK)” and “Foreign Key (FK)”
Add “primary key (PK)” – use an attribute of the entity, or create a new attribute, that
uniquely
identifies the entity.
1. A new attribute, primary key ID, is usually created for most entities because none of the attributes identified are guaranteed to always be “unique”. 2. Add “foreign key(FK)” –this is the Primary key of the parent table in a relationship. The PK of the parent (in this case “Patient”) is added to the child table (in this case “Appointment”) thereby becoming the FK.
Figure 6:
Pierce College - CIS260 Database Management Systems 4/19/2013 5:00 PM 62
Unique Identifiers (UID)
• • • • A unique identifier/unique ID/UID is a number or combination of numbers and letters that when used will only identify one entity or record. Examples of ID’s we think uniquely identify us: • • • Driver’s license Social Security Number Telephone number Why they might not be unique: • http://www.idanalytics.com/news-and-events/news-releases/2010/8-11-2010.php
CustomerID, or PatientID, or AccountNumber are examples of new identifiers created for the purpose of keeping the information unique.
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 63
Read the Data Models
Practice reading the diagrams.
1. Keep nouns singular when starting each sentence.
2. Read from left to right, then from right to left. The sentences must make sense in both directions.
** *** Noun A
Bike A Wheel
may/must
must May
have relationship(s)
be sold with Be sold with
with some number of Noun B.
one or more A Wheel(s) Bike Wheel Bike is sold with / is part of 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 64
Module 5: Data Modeling
Lecture 3: Introduction to Microsoft VISIO 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 65
Diagramming Software
There are many ER diagramming tools.
• Free software ER diagramming tools that can interpret and generate ER models and SQL and do database analysis • Proprietary ER diagramming tools • Free software diagram tools just draw the shapes without having any knowledge of what they mean, nor do they generate SQL. http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#ER_diagramming_tools • Microsoft VISIO is a proprietary diagramming tool that is free to students in college programs covered under the Microsoft Academic license.
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 66
Diagramming Software
•
Get to know VISIO
http://office.microsoft.com/en-us/visio-help/get-to-know-visio-RZ001126777.aspx
•
Visio 2007 training courses
http://office.microsoft.com/en-us/visio-help/visio-2007-training-courses-HA010214368.aspx?CTT=1 •
Create a Database Model (also known as Entity Relationship diagram)
http://office.microsoft.com/en-us/visio-help/create-a-database-model-also-known-as-entity-relationship-diagram-HA010115477.aspx
• Make the switch to VISIO 2010 http://office.microsoft.com/en-us/access-help/download-office-2010-training-HA101901726.aspx?CTT=1 • Make the switch to VISIO 2013 http://office.microsoft.com/en-us/support/make-the-switch-to-visio-2013-RZ102925050.aspx?CTT=5&origin=HA104032123 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 67
Module 5: Data Modeling
Lecture 4: Normalization 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 68
Normalization
“Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.” http://databases.about.com/od/specificproducts/a/normalization.htm
”Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations.” http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 69
Normalization Rules
“There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications. “ http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 70
First Normal Form (1NF)
“First Normal Form • Eliminate repeating groups in individual tables. • Create a separate table for each set of related data. • Identify each set of related data with a primary key. Do not use multiple fields in a single table to store similar data.” http://support.microsoft.com/kb/283878 71 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems
1NF Example
Not 1NF 1NF
Student
Name
Sally Doe Jefferson Doe CIS260, ENG101 Multiple entries for one field is a violation of 1NF.
Split into two tables using a student unique identifier to tie the information together.
4/19/2013 5:00 PM
CourseID
CIS260, CIS270, CIS280 Student
StudentID
SD12345 JD23456 Course
CourseID
CIS260 CIS270 CIS280 CIS260 ENG101 Pierce College - CIS260 Database Management Systems
Student
Sally Doe Jefferson Doe
CourseTitle
Database Programming Analysis Database English
StudentID
SD12345 SD12345 SD12345 JD23456 JD23456 72
Second Normal Form (2NF)
• • “Second Normal Form Create separate tables for sets of values that apply to multiple records. Relate these tables with a foreign key. Records should not depend on anything other than a table's primary key (a compound key, if necessary).” http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 73
2NF Example
Not 2NF
Student
StudentID
SD12345 JD23456 Course
CourseID
CIS260 CIS270 CIS280 CIS260 ENG101 4/19/2013 5:00 PM
Student
Sally Doe Jefferson Doe Each Student is unique; expanded Course table violates 2NF.
CourseTitle StudentID Database
SD12345 Programming SD12345 Analysis
Database
English SD12345 JD23456 JD23456
2NF
Enrollment
StudentID
SD12345 SD12345 SD12345 JD23456 JD23456 Course
CourseID
CIS260 CIS270 CIS280 ENG101 Pierce College - CIS260 Database Management Systems
CourseID
CIS260 CIS270 CIS280 CIS260 ENG101
CourseTitle
Database Programming Analysis English Now each Enrollment is unique, and each Course is unique.
74
Third Normal Form (3NF)
• “Third Normal Form Eliminate fields that do not depend on the key. Values in a record that are not part of that record's key do not belong in the table.” http://support.microsoft.com/kb/283878 • “A memorable statement of Codd's definition of 3NF, paralleling the traditional pledge to give true evidence in a court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key." A common variation supplements this definition with the oath: "so help me Codd".
” http://en.wikipedia.org/wiki/Third_normal_form 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 75
3NF Example
Not 3NF
Enrollment
StudentID CourseID
SD12345 CIS260 SD12345 SD12345 JD23456 JD23456 CIS270 CIS280 CIS260 ENG101 Expanded Enrollment table not 3NF because Instructor is dependent on Instructor ID, not CourseID, and the record not dependent on a unique record key.
InstructorID Instructor
FAC123 FAC234 FAC345 FAC456 FAC567 Schmidt Jones Nguyen Doe Chen
3NF
Enrollment
EnrollmentID StudentID
2013Fall001 SD12345 2013Fall002 SD12345 2013Fall003 SD12345 2013Fall004 JD23456 2013Fall005 JD23456 Instructor
InstructorID
FAC123 FAC234 FAC345 FAC456 FAC567
Instructor
Schmidt Jones Nguyen Doe Chen
CourseID
CIS260 CIS270 CIS280 CIS260 ENG101
InstructorID
FAC123 FAC234 FAC345 FAC456 FAC567 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 76
Finished Normalized Entities
Course
CourseID
CIS260 CIS270 CIS280 ENG101
CourseTitle
Database Programming Analysis English Instructor
InstructorID
FAC123 FAC234 FAC345 FAC456 FAC567 Each entity now holds unique data.
Instructor
Schmidt Jones Nguyen Doe Chen Student
StudentID
SD12345 JD23456 Enrollment
EnrollmentID StudentID
2013Fall001 SD12345 2013Fall002 SD12345 2013Fall003 SD12345 2013Fall004 JD23456 2013Fall005 JD23456
Student
Sally Doe Jefferson Doe
CourseID
CIS260 CIS270 CIS280 CIS260 ENG101
InstructorID
FAC123 FAC234 FAC345 FAC456 FAC567 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 77
Matching ER Diagram
PK
Student
StudentID
Student has / is of
PK
Instructor
InstructorID
Instructor has / is of
PK PK,FK1 PK,FK3
Enrollment
EnrollmentID StudentID CourseID
FK2 InstructorID has / is of
PK
Course
CourseID
CourseTitle 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 78
Normalization Exception
“From a relational model point of view, it is standard to have tables that are in Third Normal Form. Normalized physical design provides the greatest ease of maintenance, and databases in this form are clearly understood by developers.
However, a fully normalized design may not always yield the best performance. Sybase recommends that you design databases for Third Normal Form, however, if performance issues arise, you may have to denormalize to solve them.”
Copyright © 2003. Sybase Inc. All rights reserved. http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc20020_1251/html/databases/databases215.htm
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 79
Module 5:
Lecture 5: Other Modeling Concepts 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 80
Many-to-Many Relationships
“In a many-to-many relationship, a record in one table relates to multiple records in a second table, and a record in the second table relates to multiple records in the first table.
This type of relationship requires a third table, called a junction table. The junction table contains the primary keys from the other two tables as its foreign keys.” http://office.microsoft.com/en-us/access-help/table-that-data-RZ006149432.aspx?section=26 , 4/16/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 81
Junction/Intersection Tables
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 82
Lookup Tables
• • Lookup tables are generally tables that contain rarely changing data and are considered the master or reference list.* An example could be a zip code table with associated city, state, county, country, latitude/longitude information. Many different tables and/or databases could use this same table for lookup and for presenting various combinations of the fields on different reports. Lookup tables generally are single topic, used when the data rarely changes, when multiple applications could share the same information, to minimize the likelihood of typos, and to minimize storage for other tables needing to reference this table. The relationship between a lookup table and another entity table is strictly a lookup relationship.
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 83
Lookup Table Examples
2 3
OrderStatusID
1 4
OrderStatus
Ordered Back Ordered Shipped Cancelled 4/19/2013 5:00 PM
StateNamesID
1 2 3 Pierce College - CIS260 Database Management Systems
StateNameShort
AL AK AZ
StateNameLong
Alabama Alaska Arizona 84
Recursion
The Supervisor is also an Employee, so a lot of duplication of information.
4/19/2013 5:00 PM By adding the EmployeeID of the Supervisor to the field SupervisorID, the relationship is established with only one extra field needed.
Pierce College - CIS260 Database Management Systems 85
Supertypes and Subtypes
PATIENT
*First Name *Last Name *Gender *Birthdate
MALE FEMALE
*Number of Pregnancies
INFANT
*Head Circumference 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 86
Module 5: Data Modeling
Lecture 6: Data Patterns 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 87
Patterns/Templates
4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 88
Data Patterns
"Thus Universal Patterns for Data Modeling are reusable guides that provide a data modeling template for very prevalent or "universal" themes that occur in data modeling.“ The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling, Len Silverston, Paul Agnew, Wiley; 1 edition (January 9, 2009) , page 5
Watch Youtube video of Len Silverston,
author of The Data Model Resource Book , http://www.youtube.com/watch?v=D5Rgpl6humE 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 89
Example Data Pattern – Electronic Medical Records
4/19/2013 5:00 PM Source: http://www.databaseanswers.org/data_models/electronic_medical_records/index.htm, 4/16/2013 Pierce College - CIS260 Database Management Systems 90