Database Management Systems - Health Information Technology

Download Report

Transcript Database Management Systems - Health Information Technology

Database Management Systems

Pierce College

CIS260

• Course Description: Concepts and theory of relational database management systems (RDBMS) including the analysis and design of relational database systems. This is a project-based class. Entity Relationship modeling and advanced Microsoft Access techniques, in preparation for the Microsoft Office Specialist exam, will be covered. Course discussion and hands-on case studies in the healthcare industry with comparison to other industries, as applicable, provides practical knowledge and experience.

• Course Content Outline/Topics: A. Database Management Systems B. Database Development C. Business rules and user requirements D. Entity Relationship Diagrams E. Normalization F. Database Design Patterns G. Validate and manage data H. Data Queries I. Database Forms, Views and Reports J. Database Security and Administration K. Software, Data and User Testing 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 2

Learning Objectives

A1. Discuss database management systems and database administration. B1. Analyze, design, and create relational databases to meet industry and customer needs using current relational database management system software.

C1. Identify the business rules and customer requirements to be included in the data model.

D1. Differentiate and create conceptual data models, logical data models and physical data models.

E1. Normalize relationships in tables.

F1. Use database design patterns in data modeling.

F2. Use modeling/diagramming software to model data.

G1. Validate, import, convert, and export data from one application to another H1. Create data queries that sort, filter, manipulate and calculate data I1. Develop effective queries, forms, reports and custom user interfaces for databases.

J1. Discuss ethics and security issues and regulations surrounding data and databases.

K1. Test the integrity of the database design.

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 3

CAHIMS Requirements

• Case studies, critiques/peer reviews, diagrams/models, group or team discussions/debates, individual projects/assignments, online research, team projects, computer labs (CAHIMS 3.3, 5.2, 6.1, 6.2, 7.1, 7.3, 7.4, 9.4) 4 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems

Module 1: Database Management Systems

Lecture 1: Define and Discuss Databases and Database Management Systems (DBMS) 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 5

Database versus Database Management System(DBMS)

• Database “a comprehensive collection of related data computer.”, organized for convenient access, generally in a ~Dictionary.com, “Database” • Database System aka Database Management System (DBMS) “a software system designed to allow the definition, creation, querying, update, and administration of databases.” ~Wikipedia.com, “Database” 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 6

Variations of Database Management Systems

• Relational Database and Relational Database Management System (RDBMS) • The most common type of DBMS.

• “A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model . The software used in a relational database is called a relational database management system (RDBMS). “ • “First defined in June 1970 by Edgar Codd, of IBM's San Jose Research Laboratory. Codd's view of what qualifies as an RDBMS is summarized in Codd's 12 rules.” Wikipedia.com, “Relational Database” 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 7

Variations of Database Management Systems

• • Object Relational Database Management System (ORDBMS) • “An object-relational database (ORD), or object-relational database management system (ORDBMS), is … similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. • An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language;…” • Object Oriented Database Management System (OODBMS) “…at the other extreme are OODBMSes in which the database is essentially a persistent object store for software written in an object-oriented programming language , with a programming API for storing and retrieving objects, and little or no specific support for querying.” http://en.wikipedia.org/wiki/ORDBMS , 3/26/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 8

Common RDBMS

According to research company Gartner , the five leading commercial relational database vendors by revenue in 2011 were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%).

According to Gartner, in 2008, the percentage of database sites using any given technology were (a given site may deploy multiple technologies): • Oracle Database - 70% • Microsoft SQL Server - 68% • MySQL (Oracle Corporation) - 50% • • IBM DB2 - 39% IBM Informix - 18% • Adaptive Server Enterprise (Sybase Corporation) - 15% • • Sybase IQ Teradata - 14% - 11% • Amazon Relational Database Service engines.

is a database as a service offering MySQL and Oracle database 4/19/2013 5:00 PM ~ Source Wikipedia.com “RDBMS”, 3/22/2013 Pierce College - CIS260 Database Management Systems 9

Data Warehouse

“A database designed to support decision making in an organization. Data from the production databases are copied to the data warehouse so that queries can be performed without disturbing the performance or the stability of the production systems.” http://www.pcmag.com/encyclopedia/term/40866/data-warehouse , 4/18/13 4/19/2013 5:00 PM https://upload.wikimedia.org/wikipedia/commons/4/46/Data_warehouse_overview.JPG

, 4/18/2013 Pierce College - CIS260 Database Management Systems 10

Data Marts, Cubes and Mining

Data Marts

“Data warehouses can become enormous with hundreds of gigabytes of transactions. As a result, subsets, known as "data marts," are often created for just one department or product line.” http://www.pcmag.com/encyclopedia/term/40866/data-warehouse , 4/18/13

Data Cubes

“An OLAP cube is an array of data understood in terms of its 0 or more dimensions…A cube can be considered a generalization of a three-dimensional spreadsheet.” http://en.wikipedia.org/wiki/OLAP_cube , 4/18/13

Data Mining

Data mining …is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

” http://en.wikipedia.org/wiki/Data_mining , 4/18/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 11

Business Intelligence/Business Analytics

Business intelligence

, or BI, is an umbrella term that refers to a variety of software applications used to analyze an organization’s raw data. BI as a discipline is made up of several related activities, including data mining, online analytical processing, querying and reporting.

Companies use BI to improve decision making, cut costs and identify new business opportunities. BI is more than just corporate reporting and more than a set of tools to coax data out of enterprise systems. CIOs use BI to identify inefficient business processes that are ripe for re-engineering.” http://www.cio.com/article/40296/Business_Intelligence_Definition_and_Solutions , 4/18/13 “Thomas Davenport [an American academic and author specializing in analytics] argues that business intelligence should be divided into querying, reporting, OLAP, an "alerts" tool, and business analytics. In this definition,

business analytics

is the subset of BI based on statistics, prediction, and optimization.” http://en.wikipedia.org/wiki/Business_intelligence#Business_intelligence_and_data_warehousing , 4/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 12

Big Data

“Every minute, 48 hours of video are uploaded onto YouTube. 204 million e-mail messages are sent and 600 new websites generated. 600,000 pieces of content are shared on Facebook, and more than 100,000 tweets are sent. And that does not even begin to scratch the surface of data generation, which spans to sensors, medical records, corporate databases, and more.” http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ , 4/18/13 “By 2015, the average hospital will have two-thirds of a petabyte (665 terabytes) of patient data, 80% of which will be unstructured data like CT scans and X-rays.” http://www.forbes.com/sites/netapp/2013/04/17/healthcare-big-data/ , 4/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 13

Module 2: Database Administration

Lecture 1: Database Administration 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 14

Database Administration

Wikipedia defines database administration as “ the function of managing and maintaining database management systems (DBMS) software” and lists the following database administrator (DBA) responsibilities: “DBA Responsibilities • Installation, configuration and upgrading of Database server software and related products.

• Evaluate Database features and Database related products.

• Establish and maintain sound backup and recovery policies and procedures.

• Take care of the Database design and implementation.

Implement and maintain database security (create and maintain users and roles, assign privileges).

• Database tuning and performance monitoring.

• Application tuning and performance monitoring.

• Setup and maintain documentation and standards.

• Plan growth and changes (capacity planning).

• Work as part of a team and provide 24x7 support when required.

• Do general technical troubleshooting and give cons.

• Database recovery.” Source: Wikipedia, 3/18/13, http://en.wikipedia.org/wiki/Database_administration_and_automation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 15

Module 2: Database Administration

Lecture 2: Governance, Policies, and Procedures 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 16

Governance, Policies, and Procedures

• To perform DBA responsibilities, there needs to be an understanding of expectations and consequences. • Who sets those expectations, who enforces those expectations, and who implements those expectations is the purpose behind governance, policies and procedures. Who is responsible for those expectations? EVERY stakeholder.

• Even the DBA has specific roles for helping set these expectations as shown in the list of responsibilities: • • “Establish and maintain sound backup and recovery

policies

and

procedures

.

Setup and maintain

documentation and standards

.” Source: Wikipedia, 3/18/13, http://en.wikipedia.org/wiki/Database_administration_and_automation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 17

Example: Why?

Our college will comply with FERPA regulations and protect the privacy of student education records.

What?

Grades will not be posted.

How?

Student grades will be mailed to the student three days after final exams.

4/19/2013 5:00 PM

Governance Policies Procedures

Pierce College - CIS260 Database Management Systems 18

Governance

• Governance addresses all levels of the organization and answers the question “

why

are we setting these expectations?” •

“IT governance enables an organization to attain three vital objectives: regulatory and legal compliance, operational excellence, and risk optimization.” [1]

“An IT governance framework should not exist in isolation from either the overarching corporate governance model or the ERM [enterprise risk management] model.” [2]

[1], [2]“

IT Excellence Starts with Governance”, An Ernst & Young White Paper, By Nick

Robinson, Manager, Technology & Security Risk Services,

http://www.technologyexecutivesclub.com/Articles/itgovernance/excellence.php

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 19

Policies and Procedures

“A set of

policies

are principles, rules, and guidelines formulated or adopted by an organization to reach its long-term goals and typically published in a booklet or other form that is widely accessible.

Policies and procedures are designed to influence and determine all major decisions and actions, and all activities take place within the boundaries set by them.

Procedures

are the specific methods employed to express policies in action in day-to-day operations of the organization. Together, policies and procedures ensure that a point of view held by the governing body of an organization is translated into steps that result in an outcome compatible with that view. “ Businessdictionary.com, http://www.businessdictionary.com/definition/policies-and-procedures.html

, 3/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 20

Module 2: Database Administration

Lecture 3: Database Career Paths and Certification 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 21

Healthcare Database Career Paths and Certification

Career Paths

• • • • • • database administrator database developer database analyst data analyst business analyst application analyst

Certifications

• • • • • HIMSS CAHIMS Microsoft MTA Microsoft MOS Microsoft MCSA CompTIA HIT 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 22

Module 2: Database Administration

Lecture 4: CAHIMS 9.4 Staying Current with Technology and Industry 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 23

Module 3: Data Management

Lecture 1: What is Data Management?

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 24

Data Management

“Data management (DM) is the business function of planning for, controlling, and delivering data and information assets. This function includes • The disciplines of development, execution and supervision • of plans, policies, programs, projects, processes, practices and procedures • that control, protect, deliver and enhance • the value of data and information assets.“ “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 4 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 25

Data, Information, Knowledge

• The DAMA-DMBOK Guide defines

data

as the “representation of facts as text, numbers, graphics, images, sound or video” •

information

as “data in context”. “This context includes …definition…format…timeframe…relevance” •

knowledge

as “information in perspective, integrated into a viewpoint based on the recognition and interpretation of patterns, such as trends, formed with other information and experience.” “We gain in knowledge when we understand the significance of information.” “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 4 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 26

Adding Value

“The Business Value of Data What's the value of your organization's data?

The ability of business and IT managers to answer that question directly correlates to the success of their company's business continuity and data recovery efforts.” “The Business Value of Data”, by Michael Croy , Forsythe Solutions Group, Inc., , http://www.technologyexecutivesclub.com/Articles/management/artBusinessValueofData.php

, 03/25/13 “Data has value only when it is actually used, or can be useful in the future. All data lifecycle stages have associated costs and risks, but only the “use” stage adds business value.” “DAMA Guide to the Data Management Body of Knowledge“, DAMA International, 2010, pg. 3 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 27

Big Data

“is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” “Examples include Big Science, web logs, RFID, sensor networks, social networks, social data (due to the social data revolution), Internet text and documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, forecasting drive times for new home buyers, medical records, photography archives, video archives, and large-scale e-commerce.” http://en.wikipedia.org/wiki/Big_data , 3/26/13 “A recent McKinsey report found that value gained from data in the US health care sector alone could be more than US $300 billion every year. But traditional tools aren’t enough to manage these extremely large amounts of fast-changing information.” http://www.net-security.org/secworld.php?id=14594 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 28

Module 3: Data Management

Lecture 2: Data Governance 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 29

Data Governance

“Data Governance is "a system of decision rights and accountabilities for information related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.

“ ” Datagovernance.org, http://www.datagovernance.com/adg_data_governance_basics.html

, 03/18/13 “Data governance is an approach that public and private entities can use to organize one or more aspects of their data management efforts, including business intelligence (BI), data security and privacy, master data management (MDM), and data quality (DQ) management.” Microsoft Corporation, http://www.microsoft.com/en-us/download/details.aspx?id=10985 , 03/18/13 “A Governance Plan typically includes your objectives, administration and maintenance, policies, support, stakeholder teams and other elements.” “Governance”, http://www.webopedia.com/TERM/V/validation.html

, 03/15/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 30

Parts of Data Governance

4/19/2013 5:00 PM © DAMA International 2010 Pierce College - CIS260 Database Management Systems 31

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 32

Module 3: Data Management

Lecture 3: Data Validation 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 33

Data Validation

Validation (n.) Verification that something is correct or conforms to a certain standard. In data collection or data entry, it is the process of ensuring that the data that are entered fall within the accepted boundaries of the application collecting the data. “Validation”, http://www.webopedia.com/TERM/V/validation.html

, 03/15/13 “…data validation is the process of ensuring that a program operates on clean, correct and useful data.” “Data validation”, http://en.wikipedia.org/wiki/Data_validation , 3/13/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 34

Module 3: Data Management

Lecture 4: The Data Dictionary 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 35

Data Dictionary

“Database about a database. A data dictionary defines the structure of the database itself (not that of the data held in the database) and is used in control and maintenance of large databases. Among other items of information, it records (1) (2) what data is stored, name, description, and characteristics of each data element, (3) types of relationships between data elements, (4) access rights and frequency of access. Also called system dictionary when used in the context of a system design.” “Data dictionary”, http://www.businessdictionary.com/definition/data-dictionary.html#ixzz2NuokM0Dn , 3/18/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 36

Reasons for a data dictionary

• ““High-Availability and Disaster Recovery (HA/DR) or what is often called “Business Continuity.”” • “to determine authoritative sources” • “to gather the disparate elements and cull them into a single reporting entity” • “for systems optimization” • “No one group knows where all of the data is, which of it is authoritative, and which you should track. “ “Developing a Data Dictionary”, Apr 8, 2011, http://www.informit.com/guides/content.aspx?g=sqlserver&seqNum=382 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 37

Guidelines for Developing the Data Dictionary

Recommended guidelines from the e-HIM Workgroup of the American Health Information Management Association (AHIMA), 2006; 1. Design a plan: Preplan the development, implementation, and maintenance of the data dictionary.

2. Develop an enterprise data dictionary; integrate common data elements across the entire institution to ensure consistency.

3. Ensure collaborative involvement: Make sure there is support from all key stakeholders.

4. Develop an approvals process: Ensure a documentation tail for all decision, updates, and maintenance.

5. Identify and retain details of data versions: Version control is important.

6. Design for flexibility and growth.

7. Design room for expansion of field values.

8. Follow established ISO/International Electro technical Commission (IEC) 11179 guidelines for metadata registry: to promote interoperability follow standards.

9. Adopt nationally recognized standards.

10. Beware of differing standards for the same concepts.

11. Use geographic codes and conform to the National Spatial Data Infrastructure and the Federal Geographic Data Committee.

12. Test the information system: Develop a test plan to ensure the system supports the data dictionary.

13. Provide ongoing education and training.

14. Assess the extend to which the data elements maintain consistency and avoid duplication.

Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2.

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 38

Key Points in the Guidelines: Involve Stakeholders

• Involve all stakeholders in the discussion of creating the data dictionary.

• Stakeholders may include data creators, owners, users which will affect: -Departments (represented across either facility or enterprise) -Outside collaborating agencies/facilities -Public health agencies -Clinical providers including all specialties -HIM administrative support services -Reimbursement support service -Legal support services -IT support services Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2 Pierce College - CIS260 Database Management Systems 39 4/19/2013 5:00 PM

Key Points in Guidelines: Train Employees

• Ongoing education of long term employees to assure compliance is an important tactic for maintaining quality • New employees should also receive appropriate training to involve them in assuring the quality of data being collected Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2 Pierce College - CIS260 Database Management Systems 40 4/19/2013 5:00 PM

Data Dictionary Examples and Best Practices

California Department of Pesticide Regulation

“Data Dictionary”

http://www.cdpr.ca.gov/docs/enforce/residue/datadict.htm

Northwest Environmental Data Network (NED)

“Best practices for data dictionary definitions and usage”

http://www.pnamp.org/sites/default/files/best_practices_for_data_dictionary_definitions_and_usage_version_1.1_2006-11-14.pdf

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 41

Module 4: Stakeholders & Requirements

Lecture 1: Business, Functional, and Technical Requirements 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 42

Define Stakeholders

• Sponsor(s) – person(s) with decision and funding authority • Subject Matter Expert(s) – person(s) who know the content • End User(s)- person(s), or representatives of user groups, who will utilize or benefit from the new product or process • Support Staff – all other persons needed to implement and/or maintain the product or process, i.e. training, IT, HR 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 43

Documenting Requirements

B USINESS , F UNCTIONAL , AND T ECHNICAL R EQUIREMENTS

• Variety of titles exist for these requirements documents, but the objectives are the same. • Business Requirements • Answers the question “how does this project fill an organizational/department need?” • High level. Used for management approval and resource allocation.

• Functional Requirements • Answers the question “what do the users need?” • End User level. Used for capturing and verifying the users’ requirements.

• Technical Requirements • Answers the question “what does the technology (hardware/software) need?” • Highly detailed. Used for programmers, developers, networking, DBA’s, testers, and all other needed IT staff. • For small deliverables, all three may be described in one document, for complex projects, may need many versions and levels.

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 44

Requirements Tips and Templates

“Top 10 Writing good requirements tips”,

http://www.requirementone.com/Blog/2012/01/15/Top-10-Writing-good-requirements-tips , 3/18/13 •

“Business Requirements Document: A High-level Review”,

http://www.isixsigma.com/implementation/project-selection-tracking/business-requirements-document-high-level-review/ , 3/25/13 •

Centers for Medicare & Medicaid Services, Requirements Document (template)

http://www.hsd.state.nm.us/pdf/hcr/HIX/CMS%20Requirements%20Document.pdf

, 3/25/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 45

Module 4: Stakeholders & Requirements

Lecture 2: CAHIMS 3.3 Business, User and Technical Requirements 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 46

Module 4: Stakeholders & Requirements

Lecture 3: CAHIMS 9.1 Business Communication and Ethics 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 47

Module 4: Stakeholders & Requirements

Lecture 4: CAHIMS 9.3 Professionalism and Customer Service 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 48

Module 5: Data Modeling

Lecture 1: Data Modeling 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 49

Why Model?

House 4/19/2013 5:00 PM Earth Pierce College - CIS260 Database Management Systems Molecule 50

Data Modeling

“Communication and precision are the two key benefits that make a data model so important.”

Data Modeling Made Simple: A Practical Guide for Business and IT Professionals, Technics Publications, LLC , 2nd Edition, Steve Hoberman, pg. 37 51 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems

Transfer User Requirements into Data Models

Three types of data models: Conceptual Model–simplest model; used to communicate and validate requirements with stakeholders

• A Conceptual data model is the most abstract form of data model. It is helpful for communicating ideas to a wide range of stakeholders because of its simplicity. Therefore platform-specific information, such as data types, is omitted from a Conceptual data model.” http:// www.sparxsystems.com/enterprise_architect_user_guide/9.3/domain_based_models/conceptual_data_model.html

, 3/25/13

Logical Model-more complex model; follows formal database design rules

• “Logical data modeling is the process of documenting the comprehensive business information requirements in an accurate and consistent format. “

“Data modeling”,

http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.intro%2Fsrc%2Ftpc%2Fdb2z_datamodeling.htm

, 3/25/13

Physical Model-the most complex model; performance specific and vendor specific model

• “The physical design of your database optimizes performance while ensuring data integrity by avoiding unnecessary data redundancies. During physical design, you transform the entities into tables, the instances into rows, and the attributes into columns.” “Physical database design”,

http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.intro%2Fsrc%2Ftpc%2Fdb2z_datamodeling.htm

, 3/25/13

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 52

“In Summary The conceptual model is concerned with the real world view and understanding of data; the logical model is a generalized formal structure in the rules of information science; the physical model specifies how this will be executed in a particular DBMS instance.” http://www.aisintl.com/case/CDM-PDM.html

, 3/25/13 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 53

Data Modeling Standards

• Several notation standards exist for data modeling • Peter Chen’s • • • Bachman notation Barker's Notation EXPRESS • • • IDEF1X Martin notation (min, max)-notation of Jean-Raymond Abrial in 1974 • • • UML class diagrams Merise Object-Role Modeling) http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model , 3/13/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 54

Module 5: Data Modeling

Lecture 2: Entity Relationship Modeling 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 55

The Entity Relationship(ER) Model

• The ER model shows information to be collected in the database (entity) and its relationship with other information collected.

• Peter Chen, 1976, originator of the entity relationship model 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 56

Identify the “Entities”

Identify the “entities” – the nouns – the title of the information being collected (patient, appointment, prescription, physician, etc.).

1. Draw a box for each entity and label with the entity name.

2. Label using the singular spelling of the noun and capitalize the noun.* 4/19/2013 5:00 PM

Figure 1: The design of the "box" will depend on the software used to create it.

Pierce College - CIS260 Database Management Systems 57

Identify the “Relationships”

Identify the “relationships” – the verbs; how one piece of data or information interacts/relates with another piece of information Draw a line between entities to show relationship.

1.Label the line with verbs that describe the relationship. 2.The first verb is for reading left to right; the second verb is for reading right to left.

4/19/2013 5:00 PM

Figure 2: The Patient has an Appointment. The Appointment is for a Patient.

Pierce College - CIS260 Database Management Systems 58

Identify the “Cardinality”

Identify the “cardinality” – the

number

of entities allowed in the relationship.

1.A single line touching an entity means “ONE”.

2.A line ending with three small lines, referred to as “crow’s feet”, means “Zero or more”. Once created, this can be set to a different minimum such as “One or more” or “Three or more”.

4/19/2013 5:00 PM

Figure 3: Single point at line end means "ONE"; crow’s feet, means "many" or "one or more".

Pierce College - CIS260 Database Management Systems 59

Identify the “Optional/Optionality”

Identify “optional/optionality” – whether the relationship is required or not.

1.

2.

Microsoft VISIO uses the “O” to show “optional” as seen by the entity Appointment.

The “||” means one required.

4/19/2013 5:00 PM

Figure 4: The Patient may have "one or more“ Appointments; the Appointment must have one Patient.

Pierce College - CIS260 Database Management Systems 60

Add the “Attributes”

Add the “attributes” – descriptors of the entity.

1. Label the attributes as singular tense.

2. Don’t put spaces or symbols between words if more than one is needed for clarity.

3. Type or write as “camel case” – first letter of each word is upper case, all other letters are lower case.

Figure 5: We identify the Patient by his/her name; we identify the Appointment by the date, time and Physician .

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 61

Add “Primary Key (PK)” and “Foreign Key (FK)”

Add “primary key (PK)” – use an attribute of the entity, or create a new attribute, that

uniquely

identifies the entity.

1. A new attribute, primary key ID, is usually created for most entities because none of the attributes identified are guaranteed to always be “unique”. 2. Add “foreign key(FK)” –this is the Primary key of the parent table in a relationship. The PK of the parent (in this case “Patient”) is added to the child table (in this case “Appointment”) thereby becoming the FK.

Figure 6:

Pierce College - CIS260 Database Management Systems 4/19/2013 5:00 PM 62

Unique Identifiers (UID)

• • • • A unique identifier/unique ID/UID is a number or combination of numbers and letters that when used will only identify one entity or record. Examples of ID’s we think uniquely identify us: • • • Driver’s license Social Security Number Telephone number Why they might not be unique: • http://www.idanalytics.com/news-and-events/news-releases/2010/8-11-2010.php

CustomerID, or PatientID, or AccountNumber are examples of new identifiers created for the purpose of keeping the information unique.

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 63

Read the Data Models

Practice reading the diagrams.

1. Keep nouns singular when starting each sentence.

2. Read from left to right, then from right to left. The sentences must make sense in both directions.

** *** Noun A

Bike A Wheel

may/must

must May

have relationship(s)

be sold with Be sold with

with some number of Noun B.

one or more A Wheel(s) Bike Wheel Bike is sold with / is part of 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 64

Module 5: Data Modeling

Lecture 3: Introduction to Microsoft VISIO 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 65

Diagramming Software

There are many ER diagramming tools.

• Free software ER diagramming tools that can interpret and generate ER models and SQL and do database analysis • Proprietary ER diagramming tools • Free software diagram tools just draw the shapes without having any knowledge of what they mean, nor do they generate SQL. http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#ER_diagramming_tools • Microsoft VISIO is a proprietary diagramming tool that is free to students in college programs covered under the Microsoft Academic license.

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 66

Diagramming Software

Get to know VISIO

http://office.microsoft.com/en-us/visio-help/get-to-know-visio-RZ001126777.aspx

Visio 2007 training courses

http://office.microsoft.com/en-us/visio-help/visio-2007-training-courses-HA010214368.aspx?CTT=1 •

Create a Database Model (also known as Entity Relationship diagram)

http://office.microsoft.com/en-us/visio-help/create-a-database-model-also-known-as-entity-relationship-diagram-HA010115477.aspx

• Make the switch to VISIO 2010 http://office.microsoft.com/en-us/access-help/download-office-2010-training-HA101901726.aspx?CTT=1 • Make the switch to VISIO 2013 http://office.microsoft.com/en-us/support/make-the-switch-to-visio-2013-RZ102925050.aspx?CTT=5&origin=HA104032123 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 67

Module 5: Data Modeling

Lecture 4: Normalization 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 68

Normalization

“Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.” http://databases.about.com/od/specificproducts/a/normalization.htm

”Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations.” http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 69

Normalization Rules

“There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications. “ http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 70

First Normal Form (1NF)

“First Normal Form • Eliminate repeating groups in individual tables. • Create a separate table for each set of related data. • Identify each set of related data with a primary key. Do not use multiple fields in a single table to store similar data.” http://support.microsoft.com/kb/283878 71 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems

1NF Example

Not 1NF 1NF

Student

Name

Sally Doe Jefferson Doe CIS260, ENG101 Multiple entries for one field is a violation of 1NF.

Split into two tables using a student unique identifier to tie the information together.

4/19/2013 5:00 PM

CourseID

CIS260, CIS270, CIS280 Student

StudentID

SD12345 JD23456 Course

CourseID

CIS260 CIS270 CIS280 CIS260 ENG101 Pierce College - CIS260 Database Management Systems

Student

Sally Doe Jefferson Doe

CourseTitle

Database Programming Analysis Database English

StudentID

SD12345 SD12345 SD12345 JD23456 JD23456 72

Second Normal Form (2NF)

• • “Second Normal Form Create separate tables for sets of values that apply to multiple records. Relate these tables with a foreign key. Records should not depend on anything other than a table's primary key (a compound key, if necessary).” http://support.microsoft.com/kb/283878 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 73

2NF Example

Not 2NF

Student

StudentID

SD12345 JD23456 Course

CourseID

CIS260 CIS270 CIS280 CIS260 ENG101 4/19/2013 5:00 PM

Student

Sally Doe Jefferson Doe Each Student is unique; expanded Course table violates 2NF.

CourseTitle StudentID Database

SD12345 Programming SD12345 Analysis

Database

English SD12345 JD23456 JD23456

2NF

Enrollment

StudentID

SD12345 SD12345 SD12345 JD23456 JD23456 Course

CourseID

CIS260 CIS270 CIS280 ENG101 Pierce College - CIS260 Database Management Systems

CourseID

CIS260 CIS270 CIS280 CIS260 ENG101

CourseTitle

Database Programming Analysis English Now each Enrollment is unique, and each Course is unique.

74

Third Normal Form (3NF)

• “Third Normal Form Eliminate fields that do not depend on the key. Values in a record that are not part of that record's key do not belong in the table.” http://support.microsoft.com/kb/283878 • “A memorable statement of Codd's definition of 3NF, paralleling the traditional pledge to give true evidence in a court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key." A common variation supplements this definition with the oath: "so help me Codd".

” http://en.wikipedia.org/wiki/Third_normal_form 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 75

3NF Example

Not 3NF

Enrollment

StudentID CourseID

SD12345 CIS260 SD12345 SD12345 JD23456 JD23456 CIS270 CIS280 CIS260 ENG101 Expanded Enrollment table not 3NF because Instructor is dependent on Instructor ID, not CourseID, and the record not dependent on a unique record key.

InstructorID Instructor

FAC123 FAC234 FAC345 FAC456 FAC567 Schmidt Jones Nguyen Doe Chen

3NF

Enrollment

EnrollmentID StudentID

2013Fall001 SD12345 2013Fall002 SD12345 2013Fall003 SD12345 2013Fall004 JD23456 2013Fall005 JD23456 Instructor

InstructorID

FAC123 FAC234 FAC345 FAC456 FAC567

Instructor

Schmidt Jones Nguyen Doe Chen

CourseID

CIS260 CIS270 CIS280 CIS260 ENG101

InstructorID

FAC123 FAC234 FAC345 FAC456 FAC567 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 76

Finished Normalized Entities

Course

CourseID

CIS260 CIS270 CIS280 ENG101

CourseTitle

Database Programming Analysis English Instructor

InstructorID

FAC123 FAC234 FAC345 FAC456 FAC567 Each entity now holds unique data.

Instructor

Schmidt Jones Nguyen Doe Chen Student

StudentID

SD12345 JD23456 Enrollment

EnrollmentID StudentID

2013Fall001 SD12345 2013Fall002 SD12345 2013Fall003 SD12345 2013Fall004 JD23456 2013Fall005 JD23456

Student

Sally Doe Jefferson Doe

CourseID

CIS260 CIS270 CIS280 CIS260 ENG101

InstructorID

FAC123 FAC234 FAC345 FAC456 FAC567 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 77

Matching ER Diagram

PK

Student

StudentID

Student has / is of

PK

Instructor

InstructorID

Instructor has / is of

PK PK,FK1 PK,FK3

Enrollment

EnrollmentID StudentID CourseID

FK2 InstructorID has / is of

PK

Course

CourseID

CourseTitle 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 78

Normalization Exception

“From a relational model point of view, it is standard to have tables that are in Third Normal Form. Normalized physical design provides the greatest ease of maintenance, and databases in this form are clearly understood by developers.

However, a fully normalized design may not always yield the best performance. Sybase recommends that you design databases for Third Normal Form, however, if performance issues arise, you may have to denormalize to solve them.”

Copyright © 2003. Sybase Inc. All rights reserved. http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc20020_1251/html/databases/databases215.htm

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 79

Module 5:

Lecture 5: Other Modeling Concepts 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 80

Many-to-Many Relationships

“In a many-to-many relationship, a record in one table relates to multiple records in a second table, and a record in the second table relates to multiple records in the first table.

This type of relationship requires a third table, called a junction table. The junction table contains the primary keys from the other two tables as its foreign keys.” http://office.microsoft.com/en-us/access-help/table-that-data-RZ006149432.aspx?section=26 , 4/16/2013 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 81

Junction/Intersection Tables

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 82

Lookup Tables

• • Lookup tables are generally tables that contain rarely changing data and are considered the master or reference list.* An example could be a zip code table with associated city, state, county, country, latitude/longitude information. Many different tables and/or databases could use this same table for lookup and for presenting various combinations of the fields on different reports. Lookup tables generally are single topic, used when the data rarely changes, when multiple applications could share the same information, to minimize the likelihood of typos, and to minimize storage for other tables needing to reference this table. The relationship between a lookup table and another entity table is strictly a lookup relationship.

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 83

Lookup Table Examples

2 3

OrderStatusID

1 4

OrderStatus

Ordered Back Ordered Shipped Cancelled 4/19/2013 5:00 PM

StateNamesID

1 2 3 Pierce College - CIS260 Database Management Systems

StateNameShort

AL AK AZ

StateNameLong

Alabama Alaska Arizona 84

Recursion

The Supervisor is also an Employee, so a lot of duplication of information.

4/19/2013 5:00 PM By adding the EmployeeID of the Supervisor to the field SupervisorID, the relationship is established with only one extra field needed.

Pierce College - CIS260 Database Management Systems 85

Supertypes and Subtypes

PATIENT

*First Name *Last Name *Gender *Birthdate

MALE FEMALE

*Number of Pregnancies

INFANT

*Head Circumference 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 86

Module 5: Data Modeling

Lecture 6: Data Patterns 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 87

Patterns/Templates

4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 88

Data Patterns

"Thus Universal Patterns for Data Modeling are reusable guides that provide a data modeling template for very prevalent or "universal" themes that occur in data modeling.“ The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling, Len Silverston, Paul Agnew, Wiley; 1 edition (January 9, 2009) , page 5

Watch Youtube video of Len Silverston,

author of The Data Model Resource Book , http://www.youtube.com/watch?v=D5Rgpl6humE 4/19/2013 5:00 PM Pierce College - CIS260 Database Management Systems 89

Example Data Pattern – Electronic Medical Records

4/19/2013 5:00 PM Source: http://www.databaseanswers.org/data_models/electronic_medical_records/index.htm, 4/16/2013 Pierce College - CIS260 Database Management Systems 90