Data Management MIS 503 Management Information Systems MBA Program Definitions • Database: A DB is an organized collection of logically related data • Data: stored representations.
Download ReportTranscript Data Management MIS 503 Management Information Systems MBA Program Definitions • Database: A DB is an organized collection of logically related data • Data: stored representations.
Data Management
MIS 503 Management Information Systems MBA Program
Definitions
• Database: A DB is an organized collection of logically related data • Data: stored representations of meaningful objects and events – Structured: numbers, text, dates – Unstructured: images, video, documents • Information: data processed to increase knowledge in the person using the data • Metadata: data that describes the properties and context of user data
Data Entities, Attributes, and Keys
• An
entity
is a generalized class of people, places, or things for which data is collected, stored, and maintained.
• • A
attribute
is a characteristic of an entity.
Data item
- a value of an attribute can be found in the fields of the record describing an entity.
Key Fields
• Keys are special fields that serve two main purposes: –
Primary keys
are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc.
This is how we can guarantee that all rows are unique
–
Foreign keys
are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship) – A
secondary key
is a field in a record that does not uniquely identify the record but which is used to look up fields (e.g., Last Name) in, for example, and index – Candidate Key – an attribute that could be a key…satisfies the requirements for being a key • Keys can be
simple composite
(a single field) or (more than one field)
Data Model
• A data model is a diagram of entities and their relationships. • Data modeling involves understanding a specific business problem and analyzing the data and information needed to deliver a solution.
• The data model will be optimized to balance storage efficiency and query efficiency
E-R Diagrams
•
Entity-relationship (ER) diagrams
use graphical diagrams to demonstrate the organization of and relationships between entities.
• Relationships include: – one-to-many (1:N) – one-to-one (1:1) – many-to-many (N:M)
Course
CourseNo CrsDesc CrsUnits
ER Modeling
Single line: one cardinality Inside symbol: minimum cardinality Crow's foot: many cardinality Has
Offering
OfferNo OffLocation OffTime Outside symbol: maximum cardinality Circle: zero cardinality
ER Modeling
•
Relational Database Model
Relational Model
- the relational model describes data using a standard tabular format – All data elements are placed in two-dimensional tables, called relations – A relation contains rows (tuples, records) and columns (attributes, fields) with each intersecting cell containing an item of data • Each attribute has a domain, which is the structure or constraints on the type of data an attribute can hold Customer Table Field Name Description Customer Name Self Explanatory Customer Address Self Explanatory
Customer ID Order Number
Order Number Order Item Number of Items Ordered Self Explanatory
Primary Key-----> Customer ID Secondary Key
Order Table Primary Key Self Explanatory
Secondary Key
Data Management Issues
• • •
Data redundancy
– Un-needed duplication of data
Data integrity
– Can we rely on the data?
– Is it accurate?
– Is it secure?
Program-data dependence
- programs and data that are developed and organized so that the data is linked to the application program and the data is incompatible with other programs or data management tools
The Database Approach to Data Management
• The database approach – Central repository of shared data – Data is managed by a controlling agent – Stored in a standardized, convenient form To implement a database one must have a Database Management System (DBMS)
Database Management Systems (DBMS)
• A
database management system
is a group of programs used as an interface between a database and application programs or a database and the user.
– DBMSs are classified by the type of database model they support.
– Modern information systems are usually built on data housed in one or more DBMS
•
T ECHNICAL A SPECTS OF M ANAGING THE D ATA R ESOURCE
Tools for Managing Data
A DBMS helps manage data by providing seven functions: 1. Data storage, retrieval, update 2. Backup 3. Recovery 4. Integrity control 5. Security control 6. Concurrency control 7. Transaction control
Page 139
Advantages of DBMSs
• Reduced data redundancy.
• Improved data integrity.
• Faster program development.
• Easier modification and updating.
• Data and program independence.
• Standardization of data access.
• A framework for program development.
• Better overall protection of the data.
• Shared data and information resources.
Data Normalization
• A common problem with data organization is that data are often not well organized – –
Anomalies
are problems and irregularities in data.
Data anomalies
often result in giving users incorrect information, causing them to be misinformed about actual business conditions.
• If data are not well organized, the table may need to be normalized…
Well-Structured Relations
• A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies • Goal is to avoid anomalies – – –
Insertion Anomaly
create duplicate data – adding new rows forces user to
Deletion Anomaly
– deleting rows may cause a loss of data that would be needed for other future rows
Modification Anomaly
– changing data in a row forces changes to other rows because of duplication
General rule of thumb: a table should not pertain to more than one entity type
What’s wrong with this table?
An Example of Normalization
• Click here to view a step-by-step example of normalization
T ECHNICAL A SPECTS OF M ANAGING THE D ATA R ESOURCE
Database Architecture Database –
shared collection of logically related data, organized to meet needs of an organization
Database Architecture
– way in which the data are structured and stored in the database
Page 137
Figure 5.3 The Data Pyramid Page 137
The Three-level DB Schema
Database Schemas
•
Schema
– A – The – The - a general description of the entire database that shows all of the record types and their relationships.
user view
(external schema) is the portion of the database a user can access.
conceptual view
is the logical design of the database (how should the database be organized regardless of physical constraints)
internal view
(physical view) is the physical storage structure for the database • A
subschema
records and their relationships in the database.
shows only some of the
SQL
•
Structured Query Language
- a query language is a specialized type of data manipulation language.
• Query languages make retrieving information and manipulating a database easy and fast.
• SQL - structured query language.
Structured Query Language (SQL)
• Basic structure of a SQL expression – The select clause lists the attributes desired in answer to a query – The from clause is a list of relations or tables that the query language processor should consult in filling the request – The where clause describes the attributes desired in the answer
Emerging Database Trends
Distributed Databases
•
Distributed Processing
- involves placing processing units at different locations and typing them together with data communications equipment and systems.
– A distributed database is a database in which the actual data may be spread across several small databases connected via telecommunication devices.
Storage Area Networks (SAN)
• High-speed, special purpose network or subnet that interconnects different kinds of storage devices with associated data servers to benefit a larger network of users.
• Part of an overall network for computing resources.
• Usually physically located near larger computing resources such as a mainframe or server.
Network-Attached Storage (NAS)
• Hard disk storage with its own network address rather than being attached to a server or workstation.
• Includes: – Multi-disk Redundant Arrays of Integrated Disks (RAID) systems – Software to configure and map file locations – Designed to handle a variety of network protocols
The Data Warehouse
• Businesses collect a tremendous amount of transactions data from routine operations • These data can be analyzed to understand the business better – Requires multidimensional analysis called Online Analytical Processing (OLAP) – Helps create a learning organization that is better able to understand its markets, customers and itself
• •
Definition
Data Warehouse
: – A subject-oriented, integrated, time-variant, non updatable collection of data used in support of management decision-making processes –
Subject-oriented:
students, products e.g. customers, patients, –
Integrated:
Consistent naming conventions, formats, encoding structures; from multiple data sources – –
Time-variant:
Can study trends and changes
Nonupdatable:
Read-only, periodically refreshed
Data Mart
: – A data warehouse that is limited in scope
Components of a
star schema
Fact tables
contain factual or quantitative data 1:N relationship between dimension tables and fact tables Dimension tables are denormalized to maximize performance
Dimension tables
contain descriptions about the subjects of the business
Data Mining
• Discovers interesting structure in large amounts of data • This structure consists of – Patterns – Statistical or predictive models of the data – Relationships between the data • Applied extensively to customer data – Allows firms to determine for instance which products sell together
Object-Oriented Databases
• Traditionally relational databases supported a limited number of data types – Alphabet, numeric, dates, and time • Modern organizations use a variety of data – Graphics objects, audio clips, videos, subscripted arrays, and complex data for data mining • RDBMS vendors have extended their packages to handle such data objects
Threats to Data Security
• Accidental losses attributable to: – Human error – Software failure – Hardware failure • Theft and fraud.
• Improper data access: – Loss of privacy (personal data) – Loss of confidentiality (corporate data) • Loss of data integrity • Loss of availability (through, e.g. sabotage)
Data Management Security Techniques/Procedures
• Views or subschemas • Integrity controls • Authorization rules • User-defined procedures • Encryption • Authentication schemes • Backup, journalizing, and checkpointing
• • • •
M ANAGERIAL I SSUES IN M ANAGING D ATA
Principles in Managing Data
The need to manage data is permanent Data can exist at several levels Application software should be separate from the database Application software can be classified by how they treat data 1. Data capture 2. Data transfer 3. Data analysis and presentation
Page 140
• • •
M ANAGERIAL I SSUES IN M ANAGING D ATA
Principles in Managing Data
Application software should be considered disposable Data should be captured once There should be strict data standards
Page 143
M ANAGERIAL I SSUES IN M ANAGING D ATA
The Data Management Process Figure 5.6 Asset Management Functions Page 144
M ANAGERIAL I SSUES IN M ANAGING D ATA
Data Management Policies
• Organizations should have policies regarding: – Data ownership – Data administration
Page 148
M ANAGERIAL I SSUES IN M ANAGING D ATA
Data Ownership Corporate information policy –
foundation for managing the ownership of data
Page 148
Figure 5.8 Example Data Access Policy Page 149
M ANAGERIAL I SSUES IN M ANAGING D ATA
Data Administration
• • • • • • • Key functions of the data administration group:
Promote and control data sharing Analyze the impact of changes to application systems when data definitions change Maintain the data dictionary Reduce redundant data and processing Reduce system maintenance costs and improve system development productivity Improve quality and security of data Insure data integrity
Page 150
M ANAGERIAL I SSUES IN M ANAGING D ATA
Data Administration
• • • • • • Key functions of the database administrator (DBA): Tuning database management systems.
Selection and evaluation of and training on database technology.
Physical database design.
Design of methods to recover from damage to databases.
Physical placement of databases on specific computers and storage devices.
The interface of databases with telecommunications and other technologies.
Page 150-151
7 Habits of Highly Effective Data Modelers*
• Immerse – Immerse yourself in the task environment to find out what the client wants • Challenge – Challenge existing assumptions; dig out the exceptions and test the boundaries of the model • Generalize – Reduce the number of entities whenever possible; simpler is easier to understand • Test – Read it to yourself and to others to see if it makes sense and is relevant to the problem * adapted from R. Watson (1999)
7 Habits of Highly Effective Data Modelers
• Limit – Set reasonable limits to the time and scope of the data modeling activities. Identify the core entities and attributes that will solve the problem and stick to those • Integrate – Identify how your project’s model fits with the organization’s information architecture. Can it be integrated with the corporate data model? Look at the big picture.
• Complete – Don’t leave the data model ill-defined. Define entities, attributes, and relationships carefully.