Into. to Databases PPT(my lecture)

download report

Transcript Into. to Databases PPT(my lecture)

Database Processing
What Is the Purpose of a
• Purpose: to keep track of things
• If structure of a list is simple, i.e., one theme,
no need to use database technology (video)
General Rule
• Lists of data involving a single theme can be
stored in a spreadsheet.
• Lists that involve data with multiple themes
require a database.
• Database:
 A self-describing collection of integrated
 In databases, bytes are grouped into
columns, such as Student Number and
Student Name. Columns are also called
fields. Columns or fields, in turn, are
grouped into rows, which are also called
Characters, Fields, and Records
Hierarchy of Data Elements
What Are Relationships Among
First row of the Email Table is
related to Andrea Baker in
Student Table
Last row in Office_Visit Table
related to Adam Verberra in
Student Table
Relationship Special Terms
• Key – Primary Key
 A column or group of columns that identifies a unique row in
a table.
 Student Number is the primary key of the Student table.
Given a value of Student Number, you can determine one
and only one row in Student. Only one student has the
number 1325.
 Every table must have a primary key.
 Sometimes more than one column is needed to form a
unique identifier. In a table called City, for example, the
primary key would consist of combination of columns (City,
 Email_Num is the key of Email Table.
 VisitID is the key of Office_Visit Table.
Relationship Special Terms
• Foreign keys
 These are primary keys of a different
(foreign) table than the table in which they
• Relational databases
 Relationships among tables are created by
using foreign keys.
• Relation
 Formal name for a table
• Database:
 A database is a
collection of
• Metadata
 Data that
describe data
Components of a Database
Application System
• Applications make database data more accessible and useful.
• Users employ a database application that consists of forms,
formatted reports, queries, and application programs.
• Database management system (DBMS) processes database
tables for applications.
What Is a Database Management
System (DBMS)?
 A program (software) used to create, process, and
administer a database
• Companies license DBMS products from vendors:
 IBM, Microsoft, Oracle, and others
• Popular DBMS products are:
 DB2 from IBM
 Access and SQL Server from Microsoft
 Oracle from the Oracle Corporation
 MySQL—an open-source DBMS product that is
license-free for most applications
Creating the Database and Its
• Database developers use the DBMS to create and modify tables,
relationships, and other structures in the database.
• Below, the developer has added a new column called
Response?. This new column has data type Yes/No.
Processing the Database
• DBMS operations
 Read, insert, modify, delete data
 Applications call DBMS in different ways
• From a form, when the user enters new or
changed data, a computer program behind the
form calls the DBMS to make the necessary
database changes.
• From an application program, the program calls
the DBMS directly to make the change.
Structured Query Language (SQL)
• SQL—“see-quell”
• International standard language for creating databases
and database structures, and processing databases
• All five of the most popular DBMS products accept and
process SQL.
• Following SQL statement inserts a new row into the
Student table:
([Student Number], [Student Name], HW1, HW2, MidTerm)
(1000, ’Franklin, Benjamin’, 90, 95, 100);
Administering the Database
• DBMS provides tools to assist in administration of
the database.
• Used to set up a security system involving user
accounts, passwords, permissions, and limits for
processing the database
• Backing up database data, adding structures to
improve performance of database applications,
removing data no longer wanted or needed, and
similar tasks
• Most organizations dedicate one or more
employees to the role of database administration.
What Are Forms, Reports, and
Reports show data in a
structured context.
What Are Forms, Reports, and
Sample query
form used
to enter
phrase for
Sample query results of query operation
Why Are Database Application
Programs Needed?
• Forms, reports, and queries work well for standard
functions. However, most applications have unique
requirements that a simple form, report, or query cannot
• Application programs process logic that is specific to a
given business need.
• Application programs serve as an intermediary between
the Web server and database.
 Responds to events, such as when a user presses a
submit button; also reads, inserts, modifies, and
deletes database data
Four Database Application
Programs Running on a Web
Server Computer
Enterprise DBMS vs. Personal
• Enterprise DBMS
 Process large organizational and workgroup databases
 Support many, possibly thousands, of users and many different
database applications
 Support 24/7 operations and can manage databases that span dozens of
different magnetic disks with hundreds of gigabytes or more of data
 IBM’s DB2, Microsoft’s SQL Server, and Oracle’s Oracle are examples
of enterprise DBMS products.
Personal DBMS
 Designed for smaller, simpler database applications
 Used for personal or small workgroup applications that involve
fewer than 100 users (normally fewer than 15), single user
Access: A DBMS and an
Application Development Product
Before building a database, developers construct a
logical representation of database data called a data
model to describe the data and relationships to be
stored in database.
What Is the Entity-Relationship
Data Model?
• Entity-relationship (E-R) data model
 A tool for constructing data models
 Developers use it to describe the content of a data
model by defining entities that will be stored in
database and relationships among those entities
• Some thing that the users want to track
• Examples of entities:
 Order, Customer, Salesperson, and Item. Some entities represent a
physical object, such as Item or Salesperson; others represent a logical
construct or transaction, such as Order or Contract.
 Entity names are always singular.
• Attributes
 Describe characteristics of an entity.
 Examples: order attributes are OrderNumber, OrderDate, SubTotal,
Tax, Total, and so forth.
• Identifier
 An attribute (or group of attributes) whose value is associated with one
and only one entity instance.
Student Data Model Entities
Entities with Relationships
Sample Relationship (Version 1)
1:N = many-to-many
N:M = many-to-many
One department can have
many advisers, but an adviser
has at most one department.
One adviser can have many
students and one student can
have many advisers.
Sample Relationships (Version 2)
Advisers may advise in more than one department, but a
student may have only one adviser, representing a policy that
students may not have multiple majors.
Crow’s-Foot Diagram Version
Maximum cardinality—maximum number of entities that can be
involved in a relationship. Vertical bar on a line means that at least one
entity of that type is required.
Minimum cardinality—minimum number of entities that can be involved
in a relationship. Small oval means that the entity is optional; the
relationship need not have an entity of that type.
Database Design
• Database design is the process of converting a data
model into tables, relationships, and data
• Database design team transforms entities into tables
and expresses relationships by defining foreign keys.
• Two important database design concepts:
normalization and the representation of two kinds
of relationships.
• Normalization is a foundation of database design.
• Representation of relationships will help you
understand important design considerations.
Normalization is the process of converting a poorly structured table into
two or more well-structured tables. Problem with these tables, have two
independent themes: employees and departments.
Data Integrity Problems
• In previous figure, some rows show Dept. 100 is “Accounting and
Finance” and others show Dept. 100 is “Accounting.” Which one
is correct?
• A table with data integrity problems will produce incorrect results
and inconsistent information.
• Data integrity problems happen when data are duplicated.
• Users will lose confidence in the information, and system will
develop a poor reputation. Information systems with poor
reputations become serious burdens to the organizations that
use them.
Normalizing for Data Integrity
• Normalized tables eliminate data duplication,
but they can be slower to process.
• General goal of normalization is to construct
tables such that every table has a single topic
or theme.
Normalizing for Data Integrity
• The way to correct the problem is to split the table into two
tables, each with its own theme.
Summary of Normalization
• Database practitioners classify tables into various
normal forms according to the kinds of problems they
• Transforming a table into a normal form to remove
duplicated data and other problems is called
normalizing the table.
• Normalization is just one criterion for evaluating
database designs. Normalized designs can be slower
to process, database designers sometimes choose to
accept nonnormalized tables. The best design
depends on the users’ processing requirements.
Representing Relationships
a 1:N
Representing an N:M Relationship
Using Databases to Improve Business Performance and Decision Making
• Very large databases and systems require special
• Very
large databases
and systems require special
• To analyze large quantities of data
To analyze
access data
To access data from multiple systems
Data warehousing
Tools mining
for accessing internal databases through the Web
• Three key techniques
• Three
key techniques
• Data warehousing
• Tools for accessing internal databases through the Web
Using Databases to Improve Business Performance and Decision Making
• Data warehouse:
• Stores current and historical data from many core operational
transaction systems
• Consolidates and standardizes information for use across
enterprise, but data cannot be altered
• Data warehouse system will provide query, analysis, and reporting
• Data marts:
• Subset of data warehouse
• Summarized or highly focused portion of firm’s data for use by
specific population of users
• Typically focuses on single subject or line of business
Using Databases to Improve Business Performance and Decision Making
Components of a Data Warehouse
The data warehouse extracts current and historical data from multiple operational
systems inside the organization. These data are combined with data from external
sources and reorganized into a central database designed for management reporting
6-13 users with information about the
and analysis. The information directory
data available in the warehouse.
Using Databases to Improve Business Performance and Decision
• Business Intelligence:
• Tools for consolidating, analyzing, and providing access
to vast amounts of data to help users make better
business decisions
• E.g., Harrah’s Entertainment analyzes customers to
develop gambling profiles and identify most profitable
• Principle tools include:
• Software for database query and reporting
• Online analytical processing (OLAP)
• Data mining
Business Intelligence
Using Databases to Improve Business Performance and Decision Making
A series of
analytical tools
works with
data stored in
databases to
find patterns
and insights
for helping
managers and
make better
decisions to
Using Databases to Improve Business Performance and
Decision Making
• Online analytical processing (OLAP)
• Supports multidimensional data analysis
• Viewing data using multiple dimensions
• Each aspect of information (product, pricing, cost,
region, time period) is different dimension
• E.g., how many washers sold in East in June
compared with other regions?
• OLAP enables rapid, online answers to ad hoc queries
Using Databases to Improve Business Performance and Decision Making
The view that is
showing is
product versus
region. If you
rotate the cube
90 degrees, the
face that will
show is product
versus actual
and projected
sales. If you
rotate the cube
90 degrees
you will
see region
versus actual
and projected
sales. Other
views are
Multidimensional Data Model
Using Databases to Improve Business Performance and Decision Making
• Data mining:
• More discovery driven than OLAP
• Process by which great amounts of data are analyzed and investigated
• Finds hidden patterns, relationships in large databases and infers rules
to predict future behavior
• E.g., Finding patterns in customer data for one-to-one marketing
campaigns or to identify profitable customers.
• Key areas where businesses are leveraging data mining
• Customer segmentation
• Marketing and promotion targeting
• Market basket analysis
• Collaborative filtering
• Customer churn
• Fraud detection
• Financial modeling
Data Mining Methods
• Classification
– Define data classes
• Estimation
– Assign a value to data
• Affinity grouping or association rules
– Determine which data goes together
• Classification - Recognizes patterns that
describe group to which item belongs
• Clustering
– Organize data into subgroups
• Description and visualization
– Get a clear picture of what is happening
Using Databases to Improve Business Performance and
Decision Making
• Predictive analysis
• Uses data mining techniques, historical data, and
assumptions about future conditions to predict
outcomes of events
• E.g., Probability a customer will respond to an offer or
purchase a specific product
• Text mining
• Extracts key elements from large unstructured data sets
(e.g., stored e-mails)
Data Aggregators
• Laws that limit the data that federal and other
governmental agencies can acquire and store.
• Some legal safeguards on data maintained by credit
bureaus and medical facilities.
• No such laws that limit data storage by most
companies (nor are there laws that prohibit
governmental agencies from buying results from
companies like Acxiom.
How Will this Change by 2020?
• Absent any public outcry for legislation to limit such
activity, aggregator data storage will continue to grow
exponentially and companies will have even more
data about you, the state of your health, your wealth,
your purchase habits, your family, your travel, your
driving record, and, well, anything you do.
• Query, reporting, and data mining technology will
improve and Moore’s law will make computer
operations that are too slow to be practical today,
feasible tomorrow.
• The picture of you will become more and more
Why Do You Care?
• Data could be stolen and used for criminal activity
against you.
• Data might not be accurate
• More than 25% of critical data in Fortune 1000
company databases are inaccurate or incomplete
• Most data quality problems stem from faulty input.
• No organization is required by law to tell you the data
that it stores about you and what it does with it.
Protecting Your Data
• What can you do? Ask the following questions:
– For what purpose is the data being gathered?
– Are the reasons for gathering the data legitimate or
important to you?
– How will the information gathered be protected once it
has been obtained?
– Will the information collected be used for purposes
other than those for which it was originally collected?
– Could the information asked for be used for identity
– Are organizations that already have your data
safeguarding it?
What If…
• You enroll in a “healthy eaters” medical insurance
program, similar to “safe drivers” auto insurance.
Your premiums are lower because you eat well,
except that the insurance company notes from
last month’s data that you bought four large
packages of Cheetos, and your health insurance
premium is increased, automatically. You have
no idea why.
• Could this actually happen – or something like