Slide 1

Transcript Slide 1

SQL Unit 15
Normalization
prepared by Kirk Scott
1
•
•
•
•
•
•
•
•
1.
2.
3.
4.
5.
6.
7.
8.
Normal Forms
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Higher Normal Forms
Domains
Nulls and Integrity
2
1. Normal Forms
• The benefits of relational database theory can
be summarized as follows:
• There is a step-by-step way of arriving at a
correct design;
• there is a way of detecting flaws in a design;
• the design process has to do with the problem
domain, not with computer-related questions;
3
• and finally, if the design is correct, it will be
possible to:
• store all desired information in it;
• update the information on an ongoing basis;
• and retrieve the information when needed.
4
• Correct designs are based on what are called
normal forms.
• This section presents the background
information to the design process.
• It also discusses and illustrates the use of
normal forms.
5
• At its most basic level, design of a database
depends on determining what you want to
store information about.
• When deciding what the base tables will be,
you are trying to identify entities.
• From a language point of view, this involves
identifying nouns which do not modify other
things.
6
• Identifying the entities leads to identifying
their attributes.
• Attribute names usually end up being nouns
too, but you figure out what they are when
you try to describe entities, and the
descriptions usually involve adjectives.
• One of the key points of database design is
that you only store information about the
entities and attributes you need to.
7
• There may be many possible entities,
• and all entities may have a long list of
potential attributes,
• but you limit yourself to only those things you
will need to retrieve information about in the
future.
8
• You are familiar with primary keys and foreign
keys.
• When trying to organize the attributes around
entities in the design, the idea is to equate an
entity with a primary key field
• and then group the attributes with the entities
that they describe.
9
• Relationships between tables are captured by
embedding the primary keys of one or more
tables as foreign keys in other tables.
• When described in general, this sounds
sensible enough.
• In practice it can be difficult to do without
formal guidelines.
• This is what the normal forms provide.
10
• The normal forms are based on and described in
terms of an idea taken from math.
• One field in a table may functionally determine
another.
• Stated in reverse order: The other field depends
functionally on the one.
• In math, an example of a function such as this
might be:
•
• y = f(x), for example y = x2
11
• In this case, x is in the domain and y is in the
range.
• y is a function of x.
• In other words, x functionally determines y, or
y functionally depends on x.
12
• For a mathematical function, you find the
dependent value by doing some sort of
computation on the determining value.
• The key point underlying a function is the
following:
• For each value of x, there can only be one
corresponding value of y.
13
• The analogy in database design is the
following:
• The primary key of a table should functionally
determine the values of the other fields in the
table.
• In other words, the non-key fields should
functionally depend on the primary key field.
• Take the small table on the following overhead
for example:
14
Person(SSN, name, dob)
SSN
123-45-6789
…
Name
Bob
dob
1/1/01
15
• You don’t find a person’s name or birthdate by
doing a computation on their social security
number.
• However, assuming that the data are recorded in
the table, given any one social security number,
there is exactly one corresponding name and
exactly one corresponding date of birth.
• It is true that different people with different
social security numbers may have the same name
and the same date of birth, but this is not a
problem.
16
• The point of the primary key field is that it is
the unique identifier that makes it possible to
distinguish between these two people.
• The bottom line is that the name and date of
birth fields functionally depend on the social
security number field.
17
• A new notation can be used to indicate this.
• In this notation, the arrows go from the field
that functionally determines another field, to
the field that is dependent.
• This is illustrated on the next overhead.
18
19
• Some of the normal forms are identified by
number, for example 1st, 2nd, and 3rd normal
forms.
• Others are identified by name, for example
Boyce-Codd normal form, named after the
people who discovered it.
• These four normal forms are abbreviated 1NF,
2NF, 3NF, and BCNF, respectively.
• There are also higher normal forms, 4th, 5th, and
domain key normal forms (4NF, 5NF, DKNF).
20
• The normal forms have to do with finding
dependencies in tables which spring from fields
other than the primary key.
• These dependencies are undesirable and are
referred to as stray dependencies.
• The normal forms make increasingly strict
statements about the kinds of stray dependencies
that have to be eliminated from correctly
designed tables.
• Designs containing stray dependencies are said to
violate the normal forms.
21
• It is not hard to come up with examples of how these
conditions might be violated, and then use the normal
form definition to come up with an improved design
which doesn’t contain the violation.
• The rule of thumb at every stage is to remove stray
dependencies in the following way:
• Make any field which determines other fields the
primary key of a new table, and move the fields that
depend on that field to the new table.
• Make sure that the new table is connected to the old
table by a primary key, foreign key pair.
22
• Design problems that are based on violations of
normal forms lead to what are called anomalies.
• The hallmark of a problematic design is that the
same information is stored multiple times.
• In other words, there is redundancy in the
database.
• Depending on the nature of the redundancy, this
can lead to problems when inserting data, when
updating data, and when deleting data.
23
• The normal forms are discussed below.
• They are defined informally.
• Then designs with violations are given, the
anomalies that result will be pointed out, and the
requirements of the normal form will be used to
straighten out the design.
• The use of normal forms may seem unnecessarily
theoretical at first.
• However, they provide a convenient way of
identifying problems in designs and then
eliminating them.
24
2. First Normal Form
• The various normal forms will be presented in the
following way:
• A definition of the normal form will be given.
• Then a scenario for information to be held in a
database will be given, with the underlying
assumptions given.
• Then an example database design which violates
the normal form will be given and it will be
shown using a diagram with the notation
indicating functional dependencies.
25
• The desired functional dependencies from the
primary key will be shown using arrows below
the field names.
• Undesired, stray dependencies, which need to be
eliminated in order to correct the design, will be
shown using arrows above the field names.
• Anomalies resulting from the incorrect design will
be discussed.
• Then a corrected design will be given.
26
• All of the examples will be based on the general
topic of cars, salespeople, customers, and car
sales.
• Some of the field names are abbreviated, and
some of the fields clearly belong together in
some way.
• Here is a little preliminary explanation regarding
the fields that will be in the examples.
• Not all of the fields will appear in all of the
examples.
27
• vin: vehicle identification number. Vehicles
have makes, models, and years.
• spno, spname: Salesperson number and
name.
• custno, custname: Customer number and
name.
• A car sale has a salesprice and a date.
28
• 1NF can be defined as follows:
• Data is stored in flat files; there can be no
repeating groups in a record.
• This has been explained in detail in a previous
section.
• The example design uses {} to contain
repeating groups.
29
• The assumptions underlying the design are
that a salesperson can sell many cars, but each
car can only be sold by one salesperson.
• In this design, each car is only sold once, so
the design captures information about the
sales of new cars.
30
• Here is the design that violates 1NF:
• Carsale(spno, spname, {vin, salesprice})
• A diagram with arrows illustrating this is given
on the next overhead
31
32
• In general, normal form violations have insert,
update, and delete anomalies.
• It would be possible to analyze such problems
with a 1NF violation but it’s not necessary.
• The repeating group alone is a sufficient
problem to make this kind of design incorrect.
33
• As with all normal form violations, the
solution is to break out the separate
dependency, in this case the information
contained in the repeating group, into a
separate table.
• As stated in the assumptions, one salesperson
can sell many cars, but each car is sold only
once, so there is a 1-m relationship between
the two tables in the resulting design.
34
• The primary key of the table containing
salesperson information will have to be
embedded as a foreign key in the table
containing car information.
• Here is the corrected design:
• Salesperson(spno, spname)
• Carsale(vin, salesprice, spno f.k.)
35
3. Second Normal Form
• 2NF can be defined as follows:
• In a table with a concatenated primary key
field, there can be no stray dependencies that
originate in just part of the primary key field.
• In this example the underlying assumptions
are that the same car can come back to the lot
and be sold more than once.
36
• It can be sold by the same salesperson more
than once, but not on the same day.
• It can also be sold by different salespeople at
different times.
• Although unlikely, the design is made so that
two different salespeople could sell the same
car on the same date.
37
• The design doesn’t contain any information about
customers, but the scenario would be that one
customer brought the car back, and a different
salesperson sold it again.
• It seems unlikely that the same customer would buy
the same car twice, whether on the same date or
different dates.
• In summary, this design works for used car sales and
both the date and the salesperson information are
needed, along with the car information, to distinguish
between different sales.
38
• Here is the design that violates 2NF:
•
• Carsale(vin, spno, date, spname)
• A diagram with arrows illustrating this is given
on the next overhead
39
40
• This faulty design has insert, update, and delete
anomalies.
• Suppose a salesperson has not yet sold a car.
• In this case, it is not possible to insert information
about that salesperson.
• On the other hand, a salesperson may make many
sales.
• This means that the same information about that
salesperson would be stored in more than one record
in the table.
• This is redundancy.
41
• Not only is the redundancy itself wasteful, it
leads to the update anomaly.
• Suppose the salesperson’s name changes.
• Then it’s necessary to update multiple records
to reflect this fact, not just one.
42
• The delete anomaly is related to the insert
anomaly.
• Suppose that as part of the maintenance of
the database, on a yearly basis the sales table
is cleared.
• When you delete the last record containing a
sale by a particular salesperson, you not only
get rid of the sales record, you also lose the
salesperson’s name.
43
• As usual, the solution to the problem is to break
the stray dependency out into a table of its own.
• Each car sale has only one salesperson, but each
salesperson can be involved in many sales, so this
is a 1-m many relationship.
• The salesperson information is stored in a table
by itself, and the primary key of the salesperson
table is embedded as a foreign key in the car sale
table.
44
• Here is the corrected design:
• Salesperson(spno, spname)
• Carsale(vin, date, spno f.k.)
45
4. Third Normal Form
• 3NF can be defined as follows:
• There can be no stray dependencies from one non-key
field to another.
• In this example, for the sake of simplicity, it is assumed
that new cars are being sold and they can only be sold
once.
• Information about the customer is also recorded with
the sale.
• Each car can only be bought by one customer.
• It would be possible for a customer to buy more than
one car.
46
• Here is the design that violates 3NF:
• Carsale(vin, custno, custname, salesprice,
date)
• A diagram with arrows illustrating this is given
on the next overhead
47
48
• This design also has insert, update, and delete
anomalies and the pattern of the anomalies is
the same as in the previous example.
• They all stem from the presence of the stray
dependency in the design.
• If you have a potential customer who has not
yet bought a car, it is impossible to insert
information about that person.
49
• If a customer has bought more than one car, the
customer information is stored redundantly.
• In that case, if the customer’s name changes, it’s
necessary to change multiple records.
• Finally, if the sales table is cleared on a regular
basis, when you delete the last sales record for a
given customer, you not only get rid of the sales
record, you also lose the customer’s name.
50
• There is a situation that can arise in database
designs that appears to be a violation of 3NF,
but isn’t.
• The most common example of this situation is
a table which includes a city, state, and zip
code as part of an address.
• The postal service has divided up the country
into zones which are identified by zip codes.
51
• None of these zones cross city or state
boundaries.
• That means that a zip code determines the
city and state.
• It is not necessary for someone to break this
dependency out of their database design.
• The rule of thumb is that if you are not
responsible for maintaining the dependency,
then you can ignore it.
52
• The postal service has a table somewhere with
zip code as the primary key and all of the
descriptive fields about zip code that exist.
• The post office maintains this.
• A table not maintained by the postal service
can contain addresses with zip codes and
completely ignore the fact that there may in
reality be a dependency.
53
5. Boyce-Codd Normal Form
• A formal statement of BCNF would be
somewhat theoretical.
• Once understood, such a definition would
make it clear that BCNF is a summation of 1NF
through 3NF which covers one other case
which is not covered by the previous normal
forms.
• It is easier to explain BCNF by just presenting
this special case and explaining it.
54
• BCNF says that there can be no stray
dependencies from a non-key field to a field in
the key.
• For the purposes of this example suppose that
the same car can be sold by the same salesperson
more than once, but only one sale of that car is
possible per date.
• Suppose also that this dealership has a system for
assigning prospective customers to specific
salespeople, so that each salesperson is
associated with an exclusive list of clients.
55
• It would be normal to assume that this system
is implemented in some sort of table.
• Such a table is not shown here—it will
become part of the solution to the problem.
• The point now is to show the problem this
assumption leads to in the table of interest.
56
• Here is the design which violates BCNF:
• Carsale(vin, spno, date, custno)
• A diagram with arrows illustrating this is given
on the next overhead
57
58
• The anomalies in this design are analogous to
the anomalies in the previous designs.
• It is not possible to insert information about
the relationship between a given customer
and salesperson without a sales record which
matches them.
• If the given customer has bought from the
same salesperson many times, their
relationship is in multiple records.
59
• An update would require changes in multiple
records.
• Finally, if you’re down to the last record
containing information about a particular pair,
deleting the record would cause the
information to be lost.
• As usual, the solution to the problem is to
break out the stray dependency in a separate
table.
60
• Here is the corrected design:
• Carsale(vin, date, custno f.k.)
• Customer-Salesperson(custno, spno)
61
• The point is that if customers are uniquely
associated with a single salesperson,
• if the car sale record tells you who bought the
car,
• you can then look up the salesperson in the
Customer-Salesperson table.
62
• There is another aspect of BCNF that needs to
be explained.
• Consider a design which includes both a
university-generated student id number and a
social security number.
• It would seem to violate BCNF as explained
above:
63
• Here is the design which seems to violate
BCNF:
• Student(studentIDno, SSN, name)
• A diagram with arrows illustrating this is given
on the next overhead
64
65
• The additional part of BCNF is that if the stray
dependency results from another field which also
could have been chosen as a primary key for the
table, then it is not a normal form violation.
• In other words, both studentIDno and SSN are
valid, unique identifiers of students.
• You might want to record both.
• It is simply necessary to choose one of them as
the primary key of the field.
• The presence of the other one in the table does
no harm.
66
• Up through BCNF the normal forms can be
explained in terms of stray dependencies.
• An easy way to remember the requirements
for these normal forms is the following
statement:
• Every field in a table has to depend on the key,
the whole key, and nothing but the key.
67
• Because they are increasingly strict, the
normal forms can be thought of as nested.
• When checking a design, you begin with the
lowest normal form, make sure there are no
violations, and move on to the following ones.
• This is what makes the design process step-bystep.
68
• This idea can be represented using a Venn
diagram.
• The idea is that the set of designs which is in
some normal form is always a subset of those
designs which meet the conditions for a lower
normal form.
• A diagram of this is shown on the following
overhead
69
1NF
2NF
3NF
BCN
F…
70
6. Higher Normal Forms
• It was claimed earlier that there are only three
kinds of relationships: 1-1, 1-m, and m-n.
• This is not entirely true.
• There may be many-to-many-to-many
relationships (relationships between 3 different
types of entities at the same time, m-m-m),
• and in theory there is no reason why there can’t
be relationships among 4 or more different types
of entities at the same time.
• Fourth and fifth normal form, 4NF and 5NF, have
to do with cases like these.
71
• The presentation of 4NF will be done in the
opposite order to the presentation of the
earlier normal forms.
• First an example of a valid design will be given,
• and then a statement will be made about the
nature of a design that violates 4NF.
72
• Suppose that a given car can be sold more
than one time.
• In other words, you’re dealing in used cars.
• Suppose also that salespeople can sell more
than one different car, and customers can buy
more than one different car.
• This means that there are three 1-m
relationships.
73
• For the three base tables, Car, Salesperson,
and Customer, there could be one table in the
middle, Carsale, which brought all three
together.
• The idea can be represented using ER
modeling.
• This results in the star shaped design shown
on the next overhead:
74
Car
Salesperson
Carsale
Custome
r
75
• The relationships are captured by embedding
primary keys as foreign keys, and a valid
design can be given as follows:
• Car(vin, make, model, year)
• Salesperson(spno, spname)
• Customer(custno, custname)
• Carsale(vin, spno, custno, date)
76
• If someone tried to create a design which had
information on all three types of entities, cars,
salespeople, and customers, in the same table,
this would be a 4NF violation.
• No example of this is given.
• After working up through BCNF it should be clear
that when analyzing such a table you would find
more than one stray dependency.
• By removing each of the stray dependencies in
succession, you would solve the problem.
77
• 4NF violations like this are not common.
• Anyone familiar with database design principles
would not try to put three types of entities
together in a single table in the first place.
• On the other hand, people who are unfamiliar
with the rules sometimes think that they should
try and cram as much information into a single
table as possible.
• If that happens, then a violation such as this is
possible.
78
• You may have realized that there is another
way to relate all three of the base tables
together.
• What if each pair were related in an m-m
relationship?
• The idea can be represented using ER
modeling.
• This results in the design with a cycle in it
shown on the next overhead:
79
Salesperson
-Car
Car
CarCustomer
Salesperson
CustomerSalesperson
Customer
80
• The design could also be represented in this
way:
• Car(vin, make, model, year)
• Car-Customer(vin f.k., custno f.k.)
• Customer(custno, custname)
• Customer-Salesperson(custno f.k., spno f.k.)
• Salesperson(spno, spname)
• Salesperson-Car(spno f.k., vin f.k., date)
81
• This design does not violate 4NF like the
previous scenario of cramming all of the
information into a single table.
• The question is, does this design correctly
capture all of the assumptions stated above?
• In general, designs with cycles in them are
difficult to understand, and in the context of
4NF, the design with the cycle is not desirable,
while the design with the star is desirable.
82
• If you traced all of the links in the design with the cycle,
you would find that every car is connected to every
salesperson is connected to every customer.
• Put in business terms, if this design is supposed to
represent car sales, at one time or another every
salesperson has sold every car and every customer has
bought every car.
• This does not agree with the assumptions underlying
the star design, where the one table in the middle
captures information for that subset of possible sales
that actually occurred.
83
• The cyclical design leads to a brief
consideration of 5NF.
• The question now becomes, if every possible
pair of relationships actually does exist, which
design is better, the one with the star or the
one with the cycle?
• In this case, the design with the cycle is better.
84
• At this point the car sale example breaks
down.
• It is unrealistic to think that every car would
be sold by every salesperson and bought by
every customer.
• However, there are occasionally situations
where every entity in every base table is
related to every other entity in every other
base table.
85
• A general description of what 5NF says is the
following:
• A design is correct if two conditions are met:
• All real relationships between entities are
captured by the design;
• no false relationships between entities are
captured by the design.
86
7. Domains
• The highest normal form, domain-key normal
form (DKNF), is a theoretical statement of how
relationships are formed between tables.
• This form is not numbered, because the
theoretical statement encompasses all of the
other normal forms.
• This theoretical statement does not give you a
step-by-step procedure for determining whether
or not a design has violations and fixing them.
87
• DKNF is based on the idea of domains, which
have not been explained yet.
• Although this normal form is of little practical
use, the idea of domains is important for
correctly capturing the relationships between
tables, and will be explained.
88
• There is a preliminary point to be made before talking
about domains.
• When a table is created, a complete definition has to
tell the data type of each field.
• Some fields may hold numeric values, some may hold
strings of characters, some may hold dates, etc.
• If a field holds strings of characters, its length, or
maximum length also has to be stated.
• So, for example, a person’s last name may be defined
as containing a maximum of 24 characters.
89
• A social security number has 9 digits.
• Although the social security number is called a
number, it is never used numerically.
• There is no need to add, subtract, multiply, or
divide it, and a good design will prevent that.
• The simple way to do so is to define this field
as a character field containing 9 characters
where valid characters in this field are limited
to digits.
90
• A domain is a semantic concept.
• Most of the time, the name of a field is
descriptive of the kind of information it can hold.
• So for a “last name” field in a table containing
information about people, it is informally clear
what this means.
• Formally, the term domain refers to the whole set
of values that could appear as valid data in that
field.
91
• In general, a name would consist of a sequence of
letters of the alphabet.
• Names could come from any language or culture,
translated into the English alphabet.
• Some names do contain numeric information,
usually indicated with Roman numerals, for
example, John Smith I, John Smith II, etc.
• It would not be possible to come up with a
formula that mathematically defined all possible
values.
• Still, the general idea is clear.
92
• The idea of domains can be further clarified by
giving examples of cases where fields are not on
the same domain.
• A person’s last name field may be defined as 24
characters.
• A city field could also be defined the same way.
• There may be cases where a person’s name is the
same as the name of a city.
• There is a city of Lincoln in England.
93
• Abraham Lincoln’s ancestors probably came
from that area.
• There is also a city of Lincoln in Nebraska,
which was named after Abraham Lincoln.
• Even though there may be an intersection of
the values in the city and last name fields,
conceptually, city name and person last name
are two distinct domains.
94
• Another example of two fields that are not on the
same domain would be social security number and zip
code.
• A full zip code consists of 5 plus 4, or 9 digits, like a
social security number.
• Both might be defined as character fields containing 9
characters.
• However, social security numbers and zip codes have
nothing in common.
• There are doubtless cases where someone’s social
security number matches some zip code somewhere in
the country, but this is purely coincidental.
95
• Up to this point, the relationship between tables has
been explained by the process of embedding the
primary key of one table as a foreign key in another.
• This would mean that when defining the second table,
it would have a field with a suitable name, on the same
domain, that is defined to hold the same type of data
as the first field.
• This is good as far as it goes, but there can be other
relationships between tables which are the result of
domains, but not the direct result of embedding keys.
96
• Going back to one of the earlier examples, a
database may distinguish between mothers
and children as different kinds of entities, and
store them in different tables.
• Each of these tables may have social security
number fields and last name fields.
• You would not expect a mother and child to
have the same social security number.
• This would be a mistake.
97
• However, in most cases you would expect
mothers and children to have the same last
names.
• The idea is that a social security number is a
social security number, regardless of what
table it appears in.
• The idea of a social security number defines a
domain.
98
• Similarly, if the last name fields in both the
mother and child tables were defined as
containing 24 characters, a last name is a last
name, regardless of what table it appears in.
• The idea of a last name defines a domain.
• The last name field in both tables has the
same meaning even though they are not a
primary key, foreign key pair.
99
• As you can see, a domain is a cross-table
concept.
• Any given database may contain many
different fields in its tables, but the database
will contain fewer domains because various
fields are on the same domain.
• The idea of a domain is more fundamental
than the idea of a field.
100
• A field is just a manifestation of a domain.
• In very general terms, DKNF says that a
database is correctly designed if the
dependencies among the tables are the result
of correct choices of domains for all fields, in
particular the domains of the primary and
foreign keys of the tables.
101
8. Nulls and Integrity
• This section is a review of material that was
explained earlier.
• It will not be gone over in class.
• However, it is provided below in its entirety in
case you want to read the overheads yourself.
102
8. Nulls and Integrity
• The term “null” refers to the idea that a
particular field in a particular record may not
have data in it.
• In general, this is permissible.
• Cases often arise in practice where the
information doesn’t exist or isn’t known.
103
• It would be impractical to insist that all fields
always contain data.
• If that restriction were imposed, people would
get around it by putting in bogus values for
information that didn’t exist or wasn’t known.
• However, filling a database with bogus values
is not a very good idea.
104
• When a database management system supports null
values in fields, it’s important to understand what this
does not mean.
• It does not mean that the fields contain the sequence
of characters “null”.
• It also does not mean that the field contains invisible
blanks.
• Blank spaces themselves are a form of character.
• What it means is that there is absolutely nothing in the
field, and the database management system is able to
recognize fields that are in that state.
105
• The term integrity in database management systems refers
to the validity and consistency of data entered into a
database.
• The phrase “entity integrity” is the formal expression of a
requirement that was stated informally earlier.
• Entity integrity puts the following requirement on a
correctly implemented database:
• Every table has a primary key field, and no part of the
primary key field can be null for any record in the table.
• Clearly, if all or part of a key were allowed to be null, that
would defeat the purpose that the primary key field be the
unique identifier for every record in the table.
106
• As seen in the long discussion of normal forms, it is the
primary key to foreign key relationships that support the
interconnection between related entities that have been
separated into different tables by the design process.
• Once this has been done, it is critically important that the
data maintaining the relationships be valid and consistent.
• The phrase “referential integrity” has the following
meaning:
• Every value that appears in a foreign key field also has to
appear as a value in the corresponding primary key field.
• This can also be stated negatively:
• There can be no foreign key value that does not have a
corresponding primary key value.
107
• The meaning and importance of referential
integrity can be most easily explained with a
small example showing a violation of it.
• Consider the tables shown on the following
overhead:
108
mid
1
2
Mother
Name
Lily
Matilda
kid
a
b
c
Child
name
Ned
Ann
June
mid
3
2
109
• Child b, Ann, is shown as having a mother with
mid equal to 3.
• There is no such mother in the Mother table.
This is literally nonsense.
• There is no sense in which this can be correct
and this is what referential integrity forbids.
110
• This example also illustrates two other things,
which are related to “non-existent” values.
• Observe that mother 1, Lily does not have any
matching records in the Child table.
• This does not violate referential integrity.
• It suggests that the Mother table is misnamed,
and should be named the Woman table, but it is
reasonable to think that you might be recording
information about women and children and some
women will not have children.
111
• The other thing visible in the table is that child
c, June, does not have a mother listed.
• In other words, the foreign key field is null.
• This also does not violate referential integrity.
• As with null in any situation, it may mean that
the mother is not known.
112
• Nobody literally doesn’t have a mother, but if
the woman table only records information on
living women, for example, then for an
orphan, the mother “wouldn’t exist”.
• It is unlikely that you would rename the table
“Children and Orphans”—but the idea is that
the null value is allowed and this in some
sense affects the meaning of what kinds of
entities are entered into the table.
113
• Referential integrity leads to one last
consideration.
• The idea behind normalization was to get the
stray dependency out of one table and break it
into two.
• The problem with stray dependencies was
redundancy and anomalies.
• By breaking a design into two tables with a
primary to foreign key pair, you introduce
interrelationship constraints.
114
• Put simply, the question is this: What do you
do with foreign key values if the
corresponding primary key values in another
table are deleted or updated?
• A fully-featured database management system
will enforce referential integrity constraints.
• The default settings for these constraints are
summarized in these two phrases:
• On delete, restrict; on update, cascade.
115
• If these defaults are implemented, this is a
fuller explanation of what they mean in terms
of the concrete mother and child example:
• On delete, restrict:
• No mother record can be deleted if she has
corresponding child records in the other table.
• To allow the deletion would lead to a
referential integrity violation.
116
• On update, cascade:
• If the primary key value of a mother record is
updated, if she has corresponding child
records in the other table, the foreign key
values in those records is automatically
updated to reflect the change.
• This problem arises less frequently because
once a primary key value is assigned to an
entity, it is rarely changed.
117
The End
118

Slide 1

Transcript Slide 1

Directory