Transcript Lecture 11

Building knowledge bases
The fundamental problem of understanding intelligence is not the identification
of a few powerful techniques, but rather the question of how to represent large
amounts of knowledge in a fashion that permits their effective use and
interaction. To do this, we must first answer the question of how this knowledge
is acquired. Think of knowledge as codified experience. This way we can view
knowledge as a transportable substance, manipulated by a process called
knowledge acquisition (KA).
KA consists of two interrelated activities:
1. Eliciting knowledge from some knowledge source (domain expert, databases,
textbooks, etc.). This activity is called knowledge engineering.
2. Representing the elicited knowledge in some formal language, testing its
validity, and subsequently refining it. This activity is called
ontological engineering.
An overview of the KA process
[Figure: stages of the KA process]
1. Identification of participants, problem characteristics and goals.
2. Conceptualization: find key concepts and relations already mentioned during
the identification stage.
3. Formalization: mapping key concepts into a more formal representation.
4. Implementation: formulate rules to embody knowledge.
5. Testing: validate rules representing knowledge.
Feedback loops in the figure are labelled Reformulation (twice), Redesign and
Refinement.
The knowledge engineer (KE) is the main player in the whole process. To do her
job, she must (1) learn enough about the problem domain to be able to recognize
important objects and relations, (2) know the KR language well enough to encode
knowledge correctly, and (3) know enough about the inference procedure to keep
track of the efficiency of the knowledge processing.
Modes of knowledge acquisition

“Expert -- Knowledge Engineer” mode
Domain expert -> Knowledge engineer -> Knowledge base (used by the inference engine)

“Expert -- Intelligent Editing Program” mode
Domain expert -> Intelligent editing program -> Knowledge base (used by the inference engine)
Modes of knowledge acquisition (cont.)

“Data -- Induction Program” mode
Databases -> Induction program -> Knowledge base (used by the inference engine)

“Textbooks -- Text Understanding Program” mode
Textbooks -> Text understanding program -> Knowledge base (used by the inference engine)
Main problems in accessing expert knowledge
– Expertise may not be expressible in language (D. Michie’s “cheese diagnosis”
example, 1982).
– Expertise may not be understandable (by the KE), even when it can be expressed
in language.
– Even if the expertise can be verbally expressed and understood by the KE, it
might be impossible to convert a verbal comprehension of a skill into a skilled
performance.
– Expertise communicated by the expert may be irrelevant, incomplete or even
incorrect.
KA is the bottleneck in KBS design. It may take years to build even a
moderately large knowledge base.
Types of knowledge: shallow knowledge
Shallow knowledge is represented in terms of heuristic rules, which map data
abstractions (such as symptoms in diagnostic domains) to solution abstractions
(such as diagnoses). In many domains, propositional logic (PL) is the language
of choice for representing shallow knowledge.
Example. This is one of MYCIN’s top-level goal rules:
If there is an organism which requires therapy, and
consideration has been given to any other organisms requiring therapy
Then compile a list of possible therapies, and determine the best one on the list
Shallow knowledge does not reflect causal mechanisms
underlying the relationship between symptoms and diagnoses;
MYCIN-like rules typically reflect empirical associations derived
from experience.
Acquiring and implementing shallow knowledge
Consider the following example (adapted from Gonzalez and Dankel “The
Engineering of KBS”). We want to build an expert system to advise a motorist
who is not mechanically inclined about his car’s cooling system malfunction.
Assume that the KE also serves as the domain expert.
The KE’s first task is to compile a list of all possible problems with the
cooling system. The possible outputs expected in this domain are:
– radiator leaks
– broken fan belt
– defective water pump
– broken water hose
– frozen coolant
Example (cont.)
Next, the KE must identify possible inputs to discover these problems, namely:
– temperature indicator on the dashboard
– weather conditions
– spots of coolant underneath engine compartment
– steam coming out of the hood (the presence of a hissing sound)
Finally, the KE must determine the relationships between the inputs and the
outputs, which may require some intermediate states. Here, these relationships
are translated into the following heuristic rules:
Rule 1: The presence of a “hot” reading on the dashboard implies that at least one
problem exists.
Rule 2: The absence of a “hot” reading on the dashboard does not necessarily imply
absence of a problem.
Rule 3: A large pool of coolant under the engine compartment can indicate radiator
leaks, broken hoses, and/or a defective water pump.
Rule 4: A relatively small pool of coolant under the engine compartment usually implies
a defective water pump.
Example (cont.)
Rule 5: Absence of a pool of coolant under the engine compartment, and a “hot”
reading on the dashboard implies a broken fan belt.
Rule 6: An ambient temperature below 10 degrees Fahrenheit implies that the coolant
is frozen.
Rule 7: The presence of a hissing sound accompanied by a small pool of coolant
under the engine compartment indicates a radiator and/or hose leak.
In PL, to represent this domain we need the following vocabulary:
A: “hot” reading on the dashboard
B: at least one problem with the cooling system exists
C: there is a large pool of coolant under the engine compartment
D: radiator leaks
E: broken water hose
F: defective water pump
H: there is a relatively small pool of coolant under the engine compartment
J: broken fan belt
I: the ambient temperature is below 10 degrees Fahrenheit
G: frozen coolant
K: a hissing sound is present
Example (cont.)
Rules 1 to 7 can now be represented as follows:
Rule 1: A => B
Rule 2: not B => not A
Rule 3: C => D v E v F
Rule 4: H => F
Rule 5: not (C v H) & A => J
Rule 6: I => G
Rule 7: K & H => D v E
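A note on Rule 2: its English wording says that nothing can be concluded from the absence of a “hot” reading, whereas the formula written for it appears to be just the contrapositive of Rule 1. The following minimal truth-table check (the helper name implies is an assumption made for this illustration) compares the two formulas on every assignment of A and B:

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p => q."""
    return (not p) or q

# Compare Rule 1 (A => B) with the formula given for Rule 2 (not B => not A)
# on every truth assignment of A and B.
same_everywhere = all(
    implies(a, b) == implies(not b, not a)
    for a, b in product([False, True], repeat=2)
)
print(same_everywhere)  # True: the two formulas are logically equivalent
```

If the two are equivalent, the formula for Rule 2 adds no new constraint to the KB, which is consistent with the English wording of the rule.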
Further refinements of this set of rules may be required to improve the
performance adequacy of the KBS.
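As an illustration of how such a rule set might be put to work, here is a minimal sketch that reports which rules fire for a given set of observations. It is not a full PL inference procedure: the names RULES and fired_rules are hypothetical, a disjunctive conclusion is simply returned as a set of candidates, and Rule 5 is coded from its English wording (no pool of coolant at all, neither large nor small).

```python
# Propositional symbols follow the vocabulary above (A, C, H, I, K are inputs).
# Each rule maps a condition on the observed symbols to the set of
# conclusions it makes possible (a disjunction in the PL encoding).
RULES = [
    ("Rule 1", lambda o: "A" in o, {"B"}),
    ("Rule 3", lambda o: "C" in o, {"D", "E", "F"}),
    ("Rule 4", lambda o: "H" in o, {"F"}),
    # Rule 5 follows the English wording: no pool at all (neither C nor H).
    ("Rule 5", lambda o: "A" in o and "C" not in o and "H" not in o, {"J"}),
    ("Rule 6", lambda o: "I" in o, {"G"}),
    ("Rule 7", lambda o: "K" in o and "H" in o, {"D", "E"}),
]

def fired_rules(observed):
    """Return (rule name, sorted candidate conclusions) for each rule whose premise holds."""
    return [(name, sorted(concl)) for name, cond, concl in RULES if cond(observed)]

print(fired_rules({"A"}))
# [('Rule 1', ['B']), ('Rule 5', ['J'])]
print(fired_rules({"K", "H"}))
# [('Rule 4', ['F']), ('Rule 7', ['D', 'E'])]
```

The second call shows the kind of conflict that further refinement would have to resolve: a small pool with a hissing sound triggers both Rule 4 (water pump) and Rule 7 (radiator or hose).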
Deep knowledge
Deep knowledge reflects causal mechanisms underlying the relationships
between the objects in the domain. To represent such knowledge, we need at
least first-order logic (FOL).
Example: The electronic circuits domain (AIMA, pp. 262-266).
Build a KB which can answer queries about digital circuits, such as:
– What combinations of inputs would cause the first output to be off, and
the second output to be on?
– What are the possible sets of values of all the terminals for the circuit?
Note that we only want to analyze circuits, not to diagnose faults. This is
why we can limit our ontology to include only gates, and ignore wires.
Designing the electronic circuits KB: Problem identification stage
1. What problems is the KBS intended to solve?
Answer: Verification of circuits to see if they match their specifications.
2. What data will be used?
Answer: Descriptions of specific instances of circuits.
3. What are important terms and relations?
Answer: Circuits, gates, terminals, signals, gate types (and, or, xor, not).
4. What does a solution look like?
Answer: Combinations of signals on designated terminals, including a complete
input / output table of signals for the circuit.
5. What is the nature of knowledge underlying the solution?
Answer: General knowledge about the flow of signals, connectivity of circuit
components, and the behavior of gates.
Designing the electronic circuits KB: Conceptualization stage
1. What types of data are to be considered?
Answer: Objects (such as circuits, terminals, gates, signal values), functions
(such as types of gates), instances (such as gate1, gate2, gate1input1),
predicates (such as Connected, which takes two terminals as arguments).
2. What are the general dependencies in the circuit domain?
Answer: An example dependency is the following. If two terminals are connected,
then they have the same signal.
Designing the electronic circuits KB: Formalization stage
Mapping the identified domain entities into FOL constants, functions and
predicates.
Examples:
– Gates are named with constants x1, x2, ...
– Terminals are represented by means of the In and Out functions,
for example Out(1, x1), In(1, x1), In(2, x1), ...
– Types of gates are represented by the function Type, for example
Type(x1), Type(x2)
– Signal values are represented by the objects On and Off, and the
function Signal, which takes a terminal as argument and denotes a
signal value.
Designing the electronic circuits KB: Implementation stage
1. Encoding dependencies into rules. Example rules are the following:
"If two terminals are connected, then they have the same signal."
∀ t1, t2 Connected(t1, t2) => (Signal(t1) = Signal(t2))
"An AND gate's output is Off if and only if (iff) any of its inputs is Off."
∀ g [ (Type(g) = AND) => ( (Signal(Out(1, g)) = Off) <=> ∃ n (Signal(In(n, g)) = Off) ) ]
"A NOT gate's output is different from its input."
∀ g [ (Type(g) = NOT) => (Signal(Out(1, g)) != Signal(In(1, g))) ]
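To make the gate rules concrete, the sketch below restates the AND and NOT rules as small Python functions and checks the "if and only if" of the AND rule on every pair of input signals. The function names and the use of the strings "On" and "Off" for signal values are assumptions made for this illustration; they are not part of the FOL encoding.

```python
from itertools import product

ON, OFF = "On", "Off"

def and_output(inputs):
    """AND rule: the output is Off iff at least one input is Off."""
    return OFF if OFF in inputs else ON

def not_output(signal):
    """NOT rule: the output differs from the single input."""
    return OFF if signal == ON else ON

# Check the biconditional of the AND rule on every pair of input signals.
for inputs in product([ON, OFF], repeat=2):
    assert (and_output(inputs) == OFF) == any(s == OFF for s in inputs)

print(and_output((ON, ON)), and_output((ON, OFF)), not_output(ON))
# On Off Off
```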
Designing the electronic circuits KB: Implementation stage (cont.)
2. Encoding specific instances. Examples:
Type(x1) = XOR
Type(a1) = AND
Connected (Out(1, x1), In(1, x2))
Note that the ontology of the electronic circuits domain is a very simple
special-purpose ontology. If we want to represent a general-purpose ontology, we
must represent a large variety of knowledge such as structured objects, time,
space, beliefs, and processes, which is a very difficult task. An attempt to
build a general ontology was one of the goals of the CYC project (if interested,
see D. Lenat and R. Guha, “Building Large Knowledge Bases: Representation and
Inference in the CYC Project”, Addison-Wesley, 1991, available in the library).
Designing the electronic circuits KB: An example query
A possible query is the following:
"What combination of inputs would cause the first output of C1 to be Off,
and the second to be On?"
∃ i1, i2, i3 (Signal(In(1, C1)) = i1) & (Signal(In(2, C1)) = i2) &
(Signal(In(3, C1)) = i3) & (Signal(Out(1, C1)) = Off) &
(Signal(Out(2, C1)) = On)
The expected answer is:
(i1 = On & i2 = On & i3 = Off) v (i1 = On & i2 = Off & i3 = On) v
(i1 = Off & i2 = On & i3 = On)
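The query can be cross-checked by brute force. The lecture does not list the full wiring of C1, so the sketch below assumes it is the one-bit full adder of the cited AIMA example (two XOR gates x1 and x2, two AND gates a1 and a2, one OR gate o1). The OR and XOR behaviours are the standard ones and are likewise an assumption, since only the AND and NOT rules are given above.

```python
from itertools import product

ON, OFF = "On", "Off"

def xor_gate(a, b):
    return ON if a != b else OFF

def and_gate(a, b):
    return ON if a == ON and b == ON else OFF

def or_gate(a, b):
    return ON if ON in (a, b) else OFF

def c1(i1, i2, i3):
    """Assumed wiring of C1: the one-bit full adder of the AIMA example."""
    x1 = xor_gate(i1, i2)
    x2 = xor_gate(x1, i3)   # drives Out(1, C1)
    a1 = and_gate(i1, i2)
    a2 = and_gate(i3, x1)
    o1 = or_gate(a2, a1)    # drives Out(2, C1)
    return x2, o1

# Enumerate all input combinations and keep those with Out(1)=Off, Out(2)=On.
answers = [ins for ins in product([ON, OFF], repeat=3) if c1(*ins) == (OFF, ON)]
print(answers)
# [('On', 'On', 'Off'), ('On', 'Off', 'On'), ('Off', 'On', 'On')]
```

Under these assumptions the enumeration gives the same three input combinations as the expected answer.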