Learning-use

Expert Systems
What are Expert Systems?
“An expert system is a computer system that operates by
applying an inference mechanism to a body of specialist
expertise represented in the form of knowledge that
manipulates this knowledge to perform efficient and
effective problem solving in a narrow problem domain.”
Emphasis is on Knowledge not Methods
1. Most difficult and interesting problems do not have
tractable algorithmic solutions
2. Human experts achieve outstanding performance because
they are knowledgeable
3. Knowledge is a scarce (and therefore, valuable) resource
It is better to call these systems:
Knowledge-Based Systems
Fundamental Concepts
• Knowledge consists of descriptions, relationships, and
procedures in some domain
• Knowledge takes many forms and is often hard to
categorize
Changing Focus in AI
[Figure: program power (LOW to HIGH) plotted against time frame (1960, 1970, 1980); the slide is built up in three stages]
1. Find general methods for problem-solving and use them to create
general-purpose programs
2. Find general methods to improve representation and search and use
them to create specialized programs
3. Use extensive, high-quality, specific knowledge about some narrow
problem area to create very specialized programs
Heuristics vs. Algorithms
It is great if we have algorithms but often a heuristic will
work almost as well at much less cost
Prevent Skyjacking
Algorithm
Heuristic
Why should we not use human expertise?
Human Expertise         Artificial Expertise
Perishable              Permanent
Difficult to transfer   Easy to transfer
Difficult to document   Easy to document
Unpredictable           Consistent
Expensive               Affordable
Why should we keep using humans?
Human Expertise         Artificial Expertise
Creative                Uninspired
Adaptive                Needs to be told
Sensory experience      Symbolic input
Broad focus             Narrow focus
Commonsense knowledge   Technical knowledge
• knows certain things are true while others are not
• knows limits of knowledge
Knowledge Engineering
The process of building an expert system
Expert System
Knowledge Engineer
Domain Expert
Views of an Expert System: End-user
[Diagram: User, User Interface, Intelligent Program, Data Base]
Views of an Expert System: Knowledge Engineer
[Diagram: the Intelligent Program comprises a Knowledge Base (rules, semantic networks, frames, and facts) and an Inference Engine (general problem-solving knowledge)]
Forms of Inference
• Process of drawing conclusions based on facts known or
thought to be true
• We commonly use three different types:
– Deduction
– Abduction
– Induction
Deduction
Reasoning from a known principle to an unknown, from the
general to the specific, or from a premise to a logical conclusion
Modus Ponens
Modus Tollens
> rules, theorems, models
Deduction - Example
Suppose that we know:
∀ X, swimming(X) -> wet(X)
If we are now told:
swimming(andy)
Then we can derive (using Modus Ponens):
wet(andy)
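The derivation above can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of facts and the names `RULES` and `forward_chain` are assumptions made for the sketch, not part of the slides.

```python
# Each rule (premise, conclusion) stands for: for all X, premise(X) -> conclusion(X).
RULES = [("swimming", "wet")]


def forward_chain(facts, rules):
    """Apply Modus Ponens repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                new_fact = (conclusion, subject)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


facts = {("swimming", "andy")}
result = forward_chain(facts, RULES)
print(("wet", "andy") in result)  # True
```

The loop only ever moves from a known premise to its conclusion, which is what makes this form of inference sound.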
Abduction
Used when generating explanations
Is an unsound form of reasoning
If we know:
∀ X, swimming(X) -> wet(X)
wet(alex)
Can we state:
swimming(alex) ?
Induction
Reasoning from particular facts or individual cases to a general
conclusion
This is the basis of scientific discovery
Key technique in machine learning and knowledge acquisition
IF
THEN
> generalization, observation
Example - Family Relationships
Basic domain facts:
child_of(alex, nicole).
male(alex).
male(phillip).
male(nicholas).
child_of(alina, nicole).
child_of(nicholas, leah).
child_of(phillip, leah).
child_of(melanie, cathy).
child_of(leslie, cathy).
child_of(sarah, cathy).
child_of(angela, cathy).
female(alina).
female(leah).
female(nicole).
female(angela).
female(sarah).
female(leslie).
female(melanie).
female(cathy).
Example - Family Relationships
Rules:
sisters(nicole, leah).
sisters(X, Z) :- child_of(X, Y),
                 child_of(Z, Y),
                 female(X),
                 female(Z),
                 X \= Z.   % a person is not her own sister
brothers(X, Z) :- child_of(X, Y),
                  child_of(Z, Y),
                  male(X),
                  male(Z),
                  X \= Z.  % a person is not his own brother
(cont.)
Machine Learning
(Induction from Examples)
What is learning?
• “changes in a system that enable a system to do
the same task more efficiently the next time” -- Herbert Simon
• “constructing or modifying representations of
what is being experienced” -- Ryszard Michalski
• “making useful changes in our minds” -- Marvin
Minsky
What is learning?
• Shorter Oxford Dictionary defines learning as:
… to get knowledge of (a subject) or skill (in art,
etc) by study, experience or teaching. Also to
commit to memory …
• so learning involves
– acquiring NEW knowledge
– improving the use of EXISTING knowledge (i.e., performance)
Why learn?
• understand and improve human learning
learn to teach
• discover new things
data mining
• fill in skeletal information about a domain
incorporate new information in real time
make systems less “finicky” or “brittle” by making
them better able to generalize
Why learn?
• learning is considered to be a KEY element of AI
• any autonomous system MUST be able to learn
and adapt
• sometimes it is easier to ‘teach’ or ‘explain’ than
to ‘program’
– e.g., consider the difference in explaining tic-tac-toe and writing a
program to play the game
– e.g., consider the difference in using a few example pictures to
explain the difference between a lion and a tiger, and getting a
computer to do likewise
• any system that makes the same mistake twice is
pretty STUPID
– all systems (e.g., operating systems, databases) should have some
integral learning component
State of the Art
• modest achievements
• mostly isolated solutions to date
• but can
– assist automatic knowledge acquisition
– extract relevant knowledge from very large knowledge bases
– abstract higher-level concepts out of data sets
– … etc.
• recent trend to integrated systems
– combine various learning methods
  • induction, deduction, analogy, abduction
  • symbolic ML, neural networks, genetic algorithms, ...
Components of a learning system
Evaluating Performance
• several possible criteria
– predictive accuracy of classifier
– speed of learner
– speed of classifier
– space requirements
• Most common criterion is Predictive Accuracy
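Predictive accuracy is the fraction of previously unseen examples that the classifier labels correctly. A minimal sketch follows; the classifier `is_large` and the test data are hypothetical and serve only to exercise the function.

```python
def predictive_accuracy(classifier, test_set):
    """Fraction of held-out examples whose predicted class matches the true class."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    correct = sum(1 for features, label in test_set if classifier(features) == label)
    return correct / len(test_set)


# Hypothetical classifier and test data, for illustration only.
def is_large(features):
    return "yes" if features["size"] > 10 else "no"


test_set = [
    ({"size": 12}, "yes"),
    ({"size": 3}, "no"),
    ({"size": 11}, "no"),   # misclassified by is_large
    ({"size": 20}, "yes"),
]
accuracy = predictive_accuracy(is_large, test_set)
print(accuracy)  # 0.75
```

The examples used for this measurement should not have been seen during learning, otherwise the figure overstates how well the classifier generalizes.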
Symbolic vs. Numeric
• ML has traditionally concerned itself with symbolic
representations
– e.g., [color = orange] rather than [wavelength = 600nm]
• concepts are inherently symbolic
• required for human understanding and recognition
– we think in linguistic terms (i.e., symbols) and not in numbers
– e.g., bird := has-wings & flies & has-beak & lays-eggs & ...
• the relationship between symbolic & numerical
representations is still an open debate
Learning as Search
• to learn a concept description, need to search through a
‘hypothesis’ space
– the space of possible concept descriptions
• need a language to describe the concepts
– the choice of language defines a large (possibly infinite) set of
potential concept descriptions (i.e., rules)
• the task of the learning algorithm is to search this space in
an efficient manner
• the difficulty is how to ignore the vast majority of invalid
descriptions without missing the useful one(s)
• usually requires heuristic methods to prune the search
Summary
• Decision Trees are widely used
– easy to understand rationale
– can out-perform humans
– fast, simple to implement
– handles noisy data well
• Weaknesses
– univariate (uses only 1 variable at a time)
– batch (non-incremental)
Induction systems
The power behind an intelligent system is knowledge.
We can trace a system's success or failure to the quality of its
knowledge.
Difficult tasks:
1. Extracting the knowledge.
2. Encoding the knowledge.
3. Inability to express the knowledge formally.
Induction
Inducing general rules from knowledge contained in a finite set
of examples.
Induction is the process of reasoning from a given set of facts
to conclude general principles or rules.
Induction looks for patterns in available information to infer
reasonable conclusions.
Induction as search
Induction can be viewed as a search through a problem space for a
solution to a problem. The problem space is composed of the
problem’s major concepts linked together by an inductive process
that uses examples of the problem.
Induction
The choice of representation for the desired function is
probably the most important issue. As well as affecting the
nature of the algorithm, it can affect whether the problem is
feasible at all. Is the desired function representable in the
representation language?
An example is described by the values of the attributes and
the value of the goal predicate. We call the value of the goal
predicate the classification of the example. The complete set
of examples is called the training set.
Induction - first example
Determine an appropriate gift on the basis of available money and the
person’s age. Money and age will represent our decision factors
(problem attributes).
Money   Age    Gift
Much    Adult  Car
Much    Child  Computer
Little  Adult  Toaster
Little  Child  Calculator
[Figure: two decision trees for this table, one testing Money first and then Age, the other testing Age first and then Money]
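The gift table can be encoded as a decision tree in either attribute order. The sketch below assumes a simple nested-tuple representation (`(attribute, {value: subtree_or_leaf})`), which is an illustrative choice rather than anything prescribed by the slides, and checks that both orderings reproduce every row.

```python
# A tree is (attribute, {value: subtree_or_leaf}); a leaf is a plain string.
MONEY_FIRST = ("money", {
    "much": ("age", {"adult": "car", "child": "computer"}),
    "little": ("age", {"adult": "toaster", "child": "calculator"}),
})
AGE_FIRST = ("age", {
    "adult": ("money", {"much": "car", "little": "toaster"}),
    "child": ("money", {"much": "computer", "little": "calculator"}),
})


def classify(tree, example):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


# The four rows of the gift table.
EXAMPLES = [
    ({"money": "much", "age": "adult"}, "car"),
    ({"money": "much", "age": "child"}, "computer"),
    ({"money": "little", "age": "adult"}, "toaster"),
    ({"money": "little", "age": "child"}, "calculator"),
]
for example, gift in EXAMPLES:
    assert classify(MONEY_FIRST, example) == gift
    assert classify(AGE_FIRST, example) == gift

print(classify(AGE_FIRST, {"money": "little", "age": "child"}))  # calculator
```

Because every combination of the two attributes leads to a different gift, neither ordering is shorter than the other here; the choice of root matters only when some attribute separates the examples better.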
Induction - decision trees
A decision tree takes as input an object or situation described by
a set of properties, and outputs a yes/no “decision.” Decision
trees therefore represent Boolean functions.
Each internal node in the tree corresponds to a test of the value
of one of the properties, and the branches from the node are
labeled with the possible values of the test. Each leaf node in the
tree specifies the Boolean value to be returned if that leaf is
reached.
Induction - decision trees
Decision trees are implicitly limited to talking about a single object.
That is, the decision tree language is essentially propositional, with
each attribute test being a proposition. We cannot use decision
trees to represent tests that refer to two or more different objects.
Decision trees are fully expressive within the class of propositional
languages, that is, any Boolean function can be written as a
decision tree. Have each row in the truth table for the function
correspond to a path in the tree. The truth table is exponentially
large in the number of attributes.
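The truth-table construction can be illustrated as follows. The Boolean function `f` is hypothetical, and the tree encoding is an assumption of this sketch; the point is only that each of the 2^n truth-table rows becomes one root-to-leaf path.

```python
from itertools import product


def tree_from_truth_table(function, attributes):
    """Build a full decision tree with one root-to-leaf path per truth-table row."""
    def build(index, assignment):
        if index == len(attributes):
            return function(**assignment)  # leaf: Boolean value for this row
        name = attributes[index]
        return (name, {value: build(index + 1, {**assignment, name: value})
                       for value in (False, True)})
    return build(0, {})


def count_leaves(tree):
    """Number of leaves, i.e. number of truth-table rows represented."""
    if isinstance(tree, bool):
        return 1
    _, branches = tree
    return sum(count_leaves(subtree) for subtree in branches.values())


def evaluate(tree, example):
    """Follow the branches selected by the example until a Boolean leaf is reached."""
    while not isinstance(tree, bool):
        name, branches = tree
        tree = branches[example[name]]
    return tree


# Hypothetical Boolean function of three attributes.
def f(a, b, c):
    return (a != b) and c


attrs = ["a", "b", "c"]
tree = tree_from_truth_table(f, attrs)
print(count_leaves(tree))  # 8

# The tree agrees with the function on every row of the truth table.
for values in product((False, True), repeat=len(attrs)):
    row = dict(zip(attrs, values))
    assert evaluate(tree, row) == f(**row)
```

With n attributes the tree built this way has 2^n leaves, which is why a learner tries to find a much smaller tree that is still consistent with the examples.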
Supervised Concept Learning
• given a training set of positive and negative examples of a
concept
– construct a description that will accurately classify future
examples.
– Learn some good estimate of function f given a training
set:
{ (x1,y1), (x2,y2), . . . (xn,yn)}
where each yi is either + (positive) or - (negative)
• inductive learning generalizes from specific facts
– cannot be proven true, but can be proven false
• falsity preserving
– is like searching a hypothesis space H of possible
functions f
– bias allows us to pick which h is preferable
– need to define a metric for comparing f functions to find
the best
Inductive learning framework
• raw input is a feature vector, x, that describes the relevant
attributes of an example
• each x is a list of n (attribute, value) pairs
– x = (person=Sue, major=CS, age=Young, Gender=F)
• attributes have discrete values
– all examples have all attributes.
• each example is a point in n-dimensional feature space
• maintain a library of previous cases
• when a new problem arises
– find the most similar case(s) in the library
– adapt the similar cases to solving the current problem
Learning Decision Trees
• Goal: Build a decision tree for classifying examples as
positive or negative instances of a concept
• Supervised
– batch processing of training examples
– using a preference bias
Induction - decision trees - second example
• If there are some positive and some negative examples, then
choose the best attribute to split them.
• If all the remaining examples are positive (or all negative), then
we are done: we can answer Yes or No.
• If there are no examples left, it means that no such example has
been observed, and we return a default value calculated from the
majority classification at the node’s parent.
• If there are no attributes left, but both positive and negative
examples, we have a problem. It means that these examples have
exactly the same description, but different classifications. This
happens when some of the data are incorrect; we say there is
noise in the data. It also happens when the attributes do not give
enough information to fully describe the situation, or when the
domain is truly nondeterministic.
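The four cases above map directly onto a recursive procedure. The following is a minimal sketch under stated assumptions: the attribute chooser is passed in as a parameter (here a trivial `first_attribute`), the noisy case falls back to a majority vote, and the tiny data set is hypothetical.

```python
from collections import Counter


def majority(examples):
    """Most common classification among (attributes, label) examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]


def learn_tree(examples, attributes, values, choose, default=False):
    """Recursive decision-tree learner following the four cases listed above."""
    if not examples:                      # no examples left: use parent's majority
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                  # all positive or all negative: done
        return labels.pop()
    if not attributes:                    # same description, different classes
        return majority(examples)         # assumption: resolve by majority vote
    best = choose(examples, attributes)   # mixed examples: split on chosen attribute
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in values[best]:
        subset = [(x, y) for x, y in examples if x[best] == value]
        branches[value] = learn_tree(subset, remaining, values, choose,
                                     default=majority(examples))
    return (best, branches)


# Placeholder chooser: a real learner would rank attributes by some measure.
def first_attribute(examples, attributes):
    return attributes[0]


# Hypothetical data: each example is (attribute values, positive/negative label).
VALUES = {"raining": [True, False], "weekend": [True, False]}
DATA = [
    ({"raining": True, "weekend": True}, False),
    ({"raining": True, "weekend": False}, False),
    ({"raining": False, "weekend": True}, True),
    ({"raining": False, "weekend": False}, True),
]
tree = learn_tree(DATA, ["raining", "weekend"], VALUES, first_attribute)
print(tree)  # ('raining', {True: False, False: True})
```

Keeping `choose` as a parameter separates the recursion from the question of which attribute is best, so different selection measures can be plugged in without changing the rest.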
Induction - decision trees - choice of attributes
Information theory
Mathematical model for choosing the best attribute and for
dealing with noise in the data.
The scheme used in decision tree learning for selecting
attributes is designed to minimize the depth of the final tree.
The idea is to pick the attribute that goes as far as possible
toward providing an exact classification of the examples. A
perfect attribute divides the examples into sets that are all
positive or all negative.
The measure should have its maximum value when the attribute
is perfect and its minimum value when the attribute is of no use
at all.
Induction - third example
Example  Height  Eyes   Hair   Class
E1       tall    blue   dark   1
E2       short   blue   dark   1
E3       tall    blue   blond  2
E4       tall    blue   red    2
E5       tall    brown  blond  1
E6       short   blue   blond  2
E7       short   brown  blond  1
E8       tall    brown  dark   1
Information in the 8 examples:
$$\mathrm{Entropy}(I) = -\sum_{i=1}^{c} \frac{N_i}{N} \log_2 \frac{N_i}{N}$$
N_i – number of examples in class i
N – total number of examples (training)
c – number of classes
$$\mathrm{Entropy}(I) = -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.954 \text{ bit}$$
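The value 0.954 bit can be reproduced with a short function. This is a sketch; the function name `entropy` and the list encoding of the class column are choices made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(I) = -sum_i (N_i / N) * log2(N_i / N) over the classes present."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


# Class column of examples E1..E8: five examples of class 1, three of class 2.
classes = [1, 1, 2, 2, 1, 2, 1, 1]
print(round(entropy(classes), 3))  # 0.954
```

The entropy is 0 when all examples belong to one class and reaches 1 bit for two equally frequent classes, so 0.954 indicates a nearly even mix.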
Attribute test:
Select an attribute and calculate the entropy that remains after
splitting on it (used to obtain the information gain).
$$\mathrm{entropy}(I, A_k) = \sum_{j=1}^{J} \frac{n_{kj}}{N}\,\mathrm{entropy}(I, A_k, j) = -\sum_{j=1}^{J}\sum_{i=1}^{c} \frac{n_{kj}}{N}\,\frac{n_{kj}(i)}{n_{kj}} \log_2 \frac{n_{kj}(i)}{n_{kj}}$$
attribute A_k, k = 1, 2, …, K
The examples are divided into J subsets, where J is the
number of values the attribute may take.
n_kj(i) – number of examples from subset j belonging to class i
n_kj – total number of examples in subset j
Induction - example
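entropy(I, hair), summing over the dark, red, and blond subsets (taking 0 log2 0 = 0):
$$\mathrm{entropy}(I, \text{hair}) = \frac{3}{8}\left(-\frac{3}{3}\log_2\frac{3}{3} - \frac{0}{3}\log_2\frac{0}{3}\right) + \frac{1}{8}\left(-\frac{0}{1}\log_2\frac{0}{1} - \frac{1}{1}\log_2\frac{1}{1}\right) + \frac{4}{8}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) = 0.5 \text{ bit}$$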
Induction - example
Information gain: max over k of { entropy(I) – entropy(I, A_k) }
IG (hair) = 0.954 – 0.5 = 0.454 bit
IG (height) = 0.954 – 0.951 = 0.003 bit
IG (eyes) = 0.954 – 0.607 = 0.347 bit
hair
  dark: E1 – class 1, E2 – class 1, E8 – class 1
  red: E4 – class 2
  blond: E3 – class 2, E5 – class 1, E6 – class 2, E7 – class 1
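The three attribute tests can be checked numerically. The sketch below assumes a tuple encoding of the eight examples and hypothetical helper names (`split_entropy`, `information_gain`); it reproduces the split entropies 0.951, 0.5 and 0.607 and selects hair as the root.

```python
from collections import Counter, defaultdict
from math import log2

# (height, hair, eyes, class) for E1..E8, copied from the example table.
EXAMPLES = [
    ("tall", "dark", "blue", 1), ("short", "dark", "blue", 1),
    ("tall", "blond", "blue", 2), ("tall", "red", "blue", 2),
    ("tall", "blond", "brown", 1), ("short", "blond", "blue", 2),
    ("short", "blond", "brown", 1), ("tall", "dark", "brown", 1),
]
COLUMN = {"height": 0, "hair": 1, "eyes": 2}


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_entropy(examples, attribute):
    """entropy(I, A_k): subset entropies weighted by subset size n_kj / N."""
    subsets = defaultdict(list)
    for row in examples:
        subsets[row[COLUMN[attribute]]].append(row[-1])
    return sum(len(s) / len(examples) * entropy(s) for s in subsets.values())


def information_gain(examples, attribute):
    """entropy(I) minus the entropy remaining after splitting on the attribute."""
    return entropy([row[-1] for row in examples]) - split_entropy(examples, attribute)


for name in COLUMN:
    print(name, round(split_entropy(EXAMPLES, name), 3))
# height 0.951
# hair 0.5
# eyes 0.607

best = max(COLUMN, key=lambda name: information_gain(EXAMPLES, name))
print(best)  # hair
```

Computed at full precision the gain for eyes is about 0.3476 bit; subtracting the already rounded values 0.954 and 0.607 gives the 0.347 shown above.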
Induction - example
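Example  Height  Hair   Eyes   Class
E3       tall    blond  blue   2
E5       tall    blond  brown  1
E6       short   blond  blue   2
E7       short   blond  brown  1
entropy(I, height), over the tall and short subsets:
$$\mathrm{entropy}(I, \text{height}) = \frac{2}{4}\left(-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2}\right) + \frac{2}{4}\left(-\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2}\right) = 1 \text{ bit}$$
entropy(I, eyes), over the blue and brown subsets (taking 0 log2 0 = 0):
$$\mathrm{entropy}(I, \text{eyes}) = \frac{2}{4}\left(-\frac{0}{2}\log_2\frac{0}{2} - \frac{2}{2}\log_2\frac{2}{2}\right) + \frac{2}{4}\left(-\frac{2}{2}\log_2\frac{2}{2} - \frac{0}{2}\log_2\frac{0}{2}\right) = 0$$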
hair
  dark: E1 – class 1, E2 – class 1, E8 – class 1
  red: E4 – class 2
  blond: Eyes
    blue: E3 – class 2, E6 – class 2
    brown: E5 – class 1, E7 – class 1
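The second-level choice on the four blond examples can be checked the same way. This is a sketch with an assumed tuple encoding; with base-2 logarithms the height split leaves a full bit of uncertainty while the eyes split leaves none, which is why Eyes is tested under the blond branch.

```python
from collections import Counter
from math import log2

# (height, eyes, class) for the blond examples E3, E5, E6, E7.
BLOND = [("tall", "blue", 2), ("tall", "brown", 1),
         ("short", "blue", 2), ("short", "brown", 1)]


def entropy(labels):
    """Entropy in bits, written as sum p * log2(1 / p) to avoid a negative zero."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def split_entropy(rows, column):
    """Entropy remaining after splitting the rows on the given column index."""
    result = 0.0
    for value in {row[column] for row in rows}:
        labels = [row[-1] for row in rows if row[column] == value]
        result += len(labels) / len(rows) * entropy(labels)
    return result


print(split_entropy(BLOND, 0))  # height: 1.0
print(split_entropy(BLOND, 1))  # eyes: 0.0
```

Since the blond subset itself has an entropy of 1 bit, splitting on height gains nothing and splitting on eyes gains the full bit.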
Induction systems
Determine objective - a search through a decision tree will reach one
of a finite set of decisions on the basis of the path taken through the
tree.
Determine decision factors - represent the attribute nodes of the
decision tree.
Determine decision factor values - represent the attribute values of
the decision tree.
Determine solutions - list of final decisions that the system can make
- the leaf nodes in the tree.
Form example set.
Create decision tree.
Test the system.
Revise the system.
Induction systems - example
Football game prediction system
Predict the outcome of a football game (will our team win or lose).
Decision factors - location, weather, team record, opponent record.
Decision factor values -
Location: Home, Away
Weather: Rain, Cold, Moderate, Hot
Own Record: Poor, Average, Good
Opponent Record: Poor, Average, Good
Solutions - win or lose
Induction systems - example (cont’d)
Examples -
Week  Location  Weather   Own record  Opp. record  Outcome
1     Home      Hot       Good        Good         Win
2     Home      Rain      Good        Average      Win
3     Away      Moderate  Good        Average      Loss
4     Away      Hot       Good        Poor         Win
5     Home      Cold      Good        Good         Loss
6     Away      Hot       Average     Average      Loss
7     Home      Moderate  Average     Good         Loss
8     Away      Cold      Poor        Average      Win
Induction systems - example (cont’d)
Decision tree -
Weather
  rain: Win
  moderate: Loss
  cold: Location
    home: Loss
    away: Win
  hot: Own rec
    poor: No-data
    average: Loss
    good: Win
Test the system - predict the future games. Get the values for the
decision factors for the upcoming game and see on which team
to bet.
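Testing the system amounts to walking the tree with the decision factor values of a game. In the sketch below the branch layout is an assumed reading of the tree figure, chosen because it agrees with all eight example weeks; the encoding and the upcoming game are hypothetical.

```python
# Assumed reading of the football tree: (attribute, {value: subtree_or_outcome}).
TREE = ("weather", {
    "rain": "Win",
    "moderate": "Loss",
    "cold": ("location", {"home": "Loss", "away": "Win"}),
    "hot": ("own_record", {"poor": "No-data", "average": "Loss", "good": "Win"}),
})


def predict(tree, game):
    """Follow the branch for each tested decision factor until an outcome is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[game[attribute]]
    return tree


# The eight example weeks: (location, weather, own record, opponent record, outcome).
WEEKS = [
    ("home", "hot", "good", "good", "Win"),
    ("home", "rain", "good", "average", "Win"),
    ("away", "moderate", "good", "average", "Loss"),
    ("away", "hot", "good", "poor", "Win"),
    ("home", "cold", "good", "good", "Loss"),
    ("away", "hot", "average", "average", "Loss"),
    ("home", "moderate", "average", "good", "Loss"),
    ("away", "cold", "poor", "average", "Win"),
]
# The tree never tests the opponent record, so it is ignored here.
for location, weather, own, _opponent, outcome in WEEKS:
    game = {"location": location, "weather": weather, "own_record": own}
    assert predict(TREE, game) == outcome

# Hypothetical upcoming game.
upcoming = {"location": "away", "weather": "cold", "own_record": "good"}
print(predict(TREE, upcoming))  # Win
```

A "No-data" answer marks a combination that never occurred in the examples (hot weather with a poor own record), so the system has no basis for a prediction there.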
Induction systems - example (test)
Sensitivity study - Location
Induction systems - pros and cons
Pros:
Discovers rules from examples - potential unknown rules could be
induced.
Avoids knowledge elicitation problems - system knowledge can be
acquired through past examples.
Can produce new knowledge.
Can uncover critical decision factors.
Can eliminate irrelevant decision factors.
Can uncover contradictions.
Cons:
Difficult to choose good decision factors.
Difficult to understand rules.
Applicable only for classification problems.
Induction systems - implemented
AQ11 - diagnosing soybean diseases. Identifies 15 different
diseases. The knowledge was derived from 630 examples and used
35 decision rules.
Willard - forecasting thunderstorms. 140 examples, hierarchy of 30
modules, each with a decision tree.
Rulemaster - detecting signs of transformer faults.
Stock market predictions.