Artificial Intelligence
CS 165A
Tuesday, November 20, 2007
Knowledge Representation (Ch 10)
Uncertainty (Ch 13)
Notes
• HW #4 due by noon tomorrow
• Reminder: Final exam December 14, 4-7pm
– Review in class on Dec. 6th
2
Review
Situation Calculus – actions, events
• “Situation Calculus” is a way of describing change over
time in first-order logic
– Fluents: Functions or predicates that can vary over time have an
extra argument, Si (the situation argument)
Predicate(args, Si)
Location of an agent, aliveness, changing properties, ...
– The Result function is used to represent change from one situation
to another resulting from an action (or action sequence)
Result(GoForward, Si) = Sj
“Sj is the situation that results from the action GoForward
applied to situation Si”
Result() indicates the relationship between situations
3
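The fluent-plus-Result idea above can be sketched in a few lines of Python. This is a toy model only; the fluent and action names (At, Alive, GoForward) are illustrative, not from the slides.

```python
# Toy situation calculus: a situation is an immutable snapshot of fluents,
# and result(action, s) plays the role of Result(action, Si) = Sj.
# Fluent and action names (At, Alive, GoForward) are illustrative only.

def result(action, situation):
    """Return the new situation produced by applying `action` to `situation`."""
    fluents = dict(situation)
    if action == "GoForward":
        x, y = fluents["At"]
        fluents["At"] = (x, y + 1)  # the location fluent varies across situations
    return tuple(sorted(fluents.items()))

s0 = tuple(sorted({"At": (0, 0), "Alive": True}.items()))
s1 = result("GoForward", s0)  # s1 = Result(GoForward, s0)
print(dict(s1)["At"], dict(s1)["Alive"])  # (0, 1) True
```

Note that s0 is left unchanged: each situation is a separate snapshot, and Result() only relates one snapshot to the next.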
Review
Situation Calculus
4
Represents the world in different “situations” and the relationship between situations
Review
Examples
• How would you interpret the following sentences in First-Order Logic using situation calculus?
∀x, s Studying(x, s) ⇒ Failed(x, Result(TakeTest, s))
If you’re studying and then you take the test, you will fail.
(or) Studying a subject implies that you will fail the test for that subject.
∀x, s TurnedOn(x, s) ∧ LightSwitch(x) ⇒ TurnedOff(x, Result(FlipSwitch, s))
If you flip the light switch when it is turned on, it will then be turned off.
6
There are other ways to deal with time
• Event calculus
– Based on points in time rather than situations
– Designed to allow reasoning over periods of time
Can represent actions with duration, overlapping actions, etc.
• Generalized events
– Parts of a general “space-time chunk”
• Processes
– Not just discrete events
• Intervals
– Moments and durations of time
• Objects with state fluents
– Not just events, but objects can also have time properties
7
Event calculus relations
• Initiates(e, f, t)
– Event e at time t causes fluent f to become true
• Terminates(e, f, t)
– Event e at time t causes fluent f to no longer be true
• Happens(e, t)
– Event e happens at time t
• Clipped(f, t1, t2)
– f is terminated by some event sometime between t1 and t2
8
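These four relations can be wired together in a minimal sketch (hypothetical event and fluent names, TurnOn/TurnOff/LightOn): a fluent holds at time t if some earlier event initiated it and no event terminated (“clipped”) it in between.

```python
# Minimal event-calculus sketch. Event/fluent names are invented for
# illustration; Initiates and Terminates are taken to hold for all t.

initiates = {("TurnOn", "LightOn")}    # Initiates(e, f, t)
terminates = {("TurnOff", "LightOn")}  # Terminates(e, f, t)
happens = [("TurnOn", 1), ("TurnOff", 5)]  # Happens(e, t)

def clipped(f, t1, t2):
    """Clipped(f, t1, t2): f is terminated by some event between t1 and t2."""
    return any((e, f) in terminates and t1 < t < t2 for e, t in happens)

def holds_at(f, t):
    """f holds at t if an event initiated it at t' < t and it was not clipped."""
    return any((e, f) in initiates and ti < t and not clipped(f, ti, t)
               for e, ti in happens)

print(holds_at("LightOn", 3), holds_at("LightOn", 7))  # True False
```

Because reasoning is over time points, overlapping events and durations fall out naturally: just add more (event, time) pairs to `happens`.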
Generalized events
• An ontology of time that allows for reasoning about
various temporal events, subevents, durations, processes,
intervals, etc.
[Figure: a space-time chunk — e.g. “Australia” extended along a time axis]
9
Time interval predicates
Ex:
After(ReignOf(ElizabethII), ReignOf(GeorgeVI))
Overlap(Fifties, ReignOf(Elvis))
Start(Fifties) = Start(AD1950)
Meet(Fifties, Sixties)
10
Objects with state fluents
Ex: President(USA) – an object whose referent changes over time
11
Knowledge representation
• Chapter 10 covers many topics in knowledge
representation, many of which are important to real,
sophisticated AI reasoning systems
– We’re only scratching the surface of this topic
– Best covered in depth in an advanced AI course and in context of
particular AI problems
– Read through the Internet shopping world example in 10.5
• Now we move on to probabilistic reasoning, a different
way of representing and manipulating knowledge
– Chapters 13 and 14
12
Quick Review of Probability
From here on we will assume that you know this…
13
Probability notation and notes
• Probabilities of propositions
– P(A), P(the sun is shining)
• Probabilities of random variables
– P(X = x1), P(Y = y1), P(x1 < X < x2)
• P(A) usually means P(A = True)
(A is a proposition, not a variable)
– This is a probability value
– Technically, P(A) is a probability function
• P(X = x1)
– This is a probability value (P(X) is a probability function)
• P(X)
– This is a probability function or a probability density function
• Technically, if X is a variable, we should not write P(X) = 0.5
– But rather P(X = x1) = 0.5
14
Discrete and continuous probabilities
• Discrete: Probability function P(X, Y) is described by an
M×N matrix of probabilities
– Possible values of each: P(X=x1, Y=y1) = p1
– Σi,j P(X=xi, Y=yj) = 1
– P(X, Y, Z) is an M×N×P matrix
• Continuous: Probability density function (pdf) P(X, Y) is
described by a 2D function
– P(x1 < X < x2, y1 < Y < y2) = p1
– ∫∫ P(X, Y) dX dY = 1
15
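Both normalization conditions are easy to sanity-check numerically. This is a sketch with invented numbers: a small discrete joint matrix, and a uniform pdf on [0, 10] integrated with a Riemann sum.

```python
# Check that a discrete joint P(X, Y) sums to 1, and that a pdf integrates
# to 1 (approximated numerically). All values here are invented examples.

joint = [[0.2, 0.1, 0.1],
         [0.1, 0.2, 0.3]]  # a 2x3 matrix of probabilities
total = sum(sum(row) for row in joint)
print(round(total, 9))  # 1.0

# Uniform pdf p(x) = 1/10 on [0, 10], integrated as a Riemann sum
n = 10000
dx = 10 / n
integral = sum(0.1 * dx for _ in range(n))
print(round(integral, 6))  # 1.0
```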
Discrete probability distribution
Σi p(X = xi) = 1
[Figure: bar chart of p(X) for X = 1, …, 12, with values between 0 and 0.2]
16
Continuous probability distribution
∫ p(X) dX = 1
[Figure: continuous pdf p(X) over X = 1, …, 12, with values between 0 and 0.4]
17
Continuous probability distribution
∫ p(X) dX = 1
P(X=5) = ???
P(X=5) = 0
P(X=x1) = 0
[Figure: the same pdf p(X), highlighting the single point X = 5]
18
Three Axioms of Probability
1. The probability of every event must be nonnegative
– For any event A, P(A) ≥ 0
2. Valid propositions have probability 1
– P(True) = 1
– P(A ∨ ¬A) = 1
3. For disjoint events A1, A2, …
– P(A1 ∨ A2 ∨ …) = P(A1) + P(A2) + …
• From these axioms, all other properties of probabilities
can be derived.
– E.g., derive P(A) + P(¬A) = 1
19
Some consequences of the axioms
• Unsatisfiable propositions have probability 0
– P(False) = 0
– P(A ∧ ¬A) = 0
• For any two events A and B
– P(A ∨ B) = P(A) + P(B) – P(A ∧ B)
• For the complement Ac of event A
– P(Ac) = 1 – P(A)
• For any event A
– 0 ≤ P(A) ≤ 1
• For independent events A and B
– P(A ∧ B) = P(A) P(B)
20
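The consequences above can be verified by brute-force enumeration of a tiny sample space. The sketch below assumes a toy model of two independent fair coin flips, A and B.

```python
# Verify inclusion-exclusion, independence, and the complement rule by
# enumerating a toy sample space: two independent fair coins A and B.
from itertools import product

omega = list(product([True, False], repeat=2))  # outcomes (A, B)
p = {w: 0.25 for w in omega}                    # uniform distribution

def prob(pred):
    """Probability of the event defined by predicate `pred`."""
    return sum(p[w] for w in omega if pred(w))

P_A = prob(lambda w: w[0])
P_B = prob(lambda w: w[1])
P_A_or_B = prob(lambda w: w[0] or w[1])
P_A_and_B = prob(lambda w: w[0] and w[1])

assert P_A_or_B == P_A + P_B - P_A_and_B    # P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
assert P_A_and_B == P_A * P_B               # independence
assert prob(lambda w: not w[0]) == 1 - P_A  # complement
print(P_A, P_A_or_B, P_A_and_B)  # 0.5 0.75 0.25
```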
Venn Diagram
[Figure: Venn diagram — a rectangle “True” containing overlapping circles A and B]
Visualize: P(True), P(False), P(A), P(B), P(¬A), P(¬B),
P(A ∧ B), P(A ∨ B), P(A ∧ ¬B), …
21
Joint Probabilities
• A complete probability model is a single joint probability
distribution over all propositions/variables in the domain
– P(X1, X2, …, Xi, …)
• A particular instance of the world has the probability
– P(X1=x1 ∧ X2=x2 ∧ … ∧ Xi=xi ∧ …) = p
• Rather than stating knowledge as
– Raining ⇒ WetGrass
• We can state it as

              WetGrass   ¬WetGrass
  Raining       0.8        0.04
  ¬Raining      0.01       0.15

– P(¬Raining, ¬WetGrass) = 0.15
– P(¬Raining, WetGrass) = 0.01
– P(Raining, ¬WetGrass) = 0.04
– P(Raining, WetGrass) = 0.8
22
Conditional Probability
• Unconditional, or Prior, Probability
– Probabilities associated with a proposition or variable, prior to any
evidence
– E.g., P(WetGrass), P(Raining)
• Conditional, or Posterior, Probability
– Probabilities after evidence is gathered
– P(A | B) – “The probability of A given that we know B”
– After (posterior to) procuring evidence
– E.g., P(WetGrass | Raining)

P(X | Y) = P(X, Y) / P(Y)
or
P(X | Y) P(Y) = P(X, Y)
Assumes P(Y) nonzero
23
The chain rule
P(X, Y) = P(X | Y) P(Y)
By the Chain Rule
P(X, Y, Z) = P(X | Y, Z) P(Y, Z)
           = P(X | Y, Z) P(Y | Z) P(Z)
or, equivalently
           = P(X) P(Y | X) P(Z | X, Y)
Notes:
• Precedence: ‘|’ is lowest
• E.g., P(X | Y, Z) means which?
P( (X | Y), Z )
P(X | (Y, Z) )  ← this one, since ‘|’ binds last
24
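The chain-rule factorization can be checked numerically. The sketch below uses a made-up joint distribution over three binary variables (the eight probabilities are invented and sum to 1) and confirms P(X,Y,Z) = P(X|Y,Z) P(Y|Z) P(Z) for every assignment.

```python
# Verify P(x,y,z) = P(x|y,z) P(y|z) P(z) on an invented joint distribution.
from itertools import product

vals = [0.1, 0.05, 0.2, 0.15, 0.05, 0.1, 0.25, 0.1]  # sums to 1
joint = {xyz: v for xyz, v in zip(product([0, 1], repeat=3), vals)}

def P(**fixed):
    """Marginal probability of an assignment, e.g. P(y=1, z=0)."""
    return sum(v for (x, y, z), v in joint.items()
               if all({'x': x, 'y': y, 'z': z}[k] == val
                      for k, val in fixed.items()))

for x, y, z in product([0, 1], repeat=3):
    lhs = joint[(x, y, z)]
    # P(x|y,z) * P(y|z) * P(z), each conditional written as a ratio
    rhs = (P(x=x, y=y, z=z) / P(y=y, z=z)) * (P(y=y, z=z) / P(z=z)) * P(z=z)
    assert abs(lhs - rhs) < 1e-12
print("chain rule holds")
```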
Joint probability distribution
From P(X,Y), we can always calculate:

  P(X,Y)    x1    x2    x3
    y1      0.2   0.1   0.1
    y2      0.1   0.2   0.3

P(X)        P(X=x1)
P(Y)        P(Y=y2)
P(X|Y)      P(X|Y=y1)
P(Y|X)      P(Y|X=x1)
            P(X=x1|Y)
etc.
25
  P(X,Y)    x1    x2    x3
    y1      0.2   0.1   0.1
    y2      0.1   0.2   0.3

P(X=x1,Y=y2) = ?
P(X=x1) = ?
P(Y=y2) = ?
P(X|Y=y1) = ?
P(X=x1|Y) = ?

  P(X)      x1    x2    x3
            0.3   0.3   0.4

  P(Y)      y1    y2
            0.4   0.6

  P(X|Y)    x1     x2     x3
    y1      0.5    0.25   0.25
    y2      0.167  0.333  0.5

  P(Y|X)    x1     x2     x3
    y1      0.667  0.333  0.25
    y2      0.333  0.667  0.75
26
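As a check, the marginals and conditionals on this slide can be recomputed from the joint table in a few lines of Python:

```python
# Recompute P(X), P(Y), and P(X|Y=y1) from the 2x3 joint table above.

joint = {('x1', 'y1'): 0.2, ('x2', 'y1'): 0.1, ('x3', 'y1'): 0.1,
         ('x1', 'y2'): 0.1, ('x2', 'y2'): 0.2, ('x3', 'y2'): 0.3}

def P_X(x):  # marginalize over Y
    return sum(v for (xi, yi), v in joint.items() if xi == x)

def P_Y(y):  # marginalize over X
    return sum(v for (xi, yi), v in joint.items() if yi == y)

def P_X_given_Y(x, y):  # P(X=x | Y=y) = P(x, y) / P(y)
    return joint[(x, y)] / P_Y(y)

print(round(P_X('x1'), 3), round(P_Y('y2'), 3))  # 0.3 0.6
print(round(P_X_given_Y('x1', 'y1'), 3))         # 0.5
```

The answers match the slide: P(X) = (0.3, 0.3, 0.4), P(Y) = (0.4, 0.6), and P(X|Y=y1) = (0.5, 0.25, 0.25).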
Probability Distributions

              Continuous vars               Discrete vars
P(X)          Function (of one variable)    M vector
P(X=x)        Scalar*                       Scalar
P(X,Y)        Function of two variables     M×N matrix
P(X|Y)        Function of two variables     M×N matrix
P(X|Y=y)      Function of one variable      M vector
P(X=x|Y)      Function of one variable      N vector
P(X=x|Y=y)    Scalar*                       Scalar

* - actually zero. Should be P(x1 < X < x2)
27
Bayes’ Rule
• Since
P(X, Y) = P(X | Y) P(Y)
and
P(X, Y) = P(Y | X) P(X)
• Then
P(X | Y) P(Y) = P(Y | X) P(X)

P(X | Y) = P(Y | X) P(X) / P(Y)  ← Bayes’ Rule
28
Bayes’ Rule
• Similarly, P(X) conditioned on two variables:
P(X | Y, Z) = P(Y | X, Z) P(X | Z) / P(Y | Z)
P(X | Y, Z) = P(Z | X, Y) P(X | Y) / P(Z | Y)
• Or N variables:
P(X1 | X2, X3, …, XN) = P(X2 | X1, X3, …, XN) P(X1 | X3, …, XN) / P(X2 | X3, …, XN)
29
Bayes’ Rule
• This simple equation is very useful in practice
– Usually framed in terms of hypotheses (H) and data (D)
Which of the hypotheses is best supported by the data?

P(Hi | D) = P(D | Hi) P(Hi) / P(D)

where P(Hi | D) is the posterior probability (diagnostic knowledge), P(D | Hi) is the likelihood (causal knowledge), P(Hi) is the prior probability, and P(D) is the normalizing constant.

P(Hi | D) = k P(D | Hi) P(Hi)
30
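A small sketch of this hypotheses-vs-data form: compute P(Hi | D) for every hypothesis from likelihoods and priors, recovering the normalizing constant k by making the posteriors sum to 1. The two-hypothesis numbers below are invented for illustration.

```python
# Posterior over hypotheses: P(Hi | D) = k * P(D | Hi) * P(Hi),
# where k = 1 / P(D) is fixed by normalization.

def posterior(likelihoods, priors):
    """likelihoods[i] = P(D | Hi), priors[i] = P(Hi); returns list of P(Hi | D)."""
    unnormalized = [l * p for l, p in zip(likelihoods, priors)]
    k = 1.0 / sum(unnormalized)  # 1 / P(D)
    return [k * u for u in unnormalized]

# Two-hypothesis toy example (numbers assumed for illustration)
post = posterior([0.9, 0.2], [0.3, 0.7])
print([round(p, 3) for p in post])  # [0.659, 0.341]
```

Note that P(D) never has to be known directly; it falls out of the requirement that the posteriors sum to 1.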
Bayes’ rule example: Medical diagnosis
• Meningitis causes a stiff neck 50% of the time
• A patient comes in with a stiff neck – what is the
probability that he has meningitis?
• Need to know two things:
– The prior probability of a patient having meningitis (1/50,000)
– The prior probability of a patient having a stiff neck (1/20)

P(M | S) = P(S | M) P(M) / P(S)

• P(M | S) = (0.5)(0.00002)/(0.05) = 0.0002
31
Example (cont.)
• Suppose that we also know about whiplash
– P(W) = 1/1000
– P(S | W) = 0.8
• What is the relative likelihood of whiplash and meningitis?
– P(W | S) / P(M | S)

P(W | S) = P(S | W) P(W) / P(S) = (0.8)(0.001) / 0.05 = 0.016

So the relative likelihood of whiplash vs. meningitis is (0.016/0.0002) = 80
32
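Both diagnosis calculations above can be reproduced directly with one Bayes-rule helper:

```python
# Reproduce the meningitis and whiplash posteriors from the slides.

def bayes(p_e_given_h, p_h, p_e):
    """P(H | E) = P(E | H) P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

p_m_given_s = bayes(0.5, 1 / 50000, 1 / 20)  # meningitis given stiff neck
p_w_given_s = bayes(0.8, 1 / 1000, 1 / 20)   # whiplash given stiff neck

print(round(p_m_given_s, 6))                 # 0.0002
print(round(p_w_given_s, 6))                 # 0.016
print(round(p_w_given_s / p_m_given_s, 1))   # 80.0 (relative likelihood)
```

Note that P(S) = 0.05 cancels in the ratio, so the relative likelihood only needs the likelihoods and priors.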
A useful Bayes rule example
A test for a new, deadly strain of anthrax (that has no symptoms)
is known to be 99.9% accurate. Should you get tested? The
chances of having this strain are one in a million.
What are the random variables?
A – you have anthrax (boolean)
T – you test positive for anthrax (boolean)
Notation: Instead of P(A=True) and P(A=False), we will write P(A) and P(¬A)
What do we want to compute?
P(A|T)
What else do we need to know or assume?
Priors: P(A), P(¬A)
Given: P(T|A), P(T|¬A), P(¬T|A), P(¬T|¬A)
Possibilities: A ∧ T, A ∧ ¬T, ¬A ∧ T, ¬A ∧ ¬T
33
Example (cont.)
We know:
Given: P(T|A) = 0.999, P(¬T|A) = 0.001, P(T|¬A) = 0.001, P(¬T|¬A) = 0.999
Prior knowledge: P(A) = 10⁻⁶, P(¬A) = 1 – 10⁻⁶
Want to know P(A|T)
P(A|T) = P(T|A) P(A) / P(T)
Calculate P(T) by marginalization:
P(T) = P(T|A) P(A) + P(T|¬A) P(¬A) = (0.999)(10⁻⁶) + (0.001)(1 – 10⁻⁶) ≈ 0.001
So P(A|T) = (0.999)(10⁻⁶) / 0.001 ≈ 0.001
Therefore P(¬A|T) ≈ 0.999
What if you work at a Post Office?
34
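The anthrax numbers can be reproduced with the same marginalization step. (Computed exactly, P(T) ≈ 0.001001, which the slide rounds to 0.001.)

```python
# Even a 99.9%-accurate test yields mostly false positives when the prior
# is one in a million.

p_a = 1e-6               # P(A): prior probability of having anthrax
p_t_given_a = 0.999      # P(T | A): true-positive rate
p_t_given_not_a = 0.001  # P(T | ¬A): false-positive rate

# P(T) by marginalization over A
p_t = p_t_given_a * p_a + p_t_given_not_a * (1 - p_a)
p_a_given_t = p_t_given_a * p_a / p_t  # Bayes' rule

print(round(p_t, 6))          # 0.001001
print(round(p_a_given_t, 6))  # 0.000998, so P(¬A|T) ≈ 0.999
```

A positive test raises the probability of anthrax by a factor of about a thousand, yet it remains overwhelmingly likely that you do not have anthrax.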
[Figure: Venn diagram of “All people” — a tiny region of “People with anthrax” and the rest “People without anthrax”, with regions “Good T” and “Bad T (0.1%)” marking where the test is right or wrong]
35