Transcript PowerPoint
CS 416 Artificial Intelligence
Lecture 14
Uncertainty
Chapter 13

An apology to Red Sox fans
The only team ever in baseball to take a 3-0 series to a game seven
I was playing the probabilities…

Shortcomings of first-order logic
Consider dental diagnosis
• A diagnostic rule: every patient with a toothache has a cavity
  – Not all patients with toothaches have cavities. There are other causes of toothaches

Shortcomings of first-order logic
What’s wrong with this?
• Extend the rule to list every other cause of a toothache
  – An unlimited number of toothache causes

Shortcomings of first-order logic
Alternatively, create a causal rule
• A causal rule: every cavity causes a toothache
  – Again, not all cavities cause pain. The rule must be expanded

Shortcomings of first-order logic
Both diagnostic and causal rules require countless qualifications
• Difficult to be exhaustive
  – Too much work
  – We don’t know all the qualifications
  – Even correctly qualified rules may not be useful if the real-time application of the rules is missing data

Shortcomings of first-order logic
As an alternative to exhaustive logic…
Probability Theory
• Serves as a hedge against our laziness and ignorance

Degrees of belief
I believe the glass is full with 50% chance
• Note this does not indicate the statement is half-true
  – We are not talking about a glass half-full
• “The glass is full” is the only statement being considered
• My statement indicates I believe, with 50% confidence, that the statement is true. There are no claims about what other beliefs I have regarding the glass.
  – Fuzzy logic handles partial truths

Decision Theory
What is rational behavior in the context of probability?
• Pick the answer that satisfies goals with the highest probability of actually working?
  – Sometimes more risk is acceptable
• Must have a utility function that measures the many factors related to an agent’s happiness with an outcome
• An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action

Building probability notation
Propositions
• Like propositional logic. The things we believe
Atomic Events
• A complete specification of the state of the world
Prior Probability
• Probability something is true in the absence of other data
Conditional Probability
• Probability something is true given something else is known

Propositions
Like propositional logic
• Random variables refer to parts of the world with unknown status
• Random variables have a well-defined domain
  – Boolean
  – Discrete (countable)
  – Continuous

Atomic events
A complete specification of the world
• All variables in the world are assigned values
• Only one atomic event can be true
• The set of all atomic events is exhaustive; at least one must be true
• Any atomic event entails the truth or falsehood of every proposition

Prior probability
The degree of belief in the absence of other info
• P(Weather)
  – P(Weather == sunny) = 0.7
  – P(Weather == rainy) = 0.2
  – P(Weather == cloudy) = 0.08
  – P(Weather == snowy) = 0.02
• P(Weather) = <0.7, 0.2, 0.08, 0.02>
  – Probability distribution for the random variable Weather

Prior probability - Discrete
Joint probability distribution
• P(Weather, Natural Disaster) = an n x m table of probabilities
  – n = number of weather values
  – m = number of natural disaster values
Full joint probability distribution
• Probabilities for all variables are established
What about continuous variables where a table won’t suffice?
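The discrete prior and the joint table size described above can be sketched in a few lines of Python. This is only an illustration: the helper `is_distribution` and the three-value domain for Natural Disaster are assumptions introduced here, not part of the lecture.

```python
import math

# P(Weather) as given on the slide: <0.7, 0.2, 0.08, 0.02>
weather_prior = {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.08, "snowy": 0.02}


def is_distribution(dist, tol=1e-9):
    """Check that every entry lies in [0, 1] and the entries sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    return in_range and math.isclose(sum(dist.values()), 1.0, abs_tol=tol)


# Hypothetical domain for the Natural Disaster variable (not from the slides).
disasters = ["none", "earthquake", "flood"]

# The joint table P(Weather, Natural Disaster) needs one cell per value pair,
# so it has n x m cells (here 4 x 3).
joint_cells = [(w, d) for w in weather_prior for d in disasters]

print(is_distribution(weather_prior))  # True
print(len(joint_cells))                # 12
```

The table grows multiplicatively with each added variable, which is why the full joint distribution becomes expensive to specify.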
Prior probability - Continuous
Probability density functions (PDFs)
• P(X = x) = Uniform[18, 26](x)
  – The probability density for tomorrow’s temperature at 20.5 degrees Celsius is U[18, 26](20.5) = 0.125 per degree. This is a density, not a probability

Conditional probability
The probability of a, given that all we know is b
• P(a | b)
Written as an unconditional probability
• P(a | b) = P(a ∧ b) / P(b), for P(b) > 0

Axioms of probability
• All probabilities are between 0 and 1
• Necessarily true propositions have probability 1
• Necessarily false propositions have probability 0
• The probability of a disjunction is: P(a ∨ b) = P(a) + P(b) - P(a ∧ b)

Using axioms of probability
The probability of a proposition is equal to the sum of the probabilities of the atomic events in which it holds:
• P(a) = Σ P(e_i), summed over the atomic events e_i in which a holds

An example
Marginalization: P(Y) = Σ_z P(Y, z)
Conditioning: P(Y) = Σ_z P(Y | z) P(z)

Conditional probabilities

Normalization
Two previous calculations had the same denominator
• P(Cavity | toothache) = α P(Cavity, toothache)
  – = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ~catch)]
  – = α [<0.108, 0.016> + <0.012, 0.064>] = α <0.12, 0.08> = <0.6, 0.4>
Generalized (X = cavity, e = toothache, y = catch)
• P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
• P(X, e, y) is a subset of the full joint distribution

Using the full joint distribution
It does not scale well…
• n Boolean variables
  – Table size O(2^n)
  – Process time O(2^n)

Independence
Independence of variables in a domain can dramatically reduce the amount of information necessary to specify the full joint distribution
• Adding weather (four states) to this table requires creating four versions of it (one for each weather state): 8 * 4 = 32 cells

Independence
• P(toothache, catch, cavity, Weather=cloudy) = P(Weather=cloudy | toothache, catch, cavity) * P(toothache, catch, cavity)
Because weather and dentistry are independent
• P(Weather=cloudy | toothache, catch, cavity) = P(Weather=cloudy)
• P(toothache, catch, cavity, Weather=cloudy) = P(Weather=cloudy) * P(toothache, catch, cavity)
• The 32-cell table factors into a 4-cell table and an 8-cell table

Bayes’ Rule
Useful when you know three things and need to know the fourth
• P(b | a) = P(a | b) P(b) / P(a)

Example
Meningitis
• Doctor knows meningitis causes stiff necks 50% of the time: P(s | m) = 0.5
• Doctor knows unconditional facts
  – The probability of having meningitis is P(m) = 1 / 50,000
  – The probability of having a stiff neck is P(s) = 1 / 20
• The probability of having meningitis given a stiff neck: P(m | s) = P(s | m) P(m) / P(s) = (0.5 * 1/50,000) / (1/20) = 0.0002 = 1 / 5,000

Power of Bayes’ rule
Why not collect more diagnostic evidence?
• Statistically sample to learn P(m | s) = 1 / 5,000
If P(m) changes… due to outbreak…
• Bayes’ computation adjusts automatically, but sampled P(m | s) is rigid

Conditional independence
Consider the infeasibility of full joint distributions
• We must know P(toothache and catch) for all Cavity values
Simplify using independence
• Toothache and catch are not independent
• Toothache and catch are independent given the presence or absence of a cavity

Conditional independence
Toothache and catch are independent given the presence or absence of a cavity
• If you know you have a cavity, there’s no reason to believe the toothache and the dentist’s pick are related
• P(toothache ∧ catch | Cavity) = P(toothache | Cavity) P(catch | Cavity)

Conditional independence
In general, when a single cause influences multiple effects, all of which are conditionally independent (given the cause), the full joint distribution factors as
• P(Cause, Effect_1, …, Effect_n) = P(Cause) * Π_i P(Effect_i | Cause)

Naïve Bayes
Even when “effect” variables are not conditionally independent, this model is sometimes used
• Sometimes called a Bayesian Classifier
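
The meningitis calculation and the normalization step from the dental example can be checked with a short Python sketch. The function names `bayes` and `normalize` are illustrative choices, not from the lecture; the numbers are the ones stated on the slides.

```python
def bayes(p_e_given_h, p_h, p_e):
    """Bayes' rule: P(h | e) = P(e | h) * P(h) / P(e)."""
    return p_e_given_h * p_h / p_e


def normalize(values):
    """Scale non-negative values so they sum to 1 (the alpha step)."""
    total = sum(values)
    return [v / total for v in values]


# Meningitis numbers from the slides.
p_s_given_m = 0.5      # P(stiff neck | meningitis)
p_m = 1 / 50_000       # P(meningitis)
p_s = 1 / 20           # P(stiff neck)
p_m_given_s = bayes(p_s_given_m, p_m, p_s)

# Joint entries from the normalization slide, ordered <cavity, ~cavity>.
with_catch = [0.108, 0.016]     # P(Cavity, toothache, catch)
without_catch = [0.012, 0.064]  # P(Cavity, toothache, ~catch)

# Sum out catch, then normalize to get P(Cavity | toothache).
summed = [a + b for a, b in zip(with_catch, without_catch)]
p_cavity_given_toothache = normalize(summed)

print(p_m_given_s)               # approximately 0.0002, i.e. 1 / 5,000
print(p_cavity_given_toothache)  # approximately [0.6, 0.4]
```

Normalizing avoids computing P(toothache) explicitly: the constant α is simply whatever makes the two entries sum to 1.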