Talk transcript
Analysis of uncertain data: Evaluation of given hypotheses; selection of probes for information gathering
Anatole Gershman, Eugene Fink, Bin Fu, and Jaime G. Carbonell

Example
The analyst has to distinguish between two hypotheses about Brett Favre: Retires (prior 0.4) and Joins Vikings (prior 0.6).

Observations:
- Without the tearful public ceremony that accompanied his retirement announcement from the Green Bay Packers just 11 months ago, quarterback Brett Favre has told the New York Jets he is retiring.
- Minnesota coach Brad Childress was jilted at the altar Tuesday afternoon by Brett Favre, who told him he wasn't going to play for the Vikings in 2009.
- According to many rumors, quarterback Brett Favre has closed on the purchase of a home in Eden Prairie, MN, where the Minnesota Vikings' team facility is located.

Observation distributions: for the first observation ("says retire"), P(says retire | Retires) = 0.9 and P(says retire | Joins Vikings) = 0.6.

Bayesian induction:
P(Retires | says retire)
= P(Retires) · P(says retire | Retires) / (P(Retires) · P(says retire | Retires) + P(Joins Vikings) · P(says retire | Joins Vikings))
= 0.4 · 0.9 / (0.4 · 0.9 + 0.6 · 0.6) = 0.5.

General problem
We have to distinguish among n mutually exclusive hypotheses, denoted H1, H2, …, Hn. For every hypothesis, we know its prior; thus, we have an array of n priors: P(H1), P(H2), …, P(Hn).

We base the analysis on observable features, denoted OBS1, OBS2, …, OBSm. Each observation is a discrete variable that takes one of several possible values. For every observation OBSa, we know the number of its possible values; thus, we have an array num[1..m]. For each hypothesis Hi, we also know the probability distribution over the values of each observation: P(o_{a,j} | Hi) is the probability that OBSa takes its j-th value when Hi holds.

For instance, OBS1 (Favre's public statement) has num[1] = 2 values:
  "I will RETIRE!":  P = 0.9 under Retires, P = 0.6 under Joins Vikings
  "I won't RETIRE!": P = 0.1 under Retires, P = 0.4 under Joins Vikings

We know a specific value of each observation: val[1..m]. We have to evaluate the posterior probabilities of the n given hypotheses, denoted Post(H1), Post(H2), …, Post(Hn); in the example, Post(Retires) = Post(Joins Vikings) = 0.5.

Extension #1
The given hypotheses may not cover all possibilities, so we add a "surprise" hypothesis H0, meaning "something else"; for example, priors 0.6 and 0.35 for the given hypotheses and 0.05 for H0.

After discovering val, the posterior probability of H0 is
Post(H0) = P(H0) · P(val | H0) / P(val) = P(H0) · P(val | H0) / (P(H0) · P(val | H0) + likelihood(val)),
where likelihood(val) = P(H1) · P(val | H1) + … + P(Hn) · P(val | Hn).
Bad news: we do not know P(val | H0). Good news: Post(H0) monotonically depends on P(val | H0); thus, if we obtain lower and upper bounds for P(val | H0), we also get bounds for Post(H0).

Plausibility principle
Unlikely events normally do not happen; thus, if we have observed val, its likelihood must not be too small. Plausibility threshold: we use a global constant plaus, between 0.0 and 1.0, and assume that P(val) ≥ plaus / num, where num is the number of possible values of the observation. This assumption gives bounds for P(val | H0):
Lower: (plaus / num − likelihood(val)) / P(H0). Upper: 1.0.
Substituting these bounds into the dependency of Post(H0) on P(val | H0), we obtain bounds for Post(H0):
Lower: 1.0 − likelihood(val) · num / plaus. Upper: P(H0) / (P(H0) + likelihood(val)).
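A minimal Python sketch of these computations, under my own naming assumptions (the talk gives only the formulas; function names such as posteriors and surprise_bounds are illustrative):

```python
def posteriors(priors, cond_probs):
    """Bayes' rule: priors P(H_i) and conditionals P(val | H_i)
    for the observed value -> posteriors Post(H_i)."""
    joint = [p * c for p, c in zip(priors, cond_probs)]
    total = sum(joint)
    return [j / total for j in joint]

def surprise_bounds(prior_h0, likelihood_val, num_values, plaus):
    """Bounds on Post(H0) for the 'surprise' hypothesis H0.
    likelihood_val = sum_i P(H_i) * P(val | H_i) over the given hypotheses;
    num_values = number of possible values of the observation;
    plaus = global plausibility threshold in (0, 1)."""
    # Lower bound: substitute P(val | H0) >= (plaus/num - likelihood) / P(H0).
    lower = max(0.0, 1.0 - likelihood_val * num_values / plaus)
    # Upper bound: substitute P(val | H0) = 1.0.
    upper = prior_h0 / (prior_h0 + likelihood_val)
    return lower, upper

# Favre example: priors (0.4 Retires, 0.6 Joins Vikings),
# P(says retire | Retires) = 0.9, P(says retire | Joins Vikings) = 0.6.
print(posteriors([0.4, 0.6], [0.9, 0.6]))  # -> [0.5, 0.5]
```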
We have thus derived bounds for the probability that none of the given hypotheses is correct.

Extension #2
Multiple observations: which one(s) to use? Full Bayesian analysis would use their joint distribution, which is difficult to get, and the independence assumption usually does not work. Instead, we identify the highest-utility observation and do not use other observations to corroborate it.

Utility function: given candidate posterior distributions such as (0.5, 0.5), (0.4, 0.6), and (0.35, 0.65), which one is "better"? Candidate utility functions: Shannon's entropy (negated), KL-divergence, or a self-defined function.

Selection of probes for information gathering

Example
Probe: execute an external action and observe its response, to gather more information. For instance, ask Favre directly whether he will retire ("I will RETIRE!"); his answer shifts the probabilities (in the slide, from 0.4 and 0.6 to 0.5 and 0.5).

Probe selection
Probe selection depends on three factors: gain (measured by the utility function), probe cost, and observation probability. For a probe probe_j and an observation obs_a:
single-obs-gain(probe_j, obs_a) = visible[a, j] · (likelihood(1) · probe-gain(1) + … + likelihood(num[a]) · probe-gain(num[a])) + (1.0 − visible[a, j]) · cost[j],
where visible[a, j] is the probability that probe_j makes obs_a visible. The overall gain is the best single-observation gain:
gain(probe_j) = max(single-obs-gain(probe_j, obs_1), …, single-obs-gain(probe_j, obs_m)).
(A runnable sketch of the utility and gain computations appears after the summary.)

Experiment
Task: evaluating hypotheses H1, H2, H3, H4. Three sets of results:
- No probes: accuracy of distinguishing H1 from the other hypotheses.
- Probe selection to distinguish H1 from the other hypotheses.
- Probe selection to distinguish all four hypotheses.

Summary
- Use Bayesian inference to distinguish among mutually exclusive hypotheses, with the H0 "surprise" hypothesis and multiple observations.
- Use probes to gather more information for better analysis, accounting for cost, the utility function, observation probability, and so on.

Thank you
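The sketch promised above, again in Python and under my own naming assumptions: the slides name the quantities (utility function, visible, cost, probe-gain) but give no code, and the convention that cost[j] is a negative penalty is my assumption.

```python
import math

def neg_entropy(dist):
    """Negated Shannon entropy: a more peaked posterior scores higher,
    e.g., (0.35, 0.65) beats (0.5, 0.5)."""
    return sum(p * math.log2(p) for p in dist if p > 0)

def kl_divergence(posterior, prior):
    """KL-divergence of the posterior from the prior: how strongly
    the observation changed our beliefs."""
    return sum(p * math.log2(p / q)
               for p, q in zip(posterior, prior) if p > 0)

def single_obs_gain(visible_aj, likelihoods, probe_gains, cost_j):
    """single-obs-gain(probe_j, obs_a) from the slides:
    visible[a, j] * sum_v likelihood(v) * probe-gain(v)
        + (1 - visible[a, j]) * cost[j].
    likelihoods[v]: probability that obs_a takes value v;
    probe_gains[v]: utility improvement if value v is revealed;
    cost_j: assumed negative (a penalty), so a probe that fails to
    reveal the observation reduces the expected gain."""
    expected = sum(l * g for l, g in zip(likelihoods, probe_gains))
    return visible_aj * expected + (1.0 - visible_aj) * cost_j

def gain(visibles, likelihoods_per_obs, probe_gains_per_obs, cost_j):
    """gain(probe_j) = max over observations of single-obs-gain."""
    return max(single_obs_gain(v, l, g, cost_j)
               for v, l, g in zip(visibles, likelihoods_per_obs,
                                  probe_gains_per_obs))

# Toy usage: one probe, two observations with two values each.
print(gain(visibles=[0.8, 0.5],
           likelihoods_per_obs=[[0.5, 0.5], [0.7, 0.3]],
           probe_gains_per_obs=[[0.2, 0.1], [0.4, 0.0]],
           cost_j=-0.05))  # -> 0.115
```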