HUMANOBS Predictive Heuristics for Decision-Making in Real-World Environments
Helgi Páll Helgason, Kristinn R. Thorisson, Eric Nivel, Pei Wang
Reykjavik University / Icelandic Institute for Intelligent Machines
Temple University, Philadelphia
AGI 2013 - Beijing - August 2013

Problem
> Multi-objective decision making
> Realistic environments: dynamic, stochastic, continuous
> Insufficient knowledge and resources
> Knowledge is grounded in experience and re-evaluated continuously
> Reasoning under uncertainty and in real time

Problem
[Figure: tree of predicted states branching from S0 into S0,0 ... S0,3 over time, with transition likelihoods (0.2, 0.6, 0.1, 0.4)]
> Time is not discrete
> The set of possible actions is not always enumerable
> The set of possible resulting states is not always enumerable

[Figure: the same tree of predicted states, S0 branching into S0,0 ... S0,3 over time]
> Search guided by the predicted value of interesting future states
> Value: relevance to goals
> Set of possible courses of action → ordered set of interesting actions
> Interestingness derived from experience (learning, attention control and self-compilation) and from current activity (goals); real-valued
> Predictors are controlled by success rate and confidence
> A predicted state has a likelihood:
  likelihood(S) = confidence(P) * (SuccessRate(P) - 0.5) + 0.5
> A goal's utility:
  utility(G) = priority(G) * urgency(G)
  where urgency is the time horizon (from now), computed relative to the horizons of all other goals
> A goal's achievement in a state S: 1 if the goal is achieved in S, -1 otherwise
> Expected value of a state S on a predicted path S0 ... S, computed at the time of S0:
  ExpectedValue(S, S0) = (product of the likelihoods of the intermediate states leading from S0 to S) * (sum over all goals of achievement in S * goal utility)
> Use the expected value as the (predictive) heuristic (see the first code sketch below)
> Domain-independence
> Implemented in AERA
> With an additional process of commitment (eliminating redundant goals, resolving conflicts)
> Scheduling of search is driven by the predicted success of goals (learned from experience) in addition to the expected value of predicted future states
> Anytime operation
> Continuous updates of expected values as new goals are produced and new states are predicted → re-scheduling (see the second sketch below)
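The heuristic above composes three plain formulas. Here is a minimal Python sketch of that composition, assuming hypothetical Predictor and Goal records; the class names, fields, and the expected_value helper are illustrative, not AERA's actual interfaces:

```python
from dataclasses import dataclass
from math import prod

@dataclass
class Predictor:
    confidence: float    # confidence(P), in [0, 1]
    success_rate: float  # SuccessRate(P), fraction of past predictions that held

@dataclass
class Goal:
    priority: float
    urgency: float  # time horizon from now, relative to all other goals' horizons

def likelihood(p: Predictor) -> float:
    # likelihood(S) = confidence(P) * (SuccessRate(P) - 0.5) + 0.5
    # A chance-level predictor (success rate 0.5) yields 0.5 regardless of
    # confidence; better-than-chance predictors push the value above 0.5.
    return p.confidence * (p.success_rate - 0.5) + 0.5

def utility(g: Goal) -> float:
    # utility(G) = priority(G) * urgency(G)
    return g.priority * g.urgency

def expected_value(path: list[Predictor], goals: list[tuple[Goal, bool]]) -> float:
    """ExpectedValue(S, S0), computed at the time of S0.

    path:  the predictors producing each intermediate state on S0 ... S
    goals: (goal, achieved-in-S) pairs; achievement counts as +1 or -1
    """
    path_likelihood = prod(likelihood(p) for p in path)
    goal_value = sum((1 if achieved else -1) * utility(g) for g, achieved in goals)
    return path_likelihood * goal_value

# Worked example: a two-step path and two goals, one achieved in S.
ev = expected_value(
    [Predictor(confidence=0.9, success_rate=0.8),   # likelihood 0.77
     Predictor(confidence=0.7, success_rate=0.6)],  # likelihood 0.57
    [(Goal(priority=1.0, urgency=0.5), True),       # contributes +0.50
     (Goal(priority=0.4, urgency=0.9), False)],     # contributes -0.36
)
print(ev)  # 0.77 * 0.57 * (0.50 - 0.36) ≈ 0.0614
```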
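The anytime bullets describe search that is continuously re-ordered as new goals and new predictions change expected values. Below is a small illustrative sketch of such a re-scheduling queue, again hypothetical rather than AERA's scheduler (which additionally weighs the learned success rate of goals); pushing an updated value supersedes a state's earlier entry, which is skipped lazily on pop:

```python
import heapq
import itertools

class AnytimeQueue:
    """Orders predicted states by expected value for best-first expansion."""

    def __init__(self):
        self._heap = []                 # entries: (-ev, tiebreak, state)
        self._latest = {}               # state -> most recent expected value
        self._tie = itertools.count()   # stable tiebreaker for equal values

    def push(self, state, ev):
        # Called when a new state is predicted, or again when new goals
        # change an existing state's expected value (re-scheduling).
        self._latest[state] = ev
        heapq.heappush(self._heap, (-ev, next(self._tie), state))

    def pop_best(self):
        # Return the currently most valuable state, skipping entries whose
        # expected value has since been updated or already consumed.
        while self._heap:
            neg_ev, _, state = heapq.heappop(self._heap)
            if self._latest.get(state) == -neg_ev:
                del self._latest[state]
                return state, -neg_ev
        return None

# Usage: a new goal raises the value of S0,1 after it was first queued.
q = AnytimeQueue()
q.push("S0,1", 0.0614)
q.push("S0,2", 0.0300)
q.push("S0,1", 0.0900)
print(q.pop_best())  # ('S0,1', 0.09); the stale 0.0614 entry is skipped
```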