HUMANOBS
Predictive Heuristics for Decision-Making in Real-World Environments
Helgi Páll Helgason, Kristinn R. Thorisson, Eric Nivel, Pei Wang
Reykjavik University / Icelandic Institute for Intelligent Machines
Temple University, Philadelphia
AGI 2013 – Beijing – August 2013
Problem
> Multi-objective decision making
> Realistic environments: dynamic, stochastic, continuous
> Insufficient knowledge and resources
> Knowledge is grounded in experience and re-evaluated continuously
> Reasoning under uncertainty and in real time
Problem
[Figure: decision tree in which state S0 branches to successor states S0,0, S0,1, S0,2 and S0,3 with transition probabilities 0.2, 0.6, 0.1 and 0.4 along a time axis]
> Time is not discrete
> Set of possible actions not always enumerable
> Set of possible resulting states not always enumerable
[Figure: the same states S0 and S0,0–S0,3 on a continuous time axis, without enumerated transitions or probabilities]
> Search guided by the predicted value of interesting future states
> Value: relevancy to goals
> Set of possible courses of action → ordered set of interesting actions
> Interestingness derived from experience (learning, attention control and self-compilation) and from current activity (goals); real-valued (see the sketch below)
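A minimal Python sketch of this ordering step, assuming a multiplicative combination of the two components; all names (Action, experience_score, goal_relevance) are illustrative, since the slides only state that interestingness is real-valued and derived from experience and current goals:

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    experience_score: float  # learned from past experience (assumed representation)
    goal_relevance: float    # relevance to currently active goals (assumed representation)

def interestingness(a: Action) -> float:
    # Assumption: combine the learned and goal-driven components multiplicatively.
    return a.experience_score * a.goal_relevance

def order_actions(candidates: list[Action]) -> list[Action]:
    # Set of possible courses of action -> ordered list, most interesting first.
    return sorted(candidates, key=interestingness, reverse=True)
```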
> Predictors are controlled by success rate and confidence
> Predicted state has a likelihood:
likelihood(S) = confidence(P) * (successRate(P) - 0.5) + 0.5
> Goal’s utility:
utility(G) = priority(G) * urgency(G), where urgency is the time horizon (from now), computed relative to the horizons of all other goals
> Goal’s achievement: 1 if achieved in S, -1 otherwise
> Expected value for a state S derived from S0 → ... → S, computed at the time of S0 (see the sketch below):
ExpectedValue(S, S0) = (product of the likelihoods of all intermediate states leading from S0 to S) * (sum over all goals of achievement in S * goal utility)
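The three formulas compose directly; the following self-contained Python sketch implements them as written on the slides (function and parameter names are ours, and urgency is assumed to arrive pre-normalized against the horizons of the other goals):

```python
import math

def likelihood(confidence: float, success_rate: float) -> float:
    # likelihood(S) = confidence(P) * (successRate(P) - 0.5) + 0.5
    # A chance-level predictor (successRate = 0.5) yields 0.5 regardless of
    # confidence; a fully confident predictor passes its successRate through.
    return confidence * (success_rate - 0.5) + 0.5

def utility(priority: float, urgency: float) -> float:
    # utility(G) = priority(G) * urgency(G)
    return priority * urgency

def expected_value(path_likelihoods: list[float],
                   goals: list[tuple[bool, float]]) -> float:
    # ExpectedValue(S, S0): product of the likelihoods of the intermediate
    # states on the path S0 -> ... -> S, times the sum over goals of
    # achievement (+1 if achieved in S, -1 otherwise) * utility.
    path = math.prod(path_likelihoods)
    payoff = sum((1.0 if achieved else -1.0) * u for achieved, u in goals)
    return path * payoff

# Illustrative values: two intermediate predictions on the path to S, and two
# goals with utilities 0.8 and 0.3 of which only the first is achieved in S.
ev = expected_value([likelihood(0.9, 0.8), likelihood(0.7, 0.9)],
                    [(True, 0.8), (False, 0.3)])  # 0.77 * 0.78 * 0.5 ≈ 0.30
```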
> Use the expected value as the (predictive) heuristic
> Domain-independence
> Implemented in AERA
> With an additional process of commitment (eliminating redundant goals, resolving conflicts)
> Scheduling of search driven by the predicted success of goals (learned from experience), in addition to the expected value of predicted future states
> Anytime operation
> Continuous updates of expected values as new goals are produced and new states are predicted → re-scheduling (see the sketch below)
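One plausible way to realize this continuous re-scheduling is lazy re-keying in a priority queue, sketched below in Python; this is an assumption about the mechanism, not AERA's actual implementation. Each search job is pushed with its latest expected value, stale entries are discarded on pop, and because the best current candidate is always processed next, the loop can be interrupted at any time (anytime operation):

```python
import heapq
import itertools

class AnytimeScheduler:
    """Priority queue with lazy re-scheduling (illustrative sketch)."""

    def __init__(self):
        self._heap = []        # entries: (-expected_value, tie_breaker, job_id)
        self._tie = itertools.count()
        self._latest = {}      # job_id -> most recently computed expected value

    def update(self, job_id, expected_value: float):
        # Called when a new goal is produced or a new state is predicted:
        # push a fresh entry rather than searching the heap for the old one.
        self._latest[job_id] = expected_value
        heapq.heappush(self._heap, (-expected_value, next(self._tie), job_id))

    def next_job(self):
        # Pop until an entry whose value is still current is found.
        while self._heap:
            neg_value, _, job_id = heapq.heappop(self._heap)
            if self._latest.get(job_id) == -neg_value:
                return job_id
        return None  # nothing scheduled
```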