AI Planning for Autonomy

Search algorithms

Heuristic

Exploration vs Exploitation

Tao Lu

Iterative Deepening

A*

STRIPS Modeling

$P = \langle F, O, I, G \rangle$: fluents $F$, operators $O$, initial state $I$, goal $G$
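A minimal sketch of the tuple $\langle F, O, I, G \rangle$ in code (the one-operator "move" domain and all names are illustrative, not from the notes):

```python
from typing import NamedTuple

class Operator(NamedTuple):
    name: str
    pre: frozenset     # preconditions: fluents that must hold
    add: frozenset     # add list: fluents made true
    delete: frozenset  # delete list: fluents made false

# Hypothetical domain with two location fluents
F = frozenset({'at(A)', 'at(B)'})
O = [Operator('move(A,B)',
              pre=frozenset({'at(A)'}),
              add=frozenset({'at(B)'}),
              delete=frozenset({'at(A)'}))]
I = frozenset({'at(A)'})  # initial state: set of true fluents
G = frozenset({'at(B)'})  # goal: fluents that must be true

def applicable(op, state):
    return op.pre <= state  # all preconditions hold in state

def progress(op, state):
    # STRIPS transition: remove delete effects, then add the add effects
    return (state - op.delete) | op.add
```

Here `applicable(O[0], I)` holds and `progress(O[0], I)` satisfies `G`.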

Bellman-Ford

Best support

IW

$IW(k)$ performs breadth-first search, pruning newly generated states whose novelty is greater than $k$. Iteratively increase $k$ until the problem is solved or a limit is reached.
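A sketch of this idea, assuming states are sets of atoms and defining a state's novelty as the size of the smallest tuple of its atoms never seen before in the search (the toy line-walk problem in the usage note is illustrative):

```python
from collections import deque
from itertools import combinations

def iw_search(initial, goal_test, successors, k):
    """IW(k): breadth-first search that prunes any newly generated
    state whose novelty exceeds k."""
    seen = set()  # all atom tuples of size <= k seen so far

    def novelty(state):
        for size in range(1, k + 1):
            if any(t not in seen for t in combinations(sorted(state), size)):
                return size
        return k + 1  # no unseen tuple of size <= k

    def register(state):
        for size in range(1, k + 1):
            seen.update(combinations(sorted(state), size))

    frontier = deque([initial])
    register(initial)
    while frontier:
        state = frontier.popleft()
        if goal_test(state):
            return state
        for succ in successors(state):
            if novelty(succ) <= k:  # prune states with novelty > k
                register(succ)
                frontier.append(succ)
    return None  # pruned search space exhausted

def iw(initial, goal_test, successors, max_k):
    # iteratively increase k until the problem is solved or the limit is hit
    for k in range(1, max_k + 1):
        result = iw_search(initial, goal_test, successors, k)
        if result is not None:
            return result
    return None
```

For example, with states like `frozenset({('at', i)})` on a line of positions 0..5, `iw` finds the goal position with $k = 1$, since every new position introduces an unseen atom.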

MDP

When the MDP is known, we can use value iteration or policy iteration to compute the optimal policy.

Value Iteration
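A minimal sketch of value iteration, assuming a tabular MDP given as transition and reward dictionaries (the representation `T[(s, a)]` / `R[(s, a)]` is my own choice, not from the notes):

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """T[(s, a)] is a list of (prob, next_state) pairs;
    R[(s, a)] is the expected reward. Returns the optimal values V*."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup
            best = max(R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in T[(s, a)])
                       for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:  # values have converged
            return V
```

The optimal policy is then extracted by acting greedily with respect to `V`.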

Policy Iteration
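A sketch of policy iteration under the same tabular-MDP representation as above (fixed-sweep evaluation is a simplification; exact evaluation would solve a linear system):

```python
def policy_iteration(states, actions, T, R, gamma=0.9, sweeps=200):
    """Alternate policy evaluation and greedy policy improvement
    until the policy no longer changes."""
    def q(s, a, V):
        return R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in T[(s, a)])

    policy = {s: actions(s)[0] for s in states}  # arbitrary initial policy
    while True:
        # evaluation: iterative sweeps under the fixed policy
        V = {s: 0.0 for s in states}
        for _ in range(sweeps):
            for s in states:
                V[s] = q(s, policy[s], V)
        # improvement: act greedily with respect to V
        stable = True
        for s in states:
            best = max(actions(s), key=lambda a: q(s, a, V))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V
```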

Reinforcement Learning

Reinforcement learning is used when the model of the problem (the transition and reward functions) is unknown to us.

Q-Learning differs from SARSA in that its update assumes the optimal action is taken at the next state, which is not necessarily true: the agent may need to explore, so a non-optimal action can be taken. SARSA instead uses the Q-value of the action actually taken at the next step to update the current Q-value, so if the agent takes a non-optimal action at $s'$, that choice affects the update at the current state. Therefore, if $\epsilon$-greedy exploration is used, SARSA tends to learn a more conservative policy rather than an aggressive one.
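The difference is visible directly in the two update rules. A sketch with a tabular Q-function (state/action names are illustrative):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    # off-policy: backs up the best action at s',
    # regardless of which action the agent will actually take
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.9):
    # on-policy: backs up the action a' actually chosen at s'
    # (e.g. by eps-greedy), so exploratory actions feed back into Q
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

If the exploratory action at $s'$ has a low Q-value, SARSA's target is pulled down while Q-Learning's is not, which is why SARSA ends up more conservative near risky states.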