AI Planning for Autonomy

Search algorithms


Exploration vs Exploitation

Tao Lu

Iterative Deepening
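A minimal sketch of iterative deepening search (names and helpers here are illustrative, not from the notes): run depth-limited DFS with an increasing depth bound, which combines DFS's memory usage with BFS's shallowest-solution guarantee.

```python
def dls(state, goal_test, successors, limit, path):
    """Depth-limited DFS: return the action path to a goal, or None."""
    if goal_test(state):
        return path
    if limit == 0:
        return None
    for action, nxt in successors(state):
        found = dls(nxt, goal_test, successors, limit - 1, path + [action])
        if found is not None:
            return found
    return None

def iterative_deepening(state, goal_test, successors, max_depth=50):
    """Re-run depth-limited search with bound 0, 1, 2, ... until a goal is found."""
    for limit in range(max_depth + 1):
        plan = dls(state, goal_test, successors, limit, [])
        if plan is not None:
            return plan
    return None
```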


STRIPS Modeling

$P = \langle F, O, I, G \rangle$, where $F$ is the set of fluents, $O$ the set of operators (each with preconditions, add list, and delete list), $I$ the initial state, and $G$ the goal condition.
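A hypothetical minimal encoding of a STRIPS problem $P = \langle F, O, I, G \rangle$ in Python (the class and function names are my own, not from the notes): states are sets of fluents, and applying an operator removes its delete list and adds its add list.

```python
from typing import NamedTuple

class Operator(NamedTuple):
    name: str
    pre: frozenset     # preconditions: fluents that must hold
    add: frozenset     # add list: fluents made true
    delete: frozenset  # delete list: fluents made false

class Problem(NamedTuple):
    fluents: frozenset   # F
    operators: tuple     # O
    init: frozenset      # I
    goal: frozenset      # G

def applicable(state, op):
    """An operator is applicable if its preconditions hold in the state."""
    return op.pre <= state

def apply_op(state, op):
    """Successor state: remove the delete list, then add the add list."""
    return (state - op.delete) | op.add
```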


Best support


$IW(k)$ performs breadth-first search, pruning newly generated states whose novelty is greater than $k$. Iteratively increase $k$ until the problem is solved or a limit is reached.
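The procedure above can be sketched as follows (a simplified illustration, assuming states are frozensets of atoms; the helper names are mine): a state is novel iff it contains at least one tuple of at most $k$ atoms never seen before, and non-novel states are pruned.

```python
from collections import deque
from itertools import combinations

def iw_k(initial, goal_test, successors, k):
    """BFS that prunes states with novelty > k."""
    seen = set()  # atom tuples (size <= k) observed so far

    def novel(state):
        new = False
        for size in range(1, k + 1):
            for t in combinations(sorted(state), size):
                if t not in seen:
                    seen.add(t)
                    new = True
        return new

    frontier = deque([(initial, [])])
    novel(initial)
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        for action, nxt in successors(state):
            if novel(nxt):  # prune states whose novelty exceeds k
                frontier.append((nxt, path + [action]))
    return None

def iw(initial, goal_test, successors, max_k=2):
    """Iterated Width: run IW(k) for k = 1, 2, ... until solved or limit hit."""
    for k in range(1, max_k + 1):
        plan = iw_k(initial, goal_test, successors, k)
        if plan is not None:
            return plan
    return None
```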


When the MDP is known, we can use value iteration or policy iteration to compute the optimal policy.

Value Iteration
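A minimal sketch of value iteration, assuming the MDP is given as callables of my own choosing: `transition(s, a)` returns `(prob, next_state)` pairs and `reward(s, a, s2)` returns a float. Bellman backups are repeated until the value function stops changing, then the greedy policy is extracted.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, eps=1e-6):
    """Return (V, policy) for a known MDP via repeated Bellman backups."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if not actions(s):       # terminal state: V stays 0
                continue
            best = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2])
                    for p, s2 in transition(s, a))
                for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract the greedy policy with respect to the converged V.
    policy = {
        s: max(actions(s),
               key=lambda a: sum(p * (reward(s, a, s2) + gamma * V[s2])
                                 for p, s2 in transition(s, a)))
        for s in states if actions(s)}
    return V, policy
```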

Policy Iteration
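A matching sketch of policy iteration, under the same assumed MDP interface as above: alternate full policy evaluation (Bellman backups for the fixed policy) with greedy policy improvement, stopping when the policy no longer changes.

```python
def policy_iteration(states, actions, transition, reward, gamma=0.9, eps=1e-6):
    """Return (V, policy) by alternating evaluation and greedy improvement."""
    policy = {s: actions(s)[0] for s in states if actions(s)}
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: backups for the fixed policy until convergence.
        while True:
            delta = 0.0
            for s, a in policy.items():
                v = sum(p * (reward(s, a, s2) + gamma * V[s2])
                        for p, s2 in transition(s, a))
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < eps:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in policy:
            best = max(actions(s),
                       key=lambda a: sum(p * (reward(s, a, s2) + gamma * V[s2])
                                         for p, s2 in transition(s, a)))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return V, policy
```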

Reinforcement Learning

Reinforcement learning is used when the model of the problem is unknown to us.

Q-Learning differs from SARSA in that its update assumes the optimal action is taken at the next state, which is not necessarily true: the agent may need to explore, so a non-optimal action may be taken. SARSA instead uses the Q value of the action actually taken at the next step to update the current Q, so if the agent takes a non-optimal action at $s'$, this affects the Q value update at the current state. Therefore, when $\epsilon$-greedy is used, SARSA tends to learn a more conservative policy rather than an aggressive one.
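The two update rules above differ only in the bootstrap target, which a short sketch makes concrete (function names and the tabular `Q` dict are my own illustration):

```python
import random

def epsilon_greedy(Q, state, actions, eps=0.1):
    """Pick a random action with probability eps, else the greedy one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """Off-policy: bootstrap with max_a' Q(s', a'), i.e. assume the
    next action is optimal even if the agent actually explores."""
    target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.9):
    """On-policy: bootstrap with Q(s', a') for the action a' actually
    chosen (possibly exploratory), hence the more conservative policies."""
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```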