Further Reading

Since this chapter considers sequential versions of single-stage decision problems, the suggested reading at the end of Chapter 9 is also relevant here. The probabilistic formulation in Section 10.1 is a basic problem of stochastic control theory [95,564]. The framework is also popular in artificial intelligence [79,267,471,839]. For an early, influential work on stochastic control, see [109], in which the notion of sequential games against nature is developed. The forward projection and backprojection topics are not as common in control theory and are instead inspired by [281,313,659]. The nondeterministic formulation is obtained by eliminating probabilities; worst-case analysis also appears extensively in control theory [57,58,301]. A case for using randomized strategies in robotics is made in [314].

Section 10.2 is based on classical dynamic programming work, but with emphasis on the stochastic shortest-path problem. For more reading on value and policy iteration in this context, see [95]. Section 10.2.3 is based on extending Dijkstra's algorithm. For convergence issues due to approximations of continuous problems, see [92,567,720]. For complexity results for games against nature, see [764,767].

Section 10.3 was inspired by coverage in [95]. For further reading on reinforcement learning, the subject of Section 10.4, see [19,74,97,895].

Section 10.5 is based on material in [59], but with an emphasis on unifying concepts from previous sections. Also contained in [59] are sequential game formulations over continuous spaces and even in continuous time. In continuous time, these are called differential games, which are introduced in Section 13.5. Dynamic programming principles extend nicely to game theory. Furthermore, they extend to Pareto optimality [242].

The main purpose of Section 10.6 is to return to motion planning by considering continuous state spaces. Few works exist on combining stochastic optimal control with motion planning. The presented material is based mainly on [599,605,613,867,868].
