### 2.3.1.1 Backward value iteration

As for the search methods, there are both forward and backward versions of the approach. The backward case will be covered first. Even though it may appear superficially to be easier to progress from $x_I$, it turns out that progressing backward from $X_G$ is notationally simpler. The forward case will then be covered once some additional notation is introduced.

The key to deriving long optimal plans from shorter ones lies in the construction of optimal cost-to-go functions over $X$. For $k$ from $1$ to $F$, let $G^*_k$ denote the cost that accumulates from stage $k$ to $F$ under the execution of the optimal plan:

$$G^*_k(x_k) = \min_{u_k,\ldots,u_K} \left\{ \sum_{i=k}^{K} l(x_i,u_i) + l_F(x_F) \right\} \tag{2.5}$$

Inside of the $\min$ of (2.5) are the last $F-k+1$ terms of the cost functional, (2.4). The optimal cost-to-go for the boundary condition of $k = F$ reduces to

$$G^*_F(x_F) = l_F(x_F). \tag{2.6}$$

This makes intuitive sense: Since there are no stages in which an action can be applied, the final stage cost is immediately received.

Now consider an algorithm that makes $K+1$ passes over $X$, each time computing $G^*_k$ from $G^*_{k+1}$, as $k$ ranges from $F$ down to $1$. In the first iteration, $G^*_F$ is copied from $l_F$ without significant effort. In the second iteration, $G^*_K$ is computed for each $x_K \in X$ as

$$G^*_K(x_K) = \min_{u_K} \Big\{ l(x_K,u_K) + l_F(x_F) \Big\}. \tag{2.7}$$

Since $l_F = G^*_F$ and $x_F = f(x_K,u_K)$, substitutions can be made into (2.7) to obtain

$$G^*_K(x_K) = \min_{u_K} \Big\{ l(x_K,u_K) + G^*_F(f(x_K,u_K)) \Big\}, \tag{2.8}$$

which is straightforward to compute for each $x_K \in X$. This computes the costs of all optimal one-step plans from stage $K$ to stage $F = K+1$.

It will be shown next that $G^*_k$ can be computed similarly once $G^*_{k+1}$ is given. Carefully study (2.5) and note that it can be written as

$$G^*_k(x_k) = \min_{u_k} \left\{ \min_{u_{k+1},\ldots,u_K} \left\{ l(x_k,u_k) + \sum_{i=k+1}^{K} l(x_i,u_i) + l_F(x_F) \right\} \right\} \tag{2.9}$$

by pulling the first term out of the sum and by separating the minimization over $u_k$ from the rest, which range from $u_{k+1}$ to $u_K$. The second $\min$ does not affect the $l(x_k,u_k)$ term; thus, $l(x_k,u_k)$ can be pulled outside to obtain

$$G^*_k(x_k) = \min_{u_k} \left\{ l(x_k,u_k) + \min_{u_{k+1},\ldots,u_K} \left\{ \sum_{i=k+1}^{K} l(x_i,u_i) + l_F(x_F) \right\} \right\}. \tag{2.10}$$

The inner $\min$ is exactly the definition of the optimal cost-to-go function $G^*_{k+1}$. Upon substitution, this yields the recurrence

$$G^*_k(x_k) = \min_{u_k} \Big\{ l(x_k,u_k) + G^*_{k+1}(x_{k+1}) \Big\}, \tag{2.11}$$

in which $x_{k+1} = f(x_k,u_k)$. Now that the right side of (2.11) depends only on $x_k$, $u_k$, and $G^*_{k+1}$, the computation of $G^*_k$ easily proceeds in $O(|X||U|)$ time. This computation is called a value iteration. Note that in each value iteration, some states receive an infinite value only because they are not reachable; an $(F-k)$-step plan from $x_k$ to $X_G$ does not exist. This means that there are no actions, $u_k \in U(x_k)$, that bring $x_k$ to a state $x_{k+1} \in X$ from which an $(F-k-1)$-step plan exists that terminates in $X_G$.
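For readers who prefer code, the recurrence (2.11) can be sketched in a few lines. The dictionary encoding of the transition function and stage cost below is an illustrative assumption, not notation from this chapter: each state maps to its list of (cost, successor) pairs, one per applicable action.

```python
import math

def value_iteration_step(G_next, edges):
    """One backward value iteration: compute G*_k from G*_{k+1}.

    edges maps each state x to a list of (cost, x_next) pairs, one
    per action u in U(x), with cost = l(x, u) and x_next = f(x, u).
    Unreachable states receive infinity via the default argument.
    """
    G = {}
    for x, outgoing in edges.items():
        # min over u_k of l(x_k, u_k) + G*_{k+1}(f(x_k, u_k)), as in (2.11)
        G[x] = min((cost + G_next[x_next] for cost, x_next in outgoing),
                   default=math.inf)
    return G
```

Each call touches every state-action pair once, which is the source of the per-iteration running time.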

Summarizing, the value iterations proceed as follows:

$$G^*_F \;\rightarrow\; G^*_K \;\rightarrow\; G^*_{K-1} \;\cdots\; G^*_k \;\rightarrow\; G^*_{k-1} \;\cdots\; G^*_2 \;\rightarrow\; G^*_1 \tag{2.12}$$

until finally $G^*_1$ is determined after $O(K|X||U|)$ time. The resulting $G^*_1$ may be applied to yield $G^*_1(x_I)$, the optimal cost to go to the goal from $x_I$. It also conveniently gives the optimal cost-to-go from any other initial state. This cost is infinity for states from which $X_G$ cannot be reached in $K$ stages.
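The full sequence of iterations in (2.12) can be sketched as follows, under the same illustrative encoding as a dictionary of (cost, successor) pairs: initialize with the final cost-to-go (zero on goal states, infinity elsewhere, matching (2.6) for a goal-reaching formulation) and sweep backward $K$ times.

```python
import math

def backward_value_iteration(states, edges, goal, K):
    """Run K backward value iterations; return G*_1 over all states.

    edges[x] is a list of (cost, x_next) pairs; the final stage cost
    is taken to be 0 on goal states and infinity elsewhere.
    """
    # G*_F = l_F, the boundary condition of (2.6)
    G = {x: (0 if x in goal else math.inf) for x in states}
    for _ in range(K):  # G*_F -> G*_K -> ... -> G*_1, as in (2.12)
        G = {x: min((c + G[xn] for c, xn in edges.get(x, [])),
                    default=math.inf)
             for x in states}
    return G
```

Each of the $K$ sweeps inspects every state-action pair once, so the total work agrees with the overall running time stated above.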

It seems convenient that the cost of the optimal plan can be computed so easily, but how is the actual plan extracted? One possibility is to store the action that satisfied the $\min$ in (2.11) from every state, and at every stage. Unfortunately, this requires $O(K|X|)$ storage, but it can be reduced to $O(|X|)$ using the tricks to come in Section 2.3.2 for the more general case of variable-length plans.
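A hedged sketch of this bookkeeping, again using the illustrative dictionary encoding: alongside each value iteration, record the minimizing successor for every state and stage, giving a table of the size discussed above, and then follow the stored choices forward from the initial state.

```python
import math

def backward_vi_with_plan(states, edges, goal, K):
    """Value iteration that also records the argmin per stage and state.

    edges[x] is a list of (cost, x_next) pairs. Returns (G*_1, policy),
    where policy[k][x] is the successor chosen at stage k (or None).
    """
    G = {x: (0 if x in goal else math.inf) for x in states}
    policy = [None] * (K + 1)  # policy[1..K]: one table per stage
    for k in range(K, 0, -1):
        G_new, choice = {}, {}
        for x in states:
            best, best_next = math.inf, None
            for cost, x_next in edges.get(x, []):
                if cost + G[x_next] < best:  # argmin of (2.11)
                    best, best_next = cost + G[x_next], x_next
            G_new[x], choice[x] = best, best_next
        G, policy[k] = G_new, choice
    return G, policy

def extract_plan(policy, x_start, K):
    """Follow the stored actions from stage 1 through stage K."""
    path, x = [x_start], x_start
    for k in range(1, K + 1):
        x = policy[k][x]
        path.append(x)
    return path
```

Keeping one table per stage is exactly the straightforward, storage-heavy option described above; the reduction to a single table is deferred to Section 2.3.2.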

Example 2.3 (A Five-State Optimal Planning Problem)   Figure 2.8 shows a graph representation of a planning problem in which $X = \{a,b,c,d,e\}$. Suppose that $K = 4$, $x_I = a$, and $X_G = \{d\}$. There will hence be four value iterations, which construct $G^*_4$, $G^*_3$, $G^*_2$, and $G^*_1$, once the final-stage cost-to-go, $G^*_5$, is given.

The cost-to-go functions are shown in Figure 2.9. Figures 2.10 and 2.11 illustrate the computations. For computing $G^*_4$, only $b$ and $c$ receive finite values because only they can reach $d$ in one stage. For computing $G^*_3$, only the values $G^*_4(b) = 4$ and $G^*_4(c) = 1$ are important. Only paths that reach $b$ or $c$ can possibly lead to $d$ in stage $k = 5$. Note that the minimization in (2.11) always chooses the action that produces the lowest total cost when arriving at a vertex in the next stage.
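The first iteration of this example can be reproduced mechanically on a toy five-state graph. The edge set below is illustrative and is not claimed to match Figure 2.8 exactly; it is only constructed so that $b$ and $c$ are the states with edges into the goal $d$.

```python
import math

# Toy five-state graph (illustrative; not the exact edge set of Figure 2.8).
# Each entry is a list of (cost, successor) pairs.
edges = {
    'a': [(2, 'a'), (2, 'b')],
    'b': [(4, 'd'), (1, 'c')],
    'c': [(1, 'd')],
    'd': [],
    'e': [],
}

# Final-stage cost-to-go for X_G = {d}: zero at d, infinity elsewhere.
G5 = {x: (0 if x == 'd' else math.inf) for x in edges}

# One backward step of (2.11): compute G*_4 from G*_5.
G4 = {x: min((c + G5[xn] for c, xn in outs), default=math.inf)
      for x, outs in edges.items()}
```

As in the example, only the states with one-step paths to $d$ receive finite values in `G4`; all others remain at infinity after this first iteration.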

Steven M LaValle 2012-04-20