2.3 Discrete Optimal Planning

This section extends Formulation 2.1 to allow optimal planning problems to be defined. Rather than being satisfied with any sequence of actions that leads to the goal set, suppose we would like a solution that optimizes some criterion, such as time, distance, or energy consumed. Three important extensions will be made: 1) A stage index will be used to conveniently indicate the current plan step; 2) a cost functional will be introduced, which behaves like a taxi meter by indicating how much cost accumulates during the plan execution; and 3) a termination action will be introduced, which intuitively indicates when it is time to stop the plan and fix the total cost.

The presentation involves three phases. First, the problem of finding optimal paths of a fixed length is covered in Section 2.3.1. The approach, called value iteration, involves iteratively computing optimal cost-to-go functions over the state space. Although this case is not very useful by itself, it is much easier to understand than the general case of variable-length plans. Once the concepts from this section are understood, their extension to variable-length plans will be much clearer and is covered in Section 2.3.2. Finally, Section 2.3.3 explains the close relationship between value iteration and Dijkstra's algorithm, which was covered in Section 2.2.1.

With nearly all optimization problems, there is the arbitrary, symmetric choice of whether to define a criterion to minimize or maximize. If the cost is a kind of energy or expense, then minimization seems sensible, as is typical in robotics and control theory. If the cost is a kind of reward, as in investment planning or in most AI books, then maximization is preferred. Although this issue remains throughout the book, we will choose to minimize everything. If maximization is instead preferred, then multiplying the costs by

and swapping minimizations with maximizations should suffice.

The fixed-length optimal planning formulation will be given shortly, but first we introduce some new notation. Let $\pi _K$ denote a -step plan, which is a sequence

, $\ldots$ ,

actions. If $\pi _K$ and ${x_{I}}$ are given, then a sequence of states,

, $\ldots$ , $x_{K+1})$ , can be derived using the state transition function,

. Initially, $x_1 = {x_{I}}$ , and each subsequent state is obtained by $x_{k+1} = f(x_k,u_k)$ .

The model is now given; the most important addition with respect to Formulation 2.1 is

, the cost functional.

Formulation 2..2 (Discrete Fixed-Length Optimal Planning)

All of the components from Formulation 2.1 are inherited directly: , , , ${x_{I}}$ , and ${X_{G}}$ , except here it is assumed that is finite (some algorithms may easily extend to the case in which is countably infinite, but this will not be considered here).
A number, , of stages, which is the exact length of a plan (measured as the number of actions, , , $\ldots$ , ). States may also obtain a stage index. For example, $x_{k+1}$ denotes the state obtained after is applied.
Let denote a stage-additive cost (or loss) functional, which is applied to a -step plan, $\pi _K$ . This means that the sequence $(u_1,\ldots,u_K)$ of actions and the sequence $(x_1,\ldots,x_{K+1})$ of states may appear in an expression of . For convenience, let denote the final stage, (the application of advances the stage to ). The cost functional is

$\displaystyle L(\pi _K) = \sum_{k=1}^K l(x_k,u_k) + l_F(x_F) .$ (2.4)

The cost term yields a real value for every $x_k \in X$ and $u_k \in U(x_k)$ . The final term is outside of the sum and is defined as if $x_F \in {X_{G}}$ , and $l_F(x_F) = \infty$ otherwise.

An important comment must be made regarding

. Including

in (2.4) is actually unnecessary if it is agreed in advance that

will only be applied to evaluate plans that reach ${X_{G}}$ . It would then be undefined for all other plans. The algorithms to be presented shortly will also function nicely under this assumption; however, the notation and explanation can become more cumbersome because the action space must always be restricted to ensure that successful plans are produced. Instead of this, the domain of

is extended to include all plans, and those that do not reach ${X_{G}}$ are penalized with infinite cost so that they are eliminated automatically in any optimization steps. At some point, the role of

may become confusing, and it is helpful to remember that it is just a trick to convert feasibility constraints into a straightforward optimization ( $L(\pi _K) = \infty$ means not feasible and $L(\pi _K) < \infty$ means feasible with cost $L(\pi _K)$ ).

Now the task is to find a plan that minimizes

. To obtain a feasible planning problem like Formulation 2.1 but restricted to

-step plans, let $l(x,u) \equiv 0$ . To obtain a planning problem that requires minimizing the number of stages, let $l(x,u) \equiv 1$ . The possibility also exists of having goals that are less ``crisp'' by letting

vary for different $x \in {X_{G}}$ , as opposed to

. This is much more general than what was allowed with feasible planning because now states may take on any value, as opposed to being classified as inside or outside of ${X_{G}}$ .