The cost of a feedback plan

Consider the cost-to-go of executing a plan $ \pi $ from a state $ x_1 \in X$. The resulting cost depends on the sequence of states that is visited, the actions that are applied by the plan, and the nature actions that are applied. In Chapter 2 this was obtained by adding the cost terms, but now there is a dependency on nature. Both worst-case and expected-case analyses are possible, which generalize the treatment of Section 9.2 to state spaces and multiple stages.

Let $ {\cal H}(\pi ,x_1)$ denote the set of state-action-nature histories that could arise from $ \pi $ when applied using $ x_1$ as the initial state. The cost-to-go, $ {G_\pi }(x_1)$, under a given plan $ \pi $ from $ x_1$ can be measured using worst-case analysis as

$\displaystyle {G_\pi }({x}_1) = \max_{({\tilde{x}},{\tilde{u}},{\tilde{\theta}}) \in {\cal H}(\pi ,x_1)} \Big\{ L({\tilde{x}},{\tilde{u}},{\tilde{\theta}}) \Big\} ,$ (10.31)

which is the maximum cost over all possible trajectories from $ x_1$ under the plan $ \pi $. If any of these fail to terminate in the goal, then the cost becomes infinity. In (10.31), $ {\tilde{x}}$, $ {\tilde{u}}$, and $ {\tilde{\theta}}$ are infinite histories, although their influence on the cost is expected to terminate early due to the application of $ u_T$.
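
To make (10.31) concrete, here is a minimal sketch of worst-case plan evaluation on a small finite problem. The names `f`, `l`, `Theta`, `goal`, and `plan` are hypothetical problem data, not defined in the text: `f(x, u, th)` is the state transition, `l(x, u, th)` the per-stage cost, `Theta(x, u)` the set of nature actions, and `plan` maps each nongoal state to an action (the termination action $ u_T$ is implicit at the goal).

```python
from math import inf

def worst_case_cost(x, plan, f, l, Theta, goal, on_path=frozenset()):
    """Maximum cost over all nature histories from x under the plan (10.31)."""
    if x in goal:
        return 0.0      # u_T is applied; no further cost accrues
    if x in on_path:
        return inf      # a reachable cycle: some history never terminates
    u = plan[x]
    on_path = on_path | {x}
    # Nature picks the worst theta at every stage.
    return max(l(x, u, th)
               + worst_case_cost(f(x, u, th), plan, f, l, Theta, goal, on_path)
               for th in Theta(x, u))
```

Because nature may repeat its choices, any state revisited under $ \pi $ yields a history that never terminates, so a reachable cycle is assigned infinite cost, matching the convention above.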

An optimal plan using worst-case analysis is any plan for which $ {G_\pi }(x_1)$ is minimized over all possible plans (all ways to assign actions to the states). Sometimes the task may be only to find a feasible plan, which means that all trajectories must reach the goal, but the cost does not need to be optimized; in this case, there are usually numerous equivalent alternatives.
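
Under standard dynamic programming assumptions, such a worst-case optimal plan can be computed by a minimax analogue of value iteration, $ G^*(x) = \min_u \max_\theta \big[ l(x,u,\theta) + G^*(f(x,u,\theta)) \big]$. The sketch below assumes finite, hypothetical problem data `X`, `U(x)`, `Theta(x, u)`, `f`, `l`, and `goal`; none of these names come from the text.

```python
from math import inf

def minimax_value_iteration(X, U, Theta, f, l, goal, iters=100):
    # G starts at 0 on the goal and infinity elsewhere.
    G = {x: 0.0 if x in goal else inf for x in X}
    for _ in range(iters):
        for x in X:
            if x in goal:
                continue
            # Best action against the worst nature action.
            G[x] = min(max(l(x, u, th) + G[f(x, u, th)]
                           for th in Theta(x, u))
                       for u in U(x))
    # Any action attaining the minimum defines an optimal plan.
    plan = {x: min(U(x), key=lambda u: max(l(x, u, th) + G[f(x, u, th)]
                                           for th in Theta(x, u)))
            for x in X if x not in goal}
    return G, plan
```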

Using probabilistic uncertainty, the cost of a plan can be measured using expected-case analysis as

$\displaystyle G_\pi (x_1) = E_{{\cal H}(\pi ,x_1)} \Big[ L({\tilde{x}},{\tilde{u}},{\tilde{\theta}}) \Big],$ (10.32)

in which $ E$ denotes the mathematical expectation taken over $ {\cal H}(\pi ,x_1)$ (i.e., the plan is evaluated as a weighted sum in which each term weights the cost $ L({\tilde{x}},{\tilde{u}},{\tilde{\theta}})$ of a state-action-nature history by the probability of that history). This can also be interpreted as the expected cost over trajectories from $ x_1$. If any of these have nonzero probability and fail to terminate in the goal, then $ G_\pi (x_1) = \infty$. In the probabilistic setting, the task is usually to find a plan that minimizes $ G_\pi (x_1)$.
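
Since each stage contributes one nature action, (10.32) can be evaluated for a fixed plan by iterating the one-stage expectation, in the style of policy evaluation for Markov decision processes. In this sketch, `P(th, x, u)` is an assumed conditional distribution $ P(\theta \vert x, u)$ over nature actions; the remaining names are the same hypothetical data as before.

```python
def expected_cost(plan, X, Theta, P, f, l, goal, iters=100):
    G = {x: 0.0 for x in X}
    for _ in range(iters):
        for x in X:
            if x in goal:
                continue
            u = plan[x]
            # Expectation over nature's one-stage choices.
            G[x] = sum(P(th, x, u) * (l(x, u, th) + G[f(x, u, th)])
                       for th in Theta(x, u))
    return G
```

For positive stage costs, if some histories fail to terminate with nonzero probability, the iterates grow without bound, which is consistent with $ G_\pi (x_1) = \infty$ above.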

An interesting question now emerges: Can the same plan, $ \pi $, be optimal from every initial state $ x_1 \in X$, or do we need to potentially find a different optimal plan for each initial state? Fortunately, a single plan suffices to be optimal over all initial states. Why? This behavior was also observed in Section 8.2.1. If $ \pi $ is optimal from some $ x_1$, then it must also be optimal from every other state that is potentially visited by executing $ \pi $ from $ x_1$. Let $ x$ denote some visited state. If $ \pi $ were not optimal from $ x$, then a better plan would exist that reaches the goal from $ x$ with lower cost; switching to that plan upon reaching $ x$ would lower the cost of executing $ \pi $ from $ x_1$, which contradicts the optimality of $ \pi $. Let $ \pi^*$ denote a plan that is optimal from every initial state.
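
One way to see this constructively: value iteration computes the optimal expected cost-to-go $ G^*(x)$ for every $ x \in X$ simultaneously, and a single $ \pi^*$ is extracted by choosing, at each state, an action that attains the minimum. The sketch below uses the same hypothetical problem data as before.

```python
def optimal_plan(X, U, Theta, P, f, l, goal, iters=100):
    G = {x: 0.0 for x in X}
    for _ in range(iters):
        for x in X:
            if x in goal:
                continue
            G[x] = min(sum(P(th, x, u) * (l(x, u, th) + G[f(x, u, th)])
                           for th in Theta(x, u))
                       for u in U(x))
    # One plan, extracted once, is optimal from every initial state.
    pi_star = {x: min(U(x), key=lambda u: sum(P(th, x, u) *
                                              (l(x, u, th) + G[f(x, u, th)])
                                              for th in Theta(x, u)))
               for x in X if x not in goal}
    return G, pi_star
```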
