11.1.3 Defining a Planning Problem

Planning problems will be defined directly on the history I-space, which makes it appear as an ordinary state space in many ways. Keep in mind, however, that it was derived from another state space for which perfect state observations could not be obtained. In Section 10.1, a feedback plan was defined as a function of the state. Here, a feedback plan is instead a function of the I-state. Decisions cannot be based on the state because it will be generally unknown during the execution of the plan. However, the I-state is always known; thus, it is logical to base decisions on it.

Let $ \pi _K$ denote a $ K$-step information-feedback plan, which is a sequence $ (\pi _1$, $ \pi_2$, $ \ldots $, $ \pi _K$) of $ K$ functions, $ \pi _k: {{\cal I}_k}\rightarrow U$. Thus, at every stage $ k$, the I-state $ {\eta}_k \in {{\cal I}_k}$ is used as a basis for choosing the action $ u_k = \pi _k({\eta}_k)$. Due to interference of nature through both the state transition equation and the sensor mapping, the action sequence $ (u_1,\ldots,u_K)$ produced by a plan, $ \pi _K$, will not be known until the plan terminates.
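To make the execution model concrete, here is a minimal Python sketch of running a $K$-step information-feedback plan. All names (`execute_plan`, `pi`, `f`, `h`, `Theta`, `Psi`) are illustrative, not from the text, and nature's choices are simply sampled uniformly; the point is only that each action $u_k = \pi_k(\eta_k)$ is computed from the history I-state, never from the hidden state.

```python
import random

def execute_plan(pi, f, h, x1, eta0, Theta, Psi, K):
    """Run a K-step information-feedback plan (hypothetical names).

    pi[k-1] maps the history I-state eta_k = (eta0, u_1..u_{k-1}, y_1..y_k)
    to an action u_k; f(x, u, theta) is the state transition function and
    h(x, psi) the sensor mapping.  Nature actions are sampled uniformly
    here purely for illustration.
    """
    x = x1                               # hidden true state
    u_hist, y_hist = [], []
    for k in range(1, K + 1):
        y = h(x, random.choice(Psi(x)))  # observation y_k
        y_hist.append(y)
        eta_k = (eta0, tuple(u_hist), tuple(y_hist))  # history I-state
        u = pi[k - 1](eta_k)             # u_k = pi_k(eta_k)
        u_hist.append(u)
        x = f(x, u, random.choice(Theta(x, u)))  # next state, unseen by plan
    return u_hist, y_hist
```

With singleton nature action spaces the run is deterministic, which mirrors the degenerate case where the I-state determines everything.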

As in Formulation 2.3, it will be convenient to assume that $ U$ contains a termination action, $ u_T$. If $ u_T$ is applied at stage $ k$, then it is repeatedly applied forever. It is assumed once again that the state $ x_k$ remains fixed after the termination action is applied. Remember, however, that $ x_k$ is still unknown in general; it becomes fixed but unknown. Technically, based on the definition of the history I-space, the I-state must change after $ u_T$ is applied because the history grows. These changes can be ignored, however, because no new decisions are made after $ u_T$ is applied. A plan that uses a termination condition can be specified as $ \pi = (\pi _1,\pi _2,\ldots)$ because the number of stages may vary each time the plan is executed. Using the history I-space definition in (11.19), an information-feedback plan is expressed as

$\displaystyle \pi : {\cal I}_{hist}\rightarrow U .$ (11.20)

We are almost ready to define the planning problem. This will require the specification of a cost functional. The cost depends on the histories of states $ {\tilde{x}}$ and actions $ {\tilde{u}}$ as in Section 10.1. The planning formulation involves the following components, summarizing most of the concepts introduced so far in Section 11.1 (see Formulation 10.1 for similarities):

Formulation 11.1 (Discrete Information Space Planning)  
  1. A nonempty state space $ X$ that is either finite or countably infinite.
  2. A nonempty, finite action space $ U$. It is assumed that $ U$ contains a special termination action, which has the same effect as defined in Formulation 2.3.
  3. A finite nature action space $ \Theta(x,u)$ for each $ x \in X$ and $ u \in U$.
  4. A state transition function $ f$ that produces a state, $ f(x,u,\theta)$, for every $ x \in X$, $ u \in U$, and $ \theta \in \Theta(x,u)$.
  5. A finite or countably infinite observation space $ Y$.
  6. A finite nature sensing action space $ \Psi(x)$ for each $ x \in X$.
  7. A sensor mapping $ h$, which produces an observation, $ y=h(x,\psi)$, for each $ x \in X$ and $ \psi \in \Psi(x)$. This definition assumes a state-nature sensor mapping. A state sensor mapping or a history-based sensor mapping, as defined in Section 11.1.1, could alternatively be used.
  8. A set of stages, each denoted by $ k$, which begins at $ k=1$ and continues indefinitely.
  9. An initial condition $ {\eta_0}$, which is an element of an initial condition space, $ {{\cal I}_0}$.
  10. A history I-space $ {\cal I}_{hist}$, which is the union of $ {\cal I}_0$ and $ {{\cal I}_k}= {{\cal I}_0}\times {\tilde{U}}_{k-1} \times {\tilde{Y}}_k$ for every stage $ k \in {\mathbb{N}}$.
  11. Let $ L$ denote a stage-additive cost functional, which may be applied to any pair $ ({\tilde{x}}_{K+1},{\tilde{u}}_K)$ of state and action histories to yield

    $\displaystyle L({\tilde{x}}_{K+1},{\tilde{u}}_K) = \sum_{k=1}^K l(x_k,u_k) + l_F(x_{K+1}) .$ (11.21)

    If the termination action $ u_T$ is applied at some stage $ k$, then for all $ i \geq k$, $ u_i = u_T$, $ x_i = x_k$, and $ l(x_i,u_T) = 0$. Either a feasible or optimal planning problem can be defined, as in Formulation 10.1; however, the plan here is specified as $ \pi : {\cal I}_{hist}\rightarrow U$.
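The cost functional (11.21), including the termination convention $l(x_i,u_T)=0$, can be evaluated by a short sketch. The function and names (`stage_additive_cost`, `l`, `l_F`, the sentinel `"T"` for $u_T$) are illustrative assumptions, not part of the formulation itself.

```python
def stage_additive_cost(x_hist, u_hist, l, l_F, u_T="T"):
    """Evaluate (11.21): L = sum_{k=1}^K l(x_k, u_k) + l_F(x_{K+1}).

    x_hist holds K+1 states and u_hist holds K actions.  Once the
    termination action u_T appears, every remaining per-stage cost is
    zero, matching l(x_i, u_T) = 0 for all i >= k.  All names here
    are illustrative.
    """
    total = 0
    for x, u in zip(x_hist[:-1], u_hist):
        if u == u_T:
            break          # l(x_i, u_T) = 0 for the remaining stages
        total += l(x, u)
    return total + l_F(x_hist[-1])
```

Note that after termination the state history simply repeats $x_k$, so the final-cost term $l_F(x_{K+1})$ is applied to that fixed (but unknown) state.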

A goal set may be defined as $ X_G \subset X$. Alternatively, the goal could be expressed as a desirable set of history I-states. After Section 11.2, it will be seen that the goal can be expressed in terms of I-states that are derived from histories.

Some immediate extensions of Formulation 11.1 are possible, but we avoid them here to simplify the notation in the coming concepts. One extension is to allow different action sets, $ U(x)$, for each $ x \in X$. Be careful, however, because information regarding the current state can be inferred if the action set $ U(x)$ is given, and it varies depending on $ x$. Another extension is to allow the costs to depend on nature, to obtain $ l(x_k,u_k,\theta_k)$, instead of $ l(x_k,u_k)$ in (11.21).
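The caution about state-dependent action sets can be illustrated directly: learning which actions are currently available acts like an extra sensor, because only states whose action set matches the offered one remain possible. The sketch below uses hypothetical names (`consistent_states`, `U_of`); it is not part of the formulation.

```python
def consistent_states(X, U_of, offered):
    """States consistent with an observed available-action set.

    If U(x) varies with x, then revealing the offered action set
    filters the set of possible current states, leaking state
    information outside the sensor mapping.  Illustrative only.
    """
    return [x for x in X if U_of(x) == offered]
```

For example, if state $a$ offers two actions and state $b$ offers one, then seeing a single available action rules out $a$ entirely, even though no observation $y$ was received.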

Steven M LaValle 2012-04-20