12.1.1 The Information Space as a Big State Space

Recall that any problem specified using Formulation 11.1 can be converted using derived I-states into a problem under Formulation 10.1. By building on the discussion from the end of Section 11.1.3, this can be achieved by treating the I-space as a big state space in which each state is an I-state in the original problem formulation. Some of the components were given previously, but here a complete formulation is given.

Suppose that a problem has been specified using Formulation 11.1, resulting in the usual components: $ X$, $ U$, $ \Theta$, $ f$, $ Y$, $ h$, $ x_I$, $ X_G$, and $ L$. The following concepts will work for any sufficient I-map; however, the presentation will be limited to two important cases: $ {{\kappa}_{ndet}}$ and $ {{\kappa}_{prob}}$, which yield derived I-spaces $ {\cal I}_{ndet}$ and $ {\cal I}_{prob}$, respectively (recall Sections 11.2.2 and 11.2.3).

Figure 12.1: The derived I-space can be treated as an ordinary state space on which planning with perfect state information can be performed.
Item & Notation & Explanation  \hline
...ional & ${\vec{L}}$ & Derived from original $L$ \\

The components of Formulation 10.1 will now be specified using components of the original problem. To avoid confusion between the two formulations, an arrow will be placed above all components of the new formulation. Figure 12.1 summarizes the coming definitions. The new state space, $ {\vec{X}}$, is defined as $ {\vec{X}}=
{\cal I}_{der}$, and a state, $ {\vec{x}}\in {\vec{X}}$, is a derived I-state, $ {\vec{x}}
= {\eta_{der}}$. Under nondeterministic uncertainty, $ {\vec{x}}_k$ means $ X_k({\eta}_k)$, in which $ {\eta}_k$ is the history I-state. Under probabilistic uncertainty, $ {\vec{x}}_k$ means $ P(x_k\vert{\eta}_k)$. The action space remains the same: $ {\vec{U}}= U$.

The strangest part of the formulation is the new nature action space, $ {\vec{\Theta}}({\vec{x}},{\vec{u}})$. The observations in Formulation 11.1 behave very much like nature actions because they are not selected by the robot, and, as will be seen shortly, they are the only unpredictable part of the new state transition equation. Therefore, $ {\vec{\Theta}}({\vec{x}},{\vec{u}}) \subseteq Y$, the original observation space. A new nature action, $ {\vec{\theta}}\in {\vec{\Theta}}$, is just an observation, $ {\vec{\theta}}({\vec{x}},{\vec{u}}) = y$. The set $ {\vec{\Theta}}({\vec{x}},{\vec{u}})$ generally depends on $ {\vec{x}}$ and $ {\vec{u}}$ because some observations may be impossible to receive from some states. For example, if a sensor that measures a mobile robot position is never wrong by more than $ 1$ meter, then observations that are further than $ 1$ meter from the true robot position are impossible.

A derived state transition equation is defined with $ {\vec{f}}({\vec{x}}_k,{\vec{u}}_k,{\vec{\theta}}_k)$ and yields a new state, $ {\vec{x}}_{k+1}$. Using the original notation, this is just a function that uses $ {\kappa}({\eta}_k)$, $ u_k$, and $ y_k$ to compute the next derived I-state, $ {\kappa}({\eta}_{k+1})$, which is allowed because we are working with sufficient I-maps, as described in Section 11.2.1.

Initial states and goal sets are optional and can be easily formulated in the new representation. The initial I-state, $ {\eta_0}$, becomes the new initial state, $ {\vec{x}}_I = {\eta_0}$. It is assumed that $ {\eta_0}$ is either a subset of $ X$ or a probability distribution, depending on whether planning occurs in $ {\cal I}_{ndet}$ or $ {\cal I}_{prob}$. In the nondeterministic case, the new goal set $ {\vec{X}}_G$ can be derived as

$\displaystyle {\vec{X}}_G = \{ X({\eta}) \in {\cal I}_{ndet}\;\vert\; X({\eta}) \subseteq X_G \} ,$ (12.1)

which is the set of derived I-states for which it is guaranteed that the true state lies in $ X_G$. A probabilistic version can be made by requiring that all states assigned nonzero probability by $ P(x\vert{\eta})$ lie in $ X_G$. Instead of being nonzero, a threshold could be used. For example, the goal may require being only 98% certain that the goal is reached.

The only remaining portion of Formulation 10.1 is the cost functional. We will develop a cost model that uses only the state and action histories. A dependency on nature would imply that the costs depend directly on the observation, $ y = {\vec{\theta}}$, which was not assumed in Formulation 11.1. The general $ K$-stage cost functional from Formulation 10.1 appears in this context as

$\displaystyle {\vec{L}}({\vec{x}}_k,{\vec{u}}_k) = \sum_{k=1}^K {\vec{l}}({\vec{x}}_k,{\vec{u}}_k) + {\vec{l}}_F({\vec{x}}_F) ,$ (12.2)

with the usual cost assumptions regarding the termination action.

The cost functional $ {\vec{L}}$ must be derived from the cost functional $ L$ of the original problem. This is expressed in terms of states, which are unknown. First consider the case of $ {\cal I}_{prob}$. The state $ x_k$ at stage $ k$ follows the probability distribution $ P(x_k\vert{\eta}_k)$, as derived in Section 11.2.3. Using $ {\vec{x}}_k$, an expected cost is assigned as

$\displaystyle {\vec{l}}({\vec{x}}_k,{\vec{u}}_k) = {\vec{l}}({\eta}_k,u_k) = \sum_{x_k \in X} P(x_k\vert{\eta}_k) l(x_k,u_k)$ (12.3)


$\displaystyle {\vec{l}}_F({\vec{x}}_F) = {\vec{l}}_F({\eta}_F) = \sum_{x_F \in X} P(x_F\vert{\eta}_K) l_F(x_F) .$ (12.4)

Ideally, we would like to make analogous expressions for the case of $ {\cal I}_{ndet}$; however, there is one problem. Formulating the worst-case cost for each stage is too pessimistic. For example, it may be possible to obtain high costs in two consecutive stages, but each of these may correspond to following different paths in $ X$. There is nothing to constrain the worst-case analysis to the same path. In the probabilistic case there is no problem because probabilities can be assigned to paths. For the nondeterministic case, a cost functional can be defined, but the stage-additive property needed for dynamic programming is destroyed in general. Under some restrictions on allowable costs, the stage-additive property is preserved.

The state $ x_k$ at stage $ k$ is known to lie in $ X_k({\eta}_k)$, as derived in Section 11.2.2. For every history I-state, $ {\eta}_k
= {\vec{x}}_k$, and $ u_k \in U$, assume that $ l(x_k,u_k)$ is invariant over all $ x_k \in X_k({\eta }_k)$. In this case,

$\displaystyle {\vec{l}}({\vec{x}}_k,{\vec{u}}_k) = {\vec{l}}({\eta}_k,u_k) = l(x_k,u_k) ,$ (12.5)

in which $ x_k \in X_k({\eta }_k)$, and

$\displaystyle {\vec{l}}_F({\vec{x}}_F) = {\vec{l}}_F({\eta}_F) = l_F(x_F) ,$ (12.6)

in which $ x_F \in X_F({\eta}_F)$.

A plan on the derived I-space, $ {\cal I}_{ndet}$ or $ {\cal I}_{prob}$, can now also be considered as a plan on the new state space $ {\vec{X}}$. Thus, state feedback is now possible, but in a larger state space $ {\vec{X}}$ instead of $ X$. The outcomes of actions are still generally unpredictable due to the observations. An interesting special case occurs when there are no observations. In this case, the I-state is predictable because it is derived only from actions that are chosen by the robot. In this case, the new formulation does not need nature actions, which reduces it down to Formulation 2.3. Due to this, feedback is no longer needed if the initial I-state is given. A plan can be expressed once again as a sequence of actions. Even though the original states are not predictable, the future information states are! This means that the state trajectory in the new state space is completely predictable as well.

Steven M LaValle 2012-04-20