11.6.1 Kalman Filtering

This section covers the most successful and widely used example of a derived I-space that dramatically collapses the history I-space. In the special case in which both $ f$ and $ h$ are linear functions, and $ p(\theta)$, $ p(\psi)$, and $ p(x_1)$ are Gaussian, all probabilistic I-states become Gaussian. This means that the probabilistic I-space, $ {\cal I}_{prob}$, does not need to represent every conceivable probability density function. The probabilistic I-state is always trapped in the subspace of $ {\cal I}_{prob}$ that corresponds only to Gaussians. The subspace is denoted as $ {\cal I}_{gauss}$. This implies that an I-map, $ {{\kappa}_{mom}}: {\cal I}_{prob}\rightarrow {\cal I}_{gauss}$, can be applied without any loss of information.

The model is called linear-Gaussian (or LG). Each Gaussian density on $ {\mathbb{R}}^n$ is fully specified by its $ n$-dimensional mean vector $ \mu$ and an $ n \times n$ symmetric covariance matrix, $ \Sigma$. Therefore, $ {\cal I}_{gauss}$ can be considered as a subset of $ {\mathbb{R}}^m$ in which $ m = 2n + \binom{n}{2}$. For example, if $ X = {\mathbb{R}}^2$, then $ {\cal I}_{gauss} \subset {\mathbb{R}}^5$, because two independent parameters specify the mean and three independent parameters specify the covariance matrix (not four, because of symmetry). It was mentioned in Section 11.4.3 that moment-based approximations can be used in general; however, for an LG model it is important to remember that $ {\cal I}_{gauss}$ is an exact representation of $ {\cal I}_{prob}$.

In addition to the fact that $ {\cal I}_{prob}$ collapses nicely, $ {{\kappa}_{mom}}$ is a sufficient I-map, and convenient expressions exist for incrementally updating the derived I-states entirely in terms of the computed means and covariances. This implies that we can work directly with $ {\cal I}_{gauss}$, without any regard for the original histories or even the general formulas for the probabilistic I-states from Section 11.4.1. The update expressions are given here without the full explanation, which is lengthy but not difficult and can be found in virtually any textbook on stochastic control (e.g., [95,564]).

For Kalman filtering, all of the required spaces are Euclidean, but they may have different dimensions. Therefore, let $ X = {\mathbb{R}}^n$, $ U = \Theta = {\mathbb{R}}^m$, and $ Y = \Psi = {\mathbb{R}}^r$. Since Kalman filtering relies on linear models, everything can be expressed in terms of matrix transformations. Let $ A_k$, $ B_k$, $ C_k$, $ G_k$, and $ H_k$ each denote a matrix with constant real-valued entries, which may or may not be singular. The dimensions of the matrices will be inferred from the equations in which they appear (the dimensions must be chosen so that the matrix multiplications are well defined). The $ k$ subscript is used to indicate that a different matrix may be used in each stage. In many applications, the matrices will be the same in each stage, in which case they can be denoted by $ A$, $ B$, $ C$, $ G$, and $ H$. Since Kalman filtering can handle the more general case, the subscripts are included (even though they slightly complicate the expressions).

In general, the state transition equation, $ x_{k+1} = f_k(x_k,u_k,\theta_k)$, is defined as

$\displaystyle x_{k+1} = A_k x_k + B_k u_k + G_k \theta_k ,$ (11.75)

in which the matrices $ A_k$, $ B_k$, and $ G_k$ are of appropriate dimensions. The notation $ f_k$ is used instead of $ f$, because the Kalman filter works even if $ f$ is different in every stage.

Example 11.25 (Linear-Gaussian Example)   For a simple example of (11.75), suppose $ X = {\mathbb{R}}^3$ and $ U = \Theta = {\mathbb{R}}^2$. A particular instance is

$\displaystyle x_{k+1} = \begin{pmatrix} 0 & \sqrt{2} & 1 \\ 1 & -1 & 4 \\ 2 & 0 & \cdots \end{pmatrix} x_k + B_k u_k + \begin{pmatrix} 1 & 1 \\ 0 & -1 \\ 0 & 1 \end{pmatrix} \theta_k .$ (11.76)

$ \blacksquare$

The general form of the sensor mapping $ y_k = h_k(x_k,\psi_k)$ is

$\displaystyle y_k = C_k x_k + H_k \psi_k ,$ (11.77)

in which the matrices $ C_k$ and $ H_k$ are of appropriate dimension. Once again, $ h_k$ is used instead of $ h$ because a different sensor mapping can be used in every stage.
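
In the spirit of Example 11.25, a hypothetical instance of (11.77) (the particular entries below are illustrative and do not come from that example) with $ X = {\mathbb{R}}^3$ and $ Y = \Psi = {\mathbb{R}}^2$ is

$\displaystyle y_k = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} x_k + \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \psi_k ,$

in which each coordinate of $ y_k$ is a reading of one of the first two coordinates of $ x_k$, corrupted by the corresponding coordinate of $ \psi_k$.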

So far the linear part of the model has been given. The next step is to specify the Gaussian part. In each stage, both nature actions $ \theta_k$ and $ \psi_k$ are modeled with zero-mean Gaussians. Thus, each has an associated covariance matrix, denoted by $ \Sigma_\theta$ and $ \Sigma_\psi$, respectively. Using the model given so far and starting with an initial Gaussian density over $ X$, all resulting probabilistic I-states will be Gaussian [564].
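
Before turning to the filtering equations, it may help to see the LG model written out computationally. The following sketch (not part of the original text) simulates (11.75) and (11.77) in Python with NumPy; the dimensions and all matrix entries are arbitrary placeholders chosen only to make the example run.

\begin{verbatim}
# Minimal sketch of the linear-Gaussian model (11.75) and (11.77).
# All matrices and covariances below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

n, m, r = 3, 2, 2                    # dimensions of X, U (= Theta), and Y (= Psi)
A = rng.standard_normal((n, n))      # A_k in (11.75), held constant over stages
B = rng.standard_normal((n, m))      # B_k in (11.75)
G = rng.standard_normal((n, m))      # G_k in (11.75)
C = rng.standard_normal((r, n))      # C_k in (11.77)
H = rng.standard_normal((r, r))      # H_k in (11.77)
Sigma_theta = 0.1 * np.eye(m)        # covariance of the zero-mean Gaussian theta_k
Sigma_psi = 0.05 * np.eye(r)         # covariance of the zero-mean Gaussian psi_k

def state_transition(x, u):
    """One stage of (11.75): x_{k+1} = A x_k + B u_k + G theta_k."""
    theta = rng.multivariate_normal(np.zeros(m), Sigma_theta)
    return A @ x + B @ u + G @ theta

def sensor_mapping(x):
    """One observation from (11.77): y_k = C x_k + H psi_k."""
    psi = rng.multivariate_normal(np.zeros(r), Sigma_psi)
    return C @ x + H @ psi
\end{verbatim}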

Every derived I-state in $ {\cal I}_{gauss}$ can be represented by a mean and covariance. Let $ \mu_k$ and $ \Sigma_k$ denote the mean and covariance of $ P(x_k\vert{\eta}_k)$. The expressions given in the remainder of this section define a derived information transition equation that computes $ \mu_{k+1}$ and $ \Sigma_{k+1}$, given $ \mu_k$, $ \Sigma_k$, $ u_k$, and $ y_{k+1}$. The process starts by computing $ \mu_1$ and $ \Sigma_1$ from the initial conditions.

Assume that an initial condition is given that represents a Gaussian density over $ {\mathbb{R}}^n$. Let this be denoted by $ \mu_0$ and $ \Sigma_0$. The first I-state, which incorporates the first observation $ y_1$, is computed as $ \mu_1 = \mu_0 + L_1 (y_1 - C_1 \mu_0)$ and

$\displaystyle \Sigma_1 = (I - L_1 C_1) \Sigma_0 ,$ (11.78)

in which $ I$ is the identity matrix and

$\displaystyle L_1 = \Sigma_0 C_1^T \big(C_1 \Sigma_0 C_1^T + H_1 \Sigma_\psi H_1^T\big)^{-1} .$ (11.79)

Although the expression for $ L_1$ is complicated, note that all matrices have been specified as part of the model. The only unfortunate part is that a matrix inversion is required, which sometimes leads to numerical instability in practice; see [564] or other sources for an alternative formulation that alleviates this problem.
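
A direct transcription of the base case may make the order of operations clearer. The following sketch (again, not part of the original text; the function name and arguments are arbitrary) computes $ L_1$, $ \mu_1$, and $ \Sigma_1$ from the initial condition and the first observation, following (11.78) and (11.79).

\begin{verbatim}
import numpy as np

def kalman_base_case(mu0, Sigma0, y1, C1, H1, Sigma_psi):
    """Incorporate the first observation y_1 into the initial Gaussian."""
    S = C1 @ Sigma0 @ C1.T + H1 @ Sigma_psi @ H1.T   # innovation covariance
    L1 = Sigma0 @ C1.T @ np.linalg.inv(S)            # gain, as in (11.79)
    mu1 = mu0 + L1 @ (y1 - C1 @ mu0)                 # mean update
    Sigma1 = (np.eye(len(mu0)) - L1 @ C1) @ Sigma0   # covariance update (11.78)
    return mu1, Sigma1
\end{verbatim}

In practice, a factorization-based solve would typically replace the explicit matrix inversion, for the numerical reasons mentioned above.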

Now that $ \mu_1$ and $ \Sigma_1$ have been expressed, the base case is completed. The next part is to give the iterative updates from stage $ k$ to stage $ k+1$. Using $ \mu_k$, the mean at the next stage is computed as

$\displaystyle \mu_{k+1} = A_k \mu_k + B_k u_k + L_{k+1} (y_{k+1} - C_{k+1} (A_k \mu_k + B_k u_k)) ,$ (11.80)

in which $ L_{k+1}$ will be defined shortly. The covariance is computed in two steps; one is based on applying $ u_k$, and the other arises from considering $ y_{k+1}$. Thus, after $ u_k$ is applied, the covariance becomes

$\displaystyle \Sigma^\prime_{k+1} = A_k \Sigma_k A_k^T + G_k \Sigma_\theta G_k^T .$ (11.81)

After $ y_{k+1}$ is received, the covariance $ \Sigma_{k+1}$ is computed from $ \Sigma^\prime_{k+1}$ as

$\displaystyle \Sigma_{k+1} = (I - L_{k+1}C_{k+1}) \Sigma^\prime_{k+1} .$ (11.82)

The expression for $ L_k$ is

$\displaystyle L_k = \Sigma^\prime_k C_k^T \big(C_k \Sigma^\prime_k C_k^T + H_k \Sigma_\psi H_k^T\big)^{-1} .$ (11.83)

To obtain $ L_{k+1}$, substitute $ k+1$ for $ k$ in (11.83). Note that to compute $ \mu_{k+1}$ using (11.80), $ \Sigma^\prime_{k+1}$ must first be computed because (11.80) depends on $ L_{k+1}$, which in turn depends on $ \Sigma^\prime_{k+1}$.
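
For completeness, here is a sketch (not from the original text; the names are arbitrary, and the matrices are assumed constant across stages) of one derived information transition, combining (11.80) through (11.83): it maps $ (\mu_k, \Sigma_k)$, $ u_k$, and $ y_{k+1}$ to $ (\mu_{k+1}, \Sigma_{k+1})$.

\begin{verbatim}
import numpy as np

def kalman_update(mu, Sigma, u, y_next, A, B, G, C, H, Sigma_theta, Sigma_psi):
    """One step of the derived information transition equation."""
    # Effect of applying u_k, as in (11.81).
    mu_pred = A @ mu + B @ u
    Sigma_pred = A @ Sigma @ A.T + G @ Sigma_theta @ G.T
    # Gain from (11.83), evaluated at stage k+1 using Sigma'_{k+1}.
    S = C @ Sigma_pred @ C.T + H @ Sigma_psi @ H.T
    L = Sigma_pred @ C.T @ np.linalg.inv(S)
    # Incorporate y_{k+1}, as in (11.80) and (11.82).
    mu_next = mu_pred + L @ (y_next - C @ mu_pred)
    Sigma_next = (np.eye(len(mu)) - L @ C) @ Sigma_pred
    return mu_next, Sigma_next
\end{verbatim}

The sketch mirrors the ordering constraint described above: $ \Sigma^\prime_{k+1}$ is computed first, then $ L_{k+1}$, and only then $ \mu_{k+1}$ and $ \Sigma_{k+1}$.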

The most common use of the Kalman filter is to provide reliable estimates of the state $ x_k$ by using $ \mu_k$. It turns out that the optimal expected-cost feedback plan for a cost functional that is a quadratic form can be obtained for LG systems in a closed-form expression; see Section 15.2.2. This model is often called LQG, to reflect the fact that it is linear, quadratic-cost, and Gaussian. The optimal feedback plan can even be expressed directly in terms of $ \mu_k$, without requiring $ \Sigma_k$. This indicates that the I-space may be collapsed down to $ X$; however, the corresponding I-map is not sufficient. The covariances are still needed to compute the means, as is evident from (11.80) and (11.83). Thus, an optimal plan can be specified as $ \pi : X \rightarrow U$, but the derived I-states in $ {\cal I}_{gauss}$ must still be maintained, because only the I-map onto $ {\cal I}_{gauss}$ is sufficient.

The Kalman filter provides a beautiful solution to the class of linear-Gaussian models. It is often applied successfully in practice even to problems that do not satisfy these conditions; in this case it is referred to as the extended Kalman filter. The success may be explained by recalling that the probabilistic I-space may be approximated by a mean and covariance in a second-order moment-based approximation. In general, such an approximation may be inappropriate, but it is nevertheless widely used in practice.
