11.6.1 Kalman Filtering

This section covers the most successful and widely used example of a derived I-space that dramatically collapses the history I-space. In the special case in which both $ f$ and $ h$ are linear functions, and $ p(\theta)$, $ p(\psi)$, and $ p(x_1)$ are Gaussian, all probabilistic I-states become Gaussian. This means that the probabilistic I-space, $ {\cal I}_{prob}$, does not need to represent every conceivable probability density function. The probabilistic I-state is always trapped in the subspace of $ {\cal I}_{prob}$ that corresponds only to Gaussians. The subspace is denoted as $ {\cal I}_{gauss}$. This implies that an I-map, $ {{\kappa}_{mom}}: {\cal I}_{prob}\rightarrow {\cal I}_{gauss}$, can be applied without any loss of information.

The model is called linear-Gaussian (or LG). Each Gaussian density on $ {\mathbb{R}}^n$ is fully specified by its $ n$-dimensional mean vector $ \mu$ and an $ n \times n$ symmetric covariance matrix, $ \Sigma$. Therefore, $ {\cal I}_{gauss}$ can be considered as a subset of $ {\mathbb{R}}^m$ in which $ m = 2n + \binom{n}{2}$. For example, if $ X = {\mathbb{R}}^2$, then $ {\cal I}_{gauss} \subset {\mathbb{R}}^5$, because two independent parameters specify the mean and three independent parameters specify the covariance matrix (not four, because of symmetry). It was mentioned in Section 11.4.3 that moment-based approximations can be used in general; however, for an LG model it is important to remember that $ {\cal I}_{gauss}$ is an exact representation of $ {\cal I}_{prob}$.

In addition to the fact that $ {\cal I}_{prob}$ collapses nicely, $ {{\kappa}_{mom}}$ is a sufficient I-map, and convenient expressions exist for incrementally updating the derived I-states entirely in terms of the computed means and covariances. This implies that we can work directly with $ {\cal I}_{gauss}$, without any regard for the original histories or even the general formulas for the probabilistic I-states from Section 11.4.1. The update expressions are given here without the full explanation, which is lengthy but not difficult and can be found in virtually any textbook on stochastic control (e.g., [95,564]).

For Kalman filtering, all of the required spaces are Euclidean, but they may have different dimensions. Therefore, let $ X = {\mathbb{R}}^n$, $ U = \Theta = {\mathbb{R}}^m$, and $ Y = \Psi = {\mathbb{R}}^r$. Since Kalman filtering relies on linear models, everything can be expressed in terms of matrix transformations. Let $ A_k$, $ B_k$, $ C_k$, $ G_k$, and $ H_k$ each denote a matrix with constant real-valued entries, which may or may not be singular. The dimensions of the matrices will be inferred from the equations in which they appear (the dimensions must be chosen so that the matrix multiplications are well defined). The $ k$ subscript is used to indicate that a different matrix may be used in each stage. In many applications, the matrices will be the same in each stage, in which case they can be denoted by $ A$, $ B$, $ C$, $ G$, and $ H$. Since Kalman filtering can handle the more general case, the subscripts are included (even though they slightly complicate the expressions).

In general, the state transition equation, $ x_{k+1} = f_k(x_k,u_k,\theta_k)$, is defined as

$\displaystyle x_{k+1} = A_k x_k + B_k u_k + G_k \theta_k ,$ (11.75)

in which the matrices $ A_k$, $ B_k$, and $ G_k$ are of appropriate dimensions. The notation $ f_k$ is used instead of $ f$, because the Kalman filter works even if $ f$ is different in every stage.

Example 11.25 (Linear-Gaussian Example)   For a simple example of (11.75), suppose $ X = {\mathbb{R}}^3$ and $ U = \Theta = {\mathbb{R}}^2$. A particular instance is

$\displaystyle x_{k+1} = \begin{pmatrix} 0 & \sqrt{2} & 1 \\ 1 & -1 & 4 \\ 2 & 0 & \cdots \end{pmatrix} x_k + B_k u_k + \begin{pmatrix} 1 & 1 \\ 0 & -1 \\ 0 & 1 \end{pmatrix} \theta_k .$ (11.76)

$ \blacksquare$

The general form of the sensor mapping $ y_k = h_k(x_k,\psi_k)$ is

$\displaystyle y_k = C_k x_k + H_k \psi_k ,$ (11.77)

in which the matrices $ C_k$ and $ H_k$ are of appropriate dimension. Once again, $ h_k$ is used instead of $ h$ because a different sensor mapping can be used in every stage.
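
In the spirit of Example 11.25, a hypothetical instance of (11.77) (the particular entries below are illustrative and do not come from that example) with $ X = {\mathbb{R}}^3$ and $ Y = \Psi = {\mathbb{R}}^2$ is

$\displaystyle y_k = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} x_k + \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \psi_k ,$

in which each coordinate of $ y_k$ is a reading of one of the first two coordinates of $ x_k$, corrupted by the corresponding coordinate of $ \psi_k$.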

So far the linear part of the model has been given. The next step is to specify the Gaussian part. In each stage, both nature actions $ \theta_k$ and $ \psi_k$ are modeled with zero-mean Gaussians. Thus, each has an associated covariance matrix, denoted by $ \Sigma_\theta$ and $ \Sigma_\psi$, respectively. Using the model given so far and starting with an initial Gaussian density over $ X$, all resulting probabilistic I-states will be Gaussian [564].
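
Before turning to the filtering equations, it may help to see the LG model written out computationally. The following sketch (not part of the original text) simulates (11.75) and (11.77) in Python with NumPy; the dimensions and all matrix entries are arbitrary placeholders chosen only to make the example run.

\begin{verbatim}
# Minimal sketch of the linear-Gaussian model (11.75) and (11.77).
# All matrices and covariances below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

n, m, r = 3, 2, 2                    # dimensions of X, U (= Theta), and Y (= Psi)
A = rng.standard_normal((n, n))      # A_k in (11.75), held constant over stages
B = rng.standard_normal((n, m))      # B_k in (11.75)
G = rng.standard_normal((n, m))      # G_k in (11.75)
C = rng.standard_normal((r, n))      # C_k in (11.77)
H = rng.standard_normal((r, r))      # H_k in (11.77)
Sigma_theta = 0.1 * np.eye(m)        # covariance of the zero-mean Gaussian theta_k
Sigma_psi = 0.05 * np.eye(r)         # covariance of the zero-mean Gaussian psi_k

def state_transition(x, u):
    """One stage of (11.75): x_{k+1} = A x_k + B u_k + G theta_k."""
    theta = rng.multivariate_normal(np.zeros(m), Sigma_theta)
    return A @ x + B @ u + G @ theta

def sensor_mapping(x):
    """One observation from (11.77): y_k = C x_k + H psi_k."""
    psi = rng.multivariate_normal(np.zeros(r), Sigma_psi)
    return C @ x + H @ psi
\end{verbatim}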

Every derived I-state in $ {\cal I}_{gauss}$ can be represented by a mean and covariance. Let $ \mu_k$ and $ \Sigma_k$ denote the mean and covariance of $ P(x_k\vert{\eta}_k)$. The expressions given in the remainder of this section define a derived information transition equation that computes $ \mu_{k+1}$ and $ \Sigma_{k+1}$, given $ \mu_k$, $ \Sigma_k$, $ u_k$, and $ y_{k+1}$. The process starts by computing $ \mu_1$ and $ \Sigma_1$ from the initial conditions.

Assume that an initial condition is given that represents a Gaussian density over $ {\mathbb{R}}^n$. Let this be denoted by $ \mu_0$ and $ \Sigma_0$. The first I-state, which incorporates the first observation $ y_1$, is computed as $ \mu_1 = \mu_0 + L_1 (y_1 - C_1 \mu_0)$ and

$\displaystyle \Sigma_1 = (I - L_1 C_1) \Sigma_0 ,$ (11.78)

in which $ I$ is the identity matrix and

$\displaystyle L_1 = \Sigma_0 C_1^T \big(C_1 \Sigma_0 C_1^T + H_1 \Sigma_\psi H_1^T\big)^{-1} .$ (11.79)

Although the expression for $ L_1$ is complicated, note that all matrices have been specified as part of the model. The only unfortunate part is that a matrix inversion is required, which sometimes leads to numerical instability in practice; see [564] or other sources for an alternative formulation that alleviates this problem.
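
A direct transcription of the base case may make the order of operations clearer. The following sketch (again, not part of the original text; the function name and arguments are arbitrary) computes $ L_1$, $ \mu_1$, and $ \Sigma_1$ from the initial condition and the first observation, following (11.78) and (11.79).

\begin{verbatim}
import numpy as np

def kalman_base_case(mu0, Sigma0, y1, C1, H1, Sigma_psi):
    """Incorporate the first observation y_1 into the initial Gaussian."""
    S = C1 @ Sigma0 @ C1.T + H1 @ Sigma_psi @ H1.T   # innovation covariance
    L1 = Sigma0 @ C1.T @ np.linalg.inv(S)            # gain, as in (11.79)
    mu1 = mu0 + L1 @ (y1 - C1 @ mu0)                 # mean update
    Sigma1 = (np.eye(len(mu0)) - L1 @ C1) @ Sigma0   # covariance update (11.78)
    return mu1, Sigma1
\end{verbatim}

In practice, a factorization-based solve would typically replace the explicit matrix inversion, for the numerical reasons mentioned above.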

Now that $ \mu_1$ and $ \Sigma_1$ have been expressed, the base case is completed. The next part is to give the iterative updates from stage $ k$ to stage $ k+1$. Using $ \mu_k$, the mean at the next stage is computed as

$\displaystyle \mu_{k+1} = A_k \mu_k + B_k u_k + L_{k+1} (y_{k+1} - C_{k+1} (A_k \mu_k + B_k u_k)) ,$ (11.80)

in which $ L_{k+1}$ will be defined shortly. The covariance is computed in two steps; one is based on applying $ u_k$, and the other arises from considering $ y_{k+1}$. Thus, after $ u_k$ is applied, the covariance becomes

$\displaystyle \Sigma^\prime_{k+1} = A_k \Sigma_k A_k^T + G_k \Sigma_\theta G_k^T .$ (11.81)

After $ y_{k+1}$ is received, the covariance $ \Sigma_{k+1}$ is computed from $ \Sigma^\prime_{k+1}$ as

$\displaystyle \Sigma_{k+1} = (I - L_{k+1}C_{k+1}) \Sigma^\prime_{k+1} .$ (11.82)

The expression for $ L_k$ is

$\displaystyle L_k = \Sigma^\prime_k C_k^T \big(C_k \Sigma^\prime_k C_k^T + H_k \Sigma_\psi H_k^T\big)^{-1} .$ (11.83)

To obtain $ L_{k+1}$, substitute $ k+1$ for $ k$ in (11.83). Note that to compute $ \mu_{k+1}$ using (11.80), $ \Sigma^\prime_{k+1}$ must first be computed because (11.80) depends on $ L_{k+1}$, which in turn depends on $ \Sigma^\prime_{k+1}$.
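
For completeness, here is a sketch (not from the original text; the names are arbitrary, and the matrices are assumed constant across stages) of one derived information transition, combining (11.80) through (11.83): it maps $ (\mu_k, \Sigma_k)$, $ u_k$, and $ y_{k+1}$ to $ (\mu_{k+1}, \Sigma_{k+1})$.

\begin{verbatim}
import numpy as np

def kalman_update(mu, Sigma, u, y_next, A, B, G, C, H, Sigma_theta, Sigma_psi):
    """One step of the derived information transition equation."""
    # Effect of applying u_k, as in (11.81).
    mu_pred = A @ mu + B @ u
    Sigma_pred = A @ Sigma @ A.T + G @ Sigma_theta @ G.T
    # Gain from (11.83), evaluated at stage k+1 using Sigma'_{k+1}.
    S = C @ Sigma_pred @ C.T + H @ Sigma_psi @ H.T
    L = Sigma_pred @ C.T @ np.linalg.inv(S)
    # Incorporate y_{k+1}, as in (11.80) and (11.82).
    mu_next = mu_pred + L @ (y_next - C @ mu_pred)
    Sigma_next = (np.eye(len(mu)) - L @ C) @ Sigma_pred
    return mu_next, Sigma_next
\end{verbatim}

The sketch mirrors the ordering constraint described above: $ \Sigma^\prime_{k+1}$ is computed first, then $ L_{k+1}$, and only then $ \mu_{k+1}$ and $ \Sigma_{k+1}$.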

The most common use of the Kalman filter is to provide reliable estimates of the state $ x_k$ by using $ \mu_k$. It turns out that the optimal expected-cost feedback plan for a cost functional that is a quadratic form can be obtained for LG systems in a closed-form expression; see Section 15.2.2. This model is often called LQG, to reflect the fact that it is linear, quadratic-cost, and Gaussian. The optimal feedback plan can even be expressed directly in terms of $ \mu_k$, without requiring $ \Sigma_k$. This indicates that the I-space may be collapsed down to $ X$; however, the corresponding I-map is not sufficient. The covariances are still needed to compute the means, as is evident from (11.80) and (11.83). Thus, an optimal plan can be specified as $ \pi : X \rightarrow U$, but the derived I-states in $ {\cal I}_{gauss}$ must still be maintained, because only the I-map onto $ {\cal I}_{gauss}$ is sufficient.

The Kalman filter provides a beautiful solution to the class of linear-Gaussian models. It is often applied successfully in practice even to problems that do not satisfy these conditions; in this case it is referred to as the extended Kalman filter. The success may be explained by recalling that the probabilistic I-space may be approximated by a mean and covariance in a second-order moment-based approximation. In general, such an approximation may be inappropriate, but it is nevertheless widely used in practice.
