9.2.4.2 Parameter estimation

Another important application of the decision-making framework of this section is parameter estimation [89,268]. In this case, nature selects a parameter, $ \theta \in \Theta$, in which $ \Theta$ represents a parameter space. Through one or more independent trials, some observations are obtained. Each observation should ideally be a direct measurement of $ \theta$, but imperfections in the measurement process distort the observation. Usually, $ \Theta \subseteq Y$, and in many cases, $ Y = \Theta$. The robot action is to guess the parameter that was chosen by nature. Hence, $ U = \Theta$. In most applications, all of the spaces are continuous subsets of $ {\mathbb{R}}^n$. The cost function is designed to increase as the error, $ \Vert u - \theta\Vert$, becomes larger.

Example 9.12 (Parameter Estimation)   Suppose that $ U = Y = \Theta = {\mathbb{R}}$. Nature therefore chooses a real-valued parameter, which is estimated. The cost of making a mistake is

$\displaystyle L(u,\theta) = (u-\theta)^2 .$ (9.35)

Suppose that a Bayesian approach is taken. The prior probability density $ p(\theta)$ is given as uniform over an interval $ [a,b]
\subset {\mathbb{R}}$. An observation is received, but it is noisy. The noise can be modeled as a second action of nature, as described in Section 9.2.3. This leads to a density $ p(y\vert\theta)$. Suppose that the noise is modeled with a Gaussian, which results in

$\displaystyle p(y\vert\theta) = \frac{1}{\sqrt{2 \pi \sigma^2}} \; e^{-(y-\theta)^2/2\sigma^2},$ (9.36)

in which the mean is $ \theta $ and the standard deviation is $ \sigma $.
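To make the model concrete, the following Python sketch simulates one trial: nature draws $ \theta $ from the uniform prior on $ [a,b]$ and then produces a noisy observation $ y$ according to (9.36). The constants $ a$, $ b$, and $ \sigma $ are illustrative values, not taken from the text.

import random

# Illustrative model constants (assumed values, not from the text)
a, b = 0.0, 1.0   # support of the uniform prior p(theta)
sigma = 0.1       # standard deviation of the Gaussian noise

def sample_trial():
    # Nature chooses theta ~ Uniform[a, b]; the sensor then
    # reports y ~ Normal(theta, sigma^2), as in (9.36).
    theta = random.uniform(a, b)
    y = random.gauss(theta, sigma)
    return theta, y

theta, y = sample_trial()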

The optimal parameter estimate based on $ y$ is obtained by selecting $ u
\in {\mathbb{R}}$ to minimize

$\displaystyle \int_{-\infty}^\infty L(u,\theta) p(\theta\vert y) \, d\theta ,$ (9.37)

in which

$\displaystyle p(\theta\vert y) = { p(y\vert\theta) p(\theta) \over p(y) },$ (9.38)

by Bayes' rule. The term $ p(y)$ does not depend on $ \theta $, and it can therefore be ignored in the optimization. Using the prior density, $ p(\theta) = 0$ outside of $ [a,b]$; hence, the domain of integration can be restricted to $ [a,b]$. The value of $ p(\theta) =
1/(b-a)$ is also a constant that can be ignored in the optimization. Using (9.36), this means that $ u$ is selected to optimize

$\displaystyle \int_a^b L(u,\theta) p(y\vert\theta) \, d\theta ,$ (9.39)

which can be expressed in terms of the standard error function, $ \operatorname{erf}(x)$ (up to a scale factor, the integral of a Gaussian density from $ 0$ to $ x$).
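Since the loss is quadratic, differentiating (9.37) with respect to $ u$ and setting the result to zero yields $ u = \int \theta \, p(\theta\vert y) \, d\theta$; the optimal estimate is the conditional mean of $ \theta $ given $ y$. Here $ p(\theta\vert y)$ is a Gaussian with mean $ y$ and variance $ \sigma^2$, truncated to $ [a,b]$, and the mean of a truncated Gaussian has a standard closed form in $ \operatorname{erf}$. The following Python sketch evaluates it; the function names and sample values are illustrative.

import math

def phi(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # Standard normal cumulative distribution, written via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def optimal_estimate(y, a, b, sigma):
    # Mean of Normal(y, sigma^2) truncated to [a, b], which
    # minimizes the expected squared error in (9.37).
    alpha = (a - y) / sigma
    beta = (b - y) / sigma
    mass = Phi(beta) - Phi(alpha)   # posterior normalizing constant
    return y + sigma * (phi(alpha) - phi(beta)) / mass

For example, optimal_estimate(0.5, 0.0, 1.0, 0.1) returns 0.5 by symmetry, while an observation near a boundary is pulled toward the interior: optimal_estimate(0.02, 0.0, 1.0, 0.1) is approximately 0.09.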

If a sequence, $ y_1$, $ \ldots $, $ y_k$, of independent observations is obtained, then (9.39) is replaced by

$\displaystyle \int_a^b L(u,\theta) p(y_1\vert\theta) \cdots p(y_k\vert\theta) \, d\theta .$ (9.40)
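As a function of $ \theta $, the product $ p(y_1\vert\theta) \cdots p(y_k\vert\theta)$ in (9.40) is proportional to a single Gaussian with mean $ \bar{y} = (y_1 + \cdots + y_k)/k$ and variance $ \sigma^2/k$ (complete the square in the exponent). Thus the single-observation estimator applies after replacing $ y$ with $ \bar{y}$ and $ \sigma $ with $ \sigma/\sqrt{k}$. A sketch reusing optimal_estimate from above, under the same illustrative assumptions:

import math

def optimal_estimate_sequence(ys, a, b, sigma):
    # As a function of theta, the k Gaussian likelihood factors in
    # (9.40) collapse to Normal(ybar, sigma^2 / k) up to a constant,
    # so the single-observation formula applies.
    k = len(ys)
    ybar = sum(ys) / k
    return optimal_estimate(ybar, a, b, sigma / math.sqrt(k))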

$ \blacksquare$
