Saddle points in a sequential game

A saddle point will be obtained once again by defining security strategies for each player. Each player treats the other as nature, and if the same worst-case value is obtained, then the result is a saddle point for the game. If the values are different, then a randomized plan is needed to close the gap between the upper and lower values.

Upper and lower values now depend on the initial state, $ x_1 \in X$. There was no equivalent for this in Section 10.5.1 because the root of the game tree is the only possible starting point.

If sequences, $ {\tilde{u}}_K$ and $ {\tilde{v}}_K$, of actions are applied from $ x_1$, then the state history, $ {\tilde{x}}_F$, can be derived by repeatedly using the state transition function, $ f$. The upper value from $ x_1$ is defined as

$\displaystyle \overline{L}^*(x_1) = \min_{u_1} \max_{v_1} \min_{u_2} \max_{v_2}...
...n_{u_K} \max_{v_K} \Big\{ L({\tilde{x}}_F,{\tilde{u}}_K,{\tilde{v}}_K) \Big\} ,$ (10.107)

which is identical to (10.33) if $ {{\rm P}_2}$ is replaced by nature. Also, (10.108) generalizes (9.44) to multiple stages. The lower value from $ x_1$, which generalizes (9.46), is

$\displaystyle \underline{L}^*(x_1) = \max_{v_1} \min_{u_1} \max_{v_2} \min_{u_2...
...x_{v_K} \min_{u_K} \Big\{ L({\tilde{x}}_F,{\tilde{u}}_K,{\tilde{v}}_K) \Big\} .$ (10.108)

If $ \overline{L}^*(x_1) = \underline{L}^*(x_2)$, then a deterministic saddle point exists from $ x_1$. This implies that the order of $ \max$ and $ \min$ can be swapped inside of every stage.

Steven M LaValle 2012-04-20