11.7.1 Information States in Game Trees

Recall from Section 10.5.1 that an important part of formulating a sequential game in a game tree is specifying the information model. This was described in Step 4 of Formulation 10.3. Three information models were considered in Section 10.5.1: alternating play, stage-by-stage, and open loop. These and many other information models can be described using I-spaces.

From Section 11.1, it should be clear that an I-space is always defined with respect to a state space. Even though Section 10.5.1 did not formally introduce a state space, it is not difficult to define one. Let the state space $ X$ be $ N$, the set of all vertices in the game tree. Assume that two players are engaged in a sequential zero-sum game. Using notation from Section 10.5.1, $ N_1$ and $ N_2$ are the decision vertices of $ {{\rm P}_1}$ and $ {{\rm P}_2}$, respectively. Consider the nondeterministic I-space $ {\cal I}_{ndet}$ over $ N$. Let $ {\eta}$ denote a nondeterministic I-state; thus, each $ {\eta}\in {\cal I}_{ndet}$ is a subset of $ N$.

There are now many possible ways in which the players can be confused while making their decisions. For example, if some $ {\eta}$ contains vertices from both $ N_1$ and $ N_2$, the player does not know whether it is even its turn to make a decision. If $ {\eta}$ additionally contains some leaf vertices, the game may be finished without a player even being aware of it. Most game tree formulations avoid these strange situations. It is usually assumed that the players at least know when it is their turn to make a decision. It is also usually assumed that they know the stage of the game. This eliminates many sets from $ {\cal I}_{ndet}$.

While playing the game, each player has its own nondeterministic I-state because the players may hide their decisions from each other. Let $ {\eta}_1$ and $ {\eta}_2$ denote the nondeterministic I-states for $ {{\rm P}_1}$ and $ {{\rm P}_2}$, respectively. For each player, many sets in $ {\cal I}_{ndet}$ are eliminated. Some are removed to avoid the confusions mentioned above. We also impose the constraint that $ {\eta}_i \subseteq
N_i$ for $ i=1$ and $ i=2$. We only care about the I-state of a player when it is that player's turn to make a decision. Thus, the nondeterministic I-state should tell us which decision vertices in $ N_i$ are possible as $ {{\rm P}_i}$ faces a decision. Let $ {\cal I}_1$ and $ {\cal I}_2$ represent the nondeterministic I-spaces for $ {{\rm P}_1}$ and $ {{\rm P}_2}$, respectively, with all impossible I-states eliminated.

Figure 11.30: Several different information models are illustrated for the game in Figure 10.13.
...} \\
(c) Open loop & (d) Something else

The I-spaces $ {\cal I}_1$ and $ {\cal I}_2$ are usually defined directly on the game tree by circling vertices that belong to the same I-state. They form a partition of the vertices in each level of the tree (except the leaves). In fact, $ {\cal I}_i$ even forms a partition of $ N_i$ for each player. Figure 11.30 shows four information models specified in this way for the example in Figure 10.13. The first three correspond directly to the models allowed in Section 10.5.1. In the alternating-play model, each player always knows the decision vertex. This corresponds to a case of perfect state information. In the stage-by-stage model, $ {{\rm P}_1}$ always knows the decision vertex; $ {{\rm P}_2}$ knows the decision vertex from which $ {{\rm P}_1}$ made its last decision, but it does not know which branch was chosen. The open-loop model represents the case that has the poorest information. Only $ {{\rm P}_1}$ knows its decision vertex at the beginning of the game. After that, there is no information about the actions chosen. In fact, the players cannot even remember their own previous actions. Figure 11.30d shows an information model that does not fit into any of the three previous ones. In this model, very strange behavior results. If $ {{\rm P}_1}$ and $ {{\rm P}_2}$ initially choose right branches, then the resulting decision vertex is known; however, if $ {{\rm P}_2}$ instead chooses the left branch, then $ {{\rm P}_1}$ will forget which action it applied (as if the action of $ {{\rm P}_2}$ caused $ {{\rm P}_1}$ to have amnesia!). Here is a single-stage example:

Example 11..26 (An Unusual Information Model)   Figure 11.31 shows a game that does not fit any of the information models in Section 10.5.1. It is actually a variant of the game considered before in Figure 10.12. The game is a kind of hybrid that partly looks like the alternating-play model and partly like the stage-by-stage model. This particular problem can be solved in the usual way, from the bottom up. A value is computed for each of the nondeterministic I-states, for the level in which $ {{\rm P}_2}$ makes a decision. The left I-state has value $ 5$, which corresponds to $ {{\rm P}_1}$ choosing $ 1$ and $ {{\rm P}_2}$ responding with $ 3$. The right I-state has value $ 4$, which results from the deterministic saddle point in a $ 2 \times 3$ matrix game played between $ {{\rm P}_1}$ and $ {{\rm P}_2}$. The overall game has a deterministic saddle point in which $ {{\rm P}_1}$ chooses $ 3$ and $ {{\rm P}_2}$ chooses $ 3$. This results in a value of $ 4$ for the game. $ \blacksquare$

Figure 11.31: A single-stage game that has an information model unlike those in Section 10.5.1.

Plans are now defined directly as functions on the I-spaces. A (deterministic) plan for $ {{\rm P}_1}$ is defined as a function $ \pi_1$ on $ {\cal I}_1$ that yields an action $ u \in
U({\eta}_1)$ for each $ {\eta}_1 \in {\cal I}_1$, and $ U({\eta}_1)$ is the set of actions that can be inferred from the I-state $ {\eta}_1$; assume that this set is the same for all decision vertices in $ {\eta}_1$. Similarly, a (deterministic) plan for $ {{\rm P}_2}$ is defined as a function $ \pi_2$ on $ {\cal I}_2$ that yields an action $ v \in
V({\eta}_2)$ for each $ {\eta}_2 \in {\cal I}_2$.

There are generally two alternative ways to define a randomized plan in terms of I-spaces. The first choice is to define a globally randomized plan, which is a probability distribution over the set of all deterministic plans. During execution, this means that an entire deterministic plan will be sampled in advance according to the probability distribution. An alternative is to sample actions as they are needed at each I-state. This is defined as follows. For the randomized case, let $ W({\eta}_1)$ and $ Z({\eta}_2)$ denote the sets of all probability distributions over $ U({\eta}_1)$ and $ V({\eta}_2)$, respectively. A locally randomized plan for $ {{\rm P}_1}$ is defined as a function that yields some $ w \in W({\eta}_1)$ for each $ {\eta}_1 \in {\cal I}_1$. Likewise, a locally randomized plan for $ {{\rm P}_2}$ is a function that maps from $ {\cal I}_2$ into $ Z({\eta}_2)$. Locally randomized plans expressed as functions of I-states are often called behavioral strategies in game theory literature.

A randomized saddle point on the space of locally randomized plans does not exist for all sequential games [59]. This is unfortunate because this form of randomization seems most natural for the way decisions are made during execution. At least for the stage-by-stage model, a randomized saddle point always exists on the space of locally randomized plans. For the open-loop model, randomized saddle points are only guaranteed to exist using a globally randomized plan (this was actually done in Section 10.5.1). To help understand the problem, suppose that the game tree is a balanced, binary tree with $ k$ stages (hence, $ 2k$ levels). For each player, there are $ 2^k$ possible deterministic plans. This means that $ 2^k-1$ probability values may be assigned independently (the last one is constrained to force them to sum to $ 1$) to define a globally randomized plan over the space of deterministic plans. Defining a locally randomized plan, there are $ k$ I-states for each player, one for each search stage. At each stage, a probability distribution is defined over the action set, which contains only two elements. Thus, each of these distributions has only one independent parameter. A randomized plan is specified in this way using $ k-1$ independent parameters. Since $ k-1$ is much less than $ 2^k-1$, there are many globally randomized plans that cannot be expressed as a locally randomized plan. Unfortunately, in some games the locally randomized representation removes the randomized saddle point.

This strange result arises mainly because players can forget information over time. A player with perfect recall remembers its own actions and also never forgets any information that it previously knew. It was shown by Kuhn that the space of all globally randomized plans is equivalent to the space of all locally randomized plans if and only if the players have perfect memory [562]. Thus, by sticking to games in which all players have perfect recall, a randomized saddle point always exists in the space locally randomized plans. The result of Kuhn even holds for the more general case of the existence of randomized Nash equilibria on the space of locally randomized plans.

The nondeterministic I-states can be used in game trees that involve more players. Accordingly, deterministic, globally randomized, and locally randomized plans can be defined. The result of Kuhn applies to any number of players, which ensures the existence of a randomized Nash equilibrium on the space of locally randomized strategies if (and only if) the players have perfect recall. It is generally preferable to exploit this fact and decompose the game tree into smaller matrix games, as described in Section 10.5.1. It turns out that the precise condition that allows this is that it must be ladder-nested [59]. This means that there are decision vertices, other than the root, at which 1) the player that must make a decision knows it is at that vertex (the nondeterministic I-state is a singleton set), and 2) the nondeterministic I-state will not leave the subtree rooted at that vertex (vertices outside of the subtree cannot be circled when drawing the game tree). In this case, the game tree can be decomposed at these special decision vertices and replaced with the game value(s). Unfortunately, there is still the nuisance of multiple Nash equilibria.

It may seem odd that nondeterministic I-states were defined without being derived from a history I-space. Without much difficulty, it is possible to define a sensing model that leads to the nondeterministic I-states used in this section. In many cases, the I-state can be expressed using only a subset of the action histories. Let $ {\tilde{u}}_k$ and $ {\tilde{v}}_k$ denote the action histories of $ {{\rm P}_1}$ and $ {{\rm P}_2}$, respectively. The history I-state for the alternating-play model at stage $ k$ is $ ({\tilde{u}}_{k-1},{\tilde{v}}_{k-1})$ for $ {{\rm P}_1}$ and $ ({\tilde{u}}_k,{\tilde{v}}_{k-1})$ for $ {{\rm P}_2}$. The history I-state for the stage-by-stage model is $ ({\tilde{u}}_{k-1},{\tilde{v}}_{k-1})$ for both players. The nondeterministic I-states used in this section can be derived from these histories. For other models, such as the one in Figure 11.31, a sensing model is additionally needed because only partial information regarding some actions appears. This leads into the formulation covered in the next section, which involves both sensing models and a state space.

Steven M LaValle 2012-04-20