9.3.1 Game Formulation

Suppose there are two players, $ {{\rm P}_1}$ and $ {{\rm P}_2}$, that each have to make a decision. Each has a finite set of actions, $ U$ and $ V$, respectively. The set $ V$ can be viewed as the ``replacement'' of $ \Theta$ from Formulation 9.3 by a set of actions chosen by a true opponent. Each player has a cost function, which is denoted as $ L_i : U \times V \rightarrow {\mathbb{R}}$ for $ i = 1, 2$. An important constraint for zero-sum games is

$\displaystyle L_1(u,v) = -L_2(u,v),$ (9.41)

which means that a cost for one player is a reward for the other. This is the basis of the term zero sum, which means that the two costs can be added to obtain zero. In zero-sum games the interests of the players are completely opposed. In Section 9.4 this constraint will be lifted to obtain more general games.

In light of (9.41) it is pointless to represent two cost functions. Instead, the superscript will be dropped, and $ L$ will refer to the cost, $ L_1$, of $ {{\rm P}_1}$. The goal of $ {{\rm P}_1}$ is to minimize $ L$. Due to (9.41), the goal of $ {{\rm P}_2}$ is to maximize $ L$. Thus, $ L$ can be considered as a reward for $ {{\rm P}_2}$, but a cost for $ {{\rm P}_1}$.

A formulation can now be given:

Formulation 9..7 (A Zero-Sum Game)  
  1. Two players, $ {{\rm P}_1}$ and $ {{\rm P}_2}$.
  2. A nonempty, finite set $ U$ called the action space for $ {{\rm P}_1}$. For convenience in describing examples, assume that $ U$ is a set of consecutive integers from $ 1$ to $ \vert U\vert$. Each $ u \in U$ is referred to as an action of $ {{\rm P}_1}$.
  3. A nonempty, finite set $ V$ called the action space for $ {{\rm P}_2}$. Assume that $ V$ is a set of consecutive integers from $ 1$ to $ \vert V\vert$. Each $ v \in V$ is referred to as an action of $ {{\rm P}_2}$.
  4. A function $ L: U \times V \rightarrow {\mathbb{R}}\cup
\{-\infty,\infty\}$ called the cost function for $ {{\rm P}_1}$. This also serves as a reward function for $ {{\rm P}_2}$ because of (9.41).

Before discussing what it means to solve a zero-sum game, some additional assumptions are needed. Assume that the players know each other's cost functions. This implies that the motivation of the opponent is completely understood. The other assumption is that the players are rational, which means that they will try to obtain the best cost whenever possible. $ {{\rm P}_1}$ will not choose an action that leads to higher cost when a lower cost action is available. Likewise, $ {{\rm P}_2}$ will not choose an action that leads to lower cost. Finally, it is assumed that both players make their decisions simultaneously. There is no information regarding the decision of $ {{\rm P}_1}$ that can be exploited by $ {{\rm P}_2}$, and vice versa.

Formulation 9.7 is often referred to as a matrix game because $ L$ can be expressed with a cost matrix, as was done in Section 9.2. Here the matrix indicates costs for $ {{\rm P}_1}$ and $ {{\rm P}_2}$, instead of the robot and nature. All of the required information from Formulation 9.7 is specified by a single matrix; therefore, it is a convenient form for expressing zero-sum games.

Example 9..13 (Matrix Representation of a Zero-Sum Game)   Suppose that $ U$, the action set for $ {{\rm P}_1}$, contains three actions and $ V$ contains four actions. There should be $ 3 \times 4 = 12$ values in the specification of the cost function, $ L$. This can be expressed as a cost matrix,

$\displaystyle \begin{tabular}{cc} & $V$  $U$ & \begin{tabular}{\vert c\vert...
...0 & -1 & 2 & 1  \hline -2 & 2 & 0 & 1  \hline \end{tabular} \end{tabular} ,$ (9.42)

in which each row corresponds to some $ u \in U$, and each column corresponds to some $ v \in V$. Each entry yields $ L(u,v)$, which is the cost for $ {{\rm P}_1}$. This representation is similar to that shown in Example 9.8, except that the nature action space, $ \Theta$, is replaced by $ V$. The cost for $ {{\rm P}_2}$ is $ -L(u,v)$. $ \blacksquare$

Steven M LaValle 2012-04-20