12.3.2 Stentz's Algorithm ($D^*$)

Imagine exploring an unknown planet using a robotic vehicle. The robot moves along the rugged terrain while using a range scanner to make precise measurements of the ground in its vicinity. As the robot moves, it may discover that some parts are easier to traverse than it originally thought. In other cases, it might realize that some direction in which it intended to go is impassable due to a large boulder or a ravine. If the goal is to arrive at some specified coordinates, this problem can be viewed as a navigation problem in an unknown environment. The resulting solution is a lazy approach, as discussed in Section 12.2.1.

Figure 12.17: The Automated Cross-Country Unmanned Vehicle (XUV) is equipped with laser radar and other sensors, and uses Stentz's algorithm to navigate (courtesy of General Dynamics Robotic Systems).

This section presents Stentz's algorithm [913], which has been used in many outdoor vehicle navigation applications, such as the vehicle shown in Figure 12.17. The algorithm can be considered a dynamic version of the backward variant of Dijkstra's algorithm. Thus, it maintains cost-to-go values, and the search grows outward from the goal, as opposed to the cost-to-come values grown from $ x_I$ in the version of Dijkstra's algorithm in Section 2.3.3. The method applies to any optimal planning problem. In terms of the state transition graph, it is assumed that the costs of edge transitions are unknown (equivalently, each cost $ l(x,u)$ is unknown). In the navigation problem, a positive cost indicates the difficulty of traveling from state $ x$ to state $ x' = f(x,u)$.
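To make the backward variant concrete, the following is a minimal sketch (in Python, with illustrative names not taken from the text) of Dijkstra's algorithm run backward from the goal, so that it computes optimal cost-to-go values rather than cost-to-come values. Here `predecessors[x]` is assumed to yield pairs `(x_prev, cost)` such that some action applied at `x_prev` reaches `x` with the given edge cost.

```python
import heapq

def backward_dijkstra(states, predecessors, x_goal):
    """Grow the search outward from the goal, computing the
    optimal cost-to-go G(x) for every reachable state."""
    G = {x: float("inf") for x in states}  # cost-to-go values
    G[x_goal] = 0.0
    Q = [(0.0, x_goal)]                    # priority queue keyed on cost-to-go
    while Q:
        g, x = heapq.heappop(Q)
        if g > G[x]:
            continue  # stale queue entry; a better value was found already
        for x_prev, cost in predecessors.get(x, []):
            if g + cost < G[x_prev]:
                G[x_prev] = g + cost       # reaching the goal via x is cheaper
                heapq.heappush(Q, (G[x_prev], x_prev))
    return G
```

On a three-state graph with edges a→b (cost 1), b→g (cost 2), and a→g (cost 5), this yields cost-to-go values 0 for g, 2 for b, and 3 for a, since traveling via b is cheaper than the direct edge.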

To work with a concrete problem, imagine that a planet surface is partitioned into a high-resolution grid. The state space is simply a bounded set of grid tiles; hence, $ X \subseteq {\mathbb{Z}}\times {\mathbb{Z}}$. Each grid tile is assigned a positive, real value, $ c(x)$, that indicates the difficulty of its traversal. The actions $ U(x)$ at each grid point can be chosen using standard grid neighbors (e.g., four-neighbors or eight-neighbors). This now defines a state transition graph over $ X$. From any $ x' \in X$ and $ u' \in U(x')$ such that $ x = f(x',u')$, the cost term is assigned using $ c$ as $ l(x',u') = c(x)$. This model is a generalization of the grid in Section 12.3.1, in which the tiles were either empty or occupied; here any positive real value is allowed. In the coming explanation, the costs may be more general than what is permitted by starting from $ c(x)$, and the state transition graph does not need to be derived from a grid. Some initial values are assigned arbitrarily for all $ l(x,u)$. For example, in the planetary exploration application, the cost of traversing a level, unobstructed surface may be uniformly assumed.
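The tile-cost model above can be sketched as follows (in Python; the function and variable names are illustrative, not from the text). Each tile carries a positive traversal difficulty $c(x)$, actions are moves to four-neighbors that remain in the grid, and entering tile $x$ from a neighbor costs $l(x',u') = c(x)$.

```python
def make_grid_model(costs):
    """Build a state transition graph over a grid, assuming the
    tile-cost model described above. `costs` maps each tile (i, j)
    to its positive traversal difficulty c(x)."""
    X = set(costs)                                   # bounded set of grid tiles
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]       # four-neighbor actions
    def U(x):
        # actions available at x: moves whose target tile lies in X
        return [u for u in moves if (x[0] + u[0], x[1] + u[1]) in X]
    def f(x, u):
        # state transition: apply the move to the tile coordinates
        return (x[0] + u[0], x[1] + u[1])
    def l(x, u):
        # edge cost is the difficulty of the tile being entered
        return costs[f(x, u)]
    return X, U, f, l
```

Eight-neighbors could be used instead by extending the `moves` list with the diagonal offsets.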

The task is to navigate to some goal state, $ {x_{G}}$. The method works by initially constructing a feedback plan, $ \pi $, on a subset of $ X$ that includes both $ {x_{I}}$ and $ {x_{G}}$. The plan, $ \pi $, is computed by iteratively applying the procedure in Figure 12.18 until the optimal cost-to-go is known at $ x_I$. A priority queue, $ Q$, is maintained as in Dijkstra's algorithm; however, Stentz's algorithm allows the costs of elements in $ Q$ to be modified due to information sensed during execution. Let $ {G_{best}}(x)$ denote the lowest cost-to-go associated with $ x$ during the time it spends in $ Q$. Assume that $ Q$ is sorted according to $ {G_{best}}$. Let $ {G_{cur}}(x)$ denote its current cost-to-go value, which may actually be more than $ {G_{best}}(x)$ if some cost updates caused it to increase. Suppose that some $ u \in U(x)$ can be applied to reach a state $ x' = f(x,u)$. Let $ {G_{via}}(x,x')$ denote the cost-to-go from $ x$ by traveling via $ x'$,

$\displaystyle {G_{via}}(x,x') = {G_{cur}}(x') + l(x,u) .$ (12.24)

If $ {G_{via}}(x,x') < {G_{cur}}(x)$, then it indicates that $ {G_{cur}}(x)$ could be reduced. If $ {G_{cur}}(x') \leq {G_{best}}(x)$, then it is furthermore known that $ {G_{cur}}(x')$ is optimal. If both of these conditions are met, then $ {G_{cur}}(x)$ is updated to $ {G_{via}}(x,x')$.
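The update rule in (12.24) together with the two conditions above can be sketched as a single check (in Python, with dictionaries standing in for the cost-to-go tables; the function name is illustrative):

```python
def try_lower(x, x_prime, edge_cost, G_cur, G_best):
    """Attempt to lower G_cur(x) by traveling via x'.
    G_cur and G_best are dictionaries of the current and
    best-while-in-queue cost-to-go values from the text."""
    G_via = G_cur[x_prime] + edge_cost          # equation (12.24)
    # Lower G_cur(x) only if G_via improves it AND G_cur(x') is
    # known optimal, i.e., G_cur(x') <= G_best(x).
    if G_via < G_cur[x] and G_cur[x_prime] <= G_best[x]:
        G_cur[x] = G_via
        return True
    return False
```

For example, with $G_{cur}(x) = 10$, $G_{cur}(x') = 3$, edge cost $2$, and $G_{best}(x) = 5$, both conditions hold and $G_{cur}(x)$ is lowered to $5$; if instead $G_{best}(x) = 2$, the second condition fails and no update occurs.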

After the iterations of Figure 12.18 finish, the robot executes $ \pi $, which generates a sequence of visited states. Let $ x_k$ denote the current state during execution. If it is discovered that applying $ \pi(x_k) = u_k$ would incur a cost that does not match the cost $ l(x_k,u_k)$ in the current model, then the costs need to be updated. More generally, the robot may need to update costs within a region around $ x_k$ that corresponds to the sensor field of view. For the description below, assume that an update, $ l(x_k,u_k)$, is obtained for $ x_k$ only (the more general case is handled similarly). First, $ l(x_k,u_k)$ is updated to the newly measured value. If $ x_k$ happened to be dead (visited, but no longer in $ Q$), then it is inserted again into $ Q$, with cost $ {G_{cur}}(x_k)$. The steps in Figure 12.18 are performed until $ {G_{cur}}(x_k) \leq {G_{best}}(x)$ for all $ x \in Q$. Following this, the plan execution continues until either the goal is reached or another cost mismatch is discovered. At any time during execution, the robot motions are optimal given the current information about the costs [913].
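The interleaving of execution and cost repair described above can be sketched as follows (in Python). The names are illustrative: `sense_cost` stands in for the robot's sensor, `model` for the current cost table $l(x,u)$, and `repair` for the queue-processing steps of Figure 12.18 (reinserting a dead state into $Q$ and iterating until the termination condition holds).

```python
def execute_with_replanning(x_start, x_goal, pi, f, sense_cost, model, repair):
    """Follow the plan pi from x_start to x_goal; whenever the sensed
    cost of the next action disagrees with the model, update the model
    and repair the cost-to-go values before continuing."""
    x = x_start
    path = [x]
    while x != x_goal:
        u = pi(x)
        measured = sense_cost(x, u)
        if measured != model[(x, u)]:
            model[(x, u)] = measured  # update l(x_k, u_k) to the sensed value
            repair(x)                 # stand-in for Fig. 12.18 iterations until
                                      # G_cur(x_k) <= G_best(x) for all x in Q
            u = pi(x)                 # the repaired plan may prescribe a new action
        x = f(x, u)                   # apply the (possibly revised) action
        path.append(x)
    return path
```

Each action the robot applies is optimal with respect to the costs known at that moment, so the executed path is a concatenation of prefixes of optimal paths, as noted at the end of this section.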

Figure 12.18: Stentz's algorithm, often called $ D^*$ (pronounced ``dee star''), is a variant of Dijkstra's algorithm that dynamically updates cost values as the cost terms are learned during execution. The steps here are only one iteration of updating the costs after a removal of a state from $ Q$.
[Pseudocode figure: one iteration of STENTZ'S ALGORITHM, which removes the state $x$ with the lowest $ {G_{best}}$ value from $Q$ and applies the cost-update conditions described in the text.]

Figure 12.19: An example of executing Stentz's algorithm (courtesy of Tony Stentz).

Figure 12.19 illustrates the execution of the algorithm. Figure 12.19a shows a synthetic terrain that was generated by a stochastic fractal. Darker gray values indicate higher cost. In the center, very costly terrain acts as a barrier, for which an escape route exists in the downward direction. The initial state is at the middle of the left edge of the environment, and the goal state is on the right edge. The robot initially plans a straight-line path and then incrementally updates the path in each step as it moves. In Figure 12.19b, the robot has encountered the costly center and begins to search for a way around. Finally, the goal is reached, as shown in Figure 12.19c. The executed path is actually the result of executing a series of optimal paths, each of which is based on the known information at the time a single action is applied.

Steven M LaValle 2012-04-20