Motion Strategies for Changing, Partially-Predictable Environments
Com S 476/576 FINAL PROJECT - Spring 1998

Pradeep C Bollineni


Project Description

Problem Description

In this project we are concerned with motion plans for a robot in an environment that changes over time and is not completely predictable. We model the environment as a Markov process. We assume that the environment is always in exactly one state, drawn from the set of all possible states, and that the transition from one state to the next follows a probability distribution specified a priori. The robot knows the current state of the environment but not its future states. The only knowledge it has about the future is the probability with which each state (including the current one) will be reached at the next time instant. Hence the name partially predictable.

This class of problems (motion planning in a changing environment) is inherently difficult, and a number of intractability results have been proved. Motion planning in a partially predictable environment is hard because the robot needs a strategy that can deal with any contingency it might confront. All that is known in advance is the initial state of the environment and the initial position of the robot.

In this project, we deal with a point robot in a two-dimensional changing environment in which a door can open and close with a certain probability. The aim of the robot is to reach the goal in such an environment while optimizing some performance measure (here we consider only time optimality). So, we are dealing with a two-dimensional configuration space.

Basic Approach:

We discretize the world into a number of points. In the examples shown we consider a 20*20 grid of points. Using a dynamic programming algorithm we find the best strategy for reaching the goal from each point. Using these strategies, generated a priori, the robot finds a path from its initial position to the goal that minimizes the time taken to reach the goal.

Using the dynamic programming algorithm we generate the optimal cost to go to the goal from each point. The algorithm runs over a number of iterations. In each iteration the cost matrix of the previous iteration is used to calculate the cost matrix for the current iteration. Initially, we assign a cost of 0 to all points within the goal region (if the robot is inside the goal region then there is no cost involved) and an infinite cost to all other points.

At each point we consider a set of actions (30 actions in the examples shown), including a no-movement action. We choose the action that costs the least, that is, the action that could probably get the robot to the goal in minimum time from that point at this iteration. We then update the cost to go from the current point to be the minimum among the costs of the actions considered.

After the first iteration, all points that can reach the goal in a single step have their costs set to the single-step time, while the costs at all other points remain infinite. In this manner, after each iteration some more points in the world have their costs set to the minimum cost (time) to reach the goal. The algorithm runs until all points in the free space have their costs set to the minimum cost to reach the goal. Note that the points within the obstacle regions and the dynamic regions (those regions which change with time, e.g. a door that opens and closes) keep an infinite cost, because a robot inside these regions has no way to reach the goal.
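The iteration described above can be sketched in a few lines of Python. This is a simplified illustration, not the project's code: it assumes a single environment state, four axis-aligned unit moves plus a no-movement action (instead of the 30 actions used in the project), and a fixed time per step, so no interpolation is needed. The names value_iteration, free, goal and step_time are hypothetical.

```python
import math

def value_iteration(free, goal, step_time=1.0, max_iters=1000):
    """Synchronous cost-to-go iteration on a grid.

    free: 2D list of bools, True where the robot may be.
    goal: set of (row, col) cells forming the goal region.
    Assumption: four unit moves plus "stay", each taking step_time.
    """
    rows, cols = len(free), len(free[0])
    # Zero cost inside the goal, infinite cost everywhere else.
    cost = [[0.0 if (r, c) in goal else math.inf for c in range(cols)]
            for r in range(rows)]
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # (0, 0) is "stay"
    for iteration in range(1, max_iters + 1):
        new_cost = [row[:] for row in cost]
        changed = False
        for r in range(rows):
            for c in range(cols):
                if not free[r][c] or (r, c) in goal:
                    continue  # obstacles stay infinite, goal stays zero
                best = math.inf
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and free[nr][nc]:
                        # Cost of the action uses the PREVIOUS iteration's matrix.
                        best = min(best, step_time + cost[nr][nc])
                if best < cost[r][c]:
                    new_cost[r][c] = best
                    changed = True
        cost = new_cost
        if not changed:
            return cost, iteration  # no cost improved: converged
    return cost, max_iters

# Example: 3x3 world with an obstacle in the centre and the goal in one corner.
free = [[True, True, True],
        [True, False, True],
        [True, True, True]]
cost, iterations = value_iteration(free, {(0, 0)})
```

The stopping rule used here (no cost changed during a sweep) is one simple choice and is not necessarily the termination test used in the project.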

We are dealing with a changing environment and assume that there is a fixed number of states in which the environment can be. Some regions (dynamic regions) are present in some of the states and absent in others. When we consider an action, we do not know which state the environment will be in next, so we assume nothing about the future state and instead weigh all the states by the given probabilities. For example, suppose there are two possible states (s1, s2) for the environment and an action takes the robot to a point K in the free space, whose costs are K_s1 (in state s1) and K_s2 (in state s2). Also assume that the current state is s1, that the probability for the next state to be s2 is P12, and that the probability for the next state to be s1 again is P11 (note that P11+P12=1). Then the cost for the action is found as C = K_s1 * P11 + K_s2 * P12.
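The weighted cost above generalizes directly to any number of states. A minimal sketch follows, assuming the transition probabilities are stored as a row-stochastic matrix with 0-based indices (so state s1 is index 0); the function name and argument layout are illustrative only.

```python
def expected_action_cost(current_state, transition, costs_at_target):
    """Probability-weighted cost of an action that lands at one point K.

    transition[i][j]: probability that state i is followed by state j.
    costs_at_target[j]: cost to go from K when the environment is in state j.
    """
    probs = transition[current_state]
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("transition probabilities must sum to 1")
    return sum(p * k for p, k in zip(probs, costs_at_target))

# Two states: index 0 is s1, index 1 is s2 (illustrative numbers).
transition = [[0.9, 0.1],   # P11, P12
              [0.3, 0.7]]   # P21, P22
c = expected_action_cost(0, transition, [4.0, 10.0])  # K_s1 = 4, K_s2 = 10
```

With these illustrative numbers the result is 0.9 * 4 + 0.1 * 10 = 4.6, which matches C = K_s1 * P11 + K_s2 * P12.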

Another key aspect of this calculation is as follows. Because we consider a fixed set of actions, an action may not take the robot exactly to a discretized point. So, to calculate the cost from the point where the action takes the robot to the goal, we use interpolation. We use linear interpolation in this project. Specifically, we find the four discretized points that surround the point the robot reaches after the action, and take a weighted average of the costs at these four points to obtain the cost at the point we are interested in.
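One common way to form such a weighted average is bilinear interpolation, sketched below. The source does not state the exact weights used, so the weighting here is an assumption, as are the unit grid spacing and the name interpolate_cost.

```python
import math

def interpolate_cost(cost, x, y):
    """Bilinear interpolation of grid costs at a continuous point (x, y).

    cost[i][j] is the cost at grid point x = i, y = j (unit spacing assumed;
    the grid must be at least 2 x 2).
    """
    max_i, max_j = len(cost) - 1, len(cost[0]) - 1
    if not (0 <= x <= max_i and 0 <= y <= max_j):
        raise ValueError("point lies outside the grid")
    # Lower corner of the enclosing cell, clamped so i0 + 1 and j0 + 1 stay in range.
    i0 = min(int(math.floor(x)), max_i - 1)
    j0 = min(int(math.floor(y)), max_j - 1)
    fx, fy = x - i0, y - j0
    corners = [
        ((1 - fx) * (1 - fy), cost[i0][j0]),
        (fx * (1 - fy), cost[i0 + 1][j0]),
        ((1 - fx) * fy, cost[i0][j0 + 1]),
        (fx * fy, cost[i0 + 1][j0 + 1]),
    ]
    # Skip zero-weight corners so an infinite cost there cannot yield 0 * inf = nan.
    return sum(w * c for w, c in corners if w > 0.0)

grid = [[0.0, 1.0],
        [2.0, 3.0]]
centre = interpolate_cost(grid, 0.5, 0.5)  # equal weights on all four corners
```

Note that if any corner with non-zero weight has infinite cost (for example, a point inside an obstacle), the interpolated cost is infinite as well, so actions that land next to obstacles are penalized under this scheme.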

Once the best cost to reach the goal from each point has been found, we stop the dynamic programming algorithm. We are then ready for the execution stage, in which a robot placed at some position in the free space finds its way to the goal region. To simulate the environment process we use a random number generator and select the next environment state based on the random value drawn.
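Selecting the next state from a random value can be done by walking the cumulative probabilities of the current state's row. The sketch below assumes the same row-stochastic transition matrix layout with 0-based state indices; sample_next_state is a hypothetical name, not taken from the project's source code.

```python
import random

def sample_next_state(current_state, transition, rng=random):
    """Draw the next environment state from row current_state of the matrix."""
    u = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for next_state, p in enumerate(transition[current_state]):
        cumulative += p
        if u < cumulative:
            return next_state
    # Guard against rounding when the row sums to slightly less than 1.
    return len(transition[current_state]) - 1

# Simulate a short door history: index 0 = open, index 1 = closed (illustrative values).
transition = [[0.8, 0.2],
              [0.4, 0.6]]
rng = random.Random(1998)  # seeded so the run is repeatable
state = 0
history = [state]
for _ in range(10):
    state = sample_next_state(state, transition, rng)
    history.append(state)
```

During execution, the robot would observe the sampled state at each time instant and apply the action stored for its current position and that state.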

Implementation Issues

The most crucial part of this project from the implementation point of view is the dynamic programming algorithm, which is central to the generation of a motion plan. In particular, we need to be careful in setting the termination condition for this algorithm. In this project three conditions are checked for termination, and at least one of them must be satisfied at each of the discretized points. The conditions are-

Also we should be careful when considering points close to the dynamic regions. In particular, we should not allow the door to close when the robot is close to it. This can be implemented by forcing the door to remain open, if it is open and the robot is close to the door.
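A minimal sketch of this override is given below. It assumes the door is represented by a single reference point and that "close" means within some safety radius; both are illustrative assumptions, as are the names OPEN, CLOSED and constrained_next_state.

```python
import math

OPEN, CLOSED = 0, 1  # hypothetical state labels

def constrained_next_state(current_state, sampled_state, robot_xy, door_xy,
                           safety_radius):
    """Keep an open door open while the robot is within safety_radius of it.

    sampled_state is the state proposed by the random environment process.
    """
    near_door = math.dist(robot_xy, door_xy) <= safety_radius
    if current_state == OPEN and near_door:
        return OPEN  # override: the door may not close on the robot
    return sampled_state

# The random process proposes CLOSED, but the robot is right next to the door.
state = constrained_next_state(OPEN, CLOSED, (5.0, 9.5), (5.0, 10.0), 1.0)
```

The override only applies when the door is currently open; a closed door near the robot still follows the sampled transition.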

If distance, rather than time, is chosen as the quantity to optimize, the plan can deadlock. At some points the robot concludes that staying where it is is better than any other action, so it never reaches the goal. This happens because waiting adds no distance: the distance the robot has to travel to reach the goal can be covered at any point on the time line, so the robot can keep postponing the move indefinitely at no cost.
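One way to see the difference is to compare the cost of the no-movement action under the two measures. The comparison below is a sketch for a single environment state; the symbols C(x) (cost to go from point x), \Delta t (time per step), d_a (distance covered by motion action a) and x'_a (point reached by action a) are notation introduced here, not taken from the project.

```latex
\begin{align*}
\text{time cost:} \quad
  & \underbrace{\Delta t + C(x)}_{\text{wait at } x}
    \;>\; C(x) \;=\; \min_{a}\,\bigl[\Delta t + C(x'_a)\bigr], \\
\text{distance cost:} \quad
  & \underbrace{0 + C(x)}_{\text{wait at } x}
    \;=\; C(x) \;=\; \min_{a}\,\bigl[d_a + C(x'_a)\bigr].
\end{align*}
```

Under the time measure, waiting is always strictly worse than the best motion action. Under the distance measure, waiting ties with the best motion action, so tie-breaking or small interpolation errors can make waiting the selected action at some points.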

Scope for Improvement:

In the dynamic programming algorithm, each iteration uses the cost matrix from the previous iteration to calculate the cost matrix for the current iteration. But from one matrix to the next only a few of the costs change, so carrying out this computation over all the discretized points is wasted effort. Instead, each iteration could consider only those points whose costs will change in that iteration, which would reduce the time per iteration. The important point is to identify which values change in which iteration.
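One possible way to identify the points worth recomputing is to track which costs changed in the previous sweep and revisit only the points that can reach them in one action. The sketch below illustrates the idea under simplifying assumptions that differ from the project (one environment state, four unit moves, no interpolation); it was not part of the project, and all names are hypothetical.

```python
import math

def value_iteration_frontier(free, goal, step_time=1.0):
    """Cost-to-go iteration that only revisits neighbours of changed cells."""
    rows, cols = len(free), len(free[0])
    cost = [[0.0 if (r, c) in goal else math.inf for c in range(cols)]
            for r in range(rows)]
    neighbours = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def free_neighbours(r, c):
        for dr, dc in neighbours:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and free[nr][nc]:
                yield nr, nc

    changed = set(goal)  # cells whose cost changed in the previous sweep
    evaluations = 0      # number of per-cell updates actually performed
    while changed:
        # Only cells adjacent to a changed cell can improve in this sweep.
        candidates = {cell for r, c in changed
                      for cell in free_neighbours(r, c) if cell not in goal}
        new_values = {}
        for r, c in candidates:
            best = min((step_time + cost[nr][nc]
                        for nr, nc in free_neighbours(r, c)), default=math.inf)
            if best < cost[r][c]:
                new_values[(r, c)] = best
        evaluations += len(candidates)
        # Apply all improvements at once, as in the synchronous iteration.
        for (r, c), value in new_values.items():
            cost[r][c] = value
        changed = set(new_values)
    return cost, evaluations

free = [[True, True, True],
        [True, False, True],
        [True, True, True]]
cost, evaluations = value_iteration_frontier(free, {(0, 0)})
```

With interpolation and a larger action set, the candidate set would have to include every point whose actions land in a grid cell with a changed corner, which makes the bookkeeping more involved than in this sketch.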

Trade-Offs in Parameters

This algorithm is resolution complete. Its accuracy depends on the number of discretized points and the number of actions considered. As these parameters are increased, the time for each execution goes up, while keeping them low affects the correctness (optimality) of the result. So there is an important trade-off in selecting these parameters.

Parameters Used & Computation Results

In this project we used a 20*20 grid, and 30 actions were considered at each point. One dynamic region was used, a door, which can be in one of two states (1 => open, 2 => closed), whose transition probabilities are represented by the 4-tuple (P11,P12,P22,P21), where-

The program was tested for various probabilities. It was found that the Dynamic Programming Algorithm terminates in about 30 to 60 iterations. Each iteration took about 1.7 seconds.

References:

On Motion Planning in Changing, Partially-Predictable Environments, Steven M. LaValle, Rajeev Sharma.


Computed Examples


Implementation Files

Source Code:

Input Data Files: