The power of visibility

Figure 9.11: The real world contains special features, which are determined to lie along a line segment that connects to the focal point via perspective projection.

The most powerful paradigm for 6-DOF tracking is visibility. The idea is to identify special parts of the physical world called features and calculate their positions along a line-of-sight ray to a known location. Figure 9.11 shows an example inspired by a camera, but other hardware could be used. One crucial aspect for tracking is distinguishability. If all features appear to be the same, then it may become difficult to determine and maintain "which is which" during the tracking process. Each feature should be assigned a unique label that is invariant over time, as rigid bodies in the world move. Confusing features with each other could cause catastrophically bad estimates of the body pose.
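To make the line-of-sight idea concrete, the following is a minimal sketch under an assumed ideal pinhole camera model. The focal point is placed at the origin, the camera is assumed to look along the positive z axis, and the focal length `f` and principal point `(cx, cy)` are hypothetical parameters expressed in pixel units. None of these conventions come from the text; they are chosen only for illustration. The sketch shows that a feature observed at a pixel is constrained to lie somewhere along a ray through the focal point, but its depth along that ray remains unknown from a single observation.

```python
import math

def project(point, f, cx, cy):
    """Perspective-project a 3D point (camera frame, z > 0) to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (f * x / z + cx, f * y / z + cy)

def pixel_to_ray(u, v, f, cx, cy):
    """Unit direction of the line-of-sight ray from the focal point through pixel (u, v)."""
    d = (u - cx, v - cy, f)
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def distance_to_ray(point, direction):
    """Distance from a 3D point to the ray from the origin along unit `direction`."""
    t = sum(p * d for p, d in zip(point, direction))   # length of the projection onto the ray
    closest = tuple(t * d for d in direction)          # nearest point on the ray
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(point, closest)))

# Assumed example values: a feature 2 units in front of the camera.
feature = (0.3, -0.2, 2.0)
u, v = project(feature, f=800.0, cx=320.0, cy=240.0)
ray = pixel_to_ray(u, v, f=800.0, cx=320.0, cy=240.0)
print(distance_to_ray(feature, ray))  # essentially zero: the feature lies on its ray
```

Any scalar multiple of the feature position projects to the same pixel, which is why one ray alone cannot fix the feature's distance; additional features or viewpoints are needed to resolve the full pose.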

The most common sensor used to detect features is a digital camera. Detecting, labeling, and tracking features are common tasks in computer vision or image processing. There are two options for features:

  1. Natural: The features are automatically discovered, assigned labels, and maintained during the tracking process.
  2. Artificial: The features are engineered and placed into the environment so that they can be easily detected, matched to preassigned labels, and tracked.

Natural features are advantageous because there are no setup costs. The environment does not need to be engineered. Unfortunately, they are also much less reliable. With a camera, detecting and tracking natural features is considered to be a hard computer vision problem because it may be as challenging as it is for the human visual system. For some objects, textures, and lighting conditions, it could work well, but it is extremely hard to make it work reliably for all possible settings. Imagine trying to find and track features on an empty, white wall. Therefore, artificial features are much more common in products.

Figure 9.12: A sample QR code, which could be printed and used as an artificial feature. (Picture from Wikipedia.)

For artificial features, one of the simplest solutions is to print a special tag onto the object to be tracked. For example, one could print bright red dots onto the object and then scan for their appearance as red blobs in the image. To solve the distinguishability problem, multiple colors, such as red, green, blue, and yellow dots, might be needed. Trouble may occur if these colors exist naturally in other parts of the image. A more reliable method is to design a specific tag that is clearly distinct from the rest of the image. Such tags can be coded to contain large amounts of information, including a unique identification number. One of the most common coded tags is the QR code, an example of which is shown in Figure 9.12.
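As a rough illustration of the colored-dot approach, the sketch below scans a tiny image for bright red pixels and reports their centroid. The image representation (a nested list of RGB tuples), the function names, and the threshold values are all assumptions made for this example. It also treats every matching pixel as part of a single blob, with no connected-component analysis, so it is far simpler than a practical detector.

```python
def is_red(pixel, min_red=200, max_other=80):
    """Crude test for a bright red pixel; the thresholds are assumed values."""
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other

def blob_centroid(image, predicate):
    """Return the (row, col) centroid of all pixels satisfying `predicate`, or None."""
    total_row = total_col = count = 0
    for row_index, row in enumerate(image):
        for col_index, pixel in enumerate(row):
            if predicate(pixel):
                total_row += row_index
                total_col += col_index
                count += 1
    if count == 0:
        return None
    return (total_row / count, total_col / count)

# A 5x5 gray image with a 2x2 red dot covering rows 1-2 and columns 2-3.
GRAY, RED = (128, 128, 128), (255, 0, 0)
image = [[GRAY for _ in range(5)] for _ in range(5)]
for row_index in (1, 2):
    for col_index in (2, 3):
        image[row_index][col_index] = RED

print(blob_centroid(image, is_red))  # (1.5, 2.5)
```

If a second red region appeared elsewhere in the image, this centroid would be pulled toward it. That is the kind of trouble that arises when the chosen colors occur naturally in the scene, and it motivates tags that are designed to be clearly distinct.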

The features described so far are called passive because they do not emit energy. The hope is that sufficient light is in the world so that enough reflects off the feature and enters the camera sensor. A more reliable alternative is to engineer active features that emit their own light. For example, colored LEDs can be mounted on the surface of a headset or controller. This comes at the expense of requiring a power source and increasing overall object cost and weight. Furthermore, the industrial design of the object may be compromised because it might light up like a Christmas tree.

Figure 9.13: The Oculus Rift headset contains IR LEDs hidden behind IR-transparent plastic. (Photo from

Steven M LaValle 2020-01-06