Camera-based implementation

Figure 9.15: Two cases for camera placement: (a) A world-fixed camera is stationary, and the motions of objects relative to it are estimated using features on the objects. (b) An object-fixed camera is frequently under motion and features are ideally fixed to the world coordinate frame.

The visibility problem may be solved using a camera in two general ways, as indicated in Figure 9.15. Consider the camera frame, which is analogous to the eye frame from Figure 3.14 in Chapter 3. A world-fixed camera is usually stationary, meaning that the camera frame does not move relative to the world. A single transformation may be used to convert an object pose as estimated from the camera frame into a convenient world frame. For example, in the case of the Oculus Rift headset, the head pose could be converted to a world frame in which the $ -z$ direction is pointing at the camera, $ y$ is ``up'', and the position is in the center of the camera's tracking region or a suitable default based on the user's initial head position. For an object-fixed camera, the estimated pose, derived from features that remain fixed in the world, is the transformation from the camera frame to the world frame. This case would be obtained, for example, if QR codes were placed on the walls.
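The single transformation for the world-fixed case can be sketched with 4 by 4 homogeneous matrices. This is a minimal illustration, not a description of any particular headset: the names `T_world_camera` and `T_camera_head`, the helper `yaw_translate`, and all numeric values are assumptions made for the example.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(t):
    """Invert a rigid-body transform [R p; 0 1] as [R^T, -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def yaw_translate(yaw, x, y, z):
    """Hypothetical helper: rotate by yaw about the y axis, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s, x],
            [0.0, 1.0, 0.0, y],
            [-s, 0.0, c, z],
            [0.0, 0.0, 0.0, 1.0]]

# Assumed fixed pose of the stationary camera in the world frame.
T_world_camera = yaw_translate(math.pi, 0.0, 1.5, 2.0)

# Assumed head pose as estimated in the camera frame.
T_camera_head = yaw_translate(0.1, 0.2, -0.1, 1.0)

# One matrix product converts the camera-frame estimate to the world frame.
T_world_head = mat_mul(T_world_camera, T_camera_head)
head_position = [T_world_head[i][3] for i in range(3)]
print(head_position)  # approximately (-0.2, 1.4, 1.0)

# Applying the inverse of the fixed transform recovers the camera-frame estimate.
T_back = mat_mul(rigid_inverse(T_world_camera), T_world_head)
```

Because the camera does not move, `T_world_camera` would be determined once and reused for every frame.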

As in the case of an IMU, calibration is important for improving sensing accuracy. The following homogeneous transformation matrix can be applied to the image produced by a camera:

$\displaystyle \begin{bmatrix}\alpha_x & \gamma & u_0 \\ 0 & \alpha_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$ (9.23)

The five variables appearing in the matrix are called the intrinsic parameters of the camera. The $ \alpha_x$ and $ \alpha_y$ parameters handle scaling, $ \gamma $ handles shearing, and $ u_0$ and $ v_0$ handle the offset of the optical axis. These parameters are typically estimated by taking images of an object for which all dimensions and distances have been carefully measured, and performing least-squares estimation to select the parameters that minimize the sum-of-squares error (as described in Section 9.1). For a wide-angle lens, further calibration may be needed to overcome optical distortions (recall Section 7.3).
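As a rough sketch of how the matrix in (9.23) acts and how its entries might be fitted, the following code applies the five parameters to a point and then recovers them by least squares. It makes a strong simplifying assumption that is not part of the text above: the undistorted image coordinates $ (x,y)$ of each calibration point are taken as already known, so the problem becomes linear. A practical calibration would also have to estimate the pose of the calibration object and any lens distortion. The function names and the numeric values are hypothetical.

```python
def apply_intrinsics(params, x, y):
    """Multiply the intrinsic matrix by (x, y, 1) and return pixel (u, v)."""
    alpha_x, alpha_y, gamma, u0, v0 = params
    u = alpha_x * x + gamma * y + u0
    v = alpha_y * y + v0
    return u, v

def solve_linear(a, b):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [mr - f * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(rows, targets):
    """Minimize the sum of squared residuals using the normal equations."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(ata, atb)

def fit_intrinsics(points, pixels):
    """Estimate (alpha_x, alpha_y, gamma, u0, v0) from matched point pairs."""
    # u = alpha_x * x + gamma * y + u0 is linear in (alpha_x, gamma, u0).
    alpha_x, gamma, u0 = least_squares([[x, y, 1.0] for x, y in points],
                                       [u for u, _ in pixels])
    # v = alpha_y * y + v0 is linear in (alpha_y, v0).
    alpha_y, v0 = least_squares([[y, 1.0] for _, y in points],
                                [v for _, v in pixels])
    return alpha_x, alpha_y, gamma, u0, v0

# Synthetic, noise-free data generated from assumed parameter values.
true_params = (800.0, 790.0, 0.5, 320.0, 240.0)
points = [(x, y) for x in (-0.3, 0.0, 0.3) for y in (-0.2, 0.0, 0.2)]
pixels = [apply_intrinsics(true_params, x, y) for x, y in points]
estimated = fit_intrinsics(points, pixels)
```

With noisy measurements the same fit would return the parameters that minimize the sum-of-squares error rather than reproducing the true values exactly.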

Now suppose that a feature has been observed in the image, perhaps using some form of blob detection to extract the pixels that correspond to it from the rest of the image [285,323]. This is easiest for a global shutter camera because all pixels will correspond to the same instant of time. In the case of a rolling shutter, the image may need to be transformed to undo the effects of motion (recall Figure 4.33). The location of the observed feature is calculated as a statistic of the blob pixel locations. Most commonly, the average over all blob pixels is used, resulting in non-integer image coordinates. Many issues affect performance: 1) quantization errors arise due to image coordinates for each blob pixel being integers; 2) if the feature does not cover enough pixels, then the quantization errors are worse; 3) changes in lighting conditions may make it difficult to extract the feature, especially in the case of natural features; 4) at some angles, two or more features may become close in the image, making it difficult to separate their corresponding blobs; 5) as various features enter or leave the camera view, the resulting estimated pose may jump. Furthermore, errors tend to be larger along the direction of the optical axis.
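The averaging step can be illustrated with a short sketch. The thresholding used here to pick out blob pixels is only a stand-in for a real blob detector and assumes a single bright feature in the image; the function names, the tiny image, and the threshold value are all hypothetical.

```python
def threshold_blob(image, threshold):
    """Collect (column, row) coordinates of pixels brighter than threshold."""
    return [(c, r)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value > threshold]

def blob_centroid(pixels):
    """Average the integer pixel coordinates of a blob."""
    if not pixels:
        raise ValueError("blob has no pixels")
    n = len(pixels)
    u = sum(p[0] for p in pixels) / n
    v = sum(p[1] for p in pixels) / n
    return u, v

# A small grayscale image containing one bright feature (assumed data).
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]

blob = threshold_blob(image, 5)
centroid = blob_centroid(blob)
print(centroid)  # (1.8, 1.6)
```

The five blob pixels have integer coordinates, yet their average is non-integer. With so few pixels, adding or losing a single pixel at the blob boundary shifts the estimate noticeably, which is the effect described in issues 1 and 2.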

Steven M LaValle 2020-01-06