Background Papers
•    A technical report in PDF format on Microsoft's Easy Camera Calibration Tool by Microsoft Research's Zhengyou Zhang

•    A report on corner detection and its application to camera calibration by my summer mentor, Mark Livingston, and H. Harlyn Baker

•    A report on a four-step camera calibration procedure by Janne Heikkila and Olli Silven of Infotech Oulu

•    An explanation of various methods of interest point (corner) detection by the Berlin Technical University's Ute Schmid


Background Paper Review: The Microsoft Easy Camera Calibration Tool

by Zhengyou Zhang

   Once corners (or any other desired type of feature) are detected, labeled, and listed for a set of at least two images, the estimated corner coordinates and the model coordinates can in turn be used to estimate the parameters of the camera that took the images. These include the intrinsic parameters--among them skew and scaling in two dimensions--as well as the camera's position and orientation in 3-space for each image. Zhang presents an accurate but simple-to-understand mathematical routine for performing this calibration.

   I will not attempt to put down every math-laden detail of Zhang's routine, but I will cover the important points. Zhang takes in a model file and any number of image description files, each containing simply the coordinates of all important points, with every file listing its coordinates in the same order. These corresponding sets of coordinates are all the information needed to calibrate a camera. Next Zhang transforms each image's set of points to make them easier to work with: in his calculations, if the points are left at large distances from the (quite arbitrary) origin, small differences between values are lost to numerical round-off, and those small differences are the most important parts of the data.
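
   As a concrete picture of this kind of normalization, here is a minimal C++ sketch in the spirit of the step just described (the exact transform Zhang uses may differ; the centroid-and-scale scheme below is my own illustration):

    #include <cmath>
    #include <vector>

    struct Point2 { double x, y; };

    // Translate a (non-empty) point set so its centroid sits at the origin,
    // then scale it so the average distance from the origin is sqrt(2).
    // Working near the origin keeps the small coordinate differences that
    // matter most from being swamped by floating-point round-off.
    void normalizePoints(std::vector<Point2>& pts)
    {
        double cx = 0.0, cy = 0.0;
        for (const Point2& p : pts) { cx += p.x; cy += p.y; }
        cx /= pts.size();
        cy /= pts.size();

        double meanDist = 0.0;
        for (const Point2& p : pts)
            meanDist += std::sqrt((p.x - cx) * (p.x - cx) + (p.y - cy) * (p.y - cy));
        meanDist /= pts.size();

        const double scale = std::sqrt(2.0) / meanDist;
        for (Point2& p : pts) {
            p.x = (p.x - cx) * scale;
            p.y = (p.y - cy) * scale;
        }
    }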

   Now Zhang is ready to perform the calibration itself. First he estimates the transformations needed to move from the 3-D coordinate system of the photographed model to the coordinate system of each image; this can be done using the angles between points and some knowledge of perspective. Then he calls an optimizer (see my poster page for details), hands it all of the camera's parameters in matrix form, and evaluates each step by the distances, measured in the image plane, between each detected point and the projection of its corresponding model point. The smaller the value of this function, the better the estimate of the camera parameters.
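
   To make the quantity being minimized concrete, here is a hedged C++ sketch of such an evaluation: a pinhole projection followed by the sum of squared image-plane distances. The structure layouts and names are my own, not Zhang's code:

    #include <vector>

    struct Point2 { double x, y; };
    struct Point3 { double x, y, z; };

    // Intrinsic parameters: focal scaling in two dimensions, skew, and the
    // principal point (cx, cy).
    struct Intrinsics { double fx, fy, skew, cx, cy; };

    // Extrinsic parameters for one image: rotation matrix and translation.
    struct Pose { double R[3][3]; double t[3]; };

    // Pinhole projection of one 3-D model point into one image.
    Point2 project(const Intrinsics& K, const Pose& P, const Point3& M)
    {
        double Xc = P.R[0][0]*M.x + P.R[0][1]*M.y + P.R[0][2]*M.z + P.t[0];
        double Yc = P.R[1][0]*M.x + P.R[1][1]*M.y + P.R[1][2]*M.z + P.t[1];
        double Zc = P.R[2][0]*M.x + P.R[2][1]*M.y + P.R[2][2]*M.z + P.t[2];
        return { K.fx * Xc / Zc + K.skew * Yc / Zc + K.cx,
                 K.fy * Yc / Zc + K.cy };
    }

    // The value the optimizer drives down: the sum over all images of the
    // squared image-plane distance between each detected corner and the
    // projection of its corresponding model point.
    double reprojectionError(const Intrinsics& K,
                             const std::vector<Pose>& poses,
                             const std::vector<Point3>& model,
                             const std::vector<std::vector<Point2>>& detected)
    {
        double err = 0.0;
        for (size_t img = 0; img < detected.size(); ++img)
            for (size_t i = 0; i < model.size(); ++i) {
                Point2 p = project(K, poses[img], model[i]);
                double dx = p.x - detected[img][i].x;
                double dy = p.y - detected[img][i].y;
                err += dx * dx + dy * dy;
            }
        return err;
    }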

   This process is made easier if the model points are taken to lie on a single coordinate plane--say, Z = 0--before optimizing, as this moves the whole process to two dimensions. Zhang does this as well. Measured in running time, the optimization is almost the entire process, yet his routine is both mathematically simple and efficient. I will actually be rewriting his algorithm for our uses, which are a little more specific than Zhang's; see calibrate.cpp, my version of his routine, for details.
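
   For illustration, with the model plane fixed at Z = 0 each image is related to the model by a single 3x3 homography; applying one to a planar model point looks like this (a sketch, not Zhang's code):

    struct Point2 { double x, y; };

    // With the model plane fixed at Z = 0, one column of the rotation matrix
    // drops out of the projection, and each image is related to the model by
    // a single 3x3 homography H.
    Point2 applyHomography(const double H[3][3], double X, double Y)
    {
        double u = H[0][0]*X + H[0][1]*Y + H[0][2];
        double v = H[1][0]*X + H[1][1]*Y + H[1][2];
        double w = H[2][0]*X + H[2][1]*Y + H[2][2];
        return { u / w, v / w };  // divide out the homogeneous coordinate
    }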


Background Paper Review: Corner Detection and its Application to Camera Calibration

by Mark Livingston and H. Harlyn Baker

   Detecting corners in viewed scenes is one good way to calibrate video cameras--that is, to determine where they sit in 3-space and how they are oriented. Corners are among the easiest features to detect in a scene because of the high contrast between adjacent regions. This contrast is also why high-contrast black-and-white calibration images work better than color images, whose neighboring regions differ less sharply.

   Two methods of estimating a Plessey response for a window of minuscule size (analogous to finding the instantaneous image derivative) are researched and explained: a gradient search and a scale-space search.

   The image gradient is a map of the derivatives of the image in all directions. The gradient corresponds very well to the error in corner points detected by a modified Plessey algorithm (which actually finds the image derivatives in two directions); experiments were conducted on how both the image gradient and the conjugate gradient (perpendicular to it) correlate with the error in the points found by the Plessey function. This error is roughly constant for a given set of images, implying that with a gradient of high enough contrast the algorithm's maximum error could be driven extremely close to zero.
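
   For reference, here is a minimal sketch of computing such a gradient map with central differences; the row-major grayscale image layout is an assumption made for illustration:

    #include <vector>

    // A grayscale image stored row-major: width * height intensity values.
    struct Image {
        int width, height;
        std::vector<double> pixels;
        double at(int x, int y) const { return pixels[y * width + x]; }
    };

    // Central-difference estimates of the image derivatives in x and y,
    // computed at every interior pixel. Together they form the gradient map:
    // a direction and magnitude of intensity change across the image.
    void imageGradient(const Image& img,
                       std::vector<double>& gx, std::vector<double>& gy)
    {
        gx.assign(img.pixels.size(), 0.0);
        gy.assign(img.pixels.size(), 0.0);
        for (int y = 1; y < img.height - 1; ++y)
            for (int x = 1; x < img.width - 1; ++x) {
                gx[y * img.width + x] = 0.5 * (img.at(x + 1, y) - img.at(x - 1, y));
                gy[y * img.width + x] = 0.5 * (img.at(x, y + 1) - img.at(x, y - 1));
            }
    }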

   A scale-space search attempts to extrapolate the derivative values for an infinitesimally small area of the 2-D scene from the values measured over several windows of finite size. The larger the window, the less accurate the result; but at small enough sizes--3 to 5 pixels across--the windows provide little information due to gradient blurring.

   The scale-space search provided better correlations between the gradient and the error values, while the gradient search, strangely, correlated better with the conjugate gradient than with the gradient itself. This is counterintuitive and will be researched further.


Background Paper Review: A Four-Step Camera Calibration Procedure with Implicit Image Correction

by Janne Heikkila and Olli Silven

   A four-step procedure for calibrating cameras extremely accurately is presented. The paper also describes the pinhole model of a camera, which assumes an ideal projection with no lens distortion. The approach built up in the paper is a way of calculating a camera's intrinsic parameters from a small amount of known outside information.

   A homogeneous coordinate system, adjusted for an unknown angle between the "plane" of the lens and the "plane" captured in the image, is used to model lens distortion. Heikkila and Silven use estimation of the camera's line of sight, given an image, to help estimate camera coordinates at roughly ten times the accuracy of previous methods.
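
   The distortion being corrected is typically modeled as a combination of radial and tangential (decentering) terms. As a hedged illustration--the coefficient names and exact form below are the standard ones, not necessarily Heikkila and Silven's own notation--applying such a model to a normalized image point looks like this:

    #include <utility>

    // Apply a radial-plus-tangential lens distortion model to a normalized
    // image point (x, y). k1 and k2 are radial coefficients; p1 and p2 are
    // tangential (decentering) coefficients. Calibration estimates these
    // coefficients so the distortion can be corrected.
    std::pair<double, double> distort(double x, double y,
                                      double k1, double k2,
                                      double p1, double p2)
    {
        double r2 = x * x + y * y;                    // squared radius
        double radial = 1.0 + k1 * r2 + k2 * r2 * r2; // radial scaling
        double xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x);
        double yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y;
        return { xd, yd };
    }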

   The article affirms that the more interest points, or image features, are visible, the better those features can be placed in 3-space, and also that the better the features' positions are known, the better the camera's position can be estimated. This should come as no revelation.


Background Paper Review: UNKNOWN [Article Extract]

by Ute Schmid

   "Interest points" are points in an image that hold interest for a user, such as corners or points of high intensity. Three main types of detectors for interest points are presented and explained in detail.

   Contour-based detectors extract the contours of the image function and select points of maximal or minimal curvature along them. Some variants instead split the contours into linear parts and locate the intersections of large numbers of line segments.

   Intensity-based methods select points at the centers of multi-pixel-wide areas of high image gradient; in other words, they locate areas that stand out in color. One intensity-based method, known to Schmid as "Harris", uses the eigenvalues of a function known as the auto-correlation matrix to determine interest points very simply: the larger the gradient in both directions at a point, the more likely the point is to be of interest to us.

   Parametric intensity models fit parametric equations--I as a function of x and y--to areas of the image to find areas of interest within the functions. They have very good accuracy, but they operate only in the two dimensions of the image, so they can basically only detect corners lying at very small angles to the conventional axes.

   The Harris method is of most importance to us, because it is the method we will be using to detect our interest points (corners specifically). The auto-correlation matrix A is simply an approximation to a local measure of changes in the image gradient--the multidimensional derivative of the image function I, which gives the color value(s) in the image at each point.
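
   As a concrete sketch of the Harris response computed from A (the window size and the constant k = 0.04 below are conventional choices assumed for illustration; the gradient arrays are of the kind produced by the sketch in the Livingston and Baker review above):

    #include <vector>

    // Harris/Plessey corner response at pixel (x, y). The 2x2 auto-correlation
    // matrix A is accumulated over a small window of gradient products:
    //     A = [ sum(Ix*Ix)  sum(Ix*Iy) ]
    //         [ sum(Ix*Iy)  sum(Iy*Iy) ]
    // Both eigenvalues of A are large only where the gradient is strong in two
    // directions at once, i.e. at a corner. The usual response
    // det(A) - k * trace(A)^2 avoids computing the eigenvalues explicitly.
    double harrisResponse(const std::vector<double>& gx,
                          const std::vector<double>& gy,
                          int width, int x, int y,
                          int halfWindow = 2, double k = 0.04)
    {
        double a = 0.0, b = 0.0, c = 0.0;  // A = [a b; b c]
        for (int dy = -halfWindow; dy <= halfWindow; ++dy)
            for (int dx = -halfWindow; dx <= halfWindow; ++dx) {
                int idx = (y + dy) * width + (x + dx);
                a += gx[idx] * gx[idx];
                b += gx[idx] * gy[idx];
                c += gy[idx] * gy[idx];
            }
        double det = a * c - b * b;
        double trace = a + c;
        return det - k * trace * trace;
    }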