Zhang's Algorithm

 

Zhang's algorithm for calibration takes three inputs: several files listing the coordinates of corner points detected in various images taken by the camera being calibrated, sorted so that corresponding points appear in the same order; a model file with a sorted list of coordinates in a separate coordinate system; and a boolean indicating whether distortion is to be modeled (my rewrite of his program implicitly assumes simple distortion, for maximum accuracy and least time taken). It produces estimates of the position of the center of the camera lens with respect to any image; the horizontal and vertical focal lengths of the camera; the distortion assumed for the camera lens; and the homographies (plane-to-plane projective transformations) geometrically connecting the planes holding all the images to the plane holding the model, which is assumed to be parallel to the xy-plane with z=1, for ease of calculation.
Normalization: The coordinates of the detected image points and those of the model points, which were physically measured and entered, are expressed in different coordinate systems and at different scales. Both sets of points are therefore normalized to bring them into comparable ranges before the homographies from the model plane to each image plane are calculated.
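One common way to carry out this kind of normalization is to translate each point set so that its centroid sits at the origin and to scale it so that the mean distance from the origin is sqrt(2). The sketch below assumes that scheme; whether the program described here uses exactly this one is an assumption, and the names `normalization_matrix` and `apply_transform` and the sample coordinates are hypothetical.

```python
import math

def normalization_matrix(points):
    """Return a 3x3 matrix T that moves the centroid of `points` to the
    origin and scales them so their mean distance from it is sqrt(2)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    s = math.sqrt(2) / mean_dist
    return [[s, 0.0, -s * cx],
            [0.0, s, -s * cy],
            [0.0, 0.0, 1.0]]

def apply_transform(T, points):
    """Apply a 3x3 scale-and-shift matrix to 2-D points (third coordinate stays 1)."""
    return [(T[0][0] * x + T[0][1] * y + T[0][2],
             T[1][0] * x + T[1][1] * y + T[1][2]) for x, y in points]

# Hypothetical pixel coordinates of four detected corners.
image_pts = [(100.0, 200.0), (300.0, 200.0), (300.0, 400.0), (100.0, 400.0)]
T = normalization_matrix(image_pts)
normalized = apply_transform(T, image_pts)
```

The same function would be applied separately to the model points. A homography Hn estimated between the two normalized sets is then converted back to the original coordinates as inverse(T_image) * Hn * T_model.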
Homographies: A homography is composed of a rotation (in this case, in 3-D) and a translation; in addition, the z-movement created by the rotation is discarded so that all points, once mapped by the homography, lie on the plane z=1. By mapping one plane to another, a homography provides a correspondence between two sets of points on different planes, which lets us measure the error in our estimate of one set of points that should correspond exactly to the other. The error would be much harder to measure if the two planes were left at arbitrary orientations and positions in 3-D space.
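As a small illustration of how a homography lets the error be measured, the sketch below maps model points through a 3 x 3 matrix, divides by the third coordinate so that the result lands back on the plane where that coordinate is 1, and compares the result with the detected points. The matrix `H`, the point lists, and the function names are made up for the example.

```python
import math

def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography, dividing by the third
    homogeneous coordinate so the result lies on the plane w = 1."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def rms_transfer_error(H, model_pts, image_pts):
    """Root-mean-square distance between mapped model points and detections."""
    total = 0.0
    for m, p in zip(model_pts, image_pts):
        u, v = apply_homography(H, m)
        total += (u - p[0]) ** 2 + (v - p[1]) ** 2
    return math.sqrt(total / len(model_pts))

# Hypothetical homography and data, chosen only for illustration.
H = [[2.0, 0.0, 10.0],
     [0.0, 2.0, 20.0],
     [0.0, 0.0, 1.0]]
model_pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image_pts = [(10.0, 20.0), (12.0, 20.0), (12.0, 22.0), (10.0, 23.0)]
error = rms_transfer_error(H, model_pts, image_pts)
```

Here three of the four points map exactly and the fourth is off by one unit, so the error summarizes how far the estimate is from the detections in a single number.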
Complexification: A homography (a 3 x 3 matrix) is actually a composition of most of a 3-D rotation, a 3-D translation, and some adjustment for the properties of the camera. Zhang's next step is to decompose the homographies, over several steps, to form an initial guess at the rotations and translations that relate the planes holding the images, as well as at the internal characteristics of the camera.
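The last part of that decomposition can be sketched as follows. It assumes the camera's internal matrix A has already been estimated (the step that recovers A from several homographies is not shown) and uses the relation H ~ A [r1 r2 t], so that r1, r2 and t are the columns of inverse(A) * H rescaled to make r1 a unit vector, and r3 is the cross product of r1 and r2. The values of `A`, the pose used to build `H`, and all function names are hypothetical.

```python
import math

def solve_upper(A, v):
    """Solve A x = v for an upper-triangular 3x3 matrix A by back-substitution."""
    z = v[2] / A[2][2]
    y = (v[1] - A[1][2] * z) / A[1][1]
    x = (v[0] - A[0][1] * y - A[0][2] * z) / A[0][0]
    return [x, y, z]

def column(M, j):
    return [M[0][j], M[1][j], M[2][j]]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def decompose_homography(A, H):
    """Initial guess at rotation columns r1, r2, r3 and translation t,
    assuming H is a positive multiple of A [r1 r2 t]."""
    a1 = solve_upper(A, column(H, 0))
    a2 = solve_upper(A, column(H, 1))
    a3 = solve_upper(A, column(H, 2))
    lam = 1.0 / math.sqrt(sum(c * c for c in a1))  # removes the unknown scale
    r1 = [lam * c for c in a1]
    r2 = [lam * c for c in a2]
    r3 = cross(r1, r2)
    t = [lam * c for c in a3]
    return r1, r2, r3, t

# Hypothetical intrinsics: focal lengths 800, principal point (320, 240), no skew.
A = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Build a test homography from a known pose: 30 degree rotation about z.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
Rt = [[c, -s, 0.1],
      [s, c, -0.2],
      [0.0, 0.0, 2.0]]
H = [[3.0 * sum(A[i][k] * Rt[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]
r1, r2, r3, t = decompose_homography(A, H)
```

With real, noisy homographies the three recovered columns will generally not be exactly orthonormal, which is one reason the result is only an initial guess.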
Distortion: The camera's view of a set of points is not perfect, because the lens introduces inherent distortion. Since most of this distortion acts in the radial direction from the lens center, or principal point, only radial distortion is modeled. Two parameters suffice to describe the distortion that must be applied to an ideal point to (hopefully) reconcile it with the measured data.
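A two-parameter radial model of this kind can be written as a scale factor 1 + k1*r^2 + k2*r^4 applied to the ideal point, where r is its distance from the principal point in normalized image coordinates. The sketch below assumes that form and zero skew; the coefficient values, camera values, and function names are invented for the example.

```python
def distort_normalized(x, y, k1, k2):
    """Apply two-coefficient radial distortion to an ideal point given in
    normalized image coordinates (principal point at the origin)."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return (x * factor, y * factor)

def to_pixels(x, y, fx, fy, u0, v0):
    """Convert normalized coordinates to pixels, assuming zero skew."""
    return (u0 + fx * x, v0 + fy * y)

# Hypothetical camera: focal lengths 800, principal point (320, 240).
k1, k2 = -0.2, 0.05
ideal = (0.5, 0.0)
xd, yd = distort_normalized(ideal[0], ideal[1], k1, k2)
ideal_px = to_pixels(ideal[0], ideal[1], 800.0, 800.0, 320.0, 240.0)
distorted_px = to_pixels(xd, yd, 800.0, 800.0, 320.0, 240.0)
```

A point at the principal point itself is left unchanged, and the displacement grows with distance from it, which is the radial behavior described above.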
Optimization: After all parameters have been estimated (actually, after each step that produces a guess at some of the parameters), the entire set of estimates gathered so far is optimized, that is, adjusted as a group, to minimize the error carried into the initial estimate of the remaining parameters. Optimization adjusts (or should adjust) all parameters slightly in order to improve the accuracy of the entire model, sometimes slightly and sometimes greatly. The process uses all the gathered estimates to predict where the input points should appear, compares that prediction with the actual input data, and tries to maximize the agreement between the two. Optimization is fragile and subject to error, however, and is not always useful.
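The idea of adjusting the parameters as a group can be sketched with a much-reduced Levenberg-Marquardt-style loop. This is an illustration under stated assumptions, not the program's actual optimizer: it refines only five parameters (fx, fy, u0, v0, k1) for a single view, takes the points as already expressed in ideal normalized coordinates, and uses a numerical Jacobian. All names and values are hypothetical.

```python
def residuals(params, points, observed):
    """Predicted minus observed pixel coordinates, two entries per point."""
    fx, fy, u0, v0, k1 = params
    res = []
    for (x, y), (u, v) in zip(points, observed):
        s = 1.0 + k1 * (x * x + y * y)  # one-coefficient radial distortion
        res.append(u0 + fx * x * s - u)
        res.append(v0 + fy * y * s - v)
    return res

def cost(res):
    return sum(r * r for r in res)

def jacobian(params, points, observed, res):
    """Forward-difference derivatives; one list per parameter."""
    cols = []
    for j, p in enumerate(params):
        h = 1e-6 * max(1.0, abs(p))
        shifted = list(params)
        shifted[j] = p + h
        res_h = residuals(shifted, points, observed)
        cols.append([(a - b) / h for a, b in zip(res_h, res)])
    return cols

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(i + 1, n):
            f = aug[r][i] / aug[i][i]
            for c in range(i, n + 1):
                aug[r][c] -= f * aug[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def refine(params, points, observed, iterations=50):
    """Damped Gauss-Newton steps; a step is kept only if the error drops."""
    params = list(params)
    n = len(params)
    lam = 1e-3
    res = residuals(params, points, observed)
    for _ in range(iterations):
        J = jacobian(params, points, observed, res)
        JtJ = [[sum(a * b for a, b in zip(J[i], J[j])) for j in range(n)]
               for i in range(n)]
        g = [-sum(a * r for a, r in zip(J[i], res)) for i in range(n)]
        for i in range(n):
            JtJ[i][i] *= 1.0 + lam  # damping on the diagonal
        step = solve(JtJ, g)
        trial = [p + d for p, d in zip(params, step)]
        trial_res = residuals(trial, points, observed)
        if cost(trial_res) < cost(res):
            params, res, lam = trial, trial_res, lam / 10.0
        else:
            lam *= 10.0
    return params

# Synthetic, noise-free data generated from known "true" parameters.
points = [(x, y) for x in (-0.5, 0.0, 0.5) for y in (-0.5, 0.0, 0.5)]
true_params = [800.0, 810.0, 320.0, 240.0, -0.2]
pred = residuals(true_params, points, [(0.0, 0.0)] * len(points))
observed = list(zip(pred[0::2], pred[1::2]))
start = [780.0, 830.0, 310.0, 250.0, 0.0]
refined = refine(start, points, observed)
```

Rejecting any step that raises the error is one simple guard against the fragility mentioned above, although it cannot rescue a poor initial guess.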