NASA Technical Memorandum 104795
Evaluation of Lens Distortion Errors Using
An Underwater Camera System
For Video-Based Motion Analysis
Jeffrey Poliner
Lockheed Engineering & Sciences Company
Houston, Texas
Lauren Fletcher & Glenn K. Klute
Lyndon B. Johnson Space Center
Houston, Texas
INTRODUCTION
Video-based motion analysis systems are widely employed to
study human movement, using computers to capture, process, and analyze video data. This
video data can be collected in any environment where cameras can be located.
The Anthropometry and Biomechanics Laboratory (ABL) at the
Johnson Space Center is responsible for the collection and quantitative evaluation of
human performance data for the National Aeronautics and Space Administration (NASA). One
of the NASA facilities where human performance research is conducted is the Weightless
Environment Training Facility (WETF). In this underwater facility, suited or unsuited crew
members or subjects can be made neutrally buoyant by adding weights or buoyant foam at
various locations on their bodies. Because it is underwater, the WETF poses unique
problems for collecting video data. Primarily, cameras must be either waterproof or
encased in a waterproof housing.
The video system currently used by the ABL is manufactured
by Underwater Video Vault. This system consists of closed circuit video cameras (Panasonic
WV-BL202) enclosed in a cylindrical case with a Plexiglas dome covering the lens. The dome, used to counter the magnifying effect of the water, is hypothesized to introduce distortion errors.
As with any data acquisition system, it is important for
users to determine the accuracy and reliability of the system. Motion analysis systems
have many possible sources of error inherent in the hardware, such as the resolution of
the recording, viewing, and digitizing equipment, and the distortion introduced by the lens of a video-based motion analysis system.
It is, therefore, of interest to determine the degree of this error in various regions of
the lens. A previous study (Poliner et al., 1993) developed
a methodology for evaluating errors introduced by lens distortion. In that study, it was
seen that errors near the center of the video image were relatively small and the error
magnitude increased with the radial distance from the center. Both wide angle and standard
lenses introduced some degree of barrel distortion (fig. 1).
Since the ABL conducts underwater experiments that involve
evaluating crew members' movements to understand and quantify the way they will perform in
space, it is of interest to apply this methodology to the cameras used to record
underwater activities. In addition to distortions from the lens itself, there will be
additional distortions caused by the refractive properties of the interfaces between the
water and camera lens.
This project evaluates the error caused by the lens
distortion of the cameras used by the ABL in the WETF.
METHODS
Data Collection
A grid was constructed from a sheet of 0.32 cm (0.125 in)
Plexiglas. Thin black lines spaced 3.8 cm (1.5 in) apart were drawn vertically and
horizontally on one side of the sheet. Both sides of the sheet were then painted with a
WETF-approved white placite to give color contrast to the lines. The total grid size was
99.1 x 68.6 cm (39.0 x 27.0 in). The intersections of the 19 horizontal and 27 vertical
lines defined a total of 513 points (fig. 2). The center point of the grid was marked for
easy reference. Using Velcro, the grid was attached to a wooden frame, which was then
attached to a stand and placed on the floor of the WETF pool.
At the heart of the Video Vault system was a Panasonic
model WV-BL202 closed circuit video camera. The camera had been focused above water according to previously established procedures and placed on the WETF floor facing the grid. Divers used voice cues from the test director for fine alignment of the camera with the center of the grid. The video was viewed on poolside monitors, and the camera was positioned so that a predetermined region of the grid nearly filled the field of view. The
distance from the camera to the grid was adjusted several times, ranging from 65.3 to 72.6
cm (25.7 to 28.6 in). Data collection consisted of videotaping the grid for at least 30
seconds in each of the positions, with each position considered a separate trial.
Descriptions of the arrangements of the four trials are given in Table 1.
Distance refers to the distance from the camera to the
grid. Image size was calculated by estimating the total number of grid units from the
video. The distance from the outermost visible grid lines to the edge of the image was
estimated to the nearest one-tenth of a grid unit. The distance and image size values are
all in centimeters.
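As an illustration of this calculation, the sketch below (Python) converts a count of visible grid units into centimeters; the function name and the example unit count are hypothetical, and only the 3.8 cm grid spacing comes from the text.

    # Sketch of the image-size estimate; grid spacing is from the text,
    # the example unit count is made up for illustration.
    GRID_SPACING_CM = 3.8

    def image_size_cm(units_visible):
        """Convert a grid-unit count (to the nearest 0.1) to centimeters."""
        return units_visible * GRID_SPACING_CM

    width = image_size_cm(21.2)  # e.g., 21.2 grid units -> ~80.6 cm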
Data Analysis
An Ariel Performance Analysis System (APAS) was
used to process the video data. Recorded images of the grid were played back on a
VCR. A personal computer was used to grab and store the images on disk. For each trial,
several frames were chosen from the recording and saved, as per APAS requirements. From
these, analyses were performed on a single frame for each trial.
Because of the large number of points (up to 357) being
digitized in each trial, the grid was subdivided into separate regions for digitizing and
analysis. Each row was defined as a region and digitized separately.
An experienced operator digitized all points in the grid
for each of the trials. Here, digitizing refers to the process by which the operator identifies the locations of points of interest in the image with a mouse-driven cursor. (The term is also often used for the process of grabbing an image from video format and saving it in digital format on the computer.) Digitizing and subsequent processing
resulted in X and Y coordinates for the points.
Part of the digitizing process involved identifying points
of known coordinates as control (calibration) points. Digitization of these allows for
calculation of the transformation relations from image space to actual coordinates. In
this study, the four points diagonal from the center of the grid were used as the control
points (points marked "X" in fig. 2). These were chosen because it was
anticipated that errors would be smallest near the center of the image. Using control
points which were in the distorted region of the image would have further complicated the
results. The control points were digitized and their known coordinates were used to
determine the scaling from screen units to actual coordinates.
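A minimal sketch of this scaling step is given below (Python). It assumes a simple per-axis scale-and-offset model and assumes the four control points lie one grid unit (3.8 cm) diagonally from the center; the actual APAS transformation is not reproduced here.

    import numpy as np

    # Assumed known coordinates (cm) of the four control points: the
    # intersections one grid unit diagonal from the center.
    KNOWN = np.array([[ 3.8,  3.8],
                      [-3.8,  3.8],
                      [-3.8, -3.8],
                      [ 3.8, -3.8]])

    def calibrate(screen_pts):
        """Per-axis least-squares scale/offset: actual = a*screen + b."""
        return [np.polyfit(screen_pts[:, ax], KNOWN[:, ax], 1)
                for ax in range(2)]  # [(a_x, b_x), (a_y, b_y)]

    def to_actual(screen_pts, coeffs):
        """Apply the calibration to digitized screen coordinates."""
        return np.column_stack([np.polyval(coeffs[ax], screen_pts[:, ax])
                                for ax in range(2)])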
For trial 1, the coordinates ranged from 0 to approximately ±38.1 cm in the X direction and 0 to approximately ±30.48 cm in the Y direction. For trials 2 and 3, the ranges were 0 to ±34.29 cm in X and 0 to ±26.67 cm in Y. For trial 4, the range was 0 to ±34.29 cm in X and 0 to -22.86 and +26.67 cm in Y. To remove the
dependence of the data on the size of the grid, normalized coordinates were calculated by
dividing the calculated X and Y coordinates by half the total image size in the X and Y
directions, respectively. Table 1 lists these sizes for the four trials. Thus, normalized
coordinates in both the X and Y directions were dimensionless and ranged approximately
from -1 to +1 for all four trials.
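In code, the normalization amounts to the sketch below (Python). The 40.2 cm horizontal half-size for trial 1 is taken from the text; the vertical half-size shown is only an assumed placeholder, since the table 1 values are not reproduced here.

    # Normalize coordinates by half the image size in each direction.
    # HALF_X is trial 1's horizontal value (from the text); HALF_Y is
    # an assumed placeholder.
    HALF_X, HALF_Y = 40.2, 31.0

    def normalize(x_cm, y_cm):
        """Return dimensionless coordinates in roughly the -1..+1 range."""
        return x_cm / HALF_X, y_cm / HALF_Y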
For all trials, the error for each digitized point was
calculated as the distance from the known coordinates of the point to the calculated
coordinates.
RESULTS
Raw data from the four trials are presented in figure 3. Shown are graphs of the calculated normalized
coordinates of points. Grid lines on the graphs do not necessarily correspond to the edges
of the images.
For each trial, the error of each point was calculated as
the distance between the calculated location (un-normalized) and the known location of
that point. These error values were then normalized by calculating them as a percent of
half the image size in the horizontal direction (trial 1, 40.2 cm; trial 2, 36.8 cm;
trials 3 and 4, 36.2 cm). This dimension was chosen arbitrarily to be representative of
the size of the image.
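A sketch of this error computation is shown below (Python), assuming known_cm and calc_cm are (N, 2) arrays of the known and calculated point locations in centimeters for one trial; the default half-size is the trial 1 value from the text.

    import numpy as np

    def error_percent(known_cm, calc_cm, half_x_cm=40.2):
        """Point error (distance between known and calculated locations)
        as a percent of half the horizontal image size."""
        dist = np.linalg.norm(calc_cm - known_cm, axis=1)
        return 100.0 * dist / half_x_cm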
Figure 4 presents contour
plots of the normalized error as a function of the normalized X-Y location in the image
for each of the trials. This type of graph, commonly used in land elevation maps, displays
information three dimensionally. The coordinate axes represent two of the dimensions.
Here, these were the X and Y coordinates of the points. The third dimension represents the
value of interest as a function of the first two dimensions, in this case, the error as a
function of the X and Y location. Curves were created by connecting points of identical
value.
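The contour plots themselves were produced by other software; the sketch below (Python, using matplotlib and scipy, both assumptions) shows how the same kind of plot could be generated from the scattered digitized points.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.interpolate import griddata

    # x_n, y_n, err: normalized locations and errors (%) from the steps
    # above, assumed here to be 1-D numpy arrays of equal length.
    xi = np.linspace(-1.0, 1.0, 101)
    Xi, Yi = np.meshgrid(xi, xi)

    # Interpolate the scattered points onto a regular grid so that
    # curves of constant error can be drawn.
    Zi = griddata((x_n, y_n), err, (Xi, Yi), method='linear')

    cs = plt.contour(Xi, Yi, Zi)
    plt.clabel(cs, inline=True, fmt='%1.1f')
    plt.xlabel('normalized X')
    plt.ylabel('normalized Y')
    plt.show()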
Interpreting these graphs is similar to interpreting a land
map; peaks and valleys are displayed as closed contour lines. Once again, it was clear
that errors were small near the center of the image and became progressively greater
further away from the center.
The unevenness evident in some of these graphs can be
partly attributed to splitting the image into separate regions for the purpose of
digitizing. The control points were redigitized for each individual section. Since the
control points were close to the center of the image, a small error in their digitization
would be magnified for points further away from the center.
Another quantitative way of viewing these data was to
examine how the error varied as a function of the radial distance from the center of the
image. This distance was normalized by dividing by half the image size in the horizontal
direction (trial 1, 40.2 cm; trial 2, 36.8 cm; trials 3 and 4, 36.2 cm). Figure 5 presents these data for each of the four trials.
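In code (a sketch, with x_cm and y_cm assumed to be arrays of point coordinates in centimeters), this normalized radial distance is:

    import numpy as np

    # Radial distance from the image center, normalized by half the
    # horizontal image size (40.2 cm for trial 1, from the text).
    r_norm = np.hypot(x_cm, y_cm) / 40.2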
Linear and binomial regressions were then fit to the data for each trial and for all data combined. The linear fit was of the form

Error = A0 + A1 R

where R was the radial distance from the center of the image (normalized), and A0 and A1 were the coefficients of the least-squares fit. The binomial fit was of the form

Error = B0 + B1 R + B2 R²

where B0, B1, and B2 were the coefficients of the fit. The results of these least-squares fits are presented in table 2. The columns labeled "RC" are the squares of the statistical regression coefficients (r-square).
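A sketch of these fits (Python with numpy; the array names r_norm and err are assumptions carried over from the steps above):

    import numpy as np

    # Linear fit:   Error = A0 + A1*R
    # Binomial fit: Error = B0 + B1*R + B2*R^2
    # np.polyfit returns coefficients highest power first.
    A1, A0 = np.polyfit(r_norm, err, 1)
    B2, B1, B0 = np.polyfit(r_norm, err, 2)

    def r_square(coeffs, x, y):
        """Square of the regression coefficient (the RC of table 2)."""
        resid = y - np.polyval(coeffs, x)
        return 1.0 - np.sum(resid**2) / np.sum((y - np.mean(y))**2)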
DISCUSSION
When reviewing these results, several points should be
noted. First, this study utilized a two-dimensional analysis algorithm. A limitation of
the study was that exactly four calibration points were required to define the scaling
from screen coordinates to actual coordinates. The use of more than four points would
likely result in less variability. Second, all coordinates and calculated errors were
normalized to dimensions of the image. Although there were many possibilities for the
choice of dimension (e.g., horizontal, vertical or diagonal image size; maximum
horizontal, vertical, or diagonal coordinate; average of horizontal and vertical image
size or maximum coordinate; etc.), the dimensions used to normalize were assumed to best
represent the image size.
It is clear from these data that a systematic error caused
by lens distortion occurred when using the underwater video system. Lens distortion errors
were less than 1% from the center of the image up to radial distances equivalent to 25% of
the horizontal image length (normalized R equal to 0.5). Errors were less than 5% for
normalized R up to 1, an area covering most of the image.
There seemed to be some degree of random noise. This was
evident in the scatter pattern seen in the graphs in figure 5. This error can most likely
be attributed to the process of digitizing. Several factors limit the ability to correctly digitize the location of a point: a point that spans more than one pixel in either or both dimensions, irregularly shaped points, a blurred image, shadows, etc.
Because of these factors, positioning the cursor when digitizing was often a subjective
decision.
Four trials were analyzed in this study. Although all the
data were normalized, there were slight differences among the four trials (fig. 5 and
table 2). These can most likely be attributed to the uncertainty in determining the grid
size, which was estimated from the fraction of a grid unit from the outermost visible grid
lines to the edge of the images.
Two types of regressions were fit to the data: linear and binomial. The interpretation of the coefficients of the linear regression can provide insight into the data. A1, the slope of the error-distance relation, represents the sensitivity of the error to the distance from the origin. Thus, it is a measure of the lens distortion. A0, the intercept of the linear relation, can be interpreted as the error
at a distance of zero. If the relation being modeled were truly linear, this would be
related to the random error not accounted for by lens distortion. However, in this case,
it is not certain if the error-distance relation was linear. The RC values gave an
indication of how good the fit was. The binomial curve fit seemed to more correctly
represent the data. The interpretation of these coefficients, however, is not as
straightforward.
CONCLUSIONS
This study examined one of the sources of error
in video-based motion analysis using an underwater video system. It was demonstrated that
errors from lens distortion could be as high as 5%. By avoiding the outermost regions of
the lens, the errors can be kept to less than 0.5%.
REFERENCES
Poliner, J., Wilmington, R.P., Klute, G.K., and Micocci, A.
Evaluation of Lens Distortion for Video-Based Motion Analysis, NASA Technical Paper 3266,
May 1993.