* [email protected]; phone +39 0461 314446; fax +39 0461 314340; http://3dom.fbk.eu/
Improving automated 3D reconstruction methods via vision metrology
Isabella Toschia, Erica Nocerino
a, Mona Hess
b, Fabio Menna
a,
Ben Sargeant b
, Lindsay MacDonald b
, Fabio Remondinoa, Stuart Robson
b
a3D Optical Metrology (3DOM) unit, Bruno Kessler Foundation (FBK), Trento, Italy
b Department of Civil, Environmental and Geomatic Engineering (CEGE), UCL, London, UK
ABSTRACT
This paper aims to provide a procedure for improving automated 3D reconstruction methods via vision metrology. The
3D reconstruction problem is generally addressed using two different approaches. On the one hand, vision metrology
(VM) systems try to accurately derive the 3D coordinates of a few sparse object points for industrial measurement and
inspection applications; on the other, recent dense image matching (DIM) algorithms are designed to produce dense
point clouds for surface representations and analyses. This paper strives to demonstrate a step towards narrowing the gap
between traditional VM and DIM approaches. Efforts are therefore intended to (i) test the metric performance of the
automated photogrammetric 3D reconstruction procedure, (ii) enhance the accuracy of the final results and (iii) obtain
statistical indicators of the quality achieved in the orientation step. VM tools are exploited to integrate their main
functionalities (centroid measurement, photogrammetric network adjustment, precision assessment, etc.) into the pipeline
of 3D dense reconstruction. Finally, geometric analyses and accuracy evaluations are performed on the raw output of the
matching (i.e. the point clouds) by adopting a metrological approach. The latter is based on the use of known geometric
shapes and quality parameters derived from VDI/VDE guidelines. Tests are carried out by imaging the calibrated
Portable Metric Test Object, designed and built at University College London (UCL), UK. It allows assessment of the
performance of the image orientation and matching procedures within a typical industrial scenario, characterised by poor
texture and known 3D/2D shapes.
Keywords: photogrammetry, vision metrology, dense image matching, computer vision, accuracy, precision, SIFT,
circular target centroid, pattern projection
1. INTRODUCTION
1.1 Background
The 3D reconstruction problem is a fundamental issue of vision systems and refers to the process of recovering 3D
information about a surveyed scene from two or more images taken from different viewpoints. Several methods have
been developed in order to address the same basic question, i.e. how to compute the 3D position of an object point, given
two (or more) corresponding image points. Literature in this field can be divided into two general approaches:
(i) vision metrology (VM) systems, which aim to accurately derive 3D coordinates for sparsely distributed object points;
(ii) dense image matching methods, which are designed to produce dense point clouds for surface representation.
The first approach is based on well-known principles1,2,3, which have gained widespread acceptance in industrial
measurement, engineering, medical, navigation and inspection applications4. Its priorities have remained
essentially constant over the years, i.e. measurement results shall be accurate, repeatable and traceable to national or
international standards. Photogrammetric multi-view measurements feature relative accuracies in the order of 1:50,000 to
1:100,000 of the principal dimension of the object5. To achieve this performance, VM systems adopt carefully
designed image networks, calibrated cameras and highly structured scenes with coded targets and scale bars.
Traditionally, these systems have been employed to assign 3D coordinates to distinct object points, such as targets placed
at key locations or target-less features of interest6,7.
On the other hand, several software solutions have been recently developed to automatically retrieve 3D dense point
clouds from a set of un-oriented and un-calibrated images8,9,10. The automatic 3D reconstruction procedure consists of
two main steps, namely (i) camera calibration & image orientation (often called Structure from Motion – SfM) and (ii)
dense image matching (DIM). Fully automated methods for these steps were originally developed within the 3D
computer vision community11,12 and then adopted by the photogrammetric community. They were designed to address
unstructured and unknown scenes (i.e. without target points and lacking in metric references) and to fully-automate
image analyses. Both low-cost software packages and open-source solutions are nowadays available, providing users
with automated procedures for image orientation (SfM) and 3D dense reconstruction at different scales, thus covering
diverse application fields (e.g. cultural heritage documentation, archaeological mapping, architectural design, etc.). The
metrological consistency of the resulting 3D measurements is highly dependent on the quality of the imaged surface
(structure and texture), on the image network configuration and on the matching algorithm13,14. Furthermore, the large
degree of automation is normally counter-balanced by an absence of statistical and evaluation parameters, which prevents a
proper quality analysis of the obtained numerical results.
Whilst a decade ago there was a sharp difference between the two aforementioned approaches, the gap is now reducing.
Due to the exchange and sharing of the most effective algorithms, it is difficult to classify the 3D imaging approaches
lately proposed by the research communities. Furthermore, recent advances in both hardware and software technologies
(e.g. GPU-supported solutions) have provided effective procedures for image processing and management, which now enable
real-time applications even when large datasets are involved. Finally, some industrial applications require the accurate
3D reconstruction of deformed free-form surfaces without any control points. For instance, photogrammetric car-crash
test recordings adopt on-board high-speed stereo cameras to generate 3D point clouds of the deformed object area15.
Such improvements and overlapping demands raise the expectations of the scientific and industrial communities for a
metrological-oriented use of automated techniques for image orientation and dense 3D reconstructions.
1.2 Paper objectives
The aim of this paper is to demonstrate the possibility of bridging the gap between VM and target-less automated 3D
imaging approaches by testing the metric performance of the automated photogrammetric 3D reconstruction procedure.
The latter approach includes many tasks and, among them, image block orientation and camera calibration represent the
most crucial ones. In order to enhance the accuracy of the final 3D results and obtain statistical indicators of the
orientation step, the well-established algorithms implemented in classical photogrammetric packages for accurate
measurement of single points are adopted. The VM tools Vision Measurement System (VMS)16 and PhotoModeler® (PM)17
are thus exploited to integrate their main functionalities (centroid measurement, photogrammetric network
adjustment, precision assessment, etc.) into the pipeline of 3D dense reconstruction. Finally, further geometric analyses
and accuracy evaluations are performed on the raw output of the dense matching (i.e. the point clouds) by adopting a
metrological approach. This is based on the use of known geometric shapes and quality parameters which are derived
from VDI/VDE guidelines18,19. Tests are carried out by imaging a calibrated Portable Metric Test Object designed and
built at UCL CEGE20,21. It supports the evaluation of several geometric parameters, such as sphere spacing error, sphere
diameter error, plane spacing error, angular error and structural resolution. Moreover, it features circular targets that can
be adopted to further improve the photogrammetric network adjustment and evaluate its quality. Finally, the test-object
is small and poorly textured, which is often the case in industrial applications. It thus allows assessment of the
performance of the automatic image orientation and matching procedures in dealing with a challenging artefact.
2. METHODOLOGY
The automatic 3D reconstruction procedure consists of two main steps, namely image triangulation (including camera
calibration) and dense image matching. Given the images, the first task requires the identification of homologous points
in different views of the same 3D scene. Once a set of homologous points (also called image correspondences or tie
points) are identified, the exterior orientation parameters of the images, interior parameters of the camera and 3D object
coordinates of the feature points are automatically computed through an iterative process based on a robust bundle
adjustment. Finally, the dense 3D reconstruction is performed with dense matching algorithms14, able to deliver
a dense point cloud with up to one 3D point per pixel. Although this pipeline is based on the adoption of well-established
algorithms derived from both the photogrammetric and computer vision communities, detailed information
about the internal quality achieved by the process is normally not provided. So far, this has generally precluded the
application of such automated approaches to the industrial field where rigorous statistical parameters (e.g. reliability,
normalized correction, standard error of unit weight, etc.) and quality analyses of results are mandatory.
To address this issue, this paper proposes an enhanced photogrammetric procedure that exploits the main functionalities
of the classical photogrammetric packages for accurate measurement of single points. In particular, their algorithms and
statistical analyses, together with an in-house filtering tool, are integrated into the automated photogrammetric 3D
reconstruction pipeline. The resulting methodology is summarized in Figure 1 and further described in the
following subsections.
Figure 1. The procedural workflow adopted in the project.
2.1 Photogrammetric network adjustment
The photogrammetric network adjustment requires a set of image correspondences, manually or automatically extracted,
as its input. VM approaches for high precision applications usually adopt artificial object features (e.g. targets) to
identify the necessary image correspondences. Automatic sub-pixel measurements based on centroids, ellipse shape
fitting or least-squares template matching are adopted in order to achieve the highest precision for feature detection5. On
the other hand, a variety of algorithms have been developed by the computer vision community to automatically extract a
large number of points or regions of interest from images of unstructured and unknown scenes11,23. Among these
solutions, the Scale Invariant Feature Transform (SIFT)22 algorithm provides highly distinctive features by following
four main steps, i.e. scale-space extrema detection, key point localization, orientation assignment and key point
descriptor creation. Corresponding points are then detected by comparing and matching the descriptors among the
different images.
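The descriptor-matching step above can be sketched as nearest-neighbour matching with the distance-ratio test commonly used with SIFT (a minimal NumPy illustration under assumed inputs; the function name and the brute-force search are illustrative, not the implementation of the packages tested in this paper):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match SIFT-like descriptors between two images with the ratio test.

    desc_a: (N, D) array of descriptors from image A.
    desc_b: (M, D) array of descriptors from image B.
    Returns a list of (i, j) pairs: descriptor i in A matched to j in B.
    """
    matches = []
    for i, d in enumerate(desc_a):
        # Euclidean distance from descriptor d to every descriptor in B
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]  # nearest and second-nearest neighbour
        # Accept only if the best match is clearly better than the runner-up
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches
```

The ratio test rejects ambiguous correspondences, which is essential on repetitive texture such as a projected pebble pattern.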
In this paper, both approaches for homologous point identification, i.e. target centroid computation and SIFT point
detection, are tested. In the second case, a filtering and regularization procedure of the extracted tie points is
additionally carried out, by applying a tool internally developed at FBK 3DOM24,25. The method reduces the number of
image observations, so that they can be efficiently handled by classical photogrammetric bundle adjustments. It
regularizes the point distribution in object space, while preserving connectivity and high multiplicity between
observations.
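The core selection logic of such a filtering step can be sketched as follows (an assumed, simplified version: keep only tie points seen in many images and with low re-projection error; the thresholds and data layout are illustrative, not those of the FBK 3DOM tool):

```python
def filter_tie_points(tie_points, min_multiplicity=3, max_reproj_error=1.0):
    """Keep tie points observed in at least `min_multiplicity` images and
    with a re-projection error below `max_reproj_error` pixels.

    Each tie point is a dict: {'obs': [(image_id, x, y), ...],
                               'reproj_error': float}.
    """
    kept = []
    for tp in tie_points:
        if (len(tp['obs']) >= min_multiplicity
                and tp['reproj_error'] <= max_reproj_error):
            kept.append(tp)
    return kept
```

A spatial regularization step (not shown) would then thin the surviving points so that they are evenly distributed in object space.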
The extracted image correspondences are then included in a self-calibrating bundle adjustment performed with VM
software packages. Two different processes are carried out by using the image observations derived from (i) the circular
target centroid computation and (ii) the SIFT point detection and filtering. In addition, a further test is performed by
simultaneous use of both types of observations, adequately weighted. A free-network bundle adjustment method is
adopted in all network adjustments, followed by a rigid similarity transformation for the definition of the scale and of a
common coordinate reference system.
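The similarity transformation applied after the free-network adjustment can be sketched with the closed-form (Umeyama) least-squares solution (a generic NumPy sketch, not the specific implementation in VMS or PhotoModeler):

```python
import numpy as np

def similarity_transform(src, dst):
    """Estimate scale s, rotation R and translation t so that
    dst_i ~= s * R @ src_i + t in the least-squares sense (Umeyama method).

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    n = len(src)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    Sigma = B.T @ A / n                       # cross-covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                        # avoid a reflection
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / ((A ** 2).sum() / n)
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In practice the correspondences would be the calibrated reference targets or scale-bar end points, so that the free network is brought to the object scale and datum.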
2.2 3D reconstruction
Starting from the adjustment results (i.e. computed internal and external orientations), the 3D reconstruction process
allows the 3D positions of object points to be determined. For VM systems, this normally results in computing the 3D object
space coordinates of targets. On the other hand, dense image matching methodologies aim to determine correspondence
information for every pixel in order to derive dense point clouds describing the imaged surface or object. Algorithms are
generally categorized into local and global14,26. The former compute the disparity at a given point using the intensity
values within a finite region, and are thus sensitive to the choice of the window size and to locally ambiguous areas in the
images. Global methods, on the contrary, seek the optimal global solution to the matching problem. Recently,
semi-global methods27 have also been introduced in order to provide an efficient solution through an approximation of
the global model. This third approach has gained wide acceptance in the photogrammetric community, which is
increasingly involved in improving it28.
A semi-global approach is here adopted to retrieve a dense 3D reconstruction of the test-object. The raw output of the
matching process (i.e. the dense point clouds) is then evaluated through the metrical characterization procedure described
in Section 3.
3. PROJECT DESCRIPTION
3.1 Test-object
Experiments are performed using the Portable Metric Test Object designed and built at UCL CEGE20. The artefact
(Figure 2) is conceived as an independent means of 3D imaging quality assessment. First results have
already been presented for the quantitative assessment of several commercially available close-range optical recording
technologies21. The test-object is mainly made of Alcoa aluminium alloy T6061, selected for its thermal stability.
Around the base plate (25 cm x 25 cm) there is an irregular array of six individually calibrated spheres (20 mm
diameter), each mounted onto a conical base: this provides the reference coordinate system of the test-object. A
secondary plate is rigidly wedged into the base plate and includes the following geometric features20:
- angle feature;
- step feature;
- length features;
- gap feature.
Figure 2. Side (left) and top (right) view of the Portable Metric Test-Object built by UCL CEGE.
The test-object was scanned with the Arius3D Foundation model 150 laser scanner (mounted on a CMM), in order to
produce a reference 3D dataset of the artefact for the accuracy evaluation. The sampling grid of the Arius3D scanner is
0.1 x 0.1 mm and it features a measurement uncertainty of +/- 0.035 mm in depth and +/- 0.1 mm in plane.
The test-object includes 87 planar targets of different dimensions, well-distributed on the plates and on the feature sides.
Both circular coded and retro-reflective targets are present. The 3D coordinates of 44 targets situated around the base
plate were measured by an ad-hoc photogrammetric network adjustment performed with VMS and a pre-calibrated
digital camera. Final results from the bundle orientation yielded a mean standard deviation (STDV) value in object space
of 12 μm (1 sigma), with maximum errors of about 24 μm.
3.2 Image acquisition
A Nikon D600 digital camera (6016 x 4016 pixels, pixel size of 6 μm) equipped with a macro prime lens (Nikon AF
Micro-Nikkor 60 mm f/2.8D) is used in the experiment. The test-object is imaged at a focusing distance of 0.9 m, after
having fixed the lens appropriately to ensure a constant focal length during the entire acquisition phase. The angle
between convergent images is set to about 10°, so that the resulting range uncertainty and lateral resolution (ground sample
distance - GSD) are expected to be 0.25 mm and 0.09 mm on average, respectively (assuming an a-priori image
measurement accuracy of 1/2 pixel). F-stop and ISO sensitivity are kept fixed at f/16 and 100, and no automatic optical
image stabilization is present.
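The expected values quoted above can be reproduced with a back-of-the-envelope calculation (a sketch assuming the standard normal-case photogrammetric formulas, with the 10° taken as the full convergence angle between adjacent rays):

```python
import math

# Camera and network parameters from the text
pixel_size = 0.006                 # mm (6 micron)
focal = 60.0                       # mm (Micro-Nikkor 60 mm)
distance = 900.0                   # mm (focusing distance 0.9 m)
convergence = math.radians(10.0)   # angle between adjacent images
sigma_img = 0.5 * pixel_size       # a-priori image accuracy of 1/2 pixel

# Lateral resolution (ground sample distance)
gsd = pixel_size * distance / focal            # ~0.09 mm

# Effective base-to-distance ratio for the 10 deg convergence angle
base_over_dist = 2.0 * math.tan(convergence / 2.0)

# Range (depth) uncertainty of a two-ray intersection
sigma_z = (distance / focal) * sigma_img / base_over_dist   # ~0.26 mm
```

These figures agree with the 0.09 mm GSD and the roughly 0.25 mm range uncertainty stated in the text.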
Figure 3. Example of two images acquired from the same camera position: with (left) and without (right) the projected
pattern.
Figure 4. The photogrammetric network geometry.
Since the test-object is poorly textured, two Optoma Pico projectors (PK301, resolution of 854 x 480 pixels) are used to
project a fine texture with recognizable feature points onto the surface. A pebble and gravel image is projected from a
distance of about 1 m, so that the object is illuminated with a sharp pattern. The stability of the projectors is carefully
checked for the entire image acquisition phase in order to avoid any apparent movements of the pattern on the test-
object. Furthermore, for each camera station, two images are taken, i.e. one while projecting the pattern and one without
the pattern projection (Figure 3). In the latter case, a ring light around the camera lens is used to illuminate the
retro-reflective targets in the scene, in order to facilitate the automatic extraction of target centroids. All images are
taken with the camera mounted on a tripod, following a quasi-circular protocol (due to the presence of the projectors),
repeated at three different heights. Figure 4 shows the camera network actually realized. Rolled camera images are
additionally included in the network to ensure good conditions for the subsequent self-calibrating bundle adjustments.
Two calibrated scale bars are also included in the scene as further metric references.
3.3 Image processing
Before starting the image correspondence extraction procedure, a pre-processing step is performed on the raw image
files, including histogram equalization, white balance, highlight recovery, etc. The exposure of the photographs is
carefully monitored to ensure that the targets are not over-exposed.
Afterwards, images with the projected pattern are imported into a typical SfM software package, namely Agisoft
Photoscan (PS)29. Its SIFT-like operator is exploited to automatically extract a large number of homologous points,
which are then exported in the form of both image observations (2D points) and corresponding 3D coordinates. These
data are then filtered and regularized, in order to preserve only the more reliable ones (i.e. the observations with higher
multiplicity and lower re-projection error). The filtered correspondences are subsequently imported as image
observations into VMS and PM, where a self-calibrating bundle adjustment is performed (hereinafter called
“self-calibrating bundle adjustment with SIFT points”).
Concurrently, images acquired without the projected pattern are processed directly with PM, where the centroids of the
circular targets are automatically extracted. These image observations are included in a self-calibrating bundle
adjustment performed with PM and VMS (hereinafter called “self-calibrating bundle adjustment with target centroids”).
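Sub-pixel centroid measurement of a bright retro-reflective target can be sketched as an intensity-weighted mean of the pixel coordinates (an illustrative minimal version with a crude background removal; PM's actual operator additionally handles thresholding and ellipse fitting):

```python
import numpy as np

def target_centroid(window):
    """Intensity-weighted centroid of a grey-level image window
    containing one bright target on a dark background.

    window: 2D array of grey values. Returns (x, y) in the pixel
    coordinates of the window, with sub-pixel precision.
    """
    w = window.astype(float)
    w -= w.min()                       # simple background removal
    ys, xs = np.indices(w.shape)
    total = w.sum()
    return (xs * w).sum() / total, (ys * w).sum() / total
```

Because the weighting uses the full grey-value profile of the target blob, the centroid is located far more precisely than the one-pixel quantization of the sensor.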
A further self-calibrating bundle adjustment is carried out in VMS and PM, using both SIFT points and circular target
centroids as image observations (hereinafter called “self-calibrating bundle adjustment with SIFT and circular target
centroids”).
After the precision evaluation (see Subsection 3.4), internal and external orientations are imported into PS, where the
dense image matching is subsequently performed. This is carried out using the second-level image pyramid,
corresponding to a quarter of the original full image resolution, in order to achieve a reasonable trade-off between
processing effort and resolution14. The derived dense point clouds thus feature a mean spatial resolution of less than
0.2 mm and consist of more than 4 million points.
3.4 Precision and accuracy evaluation
Precision defines the statistical noise of an adjustment, i.e. it models the inner accuracy of a system5. Typical SfM
methods provide only limited information about the internal quality of the bundle adjustment process, usually restricted
to the final re-projection error. On the other hand, VM systems yield a large number of statistical parameters supporting
the adjusted network results. Typically, residuals of image coordinates and corresponding statistics are used to evaluate
the precision in image space, whereas STDV values of the computed 3D coordinates provide a quality assessment in
object space (together with possible RMSE on check points). Statistics provided by VMS and PM are here exploited to
check the precision of the different self-calibrating bundle adjustment processes. Additionally, starting from the
orientation results achieved in the free-network approach, a forward intersection of the circular target centroids is
performed. After a similarity transformation, these 3D coordinates are compared to the coordinates previously measured
as described in Subsection 3.1. Residuals and corresponding statistics are thus derived.
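The forward intersection of a target from two or more oriented images can be sketched as a linear least-squares intersection of viewing rays (a generic formulation, not the specific VMS algorithm; the camera centres and unit ray directions are assumed to come from the adjusted orientations):

```python
import numpy as np

def intersect_rays(origins, directions):
    """Least-squares 3D intersection point of skew rays.

    origins: (N, 3) camera projection centres.
    directions: (N, 3) unit viewing-ray directions.
    Minimizes sum_i || (I - d_i d_i^T) (X - o_i) ||^2 over X.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)
```

With perfectly intersecting rays the residual is zero; with real, noisy observations the solution is the point closest to all rays, and the residual distances feed the statistics reported above.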
Accuracy models the deviation of a measured value from an independent, nominal measurement. This nominal
measurement is defined by a measurement system of higher accuracy, a calibrated reference body or a true value5. In
terms of dense image matching algorithms, efforts are being made by the research community towards the accuracy
evaluation of their metric performance13,14,30,31. Usually, the accuracy assessment is performed by comparing the
achieved results to a ground truth, which should theoretically be two or three times more accurate than the expected
results. Although this general approach may be seen as reasonable, better metrological traceability is given by the use of
geometric artefacts with known form and size: in this case, the accuracy analysis can be carried out using evaluation
parameters defined by national and international standards.
Figure 5. Geometric parameters and procedures adopted for the metrical characterization.
The test-object adopted in the experiment supports the assessment of different geometric parameters which are derived
from similar parameters defined in the German VDI/VDE guidelines. Geometric analyses and accuracy evaluations are
performed on the raw output of the dense matching method (i.e. the dense point clouds) using GOM Inspect V832, a
certified software package currently freely available. No prior filtering process is carried out on the photogrammetric dense point
clouds. As a first step, the adopted procedure requires a sphere centroid extraction, in order to register the
photogrammetric-derived 3D data into the reference coordinate system via a centroid-to-centroid alignment. For the next
step an automatic data extraction (internally developed at UCL CEGE) is applied to enable a repeatable and reproducible
evaluation across systems, which is followed by a rigorous workflow procedure (Figure 5), to evaluate the following
parameters:
- Sphere diameter error, computed by comparing the diameters of the spheres extracted from the point clouds to their
reference values, i.e. the manufacturer's certified reference diameters (accuracy 0.001 mm). The best-fit spheres are
calculated according to the least squares method (Gaussian fit, 5 sigma) with unconstrained radius for a standard
sphere artefact.
- Sphere spacing error, defined by comparing the 3D distances between sphere centroids to the reference values. The
latter have been measured with digital calipers with an uncertainty of +/- 0.01 mm. Again, the best-fit spheres are
calculated according to the least squares method (Gaussian fit, 5 sigma) with unconstrained radius.
- Bi-directional plane spacing error, defined as the difference between the measured distance between two
sideways-facing parallel planes with opposite surface normal directions and the corresponding reference distance.
This parameter is computed by using the two length gauges that define the reference distances (uncertainty of
+/- 0.01 mm).
- Uni-directional plane spacing error, which assesses the capability of the system in measuring steps. The step feature,
adopted for this assessment, is characterized by steps with nominal height differences between 0.01 mm and 20 mm.
The evaluation procedure requires the definition of nominal planes with vertical direction, in order to compute the
distances along the Z direction. These distances are then compared to the reference values, and a significance test
(hypothesis test) is performed taking into account the standard deviation values of the two planes constituting each
step distance.
- Structural resolution, defined as the lateral resolution of distance sensors. It characterises the smallest structure
measurable with maximum permissible errors to be specified33. It is analysed through a structure normal using the
gap feature, which is constructed of eight individual blocks with the same height and seven pits with the same depth
(8 mm). The slot widths are 0.1 mm, 0.2 mm, 0.3 mm, 0.5 mm, 1.0 mm, 2.0 mm and 3.0 mm. A pass/fail test is
performed to determine the success of the gap recording, where a pass is defined by a measured maximum unsigned
deviation from the fitted plane larger than 63% of the reference pit depth (= 5.04 mm). A best-fit plane is extracted
on the top of the gap feature and the colour-coded maps of deviations from this plane are finally evaluated. Other
fitting variables of the best-fit plane are noted.
- Angular error, defined as the difference between the measured angular deflection and a reference angle, both
computed from the datum plane to an oriented plane, in degrees [°]. The angle feature, adopted for this evaluation,
comprises a series of upward-facing planar surfaces that provide varying angles to the base from 0° to 30°.
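The best-fit sphere used for the first two parameters can be illustrated with the classic linear least-squares formulation (a generic algebraic fit with unconstrained radius; GOM Inspect's Gaussian fit additionally iterates and rejects outliers at 5 sigma):

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit with unconstrained radius.

    Expands |p - c|^2 = r^2 into the linear system
    2 p . c + (r^2 - |c|^2) = |p|^2 and solves for centre c and radius r.
    points: (N, 3) array. Returns (centre, radius).
    """
    p = np.asarray(points, float)
    A = np.hstack([2.0 * p, np.ones((len(p), 1))])
    b = (p ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre = sol[:3]
    radius = np.sqrt(sol[3] + centre @ centre)
    return centre, radius
```

The fitted diameters and the distances between fitted centres then yield the sphere diameter error and sphere spacing error, respectively.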
4. RESULTS
4.1 Precision of the adjustment process
Since VMS and PM delivered equivalent results in terms of precision in both image and object space, only one value for
each parameter is given here. Regarding the assessment of the internal precision, statistics from the process performed
with the filtered SIFT points (extracted from images with projected pattern) are listed in the first column of Table 1. The
self-calibrating bundle adjustment resulted in a mean precision vector length of 129 μm in object space, whereas
residuals of image coordinates feature a mean re-projection error of 0.4 pixel and a maximum value of 1 pixel. The
results of the processing with circular targets show better statistics, both in image and object space.
When target locations are triangulated starting from the SIFT-based orientations, their residuals with respect to the
reference values are on average 62 μm, whereas target-based orientations again yield better results, with a mean
difference of 21 μm (as shown in Table 2, first and second columns).
Table 1. Statistical results of the internal assessment.

SELF-CALIBRATING BUNDLE ADJUSTMENT - INTERNAL ASSESSMENT
                                           Pattern            No Pattern
                                           SIFT     Target    SIFT     SIFT & Target
OBJECT SPACE  sXYZ precision vector length [micron]
              Mean                         129      8         96       97
              Stdv                         102      6         80       81
              Max                          588      36        503      516
IMAGE SPACE   Re-projection error [pixel]
              Mean                         0.4      0.1       0.4      0.4
              Stdv                         0.2      0.1       0.2      0.2
              Max                          1.0      0.8       1.0      1.1
Table 2. Statistical results of the external assessment.

SELF-CALIBRATING BUNDLE ADJUSTMENT - EXTERNAL ASSESSMENT
                                           Pattern            No Pattern
                                           SIFT     Target    SIFT     SIFT & Target
RMSE length [micron]                       66       25        24       24
RMSE mean (magnitude) [micron]             62       21        21       20
Max difference [micron]                    104      52        46       48
In order to further analyse this different behaviour, SIFT points are extracted and filtered again starting from the images
without projected pattern. A new self-calibrating bundle adjustment is finally carried out using the new set of image
correspondences. In this case, the internal assessment provides results comparable to the ones derived from the images
with a projected pattern. Regarding the external evaluation, SIFT points from images without the pattern show a
behaviour equivalent to the circular target-based process. No significant improvement is achieved by including both
SIFT points and target centroids in the network adjustment (Tables 1 and 2, fourth column).
The tests presented here suggest the existence of some problems related to the SIFT points extracted from the
images with projected pattern. The origins of this behaviour are still under study and may be due to the type of the
selected pattern or to instability effects induced by the two projectors, which may have caused small movements of the
pattern during the image acquisition, not noticeable to the naked eye.
4.2 Accuracy of the 3D reconstruction
Starting from the orientation results achieved by the self-calibrating bundle adjustment (SBA) performed with SIFT
points and the ones delivered by the SBA with target centroids, the dense image matching procedure is then carried out.
Both image datasets (with and w/o the projected pattern) are separately processed, delivering four point clouds that will
be referenced as follows:
- SBA-SIFT-Pattern (shown in graphs as a diamond): derived from the self-calibrating bundle adjustment with SIFT
points extracted from images with pattern, and dense image matching performed on images with pattern.
- SBA-SIFT-NoPattern (square): derived from the self-calibrating bundle adjustment with SIFT points extracted from
images without pattern, and dense image matching performed on images without pattern.
- SBA-Target-Pattern (triangle): derived from the self-calibrating bundle adjustment with circular target centroids
extracted from images without pattern, and dense image matching performed on images with pattern.
- SBA-Target-NoPattern (cross): derived from the self-calibrating bundle adjustment with circular target centroids
extracted from images without pattern, and dense image matching performed on images without pattern.
The automatic procedure for accuracy assessment is finally applied and its most notable results are summarized in
Figures 6-10 and in Tables 3-5 below. Since the spheres were not reconstructed by the dense image matching performed
on images without pattern, neither the sphere diameter error nor the sphere spacing error evaluation was performed for
these datasets (although demonstrated in a previous paper21).
To investigate the orientation recording performance, the angle error is measured and the deviation plotted (Figure 6).
For the tested methodologies, the point clouds SBA-SIFT-NoPattern and SBA-Target-NoPattern perform best, with the
least deviations from the reference angles. All methods show a trend towards improved (i.e. lower deviation) values
with increasing angles.
Figure 6. Orientation error through angle error analysis [degree].
Figure 7. Length error through bi-directional plane-spacing error analysis for two length gauges [mm].
To investigate the length measurement error, the bi-directional plane spacing error is measured (Figure 7). SBA-SIFT-
Pattern is performing best with 0.04 mm and 0.06 mm deviations for Length Bar 1 and 2, respectively. This good metric
performance is achieved also by the other point clouds, with the exception of SBA-Target-Pattern, that delivers a
maximum deviation of 0.73 mm.
Further length measurement tests are conducted through the uni-directional plane spacing error (Figure 8). SBA-SIFT-
Pattern shows the most consistent performance with a standard deviation of 0.015 mm, and a maximum unsigned
deviation of 0.036 mm. All datasets show a high deviation for the highest step of 20 mm. When the statistical test (i.e.
significance test) is conducted, Table 3 shows that the smallest significant step that can be recorded with a confidence of
95% is 0.1 mm by all systems except SBA-Target-Pattern.
Figure 8. Length error through uni-directional plane-spacing error analysis [mm]. Step 1 is the smallest and Step 17 is the largest
step.
Table 3. Smallest significant step evidenced through a significance test (only relevant results are shown).
Step height [mm]     SBA-SIFT-Pattern   SBA-SIFT-NoPattern   SBA-Target-Pattern   SBA-Target-NoPattern
0.05                 FAIL               FAIL                 FAIL                 FAIL
0.10                 PASS               PASS                 FAIL                 PASS
0.30                 PASS               PASS                 PASS                 PASS
0.50                 PASS               PASS                 PASS                 PASS
1.00                 PASS               PASS                 PASS                 PASS
Smallest step [mm]   0.1                0.1                  0.3                  0.1
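The exact form of the significance test is not spelled out in the text; one plausible sketch is a two-sample z-test (large-sample approximation) on the per-point heights of two adjacent step surfaces, passing a step when the mean height difference exceeds the measurement noise at ~95% confidence. All names below are illustrative:

```python
import numpy as np

def step_significant(heights_low, heights_high, z_crit=1.96):
    """Two-sample z-test (large-sample approximation): returns True (PASS)
    when the upper step surface sits significantly higher than the lower
    one at ~95% confidence.  heights_* are per-point distances of each
    surface from a common reference plane."""
    a = np.asarray(heights_low, dtype=float)
    b = np.asarray(heights_high, dtype=float)
    diff = b.mean() - a.mean()
    std_err = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return bool(diff / std_err > z_crit)
```

With point-cloud noise comparable to the mean GSD (0.09 mm), such a test would resolve a 0.10 mm step only when many points per surface average the noise down, which is consistent with the pattern of results in Table 3.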
The structural resolution is evaluated through (i) a significance test for gap recording (Table 4) and (ii) visual inspection of a deviation map computed against a plane fitted across the top of the gap artefact (Figure 9). Results of the significance test are shown in comparison with the reference dataset (Arius3D model), which passes the hypothesis test for all gaps. According to the numerical results, gap recording is performed best by SBA-Target-Pattern, which records the pit of a gap down to 1.0 mm. However, visual inspection of the deviation maps shows that all point clouds feature clear gaps for Gap 1 and Gap 2. Furthermore, SBA-Target-NoPattern shows the clearest indication of gaps, with a visible indentation of ca. 0.2 mm on the deviation map.
Since the two analyses, i.e. significance test and visual inspection, do not provide unequivocal evidence of the system performances, their results are excluded from the final numerical summary of the accuracy evaluation.
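The deviation maps used here are signed point-to-plane distances. A minimal sketch of that computation (assuming the flat top of the gap artefact is already segmented; note that the sign of the fitted normal is arbitrary and would need consistent orientation before colour-coding):

```python
import numpy as np

def deviation_map(points, plane_points):
    """Signed distances of `points` from a least-squares plane fitted to
    `plane_points` (e.g. the flat top of the gap artefact).  The sign of
    the fitted normal is arbitrary; orient it before colour-coding."""
    centroid = plane_points.mean(axis=0)
    _, _, vt = np.linalg.svd(plane_points - centroid, full_matrices=False)
    normal = vt[-1]
    return (points - centroid) @ normal
```

A gap then shows up as a run of points with negative (below-plane) deviations, which is what the ca. 0.2 mm indentation in the SBA-Target-NoPattern map corresponds to.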
Figure 9. Colour coded deviation maps for gap recording [mm] with histograms of error distribution. The scale ranges from
-0.3 mm to +0.3 mm.
Table 4. Smallest significant gap evidenced through a significance test (only relevant results are shown).
Gap name           Gaps [mm]   Arius3D model   SBA-SIFT-Pattern   SBA-SIFT-NoPattern   SBA-Target-Pattern   SBA-Target-NoPattern
Gap 1 (largest)    3.00        PASS            PASS               PASS                 PASS                 PASS
Gap 2              2.00        PASS            FAIL               FAIL                 PASS                 PASS
Gap 3              1.00        PASS            FAIL               FAIL                 PASS                 FAIL
Gap 4              0.50        PASS            FAIL               FAIL                 FAIL                 FAIL
Gap 5              0.30        PASS            FAIL               FAIL                 FAIL                 FAIL
Gap 6              0.20        PASS            FAIL               FAIL                 FAIL                 FAIL
Gap 7 (smallest)   0.10        PASS            FAIL               FAIL                 FAIL                 FAIL
Smallest recorded gap [mm]     0.10            3.00               3.00                 1.00                 2.00
Figure 10. Summary evaluation for a comparison of all tested methods. The smaller the value, the lower the performance.
Table 5. Metric results of the evaluation for the geometric parameters of orientation, length error and structural resolution.
Geometric parameter                              SBA-SIFT-Pattern   SBA-SIFT-NoPattern   SBA-Target-Pattern   SBA-Target-NoPattern
Length error: bi-directional plane-spacing
error at 150 mm [mm]                             0.04               0.07                 0.26                 0.07
Length error: uni-directional plane-spacing
error / smallest measurable step height [mm]     0.10               0.10                 0.30                 0.10
Orientation: angle error, maximum unsigned
angle deviation [degrees]                        0.68               0.38                 0.53                 0.42
Summary of results: Each of the systems shows distinct strengths and weaknesses, although most differences delivered by the comparisons are not metrically significant, being less than the mean GSD. The results of the performance evaluation can be summarized either in metric values (Table 5) or through a graphic evaluation that normalizes the results and stacks the charts to reflect an overall score (Figure 10). The two point clouds extracted from images without projected pattern, i.e. SBA-Target-NoPattern and SBA-SIFT-NoPattern, achieve the best results, whereas SBA-Target-Pattern delivers the worst overall outcome. SBA-SIFT-Pattern is characterised by an “intermediary” performance, although it shows the best length recording. These general outcomes agree with the results of the orientation assessment. The origins of these behaviours are still under study. It should be stressed again that results from the structural resolution analyses have not been included in the summary, since they do not provide clear and consistent evidence of the system performances.
(In Figure 10, each metric is normalized before stacking: the bi-directional plane-spacing error at 150 mm against a maximum tolerance of 0.3 mm, the uni-directional plane-spacing error / smallest measurable step height against a minimum step-height recording of 0.5 mm, and the angle error, i.e. the maximum unsigned angle deviation, against a maximum tolerance of 1 degree.)
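The exact normalization behind the stacked scores of Figure 10 is not stated explicitly; a scheme consistent with several of the plotted values (e.g. 1 − 0.68/1.0 ≈ 0.32 for the SBA-SIFT-Pattern angle error) maps each recorded error onto [0, 1] against its tolerance, with 1.0 meaning a perfect result:

```python
def normalized_score(value, tolerance):
    """Map a recorded error onto [0, 1] against its tolerance:
    1.0 = no error, 0.0 = tolerance fully used up (clamped)."""
    return max(0.0, min(1.0, 1.0 - value / tolerance))
```

The three per-method scores are then stacked to give the overall bar height; this is an assumed reconstruction of the chart's arithmetic, not a formula given in the paper.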
5. CONCLUSIONS
The paper strives to provide a procedure for improving automated 3D reconstruction methods via vision metrology. A
metrological-oriented use of automated techniques for dense 3D reconstruction purposes requires a narrowing of the
accuracy gap between the industrial sector and the more relaxed amateur or heritage use of 3D optical imaging.
Measurement results should thus be supported by rigorous statistical parameters that provide information about the
precision of the process. To address this issue, an enhanced photogrammetric procedure has been proposed and tested in
this paper. It exploits the main functionalities of the classical photogrammetric packages for accurate measurement of
single points by integrating their algorithms and statistical analyses into the pipeline provided by fully automated 3D
reconstruction software. The test-object adopted in the experiments represents a challenging artefact for dense matching
applications. While the artefact is a highly structured 3D scene, due to the presence of a number of metric references
(like planar targets, scale bars and spheres), it is also small and poorly textured. This allowed assessment of the
performance of the automatic image orientation and matching procedures within a typical industrial scenario,
characterised by poor texture and known 3D/2D shapes. In order to provide a fine texture to the surfaces, a pattern was
projected using non-metrological equipment (small projectors with low resolution, light tripods). The choice
significantly affected the metric quality of the results and this influence has been carefully analysed through the adopted
augmented procedure in terms of both orientation and matching outcome.
Statistics delivered by the assessment of orientation processes show that the filtered SIFT points can represent valuable
image observations for obtaining results that are comparable to the ones achieved with the circular targets. This is
particularly true for SIFT points extracted without the projected pattern, whereas greater differences are highlighted with
SIFT points extracted from the images with a pattern. Further analyses are required to understand the origins of these
results. In particular, future studies are planned in order to compare triangulated target coordinates to a more accurate
reference and test different types of projected pattern.
A procedure for the metric evaluation of 3D reconstructed geometry has been demonstrated for length, orientation and
structural resolution, including automated data segmentation to prepare the input. The evaluation is applicable to other
3D imaging methods and sensors. Quantitative evaluation results showed that 3D reconstructions deriving from the self-
calibrating bundle adjustments with circular target centroids or SIFT points extracted from images without pattern and
dense image matching performed on images without pattern (i.e. SBA-Target-NoPattern and SBA-SIFT-NoPattern,
respectively) yielded the best results. However, most differences delivered by the accuracy comparisons are not
metrically significant, as they are below the mean GSD (0.09 mm). By testing different types of pattern, future tests will
analyse how the projected texture (and its resolution) can affect the final dense point clouds and the corresponding
outcomes of the metrological assessment.
ACKNOWLEDGEMENTS
We wish to thank the COSCH network (MPNS EU Cost Action TD-1201) which enabled a Short-Term Scientific
Mission of Isabella Toschi, FBK (Italy), at 3DIMPact (3D Imaging, Metrology and Photogrammetry applied coordinate
technologies), a research group at UCL CEGE (UK).
REFERENCES
[1] Fraser, C.S. and Shortis, M.R., “Metric exploitation of still video imagery,” The Photogrammetric Record, 15(85),
107-122 (1995).
[2] Luhmann, T., Robson, S., Kyle, S. and Harley, I., [Close range photogrammetry: Principles, methods and
applications], Whittles, Dunbeath, UK (2006).
[3] Atkinson, K. B., [Close range photogrammetry and machine vision], Whittles Publishing, Caithness, UK (2001).
[4] Fryer, J.G., Mitchell, H.L. and Chandler, J.H., [Applications of 3D Measurement from Images], Whittles Publishing,
Caithness, UK (2007).
[5] Luhmann, T., “3D imaging: how to achieve highest accuracy,” Videometrics, Range Imaging, and Applications XI,
Proc. SPIE Vol. 8085 (2011).
[6] Godding, R., Luhmann, T. and Wendt, A., “4D Surface matching for high-speed stereo sequences,” Int. Arch.
Photogramm. Remote Sens. & Spatial Inf. Sci., Vol. XXXVI, Part 5 (2006).
[7] Xiao, Z., Liang, J., Yu, D., Tang, Z. and Asundi, A., “An accurate stereo vision system using cross-shaped target
self-calibration method based on photogrammetry,” Optics and Lasers in Engineering 48, 1252–1261 (2010).
[8] Fritsch, D., Khosravani, A.L.I.M., Cefalu, A. and Wenzel, K., “Multi-sensors and multiray reconstruction for digital
preservation,” Photogrammetry Week, 305–323 (2011).
[9] Remondino, F., Del Pizzo, S., Kersten, T.P. and Troisi, S., “Low-cost and open-source solutions for automated
image orientation – A critical overview,” Proc. EuroMed 2012 Conference, M. Ioannides et al. (Eds.), LNCS 7616,
40-54 (2012).
[10] Haala, N. and Rothermel, M., “Dense multi-stereo matching for high quality digital elevation models,”
Photogrammetrie, Fernerkundung, Geoinformation (PFG) 4, 331–343 (2012).
[11] Hartley, R. and Zisserman, A., [Multiple View Geometry], 2nd ed., Cambridge University Press, Cambridge, UK
(2004).
[12] Wöhler, C., [3D Computer Vision. Efficient Methods and Applications], Springer, Berlin, Germany (2009).
[13] Ahmadabadian A. H., Robson, S., Boehm, J., Shortis, M., Wenzel, K. and Fritsch, D., “A comparison of dense
matching algorithms for scaled surface reconstruction using stereo camera rigs,” ISPRS Journal of Photogrammetry
and Remote Sensing 78, 157-167 (2013).
[14] Remondino, F., Spera, M.G., Nocerino, E., Menna, F. and Nex, F., “State of the art in high density image
matching,” The Photogrammetric Record 29(146), 144-166 (2014).
[15] Jepping, C., Bethmann, F. and Luhmann, T., “Congruence analysis of point clouds from unstable stereo image
sequences,” Int. Arch. Photogramm. Remote Sens. & Spatial Inf. Sci., XL-5, 301-306, (2014).
[16] Shortis, M. and Robson, S. “Vision Measurement System”, <http://www.geomsoft.com/VMS/> (17 April 2015).
[17] Eos Systems Inc., “PhotoModeler”, <http://www.photomodeler.com/index.html> (17 April 2015).
[18] Association of German Engineers, [VDI/VDE 2634/Part2, 2002. Optical 3-D Measuring Systems – Optical Systems
based on Area Scanning] (2002).
[19] Association of German Engineers, [VDI/VDE 2634/Part3, 2008. Optical 3-D Measuring Systems – Multiple View
Systems based on Area Scanning] (2008).
[20] Hess, M. and Robson, S., “3D imaging for museum artefacts: a portable test object for heritage and museum
documentation of small objects,” Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., 39(B5), 103-108, (2012).
[21] Hess, M., Robson, S. and Hosseininaveh Ahmadabadian, A., “A contest of sensors in close range 3D imaging:
performance evaluation with a new metric test object,” Int. Arch. Photogramm. Remote Sens. & Spatial Inf. Sci.,
XL-5, 277-284, (2014).
[22] Lowe, D. G., “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision,
60(2), 91-110 (2004).
[23] Apollonio, F.I., Ballabeni, A., Gaiani, M. and Remondino, F., “Evaluation of feature-based methods for automated
network orientation,” Int. Arch. Photogramm. Remote Sens. & Spatial Inf. Sci., XL-5, 47-54 (2014).
[24] Nocerino, E., Menna, F., Remondino, F. and Saleri, R., “Accuracy and block deformation analysis in automatic
UAV and terrestrial photogrammetry - Lesson learnt,” ISPRS Annals of the Photogrammetry, Remote Sensing and
Spatial Information Sciences, Vol. II(5/W1), 203-208 (2013).
[25] Nocerino, E., Menna, F. and Remondino, F., “Accuracy of typical photogrammetric networks in cultural heritage 3D
modeling projects,” Int. Arch. Photogramm. Remote Sens. & Spatial Inf. Sci., XL-5, 465-472 (2014).
[26] Brown, M.Z., Burschka, D. and Hager, G.D., “Advances in computational stereo,” IEEE Transactions
on Pattern Analysis and Machine Intelligence 25, 993–1008 (2003).
[27] Hirschmuller, H., “Accurate and efficient stereo processing by semi-global matching and mutual information,” IEEE
Computer Vision and Pattern Recognition, 2, 807-814 (2005).
[28] Bethmann, F. and Luhmann, T., “Semi-Global Matching in Object Space,” Int. Arch. Photogramm. Remote Sens.
& Spatial Inf. Sci., XL-3/W2 (2015).
[29] “Agisoft Photoscan,” <http://www.agisoft.com/> (17 April 2015).
[30] Haala, N., “The landscape of dense image matching algorithms,” Proc. Photogrammetric Week 2013. Dieter Fritsch
(Ed.), Stuttgart, 271-284 (2013).
[31] Toschi, I., Beraldin, J.-A., Cournoyer, L., De Luca, L. and Capra, A., “Evaluating dense 3D surface reconstruction
techniques using a metrological approach,” NCSLI Measure Journal, 10(1), 52-62 (2015).
[32] GOM mbH, “GOM Inspect,” <http://www.gom.com/3d-software/gom-inspect.html> (17 April 2015).
[33] Association of German Engineers. [VDI/VDE 2617-1:2007 - Accuracy of coordinate measuring machines with
optical probing] (2007).