Abstract
Distribution-to-distribution point cloud registration algorithms are fast and interpretable and perform well in unstructured environments. Unfortunately, existing strategies for predicting the solution error for these methods are overly optimistic, particularly in regions containing large or extended physical objects. In this paper, we introduce the iterative closest ellipsoidal transform (ICET), a novel three-dimensional (3D) lidar scan-matching algorithm that re-envisions the normal distributions transform (NDT) in order to provide robust accuracy prediction from first principles. Like NDT, ICET subdivides a lidar scan into voxels in order to analyze complex scenes by considering many smaller local point distributions; however, ICET assesses the voxel distribution to distinguish random noise from deterministic structure. ICET then uses a weighted least-squares formulation to incorporate this noise/structure distinction while computing a localization solution and predicting the solution-error covariance. To demonstrate the reasonableness of our accuracy predictions, we verify 3D ICET in three lidar tests involving real-world automotive data, high-fidelity simulated trajectories, and simulated corner-case scenes. For each test, ICET consistently performs scan matching with sub-centimeter accuracy. With this level of accuracy, combined with the fact that the algorithm is fully interpretable, this algorithm is well suited for safety-critical transportation applications. Code is available at https://github.com/mcdermatt/ICET.
1 INTRODUCTION
Scan matching is the task of finding the rigid transformation that best aligns two point clouds. Scan matching can be used for odometry, for map matching, and for building high-definition (HD) maps of an environment. Figure 1 shows a block diagram for an odometry application.
In this paper, we introduce the iterative closest ellipsoidal transform (ICET), a novel algorithm that is capable of robust point-cloud registration and accurate prediction of its own solution error as a function of scene geometry. Correctly modeling solution accuracy is useful for sensor fusion and necessary for safety-of-life navigation tasks. Additionally, ICET can identify degenerate cases in which there is insufficient information to constrain one or more components of its solution and suppress the corresponding output without adversely impacting the solution accuracy about other axes.
At its core, ICET uses a weighted least-squares (WLS) approach for scan matching and accuracy prediction. Similar to the popular three-dimensional (3D) normal distributions transform (NDT) distribution-to-distribution algorithm (Stoyanov et al., 2012), which obtains its solution through gradient-descent optimization, our approach also uses a voxel grid to subdivide a full lidar scan into smaller point clusters, which can be aligned between sequential scans to infer relative motion. An important difference is that our approach further analyzes the point distribution within each voxel to distinguish between noise and deterministic structure. By addressing this noise/structure distinction, ICET excludes ambiguous measurements and generates a better approximation of the measurement-noise covariance matrix. The measurement-noise covariance is then used in two ways: as a weighting matrix for enhancing solution accuracy and as an input for predicting the solution-error covariance.
The key contribution of this algorithm lies in its combination of noise/structure differentiation with covariance prediction in voxel-based lidar scan matching. We believe this characteristic of ICET is new in the research literature, although elements have been presented in the past. For instance, although the solution-error covariance is not generally predicted in most NDT implementations, an approach has been previously presented by Stoyanov et al. (2012), who extended Censi’s method (Censi, 2007) to the specific case of an NDT cost function. However, this approach was applied to voxel measurement distributions with no consideration for the differences between random measurement noise and the systematic spread of points associated with object shape. Conversely, the NDT variant implemented by Einhorn and Gross (2015) seeks increased accuracy by suppressing measurements except in the most compact principal-axis direction of the voxel distribution; however, their approach does not tie ambiguity mitigation to covariance estimation. In a similar vein, deep matching lidar odometry combines geometric methods with an artificial neural network to account for distribution shape, but not in the context of predicting the solution-error covariance (Z. Li & Wang, 2020). Utilizing the noise/structure distinction in the context of covariance prediction is particularly important for driving applications, where tunnels and highways with long flat walls can introduce ambiguities for voxel-based scan-matching algorithms and cause poor localization performance (Wen et al., 2018; Zhen & Scherer, 2019).
The cornerstone of the ICET implementation is its use of a standard least-squares methodology for deriving motion states from lidar data. Least-squares implementations of NDT (Takeuchi & Tsubouchi, 2006) are significantly simpler than traditional point-to-distribution implementations of NDT, which obtain solutions through gradient-descent optimization. Least-squares techniques (Simon, 2006) are widely known, easy to implement, and computationally efficient. Importantly, the relationship between the input measurement-noise covariance and the output solution-error covariance is well understood for least-squares methods. This relationship relies on the assumption that the noise inputs to least-squares processing are strictly random (and usually includes the assumption of Gaussian noise distribution), as in traditional NDT (Biber & Straßer, 2003; Stoyanov et al., 2012). In reality, the spread of points within each voxel is due to a combination of random noise and deterministic structure. The key point of this paper is that the latter must be suppressed in order to achieve robust error characterization.
The interpretable nature of the ICET algorithm and its enhanced noise modeling make it particularly well suited for safety-of-life applications such as automated driving. Developing a rigorous safety case usually requires the use of transparent algorithms, to identify and bound the risks of rare but potentially hazardous situations, which represent a risk to navigation integrity (Bhamidipati & Gao, 2022; Zhu et al., 2022). By comparison, methods that rely on machine learning, such as LO-Net (Q. Li et al., 2019) and VertexNet (Cho et al., 2020), are not currently interpretable, which makes them difficult to characterize through a rigorous safety case. With regard to safety certification, voxel-based methods such as NDT and ICET are also advantageous in that they do not introduce significant risks related to data association (or correspondence) between features drawn from different lidar scans. As noted by Hassani et al. (2018), data association issues make the safety case very challenging when matching landmarks, features, or points from lidar scans, as in the broad class of iterative closest point (ICP) algorithms such as generalized ICP (Segal et al., 2009), lidar odometry and mapping (LOAM) (Zhang & Singh, 2014), and intensity-enhanced LOAM (Park et al., 2020), even when the ICP covariance is approximated (Brossard et al., 2020; Yuan et al., 2023).
The remainder of this paper introduces a 3D formulation of ICET and verifies it through simulation. Section 2 provides additional background related to noise modeling for voxel-level lidar point distributions. Section 3 provides key equations for the 3D formulation of ICET. Section 4 evaluates the performance of ICET against NDT on a real lidar data set. This analysis provides a proof-of-concept that ICET effectively processes real-world data; however, the Global Positioning System (GPS) reference data (centimeter-level errors between epochs) are significantly less accurate than the lidar data (millimeter-level errors between scans). Thus, Section 5 performs a similar test using simulated lidara data, where perfect ground truth is available. Section 6 presents a final corner-case simulation to demonstrate how ICET identifies rare instances in which there is insufficient information to constrain solutions about all axes (as in the case of traveling through a straight tunnel); in these cases, ICET provides as complete a solution as possible given the detected ambiguity. Finally, Section 7 discusses the limitations of ICET and possible future enhancements.
2 DEFINING THE PROBLEM
This section visualizes the fundamental problem that motivates the ICET algorithm: separating deterministic object shape from random measurement noise, effects that together determine the distribution of lidar points across a voxel.
When modeling voxel-level noise, one important consideration is intrinsic measurement noise caused by the sensor itself. For example, errors are introduced by range-measurement circuitry and (for rotating units) by encoders that measure lidar orientation. Noise levels (e.g., one-sigma error) associated with intrinsic noise can be characterized in device specifications published by the manufacturer. As a visualization of intrinsic noise, consider the lidar points (red dots) in Figure 2(a). The surface is completely smooth, but lidar samples are scattered about the surface because of small errors that arise in measuring the time of flight.
A second important consideration when modeling voxel-level noise is discrete sampling on rough surfaces. Local surface texture on the scale of the lidar spatial resolution results in measurement variations amplified by multipath.1 Surface roughness results in additional apparent noise, as the sampled locations change continuously because of lidar motion. Because these effects are highly dependent on the nature of the surface creating the lidar reflections, these effects are difficult to characterize in a general way; thus, they are not characterized in the system specification. Figure 2(b) presents a rock formation with significant surface roughness, which amplifies the effective noise above the levels observed in Figure 2(a).
In addition to these random effects, the physical shape of the reflecting surface also influences the lidar point distribution. When small objects are viewed, as in Figures 2(a) and 2(b), the distinction between random noise and deterministic object structure is not significant; however, when the object becomes wide, as in Figure 2(c), the object shape becomes significant, and the distinction between random noise and deterministic structure cannot be ignored.
The combination of random and deterministic factors makes it difficult to model measurement noise. The most straightforward approach is to use manufacturer specifications, but these underestimate the actual measurement covariance (Glennie & Lichti, 2010), presumably because they reflect only intrinsic noise (Figure 2(a)) and not discrete sampling effects on rough surfaces (Figure 2(b)). As an alternative, the noise covariance can be estimated statistically from the distribution of points in a voxel, as was done by Stoyanov et al. (2012). Then, the resulting covariance captures both intrinsic noise and discrete sampling effects (as shown in Figure 2(b)); however, this statistical estimate also treats large-scale shape effects as noise (as visible in Figure 2(c)) and thereby generates an artificially large estimate of the noise covariance. (The computed covariance along the extended surface of Figure 2(c) is essentially equal to the moment of inertia for a long rod.)
Our solution in ICET is to model the noise covariance based on the distribution of points in the voxel, but only after excluding distribution directions that stretch across the length of the voxel. If the voxel is large enough to contain typical noise distributions, but not significantly larger, the lidar point distribution is dominated by random effects when extended directions are excluded.
3 IMPLEMENTATION
This section details the ICET algorithm, first covering the basic algorithm (Section 3.1), then discussing the noise-modeling process (Section 3.2), and closing with additional refinements needed to process real-world data (Section 3.3). The development is a refined and more complete version of our conference paper (McDermott & Rife, 2022a). Importantly, Section 3.2 details the paper’s key contribution, which is related to suppressing extended axes of the point-cloud distribution in order to provide a more reliable estimate of measurement noise.
3.1 ICET Least-Squares Solution in 3D
ICET generates a solution vector that represents the translation and rotation of a new scan to best align with an earlier scan, assuming that both scans capture roughly the same stationary (or near-stationary) objects and terrain features. The methodologies presented in this section are, for the most part, representative of those used in a wide range of existing NDT implementations.
In ICET, the solution vector x includes Cartesian translations {x, y, z} and body-fixed XYZ Euler angles {ϕ, θ, ψ} needed to align the scans:
1
Combining the rotation and translation, it is possible to map the position vector (i)p for each point i in the new scan into the coordinate system of the earlier (or reference) scan. The transformed point positions, labeled (i)q, are as follows:
2
In this equation, the matrix R is the 3D rotation matrix constructed from the roll, pitch, and yaw angles {ϕ, θ, ψ}:
3
Points are clustered by using an arbitrarily shaped grid of J voxels, centered on the location of the lidar. This grid is used because it mitigates data association fragility between scans, a limitation of early scan-matching algorithms such as the ICP algorithm (Besl & McKay, 1992; Chen & Medioni, 1991). By assigning points to voxels, it is possible to compute a mean position for the new-scan points and a mean position for the reference-scan points in each voxel j. Similarly, covariance matrices (j)Q and can be constructed for each voxel. Note that the 0 subscript is used to identify reference-scan variables and to distinguish them from new-scan variables. Once the means (j)y0 for each voxel in the reference scan are computed, they are concatenated into one large observation vector ; the states x are then tuned so that the observation vector y0 is approximated as closely as possible by a similar concatenated vector constructed from the new-scan , where the h(x) function is the vertical concatenation of J repetitions of , which are the new-scan voxel means computed after the new scan is transformed by Equation (2).
ICET uses an iterative least-squares approach to estimate the state as , leveraging the Taylor series expansion:
4
Starting with an initial guess of , the above equation is solved by assuming that second- and higher-order terms are small, to obtain the state correction δx. This Newton–Raphson operation is repeated to a reasonable convergence criterion, to obtain a final least-squares state estimate . By default, we often use a zero initial guess (assuming no position or rotation change), but convergence can be accelerated by inferring velocity from prior lidar measurements or by using external sensor aiding. Although the Newton–Raphson method does not guarantee global convergence, this solution approach and related gradient-descent algorithms are widely used in lidar scan matching, and countless implementations have demonstrated their reliable convergence in practice.
Solving Equation (4) requires that the Jacobian be evaluated for each iteration. In 3D, the Jacobian matrix can be constructed as a vertical concatenation of submatrices , defined as follows for each voxel j:
5
Here, the vectors (j)Hi are defined as , where (j)yp are the new-scan voxel-mean vectors in their original coordinate system (i.e., the mean of (i)p for all points i inside voxel j) and where the rotation matrix derivatives are the following matrices in . Similar rotation-matrix derivatives are widely used in scan matching, for example, by Hong and Lee (2017):
6
7
8
In solving Equation (4), we use a WLS formulation with the weighting matrix W constructed from the voxel-mean covariance matrices. Thus, W = Σ−1, where Σ is a block diagonal with each block covariance (j)Σ corresponding to one voxel j. The block covariances reflect the uncertainty in the difference between voxel means:
9
Once Σ is estimated (see Section 3.2) and inverted to obtain W, the standard WLS solution can be computed iteratively with the following equation (Simon, 2006):
10
After each iteration of Equation (10), the state estimate is updated until a reasonable convergence criterion is satisfied.
The WLS formulation employed by ICET makes accuracy prediction straightforward in a computational sense. Applying the standard WLS formulation (Simon, 2006), we obtain the state-error covariance as follows:
11
An important detail in our implementation of ICET is that we use computationally efficient forms to obtain the solution (Equation (10)) and the covariance (Equation (11)). Given the block-diagonal structure of the variables, it is possible to frame Equation (10) in an equivalent but more computationally efficient manner, by eliminating many multiplications by zero. The equivalent form recognizes the following:
12
and
13
In addition to accelerating the computation, the evaluation of Equation (10) using Equations (12) and (13) is also less memory-intensive because sequentially processing small blocks avoids the construction of the arbitrarily large matrices H and W. In the process, the value of the covariance matrix (Equation (11)) is automatically obtained from Equation (12). Accordingly, the ICET covariance estimate is significantly easier to compute than the NDT covariance estimate derived by Stoyanov et al. (2012).
3.2 Refinement of Measurement Covariance
Although computationally straightforward to evaluate, the solution covariance (Equation (11)) is only as reliable as the estimated measurement covariances Σ. To obtain a good estimate of Σ, it is important to carefully consider the random and deterministic factors that distribute lidar samples within a voxel, as discussed in Section 2. In this section, we present our main contribution, the algorithmic step that identifies and excludes ambiguous measurements from extended objects, such as the example object shown in Figure 2(b).
In ICET, our approach for modeling measurement noise within each voxel is to start with the sample covariance and to then exclude directions in which the point distribution is dominated by deterministic structure. This approach allows us to capture random effects (measurement noise and discrete sampling) without a priori modeling. Residual deterministic effects remain (e.g., structural distributions for small objects), providing some additional conservatism; however, ambiguous information associated with large objects is removed, which greatly improves the statistical estimation of Σ.
Our implementation, similar to that of Stoyanov et al. (2012), starts by computing the sample covariance of point locations within each voxel. We label the sample covariance (j)Q in the new scan and (j)Q0 in the reference scan. Here, we have the following:
14
as the standard formula for the sample covariance, where the location of each point i in voxel j (for the new scan) is identified by the coordinates (i)q and the mean of these point coordinates is labeled (j)y, with a total number of points (j)N. A similar formula is used to compute the sample covariance (j)Q0 for the reference scan, noting that the number of points (j)N0 in the reference scan is generally different from that in the new scan.
Our approach diverges from that of Stoyanov et al. (2012) in that ICET uses the mean as an observable for each voxel; thus, the point-location covariance must be converted to the covariance of the mean. For a random set of independent samples, the central limit theorem (NIST, 2006) implies that the covariance of the sample mean is the true covariance divided by the number of samples, which is (j)N for the new scan and (j)N0 for the reference scan. Because the actual observable is the mean difference between the new and reference scans, as indicated by Equations (9) and (10), the contributions of both scans must be considered. By extension, assuming independent samples within each scan and between scans, the measurement covariance (j)Σ for each voxel j is as follows:
15
A minor subtlety is that the above formula substitutes the sample covariance for the true covariance (which is not known). This distinction is not significant given that the number of samples in each voxel is sufficiently high. To this end, we introduce a minimum sample count (50 points per voxel in this paper) below which a voxel is excluded from computation.
Our next step is to exclude directions from the computation for which the sample variance is unrealistically high. This step further distinguishes ICET from prior work aiming to characterize state errors in voxel-based scan matching. Specifically, ICET removes extended distribution directions by using an eigen-transformation of the point-location covariance matrix for each voxel j. The eigen-transformation generates a unitary eigenvector matrix and a diagonal eigenvalue matrix for each voxel:
16
The eigenvectors describe the principal-axis directions, whereas the eigenvalues describe the principal-axis lengths (squared) for the covariance ellipsoid. Directions for which the point distribution is dominated by deterministic structure are identified as directions for which the eigenvalue is large.
Note that the eigen-transformation is applied to (j)Q0, which describes the reference scan. Deterministic structure is embedded not only in (j)Q0, of course, but also in (j)Q and (j)Σ. We chose to focus on (j)Q0 because of computational efficiency, noting that the voxel boundaries are defined for the reference scan and thus the lidar points are consistently assigned to the same voxel in the reference scan throughout the WLS iterations of Equation (10) to convergence. By comparison, each iteration reapplies Equation (2) to transform the new scan differently on to the voxel grid; hence, point associations for the new scan change after each WLS iteration. By focusing on the reference scan, the eigen-transformation is computed infrequently, i.e., only when the reference image is updated.
The next step is to define the eigenvalue cutoff beyond which corresponding eigenvector directions are excluded from the solution. In this regard, our basic premise is that some inflation of the sample covariance due to deterministic structure is acceptable, but that a significant problem can result if the point distribution appears to extend across the voxel, exiting the voxel through two of its boundaries. Voxels are typically configured to be wide enough (20 cm or wider) to contain essentially all random noise; thus, the voxel width provides a coarse but effective indicator of reasonableness for detecting large distribution widths caused by physical structure. This concept is illustrated by Figure 3, which depicts a birds-eye view of a wall (gray rectangle) and the distribution of lidar returns (blue dots) generated by the wall. The figure depicts two rounded voxels, which are characteristic of the voxel shapes used in ICET (see Section 3.3). A covariance estimate can be constructed from the lidar points in each voxel. Notably, the covariance ellipse extends entirely across each voxel, from one voxel face to another.
To a human observer, it is evident from Figure 3 that the distribution of lidar points along the wall is dominated by deterministic object structure and not random effects; however, the statistical estimation process of Equation (15) is blind to this distinction and incorrectly infers that the distribution of points along the wall surface is random. This inference is inherently problematic because the voxel mean remains centered along the segment passing through the voxel, even for small perturbations of the lidar unit’s location, indicating that the mean conveys no useful information for localization in the along-wall direction. Using an along-wall measurement in the least-squares solution (Equation (10)) injects error but no signal. In this sense, both voxels shown in the figure are problematic unless the along-wall measurement is suppressed; the effect is even more severe for the righthand voxel than the left, because the distribution for the righthand voxel appears narrower in the along-wall direction, implying a lower variance and, accordingly, a higher weighting in the solution.
This ambiguity issue is a specific example of the aperture problem (Shimojo et al., 1989), which is well known in the computer vision literature, but has not been discussed in the lidar literature prior to the work of McDermott and Rife (2022a).
Introducing a cutoff that limits the magnitude of the eigenvalues for (j)Q0 both caps unrealistic distributions and addresses the ambiguity introduced when relatively flat objects stretch across an entire voxel. We do not recommend a fixed threshold cutoff because the amount of a large object contained inside a voxel depends on its orientation within the voxel, as shown in Figure 3. Instead, in this paper, we recommend that the cutoff be framed in terms of the relationship between the point distribution and the voxel boundaries. Namely, we propose that if the point distribution along any principal-axis direction extends past two voxel boundaries, then the associated principal-axis direction should be excluded from the solution.
Our implementation of this cutoff is inspired by sigma-point filters, also known as unscented Kalman filters (Julier & Uhlmann, 2004; Roth et al., 2016). These filters model a covariance matrix by placing two points along each principal axis (or eigenvector direction), spaced symmetrically around the mean. In our case, we represent the covariance matrix with six test points (two for each direction in 3D space), placed along each eigenvector axis at a distance of 2σ on either side of the mean, where σ is used here to indicate the square root of the eigenvalue associated with the eigenvector. If both sigma points for any given eigenvector lie outside the voxel, then ICET suppresses measurement information in that direction. Axes with only a single test point outside the voxel boundary are included in the solution, as they still contribute to localization in a meaningful way. The sigma-point test is illustrated in Figure 4.
Figure 4 illustrates the pruning process. The figure shows only the sample covariance (blue ellipsoid) and the voxel boundaries (wireframe). The lidar points used to generate the covariance are not shown. The three principal axes of the ellipsoid are labeled I, II, and III, and the sigma points are indicated as dots along the length of each axis. The figure uses three orthographic views (top, front, right) and one isometric projection to show the sigma points from different angles. For principal axis I, both sigma points lie inside the voxel; for principal axis II, one sigma point lies inside and one lies outside; for principal axis III, both sigma points lie outside. As such, the pruning algorithm will only suppress measurements in the direction of principal axis III.
Pruning is applied before Equation (10) is computed, so that eigenvector axes excluded by the test can be removed from the WLS solution. Specifically, to remove extended directions, we multiply by a projection matrix (an identity matrix with specific rows removed, as detailed in Appendix B); for each voxel j, we label the projection matrix (j)L and remove rows to suppress eigenvector directions associated with extended distribution axes. The projection removes these directions from Equation (4), one voxel at a time. The modified residual vector (j)Δỹ for each voxel j is as follows:
17
By concatenating the modified residual vectors together, one can compute the WLS solution while excluding data along extended surfaces. Unlike the strategy employed by Einhorn and Gross (2015), which preserves residuals along only the most compact distribution axis, our approach has the flexibility to capture any sufficiently compact component of a solution vector. For example, a tall and narrow world feature, such as a lamppost, provides information in two directions and is ambiguous in one (along the post’s vertical axis). Our sigma-point refinement would preserve two axes in the case of the lamppost, as visualized in Appendix B. By contrast, for a flat wall, the sigma-point refinement would preserve only one direction (normal to the wall) and reject the other two. Most importantly, our innovation is to couple axis exclusion with covariance estimation to enhance the ability of ICET to estimate the solution covariance.
3.3 Additional Solution-Conditioning Steps
Three further considerations are relevant for enhancing ICET results: defining voxel shape, checking solution rank, and detecting moving objects.
With regard to voxel definition, the current ICET implementation uses a spherical voxel grid, with one-voxel radial depth in each azimuth and elevation direction. This single radial voxel is adaptively defined by our shadow-mitigation algorithm (McDermott & Rife, 2022b) to contain the first radial point cluster. More distant objects suffer from missing data (shadows) that vary as the lidar moves. The grid is illustrated in Figure 5. The value of spherical grids for lidar processing has also been recognized in other recent publications (Hassani & Joerger, 2021). For our application, spherical coordinates are particularly advantageous in that the voxels are aligned in the same direction in which discrete sampling occurs (i.e., radially away from the sensor). Consequently, the voxel resolution can be increased in the azimuthal and elevation directions (to reduce the impact of shape effects) while retaining the ability to stretch in the radial direction to properly characterize porous substrates (such as loose foliage).
With regard to solution rank, when some measurement directions are excluded, it is important to ensure that the matrix inverse in Equation (10) can be computed. If the removal of extended axes consistently suppresses measurements in a particular direction, for instance, in the along-track direction when driving through a tunnel, then the solution becomes ill-defined (or rank-deficient) in that direction. Implementing a condition-number check can ensure that the solution is full rank (in a computational sense) and that the inverse can be computed. We implement a condition-number test by performing an eigen-decomposion on the inverse of Equation (11) to obtain a unitary eigenvector matrix and eigenvalue matrix :
18
The condition number can be calculated as the ratio of the largest to the smallest eigenvalue in Λ2. In our testing, we found a suitable threshold Tcond to be 5 × 104. If the condition number is below the threshold, a reliable solution can be generated.
If the condition number is larger than the threshold, a solution can still be computed in a reduced-dimension solution space. In the dimension reduction process, one starts with the smallest eigenvalue λ0 and removes the associated eigenvector if the condition number exceeds Tcond. The test is then repeated to remove eigenvector directions from the solution until the condition-number test is passed. This process allows for a valid solution for extreme cases in which the same measurement direction is suppressed by the sigma-point test in nearly all voxels, as might happen in the along-track direction for a straight, featureless tunnel.
With regard to moving-object detection, we use classical outlier rejection to detect and exclude voxels with large discrepancies between sequential frames, as caused by moving cars and pedestrians. This step is achieved by first allowing the algorithm to converge on a preliminary solution. Once an initial registration is achieved, voxels are flagged in which the residual distance exceeds a threshold Tmod. Flagged voxels are excluded, and the algorithm is run again to produce a final estimate. We set the threshold to Tmod = 5 cm, which is high (approximately 5σ) compared with typical measurement errors.
The full algorithm is summarized as pseudocode in Algorithm 1.
4 VERIFICATION TESTING: REAL LIDAR DATA SET
It is important to first demonstrate that the ICET algorithm is capable of robust point-cloud registration on realistic data before analyzing ICET’s estimates of solution-error covariance. In this section, we validate the performance of our algorithm on a trajectory from the LeddarTech Pixset data set (Déziel et al., 2021), which was chosen because of its high-fidelity Ouster OS1-64 lidar unit as well as the inclusion of a global navigation satellite system (GNSS)/inertial navigation system (INS) fused baseline. The selected trajectory contains 276 frames of movement through the urban environment displayed in Figure 6.
For this test, vehicle motion was obtained by estimating the transformations between each sequential pair of frames and accumulating those transformations using lidar odometry, as shown in Figure 1. No loop closure or graph optimization was implemented for these tests. The resulting trajectory is visualized in Figure 7. On the left side of the figure, the trajectory generated by ICET odometry is plotted as a white dotted curve superposed on an HD map. This map was constructed using ICET odometry estimates, with each successive lidar scan transformed by the transformation output at a given step to form a single large point cloud. The sharp resolution of the map is a strong qualitative indicator of the quality of the ICET odometry. The right side of the figure shows the ICET trajectory on north–east axes, along with GPS/INS data and with the trajectory generated by NDT odometry. There is a small but visible difference between the odometry estimates of NDT and ICET. The gap is slightly larger between the lidar odometry methods and the GPS data, especially toward the end of the trajectory (upper-right corner). Such divergence over time is expected when comparing dead-reckoning solutions (lidar odometry) with absolute positioning (GPS).
In this paper, our primary goal is to assess the quality of the ICET covariance estimate between two sequential frames, as generated by Equation (11) and conditioned by the sigma-point test. Unfortunately, real-world data are not an effective tool for assessing the quality of our lidar covariance predictions because the GPS/INS data are not sufficient as ground truth when sequential lidar frames are compared. The GPS precise point positioning system produces centimeter-level localization errors when differencing any two points in time, which provides a significant advantage when GPS data are compared with dead-reckoning over an extended duration (e.g., over the whole 276-frame trajectory shown in Figure 7); however, for sequential frames, the lidar odometry exhibits only millimeter-level localization errors, an order of magnitude smaller than those for GPS. Because the error in the GPS baseline is too high to provide an effective ground truth when estimating scan-matching error covariance for sequential frames, Section 5 explores an alternative methodology, using simulated test environments in which we have access to perfect ground truth values.
5 VERIFICATION TESTING: HIGH-FIDELITY SIMULATED DATA SET
For this test, our goal is to assess the scan-matching error for each sequential transformation. We opt to use simulated lidar point clouds, which provide direct truth data with which ICET-derived position and attitude changes can be compared. Importantly, as other authors have demonstrated (Stoyanov et al., 2012; Wu et al., 2018), the performance of scan registration algorithms on sufficiently realistic synthetic lidar point clouds is consistent with performance on similar real-world data, particularly in the case of distribution-based algorithms (such as NDT and ICET), as these methods are especially robust to sensor dropout, partial reflection, and other artifacts observed in field applications.
To this end, our second simulation environment is a high-fidelity urban scene. Specifically, we use data from the Cooperative Driving Dataset (CODD) (Arnold et al., 2021), which was generated in the CARLA open-source driving simulator (Dosovitskiy et al., 2017). The testing environment is representative of a city with detailed features absent in the abstracted simulations conducted in Section 6. Included are loose foliage, complex organic-shaped structures, and moving objects such as pedestrians and vehicles. This high-fidelity data set considers a series of lidar locations along a simulated automotive trajectory through an urban landscape and introduces randomness due to sensor noise as well as different physical geometries for each scan. In the CODD data, measurement noise is dominated by range (time of flight) error and is set to modest but representative levels. Specifically, range errors are modeled as normally distributed with a standard deviation of 1 cm.
The initial condition for each new scan is defined by zeroing the transformation vector (Equation (1)), which is equivalent to starting with the assumption that there is no motion between subsequent scans. We parsed the data in the CODD data set into 123 sequential (overlapping) scan pairs, which represented the full driving trajectory for one CODD vehicle recording data at 5 Hz over a period of 24.6 s. Figure 8 visualizes the CODD data, showing an HD map generated from the ICET odometry data as well as an overhead plot of the estimated trajectories for NDT and ICET compared against the ground truth.
For this test, we applied our full ICET algorithm (including the sigma-point refinement) as well as an NDT baseline (without sigma-point refinement) to each sequential pair of lidar scans. To provide a fair comparison, the same set of hyper-parameters was used for both NDT and ICET, including voxel size (4° resolution in azimuth and elevation), voxel boundaries, number of iterations, moving-object rejection threshold, and minimum number of points per cell (50 points). For this study, the key parameter is voxel size. As discussed in Section 2, voxels must be large enough to capture random errors, yet not so large as to envelope large objects. For this reason, Appendix A presents a voxel-size parameter study. Appendix A shows little sensitivity in performance over a wide range of voxel sizes (up to a breaking point at which performance quickly degrades if voxels are too large or too small).
Because we have a known truth from the simulation, we can compute the true motion between each sequential scan and compare with the odometry solution. These frame-to-frame motions are plotted in Figure 9. The figure contains a grid of four plots, with the first column showing frame-to-frame changes in along-track position and the second column showing frame-to-frame changes in yaw heading. The top row shows NDT results, and the bottom row shows ICET results. In the figures, the dashed black line indicates the simulated truth, the solid line indicates the odometry, and the shaded region indicates the 2σ error bound about the truth. The 2σ error bound is computed from Equation (11) and differs slightly between NDT and ICET because the voxel-level covariance estimates are different in each case.
Notably, the ICET bounds are more representative of the actual errors than the NDT bounds. For both algorithms, Figure 9 indicates that the predicted 2σ error bounds grow and shrink as the vehicle passes through the simulated environment; however, the change in bound size throughout the trajectory is far more pronounced in ICET than in NDT. In ICET, the widest translational error bounds fall between frames 40 and 70, where there are large extended surfaces (fences and foliage) parallel to the road on both sides of the vehicle. The ICET bounds then decrease in size after frame 70, presumably because the vehicle rounds the corner where small-scale features become visible to the lidar sensor. In contrast, the NDT-estimated error bounds are relatively consistent in magnitude throughout the same range, presumably because NDT is mischaracterizing ambiguous lidar measurements along large surfaces and integrating them into the solution and covariance estimate. In theory, the 2σ bounds should contain roughly 95% of the estimated states. For NDT, the bounds only contain the translation error 40% of the time (73 out of 123 instances); by comparison, the ICET bounds contain the translation error 93% of the time (115 of 123 instances). The results are similar for rotation error, with NDT bounds containing the rotation error only 77% of the time (95 of 123 instances) and ICET bounding the rotation error 95% of the time (117 of 123 instances). In short, the ICET-predicted covariance metrics are reasonably representative of the true solution error on realistic point-cloud data, while those provided by NDT are not.
6 VERIFICATION TESTING: ABSTRACTED GEOMETRIES
As a final test, we consider the corner case, in which ICET’s sigma-point refinement might remove all of the data in one or more solution coordinates. This situation might arise, for example, in a straight and featureless tunnel, where there are no valid lidar measurement data in the along-track direction. In such a case, we expect ICET to still function, and, moreover, to determine that the solution is unavailable in the ambiguous direction. Notably, the issue of insufficient lidar scan-matching data along one or more axes has recently been identified as a problem for other algorithms as well (Tuna et al., 2022)).
We consider three corner cases, each constructed from an abstracted geometry. The geometries include a T-intersection, a straight tunnel, and an open field. In each case, we synthesize lidar measurements via the MATLAB Automated Driving Toolbox. The three geometries are depicted in Figure 10. For each geometry, the figure shows a pair of scans: a reference scan (blue) and a perturbed scan (red) that must be aligned to the reference scan. In using MATLAB to generate these scans, we applied zero-mean Gaussian noise with a 0.2-cm standard deviation in each of the x, y, and z directions.
The abstracted geometries were designed specifically to check ICET’s sigma-point exclusion and rank-testing capabilities, as described in Equation (18). The T-intersection consists of extended walls in all directions. No large-scale ambiguity exists, as data exist to constrain all six transformation states, including the translation estimates in the x, y, and z directions and rotation estimates in ϕ, θ, and ψ. By contrast, the straight tunnel provides useful information in all translation and rotation directions except in the road-aligned (or y) direction. The flat plane provides useful information only in the vertical, pitch, and roll directions and not in the horizontal or yaw directions. We hypothesize that the the sigma-point test will detect walls (and remove wall-coincident measurements) in all three simple scenes; however, only the latter two scenes are expected to fail the rank-checking test related to Equation (18). Note that we avoid edge effects by sampling the reference and new scans away from the boundaries of the simulated domain
In these abstracted geometry tests, we do not create a full trajectory; instead, we repeatedly generate new noise for the two scans via a Monte Carlo simulation. For each Monte Carlo trial, the initial guess is generated from a zero-mean, independent Gaussian distribution with a standard deviation of 12.5 cm in {x, y, z} and a standard deviation of 1.7° in {ϕ, θ, ψ}. In total, we generated 500 Monte Carlo trials for each abstracted geometry, each with a unique seed for measurement noise and randomized initial translation and rotation.
For each Monte Carlo trial, we ran two algorithms to conduct scan matching: the full ICET algorithm as well as a comparable implementation of NDT, similar in all respects to the ICET algorithm except for the sigma-point refinement and the related condition-number test of Equation (18). Again, the voxel grid was defined with 4° resolution in azimuth and elevation, with a minimum of 50 points per voxel. Error statistics were compiled over all trials, with the error computed as the difference of each state compared with the truth. Root-mean-square error (RMSE) values are presented in Table 1 for the T-intersection, Table 2 for the tunnel, and Table 3 for the plane. The tables list the results for the NDT baseline (first two rows of data) followed by the results for the ICET implementation (last two rows of data). Statistical RMSE values are reported in the tables, alongside predicted values, extracted from the diagonal of P as computed analytically by Equation (11).
As expected, ICET’s sigma-point refinement excludes measurements, and when all measurements in a given coordinate direction are lost, the condition-number test excludes those axes from the solution. This result can be clearly seen from the tables. For cases in which the condition-number test removes dimensions from state estimation, those dimensions are labeled in the table as “ Marked DNU,” where DNU stands for “Do Not Use.” Table 2 indicates that ICET correctly excluded the y component in the tunnel (for all Monte Carlo trials), while NDT confidently reported incorrect estimates. Similarly, in the open-field scene, ICET successfully excluded all x, y, and ψ solution components.
The standard deviations for scan-matching error that are reported in the tables are favorable (sub-millimeter translation errors and millidegree rotation errors). Notably, these error levels are consistent with results reported in prior NDT simulation studies, particularly those reported by Stoyanov et al. (2012). Low scan-match errors are critical for lidar odometry applications, where the standalone dead-reckoning error is an accumulation of the scan-match errors over time.
Importantly, the tables clearly show that ICET can reliably predict accuracy in these scenes and that the prediction is subtly improved by the sigma-point refinement step. With no refinement, P consistently underpredicts the error. Consider the T-intersection data in Table 1, where, for each state, the predicted standard deviation is lower than the actual standard deviation for the NDT case, with a percent difference between –2% (for ψ) and –10% (for θ) except for one extreme case with a relative error of –93% (for y). By comparison, the predictions for ICET are not consistently above or below the actual values, underpredicting φ and z by −3.9% and −4.6% and slightly overpredicting for the other cases, up to +4.2% (for y).
Although the effect of the sigma-point refinement is generally subtle for the T-intersection, the difference in the y (along-track) direction is truly striking. Table 1 shows that the NDT case (with no extended surface suppression) underpredicts the y standard deviation by a factor of nearly 2. The issue here is that the majority of voxels contain walls aligned with the along-track direction. Although the measurements in these directions have a large sample covariance (driven by voxel width), these large-covariance voxels are not significantly de-weighted by W, because there are very few voxels with “good” information in the along-track direction. As a result, NDT becomes overconfident in its along-track estimate. By comparison, the sigma-point refinement entirely removes along-wall measurements; thus, they are not even considered in the weighting matrix or the computation, resulting both in lower error and a better prediction of that error.
The effect of sigma-point refinement is even more pronounced in the straight-tunnel and open-field scenes, where strong ambiguities are present. In these scenes, the full ICET algorithm reports that the solution quality is poor for regions in which ambiguity exists (as denoted by the “Excluded” cells in the tables). Without the refinement, NDT produces a somewhat arbitrary error that is highly correlated to the initial condition. Recall that the initial-condition error was 12.5 cm in translation and 1.7° in rotation, which is reflected by the starred values in the tables (10.4 cm in translation and 1.7° in rotation). In these cases, the sigma-point refinement plays a critical role by recognizing and reporting ambiguity. As a side effect, the sigma-point refinement also converges more quickly (within five iterations for the open-field case for full ICET as opposed to 12 iterations for NDT).
Sigma-point refinement both reduces estimation error and improves the ability of the algorithm to predict the accuracy of its solution. The degree to which the true and predicted accuracies of the algorithm are improved is a function of the scene geometry and voxel size. For instance, in the T-intersection case, a sufficient number of distributions are artificially limited by voxel width in the y direction such that the sigma-point refinement greatly impacts both the actual and predicted error in the y direction. In the same scene, however, the accuracy and accuracy prediction in the z direction are similar regardless of whether the sigma-point refinement is employed.
It is worth noting that our processing runs in near real time. On a conventional desktop with a graphics processing unit (GPU; RTX 3090), we ran ICET on the high-resolution Leddertech experimental data at a rate of 5 scans/s. Our implementation ran with a single thread, and although we did not implement a parallelized solution, it should, in concept, be possible to run ICET on multiple parallel processors to match the data acquisition rate of 10 or 20 scans/s, because the block-diagonal structure of the key equations allows processing to be easily decomposed (McDermott & Rife, 2022a).
If 5 scans/s of high-resolution (Leddertech) data can be processed as a single threaded process (on a conventional laptop with a GPU), then ICET can essentially run in real time. To reach the higher scan rates possible with many lidar units (e.g., 10–20 scans/s), more processors and parallelization would be required. This has not yet been tested but should be relatively straightforward to implement because the block-diagonal structure of the key equation, as discussed by McDermott & Rife (2022a), allows the processing to be decoupled across many processors.
7 CONCLUSION
A topic for future work is to identify and address large-scale ambiguities that may appear in lidar scenes. Although our proposed strategy is capable of identifying local structure and global ambiguity that results in a rank deficiency, other types of ambiguity are not directly addressed by ICET. Most notably, ICET does not flag scenes containing a lattice of repeated features (e.g., driving past regularly spaced columns). Such a scene would result in an integer ambiguity problem, much like the problem of resolving wavelength in carrier-phase processing for GNSS (Enge, 1994). Ambiguity may also be masked by the WLS linearization. For instance, if a lidar is mounted on a vehicle moving along a curved tunnel, the along-track translation and yaw rotation should be coupled, and an ambiguous combination of states should be identified by the condition-number test; however, the condition-number test may not recognize this ambiguity if the curve is sufficiently sharp. New algorithm extensions are needed to identify such challenging cases of global ambiguity.
The main result of this paper was the introduction of ICET, an algorithm for scan matching with 3D lidar point-cloud data. As a lidar processing algorithm, ICET is similar to the well-known NDT algorithm, with the additional capability to recognize differences between randomness and deterministic structure in voxel point distributions and to leverage that information to construct a more representative model of the solution-error covariance. ICET is formulated via a least-squares method, supplemented by a sigma-point test to identify point distributions that are dominated by deterministic structure rather than measurement randomness. The 3D ICET algorithm was applied to real data (to demonstrate proof-of-concept), high-fidelity simulations (to study error properties with perfect ground truth), and corner-case simulations (to demonstrate the significance of dimension reduction for the solution using a condition-number test).
HOW TO CITE THIS ARTICLE
McDermott, M., & Rife, J. (2024). ICET online accuracy characterization for geometry-based laser scan matching. NAVIGATION, 71(2). https://doi.org/10.33012/navi.647
CONFLICT OF INTEREST
The authors declare no potential conflicts of interest.
ACKNOWLEDGMENTS
The authors wish to acknowledge and thank the U.S. Department of Transportation (DOT) Joint Program Office and the Office of the Assistant Secretary for Research and Technology for sponsorship of this work. We also gratefully acknowledge the National Science Foundation (NSF; grant CNS-1836942), which supported specific aspects of this research. Opinions discussed herein are those of the authors and do not necessarily represent those of the DOT, NSF, or other affiliated agencies.
A APPENDIX: PARAMETRIC STUDY OF SPHERICAL VOXEL RESOLUTION
As noted in Section 6, voxel size is a key parameter, as the sigma-point algorithm will not function correctly if voxels are too small to capture the random error distribution or too large to effectively exclude extended objects. Given the importance of voxel size to our concept, this Appendix presents a sensitivity study over a range of voxel sizes.
In this parameter-sensitivity study, we use the CODD data set to characterize error performance as a function of voxel size. The voxel grid is a one-cell-deep spherical grid, where the radial boundaries of each cell are adapted to the point distribution. The number of voxels is governed by the angle subtended by each voxel. In this sensitivity study, we consider angles between 2.7° and 10°. Solution-error values were computed over the full trajectory studied in Section 6, and mean-absolute error values were tabulated as a function of resolution. Figure 11 plots error values (mm) as a function of resolution (degrees of voxel width). The figure demonstrates that the effect of changing the resolution is relatively small over a wide range of angles. Between 3.5° and 7°, the curve is nearly flat, with mild curvature resulting in a minimum error for a resolution of 4°.
Although performance is a very weak function of voxel width over a wide range of resolutions, the figure shows a dramatic loss of performance at the extremes, when the voxel width decreases below 3° or increases above 9°. When the voxel resolution grows larger than 9°, presumably the cells are so coarse that they fit distributions to multiple surfaces occupying the same voxel, thereby treating deterministic scatter in data points as random noise. When the voxel resolution is set to less than 3.5°, voxels are less effective in capturing the extent of randomness and, perhaps more importantly, become data-starved because they cannot capture points from more than one scan line at a time. Thus, we would expect less sensitivity on the left side of the plot if we were to use a lidar sensor with more beams to provide higher resolution across elevation angles.
B APPENDIX: CONSTRUCTING THE VOXEL PROJECTION MATRIX
The projection matrix L eliminates the solution in extended-axis directions. In matrix terms, an axis is pruned by multiplying the measurement (rotated to align with principal-axis coordinates) by an identity matrix and then selectively removing the rows from the identity matrix corresponding to the pruned axis. This process is illustrated in Figure 12. The figure illustrates three point distributions: a compact point distribution (case I), a lamppost-like cylindrical point distribution (case II), and a point distribution acquired from a large flat wall (case III).
For each case, the objects and simulated lidar points are shown in the top row of the figure. At the voxel level, each point distribution is described by a covariance ellipse centered at the voxel mean, as shown in the middle row of the figure. The corresponding L matrix is shown in the last row. The compact point distribution (case I) has no extended axes; thus, the corresponding L matrix is the identity matrix. The cylindrical point distribution (case II) is pruned in the vertical direction; hence, the L matrix is truncated to remove the corresponding direction (third eigen-coordinate). The extended wall (case III) lies outside the voxel along two principal-axis directions, and consequently, those directions are removed from the corresponding L matrix, leaving only one remaining direction (the first eigen-coordinate).
Footnotes
↵1 On non-flat surfaces, the width of the lidar beam can result in multiple returns or multipath (Bhandari et al., 2014). For example, the Velodyne VLP-16 unit has a beam width of 8 cm at a range of 25 m (Velodyne LIDAR, 2019). This Velodyne unit can be configured to capture either the first return or the strongest return. When viewing a corner or rough surface, toggling this setting adjusts the expected value of the range measurement.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.