ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching

Distribution-to-Distribution (D2D) point cloud registration algorithms are fast, interpretable, and perform well in unstructured environments. Unfortunately, existing strategies for predicting solution error for these methods are overly optimistic, particularly in regions containing large or extended physical objects. In this paper we introduce the Iterative Closest Ellipsoidal Transform (ICET), a novel 3D LIDAR scan-matching algorithm that re-envisions NDT in order to provide robust accuracy prediction from first principles. Like NDT, ICET subdivides a LIDAR scan into voxels in order to analyze complex scenes by considering many smaller local point distributions; however, ICET assesses the voxel distribution to distinguish random noise from deterministic structure. ICET then uses a weighted least-squares formulation to incorporate this noise/structure distinction into computing a localization solution and predicting the solution-error covariance. In order to demonstrate the reasonableness of our accuracy predictions, we verify 3D ICET in three LIDAR tests involving real-world automotive data, high-fidelity simulated trajectories, and simulated corner-case scenes. For each test, ICET consistently performs scan matching with sub-centimeter accuracy. This level of accuracy, combined with the fact that the algorithm is fully interpretable, makes it well suited for safety-critical transportation applications. Code is available at https://github.com/mcdermatt/ICET


INTRODUCTION
Scan matching is the task of finding the rigid transformation that best aligns two point clouds. Scan matching can be used to gain insight into the motion of a vehicle and to build HD maps of an environment. In this paper we introduce the Iterative Closest Ellipsoidal Transform (ICET), a novel algorithm that is capable of robust point-cloud registration and accurate prediction of its own solution error as a function of scene geometry. Accurately modeling solution accuracy is useful for sensor fusion, and necessary for safety-of-life navigation tasks. Additionally, ICET can identify degenerate cases where there is insufficient information to constrain one or more components of its solution, and it suppresses the corresponding output without adversely impacting solution accuracy about other axes.
At its core, ICET uses a weighted least-squares approach for scan matching and accuracy prediction. Like the popular 3D-NDT D2D (Stoyanov et al., 2012) algorithm, which obtains its solution through gradient-descent optimization, our approach also uses a voxel grid to subdivide a full LIDAR scan into smaller point clusters, which can be aligned between sequential scans to infer relative motion. (Abbreviations: D2D, Distribution-to-Distribution; ICP, Iterative Closest Point; DNN, Deep Neural Network.)
An important difference is that our approach further analyzes the point distribution within each voxel to distinguish between noise and deterministic structure. ICET addresses this noise/structure distinction and thereby excludes ambiguous measurements and generates a better approximation of the measurement-noise covariance matrix. The measurement-noise covariance is then used in two ways: as a weighting matrix to enhance solution accuracy and as an input for predicting the solution-error covariance.
The key contribution of the algorithm is combining noise/structure differentiation with covariance prediction in voxel-based LIDAR scan matching. We believe this characteristic of ICET is new in the research literature, although elements have been presented in the past. For instance, though solution-error covariance is not generally predicted in most NDT implementations, an approach has been presented previously by Stoyanov et al. (2012), who extended Censi's method (Censi, 2007) to the specific case of an NDT cost function. However, this approach was applied to voxel measurement distributions with no consideration for the differences between random measurement noise and the systematic spread of points associated with object shape. Conversely, the NDT variant implemented by Einhorn et al. seeks increased accuracy by suppressing measurements except in the voxel distribution's most compact principal-axis direction (Einhorn & Gross, 2015); however, their approach does not tie ambiguity mitigation to covariance estimation. In a similar vein, DMLO combines geometric methods with an Artificial Neural Network (ANN) to account for distribution shape, but not in the context of predicting solution-error covariance (Z. Li & Wang, 2020). Putting the noise/structure distinction in the context of covariance prediction is particularly important for driving applications, where tunnels and highways with long flat walls can introduce ambiguities for voxel-based scan-matching algorithms and cause bad localization performance (Wen et al., 2018; Zhen & Scherer, 2019).
The cornerstone of the ICET implementation is its use of a standard least-squares methodology for deriving motion states from LIDAR data. Least-squares implementations of NDT (Takeuchi & Tsubouchi, 2006) are significantly simpler than traditional Point-to-Distribution (P2D) implementations of NDT, which obtain solutions through gradient-descent optimization. Least-squares techniques (Simon, 2006) are widely known, easy to implement, and computationally efficient. Importantly, the relationship between the input measurement-noise covariance and the output solution-error covariance is well understood for least-squares methods. This relationship relies on the assumption that the noise inputs to least-squares processing are strictly random in nature, as is assumed in traditional NDT (Biber & Straßer, 2003; Stoyanov et al., 2012). In reality, the spread of points within each voxel is due to a combination of random noise and deterministic structure. The key point of this paper is that the latter must be suppressed in order to achieve robust error characterization.
The interpretable nature of the ICET algorithm and its enhanced noise modeling make it particularly well suited for safety-of-life applications like automated driving. Developing a rigorous safety case usually requires the use of transparent algorithms, to identify and bound the risks of rare but potentially hazardous situations, which represent a risk to navigation integrity (Bhamidipati & Gao, 2022; Zhu et al., 2022). By comparison, methods that rely on machine learning, like LO-Net (Q. Li et al., 2019) and VertexNet (Cho et al., 2020), are not currently interpretable, which makes them difficult to characterize through a rigorous safety case. With regard to safety certification, voxel-based methods like NDT and ICET are also advantageous in that they do not introduce significant risks associated with data association (or correspondence) between features drawn from different LIDAR scans. As pointed out by Hassani et al. (2018), data association issues make the safety case very challenging when matching landmarks, features, or points from LIDAR scans, as in the broad class of Iterative Closest Point (ICP) algorithms such as Generalized-ICP (Segal et al., 2009), LOAM (Zhang & Singh, 2014), and I-LOAM (Park et al., 2020).
The remainder of this paper introduces a 3D formulation of ICET and verifies it through simulation. The next section, Section 2, provides additional background related to noise modeling for voxel-level LIDAR point distributions. Section 3 provides the key equations for the 3D formulation of ICET. Section 4 evaluates the performance of ICET against NDT on a real LIDAR dataset. This analysis provides a proof-of-concept that ICET processes real-world data effectively; however, the GPS reference data (cm-level errors between epochs) is significantly less accurate than the LIDAR (mm-level errors between scans). As such, Section 5 performs a similar test using simulated LIDAR, where perfect ground truth is available. Section 6 contains a final corner-case simulation to show how ICET identifies rare instances where there is insufficient information to constrain solutions about all axes (as in the case of traveling through a straight tunnel); in these cases, ICET provides as complete a solution as possible given detected ambiguity. Finally, Section 7 discusses limitations of ICET and possible future enhancements.

DEFINING THE PROBLEM
This section visualizes the fundamental problem that motivates the ICET algorithm: separating deterministic object shape from random measurement noise, effects that together determine the distribution of LIDAR points across a voxel.
In modeling voxel-level noise, one important consideration is intrinsic measurement noise, caused by the sensor itself. For example, errors are introduced by range-measurement circuitry and (for rotating units) by encoders that measure LIDAR orientation. Noise levels (e.g., one-sigma error) associated with intrinsic noise can be characterized in device specifications published by the manufacturer. As a visualization of intrinsic noise, consider the LIDAR points (red dots) in Figure 1A. The surface is completely smooth, but LIDAR samples are scattered about the surface due to small errors in measuring time of flight.
In modeling voxel-level noise, a second important consideration is discrete sampling on rough surfaces. Local surface texture on the scale of the spatial resolution of the LIDAR results in measurement variations amplified by multipath.¹ Surface roughness results in additional apparent noise because the sampled locations change continuously due to LIDAR motion. Because these effects are highly dependent on the nature of the surface creating the LIDAR reflections, they are difficult to characterize in a general way, so they are not characterized in the system specification. In Figure 1B, a rock formation is observed with significant surface roughness, which amplifies the effective noise above the levels observed in Figure 1A.
In addition to these random effects, the physical shape of the reflecting surface also influences the LIDAR point distribution. When viewing small objects, as in Figures 1A and 1B, the distinction between random noise and deterministic object structure is not significant; however, when the object becomes wide, as in 1C, then the object shape becomes significant, and the distinction between random noise and deterministic structure cannot be ignored.
The combination of random and deterministic factors makes it somewhat difficult to model measurement noise. The most straightforward approach is to use manufacturer specifications, but these underestimate the actual measurement covariance (Glennie & Lichti, 2010), presumably because they reflect only intrinsic noise (Figure 1A) and not discrete sampling effects on rough surfaces (Figure 1B). As an alternative, the noise covariance can be estimated statistically from the distribution of points in a voxel, as was done in Stoyanov et al. (2012). The resulting covariance captures both intrinsic noise and discrete sampling effects (as shown in Figure 1B); however, this statistical estimate also treats large-scale shape effects as noise (as visible in Figure 1C). An artificially large estimate of the noise covariance results. (The computed covariance along the extended surface of Figure 1C is essentially equal to the moment of inertia for a long rod.) A compromise solution is to model the noise covariance based on the distribution of points in the voxel, but only after excluding distribution directions that stretch across the length of the voxel. As long as the voxel is large enough to contain typical noise distributions, but not significantly larger, then the LIDAR point distribution will be dominated by random effects once extended directions are excluded. We take this approach in ICET.

FIGURE 1
Voxel point distributions featuring (A) sensor noise, (B) surface roughness, and (C) extended surfaces

¹ On non-flat surfaces, the width of the LIDAR beam can result in multiple returns, or multipath (Bhandari et al., 2014). As an example, the Velodyne VLP-16 unit has a beam width of 8 cm at a range of 25 m (Velodyne LiDAR, 2019). This Velodyne unit can be configured to capture either the first return or the strongest return. When viewing a corner or rough surface, toggling this setting adjusts the expected value of the range measurement.

IMPLEMENTATION
This section details the ICET algorithm, first covering the basic algorithm (Section 3.1), then discussing the noise-modeling process (Section 3.2), and closing with additional refinements needed to process real-world data (Section 3.3). The development is a refined and more complete version of our conference paper (McDermott & Rife, 2022a). Importantly, Section 3.2 details the paper's key contribution, which is related to suppressing extended axes of the point-cloud distribution in order to provide a more reliable estimate of measurement noise.

ICET Least-Squares Solution in 3D
ICET generates a solution vector that represents the translation and rotation of a new scan to best align with an earlier scan, assuming both scans capture roughly the same stationary (or near-stationary) objects and terrain features. The methodologies presented in this section are, for the most part, representative of those used in a wide range of existing NDT implementations.
In ICET, the solution vector x includes Cartesian translations {x, y, z} and body-fixed XYZ Euler angles {φ, θ, ψ} needed to align the scans.

x = [x, y, z, φ, θ, ψ]^T (1)

Combining the rotation and translation, it is possible to map the position vector y^(i) for each point i in the new scan into the coordinate system of the earlier (or reference) scan. The transformed point positions, labeled ỹ^(i), are as follows:

ỹ^(i) = R y^(i) + [x, y, z]^T (2)

In this equation, the matrix R is the 3D rotation matrix constructed from the roll, pitch, and yaw angles {φ, θ, ψ}.
Points are clustered using an arbitrarily shaped grid of voxels, centered on the location of the LIDAR. The grid is used because it mitigates data-association fragility between scans, a limitation of early scan-matching algorithms such as the Iterative Closest Point algorithm (Besl & McKay, 1992; Chen & Medioni, 1991). By assigning points to voxels, it is possible to compute a mean position ȳ^(j) ∈ ℝ^(3×1) for the new-scan points and a mean position ȳ₀^(j) ∈ ℝ^(3×1) for the reference-scan points in each voxel j. Similarly, covariance matrices Q^(j) and Q₀^(j) ∈ ℝ^(3×3) can be constructed for each voxel. Note that the 0-subscript is used to identify reference-scan variables and distinguish them from new-scan variables. Once the means ȳ₀^(j) for each voxel in the reference scan are computed, they are concatenated into one large observation vector y₀ ∈ ℝ^(3J×1); the states are then tuned so that the observation vector y₀ is approximated as closely as possible by a similar concatenated vector constructed from the new scan, y = h(x) ∈ ℝ^(3J×1), where the h(x) function is the vertical concatenation over the J voxels of the subvectors h^(j)(x) ∈ ℝ^(3×1), the new-scan voxel means computed after the new scan is transformed by (2).
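The per-voxel statistics described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the repository's actual API; the function name `voxel_statistics`, its arguments, and the 50-point minimum (matching the threshold stated later in this paper) are assumptions.

```python
import numpy as np

def voxel_statistics(points, voxel_ids, min_points=50):
    """Compute the per-voxel mean and sample covariance of LIDAR points.

    points:    (N, 3) array of point positions.
    voxel_ids: (N,) integer voxel assignment for each point.
    Voxels with fewer than min_points points are skipped.
    Returns {voxel_id: (n_points, mean, covariance)}.
    """
    stats = {}
    for j in np.unique(voxel_ids):
        pts = points[voxel_ids == j]
        if len(pts) < min_points:
            continue
        mean = pts.mean(axis=0)          # ybar^(j), 3-vector
        cov = np.cov(pts, rowvar=False)  # Q^(j), 3x3 sample covariance
        stats[j] = (len(pts), mean, cov)
    return stats
```

The same routine would be applied to both the reference scan (yielding ȳ₀^(j), Q₀^(j)) and the transformed new scan (yielding ȳ^(j), Q^(j)).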
ICET uses an iterative least-squares approach to estimate the state as x̂, leveraging the Taylor series expansion

y₀ = h(x̂ + Δx) ≈ h(x̂) + H Δx (3)

where H = ∂h/∂x is evaluated at the current estimate x̂. Starting with an initial guess of x̂, the above equation is solved assuming second- and higher-order terms are small, to obtain the state correction Δx from the linearized residual equation

Δy = y₀ − h(x̂) ≈ H Δx. (4)

This Newton-Raphson operation is repeated to a reasonable convergence criterion, to obtain a final least-squares state estimate x̂. Though the Newton-Raphson method does not guarantee global convergence, this solution approach and related gradient-descent algorithms are widely used in LIDAR scan matching, and countless implementations demonstrate their reliable convergence in practice. Solving (4) requires that the Jacobian be evaluated at each iteration. In 3D, the Jacobian matrix H ∈ ℝ^(3J×6) can be constructed as a vertical concatenation of submatrices H^(j) ∈ ℝ^(3×6), defined as follows for each voxel j:

H^(j) = [ I₃  q_φ^(j)  q_θ^(j)  q_ψ^(j) ] (5)
Here the vectors q^(j) are defined as q_φ^(j) = (∂R/∂φ) ȳ^(j), q_θ^(j) = (∂R/∂θ) ȳ^(j), and q_ψ^(j) = (∂R/∂ψ) ȳ^(j), where ȳ^(j) are the new-scan voxel-mean vectors in their original coordinate system (i.e., the mean of the y^(i) for all points inside voxel j), and where the rotation-matrix derivatives ∂R/∂φ, ∂R/∂θ, and ∂R/∂ψ are the matrices in ℝ^(3×3) obtained by differentiating R element-wise with respect to each Euler angle. Similar rotation-matrix derivatives are used widely in scan matching, for example by Hong & Lee (2017).
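As a concrete illustration of these derivative matrices, the sketch below builds R for one common body-fixed XYZ convention (the paper's exact convention and composition order may differ) and forms ∂R/∂φ, ∂R/∂θ, ∂R/∂ψ by central differences, a convenient stand-in for the closed-form matrices:

```python
import numpy as np

def rotation_xyz(phi, theta, psi):
    """Rotation matrix for body-fixed XYZ Euler angles (one common
    convention, assumed here for illustration)."""
    cx, sx = np.cos(phi), np.sin(phi)
    cy, sy = np.cos(theta), np.sin(theta)
    cz, sz = np.cos(psi), np.sin(psi)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def rotation_derivatives(phi, theta, psi, eps=1e-6):
    """Numerical dR/d(angle) for each Euler angle via central
    differences; closed-form expressions would be used in practice."""
    angles = np.array([phi, theta, psi])
    derivs = []
    for k in range(3):
        hi = angles.copy(); hi[k] += eps
        lo = angles.copy(); lo[k] -= eps
        derivs.append((rotation_xyz(*hi) - rotation_xyz(*lo)) / (2 * eps))
    return derivs  # [dR/dphi, dR/dtheta, dR/dpsi]
```

Each q-vector in (5) is then simply `rotation_derivatives(...)[k] @ y_mean` for the corresponding angle.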
In solving (4) we use a weighted least-squares (WLS) formulation with the weighting matrix W constructed from the voxel-mean covariance matrices. That is, W = Σ⁻¹, where Σ is block diagonal, with each block covariance Σ^(j) corresponding to one voxel j. The block covariances reflect the uncertainty in the difference between the voxel means.
Once Σ is estimated and inverted to obtain W, the standard WLS solution can be computed iteratively with the following equation (Simon, 2006):

Δx = (Hᵀ W H)⁻¹ Hᵀ W Δy (10)
After each iteration of (10), the state estimate is updated (x̂ → x̂ + Δx) until a reasonable convergence criterion is satisfied.
The WLS formulation employed by ICET makes accuracy prediction straightforward in a computational sense. Applying the standard WLS formulation (Simon, 2006), the state-error covariance is

P = (Hᵀ W H)⁻¹. (11)

An important detail in our implementation of ICET is that we use computationally efficient forms to obtain the solution (10) and the covariance (11). Given the block-diagonal structure of the variables, it is possible to frame (10) in an equivalent but more computationally efficient manner, by eliminating many multiplications by zero. The equivalent form recognizes that

Hᵀ W H = ∑_j H^(j)ᵀ W^(j) H^(j) (12)

and

Hᵀ W Δy = ∑_j H^(j)ᵀ W^(j) Δy^(j), (13)

where W^(j) is the weighting block (the inverse measurement covariance) and Δy^(j) the mean residual for voxel j. In addition to speeding computation, the evaluation of (10) using (12) and (13) is also less memory intensive, because sequentially processing small ℝ^(6×6) blocks avoids the construction of the arbitrarily large matrices H and W. In the process, the value of the covariance matrix (11) is automatically obtained from (12). Accordingly, the ICET covariance estimate is significantly easier to compute than the NDT covariance estimate derived in Stoyanov et al. (2012).
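The block-wise accumulation of (12) and (13) can be sketched as follows. The function name and interface are illustrative, not the repository's actual code; it assumes the per-voxel Jacobians, weights, and residuals have already been formed.

```python
import numpy as np

def wls_update(H_blocks, W_blocks, dy_blocks):
    """One WLS / Newton-Raphson step accumulated voxel-by-voxel.

    Computes dx = (H^T W H)^-1 H^T W dy without forming the large
    stacked H and block-diagonal W; also returns P = (H^T W H)^-1,
    the predicted solution-error covariance.
    """
    A = np.zeros((6, 6))   # running sum of H^(j)T W^(j) H^(j)
    b = np.zeros(6)        # running sum of H^(j)T W^(j) dy^(j)
    for Hj, Wj, dyj in zip(H_blocks, W_blocks, dy_blocks):
        A += Hj.T @ Wj @ Hj
        b += Hj.T @ Wj @ dyj
    P = np.linalg.inv(A)   # covariance (11) comes for free from (12)
    return P @ b, P
```

Because only a 6×6 accumulator is kept, memory use is independent of the number of voxels, matching the motivation stated above.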

Refinement of Measurement Covariance
Though computationally straightforward to evaluate, the solution covariance (11) is only as reliable as the estimated measurement covariances Σ^(j). In order to obtain a good estimate of each Σ^(j), it is important to think carefully about the random and deterministic factors that distribute LIDAR samples within a voxel, as discussed in Section 2. In this section we present our main contribution, the algorithmic step that identifies and excludes ambiguous measurements from extended objects, such as the example object shown in Figure 1C.
In ICET, our approach for modeling measurement noise within each voxel is to start with the sample covariance, and then to exclude directions where the point distribution is dominated by deterministic structure. This approach allows us to capture random effects (measurement noise and discrete sampling) without a priori modeling. Residual deterministic effects remain (e.g., the structural distributions for small objects), and these add some amount of additional conservatism; however, ambiguous information associated with large objects is removed, which greatly improves statistical estimation of Σ^(j).
Our implementation, like that of Stoyanov et al. (2012), starts by computing the sample covariance of point locations within each voxel. We label the sample covariance Q^(j) in the new scan and Q₀^(j) in the reference scan. Our approach begins to diverge from Stoyanov et al. (2012) in that ICET uses the mean as an observable for each voxel; as such, the point-location covariance must be converted to the covariance of the mean. For a random set of independent samples, the central limit theorem (NIST, 2006) implies that the covariance of the sample mean vector is the true covariance divided by the number of samples, which is N^(j) for the new scan and N₀^(j) for the reference scan. Since the actual observable is the mean difference between the new and reference scans, as indicated by (9) and (10), the contributions of both scans must be considered. By extension, assuming independent samples within each scan and between scans, the measurement covariance Σ^(j) for each voxel j is

Σ^(j) = Q^(j)/N^(j) + Q₀^(j)/N₀^(j). (14)

A minor subtlety is that the above formula substitutes the sample covariance for the true covariance (which is not known); the distinction is not significant as long as the number of samples in each voxel is sufficiently high. To this end we introduce a minimum sample count (50 points per voxel in this paper) below which the voxel is excluded from computation.

Our next step is to exclude directions from the computation where the sample variance is unrealistically high. This step further distinguishes ICET from prior work that has attempted to characterize state error in voxel-based scan matching. Specifically, ICET removes extended distribution directions using an eigentransformation of the point-location covariance matrix for each voxel j. The eigentransformation generates a unitary eigenvector matrix U^(j) ∈ ℝ^(3×3) and a diagonal eigenvalue matrix Λ^(j) ∈ ℝ^(3×3) for each voxel:

Q₀^(j) = U^(j) Λ^(j) U^(j)ᵀ (15)
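The two steps above, scaling by sample counts per (14) and the eigentransformation of (15), can be sketched directly. This is an illustrative NumPy sketch; the function names are assumptions.

```python
import numpy as np

def mean_difference_covariance(Q_new, n_new, Q_ref, n_ref):
    """Covariance of the voxel-mean difference, per (14): each scan
    contributes its point-location sample covariance scaled by 1/N
    (central limit theorem), and the two scans are assumed
    independent."""
    return Q_new / n_new + Q_ref / n_ref

def principal_axes(Q_ref):
    """Eigentransformation of the reference-scan point covariance,
    per (15): columns of U are principal-axis directions, and the
    square roots of the eigenvalues are one-sigma axis lengths."""
    eigvals, U = np.linalg.eigh(Q_ref)  # eigh: symmetric matrix
    return U, eigvals
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, which is convenient when the smallest (most compact) directions are of interest.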
The eigenvectors describe the principal-axis directions, whereas the eigenvalues describe the principal-axis lengths (squared) for the covariance ellipsoid. Directions where the distribution of points is dominated by deterministic structure are identified as directions where the eigenvalue is large. Note that the eigentransformation is applied to Q₀^(j), which describes the reference scan. Deterministic structure is embedded not only in Q₀^(j), of course, but also in the new-scan statistics ȳ^(j) and Q^(j). The reason to focus on Q₀^(j) is computational efficiency: the voxel boundaries are defined for the reference scan, so the LIDAR points are consistently assigned to the same voxel in the reference scan throughout the WLS iterations of (10) to convergence. By comparison, each iteration re-applies (2) to transform the new scan differently onto the voxel grid, so point associations for the new scan change after each WLS iteration. By focusing on the reference scan, the eigentransformation is computed infrequently, only when the reference scan is updated.
The next step is to define the eigenvalue cutoff beyond which corresponding eigenvector directions are excluded from the solution. In this regard, our basic premise is that some inflation of the sample covariance due to deterministic structure is acceptable, but that a significant problem can result if the point distribution appears to extend across the voxel, exiting the voxel through two of its boundaries. Voxels are typically configured to be wide enough (20 cm or wider across) to contain essentially all random noise; as such, voxel width provides a coarse but effective reasonableness check to detect large distribution widths caused by physical structure. This concept is illustrated by Figure 2, which depicts a birds-eye view of a wall (gray rectangle) and the distribution of LIDAR returns (blue dots) generated by the wall. The figure depicts two rounded voxels, which are characteristic of the voxel shapes used in ICET (see next section). A covariance estimate can be constructed from the LIDAR points in each voxel. Notably, the covariance ellipse extends entirely across each voxel, from one voxel face to another.
To a human observer, it is evident from the figure that the distribution of LIDAR points along the wall is dominated by deterministic object structure and not random effects; however, the statistical estimation process of (14) is blind to the distinction and infers incorrectly that the distribution of points along the wall surface is random. This inference is inherently problematic because the voxel mean remains centered along the segment passing through the voxel even for small perturbations of the LIDAR unit's location, which means that the mean conveys no useful information for localization in the along-wall direction. Using an along-wall measurement in the least-squares solution (10) injects error but no signal. In this sense, both voxels shown in the figure are problematic unless the along-wall measurement is suppressed; the effect is even more severe for the right-hand voxel than the left, because the distribution for the right-hand voxel appears narrower in the along-wall direction, implying a lower variance and, accordingly, a higher weighting in the solution. This ambiguity issue is a specific example of the aperture problem (Shimojo et al., 1989), which is well known in the computer vision literature, but which had not been discussed in the LIDAR literature prior to the work of McDermott & Rife (2022a).
Introducing a cutoff that limits the magnitude of the eigenvalues of Q₀^(j) both caps unrealistic distributions and addresses the ambiguity introduced when relatively flat objects stretch across an entire voxel. We do not recommend a fixed threshold cutoff, because the amount of a large object contained inside a voxel depends on its orientation within the voxel, as shown in Figure 2. Rather, in this paper, we recommend that the cutoff be framed in terms of the relationship between the point distribution and the voxel boundaries. Namely, we propose that if the point distribution along any principal-axis direction extends past two voxel boundaries, then the associated principal-axis direction should be excluded from the solution.
Our implementation of this cutoff is inspired by sigma-point filters, also known as Unscented Kalman Filters or UKFs (Julier & Uhlmann, 2004; Roth et al., 2016). They model a covariance matrix by placing two points along each principal axis (or eigenvector direction), spaced symmetrically around the mean. In our case, we represent the covariance matrix with six test points (two for each direction in 3D space), placed along each eigenvector axis at a distance of 2σ on either side of the mean, where σ is used here to refer to the square root of the eigenvalue associated with the eigenvector. If both sigma points for any given eigenvector lie outside the voxel, then ICET suppresses measurement information in that direction. Axes with only a single test point outside the voxel boundary are included in the solution, as they still contribute to localization in a meaningful way. The sigma-point test is illustrated in Figure 3.
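The sigma-point test reduces to a few lines of code. In this sketch, the voxel membership test `inside_voxel` is supplied by the caller (for ICET's rounded spherical voxels it would check azimuth, elevation, and range bounds); the function name and interface are illustrative.

```python
import numpy as np

def extended_axes(mean, U, eigvals, inside_voxel):
    """Sigma-point test: for each principal axis, place test points at
    mean +/- 2*sigma along the eigenvector, where sigma is the square
    root of the eigenvalue. If BOTH test points fall outside the
    voxel, the axis is flagged as extended (deterministic structure)
    and should be suppressed. Axes with one or zero test points
    outside are retained."""
    flags = []
    for k in range(3):
        sigma = np.sqrt(eigvals[k])
        p_plus = mean + 2.0 * sigma * U[:, k]
        p_minus = mean - 2.0 * sigma * U[:, k]
        flags.append(not inside_voxel(p_plus) and not inside_voxel(p_minus))
    return flags  # True => suppress this axis
```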
Figure 3 illustrates the pruning process. The figure shows only the sample covariance (blue ellipsoid) and the voxel boundaries (wireframe). The LIDAR points used to generate the covariance are not shown. The three principal axes of the ellipsoid are labeled I, II, and III, and the sigma points are indicated as dots along the length of each axis. The figure uses three orthographic views (top, front, right) and one isometric projection to show the sigma points from different angles. For principal axis I, both sigma points lie inside the voxel; for principal axis II, one sigma point lies inside and one outside; for principal axis III, both sigma points lie outside. As such, the pruning algorithm will only suppress measurements in the direction of principal axis III.
Pruning is applied before computing (10), so that eigenvector axes excluded by the test can be removed from the WLS solution. The specific process for removing extended directions is, for each voxel j, to introduce a projection matrix L^(j), which preserves only the eigenvector directions associated with compact distribution axes (those that do not extend outside the confines of their respective cell boundaries). The projection removes those directions from (4), one voxel at a time. The modified residual vector Δỹ^(j) for each voxel is

Δỹ^(j) = L^(j) U^(j)ᵀ Δy^(j). (16)

Concatenating the modified residual vectors together, the WLS solution can be computed excluding data along extended surfaces. Unlike the strategy employed by Einhorn & Gross (2015), which preserves residuals along only the most compact distribution axis, our approach has the flexibility to capture any sufficiently compact component of a solution vector. For example, a tall and narrow world feature, such as a lamppost, provides information in two directions and is ambiguous in one (along the post's vertical axis). Our sigma-point refinement would preserve two axes in the case of the lamppost. By contrast, for a flat wall, the sigma-point refinement would preserve only one direction (normal to the wall) and reject the other two. Most importantly, though, our innovation is to couple axis exclusion with covariance estimation, to enhance ICET's ability to estimate the solution covariance.
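A minimal sketch of this projection step follows. It assumes L^(j) can be realized as a diagonal 0/1 selection matrix applied after rotating the residual into the principal-axis frame; the exact form used in the ICET codebase may differ, and the function name is hypothetical.

```python
import numpy as np

def pruned_residual(y_ref_mean, y_new_mean, U, suppress):
    """Rotate the voxel-mean residual into the principal-axis frame
    and zero out the suppressed (extended) directions.

    U:        3x3 eigenvector matrix (columns = principal axes).
    suppress: length-3 list of booleans from the sigma-point test.
    """
    L = np.diag([0.0 if s else 1.0 for s in suppress])  # selection matrix
    return L @ U.T @ (y_ref_mean - y_new_mean)
```

For the lamppost example above, `suppress` would be True only for the vertical axis, so the residual retains its two horizontal components; for a flat wall, only the wall-normal component survives.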

Additional Solution Conditioning Steps
Three further considerations are relevant to enhancing ICET results: defining voxel shape, checking solution rank, and detecting moving objects.
With regard to voxel definition, the current ICET implementation uses a spherical voxel grid, with one-voxel radial depth in each azimuth and elevation direction. The grid is generated by a shadow-mitigation algorithm that we have introduced as a pre-processing step for NDT and ICET (McDermott & Rife, 2022b). An illustration of the grid is shown in Figure 4. The value of spherical grids for LIDAR processing has also been recognized in other recent publications (Hassani & Joerger, 2021). For our application, spherical coordinates are particularly advantageous in that the voxels are aligned in the same direction in which discrete sampling occurs (i.e., radially away from the sensor). This allows voxel resolution to be increased in the azimuthal and elevation directions (to decrease the impact of shape effects) while retaining the ability to stretch in the radial direction as needed to properly characterize porous substrates (like loose foliage).
With regard to solution rank, when some measurement directions are excluded, it is important to ensure that the matrix inverse in (10) can be computed. Implementing a condition-number check can ensure that the solution is full rank (in a computational sense) and that the inverse can be computed. We implement a condition-number test by performing an eigendecomposition on the inverse of (11) to obtain a unitary eigenvector matrix U₂ ∈ ℝ^(6×6) and a diagonal eigenvalue matrix Λ₂ ∈ ℝ^(6×6):

P⁻¹ = U₂ Λ₂ U₂ᵀ (17)

The condition number can be calculated as the ratio of the largest to the smallest eigenvalue in Λ₂. In our testing, we found a suitable threshold to be κ = 5 × 10⁴. If the condition number is below the threshold, a reliable solution can be generated. If the condition number is larger than the threshold, a solution can still be computed in a reduced-dimension solution space. The dimension-reduction process is to start with the smallest eigenvalue and remove the associated eigenvector if the condition number exceeds κ. The test is then repeated, removing eigenvector directions from the solution until the condition-number test is passed. This process allows for a valid solution in extreme cases when the same measurement direction is suppressed by the sigma-point test in nearly all voxels, as might happen in the along-track direction for a straight, featureless tunnel.
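The iterative dimension reduction described above can be sketched as follows; the function name and interface are illustrative, not the repository's actual code.

```python
import numpy as np

def well_conditioned_subspace(A, max_cond=5e4):
    """Iteratively drop the weakest eigenvector directions of
    A = H^T W H (the inverse of the solution covariance) until the
    remaining subspace passes the condition-number test.

    Returns the retained eigenvectors (columns), within which a
    reduced-dimension solution can be computed.
    """
    eigvals, U = np.linalg.eigh(A)  # eigenvalues in ascending order
    keep = list(range(len(eigvals)))
    while len(keep) > 1 and eigvals[keep[-1]] / eigvals[keep[0]] > max_cond:
        keep.pop(0)                 # remove smallest-eigenvalue direction
    return U[:, keep]
```

In the straight-tunnel case, the along-track direction would correspond to a near-zero eigenvalue of A and would be the first direction removed.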
With regard to moving-object detection, we use classical outlier rejection to detect and exclude voxels with large discrepancies between sequential frames, as caused by moving cars and pedestrians. This is achieved by first allowing the algorithm to converge on a preliminary solution. Once an initial registration is achieved, voxels are flagged in which the residual distance |ȳ₀^(j) − h^(j)(x̂)| exceeds a threshold T. Flagged voxels are excluded, and the algorithm is run again to produce a final estimate. We set the threshold to T = 5 cm, which is high (about 5σ) compared to typical measurement errors. The full algorithm is summarized as pseudocode in the table labeled Algorithm 1.
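The outlier-rejection pass reduces to a simple threshold test on the post-convergence voxel residuals; this sketch and its function name are illustrative.

```python
import numpy as np

def flag_moving_voxels(residuals, threshold=0.05):
    """Flag voxels whose post-convergence mean-residual norm exceeds a
    threshold (5 cm here, roughly 5-sigma relative to typical
    measurement noise per the text). Flagged voxels are excluded and
    the registration is recomputed to produce the final estimate."""
    return [bool(np.linalg.norm(r) > threshold) for r in residuals]
```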

VERIFICATION TESTING: REAL LIDAR DATASET
It is important to first demonstrate that the ICET algorithm is capable of robust point cloud registration on realistic data before analyzing ICET's estimates of solution-error covariance. In this section, we validate the performance of our algorithm on a trajectory from the LeddarTech Pixset dataset (Déziel et al., 2021), which was chosen for its high-fidelity Ouster OS1-64 LIDAR unit as well as the inclusion of a GNSS/INS fused baseline. The selected trajectory contains 276 frames of movement through an urban environment pictured in Figure 5.
For this test, vehicle motion was obtained by estimating the transformations between every sequential pair of frames. No loop closure or graph optimization was implemented for these tests. The resulting trajectory is visualized in Figure 6. On the left side of the figure, the trajectory generated by ICET odometry is plotted as a white-dotted curve superposed on an HD map. This map was constructed using ICET odometry estimates, transforming each successive LIDAR scan by the transformation output at a given step to form a single large point cloud. The sharp resolution of the map is a strong qualitative indicator of the quality of the ICET odometry. The right side of the figure shows the ICET trajectory on North-East axes, along with GPS/INS data and with the trajectory generated by NDT odometry. There is a small but visible difference between the odometry estimates of NDT and ICET. The gap is slightly larger between the LIDAR odometry methods and the GPS data, especially toward the end of the trajectory (upper-right corner). Such divergence over time is expected when comparing dead-reckoning solutions (LIDAR odometry) to absolute positioning (GPS).
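The HD-map construction described above, in which each successive scan is placed in a common frame by composing the per-step transformation outputs, can be sketched as follows. This is an illustrative numpy sketch assuming each transform is a rotation/translation pair mapping frame k+1 into frame k; it is not the authors' implementation:

```python
import numpy as np

def stitch_scans(scans, transforms):
    """Accumulate per-frame rigid transforms (R, t) into a single map
    frame and merge all scans into one large point cloud. `scans` is a
    list of (N, 3) arrays; `transforms[k]` maps frame k+1 into frame k.
    Illustrative sketch only."""
    R_acc, t_acc = np.eye(3), np.zeros(3)
    map_points = [scans[0]]  # frame 0 defines the map frame
    for scan, (R, t) in zip(scans[1:], transforms):
        # Compose: map frame <- frame k <- frame k+1
        t_acc = R_acc @ t + t_acc
        R_acc = R_acc @ R
        map_points.append(scan @ R_acc.T + t_acc)
    return np.vstack(map_points)
```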
In this paper, our primary goal is to assess the quality of the ICET covariance estimate between two sequential frames, as generated by (11) and conditioned by the sigma-point test. Unfortunately, real-world data are not an effective tool for assessing the quality of our LIDAR covariance predictions. The problem is that the GPS/INS data are not sufficient as ground truth when comparing sequential LIDAR frames. The GPS-PPP system produces cm-level localization errors when differencing any two points in time. These errors are insignificant when comparing GPS to dead-reckoning over an extended duration (e.g., over the whole 276-frame trajectory shown in Figure 6); however, for sequential frames, the LIDAR odometry exhibits only mm-level localization errors, an order of magnitude smaller than those for GPS. Because the error in the GPS baseline is too high to provide an effective ground truth when estimating scan-matching error covariance for sequential frames, the next section explores an alternative methodology, making use of simulated test environments in which we have access to perfect ground truth values.

VERIFICATION TESTING: HIGH FIDELITY SIMULATED DATASET
For this test, our goal is to assess scan-matching error for each sequential transformation. We opt to use simulated LIDAR point clouds, which provide direct truth data to which ICET-derived position and attitude changes can be compared. Importantly, as other authors have demonstrated (Stoyanov et al., 2012; Wu et al., 2018), performance of scan registration algorithms on sufficiently realistic synthetic LIDAR point clouds is consistent with performance on similar real-world data, particularly in the case of distributions-based algorithms (like NDT and ICET), as these methods are especially robust to sensor dropout, partial reflection, and other artifacts observed in field applications.
To this end, our second simulation environment is a high-fidelity urban scene. Specifically, we use data from the Cooperative Driving Dataset (CODD) (Arnold et al., 2021), which was generated in the CARLA open-source driving simulator (Dosovitskiy et al., 2017). The testing environment is representative of a city, with detailed features absent in the abstracted simulations conducted in the following section. Included are loose foliage, complex organic-shaped structures, and moving objects like pedestrians and vehicles. This high-fidelity data set considers a series of LIDAR locations along a simulated automotive trajectory through an urban landscape, and introduces randomness due to sensor noise as well as different physical geometries for each scan. In the CODD data, measurement noise is dominated by range (time-of-flight) error, and is set to modest but representative levels. Specifically, range errors are modeled as normally distributed with a standard deviation of 1 cm.
The initial condition for each new scan is defined by zeroing the transformation vector (1), which is equivalent to starting with the assumption that there is no motion between subsequent scans. We parsed the data in the CODD dataset into 123 sequential (overlapping) scan pairs, which represented the full driving trajectory for one CODD vehicle recording data at 5 Hz over a period of 24.6 seconds. Figure 7 visualizes the CODD data, showing an HD map generated from the ICET odometry data as well as an overhead plot of the estimated trajectories for NDT and ICET compared against truth.
For this test, we applied our full ICET algorithm (including the sigma-point refinement) as well as an NDT baseline (without sigma-point refinement) to each sequential pair of LIDAR scans. To provide a fair comparison, NDT and ICET used the same set of hyperparameters, including voxel size (4° resolution in azimuth and elevation), voxel boundaries, number of iterations, moving-object rejection threshold, and minimum number of points per cell (50 points). For this study, the key parameter is voxel size. As discussed in Section 2, voxels must be big enough to capture random errors, yet not so big as to envelop large objects. For this reason, the Appendix performs a voxel-size parameter study. The Appendix shows little sensitivity in performance over a wide range of voxel sizes, up to a breaking point at which performance quickly degrades when voxels become too large or too small.
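A simplified sketch of the spherical voxelization implied by these hyperparameters (4° azimuth/elevation bins, minimum of 50 points per cell) is shown below. It omits the adaptive radial binning of Figure 4, and all names are illustrative rather than drawn from the ICET codebase:

```python
import numpy as np

def spherical_voxel_stats(points, az_res_deg=4.0, el_res_deg=4.0, min_pts=50):
    """Group (N, 3) points into azimuth/elevation bins and return the
    sample mean and covariance of each sufficiently populated bin,
    mirroring the per-voxel normal distributions used by NDT/ICET.
    Simplified sketch: radial (range) binning is omitted."""
    az = np.degrees(np.arctan2(points[:, 1], points[:, 0]))
    rng_xy = np.linalg.norm(points[:, :2], axis=1)
    el = np.degrees(np.arctan2(points[:, 2], rng_xy))
    az_bin = np.floor(az / az_res_deg).astype(int)
    el_bin = np.floor(el / el_res_deg).astype(int)
    stats = {}
    for key in set(zip(az_bin, el_bin)):
        mask = (az_bin == key[0]) & (el_bin == key[1])
        pts = points[mask]
        if len(pts) >= min_pts:  # discard sparsely populated voxels
            stats[key] = (pts.mean(axis=0), np.cov(pts.T))
    return stats
```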

VERIFICATION TESTING: ABSTRACTED GEOMETRIES
As a final test, we consider the corner case where ICET's sigma-point refinement might remove all of the data in one or more solution coordinates. This might happen, for example, in a straight and featureless tunnel, where there is no valid LIDAR measurement data in the along-track direction. In such a case, we expect that ICET should still function and, moreover, that it should explicitly report that the solution is unavailable in the ambiguous direction. It is notable that the issue of insufficient LIDAR scan-matching data along one or more axes has recently been identified as a problem for other algorithms as well (Tuna et al., 2022).
We consider three corner cases, each constructed from an abstracted geometry. The geometries include a T-intersection, a straight tunnel, and an open field. In each case, we synthesize LIDAR measurements via the Matlab Automated Driving Toolbox. The three geometries are depicted in Figure 9. For each geometry, the figure shows a pair of scans: a reference scan (blue) and a perturbed scan (red) that must be aligned to the reference scan. In using Matlab to generate these scans, we applied zero-mean Gaussian noise with a 0.2 cm standard deviation in each of the x, y, and z directions.
The abstracted geometries were designed specifically to check ICET's sigma-point exclusion and rank-testing capabilities, as described in (17). The T-intersection consists of extended walls in all directions. No large-scale ambiguity exists, since data exist to constrain all six transformation states, including the translation estimates in the x, y, and z directions and the rotation estimates in roll, pitch, and yaw. By contrast, the straight tunnel provides useful information in all translation and rotation directions except the road-aligned (along-track) direction. The flat plane provides useful information only in the vertical, pitch, and roll directions, and not in the horizontal and yaw directions. We hypothesize that the sigma-point test should detect walls (and remove wall-coincident measurements) in all three simple scenes; however, only the latter two scenes are expected to fail the rank-checking test related to (17). Note that we avoid edge effects by sampling the reference and new scans away from the boundaries of the simulated domain.
In these abstracted geometry tests, we do not create a full trajectory; instead, we repeatedly generate new noise for the two scans via a Monte Carlo simulation. For each Monte Carlo trial, the initial guess is generated from zero-mean, independent Gaussian distributions with a standard deviation of 12.5 cm in the x, y, and z translation states and a standard deviation of 1.7° in the roll, pitch, and yaw rotation states. In all, we generated 500 Monte Carlo trials for each abstracted geometry, each with a unique seed for measurement noise and a randomized initial translation and rotation.
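The randomized initial guesses described above can be sampled as follows; this is an illustrative numpy sketch, and the state ordering [x, y, z, roll, pitch, yaw] is our assumption:

```python
import numpy as np

def sample_initial_guesses(n_trials=500, trans_sigma=0.125,
                           rot_sigma_deg=1.7, seed=0):
    """Draw zero-mean Gaussian initial transformation guesses for the
    Monte Carlo trials: 12.5 cm sigma in x, y, z and 1.7 deg sigma in
    roll, pitch, yaw. Returns an (n_trials, 6) array, translations in
    meters and rotations in radians. Illustrative names only."""
    rng = np.random.default_rng(seed)
    trans = rng.normal(0.0, trans_sigma, size=(n_trials, 3))
    rot = rng.normal(0.0, np.radians(rot_sigma_deg), size=(n_trials, 3))
    return np.hstack([trans, rot])
```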
For each Monte Carlo trial, we ran two algorithms to conduct scan matching: the full ICET algorithm as well as a comparable implementation of NDT, similar in all respects to the ICET algorithm save for the sigma-point refinement and the related condition-number test of (17). Again, the voxel grid was defined with 4° resolution in azimuth and elevation, with a minimum of 50 points per voxel. Error statistics were compiled over all trials, with error computed as the difference of each state compared to the truth. Root-mean-square error (RMSE) values are listed in Table 1 for the T-intersection, Table 2 for the tunnel, and Table 3 for the plane. The tables list the results for the NDT baseline (first two rows of data) followed by the results for the ICET implementation (last two rows of data). Statistical RMSE values are reported alongside predicted values, extracted from the diagonal of the solution-error covariance computed analytically by (11). As expected, ICET's sigma-point refinement excludes measurements, and when all measurements in a given coordinate direction are lost, the condition-number test excludes those axes from the solution. This result can be seen clearly from the tables. In cases where the condition-number test removes dimensions from state estimation, those dimensions are labeled in the table as "Marked DNU," where DNU stands for "Do Not Use." The tables indicate that ICET correctly excluded the along-track component in the tunnel (for all Monte Carlo trials), while NDT confidently reported incorrect estimates. Similarly, in the open-field scene, ICET successfully excluded both horizontal translation components and the yaw component of the solution.
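The error-statistics compilation described above amounts to a per-state RMSE over the Monte Carlo trials; a minimal sketch, with illustrative names:

```python
import numpy as np

def per_state_rmse(estimates, truth):
    """Compile root-mean-square error per solution state over all Monte
    Carlo trials. `estimates` is (n_trials, n_states); `truth` is the
    (n_states,) true transformation. Illustrative helper only."""
    errors = estimates - truth  # broadcast truth across trials
    return np.sqrt(np.mean(errors ** 2, axis=0))
```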
The standard deviations for scan-matching error that are reported in the tables are quite favorable (sub-millimeter translation errors and millidegree rotation errors). It is relevant to note that these error levels are consistent with the results reported in prior NDT simulation studies, notably in (Stoyanov et al., 2012). Low scan-match errors are critical for LIDAR odometry applications, where the standalone dead-reckoning error is an accumulation of the scan-match errors over time. Importantly, the tables show clearly that ICET can reliably predict accuracy in these scenes and that the prediction is improved subtly by the sigma-point refinement step. With no refinement, the predicted covariance consistently underpredicts the error. Consider the T-intersection data in Table 1, where, for each state, the predicted standard deviation is lower than the actual standard deviation for the NDT case, with percent differences between -2% and -10%, except in one extreme case with a relative error of -93%. By comparison, the predictions for ICET are neither consistently above nor below the actual values, under-predicting two states by -3.9% and -4.6% and slightly over-predicting the other states, by up to +4.2%.
Although the effect of the sigma-point refinement is generally somewhat subtle for the T-intersection, the difference in the along-track direction is truly striking. Table 1 shows that the NDT case (with no extended-surface suppression) underpredicts the standard deviation by a factor of nearly 2. The issue here is that the majority of voxels contain walls aligned with the along-track direction. Even though the measurements in these directions have a somewhat large sample covariance (driven by voxel width), these large-covariance voxels are not significantly deweighted in the weighted least-squares solution, because there are very few voxels with "good" information in the along-track direction. As a result, NDT becomes overconfident in its along-track estimate. By comparison, the sigma-point refinement entirely removes along-wall measurements, so they are not even considered in the weighting matrix or the covariance computation, resulting both in lower error and a better prediction of that error.
The effect of sigma-point refinement is even more pronounced in the straight-tunnel and open-field scenes, where strong ambiguities were present. In these scenes, the full ICET algorithm reports that the solution quality is poor where the ambiguity exists (as denoted by the "Excluded" cells in the tables). Without the refinement, NDT produces a somewhat arbitrary error that is highly correlated with the initial condition. Recall that the initial-condition error was 12.5 cm in translation and 1.7 degrees in rotation, which is reflected in the starred values in the tables (10.4 cm in translation and 1.7 degrees in rotation). In these cases, the sigma-point refinement plays a critical role by recognizing and reporting ambiguity. As a side effect, the sigma-point refinement also converges more quickly (for the open-field case: within 5 iterations for full ICET as opposed to 12 iterations for NDT).
Sigma-point refinement both reduces estimation error and improves the ability of the algorithm to predict the accuracy of its solution. The degree to which the true and predicted accuracy of the algorithm is improved is a function of the geometry of the scene and the size of the voxels. For instance, in the T-intersection case, enough distributions are artificially limited by voxel width in the along-track direction that the sigma-point refinement greatly impacts both the actual and predicted error in that direction. In the same scene, however, the accuracy and accuracy prediction in the remaining directions are very similar whether or not the sigma-point refinement is employed.

CONCLUSION
A topic for future work is to identify and address large-scale ambiguities that may appear in LIDAR scenes. While our proposed strategy is capable of identifying local structure and global ambiguity that results in a rank deficiency, other types of ambiguity are not directly addressed by ICET. Most notably, ICET does not flag scenes containing a lattice of repeated features (e.g., driving past regularly spaced columns). Such a scene would result in an integer-ambiguity problem, much like the problem of resolving wavelength in carrier-phase processing for GNSS (Enge, 1994). Ambiguity may also be masked by the WLS linearization. For instance, if a LIDAR is mounted on a vehicle moving along a curved tunnel, the along-track translation and yaw rotation should be coupled, and an ambiguous combination of states should be identified by the condition-number test; however, the condition-number test may not recognize this ambiguity if the curve is sufficiently sharp. New algorithm extensions will be needed to identify such challenging cases of global ambiguity.
The main result of this paper was the introduction of the Iterative Closest Ellipsoidal Transform (ICET), an algorithm for scan matching with 3D LIDAR point-cloud data. As a LIDAR processing algorithm, ICET is similar to the well-known Normal Distributions Transform (NDT) algorithm, with the addition of a capability to recognize differences between randomness and deterministic structure in voxel point distributions and to leverage that information to construct a more representative model of the solution-error covariance. ICET is formulated using a least-squares method, supplemented by a sigma-point test to identify point distributions that are dominated by deterministic structure rather than measurement randomness. The 3D ICET algorithm was applied to real data (to demonstrate proof of concept), high-fidelity simulations (to study error properties with perfect ground truth), and corner-case simulations (to demonstrate the significance of dimension reduction for the solution using a condition-number test).

FIGURE 2
FIGURE 2 Illustration of a wall (seen from above) passing through two rounded voxels. LIDAR measurements of the surface are shown as dots. The mean and covariance matrix for the scan points in each voxel are indicated by the shaded ellipsoids.

FIGURE 3
FIGURE 3 Sigma-point refinement for identification of extended surfaces within a single voxel. A covariance ellipse is shown in relation to a wireframe voxel. The sigma points along the ellipsoid's three principal axes (labeled I, II, and III) are identified with small dots. Only axis III is eliminated by our sigma-point exclusion test, since both of its sigma points lie outside the voxel boundaries.

FIGURE 4
FIGURE 4 Spherical voxels with adaptive radial binning. Point distributions within each cell are shown as ellipsoids.

FIGURE 6
FIGURE 6 ICET demonstration for real-world PixSet data. (Left) ICET-odometry trajectory, shown in white, with a High Definition (HD) Map generated using the odometry to stitch scans together. (Right) Vehicle trajectory projected on the North-East plane, as measured by three methods: GPS ground truth, ICET odometry, and NDT odometry.

FIGURE 7
FIGURE 7 ICET performance for simulated CODD data. (Left) ICET-odometry trajectory, shown in white, with an HD Map generated using the odometry to stitch scans together. (Right) Vehicle trajectory projected on the North-East plane, including the known truth from the simulation (dashed line) as well as ICET and NDT odometry measurements (solid lines).

FIGURE 8
FIGURE 8 NDT- and ICET-generated estimates for forward translation and rotation about the vertical axis, compared to ground-truth values. Shaded regions show the 2σ error bound for each frame predicted by each algorithm.

FIGURE 9
FIGURE 9 Corner-case simulations including scans for (left) a T-intersection, (middle) a tunnel, and (right) a large open field.