Abstract
Because of the high complementarity between global navigation satellite systems (GNSSs) and visual-inertial odometry (VIO), integrated GNSS-VIO navigation technology has attracted increasing attention in recent years. In this paper, we propose an embedded high-precision multi-sensor fusion suite that includes a multi-frequency, multi-constellation GNSS module, a consumer-grade inertial measurement unit (IMU), and a grayscale camera. The suite uses an NVIDIA Jetson Xavier NX as the host and incorporates a field-programmable gate array (FPGA)-based controller for hardware time synchronization between the heterogeneous sensors. A multi-state constraint Kalman filter generates a tightly-coupled estimate from the camera and the IMU, and the GNSS output is then loosely coupled to obtain a globally drift-free estimate. Calibration results show that the time synchronization accuracy of the suite is better than 30 μs (standard deviation [STD]) and that the camera-IMU reprojection error is less than 0.1 pixels (STD); these results highlight the advantage of the hardware time synchronization mechanism. Vehicle-mounted tests on urban roads show a reduction of the three-dimensional (3D) positioning error from 8.455 m to 5.751 m (root mean square), a significant improvement in accuracy and continuity over standalone GNSS positioning. In underground sites where the satellite signal is completely unavailable, the 3D position error drift of the suite is only 1.58 ‰, which also demonstrates excellent performance.
1 INTRODUCTION
Visual-inertial odometry (VIO) and simultaneous localization and mapping (SLAM) are the key technologies for emerging applications, including robots, drones, virtual reality, and augmented reality (AR). These technologies enable a carrier to achieve autonomous positioning and mapping in unknown environments to facilitate navigation, transportation, and operations tasks (Durrant-Whyte & Bailey, 2006; Qin & Li, 2018; Dias et al., 2007; Corke et al., 2007). At present, satellite positioning has entered the era of multiple global navigation satellite systems (GNSSs). Users from around the world can receive signals from more than a dozen GNSS satellites at any one time (Liu et al., 2020a; Liu et al., 2020b).
VIO and GNSS are complementary technologies. When satellite signals are available, a GNSS can provide an accurate absolute position basis for the fusion system, update the error parameters of VIO online, and eliminate accumulated errors. VIO can maintain the position and attitude estimation when satellite signals are blocked or unavailable. This solution may lead users to abandon traditional and expensive inertial navigation system (INS) equipment and rely instead on consumer-grade low-cost microelectromechanical system (MEMS) inertial measurement units (IMUs) to achieve seamless, continuous, and high-precision navigation in both indoor and outdoor environments. For these reasons, integrated GNSS-VIO fusion navigation technology has been the subject of increased attention in recent years. For example, Lee et al. (2020) evaluated a loosely-coupled method based on VIO and GPS positions under a multi-state constraint framework that included an initial alignment of VIO and GPS and online calibration of the lever-arm vector and time delay. A tightly-coupled optimized VIO framework based on a sliding window was established with GPS measurements used as constraint equations to improve the real-time accuracy of the global localization (Cioffi & Scaramuzza, 2020). Based on a non-linear optimization method, Cao et al. (2022) used raw GNSS observations for tight coupling of vision and IMU measurements to generate a global drift-free state estimation.
Time synchronization between different sensors is another critical issue. Multi-sensor fusion navigation requires all sensors to be well-calibrated, tightly connected, and accurately time-synchronized; this involves two tasks: spatial calibration and time alignment. Nikolic et al. (2014) noted that a time synchronization error between the camera and IMU of even a few milliseconds (ms) will generate significant errors in the VIO and, in serious cases, may even lead to the failure of the system. However, because sensors (e.g., cameras and IMUs) generally function independently of one another and have different clock sources, their respective time references are not uniform. In addition, the sampling frequencies, transmission paths, and interface protocols of each sensor differ, and the time delays introduced by camera exposure, internal sensor filtering, data transmission, and operating-system scheduling also vary over time. These factors make it difficult to achieve accurate time synchronization between sensors.
Some currently available time synchronization technologies are software-based. One advantage of this approach is that it relaxes the requirements on hardware design (Mair et al., 2011; Mills et al., 2010; Harrison & Newman, 2011; Fleps et al., 2011; Mirzaei & Roumeliotis, 2007; Hol et al., 2010; Kelly & Sukhatme, 2014). For example, Fleps et al. (2011) reinterpreted the traditional filter-based inertial-visual calibration process as a batch optimization problem and used both simulated and real data to demonstrate the advantages of the proposed method. Similarly, Mirzaei and Roumeliotis (2007) presented a method featuring an augmented-state extended Kalman filter (EKF) to achieve millimeter and sub-degree accuracy in the translational and rotational components of the IMU-camera transformation, respectively. Kelly and Sukhatme (2014) proposed an algorithm known as time delay iterative closest point (TD-ICP). Using this algorithm, one obtains two sets of orientation measurements for each pair of sensors that form curves in three-dimensional (3D) space. All points on each curve have corresponding timestamps that identify the time at which the measurement arrived at the receiver. By registering the orientation curves and constructing a total least squares cost function, the time delay between the sensor data streams can be estimated and calibrated with accuracies on the order of 1 to 2 ms. However, this algorithm assumes that the clock of each sensor is accurate and that the time delay between the measurements remains constant; these criteria are difficult to meet in practice. In addition, because it takes a long period of data collection to identify the trajectory of a given sensor, the method requires significant initialization time. To address this concern, Furgale et al. (2013) presented an algorithmic framework for jointly estimating the time offset between sensor measurements and their spatial displacements. The approach relies on continuous-time batch estimation and incorporates time offsets within the theoretical framework of maximum likelihood estimation. Experimental results obtained using camera-IMU calibration reveal that this framework can accurately estimate time offsets down to a fraction of the minimum measurement period.
Although the software approach is comparatively easy to implement, the most precise way to determine measurement timestamps is to use hardware-supported interrupts that detect the electrical signals that change a sensor's state (Furgale et al., 2013; Maune, 2019). This led to the open versatile multi-camera visual-inertial sensor suite (VersaVIS), which synchronizes an IMU, a host, and a stereo camera (Tschopp et al., 2020). It features a microcontroller unit (MCU) for sensor triggering and control and transmits camera images and exposure timestamps separately, achieving a time synchronization accuracy better than 1 ms. The VI-Sensor is a stereo visual-inertial sensor with field programmable gate array (FPGA) image preprocessing (Nikolic et al., 2014) that triggers and controls the camera and periodically polls the IMU to achieve time synchronization. Notably, it estimates and corrects the fixed delay of the IMU data by measuring and extrapolating the zero-exposure time of the camera, resulting in an average inter-sensor delay of only about 7 μs. A GPS/INS integrated system with high-accuracy time synchronization was described by Li et al. (2006). This system records the GPS and INS data on a compact flash card and performs software-based post-processing to estimate the INS errors via a Kalman filter. Tests revealed a timing resolution of 5.12 μs and a potential accuracy better than 0.3 ms when compared with a commercial GPS/INS integrated system. However, this system does not address camera synchronization.
In this paper, we propose an embedded high-precision GNSS-visual-inertial integrated suite. The system uses an NVIDIA Jetson Xavier NX as the host and computing board and incorporates an FPGA-based trigger and control module to provide stable, high-precision time synchronization between heterogeneous sensors. In contrast to other systems, the GNSS-visual-inertial integrated suite proposed here achieves adaptive high-precision time synchronization among the GNSS module, the IMU, and the camera. Compared with the aforementioned software methods, the hardware time synchronization solution yields more stable synchronization and reduces the burden on the algorithm. The time synchronization mechanism is based on a scalable design that can support multiple cameras and even other sensors (e.g., lidar) in future applications. Together, the IMU and camera constitute a classic VIO, while the GNSS module provides both spatial scale and heading angle information that enables the suite to work seamlessly in both open and obstructed environments. An evaluation performed with the Kalibr calibration tool revealed that the time synchronization accuracy of the suite was better than 30 μs (standard deviation [STD]) and that the camera-IMU reprojection error was less than 0.1 pixels (STD), highlighting the overall advantage of the hardware time synchronization mechanism. A vehicle-mounted test platform was constructed, and system performance was evaluated on the Fourth Ring Road and in an underground parking lot in Beijing, China. The results show that, by integrating with the VIO, the suite can effectively overcome the obstruction of satellite signals by buildings, trees, overpasses, and other objects typically found on urban roads. The 3D positioning error was reduced from 8.455 m to 5.751 m (root mean square [RMS]), a significant improvement in both accuracy and continuity over standalone GNSS positioning. In the underground parking garage, where satellite signals were completely blocked, the 3D error drift of the suite (i.e., the ratio of the positioning error to the distance traveled) was only 1.58 ‰, which also highlights its excellent performance.
The remainder of the manuscript is structured as follows. Section 2 introduces the composition of the embedded integrated suite and includes descriptions of the sensors and the main components. Section 3 presents the time synchronization implementation based on the FPGA trigger and control module. The derivation of the estimation model of the GNSS-visual-IMU fusion framework is presented in Section 4. Finally, in Section 5, the time synchronization accuracy of sensors is analyzed together with a presentation of system performance in an on-road test carried out in an urban area of Beijing and in a parking garage in which satellite signals were blocked.
2 OVERVIEW OF THE SUITE
The suite is mainly composed of a motherboard, a trigger control module, a computing board, and a sensor system, as shown in Figure 1. The sensor system includes a GNSS module from Unicore Communications, Inc. (model UB482), an IMU from Analog Devices, Inc. (ADI) (model ADIS16505), and an industrial digital grayscale camera from Imaging Development Systems GmbH (IDS) (model UI-3240CP-M-GL).
A 4G communication module is soldered onto the motherboard, which integrates the GNSS module through a general-purpose input/output (GPIO) slot and connects to the IDS camera through a USB 3.0 interface. The 4G module mainly supports the real-time kinematic (RTK) function of the GNSS module.
The trigger control module adopts a Xilinx XC3S200-4VQG100 FPGA of the Xilinx Spartan®-3 series, which provides 63 user inputs/outputs, 200 k system gates, and 216 Kb of block RAM. The module also integrates the IMU and connects to the camera through the GPIO slot. Its main function is to control the IMU for data reading and the camera for image exposure in order to achieve accurate time synchronization between the sensors.
The computing board adopts an NVIDIA Jetson Xavier NX running Linux and the Robot Operating System (ROS). The Jetson Xavier NX integrates a 6-core 64-bit ARM CPU with a main frequency of 1.4 GHz and a 384-core graphics processing unit (GPU) with 48 Tensor cores, which meets the real-time processing requirements of the vision-fusion positioning system.
3 TIME SYNCHRONIZATION SCHEME
The mechanisms described in this section provide high-precision time synchronization among the GNSS measurements, the IMU data, and the camera images. Synchronization between the IMU and the camera relies on the FPGA trigger, while the GNSS 1PPS signal synchronizes the GNSS module with the IMU and camera.
3.1 Time Synchronization Between the IMU and the Camera
A hardware time synchronization method was adopted (Tschopp et al., 2020). The main improvement in our solution is that the trigger controller is implemented on an FPGA rather than an MCU, as shown in Figure 2. FPGAs are better suited than MCUs to high-speed, repetitive tasks. In theory, an FPGA can operate at the system clock frequency generated by its crystal oscillator, which makes it particularly effective for measuring event times and the best choice for managing and controlling high-speed peripherals.
Under the control of the local clock, the FPGA controller triggers the IMU and camera by generating interrupts and records the corresponding trigger timestamps. Because the IMU is a triggerable high-speed sensor and the amount of accelerometer and gyroscope data is quite small, the IMU module sends its data to the FPGA controller after receiving the trigger signal; the FPGA controller then packages the IMU data with its timestamp and sends them together to the host. The camera uses a different processing mechanism: given the large amount of data involved, the image is not sent to the FPGA controller but directly to the host. The host records the timestamp of the received image together with its serial number and associates the image exposure trigger timestamp with the image receiving timestamp. Once the host has determined the correspondence between the image receiving timestamp and the image serial number, and the correspondence between the image trigger timestamp and the image receiving timestamp, the correspondence between the image trigger timestamp and the image frame, which were transmitted separately, can be obtained.
The critical component of this solution is the synchronization processing in the initial stage. If the FPGA controller and the camera start synchronously and the camera responds as soon as the FPGA controller sends its first trigger signal, then the exposure trigger timestamp, the image receiving timestamp, and the image frame can be easily matched. However, because it is difficult to start the FPGA controller and the camera at precisely the same time, the trigger signals sent by the FPGA controller in the initial stage may not elicit a response from the camera. In addition, because of the time delays in signal transmission and device response (e.g., camera exposure and image transmission), mismatches are likely to occur if the camera is triggered at a high rate. For example, the host may not be able to determine whether an exposure trigger timestamp that arrived in advance should be aligned with an image frame that arrived afterward or with its receiving timestamp.
To solve this problem, the camera can be triggered slowly in the initial stage to facilitate data alignment. When the FPGA controller is first activated, it triggers camera exposures at a low frequency (e.g., 1 Hz). Because the delays of the camera response, exposure, and image transmission are all on the order of milliseconds (i.e., much less than 1 s), the FPGA trigger timestamp and the host receiving timestamp with the smallest difference are interpreted as corresponding to the same image frame. This process can be described as shown in Equation (1):
$$ t_{trig,i} \leftrightarrow I_{j^{*}}, \quad j^{*} = \arg\min_{j} \left| t_{recv,j} - t_{trig,i} \right| \quad (1) $$
where the symbol "↔" indicates correspondence, $t_{trig,i}$ is the i-th exposure trigger timestamp recorded by the FPGA, and $t_{recv,j}$ is the host receiving timestamp of image frame $I_j$.
In the earlier steps, the camera trigger timestamps, the IMU trigger timestamps, and the IMU data were already matched. Therefore, once the camera trigger timestamps and the image frames have been matched, the time alignment between the image frames and the IMU data can be realized.
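To make this matching step concrete, the following C++ sketch shows one way a host could pair each exposure trigger timestamp reported by the FPGA with the image whose receiving timestamp is closest, in the spirit of Equation (1). The structure layouts and function names are illustrative assumptions, not the suite's actual firmware or driver interfaces.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical records; field names are illustrative only.
struct TriggerStamp  { uint64_t t_trigger_us; };                   // exposure trigger time from the FPGA
struct ReceivedImage { uint64_t t_receive_us; uint32_t frame_id; };// image receiving time + serial number

// Pair each trigger timestamp with the image whose host receiving time is closest.
// Returns, for every trigger, the frame_id of the matched image (if any).
std::vector<std::optional<uint32_t>> MatchTriggersToImages(
    const std::vector<TriggerStamp>& triggers,
    const std::vector<ReceivedImage>& images)
{
  std::vector<std::optional<uint32_t>> matches(triggers.size());
  for (size_t i = 0; i < triggers.size(); ++i) {
    uint64_t best_diff = UINT64_MAX;
    for (const auto& img : images) {
      const uint64_t diff = img.t_receive_us > triggers[i].t_trigger_us
                                ? img.t_receive_us - triggers[i].t_trigger_us
                                : triggers[i].t_trigger_us - img.t_receive_us;
      if (diff < best_diff) {              // smallest |t_receive - t_trigger| wins
        best_diff = diff;
        matches[i] = img.frame_id;
      }
    }
  }
  return matches;
}
```

At the low initial trigger rate (e.g., 1 Hz), this nearest-in-time pairing is unambiguous because all response and transmission delays are far below one second.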
In addition, we used the intermediate exposure time of the camera to refine the time synchronization. Because the camera exposure takes a finite amount of time (usually a few hundred microseconds to several milliseconds), it is more appropriate to use the timestamp at the middle of the exposure than the timestamp at the start of the exposure. This is particularly important when using global-shutter cameras or when there are multiple cameras in the suite (Nikolic et al., 2014). In this implementation, we update and correct the camera's image exposure timestamp as shown in Equation (2):
$$ t_{mid} = t_{trig} + \frac{t_{exposure}}{2} \quad (2) $$
where $t_{exposure}$ is the camera exposure duration, $t_{trig}$ is the exposure trigger timestamp, and $t_{mid}$ is the corrected mid-exposure timestamp.
Our implementation supports both the fixed-exposure and auto-exposure modes of the camera. In the fixed-exposure mode, the exposure duration ($t_{exposure}$), which is set manually, is known in advance. In the auto-exposure mode, the FPGA controller needs to transmit the exposure duration reported by the camera to the host (i.e., the data stream represented by the dotted line in Figure 2).
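As a small illustration of Equation (2), the following C++ sketch computes the mid-exposure timestamp for both exposure modes. The function name, the microsecond units, and the use of std::optional to flag an unknown exposure duration are our own assumptions.

```cpp
#include <cstdint>
#include <optional>

// Compute the mid-exposure timestamp per Equation (2). In fixed-exposure mode the
// duration is configured in advance; in auto-exposure mode it must have been reported
// by the camera (and forwarded by the FPGA controller). Names are illustrative.
std::optional<uint64_t> MidExposureTimestampUs(uint64_t t_trigger_us,
                                               std::optional<uint64_t> fixed_exposure_us,
                                               std::optional<uint64_t> reported_exposure_us)
{
  const std::optional<uint64_t> exposure =
      fixed_exposure_us ? fixed_exposure_us : reported_exposure_us;
  if (!exposure) return std::nullopt;        // exposure duration unknown
  return t_trigger_us + *exposure / 2;       // t_mid = t_trig + t_exposure / 2
}
```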
3.2 Time Synchronization Between the GNSS and the IMU/Camera
As shown in Figure 3, the GNSS module is connected to both the host and the FPGA controller. The GNSS module provides raw observations (e.g., satellite ephemeris and pseudorange, carrier-phase, and Doppler measurements) and solutions (e.g., antenna position and velocity) to the host, and it provides the 1PPS signal to the FPGA controller.
The 1PPS signal is a periodic pulsed electrical signal whose sharp rising or falling edge repeats exactly once per second with a pulse duration of less than 1 s; it can therefore be used for precise timing in applications with stringent timing requirements. The jitter of the 1PPS signal output by a GNSS receiver is much smaller than that of an MCU or FPGA clock, with an accuracy of ten to several tens of nanoseconds; for some scientific-grade GNSS receivers, the accuracy may even be better than 1 ns. Using the GNSS 1PPS signal, the user time can be aligned to GNSS time and converted to Coordinated Universal Time (UTC).
When the GNSS module provides 1PPS signals normally, the FPGA controller can synchronize its local clock count to the GNSS 1PPS edge and time events such as triggers, thereby accurately synchronizing the GNSS measurements with the IMU and camera data. The FPGA controller also generates a local 1PPS signal based on its own crystal oscillator and thus synchronizes with the host. The processing flow by which the FPGA controller receives the GNSS 1PPS signal and performs time synchronization is shown in Figure 4. Here, it is necessary to check whether the difference between the local clock count at which the GNSS time message (week number and seconds of week) arrives and the local clock count at the rising edge of the 1PPS signal is within 1 s. In practical applications, the receiver occasionally encounters a situation in which the received GNSS 1PPS signal cannot be correctly matched with the satellite navigation message; this may be caused by a false 1PPS detection resulting from circuit signal crosstalk or by loss of the seconds of week (SOW) data due to GNSS signal occlusion. If this occurs, the GNSS measurement will update the VIO states at the wrong integer second, which will introduce errors. Adding this simple check avoids this situation and improves the robustness of the time synchronization system.
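The plausibility check in Figure 4 can be sketched as follows. This C++ fragment is only an illustration of the idea: the 50-MHz clock rate, the message fields, and the function name are assumed, not taken from the suite's FPGA design.

```cpp
#include <cstdint>
#include <optional>

// A GNSS time message (week number / seconds of week) is only used to anchor the local
// clock when the local count captured at the 1PPS rising edge and the local count captured
// when the message arrived differ by less than one second of counts.
struct GnssTimeMessage {
  uint32_t week;               // GNSS week number (not used in this sketch)
  uint32_t sow;                // seconds of week carried by the navigation message
  uint64_t local_count_at_rx;  // local clock count when the message was received
};

constexpr uint64_t kCountsPerSecond = 50'000'000;  // assumed 50-MHz FPGA clock

std::optional<double> AnchorLocalClock(uint64_t local_count_at_pps_edge,
                                       const GnssTimeMessage& msg)
{
  const uint64_t diff = msg.local_count_at_rx > local_count_at_pps_edge
                            ? msg.local_count_at_rx - local_count_at_pps_edge
                            : local_count_at_pps_edge - msg.local_count_at_rx;
  if (diff >= kCountsPerSecond) {
    return std::nullopt;  // 1PPS edge and navigation message do not belong together
  }
  // The 1PPS edge marks the integer GNSS second: the local count at the edge maps to msg.sow.
  return static_cast<double>(msg.sow);
  // Later events can then be timed as sow + (count - local_count_at_pps_edge) / kCountsPerSecond.
}
```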
When the GNSS module cannot provide a 1PPS signal due to the occlusion or interference of satellite signals, the FPGA controller cannot align its local time with the GNSS time. Nonetheless, the controller can realize the relative time synchronization between itself, the host, the IMU, and the camera. At this point, the suite can utilize the time synchronization mechanism discussed above in Section 3.1.
4 GNSS-VIO MOTION ESTIMATION MODEL
For the outputs of the camera and IMU, a multi-state constraint Kalman filter (MSCKF) based on a sliding window performs the tightly-coupled estimation to obtain the position and attitude of the suite. The GNSS outputs are then loosely coupled to obtain a globally drift-free position and attitude estimate. The corresponding estimation pipeline primarily includes an initialization module, a visual front-end module, a state update module, and a measurement update module.
4.1 Initialization and Visual Front-End
Because the suite contains various sensors and must be able to boot in different environments, we include flexible initialization strategies that support the various application scenarios. If the vehicle is stationary, the suite performs static initialization; otherwise, it performs dynamic initialization. If the system needs to operate in a global frame (e.g., World Geodetic System 1984 [WGS-84]), GNSS measurements must be available; otherwise, it operates in a local frame. If neither the GNSS module nor the camera works effectively at boot, due to occlusion, illumination, or other problems, an error-state Kalman filter can be used to initialize the IMU independently (Solà, 2017). If the camera is available but the GNSS module is not, the relative pose of the camera is estimated using a structure-from-motion (SFM) model and initialized jointly with the IMU (Qin et al., 2018); in this case, the suite operates in the VIO local frame. Finally, if both the GNSS module and the camera are available, joint GNSS-VIO initialization is performed (Cao et al., 2022). The initialization strategy used by the suite is shown in Figure 5. A vehicle can be identified as stationary via the zero-velocity update (ZUPT) technique applied to the IMU and the change between consecutive camera images.
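The initialization decision logic of Figure 5 can be summarized by a sketch such as the one below. The enum names, flags, and branching order are illustrative assumptions; the suite derives the actual flags from ZUPT detection, GNSS status, and image checks.

```cpp
// Sketch of the initialization decisions described in Figure 5.
enum class Frame  { Global, VioLocal };
enum class Method { ImuOnlyEskf, CameraImuSfm, GnssVioJoint };

struct InitPlan {
  bool   static_init;  // static vs. dynamic initialization
  Frame  frame;        // WGS-84/ENU global frame vs. VIO local frame
  Method method;       // which joint initialization to run
};

InitPlan PlanInitialization(bool vehicle_stationary, bool gnss_ok, bool camera_ok)
{
  InitPlan plan;
  plan.static_init = vehicle_stationary;                    // from ZUPT + image change detection
  plan.frame = gnss_ok ? Frame::Global : Frame::VioLocal;   // global frame requires GNSS
  if (gnss_ok && camera_ok)  plan.method = Method::GnssVioJoint;  // joint GNSS-VIO initialization
  else if (camera_ok)        plan.method = Method::CameraImuSfm;  // SFM + IMU joint initialization
  else                       plan.method = Method::ImuOnlyEskf;   // error-state KF, IMU only
  return plan;
}
```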
For the visual front-end, considering the limited edge computing capability of the embedded device, we choose features from accelerated segment test (FAST) key points for feature extraction and Lucas-Kanade (LK) sparse optical flow for tracking (Rosten & Drummond, 2006; Lucas & Kanade, 1981). To distribute the FAST features as uniformly as possible, the image is evenly divided into several grid cells and each cell is processed separately (Bailo et al., 2018). The random sample consensus (RANSAC) algorithm is then used to eliminate gross outliers so that only reliable features are retained (Fischler & Bolles, 1981). The visual front-end strategy of the suite is shown in Figure 6.
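For readers who want a concrete picture of this front-end, the following OpenCV-based C++ sketch implements grid-distributed FAST detection, LK optical-flow tracking, and RANSAC outlier rejection. The grid size, FAST threshold, and RANSAC parameters are placeholders rather than the suite's actual settings.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/calib3d.hpp>
#include <algorithm>
#include <vector>

// Detect FAST corners on a regular grid, keeping the strongest corner in each cell
// so that features stay evenly distributed across the image.
std::vector<cv::Point2f> DetectFastOnGrid(const cv::Mat& gray, int rows = 4, int cols = 5,
                                          int fast_threshold = 20)
{
  std::vector<cv::Point2f> features;
  const int cell_w = gray.cols / cols, cell_h = gray.rows / rows;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      cv::Rect cell(c * cell_w, r * cell_h, cell_w, cell_h);
      std::vector<cv::KeyPoint> kps;
      cv::FAST(gray(cell), kps, fast_threshold, /*nonmaxSuppression=*/true);
      if (kps.empty()) continue;
      const auto best = std::max_element(kps.begin(), kps.end(),
          [](const cv::KeyPoint& a, const cv::KeyPoint& b) { return a.response < b.response; });
      features.emplace_back(best->pt.x + cell.x, best->pt.y + cell.y);
    }
  }
  return features;
}

// Track features from the previous frame with LK optical flow and reject gross
// outliers with a RANSAC fundamental-matrix fit.
void TrackAndFilter(const cv::Mat& prev_gray, const cv::Mat& curr_gray,
                    std::vector<cv::Point2f>& prev_pts, std::vector<cv::Point2f>& curr_pts)
{
  std::vector<uchar> status;
  std::vector<float> err;
  cv::calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, curr_pts, status, err);

  std::vector<cv::Point2f> p0, p1;
  for (size_t i = 0; i < status.size(); ++i) {
    if (status[i]) { p0.push_back(prev_pts[i]); p1.push_back(curr_pts[i]); }
  }
  if (p0.size() >= 8) {
    std::vector<uchar> inlier_mask;
    cv::findFundamentalMat(p0, p1, cv::FM_RANSAC, 1.0, 0.99, inlier_mask);
    std::vector<cv::Point2f> q0, q1;
    for (size_t i = 0; i < inlier_mask.size(); ++i) {
      if (inlier_mask[i]) { q0.push_back(p0[i]); q1.push_back(p1[i]); }
    }
    p0.swap(q0); p1.swap(q1);
  }
  prev_pts = p0; curr_pts = p1;
}
```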
4.2 State Update
In the example shown in Figure 7, the suite is fixed to a carrier with a coordinate system (denoted as {b}) in which the x-axis points forward, the y-axis points to the left, and the z-axis points to the zenith. The IMU coordinate system, which obeys the right-hand rule, is denoted as {I}, and the camera coordinate system is denoted as {C}. The camera and IMU are rigidly mounted with fixed extrinsic parameters. The global coordinate system of the visual-inertial system comprising the camera and IMU is denoted as {V}, with an origin determined by the orientation of the camera and the direction of gravity obtained at initialization. The output of the GNSS module adopts the WGS-84 global coordinate system, denoted as {G}, which can be converted into the East-North-Up (ENU) local navigation coordinate system, denoted as {E}.
Figure 7: The IMU coordinate system {I}, the camera coordinate system {C}, the VIO global coordinate system {V}, the GNSS world coordinate system {G}, and the GNSS local coordinate system {E}.
Because the IMU is the only sensor that remains unaffected by the external environment, its state vector is taken as the current position and attitude of the suite (Mourikis & Roumeliotis, 2007), which can be expressed as shown in Equation (3):
$$ x_I = \begin{bmatrix} p_I^{T} & v_I^{T} & q_I^{T} & b_a^{T} & b_g^{T} \end{bmatrix}^{T} \quad (3) $$
where pI, vI, and qI are the position, velocity, and attitude of the IMU in the VIO coordinate system {V}, respectively, and ba and bg are the accelerometer bias and the gyroscope bias of the IMU in the coordinate system {I}, respectively.
As noted above, a sliding window can be used to improve the accuracy of visual measurement updates. Thus, the camera pose in the global coordinate system {V} can be expressed as an augmented state vector in the sliding window as indicated in Equation (4):
$$ x_C = \begin{bmatrix} q_{C_{k-M+1}}^{T} & p_{C_{k-M+1}}^{T} & \cdots & q_{C_k}^{T} & p_{C_k}^{T} \end{bmatrix}^{T} \quad (4) $$
where tk is the current measurement moment, and M is the size of the window.
The state vector of features after triangulation in the global coordinate system {V} can be expressed as indicated in Equation (5):
$$ x_f = \begin{bmatrix} {}^{V}p_{f_1}^{T} & {}^{V}p_{f_2}^{T} & \cdots & {}^{V}p_{f_N}^{T} \end{bmatrix}^{T} \quad (5) $$
where N is the number of features used for the filter update.
Because the global coordinates of the triangulated features are usually not sufficiently accurate, they can be estimated together as the augmented state vector. Thus, the total state vector can be expressed as indicated in Equation (6):
$$ x = \begin{bmatrix} x_I^{T} & x_C^{T} & x_f^{T} \end{bmatrix}^{T} \quad (6) $$
where the dimensions of xf and xC will vary with the visual observation.
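One possible in-memory layout of this augmented state, shown only to make the block structure of Equations (3)–(6) explicit, is sketched below; the type and field names are our own, and the actual implementation may organize the state differently.

```cpp
#include <Eigen/Dense>
#include <vector>

// Illustrative layout of the augmented MSCKF state of Equations (3)-(6). The camera-pose
// and feature blocks grow and shrink with the sliding window and the tracked features.
struct ImuState {
  Eigen::Vector3d    p_I;  // position in {V}
  Eigen::Vector3d    v_I;  // velocity in {V}
  Eigen::Quaterniond q_I;  // attitude of {I} with respect to {V}
  Eigen::Vector3d    b_a;  // accelerometer bias
  Eigen::Vector3d    b_g;  // gyroscope bias
};

struct CameraClone { Eigen::Quaterniond q_C; Eigen::Vector3d p_C; };  // one pose in the window

struct FullState {
  ImuState                     imu;      // x_I
  std::vector<CameraClone>     window;   // x_C, at most M clones
  std::vector<Eigen::Vector3d> features; // x_f, N triangulated features
};
```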
4.3 Visual Measurement Update
For the feature fi observed in epoch k, the observation model can be expressed (Mirzaei & Roumeliotis, 2007) as indicated in Equation (7):
$$ z_i^{k} = \frac{1}{{}^{C_k}Z_i} \begin{bmatrix} {}^{C_k}X_i \\ {}^{C_k}Y_i \end{bmatrix} + n_i^{k} \quad (7) $$
where CkPi = [CkXi, CkYi, CkZi]T is the three-dimensional representation of the feature fi in the camera coordinate system. The relationship of this representation to the representation in the world coordinate system can be determined as shown in Equation (8):
$$ {}^{C_k}P_i = {}^{C_k}_{G}R \left( {}^{G}P_{f_i} - {}^{G}P_{C_k} \right) \quad (8) $$
where GPfi is the global coordinate of the feature fi in the world coordinate system, and ${}^{C_k}_{G}R$ and ${}^{G}P_{C_k}$ describe the pose of the camera relative to the global coordinate system.
According to the camera model, the observed pixel coordinates of features can be converted to the normalized camera coordinate system as shown in Equation (9):
9
where f(·) represents the pinhole camera model function.
Thus, the reprojection error of feature fi can be obtained from Equation (10):
$$ r_i^{k} = z_i^{k} - \hat{z}_i^{k} \quad (10) $$
where $z_i^{k}$ is as defined in Equation (7) and $\hat{z}_i^{k}$ is its prediction from the current state estimate.
The error observation equation can be constructed by linearizing Equation (7), as shown in Equation (11):
$$ r_i^{k} \approx H_{x,i}^{k}\,\tilde{x} + H_{f,i}^{k}\,{}^{G}\tilde{P}_{f_i} + n_i^{k} \quad (11) $$
where $H_{x,i}^{k}$ and $H_{f,i}^{k}$ are the Jacobian matrices of the measurement residual relative to the state and to the coordinates of feature fi, respectively, and $\tilde{x}$ represents the error of the state.
Finally, all of the features are stacked together, and the residual rk is projected onto the left null space of the feature Jacobian matrix to eliminate the dependence of the observation equation on the three-dimensional features. Thus, we obtain Equation (12):
$$ r_k^{o} = V^{T} r_k \approx V^{T} H_{x}\,\tilde{x} + V^{T} n_k \quad (12) $$
where $r_k$, $H_{x}$, and $H_{f}$ are the stacked (joint) forms of $r_i^{k}$, $H_{x,i}^{k}$, and $H_{f,i}^{k}$, respectively, and $V$ spans the left null space of $H_{f}$.
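The left null-space projection of Equation (12) is the standard MSCKF step; the following Eigen-based sketch shows one way to realize it with a QR decomposition. The function and matrix names follow the text rather than the suite's code, and the sketch assumes the stacked feature Jacobian has full column rank.

```cpp
#include <Eigen/Dense>

// Remove the feature-position error from the stacked visual residual so the update
// depends only on the filter state, per Equation (12).
struct ProjectedMeasurement { Eigen::VectorXd r_o; Eigen::MatrixXd H_o; };

ProjectedMeasurement ProjectToLeftNullSpace(const Eigen::VectorXd& r,    // stacked residual r_k
                                            const Eigen::MatrixXd& H_x,  // Jacobian w.r.t. the state
                                            const Eigen::MatrixXd& H_f)  // Jacobian w.r.t. the features
{
  // Columns of Q beyond the column space of H_f span its left null space.
  const Eigen::HouseholderQR<Eigen::MatrixXd> qr(H_f);
  const Eigen::MatrixXd Q = qr.householderQ();
  const Eigen::Index null_dim = H_f.rows() - H_f.cols();  // assumes H_f has full column rank
  const Eigen::MatrixXd V = Q.rightCols(null_dim);

  ProjectedMeasurement out;
  out.r_o = V.transpose() * r;    // r_o = V^T r
  out.H_o = V.transpose() * H_x;  // H_o = V^T H_x
  return out;
}
```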
4.4 GNSS Measurement Update
Once the visual-updated state estimation is obtained, it can be further fused with the GNSS output to generate the global drift-free state estimation. In this section, we derive the GNSS measurement update equations.
4.4.1 Measurement Residual
For this case, we assume that the GNSS, IMU, and camera modules have been time-synchronized and that the VIO has been aligned with the ENU coordinate system. After obtaining a GNSS measurement with timestamp tg, which lies between frame a and frame b of the VIO, the IMU pose estimated by the VIO at time tg can be obtained by interpolation (Lee et al., 2020), as indicated in Equations (13)–(15):
$$ \lambda = \frac{t_g - t_a}{t_b - t_a} \quad (13) $$
$$ {}^{I_g}_{G}R = \exp\!\left( \lambda \log\!\left( {}^{I_b}_{G}R \; {}^{I_a}_{G}R^{T} \right) \right) {}^{I_a}_{G}R \quad (14) $$
$$ {}^{G}P_{I_g} = (1 - \lambda)\, {}^{G}P_{I_a} + \lambda\, {}^{G}P_{I_b} \quad (15) $$
where ta and tb are the timestamps of the VIO estimates of frame a and frame b, respectively; ${}^{I_g}_{G}R$, ${}^{I_a}_{G}R$, and ${}^{I_b}_{G}R$ are the IMU attitudes at times tg, ta, and tb, respectively, expressed as rotation matrices from the global coordinate system to the IMU coordinate system; and GPIg, GPIa, and GPIb are the IMU coordinates in the ENU coordinate system at times tg, ta, and tb, respectively.
Therefore, from the interpolated IMU pose and the extrinsic parameters between the IMU and the GNSS antenna, the position GPgnss of the GNSS antenna phase center in the global coordinate system can be obtained as indicated in Equation (16):
$$ {}^{G}P_{gnss} = {}^{G}P_{I_g} + {}^{I_g}_{G}R^{T}\, {}^{I}P_{gnss} \quad (16) $$
where IPgnss is the lever arm between the IMU and the GNSS antenna.
Based on results from Equation (16), we can obtain the measurement residuals of GNSS-VIO fusion positioning as indicated in Equation (17):
$$ r_{gnss} = m_{gnss} - {}^{G}P_{gnss} \quad (17) $$
where mgnss is the GNSS position output in the ENU coordinate system at time tg.
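Putting Equations (13)–(17) together, the following Eigen-based sketch computes the GNSS residual from two neighboring VIO poses, the GNSS timestamp, and the lever arm. The variable names mirror the text, and quaternion slerp is used here as one possible realization of the rotation interpolation in Equation (14).

```cpp
#include <Eigen/Dense>
#include <Eigen/Geometry>

// Interpolate the IMU pose at the GNSS timestamp between two VIO frames, apply the
// IMU-to-antenna lever arm, and compare with the GNSS position in the ENU frame.
struct Pose {
  Eigen::Quaterniond q_EI;  // rotation mapping {I} vectors into the ENU frame {E}
  Eigen::Vector3d    p_I;   // IMU position in {E}
};

Eigen::Vector3d GnssResidual(double t_a, double t_b, double t_g,
                             const Pose& pose_a, const Pose& pose_b,
                             const Eigen::Vector3d& lever_arm_I,  // antenna position in {I}
                             const Eigen::Vector3d& m_gnss)       // GNSS output in ENU at t_g
{
  const double lambda = (t_g - t_a) / (t_b - t_a);                        // Eq. (13)
  const Eigen::Quaterniond q_g = pose_a.q_EI.slerp(lambda, pose_b.q_EI);  // Eq. (14), slerp variant
  const Eigen::Vector3d p_Ig = (1.0 - lambda) * pose_a.p_I + lambda * pose_b.p_I;  // Eq. (15)

  // Eq. (16): antenna phase-center position = interpolated IMU position + rotated lever arm.
  const Eigen::Vector3d p_gnss = p_Ig + q_g.toRotationMatrix() * lever_arm_I;

  return m_gnss - p_gnss;                                                 // Eq. (17)
}
```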
4.4.2 Jacobian Matrix
From the derivation shown in Section 4.2, the system variables related to the measurement residual are the attitudes and positions of frame a and frame b in the VIO sliding window. Thus, the Jacobian matrix of the GNSS-VIO measurement equation consists of the first-order partial derivatives of the GNSS antenna position GPgnss with respect to these states.
First, we derive the partial derivative of GPgnss with respect to the interpolated attitude ${}^{I_g}_{G}R$, as shown in Equation (18):
18
where θg is the perturbation of the interpolated attitude ${}^{I_g}_{G}R$. A first-order Taylor expansion is used for approximation in this derivation.
We can then obtain the derivative of θg with respect to θa, as indicated in Equation (19):
19
where θa, θb, and θg are the perturbations of the attitudes of frame a, frame b, and the interpolated pose, respectively; the intermediate rotation term is the change of θb relative to θa; and Jr and Jl are the approximate Jacobian matrices of the right-multiplied and left-multiplied Baker-Campbell-Hausdorff (BCH) formulae, respectively.
Similarly, the derivative of θg with respect to θb can be generated as shown in Equation (20):
20
where θa, θb, and Jl are as described in Equation (19).
Therefore, according to the chain rule for composite functions, the first-order partial derivative of GPgnss with respect to the attitude of frame a can be deduced as indicated in Equation (21):
21
and the first-order partial derivative of GPgnss with respect to the attitude of frame b is as shown in Equation (22):
22
where Jr and Jl are as described in Equation (19).
Similarly, we can obtain the derivative of GPgnss with respect to GPIa from Equation (23):
23
and the derivative of GPgnss with respect to GPIb as indicated in Equation (24):
24
This completes the derivation of the Jacobian matrix of the GNSS-VIO measurement equation. By interpolating between two VIO states with adjacent timestamps, delayed updates of GNSS measurements are supported. Other sensors, such as a wheel odometer, can be fused in the same fashion; this method therefore provides a general approach for updating asynchronous measurements.
5 RESULTS
In this section, we evaluate the navigation performance of the suite under real-life conditions. In Section 5.1, the time synchronization precision between sensors is evaluated and analyzed. In Section 5.2, a vehicle-mounted test platform is described and test results from urban roads in Beijing and from an underground parking lot are presented.
5.1 Time Synchronization
Using the time synchronization mechanism introduced in Section 3, the suite achieves relative time synchronization between sensors as well as absolute time synchronization between the sensors and GNSS. Given an open positioning environment and a continuously output GNSS 1PPS signal, the difference between the FPGA local clock and GNSS time can easily be maintained at the nanosecond level, and GNSS measurements can be integrated with IMU data and camera images. Therefore, the focus of these experiments was to analyze the relative time synchronization performance of the camera and the IMU in the suite.
Using Kalibr, a widely used VIO calibration toolbox, the extrinsic parameters between the camera and the IMU can be calibrated accurately and the data delay can be obtained (Furgale et al., 2013). To investigate the performance of the time synchronization mechanism and its consistency under different exposure times, we set the IMU trigger rate to 200 Hz and the camera trigger rate to 20 Hz and collected ten datasets covering six fixed exposure times from 1 ms to 6 ms. We then used Kalibr to calibrate the extrinsic parameters and the time delay between the camera and IMU for each dataset and analyzed the statistics to assess the accuracy of the suite's hardware time synchronization.
5.1.1 Camera-IMU Joint Extrinsic Calibration
The extrinsic calibration between the camera and the IMU is a spatial calibration task in which we determine the six-degrees-of-freedom (6DOF) rigid-body transformation between the sensor reference frames: the translation vector [x y z]T between the camera and the IMU is estimated together with the rotation [θx θy θz]T expressed as Lie-algebra angles. The calibration results are shown in Table 1. The STDs of the camera-IMU translation vector in the three coordinate directions were 0.195, 0.188, and 0.068 cm, respectively, and the STDs of the three components of the Lie-algebra rotation angle vector were 0.0009, 0.0010, and 0.0005 rad, respectively. The consistency of the translation and rotation estimates indicates the reliability of the calibration results.
5.1.2 Camera-IMU Time Delay Calibration
The camera-IMU time delay calibration results are shown in the second column of Table 2. The camera-IMU time delays under different exposure durations were roughly equivalent, which is consistent with the conclusions drawn by Nikolic et al. (2014). The statistical analysis revealed that the calibration result was stable and that the STD of the camera-IMU time delay was only 28.14 μs.
Because the camera-IMU time delay is relatively fixed and only fluctuates within a limited range, the effect of the fixed delay can be removed in an off-line calibration process, thereby further improving the accuracy of time synchronization between the sensors (Furgale et al., 2013; Moon et al., 1999). The camera-IMU delays calculated after subtracting the statistical average are presented in the third column of Table 3. The residual time delay error no longer has a noticeable impact on the performance of the suite.
5.1.3 Reprojection Error
The reprojection error is an important indicator in camera calibration and camera-IMU joint calibration because it reflects the degree of lens distortion and the consistency of the camera and IMU motion. The reprojection error distributions of the camera-IMU extrinsic calibration, including 10 randomly selected consecutive experimental results at different exposure times, are shown in Figure 8. A statistical analysis of the 10 consecutive calibration results at different exposure durations is shown in Table 3.
As shown in Table 3, the reprojection error of the calibration is < 0.1 pixels, which means that the camera and the IMU achieve precise time synchronization and spatial calibration. The reprojection error is also independent of the camera exposure time, indicating the consistency and reliability of the time synchronization mechanism.
5.2 Experimental Evaluation
5.2.1 Vehicle-Mounted Test Platform
To verify the performance of the suite, a vehicle-mounted test platform was designed and created, as shown in Figure 9(a). The installation and distribution of the devices, including the camera, IMU, and GNSS antennas, are shown in Figure 9(b). These devices were fixed on the platform to ensure that the relative spatial positions between the sensors remained fixed.
5.2.2 Test Results
The in-vehicle test was carried out using two scenarios: (i) an urban road with a moderate degree of obstruction was used to verify the performance of the suite in the GNSS-VIO mode and (ii) an underground parking lot where the satellite signal was completely blocked was used to verify the performance of the suite in the VIO mode.
5.2.2.1 GNSS-VIO Mode
For this scenario, the vehicle drove on the Fourth Ring Road in urban Beijing. In this setting, the suite worked in the GNSS-VIO mode with a camera exposure frequency of 30 Hz; at the same time, an additional RTK result with a frequency of 1 Hz was generated as a reference. The positioning trajectory of the vehicle is shown in Figure 10; the red dot is the RTK output and the green line is the GNSS-VIO output.
Figure 10: The top panel shows a real-time camera image; the bottom panel shows the vehicle positioning trajectory, in which the red dots represent the 1-Hz RTK positioning results and the green line represents the 30-Hz GNSS-VIO positioning result.
Using the RTK output as a reference, the GNSS and GNSS-VIO positioning errors can be calculated as shown in Table 4. Because the RTK and GNSS signals were sometimes interrupted during the test by tall buildings, trees, viaducts, and other obstructions, the statistics were computed over the epochs with RTK outputs. As shown in Figure 10, integration with the VIO allowed the suite to maintain continuous positioning when the vehicle passed under an overpass where the GNSS signal was completely blocked. Likewise, as shown in Table 4, the 3D position error of the vehicle was reduced from 8.455 m to 5.751 m (RMS), a clear improvement in positioning accuracy.
5.2.2.2 VIO Mode
We chose a large two-story underground parking lot to verify the VIO performance of the suite. The suite was operating in the RTK-VIO positioning mode before the vehicle entered the underground parking lot. The GNSS satellite signal was completely blocked once the vehicle entered, and the suite switched to the VIO mode. Once the vehicle exited the parking lot, the VIO output was compared with the first fixed RTK output to estimate the accumulated VIO error. Because the accuracy of a stable fixed RTK solution is typically several centimeters, it can be used for an initial evaluation of the VIO cumulative error, which was several meters in magnitude. The vehicle trajectory is shown in Figure 11, in which the locations at which the vehicle entered and exited the parking lot are marked. The results from the suite are compared with those generated by VINS-FUSION (Qin et al., 2018; Qin & Shen, 2018; Qin et al., 2019), which loosely combines the GNSS and VIO outputs. Our positioning trajectory was clearly more accurate than that of VINS-FUSION, which required a significant correction after exiting the parking lot.
Figure 11: The vehicle was driven from the street level into the parking lot and then back to the street level. The blue dots are the RTK reference, and the yellow and red lines are the positioning results of VINS-FUSION and our solution, respectively.
The vehicle drove ~3.1 km over ~360 s in the two-story parking lot before returning to street level. The evaluation process was repeated 100 times to obtain an average 3D positioning cumulative error. As shown in Table 5, the 3D positioning cumulative error was only 1.58 ‰, indicating that the integrated suite also performs excellently in the VIO mode. By contrast, the 3D positioning cumulative error of VINS-FUSION was ~6.1 ‰.
6 CONCLUSION
In this manuscript, we present the development of an embedded multi-sensor fusion suite that includes a multi-frequency, multi-constellation GNSS module, a consumer-grade IMU, and a monocular grayscale camera. The suite uses an NVIDIA Jetson Xavier NX as the host and computing board together with an FPGA-based trigger and control module to achieve stable and high-precision time synchronization between sensors. The suite can serve as a platform for rapid algorithm development and verification and can also provide a reference for subsequent hardware upgrades. The tightly coupled VIO positioning and the loosely coupled GNSS-VIO positioning are based on an MSCKF framework.
The results from our evaluation revealed that the hardware time synchronization accuracy of the sensors was better than 30 μs (STD) and that the camera-IMU reprojection error was < 0.1 pixels (STD). A vehicle-mounted test was carried out in an urban area of Beijing. In the GNSS-VIO mode, compared with standalone GNSS positioning, the 3D positioning error was reduced from 8.455 m to 5.751 m (RMS) and the positioning continuity was also greatly improved. In the VIO mode, the 3D error drift was only 1.58 ‰, further attesting to the suite's excellent performance. Our future studies will use a high-precision integrated navigation system as a benchmark to conduct a more accurate and comprehensive performance evaluation of the suite. We also plan to add support for multiple cameras and to explore tightly coupling the GNSS and VIO to further improve system performance.
HOW TO CITE THIS ARTICLE
Liu, C., Xiong, S., Geng, Y., Cheng, S., Hu, F., Shao, B., Li, F., & Zhang, J. (2023). An embedded high-precision GNSS-visual-inertial multi-sensor fusion suite. NAVIGATION, 70(4). https://doi.org/10.33012/navi.607
FUNDING INFORMATION
National Natural Science Foundation of China, Grant Number 42074044; "Young Talent Lifting Project" of the China Association for Science and Technology, Grant Number 2019QNRC001.
ACKNOWLEDGMENTS
We extend our sincere gratitude to Ms. Jiang Nan for her help in grammar and proofreading.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.