Abstract
Accurate positioning in urban canyons remains a challenging problem. To facilitate the research and development of reliable and precise positioning methods using multiple sensors in urban canyons, we built a multisensory dataset, UrbanNav, collected in diverse, challenging urban scenarios in Hong Kong. The dataset provides multi-sensor data, including data from multi-frequency global navigation satellite system (GNSS) receivers, an inertial measurement unit (IMU), multiple light detection and ranging (lidar) units, and cameras. The ground truth of the positioning (with centimeter-level accuracy) is postprocessed by commercial software from NovAtel using an integrated GNSS real-time kinematic and fiber-optic gyroscope inertial system. In this paper, the sensor systems, spatial and temporal calibration, data formats, and scenario descriptions are presented in detail. In addition, the benchmark performance of several existing positioning methods is provided as a baseline. Based on the evaluations, we conclude that GNSS can provide satisfactory results in a middle-class urban canyon if an appropriate receiver and algorithms are applied. Both visual and lidar odometry are satisfactory in deep urban canyons, whereas tunnels remain a major challenge. Multisensory integration with the aid of an IMU is a promising solution for achieving seamless positioning in cities. The dataset in its entirety can be found on GitHub at https://github.com/IPNL-POLYU/UrbanNavDataset.
1 INTRODUCTION
Accurate, globally referenced, cost-effective positioning service is a key component for the full autonomy of unmanned autonomous systems. Multi-sensor integration is believed to be a promising solution to achieve this goal based on typical sensors, including global navigation satellite systems (GNSSs), inertial measurement units (IMUs), light detection and ranging (lidar) units, and cameras. Satisfactory performance can be achieved in constrained areas (Wan et al., 2018) or open areas (Levinson et al., 2011) with good sky visibility for GNSS positioning and abundant environmental features for lidar units and cameras. Unfortunately, the performance of existing positioning technologies is significantly degraded in the urban scenarios of megacities, such as Hong Kong, Tokyo, and New York City, due to complex building structures, unstable illumination conditions, and the large number of dynamic objects.
Poor GNSS positioning in urban canyons
GNSS positioning is significantly degraded in urban canyons because of signal reflection and blockage, leading to the notorious multipath and non-line-of-sight (NLOS) phenomena (Groves et al., 2013; Hsu, 2018). Numerous methods have been investigated to mitigate the impacts of multipath and NLOS reception, such as three-dimensional (3D) mapping-aided (3DMA) GNSS (Zhong & Groves, 2022), camera-aided GNSS NLOS detection (Kato et al., 2016; Meguro et al., 2009; Wen, Bai, et al., 2019), and 3D lidar-aided GNSS NLOS detection (Wen, Zhang et al., 2018) or correction (Wen, Zhang, et al., 2019). Unfortunately, the GNSS positioning achieved thus far is still far from sufficient for fully autonomous systems, which require decimeter-level accuracy and stringent integrity. Further work is needed to effectively solve the GNSS positioning problem in dense urban canyons.
Unreliable odometry estimation in highly dynamic scenarios
Visual/inertial integrated systems (VINSs) (Qin et al., 2018) can provide low-cost, locally accurate odometry estimation in environments with sufficient features using camera and IMU measurements. VINSs are characterized by advantages in size, power consumption, weight, and availability. Many state-of-the-art VINS pipelines with superior performance have been developed over the past two decades, such as filtering-based methods including the multi-state constraint Kalman filter (MSCKF) (Mourikis & Roumeliotis, 2007), robust visual inertial odometry (ROVIO) (Bloesch et al., 2015), and OpenVINS (Geneva et al., 2020). Another research stream focuses on optimization-based VINS pipelines, including open keyframe-based visual-inertial SLAM (OKVIS) (Leutenegger et al., 2015), VINS-Mono (Qin et al., 2018), and oriented FAST and rotated BRIEF-based simultaneous localization and mapping 3 (ORB-SLAM3) (Campos et al., 2021). To further study the performance of VINSs in challenging outdoor urban canyons, our previous work evaluated and analyzed VINS-Mono (Qin et al., 2018) based on datasets collected in urban canyons of Hong Kong. The results (X. Bai et al., 2020) showed that the VINS accuracy decreased significantly in the evaluated urban canyons, with the accumulated error reaching approximately 34.21 m over a driving distance of 2.1 km. The large errors are primarily attributed to outliers caused by dynamic objects and motion blur. In contrast, an active sensor such as a 3D lidar unit can provide distance measurements of the surrounding environment that are invariant to illumination. Therefore, lidar odometry (Shan & Englot, 2018) provides a more accurate and robust odometry estimation than VINSs. Unfortunately, lidar odometry shares the same drawback as VINSs: sensitivity to dynamic objects. According to our previous evaluations (Wen, Hsu et al., 2018), lidar odometry can drift by more than 2 m over a 2-km test in numerous urban scenarios.
Given these challenges raised by urban scenarios for existing key navigation techniques, it is desirable to provide an integrated multi-sensor dataset for both academia and industry to further improve existing navigation algorithms. Owing to the rapid development of the autonomous driving industry, several datasets have been open-sourced, such as the ApolloScape dataset from Baidu (Huang et al., 2018) and the Waymo open dataset from Google’s autonomous driving research group (Sun et al., 2020). Unfortunately, these datasets share several drawbacks: (1) They focus on object detection and perception algorithm development and are therefore not well suited for navigation algorithm development. For example, raw GNSS measurements (pseudorange, carrier phase, and Doppler frequency) are not provided, making it difficult for researchers to improve urban GNSS positioning. (2) The data are primarily collected in suburban or open areas, which do not provide sufficient challenges for positioning and navigation algorithms. Realizing the strong need for accurate and cost-effective urban positioning, in 2019, a joint working group was formed under the joint efforts of the International Association of Geodesy (IAG) and The Institute of Navigation (ION). This working group operates under IAG Sub-Commission 4.1: Emerging Positioning Technologies and GNSS Augmentations. After consolidating suggestions and comments from international navigation researchers, the working group established the following objectives:
Open-sourcing positioning sensor data, including GNSS, IMU, lidar, and camera data collected in Asian urban canyons;
Raising awareness of the urgent navigation requirement for highly urbanized areas, especially in Asian–Pacific regions;
Providing an integrated online platform for data sharing to facilitate the development of navigation solutions for the research community; and
Benchmarking positioning algorithms based on open-source data.
Based on these objectives, we built a multisensory dataset, UrbanNav, collected in diverse challenging urban scenarios in Hong Kong and Tokyo (Hsu et al., 2021). In this paper, we focus on the Hong Kong dataset (see Figure 1). The key contributions of this paper are as follows:
The UrbanNav dataset in this paper introduces diversity in both sensor sources and scenarios. Specifically, UrbanNav includes raw sensor measurements from GNSS, lidar units, IMUs, and cameras, together with reliable ground truth of positioning. Moreover, the dataset includes scenarios with different degrees of challenges for benchmarking existing navigation algorithms, such as middle-class, deep, and harsh urban scenarios.
UrbanNav provides raw GNSS measurements generated by different grades of receivers (smartphone, automobile, and geodetic levels of GNSS receivers). Moreover, multiple 3D lidar sensors are provided for research with different applications, ranging from robotics to large-scale mapping.
This paper benchmarks several representative localization algorithms, including GNSS positioning with the real-time kinematic library (RTKLIB) (Takasu & Yasuda, 2009), lidar odometry (Zhang & Singh, 2017), lidar inertial odometry (Shan et al., 2020), and VINS (Qin et al., 2018), using the UrbanNav dataset to provide a baseline performance for the community.
The remainder of this paper is structured as follows. Related works are reviewed in Section 2. The details of the data collection platform are introduced in Section 3. Details regarding the dataset and its evaluation are discussed in Section 4. Lessons learned during dataset development are presented in Section 5, and conclusions and directions for future work are presented in Section 6. Baseline positioning performance evaluations of lidar odometry, lidar inertial odometry, VINS, and GNSS are provided in the appendix of this paper.
2 RELATED WORK
Over the past decades, numerous datasets have been released by research groups from both academia and industry to advance navigation research. These datasets have enabled researchers worldwide to engage in navigation research without system or data limitations. Table 1 shows a detailed comparison of the existing datasets related to navigation research. The New College dataset (Smith et al., 2009), released in 2009, is based on an IMU, two-dimensional (2D) lidar, camera, and ground truth positioning and was collected in campus scenarios. This dataset has been widely used in the robotics field for 2D lidar simultaneous localization and mapping (SLAM) research. However, the sensor sources are limited and are insufficient for urban navigation research. In 2010, the urban challenge dataset of the Defense Advanced Research Projects Agency (DARPA) (Huang et al., 2010) was released, based on an autonomous driving competition. The dataset was collected in urban and forest scenarios, obtained from lidar units and IMUs together with ground truth from GNSS real-time kinematic (RTK) positioning. However, the DARPA urban challenge dataset shares the same drawback as the New College dataset, in that the sensor sources are highly limited. In 2011, 3D lidar units became commercially available, although they remained expensive for the typical research group. During this time, the Ford campus dataset (Pandey et al., 2011) was released, with increased diversity in sensors. However, the raw GNSS measurements were not available, and the diversity of scenarios was very limited. As a result, this dataset is not widely used.
Due to the catalyzing effect of autonomous driving technologies, the KITTI dataset (Geiger et al., 2012), one of the most well-known datasets within the autonomous driving and robotics community, was released in 2012. Specifically, this dataset includes images from multiple cameras, 3D point clouds captured by expensive 64-channel 3D lidar sensors, global positioning data from a geodetic GNSS receiver (but no raw GNSS measurements), and raw measurements from an IMU sensor. Moreover, the KITTI dataset launched several benchmark tools, enabling the community to verify and compare algorithms on lidar odometry, visual odometry, environment understanding, object detection, and scenario segmentation. Meanwhile, the KITTI dataset includes numerous scenarios ranging from open to urban areas. However, the dataset is less applicable to the satellite navigation research community, as raw GNSS measurements are unavailable. Moreover, the scenarios are not sufficiently challenging for GNSS positioning, as the typical GNSS positioning error in the KITTI dataset is within 4 m. In 2016, the north campus long-term (NCLT) dataset (Carlevaris-Bianco et al., 2016), collected in campus scenarios, was released. Importantly, an automobile-level GNSS solution was provided, but raw GNSS measurements were still not available. In 2017, the Oxford dataset (Maddern et al., 2017) was released, focusing on autonomous driving vehicles in urban scenarios. Similar datasets, such as the ApolloScape dataset (Huang et al., 2018) from Baidu, Waymo dataset (Sun et al., 2020) from Google, nuScenes dataset (Caesar et al., 2020), A2D2 dataset (Geyer et al., 2020), Ford multi-autonomous vehicle (AV) seasonal dataset (Agarwal et al., 2020), and European Union (EU) long-term dataset (Yan et al., 2020), were also released. These datasets were mainly developed for autonomous driving vehicles, focusing on SLAM, object detection, and scene segmentation. As a result, these datasets do not provide raw GNSS measurements for navigation research. In 2020, the UrbanLoco dataset was released (Wen et al., 2020), including data for downtown San Francisco and Hong Kong. Notably, the dataset provides raw GNSS measurements collected by a u-blox M8T receiver (automobile-level). Currently, researchers in the fields of robotics and intelligent transportation systems widely use this dataset. However, the UrbanLoco dataset has several key drawbacks. First, only single-frequency global positioning system (GPS)/BeiDou measurements were collected, and the diversity of GNSS receivers is limited, as only an automobile-level GNSS receiver was used. Second, the trajectories of the released dataset are short because the dataset was originally designed for SLAM research with multi-sensor fusion. In response to this challenge and the urgent need for high-accuracy urban GNSS positioning, the Google Smartphone Challenge dataset (Zangenehnejad & Gao, 2021) was released in 2021 at the ION GNSS+ conference. Specifically, this dataset provides raw GNSS measurements collected by a smartphone together with ground truth positioning for urban scenarios in California. This dataset represented a substantial step toward urban GNSS positioning. Unfortunately, lidar and camera measurements are not available from the Smartphone Challenge dataset, and thus, this dataset does not satisfy the requirements of the autonomous driving community.
In 2022, the multi-modal and multi-scenario dataset for ground robots (M2DGR) (Yin et al., 2022) was released; this dataset was collected in diverse campus scenarios, including both indoor and outdoor environments, using a ground robot. However, this dataset focuses on ground robots and is therefore not suitable for autonomous driving research.
In short, autonomous driving research has led to the release of many multi-sensor datasets with GNSS/IMU/lidar/camera measurements. However, these datasets were not specifically designed for navigation research, as they lack raw GNSS measurements as well as diversity in GNSS receiver grades and degrees of urbanization. Therefore, it is highly desirable to provide an integrated, navigation-oriented multi-sensor dataset collected in urban scenarios for the research community, which is the key objective of this paper. Both the Hong Kong and Tokyo UrbanNav datasets were first presented in the commercial track of ION GNSS+ 2021 (Hsu et al., 2021). We introduce the technical details in this paper to help fellow researchers better understand the dataset collected in Hong Kong.
3 OVERVIEW OF THE DATA COLLECTION PLATFORM
3.1 System Configuration
3.1.1 Platform and Hardware Connection
Data collection platform and hardware configuration: The data collection platform is shown in Figure 2. All of the sensors are integrated into a compact sensor kit (see Figure 2(b)) installed on the roof of an automobile (Honda Jazz/Fit). Data from the GNSS receivers, cameras, lidar units, and IMU are collected by a desktop computer (i7 processor, 512-GB solid-state drive for data logging) running Linux with the Robot Operating System (ROS) (Quigley et al., 2009). The framework of the hardware connection for data collection is shown in Figure 3. The details of the sensors used in this dataset are listed in Table 2.
GNSS receivers for raw GNSS measurement collection: In this dataset, we provide raw GNSS measurements (e.g., pseudorange, carrier-phase, and Doppler frequency measurements) from different levels of GNSS receivers. We also provide a tutorial on how to download the ephemeris data from continuously operating reference stations in Hong Kong.
Smartphone-level GNSS receiver: The Xiaomi 8 smartphone is employed to collect the multi-constellation and multi-frequency raw GNSS measurements, including measurements from GPS (L1, L5), GLONASS (G1), GALILEO (E1, E5a), BeiDou (B1), and quasi-zenith satellite system (QZSS) (L1, L5) at 1 Hz.
Automobile-level GNSS receiver: The u-blox M8T and u-blox F9P are employed to collect raw GNSS measurements at 1 Hz. First, a u-blox M8T is employed to collect data from GPS (L1), GLONASS (G1), GALILEO (E1), and BeiDou (B1) at 1 Hz. Second, a u-blox F9P is employed to collect data from GPS (L1, L2), GLONASS (G1, G2), GALILEO (E1, E5b), BeiDou (B1, B2), and QZSS (L1, L2). Third, another u-blox F9P is connected to a geodetic antenna to evaluate the performance of the u-blox F9P receiver with a high-end GNSS antenna.
Geodetic-level GNSS receiver: The NovAtel FlexPak6 receiver is employed to collect raw GNSS measurements, including data from GPS (L1, L2), GLONASS (L1, L2), GALILEO (E1, E5b), and BeiDou (B1I, B2I), at 1 Hz.
Lidar for 3D point cloud collection: In this dataset, three lidar sensors are employed to collect 3D point clouds from the surrounding environments. First, a 32-channel 3D lidar unit from Velodyne (Vel’as et al., 2014) is installed on top of the data collection vehicle to horizontally capture the surroundings. Second, a slant 16-channel 3D lidar unit from Velodyne is installed on the right side of the sensor kit. Third, a low-cost slant 16-channel 3D lidar device from Leishen Technologies (Z. Bai et al., 2020) is installed on the left side of the sensor kit. The lidar units installed on both sides are tilted outward to capture the high-rise building structures for mapping and SLAM applications. All of the lidar data are collected at 10 Hz.
Camera for image collection: In this dataset, a ZED2 stereo camera (Varma et al., 2018) is employed to collect images at a frequency of 27 Hz. The raw image data can be used for visual odometry-based positioning research.
IMU for acceleration and gyroscope data collection: In this dataset, an Xsens MTi-30 IMU is employed to collect raw acceleration and angular velocity measurements at a frequency of 400 Hz.
Ground truth of positioning in GNSS-available scenarios: In this dataset, the ground truth of positioning is provided by a NovAtel SPAN-CPT (Kennedy et al., 2006), a GNSS (GPS, GLONASS, and BeiDou) RTK/inertial navigation system (INS) (fiber-optic gyroscope [FOG]) integrated navigation system. The gyro bias in-run stability of the FOG is 1° per hour, and its angular random walk is 0.067° per √hour. The baseline between the rover (SPAN-CPT) and the GNSS base station is within 7 km. According to the specifications of the NovAtel SPAN-CPT, centimeter-level accuracy can be obtained when the RTK correction is available with a correct fixed solution. However, this accuracy is not guaranteed in urban canyons with interference from building reflections. Therefore, we collect raw measurements from the SPAN-CPT and postprocess the data using state-of-the-art Inertial Explorer software from NovAtel (Kennedy et al., 2006), which maximizes the accuracy of the trajectory by processing forward and backward in time, performing a backward smoothing step, and combining the results. Inertial Explorer can significantly improve the overall accuracy of the ground truth of positioning.
Ground truth of positioning in GNSS-denied scenarios: One of our datasets involves scenarios in a tunnel, where GNSS positioning is not available. As a result, the NovAtel SPAN-CPT cannot collect reliable GNSS measurements for the tunnel dataset. Fortunately, the NovAtel SPAN-CPT includes a high-accuracy INS, which can provide low-drift dead reckoning. To this end, we initialize the NovAtel SPAN-CPT in an open area in which the GNSS-RTK can easily obtain a fixed solution. As a result, the INS bias can be effectively calibrated. Then, the data collection vehicle enters and exits the tunnel within 7 min. Next, the fixed GNSS-RTK solution can be obtained again by the NovAtel SPAN-CPT. In short, high-accuracy positioning is available immediately before the vehicle enters and immediately after it exits the tunnel. Finally, the Inertial Explorer software is used to postprocess the collected raw data from the NovAtel SPAN-CPT to obtain reliable ground truth positioning. We carefully check the ground truth positioning of the tunnel data from the NovAtel SPAN-CPT against the geodetic map from the Survey and Mapping Office (SMO) of Hong Kong. At least meter-level accuracy can be guaranteed for the challenging tunnel dataset.
GNSS reference station data: The SMO of the Lands Department of the Hong Kong government provides a GNSS reference station service, named the Hong Kong Satellite Positioning Reference Station Network (SatRef). SatRef consists of 16 reference stations and 2 integrity monitoring stations evenly distributed throughout Hong Kong. SatRef provides raw data for postprocessing in the receiver independent exchange (RINEX) format via web or file transfer protocol (FTP) download; details can be found at https://www.geodetic.gov.hk/en/satref/satref.htm. This resource provides both RINEX 2 and 3 versions, with a file length of 1 h or 24 h. For RINEX 3.02, the 1-h files are provided at a data interval of 1 s or 5 s; the 24-h files are provided only at a data interval of 30 s.
3.1.2 Time Synchronization of Different Sensors
Different sensors have different time systems, and it is important to effectively synchronize the diverse measurements. To achieve this, we use Chrony (Soares et al., 2020) with the National Marine Electronics Association (NMEA) messages and pulse-per-second (PPS) signal (Niu et al., 2015) from a u-blox EVK-M8T GNSS receiver to synchronize the ROS time of the desktop computer with GPS time. Meanwhile, an NMEA driver is also used to record the time difference between ROS time and GPS time. In this way, the ROS time (desktop computer) is synchronized with standard GPS time. The timestamps for the sensors are therefore recorded as follows (a minimal conversion sketch is given after this list):
Timestamp for the camera sensor: The images from the camera are collected using the driver in ROS provided by the manufacturer. Therefore, the images are recorded with ROS timestamps upon arrival of the image.
Timestamp for lidar sensors: The 3D point clouds from the lidar units are collected using the driver in ROS. Similar to the images, the data are recorded with ROS timestamps upon arrival of the 3D point clouds.
Timestamp for the IMU sensor: Raw acceleration and angular velocity measurements from the IMU are collected using the default ROS driver. Similar to the images, the data are recorded with ROS timestamps upon arrival of the raw IMU data.
Timestamp for GNSS smartphone, u-blox (automobile-level), and NovAtel (geodetic-level) receivers: Raw GNSS measurements from the smartphone are collected using the Geo++ RINEX Logger and GNSSLogger applications (Wanninger & Heßelbarth, 2020). Raw GNSS measurements from the u-blox and NovAtel receivers are collected in Windows using a laptop through the u-center and NovAtel Connect software, respectively, provided by the manufacturers. All of these GNSS measurements are stamped with GPS time.
Timestamp for the ground truth system: The ground truth positioning is provided by the SPAN-CPT system. Raw data (including raw GNSS and IMU data) are collected using the NovAtel Connect software provided by NovAtel under the Windows system using a laptop. Therefore, the data are timestamped with the GPS time.
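Because the rosbag sensor messages carry ROS timestamps while the GNSS logs carry GPS time, users must apply the recorded ROS-to-GPS offset when fusing the two. The following Python sketch illustrates one way to do this; it assumes the NMEA driver's offset messages are available as a sensor_msgs/TimeReference topic, and the topic name /time_reference and bag file name are illustrative rather than guaranteed names in the dataset:

```python
# Minimal sketch: convert ROS-stamped sensor data to GPS time using the
# ROS-GPS offset recorded by the NMEA driver (topic/file names illustrative).
import rosbag

bag = rosbag.Bag("urbannav.bag")

# 1) Estimate the (nearly constant) ROS-to-GPS offset from TimeReference messages.
offsets = []
for _, msg, _ in bag.read_messages(topics=["/time_reference"]):
    # time_ref holds the GPS-derived time; header.stamp holds the local ROS time
    offsets.append(msg.time_ref.to_sec() - msg.header.stamp.to_sec())
ros_to_gps = sum(offsets) / len(offsets)

# 2) Apply the offset to any ROS-stamped measurement (here: the IMU stream).
for _, imu, _ in bag.read_messages(topics=["/imu/data"]):
    gps_time = imu.header.stamp.to_sec() + ros_to_gps
    # ... associate gps_time with the corresponding RINEX epoch
bag.close()
```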
3.2 Sensor Calibration
The extrinsic parameters between the different sensors are important for sensor integration. Moreover, the intrinsic parameters of the camera, such as the projection and distortion parameters, are also required. This section presents the process for calibrating the associated intrinsic and extrinsic parameters. Computer-aided design software was used to design the 3D model of the sensor rack, and the rack was machined by a precision manufacturer. Measurements from the drawing are used as initial guesses for the calibration methods described below. In this dataset, we align all of the sensor measurements to the IMU frame, as the IMU is not affected by environmental changes. An illustration of the coordinate system of the platform is shown in Figure 4.
3.2.1 IMU Calibration
Noise parameters are important for the integration of the IMU with additional sensors. In this dataset, the noise parameters of both the gyroscopes and the accelerometers are calibrated using the Allan variance (El-Sheimy et al., 2007). To achieve this, static IMU data are collected for 2 h and then processed via Allan variance analysis. The details of the IMU noise parameters can be found on the UrbanNav GitHub page.
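For readers who wish to reproduce this calibration from the static log, the following numpy sketch computes the overlapping Allan deviation of one gyro axis; the function name and read-out comments are ours, following the standard definition in El-Sheimy et al. (2007):

```python
import numpy as np

def allan_deviation(omega, fs, taus):
    """Overlapping Allan deviation of a gyro rate series omega (rad/s) sampled at fs (Hz)."""
    theta = np.cumsum(omega) / fs                      # integrate rate to angle
    result = []
    for tau in taus:
        m = max(1, int(round(tau * fs)))               # cluster length in samples
        if 2 * m >= len(theta):
            break
        d = theta[2 * m:] - 2.0 * theta[m:-m] + theta[:-2 * m]
        avar = np.sum(d ** 2) / (2.0 * (m / fs) ** 2 * len(d))
        result.append((m / fs, np.sqrt(avar)))
    return result

# The angle random walk is read from the Allan deviation curve at tau = 1 s
# (slope -1/2 region), and the bias instability from the flat minimum of the curve.
```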
3.2.2 Visual–Inertial Calibration
The accuracy of the intrinsic camera parameters is critical to the performance of visual-based positioning and of integration with other sensors. In this dataset, Kalibr (Furgale et al., 2013), a toolbox that can solve intrinsic camera calibration and visual–inertial calibration problems, was employed to calibrate the intrinsic camera parameters. Kalibr also enables the calibration of extrinsic parameters between the IMU and the camera. To achieve this, an AprilGrid board, as recommended by Kalibr, is used to collect raw image data together with IMU data in a constrained area with a clear background. All of these data are recorded in a rosbag (Quigley et al., 2009). Finally, the recorded data are processed by Kalibr to estimate the following parameters: 1) the intrinsic camera parameters, 2) the extrinsic parameters between the left camera of the ZED2 and the IMU, and 3) the extrinsic parameters between the right camera of the ZED2 and the IMU. All of these parameters can be found on the UrbanNav GitHub page. To assess the quality of the camera–IMU extrinsic calibration, we note that the mean reprojection error is 0.57 pixels for the left camera–IMU pair and 0.68 pixels for the right camera–IMU pair.
3.2.3 Lidar Calibration
For the extrinsic parameters between the top lidar unit and the IMU sensor, this dataset uses the continuous-time batch optimization-based method (Lv et al., 2020), with the extrinsic parameters from the 3D drawing as an initial guess. Because the top lidar is a 32-channel unit that provides dense 3D point clouds, its lidar odometry can be locally accurate over a short period, so the extrinsic parameters between the top lidar unit and the IMU sensor can be reliably calibrated with this method (Lv et al., 2020). Then, we calibrate the extrinsic parameters of the left and right lidar units relative to the top lidar unit using Autoware (Kato et al., 2018) based on their 3D point clouds. Finally, the extrinsic parameters between the three lidar sensors and the IMU are obtained. The details of these parameters can be found on the UrbanNav GitHub page. Moreover, we confirm the quality of the estimated extrinsic lidar parameters by checking the degree of overlap between the 3D point clouds from different lidar units scanning the same planar wall.
3.2.4 GNSS Receiver Lever-Arm Calibration
Because the calibration of the extrinsic parameters (also called the lever arm) between the GNSS antenna (u-blox receiver) and the IMU involves only translation, we mechanically calibrate these parameters based on the 3D drawing with an accuracy of better than 5 cm. The details of these parameters can be found on the UrbanNav GitHub page.
4 DATASET DESCRIPTION
4.1 Scenarios
The degree of urbanization of a given scenario is primarily determined by the density and height of the buildings on site. This paper uses the mean masking elevation angle (MMEA), which is introduced in Section 3 of our previous work (Wen et al., 2020), to describe the degree of urbanization. Specifically, a tall building will lead to a higher masking elevation angle. Based on the urbanization level evaluated from the 3D building models, we selected four representative scenarios that cover typical environments for urban navigation, as shown in Figure 5: (a) middle-class urban, (b) deep urban, (c) harsh urban, and (d) tunnel. The key features of these datasets are shown in Table 3. A video demonstration of the sky-view images based on a virtual fisheye camera simulated from Google Earth Studio (Suzuki & Kubo, 2015) is available here.
Middle-class urban: In this scenario, the vehicle starts data collection in a wide street environment for better initial positioning performance. Then, the vehicle enters an environment with middle-class-height buildings (50 m) on both sides of a normal-width street (two-lane road). Next, the vehicle enters the main road, where the west side has a line of uniformly arranged buildings and the east side has a relatively clear sky-view. Finally, the vehicle returns to a wide street environment and achieves a loop closure. Thus, this dataset represents a typical urban scenario found worldwide, with a limited sky-view and numerous dynamic objects (cars, pedestrians, etc.).
Deep urban: In this scenario, the vehicle starts in an open-sky environment near the seashore to ensure ground truth quality. Then, the vehicle travels along a wide street (three-lane road) and enters a narrow street with two lanes closely adjacent to buildings. Here, nearly 70% of the sky-view is blocked by buildings, resulting in poor GNSS performance. Next, the vehicle enters a residential area with medium-height buildings (50 m) and a few tall buildings. Finally, the vehicle returns to an open-sky environment for loop closure. These data represent a deep urban scenario in which the GNSS cannot obtain satisfactory accuracy due to the poor sky-view. A large number of dynamic objects are present during data collection.
Harsh urban: For this dataset, the vehicle starts data collection in a narrow street environment, with a road width of approximately 10 m and tall buildings (ranging from 30 to 90 m) lined up closely on either side. After two closed loops, the vehicle enters another narrow street environment with a footbridge directly over the vehicle, blocking most of the sky-view. Finally, the vehicle travels to a wider street surrounded by tall buildings. This scenario has an extremely limited sky-view, significantly reducing GNSS accuracy and availability. The scenario also includes busy traffic and crowds, which degrade the performance of camera- or lidar-based navigation. This scenario represents the most challenging environment for any navigation aid.
Tunnel: In this scenario, the vehicle starts in an open-sky environment and then enters the tunnel. The tunnel is 1.7 km long and four lanes wide. Next, the vehicle leaves the tunnel and ends the data collection in an environment with buildings on one side of the lane. This scenario represents a typical tunnel environment for autonomous driving, where the GNSS is completely blocked and there are insufficient features for localization by other sensors.
4.2 Data Format
GNSS data: In this dataset, raw GNSS measurements are archived in the RINEX 3.02 format. To benefit robotics research, we released an open-source tool, GraphGNSSLib (Wen & Hsu, 2021), to decode the RINEX 3.02 data into the typical ROS topic format (Quigley et al., 2009). Robotics researchers can thus easily access raw GNSS data for further study and algorithm development. We also provide examples for decoding the RINEX 3.02 data using GraphGNSSLib and for performing single-point positioning based on pseudorange measurements. More details can be found at this GitHub link.
Lidar data: In this dataset, the lidar data are recorded in the rosbag (Quigley et al., 2009) with a topic (/velodyne_points) for the top 32-channel lidar, a topic (/left/lslidar_point_cloud) for the left 16-channel lidar, and a topic (/right/velodyne_points) for the right 16-channel lidar. An illustrative video of using lidar data for lidar inertial odometry (Shan et al., 2020) can be found at this link.
Camera data: In this dataset, the camera data are recorded in the rosbag (Quigley et al., 2009) with a topic (/zed2/camera/left/image_raw) for the left camera and a topic (/zed2/camera/right/image_raw) for the right camera. An illustrative video of using the image data for a visual–inertial navigation system (Qin et al., 2018) can be found at this link.
IMU data: In this dataset, the IMU data are recorded in the rosbag (Quigley et al., 2009) with one topic (/imu/data).
Ground truth data: In this dataset, the ground truth positioning data are stacked in the rosbag (Quigley et al., 2009) with one topic (/novatel_data/inspvax) by postprocessing.
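All of the rosbag topics listed above can be read with the standard ROS 1 Python API. The following short sketch iterates over a few of the topics named in this section (the bag file name is illustrative):

```python
import rosbag

bag = rosbag.Bag("UrbanNav-HK_deep_urban.bag")   # file name is illustrative

# Iterate over point clouds, images, IMU samples, and ground truth messages
topics = ["/velodyne_points",
          "/zed2/camera/left/image_raw",
          "/imu/data",
          "/novatel_data/inspvax"]
for topic, msg, t in bag.read_messages(topics=topics):
    if topic == "/imu/data":
        print(t.to_sec(), msg.angular_velocity.z)   # gyro z-axis, rad/s
bag.close()
```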
Skymask data: Recently, the 3DMA GNSS positioning method (Groves, 2016) has attracted attention because of its high potential to improve urban GNSS positioning performance. The skymask is the key component generated from the 3D building model, providing a skyplot of the building boundaries at a given location; it is useful for satellite visibility prediction. It stores the highest elevation angle of the surrounding building boundaries, with 0.1° resolution, for each azimuth angle with 1° resolution. The skymask is usually generated offline and stored in a formatted CSV file to reduce the computational load (Ng et al., 2020). Each CSV file stores the skymasks of all candidate locations covering the whole area of a single dataset. Each row in the file represents a single location, consisting of WGS84 coordinates and the skymask. There are 364 entries in each line, separated by commas. The first three entries are the WGS84 coordinates, with latitude and longitude in degrees and altitude in meters. The remaining 361 entries are the skymask, i.e., the highest elevation angle of the building boundaries at azimuth angles from 0° to 360°. The skymask at 0° and 360° has the same value for convenient indexing by azimuth angle. Skymask data (see example in Figure 6) are provided for the middle-class, deep, and harsh urban datasets. More details can be found through this link.
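As an illustration of the layout described above, the following Python sketch parses a skymask CSV, predicts whether a satellite at a given azimuth/elevation would be blocked, and computes a simple mean masking elevation angle for one location. The file name is illustrative, and the MMEA here is simply taken as the mean of the 361 masking elevations, following its name in Section 4.1:

```python
import csv
import numpy as np

def load_skymask(csv_path):
    """Each row: lat(deg), lon(deg), alt(m), then 361 masking elevations for azimuths 0..360 deg."""
    entries = []
    with open(csv_path) as f:
        for row in csv.reader(f):
            lat, lon, alt = map(float, row[:3])
            elev = np.array(row[3:364], dtype=float)   # 361 values
            entries.append(((lat, lon, alt), elev))
    return entries

def is_sat_blocked(skymask_elev, sat_azimuth_deg, sat_elev_deg):
    """Predict NLOS: the satellite lies below the building boundary at its azimuth."""
    mask = skymask_elev[int(round(sat_azimuth_deg)) % 361]
    return sat_elev_deg < mask

entries = load_skymask("skymask_deep_urban.csv")       # file name is illustrative
(lat, lon, alt), elev = entries[0]
mmea = elev.mean()                                     # mean masking elevation angle
print(lat, lon, mmea, is_sat_blocked(elev, 135.0, 20.0))
```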
5 BENCHMARK OF STATE-OF-THE-ART METHODS
5.1 GNSS Single-Point Positioning
The weighted least-squares (WLS) method is the most widely used approach for GNSS single-point positioning. A flowchart of this straightforward method is shown in Figure 7.
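To make the WLS step concrete, the following numpy sketch performs a Gauss-Newton solution for the receiver position and clock bias from pseudoranges. It assumes the satellite positions and atmosphere/satellite-clock-corrected pseudoranges are already available; the function name and weighting interface are ours, not RTKLIB's implementation:

```python
import numpy as np

def wls_spp(sat_pos, pr, weights, x0=np.zeros(3), b0=0.0, iters=10):
    """
    Weighted least-squares single-point positioning.
    sat_pos: (N,3) ECEF satellite positions [m]
    pr:      (N,)  pseudoranges corrected for satellite clock/atmosphere [m]
    weights: (N,)  measurement weights (e.g., elevation/C/N0 dependent)
    Returns the ECEF receiver position [m] and receiver clock bias [m].
    """
    x, b = x0.astype(float), float(b0)
    W = np.diag(weights)
    for _ in range(iters):
        rho = np.linalg.norm(sat_pos - x, axis=1)        # geometric ranges
        res = pr - (rho + b)                              # pseudorange residuals
        H = np.hstack([-(sat_pos - x) / rho[:, None],     # unit line-of-sight vectors
                       np.ones((len(pr), 1))])            # clock-bias column
        dx = np.linalg.solve(H.T @ W @ H, H.T @ W @ res)  # weighted normal equations
        x, b = x + dx[:3], b + dx[3]
        if np.linalg.norm(dx) < 1e-4:
            break
    return x, b
```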
In this section, the GNSS WLS method is based on RTKLIB (Takasu & Yasuda, 2009). The processing uses the received GNSS constellations and frequencies according to each receiver's capabilities. The processed results are shown in Table 4 for the middle-class, deep, and harsh urban scenarios. Note that the tunnel dataset is not evaluated, as the GNSS signal is not available in the tunnel. It can be seen from Table 4 that the mean error of the GNSS WLS positioning gradually increases as the scenario complexity increases. Here, the availability represents the percentage of valid solutions over the whole period with an expected solution output rate of 1 Hz. Based on the results shown in Table 4, the positioning root mean square error (RMSE) is roughly proportional to the complexity of the environment of the dataset. Additionally, the positioning error increases across receivers, from the geodetic-grade receiver to the low-cost smartphone receiver, because the measurement quality degrades with the lower radiofrequency sampling rate, receiver algorithm, and antenna of the low-cost receivers. The trajectories are shown in Figure 8.
5.2 Urban State-of-the-Art 3DMA GNSS Positioning
This section presents the positioning performance achieved by urban GNSS positioning with the aid of 3D building models, known as 3DMA GNSS. First, the WLS solution is used as an initial guess to distribute the particles. A skymask is generated based on each particle's location and the 3D building models. The likelihood of each particle considers both the match between the predicted and measured satellite visibilities (i.e., GNSS shadow matching) and the match between the predicted and measured satellite ranging measurements. Then, the weighted average (according to the likelihoods) of the particles' locations is taken as the receiver's absolute location. A flowchart of the 3DMA GNSS method is shown in Figure 9.
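The shadow-matching part of this likelihood can be sketched as follows: each particle is scored by how well the skymask-predicted satellite visibility agrees with a C/N0-based visibility decision, and the particles are then averaged by weight. The C/N0 threshold, helper interface, and exponential weighting below are illustrative choices rather than the exact formulation of Ng et al. (2021):

```python
import numpy as np

CN0_LOS_THRESHOLD = 35.0   # dB-Hz; illustrative LOS/NLOS decision threshold

def shadow_matching(particles, skymask_lookup, sats, sigma=1.0):
    """
    particles: (M,2) candidate ENU positions distributed around the WLS solution
    skymask_lookup(p): returns the 361-entry masking-elevation array at position p
    sats: list of (azimuth_deg, elevation_deg, cn0) for each tracked satellite
    Returns the likelihood-weighted average position (GNSS shadow matching).
    """
    scores = np.zeros(len(particles))
    for i, p in enumerate(particles):
        mask = skymask_lookup(p)
        agree = 0
        for az, el, cn0 in sats:
            predicted_visible = el > mask[int(round(az)) % 361]
            measured_visible = cn0 > CN0_LOS_THRESHOLD
            agree += (predicted_visible == measured_visible)
        scores[i] = agree / len(sats)
    w = np.exp(scores / sigma)
    w /= w.sum()
    return (w[:, None] * particles).sum(axis=0)    # weighted-average receiver position
```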
The results are processed based on our previous work (Ng et al., 2021) using the datasets collected in middle-class, deep, and harsh urban scenarios, as shown in Table 5. Shadow matching and likelihood-based-ranging 3DMA GNSS are combined to form the integrated solution. Compared with the results in Table 4 (WLS results), a reduced positioning error can be observed for the commercial-grade and smartphone receivers. However, this improvement is not observed for the geodetic-grade receiver in the deep urban canyon dataset. This difference is attributed to the fact that only a limited number of satellites with high elevation angles are visible at the beginning of the dataset or in a very harsh environment, so the satellite visibility is inadequate for shadow matching against the satellites in the ephemeris. Thus, the position estimate converges to a local minimum with a significant error. For more technical details of the 3DMA GNSS method evaluated here, readers are referred to our previous work (Ng et al., 2021). The trajectories are shown in Figure 10.
5.3 GNSS Positioning via Factor Graph Optimization
The recently investigated factor graph optimization (FGO)-based formulation (Das et al., 2021; Wen & Hsu, 2021) shows improved robustness compared with the extended Kalman filter because of the global optimization and multiple iterations of FGO. A performance comparison of the extended Kalman filter and FGO for GNSS/IMU integration was presented in our previous work (Wen et al., 2021), which showed the advantages of FGO. Interestingly, the advantage of FGO in GNSS positioning was recently verified in the Google Smartphone Challenge (Zangenehnejad & Gao, 2021) organized by Google at ION GNSS+ 2021, where a team from Japan (Suzuki, 2021) achieved first place with an FGO-based algorithm. Inspired by these advantages, our recent work (Wen & Hsu, 2021) presented an FGO-based GNSS positioning framework, GraphGNSSLib, which is provided as an open-source resource to the community. A flowchart of this framework is shown in Figure 11. Multiple epochs of pseudorange measurements from the satellites are used to formulate pseudorange factors, and the Doppler velocity is estimated from a WLS solution. Then, multiple epochs of the estimated Doppler velocity are used to formulate the velocity factors. Finally, a nonlinear optimization problem is formed, and the batch of states χ can be optimized using a nonlinear least squares (NLS) approach.
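The structure of this batch problem can be sketched compactly: the state at each epoch is a position plus receiver clock bias, pseudorange factors tie each epoch to its measurements, and Doppler-velocity factors link consecutive epochs. The sketch below, using scipy's nonlinear least-squares solver, only illustrates this factor structure and is not the GraphGNSSLib implementation; the weights, epoch interval, and variable layout are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(states, epochs, vel_meas, dt, w_pr=1.0, w_vel=10.0):
    """
    states:   flattened [x, y, z, clock_bias] per epoch
    epochs:   list of (sat_pos (N,3), pseudoranges (N,)) per epoch
    vel_meas: (K-1, 3) Doppler-derived velocities between consecutive epochs
    """
    X = states.reshape(-1, 4)
    res = []
    for k, (sat_pos, pr) in enumerate(epochs):            # pseudorange factors
        rho = np.linalg.norm(sat_pos - X[k, :3], axis=1)
        res.append(w_pr * (pr - (rho + X[k, 3])))
    for k in range(len(epochs) - 1):                      # Doppler velocity factors
        pred_vel = (X[k + 1, :3] - X[k, :3]) / dt
        res.append(w_vel * (pred_vel - vel_meas[k]))
    return np.concatenate(res)

# x0: stack the per-epoch WLS solutions as the initial guess, then optimize the batch:
# sol = least_squares(residuals, x0.ravel(), args=(epochs, vel_meas, 1.0))
```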
The performance of the FGO-based GNSS pseudorange/Doppler measurement fusion on the UrbanNav dataset, obtained with GraphGNSSLib, is shown in Table 6, and the trajectories are presented in Figure 12. Overall, it can be seen that both the availability (100%) and the positioning accuracy are improved compared with the conventional WLS. However, the achieved RMSE can still exceed 10 m because of the numerous multipath and NLOS receptions. For more technical details of the GNSS FGO method evaluated here, readers are referred to our previous work (Wen & Hsu, 2021).
5.4 Lidar Odometry and Lidar Inertial Odometry
This section presents the baseline performance achieved for lidar odometry using the popular lidar odometry and mapping (LOAM) algorithm (Zhang & Singh, 2017) and the lidar inertial odometry via smoothing and mapping (LIO-SAM) algorithm (Shan et al., 2020). A flowchart of LOAM is shown in Figure 13. Edge and planar features are detected using feature extraction algorithms. Then, a rough odometry estimate is obtained by comparing the features detected in the previous and current epochs of data. This rough transformation matrix is used as an initial guess for multi-epoch global optimization, which can be solved by NLS. Once the map is built or updated, the accumulated map is used in the next epoch of rough odometry. This scan-to-map approach may provide a more accurate initial guess for global optimization than matching consecutive scans alone. In the integration of an INS with lidar odometry, the high-frequency INS update is used to provide a good initial guess for the odometry; thus, a flowchart of the LIO-SAM algorithm is omitted.
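The feature extraction step mentioned above is typically based on a local smoothness (curvature) measure over each lidar ring: high-curvature points are taken as edge features and low-curvature points as planar features. A minimal sketch of this LOAM-style classification is shown below; the neighborhood size and thresholds are illustrative and not the values used by LOAM or LIO-SAM:

```python
import numpy as np

def classify_features(scan_line, k=5, edge_thresh=0.5, planar_thresh=0.05):
    """
    scan_line: (N,3) points of one lidar ring, ordered by azimuth.
    Returns indices of edge (high-curvature) and planar (low-curvature) points,
    following the LOAM-style smoothness measure.
    """
    n = len(scan_line)
    curvature = np.full(n, np.nan)
    for i in range(k, n - k):
        # Sum of (X_i - X_j) over the k neighbors on each side of point i
        diff = (2 * k * scan_line[i]
                - scan_line[i - k:i].sum(0)
                - scan_line[i + 1:i + k + 1].sum(0))
        curvature[i] = np.linalg.norm(diff) / (2 * k * np.linalg.norm(scan_line[i]) + 1e-9)
    edge_idx = np.where(curvature > edge_thresh)[0]      # sharp edges
    planar_idx = np.where(curvature < planar_thresh)[0]  # flat surfaces
    return edge_idx, planar_idx
```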
The relative positioning errors (RPEs) of LOAM and LIO-SAM are evaluated using the EVO toolkit (Grupp, 2017), a popular evaluation tool in the SLAM research field. A benchmark comparison of publicly available lidar odometry methods in urban canyons was presented in our previous work (Huang et al., 2022), which demonstrated the computational efficiency and satisfactory positioning accuracy of LOAM. Interestingly, as shown in Table 7, the best lidar odometry performance was achieved in the harsh urban scenario using the center lidar (Velodyne 32E), with an RMSE of 0.17 m. With the help of lidar inertial odometry, the RMSE was further reduced from 0.17 m (LOAM) to 0.15 m (LIO-SAM). The trajectories are shown in Figure 14. A decrease in positioning error is observed for the slant lidars (left and right) from the middle-class urban to the harsh urban scenario, as shown in Table 7. One possible reason for this result is that the taller buildings in the harsh urban scenario provide dense features for the slant lidar units (which primarily scan high-rise structures), thus leading to improved geometric constraints. Similar to the center lidar, the RMSE of the right lidar decreased from 0.88 m (LOAM) to 0.54 m (LIO-SAM) in the middle-class urban scenario. However, lidar inertial odometry with the slant lidars failed in the deep urban and harsh urban scenarios because the inertial sensor bias was incorrectly estimated as the positioning error increased. Additionally, lidar odometry and lidar inertial odometry failed for all lidar units in the tunnel dataset, as the features in the tunnel scene are quite limited. Determining how to accurately perform mapping in tunnels using onboard sensors remains a challenging problem. Overall, the performance comparisons reveal that the center 32-channel 3D lidar achieves the best performance among the three lidar sensors using LOAM. A similar conclusion can also be drawn for lidar inertial integration using LIO-SAM. For more technical details of the lidar odometry evaluated in urban canyons, readers are referred to our previous work (Huang et al., 2022).
5.5 VINS and ORB-SLAM3
This section presents the baseline performance of a VINS using the popular VINS-Mono and VINS-Fusion pipelines (Qin et al., 2018), which integrate camera and IMU measurements by using a sliding-window FGO. In addition to these VINS frameworks, ORB-SLAM3 is also employed to evaluate the baseline performance; this approach can be applied in pure visual or visual–inertial modes with monocular, stereo, or red green blue-depth (RGB-D) sensors. A flowchart of a typical visual SLAM using the ORB descriptor is shown in Figure 15. The ORB descriptor is used to represent distinctive features; details regarding this descriptor can be found in Rublee et al. (2011). Similar to the lidar method, the rough transformation matrix is then used as an initial guess for multi-epoch global optimization. In visual SLAM, this global optimization is well known as bundle adjustment, which considers features that can be mutually observed in different images. If feature i is visible in image j, then b_ij = 1; otherwise, b_ij = 0. Thus, the feature–image pairs are identified. By re-projecting feature i onto image j, considering the rotation matrix R and translation vector t, the predicted pixel position can be found. Then, the transformation that minimizes the Euclidean distance between the predicted and measured pixel positions of the features gives the optimized relative positioning. Again, the flowchart of the VINS is omitted.
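The reprojection residual at the core of this bundle adjustment can be written in a few lines of numpy; the pinhole-camera intrinsic matrix K and the world-to-camera pose convention used below are assumptions for illustration:

```python
import numpy as np

def reprojection_residual(R, t, K, X_world, uv_measured):
    """
    R (3,3), t (3,): world-to-camera pose of image j
    K (3,3): pinhole intrinsic matrix
    X_world (3,): triangulated position of feature i
    uv_measured (2,): observed pixel position of feature i in image j
    Returns the 2D reprojection error minimized by bundle adjustment.
    """
    X_cam = R @ X_world + t                  # transform the feature into the camera frame
    uvw = K @ X_cam
    uv_pred = uvw[:2] / uvw[2]               # perspective projection to pixel coordinates
    return uv_pred - uv_measured             # summed (squared, gated by b_ij) over all pairs
```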
Similar to lidar odometry, the EVO toolkit (Grupp, 2017) is employed to evaluate the RPE of the VINS. As shown in Table 8, the best performance is achieved in the middle-class urban scenario, with a mean RPE of 0.33 m, based on stereo–inertial integration using VINS-Fusion. Interestingly, ORB-SLAM3 can achieve better performance than VINS-Fusion in the harsh urban and tunnel scenarios. The trajectories are shown in Figure 16 and Figure 17. Unfortunately, the VINS (VINS-Mono/VINS-Fusion) fails in the harsh urban scenario because of the numerous dynamic objects and also fails in the tunnel dataset, as the texture information in the tunnel is highly repetitive. Therefore, further work is needed to improve the robustness of the VINS in tunnel environments.
5.6 Discussion on the Complementariness of the Sensors for Multisensory Integration
A qualitative summary of the sensors is given in Table 9. The GNSS sensor is the only one that can provide absolute positioning (i.e., globally referenced solutions), whereas the other sensors can provide only relative positioning. Thus, GNSS is indispensable in urban areas if no mapping database is used for the other sensors. An INS can always provide relative positioning, albeit with drift. If the temperature factor is ignored, an INS is not affected by the surrounding environment; consequently, the INS is an ideal choice for sensor fusion. In the middle-class and deep urban scenarios, most of the solutions (sensor plus algorithm) have the potential to provide satisfactory performance. Only the visual and lidar odometry failed at certain epochs, when few features were detected; this problem may be overcome if a sophisticated INS integration algorithm is developed. In the harsh urban scenario, only lidar can provide satisfactory performance, while the GNSS sensor and camera are strongly hindered by the complexity of the environment. Because of this difference, level-4 autonomous driving employs lidar matching as the primary localization approach. In the tunnel, the GNSS, camera, and lidar sensors are all unsatisfactory: the GNSS and lidar sensors can barely output a solution, as they are challenged by complete signal blockage and repetitive features, respectively. The camera has some chance of working in the tunnel, but extensive calibration using features with known locations would be required. Thus, to achieve seamless positioning in urban areas, the integration of all sensors is required. In this case, tunnels and indoor areas may require the use of additional sensing technologies.
6 LESSONS LEARNED
The UrbanNav dataset is the accumulation of work performed over the past five years at the Intelligent Positioning and Navigation Laboratory, Department of Aeronautical and Aviation Engineering, Hong Kong Polytechnic University. After completing this dataset, we have identified several lessons worth sharing:
Sensor interference: During data collection, we encountered several cases of sensor interference, in which the surrounding environment interfered with the camera, leading to blurred images. Moreover, some GNSS receivers were affected by electromagnetic interference from the camera. To solve this problem, we wrapped the camera's signal transmission wires/cables with tinfoil to shield them from unexpected interference, as shown in Figure 18(a).
Ground truth acquisition in GNSS-challenged environments: Even when using the high-accuracy NovAtel SPAN-CPT, centimeter-level accuracy cannot be guaranteed when the data collection vehicle stays in a GNSS-denied scenario, such as a tunnel or dense urban canyon, for a long period. It is therefore important to initialize the data collection system in an open area in which the INS bias can be easily calibrated with the help of fixed GNSS-RTK solutions. In the UrbanNav dataset, we start the data collection at a harborside location with satisfactory satellite visibility, as shown in Figure 18(b).
Time synchronization of multiple sensors: Synchronizing the time systems of different sensors is of great importance for integrated navigation systems. Even a small offset in the timestamps associated with sensor measurements can lead to a large error in the navigation algorithm, especially in high-speed operations. We adopted the PPS (Niu et al., 2015) signal from a GNSS receiver to synchronize the ROS time from the desktop computer with the GPS time, as shown in Figure 18(c). In other words, all sensor measurements can be aligned to GPS time with a time synchronization error of less than 1 ms (Vyskocil & Sebesta, 2009).
Update and maintenance of the dataset: Feedback from both academic and industry users helps to continuously improve the completeness of the dataset. Thus, we constructed a GitHub page (initially at https://github.com/weisongwen/UrbanNavDataset, recently updated to https://github.com/IPNL-POLYU/UrbanNavDataset) as a bridge between our group and the users. The repositories have currently received more than 300 stars on GitHub in total. The latest GitHub statistics are presented in Figure 18(d). New issues raised by users have been continuously addressed by our group through the GitHub pages.
7 CONCLUSIONS
An accurate and reliable positioning solution for urban scenarios is essential for systems with navigation requirements. The lack of a certifiable positioning service is a bottleneck that prevents the arrival of fully autonomous systems in urban scenarios. In contrast to open areas with good sky visibility and abundant static environmental features, urban scenarios introduce additional challenges to existing navigation algorithms. Therefore, an integrated multi-sensor dataset can effectively benefit the navigation research community, but the existing datasets do not satisfy the necessary requirements. This paper introduces the Hong Kong UrbanNav dataset to bridge this gap. We wish to raise the research community's awareness of the urgent need for accurate and reliable urban navigation. Finally, we will engage in further work on the UrbanNav dataset, including the following:
Build a website that allows researchers to upload their papers and results evaluated on the open-source data according to proposed criteria.
Invite experts in the field to design assessment criteria for different positioning algorithms. Report the performance of state-of-the-art positioning and integration algorithms in urban canyons every two years.
HOW TO CITE THIS ARTICLE
Hsu, L-T., Huang, F., Ng, H-F., Zhang, G., Zhong, Y., Bai, X., & Wen, W. (2023). Hong Kong UrbanNav: An open-source multisensory dataset for benchmarking urban navigation algorithms. NAVIGATION, 70(4). https://doi.org/10.33012/navi.602
ACKNOWLEDGMENTS
The authors appreciate financial support from the Faculty of Engineering, Hong Kong Polytechnic University under the project “Perception-based GNSS PPP-RTK/LVINS integrated navigation system for unmanned autonomous systems operating in urban canyons” (grant number ZVZ8). The authors also thank Dr. Taro Suzuki from the Chiba Institute of Technology, Japan, who provided the algorithm to obtain the sky-pointing fisheye image from Google Earth Studio. Lastly, we thank Jiachen Zhang, Yin-Chiu Kan, Weichang Xu, and Song Yang for their help in developing and evaluating the sensor platform.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.