## Abstract

We propose a Simultaneous Localization and Mapping (SLAM)-based Integrity Monitoring (IM) algorithm using GPS and a fish-eye camera to compute protection levels while accounting for multiple faults in both GPS and vision. We perform graph optimization using GPS pseudoranges, pixel intensities, vehicle dynamics, and satellite ephemeris to simultaneously localize the vehicle, the GPS satellites, and key image pixels in the world frame. We estimate the fault mode vector by analyzing the temporal correlation across pseudorange residuals and the spatial correlation across pixel intensity residuals. To isolate vision faults, we develop a superpixel-based piecewise random sample consensus. For the estimated fault mode, we compute the protection levels by performing worst-case failure slope analysis on the batch realization of the linearized Graph-SLAM formulation. We perform real-world experiments in an alleyway in Stanford, California, and in a semi-urban area in Champaign, Illinois. We demonstrate higher localization accuracy and tighter protection levels as compared to GPS-only SLAM-based IM.

## 1 INTRODUCTION

Integrity serves as a key performance measure in assessing the correctness of the estimated position (Ochieng et al., 2003). In urban environments, GPS receivers obtain fewer measurements due to degraded satellite visibility. They also suffer from received signal faults caused by multipath and satellite faults caused by anomalies in the broadcast navigation message. To address the above-mentioned challenges, one possible solution is to incorporate additional redundancy through the sensor fusion of GPS and vision. A vision sensor performs well in urban areas due to the feature-rich surroundings (Hol, 2011). Sensor fusion (Krishnaswamy, 2008) integrates measurements from multiple sensors to improve the accuracy of the vehicle's state estimate and provide robust performance. Individual sensors, such as GPS and camera, have inherent limitations in operability that are reliably corrected by combining these complementary sensors in a sensor fusion framework. However, vision is not fault-free: occlusions and illumination variations alter multiple pixel intensities and induce data-association errors across images, termed vision faults (Miro, Zhou, & Dissanayake, 2006). Therefore, there is a need for the development of multi-sensor Integrity Monitoring (IM) techniques that account for multiple faults in both GPS and vision.

### 1.1 Related work

The concept of Receiver Autonomous Integrity Monitoring (RAIM) (Hewitson & Wang, 2006) was initially developed in the aviation sector for executing safety-critical applications. However, extending the concept of integrity from aviation to urban transportation is not entirely straightforward (Zhu, Marais, Bétaille, & Berbineau, 2018). This is because, as mentioned earlier, sensors operating in urban areas face additional challenges (Joerger & Spenko, 2017) due to static infrastructure, such as buildings and thick foliage; dynamic obstacles, such as traffic and pedestrians; and environmental conditions, such as shadows, sunlight, and weather. Research (Joerger & Spenko, 2017; Pullen, Walter, & Enge, 2011a) describes the initial parallels between the established integrity requirements in aviation and the early safety standards followed in the transportation sector. Based on this, the standardized integrity requirements in aviation are found to be too conservative for urban applications.

A collection of earlier works (Binjammaz, Al-Bayatti, & Al-Hargan, 2013) adapted the traditional snapshot RAIM techniques, originally developed for the aviation industry, to the urban context. Salós, Martineau, Macabiau, Bonhoure, and Kubrak (2013) studied RAIM availability in the context of electronic toll collection by designing a modified weighted Least-Squares (LS) residual-based RAIM that considers a constant mis-detection probability. Another work (Zhang, Li, Cui, & Liu, 2017a) proposed an adaptive RAIM by characterizing the position errors and measurement residuals as a chi-squared distribution. However, the limitation of this approach is that all pseudorange measurements are considered to have the same error variance, which is not a realistic assumption in an urban scenario.

Prior literature on GPS-based IM for urban navigation addresses the insufficient availability of GPS measurements by incorporating multiple GNSS constellations (Borio & Gioia, 2016) and characterizing the measurement errors associated with the Non-Line-Of-Sight (NLOS) satellite signals (Groves & Jiang, 2013). However, due to the dense building infrastructure, the geometry of satellites is skewed, which still degrades the confidence in the estimated integrity bounds. In this context, Zhang, Li, Cui, and Liu (2017b) proposed an adapted-RAIM approach that accounts for the satellite geometry in deciding whether to include or exclude a potential multipath-affected measurement. Santa, Úbeda, Toledo, and Skarmeta (2006) and Pullen, Walter, and Enge (2011b) analyzed the existing infrastructure of Ground-Based Augmentation System (GBAS) and Satellite-Based Augmentation System (SBAS) for use by non-aviation users. However, these augmentation systems have selected coverage and comply with the aviation integrity standards, which limits their applicability in the urban transportation sector.

In addition to the above-mentioned snapshot algorithms, a rich literature exists on utilizing sequential approaches for RAIM analysis in urban areas. In Le Marchand, Bonnifait, Ibañez-Guzmán, Peyret, and Bétaille (2008), the authors developed a Kalman Filter (KF)-based Fault Detection and Exclusion (FDE) algorithm where the measurement residuals were independently compared against a threshold to be flagged as either faulty or reliable. Tominaga and Kubo (2019) proposed a KF approach where the measurement noise covariance matrix is adaptively estimated, and thereafter, the position error bounds are estimated as being proportional to the state uncertainty. Tran and Lo Presti (2019) designed a KF-based Solution Separation (SS) RAIM methodology for urban navigation that combines the smoothness and higher accuracy of KF with the flexible framework of SS RAIM. Zhu et al. (2018) and Margaria and Falletti (2016) summarize the existing state-of-the-art techniques for IM in urban contexts and also point out several open research issues in this field.

Some existing works rely on external information sources to perform urban GPS-based IM. In Velaga, Quddus, Bristow, and Zheng (2012), the authors developed a sequential map-aided IM technique that checks for outliers in position and Geographic Information System (GIS) data using traditional RAIM (Walter, Blanch, Enge, Pervan, & Gratton, 2008) and a weight-based topological map-matching process, respectively. Another paper (Binjammaz et al., 2013) developed three phases of integrity checks that assess the position quality via traditional RAIM, the speed integrity via GPS Doppler, and the map-matching accuracy via a fuzzy inference system. However, these approaches have practical limitations because an offline map database is not always available, and its accuracy cannot be guaranteed due to the dynamic changes in urban surroundings. Another line of prior work (El-Mowafy & Kubo, 2017; Li, Bonnifait, Ibanez-Guzman, & Zinoune, 2017; Toledo-Moreo, Bétaille, & Peyret, 2009) utilizes the odometry information obtained from other sensors, such as inertial measurement units, speedometers, and cameras, to perform GPS integrity analysis. However, these approaches account for neither faults in the external sensors nor the simultaneous occurrence of multiple faults.

Research on multi-sensor navigation in urban areas using GPS and vision has recently received great attention (Bonin-Font, Ortiz, & Oliver, 2008; Won et al., 2014). Dusha and Mejias (2012) and Chen, Hu, Zhang, Shi, and Li (2018) proposed a visual-GPS methodology wherein the GPS measurements are utilized to estimate the scale ambiguity in visual odometry, while the camera images aid in attitude estimation. Won, Yun, Lee, and Sung (2012) proposed an integration of GPS with a monocular camera to improve the navigation performance. In Shepard and Humphreys (2014), the authors proposed a Simultaneous Localization and Mapping (SLAM) framework to estimate high-precision position and attitude using carrier-phase GPS measurements and feature-based visual odometry (Klein & Murray, 2007). While the existing literature addresses the notion of position accuracy, none of the state-of-the-art techniques provide safety guarantees for the multi-sensor estimated navigation solution.

We leverage the generalized and flexible platform developed in our prior work (Bhamidipati & Gao, 2018), namely Simultaneous Localization and Mapping (SLAM)-based Fault Detection and Isolation (FDI), as the basis for assessing multi-sensor integrity. This paper is based on our recent ION GNSS+ 2019 conference paper (Bhamidipati & Gao, 2019b). Another extension of the SLAM-based FDI platform, described in our prior work (Bhamidipati & Gao, 2019a), assesses the integrity of cooperative localization using a network of receivers. SLAM (Cadena et al., 2016), a well-known technique in robotics, utilizes sensor measurements to estimate the landmarks in a three-dimensional (3D) map while simultaneously localizing the robot within it. Analogous to this, our prior work (Bhamidipati & Gao, 2018) on SLAM-based FDI combines the sequential data of GPS pseudoranges, a vehicle motion model, and satellite broadcast ephemeris in a graph framework to simultaneously localize the *robot*, which is the GPS receiver, and the *landmarks* in the map, which are the GPS satellites. A key feature of this platform is that it utilizes graph optimization techniques (Latif, Cadena, & Neira, 2014) and, therefore, does not require any prior assumption regarding the distribution of the states. As seen in Figure 1(a), given that we localize the landmarks as well, our SLAM-based FDI does not require any prior information regarding the surrounding 3D maps.

Utilizing GPS satellites as landmarks is especially fruitful during faults, namely broadcast anomalies and multipath effects. As seen in Figure 1(b), during broadcast anomalies, the location of GPS satellites estimated via satellite ephemeris is incorrect; during multipath, as seen in Figure 1(c), the received GPS pseudoranges contain an additional bias. In both scenarios, a conventional residual-based FDI module (Joerger, Chan, & Pervan, 2014) flags the measurements as faulty and excludes them from navigation solution estimation. However, excluding the faulty measurements lowers the total number of measurements available for navigation. In contrast, our SLAM-based FDI utilizes the sequential data of vehicle motion inputs and GPS pseudoranges to estimate the positions of the GPS satellites, thereby mitigating the effect of broadcast anomalies. Also, during multipath, instead of estimating a GPS satellite location that agrees with the broadcast ephemeris, our framework intelligently localizes the landmark to a virtual location that complies with the received GPS pseudorange. The concept of estimating a virtual satellite location, which corresponds to the reflection of an actual GPS satellite across a building surface, has been demonstrated in existing literature (Ng & Gao, 2016). Therefore, unlike traditional GPS algorithms, the flexibility provided by our SLAM-based FDI in localizing the GPS satellites improves the number of reliable measurements available for navigation. A real-world experimental validation that demonstrates improved vehicle positioning accuracy by utilizing our SLAM-based FDI framework is shown later in Figures 11(a) and 11(b).

### 1.2 Key contributions

We propose the SLAM-based IM algorithm using GPS and fish-eye camera to compute the error bounds, termed as *protection levels*, of the Graph-SLAM estimated navigation solution. In the graph framework, we consider *global landmarks* as the GPS satellites and additional *local landmarks* as the key image pixels in the world frame. We constrain the graph via GPS pseudoranges, vision pixel intensities, vehicle dynamics, and satellite ephemeris.

To obtain vision pixel intensities, we opt for a fish-eye camera mounted on the vehicle and pointed upwards, for the following reasons: first, given its wide (≥ 180°) Field-Of-View (FOV), the image pixels are spatially spread out in different directions with respect to the vehicle, thereby compensating for the restricted spatial geometry of the limited global landmarks, that is, the GPS satellites. Second, given that the camera points upwards, the unstructured skyline of the buildings aids in resolving the attitude of the vehicle. Third, the fish-eye image captures the open-sky section above the vehicle, which is used to distinguish the Line-Of-Sight (LOS) GPS satellites from the NLOS ones (Shytermeja, Garcia-Pena, & Julien, 2014).

The main contributions of this paper are listed as follows:

– We design a Graph-SLAM framework using GPS and a fish-eye camera that simultaneously estimates the states of the vehicle, GPS satellites, and key image pixels, and also accounts for multiple faults in both GPS and vision.

– To detect and isolate GPS faults, we evaluate the deviation of each pseudorange residual against its corresponding empirical Cumulative Distribution Function (CDF), which represents the measurement errors in non-faulty conditions.

– We develop a superpixel-based piecewise Random Sample Consensus (RANSAC) (Conte & Doherty, 2009; Li & Chen, 2015) technique that performs spatial voting across pixel intensity residuals to detect and isolate vision faults.

– Utilizing the fault mode vector estimated via multiple FDI, we derive the protection levels by applying worst-case failure mode slope analysis (Joerger et al., 2014; Salós et al., 2013) to the batch realization of the linearized Graph-SLAM formulation.

– We conduct two real-world experiments using a ground vehicle, one in an alleyway in Stanford, California, and the other in a semi-urban area in Champaign, Illinois. We demonstrate the successful detection of multiple measurement faults in GPS and the fish-eye camera, and validate the tighter protection levels and lower localization errors achieved via the proposed algorithm as compared to SLAM-based IM that utilizes only GPS measurements.

The rest of the paper is organized as follows: Section 2 describes our SLAM-based IM algorithm that utilizes GPS and a fish-eye camera; Section 3 experimentally validates the detection of multiple faults in both GPS and vision, as well as the lower localization errors and tighter protection levels achieved via the proposed algorithm as compared to GPS-only SLAM-based IM; Section 4 concludes the paper.

## 2 SLAM-BASED IM USING GPS AND FISH-EYE CAMERA

In this work, we focus on the measurement faults related to GPS and vision that frequently occur in urban areas. While the formulation of the proposed algorithm is capable of addressing other faults, for simplicity, we consider no measurement faults associated with the receiver motion model and satellite broadcast ephemeris. For reference, our prior work (Bhamidipati & Gao, 2018) describes details on utilizing SLAM-based FDI to account for broadcast anomalies, described earlier in Figure 1(b).

In Figure 2, we outline the architecture of the proposed multi-sensor SLAM-based IM using GPS and fish-eye camera, which is summarized as follows:

During initialization, we create a 3D world map via the estimated Position, Velocity, and Time (PVT) of the receiver and satellites computed from an existing GPS algorithm (Lashley, Bevly, & Hung, 2010). We also process a series of camera images to compute the intrinsic and extrinsic camera parameters. Further details regarding the initialization are given in Section 3.2.

We pre-process the raw image obtained from the fish-eye camera using our hybrid sky detection algorithm to distinguish the sky pixels from the non-sky pixels. The detected sky pixels are used to distinguish the LOS and NLOS satellites. This information is thereafter used in formulating the GPS measurement covariance.

We consider the non-sky pixels along with GPS pseudoranges and carrier-to-noise density (*C*/*N*_{0}) values, the receiver motion model, and the satellite orbital model as inputs to our algorithm. We combine the measurements in an extended graph optimization module to estimate the overall state vector, which consists of the state vectors of the vehicle, GPS satellites, and key image pixels, using an M-estimator-based Levenberg-Marquardt algorithm (Lourakis et al., 2005; Shevlyakov, Morgenthaler, & Shurygin, 2008).

We independently analyze the measurement residuals against an empirical Normal distribution to detect and isolate GPS faults. We develop a superpixel-based piecewise RANSAC to perform spatial voting for the detection and isolation of vision faults. Based on the estimated fault status of the measurements, we formulate the measurement fault mode, which has binary entries, such that 0 indicates non-faulty and 1 indicates faulty.

Utilizing the estimated fault mode and overall state vector, we formulate the batch realization of linearized Graph-SLAM formulation and, subsequently, compute the protection levels using worst-case failure mode slope analysis (Joerger et al., 2014; Salós et al., 2013).

The proposed SLAM-based IM algorithm using GPS and fish-eye camera consists of three main modules, namely measurement pre-processing, extended graph optimization, and IM for Graph-SLAM. We describe the details as follows.

### 2.1 Pre-processing the measurements

We consider the following measurements as inputs to our SLAM-based IM algorithm: GPS pseudoranges and *C*/*N*_{0} values from the GPS receiver, pixel intensities from the fish-eye image, motion input obtained from the vehicle motion model, and satellite ephemeris decoded from the navigation message.

#### 2.1.1 Vision module

We pre-process the raw image obtained from the fish-eye camera using our hybrid sky detection algorithm to distinguish the sky-pixels from the non-sky pixels. The pipeline of our hybrid sky detection is seen in Figure 3(a). Our hybrid sky detection takes into account not only the pixel intensities but also prior knowledge regarding the spatial location of the sky pixels.

We convert the raw image to gray scale and then perform the median blur operation. The median blur (Wang, Li, Yang, & Gong, 2010) is a nonlinear filter that reduces the noise in the image while keeping the edges relatively sharp. Next, we compute the gradient by combining the magnitude obtained via two Sobel operations (Gao, Zhang, Yang, & Liu, 2010), one executed in horizontal and the other in vertical direction. An example of the image obtained from the fish-eye camera operating in urban areas and pointing upwards is seen in Figure 3(b).

We observe that the probability of sky is highest close to the center and exponentially decreases outwards (Haque, Murshed, & Paul, 2008). Therefore, the corresponding location-based sky probability, denoted by *p*_{loc}, is given by

$$p_{loc}(\mathbf{u}) = \exp\left(-\frac{\|\mathbf{u} - \mathbf{c}_{loc}\|^2}{|\Pi|}\right), \quad \mathbf{u} \in \Pi, \tag{1}$$

where **u** denotes the two-dimensional (2D) image coordinates, Π represents the pre-defined domain of the image coordinates, |⋅| denotes the cardinality of the image domain, ‖⋅‖ denotes the 2-norm, and **c**_{loc} ∈ Π denotes the pre-defined center coordinates in the 2D image frame.

Combining the location probability with Otsu's method of intensity thresholding (Moghaddam, Cheriet, & Adotsu, 2012), we compute the hybrid optimal border, seen in Figure 3(c), that separates the sky region, represented by subscript *sky*, from that of the building infrastructure, denoted by subscript *inf*. We minimize the within-class variance of the sky and infrastructure pixels to estimate Otsu's intensity threshold *I*_{otsu} as

$$I_{otsu} = \operatorname*{arg\,min}_{k}\left[\,\omega_{sky}(k)\,\sigma^2_{sky}(k) + \omega_{inf}(k)\,\sigma^2_{inf}(k)\,\right], \tag{2}$$

where

– **I** denotes the intensity vector that stacks all the pixel intensities in the fish-eye image, such that **I** = {*I*(**u**) | **u** ∈ Π}, where *I*(**u**) ∶ Π → ℝ denotes the intensity of any 2D pixel coordinates **u**;

– *ω*_{sky}(*k*) and *ω*_{inf}(*k*) denote the weights associated with the sky and building infrastructure at a candidate threshold *k*, respectively, computed from the intensity histogram such that *ω*_{sky}(*k*) + *ω*_{inf}(*k*) = 1;

– *σ*²_{sky}(*k*) and *σ*²_{inf}(*k*) denote the variances of the pixel intensities associated with the sky and building infrastructure, respectively.

Utilizing Equations (1) and (2), we compute the hybrid sky probability, denoted by *p*_{sky}, at any 2D image coordinate **u** ∈ Π as

$$p_{sky}(\mathbf{u}) = \begin{cases} p_{loc}(\mathbf{u})\,\dfrac{I(\mathbf{u}) - I_{min}}{I_{max} - I_{min}}, & I(\mathbf{u}) \geq I_{otsu}, \\[4pt] 0, & \text{otherwise}, \end{cases} \tag{3}$$

where *I*_{max} and *I*_{min} are the maximum and minimum intensity values in the fish-eye image, such that *I*_{max} = max_{**u**∈Π} *I*(**u**) and *I*_{min} = min_{**u**∈Π} *I*(**u**), respectively. Considering *η* as the pre-defined sky threshold, a pixel **u** is categorized as a sky pixel if *p*_{sky}(**u**) > *η* and as a non-sky pixel otherwise. The sky-enclosed area in the fish-eye image is seen in Figure 3(d). The value of *η* is set during initialization and explained later in Section 3.2.
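The hybrid sky-detection pipeline above can be sketched in a few lines of Python. The normalization of the location prior, the intensity range assumption (8-bit), and the gating by the Otsu threshold are illustrative choices for this sketch, not the paper's calibrated settings:

```python
import numpy as np

def location_prior(h, w, c=None):
    """Location-based sky probability: highest at the image center and
    decaying exponentially outwards (cf. Equation (1)); the decay
    length (image diagonal) is an illustrative normalization."""
    if c is None:
        c = np.array([h / 2.0, w / 2.0])
    rows, cols = np.mgrid[0:h, 0:w]
    dist = np.hypot(rows - c[0], cols - c[1])
    return np.exp(-dist / np.hypot(h, w))

def otsu_threshold(img):
    """Otsu's threshold: minimize the weighted within-class variance of
    sky vs. infrastructure intensities (Equation (2)); assumes [0, 255]."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_k, best_var = 0, np.inf
    for k in range(1, 256):
        w0, w1 = p[:k].sum(), p[k:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(k) * p[:k]).sum() / w0
        mu1 = (np.arange(k, 256) * p[k:]).sum() / w1
        var0 = (((np.arange(k) - mu0) ** 2) * p[:k]).sum() / w0
        var1 = (((np.arange(k, 256) - mu1) ** 2) * p[k:]).sum() / w1
        within = w0 * var0 + w1 * var1
        if within < best_var:
            best_k, best_var = k, within
    return best_k

def sky_mask(img, eta=0.3):
    """Hybrid sky probability: location prior times normalized
    intensity, gated by the Otsu split (cf. Equation (3)); pixels with
    p_sky > eta are classified as sky."""
    h, w = img.shape
    i_min, i_max = img.min(), img.max()
    p_int = (img - i_min) / max(i_max - i_min, 1)
    p_sky = location_prior(h, w) * p_int * (img >= otsu_threshold(img))
    return p_sky > eta
```

On a synthetic frame with a bright central region, the mask keeps the bright center (high location prior, above the Otsu split) and rejects the dark periphery.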

Using the detected non-sky pixels, we describe the vision measurement model of an omnidirectional fish-eye camera in Equation (4), which is formulated via direct image alignment (Caruso, Engel, & Cremers, 2015) and semi-dense inverse depth map estimation (Engel, Schöps, & Cremers, 2014). Instead of utilizing standard algorithms (Klein & Murray, 2007; Mur-Artal, Montiel, & Tardos, 2015) that rely on feature detection and extraction, we perform direct image alignment, which directly processes the pixel intensities of the keyframe and the current image frame to estimate the relative change in position and orientation of the vehicle. The selected pixels of the keyframe, termed key pixels, refer to the ones that exhibit a high intensity gradient. Details regarding the keyframe selection are explained in the existing literature (Engel et al., 2014).

$$I_t\big(\pi\big(w(\Delta\boldsymbol{\mu}, \mathbf{u})\big)\big) = I_{kf}(\mathbf{u}) + \eta_{vis}(\mathbf{u}), \quad \mathbf{u} \in \Pi_{kf}, \tag{4}$$

with *η*_{vis}(**u**) denoting the pixel noise and, from Caruso et al. (2015), *w*(Δ**μ**, **u**) = **R** *π*^{−1}(**u**, *d*_{kf}(**u**)) + **t**,

where the subscript *kf* refers to the keyframe,

– *I*_{kf}(**u**) ∶ Π_{kf} → ℝ denotes the intensity of any 2D pixel coordinates **u** in the reference keyframe, and Π_{kf} ⊂ ℝ² denotes the image domain of the keyframe;

– *I*_{t}(**u**) ∶ Π_{ns} → ℝ denotes the intensity of any 2D pixel coordinates **u** in the current frame, and Π_{ns} ⊆ Π denotes the image domain consisting of non-sky pixels;

– *π* ∶ ℝ³ → Π_{ns} denotes the mapping function of any 3D world position coordinates **p** = [*p*_{x}, *p*_{y}, *p*_{z}] to a 2D pixel in the image frame;

– *w*(Δ**μ**, **u**) denotes the 3D warp function that unprojects the pixel coordinates **u** and transforms them by a relative state vector Δ**μ**. The relative state vector Δ**μ** indicates the difference between the current vehicle pose, denoted by **μ**_{t} = [**x**, **ψ**]_{t}, and that of the keyframe, denoted by **μ**_{kf}, such that Δ**μ** = **μ**_{t} − **μ**_{kf}; here, **x** = [*x*, *y*, *z*] denotes the 3D vehicle position and **ψ** denotes the 3D orientation with respect to the world frame; **R** ∈ SO(3) and **t** ∈ ℝ³ denote the rotation matrix and translation vector of Δ**μ**, respectively;

– *π*^{−1} ∶ Π_{ns} × ℝ⁺ → ℝ³ denotes the inverse mapping of 2D pixel coordinates to 3D world coordinates via an inverse distance, represented by *d*. The intrinsic camera parameters are estimated during initialization and are denoted by *f*_{x}, *f*_{y}, *c*_{x}, *c*_{y}, and *ξ*;

– *d*_{kf}(**u**) denotes the inverse distance of the pixel coordinates **u** in the keyframe.

Direct image alignment accounts for the similarity in urban building infrastructure by preserving the spatial context of the image. We compute a semi-dense map that estimates the inverse-depth of key pixels in the keyframe by executing a short-temporal baseline matching across images. This vision measurement model is utilized later in our extended graph optimization to formulate the corresponding vision odometry-based component of the cost function.
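The per-pixel residual of Equation (4) can be illustrated with a simplified sketch. For brevity, we use a plain pinhole model (*f*_{x}, *f*_{y}, *c*_{x}, *c*_{y}) instead of the unified omnidirectional model with *ξ* that the fish-eye camera requires, and all function names are our own:

```python
import numpy as np

def unproject(u, inv_dist, fx, fy, cx, cy):
    """pi^{-1}: lift 2D pixel u with inverse distance to a 3D point."""
    ray = np.array([(u[0] - cx) / fx, (u[1] - cy) / fy, 1.0])
    return ray / (inv_dist * np.linalg.norm(ray))

def project(p, fx, fy, cx, cy):
    """pi: map a 3D point to 2D pixel coordinates."""
    return np.array([fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy])

def warp(u, inv_dist, R, t, K):
    """w: unproject a keyframe pixel, apply the relative pose (R, t),
    and reproject into the current frame."""
    fx, fy, cx, cy = K
    p = unproject(u, inv_dist, fx, fy, cx, cy)
    return project(R @ p + t, fx, fy, cx, cy)

def photometric_residual(I_kf, I_t, u, inv_dist, R, t, K):
    """I_t(pi(w(..))) - I_kf(u): the per-pixel intensity residual of
    direct image alignment (nearest-neighbor lookup for simplicity)."""
    v = np.round(warp(u, inv_dist, R, t, K)).astype(int)
    return I_t[v[1], v[0]] - I_kf[u[1], u[0]]
```

With an identity relative pose and identical frames, the residual is exactly zero; a small lateral translation shifts the reprojected pixel and yields a nonzero intensity difference.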

#### 2.1.2 GPS module

In the GPS module, considering *N* visible satellites, we describe the GPS measurement model as

$$\rho^k = \left\|\mathbf{x} - \mathbf{y}^k\right\| + c\delta t - c\delta t^k + \eta^k, \quad k \in \{1, \dots, N\}, \tag{5}$$

where *ρ*^{k} denotes the received pseudorange from the *k*^{th} satellite; **x** and **y**^{k} denote the 3D positions of the vehicle and the *k*^{th} satellite, respectively; *cδt* and *cδt*^{k} represent the receiver clock bias and the *k*^{th} satellite clock bias corrections, respectively; and *η*^{k} represents the measurement noise related to the *k*^{th} satellite.

We also formulate the measurement covariance of the *k*^{th} satellite via the measured *C*/*N*_{0} values and the sky area detected via Equation (3) in the vision pre-processing module. Note that the classification of the satellite as either LOS or NLOS depends on the unknown state vectors of the vehicle and the *k*^{th} satellite. The measurement variance of the *k*^{th} satellite is given by

$$\left(\sigma^k\right)^2 = b^k + a^k\, 10^{-\frac{(C/N_0)^k}{10}}, \tag{6}$$

where

– **x**_{t} denotes the vehicle state vector at the *t*^{th} time instant, comprising its 3D position, 3D velocity, clock bias, clock drift, and 3D attitude, respectively, such that **x**_{t} = [**x**, **ẋ**, *cδt*, *cδṫ*, **ψ**]_{t};

– **y**^{k}_{t} denotes the state vector of the *k*^{th} satellite, comprising its 3D position, 3D velocity, clock bias, and clock drift corrections, such that **y**^{k}_{t} = [**y**^{k}, **ẏ**^{k}, *cδt*^{k}, *cδṫ*^{k}]_{t}, *k* ∈ {1, …, *N*};

– *b*^{k} and *a*^{k} are the vision coefficients, such that *b*^{k} = *b*_{LOS} and *a*^{k} = *a*_{LOS} when *p*_{sky}(*π*(**y**^{k})) > *η*, and *b*^{k} = *b*_{NLOS} and *a*^{k} = *a*_{NLOS} otherwise; *η* is the pre-defined threshold explained in Equation (3); *b*_{LOS}, *b*_{NLOS}, *a*_{LOS}, and *a*_{NLOS} are constant pre-determined coefficients, which are set during initialization and explained later in Section 3.2; and *π*(**y**^{k}) denotes the projection of the *k*^{th} satellite's position onto the image frame.
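A minimal sketch of the pseudorange model and the LOS/NLOS variance switching follows. The additive *C*/*N*_{0}-based variance form and the numeric coefficients are illustrative assumptions for this sketch, not the paper's calibrated model:

```python
import numpy as np

def pseudorange(x, y_k, cdt, cdt_k):
    """Equation (5): geometric range from vehicle position x to
    satellite position y_k, plus receiver/satellite clock terms."""
    return np.linalg.norm(x - y_k) + cdt - cdt_k

def pr_variance(cn0_dbhz, los, coeffs=(10.0, 50.0, 1e2, 1e4)):
    """C/N0-dependent variance in the spirit of Equation (6):
    sigma^2 = b + a * 10^(-C/N0 / 10), with (b, a) switched between
    LOS and NLOS values according to the sky mask. The placeholder
    coefficients (b_los, b_nlos, a_los, a_nlos) are made up."""
    b_los, b_nlos, a_los, a_nlos = coeffs
    b, a = (b_los, a_los) if los else (b_nlos, a_nlos)
    return b + a * 10.0 ** (-cn0_dbhz / 10.0)
```

At equal *C*/*N*_{0}, a satellite flagged NLOS by the sky mask receives a strictly larger variance, and hence a smaller weight in the graph optimization.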

### 2.2 Extended graph optimization

In our extended graph optimization module, our cost function consists of four error terms, namely GPS pseudoranges, non-sky pixel intensities, receiver motion model, and satellite orbital model, as follows:

$$\begin{aligned} \mathbf{e}_t = \sum_{k=1}^{N} \left(1 - \hat{s}^{\,k}_{t-1}\right) \Lambda\!\left(\frac{\rho^k_t - h\!\left(\mathbf{x}_t, \mathbf{y}^k_t\right)}{\sigma^k_t}\right) &+ \Lambda\!\left(\left\|\mathbf{x}_t - \mathbf{g}\!\left(\hat{\mathbf{x}}_{t-1}, \mathbf{u}_{x,t}\right)\right\|_{\boldsymbol{\Sigma}_{x,t}}\right) + \sum_{k=1}^{N} \Lambda\!\left(\left\|\mathbf{y}^k_t - f\!\left(\hat{\mathbf{y}}^k_{t-1}, \mathbf{u}^k_{y,t}\right)\right\|_{\boldsymbol{\Sigma}^k_{y,t}}\right) \\ &+ \sum_{\mathbf{u} \in \Pi_{ns}} \left(1 - \hat{s}^{\,vis}_{t-1}(\mathbf{u})\right) \Lambda\!\left(\frac{I_t\big(\pi\big(w(\Delta\boldsymbol{\mu}, \mathbf{u})\big)\big) - I_{kf}(\mathbf{u})}{\omega_t(\mathbf{u})}\right), \end{aligned} \tag{7}$$

where ‖**v**‖_{Σ} = √(**v**^{⊤}Σ^{−1}**v**) denotes the Mahalanobis norm, and

– **θ**_{t} denotes the overall state vector, comprising the state vectors of the vehicle, GPS satellites, and key pixels in the world frame, and is the unknown to be estimated via graph optimization;

– Λ denotes the M-estimator function of the weighted residuals, where an M-estimator (Shepard & Humphreys, 2014) represents a robust regression method that accounts for data containing outliers or extreme observations, or that does not follow a normal distribution. Here, Λ(⋅) is a symmetric, positive-definite function with a unique minimum at zero. For instance, conventional least-squares is a special case of the M-estimator, where Λ(Δ*z*) = (Δ*z*)². The choice of M-estimator for this work is given in Section 3.2;

– *ŝ*^{k}_{t−1} denotes the estimated fault status of the received GPS pseudorange from the *k*^{th} satellite at the past time instant; similarly, *ŝ*^{vis}_{t−1}(**u**) denotes the estimated vision fault status of any 2D pixel **u** ∈ Π_{kf} at the previous time instant; the GPS and vision fault statuses are estimated later in Section 2.3.1;

– *h* denotes the GPS measurement model obtained from Equation (5); **g** denotes the motion model of the receiver, and *f* denotes the satellite orbital model; **x̂**_{t−1} and **ŷ**^{k}_{t−1} denote the estimated state vectors of the vehicle and the *k*^{th} satellite, respectively, at the previous time instant; **u**_{x,t} and **u**^{k}_{y,t} denote the motion control inputs of the vehicle and the *k*^{th} satellite, respectively; later, in Section 3.2, we describe the choice of vehicle motion model and satellite orbital model used to validate our algorithm;

– **Σ**_{x,t} and **Σ**^{k}_{y,t} denote the estimated covariance matrices of the vehicle state vector and the *k*^{th} satellite state vector. For simplicity, we estimate the state covariances via a statistical process known as propagation of uncertainty (Engel et al., 2014);

– the integrity vector comprises the protection levels associated with each term in the vehicle state vector **x**_{t}; details regarding estimating the covariances and protection levels are explained later in Section 2.3;

– *σ*^{k}_{t} denotes the measurement standard deviation of the *k*^{th} satellite and is estimated from Equation (6); similarly, *ω*_{t}(**u**) denotes the covariance associated with the intensity of the non-sky pixel **u** and is estimated based on existing literature (Eckenhoff, Geneva, & Huang, 2019).

The first three terms in the cost function **e**_{t} given in Equation (7) correspond to the residuals associated with the GPS pseudoranges, satellite ephemeris, and vehicle state vector. These terms are equivalent to the terms considered in our prior work on GPS-only SLAM-based FDI (Bhamidipati & Gao, 2018). The last term represents the summation of intensity residuals across non-sky pixels based on the vision measurement model explained in Equation (4). In particular, we perform sub-graph optimization at each instant, as seen in Equation (8), where the cost function is formulated using the past history of measurements.

$$\hat{\boldsymbol{\theta}}_t = \operatorname*{arg\,min}_{\boldsymbol{\theta}} \sum_{\tau = t-T+1}^{t} \mathbf{e}_\tau, \tag{8}$$

where *T* denotes the number of time instants utilized in the sub-graph optimization thread and $\hat{\boldsymbol{\theta}}_t$ denotes our SLAM-based IM estimate of the overall state vector computed during the sub-graph optimization. We estimate the key pixels in the world frame via the inverse mapping defined in Equation (4).

In addition to sub-graph optimization, which is executed at each instant, we also periodically perform full-graph optimization, which optimizes the entire graph and bounds the overall drift errors in our 3D world map. In this work, when a new reference keyframe is chosen, we perform full-graph optimization to compute its corresponding semi-dense inverse depth map. This occurs at a slower update rate than sub-graph optimization so as to reduce the overall computational complexity. We refer to Engel et al. (2014) for the execution of the full-graph optimization.
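The interplay of the measurement terms, the motion-model terms, and the M-estimator in the sub-graph optimization can be illustrated on a scalar toy problem. Here Huber-reweighted least squares stands in for the paper's M-estimator-based Levenberg-Marquardt solver, and the noise settings and window are made up for the example:

```python
import numpy as np

def huber_weight(r, k=1.345):
    """M-estimator weight for a normalized residual r (Huber loss):
    quadratic near zero, linear in the tails, so outliers are downweighted."""
    a = np.abs(r)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def windowed_robust_slam(z, u, sigma_z=1.0, sigma_u=0.1, iters=10):
    """Toy batch realization in the spirit of Equation (8): estimate
    scalar positions x_1..x_T from direct measurements z (GPS-like)
    and odometry increments u (motion model), via iteratively
    reweighted least squares with Huber weights."""
    T = len(z)
    # Stack measurement rows (x_t = z_t) and motion rows (x_t - x_{t-1} = u_t).
    H = np.vstack([np.eye(T), np.eye(T)[1:] - np.eye(T)[:-1]])
    b = np.concatenate([z, u[1:]])
    w0 = np.concatenate([np.full(T, 1 / sigma_z**2),
                         np.full(T - 1, 1 / sigma_u**2)])
    x = z.copy()
    for _ in range(iters):
        r = (H @ x - b) * np.sqrt(w0)       # normalized residuals
        W = w0 * huber_weight(r)            # robustly reweighted
        x = np.linalg.solve(H.T @ (W[:, None] * H), H.T @ (W * b))
    return x
```

With one grossly faulty measurement, the Huber weights suppress it and the motion-model rows pull the estimate back to the consistent trajectory, mirroring how the fault-status weights in Equation (7) de-emphasize flagged pseudoranges and pixels.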

### 2.3 IM for multi-sensor Graph-SLAM framework

By expressing our extended graph optimization, explained earlier in Section 2.2, as a nonlinear Weighted Least-Squares (WLS) problem, we compute the protection levels of the estimated vehicle position using worst-case failure mode slope analysis (Joerger et al., 2014). For this, we draw on the existing rich literature (Joerger et al., 2014; Salós et al., 2013) on worst-case failure mode slope analysis for a linear LS estimator. The formulation of our extended graph optimization as a nonlinear WLS problem is shown later in Section 2.3.2. However, there are challenges involved in applying worst-case failure slope analysis to the protection-level computation of a Graph-SLAM framework. The challenges and our design solutions are listed as follows:

– The worst-case failure mode slope is derived for a linear measurement model, whereas the cost function associated with Graph-SLAM is nonlinear. Therefore, we linearize the cost function at the estimated overall state vector.

– Graph-SLAM is a sequential method, whereas the worst-case failure mode slope falls under the category of snapshot IM techniques. Therefore, we formulate a batch realization of the linearized Graph-SLAM that incorporates not only the measurements received at the current time instant but also the past time history. This incorporates the temporal aspect of Graph-SLAM in computing the protection levels.

– Our Graph-SLAM framework consists of a large number of states and measurements, and evaluating all possible measurement fault modes is computationally cumbersome. Therefore, using the multiple FDI modules, we directly estimate one fault mode, which is a binary vector, such that an element of the fault mode is one if the corresponding measurement is flagged as faulty and zero otherwise.
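For a linear WLS estimator, the failure-slope machinery applied to the batch realization has the following general shape. This is a sketch in generic notation reconstructed from the general form in Joerger et al. (2014) and Salós et al. (2013), not this paper's exact derivation; α selects the state of interest, and λ_det and k_md come from the integrity risk allocation:

```latex
% Linearized batch model: \Delta z = H\,\Delta\theta + \eta + f, with
% weighted LS estimate \Delta\hat{\theta} = S\,\Delta z, where
% S = (H^\top W H)^{-1} H^\top W, and residual-based test statistic
% q = \Delta z^\top W (I - H S)\,\Delta z.
% The failure slope of a fault vector f (restricted to the hypothesized
% fault mode) is the squared ratio of the induced error in the state of
% interest, \alpha^\top \Delta\hat{\theta}, to the induced
% non-centrality of q:
g^2(\mathbf{f})
  = \frac{\bigl(\boldsymbol{\alpha}^{\top}\mathbf{S}\,\mathbf{f}\bigr)^{2}}
         {\mathbf{f}^{\top}\mathbf{W}\bigl(\mathbf{I}-\mathbf{H}\mathbf{S}\bigr)\mathbf{f}},
\qquad
\mathrm{PL}
  = \max_{\mathbf{f}}\, g(\mathbf{f})\,\sqrt{\lambda_{\mathrm{det}}}
  \;+\; k_{\mathrm{md}}\,\sigma_{\alpha},
```

where λ_det is the non-centrality parameter associated with the detection threshold set by the false-alarm allocation, k_md is the missed-detection multiplier, and σ_α is the nominal standard deviation of the estimate of interest.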

#### 2.3.1 Multiple FDI

Based on the estimated overall state vector from the extended graph optimization explained in Section 2.2, we independently compute the measurement residuals associated with GPS pseudoranges and non-sky pixel intensities. In our multiple FDI module, we evaluate the GPS residuals by analyzing them against the temporal correlation of their non-faulty error distribution and vision residuals using spatial correlation across image pixels.

**GPS faults**: To detect and isolate GPS faults in pseudoranges, we evaluate each residual against an empirical Normal distribution, which represents the measurement error distribution during non-faulty conditions. This is justified because the GPS measurements follow a Gaussian distribution during non-faulty conditions (Bhamidipati & Gao, 2018). We empirically compute the non-faulty distribution of GPS measurements during initialization, which is described later in Section 3.2. Thereafter, the deviation of the measurement residual, denoted by Δ*ρ*^{k}, from the CDF of its empirical distribution is categorized as a fault, and the corresponding fault status is computed in Equation (9). The formulation of the fault status is explained in our prior work (Bhamidipati & Gao, 2019a).

(9)
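As a rough illustration of this residual test, the sketch below maps a pseudorange residual to a [0, 1] fault status by measuring how far it sits in the tails of the empirical non-faulty Gaussian. The specific mapping is our illustrative assumption, not the formulation of Equation (9) from the prior work.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gps_fault_status(residual, mu, sigma):
    """Hypothetical [0, 1] fault status: near 0 at the center of the
    empirical non-faulty distribution, approaching 1 in its tails."""
    return abs(2.0 * gaussian_cdf(residual, mu, sigma) - 1.0)

# A residual near the non-faulty mean yields a low fault status,
# while a large multipath-like deviation yields a status close to 1.
low = gps_fault_status(0.5, mu=0.0, sigma=2.0)
high = gps_fault_status(15.0, mu=0.0, sigma=2.0)
```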

**Vision faults**: Unlike GPS faults, vision faults caused by data association errors exhibit high spatial correlation across image pixels and low temporal correlation. This is justified because vision faults are localized to a group of neighboring pixels rather than isolated to a stand-alone pixel. We develop a superpixel-based piecewise RANSAC that performs spatial voting across pixel intensity residuals to detect and isolate vision faults. RANSAC (Conte & Doherty, 2009), a popular outlier detection method in image processing, estimates the optimal fitting parameters of a model via random sampling of data containing both inliers and outliers.

The steps involved in the superpixel-based piecewise RANSAC are as follows: first, we segment the image into clusters, known as superpixels, based on color similarity and spatial proximity between image pixels using superpixel segmentation (Li & Chen, 2015). During initialization, which is described later in Section 3.2, we pre-define the number of superpixels into which the non-sky pixels are segmented to be Γ. For each non-sky superpixel, we denote the pixel intensity vector as **I**^{j} ∀*j* ∈ {1,…,Γ}, which stacks the pixel intensities of that superpixel. We represent the expected and received intensities, which are associated with the pixels of the current image and the keyframe, respectively, as a 2D plot. The mapping between the key pixels and the pixels of the current image was estimated earlier during extended graph optimization. Next, we estimate the fitted line using RANSAC that passes through the optimal set of inliers and, thereafter, compute the fraction of outliers in the superpixel. Then, utilizing the estimated model parameters of the fitted line, we evaluate the corresponding fraction of outliers at all the other non-sky superpixels. Finally, the fault status at each superpixel is computed as the product of all the estimated outlier fractions, as seen in Equation (10), and the same fault status is assigned to all the pixels within that superpixel. This procedure is repeated for all the non-sky superpixels to compute the fault status of all the non-sky pixels in the keyframe. Our algorithm assumes that there is a sufficient number of superpixels to reach a common consensus. If the number of superpixels associated with non-sky pixels is small, such as in an open-sky setting, a pre-defined penalty is assigned to the vision fault status.

(10)
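The line-fitting and voting steps above can be sketched in pure Python. The two-point line sampling, the inlier tolerance `tol`, and the toy expected/received intensity pairs are illustrative assumptions rather than the paper's exact parameters.

```python
import random

def ransac_line(points, n_iters=200, tol=5.0, seed=0):
    """Fit y = a*x + b by random two-point sampling; return the model
    with the most inliers and that model's outlier fraction."""
    rng = random.Random(seed)
    best, best_inliers = (0.0, 0.0), -1
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot define y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(1 for x, y in points if abs(y - (a * x + b)) <= tol)
        if inliers > best_inliers:
            best_inliers, best = inliers, (a, b)
    return best, outlier_fraction(points, best, tol)

def outlier_fraction(points, model, tol=5.0):
    """Fraction of (expected, received) pairs far from the fitted line."""
    a, b = model
    return sum(1 for x, y in points if abs(y - (a * x + b)) > tol) / len(points)

# Superpixel 0: received ≈ expected (identity line plus one corrupted pixel).
sp0 = [(i, i) for i in range(0, 100, 5)] + [(50, 200)]
# Superpixel 1: data-association error, intensities uncorrelated with expected.
sp1 = [(i, 130 - i) for i in range(0, 100, 5)]

model0, f0 = ransac_line(sp0)                    # low outlier fraction
f1_under_model0 = outlier_fraction(sp1, model0)  # high under sp0's model
# Combining fractions across superpixels, in the spirit of Eq. (10).
status = f0 * f1_under_model0
```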

#### 2.3.2 Protection-level computation

In accordance with the design solutions listed earlier, we linearize the sub-graph optimization framework, seen in Equations (7) and (8), using a Taylor-series expansion (Shepard & Humphreys, 2014). Using the measurement equations described in Equations (4) and (5), we derive the batch realization of the linearized Graph-SLAM in Equation (11), which is based on the batch realization derived for a linear LS estimator in Joerger and Pervan (2013). Batch realization refers to stacking the measurement equations obtained across the desired time history.

(11)

where

– Δ**z**_{T} denotes the overall measurement vector of the sub-graph optimization, such that Δ**z**_{T} = [Δ*z*_{t−T},…,Δ*z*_{t}]^{⊤}, where Δ*z*_{t} denotes the measurement vector at the *t*-th time instant and concatenates the GPS pseudoranges, vehicle motion input, satellite ephemeris, and keyframe pixel intensities; similarly, **η**_{T} denotes the overall measurement noise;
– **C**_{T} denotes the linearized overall measurement model that represents the sub-graph optimization, such that **C**_{T} = *diag*{*C*_{t−T},…,*C*_{t}}. Here, *C*_{t} denotes the measurement model at the *t*-th time instant and vertically stacks the Jacobians associated with the GPS pseudoranges, denoted by *H*, the vehicle motion and satellite orbital models, denoted by *A*, and the non-sky pixel intensities, denoted by *B*, such that *C*_{t} = [*H*^{⊤} *A*^{⊤} *B*^{⊤}]^{⊤};
– **f**_{T} denotes the overall fault vector associated with the overall measurement vector and, thereby, stacks the measurement faults obtained from the individual sensor sources and across the time instants from *t*−*T* to *t*.
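The stacking of *C*_{t} and the block-diagonal assembly of **C**_{T} described above can be illustrated with list-based matrices. The shapes below are toy values; in the actual framework, *H*, *A*, and *B* come from the linearization.

```python
def vstack(*blocks):
    """Vertically stack matrices represented as lists of rows."""
    return [row for block in blocks for row in block]

def block_diag(blocks):
    """Place each matrix along the diagonal, zero-padding elsewhere."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, col = [], 0
    for b in blocks:
        ncols = len(b[0])
        for row in b:
            out.append([0.0] * col + row + [0.0] * (total_cols - col - ncols))
        col += ncols
    return out

# Toy per-epoch Jacobians: H for pseudoranges, A for the vehicle/satellite
# motion models, B for non-sky pixel intensities (2 states per epoch here).
H = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]
B = [[2.0, 0.0]]
C_t = vstack(H, A, B)          # C_t stacks H, A, B vertically
C_T = block_diag([C_t, C_t])   # two epochs along the diagonal
```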

Re-writing Equations (7) and (8), we express our M-estimator-based Graph-SLAM formulation, which is solved via the Levenberg-Marquardt (LM) algorithm (Ranganathan, 2004), as a weighted nonlinear LS problem in Equation (12).

(12)

where

– **K**_{T} denotes the overall estimation matrix, such that **K**_{T} = *diag*{*K*_{t−T},…,*K*_{t}}. Here, *K*_{t} denotes the estimation matrix at the *t*-th time instant and *V* denotes the corresponding pseudo-inverse matrix;
– *β* denotes the damping factor of the LM algorithm, which is set during initialization, described later in Section 3.2;
– *S*_{t}, *R*_{t}, *P*_{t} denote the M-estimator-based weight functions for the GPS pseudoranges, the motion models of the vehicle and satellites, and the non-sky pixel intensities, respectively, evaluated at the current state estimate. Details regarding the weight function of an M-estimator are explained in our prior work (Bhamidipati & Gao, 2018).
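As a sketch of such weight functions, the standard Huber and Tukey bisquare weights are shown below. The tuning constants (1.345 and 4.685) are the conventional textbook choices for normalized residuals and not necessarily those used in the prior work.

```python
def huber_weight(r, c=1.345):
    """Huber M-estimator weight: 1 for small residuals, c/|r| beyond c."""
    return 1.0 if abs(r) <= c else c / abs(r)

def tukey_weight(r, c=4.685):
    """Tukey bisquare weight: smooth down-weighting, exactly 0 beyond c,
    so gross outliers are fully rejected."""
    if abs(r) > c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2

w_small_h = huber_weight(0.5)   # inlier keeps full weight
w_large_h = huber_weight(10.0)  # outlier down-weighted, never zeroed
w_large_t = tukey_weight(10.0)  # outlier fully rejected
```

In the paper's setup, the Huber weight (which never reaches zero) is the safer choice during initialization with few landmarks, while the bisquare weight gives better accuracy once redundancy is available.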

Next, we define the overall test statistic, denoted by *ζ*, as the summation of the weighted squared residuals across all the measurements. We assume that the overall test statistic is central chi-square distributed under non-faulty conditions and non-central chi-square distributed in the presence of GPS faults, vision faults, or both.

(13)

with

(14)

where *k* denotes the number of redundant measurements, i.e., the difference between the number of overall measurements, denoted by *n*, and the number of overall states, denoted by *l*, such that *k* = *n*−*l*. *λ* indicates the non-centrality parameter associated with the overall test statistic during faulty conditions.

According to the worst-case failure mode slope analysis (Salós et al., 2013), as seen in Figure 5, the protection level is calculated as the projection in the position domain of the measurement faults that would generate a non-centrality parameter *λ* = *λ*_{th} in the overall test statistic *ζ* with the maximum slope. In particular, the non-centrality parameter *λ*_{th} is estimated from the false-alarm rate, denoted by *p*_{FA}, and the mis-detection rate, denoted by *p*_{MD}, which are set according to the pre-defined integrity requirements.

We formulate the overall measurement fault mode, denoted by **b**_{T}, where **b**_{T} = [**b**_{t−T},…,**b**_{t}]^{⊤}. Here, **b**_{t} is the measurement fault mode, which is estimated in Equation (15) using the GPS and vision fault statuses obtained via Equations (9) and (10). For this, we consider a pre-defined fault threshold, denoted by *κ*, such that if the fault status is above *κ*, the measurement is flagged as faulty in the computation of protection levels. Given that we consider measurement faults in only GPS and vision, the fault status of the receiver and satellite motion models is set to zero. However, the corresponding fault vector, which comprises the exact measurement fault magnitudes, is still unknown. According to Salós et al. (2013), for a given fault mode, the worst-case fault direction that maximizes the integrity risk is the one that maximizes the failure mode slope, which is seen in Figure 5 and is denoted by **g**_{b}.

(15)
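A minimal sketch of this thresholding follows, assuming the fault statuses are simply stacked with the motion-model entries appended as zeros; the exact stacking order of Equation (15) is not reproduced here.

```python
def fault_mode(gps_status, vision_status, kappa=0.65):
    """Flag measurements whose fault status exceeds kappa (sketch of
    Eq. (15)). Receiver and satellite motion-model entries are appended
    as zeros, since they are assumed fault-free in this work."""
    statuses = list(gps_status) + list(vision_status)
    b_meas = [1 if s > kappa else 0 for s in statuses]
    b_motion = [0, 0]  # motion models, never flagged
    return b_meas + b_motion

# Three pseudorange statuses and two superpixel statuses (toy values).
b_t = fault_mode([0.1, 0.9, 0.4], [0.7, 0.2])
```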

In Equation (16), we define the square of the failure mode slope, denoted by **g**_{b}², as the ratio of the squared state estimation error in the vehicle position, denoted by *ε*, over the squared overall test statistic *ζ*. Using the linearized equations seen in Equations (11)-(13), we derive the failure slope for the sub-graph optimization framework in terms of the unknown fault vector. For this, we consider **C**_{T}**K**_{T}**C**_{T} to be negligible, which is a valid approximation after the iterative convergence of the graph optimization at any time instant, since *β* ≪ 1. Considering *n*_{b} to be the number of non-zero entries in the fault mode **b**_{T} estimated via the multiple FDI module, we define the fault matrix, denoted by *B*_{b}, as *B*_{b} = [**I**_{n_b×n_b}, **O**_{(n−n_b)×n_b}]^{⊤}, where **I**_{n_b×n_b} is an identity matrix of size *n*_{b}×*n*_{b} and **O**_{(n−n_b)×n_b} is a zero matrix of size (*n*−*n*_{b})×*n*_{b}. Note that **I** here is different from the intensity vector **I** defined earlier in Equation (2). We re-arrange the rows of the *m*_{ε} and *M*_{ζ} matrices such that the faulty measurements are stacked first. Thereafter, we define a transformed fault vector, denoted by **f**_{ζ}, such that **f**_{T} = *B*_{b}*M*_{ζ}**f**_{ζ}. Based on these steps, we describe the failure slope formulation of the Graph-SLAM framework in Equation (16).

(16)
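The fault matrix construction can be sketched directly from its definition: an identity block over the *n*_{b} flagged entries stacked above a zero block.

```python
def fault_matrix(n, n_b):
    """B_b = [I_{n_b x n_b}; O_{(n-n_b) x n_b}]: maps an n_b-vector of
    fault magnitudes onto the (row-reordered) n-vector of measurements,
    where the faulty measurements have been stacked first."""
    return [[1.0 if (i == j and i < n_b) else 0.0 for j in range(n_b)]
            for i in range(n)]

B_b = fault_matrix(n=5, n_b=2)  # 5 measurements, 2 flagged faulty
```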

where *α* extracts the state of interest from the overall state vector *θ*, such that *α*^{⊤} = [**O**_{1×(l−1−l_a)}, 1, **O**_{1×l_a}], where *l*_{a} indicates the number of states before the state of interest. *M*_{ζ} denotes the residual matrix and *m*_{ε} represents the state gain matrix.

Referring to Joerger et al. (2014), for a given fault mode but unknown fault vector, the worst-case failure slope equals the maximum eigenvalue of the corresponding failure slope formulation. Therefore, we express the worst-case failure slope of the Graph-SLAM framework as
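Since the worst-case slope is the maximum eigenvalue of the failure slope formulation, it can be computed for a small symmetric example with plain power iteration. This is a sketch; production code would use a linear algebra library, and the toy matrix below stands in for the actual slope matrix.

```python
def power_iteration(M, iters=500):
    """Largest-magnitude eigenvalue of a symmetric matrix (lists of
    rows) via power iteration with max-norm scaling."""
    n = len(M)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w) or 1.0
        v = [x / norm for x in w]
        lam = norm
    return lam

# Toy symmetric "failure slope" matrix; its eigenvalues are 3 and 1.
M = [[2.0, 1.0], [1.0, 2.0]]
g2_worst = power_iteration(M)
```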

(17)

Next, we compute the protection level for the state of interest considered in *α*^{⊤}, seen in Equation (18), as the y-coordinate that corresponds to the integrity metric *λ*_{th} along the line passing through the origin with the worst-case failure slope.

(18)
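The step from slope to protection level can be sketched under one reading of the figure description: the squared position error grows linearly in the non-centrality parameter with slope given by the squared worst-case failure slope, so the protection level is the root of their product at *λ*_{th}. This functional form is our assumption, not the verbatim Equation (18).

```python
import math

def protection_level(g2_worst, lam_th):
    """Sketch of Eq. (18), assuming eps^2 = g^2 * lambda along the
    worst-case failure line: evaluate at lambda_th and take the root."""
    return math.sqrt(g2_worst * lam_th)

# Toy values: squared worst-case slope of 4 and lambda_th of 30.25.
pl = protection_level(g2_worst=4.0, lam_th=30.25)
```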

## 3 EXPERIMENT RESULTS

We analyze the performance of the proposed multi-sensor SLAM-based IM algorithm that utilizes a GPS receiver and a fish-eye camera. In particular, we validate the improved localization accuracy and tighter protection levels obtained via the proposed multi-sensor SLAM-based IM as compared to GPS-only SLAM-based IM.

### 3.1 Experimental Setup

We utilize a ground vehicle to perform two real-world experiments: one in the alleyway of Stanford, California and the other in the semi-urban area of Champaign, Illinois. Our experimental setup comprises a UBLOX NEO-M8U GPS receiver that is connected to a roof-mounted patch antenna. We also have an upward-facing ELP USB camera with 8-megapixel resolution and fitted with a 180° FOV fish-eye lens. In addition, we utilize the navigation solution estimated by a high-fidelity RTK-GNSS Trimble BD940-INS receiver equipped with a Zephyr antenna as the ground truth reference. We utilize the Robot Operating System (ROS) to publish the measurements obtained from the GPS receiver and fish-eye camera. We record the data in rosbag files, which are later post-processed to validate the proposed algorithm. We chose ROS as our data collection platform because it provides an easy-to-implement interface to obtain measurements from varied sensors and synchronize them to a common UNIX epoch time. The frame transformations between the body frames of the GPS receiver, fish-eye camera, and ground truth are measured before the start of the experiment and are given as a user-defined input during the initialization of our SLAM-based IM algorithm.

### 3.2 Implementation details

Before we post-process the experimental data, we refer to the existing work (Scaramuzza, Martinelli, & Siegwart, 2006) on omni-directional cameras for executing the calibration procedure. In particular, we capture multiple images of a checkerboard taken from different viewpoints to estimate the intrinsic parameters of the fish-eye camera, namely *f*_{x}, *f*_{y}, *c*_{x}, *c*_{y}, *ξ*, which are defined earlier in Equation (4). These intrinsic parameters map the image plane to a 3D camera frame, where the origin of the camera frame is at the optical center and the *x* and *y* axes are aligned with the image plane. This process also removes the distortion caused in the images by the wide FOV.
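The intrinsic parameters *f*_{x}, *f*_{y}, *c*_{x}, *c*_{y}, *ξ* correspond to the unified omnidirectional camera model; a minimal projection sketch under that model (ignoring additional lens distortion terms) is given below.

```python
import math

def project_unified(X, fx, fy, cx, cy, xi):
    """Project a 3D camera-frame point to pixel coordinates under the
    unified omnidirectional model: the point is first mapped to the unit
    sphere, then viewed from a center offset by xi along the optical axis."""
    x, y, z = X
    rho = math.sqrt(x * x + y * y + z * z)
    denom = z + xi * rho
    u = fx * x / denom + cx
    v = fy * y / denom + cy
    return u, v

# A point on the optical axis projects to the principal point (cx, cy).
# The intrinsic values below are illustrative, not the calibrated ones.
u, v = project_unified((0.0, 0.0, 5.0), fx=300.0, fy=300.0,
                       cx=320.0, cy=240.0, xi=0.9)
```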

As explained earlier in Section 2, we perform initialization in two stages: one in open-sky and the other in an urban setting. In the first stage of open-sky initialization, we create a 3D world map using the PVT of the receiver and GPS satellites, which are estimated via weighted LS (Salós et al., 2013). Here, the world frame represents the Earth-Centered Earth-Fixed (ECEF) frame in which the GPS operates. We also empirically analyze the pseudorange residuals to independently compute the non-faulty Gaussian distribution of all visible GPS satellite signals. We estimate the vehicle motion inputs by adding simulated measurement noise to the ground truth velocity measurements. This is justified because, as explained earlier, in this work, we consider no faults/anomalies in the vehicle motion model and satellite ephemeris. However, for readers interested in accounting for faults in the vehicle motion model, the motion inputs can be extracted from an IMU sensor and the fault mode vector **b**_{t}, described in Equation (15), can be modified accordingly. The second stage of initialization is executed after the vehicle enters the semi-urban/urban surroundings. We process multiple instances of short-temporal-baseline images to select a reference keyframe. Based on the keyframe, we compute the extrinsic parameters, i.e., the frame transformation from the camera to the world frame.
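The simulated motion input can be sketched as follows; the noise level `sigma` and the seed are illustrative assumptions, not values from the experiments.

```python
import random

def simulate_motion_input(gt_velocity, sigma=0.05, seed=42):
    """Emulate odometry by corrupting ground-truth velocities (m/s)
    with zero-mean Gaussian noise of assumed standard deviation sigma."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in gt_velocity]

noisy = simulate_motion_input([1.0, 1.2, 1.1])
```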

We utilize a combination of the Huber M-estimator (Huber, 2004) and the Tukey bisquare M-estimator (Hoaglin, 2003) to formulate the cost function. The justification regarding our choice of M-estimator is explained in our prior work (Bhamidipati & Gao, 2018). During initialization, and when the total number of landmarks is ≤ 15, we utilize the Huber M-estimator to ensure convergence; in all other conditions, we opt for the bisquare M-estimator to achieve better accuracy. Detailed explanations regarding the M-estimator equations and the corresponding weight functions are given in our prior work (Bhamidipati & Gao, 2018). Based on heuristic analysis, we set the parameters of the different modules in the proposed multi-sensor SLAM-based IM as follows: vision coefficients as *b*_{LOS} = 2, *b*_{NLOS} = 3, *a*_{LOS} = 5, and *a*_{NLOS} = 50; number of superpixels as Γ = 50; damping factor of the LM algorithm as *β* = 0.02. To minimize mis-detection rates, we set a conservative value of *η* = 0.7 for the pre-defined sky threshold described in Equation (3). Also, the fault status threshold of our multi-sensor SLAM-based IM is set as *κ* = 0.65.

### 3.3 Experiment-1: Alleyway in Stanford, California

The first set of experiments is conducted in an alleyway in Stanford, California, where the ground vehicle drove along the street for a duration of 45 s. As seen in Figure 6, the entire duration of the experiment can be divided into three urban contexts: first, the magenta highlighted region during *t* < 8 s that represents relatively open-sky conditions; second, the red highlighted region between *t* = 8‒25 s where the ground vehicle passes under an overhead bridge; and third, the blue highlighted region for *t* > 25 s that indicates a semi-urban setting with feature-rich visual surroundings. The aim of these experiments is to analyze the convergence trend of the proposed multi-sensor SLAM-based IM algorithm and its performance in the presence of different urban challenges.

In Figure 7, we demonstrate the improved performance of the proposed algorithm that utilizes the GPS and fish-eye camera. Our first and second stages of initialization, explained earlier in Section 3.2, are carried out until *t* = 2 s. We observe that the weighted LS approach using GPS-only shows large localization errors both when the ground vehicle passes under the overhead bridge and when it is surrounded by tall buildings. Unlike the LS approach, which shows a mean of 19.51 m and standard deviation of 9.12 m, the proposed multi-sensor SLAM-based IM algorithm demonstrates a low mean of 6.14 m and standard deviation of 0.97 m. We also demonstrate robust protection levels that accurately bound the localization errors for the entire duration. The convergence of our protection levels is achieved at around *t* ≈ 7 s. The size of the protection level shows a low mean of 6.9 m and standard deviation of 2.44 m. Here, the size of the protection level is indicated in Figure 7 by the gap between the black line, which denotes the protection level, and the blue line, which denotes the localization error of SLAM-based IM.

### 3.4 Experiment-2: Semi-urban area of Champaign, Illinois

The second set of experiments is conducted in the semi-urban area of Champaign, Illinois for a duration of 100 s. During *t* = 0‒70 s, the ground vehicle operates in open-sky conditions, thereby experiencing no GPS faults but fewer visual features. In Figure 8, the blue highlighted region suffers from vision challenges, namely illumination variations due to sunlight and shadows, that cause data association errors across images. Similarly, the red highlighted region is enclosed by tall buildings that lead to multipath effects in the GPS measurements. For instance, at *t* = 78 s, we showcase the overlap of the GPS satellite positions on the fish-eye image, where three of the seven visible satellites are likely to be affected by multipath.

Figure 9 shows the average fault status of GPS pseudoranges and vision superpixels, indicated in red and blue, respectively. Given that the ground vehicle navigates in open-sky conditions for *t* < 70 s, the average GPS fault status estimated via our multiple FDI module is low, whereas the average vision fault status is high due to the feature-less surroundings. As the vehicle passes through the red highlighted region shown in Figure 8, which represents the semi-urban area, the average fault status of vision is low but that of GPS increases due to multipath. Next, we further analyze the performance of our multiple FDI modules in the challenging semi-urban area, i.e., for *t* > 70 s, during which the ground vehicle experiences GPS and vision faults. Figure 10(a) plots the individual GPS fault status of three of the seven visible satellites, with PRNs 6, 12, and 2. In accordance with the skyplot shown in Figure 8, our proposed SLAM-based IM algorithm successfully flags the satellites with PRNs 6 and 12 as faulty while accurately estimating the high-elevation satellite with PRN 2 as non-faulty. A key point to note is that we consider the fault status to simply indicate the confidence in the reliability of the landmark-measurement pair. Based on our earlier discussion related to Figure 1, the fault status does not flag the measurement based on whether it is a reflected signal or not, but on the relevance of the information provided by the measurement. During the same duration, we also analyze the vision fault status associated with the superpixels. In Figure 10(b), at each time instant, we plot the top four fault statuses of the superpixels, such that each marker represents a superpixel. We observe that in the urban region, the value of the associated vision fault status decreases due to the feature-rich tall buildings. However, when the vehicle enters the blue highlighted region seen in Figure 10(b), the illumination variations induced by the bright sunlight cause the fault status associated with certain superpixels to show an increasing trend.

We demonstrate the improved performance of the SLAM-based IM algorithm that utilizes GPS and fish-eye camera, seen in Figure 11(a), as compared to the SLAM-based IM algorithm that utilizes GPS-only, seen in Figure 11(b). By utilizing GPS and fish-eye camera, we demonstrate higher localization accuracy, with a mean of 8.8 m and standard deviation of 1.73 m, as compared to employing GPS-only, which shows a mean of 16.2 m and standard deviation of 2.86 m. We also validate that the mean size of the protection levels estimated using GPS and fish-eye camera, i.e., 6.5 m, is lower than that using GPS-only, i.e., 10.5 m, thereby achieving tighter protection levels. Here, the size of the protection level denotes the gap between the black line, which indicates the protection level, and the blue line, which indicates the localization error associated with SLAM-based IM.

In Figure 11(a), we also compare the localization accuracy of the proposed multi-sensor SLAM-based IM, indicated by the blue line, with that of a multi-sensor SLAM-based IM that excludes GPS satellites as landmarks, indicated by the green line. To perform this analysis, instead of considering the GPS satellite state vector **y**^{k} as unknown in the overall state vector *θ*, we consider it to be known information that is computed via satellite ephemeris. We observe that by excluding GPS satellites as landmarks, we achieve localization errors that are better than GPS-only Graph-SLAM but equal to or worse than multi-sensor Graph-SLAM. This is justified because, as explained earlier in Section 1.1, not considering GPS satellites as landmarks potentially lowers the available measurement redundancy in the presence of multipath effects.

## 4 CONCLUSIONS

We proposed a Simultaneous Localization and Mapping (SLAM)-based Integrity Monitoring (IM) algorithm using a GPS receiver and fish-eye camera that estimates the protection levels of the Graph-SLAM framework while accounting for multiple faults in GPS and vision. We developed a hybrid sky detection algorithm to distinguish the non-sky and sky pixels, which are later used in graph optimization and the GPS measurement covariance, respectively. By utilizing the GPS pseudoranges, non-sky pixel intensities, and receiver and satellite motion models, we performed graph optimization via the M-estimator-based Levenberg-Marquardt algorithm. We simultaneously estimated the state vector of the vehicle, GPS satellites, and key image pixels in the world frame. We estimated the fault mode vector by independently evaluating the measurement residuals against an empirical Normal distribution for GPS faults and using our developed superpixel-based piecewise RANSAC for vision faults. We computed the protection levels via the worst-case failure slope analysis that estimates the maximum eigenvalue associated with the failure slope formulation for the batch realization of the linearized Graph-SLAM framework.

We conducted two real-world experiments using a ground vehicle: one in an alleyway in Stanford, California and the other in a semi-urban area in Champaign, Illinois. In the first set of experiments, we demonstrated the robust performance of the proposed SLAM-based IM algorithm under different urban conditions, namely open-sky, overhead bridge, and dense building infrastructure. In the second set of experiments, we successfully detected and isolated multiple measurement faults in GPS and vision. We demonstrated higher localization accuracy using our proposed algorithm, with a mean of 8.8 m and standard deviation of 1.73 m, as compared to GPS-only, which shows a mean of 16.2 m and standard deviation of 2.86 m. We also validated that the mean size of the protection levels estimated using GPS and fish-eye camera, i.e., 6.5 m, is lower than that using GPS-only, i.e., 10.5 m.

## HOW TO CITE THIS ARTICLE

Bhamidipati S, Gao GX. Integrity monitoring of Graph-SLAM using GPS and fish-eye camera. *NAVIGATION*.2020;67: 583–600. https://doi.org/10.1002/navi.381

## Footnotes

**Funding information** National Science Foundation, Grant/Award Number: 2006162

- Received October 7, 2019.
- Revision received March 1, 2020.
- Accepted March 27, 2020.

- © 2020 Institute of Navigation

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.