Abstract
Smartphone positioning based on global navigation satellite systems is crucial for various applications, including navigation, emergency response, and augmented and virtual reality. Despite significant advancements, constraints on size, weight, power consumption, and cost still pose challenges, leading to degraded accuracy in challenging urban settings. To improve smartphone positioning accuracy, we introduce a novel framework that deeply couples a graph neural network (GNN) with a learnable backpropagation Kalman filter. This hybrid approach combines the strengths of both model-based and data-driven methods, enhancing adaptability in complex urban settings. We further augment the measurement modeling capabilities of the GNN with extended features, a novel edge creation technique, and an inductive graph learning framework. Additionally, we implement a unique backpropagation strategy that uses real-time positioning corrections to refine the performance of both the GNN and the learned Kalman filter. We validate our algorithm on real-world data sets collected via smartphone receivers in urban environments and demonstrate improved performance over existing model-based and learning-based approaches.
1 INTRODUCTION
Improving global navigation satellite system (GNSS) smartphone accuracy is crucial for enhancing location-based services such as navigation, emergency response, and tracking applications. Improved accuracy can also revolutionize user experiences for emerging industries such as augmented- and virtual-reality-based gaming environments. Although smartphone receivers have made great strides in terms of accuracy and reliability, achieving high-precision positioning still presents challenges.
Under multipath conditions, phone receivers can currently offer only 3–5 m of positioning accuracy, which degrades further in harsh environments (Zangenehnejad & Gao, 2021). Because of the low quality of chipset and antenna technology in smartphones, the received signals are noisier than those of traditional GNSS receivers, leading to an increased number of outliers in measurements (Li & Geng, 2019).
To overcome these challenges, recent efforts have been reported from multiple industries. For example, Google has conducted the Smartphone Decimeter Challenge (GSDC) for the last three years to encourage the development of algorithms that can provide decimeter-level accuracy in urban environments (Fu et al., 2020). Some smartphone manufacturers have begun to include dual-frequency pseudorange and carrier phase signal capabilities (L1 and L5) in their receivers (Jahn et al., 2019; Yong et al., 2021). Additionally, semiconductor companies have increased efforts to fuse 5G-based positioning with GNSSs to provide high accuracies in both urban areas and indoor settings (Rubin, 2021).
Given the recent impetus in the industry to improve smartphone positioning accuracy, many works have proposed post-processing algorithms that can improve accuracy to the decimeter level without being limited by size, cost, weight, or power constraints. While there are many approaches for integrating GNSSs with complementary sources of information such as three-dimensional (3D) maps, Wi-Fi signals, and cellular data, we consider those that use only GNSS measurements. These works can be divided into two categories: model-based methods and data-driven methods.
Common choices of model-based methods include precise point positioning (PPP), real-time kinematics (RTK), Kalman filters, and factor graphs. Fortunato et al. (2019) achieved high accuracy in dynamic applications by integrating PPP and RTK methods, utilizing multi-frequency measurements from multiple smartphones. In their study, Geng et al. (2019) demonstrated the potential of Hatch filters to attain sub-meter positioning accuracy, leveraging only Android raw GNSS measurements without requiring external correction data. The winner of the GSDC for the past two years designed a factor graph optimization method using corrected pseudorange observations from GNSS reference stations as constraints, achieving nearly meter-level positioning accuracy (Suzuki, 2021). Despite these advances, the key limitation of model-based approaches is that they can achieve high positioning accuracy only in certain environments because of the strong assumptions imposed on the noise models and the specifically tuned hyperparameters.
In contrast, data-driven methods can be adapted to different environments and GNSS signal conditions. These methods can be designed to be robust to multipath and signal attenuation and can be scaled to large data sets. Siemuri et al. (2021) investigated various machine learning algorithms, including linear regression, Bayesian ridge regression, and artificial neural networks, to predict positioning corrections for GSDC data sets. Xia et al. (2020) employed a recurrent neural network to identify complex urban scenarios and enhance positioning performance through scenario recognition. Zhang & Masoud (2021) employed reinforcement learning to develop an optimal strategy for making Global Positioning System corrections, utilizing an efficient confidence-based reward mechanism. Kanhere et al. (2022) explored an application of the transformer architecture, an increasingly prominent deep learning tool, to improve navigation accuracy in urban settings. Although these learning-based methods showed promising results, their accuracy on unseen data sets was often lower than that of model-based methods because of overfitting on the training data.
Instead of relying on either model-based or learning-based methods alone, we can combine these two approaches to improve accuracy, robustness, and adaptability. A few works in the literature have explored such a design. Zhang et al. (2018) developed a novel deep reinforcement learning (DRL)-based framework to improve GNSS positioning corrections by fusing vehicle trajectory information with GNSS measurements. Their approach employed an actor–critic DRL structure with a cumulative reward setting to learn an optimal positioning correction strategy. Another work augmented the geometry matrix with data-driven components to improve positioning performance in scenarios that deviate from the modeling assumptions (Gupta et al., 2022). In our prior work, we proposed a hybrid framework using a graph neural network (GNN) and a learnable backpropagation Kalman filter (BKF) to learn positioning corrections (Mohanty & Gao, 2022). While the GNN learned position corrections by aggregating measurement residuals from all satellites, the Kalman filter provided better initialization and helped the GNN learn only finer position corrections. In the GNN, we represented different nodes using satellite positions and constructed edges by utilizing known signal properties of multi-constellation and multi-frequency measurements. Although our prior work showed improvements over purely model-based and learning-based methods, there were three key limitations. First, the GNN could not adapt to dynamic graph structures, which is crucial in urban environments, as the receiver may fail to track certain satellites. Second, the Kalman filter had to be manually tuned to different underlying measurement noise distributions and was not adaptive to data sets in diverse urban environments. Lastly, the GNN and Kalman filter were optimized separately; thus, the GNN accuracy had an upper bound based on how well we could tune the Kalman filter.
In this work, we propose a framework to further improve existing hybrid smartphone positioning approaches by deeply coupling a GNN with a learned Kalman filter. This approach addresses the three key limitations of our prior work. To address the first limitation, namely that our prior GNN could not adapt to dynamic graphs, we extend the feature set of the GNN to provide more information about the GNSS measurements, propose a novel edge creation technique, and utilize an inductive graph learning method. Extending the feature set enhances the adaptability of the GNN to changing graph structures, improving its robustness to real-world, dynamic data. The novel edge creation technique helps the GNN discover hidden relationships and patterns from GNSS measurements. Additionally, the inductive learning method aids the GNN in generalizing patterns from training data to new and unforeseen data sets. Generalization capability is crucial when we lack precise knowledge about future graph structures. To tackle the second limitation of manually tuning the Kalman filter, we introduce a learnable Kalman filter that updates its state estimates and parameters through backpropagation. This approach optimizes both the state and parameters based on incoming data. As a result, the filter continually refines its predictions, ensuring improved position estimates across diverse urban environments and varying measurement errors. By jointly training both components, we address the third limitation of individual tuning for the GNN and the Kalman filter. This approach establishes a continuous feedback loop and updates the parameters of both components to improve future state predictions.
Our paper is based on our recent Institute of Navigation GNSS+ 2023 conference submission (Mohanty & Gao, 2023). The key contributions of our paper are as follows:
We propose a hybrid framework to learn the position of a smartphone receiver from GNSS measurements by deeply coupling a GNN and a learned Kalman filter.
We increase the measurement modeling capabilities of the GNN by using an extended feature set, designing a novel method for edge creation based on similarity scores, and applying inductive learning to train the GNN.
We design a strategy that uses the learned position to backpropagate positioning corrections and improve the performance of both the learned Kalman filter and the GNN.
We validate our proposed algorithm on multiple real-world GSDC data sets.
2 PROPOSED TIGHT COUPLING FRAMEWORK WITH A GNN AND LEARNED BKF
Figure 1 presents an overview of our proposed approach. Our framework integrates a GNN and BKF to estimate a receiver’s position with high accuracy in multiple environments. Because the BKF has learnable state parameters, our model can continually refine its state estimates through backpropagation. The merit of our design lies in its deep fusion of the spatial awareness provided by GNNs with the temporal insight offered by Kalman filters. GNNs excel at understanding complex spatial relationships, making them particularly effective for handling GNSS data represented as a graph, whereas Kalman filters are commonly used for their real-time state estimation capabilities based on dynamic system behavior. By integrating these two approaches, our framework provides high accuracy in multiple situations, such as urban settings and open-sky conditions. In our framework, instead of using a traditional Kalman filter, we design a learned Kalman filter in which the filter parameters can be updated iteratively via a mean squared error (MSE) loss function. Gradients of the loss are passed to both the GNN and the learnable parameters of the Kalman filter, enabling joint optimization that improves future state estimates.
Similar to KalmanNet (Revach et al., 2021), we use the traditional predict–update cycle inherent to Kalman filtering, enhancing this process with neural network integration. This approach incorporates a learning paradigm that leverages data to refine estimation accuracy, a natural consequence of the neural network’s capacity to approximate nonlinear dynamics. Like KalmanNet, which utilizes ground truth data to learn the Kalman gain, our framework uses ground truth during training to learn the true positional corrections, thereby optimizing the GNN parameters.
The key difference is that our architecture integrates GNN layers to ensure that the learnable Kalman filter is capable of handling graph-structured data, whereas the data used to train the KalmanNet are regular and may not be graph-structured. Whereas KalmanNet relies on a recurrent neural network module to address the challenges of partially known dynamics, our architecture uses the BKF as a smoothing tool to iteratively refine the GNN parameters. Rather than learning directly from the data, our framework uses the gradients from the loss function to iteratively adjust the GNN parameters. This iterative refinement leads to progressively more accurate state estimations from the Kalman filter, benefiting from the ability of the GNN to improve positioning corrections.
It is important to note that this paper does not aim to learn the covariance, unlike the KalmanNet approach, where the covariance is also learned from the neural network. Our current focus is on learning the positional corrections, and we leave the exploration of covariance learning for future work. Furthermore, whereas the KalmanNet design is geared towards training neural network approximations of the Kalman gain, our framework can be modified to integrate gradient-based learning of other components within the framework.
We describe each module in detail below.
2.1 Feature Preprocessing
Given Android GNSS measurements, we first select satellites by thresholding the carrier frequency of the tracked signal and the elevation angle, thus eliminating unreliable signals. Then, we reduce the noise in pseudorange measurements by using carrier phase smoothing with a Hatch filter after accounting for various error sources, such as clock biases and ionospheric and tropospheric delays. We use a window of two measurements from the accumulated delta range measurement field of the Android application programming interface. In the event of a cycle slip, we use Doppler values to smooth the code phase measurements. A bitwise check of the state flags reported by the receiver indicates the presence of a cycle slip.
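The carrier smoothing step can be sketched as follows. This is a minimal illustration, assuming the pseudoranges and accumulated delta range (ADR) values are both in meters and that a per-epoch cycle-slip flag is available. The function name `hatch_filter`, its arguments, and the choice to restart from the raw pseudorange after a slip are assumptions made for illustration; the Doppler-based fallback described above is not shown.

```python
def hatch_filter(pseudoranges, adr, cycle_slip, window=2):
    """Carrier-smooth a pseudorange series for one satellite.

    pseudoranges, adr: per-epoch values in meters (hypothetical inputs).
    cycle_slip: per-epoch booleans; True restarts the smoothing.
    window: maximum number of epochs averaged (2, as in the text).
    """
    smoothed = []
    n = 0  # number of epochs in the current smoothing window
    for k, rho in enumerate(pseudoranges):
        if k == 0 or cycle_slip[k]:
            # No usable carrier history: fall back to the raw code measurement.
            n = 1
            smoothed.append(rho)
            continue
        n = min(n + 1, window)
        # Propagate the previous smoothed range with the carrier-phase change.
        predicted = smoothed[-1] + (adr[k] - adr[k - 1])
        smoothed.append(rho / n + predicted * (n - 1) / n)
    return smoothed
```

With a window of two, each smoothed value is the average of the current code measurement and the previous smoothed value advanced by the carrier-phase change.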
We then use weighted least squares to estimate the user’s position and velocity based on the corrected pseudorange and pseudorange rate. The position and velocity are used to initialize the Kalman filter and the GNN for the first time step of the training phase. We then use this initialization to compute the features for the GNN. Note that feature preprocessing is performed at each time step for each new set of GNSS measurements.
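A minimal sketch of the weighted least squares initialization, assuming Earth-centered satellite positions and corrected pseudoranges in meters. It estimates position and receiver clock bias only; the velocity solution from pseudorange rates follows the same pattern and is omitted. The function name `wls_position`, the zero initial guess, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def wls_position(sat_pos, pseudoranges, weights, iterations=10):
    """Gauss-Newton weighted least squares for receiver position and clock bias.

    sat_pos: (N, 3) satellite positions in meters.
    pseudoranges: (N,) corrected pseudoranges in meters.
    weights: (N,) per-measurement weights (e.g., inverse variances).
    Returns a 4-vector [x, y, z, clock_bias_m].
    """
    state = np.zeros(4)
    W = np.diag(weights)
    for _ in range(iterations):
        delta = sat_pos - state[:3]
        ranges = np.linalg.norm(delta, axis=1)
        residuals = pseudoranges - (ranges + state[3])
        # Geometry matrix: negative unit LOS vectors plus a clock column.
        G = np.hstack([-delta / ranges[:, None], np.ones((len(ranges), 1))])
        step = np.linalg.solve(G.T @ W @ G, G.T @ W @ residuals)
        state = state + step
    return state
```

The residuals computed inside the loop are the same quantity that later serves as a node feature for the GNN.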
2.2 Learned Kalman Filter
In machine learning, backpropagation is an optimization algorithm that minimizes the error in model predictions. The principle behind backpropagation is that the error is computed at the output and distributed back through the network layers, allowing the weight of each neuron to be adjusted to reduce the overall error.
For a neural network function $f(x; w)$ with weights $w$ and input $x$, the objective is to minimize the loss function $L(w)$. The update rule for the weights, using gradient descent, is as follows:
$$w \leftarrow w - \eta \nabla_w L(w) \tag{1}$$
where η is the learning rate.
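As a small worked example of this update rule, the sketch below applies it to a toy scalar loss. The loss $L(w) = (w - 3)^2$, the learning rate, and the step count are illustrative assumptions and are not taken from our training setup.

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    """Repeatedly apply w <- w - eta * dL/dw for a scalar weight."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w

# Toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Each step shrinks the distance to the minimizer $w = 3$ by a factor of $1 - 2\eta = 0.8$, so `w_star` ends up very close to 3.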
The Kalman filter is a recursive state estimation algorithm, introduced in the 1960s by Rudolf E. Kálmán to solve the problem of linear filtering and prediction. The Kalman filter operates in two main steps: prediction and update. The filter estimates a state $x_t$ from a system model and then refines this estimate based on an observation $z_t$.
Here, we are given a state transition model $x_t = A x_{t-1} + B u_t + w_t$ and an observation model $z_t = H x_t + v_t$,
where:
$x_t$ and $x_{t-1}$ are the states at times $t$ and $t - 1$, respectively.
$A$ is the state transition matrix.
$B$ is the control matrix.
$u_t$ is the control input.
$H$ is the observation matrix.
$w_t$ and $v_t$ are the process and observation noises, which are assumed to be Gaussian.
The two main steps are as follows:
Prediction:
$$\hat{x}_{t|t-1} = A \hat{x}_{t-1|t-1} + B u_t \tag{2}$$
$$P_{t|t-1} = A P_{t-1|t-1} A^{\top} + Q \tag{3}$$
where $\hat{x}_{t|t-1}$ represents the predicted state at time $t$ based on all information up to time $t - 1$. The matrix $A$ is the state transition matrix, $B$ is the control input matrix, and $u_t$ is the control vector. The term $P_{t|t-1}$ is the predicted error covariance, and $Q$ is the process noise covariance matrix.
Update:
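$$K_t = P_{t|t-1} H^{\top} \left( H P_{t|t-1} H^{\top} + R \right)^{-1} \tag{4}$$
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \left( z_t - H \hat{x}_{t|t-1} \right) \tag{5}$$
$$P_{t|t} = \left( I - K_t H \right) P_{t|t-1} \tag{6}$$
where $K_t$ is the Kalman gain, calculated based on the predicted error covariance $P_{t|t-1}$ and the measurement matrix $H$. The term $R$ is the measurement noise covariance. The updated state estimate $\hat{x}_{t|t}$ is calculated from the predicted state $\hat{x}_{t|t-1}$, the Kalman gain $K_t$, and the actual measurement $z_t$. Finally, the updated error covariance $P_{t|t}$ is calculated by using the Kalman gain and the predicted error covariance.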
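The prediction and update steps in Equations (2)–(6) can be sketched in NumPy as below. This is the standard (non-learned) filter only; the function names `kf_predict` and `kf_update` are illustrative, and the explicit matrix inverse is used for readability rather than numerical robustness.

```python
import numpy as np

def kf_predict(x, P, A, B, u, Q):
    """Prediction step: propagate the state and its covariance."""
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update step: correct the prediction with measurement z."""
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new, K
```

For example, with an identity measurement model and equal prior and measurement covariances, the gain is one half and the updated state lies midway between the prediction and the measurement.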
Haarnoja et al. (2016) first explored the idea of incorporating backpropagation in Kalman filters, but their focus was on vision-based state estimation tasks rather than GNSS positioning applications. This hybrid approach, commonly referred to as the BKF, offers multiple advantages. The combination allows the Kalman filter to be more adaptive. Traditional Kalman filters rely on fixed system and measurement noise covariances. However, with backpropagation, these parameters can be learned and adapted from data, rendering the filter more responsive to environments for which the measurement noise models are unknown a priori. Lastly, the integration of backpropagation provides a way for the Kalman filter to continuously refine its state estimates.
We initialize the BKF with parameters for the state dimension, process noise, and measurement noise. During the prediction step, the state is updated based on the provided system dynamics, captured by the matrices A (state transition matrix) and B (control input matrix), and the control vector u. The state covariance matrix, P, is also updated accordingly, as shown in Equation (3). During the measurement update, the Kalman filter takes the measurement vector from the GNN layers as its observation, as shown in Figure 2. Based on this observation, the filter updates its internal state and uncertainty, both of which are represented as learnable parameters in PyTorch. As shown in Equation (4), the Kalman gain is computed and used to update the state estimate and its covariance matrix. Here, the measurement model H is set to be the identity matrix. By making the state vector a learnable parameter, our BKF integrates seamlessly with the gradient-based optimization algorithms, making our approach suitable for end-to-end training.
2.3 Incorporating Measurement Updates from the GNN
The GNN module takes processed GNSS features as inputs and transforms them into a lower-dimensional representation suitable for graph-based learning. The transformed data pass through several GNN layers designed to capture spatial correlations between the satellites. GNNs have emerged as a powerful tool for learning graph representations. We provide a brief background on GNNs here, but more details can be found in the original GNN paper (Scarselli et al., 2009) and a recent survey paper (Wu et al., 2020).
2.3.1 Background
GNNs operate on the premise of recursive neighborhood aggregation. Each node gathers information from its neighbors, processes it, and uses it to update its own state. GNNs offer several advantages over other deep learning paradigms. Traditional neural networks are not designed to handle graph data natively, whereas GNNs are specifically tailored for graphs. GNNs can operate on non-Euclidean data structures without the need for any transformation or flattening. GNNs are adept at learning and representing relationships between entities. Some GNN architectures can generalize to unseen data and generate embeddings for previously unseen nodes or entire subgraphs. GNNs can also capture local neighborhood and global graph-level information with a stack of layers. The depth of a GNN determines how far information propagates, allowing nodes to gather insights from distant parts of the graph.
The GNN update rule is given by the following:
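$$h_v^{(l+1)} = \sigma\left( W^{l} \cdot \mathrm{AGGREGATE}\left( \left\{ h_u^{(l)} : u \in \mathcal{N}(v) \right\} \right) \right) \tag{7}$$
where:
$u$, $v$ are nodes in the graph, and $\mathcal{N}(v)$ is the set of neighbors of $v$.
$h_v^{(l+1)}$ is the feature vector of node $v$ at the $(l + 1)$-th GNN layer.
$\sigma$ is an activation function, e.g., rectified linear unit (ReLU).
$W^{l}$ is a trainable weight matrix for the $l$-th layer.
The AGGREGATE function can be as simple as a mean or max function or as complex as an attention mechanism.
$h_u^{(l)}$ is the feature vector of node $u$ at the $l$-th GNN layer.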
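The update rule in Equation (7) can be illustrated with a single layer in NumPy. This is a sketch under stated assumptions: the graph is given as a neighbor dictionary, ReLU is used for σ, and nodes without neighbors are left at zero. The function name `gnn_layer` and its arguments are hypothetical.

```python
import numpy as np

def gnn_layer(h, neighbors, W, aggregate=np.mean):
    """One round of neighborhood aggregation.

    h: (N, d_in) node features at layer l.
    neighbors: dict mapping each node index to a list of neighbor indices.
    W: (d_out, d_in) trainable weight matrix of the layer.
    aggregate: reduction over neighbors, e.g. np.mean or np.max.
    """
    h_next = np.zeros((h.shape[0], W.shape[0]))
    for v in range(h.shape[0]):
        nbrs = neighbors.get(v, [])
        if not nbrs:
            continue  # isolated node: no messages, features stay at zero
        m = aggregate(h[nbrs], axis=0)       # AGGREGATE over u in N(v)
        h_next[v] = np.maximum(0.0, W @ m)   # sigma = ReLU
    return h_next
```

Stacking several such layers lets each node draw on information from nodes several hops away.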
Below, we explain how we improve the measurement modeling capabilities of the GNN by utilizing extended feature sets, novel edge creation techniques, and an inductive graph learning framework.
2.3.2 Extended Feature Set
The selection of features for nodes in the GNN is a critical aspect of the modeling process. Because GNSS measurements are noisy, utilizing the measurements directly may degrade the predictive performance of the GNN. However, by carefully choosing the most relevant features, we can improve the ability of the GNN to discern patterns and relationships within the data, leading to more accurate position correction predictions. Effective feature selection filters out noise, reduces dimensionality, and focuses the attention of the GNN on only features directly impacting the position corrections. For the GNN, we design the following features, as illustrated in Figure 3:
C/N0 indicates the quality of the received signal. A higher C/N0 implies a stronger signal strength and a lower likelihood of interference or multipath errors, providing the GNN with a metric for evaluating signal reliability.
The line-of-sight (LOS) vector gives directional information from the receiver to each satellite, allowing the GNN to understand spatial configurations. Discrepancies between expected and observed ranges can indicate errors or anomalies.
Pseudorange uncertainty provides the GNN with a degree of confidence in the range measurements.
The residuals, or the differences between the measured and expected pseudorange measurements, provide insights into potential errors in measurements or the positioning estimate. The expected range between the satellite and the user is given by the magnitude of their position difference.
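The four feature groups above yield a six-dimensional vector per satellite. A minimal sketch of how such a feature matrix could be assembled is given below; the column order, the function name `node_features`, and the optional receiver clock bias term are assumptions made for illustration.

```python
import numpy as np

def node_features(sat_pos, rx_pos, pseudoranges, cn0, pr_uncertainty,
                  rx_clock_bias_m=0.0):
    """Stack per-satellite features into an (N, 6) array.

    Columns: C/N0, LOS unit vector (3), pseudorange uncertainty, residual.
    rx_clock_bias_m is an assumed extra term; set to 0 if already removed.
    """
    delta = sat_pos - rx_pos
    expected_range = np.linalg.norm(delta, axis=1)
    los = delta / expected_range[:, None]   # receiver-to-satellite unit vectors
    residual = pseudoranges - (expected_range + rx_clock_bias_m)
    return np.column_stack([cn0, los, pr_uncertainty, residual])
```

Because the residuals and LOS vectors depend on the current receiver position estimate, the features are recomputed at every time step.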
2.3.3 Novel Edge Creation Technique
For constructing edges in the GNN, we utilize two design strategies. First, we group satellites based on whether they belong to the same constellation. Second, we adopt the cosine similarity metric to construct additional edges based on the satellite features at the first step, i.e., before the features are transformed into embeddings through the GNN aggregation and fully connected layers. Cosine similarity is fundamentally based on the angle between vectors, irrespective of their magnitude. When constructing a graph from satellite measurements, the magnitude of the vectors (e.g., signal strength) might vary considerably because of multiple factors such as atmospheric disturbances, multipath effects, or satellite clock errors. However, by focusing on the angle, the cosine similarity captures the inherent directionality or pattern in the measurements without being affected by the measurement scale. GNSS measurements share spatial errors; thus, the cosine similarity is apt for identifying and grouping measurements with shared error sources. This approach helps in modeling and correcting these errors within the graph structure.
Given satellite embeddings $s_i$ and $s_j$, the cosine similarity is mathematically expressed as follows:
$$\mathrm{sim}(s_i, s_j) = \frac{s_i \cdot s_j}{\|s_i\| \, \|s_j\|} \tag{8}$$
where · denotes the dot product and ||·|| represents the vector norm. If the similarity surpasses an empirically determined threshold, an edge is generated between the two nodes i and j. This choice of similarity metric leads to a sparser graph, which helps reduce the computational overhead of the GNN.
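The two edge-creation rules can be sketched together as follows. The threshold value of 0.9 is a placeholder, not the empirically determined value used in our experiments, and the function name `build_edges` and the constellation labels are hypothetical.

```python
import numpy as np

def build_edges(features, constellation, threshold=0.9):
    """Return sorted undirected edges (i, j), i < j, from two rules.

    Rule 1: satellites in the same constellation are connected.
    Rule 2: satellites whose feature vectors have cosine similarity
            above `threshold` are connected.
    """
    norms = np.linalg.norm(features, axis=1)
    unit = features / np.clip(norms, 1e-12, None)[:, None]
    sim = unit @ unit.T                      # pairwise cosine similarity
    edges = set()
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            if constellation[i] == constellation[j] or sim[i, j] > threshold:
                edges.add((i, j))
    return sorted(edges)
```

Because the edges are recomputed from whichever satellites are currently tracked, the graph can change size and connectivity from one epoch to the next.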
2.3.4 Inductive Graph Learning
When designing the learning framework for the GNN, we have two key choices: a transductive setting or an inductive setting. Transductive learning makes predictions only for data points that were available during training. It does not generalize to new, unseen data points. Instead, transductive learning aims to optimize predictions specifically for the known instances in the training set. Transductive GNNs make predictions for nodes or data within the existing graph and do not generalize to new data points outside of it. In contrast, in inductive learning, a model is trained on a labeled data set and then generalizes its learned patterns to make predictions on new, previously unseen data points. Inductive learning focuses on learning underlying patterns in the data and can adapt to novel instances beyond the training set. Inductive GNNs can predict outcomes for new nodes or graphs based on their learned understanding of the data structure.
To improve urban GNSS positioning accuracy, it is crucial to use inductive graph learning frameworks for several reasons. GNSS measurements can exhibit considerable variation in urban environments because of factors like tall buildings, signal obstructions, and dynamic surroundings. Because inductive GNNs excel in generalizing patterns learned from training data to previously unseen or dynamically changing data points, we can utilize this ability to predict position corrections in diverse data conditions. Inductive GNNs can also adapt to changing graph structures, accommodating new nodes and edges and thus capturing the loss of tracked satellites. By design, inductive GNNs are more robust against overfitting to specific training data. This characteristic is valuable when dealing with the inherent variability and unpredictability of urban GNSS data, reducing the risk of model performance degradation in real-world conditions.
Among different GNN architectures, we use GraphSAGE in this work, as it offers inductive learning capabilities (Hamilton et al., 2017). After training on a fixed set of graphs, GraphSAGE can generate embeddings for nodes that were not seen during training. This property is especially useful for dynamic graphs, where nodes might be added over time. GraphSAGE introduces a novel neighbor sampling method. Because of the vast number of neighbors a node might have in large-scale graphs, it is computationally challenging to consider all neighbors. Instead, GraphSAGE samples a fixed-size set of neighbors at each depth level. This sampling technique allows the method to be scalable.
We briefly describe how GraphSAGE updates node embeddings for the entire graph given a fixed set of graph nodes with features.
Given a node $v$, GraphSAGE samples neighbors $\mathcal{N}_k(v)$, a set of neighbors from the $k$-th layer.
GraphSAGE then aggregates the features of these neighbors:
$$h_{\mathcal{N}(v)}^{k} = \mathrm{AGGREGATE}_k\left( \left\{ h_u^{k-1}, \forall u \in \mathcal{N}_k(v) \right\} \right) \tag{9}$$
where $h_{\mathcal{N}(v)}^{k}$ is the aggregated feature vector for node $v$ at the $k$-th layer and $\mathrm{AGGREGATE}_k$ is the aggregation function used at the $k$-th layer. This function combines a set of feature vectors into a single feature vector. $h_u^{k-1}$ is the feature vector for node $u$ from the previous layer $(k - 1)$, and $\forall u \in \mathcal{N}_k(v)$ indicates that the aggregation is performed over all neighboring nodes $u$ of node $v$ at the $k$-th layer.
Lastly, the aggregated features are combined with the current node features:
$$h_v^{k} = \sigma\left( W^{k} \cdot \mathrm{CONCAT}\left( h_v^{k-1}, h_{\mathcal{N}(v)}^{k} \right) \right) \tag{10}$$
where $h_v^{k}$ is the feature vector of node $v$ at layer $k$, updated via a nonlinear activation function $\sigma$. The term $W^{k}$ is the weight matrix for layer $k$, and $\mathrm{CONCAT}(h_v^{k-1}, h_{\mathcal{N}(v)}^{k})$ is the concatenation of the feature vector of node $v$ from the previous layer $k - 1$ with the aggregated feature vector at the current layer. The product of the weight matrix $W^{k}$ and the concatenated feature vector is then passed through the activation function $\sigma$ to produce the updated feature vector $h_v^{k}$. This process is repeated for several layers (hops) to generate the final embedding for the node.
Given several choices of aggregator functions, this work focuses on the mean aggregator, which takes the average of the feature vectors of the neighbors. Mathematically, we have the following:
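$$h_{\mathcal{N}(v)}^{k} = \frac{1}{|\mathcal{N}_k(v)|} \sum_{u \in \mathcal{N}_k(v)} h_u^{k-1} \tag{11}$$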
where the notations have the same meaning as described for Equation (7).
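The sampling, mean aggregation, and concatenation steps in Equations (9)–(11) can be sketched as a single NumPy layer. This is an illustrative stand-in and not the PyTorch Geometric layer used in our implementation; the function name `sage_layer`, the ReLU activation, and the zero vector used for nodes without neighbors are assumptions.

```python
import numpy as np

def sage_layer(h, neighbors, W, num_samples=None, rng=None):
    """One GraphSAGE layer with a mean aggregator.

    h: (N, d) features from layer k-1.
    neighbors: dict mapping node index to a list of neighbor indices.
    W: (d_out, 2 * d) weight matrix applied to the concatenated vector.
    num_samples: if set, sample at most this many neighbors per node.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    out = np.zeros((h.shape[0], W.shape[0]))
    for v in range(h.shape[0]):
        nbrs = list(neighbors.get(v, []))
        if num_samples is not None and len(nbrs) > num_samples:
            nbrs = list(rng.choice(nbrs, size=num_samples, replace=False))
        # Mean over sampled neighbors; zero vector if the node has none.
        h_nbr = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
        concat = np.concatenate([h[v], h_nbr])
        out[v] = np.maximum(0.0, W @ concat)   # sigma = ReLU
    return out
```

The weight matrix is shared across nodes and depends only on feature dimensions, not on the number of nodes, which is why the same trained layer can be applied to a graph with a different set of satellites.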
In our GNN, we use a sequence of multiple convolution layers within the GraphSAGE learning framework that deepen the hierarchical feature capture, offering a richer representation of the graph. After the node embeddings are updated via the GNN layers, we standardize the features using the one-dimensional batch normalization layer. This step accelerates the model’s learning by ensuring a consistent feature scale and enhancing its generalization capability. In addition to the GraphSAGE convolution layers, we apply fully connected layers to increase the modeling capabilities of the GNN and to transform the GNN-extracted features into a 3D positioning correction.
Our design of the GNN layers allows flexibility in experimenting with other choices of graph aggregation functions. Exploring residual connections that can help stabilize training and circumvent the vanishing gradient problem is left for future work.
3 EXPERIMENTAL SETUP
Data Sets
We train and evaluate our algorithm on the GSDC (Fu et al., 2020) data sets. The data sets contain raw GNSS measurements from multiple smartphones on trajectories that were collected in different cities. The data sets also contain high-quality ground truth positions from a SPAN GNSS–inertial system. For benchmarking purposes, we replicate the experimental setup of Kanhere et al. (2022) and Mohanty & Gao (2022) by using the same number of training data sets and testing in the same cities. The test data sets comprise multiple data sets collected in the cities of San Jose and Sunnyvale, California, as shown in Table 1. To test the generalization capabilities of our approach, we also perform additional validation on data sets shown in Table 2 that were collected in Los Angeles, California.
We implement our proposed architecture using the Python-based machine learning library PyTorch (Paszke et al., 2019). We also utilize PyTorch Geometric (Fey & Lenssen, 2019), the geometric deep learning extension library for PyTorch, which is commonly used for graph-based learning. Our architecture’s preprocessing and training are performed by using high-performance graphics processing units and tensor processing units in Google Colab. At runtime, in each iteration, our model takes in preprocessed features and edge information to predict a corrected position. Our GNN preprocessing supports dynamic edge creation, whereby the edges are dynamically created based on feature similarity at runtime, using available GNSS satellites. After the BKF outputs the predicted position, it is then compared with the true position to calculate an MSE loss. The model parameters are updated based on this loss to improve future predictions.
We list key experimental parameters for the learnable BKF and the GNN module in Tables 3 and 4. We also list the different layers of the GNN architecture in Table 5. The architecture begins with a linear layer that transforms the feature space from 6 to 128 dimensions to match the hidden dimension of the GNN. This is followed by a series of SAGEConv layers that capture complex spatial relationships among the nodes. Additionally, a BatchNorm layer normalizes the features, improving the training stability and performance of the model. Finally, an output linear layer condenses the 128-dimensional data back down to 3 dimensions.
Baselines
We compare our proposed algorithm against three baselines:
Kalman filter with no learned components: This baseline uses a Kalman filter to remove noisy measurements and outliers in the GNSS measurements and also applies forward/backward smoothing to further refine the positioning estimate. The filter parameters are hand-tuned but optimized via Bayesian hyperparameter optimization to achieve maximum positioning accuracy.
GNN loosely coupled with a Kalman filter (Mohanty & Gao, 2022): This baseline uses the initial position from a Kalman filter to condition the node features to a graph convolution network that is built upon the GINConvolution layer (Xu et al., 2019). Specifically, the position estimate from the Kalman filter calculates the features for every node (satellite), namely, the range residuals and LOS vectors.
Attention-based neural network correction (Kanhere et al., 2022): This approach leverages a deep neural network (DNN) to correct initial position estimates. The DNN takes pseudorange residuals and satellite LOS vectors as inputs and learns the required corrections for the initial positions. The model employs a data augmentation strategy that involves randomizing the initial position guesses to further improve model performance.
Metrics
For qualitative results, we compare the trajectory predicted by our approach with the ground truth trajectory. For quantitative results, we use the mean, median, maximum, and minimum horizontal positioning errors in the north and east directions to evaluate our approach against the baselines.
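These quantitative metrics can be computed as in the following sketch. It assumes that errors are taken as absolute per-epoch differences in a local north-east frame, in meters; the function name and the example trajectory are hypothetical.

```python
import statistics

def error_summary(predicted, truth):
    """Summarize absolute north and east errors (meters) over a trajectory.

    Each position is a (north, east) pair; per-epoch absolute errors are an
    assumption of this sketch.
    """
    summary = {}
    for axis, name in enumerate(("north", "east")):
        errors = [abs(p[axis] - t[axis]) for p, t in zip(predicted, truth)]
        summary[name] = {
            "mean": statistics.mean(errors),
            "median": statistics.median(errors),
            "max": max(errors),
            "min": min(errors),
        }
    return summary

# Hypothetical three-epoch trajectory.
predicted = [(1.0, 2.0), (2.5, 2.0), (4.0, 5.0)]
truth = [(0.0, 2.5), (2.0, 4.0), (4.0, 4.0)]
summary = error_summary(predicted, truth)
```

Reporting the median alongside the mean is useful because a few epochs with large errors can inflate the mean while leaving the median nearly unchanged.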
4 EXPERIMENTAL RESULTS
4.1 Qualitative Positioning Results
We show qualitative positioning results in terms of how closely the position predicted by our algorithm follows the true position. For ease of visualization, we plot only the loosely coupled GNN/Kalman filter baseline alongside our approach, as this baseline outperforms the other tested baselines. As shown in Figure 4, the trajectory tracking results from our algorithm demonstrate significant improvements over the loosely coupled GNN/Kalman filter baseline for a data set collected in San Jose. Our algorithm provides a mean error of 2.2 m in the north and 0.6 m in the east direction. The aerial view of the entire trajectory and the magnified sections indicate that our approach closely follows the real path, outperforming the baseline in capturing nuanced changes in direction and speed. Similarly, for a data set collected in Sunnyvale, as shown in Figure 5, our algorithm outperforms the baseline with a mean error of 0.9 m in the north and 1.1 m in the east direction. The magnified plots show that our approach closely approximates the ground truth.
4.2 Positioning Errors on Test Data Sets
As shown in Table 6, our approach has the lowest mean error in the east direction, 1.1 m, indicating that, on average, its estimated positions are closer to the actual positions than those of the baselines. Our approach also has the lowest median error (1.1 m) and the lowest minimum error (0.6 m). Its maximum error of 1.8 m is likewise the lowest, suggesting that, even in the worst case, it provides more accurate positioning than the other methods. We attribute this performance to the joint optimization of the GNN and the Kalman filter. The GNN uses satellite positions and received measurements to predict corrected measurements and their uncertainties. These predictions are fed into the Kalman filter, which refines the position estimate based on its own state estimates and the intermediate observations from the GNN. In turn, the Kalman filter uses a feedback loop, driven by the current predictions and backpropagation, to improve future positioning predictions. Because the entire process is continuously iterated, we can estimate the receiver position with high accuracy.
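The role of the predicted uncertainties can be illustrated with a scalar Kalman filter. This is a simplified sketch, not our learned multidimensional BKF: it assumes a random-walk motion model along one axis, and the function name, the process noise q, and the example values are hypothetical. The measurement z and its variance r stand in for the corrected measurement and the uncertainty predicted by the GNN.

```python
def kalman_step(x, p, z, r, q=1.0):
    """One predict/update cycle of a scalar Kalman filter (random-walk model).

    x, p : prior state estimate and its variance
    z, r : measurement and its variance (here, stand-ins for GNN outputs)
    q    : process noise variance (hypothetical value)
    """
    # Predict: the state is carried over and its variance grows by q.
    x_pred, p_pred = x, p + q
    # Update: the gain shrinks as the measurement variance r grows.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

# The same measurement, reported with low and with high uncertainty.
confident = kalman_step(x=0.0, p=1.0, z=4.0, r=2.0)
uncertain = kalman_step(x=0.0, p=1.0, z=4.0, r=18.0)
```

With r = 2, the gain is 0.5 and the estimate moves halfway to the measurement (to 2.0); with r = 18, the gain is 0.1 and the estimate moves only to 0.4. A measurement flagged as uncertain therefore has far less influence on the position estimate.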
As shown in Table 7, our approach also outperforms all of the baselines in the north direction with respect to the mean, median, and minimum metrics and provides a maximum error comparable to that of the loosely coupled GNN/Kalman filter baseline. The higher maximum error indicates that our approach may be sensitive to extremely challenging scenarios. However, our approach still provides nearly meter-level positioning accuracy on the test data sets and sub-meter-level accuracy on most testing traces.
We also show the horizontal positioning errors compared with the loosely coupled GNN/Kalman filter baseline in Table 8. Compared with the baseline, we observe a consistent improvement in localization accuracy across all devices and data sets. The largest improvement is seen for the Google Pixel 5 in the 2021-08-24-US-SVL-1 data set, where the error was reduced from 4.8 m to 1.4 m, a reduction of 3.4 m. This suggests that our approach may be particularly effective with the hardware or data characteristics of the Google Pixel 5 in this data set. The smallest improvement is 0.7 m, which occurs in row 8 for the Google Pixel 5 in the 2021-08-04-US-SJC-1 data set. Thus, even the smallest improvement reduces the positioning error by nearly a meter.
4.3 Generalization Results: Positioning Errors on Los Angeles Data Sets
For generalization results, as shown in Tables 9 and 10, our approach also provides improved positioning compared with the loosely coupled GNN/Kalman filter baseline, even though both algorithms were trained and tested on the same data sets. Among the chosen metrics, the lower median error is particularly important: because the median is insensitive to a few extreme errors, it indicates that the typical performance of the algorithm remains robust. This result shows that our approach has likely learned generalizable patterns from its training data, allowing it to adapt well to new data sets. The joint optimization of the GNN and Kalman filter, along with the feedback loop, further improves the capacity of the algorithm to adjust its predictions based on the characteristics of the new data.
4.4 Sensitivity Analysis with Respect to the Similarity Value for Edge Creation
In Tables 11 and 12, we present the impact of different similarity threshold values for edge creation on positioning errors in the east and north directions. In the east direction, the error metrics are more sensitive to threshold changes. Both the mean and median errors decrease when the threshold is increased from 0.3 to 0.5, but we observe little to no further improvement at 0.9. The minimum and maximum errors also decrease as the threshold rises. In contrast, the north direction shows higher but more stable errors across varying thresholds, with the maximum error being notably higher at 5.8 m compared with 2.2 m in the east direction.
These findings suggest that an optimal threshold level exists for minimizing errors, particularly in the east direction. However, the error metrics for the north direction remain relatively constant across different thresholds, implying that threshold changes have less influence on the model performance in this direction.
5 CONCLUSIONS
We have developed a framework that deeply couples a GNN with a learned BKF to provide GNSS positioning correction for smartphones. Our Kalman filter is learned end-to-end, adjusting its parameters during training in tandem with the GNN. This makes our approach robust to errors in challenging urban environments. We further enhanced the modeling capabilities of the GNN by proposing an extended feature set, designing a novel technique for connecting edges based on similarity metrics, and applying an inductive graph learning framework, GraphSAGE. Through evaluations on multiple real-world data sets, we have shown that our approach outperforms existing learning- and model-based techniques in terms of positioning accuracy. Additionally, our approach shows promising generalization results, providing high positioning accuracy in unseen urban environments.
HOW TO CITE THIS ARTICLE
Mohanty, A., & Gao, G. (2024). Tightly coupled graph neural network and Kalman filter for smartphone positioning. NAVIGATION, 71(4). https://doi.org/10.33012/navi.670
ACKNOWLEDGMENTS
We thank Google for making the smartphone data sets for 2020 and 2021 public and easily available. We also acknowledge Derek Knowles and Asta Wu for reviewing this paper. Lastly, we thank members of the Stanford NAVLab for their insightful discussions and feedback.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.