Abstract
Theoretical results conclude that Direct Position Estimation (DPE)-based Global Navigation Satellite System (GNSS) receivers can achieve more robust localization than their conventional two-step counterparts. However, compared to conventional approaches, there is a much smaller body of work for DPE, and DPE receiver implementations are highly experimental. This work surveys DPE techniques from the literature and presents a software-defined parallelized DPE-based receiver prototype built on popular DPE techniques. The parallelized receiver software, along with a companion sequential implementation, is made available to the community as an open source. The GPU usage of the parallelized DPE-based receiver is profiled and compared to the companion sequential implementation and another sequential implementation presented in the literature. Through the literature survey, discussion of the open-source receiver software, and the performance evaluation of the receiver, resultant insights for design decisions are presented.
1 INTRODUCTION
Since the inception of the Global Positioning System (GPS), the conventional two-step approach to computing navigation solutions has been the popular choice for Global Navigation Satellite System (GNSS) receivers due to its speed and reliability in uncomplicated signal environments (Misra & Enge, 2011). Such two-step methods compute ranges to satellites as intermediate measurements before estimating a navigation solution. Typically, these intermediate measurements are found independently of each other using an approach referred to as scalar tracking. This independence of measurements leaves the receiver susceptible to errors when complicated effects are present in GNSS signals (Dyke et al., 2004; Montloin et al., 2013). Signal reflections or blockages, such as those from buildings, and interference on the electromagnetic spectrum, such as jamming or spoofing effects, can cause a conventional receiver to fail to acquire or maintain lock on satellites (Chu, Chauhan, et al., 2018; Coffed, 2016; Gao et al., 2016; Grant et al., 2009; Humphreys, 2012).
In contrast to the conventional approach, localization algorithms have been proposed which compute the navigation solution directly in the navigation domain, bypassing intermediate measurements. Such a one-step approach – referred to as Direct Position Estimation (DPE) for reasons discussed in Section 2 – pivots the GNSS-based localization problem, providing ways to address the challenges of conventional two-step receivers at the expense of added computation load. The potential of the DPE approach has begun to be explored with conceptual motivations (Axelrad, Bradley, et al., 2011; Brown, 2012; Closas & Gusi-Amigó, 2017), analytical advantages (Closas, Fernández-Prades, & Fernández-Rubio, 2007a, 2009a; Gusi-Amigó et al., 2018), and demonstrated benefits of DPE-based receivers (Chu, 2018; Dampf et al., 2017; Ng &Gao, 2016b) being presented in the literature. This work aims to support the growth of the DPE body-of-knowledge through a survey of DPE processing techniques, the release of open-source parallelized DPE-based receiver software, and a discussion of considerations for practical DPE-based receiver design.
This paper is organized into five sections. This section introduces the objectives and contributions of the work. Section 2 summarizes the derivation of the DPE algorithm from the GPS received signal model and surveys techniques presented in the literature for the numerical implementation of DPE. Section 3 presents the open-source parallelized DPE software and proposes a strategy to utilize a well-known coupling between navigation-domain states. Section 4 profiles the parallelized DPE implementation to analyze the computational efficiency and contrasts localization results to demonstrate how two different navigation-domain search strategies can impact performance. Section 5 concludes the work.
2 SOLVING THE DPE OBJECTIVE FUNCTION
The conventional two-step approach to GNSS-based navigation performs localization by computing range measurements from the satellite to the receiver based on the Time of Arrival (ToA) and solving a system of equations using those ranges to estimate position through multilateration (Misra & Enge, 2011). Thus, the two steps arise from first processing the antenna voltage samples to compute the range measurements and then computing the position-time estimate from the range measurements second.
In contrast to the two-step approach, one-step localization arises from the observation that the composite signal received at any given position-velocity-time (PVT) state is unique and specified in advance. Thus, the transmission from a GNSS constellation can be completely reconstructed at a given PVT state and compared to the signal which was actually received. This is the premise of DPE – that a receiver’s PVT state can be estimated by choosing the state whose reconstructed transmissions most closely match the received signal.
This section will summarize the derivation of the DPE objective function and survey numerical approaches to optimizing this objective function, which have been presented in the literature.
2.1 One-step localization literature: DPE or CD?
First, it should be noted that one-step GNSS-based localization algorithms can be found in the literature under one of two names: DPE or Collective Detection (CD). The two names seem to have arisen from differing philosophies of how to apply one-step localization. DPE algorithms typically aim to create a continuously running receiver, leveraging the one-step approach for improved localization accuracy and a natural sensor fusion representation (Closas & Fernández-Prades, 2010). Thus, for most DPE works, the primary metric is localization accuracy. In contrast, CD algorithms typically aim to leverage the one-step approach for acquisition in challenging environments (Axelrad et al., 2011). Thus, the primary metric for most CD works is the minimum received signal power required for successful acquisition.
Despite the different names, both DPE and CD are one-step GNSS-based localization approaches with the same underlying navigation concepts. The edge of the dichotomy is blurred in the literature, as works presenting DPE algorithms often comment on the sensitivity benefits of using a one-step receiver, and works presenting CD algorithms often note the accuracy of the navigation solution computed during acquisition.
To avoid distraction in juggling terms, the DPE acronym will exclusively be used in the remainder of this work. As this work focuses on the design and operation of a localization accuracy-oriented one-step GNSS receiver, DPE appears to the authors as the more fitting name to use. However, as with most one-step localization literature, this work builds on results from both DPE and CD perspectives. The reader is advised that many references and fundamental works understandably prefer the CD name in their contributions.
2.2 Foundation of DPE and notation convention
For simplicity of discussion, the numerical solutions presented in this work are exclusive to a single-constellation, single-frequency GPS C/A code receiver implementation. However, the concepts presented derive from the general problem of satellite-based localization and are expected to be applicable across GNSSs and in multi-frequency operation.
The complex baseband representation of the received GPS L1 C/A signal from satellite 𝑖 can be modeled as follows:
1
where 𝑡𝑘 = 𝑡0 +𝑇𝑠𝑘 is the GPS time of sample 𝑘 in a 1×𝐾 vector 𝐭 spaced at . The code delay 𝜏(𝑖), expressed in GPS L1 C/A code chips, of the signal from satellite 𝑖 to the receiver is a function of the receiver’s true state 𝜸. This state is a PVT state of the form with 𝐩 = [𝑥 𝑦 𝑧]⊤ defined to be a Cartesian representation of 3D space. The code delay is converted to time by dividing by the signal’s effective frequency , as the L1 C/A code frequency FC∕A experiences some Doppler shift from the relative motion of the receiver and satellite 𝑖.
The five constituent terms of Equation (1) are:
𝑎(𝑖): the amplitude of the received signal, assumed to be constant over the sampling window;
𝐺(𝑖)(⋅): the Pseudo-random Number (PRN) code, returning either +1 or ‒1;
𝐷(𝑖)(⋅): the transmitted navigation data, returning either +1 or ‒1;
exp{⋅}(𝑖): the baseband-shifted carrier wave experiencing the Doppler shift ;
𝑁(𝑖)(⋅): the noise on the transmission received from satellite 𝑖. This is assumed to be additive white Gaussian noise (AWGN) to allow analytical simplification in the DPE objective function derivation presented here.
The received signal 𝑆𝑟𝑥(⋅) is the superposition of the transmissions of all GPS satellites in view 𝑆(𝑖)(⋅). The receiver samples 𝑆𝑟𝑥(⋅) to acquire the sample set 𝐬𝐭, a 1×𝐾 vector. Following the derivation presented in Chu (2018), according to the invariance property of maximum likelihood estimation (MLE) (Closas et al., 2007a), finding the replica signal ŝ𝐭 of maximum likelihood to the received signal 𝐬𝐭 is equivalent to finding and â, the maximum likelihood estimates of the receiver state and the 1×𝑀 vector of amplitudes of each satellites’ transmission. Since the noise terms 𝑁(𝑖)(⋅) are AWGN, this yields the least-squares minimization problem:
2
where 𝐂 is a 𝑀×𝐾 matrix of the samples of the reconstructed signal in the sampling window for the 𝑀 visible satellites. The explicit estimation of â can be eliminated by solving for â analytically in terms of 𝐬𝐭 and approximating the cross-correlations between GPS PRN codes as zero (Closas et al., 2007a). Following the derivation of Chu (2018), this leads to the formulation:
3
where is the summation of each cross-correlation between the reconstructed transmission for satellite 𝑖 with the received signal 𝐬𝐭. For the remainder of this work, the operation (⋅) will be referred to as the replica-received cross-correlation for channel 𝑖. The value of the vector cross-correlation (𝜸) for a given 𝜸 will be referred to as scoring a given state. The process of evaluating Equation (3) (or an approximation thereof) for a sample set and computing a DPE navigation solution will be referred to as one timestep.
This work will treat DPE as a snapshot algorithm, maximizing the objective function essentially the same way from timestep to timestep. This can be contrasted with continuous DPE, which will adapt the objective function maximization algorithm based on the previous timestep’s signal conditions (Closas & Fernández-Prades, 2010; Dampf et al., 2017). The snapshot approach to DPE is more consistent with the authors’ vision for this work, as the open-source DPE-based receiver presented is designed to provide fundamental functionality to support further research.
2.3 Survey of DPE techniques
Even without the explicit estimation of 𝐚, this formulation of the DPE objective function is still computationally expensive. The objective function of Equation (3) has no closed-form solution and must be maximized numerically (Closas, Fernández-Prades, & Fernández-Rubio, 2007b). Additionally, Equation (3) prescribes optimization over eight dimensions, and non-linearities over the PVT domain are expected in real-world conditions. Thus, DPE research has proposed approximations and developed efficient processing techniques to reduce the computational burden.
This section will survey five categories of study in GNSS-based one-step localization literature, beginning with common approximations which make the numerical method more computationally tractable, then examining various approaches to algorithmically use the DPE objective function, and then progressing to results from specific DPE implementations across a range of signal environments.
2.3.1 Approximations for computational efficiency
Three approximations, which reduce the dimensionality of the DPE objective function, are particularly influential in this work: optimizing around “nearby” states so that the cross-correlation is a sufficient statistic, decoupling the position-time and velocity-drift states, and approximating cross-correlation scores from those of nearby states.
Cross-correlation as a sufficient statistic
Since there is no closed-form maximization of the objective function, Equation (3) prescribes that the objective function must be numerically evaluated at specific PVT states 𝜸𝑔𝑢𝑒𝑠𝑠 in search for the maximum likelihood state. This would suggest that the reconstructed signal matrix 𝐂 must be generated for each state 𝜸𝑔𝑢𝑒𝑠𝑠 evaluated. Thus, a common assumption is that 𝐂 is generated at a 𝜸𝑔𝑢𝑒𝑠𝑠, which is close to the true 𝜸, turning (⋅) into a lower-dimensionality sufficient statistic for 𝜸 (Pany, 2010). Such a “nearby” 𝜸𝑔𝑢𝑒𝑠𝑠 can be provided by Assisted GPS (Andrianarison, Sahmoudi, & Landry, 2017; Cheong, Wu, Dempster, & Rizos, 2011; Li et al., 2014) or initialization from a conventional two-step GPS localization method (Closas & Gusi-Amigó, 2017; Bhamidipati & Gao, 2018; Dampf et al., 2017; Ng, 2016), as some examples.
Decoupling position time and velocity drift
Weill (2010) shows that the position-time and velocity-drift domains can be estimated separately from each other in vector correlations with a minimal loss of accuracy. This is due to the scale of GPS with respect to the search space of an initialized receiver. For a fixed position state, the shape of the cross-correlation function with respect to velocity states will not be significantly affected by the exact position state chosen over a range of position states on the order of 100 meters (Weill, 2010). Similarly, for a fixed velocity state, the shape of the cross-correlation function with respect to position states will not be significantly affected by the exact velocity state chosen over a range on the order of ±500 Hz of Doppler shift (Weill, 2010).
Thus, finding the maximum likelihood PVT state 𝜸𝐭|𝐭 for timestep 𝐭 may be approximated by estimating the maximum likelihood position-time state from a position-time manifold and the maximum likelihood velocity-drift state from a velocity-drift manifold. This approximation will hold as long as the receiver’s current best estimate of the PVT state 𝜸𝐭|𝐭‒𝟏 is close to the actual state of the current sample set, as each manifold will estimate four states and hold four states fixed:
4
This reduces the estimation problem from an eight-dimensional optimization problem to two four-dimensional optimization problems – a total reduction in the number of points searched by for a hypercubic grid of size 𝑛.
Batch correlation
When both the sufficient statistic and decoupled estimation properties can be assumed, the batch correlation approximation arises. For the position-time domain, since is a sufficient statistic for states nearby the state used to construct 𝐂, performing circular cross-correlation between the received signal and the expected signal will perform a parallel code phase search (Borre et al., 2007). This can efficiently compute replica-received scores at code delay intervals based on the sampling frequency, and the position-time domain score for a given position-time state can be looked up from the batch scores based on the code delay difference between the position-time state used to construct 𝐂 relative to the given state. Circular cross-correlation can be employed because of the repetition and autocorrelation properties of PRN codes, provided that 𝑇𝑠𝐾 is a multiple of the PRN code length and the navigation bits are wiped off of 𝐬𝐭. By the circular cross-correlation property of the discrete Fourier transform (DFT) (Oppenheim, Schafer, & Buck, 1999), and using the fast Fourier transform (FFT) function ℱ to perform the circular cross-correlation, the position-time correlation scores for channel 𝑖 may be approximated by
5
where 𝐜(𝑖) is the 𝑖th row of 𝐂. If the carrier phase of each satellite channel is explicitly tracked, the score for state 𝐱𝐭 can be found by coherently summing the complex scores from each channel (DiEsposti, 2007). If the carrier phase is not tracked, to avoid destructive additions from angle errors in the complex scores, the magnitudes of each channel’s score can be added in a non-coherent sum instead (Axelrad, Donna, & Mitchell, 2009).
For the velocity-drift domain, since the relative velocity between the receiver and the satellite produces a Doppler shift in the satellite’s transmission, the carrier wave can be inspected to provide a parallel frequency space search (Borre et al., 2007). The FFT of the baseband-shifted carrier wave produces a peak in the frequency domain corresponding to the the relative Doppler shift in the received signal. Then, a given velocity-drift state may back-calculate its expected Doppler shift and use the power at that frequency as its likelihood score (Borre et al., 2007). Thus, if the carrier wave exp(𝑖){⋅} is retrieved from the received signal, the velocity-drift correlation scores may be approximated by
6
As with the position-time correlation scores, the velocity-drift scores can be coherently combined when the carrier phase is tracked, otherwise they should be non-coherently combined if the carrier phase is not tracked.
The computation cost can be reduced further using another FFT-based approach, the Bi-dimensional Parallel Search algorithm, as presented by Andrianarison in Andrianarison, Sahmoudi, and Landry (2016). This approach does sacrifice some position-domain accuracy, but compensates with a more accurate Doppler frequency estimation through the use of Spectral Peak Location.
2.3.2 Finding the maximum
Approaches to numerically solve the DPE objective function range from purely iterative methods to purely grid-based methods. Iterative algorithms for DPE evaluate the replica-received cross-correlation at one state, quantify the error at that state, then move the state and try again until the distance between successive state estimates is below some threshold. In contrast, the grid-based approach to DPE constructs a grid of candidate receiver states, evaluating the replica-received cross-correlation as the likelihood for each state and choosing the maximum likelihood state as the navigation solution. The generalized iterative and grid-based approaches are visualized in Figure 1.
Iterative methods
Closas proposes the use of the Space-Alternating Generalized Expectation Maximization (SAGE) algorithm in Closas et al. (2007b), which reduces the search space for clock-bias estimates to improve computation cost. Closas later proposes using the Accelerated Random Search (ARS) algorithm (Appel et al., 2004; Closas, Fernández-Prades, & Fernández-Rubio, 2009b; Closas et al., 2015), a Monte-Carlo optimization technique that refines its estimate by reducing the search space whenever a new estimate is found to score better than the previous one.
Grid-based methods
DiEsposti first uses a grid to visualize the navigation-domain likelihood manifold in DiEsposti (2007). Axelrad then proposes the use of three grids – Rough, Medium, and Fine – successively in one timestep for progressive refinements from a grid with kilometer-level spacing down to a grid with decameters of spacing. Axelrad also proposes in Axelrad et al. (2009) that the Rough grid take the average of correlation values around each grid point to reduce the chance that the peak is missed in the large grid spacing.
Narula refines the idea of Rough, Medium, and Fine grids in Narula et al. (2015), proposing specific step and overall sizes for the grids based on properties of the GPS L1 C/A signal. Li proposes slightly different values for the Rough, Medium, and Fine grids in Li et al. (2014), sized instead to GSM cellular network localization accuracy to facilitate Assisted-GPS for DPE initialization. Esteves proposes not making the grid rectangular, which has been the prevalent shape in the literature, but rather recommends in Esteves et al. (2014) that the horizontal domain of the grid be expressed in polar coordinates. Esteves also parameterizes the polar grid size and spacing in terms of uncertainty in the initial receiver estimate and sky visibility.
More advanced grid forms are proposed in Dampf, Frankl, and Pany (2018) and in Jia (2016). Dampf treats the navigation-domain states as particles in a particle filter and proposes an algorithm to specifically perturb the particles and adjust the width of the batch scoring to evaluate various range uncertainties such as multipath, orbit, clock, and atmospheric errors. For example, if the receiver was initialized with an incorrect TOW, the navigation-domain correlograms would not overlap, likely leading to errors in the navigation solution. Thus, the receiver specifically evaluates the likelihood of a variety of such range uncertainties to limit errors from these sources. Jia proposes a Pigeon-Inspired Optimization (PIO) algorithm to choose which states are evaluated. States in this particle-based approach move between timesteps based on their score at the previous timestep, their location in relation to the other states, and an inertial factor. Intelligent particle-based approaches like these may be able to provide better rejection of false peaks, as filtering is built into the selection of states.
2.3.3 Navigation-domain filtering
For the grid-based approach, navigation-domain filters can be used to provide sub-grid spacing resolution in the navigation estimate. Axelrad presents a threshold-based filter in Axelrad et al. (2011), where states with a sufficiently high cross-correlation score are averaged to generate the navigation solution. Axelrad experiments with both weighted and unweighted averages in Axelrad et al. (2011). Ng simplifies this to a weighted average of all points on the grid in Ng and Gao (2016b). Cheong and Li both use a Dichotomous Search in their works, Cheong, Wu, Dempster, and Rizos (2012) and Li et al. (2014), respectively, to refine estimates in the Fine grid. Cheong uses an Iterative Morphological Filter (Cheong, 2011) to remove outlier peaks on the manifold that arise in noisy environments by looking for groups of grid points which are above a threshold.
The DPE manifold also provides a means to estimate uncertainty in the navigation solutions. Given sufficient grid resolution, a DPE-based receiver should be more confident in an estimate from the distinct peak on the manifold than one from a broad and smooth bump. Ridges in the manifold that do not intersect with the peak indicate signal reflections, and multiple peaks may be a sign of malicious attacks. As one way to characterize some of the manifold information, Narula proposes a confidence metric based on the difference between the code delay at the maximum of a channel’s correlogram and the corresponding code delay at the manifold’s peak (Narula et al., 2015). This provides what Narula calls “inertia” for choosing the same navigation solution again if noise is added.
2.3.4 Initialization considerations
Conventional two-step receivers can search over a limited frequency domain and a circular code delay domain for acquisition. A DPE-based receiver, however, must search an extremely large space compared to a PRN code chip in four independent dimensions. Thus, DPE-based receivers typically require some form of aiding to initialize.
Cellular network towers are a popular choice in the literature for DPE initialization. As discussed above, sizing grids to the localization accuracy of a GSM network and refining with smaller grids has been shown to be viable (Li et al., 2014). If the receiver has no access to initialization assistance, two-step methods may be able to provide the aiding (Bhamidipati & Gao, 2018). If only a few satellites have high enough signal strength to be tracked or if a cooperative receiver can communicate an estimate of reasonable confidence, that information may sufficiently reduce the search space enough for an impaired DPE-based receiver to initialize and use the architecture’s improved sensitivity to find satellite transmissions buried in noise or other signals. Andrianarison proposes an algorithm to switch between one-step and two-step approaches based on the number of strong satellites (Andrianarison & Landry, 2018).
Even with such aiding, though, the time domain uncertainty still remains large due to the precision needed to resolve clock-bias estimates, as GSM network timing uncertainty is on the order of two seconds (Bradley et al., 2010; Cheong et al., 2011). If the uncertainty in the receiver clock bias at initialization is sufficiently large that the TOW or HOW is unknown for the current transmission, coarse-time navigation proposes to search for the TOW/HOW as an additional dimension (Cheong et al., 2011; van Diggelen, 2009). Cheong explains the underlying concept for this additional dimension as searching for the satellite state in the coarse-time dimension and searching for the range to the satellite in the clock-bias dimension (Cheong et al., 2012). Narula proposes a way to accelerate such a time domain search: the receiver computes the difference in clock bias between the code delay at the peak of the strongest satellite’s correlogram and the expected code delay of the state used for initialization, then searching a smaller clock bias and coarse-time space around the predicted code delay (Narula et al., 2015).
2.3.5 DPE demonstrations
The benefits of DPE begin to shine when the receiver’s view of the sky is limited. This section will summarize a selection of the demonstrated results from the literature in notably challenging signal environments.
As the improved sensitivity of DPE is a driving factor for choosing the one-step approach, many works have empirically tested the general lower limit of received signal-to-noise power required for acquisition by a DPE-based receiver. DiEsposti visualizes a manifold grid with simulated data in DiEsposti (2007), showing the decay of the peak into a rough plateau receding into the noise floor as the signal-to-noise ratio (SNR) decreases. Axelrad presents five classes of experiments in Axelrad et al. (2011): comparing acquisition accuracy statistics when varying SNR, when varying dilution of precision (DOP), when estimating timing, when using different manifold filters, and when subject to real-world terrain effects. Cheong evaluates four methods of solving the DPE objective function (one iterative and three manifold filters for grid-based approaches) in Cheong (2011), comparing the sensitivity and accuracy of the variants. In Li et al. (2014), Li demonstrates the Rough, Medium, and Fine grid progression in live-sky data with varying levels of injected noise.
Experiments have also been conducted with live-sky data subject to environmental masking. Andrianarison compares acquisition probability at varying noise levels and varying timestep sizes for two methods of cross-correlation score generation (Andrianarison et al., 2016). Esteves demonstrates DPE acquisition in an indoor environment where a conventional two-step receiver can only track two satellites (Esteves et al., 2014). In Andrianarison and Landry (2018), an urban network of DPE-based receivers select the one with the best satellite reception to aid the others in localization. Andrianarison also provides a trade-off matrix between sensitivity and complexity for DPE-based receiver design in Andrianarison and Landry (2018). In Chu et al. (2018), Chu provides a fusion scheme for DPE-based receivers and demonstrates this DPE fusion maintaining track of a high-dynamics fixed-wing aircraft navigating through a California mountain range where a textbook two-step approach fails.
The multipath resilience DPE claims to offer has also been specifically tested. Closas simulates two multipath models in Closas et al. (2015) and shows DPE maintaining track longer than a two-step approach, with DPE only failing when the tracking loops used to generate the cross-correlation scores sufficiently diverged. In a live-sky experiment, Ng uses the multipath to the advantage of a DPE-based receiver by reflecting the state of the affected satellites off the multipath source, modeling multipath transmissions as line-of-sight transmissions to “pseudolites” (Ng & Gao, 2016b). Bhamidipati demonstrates superior accuracy of a DPE receiver compared to a textbook two-step receiver mounted on a mobile vehicle in an urban environment experiencing multipath (Bhamidipati & Gao, 2018). And, using DPE, Dampf achieves track of a mobile vehicle subjected to signal outages in a tunnel where scalar and vector two-step approaches fail (Dampf et al., 2017).
3 NUMERICAL DPE RECEIVER PROTOTYPE
This section introduces the open-source software-defined DPE receiver implementation developed for a portable graphics processing unit (GPU). The parallelized DPE algorithm implemented in this receiver will be summarized, and a new discussion will present how the coupling between vertical and clock-bias errors in GNSSs can be used in grid-based DPE.
3.1 Numerical DPE approach
The GPU offers much to a DPE-based receiver in the realm of computational efficiency. Signal replicas can be constructed in a parallelized manner, the scores of each candidate point can be computed independently, and the manifold can be assessed with parallelized reduction techniques. Each thread available to the GPU can execute the same instructions while operating on different memory locations, which is well-suited to vector and matrix operations. The idea of a GPU-based implementation has been suggested (Inside GNSS, 2015; Pany et al., 2015), but a GPU-specific implementation has yet to be demonstrated and evaluated in the literature.
Two software-defined receiver implementations are being made available to the research community as open source:
CUDARecv: A parallelized DPE receiver prototype developed in CUDA C/C++ 9.0.252. This software was previously introduced but not made public in Peretic (2019);
PyGNSS: A sequential DPE receiver prototype developed in Python 2.7. This software was previously introduced but not made public in Wycoff et al. (2015).
Both CUDARecv and PyGNSS are available online at the https://github.com/Stanford-NavLab/NavLab-DPE-SDR repository.
The remainder of this work will focus on CUDARecv due to the speedup it achieves through parallelization. PyGNSS acts as a benchmark, as it is a sequential implementation with an identical mathematical formulation to CUDARecv. CUDARecv and PyGNSS were kept in lockstep during development so that PyGNSS could offer rapid testing and debugging to complement the faster computation of CUDARecv.
To provide the researcher with the benefits of abstraction when developing a receiver implementation, the software segment of CUDARecv has been decomposed into seven modules, as depicted in Figure 2. The seven modules are executed sequentially for a set of samples; the processing of that set constitutes one timestep. Then, a new set of samples is acquired, and the next timestep runs.
The DPE algorithm itself is implemented in the Batch-CorrScores and BatchCorrManifold modules – a pair of tandem modules, which provide the researcher with flexibility through abstraction to make changes to the way in which the scores are batch-calculated and the grids are assessed.
This work follows the approximation that the position-time and velocity-drift states may be estimated separately and uses the batch correlation approximations presented in Section 2. To avoid obscuring conclusions about the numerical approximations and sampling schemes in DPE, the DPE receiver in this work will be initialized at a state such that the true state is within the grid on the first timestep in the vein of A-GPS.
3.1.1 BatchCorrScores
In the first of the two tandem modules implementing the DPE algorithm, the BatchCorrScores module generates a replica signal, wipes off components of the received signal, and performs the batch scoring. Components of the replica signals used for cross-correlation and wipe-off can be generated independently of the others, and each sample of each component can be generated independently of the others. Additionally, the FFTs, which compute the scores, can be performed independently of each other. Thus, the batch correlation step of the DPE algorithm was parallelized according to Figure 3. Each block in Figure 3 is a sub-task of the batch correlation, and each block is executed on one of three CUDA streams, which are synchronized by CUDA events. This parallelization scheme ensures no computational work in the algorithm is redone.
In this implementation, replicas of the carrier wave exp(i){⋅}, PRN code 𝐺(𝑖)(⋅), and navigation data 𝐷(𝑖)(⋅) are computed in blocks 2, 4, and 7, respectively, using the current estimates of the receiver’s state 𝜸𝐭|𝐭‒𝟏 and satellite track parameters. For the position-time scores, the carrier wave is wiped off of the received samples in block 5, and the result is Fourier transformed in block 8. This result is multiplied with the complex-conjugate of the FFT of the replica in blocks 7 and 10, and the inverse FFT is performed on the result in block 12. For the velocity-time scores, the DC offset of the received samples is removed in block 9, and the navigation data replica and the C/A code replica are used to wipe off their effects from the received samples in block 11. The FFT of the result is then taken in block 13.
When constructing the navigation data replica 𝐷(𝑖)(⋅), the next navigation bit may not be known at the time of replica generation. Thus, replicas are constructed for both navigation bit possibilities (Ng, 2016). Correlation scores for both replicas are computed, and the navigation bit value that gives the higher correlation score for the current best code phase parameter is selected. This is the cause of the dependence of block 12 on block 11. Additionally, this limits the sample set length 𝑇𝑠𝐾 to the length of one navigation bit from a computational efficiency standpoint – if multiple navigation bits are included in the sample set, the number of possible values to check will scale geometrically with the number of navigation bits.
3.1.2 Assess manifold
The second of the two tandem modules implementing the DPE algorithm, the BatchCorrManifold module, determines which PVT states will comprise the manifold, computes the cross-correlation score for each point, then generates a measurement from the manifolds.
The position-time and velocity-drift manifolds are computed in parallel on separate CUDA streams. The states comprising the two manifolds are chosen according to a grid initialization function and are oriented along the local East-North-Up (ENU) coordinates of the receiver’s current best estimate of its PVT state 𝜸𝐭|𝐭‒𝟏.
Both the position-time and velocity-drift manifolds are generated by looking up the batch correlation score for each state on the grid, thus the same approach can be followed at the thread-level for both manifolds, as diagrammed in Figure 4. The specific intermediate values computed to find the corresponding batch correlation score for a given state will be discussed in the following paragraphs.
Position-time manifold
For a state on the position-time manifold 𝐱 (with fixed velocity-drift estimates , scores are found by computing the difference between the code delay for the point being evaluated and the code delay of the receiver’s current best estimate of its state. Since the receiver’s current best estimate 𝐱𝐭|𝐭‒𝟏 was used to generate the batch correlation scores, the batch correlation scores are indexed by the code delay difference with respect to the code delay experienced at the state 𝐱𝐭|𝐭‒𝟏. Thus, the code delay difference used to find the score for state 𝐱 can be computed according to Equation (7):
7
In order to support optional code phase or navigation-domain tracking loops, the code delay difference is found by elapsed code phase cycles since an alignment time 𝑡𝐴, such as a TOW message or a handoff instant from A-GPS. This motivates the use of the 𝜙 symbols in an equation used to solve a code delay difference . The parameter is defined as the elapsed code phase cycles since 𝑡𝐴 for the receiver’s current best estimate 𝐱𝐭|𝐭‒𝟏 and requires the use of the GPS time of the first sample in the current sample set 𝑡0. Once the code delay difference is found, the score for satellite channel 𝑖 is computed by linearly interpolating between the two nearest indices in the batch correlation scores. The score for each visible satellite channel are non-coherently summed, giving the score for state 𝐱.
Velocity-drift manifold
For a state on the velocity-drift manifold (with fixed position-time estimates 𝐱𝐭|𝐭‒𝟏), the batch correlated scores are approximated as the spectral density of the received signal after wiping off the expected navigation bits, PRN code, and carrier wave at the receiver’s current best estimate . Thus, the difference in Doppler shift between the receiver’s current best estimate and state is used to find the score for that state, which can be computed by
8
where FL1 is the GPS L1 carrier frequency and is the expected (or tracked) Doppler frequency shift at state . Using the Doppler shift difference, the score for satellite channel 𝑖 is linearly interpolated between the nearest batch correlation scores, then summed non-coherently with all other visible satellite channels, giving the score for state .
Measurement generation
Once each position-time state 𝐱 ∈ 𝑋𝑝𝑜𝑠,𝐭 or velocity-drift state is scored with the noncoherent sum of the scores for each satellite channel , respectively, the maximum likelihood state may be found. In the unfiltered implementation, the state with the highest score is chosen from the position-time grid 𝑋𝑝𝑜𝑠,𝐭 and velocity-drift grids 𝑋𝑣𝑒𝑙,𝐭, as given in Equation (9).
9
An example weighted average navigation-domain filter was also implemented, as given in Equation (10):
10
The unfiltered implementation stores all grid scores in memory for debugging and visualization purposes. The search for the highest scoring state can be performed as a parallel reduce – the score for a given state is compared to the maximum seen within the thread, then the threads exchange their results until the last thread is left with the highest scoring state 𝐳𝐭. The weighted average filter skips the intermediate store step, directly performing a weighted sum by parallel reduction operations within the same kernel and dividing out the weighted total on the last thread to find 𝐳𝐭.
3.2 Manifold grids
Due to the number of states evaluated, the selection of the navigation-domain states is important to the receiver’s efficiency and accuracy. As Section 4 will show, grids will differ in their localization accuracy from even their initialization state alone. This subsection will introduce one strategy and the underlying concept for choosing between two different grid structures, as well as the specific grid implementations used in this work’s experiments.
3.2.1 Vertical and clock-bias error coupling in DPE
DOP is a well-known effect in conventional two-step receivers, which stems from the geometry of the GPS satellites relative to a terrestrial receiver. Analytically, DOP derives from the linearized least-squares optimization of pseudoranges in the two-step receiver algorithm. By formulating the solution of the receiver state correction 𝛿𝐱 to minimize the squared difference of the pseudorange residuals 𝛿𝝆, the covariance matrix of this formulation quantifies the relation between the error in one dimension of the navigation solution with respect to another in the Best Linear Unbiased Estimator (Misra & Enge, 2011).
It is well-known that DOP shows that a two-step receiver’s vertical state estimates will typically have a greater covariance than its horizontal state estimates because signals from satellites below the horizon are blocked (Misra & Enge, 2011). Additionally, it is well-known that the covariance between a two-step receiver’s vertical and clock-bias state estimates is often around one, leading to a coupling in the error in these states (Misra & Enge, 2011). This coupling is because a slight change in the vertical state is almost indistinguishable mathematically from an opposite change in the clock-bias state, as a change in the vertical state of the receiver will impact all pseudoranges the same way (either shortening all or lengthening all).
In a one-step receiver, such as a DPE-based receiver, the same DOP matrix can be derived when the objective function of the one-step receiver is formulated as a linearized least-squares optimization problem. Treating the cross-correlation between the received signal and the expected signal for state 𝐱 as a sufficient statistic, the normalized score for satellite 𝑖 can be approximated as given in Equation (11):
11
with error and where Δ𝜏𝐭(𝐱) is expressed in PRN code chips and linearized around 𝐱 and defined in Equation (12):
12
where 𝐮(𝑖) is the line-of-sight unit vector from the state 𝐩 to satellite 𝑖. Then, using least squares to find the state with the best score in the model linearized around 𝐱, an equation can be formulated to solve for 𝛿𝐱𝐭 using each of the score residuals . Solving for the score residuals in Equation (13),
13
and building the system of equations using the geometry matrix Γ in Equation (14),
14
a can be found that minimizes the sum of squared residuals:
15
The formulation of the DPE objective function given in Equation (15) – the formulation for iterative DPE-based receivers – uses a satellite geometry matrix in the same manner as the iterative approach in conventional two-step receivers (Misra & Enge, 2011). Thus, the covariance matrix, which follows from Equation (15), will have the same form as the DOP matrix of a two-step receiver, and the implications of DOP in the two-step approach will also follow for DPE.
3.2.2 DPE receiver design insights
Even though the grid-based approach to maximizing the DPE objective function as given in Equation (3) is not the Best Linear Unbiased Estimator, the DOP matrix describes effects of satellite geometry, which are present on the receiver, and these effects can inform the DPE-based receiver designer with a navigation-domain sampling strategy.
If the DPE algorithm begins in a low-confidence state, such as the one acquired through coarse acquisition or other sensors, a grid with large, independent variations in the vertical and clock-bias states should be used. The variation in such a grid would increase the likelihood of finding a state with offsetting errors in the vertical and clock-bias dimensions. If the DPE algorithm begins in a state of reasonable confidence, such as the one acquired through scalar tracking or previous DPE timesteps, a grid with vertical and clock-bias states that vary together should be used. This would allow a grid of the same size to refine the estimate by searching a smaller space at a higher resolution.
3.2.3 Experimental manifold grids
To demonstrate this experimentally in Section 4, two position-time grids were used by BatchCorrManifold when evaluating the DPE receiver algorithm: Spread Grid 7m and RNGrid 7m. In the unfiltered implementation, the state on the grid with the highest score is selected as the navigation solution.
Spread grid 7m
The Spread Grid 7m specifies a configuration of candidate points with higher density towards the estimate at which the receiver is initialized. This provides more detail in the region where the peak should be located while still capturing information over a wide range. A 1D slice of the position-time grid is visualized in Figure 5 and specified numerically in Table 1.
The position values, specified along the East, North, and Up axes relative to the receiver’s current best estimate 𝜸𝐭|𝐭‒𝟏, as well as the clock-bias range, are chosen based on previous work with a terrestrial mobile receiver (Chu, 2018). For the velocity-drift grid, the spacing and range in the East, North, and Up directions is based on the same previous work and terrestrial mobile receiver (Chu, 2018). The clock drift for the GPS signal simulator used (introduced in Section 4) is on the order of , as experienced in prior work (Peretic, 2019), though rated to be better than (National Instruments, 2012). The grid spacing for the clock drift was empirically chosen to be greater than the nominal drift, but closer to the nominal drift than the maximum rating.
RNGrid 7m
In comparison with the structured Spread Grid 7m, the RNGrid 7m was generated to cover the same position-time domain space with the same number of points as the Spread Grid 7m, but using largely randomly selected candidate points.
Of the 254 states, all but 17 were chosen as 4D uniform random variables, i.i.d. with respect to each dimension and with respect to each other. The other 17 points in the position-time domain grid were chosen explicitly. Sixteen of these were the vertices of a 4D hypercube spanning ±154 m in each dimension to ensure that RNGrid 7m covers the same position-time space as Spread Grid 7m. The one other explicitly chosen state was placed at the center of the grid so that the algorithm could remain at the same estimate over multiple timesteps. A 3D slice of the RNGrid 7m used in this work is shown in Figure 6.
As Section 4 uses this grid to study position-time domain effects, the velocity-drift grid of the RNGrid 7m configuration was kept identical to the Spread Grid 7m. The specifications of the grid are provided in Table 2.
4 EXPERIMENTAL RESULTS
This section will present results on the computational efficiency and grid-dependent localization accuracy of the open-source CUDARecv implementation. First, computational performance of the DPE receiver implementation on an NVIDIA Jetson TX2 portable GPU (Nvidia Corporation, 2020) will be analyzed. Second, results from a simulated dataset will demonstrate the impact on localization accuracy when utilizing the coupling between vertical and clock-bias states in DPE grid design.
For CUDARecv and its comparison to PyGNSS, the GPS signals originate from the National Instruments GPS Simulation Toolkit (National Instruments, n.d.). Generated samples undergo a simulated down conversion from the GPS L1-band carrier frequency of F𝐿1 = 1575.42 MHz (Misra & Enge, 2011)to 0 Hz so that the frequency of the carrier wave in the received samples for satellite 𝑖 will only consist of the Doppler frequency component , plus the modeled atmospheric effects. The sampling frequency is 𝑓𝑠 = 2.5 MHz with a sample set length of 𝑇𝑠𝐾 = 20 ms. This generated dataset is then processed by CUDARecv and PyGNSS.
PyGNSS provides the sequential CPU implementation to compare to CUDARecv. PyGNSS runs the same DPE-based algorithm as CUDARecv, but with vector operations instead of CUDA kernels and by sequentially performing the signal processing and estimation in the DPE-based algorithm. Though algorithmically comparable to CUDARecv, PyGNSS does not follow the same modular architecture, so the comparison timing provided in Section 4 focuses on the open-loop DPE pipeline, which is implemented in CUDARecv in the BatchCorrScores and BatchCorrManifold modules.
The host platforms for CUDARecv and PyGNSS are compared in Table 3.
4.1 Computational performance
Table 4 provides the information used in the computational performance analysis. First, GPU occupancy for the parallelized implementation is studied. Second, the speedup of the parallelized implementation is evaluated through comparison to a sequential implementation. Lastly, the parallelized implementation is compared to another DPE-based receiver in the literature.
4.1.1 GPU occupancy
The BatchCorrScores module requires 113 ms to batch compute position-time and velocity-drift scores. Approximately half of this time is spent by the Fourier transforms – the position-time score generation taking 12 ms and the high-resolution FFT velocity-drift score generation taking 47 ms. The velocity-drift domain uses an array approximately 10 times longer than that of the position-time domain – the same sampleset, but with zero-padding to sinc-interpolate the frequency scores. This amount of zero-padding was chosen empirically with reasoning explained in Section 3.3.4 of Peretic (2019). The other half of the 113 ms is spent constructing the 50 × 103-sample signal replicas and pre-FFT signal processing, such as navigation bit wipeoff.
The BatchCorrManifold requires on the order of 800 ms to evaluate the grid states and compute the navigation solution. Two kernels run in parallel, one for the position-time and one for the velocity-drift manifold. To score a state on the grid, for every visible satellite, a thread must remove the effect of the Earth’s rotation during the transmission’s time of flight, compute the difference of satellite tracking parameters between that of the state and the reference state used for batch correlation, and interpolate between the nearest batch correlated scores. The substantial number of registers needed for these operations limits the GPU occupancy to 25%.
The unfiltered and filtered implementations differ in how the navigation solution is found. The unfiltered implementation was developed with further analysis in mind, storing the scores for all grid points in memory before performing a reduction comparison to find the grid state with the highest score. The weighted average-based filtered implementation runs much faster because of the reduction sum integrated into the manifold scoring step, computing the navigation solution with a minimal time cost.
4.1.2 CUDARecv (GPU) vs PyGNSS (CPU)
For comparison’s sake, CUDARecv was programmed to predominantly use 64-bit double-type variables to match Python 2.7’s default floating point variable size. As the Jetson TX2 runs using Compute Capability 6.2 (Nvidia Corporation, 2020), the throughput when performing 64-bit floating point operations will be 32 times less than when performing 32-bit floating point operations (Nvidia, 2018). Thus, the theoretical FLOPs limit of the Jetson TX2 is around an order of magnitude lower than that of the comparable PyGNSS CPU implementation. Nonetheless, the parallelized implementation demonstrates a speedup on the order of three times compared to the sequential implementation.
For the BatchCorrScores module and its equivalent operations in PyGNSS, both the sequential and parallelized implementation have similar time costs to construct the replica signal. However, the parallelized implementation performs the FFTs nearly six times faster than the sequential implementation, leading to an overall speedup for the scoring operations of over four times.
For the BatchCorrManifold and its equivalent operations, even with only 25% occupancy, the parallelized implementation computes the manifold scores almost three times faster than the sequential implementation. As the manifold grids consist of 254 points, this result shows that parallelization is crucial for larger DPE grid sizes to be practically implemented. In the unfiltered case, Python’s data management allows the largest score to be quickly found, while a more significant computation cost is required to perform the weighted average step in the filtered case. As seen by the filtered case of the parallelized implementation, an integrated reduction step can generate the measurement with time cost similar to that of Python’s largest-value look-up.
4.1.3 Comparing CUDARecv to contemporary literature
Another comparison to CUDARecv can be found in the Bayesian DPE receiver presented by Dampf et al. in Dampf, Frankl, and Pany (2019). Though not named specifically in their text, the receiver prototype presented in Dampf et al. (2019) will be referred to as BDPERecv for ease of comparison here. BDPERecv differs from CUDARecv by not decoupling the position-time and velocity-drift state estimation, by integrating a particle filter into the selection of navigation-domain states to evaluate, and by using a multi-correlator approach to batch-generate scores for the navigation-domain states. BDPERecv is also designed to operate on a CPU with integrated parallel computing capabilities using Intel’s IPP Image Processing toolset. Contrasting the strengths and weaknesses of CUDARecv and BDPERecv further reveals strategies to optimize a DPE-based receiver.
The authors of Dampf et al. (2019) report computation time results for BDPERecv on a 2015 notebook computer with a 3.5 GHz Intel i7-6700HQ quad-core CPU. These results do not include the multi-correlator time cost, which would add to the batch score generation time. Nonetheless, the BDPERecv’s reported upsampling and particle filtering steps are comparable to CUDARecv’s BatchCorrScores and BatchCorrManifold modules, respectively, as detailed in Table 5. All values for BDPERecv in Table 5 and the following discussion are taken from Dampf et al. (2019).
A quantitative comparison of the batch score generation step is hindered because of the twenty times longer sample set length of CUDARecv compared to BDPERecv. As it is well-known that Fourier transforms scale worse than linearly relative to the length of the signal being transformed, the time per score cannot be simply divided by the sample length to make a direct comparison. Additionally, the timing reported for BDPERecv is measured for a single channel but predicted to be the same for eight channels, since the channels can be processed in parallel across the eight threads of the quad-core CPU. If the Fourier transforms are assumed to scale linearly (neglecting the added computation done CUDARecv because of its longer sample set) and the multi-correlators are assumed to have negligible time cost to BDPERecv, CUDARecv would compute scores at a faster rate than BDPERecv. This may indicate that the batch score generation step of a DPE-based receiver benefits from the greater parallelization available on a GPU. However, an all-parameters-equal comparison would still be needed to quantitatively confirm this.
It should also be noted that the multi-correlators used in BDPERecv provide the receiver designer with more control over which scores are generated. This control can be useful to mitigate estimator biasing errors, such as those that the developers of CUDARecv discuss in Peretic (2019) and in situations with significantly degraded signal conditions, such as those that the developers of BDPERecv discuss in Dampf et al. (2019).
The State Scoring and Navigation Solution step can be more quantitatively compared. The time per state for BDPERecv is approximately five times faster than CUDARecv. This is likely due to the computational complexity back-calculating and . On the Jetson TX2, these steps for CUDARecv consume all available registers, capping GPU occupancy at 25%. Should CUDARecv be able to approach the theoretical full occupancy through either a more efficient back-calculation algorithm or more available registers, CUDARecv would theoretically be around 1.25 times slower than BDPERecv. This would still suggest that state scoring may be computed more efficiently with the faster clock and greater per-thread resources on a CPU.
Converting CUDARecv from primarily using double-types to float-types both reduce the register demand and increase the throughput. Certain navigation-domain filters may also have improved efficiency on a GPU through parallel reduce operations. Under considerations such as these, CUDARecv may have a total time cost closer to BDPERecv.
A final point of comparison is the hardware profiles used. BDPERecv operates on a notebook computer with a focus on maintaining track of the receiver when subjected to occluded-sky and indoor environments. The resolution of the upsampling and the spacing of the multi-correlators in BDPERecv can be adjusted to compensate for such interferences, particularly if operating on a more powerful desktop, which was suggested in Dampf et al. (2019) in the context of real-time operation. Thus, for sensitivity, a CPU DPE-based receiver implementation may be preferred. In contrast, CUDARecv operates on an NVIDIA Jetson TX2, which is designed to be small, lightweight, and low power for applications such as mobile robotics. Ng and Gao demonstrate in Ng and Gao (2016a) a DPE-based receiver implementation that successfully tracks a moving terrestrial vehicle with a 2% duty cycle. With this loading factor, CUDARecv theoretically reaches real-time operation, as it processes 20 ms of data in less than 1 s. Thus, when designing for portability, a GPU DPE-based receiver implementation may be preferred.
4.2 Manifold grid localization performance
This section demonstrates the impact of the coupling between vertical and clock-bias errors on DPE grid design. The dataset was chosen to be simulated to ensure the following properties in the data:
Only Klobuchar-modeled atmospheric deflections are present in the received signal.
Received satellite power is a function of distance only.
No movement of the receiver.
Satellites transmit exactly from their ephemeris-specified state.
By guaranteeing the true state of the receiver, the true state of the satellites, and control over the signal transmission, the effects of the vertical and clock-bias error coupling can be isolated. The satellites simulated have the geometric configuration shown in Figure 7. Satellite PRN 22 was not tracked due to its low elevation.
The DOP for this geometry is as follows:
Additionally, the covariance between the clock-bias and the vertical dimensions is 1.516, while all other covariance terms are negligible.
In the experiments to follow, the DPE receiver processed the first two seconds of the simulated dataset in timestep increments of 20 ms. No navigation-domain filtering was used.
Two experiments were run, differing only in the grids used:
Experiment 1: Spread Grid 7m
Experiment 2: RNGrid 7m
In both experiments, to identify the impact of the coupling between vertical and clock-bias error, two runs were configured with only the following difference:
Configuration 1: “Unperturbed” (plotted in orange) – The clock bias used for initialization was unperturbed from the true value.
Configuration 2: “Perturbed” (plotted in blue) – Values in the range [‒80, ‒50] ∪ [50, 80] m were drawn and added as an offset to perturb the true clock bias used for initialization.
For simplicity in reference, the results from these experiments will be identified by their experiment number, followed by a hyphen, followed by their configuration number. For example, Experiment 2-1 refers to RNGrid 7m with unperturbed clock bias.
In all experiments, the receiver was initialized at 100 different states surrounding the true position of the receiver. The initialization states were chosen by randomly drawing i.i.d. from a uniform distribution:
Magnitude values in the range [50, 80] m and angle values in the range [0, 2𝜋) for the horizontal plane. These values were converted into rectangular offsets in the East and North directions and added to the initial state.
Values in the range [‒80, ‒50] ∪ [50, 80] m as an offset in the Up direction and added to the initial state.
In all experiments, the receiver reached a “steady-state” navigation solution within one second where the navigation solution no longer moved between timesteps. By plotting the steady-state error in one dimension with respect to another, the expected coupling between only the vertical and clock-bias dimensions may be observed, as shown for Spread Grid 7m in Figure 8 and RNGrid 7m in Figure 9. The steady-state navigation solution error across each experiment’s 100 runs is statistically characterized in Table 6.
As expected from the results of Section 3 and the covariance between the vertical and clock-bias states, Figure 8 and Figure 9 both show a linear correlation between vertical error and clock-bias error for both grids. Low correlation is observed in the other states for both grids. This coupling also leads to relatively low vertical RMS errors in the final states, particularly when the initial clock-bias state is perturbed.
Along with the improvement in the vertical and clock-bias accuracy from perturbing the initial clock bias comes a slight decrease in horizontal accuracy when using Spread Grid 7m. However, as Table 6 has a lower geometric error, and DPE optimizes with respect to low geometric error, this increase in horizontal error is reasonable.
For Spread Grid 7m, with the method of random perturbations used, initializations in Experiment 1-2 would select a state with less than 30 m difference in their vertical and clock-bias errors half of the time. In such cases, due to the linear structure of the candidate states in the Spread Grid 7m, the manifold would immediately be in a region with many states having comparable vertical and clock-bias errors, beneficially utilizing the coupling. Conversely, runs that did not have clock-bias perturbations (Experiment 1-1) would have states with around 65 m difference in their vertical and clock-bias error. In such cases, states with comparable vertical and clock-bias error would be in the lower-resolution region of the Spread Grid 7m grid, and they may have significant error in the horizontal dimension, giving a low score and preventing the grid from moving that direction.
However, for RNGrid 7m, there is no structure in the choice of candidate points. The reduced error from Experiment 2-1 to Experiment 2-2 stems from the greater initial error when the clock bias is perturbed. Initial state estimates with more error typically result in the receiver moving to a greater number of intermediate state estimates. For the 100 runs without the clock-bias perturbation, the DPE receiver implementation moved once at most and, on average, 0.94 times before settling on the final state. For the 100 runs with the clock-bias perturbation, the DPE receiver implementation moved five times at most and, on average, 1.85 times before settling on the final state.
Thus, perturbing the clock bias forced more states – typically states with less error – to be sampled, increasing the receiver implementation’s likelihood of finding a state near the true peak. These better estimates are reflected by the improvement in the standard deviation for the vertical and clock-bias states. This low number of moves is due to the open-loop configuration of the receiver, as there is no navigation-domain or multi-timestep filter to perform sub-grid spacing peak detection.
Comparing Spread Grid 7m and RNGrid 7m in Table 6, a difference in the RMS error is evident. When the clock bias was not perturbed, RNGrid 7m reached better solutions, with lower vertical, clock bias, and geometric error. When the clock bias was perturbed, Spread Grid 7m reached better solutions, with lower vertical, clock bias, and geometric error.
Consistent with the discussion in Section 3, these experiments show that a structured grid can produce more accurate navigation solutions when the DPE algorithm begins at a state with proportionally-related errors in the vertical and clock-bias states. Such a state could be provided by conventional coarse acquisition, a previous timestep from DPE, or an A-GPS handoff. However, when the receiver is unlikely to have proportionally related errors in the vertical and clock-bias states, it is better to have no structure in the choice of candidate states being evaluated.
5 CONCLUSION
To date, the DPE algorithm has been recognized for having the potential to mitigate errors that characteristically cause faults in classical two-step receivers. With the objective of supporting broader DPE usage, this work provided a practically oriented survey of DPE, released an open-source parallelized DPE-based receiver prototype called CUDARecv, and demonstrated the software-defined receiver’s efficiency and open-loop localization capabilities.
Design considerations for a practical DPE-based receiver were also highlighted. The literature survey contrasted differing approaches and referenced techniques to the aspect which they addressed in the basic numerical DPE algorithm. The parallelized batch correlation algorithm and strategies for choosing grid shapes were introduced to improve the computational and localization performance of the CUDARecv prototype. By comparing CUDARecv to a sequential implementation of the same algorithm in PyGNSS and an image processing-equipped sequential implementation of Bayesian DPE in BDPERecv, strengths and limitations of using a GPU for DPE-based localization became more apparent.
HOW TO CITE THIS ARTICLE
Peretic M, Gao GX. Design of a parallelized direct position estimation-based GNSS receiver. NAVIGATION. 2021;68:21–39. https://doi.org/10.1002/navi.402
ACKNOWLEDGMENTS
The authors would like to thank Mr. Shubhendra Chauhan for generating the simulated dataset. The authors also wish to thank Ms. Sriramya Bhamidipati, Mr. Shubhendra Chauhan, Mr. Arthur Hsi-Ping Chu, Mr. Enyu Luo, and Ms. Yuting Ng for their work developing PyGNSS and contributing to CUDARecv.
This research is funded by Air Force Research Laboratory Sensors Directorate (AFRL/RY), Wright-Patterson AFB, OH, under contract FA9453-15-1-0075.
Footnotes
Funding information
Air Force Research Laboratory, Grant/Award Number: FA9453-15-1-0075
- Received December 10, 2019.
- Revision received September 28, 2020.
- Accepted October 1, 2020.
- © 2020 Institute of Navigation
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.