## Abstract

This paper analyzes the effect of error correlation on the SISRE bounding for GPS and Galileo satellites. For a given period of data collection, it computes the effective number of independent samples contained in a dataset applying an estimation variance analyses. Results show that the time between effective independent samples is highly dependent on the constellation and onboard clock type. On one hand, GPS satellites equipped with Rubidium clocks exhibit significantly longer error correlation than those with onboard Cesium clocks. On the other hand, Galileo satellites show substantially shorter correlation time among samples with less variability on a monthly basis. This paper also introduces a methodology to compute SISRE bounding accounting for the limited number of independent samples. Using a Bayesian approach, it computes the so-called uncertainty factor by which the Gaussian distribution needs to be inflated in order to account for the observation data independence.

## 1 INTRODUCTION

Range error overbounding plays a pivotal role in GNSS Safety-of-Life (SoL) applications like aircraft precision approach. The high integrity requirement demanded by Ground Based Augmentation Systems (GBAS), Satellite Based Augmentation Systems (SBAS), and Advanced Receiver Autonomous Integrity Monitoring (ARAIM) necessitates a thorough analysis of the GNSS range errors that guarantees a safe overbound. In the GNSS integrity literature, two extensively used bounding methods can be found: DeCleene’s Gaussian CDF bounding^{1} and Rife’s Gaussian pair overbounding.^{2} Both methodologies replace the unknown true error distribution by a Gaussian with standard deviation *σ*_{ob} which preserves its bounding properties after convolution in the position domain. In order to account for arbitrary (nonsymmetric, nonzero mean) error distributions, the pair overbounding introduced the so-called nominal bias *b*_{nom}. This term is also meant to overbound other errors that are not always observable in the sample data (ie, nominal signal deformation biases) and whose distributions might be unknown. Among others, *σ*_{ob} and *b*_{nom} are encapsulated within the Integrity Support Message (ISM) and transmitted to the ARAIM users. Based on these inputs, users evaluate whether or not the integrity requirement is fulfilled.^{3} The pair overbounding theorem has been recently revisited in Blanch et al.^{4} (Blanch’s two step Gaussian bounding) where a relaxation of the bounding premises is proposed leading to a less conservative bound.

The three previous overbounding methodologies have one common denominator; they assume that the observed distribution is the actual one, and they do not need to be concerned with correlation and independence. Previous works have shown that a satellite Signal-in-Space Range Error (SISRE) exhibits a time variant component which affects both mean and standard deviation.^{5,6} These studies also showed that error distributions tend to be zero mean in the long term. In addition, the work in Perea et al.^{6} indicated that GPS clock errors presented a correlation time of 10-12 hours whereas orbit errors show longer correlation time with 12-h periodic components. Those findings were later corroborated by Stanford University in Walter et al.^{7} Given the high correlation of the data, it becomes clear that SISRE characterization based on service history gets more cumbersome for new constellations where less data is available. The work presented in this paper takes a step forward and analyzes the effect of error correlation in SISRE bounding for GPS and Galileo. Using an approach based on Bayesian inference, it then introduces a methodology to inflate error bounds based on an estimate of the amount of independent data.

The first part of this work focuses on the autocorrelation study of orbit and clock error for both GPS and Galileo satellites. Results will show how SISRE correlation exhibits significant differences between constellations. An amplitude spectral analysis of the range error will evidence a 12-h harmonic components due to half sidereal days orbit periodicity (14 hours in case of Galileo). Although useful to understand the physical behavior of the range error, autocovariance and spectral density analyses do not provide an estimation of time between effectively independent samples. This paper proposes a method based on estimation variance analysis to compute the number of independent samples for a given data. Based on the variance of the distribution standard deviation estimator Var[*σ*_{x}], we estimate the time between effectively independent samples for both GPS and Galileo satellites finding significant differences among them.

Once the data autocorrelation has been understood, the second part of this work derives an analytical expression of the range error Cumulative Distribution Function (CDF) based on the number of independent samples as defined in Section 3. Given the high correlation of the data, the characterization of the SISRE based on service history becomes more cumbersome for new constellations where less data is available. Pervan introduced in Pervan and Sayim^{8} the use of Bayesian inference as a means to account for the statistical uncertainties in the knowledge of error standard deviation and the correlation across the multiple reference receivers to be used in the GBAS. The work presented here takes a step forward and analyzes the effect of error correlation in the SISRE bounding for GPS and Galileo. Using Bayes’ theorem with a noninformative prior distribution of the standard deviation, we compute the factor by which the Gaussian distribution needs to be inflated to account for the sample independence. Analytical results will show that the conditioned distribution matches the Gaussian CDF when the number of independent samples reaches approximately 350. This methodology is implemented using GPS and Galileo SISRE distributions from our previous work in Perea et al^{5} and Perea et al.^{6} These results will show that the fact that Galileo SISRE presents a significantly shorter decorrelation time than GPS will speed up the SISRE characterization based on service history to support ARAIM.

## 2 GPS AND GALILEO SISRE ANALYSIS: PREVIOUS WORK

Constellation performance monitoring and SISRE characterization has been a profuse topic in the GNSS literature during the past years. In particular, in the frame of GNSS SoL applications, previous work done by Walter in previous studies^{7,9,10} has addressed the analysis of GPS satellites nominal performance and faults during the last decade. In parallel, we have also been active in the examination of GPS and Galileo nominal range error in previous studies.^{5,6} In the offline ARAIM architecture, the characterization of satellite service history is used by the Air Navigation Service Provider (ANSP) to verify operational commitments. This information will be encapsulated in the ISM and transmitted to ARAIM users which ultimately will evaluate the integrity, accuracy, and continuity requirements.

Our prior work in Perea et al.^{6} exposed the variability on a monthly basis of GPS range errors in both mean and standard deviation. Figure 1 includes the distribution means for satellite orbit and clock errors projected over a grid of 642 users on a monthly basis. As can be seen, distributions exhibit nonzero biases that can vary between 20 and 40 cm in monthly observation periods. As the monitoring period increases, mean errors reduce to values on the order of reference products accuracy. As pointed out in both Perea et al.^{6} and Walter et al.,^{7} during the first weeks of operations, some GPS satellites show abnormally large errors that are not representative of the satellite’s performance after that period. To also illustrate the variability on SISRE dispersion, Figure 2 includes the SISRE worst user location root mean square (RMS) values on a monthly basis. As observed, changes in SISRE RMS can reach up to 50 cm for monthly datasets. As the monitoring period increases, this time variability reduces significantly as depicted by the lower plot in Figure 2. Consequently, it can be stated that core distribution parameters reach fixed values for monitoring periods of 6-8 months. As pointed out in Perea et al.^{6} and Walter et al.,^{7} note that the largest RMS values correspond to the two Cs-equipped block IIF GPS satellites (SVN65 and SVN72) which show particularly less temporal variability with respect to the rest of the Rb-equipped satellites. As can be inferred, the onboard clock type has a pivotal role in the error correlation and sample independence for GPS satellites.

Respectively, Figures 3 and 4 include SISRE distribution mean and RMS values for Galileo satellites. Comparing Figures 1 and 3, it can be observed that Galileo range error exhibits significantly less variations on a monthly basis than GPS. This is an indication that the American GNSS is more affected by temporal correlation than the European constellation. It is also important to point out that the inherent maturation of the constellation entails events of nonstationary performance as depicted in Figure 4. As shown in Perea et al.^{5} and Galluzzo et al.,^{11} SISRE RMS for Galileo satellites presented values of 1 m in March 2015 having a drastic performance improvement to today’s figures (SISRE 1-*σ* values around 25-30 cm). Not only has the nominal performance greatly improved but also the number of outliers has been reduced. Clock events described by ESA in a previous investigation^{12} were also corrected after the initial service declaration. Such enhancement will soon reach a saturation point once the ground and space segments are completely deployed and the constellation reaches full operational capability. Note the fault events identified and reported in previous studies^{9,13} have been excluded, and only the nominal performance is analyzed.

In the ARAIM architecture, ANSPs are in charge of monitoring the constellation performance to ensure that it is compatible with the performance commitments. Data independence plays an essential role in determining those bounds since they impact the confidence that can be placed in the estimation. In response to this need, this paper performs a detailed analysis on error correlation and sample independence and attempts to measure the amount of information provided by the data by defining an equivalent number of independent samples. The definition of the orbit and clock monitoring algorithms has been covered in previous work.^{14,15} A further description of the products validation algorithm along with the conversion between satellite Antenna Phase Center (APC) and Center of Mass (CoM) can be found in previous studies.^{6,16} Table 1 encapsulates the set of input data employed in the orbit and clock error for this analysis.

## 3 DATA CORRELATION AND SAMPLES INDEPENDENCE

The effect of the number of independent samples is paramount for error overbounding. Inferring properties of the underlying distribution from a given dataset is always challenging. As pointed out in the previous section, satellite range error is eminently correlated over time creating high variability of the distribution core parameters estimates (mean and standard deviation ) on a monthly basis. In order to perform safe and efficient SISRE overbounds (not extremely loose so we might face availability risk), the error autocorrelation needs to be addressed within ISM generation.

### 3.1 Error autocorrelation and autocovariance

For a given random process x(*t*), let us define the autocorrelation *R*_{xx} and autocovariance *C*_{xx} functions as

1

2

where *μ*_{x} = *E* [x(*t*)] is the mean value of the random variable x(*t*).

Let us assume that over the interval of data collection 0 *< t < T*, our error data x(*t*) comes from a stationary, ergodic random process with mean *μ*_{x}, variance *σ*^{2}_{x}, autocorrelation function *R*_{xx}, and autocovariance *C*_{xx}. Let us also assume that the independent samples contained in x(*t*) come from an independent and identically distributed process. Because the process is ergodic, those parameters can be estimated using the following expressions (following Chapter 5 in Bendat and Piersol^{21}):

3

4

5

6

Since the error data is collected in discrete time x(*k*), let us write the equations above in discrete form as

7

8

9

10

The autocovariance provides a temporal representation of the correlation between the error and a delayed copy of itself. Typically, the autocovariance function is normalized by the sample variance so that lag zero is equal to one. Doing this is useful because the correlation is a scale-free measure of the statistical independence of the error. Let us define the normalized estimated autocovariance matrix as .

As exposed in Perea et al.^{6} and Montenbruck et al.,^{22} SISRE distribution is mainly driven by the onboard clock type. Consequently, in this chapter, error data are clustered in four different groups: GPS-Rb, GPS-Cs, Galileo-RAFS, and Galileo-PHM. For each of the four groups, a representative satellite is taken to generate the autocovariance and spectral density plots in Figures 5-9: SVN67 / PRN06 (Rb), SVN65 / PRN24 (Cs), GSAT0101 / E11 (RAFS), and GSAT0205 / E24 (PHM).

Figure 5 illustrates the large differences in the clock autocovariance for Rubidium clock-equipped GPS satellites versus Cesium. Orbit error components (radial, along, and cross-track error) show no significant disparity between satellite types having a 12-h harmonic component which remains over several days. This sinusoidal behavior is attributed to the inherent dynamics of the satellite throughout its orbital motion. In particular, it is due to the limitation of the 15 quasi-Kleperian parameters used to model the satellite motion.^{23} Apart from this half-sidereal day harmonic component, the along-track error also shows a 2-h sinusoidal behavior which coincides with the nominal update rate of the GPS navigation message.

Unlike orbit components, clock error (purple lines in Figure 5) shows considerable contrast between satellite types not only in the harmonic element but also in the convergence time. This suggests that within the Orbit Determination and Time Synchronization (ODTS) process, the estimation of the satellite clock prediction gets “contaminated” by the residuals of orbital model fitting. In order to better understand these effects, let us analyze the error correlation in the frequency domain. For a given signal x(*t*), the Wiener-Khinchin relation defines the link between its autocorrelation function *R*_{xx}(*τ*), in time domain, and its power spectral density *S*_{xx}(*ƒ*), in frequency domain, as follows

11

Figure 6 includes the Single-Sided Amplitude Spectrum (SSAS) for orbit and clock errors for both GPS satellite types. The dominant harmonic component for orbit error corresponds to the 12-h frequency (2.31 × 10^{−5} *Hz*) showing also several *n* × 12-h components. As pointed out above, there is no significant contrast in the orbit error between the two satellite types. However, the difference lays in the relative power density between the orbit elements and clock noise level. One can be misled by results shown in Figure 5 thinking that Rubidium clocks induce larger range errors than Cesium clocks—this is actually not true. The normalized autocovariance does not indicate anything about the distribution variance but about its correlation. In fact, as detailed in Figure 2, Cs-equipped SISRE RMS is typically twice as big as Rb-equipped satellite values (100-120 cm versus 50-60 cm) due to better short term stability and prediction accuracy.^{24} In Figure 6, it can be observed that the error noise level in Cs clocks is significantly higher than in Rb clocks. On the left side, it is illustrated that the power level of clock error noise (yellow line) is above the power level of the harmonic components of the orbit error. As a consequence, the orbit correlation has a larger impact on range error for Rb-equipped satellites than for Cs-equipped.

A similar analysis is carried out for Galileo satellites in Figure 7. Unlike in the GPS case, the onboard clock does not create substantial differences in the normalized autocovariance between clock types. Figure 8 includes the orbit and clock component SSAS for RAFS and PHM Galileo satellites. A 14-h harmonic component is observed for orbit error in both cases being compliant with the Galileo orbital period (1.98×10^{−5} *Hz*). However no significant contrast can be detected among clock types. In particular, both yellow lines in Figure 8 indicate similar noise level for PHM and RAFS clocks. The next section will prove that in fact there is a slight difference between the two satellite types impacting the time between effectively independent samples.

A comparison between Figures 5 and 7 exposes the differences in error correlation between GPS and Galileo satellites. The Galileo orbit error sinusoidal component decays significantly faster than GPS. This is corroborated by the range error spectral density depicted in Figure 9. Unlike GPS, Galileo only shows one harmonic component corresponding to the 14-h orbital period. However the power associated with this component is lower relative to lag zero ultimately leading to a less dominant orbit correlation for Galileo range error than for GPS.

### 3.2 Covariance analysis for effectively independent samples determination

The error temporal correlation shown in the prior subsection only provides a qualitative notion of sample independence. Figures 5, 7, and 9 exposed the differences between constellations and satellite type showing that the range errors for GPS Rb clock satellites are significantly more correlated than GPS Cs clock and Galileo errors. However, it does not bring a specific procedure to define a number of effectively independent samples given a certain dataset that can be used to measure the amount of information. Prior work done in previous studies^{6,7} based the correlation analysis on autocovariance plots inspection. This section takes a step further and proposes a method based on estimation variances for sample mean and standard deviation.

Let us assume again that error data x(*t*) comes from a stationary, ergodic random process with mean *μ*_{x}, variance , autocorrelation function *R*_{xx}, and autocovariance *C*_{xx} (defined in Equations (3)–(6)). Given a monitoring period for which we intend to characterize the satellite range error, confidence in the estimation of and will be given by their variances and . While the first variance term is easy to obtain, the variance of the estimated sample variance does not have a simple form. In this regard, we define the mean square value of x(*t*) as which can be estimated by

12

Using Equation (12) in (4), the estimated sample variance can be written as

13

Chapter 8 in Bendat and Piersol^{21} provides a closed-form equations for both and ,

14

15

Again, since our error data is collected in discrete time x(*k*), Equations (12), (14), and (15) can be written as

16

17

18

Note that Equations (17) and (18) require the true values of the error autocovariance , which is unknown. They have been substituted by its estimated value obtained using Equation (10). This is an important step that needs further motivation. Technically, the true values shall be substituted with a range of values around the sampled ones given by confidence interval. According to the Central Limit Theorem (CLT), the observed *C*_{xx} will be close to the true one if the number of independent samples is big enough (typically larger than 30). Since the estimation of the autocovariance function is carried out based on several years of SISRE data, the use of (10) is legitimized. The scope of this derivation is to determine how many of those *N* samples are effectively independent *N*^{*}. It is important to clarify that we will use as many samples as available in our dataset X to compute and , but their variances will be driven by the number of independent samples.

Consider the special case where x(*k*) is a white noise process with mean *μ*_{x} and variance . By definition, the autocovariance is (where *δ*(*k*) is the Kronecker delta function) implying that all samples contained in the dataset are independent, *N* = *N*^{*}. Then, variances in (17) and (18) reduce to

19

20

The underlying idea is to compare the white noise results with the general case (colored noise) and determine the number of samples *N* that will result in parameter estimate error variances equal to those in (19) and (20). The ratio *N*^{*}/*N* will then be the fraction of the *N*^{∗} samples that are *effectively independent* for the estimation of *μ*_{x} and out of the total *N* (colored). Setting Equation (17) equal to (19) and Equation (18) equal to (20), the ratio *N*^{∗}∕*N* can be obtained from the variance in the estimation of *μ*_{x} and as

21

22

Note that the fractions identified in the previous expressions will not be identical since they are conditioned to the variance of the estimation of two different parameters. In general, for a given dataset, and will not be equal, and hence, (21) and (22) will report different values depending on how much confidence we can place on the estimation of each parameter. Since in order to obtain , we need both and , the limiting factor will be the smallest ratio between and . The difference in these two ratios can be analytically proven by analyzing the dependency of expressions (19) and (20) on the number of independent samples *N*^{∗}. For a given zero mean Gaussian distribution with unit standard deviation, it can be easily seen that . In other words, we need twice as many independent samples to estimate than to estimate with the same variance.

Let us assume that over the period of data collection 0 < *t* < *T*, we have a sampling interval Δ*T* (time which elapses between two samples) so that the total number of collected samples is *N* = *T*∕Δ*T*. The selection of the sampling interval only affects the number of colored samples but not the number of effectively independent samples contained in the dataset X. One might be tempted to assume that the ratio *N*^{∗}∕*N* shall be independent of the sampling interval; however, let us look at the following example. Let us assume that two processing facilities monitor the same satellite range error during 10 days. Monitor A has a sampling interval of 5 min while monitor B has a sampling interval of 15 min. After the full period, monitor A has collected a total *N*_{A} = 2880 correlated samples while monitor B gathered *N*_{B} = 960 correlated samples. Since A and B have different sampling interval, *N*_{A} and *N*_{B} are different; however, they must contain the same number of effectively independent samples (same monitoring period) so . That yields to

23

For performance characterization and error overbound, our goal is to determine the time between effective independent samples Δ*T*_{ind}. It is formally defined as the time elapsed between two consecutive effectively independent samples. The determination of Δ*T*_{ind} is relevant since it allows us to discern how many independent samples we can collect in a given monitoring time. It provides an approximate measure on the amount of information that the recorded data contains about the true distribution. Let us assume that in our example the true Δ*T*_{ind} is 15 minutes. In that case, and according to (23), the ratio of effectively independent samples for the dataset collected by monitor A is . In the general case, we do not know the actual Δ*T*_{ind}, and the only way to estimate it is through the computed values of *N*^{∗}∕*N*. Let us call *N*_{J} the total samples of the data set X_{J} collected by a generic monitor J with a sampling interval Δ*T*_{J}. We want to compare our monitor J with the monitor whose sampling interval is the actual Δ*T*_{ind} so that *N*^{∗}∕*N* = 1. Using Equation (23), the time between effectively independent samples can be estimated as:

24

Figure 10 presents the values of and for GPS satellite range error for both Cs and Rb clock types as a function of the sampling interval. As pointed out before, both (21) and (22) are built under the assumption that *C*_{xx} can be substituted by its estimated value *C*_{xx} if a large number of independent samples are included in the dataset. In particular, SISRE data from January 2015 to December 2017 is taken for the GPS and Galileo correlation analysis. Although we have not yet defined the technique to determine the number of effective independent samples in a given set, results from GPS and Galileo stationary analysis in Perea et al.^{6} indicate that several years of SISRE data contain enough independent points to support the CLT.

As presented in Figure 10, both and are monotonous functions (ideally linear) of the sampling interval. According to Equation (24), the Δ*T*_{ind} can be estimated as the inverse of the slope of these functions. Note that choosing a different sampling interval does not imply a modification of the total monitoring period (3 years with sufficiently large number of independent samples) but different time elapsed between the collected measurements. As mentioned above, the confidence in the estimation of will be limited by the smallest *N*^{∗}∕*N* ratio. In other words, the time between effective independent samples Δ*T*_{ind} will be determined by the longest or .

As expected, for a given sampling interval, the number of effectively independent samples for Rb clock satellites are approximately 10 times smaller than for Cs clock satellites. For example, if we choose a sampling interval of 2 hours, the ratio of effectively independent samples for Cs clock satellites is 0.5 while it is just 0.05 for Rb clock satellites. Table 2 provides an average range of values for and applying Equation (24) for each point of Figure 10. Note that the shaded cells indicate the limiting parameters. The time between effectively independent samples for Rb clock GPS satellites ranges between 50-60 hours whereas it is 10 times shorter, 5-6 hours, for Cs clock satellites.

Figure 11 includes the results of a similar analysis for Galileo satellites (RAFS and PHM onboard clocks). Although not as pronounced as in the GPS case, there is a difference in the number of independent samples between PHM and RAFS-equipped spacecraft. For example, for a sampling interval of 1 hour, the ratio of effectively independent samples for PHM clock satellites is 0.5 while it is 0.4 for RAFS clock satellites. Table 2 indicates that the time between effectively independent samples for PHM clock satellites ranges between 2.5-3.5 hours while it is a few hours longer for RAFS clock satellites, around 4-5 hours. Note that as of June 2018, there is only one Galileo satellite in active service which operates with a RAFS onboard clock (GSAT0101).

It is worth pointing out the differences between the limiting factors among satellites. Cs clock-equipped GPS is the only case in which the estimation of is the limiting factor. In the rest of the satellites under study (both Rubidium and passive Hydrogen masers), is the limiting factor instead. A plausible explanation to this behavior is the short term versus long term clock stability. As illustrated in the Allan deviation plots in Beard,^{24} Cs atomic clocks present a better long term stability whereas Rb and passive Hydrogen maser clocks show an extremely stable short term behavior.

The initial correlation analysis that was performed in Perea et al.^{6} was based on autocovariance plots. Those figures provided a notion of the error temporal behavior, but it is actually quite complex to extract anything more than qualitative conclusions. Looking at the range error autocovariance in Figure 12, one could, for example, fix a 0.5 threshold value from which we can expect the data to be uncorrelated. However this criteria is not sufficiently motivated. Looking at the Rb GPS satellites autocovariance plot (blue line in Figure 12), it is quite hard to extract any conclusion about Δ*T*_{ind} from that sinusoidal behavior. Even more challenging is to infer the time between effectively independent samples in the Galileo case. The green and brown lines in Figure 12 show very similar trends but in fact, as shown in Table 2, they do not have the same Δ*T*_{ind}. The methodology here presented overcomes this problem by defining the *N*^{∗}∕*N* ratio.

## 4 EMPIRICAL ASSERTIONS ABOUT RANGE ERROR TIME DEPENDENCE

After having determined the time between effectively independent samples, the conclusions empirically stated in Perea et al.^{6} regarding error variability over months for GPS and Galileo can be mathematically seconded. Figures 1 and 2 exposed the broad variability in range error mean and standard deviation over months for Rb-equipped GPS satellites whereas in the case of Cs clocks, the errors were quite stable over time. With the results obtained in the previous section, we can now state that in the case of Rb clocks, only 14-16 samples collected in a month are independent. Trying to characterize a population with only 16 independent data points is certainly adventurous leading to discordant results on a monthly basis. As the monitoring time increases to biannually and yearly datasets, the number of independent samples grows providing more confident estimations of the true distributions. In the case of Cs-equipped satellites, 1 month of data contains around 120 independent samples, making the monthly estimations less changeable as displayed in Figure 2. Similar grounds can be given to explain the smaller variability observed for Galileo error distributions (after initial service declaration) shown in Figure 3.

A discussion regarding whether or not satellite range error distributions are biased is herein addressed. The final section in the Working Group C (WGC) Milestone 3 Report^{10} collects a series of assertions related to the ARAIM system architecture. Among others, they state that the ANSP will implement a ground-based offline monitoring of satellite measurements to compute a safe overbound using the distributions and . Both parameters *b*_{nom} and *σ*_{ob} account for: a) Repeatable or persistent biases in receiver observed SIS errors, for example signal deformation or interfrequency biases; b) Statistical uncertainty due to limited sample sizes available to the offline monitor function. This work asserts that orbit and clock errors do not create permanent bias in SISRE, so they should not be accounted in *b*_{nom}. This statement has also been supported by Walter et al. in his previous study.^{7} In case permanent biases are observed in the range error due to clock and ephemeris error, it can be attributed to a misalignment in the reference APC between Broadcast Ephemeris and reference data (Table 1). Our allegations are based on the findings in our previous work^{6} and the prior section:

GPS: SISRE distributions for Rb-equipped satellites do not exhibit a significant bias after 6-8 months of data collection. Since only 14-16 independent samples can be collected in a month, it would take at least 10-12 months to have a reliable estimation of the mean. Once enough significant data have been collected, the distribution mean is on the order of reference products accuracy (2-3 cm).

Galileo: Due to the short time between independent samples, Galileo SISRE has around 180 independent samples a month. This explains why no significant bias (on the order of the products accuracy 4-5 cm) was observed in the monthly error distributions for Galileo. In case of the European constellation, one must acknowledge that ground segment ODTS is subject to updates. As displayed in the timeline of the SISRE RMS in Figure 4, certain ground segment modifications can violate the stationarity of the error (March 2018). Since they do not belong to the nominality of the distribution, these events do not invalidate the results from this paper which apply to unfaulted error distributions.

## 5 BAYESIAN INFERENCE FOR ERROR CORRELATION ANALYSIS

The first part of this paper has provided a statistical method to define the number of independent samples for a given dataset. Ultimately, the statistical independence of the data needs to be accounted for in the satellite ranging error overbound and consequently in the ISM generation. Previously proposed overbounding methods for GNSS SoL applications (see previous studies^{1,2,4}) do not consider the effect of error correlation and how to account for it. Using a Bayesian inference approach, this section obtains an analytical expression of the range error CDF as a function of sample standard deviation and the number of independent samples. Then, it analyzes the inflation factor that needs to be applied to a Gaussian bound in order to safely account for the error variability.

### 5.1 Ranging error CDF based on sample independence

Given a range error dataset, our scope is to obtain a CDF of the ranging error *F*_{ε} as a function of the sample standard deviation *s* and the number of independent samples *n*.The major hypothesis of this section is accepting that the sampled error derives from a zero mean Gaussian population. Empirical error CDF plots in Perea et al.^{6} evidence that this assumption is not far from reality and that a Gaussian function is a good approximation for the major part of the true error distribution.

The probability functions in this section depend on two variables; range error, *ε*, and parent distribution standard deviation, *σ*. Our goal is to derive an analytical expression of the range error CDF as a function of the sample standard deviation *s* and the number of independent samples *n*. For that purpose, let us define the following probability functions in order to express

*ƒ*_{ε}: Marginal PDF of the ranging error*ƒ*_{σ}:*A priori*Marginal PDF of the distribution standard deviation*ƒ*_{ε|σ}: Conditional PDF of the ranging error*ε*to parent distribution standard deviation*σ**F*_{ε}: Marginal CDF of the ranging error.

The marginal probability of the ranging error *ε* is written as

25

Applying Bayes’ theorem, Equation (25) can be expressed as

26

Integrating the range error pdf to obtain the cdf and rearranging terms, we can write

27

Note that the integral over *dε* can be expressed in terms of the error function (erf) as follows:

28

A suitable posterior distribution *ƒ*_{σ} derived from a noninformative prior coming from a Gaussian population is given in Section 2.3 of Box and Tiao^{25}

29

where *s* is the sample standard distribution and *n* the number of effective independent samples. The equation above represents the PDF of the error standard deviation conditioned to *n* and *s* based on the assumption that the actual error derives from a zero-mean Gaussian distribution (discussed in Section 4). For simplicity, let us rename the constants

Introducing (28) and (29) in (27), we can write the range error CDF as

30

For the sake of clarity, let us rewrite Equation (30) as . Applying the variable change *u* = 1∕*σ*, substituting *dσ* = −*σ*^{2}*du*, and correspondingly changing the integration limits, the two terms of Equation (30) can be expressed as

31

32

Using the table of integrals provided in a previous report,^{26} both terms *I*_{1} and *I*_{2} have analytical solutions

33

34

where Γ(*n*) is the Gamma function and 2*F*_{1} (*a*_{1}, *a*_{2}, *a*_{3}; *a*_{4}) is the Gaussian hypergeometric function. Introducing (33) and (34) in (30) the range error CDF can be expressed as an explicit function of the number of effective independent samples *n*, the sample standard deviation *s* and the error magnitude x as

35

### 5.2 Inflation factor to account for sample independence

According to the CDF overbound theorem,^{1} a given random variable *A*(x) with CDF *F*_{A}(x), is bounded by a second distribution *O*(x) with CDF *F*_{O}(x) if

36

Given the measurement-dependent *F*_{ε} in (35), our goal is to find an *F*_{O} that fulfills the bounding conditions in (36). Figure 13 depicts the normalized Folded CDF and Quantile-Quantile (QQ) plot of *F*_{ε} for different values of *n* and compares them to the normal Gaussian distribution (no sample correlation). As can be seen, for a given error dataset with *n* independent samples and sample standard deviation *s*, the distribution derived from *Ν* (0*s*) will not bound the ranging error. In order to find a safe overbounding *σ*_{ob} that accounts for the uncertainty due to the finite number of independent samples, the estimated sample standard deviation shall be inflated by factor *K*_{uncer} ≥ 1 so that *σ*_{ob} = *K*_{uncer} *s*.

For the sake of generality, the error term x in (35) can be normalized by the sample standard deviation as x^{∗} = x∕*s* leading to a measurement-dependent *F*_{ε} as a function of number of samples

37

Figure 13 plots the above CDF expression for different numbers of independent samples. As can be inferred, *K*_{uncer} must be an inversely proportional function of the number of independent samples contained in the error dataset that, in the limit case, will reach *K*_{uncer} ≈ 1. Of course we do not want to unnecessarily inflate *σ*_{ob} so that it leads to availability risk. However we need to confidently overbound at least down to the *P*_{sat} committed by the CSP.^{3} For a given error dataset X with *n* independent samples and estimated standard deviation *s, K*_{uncer} is the factor that fulfills:

38

where *φ*^{−}1(*P,μ,σ*) is the inverse CDF of a Gaussian distribution *Ν* (*μσ*). Equation (38) provides an implicit function *K*_{uncer} = *ƒ* (*n, P*_{sat}) that needs to be solved iteratively. The computed CDF *F*_{O} derived from will guarantee a CDF bounding as

39

Figure 14 shows the folded CDF and QQ plots of the inflated Gaussian distribution that fulfills bounding conditions in (39) for a given *P*_{sat} = 10^{−5}. Figure 15 summarizes the values of the inflation factors *K*_{uncer} for different values of *P*_{sat} and *n*. The represented *K*_{uncer} is the inflation factor that provides the tightest safe overbound (assuming Gaussian parent distribution) and, as can be seen in Figure 14, decreases as the number of independent samples grows. As shown in Figure 15, the inflation factor is close to 1 (data uncertainty does not play a role) once the dataset contains around 150-200 independent samples. According to the results from the correlation study in the previous section, this means around 1-1.5 months of Galileo SISRE data and 10-12 months of GPS (Rb) data.

Note that the results showed above are compatible with the empirical assumptions made in Equations (17) and (18). There, the true distribution was substituted by the *C*_{xx} based on the argument that 3 years of data provide enough independent data to apply the CLT. The inflation factor *K*_{uncer} has been defined to characterize the uncertainty associated with the estimation of the error distribution when not enough independent data is available (just a few weeks for Galileo or a few months for GPS). As seen in Figure 15, once sufficient independent data points have been collected, the inflation factor asymptotically reduces to one, implicitly indicating that the estimation of the core of the distribution is a close representation of the real one.

## 6 DISCUSSION ON THE INFLATION FACTOR FOR GAUSSIAN OVERBOUND

The Bayesian analysis used in the derivation of *K*_{uncer} assumed that range error distribution comes from a zero mean Gaussian population. This assumption does not fall far from the reality. Figure 16 presents a typical SISRE folded CDF for GPS satellites (in this case for SVN67/PRN06) during 2 years of data collection projected over a grid of 642 users. As seen in the QQ plot, the Gaussianity is maintained for approximately 95% of the data (2*σ*) showing larger tails that deviate from the Gaussian fit defined by the sample standard deviation of 34 cm (red line). It was mentioned at the beginning of this paper that the core of the distribution controls the nominal performance of the error. In particular, Section 3 argued that the correlation properties (time between effectively independent samples) are dominated by the core, leaving almost no influence to the tails. The stationarity analysis for both GPS and Galileo in Perea et al.^{6} showed that once enough independent samples are collected, the core of the distribution settles, and only differences in the tails are observed.

Although the core of the distribution dominates the sample mean and standard deviation, the integrity bounds are driven by the tails of the distribution. If one was to overbound the empirical CDF in Figure 16 using a *σ*_{ob} computed as the product of the sample standard deviation (*s* = 34 cm) and the inflation factor corresponding to 384 independent samples (*K*_{uncer} = 1.03, 2 years of GPS-Rb data as per Table 2), the resulting Gaussian bound of *σ*_{ob} = 35 cm will not overbound the full distribution but only the core. This is due to the fact that real SISRE distributions have two distinct parts: a quasi-Gaussian core containing most of the data and a flat tail distribution with just a few data points.

In order to provide a safe integrity bound (not necessarily the optimal), the *σ*_{ob} has to be determined by the tail distribution. As suggested by Figure 16, we can consider a tail distribution which contains 5% of the total data, leading to a total of 20 independent samples. As reflected by Figure 15, a *K*_{uncer} = 1.33 has to be used to inflate the tail sample standard deviation. The resulting *σ*_{ob} of 74 cm does certainly provide a safe integrity bound to the observed empirical CDF accounting for data uncertainty due to correlation. Note that the inflation factor is still relatively high for 20 independent tail distribution samples. Following this core-tail partition strategy, if an inflation factor *K*_{uncer} ≈ 1 wants to be achieved based only on independent sample data belonging to the tail distribution, two more orders of magnitude of total data needs to be collected (departing from the fact that only 1-5% of the data belongs to the tail distribution).

However, it can be argued that, although safe, this overbound might not be optimal. Indeed, we are sacrificing the narrow core of the SISRE distribution by the tails. It is left for a future version of this work to introduce a Multi Gaussian bounding methodology which exploits the trade-off between core and tail distributions ultimately leading to more efficient integrity bounds.

## 7 CONCLUSIONS

This work has presented a technique to account for the effect of sample correlation in SISRE overbound. It has been introduced in two steps. First, we defined and determined the number of effectively independent samples contained in a given satellite range error dataset. The outcome of our study has demonstrated that the error correlation is highly dependent on the constellation and satellite onboard clock. Through an estimation variance analysis, Section 3 established the major differences among GPS and Galileo range error correlation. Although SISRE RMS values are typically double for Cs-equipped GPS satellites than for Rb-equipped satellites, the former show a time between effectively independent samples 10 times shorter (5-6 hours) than Rb clock-equipped satellites (50-60 hours). A signal spectrum analysis indicated that this is due to the periodic residuals caused by the limitations of the orbit model. In the case of GPS Rb clock satellites, this “contamination” is more pronounced due to the better clock performance and short term stability.

An analogous analysis was carried out for Galileo range error. Unlike GPS, the contrast between PHM and RAFS clock-equipped spacecrafts was not so noticeable. Satellites equipped with passive Hydrogen masers exhibit 2.5-3.5 h between independent samples whereas RAFS-equipped satellites display 4-5 h. It is important to remark that as of December 2018, only two operational GPS satellites are equipped with Cs clocks, and just one Galileo satellite runs with a RAFS clock. Consequently, it is a fair statement to say that GPS error shows a characteristic correlation time around 10 times longer than Galileo.

Second, using a Bayesian approach and assuming a Gaussian distribution, this work introduced a methodology to perform an overbound that accounts for the uncertainty in the data due to the limited number of independent samples. We derived a closed expression to compute the so-called *uncertainty inflation factor* as a function of the sample standard deviation, the number of independent samples, and the committed *P*_{sat}. Results show that in order to have an inflation factor of around 1.02-1.03, at least 150-200 independent samples need to be collected. This translates to a period of data collection of 1.5-2 months for Galileo and 10-12 months for GPS.

These results are paramount for elaboration of the offline ISM. As stated in the WG-C ARAIM Milestone 2 Report,^{27} the CSP shall need a period of several years of observation to gain sufficient confidence in the operational performance of each constellation. This confidence is highly driven (although not only) by the amount of effective independent samples that we can collect during the observation process. As shown here, the fact that Galileo SISRE presents a significantly shorter decorrelation time than GPS will eventually ease the SISRE characterization based on service history to support ARAIM in the sense that shorter monitoring periods are needed to collect independent data.

## HOW TO CITE THIS ARTICLE

Perea S, Meurer M, Pervan B. Impact of sample correlation on SISRE overbound for ARAIM. *NAVIGATION*. 2020;67: 197–212. https://doi.org/10.1002/navi.346

## CONFLICT OF INTEREST

The work carried out in this paper was performed while the first author was employed at the Institute of Communication and Navigation at the German Aerospace Center (DLR). The technical information contained in this document does not represent any official Airbus Defence & Space GmbH position.

- Received December 13, 2018.
- Accepted September 26, 2019.

- © 2020 Institute of Navigation

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.