## Abstract

Watermarking signal authentication is a technique in which a global navigation satellite system (GNSS) provider cryptographically perturbs the spreading code to allow for limited cryptographic authentication of a signal. Several proposals and studies have been presented or are underway to augment GNSS signals with this capability. This work reintroduces a generalized combinatorial watermarking function that affords a flexible pathway to cryptographically prove the authentication security of a signal with receiver observables under certain assumptions. The security levels are comparable to those of standard cryptographic security (e.g., 128-bit security) and require little or no additional use of the navigation data bandwidth. We show how our methods can be applied to signals of different designs and signal-to-noise ratios. With our receiver processing strategy, one can design a watermarking signal authentication scheme and the accompanying receiver to have high confidence in a signal’s authenticity.

## 1 INTRODUCTION

Receivers can use watermarking signal authentication to establish trust in satellite navigation ranging signals (Scott, 2003). Several proposals and studies have been presented to augment global navigation satellite system (GNSS) signals with this technique, including for the Global Positioning System (GPS) (Anderson et al., 2017; Hinks et al., 2021) and the Wide Area Augmentation System (O’Hanlon et al., 2022). These techniques work together with navigation message authentication to establish trust in transmitted navigation data.

In watermarking signal authentication, the spreading code is watermarked by cryptographically, pseudorandomly inverting or replacing sections of the signal spreading code. First, the GNSS provider (hereafter, the "provider") makes a cryptographic commitment based on cryptographic information that the provider promises to hold secret for a specific time. Next, the provider perturbs the spreading code based on the commitment and broadcasts the perturbed spreading code within the ranging signal. The perturbation is small enough to avoid materially interfering with users who ignore authentication or are attempting to acquire the signal. After receipt of the watermarked signal, the receiver stores raw intermediate-frequency radio data (hereafter, "raw radio data") for the length of the applicable watermarked spreading code. After a specific time, the provider reveals the commitment used to generate the watermark. Then, the receiver re-examines the raw radio data. If the receiver observes a correlation increase when using the watermarked replica (known only to the provider at broadcast), then the receiver has evidence of the authenticity of the signal. In this work, we investigate the relationship between the extent of this correlation and the level of authentication security using mathematical proof arguments under adversarial and noise modeling assumptions.

Let us suppose that the constellation’s primary authentication framework is timed efficient stream loss-tolerant authentication (TESLA). In this case, the constellation can issue a watermarked signal with little or no additional bandwidth on the navigation data channel (Anderson et al., 2022a). The watermarking signal authentication must be derived from a commitment via a secure, one-way key-derivation function (e.g., hash-based message authentication code [HMAC] (Dang, 2008)). This work reintroduces a function class to perform watermarking perturbations on the spreading code (Anderson et al., 2022a; O’Hanlon et al., 2022). We call this function class “combinatorial watermarking functions.” Combinatorial watermarking functions use a cryptographic commitment to derive a uniform combination of integers from a set. These integers determine which portions of a spreading code to invert. We are explicit and careful in our construction to ensure the necessary cryptographic one-way properties to prove authentication security.

Combinatorial watermarking and inverting chips present several design advantages. First, combinatorial watermarking functions are generalizable and flexible, allowing design transferability among navigation signals without materially affecting security and operation analysis. Second, combinatorial watermarking functions can be constructed to ensure uniform spreading code inversion and vast, exponentially growing watermark spaces. This feature results from the judicious use of randomized construction and the exponential growth of the number-of-combinations function. The uniformity aids in simplifying mathematical analysis, and exponential growth eases the application of standard authentication security (e.g., 128-bit security). Additionally, the expected degradation is deterministic, further aiding mathematical analysis when receiver observables are considered.

This work considers the observables available to a receiver to determine whether the watermark is present and the signal is authentic. We present methods to compute a convenient statistic to determine authenticity, including missed detection and false alert probabilities. From our receiver processing strategy, one can design an authentication scheme and a receiver to determine the authenticity of a watermarked signal within the scope of authenticity provided by watermarking signal authentication (e.g., limited by meaconing and other complex attacks).

In Section 2, we use cryptographic, combinatoric, and sampling arguments to ensure that (1) the selection of a watermark and inverted spreading code is uniform, (2) the selection of a watermark or inverted spreading code admits no efficient algorithms to predict future inverted spreading code selections, and (3) the space of available watermarks is sufficiently vast to deter spoofing. These properties lend to tractable mathematical analysis in later sections. In Section 3, we show how the adversarial success probability follows a hypergeometric distribution and discuss how to relate and compute the security level of receiver observables without signal power, noise, or sampling considerations. In Section 4, we propose a filter and prove security bounds on the receiver statistics under standard noise assumptions, allowing one to design a receiver to meet specific security levels. Lastly, in Section 5, we show experimental results from real signals to validate our models. Altogether, this work shows how to design a watermarking signal authentication scheme and receiver so that an adversary must exhaustively search or guess watermarks from a search space too vast for achievable computational resources to spoof the authentication receiver observables.

### 1.1 Signal Authentication Statistics

When determining the authenticity of a signal, a receiver must map relevant information from the raw intermediate-frequency radio data to a statistic that is compared against a threshold. For the statistic to be useful, the probabilities of missed detection and false alarm, given the selected threshold, should be small. Statistics utilized in the literature are generally related to correlations between the data and the watermarked and unwatermarked replicas, with security arguments based on modeled likelihoods.

For instance, Humphreys (2013) presented a detection statistic for hypotheses regarding the authenticity of a single chip for various security code estimation and replay (SCER)-capable adversaries. After modeling security chip estimation strategies, their methods estimate the probability that an adversary can correctly guess a modified chip within a watermark. Because our mathematical formulations provide a pathway to compute the probability density function of spoofing a watermark, our methods can also perform the same computation part-way through a watermark, given the successful estimation of chips at the beginning of the watermark. This capability lends a pathway for combining the methods of this work with SCER-adversary analysis. However, we reserve this for future work and exclude SCER-capable adversaries here (described further in Section 1.2).

In this work, we consider the information provided over an entire watermark instead of an individual chip. A commonly suggested statistic is derived from a matched filter with the correct security code sequence, for instance, as reported by Caparra and Curran (2018). In their work, Caparra and Curran (2018) proposed a hypothesis test on post-correlation data that presumes that the output is normally distributed in both hypotheses and used experimental data to identify those normal distributions. Then, they used the Kullback–Leibler divergence and Chernoff–Stein lemma bound to determine the efficacy of their tests. In this work, we will use a different (correlation-like) statistic that allows one to compute the output without assuming a normally distributed output or requiring experimental data to determine the center and spread of the output. The output distribution will appear to be normal because of central-limit-theorem effects. However, our statistic allows us to compute the output distributions directly, and the distributions are favorable for providing low probabilities of missed detection and false alarms. O'Driscoll et al. (2022) presented a combination of statistics that use the outputs of multiple matched filters that provide robustness when an adversary correctly observes chips and compensates by changing the spoofing signal power. In this work, we will use a matched filter based on the subtraction of watermarked and unwatermarked signals, whose output is related to the hypergeometric distribution.

### 1.2 Scheme Scope and Notation

In this work, we will repeatedly use *n*, *r*, and *s* to refer to the navigation scheme parameters. Our methods can easily apply to any navigation signal. Here, we divide the navigation spreading code into *n* sections. The provider selects *r* sections uniformly and pseudorandomly for a watermark to be inverted. An adversary may elect to spoof a signal with *s* chip sections inverted by a random guess or exhaustive search among the vast watermark space.

We discuss example theoretical signals based on the GPS coarse acquisition (C/A) and L1C signal designs. As suggested by O'Hanlon et al. (2022), we use *n* = 1023 and *r* = 52 for both GPS C/A and L1C (treating each L1C section as 10 chips), noting that *n* = 10230 and *r* = 520 is also possible for GPS L1C. Our selection of the inverted chip-section duty cycle comes from O'Hanlon et al. (2022) to provide a representative comparison to Chimera's selected duty cycle (Air Force Research Laboratory, 2019). We do not split the inversion duty cycle into fast and slow watermarks for simplicity, noting that it is possible to either (1) design a fast and slow authentication scheme to use the same watermark, as reported by Anderson et al. (2022a), or (2) construct the watermark to retain the mathematical modeling used in future sections. We will also show how to use data bandwidth to distribute watermarking commitments efficiently; thus, each spreading code sequence has its own watermark. For the C/A signal, each 1-ms spreading code has its own watermark, where *r* = 52 of the chips are flipped among the total *n* = 1023. For the L1C signal, the spreading code is divided into *n* = 1023 10-chip sections, of which *r* = 52 are flipped. The same methods can apply to an L1C signal in which *r* = 520 of the L1C chips are flipped among the *n* = 10230 or anything similar. Moreover, one could reserve a number of chips as never flipped (noting the harm to our security arguments) or apply this approach to other spreading code designs.

Certain attack strategies are outside the security scope of watermarked signal authentication. By using proper TESLA synchronization and an onboard clock, one can prevent longer replay attacks (Anderson et al., 2022b). SCER attacks are outside of the scope of this approach (Arizabaleta et al., 2019). Therefore, claims about provable authenticity must be qualified with these considerations.

Within the modeling equations, we make some simplifying assumptions of spoofed signals that also qualify the claims of this work. We constrain spoofed signals to those with constant power (i.e., the adversary is not electing to spoof with varying powers among the chip sequence). We note that this degree of freedom presents some advantage in the context of SCER-capable adversaries (O’Driscoll et al., 2022). Moreover, we assume that the receiver can securely place a lower bound on the signal-to-noise ratio in the presence of jamming and that jamming will behave as additive white Gaussian noise (AWGN). In this work, we note that the receiver will observe signals over a period of time in which non-AWGN jamming comes under central-limit-theorem effects. The authors are not aware of how varying the chip power or noise distribution would provide a more efficient algorithm for breaking the security and assumptions claimed in this work; however, we must defer to future work to make such extensions to the proofs herein.

## 2 SECURE WATERMARK SELECTION

We propose that the provider pseudorandomly selects a combination of integers to determine which spreading code sections to invert to create the watermarked spreading code. The selection of inverted spreading code must be uniform to ensure that no efficient algorithm can guess a watermark. Section 2.1 provides an unbiased combination selection function.

In the context of TESLA, the watermark is derived from the TESLA secret commitments. We must ensure that observation of the watermark lends no efficient algorithm for guessing TESLA commitments presumed secret and held by the provider for the delay disclosure time. Otherwise, we break the authentication afforded by TESLA. Moreover, we must ensure that observation of specific chip-section inversions lends no efficient algorithm for guessing other chip-section inversions. We achieve these properties by using a cryptographic one-way function to derive the pseudorandom integer function from Section 2.1. Section 2.2 provides a one-way random integer function based on HMAC.

In Sections 2.1 and 2.2, we construct our secure combinatorial watermarking function. A combinatorial watermarking function uses an input cryptographic commitment to compute a combination of integers that determines the spreading code inversions. A *secure* combinatorial watermarking function (1) yields uniform watermark and chip-section selections and (2) is cryptographically one-way. Figure 1 depicts an overview of how the properties of a secure combinatorial watermarking function work within the TESLA context to provide authentication.

### 2.1 Combination Selection

Let us suppose that we have *n* chip sections and desire to randomly select *r* of them to invert for composing a watermark. The provider could draw a random combination in several ways, assuming that there is access to a one-way random integer function (Section 2.2). A preliminary approach would be to repeatedly draw one-way random integers from 1 to *n* until *r* unique elements are drawn. While this approach would work, it can be computationally wasteful, especially in the context of a scheme in which many integers must be drawn. Floyd provides a better approach with a deterministic number of draws (Bentley, 1987). Floyd’s algorithm provides a way to select *r* unique elements with exactly *r* calls to a random integer function. For this, we provide pseudocode with Algorithm 1 and a simple proof in Appendix A.
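Floyd's selection can be sketched in Python as follows. This is an illustrative sketch, not Algorithm 1 itself; `rand_int` stands in for the one-way pseudorandom integer function of Section 2.2.

```python
import random

def floyd_sample(n, r, rand_int=random.randint):
    """Floyd's algorithm: select r unique integers from 1..n using
    exactly r calls to rand_int(lo, hi) (inclusive bounds)."""
    selected = set()
    for j in range(n - r + 1, n + 1):
        t = rand_int(1, j)          # one draw per selected element
        if t in selected:
            selected.add(j)         # collision: take j itself (still unbiased)
        else:
            selected.add(t)
    return selected
```

With `n = 1023` and `r = 52`, `floyd_sample` returns the 52 chip-section indices to invert, using exactly 52 draws.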

### 2.2 One-Way Pseudorandom Integers

Floyd’s algorithm discussed in Section 2.1 requires a random integer function. In Algorithm 1, we use an HMAC-based pseudorandom function with Algorithm 2. Other pseudorandom functions could provide the random integers required for Algorithm 1; however, Algorithm 2 has several desired properties.

One such property is that Algorithm 2 is a one-way pseudorandom function. A one-way function admits no efficient algorithm to determine the function pre-image to a function output. HMAC provides cryptographic security based on its one-way property. Therefore, provided that the integers are derived from HMAC, the integers also have the one-way property. Algorithm 2 admits no efficient algorithm to determine the input seed from the output integers.

An adversary can observe the watermark with a radio. Upon observing an inverted chip section in the spreading code, the adversary can deduce the integer derived from our watermarking function. By ensuring that the integer is derived from a one-way function, we limit how an adversary can predict future inverted chip sections in the remainder of the watermark and derive or guess the watermarking seed. This approach also allows us to derive the integers in Figure 1 from a one-way arrow.

Another property we desire is for the function to efficiently utilize computational resources. One naive way to derive the required integers is to produce an HMAC for each derived integer. An HMAC allows the pseudorandomness to be derived in parallel (as opposed to via repeated hash application). Unfortunately, there is no way to sample an arbitrary random integer with a deterministic runtime unless the range of integers is a power of 2. For an arbitrary random integer range, we must sample with rejection to ensure that there is no bias in the pseudorandom integer function output. For instance, the output of an HMAC cast to an integer can be rejected if it is outside the desired integer range, in favor of new HMACs, until the output is within the desired range. However, if the HMAC output is 256 bits and the desired range is considerably smaller (e.g., 1–1023), such a function would reject most outputs. To derive an integer within a given range, we should instead use only the smallest number of bits that delineates the range, i.e., enough bits to count up to the smallest power of 2 at least as large as the range. Therefore, it is desirable to sparingly use the output bits from HMAC so that multiple random integers can be derived from a single call to HMAC.

Lastly, we desire to deter pre-computation attacks against the function. We can achieve this deterrence by introducing salt into the computation of random integers. This salt can be the same salt used for the rest of TESLA; thus, no additional data bandwidth is required for its distribution.

Algorithm 2 has all of the above properties. Because the random bits used to generate the random integers are obtained from an HMAC derived from the inputs with Lines 3 and 10, we satisfy the one-way property. We cache the HMAC output to ensure a minimum number of HMAC calls. Line 6 ensures that the minimum required number of bits is used for generating each integer. We use a cache containing a counter and start bit to ensure non-periodicity with Lines 3 and 10. Salt is incorporated into the HMAC call to deter an adversary from generating a rainbow table of seeds to output integers. The recursive structure achieves sample rejection with a concise program.
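To make the rejection-sampling idea concrete, the following Python sketch derives unbiased integers from HMAC-SHA256. It simplifies Algorithm 2 (for instance, it does not carry the cache and start-bit bookkeeping across calls), and the function name and byte layout are illustrative assumptions.

```python
import hmac
import hashlib

def oneway_rand_int(seed: bytes, salt: bytes, lo: int, hi: int) -> int:
    """Unbiased integer in [lo, hi] derived from HMAC-SHA256 with
    rejection sampling, consuming only the minimum bits per candidate."""
    span = hi - lo + 1
    nbits = max(1, (span - 1).bit_length())   # smallest k with 2**k >= span
    counter = 0
    while True:
        digest = hmac.new(seed, salt + counter.to_bytes(4, "big"),
                          hashlib.sha256).digest()
        value = int.from_bytes(digest, "big")
        for _ in range(256 // nbits):         # consume the digest nbits at a time
            candidate = value & ((1 << nbits) - 1)
            if candidate < span:              # rejection step removes bias
                return lo + candidate
            value >>= nbits
        counter += 1                          # all candidates rejected: fresh HMAC
```

For the range 1–1023, each candidate consumes only 10 bits, so one 256-bit HMAC output typically serves many draws before a new HMAC is needed.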

## 3 HYPERGEOMETRIC SUBTRACTED MATCH

The requirements of Sections 2.1 and 2.2 and the construction of Algorithms 1 and 2 ensure a *secure* combinatorial watermarking function. Because a secure combinatorial watermarking function admits no efficient algorithm to predict a watermark, we can model adversarial attacks on the watermark as uniform guesses, simplifying our mathematical analysis. Moreover, the combinatorial nature of the watermark allows easy exponential expansion of the watermark space, enabling standard cryptographic security.

To determine the authenticity of a pseudorange measurement, the receiver records the signal raw radio data and notes the correlation between the watermarked signal and the original spreading code. Then, after revealing a commitment according to the TESLA schedule, the receiver recomputes the correlation with the watermarked spreading replica. An adversary may randomly guess a watermark to spoof a signal, which would be reflected in the relative correlation measurement. In Section 3.1, we show that the distribution of an adversary selecting correct chip-section inversions is hypergeometric. In Section 3.2, we relate the hypergeometric distribution to the receiver correlation observation without power, noise, or sampling considerations.

### 3.1 Hypergeometric Distribution of Guessing Chip Sections

To begin, we note that the adversary's ability to guess a watermark is limited to an exhaustive search or random guessing, as established in the previous sections. Let us suppose that we have a watermark with *n* chip sections, of which the provider will flip *r*. A chip section is a section of chips inverted together. For instance, a provider could take a 1023-chip sequence and flip 52 chips to make a watermark or a provider could take a 10230-chip sequence and flip 52 10-chip sections to make a watermark. The adversary may take the original spreading code replica, infuse its own guessed watermark, and broadcast an unauthentic watermarked signal. The adversary may elect to invert as many chip sections as they want, increasing the complexity of the scenario; here, we take *s* as the number of chip sections inverted by an adversary in an attack. There are $\binom{n}{r}$ watermarks for the provider to choose and $\binom{n}{s}$ watermarks for the adversary to choose, and their product is the total number of possible scenarios.

Now, let us suppose that the adversary guesses *h* chip sections correctly. Among the *r* authentic chip-section selections, there are $\binom{r}{h}$ ways for the adversary to guess *h* correctly and then $\binom{n-r}{s-h}$ ways for the adversary to guess the remaining incorrectly in the same attempt. Therefore, we arrive at Equation (1) for the distribution of correct adversarial chip-section guesses, the hypergeometric distribution. Hereafter, we denote the hypergeometric distribution with the scheme's parameters and adversarial strategy as $H(n, r, s)$ and a particular instance of $H$ as *h*. Moreover, we denote the probability density function of $H$ as $f_H$:

$$f_H(h; n, r, s) = \frac{\binom{r}{h}\binom{n-r}{s-h}}{\binom{n}{s}} \tag{1}$$
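Equation (1) is straightforward to evaluate numerically; a minimal Python sketch:

```python
from math import comb

def hypergeom_pmf(h, n, r, s):
    """Equation (1): probability that an adversary flipping s of n chip
    sections matches exactly h of the provider's r flipped sections."""
    if h < max(0, r + s - n) or h > min(r, s):
        return 0.0      # outside the hypergeometric support
    return comb(r, h) * comb(n - r, s - h) / comb(n, s)
```

As a sanity check, the probabilities sum to 1 over the support, and the mean number of correct guesses is *rs*/*n*.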

### 3.2 Authenticity with Basic Correlation Measurements

A receiver will use a correlator to determine authenticity. In this section, we start relating the hypergeometric distribution to the output of a correlator. As a starting point, this section derives probability bounds on the security of the output of a correlator *without* considering (1) signal power, (2) noise, (3) the receiver sampling rate, or (4) the carrier wave. We revisit these effects in Section 4. In this section, we assume single-chip sections, with one sample per chip, and we assume that the signal is simply the spreading code. These assumptions allow our derivations to be simpler for the moment so that this section can focus on combinatoric arguments.

To deduce a pseudorange measurement, a receiver correlates a replica of the spreading code with its baseband samples by using a matched filter (Haykin, 1988). Let $S_j \in \{-1, 1\}^n$ be the receiver-measured signal with no noise, where *n* is the number of *j*-indexed samples over an interval matching the *n* chips for a single watermark. For instance, a theoretical GPS C/A watermarked signal could flip *r* = 52 chips among the *n* = 1023 so that the loss of replica correlation is approximately 10%. *S* can be either authentic or spoofed, which we denote as $S^{\text{auth}}$ and $S^{\text{spoof}}$, respectively. Let $R_j$ and $R_j^w$ be the replica and watermarked replica, respectively, with the same *n*- and *j*-index convention. Let the match function be defined as follows, noting that the match function is equivalent to a convolution via a filter whose kernel is the reversed replica, denoted as $R_-$. We utilize the *valid* convolution without padding within a match. The valid convolution of a signal and the reversed replica of the same sample length is a scalar: the unnormalized correlation:

$$\operatorname{match}(S, R) = S * R_- = \sum_{j=1}^{n} S_j R_j$$
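Under this section's one-sample-per-chip simplification, the match function reduces to an inner product of ±1 sequences; a minimal sketch:

```python
def match(S, R):
    """Valid convolution of S with the reversed replica R_-: a single
    scalar, the unnormalized correlation of two equal-length sequences."""
    assert len(S) == len(R)
    return sum(sj * rj for sj, rj in zip(S, R))
```

Identical sequences yield *n*; each disagreeing sample reduces the match by 2.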

The receiver will use two replicas ($R$ and $R^w$), and the signal may be authentic or spoofed; therefore, there are four convolutions useful to this discussion. When *S* is authentic, $S^{\text{auth}} = R^w$ because only the provider knew $R^w$ at the time of broadcast. Each term of the convolution in Equation (2) aligns, yielding the total number of elements. For the terms of the convolution in Equation (3), *r* terms are misaligned. Because those *r* sum terms change from 1 to −1 for each misalignment, the difference is twice *r*:

$$\operatorname{match}(S^{\text{auth}}, R^w) = n \tag{2}$$

$$\operatorname{match}(S^{\text{auth}}, R) = n - 2r \tag{3}$$

In the spoofing case, the adversary may elect to flip *s* chips; thus, the convolution with the original replica is always *n* minus twice *s*, as in Equation (4). For the convolution with $R^w$, in the case where the adversary guesses every chip incorrectly, the difference results from the *r* watermarked chips *and* each of the *s* incorrect chips. Thus, in the worst case, the convolution is $n - 2r - 2s$. For each chip guessed correctly, the misalignment from one of the *r* provider chips and one of the *s* adversary chips simultaneously aligns, with the convolution increasing by 4. Therefore, if the adversary guesses *h* chips correctly among *s*, the convolution becomes Equation (5). *h* is the number of chips that the provider and adversary both select, which follows the hypergeometric distribution from Section 3.1 because the chip selection is uniform, as described in Sections 2.1 and 2.2:

$$\operatorname{match}(S^{\text{spoof}}, R) = n - 2s \tag{4}$$

$$\operatorname{match}(S^{\text{spoof}}, R^w) = n - 2r - 2s + 4h \tag{5}$$
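Equations (4) and (5) can be checked by direct simulation. The sketch below uses Python's `random` module as a stand-in for the secure watermarking function of Section 2 (sufficient for checking the combinatorics, not for security):

```python
import random

def match(S, R):
    # Unnormalized correlation of two equal-length +/-1 sequences.
    return sum(sj * rj for sj, rj in zip(S, R))

n, r, s = 1023, 52, 60
rng = random.Random(0)                        # stand-in randomness, not secure
code = [rng.choice((-1, 1)) for _ in range(n)]           # original replica R
provider = set(rng.sample(range(n), r))                  # provider's flips
adversary = set(rng.sample(range(n), s))                 # adversary's guesses

Rw = [-c if j in provider else c for j, c in enumerate(code)]
S_spoof = [-c if j in adversary else c for j, c in enumerate(code)]

h = len(provider & adversary)          # correctly guessed chip sections
assert match(S_spoof, code) == n - 2 * s                 # Equation (4)
assert match(S_spoof, Rw) == n - 2 * r - 2 * s + 4 * h   # Equation (5)
```

Each doubly flipped section re-aligns one provider term and one adversary term at once, which is why the match increases by 4 per correct guess.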

For our simplified case, we suggest the subtracted correlation statistic *X* of Equation (6), where the normalized match using $R$ is subtracted from the normalized match using $R^w$, to determine the authenticity of *S*:

$$X(S) = \frac{\operatorname{match}(S, R^w)}{n} - \frac{\operatorname{match}(S, R)}{n} \tag{6}$$

$$X(S^{\text{auth}}) = \frac{n}{n} - \frac{n - 2r}{n} = \frac{2r}{n} \tag{7}$$

Substituting Equations (4) and (5) into Equation (6), we derive how our statistic *X* relates to the hypergeometric distribution under spoofing conditions via *f* of Equation (8). In our notation, *X* above provides the mathematical formula function for the receiver (not the distribution). Hereafter, *x* denotes a particular instance of the statistic, and $\mathcal{X}$ denotes the distribution:

$$x = f(h; n, r, s) = \frac{(n - 2r - 2s + 4h) - (n - 2s)}{n} = \frac{4h - 2r}{n} \tag{8}$$

$$\mathcal{X} = f(H(n, r, s); n, r, s) \tag{9}$$

$$f_{\mathcal{X}}(x) = f_H\!\left(\frac{xn + 2r}{4};\, n, r, s\right) \tag{10}$$

Equation (8) provides an increasing and linear map from the hypergeometric domain to the corresponding output in this simplified filter scenario. In Equation (9), $\mathcal{X}$ is a function of the random variable $H$, and Equation (10) provides the PDF formula.
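The mapping of Equations (8)–(10) can be written directly; a Python sketch, where `x_of_h` is Equation (8) and `pdf_x` evaluates Equation (10):

```python
from math import comb

def hypergeom_pmf(h, n, r, s):
    # Equation (1)
    if h < max(0, r + s - n) or h > min(r, s):
        return 0.0
    return comb(r, h) * comb(n - r, s - h) / comb(n, s)

def x_of_h(h, n, r):
    # Equation (8): the adversary's s cancels in the subtraction.
    return (4 * h - 2 * r) / n

def pdf_x(x, n, r, s):
    # Equation (10): invert Equation (8), then evaluate the PDF of H.
    h = (x * n + 2 * r) / 4
    if abs(h - round(h)) > 1e-9:
        return 0.0                    # x is outside the discrete support
    return hypergeom_pmf(round(h), n, r, s)
```

Note that an adversary guessing all *r* sections (*h* = *r*) reaches `x_of_h(r, n, r) = 2r/n`, the authentic value of Equation (7).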

Let us suppose that we desire to enforce *l*-bit security on the watermark authenticity discrimination statistic *X*. In this case, based on the cryptographic properties of the functions used in the construction of the watermark, our decision based on *X* admits no efficient adversarial algorithm advantage beyond an exhaustive search over at least a 2^{l}-sized search space. Therefore, we desire to establish an upper bound of 2^{−l} for the probability of guessing a specific watermark that will spoof a statistic *X* on a receiver.

To demonstrate how a security level can be applied on *X*, we introduce a security parameter $b_h$ within the support of $H$, and its associated $b_x = f(b_h; n, r, s)$ within the support of $\mathcal{X}$, to enforce *l*-bit security. According to the TESLA schedule, the receiver will make an authentication determination based on a match increase *X* from using $R^w$ over $R$ after a watermark seed is released. Therefore, we desire *l*-bit security on an adversary guessing a watermarked spreading code that will sufficiently spoof a statistic for every attack strategy *s*. We must consider every *s* strategy, as an adversary can elect to flip as many chip sections as they wish:

$$\max_{s} \; P(\mathcal{X} \geq b_x) = \max_{s} \sum_{h = b_h}^{\min(r, s)} f_H(h; n, r, s) \leq 2^{-l} \tag{11}$$

The upper tail of the hypergeometric distribution of the form of Equation (11) allows us to compute the probability of an adversary sufficiently guessing a watermark to spoof a statistic based on the subtracted normalized match *X*. The problem of designing a watermarking signal authentication scheme is now the selection of signal parameters *n*, *r*, and $b_h$ to provide a desired security level *l*.

The hypergeometric cumulative distribution function does not have a convenient closed form, frustrating further simplification of Equation (11). However, it is possible to continue the analysis of a watermarking design computationally. We provide an example of a theoretical GPS C/A signal in Section 3.2.1.
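A sketch of this computational analysis: the upper tail of Equation (11) can be evaluated with a probability-mass recurrence (avoiding repeated large binomial evaluations), and the smallest secure threshold $b_h$ found by scanning. The strategy grid in the usage below is an illustrative subset of the possible *s* values, and the function names are our own:

```python
from math import comb

def upper_tail(b_h, n, r, s):
    """P(H >= b_h) for the hypergeometric H(n, r, s): the sum in Equation (11)."""
    lo, hi = max(0, r + s - n), min(r, s)
    if b_h > hi:
        return 0.0
    p = comb(r, lo) * comb(n - r, s - lo) / comb(n, s)   # pmf at h = lo
    total = p if lo >= b_h else 0.0
    for h in range(lo, hi):
        # Ratio recurrence between consecutive pmf values.
        p *= (r - h) * (s - h) / ((h + 1) * (n - r - s + h + 1))
        if h + 1 >= b_h:
            total += p
    return total

def smallest_secure_threshold(n, r, l_bits, strategies):
    """Smallest b_h whose worst-case upper tail over the given s
    strategies is at most 2**-l_bits (None if no b_h <= r suffices)."""
    for b_h in range(r + 1):
        if max(upper_tail(b_h, n, r, s) for s in strategies) <= 2.0 ** -l_bits:
            return b_h
    return None
```

For the C/A example (*n* = 1023, *r* = 52), scanning a grid of *s* values yields the $b_h$ needed for a target security level; the corresponding $b_x$ follows from Equation (8).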

#### 3.2.1 Example with GPS C/A

In this section, we provide an example analysis of a theoretical GPS C/A signal. In Section 5, we also provide an additional example for GPS L1C similar to that proposed for Chimera (Air Force Research Laboratory, 2019) that considers additional effects. Let us consider the GPS C/A signal with 1023 chips per millisecond. To achieve approximately 10% degradation of the signal, we flip 52 of the chips. Therefore, in the nomenclature herein, *n* = 1023 and *r* = 52. Because each flip inverts the autocorrelation power, the expected correlation of this watermarked signal will be $(n - 2r)/n \approx 0.90$ under the exclusionary noise and power assumptions from Section 3.2. The total number of watermarks is very large ($\binom{1023}{52} \approx 10^{88}$), and the selection of a combinatorial watermark admits no efficient algorithm beyond an exhaustive search. Now, we proceed to compute the necessary security bound to enforce security at a desired level, as shown in Figure 2.
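The quantities in this example follow directly from *n* and *r* and can be checked numerically:

```python
from math import comb, log10

n, r = 1023, 52
expected_corr = (n - 2 * r) / n     # each flipped chip inverts its sum term
num_watermarks = comb(n, r)         # size of the watermark search space
```

Here `expected_corr` is about 0.90 (a roughly 10% correlation loss), and `num_watermarks` is on the order of $10^{88}$, far beyond achievable exhaustive search.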

Figure 2 shows the upper-tail probability that an adversary could spoof the subtracted match statistic *X* before the disclosure of the delayed TESLA seed via Equation (11). The *x*-axis presents the subtracted normalized match *X*. The *y*-axis shows the upper-tail probability $P(\mathcal{X} \geq x)$, which is the sum from Equation (11), for a representative selection of *s* values. Figure 2 shows one tenth of the possible *s* values from 0 to 511, with some results colored to show trends. When an adversary elects to flip half of the chips, the signal degrades to a pure pseudorandom sequence, and the hypergeometric distribution centers at *x* = 0, providing a worst case. We note that we expect the receiver to lose track of the signal long before *s* approaches 511; however, this situation forms our worst case suitable for conservative analysis. If the adversary elects to flip all of the chips, then the signal might appear to have an inverted data stream or a phase change, forming a mirrored situation that is corrected in the tracking loop. One could compute probability bounds after first limiting *s* to a more reasonable value (i.e., before a receiver would lose tracking), but interestingly, we find that we can bound 32-bit security even in the *s* = 511 case. Moreover, this 32-bit security is achieved over a 1-ms watermark. We defer our analysis of longer integration times to later sections.

Many GNSS systems incorporate navigation message authentication with shorter HMACs (Anderson et al., 2023). The 32-bit security is satisfactory because of the short time for which the HMACs must secure the message between broadcast and the delayed disclosure of the TESLA secret. If 32-bit security is acceptable for navigation message authentication in these systems, then 32-bit security may also be acceptable for watermarks in these systems. Based on Figure 2, *X* ≥ 0.086 would meet the acceptable security level for *every* attack strategy *s* and assert authenticity to 32-bit security on *X*. However, depending on the assumptions of the signal processing capabilities, it may be possible to use less stringent bounds on *X* to assert authenticity. Figure 2 shows that the highest likelihood for spoofing *X* corresponds to the case in which the adversary submits a random sequence to the receiver.

## 4 RECEIVER OBSERVABLE SECURITY

In Section 3.2, our derivations did not include signal power, noise, or sampling considerations to allow the reader to understand the combinatoric derivation separately from effects induced by actual receivers. By utilizing the properties of the matched filter and convolution, we now include signal power, noise, and sampling considerations.

Our received signal, *S _{j}*, is now the signal immediately preceding the spreading-code correlators that form the tracking loop (i.e., after other filtering and amplification before correlation). *S _{j}* is composed of the broadcast replica *R _{j}* (including the carrier and spreading code) with power *P* and noise *N*, as denoted in Equation (12). In Section 5, we provide a method for determining *P* and *σ*^{2}. We model the noise as typical AWGN, specifying the noise power immediately preceding correlation as *σ*^{2}, which is also assumed constant over an authenticity determination. In the case of C/A and L1C signals, *σ*^{2} only includes in-phase channel noise:

*S _{j}* = √(*P*) · *R _{j}* + *N*,  *N* ~ 𝒩(0, *σ*^{2})    (12)

The receiver feeds *S _{j}* into a set of correlators to form a tracking loop. We assume that the receiver has already acquired the signal, and our security statistics will be obtained from a functioning tracking loop. To incorporate the cryptographic security described above, we invoke the properties of the matched filter. In our derivation, we utilize the *valid* convolution with no padding. The valid convolution of a signal and the reversed replica of the same sample length results in a scalar. The tracking loop already maximizes this scalar.

A naive approach might be to use the prompt correlator value from the tracking loop and another prompt correlator that is executed after receipt of the watermark seed. However, we suggest a different approach in which the receiver has a separate filter for authentication. The separate filter better accounts for noise considerations and efficient memory use, as discussed in Section 4.2. We suggest that the receiver utilize a filter whose kernel is the scaled, time-reversed replica. We note that the matched filter convolves the reversed replica to maximize the peak indicating correlation. Each *R* now contains the carrier wave (whether in-phase or quadrature) and spreading code information with the receiver sampling rate and a coherent integration time over a single spreading code (or more than one spreading code in later sections). We discuss how to aggregate over multiple watermarks in Section 4.1. By specifying the filter this way, we are agnostic to the filter sampling rate and to whether the samples evenly divide the chip sections. We include a gain *K* for mathematical conciseness later, arriving at statistic *Y*, shown in Figure 3.
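As a concrete toy sketch of this filter (illustrative parameters, not the paper's exact design: a unit-amplitude baseband replica, 5 samples per chip, and a *K* chosen so the authentic case centers near 1, whereas the paper's *K* in Equation (14) normalizes *Y* to the fraction of inverted spreading code):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy parameters (illustrative only): 1023 chip sections, 52 inverted,
# 5 samples per chip, unit-amplitude baseband replica.
n, r, spc = 1023, 52, 5
code = rng.choice([-1.0, 1.0], size=n)            # original spreading code
flips = rng.choice(n, size=r, replace=False)      # watermarked sections
wcode = code.copy()
wcode[flips] *= -1.0

R = np.repeat(code, spc)                          # tracking replica
Rw = np.repeat(wcode, spc)                        # watermarked replica R^w

P, sigma2 = 0.25, 1.0                             # assumed power and noise
S = np.sqrt(P) * Rw + rng.normal(0.0, np.sqrt(sigma2), R.size)

# The "valid" convolution of the stored samples with the reversed kernel
# is a single scalar: the correlation at the tracked code phase.
K = 1.0 / (np.sqrt(P) * R.size)
Y = float(K * np.convolve(S, Rw[::-1], mode="valid")[0])
```

With this normalization, an authentic received signal drives *Y* toward 1, while a non-watermarked signal correlated against *R ^{w}* falls short by roughly twice the inverted fraction.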

We feed the signal *S* through our modified matched filter to arrive at Equation (13). Because our filter is linear-time-invariant, we can consider each term separately:

13

First, we derive a suitable value for *K*. In Section 3.2, the statistic was derived without power, noise, or sampling considerations. We suggest that the *K* of Equation (14) be used to map the case *with* power, noise, and sampling considerations to the case *without* these considerations, to aid in mathematical conciseness when deriving security bounds. This approach ensures that the spoofing hypothesis distribution is a simple sum of (1) the hypergeometric distribution and (2) the normal noise distribution:

14

Equation (14) enforces that the output of *Y* is normalized to the fraction of inverted spreading code.

As for all of our hypotheses, *K* is a function of *P*. We provide suggestions on how to regress *P* and *σ*^{2} in Section 5. A different value of *K* can be used with the appropriate adjustments to the derivations below; however, our choice removes many constants from our derivations, allows for easier comparison among different scheme designs, and allows us to directly cite the derivations from Section 3.2 for mathematical conciseness.

Second, we propagate the noise in Equation (13). We compute the propagated noise variance via a Fourier transform, noting that the expectation of *N* is 0 and that the variance is invariant under the transform by Parseval's theorem. We derive Equation (15) as the variance of the zero-mean filtered noise:

Var[*K* (*N* * *R*_{−})] = *K*^{2} *σ*^{2} ||*R*||_{2}^{2}    (15)
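This propagation can be checked numerically. A minimal sketch, under our assumption that the variance of AWGN passed through a correlator kernel *K* · reversed(*R*) is *K*^{2}*σ*^{2}||*R*||_{2}^{2} (using a toy unit-amplitude replica, not real signal data):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1024                                   # samples in one replica/kernel
R = rng.choice([-1.0, 1.0], size=m)        # toy unit-amplitude replica
sigma2 = 2.0                               # pre-correlation noise power
K = 1.0 / m                                # any fixed gain works here

# Pass zero-mean AWGN through the correlator kernel K * reversed(R) many
# times; the output at zero lag is K * (noise . R), whose empirical
# variance should match K^2 * sigma2 * ||R||_2^2.
trials = 5000
noise = rng.normal(0.0, np.sqrt(sigma2), size=(trials, m))
y = K * (noise @ R)                        # valid correlation at zero lag
predicted = K**2 * sigma2 * float(np.sum(R**2))
empirical = float(y.var())
```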

Last, we consider two hypotheses: (1) the signal is authentic or (2) the signal is spoofed. In the authentic case, the received signal contains the watermarked replica, although the receiver does not know *R ^{w}* until later. This scenario translates to *R* = *R ^{w}* in Equation (13). We arrive at our authentic hypothesis model in Equation (16): a simple normal distribution. Following Section 3.2, *Y* is the formula for the receiver statistic, its distribution is either spoofed or authentic, and *y* is a particular measurement instance at the receiver:

16

In the spoofing case, the received signal does not match the watermarked replica. In Section 3.2, we derived the spoofed match statistic under the assumptions of one sample per single-chip section, no noise, and unitary power. Now, we recall our choice of *K* from Equation (14) and derive the transformation *g* to obtain Equation (18): a sum of the hypergeometric distribution and a simple normal distribution. The transformation *g* relies on the uniformity of chip-section inversions from our combinatorial watermark:

17

18

Equations (16) and (18) account for any signal power, noise, and sampling rate; however, to make this derivation more concrete, let us suppose that a receiver has a fixed sampling rate *F* for the signal *S* over a coherent integration time *T* that evenly divides the number of chip sections *n*, of which *r* are inverted. In this case, we have the following:

19

20

From Equations (19) and (20), we observe the trade-off that a receiver must make between authentication time, authentication uncertainty, and choice of hardware. Given our models, we now seek to find a decision boundary *b _{y}* with an acceptable probability of false alarms and missed detection, corresponding to Equations (21) and (22), respectively:

21

22

Equation (21) lends itself to a trivial false alert probability bound via the Gaussian cumulative distribution function. Equation (22) is more complicated and is discussed in Section 4.1.

### 4.1 Missed Detection and False Alert Probability Bounds

Now that we have identified the distributions of the two relevant hypotheses, we discuss how to compute missed detection and false alert probabilities, given a decision boundary on *Y*. We have a simple normal distribution for the authentic hypothesis, allowing us to trivially use the normal cumulative distribution function to compute the false alert probability. We suggest using convolution to compute the probability density function of the spoofed hypothesis.

Starting with Equation (20), we note that the structure is a simple sum. The probability density function of a sum of two independent random variables is the convolution of their probability density functions, as in Equation (23):

23
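This convolution can be sketched numerically. The parameters below are hypothetical stand-ins (1023 sections, 52 inverted, an adversary flipping *s* sections, an assumed filtered-noise standard deviation), and the *X* grid uses the illustrative normalization *X* = 2(2*k* − *r*)/*n* rather than the paper's exact Section 3.2 definition:

```python
import numpy as np
from math import comb

# Hypothetical parameters echoing the L1C example.
n, r, s = 1023, 52, 511
sigma_y = 0.02                             # assumed filtered-noise std

k = np.arange(r + 1)
p_k = np.array([comb(r, i) * comb(n - r, s - i) / comb(n, s) for i in k])
x_k = 2 * (2 * k - r) / n                  # hypergeometric part's support

step = (x_k[1] - x_k[0]) / 8               # up-sample 8x before convolving
grid = np.arange(x_k[0] - 5 * sigma_y, x_k[-1] + 5 * sigma_y, step)
pmf = np.zeros_like(grid)
pmf[np.searchsorted(grid, x_k)] = p_k      # place spikes on the fine grid

offsets = np.arange(-5 * sigma_y, 5 * sigma_y, step)
gauss = np.exp(-0.5 * (offsets / sigma_y) ** 2)
gauss /= gauss.sum()                       # discrete normal kernel, unit mass

spoof_pdf = np.convolve(pmf, gauss, mode="same")   # density of the sum
mass = float(spoof_pdf.sum())
mean = float((grid * spoof_pdf).sum())
```

Because the hypergeometric mass lives on only *r* + 1 points, the up-sample-then-convolve order keeps the computation small.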

We can extend this approach when observing multiple watermarks. Here, we suppose that *Y* now includes *H* individual watermarks. For instance, an L1C receiver with a spreading code length of 10 ms might perform 100 coherent integrations over 1 s with *H* = 100 watermarks. To account for this, we first note that, in our formulation, *K* already compensates within the extension of *R* so that the aggregate statistic remains centered at the same value. Individual *Y* statistics are independent because individual watermarks are cryptographically independent and the noise is assumed to be independent. The distribution of *Y* with *H* watermarks is simply the single-watermark distribution convolved with itself *H* times. In our notation, we indicate *H* repeated convolutions as (·)^{*H}:

24

In our derivation of Equation (24), we exploit the properties of convolution and deliberately separate the normal and hypergeometric parts to reduce computational costs. The repeated convolution of the normal probability density function can be performed analytically by taking the sum of identically distributed normal random variables, which allows us to discretize the domain in the last step. The repeated convolution of the hypergeometric probability density function can be conducted over the limited 0-to-*r* domain first and then up-sampled to the desired granularity for the final convolution with the repeatedly convolved normal distribution. The repeated convolution can also be performed through exponentiation by squaring and via the fast Fourier transform. Whereas this computation can be conducted offline once for a given system, as *H* becomes larger to accommodate non-ideal receivers, ignoring these computational considerations and techniques becomes costly. By implementing these techniques, we can compute the spoofed distribution with *H* = 10000 in less than 20 s using laptop hardware.
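The FFT-based repeated convolution mentioned here can be sketched as follows. The fair-coin pmf is a stand-in for the up-sampled hypergeometric pmf; the normal part would be handled analytically as 𝒩(*H μ*, *H σ*^{2}):

```python
import numpy as np

def convolve_power(pmf: np.ndarray, H: int) -> np.ndarray:
    """H-fold self-convolution of a discrete pmf via FFT exponentiation."""
    out_len = H * (len(pmf) - 1) + 1           # support of the H-fold sum
    size = 1 << (out_len - 1).bit_length()     # zero-pad to a power of two
    spectrum = np.fft.rfft(pmf, size) ** H     # exponentiation in frequency
    out = np.fft.irfft(spectrum, size)[:out_len]
    return np.clip(out, 0.0, None)             # clamp FFT round-off negatives

# Toy check: the 100-fold sum of a fair-coin pmf is Binomial(100, 0.5).
coin = np.array([0.5, 0.5])
dist = convolve_power(coin, 100)
```

Raising the spectrum to the *H*-th power replaces *H* − 1 explicit convolutions with a single transform pair, which is what makes large *H* (e.g., *H* = 10000) tractable.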

In Figure 4, we use Equations (19) and (24) to compute the probabilities of missed detection and false alarms given a boundary *b _{y}* on *Y*. The upper tail of the spoofed distribution is the probability of missed detection, and the lower tail of the authentic distribution is the probability of a false alarm. We consider an L1C signal in which 10230 chips are divided into 1023 10-chip sections, of which 52 are inverted per watermark (i.e., the *n*, *r* case), and the case in which 520 of the 10230 chips are inverted (i.e., the 10*n*, 10*r* case). Our default signal characteristics are based on the equipment and experimental results given in Section 5. These characteristics include *T* = 10 ms for the coherent integration time per watermark from the L1C spreading code, *F* = 25 MHz from the equipment, and the predicted pre-correlation signal-to-noise ratio from Section 5.

In Figure 4 (left), we observe the following trends. The blue lines show that increasing *F* decreases the probability of missed detection. The green lines indicate that increasing the integration time to include multiple watermarks dramatically decreases the probability of missed detection. This results from the independence among individual watermarks. From the purple lines, we see that increasing the scheme parameters decreases the probability of missed detection. Among these three trends, it appears that integrating over multiple watermarks is the most effective way to reduce the probabilities of missed detection and false alerts. From Figure 4 (right), we observe that we can use a longer integration time, on the order of seconds, to compensate for weaker signals.

### 4.2 Receiver Memory

Our suggestion of using a filter separate from the tracking loop allows for a more flexible receiver design and more efficient memory use. Based on Figure 4, we expect that lower-cost receivers may require longer integration times. However, the required memory can be reduced if the TESLA commitments are released more frequently. For instance, let us suppose that the TESLA commitments were released every 6 s, but that the receiver's sampling rate, operating conditions, and protection levels required observing *Y* statistics over a few minutes. After the distribution of the TESLA commitment at 6 s, the memory required for the raw radio data for those 6 s can be dumped in favor of the smaller number of integrated samples of *Y* for which *R ^{w}* − *R* ≡ 0. Therefore, the memory required (e.g., a ring buffer) for a receiver corresponds to *F* · *T*_{TESLA}, where *T*_{TESLA} is the maximum delay before a watermark's secret commitment is released according to the TESLA schedule. Moreover, one can achieve higher protection levels without substantially increasing receiver memory in low-cost receivers.
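As a back-of-the-envelope sketch of this sizing, assuming the 25-MHz, 16-bit in-phase and quadrature sampling described in Section 5 and the hypothetical 6-s worst-case commitment delay from the example above:

```python
# Ring-buffer sizing: raw radio data must be held only until the TESLA
# commitment covering the watermark is released.
F = 25_000_000           # samples per second (Section 5 equipment)
bytes_per_sample = 4     # 16-bit I + 16-bit Q
T_tesla = 6.0            # hypothetical worst-case commitment delay, seconds

ring_buffer_bytes = int(F * T_tesla * bytes_per_sample)
ring_buffer_mib = ring_buffer_bytes / 2**20
```

Roughly 600 MB of buffer suffices regardless of how long the receiver keeps accumulating integrated *Y* samples afterward, since those samples are far smaller than the raw data they replace.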

## 5 EXPERIMENTAL VALIDATION

Although rudimentary simulations of signal watermarks confirm our method's ability to predict the distributions of *Y*, we found an opportunity to demonstrate efficacy with actual data by modifying an existing software-defined radio (Bernabeu et al., 2022). We collected data with a Universal Software Radio Peripheral providing raw 16-bit in-phase and quadrature samples at 25 MHz. Figures 5 and 6 present a validation of our methods for GPS C/A data and L1C data, respectively.

Because no watermarked signals were broadcast at the time of writing, we tested our methods on a mirrored situation. In a real watermarking signal authentication scheme, the receiver correlator tracks a watermarked signal using the original spreading code. Instead, to simulate the authentic situation, we had the receiver correlate with a watermarked replica to track the real original signal. The net result is the negation of statistic *Y*. To simulate the spoofed situation, we designed the *Y* filter to use two watermarked replicas. The first replica was a random watermarking function application of the original spreading code. The second replica was an additional random watermarking function application of the first watermarked replica. Provided that both watermarks have the same number of flips (i.e., *r* = *s*), we have the inverted *Y* for the spoofing situation. In our results for both scenarios, we compensate for the mirrored situation by negating *K* within the receiver.

Our methods in Section 4.1 presume knowledge of *P* and *σ*^{2}, the pre-correlation signal power and noise, respectively. Receivers will not be sufficiently sensitive to make an authentication determination based on a single 1-ms or 10-ms integration; thus, the decision should incorporate a longer integration time. This consideration naturally leads to a simple way to regress an accurate *P* from individual integrations. We suggest dividing the mean prompt correlation by the expected gain from the correlator, as in Equation (25). In Equation (25), we take a watermarked prompt correlation and divide it by *R*_{−} * *R ^{w}* because the signal would be watermarked in a real scheme:

25

We found that regressing *σ*^{2} is more challenging, but ultimately, we backtracked the receiver-estimated carrier-to-noise ratio through the correlator using derivations from Falletti et al. (2011) with Equation (26):

26

In Equation (26), we take the estimated carrier-to-noise ratio back through the correlation by dividing by ||*R*||_{1}, correct for the equivalent bandwidth *B*_{eq}, and account for our pre-correlation *S* having been stripped of the quadrature components, leaving a signal-to-noise ratio that is combined with our regressed *P*. Alternative approaches could be more effective, including those that directly regress or bootstrap the ratio. We defer to future work to ensure that the regression is resistant (or provably immune) to manipulation by an adversary. However, we suspect that adversarial attacks will always increase *σ*^{2}, which will be appropriately accounted for in the receiver's confidence in authenticity.

In Figure 5, the receiver replica is watermarked by inverting 52 of the 1023 chips per millisecond. Each 1-ms spreading code received its own watermark. The statistic *Y* was computed for ten consecutive watermarks and individual coherent integrations in the converged tracking loop. We chose these parameters to match those of L1C and Figure 6, to demonstrate the efficacy of using convolution to predict the distribution. In Figure 6, the watermark is generated by dividing the L1C spreading code into 1023 10-chip sections, of which 52 were flipped every 10 ms.

In Figures 5 and 6, we validate our methods with real data. We observe that the spoof distribution closely resembles a normal distribution, an effect of both the concentration of the hypergeometric distribution and the central limit theorem. Although one could approximate both hypotheses with normal distributions in postprocessing, our methods predict the center and spread of the spoofed-condition distribution. Given an estimate of the signal power and noise, one can predict the spoofed distribution under the worst-case adversarial attack to compute security bounds on *Y* for real-time processing (e.g., Figure 4). Furthermore, we validate our method's ability to predict radio requirements.

## 6 CONCLUSIONS

In this work, we reintroduced combinatorial watermarking functions. Our methods present several advantages over the current state of the art, including a flexible and transferable security scheme with a mathematical pathway for proving security bounds. We showed that, given model assumptions on receiver noise, the receiver response to these watermarks can be mathematically modeled, and security bounds at typical security levels can be derived with this receiver processing strategy. Through our derivation, we identified the tools required to design a scheme and a corresponding radio capable of determining the authenticity of a signal. We validated our mathematical models with radio data collected via a software-defined radio.

## HOW TO CITE THIS ARTICLE

Anderson, J., Lo, S., & Walter, T. (2024). Authentication security of combinatorial watermarking for GNSS signal authentication. *NAVIGATION, 71*(3). https://doi.org/10.33012/navi.655

## APPENDIX A PROOF OF FLOYD’S ALGORITHM

For the reader's convenience, herein we provide a proof that Algorithm 1 selects a combination in which the probability of the selection of any element is the same, inspired by Bentley (1987) and Blatter (n.d.). To begin, let us suppose that we have *n* total elements from which we wish to draw a uniform combination of *r* elements. Within Algorithm 1, the for-loop iterates *j* from *n* − *r* + 1 to *n* (inclusive). Let *P _{j}*(¬*i*) be the probability that element *i* is not selected in the *j*-th for-loop iteration, and let Prob(¬*i*) be the probability that element *i* was never selected after the last iteration of the for-loop. Here, we seek to show that Prob(¬*i*) is the same for all *i*. To do this, we must split Prob(¬*i*) into two cases that correspond to whether the if-conditional within Algorithm 1 could activate: (A) 1 ≤ *i* ≤ *n* − *r* + 1 or (B) *n* − *r* + 1 < *i* ≤ *n*.

For Case A, we start with the very first draw, corresponding to *j* = *n* − *r* + 1, for which *P _{j}*(¬*i*) = (*n* − *r*)/(*n* − *r* + 1). Thereafter, *P _{j}*(¬*i*) = (*j* − 1)/*j*, because element *i* is selected only when the random integer function draws *i* itself. Therefore, we have the following:

Prob(¬*i*) = ∏_{*j*=*n*−*r*+1}^{*n*} (*j* − 1)/*j* = (*n* − *r*)/*n*

For Case B, *i* is not in the range of the random integer drawn until *j* = *i*. Therefore, *P _{j}*(¬*i*) = 1 ∀ *j* < *i*. On the *j* = *i* draw, following through the for-loop iteration, either the *i*-th element is selected via the random integer function or the *i*-th element is selected because the random integer function provided an element previously drawn. Among the *i* possible elements to draw, *i* − (*n* − *r* + 1) + 1 outcomes lead to drawing *i*, and *n* − *r* outcomes lead to not drawing *i*. Therefore, *P _{i}*(¬*i*) = (*n* − *r*)/*i*. Thereafter, as for Case A, *P _{j}*(¬*i*) = (*j* − 1)/*j*. Therefore, we have the following:

Prob(¬*i*) = (*n* − *r*)/*i* · ∏_{*j*=*i*+1}^{*n*} (*j* − 1)/*j* = (*n* − *r*)/*n*

Because both Case A and Case B yield the same Prob(¬*i*), each element *i* is selected uniformly among the combinations of *r* elements.
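A runnable sketch of Floyd's combination-sampling algorithm, as described by the proof above, together with an empirical check of the uniform per-element selection probability *r*/*n*:

```python
import random

def floyd_sample(n: int, r: int, rng: random.Random) -> set:
    """Draw an r-element combination of {1, ..., n} via Floyd's algorithm."""
    selected = set()
    for j in range(n - r + 1, n + 1):
        t = rng.randint(1, j)                 # the random integer function
        # The if-conditional: a repeated draw selects j instead of t.
        selected.add(j if t in selected else t)
    return selected

# Empirical check: each element should be selected about trials * r / n times.
rng = random.Random(42)
n, r, trials = 20, 5, 40000
counts = [0] * (n + 1)
for _ in range(trials):
    combo = floyd_sample(n, r, rng)
    assert len(combo) == r                    # always exactly r elements
    for i in combo:
        counts[i] += 1
```

Note that the loop runs only *r* times regardless of *n*, which is what makes Floyd's algorithm attractive for drawing a few inverted sections from a long spreading code.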

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.