Abstract
Jamming signals can jeopardize and ultimately prevent the effective operation of global navigation satellite system (GNSS) receivers. Given the ubiquity of these signals, jamming mitigation and localization techniques are of crucial importance, and these techniques can be enhanced with accurate jammer classification methods. Although data-driven models have proven useful for detecting jamming signals, training these models using crowdsourced data requires sharing private data and may therefore compromise user privacy. This article explores the use of federated learning to locally train jamming signal classifiers on each device, with model updates aggregated and averaged at a central server. This approach ensures user privacy during model training by removing the need for centralized data storage or access to clients’ local data. The personalized federated learning strategies employed in this study are also tested on non-independent and identically distributed data sets composed of spectrogram images from interfered GNSS signals. In addition, this article discusses the effect of model quantization, which is used to effectively reduce communication costs, as well as a fusion strategy for personalized federated learning schemes in which multiple classifiers are available.
1 INTRODUCTION
Global navigation satellite system (GNSS) jamming signals are L-band spectrum interferences that can overpower a GNSS receiver and prevent its effective operation (Amin et al., 2016; Morton et al., 2021). A wide variety of jammers can be found in the online market at cheap prices, which makes intentional, human-made jamming signals a threat to national security and safe navigation (Borio et al., 2016; Morales-Ferre et al., 2020). In addition, signals do not need to be malicious to have a jamming effect: even legitimate waveforms, including the continuous wave interferences produced by damaged electronics and the signals emitted by Distance Measurement Equipment technology conceived for aircraft navigation, can interfere with GNSS receivers (Li et al., 2019). Jamming sources are located on Earth or, in the case of drone jammers, near Earth’s surface. Because of path-loss attenuation in GNSS signals due to the large distance between Earth and GNSS satellites, jamming interferences are often received with remarkably higher power than the useful GNSS signal, which can lead to localized performance disruption over a radius of several kilometers (Mitch et al., 2011). Jamming has been suggested to be the main cause of GNSS-based service outages (Morales Ferre et al., 2019), making protection against this kind of attack a desirable feature in GNSS receivers (Dovis, 2015; Thombre et al., 2018).
One such form of protection involves jammer classification, which can enhance classical interference cancellation techniques. In general, interference cancellation techniques are formulated as an estimation problem where the jamming signal is detected and estimated, often with a parametric model (Borio & Closas, 2017). Because the aim of these techniques is to first reconstruct the interference, knowledge of the type or class of interference can accelerate the algorithm. For example, if the algorithm knows that a continuous wave interference is threatening a receiver, it would only need to estimate the interference’s central frequency in order to reconstruct its waveform and implement a cancellation measure. Furthermore, jamming classification techniques inherently involve detecting the interference. Most previous GNSS studies regarding protection against jamming interferences have focused on detecting (Arjoune et al., 2020), mitigating (Borio et al., 2018), and localizing the interference (Strizic et al., 2018), with little effort dedicated to the classification of jamming signals (Morales Ferre et al., 2019) until more recent publications (Chen et al., 2024; Mehr & Dovis, 2022). The notable exception is some previous work in the context of radar systems, such as the machine learning jamming prediction algorithm proposed by Lee et al. (2020).
Recent studies have focused more explicitly on potential approaches to classifying jamming signals. For example, Morales Ferre et al. (2019) proposed Support Vector Machine- and Convolutional Neural Network-based classifiers for the purpose of jammer classification, which they treated as an image classification problem. Their classifiers achieved nearly 99% accuracy at detecting a jamming incident and over 90% classification accuracy when differentiating among a variety of jamming types (namely those considered in this article) after being trained on a set of 600 images per class. According to Voigt (2021), the use of multivariate time-series approaches can also increase the accuracy of jammer classification techniques. More recent studies (Closas et al., 2024; Mehr & Dovis, 2024) in GNSS interference classification have prominently utilized machine learning techniques. For example, Mehr & Dovis (2022) demonstrated the effectiveness of Convolutional Neural Networks for jammer classification, while Chen et al. (2024) explored a compound neural network model. Residual Neural Networks have also been employed (Brieger et al., 2022; Zengyuan et al., 2023). Mehr et al. (2023) and van der Merwe et al. (2024) contributed classifiers based on other machine learning techniques, highlighting the growing popularity and success of these methods in GNSS interference classification.
Most studies of GNSS integrity rely on synthetic data because data collection in the presence of jamming signals is difficult. This difficulty is further compounded in studies that wish to use different interference types and received power values. However, some interference effects, such as the ones introduced by multipath reflections, can be difficult to recreate in synthetic data sets. Thus, despite the challenges of obtaining real GNSS interference data, the use of real data has the potential to significantly improve the training and assessment of data-driven classifiers. One option for collecting real GNSS data involves traditional crowdsourcing approaches, where clients record data and share it with a central unit that trains the classifier. Nevertheless, crowdsourcing raises concerns about user privacy because it requires that users send their data directly to a centralized server.
Aimed at addressing these concerns, Federated Learning (FL) has recently attracted great interest because it protects user privacy and efficiently uses resources by harnessing the processing power of edge devices (Niknam et al., 2020). FL is a promising solution that enables many clients to jointly train machine learning models while maintaining local data decentralization. Such collaboration between users in distributed scenarios has proven useful in GNSS interference management tasks (Jiang et al., 2024; Nicola et al., 2020). Instead of exchanging data and conducting centralized training, each party in an FL system sends its model to the server, which then updates a joint model and sends the global model back to the parties. Because the original user data is not exposed, FL effectively addresses privacy issues (McMahan et al., 2017).
FL has proven highly beneficial for jamming signal classification in GNSS applications, particularly for protecting the privacy of collaborating users and reducing the amount of data being exchanged. In the traditional crowdsourcing-based alternative, in which users send snapshot data to a server in charge of centrally training the classifier, intercepted data snapshots could be used to compute the position of the user. These snapshots therefore reveal confidential user information, which is generally undesirable for most users but especially problematic in contexts like military or other sensitive applications. In contrast, FL allows for model training without the need to share raw data, thereby preserving privacy. Additionally, the bandwidth required for such transmission in a crowdsourcing framework can be very large (as we will see from the typical data sets that are employed in training such models), making this approach impractical from a communication perspective. FL reduces this communication overhead by transmitting only model updates rather than large amounts of raw data to a central server. Finally, GNSS receivers are typically spread across various geographical locations, each experiencing different jamming events. Decentralized, FL-based methods leverage this geographic distribution to create more robust and generalized models. In other words, by learning from diverse environments, the models enhance their performance and robustness.
Despite the advantages of FL, one common problem is the challenge of non-independent and identically distributed (non-IID) data, which diminishes learning effectiveness. Non-IID data refers to the scenario where the data distribution across edge devices differs significantly, posing unique challenges for FL algorithms. Various approaches have been developed to address this issue. One such approach, called personalized FL, has garnered significant attention because it tailors models to each client’s local data distribution. There are several ways to achieve this customization, including local fine-tuning (Ben-David et al., 2010; Wang et al., 2019), meta-learning (Fallah et al., 2020; Jiang et al., 2019), transfer learning (Li & Wang, 2019), model mixture methods (Deng et al., 2020), and pair-wise collaboration methods (Huang et al., 2021).
Quantization is another crucial aspect of FL (Krishnamoorthi, 2018). Specifically, quantization enables efficient communication and reduces computational costs by representing model parameters with lower precision. By quantizing the weights or gradients, we significantly reduce the amount of data that needs to be transmitted during the aggregation process. This downsizing is especially beneficial in FL scenarios where communication resources are limited and bandwidth is a constraint (Lang & Shlezinger, 2022; Reisizadeh et al., 2020). Many quantization methods have been proposed, such as Uniform Quantization (Widrow et al., 1996), Non-uniform Quantization (Baskin et al., 2021), Stochastic Quantization (Damgaard & Hüffel, 1987), Vector Quantization (Gray, 1984) and Quantization-Aware Training (Jacob et al., 2018). In general, integrating quantization into FL algorithms can enhance scalability and privacy while maintaining reasonable model accuracy.
Here, we continue our preliminary (Wu et al., 2023) and more recent work (Deng et al., 2024) towards training jamming signal classifiers using privacy-preserving strategies that can cope with crowdsourcing-based data collection. In this extended study, we specifically investigate (i) the challenge of non-IID datasets, with the goal of developing solutions based on personalized FL strategies, and (ii) the impact of model quantization in the communication process. Our overall objective is to obtain a Neural Network-based global model capable of classifying different jamming signals, as shown in Figure 1. To preserve client privacy while leveraging crowdsourcing data collection strategies, we exploit FL approaches in which model parameters are shared with clients, thereby allowing local classification of jamming signals while avoiding data sharing (Figure 2).
Figure 1. System diagram of the jamming signal classifier considered herein. First, a receiver downloads a pre-trained model from the server, which can be either i) trained on locally available data and sent back to the server for fusion with other models; or ii) used to perform jamming classification on local data. Monochrome spectrogram images of the six signal classes available in the test data set from Morales Ferre et al. (2019) are shown: (a) clean signal (no interference), (b) Amplitude Modulated (AM), (c) chirp, (d) Frequency Modulated (FM), (e) Pulsed or Distance Measurement Equipment (DME), and (f) narrow band (NB) jammers.
Figure 2. Federated learning framework for training jamming signal classifiers. First, M collaborative clients receive the parameters of the classifier from a server. These clients retrain the model based on their local data and then upload their updated classifier to the server in charge of fusing the results. This process does not require the exchange of actual user data or positions, thus preserving their privacy.
In our proposed framework, we assume the possible existence of C different jamming types and perform our FL approach over a network with M collaborative users. We then study the FL-based jamming classifier under two different data distribution scenarios. In the first scenario, clients’ data is IID; that is, all clients observe a similar amount of interference instances from all C classes. In the second, non-IID scenario, clients observe data that is unbalanced towards different classes. Although working with non-IID data poses several challenges, non-IID data is common in real-world scenarios given that not all clients have access to all available types of data. In the context of this work, non-IID data arises when not all participating users observe the same classes of jamming signals. We therefore investigate different techniques for addressing the challenges of non-IID data in the context of jammer classification. Rather than employing a single global model distributed among clients, we use a framework where each client maintains a personalized model. This approach enables clients to more accurately represent their unique data distributions while simultaneously benefiting from the collective knowledge derived from other clients. Finally, we investigate the effects of quantization techniques on the transmission of parameters between clients and the server and demonstrate the influence of quantization bit depth on the performance of various FL algorithms.
The remainder of this paper is organized as follows. In Section 2, we describe the satellite signal model and targeted jammer types. Our FL technique is then derived in Section 3, and the experimental setup and results are described in Section 4. Finally, Section 5 concludes the paper.
2 SYSTEM MODEL
For the purpose of this article, we model the analog baseband equivalent of the received GNSS signal as
r(t) = s(t) + j(t) + w(t),    (1)
where s(t) contains the useful GNSS satellite signals and w(t) represents sources of randomness, such as thermal noise, which are typically modeled as an additive white Gaussian noise process. The term j(t) represents the signal waveform generated by a jamming source as measured at the receiver. Several waveforms are possible for j(t) depending on the type of jammer (Morales-Ferre et al., 2020). Accurate knowledge of j(t) allows for prompt reaction to a jamming threat, either for its localization (Nardin et al., 2023) or mitigation. For mitigation, interference cancellation techniques aim to estimate the waveform of j(t) so that it can be reconstructed and directly subtracted from r(t).
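To illustrate the cancellation idea described above, the following minimal sketch simulates a discrete-time version of the additive model with a continuous wave jammer, estimates the jammer from the peak of the discrete Fourier transform, and subtracts the reconstruction. All function names and parameter values are hypothetical, the GNSS signal is replaced by a weak random ±1 sequence, and the jammer frequency is assumed to fall exactly on a DFT bin; an off-grid tone would require a finer frequency estimate.

```python
import cmath
import math
import random


def simulate_received(n_samples, jam_bin, jam_amp=10.0, sig_amp=0.1,
                      noise_std=1.0, seed=0):
    """Return r[n] = s[n] + j[n] + w[n] for a toy complex baseband signal."""
    rng = random.Random(seed)
    received = []
    for n in range(n_samples):
        s = sig_amp * rng.choice((-1.0, 1.0))  # weak stand-in for the GNSS signal
        j = jam_amp * cmath.exp(2j * math.pi * jam_bin * n / n_samples)  # CW jammer
        w = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))  # AWGN
        received.append(s + j + w)
    return received


def cancel_cw(received):
    """Estimate a single CW tone from the DFT peak and subtract it."""
    n_samples = len(received)
    best_bin, best_coef = 0, 0j
    for k in range(n_samples):
        coef = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                   for n, x in enumerate(received)) / n_samples
        if abs(coef) > abs(best_coef):
            best_bin, best_coef = k, coef
    # Reconstruct the jammer from its estimated frequency and complex amplitude.
    j_hat = [best_coef * cmath.exp(2j * math.pi * best_bin * n / n_samples)
             for n in range(n_samples)]
    residual = [x - jh for x, jh in zip(received, j_hat)]
    return best_bin, residual


def mean_power(x):
    return sum(abs(v) ** 2 for v in x) / len(x)


r = simulate_received(n_samples=128, jam_bin=20)
estimated_bin, cleaned = cancel_cw(r)
print(estimated_bin, mean_power(r), mean_power(cleaned))
```

Knowing the jammer class in advance is what allows the estimator here to search over a single parameter (the tone frequency); a chirp or pulsed jammer would need a different parametric model.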
Jammers can be classified according to their characteristic features, including the type of device by which they are broadcast, their frequency spectrum, and their number of antennae (Borio et al., 2016). In this paper, we target the same jammer types as in Morales Ferre et al. (2019), given that we use their data set of jammer signals and benchmark our results against theirs. Overall, the aim of our research is to use the FL technique explained in Section 3 for the classification of the following jammer types:
Amplitude Modulated (AM);
Chirp;
Frequency Modulated (FM);
Pulsed or Distance Measurement Equipment (DME);
Narrow Band (NB) jammers; and
No interference.
As in Morales Ferre et al. (2019), we do not consider wideband jammers given that their presence is difficult to detect when analyzing spectrogram images. All five jammers considered here have narrow spectra that overpower the signal of interest, which becomes buried in noise. Note that our classification strategy, proposed in Section 3, simultaneously performs the task of detecting interference because the absence of interference can be properly identified. The waveform expressions j(t) for each of the five jammer types listed above can be found in Morales Ferre et al. (2019) but are not explicitly used for training or testing the FL solution proposed here.
Our classification strategy mostly relies on the behavior of these five jammer types in the frequency domain. While AM and FM jammers target pre-fixed frequencies, others, such as chirp jammers, sweep over different frequency bands. Feature extraction approaches based on spectral analysis of the signals, such as their spectrograms, are therefore suitable for distinguishing different jammer types. Specifically, the short-time Fourier transform allows for the time-frequency localization of the interference signal. Morales Ferre et al. (2019) successfully approached jammer classification as an image classification problem, where spectrograms of the received signal r(t) were treated as images. In their approach, the spectrograms are computed on the discrete-time version of r(t) in Equation (1), which, at an appropriate sampling rate fs = 1/Ts, would be modeled as r[n] = s[n] + j[n] + w[n], where t = nTs for
3 FEDERATED LEARNING METHODOLOGIES
Many different FL algorithms have been proposed for a range of applications (Li, Sahu, Talwalkar et al., 2020; Park et al., 2022; Wu et al., 2021), especially in the field of image classification. The de facto standard approach for FL is Federated Averaging (FedAvg) (McMahan et al., 2017), which fuses the model parameters through a weighted sum. According to previous studies (Hsu et al., 2019; X. Li et al., 2020), the learning effectiveness of standard FL methods is compromised when using non-IID data. In this section, we explore two FL strategies. The first leverages FL to develop a unique global model capable of making accurate predictions on data from various clients, whereas the second focuses on learning personalized models for individual clients with the goal of achieving higher accuracy on their local data sets.
Global FL
In the first strategy, we consider the setup depicted in Figure 2, where M collaborative clients train a global classification model (e.g., a neural network) such that:
where
The training process for this model can be formulated as the minimization of a loss function:
where ωg is the initial weight, or, if the model is trained recursively, the aggregated weight from the global model of the previous round.
Personalized FL
In personalized FL, each client maintains its own model, and model information (rather than data) is shared among clients so that each client benefits from the others’ local data without accessing it. The objective here is to learn multiple models, for which we can mathematically define the loss function:
where ω = [ω1,…, ωi,…, ωM]. When
Because our goal is to employ FL approaches for jamming classification, not to compare different FL methods, we choose several popular FL methods that have already shown promising results across diverse applications. In this section, we 1) discuss the practical problem of quantizing the shared parameters before transmitting them, which affects the performance of the models through the different FL strategies; and 2) introduce a fusion strategy to make a classification decision based on a set of personalized models, which results from a probabilistic interpretation of the local classifiers.
3.1 Quantization in Federated Learning
In FL schemes, the shared information needs to be quantized before being transmitted in order to reduce the communication bandwidth. Here, we briefly discuss quantization, which we then experimentally investigate in Section 4.
The quantization operation and dequantization formulas in an FL scheme are as follows:
where
Once the quantized information is received, it can be re-quantized for processing. For instance, in the FL scheme in Figure 3, the server re-quantizes the model parameters to some N′ > N bits before fusing them. This re-quantization can be achieved mathematically by:
where
In FL schemes, quantization of shared data happens during transmission from clients to the server and from the server back to clients.
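As a minimal sketch of the quantize-then-dequantize round trip applied to transmitted parameters, the code below uses a uniform min-max quantizer with N bits. This particular quantizer, along with the function names `quantize` and `dequantize`, is an assumption made for illustration and may differ from the exact formulas used in this article.

```python
def quantize(weights, n_bits):
    """Map floats to integer codes in [0, 2**n_bits - 1] (uniform, min-max)."""
    w_min, w_max = min(weights), max(weights)
    levels = 2 ** n_bits - 1
    # Guard against a constant vector, for which the range is zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Recover float values from integer codes (server- or client-side)."""
    return [c * scale + w_min for c in codes]


weights = [-0.8, -0.1, 0.0, 0.3, 0.75]
for n_bits in (8, 4):
    codes, scale, w_min = quantize(weights, n_bits)
    restored = dequantize(codes, scale, w_min)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(n_bits, codes, worst)
```

With this scheme, each parameter costs N bits on the link instead of 32, plus two floats per transmitted vector for the scale and offset, and the reconstruction error per parameter is bounded by half a quantization step.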
For our work, the model parameters are quantized only for communication purposes, namely when they are downloaded from the server to clients and uploaded from clients to the server. During the local training and server aggregation, the weights are re-quantized to float values. The details of this re-quantization are shown in Algorithm 1, where
General FL Algorithm
A generic FL algorithm operates as follows: given M clients and an initial model ω0, each client receives the current model from the server at each of the T iterations where the process is repeated. Clients then update their model parameters based on their local data and send the quantized model parameters back to the server. The server aggregates these local parameters into either a single global model (in the case of general or global learning) or into M personalized models (i.e., one for each client in the case of personalized learning). The aggregated parameters are then sent back to the respective clients for the next iteration of training. Detailed steps for this generic FL approach are outlined in Algorithm 1.
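The round structure described above can be sketched as follows for the global-model case. This is a toy illustration rather than Algorithm 1 itself: the model is a two-parameter linear regressor instead of a neural network, the fusion step is a sample-count-weighted average in the style of FedAvg, and the `compress` hook (identity by default) is a hypothetical placeholder for the quantization applied before upload.

```python
import random


def local_update(model, data, lr=0.1, epochs=5):
    """A few epochs of gradient descent on one client's data (toy linear model)."""
    w, b = model
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def aggregate(models, sizes):
    """Fuse client models by averaging parameters weighted by local sample count."""
    total = sum(sizes)
    n_params = len(models[0])
    return tuple(sum(m[p] * s for m, s in zip(models, sizes)) / total
                 for p in range(n_params))


def federated_training(client_data, init_model, rounds, compress=lambda m: m):
    """Repeat download, local update, upload, and fusion for `rounds` iterations."""
    global_model = init_model
    for _ in range(rounds):
        uploads = [compress(local_update(global_model, data))
                   for data in client_data]
        global_model = aggregate(uploads, [len(d) for d in client_data])
    return global_model


# Four clients observing noisy samples of the same line y = 2x - 1.
rng = random.Random(0)
clients = []
for _ in range(4):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(20)]
    clients.append([(x, 2.0 * x - 1.0 + rng.gauss(0.0, 0.01)) for x in xs])

model = federated_training(clients, init_model=(0.0, 0.0), rounds=50)
print(model)
```

In the personalized case, the server would instead return M models, one per client, and `aggregate` would be replaced by a client-specific fusion rule.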
3.2 Fusion Strategy used in Federated Learning
Some personalized models, like FedAMP, can only learn from data whose distribution is similar to that of their local training data. This requirement creates a challenge when there is no prior knowledge of which model to use for new data points. In these cases, a fusion scheme is needed to combine the M available models and generate a reasonable estimate (Wu et al., 2024).
In a scheme similar to FedAMP, the i ∈ {1, …, M} classifier (i.e., each of the personalized models) provides a categorical posterior distribution for c, the jammer class, which can take any of the j ∈ {1, …, L} labels based on the current data y and the corresponding model
where pj|i denotes the probability of the j-th label given the i-th classifier, and [c = j] is an indicator function that returns 1 if c = j and 0 otherwise. The a priori class probability p(c) is categorical and defined by the probabilities pj|0, which, in the equiprobable case, result in pj|0 = 1/L, ∀j. The optimal fusion rule is provided by the joint posterior distribution, which Pastor et al. (2021) showed is proportional to:
Here, the M different models are conditionally independent given the class c. The resulting joint distribution is categorical, from which the maximum a posteriori probability can be readily obtained to predict the class c.
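A minimal sketch of such a fusion rule is given below. It assumes that the joint posterior takes the form p(c = j | y) ∝ pj|0^(1−M) ∏i pj|i, which follows from Bayes' rule when the M classifiers are conditionally independent given the class; the function name `fuse_posteriors` and the small constant guarding against log(0) are illustrative choices, not part of the original formulation.

```python
import math


def fuse_posteriors(posteriors, prior=None):
    """Fuse M categorical posteriors over L classes into one joint posterior.

    posteriors: list of M lists, each holding L class probabilities.
    prior: list of L prior probabilities (equiprobable if omitted).
    """
    n_models = len(posteriors)
    n_classes = len(posteriors[0])
    if prior is None:
        prior = [1.0 / n_classes] * n_classes
    eps = 1e-12  # avoids log(0) when a classifier outputs a hard zero
    # Work in the log domain to keep products of small numbers stable.
    log_post = [
        (1 - n_models) * math.log(prior[j] + eps)
        + sum(math.log(p[j] + eps) for p in posteriors)
        for j in range(n_classes)
    ]
    peak = max(log_post)
    unnormalized = [math.exp(v - peak) for v in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two hypothetical personalized classifiers scoring the same spectrogram.
fused = fuse_posteriors([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
predicted_class = max(range(len(fused)), key=lambda j: fused[j])
print(fused, predicted_class)
```

The maximum a posteriori decision is simply the index of the largest fused probability, so the normalization step matters only if calibrated probabilities are needed downstream.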
4 EXPERIMENTS
This section presents a series of experiments in which we show the applicability of FL to train, in a distributed manner, a jammer classifier. Our classifier performs comparably to a classifier trained on a centralized node with access to all local data sets. In the following sections, we describe the data set used for our experiments, how this data set is employed in a distributed learning scheme, how the model was configured, and the results obtained from our experiments.
4.1 Data Pre-processing
We used the data set provided by Morales Ferre et al. (2019), which is available open-access at https://zenodo.org/record/3370934. This data set contains 61800 .bmp monochrome spectrogram images with 512 × 512 pixel resolution, binary scale, and 600 DPI. The spectrograms were computed from simulated GNSS signals affected by interference from the aforementioned jammer types (see Section 2). Morales Ferre et al. (2019) used 6000 images for training (1000 for each jammer class), 1800 images for validation, and the remaining 54000 images for testing.
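To give a sense of how a binary spectrogram image of this kind can arise, the sketch below computes a short-time Fourier transform of a simulated chirp jammer in noise and thresholds the magnitudes. It is not the procedure used to generate the data set: the window length, hop size, threshold, and signal parameters are all hypothetical, and the GNSS signal itself is omitted because it lies well below the noise floor.

```python
import cmath
import math
import random


def chirp_plus_noise(n_samples, f_start=0.05, f_stop=0.45, jam_amp=10.0, seed=0):
    """Linear chirp (frequencies in cycles per sample) plus complex white noise."""
    rng = random.Random(seed)
    rate = (f_stop - f_start) / n_samples  # frequency increase per sample
    out = []
    for n in range(n_samples):
        phase = 2 * math.pi * (f_start * n + 0.5 * rate * n * n)
        noise = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        out.append(jam_amp * cmath.exp(1j * phase) + noise)
    return out


def spectrogram(x, win_len=64, hop=32):
    """Magnitude STFT with a Hann window; rows are time frames, columns are bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    twiddle = [cmath.exp(-2j * math.pi * k / win_len) for k in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)]
        frames.append([
            abs(sum(seg[n] * twiddle[(k * n) % win_len] for n in range(win_len)))
            for k in range(win_len)
        ])
    return frames


def binarize(frames, threshold_ratio=0.5):
    """Turn magnitudes into a 0/1 image relative to the global peak."""
    peak = max(max(row) for row in frames)
    return [[1 if v >= threshold_ratio * peak else 0 for v in row] for row in frames]


signal = chirp_plus_noise(1024)
spec = spectrogram(signal)
image = binarize(spec)
peak_bins = [row.index(max(row)) for row in spec]
print(peak_bins)
```

The peak frequency bin drifts upward from frame to frame, which is the diagonal stripe that distinguishes a chirp from the horizontal lines produced by fixed-frequency jammers.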
To optimize computational resources and expedite the training process, we pre-processed the data following an approach typical in machine learning contexts. First, we used both the training and validation data sets, as the validation step is often omitted from the experimentation process unless performing hyperparameter tuning. The combined data set, which includes both training and validation data, was divided into 75% for training and 25% for testing. To further enhance the training process, image resolution was reduced from 512 × 512 to 256 × 256 pixels using bilinear interpolation. Finally, once all the data were pre-processed, the pixel values were normalized to the range [–1, 1] using a mean of 0.5 and a standard deviation of 0.5 to facilitate training.
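These pre-processing steps can be sketched in a few lines. The code below is illustrative only: the function names are hypothetical, images are represented as nested lists with pixel values in [0, 1], and the downsampling averages 2 × 2 blocks, which coincides with bilinear interpolation sampled at block centers when the scale factor is exactly two. The article does not state which library implementation was actually used.

```python
def downsample_by_two(image):
    """Halve each image dimension by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image), len(image[0])
    return [
        [(image[r][c] + image[r][c + 1]
          + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


def normalize(image, mean=0.5, std=0.5):
    """Map pixel values from [0, 1] to [-1, 1] via (v - mean) / std."""
    return [[(v - mean) / std for v in row] for row in image]


def train_test_counts(n_images, train_fraction=0.75):
    """Number of training and test images for a given split fraction."""
    n_train = int(n_images * train_fraction)
    return n_train, n_images - n_train


toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
]
small = normalize(downsample_by_two(toy_image))
print(small)
print(train_test_counts(6000 + 1800))
```

Applied to the 7800 combined training and validation images, the 75%/25% split yields 5850 training and 1950 test images.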
4.2 Federated Data Setting
We investigated two different data settings. First, in the IID setting, all clients received similar data distributions (i.e., a similar number of samples from each jammer class). For these experiments, we uniformly split the data into groups of 20, 30, and 40 clients to examine how client numbers may influence the results. This split resulted in approximately 65, 43, and 32 samples per client for the 20-, 30-, and 40-client scenarios, respectively.
Second, we considered a non-IID setting in which the distribution of training samples from different jammer classes was unbalanced across clients. To generate non-IID splits of the data set, we followed the approach by T. Li, Sahu, Talwalkar, & Smith (2020), in which client data is sampled using a Dirichlet distribution. In brief, we defined the number of clients and classes, and then, for a given client i, we defined the probability of sampling data from each label j ∈ {1, …, C} as the vector (pi,1, …, pi,C) ~ Dir(β), where Dir(⋅) denotes the Dirichlet distribution and β = (βi,1, …, βi,C)⊤ is the concentration vector parameter. The concentration parameter β is used to generate proportions for distributing each class’s data points among the clients. These proportions then determine how many data points of that class each client will receive. By adjusting β, we control the level of skewness in the data distribution, with smaller β values leading to more uneven, non-IID distributions.
The advantage of this approach is that the imbalance level can be flexibly changed by adjusting βi,j. For our analysis, we set the concentration parameter βi,j to a relatively small value of 0.1, thereby creating a more unbalanced partitioning. This imbalance is evident in the distribution of data points among clients, as many client data sets only contain a subset of the six labels. For example, Figure 4 shows the number of samples per class for each client when M = 20 clients, and some clients contain a disproportionately large or small percentage of certain class labels.
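A minimal sketch of this kind of Dirichlet-based partitioning is shown below, under the assumption that, for each class, a symmetric Dirichlet draw with concentration β gives the proportions of that class assigned to each client. The function names are hypothetical, and the Dirichlet sample is obtained by normalizing Gamma variates from the Python standard library.

```python
import random


def dirichlet_sample(beta, size, rng):
    """Draw one symmetric Dirichlet sample by normalizing Gamma variates."""
    draws = [rng.gammavariate(beta, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # extremely unlikely numerical underflow
        return [1.0 / size] * size
    return [d / total for d in draws]


def partition_non_iid(labels, n_clients, beta=0.1, seed=0):
    """Return one list of sample indices per client."""
    rng = random.Random(seed)
    clients = [[] for _ in range(n_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        proportions = dirichlet_sample(beta, n_clients, rng)
        start, cumulative = 0, 0.0
        for k, p in enumerate(proportions):
            cumulative += p
            if k == n_clients - 1:
                end = len(idx)  # last client takes whatever remains
            else:
                end = min(len(idx), round(cumulative * len(idx)))
            end = max(end, start)
            clients[k].extend(idx[start:end])
            start = end
    return clients


# Toy data set: six classes with 100 samples each, split across 20 clients.
labels = [c for c in range(6) for _ in range(100)]
parts = partition_non_iid(labels, n_clients=20)
for k, part in enumerate(parts[:3]):
    counts = [sum(labels[i] == c for i in part) for c in range(6)]
    print(k, counts)
```

With β = 0.1, most of the probability mass for a given class typically falls on a handful of clients, so many clients end up with only a subset of the six labels; increasing β makes the split progressively closer to IID.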
Figure 4. Number of data points per class for each of the M = 20 clients.
4.3 Model Setting
Morales Ferre et al. (2019) employed a convolutional neural network (CNN) to train a classifier based on their full data set D. Their solution serves as the benchmark for our results, which rely on the same CNN architecture to train a classifier using the FL framework described earlier. This CNN consisted of one convolutional layer, one pooling layer, and one fully connected layer with a ReLU activation function. The convolutional layer used 16 filters of size 12×12×1, and the network was trained with a learning rate of 0.01 and a stochastic gradient descent optimizer (Ruder, 2016). The last layer was a softmax layer that produced the classification results, and cross-entropy was used as the cost function.
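For reference, the softmax output and cross-entropy cost mentioned above can be written in a few lines. This is a generic, numerically stable sketch for a single sample with six output classes; the logit values are made up, and the code is not tied to any particular deep learning framework.

```python
import math


def softmax(logits):
    """Convert raw network outputs into class probabilities."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability of the true class (log-sum-exp form)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[true_class]


# Hypothetical logits for the six classes of one spectrogram.
logits = [0.2, 3.1, -1.0, 0.5, 0.0, -0.3]
print(softmax(logits))
print(cross_entropy(logits, true_class=1))
```

The loss is smallest when the logit of the true class dominates the others, and a network with no preference among the six classes incurs a loss of log 6.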
4.4 IID and Non-IID Experiments
Figure 5(a) shows the accuracy of federated averaging algorithms over 400 communication rounds in an IID data setting. The final accuracy of the centrally trained model (approximately 93.4%) was used as a benchmark for subsequent tests. This figure also compares the accuracy achieved with different numbers of clients M. As expected, better results were achieved when a small number of clients were used. With a fixed amount of data, fewer clients means that each client has access to a larger share of the data, thus enabling better training of their local models. Nevertheless, high accuracy was achieved for all tested numbers of clients.
Figure 5. Example of FedAvg in 400 rounds under IID data setting. (a) Accuracy. (b) Confusion matrix of FedAvg for M = 20 clients.
The corresponding confusion matrix in Figure 5(b) reveals that each jammer class is identified with relatively high accuracy, with the DME jammer type and the clean signal (“NoJam”) achieving the highest accuracies (over 99%). The classifier is therefore able to accurately detect the absence of interference, as the interference-free spectrogram in Figure 1(a) differs notably from the others. This unique spectrogram arises because the spectrum of a clean signal contains the signal of interest buried in Gaussian noise, which pollutes the whole spectrogram. On the other hand, because jamming signals are received with dramatically higher power than the satellite signal of interest, the noise w(t) cannot be observed in spectrograms (b)-(f) from Figure 1. In contrast, the SingleFM and NB jammer types were classified with less than 90% accuracy. The results from Figure 1 suggest that the classifier struggled to distinguish between SingleAM and SingleFM interference, which both span only one or two narrow bands of the signal spectrum. Indeed, the SingleFM spectrogram is equivalent to the SingleAM spectrogram with an additional band. The classifier also struggled to distinguish between the NB and SingleChirp interferences, both of which have a lower magnitude in their spectra due to being more spread. This spread makes their respective spectrogram images look blurry relative to spectrograms from SingleAM and SingleFM interference.
Figure 6(a) illustrates the corresponding accuracy of the FedAvg algorithm for different numbers of clients in a non-IID data setting, again compared to the accuracy of the centrally trained model as a benchmark. The results show that the accuracies for different client numbers are lower than the corresponding accuracies in the homogeneous IID data setting, indicating the increased difficulty of learning with heterogeneous data. Consistent with the IID data setting, increasing the number of clients reduced the algorithm’s overall accuracy. Moreover, with 40 clients, the algorithm took more communication rounds to converge than when smaller numbers of clients were considered.
Figure 6. Example results from FedAvg across 400 rounds with a non-IID data set. (a) Accuracy for different numbers of clients. (b) Confusion matrix for M = 20 clients. (c) Confusion matrix for M = 40 clients.
Figures 6(b) and 6(c) show the corresponding confusion matrices for 20 and 40 clients under the non-IID, Dirichlet-distributed data setting. As in the IID data setting, the DME jammer type and clean signal were the easiest to classify, with accuracies of 100% and 97.48% for M = 20 and 95.79% and 98.45% for M = 40 clients, respectively. For M = 40, classification accuracies were low for the NB and SingleAM jammer types, and for M = 20, the worst accuracy was achieved with the SingleFM jammer type. As in Figure 5(b), inspecting the non-diagonal elements reveals that the classifier specifically struggled to distinguish between SingleAM and SingleFM interference and between NB and SingleChirp interference. For M = 40, where performance was already worse due to the higher number of clients (implying less local data), the classifier also struggled to distinguish between NB and DME signals. Nevertheless, even with a high number of clients (i.e., M = 40), the classifier achieved accuracies above 80% with the DME, clean signal, SingleChirp, and SingleFM jammer types. For a lower number of clients (i.e., M = 20), all jammer types could be classified with an accuracy above 80%.
As a final remark, the results presented in this section are comparable to those obtained with the benchmark training process: the centralized classification algorithm proposed by Morales Ferre et al. (2019). In their results, classification accuracy was also highest for the DME (or pulsed) interference and the clean signal. Their confusion matrices likewise showed that their classifier struggled to distinguish SingleAM from SingleFM interferences and NB from SingleChirp interferences. Moreover, our obtained accuracies for M = 20 when classifying the DME and NB types exceed the accuracies achieved by the benchmark neural network. Our proposed FL framework therefore allows us to obtain results comparable to those from state-of-the-art centralized classification algorithms while also preserving user data privacy and security.
4.5 Comparison of FL Algorithms
Although the above FL algorithm achieves results comparable to the benchmark, it yields low accuracy when applied to non-IID datasets. In this section, we compare four different FL algorithms to assess their performance in non-IID data settings. This evaluation encompasses several metrics. First, we consider accuracy (Acc), which was discussed earlier, and we also employ the macro F1 score (F1) to account for variations in sample and class distribution across different clients. Furthermore, we evaluate the classification accuracy for new data points that lack prior information from any specific client. The corresponding metrics for these data points are the server accuracy (S-Acc) and the server F1 score (S-F1). Of the four FL algorithms we consider, FedAvg, FedProx, and Ditto learn a global model as part of training. FedAMP, on the other hand, uses only personalized models, so a fusion strategy is required to classify such new data points. Moreover, we assess how the number of quantization bits affects the performance of these algorithms. We report results for the 8-bit and 4-bit settings only, because 16-bit encoding differs only marginally from the original 32-bit configuration, whereas the 2-bit scheme yields unsatisfactory results.
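To illustrate why the macro F1 score complements accuracy when class distributions are uneven, the sketch below computes both metrics for a hypothetical client whose local data are dominated by one class. The labels and predictions are invented for illustration, and the helper names are not from the article.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced client: nine clean samples and one NB sample,
# with a degenerate classifier that always predicts "Clean".
y_true = ["Clean"] * 9 + ["NB"]
y_pred = ["Clean"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Here the accuracy is 0.9 even though the NB class is never detected, whereas the macro F1 score falls below 0.5 because every class contributes equally to the average.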
The results for the various FL algorithms are presented in Table 1. In the IID data setting (bottom rows), personalized methods such as FedAMP do not demonstrate a significant advantage over centralized methods such as FedAvg and FedProx, which consistently achieve superior outcomes. Even so, the performance differences among all algorithms are relatively minor. In the non-IID data setting, however, personalized FL algorithms clearly outperform centralized algorithms with respect to accuracy. Notably, FedAMP consistently delivers the highest accuracy across all data configurations, maintaining superior performance even with 4-bit quantization, where its accuracy exceeds 70%. In contrast, the other approaches generally yield accuracies at or below 50% at 4-bit quantization.
Table 1. Results of different FL algorithms. “Acc” denotes accuracy, “F1” the macro F1 score, “S-Acc” the server accuracy, and “S-F1” the server F1 score. The number following each abbreviation indicates the number of quantization bits (e.g., “Acc-8” for accuracy with 8 bits).
Ditto also performs well: in the unquantized and 8-bit cases, its performance is similar to that of FedAMP. In the 4-bit case, however, Ditto no longer matches the high accuracy achieved by FedAMP. Even so, Ditto usually achieves a higher server accuracy and server F1 score than FedAMP. This discrepancy arises because Ditto learns a global model alongside the personalized models, which makes it more robust on data that do not belong to any specific client. In contrast, FedAMP relies solely on personalized models, and its server accuracy is derived from the fusion of those personalized models.
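As a rough illustration of how a server-side decision can be derived from several personalized classifiers, the sketch below averages the class-probability vectors produced by each client model (soft voting). This is only one possible fusion rule, shown here as an assumption; it is not necessarily the fusion strategy used in this article, and the probability values are hypothetical.

```python
def fuse_probabilities(client_probs, weights=None):
    """Weighted average of per-client class-probability vectors (soft voting).

    client_probs: one probability vector per personalized model.
    weights: optional non-negative client weights; uniform if omitted.
    """
    n_clients = len(client_probs)
    if weights is None:
        weights = [1.0] * n_clients
    total = sum(weights)
    n_classes = len(client_probs[0])
    fused = [0.0] * n_classes
    for w, probs in zip(weights, client_probs):
        for k, p in enumerate(probs):
            fused[k] += (w / total) * p
    return fused


def predict(fused, labels):
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=lambda k: fused[k])
    return labels[best]


# Hypothetical outputs of three personalized models for one new spectrogram.
labels = ["Clean", "NB", "SingleChirp"]
client_probs = [
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.1, 0.5, 0.4],
]

fused = fuse_probabilities(client_probs)
decision = predict(fused, labels)
```

In this toy case, one model prefers SingleChirp, but the fused decision is NB because the averaged NB probability is highest. Non-uniform weights (for example, proportional to local dataset size) are another option under the same sketch.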
With respect to server accuracy and the server F1 score, centralized FL (i.e., FedAvg) is the strongest approach. As shown in Table 1, FedAvg consistently achieves top-tier server accuracy and server F1 scores across the quantization scenarios considered. Conversely, FedProx does not perform as well under our experimental settings, which suggests that further hyperparameter optimization may be required. Such optimization falls outside the scope of this study.
Finally, our analysis reveals that quantization significantly influences the training process: lower bit precision results in poorer outcomes. Nevertheless, the results at 8 bits remain robust, especially within personalized FL frameworks. This finding suggests that future FL systems can reduce communication costs through moderate quantization without a large loss in accuracy.
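The trade-off between bit width and precision can be illustrated with a simple uniform quantizer applied to a model update. This is a minimal sketch under stated assumptions: the quantizer (uniform levels between the minimum and maximum value), the synthetic update vector, and the function names are illustrative and may differ from the quantization scheme used in the experiments.

```python
import math


def quantize(values, bits):
    """Uniformly quantize floats to 2**bits levels over [min, max].

    Returns the dequantized values, i.e., what the server would reconstruct.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        return list(values)
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]


def max_abs_error(values, bits):
    """Largest absolute reconstruction error after b-bit quantization."""
    dequantized = quantize(values, bits)
    return max(abs(a - b) for a, b in zip(values, dequantized))


def compression_ratio(bits, full_bits=32):
    """Payload reduction relative to full precision (ignores the small
    overhead of sending the range endpoints)."""
    return full_bits / bits


# Synthetic stand-in for a model update (hypothetical values).
update = [0.05 * math.sin(i) for i in range(200)]

err8 = max_abs_error(update, 8)
err4 = max_abs_error(update, 4)
err2 = max_abs_error(update, 2)
```

With this quantizer, the reconstruction error is bounded by half the step size, which roughly doubles for every bit removed, while `compression_ratio(8)` and `compression_ratio(4)` give payload reductions of 4x and 8x relative to 32-bit values.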
5 CONCLUSION
This paper demonstrates the efficacy of selected FL algorithms in the context of GNSS jamming classification. These algorithms would allow the successful implementation of a crowdsourcing scheme in which real data are gathered without compromising user privacy. We provide results of spectrogram image classification for simulated GNSS signals affected by six different jammer types. Under certain FL configurations, classification accuracies are high for all the studied jammer types, though DME and clean signals were consistently classified with the highest accuracies (above 99%). Conversely, the classifier could struggle to distinguish between the SingleAM and SingleFM jammer types and between the NB and SingleChirp jammer types. Nevertheless, the FL framework proposed herein performed favorably relative to the benchmark centralized classification algorithm in Morales Ferre et al. (2019), showing that it is possible to work in a collaborative scenario that protects user privacy without causing performance to drop. Our experimental results specifically showed that i) it is more difficult to learn from non-IID data than from IID data; ii) for a fixed total number of data points, having more clients, each with fewer data points, decreases classifier performance; iii) personalized FL algorithms are more effective at handling non-IID data; and iv) reducing the number of quantization bits can lower communication costs while still maintaining good performance. Future research in this area will include collecting real-world data to investigate various practical non-IID scenarios. These scenarios include cases in which clients are located at varying distances from jammer sources or in diverse environments where signal characteristics differ even for the same jammer type. Finally, we aim to explore and develop various FL algorithms to enhance classification efficiency while maintaining performance and privacy (Wu, 2024). These algorithms could involve the use of different quantization methods (Almanifi et al., 2023) and differential privacy techniques (Yin et al., 2021).
HOW TO CITE THIS ARTICLE
Wu, P., Calatrava, H., Imbiriba, T., & Closas, P. (2025). Federated learning of jamming classifiers: From global to personalized models. NAVIGATION, 72(1). https://doi.org/10.33012/navi.688
Footnotes
Funding Information
This research was supported by the National Science Foundation under Awards ECCS-1845833 and CCF-2326559.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.