Abstract
This tutorial presents the factor graph, a recently introduced estimation framework that is a generalization of the Kalman filter. An approach for constructing a factor graph, with its associated optimization problem and efficient sparse linear algebra formulation, is described. A comparison with Kalman filters is presented, together with examples of the generality of factor graphs. A brief survey of previous applications of factor graphs to navigation problems is also presented. Source code for the extended Kalman filter comparison and for generating the graphs in this paper is available at https://github.com/cntaylor/factorGraph2DsatelliteExample.
1 INTRODUCTION
In navigation, the capability to integrate data from multiple sensors is an essential element. If the dynamics of a system are known (i.e., can be stochastically modeled), then measurements over time can be integrated together to estimate the state of the system. For decades, the Kalman filter family—including the linear, extended, unscented, and many other variants—has been the workhorse of sensor fusion for navigation. Of particular note is the linear Kalman filter, which is the optimal maximum likelihood estimator when (a) measurements and dynamics are linear and (b) all noise sources are known, Gaussian, and white (Maybeck, 1990). Unfortunately, most practical systems do not meet these requirements, thereby explaining the large number of Kalman filter variants in use in the community.
With the introduction of significantly more complex (e.g., nonlinear) sensor and dynamic models, a new method for fusing sensor data has recently been introduced: the factor graph. Since its introduction into the field of robotics localization (Dellaert & Kaess, 2006; Thrun & Montemerlo, 2006), the factor graph has been applied to many different navigation systems, leading to improvements in sensor fusion performance across a broad range of navigation problems. In the robotics community, much of the work with factor graphs has focused on solving the simultaneous localization and mapping (SLAM) problem, which is particularly nonlinear. In addition to effectively solving highly nonlinear problems, factor graphs have also been shown to work well for rejecting outliers, optimizing on nonlinear manifolds such as SO3 (rotation matrices), and finding maximum likelihood estimates with non-Gaussian probability distributions. Of particular note to the navigation community, the first two Smartphone Decimeter Challenges sponsored by Google were both won by systems utilizing a factor graph formulation, demonstrating the ability of a factor-graph-based technique to handle nonlinearities and highly corrupted measurements with a reasonable amount of computational overhead (Suzuki, 2021, 2022). Furthermore, there has been a significant increase in the number of papers in this journal utilizing factor graphs, with at least 13 papers in the 2020–2023 time frame.
Given the increased interest in and applicability of factor graphs, we believe it is beneficial to the community to introduce the basic theory behind factor-graph-based optimization, compare it with extended Kalman filter (EKF)-based systems, and review some applications of factor graphs to navigation problems. The objective of this paper is to introduce the reader to how factor graphs can be utilized to solve navigation-based estimation problems. While references are included to many seminal papers in the field, we make no claim to comprehensively review the factor graph literature (i.e., this is a tutorial, not a comprehensive review).
While previous tutorials have been written about factor graphs, many of these tutorials have focused on the general factor graph representation (Loeliger, 2004). Note that at its heart, the factor graph is a general tool in the broader field of probabilistic graphical models with applications in biological systems, wireless networks, video semantics/indexing, etc. (Koller & Friedman, 2009). However, while this general framework is theoretically important, the primary impact to the navigation community arises from factor graphs that leverage the significant computational advantages available when combined with sparse matrix linear algebra. Furthermore, the problems that can be efficiently solved with sparse matrix computation are also addressed by Kalman filter variants. Therefore, these types of factor graphs are the focus of this tutorial.
There have also been previous tutorials on factor graphs that can be efficiently solved using sparse matrix computation, but these tutorials have been more focused on (a) explaining the mechanics of efficient factor graph computation (Dellaert & Kaess, 2017), (b) a specific factor graph implementation library (Dellaert, 2012), (c) a specific aspect (robustness) of factor graphs for navigation (Das et al., 2021), or (d) providing a literature review as opposed to a tutorial (Xiwei et al., 2022). We believe that this is the first tutorial focused on readers with prior experience in performing estimation using Kalman filter variants, filling a need of the navigation community.1
The remainder of this paper is organized as follows. Section 2 (a) introduces the notation that will be used throughout this paper, (b) describes the basic system model that is used in most Kalman filter estimation problems and which we will utilize extensively in this tutorial, and (c) introduces the graphical notation for a factor graph. In Section 3, we introduce the basic factor graph optimization approach used for Kalman-filter-type systems. Section 4 examines how factor graph optimization can be performed efficiently. Section 5 compares the factor graph optimization approach with a Kalman filter, ending with a discussion of how the factor graph can handle very general systems. Section 6 briefly reviews some of the important prior work that has applied factor graphs to navigation problems. Section 7 concludes the paper.
2 NOTATION AND DEFINITIONS
2.1 Mathematical Notation
In this paper, we will use the following mathematical notation: lowercase letters (e.g., xk, zk, uk) denote vectors, with the subscript k indexing the time step; uppercase letters (e.g., F, H, L, Q, R) denote matrices; a hat (e.g., x̂k) denotes the current estimate of a quantity, and Δxk denotes a perturbation of that estimate; ‖r‖²_{Σ⁻¹} = r⊤Σ⁻¹r denotes the squared Mahalanobis norm of a residual r weighted by an inverse covariance; and calligraphic letters (e.g., ℱ, 𝒳) denote sets.
2.2 System Definition
The primary focus of Kalman filtering and a significant portion of this paper will be to estimate the state of a discrete dynamic system that can be expressed as follows:
$$x_{k+1} = f(x_k, u_k) + v_k, \qquad z_k = h(x_k) + \eta_k, \qquad x_0 = \gamma + \epsilon \tag{1}$$
where xk is the state of the dynamic system at time step k and f is the discrete function that maps the system state at time k to time k + 1 using the inputs at time k (uk). The function h(x) takes a system state and generates a measurement, whereas F and H are the partial derivatives of the f and h functions with respect to (w.r.t.) xk. There is also a prior for time step 0, expressed as γ. The vectors v, η, and ε are white noise terms defined as follows:
$$v_k \sim \mathcal{N}(0, Q), \qquad \eta_k \sim \mathcal{N}(0, R), \qquad \epsilon \sim \mathcal{N}(0, P_0) \tag{2}$$
2.3 Factor Graph
A factor graph is a bipartite graph consisting of two types of nodes: hidden variables and factors—edges in the graph only exist between hidden variable and factor nodes. Hidden variable nodes represent the variables being estimated and can only connect with factor nodes. Similarly, factor nodes, which represent functions of the hidden variables, can only connect with hidden variable nodes (i.e., factors cannot be functions of other factors). In navigation, factors typically represent measurements or dynamics. Graphically, hidden variables will be represented by open circles and factors by solid (filled-in) rectangles. The set of all factors in the factor graph is denoted as ℱ, the set of all hidden nodes is indicated as 𝒳, and the set of neighbors to a node is denoted 𝒩(·), where "·" represents the node (Frey et al., 1997).
Each factor in a factor graph has three items associated with it:
A function of the hidden variables. Note that the edges connecting a factor node to hidden variables indicate which hidden variables form the inputs to the function.
A measured value for that factor.
A probability distribution function (PDF) returning the likelihood of the measured value given the current values of the hidden variables.
Therefore, a factor graph is a representation of the probabilistic relationship between measurements that have been received and state values (i.e., hidden variables) that are being estimated given the received measurements.
As an example of the three items associated with each factor in a factor graph, consider an angle-of-arrival sensor for a known landmark. Given the current location of the sensor (a hidden variable), there is a function describing what the angle of arrival should be. When the sensor receives a measurement, a factor in the graph is created with the measured value. This node will be linked to the state or hidden variable associated with that measurement (the location of the sensor). Finally, the uncertainty of the sensor is used to create a PDF relating the measured value and the predicted value from the function of the hidden variables. The factor node includes the measurement-prediction function, the measured value, and the PDF.
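To make these three items concrete in code, the following minimal Python sketch (our illustration, not taken from any particular factor graph library) bundles a prediction function, a measured value, and a Gaussian PDF into one object; the landmark position, measured angle, and noise level are made-up values for the angle-of-arrival example.

```python
import numpy as np

class Factor:
    """A factor: a function of hidden variables, a measured value, and a
    PDF (here Gaussian, encoded by a measurement covariance)."""
    def __init__(self, h, z, cov, var_ids):
        self.h = h                     # maps hidden variables -> predicted measurement
        self.z = np.atleast_1d(z)      # measured value
        self.cov = np.atleast_2d(cov)  # covariance defining the Gaussian PDF
        self.var_ids = var_ids         # hidden variables this factor connects to

    def neg_log_likelihood(self, values):
        """Negative log-likelihood (up to a constant) of the measured value
        given the current values of the connected hidden variables."""
        r = self.z - self.h(*[values[i] for i in self.var_ids])
        return 0.5 * r @ np.linalg.solve(self.cov, r)

# Angle-of-arrival factor for a known landmark (illustrative numbers):
landmark = np.array([10.0, 5.0])
def predicted_angle(pos):              # what the angle of arrival should be
    d = landmark - pos
    return np.array([np.arctan2(d[1], d[0])])

factor = Factor(predicted_angle, z=0.45, cov=np.deg2rad(1.0)**2,
                var_ids=["sensor"])
print(factor.neg_log_likelihood({"sensor": np.array([0.0, 0.0])}))
```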
An example factor graph for the dynamic system described in Equation (1) is shown in Figure 1. Note that the hidden variables correspond to the state estimate for different time steps, whereas the factors represent the measurements, dynamics, and priors.2 Moreover, although the factor graph in Figure 1 represents a dynamic system similar to what estimators in the Kalman filter family are designed to address, the factor graph formulation is generic and can be used to represent many different types of estimation scenarios.
3 BASIC FACTOR GRAPH OPTIMIZATION
To understand how a graphical structure like the example in Figure 1 can be used to perform data fusion, it is important to understand that there is an equivalence between three different representations of the factor graph: the graphical representation, the (sparse) matrix representation, and the optimization representation. These equivalences and their importance are illustrated in Figure 2 and described in the following two subsections.
3.1 Equivalence Between Graph and Optimization Representations
Each factor in a factor graph implies some information about the system. The goal of a maximum likelihood optimization procedure is to find the values of the hidden variables that maximize the probability of those values. The edges in a factor graph always connect a factor (information about the system) to the hidden variables that affect that factor, leading to the following expression for the maximum likelihood optimization problem:
$$\hat{\mathcal{X}} = \arg\max_{\mathcal{X}} \prod_{\phi \in \mathcal{F}} p\big(\phi \mid \mathcal{N}(\phi)\big) \tag{3}$$
This equation indicates that we desire the set of all estimated hidden node values that maximizes the product of the individual probabilities of the factor node values, given all of the values of their neighboring nodes (i.e., hidden variables). By taking the negative logarithm of Equation (3), this maximization routine turns into a minimization of a summation of terms. Furthermore, if all factor terms are Gaussian (as in the system described in Equation (1)), Equation (3) transforms into a weighted least-squares optimization procedure. More specifically, for the example graph in Figure 1, we obtain the following optimization problem:
$$\hat{X} = \arg\min_{x_0, \ldots, x_{N-1}} \left\| x_0 - \gamma \right\|^2_{P_0^{-1}} + \sum_{k=0}^{N-2} \left\| x_{k+1} - f(x_k, u_k) \right\|^2_{Q^{-1}} + \sum_{k=0}^{N-1} \left\| z_k - h(x_k) \right\|^2_{R^{-1}} \tag{4}$$
where the matrix subscripts define a weighting for those terms.
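As a sanity check on Equation (4), the short Python sketch below (our own; f, h, and the covariance matrices are assumed to come from the system model of Equation (1)) evaluates the weighted least-squares objective for a candidate trajectory:

```python
import numpy as np

def msq(r, cov):
    """Squared Mahalanobis norm of residual r, weighted by the inverse of cov."""
    return r @ np.linalg.solve(cov, r)

def total_cost(X, f, h, z, u, gamma, Q, R, P0):
    """Negative log-likelihood of Equation (4) for states X = [x_0..x_{N-1}]."""
    c = msq(X[0] - gamma, P0)                   # prior factor
    for k in range(len(X) - 1):                 # dynamics factors
        c += msq(X[k + 1] - f(X[k], u[k]), Q)
    for k in range(len(X)):                     # measurement factors
        c += msq(z[k] - h(X[k]), R)
    return c
```

Minimizing this scalar over all states simultaneously is exactly the maximum likelihood problem that the remainder of this section solves efficiently.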
While converting the factor graph to its equivalent optimization explains at a high level how to find the maximum likelihood variables as a weighted least-squares algorithm, the true computational savings are realized by analyzing the sparse matrix representation of this problem.
3.2 Sparse Matrix Representation
Given a factor graph, an equivalent adjacency matrix (L) for a bipartite graph can be developed as follows: Each hidden variable in the graph corresponds to a (block) column in the matrix, each (block) row corresponds to a factor in the graph, and an entry in the matrix is zero unless there is an edge between the corresponding factor and hidden variable. For example, in Figure 2, the top N − 1 rows correspond to the N − 1 factors that represent dynamics between hidden variables, the next N rows represent the measurements (zk) corresponding to each time step, and the last row represents the prior factor. Note that the ordering of the rows and columns is arbitrary.3 For the first dynamics row (d0), the only nonzero entries correspond to the hidden variables x0 and x1, representing the dynamic model from x0 to x1. Similarly, each row corresponding to zk has a nonzero entry only in the column for xk, as there is only one edge from each factor. The row for the prior has a nonzero entry only in the column corresponding to x0.
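The block pattern just described can be assembled directly with a sparse matrix library. The sketch below is our own, under the convention that each block row holds the Jacobian of its factor's function (so dynamics rows carry [−Fk, I] and measurement rows carry Hk), and builds L for the chain graph of Figure 1 using scipy.sparse:

```python
import numpy as np
import scipy.sparse as sp

def build_L(N, n, F_blocks, H_blocks):
    """Assemble the sparse matrix L for the chain graph of Figure 1.
    F_blocks[k] and H_blocks[k] are the (already weighted) Jacobians at the
    current linearization point; N is the number of time steps, n the state
    dimension."""
    rows = []
    for k in range(N - 1):                 # N-1 dynamics factors: touch x_k, x_{k+1}
        blocks = [None] * N
        blocks[k] = sp.csr_matrix(-F_blocks[k])
        blocks[k + 1] = sp.eye(n, format="csr")
        rows.append(blocks)
    for k in range(N):                     # N measurement factors: touch only x_k
        blocks = [None] * N
        blocks[k] = sp.csr_matrix(H_blocks[k])
        rows.append(blocks)
    blocks = [None] * N                    # prior factor: touches only x_0
    blocks[0] = sp.eye(n, format="csr")
    rows.append(blocks)
    return sp.bmat(rows, format="csr")
```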
In addition to being equivalent to the factor graph representation, the sparse matrix L is also key to efficiently solving a weighted least-squares optimization problem. To understand this relationship, we first replace each term in the optimization problem with a first-order Taylor series expansion. As an example, the measurement term ‖zk − h(xk)‖²_{R⁻¹} is replaced by the following:
$$\left\| z_k - h(\hat{x}_k) - H_k\, \Delta x_k \right\|^2_{R^{-1}} \tag{5}$$
where Hk = ∂h/∂x evaluated at x̂k, and x̂k is the current estimate for the hidden variable xk.
To find the minimum of the optimization problem, the derivative of the optimization equation is computed and set to zero. Because this expression is a summation, the overall derivative is a summation of the individual derivatives. For example, the derivative of the term in Equation (5) w.r.t. Δxk is as follows:
$$-2\, H_k^\top R^{-1} \left( z_k - h(\hat{x}_k) - H_k\, \Delta x_k \right) \tag{6}$$
This term goes to zero when the following holds:
$$H_k^\top R^{-1} H_k\, \Delta x_k = H_k^\top R^{-1} y_k \tag{7}$$
where yk = zk − h(x̂k) is referred to as the residual.
This equation can be further simplified by finding a “square root” matrix of the weighting matrix R−1 as follows:4
$$R^{-1} = W^\top W \tag{8}$$
This leads to the traditional normal equation problem:
$$\left( W H_k \right)^\top \left( W H_k \right) \Delta x_k = \left( W H_k \right)^\top \left( W y_k \right) \tag{9}$$
To solve the complete optimization problem, a normal equation of the following form is created:
$$L^\top L\, \Delta X = L^\top y \tag{10}$$
where y is a stacked vector of all of the individual weighted residual vectors. To create the L matrix, the weighted residual matrices are placed in the corresponding (block) row for the weighted residual vector and the correct (block) column for the hidden variable(s) for which the derivative was performed. The matrix multiplication between L⊤ and y will sum the individual derivatives to form the derivative for the complete optimization problem. Put another way, the L matrix used to solve the optimization problem can also be viewed as an adjacency matrix in which the weights for each edge correspond to the weighted derivative of that function (factor) w.r.t. the hidden variable.
Note that because L is a linearization based on the current estimate of the hidden variables, as the linearization point changes, L is updated, requiring that the normal equations be repeatedly solved. When L⊤y = 0, then a minimum (derivative is zero) has been found. In this fashion, the nonlinear weighted least-squares problem described by the factor graph and its equivalent optimization problem are solved.
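One way to organize this iteration in Python is sketched below; build_L_and_y is a hypothetical helper that assembles the weighted sparse Jacobian and stacked residual at the current estimate (e.g., using the assembly sketch above):

```python
import numpy as np
import scipy.sparse.linalg as spla

def gauss_newton(X0, build_L_and_y, tol=1e-8, max_iters=20):
    """Repeatedly solve the normal equations L^T L dX = L^T y, updating the
    linearization point until the gradient L^T y is (numerically) zero.
    X0 is the stacked initial estimate as a 1-D NumPy array."""
    X = X0.copy()
    for _ in range(max_iters):
        L, y = build_L_and_y(X)      # hypothetical assembly routine
        g = L.T @ y                  # gradient of the weighted least squares
        if np.linalg.norm(g) < tol:
            break                    # minimum found: derivative is zero
        dX = spla.spsolve((L.T @ L).tocsc(), g)
        X = X + dX                   # Cartesian update; see Section 3.3 for manifolds
    return X
```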
Perhaps the most important fact about this matrix L is that it is a sparse matrix. Using widely available sparse matrix libraries (Agarwal et al., 2023; “Eigen: A C++ template library for linear algebra”, 2023; Kümmerle et al., 2011; “SuiteSparse, A Suite of Sparse Matrix Packages”, 2023; Virtanen et al., 2020) enables efficient solving of the optimization procedure. This is part of the reason why the factor graph formulation has proven to be so popular. For the example system shown in Figure 1, the computational requirements are a linear function of the number of hidden variables, enabling very large optimization procedures to be solved with reasonable computational time requirements (see Section 4).
3.3 Notes on Factor Graph Optimization
For the three equivalent representations described above, there are several miscellaneous facts of which one should be aware. Here, we briefly mention four of these facts.
First, note that the factor graph optimization problem described above is a large least-squares optimization problem. Solving large least-squares problems has been and still is an active field of research, including the introduction of several computational libraries (Agarwal et al., 2023; Grisetti et al., 2020; Virtanen et al., 2020) specifically designed to efficiently handle large sparse matrix-based normal equations. Therefore, any technique developed for solving large least-squares problems can also be used to solve many factor graphs.
Furthermore, note that the procedure described above (repeatedly solving the normal equations) is a Gauss–Newton optimization procedure. For highly nonlinear systems, an unmodified Gauss–Newton approach is typically not used; rather, a guarded Gauss–Newton (Psiaki, 2005), Levenberg–Marquardt (Moré, 1977), dog-leg (Lourakis & Argyros, 2005; Rosen et al., 2012), or other “trust region” (Nocedal & Wright, 2006) approach is employed. Any technique that can be applied or used to improve the optimization of large least-squares problems can also be used to improve the ability to solve a large factor graph. For example, conjugate gradient (Dellaert et al., 2010), multiprocessor (Choudhary et al., 2015; Hao et al., 2016), and constrained (Qadri et al., 2022; Yang et al., 2021) optimization techniques have all been applied to optimize large factor graphs. Also note that because this is a gradient-based optimization technique, the starting location for the optimization will have a considerable effect on whether the solution converges to a global optimum and how many iterations are required for convergence.
Second, there are several different techniques that can be used to solve the normal equations. To compute the pseudo-inverse for a full-rank matrix L, the Moore–Penrose pseudo-inverse (represented by the † symbol) is often used:
$$\Delta \hat{X} = L^\dagger y = \left( L^\top L \right)^{-1} L^\top y \tag{11}$$
The difficulty with this technique is that the inversion of M = L⊤L can be very computationally intensive. However, because M is a symmetric positive definite matrix, it can be decomposed by a Cholesky decomposition into lower and upper triangular matrices that are transposes of each other:
$$M = L^\top L = S^\top S \tag{12}$$
These triangular factors can then be used to solve the normal equations via forward- and back-substitution, thereby avoiding the matrix inverse.
Similarly, if a QR decomposition of L is computed, where L = QR, Q is an orthonormal matrix, and R is an upper triangular matrix, then this R matrix is equivalent to the S matrix derived from the Cholesky decomposition of M. While there are computational advantages and disadvantages to each approach, in this paper, we will assume that the upper triangular matrix R can be computed, as it has some interesting theoretical properties that will be used throughout the remainder of this paper.5
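The two factorization routes can be compared directly in a few lines of NumPy/SciPy (dense here for clarity; sparse libraries expose the same operations). The random L below is a stand-in for the weighted Jacobian:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

rng = np.random.default_rng(0)
L = rng.standard_normal((12, 6))       # stand-in for the weighted Jacobian
y = rng.standard_normal(12)

# Route 1: Cholesky of M = L^T L = S^T S, then two triangular solves.
M = L.T @ L
c, low = cho_factor(M)
dx_chol = cho_solve((c, low), L.T @ y)

# Route 2: QR of L; its R factor equals S (up to row signs), and M is never formed.
Q, R = np.linalg.qr(L)
dx_qr = solve_triangular(R, Q.T @ y)

assert np.allclose(dx_chol, dx_qr)
```

Note that the QR route never forms M = L⊤L, which improves numerical conditioning at some computational cost.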
Third, after solving the optimization problem described above, the Gaussian distribution (covariance) of the hidden variables can also be computed. Note that the matrix M = L⊤L is the Fisher information matrix for the system solution. Therefore, the matrix Σ = M−1 is the predicted covariance matrix for the hidden variables. Unfortunately, while M is a sparse matrix, sparsity is not preserved during the inverse operation; thus, computing Σ can be expensive in terms of computation time and memory consumption. In contrast, if a QR decomposition is used to solve the normal equations, then the (marginal) variance corresponding to the last column (last hidden variable in the chosen ordering) can be computed with very low cost.6 Furthermore, if the marginal covariance for some other hidden variable is desired, the sparsity pattern of the upper triangular matrix R can often be exploited to avoid inversion of the entire matrix M (Kaess & Dellaert, 2009). Also note that the marginal covariance (diagonal entries of Σ) for any given hidden variable roughly corresponds to the P matrix that would be computed if a Kalman filter were used. Furthermore, the full M matrix includes information not only on the marginal covariance at each step, but also on the correlation between time steps introduced by the optimization procedure.7
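Footnote 6's observation translates into a short sketch: with the upper triangular factor available (written R here, matching the QR naming above), the marginal covariance of the last (block) variable needs only the bottom-right block of that factor:

```python
import numpy as np
from scipy.linalg import solve_triangular

def last_block_marginal_cov(R, n_last):
    """Marginal covariance of the last hidden variable from the upper
    triangular factor R of L (Sigma = (R^T R)^{-1}). Because R is upper
    triangular, only its bottom-right n_last x n_last block is needed."""
    Rb = R[-n_last:, -n_last:]
    Rb_inv = solve_triangular(Rb, np.eye(n_last))
    return Rb_inv @ Rb_inv.T
```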
Fourth, note that factor graphs can explicitly handle optimization on manifolds, including rotations (the SO3 manifold), with a simple modification to how the normal equations are used (Hertzberg et al., 2013; Kümmerle et al., 2011). By creating a “tangent” space that represents local movements on the manifold, the Jacobian matrices needed to generate L are a minimal representation of the error space. When the normal equations are solved, the resulting ΔX̂ contains localized (minimal representation) movements. To apply this localized movement to the hidden variable, a ⊞ (“box-plus”) operation must be performed. In most cases, this operation is the same as a Cartesian sum, but for manifolds, it can be significantly different.
For example, consider the manifold of 3 × 3 special orthogonal matrices (rotation or direction cosine matrices) that are core to three-dimensional (3D) navigation problems. These matrices form a 3D manifold embedded in the nine-dimensional space of all 3 × 3 matrices. Because it is a 3D manifold, movement on the tangent space of the manifold can be expressed as a 3D vector. For SO3 specifically, the tangent space is defined by skew-symmetric matrices. The Jacobians that will be used to create the L matrix and thereby solve for ΔX̂ will solve for the optimal skew-symmetric matrix modification given the current rotation matrix estimate.8 When applying the three computed delta changes to a rotation matrix, the ⊞ operator consists of forming a skew-symmetric matrix from those three values, exponentiating that matrix (forming a valid rotation matrix), and multiplying it by the current state rotation matrix.
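A minimal sketch of this ⊞ operation for SO3 is shown below, using the matrix exponential from scipy.linalg (a closed-form Rodrigues formula is also common); whether the update multiplies on the left or right is a convention that depends on how the error state is defined, and left-composition is assumed here:

```python
import numpy as np
from scipy.linalg import expm

def skew(w):
    """3-vector -> 3x3 skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def boxplus_so3(Rmat, delta):
    """Apply a 3-vector tangent-space update to a rotation matrix:
    exponentiate the skew matrix (yielding a valid rotation) and compose."""
    return expm(skew(delta)) @ Rmat

R0 = np.eye(3)
R1 = boxplus_so3(R0, np.array([0.0, 0.0, np.pi / 2]))  # 90 deg about z
```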
Note that Kalman-filter-based estimators handle rotation matrices in a similar fashion through the use of “multiplicative error states” (see, e.g., the work by Markley (2003)), but the theory behind manifolds and their use in a factor graph/optimization framework makes the theoretical foundations more generalizable. For example, optimizations over SE3 and homography matrices can also be performed via the same manifold operations (Begelfor & Werman, 2005; Blanco-Claraco, 2021).
4 COMPUTATIONAL IMPROVEMENTS
As mentioned previously, because the L matrix is sparse, the normal equations can be solved very efficiently, as shown by the example in Figure 3. This graph represents the computation time required to solve a factor graph in Python, where each hidden variable is a four-element state. The x-axis represents the number of hidden variables included in the factor graph, whereas the y-axis shows the computation time required to solve the normal equation.9 The key item to note in this chart is that the timing increases approximately linearly with the size of the factor graph.
While solving a graph with 1800 hidden variables (corresponding to an L of size 9000 × 7200 for a four-element state vector) still takes less than 0.33 s, the computational requirements grow without bound as the graph size increases. Note also that while the type of graph we have been using leads to linear growth, more complex graphs may have different computational growth characteristics, with the worst case being O(n³) because of the matrix inversion that is implied. Therefore, several techniques have been introduced for reducing computation time beyond efficient sparse implementations. In this section, we briefly introduce four main techniques that can be used to solve factor graph estimation problems more efficiently. These include the following:
Marginalization and sliding windows
Bayes tree/fluid relinearization
Pre-integration/smart factors
Split real-time and optimization estimation
4.1 Marginalization and Sliding Windows
The idea behind marginalization is that some hidden variables can be fixed and removed from the graph, thereby reducing the number of nodes in the graph. In the simple system we have been working with thus far, this step is typically applied with a sliding window, where a fixed number of time steps is kept within the graph and anything older is removed. This scenario is shown graphically in Figure 4.
In a more complex graph, when a hidden variable is removed, a new factor connecting all hidden variables to which it was connected is created in the graph. An example of this is shown in Figure 5. The node m in the left-hand graph is connected through factors to three other hidden variables. Therefore, when m is marginalized, a new factor (in blue) is added to the graph connecting all of the hidden variables that were connected by factors to m. Note that this property, i.e., that all hidden variables connected through factors to the hidden variable being eliminated must be connected together after marginalization, has an analog in the linear algebra representation. When an orthonormal matrix is used to eliminate all elements in a column except for the top-most element, it leads to “fill-in” in other columns. This fill-in is equivalent to the additional factor added in Figure 5. For more details on how marginalization is performed, please see Appendix A.
By performing marginalization, the number of hidden variables can be kept low, but this does not always lead to a reduced computation time. If the “fill-in” is large, then the computation time to invert the L matrix can grow, canceling out some of the computational savings obtained by reducing the number of hidden variables. Therefore, some prior works have sought to create sparse graph representations that more closely approximate the factor graph that one is aiming to solve. These methods are beyond the scope of this tutorial, but some discussions of these techniques have been presented by Carlevaris-Bianco et al. (2014), Fourie et al. (2021), and Vallvé et al. (2019).
4.2 Bayes Tree/Fluid Relinearization
While marginalization allows one to reduce the number of hidden variables in a graph, Bayes tree/fluid relinearization techniques enable the graph to be solved more efficiently by only processing nodes that need to be changed given new measurements. Much of the computation time required to perform a single iteration of the optimization procedure results from the creation of the L matrix to be inverted. Note, however, that each (block) column of L is a function of the hidden variables corresponding to that column. If the current estimate for a hidden variable has not changed since the previous iteration, then there is no reason to change the partial derivatives associated with that hidden variable. This insight is the key to performing fluid relinearization. Rather than recomputing the entire L matrix, only those portions whose hidden variables have significantly changed are modified, reducing the computation time.
The Bayes tree approach (Kaess et al., 2008, 2011, 2012) further extends this idea. By turning the factor graph into a Bayes tree,10 the relationship between hidden variables can be used to recompute or invert only certain portions of the L matrix. In many cases, one can achieve the same result as solving the complete optimization while performing only the computation required to invert or solve a small portion of the matrix.
4.3 Pre-integration/Smart Factors
For navigation applications specifically, it is common for inertial sensors to run at a high rate, while the other sensors run at a much lower rate. In a factor graph, this could lead to multiple hidden variable nodes with dynamics (inertial sensor) factor nodes between them, but no other measurements. This scenario can quickly lead to a very large number of nodes.
To address this problem, several approaches seek to replace multiple nodes chained together with only inertial measurements with a single factor node that represents multiple updates (Eckenhoff et al., 2019; Forster et al., 2016; Lupton & Sukkarieh, 2011). This technique, called “pre-integration,” for handling inertial sensors can lead to a significant reduction in the number of hidden variables to be solved for. The original pre-integration techniques, however, focused on low-quality inertial sensors. Therefore, significant work is ongoing for performing pre-integration with higher-quality inertial sensors (Eckenhoff et al., 2019; Lyu et al., 2023; Zhang et al., 2023).
The smart factors approach (Carlone et al., 2014) attempts to divide hidden variables into two classes: target and support variables. The target variables are the true variables of interest to the user, whereas the support variables are estimated only in order to enable the solution of the target variables. For example, the target variables may be the system state when a measurement is received, whereas the support variables correspond to intermediate inertial states. If several support variables are connected to each other but not the target variables, these support variables can be collapsed into a single node. These variables are not optimized for unless the target variables adjoining them change significantly, enabling a computational reduction.
4.4 Split Real-Time and Batch Optimization
Although the techniques introduced in the previous sections can greatly reduce the computational requirements, when performing SLAM, one of the most difficult scenarios is a loop closure, where new information corresponding to many time steps ago is suddenly available to the system. On the one hand, this scenario is a great motivation for using factor graphs, as the new information can cause significant estimate improvements for the whole graph. On the other hand, this scenario precludes both the marginalization and Bayes tree approaches from solving the problem. In other words, during a loop closure, a complete optimization of all data received over time may need to be performed. This computation may not be able to run in real time regardless of how efficiently one has designed the factor graph optimizer.
To address these circumstances, parallel filtering and smoothing approaches have been proposed (Klein & Murray, 2007; Williams et al., 2014). In these approaches, the optimization or smoothing computation (the full factor graph) is always being performed in the background. While this optimization is computing, however, a Kalman filter is run with current data. Whenever an optimization is completed, the state for the Kalman filter is updated to use the full optimization information. In this way, a real-time solution is always available, but the full benefit of optimizing over the full data history can also be achieved. Note that this method requires parallel threads of computation; however, this capability is standard in most modern processors.
5 COMPARISON WITH KALMAN FILTER SENSOR FUSION
When comparing the Kalman filter and factor graphs, it is important to clarify that Kalman filters can be formulated to operate in fundamentally the same manner as a factor graph formulation. For example, Psiaki (2005) introduced the backward smoothing EKF, which iterates over and relinearizes nonlinearities over both current and past updates. Similarly, Bierman (1977) presented a QR-based method for solving large batches of data and its relationship to the Kalman filter.
The true power of a factor graph is not that it optimizes data in a fundamentally different way than Kalman-filter-based methods, but that it is a more general framework that quickly enables estimation to occur on several different problems that may be more difficult (or require more extensive changes) in a Kalman filter framework. Therefore, this section is organized into two main divisions. In the first division (Section 5.1), we introduce a simple example and compare the estimation performance of a traditional EKF against the factor graph. In the second division (Sections 5.2 and 5.3), we describe extensions to the factor graph framework that move beyond problems traditionally solved in a Kalman filter framework. Note that depending on the reader, some may feel that these are “simply” extensions to a Kalman filter framework, but we believe that (a) the fact that a single framework (the factor graph) handles all of the below problems with minimal modification is one of its advantages and (b) some of the modifications discussed below would require significant modification to the Kalman filter framework, stretching the definition of a Kalman filter. Therefore, we compare the Kalman filter and factor graph in terms of both estimation performance and flexibility in this section.
5.1 Comparison of Estimation Performance
In this subsection, we introduce a simple system and compare the performance of a factor-graph-based technique against an EKF implementation. The system is a “two-dimensional” (2D) satellite orbiting around a “point” earth, with angular measurements being taken of the satellite. The goal is to accurately estimate the satellite position.11 Mathematically, this system can be expressed as follows:
$$\frac{d}{dt}\begin{bmatrix} x \\ y \\ v_x \\ v_y \end{bmatrix} = \begin{bmatrix} v_x \\ v_y \\ -G_E\, x / \left( x^2 + y^2 \right)^{3/2} \\ -G_E\, y / \left( x^2 + y^2 \right)^{3/2} \end{bmatrix} + v, \qquad z = \tan^{-1}\!\left(\frac{y}{x}\right) + \eta \tag{13}$$
where x and y are the position coordinates of the satellite in space, vx and vy are the velocity components of the satellite, GE is the gravitational constant, and v and η are noise terms with covariance Q and R, respectively.
The initial estimate of the position for this system is x = 0 km, y = 20000 km with a large uncertainty.
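A sketch of the dynamics and measurement functions for this system is shown below (the repository linked in the abstract contains the full implementation; the gravitational parameter value, integrator choice, and step size here are our illustrative assumptions):

```python
import numpy as np

GE = 398600.0  # km^3/s^2; illustrative value of the gravitational parameter

def f_cont(x):
    """Continuous-time dynamics of the 2D point-earth satellite;
    state = [x, y, vx, vy]."""
    r3 = (x[0]**2 + x[1]**2) ** 1.5
    return np.array([x[2], x[3], -GE * x[0] / r3, -GE * x[1] / r3])

def f_disc(x, dt):
    """Discrete dynamics via a single RK4 step (integrator is an assumption)."""
    k1 = f_cont(x)
    k2 = f_cont(x + 0.5 * dt * k1)
    k3 = f_cont(x + 0.5 * dt * k2)
    k4 = f_cont(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def h(x):
    """Angular measurement of the satellite as seen from the origin."""
    return np.arctan2(x[1], x[0])
```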
The results of an example run are shown in Figure 6, with Figure 6(a) plotting EKF performance and Figure 6(b) showing the factor graph performance. Note that the EKF has significantly worse performance (root mean square error [RMSE] of 736 km vs. 17.5 km) than the factor graph estimator, especially at the beginning of the estimation time period. This result is to be expected, as the initial uncertainty is very large and a single heading measurement cannot give the distance away from the origin. This information is only obtained by combining measurements over time with the dynamics model. Therefore, examining all information received will give better results than simply using the information received up to that time period.
The advantages of factor graphs extend beyond simply performing batch estimation. Even when using only data up to the current time step, the factor graph can outperform the EKF. This result occurs because the EKF marginalizes out all previous state estimates, thereby fixing their Jacobians. In contrast, with a factor graph, changes in past state estimates can lead to a recomputation of the Jacobian matrices (past F and H matrices), resulting in improved estimation performance.12 Furthermore, this recomputation of past matrices leads to significantly more accurate covariance estimates when using factor graph techniques vs. EKF-based or unscented Kalman filter-based approaches (Taylor, 2012).
The impact of past marginalization can be seen in Table 1. In this table, the RMSE was compared, for the same scenario as in Figure 6, for several different sliding window sizes (different rows) and for a batch vs. real-time approach (delayed vs. real time). For the real-time results, at each time step, a factor graph was solved using only data available up to that time step or, for the sliding window, using a subset of the data available up to that time step. At each time step, the result for the last time in the optimization is recorded as the result for the optimization procedure. Therefore, this filter can run in real time and does not have to be delayed until more data are available. In contrast, the delayed results record the estimate for the oldest time for each optimization run, requiring at least the window size of data to be available before the first estimate of the trajectory can be generated.
There are several items to note from Table 1. First, note that the advantages of a batch (delayed) technique decrease as the batch size decreases. This can be seen in the delayed results column, where the overall performance decreases as the sliding window size is decreased. Second, note that the real-time performance, while not nearly as good as the delayed or batch performance, is still significantly better than the performance achieved by the Kalman filter alone, demonstrating the importance of modifying past Jacobians. This trend is shown by the results in the RMSE (real-time) column. Finally, as shown in Figure 6(a), the majority of errors in the EKF occur near the beginning of the simulation run. To remove the effects of the large errors at the beginning and compare performance in a “steady-state” scenario, we also present results for the second half of the data in the last column.13 The results in the last column represent a “real-time” factor graph technique that uses only data available up to the current time for any estimation result. For comparison, the EKF RMSE for the second half alone is 210.2 km. In all cases, whether delayed or real-time, for all data or only the second half, the factor graph produces smaller estimation errors than the EKF.
5.2 Differing Graph Structures
While most of the discussion to this point has assumed the basic system described in Equation (1) and illustrated by the graph in Figure 1, there are several different systems that can be represented by graphs of a significantly different style. In this section, we consider five different examples, illustrated in Figure 7.
In Figure 7(a), we show a simple graph representing the SLAM problem.14 The “center line” of hidden nodes represents the robot or sensor location over time, with green dynamics nodes representing inertial or other odometry measurements from one time step to the next. In contrast, the landmarks (the hidden variable nodes marked by an l) are fixed over time and will only be seen for particular robot poses. In general, Kalman-filter-type systems assume that there is one set of dynamics, with everything moving according to that time scale. In contrast, the factor graph can easily handle different components of the state propagating at different rates because time steps are explicitly converted to different hidden variables. In this graph, the differing dynamics rate is illustrated by the landmarks never propagating forward in time, while the robot moves at a different rate. Note that this type of graph is commonly used to solve SLAM problems; see, for example, the work by Dellaert (2021) and Grisetti et al. (2010).
In Figure 7(b), a graph patterned after a visual odometry problem is shown. Note that all measurements are between two time steps, but the “spacing” between time steps is different for different measurements. Inertial sensors might give measurements with close time steps, whereas camera or lidar measurements could give relative pose measurements between states at different time scales. All of this information is handled naturally in the factor graph formulation by converting the illustrated graph to an optimization that can be solved by sparse matrix libraries. Pose graphs that use only relative measurements between nodes have been extensively studied in the robotics literature; for example, see the work by Konolige and Agrawal (2008) and Konolige et al. (2010).
Figure 7(c) illustrates a system in which there are two different dynamics models for the same system. For example, Crocoll et al. (2013) presented a method for combining inertial sensors and a vehicle model for a Kalman-filter-based framework. This required a formulation of a unified dynamics model that was specific to the models given in that paper. In a factor graph, however, both inertial sensors (green factors) and vehicle dynamics models (blue squares) can be simply added to the graph. This enables both dynamics models to be considered simultaneously without requiring any special “merging” of the dynamics. Note also that Figure 7(c) has a somewhat random “measurement” pattern; each time step is associated with a different set of measurements.
Figure 7(d) extends the multiple dynamics example of Figure 7(c) to include a weighting variable (another hidden variable) that determines which dynamics model is more trusted at the current state. A prior for the weighting hidden variable is also provided at each step. Once again, no special machinery beyond that already introduced in this paper is required to solve this system. This concept of having an additional hidden variable to switch between two models of a measurement (switchable constraints) has been used to reject outliers in previous works (Sünderhauf & Protzel, 2012a).
In Figure 7(e), we show another interesting use case of factor graphs. In this graph, the hidden variables, rather than representing the system state at a particular time, represent knots of a spline used to approximate the continuous-time system state. This particular representation has been used when the sensor is an event (neuromorphic) camera or a camera with a rolling shutter (Mueggler et al., 2018) or when multiple sensors are used, all with different sampling frequencies and non-overlapping sampling times (Lv et al., 2023). In these cases, a spline or Gaussian process regression (Barfoot, 2024) approximation of the continuous-time state is reconstructed instead. In Figure 7(e), a third-order spline is used; thus, any measurement will be a function of four adjacent nodes or knots of the spline to reconstruct the system state at a particular time, after which the measurement function can be applied. Because measurements are received when events occur, the number of “measurements” connecting any four adjacent nodes may differ.
Each of these examples demonstrates the flexibility and utility of the factor graph representation for estimation problems. Although it is possible to formulate Kalman filters to handle these scenarios, in general, if a factor graph can be constructed to represent a problem, then the optimization procedure described above can be used to solve it, making the factor graph approach a flexible and powerful extension to Kalman-filter-based estimators.
5.3 Different Types of PDFs
Up to this point, one of the underlying assumptions behind solving factor-graph-based optimization problems is that the PDF represented by each factor is Gaussian. Note, however, that the factor graph itself can express any type of PDF. In this subsection, we describe extensions to the optimization structure described above to handle more general PDFs. We have ordered the types of PDFs that can be represented from those with the least computational overhead to the most overhead.
5.3.1 Unimodal Distributions
To elucidate how a factor graph can be extended to work with any unimodal likelihood function15 in which the mode is at the measured value associated with the factor, let us first examine how the Gaussian case is handled. In the Gaussian case, the entries in the L matrix and y vector are found by applying the following steps:
Set yk = zk – hk(xi).
Set the Lk submatrix equal to the derivative of hk(xi), i.e., Hk.
Weight both the subvector and submatrix by the square root (Cholesky decomposition) of the inverse covariance matrix.
Multiplying the weighted L⊤ and y together leads to a total derivative term of Hk⊤Σ⁻¹(zk − hk(xi)), where Σ−1 is split between the L and y terms. Note that this is the same as the derivative term expressed in Equation (6). Furthermore, note that this equation can be viewed as having two portions: (1) Σ−1(zk – hk (xi)), which is the derivative of the log likelihood function w.r.t. changes in hk(xi), and (2) Hk, which completes the chain rule so that the overall derivative can be performed w.r.t. xi.
For non-Gaussian unimodal likelihoods, we use the same first two steps. For the third step, instead of weighting by the inverse covariance matrix, we must find a symmetric, positive definite matrix (so that the Cholesky decomposition can be applied) that maps the vector zk – hk(xi) to the gradient of the negative log likelihood function at the current point. Once this matrix is found, then the Cholesky decomposition of that matrix is used to weight both Lk and yk. Note that using this approach, the underlying computational architecture for solving the factor graph optimization has not changed. More iterations may be needed because of the non-Gaussian distributions, but the basic computational approach is the same.
Let us consider two examples of different likelihood functions. One of the primary purposes of using different PDFs in factor graphs is to enable optimization with robust cost functions (Barron, 2019; Rosen et al., 2013; Watson & Gross, 2017). These functions are usually similar to a Gaussian distribution close to the mode, but increase at a much slower rate further away from the mode. Two commonly used cost functions are the Huber and Geman–McClure cost functions, shown in Figures 8(a) and (b), respectively. In Table 2, we show both the cost function definition and the weighting factor used to perform optimization (i.e., replacing Σ−1). Note that for the Huber cost function, the weighting is the same as the Gaussian weight when less than one sigma away from the mode, but the weighting becomes much smaller as we move further away from the mode.
In Figures 8(a) and (b), the effect of this weighting function is illustrated. In each figure, the blue plot shows the robust cost function, which is parabolic, similar to a Gaussian cost function, close to the origin, but has a much slower rate of increase farther away from the origin. One way to view the weighting functions shown in Table 2 is that a new Gaussian is created that will have the same slope as the robust cost function at the current estimate. In both figures, the Gaussians that correspond to that weighting function for specific estimates are shown. To illustrate that the derivatives are the same at the estimate point, the derivative lines for the “temporary Gaussian” at that point are also shown. For the Huber cost function, everything beyond one sigma has a slope of unity, whereas the Geman–McClure cost function has decreasing derivatives farther away from the origin. Note that in both cases, the local optimization using the gradient of the robust cost function employs the same efficient computational architecture as the baseline factor graph optimization introduced above.
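For unit-scale residuals, the weighting functions of Table 2 reduce to a few lines of Python (a sketch; the threshold and scale parameterizations vary across references):

```python
import numpy as np

def huber_weight(r, k=1.0):
    """Weight replacing the (unit) inverse covariance for a normalized scalar
    residual r: Gaussian inside k sigma, shrinking as 1/|r| outside."""
    a = abs(r)
    return 1.0 if a <= k else k / a

def geman_mcclure_weight(r):
    """Geman-McClure weight; derivatives (and hence weights) fall off
    rapidly far from the mode, strongly down-weighting outliers."""
    return 1.0 / (1.0 + r * r) ** 2

# In the factor graph, the corresponding rows of L and y are scaled by the
# square root of the weight, mimicking the Cholesky factor of the
# replacement inverse covariance:
r = 3.0
scale = np.sqrt(huber_weight(r))
weighted_residual = scale * r
```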
5.3.2 Discrete and Gaussian Mixture Model Distributions
Doherty et al. (2022) showed that another style of PDF that can be efficiently handled in a factor graph formulation is a discrete random variable. When discrete random variables are present in the factor graph, the hidden variables are divided into continuous and discrete hidden variables. If all discrete variables are fixed at their currently most likely value, then the continuous variables can be solved for in the traditional manner (i.e., as described earlier in this paper). With the continuous variables fixed, the values for the discrete variables can also be determined. By repeating this process, an expectation maximization procedure is obtained, allowing one to optimize for the most likely discrete and continuous values. As long as the discrete variables are conditionally independent, the discrete optimization step can be very efficient, allowing efficient discrete–continuous estimation problems to be performed.
Olson and Agarwal (2013) introduced an approach for approximately solving the optimization if a Gaussian mixture model is used for some of the PDFs. A Gaussian mixture model PDF is defined as follows:
$$p(x) = \sum_{i=1}^{n} w_i\, \mathcal{N}\!\left( x;\, \mu_i, \Sigma_i \right) \tag{14}$$
where n is the number of Gaussians being “mixed” together, wi is a weighting term for the Gaussians such that Σᵢ wᵢ = 1, and μi and Σi are the mean and covariance, respectively, for the i-th Gaussian distribution. The difficulty with a Gaussian mixture model PDF is that the logarithm of a sum of terms is not easily obtained. However, if, instead of adding the Gaussian components together, one uses the component with the highest probability at the current estimate for the probability at each point, then the “max-mixture” logarithm for the current hidden variable values (and its derivative) can be easily computed and solved for using the sparse matrix computational approach. Pfeifer et al. (2021) extended this approach to provide a better approximation of the sum of the Gaussians’ negative log likelihood while maintaining a computationally efficient approach.
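A minimal sketch of the max-mixture idea for a scalar residual is shown below (our illustration; the mixture parameters are made up). At the current estimate, only the most probable component is kept, so the negative log-likelihood and its derivative stay quadratic:

```python
import numpy as np

def max_mixture_nll(r, weights, means, sigmas):
    """Approximate negative log-likelihood of a scalar Gaussian mixture at
    residual r, using only the most probable component (the 'max-mixture')."""
    nlls = [-np.log(w) + 0.5 * np.log(2 * np.pi * s**2)
            + 0.5 * ((r - m) / s) ** 2
            for w, m, s in zip(weights, means, sigmas)]
    i = int(np.argmin(nlls))   # dominant component at this estimate
    return nlls[i], i

nll, comp = max_mixture_nll(0.3, [0.9, 0.1], [0.0, 0.0], [1.0, 10.0])
```

The selected component's mean and covariance then supply the weighting for the corresponding rows of the normal equations, so the sparse machinery described above is unchanged.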
5.3.3 General (Sampled) Distributions
By applying sampling theory, one can solve for any PDF in a factor graph using nonparametric approximations of the PDF (Fourie et al., 2016; Salmerón et al., 2018). Note, however, that this moves the factor graph away from the computationally efficient sparse matrix optimization procedures. Thus, while a factor graph can be used to solve any type of PDF, its computational efficiency is sacrificed when overly complex PDFs are used. Note that if only a small part of the factor graph requires a nonparametric representation of the state, the computational burden may not be too large; however, as more variables require sampling to represent their PDF, the computational cost will increase.
6 FACTOR GRAPHS IN NAVIGATION
Given its benefits and flexibility, the factor graph framework has become the de facto strategy for many positioning and navigation applications. With these developments, comprehensive reviews on factor graphs as applied to navigation and positioning applications have been compiled in recent years (Das et al., 2021; Dellaert & Kaess, 2017; Xiwei et al., 2022). While not the main focus of this tutorial, in this section, we offer a high-level review of the main developments in the use of factor graph optimization in three popular areas of navigation systems research, namely, SLAM, global navigation satellite systems (GNSSs), and cooperative/collaborative navigation and mapping.
The SLAM problem is perhaps the most prominent navigation application for which the factor graph optimization framework has proven to be the most effective tool. This fact is detailed in an in-depth review of the history, state of the art, and ongoing work in the field of SLAM (Cadena et al., 2016). More formally, within the SLAM problem, factor graph optimization has been useful as a back-end optimizer that solves the maximum a posteriori estimation problem and reconciles incremental constraints that represent changes in pose over time (i.e., from odometry measurements: lidar, visual, inertial, wheel) with periodic loop closures. Furthermore, given the low-frequency nature of the drifting errors related to odometry sensors, a variation on the full-fledged factor graph optimization, known as pose graph optimization, is often applied to SLAM (Cadena et al., 2016). In pose-graph optimization, odometry constraints over an interval of time are collapsed into a single constraint representing the incremental change between poses. In this way, poses are sampled at a rate that balances application needs, odometry sensor error models, and computational complexity. Many advances on the use of factor graphs in SLAM with respect to offering robustness (Sünderhauf & Protzel, 2012b; Tian et al., 2022), providing performance guarantees (Holmes & Barfoot, 2023; Rosen et al., 2020), improving computational efficiency (Kaess et al., 2012), and incorporating semantic information (Doherty et al., 2019) have been developed.
Researchers have also applied the factor graph framework to various estimation problems involving GNSSs. A simple example of a GNSS factor graph for pseudorange positioning is shown in Figure 9. Early papers on this topic applied robust estimation techniques, namely, switch factors, to Global Positioning System (GPS) pseudorange positioning in multi-path environments (Sünderhauf et al., 2012). These papers were extended to compare several robust estimation methods applied to GNSSs (Watson & Gross, 2017) and to modify the factor graph to automatically learn the most appropriate robust model for both batch (Watson et al., 2019) and incremental estimation (Watson et al., 2020). Techniques for finding unbiased estimates of the input covariances have also been presented by Vanli and Taylor (2022). The factor graph framework was applied to GPS carrier-phase processing for kinematic precise point positioning (Watson & Gross, 2018) and was found by Wen et al. (2021) to outperform EKF-based implementations for GNSS/inertial navigation system integration, because of the nature of multiple iteration optimization and the availability of batch estimation to more robustly handle non-Gaussian errors. In regard to Figure 9, this approach would amount to including additional factors for the carrier-phase measurements, including carrier-phase biases in the estimated state vector, and incorporating inertial data in the dynamics factors (shown in green). Perhaps most visibly, a factor graph optimization approach has won first place in all Google Smartphone Decimeter Challenges held thus far (Suzuki, 2021, 2022).
Another active research topic is the development of factor graph formulations for cooperative localization, mapping, and target tracking applications and frameworks for decentralizing computation across agents. In cooperative estimation applications, double-counting information that is shared across a group of agents that all share relative updates with one another can lead to inconsistent estimates. When the information sharing can be explicitly tracked, the double-counted information can be removed from the estimator. Cunningham et al. (2013) developed an anti-factor for use within factor graphs to explicitly remove double-counted information for cases in which information-sharing topology is tracked. Choudhary et al. (2016) demonstrated the use of the distributed Gauss–Seidel algorithm for multi-robot cooperative trajectory estimation with relative pose measurements and showed that this approach scales well to large groups and has communication needs that scale linearly with the number of robots. Further, Dagan and Ahmed (2021) presented several algorithms for using factor graphs in peer-to-peer decentralized sensor fusion applications that are heterogeneous in nature such that agents have their own local states and states of common information shared with other agents in a neighborhood. In their approach, if a tree-structured communication topology is assumed, a channel filter can be used to track common information such that information is not double-counted, allowing each robot to estimate the global posterior of local variables and the variables of common interest with its neighbors. Several other successful frameworks of multi-agent cooperative SLAM with factor graphs have also been developed (Matsuka & Chung, 2023; Tian et al., 2022).
Several open-source factor graph software implementations for various navigation problems are available, making the framework more accessible for the navigation research community to leverage. For those seeking to begin working with factor graphs, several simple examples delivered with the GTSAM library are packaged with an informative tutorial (Dellaert, 2012). For more complex visual–inertial odometry and lidar SLAM solutions, several popular implementations with open-sourced code are available, e.g., Kimera (Rosinol et al., 2020) and lidar-odometry smoothing and mapping (Shan et al., 2020). For GNSS applications, software implementations of robust (Watson & Gross, 2017; Watson et al., 2019, 2020) and precise (Watson & Gross, 2018) carrier-phase positioning have been built upon GTSAM and made open-source. Likewise, the GraphGNSSLib associated with Wen and Hsu (2021) is available open-source. The research community can benefit from the factor graph framework by leveraging available open-sourced implementations as a baseline in order to more readily tackle new research questions.
7 CONCLUSION
Factor graphs and their associated computational machinery represent a generalization of Kalman-filter-based estimation systems. In this paper, we have briefly described factor graphs, presented the instantiations of a factor graph that can be efficiently computed, and provided several approaches for making a factor graph more computationally efficient to solve. We have also shown how a factor graph can more accurately solve nonlinear systems and discussed some of the navigation applications for which factor graphs have already made a significant impact in the field. Because of their ease of use, extensibility, and low computational overhead, we believe that factor graphs represent an appealing choice for many navigation-based estimation problems and hope that this paper helps readers to understand and start using factor graphs for their specific problems.
HOW TO CITE THIS ARTICLE
Taylor, C., & Gross, J. (2024). Factor graphs for navigation applications: A tutorial. NAVIGATION, 71(3). https://doi.org/10.33012/navi.653
APPENDIX A MARGINALIZATION DETAILS
To understand how marginalization is implemented in a factor graph, two linear algebra properties are used to modify Equation (10). First, note that any orthonormal matrix (i.e., any matrix E such that I = E⊤E) can be applied on the left side to the weighted sparse matrix L and residual vector y without changing the value of ΔX̂, as shown below:
$$(E L)^\top (E L)\, \Delta X = (E L)^\top (E y) \quad\Longleftrightarrow\quad L^\top L\, \Delta X = L^\top y \tag{15}$$
Second, let us assume that we have an L matrix that can be partitioned into four submatrices such that the bottom-left submatrix is all zeros. In this case, we can break down the normal equation as follows:
$$\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} \begin{bmatrix} \Delta x_1 \\ \Delta x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \tag{16}$$
Note that A, B, and D represent arbitrary matrices and do not have any specific meaning. This formulation is beneficial, as solving for Δx̂2 requires only y2 and D (the bottom block row reads D Δx2 = y2, which can be solved on its own).
Using these two mathematical properties, an algorithm for marginalizing hidden variables out of the factor graph can be established:
Marginalization Algorithm
Reorganize L and ΔX such that the variable to be eliminated corresponds to the first (block) column of the L matrix and the top of the ΔX vector.
Find an orthonormal matrix that rotates all entries in the first (block) column to zero except the first (block) row.
Apply this rotation to L and y. Any rows modified by the rotation now become “fixed,” i.e., the approximations of their values are now linear and the derivatives do not change with iterations.
Some additional explanation is needed for each of these steps. In the first step, note that the ordering of rows and columns in the matrix and state vector is arbitrary; thus, this step does not modify the graph or optimization problem in any meaningful way. To perform the second step, either Householder or Givens rotations (Bierman, 1977; Golub & Van Loan, 2013) can be used. Both of these operations are commonly used to find orthonormal matrices to zero out all elements of a column, e.g., when performing a QR decomposition of a matrix.
For the third step, there are two items of interest to note. First, when an orthonormal matrix is applied to a sparse matrix, “fill-in” will occur, creating nonzero entries for elements that were previously zero. The orthonormal rotations do not affect rows that already had a zero in the first column; thus, fill-in will occur only for rows in which there was a nonzero entry in the first column of the original matrix. This fill-in in the matrix representation is equivalent to adding a new factor in the factor graph that connects all hidden variable nodes that were connected to the node being eliminated, corresponding to the phenomenon illustrated in Figure 5.
Second, to complete the third step, we need to create new (linear) factors that connect all of the neighbors of the variable being eliminated. To create these factors, let us concentrate solely on the rows of the reorganized L matrix (i.e., after the first step) that have nonzero entries in the first (block) column. Writing this submatrix as [A′ B′], where the columns of A′ correspond to the variable being eliminated, the rotation E from the second step partitions it into four submatrices as follows:
$$E \begin{bmatrix} A' & B' \end{bmatrix} = \begin{bmatrix} \tilde{A} & \tilde{B} \\ 0 & D' \end{bmatrix} \tag{17}$$
With these steps, a new linear factor is defined with a measurement value of ỹ2 (the lower portion of the rotated residual vector Ey′), where D′ defines the linear function. Adding this factor to the graph and removing the variables being marginalized (corresponding to the columns for A′) completes the marginalization process.
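A dense NumPy sketch of this procedure is given below (a QR factorization plays the role of the Householder rotations; it triangularizes more of the submatrix than strictly necessary, which is harmless because any orthonormal rotation preserves the problem):

```python
import numpy as np

def marginalize_first_block(L_sub, y_sub, n_elim):
    """Marginalize the first n_elim columns (the variable being eliminated)
    out of the rows of L that touch it. Returns the new linear factor:
    its function D_prime and its measurement value y2."""
    A = np.hstack([L_sub, y_sub[:, None]])    # rotate L and y together
    Q, R = np.linalg.qr(A, mode="complete")   # Q.T @ A = R, upper trapezoidal
    D_prime = R[n_elim:, n_elim:-1]           # columns of the remaining variables
    y2 = R[n_elim:, -1]                       # measurement value of the new factor
    return D_prime, y2
```

The rows of D_prime are generally dense, which is exactly the "fill-in" connecting all former neighbors of the eliminated variable, as in Figure 5.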
Footnotes
1 We assume most practitioners of navigation have some exposure to the Kalman filter.
2 For pedagogical purposes, different colors are used for different types of factors: green squares represent dynamics, black rectangles represent measurements, and the yellow box represents a prior.
3 The column ordering can have a significant impact on the computational time required when dealing with sparse matrices, but the accuracy of the solution does not change with different orderings.
4 This matrix is typically found by using a Cholesky decomposition, although other approaches are possible.
5 Note that this matrix R is different from the R that represents the covariance on measurements in Equation (1). This R is typically a temporary variable, and it should be clear which meaning is intended from the context.
6 The bottom-right corner of the inverse of an upper triangular matrix can be computed using only the bottom-right corner of the input matrix.
7 In some scenarios, this temporal cross-correlation information can be very important (Dolloff, 2012, 2013).
8 For a detailed explanation of computing these derivatives, see the work by Blanco-Claraco (2021).
9 Because this is implemented in Python, approximately 90% of the time is spent in computing L, with only approximately 10% being spent in actually solving the normal equation.
10 This procedure can be performed as an “elimination game” in the graph representation or as a QR decomposition in the linear algebra representation.
11 Note that all code used for this example and to generate all of the graphs (and some of the other figures in this paper as well) is available at https://github.com/cntaylor/factorGraph2DsatelliteExample.
12 Note that the EKF is essentially a factor graph with a sliding window size of unity and only a single Gauss–Newton iteration. Therefore, the EKF is a “subclass” of factor graph optimization, meaning that any technique that can be used to improve EKF performance should also be suitable for factor graphs.
13 The full simulation is run, but only the second half of the results is considered for comparison.
14 In reality, this problem has been the main driver of most advancements in the factor graph literature.
15 Note that because optimization occurs on the negative log of the PDF, the PDF or the PDF multiplied by any scale factor will result in the same optimization problem. Therefore, the functions do not strictly need to be PDFs (integrate to 1).
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.