Abstract
Smartphone-based Lifelog (automatically annotating the users’ daily experience from multisensory streams on smartphones) is in great demand. Accurate positioning in any situation is one of the key techniques for a useful Lifelog. This paper proposes to detect location-related activities and to use the activity information to improve positioning accuracy. In the proposed system, a human activity recognition module extracts location-related activities from the multisensory streams of smartphones. The proposed system then integrates the activity information with Pedestrian Dead Reckoning (PDR)-based positioning results in a context-based map-matching framework. The developed system works in both outdoor and indoor scenarios. Moreover, the developed indoor positioning method is used to determine the positions of calibration points automatically in an auto-calibration Wi-Fi positioning system. The proposed methods achieve 3.1-m accuracy outdoors and 2.2-m accuracy on average indoors.
1 INTRODUCTION
The smartphone has become one of the most frequently used personal computing platforms. With the increase in processing, communication, and sensing capabilities of smartphones, smartphone-based intelligent services are highly anticipated. Smartphone-based Lifelog automatically annotates the users’ daily experience by recording location data, activities, and device operation history through the multisensory streams of smartphones. Users can check their Lifelog history to recall events, review progress, and improve themselves. In addition, the user preferences contained in Lifelog data can also be utilized for marketing and city development.
The “Move X”1 and “Lifelog”2 applications are two examples of off-the-shelf Lifelog applications. Move X records transportation modes and the places users visit. Lifelog also recognizes transportation modes and, in addition, can record the physical state of the human body with a smart band. However, our investigation indicates that current Lifelog applications have three main issues that need improvement. First, distinguishing train mode from non-train modes can provide more detailed information on the usage of public transit facilities, but current applications cannot distinguish train mode from bus and car modes. The second issue is positioning accuracy in a city. In location recording tasks performed in deep urban environments, most Lifelog applications do not work well because of degraded Global Positioning System (GPS) positioning results. Third, all the investigated applications focus on outdoor cases, and it is necessary to develop a Lifelog application that works in both outdoor and indoor scenarios. Based on the above analysis, providing more detailed transportation modes and accurate location information are two important issues for improving the performance of a Lifelog application. In addition, extending the outdoor Lifelog to indoor scenarios is also significant for increasing the generality of Lifelog.
Providing accurate and detailed transportation mode information would benefit both individual users and city planners. Most research on smartphone-based transportation mode detection follows several steps: collecting data with smartphones, extracting features from the data, training a classifier, and classifying transportation modes with the trained classifier. The differences lie in the data type used (acceleration or GPS data), the extracted feature type (temporal features or features in frequency space), the window size, and so on. Zheng et al.3,4 proposed to detect five different transportation modes, including being stationary, walking, biking, driving, and traveling by bus, using features extracted from GPS data. In addition to the movement direction, velocity, and acceleration estimated from raw GPS measurements, Geographic Information System information about street segments is also considered for determining the transportation mode.3,4 Reddy et al.5 proposed a traffic mode classification system developed for a mobile phone with a built-in GPS receiver and accelerometers. Two types of information are used in the traffic mode classification: the speed information captured by the GPS receiver and the locomotion information recorded by the accelerometers. Since GPS is not available in some situations, such as taking a subway or walking underground, GPS information was excluded from the traffic mode classification. Yu et al.6 chose accelerometers, magnetometers, and gyroscopes to classify five traffic modes: still, walking, running, biking, and on a vehicle. Shafique et al.7 proposed to use acceleration data to predict walking, bicycle, car, and high-speed train. Gao et al.8 proposed a hierarchical classifier to output more detailed activity information, such as stationary, walking, running, ascending stairs, descending stairs, stationary vehicles with the engine on, moving underground trains, moving trains, and moving buses.
The quality of the location recording service in Lifelog highly depends on the accuracy of the positioning. The Global Navigation Satellite System (GNSS) is the most developed positioning system in the world, and it is widely applied in outdoor scenarios. GNSS positioning performs quite well in open-sky areas. However, its positioning accuracy drops substantially in urban areas due to multipath interference, non-line-of-sight (NLOS) reception, and signal blockage. Various GNSS technologies have been developed to mitigate multipath effects.9,10 With the development of ranging technologies, 3D maps of urban cities have become available on the market. The 3D building map has been used to estimate multipath and NLOS effects in order to mitigate GNSS positioning error.11-13 An average positioning error of 3 m has been reported for a 3D map-aided GNSS positioning method developed on a commercial-grade receiver (u-blox).14,15
Recently, most buildings such as shopping malls and office buildings have been equipped with many Wi-Fi access points (APs), which makes Wi-Fi-based indoor positioning a promising solution. The trilateration-based method and the fingerprint-based method are two common Wi-Fi positioning technologies. However, trilateration prefers a line-of-sight environment16 and needs accurate location information for each AP.17 Wi-Fi fingerprinting creates a radio map of different locations (reference/calibration points) and their corresponding Wi-Fi signal properties in the offline phase. In the online phase (positioning phase), the similarity between the received Wi-Fi fingerprint and the fingerprints pre-stored in the radio map is evaluated for positioning.18-20 More accurate positions of the calibration points lead to better Wi-Fi fingerprint positioning results.21 However, manually creating an accurate radio map is time consuming; thus, an automatic method is desirable.
Unlike Wi-Fi- and GPS-based positioning methods, a Pedestrian Dead Reckoning (PDR) solution does not suffer from received-signal problems. The PDR method collects data from the magnetometers, gyroscopes, and accelerometers embedded in a smartphone and estimates the pedestrian trajectory based on three components: step detection, step length estimation, and heading direction estimation.22 Pedestrian steps are detected using peak detection on the vertical acceleration.22 The step length can be estimated based on pedestrian height23 or vertical acceleration.22,24 The magnetic field combined with acceleration is used to calculate the heading direction.25 In addition, some PDR methods also integrate the gyro measurements to determine the heading direction.26 The advantage of the PDR system is that it outputs a smooth trajectory, and it can be used in both indoor and outdoor scenarios. However, PDR needs a starting point to initialize its position update process and also suffers from error accumulation. The positioning results from Wi-Fi and GPS systems can be used to provide the starting point and to rectify the accumulated error of the PDR.22,25 On the other hand, an inaccurate positioning source will degrade the performance of the PDR system. Thus, a candidate method for integration with PDR should have satisfactory performance.
The stand-alone PDR-based positioning method can also be improved when it is possible to establish a relationship between the current locomotion and the information on a map.27 This procedure is called map matching.28 Gusenbauer et al.27 projected an estimated position to potential vertical links on a map (staircases and elevators) based on a corresponding movement pattern. Kakiuchi et al.29 proposed a PDR reset method that “snaps” the position where a turning activity is detected to the nearest corner to eliminate the accumulated error. The abovementioned methods use the idea of nearest-point map matching. Recently, researchers proposed to use a sequence of activity-related contexts to optimize the PDR trajectory. Compared with point-to-point map matching, sequence-to-sequence map matching has a higher possibility of correcting the positioning error. Lu et al.30 proposed a context-recognition-aided PDR localization model to calibrate PDR. The context is detected by employing particular human actions (turning left and right), and it is matched to the context pre-stored offline in the database to obtain the pedestrian’s location. Zhou et al.31 utilized a similar activity landmark-based map matching but with a more complex Hidden Markov Model (HMM), which takes the length and direction of each segment into consideration. In this map matching, turning and taking elevators were considered as activity landmarks. In a newer publication, Zhou et al.32 added taking escalators to the location-related activities to improve the accuracy of map matching. These efforts indicate that location-related activity-aided map matching can be used to improve the positioning accuracy of the PDR method.
By considering the abovementioned limitations and challenges, this paper proposes to improve the available algorithms for detecting location-related activities and further uses the location-related activity information in the context-based map matching for enhancing the positioning accuracy in outdoor and indoor environments. In addition, this paper proposes an auto-calibration Wi-Fi positioning method in which the developed indoor positioning method is used to determine the positions of calibration points automatically when the Wi-Fi fingerprint database is created.
Figure 1 shows an overview of the proposed Lifelog system. The proposed system estimates the pedestrian trajectory based on a PDR method and detects transportation modes and public facility-related activities from smartphone sensor data. The concept of the Lifelog system was introduced in our conference paper.33 In transportation mode detection, this research proposes to adopt the data of accelerometers and magnetometers to distinguish between different modes, including still, walk, run, vehicle (bus or car), and train. In outdoor scenarios, turning, waiting at a traffic light, and passing an exit/entrance are further detected based on the transportation modes. Similar to related work on activity detection in indoor scenarios,32 turning, taking elevators (up/down), and taking escalators (up/down) are considered as location-related activities in this paper. In addition, this paper newly proposes to detect shopping behavior and consider it as a location-related activity in indoor positioning. In summary, seven activities are detected in the indoor scenario: turning, escalator down, escalator up, elevator down, elevator up, walking in a corridor (out of shop), and shopping in a store. The detected location-aware activities and map information are imported into the context-based map matching to optimize the PDR trajectory. In addition, the proposed auto-calibration Wi-Fi positioning method uses the optimized PDR trajectory to determine the positions of calibration points when the Wi-Fi fingerprint database is created. As shown in Figure 1, the parts marked in green denote the activity detection and positioning in the outdoor scenario, and the parts in blue correspond to the activity detection and positioning in the indoor scenario. These two scenarios are explained in Section 2 and Section 3, respectively.
Overview of the proposed Lifelog system
The rest of the paper is organized as follows: Section 2 describes the traffic mode detection and map matching-based outdoor positioning method. Section 3 explains how to detect indoor activities and use the detected activities to improve the PDR trajectory in the indoor scenario. In addition, Section 3 also describes the proposed Wi-Fi auto-calibration method. Section 4 evaluates the proposed systems. Finally, Section 5 concludes this paper.
2 TRAFFIC MODE DETECTION AND CONTEXT-BASED MAP MATCHING FOR OUTDOOR SCENARIO
When we analyze our daily activities, it is easy to find that we usually stay at several indoor places, such as home, an office building, or a shopping mall, and that traffic-related activities happen when moving among these indoor places. In this paper, the traffic-related activities are categorized as the outdoor scenario, whereas human activities in the office building or shopping mall are discussed as the indoor scenario. Accordingly, this paper presents the Lifelog system in terms of outdoor and indoor scenarios.
2.1 Transportation mode detection
The most important traffic-related activity is the transportation mode used for moving between different places. First, the transportation mode is expected to be recorded for personal use and city development. Second, the transportation mode includes location-related information that can also be used to improve positioning accuracy. We propose to detect the traffic mode using the accelerometers and magnetometers of a smartphone. Figure 2 shows the flowchart of transportation mode detection. This paper aims to detect five kinds of traffic mode: walk, run, still, train, and vehicle (car or bus). In this paper, both car and bus are considered as vehicle mode, and “still” refers to the status of a static pedestrian. The proposed transportation mode detection was published in our previous work,34 so this paper briefly explains its main idea. Figure 3 visualizes the magnitude of linear acceleration and the magnetic field in different transportation modes. “Linear acceleration” denotes the acceleration excluding the gravity effect. Different colors correspond to different transportation modes. Different transportation modes usually exhibit different characteristics in the acceleration and magnetic field readings: run mode and walk mode have much larger acceleration than the other modes, and the magnetic field in train mode changes drastically due to the power lines and the electric currents associated with the train’s acceleration. Since the orientation of the smartphone changes when it is placed in a pocket, it is not productive to consider the acceleration and magnetic field values in three separate axes. This paper therefore computes the magnitude of the measurements over all three axes for transportation mode detection.
Flowchart of transportation mode detection
Magnetic field and acceleration value under different transportation modes
In this research, a sliding window method is applied prior to feature extraction. Following the suggestion of related work,5 this paper sets the size of the time window to one second, with 0.9 s of overlap between two consecutive windows. Then, a set of 14 features, including statistical features (eg, mean and variance) and frequency-domain metrics (DFT energy coefficients between 1 and 5 Hz) of the accelerometer and magnetometer data, is calculated for each sliding window. According to the analysis in related work,5 statistical features of the accelerometer data can be used to infer whether an individual is running, and the DFT coefficients help differentiate between the foot-based transportation modes. Inspired by this work,5 this research adopts a similar idea to extract the features. Finally, the detection system uses the 14 features to build a Random Forest classifier that recognizes the five types of transportation modes (walk, run, still, train, and vehicle), as sketched below. Section 4 evaluates the transportation mode detection.
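The following is a minimal sketch of the windowing, feature extraction, and classification pipeline described above, assuming a 10-Hz sampling rate, a 1-s window, and 0.9-s overlap as stated in the text. The exact 14-feature set of the paper is not reproduced; the features below are representative statistical and DFT-energy features, and the function names are illustrative.

```python
# Sketch only: windowing + magnitude features + Random Forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FS = 10              # sampling rate (Hz)
WIN = 1 * FS         # 1-s window
STEP = 1             # 0.9-s overlap -> slide by 0.1 s = 1 sample

def magnitude(xyz):
    """Orientation-independent magnitude of a 3-axis signal, shape (N, 3)."""
    return np.linalg.norm(xyz, axis=1)

def window_features(acc_mag, mag_mag):
    """Statistical and low-frequency DFT-energy features for one window."""
    feats = []
    for sig in (acc_mag, mag_mag):
        feats += [sig.mean(), sig.var(), sig.min(), sig.max()]
        spectrum = np.abs(np.fft.rfft(sig - sig.mean()))
        freqs = np.fft.rfftfreq(len(sig), d=1.0 / FS)
        # energy of the DFT coefficients between 1 and 5 Hz
        feats.append(np.sum(spectrum[(freqs >= 1) & (freqs <= 5)] ** 2))
    return feats

def extract_features(acc_xyz, mag_xyz):
    acc_mag, mag_mag = magnitude(acc_xyz), magnitude(mag_xyz)
    rows = []
    for start in range(0, len(acc_mag) - WIN + 1, STEP):
        rows.append(window_features(acc_mag[start:start + WIN],
                                    mag_mag[start:start + WIN]))
    return np.asarray(rows)

# X_train: features per window, y_train: transportation mode label per window
# clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
# predicted_modes = clf.predict(extract_features(acc_test, mag_test))
```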
2.2 Context-based map matching based on location-related activities
With the development of urban mapping systems, most road networks and public facilities have been included in open map sources for public and research usage. It is possible to apply map matching to achieve meter-level accuracy in the vehicle modes.35 The most challenging case is the non-vehicle case because pedestrian movement is much more flexible than vehicle movement. Thus, we aim to extract more location-related activities when pedestrians move in the non-vehicle modes (run, walk, and still). This paper proposes to use three types of location-related activities to improve the positioning accuracy.
First, waiting at a traffic light can be regarded as remaining in still mode for a period of time, and this still activity can be detected with the developed transportation mode classifier. Second, this paper uses the gyroscope data of a smartphone to recognize turning activity. We first employ the PDR system to detect pedestrian steps and then integrate the angular velocity about the vertical axis between two neighboring steps. If the integrated value is larger than a threshold, a turning activity is detected. The threshold value should be chosen carefully: in an area with right-angled corners, the threshold can be set to 1.5 radians, but when 45-degree corners exist, the threshold should be set to a smaller value to ensure that the turning activity is detected correctly. This threshold could be decided dynamically based on the map information. In order to obtain the vertical axis while the orientation of the smartphone keeps changing, we need to transform the raw measurements from local (smartphone-defined) to global coordinates. Android provides an open Application Programming Interface (API) called getRotationMatrix for this transformation; based on our experience, the accuracy of the provided rotation information is more than 98%. The idea of the transformation is to detect gravity and then align the three acceleration axes with it. A detailed explanation can be found in our previous publication.22
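A minimal sketch of the turning detector follows. It assumes the gyroscope rate has already been rotated into global coordinates (eg, via the rotation matrix from Android's getRotationMatrix API) and that step indices come from the PDR step detector; the function and variable names are illustrative.

```python
# Sketch: integrate the angular rate about the global vertical axis between
# two neighboring detected steps and compare with a map-dependent threshold
# (1.5 rad for right-angled corners, smaller for 45-degree corners).
import numpy as np

def detect_turns(gyro_z_global, timestamps, step_indices, threshold_rad=1.5):
    """Return indices of steps at which a turning activity is detected."""
    turns = []
    for prev_step, next_step in zip(step_indices[:-1], step_indices[1:]):
        dt = np.diff(timestamps[prev_step:next_step + 1])
        # integrated heading change between the two steps (rad)
        heading_change = np.sum(gyro_z_global[prev_step:next_step] * dt)
        if abs(heading_change) > threshold_rad:
            turns.append(next_step)
    return turns
```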
Third, in urban cities, there are many indoor parking lots and subway stations. Based on the authors’ investigation, more than 93% of the exits/entrances of stations on the Tokyo Marunouchi Line are connected with stairs, escalators, or elevators. This paper proposes to use barometer data to detect the height change caused by going up and down stairs, escalators, or elevators in order to detect the passing exit/entrance activity. Based on the international barometric formula,36 the relationship between air pressure and altitude can be simplified as
$$ h = 44330\left(1 - \left(\frac{P}{P_{0}}\right)^{1/5.255}\right) \tag{1} $$
where h is the altitude in meters, P is the measured air pressure, and P0 is the reference (standard sea-level) pressure. This equation is used to approximately convert the measured air pressure to altitude in this research. This research empirically assigns 3.5 m as the threshold for recognizing the altitude difference between the train platform and the exit/entrance.
In addition, when pedestrians pass the exit/entrance and go out of the station, the GPS signal becomes available. Thus, the passing exit/entrance activity can be detected by combining the pressure variation with GPS availability. If the altitude changes by more than 3.5 m after a pedestrian finishes a train transportation mode and the GPS positioning result becomes available, a passing exit activity is detected; it is timestamped at the moment the GPS positioning result becomes available. Conversely, when the GPS positioning result becomes unavailable and the altitude changes by more than 3.5 m before a pedestrian enters a train transportation mode, a passing entrance activity is determined at the moment the GPS positioning result becomes unavailable. In this algorithm, a train transportation mode must be detected to determine the passing entrance/exit activity, which means there is a time delay in recognizing the passing entrance activity. In summary, three activities (waiting at a traffic light, turning, and passing an exit/entrance) are used for the following context-based map matching. A simplified sketch of the exit-detection rule is given below.
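The following is an illustrative sketch of the passing-exit rule under the assumptions above: an exit is declared at the epoch where GPS fixes become available, provided the barometric altitude has changed by more than 3.5 m since the detected train segment ended. The pressure-to-altitude conversion follows the simplified barometric formula in Equation (1); the names and the standard sea-level pressure value are assumptions for illustration.

```python
# Sketch only: passing-exit detection from altitude change + GPS availability.

def pressure_to_altitude(p_hpa, p0_hpa=1013.25):
    """Approximate altitude (m) from air pressure (hPa), Equation (1)."""
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))

def detect_passing_exit(altitudes, gps_available, train_end_idx,
                        min_altitude_change=3.5):
    """Return the epoch index where the passing-exit activity is declared,
    or None. `altitudes` and `gps_available` are per-epoch sequences;
    `train_end_idx` is the epoch at which the train segment ends."""
    ref_alt = altitudes[train_end_idx]
    for k in range(train_end_idx + 1, len(altitudes)):
        if gps_available[k] and abs(altitudes[k] - ref_alt) > min_altitude_change:
            return k   # exit activity timestamped at GPS reacquisition
    return None
```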
Because of the degradation of GNSS performance, this research uses the PDR to track the pedestrian’s walking trajectory and then integrates the raw PDR results, location-related context information, and 2D road maps in an HMM for improving the positioning performance. HMM is a probabilistic sequence model: given a sequence of units (words, letters, morphemes, sentences, etc.), HMM computes a probability distribution over possible sequences of labels and chooses the best label sequence as the result. HMM has been used for map matching in related works.30,32 According to the statement in the related work,30 the output sequences of characteristic contexts satisfy the Markov property. This paper uses HMM to match the location-related activities with the 2D road map. In the outdoor environment, turning, passing exit/entrance, and waiting at the traffic light are considered as location-related activities, and the positions of a corner, exit/entrance, and traffic light exist on the 2D road map.
The left image of Figure 4 gives an example of a pedestrian walking trajectory from the exit of a station, and the right part of Figure 4 shows the structure of the HMM-based map matching model for this example case. In this example, the pedestrian first goes out from the exit of the subway station, then turns left twice without waiting, and arrives at the destination. This research assumes that the person keeps walking along a straight line from the station to the corner and between corners. The basic idea of the HMM-based map matching is to find a route on the map in which the detected activity sequence is most likely to occur. Following HMM conventions, the activity sequence is denoted as the observation O, and the location sequence, or trajectory, is defined as S. The HMM computes a probability distribution over possible sequences of location nodes on the map and chooses the best-matched location sequence S as the result by
HMM model for map matching using location-related activities
$$ \hat{S} = \arg\max_{S}\, p\left(S \mid O, \lambda\right) \tag{2} $$
where $\lambda$ denotes the parameters of the HMM, and $a_{i,j}$ and $b_{i,j,k}$ are the transition probability and the emission probability, respectively. As shown on the right side of Figure 4, the HMM model includes three parts: initial state, hidden states, and observation.
2.2.1 Hidden states
For an area of the map with N corners, M exits/entrances, and Q traffic lights, the hidden states are denoted as
$$ S = \left\{ s_{C_1}, \ldots, s_{C_N},\; s_{E_1}, \ldots, s_{E_M},\; s_{T_1}, \ldots, s_{T_Q} \right\} \tag{3} $$
where the subscripts C, E, and T denote corners, exits/entrances, and traffic lights, respectively.
2.2.2 Initial state
For the abovementioned area, the initial state depends on the type of the first detected activity.
We use the method described in the second paragraph of this subsection (Section 2.2) to detect the activities. Map matching can be performed once the first relevant activity is detected. Epochs recorded before the first detected activity are not used for map matching; only the data between the first and the last detected activities are used.
2.2.3 Transition probability
Let A denote the set of transition probabilities between any two hidden states in the HMM, where $A = \{a_{i,j}\}$, $a_{i,j} = p(s_{j,k} \mid s_{i,k-1})$, $1 \le i, j \le (N+M+Q)$; that is, $a_{i,j}$ is the probability of the hidden state $s_{j,k}$ at the k-th epoch given the hidden state $s_{i,k-1}$ at the previous (k-1)-th epoch. The transition probability is defined according to the simple idea that the pedestrian can only move between adjacent contexts. Therefore, if $s_{j,k}$ and $s_{i,k-1}$ are not connected on the map, the value of $a_{i,j}$ is zero. Otherwise, the transition probability follows an equal distribution: for example, if state $s_{i,k-1}$ has three possible adjacent contexts, each transition probability from state $s_{i,k-1}$ to a connected context is defined as 1/3. A minimal sketch of this construction is given below.
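The sketch below builds such a transition matrix from a map adjacency list, splitting the probability mass equally among connected contexts and leaving zero for unconnected pairs; the data structures and names are illustrative assumptions.

```python
# Sketch: equal-distribution transition probabilities over map adjacency.
import numpy as np

def build_transition_matrix(node_ids, adjacency):
    """`adjacency` maps each node id to the list of node ids connected to it
    on the 2D road map; unconnected pairs keep probability zero."""
    n = len(node_ids)
    index = {node: i for i, node in enumerate(node_ids)}
    A = np.zeros((n, n))
    for node, neighbors in adjacency.items():
        if not neighbors:
            continue
        p = 1.0 / len(neighbors)            # equal distribution
        for nb in neighbors:
            A[index[node], index[nb]] = p
    return A
```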
2.2.4 Observations
The observation contains three parts: the distance a user walks between two subsequent activities, the walking direction between two subsequent activities, and the activity type. If it is the first detected context, only the activity type is taken into consideration. The observation at epoch k is therefore written as
$$ o_{k} = \left( d_{k},\; \theta_{k},\; ac_{k} \right) \tag{4} $$

where $d_k$, $\theta_k$, and $ac_k$ denote the walked distance, the walking direction, and the activity type at epoch k, respectively.
2.2.5 Emission probability
The emission probability is defined as $B = \{b_{i,j,k}\}$, $b_{i,j,k} = p(o_k \mid s_{i,k-1}, s_{j,k})$, which is the probability of observing $o_k$ given the hidden states $s_{i,k-1}$ and $s_{j,k}$. Here, $s_{j,k}$ is the hidden state chosen for observation $o_k$, and $s_{i,k-1}$ is the hidden state selected for the previous observation $o_{k-1}$; when the HMM is at state $s_{j,k}$, the previous state $s_{i,k-1}$ has already been selected. Unlike the usual HMM, whose emission probability depends only on the state $s_{j,k}$, the proposed HMM lets the emission probability depend on both the current and the previous hidden states. In this way, we can compare the distance difference $d_{i,j,k}$ and angle difference $\theta_{i,j,k}$ with each observation, which also contains distance and direction information:
$$ d_{i,j,k} = \left|\, \left\| \overline{pdr_{k-1}\,pdr_{k}} \right\| - \left\| \overline{l_{i,k-1}\,l_{j,k}} \right\| \,\right| \tag{5} $$

$$ \theta_{i,j,k} = \left| \angle\, \overline{pdr_{k-1}\,pdr_{k}} - \angle\, \overline{l_{i,k-1}\,l_{j,k}} \right| \tag{6} $$

Suppose that $o_{k-1}$ and $o_k$ are detected at times k-1 and k, and that $pdr_{k-1}$ and $pdr_k$ are the corresponding points on the PDR trajectory. $\overline{pdr_{k-1}\,pdr_{k}}$ denotes the straight line connecting the two points $pdr_{k-1}$ and $pdr_k$, and $\overline{l_{i,k-1}\,l_{j,k}}$ denotes the straight line connecting the two map locations corresponding to $s_{i,k-1}$ and $s_{j,k}$. This paper uses zero-mean Gaussian distributions to model the emission probabilities of the distance difference and the angle difference for the given hidden states:
$$ p\left(d_{i,j,k} \mid s_{i,k-1}, s_{j,k}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_{d}} \exp\!\left( -\frac{d_{i,j,k}^{2}}{2\sigma_{d}^{2}} \right) \tag{7} $$

$$ p\left(\theta_{i,j,k} \mid s_{i,k-1}, s_{j,k}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_{\theta}} \exp\!\left( -\frac{\theta_{i,j,k}^{2}}{2\sigma_{\theta}^{2}} \right) \tag{8} $$
where $\sigma_d$ and $\sigma_\theta$ are the standard deviations of the measured distance and angle, respectively. We empirically set these constants to 2 m and 20 degrees in this research. In addition, the probability of the detected activity for a given hidden state, $p(ac_k \mid s_{j,k})$, can be obtained from the activity detection confusion matrix, where $ac_k$ denotes the activity type at epoch k. Finally, the emission probability is summarized as
$$ b_{i,j,k} = p\left(d_{i,j,k} \mid s_{i,k-1}, s_{j,k}\right) \cdot p\left(\theta_{i,j,k} \mid s_{i,k-1}, s_{j,k}\right) \cdot p\left(ac_{k} \mid s_{j,k}\right) \tag{9} $$
Then, we can solve the HMM matching problem using the Viterbi algorithm.32
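A generic Viterbi sketch is given below for orientation; it is not the paper's implementation. It assumes a transition matrix A built as in Section 2.2.3 and an `emission(i, j, k)` callable that evaluates Equation (9) for the state pair (i, j) at observation k. Because the emission here depends on the previous state, the recursion scores state pairs, which is a simplification of the full model (eg, the first observation is handled only through the initial distribution).

```python
# Sketch: Viterbi decoding of the most likely node sequence on the map.
import numpy as np

def viterbi(n_states, n_obs, pi, A, emission):
    delta = np.full((n_obs, n_states), -np.inf)   # log-likelihood table
    psi = np.zeros((n_obs, n_states), dtype=int)  # backpointers
    delta[0] = np.log(pi + 1e-12)                 # initial state distribution
    for k in range(1, n_obs):
        for j in range(n_states):
            scores = [delta[k - 1, i] + np.log(A[i, j] + 1e-12)
                      + np.log(emission(i, j, k) + 1e-12)
                      for i in range(n_states)]
            psi[k, j] = int(np.argmax(scores))
            delta[k, j] = scores[psi[k, j]]
    # backtrack the most likely hidden-state (map node) sequence
    path = [int(np.argmax(delta[-1]))]
    for k in range(n_obs - 1, 0, -1):
        path.append(psi[k, path[-1]])
    return path[::-1]
```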
3 LOCATION-RELATED ACTIVITY DETECTION AND POSITIONING IN INDOOR SCENARIO
A large-scale building, such as a shopping mall, usually has multiple floors; thus, indoor positioning needs to provide both the 2D position on each floor and the correct floor level. The proposed indoor positioning method utilizes the vertical displacement activities to estimate the floor level and then performs positioning on each floor based on the context-based map matching. The initial idea of the indoor positioning was published in our previous conference paper.37
3.1 Vertical displacement activities and floor level estimation
Usually, seven vertical displacement activities happen in the indoor scenario: staircase down, staircase up, escalator down, escalator up, elevator down, elevator up, and same floor. In large-scale buildings such as shopping malls, customers usually use escalators and elevators instead of staircases, so this paper limits the vertical displacement activities to five types: escalator down, escalator up, elevator down, elevator up, and same floor. The barometer has shown more robust performance than accelerometers in the detection of floor changes.38 However, the absolute pressure value varies with altitude, weather, humidity, and other factors, and hence it is not reliable to estimate the floor level from the pressure value directly.39 This paper therefore proposes to recognize the vertical displacement activities using the pressure data from the barometer and to count the change of the floor level with the aid of the accelerometers: the accelerometers are used to detect how many times the customer passes a landing area, which in turn counts the floor changes.
Figure 5 shows the pressure and acceleration change when a customer moves between the first floor and the fifth floor. In this example, T1 to T9 denote different time periods. The customer first takes an escalator up to the second floor (T1) and looks around for a while (T2). Then, he or she takes escalators up to the fourth floor (T3, T4, and T5). After walking for a long time on the fourth floor (T6), the customer takes an escalator up again (T7). Finally, after looking around on the fifth floor (T8), the customer takes the elevator down to the first floor (T9).
Change of acceleration magnitude and pressure in different vertical displacement activities
The visualized data in Figure 5 indicate the following four points. First, the pressure changes dramatically in elevator- and escalator-related activities compared with movement on the same floor. Second, the pressure changes faster in the elevator activity than in the escalator activity because the elevator moves faster. Third, when customers use escalators to ascend multiple floors, they have to walk across the landing area between two escalator segments, as shown in the picture in Figure 5; the acceleration changes dramatically in this time period (eg, period T4). Fourth, the relative pressure change is consistent: the customer ascended four floors by escalator and descended four floors by elevator, and the pressure change is the same in the ascending and descending processes.
Based on the abovementioned points, this paper proposes to integrate the pressure data and acceleration to estimate the floor level. Figure 6 shows the flowchart of the proposed method. First, the proposed system detects the vertical displacement activity using the pressure data from the barometer. Recently, the Long Short-Term Memory (LSTM) network and its variants have shown good performance in various sequence modeling tasks, and LSTM deep neural networks have been widely used for human activity recognition.40-42 An LSTM layer is a recurrent neural network (RNN) layer that handles time and data series in the network. The greatest advantage of RNNs is their capability to take contextual information into consideration when mapping between input and output sequences through hidden-layer units.40 LSTM can automatically extract useful features and model inexplicit criteria. This paper adopts the LSTM network to build a classifier that recognizes the different vertical displacement activities. When we estimate the activity for time t, an 8-s window of pressure data before time t is used as the input of the displacement activity recognition network. The output includes five statuses: escalator up, escalator down, elevator up, elevator down, and same floor.
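A minimal Keras sketch of such an LSTM classifier is shown below, assuming an 8-s pressure window at 10 Hz (80 samples) and five output classes as described above. The layer sizes and training settings are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: LSTM classifier for vertical displacement activities.
from tensorflow.keras import layers, models

WINDOW = 80    # 8 s x 10 Hz pressure samples
N_CLASSES = 5  # escalator up/down, elevator up/down, same floor

model = models.Sequential([
    layers.Input(shape=(WINDOW, 1)),          # univariate pressure sequence
    layers.LSTM(64),                          # sequence -> fixed-size feature
    layers.Dense(32, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X: (num_windows, 80, 1) pressure windows, y: integer activity labels
# model.fit(X_train, y_train, epochs=30, batch_size=64,
#           validation_data=(X_val, y_val))
```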
Flowchart of proposed floor level estimation method
Second, in order to detect walking at the landing area, the proposed system checks the vertical acceleration data within the time period of the escalator activity. A simple thresholding method is implemented for finding walking activities: if the acceleration variance within 1 s is larger than a threshold (set to 1 in this research) and this lasts for more than 4 s, passing the landing area is detected. When walking activity is detected, a continuous escalator activity is divided into several segments, as indicated in the T3 and T5 areas in Figure 5. Thus, the change of the floor level can be calculated by counting the number of segmented escalator activities, as sketched below.
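The sketch below implements the landing-area test described above under the stated thresholds (variance over 1-s windows greater than 1, sustained for at least 4 s); names are illustrative.

```python
# Sketch: detect a walk across the landing area within an escalator segment.
import numpy as np

FS = 10  # Hz

def detect_landing_walk(acc_vertical, var_threshold=1.0, min_duration_s=4):
    """Return True if a landing-area walk occurs inside an escalator segment."""
    win = FS                       # 1-s windows
    above = []
    for start in range(0, len(acc_vertical) - win + 1, win):
        above.append(np.var(acc_vertical[start:start + win]) > var_threshold)
    # length of the longest run of consecutive high-variance windows (1 s each)
    longest, run = 0, 0
    for flag in above:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return longest >= min_duration_s
```

When this test fires, the escalator segment is split at the detected landing walk, and the floor change equals the number of resulting escalator sub-segments.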
Third, this paper focuses on the floor level estimation based on escalator activities because most customers use escalators in large shopping mall environments. However, it is feasible to develop the floor level estimation algorithm based on elevator activities, when the relationships between the pressure data and the floor level are provided.
3.2 Shopping activity detection and positioning on floor in indoor scenario
Recently, indoor floor maps of some large-scale buildings became available for public research use. In a building floor map, the locations of facilities such as corners, escalators, and elevators are good reference information for map matching. The escalator and elevator activities can be detected when vertical displacement happens, and turning activity can be detected from the PDR. In addition to these facilities, the area of each shop is also included in the building floor map. On the other hand, customer behavior shows different patterns while shopping compared with normal walking in corridors.
Figure 7 shows the difference in acceleration between the two activities: moving in a shop and walking in corridors. When customers walk in corridors, they step regularly. In contrast, customers often move irregularly and slowly when they stay in a shop area because they carefully look for products or walk around without purpose. Following the idea of classifying the different vertical displacement activities, this paper proposes another LSTM-based classifier to distinguish shopping behavior from normal walking in corridors. The developed classifier uses 8 s of accelerometer and gyroscope data as the input and outputs a walking-in-corridor or shopping label for every epoch. The sampling rate of the accelerometer and gyroscope data is 10 Hz.
Acceleration in the case of shopping and walking in corridors of shopping mall
Similar to the outdoor scenario, this paper also adopts location-related context-based map matching for positioning on the floor in the indoor scenario. The flowchart of the context-based map matching in the indoor scenario is shown in Figure 8. In order to perform the map matching in the indoor scenario, a floor map is needed. With the development of mapping technologies, indoor floor maps have become available for public usage; for example, Google Maps provides the floor layouts of some buildings in urban cities, including the area of each store and the positions of escalators and elevators. In this research, we build a link-node map by referring to the Google map and manually extracting the position information. The escalators, elevators, stores, corners, and corridors are defined as nodes with position and semantic information, as illustrated below. The research uses PDR to track the customer’s walking trajectory on each floor and then integrates the raw PDR results, location-related activity information, and floor maps in an HMM for improving the positioning performance.
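The snippet below is an illustrative representation of such a link-node floor map: each node carries a position and a semantic type, and its links feed the HMM transition matrix while its type is compared against the detected location-related activity. The node ids and coordinates are made-up examples, not data from the paper.

```python
# Sketch: link-node floor map with position and semantic information.
from dataclasses import dataclass, field

@dataclass
class MapNode:
    node_id: str
    x: float                 # floor-local coordinates in meters
    y: float
    kind: str                # "escalator" | "elevator" | "store" | "corner" | "corridor"
    links: list = field(default_factory=list)   # ids of connected nodes

floor_map = {
    "c1": MapNode("c1", 0.0, 0.0, "corner", ["e1", "s1"]),
    "e1": MapNode("e1", 12.0, 0.0, "escalator", ["c1"]),
    "s1": MapNode("s1", 0.0, 8.0, "store", ["c1"]),
}
# `links` defines the adjacency used for transition probabilities, and `kind`
# is matched against the detected activity (eg, shopping -> store nodes).
```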
Flowchart of context-based map matching for indoor scenario
3.3 Auto-calibration for Wi-Fi fingerprint in indoor scenarios
Our experience with indoor positioning indicates that the Wi-Fi-based positioning service provided by Google cannot yield a satisfying position estimation in indoor scenarios. One possible reason is that Google adopts a crowdsourcing approach to collect the Wi-Fi fingerprint. Android Location Services periodically checks the user’s location using GPS, Cell Identity (Cell-ID), and Wi-Fi. At the same time, the Android phone sends back the Service Set Identifier and Media Access Control data of publicly broadcast Wi-Fi access points. Google updates the fingerprint if the GPS accuracy is good. However, since GPS is unavailable or has poor positioning accuracy inside buildings, it is in fact hard to use crowdsourcing techniques to collect an accurate indoor Wi-Fi fingerprint.43 Cell-ID is the basic positioning method utilizing the cellular network’s knowledge about the serving cell of the user equipment, and the position of the user equipment is represented by the serving Cell-ID. Cell-ID accuracy depends on the inter-site distance of the network deployment, and the positioning accuracy is in the range of hundreds of meters.44 Obviously, Cell-ID-based positioning results cannot be used to collect the Wi-Fi fingerprint because of their low accuracy.
To enhance the performance of the Wi-Fi fingerprint, this research proposes an auto-calibration method for constructing the Wi-Fi fingerprint radio map. The proposed context-based map matching method estimates an accurate trajectory when customers walk in a building; then, pairs of customer position coordinates and fingerprints are constructed automatically and added to the Wi-Fi fingerprint database. Each position on the customer walking trajectory can serve as a calibration point in the Wi-Fi fingerprint radio map. This idea can be combined with existing Wi-Fi fingerprint positioning methods, which require manual labeling of the positions of calibration points. In this paper, we use a Deep Neural Network (DNN)-based Wi-Fi fingerprint method45 to illustrate the effectiveness of our proposed method. According to related works,45,46 conventional Wi-Fi positioning solutions are time-consuming because of parameter tuning, and machine learning approaches are an alternative solution that requires less tuning.45 The reported results suggest that the DNN approach can achieve results comparable to the traditional approaches.45,46 Therefore, the DNN-based Wi-Fi method is chosen as an example to explain how to apply the auto-calibration idea to Wi-Fi fingerprinting; the idea can be combined with other Wi-Fi fingerprinting methods as well.
Figure 9 shows the structure of the DNN applied to Wi-Fi positioning in this research, where HL stands for hidden layer. The DNN includes three parts: encoder, decoder, and classifier. The encoder and decoder structures together form a stacked autoencoder and are trained together to obtain the encoder. When the learning of the weights of the stacked autoencoder is finished, the decoder part of the network is disconnected, a typical classifier is connected to the output of the encoder, and the encoder and classifier are trained again. The stacked autoencoder, including encoder and decoder, is used to reduce the dimensionality of the input data and to denoise it by learning a reduced representation of the original data. Because the dimensionality of the layer between encoder and decoder is smaller than the size of the input vector, the network has to learn a reduced representation of the information provided at the input.45 The classifier used in this research is also a neural network, consisting of a hidden layer and a softmax layer.
Frameworks of DNN-based Wi-Fi fingerprint. “HL” stands for hidden layer. The HL (256) means the hidden layer has 256 neurons
The implemented Wi-Fi fingerprint method includes two phases. In the offline phase, the system obtains the positions of calibration points from the positioning results of the context-based map matching and stores the positions and corresponding fingerprints in the database. To obtain the autoencoder, the input and output of the autoencoder are set to be equal, and the encoder and decoder are trained using the received signal strength (RSSI) of the Wi-Fi scans. Next, the pre-trained encoder part is connected to a classifier to distinguish the different positions based on the received Wi-Fi scans; the classifier is trained using both the RSSI and the positions of the calibration points. During the online phase, the system predicts the position using the trained encoder and classifier, as sketched below.
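The following Keras sketch mirrors the two-phase training described above: a stacked autoencoder is first trained to reconstruct the RSSI vector, then the decoder is discarded and a softmax classifier over calibration points is attached to the encoder. The input dimension, number of calibration points, and most layer widths are assumptions; only the 256-unit hidden layer is mentioned in the Figure 9 caption.

```python
# Sketch: stacked autoencoder + classifier for Wi-Fi fingerprint positioning.
from tensorflow.keras import layers, models

N_APS = 200          # length of the RSSI vector (observed APs), assumed
N_POINTS = 100       # number of calibration points, assumed

# Phase 1: stacked autoencoder trained with RSSI as both input and target
inputs = layers.Input(shape=(N_APS,))
encoded = layers.Dense(256, activation="relu")(inputs)
encoded = layers.Dense(64, activation="relu")(encoded)      # bottleneck
decoded = layers.Dense(256, activation="relu")(encoded)
decoded = layers.Dense(N_APS, activation="linear")(decoded)
autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(rssi_train, rssi_train, epochs=50, batch_size=32)

# Phase 2: drop the decoder, attach a classifier over calibration points
clf_out = layers.Dense(128, activation="relu")(encoded)
clf_out = layers.Dense(N_POINTS, activation="softmax")(clf_out)
fingerprint_net = models.Model(inputs, clf_out)              # shares the encoder
fingerprint_net.compile(optimizer="adam",
                        loss="sparse_categorical_crossentropy",
                        metrics=["accuracy"])
# fingerprint_net.fit(rssi_train, point_labels, epochs=50, batch_size=32)
# Online phase: predicted point = argmax of fingerprint_net.predict(rssi_scan)
```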
4 EXPERIMENTAL RESULTS
To evaluate the performance of the proposed methods, this research builds datasets and performs the test in different scenarios. First, this section shows the evaluations for the transportation mode detection algorithm (Section 4.1) and the context-based map matching in the outdoor scenario (Section 4.2). Second, this section will present the results of the vertical displacement activity detection and floor level estimation in the indoor scenario (Section 4.3). Third, the shopping activity detection and positioning on the floor (Section 4.4) are evaluated. Finally, the performance of the proposed auto-calibration Wi-Fi positioning (Section 4.5) is demonstrated.
4.1 Transportation mode detection results
In order to evaluate the proposed transportation mode detection algorithm, six people participated in our experiments. The data used in this subsection were collected from the accelerometers and magnetometers of the smartphone, both sampled at 10 Hz. The participants put the smartphone into their pants pocket during the experiments. Our dataset contains 13.4 h of data, including 2.89 h of train data, 4.34 h of bus data, 50 min of car data, 2.35 h of still data, 2.03 h of run data, and 53 min of walk data. A 10-fold cross-validation method is used to evaluate the proposed transportation mode detection method. Because the data collected for the categories are imbalanced, we first resample the dataset in the training process by randomly adding copies of instances from the under-represented classes so that the samples in the different categories are balanced (see the sketch below). Table 1 shows the confusion matrix of transportation mode detection; the first column gives the true labels, and the other columns correspond to the predicted transportation modes. The detection accuracy shows that run mode and walk mode can be detected accurately because both modes experience fast acceleration changes, with acceleration changing more drastically in run mode. Train activity can also be detected at a high rate due to the drastic change in the magnetic field. We achieve a 93.1% recall rate on our dataset.
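The sketch below illustrates the random oversampling step described above (duplicating samples of under-represented classes up to the size of the largest class) followed by 10-fold cross-validation; the paper does not specify a particular library, so scikit-learn is used here only as an example.

```python
# Sketch: random oversampling of minority classes + 10-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def random_oversample(X, y, rng=np.random.default_rng(0)):
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        cls_idx = np.where(y == c)[0]
        extra = rng.choice(cls_idx, size=target - n, replace=True)
        idx.extend(cls_idx)
        idx.extend(extra)
    idx = np.asarray(idx)
    return X[idx], y[idx]

# X_bal, y_bal = random_oversample(X_windows, y_modes)
# scores = cross_val_score(RandomForestClassifier(n_estimators=100),
#                          X_bal, y_bal, cv=10)
```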
Confusion matrix of transportation mode detection
4.2 Evaluation for context-based map matching in outdoor scenario
In the experiment for the proposed context-based map matching method, one tester puts the smartphone in the pocket, takes a subway to Ginza station, comes up to the ground level from an exit, and walks a 460-m trajectory. Figure 10 visualizes the data from the different sensors from the time the pedestrian approaches the exit. After the location-related activities are detected, we apply the context-based map matching to correct the PDR trajectory and realize positioning in the outdoor environment. The positioning results of GPS, PDR, and the proposed context-based map matching are plotted as dots of different colors in Figure 11, and the ground truth is visualized by the yellow line. In the experiment, we record the start and end time of each activity. The start and end positions (key points) of each activity are manually labeled by referring to a product-level satellite image with 25 cm/pixel resolution, in which each pixel has a global position. In the experiment, the participant walks along a straight line between two corners and keeps a constant walking speed. We use interpolation to determine the remaining ground-truth positions based on the manually labeled key points.
Visualization of data from different sensors during outdoor experiment
Positioning results and ground truth in the experiment performed in Ginza area
The performance of different positioning methods is summarized in Table 2. The positioning error is defined as follows:
Positioning results with different positioning methods for outdoor environment
$$ \text{error} = \frac{1}{N} \sum_{i=1}^{N} \left\| p_{e_i} - p_{g_i} \right\| \tag{10} $$
For the PDR and context-based map matching methods, $p_{e_i}$ is the estimated position at step i, $p_{g_i}$ is the ground truth at step i, and N is the number of steps. For the GPS positioning method, $p_{e_i}$ is the estimated position at time i, $p_{g_i}$ is the ground truth at time i, and N is the number of GPS points. In this research, we assume that the pedestrian walks with a constant velocity in every walking segment and determine the position at each second or step by interpolation. The Ginza area is a typical urban area of Tokyo with many tall buildings. The GPS positioning result has a 33.76-m positioning error. PDR provides a smoother positioning trajectory than GPS, but it still has a 27.7-m error because of the error accumulation problem. With the aid of context-based map matching, the positioning error is reduced to 3.05 m. The proposed method clearly outperforms the GPS and PDR positioning methods.
4.3 Vertical displacement activity detection results
In order to obtain an accurate classifier for vertical displacement activity recognition, our research group built a training database that contains escalator down, escalator up, elevator down, elevator up, and same-floor activities. Table 3 shows the time length of each category of data and the number of samples used in training. The data of the different activities are collected in our office building and in a large multi-functional building, the Marunouchi Building in Tokyo, respectively. The barometer and accelerometer data are recorded by a smartphone in the tester’s pocket; the recorded barometer data is the ambient air pressure, and the recorded accelerometer data is the acceleration excluding gravity. The sampling rate of the barometer data is 10 Hz. The ground truth is manually recorded in the experiments. In the training process, we first resample the dataset by randomly adding copies of instances from the under-represented classes to balance the samples in the different categories. In addition, we use the same process to build a test database for evaluating the performance of our LSTM-based vertical displacement activity recognition model. The accuracy of the proposed model is shown in Table 4, which indicates that the proposed deep learning model can recognize the different vertical displacement activities accurately.
Time length and the number of samples for each type of vertical activity in training dataset
Confusion matrix of vertical displacement activity recognition
In addition, an evaluation of the floor level estimation is also conducted. The tester moves among different floors of a tall building and collects test data for the evaluation. The error rate of the floor level estimation is defined as:
$$ \text{error rate} = \frac{\text{number of epochs with an incorrectly estimated floor level}}{\text{total number of epochs}} \times 100\% \tag{11} $$
Table 5 shows the results of three tests. The proposed algorithm works well in determining the floor level by combining acceleration with pressure data; on average, the proposed method achieves 95% accuracy, and the 5% error happens during the transition periods between floors. One thing to point out is that when an elevator activity is detected, we refer to the pressure change per floor observed during escalator activities to roughly estimate the floor change during the elevator activity.
Floor level estimation results in three tests
4.4 Evaluation for shopping activity detection and positioning on floor
In order to develop the classifier that distinguishes shopping from normal walking, we also built a training dataset, which includes 309 s of sensor data recorded during shopping and 313 s of sensor data recorded during normal walking. Table 6 shows the results of shopping activity recognition. The error rate of the shopping activity recognition is calculated based on
Shopping activity recognition results
$$ \text{error rate} = \frac{\text{number of misclassified epochs}}{\text{total number of epochs}} \times 100\% \tag{12} $$
On average, the proposed method achieves 97.2% accuracy in shopping activity recognition. There are two reasons for the 2.8% error. First, it is difficult to classify the activity correctly at the moment the customer is entering or exiting the shop. Second, we use 8-s windows to recognize the activity, so it is difficult to achieve 1- or 2-s resolution in the recognition result. In addition, there is a time delay in the activity detection. This research focuses on the Lifelog application, in which it is important to record which stores the user has visited; using the shopping activity in the context-based map matching improves the positioning accuracy enough to correctly indicate the visited stores. For other navigation applications, this time delay becomes a more serious problem, but using a shorter window length would decrease the accuracy of the activity detection. In the future, we will address this time delay problem.
For the evaluation of indoor positioning, the tester also conducts experiments in the Marunouchi Building because the floor maps of this building are available. In the experiment, the tester walks on different floors. In order to demonstrate the effectiveness of the proposed context-based map matching method, Figure 12 shows the positioning results of both the PDR and the context-based map matching methods. The ground truth of the walking trajectory is indicated by the red line in the first column of Figure 12, and the positioning results of the raw PDR are shown by blue points in the second column. In addition, to demonstrate the significance of the shopping activity for improving positioning accuracy, this paper compares two context-based map matching methods. The first method uses five activities: turning, escalator up, escalator down, elevator up, and elevator down. In this case, shopping in a store and walking in a corridor are not distinguished, and we have to make a unified assumption: the customer is assumed to move along corridors, and the PDR trajectory is matched with the corridor paths only. The second method uses seven activities: turning, escalator up, escalator down, elevator up, elevator down, shopping in store, and walking in corridor.
Ground truth of walking trajectory in test and positioning results from different methods for indoor positioning
The two kinds of results are shown in the third and fourth columns of Figure 12, and the positioning errors of the different methods are summarized in Table 7. The experimental results in Figure 12 and Table 7 demonstrate that the context-based map matching methods perform much better than the original PDR positioning method. In addition, the context-based map matching with shopping activity performs better than the HMM that does not consider shopping activity: the customer can be correctly located in the shop because the shopping activity constrains the customer trajectory to the shop area within the HMM. The typical shopping behavior performed by customers in stores shows irregular steps and walking directions; thus, the PDR positioning results in stores present complicated trajectories that include many turns, as shown in Figure 12b. In fact, the activity detection module recognizes the start and end times of the shopping-in-store behavior and simultaneously divides the PDR trajectory into an in-store part and a corridor part. When the PDR trajectory is optimized by the context-based map matching with the aid of the shopping behavior, the complicated part of the PDR trajectory is matched with the shop nodes in the map. In this way, we mitigate the interference of the complicated trajectories when the whole PDR trajectory is matched with the floor map. On average, the proposed context-based map matching has about 2.2 m of positioning error in the tests, as shown in Table 7. In Lifelog, it is important to record which stores the user has visited; as shown in Figure 12, the proposed method correctly identifies the stores visited by the customers. The information on store visits is also valuable for shop owners in improving marketing.
Positioning error with different techniques in indoor environment
4.5 Evaluation for Wi-Fi fingerprint auto-calibration
In the last part, we evaluate the performance of the auto-calibration Wi-Fi fingerprint. In this experiment, a tester takes the smartphone and walks along the same route four times on one floor of the Marunouchi Building. The Wi-Fi signals collected during three of the walks are used to build the Wi-Fi fingerprint database and to train the Wi-Fi fingerprint autoencoder and classifier; the position of each Wi-Fi scan is estimated by the context-based map matching method. The remaining walk is used to test the accuracy of the proposed Wi-Fi positioning method. The positioning results from Google Wi-Fi and the proposed Wi-Fi fingerprint method are plotted in Figure 13 as blue and red points, respectively. The quantitative evaluation of the positioning results is shown in Table 8. The proposed Wi-Fi fingerprint method outperforms Google Wi-Fi and achieves 3.24 m of positioning error, which shows that accurate positions of the calibration points are beneficial for improving the performance of the Wi-Fi fingerprint.
Ground truth of walking in test and Wi-Fi positioning results
Positioning result with different Wi-Fi positioning methods
The disadvantages of fingerprinting technology are the database generation and maintenance requirements. The conventional approach is to create the database through a manual survey, and when the environment changes significantly (such as after a building renovation or furniture rearrangement), the database has to be rebuilt.47 These disadvantages become more serious when Wi-Fi fingerprinting technology is deployed over a wide area. Our proposed method determines the positions of the calibration points automatically when creating the database, overcoming the disadvantages of the manual survey and thereby improving the scalability of Wi-Fi fingerprinting technology.
5 CONCLUSION
This paper presents a context-based map matching method to improve the performance of positioning technologies for the Lifelog application. First, the developed system detects various location-related activities based on the multiple sensors of a smartphone. The experimental results indicate that more than 93% accuracy is achieved in the various activity detection tasks, including transportation mode, vertical displacement activity, and shopping activity detection. Second, the detected activities are integrated with the PDR positioning trajectory and map information in a context-based map matching framework, achieving 3- and 2.2-m positioning accuracy in the outdoor and indoor evaluations, respectively. Moreover, 3.2-m positioning accuracy is reported in the evaluation of the proposed auto-calibration Wi-Fi fingerprinting. In this research, the participants always put the smartphone in their pants pocket during the experiments; however, a user may sometimes hold the smartphone while walking in a real situation. In the future, we aim to optimize the activity detection algorithm to make it work in different scenarios, such as a smartphone in a bag or held in a hand. In addition, the positioning result of the context-based map matching will be integrated with the positioning result of the auto-calibration Wi-Fi fingerprinting to further improve the positioning accuracy and to enable iterative auto-calibration of the Wi-Fi fingerprint.