

Research Article
Combined GNSS/SLAM-Based High Accuracy Indoor Dynamic Target Localization Method

Xiaona Zhang,1,2 Shufang Zhang,1 Qiaosong Li,3 Xin Wang,4 and Shujing Sun5

1 Information Science and Technology College, Dalian Maritime University, Dalian 116026, China
2 School of Media, Tonghua Normal University, Tonghua, Jilin 134002, China
3 State Grid Jilin Power Supply Company, Jilin 132000, China
4 School of Business Administration, Wonkwang University, Iksan 570749, Republic of Korea
5 Communication Engineering Construction and Maintenance of Wired Communication Department, Air Force Communication NCO Academy, Dalian 116026, China

Correspondence should be addressed to Xiaona Zhang; [email protected]

Received 29 April 2021; Accepted 15 June 2021; Published 28 June 2021

Academic Editor: Wei Wang

Copyright © 2021 Xiaona Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Because indoor dynamic target localization fails to detect and repair cycle slips in time, position and attitude errors are large and velocity stability is low; a combined Global Navigation Satellite System/Simultaneous Localization and Mapping- (GNSS/SLAM-) based high-precision indoor dynamic target localization method is therefore proposed. The method uses Empirical Mode Decomposition (EMD) to decompose the noisy signal, identifies the decomposed components in which noise energy dominates, takes the useful-signal energy as the main dividing point, removes the high-frequency signal, and reconstructs the low-frequency components to realize low-pass filtering and denoising; a suitable threshold processing function is selected so that the high-frequency components retain the detailed signal effectively, realizing high-frequency denoising; and cycle slips in the observed data are detected and repaired on each frequency. The indoor dynamic target positioning method is established by combining GNSS/SLAM to achieve high-accuracy target positioning. The experimental results show that the position and attitude errors are small and the velocity is stable, which indicates that the position information is closer to the dynamic target, i.e., the target positioning is more accurate. To address the problems of scale drift and frequent initialization due to environmental factors in monocular visual SLAM, an Ultra-Wideband (UWB)/vision fusion localization model that takes the scale factor into account is proposed, which can give full play to the complementary characteristics of UWB and vision; solve the problems of visual initialization, scale ambiguity, and absolute spatial reference; and improve UWB localization accuracy and frequency while reducing the number of base stations. The model reliably achieves 0.2 m accuracy in indoor environments with sparse texture or frequent light changes.

1. Introduction

In recent years, with the growing market for public-facing location services, it has become extremely important to research efficient, accurate, and low-consumption indoor mapping technology, so SLAM technology has gradually become a hot topic of research and development in the field of robotics. The SLAM problem refers to placing a robot in an unknown environment, where the robot incrementally creates a continuous map of the unknown environment while determining its position in that map. It has been studied in the field of robotics for decades [1]. The cross-fertilization of mapping, navigation guidance and control, and remote sensing science has given rise to combinations of different sensors for positioning and mapping in place of a single navigation system. Multisensor combined navigation systems complement each other's strengths and can obtain higher navigation performance than any single navigation system. In outdoor environments, the most common technique is to use a combination of Global Navigation Satellite System (GNSS) and Inertial Navigation System (INS) for positioning, which can provide centimeter-level positioning accuracy when GNSS signal conditions are good. In general, as an important research direction in the field of navigation, seamless indoor-outdoor positioning technology has received extensive attention and research from governments and scholars [2]. At this stage, GNSS-based outdoor high-precision positioning technology is developing rapidly, and indoor positioning technologies such as UWB and INS have also made great progress, meeting the positioning needs of most scenarios [3]. However, it is still impossible to solve the problem of seamless indoor and outdoor positioning with a single sensor, and methods based on multisensor fusion have become the mainstream technical route [4]. This is especially true for urban fires, earthquakes, and other emergency rescue settings, where fire, smoke, noise, power interruption, building collapse, road obstruction, and rescue under rubble intensify the difficulty of rapid, high-precision positioning of indoor and outdoor rescue personnel and vehicles [5].

(Hindawi, Wireless Communications and Mobile Computing, Volume 2021, Article ID 8380869, https://doi.org/10.1155/2021/8380869)

However, in indoor environments, GNSS signals are so highly attenuated that they cannot meet the needs of indoor navigation and positioning. In indoor SLAM, the sensors used are generally divided into two categories: environment-aware sensors and motion-aware sensors. Environment-aware sensors include laser range finders, vision sensors, sonar, and optical flow sensors, which achieve localization mainly from the characteristics of the surrounding environment [6]. Motion-aware sensors include the INS and the odometer, which can achieve autonomous navigation without relying on the external environment. LiDAR is widely used in indoor SLAM for its high localization accuracy and stability; unlike vision sensors, it is not affected by ambient light and similar factors. However, because LiDAR matching depends strongly on the features of the surrounding environment, it performs poorly in feature-poor environments such as long corridors. The advantage of cell phone-based positioning is convenience and low cost: in principle, the location of a phone can be determined from the signal differences of three base stations, and the terminal held by the user is just a phone. Therefore, as long as the user's phone has a signal, positioning can be performed at any time without being affected by weather, tall buildings, or location. Inertial navigation, while able to navigate autonomously without external environmental information, accumulates sensor errors over long working times, and the navigation solution fails once the errors accumulate beyond a certain degree [7]. Therefore, effectively combining the two kinds of sensors to compensate for each other's shortcomings and to build highly accurate indoor maps efficiently has become an important research direction for scholars at home and abroad.

To solve these existing problems, a combined GNSS/SLAM high-precision indoor dynamic target positioning method is proposed. By preprocessing the indoor dynamic observation data, detecting cycle slips on the different frequencies, repairing both large and small cycle slips, and thus obtaining high-quality data and accurate positioning, GNSS/SLAM combined navigation achieves high positioning accuracy and stability and has broad application prospects for the development of indoor positioning and wireless communication technology. The original postprocessing algorithm is improved, and a near real-time algorithm is proposed, which greatly reduces the output delay of the navigation results without reducing the mapping accuracy and paves the way for porting the algorithm to an embedded platform: the mobile platform can collect information about the surrounding environment while moving and render the resulting map on the computer in real-time. As can be seen, sensor map-based positioning methods require preestablished maps, so in addition to positioning technology, map-building technology is also an important research direction. Map-building methods can generally be divided into two categories: mapping-based building and simultaneous localization and map building. The former requires other high-precision positioning systems to provide positioning information during map construction, while the latter estimates the location and the map of the surrounding environment simultaneously from the information provided by the sensors themselves. The mapping approach is more costly and difficult to use in GNSS signal-occluded areas, while the SLAM approach is more flexible but comparatively less accurate.
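As an illustrative sketch (not the paper's exact algorithm), cycle-slip detection and repair can be shown on a differenced carrier-phase series: a slip of N cycles appears as a step between adjacent epochs, and subtracting the rounded step from all later epochs repairs it. The function names and the 0.5-cycle threshold are assumptions; real detectors difference geometry-free combinations so that normal phase motion does not trigger false alarms.

```python
# Illustrative cycle-slip detector on a carrier-phase series (in cycles).
# Threshold and names are assumptions, not values from the paper.

def detect_cycle_slips(phase, threshold=0.5):
    """Return (epoch index, estimated integer slip) for each detected jump."""
    slips = []
    for k in range(1, len(phase)):
        delta = phase[k] - phase[k - 1]
        if abs(delta) > threshold:
            slips.append((k, round(delta)))  # nearest integer number of cycles
    return slips

def repair_cycle_slips(phase, slips):
    """Subtract each detected integer slip from all subsequent epochs."""
    repaired = list(phase)
    for k, n in slips:
        for j in range(k, len(repaired)):
            repaired[j] -= n
    return repaired
```

For example, the series [0.0, 0.1, 0.2, 5.3, 5.4] contains a 5-cycle slip at epoch 3; after repair the series is continuous again.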

2. Current Status of Research

The visual SLAM process consists of two parts: the front-end and the back-end. In the front-end, either the feature point method or the direct method is used to obtain data associations of visual information between consecutive or interrupted frames [8]. The feature point method extracts and matches feature descriptors; in contrast, the direct method does not extract feature descriptors and uses pixel grayscale values directly to obtain image associations. Compared with the feature point method, the direct method can estimate denser maps and is more robust in scenes with insignificant texture, but it is more sensitive to illumination changes and has difficulty handling closed-loop detection [9]. The feature point method is the more commonly used of the two, while the direct method has only been proposed in recent years [10]. In the back-end, SLAM methods usually transform the whole problem into a maximum a posteriori (MAP) estimation problem, using the data associations provided by the front-end to estimate both the pose and the map of the surrounding environment [11]. Back-end estimation methods are generally divided into two types: filtering and optimization. The filtering approach is mainly based on the extended Kalman filter (EKF), while the optimization approach iteratively solves a nonlinear least-squares problem using the Gauss-Newton or Levenberg-Marquardt method. The filtering method is more time-efficient, but its accuracy is poorer, and it is difficult to deal with


closed-loop problems, while the optimization method is more accurate and can handle closed-loop detection and optimization, with the disadvantage that it is less time-efficient and in rare cases may fall into a local minimum and affect the robustness of the estimation [12]. Due to improvements in hardware computing power and research on matrix sparsity in recent years, optimization-based back-end methods can now run in real-time on common computing platforms, so optimization methods are now used more frequently in visual SLAM research.

Soatti et al. use SURF as a feature descriptor and solve the closed-loop detection problem by training a bag-of-words model of the feature descriptors; this method is called FAB-MAP [13]. Here, closed-loop detection specifically refers to determining whether the location of a frame already appears in an established map, thereby establishing a link between this frame and the map, which is used to optimize the map and improve the drift of the pose estimation. Later, Wen et al. used binary descriptors such as BRIEF or ORB to build bag-of-words models so that closed-loop detection can be done in a shorter time to meet the needs of real-time applications; this method is called DBoW [14]. Hoang et al. proposed a visual SLAM method based on the direct method, LSD-SLAM, which replaces the minimized reprojection error commonly used in the feature point method with a minimized photometric error, proves that the direct method can obtain the same performance as the feature point method while building denser maps, and is a representative direct-method visual SLAM system [15]. Singh et al. introduce a tightly coupled EKF-based VIO method, the MSCKF, which uses a filtering approach to fuse visual and inertial information, extending the back-end estimation to multiple frames in a sliding window instead of the then-common optimization of two adjacent frames [16]. Building on this, MSCKF fixes the linearization point in the Jacobian computation to avoid spurious observable directions and introduces an online estimation method for the extrinsic parameters of the camera-IMU system [17].

Aiming at the missing absolute localization reference of visual SLAM, visual odometry, and other visual indoor localization technologies, we carry out research on a single-image spatial-visual localization model assisted by coded graphics. First, the algorithm model and technical process of the coded image, the automatic detection and recognition of coded images, and the automatic extraction of coded-image and object coordinates are studied; second, a coded-image-assisted single-image space resection positioning model is constructed, and a Tukey robust weighting factor is introduced to weaken or suppress the influence of residuals in the coded-image coordinate observations on the positioning results and achieve high-accuracy positioning.

3. GNSS/SLAM Combined High-Precision Indoor Dynamic Target Positioning Analysis

3.1. GNSS/SLAM Combination Design. GNSS dynamic positioning in urban environments is limited by the observation environment, with high signal interference noise, serious multipath effects, and frequent signal interruptions caused by occlusion, which seriously affect positioning accuracy and continuity [18]. Currently, the GNSS carrier-phase double-difference positioning technique can be used to remove error sources that are highly correlated in space and time, such as ephemeris errors, to improve the positioning accuracy. Usually, the carrier-phase double-difference observation model can be described as:

∇Δp = ∇Δp₀ − ∇ΔT − ∇ΔI − ∇ΔM + ∇Δε_p,
λ∇Δϕ = ∇Δp₀ − ∇ΔT − ∇ΔI − (1 + λ)∇ΔN + ∇Δm + ∇Δε_ϕ,   (1)

where the ∇Δ symbol denotes double differencing, p and ϕ denote the pseudo-range and carrier-phase observations, p₀ is the geometric distance between the receiver and the satellite, T denotes tropospheric delay, I denotes ionospheric delay, λ denotes the carrier-phase wavelength, N the integer ambiguity, M the pseudo-range multipath error, m the carrier-phase multipath error, and ε_p and ε_ϕ the pseudo-range and carrier-phase noise. For GNSS RTK with GPS and BDS data, the equation for estimating the unknown parameters can be written as:

[H^G  A^G; H^C  A^C] [ε_p; ∇ΔN^C] = [H^G ε_p + A^G ∇ΔN^C; H^C ε_p + A^C ∇ΔN^C] = [∇ΔL^G; ∇ΔL^C],   (2)

where the superscripts G and C denote GPS and BDS, respectively; H is a geometric matrix containing satellite geometry information; ε_p denotes the baseline increment vector; A is the ambiguity coefficient matrix; and ∇ΔL is the observation-minus-computation vector. Inertial devices mainly comprise three-axis accelerometers and three-axis gyroscopes. Limited by technical methods and manufacturing processes, inertial devices inevitably contain a variety of systematic and random errors, and environmental factors make these errors more complicated. Inertial devices are usually factory calibrated in a series of laboratory procedures, which compensates most of the systematic errors; the residual part must be estimated online through modeling. In addition, inertial devices contain random noise with a variety of power-law spectra, and both systematic and random errors are ultimately reflected in the accelerometer and gyroscope observations. This subsection focuses on the analysis of SPAN-FSAS, a vehicle tactical-grade inertial unit, and discusses the effects of the selection of the systematic error terms and the setting of the random error parameters on the combined GNSS/SINS results.
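The double-difference observable of equation (1) and the least-squares solution of the stacked system in equation (2) can be sketched as follows. The data layout (per-satellite dictionaries), function names, and the plain normal-equation solve are illustrative assumptions; a real RTK solver would also fix the float ambiguities to integers (e.g., with the LAMBDA method).

```python
# Sketch of the RTK steps above: double differencing builds the ∇Δ
# observable of equation (1); solve_normal_equations solves the stacked
# GPS+BDS system of equation (2) in the least-squares sense.

def double_difference(obs_rover, obs_base, ref_sat):
    """Between-receiver, then between-satellite difference against ref_sat.

    The first difference cancels satellite clock errors; the second cancels
    the receiver clock error, leaving the ∇Δ observable."""
    sd = {sat: obs_rover[sat] - obs_base[sat]
          for sat in obs_rover if sat in obs_base}
    return {sat: sd[sat] - sd[ref_sat] for sat in sd if sat != ref_sat}

def solve_normal_equations(A, b):
    """Least-squares solve of A x = b via (AᵀA) x = Aᵀb with partial pivoting."""
    n = len(A[0])
    N = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
         for i in range(n)]
    y = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(n)]
    for i in range(n):                      # forward elimination
        p = max(range(i, n), key=lambda r: abs(N[r][i]))
        N[i], N[p] = N[p], N[i]
        y[i], y[p] = y[p], y[i]
        for r in range(i + 1, n):
            f = N[r][i] / N[i][i]
            for c in range(i, n):
                N[r][c] -= f * N[i][c]
            y[r] -= f * y[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution
        x[i] = (y[i] - sum(N[i][c] * x[c] for c in range(i + 1, n))) / N[i][i]
    return x
```

In use, the rows of A would stack the geometry and ambiguity columns [H | A] for both constellations, and b the ∇ΔL vector.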

Before introducing the visual map-building method, it is necessary to introduce the composition of the visual map used in this paper. Whether the map is built with high-precision GNSS or with SLAM, the form of the map is the same, which ensures the consistency of the positioning map: when positioning, there is no need to consider which building method was used, and the same information for positioning is obtained [19]. The map data in this paper are organized in keyframes, which are frames selected by certain rules from all the image frames used in map construction. The main rule is that there is a certain distance interval between adjacent keyframes, to avoid data redundancy. Each keyframe contains the information extracted from the sensor data that is sufficient for positioning, and the complete map is composed of these keyframes, as shown in Figure 1.

It should be noted that the vision sensor in this paper is a monocular camera, and as analyzed above, the visual feature map for localization needs the 3D position information of the features, so the focus of the high-precision GNSS-based visual map-building method is accurate estimation of the 3D positions of the features. The simplest 3D position estimation method is to use the feature matching information and the localization information for triangulation. Suppose that for the same feature point P, the normalized coordinates on two frames obtained by feature detection and matching are x1 and x2, and the poses of the two frames obtained from the positioning information are [R1, p1] and [R2, p2]. The depths of the feature point P in the two frames, given the high-precision GNSS poses, are s1 and s2. The following constraint relation holds.

s1 x1 = s2 R x2 + t.   (3)

In determining whether the current frame is a keyframe to be added to the map, the most important criterion is the average parallax between the matched feature pairs of the current frame and the previous keyframe, i.e., the distance between the positions of the matching point pairs in image coordinates. When the parallax is relatively small, the depth calculated by triangulation can have a large error because the camera measures in units of one pixel, which has a considerable impact on matching-based localization. In addition, the matching accuracy of different feature point pairs may differ, and some pairs are even mismatched, so the depths obtained by direct triangulation become unreliable.
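A minimal sketch of recovering the two depths from the constraint of equation (3): rearranging it as [x1 | -R x2][s1, s2]ᵀ = t gives three equations in two unknowns, solvable in the least-squares sense. All names are illustrative, and x1, x2 are normalized camera coordinates [u, v, 1].

```python
# Recover depths s1, s2 from equation (3): s1*x1 = s2*R*x2 + t.
# Solved via the 2x2 normal equations of [x1 | -R*x2] s = t.

def triangulate_depths(x1, x2, R, t):
    Rx2 = [sum(R[i][j] * x2[j] for j in range(3)) for i in range(3)]
    A = [[x1[i], -Rx2[i]] for i in range(3)]
    a = sum(row[0] * row[0] for row in A)       # AᵀA entries
    b = sum(row[0] * row[1] for row in A)
    c = sum(row[1] * row[1] for row in A)
    p = sum(A[i][0] * t[i] for i in range(3))   # Aᵀt entries
    q = sum(A[i][1] * t[i] for i in range(3))
    det = a * c - b * b                         # near zero when parallax is tiny
    s1 = (c * p - b * q) / det
    s2 = (a * q - b * p) / det
    return s1, s2
```

Note that the determinant degenerates as parallax shrinks, which is exactly the instability the keyframe parallax criterion above is meant to avoid.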

For these reasons, it is not reasonable to simply perform triangulation, so in this paper, after triangulation of the feature points, a sliding-window optimization is added to estimate the position of each feature point when building a map based on high-accuracy GNSS. In this optimization problem, the only state quantity to be estimated is the depth of all feature points within the window, as in equation (4).

X = [λ0, λ1, ⋯, λm].   (4)

And the optimization problem takes the form of

min_X Σ_{(l,k)∈C} ‖ r_A(z_l^{c_k}, X) ‖²_{P_l^{c_k}}.   (5)
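A one-feature instance of the window optimization in equation (5) can be sketched as scalar Gauss-Newton on the feature's depth, holding the keyframe poses fixed. The measurement model (normalized pinhole coordinates), the numeric Jacobian, and all names are assumptions for illustration.

```python
# Scalar Gauss-Newton on one feature depth lam, minimizing the squared
# reprojection residuals over the window frames (poses R, t held fixed).

def reprojection_residuals(lam, x_ref, observations):
    """x_ref: normalized coords [u, v, 1] in the reference keyframe.
    observations: list of (R, t, z) mapping reference-frame points into
    frame k, with z = [u, v] the measurement in that frame."""
    res = []
    for R, t, z in observations:
        P = [sum(R[i][j] * (lam * x_ref[j]) for j in range(3)) + t[i]
             for i in range(3)]
        res.append(P[0] / P[2] - z[0])
        res.append(P[1] / P[2] - z[1])
    return res

def refine_depth(x_ref, observations, lam0=1.0, iters=20):
    lam, eps = lam0, 1e-6
    for _ in range(iters):
        r = reprojection_residuals(lam, x_ref, observations)
        rp = reprojection_residuals(lam + eps, x_ref, observations)
        J = [(b - a) / eps for a, b in zip(r, rp)]   # numeric Jacobian
        H = sum(j * j for j in J)
        g = sum(j * ri for j, ri in zip(J, r))
        if H == 0.0:
            break
        lam -= g / H                                 # Gauss-Newton step
    return lam
```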

Since the window includes multiple keyframes with a certain distance interval, the features within the window generally have enough parallax, and the estimation of feature depth is more reliable after optimization. After the feature depths are optimized, the reprojection error of each feature is recalculated, and if the error is larger than a preset threshold, the feature is considered a wrong match and its depth is invalid. When the keyframe with

Figure 1: Combined GNSS/SLAM framework. (The SLAM side tracks each input frame of the indoor dynamic target through keypoint detection, feature matching, matching with the local map, triangulation of 3D points, loop detection, and pose-graph optimization, estimating a camera pose for every frame; the GNSS side supplies geo-referencing signals and position data from the mobile mapping platform's sensors for global optimization and offline 3D-map reconstruction.)


the earliest timestamp in the window is removed, that is, when the feature depth information on its image is no longer changed by the optimization, it is ready to be joined to the map. The keyframe information described in the previous subsection needs to be computed before joining the map; that is, the feature descriptors are computed based on the positions of the feature points on the original image, and then the bag-of-words vector for the keyframe is computed from the feature descriptors. After this information is computed, the keyframe is added to the map incrementally.

Closed-loop detection is a common concept in SLAM and is essentially a method of information retrieval. The main purpose is to determine whether the current location has already appeared in the map; when the information of the current location is found to be similar to some information in the map, a closed loop is considered found, and the map within the closed loop can be optimized by establishing constraints from the matching relationship between the closed-loop frames. In the localization process, the first step of global localization can be considered a closed-loop detection process, namely finding similar locations in the map. Specifically, for the visual feature map in this paper, this means finding keyframes in the keyframe map whose scene is similar to the current image frame; these keyframes are then used as candidate frames for feature matching with the current frame to obtain the global localization information. This is equivalent to a coarse localization before the fine localization, which can greatly reduce the time complexity of global localization [20], because the current frame does not need to be feature-matched with every frame in the map but only with the map frames judged to be similar. The main solution for closed-loop detection in visual SLAM is the bag-of-words model. The bag-of-words (BoW) model is originally a document representation method in the field of information retrieval. In determining the similarity of two documents, the order of occurrence of words is not considered, only the frequency of occurrence of different words; i.e., two documents are considered similar when the frequency of occurrence of each word is similar. The analogy of this concept to image features is to divide image features into a certain number of categories and count the frequency of occurrence of the different categories in each image.

As shown in Figure 2, histogram statistics of the categories can be computed for an image, and images with similar histogram distributions have higher similarity; this is the bag-of-words vector information in the keyframes mentioned before, and the distance between the bag-of-words vectors of different images can be taken as the similarity of the images.
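The histogram comparison described above can be sketched as follows, assuming each descriptor has already been mapped to a visual-word id by the dictionary; the L1-based score and the function names are illustrative choices, not the paper's exact scoring.

```python
# Bag-of-words sketch: an image becomes a normalized histogram of
# visual-word frequencies; similarity is scored between histograms.
from collections import Counter

def bow_vector(word_ids):
    """Normalized word-frequency histogram for one image."""
    counts = Counter(word_ids)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def bow_similarity(v1, v2):
    """L1-style score in [0, 1]; 1.0 means identical histograms."""
    words = set(v1) | set(v2)
    return 1.0 - 0.5 * sum(abs(v1.get(w, 0.0) - v2.get(w, 0.0)) for w in words)
```

Two images with the same word frequencies score 1.0 regardless of word order, mirroring the document analogy in the text.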

Our algorithm uses multiple fusion algorithms, which yields more accurate results and more efficient localization. The classes of these features are called a dictionary, which is used to query which class a feature belongs to based on its descriptor. The dictionary of feature descriptors in visual SLAM is obtained by clustering the features of many images, which has the advantage that it can cope with a wide variety of scenes and can perform closed-loop detection in unknown scenes. In the localization scenario of this paper, on the other hand, the map is already well established, so the scene is known, and a dictionary specific to this map can be generated by clustering all the feature points appearing in the map. Feature matching refers to determining the correspondence between feature points of different images, i.e., which points are the same in 3D space. The similarity between feature points is generally measured by feature descriptors, usually expressed as the Euclidean distance for floating-point descriptors like SIFT and the Hamming distance for binary descriptors like BRIEF. The Hamming distance is the number of positions at which two binary strings of the same length differ; the larger the Hamming distance, the smaller the similarity.

d(x, y) = Σ_{i=1}^{n} x[i] ⊕ y[i].   (6)
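Equation (6) amounts to XOR-ing the two binary descriptors and counting the set bits. For descriptors packed into Python integers, a minimal version is:

```python
# Hamming distance of equation (6) for binary descriptors packed as ints:
# XOR the two bit strings, then count the set bits of the result.

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")
```

On Python 3.10+, `(a ^ b).bit_count()` does the same count directly.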

Figure 2: Schematic diagram of the bag-of-words model. (The figure shows features of the indoor moving target and their feature histogram representation.)

With the similarity measure, what needs to be done is to find which feature points are similar to each other. One of the simplest ideas is the Brute-Force Matcher: calculate the distance between each feature point and all its possible matches and choose the closest one as the matching point. This method is very simple, but it works quite well when the number of features is small. The number of features in a single keyframe of the map is generally between 100 and 200, so brute-force matching is efficient enough here. Based on the sliding-window state described above, the form of the optimization problem during fusion localization is somewhat altered from that of local pose estimation, specifically by adding a reprojection error term for the features observed in the map frames.
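The brute-force matcher described above can be sketched as follows, with descriptors assumed packed into integers (e.g., 256-bit BRIEF/ORB strings); the distance cutoff is an illustrative assumption.

```python
# Brute-force matcher sketch: for each query descriptor, scan every
# candidate descriptor and keep the one at minimal Hamming distance.

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def brute_force_match(query, candidates, max_dist=64):
    """Return (query index, candidate index, distance) per accepted match."""
    matches = []
    for qi, qd in enumerate(query):
        ci, dist = min(((i, hamming(qd, cd)) for i, cd in enumerate(candidates)),
                       key=lambda pair: pair[1])
        if dist <= max_dist:   # reject weak matches
            matches.append((qi, ci, dist))
    return matches
```

The O(NM) scan is the "violent" exhaustive search the text refers to; with 100-200 features per keyframe it is cheap enough that no index structure is needed.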

min_X { Σ_{(l,k)∈C} ‖ r_A(z_l^{c_k}, X) ‖²_{P_l^{c_k}} + Σ_{(l,β)∈C} ‖ r_A(z_l^{c_β}, X) ‖²_{P_l^{c_β}} },
X = [x_0, x_1, ⋯, x_m, λ_0, λ_1, ⋯, λ_m],
x_k = [p^w_{b_k}, v^w_{b_k}, q^w_{b_k}, b_a, b_g],
x^b_c = [p^c_b, q^c_b].   (7)

The objective of this optimization problem is to simultaneously optimize the poses of the image frames in the window and the relative poses derived from PnP in the previous subsection. The final optimized relative poses are referenced to the poses of the map frames to obtain the pose of the current frame in the map coordinate system, i.e., the global positioning result. When no closed-loop detection

occurs, i.e., when no map frame is observed, the whole problem reverts to the relative pose estimation problem and continues from the previous global positioning result. This is the tightly coupled optimization fusion localization method, which simultaneously estimates the relative pose and the global localization and finally fuses the two, improving on the poor robustness and low frequency of global localization based on feature point matching alone and giving global localization results with higher accuracy and frequency.
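The full tightly coupled problem is beyond a short snippet, but the underlying idea, that high-rate relative estimates drift while sporadic global matches pull them back, can be illustrated with a scalar variance-weighted fusion. All variances, names, and the 1-D setting are assumptions for illustration, not values from the paper.

```python
# Minimal 1-D illustration of fusing high-rate relative motion with
# low-rate global fixes: the relative chain accumulates variance, and
# each global observation corrects the fused estimate.

def fuse(prior, prior_var, meas, meas_var):
    """Variance-weighted average of two estimates (scalar Kalman update)."""
    k = prior_var / (prior_var + meas_var)
    return prior + k * (meas - prior), (1 - k) * prior_var

def run(deltas, odom_var, fixes, fix_var, x0=0.0, p0=0.0):
    """deltas: per-step relative motion; fixes: {step index: global position}."""
    x, p = x0, p0
    for i, d in enumerate(deltas):
        x, p = x + d, p + odom_var               # propagate relative motion
        if i in fixes:
            x, p = fuse(x, p, fixes[i], fix_var)  # global correction
    return x, p
```

When no fix arrives, the loop reduces to pure relative propagation, mirroring the fallback behavior described above.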

3.2. Indoor Dynamic Target Localization Analysis. SLAM is a synchronous positioning and mapping technology, and GNSS is a mature navigation technology widely used in many fields [21–23]. In some scenarios, GNSS signals cannot be received because of the complex propagation environment and interference in the propagation channel, so the navigation function fails; a GNSS/SLAM combination can therefore obtain high-precision environmental features and provide accurate positioning information. In the GNSS/SLAM combination algorithm, the attitude angle of the LiDAR coordinate system in the reference coordinate system, the position vector of the sensor, and all observed landmark points form the system state vector at time k [24–26]. Location recognition is a key element of mobile robotics. In practical usage scenarios, we usually need to estimate the robot's current location without any a priori information; this operation is also referred to as global localization. In this paper, the data in a 3D spatial coordinate system are first divided into two spheres, an outer sphere and an inner sphere, whose radii can both be measured. Based


Figure 3: PDOP values and number of available satellites.



on the principle of quadrant distribution of the 3D spatial coordinate system, each sphere can be divided into 8 subspaces, so 16 subspaces are eventually obtained. The most essential difference between the tight and loose combinations is whether SINS provides a priori position information for the GNSS solution. The position information is the link between SINS and GNSS, and the two sensors share the position to fuse their observation information. Based on this argument, we design a combination method with a semitight structure.
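The two-sphere, eight-octant partition described above can be sketched as follows; the inner radius and the index convention are illustrative assumptions.

```python
# Assign each 3D point one of 16 subspace indices: 3 octant bits plus a shell bit.
import numpy as np

def subspace_index(p, r_inner=1.0):
    """Index in [0, 16): octant from the signs of x, y, z, doubled by the
    inner/outer sphere shell the point falls in."""
    x, y, z = p
    octant = int(x >= 0) + 2 * int(y >= 0) + 4 * int(z >= 0)
    shell = int(np.linalg.norm(p) > r_inner)  # 0 = inner sphere, 1 = outer
    return shell * 8 + octant

print(subspace_index([0.1, 0.2, 0.3]))    # inner sphere, all-positive octant -> 7
print(subspace_index([-2.0, 1.0, -1.0]))  # outer sphere, octant 2 -> 10
```

Grouping landmark descriptors by such an index lets a global localization query compare only against candidates in the same subspace rather than the whole map.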

SINS, as the main filter, distributes the a priori position information to all sensors, and each subfilter decides for itself, through a pretest, whether to use that prior. This distributed architecture is essentially a semitight combination: all sensor information is handshake-confirmed and fused in the position domain, while each sensor's internal solution remains independent, leaving only the interfaces for receiving and sending position information. The advantage of the semitight combination is that it can robustly fuse data from multiple sensors to obtain a better result; here, only GNSS and SINS fusion is used as an example. The SPAN-FSAS device from NovAtel was used to collect single-GPS vehicle data in a suburban area over a duration of about 2.8 hours. To analyze the effect of satellite signal interruption on the semitight combination, we simulated three 60 s segments of partial loss of satellite lock starting from 0.75 hours: ordered by satellite serial number, only the first 4 satellites were retained in the first segment, only the first 3 in the second segment, and only the first 2 in the third segment. The three loss-of-lock segments are labeled in the STC results shown in Figure 3.
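One way the pretest in each subfilter might be realized is a chi-square-style gate on the discrepancy between the sensor's own position and the SINS prior. The gate value (95% quantile, 3 degrees of freedom) and the covariance below are assumptions for illustration, not the paper's actual test.

```python
# Gate on the squared Mahalanobis distance before accepting the SINS prior.
import numpy as np

def accept_prior(z_sensor, z_prior, cov, gate=7.815):
    """Accept the SINS position prior if the squared Mahalanobis distance
    between the sensor position and the prior is below the chi-square gate
    (7.815 is the 95% quantile for 3 degrees of freedom)."""
    d = np.asarray(z_sensor) - np.asarray(z_prior)
    m2 = d @ np.linalg.solve(cov, d)
    return m2 < gate

cov = np.diag([0.5, 0.5, 1.0])  # assumed position covariance (m^2)
print(accept_prior([100.0, 200.0, 10.0], [100.3, 199.8, 10.5], cov))  # consistent -> True
print(accept_prior([100.0, 200.0, 10.0], [104.0, 196.0, 10.5], cov))  # 4 m gap -> False
```

Rejecting the prior in the second case is what keeps a faulty SINS solution from contaminating a healthy subfilter, which is the point of the pretest described above.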

From the error analysis of the positioning experiment, we can see that the lateral and longitudinal errors in most areas are within 20 cm, and the root mean square error of the heading angle estimate is 0.420°, which meets the needs of intelligent vehicle applications in most cases. The maximum lateral error of 84 cm occurred in the open intersection area, where the road features on both sides impose few constraints, so the lateral error was larger; the maximum longitudinal error occurred in the straight-line area because of the high repetition of feature points along a straight road,

Figure 4: Flow chart of visual map creation (initialization; visual mapping based on high-precision GNSS; detection of satellite signal occlusion; optical flow, point tracks, and motion saliency; local map alignment and optimization; SLAM-based visual mapping).

Figure 5: Distance measurement data (polar plot of measured distances, in meters, to base stations 1–4).



which may cause some error in the longitudinal positioning estimate. Overall, the localization results are reasonable considering that most of the maps used were created by the VI-SLAM approach. When maps created by the hybrid approach are used for localization, the result reflects not only the performance of the positioning system but also the accuracy of the maps. Therefore, to further compare and verify the effectiveness of the localization method and the map-building method, additional experiments were conducted in which the maps were built entirely from high-precision GNSS, as shown in Figure 4.

The two parts, local map creation and map alignment and optimization, are described in detail below. It should be noted that, owing to the lack of a robust and reliable criterion for judging whether the satellite signal is obscured, this judgment is performed by hand, which is easy to implement and reasonable since map building is an offline process. Once a more accurate estimate of the pose and feature depth is available, all that remains is the creation of the local map [27, 28]. This part is the same as the building steps in the method based on high-precision GNSS mapping; that is, the keyframe information usable for localization is computed from the pose and feature positions, including the feature descriptors and the bag-of-words vectors, and will not be described again here.
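The bag-of-words computation mentioned above can be sketched as a nearest-word histogram over a visual vocabulary (in practice a vocabulary tree such as DBoW's is used); the random toy vocabulary and plain Euclidean distance here are illustrative assumptions.

```python
# Summarize one keyframe's descriptors as a normalized word histogram.
import numpy as np

def bow_vector(descriptors, vocabulary):
    """L1-normalized histogram of nearest-word assignments."""
    # Squared Euclidean distance of every descriptor to every vocabulary word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                     # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
vocab = rng.normal(size=(8, 32))    # toy 8-word vocabulary
desc = rng.normal(size=(120, 32))   # one keyframe's descriptors
v = bow_vector(desc, vocab)
print(len(v))                       # one weight per vocabulary word
```

Comparing such vectors between the current frame and stored keyframes is what makes the later map-frame matching cheap enough for real-time localization.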

In terms of system implementation, the map-building method in this paper is an extension of local pose estimation, so the underlying programming platform is the same as the open-source tool. Time synchronization of GNSS information with the other sensor information is achieved through software synchronization; the software timestamp of the GNSS information is accurate enough in this system because of the small amount of data, low transmission delay, and high frequency. The additional visual closed-loop detection used in the experiments is implemented with the open-source library DBoW, and the map data are saved using the serialization facilities of the Boost library. Although the map-building process is an offline process performed after data acquisition, the methods used in this paper are real-time, so all experiments run in real time on the CPU of an ordinary laptop.

4. Analysis of Results

4.1. System Performance Analysis. Figure 5 shows that the UWB ranging data along the rectangular path are partially obscured by the columns and appear discontinuous. The corresponding Δri is also shown in Figure 5, where bars of different colors indicate the ranging residual values of the corresponding base stations. As can be seen from Figure 5, the range values are relatively unstable at several peaks because the UWB range error grows as the distance increases; the corresponding range residuals increase accordingly. The range residuals are analyzed to dynamically determine the weight of the observation covariance at the different base stations. From Figure 5, the average range residuals of all four base stations are less than 0.1 m, which indicates that the overall quality of the range values on path 1 is high. A residual value of 0 indicates that the ranging value was obscured at that moment and the residual was not computed. The continuous wide-coverage scenario is the most basic coverage method for mobile communication; it aims to ensure users' mobility and service continuity and to provide a seamless high-speed service experience.
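The residual-based weighting described above can be sketched as follows; the mapping from residual to variance, the base standard deviation, and the scale factor are illustrative assumptions rather than the paper's calibration.

```python
# Inflate each base station's assumed range variance with its ranging residual.
import numpy as np

def weighted_cov(residuals, sigma0=0.05, k=4.0):
    """Diagonal observation covariance; a residual of 0 (obscured station,
    no residual computed) falls back to the base variance."""
    r = np.abs(np.asarray(residuals, dtype=float))
    sigma = sigma0 + k * r * sigma0  # larger residual -> larger std. dev.
    return np.diag(sigma ** 2)

res = [0.03, 0.08, 0.0, 0.02]  # average residuals < 0.1 m, one obscured station
R = weighted_cov(res)
print(np.round(np.sqrt(np.diag(R)), 4))  # per-station std. dev. in meters
```

Down-weighting the noisier, longer-range stations in this way is what keeps the peaks visible in Figure 5 from dominating the position solution.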

Another advantage of the mapping method in this paper is that it does not require an actual route closure to optimize the map. Visual closure is performed only once along the whole route, so the mapping accuracy using visual closure is experimentally lower than that of GNSS closed-loop mapping. Theoretically, visual and GNSS closed loops provide the same information, and if the number of closed loops is the same, the results should be similar. The purpose of this paper is to handle the visual mapping of the

Figure 6: Performance parameters (FPS, fx, fy, resolution, and brightness).



occluded and unoccluded areas, where the usefulness of visual closed loops is quite limited. The comparison experiments here illustrate the flexibility of the approach in this paper and the effectiveness of the framework.

The experiment consisted of walking in a circle around a conference table. The camera automatically adjusts its brightness depending on the viewing direction, producing light and dark changes in the video stream; in severe cases, areas become entirely black or entirely white and the texture information is essentially invisible. Figure 6 also reflects the sparse texture of the experimental environment: the corridor walls, floor, and ceiling are of relatively uniform tone, and when the camera faces along the corridor, the oblique intersection with the walls further blurs the texture. The experimental equipment consists of a calibrated camera together with UWB base stations and tag devices, as shown in Figure 6; the UWB equipment details are consistent with Section 4 of this paper. The camera is a CCD (charge-coupled device) imaging device supporting video stream acquisition at up to 1080p resolution; it connects directly to the computer through USB, and the corresponding interface software was developed to extract video streams and keyframes in real time.

During the experiment, to capture a larger range of objects, the camera focus is set to infinity. In addition, because the camera is an industrial camera, it automatically adjusts brightness in response to light; since a fixed light-sensitivity parameter cannot be set, the acquired video stream shows obvious light and dark changes when the illumination changes. As a streaming technology, the vision sensor can not only capture rich texture information but also transmit it in real time. Applied to localization, it is ideal for indoor use because it can obtain location information while also perceiving the surrounding real-world scene. However, current vision-based indoor localization methods fail under sparse texture, overly bright or dark lighting, and the scale ambiguity of monocular ORB-SLAM. To address these problems, this paper proposes a combined UWB/vision EKF indoor localization model, which achieves an indoor localization accuracy of 0.2 m and solves the problems of visual scale ambiguity and of localization failures caused by texture sparsity and brightness variation.
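A minimal sketch in the spirit of the proposed UWB/vision EKF: a position predicted from vision is corrected by a single UWB range to a known base station. The geometry, noise values, and single-anchor setup are simplifying assumptions, not the paper's full model (which also estimates the visual scale factor).

```python
# One EKF update of a 2D position with a range measurement to a fixed anchor.
import numpy as np

def ekf_range_update(x, P, anchor, z, r_var=0.04):
    """Correct state x (2D position) with range z to a known anchor."""
    d = x - anchor
    pred = np.linalg.norm(d)          # h(x): predicted range
    H = (d / pred).reshape(1, 2)      # Jacobian of the range w.r.t. position
    S = H @ P @ H.T + r_var           # innovation covariance
    K = P @ H.T / S                   # Kalman gain
    x_new = x + (K * (z - pred)).ravel()
    P_new = (np.eye(2) - K @ H) @ P
    return x_new, P_new

x = np.array([4.0, 3.0])              # vision-predicted position (true range 5 m)
P = np.eye(2) * 1.0                   # prior uncertainty
x_new, P_new = ekf_range_update(x, P, anchor=np.array([0.0, 0.0]), z=5.2)
print(np.round(x_new, 3))             # position nudged outward along the range direction
```

Because the UWB ranges carry absolute scale, repeated updates of this form are also what resolves the monocular scale ambiguity in the combined model.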

4.2. Dynamic Target Localization Results. Actual road conditions are highly variable, and only the five types of feature matching under typical conditions are given above. It can

Figure 7: Matching cases and positioning accuracy in five cases (types A–E: total number of feature points, number of incorrect matches, number of correct matches, PnP positioning accuracy, and optimized positioning accuracy).



be seen that the SIFT operator extracts many feature points, distributed over corner points and areas with obvious changes in the image. The number of feature points that can actually be matched is much smaller, but they are still distributed across different areas of the image, so the geometric configuration strength of the resection can be ensured; the RANSAC algorithm and robust optimization estimation then exclude the remaining mismatched points, mainly road sign points with high feature similarity. Target detection and semantic segmentation are not used when processing the images in real time here, mainly because vehicle and pedestrian feature descriptors differ substantially from environmental feature descriptors and do not easily interfere with each other; moreover, vehicles and pedestrians were removed when the map was built, so the matched feature points in the scene do not fall on vehicles or pedestrians but essentially on roads, houses, and trees. A well-distributed set of correct matches yields a good pose solution.
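The RANSAC-based mismatch rejection described above can be illustrated with a toy example; a simple 2D-translation model stands in for the actual pose model, and the data and thresholds are assumptions for illustration.

```python
# Toy RANSAC: find the translation consistent with the most correspondences.
import numpy as np

def ransac_translation(src, dst, iters=100, thresh=0.5, rng=None):
    """Return the translation with the largest inlier set and its inlier mask."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_t, best_mask = None, np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))           # minimal sample: one correspondence
        t = dst[i] - src[i]                  # hypothesized translation
        mask = np.linalg.norm(dst - (src + t), axis=1) < thresh
        if mask.sum() > best_mask.sum():
            best_t, best_mask = t, mask
    return best_t, best_mask

rng = np.random.default_rng(2)
src = rng.uniform(0, 100, size=(40, 2))
dst = src + np.array([3.0, -1.0])            # true translation
dst[:8] += rng.uniform(5, 20, size=(8, 2))   # 8 gross mismatches
t, mask = ransac_translation(src, dst)
print(np.round(t, 2), int(mask.sum()))       # recovers [3, -1] with 32 inliers
```

The mismatched pairs, like the high-similarity road sign points mentioned above, simply never agree with the consensus model and are dropped from the pose solution.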

Figure 7 gives the matching situation and localization accuracy under the five road conditions. The feature matching rate is only about 10.37%, and most of the extracted feature points cannot find corresponding road marker points in the visual point cloud map; although this percentage is small, the number of matched feature points is sufficient, and the final ratio of matching errors to correct matches is about 1:4.5. The PnP localization accuracy can still reach the centimeter level, and in cases C and D it is even slightly better than the optimized localization accuracy, but the optimized results are more stable and more accurate overall, as shown in Figure 8. It should be noted that the vertical coordinates of the pose results and of the robustly optimized pose results in the figure differ. In addition, the left and forward components of the position lie in the horizontal plane of the system, and the down component is the elevation direction; since the pose is expressed in the camera coordinate system, the heading angle corresponds to the roll angle in the ENU system and vice versa, while the pitch angle remains the same.

The above conclusion holds only for this data set, where the same data are used for both map building and positioning. In practice, map building and localization are often separated; for example, if a map built last month is used for localization now, the point cloud map may no longer accurately represent the actual road because of environmental changes, and whether the feature descriptors can still be matched requires further study. With the future promotion of crowdsourcing technology, we can potentially achieve incremental real-time updates of the visual point cloud map and add more robust multidimensional feature and semantic information, which would improve the localization success rate in real scenarios. The average times required are 158 ms and 146 ms, and the maximum times are 693 ms and 541 ms, accounting for the major part of the time consumed; on this basis, the loose position combination with SINS can achieve continuous high-sampling-rate localization. Our positioning results are about 20% more efficient and slightly more accurate than others.

5. Conclusion

For motion target detection and localization against a dynamic background, current methods cannot repair cycle slips with high precision across different frequency regions during cycle slip detection and repair, making it difficult to obtain high-quality data; this results in large position and attitude errors and unstable velocity, i.e., low accuracy of indoor dynamic target positioning. To address this problem, a

Figure 8: Position accuracy comparison (left, forward, and down error components, in meters, over time).



GNSS/SLAM combination-based high-precision indoor dynamic target positioning method is proposed. It preprocesses the indoor dynamic observation data, detects the cycle slips occurring on each frequency, repairs them with high precision, obtains high-quality data, and accurately positions the indoor dynamic target, effectively solving the problems of current methods and providing a basis for high-precision indoor dynamic target positioning. In the position error test, the error of method 1 is closest to 0, which shows that its positioning is closest to the dynamic target, i.e., most accurate. The position errors of methods 2 and 3 are larger because of large fluctuations in their latitude, longitude, and altitude errors, indicating lower positioning accuracy. Because method 1 detects and repairs cycle slips in the observed data, whether large or small, on every frequency, it can repair them with a certain accuracy and obtain high-quality data, which brings the position error close to 0, i.e., it accurately achieves indoor target positioning.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61231006, 62071080, and 61501078).

References

[1] Z. Gong, R. Ying, F. Wen, J. Qian, and P. Liu, "Tightly coupled integration of GNSS and vision SLAM using 10-DoF optimization on manifold," IEEE Sensors Journal, vol. 19, no. 24, pp. 12105–12117, 2019.

[2] K. W. Chiang, G. J. Tsai, H. J. Chu, and N. el-Sheimy, "Performance enhancement of INS/GNSS/refreshed-SLAM integration for acceptable lane-level navigation accuracy," IEEE Transactions on Vehicular Technology, vol. 69, no. 3, pp. 2463–2476, 2020.

[3] T. Shamseldin, A. Manerikar, M. Elbahnasawy, and A. Habib, "SLAM-based pseudo-GNSS/INS localization system for indoor LiDAR mobile mapping systems," in 2018 IEEE/ION Position, Location and Navigation Symposium (PLANS), pp. 197–208, Monterey, CA, 2018.

[4] R. Jurevičius, V. Marcinkevičius, and J. Šeibokas, "Robust GNSS-denied localization for UAV using particle filter and visual odometry," Machine Vision and Applications, vol. 30, no. 7-8, pp. 1181–1190, 2019.

[5] W. Sun, J. Wang, F. Jin, and Y. Yang, "A quality improvement method for 3D laser SLAM point clouds based on geometric primitives of the scan scene," International Journal of Remote Sensing, vol. 42, no. 1, pp. 378–388, 2021.

[6] G. Zhang and L. T. Hsu, "A new path planning algorithm using a GNSS localization error map for UAVs in an urban area," Journal of Intelligent & Robotic Systems, vol. 94, no. 1, pp. 219–235, 2019.

[7] G. Sammartano and A. Spanò, "Point clouds by SLAM-based mobile mapping systems: accuracy and geometric content validation in multisensor survey and stand-alone acquisition," Applied Geomatics, vol. 10, no. 4, pp. 317–339, 2018.

[8] D. Perea-Strom, A. Morell, J. Toledo, and L. Acosta, "GNSS integration in the localization system of an autonomous vehicle based on particle weighting," IEEE Sensors Journal, vol. 20, no. 6, pp. 3314–3323, 2019.

[9] W. Wen, G. Zhang, and L. T. Hsu, "Object-detection-aided GNSS and its integration with Lidar in highly urbanized areas," IEEE Intelligent Transportation Systems Magazine, vol. 12, no. 3, pp. 53–69, 2020.

[10] T. Suzuki, "Time-relative RTK-GNSS: GNSS loop closure in pose graph optimization," IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4735–4742, 2020.

[11] T. M. Quack, M. Reiter, and D. Abel, "Digital map generation and localization for vehicles in urban intersections using LiDAR and GNSS data," IFAC-PapersOnLine, vol. 50, no. 1, pp. 251–257, 2017.

[12] Y. Li, Z. He, Z. Gao, Y. Zhuang, C. Shi, and N. el-Sheimy, "Toward robust crowdsourcing-based localization: a fingerprinting accuracy indicator enhanced wireless/magnetic/inertial integration approach," IEEE Internet of Things Journal, vol. 6, no. 2, pp. 3585–3600, 2019.

[13] G. Soatti, M. Nicoli, N. Garcia, B. Denis, R. Raulefs, and H. Wymeersch, "Implicit cooperative positioning in vehicular networks," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 12, pp. 3964–3980, 2018.

[14] C. Wen, Y. Dai, Y. Xia et al., "Toward efficient 3-D colored mapping in GPS-/GNSS-denied environments," IEEE Geoscience and Remote Sensing Letters, vol. 17, no. 1, pp. 147–151, 2020.

[15] G. M. Hoang, B. Denis, J. Härri, and D. Slock, "Bayesian fusion of GNSS, ITS-G5 and IR-UWB data for robust cooperative vehicular localization applications," Comptes Rendus Physique, vol. 20, no. 3, pp. 218–227, 2019.

[16] S. K. Singh, S. Raval, and B. Banerjee, "A robust approach to identify roof bolts in 3D point cloud data captured from a mobile laser scanner," International Journal of Mining Science and Technology, vol. 31, no. 2, pp. 303–312, 2021.

[17] M. Karimi, M. Oelsch, O. Stengel, E. Babaians, and E. Steinbach, "LoLa-SLAM: low-latency LiDAR SLAM using continuous scan slicing," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2248–2255, 2021.

[18] K. Qiu, T. Liu, and S. Shen, "Model-based global localization for aerial robots using edge alignment," IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1256–1263, 2017.

[19] Q. Shi, S. Zhao, X. Cui, M. Lu, and M. Jia, "Anchor self-localization algorithm based on UWB ranging and inertial measurements," Tsinghua Science and Technology, vol. 24, no. 6, pp. 728–737, 2019.

[20] R. Karlsson and F. Gustafsson, "The future of automotive localization algorithms: available, reliable, and scalable localization: anywhere and anytime," IEEE Signal Processing Magazine, vol. 34, no. 2, pp. 60–69, 2017.

[21] M. Chen, S. Lu, and Q. Liu, "Uniqueness of weak solutions to a Keller-Segel-Navier-Stokes system," Applied Mathematics Letters, vol. 121, p. 107417, 2021.

[22] Y. Li, Z. Hu, Z. Li, M. A. Sotelo, and Y. Ma, "Multiscale site matching for vision-only self-localization of intelligent vehicles," IEEE Intelligent Transportation Systems Magazine, vol. 10, no. 3, pp. 1–183, 2018.

[23] T. H. Nguyen, T. M. Nguyen, and L. Xie, "Range-focused fusion of camera-IMU-UWB for accurate and drift-reduced localization," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1678–1685, 2021.

[24] S. Qi, Y. Zheng, X. Chen, and W. Wei, "Ants can carry cheese: secure and private RFID-enabled third-party distribution," IEEE Transactions on Dependable and Secure Computing, p. 1, 2020.

[25] A. Zielonka, A. Sikora, M. Wozniak, W. Wei, Q. Ke, and Z. Bai, "Intelligent Internet of Things system for smart home optimal convection," IEEE Transactions on Industrial Informatics, vol. 17, no. 6, pp. 4308–4317, 2021.

[26] M. Chen, K. Liu, J. Ma et al., "MoLoc: unsupervised fingerprint roaming for device-free indoor localization in a mobile ship environment," IEEE Internet of Things Journal, vol. 7, no. 12, pp. 11851–11862, 2020.

[27] G. X. Liu, L. F. Shi, S. Chen, and Z. G. Wu, "Focusing matching localization method based on indoor magnetic map," IEEE Sensors Journal, vol. 20, no. 17, pp. 10012–10020, 2020.

[28] Y. Li and L. Wang, "A novel noise reduction technique for underwater acoustic signals based on complete ensemble empirical mode decomposition with adaptive noise, minimum mean square variance criterion and least mean square adaptive filter," Defence Technology, vol. 16, no. 3, pp. 543–554, 2020.
