introduction to geoprocessing conflation ......when using multi-source spatial data together common...
TRANSCRIPT
INTRODUCTION TO GEOPROCESSING CONFLATION TOOLS AND WORKFLOWS
Dan Lee and Silvia [email protected]; [email protected]
Agenda
What is Conflation?Geoprocessing Conflation ToolsConflation Workflows
Conceptual workflow strategy Demo: real world scenario
Conclusions and Future Work
What is Conflation?
補正
���Fusione
合并 CombinaçãoCombinación
Zusammenführung
Assemblage
Translated by Esri localization
Birleştirme
Объединение
When using multi-source spatial data together
Common obstacles in analysis and mapping: Spatial and attribute inconsistency caused by
differences in data collection and modeling High cost to fix the problems
Overlapping datasets
Adjacent datasets
Conflation reconciles multi-source datasets and optimizes data quality and usability
Conflation involves the process of: Identifying corresponding features, also known as feature matching (FM) Making spatial adjustment and attribute transfer Combining matched and unmatched features into one unified dataset with
the optimal accuracy, completeness, consistency, and integrity Resolving conflicts and connection issues among adjacent datasets, also
known as edge matching, to ensure seamless data coverage
Long-term benefits: No longer living with various imperfect datasets More confidence in reliable analysis and high quality mapping
What’s the way to get there?
GeoprocessingConflation Tools
Our initial focuses
Develop highly automated tools in Geoprocessing framework Starting with linear features (roads,
parcel lines, etc.) Aiming at high feature matching
accuracy (not promising 100%) Providing information to facilitate
post-processing
Build workflows
Have you used these tools in ArcMap?
In ArcGIS 10.4.1 and Pro 1.3
Conflation: Edgematchingtools and workflows12:30 – 1:15pm, today
Demo Theater 2 – Spatial Analysis, Hall B
Feature matching (FM) for overlapping datasetsBased on proximity, topology, pattern, and similarity analysis, as well as attributes information
1:1 and 1:m matches
m:1 and m:n matches
FM-based tool #1 - Detect Feature Changes (DFC)
DFC
Finding feature differences
Output CHANGE_TYPE Spatial change (S) Attribute change (A) Spatial & attribute change (SA) Spatial and line direction
change (S_LD) Spatial, attribute, and line
direction change (SA_LD) No change (NC) New update feature (N) To-be-deleted base feature (D)
FM-based tool #2 – Transfer Attributes (TA)
From source features to target features Transfer fields (e.g.
ROAD_NAME) Target features are
modified
TA
FM-based tool #3 – Generate Rubbersheet Links (GRL)
Generate Rubbersheet Links (GRL) From source features
to target features
Followed by RubbersheetFeatures (RF) Adjusting input features
GRL RF
Rubbersheeting moves source locations towards target locations based on established links
Conflation Workflows
Conceptual workflow strategy
Unification of overlapping datasets
A popular scenario and requirements: To unify the two datasets
into one with combined spatial and attribute information
Unification of overlapping datasets
Conceptual workflow strategy
setAContainsupdates
setBSpatiallyaccurate
setCBest of
both
(4)Transfer attributes
(2)Generate
Rubbersheetlinks
(5)Append
(1)Identify matched
& unmatched
(3)Rubber-sheeting
Matched Make acopy
Unmatchedfrom setA
This model reflects the conceptual workflow strategy.With simple and highly similar
inputs, the process can produce 100% accurate result.
Simple and highly similar input streets
Base features with spatial accuracy and attributes
Update features with new streets and attributes
Together
Results Attributes transferred
Changes detected
Rubbersheetinglinks generated
New features adjusted and added to base
Three components in conflation workflows
In same projection Data validation Selection of
relevant features
Conflation tools Workflow tools
Queued review Interactive editing
PreprocessingConflation and
evaluationPostprocessing
Supplemental tools and guidelines for downloadhttp://www.arcgis.com/home/item.html?id=36961cde1b074f1f944758f6abec87ccYou can also search by “conflation” at arcgis.com to find the download.
Improved toolbar and new workflow tools are coming soon.
Demo: Real world scenario
Unification of complex overlapping datasets
Data overview
Two road datasets (northwest of San Diego, CA): SAN – 1128 features OSM – 1206 features
Both datasets: Have common and
uncommon features and attributes
Are well preprocessed
Breakdown of Conceptual workflow into sub-workflows
QA #1
QA #2
QA #4
QA #3
Step1a
Step4
Step2
Step3
Step5
Step1b
Same goal and same strategy as the conceptual workflow
Let’s get started …
DFC result and potential match issues
QA #1 Matched flags
(CFM_GRP) Unmatched flags
(N and D in DFC output)
Demo of QA #1 …
Review potential match issuesTotal 13 CFM_GRP were flagged
6 were true match issues due to data complexity and dissimilarity7 were false alarm (ignorable)
Match issue due to data complexity
Match issueignorable
Review DFC result: CHANGE_TYPE N and D((CHANGE_TYPE = 'N') OR ( CHANGE_TYPE = 'D' )) AND (NEAR_DIST > 0)
Inspect records with high potential for errors: 107 reviewed 3 wrong Ns or Ds flagged
Wrong N
Wrong D
Matched groups: Total: 980 groups Correct: 974 groups Incorrect: 6 groupsAccuracy = 974/ 980 = 99.34%
Unmatched: Total: 317 (121 Ns + 196 Ds) Correct: 314 (120 Ns + 194 Ds) Incorrect: 3 (1 N + 2 Ds)Accuracy = 314 / 317= 99.05%
(biased by the total count)
Ready to join with inputs to tag Ns and Ds …
Feature matching accuracy estimates
Overall feature matching accuracy(average of matched and unmatched)
99.19%
Extract matched features for GRL and TA processes
SAM: 1008 non-N out of 1128OSM: 1012 non-D out of 1206
Ready for GRL process …
GRL resultGenerated total 19268 regular links and 4 identity links
QA #2 Intersecting links Long links Links of different
node types Flagged match
issue areas No link locations
Demo of QA #2 …
Intersecting links10 locations of intersecting links, total 16 links: 8 correct, 4 modified, 4 wrong
Long linksShape_Length >15, total 123 links: 96 correct, 7 modified, 20 wrong
Links at different vertex typesqaNotes = 'src_tgt_VxType_diff‘, total = 192 links, 155 correct, 23 recheck, 10 modified, 4 wrong
Locations of missing links33 areas – generated for 605 no link locations on matched source features
(TGT_FID > -1 ) AND ((FREQUENCY >1) OR ((FREQUENCY =1) AND (srcVxType = 0) AND (NEAR_DIST>2)))
59 links were added; other locations are not critical
QA regular links - summary
Total 326 (1.69%) of 19268 links were reviewed: 19 were modified 27 were to be removed 280 were correct, including 23
to be rechecked
33 no link areas were reviewed: 59 links were added Links at other locations were
not criticalReady for rubbersheeting …
Review selection criteria matters
19300 of 19327 regular links were selected by (REV_FLAG <> 'Wrong' OR REV_FLAG IS NULL) to use
Rubbersheeting result
QA #3 Links after RF DFC result after RF Flagged Links Flagged match
issue areas Flagged no link
areas
GRL and DFC result after rubbersheetingMany regular links became identify links; RF result is less different from target
How good is the rubbersheeting result?Three indicators showing spatial improvement
Less spatial differences Before RF After RFRegular links 19268 651Identity links 4 19125
Improved location alignment
Link-length distributions before/after RF(Not on the same scale due to the big difference in values)
QA #3 – Check rubbersheeting result
Ready to do TA …
Matched features and N features Ramps may need editing
Transfer attributesAttaching all flagged and reviewed information to target
Attribute transfer result
QA #4 Flagged unmatched
features (Ns and Ds) Flagged match
issue areas Multi-source m:n
transfers (srcM_inMN > 1)
1:2 match
3:2 match
Demo of QA #4 …
Check attribute transfer resultCHECK_TA = 'Recheck' OR REV_FLAG = 'wrongD' OR srcNearFID > -1 OR srcM_inMN >1
69 records were reviewed: 15 Unique_ID values were corrected
Retransfer via corrected Unique_ID
Almost there …
Select adjusted N features; append them to target( CHANGE_TYPE = 'N' AND REV_FLAG IS NULL ) OR( REV_FLAG = 'isN' )
N features in final result
Unification of overlapping datasets completed!
Processing TimeStep 1 (a, b) 1 min 45 sec
Step 2 1 min 47 sec
Step 3 1 min
Steps 4, 5 36 sec
Total 5 min 8 sec
Automated processing
Interactive processing
(not counting final review)
QA #1(CFM_GRP
and DN)
QA #2(links) QA #3
QA #4(attribute transfer)
TotalTime
(3-4 review counts per
minute)
Review Count(locations or
groups)120 359 x 75 554
~ 2-3 hrs.Edit Count
(shape or value) 78 x 15 93
Conclusionsand
Future Work
Thanks to:• Department of Public Works (DPW),
Los Angeles County, USA.
• Institut Cartogràfic i Geològic de Catalunya (ICGC), Barcelona, Spain.
• Kevin Hunt, New York State Department of Transportation, USA.
• Richard Fairhurst, Riverside County Transportation and Land Management) CA, USA RCTLMA,
• National Institute for Water and Atmospheric Research (NIWA) and Land Information New Zealand (LINZ) -Crown Copyright Reserved.
• Resource Management Service, LLC, Birmingham, AL, USA.
• All others who supported us along the way.
Conflation can be done more efficiently now
It takes a workflow:
Use the best practice in preprocessing.
Run automated tools to obtain highly accurate results and evaluation information.
Interactively review and edit the results. The time is worth-spending.
Three user stories …
User story 1: Enhancing county roads by spatially more accurate city roadsCounty centerline attributes and direction must be retained.
Data/information source: RCTLMA (Riverside County Transportation and Land Management) CA, USAAcknowledgement: Thanks to Richard Fairhurst, for providing the information and screenshots.
Updated_county_centerlinesOriginal_county_centerlines
Use DFC to find matching features and line direction differences For 1:1 matches, flip city centerlines of opposite direction (Flip Line) For m:n matches, merge/split city or county centerlines to get 1:1 matching segments, recalculate address
ranges for county roads as needed, and flip city centerlines of opposite direction (tools + scripts) Transfer city centerline geometry to county centerlines (script)
Updated_county_centerlinesTemecula_city_centerlines
Original_county_centerlinesTemecula_city_centerlines
~ 98%+ accuracy
User story 2: Combining electoral roads and topographic roadsThere is no “most accurate” dataset.
Information source: Land Information New Zealand (LINZ)Acknowledgement: Thanks to Douglas Kwan, LINZ, for providing the information.
Electoral roadsTopographic roads
~ 99% accuracy
~ 90% accuracy
Data/information source: NYSDOT, USAAcknowledgement: Thanks to Kevin Hunt, for giving us the opportunity to work with him and share his data.
User story 3: Transferring attributes from State routes to Street segmentsSegmentation for the datasets was different
State routes
Street Segments
State Routes and Street segments were split by end points to provide a more similar segmentation between the two datasets.
99.5% match rates
Consider conflation a higher priority
Study the tools and workflows; understand the results Start with small test areas
Customize the workflows for your organizations Improve data quality and usability Bring new live and value to your data
Work with broader communities Data sharing and collaboration Seamless analysis and mapping
Please send us your feedbacks
Our future workNew tools and enhancements
Align Features tool Better feature matching Better rubbersheet links
Integrated review and editing Plug-in to the Universal Error Inspector
(ArcGIS Pro future release) Interactive tools, including reference imagery
Formalization of workflows Common scenarios oriented Other feature types Contextual conflation (spatially related
features)Please send us your use
cases and requirements
Align Features
Recent papers
Baella B, Lee D, Lleopart A, Pla M (2014) ICGC MRDB for topographic data: first steps in the implementation, The 17th ICA Generalization Workshop, 2014, Vienna, Austria. http://generalisation.icaci.org/images/files/workshop/workshop2014/genemr2014_submission_8.pdf
Lee D (2015) Using Conflation for Keeping Data Harmonized and Up-to-date, to be presented at the ICA-ISPRS Workshop on Generalisation and Multiple Representation, 2015, Rio de Janeiro, Brazil
Lee D, Yang W, Ahmed N (2014) Conflation in Geoprocessing Framework - Case Studies, GEOProcessing, 2014, Barcelona, Spain. http://goo.gl/iOoSGV
Lee D, Yang W, Ahmed N (2015) Improving Cross-border Data Reliability Through Edgematching, to be presented at The 27th International Cartographic Conference, 2015, Rio de Janeiro, Brazil
Yang W, Lee D, and Ahmed N, “Pattern Based Feature Matching for Geospatial Data Conflation”, GEOProcessing, 2014, Barcelona, Spain. http://goo.gl/JKGJbo
Please take our SurveyYour feedback allows us to help maintain high standards and to help presenters
Find the specific session
Find ESRI UC in the Esri Events App
Scroll down to the bottom of the session
Answer survey questions and submit
Thank you for attending! Questions, comments …?