Integrating Unstructured Data Analysis into Defense and Intelligence Workflows James Jones Jeff Wilson Tim Murphy


Page 1: Integrating Unstructured Data Analysis into Defense and ... · Print Your Certificate of Attendance. Print Stations Located at L Street Bridge. Tuesday. Wednesday. 12:30 pm – 6:30



Every 2 days we create as much information as we did up to 2003

Eric Schmidt, 2010


What does that look like?

Every minute…

Twitter sees 350,000 new tweets

Facebook has 510,000 comments posted, 293,000 statuses updated

600 Wikipedia pages are edited

3.6 million Google searches are conducted

15.2 million text messages are sent

954,000 new Microsoft Office documents are created

144 million e-mails are sent


What is Unstructured Data?

• Does not have a recognizable structure or is loosely structured

• Can be in a variety of formats and storage mechanisms

- Word Documents

- Email

- Social Media Posts

- PowerPoint

- PDF

- Share drive


Problems in Integrating Unstructured Data

• Tone can vary wildly

• Not in traditional spatial format

• May or may not contain explicit locational information

• Locational information may take many forms

- Coordinates

- Place-names

- Address


How to Integrate Unstructured Data into ArcGIS

What are you looking for?

• Native Esri capability: coordinates, custom locations, user-defined keywords

• Third-party integration: locations, people/organizations, events, dates, relationships

What is the best tool?

• Native Esri capability: ArcGIS Pro 2.3

• Third-party integration: Natural Language Processing

How is it best used?

• Native Esri capability (ArcGIS Pro 2.3)

- Data is at least somewhat understood

- Data benefits from identifiable and repeating patterns

- Little to no programming experience available/needed

• Third-party integration (Natural Language Processing)

- Data is not well understood

- Data does not contain identifiable and/or repeating patterns

- Integration needed


Extracting Locations with ArcGIS

• LocateXT Extension for ArcGIS Desktop and Enterprise

• Available in ArcGIS Pro 2.3

• Also available for ArcMap

• Uses pattern matching (regular expressions, REGEX) to search for coordinates in a variety of formats

• Uses custom location list to match/extract other patterns (place names, codes, other terms)

• Also extracts from GPS-tagged photos (EXIF)

• Multiple ways to initiate location extraction
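To make the pattern-matching idea concrete, here is a minimal sketch in plain Python that finds degrees-minutes-seconds coordinates in free text and converts them to decimal degrees. It is not LocateXT's implementation: `DMS_PATTERN`, `dms_to_decimal`, and `extract_dms` are hypothetical names, and the pattern handles only one of the many coordinate formats a real extractor would need to recognize.

```python
import re

# Hypothetical pattern for one format only: "DD MM SS.ssH DDD MM SS.ssH",
# for example "34 10 9.51N 73 14 32.78E".
DMS_PATTERN = re.compile(
    r"(?P<lat_d>\d{1,2})\s+(?P<lat_m>\d{1,2})\s+(?P<lat_s>\d{1,2}(?:\.\d+)?)\s*(?P<lat_h>[NS])"
    r"\s+"
    r"(?P<lon_d>\d{1,3})\s+(?P<lon_m>\d{1,2})\s+(?P<lon_s>\d{1,2}(?:\.\d+)?)\s*(?P<lon_h>[EW])"
)


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def extract_dms(text):
    """Return one dict per coordinate pair found in the text."""
    results = []
    for m in DMS_PATTERN.finditer(text):
        lat = dms_to_decimal(m["lat_d"], m["lat_m"], m["lat_s"], m["lat_h"])
        lon = dms_to_decimal(m["lon_d"], m["lon_m"], m["lon_s"], m["lon_h"])
        results.append({"match": m.group(0), "lat": lat, "lon": lon, "start": m.start()})
    return results


if __name__ == "__main__":
    sample = "Convoy seen at 34 10 9.51N 73 14 32.78E heading north."
    for hit in extract_dms(sample):
        print(hit)
```

Keeping the character offset (`start`) with each hit is a deliberate choice: it lets later steps look at the text surrounding a location.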


Extracting Locations in ArcGIS Pro

• New option added to the “Add Data” button

• Allows a user to drag and drop documents or copied text into a window

• Can create a new feature class or append to an existing one


Extracting Locations in ArcGIS Pro

• Two Geoprocessing Tools added

• Located in the Conversion Tools – To Geodatabase toolset

- Extract Locations from Document

- Extract Locations from Text


James Jones

Extracting Locations from Text in ArcGIS Pro


Extracting Custom Attributes

• Ability to create custom attributes based on content within document or near a location

- Triggered by location extraction

• Based on keywords

- Tag locations based on keywords

- Scrape/harvest portions of document based on keywords

• Ability to extract based on:

- Number of characters/words

- Number of lines/blank line

- Stop string

• Built in separate LocateXT desktop application (until Pro 2.4)
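The capture rules listed on this slide (a keyword trigger, then a limit set by a word count or a stop string) can be sketched in plain Python. This is only an illustration of the idea, not how LocateXT implements it; `capture_after_keyword` and its parameters are hypothetical.

```python
import re


def capture_after_keyword(text, keyword, max_words=None, stop_string=None):
    """Return the text following the first occurrence of keyword, or None.

    The capture ends at stop_string (if given and found) and is then
    truncated to max_words words (if given).
    """
    match = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if match is None:
        return None
    captured = text[match.end():]
    if stop_string is not None:
        stop = captured.find(stop_string)
        if stop != -1:
            captured = captured[:stop]
    if max_words is not None:
        captured = " ".join(captured.split()[:max_words])
    return captured.strip()


if __name__ == "__main__":
    report = (
        "Subject: Vehicle checkpoint\n"
        "Location: 34 10 9.51N 73 14 32.78E\n"
        "Remarks: Two trucks observed. END"
    )
    # Capture to the end of the line.
    print(capture_after_keyword(report, "Subject:", stop_string="\n"))
    # Capture a fixed number of words.
    print(capture_after_keyword(report, "Remarks:", max_words=2))
```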


Extracting Custom Attributes

Tag extracted locations based on keyword found in source document


Extracting Custom Attributes

Tag extracted locations based on keyword found in proximity to location
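Proximity tagging can be illustrated with a character window around an extracted location. The sketch below assumes the location's start and end offsets are already known; `tag_by_proximity` and the `window` parameter are hypothetical, and the actual product may measure proximity differently (for example, in words or lines).

```python
import re


def tag_by_proximity(text, location_span, keywords, window=50):
    """Return the keywords found within `window` characters of a location.

    location_span is the (start, end) character offsets of the location.
    """
    start, end = location_span
    context = text[max(0, start - window): end + window]
    tags = []
    for keyword in keywords:
        # Whole-word, case-insensitive match inside the context window.
        if re.search(r"\b" + re.escape(keyword) + r"\b", context, flags=re.IGNORECASE):
            tags.append(keyword)
    return tags


if __name__ == "__main__":
    text = (
        "An IED was reported near Baqubah on Tuesday. Much later, a separate "
        "paragraph mentions a protest in another city entirely."
    )
    span = re.search("Baqubah", text).span()
    print(tag_by_proximity(text, span, ["IED", "protest"], window=30))
```

With a 30-character window only the nearby keyword is returned; widening the window trades precision for recall.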


Custom capture text based on keywords found in proximity to location

Location trigger


Custom capture text based on keywords found in proximity to location


Building Custom Attributes and ETL data


What is Natural Language Processing?

• Field of computer science and artificial intelligence since the 1950s

• Machine learning algorithms for NLP introduced in the 1980s

• Early focus was primarily on machine translation

• Focused on four key areas:

- Syntax

- Semantics

- Discourse

- Speech


Main Fields of NLP

Syntax

• Part of Speech Tagging*

• Parsing

• Word Segmentation

• Terminology Extraction

Discourse

• Automatic Summarization

• Coreference Resolution

• Discourse Analysis

Semantics

• Machine Translation*

• Named Entity Recognition*

• Optical Character Recognition*

• Relationship Extraction*

• Sentiment Analysis*

• Topic Segmentation

• Text Similarity

Speech

• Speech Recognition

• Speech Segmentation

• Text-to-speech


NLP Integration

• Numerous third-party tools exist

- Open Source

- Proprietary / As A Service

• Identify and extract named entities

• Link entities and create semantic relationships

• Organize data into an ontology

• Classify sentiment, topic identification, noun-phrase/verb extraction

[Diagram labels: NLP Tools (NLTK), APIs, Apps, Desktop, ArcGIS]


Entities and Relationships

Entities (spatial)

- Saudi Arabia

- 285 Fulton St, New York, NY 10007

- 34 10 9.51N 73 14 32.78E

- Hadhramaut, Yemen

- approximately 5 miles northwest of Baqubah

Entities (non-spatial)

- Osama bin Laden

- Terrorist

- US Embassy

- US Special Forces

- August 20, 1998

- 66 cruise missiles

Links

- Osama bin Laden -- Saudi Arabia (birthplace)

- US Embassy -- Kenya

Events

- Osama bin Laden attacked World Trade Center

- Abu Musab al-Zarqawi was killed June 7, 2006
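One way to hold extracted entities and the links between them is a pair of small record types. The sketch below uses the slide's own examples; the `Entity` and `Link` classes, their field names, and the `places_linked_to` helper are hypothetical and not taken from any particular NLP tool. The slide gives no relation type for the US Embassy and Kenya link, so it is recorded as "unspecified".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str              # e.g. "person", "place", "organization", "date"
    spatial: bool = False  # True when the entity can be placed on a map


@dataclass(frozen=True)
class Link:
    source: Entity
    target: Entity
    relation: str


def places_linked_to(links, entity):
    """Return (place name, relation) pairs for spatial entities linked from entity."""
    return [
        (link.target.name, link.relation)
        for link in links
        if link.source == entity and link.target.spatial
    ]


if __name__ == "__main__":
    bin_laden = Entity("Osama bin Laden", "person")
    saudi_arabia = Entity("Saudi Arabia", "place", spatial=True)
    embassy = Entity("US Embassy", "organization")
    kenya = Entity("Kenya", "place", spatial=True)

    links = [
        Link(bin_laden, saudi_arabia, "birthplace"),
        Link(embassy, kenya, "unspecified"),
    ]
    print(places_linked_to(links, bin_laden))
```

Linking a non-spatial entity to a spatial one is what allows a person, organization, or event to be mapped at all.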


Possible Use Cases of Unstructured Data

• Deriving locations from text

• Analyzing and enhancing existing spatial data containing attributes with free-text narrative


NLP Integration with ArcGIS


How to Integrate Unstructured Data into ArcGIS Enterprise

• New message comes in to folder

• Custom Python script leveraging LocateXT processes message

• Script outputs JSON file to a network-accessible folder

• GeoEvent monitors folder

• GeoEvent updates features in ArcGIS Enterprise
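The script portion of this workflow (read a message from a folder, extract locations, write JSON to an output folder) might look roughly like the sketch below. It makes several assumptions: a simple regular expression for decimal "lat, lon" pairs stands in for the LocateXT extraction step, the folder names and the JSON field names (`source`, `lat`, `lon`) are hypothetical, and the real output schema would have to match whatever the GeoEvent input is configured to expect.

```python
import json
import re
from pathlib import Path

# Stand-in for the LocateXT step: matches decimal pairs such as "33.7445, 44.6436".
COORD_PATTERN = re.compile(r"(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def process_message(path, out_dir):
    """Extract coordinates from one message file and write them as JSON."""
    text = path.read_text(encoding="utf-8")
    features = [
        {"source": path.name, "lat": float(lat), "lon": float(lon)}
        for lat, lon in COORD_PATTERN.findall(text)
    ]
    out_path = Path(out_dir) / (path.stem + ".json")
    out_path.write_text(json.dumps(features, indent=2), encoding="utf-8")
    return out_path


def process_inbox(in_dir, out_dir):
    """Process every .txt message in in_dir; return the JSON files written."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return [process_message(p, out_dir) for p in sorted(Path(in_dir).glob("*.txt"))]


if __name__ == "__main__":
    # Hypothetical folder names; the output folder is the one GeoEvent would monitor.
    for written in process_inbox("inbox", "geoevent_watch"):
        print("wrote", written)
```

Writing one JSON file per message keeps the script stateless; a production version would also need to track or move processed messages so they are not re-emitted on the next run.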


Print Your Certificate of Attendance

Print Stations Located at L Street Bridge

Tuesday, Wednesday

12:30 pm – 6:30 pm GIS Solutions Expo, Hall D

5:15 pm – 6:30 pm GIS Solutions Expo Social, Hall D

10:45 am – 5:15 pm GIS Solutions Expo, Hall D

6:30 pm – 9:00 pm Networking Reception, National Museum of Natural History


Please Take Our Survey on the App

Download the Esri Events app and find your event

Select the session you attended

Scroll down to find the feedback section

Complete answers and select “Submit”

