Real-time Natural Language Processing for Crowdsourced Road Traffic Alerts

C.D. Athuraliya, M.K.H. Gunasekara, Srinath Perera, Sriskandarajah Suhothayan

http://bit.ly/1NwBXTv


Posted on 16-Apr-2017



Overview

● Introduction

● Background

● Solution & Methodology

● Results & Conclusion

Introduction

● The success of modern enterprises and businesses relies heavily on how they process massive amounts of data

● “Drowning in data yet starving for knowledge”

● With the emergence of social media, the public has gained the potential to generate massive amounts of data

● But we still struggle to extract useful information from this data


Introduction

● Road traffic congestion has become a major issue, particularly in developing countries

● It directly affects a country’s economy and development through wasted resources – Fuel, time

● Technology-based solutions – Proven successful in a number of cases

● This study focused on one such solution, which emerged through the use of social media

● Twitter – Popular for dynamic content publishing

○ Users publish on topics such as current affairs, news, politics and personal interests via 140-character messages called tweets


Background

● Road.lk – A website that provides localized traffic alerts from a Twitter feed

● Experiencing road traffic or have information on road traffic? Tweet about it!

● All users who follow @road_lk receive traffic alerts in near real-time

● Identified as a potential source for extracting road traffic information in real-time

● Reliability maintained by a large number of publishers


Background – @road_lk Feed


Background

● The potential is significant for a country like Sri Lanka – Due to the lack of high-tech traffic monitoring systems

● Several limitations:

○ Connectivity requirement

○ No proper alert mechanism other than the Twitter feed or the road.lk website

● Notable limitation – Users post traffic updates in natural language

● A fixed format would make tweets easier to process, but it would reduce the flexibility of sharing updates
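The trade-off in the last point can be made concrete with a small sketch. It assumes a hypothetical fixed tweet format, `TRAFFIC <level> @ <location>`, which is not a format @road_lk actually uses. Tweets that follow it parse with a single regular expression, while free-form updates fall through and would need the NLP pipeline instead.

```python
import re
from typing import Optional, Tuple

# Hypothetical fixed format, e.g. "TRAFFIC heavy @ Galle Road" (illustration only)
FIXED_FORMAT = re.compile(
    r"^TRAFFIC\s+(?P<level>\w+)\s+@\s+(?P<location>.+)$", re.IGNORECASE
)


def parse_fixed_format(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (level, location) if the tweet follows the fixed format, else None."""
    match = FIXED_FORMAT.match(tweet.strip())
    if match is None:
        return None  # free-form text: would need NLP instead of a regex
    return match.group("level").lower(), match.group("location").strip()
```

`parse_fixed_format("TRAFFIC heavy @ Galle Road")` yields `("heavy", "Galle Road")`, but a natural update such as "Galle rd is jammed near Bambalapitiya, avoid!" yields `None`, which illustrates the flexibility lost when users must follow a format.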


Solution & Methodology

● A prototype solution was implemented by combining natural language processing (NLP) and complex event processing (CEP) tools

● Supports three use cases:

○ Real-time road traffic feed and geolocation map

○ Traffic search within an area

○ Traffic alert subscription

● Developed an architecture for these use cases

● Multiple tools were used to retrieve, process and present information
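The traffic alert subscription use case can be pictured as a registry that maps locations to subscribers and fans out each new alert. The sketch below is a hypothetical in-memory version keyed on case-insensitive location names; the class name, method names and message text are assumptions, since the slides do not describe how the prototype stores or delivers subscriptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


class AlertSubscriptions:
    """Hypothetical in-memory registry: location name -> subscribed users."""

    def __init__(self) -> None:
        self._by_location: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, user: str, location: str) -> None:
        # Normalize case so "Galle Road" and "galle road" share subscribers
        self._by_location[location.lower()].add(user)

    def dispatch(self, location: str, level: str) -> List[str]:
        """Build one notification per user subscribed to this location."""
        # .get() avoids inserting empty entries into the defaultdict
        users = sorted(self._by_location.get(location.lower(), set()))
        return [f"{user}: {level} traffic reported at {location}" for user in users]
```

In use, each alert leaving the CEP pipeline would call `dispatch(location, level)`; matching on exact names keeps the sketch short, whereas a geographic radius (as in the search use case) would be more forgiving of naming variations.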


Solution & Methodology – Architecture


Solution & Methodology – Feed

● Feed retrieval – Access Twitter via its APIs

● Existing (historical) feed for generating the model training dataset

○ REST API, Twitter4J

● Real-time feed stream for alert generation

○ Streaming API, WSO2 Enterprise Service Bus Twitter connector
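Historical tweets pulled through the REST API have to be labeled before models can be trained on them. The sketch below writes labeled tweets into the plain-text training formats that Apache OpenNLP's document categorizer (category, then the document text, one per line) and name finder (`<START:type> ... <END>` markup around whitespace-separated tokens) read. The category and entity type names, and the example labels, are assumptions for illustration.

```python
from typing import List, Tuple


def doccat_line(category: str, text: str) -> str:
    """Document categorizer training line: the category, then the text."""
    return f"{category} {' '.join(text.split())}"


def ner_line(tokens: List[str], spans: List[Tuple[int, int, str]]) -> str:
    """Name finder training line with <START:type> ... <END> markup.

    spans are non-overlapping (start, end, type) token offsets, end exclusive.
    """
    starts = {start: kind for start, _, kind in spans}
    ends = {end for _, end, _ in spans}
    out: List[str] = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close a span that ended before this token
        if i in starts:
            out.append(f"<START:{starts[i]}>")
        out.append(token)
    if len(tokens) in ends:
        out.append("<END>")  # close a span that runs to the end of the tweet
    return " ".join(out)
```

For example, tokens `["Heavy", "traffic", "at", "Galle", "Road"]` with spans `(0, 1, "trafficlevel")` and `(3, 5, "location")` become `<START:trafficlevel> Heavy <END> traffic at <START:location> Galle Road <END>`.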


Solution & Methodology – NLP

● @road_lk Twitter feed

○ Reliable data source for generating real-time traffic alerts

○ Constrained by its natural language representation

● Transforming this data into a machine-readable representation makes it possible to use the full potential of this source

● Proposed an NLP model to address this problem

● Extracted two entities from a tweet – Location and traffic level

● Before extracting these two entities:

○ A tweet needed to be classified – Traffic alert or not?

○ Cleaning, preprocessing
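The ordering above (clean, classify, and only then extract) can be sketched as a small pipeline. The cleaning rules below (dropping URLs and @mentions, stripping `#` from hashtags, collapsing whitespace) are assumptions, since the slides do not list them; the classifier and extractor are passed in as callables standing in for the trained OpenNLP models.

```python
import re
from typing import Callable, Dict, Optional

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")


def clean_tweet(text: str) -> str:
    """Drop URLs and @mentions, strip '#' from hashtags, collapse whitespace."""
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = text.replace("#", "")
    return " ".join(text.split())


def process_tweet(
    text: str,
    is_traffic_alert: Callable[[str], bool],
    extract: Callable[[str], Dict[str, str]],
) -> Optional[Dict[str, str]]:
    """Clean, classify, and run entity extraction only on traffic alerts."""
    cleaned = clean_tweet(text)
    if not is_traffic_alert(cleaned):
        return None  # not a traffic alert: skip the NER stages entirely
    return extract(cleaned)
```

Gating on the classifier first means the two NER models only ever see tweets that are likely to contain a location and a traffic level, which reduces spurious extractions from unrelated tweets.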


Solution & Methodology – NLP

● NLP tasks required to classify and extract:

○ Tweet categorization

○ Location extraction

○ Traffic level extraction

● First task – A document categorization task

● Latter two – Named entity recognition (NER) tasks

● The Apache OpenNLP toolkit was used

● Custom tokenizer for street names and city names

● Traffic level NER task – A predefined set of words was selected for tagging

● Factors to consider – Spelling mistakes, informal language, abbreviations
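To show why spelling mistakes and abbreviations matter for the traffic level task, the sketch below is a rule-based stand-in, not the trained OpenNLP model used in the study. It maps a hypothetical vocabulary of traffic-level words to normalized levels, expands a few hypothetical abbreviations, and uses `difflib.get_close_matches` to tolerate typos such as "modrate".

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical vocabulary: traffic-level word -> normalized level
LEVEL_WORDS: Dict[str, str] = {
    "heavy": "high", "jam": "high", "congested": "high",
    "moderate": "medium", "slow": "medium",
    "clear": "low", "light": "low", "free": "low",
}
# Hypothetical informal abbreviations seen in tweets
ABBREVIATIONS: Dict[str, str] = {"hvy": "heavy", "mod": "moderate", "clr": "clear"}


def tag_traffic_level(tokens: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the level of the first token close to a vocabulary word, else None."""
    for token in tokens:
        word = ABBREVIATIONS.get(token.lower(), token.lower())
        # Fuzzy match tolerates small spelling mistakes (similarity >= cutoff)
        match = difflib.get_close_matches(word, list(LEVEL_WORDS), n=1, cutoff=cutoff)
        if match:
            return LEVEL_WORDS[match[0]]
    return None
```

With these assumptions, `["Hvy", "traffic", "at", "Galle", "Rd"]` is tagged `"high"` and `["Traffic", "is", "modrate", "near", "Borella"]` is tagged `"medium"`. A statistical NER model learns such variations from annotated data instead of relying on a hand-picked cutoff.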


Solution & Methodology – CEP

● Another important property of this data source – The Twitter feed must be processed in real-time

● Our approach was complex event processing (CEP)

● CEP is concerned with processing data from multiple sources in real-time

● Used WSO2 Complex Event Processor as the CEP tool to analyse and process the Twitter feed input stream

● Siddhi Query Language (SiddhiQL) is at the core of WSO2 CEP

● Designed to process event streams and identify complex event occurrences


Solution & Methodology – Siddhi Queries

```
-- Location NER on each classified tweet using the en-location.bin model
from classifiedStream#transform.nlp:getEntities(convertedText,4,true,"/_system/governance/en-location.bin")
select * insert into templocationStream;

-- Traffic level NER on each classified tweet using the en-trafficlevel.bin model
from classifiedStream#transform.nlp:getEntities(convertedText,1,false,"/_system/governance/en-trafficlevel.bin")
select * insert into temptrafficlevelStream;

-- Combine a classified tweet with its extracted traffic level and up to four locations
from S1=classifiedStream, S2=temptrafficlevelStream, S3=templocationStream
select S1.createdAt as time, S2.nameElement1 as trafficLevel, S3.nameElement1 as location1,
       S3.nameElement2 as location2, S3.nameElement3 as location3, S3.nameElement4 as location4
insert into locationsStream;

-- Search: join feed events from the last 120 minutes with each search request,
-- keeping events within +/-0.018 degrees latitude and +/-0.027 degrees longitude of it
from uiFeedStream#window.time(120 min) as trafficFeed join SearchEventStream as request
on (trafficFeed.latitude < request.latitude + 0.018 and trafficFeed.latitude > request.latitude - 0.018 and
    trafficFeed.longitude < request.longitude + 0.027 and trafficFeed.longitude > request.longitude - 0.027)
select trafficFeed.formattedAddress, trafficFeed.latitude, trafficFeed.longitude, trafficFeed.level, trafficFeed.time
insert into searchResult;
```
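For readers unfamiliar with SiddhiQL, the `searchResult` query can be approximated in plain Python: keep feed events from the last 120 minutes and return those inside a ±0.018° latitude by ±0.027° longitude box around the search request. Since one degree of latitude is about 111 km, that box is roughly 4 km by 6 km near Sri Lanka's latitude. This sketch filters on event timestamps, whereas Siddhi's `window.time` expires events based on their arrival time; the class and field names are assumptions mirroring the query's attributes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Bounding-box half-widths taken from the searchResult query (degrees)
LAT_DELTA = 0.018  # about 2 km north/south
LON_DELTA = 0.027  # about 3 km east/west at roughly 7 degrees N
WINDOW = timedelta(minutes=120)  # mirrors #window.time(120 min)


@dataclass
class TrafficEvent:
    """Hypothetical Python mirror of a uiFeedStream event."""
    formatted_address: str
    latitude: float
    longitude: float
    level: str
    time: datetime


def search(feed: List[TrafficEvent], lat: float, lon: float, now: datetime) -> List[TrafficEvent]:
    """Return events from the last 120 minutes inside the box around (lat, lon)."""
    return [
        event
        for event in feed
        if timedelta(0) <= now - event.time <= WINDOW
        and lat - LAT_DELTA < event.latitude < lat + LAT_DELTA
        and lon - LON_DELTA < event.longitude < lon + LON_DELTA
    ]
```

A rectangular box in degrees is cheap to evaluate inside a join condition; a true distance test (for example the haversine formula) would give a circular search area at the cost of more arithmetic per event.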


Solution & Methodology – CEP

● Siddhi queries define how to process and combine existing event streams to create new event streams

● SiddhiQL was extended with custom extensions for:

○ Tweet categorization

○ Named entity recognition

○ Geocoding

● The geocoding extension converts the extracted locations into geographic coordinates

● The search functionality used a time-based Siddhi window

○ To retrieve traffic alerts in a nearby geographic area within a predefined time period
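The slides do not say which geocoding service the extension calls. As a stand-in, the sketch below resolves the `location1` to `location4` slots of `locationsStream` against a caller-supplied gazetteer (a dictionary from normalized place names to coordinates) and returns the first match. The abbreviation table and function names are hypothetical; a deployed system would query a real geocoding service instead.

```python
from typing import Dict, Optional, Sequence, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)

# Hypothetical abbreviation expansions for informal place names
ABBREVIATIONS: Dict[str, str] = {"rd": "road", "jn": "junction"}


def normalize_place(name: str) -> str:
    """Lower-case, drop periods and expand abbreviations so lookups match."""
    words = name.lower().replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def geocode(
    locations: Sequence[Optional[str]], gazetteer: Dict[str, Coordinates]
) -> Optional[Coordinates]:
    """Return coordinates of the first extracted location the gazetteer knows."""
    for name in locations:
        if not name:
            continue  # empty slot, e.g. a tweet that named fewer than four places
        coords = gazetteer.get(normalize_place(name))
        if coords is not None:
            return coords
    return None
```

The gazetteer would be populated with verified coordinates keyed by normalized names such as `"galle road"`, so an informal mention like "Galle Rd." still resolves.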


Results & Conclusion

● Implemented a web-based interface to demonstrate the functionality

● Users can interact with this interface to make use of the three use cases

● NLP accuracy measured through the OpenNLP evaluation APIs

● A solution to extract useful information from a crowdsourced social networking service

● Using a combined NLP/CEP approach
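For the NER models, evaluation typically means comparing predicted entity spans against hand-annotated ones. The sketch below computes exact-match span precision, recall and F1, the kind of figures OpenNLP's evaluators report; it is an illustration of the metrics, not the OpenNLP implementation, and the span representation is an assumption.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span metrics: a prediction counts only if start, end and type all match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

If a tweet's gold spans are a traffic level at tokens 0 to 1 and a location at tokens 3 to 5, and the model finds the location plus one spurious location, precision, recall and F1 are all 0.5.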


Results & Conclusion – Web UI


Results & Conclusion

● Results demonstrate the potential of such a model for tackling real-time natural language processing tasks

● This model can be extended to other real-time unstructured data streams

● Transforming human-readable data into a machine-readable format enables deep processing of data to generate useful information and insights

○ Trend analysis

○ Pattern detection and prediction
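Once alerts are structured as (location, time) pairs, simple aggregations become possible. As a hypothetical example of trend analysis, the sketch below counts alerts per location and hour of day and returns the most frequent combinations, which could point to recurring congestion hotspots. The function name and the hour-of-day grouping are assumptions, not part of the presented system.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def hotspots(
    alerts: Iterable[Tuple[str, datetime]], top: int = 3
) -> List[Tuple[Tuple[str, int], int]]:
    """Count alerts per (location, hour of day) and return the most frequent pairs."""
    counts = Counter((location, when.hour) for location, when in alerts)
    return counts.most_common(top)
```

Two morning alerts on Galle Road at 8 a.m. on different days would rank `("Galle Road", 8)` above a single evening alert elsewhere; richer prediction would build on the same structured records.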


Thank you.