Mapping of Geographical Entity with Meeting Location from Text for Mobile

2011. 9. 30

Kyoungryol Kim

Table of Contents

1. Introduction

2. Background and Related Work

3. The Proposed System

4. Experimentation

5. Conclusion


1. Introduction

1) Motivation

2) Problem Definition

3) Contribution

Motivation: IE Techniques on Smartphones

Smartphone platforms: Apple iPhone, Google Android, RIM BlackBerry, MS Windows Phone

Recognition features shown (captured from Apple iPhone): Time (Text) Recognition, Phone No. Recognition, Location (Text) Recognition, Address Recognition

Example: adding an event from a recognized time (May 21, 2011)

People are starting to pay attention to the 'Location Extraction' technique.

Motivation: Characteristics of Mobile Devices

Memory Issue
Android: 16MB heap size limit for each app.
iPhone: no memory limit, but only 512MB of RAM in total (iPhone 4).

Speed Issue
People who use mobile devices usually find delays uncomfortable.

IE System
A general Information Extraction system usually consists of many NLP modules, which together consume more than 1GB of memory.

Client-Server Model
The client and the server communicate, and all processing is done on the server side.
An internet connection (3G or Wi-Fi) is needed.
If many clients send requests to the server at once, there will be overload delays or the server will go down.

An IE method specialized for mobile devices is needed.

Goal of this Research

Map the meeting location text to a geographical location and update the online calendar on the mobile device with it.

Meeting Announcement (input):

The team meeting for the evaluation of the first half of Univcast will be held.
Date : July 19 (Sat) PM 2
Location : Myeong-dong Dandelion Territory
Directions to Dandelion Territory: From gate number 8 of Myeong-dong station, walk toward downtown; it is on the first floor of the YMCA building.

Extract Meeting Location:

Name: Myeong-dong Dandelion Territory
Address: 1-1, Myeong-dong 1-ga, Jung-gu, Seoul, Korea
Geocode: (37.5647312, 126.9861426)

Extract Time:

startTime: 2011-07-19T14:00

Update Calendar: the extracted meeting location and start time are written to the calendar.
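The structured output on this slide can be pictured as a small record that is handed to the calendar. The following is a minimal sketch, not the author's implementation: the class name `MeetingLocation`, the function `to_calendar_event` and the dictionary layout are all hypothetical, and only the field values come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class MeetingLocation:
    """Structured meeting location; the three fields follow the slide."""
    name: str
    address: str
    geocode: Tuple[float, float]  # (latitude, longitude)


def to_calendar_event(title: str, start: datetime, loc: MeetingLocation) -> dict:
    """Build a plain dict for a calendar update (hypothetical payload layout)."""
    lat, lon = loc.geocode
    return {
        "title": title,
        # Same timestamp format as the 'startTime' field on the slide.
        "startTime": start.strftime("%Y-%m-%dT%H:%M"),
        "location": {"name": loc.name, "address": loc.address, "lat": lat, "lon": lon},
    }


location = MeetingLocation(
    name="Myeong-dong Dandelion Territory",
    address="1-1, Myeong-dong 1-ga, Jung-gu, Seoul, Korea",
    geocode=(37.5647312, 126.9861426),
)
event = to_calendar_event("Univcast team meeting", datetime(2011, 7, 19, 14, 0), location)
print(event["startTime"])
```

A real calendar API would need its own payload format; the dict here only shows which extracted pieces travel together.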

Problem Definition

1. Extract the meeting location from a meeting announcement email

2. Disambiguate the extracted meeting location

Example: 회의는 오후 5시 학생회관 101호에서 열립니다.

(The meeting will be held at 5 PM in Room 101, Student Union.)


2. Background and Related Work

1) Information Extraction

2) Geocoding

3) Linked Open Data

4) Local-Grammar Graph

Information Extraction

Information Extraction: the objective is to construct a structured database from free text or semi-structured text (J. H. Kim 2004).

Related Work: CMU Seminar Announcement Corpus
485 semi-structured seminar announcements
Types: stime, etime, location, speaker
Focuses only on extracting these 4 types of information, not on geocoding.

(Figure: examples of seminar announcements)

Geocoding

Geocoding: the process of finding associated geographic coordinates, often expressed as latitude and longitude, from other geographic data such as street addresses or zip codes (Geocoding, Wikipedia).

Related Work: geocoding from an address (Manov 2003; Jones 2003; Peng 2006; Pouliquen 2006; Volz 2007; Overell 2007; Goldberg 2007; Kauppinen 2008)

The main issue in this line of research is the disambiguation of addresses (Pouliquen et al. 2006):

1. Multi-referent ambiguity: two different geographic locations share the same name, e.g. is "Cambridge" Cambridge, UK or Cambridge, Massachusetts?

2. Name variant ambiguity: the same location has different names.

3. Geoname/non-geoname ambiguity: a location name could also stand for some other word, such as a person name or a common noun, e.g. Metro as the city in Indonesia vs. Metro as the subway system.

Limitation: these works focus only on geocoding addresses, not on all location entities, e.g. "Room 101, Student Union, Hanyang University".

Linked Open Data

Linked Open Data (URL: http://linkeddata.org)

The project aims to identify data sets that are available under open licenses, re-publish them in RDF on the Web and interlink them with each other.

Geographic datasets are growing rapidly.

Because only a few Korean geographical datasets are included in LOD, this research treats a set of open geographical data as Linked Data.

(Figure: LOD cloud diagrams from March 2009, September 2010 and September 2011)

Local-Grammar Graph

Local-Grammar Graph: a language description model for automatic analysis and generation of natural language text and for information extraction, using local language information in the form of Finite-State Automata (J. Nam 2006).

It helps to increase efficiency and accuracy by lexicalizing the knowledge, and improves grammar readability by representing the grammar as Directed Acyclic Graphs.

It can describe various omissions and permutations that cannot be captured by rules or specific features.

Example of LGG for 176 kinds of French wine:

un vin rouge de Bordeaux
un vin de Bordeaux rouge
un rouge de Bordeaux
un Bordeaux rouge
un Bordeaux
un rouge
....
du vin d'Alsace blanc
du vin blanc d'Alsace
du blanc d'Alsace
de l'Alsace
de l'Alsace blanc
du blanc
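How a graph covers omissions and permutations can be seen on the six "un ... Bordeaux" variants listed above. The sketch below is an illustration only: the state names and the `GRAPH` layout are hypothetical, and it covers just those six phrases, not the 176-wine grammar from the slide. Each path from `start` to a final state spells one accepted phrase.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Directed acyclic graph: state -> list of (edge label, next state).
# State names are illustrative and not taken from the slide.
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "start": [("un", "det")],
    "det": [("vin", "vin"), ("rouge", "colour"), ("Bordeaux", "region")],
    "vin": [("rouge", "vin_colour"), ("de Bordeaux", "vin_region")],
    "vin_colour": [("de Bordeaux", "end")],
    "vin_region": [("rouge", "end")],
    "colour": [("de Bordeaux", "end")],   # "un rouge" may also stop here
    "region": [("rouge", "end")],         # "un Bordeaux" may also stop here
    "end": [],
}
FINAL: Set[str] = {"colour", "region", "end"}


def phrases(state: str = "start", prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Enumerate every phrase spelled by a path that ends in a final state."""
    if state in FINAL:
        yield " ".join(prefix)
    for label, nxt in GRAPH[state]:
        yield from phrases(nxt, prefix + (label,))


def accepts(text: str) -> bool:
    """True if the graph generates exactly this phrase."""
    return text in set(phrases())


for p in phrases():
    print(p)
```

Adding the "du ... d'Alsace blanc" family would mean adding edges, not rewriting rules, which is the readability argument the slide makes for the graph form.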


3. The Proposed System

1) Preliminaries

2) Overall Architecture

3) Extraction Module

4) Disambiguation Module

Overall Architecture

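Components: Extraction Module (Finite-State Transducers), Disambiguation Module (Linked Data, Personal Geo Data), Template Generator

The diagram spans a Mobile Device and a Server, which exchange a Query and a Disambiguated Result.

INPUT (translated from Korean):

Subject: Team leader meeting notice. The last team leader meeting of 2008 will be held at Jongno Toz on Saturday, November 22, at 2 PM. Contract renewals and business card distribution will take place, so all team leaders and incoming team leaders are asked to attend. Directions: get off at exit 4 of Jonggak station in Jongno and walk about 100m; it is on the right.

OUTPUT (translated from Korean):

Team leader meeting notice
Location
Name: Jongno Toz (종로 토즈)
Address: 84-8, Jongno 1.2.3.4-ga-dong, Jongno-gu, Seoul, Korea
GPS coordinates: (37.569914, 126.984924)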

Extraction Module (1/2)

1. Construct the Local-Grammar Graph (LGG)
Find local patterns around the meeting location, inductively.
Scope of local patterns: the sentence that includes the meeting location, plus the previous and next sentences.
Describe local patterns with 110 information types under 7 categories: Location, Time, Title, Actor, Label, Connecting words, Etc.
e.g. '장소 :' ("Venue:") is the 'locLbl' information type under the 'Label' category.

2. Convert the LGG to a Finite-State Transducer (FST)

3. Extract the meeting location with the FST

Example:
2. 학술대회 일정 : 2003년 5월 17일 (토요일) 10:30 ~ 16:30 (2. Conference schedule: Saturday, May 17, 2003, 10:30 ~ 16:30)
3. 학술대회 장소 : 성공회대학교 피츠버그관 (3. Conference venue: Sungkonghoe University, Pittsburgh Hall)
4. 학술대회 순서 (4. Conference program)
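The role of a label such as 'locLbl' can be illustrated on the example lines above. This sketch swaps in a single regular expression as a stand-in for the FST compiled from the LGG, which the slides do not show; the pattern `LOC_LABEL` and the function `extract_meeting_location` are hypothetical. It only handles the simplest case, a line whose label is followed directly by the location.

```python
import re
from typing import Optional

# Hypothetical stand-in for the 'locLbl' information type: the label "장소"
# (venue), optionally written with inner spaces ("장 소"), followed by a colon
# (ASCII or full-width). Everything up to the end of the line is captured.
LOC_LABEL = re.compile(r"장\s*소\s*[::]\s*(?P<location>[^\n]+)")


def extract_meeting_location(text: str) -> Optional[str]:
    """Return the text after the first venue label, or None if no label is found."""
    match = LOC_LABEL.search(text)
    return match.group("location").strip() if match else None


announcement = (
    "2. 학술대회 일정 : 2003 년 5 월 17 일 ( 토요일 ) 10:30 ~ 16:30\n"
    "3. 학술대회 장소 : 성공회대학교 피츠버그관\n"
    "4. 학술대회 순서\n"
)
print(extract_meeting_location(announcement))
```

A combined label such as "일시 및 장소" (date, time and venue) would make this pattern capture the time as well, which is the kind of local variation the 110 information types are meant to describe.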

Extraction Module (2/2)

Category of LGG for Meeting Location

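1. One venue

1.1. Place
1.1.1. Place
1.1.2. Place 1_1 | Place 1_2
1.1.3. Place 1_1 | Place 1_2 | Place 1_3

1.2. Place + Landmark
1.2.1. Place | Landmark
1.2.2. Place 1_1 | Place 1_2 | Landmark
1.2.3. Place | Landmark 1 | Landmark 2

1.3. Place + Address
1.3.1. Place | Address
1.3.2. Place 1_1 | Place 1_2 | Address
1.3.3. Place 1_1 | Place 1_2 | Place 1_3 | Address
1.3.4. Place | Landmark | Address

2. N venues (N>1)

2.1. Two venues
2.2. Three venues
2.3. Four venues

Examples:

1. 일시 및 장소 : 2010. 5. 12(수) 14:00~16:00, 무역협회 중회의실 (삼성동 트레이드 타워 51층)
(1. Date, time and venue: 2010. 5. 12 (Wed) 14:00~16:00, Trade Association medium conference room (Trade Tower 51st floor, Samseong-dong))

3. 장 소 : 울산광역시 울주군 상북면 등억리 27번지 먹고쉬었다가 (052-263-1206)
(3. Venue: 27 Deungeok-ri, Sangbuk-myeon, Ulju-gun, Ulsan, Meokgo Swieotdaga (052-263-1206))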

Disambiguation Module (1/2)

Problem: multi-referent ambiguity (Pouliquen et al. 2006)
Two different geographic locations share the same name, e.g. is "Cambridge" Cambridge, UK or Cambridge, Massachusetts?

Disambiguation by Linked Data

Personal Geo Data: personalized OpenStreetMap. The user can map a geographical location to the 'meeting location' and save it (to be applied, in consultation with Claus at Leipzig Univ.)

Open Geo Data: Naver Local Search API, Yahoo! POI Search API, Seoul Bus-stop DB

Disambiguation by applying a ranking algorithm (the idea will be borrowed from meta-search research): the location is disambiguated to the 1st-ranked geographical location.

Disambiguation Module (2/2)

Query: 동측식당 (East Cafeteria), Email: [email protected]

Personal Geo Data: [email protected], 동측식당 <36.369051,127.363757>

Open Geo Data (Naver Local API, Yahoo! POI API, Seoul Bus-stop) candidates:

동측식당 <37.19051,123.363757>
동측식당 <36.347001,127.396285>
동측식당 <36.998166,126.894287>
동측식당 <37.55111,126.93219>
.......

Disambiguation result: 동측식당 <36.369051,127.363757>
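The flow on this slide (check Personal Geo Data first, otherwise rank the Open Geo Data candidates and take the top one) can be sketched as follows. The slides do not specify the ranking algorithm, so a Borda-style count, a common meta-search fusion method, is swapped in here as an assumption. The function name `disambiguate`, the placeholder address `user@example.com` and the assignment of candidates to sources are all hypothetical; only the coordinates come from the slide.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def disambiguate(
    query: str,
    email: str,
    personal: Dict[Tuple[str, str], Coord],
    source_rankings: Dict[str, List[Coord]],
) -> Optional[Coord]:
    """Pick one coordinate for a location name.

    A location the user saved earlier wins outright. Otherwise the ranked
    candidate lists from the open sources are fused with a Borda-style count
    (an assumed method) and the top-scoring candidate is returned.
    """
    saved = personal.get((email, query))
    if saved is not None:
        return saved
    scores: Dict[Coord, float] = defaultdict(float)
    for ranking in source_rankings.values():
        for rank, coord in enumerate(ranking):
            scores[coord] += len(ranking) - rank  # earlier rank, more points
    if not scores:
        return None
    return max(scores, key=lambda c: scores[c])


personal_geo = {("user@example.com", "동측식당"): (36.369051, 127.363757)}
open_geo = {
    # Which source returned which candidate is made up for illustration.
    "naver_local": [(36.347001, 127.396285), (37.55111, 126.93219)],
    "yahoo_poi": [(36.347001, 127.396285), (36.998166, 126.894287)],
}

print(disambiguate("동측식당", "user@example.com", personal_geo, open_geo))
print(disambiguate("동측식당", "other@example.com", personal_geo, open_geo))
```

Fusing by exact coordinate match is a simplification; real sources would rarely agree to six decimal places, so candidates would first need to be clustered by distance.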


4. Experimentation

1) Experiment Data

2) Extraction Module

3) Disambiguation Module

Experiment Data

Meeting announcement corpus

1,101 meeting announcements

Collected from the web with the keyword 'notice'

Annotation: 10 types of term, 13 types of relation; 3 human annotators using the COAT annotation toolkit

Extraction Module

Exp1. Extraction speed/memory comparison
Baseline system: ML-based system
Dataset: the already gathered corpus (training/test set)

Exp2. Extraction performance comparison
Baseline system: ML-based system
Evaluation: Precision/Recall/F-measure
Dataset: the already gathered corpus (training/test set) and a newly gathered corpus

(Experiments are still to be carried out)
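The Precision/Recall/F-measure evaluation named in Exp2 can be computed as below. This is a generic sketch with made-up data: the function name `precision_recall_f1` and the sample document/location pairs are hypothetical and say nothing about the actual results, which the slides do not report.

```python
from typing import Set, Tuple

Item = Tuple[int, str]  # (document id, extracted location string)


def precision_recall_f1(predicted: Set[Item], gold: Set[Item]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure over extracted items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical system output and annotations, for illustration only.
predicted = {(1, "Room 101, Student Union"), (2, "Jongno Toz"), (3, "City Hall")}
gold = {(1, "Room 101, Student Union"), (2, "Jongno Toz"), (4, "Trade Tower")}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```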

Disambiguation Module

Exp1. Accuracy by distance
6 distance ranges: 0≤x<100m, 100m≤x<1km, 1km≤x<2km, 2km≤x<3km, 3km≤x<5km and 5km≤x

Exp2. Accuracy improvement with Personal Geo Data
Evaluation: the performance is hard to measure directly, so some scenarios will show how Personal Geo Data can be applied to improve accuracy.

Exp3. Comparison of ranking algorithm performance

Exp4. Disambiguation speed/memory comparison
Comparison of processing and communication speed and memory use on the server vs. on the mobile device

(Experiments are still to be carried out)
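The distance ranges of Exp1 can be applied to a pair of coordinates as sketched below. This assumes x is the great-circle distance between the disambiguated coordinate and a reference coordinate, which the slide does not state, and it treats the ranges as half-open so that each distance falls into exactly one. The names `haversine_m` and `distance_bin` and the bin labels are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance between two coordinates, in metres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# Upper edges (metres) of the first five ranges; the sixth range is open-ended.
BIN_EDGES_M = [100, 1000, 2000, 3000, 5000]
BIN_LABELS = ["<100m", "100m-1km", "1km-2km", "2km-3km", "3km-5km", ">=5km"]


def distance_bin(distance_m: float) -> str:
    """Map a distance to one of the six ranges (lower bound inclusive)."""
    return BIN_LABELS[bisect_right(BIN_EDGES_M, distance_m)]


# Two of the candidate coordinates from the disambiguation slide.
d = haversine_m((36.369051, 127.363757), (36.347001, 127.396285))
print(round(d), distance_bin(d))
```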