Extraction and Visualization of Geographical Names in Text ZHANG Xueying [email protected] Key Laboratory of Virtual Geographical Environment, Ministry of Education Nanjing Normal University Nov. 18, 2009


Page 1: Extraction and Visualization of Geographical Names in Text

Extraction and Visualization of Geographical Names in Text

ZHANG Xueying [email protected]

Key Laboratory of Virtual Geographical Environment, Ministry of Education Nanjing Normal University

Nov. 18, 2009

Page 2

Content

1. Background

2. Extraction of geographical names

3. Applications

Page 3

1.1 Disciplines concerned with geographic space

Resolution of geographical names

Generation of geographical names

GIS

Geography

Spatial model of the earth

Information and Library Sciences

Computer Science

Natural Language Processing

Computational Linguistics

Human-Computer Interaction

Cognitive Psychology

Medicine

Political and Social Sciences

Geophysics

Biology (botany/zoology/ecology)

Archeology

……

Page 4

1.2 What is a geographical name?

Related terms: location designator, location

Geographical named entity: a named entity expressed as a noun or a location expression.

Place name: the name by which a geographical place is known.

Toponym: a named point of reference in both the physical and cultural landscape on the Earth's surface.

Geographical name: essentially a label that distinguishes one part of the Earth's surface from another.

Page 5

1.3 Main tasks

Recognition: identify geospatial names in a text span and classify them into predefined geographical feature categories.

Resolution: look up candidate referents and use algorithms to select the correct referent for each recognized geographical name.

Page 6

1.4 Basic processing architecture

Layers: Applications, Representation, Extraction, Formalization, Dataset

Natural language processing and machine learning

Geospatial information

Geographical Information System

Natural language text

Page 7

1.5 Statistical models-ME

Maximum Entropy 1996 Natural language processing

√ no assumption of a normal distribution

√ no limits on the context features that can be used

√ learning cost of its parameters

√ considers single situations
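For reference, the conditional maximum entropy model is usually written in the log-linear form below, where each $f_i(x, y)$ is a binary feature of the context $x$ and candidate label $y$, and each $\lambda_i$ is a learned weight. This is the standard textbook parameterization, shown as a sketch; the presentation does not say which parameterization or training algorithm it used.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big)
```

Because the features are arbitrary indicator functions of the context, the model needs no distributional assumption about them, which is what the first two points above express.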

Page 8

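1.5 Statistical Models-HMM

Hidden Markov Model (HMM)

Markov property: the next state depends only on the current state.

Markov chain model: for observable state sequences (the state is known from the data).

Hidden Markov Model: for non-observable (hidden) states, which must be inferred from the observations.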
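To make "non-observable states" concrete, the sketch below decodes the most probable hidden label sequence for a token sequence with the Viterbi algorithm, the standard way an HMM tagger is applied. The two-label tag set (`LOC`, `O`) and every probability are hypothetical toy values, not figures from the presentation; unseen words get a small floor probability (`unk`), which is also an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy model: all probabilities are hypothetical, chosen only for illustration.
STATES = ("O", "LOC")  # "LOC" = part of a geographical name, "O" = other
START = {"O": 0.6, "LOC": 0.4}
TRANS = {"O": {"O": 0.7, "LOC": 0.3}, "LOC": {"O": 0.6, "LOC": 0.4}}
EMIT = {
    "O": {"位于": 0.5, "的": 0.4, "哈尔滨市": 0.1},
    "LOC": {"位于": 0.05, "的": 0.05, "哈尔滨市": 0.9},
}


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
    unk: float = 1e-6,
) -> Tuple[List[str], float]:
    """Return the most probable hidden-state path and its joint probability."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start[s]) + math.log(emit[s].get(obs[0], unk)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximises the path score into s.
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans[p][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans[prev][s])
                          + math.log(emit[s].get(obs[t], unk)))
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, math.exp(best[-1][last])


path, prob = viterbi(["位于", "哈尔滨市", "的"], STATES, START, TRANS, EMIT)
print(path, round(prob, 5))  # ['O', 'LOC', 'O'] 0.01944
```

Working in log space avoids numerical underflow on long sentences; the result is converted back to a probability only at the end.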

Page 9

1.5 Statistical Models-HMM

HMM in Computational Linguistics

Speech recognition

Part-of-speech tagging

Handwriting recognition

Machine translation

Page 10

1.6 Statistical Models-CRF

Conditional Random Field (CRF)

Much like a Markov random field, but conditioned on the observation sequence

An HMM is a CRF with very specific feature functions

A CRF is therefore a generalization of an HMM
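The HMM/CRF relationship can be made precise with the standard linear-chain CRF. In the derivation below, $a_{ij}$ are HMM transition probabilities, $b_i(o)$ are emission probabilities, and $y_0$ is a fixed start state (so $a_{y_0 y_1}$ plays the role of the initial probability). This is the textbook construction, included as a sketch rather than as the presenter's formulation: with indicator features weighted by log-probabilities, the CRF reproduces the HMM exactly.

```latex
\begin{aligned}
p(\mathbf{y} \mid \mathbf{x}) &= \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big), \\
f_{ij} &= \mathbf{1}[y_{t-1} = i,\ y_t = j], \qquad \lambda_{ij} = \log a_{ij}, \\
f_{io} &= \mathbf{1}[y_t = i,\ x_t = o], \qquad \lambda_{io} = \log b_i(o), \\
\exp\Big( \sum_{t} \sum_{k} \lambda_k f_k \Big) &= \prod_{t=1}^{T} a_{y_{t-1} y_t}\, b_{y_t}(x_t) = p(\mathbf{y}, \mathbf{x}), \\
Z(\mathbf{x}) &= \sum_{\mathbf{y}'} p(\mathbf{y}', \mathbf{x}) = p(\mathbf{x}).
\end{aligned}
```

Dividing the last two lines gives $p(\mathbf{y} \mid \mathbf{x}) = p(\mathbf{y}, \mathbf{x}) / p(\mathbf{x})$, the HMM posterior. A general CRF drops these restrictions and may use arbitrary, overlapping features of the whole observation sequence, which is the sense in which it generalizes the HMM.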

Page 11

Content

1. Background

2. Extraction of geographical names

3. Applications

Page 12

2.1 Diagram of CRF-based recognition

Inputs: dataset, linguistic characteristics, label granularity, feature template

CRF training and CRF test: simple geographical names

CCRF training and CCRF test: combined geographical names

Page 13

2.2 Linguistic characteristics

Language, history and culture

Special characters

Combined named units

Spatial relations

Page 14

2.3 Label granularity

Granularity: 1-gram, 2-gram, …, word, phrase, sentence, paragraph, discourse

1-gram: sparse data

Word: depends on word segmentation
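To show what the finest label granularities look like on Chinese text, the sketch below splits a string into overlapping character n-grams, the candidate labelling units at the 1-gram and 2-gram levels. The helper name `char_ngrams` is hypothetical; word-level units would additionally require a word segmenter, which is not shown.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Split a string into overlapping character n-grams (the labelling units)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


name = "哈尔滨市"  # "Harbin city", taken from the example in section 2.6
print(char_ngrams(name, 1))  # ['哈', '尔', '滨', '市']
print(char_ngrams(name, 2))  # ['哈尔', '尔滨', '滨市']
```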

Page 15

2.4 CCRF (cascaded CRF)

The upper recognition model: CT1, CT2, …, CTi, …, CTn

The lower recognition model: ST1, ST2, …, STi, …, STn

Input sequence: W1, W2, …, Wi, …, Wn
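One way to picture the cascade is as a data hand-off: the lower model labels simple geographical names (SGN), and its output becomes the input units of the upper model, which labels combined names (CGN). The sketch below shows only that hand-off. The BIO tags (`B-SGN`, `I-SGN`, `O`), the token segmentation, and the helper `collapse_simple_names` are assumptions for illustration; the presentation does not specify its tag scheme, and the CRF models themselves are not shown.

```python
from typing import List, Tuple


def collapse_simple_names(tokens: List[str], lower_tags: List[str]) -> List[Tuple[str, str]]:
    """Build the upper-layer input from the lower layer's output.

    Each maximal B-SGN/I-SGN run becomes a single unit labelled "SGN";
    every other token becomes its own unit labelled "O".
    """
    if len(tokens) != len(lower_tags):
        raise ValueError("tokens and tags must have the same length")
    units: List[Tuple[str, str]] = []
    for token, tag in zip(tokens, lower_tags):
        if tag == "I-SGN" and units and units[-1][1] == "SGN":
            # Continue the current simple name.
            units[-1] = (units[-1][0] + token, "SGN")
        elif tag in ("B-SGN", "I-SGN"):
            # Start a new simple name.
            units.append((token, "SGN"))
        else:
            units.append((token, "O"))
    return units


# Hypothetical segmentation and lower-layer output for the start of the 2.6 example.
tokens = ["位于", "黑龙江省", "哈尔滨市", "的", "哈尔滨市", "儿童公园", "为"]
tags = ["O", "B-SGN", "B-SGN", "O", "B-SGN", "I-SGN", "O"]
print(collapse_simple_names(tokens, tags))
```

The upper-layer model would then be trained to label the adjacent units 黑龙江省 and 哈尔滨市 together as one combined name (CGN), while 哈尔滨市儿童公园 remains a simple name, as in the 2.6 example.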

Page 16

2.5 Feature template

Context: observable window $(w_{-n}, w_{-(n-1)}, \ldots, w_0, \ldots, w_{n-1}, w_n)$

n: window size, chosen to balance training time against test performance

Page 17

2.5 Feature template

Feature type              Relative position
Front neighbor feature    W-n, …, W-1
Back neighbor feature     W1, …, Wn
Current feature           W0
Front combined feature    W-1 W0
Back combined feature     W0 W1
Transition state          Label of the first front neighbor (W-1)
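The template table can be read as a recipe for the observation features extracted at each position. The sketch below builds them for a token at index `i` with window size `n`; the function name, the feature-key spelling, and the `<PAD>` placeholder for positions outside the sentence are assumptions. The transition-state feature is left out because, in a linear-chain CRF, it is normally captured by the label-transition weights rather than by an observation feature.

```python
from typing import Dict, List

PAD = "<PAD>"  # placeholder for positions outside the sentence (an assumption)


def template_features(tokens: List[str], i: int, n: int = 2) -> Dict[str, str]:
    """Observation features for position i, following the feature template table."""

    def w(offset: int) -> str:
        j = i + offset
        return tokens[j] if 0 <= j < len(tokens) else PAD

    feats: Dict[str, str] = {"W0": w(0)}          # current feature
    for k in range(1, n + 1):
        feats[f"W-{k}"] = w(-k)                   # front neighbor features
        feats[f"W{k}"] = w(k)                     # back neighbor features
    feats["W-1W0"] = w(-1) + "|" + w(0)           # front combined feature
    feats["W0W1"] = w(0) + "|" + w(1)             # back combined feature
    return feats


tokens = ["位于", "黑龙江省", "哈尔滨市", "的"]
print(template_features(tokens, 1, n=1))
```

A larger `n` adds more neighbor features per position, which is the training-time versus test-performance trade-off noted in the context-window slide.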

Page 18

2.6 An example

位于黑龙江省哈尔滨市的哈尔滨市儿童公园为孩子们准备了特殊的贺岁礼物。
Harbin Children Park in the Harbin city of Heilongjiang Province prepared special New Year gifts for children.

Simple geographical names (SGN):
Harbin Children Park/SGN in the Harbin city/SGN of Heilongjiang Province/SGN prepared special New Year gifts for children.

Combined geographical names (CGN):
Harbin Children Park/SGN in the Harbin city of Heilongjiang Province/CGN prepared special New Year gifts for children.

Page 19

2.7 Experimental performance

Train       Test      Precision (%)  Recall (%)  F-1 (%)  Number of recognized geographical names
PER (1-5)   PER (1)   94.01          94.91       94.46    26185
PER (1-5)   PER (6)   94.30          94.35       94.33    30126
PER (1-5)   MSRA      73.40          73.10       73.25    2674
MSRA        MSRA      93.23          87.78       90.43    3211
MSRA        PER (1)   73.61          67.84       70.61    18718
MSRA        PER (6)   71.90          69.68       70.77    22249
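As a consistency check on the experimental results, F-1 is the harmonic mean of precision and recall, so each reported F-1 can be recomputed from its row. The sketch below does that for the six rows; the train/test labels of the last two rows are read from the flattened slide and may not match the original layout exactly.

```python
from typing import List, Tuple

# (train, test, precision %, recall %, reported F-1 %) transcribed from the results table
ROWS: List[Tuple[str, str, float, float, float]] = [
    ("PER (1-5)", "PER (1)", 94.01, 94.91, 94.46),
    ("PER (1-5)", "PER (6)", 94.30, 94.35, 94.33),
    ("PER (1-5)", "MSRA", 73.40, 73.10, 73.25),
    ("MSRA", "MSRA", 93.23, 87.78, 90.43),
    ("MSRA", "PER (1)", 73.61, 67.84, 70.61),
    ("MSRA", "PER (6)", 71.90, 69.68, 70.77),
]


def f1(precision: float, recall: float) -> float:
    """F-1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for train, test, p, r, reported in ROWS:
    print(f"{train:<10} {test:<8} recomputed {f1(p, r):6.2f}  reported {reported:6.2f}")
```

The recomputed values agree with the reported ones to within about 0.01, a gap small enough to come from precision and recall having been rounded before display.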

Page 20

2.8 Resolution approach

Matching: recognized names are matched against a gazetteer to obtain candidate referents.

Reference disambiguation: a cognitive salience model selects the intended referents from the candidates.
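The matching step can be sketched as a gazetteer lookup that returns every possible referent of a recognized name. Everything here is hypothetical: the `Referent` record, the toy gazetteer with approximate coordinates, and the fallback that strips an administrative suffix (省, 市, 县) when the full name is not found. A real gazetteer and its normalization rules would differ.

```python
from typing import Dict, List, NamedTuple


class Referent(NamedTuple):
    gid: str      # gazetteer identifier (hypothetical)
    name: str
    feature: str  # feature category, e.g. "city"
    lat: float
    lon: float


# Administrative suffixes stripped as a fallback; this list is an assumption.
SUFFIXES = ("省", "市", "县")


def candidate_referents(name: str, gazetteer: Dict[str, List[Referent]]) -> List[Referent]:
    """Return every gazetteer entry that could be the referent of `name`."""
    if name in gazetteer:
        return list(gazetteer[name])
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            base = name[: -len(suffix)]
            if base in gazetteer:
                return list(gazetteer[base])
    return []  # no match: the name cannot be resolved with this gazetteer


# Toy gazetteer with approximate coordinates, for illustration only.
GAZETTEER = {"哈尔滨": [Referent("g1", "哈尔滨", "city", 45.75, 126.63)]}
print(candidate_referents("哈尔滨市", GAZETTEER))
```

Names with more than one entry in the returned list are the ones that need the disambiguation step.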

Page 21

2.9 Cognitive salience model

Geographic references that appear close together in a text tend to be highly spatially correlated.
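The proximity principle above suggests a simple disambiguation heuristic: for each recognized name, prefer the candidate referent that lies geographically closest to the candidates of nearby names in the text. The sketch below implements that heuristic with great-circle distances; it is a stand-in to illustrate the idea, not the presentation's cognitive salience model, whose details are not given. The `Candidate` record, the `window` parameter, and the approximate coordinates are assumptions.

```python
import math
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    gid: str
    lat: float
    lon: float


def haversine_km(a: Candidate, b: Candidate) -> float:
    """Great-circle distance between two candidates, in kilometres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(mentions: List[List[Candidate]], window: int = 3) -> List[Optional[Candidate]]:
    """For each mention, choose the candidate closest to its textual neighbours.

    mentions[i] holds the candidate referents of the i-th recognized name, in text
    order. A candidate's score is the summed distance to the nearest candidate of
    every other mention within `window` positions; the lowest score wins.
    """
    chosen: List[Optional[Candidate]] = []
    for i, cands in enumerate(mentions):
        lo, hi = max(0, i - window), min(len(mentions), i + window + 1)
        neighbours = [mentions[j] for j in range(lo, hi) if j != i and mentions[j]]

        def score(c: Candidate) -> float:
            return sum(min(haversine_km(c, o) for o in other) for other in neighbours)

        chosen.append(min(cands, key=score) if cands else None)
    return chosen


# Approximate coordinates; "Paris" is ambiguous, "Lyon" is not.
paris = [Candidate("paris-fr", 48.86, 2.35), Candidate("paris-tx", 33.66, -95.56)]
lyon = [Candidate("lyon-fr", 45.76, 4.84)]
print([c.gid for c in resolve([paris, lyon])])  # ['paris-fr', 'lyon-fr']
```

Scoring each mention independently keeps the cost linear in the number of mentions; a joint search over all candidate combinations would be more faithful to "mutual" spatial correlation but grows exponentially.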

Page 22

2.10 Problems

Ancient geographical names

Spatio-temporal changes

Limits of statistical models

Limits of gazetteers

……

Page 23

Content

1. Background

2. Extraction of geographical names

3. Applications

Page 24

GeoChunk: an annotation system

Page 25

TextMAP: an integrated system for text and maps

Page 26

CGeoCoder: an address geocoding system

Page 27

SRAnnotation
