GeoInfo 2006 Presentation by Chris Jones, Cardiff University. Geographical Information Retrieval. Christopher Jones, Cardiff University. See www.geo-spirit.org for information on SPIRIT project, the contributing partners, and downloads of articles and project deliverables.


Page 1

GeoInfo 2006 Presentation by Chris Jones, Cardiff University

Geographical Information Retrieval

Christopher Jones

Cardiff University

See www.geo-spirit.org for information on SPIRIT project, the contributing partners, and downloads of articles and project deliverables.

Page 2

What is Geo-information?

• Geo-information associates things and events with places

• Rich vocabulary: place names, coordinates, geometric objects, spatial relationships, spatial structures, patterns, paths, flows, interactions…

Page 3

Where is Geo-information?

• Personal knowledge
– of landscape, of where things, people and services are located, where things happened

• Documents (various media)
– Lists of where facilities, resources, structures are located
– Textual descriptions of geographic phenomena
– Images and videos of geographic space

• Maps

Page 4

GIS and the Web

A GIS is typically:
– Isolated
– Supports individual organisation
– Small range of topics
– Structured data / geo-coded locations
– Finds answers
– Accessed privately
– Complicated to use

The World Wide Web is:
– Globally networked
– Supports everyone on the Internet
– Vast range of topics
– Unstructured free text / images
– Finds documents
– Accessed publicly
– Easy to use

Page 5

Problems with WWW as a source of geo-information

• Geographic context embedded in natural language descriptions

• Place names ambiguous and confused with names of organisations, people, buildings and streets

• Web queries depend on exact match of text terms

• No intelligent interpretation of spatial relationships (“near”, “west” etc)

• No geo-relevance ranking

Page 6

Current motivation of GIR: find geo-specific resources on the Web

Find web resources about

Something related_to Somewhere

related_to = in, near, within X km, north_of, etc.

• Resolve ambiguity of names (many places have same name)

• Interpret the query's spatial relationship to produce a query footprint

• Find documents geographically associated with region of query footprint

• Relevance rank geographically by place and subject
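
A minimal sketch of the "Something related_to Somewhere" query form, added for illustration. The relation vocabulary and the `parse_query` helper are assumptions, not part of the SPIRIT system; a real engine would also have to disambiguate the place name.

```python
import re
from typing import NamedTuple, Optional


class GeoQuery(NamedTuple):
    concept: str   # the "Something"
    relation: str  # the spatial relationship (related_to)
    place: str     # the "Somewhere"


# Hypothetical relation vocabulary for this sketch.
RELATIONS = ["north of", "south of", "east of", "west of", "outside", "near", "in"]

_PATTERN = re.compile(
    r"^(.+?)\s+(" + "|".join(re.escape(r) for r in RELATIONS) + r")\s+(.+)$",
    flags=re.IGNORECASE,
)


def parse_query(text: str) -> Optional[GeoQuery]:
    """Split '<concept> <relation> <place>' at the first known relation."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    concept, relation, place = match.groups()
    return GeoQuery(concept, relation.lower(), place)
```

For example, `parse_query("hotels north of Cardiff")` yields the concept `hotels`, the relation `north of` and the place `Cardiff`; a query with no recognised relation returns `None`.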


Page 7

GIR, GIS and The Web

[Diagram with labels: Geo-knowledge, GIS, The Web, GIR, World Knowledge]

Page 8

Geographical Search Engines

• Google etc. have “local” versions.
– Based on business (yellow pages) directories.

Page 9

Geographical Search Engines

SPIRIT research prototype: general geo-web search

Structured user interface:

Dropdown menu of spatial relationships

Page 10

Geographical search engines

SPIRIT: results listed as URLs plus symbols on a map

User interface screenshots from Ross Purves et al., University of Zurich

Page 11

Anatomy of a Geographical Search Engine

[Architecture diagram. Components: User Interface, Broker, Query Disambiguation, Place Ontology, Search Engine, Indexes (Spatial, Textual), Relevance Ranking, Geo-tagging, Text Indexing, Web Resources, Document Footprints. Data flows: Search Request + Query Footprint, Query Footprint, Unranked Results, Ranked Results.]

Page 12

Geo-Tagging = Geo-parsing + Geo-coding

Geo-parsing
– Recognising geographic references (ignoring non-geographic uses of place terminology)

Geo-coding
– Attaching a unique quantitative location (footprint) to geographic references

Page 13

Geo-parsing

• The presence of place names can be recognised with gazetteers (i.e. lists of names)

• Some types of genuine geographic reference
– the name of the place: Sao Paulo
– an address: School of Computer Science, Cardiff University, 5 The Parade, Cardiff
– an address fragment: “Ross lived in Dalmeny Street in Edinburgh”
– a postcode / zip code: CF24 3XF
– a phone number: most Cardiff phone numbers start with 02920

Page 14

Geo-Parsing : true & false references

Some types of false geographic reference

• Personal names: Smedes York, Jack London
• Business names: Dorchester Hotel, York Properties…
• Street names: Oxford Street, London Road…
• Common words that are also places: bath, battle, derby, over, well…

Page 15

Geo-Parsing : distinguishing between false and true geo-references

Look for patterns and context

• Personal names (Jack London, Mr York): <First_name> <Location>; <Title> <Location>
• Business names (Paris Hotel): <Business_type> <Location> (or vice versa)
• Street names (Oxford Street): <Location> <Road_type>
• Detect spatial prepositions (in, near, south of, outside, etc.): “he lived in Over”
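
The pattern checks above can be sketched in a few lines. Everything here is a toy assumption made for illustration (the gazetteer, the name, business and road word lists, and the `geo_parse` helper); a real geo-parser would use far larger resources and richer context.

```python
import re
from typing import List

# Toy word lists, assumed for illustration only.
GAZETTEER = {"London", "York", "Paris", "Oxford", "Over", "Cardiff"}
FIRST_NAMES_AND_TITLES = {"Jack", "Mr", "Mrs", "Dr"}
BUSINESS_TYPES = {"Hotel", "Properties"}
ROAD_TYPES = {"Street", "Road"}
COMMON_WORDS = {"Over", "Bath", "Well"}  # place names that are also ordinary words
SPATIAL_PREPOSITIONS = {"in", "near", "outside"}


def geo_parse(text: str) -> List[str]:
    """Return gazetteer names that survive the false-reference patterns."""
    tokens = re.findall(r"[A-Za-z]+", text)
    found = []
    for i, token in enumerate(tokens):
        if token not in GAZETTEER:
            continue
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        # <First_name> <Location> or <Title> <Location>: a personal name.
        if prev_tok in FIRST_NAMES_AND_TITLES:
            continue
        # <Location> <Business_type> or <Business_type> <Location>: a business.
        if next_tok in BUSINESS_TYPES or prev_tok in BUSINESS_TYPES:
            continue
        # <Location> <Road_type>: a street name.
        if next_tok in ROAD_TYPES:
            continue
        # Ordinary words count as places only after a spatial preposition.
        if token in COMMON_WORDS and prev_tok.lower() not in SPATIAL_PREPOSITIONS:
            continue
        found.append(token)
    return found
```

With these lists, "Jack London stayed at the Paris Hotel near Oxford Street in Cardiff" yields only Cardiff, and "Over the years he lived in Over" yields only the second "Over".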

Page 16

Geo-coding (grounding) the genuine geo-references

• Many different places with the same name (referent ambiguity): Newport, Cambridge, Springfield…
• Use context to decide (references to parent or nearby places)
• Or choose the most important one (by population or place type hierarchy)
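
A sketch of the two strategies (context first, importance as fallback). The `Candidate` record, the `resolve` helper and the importance values are invented for illustration; the importance numbers are arbitrary ranks, not real population figures.

```python
from typing import List, NamedTuple, Optional, Set


class Candidate(NamedTuple):
    name: str
    parent: str
    importance: int  # arbitrary rank standing in for population or place type


# Hypothetical gazetteer entries sharing one name.
CANDIDATES = [
    Candidate("Newport", "Wales", 3),
    Candidate("Newport", "Isle of Wight", 1),
    Candidate("Newport", "Rhode Island", 2),
]


def resolve(name: str, context_places: Set[str],
            candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick one referent for an ambiguous place name."""
    options = [c for c in candidates if c.name == name]
    if not options:
        return None
    # First choice: a candidate whose parent is mentioned in the same text.
    for candidate in options:
        if candidate.parent in context_places:
            return candidate
    # Fallback: the most important candidate.
    return max(options, key=lambda c: c.importance)
```

A page that also mentions "Rhode Island" resolves Newport to the Rhode Island entry; with no helpful context the highest-ranked entry wins.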

Page 17

Geo-coding: what is the geo-focus of a web page?

• Frequency of occurrence
• Do multiple places have a common parent?
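
One way to combine the two cues, as a sketch only: if the places mentioned all share a parent, take the parent as the focus, otherwise take the most frequent place. The `PARENT` table and the `geo_focus` rule are assumptions for illustration, not the method used in SPIRIT.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical child -> parent containment, as a place ontology might supply.
PARENT = {"Cardiff": "Wales", "Swansea": "Wales", "Newport": "Wales", "Bristol": "England"}


def geo_focus(mentions: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Common parent if several distinct places share one, else the most frequent place."""
    if not mentions:
        return None
    parents = {parent.get(place) for place in mentions}
    if len(set(mentions)) > 1 and len(parents) == 1 and None not in parents:
        return parents.pop()
    return Counter(mentions).most_common(1)[0][0]
```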

Page 18

Anatomy of a Geographical Search Engine

[Architecture diagram. Components: User Interface, Broker, Query Disambiguation, Place Ontology, Search Engine, Indexes (Spatial, Textual), Relevance Ranking, Geo-tagging, Text Indexing, Web Resources, Document Footprints. Data flows: Search Request + Query Footprint, Query Footprint, Unranked Results, Ranked Results.]

Page 19

Indexing Web Resources

Standard text index is an inverted file

Query: Restaurants in Cardiff

• Find documents that contain all terms
• Works literally for “in” but won’t find contained places
• Doesn’t work in general for “near”, “X km from”, “north_of”, etc.

Text term    List of resources containing term
apple        Doc79, Doc89, Doc822…
Cardiff      Doc2, Doc19, Doc37…
door         Doc16, Doc49, Doc112…
hotel        Doc1, Doc2, Doc23…
in           Doc4, Doc7, Doc19…
London       Doc20, Doc35, Doc150…
pub          Doc9, Doc11, Doc100…
restaurant   Doc19, Doc22, Doc37…
…            …
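
A minimal inverted file with an all-terms query, to show why pure text matching misses contained places. The three documents are invented for illustration, and there is no stemming, so the query uses the singular "restaurant". Roath is a district of Cardiff.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Invented documents for this sketch.
DOCS = {
    "Doc19": "restaurant in Cardiff",
    "Doc37": "Cardiff restaurant guide",
    "Doc50": "restaurant in Roath",
}
INDEX = build_inverted_index(DOCS)
```

Searching for "restaurant in Cardiff" returns only Doc19: Doc37 lacks the literal word "in", and Doc50 is missed even though Roath lies inside Cardiff.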

Page 20

Why Spatial Indexing?

Query: “Castles in Wales”
– Need to find documents that refer to names of places in Wales (perhaps without mentioning “Wales”)

Query: “Hotels outside and within 30 km of Rio”
– Need to find documents referring to hotels that are in places other than Rio

• In both cases, conventional text indexing would require the query to contain the names of all places in Wales, or of all places outside Rio but within 30 km of it

Page 21

Spatial indexing of resources

• Use prime foci of documents to create document footprints (point, polygon, bounding rectangle…)
• Use footprints to index documents
• Convert query to a query footprint
• Match query footprint to document footprints

[Figure labels: Spatial Query, Result]
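
A sketch of footprint matching using bounding rectangles only. The `Box` type, the coordinates and the linear scan are assumptions for illustration; a production system would use a spatial index rather than testing every document.

```python
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    """True if the two rectangles overlap or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def match_footprints(query: Box, doc_footprints: Dict[str, Box]) -> List[str]:
    """Return ids of documents whose footprint intersects the query footprint."""
    return sorted(doc_id for doc_id, box in doc_footprints.items()
                  if intersects(query, box))


# Invented footprints in arbitrary map units.
FOOTPRINTS = {
    "D1": Box(0.0, 0.0, 1.0, 1.0),
    "D2": Box(5.0, 5.0, 6.0, 6.0),
    "D3": Box(0.5, 0.5, 2.0, 2.0),
}
```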

Page 22

Combining text and spatial indexing: spatio-textual indexing

• Space-primary (ST) : textual index for each spatial cell

• Text-primary (TS) : spatial index for each term

• Separate S and T indexes (T)

Page 23

Spatial-primary (ST) method

• Each spatial cell has a text index
• Retrieve document ids for query terms lying in cells intersected by query footprint
• High storage overhead with multiple text indexes

[Figure: spatial cells A, B, C and D with a spatial query; Text Index B (term1, term2, term3) and Text Index D (term2, term4, term5), each listing docs per term; results]
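
A sketch of the space-primary layout as nested dictionaries: cell, then term, then document ids. The document tuples and the helper names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Doc = Tuple[str, str, List[str]]  # (doc_id, cell, terms)


def build_st_index(docs: Iterable[Doc]) -> Dict[str, Dict[str, Set[str]]]:
    """Space-primary index: one small text index per spatial cell."""
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, cell, terms in docs:
        for term in terms:
            index[cell][term].add(doc_id)
    return index


def st_query(index: Dict[str, Dict[str, Set[str]]],
             query_cells: Iterable[str], term: str) -> Set[str]:
    """Look the term up in the text index of each cell the query footprint intersects."""
    results: Set[str] = set()
    for cell in query_cells:
        if cell in index:
            results |= index[cell].get(term, set())
    return results


# Invented documents; the query footprint is assumed to intersect cells B and D.
DOCS = [
    ("D1", "B", ["term1", "term2"]),
    ("D7", "B", ["term2", "term3"]),
    ("D3", "D", ["term2", "term4"]),
    ("D5", "A", ["term2"]),
]
ST_INDEX = build_st_index(DOCS)
```

Each term is stored once per cell in which it occurs, which is where the storage overhead comes from.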

Page 24

Text primary spatial indexing

• For each term, store a spatial index of the documents containing the term
• For each text query term, retrieve ids of documents lying in spatial cells intersecting the query footprint
• High storage overhead: multiple spatial indexes
• Time performance better than ST

Index entry: term2 : cellB(D1, D7); cellD(D3, D11, D13)…

[Figure: text index with term1, term2 and term3, each pointing to its own grid of spatial cells A, B, C and D; the spatial query selects cells B and D to give the results]
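
The same idea with the nesting reversed: term, then cell, then document ids. The data reproduces the index entry quoted above for term2; the other entries and the helper names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Doc = Tuple[str, str, List[str]]  # (doc_id, cell, terms)


def build_ts_index(docs: Iterable[Doc]) -> Dict[str, Dict[str, Set[str]]]:
    """Text-primary index: one small spatial index (cell -> docs) per term."""
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for doc_id, cell, terms in docs:
        for term in terms:
            index[term][cell].add(doc_id)
    return index


def ts_query(index: Dict[str, Dict[str, Set[str]]],
             term: str, query_cells: Iterable[str]) -> Set[str]:
    """Find the term first, then keep only documents in the queried cells."""
    cells = index.get(term, {})
    results: Set[str] = set()
    for cell in query_cells:
        results |= cells.get(cell, set())
    return results


DOCS = [
    ("D1", "B", ["term2"]), ("D7", "B", ["term2"]),
    ("D3", "D", ["term2"]), ("D11", "D", ["term2"]), ("D13", "D", ["term2"]),
    ("D20", "A", ["term2"]),  # invented: same term, outside the queried cells
]
TS_INDEX = build_ts_index(DOCS)
```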

Page 25

Separate spatial and textual index

• Access spatial index with query footprint
• Access text index with concept terms
• Merge results: find intersection
• Relatively small storage overhead with spatial index
• Time performance superior (in latest experiments)

Term1   D1, D2, D23, …
Term2   D9, D11, D100, …
Term3   D27, D85, …

[Figure: spatial index with regions R, R1, R2, R3 and R4 containing documents D1 to D16]
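
A sketch of the separate-index approach: query each index on its own, then intersect the two result sets. A linear scan over bounding boxes stands in for the spatial index here, and the footprints are invented; only the term lists echo the slide.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def boxes_intersect(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(text_index: Dict[str, Set[str]], footprints: Dict[str, Box],
           terms: List[str], query_box: Box) -> Set[str]:
    """Intersect the text hits (all terms) with the spatial hits."""
    postings = [text_index.get(term, set()) for term in terms]
    text_hits = set.intersection(*postings) if postings else set()
    # Stand-in for a real spatial index lookup.
    spatial_hits = {doc_id for doc_id, box in footprints.items()
                    if boxes_intersect(box, query_box)}
    return text_hits & spatial_hits


TEXT_INDEX = {
    "term1": {"D1", "D2", "D23"},
    "term2": {"D9", "D11", "D100"},
    "term3": {"D27", "D85"},
}
# Invented footprints in arbitrary map units.
FOOTPRINTS = {
    "D1": (0.0, 0.0, 1.0, 1.0),
    "D2": (4.0, 4.0, 5.0, 5.0),
    "D9": (0.0, 0.0, 2.0, 2.0),
    "D11": (0.5, 0.5, 1.0, 1.0),
}
```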

Page 26

Anatomy of a Geographical Search Engine

[Architecture diagram. Components: User Interface, Broker, Query Disambiguation, Place Ontology, Search Engine, Indexes (Spatial, Textual), Relevance Ranking, Geo-tagging, Text Indexing, Web Resources, Document Footprints. Data flows: Search Request + Query Footprint, Query Footprint, Unranked Results, Ranked Results.]

Page 27

Geographical Relevance Ranking

• Determine “distance” between query footprint and document footprint
• Depends on query spatial operator (in, outside, X km, north_of, etc.)
• Result: a spatial score

Example: “airports near Leicester”: the further away, the lower the spatial score

[Figure: document footprint D and query footprint Q. Figure from Marc van Kreveld, University of Utrecht]
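
A sketch of a spatial score for the "near" operator. The linear decay, the 50 km cut-off, the point footprints and the `near_score` helper are all assumptions for illustration; the slides do not say which function SPIRIT used.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # planar coordinates in km (assumed)


def near_score(query: Point, doc: Point, cutoff_km: float = 50.0) -> float:
    """Score 1.0 at distance zero, falling linearly to 0.0 at the cut-off."""
    distance = math.hypot(doc[0] - query[0], doc[1] - query[1])
    return max(0.0, 1.0 - distance / cutoff_km)
```

Other operators would need other functions: "in" might score containment or overlap, and "north_of" would have to take direction into account as well as distance.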

Page 28

Combining textual and spatial scores

• Textual scores: BM25
• Spatial scores: by spatial footprint analysis

[Figure: plot of normalized BM25 score against spatial score, both axes running from 0 to 1, showing the query / ideal footprint and the footprints of documents. Figure from Marc van Kreveld, University of Utrecht]
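
One plausible way to combine the two scores, sketched under assumptions: treat each document as a point (normalized BM25 score, spatial score) in the unit square and rank by closeness to the ideal point (1, 1). The slides do not state the exact formula, so `combined_score` is illustrative only; a weighted sum would be another option.

```python
import math
from typing import List, Tuple

ScoredDoc = Tuple[str, float, float]  # (doc_id, normalized text score, spatial score)


def combined_score(text_score: float, spatial_score: float) -> float:
    """1.0 at the ideal point (1, 1), falling to 0.0 at (0, 0)."""
    distance_to_ideal = math.hypot(1.0 - text_score, 1.0 - spatial_score)
    return 1.0 - distance_to_ideal / math.sqrt(2.0)


def rank(docs: List[ScoredDoc]) -> List[ScoredDoc]:
    """Order documents from best to worst combined score."""
    return sorted(docs, key=lambda d: combined_score(d[1], d[2]), reverse=True)
```

Under this measure a document that is moderately good on both scores can outrank one that is excellent on text but poor spatially.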

Page 29

Anatomy of a Geographical Search Engine

[Architecture diagram. Components: User Interface, Broker, Query Disambiguation, Place Ontology, Search Engine, Indexes (Spatial, Textual), Relevance Ranking, Geo-tagging, Text Indexing, Web Resources, Document Footprints. Data flows: Search Request + Query Footprint, Query Footprint, Unranked Results, Ranked Results.]

Page 30

Place Ontology

Encodes knowledge of terminology and structure of geographic space

• alternative names, languages
• place types (political, topographic, social…)
• footprint (point, MBR, polygon)
• spatial relationships and attributes: containment, adjacency, overlap
• imprecise (vernacular) places (“Midlands”, “south of France”, “Scottish borders”, “Pennines”, “Highlands”…)

Derive from gazetteers, thesauri, maps and the web
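
A sketch of what one ontology record might hold. The `Place` fields, the helper functions and the footprint coordinates are assumptions for illustration (the coordinates are placeholders, not real extents); Cymru and Caerdydd are the Welsh names for Wales and Cardiff.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Place:
    name: str
    place_type: str
    footprint: Tuple[float, float, float, float]  # MBR; placeholder values below
    alt_names: List[str] = field(default_factory=list)
    parent: Optional[str] = None   # containment
    adjacent: List[str] = field(default_factory=list)
    vernacular: bool = False       # imprecise place such as "Midlands"


ONTOLOGY: Dict[str, Place] = {
    "United Kingdom": Place("United Kingdom", "country", (0.0, 0.0, 10.0, 10.0)),
    "Wales": Place("Wales", "country", (1.0, 1.0, 4.0, 5.0),
                   alt_names=["Cymru"], parent="United Kingdom"),
    "Cardiff": Place("Cardiff", "city", (3.0, 1.0, 3.5, 1.5),
                     alt_names=["Caerdydd"], parent="Wales"),
}


def find_by_name(ontology: Dict[str, Place], name: str) -> List[str]:
    """Match a primary or alternative name, ignoring case."""
    wanted = name.lower()
    return [p.name for p in ontology.values()
            if wanted == p.name.lower() or wanted in (a.lower() for a in p.alt_names)]


def ancestors(ontology: Dict[str, Place], name: str) -> List[str]:
    """Follow containment links upward from a place."""
    chain = []
    current = ontology[name].parent
    while current is not None:
        chain.append(current)
        current = ontology[current].parent
    return chain
```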

Page 31

Roles of Place Ontology

[Diagram linking the ontology to: User Interface, Query Disambiguation, Query Expansion (query footprint), Search Component, Spatial Index (document footprints), Relevance Ranking, Geo-Tagging, and Metadata Extraction from the Web collection (document footprints)]

Page 32

Mining text on the web for vernacular place name knowledge

• Objective: estimate spatial extent of vague place

• Documents that refer to vague places may also refer to more precise places inside them.

• Places that occur frequently in association with a target named place may have higher chance of being inside

• Analyse frequency of occurrence of co-located places

Page 33

Places mentioned in documents retrieved by queries on the “Cotswolds”

Figure from Ross Purves et al., University of Zurich

Page 34

Summary of web mining procedure

• Submit web search engine queries referring to a target place

• Geo-Parse resulting highest ranking web pages for occurrence of place names

• Geocode place names with coordinates

• Create density surface model of co-occurring places and extract approximate boundary (contour).
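
The last step can be sketched with a simple grid count standing in for the density surface. The cell size, the threshold and the sample points are invented for illustration; the original work may well have used a smoothed surface and traced a contour rather than selecting whole cells.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]


def density_grid(points: Iterable[Tuple[float, float]], cell_size: float) -> Dict[Cell, int]:
    """Count geocoded place mentions per square grid cell."""
    counts: Dict[Cell, int] = Counter()
    for x, y in points:
        counts[(math.floor(x / cell_size), math.floor(y / cell_size))] += 1
    return counts


def cells_above(counts: Dict[Cell, int], threshold: int) -> Set[Cell]:
    """Cells dense enough to count as part of the vague region."""
    return {cell for cell, n in counts.items() if n >= threshold}


# Invented coordinates of co-occurring places, in arbitrary map units.
POINTS = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.5), (1.5, 0.5), (5.5, 5.5)]
COUNTS = density_grid(POINTS, cell_size=1.0)
```

Raising the threshold shrinks the estimated region towards its core; isolated points, such as wrongly geocoded places, fall below it.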

Page 35

Formulating appropriate web queries

• Region only, e.g. “Rocky Mountains”
– Retrieves all documents mentioning the name

• Region + Concept, e.g. “Hotels in Cotswolds”
– Tends to retrieve directory pages listing places associated with the target place

• Region and lexical pattern (trigger phrase), e.g. “Midwest towns such as”; “in the South of France”
– Reduces the number of relevant documents retrieved but can work well for those documents
– Problem of not enough “hits” for statistical analysis

• Region + Concept produces highest numbers of co-associated places in top ranking documents.

Page 36

Devon (county)

[Figure panels: distribution of associated places; density surface; density surface at three threshold levels (1, 0.5, 0.25 points per cell); thresholded boundary compared with actual boundary]

Note: some places wrongly geocoded

Figure from Ross Purves et al., University of Zurich

Page 37

Vague place: Mittelland

Evidence for validity of method

[Figure panels: human interpretations of the extent (+ is the “core”); density surface of web mining results]

Figure from Ross Purves, University of Zurich

Page 38

GIR and GIS

• GIR currently dominated by web search
– Unstructured results in multiple documents

• Sometimes a single focused result is wanted
– Hotels within 1 kilometre of the British Museum in London
– Where are pre-sixteenth century dwellings in USA?
– Which areas of East Anglia would be flooded if sea level rose by 1 metre?

Page 39

Bringing GIR and GIS together

[Diagram, shown twice, with labels: Geo-knowledge, GIS, The Web, GIR, World Knowledge]

Page 40

GeoInformation Services

Encode Geo-information in Web Services (Geo-services)

• Parse natural language queries
• Interpret geo-terminology of queries
• Identify the relevant geo-services to match geo and non-geo concepts
• Compose appropriate chain of services

Page 41

Where is GIR going?

• Improve “conventional GIR” components:
– Geo-tagging, spatio-textual indexing and geo-relevance ranking

• Creation of rich place ontologies with world-wide coverage

• Improved understanding of spatial natural language terminology

• Open GeoInformation Web services

• Adapt GIR to personal needs (“What’s the quickest way out of here?”)

Page 42

More Information

• SPIRIT project partners with local representatives:
– Cardiff University (Chris Jones, project coordinator)
– University of Sheffield (Mark Sanderson and Paul Clough)
– IGN, Paris (Anne Ruas)
– University of Utrecht (Marc van Kreveld)
– University of Hannover (Monika Sester)
– University of Zurich (Ross Purves and Rob Weibel)

• See www.geo-spirit.org for information on SPIRIT project and downloads of articles and project deliverables.

[N.B. Prototype search engine (with link from SPIRIT web site) is no longer functional]