BEIRA: A geo-semantic clustering method for area summary

Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved. BEIRA: A geo-semantic clustering method for area summary Osamu Masutani, Hirotoshi Iwasaki Denso IT Laboratory, Inc.

Upload: osamu-masutani

Post on 12-Jun-2015


DESCRIPTION

The 8th International Conference on Web Information Systems Engineering (WISE2007)

TRANSCRIPT

Page 1: BEIRA: A geo-semantic clustering method for area summary

BEIRA: A geo-semantic clustering method for area summary

Osamu Masutani, Hirotoshi Iwasaki

Denso IT Laboratory, Inc.

Page 2: BEIRA: A geo-semantic clustering method for area summary

Summary

- Background
- Concept
- System architecture
- Evaluation
- Conclusions & future works

Page 3: BEIRA: A geo-semantic clustering method for area summary

Background – Map service

Target
- Car navigation or PND (Personal Navigation Device)
- GPS mobile phone
- Web-based map service

Major functionalities of map service
- View maps around the current position
- Search for a route to a destination
- Search for favorite POIs (Points of Interest)

Page 4: BEIRA: A geo-semantic clustering method for area summary

A scenario: a visitor to Nancy

No previous knowledge about Nancy.
- Japanese
- A little interest in art

He has some free time.
- No plan.
- He can’t speak French.
- He has a GPS mobile phone.

The only available information is from a mobile map service.
- He’d like to search for POIs using the service.
- What is the problem?

Page 5: BEIRA: A geo-semantic clustering method for area summary

Use cases: searching POIs on mobile

3 ways to search

Location-based search
- Nearby area

Category-based search
- “Restaurant” / “Italian” / …
- “Public” / “Library” / …

Keyword-based search
- “chocolate cake”, “soccer”, “beautiful”, “calm”, …

Page 6: BEIRA: A geo-semantic clustering method for area summary

Problem in location-based search

Filtering by the specified area

Sometimes results are numerous
- In a central urban area
- When a broad area is chosen

Selection is very hard
- The UI is limited (especially on mobile).

Page 7: BEIRA: A geo-semantic clustering method for area summary

Problem in category-based search

Filtering by a specific category

Sometimes results are numerous
- When the user doesn’t specify a detailed category

Information awareness
- Once the user chooses the “Museum” category, he can’t find “Place Stanislas”.

[Figure labels: museum, park, Place Stanislas]

Page 8: BEIRA: A geo-semantic clustering method for area summary

Problem of keyword-based search

Filtering by keyword match

Information awareness
- The user is required to know the keyword in advance.
- “Art Nouveau” is a good keyword for finding Nancy’s features.
- But if the user mistakes the keyword for “Art Deco”, the results will be poor.

[Figure labels: Art Nouveau, Place Stanislas]

Page 9: BEIRA: A geo-semantic clustering method for area summary

Problems

Information overload
- Numerous candidates
- Millions of POIs in mobile phone services

Information awareness
- Both fixed-category and free-keyword search have a similar problem.

Solution
- Reduce the candidates
- But keep information awareness
- Clustering and summarization of information

[Figure labels: museum, park]

Page 10: BEIRA: A geo-semantic clustering method for area summary

Clustering and summarization

Similar concept
- Web search engine “Vivisimo”
- Displays clusters of the search results and their topics
- Dynamic categories

Easy to choose but comprehensive
- The number of candidates is reduced, but the view remains comprehensive.

Page 11: BEIRA: A geo-semantic clustering method for area summary

Is Vivisimo enough?

It provides only a semantic (topic) view.
- With a map service
- Switching between the semantic and geographic views would be complicated

Can these two views be combined?
- Use only the map view
- Cluster = area

Page 12: BEIRA: A geo-semantic clustering method for area summary

BEIRA: Bird’s Eye Information Retrieval Application

Topic-based IR through a geographic view.
- Use AOI (Area of Interest) instead of POI
- An AOI consists of an area (cluster) and its summary (the word list)

[Figure labels: Art Nouveau; Area; Summary = word list]

Page 13: BEIRA: A geo-semantic clustering method for area summary

System architecture

POI database
- Address of POI
- Text of POI (guide text, reputation text, etc.)

Preprocessing
- Geocoding and topic vector generation

Geo-semantic clustering and summarization

Display AOI

[Diagram: POI database (POI ID, Address text, etc.) → Geographic preprocessing (Latitude, Longitude) and Semantic preprocessing (Topic Vector) → Geo-semantic clustering → Geo-semantic summarization → AOI (AOI ID, Area Polygon, Summary)]
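The records named in the architecture diagram can be sketched as plain data types. This is only an illustration: the field names follow the diagram labels, while `member_ids`, `make_aoi` and the bounding-box polygon are hypothetical, since the slides do not say how the area polygon is derived from a cluster.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class POI:
    """One point of interest after both preprocessing steps."""
    poi_id: str
    address: str                # raw address text from the POI database
    latitude: float             # produced by geocoding the address
    longitude: float
    topic_vector: List[float]   # produced by semantic preprocessing of the POI text


@dataclass
class AOI:
    """One area of interest: a cluster of POIs plus its word-list summary."""
    aoi_id: str
    area_polygon: List[Tuple[float, float]]  # (latitude, longitude) vertices
    summary: List[str]                       # word list describing the area
    member_ids: List[str] = field(default_factory=list)  # hypothetical field


def make_aoi(aoi_id: str, pois: Sequence[POI], summary: Sequence[str]) -> AOI:
    """Build an AOI from clustered POIs.

    Assumption: the polygon is simply the bounding box of the members.
    """
    lats = [p.latitude for p in pois]
    lons = [p.longitude for p in pois]
    polygon = [
        (min(lats), min(lons)),
        (min(lats), max(lons)),
        (max(lats), max(lons)),
        (max(lats), min(lons)),
    ]
    return AOI(aoi_id, polygon, list(summary), [p.poi_id for p in pois])
```

A real system would more likely use a convex hull or similar shape for the polygon; the bounding box only keeps the sketch short.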

Page 14: BEIRA: A geo-semantic clustering method for area summary

Implementation

Combinations of GIS and Text mining tools


Page 15: BEIRA: A geo-semantic clustering method for area summary

Geo-semantic clustering

- Geographic clustering doesn’t reflect area topics: circular areas
- Semantic clustering doesn’t consider the geographic view: scattered areas
- Geo-semantic clustering solves these problems

[Figure panels: Semantic Clustering, G/S Clustering, Geographic Clustering]

Page 16: BEIRA: A geo-semantic clustering method for area summary

Geo-semantic clustering

Co-clustering with geographic and semantic features
- Geographic features: latitude, longitude
- Semantic features: high-dimensional matrix (latent semantic indexing)

G/S ratio R: the combination ratio
- R = Geographic bias / Semantic bias

Feature table (geographic features are multiplied by R, semantic features by 1):

POI ID | Latitude | Longitude | LSI1 | LSI2 | LSI3
...    | ...      | ...       | ...  | ...  | ...
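The weighting described above can be sketched as follows: each POI becomes one feature vector in which latitude and longitude are multiplied by R and the LSI components by 1, and the combined vectors are clustered. Plain k-means with a deterministic farthest-first initialization is used here as a stand-in; the slides do not specify the clustering algorithm at this point, and the four toy POIs and the values of R are made up for illustration.

```python
from typing import List, Sequence


def combine_features(lat: float, lon: float, lsi: Sequence[float], r: float) -> List[float]:
    """Scale the geographic features by R and keep the semantic features at weight 1."""
    return [r * lat, r * lon] + list(lsi)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points: List[List[float]], k: int) -> List[List[float]]:
    """Deterministic initialization: start at the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[List[float]], k: int, iterations: int = 50) -> List[int]:
    """Plain k-means (an assumed stand-in, not necessarily the authors' method)."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: (latitude, longitude, LSI vector). A and B are close on the map,
# as are C and D; A and C share a topic, as do B and D.
toy_pois = [
    (0.0, 0.0, [1.0, 0.0]),  # A
    (0.0, 0.1, [0.0, 1.0]),  # B
    (1.0, 1.0, [1.0, 0.0]),  # C
    (1.0, 1.1, [0.0, 1.0]),  # D
]


def cluster(r: float) -> List[int]:
    points = [combine_features(lat, lon, lsi, r) for lat, lon, lsi in toy_pois]
    return kmeans(points, k=2)


semantic_labels = cluster(r=0.01)     # small R: topics dominate
geographic_labels = cluster(r=100.0)  # large R: location dominates
```

With a small R the toy POIs group by topic (A with C, B with D), and with a large R they group by location (A with B, C with D), which mirrors the semantic and geographic extremes shown in the slides.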

Page 17: BEIRA: A geo-semantic clustering method for area summary

Evaluation: geo-semantic clustering

Dataset: cafes in Shibuya
- Text contents: restaurant evaluation web site “asku.com”
- 272 cafes in the region (Shibuya ward)

Correct cluster data
- Generated manually
- 13 clusters in the region
- F measure
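One common way to score a clustering against manually built clusters is the pairwise F measure, sketched below. The slides only say "F measure", so the pairwise variant is an assumption, and the labels in the example are made up.

```python
from itertools import combinations
from typing import Sequence


def pairwise_f_measure(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """F measure over item pairs: a pair counts as positive when both items
    share a cluster. Precision and recall compare predicted pairs with gold pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold_labels = [0, 0, 1, 1]
perfect_score = pairwise_f_measure([0, 0, 1, 1], gold_labels)
partial_score = pairwise_f_measure([0, 0, 0, 1], gold_labels)
```

In the second call one pair is correctly grouped, two pairs are wrongly grouped and one gold pair is missed, so precision is 1/3, recall is 1/2 and the F measure is 0.4.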

Page 18: BEIRA: A geo-semantic clustering method for area summary

Results of clustering

Geo-semantic clustering produces non-circular areas according to their topics.

[Figure panels: Semantic (R=1.0E-04), Geo-semantic (R=1.0E-02), Geographic (R=1.0E+06)]

Page 19: BEIRA: A geo-semantic clustering method for area summary

Evaluation of clustering

We confirmed that geo-semantic clustering is better than either semantic or geographic clustering alone.
- An intermediate ratio (0.01) is optimal.

[Chart: y-axis from 0 to 0.6; x-axis from 1.0E-04 (Semantic) to 1.0E+06 (Geographic); series: MLSA, Tensor-Kmeans]

Page 20: BEIRA: A geo-semantic clustering method for area summary

Area summarization

Document summarization

Term weighting: e.g. TF/IDF
- A term that occurs many times in a document is important (TF: term frequency)
- A term that is rare in the entire document set is important (IDF: inverse document frequency)
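A minimal sketch of TF/IDF weighting follows. It assumes the common form tf × log(N / df); the slides do not state which TF/IDF variant or logarithm base was used, and the three tiny documents are invented.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(documents: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # term frequency within this document
        weights.append({term: tf[term] * math.log(n_docs / df[term]) for term in tf})
    return weights


docs = [
    ["coffee", "cake", "cake"],
    ["coffee", "wedding"],
    ["coffee", "onion"],
]
weights = tf_idf(docs)
```

Here "coffee" appears in every document, so its IDF is log(1) = 0 and it gets no weight, while "cake" is both frequent in its document and rare across the set, so it gets the highest weight.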

Page 21: BEIRA: A geo-semantic clustering method for area summary

Problem of IDF

Simple IDF cannot extract regionally characteristic words
- According to IDF, “onion” and “wedding” have nearly the same weight
- “Wedding” should be regarded as more important because the areas where weddings are held are geographically biased.

Measure | Normal term “onion” | Place name “Dogenzaka” | Area term “Wedding”
IDF     | 3.08                | 3.51                   | 3.04
K       | 4.41                | 54.0                   | 9.93

Page 22: BEIRA: A geo-semantic clustering method for area summary

Location aware IDF

The geographic distribution of a word
- Term occurrence in the geographic space

A more condensed distribution is regarded as more important
- Measurement: K-value (point distribution analysis method)

IDF * K
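As a worked example, the IDF and K values reported in the slides for the three sample terms can be multiplied directly. The K-values are taken as given; how K is computed is not reproduced here, and the function name is hypothetical.

```python
# IDF and K values for the three sample terms, as reported in the slides.
terms = {
    "onion": {"idf": 3.08, "k": 4.41},       # normal term
    "Dogenzaka": {"idf": 3.51, "k": 54.0},   # place name
    "wedding": {"idf": 3.04, "k": 9.93},     # area term
}


def location_aware_idf(idf: float, k: float) -> float:
    """Location aware IDF as stated on the slide: IDF * K."""
    return idf * k


scores = {term: location_aware_idf(v["idf"], v["k"]) for term, v in terms.items()}

# Rank the terms by each weighting, highest first.
ranking_by_idf = sorted(terms, key=lambda t: terms[t]["idf"], reverse=True)
ranking_by_idf_k = sorted(scores, key=scores.get, reverse=True)
```

Under plain IDF, "onion" (3.08) ranks slightly above "wedding" (3.04). Under IDF * K the products are about 13.6 for "onion", 30.2 for "wedding" and 189.5 for "Dogenzaka", so the area term moves above the normal term and the place name stands out clearly.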

Page 23: BEIRA: A geo-semantic clustering method for area summary

Evaluation of location aware IDF

Evaluation measure: extraction rate of location names
- Area-characteristic terms have a distribution similar to that of location names

Page 24: BEIRA: A geo-semantic clustering method for area summary

Evaluation of location aware IDF

Evaluation data
- All words in the Shibuya area
- Top 1,000 weighted terms

Location aware IDF (IDF*K) extracts location names more efficiently than the conventional measures.

[Chart: x-axis rank (1 to 900); y-axis density of location names [%] (0 to 30); series: IDF, K, IDF*K]

Page 25: BEIRA: A geo-semantic clustering method for area summary

Conclusions

BEIRA attacks the issues of map services
- Information overload
- Information awareness

Geo-semantic combination of features and processing can be used to build a view of area characteristics.

Future works
- Automatic adaptation of the G/S ratio
- Evaluation on other contents

[Image caption: Hokkai Takashima (1850-1931)]

Page 26: BEIRA: A geo-semantic clustering method for area summary

Thank you for your attention!
