kelm überblick 2013

Pascal Kelm [email protected] Communication Systems Group Technische Universität Berlin Thursday, 24 January 2013 www.nue.tu-berlin.de


Page 1: Kelm überblick 2013

Pascal Kelm

[email protected]

Communication Systems Group

Technische Universität Berlin

Thursday, 24 January 2013

www.nue.tu-berlin.de

Page 2: Kelm überblick 2013

Kelm: “Where in the World?: The State of Automatic Geotagging of Video”

Overview

Page 3: Kelm überblick 2013

Motivation: Where in the world is it?

Page 5: Kelm überblick 2013

State of the Art

Textual information

Tags: Paris, France, twilight, grand blue, Europe, Hasselblad, film, …

- Gazetteers: e.g. geonames.org
- Textual similarity: finding the similarity to a group of toponyms

Visual information

- Low-level features: propagate the location by finding a visually similar image. Features: texture, color, shape, …
- Local features: interesting points on the object can be extracted to provide a "feature description" of the object. Features: SIFT, SURF, etc.

How would you estimate the location of unknown content?

• [Pascal Kelm: “Where in the World?: The State of Automatic Geotagging of Video”, invited lecture, DGA workshop 2012]

• [Pascal Kelm et al.: “Georeferencing in Social Networks” in Social Media Retrieval, Springer, 2012]

Page 6: Kelm überblick 2013

Relevant Research 1

2008: James Hays, Alexei A. Efros. IM2GPS: estimating geographic information from a single image. Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR, “Where am I?”)

Purely data-driven scene matching approach (over 6 million GPS-tagged images, 5 low-level descriptors)

Visual ambiguity

Low precision, high computational cost (cluster of 400 processors for 3 days)

Page 7: Kelm überblick 2013

Relevant Research 2

2009: Pavel Serdyukov, Vanessa Murdock, Roelof van Zwol: Placing Flickr Photos on a Map. In: 32nd International ACM SIGIR

Language model based on textual annotations (ranking)

Geographical / textual ambiguity

High precision

High computational cost

Example: images with the “palma” tag falsely mapped near Palma de Mallorca, Spain

Page 8: Kelm überblick 2013

Research Question

What are the limits of an automatic algorithm?

Which feature (text, video) performs best?

Is a fusion possible to eliminate geographical ambiguity?

Do I need a CPU cluster to estimate the location?

Does low computing performance mean low precision?

Is it possible for a human to estimate the location of a video using textual, visual and audio information?

Page 9: Kelm überblick 2013

Placing Task

Organizers:

Pascal Kelm, TU Berlin

Adam Rae, Yahoo! Research

The task requires participants to assign geographical coordinates to each provided test video. Participants can make use of metadata and audio and visual features as well as external resources.

[Adam Rae, Pascal Kelm: “Working Notes for the Placing Task at MediaEval 2012”, Working Notes Proceedings (ISSN 1613-0073) of the MediaEval 2012]

Page 10: Kelm überblick 2013

Image Distribution

Flickr Database:

3.6 million training images

10,000 training videos

5,091 test videos

Descriptors:

1. Color and Edge Directivity Descriptor

2. Gabor

3. Fuzzy Color and Texture Histogram

4. Color Histogram

5. Scalable Color

6. Auto Color Correlogram

7. Tamura

8. Edge Histogram

9. Color Layout

Metadata:

All information about uploader + video

Page 11: Kelm überblick 2013

Overview Framework

National borders extracted from the metadata

Textual and visual features are used in a hierarchical framework to predict the most likely location

[Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “Multimodal Geo-tagging in Social Media Websites using Hierarchical Spatial Segmentation”, Proceedings of the 20th ACM SIGSPATIAL 2012]

Page 12: Kelm überblick 2013

Collaborative Systems: Example

這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…

Page 13: Kelm überblick 2013

Geographical Ambiguity

這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…

Which language is it?

Chinese

This was my last trip to Paris. I visited the castle in Disneyland…

Which words give us information? Tags?

Trip, Paris, Castle, Disneyland

Which of these nouns carry geographical information?

Paris, Disneyland

Page 14: Kelm überblick 2013

Geographical Ambiguity

Paris: France, Canada, Puerto Rico

Disneyland: China, USA, France

\[ c_{\text{detected}} = \arg\max \left( \sum_{j=0}^{N-1} R(c_{0j}), \; \ldots, \; \sum_{j=0}^{N-1} R(c_{mj}) \right) \]

R(c_i) = Rank sum

c_i = Countries

N = Number of toponyms

• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs”, ACM Multimedia 2011]
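
The rank-sum country detection on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the candidate lists mirror the Paris/Disneyland example, their ordering is assumed to come from a gazetteer lookup, and the scoring rule (first candidate gets the most points) is a hypothetical choice for R.

```python
from collections import defaultdict

def detect_country(candidates_per_toponym):
    """Return the country with the largest summed rank score, plus all totals."""
    totals = defaultdict(float)
    for ranked in candidates_per_toponym.values():
        for position, country in enumerate(ranked):
            # Hypothetical rank score: the first candidate of a toponym gets
            # len(ranked) points, the last candidate gets 1 point.
            totals[country] += len(ranked) - position
    best = max(totals, key=totals.get)
    return best, dict(totals)

# Assumed gazetteer output: candidate countries per toponym, best match first.
candidates = {
    "Paris": ["France", "Canada", "Puerto Rico"],
    "Disneyland": ["China", "USA", "France"],
}

country, totals = detect_country(candidates)
print(country, totals)
```

Under these assumptions France collects points from both toponyms, so it outranks countries that match only one of them, which is how summing over all toponyms resolves the ambiguity.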

Page 15: Kelm überblick 2013

Overview Framework

National borders extracted from the metadata

Textual and visual features are used in a hierarchical framework to predict the most likely location

Page 17: Kelm überblick 2013

Textual Region Model

Segmenting the world map into regions according to the meridians and parallels

Stemming: reducing inflected words to their root form

Example (Porter Stemmer):

Tags before stemming: Bream Vortex, Swimming, Ocean, Beach, Springs Vortex, Scuba Diving, Scuba Underwater

TextBounds Crossing, Florida, USA

Tags after stemming: Bream Vortex, Swim, Ocean, Beach, Springs Vortex, Scuba Dive, Scuba Underwat

Page 18: Kelm überblick 2013

Textual Region Model

Term-location-distribution:

\[ P(t \mid l) = \frac{N_{t,l} + 1}{\sum_{t' \in V} \left( N_{t',l} + 1 \right)} \]

Term frequency-inverse document frequency:

\[ \mathrm{tfidf}_{t} = N_{t,l} \cdot \log \frac{N}{n_t} \]

\[ P(l \mid d) = \max \sum_{i=0}^{N} \log P(t_i \mid l) \]
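
A minimal sketch of the term-location distribution with add-one smoothing, used to pick the most likely region for a set of tags. The regions, the tag lists and all function names are hypothetical toy values, and skipping out-of-vocabulary tags is an assumption; the sketch only illustrates the smoothed estimate and the sum of log-probabilities.

```python
import math
from collections import Counter

def train(region_docs):
    """region_docs: {region: [tag lists]} -> per-region term counts and vocabulary."""
    counts = {
        region: Counter(tag for doc in docs for tag in doc)
        for region, docs in region_docs.items()
    }
    vocab = {tag for region_counts in counts.values() for tag in region_counts}
    return counts, vocab

def log_p_term(term, region, counts, vocab):
    # Add-one smoothing: (N_{t,l} + 1) / sum over t' in V of (N_{t',l} + 1)
    denominator = sum(counts[region].values()) + len(vocab)
    return math.log((counts[region][term] + 1) / denominator)

def classify(doc, counts, vocab):
    """Score each region by the sum of log P(t_i | l) and return the best one."""
    scores = {
        region: sum(log_p_term(t, region, counts, vocab) for t in doc if t in vocab)
        for region in counts
    }
    return max(scores, key=scores.get), scores

# Hypothetical, already stemmed training tags per region.
region_docs = {
    "florida": [["beach", "ocean", "scuba"], ["swim", "beach"]],
    "paris": [["eiffel", "louvr"], ["pari", "castl"]],
}
counts, vocab = train(region_docs)
best, scores = classify(["scuba", "beach"], counts, vocab)
print(best, scores)
```

Working in log space avoids numerical underflow when many tag probabilities are multiplied.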

Page 19: Kelm überblick 2013

Textual Region Model

Bernoulli model:

\[ P(t \mid c) = \frac{N_{t,c} + 1}{\sum_{t' \in V} \left( N_{t',c} + 1 \right)} \]

t = Tag

c = Class / Region

Example tags (stemmed): Bream Vortex, Swim, Ocean, Beach, Springs Vortex, Scuba Dive, Scuba Underwat

Page 20: Kelm überblick 2013

Visual Region Model

Returns the visually most similar areas, which are represented by a mean feature vector of all training images and videos of the respective area
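
The mean-feature-vector idea can be sketched in a few lines. The two-dimensional feature vectors, the region names and the use of Euclidean distance as the similarity measure are assumptions made for illustration; the slides do not state which distance is used.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def build_region_model(training):
    """training: {region: [feature vectors]} -> {region: mean feature vector}."""
    return {region: mean_vector(vectors) for region, vectors in training.items()}

def rank_regions(query, model):
    """Regions ordered from visually most to least similar (Euclidean distance)."""
    return sorted(model, key=lambda region: math.dist(query, model[region]))

# Hypothetical 2-D descriptors; real descriptors have many more dimensions.
training = {
    "coast": [[0.9, 0.1], [0.7, 0.3]],
    "city": [[0.2, 0.8], [0.4, 0.6]],
}
model = build_region_model(training)
ranking = rank_regions([0.75, 0.25], model)
print(ranking)
```

Comparing a query against one mean vector per region keeps the number of comparisons equal to the number of regions rather than the number of training items.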

Page 21: Kelm überblick 2013

What is meant by Spatial Segmentation?

World map is iteratively divided into segments of different sizes

Each segment is treated as a class in our probabilistic model

• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “How Spatial Segmentation improves the Multimodal Geo-Tagging”, Working Notes Proceedings of the MediaEval 2012]
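
One simple way to obtain segments at several granularities is a regular latitude/longitude grid that is refined level by level. The sketch below is a simplification under stated assumptions: the grid resolutions in `LEVELS` are hypothetical, cells are uniform within a level, and the function names are illustrative only.

```python
def cell_of(lat, lon, n_rows, n_cols):
    """Return (row, col) of the grid cell containing a point given in degrees."""
    # Clamp so that lat = +90 and lon = +180 fall into the last row/column.
    row = min(int((lat + 90.0) / 180.0 * n_rows), n_rows - 1)
    col = min(int((lon + 180.0) / 360.0 * n_cols), n_cols - 1)
    return row, col

# Hypothetical hierarchy: 10-degree cells first, then 1-degree cells.
LEVELS = [(18, 36), (180, 360)]

def segment_labels(lat, lon, levels=LEVELS):
    """One class label (level, row, col) per level, from coarse to fine."""
    return [
        (depth, *cell_of(lat, lon, rows, cols))
        for depth, (rows, cols) in enumerate(levels)
    ]

labels = segment_labels(48.8566, 2.3522)  # approximate coordinates of Paris
print(labels)
```

Each `(level, row, col)` tuple can serve as a class label, so a classifier trained at a coarse level narrows down the candidates for the next, finer level.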

Page 22: Kelm überblick 2013

Fusion: Example

Confidence scores of the visual approach (right) restricted to be in the most likely spatial segment determined by the textual approach (left)
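
The restriction described here can be sketched as a two-step selection. All scores, segment names and the `parent_of` mapping are hypothetical values chosen to show the effect; the sketch is not the authors' implementation.

```python
def fuse(text_scores, visual_scores, parent_of):
    """Pick the best textual segment, then the best visual cell inside it."""
    best_segment = max(text_scores, key=text_scores.get)
    # Keep only visual candidates that lie inside the textual winner.
    inside = {
        cell: score
        for cell, score in visual_scores.items()
        if parent_of[cell] == best_segment
    }
    if not inside:
        return best_segment, None  # fall back to the textual segment alone
    return best_segment, max(inside, key=inside.get)

# Hypothetical confidence scores.
text_scores = {"europe": 0.7, "north_america": 0.3}
visual_scores = {"paris": 0.4, "anaheim": 0.6, "berlin": 0.2}
parent_of = {"paris": "europe", "berlin": "europe", "anaheim": "north_america"}

result = fuse(text_scores, visual_scores, parent_of)
print(result)
```

In this toy case the visually strongest candidate lies outside the textual segment and is discarded, and fewer visual candidates have to be compared.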


Page 23: Kelm überblick 2013

Results

[UNICAMP] O. A. B. Penatti, L. T. Li, J. Almeida, R. da S. Torres. A visual approach for video geocoding using bag-of-scenes. ICMR '12

[QMUL] X. Sevillano, T. Piatrik, K. Chandramouli, Q. Zhang, E. Izquierdo. Geo-tagging online videos using semantic expansion and visual analysis.

Page 24: Kelm überblick 2013

Conclusion

Hierarchical approach for automatic estimation of geo-tags in social media websites

Detailed analysis of textual and visual features using different spatial granularities (national border detection)

Fusion of textual and visual methods is important to eliminate geographical ambiguities

Fusion reduces the computing time in the subsequent classification step

Half of the test set is located correctly within a radius of 10 km
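
A figure such as "within 10 km for half of the test set" can be computed from predicted and true coordinates with a great-circle distance. The sketch below assumes the haversine formula on a spherical Earth and uses made-up coordinates; the slides do not say how distances were measured.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def fraction_within(predictions, truths, radius_km=10.0):
    """Share of items whose prediction lies within radius_km of the ground truth."""
    hits = sum(
        haversine_km(*predicted, *true) <= radius_km
        for predicted, true in zip(predictions, truths)
    )
    return hits / len(truths)

# Hypothetical ground truth (Paris, Berlin) and predictions.
truths = [(48.8566, 2.3522), (52.5200, 13.4050)]
predictions = [(48.8600, 2.3500), (48.8566, 2.3522)]

fraction = fraction_within(predictions, truths)
print(fraction)
```

Evaluating the same predictions at several radii shows how accuracy changes with the required precision.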


Page 25: Kelm überblick 2013

Web demonstrator

http://geotagging.de.im

Page 26: Kelm überblick 2013

Geo-Location Human Baseline Project

Page 27: Kelm überblick 2013

Geo-Location Human Baseline Project

• [Gottlieb, Choi, Kelm, Friedland, Sikora: “Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location”, in ACM Workshop on Crowdsourcing for Multimedia held in conjunction with ACM Multimedia 2012]

• [Gottlieb, Choi, Kelm, Friedland, Sikora: “On Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geolocation”, in MULTIMEDIA COMMUNICATIONS TECHNICAL COMMITTEE IEEE COMMUNICATIONS SOCIETY, Vol. 8, No. 1, January 2013]

http://geotagging.de.im/game.php

Page 28: Kelm überblick 2013

Object Detection

Frame 35

Frame 370

Page 29: Kelm überblick 2013

Augmented Object Detection

OpenCV for Android

Features: FAST, ORB, BRISK, SURF

CPU: 192 ms, GPU: 87 ms, Android: 9990 ms

Geo-referenced Database

business card

Page 30: Kelm überblick 2013

Object Detection

Depth Map, Matching Map

Page 31: Kelm überblick 2013

Graph-based Object Detection

Matching

Page 32: Kelm überblick 2013

DFG Proposal

Housebreaking

Cyber-Stalking

Cyber-Stealing

Cyber-Mobbing

Page 33: Kelm überblick 2013

DFG Proposal: Geo-Privacy

Page 34: Kelm überblick 2013

Question

Thanks for your attention.

Dipl.- Ing. Pascal Kelm

Communication Systems Group

Technische Universität Berlin

Sekr. EN1, Einsteinufer 17

10587 Berlin, Germany

E-mail: [email protected]

Phone: (+49) 30 / 314 28504

Page 35: Kelm überblick 2013

DFG: Geo-Tagging

Page 36: Kelm überblick 2013

Spatial Segmentation

Page 37: Kelm überblick 2013

Twitter-based Placing Sub-Task (New York)

Page 38: Kelm überblick 2013

Spatial Segmentation

Page 39: Kelm überblick 2013

Extracted geo. items

00001: hawaii, kauai, usa

Page 40: Kelm überblick 2013

Textual Features + Naive Bayes

Page 41: Kelm überblick 2013

Visual Features

What will you do if you do not have any textual information?

Page 42: Kelm überblick 2013

Fusion

[Figure: fusion pipeline. Labels: Geographical Boundaries Extraction; Textual Region Model (Region 1 … Region N); Visual Region Model (Region 1 … Region N); Ranking (Region 3 … Region 6); Pic1, Pic2, Pic3; Region 2]