Efficient summarization framework for multi-attribute uncertain data. Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra

Post on 11-Jan-2016

TRANSCRIPT

Page 1: Efficient summarization framework for multi-attribute uncertain data

Efficient summarization framework for multi-attribute uncertain data

Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra

Page 2: Efficient summarization framework for multi-attribute uncertain data

The Summarization Problem

Uncertain Data Set: objects O1, O2, …, On with attributes such as

location (e.g. LA)

face (e.g. Jeff, Kate)

visual concepts (e.g. water, plant, sky)

Extractive: O1, O8, O11, O25

Abstractive: Kate Jeff wedding at LA

Page 3: Efficient summarization framework for multi-attribute uncertain data

Modeling Information

Summarization Process

What information does this image contain?

Extract best subset

dataset summary

Metrics?
- Coverage: Agrawal, WSDM’09; Li, WWW’09; Liu, SDM’09; Sinha, WWW’11
- Diversity: Vee, ICDE’08; Ziegler, WWW’05
- Quality: Sinha, WWW’11

information

object

Page 4: Efficient summarization framework for multi-attribute uncertain data

Existing Techniques

image: Kennedy et al. WWW’08; Simon et al. ICCV’07; Sinha et al. WWW’11

customer review: Hu et al. KDD’04; Ly et al. CoRR’11

doc/micro-blog: Inouye et al. SocialCom’11; Li et al. WWW’09; Liu et al. SDM’09

• Do not consider information in multiple attributes

• Do not deal with uncertain data

Page 5: Efficient summarization framework for multi-attribute uncertain data

Challenges

Design a summarization framework for

Multi-attribute Data: visual concept, face tags, location, time, event

Uncertain/Probabilistic Data: data processing (e.g. vision analysis) produces probabilistic values, e.g. visual concepts P(sky) = 0.7, P(people) = 0.9

Page 6: Efficient summarization framework for multi-attribute uncertain data

Limitations of existing techniques - 1

Existing techniques typically model and summarize a single information dimension

Summarize only information about visual content (Kennedy et al. WWW’08, Simon et al. ICCV’07)

Summarize only information about review content (Hu et al. KDD’04, Ly et al. CoRR’11)

Page 7: Efficient summarization framework for multi-attribute uncertain data

What information is in the image?

Elemental IU: {sky}, {plant}, …; {Kate}, {Jeff}; {wedding}; {12/01/2012}; {Los Angeles}

Is that all?

Intra-attribute IU: {Kate, Jeff}, {sky, plant}, …

Even more information from attributes?

Inter-attribute IU: {Kate, LA}, {Kate, Jeff, wedding}
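The three kinds of information unit (IU) can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the function names, and the restriction to IUs of size two are assumptions made here, not part of the slides.

```python
from itertools import combinations

# Hypothetical photo object; attribute values follow the slide's example.
photo = {
    "visual concept": {"sky", "plant"},
    "face": {"Kate", "Jeff"},
    "event": {"wedding"},
    "time": {"12/01/2012"},
    "location": {"Los Angeles"},
}

def elemental_ius(obj):
    """One IU per single attribute value."""
    return {frozenset([(attr, v)]) for attr, vals in obj.items() for v in vals}

def intra_attribute_ius(obj, size=2):
    """IUs that combine `size` values of the same attribute."""
    return {
        frozenset((attr, v) for v in combo)
        for attr, vals in obj.items()
        for combo in combinations(sorted(vals), size)
    }

def inter_attribute_ius(obj, size=2):
    """IUs of `size` values that span at least two different attributes."""
    pairs = sorted((attr, v) for attr, vals in obj.items() for v in vals)
    return {
        frozenset(combo)
        for combo in combinations(pairs, size)
        if len({attr for attr, _ in combo}) > 1
    }

print(len(elemental_ius(photo)), len(intra_attribute_ius(photo)), len(inter_attribute_ius(photo)))
```

Even for this single photo the number of candidate inter-attribute IUs grows quickly, which is why the following slides ask which IUs are actually interesting.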

Page 8: Efficient summarization framework for multi-attribute uncertain data

Are all information units interesting?

Is {Sharad, Mike} an interesting intra-attribute IU?

Yes, they often have coffee together and appear frequently in other photos

Are all of the 2^n combinations of people interesting? Shall we select a summary that covers all of this information?

Well, probably not! I don’t care about person X and person Y who happen to be together in the photo of this large group.

Is {Liyan, Ling} interesting?

Yes from my perspective, because they are both my close friends

Page 9: Efficient summarization framework for multi-attribute uncertain data

Mine for interesting information units

T1: O1 face = {Jeff, Kate}

T2: O2 face = {Tom}

T3: O3 face = {Jeff, Kate, Tom}

T4: O4 face = {Kate, Tom}

T5: O5 face = {Jeff, Kate}

Tn: On face = {Jeff, Kate}

Modified item-set mining algorithm

frequent, correlated: {Jeff, Kate}
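A minimal sketch of mining a frequent and correlated face pair from these transactions. The slides do not say how the item-set mining algorithm is modified, so the measures used here (support for "frequent", lift for "correlated"), the thresholds, and the function names are assumptions, and the six listed transactions are treated as the whole dataset.

```python
from itertools import combinations

# Face transactions T1..T5 and Tn as listed on the slide.
transactions = [
    {"Jeff", "Kate"},
    {"Tom"},
    {"Jeff", "Kate", "Tom"},
    {"Kate", "Tom"},
    {"Jeff", "Kate"},
    {"Jeff", "Kate"},
]

def support(itemset, txns):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in txns) / len(txns)

def lift(itemset, txns):
    """Joint support divided by the product of the single-item supports."""
    expected = 1.0
    for item in itemset:
        expected *= support({item}, txns)
    return support(itemset, txns) / expected

def interesting_pairs(txns, min_support=0.5, min_lift=1.0):
    """Pairs that are frequent (support) and positively correlated (lift)."""
    items = sorted(set().union(*txns))
    return [
        set(pair)
        for pair in combinations(items, 2)
        if support(set(pair), txns) >= min_support
        and lift(set(pair), txns) > min_lift
    ]

print(interesting_pairs(transactions))
```

With these assumed thresholds only {Jeff, Kate} survives: it appears in 4 of 6 transactions, while {Kate, Tom} and {Jeff, Tom} fall below the support threshold.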

Page 10: Efficient summarization framework for multi-attribute uncertain data

Mine for interesting information units

O1 face: {Jeff, Kate}

O2 face: {Jeff}

O3 face: {Jeff, Kate, Tom}

O4 face: {Kate, Tom}

O5 face: {Jeff, Kate}

On face: {Jeff, Kate}

Mine from social context (e.g. Jeff is a friend of Kate, Tom is a close friend of the user): {Jeff, Kate}, {Tom}

Page 11: Efficient summarization framework for multi-attribute uncertain data

Limitation of existing techniques – 2

Cannot handle probabilistic attributes

Figure: objects 1, 2, 3, …, n of the dataset and of the summary linked to IUs, with P(Jeff) = 0.8 for one object and P(Jeff) = 0.6 for another

Not sure whether an object covers an IU in another object

Page 12: Efficient summarization framework for multi-attribute uncertain data

Deterministic Coverage Model --- Example

Coverage = 8 / 14

dataset summary

information

object
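A small sketch of a deterministic coverage computation, assuming coverage is the fraction of (object, IU) occurrences in the dataset whose IU also appears in some summary object. The dataset below is made up for illustration and does not reproduce the 8 / 14 example in the figure.

```python
def deterministic_coverage(dataset, summary):
    """Fraction of (object, IU) occurrences whose IU appears in the summary."""
    covered_ius = set()
    for o in summary:
        covered_ius |= dataset[o]
    total = sum(len(ius) for ius in dataset.values())
    covered = sum(len(ius & covered_ius) for ius in dataset.values())
    return covered / total

# Hypothetical dataset: object id -> set of IUs the object contains.
dataset = {
    "o1": {"Jeff", "Kate"},
    "o2": {"Tom"},
    "o3": {"Jeff", "Kate", "Tom"},
    "o4": {"sky"},
}

# Summary {o1} covers Jeff and Kate in o1 and o3: 4 of 7 occurrences.
print(deterministic_coverage(dataset, ["o1"]))
```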

Page 13: Efficient summarization framework for multi-attribute uncertain data

Probabilistic Coverage Model

Cov(S, O) = (expected amount of information covered by S) / (expected amount of total information)

Simplify to compute efficiently

Can be computed in polynomial time

The function is sub-modular
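One way to make the ratio above concrete is sketched below. The model is an assumption made for illustration and is not the formula from the slides: each object contains an IU independently with a given probability, an IU is weighted by its expected number of occurrences in the dataset, and the summary covers an IU if at least one summary object contains it.

```python
from math import prod

def expected_coverage(probs, summary):
    """Expected covered information divided by expected total information."""
    ius = {u for attrs in probs.values() for u in attrs}
    total = covered = 0.0
    for u in ius:
        # Expected number of objects in the dataset that contain IU u.
        weight = sum(attrs.get(u, 0.0) for attrs in probs.values())
        # Probability that no summary object contains u.
        miss = prod(1.0 - probs[s].get(u, 0.0) for s in summary)
        total += weight
        covered += weight * (1.0 - miss)
    return covered / total

# Hypothetical data: object id -> {IU: probability the object contains it}.
probs = {
    "o1": {"Jeff": 0.8, "sky": 0.7},
    "o2": {"Jeff": 0.6},
    "o3": {"Tom": 0.9},
}

print(expected_coverage(probs, ["o1"]))
```

Under this model the computation is a single pass over IUs and objects, so it runs in polynomial time.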

Page 14: Efficient summarization framework for multi-attribute uncertain data

Optimization Problem for summarization

Parameters: dataset O = {o1, o2, …, on}, positive number K

Finding the summary with maximum expected coverage is NP-hard.

We developed an efficient greedy algorithm to solve it.

Page 15: Efficient summarization framework for multi-attribute uncertain data

Basic Greedy Algorithm

Initialize S = empty set

For each object o in O \ S, compute Cov(S ∪ o, O)

Select o* with max Cov(S ∪ o, O)

Repeat (Yes) or stop (No: done)

Expensive to compute Cov. It is … (Object-level optimization)

Too many operations of computing Cov. (Iteration-level optimization)
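The basic greedy loop can be sketched as follows. The coverage function is the same assumed toy model (independent per-object IU probabilities), not the authors' formula, and the stopping rule |S| = K is also an assumption.

```python
from math import prod

def expected_coverage(probs, summary):
    """Assumed toy model: expected covered weight over expected total weight."""
    ius = {u for attrs in probs.values() for u in attrs}
    total = covered = 0.0
    for u in ius:
        weight = sum(attrs.get(u, 0.0) for attrs in probs.values())
        miss = prod(1.0 - probs[s].get(u, 0.0) for s in summary)
        total += weight
        covered += weight * (1.0 - miss)
    return covered / total

def basic_greedy(probs, k):
    """Add, K times, the object whose inclusion gives the highest coverage."""
    summary = []
    while len(summary) < min(k, len(probs)):
        best, best_cov = None, -1.0
        for o in probs:
            if o in summary:
                continue
            # Full coverage recomputation for every candidate: the costly step.
            cov = expected_coverage(probs, summary + [o])
            if cov > best_cov:
                best, best_cov = o, cov
        summary.append(best)
    return summary

# Hypothetical data: object id -> {IU: probability the object contains it}.
probs = {
    "o1": {"Jeff": 0.8, "sky": 0.7},
    "o2": {"Jeff": 0.6},
    "o3": {"Tom": 0.9},
}

print(basic_greedy(probs, 2))
```

Each iteration recomputes coverage from scratch for every remaining object, which is the cost the two optimizations address.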

Page 16: Efficient summarization framework for multi-attribute uncertain data

Efficiency optimization – Object-level

Reduce the time required to compute the coverage for one object

Instead of directly computing and optimizing coverage in each iteration, compute the gain of adding one object o to summary S

gain(S, o) = Cov(S ∪ o, O) - Cov(S, O)

Updating gain(S,o) is much more efficient ( )
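A sketch of why the gain can be cheaper to maintain than the coverage itself, again under the assumed independence model rather than the slides' own derivation. The class keeps, for every IU, the probability that the current summary misses it; the gain of an object then touches only that object's IUs. The class and attribute names are hypothetical.

```python
class GainTracker:
    """Maintains gain(S, o) incrementally for an assumed toy coverage model."""

    def __init__(self, probs):
        self.probs = probs
        # Expected number of occurrences of each IU in the dataset.
        self.weight = {}
        for attrs in probs.values():
            for u, p in attrs.items():
                self.weight[u] = self.weight.get(u, 0.0) + p
        self.total = sum(self.weight.values())
        # Probability that no object of the current summary contains the IU.
        self.miss = {u: 1.0 for u in self.weight}

    def gain(self, o):
        """Cov(S ∪ o, O) - Cov(S, O) for an object o outside the summary."""
        return sum(
            self.weight[u] * self.miss[u] * p for u, p in self.probs[o].items()
        ) / self.total

    def add(self, o):
        """Add o to the summary; only the IUs of o need updating."""
        for u, p in self.probs[o].items():
            self.miss[u] *= 1.0 - p

# Hypothetical data: object id -> {IU: probability the object contains it}.
probs = {
    "o1": {"Jeff": 0.8, "sky": 0.7},
    "o2": {"Jeff": 0.6},
    "o3": {"Tom": 0.9},
}

tracker = GainTracker(probs)
first_gain = tracker.gain("o1")
tracker.add("o1")
print(first_gain, tracker.gain("o2"), tracker.gain("o3"))
```

After o1 is added, the gain of o2 drops because most of the Jeff information is already covered, while the gain of o3 is unchanged.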

Page 17: Efficient summarization framework for multi-attribute uncertain data

Submodularity of Coverage

Expected coverage Cov(S, O) is submodular: for S ⊆ T ⊆ O and an object o not in T,

Cov(S ∪ o, O) – Cov(S, O) ≥ Cov(T ∪ o, O) – Cov(T, O)
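The diminishing-returns inequality can be checked by brute force on a tiny example. The check below uses the assumed toy coverage model (independent IU probabilities), so it illustrates the property rather than proving it for the coverage function defined in the slides.

```python
from itertools import combinations
from math import prod

def expected_coverage(probs, summary):
    """Assumed toy model: expected covered weight over expected total weight."""
    ius = {u for attrs in probs.values() for u in attrs}
    total = covered = 0.0
    for u in ius:
        weight = sum(attrs.get(u, 0.0) for attrs in probs.values())
        miss = prod(1.0 - probs[s].get(u, 0.0) for s in summary)
        total += weight
        covered += weight * (1.0 - miss)
    return covered / total

def has_diminishing_returns(probs, tol=1e-12):
    """Check gain(S, o) >= gain(T, o) for every S ⊆ T and every o not in T."""
    objs = list(probs)
    for size_t in range(len(objs) + 1):
        for t in combinations(objs, size_t):
            for size_s in range(size_t + 1):
                for s in combinations(t, size_s):
                    for o in objs:
                        if o in t:
                            continue
                        gain_s = (expected_coverage(probs, [*s, o])
                                  - expected_coverage(probs, s))
                        gain_t = (expected_coverage(probs, [*t, o])
                                  - expected_coverage(probs, t))
                        if gain_s + tol < gain_t:
                            return False
    return True

# Hypothetical data: object id -> {IU: probability the object contains it}.
probs = {
    "o1": {"Jeff": 0.8, "sky": 0.7},
    "o2": {"Jeff": 0.6, "sky": 0.2},
    "o3": {"Tom": 0.9, "Jeff": 0.3},
}

print(has_diminishing_returns(probs))
```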

Page 18: Efficient summarization framework for multi-attribute uncertain data

Efficiency optimization – Iteration-level

Reduce the number of object-level computations (i.e. gain(S,o) ) in each iteration of the greedy process

While traversing objects in O \ S, we maintain

the maximum gain so far, gain*

an upper bound Upper(S, o) on gain(S, o). For any

Prune an object o if Upper(S, o) < gain*.

By definition

By submodularity

Update in constant time
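A sketch of the pruning idea under stated assumptions. Here Upper(S, o) is taken to be the gain of o computed in an earlier iteration, which is a valid bound by submodularity; the slides may define the bound differently. The gain is left unnormalized, since dividing by the constant total does not change which object wins, and the toy coverage model is the same assumed one as before.

```python
def pruned_greedy(probs, k):
    """Greedy selection that skips objects whose upper bound is below gain*."""
    weight = {}
    for attrs in probs.values():
        for u, p in attrs.items():
            weight[u] = weight.get(u, 0.0) + p
    miss = {u: 1.0 for u in weight}            # P(summary misses IU u)
    upper = {o: float("inf") for o in probs}   # stale gains act as upper bounds
    summary, evaluations = [], 0
    while len(summary) < min(k, len(probs)):
        best, best_gain = None, -1.0
        for o in probs:
            if o in summary or upper[o] < best_gain:
                continue  # pruned: the gain of o cannot exceed gain*
            gain = sum(weight[u] * miss[u] * p for u, p in probs[o].items())
            evaluations += 1
            upper[o] = gain  # remains an upper bound in later iterations
            if gain > best_gain:
                best, best_gain = o, gain
        summary.append(best)
        for u, p in probs[best].items():
            miss[u] *= 1.0 - p
    return summary, evaluations

# Hypothetical data: object id -> {IU: probability the object contains it}.
probs = {
    "o1": {"Jeff": 0.8, "sky": 0.7},
    "o2": {"Jeff": 0.6},
    "o3": {"Tom": 0.9},
    "o4": {"plant": 0.3},
}

print(pruned_greedy(probs, 2))
```

In this toy run the second iteration never recomputes the gain of o4, because its bound from the first iteration is already below the best gain found so far.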

Page 19: Efficient summarization framework for multi-attribute uncertain data

Experiment -- Datasets

Facebook Photo Set: 200 photos uploaded by 10 Facebook users.

Review Dataset: reviews about 10 hotels from TripAdvisor. Each hotel has about 250 reviews on average.

Flickr Photo Set: 20,000 photos from Flickr.

Attribute labels shown in the figure: visual concept, event, time, face; visual concept; facets, rating; visual, event, time

Page 20: Efficient summarization framework for multi-attribute uncertain data

Experiment – Quality

Page 21: Efficient summarization framework for multi-attribute uncertain data

Experiment – Efficiency

Basic greedy algorithm without optimization runs for more than 1 minute

Page 22: Efficient summarization framework for multi-attribute uncertain data

Summary

Developed a new extractive summarization framework

Multi-attribute data.

Uncertain/Probabilistic data.

Generates high-quality summaries.

Highly efficient.
