Ana Roxin – [email protected] – Project team Checksem – Laboratoire Electronique Informatique et Image (LE2I – UMR CNRS 6306) – University of Burgundy

Customizing Semantic Profiling for

Digital Advertising

Ana Roxin - [email protected]

October 28th, 2014

LE2I (Laboratory of Informatics, Image and Electronics) - UMR CNRS 6306 - University of Burgundy

Agenda

General context

• Need for user profiling…

• … in a Big Data Context…

• … for Digital advertising

Related work

• General trends for Big Data analysis

• Limitations of keyword-based approaches

Our approach

• Domain knowledge modeling

• Practical implementation

GENERAL CONTEXT

Need for User Profiling…

Language and localisation?

… in a Big Data context …

Volume

• Massive amounts of data have to be processed

Velocity

• Data arrive at high speed

Variety

• Data types and formats are heterogeneous

Veracity

• Data are not always sound and have to be verified

Value

• Data carry inherent value that the application can extract

… for Digital Advertising

RELATED WORK

Methods for Web advertising

Re-targeting

• Re-confront the user with:

• Previously visited content

• Very similar content

Web usage mining

• Unite profiling efforts across pages

• Support dynamic structural changes of a Web site in order to adapt advertisements to the active user
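Re-targeting "very similar contents" needs some notion of content similarity. The sketch below assumes a plain bag-of-words representation and cosine similarity (a common baseline, not necessarily what any production ad server uses); the page texts, ad identifiers and the `retarget` helper are all hypothetical.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text: str) -> Counter:
    """Naive tokenisation: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retarget(visited: list[str], catalog: dict[str, str], k: int = 2) -> list[str]:
    """Return the k catalog items most similar to any previously visited page."""
    if not visited:
        return []
    visited_vecs = [bag_of_words(page) for page in visited]
    scores = {
        item: max(cosine(bag_of_words(text), v) for v in visited_vecs)
        for item, text in catalog.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical example data.
visited = ["running shoes for trail running"]
catalog = {
    "ad_trail_shoes": "trail running shoes sale",
    "ad_laptop": "gaming laptop deal",
    "ad_marathon": "marathon running training plan",
}
print(retarget(visited, catalog))  # ['ad_trail_shoes', 'ad_marathon']
```

Because it only compares surface keywords, this baseline inherits exactly the keyword limitations discussed in the related-work slides.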

Big Data analysis

Machine Learning & Statistics

• “Richness of data and challenges in their predictive analysis and forecasting”

• Stable and scalable techniques:

• SVMs, regression trees, neural networks

• No semantic-based techniques yet…

Problems

• Underlying data structures

• Explicability of results

• Context integration

Ex.: Keyword-based analysis

Problems with keywords 1

Problems with keywords 2

Other issues

Data: integration and interpretation

• Discover new information sources for knowledge augmentation

• Semantic disambiguation for keywords in a given context

Understanding: unstructured, heterogeneous resources

• Semantic interpretation and integration of such contents

Performance: more and more data…

• … less and less time for analysis and processing
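Semantic disambiguation of a keyword "in a given context" can be illustrated with the simplified Lesk algorithm: pick the sense whose definition shares the most words with the surrounding text. This is a swapped-in textbook technique, not the method used in the presented system; the two-sense inventory for "jaguar" and the stopword list are hypothetical (a real system might draw glosses from WordNet or DBpedia abstracts).

```python
from typing import Optional

# Hypothetical sense inventory: keyword -> {concept identifier: gloss}.
SENSES = {
    "jaguar": {
        "Jaguar_(animal)": "large wild cat living in the rainforest of south america",
        "Jaguar_(car)": "british luxury car brand known for its sedan and engine",
    },
}
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "its", "at", "with", "is"}


def tokens(text: str) -> set:
    """Lowercased word set with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def disambiguate(keyword: str, context: str) -> Optional[str]:
    """Simplified Lesk: return the sense whose gloss overlaps most with the context.

    Ties keep the first sense listed; unknown keywords return None.
    """
    ctx = tokens(context)
    best, best_overlap = None, -1
    for sense, gloss in SENSES.get(keyword, {}).items():
        overlap = len(ctx & tokens(gloss))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


print(disambiguate("jaguar", "new jaguar sedan at the dealer with a v8 engine"))
```

A keyword matcher would treat both contexts as the same "jaguar"; the overlap count is what separates the car page from the wildlife page.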

OUR APPROACH

Bridge using Semantic Web (SW) technologies

Modelling domain knowledge

• Model expert knowledge

• Identify relevant core concepts and their relations

• Discard redundant/surplus information

Semantic qualification of user data

• Semantic disambiguation

• Find real-world concepts, not keywords

• Exploit additional information sources

• DBpedia, WordNet, Freebase
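Once a page is qualified with real-world concepts, a domain model can lift those concepts to broader categories such as advertising universes. The sketch below reproduces, in plain Python, the transitive `rdfs:subClassOf` entailment that a triple store or reasoner would normally compute; the triples, the `ex:` concept names and the `Universe:` classes are hypothetical stand-ins for the project's ontology.

```python
# Hypothetical (subject, predicate, object) triples with shortened IRIs.
TRIPLES = {
    ("ex:TrailRunningShoe", "rdfs:subClassOf", "ex:RunningGear"),
    ("ex:RunningGear", "rdfs:subClassOf", "Universe:Sport"),
    ("ex:Stroller", "rdfs:subClassOf", "Universe:Childcare"),
    ("ex:GamingLaptop", "rdfs:subClassOf", "Universe:HighTech"),
}


def superclasses(concept: str, triples=TRIPLES) -> set:
    """All direct and indirect superclasses of `concept` via rdfs:subClassOf."""
    found, frontier = set(), [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)        # the `found` check also guards against cycles
                frontier.append(o)
    return found


def universes(concept: str) -> list:
    """Superclasses that are advertising universes (hypothetical naming convention)."""
    return sorted(c for c in superclasses(concept) if c.startswith("Universe:"))


print(universes("ex:TrailRunningShoe"))  # ['Universe:Sport']
```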

Domain Knowledge Modelling

Domain specialisation 1

Domain specialisation 2

Sporty Mom

User(?user) ^ Universe:Sport(?u1) ^ hasLongTermInterestIn(?user, ?u1) ^ hasChildren(?user) ^ isAdult(?user) -> belongsToSegment(?user, "SportyMom")
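To make the rule's meaning concrete, the sketch below evaluates its body as a conjunctive query over ground facts: each atom extends the variable bindings found so far (a naive nested-loop join), and every surviving binding instantiates the head. This is a toy evaluator under stated assumptions, with no SWRL built-ins and no OWL reasoning; the facts about `alice` and `bob` are hypothetical.

```python
# Atoms are tuples (predicate, arg, ...); terms starting with "?" are variables.

def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None          # variable already bound to something else
        elif term != value:
            return None              # constant mismatch
    return new


def fire(body, head, facts):
    """Return the head atoms inferred from all bindings that satisfy the body."""
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(atom, f, b)) is not None]
    return {tuple(b.get(t, t) for t in head) for b in bindings}


BODY = [
    ("User", "?user"),
    ("Universe:Sport", "?u1"),
    ("hasLongTermInterestIn", "?user", "?u1"),
    ("hasChildren", "?user"),
    ("isAdult", "?user"),
]
HEAD = ("belongsToSegment", "?user", "SportyMom")

# Hypothetical facts: only alice satisfies every atom of the body.
FACTS = {
    ("User", "alice"), ("hasChildren", "alice"), ("isAdult", "alice"),
    ("User", "bob"), ("isAdult", "bob"),
    ("Universe:Sport", "running"),
    ("hasLongTermInterestIn", "alice", "running"),
    ("hasLongTermInterestIn", "bob", "running"),
}
print(fire(BODY, HEAD, FACTS))  # {('belongsToSegment', 'alice', 'SportyMom')}
```

In practice such rules would be handed to a rule-capable reasoner; the point here is only that the comma-free conjunction reads as "all atoms must hold for the same `?user`".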

Uncertainty handling

[Figure: uncertainty handling example, with weights 0.65 and 0.2]

Practical implementation 1

[Figure: WP Semantic Qualification Process, split into synchronous and asynchronous parts. Components shown: web usage data (cookies, log), population, qualification, knowledge base, profile characterization, inference, user targeting.]

Practical implementation 2

Web navigation logs parsing

• URL + navigation information extraction

URL weighted classification

• Lemma filtering

• Concept extraction (semantic annotation)

• Universe similarity computation

Weighted user profiling

• SWRL rules on weighted classified URLs + navigation information (number of hits, time spent)

Weighted marketing segment belonging

• Sporty Mom: 80%

• Geek: 20%
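The last two stages can be sketched end to end: parse navigation logs, weight each classified URL by the navigation information, and normalise the result into per-user segment percentages. Everything specific here is an assumption: the `<user> <url> <seconds>` log format, the URL classification table, the choice of "weight × seconds" as the score (each log line is one hit, so hit counts enter through the sum), and the one-universe-to-one-segment table that stands in for the SWRL segment rules. The numbers are illustrative and are not the slide's results.

```python
from collections import defaultdict

# Hypothetical log format: "<user_id> <url> <seconds_on_page>", one line per hit.
LOG = """\
u42 http://shop.example/running-shoes 90
u42 http://shop.example/running-shoes 60
u42 http://news.example/new-laptop 50
"""

# Hypothetical output of the URL weighted classification step.
URL_UNIVERSES = {
    "http://shop.example/running-shoes": {"Universe:Sport": 1.0},
    "http://news.example/new-laptop": {"Universe:HighTech": 1.0},
}

# Hypothetical stand-in for the SWRL segment rules: one universe -> one segment.
SEGMENT_OF = {"Universe:Sport": "Sporty Mom", "Universe:HighTech": "Geek"}


def parse_log(text):
    """Yield (user, url, seconds) tuples, skipping blank lines."""
    for line in text.splitlines():
        if line.strip():
            user, url, seconds = line.split()
            yield user, url, float(seconds)


def segment_membership(text):
    """Accumulate weight * seconds per segment, then normalise per user to percentages."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, url, seconds in parse_log(text):
        for universe, weight in URL_UNIVERSES.get(url, {}).items():
            segment = SEGMENT_OF.get(universe)
            if segment:
                scores[user][segment] += weight * seconds
    result = {}
    for user, segs in scores.items():
        total = sum(segs.values())
        if total > 0:  # avoid dividing by zero for users with no scored time
            result[user] = {seg: round(100 * s / total) for seg, s in segs.items()}
    return result


print(segment_membership(LOG))  # {'u42': {'Sporty Mom': 75, 'Geek': 25}}
```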

Profiling customization

Knowledge base browsing

Drag ‘n drop SWRL rule creation

Corresponding BIDs display and download
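A drag-and-drop rule editor ultimately has to serialise the atoms the user assembled into rule text. The sketch below assumes a hypothetical internal representation (tuples of predicate and arguments, variables prefixed with `?`, every other argument emitted as a quoted literal) and renders SWRL's human-readable `^ ... ->` syntax. It also rejects rules whose head uses a variable that the body never binds, the usual rule-safety condition.

```python
def render(atom):
    """Render (predicate, arg, ...) as predicate(arg1, arg2); non-variables are quoted."""
    pred, *args = atom
    terms = [a if a.startswith("?") else f'"{a}"' for a in args]
    return f"{pred}({', '.join(terms)})"


def to_swrl(body, head):
    """Serialise body atoms and a head atom into SWRL human-readable syntax."""
    body_vars = {a for atom in body for a in atom[1:] if a.startswith("?")}
    head_vars = {a for a in head[1:] if a.startswith("?")}
    if not head_vars <= body_vars:
        raise ValueError(f"head variables not bound in body: {sorted(head_vars - body_vars)}")
    return " ^ ".join(render(a) for a in body) + " -> " + render(head)


# Atoms as a hypothetical editor might collect them for the Sporty Mom segment.
body = [
    ("User", "?user"),
    ("Universe:Sport", "?u1"),
    ("hasLongTermInterestIn", "?user", "?u1"),
    ("hasChildren", "?user"),
    ("isAdult", "?user"),
]
head = ("belongsToSegment", "?user", "SportyMom")
print(to_swrl(body, head))
```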

CONCLUSION & FUTURE WORK

Open questions

Fine tuning

• Precise links to external knowledge bases

• Semantic distance for mapping webpages to universes

• Usage of contextual data (time, location, user agent) in SWRL rules

Testing and extension

• First benchmarks on partner data:

• Real web navigation logs – French websites

• Testing on open datasets (Reuters, Leipzig University)

• Enable scientific comparison + extend to English

• Integration into the Stardog graph database
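One candidate "semantic distance for mapping webpages to universes" is Wu-Palmer similarity over a concept taxonomy: 2 × depth(least common ancestor) / (depth(a) + depth(b)). It is swapped in here as a standard example; the presentation leaves the measure open. The toy taxonomy, its concept names, and the mean-similarity mapping rule are hypothetical.

```python
# Hypothetical taxonomy as child -> parent links; "Thing" is the root.
PARENT = {
    "Sport": "Thing",
    "Running": "Sport",
    "Marathon": "Running",
    "TrailRunning": "Running",
    "HighTech": "Thing",
    "Laptop": "HighTech",
}


def path_to_root(concept):
    """Concept followed by all its ancestors, up to the root."""
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def depth(concept):
    return len(path_to_root(concept))  # the root has depth 1


def wu_palmer(a, b):
    """Wu-Palmer similarity in (0, 1]; 1.0 for identical concepts."""
    ancestors_b = set(path_to_root(b))
    lcs = next(c for c in path_to_root(a) if c in ancestors_b)  # least common subsumer
    return 2 * depth(lcs) / (depth(a) + depth(b))


def closest_universe(concepts, universes):
    """Map a page, given its extracted concepts, to the universe with the highest mean similarity."""
    return max(universes, key=lambda u: sum(wu_palmer(c, u) for c in concepts) / len(concepts))


print(wu_palmer("Marathon", "TrailRunning"))                            # 0.75
print(closest_universe(["Marathon", "TrailRunning"], ["Sport", "HighTech"]))  # Sport
```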

Summary

Presented system

• Functional profiling system, based on users' web navigation, that aims to predict future consumer decisions

• Highly customized set of classes summarizing users' interests as expressed in navigation logs

Next steps

• Extend the rule base to enable more expressive segment design and creation

• Extend the ontology with external repositories to enable richer inferences

Take-away message

• Profiling for digital advertising is a necessary but difficult task

• Semantic Web technologies offer an answer to the related problems

LE2I – UMR CNRS 6306 – University of Burgundy

[email protected]

Thank you for your attention. Questions?