customizing semantic profiling for digital advertising
TRANSCRIPT
Ana Roxin – ana.roxin@chekcsem.fr – Project team Checksem – Laboratoire Electronique Informatique et Image (LE2I – UMR CNRS 6306) – University of Burgundy
Customizing Semantic Profiling for
Digital Advertising
Ana Roxin - ana.roxin@chekcsem.fr
October 28th, 2014
LE2I (Laboratory of Informatics, Image and Electronics) - UMR CNRS 6306 - University of Burgundy
Agenda
General context
• Need for user profiling…
• … in a Big Data Context…
• … for Digital advertising
Related work
• General trends for Big Data analysis
• Limitations for keyword-based approaches
Our approach
• Domain knowledge modeling
• Practical implementation
GENERAL CONTEXT
Need for User Profiling…
Language and localisation?
… in a Big Data context …
Volume
• Massive amounts of data have to be treated
Velocity
• Data arrive at high speed
Variety
• Data types and formats are heterogeneous
Veracity
• Data are not always sound and have to be verified
Value
• Data have an inherent value that the application can exploit
… for Digital Advertising
RELATED WORK
Methods for Web advertising
Re-targeting
• Re-confront the user with:
• Previously visited content
• Very similar content
Web usage mining
• Unite profiling efforts across pages
• Support dynamic structural changes of a Web site in order to adapt advertisements to the active user
Big Data analysis
Machine Learning & Statistics
• “Richness of data and challenges in their predictive analysis and forecasting”
• Stable and scalable techniques:
• SVMs, regression trees, neural networks
• No semantics-based techniques yet…
Problems
• Underlying data structures
• Explainability of results
• Context integration
Ex.: Keyword-based analysis
Problems with keywords 1
Problems with keywords 2
Other issues
Data: integration and interpretation
• Discover new information sources for knowledge augmentation
• Semantic disambiguation for keywords in a given context
Understanding: unstructured, heterogeneous resources
• Semantic interpretation and integration of such contents
Performance: more and more data…
• … less and less time for analysis and processing
OUR APPROACH
Bridge using SW technologies
Modelling domain knowledge
• Model expert knowledge
• Identify relevant core concepts and their relations
• Discard redundant/surplus information
Semantic qualification of user data
• Semantic disambiguation
• Find real-world concepts, not keywords
• Exploit additional information sources
• DBpedia, WordNet, Freebase
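Finding real-world concepts rather than keywords can be illustrated with a small context-overlap sketch. The candidate concepts, their context terms and the `disambiguate` function are hypothetical; a real system would presumably draw candidates from a source such as DBpedia or WordNet rather than from a hard-coded table.

```python
# Hypothetical candidate concepts for the ambiguous keyword "jaguar",
# each with a few context terms (illustrative data, not from a real knowledge base).
CANDIDATES = {
    "jaguar": {
        "Jaguar_(animal)": {"cat", "jungle", "predator", "wildlife"},
        "Jaguar_Cars": {"car", "engine", "luxury", "dealer"},
    }
}


def disambiguate(keyword, page_terms):
    """Return the candidate concept whose context shares most terms with the page."""
    candidates = CANDIDATES.get(keyword.lower(), {})
    if not candidates:
        return None
    terms = {t.lower() for t in page_terms}
    # Score each concept by the number of shared context terms;
    # sorting the names first makes tie-breaking deterministic.
    return max(sorted(candidates), key=lambda c: len(candidates[c] & terms))


print(disambiguate("jaguar", ["new", "luxury", "car", "dealer"]))
```

The same keyword resolves to a different concept when the surrounding page terms change, which is the behaviour a pure keyword match cannot provide.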
Domain Knowledge Modelling
Domain specialisation 1
Domain specialisation 2
Sporty Mom
User(?user), Universe:Sport(?u1), hasLongTermInterestIn(?user, ?u1), hasChildren(?user), isAdult(?user) -> belongsToSegment(?user, "SportyMom")
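To make the rule above concrete, the sketch below evaluates the same conditions in plain Python. The `User` dataclass and its field names are assumptions chosen to mirror the rule's atoms; the presented system expresses this in SWRL over an ontology, not in Python.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Field names mirror the atoms of the rule; they are illustrative assumptions.
    is_adult: bool = False
    has_children: bool = False
    long_term_interests: set = field(default_factory=set)  # universe names


def segments_for(user):
    """Return the segments whose rule body is satisfied by the user."""
    segments = set()
    # Body of the rule: adult, has children, long-term interest in the Sport universe.
    if user.is_adult and user.has_children and "Sport" in user.long_term_interests:
        segments.add("SportyMom")
    return segments


user = User(is_adult=True, has_children=True, long_term_interests={"Sport", "Cooking"})
print(segments_for(user))
```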
Uncertainty handling
[Figure: example weights 0.65 and 0.2]
Practical implementation 1
[Diagram: WP Semantic Qualification Process. Labels: Synchronous, Asynchronous, Population, Qualification, Knowledge base, Web usage data, Cookies log, Population, Profile characterization, Inference, User targeting]
Practical implementation 2
Weighted user profiling
• SWRL rules on weighted classified URLs + navigation information (number of hits, time spent)
• Weighted marketing segment membership
URL weighted classification
• Lemma filtering
• Concept extraction (semantic annotation)
• Universe similarity computation
Web navigation logs parsing
• URL + navigation information extraction
Example output: Sporty Mom 80%, Geek 20%
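The stages on this slide can be sketched end to end, from log parsing up to weighted segment membership. Everything specific here is invented for illustration: the tab-separated log format, the URL-to-universe weights, the universe-to-segment mapping and the choice to weight by hit count only. In the presented system the last step is performed by SWRL rules.

```python
from collections import defaultdict

# Assumed log format: "url<TAB>hits<TAB>seconds_spent" (hypothetical).
LOG = """\
example.org/running-shoes\t5\t120
example.org/kids-football\t3\t90
example.org/graphics-cards\t2\t60
"""

# Hypothetical output of URL weighted classification: universe weights per URL.
URL_UNIVERSES = {
    "example.org/running-shoes": {"Sport": 1.0},
    "example.org/kids-football": {"Sport": 0.7, "Family": 0.3},
    "example.org/graphics-cards": {"HighTech": 1.0},
}

# Hypothetical mapping from universes to marketing segments.
SEGMENT_OF = {"Sport": "Sporty Mom", "Family": "Sporty Mom", "HighTech": "Geek"}


def parse_log(text):
    """Yield (url, hits, seconds) tuples from the navigation log."""
    for line in text.splitlines():
        url, hits, seconds = line.split("\t")
        yield url, int(hits), int(seconds)


def segment_weights(log_text):
    """Weight each segment by hits on URLs of its universes, normalised to sum to 1.

    Time spent is parsed but ignored here to keep the sketch short.
    """
    scores = defaultdict(float)
    for url, hits, _seconds in parse_log(log_text):
        for universe, weight in URL_UNIVERSES.get(url, {}).items():
            scores[SEGMENT_OF[universe]] += hits * weight
    total = sum(scores.values())
    return {seg: score / total for seg, score in scores.items()} if total else {}


for segment, weight in segment_weights(LOG).items():
    print(f"{segment}: {weight:.0%}")
```

Normalising the scores gives each user a distribution over segments rather than a single label, which matches the weighted membership shown on the slide.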
Profiling customization
Knowledge base browsing
Drag-and-drop SWRL rule creation
Corresponding BIDs display and download
CONCLUSION & FUTURE WORK
Open questions
Fine tuning
• Precise links to external knowledge bases
• Semantic distance for mapping webpages to universes
• Usage of contextual data (time, location, user agent) in SWRL rules
Testing and extension
• First benchmarks on partner data:
• Real web navigation logs – French websites
• Testing on open datasets (Reuters, Leipzig University)
• Enable scientific comparison + extend to English
• Integration with the Stardog graph database
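One simple candidate for the semantic distance between a webpage and a universe is cosine distance over concept-count vectors. Nothing in the talk says this is the measure used, so the sketch below is only an assumption, and the universes and their concept counts are made up.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical universes described by concept counts.
UNIVERSES = {
    "Sport": Counter({"football": 3, "running": 2, "training": 1}),
    "HighTech": Counter({"smartphone": 3, "laptop": 2, "processor": 1}),
}


def closest_universe(page_concepts):
    """Return the universe at the smallest cosine distance, plus all distances."""
    page = Counter(page_concepts)
    distances = {
        name: 1.0 - cosine_similarity(page, vec) for name, vec in UNIVERSES.items()
    }
    return min(distances, key=distances.get), distances


name, distances = closest_universe(["running", "training", "shoes"])
print(name, distances)
```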
Summary
Presented system
• Functional profiling system based on users' web navigation that aims to predict future consumer decisions
• Highly customized set of classes to summarize users' interests as expressed in navigation logs
Next steps
• Extend the rule base to enable more expressive segment design and creation
• Extend the ontology with external repositories for more interesting inferences
Take away message
• Profiling for digital advertising is a necessary but not easy task
• Semantic Web technologies are an answer to related problems
LE2I – UMR CNRS 6306 – University of Burgundy
Thank you for your attention. Questions?