091104-m hornick-slides-oracle data mining case study
Copyright © 2009 Oracle Corporation
Oracle Data Mining for Text, Clustering, and Classification:
Case Study of a Recommendation Engine
Mark Hornick, Senior Manager, Development
Pablo Tamayo, Consulting MTS
Data Mining Technologies Group
Introduction
Recommendation Engine at Oracle OpenWorld Conference 2008 and 2009
Recommend conference sessions to attendees
Enhance session enrollment application
Use Oracle Data Mining and Oracle Data Miner UI
K-means, Naïve Bayes, Text Mining, Code Generation
Agenda
Recommendation engine scenario
Overview
Technical problem and data
Methodology for OOW ’08 and ’09
Evaluating recommendation quality
New features for OOW ’09
Demonstration
OOW ’08 results and summary
High Level Objectives
Help attendees find relevant sessions
Maximize individual OOW experience / value
Increase session attendance
Technical Objectives and Constraints
Recommend 2009 sessions before any history exists of who registered for any 2009 session
Use no session ratings data from attendees
Recommend sessions by relative preference
Recommend exhibitors and demos for attendees
Identify the top N sessions related to a given session
Use an automated data mining-based solution
Approach
Deduction: query refinement
Users specify what they want to retrieve
Induction: model-based recommendation engine
Recommend sessions most relevant to attendee profile
Improve likelihood of finding sessions of interest
…enhance Schedule Builder tool with Oracle Data Mining-generated session recommendations
Enrollment Application – Schedule Builder
Oracle Data Mining
Automatically sifts through data to
find hidden patterns, discover new insights,
and make predictions
Wide range of capabilities
Predict customer behavior (Classification)
Predict or estimate a value (Regression)
Group similar documents (Clustering and Text Mining)
Identify factors that determine an outcome (Attribute Importance)
Find profiles of targeted people or items (Decision Trees)
Determine important relationships and “market baskets” (Associations)
Extract higher-level text features (Feature Extraction)
Find fraud or “rare events” (Anomaly Detection)
…and others
Oracle Data Miner user interface supporting guided analytics
Approach – 30,000 ft.
2008 data (sessions, attendees, attendance) → Model Build
2009 data (sessions, attendees) → Model Apply
New attendee registers and completes survey
Ranked session recommendations for each attendee
Approach – 30,000 ft. (continued)
Attendee logs into Schedule Builder
Ranked sessions retrieved from the ranked session recommendations for attendees
2009 session recommendations filtered by user criteria
Success Metrics
Conversion rate
% of attendees who used at least 1 recommendation
Enrollment vs. actual attendance
Test Metrics
Enrichment curve
Global measure of merit
Conference Session Recommendation Problem
Sessions are single use
No two are exactly alike from one conference to the next
Sessions have no history and no future
Don’t know who will attend a given session until after the session
No rating information available; attendance only
Infer preferences using higher-level projections
Session themes
Attendee profiles
Conference Data: OOW ’08
Sessions (1850+)
Title, abstract, track(s)
Attendees (41700+)
Survey questions, position, product usage
Attendance (206700+)
Who attended which sessions
Attendee Interests from OOW ’08 registration survey
Applications
Fusion
Agile
EBS
Hyperion
PeopleSoft
Siebel
JD Edwards
On Demand
App Integration Architecture
Development and Management
Technology
Business Intelligence
Security
SOA, BPM, Web Services, App Server
Content Management, Collaboration, Web 2.0
Predictive Analytics, Data Mining
Database
Enterprise Management
Identity Management
Warehousing
Performance / Scalability, GRID / RAC
High Availability
Middleware
Product Area
Customer Relationship Management
Governance, Risk, and Compliance
Master Data Management
Fulfillment (order management / logistics)
Supply Chain Management / Planning
Human Capital Management
Procurement
Project Management
Business Intelligence
Development
.Net
Database
Java
Fusion Development
Service-Oriented Architecture
Tools Development and Management
Product Lifecycle Management
Asset Lifecycle Management
Enterprise Performance Management
Financial Management
Strategy
Oracle Services
Oracle Consulting
Oracle Support
Oracle University
Oracle Linux Support
Automotive
Chemicals
Communications
Consumer Goods
Natural Resources
Oil and Gas
Professional Services
Public Sector
Retail
Travel and Transportation
Education and Research
Engineering, Construction and Real Estate
Financial Services
Healthcare
High Tech
Industrial Manufacturing
Life Sciences
Media and Entertainment
Industry
…and others
Oracle Advanced Customer Services
Oracle On Demand
BEA
Primavera
Data Preparation
Sessions
Concatenate relevant columns to facilitate text mining
Attendance
Remove duplicates
Attendees
Synonyms in attribute values, e.g., state = OH and Ohio
Incomplete data, e.g., region = null
Multi-valued attributes requiring parsing,
e.g., user group memberships separated by ';' or '/'
Map data columns between 2008 and 2009
e.g., Advanced customer services split between Apps and Tech
Free form columns, e.g., job title = Vice President, V.P., VP
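The synonym and multi-valued cases above can be sketched in a few lines. This is a minimal illustration, not the deck's actual preparation code: the synonym table `STATE_SYNONYMS` and the helper names are hypothetical, and the `UG_MEM_` prefix simply mirrors the attendee attribute names shown later in the deck.

```python
import re

# Hypothetical synonym table: the slides name the OH/Ohio case but not the full mapping.
STATE_SYNONYMS = {"ohio": "OH", "oh": "OH"}

def normalize_state(value):
    """Map free-form state values such as 'Ohio' or 'OH' to one canonical code."""
    if value is None:
        return None  # incomplete data (e.g. region = null) stays missing
    cleaned = value.strip()
    return STATE_SYNONYMS.get(cleaned.lower(), cleaned)

def split_user_groups(value):
    """Parse a multi-valued field whose items are separated by ';' or '/'."""
    if not value:
        return []
    return [part.strip() for part in re.split(r"[;/]", value) if part.strip()]

def user_group_flags(value, known_groups):
    """Expand the parsed list into one 0/1 column per known user group."""
    present = {group.upper() for group in split_user_groups(value)}
    return {f"UG_MEM_{group}": int(group in present) for group in known_groups}

print(user_group_flags("IOUG; OAUG/ODTUG", ["IOUG", "OAUG", "ODTUG", "OHUG"]))
```

One 0/1 column per group keeps the multi-valued field usable by algorithms that expect one value per attribute.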
Free-Form Fields: Job Title Example
create table ATTENDEE09_PREP as
…
case when a.job_title like '%Manager%' then 1 else 0 end job_title_manager,
-- "V.P." sets both flags, so it is encoded the same way as "Vice President"
case when a.job_title like '%President%' or a.job_title like '%V.P.%' then 1 else 0 end job_title_president,
case when a.job_title like '%Vice%' or a.job_title like '%V.P.%' then 1 else 0 end job_title_vice,
…
from ATTENDEE09 a
Methodology
2008 sessions → cluster sessions → 2008 session clusters (themes)
2008 attendees → build a classification model to predict clusters for attendees, then score attendees for each cluster
New 2009 sessions → cluster score vectors
New 2009 attendees → cluster score vector
Vector-multiply each attendee’s cluster scores against each session’s cluster scores for a total-order ranking of recommendations (e.g., .86, .73, .66, …)
Result: ranked session recommendations
Model Building and Scoring Details
Cluster sessions
Concatenate all session-related text
Text Mining data preparation – create text index
Lexer with stemming
Custom “stopword” list
Session S291749
Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.
Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0
Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management
1. Perform Stemming (example)
integrate, account, process, develop, integrate, invoice, account, utilize
2. Remove stopwords
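A minimal sketch of tokenizing and stopword removal, assuming simple alphanumeric tokens. The stopword set is hypothetical: only 'your' and 'oracle' appear in the deck's OOW_STOPLIST. Stemming is deliberately left out here, because in the deck it is done by the Oracle Text lexer.

```python
import re

# Hypothetical stopword list: only 'your' and 'oracle' appear on the slide.
STOPWORDS = {"and", "by", "how", "in", "learn", "see", "the", "this", "to",
             "was", "with", "your", "oracle"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that carry little meaning for distinguishing sessions."""
    return [token for token in tokens if token not in stopwords]

title = "Integrating Oracle Accounts Payable with Oracle Imaging and Process Management"
print(remove_stopwords(tokenize(title)))
# ['integrating', 'accounts', 'payable', 'imaging', 'process', 'management']
```

'oracle' belongs on the list because it appears in almost every OOW session, so it cannot help separate one theme from another.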
Creating a Text Index, Stoplist, Lexer
Using Oracle Text
begin
  -- Lexer that also indexes English word stems
  ctx_ddl.create_preference('oow_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('oow_lexer','index_stems','ENGLISH');
  ctx_ddl.set_attribute('oow_lexer','index_text','true');
  -- Custom stoplist: words too common in session text to separate themes
  ctx_ddl.create_stoplist('oow_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('oow_stoplist', 'your'); /*…*/
  ctx_ddl.add_stopword('oow_stoplist', 'oracle');
end;
/
-- The preferences must exist before the index that references them is created
CREATE INDEX session09_txt_idx
ON session09_txt (session_txt)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS
('LEXER OOW_LEXER
STOPLIST OOW_STOPLIST');
Session Term Scores Example
Integrate .23
Account .04
Payable .26
Imaging .62
Process .09
Management .05
Technology .17
Content .08
Collaboration .43
…
TF-IDF
(term-frequency – inverse document frequency)
A statistical measure of how important a given word is to a document in a corpus
A word’s importance increases proportionally to the number of times it appears in the document, but is offset by the word’s frequency in the corpus
TF-IDF Example: One Way to Compute It
Consider:
A session, S1, whose title and abstract contain 100 words
The word ‘mining’ appears 6 times in S1
Term frequency (TF) for ‘mining’ in S1 is 6/100, or 0.06
Of 1850 sessions, say 25 contain the word ‘mining’
Inverse document frequency (IDF) is calculated as ln(1850 / 25) = ln 74 ≈ 4.3
TF-IDF score for ‘mining’ in S1 is 0.06 × 4.3, or about 0.26
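The same formula, TF = count / length and IDF = ln(N / df), can be checked in code and applied to a whole tokenized corpus. This is a sketch of the textbook variant the example uses. Oracle Text and Oracle Data Mining may weight terms differently internally, and the function names are illustrative.

```python
import math
from collections import Counter

def tf_idf_score(term_count, doc_length, n_docs, docs_with_term):
    """TF-IDF as in the example: (count / length) * ln(N / df)."""
    return (term_count / doc_length) * math.log(n_docs / docs_with_term)

def tf_idf_corpus(docs):
    """Apply the same formula to every term of every tokenized document."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # df counts documents containing the term, not occurrences
    return [
        {term: tf_idf_score(count, len(tokens), n_docs, df[term])
         for term, count in Counter(tokens).items()}
        for tokens in docs
    ]

# The slide's example: 'mining' appears 6 times in a 100-word session,
# and 25 of 1850 sessions contain it.
print(round(tf_idf_score(6, 100, 1850, 25), 2))  # 0.26
```

A term that occurs in every document gets IDF = ln 1 = 0, so it contributes nothing. That is the "offset by frequency in corpus" effect.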
Session Term Scores Example (continued)
Specify the maximum number of terms to represent the entire corpus
Specify the maximum number of terms to represent each document
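The two limits can be illustrated with a small pruning function. This is a sketch under an explicit assumption: the deck does not say how the corpus-level terms are chosen, so ranking them by summed TF-IDF weight is a guess. The defaults of 1000 and 30 are the values the deck uses when building the session model.

```python
from collections import defaultdict

def prune_terms(doc_weights, max_corpus_terms=1000, max_doc_terms=30):
    """Keep at most max_corpus_terms overall and max_doc_terms per document.

    doc_weights: list of {term: tf-idf weight} dicts, one per document.
    Ranking corpus terms by summed weight is an assumption for illustration.
    """
    totals = defaultdict(float)
    for weights in doc_weights:
        for term, weight in weights.items():
            totals[term] += weight
    # Highest total weight first; ties broken alphabetically for determinism.
    ranked = sorted(totals, key=lambda term: (-totals[term], term))
    vocabulary = set(ranked[:max_corpus_terms])
    pruned = []
    for weights in doc_weights:
        kept = [(t, w) for t, w in weights.items() if t in vocabulary]
        kept.sort(key=lambda tw: (-tw[1], tw[0]))
        pruned.append(dict(kept[:max_doc_terms]))
    return pruned

session_terms = {"integrate": .23, "account": .04, "payable": .26, "imaging": .62,
                 "process": .09, "management": .05, "technology": .17,
                 "content": .08, "collaboration": .43}
print(prune_terms([session_terms], max_doc_terms=3))
```

Capping both sizes keeps the document vectors short and sparse before clustering.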
Model Building and Scoring Details
Cluster sessions
Concatenate all session-related text
Text Mining data prep – create text index
Lexer with stemming
Custom stop word list
1000 max terms in corpus
30 max terms per document
Build k-Means model with 20 clusters (themes)
Score 2008 and 2009 sessions to identify theme probabilities
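A plain k-means sketch shows what "cluster sessions into themes, then score sessions against each theme" means mechanically. This is not Oracle Data Mining's k-Means implementation. The code seeds centroids with the first k vectors for determinism. The softmax of negative squared distances in `cluster_probabilities` is an assumed stand-in for ODM's cluster probabilities, not its formula.

```python
import math

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        assignments = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                       for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

def cluster_probabilities(vector, centroids):
    """Soft theme scores from distances (assumed stand-in, not ODM's formula)."""
    weights = [math.exp(-squared_distance(vector, c)) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]

# Toy two-term vectors; real session vectors would hold up to 30 TF-IDF terms.
centroids, assignments = kmeans([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]], k=2)
print(assignments)
```

Scoring a 2009 session against the 2008 centroids yields a vector of theme scores. That is the per-session input to the recommendation score.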
Clustering Results for 2008 Sessions
Theme (Cluster Name) ClusterID Count
INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 103
DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 94
CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 82
PLM-AGILE-PRODUCT-CONTACT-CENTER 23 53
SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 127
INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 148
DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 112
RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 92
ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 66
CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 77
CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 125
HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 62
SOA-BPM-SERVER-APPLICATION-FUSION 32 121
MEETING-SIG-IOUG-DATABASE-APPLICATION 33 33
EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 95
JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 52
TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 76
SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 80
12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 80
OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 69
Model Building and Scoring Details
Classify attendee interests in themes
Build Naïve Bayes model using 2008 attendees
Predict 2009 attendee interest in each of the 20 themes
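Joe's 20 theme probabilities below sum to about 1.6, not 1. Our reading is that each theme is scored separately rather than as a single 20-way distribution. The sketch below follows that reading. It trains a from-scratch categorical Naive Bayes model for one theme with Laplace smoothing, and you would repeat it for each theme. How the 0/1 label is derived from 2008 attendance is not stated in the deck and is an assumption here. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_theme_nb(rows, labels, alpha=1.0):
    """Train a categorical Naive Bayes model for ONE theme.

    rows:   list of {attribute: value} dicts, all with the same attributes
    labels: 1 if the attendee is taken to be interested in the theme, else 0
            (how that label is derived from attendance is an assumption)
    alpha:  Laplace smoothing constant
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value counts
    distinct_values = defaultdict(set)   # attribute -> values seen in training
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            distinct_values[attr].add(value)
    return class_counts, value_counts, distinct_values, alpha

def theme_probability(model, row):
    """P(interested in theme | attendee attributes) via Bayes' rule."""
    class_counts, value_counts, distinct_values, alpha = model
    total = sum(class_counts.values())
    joint = {}
    for label in (0, 1):
        p = (class_counts[label] + alpha) / (total + 2 * alpha)  # smoothed prior
        for attr, value in row.items():
            if attr not in distinct_values:
                continue  # attributes unseen in training carry no evidence
            n_values = len(distinct_values[attr])
            p *= (value_counts[(attr, label)][value] + alpha) / (
                class_counts[label] + alpha * n_values)
        joint[label] = p
    return joint[1] / (joint[0] + joint[1])
```

Training one such model per theme and scoring a new attendee with each yields a vector of theme probabilities like Joe's.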
New 2009 Attendees
“Joe the DBA”
DB_REL_ODB_10G 1
DEV_EN_TEXT_EDITOR 1
DEV_EN_VI 1
GEOGRAPHIC_REGION Americas
INDUSTRY Aerospace
ORACLE_PARTNER Yes
JOB_TITLE_DBA 1
JOB_TITLE_SENIOR 1
ATTEND_ID
COMPANY_REVENUE
DB_REL_ODB_10G
DB_REL_ODB_8I
DB_REL_ODB_9I
DEV_EN_11G_PREVIEW
DEV_EN_BORLAND_JBUILDER
DEV_EN_ECLIPSE
DEV_EN_MS_DOT_NET
DEV_EN_MS_VISUAL_STUDIO
DEV_EN_ORA_APPS_EXPRES
DEV_EN_ORA_FORMS
DEV_EN_ORA_JDEV_10G
DEV_EN_ORA_SQL_DEV
DEV_EN_OTHER
DEV_EN_OTHER_JAVA_IDE
DEV_EN_SQL_EDITORS
DEV_EN_TEXT_EDITOR
DEV_EN_TOAD
DEV_EN_VI
GEOGRAPHIC_REGION
INDUSTRY
ORACLE_PARTNER
ORA_EBS
ORA_JDE
ORA_PS
ORA_SIEBEL
PROFIT_MAGAZINE_SUBSCRIPTION
UG_MEM_APOUC
UG_MEM_EOUC
UG_MEM_HEUG
UG_MEM_IOUG
UG_MEM_OAUG
UG_MEM_ODTUG
UG_MEM_OHUG
UG_MEM_QIUG
UG_INFO_APOUC
UG_INFO_EOUC
UG_INFO_HEUG
UG_INFO_IOUG
UG_INFO_OAUG
UG_INFO_ODTUG
UG_INFO_OHUG
UG_INFO_QIUG
UG_INFO_DO_NOT_SEND_ORA_INFO
JOB_TITLE_MANAGER
JOB_TITLE_PARTNER
JOB_TITLE_PROJECT_LEAD
JOB_TITLE_MARKETING
JOB_TITLE_PRESIDENT
JOB_TITLE_VICE
JOB_TITLE_DIRECTOR
JOB_TITLE_ARCHITECT
JOB_TITLE_ANALYST
JOB_TITLE_DBA
JOB_TITLE_DEVELOPER
JOB_TITLE_SALES
JOB_TITLE_PROD_MGR
JOB_TITLE_CHIEF_OFFICER
JOB_TITLE_CONSULTANT
JOB_TITLE_SENIOR
JOB_TITLE_STUDENT
Theme (Cluster Name) ClusterID Probability
INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0005
DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.3997
CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.0002
PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0005
SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0005
INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.2190
DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.4245
RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.3010
ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0502
CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0009
CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0098
HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0031
SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0000
MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0038
EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0031
JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0260
TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0188
SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0278
12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0075
OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0994
Predict themes (clusters) for “Joe” from attendee attributes
How Does This Session (S291749) Rank for Joe?
Cluster Probabilities for Session S291749
Theme (Cluster Name) ClusterID Probability
INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0023
DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.0021
CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.9534
PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0020
SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0020
INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.0027
DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.0018
RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.0032
ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0018
CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0022
CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0026
HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0049
SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0037
MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0015
EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0016
JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0016
TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0027
SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0022
12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0037
OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0019
Computing This Session’s Score Specifically for Joe
Theme (Cluster Name) ClusterID Joe's Cluster Probability Session S291749 Cluster Probability Product
INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0005 0.0023 0.000001
DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.3997 0.0021 0.000848
CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.0002 0.9534 0.000216
PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0005 0.0020 0.000001
SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0005 0.0020 0.000001
INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.2190 0.0027 0.000587
DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.4245 0.0018 0.000780
RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.3010 0.0032 0.000960
ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0502 0.0018 0.000088
CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0009 0.0022 0.000002
CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0098 0.0026 0.000025
HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0031 0.0049 0.000015
SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0000 0.0037 0.000000
MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0038 0.0015 0.000006
EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0031 0.0016 0.000005
JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0260 0.0016 0.000041
TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0188 0.0027 0.000051
SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0278 0.0022 0.000062
12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0075 0.0037 0.000028
OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0994 0.0019 0.000191
SCORE (sum of the products): 0.003908
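The score is a dot product of Joe's theme vector with the session's theme vector. The values below are copied from the two cluster-probability tables above. With these rounded four-decimal inputs the sum is about 0.00387 (0.0039 when rounded), slightly below the slide's 0.003908. The slide's per-row products (for example 0.000848 for cluster 19, versus 0.3997 × 0.0021 ≈ 0.000839) suggest they were computed from unrounded probabilities.

```python
# Cluster (theme) probabilities copied from the two tables above, keyed by ClusterID.
joe = {18: 0.0005, 19: 0.3997, 20: 0.0002, 23: 0.0005, 24: 0.0005,
       25: 0.2190, 26: 0.4245, 27: 0.3010, 28: 0.0502, 29: 0.0009,
       30: 0.0098, 31: 0.0031, 32: 0.0000, 33: 0.0038, 34: 0.0031,
       35: 0.0260, 36: 0.0188, 37: 0.0278, 38: 0.0075, 39: 0.0994}
session_s291749 = {18: 0.0023, 19: 0.0021, 20: 0.9534, 23: 0.0020, 24: 0.0020,
                   25: 0.0027, 26: 0.0018, 27: 0.0032, 28: 0.0018, 29: 0.0022,
                   30: 0.0026, 31: 0.0049, 32: 0.0037, 33: 0.0015, 34: 0.0016,
                   35: 0.0016, 36: 0.0027, 37: 0.0022, 38: 0.0037, 39: 0.0019}

def recommendation_score(attendee, session):
    """Dot product: sum over themes of attendee probability x session probability."""
    return sum(p * session.get(cluster, 0.0) for cluster, p in attendee.items())

print(round(recommendation_score(joe, session_s291749), 4))  # 0.0039
```

The session is almost entirely in theme 20 (content management), where Joe's probability is near zero, so its score stays low for him.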
Recommendation Score Query
select attend_id, session_id, score
from (
select a.attend_id, s.session_id,
sum(a.probability * s.probability) score
from SESSION_TXT09_SCORES_T20 s,
ATTENDEE09_SCORES_T20 a
where a.prediction= s.cluster_id
group by a.attend_id, s.session_id
)
order by attend_id, score desc
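The query above joins attendee and session scores on the cluster, sums the products per (attendee, session) pair, and orders by score. The same logic in plain Python helps when checking results outside the database. The row layout mirrors the query's columns. The sample data in the usage note is hypothetical.

```python
from collections import defaultdict

def rank_sessions(attendee_scores, session_scores):
    """Mirror the SQL: join on cluster, sum products, order by score desc.

    attendee_scores: list of (attend_id, cluster_id, probability)
    session_scores:  list of (session_id, cluster_id, probability)
    """
    by_cluster = defaultdict(list)
    for session_id, cluster_id, prob in session_scores:
        by_cluster[cluster_id].append((session_id, prob))
    totals = defaultdict(float)
    for attend_id, cluster_id, a_prob in attendee_scores:
        # where a.prediction = s.cluster_id
        for session_id, s_prob in by_cluster[cluster_id]:
            # sum(a.probability * s.probability) ... group by attend_id, session_id
            totals[(attend_id, session_id)] += a_prob * s_prob
    ranked = defaultdict(list)
    for (attend_id, session_id), score in totals.items():
        ranked[attend_id].append((session_id, score))
    # order by attend_id, score desc (ties broken by session id)
    return {a: sorted(rows, key=lambda r: (-r[1], r[0])) for a, rows in sorted(ranked.items())}
```

For example, an attendee scored 0.8 / 0.2 on two themes ranks a session scored 0.9 / 0.1 (score 0.74) above one scored 0.1 / 0.9 (score 0.26).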
Evaluating Recommendations: Producing Training (Build) and Test Datasets
’08 session data and ’08 attendee data are each split into build and test datasets
Build the models using the build datasets; test the models using the test datasets
Typical space for recommendations: recommend the same sessions to new attendees
Projection Mining Space: recommend new sessions to new attendees
Cross-sell / Up-sell Space: recommend new sessions to the same attendees
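For the Projection Mining Space, both axes are split, so the test pairs combine attendees and sessions that the models never saw. A minimal sketch follows. The 30% test fraction, the seed, and the function names are hypothetical. Attendance pairs that cross the build/test boundary are simply discarded here.

```python
import random

def split_ids(ids, test_fraction, rng):
    """Shuffle ids and cut them into (build, test) lists."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def projection_mining_split(attendees, sessions, attendance, test_fraction=0.3, seed=0):
    """Split attendees AND sessions, so test pairs are new sessions x new attendees.

    attendance: list of (attendee_id, session_id) pairs ("hits").
    Returns (build_hits, test_hits, test_attendees, test_sessions).
    """
    rng = random.Random(seed)
    build_att, test_att = split_ids(attendees, test_fraction, rng)
    build_ses, test_ses = split_ids(sessions, test_fraction, rng)
    build_a, build_s = set(build_att), set(build_ses)
    test_a, test_s = set(test_att), set(test_ses)
    build_hits = [(a, s) for a, s in attendance if a in build_a and s in build_s]
    test_hits = [(a, s) for a, s in attendance if a in test_a and s in test_s]
    return build_hits, test_hits, test_att, test_ses
```

The cross-sell space would instead split only the sessions and keep the same attendees on both sides.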
Evaluating Results:
Session Recommendation Curve: model scores as a function of rank
Linear behavior of recommendations
Threshold separating high- from low-confidence recommendations
Represents the location of “hits” (attendee attended session); dot = scored session
Enrichment Curve: running calculation where enrichment is the maximum deviation from 0
Represents the location of “hits”
Point of maximum enrichment
[Figure: recommendation curves (model score vs. model-ranked sessions) and enrichment curves (recommendation enrichment score vs. model-ranked sessions) for three attendees]
Attendee W1134872: NE = 1.07, Lift = 1.55, ROC = 0.51
Attendee W1144260: NE = 1.63, Lift = 2.47, ROC = 0.71
Attendee W1152645: NE = 2.88, Lift = 3.07, ROC = 0.79
Global Measure of Merit
[Figure: distribution P(NE) of normalized enrichment (NE), PM model vs. random model]
Normalized Enrichment
Random recommendations obtain an enrichment score of 1
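The deck describes enrichment only as a running calculation whose score is the maximum deviation from 0. The sketch below assumes the unweighted Kolmogorov–Smirnov-style running sum used in gene set enrichment analysis. It also assumes NE divides by the mean score of randomly ordered lists, which makes random recommendations come out near 1 by construction. Both are assumptions about the exact definitions.

```python
import random

def enrichment_score(ranked_hits):
    """Running-sum enrichment over sessions in model rank order.

    ranked_hits: booleans, True where the attendee attended that session.
    The running sum rises by 1/n_hits at each hit and falls by 1/n_misses
    at each miss; the score is its (signed) maximum deviation from 0.
    """
    n_hits = sum(ranked_hits)
    n_misses = len(ranked_hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        return 0.0
    running = best = 0.0
    for hit in ranked_hits:
        running += 1.0 / n_hits if hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best

def normalized_enrichment(ranked_hits, n_permutations=1000, seed=0):
    """Divide |score| by the mean |score| of random orderings, so random is about 1."""
    rng = random.Random(seed)
    shuffled = list(ranked_hits)
    null_total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_total += abs(enrichment_score(shuffled))
    null_mean = null_total / n_permutations
    if null_mean == 0.0:
        return 0.0  # no hits or no misses: enrichment is undefined
    return abs(enrichment_score(ranked_hits)) / null_mean
```

When all attended sessions sit at the top of the model's ranking, the running sum reaches 1.0, and NE is well above 1. Scattered hits keep both values low.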
Recommending Exhibitors and Demos
Use clustering model from session data
Score exhibitors and demo text against 20 themes
Use existing attendee theme scores to compute recommendation scores for each exhibitor and demo
[Figure: new 2009 attendees; 2009 exhibitors and demos]
Computing Related Sessions
Data preparation
Focus on tracks, tags, categories
Tokenize targeted terms from title and abstract fields
E.g., “Oracle Data Mining” → “OracleDataMining”
Cluster sessions into 200 clusters using K-Means
Multiply cluster score vectors for similarity score
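Collapsing a targeted phrase into one token keeps it from being split into generic words such as "data". A minimal sketch with regular expressions follows. The term list is hypothetical; the deck gives only the “Oracle Data Mining” example. Longer phrases are replaced first so they win over shorter phrases they contain.

```python
import re

# Hypothetical targeted terms; the slides give only "Oracle Data Mining".
TARGETED_TERMS = ["Oracle Data Mining", "Data Mining"]

def tokenize_targeted_terms(text, terms=TARGETED_TERMS):
    """Collapse each targeted multi-word term into a single token."""
    # Longest terms first, so "Oracle Data Mining" is not split by "Data Mining".
    for term in sorted(terms, key=len, reverse=True):
        pattern = r"\b" + r"\s+".join(map(re.escape, term.split())) + r"\b"
        text = re.sub(pattern, term.replace(" ", ""), text, flags=re.IGNORECASE)
    return text

print(tokenize_targeted_terms("Intro to Oracle Data Mining and data mining basics"))
# Intro to OracleDataMining and DataMining basics
```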
Computing Related Sessions (continued)
Cluster 2009 sessions into 2009 themes (200 clusters), then score each session against each theme (cluster)
Vector-multiply each session’s cluster scores against all other sessions’ cluster scores for a total-order ranking of related sessions (e.g., .95, .81, .67, …)
Result: ranked related sessions
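Related-session ranking uses the same dot product as attendee recommendations, but between two session vectors. A short sketch follows. The vectors in the usage note are hypothetical three-theme examples, not the deck's 200-cluster scores.

```python
def related_sessions(target_id, session_vectors, top_n=5):
    """Rank other sessions by the dot product of their cluster-score vectors."""
    target = session_vectors[target_id]
    scores = [
        (other_id, sum(a * b for a, b in zip(target, vector)))
        for other_id, vector in session_vectors.items()
        if other_id != target_id  # a session is not related to itself
    ]
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]
```

With S1 = [0.9, 0.1, 0.0], S2 = [0.8, 0.2, 0.0], S3 = [0.0, 0.1, 0.9], and S4 = [0.5, 0.5, 0.0], the sessions related to S1 rank S2 (0.74), then S4 (0.50), then S3 (0.01).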
OOW ’08 Recommendation Engine Results
Distinct Schedule Builder visitors: 15667
Distinct visitors signup: 3266
Distinct visitors attended: 1775
Signup conversion rate: 20.8% (3266 / 15667)
Attended conversion rate: 11.3% (1775 / 15667)
Conversion rate: percentage of attendees who used at least 1 recommendation
Conversion Rates in Other Domains (chart, circa 2004), with OOW for comparison:
OOW attended sessions: 11.3%
OOW signup sessions: 20.8%
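Both conversion rates are ratios over distinct Schedule Builder visitors, and they can be checked against the per-bucket counts in the detailed results. In this sketch, reading "signup" as "signed up for at least one recommended session" is our interpretation of the slide's wording. Computed directly, 3266 / 15667 is about 20.8%.

```python
visitors = 15667   # distinct Schedule Builder visitors
signed_up = 3266   # distinct visitors who signed up (read as: for >= 1 recommendation)
attended = 1775    # distinct visitors who attended >= 1 recommended session

def conversion_rate(converted, total):
    """Percentage of total that converted."""
    return 100.0 * converted / total

print(f"Signup conversion: {conversion_rate(signed_up, visitors):.1f}%")    # 20.8%
print(f"Attended conversion: {conversion_rate(attended, visitors):.1f}%")   # 11.3%

# The detail slide's buckets (exactly 1, exactly 2, 3 or more) add up to the totals.
assert 1768 + 820 + 678 == signed_up
assert 1246 + 382 + 147 == attended
```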
OOW ’08 Recommendation Engine Results: Detail
Recommendations Signup
1768 attendees (11.3%) selected exactly 1
820 (5.2%) selected 2 recommendations
678 attendees (4.3%) selected 3 or more
32 attendees selected between 8 and 10
Actually Attended
1246 attendees (8%) attended exactly 1
382 (2.4%) attended 2 recommended sessions
147 attendees (0.9%) attended 3 or more
23 attendees attended between 5 and 9
[Figure: Recommendations: Selected vs. Attended; bar chart of selected count and attended count (0 to 2000) for exactly 1, exactly 2, and 3 or more recommended sessions]
Summary
Oracle Data Mining provides a robust platform for Text Mining and building a Recommendation Engine
Oracle Data Mining with Oracle Data Miner code generation facilitated deployment of mining solution
Recommendation evaluation techniques show the models were able to predict sessions of interest
OOW conversion rates show that attendees perceived the session recommendations as useful
For More Information
search.oracle.com
or
oracle.com
www.oracle.com/technology/products/bi/odm/index.html
Oracle Data Mining
The preceding is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.