AIRG Presentation
A presentation I gave to the WPI Artificial Intelligence Research Group in preparation for my Master's thesis defense.
Mandorvol Browser
Worcester Polytechnic Institute
Kevin Menard
December 8, 2005
Problem
Lots of information split over many documents
Search engines are now a necessity
Search engines are “dumb”
  Document relevance is a mathematical formula, not a user rating
  Easy to fool
  Hard to find good info if it is in a “non-conforming” format
Users know which pages are relevant but can’t be bothered to rate them
Solution
Use implicit user behavior in place of explicit feedback ratings
WPI Curious Browser
  Discovered a set of implicit indicators that correlated highly with feedback values
Microsoft Curious Browser
  Built upon WPI work and collected user feedback
  Used explicit & implicit data to train a classifier that predicts web page relevance
Our Work
Investigate the value of “voluntary” data
  Previous work only used “mandatory” data
Mandorvol Browser
  Extension of MS Curious Browser
  Collects data using both voluntary & mandatory feedback mechanisms
  Collects data in controlled & uncontrolled scenarios
Mandorvol Browser
Uncontrolled scenario
  User simply searches for anything on Google
Controlled scenario
  User is given Excel tasks to complete
  Most people have experience with Excel, but it is complex enough that tasks can be chosen that will require help
  Search is limited to Excel help assets
  Search is performed via a custom Java web application that provides a Google-like interface to Excel help assets
Informal Hypotheses
H1: Quality of voluntary data will be higher
  Users will only offer feedback if they want to
  Good for classifiers
H2: Quantity of mandatory data will be greater
  Users must provide feedback for each page
  Also good for classifiers
H3: Quantity of controlled data will be lower
  Users completing tasks don’t want to be bothered
Timeline
2004:
  Development: Aug. – Nov.
  Pilot Studies: Nov.
  Dev, Testing, Deployment: Dec. – Feb.
2005:
  Major Study: March – April
  Rudimentary Analysis: April – May
  Detailed Analysis: Sep. – Dec.
2006:
  Conclusions & Thesis Write-up: Jan. – March
Pilot Studies
Goals
  Test voluntary feedback mechanism
  Test tasks for controlled situations
Key observations
  Feedback band location matters
    Horizontal VS vertical
    “Banner ad” effect
    Vertical band with bright colors
  Double evaluation
  Task-oriented users don’t provide feedback once they solve their problem
Study
Ran for two months in two phases
161 total users across four experiment types
  Mandatory Controlled (28)
  Mandatory Uncontrolled (45)
  Voluntary Controlled (48)
  Voluntary Uncontrolled (40)

Share of all 161 users by experiment type:
              Controlled   Uncontrolled
Mandatory     17.39%       27.95%
Voluntary     29.81%       24.84%
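The percentages in this table appear to be each experiment type's share of the 161 users (for example, 28 / 161 ≈ 17.39%). A minimal Python check of that reading, using only the participant counts listed on this slide:

```python
# Participant counts per experiment type, taken from the Study slide.
USERS = {
    ("Mandatory", "Controlled"): 28,
    ("Mandatory", "Uncontrolled"): 45,
    ("Voluntary", "Controlled"): 48,
    ("Voluntary", "Uncontrolled"): 40,
}


def user_shares(counts):
    """Return each experiment type's share of all users, as a percentage."""
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


if __name__ == "__main__":
    for (mechanism, scenario), share in user_shares(USERS).items():
        print(f"{mechanism} {scenario}: {share:.2f}%")
```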
Feedback
Feedback Ratio
  Amount of feedback / # of search results
Feedback Opportunities
  Amount of feedback / # of opportunities to give feedback

Feedback Opportunities (feedback / opportunities):
              Controlled    Uncontrolled
Mandatory     0.946043165   0.977690289
Voluntary     0.745762712   0.918149466

Feedback Ratio (feedback / search results):
              Controlled    Uncontrolled
Mandatory     0.626190476   0.573518091
Voluntary     0.408668731   0.606345476
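The two feedback measures can be sketched as below. This is a minimal illustration, assuming a hypothetical log in which each feedback opportunity is recorded with an optional rating; the `Opportunity` record, the function names, and the example URLs are not from the Mandorvol Browser. One consequence worth noting: the opportunity-based measure equals 1 minus the share of opportunities with no feedback, which matches the first table (e.g., 1 - 0.0540 = 0.946 for Mandatory Controlled, using the “No Feedback” column of the not-normalized distribution).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Opportunity:
    """One chance the browser gave the user to rate a page (hypothetical record)."""
    url: str
    feedback: Optional[str]  # e.g. "Satisfied"; None if the user gave no rating


def _feedback_count(opportunities: List[Opportunity]) -> int:
    # Count the opportunities where the user actually gave a rating.
    return sum(1 for o in opportunities if o.feedback is not None)


def feedback_ratio(opportunities: List[Opportunity], num_search_results: int) -> float:
    """Amount of feedback / # of search results."""
    if num_search_results == 0:
        raise ValueError("no search results")
    return _feedback_count(opportunities) / num_search_results


def feedback_opportunity_rate(opportunities: List[Opportunity]) -> float:
    """Amount of feedback / # of opportunities to give feedback."""
    if not opportunities:
        raise ValueError("no feedback opportunities")
    return _feedback_count(opportunities) / len(opportunities)


if __name__ == "__main__":
    # Illustrative log, not study data.
    log = [
        Opportunity("http://example.com/a", "Satisfied"),
        Opportunity("http://example.com/b", None),
        Opportunity("http://example.com/c", "Dissatisfied"),
        Opportunity("http://example.com/d", "Partially Satisfied"),
    ]
    print(feedback_opportunity_rate(log))              # 3 of 4 rated -> 0.75
    print(feedback_ratio(log, num_search_results=10))  # 3 ratings / 10 results -> 0.3
```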
Feedback Distribution
Normalized: “No feedback” not considered
                         Satisfied   Partially Satisfied   Dissatisfied
Mandatory Controlled     29.66%      23.57%                46.77%
Mandatory Uncontrolled   46.85%      22.28%                30.87%
Voluntary Controlled     50.76%      16.67%                32.58%
Voluntary Uncontrolled   49.42%      21.71%                28.88%
Feedback Distribution (Cont.)
Not normalized: “No feedback” values included
                         Satisfied   Partially Satisfied   Dissatisfied   No Feedback
Mandatory Controlled     28.06%      22.30%                44.24%         5.40%
Mandatory Uncontrolled   45.80%      21.78%                30.18%         2.23%
Voluntary Controlled     37.85%      12.43%                24.29%         25.42%
Voluntary Uncontrolled   45.37%      19.93%                26.51%         8.19%
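The normalized table can be derived from the not-normalized one by dropping the “No Feedback” share and rescaling the remaining three categories so they sum to 100%. A small Python sketch of that step, applied to the Mandatory Controlled row:

```python
def normalize_excluding_no_feedback(distribution):
    """Drop the 'No Feedback' share and rescale the rest to sum to 100%."""
    rated = {k: v for k, v in distribution.items() if k != "No Feedback"}
    total = sum(rated.values())
    return {k: 100.0 * v / total for k, v in rated.items()}


# Mandatory Controlled row of the not-normalized table (percentages).
MANDATORY_CONTROLLED = {
    "Satisfied": 28.06,
    "Partially Satisfied": 22.30,
    "Dissatisfied": 44.24,
    "No Feedback": 5.40,
}

if __name__ == "__main__":
    for label, pct in normalize_excluding_no_feedback(MANDATORY_CONTROLLED).items():
        print(f"{label}: {pct:.2f}%")
    # Satisfied: 29.66%, Partially Satisfied: 23.57%, Dissatisfied: 46.77%
```

Applied to the already-rounded percentages, the result can differ from the slide by 0.01 (Voluntary Controlled's Dissatisfied share comes out as 32.57% rather than 32.58%), presumably because the slide's figures were computed from raw counts.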
High-level Analysis
A prominent voluntary feedback mechanism yields a high quantity of feedback
Data could be skewed by the nature of the study
  Users are more apt to give feedback when searching leisurely in a known domain
    E.g., I search for “drums” and I know what to expect in the search results list, so I can better evaluate the results
  Users are more apt to give Satisfied feedback when searching leisurely
In-depth Analysis
What? Build classifiers to investigate data qualities
How? Weka, an open-source machine learning tool
Why?
  Similar to previous work, which provides validation
  Relates back to the original problem of improving search results
Data Preparation
Data pulled from the DB and turned into a Weka file
15 data attributes
  Experiment type, behavior type, behavior URL length, dwell time, page count in session, page order in search result list, page order in all search result lists, search result URL length, link text length, page description length, script length, file size, image count, exit type, feedback value
Allowed J48 to handle continuous data
Allowed J48 to handle missing values
  Missing values occur only in script length, file size, image count, & exit type
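Weka reads data in its ARFF text format, in which each attribute is declared as NUMERIC or as a set of nominal values and a missing value is written as `?`; that `?` marker is what lets J48 handle the missing script length, file size, image count, and exit type values. The sketch below is hypothetical: it writes only a subset of the 15 attributes, and the attribute names, the experiment-type codes, the exit-type values, and the space-free feedback labels are illustrative, not the study's actual schema.

```python
# Hypothetical subset of the 15 attributes; names and nominal values are illustrative.
ATTRIBUTES = [
    ("experiment_type", ["MC", "MU", "VC", "VU"]),
    ("dwell_time", "NUMERIC"),
    ("file_size", "NUMERIC"),                  # may be missing
    ("exit_type", ["back", "link", "close"]),  # hypothetical values; may be missing
    ("feedback", ["Satisfied", "PartiallySatisfied", "Dissatisfied"]),
]


def to_arff(relation, attributes, rows):
    """Serialize rows (lists of values, None = missing) as an ARFF document."""
    lines = [f"@RELATION {relation}", ""]
    for name, kind in attributes:
        # A string kind (e.g. NUMERIC) is written as-is; a list becomes a nominal set.
        spec = kind if isinstance(kind, str) else "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {spec}")
    lines += ["", "@DATA"]
    for row in rows:
        # Weka reads '?' as a missing value.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff("mandorvol", ATTRIBUTES, [["VC", 12.5, None, None, "Satisfied"]]))
```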
Classifier Type
Why J48?
  Easy-to-read rules are important
  Interested in causal relationships
  Performs well
Graph of various classifiers:
[Figure: classification accuracy (%, axis 40–80) by data set (0–10) for rules.ZeroR, rules.OneR '-B 6', and trees.J48 with options '-C 0.25 -M 2', '-C 0.25 -B -M 2', '-R -N 3 -Q 1 -M 2', '-R -N 3 -Q 1 -B -M 2', '-S -C 0.25 -M 2', '-S -C 0.25 -B -M 2', '-S -R -N 3 -Q 1 -B -M 2', '-U -M 2']
Optimizing Trees
Tree Size VS Accuracy
  Occam’s Razor
    Fewer rules create more general trees
  Classification accuracy
    Too few rules may not accurately model the domain
  Pragmatism
    Larger trees take longer to build and use
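One hypothetical way to act on this trade-off is to take the most accurate tree as a reference and then keep the smallest tree whose accuracy is within some tolerance of it. The sketch below illustrates that rule; the tolerance value, the `TreeResult` record, and the selection rule itself are assumptions for illustration, not necessarily how the pruning settings in this study were chosen.

```python
from typing import List, NamedTuple


class TreeResult(NamedTuple):
    options: str      # J48 option string, e.g. "-C 0.25 -M 2"
    accuracy: float   # classification accuracy (%)
    num_rules: int    # number of rules (leaves) in the pruned tree


def pick_smallest_adequate(results: List[TreeResult], tolerance: float = 1.0) -> TreeResult:
    """Return the tree with the fewest rules whose accuracy is within
    `tolerance` percentage points of the most accurate tree."""
    best = max(r.accuracy for r in results)
    adequate = [r for r in results if r.accuracy >= best - tolerance]
    # Ties on size are broken in favour of the more accurate tree.
    return min(adequate, key=lambda r: (r.num_rules, -r.accuracy))
```

A tolerance of 0 reduces to "pick the most accurate tree"; larger tolerances lean further toward Occam's Razor and the pragmatic cost of large trees.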
Tree Pruning Effects
[Figure, three panels by data set (0–10): classification accuracy (%, axis 70–77), number of rules (axis 0–600), and tree size (axis 0–1000), for trees.J48 with '-C 0.05 -M 2', '-C 0.1 -M 2', '-C 0.15 -M 2', '-C 0.2 -M 2', '-C 0.25 -M 2', and '-C 0.3 -M 2']
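A pruning sweep like the one in this figure can be scripted against Weka's command-line interface. This is a sketch assuming `java` is on the PATH and Weka is available as `weka.jar` (both placeholders), with a hypothetical ARFF file per data set; `-C` sets J48's pruning confidence factor and `-M` its minimum number of instances per leaf. When only a training file is passed with `-t`, Weka reports a cross-validation estimate (10 folds by default); the presentation does not say which evaluation method produced the plotted accuracies.

```python
import shlex
import subprocess
from typing import List

# Pruning confidence factors shown in the figure; -M 2 is J48's default minimum leaf size.
CONFIDENCE_FACTORS = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3]


def j48_command(arff_path: str, confidence: float, min_leaf: int = 2,
                weka_jar: str = "weka.jar") -> List[str]:
    """Build a Weka command line that trains and evaluates J48 on one ARFF file."""
    return ["java", "-cp", weka_jar, "weka.classifiers.trees.J48",
            "-t", arff_path, "-C", str(confidence), "-M", str(min_leaf)]


def run_sweep(arff_path: str, weka_jar: str = "weka.jar") -> None:
    """Run J48 once per confidence factor and print Weka's evaluation report."""
    for c in CONFIDENCE_FACTORS:
        cmd = j48_command(arff_path, c, weka_jar=weka_jar)
        print("$", shlex.join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(result.stdout)
```

For example, `run_sweep("mandatory_controlled.arff")` (a hypothetical file name) would print one evaluation report per confidence factor.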
Results
                         Instances (users)   # of Rules   Tree Size   Accuracy
Mandatory Controlled     362 (20)            28           55          67.33%
Mandatory Uncontrolled   2050 (37)           168          329         67.32%
Voluntary Controlled     398 (29)            32           61          74.18%
Voluntary Uncontrolled   1348 (31)           114          221         70.10%
Mandatory VS Voluntary
Mandatory:
  Instances: 2412 (57 users)
  # of Rules: 200 (MC + MU = 196)
  Tree Size: 388 (MC + MU = 384)
  Classification Accuracy: 67.27%
Voluntary:
  Instances: 1746 (60 users)
  # of Rules: 144 (VC + VU = 146)
  Tree Size: 273 (VC + VU = 282)
  Classification Accuracy: 70.54%
Controlled VS Uncontrolled
Controlled:
  Instances: 760 (49 users)
  # of Rules: 57 (MC + VC = 60)
  Tree Size: 105 (MC + VC = 116)
  Classification Accuracy: 68.55%
Uncontrolled:
  Instances: 3398 (68 users)
  # of Rules: 300 (MU + VU = 282)
  Tree Size: 555 (MU + VU = 550)
  Classification Accuracy: 67.27%
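One way to read the combined results is to compare each combined classifier's accuracy with the instance-weighted mean of its two component datasets, roughly what one might expect if combining the data neither helped nor hurt. The Python sketch below does that arithmetic with the numbers from the Results slides; the weighted-mean baseline is an assumption introduced here for comparison, not a measure used in the study.

```python
# (instances, accuracy %) for each dataset, from the Results slide.
RESULTS = {
    "MC": (362, 67.33),
    "MU": (2050, 67.32),
    "VC": (398, 74.18),
    "VU": (1348, 70.10),
}

# Accuracy of the classifier trained on each combined dataset.
COMBINED = {
    ("MC", "MU"): 67.27,  # Mandatory
    ("VC", "VU"): 70.54,  # Voluntary
    ("MC", "VC"): 68.55,  # Controlled
    ("MU", "VU"): 67.27,  # Uncontrolled
}


def weighted_accuracy(a: str, b: str) -> float:
    """Instance-weighted mean accuracy of two datasets."""
    (n_a, acc_a), (n_b, acc_b) = RESULTS[a], RESULTS[b]
    return (n_a * acc_a + n_b * acc_b) / (n_a + n_b)


if __name__ == "__main__":
    for pair, actual in COMBINED.items():
        baseline = weighted_accuracy(*pair)
        print(f"{'+'.join(pair)}: weighted {baseline:.2f}% vs combined {actual:.2f}%")
```

With these numbers the baselines come out to about 67.32% (Mandatory), 71.03% (Voluntary), 70.92% (Controlled), and 68.42% (Uncontrolled), and every combined classifier falls below its baseline.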
Rough Conclusions
Accuracy in a given domain (combined dataset) is limited by the lower accuracy of its pair of datasets
  E.g., VC = 74.18%, VU = 70.10%, V = 70.54%
Domain trees seem to be the union of the trees for both datasets in the pair
Voluntary classifiers > Mandatory classifiers
  Voluntary data is higher quality (supports H1)
Controlled classifiers > Uncontrolled classifiers
  Controlled search results are better defined
Future Work
My Study:
  Finish analysis with Weka
  Investigate rules more thoroughly
  Account for observed classification accuracies
  Develop solid conclusions
Other Studies:
  Investigate better voluntary feedback mechanisms
  More diversified population
  Try a non-Web-browser context
Conclusions
Choice of feedback mechanism affects data quantity
  Probably affects data quality
Search domain affects feedback values & data quantity
  Task-oriented VS leisurely browsing
Questions?
Acknowledgements
Many thanks are extended to:
  Prof. Brown
  Prof. Claypool
  The NUI group at Microsoft