Mathias Verbeke, Bettina Berendt, Siegfried Nijssen, Dept. Computer Science, KU Leuven
![Page 1: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/1.jpg)
Data mining, interactive semantic structuring, and
collaboration: A diversity-aware method for sense-making in search
Mathias Verbeke, Bettina Berendt,
Siegfried Nijssen
Dept. Computer Science, KU Leuven
![Page 2: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/2.jpg)
Agenda
Motivation
– Diversity
– Diversity-aware tools
(Our) context
Main part
– Measures of diversity
– Tool
Outlook
![Page 3: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/3.jpg)
Motivation (1): Diversity is ...
Speaking different languages (etc.) → localisation / internationalisation
Having different abilities → accessibility
Liking different things → collaborative filtering
Structuring the world in different ways → ?
![Page 4: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/4.jpg)
Motivation (2): Diversity-aware applications ...
Must have a (formal) notion of diversity
Can follow a
– "personalization approach": adapt to the user's value on the diversity variable(s); transparently? Is this paternalistic?
– "customization approach": show the space of diversity; allow choice / semi-automatic!
![Page 5: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/5.jpg)
(Our) Context
1. Diversity and Web usage: language, culture
2. Family of tools focussing on interactive sense-making helped by data mining
– PORPOISE: global and local analysis of news and blogs + their relations
– STORIES: finding + visualisation of "stories" in news
– CiteseerCluster: literature search + sense-making
– Damilicious: CiteseerCluster + re-use/transfer of semantics + diversity
![Page 6: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/6.jpg)
Measuring grouping diversity
Diversity = 1 – similarity = 1 – normalized mutual information (NMI)
NMI = 0
NMI = 0.35
By colour &
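A minimal sketch of the measure above, assuming two groupings of the same items given as aligned label lists. The slide does not say how the mutual information is normalized; dividing by the mean of the two entropies is an assumption made here, and the function names and the toy shape/colour data are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a list of group labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information of two groupings.

    a[i] and b[i] are the group labels that grouping A and grouping B
    give to item i. ASSUMPTION: normalization by the mean entropy.
    """
    if len(a) != len(b) or not a:
        raise ValueError("groupings must cover the same non-empty item list")
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    denom = (entropy(a) + entropy(b)) / 2
    # Convention chosen here: two trivial one-group groupings count as identical.
    return mi / denom if denom > 0 else 1.0


def grouping_diversity(a, b):
    """Diversity = 1 - similarity = 1 - NMI."""
    return 1.0 - nmi(a, b)


# Toy data (made up): four items grouped by shape and by colour.
by_shape = ["square", "square", "circle", "circle"]
by_colour = ["red", "blue", "red", "blue"]
print(grouping_diversity(by_shape, by_colour))  # independent groupings: close to 1
print(grouping_diversity(by_shape, by_shape))   # identical groupings: close to 0
```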
![Page 7: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/7.jpg)
Measuring user diversity
"How similarly do two users group documents?"
For each query q, consider their groupings gr:
For various queries: aggregate
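The formulas on this slide did not survive in the transcript, so the following is only a sketch of the two steps it names. It assumes each user's grouping for a query is a mapping from document ID to group label, that the per-query diversity is computed on the documents both users grouped, and that "aggregate" means the arithmetic mean over shared queries. The grouping measure is passed in as a parameter; the slides use 1 – NMI, while `pair_disagreement` below is a cruder stand-in chosen only to keep the block short. All names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pair_disagreement(a, b):
    """Stand-in grouping diversity (NOT the slides' 1 - NMI): the share of
    item pairs that one grouping puts together and the other keeps apart."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    differ = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return differ / len(pairs)


def gdiv_query(gr_a, gr_b, measure=pair_disagreement):
    """Diversity of two users for one query.

    gr_a, gr_b: dict mapping document ID -> group label.
    """
    docs = sorted(set(gr_a) & set(gr_b))
    return measure([gr_a[d] for d in docs], [gr_b[d] for d in docs])


def gdiv(user_a, user_b, measure=pair_disagreement):
    """Aggregate diversity of two users over the queries both answered.

    user_a, user_b: dict mapping query -> grouping (see gdiv_query).
    ASSUMPTION: aggregation is the plain mean over shared queries.
    """
    shared = sorted(set(user_a) & set(user_b))
    if not shared:
        raise ValueError("users share no queries")
    return mean(gdiv_query(user_a[q], user_b[q], measure) for q in shared)


# Toy data (made up).
user_a = {"web mining": {"d1": "g1", "d2": "g1", "d3": "g2"},
          "rfid": {"d4": "x", "d5": "y"}}
user_b = {"web mining": {"d1": "h1", "d2": "h1", "d3": "h2"},
          "rfid": {"d4": "z", "d5": "z"}}
print(gdiv_query(user_a["web mining"], user_b["web mining"]))
print(gdiv(user_a, user_b))
```

Passing the measure as an argument keeps the aggregation independent of the choice of grouping diversity, which the deck itself lists as an open question.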
![Page 8: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/8.jpg)
... and now: the application domain
... that's only the 1st step!
![Page 9: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/9.jpg)
Workflow
1. Query
2. Automatic clustering
3. Manual regrouping
4. Re-use
   1. Learn + present way(s) of grouping
   2. Transfer the constructed concepts
![Page 10: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/10.jpg)
Concepts
Extension
– the instances in a group
Intension
– Ideally: "squares vs. circles"
– Pragmatically: defined via a classifier
![Page 11: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/11.jpg)
Step 1: Retrieve
CiteseerX via OAI
Output: set of
– document IDs
– document details
– their texts
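A rough sketch of what retrieval over OAI-PMH can look like, assuming the standard `ListRecords` verb with Dublin Core (`oai_dc`) metadata. The base URL and the sample record are placeholders, not the actual CiteseerX endpoint or data, and no network call is made. Dublin Core carries identifiers and details such as title and description; how the tool obtains the full texts is not stated on the slide and is not covered here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract document IDs and details from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    docs = []
    for rec in root.iterfind(".//oai:record", NS):
        docs.append({
            "id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "description": rec.findtext(".//dc:description", default="", namespaces=NS),
        })
    return docs


# Made-up response with one record, for illustration only.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A made-up title</dc:title>
          <dc:description>A made-up abstract.</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("https://example.org/oai"))  # placeholder base URL
print(parse_records(SAMPLE))
```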
![Page 12: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/12.jpg)
Step 2: Cluster
"the classic bibliometric solution"
CiteseerCluster:
– Similarity measure: co-citation, bibliographic coupling, word or LSA similarity, combinations
– Clustering algorithm: k-means, hierarchical
Damilicious: phrases → Lingo
How to choose the "best"?
– Experiments: Lingo better than k-means at reconstruction and extension-over-time
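Two of the similarity measures named above can be sketched from a citation map alone. The block below uses the usual raw-count definitions: bibliographic coupling counts the references two papers share, co-citation counts the papers that cite both. Whether CiteseerCluster normalizes or combines these counts is not stated on the slide; the function names and the toy citation data are hypothetical.

```python
def bibliographic_coupling(cites, a, b):
    """Number of references that papers a and b have in common.

    cites: dict mapping paper ID -> set of paper IDs it cites.
    """
    return len(cites.get(a, set()) & cites.get(b, set()))


def co_citation(cites, a, b):
    """Number of papers whose reference lists contain both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


# Toy citation data (made up): p1 and p2 both cite a and b.
cites = {
    "p1": {"a", "b"},
    "p2": {"a", "b", "c"},
    "p3": {"c"},
}
print(bibliographic_coupling(cites, "p1", "p2"))  # shared references of p1 and p2
print(co_citation(cites, "a", "b"))               # papers citing both a and b
```

Either count can serve as the pairwise similarity fed to a clustering algorithm such as k-means or hierarchical clustering.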
![Page 13: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/13.jpg)
Step 3 (a): Re-organise & work on document groups
![Page 14: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/14.jpg)
Step 3 (b): Visualising document groups
![Page 15: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/15.jpg)
Steps 4+5: Re-use
Basic idea:
1. learn a classifier from the final grouping (Lingo phrases)
2. apply the classifier to a new search result
→ "re-use semantics"
Whose grouping?
– One's own
– Somebody else's
Which search result?
– "the same" (same query, structuring by somebody else)
– "More of the same" (same query, later time → more doc.s)
– "related" (... Measured how? ...)
– arbitrary
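The basic idea above can be illustrated with a deliberately simple stand-in classifier. The slides learn the classifier from Lingo phrases; the sketch below instead builds a bag-of-words profile per group and assigns a new document to the group whose profile it overlaps most, which is an assumption made only to keep the example self-contained. The fallback label "Other topics" follows the remainder group mentioned later in the deck; function names and toy texts are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-cased alphabetic word tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())


def learn(grouping):
    """Step 1: learn one term profile per group from the final grouping.

    grouping: dict mapping group label -> list of document texts.
    """
    return {
        label: Counter(t for doc in docs for t in tokens(doc))
        for label, docs in grouping.items()
    }


def classify(profiles, text, fallback="Other topics"):
    """Step 2: assign a document from a new search result to a learned group."""
    words = set(tokens(text))
    scores = {
        label: sum(profile[w] for w in words)
        for label, profile in profiles.items()
    }
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else fallback


# Toy grouping (made up), e.g. one user's final regrouping.
grouping = {
    "Web mining": ["mining web usage logs", "web page clustering"],
    "RFID": ["rfid tag privacy", "rfid sensor data"],
}
profiles = learn(grouping)
for doc in ["usage mining of web logs", "privacy of rfid tags", "quantum chemistry"]:
    print(doc, "->", classify(profiles, doc))
```

Because the learned profiles are separate from the documents they were learned from, the same profiles can be applied to one's own later results or to somebody else's search result.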
![Page 16: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/16.jpg)
Visualising user diversity (1)
Simulated users with different strategies
U0: did not change anything ("System")
U1: tried to produce a better fit of the document groups to the cluster intensions; 5 regroupings
U2: attempted to move everything that did not fit well into the remainder group "Other topics", & better fit; 10 regroupings
U3: attempted to move everything from "Other topics" into matching real groups; 5 regroupings
U4: regrouping by author and institution; 5 regroupings
5*5 matrix of diversities gdiv(A,B,q) → multidimensional scaling
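The last step, from a matrix of pairwise diversities to a 2-D picture, can be sketched with classical (Torgerson) multidimensional scaling. The slides do not say which MDS variant was used, so classical MDS is an assumption, as is the pure-Python power iteration used here in place of a linear-algebra library. The input matrix in the example is a made-up toy, not data from the study.

```python
import math


def power_iteration(b, start, iters):
    """Dominant eigenpair of a symmetric matrix b, or (0.0, None) if b is
    numerically zero."""
    n = len(b)
    v = start
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return 0.0, None
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, bv)), v  # Rayleigh quotient, unit vector


def classical_mds(d, dims=2, iters=500):
    """Embed items with symmetric dissimilarity matrix d into `dims` dimensions."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double centring of the squared dissimilarities.
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        start = [math.sin(i + k + 1) for i in range(n)]  # deterministic start
        lam, v = power_iteration(b, start, iters)
        if v is None or lam <= 0:
            break  # nothing left that real coordinates can represent
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate: remove the component just extracted.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy dissimilarities (made up): three items lying on a line at 0, 1 and 3.
toy = [[0, 1, 3],
       [1, 0, 2],
       [3, 2, 0]]
for name, point in zip(["A", "B", "C"], classical_mds(toy)):
    print(name, point)
```

For the slide's setting, `d` would be the 5*5 matrix whose entry for users A and B is gdiv(A,B,q); the sketch stops extracting dimensions once the dominant remaining eigenvalue is not positive, which can happen when the diversities are not Euclidean distances.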
![Page 17: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/17.jpg)
Visualising user diversity (2)
aggregated using gdiv(A,B)
Web mining / Data mining / RFID
![Page 18: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/18.jpg)
Evaluating the application
Clustering only: Does it generate meaningful document groups?
– yes (tradition in bibliometrics) – but: data?
– Small expert evaluation of CiteseerCluster
Clustering & regrouping
– End-user experiment with CiteseerCluster
– 5-person formative user study of Damilicious
![Page 19: Mathias Verbeke, Bettina Berendt , Siegfried Nijssen Dept. Computer Science, KU Leuven](https://reader035.vdocuments.site/reader035/viewer/2022081603/568154ee550346895dc2e479/html5/thumbnails/19.jpg)
Summary and (some) open questions
Damilicious: a tool that helps users in sense-making, exploring diversity, and re-using semantics
diversity measures when queries and result sets are different?
how to best present diversity?
– How to integrate into an environment supporting user and community contexts (e.g., Niederée et al. 2005)?
– Incentives to use the functionalities?
how to find the best balance between similarity and diversity?
which measures of grouping diversity are most meaningful?
– Extensional?
– Intensional? Structure-based? Hybrid? (cf. ontology matching)
which other sources of user diversity?
Thanks!