![Page 1: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/1.jpg)
Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics
http://bit.ly/rguCabanac2012
Guillaume Cabanac
March 28th, 2012
![Page 2: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/2.jpg)
Outline of these Musings
Musings at the Crossroads of DL, IR, and SCIM
Guillaume Cabanac
Digital Libraries: Collective annotations; Social validation of discussion threads; Organization-based document similarity
Information Retrieval: The tie-breaking bias in IR evaluation; Geographic IR; Effectiveness of query operators
Scientometrics: Recommendation based on topics and social clues; Landscape of research in Information Systems; The submission-date bias in peer-reviewed conferences
![Page 3: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/3.jpg)
![Page 4: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/4.jpg)
Digital Libraries: Collective annotations
Question DL-1
How to transpose paper-based annotations into digital documents?
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Collective annotation: Perspectives for information retrieval improvement.” RIAO’07: Proceedings of the 8th conference on Information Retrieval and its Applications, pages 529–548. CID, May 2007.
![Page 5: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/5.jpg)
From Individual Paper-based Annotation …
Characteristics of paper annotation
Centuries-old activity: more than four centuries old
Numerous application contexts: theology, science, literature …
Personal use: “active reading” (Adler & van Doren, 1972)
Collective use: review process, opinion exchange …
[Figure: timeline with marks at 1541, 1630, 1790, 1830, 1881, 1998, showing an annotated Bible (Lortsch, 1910); Fermat’s last theorem (Kleiner, 2000); annotations from Blake, Keats… (Jackson, 2001); Les Misérables, Victor Hugo; US students (Marshall, 1998)]
![Page 6: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/6.jpg)
… to Collective Digital Annotations
Hardcopy annotations: hard to share, easily ‘lost’
> 20 annotation systems, 1993–2005 (Cabanac et al., 2005): ComMentor … iMarkup … Yawas … Amaya …
Web servers (Ovsiannikov et al., 1999); annotation server; a discussion thread
[Figure labels: author 87%, reader 13%]
![Page 7: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/7.jpg)
Digital Document Annotation: Examples
W3C Annotea / Amaya (Kahan et al., 2002): a reader’s comment, a discussion thread
Arakne, featuring “fluid annotations” (Bouvin et al., 2002)
![Page 8: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/8.jpg)
Collective Annotations
Reviewed 64 systems designed during 1989–2008
Collective annotation
Objective data: owner, creation date; anchoring point within the document; granularity (whole document, words…)
Subjective information: comments, various marks (stars, underlined text…); annotation types (support/refutation, question…); visibility (public, private, group…)
Purpose-oriented annotation categories: remark, reminder, argumentation
Personal Annotation Space
![Page 9: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/9.jpg)
Digital Libraries: Social validation of discussion threads
Question DL-2
How to measure the social validity of a statement according to the argumentative discussion it sparked off?
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Social validation of collective annotations: Definition and experiment.” Journal of the American Society for Information Science and Technology, 61(2):271–287, Feb. 2010, Wiley. DOI:10.1002/asi.21255
![Page 10: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/10.jpg)
Social Validation of Argumentative Debates
Scalability issue: which annotations should I read?
Social validation = degree of consensus of the group
![Page 11: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/11.jpg)
Social Validation of Argumentative Debates
Before: annotation magma
After: filtered display
Informing readers about how validated each annotation is
![Page 12: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/12.jpg)
Social Validation Algorithms
Overview: two proposed algorithms
Empirical Recursive Scoring Algorithm (Cabanac et al., 2005)
Bipolar Argumentation Framework Extension, based on Artificial Intelligence research (Cayrol & Lagasquie-Schiex, 2005)
Validity scale: –1 = socially refuted; 0 = socially neutral; 1 = socially confirmed
[Figure: cases 1 to 4, with annotations A and B]
![Page 13: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/13.jpg)
Social Validation Algorithm
Example: computing the social validity of a debated annotation
![Page 14: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/14.jpg)
Validation with a User-study
Aim: social validation vs human perception of consensus
Design
Corpus: 13 discussion threads = 222 annotations + answers
Task of a participant: label opinion type; infer overall opinion
Volunteer subjects (figure labels: 53, 119)
![Page 15: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/15.jpg)
Experimenting the Social Validation of Debates
Q1: Do people agree when labeling opinions?
Kappa coefficient (Fleiss, 1971; Fleiss et al., 2003): inter-rater agreement among n > 2 raters
Weak agreement, with variability: a subjective task
[Figure: value of Kappa agreement per debate Id, with “Poor” and “Fair to good” bands]
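As a reminder of what the Kappa coefficient measures, here is a minimal sketch of Fleiss' kappa in Python. The function name and the tiny rating matrices are illustrative assumptions, not data from the study; `counts[i][j]` is the number of raters who put item `i` into category `j`.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for items rated by the same number of raters.

    counts[i][j] = number of raters who assigned item i to category j.
    Undefined (division by zero) when every rating falls in one category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement on each item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: 2 annotations, 3 raters, 2 opinion types.
perfect = fleiss_kappa([[3, 0], [0, 3]])
mixed = fleiss_kappa([[2, 1], [1, 2]])
```

A value of 1 means perfect agreement; values near or below 0 mean agreement no better than chance.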
![Page 16: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/16.jpg)
Experimenting the Social Validation of Debates
Q2: How well does SV approximate HP?
HP = Human Perception of consensus
SV = Social Validation algorithm
1. Test whether HP and SV are different (α = 0.05): Student’s paired t-test gives p = 0.20 > α = 0.05, so no significant difference is detected
2. Correlate HP and SV with Pearson’s coefficient of correlation r: r(HP, SV) = 0.48 shows a weak correlation
[Figure: density y = p(HP – SV) plotted against HP – SV; example: HP = SV for 24 % of all cases]
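The two statistics used for Q2 can be sketched with the Python standard library. This is a minimal illustration on made-up values, not the study's data; it computes the paired t statistic and Pearson's r only. Turning t into a p-value needs the t distribution with n – 1 degrees of freedom, which `scipy.stats.ttest_rel` provides if SciPy is available.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """t statistic of Student's paired t-test (n - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's coefficient of correlation between two paired samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical HP and SV scores for four debates, both in [-1, 1].
hp = [0.50, -0.20, 0.80, 0.10]
sv = [0.40, 0.00, 0.60, 0.30]
t = paired_t_statistic(hp, sv)
r = pearson_r(hp, sv)
```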
![Page 17: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/17.jpg)
Digital Libraries: Organization-based document similarity
Question DL-3
How to harness a quiescent capital present in any community: its documents?
Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Organization of digital resources as an original facet for exploring the quiescent information capital of a community.” International Journal on Digital Libraries, 11(4):239–261, Dec. 2010, Springer. DOI:10.1007/s00799-011-0076-6
![Page 18: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/18.jpg)
Documents as a Quiescent Wealth
Personal documents: filtered, validated, organized information… relevant to activities in the organization
Paradox: profitable, but under-exploited
Reason 1 – folders and files are private
Reason 2 – manual sharing
Reason 3 – automated sharing
Consequences
People resort to resources available outside of the community
Weak ROI: why would we have to look outside when it’s already there?
![Page 19: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/19.jpg)
How to Benefit from Documents in a Community?
Mapping the documents of the community: SOM [Kohonen, 2001], Umap [Triviumsoft], TreeMap [Fekete & Plaisant, 2001]…
Limitations
Find the documents with the same topics as D
Find documents that colleagues use with D
Concept of usage: grouping documents ⇆ keeping stuff in common
![Page 20: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/20.jpg)
How to Benefit from Documents in a Community?
Organization-based similarities: inter-folder, inter-document, inter-user
![Page 21: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/21.jpg)
How to Help People to Discover/Find/Use Documents?
Purpose: offering a global view of … people and their documents
Based on document contents
Based on document usage/organization
Requirement: non-intrusiveness and confidentiality
Operational needs
Find documents: with related materials, with complementary materials
Seeking people ⇆ seeking documents
Managerial needs
Visualize the global/individual activity
Work position required documents
[Figure label: community]
![Page 22: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/22.jpg)
Proposed System: Static Aspect
4 views = {documents, people} × {group, unit}
1. Group of documents: main topics; usage groups
2. A single document: who to liaise with? what to read?
3. Group of people: community of interest; community of use
4. A single person: interests; similar users (potential help)
![Page 23: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/23.jpg)
![Page 24: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/24.jpg)
Information Retrieval: The tie-breaking bias in IR evaluation
Question IR-1
Is document tie-breaking affecting the evaluation of Information Retrieval systems?
Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. “Tie-breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation.” M. Agosti, N. Ferro, C. Peters, M. de Rijke, and A. F. Smeaton (Eds.) CLEF’10: Proceedings of the 1st Conference on Multilingual and Multimodal Information Access Evaluation, volume 6360 of LNCS, pages 112–123. Springer, Sep. 2010. DOI:10.1007/978-3-642-15998-5_13
![Page 25: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/25.jpg)
Measuring the Effectiveness of IR systems
User-centered vs. System-focused [Spärck Jones & Willett, 1997]
Evaluation campaigns
1958 Cranfield, UK
1992 TREC (Text Retrieval Conference), USA
1999 NTCIR (NII Test Collection for IR Systems), Japan
2001 CLEF (Cross-Language Evaluation Forum), Europe
…
“Cranfield” methodology [Voorhees, 2007]
Task
Test collection: corpus, topics, qrels
Measures: MAP, P@X ... using trec_eval
![Page 26: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/26.jpg)
Runs are Reordered Prior to Their Evaluation
Qrels = qid, iter, docno, rel
Run = qid, iter, docno, rank, sim, run_id
Reordering by trec_eval: qid asc, sim desc, docno desc
Effectiveness measure (MAP, P@X, MRR…) = f(intrinsic_quality, …)
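The reordering rule can be illustrated for a single topic with a short Python sketch. The document identifiers, scores, and helper names below are hypothetical; the sort key mirrors the "sim desc, docno desc" order stated on the slide, and average precision is computed in its usual form.

```python
from typing import List, Set, Tuple

Row = Tuple[str, float]  # (docno, sim) for one topic of a run


def trec_eval_order(run: List[Row]) -> List[str]:
    """Rank documents by sim descending, ties broken by docno descending."""
    ordered = sorted(run, key=lambda row: (row[1], row[0]), reverse=True)
    return [docno for docno, _ in ordered]


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values observed at each relevant document."""
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Two documents tie at sim = 0.8; only their names decide who comes first.
run = [("AP-001", 0.8), ("WSJ-001", 0.8), ("AP-002", 0.5)]
ranking = trec_eval_order(run)

ap_if_ap_relevant = average_precision(ranking, {"AP-001"})
ap_if_wsj_relevant = average_precision(ranking, {"WSJ-001"})
```

With the same scores, the measured AP depends on whether the relevant document happens to have a name that sorts first among the tied ones, which has nothing to do with the quality of the system.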
![Page 27: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/27.jpg)
Consequences of Run Reordering
Measures of effectiveness for an IRS s, all sensitive to document rank
RR(s,t): 1/rank of the 1st relevant document, for topic t
P(s,t,d): precision at document d, for topic t
AP(s,t): average precision for topic t
MAP(s): mean average precision
Tie-breaking bias
Is the Wall Street Journal collection more relevant than Associated Press?
Problem 1: comparing 2 systems, AP(s1, t) vs. AP(s2, t)
Problem 2: comparing 2 topics, AP(s, t1) vs. AP(s, t2)
[Figure labels: Chris, Ellen]
![Page 28: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/28.jpg)
What we Learnt: Beware of Tie-breaking for AP
Small effect on MAP, larger effect on AP
Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic
Failure analysis for the ranking process
Error bar = element of chance, potential for improvement
[Figure: padre1, adhoc’94]
![Page 29: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/29.jpg)
Information Retrieval: Geographic IR
Question IR-2
How to retrieve documents matching keywords and spatiotemporal constraints?
Damien Palacio, Guillaume Cabanac, Christian Sallaberry, Gilles Hubert. “On the evaluation of geographic information retrieval systems: Evaluation framework and case study.” International Journal on Digital Libraries, 11(2):91–109, June 2010, Springer. DOI:10.1007/s00799-011-0070-z
![Page 30: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/30.jpg)
Geographic Information Retrieval
Query = “Road trip around Aberdeen summer 1982”
Search engines, topical dimension only: term {road, trip, Aberdeen, summer}
Geographic IR: spatial {AberdeenCity, AberdeenCounty…}; temporal [21-JUN-1982 .. 22-SEP-1982]; term {road, trip, Aberdeen, summer}
1/6 queries = geographic queries: Excite (Sanderson et al., 2004), AOL (Gan et al., 2008), Yahoo! (Jones et al., 2008)
Current issue worth studying
![Page 31: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/31.jpg)
The Internals of a Geographic IR System
3 dimensions to process: topical, spatial, temporal
1 index per dimension
Topical: bag of words, stemming, weighting, comparing with VSM…
Spatial: spatial entity detection, spatial relation resolution…
Temporal: temporal entity detection…
Query processing with sequential filtering, e.g., priority to theme, then filtering according to other dimensions
Issue: effectiveness of GIRSs vs state-of-the-art IRSs?
Hypothesis: GIRSs better than state-of-the-art IRSs
![Page 32: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/32.jpg)
Case Study: the PIV GIR System
Indexing: one index per dimension
Topical = Terrier IRS; Spatial = tiling; Temporal = tiling
Retrieval
Identification of the 3 dimensions in the query
Routing towards each index
Combination of results with CombMNZ [Fox & Shaw, 1993; Lee 1997]
![Page 33: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/33.jpg)
Case Study: the PIV GIR System
Principle of CombMNZ and Borda Count
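Both fusion principles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PIV implementation: scores are min-max normalised per list (one common choice), CombMNZ counts every list in which a document appears, and the Borda variant gives no points to documents a list does not rank. Document names and scores are made up.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale the scores of one (non-empty) result list to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(result_lists: List[Dict[str, float]]) -> List[str]:
    """CombMNZ: sum of normalised scores times number of lists with the doc."""
    total: Dict[str, float] = {}
    hits: Dict[str, int] = {}
    for scores in result_lists:
        for doc, s in min_max(scores).items():
            total[doc] = total.get(doc, 0.0) + s
            hits[doc] = hits.get(doc, 0) + 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


def borda_count(rankings: List[List[str]]) -> List[str]:
    """Borda count: a list gives n points to its 1st doc, n - 1 to its 2nd..."""
    candidates = {doc for ranking in rankings for doc in ranking}
    n = len(candidates)
    points = {doc: 0 for doc in candidates}
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            points[doc] += n - position
    return sorted(points, key=lambda doc: (-points[doc], doc))


# Hypothetical results from a topical index and a spatial index.
topical = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
spatial = {"d2": 0.9, "d4": 0.1}
fused_scores = comb_mnz([topical, spatial])
fused_ranks = borda_count([["d1", "d2", "d3"], ["d2", "d4"]])
```

In both cases d2 rises to the top because it is retrieved by both indexes, which is the behaviour the combination step is meant to reward.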
![Page 34: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/34.jpg)
Case Study: the PIV GIR System
Gain in effectiveness
![Page 35: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/35.jpg)
Information Retrieval: Effectiveness of query operators
Question IR-3
Do operators in search queries improve the effectiveness of search results?
Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio. “Query Operators Shown Beneficial for Improving Search Results.” S. Gradmann, F. Borri, C. Meghini, H. Schuldt (Eds.) TPDL’11: Proceedings of the 1st International Conference on Theory and Practice of Digital Libraries, volume 6966 of LNCS, pages 118–129. Springer, Sep. 2011. DOI:10.1007/978-3-642-24469-8_14.
![Page 36: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/36.jpg)
Search Engines Offer Query Operators
Information need: “I’m looking for research projects funded in the DL domain”
Regular query vs. query with operators
Various operators: quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…
![Page 37: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/37.jpg)
Our Research Questions
![Page 38: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/38.jpg)
Our Methodology in a Nutshell
Regular query
V1, V2, V3, V4, …, VN: query variants with operators
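One way to picture the variants V1 … VN is to enumerate, for each term of the regular query, whether it is left plain, marked as "must appear" (+), or boosted (^). The sketch below is an assumption made for illustration only: the talk does not state how variants were generated, and the boost value and Lucene-like syntax are hypothetical.

```python
from itertools import product
from typing import List


def query_variants(terms: List[str], boost: int = 2) -> List[str]:
    """All queries obtained by decorating each term with '', '+', or '^boost'.

    The undecorated combination (the regular query) is left out, so a query
    of n terms yields 3**n - 1 variants.
    """
    options = [(term, f"+{term}", f"{term}^{boost}") for term in terms]
    combos = [" ".join(choice) for choice in product(*options)]
    return combos[1:]  # combos[0] is the regular query itself


variants = query_variants(["digital", "library"])
```

Each variant would then be run against the same test collection, and its AP compared with the AP of the regular query.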
![Page 39: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/39.jpg)
Effectiveness of Query Operators
TREC-7 per Topic Analysis: Boxplots
‘+’ and ‘^’
![Page 40: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/40.jpg)
Effectiveness of Query Operators
Per Topic Analysis: Box plot
[Figure: box plot of AP (Average Precision, axis from 0.1 to 0.4) over Topics (tick 32), marking the AP of TREC’s regular query, the query variant with the highest AP, and the query variant with the lowest AP]
![Page 41: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/41.jpg)
Effectiveness of Query Operators
TREC-7 Per Topic Analysis
‘+’ and ‘^’
MAP = 0.1554
MAP ┬ = 0.2099 (+35.1%)
![Page 42: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/42.jpg)
![Page 43: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/43.jpg)
Scientometrics: Recommendation based on topics and social clues
Question SCIM-1
How to recommend researchers according to their research topics and social clues?
Guillaume Cabanac. “Accuracy of inter-researcher similarity measures based on topical and social clues.” Scientometrics, 87(3):597–620, June 2011, Springer. DOI:10.1007/s11192-011-0358-1
![Page 44: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/44.jpg)
Recommendation of Literature (McNee et al., 2006)
Collaborative filtering
Principle: mining the preferences of researchers (“those who liked this paper also liked…”)
Snowball effect / fad. Innovation? Relevance of theme?
Cognitive filtering
Principle: mining the contents of articles
Profile of resources (researcher, articles), citation graph
Hybrid approach
![Page 45: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/45.jpg)
Foundations: Similarity Measures Under Study
Model
Coauthors graph: authors ⇆ authors
Venues graph: authors ⇆ conferences / journals
Social similarities
Inverse degree of separation: length of the shortest path
Strength of the tie: number of shortest paths
Shared conferences: number of shared conference editions
Thematic similarity
Cosine on Vector Space Model, $d_i = (w_{i1}, \ldots, w_{in})$, built on titles (doc / researcher)
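The thematic and graph-based measures can be sketched as follows. This is a simplified illustration, not the code behind the study: term weights are raw term frequencies over whitespace-split titles (an assumption; the talk does not give the weighting scheme), and the coauthor graph is a small made-up adjacency map.

```python
import math
from collections import Counter, deque
from typing import Dict, Optional, Set, Tuple


def cosine(title_a: str, title_b: str) -> float:
    """Cosine of two bag-of-words vectors built from titles (raw tf weights)."""
    a = Counter(title_a.lower().split())
    b = Counter(title_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def shortest_paths(
    graph: Dict[str, Set[str]], src: str, dst: str
) -> Tuple[Optional[int], int]:
    """Breadth-first search returning (length, number) of shortest paths.

    Returns (None, 0) when dst cannot be reached from src.
    """
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:  # another shortest route to v
                count[v] += count[u]
    return dist.get(dst), count.get(dst, 0)


# Hypothetical coauthor graph: A wrote with B and C, who both wrote with D.
coauthors = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
length, strength = shortest_paths(coauthors, "A", "D")
inverse_separation = 1 / length if length else 0.0
topical = cosine("tie breaking bias in evaluation", "bias in peer review")
```

Here A and D are two steps apart (inverse degree of separation 0.5) and connected by two shortest paths (strength of the tie 2).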
![Page 46: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/46.jpg)
Computing Similarities with Social Clues
Task of literature review
Requirement: topical relevance
Preference: social proximity (meetings, project…)
Re-rank topical results with social clues
Combination with CombMNZ (Fox & Shaw, 1993)
Degree of separation + Strength of ties + Shared conferences → CombMNZ → Social list
Topical list + Social list → CombMNZ → TS list
Final result: list of recommended researchers
![Page 47: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/47.jpg)
Evaluation Design
Comparison of recommendations and researchers’ perception
Q1: Effectiveness of topical (only) recommendations?
Q2: Gain due to integrating social clues?
IR experiments: Cranfield paradigm (TREC…)
Does the search engine retrieve relevant documents?
Input: topic and corpus, given to the search engine
Assessor: doc relevant?
Relevance judgments: {0, 1} binary, [0, N] gradual
qrels, evaluated with trec_eval
Effectiveness measures: Mean Average Precision, Normalized Discounted Cumulative Gain
topic S1 S2
1 0.5687 0.6521
… … …
50 0.7124 0.7512
avg 0.6421 0.7215
improvement: +12.3 %; significance: p < 0.05 (paired t-test)
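For readers less familiar with graded measures, here is a small sketch of NDCG and of the relative improvement between two systems. It uses one common formulation (gain divided by log2(rank + 1)) and, as a simplification, derives the ideal ranking from the gains of the returned list only; the gain values are invented.

```python
import math
from typing import List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance gains."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: List[float]) -> float:
    """DCG normalised by the DCG of the same gains in ideal (sorted) order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def relative_improvement(baseline: float, system: float) -> float:
    """Improvement of system over baseline, in percent."""
    return 100.0 * (system - baseline) / baseline


# Hypothetical graded judgments for the top 4 results of two systems.
s1 = ndcg([1, 3, 0, 2])
s2 = ndcg([3, 1, 2, 0])
gain = relative_improvement(s1, s2)
```

Averaging such per-topic values for each system, then applying a paired t-test across topics, gives the kind of summary row shown in the table.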
![Page 48: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/48.jpg)
Evaluating Recommendations
The same evaluation design, transposed to recommendation:
Input: name of a researcher (in place of a topic)
System: recommender system, topical vs. topical + social (in place of a search engine)
Assessor: the researcher, asked “With whom would you like to chat for improving your research?” (Top 25)
Relevance judgments: {0, 1} binary, [0, N] gradual
qrels, evaluated with trec_eval
Effectiveness measures: Mean Average Precision, Normalized Discounted Cumulative Gain
Results table: one row per subject (#subjects) in place of one row per topic, with average, improvement, and significance (paired t-test)
![Page 49: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/49.jpg)
Experiment Features
Data: dblp.xml (713 MB = 1.3M publications for 811,787 researchers)
Subjects: 90 researchers contacted by mail; 74 began to fill in the questionnaire, 71 completed it
Interface for assessing recommendations
![Page 50: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/50.jpg)
Experiments: Profile of the Participants
Experience of the 71 subjects: Mdn = 13 years
Productivity of the 71 subjects: Mdn = 15 publications
[Figures: number of participants by seniority; number of participants by number of publications]
![Page 51: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/51.jpg)
Empirical Validation of our Hypothesis
Strong baseline: effective approach based on VSM
+8.49 % = significant improvement (p < 0.05; n = 70) of topical recommendations by social clues
[Figure: NDCG (axis from 0.5 to 1) of Topical vs. Topical + social recommendations]
Global: +8.49 %
Productivity: < 15 publications, +10.39 %; >= 15 publications, +7.03 %
Experience: < 13 years, +6.50 %; >= 13 years, +10.22 %
![Page 52: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/52.jpg)
Scientometrics: Landscape of research in Information Systems
Question SCIM-2
What is the landscape of research in Information Systems from the perspective of gatekeepers?
Guillaume Cabanac. “Shaping the landscape of research in Information Systems from the perspective of editorial boards: A scientometric study of 77 leading journals.” Journal of the American Society for Information Science and Technology, 63, to appear in 2012, Wiley. DOI:10.1002/asi.22609
![Page 53: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/53.jpg)
Landscape of Research in Information Systems
The gatekeepers of science
![Page 54: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/54.jpg)
Landscape of Research in Information Systems
The 77 core peer-reviewed IS journals in the WoS
![Page 55: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/55.jpg)
Landscape of Research in Information Systems
Exploratory data analysis
![Page 56: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/56.jpg)
Landscape of Research in Information Systems
Exploratory data analysis
![Page 57: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/57.jpg)
Landscape of Research in Information Systems
Topical map of the IS field
![Page 58: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/58.jpg)
Landscape of Research in Information Systems
Most influential gatekeepers
![Page 59: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/59.jpg)
Landscape of Research in Information Systems
Number of gatekeepers per country
![Page 60: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/60.jpg)
Landscape of Research in Information Systems
Geographic and gender diversity
![Page 61: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/61.jpg)
Scientometrics: The submission-date bias in peer-reviewed conferences
Question SCIM-3
What if submission date influenced the acceptance of conference papers?
Guillaume Cabanac. “What if submission date influenced the acceptance of conference papers?” Submitted to the Journal of the American Society for Information Science and Technology, Wiley.
![Page 62: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/62.jpg)
Conferences Affected by a Submission-Date Bias?
Peer review
![Page 63: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/63.jpg)
The Submission-Date Bias
Dataset from the ConfMaster conference management system
![Page 64: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/64.jpg)
The Submission-Date Bias
Influence of submission date on bids
![Page 65: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/65.jpg)
The Submission-Date Bias
Influence of submission date on average marks
![Page 66: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/66.jpg)
Conclusion
Digital Libraries: Collective annotations; Social validation of discussion threads; Organization-based document similarity
Information Retrieval: The tie-breaking bias in IR evaluation; Geographic IR; Effectiveness of query operators
Scientometrics: Recommendation based on topics and social clues; Landscape of research in Information Systems; The submission-date bias in peer-reviewed conferences
![Page 67: Guillaume Cabanac guillaumebanac@univ-tlse3.fr](https://reader035.vdocuments.site/reader035/viewer/2022062800/5681418e550346895dad7821/html5/thumbnails/67.jpg)
Thank you
http://www.irit.fr/~Guillaume.Cabanac
Twitter: @tafanor