Enhancing Social Tagging with a Knowledge Organization System
DESCRIPTION
Presentation slides associated with the paper "Enhancing Social Tagging with a Knowledge Organization System" written by Koraljka Golub, Jim Moon, Douglas Tudhope and Marianne Lykke Nielsen, accepted for the IFLA Satellite Meeting, Emerging Trends in Technology: Libraries Between Web 2.0, Semantic Web and Search, Florence, 19-20 August 2009. Much of the content of the slides is taken from previous presentations given by Koraljka Golub of UKOLN and Brian Matthews of STFC.
TRANSCRIPT
A centre of expertise in digital information management
www.ukoln.ac.uk
Enhancing Social Tagging with a Knowledge Organization System
Koraljka Golub, Jim Moon, Douglas Tudhope and Marianne Lykke Nielsen
Emerging Trends in Technology: Libraries Between Web 2.0, Semantic Web and Search, IFLA Satellite Meeting, Florence, 19-20 August 2009
Enhancing Social Tagging with a Knowledge Organization System
Presentation given by:
Michael Day
Research & Development Team Leader
UKOLN, University of Bath
Bath BA1 4BD, [email protected]
Presentation outline
• EnTag project context
• Intute case study
  – Methods used
  – The interface
  – Some observations
• STFC EPubs repository case study
• Conclusions and further work
EnTag project
• Enhanced tagging for discovery project
  – http://www.ukoln.ac.uk/projects/enhanced-tagging/
• Funded by the Joint Information Systems Committee (JISC)
• Partners:
  – Funded: UKOLN (University of Bath), University of Glamorgan, Science and Technology Facilities Council (STFC), Intute
  – Non-funded: OCLC Office of Research, Danish Royal School of Library and Information Science
Controlled vocabularies (1)
• Traditional way of providing subject classification
  – For location (shelf-marks, subject browsing)
  – For searching
  – For association of resources
• Different types used, such as
  – Subject classification schemes
  – Controlled keyword lists
  – Thesauri
Controlled vocabularies (2)
• General observations:
  – Enable the precise classification of resources
    • Good for precision and recall
  – Hierarchical schemes can exploit the structure to modify search queries
    • Broader/narrower/related terms
  – Expensive
    • Requires investment in specialist expertise to devise the vocabulary
    • Requires investment in specialist expertise to classify resources
  – Difficult to maintain their currency
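As an illustration of how a hierarchical scheme can modify a search query, the sketch below expands a query term with its narrower terms. It is only a minimal sketch: the `THESAURUS` dictionary, its terms and the `expand_query` function are hypothetical and are not taken from DDC, LCSH or the EnTag demonstrators.

```python
# Hypothetical toy thesaurus: each term lists its broader, narrower and related terms.
THESAURUS = {
    "european union": {
        "broader": ["international organisations"],
        "narrower": ["european parliament", "european commission"],
        "related": ["european integration"],
    },
    "european parliament": {
        "broader": ["european union"],
        "narrower": [],
        "related": [],
    },
}


def expand_query(term, relations=("narrower",)):
    """Return the query term followed by its neighbours for the chosen relations."""
    key = term.strip().lower()
    expanded = [key]
    entry = THESAURUS.get(key, {})  # unknown terms are simply left unexpanded
    for relation in relations:
        for neighbour in entry.get(relation, []):
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded
```

Expanding with narrower terms tends to raise recall, while moving to a broader term is a way of widening a search that returns too little.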
Social tagging (1)
• The Web 2.0 way of providing search terms
• People “tag” resources with free-text terms of their own choosing
• Tags used to associate resources together
• Examples:
  – del.icio.us, Flickr, Connotea, LibraryThing
• “Folksonomy”
  – The terms that a community chooses to tag its resources
Social tagging (2)
• People often use the same tags or keywords
  – Supports retrieval - makes things that mean the same thing to people easier to find
• A potentially cheap way of getting a very large number of resources classified
  – Represents the “community consensus” in some sense
  – “The Wisdom Of Crowds”
  – Has currency as people continue to update
  – Tag clouds of popular tags (many examples)
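A tag cloud of popular tags is usually driven by tag frequency. The following is a minimal sketch of one possible scaling, assuming a linear mapping from counts to font sizes; the function name `tag_cloud_sizes` and the size range are illustrative assumptions, not a description of any particular service.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_size=10, max_size=30):
    """Map each tag to a font size scaled linearly by how often it was assigned."""
    counts = Counter(tags)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for tag, count in counts.items():
        if span == 0:
            # All tags are equally popular, so give them all the smallest size.
            sizes[tag] = min_size
        else:
            sizes[tag] = min_size + (max_size - min_size) * (count - low) // span
    return sizes
```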
Social tagging (3)
• However, in uncontrolled contexts:
  – Individuals often use similar but not identical tags:
    • e.g. Semantic Web, SemanticWeb, SemWeb, SWeb
  – Individuals make mistakes in tags
    • Spelling errors, using spaces or punctuation incorrectly
  – Mixture of subject terms, genre terms, etc.
  – Some tags are more specific than others – difficult to get consistency
  – Tags often have personal meaning, but no (immediate) wider significance, e.g. “favourite”
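The variant problem can be made concrete with a small sketch. It assumes a deliberately simple normalisation (lower-casing and dropping anything that is not an ASCII letter or digit); the function names are hypothetical and this is not the method used in EnTag.

```python
import re
from collections import defaultdict


def normalise_tag(tag):
    """Lower-case a tag and drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def group_variants(tags):
    """Group raw tags that share the same normalised form."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalise_tag(tag)].append(tag)
    return dict(groups)
```

With the tags from the slide, "Semantic Web" and "SemanticWeb" fall into one group, but "SemWeb" and "SWeb" stay separate. Surface normalisation cannot resolve abbreviations, which is one motivation for mapping tags to a controlled vocabulary.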
EnTag objectives
• Project aims:
  – To investigate the combination of controlled and social tagging approaches to support resource discovery in repositories and digital collections
  – To investigate whether the use of an established controlled vocabulary can help move social tagging beyond personal bookmarking to aid resource discovery
  – To improve tagging
    • Relevance of tags, consistency, efficiency
  – To improve retrieval
    • Effectiveness (degree of match between user and system terminologies)
EnTag main approach
• Main focus:
  – To compare free tagging with no instructions with tagging using a combined system with guidance for users
• Two demonstrators:
  – Intute digital collection http://www.intute.ac.uk
    • The main focus of EnTag
    • Tagging by reader
    • Using a cohort of students to evaluate tools
  – STFC repository http://epubs.stfc.ac.uk/
    • Tagging by author
    • A more qualitative approach
The Intute study
http://www.intute.ac.uk
Intute metadata
Intute study: demonstrator
• 11,042 stripped records
• Politics
• Interfaces
  – Searching
  – Simple: free tagging
  – Enhanced: DDC / LCSH / Relative Index
Intute demonstrator: searching
Searching
Intute demonstrator: Enhanced
Tagging interfaces
Enhanced interface
Intute study: user study 1
• Research questions
  – Choice of tag
  – Retrieval implications
• Participants
  – 28 UK politics students
  – Little tagging experience
Intute study: user study 2
• Data collection
  – Logging
  – Three questionnaires
• Four tagging tasks
  – Two controlled, two free
  – Tag 15 documents in each task
Intute study: user study 3
• Hypothetical group project scenario
• Instructions
  – 5 to 10 min per document
  – Open document but focus
  – Try to consider enhanced suggestions where appropriate
Imagine that as part of one of your courses, you are asked to write a four-page essay on the topic of European integration, as a joint project in groups of four. The essay should critically discuss existing theories about the creation of the European Union and its institutions. Your lecturer has instructed you to look for resources in the EnTag system. Since you will be working together with three other students, you should tag the documents you retrieve with tags that would be useful to you but would also enable other students to find those documents in EnTag and understand from your tags what the documents are about.
Intute results: number of tags
• 7,568 tags in total
• 278 tags per person
• 94 + 751 documents tagged (controlled + free task)
• More in simple interface
• More in free task
Intute results: tag selection
• Simple interface
  – 91% freely assigned
• Enhanced interface
  – 71% freely assigned
  – 17% controlled tags
• Other features (both interfaces)
  – 8% other taggers’ tags
  – 2% main tag cloud
  – < 1% own tag
Intute results: browsing for tags
• Simple interface
  – 73% others’ tags
  – 17% main tag cloud
  – 10% own tag
• Enhanced interface
  – 74% controlled vocabulary
  – 18% others’ tags
Intute results: retrieval implications
• Versus metadata records
  – All tags: new access points for 36% of documents
  – Controlled tags: new access points for 69% of documents
• Search terms
  – More in tags than in (un)controlled keywords (2x / 3x)
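One way to read "new access points" is as tags that do not already occur among the terms of a document's metadata record. The sketch below computes the share of documents that gain at least one such tag. It assumes exact, case-insensitive matching, and the function names and data layout are hypothetical; the slides do not describe how the study performed the comparison.

```python
def new_access_points(tags, metadata_terms):
    """Return the tags that do not already appear among a record's metadata terms."""
    existing = {term.strip().lower() for term in metadata_terms}
    return sorted({tag.strip().lower() for tag in tags} - existing)


def share_with_new_access_points(records):
    """Fraction of records for which tagging adds at least one new term.

    `records` is a list of (tags, metadata_terms) pairs, one pair per document.
    """
    if not records:
        return 0.0
    gained = sum(1 for tags, terms in records if new_access_points(tags, terms))
    return gained / len(records)
```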
Intute results: post-questionnaires 1
• Post-task
  – Familiar / easy / satisfied / certain
  – Useful: own tags, DDC disambiguation pane, DDC suggestions
  – Not useful: main tag cloud, others’ names, hierarchical DDC pane
Intute results: post-questionnaires 2
• Post-study
  – Easy to learn and useful in real life
  – Simple
    ++ Simplicity, speed, freedom of choice
    -- No suggestions
  – Enhanced
    ++ Suggestions
    -- Inappropriate suggestions, cluttered interface, number of steps
Intute study: conclusions
• Controlled vocabulary suggestions are valued if appropriate
• Potential for additional access points
• Value of added consistency for information retrieval
The STFC ePubs study
• Institutional repository
• A study of the authors of papers
  – Smaller number - c.10-12
  – Regular depositors (> 10 papers each)
  – Subject experts
• Expectation that they would want their papers accurately tagged to support precision in recall
• A more qualitative study
• Used the ACM Computing Classification Scheme
  – Widely used in the community
STFC ePubs: study approach
• Questions
  – Do authors appreciate the purpose and use of tags?
  – Value of using a controlled vocabulary – does it lead to the creation of better tags?
  – Evaluating the user interface
• Supervised sessions
  – 40 minute observed trial
  – Logging statistics
  – Task worksheet
• Tagging own papers – a number of their choice
• Tag cloud, own tags, controlled vocabulary
STFC ePubs: study limitations
• A number of limitations of this approach:
  – Small sample size
  – Small number of papers tagged
  – Inappropriate controlled vocabulary
  – Computing and IT specialists too familiar with the concept of semantic annotation
  – Single, observed use of the tool – not real life
• Nevertheless, it was felt that the results of the study were illuminating and useful
STFC ePubs repository
http://epubs.stfc.ac.uk/
ePubs – a single repository entry
The Tagger
Browsing the Thesaurus
Browsing the Thesaurus
Picking terms
Global vs. Personal Tag Cloud
Picking terms from the Tag Cloud
STFC study findings: term choice
• Chose terms from the bottom of the hierarchy if possible
• Often preferred an appropriate term from the thesaurus over their own
  – Appreciated the better IR properties
• Would like definitions of terms to be available
• Would like automatic suggestions
• Very little use of the Tag Cloud
  – Presentation of cloud? Unfamiliarity? Limited population?
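The wish for automatic suggestions could be met in many ways. As a minimal sketch only, the function below proposes controlled-vocabulary terms that match what a depositor has typed so far, listing prefix matches before other substring matches. The `VOCABULARY` list and `suggest_terms` are hypothetical illustrations and do not reproduce the EnTag tagger or the actual ACM scheme.

```python
# Hypothetical excerpt of a controlled vocabulary, for illustration only.
VOCABULARY = [
    "Information retrieval",
    "Information storage",
    "Software engineering",
    "Database management",
]


def suggest_terms(typed, vocabulary, limit=5):
    """Suggest vocabulary terms matching the typed text, prefix matches first."""
    needle = typed.strip().lower()
    if not needle:
        return []
    prefix = [term for term in vocabulary if term.lower().startswith(needle)]
    inside = [
        term for term in vocabulary
        if needle in term.lower() and term not in prefix
    ]
    return (prefix + inside)[:limit]
```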
STFC study findings: user interface
• Tool generally (though not universally) thought to be easy to use
  – Some wanted it to be simpler
  – More suited for a library professional?
• Wanted more automation
• Tag cloud interface not right
• Would be willing to use
  – Especially if benefit in improved retrieval could be established
STFC study findings: preferred style
• Most depositors had a strong preference for the way they interact with the system
  – Free text taggers:
    • Enter tags, don’t really use the vocabulary
  – Thesaurus browsers:
    • Systematically browse controlled vocabulary
  – Thesaurus searchers:
    • Use the vocabulary search tool for preference
    • Only enter free-text term when they can’t find an appropriate term
STFC study findings: ACM scheme
• ACM Computing Classification Scheme
  – General recognition of this scheme
  – Used in journals to classify papers
    • Meant that there was acceptance of its authority
    • Willingness to use it
  – Feeling that it was abstract and academic
  – Feeling that it was not up to date and had much missing
Comparison of Intute and STFC EPubs results
• Different user groups and approach to studies
• Similarities between the Intute and STFC users could be identified:
  – Users appreciated the benefits of consistency and vocabulary control
    • Willingness to engage with the tagging system
  – Support for automated suggestions
  – Appropriateness of the controlled vocabulary is important
  – Tag cloud hard to use effectively
  – The user interface and interaction is important
Observations (1)
• Users are willing to add tags using a controlled vocabulary in conjunction with free text
  – By and large they understand why it is useful
    • Good search terms = good retrieval
  – But they need help
    • Automation, suggestions, good interfaces
    • Support for different styles of interaction
  – Produce “better” tags (?)
• Need for flexible and targeted controlled vocabularies
Observations (2)
• “Web 2.0” features need to be thought through very carefully
  – “tag clouds” not a success
  – Need much better structuring and presentation – integrated
• Interaction between tag clouds and structured vocabularies needs further investigation
  – Develop flexible user focussed vocabularies from tags
  – “structured folksonomy”
Conclusions
• Controlled vocabulary and tags complement each other
• Controlled vocabulary suggestions are valued if appropriate
• Future work:
  – Qualitative analysis
  – Enhancements
    • Controlled vocabulary
    • Auto suggestions
    • Interface
  – Motivation for tagging
    • Would users actually enhance tags in “real life”?
Acknowledgements
• I would like to thank the EnTag project team for providing the main content of this presentation
  – Koraljka Golub (UKOLN), Joint Conference on Digital Libraries (JCDL), Austin, TX, 15-19 June 2009
  – Brian Matthews (STFC), presentation given at ISKO UK Conference, London, 22-23 June 2009
  – Douglas Tudhope (University of Glamorgan)
http://www.ukoln.ac.uk/projects/enhanced-tagging/dissemination/
Thank You!