Post on 22-Dec-2015
April 2008 1
Uncorking the Varietals: Social Tagging, Folksonomies & Controlled Vocabularies
Margaret Maurer
Head, Catalog and Metadata
Kent State University Libraries and Media Services

In wine making - What is a Varietal?
A wine made from a single, named grape variety.
Cabernet Sauvignon wines are made from cabernet sauvignon grapes
Chardonnay wines are made from chardonnay grapes

In information seeking – on the Web or in the catalog
Access and identification systems may be controlled by librarians
  – controlled vocabularies
Access and identification systems may be dynamically generated by users
  – social tagging, folksonomies
These are different varieties of access and identification systems

This presentation
Controlled vocabularies
Social tagging
Folksonomies
My recommendations
First we’ll talk about the cabernet sauvignons – the controlled vocabs

Purpose of a controlled vocabulary
To create sets of objects
To serve as a bridge between the searcher’s language and the author’s language
To provide consistency
To improve precision and recall
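To make the last bullet concrete: precision is the share of retrieved records that are relevant, and recall is the share of relevant records that were retrieved. A minimal Python sketch, with made-up record IDs and a hypothetical helper name (`precision_recall`), might look like this:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of record identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 records retrieved, 3 of them relevant,
# out of 6 relevant records in the whole catalog.
retrieved = {"r1", "r2", "r3", "r9"}
relevant = {"r1", "r2", "r3", "r4", "r5", "r6"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.75 0.5
```

In this toy case three of the four retrieved records are relevant (precision 0.75), and three of the six relevant records were found (recall 0.5).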

Characteristics of a controlled vocabulary
Features a single, authorized form of heading
Often features a syndetic structure of cross-references
Based on the belief that successful use of the catalog depends on the quality of the individual records

The authority record structure
Records the standardized form
Ensures the gathering together of records via that access point
Enables standardized catalog records
Documents decisions taken
Records all other heading forms and provides links from them to the standardized form
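The structure described on this slide can be sketched as a small data model: one standardized form plus the other heading forms that link to it. The class and function names below are hypothetical, and the sample record is illustrative only rather than copied from a real authority file.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """One authorized heading plus the variant forms that point to it."""
    authorized: str
    variants: list = field(default_factory=list)


def build_lookup(records):
    """Map every form (authorized or variant) to its authorized heading."""
    lookup = {}
    for rec in records:
        lookup[rec.authorized.lower()] = rec.authorized
        for variant in rec.variants:
            lookup[variant.lower()] = rec.authorized
    return lookup


# Illustrative record only; the variant list is an assumption for this sketch.
records = [
    AuthorityRecord("Twain, Mark, 1835-1910",
                    ["Clemens, Samuel Langhorne", "Mark Twain"]),
]
lookup = build_lookup(records)
print(lookup["clemens, samuel langhorne"])  # Twain, Mark, 1835-1910
```

Whichever form a searcher types, the lookup routes it to the single standardized heading, which is what gathers the records together.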

Benefits of controlled vocabularies
Promotes discovery generally
Promotes discovery when the aboutness of something has nothing to do with words in the resource or its representation
  – Imaginative literature (genre headings)
  – Humanities
Promotes pre-coordinated displays that expand access
  – http://cinema.library.ucla.edu

Benefits when combined with keyword searching
Keywords hook into strings of terms most efficiently
Users can be routed by pre-coordinated strings

Controlled vocabularies support faceted catalogs
Encore
Evergreen
Endeca
WorldCat Local
All provide hyperlinks to authorized headings

Weaknesses of controlled vocabularies
The artificially controlled language is not necessarily natural language (Cookery, anyone?)
Subject searches are the most problematic for users
It may work better in theory than in practice
It is costly to perform necessary maintenance
Many administrators see the cost as outweighing the benefits

Library of Congress Subject Headings - LCSH
Has a long and well-documented history
Is commonly used
Is contained in millions of bibliographic records
Has strong institutional support from LC

More benefits of LCSH
The rich vocabulary covers most subjects
It imposes synonym and homograph control
There are machine-assisted authority control mechanisms
There is pre-coordination with LCC
The music subject heading system is well developed

Weaknesses of LCSH
It is a generalist taxonomy that can’t always provide needed granularity
Its terminology is not always current
It doesn’t allow for post-search coordination (it is pre-coordinated)
It suffers from LC collection bias

More weaknesses of LCSH
Training needed
  – Requires some orientation to use effectively
  – Is not always accurately applied by catalogers
Maintenance
  – It is difficult to maintain when changes occur

Authority control outside the catalog
Data critical mass tipping point?
  – Homogeneity of data in terms of subject matter
  – Requirements within data community’s users for specificity
  – Size
  – Computing power
Wikipedia’s “disambiguation”

ZoomInfo http://www.zoominfo.com/Default.aspx

What if we did open up our authority files to the web?
National Library of Australia’s People Australia Project
  – http://www.nla.gov.au/initiatives/peopleaustralia/
Wikipedia Persondata-Tool
  – http://www.ifla.org/IV/ifla73/papers/113-Danowski-en.pdf

Is ontology overrated?
Physicality requires ontologies for searching, but systems with hyperlinks do not
Browse versus search may eliminate the need for creating lists of authorized headings

Ontological classification
Works well when the domain to be organized is small, has formal categories, has stable entities, is restricted and has clear edges
Does not work well when the domain to be organized is large, has no formal categories, is unstable, is unrestricted and has no clear edges

Ontological classification
Works well when the participants are expert catalogers, authoritative sources of judgement, coordinated users or expert users
Does not work well when the participants are uncoordinated, amateur, naïve or non-authoritative

Now we talk about the Chardonnays – social tagging and folksonomies

What are tags?
Keywords or terms associated with or assigned to a piece of information
They enable keyword-based classification and search of information

Common Web sites that use tags include
Del.icio.us – Social bookmarking site
Flickr – Image tagging
LibraryThing
Gmail – Webmail
YouTube

Tags, and therefore social tags and folksonomies, are
Dynamic categorization systems
Often created on-the-fly
Chosen as relevant to the user – not to the creator, cataloger or researcher
A social activity (more on this later)
Hopefully one small step toward a more interactive and responsive library system

Social tags are
Non-hierarchical
A way to create links between items by the creation of sets of objects
A means of connecting with others interested in the same things

Way baaack in 2003…
Del.icio.us includes identity in its social bookmarking
Flickr includes tags
Lists of tags became a tool for serendipitous discovery (folksonomies)

Why is tagging so popular?
It is easy and enjoyable
It has a low cognitive cost
It is quick to do
It provides self and social feedback immediately

People tag things
To find them again
To get exposure and traffic
To voice their opinions
Incidentally as they perform other tasks
To take advantage of functionality built on top of a folksonomy
To play a game or earn points

Putting the social in tagging
Tags allow for social interaction because when we navigate by tags we are directly connecting with others
People tag for their own benefit

Don’t confuse tags with keywords or full-text searching
Keywords are behind the scenes, tags are often visibly aggregated for use and browsing
Keywords cannot be hyperlinked
Keywords imply searching, tags imply linking
Full-text searching is passive, tagging is active
Tagging is more about connecting items than categorizing them

What is a Folksonomy?
Folksonomy refers to an “emergent, grassroots taxonomy”
  – An aggregate collection of tags
  – A bottom-up categorical structure development
  – An emergent thesaurus
A term coined by Thomas Vander Wal

How do folksonomies work?
The searcher defines the access, but
The aggregation of the terms has public value
It’s a typically messy democratic approach
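One way to picture the aggregation: each tagger assigns tags for personal reasons, and the folksonomy is what emerges when those assignments are pooled. The sketch below assumes a made-up list of (user, item, tag) triples; none of the names come from a real tagging service.

```python
from collections import Counter, defaultdict

# Hypothetical (user, item, tag) assignments; all names are invented.
assignments = [
    ("ann", "book-1", "wine"),
    ("bob", "book-1", "wine"),
    ("bob", "book-1", "cooking"),
    ("cy", "book-2", "wine"),
    ("ann", "book-2", "travel"),
]

tag_counts = Counter()            # how often each tag is used overall
items_by_tag = defaultdict(set)   # the set of items each tag gathers

for user, item, tag in assignments:
    tag_counts[tag] += 1
    items_by_tag[tag].add(item)

print(tag_counts.most_common(1))      # [('wine', 3)]
print(sorted(items_by_tag["wine"]))   # ['book-1', 'book-2']
```

No single tagger set out to build a shared index, yet the pooled counts show which terms dominate and which items each term links together.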

What makes folksonomies popular?
Their dynamic nature works well with dynamic resources
They’re personal
They lower barriers to cooperation

Tagging and the consequent folksonomies work best when
It’s easy to do
It’s not commercial in nature
Taggers have ownership
Taggers are more likely to tag their own stuff than they are your stuff
It has been shown to work well on the Web

The unexpected development: terminological consensus
Collective action yields common terms
Stabilization may be caused by imitation and shared knowledge
The wisdom of the crowd

Is your tagging influenced by my tagging?
Of course it is!
People are beginning to tag in ways that make it easier for others to find like stuff
Shared meaning consequently evolves for tags
Most used tags become most visible

Strengths of folksonomies
Cost-effective way to organize the Internet
Social benefits
It’s inclusive
For many environments, they work well

Issues with meaning
They do not yield the level of clarity that controlled vocabularies do
Term ambiguity – words with multiple meanings
No synonym control

Issues with specificity
Variable specificity for related terms
Broadness of terms impacts precision – terms are often imprecise
Mixed perspectives

Issues with structure
Singular and plural forms create redundant headings
No guidelines for the use of compound headings, punctuation, word order
No scope notes
No cross references
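As an illustration of the singular/plural and punctuation problems, the sketch below collapses a few tag variants with a deliberately naive rule. The `normalize_tag` helper and its plural rule are assumptions made for this example; a real system would need proper stemming and would still mishandle many words.

```python
import re
from collections import Counter


def normalize_tag(tag):
    """Lowercase, drop punctuation, and apply a very naive plural rule."""
    tag = re.sub(r"[^a-z0-9 ]", "", tag.lower()).strip()
    # Naive assumption: a trailing "s" (but not "ss") marks a plural.
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag


# Made-up raw tags showing case, punctuation, and plural variation.
raw_tags = ["Wines", "wine", "WINE!", "glass", "cats", "cat"]
merged = Counter(normalize_tag(t) for t in raw_tags)
print(merged["wine"], merged["cat"], merged["glass"])  # 3 2 1
```

Six raw tags collapse to three headings here, but the same rule would wrongly turn "series" into "serie", which is one reason tidying tags automatically is harder than it looks.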

Issues with accuracy
Collective ‘wisdom’ of the tagging community
How does wrong information impact retrieval?
Conflicting cultural norms
Sometimes authority counts

“Spagging” and other problems
Opening doors to opinion tags
Tagging wars
“Spagging”
  – Spam tagging

Tidying up the tags…?
Lists of tagging norms have been developed
Are there programmatic solutions?
Users know they are looking at tags
By tidying, do we destroy the essence of why this works?
Do we realistically have the resources?

Recommendations
Don’t assume that one size fits all
Retain controlled vocabularies in the catalog
Explore ways to use controlled vocabularies to help organize the internet by re-purposing controlled vocabularies that already exist
Invite folksonomies to the party in the catalog to gain their benefits
Explore ways to combine the two systems

Recommendations
When you invite folksonomies into the catalog, do so strategically and carefully
  – Don’t put terms in the same index as controlled vocabularies
  – Find ways to associate terms applied across editions of works
  – Need for mediation, or at least observation
  – The crowd is not necessarily the best arbiter of specific terminology
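A minimal sketch of the first recommendation, assuming a toy in-memory catalog: user tags and authorized headings live in separate indexes, and a search reports matches from each source separately. All record IDs, headings, and function names here are hypothetical.

```python
from collections import defaultdict

# Two separate indexes, so user tags never mix with authorized headings.
subject_index = defaultdict(set)  # controlled vocabulary -> record IDs
tag_index = defaultdict(set)      # user-supplied tags    -> record IDs


def add_record(record_id, subjects=(), tags=()):
    """Index one record under its headings and, separately, its tags."""
    for heading in subjects:
        subject_index[heading.lower()].add(record_id)
    for tag in tags:
        tag_index[tag.lower()].add(record_id)


def search(term):
    """Return matches from each index separately, labeled by source."""
    key = term.lower()
    return {
        "subjects": sorted(subject_index.get(key, set())),
        "tags": sorted(tag_index.get(key, set())),
    }


# Hypothetical records and terms, for illustration only.
add_record("rec-1", subjects=["Cookery"], tags=["cooking", "recipes"])
add_record("rec-2", subjects=["Wine and wine making"], tags=["cooking"])
print(search("cooking"))  # {'subjects': [], 'tags': ['rec-1', 'rec-2']}
```

Keeping the sources labeled lets the catalog show users that a match came from a tag rather than an authorized heading, and leaves room for mediation before tags influence subject displays.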

Recommendations
Always remember why people tag
People tag things because they want to find them, not because they want others to find them
Be aware that this will impact the quality of the terms, and their frequency

Recommendations
Controlled vocabularies could be better utilized than they currently are
Subject structures are underutilized in the ILS
Controlled vocabularies that exist are not being exported to the Web
Well-connected terms foster discovery – let’s connect them.
Index those cross references where available