OCoLR 20041025 #53928015 OCLCR
Making data work harder
Lorcan Dempsey
OCLC Members Council, 17 May 2005
May 2005 Members’ Council
Web hub services | OWC | Presentation | Examples
A comprehensive discovery experience | Yes | |
Predictable, often immediate, fulfilment | In progress | |
Data works hard | Being improved | Yes | Curioser, FAST
Open to intermediate consumers | In progress | |
Co-created with users | Not yet | Yes | WorldCat Wiki
Making data work hard
The user experience: from search to rich browse
Capturing user contribution
Data mining
Context: value
Amazoogle: we can add significant value. We should be looking for organizational frameworks within which we can do this.
ROI: libraries invest in data but do not extract as much value from it as they might. Unless we release more value, the argument for this investment becomes weaker.
The user experience
Management intelligence
Top Sets for Fiction (Records)
Record Keys
1,296 defoe, daniel\1661 1731/robinson crusoe
1,267 carroll, lewis\1832 1898/alices adventures in wonderland
971 cervantes saavedra, miguel de\1547 1616/don quixote
828 stevenson, robert louis\1850 1894/treasure island
689 twain, mark\1835 1910/adventures of huckleberry finn
624 twain, mark\1835 1910/adventures of tom sawyer
618 swift, jonathan\1667 1745/gullivers travels
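The record keys above look like normalized author/dates/title strings, with the number of records that share each key. A minimal sketch of how such keys might be built and counted follows; the normalization rules (lowercasing, dropping apostrophes, replacing other punctuation with spaces) are inferred from the keys shown on the slide and are an assumption, not the documented OCLC algorithm. The function names `normalize`, `record_key`, and `top_sets` are hypothetical.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, drop apostrophes, and turn other punctuation into spaces.

    Assumption: ASCII-only input; commas are kept because the slide's keys
    keep them in author names.
    """
    text = text.lower().replace("'", "").replace("\u2019", "")
    text = re.sub(r"[^a-z0-9, ]+", " ", text)
    return " ".join(text.split())


def record_key(author, dates, title):
    """Build a key shaped like 'author\\dates/title', as on the slide."""
    return f"{normalize(author)}\\{normalize(dates)}/{normalize(title)}"


def top_sets(records, n=10):
    """Count records per key and return the n largest sets."""
    counts = Counter(record_key(*record) for record in records)
    return counts.most_common(n)


if __name__ == "__main__":
    # Illustrative records only, not WorldCat data.
    sample = [
        ("Defoe, Daniel", "1661-1731", "Robinson Crusoe"),
        ("DEFOE, Daniel", "1661-1731", "Robinson Crusoe."),
        ("Carroll, Lewis", "1832-1898", "Alice's Adventures in Wonderland"),
    ]
    for key, count in top_sets(sample):
        print(count, key)
```

Grouping on a normalized key is what lets variant records for the same work fall into one set, which is the basis for the counts listed on this slide.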
FRBR & FAST
FRBR
• ‘Interim FRBR’ in OWC
• FRBR in research projects: FictionFinder, Curioser, xISBN, Algorithm, Top 1000
• FRBR in FirstSearch – late this year
• Curioser ….
FAST
• Moving FAST headings into OpenWorldCat
• Experiment: mapping Yahoo! categories to FAST headings
Recognized value …
WIKI in WorldCat
Capture user input in structured ways
Extending Wiki’s utility
Wiki:
• supported markup: wikitext
• page editing: a single text block
• searches: full text searching
• collections managed: one per wiki
MetaWiki:
• supported markup: wikitext, structured data (e.g., MARC, METS, DC…)
• page editing: a single text block, or, field level
• searches: full text searching, fielded searching
• collections managed: one/multiple per MetaWiki
Built on top of standards (OAI, OpenURL, SRU)
Management intelligence: data mining
Data: bibliographic data, transaction logs …
Need to mine this data for intelligence that creates value for libraries and users
OCLC Research is undertaking a number of data-mining projects aimed at:
• Knowing more about the characteristics of library collections
• Creating interesting and useful data displays
• Generating intelligence to support library decision-making
Know Your Audience!
Implies: we can infer materials’ audience level from holdings patterns, which in turn can support:
• Collection management
• Readers’ advisory services
• Reference services
• Information retrieval
Holdings represent selection decisions by librarians … implies there are about 1 billion individual selection decisions in the WorldCat holdings file
Selections are made to serve the interests of a library’s target community …
• Associate target community (audience level) to particular library profiles - e.g., ARL, non-ARL academic, public, K-12 school …
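One way to picture the inference from holdings to audience level is a holdings-weighted average over library types. The sketch below is illustrative only: the numeric scores assigned to each library profile and the names `TYPE_SCORES` and `audience_level` are hypothetical, and the slide does not state how the actual study computes its measure.

```python
# Hypothetical scores on a 0-1 scale (low = juvenile, high = research level).
# These values are assumptions for illustration, not figures from the study.
TYPE_SCORES = {"k12": 0.1, "public": 0.3, "academic": 0.6, "arl": 0.9}


def audience_level(holdings_by_type):
    """Return the holdings-weighted mean score for one title.

    holdings_by_type maps a library profile (a key of TYPE_SCORES) to the
    number of libraries of that profile holding the title.
    Returns None when the title has no holdings.
    """
    total = sum(holdings_by_type.values())
    if total == 0:
        return None
    weighted = sum(TYPE_SCORES[kind] * count
                   for kind, count in holdings_by_type.items())
    return weighted / total


if __name__ == "__main__":
    # A title held mostly by research libraries scores near the top of the scale.
    print(audience_level({"arl": 3, "public": 1}))
```

A title held mainly by school libraries would score near the bottom of the scale, and one held mainly by ARL libraries near the top, which is the intuition behind treating each holding as a selection decision.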
Paper forthcoming!
The Implications of Google Libraries …
Potentially covers about one third of print books in WorldCat
~60 percent of total G5 books held by only one of the Google 5
Less than 5 percent held by all of the Google 5
~20 percent of total G5 print books out of copyright
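Overlap figures of this kind can be derived by counting, for each title, how many of the libraries hold it. The following is a minimal sketch under the assumption that each library's collection is available as a set of record identifiers; the function name `overlap_profile` and the sample data are hypothetical.

```python
from collections import Counter


def overlap_profile(collections):
    """Summarize overlap across a group of library collections.

    collections maps a library name to an iterable of record identifiers.
    Returns the number of distinct records plus the share held by exactly
    one library and the share held by every library in the group.
    """
    counts = Counter()
    for items in collections.values():
        counts.update(set(items))  # count each record once per library

    total = len(counts)
    if total == 0:
        return {"total": 0, "held_by_one": 0.0, "held_by_all": 0.0}

    only_one = sum(1 for c in counts.values() if c == 1)
    by_all = sum(1 for c in counts.values() if c == len(collections))
    return {
        "total": total,
        "held_by_one": only_one / total,
        "held_by_all": by_all / total,
    }


if __name__ == "__main__":
    # Illustrative identifiers only.
    print(overlap_profile({"A": {1, 2, 3}, "B": {3, 4}, "C": {3}}))
```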
Paper forthcoming …
“Last Copy”: Identifying At-Risk Materials
~23 million WorldCat records have only a single holding attached
Libraries need to know what portions of their collections are:
• Rare …
• Rare and valuable …
• “Last copy” (artifact and/or content)
Identification of rare materials essential intelligence in support of storage, digitization, and preservation decision-making
Data-mining study of Vanderbilt holdings in WorldCat:
• Identified 23,000 items held uniquely by Vanderbilt
• ~60% are print books
• ~60% produced prior to 1950; ~25% produced after 1970
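Finding uniquely held items amounts to grouping holdings by record and keeping the records with a single holding library. A minimal sketch follows, assuming holdings are available as (record identifier, library symbol) pairs; the function name `uniquely_held` and the symbols in the example are hypothetical.

```python
from collections import defaultdict


def uniquely_held(holdings):
    """Group single-holding records by the one library that holds them.

    holdings is an iterable of (record_id, library) pairs; repeated pairs
    are treated as one holding.
    Returns a dict mapping each library to its uniquely held record ids.
    """
    holders = defaultdict(set)
    for record_id, library in holdings:
        holders[record_id].add(library)

    unique = defaultdict(list)
    for record_id, libraries in holders.items():
        if len(libraries) == 1:
            unique[next(iter(libraries))].append(record_id)
    return dict(unique)


if __name__ == "__main__":
    # Illustrative data only; "VAN" and "XYZ" are made-up library symbols.
    sample = [("r1", "VAN"), ("r2", "VAN"), ("r2", "XYZ"), ("r3", "XYZ")]
    print(uniquely_held(sample))
```

The per-library lists can then be broken down by format or publication date to produce percentages like those reported for the Vanderbilt study.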
Paper forthcoming!
Looking at Library Print Book Collections … Systematically
32 million print books, representing 26 million distinct works
Half of print books published after 1977; more than 80% still “in copyright”
Rareness is common! Only a third of print books have more than five holdings; half have two or fewer
OCLC/Ithaka collaboration: Use WorldCat to characterize the “system-wide” print book collection – i.e., aggregate print book holdings in WorldCat
Intelligence of this kind can help establish digitization priorities and inform preservation planning
More information: http://www.oclc.org/research/presentations/lavoie/cni2005.ppt
Only about 120,000 works had both print book and e-book manifestations
Thank you!
OCLC Research:
http://www.oclc.org/research/