NCSU Libraries
Endeca and faceted browsing: Giving the user a useful catalog
Scott Warren, NCSU Libraries
South Carolina Library Association Annual Meeting, June 7, 2007
Outline
1. Problem and Context
2. Online searching, shopping, and examples
3. Demo
4. Faceted Navigation
5. Implementation Challenges
6. Facet Usage Statistics
7. Reflections
The Context
Online Catalogs
"Most integrated library systems, as they are currently configured and used, should be removed from public view."
- Roy Tennant, CDL
What is the problem?
• Existing catalogs are hard to use:
– known item searching works pretty well, but …
• users often do keyword searching and get large result sets returned in system sort order (last in, etc.)
• catalogs are unforgiving on spelling errors, stemming
• Authority searching completely mystifying
Catalog metadata is buried
• Subject headings are not leveraged in searching
– they should be browsed or linked from, not searched
• Data from the item record is not leveraged
– should be able to filter by item type, location, circulation status, popularity
Word of the Day for Saturday, May 5, 2007
• moil \MOYL\, intransitive verb:
• 1. To work with painful effort; to labor; to toil; to drudge.
• 2. To churn or swirl about continuously.
• 3. Toil; hard work; drudgery.
• 4. Confusion; turmoil.
What’s the big picture?
• Improve the quality of the library catalog user experience.
• Exploit our existing metadata infrastructure (make MARC work harder).
• Build a more flexible catalog tool that can be integrated with discovery tools of the future.
What is Endeca?
• Software company based in Cambridge, MA
• Search/information access technology provider for a number of major e-commerce websites
• Developers of the Endeca Information Access Platform
Why Endeca?
• Customized relevance ranking of results
• Better subject access by leveraging available metadata through facets
• Improved response time
• Enhanced natural language searching through spell correction, etc.
• Browse
A question
• “How is the new generation of library catalog being developed?”
• informed and enhanced by search technologies developed outside of the library
• based on how our users know how to search, not on how we want them to search
• What does search look like for our users?
Examples
Faceted Navigation on the Web
Facet
Value
Faceted Navigation in Libraries
Demonstration
Faceted Navigation
What is Faceted Navigation?
• Search and browse in a single interface
• Facets can vary in scope
– What is the item about?
– What kind of item is it?
– Where is it?
• Enables users to narrow results
• Macroscopic behavior of results set
• Clues to being on the right path
Origins of Facets
• 1930s Ranganathan
• Colon Classification
Cartesian Coordinates
Coordinate System
Axes: (x, y, z) = (Library, LCSH, Format)
Axis values shown: Library: Branch 1, Branch 2; LCSH: History, Art; Format: Book, DVD
Example points: (Branch 1, History, Book), (Branch 2, History, DVD)
• Multiple records could be associated with each coordinate point.
• Each point is associated with at least one record.
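The coordinate picture can be sketched in a few lines of Python. This is only an illustration: the records and their `id` values are made up, and each record is assumed to carry exactly one value per axis.

```python
from collections import defaultdict

# Hypothetical records; each has one value per axis for simplicity.
records = [
    {"id": 1, "library": "Branch 1", "lcsh": "History", "format": "Book"},
    {"id": 2, "library": "Branch 1", "lcsh": "History", "format": "Book"},
    {"id": 3, "library": "Branch 2", "lcsh": "History", "format": "DVD"},
    {"id": 4, "library": "Branch 2", "lcsh": "Art", "format": "Book"},
]

def build_points(records):
    """Group record ids by their (Library, LCSH, Format) coordinate."""
    points = defaultdict(list)
    for rec in records:
        coordinate = (rec["library"], rec["lcsh"], rec["format"])
        points[coordinate].append(rec["id"])
    return dict(points)

points = build_points(records)
for coordinate, ids in sorted(points.items()):
    print(coordinate, ids)
```

Only coordinates that hold at least one record exist as keys, and a coordinate such as (Branch 1, History, Book) can hold several records.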
Another way to think about it
• 11-dimensional lattice space
• All points associated with at least one item/record
• Records can be associated with > 1 point
• Keyword search selects subset of points with word(s) in record
• Facets shown are those dimensions corresponding to the points in that set (nonzero values).
• Choosing a facet value is equivalent to slicing through the multidimensional lattice on a plane along that facet value and reducing the lattice’s dimension by 1.
• Choose enough facets and you will get down to a few items (never a null set)
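The lattice description above can be sketched as follows. The catalog, the three facets, and the substring-based keyword match are all simplifying assumptions made for illustration; none of this is Endeca's actual engine.

```python
from collections import Counter

# Hypothetical catalog. A record may hold several values per facet,
# so it can sit on more than one lattice point.
CATALOG = [
    {"title": "Textile history of the South", "Library": ["Textiles"],
     "Format": ["Book"], "Subject: Topic": ["History", "Textile industry"]},
    {"title": "History of dyes", "Library": ["Textiles", "D.H. Hill"],
     "Format": ["Book"], "Subject: Topic": ["Dyes and dyeing", "History"]},
    {"title": "Weaving on film", "Library": ["D.H. Hill"],
     "Format": ["DVD"], "Subject: Topic": ["Weaving"]},
]
FACETS = ("Library", "Format", "Subject: Topic")

def keyword_search(records, word):
    """Select the records whose title contains the word (case-insensitive)."""
    needle = word.lower()
    return [rec for rec in records if needle in rec["title"].lower()]

def facet_counts(records):
    """Count records per facet value; values with zero records never appear."""
    return {
        facet: Counter(value for rec in records for value in set(rec[facet]))
        for facet in FACETS
    }

def refine(records, facet, value):
    """Keep only the records that carry the chosen facet value."""
    return [rec for rec in records if value in rec[facet]]

hits = keyword_search(CATALOG, "history")
counts = facet_counts(hits)
for facet, values in counts.items():
    print(facet, dict(values))

narrowed = refine(hits, "Library", "D.H. Hill")
print([rec["title"] for rec in narrowed])
```

Because only values with a nonzero count are offered, choosing any displayed value always leaves at least one record, which is the "never a null set" property.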
Implementation
Implementation Challenges
• Facet selection
• Interface design
• Data issues
Endeca at NCSU
• Endeca used to improve the discovery portion of the library catalog
• Endeca software indexes 1.6 million MARC records exported nightly from Sirsi Unicorn ILS
• Backend functions of ILS remain intact
Facets Implemented at NCSU
• Availability
• Author
• Library
• Format
• Language
• Browse: New
• LC Classification
• Subject: Topic
• Subject: Genre
• Subject: Region
• Subject: Era
Facet Selection
Interface Design
• Iterative approach using wireframes
• Eight major revisions in a four-month period
• Still lots of room for improvement
Technical Overview
• Endeca co-exists with SirsiDynix Unicorn ILS and Web2 online catalog
• Endeca handles keyword search
• Web2 handles authority search and detail page display
• Endeca indexes MARC records exported nightly from Unicorn
• Endeca = discovery portion of the ILS
Technical Overview
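Data flow: Raw MARC data -> NCSU exports and reformats -> Flat text files -> Data Foundry (parse text files) -> Indices -> MDEX Engine -> NCSU Web Application
HTTP (labeled twice in the diagram, at the MDEX Engine / NCSU Web Application end)
Group label: Information Access Platform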
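The "exports and reformats" step turns MARC records into flat text files. A minimal sketch of that idea follows. The `name|value` line format, the field names, and the already-parsed input records are all hypothetical; the slides do not show the real file layout. Only the MARC tags themselves (001 control number, 245 title, 650 topical subject) are standard.

```python
# Hypothetical, already-parsed MARC-like records (tag -> list of values).
RECORDS = [
    {"001": ["rec0001"], "245": ["A history of textiles"],
     "650": ["Textile industry", "History"]},
    {"001": ["rec0002"], "245": ["Dyes and dyeing"],
     "650": ["Dyes and dyeing"]},
]

# Hypothetical mapping from MARC tags to flat-file field names.
FIELD_MAP = {"001": "id", "245": "title", "650": "subject_topic"}

def flatten(record):
    """Return 'name|value' lines for one record, one line per value."""
    lines = []
    for tag, name in FIELD_MAP.items():
        for value in record.get(tag, []):
            lines.append(f"{name}|{value}")
    return lines

def export(records):
    """Join all records into one text document, blank line between records."""
    return "\n\n".join("\n".join(flatten(rec)) for rec in records) + "\n"

flat_text = export(RECORDS)
print(flat_text)
```

Repeating a field name once per value keeps multi-valued fields such as subjects intact, which is what lets them become facet values later.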
Technical Overview
Same data flow as on the previous slide, with the annotation: Offline - Nightly
Technical Overview
Same data flow as on the previous slide, with the annotation: Always Online
Implementation Team
• Seven member team
– 5 IT staff
– 1 cataloging librarian
– 1 reference librarian
• Timeline
– License / negotiation: Spring 2005
– Software acquisition: Summer 2005
– Implementation: Aug 2005 to Jan 2006
Data Issues
• ILS data with MARC-8 encoding => Text data with UTF-8 encoding
• Data consistency between ILS and Endeca catalog indexes (updates!)
• Data issues revealed by exposing metadata (ex: subject headings) in facets
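One way to watch for the consistency problem is to compare record ids and modification dates between the ILS export and the search index. The sketch below assumes both sides can be reduced to a `{record_id: last_modified}` mapping; the function name, ids, and dates are hypothetical.

```python
def diff_catalogs(ils_records, index_records):
    """Compare two {record_id: last_modified} mappings."""
    ils_ids, index_ids = set(ils_records), set(index_records)
    return {
        # In the ILS but not yet searchable.
        "missing_from_index": sorted(ils_ids - index_ids),
        # Still searchable but no longer in the ILS.
        "deleted_in_ils": sorted(index_ids - ils_ids),
        # Present on both sides with different modification dates.
        "out_of_date": sorted(
            rid for rid in ils_ids & index_ids
            if ils_records[rid] != index_records[rid]
        ),
    }

# Hypothetical sample data.
ils = {"rec0001": "2007-05-01", "rec0002": "2007-05-03", "rec0003": "2007-05-04"}
index = {"rec0001": "2007-05-01", "rec0002": "2007-05-02", "rec0009": "2007-04-30"}

report = diff_catalogs(ils, index)
print(report)
```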
Outcomes
Added search tools
• Automatic spell correction
• “Did you mean…” suggestions
• Automatic stemming
• Bookmark-ability
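A "Did you mean…" suggestion can be approximated with Python's standard `difflib` module. This is a stand-in technique for illustration, not Endeca's spell-correction algorithm; the vocabulary and the 0.75 cutoff are assumptions.

```python
import difflib

# Hypothetical vocabulary drawn from indexed terms.
VOCABULARY = ["textiles", "history", "weaving", "chemistry", "embroidery"]

def did_you_mean(term, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest known term, or None if the term is known or nothing is close."""
    term = term.lower()
    if term in vocabulary:
        return None
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for query in ("histroy", "textils", "history", "zzzz"):
    print(query, "->", did_you_mean(query))
```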
True browse
• Regain ability to browse catalog without entering any search terms
Search and Navigation
Search 67%
Navigation 8%
Search -> Navigation 25%
July 06 – Jan 07
Requests by Search Type
Search 67%
Navigation 8%
Search -> Navigation 25%
Includes Navigation 33%
July 06 – Jan 07
Navigation by Dimensions
Subject: Topic 26%
Availability 2%
LC Classification 21%
Format 10%
New 10%
Library 10%
Subject: Genre 6%
Subject: Era 2%
Language 3%
Subject: Region 4%
Author 6%
July 06 – Jan 07
Navigation by Dimension (most used)
Bar chart of requests per dimension, in ascending order of use: Availability, Subject: Era, Language, Subject: Region, Author, Subject: Genre, Library, New, Format, LC Classification, Subject: Topic
Axis: Requests, 0 to 140,000
July 06 – Jan 07
Navigation by Dimension (order of UI presentation)
Dimension            Requests
Author               32,650
Language             16,009
Subject: Era         12,257
Subject: Region      22,818
Library              54,476
Format               57,667
Subject: Genre       34,096
Subject: Topic       145,589
LC Classification    120,644
Availability         9,286
July 06 – Jan 07
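The percentage shares on the "Navigation by Dimensions" slide can be reproduced from these request counts. The sketch pairs each count with the dimension labels in the order they are listed, and takes the New count (56,286) from the dimension-value table, on the assumption that NEW is the only value of that dimension.

```python
# Request counts per dimension, July 06 to Jan 07, paired in listed order.
# The "New" figure is assumed to equal the single NEW value count.
REQUESTS = {
    "Author": 32650,
    "Language": 16009,
    "Subject: Era": 12257,
    "Subject: Region": 22818,
    "Library": 54476,
    "Format": 57667,
    "Subject: Genre": 34096,
    "Subject: Topic": 145589,
    "LC Classification": 120644,
    "Availability": 9286,
    "New": 56286,
}

total = sum(REQUESTS.values())
# Whole-percent share of navigation requests for each dimension.
shares = {dim: round(100 * n / total) for dim, n in REQUESTS.items()}

for dim, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{dim}: {pct}%")
```

Under those assumptions the rounded shares match the percentages shown on the earlier slide, for example 26% for Subject: Topic and 21% for LC Classification.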
Dimension            Value                          Requests
New                  NEW                            56,286
Format               Book                           16,188
LC Classification    Q - Science                    12,462
Library              Textiles                       11,160
Library              D.H. Hill                      11,060
Availability         Available                      9,276
Library              Online Resources               8,164
LC Classification    T - Technology                 8,052
Subject: Topic       History                        7,915
Format               Online                         7,858
LC Classification    P - Language and literature    7,005
LC Classification    H - Social Sciences            6,953
Language             English                        6,854
Subject: Region      United States                  6,298
Format               Journal, Magazine, or Serial   4,621
Usability testing
• 10 undergraduate students
– 5 with new Endeca-based interface
– 5 with old catalog interface
– Identical searching tasks
• Data collected
– Task difficulty/failure
– Task duration
Usability testing
Task Difficulty: Old Catalog
Easy 43%
Medium 12%
Hard 22%
Failed 23%
Task Difficulty: New Catalog
Easy 59%
Medium 12%
Hard 7%
Failed 22%
Usability testing
Average Task Duration: Old vs New Catalog
Bar chart of average duration for Tasks 1 to 10, Old Catalog vs New Catalog (time axis 00:00 to 03:36)
Usability testing
• For students, relevance ranking is key.
– July 06 – Jan 07: ~19% continued to page 2
• Faceted navigation is intuitive, even for students who don’t use it.
• Beware of library jargon
– “keyword anywhere”, “keyword in subject”
• User behavior is influenced by previous experience.
Reflections
• Faceted navigation enables new ways to discover resources
• Library collections often contain rich descriptive metadata… exploit this!
• We have much to learn about how to optimize these interfaces for the user
• Great for collection analysis
Analyzing collections
Textiles books divided by LCSH
carpets
embroidery
knitting
dressmaking
data processing
textile printing
chemistry
nonwoven fabrics
fashion
costume
patterns
plastics
hand weaving
cotton
clothing and dress
yarn
weaving
textile machinery
quality control
textile fibers, synthetic
cotton manufacture
management
clothing trade
textile fibers
testing
dyes and dyeing
history
polymers
textile fabrics
textile industry
Conclusions
Features Not Supported
• Work level aggregations / roll-up
• Customization / personalization
• Folksonomies / user contributed content
• Recommender functionality
• Shopping cart functionality
QuickSearch
Future directions
• Experiment with FRBR search/display through partnership with OCLC.
• Integrate catalog with other tools through web services:
– OpenSearch, RSS
• Enrich catalog through external web services:
– book jackets, reviews, etc.
– Amazon/OCLC
• Build modular shopping cart functionality.
• Use Endeca to index local collections.
Big Issues
• Benchmarking
– Just how much better is it? For whom? When is it not better?
• Natural Language
– Revolutionary War problem
• Experimenting
– What is the optimal interface?
– Power Search?
Big Wins
• Relevance ranking
• Speed / performance
• Locally managed presentation interface
• Persistent, parameter-based entry points
• Proving it could be done
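Persistent, parameter-based entry points mean that a search or browse state lives entirely in the URL, so it can be bookmarked or linked from elsewhere. A minimal sketch of the idea follows; the base URL and the `q` and `facet.*` parameter names are hypothetical and are not the catalog's real URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://catalog.example.edu/search"  # hypothetical address

def entry_point(keyword=None, **facets):
    """Build a bookmarkable URL from an optional keyword and facet selections."""
    params = []
    if keyword:
        params.append(("q", keyword))
    # Sort facet names so the same state always yields the same URL.
    for facet, value in sorted(facets.items()):
        params.append((f"facet.{facet}", value))
    return f"{BASE_URL}?{urlencode(params)}"

search_url = entry_point("dyeing", library="Textiles", format="Book")
browse_url = entry_point(library="Textiles")  # browse with no search terms
print(search_url)
print(browse_url)
```

Sorting the facet names keeps the URL stable for a given state, which is what makes it usable as a persistent link.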