WHAT TO READ NEXT? THREE STAGES OF DATA-DRIVEN BOOK DISCOVERY
TRANSCRIPT
METTE SKOV & TOINE BOGERS, AALBORG UNIVERSITY
AOIR PANEL ON ‘NETWORKED READING’, OCTOBER 7, 2016
BOOKS ARE NOT DEAD (THEY AREN’T EVEN SICK!)
• Books remain very popular!
  – Slow but steady increase in book sales to 2.7 billion books in the US in 2015
    • E-books make up 18.9% of that in the US
  – Total sales revenue: $29.2 billion in the US in 2015
    • E-book sales revenue was $5.3 billion
• So there is definitely a market & need for discovering (new) interesting books!
[Slide 3]
BOOK DISCOVERY IS NOT THAT EASY...
• Readers often struggle to discover new books with existing systems (search engines & recommender systems)
  – Information needs are highly complex
    • Topical match, complex relevance aspects, personal interests & preferences, context of use
  – Search engines and recommenders are ill-equipped to address such needs!
[Slide 4]
SOCIAL BOOK SEARCH LAB
• Series of workshops (2011–2016) with a shared data challenge using data from LibraryThing and Amazon
• Focus is on the design, development & evaluation of systems that can address complex book requests:
  1. Detecting complex book requests
  2. Analyzing book requests for relevance aspects
  3. Developing better algorithms for suggesting relevant books
  4. Exploring interactions with book search engines
[Slide 6]
BOOK REQUESTS
• Forum posts describing realistic book search requests
  – Book request narratives can touch upon many different aspects
    • Users search for topics, genres, authors, plots, etc.
    • Users want books that are engaging, funny, well-written, educational, etc.
    • Users have different preferences, knowledge, reading levels, etc.
  – Book discussion fora contain many such focused requests!
    • LibraryThing, Goodreads, …
[Slide 9]
OVERVIEW OF DATA SOURCES [Slide 10]
[Figure: diagram linking users, book requests, books, and suggestions — 2.8 million books, 944 annotated requests, 5,658 suggestions]
OVERVIEW OF DATA SOURCES [Slide 13]
[Figure: diagram linking users, user profiles, book requests, books, and suggestions — 94,000 user profiles, 2.8 million books, 944 annotated requests, 5,658 suggestions]
OVERVIEW OF DATA SOURCES [Slide 14]
[Figure: diagram of users, book requests, books, and suggestions, annotated with data types — books carry bibliographic metadata, curated metadata, and user-generated content (tags, reviews); users carry user profiles]
DETECTING BOOK REQUESTS
• How common are requests for book recommendations in the LibraryThing forums?
  – Currently 233,000+ threads in the LibraryThing forums
  – Annotated a random sample of 4,000 threads, of which 15.1% were book requests
  – This means there are potentially over 35,000 book requests on LibraryThing!
[Slide 17]
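The 35,000 figure follows from simple extrapolation, applying the sample's request rate to the full forum:

```python
# Quick check of the extrapolation above: apply the request rate observed
# in the 4,000-thread annotated sample to the whole forum.
threads_total = 233_000        # lower bound: "233,000+ threads"
sample_request_rate = 0.151    # 15.1% of the annotated sample were requests

estimated_requests = round(threads_total * sample_request_rate)
print(estimated_requests)  # → 35183, i.e. "over 35,000 book requests"
```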
DETECTING BOOK REQUESTS
• Can we detect such book requests automatically?
  – Initial experiments achieved an accuracy of 94.17% on a test set of 2,000 annotated book requests
  – Most predictive characteristics:
    • Words such as any, suggestions, looking, recommendations, thanks, anyone, read, books, and recommend
    • Number of sentences ending in a question mark
    • Degree of expertise of the LibraryThing users replying to the thread
    • Ratio of suggested books catalogued afterwards by the requester
[Slide 18]
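The two textual signals named above (cue words and question-mark sentences) could be sketched as a simple feature extractor. This is a hypothetical illustration, not the lab's actual classifier: the word list comes from the slide, but the rule and its threshold are invented, and the real experiments also used reply-based features that require full thread data.

```python
import re

# Cue words reported as predictive on the slide.
REQUEST_WORDS = {"any", "suggestions", "looking", "recommendations",
                 "thanks", "anyone", "read", "books", "recommend"}

def extract_features(post: str) -> dict:
    """Compute the two text-only features described on the slide."""
    tokens = re.findall(r"[a-z']+", post.lower())
    sentences = re.split(r"(?<=[.!?])\s+", post.strip())
    return {
        "request_word_count": sum(t in REQUEST_WORDS for t in tokens),
        "question_sentences": sum(s.endswith("?") for s in sentences),
    }

def looks_like_request(post: str, word_thresh: int = 2) -> bool:
    """Crude rule-of-thumb stand-in for the trained classifier."""
    f = extract_features(post)
    return f["request_word_count"] >= word_thresh or f["question_sentences"] > 0
```

In the reported experiments these signals fed a trained model; the rule above only shows why such surface features separate requests from ordinary discussion threads.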
ANALYZING BOOK REQUESTS
• Book requests contain many elements that could be mined to benefit search engines & recommender systems
  – Example: relevance aspects
    • What makes a suggested book relevant to the user?
    • Identified eight relevance aspects in book search requests (Reuter, 2007; Koolen et al., 2015)
[Slide 20]
ANALYZING BOOK REQUESTS
• Accessibility
  – Accessibility in terms of the language, length, or level of difficulty of a book.
• Content
  – Aspects such as topic, plot, genre, style, or comprehensiveness of a book.
• Engagement
  – Books that fit a particular mood or interest, books that are considered high quality, or books that provide a particular reading experience.
• Familiarity
  – Books that are similar to known books or related to a previous experience.
[Slide 21]
ANALYZING BOOK REQUESTS
• Known-item
  – Descriptions of known books with the sole purpose of identifying their title and/or author.
• Metadata
  – Books with a certain title or by a certain author, editor, illustrator, or publisher, in a particular format, or written or published in a certain year or period.
• Novelty
  – Books with content that is novel to the reader, or books that are unusual or quirky.
• Socio-cultural
  – Books related to the user’s socio-cultural background or values, books that are popular or obscure, or books that have had a particular cultural or social impact.
[Slide 22]
ANALYZING BOOK REQUESTS
• Distribution of relevance aspects & prediction success

Aspect           % of narratives (N = 944)   Precision
Accessibility    16.1%                       0.31
Content          73.9%                       0.63
Engagement       22.6%                       0.32
Familiarity      35.8%                       0.47
Known-item       21.4%                       0.71
Metadata         28.0%                       0.17
Novelty           3.6%                       0.03
Socio-cultural   14.2%                       0.23
[Slide 24]
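The precision column can be read as: of the narratives a predictor tags with an aspect, what fraction actually have it according to the gold annotation. A minimal sketch with invented gold/predicted label sets (the lab's actual annotations are not reproduced here):

```python
def aspect_precision(aspect, items):
    """Precision for one aspect: among narratives predicted to have it,
    the fraction whose gold annotation confirms it."""
    predicted = [gold for gold, pred in items if aspect in pred]
    if not predicted:
        return 0.0
    return sum(aspect in gold for gold in predicted) / len(predicted)

# Toy (gold, predicted) aspect sets for three narratives.
narratives = [
    ({"Content", "Familiarity"}, {"Content"}),
    ({"Content"},                {"Content", "Novelty"}),
    ({"Metadata"},               {"Content"}),
]
score = aspect_precision("Content", narratives)  # 2 of 3 predictions correct
```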
SUGGESTING RELEVANT BOOKS [Slide 26]
[Figure: diagram of users, book requests, books, and suggestions, with the books’ bibliographic metadata, curated metadata, and user-generated content (tags, reviews), and the users’ profiles]
SUGGESTING RELEVANT BOOKS [Slide 29]
• Different types of book metadata fields:
  – Bibliographic metadata: title, publisher, editorial, creator, series, award, character, place
  – Content: blurb, epigraph, first words, last words, quotation
  – Reviews: user reviews
  – Curated metadata: Dewey, thesaurus, index terms
  – Tags: tags
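One common way to use such field groups in retrieval is to flatten each book record into one searchable text blob per group, so the groups can be indexed and weighted separately. A minimal sketch; the field names are illustrative, not the actual Amazon/LibraryThing schema:

```python
# Illustrative field groups, mirroring the list above.
FIELD_GROUPS = {
    "bibliographic": ["title", "publisher", "editorial", "creator",
                      "series", "award", "character", "place"],
    "content": ["blurb", "epigraph", "first_words", "last_words", "quotation"],
    "reviews": ["reviews"],
    "curated": ["dewey", "thesaurus", "index_terms"],
    "tags": ["tags"],
}

def group_text(record, groups=FIELD_GROUPS):
    """Concatenate the fields present in a record into one blob per group,
    so each group can be indexed (and weighted) on its own."""
    return {g: " ".join(str(record[f]) for f in fs if f in record)
            for g, fs in groups.items()}

book = {"title": "Dune", "creator": "Frank Herbert", "tags": "sci-fi desert"}
fields = group_text(book)
```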
SUGGESTING RELEVANT BOOKS [Slide 30]

Set of metadata fields   NDCG@10
Bibliographic metadata   0.2015
Content                  0.0115
Curated metadata         0.0691
Tags                     0.2056
Reviews                  0.2832
All fields combined      0.3058
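NDCG@10 rewards rankings that place relevant books near the top of the first ten results. A minimal sketch of the metric, assuming binary relevance judgements (the lab's official evaluation may use graded judgements; this is only meant to illustrate how the scores above are computed):

```python
import math

def dcg(gains):
    """Discounted cumulative gain: the gain at rank i is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_relevance, k=10):
    """NDCG@k for one request: DCG of the system ranking over the ideal DCG."""
    ideal = dcg(sorted(ranked_relevance, reverse=True)[:k])
    return dcg(ranked_relevance[:k]) / ideal if ideal else 0.0
```

A perfect ranking scores 1.0; relevant books pushed down the list are discounted logarithmically, and the per-request scores are averaged over all 944 requests.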
AIM & APPROACH
• We aim to contribute to building dedicated book search and discovery services
• Our long-term goal is to investigate book search behaviour through a range of user tasks and interfaces:
  – How should the user interface combine professional, curated metadata and user-generated metadata?
  – Should the user interface adapt as the user progresses through their search task, and if so, how?
  – When do users prefer to browse, and when to search?
  – How can we best support different types of search tasks?
USER STUDY OF INTERACTIVE BOOK SEARCH BEHAVIOUR
Comparative user studies with 192 + 111 participants (2015 & 2016)

Study flow: Welcome → Informed Consent → Background → Pre-Task Information → Task → Post-Task Questions → Experience → Thank You
EXPERIMENTAL TASKS
Goal-oriented task: Imagine you are participating in an experiment on a desert island for one month. There will be no people, no TV, radio, or other distractions. The only things you are allowed to take with you are 5 books:
– On surviving on a desert island
– That will teach you something new
– Highly recommended by other users
– For fun
– About one of your personal hobbies or interests

Non-goal task: Imagine you are waiting to meet a friend in a coffee shop, pub, airport, or your office. While waiting, you come across this website and explore it, looking for any book that you find interesting, engaging, or relevant. Explore anything you wish until you are completely and utterly bored…
WHAT HAVE WE LEARNED SO FAR?
• Need for heterogeneous record information (user-generated content and curated, professional data)
• Multi-stage interface:
  – Longer search sessions
  – Fewer queries issued (more browsing)
  – No difference in the number of books added to the book bag
• Clear differences in search behaviour between the different types of tasks
(Gäde et al. 2015, 2016)
CONCLUSIONS
• Tens of thousands of information needs are going unmet
  – Just the tip of the iceberg?
  – Search engines and recommender systems are ill-equipped to deal with this!
[Slide 42]
OPEN QUESTIONS
• How (dis)similar are relevance aspects for books to those for other domains?
• How do relevance aspects influence the choice of algorithm(s) & data representation(s)?
• How does the combination of data from different sources (Amazon, LibraryThing, Library of Congress, British Library) affect the quality of the results and the UX?
• Decontextualized metadata: what happens when we mix metadata from different sources?
  – Example: reuse of recommendations or tags ‘out of context’
[Slide 43]
QUESTIONS?
“ALWAYS READ SOMETHING THAT WILL MAKE YOU LOOK GOOD IF YOU DIE IN THE MIDDLE OF IT.”
P.J. O’Rourke
REFERENCES
• Slide 3
  – Book sales statistics taken from https://www.statista.com/topics/1177/book-market/ and https://www.statista.com/topics/1474/e-books/; last visited October 1, 2016
• Slide 6
  – Official website of the Social Book Search Lab: http://social-book-search.humanities.uva.nl/
• Slide 20
  – Reuter, K. (2007). Assessing Aesthetic Relevance: Children’s Book Selection in a Digital Library. JASIST, 58(12), 1745–1763.
  – Koolen, M., Bogers, T., Van den Bosch, A., and Kamps, J. (2015). Looking for Books in Social Media: An Analysis of Complex Search Requests. Proceedings of ECIR 2015, Volume 9022 of Lecture Notes in Computer Science, pp. 184–196.
[Slide 46]
REFERENCES
• Slide 40
  – Gäde, M., Hall, M., Huurdeman, H., Kamps, J., Koolen, M., Skov, M., Toms, E. & Walsh, D. (2015). Overview of the SBS 2015 Interactive Track. Working Notes of CLEF 2015 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, vol. 1391.
  – Gäde, M., Hall, M., Huurdeman, H., Kamps, J., Koolen, M., Skov, M., Bogers, T. & Walsh, D. (2016). Overview of the SBS 2016 Interactive Track. Working Notes of CLEF 2016 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, vol. 1609.
[Slide 47]