
WHAT TO READ NEXT? THREE STAGES OF DATA-DRIVEN BOOK DISCOVERY

METTE SKOV & TOINE BOGERS, AALBORG UNIVERSITY

AOIR PANEL ON ‘NETWORKED READING’, OCTOBER 7, 2016

INTRODUCTION

PART 1

BOOKS ARE NOT DEAD (THEY AREN’T EVEN SICK!)

• Books remain very popular!

– Slow but steady increase in book sales, up to 2.7 billion books sold in the US in 2015

• E-books make up 18.9% of that in the US

– Total sales revenue: $29.2 billion in the US in 2015

• E-book sales revenue was $5.3 billion

• So there is definitely a market & need for discovering (new) interesting books!

3

BOOK DISCOVERY IS NOT THAT EASY...

• Readers often struggle to discover new books with existing systems (search engines & recommender systems)

– Information needs are highly complex

• Topical match, complex relevance aspects, personal interests & preferences, context of use

– Search engines and recommenders are ill-equipped to address such needs!

4

EXAMPLES OF COMPLEX INFORMATION NEEDS 5

SOCIAL BOOK SEARCH LAB

• Series of workshops (2011-2016) with a shared data challenge using data from LibraryThing and Amazon

• Focus is on the design, development & evaluation of systems that can address complex book requests

1. Detecting complex book requests

2. Analyzing book requests for relevance aspects

3. Developing better algorithms for suggesting relevant books

4. Exploring interactions with book search engines

6

OVERVIEW OF DATA SOURCES 7

[Diagram: the three core data sources and their links: 2.8 million books, 944 annotated book requests, and the users who posted them]

ANNOTATED LT TOPIC 8

[Screenshot of an annotated LibraryThing forum topic, with the group name, request title, and narrative highlighted]

BOOK REQUESTS

• Forum posts describing realistic book search requests

– Book request narratives can touch upon many different aspects

• Users search for topics, genres, authors, plots, etc.

• Users want books that are engaging, funny, well-written, educational, etc.

• Users have different preferences, knowledge, reading levels, etc.

– Book discussion fora contain many such focused requests!

• LibraryThing, Goodreads, …

9

OVERVIEW OF DATA SOURCES 10

[Diagram: as on slide 7, now extended with 5,658 suggestions linked to the book requests]

ANNOTATED LT TOPIC 11

[Screenshot of the annotated LibraryThing topic from slide 8, now also highlighting the recommended books alongside the group name, request title, and narrative]

12

[Screenshot continued: catalog additions, i.e. a forum suggestion added to the requester's catalog after the request was posted]

OVERVIEW OF DATA SOURCES 13

[Diagram: as on slide 10, now extended with 94,000 user profiles linked to the users]

OVERVIEW OF DATA SOURCES 14

[Diagram: the complete data picture: books, book requests, users, suggestions, and user profiles, with books enriched by bibliographic metadata, curated metadata, and user-generated content (tags and reviews); a sketch of these entities as code follows below]
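To make the entities in these diagrams concrete, here is a hedged sketch of the record types as plain Python dataclasses. All class and field names are illustrative; they are not the actual SBS data format:

```python
# Illustrative data model for the SBS sources (all names are made up).
from dataclasses import dataclass, field

@dataclass
class Book:
    book_id: str
    bibliographic: dict = field(default_factory=dict)   # title, creator, publisher, ...
    curated: dict = field(default_factory=dict)         # e.g. Dewey, index terms
    tags: list[str] = field(default_factory=list)       # user-generated
    reviews: list[str] = field(default_factory=list)    # user-generated

@dataclass
class BookRequest:
    thread_id: str
    group_name: str
    title: str
    narrative: str
    suggestions: list[str] = field(default_factory=list)  # suggested book_ids

@dataclass
class UserProfile:
    user_id: str
    catalog: list[str] = field(default_factory=list)      # book_ids cataloged by the user

# Example: one of the 944 annotated requests, carrying two suggestions.
request = BookRequest("t1", "Name that Book", "Dystopia with a hopeful ending?",
                      "Looking for something like The Road, but hopeful...",
                      suggestions=["b42", "b7"])
```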

DETECTING COMPLEX BOOK REQUESTS

PART 2

DETECTING BOOK REQUESTS

• How common are requests for book recommendations in the LibraryThing forums?

– Currently 233,000+ threads in the LibraryThing forums

– Annotated a random sample of 4,000 threads, of which 15.1% were book requests

– This means there are potentially over 35,000 book requests on LibraryThing (15.1% of 233,000+ threads ≈ 35,000)!

17

DETECTING BOOK REQUESTS

• Can we detect such book requests automatically?

– Initial experiments achieved an accuracy of 94.17% on a test set of 2,000 annotated book requests

– Most predictive characteristics (see the sketch below):

• Words such as any, suggestions, looking, recommendations, thanks, anyone, read, books, and recommend

• Number of sentences ending in a question mark

• Degree of expertise of the LibraryThing users replying to the thread

• Ratio of suggested books cataloged afterwards by the requester

18
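The slides do not say which classifier was used. As a purely illustrative sketch, here is one way such a detector could be built with scikit-learn, combining the cue-word and question-mark features named above (the reply-expertise and catalog-ratio features would require forum metadata and are omitted; all example data is invented):

```python
# Illustrative book-request detector (not the actual SBS Lab setup).
# Combines a bag-of-words over cue words with the question-mark feature.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

def question_marks(texts):
    # "No. of sentences ending in a question mark", approximated here by
    # counting question marks in each thread-opening post.
    return np.array([[t.count("?")] for t in texts])

detector = Pipeline([
    ("features", FeatureUnion([
        ("words", CountVectorizer(lowercase=True)),         # cue words: "recommend", "looking", ...
        ("questions", FunctionTransformer(question_marks)),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical labeled threads: 1 = book request, 0 = other discussion.
threads = [
    "Looking for suggestions: any good WWII fiction? Thanks!",
    "Just finished cataloging my cookbooks this weekend.",
]
labels = [1, 0]
detector.fit(threads, labels)

print(detector.predict(["Can anyone recommend books like The Martian?"]))
```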

ANALYZING BOOK REQUESTS

PART 3

ANALYZING BOOK REQUESTS

• Book requests contain many elements that could be mined to benefit search engines & recommender systems

– Example: relevance aspects

• What makes a suggested book relevant to the user?

• Identified eight relevance aspects in book search requests (Reuter, 2007; Koolen et al., 2015)

20

ANALYZING BOOK REQUESTS

• Accessibility

– Accessibility in terms of the language, length, or level of difficulty of a book.

• Content

– Aspects such as the topic, plot, genre, style, or comprehensiveness of a book.

• Engagement

– Books that fit a particular mood or interest, are considered high quality, or provide a particular reading experience.

• Familiarity

– Books that are similar to known books or related to a previous experience.

21

ANALYZING BOOK REQUESTS

• Known-item

– Descriptions of a known book with the sole purpose of identifying its title and/or author.

• Metadata

– Books with a certain title or by a certain author, editor, illustrator, or publisher, in a particular format, or written or published in a certain year or period.

• Novelty

– Books with content that is novel to the reader, or books that are unusual or quirky.

• Socio-cultural

– Books related to the user’s socio-cultural background or values, books that are popular or obscure, or books that have had a particular cultural or social impact.

22

ANALYZING BOOK REQUESTS

• Distribution of relevance aspects & prediction success (a hedged sketch of one possible prediction setup follows the table)

24

Aspect           % of narratives (N = 944)   Precision
Accessibility    16.1 %                      0.31
Content          73.9 %                      0.63
Engagement       22.6 %                      0.32
Familiarity      35.8 %                      0.47
Known-item       21.4 %                      0.71
Metadata         28.0 %                      0.17
Novelty           3.6 %                      0.03
Socio-cultural   14.2 %                      0.23
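The slides do not specify how the aspect predictions behind this table were produced. One plausible framing, sketched below under that assumption, treats it as multi-label classification with one binary classifier per aspect and reports per-aspect precision; all example data is invented:

```python
# Illustrative multi-label setup for relevance-aspect prediction
# (one binary classifier per aspect); not the actual SBS Lab method.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

ASPECTS = ["Accessibility", "Content", "Engagement", "Familiarity",
           "Known-item", "Metadata", "Novelty", "Socio-cultural"]

# Invented narratives; each row of y marks the aspects present in a narrative.
narratives = [
    "Looking for an easy-to-read introduction to astronomy",
    "Something like The Road, but with a more hopeful ending",
]
y = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],   # Accessibility + Content
    [0, 0, 0, 1, 0, 0, 0, 0],   # Familiarity
])

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
model.fit(narratives, y)

test_texts = ["Any short, beginner-friendly astronomy books?"]
y_true = np.array([[1, 1, 0, 0, 0, 0, 0, 0]])
y_pred = model.predict(test_texts)

# Per-aspect precision, mirroring the table above.
for i, aspect in enumerate(ASPECTS):
    p = precision_score(y_true[:, i], y_pred[:, i], zero_division=0)
    print(f"{aspect}: {p:.2f}")
```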

SUGGESTING RELEVANT BOOKS

PART 4

SUGGESTING RELEVANT BOOKS 26

[Diagram repeated from slide 14: books, book requests, users, suggestions, and user profiles, with bibliographic metadata, curated metadata, and user-generated content (tags and reviews)]

SUGGESTING RELEVANT BOOKS 29

• Different types of book metadata fields (grouped in code below):

– Bibliographic metadata: title, publisher, editorial, creator, series, award, character, place

– Content: blurb, epigraph, first words, last words, quotation

– Curated metadata: Dewey, thesaurus, index terms

– Tags: user-assigned tags

– Reviews: user reviews
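As a small illustration of how these field sets could be represented for the retrieval comparison on the next slide, the sketch below groups hypothetical record fields by set and flattens one set into indexable text. Field names follow the slide; the record layout is invented:

```python
# Illustrative grouping of book record fields into the field sets compared
# on the next slide; the record format is hypothetical.
FIELD_SETS = {
    "bibliographic": ["title", "publisher", "editorial", "creator",
                      "series", "award", "character", "place"],
    "content": ["blurb", "epigraph", "first_words", "last_words", "quotation"],
    "curated": ["dewey", "thesaurus", "index_terms"],
    "tags": ["tags"],
    "reviews": ["reviews"],
}

def text_for_field_set(record: dict, field_set: str) -> str:
    """Concatenate a record's values for one field set into indexable text."""
    parts = []
    for field_name in FIELD_SETS[field_set]:
        value = record.get(field_name, "")
        if isinstance(value, list):
            value = " ".join(value)
        parts.append(value)
    return " ".join(p for p in parts if p)

book = {"title": "The Martian", "tags": ["science fiction", "survival"],
        "reviews": ["Gripping, funny, and surprisingly educational."]}
print(text_for_field_set(book, "tags"))   # -> "science fiction survival"
```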

SUGGESTING RELEVANT BOOKS 30

Set of metadata fields    NDCG@10
Bibliographic metadata    0.2015
Content                   0.0115
Curated metadata          0.0691
Tags                      0.2056
Reviews                   0.2832
All fields combined       0.3058

(A minimal reference implementation of NDCG@10 follows below.)

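NDCG@10 rewards rankings that place relevant books near the top of the first ten results. For reference, a minimal self-contained implementation of one common formulation (binary relevance, log2 discount); the book IDs are invented:

```python
# Minimal NDCG@10 for a single request: binary relevance, log2 discount.
# One common formulation, shown for reference only.
import math

def dcg(gains):
    # Rank 0 gets discount log2(2) = 1, rank 1 gets log2(3), ...
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg_at_10(ranked_book_ids, relevant_ids):
    gains = [1 if b in relevant_ids else 0 for b in ranked_book_ids[:10]]
    ideal = [1] * min(len(relevant_ids), 10)   # best possible top-10 ranking
    return dcg(gains) / dcg(ideal) if ideal else 0.0

# Example: 2 of the top 10 suggested books are relevant (positions 2 and 5).
ranking = [f"b{i}" for i in range(1, 11)]
print(round(ndcg_at_10(ranking, {"b2", "b5"}), 4))   # ≈ 0.624
```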

ANALYZING BOOK SEARCH BEHAVIOR

PART 5

AIM & APPROACH

• We aim to contribute to building dedicated book search and discovery services

• Our long-term goal is to investigate book search behaviour through a range of user tasks and interfaces:

– How should the user interface combine professional, curated metadata and user-generated metadata?

– Should the user interface adapt itself as the user progresses through their search task, and if so, how?

– When do users prefer to browse, and when to search?

– How can we best support different types of search tasks?

USER STUDY OF INTERACTIVE BOOK SEARCH BEHAVIOUR

Comparative user studies with 192 + 111 participants (2015 & 2016)

[Study flow diagram: Welcome → Informed Consent → Background → Pre-Task Information → Task → Post-Task Questions → Experience → Thank You]

EXPERIMENTAL TASKS

Goal-oriented task: Imagine you are participating in an experiment on a desert island for one month. There will be no people, no TV, no radio, and no other distractions. The only things you are allowed to take with you are 5 books:

– On surviving on a desert island

– That will teach you something new

– Highly recommended by other users

– For fun

– About one of your personal hobbies or interests

Non-goal task: Imagine you are waiting to meet a friend in a coffee shop, a pub, the airport, or your office. While waiting, you come across this website and explore it, looking for any book that you find interesting, engaging, or relevant. Explore anything you wish until you are completely and utterly bored…

BASELINE INTERFACE

MULTISTAGE INTERFACE – BROWSE VIEW

MULTISTAGE INTERFACE – SEARCH VIEW

MULTISTAGE INTERFACE – BOOK-BAG VIEW

WHAT HAVE WE LEARNED SO FAR?

• Need for heterogeneous record information (user-generated and curated, professional data)

• Multistage interface:

– Longer search sessions

– Fewer queries issued (more browsing)

– No difference in the number of books added to the book bag

• Clear differences in search behaviour between the different types of tasks

(Gäde et al. 2015, 2016)

OPEN QUESTIONS

PART 6

CONCLUSIONS

• Tens of thousands of information needs are going unmet

– Just the tip of the iceberg?

– Search engines and recommender systems are ill-equipped to deal with this!

42

OPEN QUESTIONS

• How (dis)similar are relevance aspects for books to those in other domains?

• How do relevance aspects influence the choice of algorithm(s) & data representation(s)?

• How does combining data from different sources (Amazon, LibraryThing, Library of Congress, British Library) affect the quality of the results and the UX?

• Decontextualized metadata: what happens when we mix metadata from different sources?

– Example: reuse of recommendations or tags ‘out of context’

43

QUESTIONS?

“ALWAYS READ SOMETHING THAT WILL MAKE YOU LOOK GOOD IF YOU DIE IN THE MIDDLE OF IT.”

P.J. O’Rourke

REFERENCES

• Slide 3

– Book sales statistics taken from https://www.statista.com/topics/1177/book-market/ and https://www.statista.com/topics/1474/e-books/; last visited October 1, 2016

• Slide 6

– Official website of the Social Book Search Lab: http://social-book-search.humanities.uva.nl/

• Slide 20

– Reuter, K. (2007). Assessing Aesthetic Relevance: Children’s Book Selection in a Digital Library. JASIST, 58(12), 1745–1763.

– Koolen, M., Bogers, T., Van den Bosch, A., & Kamps, J. (2015). Looking for Books in Social Media: An Analysis of Complex Search Requests. In Proceedings of ECIR 2015, Lecture Notes in Computer Science, vol. 9022, pp. 184–196.

46

REFERENCES

• Slide 40

– Gäde, M., Hall, M., Huurdeman, H., Kamps, J., Koolen, M., Skov, M., Toms, E., & Walsh, D. (2015). Overview of the SBS 2015 Interactive Track. In Working Notes of CLEF 2015 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, vol. 1391.

– Gäde, M., Hall, M., Huurdeman, H., Kamps, J., Koolen, M., Skov, M., Bogers, T., & Walsh, D. (2016). Overview of the SBS 2016 Interactive Track. In Working Notes of CLEF 2016 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, vol. 1609.

47