Dec 2003, DRTC © C.Watters1
Users and the Digital Library
Carolyn Watters
Dalhousie University
Halifax, Canada
Dec 2003, DRTC© C.Watters2
Are Digital Libraries Libraries?
Phase I - Electronic access to traditional library
Phase II - Access to electronic documents
Phase III - Access to all information that is digital
– Communities of interest
– Personal
– Archival
– Current
– Editions
Phase IV - Semantic Web
Dec 2003, DRTC© C.Watters3
Ranganathan’s 5 Laws and the DL
Books are for use.
Every book its reader.
Every reader its book.
Save the time of the reader.
Library is a living organism.
Dec 2003, DRTC© C.Watters4
This Talk: Users of Digital Libraries
Task: what kind of information?
Motivation: why is the user interacting?
Interventions: what can we change?
– Query
– Matching
– Ranking
– Presentation
Conclusions: what is effective?
Dec 2003, DRTC© C.Watters5
Task: what kind of information?
Research
– I am doing a study on Mesopotamia
Search
– Who was the Prime Minister of India in 1962?
Refind
– What was the name of that movie I read the review of?
Browse
– I am interested in Post-modern Art
Dec 2003, DRTC© C.Watters6
Impact of Motivation & Task
Type I - Uses & Gratification Task
– Satisfaction is in the result
– News reading?
Type II - Ludic Task
– Part of the satisfaction is in the process
– News reading?
Dec 2003, DRTC© C.Watters7
Type I Tasks – Uses and Gratification tasks
The uses and gratification theoretical perspective is based on the assumption that the reader has some underlying goal, outside the reading itself, that reading satisfies.
Having the answer is the goal
May be intrinsically motivated (i.e. may just want to know)
Traditional information need
Dec 2003, DRTC© C.Watters8
Types of U&G Tasks
Research Queries
– History of Kerala
– Breadth important
– Multiple viewpoints/sources expected
Question and Answer
– Capital of Kerala
– Accuracy important
– Contradiction not expected
Dec 2003, DRTC© C.Watters9
Characteristics of U&G Info Tasks
Articulation of a specific query
Recognition of relevance of retrieved results
User has control of “satisficing” point
* Herbert Simon (1976)
Dec 2003, DRTC© C.Watters10
Can we make predictions for U&G tasks?
A study* of news reading suggested that a neural net model could learn enough about user preferences to predict what a user would read for this type of task
Use this to modify the query
Most useful for repeated topic queries
*Shepherd, M., C. Watters, and A. Marath. Adaptive Filtering for Electronic News. Proc. of HICSS’35. Jan 7-10, 2002.
Dec 2003, DRTC© C.Watters11
Type II Tasks - Ludic Tasks
The ludic theoretical perspective is based on the assumption that the reading itself brings satisfaction to the reader.
The process of getting information is satisfying
Web browsing
News reading
Dec 2003, DRTC© C.Watters12
Ludic Use characteristics
Individual path selection
– Users are happy to get different information for the same general search
Apperception
– Users choose information that fits their current knowledge
Habitualness
– Users perform these searches as part of their routine (rather than a specific one-time info need)
Dec 2003, DRTC© C.Watters13
Types of Ludic Tasks
Updating & community awareness tasks
– What is happening?
– Breadth important
– Unknown events are relevant
Search & Browse tasks
– What is new/odd/interesting?
– Novelty is important
– Answers not expected
– Community membership
– ***Exact query unknown
Dec 2003, DRTC© C.Watters14
Can we predict what will be read?
We cannot predict, based on past behavior, what a user will choose to read or the path the user will choose to follow
– Reading the news
– Browsing the web
User often multitasking
– 33% of web sessions involve more than 2 topics (Spink)
How can we help the user?
Dec 2003, DRTC© C.Watters15
Predicting for Ludic Tasks
“when you don’t know where you are going, any road will take you there!”
Lewis Carroll
Dec 2003, DRTC© C.Watters16
What do we have to work with?
An information need expressed as a query [user]
A user profile [user/system]
– Interests
– History
Document content (metadata, keywords, genre) [derived]
Link topology [author]
Usage patterns of documents [community]
Information about current task [user]
Dec 2003, DRTC© C.Watters17
What can we Manipulate?
I. Query
II. Matching
III. Ranking
IV. Presentation
Dec 2003, DRTC© C.Watters18
I. Improving the Query
Longer queries are better [BelK03]
Average query is 2.2 terms
Type of queries (Q&A / research / browse / refind)
Quality of query
Modification of query
Personal Profiles
Stereotypes
Dec 2003, DRTC© C.Watters19
Quality of Query
Purpose of query
Separation of doc set into relevant & nonrelevant documents
How well does the language of the query fit or not fit the language of the docs?
Clarity* = difference between the distribution of the terms used in the query and the distribution of all terms in the collection
*Croft, 2001
Dec 2003, DRTC© C.Watters20
Example of Clarity Values
Query = Apple
– General news DL: clarity value is low (apple pies, computers, city)
– Computer DL: clarity value is high
Query = Apple Computer Company
– General news DL: clarity is high
– Computer DL: same as just Apple
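The clarity idea can be made concrete with a small sketch. It assumes one common formalisation that the slides do not spell out: clarity as the Kullback-Leibler divergence between the term distribution of documents matching the query and the term distribution of the whole collection. The function names, the toy collection, and the simplification of treating "documents containing every query term" as the query model are all illustrative assumptions, not the method from the cited work.

```python
import math
from collections import Counter


def term_distribution(texts):
    """Relative frequency of each whitespace-separated term across the texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def clarity(query, collection):
    """KL divergence (in bits) between the query model and the collection model.

    Assumption: the query model is estimated from the documents that contain
    every query term. Returns 0.0 when nothing matches.
    """
    terms = set(query.lower().split())
    matching = [doc for doc in collection if terms <= set(doc.lower().split())]
    if not matching:
        return 0.0
    p_query = term_distribution(matching)
    p_coll = term_distribution(collection)
    # Matching documents are a subset of the collection, so p_coll[t] > 0.
    return sum(p * math.log2(p / p_coll[t]) for t, p in p_query.items())


# Hypothetical general-news collection in which "apple" is ambiguous.
NEWS_DL = [
    "apple pie recipe with cinnamon",
    "apple computer releases new laptop",
    "big apple city marathon results",
    "election results announced today",
    "football team wins final match",
    "apple computer laptop sales rise",
]

if __name__ == "__main__":
    for q in ("apple", "apple computer"):
        print(q, round(clarity(q, NEWS_DL), 3))
```

On this toy collection the two-term query scores higher than the bare "apple", which is the direction the general-news example describes.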
Dec 2003, DRTC© C.Watters21
Query Modification
Feedback: add terms from similar docs
– User relevance judgements
– More like this one
Profiles: add terms from history of user or user interests
Thesaurus: add related terms
Dec 2003, DRTC© C.Watters22
Rocchio Feedback
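In its standard textbook form, Rocchio feedback moves the query vector toward the centroid of documents judged relevant and away from the centroid of those judged nonrelevant: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant). The sketch below assumes sparse term-weight dictionaries. The default weights (1.0, 0.75, 0.15) are commonly quoted values rather than anything taken from the talk, and dropping non-positive weights is a common convention, not part of the formula.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector as a {term: weight} dict.

    query       -- {term: weight} for the original query
    relevant    -- list of {term: weight} docs judged relevant
    nonrelevant -- list of {term: weight} docs judged nonrelevant
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Add the scaled centroid of relevant docs, subtract that of nonrelevant docs.
    for docs, scale in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += scale * weight / len(docs)
    # Keep only terms that still carry positive weight.
    return {term: w for term, w in updated.items() if w > 0}


if __name__ == "__main__":
    new_query = rocchio(
        {"railway": 1.0},
        relevant=[{"railway": 1.0, "steam": 1.0}, {"steam": 1.0, "engine": 1.0}],
        nonrelevant=[{"railway": 1.0, "model": 1.0}],
    )
    print(new_query)
```

Terms that appear only in nonrelevant documents end up with negative weight and are dropped, so the expanded query gains "steam" and "engine" but not "model".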
Dec 2003, DRTC© C.Watters23
II. Improving the Matching
Using Profiles
– [Joe: railway, steam, engine, track, Europe]
Using metadata
– Mapping to controlled vocabularies
– Add semantics to documents
– [D1: <loc>Europe</loc> <topic>Train Transportation</topic>]
Genre
– Reports
– Home pages
– Shopping pages
– News
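One way to picture profile- and metadata-aware matching is a score that counts query terms found in a document's text or metadata, plus a smaller bonus for profile terms found there. This is only a sketch under stated assumptions: the scoring rule, the `profile_weight` parameter, and the sample document text are hypothetical, while the profile and metadata values reuse the Joe and D1 examples above.

```python
def match_score(query_terms, profile_terms, doc_terms, doc_metadata, profile_weight=0.5):
    """Count query-term hits plus down-weighted profile-term hits.

    doc_metadata is a {field: value} dict; its values are split into words
    and searched alongside the document's own terms.
    """
    vocabulary = {term.lower() for term in doc_terms}
    for value in doc_metadata.values():
        vocabulary.update(value.lower().split())
    query_hits = sum(1 for term in query_terms if term.lower() in vocabulary)
    profile_hits = sum(1 for term in profile_terms if term.lower() in vocabulary)
    return query_hits + profile_weight * profile_hits


JOE = ["railway", "steam", "engine", "track", "Europe"]
D1_TERMS = ["steam", "locomotive", "timetable"]  # hypothetical body text
D1_METADATA = {"loc": "Europe", "topic": "Train Transportation"}

if __name__ == "__main__":
    print(match_score(["train"], JOE, D1_TERMS, D1_METADATA))
    print(match_score(["train"], [], D1_TERMS, D1_METADATA))
```

The query "train" matches D1 only through its topic metadata, and Joe's profile adds weight through "steam" and "Europe", so the same document scores higher for Joe than for a user with no profile.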
Dec 2003, DRTC© C.Watters24
III. Improving ranking
User Profiles
Location
Stereotypes
Dec 2003, DRTC© C.Watters25
User Profiles: Recommender Systems
To “recommend” an existing path through an information space that best satisfies the user’s information need.
Depends on goal of search!
Dec 2003, DRTC© C.Watters26
Dec 2003, DRTC© C.Watters27
Community & Personal Profiles
Community profiles
– Provide stability
– Common interests
Personal Profiles
– Long term interests vs short term
– Multiple interests
– Topic drift
Dec 2003, DRTC© C.Watters28
Effect for Browsing Tasks
Browsing behavior was idiosyncratic and personal
System could not learn over time
*Shepherd, M., C.Watters, and R.Kaushik. Lessons from Reading E-News for Browsing the Web: The Roles of Genre and Task. Proc. of the Annual Conference of the American Society for Information Science and Technology. November 2001, Washington.
Dec 2003, DRTC© C.Watters29
Profiles & Tasks
Repeated queries based on user profile for sustained interests work well
Feedback mechanisms such as Rocchio work well for sustained interests
BUT not for idiosyncratic queries or browsing
Dec 2003, DRTC© C.Watters30
Example of Alternate Ranking: Geospatial Queries
What is here?
– What can I do in Bangalore?
Where is there x?
– Where can I ski in Eastern Canada or USA?
Dec 2003, DRTC© C.Watters31
Where can I go skiing?
Dec 2003, DRTC© C.Watters32
What do we have to work with
Geoparsing
– Recognizing geographical context
– Country, river, feature, etc.
Geocoding
– Assigning longitude and latitude values
– End, middle, etc.
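Once hits carry latitude and longitude, one plausible alternate ranking is by distance from the user. The sketch below uses the standard haversine great-circle formula. Ranking purely by distance, the function names, and the sample hits are assumptions for illustration, and the city coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rank_by_distance(user_location, geocoded_hits):
    """Sort (label, (lat, lon)) hits from nearest to farthest."""
    return sorted(geocoded_hits, key=lambda hit: haversine_km(user_location, hit[1]))


HALIFAX = (44.65, -63.57)  # approximate
HITS = [  # hypothetical hits, geocoded to approximate city coordinates
    ("hit near Vancouver", (49.28, -123.12)),
    ("hit near Montreal", (45.50, -73.57)),
    ("hit near Boston", (42.36, -71.06)),
]

if __name__ == "__main__":
    for label, location in rank_by_distance(HALIFAX, HITS):
        print(label, round(haversine_km(HALIFAX, location)), "km")
```

A real system would presumably combine distance with topical relevance rather than sort on distance alone.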
Dec 2003, DRTC© C.Watters33
Dec 2003, DRTC© C.Watters34
User Stereotypes for Medical News*
Select medical items from online news sources
Categorize medical items by intended audience
*Watters, C., W. Zheng, and E. Milios. 2002. Filtering for Medical News Items. Proc. of the American Society for Information Science and Technology Conference. Nov. 15-19, Pittsburgh.
Dec 2003, DRTC© C.Watters35
Profiling by Keyword
Customized vocabulary in MeSH
Pruned non-medical branches
– 31,441 headings
Assigned weights to these headings
Category       Weight   Example headings
Nonmedical     1        Building
Lay medical    2        Body, stomach
General med    3        Anatomy, umbilicus
Specific med   4        Inguinal canal
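To illustrate how weighted headings could drive audience categorization, the sketch below averages the weights of the medical headings found in an item. The averaging rule and the function name are assumptions made for illustration, not the method of the cited study, and the weight table contains only the example headings listed above.

```python
import re

# Example headings and weights from the table above (1 = nonmedical ... 4 = specific medical).
HEADING_WEIGHTS = {
    "building": 1,
    "body": 2,
    "stomach": 2,
    "anatomy": 3,
    "umbilicus": 3,
    "inguinal canal": 4,
}


def audience_level(text, weights=HEADING_WEIGHTS):
    """Mean weight of the medical headings (weight > 1) found in the text.

    Returns None when no medical heading occurs, i.e. the item is not
    treated as medical. Matching is whole-word and case-insensitive.
    """
    lowered = text.lower()
    matched = [
        weight
        for heading, weight in weights.items()
        if re.search(r"\b" + re.escape(heading) + r"\b", lowered)
    ]
    medical = [weight for weight in matched if weight > 1]
    if not medical:
        return None
    return sum(medical) / len(medical)


if __name__ == "__main__":
    print(audience_level("Stomach pain and the body"))
    print(audience_level("Repair of the inguinal canal near the umbilicus"))
    print(audience_level("New building opens downtown"))
```

A higher score suggests an item written for a more specialized audience; thresholds for mapping scores to stereotypes would need to be chosen separately.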
Dec 2003, DRTC© C.Watters36
Prototype
Dec 2003, DRTC© C.Watters37
IV. Improving Presentation
1. Genre
2. Views
3. Transformations for device
Dec 2003, DRTC© C.Watters38
Hit List (linear)
Dec 2003, DRTC© C.Watters39
News (broadsheet)
Dec 2003, DRTC© C.Watters40
Album
Preferred broadsheet for browsing
*Shneiderman’s PhotoLib
Dec 2003, DRTC© C.Watters41
Report
Dec 2003, DRTC© C.Watters42
Effect of Genre on Different Devices
Dec 2003, DRTC© C.Watters43
Views
*R.Furuta
Dec 2003, DRTC© C.Watters44
Million list server messages
*websom.hut.fi/websom
Dec 2003, DRTC© C.Watters45
Summary of Interventions
Improve the query
Improve matching
Improve the ranking
Improve presentation
[Diagram labels: user, profiles, query, match, rank, present]
Dec 2003, DRTC© C.Watters46
Reality Check
Not so easy for real tasks and real users
– Topic shifts
– Topic relevancy/importance judgements
– *Multitasking
– *Task Detection
– *Getting Personal Preferences
Dec 2003, DRTC© C.Watters47
Can we help?
YES
– Queries can be modified for U&G tasks
– Use of community profiles for Ludic tasks
– Alternative ranking schemes can be based on type of task
– Match presentation to contents and use
Dec 2003, DRTC© C.Watters48
Roles of the DL
Conflicting Goals
– Archival
– Access
– Derivative uses (ex. Animation)
– Digital rights management
Dec 2003, DRTC© C.Watters49
Conclusions for Digital Libraries
User is an integral part of the system
User’s immediate task and motivation matter
Community interests matter
Ranganathan’s rule 2 for 2003
– Every user his or her information.
Dec 2003, DRTC© C.Watters50
Thank you!
More information:
My web site
– www.cs.dal.ca/~watters
Web Information Filtering Lab
– www.cs.dal.ca/wifl