

User Interfaces for Information Access

Marti Hearst, IS202, Fall 2006

 

 


Outline

• What do people search for?
• Why is supporting search difficult?
• What works in search interfaces?
• When does search result grouping work?
• What about social tagging and search?


What Do People Search For? (And How?)


A Spectrum of Information Needs

• What is the typical height of a giraffe? (Question/Answer)

• What are some good ideas for landscaping my client’s yard? (Browse and Build)

• What are some promising untried treatments for Raynaud’s disease? (Text Data Mining)


Questions and Answers

• What is the height of a typical giraffe?

– The result can be a simple answer, extracted from existing web pages.

– Can specify with keywords or a natural language query

• However, most search engines are not set up to handle questions properly.

• Get different results using a question vs. keywords


Classifying Queries
• Query logs only indirectly indicate a user’s needs
• One set of keywords can mean various different things
– “barcelona”
– “dog pregnancy”
– “taxes”
• Idea: pair up query logs with which search result the user clicked on.
– “taxes” followed by a click on tax forms
– Study performed on AltaVista logs
– Author noted afterwards that Yahoo logs appear to have a different query balance.

Rose & Levinson, Understanding User Goals in Web Search, Proceedings of WWW’04


Classifying Web Queries (search goal, description, example queries)

• Navigational (~25%): Go to a specific known website
– aloha airlines; duke university hospital; kelly blue book

• Informational (~62%): Learn something about a topic, get an answer to a question, get advice or ideas, get a list of links
– 2004 election data; baseball death and injury; why are metals shiny; phone card

• Resource (~13%): Obtain a resource (e.g., software, music, knitting patterns)
– kazaa lite; live camera in L.A.

Rose & Levinson, Understanding User Goals in Web Search, Proceedings of WWW’04


What are people looking for?
Check out Google Answers


Why is Supporting Search Difficult?


• Everything is fair game
• Abstractions are difficult to represent
• The vocabulary disconnect
• Users’ lack of understanding of the technology


Everything is Fair Game

• The scope of what people search for is all of human knowledge and experience.

– Other interfaces are more constrained (word processing, formulas, etc.)

• Interfaces must accommodate human differences in:

– Knowledge / life experience
– Cultural background and expectations
– Reading / scanning ability and style
– Methods of looking for things (pilers vs. filers)


Abstractions Are Hard to Represent

• Text describes abstract concepts
– Difficult to show the contents of text in a visual or compact manner

• Exercise:
– How would you show the preamble of the US Constitution visually?
– How would you show the contents of Joyce’s Ulysses visually? How would you distinguish it from Homer’s The Odyssey or McCourt’s Angela’s Ashes?

• The point: it is difficult to show text without using text


Vocabulary Disconnect

• If you ask a set of people to describe a set of things, there is little overlap in the words they use.


Lack of Technical Understanding

• Most people don’t understand the underlying methods by which search engines work.


People Don’t Understand Search Technology

A study of 100 randomly chosen people found:
• 14% never type a URL directly into the address bar
• Several tried to use the address bar, but did it wrong
– Put spaces between words
– Combinations of dots and spaces
– “nursing spectrum.com”, “consumer reports.com”
• Several use the search form with no spaces
– “plumber’slocal9”, “capitalhealthsystem”
• People do not understand the use of quotes
– Only 16% use quotes
– Of these, some use them incorrectly:
» Around all of the words, making results too restrictive
» “lactose intolerance –recipies” (here the – excludes the recipes)
• People don’t make use of “advanced” features
– Only 1 used “find in page”
– Only 2 used Google cache

Hargittai, Classifying and Coding Online Actions, Social Science Computer Review 22(2), 2004, pp. 210-227.


People Don’t Understand Search Technology

Without appropriate explanations, most of the 14 participants had strong misconceptions about:

• ANDing vs. ORing of search terms
– Some assumed an ANDing search engine indexed a smaller collection; most had no explanation at all

• Empty results for the query “to be or not to be”
– 9 of 14 could not offer an explanation that remotely resembled stop word removal

• Term order variation: “boat fire” vs. “fire boat”
– Only 5 out of 14 expected different results
– Understanding was vague, e.g.:
» “Lycos separates the two words and searches for the meaning, instead of what’re your looking for. Google understands the meaning of the phrase.”

Muramatsu & Pratt, “Transparent Queries: Investigating Users’ Mental Models of Search Engines,” SIGIR 2001.
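To make the three misconceptions concrete, here is a minimal Python sketch of Boolean AND vs. OR matching, stop word removal, and phrase matching. The stop-word list, the toy documents, and all function names are hypothetical; each engine of the time used its own lists and ranking, so this illustrates the mechanisms only, not any particular engine.

```python
# Hypothetical tiny stop-word list; real engines used larger, engine-specific lists.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "of", "on"}

DOCS = [
    "a fire broke out on the boat",
    "the fire boat sprayed water",
    "hamlet asks to be or not to be",
]

def tokenize(text):
    return text.lower().split()

def preprocess(query):
    """Drop stop words from the query."""
    return [t for t in tokenize(query) if t not in STOP_WORDS]

def and_search(query, docs=DOCS):
    """Boolean AND: a document matches only if it contains every remaining term."""
    terms = preprocess(query)
    if not terms:  # every word was a stop word, so nothing is left to match
        return []
    return [d for d in docs if all(t in tokenize(d) for t in terms)]

def or_search(query, docs=DOCS):
    """Boolean OR: a document matches if it contains any remaining term."""
    terms = preprocess(query)
    return [d for d in docs if any(t in tokenize(d) for t in terms)]

def phrase_search(query, docs=DOCS):
    """Exact phrase match: word order matters, unlike the bag-of-words searches."""
    return [d for d in docs if f" {query.lower()} " in f" {d.lower()} "]
```

With these toy documents, the all-stop-word query returns nothing, the two word orders return identical bag-of-words results, and only the phrase search distinguishes “fire boat” from “boat fire”. OR returns a superset of AND: `or_search("fire water")` matches two documents where `and_search("fire water")` matches one.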


What Works in Search Interfaces?


What Works for Search Interfaces?
• Query term highlighting
– in results listings
– in retrieved documents

• Sorting of search results according to important criteria (date, author)

• Grouping of results according to well-organized category labels (see Flamenco)

• DWIM only if highly accurate:
– Spelling correction/suggestions
– Simple relevance feedback (more-like-this)
– Certain types of term expansion

• So far: not really visualization

Hearst et al: Finding the Flow in Web Site Search, CACM 45(9), 2002.


Highlighting Query Terms

• Boldface or color
• Adjacency of terms with relevant context is a useful cue.
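A minimal Python sketch of query term highlighting in a result snippet. It assumes whole-word, case-insensitive matching and HTML bold tags as the highlight style; both are illustrative choices, since the slide only says “boldface or color”. Longer terms are tried first so that a multi-word term is highlighted as one unit.

```python
import re

def highlight(text, query_terms, open_tag="<b>", close_tag="</b>"):
    """Wrap whole-word, case-insensitive matches of any query term in tags."""
    # Longest terms first, so "search engines" wins over "search".
    terms = sorted({t for t in query_terms if t}, key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    # Keep the original casing of each match.
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)

snippet = "Search engines vary; researchers study web search logs."
print(highlight(snippet, ["search"]))
# <b>Search</b> engines vary; researchers study web <b>search</b> logs.
```

The word-boundary anchors keep “search” from lighting up inside “researchers”, which would otherwise be a distracting false cue.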


Highlighted query term hits using Google toolbar

(Screenshot: the query terms US, Blackout, PGA, and Microsoft highlighted in a retrieved page; callouts read “found!” and “don’t know”.)


Small Details Matter

• UIs for search especially require great care in small details
– In part due to the text-heavy nature of search
– A tension between more information and introducing clutter

• How and where to place things is important
– People tend to scan or skim
– Only a small percentage reads instructions


Small Details Matter

• UIs for search especially require endless tiny adjustments
– In part due to the text-heavy nature of search

• Example:
– In an earlier version of the Google spellchecker, people didn’t always see the suggested correction
• Used a long sentence at the top of the page: “If you didn’t find what you were looking for …”
• People complained they got results, but not the right results.
• In reality, the spellchecker had suggested an appropriate correction.

Interview with Marissa Mayer by Mark Hurst: http://www.goodexperience.com/columns/02/1015google.html


Small Details Matter

• The fix:
– Analyzed logs, saw people didn’t see the correction:
• clicked on first search result,
• didn’t find what they were looking for (came right back to the search page),
• scrolled to the bottom of the page, did not find anything,
• and then complained directly to Google
– Solution was to repeat the spelling suggestion at the bottom of the page.

• More adjustments:
– The message is shorter, and different on the top vs. the bottom

Interview with Marissa Mayer by Mark Hurst: http://www.goodexperience.com/columns/02/1015google.html


Using DWIM

• DWIM – Do What I Mean
– Refers to systems that try to be “smart” by guessing users’ unstated intentions or desires
• Examples:
– Automatically augment my query with related terms
– Automatically suggest spelling corrections
– Automatically load web pages that might be relevant to the one I’m looking at
– Automatically file my incoming email into folders
– Pop up a paperclip that tells me what kind of help I need.

• THE CRITICAL POINT:
– Users love DWIM when it really works
– Users DESPISE it when it doesn’t


DWIM that Works
• Amazon’s “customers who bought X also bought Y”
– And many other recommendation-related features
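Amazon’s actual method is not described here. A simple way to illustrate the “customers who bought X also bought Y” idea is to count how often pairs of items appear in the same order; this Python sketch does exactly that. The orders and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def co_purchase_counts(orders):
    """For each item, count how often every other item appears in the same order."""
    counts = {}
    for order in orders:
        for x, y in permutations(set(order), 2):
            counts.setdefault(x, Counter())[y] += 1
    return counts

def also_bought(counts, item, k=3):
    """Top-k items most often bought together with `item` (empty if unseen)."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]

ORDERS = [  # hypothetical order history
    ["tent", "stove"],
    ["tent", "stove", "lantern"],
    ["tent", "lantern"],
    ["stove", "fuel"],
]
counts = co_purchase_counts(ORDERS)
print(also_bought(counts, "stove", 1))  # ['tent']
```

Part of why this kind of DWIM tends to work is that it only suggests; a weak recommendation costs the user a glance, not a wrong action.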


DWIM Example: Spelling Correction/Suggestion

• Google’s spelling suggestions are highly accurate

• But this wasn’t always the case.
– Google introduced a version that wasn’t very accurate. People hated it. They pulled it. (According to a talk by Marissa Mayer of Google.)
– Later they introduced a version that worked well. People love it.

• But don’t get too pushy.
– For a while, if the user got very few results, the page was automatically replaced with the results of the spelling correction
– This was removed, presumably due to negative responses

Information from a talk by Marissa Mayer of Google
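Google’s spelling method is not described in these slides. As a stand-in, this Python sketch suggests the closest known word by Levenshtein edit distance, breaking ties by word frequency, and declines to suggest anything beyond a small distance, in the spirit of “DWIM only if highly accurate”. It returns a suggestion for the interface to display rather than silently replacing the query. The vocabulary counts and the `max_dist` threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def suggest(word, vocab_counts, max_dist=2):
    """Return a 'Did you mean' candidate, or None if there is no confident one."""
    if word in vocab_counts or not vocab_counts:
        return None
    # Smallest distance first; among equals, prefer the more frequent word.
    dist, _, best = min((edit_distance(word, w), -count, w)
                        for w, count in vocab_counts.items())
    return best if dist <= max_dist else None

VOCAB = {"giraffe": 120, "giraffes": 40, "gift": 300}  # hypothetical counts
print(suggest("girafe", VOCAB))  # giraffe
```

Returning `None` for distant words is the “don’t get too pushy” part: no suggestion is better than a wrong one.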


Query Reformulation

• Query reformulation:
– After receiving unsuccessful results, users modify their initial queries and submit new ones intended to more accurately reflect their information needs.

• Web search logs show that searchers often reformulate their queries
– A study of 985 Web user search sessions found:
• 33% went beyond the first query
• Of these, ~35% retained the same number of terms, while 19% had 1 more term and 16% had 1 fewer

Use of query reformulation and relevance feedback by Excite users, Spink, Jansen & Ozmutlu, Internet Research 10(4), 2001
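The breakdown above classifies each reformulation by how the number of query terms changed. A minimal Python sketch of that tally over consecutive query pairs, assuming whitespace tokenization; the query pairs are invented examples, not data from the study.

```python
from collections import Counter

def term_count_change(prev_query, next_query):
    """Number of terms added (+) or removed (-) between consecutive queries."""
    return len(next_query.split()) - len(prev_query.split())

def summarize_reformulations(pairs):
    """Percentage of reformulations for each change in query length."""
    counts = Counter(term_count_change(a, b) for a, b in pairs)
    total = sum(counts.values())
    return {delta: round(100 * n / total, 1) for delta, n in sorted(counts.items())}

PAIRS = [  # hypothetical (previous query, next query) pairs
    ("taxes", "tax forms"),
    ("giraffe height", "giraffe average height"),
    ("aloha air", "aloha airlines"),
    ("barcelona hotels cheap", "barcelona hotels"),
]
print(summarize_reformulations(PAIRS))  # {-1: 25.0, 0: 25.0, 1: 50.0}
```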


Query Reformulation

• Many studies show that if users engage in relevance feedback, the results are much better.
– In one study, participants did 17-34% better with RF
– They also did better if they could see the RF terms than if the system did it automatically (DWIM)

• But the effort required for doing so is usually a roadblock.

Koenemann & Belkin, A Case for Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness, CHI’96
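The slide does not say which feedback method the Koenemann & Belkin system used. The Rocchio formula is a standard way to compute relevance feedback and is shown here as a Python sketch: the new query is α·q, plus β times the centroid of the relevant documents, minus γ times the centroid of the non-relevant ones. The weights α=1.0, β=0.75, γ=0.15 are common textbook defaults, not values from the study, and the term vectors are hypothetical.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an expanded query vector (term -> weight), Rocchio style.

    The query and each document are dicts mapping terms to weights.
    """
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    # Add the relevant centroid, subtract the non-relevant centroid.
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                new_query[term] += coef * w / len(docs)
    # Drop non-positive weights; sort by weight so an interface could show
    # the user which terms were added instead of applying them invisibly.
    return dict(sorted(((t, w) for t, w in new_query.items() if w > 0),
                       key=lambda tw: -tw[1]))

q = rocchio({"jaguar": 1.0},
            relevant=[{"jaguar": 1.0, "car": 1.0}, {"jaguar": 1.0, "engine": 1.0}],
            nonrelevant=[{"jaguar": 1.0, "cat": 1.0}])
print(list(q))  # ['jaguar', 'car', 'engine']
```

Exposing the returned terms, rather than running the expanded query silently, corresponds to the condition in which participants could see the RF terms.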


Query Reformulation

• What happens when web search engines suggest new terms?

• Web log analysis study using the Prisma term suggestion system:

Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.


Query Reformulation Study

• Feedback terms were displayed to 15,133 user sessions.
– Of these, 14% used at least one feedback term
– For all sessions, 56% involved some degree of query refinement
– Within this refinement subset, use of the feedback terms was 25%
– By user ID, ~16% of users applied feedback terms at least once on any given day

• Looking at a 2-week session of feedback users:
– Of the 2,318 users who used it once, 47% used it again in the same 2-week window.

• Comparison was also done to a baseline group that was not offered feedback terms.
– Both groups ended up making a page-selection click at the same rate.

Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.
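The 25% figure can be checked against the two session-level percentages. This short Python calculation assumes that every session which used a feedback term is also counted as a refinement session; the slide does not state this explicitly.

```python
sessions = 15_133      # sessions shown feedback terms
used_feedback = 0.14   # share of all sessions using at least one feedback term
refined = 0.56         # share of all sessions with any query refinement

print(round(sessions * used_feedback))    # 2119 sessions used a feedback term
print(round(sessions * refined))          # 8474 sessions involved refinement
print(round(used_feedback / refined, 2))  # 0.25, matching the reported 25%
```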


Query Reformulation Study

Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.


Query Reformulation Study
• Other observations
– Users prefer refinements that contain the initial query terms
– Presentation order does have an influence on term uptake

Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.


Query Reformulation Study
• Types of refinements

Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.


Prognosis: Query Reformulation

• Researchers have always known it can be helpful, but the methods proposed for user interaction were too cumbersome
– Had to select many documents and then do feedback
– Had to select many terms
– Was based on statistical ranking methods, which are hard for people to understand

• Indirect relevance feedback can improve general ranking (see section on social search)


Usability of Grouping Search Results


The Need to Group

• Interviews with lay users often reveal a desire for better organization of retrieval results

• Useful for suggesting where to look next
– People prefer links over generating search terms*
– But only when the links are for what they want

*Ojakaar and Spool, Users Continue After Category Links, UIETips Newsletter, http://world.std.com/~uieweb/Articles/, 2001


Conundrum
• Everyone complains about disorganized search results.

• There are lots of ideas about how to organize them.

• Why don’t the major search engines do so?

• What works; what doesn’t?


Different Types of Grouping
• Clusters (document similarity based; polythetic)
– Scatter/Gather, Grouper
• Keyword sharing (any doc with the keyword is in the group; monothetic)
– Findex, DisCover
• Single category
– SWISH, DynaCat
• Multiple (faceted) categories
– Flamenco, Phlat/Stuff I’ve Seen

Monothetic vs. polythetic distinction after Kummamuru et al., 2004


Clusters

• Fully automated
• Potential benefits:
– Find the main themes in a set of documents
• Potentially useful if the user wants a summary of the main themes in the subcollection
• Potentially harmful if the user is interested in less dominant themes
– More flexible than pre-defined categories
• There may be important themes that have not been anticipated
– Disambiguate ambiguous terms
• e.g., ACL
– Clustering retrieved documents tends to group those relevant to a complex query together

Hearst, Pedersen, Revisiting the Cluster Hypothesis, SIGIR’96


Categories

• Human-created
– But often automatically assigned to items

• Arranged in hierarchy, network, or facets
– Can assign multiple categories to items
– Or place items within categories

• Usually restricted to a fixed set
– So help reduce the space of concepts

• Intended to be readily understandable
– To those who know the underlying domain
– Provide a novice with a conceptual structure

• There are many already made up!


Cluster-based Grouping

Document Self-similarity (Polythetic)


Scatter/Gather Clustering

• Developed at PARC in the late ’80s/early ’90s
• Top-down approach
– Start with k seeds (documents) to represent k clusters
– Each document is assigned to the cluster with the most similar seed
• To choose the seeds:
– Cluster in a bottom-up manner
– Hierarchical agglomerative clustering

• Can recluster a cluster to produce a hierarchy of clusters

Cutting, Karger, Pedersen, Tukey, Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections, SIGIR 1992
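A minimal Python sketch of the seed-assignment step, assuming bag-of-words term-frequency vectors and cosine similarity. In Scatter/Gather the seeds come from bottom-up (agglomerative) clustering; here they are simply given, and the seeds and documents are hypothetical examples echoing the AUTO/CAR queries.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def assign_to_seeds(docs, seeds):
    """Put each document in the cluster whose seed it is most similar to."""
    seed_vecs = [tf_vector(s) for s in seeds]
    clusters = {i: [] for i in range(len(seeds))}
    for doc in docs:
        v = tf_vector(doc)
        best = max(range(len(seeds)), key=lambda i: cosine(v, seed_vecs[i]))
        clusters[best].append(doc)
    return clusters

SEEDS = ["electric car battery", "car safety air bag"]  # hypothetical seeds
DOCS = [
    "battery technology for electric car",
    "air bag safety study",
    "california electric battery",
]
clusters = assign_to_seeds(DOCS, SEEDS)
print(clusters)
```

Reclustering a cluster, as the slide describes, amounts to choosing new seeds within one cluster’s documents and running the same assignment again.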


The Scatter/Gather Interface


Two Queries: Two Clusterings

The main differences are the clusters that are central to the query.

Query: AUTO, CAR, ELECTRIC
8 control drive accident …
25 battery california technology …
48 import j. rate honda toyota …
16 export international unit japan
3 service employee automatic …

Query: AUTO, CAR, SAFETY
6 control inventory integrate …
10 investigation washington …
12 study fuel death bag air …
61 sale domestic truck import …
11 japan export defect unite …


Scatter/Gather Evaluations

• Can be slower to find answers than linear search!

• Difficult to understand the clusters.
• There is no consistency in results.
• However, the clusters do group relevant documents together.
• Participants noted that the clusters are useful for eliminating irrelevant groups.


Visualizing Clustering Results

• Use clustering to map the entire huge multidimensional document space into a huge number of small clusters.

• Use dimension reduction and then project these onto a 2D/3D graphical representation


Clustering Visualizations

image from Wise et al 95


Clustering Visualizations

(image from Wise et al 95)


Kohonen Feature Maps (Lin 92, Chen et al. 97)


Are visual clusters useful?

• Four Clustering Visualization Usability Studies
• Conclusions:
– Huge 2D maps may be an inappropriate focus for information retrieval
• cannot see what the documents are about
• space is difficult to browse for IR purposes
• (tough to visualize abstract concepts)
– Perhaps more suited for pattern discovery and gist-like overviews.


Term-based Grouping

Single Term from Document Characterizes the Group

(Monothetic)


Findex, Kaki & Aula

• Two innovations:
– Used a very simple method to create the groupings, so that it is not opaque to users
• Based on frequent keywords
• Doc is in category if it contains the keyword
• Allows docs to appear in multiple categories
– Did a naturalistic, longitudinal study of use
• Analyzed the results in interesting ways

Kaki and Aula: “Findex: Search Result Categories Help Users when Document Ranking Fails”, CHI ’05
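Findex’s simplicity is easy to reproduce in outline. This Python sketch picks the most frequent single-word keywords across the results and puts every result containing a keyword into that keyword’s category, so a result can land in several categories. Excluding given words (such as the query terms) is an assumption, and Findex itself also produced multi-word labels, which this sketch does not attempt; the default of 15 categories follows the slides. The result strings are hypothetical.

```python
from collections import Counter

def keyword_categories(results, num_categories=15, exclude=frozenset()):
    """Group results under their most frequent keywords (Findex-style sketch)."""
    doc_freq = Counter()
    for text in results:
        # Count each keyword once per result (document frequency).
        doc_freq.update(set(text.lower().split()) - set(exclude))
    keywords = [w for w, _ in doc_freq.most_common(num_categories)]
    # A result belongs to every category whose keyword it contains.
    return {w: [r for r in results if w in r.lower().split()] for w in keywords}

RESULTS = [  # hypothetical result titles for the query "jaguar"
    "jaguar car dealer",
    "jaguar cat habitat",
    "jaguar car parts",
    "big cat habitat",
    "used car prices",
]
cats = keyword_categories(RESULTS, num_categories=3, exclude={"jaguar"})
print(cats["car"])  # ['jaguar car dealer', 'jaguar car parts', 'used car prices']
```

Because membership is just “contains the keyword”, a user can predict exactly why a result appears under a label, which is the transparency the slide credits.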


Study Design

• 16 academics
– 8F, 8M
– No CS
– Frequent searchers

• 2 months of use
• Special log
– 3099 queries issued
– 3232 results accessed

• Two questionnaires (at start and end)
• Google as search engine; rank order retained


After 1 Week After 2 Months


Kaki & Aula Key Findings (all significant)

• Category use takes almost 2 times longer than linear
– First doc selected in 24.4 sec vs 13.7 sec

• No difference in average number of docs opened per search (1.05 vs. 1.04)

• However, when categories are used, users select >1 doc in 28.6% of the queries (vs 13.6%)

• Number of searches with 0 result selections is lower when the categories are used

• Median position of selected doc when:
– Using categories: 22 (sd=38)
– Just ranking: 2 (sd=8.6)


Kaki & Aula Key Findings

• Category selections
– 1915 category selections in 817 searches
– Used in 26.4% of the searches
– During the last 4 weeks of use, the proportion of searches using categories stayed above the average (27-39%)
– When categories used, selected 2.3 cats on average
– Labels of selected cats used 1.9 words on average (average in general was 1.4 words)
– Out of 15 cats (default):
• First quartile at 2nd cat
• Median at 5th
• Third quartile at 9th


Kaki & Aula Survey Results

• Subjective opinions improved over time
• Realization that categories are useful only some of the time
• Freeform responses indicate that categories are useful when queries are vague, broad, or ambiguous
• Second survey indicated that people felt that their search habits began to change
– Consider query formulation less than before (27%)
– Use less precise search terms (45%)
– Use less time to evaluate results (36%)
– Use categories for evaluating results (82%)


Conclusions from Kaki Study

• Simplicity of category assignment made groupings understandable
– (my view, not stated by them)

• Keyword-based categories:
– Are beneficial when result ranking fails
– Find results lower in the ranking
– Reduce empty results
– May make it easier to access multiple results
– Availability changed user querying behavior


Highlight, Wu et al.

• Select terms from document summaries, organize into a subsumption hierarchy.

• Highlight the terms in the retrieved documents.

Wu, Shankar, Chen, Finding More Useful Information Faster from Web Search Results, CIKM ’03
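The slide does not spell out how Wu et al. build the subsumption hierarchy. A common co-occurrence test, from Sanderson and Croft’s work on deriving concept hierarchies, says term x subsumes term y if x appears in at least 80% of the documents containing y, while y does not appear in every document containing x. This Python sketch implements that test as a stand-in, not as Wu et al.’s exact method; the documents, terms, and threshold are illustrative.

```python
def subsumption_pairs(docs, terms, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child.

    x subsumes y if P(x | y) >= threshold and P(y | x) < 1, with
    probabilities estimated from document co-occurrence.
    """
    occurs = {t: {i for i, d in enumerate(docs) if t in d.lower().split()}
              for t in terms}
    pairs = []
    for x in terms:
        for y in terms:
            if x == y or not occurs[x] or not occurs[y]:
                continue
            both = len(occurs[x] & occurs[y])
            if both / len(occurs[y]) >= threshold and both / len(occurs[x]) < 1:
                pairs.append((x, y))
    return pairs

DOCS = [  # hypothetical result summaries
    "music jazz piano",
    "music jazz saxophone",
    "music rock guitar",
    "music classical piano",
]
print(subsumption_pairs(DOCS, ["music", "jazz", "piano"]))
# [('music', 'jazz'), ('music', 'piano')]
```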


Category-based Grouping

General Categories
Domain-Specific Categories


SWISH, Chen & Dumais

• 18 participants, 30 tasks, within subjects
• Significant (and large, 50%) timing differences in favor of categories
• For queries where the results are in the first page, the differences are much smaller.
• Strong subjective preferences.
• BUT: the baseline was quite poor and the queries were very cooked.
– Very small category set (13 categories)
– Subhierarchy wasn’t used.

Chen, Dumais, Bringing Order to the Web: Automatically Categorizing Search Results, CHI 2000


Test queries, Chen & Dumais

Information Need → Pre-specified Query
• giants ridge ski resort → “giants”
• book about “numerical recipes” for computer software → “recipes”
• information about Indian motorcycles → “Indian”
• the home page for the band “They Might be Giants” → “giants”
• the home page for the basketball team, the Washington Wizards → “washington”

Chen, Dumais, Bringing Order to the Web: Automatically Categorizing Search Results, CHI 2000


Revisiting the Study, Dumais, Cutrell, Chen

• This follow-up study reveals that the baseline had been unfairly weakened.

• The speedup isn’t so much from the category labels as from the grouping of similar documents.

• For queries where the answer is in the first page, the category effects are not very strong.


DynaCat, Pratt, Hearst, and Fagan.

• Medical domain
• Decide on important question types in advance
– What are the adverse effects of drug D?
– What is the prognosis for treatment T?

• Make use of MeSH categories
• Retain only those types of categories known to be useful for this type of query.

Pratt, W., Hearst, M., and Fagan, L. A Knowledge-Based Approach to Organizing Retrieved Documents. AAAI-99


DynaCat, Pratt, Hearst, & Fagan

Pratt, W., Hearst, M., and Fagan, L. A Knowledge-Based Approach to Organizing Retrieved Documents. AAAI-99


DynaCat Study, Pratt, Hearst & Fagan

• Design
– Three queries
– 24 cancer patients
– Compared three interfaces
• ranked list, clusters, categories

• Results
– Participants strongly preferred categories
– Participants found more answers using categories
– Participants took the same amount of time with all three interfaces

Pratt, W., Hearst, M., and Fagan, L. A Knowledge-Based Approach to Organizing Retrieved Documents. AAAI-99


DynaCat study, Pratt et al.


Faceted Category Grouping

Multiple Categories per Document


Search Usability Design Goals

1. Strive for Consistency
2. Provide Shortcuts
3. Offer Informative Feedback
4. Design for Closure
5. Provide Simple Error Handling
6. Permit Easy Reversal of Actions
7. Support User Control
8. Reduce Short-term Memory Load

From Shneiderman, Byrd, & Croft, Clarifying Search, D-Lib Magazine, Jan 1997. www.dlib.org


How to Structure Information for Search and Browsing?

• Hierarchy is too rigid

• KL-ONE is too complex

• Hierarchical faceted metadata:
– A useful middle ground


The Problem with Hierarchy

• Inflexible
– Force the user to start with a particular category
– What if I don’t know the animal’s diet, but the interface makes me start with that category?

• Wasteful
– Have to repeat combinations of categories
– Makes for extra clicking and extra coding

• Difficult to modify
– To add a new category type, must duplicate it everywhere or change things everywhere


The Idea of Facets

• Facets are a way of labeling data
– A kind of metadata (data about data)
– Can be thought of as properties of items

• Facets vs. categories
– Items are placed INTO a category system
– Multiple facet labels are ASSIGNED TO items


The Idea of Facets
• Create INDEPENDENT categories (facets)
– Each facet has labels (sometimes arranged in a hierarchy)

• Assign labels from the facets to every item
– Example: recipe collection
Course: Main Course
Cooking Method: Stir-fry
Cuisine: Thai
Ingredient: Bell Pepper, Curry, Chicken


Using Facets

• Now there are multiple ways to get to each item

Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy, Ice Cream, Sherbet, Flan
Fruits: Cherries, Berries, Blueberries, Strawberries, Bananas, Pineapple

Example item 1: Fruit > Pineapple; Dessert > Cake; Preparation > Bake
Example item 2: Dessert > Dairy > Sherbet; Fruit > Berries > Strawberries; Preparation > Freeze
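The recipe examples suggest a simple data model: each item carries, for each independent facet, one or more label paths through that facet’s hierarchy. This Python sketch assumes that model (the item names and helper functions are hypothetical) and shows that the same item is reachable from any facet, at any level of the hierarchy.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]          # e.g. ("Berries", "Strawberries")
Labels = Dict[str, List[Path]]  # facet name -> one or more label paths

# Hypothetical items built from the two example label sets.
RECIPES: Dict[str, Labels] = {
    "pineapple cake": {
        "Fruit": [("Pineapple",)],
        "Dessert": [("Cake",)],
        "Preparation": [("Bake",)],
    },
    "strawberry sherbet": {
        "Dessert": [("Dairy", "Sherbet")],
        "Fruit": [("Berries", "Strawberries")],
        "Preparation": [("Freeze",)],
    },
}

def has_label(labels: Labels, facet: str, prefix: Path) -> bool:
    """True if any label path in `facet` starts with `prefix` (hierarchy-aware)."""
    return any(path[:len(prefix)] == prefix for path in labels.get(facet, []))

def find(items: Dict[str, Labels], constraints: Dict[str, Path]) -> List[str]:
    """Items satisfying every facet constraint; facets can be combined in any order."""
    return sorted(name for name, labels in items.items()
                  if all(has_label(labels, f, p) for f, p in constraints.items()))

print(find(RECIPES, {"Fruit": ("Berries",)}))  # ['strawberry sherbet']
```

The sherbet is found by starting from Fruit > Berries, from Dessert > Dairy, or from Preparation > Freeze; no single path is privileged, unlike a strict hierarchy.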


Flamenco Usability Studies

• Usability studies done on 3 collections:
– Recipes: 13,000 items
– Architecture Images: 40,000 items
– Fine Arts Images: 35,000 items

• Conclusions:
– Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
– Very positive results, in contrast with studies on earlier iterations.

Yee, K-P., Swearingen, K., Li, K., and Hearst, M., Faceted Metadata for Image Search and Browsing, in CHI 2003.


Flamenco Study Post-Test Comparison

(Chart: number of participants choosing the Faceted vs. the Baseline interface. Overall assessment: more useful for your tasks; easiest to use; most flexible; more likely to result in dead ends; helped you learn more; overall preference. Which interface preferable for: find images of roses; find all works from a given period; find pictures by 2 artists in same media.)

Yee, K-P., Swearingen, K., Li, K., and Hearst, M., Faceted Metadata for Image Search and Browsing, in CHI 2003.


The Advantages of Facets

• Lets the user decide how to start, and how to explore and group.

• After refinement, categories that are not relevant to the current results disappear.

• Seamlessly integrates keyword search with the organizational structure.

• Very easy to expand out (loosen constraints)
• Very easy to build up complex queries.

Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., and Yee, K-P., Finding the Flow in Web Site Search, Communications of the ACM, 45 (9), September 2002, pp. 42-49
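The refinement behaviour listed above can be sketched in Python. This assumes a deliberately simplified, hypothetical collection in which every item has exactly one flat category per facet; a real faceted interface would also handle hierarchies and keyword search, and the function names here are illustrative only.

```python
from collections import Counter

# Hypothetical, simplified collection: one flat category per facet per item.
COLLECTION = [
    {"id": 1, "Fruit": "Pineapple", "Dessert": "Cake", "Preparation": "Bake"},
    {"id": 2, "Fruit": "Strawberries", "Dessert": "Sherbet", "Preparation": "Freeze"},
    {"id": 3, "Fruit": "Cherries", "Dessert": "Cake", "Preparation": "Bake"},
    {"id": 4, "Fruit": "Bananas", "Dessert": "Flan", "Preparation": "Bake"},
]
FACETS = ("Fruit", "Dessert", "Preparation")


def refine(selections):
    """Return items matching every selected facet value (an AND query)."""
    return [item for item in COLLECTION
            if all(item[facet] == value for facet, value in selections.items())]


def visible_categories(selections):
    """Count current results per category, for every facet.

    Only categories that occur in the current results are returned, so
    irrelevant categories disappear, and any category still shown leads to
    at least one item.
    """
    results = refine(selections)
    return {facet: dict(Counter(item[facet] for item in results)) for facet in FACETS}


query = {"Preparation": "Bake"}
print(visible_categories(query)["Dessert"])    # {'Cake': 2, 'Flan': 1}
query["Fruit"] = "Cherries"                    # build up a more complex query
print([item["id"] for item in refine(query)])  # [3]
del query["Fruit"]                             # expand out by loosening a constraint
print([item["id"] for item in refine(query)])  # [1, 3, 4]
```

Counting only over the current results is what makes irrelevant categories vanish: "Sherbet" is not offered once "Bake" is selected, and every category that is still displayed has a count of at least one.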

120

Advantages of Facets

• Can’t end up with empty result sets
– (except with keyword search)

• Helps avoid feelings of being lost.
• Easier to explore the collection.
– Helps users infer what kinds of things are in the collection.
– Evokes a feeling of “browsing the shelves”

• Is preferred over standard search for collection browsing in usability studies.
– (Interface must be designed properly)

Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., and Yee, K-P., Finding the Flow in Web Site Search, Communications of the ACM, 45 (9), September 2002, pp. 42-49

121

Advantages of Facets

• Seamless to add new facets and subcategories
• Seamless to add new items.
• Helps with “categorization wars”
– Don’t have to agree exactly where to place something

• Interaction can be implemented using a standard relational database.

• May be easier for automatic categorization

Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., and Yee, K-P., Finding the Flow in Web Site Search, Communications of the ACM, 45 (9), September 2002, pp. 42-49
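One possible way to back this interaction with a relational database, sketched with Python's built-in `sqlite3` module. The single-table schema (`item_facet`) and the helper functions are hypothetical; the slide does not specify a schema.

```python
import sqlite3

# Hypothetical schema: one row per (item, facet, category) assignment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_facet (
    item_id  INTEGER NOT NULL,
    facet    TEXT    NOT NULL,
    category TEXT    NOT NULL,
    PRIMARY KEY (item_id, facet, category)
);
INSERT INTO item_facet VALUES
    (1, 'Fruit', 'Pineapple'),    (1, 'Dessert', 'Cake'),    (1, 'Preparation', 'Bake'),
    (2, 'Fruit', 'Strawberries'), (2, 'Dessert', 'Sherbet'), (2, 'Preparation', 'Freeze'),
    (3, 'Fruit', 'Cherries'),     (3, 'Dessert', 'Cake'),    (3, 'Preparation', 'Bake');
""")


def matching_items(selections):
    """IDs of items that carry every (facet, category) pair in `selections`."""
    selections = list(dict.fromkeys(selections))  # drop duplicate selections
    if not selections:
        return [row[0] for row in conn.execute(
            "SELECT DISTINCT item_id FROM item_facet ORDER BY item_id")]
    where = " OR ".join("(facet = ? AND category = ?)" for _ in selections)
    params = [value for pair in selections for value in pair]
    # An item qualifies only if it matched ALL selected pairs.
    sql = (f"SELECT item_id FROM item_facet WHERE {where} "
           "GROUP BY item_id HAVING COUNT(*) = ? ORDER BY item_id")
    return [row[0] for row in conn.execute(sql, params + [len(selections)])]


def category_counts(selections, facet):
    """(category, count) pairs for `facet` among the current results."""
    ids = matching_items(selections)
    if not ids:
        return []
    marks = ",".join("?" for _ in ids)
    sql = ("SELECT category, COUNT(*) FROM item_facet "
           f"WHERE facet = ? AND item_id IN ({marks}) "
           "GROUP BY category ORDER BY category")
    return conn.execute(sql, [facet] + ids).fetchall()


print(matching_items([("Preparation", "Bake"), ("Dessert", "Cake")]))  # [1, 3]
print(category_counts([("Preparation", "Bake")], "Fruit"))
# [('Cherries', 1), ('Pineapple', 1)]
```

The `GROUP BY ... HAVING COUNT(*) = ?` clause turns a list of facet selections into an AND query. Only the placeholder list is built with string formatting; all values are passed as bound parameters.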

122

Summary: Evaluation Good Ideas

• Longitudinal studies of real use
• Match the participants to the content of the collection and the tasks
• Test against a strong baseline

123

Summary: Evaluation Problems

• Bias participants towards a system
– “Try our interface” versus linear view

• Tailor tasks unrealistically to benefit the target interface

• Impoverish the baseline relative to the test condition

• Conflate test conditions

124

Summary: Grouping Search Results

Grouping search results seems beneficial in two circumstances:

1. General web search, using transparent labeling (monothetic terms) or category labels rather than cluster centroids.

Effects:
• Works primarily on ambiguous queries
– (so used only a fraction of the time)
• Promotes relevant results up from below the first page of hits
– So it is important to group the related items together visually
• Users tend to select more documents than with linear search
• May work even better with meta-search
• Positive subjective responses (small studies)
• Visualization does not work.
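Monothetic grouping can be illustrated with a small sketch: each group is labeled by a single term, and every result in the group contains that term, so the label itself explains the grouping. This assumes naive whitespace tokenization and a hypothetical `monothetic_groups` helper; it is a simplification, not the method evaluated in the studies summarized above.

```python
from collections import defaultdict


def monothetic_groups(results, min_size=2, ignore=frozenset()):
    """Group ranked results under single terms that every member contains.

    `results` is a list of (doc_id, text) pairs in rank order. A result may
    appear in several groups; rank order is preserved inside each group.
    Terms in `ignore` (for example, the query words) are not used as labels.
    """
    groups = defaultdict(list)
    for doc_id, text in results:
        for term in sorted(set(text.lower().split()) - ignore):
            groups[term].append(doc_id)
    # Keep groups large enough to be useful, largest first, ties by term.
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {term: ids for term, ids in ranked if len(ids) >= min_size}


# Hypothetical ranked results for the ambiguous query "jaguar".
hits = [
    ("d1", "jaguar car dealer"),
    ("d2", "jaguar big cat habitat"),
    ("d3", "used jaguar car prices"),
    ("d4", "jaguar cat conservation"),
]
print(monothetic_groups(hits, ignore={"jaguar"}))
# {'car': ['d1', 'd3'], 'cat': ['d2', 'd4']}
```

For an ambiguous query the groups tend to separate the senses, and a relevant result that sits low in the ranked list appears near the top of its group.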

125

Summary: Grouping Search Results

Grouping search results seems beneficial in two circumstances:

2. Collection navigation with faceted categories
• Multiple angles better than single categories
• “Searchers” turn into “browsers”
• Becoming commonplace in e-commerce, digital libraries, and other kinds of collections
• Extends naturally to tags.
• Positive subjective responses (small studies)

126

Social Tagging and Search

127

Diagram: Search; Topical Metadata; Structured, Flexible Navigation

128

Problem with Metadata-Oriented Approaches

Getting the metadata!

129

Diagram: Search; Topical Metadata; Social question answering; Recorded Human Interaction; Click-through ranking; Inferred recommendations

130

Human Real-time Question Answering

• More popular in Korea than algorithmic search
– Maybe fewer good web pages?
– Maybe a more social society?

• Several examples in the US:
– Yahoo Answers: recently released and successful
– wondir.com
– answerbag.com

131

Yahoo Answers (also answerbag.com, wondir.com, etc.)

132

Yahoo Answers appearing in search results

133

answerbag.com

134

Using User Behavior as Implicit Preferences

• Search click-through data has been shown experimentally to improve the ranking of top results
– Joachims et al. ’05, Agichtein et al. ’06
– Works OK even if non-relevant documents are examined
– Best in combination with sophisticated search algorithms
– Doesn’t work well for ambiguous queries

• Aggregates of movie and book selections comprise implicit recommendations
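One concrete way to turn clicks into a training signal is the “click > skip above” heuristic associated with the Joachims et al. line of work: a clicked result is taken to be preferred over every unclicked result ranked above it, since the user presumably saw those and passed them over. A minimal sketch; the function name is hypothetical.

```python
def click_skip_above(ranking, clicked):
    """Derive pairwise preferences from the clicks on one result page.

    Each clicked result is assumed to be preferred over every unclicked
    result ranked above it. Returns (preferred, less_preferred) pairs.
    """
    clicked = set(clicked)
    prefs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            prefs.extend((doc, above) for above in ranking[:pos] if above not in clicked)
    return prefs


print(click_skip_above(["a", "b", "c", "d"], clicked={"c"}))
# [('c', 'a'), ('c', 'b')]
```

These pairs are relative judgments that can feed a learned ranking function. They say nothing about results below the lowest click, which the user may never have looked at.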

135

Diagram: Search; Topical Metadata; Recorded Human Interaction; Social Tagging (photos, bookmarks); Game-based tagging

136

Social Tagging

• Metadata assignment without all the bother
• Spontaneous, easy, and tends toward single terms

137

Issues with Photo and Web Link Tagging

• There is a strong personal component
– Marking for my own reminders
– Marking for my circle of friends

• There is also a strong social component
– Try to promote certain tags to make them more popular, or post to popular tags to see your influence rise

138

Tagging Games

• Assigning metadata is fun! (ESP game, von Ahn)
– No need for a reputation system, etc.

• Pay people to do it
– MyCroft (iSchool student project)

• Drawback: least common denominator labels
• Experts already label their own data, or data about which they have expertise
– E.g., protein function
– Wikipedia
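The agreement rule behind the ESP game can be sketched as follows: two players who cannot communicate see the same image, and a word becomes a label only when both of them type it; words already collected for the image are taboo. The event format, the lowercase normalization, and the function name are assumptions made for this illustration.

```python
def esp_round(guesses, taboo=frozenset()):
    """Return the first word both players typed, or None if they never match.

    `guesses` is a time-ordered list of (player, word) events, where player
    is 0 or 1. Words on the taboo list are ignored.
    """
    typed = (set(), set())  # words typed so far by player 0 and player 1
    for player, word in guesses:
        word = word.strip().lower()
        if word in taboo:
            continue
        if word in typed[1 - player]:
            return word  # agreement: accept as a label for the image
        typed[player].add(word)
    return None


round_events = [(0, "flower"), (1, "plant"), (0, "rose"), (1, "red"), (1, "rose")]
print(esp_round(round_events, taboo={"flower"}))  # rose
```

Requiring agreement between strangers also helps explain the least-common-denominator drawback: the words most likely to match are the most obvious ones.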

139

Diagram: Search; Topical Metadata; Social question answering; Recorded Human Interaction; Social Tagging (photos, bookmarks); Click-through ranking; Inferred recommendations; Game-based tagging; ????

140

Expert-Oriented Tagging in Search

• Already happening at Google Co-op
• Shows up in certain types of search results

141

Expert-Oriented Tagging

• Already happening at Google Co-op
• Shows up in certain types of search results

142

Promoting Expertise-Oriented Tagging

• Research area: User Interfaces
– To make rapid-feedback suggestions of pre-established tags, like type-ahead queries
– To incentivize labeling and make it fun
– To allow the personal aspects to shine through
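Rapid-feedback tag suggestion can be as simple as a prefix lookup over a sorted list of pre-established tags. A minimal sketch using Python's `bisect` module; the `TagSuggester` class and the sample vocabulary are hypothetical, and a real system would probably rank suggestions by popularity rather than alphabetically.

```python
import bisect


class TagSuggester:
    """Suggest pre-established tags that start with the text typed so far."""

    def __init__(self, vocabulary):
        # Normalize and sort once so each lookup is a binary search.
        self._tags = sorted({tag.lower() for tag in vocabulary})

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        start = bisect.bisect_left(self._tags, prefix)  # first tag >= prefix
        matches = []
        for tag in self._tags[start:]:
            if not tag.startswith(prefix) or len(matches) == limit:
                break
            matches.append(tag)
        return matches


suggester = TagSuggester(["houseplants", "horticulture", "botany", "house", "herbs"])
print(suggester.suggest("ho"))   # ['horticulture', 'house', 'houseplants']
print(suggester.suggest("hou"))  # ['house', 'houseplants']
```

Because all tags sharing a prefix sit next to each other in sorted order, the scan can stop at the first non-matching tag.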

143

Promoting Expertise-Oriented Tagging

• Research area: NLP Algorithms
– (We have an algorithm to build facets from text)
– To convert tags into facet hierarchies
– To capture implicit labeling information
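The slide does not describe the authors' facet-building algorithm, so the following does not reproduce it. Instead, it sketches a different, well-known heuristic for one of the listed goals, turning flat tags into a hierarchy: the co-occurrence subsumption rule from Sanderson and Croft's work on deriving concept hierarchies from text. The 0.8 threshold follows that heuristic; the function name and the sample tags are hypothetical.

```python
from collections import Counter
from itertools import permutations


def subsumption_edges(tagged_items, threshold=0.8):
    """Propose (parent, child) tag pairs from tag co-occurrence.

    Tag x is proposed as a parent of tag y when most items tagged y are also
    tagged x, i.e. P(x | y) >= threshold, while the reverse does not hold
    completely, i.e. P(y | x) < 1.
    """
    tag_count = Counter()   # number of items carrying each tag
    pair_count = Counter()  # number of items carrying both tags of an ordered pair
    for tags in tagged_items:
        tags = set(tags)
        tag_count.update(tags)
        pair_count.update(permutations(tags, 2))
    edges = []
    for (x, y), both in sorted(pair_count.items()):
        if both / tag_count[y] >= threshold and both / tag_count[x] < 1:
            edges.append((x, y))
    return edges


# Hypothetical tag sets, one per bookmarked item.
items = [
    {"plant", "houseplant"},
    {"plant", "houseplant"},
    {"plant", "tree"},
    {"plant", "tree", "oak"},
    {"plant"},
]
print(subsumption_edges(items))
# [('plant', 'houseplant'), ('plant', 'oak'), ('plant', 'tree'), ('tree', 'oak')]
```

The output includes the implied edge from plant to oak; a fuller implementation would prune such transitive edges before presenting the hierarchy as facets.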

144

Promoting Expertise-Oriented Tagging

• Research area: Digital infrastructure
• Extending tagging games
• Build an architecture that channels specialized subproblems to appropriate experts
– E.g., we now know there is a green plant in an office; direct this to the botany > houseplants experts
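The routing idea can be illustrated with a longest-prefix lookup over category paths: a labeled item is sent to the most specific expert group registered for its category, falling back to more general groups. The function, the path format, and the group registry below are hypothetical; the slide does not describe a concrete architecture.

```python
def route_to_experts(category_path, expert_groups):
    """Pick the expert group registered for the most specific matching category.

    `category_path` is a tuple such as ("botany", "houseplants", "ferns").
    `expert_groups` maps category-path prefixes (tuples) to a group name.
    Falls back to shorter prefixes (more general experts), or None.
    """
    for end in range(len(category_path), 0, -1):
        group = expert_groups.get(tuple(category_path[:end]))
        if group is not None:
            return group
    return None


EXPERTS = {
    ("botany",): "botanists",
    ("botany", "houseplants"): "houseplant experts",
}
print(route_to_experts(("botany", "houseplants", "ferns"), EXPERTS))  # houseplant experts
print(route_to_experts(("botany", "trees"), EXPERTS))                 # botanists
print(route_to_experts(("zoology",), EXPERTS))                        # None
```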

145

Promoting Expertise-Oriented Tagging

• Research area: economics and sociology
– What are the right incentive structures?

146

Using Implicit Preferences

• Extend implicit recommendation technology to online catalog use
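A basic form of implicit recommendation counts how often two items appear in the same user's history (“people who borrowed this also borrowed…”). The sketch below applies that idea to hypothetical catalog checkout sessions. The data and function names are illustrative assumptions, and the counting scheme is simple co-occurrence rather than any particular production recommender.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(sessions):
    """Count, for each item, how often every other item shares a session."""
    co = defaultdict(Counter)
    for items in sessions:
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(co, item, k=3):
    """Top-k items most often seen alongside `item` (ties broken by name)."""
    ranked = sorted(co.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical checkout histories, one list of titles per patron.
sessions = [
    ["Gardening Basics", "Houseplant Care", "Botany 101"],
    ["Houseplant Care", "Botany 101"],
    ["Houseplant Care", "Indoor Lighting"],
]
co = build_cooccurrence(sessions)
print(recommend(co, "Houseplant Care"))
# ['Botany 101', 'Gardening Basics', 'Indoor Lighting']
```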

147

Final Words

• User interfaces for search remains a fascinating and challenging field

• Search has taken a primary role in the web and internet business

• Thus, we can expect fascinating developments, and maybe some breakthroughs, in the next few years!