Measuring system performance


Uploaded by spike on 10-Feb-2016



Page 1: Measuring system performance

Measuring system performance

Page 2: Measuring system performance

The library: a system view

Environment

Inputs (energy, money, materials, personnel, information) → Transformational process → Outputs (products, services) → Users

Page 3: Measuring system performance

System performance measures

Recall, precision, relevance

Page 4: Measuring system performance

Robert Taylor's four levels of question formation

Q1: The actual but unexpressed need for information (the visceral need)
Q2: The conscious, within-brain description of the need (the conscious need)
Q3: The formal statement of the need (the formalized need)
Q4: The question as presented to the information system (the compromised need)

Taylor, Robert S. 1968. Question-negotiation and information seeking in libraries. College & Research Libraries 29(3): 178-194 (May 1968).

Page 5: Measuring system performance

System-defined relevance

Query: find health AND feet

Retrieved (90%): "The health of the lumber industry in terms of cubic feet of lumber produced"

User's need: "My feet are killing me."

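A minimal Python sketch of purely system-defined relevance: exact Boolean AND matching over titles, assuming simple lower-case word tokens and no stemming. The function names are hypothetical; the point is that the lumber title satisfies `health AND feet` while the title that actually meets the user's need does not.

```python
import re


def tokens(text):
    """Lower-case word tokens of a title (no stemming)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def boolean_and(query_terms, titles):
    """Return every title that contains all of the query terms."""
    wanted = {t.lower() for t in query_terms}
    return [title for title in titles if wanted <= tokens(title)]


TITLES = [
    "The health of the lumber industry in terms of cubic feet of lumber produced",
    "Soothing remedies for aching feet",
]
print(boolean_and(["health", "feet"], TITLES))
# ['The health of the lumber industry in terms of cubic feet of lumber produced']
```

The match is "correct" from the system's point of view and useless from the user's.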
Page 6: Measuring system performance

Information retrieval process

Question formulation

Relevancy determination
System: Which documents are relevant to the query?
User: Are these documents relevant to my needs?

Page 7: Measuring system performance

Defining relevance

System-defined relevance vs. user-defined relevance

System-defined: objective, often topical. Does it match the query?

User-defined: subjective, situational. Is it useful?

Page 8: Measuring system performance

User-defined relevance

The effect of lysergic acid diethylamide ingestion on toenail fungus in cloned mice

"My feet are killing me."

Soothing remedies for aching feet

Controlling the body by controlling the mind--meditative techniques for dealing with pain

Page 9: Measuring system performance

Determining topical relevance
• Analyze the work to determine what it is about
• Assign to the document one or more terms from a finite list of topics
• Users can then search on those topic indicators
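The three steps can be sketched in Python as a small index keyed by a fixed topic list. This is an illustration under assumptions: the vocabulary, document ids, and the `TopicIndex` class are all hypothetical, and the "analysis" step is represented by the terms a human indexer chooses, not by any automatic method.

```python
# Hypothetical finite list of topics (a tiny controlled vocabulary).
VOCABULARY = {"Foot care", "Lumber industry", "Meditation", "Pain"}


class TopicIndex:
    """Maps each topic in a fixed vocabulary to the documents assigned to it."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)
        self.postings = {}  # topic -> set of document ids

    def assign(self, doc_id, topics):
        """Record an indexer's topic assignments; reject terms not on the list."""
        unknown = set(topics) - self.vocabulary
        if unknown:
            raise ValueError(f"not in vocabulary: {sorted(unknown)}")
        for topic in topics:
            self.postings.setdefault(topic, set()).add(doc_id)

    def search(self, topic):
        """Return the ids of documents indexed under `topic`, sorted."""
        return sorted(self.postings.get(topic, set()))


index = TopicIndex(VOCABULARY)
# A human indexer decides what each work is about. The lumber report
# mentions "feet" but is not about feet, so it gets no "Foot care" term.
index.assign("soothing-remedies", ["Foot care", "Pain"])
index.assign("lumber-health", ["Lumber industry"])
index.assign("mind-body", ["Meditation", "Pain"])
print(index.search("Pain"))  # ['mind-body', 'soothing-remedies']
```

Because searching happens on assigned topics rather than on words in the text, a stray word like "feet" in the lumber title no longer causes a false match.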

Page 10: Measuring system performance

Recall

$$\text{Recall} = \frac{\text{No. of relevant documents retrieved}}{\text{Total no. of relevant documents in the file}}$$
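A worked illustration with hypothetical numbers (not taken from any real search): suppose the file holds 20 documents relevant to a query and a search retrieves 8 of them.

$$\text{Recall} = \frac{8}{20} = 0.40$$

The search found 40% of the relevant material in the file; recall says nothing about how many irrelevant documents came back along with those 8.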

Page 11: Measuring system performance

Precision

$$\text{Precision} = \frac{\text{No. of relevant documents retrieved}}{\text{Total no. of documents retrieved from the file}}$$
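A minimal Python sketch computing precision alongside recall for one search, assuming the system's results and the relevance judgments are both available as sets of document ids. The function name is hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, since the ratios are undefined in that case.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one search.

    retrieved: ids of the documents the system returned
    relevant:  ids of every document in the file judged relevant
    """
    hits = len(retrieved & relevant)  # relevant documents retrieved
    # Both ratios are undefined for an empty denominator; 0.0 is a convention.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 10 documents retrieved, 20 relevant in the file, 8 overlap.
retrieved = set(range(1, 11))  # ids 1..10
relevant = set(range(3, 23))   # ids 3..22
print(precision_recall(retrieved, relevant))  # (0.8, 0.4)
```

Both measures share the same numerator; they differ only in whether you divide by what was retrieved or by what exists.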

Page 12: Measuring system performance

Precision vs. recall: an inverse relationship

As the level of recall rises, the level of precision generally declines, and vice versa.

The Cranfield experiments (1957 & 1962); Cyril Cleverdon, principal investigator
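One way to see the tendency is to walk down a ranked result list and recompute both measures at each cutoff. The ranking and relevance judgments below are hypothetical, made up for illustration, and `precision_recall_at_k` is an illustrative name, not a standard API.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall when the user examines the top k results."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical ranked output and relevance judgments.
RANKING = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
RELEVANT = {"d1", "d2", "d4", "d7"}

for k in (1, 2, 4, 8):
    p, r = precision_recall_at_k(RANKING, RELEVANT, k)
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
# top 1: precision=1.00 recall=0.25
# top 2: precision=1.00 recall=0.50
# top 4: precision=0.75 recall=0.75
# top 8: precision=0.50 recall=1.00
```

Retrieving more raises recall but lets in more non-relevant items. This is a general tendency rather than a rule: precision can tick back up at a deeper cutoff if the next document happens to be relevant.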

Page 13: Measuring system performance

Precision vs. recall
Subject: sexual dimorphism

Word stemming: sex, sexes, sexual, sexy, sexier, sexiest
Recall rises, precision falls

Field-specific searches: DE,TI/sexual()dimorphism
Precision rises, recall falls
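A small Python sketch of the trade-off on this slide, using five hypothetical records and made-up relevance judgments (none of this data comes from a real search). Prefix truncation on titles stands in for word stemming here, which is cruder than real stemming; `field_phrase_search` only mimics the idea of restricting a phrase to the DE and TI fields and is not Dialog syntax.

```python
import re

# Hypothetical records with title (TI) and descriptor (DE) fields.
RECORDS = {
    1: {"TI": "Sexual dimorphism in beak size of finches", "DE": ["sexual dimorphism", "birds"]},
    2: {"TI": "Body size differences between the sexes in lizards", "DE": ["sex differences", "reptiles"]},
    3: {"TI": "The sexiest cars of the decade", "DE": ["automobiles"]},
    4: {"TI": "Sexy imagery in magazine advertising", "DE": ["advertising"]},
    5: {"TI": "Sexual dimorphism in orb-weaving spiders", "DE": ["sexual dimorphism", "spiders"]},
}
RELEVANT = {1, 2, 5}  # hypothetical relevance judgments


def truncation_search(stem):
    """Retrieve records whose title has any word starting with `stem`."""
    hits = set()
    for rid, rec in RECORDS.items():
        words = re.findall(r"[a-z]+", rec["TI"].lower())
        if any(w.startswith(stem) for w in words):
            hits.add(rid)
    return hits


def field_phrase_search(phrase, fields=("DE", "TI")):
    """Retrieve records where `phrase` occurs in one of the given fields."""
    phrase = phrase.lower()
    hits = set()
    for rid, rec in RECORDS.items():
        for field in fields:
            value = rec[field]
            texts = value if isinstance(value, list) else [value]
            if any(phrase in t.lower() for t in texts):
                hits.add(rid)
    return hits


def recall_precision(retrieved):
    hit = len(retrieved & RELEVANT)
    return hit / len(RELEVANT), hit / len(retrieved)


for label, result in [("stemmed", truncation_search("sex")),
                      ("field phrase", field_phrase_search("sexual dimorphism"))]:
    r, p = recall_precision(result)
    print(f"{label}: recall={r:.2f} precision={p:.2f}")
# stemmed: recall=1.00 precision=0.60
# field phrase: recall=0.67 precision=1.00
```

The broad search catches the relevant "sexes" record but also the "sexy"/"sexiest" false drops; the narrow field search drops the false drops but misses record 2.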

Page 14: Measuring system performance

User-defined relevance

"Relevance appears to be a subjective quality, unique between the individual and a given document supporting the assumption that relevance can only be judged by the information user."

Miranda Pao

Page 15: Measuring system performance

Years later

The effect of lysergic acid diethylamide ingestion on toenail fungus in cloned mice

"My feet are still killing me."

Soothing remedies for aching feet

Controlling the body by controlling the mind--meditative techniques for dealing with pain

Page 16: Measuring system performance

Factors affecting relevance (1)
• Purpose of the information
• Situation of the user
• Level at which the information source is written
  – Journal of the Amer. Med. Assn.
  – Healthy times

Page 17: Measuring system performance

Factors affecting relevance (2)
• Subject knowledge of the user
  – Is the data new to the user?
  – Does the information relate to the user's prior knowledge?
• Values: ethical, social, philosophical, political, religious, legal

Page 18: Measuring system performance

User-defined relevance

The subjectivity and fluidity of user-defined relevance make it difficult to use as a measuring tool for system performance.

Page 19: Measuring system performance

Incorporating user-defined relevance into information retrieval systems (1)

• User performs search
• System retrieves results

Page 20: Measuring system performance

Incorporating user-defined relevance into information retrieval systems (2)
• System asks user if he/she would like to retrieve similar documents
  – Search for other documents with similar word frequencies
  – Search for other documents with the same subject descriptors
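The slide does not say how "similar word frequencies" is measured. One common choice, swapped in here as an assumption, is cosine similarity between term-frequency vectors. This Python sketch uses raw counts with no stop-word removal or tf-idf weighting; the collection and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count lower-case word tokens (no stop-word removal, no stemming)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(seed_text: str, collection: dict, top_n: int = 2) -> list:
    """Rank documents in `collection` by similarity to the seed document."""
    seed = term_frequencies(seed_text)
    scored = [(cosine_similarity(seed, term_frequencies(text)), doc_id)
              for doc_id, text in collection.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]


# Hypothetical collection.
DOCS = {
    "a": "Soothing remedies for aching feet and sore feet",
    "b": "Foot massage relieves aching feet",
    "c": "Cubic feet of lumber produced by the lumber industry",
}
print(more_like_this("Remedies for aching feet", DOCS))  # ['a', 'b']
```

Counting shared words across the whole text, rather than a single required term, pushes the lumber document to the bottom even though it contains "feet".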

Page 21: Measuring system performance

Search for other documents with same subject descriptors

Main Author: Gribbin, John R.
Title: In search of Schrodinger's cat: quantum physics and reality / by John Gribbin.
Subject(s): Schrodinger, Erwin, 1887-1961.
Quantum theory -- History.
Reality.
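A Python sketch of "same subject descriptors" retrieval: rank other records by how many subject headings they share with the seed record. The seed headings are taken from the Gribbin record (written with LCSH-style "--" subdivisions); the other record ids and their headings are hypothetical, and ranking by overlap count is an assumption, not a documented catalog feature.

```python
# Subject headings from the Gribbin catalog record.
SEED_SUBJECTS = {
    "Schrodinger, Erwin, 1887-1961",
    "Quantum theory -- History",
    "Reality",
}

# Hypothetical catalog: record id -> set of assigned subject descriptors.
CATALOG = {
    "rec-101": {"Quantum theory -- History", "Physics -- Philosophy"},
    "rec-102": {"Reality", "Quantum theory -- History", "Schrodinger, Erwin, 1887-1961"},
    "rec-103": {"Cats -- Behavior"},
}


def same_descriptor_search(seed: set, catalog: dict) -> list:
    """Return ids of records sharing at least one descriptor with `seed`,
    most shared descriptors first (ties broken by record id)."""
    scored = []
    for rec_id, subjects in catalog.items():
        shared = len(seed & subjects)
        if shared:
            scored.append((-shared, rec_id))
    scored.sort()
    return [rec_id for _, rec_id in scored]


print(same_descriptor_search(SEED_SUBJECTS, CATALOG))  # ['rec-102', 'rec-101']
```

Note that a book about cat behavior is not retrieved, even though "cat" is in the seed title: matching happens on assigned headings, not title words.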

Page 22: Measuring system performance

Amazon.com

Page 23: Measuring system performance

Amazon.com

Page 24: Measuring system performance

Amazon.com

Page 25: Measuring system performance

Assisting users in determining relevancy

Indexing terms, title, citation data, abstract

Source: Barry, Carol L. 1998. Document representations and clues to document relevance. Journal of the American Society for Information Science 49(14):1293-1303.

Page 26: Measuring system performance

Document representation research

Titles

Title: Getting good grades in graduate school
Title: How to impress your advisor in graduate school
Title: Writing a dissertation
Title: The well-written graduate paper

How relevant are these?

Full text

Getting good grades in graduate school
The best way to get good grades is to study hard…

How to impress your advisor in graduate school
Never show up late for a meeting with your advisor…

Writing a dissertation
The first thing to do is to pick a topic that truly interests you…

The well-written graduate paper
Before finalizing your topic do a preliminary search on…

How relevant are these?

Page 27: Measuring system performance

Document representation research

Titles, citation data, indexing terms, abstracts

How relevant are these?

Full text

How relevant are these?

Page 28: Measuring system performance

Utility studies: indications that the user found relevant materials

• Citation & abstract databases
  – User requests that citations be formatted for printing
  – User requests that citations be sent by e-mail
  – User downloads citations
• Full-text databases
  – User pulls up the full text
  – User prints the article
  – User downloads the article to their BlackBerry

Page 29: Measuring system performance

Utility studies: indications that the user found relevant materials

Search ("chocolate") → Short list

If the user stops here, he/she may not have found a relevant article.

Page 30: Measuring system performance

Utility studies: indications that the user found relevant materials

Search → Short list, then:
• Modifies search
• Views full citation data for article
• Views full text of article
• Downloads or prints article

Assume that the user found the article relevant.
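A Python sketch of how a utility study might infer relevance from a session log. The action names are hypothetical (real systems log different vocabularies), and treating search modification on its own as *not* evidence of relevance is an assumption made here; only actions on a specific record count.

```python
# Hypothetical log vocabulary; real systems record different action names.
UTILITY_SIGNALS = {
    # citation & abstract databases
    "format_citation", "email_citation", "download_citation",
    # full-text databases
    "view_full_citation", "view_full_text", "print_article", "download_article",
}


def assumed_relevant(events):
    """Return ids of records the user acted on in a way that suggests relevance.

    events: iterable of (action, record_id) pairs from one search session.
    An empty result does NOT mean nothing was relevant, only that the log
    gives no evidence either way (e.g. the user stopped at the short list).
    """
    return {record_id for action, record_id in events if action in UTILITY_SIGNALS}


session_1 = [("search", None), ("view_short_list", None)]
session_2 = [("search", None), ("view_short_list", None), ("modify_search", None),
             ("view_full_text", "rec-7"), ("print_article", "rec-7"),
             ("view_full_citation", "rec-9")]
print(sorted(assumed_relevant(session_1)))  # []
print(sorted(assumed_relevant(session_2)))  # ['rec-7', 'rec-9']
```

The asymmetry mirrors the slides: follow-up actions justify assuming relevance, but stopping at the short list is ambiguous rather than a negative judgment.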

Page 31: Measuring system performance

Characteristics of searches that produce relevant materials
• Subject searching
• Utilization of Boolean operators
• Search modification
• Increased time in display activities
• Use of a greater number of databases

Cooper, Michael D. and Hui-Min Chen. 2001. Predicting the relevance of a library catalog search. Journal of the American Society for Information Science and Technology 52(10):813-827.

Page 32: Measuring system performance

Importance of abstract (1)
• Indication as to depth/scope of the article
• Delineates methodology: indication of reliability and validity
• Gives indication as to content novelty

Authors studied leg-hair count variations of Drosophila in Kawainui Marsh.

Random sampling in 40 sectors during March, June, September & December.

Greater variation in June.

Page 33: Measuring system performance

Importance of abstract (2)
• Basis for research may indicate recency
• Delineation of results indicates "tangibility" (important, useful data)

American housing market was selected because it is always robust.

Authors concluded that American teenagers listen to rock music.

Page 34: Measuring system performance

Types of abstracts

• Indicative
• Informative
• Critical (evaluative): not common in library databases

Page 35: Measuring system performance

Indicative abstract: indicates what the document is about but doesn't report findings

Title: A review of the current literature on relevance.

Abstract: The author reviews the current literature on relevance.

Page 36: Measuring system performance

Informative abstract: acts as a substitute for the document

Title: The effects of library school on the mental health of library students

Abstract: The authors performed longitudinal studies on 32 graduate students in 8 library and information science programs and found a significant increase in aberrant psychological traits over time.

(fictitious title and abstracts)

Page 37: Measuring system performance

Abstract creation

• Author-produced
• Vendor-added
• Automated abstracting

Page 38: Measuring system performance

Automated abstracting
1. Word counts
2. Remove stop words
3. Weight remaining words according to frequency
4. Search for sentences with the highest density of the most frequently occurring words
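The four steps can be sketched in Python as below. Several details are assumptions, not taken from the slides: the tiny stop-word list, treating the `top_k` most frequent words as the "most frequently occurring" set, defining density as significant words divided by sentence length, and the naive sentence splitter. The sample document is hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "and", "are", "by", "for", "in", "is", "of", "on",
              "the", "to", "was", "we", "were"}


def tokenize(text):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def automated_abstract(text, top_k=2, n_sentences=2):
    # Step 1: word counts over the whole document.
    counts = Counter(tokenize(text))
    # Step 2: remove stop words.
    for word in STOP_WORDS:
        counts.pop(word, None)
    # Step 3: weight remaining words by frequency; keep the top_k as "significant".
    significant = {word for word, _ in counts.most_common(top_k)}
    # Step 4: score each sentence by the density of significant words in it.
    sentences = split_sentences(text)

    def density(sentence):
        words = tokenize(sentence)
        return sum(w in significant for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: density(sentences[i]), reverse=True)
    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))


DOC = ("Feral cats live in many cities. "
       "We found a significant seasonal variation in the number of cats. "
       "The highest number of cats was found in the summer and the lowest number of cats in the winter. "
       "Food availability also changed.")
print(automated_abstract(DOC))
# We found a significant seasonal variation in the number of cats. The highest number of cats was found in the summer and the lowest number of cats in the winter.
```

Here "cats" (4) and "number" (3) become the significant words, and the two sentences densest in them are extracted verbatim, which is why automated abstracts read as a string of lifted sentences.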

Page 39: Measuring system performance

1. Word count
Title: Seasonal variations in the feral cat population of Fargo

the 81, is 68, a 56, to 42, cats 61, number 45, season 27, winter 11, summer 11, spring 11, fall 11, monthly 10, temperature 61, variation 12, food 10, availability 10, average 9, concept 7, per 8, over 9, immediate 5, implement 3, mortality 8, survival 9

Page 40: Measuring system performance

2. Eliminate stop words
Title: Seasonal variations in the feral cat population of Fargo

Stop words removed: the 81, is 68, a 56, to 42, per 8, over 9

Remaining: cats 61, number 45, season 27, winter 11, summer 11, spring 11, fall 11, monthly 10, temperature 61, variation 12, food 10, availability 10, average 9, concept 7, immediate 5, implement 3, mortality 8, survival 9

Page 41: Measuring system performance

3. Rank by frequency
Title: Seasonal variations in the feral cat population of Fargo

cats 61, temperature 61, number 45, seasonal 27, variation 12, winter 11, summer 11, spring 11, fall 11, monthly 10, food 10, availability 10, average 9, survival 9, mortality 8, concept 7, immediate 5, implement 3

Page 42: Measuring system performance

4. Search for sentences with the highest density of high-frequency words
Title: Seasonal variations in the feral cat population of Fargo

We found a significant seasonal variation in the number of cats. The highest number of cats are found in the summer, the lowest number of cats in the winter.

Page 43: Measuring system performance

Automated abstract

... The Children's Internet Protection Act (CIPA) sets conditions on public libraries' receipt of federal financial assistance for Internet access. ... It would not have been possible for the broadcasting station to limit the use of federal funds to all non-editorializing activities. ... The instant Court distinguished Velazquez, restricting its holding to situations in which the grantee is "pit[ted] . . . against the Government. ... " Justice Stevens asserted that the filtering condition was unconstitutional because it distorted the normal usage of library Internet terminals as sources of a wide array of information. ... A condition mandating Internet filters distorts this mission by "deny[ing] patrons access to constitutionally protected speech that libraries would otherwise provide. ...

Page 44: Measuring system performance

Relevance and information overload

In this age of information overload, tools to aid the user in determining relevance are increasingly critical.