DO BATCH AND USER EVALUATIONS GIVE THE SAME RESULTS?
William Hersh, Andrew Turpin, Susan Price, Benjamin Chan, Dale Kraemer, Lynetta Sacherek, Daniel Olson
Presented by Hande Adıgüzel and Hayrettin Erdem


Page 1:

DO BATCH AND USER EVALUATIONS GIVE THE SAME RESULTS?

William Hersh, Andrew Turpin, Susan Price, Benjamin Chan, Dale Kraemer, Lynetta Sacherek, Daniel Olson

Hande Adıgüzel, Hayrettin Erdem

Page 2:

BATCH EXPERIMENTS VS TREC INTERACTIVE TRACK

Measuring recall and precision in the noninteractive laboratory setting

Interaction is the key element of successful retrieval system use, and relevance-based measures do not capture the complete picture of user performance.

The TREC Interactive Track asks human searchers to identify the relevant documents for a set of topics.

These results are then used for independent relevance judgments.
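As an illustration of the batch measures mentioned above, here is a minimal sketch of computing precision and recall for one topic from a ranked list and a set of relevance judgments. The document IDs, judgments, and cutoffs are invented for this example; official TREC runs are scored with the trec_eval tool rather than ad hoc code like this.

```python
# Minimal sketch of batch-style precision and recall for one topic.
# Document IDs and judgments are made up for illustration.

def precision_recall(ranked_docs, relevant_docs, cutoff):
    """Precision and recall of the top-`cutoff` retrieved documents."""
    retrieved = ranked_docs[:cutoff]
    hits = sum(1 for doc_id in retrieved if doc_id in relevant_docs)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_docs) if relevant_docs else 0.0
    return precision, recall

# Hypothetical system output and relevance judgments for a single topic.
ranked = ["d12", "d07", "d31", "d02", "d55", "d19"]
qrels = {"d07", "d02", "d44"}

for k in (3, 6):
    p, r = precision_recall(ranked, qrels, k)
    print(f"cutoff={k}: precision={p:.2f} recall={r:.2f}")
```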

Page 3:

CONTRIBUTION OF THE PAPER

To determine whether IR approaches that achieve better performance in the batch environment translate that effectiveness to real users.


Page 4:

EXPERIMENT STEPS

1. Establishment of the best weighting approach for batch searching experiments.

2. User experiments to determine if those measures give comparable results with human searchers.

3. Verification that the new TREC interactive track data gives comparable batch searching results for the chosen weighting schemes.


Page 5:

Finding an effective weighting scheme for the experimental system using TREC-6 and TREC-7 data

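The slides do not show which weighting schemes were compared, so the following is only a generic sketch of the kind of term-weighting choice such a batch experiment selects between: a plain TF-IDF weight versus an Okapi BM25-style weight. The formulas and the k1/b values are standard textbook defaults, not parameters taken from the paper.

```python
# Illustrative term-weighting functions of the kind a batch experiment
# might compare; the exact schemes used in the paper are not shown on
# these slides, so treat this as a generic sketch.
import math

def tfidf_weight(tf, df, num_docs):
    """Plain TF-IDF: raw term frequency scaled by inverse document frequency."""
    return tf * math.log(num_docs / df)

def bm25_weight(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25-style weight with textbook k1/b defaults (assumed, not from the paper)."""
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

# A batch run would score every document for every topic under each scheme
# and keep whichever scheme gives the better ranked-retrieval results.
print(tfidf_weight(tf=3, df=50, num_docs=100_000))
print(bm25_weight(tf=3, df=50, num_docs=100_000, doc_len=250, avg_doc_len=300))
```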

Page 6:

Interactive searching to assess the weighting scheme with real users


The difference between the systems was not statistically significant.
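The slides do not name the statistical test or give the per-topic scores, so the sketch below simply illustrates one common way to check such a result: a paired Wilcoxon signed-rank test over hypothetical per-topic scores for the two systems.

```python
# Sketch of testing whether the baseline and improved systems differ
# significantly across topics. The data and the choice of a Wilcoxon
# signed-rank test are illustrative, not taken from the paper.
from scipy.stats import wilcoxon

# Hypothetical per-topic scores (e.g., instance recall) for each system.
baseline = [0.40, 0.55, 0.30, 0.62, 0.48, 0.35, 0.51, 0.44]
improved = [0.41, 0.53, 0.33, 0.58, 0.53, 0.41, 0.58, 0.52]

stat, p_value = wilcoxon(baseline, improved)
print(f"Wilcoxon statistic={stat:.1f}, p={p_value:.3f}")
# A large p-value (e.g., > 0.05) means the observed difference could easily
# arise by chance, i.e., it is not statistically significant.
```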

Page 7:

Interactive searching to assess the weighting scheme with real users


All of the difference between the systems occurred in just one query, 414i.
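A per-topic breakdown makes this kind of finding visible. The sketch below uses invented scores and topic IDs (only 414i comes from the slide): it computes each topic's contribution to the gap between the systems and recomputes the mean difference with each topic left out.

```python
# Sketch of a per-topic breakdown showing how one topic can drive the
# entire difference between two systems. Scores and topic IDs other
# than 414i are invented for illustration; they are not the paper's data.

scores_a = {"408i": 0.52, "414i": 0.15, "428i": 0.47, "431i": 0.60, "446i": 0.39}
scores_b = {"408i": 0.50, "414i": 0.45, "428i": 0.46, "431i": 0.62, "446i": 0.38}

diffs = {topic: scores_b[topic] - scores_a[topic] for topic in scores_a}
mean_diff = sum(diffs.values()) / len(diffs)
print(f"mean difference: {mean_diff:+.3f}")

# Leave-one-topic-out: if dropping a topic collapses the mean difference,
# that topic accounts for most of the gap.
for topic in sorted(diffs):
    rest = [d for t, d in diffs.items() if t != topic]
    print(f"without {topic}: {sum(rest) / len(rest):+.3f}")
```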

Page 8:

VERIFYING WEIGHTING SCHEME WITH TREC-8

This experiment verifies that the improvements in batch evaluation detected with TREC-6 and TREC-7 data also hold with TREC-8 data.
