An information-pattern-based approach to novelty detection

Intelligent Database Systems Lab, N.Y.U.S.T. I.M. An information-pattern-based approach to novelty detection. Presenter: Lin, Shu-Han. Authors: Xiaoyan Li, W. Bruce Croft. Information Processing and Management (2008).



Page 1: An information-pattern-based approach to novelty detection

Intelligent Database Systems Lab

N.Y.U.S.T.I. M.

An information-pattern-based approach to novelty detection

Presenter : Lin, Shu-Han

Authors : Xiaoyan Li, W. Bruce Croft

Information Processing and Management (2008)

Page 2: An information-pattern-based approach to novelty detection


Outline

Motivation

Objective

Definition

Observation

Methodology

Experiments

Conclusion

Personal Comments

Page 3: An information-pattern-based approach to novelty detection

Motivation - specific topic

It is very difficult for traditional word-based approaches to separate the two non-relevant sentences (3 and 4) from the two relevant sentences (1 and 2).

The two non-relevant sentences are very likely to be identified as novel because they contain many new words that do not appear in previous sentences.


Page 4: An information-pattern-based approach to novelty detection

Motivation - general topic

It is very difficult for traditional word-based approaches to separate the non-relevant sentence (2) from the relevant sentence (1).


Page 5: An information-pattern-based approach to novelty detection

Objectives

To attack the hard problem above:

To provide a new and more explicit definition of novelty. Novelty is defined as new answers to the potential questions representing a user’s request or information need.

To propose a new concept in novelty detection – query-related information patterns. Very effective information patterns for novelty detection at the sentence level have been identified.

To propose a unified pattern-based approach that includes the following three steps: query analysis, relevant sentence detection and new pattern detection. The unified approach works for both specific topics and general topics.


Page 6: An information-pattern-based approach to novelty detection

Definition - Information Patterns

Information patterns of specific topics

Information patterns of general topics

Opinion patterns and opinion sentences

Event patterns and event sentences


Table. Word patterns for the five types of NE (named entity) questions

Table. Examples of opinion patterns

Page 7: An information-pattern-based approach to novelty detection

Observation – information patterns

Sentence lengths

Relevant sentences on average have more words than non-relevant sentences.

Novel sentences on average have slightly more words than relevant sentences.

Opinion patterns

There are relatively more opinion sentences in relevant (and novel) sentences than in non-relevant sentences.

The percentage of opinion sentences among novel sentences is slightly larger than among relevant sentences.


Table. Statistics of sentence lengths

Table. Statistics on opinion patterns for 22 opinion topics (2003)

Page 8: An information-pattern-based approach to novelty detection

Observation – information patterns (Cont.)

NE (named entity) combinations

PLD types (PERSON, LOCATION, DATE) are more effective in separating relevant and non-relevant sentences.

POLD types (PERSON, ORGANIZATION, LOCATION, DATE) will be used in new pattern detection; NEs of the ORGANIZATION type may provide different sources of new information.

NEs of the PLD types play a more important role in event topics than in opinion topics.
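
The observation about NE combinations can be illustrated with a small sketch. It assumes each sentence has already been processed by some named-entity tagger (not shown) and is represented by the set of NE types found in it. The helper names `has_any_type` and `share_with_types` and the toy data are hypothetical and are not taken from the paper.

```python
# Illustrative sketch: how often do sentences contain NEs of given types?
# Assumption: each sentence is represented by the set of NE types a tagger found in it.

PLD = {"PERSON", "LOCATION", "DATE"}
POLD = PLD | {"ORGANIZATION"}


def has_any_type(ne_types, wanted):
    """Return True if the sentence contains at least one NE of a wanted type."""
    return bool(set(ne_types) & wanted)


def share_with_types(sentences, wanted):
    """Fraction of sentences that contain at least one NE of a wanted type."""
    if not sentences:
        return 0.0
    hits = sum(1 for ne_types in sentences if has_any_type(ne_types, wanted))
    return hits / len(sentences)


# Toy data (made up for illustration): NE types found in each sentence.
relevant = [{"PERSON", "DATE"}, {"LOCATION"}, {"ORGANIZATION"}, set()]
non_relevant = [set(), {"ORGANIZATION"}, set(), {"DATE"}]

print(share_with_types(relevant, PLD))       # share of relevant sentences with a PLD entity
print(share_with_types(non_relevant, PLD))   # share of non-relevant sentences with a PLD entity
```

Comparing the two shares on real judged data is one simple way to check whether a given NE combination separates relevant from non-relevant sentences.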


Page 9: An information-pattern-based approach to novelty detection

Methodology

Fig. ip-BAND: a unified information-pattern-based approach to novelty detection.

Page 10: An information-pattern-based approach to novelty detection

Methodology (Cont.)

(1) Query analysis and question formulation


How many (2)

Where (3)
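
A minimal sketch of step (1), assuming a simple keyword lookup: question words found in the query text suggest the NE types expected in an answer. The mapping `QUESTION_PATTERNS` and the function name `expected_answer_types` are illustrative assumptions. The paper's actual word patterns are listed in its table for NE questions, which this transcript does not reproduce.

```python
import re

# Sketch: map question words in a query to expected NE answer types.
# The word patterns below are assumed for illustration, not copied from the paper.
QUESTION_PATTERNS = [
    ("how many", "NUMBER"),
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
]


def expected_answer_types(query):
    """Return the NE types suggested by question words found in the query.

    An empty list means no pattern matched; in this sketch such a query
    would be handled as a general topic.
    """
    text = " ".join(query.lower().split())  # lowercase and normalise whitespace
    found = []
    for phrase, ne_type in QUESTION_PATTERNS:
        # Word boundaries keep "who" from matching inside "whose".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text) and ne_type not in found:
            found.append(ne_type)
    return found


print(expected_answer_types("How many people were killed and where did the attack take place?"))
print(expected_answer_types("Describe reactions to the new policy"))
```

The first query yields a NUMBER and a LOCATION question, the second matches nothing and falls through to the general-topic case.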

Page 11: An information-pattern-based approach to novelty detection

Methodology (Cont.)

(2) Using patterns in relevance re-ranking

Ranking with TFISF (term frequency-inverse sentence frequency) models

TFISF with information patterns

Sentence lengths

Named entities

Opinion patterns

(3) Novel sentence extraction
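
Steps (2) and (3) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact formulas: the TFISF weight uses a generic tf × isf form with a smoothed logarithm, the pattern adjustment is a hypothetical multiplicative boost with made-up weights (`w_len`, `w_ne`, `w_op`), and a sentence counts as novel when it contains a named entity not seen in earlier sentences. All function names are illustrative, and named entities are assumed to come from an external tagger.

```python
import math
from collections import Counter


def tfisf_scores(query_terms, sentences):
    """Score tokenised sentences against the query with a basic TF-ISF weight."""
    n = len(sentences)
    # Sentence frequency: in how many sentences does each term occur?
    sf = Counter(term for sent in sentences for term in set(sent))
    q_tf = Counter(query_terms)
    scores = []
    for sent in sentences:
        s_tf = Counter(sent)
        score = 0.0
        for term, q_count in q_tf.items():
            if s_tf[term]:
                # Smoothed inverse sentence frequency (an assumed variant).
                isf = math.log((n + 1) / (sf[term] + 0.5))
                score += s_tf[term] * q_count * isf
        scores.append(score)
    return scores


def adjust_with_patterns(score, length, avg_length, n_entities, is_opinion,
                         w_len=0.2, w_ne=0.1, w_op=0.1):
    """Hypothetical boost for longer sentences, named entities and opinion patterns.

    avg_length must be positive.
    """
    boost = 1.0 + w_len * (length - avg_length) / avg_length
    boost *= 1.0 + w_ne * n_entities
    if is_opinion:
        boost *= 1.0 + w_op
    return score * boost


def extract_novel(ranked_entities):
    """Return indices of ranked sentences that contribute an unseen named entity."""
    seen = set()
    novel = []
    for idx, entities in enumerate(ranked_entities):
        if set(entities) - seen:
            novel.append(idx)
        seen |= set(entities)
    return novel


sentences = [
    ["storm", "hit", "florida"],
    ["storm", "hit", "florida", "again"],
    ["markets", "fell"],
]
print(tfisf_scores(["storm", "florida"], sentences))
print(extract_novel([{"Florida"}, {"Florida"}, {"Florida", "Monday"}, set()]))
```

In this sketch the second "Florida" sentence is dropped as redundant because it adds no new entity, while the third is kept because "Monday" has not appeared before.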


Page 12: An information-pattern-based approach to novelty detection

Experiments

Baseline approaches

B-NN: initial retrieval ranking

B-NW: new word detection

B-NWT: new word detection with a threshold

B-MMR: Maximal Marginal Relevance (MMR)
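
A rough sketch of the new-word baselines and an MMR-style baseline, under simplifying assumptions: sentences are pre-tokenised lists of lowercase words, Jaccard word overlap stands in for the similarity measure, and the parameter names `threshold` and `lam` are illustrative. The settings used in the paper are not given on the slide.

```python
def new_word_counts(sentences):
    """For each sentence, count the words not seen in any previous sentence."""
    seen = set()
    counts = []
    for sent in sentences:
        words = set(sent)
        counts.append(len(words - seen))
        seen |= words
    return counts


def novel_by_new_words(sentences, threshold=1):
    """Flag a sentence as novel if it has at least `threshold` new words.

    threshold=1 plays the role of plain new word detection in this sketch;
    larger values correspond to new word detection with a threshold.
    """
    return [count >= threshold for count in new_word_counts(sentences)]


def jaccard(a, b):
    """Word-overlap similarity between two token lists (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mmr_rank(query, sentences, lam=0.5):
    """Greedy MMR ordering: reward similarity to the query and penalise
    similarity to sentences that were already selected."""
    remaining = list(range(len(sentences)))
    selected = []
    while remaining:
        def mmr(i):
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return lam * jaccard(sentences[i], query) - (1 - lam) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected


sentences = [
    ["storm", "hit", "florida"],
    ["storm", "hit", "florida"],
    ["florida", "declared", "emergency"],
    ["markets", "fell"],
]
print(novel_by_new_words(sentences))               # plain new word detection
print(novel_by_new_words(sentences, threshold=3))  # with a threshold
print(mmr_rank(["storm", "florida"], sentences))
```

With plain new word detection the off-topic sentence about markets is flagged as novel simply because its words are new, which is the weakness of word-based approaches described on the Motivation slides.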


Page 13: An information-pattern-based approach to novelty detection

Experiments

Performance for specific topics from TREC 2002, 2003, 2004

Note: Data marked * pass the significance test at the 95% confidence level by the Wilcoxon test; data marked ** pass at the 90% level. Chg%: improvement over the first (B-NN) baseline, in %.

Table. Performance of novelty detection for 8 specific topics (queries) from TREC 2002

Table. Performance of novelty detection for 15 specific topics (queries) from TREC 2003

Table. Performance of novelty detection for 11 specific topics (queries) from TREC 2004


3.4 of 15 novel sentence

10.1 of 15 novel sentence

4.6 of 15 novel sentence

Page 14: An information-pattern-based approach to novelty detection

Experiments

Performance for general topics from TREC 2002, 2003, 2004

Note: Data marked * pass the significance test at the 95% confidence level by the Wilcoxon test; data marked ** pass at the 90% level. Chg%: improvement over the first (B-NN) baseline, in %.

Table. Performance of novelty detection for 41 general topics (queries) from TREC 2002

Table. Performance of novelty detection for 35 general topics (queries) from TREC 2003

Table. Performance of novelty detection for 3 general topics (queries) from TREC 2004


3.2 of 15 novel sentence

7.5 of 15 novel sentence

3.4 of 15 novel sentence

Page 15: An information-pattern-based approach to novelty detection

Experiments

Comparison among specific, general and all topics at top 15 ranks


Note: Chg%: Improvement over the first baseline in percentage; Nvl#: Number of true novel sentences; Rdd#: Number of relevant but redundant sentences; NRl#: Number of non-relevant sentences.

Table. Comparison among specific, general and all topics at top 15 ranks
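
The quantities named in the note can be computed as in the sketch below. It assumes each ranked sentence carries one of three judgment labels; the label strings and function names are illustrative, and the numbers in the usage lines are made up rather than taken from the paper's tables.

```python
from collections import Counter


def top_k_breakdown(labels, k=15):
    """Count novel, redundant and non-relevant sentences in the top k ranks."""
    counts = Counter(labels[:k])
    return {
        "Nvl#": counts["novel"],
        "Rdd#": counts["redundant"],
        "NRl#": counts["non_relevant"],
    }


def precision_at_k(labels, k=15):
    """Share of the top k ranked sentences that are judged novel."""
    return Counter(labels[:k])["novel"] / k


def chg_percent(value, baseline):
    """Relative improvement over a baseline value, in percent."""
    return 100.0 * (value - baseline) / baseline


# Made-up ranked judgments for one topic.
labels = ["novel"] * 6 + ["redundant"] * 4 + ["non_relevant"] * 5 + ["novel"] * 3
print(top_k_breakdown(labels))
print(precision_at_k(labels))
print(chg_percent(precision_at_k(labels), baseline=0.32))
```

Averaging `Nvl#` over topics gives figures of the form "n of 15", and `chg_percent` applied to a system and a baseline gives the Chg% column.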

Page 16: An information-pattern-based approach to novelty detection


Conclusions

Novelty means new answers to the potential questions representing a user’s request or information need.

The proposed ip-BAND outperforms all baselines for both specific topics and general topics, and its performance on specific topics is better than on general topics.

It is impossible to collect complete novelty judgments in reality.

Baseline selection and evaluation measure by human assessors

Misjudgment of relevance and/or novelty by human assessors and disagreement of judgments between the human assessors

Limitation and accuracy of question formulations

Novelty detection precision will be low since some non-relevant sentences may be treated as novel.

Page 17: An information-pattern-based approach to novelty detection


Personal Comments

Advantage …

Drawback …

Application …