
A Probabilistic Quantifier Fuzzification Mechanism: The Model and Its Evaluation for Information Retrieval

Felix Díaz-Hermida, David E. Losada, Alberto Bugarín, and Senén Barro

Presented by Chia-Hao Lee

2

Outline

• Introduction
• Fuzzy Quantifiers
– Probabilistic Quantifier Fuzzification Mechanisms
• New View on Crisp Representatives
– $F^A$ Quantifier Fuzzification Mechanism
– Properties of the Model
• Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval
– Fuzzy Quantifiers and Information Retrieval
– Example
• Information Retrieval Experiments
• Conclusion

3

Introduction

• The ability of fuzzy quantifiers to model linguistic statements in a natural way has proved useful in diverse areas such as expert systems, data mining, control systems, database systems, etc.

• In the information retrieval (IR) field, fuzzy quantification has been proposed for handling expressive queries, giving rise to flexible query languages.

4

Introduction

• Fuzzy quantification is a linguistic granulation technique capable of expressing the global characteristics of a collection of individuals, or a relation between individuals, through meaningful linguistic summaries.

• Granular computing attempts to manage complex, large-scale problems by organizing these into different levels of detail.

• It is understood that each sub-problem should be solved at its appropriate level of granularity, and there are effective transformations which mediate between these levels.

5

Introduction

• The need for such a transformation process arises not only in the technical problem areas tackled by computers.

• It is hence not surprising that natural language (NL) provides a class of expressions specifically designed to express accumulative properties and to summarize information: natural language quantifiers.

• NL quantifiers, and in particular their approximate variety (“almost all”, “a few”, etc.), provide flexible means for expressing accumulative properties of collections and can also describe global aspects of relationships between individuals.

6

Introduction

• Fuzzy set theory attempts to model NL quantifiers by operators called fuzzy quantifiers.

– Interpretation: the development of methods for evaluating quantifying expressions which capture the meaning of natural language quantifiers.

– Summarization: the development of processes for constructing quantifying statements which succinctly describe a collection of observations and/or relationships between a large number of observations (find domain concepts X and Y and a quantifier Q such that “Q X’s are Y’s” is true).

– Reasoning: the development of methods which deduce further knowledge from a set of rules and/or facts involving fuzzy quantifiers.

7

Fuzzy Quantifiers

• Two-valued quantifier: crisp input, crisp output
• Fuzzy quantifier: fuzzy input, fuzzy output
• Semi-fuzzy quantifier: crisp input, fuzzy output

8

Fuzzy Quantifiers

• Definition 1 (Classic Quantifier or Two-valued Quantifier): An n-ary generalized quantifier on a base set $E \neq \emptyset$ is a mapping $Q : \mathcal{P}(E)^n \to \mathbf{2} = \{0, 1\}$. A two-valued quantifier hence assigns to each n-tuple of crisp subsets $X_1, \dots, X_n \in \mathcal{P}(E)$ a two-valued quantification result $Q(X_1, \dots, X_n) \in \mathbf{2}$.

$\mathcal{P}(E)$: the powerset of $E$
$\tilde{\mathcal{P}}(E)$: the fuzzy powerset of $E$

9

Fuzzy Quantifiers

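• A typical example of a classic quantifier is the following definition of an all statement, which can be used for sentences such as “all $X_1$’s are $X_2$’s”:

$$\mathrm{all}(X_1, X_2) = \begin{cases} 1, & \text{if } X_1 \subseteq X_2 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

• Well-known examples:

$$\begin{aligned}
\exists_E(X) = 1 &\iff X \neq \emptyset \\
\forall_E(X) = 1 &\iff X = E \\
\mathrm{all}_E(X_1, X_2) = 1 &\iff X_1 \subseteq X_2 \\
\mathrm{some}_E(X_1, X_2) = 1 &\iff X_1 \cap X_2 \neq \emptyset \\
\mathrm{at\ least\ } k_E(X_1, X_2) = 1 &\iff |X_1 \cap X_2| \geq k \\
[\mathrm{rate} \geq r](X_1, X_2) = 1 &\iff |X_1 \cap X_2| \geq r\,|X_1| \\
[\mathrm{rate} > r](X_1, X_2) = 1 &\iff |X_1 \cap X_2| > r\,|X_1|
\end{aligned}$$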

10

Fuzzy Quantifiers

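• For example:

Let us consider the evaluation of the sentence “80% or more of students are Spanish” in the referential set $E = \{e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8\}$, where the properties “students” and “Spanish” are, respectively, defined as

X1 (students) = {1,0,1,0,1,0,1,1} (true: 1, false: 0)

X2 (Spanish) = {1,0,1,0,1,0,0,0}

and “80% or more” is defined as the two-valued quantifier

$$\mathrm{80\%\_or\_more}(X_1, X_2) = \begin{cases} 1, & \text{if } X_1 = \emptyset \\ 1, & \text{if } X_1 \neq \emptyset \text{ and } \dfrac{|X_1 \cap X_2|}{|X_1|} \geq 0.80 \\ 0, & \text{otherwise} \end{cases}$$

Then, computing the intersection element by element with the logical “and”:

$$\frac{|X_1 \cap X_2|}{|X_1|} = \frac{|\{1,0,1,0,1,0,0,0\}|}{|\{1,0,1,0,1,0,1,1\}|} = \frac{3}{5} = 0.6 < 0.80, \quad \text{so} \quad \mathrm{80\%\_or\_more}(X_1, X_2) = 0$$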
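The evaluation on this slide can be replayed in a few lines. This is a minimal sketch, assuming crisp sets are stored as 0/1 membership vectors over $e_1, \dots, e_8$; the function name `at_least_80_percent` is illustrative and does not come from the slides.

```python
def at_least_80_percent(x1, x2):
    """Two-valued quantifier on 0/1 membership vectors of equal length."""
    if sum(x1) == 0:
        # X1 is empty: the statement holds vacuously.
        return 1
    # Element-wise logical "and" gives the intersection of X1 and X2.
    intersection = [a & b for a, b in zip(x1, x2)]
    return 1 if sum(intersection) / sum(x1) >= 0.80 else 0


students = [1, 0, 1, 0, 1, 0, 1, 1]
spanish = [1, 0, 1, 0, 1, 0, 0, 0]

result = at_least_80_percent(students, spanish)
print(result)  # 3 of the 5 students are Spanish (0.6 < 0.8), so this prints 0
```

Both the input and the output are crisp here, which is what distinguishes a two-valued quantifier from the fuzzy and semi-fuzzy variants.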

11

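Fuzzy Quantifiers

• Definition 2 (Fuzzy Quantifier):

An n-ary fuzzy quantifier $\tilde{Q}$ on a base set $E \neq \emptyset$ is a mapping $\tilde{Q} : \tilde{\mathcal{P}}(E)^n \to [0, 1]$ which to each n-tuple of fuzzy subsets $X_1, \dots, X_n$ of $E$ assigns a gradual result $\tilde{Q}(X_1, \dots, X_n) \in [0, 1]$.

An example of a fuzzy quantifier is $\widetilde{\mathrm{all}} : \tilde{\mathcal{P}}(E)^2 \to [0, 1]$, which can be defined as a fuzzy extension of (1) using a typical definition for the fuzzy inclusion operator:

$$\widetilde{\mathrm{all}}(X_1, X_2) = \inf\{\max(1 - \mu_{X_1}(e), \mu_{X_2}(e)) : e \in E\} \tag{2}$$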

12

Fuzzy Quantifiers

• For example:

Let us consider the evaluation of the sentence “all tall people are blond” in the referential set $E = \{e_1, e_2, e_3, e_4\}$. Let us assume that the properties “tall” and “blond” are, respectively, defined as

$$X_1 = \mathrm{tall} = \{0.8/e_1, 1/e_2, 0.6/e_3, 0.3/e_4\}$$
$$X_2 = \mathrm{blond} = \{0.9/e_1, 0.7/e_2, 0.3/e_3, 0.2/e_4\}$$

Using expression (2), then:

$$\widetilde{\mathrm{all}}(X_1, X_2) = \inf\{\max(1 - \mu_{X_1}(e), \mu_{X_2}(e)) : e \in E\} = \inf\{0.9, 0.7, 0.4, 0.7\} = 0.4$$

• In many cases, it is not easy to achieve consensus on an intuitive and generally applicable expression for implementing a given quantified expression.
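The fuzzy extension of "all" is short enough to check by machine. The sketch below assumes fuzzy sets are plain dictionaries from element names to membership degrees; `fuzzy_all` is an illustrative name, not one used in the slides.

```python
def fuzzy_all(x1, x2):
    """inf over e of max(1 - mu_X1(e), mu_X2(e)); on a finite set inf is min."""
    return min(max(1.0 - x1[e], x2[e]) for e in x1)


tall = {"e1": 0.8, "e2": 1.0, "e3": 0.6, "e4": 0.3}
blond = {"e1": 0.9, "e2": 0.7, "e3": 0.3, "e4": 0.2}

degree = fuzzy_all(tall, blond)
print(round(degree, 2))  # 0.4, reached at e3 where max(1 - 0.6, 0.3) = 0.4
```

The element that limits the result is $e_3$: it is fairly tall (0.6) but hardly blond (0.3), so it is the strongest counterexample to the statement.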

13

Fuzzy Quantifiers

• Definition 3 (Semi-fuzzy Quantifier):

An n-ary semi-fuzzy quantifier on a base set $E \neq \emptyset$ is a mapping $Q : \mathcal{P}(E)^n \to [0, 1]$ which to each n-tuple of crisp subsets $X_1, \dots, X_n$ of $E$ assigns a gradual result $Q(X_1, \dots, X_n) \in [0, 1]$.

14

Fuzzy Quantifiers

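• Examples of semi-fuzzy quantifiers are:

$$\mathrm{about\_5}(X_1, X_2) = T_{2,4,6,8}(|X_1 \cap X_2|)$$

$$\mathrm{about\_80\%\_or\_more\_of\_the}(X_1, X_2) = \begin{cases} S_{0.5,0.8}\!\left(\dfrac{|X_1 \cap X_2|}{|X_1|}\right), & \text{if } X_1 \neq \emptyset \\ 1, & \text{if } X_1 = \emptyset \end{cases}$$

where

$$T_{a,b,c,d}(x) = \begin{cases} 0, & x \leq a \\ \dfrac{x - a}{b - a}, & a < x \leq b \\ 1, & b < x \leq c \\ 1 - \dfrac{x - c}{d - c}, & c < x \leq d \\ 0, & d < x \end{cases}
\qquad
S_{\alpha,\gamma}(x) = \begin{cases} 0, & x < \alpha \\ 2\left(\dfrac{x - \alpha}{\gamma - \alpha}\right)^2, & \alpha < x \leq \dfrac{\alpha + \gamma}{2} \\ 1 - 2\left(\dfrac{x - \gamma}{\gamma - \alpha}\right)^2, & \dfrac{\alpha + \gamma}{2} < x \leq \gamma \\ 1, & \gamma < x \end{cases}$$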

15

Fuzzy Quantifiers

• For example:

Let us consider the evaluation of the sentence “about 80% or more of the students are Spanish”. Let us assume that the properties “students” and “Spanish” are, respectively, defined as

X1 (students) = {1,0,1,0,1,0,1,1},

X2 (Spanish) = {1,0,1,0,1,0,0,0}; then

$$\mathrm{about\_80\%\_or\_more\_of\_the}(X_1, X_2) = S_{0.5,0.8}\!\left(\frac{|\{1,0,1,0,1,0,0,0\}|}{|\{1,0,1,0,1,0,1,1\}|}\right) = S_{0.5,0.8}(0.6) = 0.22$$
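The two semi-fuzzy quantifiers above can be sketched as follows, assuming the usual trapezoidal function $T_{a,b,c,d}$ and Zadeh's S-function $S_{\alpha,\gamma}$ with $a < b \leq c < d$ and $\alpha < \gamma$. The helper names (`trapezoid`, `s_function`, `about_5`, `about_80_percent_or_more`) are illustrative, not taken from the slides.

```python
def trapezoid(a, b, c, d):
    """Return T_{a,b,c,d}: 0 outside (a, d], 1 on (b, c], linear in between."""
    def t(x):
        if x <= a or x > d:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return 1.0 - (x - c) / (d - c)
    return t


def s_function(alpha, gamma):
    """Return S_{alpha,gamma}: a smooth step rising from 0 at alpha to 1 at gamma."""
    middle = (alpha + gamma) / 2
    def s(x):
        if x < alpha:
            return 0.0
        if x <= middle:
            return 2 * ((x - alpha) / (gamma - alpha)) ** 2
        if x <= gamma:
            return 1 - 2 * ((x - gamma) / (gamma - alpha)) ** 2
        return 1.0
    return s


def about_5(x1, x2):
    """Semi-fuzzy quantifier on 0/1 vectors: crisp input, gradual output."""
    return trapezoid(2, 4, 6, 8)(sum(a & b for a, b in zip(x1, x2)))


def about_80_percent_or_more(x1, x2):
    if sum(x1) == 0:
        return 1.0
    ratio = sum(a & b for a, b in zip(x1, x2)) / sum(x1)
    return s_function(0.5, 0.8)(ratio)


students = [1, 0, 1, 0, 1, 0, 1, 1]
spanish = [1, 0, 1, 0, 1, 0, 0, 0]
result = about_80_percent_or_more(students, spanish)
print(round(result, 2))  # S_{0.5,0.8}(0.6) = 2 * (0.1 / 0.3)^2, i.e. 0.22
```

Compared with the two-valued version, the same crisp data now yields a degree of 0.22 instead of a flat 0, because 60% is inside the transition zone between 50% and 80%.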

16

Fuzzy Quantifiers

• Semi-fuzzy quantifiers are half-way between two-valued quantifiers and fuzzy quantifiers because they have crisp input and fuzzy output. In particular, every two-valued quantifier of TGQ (theory of generalized quantifiers) is a semi-fuzzy quantifier by definition.

• Being half-way between two-valued generalized quantifiers and fuzzy quantifiers, semi-fuzzy quantifiers do not accept fuzzy input, and we have to make use of a fuzzification mechanism which transports semi-fuzzy quantifiers to fuzzy quantifiers.

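$$\mathcal{F} : \left(Q : \mathcal{P}(E)^n \to [0, 1]\right) \mapsto \left(\tilde{Q} : \tilde{\mathcal{P}}(E)^n \to [0, 1]\right)$$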

17

Fuzzy Quantifiers

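• Probabilistic Quantifier Fuzzification Mechanisms:

If the universe of discourse $E$ is finite and the quantifiers are unary, then the probabilistic mechanisms collapse into the same discrete expression

$$F(Q)(X) = \sum_{i=0}^{m} (\alpha_i - \alpha_{i+1})\, Q(X_{\geq \alpha_i})$$

where $\alpha_0 = 1$, $\alpha_1 \geq \dots \geq \alpha_m$ are the membership degrees of $X$ in decreasing order, $\alpha_{m+1} = 0$, and $X_{\geq \alpha_i} = \{e \in E : \mu_X(e) \geq \alpha_i\}$.

• The value $\alpha_i - \alpha_{i+1}$ can be interpreted as the probability that $X_{\geq \alpha_i}$ is selected as the crisp representative for the fuzzy set $X$.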

18

Fuzzy Quantifiers

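• Let $E = \{e_1, e_2, e_3, e_4, e_5\}$ be a set of individuals for which the set $X \in \tilde{\mathcal{P}}(E)$,

$$X = \{0.8/e_1, 1/e_2, 0.5/e_3, 0.5/e_4, 0.2/e_5\}$$

represents the fulfillment of the property “being tall”. It is reasonable for $X$ to arise on the basis of a consonant vote. The intuitive ordering of the elements of the referential on the basis of their height is $e_2 \geq e_1 \geq e_3 \geq e_4 \geq e_5$. The focal elements and their associated probability masses are:

$$\begin{aligned}
\Gamma_1 &= \{e_2\}, & m_X(\Gamma_1) &= \mu_X(e_2) - \mu_X(e_1) = 0.2 \\
\Gamma_2 &= \{e_1, e_2\}, & m_X(\Gamma_2) &= \mu_X(e_1) - \mu_X(e_3) = 0.3 \\
\Gamma_3 &= \{e_1, e_2, e_3, e_4\}, & m_X(\Gamma_3) &= \mu_X(e_3) - \mu_X(e_5) = 0.3 \\
\Gamma_4 &= \{e_1, e_2, e_3, e_4, e_5\}, & m_X(\Gamma_4) &= \mu_X(e_5) = 0.2
\end{aligned}$$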

19

Fuzzy Quantifiers

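It should also be noted that

$$\mu_X(e_i) = \int_0^1 \chi_{X_{\geq \alpha}}(e_i)\, d\alpha$$

where $X_{\geq \alpha}$ denotes the α-cut of $X$ and $\chi_{X_{\geq \alpha}}$ its characteristic function.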

20

Fuzzy Quantifiers

• For example:

Let us consider the evaluation of the quantified sentence “almost all students are tall.”

Suppose that we model the property tall for a referential set of students $E = \{e_1, e_2, e_3\}$ through the fuzzy set (the feature “tall”)

$$\mathrm{tall} = \{0.8/e_1, 0.9/e_2, 1/e_3\}$$

and we support the quantified expression “almost all” by means of the following semi-fuzzy quantifier:

$$Q(X) = q(|X|), \quad \text{where } q(n) = \left(\frac{n}{3}\right)^2$$

21

Fuzzy Quantifiers

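Given the fuzzy set tall, the values $\alpha_i$ are

$$\alpha_0 = 1, \quad \alpha_1 = 1, \quad \alpha_2 = 0.9, \quad \alpha_3 = 0.8, \quad \alpha_4 = 0$$

and the fuzzification process runs as follows:

$$\begin{aligned}
F(Q)(\mathrm{tall}) &= \sum_{i=0}^{3} (\alpha_i - \alpha_{i+1})\, Q(\mathrm{tall}_{\geq \alpha_i}) \\
&= (\alpha_0 - \alpha_1)\, Q(\mathrm{tall}_{\geq \alpha_0}) + (\alpha_1 - \alpha_2)\, Q(\mathrm{tall}_{\geq \alpha_1}) + (\alpha_2 - \alpha_3)\, Q(\mathrm{tall}_{\geq \alpha_2}) + (\alpha_3 - \alpha_4)\, Q(\mathrm{tall}_{\geq \alpha_3}) \\
&= (1 - 1)\, Q(\{e_3\}) + (1 - 0.9)\, Q(\{e_3\}) + (0.9 - 0.8)\, Q(\{e_2, e_3\}) + (0.8 - 0)\, Q(\{e_1, e_2, e_3\}) \\
&= 0.1 \left(\frac{1}{3}\right)^2 + 0.1 \left(\frac{2}{3}\right)^2 + 0.8 \left(\frac{3}{3}\right)^2 \approx 0.855
\end{aligned}$$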
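The α-cut computation above can be sketched in code. This version assumes fuzzy sets are dictionaries and that the quantifier `Q` takes a crisp Python set; it iterates over distinct membership levels only, which gives the same sum because repeated levels contribute a weight of zero. The function name is illustrative.

```python
def alpha_cut_fuzzification(Q, X):
    """Sum of (alpha_i - alpha_{i+1}) * Q(alpha-cut of X at alpha_i)."""
    # Distinct positive membership degrees plus alpha_0 = 1, in decreasing order.
    levels = sorted({mu for mu in X.values() if mu > 0.0} | {1.0}, reverse=True)
    total = 0.0
    # Pair each level with the next lower one; the last level is paired with 0.
    for alpha, next_alpha in zip(levels, levels[1:] + [0.0]):
        cut = {e for e, mu in X.items() if mu >= alpha}
        total += (alpha - next_alpha) * Q(cut)
    return total


tall = {"e1": 0.8, "e2": 0.9, "e3": 1.0}

def almost_all(Y):
    # Semi-fuzzy quantifier from the example: q(n) = (n / 3)^2 with n = |Y|.
    return (len(Y) / 3) ** 2

score = alpha_cut_fuzzification(almost_all, tall)
print(round(score, 4))  # about 0.8556; the slide truncates this to 0.855
```

Each α-cut is a nested crisp set, so only as many crisp evaluations of `Q` are needed as there are distinct membership degrees.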

22

New View on Crisp Representatives

• Given a fuzzy set $X \in \tilde{\mathcal{P}}(E)$, the process that selects a number of elements in $E$ to be included in a crisp representative of $X$ can be viewed as a random process in which $n = |E|$ mutually independent binary decisions are made.

• Every individual decision involving an element $e \in E$ may be viewed as a Bernoulli trial whose probability of success equals $\mu_X(e)$.

Note: a random variable $X$ has a Bernoulli distribution with parameter $p$ ($0 < p < 1$) if $X$ takes only the values 0 and 1. The p.f. $f(\cdot \mid p)$ of $X$ can be written in the form

$$f(x \mid p) = \begin{cases} p^x q^{1-x}, & \text{for } x = 0, 1 \\ 0, & \text{otherwise} \end{cases} \qquad q = 1 - p$$

23

New View on Crisp Representatives

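• Definition 4 ($P_X^{\mathrm{Representative}}(Y)$): We define the probability that a crisp set $Y \in \mathcal{P}(E)$ is a crisp representative of $X \in \tilde{\mathcal{P}}(E)$ as

$$P_X^{\mathrm{Representative}}(Y) = \prod_{e \in Y} \mu_X(e) \prod_{e \notin Y} (1 - \mu_X(e))$$

For simplicity, $m_X(Y) = P_X^{\mathrm{Representative}}(Y)$.

• Definition 5 ($F^A$): Let $Q : \mathcal{P}(E)^s \to [0, 1]$ be a semi-fuzzy quantifier. The fuzzification process $F^A$ is defined as

$$F^A(Q)(X_1, \dots, X_s) = \sum_{Y_1 \in \mathcal{P}(E)} \cdots \sum_{Y_s \in \mathcal{P}(E)} m_{X_1}(Y_1) \cdots m_{X_s}(Y_s)\, Q(Y_1, \dots, Y_s)$$

for all $X_1, \dots, X_s \in \tilde{\mathcal{P}}(E)$.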
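Definitions 4 and 5 can be read directly as a brute-force procedure: enumerate every crisp subset, weight it by its probability of being the representative, and average the crisp quantifier. The sketch below does exactly that, assuming fuzzy sets are dictionaries over the same elements. It is exponential in $|E|$ and only meant for small referentials; all function names are illustrative.

```python
from itertools import combinations, product
from math import prod


def crisp_subsets(elements):
    """Yield every subset of the referential as a frozenset."""
    elements = list(elements)
    for size in range(len(elements) + 1):
        for combo in combinations(elements, size):
            yield frozenset(combo)


def representative_probability(X, Y):
    """m_X(Y): product of mu_X(e) for e in Y and 1 - mu_X(e) for e not in Y."""
    return prod(mu if e in Y else 1.0 - mu for e, mu in X.items())


def fa_fuzzification(Q, *fuzzy_sets):
    """F^A(Q)(X_1, ..., X_s) by summing over all tuples of crisp representatives."""
    subsets = list(crisp_subsets(fuzzy_sets[0]))
    total = 0.0
    for Ys in product(subsets, repeat=len(fuzzy_sets)):
        weight = prod(representative_probability(X, Y)
                      for X, Y in zip(fuzzy_sets, Ys))
        total += weight * Q(*Ys)
    return total


tall = {"e1": 0.8, "e2": 0.9, "e3": 1.0}
value = fa_fuzzification(lambda Y: (len(Y) / 3) ** 2, tall)
print(round(value, 3))  # 0.838
```

When every membership degree is 0 or 1, exactly one tuple of subsets has weight 1, so the result coincides with the crisp quantifier applied to the crisp arguments.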

24

New View on Crisp Representatives

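• We will denote by $E^m = \{e_1, \dots, e_m\}$ a referential containing $m$ elements. By $Y \in \mathcal{P}(E^m)$ ($X \in \tilde{\mathcal{P}}(E^m)$) we will denote a crisp (fuzzy) set on this referential (so we have $2^m$ crisp subsets).

• Let us consider a unary semi-fuzzy quantitative quantifier $Q : \mathcal{P}(E^m) \to [0, 1]$ with the form

$$Q(Y) = q(|Y|), \quad Y \in \mathcal{P}(E^m)$$

where $q : \mathbb{N} \to [0, 1]$ is a function of the cardinality of $Y$.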

25

New View on Crisp Representatives

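• For this case, the expression becomes

$$\begin{aligned}
F^A(Q)(X) &= \sum_{Y \in \mathcal{P}(E^m)} m_X(Y)\, Q(Y) \\
&= \sum_{Y \in \mathcal{P}(E^m),\, |Y| = 0} m_X(Y)\, Q(Y) + \cdots + \sum_{Y \in \mathcal{P}(E^m),\, |Y| = m} m_X(Y)\, Q(Y) \\
&= \sum_{Y \in \mathcal{P}(E^m),\, |Y| = 0} m_X(Y)\, q(0) + \cdots + \sum_{Y \in \mathcal{P}(E^m),\, |Y| = m} m_X(Y)\, q(m)
\end{aligned}$$

• And, writing $\Pr(\mathrm{card}_X = j)$ instead of $\sum_{Y \in \mathcal{P}(E^m),\, |Y| = j} m_X(Y)$, we obtain

$$F^A(Q)(X) = \Pr(\mathrm{card}_X = 0)\, q(0) + \cdots + \Pr(\mathrm{card}_X = m)\, q(m) = \sum_{j=0}^{m} \Pr(\mathrm{card}_X = j)\, q(j)$$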

26

New View on Crisp Representatives

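• Example of the approach

Let us again evaluate the quantified sentence “almost all students are tall”, applying the $F^A$ quantification model.

Given the fuzzy set $\mathrm{tall} = \{0.8/e_1, 0.9/e_2, 1/e_3\}$ and the semi-fuzzy quantifier $Q(X) = q(|X|)$ with $q(n) = (n/3)^2$, we first compute the probabilities $\Pr(\mathrm{card}_{\mathrm{tall}} = j)$ for every value of $j$:

$$\begin{aligned}
\Pr(\mathrm{card}_{\mathrm{tall}} = 0) &= \sum_{|Y| = 0} m_{\mathrm{tall}}(Y) = m_{\mathrm{tall}}(\emptyset) = 0.2 \cdot 0.1 \cdot 0 = 0 \\
\Pr(\mathrm{card}_{\mathrm{tall}} = 1) &= m_{\mathrm{tall}}(\{e_1\}) + m_{\mathrm{tall}}(\{e_2\}) + m_{\mathrm{tall}}(\{e_3\}) \\
&= 0.8 \cdot 0.1 \cdot 0 + 0.2 \cdot 0.9 \cdot 0 + 0.2 \cdot 0.1 \cdot 1 = 0.02 \\
\Pr(\mathrm{card}_{\mathrm{tall}} = 2) &= m_{\mathrm{tall}}(\{e_1, e_2\}) + m_{\mathrm{tall}}(\{e_1, e_3\}) + m_{\mathrm{tall}}(\{e_2, e_3\}) \\
&= 0.8 \cdot 0.9 \cdot 0 + 0.8 \cdot 0.1 \cdot 1 + 0.2 \cdot 0.9 \cdot 1 = 0.26 \\
\Pr(\mathrm{card}_{\mathrm{tall}} = 3) &= m_{\mathrm{tall}}(\{e_1, e_2, e_3\}) = 0.8 \cdot 0.9 \cdot 1 = 0.72
\end{aligned}$$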

27

New View on Crisp Representatives

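And then,

$$\begin{aligned}
F^A(Q)(\mathrm{tall}) &= \sum_{j=0}^{3} \Pr(\mathrm{card}_{\mathrm{tall}} = j)\, q(j) \\
&= \Pr(\mathrm{card}_{\mathrm{tall}} = 0)\, q(0) + \Pr(\mathrm{card}_{\mathrm{tall}} = 1)\, q(1) + \Pr(\mathrm{card}_{\mathrm{tall}} = 2)\, q(2) + \Pr(\mathrm{card}_{\mathrm{tall}} = 3)\, q(3) \\
&= 0 \cdot 0 + 0.02 \left(\frac{1}{3}\right)^2 + 0.26 \left(\frac{2}{3}\right)^2 + 0.72 \left(\frac{3}{3}\right)^2 = 0.838
\end{aligned}$$

• It can be proved that all the values $\Pr(\mathrm{card}_X = j)$ can be obtained with a complexity of $O(m^2)$.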
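The slides state the $O(m^2)$ bound without showing an algorithm. One standard way to reach it, offered here as an assumption rather than as the authors' implementation, is to add the elements one at a time and update the distribution of the number of successes of independent Bernoulli trials. Function names are illustrative.

```python
def cardinality_distribution(memberships):
    """Return [Pr(card_X = 0), ..., Pr(card_X = m)] in O(m^2) time."""
    probs = [1.0]  # with no elements considered yet, the cardinality is 0
    for mu in memberships:
        updated = [0.0] * (len(probs) + 1)
        for j, p in enumerate(probs):
            updated[j] += p * (1.0 - mu)   # element left out of the representative
            updated[j + 1] += p * mu       # element included in the representative
        probs = updated
    return probs


def fa_quantitative(q, memberships):
    """F^A(Q)(X) for a unary quantitative quantifier Q(Y) = q(|Y|)."""
    distribution = cardinality_distribution(memberships)
    return sum(p * q(j) for j, p in enumerate(distribution))


tall = [0.8, 0.9, 1.0]
distribution = cardinality_distribution(tall)
value = fa_quantitative(lambda j: (j / 3) ** 2, tall)
print([round(p, 2) for p in distribution])  # [0.0, 0.02, 0.26, 0.72]
print(round(value, 3))                      # 0.838
```

Each of the $m$ elements triggers one pass over at most $m + 1$ entries, which is where the quadratic cost comes from, in contrast to the $2^m$ subsets visited by a direct enumeration.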

28

New View on Crisp Representatives

• We can advance that the model is well-behaved because it fulfills the properties of correct generalization of crisp expressions, induced operations, external negation, internal negation, duality, internal meets, monotonicity in arguments, monotonicity in quantifiers, and coherence with logic.

29

Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval

• IR is the science concerned with the effective and efficient retrieval of information for the subsequent use by interested parties.

• IR models differ in the way in which documents and queries are represented and matched.

• The proposal designs a general framework based on the NVM method in which quantifiers with different degrees of expressiveness can be handled.

30

Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval

• Consider a query with the form $\mathrm{all}(qt_1, \dots, qt_n)$. Given a document $d_k$ of the document base, every query term $qt_i$ produces a score which represents the connection between the document’s semantics and the term.

• Formally, every document $d_k$ induces a fuzzy set $C_{d_k}$ on the set of query terms, which is defined by applying the popular tf/idf weighting strategy:

$$C_{d_k} = \{\mu_{C_{d_k}}(qt_1)/qt_1, \dots, \mu_{C_{d_k}}(qt_n)/qt_n\}$$

$$\mu_{C_{d_k}}(qt_i) = \frac{f_{qt_i,k}}{\max_z f_{z,k}} \cdot \frac{idf(qt_i)}{\max_l idf(qt_l)}$$

$f_{qt_i,k}$: the raw frequency of term $qt_i$ in the document $d_k$

$\max_z f_{z,k}$: the maximum raw frequency computed over all terms mentioned by the document $d_k$
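A minimal sketch of how a document might induce the fuzzy set $C_{d_k}$ under the tf/idf weighting described above. The slides do not say how idf is computed, so idf values are taken as given; the terms and numbers are made up for illustration, and `induced_fuzzy_set` is a hypothetical helper name.

```python
def induced_fuzzy_set(query_terms, term_frequencies, idf):
    """Membership of each query term: normalised tf times normalised idf.

    term_frequencies: raw frequency of every term in the document.
    idf: idf value of every query term (assumed to be precomputed).
    """
    max_frequency = max(term_frequencies.values())
    max_idf = max(idf[t] for t in query_terms)
    return {
        t: (term_frequencies.get(t, 0) / max_frequency) * (idf[t] / max_idf)
        for t in query_terms
    }


# Hypothetical document and idf table, for illustration only.
document = {"fuzzi": 4, "quantifi": 2, "model": 1}
idf_table = {"fuzzi": 2.0, "quantifi": 1.5, "retriev": 1.0}
query = ["fuzzi", "quantifi", "retriev"]

c_dk = induced_fuzzy_set(query, document, idf_table)
print(c_dk)  # {'fuzzi': 1.0, 'quantifi': 0.375, 'retriev': 0.0}
```

Both factors are divided by their maximum, so every membership degree falls in $[0, 1]$ and a query term that is absent from the document gets degree 0.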

31

Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval

• The fuzzy set $C_{d_k}$ models the connection between the document $d_k$ and every query component.

• Quantification can now be applied on $C_{d_k}$ for evaluating the quantified symbol all.

32

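Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval

• Example: Let us suppose that we apply the following power function for supporting a given query quantification symbol $Q$:

$$Q_s(X) = pq(|X|), \qquad pq(x) = \frac{x^2}{n^2} \qquad (n: \text{the number of query terms})$$

Imagine a query $Q(qt_1, qt_2, qt_3, qt_4)$ and consider a document $d_k$ whose fuzzy set induced on the query components is

$$C_{d_k} = \{0.7/qt_1, 0.3/qt_2, 0/qt_3, 0.2/qt_4\}$$

Applying now the fuzzification process explained along this paper, the query-document matching is assigned a score

$$\begin{aligned}
&\Pr(\mathrm{card}_{C_{d_k}} = 0)\, pq(0) + \Pr(\mathrm{card}_{C_{d_k}} = 1)\, pq(1) + \Pr(\mathrm{card}_{C_{d_k}} = 2)\, pq(2) + \Pr(\mathrm{card}_{C_{d_k}} = 3)\, pq(3) + \Pr(\mathrm{card}_{C_{d_k}} = 4)\, pq(4) \\
&= (0.3 \cdot 0.7 \cdot 0.8) \cdot 0 + (0.7 \cdot 0.7 \cdot 0.8 + 0.3 \cdot 0.3 \cdot 0.8 + 0.3 \cdot 0.7 \cdot 0.2) \left(\frac{1}{4}\right)^2 \\
&\quad + (0.7 \cdot 0.3 \cdot 0.8 + 0.7 \cdot 0.7 \cdot 0.2 + 0.3 \cdot 0.3 \cdot 0.2) \left(\frac{2}{4}\right)^2 + (0.7 \cdot 0.3 \cdot 0.2) \left(\frac{3}{4}\right)^2 \\
&= 0.12625
\end{aligned}$$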

33

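Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval

Let us now apply the NVM approach to handle the same example.

The score assigned is equal to

$$\begin{aligned}
\sum_{i=0}^{4} (\alpha_i - \alpha_{i+1})\, Q_s\big((C_{d_k})_{\geq \alpha_i}\big) &= 0.3 \cdot Q_s(\emptyset) + 0.4 \cdot Q_s(\{qt_1\}) + 0.1 \cdot Q_s(\{qt_1, qt_2\}) \\
&\quad + 0.2 \cdot Q_s(\{qt_1, qt_2, qt_4\}) + 0 \cdot Q_s(\{qt_1, qt_2, qt_3, qt_4\})
\end{aligned}$$

It follows that the final value yielded by the NVM method is:

$$0.4 \left(\frac{1}{4}\right)^2 + 0.1 \left(\frac{2}{4}\right)^2 + 0.2 \left(\frac{3}{4}\right)^2 = 0.1625$$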

34

Information Retrieval Experiment

• We ran experiments against the Wall Street Journal (WSJ) documents, which are about 173,000 news articles (from 1987 to 1992).

• Natural language documents are preprocessed as follows:

– First, common words such as prepositions, articles, etc. are eliminated.

– Second, terms are reduced to their syntactical root by applying the popular Porter’s stemmer.

35

Information Retrieval Experiment

• We tried out different semi-fuzzy quantifiers for relaxing the interpretation of the quantified statement all and, for each semi-fuzzy quantifier, both the $F^A$ fuzzification approach and the NVM approach were applied.

• We experimented with power functions and exponential functions, both of them normalized in the interval $[0, 1]$ as follows:

$$PQ(X) = pq(|X|), \qquad pq(x) = \frac{x^{exp}}{n^{exp}}$$

$$EQ(X) = eq(|X|), \qquad eq(x) = \frac{e^{k \cdot x} - 1}{e^{k \cdot n} - 1}$$
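The two quantifier families can be sketched as function factories. This assumes the normalised forms $pq(x) = x^{exp} / n^{exp}$ and $eq(x) = (e^{kx} - 1) / (e^{kn} - 1)$ with $k > 0$, so that both map 0 to 0 and $n$ to 1. The parameter values below are arbitrary and are not the settings used in the experiments.

```python
import math


def power_quantifier(n, exp):
    """pq(x) = x^exp / n^exp for a query with n terms."""
    return lambda x: (x ** exp) / (n ** exp)


def exponential_quantifier(n, k):
    """eq(x) = (e^(k*x) - 1) / (e^(k*n) - 1); assumes k > 0."""
    return lambda x: (math.exp(k * x) - 1.0) / (math.exp(k * n) - 1.0)


pq = power_quantifier(n=4, exp=2)
eq = exponential_quantifier(n=4, k=1.0)

for matched in range(5):
    # Degree to which "all" is considered satisfied when `matched` of 4 terms match.
    print(matched, round(pq(matched), 4), round(eq(matched), 4))
```

Larger values of `exp` or `k` keep the curve close to 0 until nearly all query terms match, which moves the interpretation closer to a strict "all". Smaller values relax it.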


42

Conclusion

• In the paper, we present a new probabilistic quantifier fuzzification mechanism, its efficient implementation and its application for the basic information retrieval task.