Incorporating Game Theory in Feature Selection for Text Categorization

Nouman Azam and JingTao Yao

Department of Computer Science, University of Regina, Canada S4S 0A2

[email protected] [email protected]

http://www.cs.uregina.ca/~azam200n http://www.cs.uregina.ca/~jtyao


Acknowledgement

• Thanks to Dr. Dominik Slezak for presenting this work on our behalf.

J T Yao Incorporating Game Theory in Feature Selection for TC 2


Introduction

• Feature selection.
  – Selecting a subset of important features.
• Text categorization.
  – Assigning textual documents to predefined categories.
• Text categorization and high imbalance.
  – The number of instances in the categories varies significantly.
  – The importance of features varies accordingly.
  – It is hard to apply feature selection techniques directly.


Feature Selection in Text Categorization

• Assigning positive or negative values to features.
  – The values indicate the importance of features.
  – Positive values indicate importance for the positive category.
  – Negative values indicate importance for the negative category.


Existing Feature Selection Approaches

• One-sided approaches.
  – Selecting features with high positive values.
• Two-sided approaches.
  – Selecting features with high absolute values.
• Explicit combinational approach.
  – Selecting features with high positive or negative values generated by a one-sided method.


Limitations of Existing Approaches

• Existing approaches favour features indicative of either the positive or the negative category.
  – There may be features that indicate both categories.
  – It is plausible to include such features in some applications.
• Dilemma: positive features vs. negative features.
• We nevertheless need a way to select such features.
  – Game theory is incorporated into feature selection to deal with this issue.


Incompetence of Existing Approaches

• An example.
  – Consider an imbalanced data set with 10 documents in the positive category and 100 in the negative category.
  – There are eight words in these documents.
• Considering four methods.
  – One-sided approaches: correlation coefficient and GSS coefficient.
  – Two-sided approaches: chi-square and Gini index.
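The slides name the four methods without stating their formulas. As a rough sketch, the snippet below computes three of them from per-word document counts, using the definitions commonly given in the text categorization literature; these definitions are an assumption here, not something the slides confirm. A and B are the numbers of positive and negative documents containing the word, C and D the numbers that do not. The Gini index is omitted because several variants are in use. The function name and the example counts are hypothetical.

```python
import math


def contingency_scores(A, B, C, D):
    """Score one word from its document counts (assumes A + B + C + D > 0).

    A, B: documents containing the word in the positive / negative category.
    C, D: documents lacking the word in the positive / negative category.
    """
    N = A + B + C + D
    diff = A * D - C * B
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    if denom == 0:
        # Word occurs in every document or in none: no discriminative signal.
        chi2 = cc = 0.0
    else:
        chi2 = N * diff ** 2 / denom                 # two-sided: never negative
        cc = math.sqrt(N) * diff / math.sqrt(denom)  # one-sided: keeps the sign
    gss = diff / N ** 2                              # one-sided: keeps the sign
    return {"cc": cc, "gss": gss, "chi2": chi2}


# Hypothetical counts for a 10-positive / 100-negative collection.
positive_word = contingency_scores(A=8, B=5, C=2, D=95)
negative_word = contingency_scores(A=0, B=60, C=10, D=40)
print(positive_word)
print(negative_word)
```

With these definitions a word tied to the negative category receives negative one-sided scores but a positive chi-square score, which is the one-sided versus two-sided distinction drawn above.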


Probabilities of Words in Categories

• Meaning of probabilities.
  – Referring to the fraction of documents from a category containing the word.


Scores of Words


Rankings of Words

• Observations.
  – w7 and w8 are not considered important by any method.
  – They will be ignored if we select three features.


A Simple Solution

• Using an explicit combinational approach.
  – Probabilities in the respective categories are used for rankings.
  – The new rankings.
  – Considering the positive category twice as important as the negative category.
• We may select w1, w8 and w4.
• We note that w8, which indicates both categories, is selected.
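One plausible reading of "twice as important" is a 2:1 quota: take two words from the positive ranking for every one from the negative ranking. The sketch below follows that reading, which is an assumption. The function name, the quota parameters and the probability values are all made up for illustration and are not the figures from the slides.

```python
def explicit_combination(p_pos, p_neg, n_pos, n_neg):
    """Pick n_pos words by positive-category probability, then fill up
    with n_neg further words taken in negative-category order."""
    rank_pos = sorted(p_pos, key=p_pos.get, reverse=True)
    rank_neg = sorted(p_neg, key=p_neg.get, reverse=True)
    selected = rank_pos[:n_pos]
    for word in rank_neg:
        if len(selected) == n_pos + n_neg:
            break
        if word not in selected:  # skip words already taken from the positive ranking
            selected.append(word)
    return selected


# Hypothetical probabilities: w8 is frequent in both categories.
p_pos = {"w1": 0.9, "w8": 0.8, "w7": 0.7, "w4": 0.1}
p_neg = {"w4": 0.9, "w8": 0.8, "w7": 0.7, "w1": 0.05}
print(explicit_combination(p_pos, p_neg, n_pos=2, n_neg=1))  # ['w1', 'w8', 'w4']
```

The quota has to be chosen by hand, which is the kind of ad hoc decision a systematic method would avoid.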


Conclusion from the Simple Solution

• A feature may be considered good for
  – the positive category,
  – the negative category,
  – both of them, or
  – neither of them.
• We are trying to find a systematic method that finds the best decision choice.
• Game theory may be useful for formulating such a method.


Game Theory

• Game theory is a core subject in decision sciences.
  – Prisoner's Dilemma.
    • A classical example in game theory.
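For readers unfamiliar with the Prisoner's Dilemma, a minimal sketch of the game and of a brute-force search for its pure-strategy Nash equilibria follows. The payoff values are the usual textbook ones (years in prison, written as negative numbers) and are an assumption; the function and constant names are hypothetical.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# PAYOFFS[(row_action, column_action)] = (row_payoff, column_payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}


def pure_nash_equilibria(payoffs, actions):
    """Return every action pair from which neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        u_row, u_col = payoffs[(a, b)]
        row_ok = all(payoffs[(alt, b)][0] <= u_row for alt in actions)
        col_ok = all(payoffs[(a, alt)][1] <= u_col for alt in actions)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))  # [('defect', 'defect')]
```

Mutual defection is the only equilibrium even though mutual cooperation would leave both players better off, which is what makes the example a dilemma.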


Feature Selection with Game Theory

• Formulating a problem with game theory requires us to
  – identify the player set,
  – identify the strategy set,
  – determine the payoff functions, and
  – implement a competition.


The Player Set

• Two players were selected.
• The players represent the positive and negative categories.
  – The player C+ represents the positive category.
  – The player C- represents the negative category.
• Each player determines the features' utility for its respective category.


The Strategy Set

• Two actions were formulated for each player.
  – Action a1 for keeping a feature.
  – Action a2 for discarding a feature.
• For differentiating the actions of the two players,
  – [symbols not preserved in transcript] denote the actions of C+.
  – [symbols not preserved in transcript] denote the actions of C-.


The Payoff Functions

• Notation for a payoff function.
  – The payoff of player i performing action j, given action k of the opponent, is denoted as [symbol not preserved in transcript].
• The payoff sets.


Defining the Payoff Functions

• Let cat+ and cat- represent the positive and negative categories.
  – A and B represent the numbers of documents from cat+ and cat- that contain word w.
  – C and D represent the numbers of documents from cat+ and cat- that do not contain w.
• The conditional probabilities of w in cat+ and cat- are

  $P(w \mid cat^{+}) = \frac{A}{A + C}$ and $P(w \mid cat^{-}) = \frac{B}{B + D}$.
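Since each probability is the fraction of a category's documents that contain w, the counts give A / (A + C) for the positive category and B / (B + D) for the negative one. A minimal sketch, with a hypothetical function name and made-up counts for the 10-versus-100 document example:

```python
def conditional_probabilities(A, B, C, D):
    """Return (P(w | cat+), P(w | cat-)) from document counts.

    A, B: documents containing w in the positive / negative category.
    C, D: documents not containing w in the positive / negative category.
    Assumes both categories are non-empty.
    """
    p_pos = A / (A + C)
    p_neg = B / (B + D)
    return p_pos, p_neg


# Hypothetical word found in 8 of 10 positive and 5 of 100 negative documents.
p_pos, p_neg = conditional_probabilities(A=8, B=5, C=2, D=95)
print(p_pos, p_neg)
```

Normalising within each category keeps the much larger negative category from dominating the comparison.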


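Payoff Functions for Players

• Both players deciding to keep a feature.
  – The payoffs of the players are calculated as an average: [formula not preserved in transcript].
• Both players deciding to discard a feature.
  – The payoffs are calculated as [formula not preserved in transcript].
• C+ deciding to keep while C- discards.
  – The payoffs are [formula not preserved in transcript] and [formula not preserved in transcript] respectively.
• C+ deciding to discard while C- keeps.
  – The payoffs are [formula not preserved in transcript] and [formula not preserved in transcript] respectively.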


Action Scenarios for Players


Implementing Competition

• Representing the game in a payoff table.
  – Determining the Nash equilibrium to find the actions of the players.
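The slides use the Nash equilibrium without defining it. The standard pure-strategy definition for this two-player game can be written as below. The payoff symbols u_+ and u_- are notation assumed for this note only, not the notation of the slides.

```latex
% Assumed notation (not taken from the slides):
% u_{+}(a, b) and u_{-}(a, b) are the payoffs of C+ and C- when
% C+ plays a and C- plays b, with a, b \in \{a_1, a_2\}.
(a^{*}, b^{*}) \text{ is a pure-strategy Nash equilibrium} \iff
\begin{cases}
u_{+}(a^{*}, b^{*}) \ge u_{+}(a, b^{*}) & \text{for all } a \in \{a_1, a_2\}, \\
u_{-}(a^{*}, b^{*}) \ge u_{-}(a^{*}, b) & \text{for all } b \in \{a_1, a_2\}.
\end{cases}
```

In words, a cell of the payoff table is an equilibrium when neither category can raise its own payoff by switching between keeping and discarding while the other's action stays fixed.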


Selected Features Set

• Defining two feature sets.
  – FS+ as the set of features representing the positive category.
  – FS- as the set of features representing the negative category.
• The game will determine the inclusion or exclusion of features in these sets.
  – The final selected feature set is the union of FS+ and FS-.
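A minimal sketch of how equilibrium actions could be turned into the two sets. It assumes that a feature joins FS+ when C+ plays the keep action a1 and joins FS- when C- plays a1. The per-word action pairs are inferred from the result sets reported in the demonstrative example later in the slides, not read from the payoff tables, and the pair for w2 is purely hypothetical. The function name is also hypothetical.

```python
KEEP, DISCARD = "a1", "a2"


def build_feature_sets(equilibrium_actions):
    """Map {word: (action of C+, action of C-)} to (FS+, FS-, FS)."""
    fs_pos = {w for w, (pos, _) in equilibrium_actions.items() if pos == KEEP}
    fs_neg = {w for w, (_, neg) in equilibrium_actions.items() if neg == KEEP}
    return fs_pos, fs_neg, fs_pos | fs_neg


# Inferred equilibrium actions; w2 stands in for a word both players discard.
actions = {
    "w1": (KEEP, DISCARD),
    "w2": (DISCARD, DISCARD),
    "w4": (DISCARD, KEEP),
    "w7": (KEEP, KEEP),
    "w8": (KEEP, KEEP),
}
fs_pos, fs_neg, fs = build_feature_sets(actions)
print(sorted(fs_pos), sorted(fs_neg), sorted(fs))
```

A word that both players keep appears in both sets, so the union retains features that indicate both categories.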


A Demonstrative Example

• Consider the earlier example.


Payoff Tables for Words

• The bold cells represent the Nash equilibrium.
  – Considering w1.
    • The actions of the players in equilibrium are [symbol not preserved in transcript] for C+ and [symbol not preserved in transcript] for C-.
    • These actions decide to include w1 in FS+.


Payoff Tables for Words


Selected Features

• Result of implementing the game for the features.
  – FS+ = {w1, w7, w8} and FS- = {w4, w7, w8}.
  – FS = {w1, w4, w7, w8}.
• Observation.
  – The words w7 and w8 are selected.
  – The suggested approach selects features that indicate both categories.


Conclusion

• Limitations of existing approaches.
  – Preference is given to features indicating the positive or the negative category.
    • They may not be suitable for selecting features indicating both categories.
• Game theory based method.
  – Implements a game between categories.
• Importance of the method.
  – Useful in selecting features indicating the positive category, the negative category or both of them.