Multiple Classifier Systems for Adversarial Classification Tasks
DESCRIPTION
Pattern classification systems are currently used in security applications such as intrusion detection in computer networks, spam filtering and biometric identity recognition. These are adversarial classification problems, since the classifier faces an intelligent adversary who adaptively modifies patterns (e.g., spam e-mails) to evade it. In these tasks the goal of a classifier is to attain both high classification accuracy and high hardness of evasion, but this issue has not yet been deeply investigated in the literature. We address it from the viewpoint of the choice of the architecture of a multiple classifier system. We propose a measure of the hardness of evasion of a classifier architecture, give an analytical evaluation and comparison of an individual classifier and a classifier ensemble architecture, and finally report an experimental evaluation on a spam filtering task.
TRANSCRIPT
Multiple Classifier Systems for Adversarial Classification Tasks
Battista Biggio, Giorgio Fumera and Fabio Roli
Dept. of Electrical and Electronic Eng., University of Cagliari
Overview
Adversarial classification
An approach to evaluate the hardness of evasion
Comparison of classifier architectures: single classifier vs MCS
− analytical comparison
− experimental comparison
Traditional pattern recognition problems
[Diagram: physical / logical process → feature measurement → classification]
Adversarial classification problems
[Diagram: a physical / logical process generates legitimate samples, while an adversary generates malicious samples; both feed into feature measurement → classification]
Adversarial classification:previous works
Not related to concept drift
Analysis of specific vulnerabilities, proposal of specific defence strategies
− Globerson and Roweis, ICML 2006
− Perdisci et al., ICDM 2006
− Jorgensen et al., JMLR 9, 2008
− Wittel and Wu, CEAS 2004
− Lowd and Meek, CEAS 2005
Theoretical frameworks
− Dalvi et al., KDD 2004
− Lowd and Meek, KDD 2005
Design of pattern recognition systems
Goal in “traditional” applications: maximise accuracy
[Pipeline: data acquisition → feature extraction → model selection → classification]
Goal in adversarial classification tasks: maximise accuracy and hardness of evasion
[Pipeline: data acquisition → feature extraction → model selection → classification]
Hardness of evasion
[Diagram: linear decision function — weighted sum of binary features x1, …, xn compared with a threshold th; output y ∈ {malicious, legitimate}: sum − th ≥ 0 → malicious, < 0 → legitimate]
Hardness of evasion: the expected value of the minimum number of features the adversary has to modify to evade the classifier (worst case: the adversary has full knowledge of the classifier).
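Under this worst-case assumption the inner quantity can be computed exactly for a linear decision function over binary features: each feature flip changes the score independently, so greedily flipping the features that reduce the score the most is optimal. A minimal sketch (the function name is illustrative, not from the paper):

```python
def min_flips_to_evade(x, w, th):
    """Minimum number of binary feature flips needed to turn a
    malicious decision (score >= th) into a legitimate one (score < th),
    for a linear classifier with score = sum_i w[i] * x[i].

    Exact for linear classifiers: flips contribute independently,
    so taking the largest score reductions first is optimal.
    """
    score = sum(wi for wi, xi in zip(w, x) if xi)
    if score < th:
        return 0  # already classified as legitimate
    # score reduction obtained by flipping each feature:
    # 1 -> 0 on a positive weight, or 0 -> 1 on a negative weight
    reductions = sorted(
        [wi for wi, xi in zip(w, x) if xi and wi > 0]
        + [-wi for wi, xi in zip(w, x) if not xi and wi < 0],
        reverse=True,
    )
    flips = 0
    for r in reductions:
        score -= r
        flips += 1
        if score < th:
            return flips
    return None  # the classifier cannot be evaded by feature flips
```

Averaging this value over malicious samples gives the hardness-of-evasion measure.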
Hardness of evasion: an example
[Diagram: linear classifier with weights w = (0.3, 0.8, 3.0, 1.5, 1.0) and threshold th = 2; input x = (1 1 0 1 0), score 0.3 + 0.8 + 1.5 = 2.6 ≥ th → malicious. Flipping x4 to 0 lowers the score to 1.1 < th, so one feature modification suffices to evade.]
Hardness of evasion: an example
[Diagram: linear classifier with weights w = (0.3, 0.8, 3.0, 1.5, 1.0) and threshold th = 2; input x = (0 1 1 0 0), score 0.8 + 3.0 = 3.8 ≥ th → malicious. Flipping x3 to 0 lowers the score to 0.8 < th, so again one feature modification suffices to evade.]
Comparison of two classifier architectures
[Diagram: single linear classifier — binary features x1, x2, …, xn (xi ∈ {0,1}) over the feature set X, weights w1, w2, …, wn, threshold t]
Comparison of two classifier architectures
[Diagram: MCS — the feature set X is partitioned into disjoint subsets X1, X2, …, XN (X1 ∪ X2 ∪ … ∪ XN = X, Xi ∩ Xj = ∅ for i ≠ j); one linear classifier with threshold tk is trained on each subset Xk, and the N decisions are combined with a logical OR]
Simplifying assumptions: x1, x2, …, xn i.i.d., identical weights, t1 = t2 = … = tN, |Xi| = n/N
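Since the OR combiner labels a pattern as malicious if any sub-classifier fires, the adversary must evade every sub-classifier at once; with disjoint feature subsets the per-classifier minima simply add up. A minimal sketch of this comparison (the subset split and thresholds in the usage below are hypothetical, chosen only for illustration):

```python
def min_flips_linear(x, w, th):
    # exact greedy minimum number of binary feature flips needed to
    # push the score sum(w[i] * x[i]) below the threshold th
    score = sum(wi for wi, xi in zip(w, x) if xi)
    reductions = sorted(
        [wi for wi, xi in zip(w, x) if xi and wi > 0]
        + [-wi for wi, xi in zip(w, x) if not xi and wi < 0],
        reverse=True,
    )
    flips = 0
    for r in reductions:
        if score < th:
            break
        score -= r
        flips += 1
    return flips if score < th else None


def min_flips_or_ensemble(x, w, subsets, thresholds):
    # OR combiner: malicious if ANY sub-classifier fires, so the
    # adversary must evade all of them; subsets are disjoint, so the
    # per-classifier minima add up
    total = 0
    for Xk, tk in zip(subsets, thresholds):
        flips = min_flips_linear([x[j] for j in Xk], [w[j] for j in Xk], tk)
        if flips is None:
            return None  # some sub-classifier cannot be evaded
        total += flips
    return total
```

For example, with the slide's weights w = (0.3, 0.8, 3.0, 1.5, 1.0) and x = (1 1 0 1 0), the single classifier (th = 2) can be evaded with one flip, while an OR ensemble over the (hypothetical) subsets {x1, x2} and {x3, x4, x5} with thresholds 1 each requires two.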
Comparison of two classifier architectures
[Plot: analytical comparison for p1A = 0.25, p1L = 0.15]
Details are in the paper
Comparison of two classifier architectures
ROC working point: min (C×FP + FN), C = 1, 2, 10, 100
[Plot: comparison at the working points obtained for C = 1, 2, 10, 100]
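The working point can be selected by sweeping the decision threshold and minimising the expected cost C·FP + FN, where larger C penalises false positives more heavily. A minimal sketch (function and variable names are illustrative):

```python
def pick_threshold(scores_legit, scores_spam, C):
    """Choose the decision threshold minimising C * FP + FN,
    where FP / FN are the false positive / false negative rates."""
    candidates = sorted(set(scores_legit) | set(scores_spam))
    best_cost, best_th = None, None
    for th in candidates:
        fp = sum(s >= th for s in scores_legit) / len(scores_legit)
        fn = sum(s < th for s in scores_spam) / len(scores_spam)
        cost = C * fp + fn
        if best_cost is None or cost < best_cost:
            best_cost, best_th = cost, th
    return best_th
```

Raising C moves the chosen threshold up: false positives become relatively more expensive, so the filter accepts more false negatives in exchange.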
Experimental set-up
SpamAssassin filter (open source)
− linear classifier: weighted sum of about n = 900 binary-valued (0/−1 or 0/1) features (tests)
TREC 2007 e-mail data set
− 25,220 legitimate, 50,199 spam (April–July 2007)
Classifier architectures
− linear classifier: standard SpamAssassin (linear SVM for weight computation)
− MCS: logical OR of N linear SVM classifiers (N = 3, 10) trained on disjoint feature subsets (identical size, random subdivision)
− working point: minimize FN, subject to FP ≤ 1%
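The random, equal-size feature subdivision used for the MCS can be sketched as follows (the function name and seed handling are illustrative, not from the paper):

```python
import random


def split_features(n_features, n_classifiers, seed=0):
    """Randomly partition feature indices into disjoint subsets of
    identical size, one per base classifier of the MCS."""
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    size = n_features // n_classifiers
    return [sorted(indices[k * size:(k + 1) * size])
            for k in range(n_classifiers)]
```

Each of the N subsets is then used to train one linear SVM, and the N binary decisions are combined with a logical OR.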
Experimental results
Conclusions
Adversarial classification tasks: accuracy and hardness of evasion
An approach for evaluating the hardness of evasion of decision functions
Multiple Classifier Systems: potentially useful to improve the hardness of evasion