TRANSCRIPT
Robustness of IDS based on adversarial machine learning
Paul Peseux
Supervisor: Gregory Blanc
Telecom SudParis
August 28, 2019
Outline
1 Intrusion Detection System
2 NovGAN
3 SWAGAN
4 Conclusion
Intrusion Detection System
A system that has to detect intrusions
It raises alerts but does not take any action. Alerts are stored, in a database for example.
It is a binary classifier
Intrusion Detection System
Figure: IDS example
Intrusion Detection System
To detect intrusions, one can use:
Rule-based approaches
Statistical approaches
−→ Machine Learning is the current hype to train the classifier
Random Forests [Farnaaz 2016]
Decision Trees [Hota 2014]
Artificial Neural Networks [Ingre 2015]
. . .
GANs
GANs appeared in 2014 [Goodfellow 2014]. There has been a lot of work on them:
[Radford 2016]
[Tolstikhin 2017]
[Arjovsky 2017]
This Person Does Not Exist
CycleGAN [Zhu 2017]
. . .
GANs
Figure: GAN representation. Illustration found in [Spi]
Outline
1 Intrusion Detection System
2 NovGAN
3 SWAGAN
4 Conclusion
Generator Modification
Attacker G objectives:
Evade the IDS
Hurt (DDoS, MITM, Phishing, ...)
Generating very realistic traffic is not good!
Generator Modification
Classical GAN [Goodfellow 2014]. Learning can be seen as a min-max game between 2 players:

min_G max_D E_{x∼P_d}[log D(x)] + E_{z∼P_z}[log(1 − D(G(z)))]
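As a sanity check of this objective, here is a minimal numpy sketch (not from the slides) that evaluates both players' empirical losses from the discriminator's outputs; the function names are illustrative:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D maximizes E[log D(x)] + E[log(1 - D(G(z)))]; we minimize the negative.
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # Non-saturating generator loss used later in the slides: -E[log D(G(z))].
    return -np.mean(np.log(d_fake))
```

With d_real = [0.9, 0.8] and d_fake = [0.2, 0.1], the discriminator loss is about 0.33 and the generator loss about 1.96; the generator lowers its loss by pushing D(G(z)) toward 1.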
Generator Modification
Definition
Let's call hurting function any C∞ map M such that
M : T → R, t ↦ M(t)
where 0 ≤ M ≤ 1.
G's hurting objective is measured with E_{P_z}[M(G(z))]
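As a toy illustration (not from the slides), a hurting function can squash any raw severity score into [0, 1] with a sigmoid, and the hurting objective is then its empirical mean over generated samples:

```python
import math

def hurting(t):
    # Toy hurting function M: C-infinity and bounded, 0 <= M(t) <= 1.
    return 1.0 / (1.0 + math.exp(-t))

def hurting_objective(generated):
    # Empirical estimate of G's hurting objective E_{P_z}[M(G(z))].
    return sum(hurting(t) for t in generated) / len(generated)
```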
Generator Modification
From the Generator point of view [Goodfellow 2014]:
loss_G = −E_{x∼N(0,1)}[log D(G(x))]
A first idea is
loss_G = E_{x∼N(0,1)}[log D(G(x))] + E_{x∼N(0,1)}[M(G(x))]
Proposed (in a different context) in [Marriott 2018]
This gives poor results on our datasets
One has to link those objectives
Generator Modification
Another way to link those two objectives is a weighted average.
The weight depends on hurting, with L_{D,G}(x) = −log D(G(x)):
loss_G = E_{x∼N(0,1)}[M(G(x)) L_{D,G}(x) + (1 − M(G(x)))(α L_{D,G}(x) + offset)]
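A minimal numpy sketch of this weighted loss, assuming the discriminator outputs and hurting values have already been computed for a batch (the function name is illustrative):

```python
import numpy as np

def novgan_generator_loss(d_fake, m_fake, alpha, offset):
    # d_fake: D(G(x)) in (0, 1); m_fake: hurting values M(G(x)) in [0, 1].
    L = -np.log(d_fake)  # L_{D,G}(x) = -log D(G(x))
    # Hurting samples get the plain GAN loss; non-hurting ones a rescaled,
    # offset version, so realism alone is penalized.
    weighted = m_fake * L + (1.0 - m_fake) * (alpha * L + offset)
    return float(np.mean(weighted))
```

When M(G(x)) = 1 everywhere the loss reduces to the classical −E[log D(G(x))]; when M(G(x)) = 0 it becomes α·L + offset, which is what links the two objectives.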
Generator Modification
Figure: View of the new cost function in 3D. Representation not to scale
Results on MNIST
Figure: Generated Images without modification
Results on MNIST
Figure: Generated images with α = 1 and offset = 10.2
Results on MNIST
Figure: Generated images with α = 2.5 and offset = 10.3
Results on MNIST
Figure: In blue, hurting distribution of the test data. In orange, hurting distribution of generated data
Results on NSL-KDD
On network traffic, hurting is difficult to define.
We extended the Common Vulnerability Scoring System (CVSS) score to the NSL-KDD dataset.
Exploitability = 20 × AccessVector × AttackComplexity × Authentication
Impact = 10.4 × (1 − (1 − ConfImpact) × (1 − IntegImpact) × (1 − AvailImpact))
f(Impact) = 0 if Impact = 0, 1.176 otherwise
BaseScore = (0.6 × Impact + 0.4 × Exploitability − 1.5) × f(Impact)
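The formulas above translate directly into code; this sketch follows the slide's equations, with CVSS-style metric weights (e.g. 0.71 for medium complexity) used only as example inputs:

```python
def cvss_base_score(access_vector, attack_complexity, authentication,
                    conf_impact, integ_impact, avail_impact):
    # Base score as defined on the slide, following the CVSS v2 structure.
    exploitability = 20 * access_vector * attack_complexity * authentication
    impact = 10.4 * (1 - (1 - conf_impact) * (1 - integ_impact) * (1 - avail_impact))
    f = 0.0 if impact == 0 else 1.176
    return (0.6 * impact + 0.4 * exploitability - 1.5) * f
```

The f(Impact) factor zeroes out the score when there is no impact at all, even though exploitability may be high.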
Results on NSL-KDD
Figure: Hurting value distribution on NSLKDD
Surprisingly, it is a decent classifier.
Warning: the hurting function is highly data-specific.
Limits on NovGAN
No theoretical proof, unlike major papers [Goodfellow 2014] [Arjovsky 2017] [Srivastava 2017]
Some small results on a toy example
Figure: Nash equilibrium
Outline
1 Intrusion Detection System
2 NovGAN
3 SWAGAN
4 Conclusion
SWAGAN
Definition
Mode Collapse is a situation that sometimes happens during GAN training: the Generator stays stuck in a specific mode of the real data distribution and the gradient stays at 0.
Figure: Mode Collapse
SWAGAN
Input: n, p, s
Initialization: create n duos
while n > 1 do
    for duo in duos do
        train duo for p epochs
    end
    i = 0, g⃗ = 0⃗, d⃗ = 0⃗
    while i < s do
        shuffle duos
        g⃗ += perf(generators)
        d⃗ += perf(discriminators)
        i += 1
    end
    remove worst generator: argmin(g⃗)
    remove worst discriminator: argmin(d⃗)
    shuffle duos
end
Output: trained Generator and Discriminator
Algorithm 1: SWAGAN algorithm
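The selection loop above can be sketched in Python; `train(g, d, epochs)` and `perf(network) -> float` (higher is better) are assumed user-supplied stand-ins for actual GAN training and evaluation:

```python
import random

def swagan(generators, discriminators, p, s, train, perf):
    # Sketch of the SWAGAN loop: train duos, score networks over s
    # shuffles, then drop the worst generator and discriminator.
    gens, discs = list(generators), list(discriminators)
    while len(gens) > 1:
        for g, d in zip(gens, discs):        # train each duo for p epochs
            train(g, d, epochs=p)
        g_score = [0.0] * len(gens)
        d_score = [0.0] * len(discs)
        pairing = list(range(len(gens)))
        for _ in range(s):
            random.shuffle(pairing)          # shuffle duos: re-pair G and D
            for gi, di in enumerate(pairing):
                g_score[gi] += perf(gens[gi])
                d_score[di] += perf(discs[di])
        # remove the worst generator and the worst discriminator
        gens.pop(min(range(len(gens)), key=g_score.__getitem__))
        discs.pop(min(range(len(discs)), key=d_score.__getitem__))
    return gens[0], discs[0]
```

Tracking scores per network (rather than per duo) is what makes the shuffling meaningful: a network is judged across several partners, not just its current one.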
SWAGAN
Figure: SWAGAN starting architecture
Figure: SWAGAN step
SWAGAN Risk
Figure: SWAGAN Risk
Results on MNIST
Figure: Loss evolution during SWAGAN training
Results on NSL-KDD
Figure: Erratic SWAGAN learning on NSL-KDD
Results on NSL-KDD
Figure: First 2 components NSLKDD data visualization
Outline
1 Intrusion Detection System
2 NovGAN
3 SWAGAN
4 Conclusion
Conclusion
We presented two new versions of GANs:
NovGAN based on Generator loss modification
SWAGAN based on shuffling
Great results on MNIST
More difficult on network traffic datasets
Conclusion
Thanks for your attention. Any questions?
Martín Arjovsky, Soumith Chintala and Léon Bottou. Wasserstein Generative Adversarial Networks. In ICML, 2017.
Nabila Farnaaz and Md. Abdul Jabbar. Random Forest Modeling for Network Intrusion Detection System. 2016.
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville and Yoshua Bengio. Generative Adversarial Nets. In NIPS, 2014.
H. S. Hota and Akhilesh Kumar Shrivas. Decision Tree Techniques Applied on NSL-KDD Data and Its Comparison with Various Feature Selection Techniques. 2014.
Bhupendra Ingre and Anamika Yadav. Performance analysis of NSL-KDD dataset using ANN. 2015 International Conference on Signal Processing and Communication Engineering Systems, pages 92–96, 2015.
Richard T. Marriott, Sami Romdhani and Liming Chen. Intra-class Variation Isolation in Conditional GANs. ArXiv, vol. abs/1811.11296, 2018.
Alec Radford, Luke Metz and Soumith Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. CoRR, vol. abs/1511.06434, 2016.
Andrea Missinato. https://www.spindox.it/en/blog/generative-adversarial-neural-networks/. Accessed: 2019-05-07.
Akash Srivastava, Lazar Valkov, Chris Russell, Michael U. Gutmann and Charles A. Sutton. VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning. In NIPS, 2017.
Ilya O. Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel and Bernhard Schölkopf. AdaGAN: Boosting Generative Models. In NIPS, 2017.
Jun-Yan Zhu, Taesung Park, Phillip Isola and Alexei A. Efros. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. 2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251, 2017.
Appendices
M_KDD(T⃗) = (impact / 2) × exploitability
with impact = 1 − (1 − naf) × (1 − nfc) × (1 − (sb + db)/2)
and exploitability = 50 × naf × ns × (sb + db)/2 × (rs + nr) × sa/2
naf: num-access-file
ns: num-shells
sb: src-bytes
db: dst-bytes
rs: root-shell
nr: num-root
sa: su-attempted
nfc: num-file-creation