Doing research about web application firewalls
3 anomaly‐based WAFs
• Stochastic: statistical techniques, Markov chains.
• Machine learning: decision trees.
Stochastic design
[Diagram: stochastic design. Two stages, preprocessing (feature extraction) and processing. Training phase: dataset → store → NBD file. Test phase: WAF → check → NBD file.]
Statistical detection models
• Length model.
• Character distribution.
• Percentage of letters.
• Percentage of digits.
• Percentage of non‐alphanumeric characters.
• Non‐alphanumeric set.
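The models listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `value_features` and `length_is_normal`, and the threshold `k`, are hypothetical and do not come from the slides, and the length model shown here is a simple mean and standard deviation test.

```python
from collections import Counter
from statistics import mean, pstdev


def value_features(value: str) -> dict:
    """Compute the per-value statistics named in the list above."""
    n = len(value)

    def pct(count: int) -> float:
        # Percentage of the value's characters; 0.0 for an empty value.
        return 100.0 * count / n if n else 0.0

    non_alnum = [c for c in value if not c.isalnum()]
    return {
        "length": n,
        # Relative frequency of each character in the value.
        "char_distribution": {c: k / n for c, k in Counter(value).items()},
        "pct_letters": pct(sum(c.isalpha() for c in value)),
        "pct_digits": pct(sum(c.isdigit() for c in value)),
        "pct_non_alnum": pct(len(non_alnum)),
        "non_alnum_set": set(non_alnum),
    }


def length_is_normal(length: int, training_lengths: list, k: float = 3.0) -> bool:
    """Assumed length model: accept lengths within k standard deviations
    of the mean length seen during training (k is a hypothetical threshold)."""
    mu = mean(training_lengths)
    sigma = pstdev(training_lengths)
    return abs(length - mu) <= k * sigma


features = value_features("id=42")
print(features["length"], features["pct_letters"], features["non_alnum_set"])
```

In a training phase, statistics like these would be collected from normal requests and stored; the test phase would then compare each incoming value against them.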
Test ‐ detection process
[Diagram: each request is checked against the NBD file by method, resource, headers, and arguments, in that order. A check marked "incorrect" leads to Reject Request; when all four checks are "correct", the request goes to Forward Request.]
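The check sequence in the diagram can be sketched as follows. The `Request` class, the dictionary standing in for the NBD file, and the maximum-length rule for arguments are all assumptions made for illustration; the slides do not show the real file format or the actual checks.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str
    resource: str
    headers: dict = field(default_factory=dict)
    arguments: dict = field(default_factory=dict)


# Hypothetical stand-in for the NBD file: allowed methods and headers, and for
# each known resource the allowed arguments with a maximum value length.
NBD = {
    "methods": {"GET", "POST"},
    "headers": {"Host", "User-Agent"},
    "resources": {"/shop/cart": {"id": 6}},
}


def check_request(request: Request, nbd: dict) -> str:
    """Check method, resource, headers and arguments in order.
    The first incorrect item rejects the request."""
    if request.method not in nbd["methods"]:
        return "reject"
    allowed_args = nbd["resources"].get(request.resource)
    if allowed_args is None:
        return "reject"
    if any(name not in nbd["headers"] for name in request.headers):
        return "reject"
    for name, value in request.arguments.items():
        if name not in allowed_args or len(value) > allowed_args[name]:
            return "reject"
    return "forward"


ok = Request("GET", "/shop/cart", {"Host": "example.org"}, {"id": "42"})
bad = Request("GET", "/shop/cart", {"Host": "example.org"}, {"id": "1' OR '1'='1"})
print(check_request(ok, NBD), check_request(bad, NBD))
```

Rejecting on the first failed check keeps the common case cheap: a request with an unknown method or resource never reaches the per-argument tests.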
Detection process
• Length model.
• Character distribution.
• Statistical or Markovian implementation.
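As an illustration of what a Markovian implementation could look like, the sketch below trains a first-order Markov chain over the characters of normal values and scores a new value by its mean log transition probability. The character-level modelling, the `floor` probability for unseen transitions, and the function names are assumptions; the slides do not specify how the chain was built.

```python
import math
from collections import defaultdict

# Multi-character sentinels cannot collide with single characters of a value.
START, END = "<start>", "<end>"


def train_markov(values):
    """Estimate transition probabilities from values seen in normal traffic."""
    counts = defaultdict(lambda: defaultdict(int))
    for value in values:
        symbols = [START, *value, END]
        for a, b in zip(symbols, symbols[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in counts.items()
    }


def mean_log_prob(model, value, floor=1e-6):
    """Average log probability of the value's transitions.
    Unseen transitions get the (hypothetical) floor probability."""
    symbols = [START, *value, END]
    pairs = list(zip(symbols, symbols[1:]))
    total = sum(math.log(model.get(a, {}).get(b, floor)) for a, b in pairs)
    return total / len(pairs)


model = train_markov(["123", "456", "789"])
print(mean_log_prob(model, "123") > mean_log_prob(model, "1' OR '1'='1"))
```

A value would be flagged when its score falls below a threshold chosen during training; that threshold is left out here because the slides give no value for it.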
Machine learning system
• Preprocessing, feature extraction: combination of expert knowledge and n‐grams.
• Preprocessing, feature selection: Generic FS (GeFS) [Nguyen et al., 2010].
• Processing (training phase and test phase): decision trees (C4.5, CART, Random Tree, Random Forest).
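The feature extraction step combines expert knowledge with n‐grams. A minimal sketch of that idea follows; the particular expert features, the bigram size, and the function names are hypothetical choices for illustration, not the feature set used in the talk. Feature selection (GeFS) and the decision tree classifiers are not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count the character n-grams of a request string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def expert_features(text: str) -> dict:
    """A few hand-picked features of the kind an expert might define
    (assumed examples only)."""
    return {
        "length": len(text),
        "digits": sum(c.isdigit() for c in text),
        "non_alnum": sum(not c.isalnum() for c in text),
        "has_quote": int("'" in text or '"' in text),
    }


def combined_features(text: str, vocabulary: list, n: int = 2) -> dict:
    """Expert features plus one count per n-gram in a fixed vocabulary."""
    grams = char_ngrams(text, n)
    features = expert_features(text)
    for gram in vocabulary:
        features[f"ngram:{gram}"] = grams[gram]
    return features


print(combined_features("id=1'", ["='", "1'"]))
```

Fixing the n‐gram vocabulary in advance keeps every request's feature vector the same length, which is what a tree learner expects as input.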
Data acquisition
• “The most significant challenge that an evaluation faces is the lack of appropriate public datasets for assessing anomaly detection systems” [Sommer and Paxson, 2010], [Tavallaee et al., 2010].
• Private datasets.
• Comparison problem.
[Table: the datasets UNB ISCX, ECML/PKDD, LBNL, DEFCON, DARPA 98/99, and CSIC/TORPEDA, compared by the criteria public, HTTP, labelled, classes, up‐to‐date, not anonymized, and captured.]
CSIC dataset
• 36 000 normal requests, 25 000 anomalous requests.
• Modern web attacks: SQLi, buffer overflow, information gathering, CRLFi, XSS, server side include and parameter tampering.
• http://www.tic.itefi.csic.es/dataset/
TORPEDA dataset
• 8 500 normal requests, 15 000 anomalous requests, 55 000 attacks.
• Modern web attacks: SQLi, XSS, buffer overflow, format string attack, LDAPi, OS command injection, HTTP splitting, local file include, server side include, Xpathi, CRLFi, directory browsing, parameter tampering.
• Type of attack.
• http://www.tic.itefi.csic.es/torpeda/
Results (CSIC dataset)
Technique          Detection rate   False positive rate
Statistical        99.4%            0.9%
Markov chain       98.1%            1%
Machine learning   95.1%            4.9%
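The slides do not define the two metrics, so the sketch below assumes the usual definitions: detection rate is the share of anomalous requests that are flagged, and false positive rate is the share of normal requests that are flagged. The function names and the toy labels are illustrative only.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Flagged attacks divided by all attacks (assumed definition)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    """Flagged normal requests divided by all normal requests (assumed definition)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def evaluate(labels, predictions):
    """labels: True if the request is anomalous.
    predictions: True if the WAF flagged the request."""
    tp = fn = fp = tn = 0
    for is_attack, flagged in zip(labels, predictions):
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return detection_rate(tp, fn), false_positive_rate(fp, tn)


# Toy example, not data from the talk.
labels = [True, True, False, False]
predictions = [True, False, True, False]
print(evaluate(labels, predictions))
```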
More about contributions
• Proposal of new feature extraction methods by combining expert knowledge and n‐grams.
Feature extraction   Detection rate   False positive rate
Expert knowledge     93.55%           7.15%
N‐grams              82.43%           23.85%
Combination          94.41%           6.18%
More about contributions
• Successful application of the GeFS measure to web traffic for the first time.
Feature set        Before FS   After FS
Expert knowledge   30          11
N‐grams            114         12
Combination        42          42
• Reduction of the number of features between 63% and 91%.
• Lower resource consumption and processing time.
More about contributions
• Optimization of the training phase by reducing the number of training requests.
• Total number of requests: 61 000. Reduced to half (Markov and ML) or even to a quarter (statistical system).
Comparison (CSIC dataset)
Technique      Detection rate   False positive rate   Processing time (ms/req.)   Number of training requests
Statistical    99.4%            0.9%                  0.59                        16 383
Markov chain   98.1%            1%                    7.9                         32 767
ML             95.1%            4.9%                  0.3                         32 767
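To put the comparison figures in more familiar terms, the short calculation below converts processing time into an approximate single-stream throughput and expresses each training set as a fraction of the 61 000 requests mentioned earlier. It assumes that the per-request time is the only cost and that the 61 000 total is the right denominator; neither assumption is stated on the slides.

```python
TOTAL_REQUESTS = 61_000  # total number of requests stated on the slides

# (processing time in ms per request, number of training requests),
# copied from the comparison table.
ROWS = {
    "Statistical": (0.59, 16_383),
    "Markov chain": (7.9, 32_767),
    "ML": (0.3, 32_767),
}


def requests_per_second(ms_per_request: float) -> float:
    """Throughput of one sequential worker, ignoring any other overhead."""
    return 1000.0 / ms_per_request


def training_fraction(n_training: int, total: int = TOTAL_REQUESTS) -> float:
    """Share of the total requests used for training."""
    return n_training / total


for name, (ms, n_train) in ROWS.items():
    print(f"{name:12s} {requests_per_second(ms):6.0f} req/s, "
          f"{training_fraction(n_train):.0%} of requests used for training")
```

Under these assumptions the fractions come out at roughly a quarter for the statistical system and roughly a half for the other two, in line with the reduction described on the previous slide.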
WAF ‐ Defence
“What is defence in conception? The warding off a blow. What is then its characteristic sign? The state of expectancy (or of waiting for this blow).” Carl von Clausewitz, “On War”.
Thanks for your attention
Carmen Torrano, @ctorranog
http://www.itefi.csic.es/es/personal/torrano-gimenez-carmen