ML based detection of users anomaly activities (20th OWASP Night Tokyo, English)
![Page 1: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/1.jpg)
ML based detection of users anomaly activities
Yury Leonychev
ESG, Rakuten Inc.
OWASP Night 9/3/2016
![Page 2: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/2.jpg)
Agenda
• Case study presentation
• Workshop format

| What | Where |
| --- | --- |
| IDE | Continuum Analytics Anaconda: https://www.continuum.io/downloads |
| Python3 + NumPy + SciPy + ScikitLearn | https://www.python.org/downloads/ and http://www.scipy.org/install.html |
| Model Application | https://github.com/tracer0tong/buzzboard |
![Page 3: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/3.jpg)
Abstract problem definition
1. Browser based activity
   a. Normal user interacts with browser
   b. Web application generated activity
2. HTTP request activity
   a. Normal UA
   b. Headless browser or script/bot
3. Frontend/Backend data exchange
![Page 4: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/4.jpg)
Methodology (CRISP-DM)
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Diagram: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining#/media/File:CRISP-DM_Process_Diagram.png by Kenneth Jensen, license: CC BY-SA 3.0
![Page 5: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/5.jpg)
Model description
1. Business understanding: we want to classify users as “bad” or “good”, where “bad” users fail the CAPTCHA and “good” users pass it.
2. Data understanding: HTTP requests and the results of CAPTCHA checks.
3. Data preparation: collect requests from users, store them in a database, and verify that the collected set is complete.
4. Create the model: define and tune the settings for a Decision Tree.
5. Calculate the errors and validate the model.
6. Deploy the model to production.
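Steps 4 and 5 above can be sketched roughly as follows. This is a minimal illustration and not the speaker's code: the data is synthetic, and the three feature columns (request size, URI length, header count), the `max_depth=3` setting, and the 75/25 split are all assumptions chosen for the example. The label stands in for the CAPTCHA result.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
n = 200

# Synthetic stand-in data (assumption): columns are
# [request size, URI length, number of headers].
good = np.column_stack([
    rng.normal(600, 150, n), rng.normal(50, 15, n), rng.randint(5, 15, n)])
bad = np.column_stack([
    rng.normal(400, 150, n), rng.normal(70, 20, n), rng.randint(2, 9, n)])

X = np.vstack([good, bad])
y = np.array([1] * n + [0] * n)  # 1 = passed CAPTCHA, 0 = failed

# Hold out a quarter of the samples so the model is validated
# on requests it has not seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Step 4: create the model; max_depth is one of the settings to tune.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Step 5: calculate the errors on the held-out set.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

print("accuracy:", accuracy)
print(matrix)
```

The confusion matrix is worth reading alongside the accuracy, because it separates "good" users misclassified as "bad" from "bad" users that slip through.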
![Page 6: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/6.jpg)
Feature extraction
| Direct | Indirect |
| --- | --- |
| Size of HTTP request | IP address reputation |
| Length of URI | User reputation |
| User Agent | History based features |
| Number of HTTP headers | Time based features |
| Response code/Response time | Business logic based features |
| … | … |
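As an illustration of the "Direct" column, the sketch below turns a single request into a numeric feature vector. The `Request` class and the `direct_features` function are hypothetical names introduced for this example; they are not taken from the buzzboard application. The request size is only approximated, and the User Agent is reduced to its length as a simple numeric stand-in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Request:
    """Hypothetical, simplified view of an incoming HTTP request."""
    method: str
    uri: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def direct_features(request: Request) -> List[int]:
    """Extract features that can be read directly from one request."""
    header_bytes = sum(len(k) + len(v) for k, v in request.headers.items())
    return [
        # Approximate request size: URI + header names/values + body.
        len(request.uri) + header_bytes + len(request.body),
        # Length of the URI.
        len(request.uri),
        # Length of the User-Agent string (0 when the header is missing).
        len(request.headers.get("User-Agent", "")),
        # Number of HTTP headers.
        len(request.headers),
    ]


example = Request(
    method="GET",
    uri="/board?id=1",
    headers={"Host": "example.com", "User-Agent": "curl/7.47.0"},
)
features = direct_features(example)
print(features)
```

Indirect features such as IP or user reputation need state outside the request (history, lookups), so they would be computed in a separate step.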
![Page 7: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/7.jpg)
Application workflow
![Page 8: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/8.jpg)
Application workflow (Learning Mode)
![Page 9: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/9.jpg)
Application workflow (Strict Mode)
![Page 10: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/10.jpg)
Decomposition
![Page 11: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/11.jpg)
Offline computations
• Offline with Hadoop, Spark (MLlib), Elasticsearch
• Realtime with Spark (Streams and MLlib), Kafka
• Same technologies available in AWS and Azure
![Page 12: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/12.jpg)
Continuous experiment
![Page 13: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/13.jpg)
Knowledge matters!
• You should understand what you are doing!
  – Is it normal to have 1.0 accuracy?
  – Could we measure Mean Squared Error for our model application?
  – Have we already chosen the correct algorithm and parameters?
  – Is this a correct feature?

    METHODS = ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'HEAD']

    # Encodes the HTTP method as its position in METHODS.
    def MethodFeature(request):
        return METHODS.index(request.method)
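One possible reading of the question about the `MethodFeature` snippet: encoding the method as a list index gives the methods a numeric order (`DELETE` becomes 3, `GET` becomes 0) that has no real meaning, and `list.index` raises `ValueError` for any method missing from the list, such as `PATCH`. A common alternative is one-hot encoding. The sketch below is an assumption about how that could look, not something shown on the slide; `method_one_hot` is a hypothetical name.

```python
from typing import List

METHODS = ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'HEAD']


def method_one_hot(method: str) -> List[int]:
    """Return one slot per known method plus a final slot for anything else."""
    vector = [0] * (len(METHODS) + 1)
    name = method.upper()
    # Unknown methods fall into the last slot instead of raising ValueError.
    index = METHODS.index(name) if name in METHODS else len(METHODS)
    vector[index] = 1
    return vector


print(method_one_hot('GET'))
print(method_one_hot('PATCH'))
```

A decision tree can still split on an index-encoded feature, so whether this change helps depends on the algorithm; that is part of why the slide asks the question.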
![Page 14: Ml based detection of users anomaly activities (20th OWASP Night Tokyo, English)](https://reader035.vdocuments.site/reader035/viewer/2022070516/58730c311a28ab99088b6e2d/html5/thumbnails/14.jpg)
Conclusion
• Use decomposition (different levels of classification)
• Use flexible feature collection
• Prefer offline computations
• Give yourself room for experiments
• Don’t forget that ML integration is a continuous process
• Build up your knowledge of ML