challenges for reliable and large scale evaluation of...

33
Introduction Datasets Malware analysis Performances Conclusion Challenges for Reliable and Large Scale Evaluation of Android Malware Analysis Jean-François Lalande Valérie Viet Triem Tong Mourad Leslous Pierre Graux INVITED TALK – SHPCS 2018 July 19th 2018

Upload: others

Post on 09-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

Introduction Datasets Malware analysis Performances Conclusion

Challenges for Reliable and Large ScaleEvaluation of Android Malware Analysis

Jean-François Lalande Valérie Viet Triem TongMourad Leslous Pierre Graux

INVITED TALK – SHPCS 2018

July 19th 2018

Page 2: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

2 / 31

Introduction Datasets Malware analysis Performances Conclusion

Context

Google Play Store: 3.5 million applications (2017)

Malware are uploaded to the Play Store+ Third party markets

Research efforts:Detection, classificationPayload extraction, unpacking, reverseExecution, triggering

Difficulties for experimenting with Android malware samples

Page 3: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

3 / 31

Introduction Datasets Malware analysis Performances Conclusion

Analyzing malware

About papers that work on Android malware. . .

CopperDroid [Tam et al. 201]:1365 samples3% of payloads executed

IntelliDroid [Wong et al. 2016]:10 samples90% of payloads executed

GroddDroid [us ! 2015]:100 samples24% of payloads executed

. . .

Page 4: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

4 / 31

Introduction Datasets Malware analysis Performances Conclusion

Challenges

About datasetsIs there any datasets?Can we build one?

About analysisWhat are the difficulties?Is it reliable and reproducible?Are samples really malware?

About performancesHow much does it cost?Is it scalable?

Page 5: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

5 / 31

Introduction Datasets Malware analysis Performances Conclusion

1 Introduction

2 Datasets

3 Malware analysis

4 Performances

5 Conclusion

Page 6: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

6 / 31

Introduction Datasets Malware analysis Performances Conclusion

Why building a dataset ?

Papers with Android malware experiments:use extracts of reference datasets:

The Genome project (stopped !) [Zhou et al. 12]Contagio mobile dataset [Mila Parkour]Hand crafted malicious apps (DroidBench [Artz et al. 14])Some Security Challenges’ apps

need to be significant:Tons of apps (e.g. 1.3 million for PhaLibs [Chen et al. 16])Some apps (e.g. 11 for TriggerScope [Fratantonio et al. 16])

A well documented dataset does not exist !Online services give poor information !

Page 7: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

7 / 31

Introduction Datasets Malware analysis Performances Conclusion

Building a dataset

Collect malwarefrom online sources, or researchersstudy the samples manually

Methodology:

manual reverse of 7 samplesmanual triggering (not obvious)execution and information flow capture

By Con-struct + replicant

community [CC BY-SA 3.0]

Page 8: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

8 / 31

Introduction Datasets Malware analysis Performances Conclusion

A collection of malware totally reversed

Kharon dataset: 7 malware1:

http://kharon.gforge.inria.fr/dataset

DroidKungFu, BadNews (2011, 2013)WipeLocker (2014)MobiDash (2015)SaveMe, Cajino (2015)SimpleLocker (2014)

1Approved by Inria’s Operational Legal and Ethical Risk AssessmentCommittee: We warn the readers that these samples have to be used forresearch purpose only. We also advise to carefully check the SHA256 hashof the studied malware samples and to manipulate them in a sandboxedenvironment. In particular, the manipulation of these malware impose tofollow safety rules of your Institutional Review Boards.

Page 9: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

9 / 31

Introduction Datasets Malware analysis Performances Conclusion

Remote admin Tools

Install malicious apps:

Badnews: Obeys to a remote server + delays attackTriggering: Patch the bytecode + Build a fake server

DroidKungFu1 (well known): Delays attackTriggering: Modify ’start’ to 1 in sstimestamp.xml andreboot the device

Page 10: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

10 / 31

Introduction Datasets Malware analysis Performances Conclusion

Blocker / Eraser

Wipes of the SD card and block social apps:

WipeLocker: Delayed AttackTriggering: Launch the app and reboot the device

Page 11: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

11 / 31

Introduction Datasets Malware analysis Performances Conclusion

Adware

Displays adds after some days:MobiDash: Delayed AttackTriggering: Launch the application, reboot the device andmodify com.cardgame.durak_preferences.xml

Page 12: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

12 / 31

Introduction Datasets Malware analysis Performances Conclusion

Spyware

Steals contacts, sms, IMEI, . . .SaveMe: Verifies the Internet accessTriggering: Enable Internet access and lauch the app

Cajino: Obeys a Baidu remote serverTriggering: Simulate a server command with an Intent

Page 13: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

13 / 31

Introduction Datasets Malware analysis Performances Conclusion

Ransomware

Encrypts user’s files and asks for paying:

SimpleLockerWaits the reboot of the deviceTriggering: send a BOOT_COMPLETED intent

More details about SimpleLocker...

Page 14: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

14 / 31

Introduction Datasets Malware analysis Performances Conclusion

Example: SimpleLocker

The main malicious functions:

org.simplelocker.MainService.onCreate()org.simplelocker.MainService$4.run()org.simplelocker.TorSender.sendCheck(final Context context)org.simplelocker.FilesEncryptor.encrypt()org.simplelocker.AesCrypt.AesCrypt(final String s)

The encryption loop:

final AesCrypt aesCrypt = new AesCrypt("jndlasf074hr");

for (final String s : this.filesToEncrypt) {aesCrypt.encrypt(s, String.valueOf(s) + ".enc");new File(s).delete();

}

Page 15: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

15 / 31

Introduction Datasets Malware analysis Performances Conclusion

Dataset overview

Type Name Protection against dynamic Analysis→ Remediation

RAT BadnewsObeys to a remote server and delays the attack

→ Modify the apk→ Build a fake server

Ransomware SimpleLocker Waits the reboot of the device→ send a BOOT_COMPLETED intent

RAT DroidKungFu Delayed Attack→ Modify the value start to 1 in sstimestamp.xml

Adware MobiDashDelayed Attack

→ Launch the infected application, reboot the deviceand modify com.cardgame.durak_preferences.xml

Spyware SaveMe Verifies the Internet access→ Enable Internet access and launch the application

Eraser+LK WipeLocker Delayed Attack→ Press the icon launcher and reboot the device

Spyware Cajino Obeys to a remote server→ Simulate the remote server by sending an intent

Page 16: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

16 / 31

Introduction Datasets Malware analysis Performances Conclusion

New recent datasets

AndroZoo [Allix et al. 2016]3 million appsWith pairs of applications (repackaged ?)

The AMD dataset [Wei et al. 2017]24,650 samplesWith contextual informations (classes, actions, . . . )

We need more contextual information !Where is the payload ?How to trigger the payload ?Which device do I need ?

Page 17: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

17 / 31

Introduction Datasets Malware analysis Performances Conclusion

1 Introduction

2 Datasets

3 Malware analysis

4 Performances

5 Conclusion

Page 18: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

18 / 31

Introduction Datasets Malware analysis Performances Conclusion

Designing an experiment from scratch

Page 19: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

19 / 31

Introduction Datasets Malware analysis Performances Conclusion

Check that a sample is a malware?

Manually. . . for 10 samples ok, but for more ?

Ask VirusTotal!

∼45 antiviruses softwareUse a threshold to decide (e.g. 20 antiviruses)Free upload API (few samples / day)Used by others in papers

Is it a good idea?

Page 20: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

20 / 31

Introduction Datasets Malware analysis Performances Conclusion

An experiment with 683 fresh samples

Threshold of x antiviruses recognizing a sample?

Page 21: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

21 / 31

Introduction Datasets Malware analysis Performances Conclusion

Designing an experiment from scratch

Page 22: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

22 / 31

Introduction Datasets Malware analysis Performances Conclusion

Analyzing malware

Main analysis methods are:

static analysis:⇒ try to recognize knowncharacteristics of malware in thecode/resources of studied applications

dynamic analysis:⇒ try to execute the malware

Page 23: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

22 / 31

Introduction Datasets Malware analysis Performances Conclusion

Analyzing malware

Main analysis methods are:

static analysis:⇒ try to recognize knowncharacteristics of malware in thecode/resources of studied applications

dynamic analysis:⇒ try to execute the malware

Countermeasures: reflec-tion, obfuscation, dynamicloading, encryption, native

Countermeasures:logic bomb, time

bomb, remote server

Page 24: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

23 / 31

Introduction Datasets Malware analysis Performances Conclusion

Solving attacker’s countermeasures

Possible solutions against attacker’s countermeasures.

Problem Solutionmalformed files ignore it if possible

reflection execute itdynamic loading execute itlogic/time bomb force conditions

native code watch it from the kernelpacking ???

(dead) remote server ???

Page 25: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

24 / 31

Introduction Datasets Malware analysis Performances Conclusion

Malware analysis

We have developed software for:

Static, dynamic analysisSmartphone flashing with custom kernelMore info: http://kharon.gforge.inria.fr

Dynamic analysis requires a lot of efforts to be automatized.

Page 26: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

25 / 31

Introduction Datasets Malware analysis Performances Conclusion

1 Introduction

2 Datasets

3 Malware analysis

4 Performances

5 Conclusion

Page 27: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

26 / 31

Introduction Datasets Malware analysis Performances Conclusion

Challenges

Is it reliable ?How to evaluate the results ? (execution of the payload)Is it scalable ?

Some results with:

AMD dataset: [Wei et al. 2017]A “native” dataset: apps using native code

Page 28: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

27 / 31

Introduction Datasets Malware analysis Performances Conclusion

Reliability

Some malware crash (and people don’t care. . . )

Crash ratio (at launch time):AMD dataset: 5%Our native dataset: 20%

If no crash, how to evaluate the results ?

We need to know where is the payload !

Page 29: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

28 / 31

Introduction Datasets Malware analysis Performances Conclusion

Performances

Time evaluation (average):

For one app and one payload:

Flashing device: 60 sStatic analysis: 7 sDynamic analysis (execution): 4 mTotal: 5 m

From the AMD dataset: 135 samples100 payloads per appAll Android OS: 8 versionsTotal: 1 year of experiments (with 1 device)

Page 30: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

28 / 31

Introduction Datasets Malware analysis Performances Conclusion

Performances

Time evaluation (average):

For one app and one payload:

Flashing device: 60 sStatic analysis: 7 sDynamic analysis (execution): 4 mTotal: 5 m

From the AMD dataset: 135 samples100 payloads per appAll Android OS: 8 versionsTotal: 1 year of experiments (with 1 device)

Page 31: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

29 / 31

Introduction Datasets Malware analysis Performances Conclusion

Scalability

We need a real smartphone.

At this time we use:

A server running our softwareA pool of 1 to 5 smartphones (USB limitations ?)

Frontend visible at:http://kharon.irisa.fr

Page 32: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

30 / 31

Introduction Datasets Malware analysis Performances Conclusion

Conclusion

Designing experiments on Android malwareis a difficult challenge!

We still have difficulties:

how to scale experiments?how to evaluate payload execution?

http://kharon.gforge.inria.frhttp://kharon.gforge.inria.fr/datasethttp://kharon.irisa.fr

Page 33: Challenges for Reliable and Large Scale Evaluation of ...people.rennes.inria.fr/Jean-Francois.Lalande/talks/shpcs18.pdfChallenges for Reliable and Large Scale Evaluation of Android

c©Inria / C. Morel

Questions ?