
Towards a Trustworthy Android Ecosystem

Yan Chen

Lab of Internet and Security Technology

Northwestern University

Smartphone Security

• Ubiquity: smartphones and mobile devices
  – Smartphone sales already exceed PC sales
  – The growth will continue
• Performance better than PCs of the last decade
  – Samsung Galaxy S4: 1.6 GHz quad core, 2 GB memory

Android Dominance

• Android world-wide market share ~ 70%

• Android market share in US ~50%


(Credit: Kantar Worldpanel ComTech)

Android Problems

• Malware detection
  – Offline
  – Real time, on phone
• Privacy leakage detection
  – Offline
  – Real time, on phone
• For both rootkits and ad malware/spyware
• Improving usability of security mechanisms

New Challenges

• New operating systems
  – Different design → different threats
• Different architecture
  – ARM (Advanced RISC Machines) vs. x86
  – Dalvik vs. Java (on Android)
• Constrained environment
  – CPU, memory
  – Battery
  – User perception

Our Solutions

• AppsPlayground [ACM CODASPY’13]
  – Automatic, large-scale dynamic analysis of Android apps
  – System released with hundreds of downloads
• DroidChameleon [ACM ASIACCS’13, IEEE Transactions on Information Forensics and Security ’14]
  – Evaluation of latest Android anti-malware tools
  – System released upon wide interest from media and industry
• PrivacyShield
  – Real-time information-flow tracking for privacy leakage detection
  – With zero platform modification
  – App in alpha test, to be released soon
• AutoCog
  – Checks whether the sensitive permissions requested by an app are consistent with its natural-language description
  – App just released on the Google Play store
• Large-scale malware detection and measurement of ads and ad libraries

Recognition


Interest from vendors

PrivacyShield

Real-time Privacy Leakage Detection without System Modification for Android

Motivation

• Android permissions are insufficient
  – The user still does not know whether private information will be leaked
• Information leakage is more dangerous than information access
  – Example 1: popular apps (e.g., Angry Birds) leak location information to their developers, advertisers, and analytics services
    • Even when the app does not need it for its functionality!
  – Example 2: malware apps may steal private data
    • A camera app trojan sends video recordings out of the phone

More Motivation: Mobile Device Management (MDM)

• Bring Your Own Device (BYOD)
  – The current trend in mobile device management
• Supporting 3rd party apps
  – Employees need them for personal use
  – Enterprises may use them to improve productivity
  – Chat, Dropbox, backup apps…

MDM Challenges

• How do apps handle the data that they access?
  – Does it remain within the device or the enterprise?
  – Is it leaked out to unknown third parties?
  – Can an employee upload confidential data to a remote server?
• The IT administrator desires to view (and potentially block) such leakage in real time
  – The IT administrator has limited control over devices now

Previous Solutions

• Static analysis
  – Does not identify the conditions for the leak
  – Conditions may be legitimate: false positives?
• TaintDroid
  – Requires a custom Android ROM
  – Needs an unlocked device and end-user skills

Our Approach

• Give control to the user/BYOD IT administrator
• Instead of modifying the system, modify the suspicious app to track privacy-sensitive flows
• Advantages
  – No system modification
  – No overhead for the rest of the system
  – High configurability: easily turn off monitoring for an app or a trusted library in an app

Comparison

                      Static Analysis          TaintDroid   Uranine
Accuracy              Low (possibly high FP)   Good         Good
Overhead              None                     Low          Acceptable
System modification   No                       Yes          No
Configurability       NA                       Very low     High
Portable              NA                       No           Yes

Deployment A: PrivacyShield App


By vendor or 3rd party service

Deployment B


By Market

Overall Scenario

Download → Instrument → Reinstall → Run → Alert User

Unmodified Android Middleware and Libraries

Challenges and Solutions

• Framework code cannot be modified
  – Proposed policy-based summarization of framework API
• Accounting for the effects of callbacks
  – Functions in app code invoked by framework code
  – Proposed over-tainting techniques that guarantee zero FN
• Accommodating reference semantics
  – Need to taint objects rather than variables
  – Proposed a hashtable with weak references to prevent interfering with garbage collection
• Performance overhead
  – Proposed path pruning with static analysis
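The weak-reference hashtable idea can be illustrated with a minimal sketch. This is not the PrivacyShield implementation (which instruments Android bytecode); it is a Python analogue, and the names `TaintMap` and `Record` are hypothetical. It assumes taint is attached to object identity and that entries should disappear once the tracked object is collected.

```python
import weakref


class TaintMap:
    """Identity-keyed taint table that does not keep tracked objects alive."""

    def __init__(self):
        # id(obj) -> (weak reference to obj, set of taint tags)
        self._entries = {}

    def add(self, obj, tag):
        key = id(obj)
        entry = self._entries.get(key)
        if entry is None:
            # The callback drops the entry when obj is garbage collected,
            # so the table never prevents collection of app objects.
            ref = weakref.ref(obj, lambda _r, k=key: self._entries.pop(k, None))
            entry = (ref, set())
            self._entries[key] = entry
        entry[1].add(tag)

    def get(self, obj):
        entry = self._entries.get(id(obj))
        # Guard against id reuse: the stored reference must point at obj itself.
        if entry is None or entry[0]() is not obj:
            return frozenset()
        return frozenset(entry[1])

    def __len__(self):
        return len(self._entries)


class Record:
    """Hypothetical stand-in for an app object holding sensitive data."""

    def __init__(self, value):
        self.value = value
```

Keying on identity rather than on variables mirrors the "taint objects rather than variables" point: every alias of the same object sees the same tags.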

Instrumentation Workflow


Implementation and Evaluation

• Studied over 1000 apps
• Results in general align with TaintDroid
• Performance
  – Median runtime overhead is 17%; three quarters of apps are within 61%
  – 17% of apps have zero instructions instrumented; the maximum instrumentation fraction is 26%
• PrivacyShield app to be released soon

Performance Overhead


Limitations

• Native code not handled
• Method calls by reflection may sometimes result in unsound behavior
• Apps may refuse to run if their code is modified
  – Currently, only one out of the top one hundred Google Play apps did that

PrivacyShield Summary

• A real-time app monitoring system on Android without firmware modification
  – Privacy leakage detection (for both personal and BYOD)
  – Patching vulnerabilities
  – Blocking pop-up ads
  – … and many others!

AutoCog

Measuring Description-to-permission Fidelity in Android Applications


Motivation

• Techniques to evaluate whether an application oversteps user expectations are still largely missing
  – Source of user expectations for an app: its metadata on Google Play
    • Natural-language description
    • Permissions
  – Example: a navigation application accessing location is valid; an SMS application accessing location is invalid
• Few users are discreet enough or have the professional knowledge to infer security implications from an app's metadata
  – Long-lasting gap between security mechanisms and their usability to average users
• Goal: assess how well the description implies the usage of sensitive permissions (description-to-permission fidelity)

Usages

• End user: understand if an application is over-privileged and risky to use
• Developer: receive early feedback on the quality of the description
  – Especially on security-related aspects of the application
• Market: help choose more secure applications

Design

• Challenges:
  – Inferring description semantics
    • Diversity of natural language: “contact list”, “address book”, “friends”
  – Correlating description semantics with permission semantics
    • Diversity of functionalities: “enable navigation”, “find friend nearby”, “display map”
• Solutions: Description-to-permission Relatedness (DPR) Model
  – Leverage a Description Semantics (DS) model to group texts by semantic similarity score
  – Design a learning algorithm to measure how closely a pair of texts correlates with the target permission

Architecture of AutoCog


Evaluation

• Assess how well AutoCog aligns with human readers in inferring permissions from descriptions
  – Use AutoCog to infer 11 highly sensitive and most popular permissions from 1,785 applications
  – Three professional human readers label the description as “good” if at least two of them could infer the target permission from the description

Evaluation (cont’d)

  – Metrics: precision, recall, F-score, accuracy

• Results:
  – Confirm the limitations of Whyper: limited semantic information, lack of associated APIs, and lack of automation

              Precision   Recall   F-score   Accuracy
AutoCog       92.6%       92.0%    92.3%     93.2%
Whyper [3]    85.5%       66.5%    74.8%     79.9%
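The metric formulas did not survive on the slide. The sketch below uses the standard definitions from true/false positive and negative counts; that the authors used exactly these is an assumption, although the reported F-scores are consistent with the harmonic mean of the reported precision and recall.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that were found."""
    return tp / (tp + fn)


def f_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


def accuracy(tp, tn, fp, fn):
    """Fraction of all decisions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

Feeding the table's precision and recall into `f_score` gives values that round to the reported 92.3% for AutoCog and 74.8% for Whyper.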

Measurement

• 49,183 applications from Google Play
  – Only 9.1% of the applications have permissions that can all be inferred from the description

Deployment: AutoCog Application

https://play.google.com/store/apps/details?id=com.version1.autocog


Deployment: Web Portal

http://webportal2-autocog.rhcloud.com/


Conclusions

• AppsPlayground: automatic, large-scale dynamic analysis of Android apps
  – System released with hundreds of downloads
• DroidChameleon: evaluation of latest Android anti-malware tools
  – System released upon wide interest from media and industry
• PrivacyShield
  – Real-time information-flow tracking system with no platform modification
  – App in alpha test, to be released soon
• AutoCog
  – Checks whether the sensitive security permissions of an app are consistent with its description
  – App just released on the Google Play store
• More info and tools: http://list.cs.northwestern.edu/mobile/

Backup


Android Ecosystem


DPR Model

• Trained on a large dataset of application descriptions and permissions
• Noun-phrase based governor-dependent pairs with high statistical correlation with each permission
  – CAMERA: (scanner, barcode), (snap, photo)
• Ontologies (based on output of Stanford Parser [2]):
  – Logic dependency between verb phrase and noun phrase
  – Logic dependency between noun phrases
  – Noun phrase with an ownership relationship
    • (record, voice), (note, voice), (your voice) → RECORD_AUDIO

[2] R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng. Parsing with compositional vector grammars. In Proceedings of the ACL, 2013.

Example of Detection

Extracted pairs: (search, place), (place, location), (your location)…

Match each extracted pair against the DPR model by semantic relatedness score

Once matched, the sentence is labeled as revealing the permission
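The matching step can be sketched as follows. The model pairs for CAMERA and RECORD_AUDIO come from the DPR Model slide; the synonym table, the scoring rule, and the threshold are illustrative assumptions standing in for AutoCog's learned semantic relatedness score, and all function names are hypothetical.

```python
# Governor-dependent pairs per permission (taken from the DPR Model slide).
DPR_MODEL = {
    "CAMERA": [("scanner", "barcode"), ("snap", "photo")],
    "RECORD_AUDIO": [("record", "voice"), ("note", "voice")],
}

# Toy synonym table: an assumption replacing a real semantic similarity model.
SYNONYMS = {
    frozenset(("take", "snap")),
    frozenset(("photo", "picture")),
    frozenset(("voice", "audio")),
}


def word_relatedness(a, b):
    """Return 1.0 for identical words, 0.8 for listed synonyms, else 0.0."""
    if a == b:
        return 1.0
    if frozenset((a, b)) in SYNONYMS:
        return 0.8
    return 0.0


def pair_relatedness(p, q):
    """A pair matches only as well as its weaker component."""
    return min(word_relatedness(p[0], q[0]), word_relatedness(p[1], q[1]))


def infer_permissions(extracted_pairs, threshold=0.7):
    """Return the permissions revealed by pairs extracted from a description."""
    found = set()
    for permission, model_pairs in DPR_MODEL.items():
        for p in extracted_pairs:
            if any(pair_relatedness(p, q) >= threshold for q in model_pairs):
                found.add(permission)
    return found
```

With this toy table, a description yielding the pair `("take", "picture")` is labeled as revealing CAMERA, while `("play", "music")` reveals nothing.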

Measurement (cont’d)

• Low description-to-permission fidelity has a negative impact on application popularity.

Permission Type              #install   #rating   Average(rating)
#Questionable Permissions    -0.106     -0.105    -0.110
#Permissions                  0.044      0.050     0.044

AppsPlayground

Automatic Security Analysis of Android Applications


AppsPlayground

• A system for offline dynamic analysis
  – Includes multiple detection techniques for dynamic analysis
• Challenges
  – Techniques must be light-weight
  – Automation requires good exploration techniques

Architecture

AppsPlayground: Virtualized Dynamic Analysis Environment

• Detection Techniques
  – Kernel-level monitoring
  – Taint tracking
  – API monitoring
• Exploration Techniques
  – Fuzzing
  – Intelligent input
  – Event triggering
• Disguise techniques

Architecture: Contributions

[Figure: the AppsPlayground architecture diagram repeated, with the contributions labeled.]

Intelligent Input

• Fuzzing is good but has limitations
• Another black-box GUI exploration technique
• Capable of filling meaningful text by inferring surrounding context
  – Automatically fills out zip codes, phone numbers, and even login credentials
  – Sometimes increases coverage greatly
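A minimal sketch of context-based input filling, assuming the context is the label text next to an input field. The rule table, the sample values, and the function name `choose_input` are hypothetical; AppsPlayground's actual inference is not described on the slide.

```python
# Hypothetical keyword-to-value rules; the first matching rule wins.
FILL_RULES = [
    (("zip", "postal"), "60208"),
    (("phone", "mobile"), "8475550100"),
    (("email", "e-mail"), "tester@example.com"),
    (("user", "login"), "testuser"),
    (("password",), "Passw0rd!"),
]


def choose_input(context_text, default="test"):
    """Pick a plausible value for a text field from its surrounding label."""
    text = context_text.lower()
    for keywords, value in FILL_RULES:
        if any(keyword in text for keyword in keywords):
            return value
    # No rule matched: fall back to generic text, as a fuzzer would.
    return default
```

The fallback branch is where plain fuzzing takes over, which is consistent with the slide's point that the two techniques complement each other.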

Privacy Leakage Results

• AppsPlayground automates TaintDroid
• Large-scale measurements: 3,968 apps from Android Market (Google Play)
  – 946 leak some info
  – 844 leak phone identifiers
  – 212 leak geographic location
  – Leaks to a number of ad and analytics domains

Malware Detection

• Case studies on DroidDream, FakePlayer, and DroidKungfu

• AppsPlayground’s detection techniques are effective at detecting malicious functionality

• Exploration techniques can help discover more sophisticated malware

DroidChameleon

Evaluating state-of-the-art Android anti-malware against transformation attacks

Introduction

Android malware – a real concern

Many anti-malware offerings for Android

• Many are very popular

Source: http://play.google.com/ | retrieved: 4/29/2013

Objective

• Smartphone malware is evolving
  – Encrypted exploits, encrypted C&C information, obfuscated class names, …
  – Polymorphic attacks already seen in the wild
• Technique: transform known malware

What is the resistance of Android anti-malware against malware obfuscations?

Transformations: Three Types

• Trivial
  – No code-level changes or changes to AndroidManifest
• Detectable by Static Analysis (DSA)
  – Do not thwart detection by static analysis completely
• Not detectable by Static Analysis (NSA)
  – Capable of thwarting all static-analysis-based detection

Trivial Transformations

• Repacking
  – Unzip, rezip, re-sign
  – Changes signing key, checksum of whole app package
• Reassembling
  – Disassemble bytecode, AndroidManifest, and resources and reassemble again
  – Changes individual files
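From the defender's side, the repacking case shows why a checksum of the whole package is a brittle signature. The sketch below uses in-memory zip archives as stand-ins for app packages (an assumption; real packages also carry signing files that change on re-signing) and compares a whole-file hash with a hash over the sorted entry contents. The function names are hypothetical.

```python
import hashlib
import io
import zipfile


def make_zip(entries, compression):
    """Build an in-memory zip from (name, bytes) entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        for name, data in entries:
            zf.writestr(name, data)
    return buf.getvalue()


def package_hash(blob):
    """Checksum of the whole archive: changes whenever the archive is rebuilt differently."""
    return hashlib.sha256(blob).hexdigest()


def content_hash(blob):
    """Hash of entry names and contents in sorted order: stable across rezipping."""
    h = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for name in sorted(zf.namelist()):
            h.update(name.encode("utf-8"))
            h.update(b"\0")
            h.update(zf.read(name))
    return h.hexdigest()
```

A content hash of this kind survives repacking but not reassembling, which changes the individual files; that matches the later finding that content-based signatures alone are not sufficient.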

DSA Transformations

• Changing package name
• Identifier renaming
• Data encryption
• Encrypting payloads and native exploits
• Call indirections
• …

Evaluation

• 10 anti-malware products evaluated
  – AVG, Symantec, Lookout, ESET, Dr. Web, Kaspersky, Trend Micro, ESTSoft (ALYac), Zoner, Webroot
  – Mostly million-figure installs; > 10M for three
  – All fully functional
• 6 malware samples used
  – DroidDream, Geinimi, FakePlayer, BgServ, BaseBridge, Plankton
• Last done in February 2013

DroidDream Example

Products: AVG, Symantec, Lookout, ESET, Dr. Web

Repack x
Reassemble x
Rename package x x
Encrypt Exploit (EE) x
Rename identifiers (RI) x x
Encrypt Data (ED) x
Call Indirection (CI) x
RI+EE x x x
EE+ED x
EE+Rename Files x
EE+CI x x

DroidDream Example

Products: Kaspersky, Trend Micro, ESTSoft, Zoner, Webroot

Repack
Reassemble x
Rename package x x
Encrypt Exploit (EE) x
Rename identifiers (RI) x x
Encrypt Data (ED) x
Call Indirection (CI) x
RI+EE x x
EE+ED x x
EE+Rename Files x x
EE+CI x

Findings

• All the studied tools were found vulnerable to common transformations

• At least 43% of signatures are not based on code-level artifacts

• 90% of signatures do not require static analysis of bytecode. Only one tool (Dr. Web) was found to be using static analysis

Signature Evolution

• Study over one year (Feb 2012 – Feb 2013)
• Key finding: anti-malware tools have evolved towards content-based signatures
• Last year, 45% of signatures were evaded by trivial transformations, compared to 16% this year
• Content-based signatures are still not sufficient

Solutions

Content-based Signatures are not sufficient

Analyze semantics of malware

Dynamic behavioral monitoring can help

• Need platform support for that


Takeaways


Anti-malware vendors

Need to have semantics-based detection

Google and device manufacturers

Need to provide better platform support for anti-malware

Impact

• The focus of a Dark Reading article on April 29, 2013

• Then featured by Information Week, The H, heise Security, Security Week, Slashdot, Help Net Security, ISS Source, EFY Times, Tech News Daily, Fudzilla, VirusFreePhone, McCormick Northwestern News, and ScienceDaily.

• Contacted by Lookout, AVG and McAfee regarding transformation samples and tools


http://www.informationweek.in/security/13-05-05/mobile_av_apps_fail_to_detect_disguised_malware.aspx

http://www.h-online.com/security/news/item/Android-virus-scanners-are-easily-fooled-1856133.html

http://www.heise.de/security/meldung/Android-Virenscanner-sind-leicht-auszutricksen-1855331.html


http://www.securityweek.com/anti-virus-software-android-fooled-common-techniques-researchers-say

http://it.slashdot.org/story/13/05/07/0226229/popular-android-anti-virus-software-fooled-by-trivial-techniques

http://www.net-security.org/secworld.php?id=14862

http://www.isssource.com/android-virus-scanners-easy-to-trick/

http://www.efytimes.com/e1/fullnews.asp?edid=105498

http://www.technewsdaily.com/17982-android-antivirus-serious-weakness.html

http://www.fudzilla.com/home/item/31301-android-av-is-easily-fooleda

http://virusfreephone.com/2013/05/mobile-av-apps-fail-to-detect-disguised-malware-23/

http://www.mccormick.northwestern.edu/news/articles/2013/05/android-antiviral-products-easily-evaded-northwestern-study-says-yan-chen.html

http://www.sciencedaily.com/releases/2013/05/130530132539.htm

Conclusion

• Developed a systematic framework for transforming malware

• Evaluated latest popular Android anti-malware products

• All products vulnerable to malware transformations


Kernel-level Monitoring

• Useful for malware detection
• Most root-capable malware can be logged for vulnerability conditions
• Rage-against-the-cage
  – Number of live processes for a user reaches a threshold
• Exploid / Gingerbreak
  – Netlink packets sent to system daemons
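The Rage-against-the-cage condition (a user's live-process count reaching a threshold) can be sketched as a simple check over a process snapshot. The real monitoring happens in the kernel; the input format of `(pid, uid)` tuples, the limit value, and the function name are assumptions made for illustration.

```python
from collections import Counter


def users_at_process_limit(processes, limit):
    """Return the UIDs whose live-process count has reached the limit.

    processes: iterable of (pid, uid) tuples from a process snapshot.
    """
    counts = Counter(uid for _pid, uid in processes)
    return sorted(uid for uid, n in counts.items() if n >= limit)
```

An analysis log would record the flagged UID together with the app under test, so that the condition can be tied back to the sample that triggered it.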

Smartphone Security

• Lots of private data
  – Contacts, messages, call logs, location
  – Grayware applications, spyware applications
  – TaintDroid, PiOS, etc. found many leaks
  – Our independent study estimates about one quarter of apps to be leaking
• Exploits could cost the user money
  – Dialing and texting to premium numbers
  – Malware such as FakePlayer already does this

Android Threats

• Privacy leakage
  – Users often have no way to know if there are privacy leaks
  – Even legitimate apps may leak private information without informing the user
• Malware
  – Number increasing consistently
  – Need to analyze new kinds

Dynamic vs. Static

                                                Dynamic Analysis                    Static Analysis
Coverage                                        Some code not executed              Mostly sound
Accuracy                                        False negatives                     False positives
Dynamic aspects (reflection, dynamic loading)   Handled without additional effort   Possibly unsound for these
Execution context                               Easily handled                      Difficult to handle
Performance                                     Usually slower                      Usually faster

Disguise Techniques

• Make the virtualized environment look like a real phone
  – Phone identifiers and properties
  – Data on phone, such as contacts, SMS, files
  – Data from sensors like GPS
  – Cannot be perfect

Exploration Effectiveness

• Measured in terms of code coverage
  – 33% mean code coverage
    • More than double that of the trivial approach
    • Black-box technique
    • Some code may be dead code
    • Use symbolic execution in the future
• Fuzzing and intelligent input are both important
  – Fuzzing helps when intelligent input can’t model the GUI
  – Intelligent input could sign up automatically for 34 different services in large-scale experiments

Playground: Related Work

• Google Bouncer
  – Similar aims; closed system
• DroidScope, Usenix Security’12
  – Malware forensics
  – Mostly manual
• SmartDroid, SPSM’12
  – Uses static analysis to guide dynamic exploration
  – Complementary to our approach

Threat Mitigation at App level

• Offline analysis
  – Trustworthiness of app is known before use
  – Static analysis
  – Dynamic analysis
• Real-time monitoring
  – Often more accurate but with runtime overhead
  – User has control over app’s actions in real time

Previous Solutions

• Static analysis: not sufficient
  – It does not identify the conditions under which a leak happens
    • Such conditions may be legitimate or may not happen at all at run time
  – Need real-time monitoring
• TaintDroid: real-time but not usable
  – Requires installing a custom Android ROM
    • Not possible with some vendors
    • End user does not have the skill set

Callback Example

The toString() method may be called by a framework API and the returned string used elsewhere.
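The slide's original code figure is not available. As a hedged Python analogue, the sketch below models an uninstrumented framework API that invokes an app callback (`__str__` standing in for `toString()`), and an over-tainting wrapper that gives the result the union of all argument taints. The names `Tainted`, `framework_format`, `call_framework_overtainted`, and `Contact` are hypothetical, not part of PrivacyShield.

```python
class Tainted:
    """A value paired with the set of taint tags attached to it."""

    def __init__(self, value, tags=()):
        self.value = value
        self.tags = frozenset(tags)


def framework_format(obj):
    """Stand-in for framework code that cannot be instrumented.

    It calls back into app code through str(obj).
    """
    return "item: " + str(obj)


def call_framework_overtainted(api, *args):
    """Summarize an uninstrumented API conservatively.

    The result carries the union of the taints of all arguments, so a flow
    through a callback cannot be missed (at the cost of possible false positives).
    """
    tags = frozenset().union(*(a.tags for a in args))
    result = api(*(a.value for a in args))
    return Tainted(result, tags)


class Contact:
    """App object whose string conversion exposes private data."""

    def __init__(self, phone):
        self.phone = phone

    def __str__(self):  # callback invoked by framework code
        return self.phone
```

Because the wrapper never inspects what the framework does internally, it trades precision for the zero-false-negative guarantee mentioned on the Challenges and Solutions slide.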


Potential Defenses against malicious app

• Server-side Security Check by Controller Vendor
  – Static analysis
  – Dynamic analysis
• Runtime Permission Check
  – Enforce the principle of least privilege on apps
• Principal Isolation
• Anomaly-based Behavior Monitoring
