

Identifying Mobile Repackaged Applications through Formal Methods

Fabio Martinelli¹, Francesco Mercaldo¹, Vittoria Nardone², Antonella Santone², Corrado Aaron Visaggio²

¹ Institute for Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy
² Department of Engineering, University of Sannio, Benevento, Italy

{fabio.martinelli, francesco.mercaldo}@iit.cnr.it, {vnardone, santone, visaggio}@unisannio.it

Keywords: Security, malware, model checking, Android, testing.

Abstract: Smartphones and tablets are rapidly becoming indispensable in everyday activities. Android has become the most popular operating system for mobile environments in the world. These devices, owing to the open nature of Android, are continuously exposed to attacks, mostly aimed at data exfiltration and monetary fraud. There are many techniques to embed the bad code, i.e. the instructions able to perform a malicious behaviour, into a legitimate application: the most widespread one is so-called repackaging, which consists of reverse engineering the application in order to embed the malicious code and then (re)distributing it in the official and/or third-party markets. In this paper we propose a technique to localize the malicious payload of the GinMaster family, one of the most representative repackaged trojans in the Android environment. We obtain encouraging results, achieving an accuracy equal to 0.9.

1 Introduction

In 2015, the volume of mobile malware continued to grow. From 2004 to 2013, security experts of SecureList detected nearly 200,000 samples of malicious mobile code. In 2014 there were 295,539 new programs, while the number was 884,774 in 2015. Each malware sample has several installation packages: in 2015, they detected 2,961,727 malicious mobile installation packages [SecureList, 2015].

Signature-based malware detection, which is the most common technique adopted by commercial antimalware for mobile, is often ineffective. Moreover, it is costly, as the process for obtaining and classifying a malware signature is laborious and time-consuming.

In order to mitigate the malware trend, in February 2011 Google introduced Bouncer [GoogleMobile, 2014] to screen submitted apps for malicious behaviors, but this has not eliminated the problem, as discussed in [Oberheide and Miller, 2012].

The Fraunhofer Research Institution for Applied and Integrated Security has performed an evaluation of antivirus products for Android [Fedler et al., 2014]: the conclusion is that there are many techniques for evading the detection of most antivirus products and for installing a malicious payload.

The most employed installation technique is the so-called repackaging [Zhou and Jiang, 2012]: the attacker decompiles a trusted application to get the source code, then adds the malicious payload and recompiles the application with the payload to make it available on various alternative markets, and sometimes also on the official market. The user is often encouraged to download such malicious applications because they are free versions of trusted applications sold on the official market.

In recent years the scientific community has proposed many static [Canfora et al., 2016, Liang and Du, 2014, Yerima et al., 2013, Arp et al., 2014, Spreitzenbarth et al., 2013] and dynamic [Isohara et al., 2011, Tchakount and Dayang, 2013, Reina et al., 2013] techniques, mostly based on machine learning methods, to solve the problem: the main limitation in this case is the false positive ratio.

Indeed, existing solutions for protecting privacy and security on smartphones are still ineffective in many facets [Marforio et al., 2011], and many analysts warn that the malware families and their variants for Android are rapidly increasing. This scenario calls for new security models and tools to limit the spreading of malware for smartphones.

For these reasons, in this paper we evaluate the effectiveness of a model-checking approach to identify Android malware. We evaluate our method using GinMaster malware, one of the most widespread families in the mobile malware landscape, with over 6,000 known variants belonging to this family.

We explore the effectiveness of our approach on the GinMaster family because it is one of the most widespread trojan malware families in the mobile environment. The payload belonging to this family is embedded into legitimate applications using the repackaging technique. GinMaster has gone through three significant generations since it was first found by researchers from North Carolina State University on 17 August 2011.

GinMaster is distributed in third-party app markets in China. The attackers injected GinMaster code into thousands of legitimate game, ringtone and picture applications. These applications have a greater chance of luring mobile users into installing the malicious payload.

The trojan contains a malicious service able to root specific devices in order to escalate privileges. It also has the ability to modify and delete contents in the SD card of the device, steal confidential information and send it to a remote website, execute command-and-control services from the remote website, as well as download and install applications without user interaction.

The approach can be easily applied to other repackaged families: as reported in [Zhou and Jiang, 2012], repackaged malware contains the malicious payload at installation time. Because the payload is embedded in the application, the logic rules that verify the existence of the malicious payload can be checked without running the sample.

The salient characteristics of our methodology are:

• the use of formal methods;

• the inspection of Java Bytecode rather than the source code;

• the use of static analysis;

• the capture of malicious behaviours at a finer granularity.

In practice, from the Java Bytecode application files we generate CCS processes, which are subsequently used for checking properties expressing the most common behaviours exhibited by GinMaster family samples.

Performing automatic analysis on the Bytecode and not directly on the source code has several advantages:

• independence of the source programming language;

• identification of GinMaster without decompilation even when source code is lacking;

• ease of parsing a lower-level code;

• independence from obfuscation.

The paper proceeds as follows: Section 2 discusses related work; Section 3 describes and motivates our approach; Section 4 illustrates the results of experiments; finally, conclusions are drawn in Section 5.

2 Related work

In this section we review the current literature related to malware identification, with particular regard to malicious family identification.

A very thorough dissection of the GinMaster family is provided in [Yu, 2013]. The paper gives an overview of three generations of the GinMaster family, examines the core malicious functionality, tracks their evolution from source code, and presents notable techniques utilized by the specific variants. Basically, GinMaster is able to set up a mobile botnet through malicious code hidden in the affected app. However, instead of directly taking advantage of these zombie devices to make profit from end-users, the malware controller employs the botnet to generate millions of installations and large volumes of advertising traffic to legitimate developers and advertising services.

In [Canfora et al., 2016] the authors experimentally evaluate two techniques for detecting Android malware: the first one is based on Hidden Markov Models (HMM), while the second one exploits Structural Entropy. They demonstrate that these methods obtain a precision of 0.96 in discriminating a malware application. In addition, they also analyse ransomware samples, obtaining in ransomware identification a precision of 0.961 with the LADTree classification algorithm using the Structural Entropy method, and a precision of 0.824 with the J48 algorithm using the HMM one.

Song et al. [Song et al., 2016] propose a framework to statically detect Android malware, consisting of four layers of filtering mechanisms: the message digest values, the combination of malicious permissions, the dangerous permissions, and the dangerous intention. As an additional contribution, they propose a novel threat degree threshold model of dangerous permissions for malware detection. They evaluate the method on 83 real mobile devices, with Android versions ranging from 2.3 to 5.1, achieving a 98.8% pass rate.

Neuhaus et al. [Neuhaus and Zimmermann, 2010] crawl the vulnerability reports in the Common Vulnerabilities and Exposures database, using topic models to find prevalent vulnerability types and new trends semi-automatically. They analyze 39,393 unique reports until the end of 2009, with the aim of characterizing many vulnerability trends: SQL injection (PHP), buffer overflows, format strings, cross-site request forgery and so on.

Zhao et al. [Zhao et al., 2014] propose an LDA-based method to analyze the trends of network security which consists of three steps: collect data from web sites, extract topics from the collected data, and plot the curves of trends over time. They select 620 documents sorted by time and extract 10 topics from each document. Six interesting topics are discovered by the LDA model, according to the authors: dns-ddos, vulnerability, mobile-malware, mac-malware, Browser malware and Java-vulnerability.

Formal methods have been applied for studying malware in some recent papers. In [Kinder et al., 2005] the authors introduce the specification language CTPL (Computation Tree Predicate Logic), confirming the malicious behavior of thirteen Windows malware variants using as dataset a set of worms dating from 2002 to 2004.

Song et al. present an approach to model Microsoft Windows XP binary programs as a PushDown System (PDS) [Song and Touili, 2001]. They evaluate 200 malware variants (generated by the NGVCK and VCL32 engines) and 8 benign programs.

The tool PoMMaDe [Song and Touili, 2013] is able to detect 600 real malware samples and 200 malware samples generated by two malware generators (NGVCK and VCL32), and proves the reliability of benign programs: a Microsoft Windows binary program is modeled as a PDS, which allows tracking the stack of the program.

Song et al. model mobile applications using a PDS in order to discover private data leaks, working at Smali code level [Song and Touili, 2014]. Illegal flow of information in Java bytecode has also been studied in [Bernardeschi et al., 2004], using a static analysis approach.

Jacob and colleagues provide a basis for a malware model, founded on the Join-Calculus: they consider the system call sequences to build the model [Jacob et al., 2010].

Recently, the possibility of identifying the malicious payload in Android malware using a model checking based approach has been explored in [Battista et al., 2016, Mercaldo et al., 2016a, Mercaldo et al., 2016b]. Starting from the definition of the payload behavior, the authors formulate logic rules and then test them using a real world dataset composed of Ransomware, DroidKungFu and Opfake family samples and update attack samples.

As emerges from the literature of recent years, formal methods have been applied to detect mobile malware but, to the best knowledge of the authors, they have never been applied specifically to identify the repackaging attack carried out by the GinMaster family of Android malware.

3 The Method

In this section a model checking-based approach for the detection of GinMaster apps is presented. While model checking [Barbuti et al., 2005] was originally developed to verify the correctness of systems against specifications, recently it has been applied in a variety of disciplines such as biology [De Ruvo et al., 2015], clone detection [Santone, 2011] and secure information flow [Barbuti et al., 2002], among others. In this paper we present the use of model checking in the security field. Fig. 1 describes all the phases of our approach.

During the first phase, we generate a formal model from the Java Bytecode of the .class files derived from the app under analysis. As formal specification language, we use Milner’s Calculus of Communicating Systems (CCS) [Milner, 1989], one of the best known process algebras. CCS contains basic operators to build finite processes, communication operators to express concurrency, and some notion of recursion to capture infinite behaviour. Thus, the formal model is obtained by transforming each Java Bytecode instruction into a CCS process. More precisely, from the CCS we generate an automaton such that the nodes represent instruction addresses while the edges (labeled with opcodes) represent the control-flow transitions from one instruction to its successor(s).
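The automaton described above can be illustrated with a minimal sketch. It is not the translation used in the paper: the `Instr` record, the toy instruction list and the function name `build_automaton` are assumptions introduced only for illustration, and real bytecode parsing is left out.

```python
from collections import namedtuple

# Hypothetical, simplified instruction record: address, opcode label and
# the addresses of the possible successors. Parsing .class files is not shown.
Instr = namedtuple("Instr", ["addr", "opcode", "successors"])


def build_automaton(instructions):
    """Return (states, transitions).

    States are instruction addresses; each transition is a triple
    (source address, opcode label, target address).
    """
    states = {ins.addr for ins in instructions}
    transitions = set()
    for ins in instructions:
        for target in ins.successors:
            transitions.add((ins.addr, ins.opcode, target))
    return states, transitions


# Toy method body: the conditional branch at address 1 has two successors.
program = [
    Instr(0, "push", [1]),
    Instr(1, "ifeq", [2, 3]),
    Instr(2, "invokeexec", [3]),
    Instr(3, "return", []),
]
states, transitions = build_automaton(program)
```

A branching instruction simply contributes one edge per successor, which is what makes the resulting structure a control-flow automaton rather than a flat instruction list.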

In the second phase, we try to discover Android malware GinMaster apps. The behavior of the GinMaster family is encoded into a property φ expressed in a branching temporal logic: the mu-calculus logic [Stirling, 1989]. Temporal logics are logical formalisms for expressing properties such as liveness and safety properties. The syntax of the mu-calculus is the following, where K ranges over sets of actions and Z ranges over variables:

ϕ ::= tt | ff | Z | ϕ∧ϕ | ϕ∨ϕ | [K]ϕ | ⟨K⟩ϕ | νZ.ϕ | µZ.ϕ

Figure 1: The Workflow of the Approach.

A fixpoint formula has the form µZ.ϕ (resp. νZ.ϕ) where µZ (resp. νZ) binds free occurrences of Z in ϕ. An occurrence of Z is free if it is not within the scope of a binder µZ (resp. νZ). A formula is closed if it contains no free variables. µZ.ϕ is the least fixpoint of the recursive equation Z = ϕ, while νZ.ϕ is the greatest one. From now on we consider only closed formulae.

The satisfaction of a formula ϕ by a state s of a transition system is defined as follows:

• each state satisfies tt and no state satisfies ff;

• a state satisfies ϕ1∨ϕ2 (ϕ1∧ϕ2) if it satisfies ϕ1 or (and) ϕ2;

• [K] ϕ is satisfied by a state which, for every performance of an action in K, evolves to a state obeying ϕ;

• ⟨K⟩ ϕ is satisfied by a state which can evolve to a state obeying ϕ by performing an action in K.

For the precise definition of the satisfaction of a closed formula φ by a state s (written s |= φ) the reader can refer to [Stirling, 1989].

For example, µY.⟨a⟩ tt ∨ ⟨−a,b⟩Y means that “it is possible to perform the action a not preceded by the action b”, where ⟨−a,b⟩ stands for any action other than a and b.
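On a finite transition system a least fixpoint can be computed by starting from the empty set of states and re-applying the equation until nothing changes. The sketch below does this for the property “a can be performed, not preceded by b”, read as the least fixpoint of Y = ⟨a⟩tt ∨ ⟨−a,b⟩Y with ⟨−a,b⟩ standing for any action other than a and b. The toy transition system and all function names are assumptions made for illustration; this is not how CWB-NC is implemented.

```python
def diamond(transitions, allowed, targets):
    """States with at least one transition whose label passes `allowed`
    and whose destination lies in `targets` (the <K> modality)."""
    return {src for (src, label, dst) in transitions
            if allowed(label) and dst in targets}


def a_not_preceded_by_b(states, transitions):
    """Least fixpoint of Y = <a>tt or <-a,b>Y, by iteration from the empty set."""
    can_do_a = diamond(transitions, lambda l: l == "a", states)  # <a>tt
    current = set()
    while True:
        step = diamond(transitions, lambda l: l not in ("a", "b"), current)
        updated = can_do_a | step
        if updated == current:  # no new state was added: fixpoint reached
            return current
        current = updated


# Toy system: 0 --c--> 1 --a--> 2 and 3 --b--> 1
toy_states = {0, 1, 2, 3}
toy_transitions = {(0, "c", 1), (1, "a", 2), (3, "b", 1)}
result = a_not_preceded_by_b(toy_states, toy_transitions)
```

In the toy system, state 1 can perform a directly and state 0 reaches it through c, so both satisfy the property; state 3 can only get there through b, and state 2 has no outgoing transitions, so neither does.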

The CCS formal model, generated from the Java Bytecode during the first phase, is now used to prove the property φ: using model checking we determine whether an app belongs to the GinMaster malware family.

Table 1 shows the logic formulae used to catch the GinMaster malicious payload, to give the reader the flavour of the approach followed: the first one, i.e., φ1, is able to catch the rooting ability of the GinMaster malicious payload; the second one, i.e., φ2, identifies the gathering of user information, one of the most representative mobile trojan behaviors, while φ3 identifies the commands to communicate with the C&C server. Figure 2 shows two snippets of code belonging to a GinMaster sample. The reported snippets are related to the two malicious behaviours caught by the logic formulae φ1 and φ2.

More precisely, formula φ1 is able to catch the following actions:

• “new javalangStringBuilder”: it represents a mutable sequence of characters, typically used to build the path where resources, i.e. files, are located. In the case of GinMaster the built string represents the path of the script able to root the device;

• “pushchmod”: to successfully run the script the malware needs to grant admin privileges to the root script; this action is performed with the “chmod” command;

• “invokeappend”: this represents a method of the StringBuilder class used to concatenate different strings;

• “invokeexec”: this instruction is able to run the specified command and arguments in a separate process with the specified environment and working directory. The “exec” method, belonging to the “Runtime” class, represents the command able to run the script to root the phone, which has already obtained the admin privileges;

• “invokewaitFor”: this instruction causes the current thread to wait, if necessary, until the process represented by this Process object has terminated. This method returns immediately if the subprocess has already terminated. If the subprocess has not yet terminated, the calling thread will be blocked until the subprocess exits. In this case the subprocess is represented by the execution of the script to root the device;

• “pushfile”: it represents the retrieval of a file from an external source and its storage on the device; in this case the file stored on the device is the script that will be run in a separate thread.

Table 1: The formulae for GinMaster payload detection

φ1 = µX.⟨new javalangStringBuilder⟩φ11 ∨ ⟨−new javalangStringBuilder⟩X
φ11 = µX.⟨pushchmod⟩φ12 ∨ ⟨−pushchmod⟩X
φ12 = µX.⟨invokeappend⟩φ13 ∨ ⟨−invokeappend⟩X
φ13 = µX.⟨invokeexec⟩φ14 ∨ ⟨−invokeexec⟩X
φ14 = µX.⟨invokewaitFor⟩φ15 ∨ ⟨−invokewaitFor⟩X
φ15 = µX.⟨pushfile⟩ tt ∨ ⟨−pushfile⟩X

φ2 = µX.⟨pushphone⟩φ21 ∨ ⟨−pushphone⟩X
φ21 = µX.⟨invokegetSystemService⟩φ22 ∨ ⟨−invokegetSystemService⟩X
φ22 = µX.⟨checkcastandroidtelephonyTelephonyManager⟩φ23 ∨ ⟨−checkcastandroidtelephonyTelephonyManager⟩X
φ23 = µX.⟨invokegetDeviceId⟩φ24 ∨ ⟨−invokegetDeviceId⟩X
φ24 = µX.⟨invokegetSubscriberId⟩φ25 ∨ ⟨−invokegetSubscriberId⟩X
φ25 = µX.⟨invokegetSimSerialNumber⟩φ26 ∨ ⟨−invokegetSimSerialNumber⟩X
φ26 = µX.⟨invokegetLineUunoNumber⟩φ27 ∨ ⟨−invokegetLineUunoNumber⟩X
φ27 = µX.⟨new javaioBufferedReader⟩ tt ∨ ⟨−new javaioBufferedReader⟩X

φ3 = µX.⟨A,B,C,D,E,F,G,H,I,J,K,L,M,N⟩ tt ∨ ⟨−A,B,C,D,E,F,G,H,I,J,K,L,M,N⟩X

Figure 2: Code Snippet Related to Logic Formulae φ1 and φ2.
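Each formula of the shape µX.⟨a⟩φ ∨ ⟨−a⟩X in Table 1 can be read as “along some path, a eventually occurs and φ holds afterwards”, so the nested chain for φ1 asks for a path on which the listed actions occur in the given order, possibly interleaved with other instructions. A minimal sketch of that reading, as a breadth-first search over pairs (state, number of actions matched so far), follows. The function name, the toy automaton and the spelling “pushfile” for the last action are assumptions, and the search only stands in for what a model checker would establish.

```python
from collections import deque


def exists_ordered_path(transitions, start, actions):
    """True if some path from `start` performs `actions` in order,
    possibly separated by other labels."""
    successors = {}
    for src, label, dst in transitions:
        successors.setdefault(src, []).append((label, dst))

    seen = {(start, 0)}
    queue = deque(seen)
    while queue:
        state, matched = queue.popleft()
        if matched == len(actions):
            return True
        for label, dst in successors.get(state, []):
            # advance the index only when the awaited action is performed
            advance = matched + 1 if label == actions[matched] else matched
            if (dst, advance) not in seen:
                seen.add((dst, advance))
                queue.append((dst, advance))
    return False


# Action order taken from the phi1 chain of Table 1
PHI1_ACTIONS = ["new javalangStringBuilder", "pushchmod", "invokeappend",
                "invokeexec", "invokewaitFor", "pushfile"]

# Toy straight-line method with an unrelated "nop" in the middle
labels = ["new javalangStringBuilder", "pushchmod", "nop", "invokeappend",
          "invokeexec", "invokewaitFor", "pushfile"]
toy = [(i, label, i + 1) for i, label in enumerate(labels)]
found = exists_ordered_path(toy, 0, PHI1_ACTIONS)
```

Because unrelated labels leave the match index unchanged, inserting junk instructions between the actions does not affect the result, while reordering the actions does.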

Instead, formula φ2 identifies the personal information gathering capability of the malicious payload, which performs the following actions:

• “pushphone”: the constant string “phone” is pushed; this is the name Android uses for the telephony system service (Context.TELEPHONY_SERVICE) and it is the argument of the following call;

• “invokegetSystemService”: it is a method of the “Context” class, which represents an interface to global information about the application environment. This is an abstract class whose implementation is provided by the Android system. It allows access to application-specific resources and classes, as well as up-calls for application-level operations such as launching activities, broadcasting and receiving intents;

• “checkcastandroidtelephonyTelephonyManager”: it provides access to information about the telephony services on the device. Applications can use the methods in this class to determine telephony services and states, as well as to access some types of subscriber information. Applications can also register a listener to receive notification of telephony state changes;

• “invokegetDeviceId”: it is a method of the “TelephonyManager” class provided by the Android environment and it returns the unique device ID, for instance, the IMEI for GSM and the MEID or ESN for CDMA phones;

• “invokegetSubscriberId”: another method provided by the “TelephonyManager” class; it returns the unique subscriber ID, for example, the IMSI for a GSM phone;

• “invokegetSimSerialNumber”: this method, belonging to the “TelephonyManager” class, returns the serial number of the SIM inserted into the device;

• “invokegetLineUunoNumber”: this method returns the phone number string for line 1;

• “new javaioBufferedReader”: an object belonging to the BufferedReader class is able to read text from a character-input stream and buffer characters. In this case the buffer contains the personal information retrieved previously by the malicious payload.

The formula φ3 identifies the Command and Control (C&C) list of instructions. Table 2 shows all the command strings and reports a brief description of each. In some samples these strings are encrypted, and the algorithm used for decryption is shown in Figure 3. The decryption module (as Figure 3 shows) applies a byte-by-byte XOR with key 0x18 after Base64 decoding. The first generation of the GinMaster family exhibits unencrypted C&C instructions, while in the second one the encryption of the commands is introduced. In order to evade detection by antimalware software, the second generation obfuscates class names and encrypts URLs as well as C&C instructions. It is impossible to catch this variant by detecting the class names or URLs. In both generations, the malware has the capability of reporting package information relating to packages installed/uninstalled in the system, searching and listing package information from remote websites, and downloading additional applications to the device without the consent of the user.
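The decoding step described above (Base64 decoding followed by a byte-by-byte XOR with key 0x18) can be sketched as follows. This is not the code of Figure 3: the function names are hypothetical, and the sketch assumes the standard Base64 alphabet and that the same single-byte key is applied to every byte. The encoder is included only to build a round-trip example.

```python
import base64

XOR_KEY = 0x18  # single-byte key reported in the text


def decode_command(encoded: str) -> str:
    """Base64-decode the string, then XOR every byte with the key."""
    raw = base64.b64decode(encoded)
    return bytes(b ^ XOR_KEY for b in raw).decode("utf-8")


def encode_command(plain: str) -> str:
    """Inverse operation, used here only to demonstrate a round trip."""
    masked = bytes(b ^ XOR_KEY for b in plain.encode("utf-8"))
    return base64.b64encode(masked).decode("ascii")


# Placeholder text, not a real GinMaster command string
sample = encode_command("example-command")
recovered = decode_command(sample)
```

Since XOR with a fixed key is its own inverse, the same masking routine serves for both directions; only the order of the Base64 step changes.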

The model checker accepts two inputs: the formal model of the app and the property expressing the malware characteristics of the GinMaster family. If the model checker returns true, we consider the app as belonging to the GinMaster family, while if it returns false the app can be either trusted or belong to another malware family. As formal verification environment in this paper we use the Concurrency Workbench of the New Century (CWB-NC) [Cleaveland and Sims, 1996], which supports several different specification languages, among which CCS.

Since CWB-NC is no longer in active development, as future work we want to replace CWB-NC with CAAL (standing for Concurrency workbench developed at AALborg university) [Andersen et al., 2015], which supports CCS as input specification language (as CWB-NC does), but uses a more efficient algorithm to perform model checking. Actually, there exist mature tools with modern designs like CADP [Garavel et al., 2013] with expressive input languages and efficient analysis methods. However, our aim is to develop an initial rapid research prototype to evaluate how well our approach is able to identify GinMaster apps. For the same reason, logic properties are formulated manually, but as future work we plan to build rules automatically using, for instance, machine learning or clone detection.

An important feature of our approach is that an automatic dissection of an app can be achieved, with the advantage of localizing in the code the instructions that implement the malicious behavior. To localize the payload, no manual inspection is needed. In fact, our approach is able to identify the position of the instructions characterizing the malicious behaviour, with a precision at method level.

This is a novel result in malware analysis. In addition, since we analyze Java bytecode, our methodology is able to correctly identify the malicious payload also when trivial obfuscation techniques (i.e., nop insertion, junk code, call reordering) are applied to Java source code [Mercaldo et al., 2016b].

4 Experiment

In this section we discuss the experiment we performed to evaluate the effectiveness of our approach in recognizing the GinMaster payload and discriminating it from samples belonging to other Android malware families.

4.1 Dataset

The real world samples examined in the experiment were gathered from the Drebin project’s dataset [Arp et al., 2014, Spreitzenbarth et al., 2013]: a very well known collection of malware used in many scientific works, which includes the most diffused Android families.

The malware dataset is also partitioned according to the malware family: each family contains samples which have in common several characteristics, like payload installation, the kind of attack and the events that trigger the malicious payload [Zhou and Jiang, 2012].

Table 2: The list of C&C commands.

ACTION | STRING | DESCRIPTION
A | "http://client.go360days.com/report/first_run.do" | Report the starting of GinMaster.
B | "http://client.go360days.com/request/tableclass.do" | Show information stored in the SQLite database.
C | "http://client.go360days.com/request/config.do" | Change the frequency configuration for checking into the server.
D | "http://client.go360days.com/request/alert.do" | Alert last id.
E | "http://client.go360days.com/request/push.do" | Soft last id.
F | "http://client.go360days.com/report/return_config.do" | Show configuration.
G | "http://client.go360days.com/report/return_alert.do" | Send alert.
H | "http://client.go360days.com/report/return_push.do" | Push file into the device.
I | "http://client.go360days.com/report/install_list.do" | Report information when installing a list of packages.
J | "http://client.go360days.com/report/listener.do" | Check the communication between server and device.
K | "http://client.go360days.com/client.php?action=soft&soft_id=" | Get a link to a specified software.
L | "http://client.go360days.com/client.php?action=softlist&type=search&word=" | Search a list of software with a specified word.
M | "http://client.go360days.com/client.php?action=softlist" | Get the list of the available software.
N | "http://client.go360days.com/client.php?action=list&list_id=9" | Get the software with the specified id.

Figure 3: Decryption algorithm used to decode the C&C list of commands.

Table 3 shows the 10 malware families with the largest number of applications in our malware dataset, with installation type, kind of attack and events activating the malicious payload.

We briefly describe the malicious payload action for the top 10 most populous families in our dataset.

1. The samples of the FakeInstaller family have the main payload in common but have different code implementations, and some of them also have an extra payload. FakeInstaller malware is server-side polymorphic, which means the server could provide different .apk files for the same URL request. There are variants of FakeInstaller that not only send SMS messages to premium rate numbers, but also include a backdoor to receive commands from a remote server. There is a large number of variants for this family, and it has been distributed through hundreds of websites and alternative markets. The members of this family hide their malicious code inside repackaged versions of popular applications. During the installation process the malware sends expensive SMS messages to premium services owned by the malware authors.

2. DroidKungFu installs a backdoor that allows attackers to access the smartphone when they want and use it as they please. They could even turn it into a bot. This malware encrypts two known root exploits, exploid and rageagainstthecage, to break out of the Android security container. When it runs, it decrypts these exploits and then contacts a remote server without the user knowing.

3. Plankton uses an available native functionality (i.e., class loading) to forward details like IMEI and browser history to a remote server. It is present in a large number of versions as harmful adware that downloads unwanted advertisements, changes the browser homepage or adds unwanted bookmarks to it.

4. The Opfake malware demands payment for the application content through premium text messages. This family represents an example of polymorphic malware in the Android environment: it is written with an algorithm that can change shape over time so as to evade detection by signature-based antimalware.


Table 3: Families in Drebin dataset with details of the installation method (s: standalone, r: repackaging, u: update), the kind of attack (t: trojan, b: botnet), the events that trigger the malicious payload and a brief family description. A dash marks an empty cell.

Family | Installation | Attack | Activation | Description
FakeInstaller | s | t,b | - | server-side polymorphic family
Plankton | s,u | t,b | - | it uses class loading to forward details
DroidKungFu | r | t | boot,batt,sys | it installs a backdoor
GinMaster | r | t | boot | malicious service to root devices
BaseBridge | r,u | t | boot,sms,net,batt | it sends information to a remote server
Adrd | r | t | net,call | it compromises personal data
Kmin | s | t | boot | it sends info to premium-rate numbers
Geinimi | r | t | boot,sms | first Android botnet
DroidDream | r | b | main | botnet, it gained root access
Opfake | r | t | - | first Android polymorphic malware
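The content of Table 3 can also be read as a small data structure. The sketch below encodes the installation, attack and activation columns and selects the families that match a given combination; the names `FAMILIES` and `select` are hypothetical and introduced only for illustration.

```python
# Table 3 encoded as: family -> (installation, attack, activation events).
# s = standalone, r = repackaging, u = update; t = trojan, b = botnet.
FAMILIES = {
    "FakeInstaller": ({"s"}, {"t", "b"}, set()),
    "Plankton": ({"s", "u"}, {"t", "b"}, set()),
    "DroidKungFu": ({"r"}, {"t"}, {"boot", "batt", "sys"}),
    "GinMaster": ({"r"}, {"t"}, {"boot"}),
    "BaseBridge": ({"r", "u"}, {"t"}, {"boot", "sms", "net", "batt"}),
    "Adrd": ({"r"}, {"t"}, {"net", "call"}),
    "Kmin": ({"s"}, {"t"}, {"boot"}),
    "Geinimi": ({"r"}, {"t"}, {"boot", "sms"}),
    "DroidDream": ({"r"}, {"b"}, {"main"}),
    "Opfake": ({"r"}, {"t"}, set()),
}


def select(installation, attack, activation):
    """Return the sorted family names matching all three column values."""
    return sorted(
        name
        for name, (inst, att, act) in FAMILIES.items()
        if installation in inst and attack in att and activation in act
    )


# Repackaged trojans whose payload is triggered at boot.
print(select("r", "t", "boot"))
```

According to the table, the families that share GinMaster's profile (repackaged trojan activated at boot) are the ones returned by this query.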

5. The GinMaster family contains a malicious service with the ability to root devices to escalate privileges, steal confidential information and send it to a remote website, as well as install applications without user interaction. It is also a trojan application and, similarly to the DroidKungFu family, the malware starts its malicious services as soon as it receives a BOOT_COMPLETED or USER_PRESENT intent. The malware can successfully avoid detection by mobile anti-virus software by using polymorphic techniques to hide malicious code, obfuscating class names for each infected object, and randomizing package names and self-signed certificates for applications.

6. BaseBridge malware runs one or more malicious services in the background and sends information, such as the IMEI, the IMSI and other files, to a remote server and to premium-rate numbers. BaseBridge malware is able to obtain the permissions to use the Internet and to kill the processes of antimalware applications in the background.

7. Kmin malware is similar to BaseBridge, but does not kill antimalware processes.

8. Geinimi is the first Android malware in the wild that displays botnet-like capabilities. Once the malware is installed, it has the potential to receive commands from a remote server that allow the owner of that server to control the phone. Geinimi makes use of a bytecode obfuscator. The malware belonging to this family is able to read, collect and delete SMS messages, send contact information to a remote server, make phone calls silently and also launch a web browser to a specific URL to start file downloads.

9. The Adrd family is very close to Geinimi but with fewer server-side commands; it also compromises personal data such as the IMEI and IMSI of the infected device. In addition to what Geinimi does, it is able to modify device settings.

10. DroidDream is another example of a botnet: it gains root access to the device in order to access unique identification information. This malware can also download additional malicious programs without the user's knowledge, as well as open the phone up to control by hackers. The name derives from the fact that it was set up to run between the hours of 11pm and 8am, when users were most likely to be sleeping and their phones less likely to be in use.
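Several of the families above, GinMaster among them, start their payload when a system event such as BOOT_COMPLETED or USER_PRESENT is broadcast. As a rough illustration, the following sketch lists the broadcast receivers in a decoded AndroidManifest.xml that register for these events. It assumes the manifest has already been decoded to plain XML and uses the standard Android action strings; the function name and the sample manifest are hypothetical, and legitimate applications register for the same events, so a hit is not evidence of malice.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
TRIGGERS = {
    "android.intent.action.BOOT_COMPLETED",
    "android.intent.action.USER_PRESENT",
}


def boot_triggered_receivers(manifest_xml):
    """Return names of receivers whose intent filters include a trigger action."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"  # namespaced android:name attribute
    hits = []
    for receiver in root.iter("receiver"):
        actions = {a.get(name_attr) for a in receiver.iter("action")}
        if actions & TRIGGERS:
            hits.append(receiver.get(name_attr))
    return hits


# Hypothetical decoded manifest used only to exercise the function.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
    <receiver android:name=".OtherReceiver">
      <intent-filter>
        <action android:name="android.intent.action.TIMEZONE_CHANGED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

print(boot_triggered_receivers(MANIFEST))
```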

4.2 Evaluation

To estimate the detection performance of our methodology we compute the metrics of precision (PR) and recall (RC), F-measure (Fm) and Accuracy (Acc), defined as follows:

\[
PR = \frac{TP}{TP + FP}; \qquad RC = \frac{TP}{TP + FN};
\]

\[
Fm = \frac{2\, PR\, RC}{PR + RC}; \qquad Acc = \frac{TP + TN}{TP + FN + FP + TN}
\]

where TP is the number of malware that was correctly identified in the GinMaster family (True Positives), TN is the number of malware correctly identified as not belonging to the GinMaster family (True Negatives), FP is the number of malware that was incorrectly identified in the GinMaster family (False Positives), and FN is the number of malware that was not identified as belonging to the GinMaster family (False Negatives).
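The four definitions above translate directly into code. The following minimal sketch computes them from the confusion-matrix counts; the function name `detection_metrics` and the example counts are hypothetical, and the function does not guard against empty denominators.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return (PR, RC, Fm, Acc) as defined above, from the four counts."""
    pr = tp / (tp + fp)                    # precision
    rc = tp / (tp + fn)                    # recall
    fm = 2 * pr * rc / (pr + rc)           # F-measure
    acc = (tp + tn) / (tp + fn + fp + tn)  # accuracy
    return pr, rc, fm, acc


# Hypothetical counts, chosen only for illustration.
pr, rc, fm, acc = detection_metrics(tp=8, fp=1, fn=2, tn=9)
print(f"PR={pr:.2f} RC={rc:.2f} Fm={fm:.2f} Acc={acc:.2f}")
```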

Table 4 shows the results obtained using our method.

We consider in the column GinMaster the samples belonging to the GinMaster family, while in the column Malware the malware samples belonging to the other families considered in the study: details about the malicious payload of the families we considered are shown in Table 3. We demonstrate the effectiveness of our approach by evaluating 100 malware samples belonging to the GinMaster family and 761 malware samples belonging to other families.

Table 4: Performance Evaluation

GinMaster | Malware | TP | FP | FN | TN | PR | RC | Fm | Acc
100 | 761 | 81 | 2 | 19 | 759 | 0.98 | 0.81 | 0.89 | 0.98

The results in Table 4 seem very promising: we obtain an Accuracy above 0.9 (0.98 in Table 4). Concerning the GinMaster results, the method misses the malicious payload in 19 of the 100 samples (FN), while only 2 of the 761 samples from other families are wrongly attributed to GinMaster (FP). It is worth noting that the above values are also influenced by the fact that the dataset is unbalanced, i.e., 100 malware samples belong to the GinMaster family and 761 to other families.
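To make the effect of the class imbalance concrete, the counts in Table 4 can be compared with a trivial classifier that never reports GinMaster. This baseline and the balanced accuracy below are illustrations introduced here, not experiments from the paper.

\[
Acc = \frac{81 + 759}{861} \approx 0.98, \qquad Acc_{\text{baseline}} = \frac{0 + 761}{861} \approx 0.88
\]

\[
\text{Balanced accuracy} = \frac{1}{2}\left(\frac{81}{100} + \frac{759}{761}\right) \approx 0.90
\]

The trivial baseline already reaches about 0.88 because most samples are not GinMaster, so the recall of 0.81 and the balanced accuracy give a more conservative picture of the detection capability than the plain accuracy.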

5 Conclusion and Future Work

The most common way to inject a malicious payload in the Android environment is the repackaging attack, which basically consists in distributing legitimate well-known applications with the malicious behaviour embedded in order to lure users. In this paper we propose an approach, based on formal methods, able to catch the malicious payload related to the GinMaster family, one of the most populous repackaged trojans embedded in legitimate Android applications. The GinMaster family is able to root Android devices in order to execute shell scripts with admin privileges; in addition, it is able to send personal user information to the attacker using a C&C server. We identified a set of rules specific to the GinMaster payload behaviour and we evaluated the effectiveness of our approach using a dataset of real-world malware, obtaining an accuracy above 0.9. As future work, we plan to test our approach on mobile malware belonging to other families that exhibit trojan behaviour, to evaluate the rule set on families with a similar payload.

Acknowledgements

This work has been partially supported by H2020 EU-funded projects NeCS and C3ISP and EIT-Digital Project HII.

REFERENCES

Andersen, J. R., Andersen, N., Enevoldsen, S., Hansen, M. M., Larsen, K. G., Olesen, S. R., Srba, J., and Wortmann, J. K. (2015). CAAL: concurrency workbench, aalborg edition. In Theoretical Aspects of Computing - ICTAC 2015 - 12th International Colloquium Cali, Colombia, October 29-31, 2015, Proceedings, pages 573–582.

Arp, D., Spreitzenbarth, M., Huebner, M., Gascon, H., and Rieck, K. (2014). Drebin: Efficient and explainable detection of android malware in your pocket. In Proceedings of 21st Annual Network and Distributed System Security Symposium (NDSS). IEEE.

Barbuti, R., De Francesco, N., Santone, A., and Tesei, L. (2002). A notion of non-interference for timed automata. Fundamenta Informaticae, 51(1-2):1–11.

Barbuti, R., Francesco, N. D., Santone, A., and Vaglini, G. (2005). Reduced models for efficient CCS verification. Formal Methods in System Design, 26(3):319–350.

Battista, P., Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A. (2016). Identification of android malware families with model checking. In International Conference on Information Systems Security and Privacy. SCITEPRESS.

Bernardeschi, C., De Francesco, N., Lettieri, G., and Martini, L. (2004). Checking secure information flow in java bytecode by code transformation and standard bytecode verification. Software - Practice and Experience, 34(13):1225–1255.

Canfora, G., Mercaldo, F., and Visaggio, C. A. (2016). An hmm and structural entropy based detector for android malware: An empirical study. Computers & Security, 61:1–18.

Cleaveland, R. and Sims, S. (1996). The ncsu concurrency workbench. In CAV. Springer.

De Ruvo, G., Nardone, V., Santone, A., Ceccarelli, M., and Cerulo, L. (2015). Infer gene regulatory networks from time series data with probabilistic model checking. pages 26–32.

Fedler, R., Schutte, J., and Kulicke, M. (2014). On the effectiveness of malware protection on android: An evaluation of android antivirus apps, http://www.aisec.fraunhofer.de/.

Garavel, H., Lang, F., Mateescu, R., and Serwe, W. (2013). CADP 2011: a toolbox for the construction and analysis of distributed processes. STTT, 15(2):89–107.


GoogleMobile (2014). http://googlemobile.blogspot.it/2012/02/android-and-security.html.

Isohara, T., Takemori, K., and Kubota, A. (2011). Kernel-based behavior analysis for android malware detection. In Proceedings of Seventh International Conference on Computational Intelligence and Security, pages 1011–1015.

Jacob, G., Filiol, E., and Debar, H. (2010). Formalization of viruses and malware through process algebras. In International Conference on Availability, Reliability and Security (ARES 2010). IEEE.

Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H. (2005). Detecting malicious code by model checking. Springer.

Liang, S. and Du, X. (2014). Permission-combination-based scheme for android mobile malware detection. In International Conference on Communications, pages 2301–2306.

Marforio, C., Aurelien, F., and Srdjan, C. (2011). Application collusion attack on the permission-based security model and its implications for modern smartphone systems, ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/724.pdf.

Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A. (2016a). Download malware? No, thanks. How formal methods can block update attacks. In Formal Methods in Software Engineering (FormaliSE), 2016 IEEE/ACM 4th FME Workshop on. IEEE.

Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A. (2016b). Ransomware steals your phone. formal methods rescue it. In International Conference on Formal Techniques for Distributed Objects, Components, and Systems, pages 212–221. Springer.

Milner, R. (1989). Communication and concurrency. PHI Series in computer science. Prentice Hall.

Neuhaus, S. and Zimmermann, T. (2010). Security trend analysis with cve topic models. In Software reliability engineering (ISSRE), 2010 IEEE 21st international symposium on, pages 111–120. IEEE.

Oberheide, J. and Miller, C. (2012). Dissecting the android bouncer. In SummerCon, https://jon.oberheide.org/files/summercon12-bouncer.pdf.

Reina, A., Fattori, A., and Cavallaro, L. (2013). A system call-centric analysis and stimulation technique to automatically reconstruct android malware behaviors. In Proceedings of EuroSec.

Santone, A. (2011). Clone detection through process algebras and java bytecode. pages 73–74.

SecureList (2015). https://securelist.com/analysis/kaspersky-security-bulletin/73839/mobile-malware-evolution-2015/.

Song, F. and Touili, T. (2001). Efficient malware detection using model-checking. Springer.

Song, F. and Touili, T. (2013). Pommade: Pushdown model-checking for malware detection. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. ACM.

Song, F. and Touili, T. (2014). Model-checking for android malware detection. Springer.

Song, J., Han, C., Wang, K., Zhao, J., Ranjan, R., and Wang, L. (2016). An integrated static detection and analysis framework for android. Pervasive and Mobile Computing.

Spreitzenbarth, M., Echtler, F., Schreck, T., Freiling, F. C., and Hoffmann, J. (2013). Mobile-sandbox: Looking deeper into android applications. In 28th International ACM Symposium on Applied Computing (SAC). ACM.

Stirling, C. (1989). An introduction to modal and temporal logics for ccs. In Concurrency: Theory, Language, And Architecture, pages 2–20.

Tchakount, F. and Dayang, P. (2013). System calls analysis of malwares on android. In International Journal of Science and Technology (IJST), Volume 2, No. 9.

Yerima, S. Y., Sezer, S., McWilliams, G., and Muttik, I. (2013). A new android malware detection approach using bayesian classification. In International Conference on Advanced Information Networking and Applications, pages 121–128.

Yu, R. (2013). Ginmaster: a case study in android malware. In Virus bulletin conference, pages 92–104.

Zhao, Y.-B., Liu, S.-M., and Guo, S.-Q. (2014). Extraction and prediction of hot topics in network security. In Computer Science and Network Security, 2014 International Conference on, pages 347–353.

Zhou, Y. and Jiang, X. (2012). Dissecting android malware: Characterization and evolution. In 2012 IEEE Symposium on Security and Privacy, pages 95–109. IEEE.