How does code obfuscation impact energy usage?

Cagri Sahin1, Mian Wan2, Philip Tornquist1, Ryan McKenna1, Zachary Pearson1, William G. J. Halfond2 and James Clause1,*,†

1Computer Science Department, University of Delaware, Newark, DE, USA
2Department of Computer Science, University of Southern California, Los Angeles, CA, USA

ABSTRACT

Software piracy is an important concern for application developers. Such concerns are especially relevant in mobile application development, where piracy rates can be greater than 90%. The most common approach used by mobile developers to prevent piracy is code obfuscation. However, the decision to apply such transformations is currently made without regard to the impacts of obfuscations on another area of increasing concern for mobile application developers, energy usage. Because both software piracy and battery life are important concerns, mobile application developers must strike a balance between protecting their applications and preserving the battery lives of their users’ devices. To help them make such choices, we conducted an empirical study of the effects of 18 code obfuscations on the amount of energy consumed by executing a total of 21 usage scenarios spread across 11 Android applications on four different mobile phone platforms. The results of the study indicate that, while obfuscations can have a statistically significant impact on energy usage and are more likely to increase energy usage than to decrease energy usage, the magnitudes of such impacts are unlikely to be meaningful to mobile application users. Copyright © 2016 John Wiley & Sons, Ltd.

Received 6 February 2015; Revised 5 October 2015; Accepted 29 October 2015

KEY WORDS: mobile applications; energy usage; code obfuscation; empirical study

1. INTRODUCTION

Software piracy is an important concern for application developers. A 2011 study conducted by the Business Software Alliance found that over 40% of software is pirated, resulting in a potential loss of over $63bn (http://globalstudy.bsa.org/2011/index.html). Such concerns are especially relevant in mobile application development, where piracy rates can top 90% (http://www.businessinsider.com/android-piracy-problem-2015-1). The most common approach used by mobile developers to prevent piracy is code obfuscation, that is, making the code of their applications more difficult for a human to understand. Both Microsoft and Google strongly recommend that developers obfuscate their applications (http://developer.android.com/tools/help/proguard.html; http://www.microsoft.com/en-us/download/details.aspx?id=7490). Google has even gone so far as to integrate obfuscation into the standard Android build system.

In practice, there are many different types of transformations (e.g., renaming variables and methods; merging, splitting, and reordering code; and complicating control and data flow) that are used to successfully obfuscate code. However, the decision to apply such transformations is made without regard to their impacts on another area of increasing concern for mobile application developers, energy usage. As a result, an obfuscated application may consume an excessive amount of power, draining the battery and causing users to leave poor reviews or to request refunds (http://apigee.com/about/pressrelease/apigee-survey-users-reveal-top-frustrations-lead-bad-mobile-app-reviews).

*Correspondence to: James Clause, Computer Science Department, University of Delaware, Newark, DE, USA.
†E-mail: [email protected]

JOURNAL OF SOFTWARE: EVOLUTION AND PROCESS
J. Softw. Evol. and Proc. (2016)
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/smr.1762

Because both software piracy and battery life are important concerns, mobile application developers must strike a balance between (i) protecting their applications and intellectual property and (ii) preserving the limited battery power of the devices where their applications will execute. A major obstacle to striking an appropriate balance between these concerns is a lack of information about how changes to an application impact its energy usage. As a result, developers must either make a poorly informed choice or, more commonly, use an obfuscation tool’s default configuration. Unfortunately, these approaches often result in applications that either consume more energy than necessary or are not protected as effectively as they could be.

To address the lack of information available to developers, we conducted an initial investigation into the energy impacts of applying different code obfuscations [1]. This paper presents an empirical study that improves on this prior work in several ways. First, we extended the original study by adding six additional user scenarios, increasing the number of considered scenarios from 15 to 21. Second, we repeated the extended study on three additional mobile phone platforms, increasing the number of considered platforms from one to four. Appropriate text and figures describing the extensions were added to Sections 2 and 3. Together, these extensions improve the generalizability of the study by allowing us to investigate the impacts of obfuscations in a wider range of use cases and platforms. In addition to extending and repeating the original study, we also provide a more detailed description of our energy measurement platforms (EMPs) and have expanded and updated Section 4.

At a high level, the study presented in this paper is focused on addressing two major questions: (i) How do different obfuscations alter the overall energy usage of an application? and (ii) Are the impacts of obfuscations likely to be meaningful for mobile application users? To answer these questions, we used four obfuscation tools to create a total of 198 obfuscated versions of 11 Android applications. We executed each obfuscated version multiple times on four different mobile phone platforms with one or more different user scenarios and measured the amount of energy that was consumed. In total, we ran over 47,000 executions to gather the experimental data. We then performed a statistical analysis of the collected data to investigate the impacts of the obfuscations on energy usage and answer our research questions.

The results of our study demonstrate the following:

(1) Obfuscations can, and often do, impact the energy usage of applications.
(2) Obfuscations can both increase and decrease energy usage, but they are much more likely to increase energy usage.
(3) The magnitude of the impacts of obfuscations is comparable with the magnitude of the impacts of other code level changes, such as applying refactorings.
(4) The differences between the impacts of the considered obfuscation configurations and obfuscation tools on energy usage are not statistically significant.
(5) The impacts of obfuscation on battery life are unlikely to be meaningful to mobile application users.

These results are positive for both mobile application users and mobile application developers. Developers can protect their applications without meaningfully impacting the battery life of the devices where their applications execute.

The remainder of this paper is organized as follows: Section 2 describes the methodology of our study including our independent and dependent variables, considered applications and scenarios, obfuscation approaches, and data collection protocol. Section 3 presents and discusses the results of the study including potential threats to its validity. Finally, Sections 4 and 5 discuss related work and present our conclusions and future work.

2. EMPIRICAL STUDY

This section describes the details of our study design, including our independent and dependent variables, considered applications and scenarios, obfuscation approaches, and data collection protocol. In planning this work, we followed well-known guidelines for empirical study design [2]. All of our experimental applications, artifacts, and summary data are publicly available at https://bitbucket.org/udse/obfuscation-study. Our raw experimental data are too large to host publicly but are available upon request.

2.1. Experimental variables

In this study, we considered one dependent variable (the amount of energy consumed by the execution of an application) and one independent variable (the obfuscation applied to the application). To isolate the impacts of changing our independent variable on our dependent variable, it was necessary to control for the effects of several extraneous variables (e.g., unnecessary changes in the considered application’s code and the inputs used to drive the application). The remainder of this section describes how we controlled for such extraneous variables.

2.1.1. Controlling for extraneous changes in an application’s code. In many cases, obfuscations are not formally specified. Because of this, different tools may use the same name to refer to different sequences of code changes. For example, many obfuscation tools provide a transformation called string encryption. At a high level, all of these transformations perform the same operation: encrypting the constant strings in an application so that they cannot be easily understood. However, the specific encryption algorithm used can vary greatly. This flexibility in nomenclature can be a potential source of bias and a potential source of confusion in interpreting the results of the study. If we compared the impacts of obfuscations that were inconsistently applied, we would essentially be comparing different transformations. Similarly, if a developer applied a substantially different set of code edits that happens to share the same name as one of the obfuscations that we considered, the results that they observe could be drastically different from what we observed.

In order to avoid these potential problems, we ensured that all obfuscations were applied in a consistent, repeatable, and well-documented manner. To accomplish this, we relied on several commonly used obfuscation tools (Section 2.4). By using preexisting automated tools, we ensure that the changes we make to our considered applications are the same changes that a developer would make if they applied the same obfuscations using the same tool.

2.1.2. Controlling for inconsistencies in executing an application. In general, mobile applications are interactive and event driven. They accept input, either from a user or from a sensor, perform some computation, and generate a response. In our experiments, this interactive nature can introduce a potential source of bias as it is difficult to manually reproduce a given execution exactly. For example, a user can often repeatedly perform the same sequence of actions (e.g., enter text into a textbox or click a button) but cannot maintain the same timing between the actions. Although such differences may seem inconsequential, they may lead to observed differences in energy usage that are not due to changing our independent variable, but rather to differences in how the application is driven. In order to prevent such bias, it is necessary to be able to reproduce deterministically a given sequence of actions with great fidelity. Capture/replay tools provide this functionality.

Capture/replay tools are designed to allow for the deterministic replay of a sequence of recorded events. Conceptually, this is accomplished by wrapping an application to insulate it from its environment. When capturing, the wrapper records all of the events that are passed to the application from the environment. When replaying, the wrapper replaces the environment and passes the recorded events to the application. Because precise timing information is recorded during the capture process, there is very little variability in when events are passed to the application during replay. Hence, when using a capture/replay tool, any observed variations in energy usage are more likely to be the result of the obfuscations used rather than inconsistencies in driving the application.
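The wrapper idea described above can be illustrated with a small, hedged Python sketch. It is not RERAN's implementation; the names RecordedEvent, Recorder, and replay are hypothetical and chosen only for illustration. The sketch records each event together with its offset from the start of the recording and, on replay, waits out the recorded gaps before delivering each event again.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RecordedEvent:
    offset_s: float  # time since the start of the recording, in seconds
    payload: str     # opaque description of the input event (hypothetical format)


class Recorder:
    """Wraps event delivery and logs each event with its relative timestamp."""

    def __init__(self, clock: Callable[[], float]):
        self._clock = clock
        self._start = clock()
        self.trace: List[RecordedEvent] = []

    def deliver(self, payload: str, handler: Callable[[str], None]) -> None:
        # Record first, then pass the event through to the application.
        self.trace.append(RecordedEvent(self._clock() - self._start, payload))
        handler(payload)


def replay(trace: List[RecordedEvent],
           handler: Callable[[str], None],
           sleep: Callable[[float], None]) -> None:
    """Re-deliver recorded events, waiting out the recorded gap before each one."""
    previous = 0.0
    for event in trace:
        sleep(event.offset_s - previous)
        previous = event.offset_s
        handler(event.payload)
```

In real use, clock and sleep would typically be time.monotonic and time.sleep; passing them in as parameters keeps the sketch testable without waiting in real time.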

We chose to use RERAN [3] as our capture/replay tool, because it is designed to record and replay Android applications. Also, RERAN has a lightweight implementation, and its run-time overhead is low, close to 1%.


2.2. Considered applications

As the applications for our study, we used popular, easily accessible Android applications. We selected Android applications for several reasons. First, Android application developers typically care about both the security of their intellectual property and the energy efficiency of their applications. Second, there are many existing obfuscation tools that specifically target Android applications, or, more generally, operate on Java code, that we can use. Third, the source code of many Android applications is freely available, allowing us to easily create many different obfuscated versions. Finally, we have extensive infrastructure to run Android applications and measure their energy usage.

Table I lists the specific applications that we selected. The first two columns, Application and Description, list the name of each application and a brief description of its functionality. The third column, LoC, shows the application’s number of lines of code; and the final column, Size, shows the size of the application’s compiled application package file. The LoC measurement includes only the application itself, while the size measurement includes both the application and its necessary libraries. Because our considered obfuscation tools obfuscate both the application and its libraries, even when the source of such libraries is unavailable, we chose to report both measures to give a better understanding of the amount of code that is being obfuscated.

We chose these specific applications for several reasons. First, they are representative of a wide variety of common application types (e.g., games, study aids, and productivity tools). Second, they are popular and widely used. For example, Calculator, Calendar, and Clock are part of the default Android installation. Finally, they are supported by RERAN. Although RERAN is generally effective at replaying user inputs, such as touch events, it does not support replaying network connections or other sensor readings (e.g., GPS). As such, we were unable to include applications that depend on these types of inputs.

Note that in order to experiment with these applications successfully, we needed to modify them slightly. Primarily, the modifications were made to their build systems so that we could automate the obfuscation processes, but in some cases we also needed to modify the application’s source code to remove sources of randomness that are not handled by RERAN (e.g., we modified the random number generator to use a fixed seed).

2.3. Considered usage scenarios

Our considered applications are driven primarily by user input. To create the inputs necessary for driving the applications, we examined each application and created one or more usage scenarios. Our goal in creating these scenarios was to capture what we believe to be typical usage patterns for the application (i.e., actions that users are likely to perform). By focusing on typical scenarios rather than scenarios designed to maximize other metrics such as coverage, we were able to gain a better understanding of the impacts of obfuscations on a user’s daily interactions with their mobile device.

Table II shows the specific usage scenarios that we created. The first two columns, Application and Name, show the application that is used in the scenario and a distinguishing name. For example, AnkiDroid has two scenarios, AnkiDroid: New Deck and AnkiDroid: Tutorial Deck. The third column, Description, provides a brief description of what user actions are performed during the scenario. For example, during the AnkiDroid: New Deck scenario, a new flash card deck is created, and five flash cards are added to the newly created deck. In total, we created 21 scenarios for our applications: three for Clock, OIFileManager, and SkyMap; two for AnkiDroid, Calculator, DailyMoney, and OpenSudoku; and one for Calendar, FrozenBubblePlus, Nim, and Tomdroid.

Table I. Considered applications.

| Application | Description | LoC | Size (MB) |
|---|---|---|---|
| AnkiDroid | Flashcard application | 44,913 | 2.4 |
| Calculator | Default Android calculator | 1427 | 2.6 |
| Calendar | Default Android calendar | 41,715 | 1.4 |
| Clock | Default Android clock | 13,477 | 1.0 |
| DailyMoney | Daily financial tracker | 8723 | 0.4 |
| FrozenBubblePlus | Bubble popping puzzle game | 7517 | 0.2 |
| Nim | Mathematical strategy game | 1475 | 0.8 |
| OIFileManager | File manager | 7200 | 0.7 |
| OpenSudoku | Sudoku game | 6079 | 0.2 |
| SkyMap | Astronomy application | 10,921 | 0.7 |
| Tomdroid | Note-taking application | 7955 | 0.6 |

2.4. Considered obfuscations

2.4.1. Obfuscation tools. We had two requirements when choosing obfuscation tools: the tools had to (i) obfuscate Android applications and (ii) be easily integrated into the standard Android build system. Because we repeatedly obfuscate multiple applications, manually applying obfuscations is infeasible. Unfortunately, these requirements eliminated the majority of the free or open source Java obfuscation tools. While such tools can work well for standard Java software, either they introduce changes that result in invalid Android applications when the obfuscated class files are converted to the dex format or they cannot be integrated into the Android build system. The only free obfuscation tool that we found that met our requirements was Proguard 4.10 (http://proguard.sourceforge.net), which is the obfuscation tool that is bundled with the Android Software Development Kit.

Because of the limited number of free tools that met our requirements, we also considered commercial obfuscation tools. Here, we found tools that were more likely to fulfill our requirements. However, their trial or evaluation versions are often limited in functionality (e.g., they only obfuscate parts of an application, or do not support the full suite of configuration options). As such, they are not suitable for our study. To obtain full-featured versions, we emailed the tool developers and asked if they would be willing to donate a copy of their obfuscation tool. As the result of this process, we obtained copies of three commercial obfuscation tools: Allatori 4.7 (http://www.allatori.com), DashO 7.2 (http://www.preemptive.com/products/dasho), and Zelix KlassMaster 6.1.3 (ZKM) (http://www.zelix.com/klassmaster/).

2.4.2. Obfuscation configurations. After reading the manuals of Allatori, DashO, Proguard, and ZKM, we identified several common high-level configurations or obfuscation types:

• Control-flow (cf): Produces spaghetti logic that is difficult or impossible to decompile by inserting branching and conditional instructions into the body of a method.
• Rename (rename): Renames packages, classes, methods, and fields to short meaningless names (e.g., a and b) and, if possible, moves classes into a single package.
• Optimize (opt): Removes unused classes, fields, methods, and attributes; performs simple bytecode optimizations (e.g., peephole optimizations); and removes dead code.
• String encryption (se): All constant strings in the application are replaced with an encrypted version; decryption methods are added so that strings can be decrypted at runtime.
• All (all): Combines all other configurations supported by an obfuscation tool.

Table II. Considered usage scenarios.

| Application | Name | Description |
|---|---|---|
| AnkiDroid | New deck | Create a new slide deck containing 5 cards. |
| AnkiDroid | Tutorial deck | Review the 20 cards in the tutorial deck. |
| Calculator | Advance | Perform several advanced arithmetic calculations. |
| Calculator | Standard | Perform several basic arithmetic calculations. |
| Calendar | Add event | Add a new event, search for it, delete it. |
| Clock | Interval | Create intervals while running the stopwatch. |
| Clock | Stopwatch | Run the stopwatch for 10 s. |
| Clock | Timer | Run a 10-s countdown timer. |
| DailyMoney | Add detail | Enter two transactions. |
| DailyMoney | View lists | View details and balances. |
| FrozenBubblePlus | Level 1 | Play the first level. |
| Nim | Easy AI | Play 3 rounds with increasing difficulty levels. |
| OIFileManager | Create file | Create 2 folders, nest folders, delete folders. |
| OIFileManager | Play file | View 4 pictures and play a ringtone 3 times. |
| OIFileManager | View file | Open a file. Navigate directories. |
| OpenSudoku | Easy level 1 | Complete a single easy Sudoku grid. |
| OpenSudoku | Hard level 1 | Complete a single hard Sudoku grid. |
| SkyMap | Find Mars | Set time to a fixed past date, searches for Mars. |
| SkyMap | Move zoom | Arbitrarily zoom in/out, moves along the map. |
| SkyMap | Show component | Show each component, toggle night mode. |
| Tomdroid | Notes | Create a note, search for text, open the note, delete the note. |

Information about the effectiveness of these types of obfuscations can be found in related work (e.g., [4, 5]). Note that, while the specific changes made by each tool for each configuration may vary (e.g., different string encryption algorithms may be used or branches may be inserted in different locations), from the point of view of an application developer, the results are essentially identical. In addition, not every configuration is supported by every tool.

Table III shows which configurations are supported by which tools. The first column, Obfuscation tool, shows our considered obfuscation tools; and the remaining five columns, all, opt, rename, cf, and se, show our considered configurations. A checkmark (✓) indicates that a configuration is supported by a tool, and a blank space indicates that a configuration is not supported. As the table shows, there are 18 supported combinations. In the remainder of the paper, we will refer to a combination of an obfuscation tool and an obfuscation configuration as an obfuscation. To the best of our knowledge, the considered obfuscations are deterministic in that multiple applications of the obfuscation to the same application produce identical results.
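As a quick consistency check on the count of obfuscations, the tool/configuration support matrix can be written as a small Python mapping and enumerated. This is only an illustrative sketch: the dictionary name SUPPORTED is hypothetical, and the assignment of Proguard's three supported configurations to all, opt, and rename is an assumption (the text states only that 18 combinations are supported in total).

```python
# Tool -> supported configurations, following the description of Table III.
SUPPORTED = {
    "Allatori": ("all", "opt", "rename", "cf", "se"),
    "DashO": ("all", "opt", "rename", "cf", "se"),
    # Assumption: Proguard's three supported configurations are all, opt, rename.
    "Proguard": ("all", "opt", "rename"),
    "ZKM": ("all", "opt", "rename", "cf", "se"),
}

# An "obfuscation" is one (tool, configuration) pair.
obfuscations = [(tool, cfg) for tool, cfgs in SUPPORTED.items() for cfg in cfgs]

APPLICATIONS = 11  # number of considered applications (Table I)

print(len(obfuscations))                 # 18 obfuscations
print(len(obfuscations) * APPLICATIONS)  # 198 obfuscated application versions
```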

2.5. Measurement platforms

To measure the amount of energy consumed by executing an Android application, we used four custom-built EMPs. Each EMP is based on a commercial Android smartphone platform: the first EMP is based on a Nexus 3 with 32 GB of storage running Android version 4.3 (Jelly Bean); the second EMP is based on a Nexus 4 with 8 GB of storage running Android version 4.3 (Jelly Bean); the third EMP is based on a Samsung Galaxy S II with 16 GB of storage running Android version 4.3 (Jelly Bean); and the fourth EMP is based on a Samsung Galaxy S5 with 16 GB of storage running Android version 4.4 (Kit Kat). Figure 1 shows pictures of the EMPs we built.

Instead of using the phone’s battery, the EMPs use an external source to power the devices. The Nexus 4 and Galaxy S5 EMPs use a 30-V, 5-A DC power supply (KORAD KA3005D), while the Nexus 3 and the Galaxy S II EMPs use a Monsoon Power Monitor from Monsoon Solutions Inc (https://www.msoon.com/LabEquipment/PowerMonitor/). In both cases, using an external power source ensures that the phone’s battery monitor senses a constant charge level and allows us to compare results across executions without having to worry about variations in the physical battery’s performance or the phone’s power-saving infrastructure.

To sample the voltage and current draw of the phone, the Nexus 4 and Galaxy S5 EMPs use two Arduino Unos, each equipped with an Adafruit INA219 High Side DC Current Sensor board. One Arduino is used to sense the voltage and current drawn from the DC power supply, and the other is used to sense the voltage and current draw over the phone’s USB port. The Nexus 3 and Galaxy S II EMPs use the Monsoon Power Monitor, which is equipped with a dual range, self-calibrating, integrating system. It has two current ranges with a 16-bit analogue-to-digital converter, one with a high-resolution range and the other with a low-resolution range. Software continuously calibrates each of these and selects the proper range during measurement. Both the Arduino/INA219 and the Monsoon Power Monitor report voltage measurements in volts and current measurements in milliamps.

Table III. Considered obfuscations.

| Obfuscation tool | all | opt | rename | cf | se |
|---|---|---|---|---|---|
| Allatori | ✓ | ✓ | ✓ | ✓ | ✓ |
| DashO | ✓ | ✓ | ✓ | ✓ | ✓ |
| Proguard | ✓ | ✓ | ✓ | | |
| ZKM | ✓ | ✓ | ✓ | ✓ | ✓ |

Because our EMPs measure power consumption via hardware that is external to the phones, they do not introduce any measurement overhead to the application. This is ideal, because it means that we do not have to factor out the amount of energy consumed by the monitoring infrastructure itself. However, it also means that the EMPs and the phones do not share a single clock that can be used to identify which samples occurred during an execution of interest. A desktop computer can solve this problem by providing the global clock necessary for performing synchronization. By having the desktop computer start the execution of interest over Android Debug Bridge, it is possible to discard power samples recorded before the start of the execution. Similarly, because the duration of the recorded scenarios is known, it is possible to identify samples that were recorded after the end of the execution.
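The synchronization idea can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each sample is a (timestamp_s, volts, milliamps) tuple stamped by the desktop computer's clock and that the start time and scenario duration are known; the function name samples_in_window is hypothetical and not part of the study's tooling.

```python
def samples_in_window(samples, start_s, duration_s):
    """Keep only samples taken while the execution of interest was running.

    samples    -- iterable of (timestamp_s, volts, milliamps) tuples
    start_s    -- desktop-clock time at which the execution was started
    duration_s -- known duration of the recorded scenario, in seconds
    """
    end_s = start_s + duration_s
    return [s for s in samples if start_s <= s[0] <= end_s]


# Example: a 2-second execution that starts at t = 10 s.
raw = [(9.5, 4.0, 300.0), (10.0, 4.0, 450.0), (11.0, 4.0, 500.0),
       (12.0, 4.0, 480.0), (12.5, 4.0, 310.0)]
kept = samples_in_window(raw, start_s=10.0, duration_s=2.0)
print(len(kept))  # 3 samples fall inside the window
```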

While the EMP itself does not introduce measurement overhead, the replay infrastructure does: in order to replay a recorded execution, RERAN installs an application on the phone that injects events into the Android kernel’s device drivers. However, because the RERAN process spends most of its time sleeping (it only wakes up to inject events), its overhead is negligible. In addition, because we are concerned with energy usage relative to a baseline (i.e., before and after applying an obfuscation) rather than absolute numbers, and the energy costs are consistent across executions, factoring out this cost is not necessary.

2.6. Procedure

Figure 2 shows, at a high level, the procedure we followed in our study, divided into four main steps: Obfuscated Application Creation, Replay-able Execution Creation, Data Collection, and Post-processing. The remainder of this section describes these steps in detail.

2.6.1. Obfuscated application creation. The first step in our procedure was to create our set of obfuscated applications. To create the necessary obfuscated versions, we obfuscated each application (Table I) using each obfuscation (Table III). In total, we created 198 obfuscated applications: 11 applications, each with 18 obfuscated versions.

Figure 1. Design of the energy measurement platforms for the Nexus 3/Galaxy S II (a) and the Nexus 4/Galaxy S5 (b).



2.6.2. Replay-able execution creation. The second step in our procedure was to create our set of replay-able executions. To create the replay-able executions, we manually performed the actions contained in each scenario while using RERAN’s recording tool. Because the replays produced by RERAN are not portable across mobile phone platforms, we created four replay-able executions for each scenario, one for each of our considered EMP platforms. This resulted in a total of 84 replay-able executions (21 scenarios × 4 platforms). As a sanity check, we then verified that RERAN could accurately replay each execution by running RERAN with the replay-able execution as input and observing the replayed executions. Table IV shows the durations of the replay-able executions for each scenario. The first two columns, Application and Name, show the scenario from Table II, and the remaining four columns, Nexus 3 through Galaxy S5, show the duration of the replay-able execution for each platform in seconds.

2.6.3. Data collection. The third step in our procedure was to collect power usage data. To collect power usage data, we used RERAN to replay each replay-able execution (Table IV) on the corresponding EMP, using both the unobfuscated and obfuscated versions of the scenario’s application. For each EMP, each replay-able execution was executed on each version of the application (unobfuscated and obfuscated) 30 times, as is suggested by well-known guidelines for empirical study design [2]. While each scenario was executing, we recorded the current and voltage measurements using the EMP.

Figure 2. High-level experimental procedure.

Table IV. Recorded execution durations.

| Application | Name | Nexus 3 (s) | Nexus 4 (s) | Galaxy S II (s) | Galaxy S5 (s) |
|---|---|---|---|---|---|
| AnkiDroid | New deck | 87 | 128 | 83 | 129 |
| AnkiDroid | Tutorial deck | 64 | 60 | 108 | 95 |
| Calculator | Advance | 51 | 79 | 60 | 74 |
| Calculator | Standard | 54 | 58 | 57 | 43 |
| Calendar | Add event | 108 | 108 | 113 | 104 |
| Clock | Interval | 28 | 65 | 32 | 64 |
| Clock | Stopwatch | 18 | 20 | 19 | 19 |
| Clock | Timer | 21 | 19 | 23 | 20 |
| DailyMoney | Add detail | 30 | 34 | 50 | 61 |
| DailyMoney | View lists | 14 | 16 | 23 | 30 |
| FrozenBubblePlus | Level 1 | 29 | 36 | 27 | 45 |
| Nim | Easy AI | 43 | 75 | 61 | 78 |
| OIFileManager | Create file | 46 | 60 | 80 | 58 |
| OIFileManager | Play file | 60 | 59 | 52 | 55 |
| OIFileManager | View file | 22 | 18 | 22 | 36 |
| OpenSudoku | Easy level 1 | 138 | 273 | 237 | 172 |
| OpenSudoku | Hard level 1 | 145 | 135 | 223 | 129 |
| SkyMap | Find Mars | 49 | 42 | 56 | 58 |
| SkyMap | Move zoom | 21 | 65 | 19 | 63 |
| SkyMap | Show component | 90 | 100 | 68 | 101 |
| Tomdroid | Notes | 72 | 173 | 62 | 131 |

To reduce the possibility of noise in the measurements, we terminated all unnecessary applications and processes and, when possible, enabled airplane mode. Although we eliminated many possible sources of noise by carefully configuring the EMPs, small fluctuations in energy usage from execution to execution were still possible. For example, garbage collection or other operating-system level processes that could not be disabled may have been able to impact energy usage. Multiple runs (i.e., 30) allowed us to perform a statistical analysis on the impact of obfuscations that took into account the possibility of such fluctuations.

In total, we ran 47,880 executions (21 scenarios × (18 obfuscated versions + 1 unobfuscated version) × 30 repetitions × 4 EMPs), which took ≈924 h (over 5 weeks) of continuous execution time and resulted in over 15 GB of raw power consumption data.
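The totals reported above follow directly from the study's design parameters. The short Python check below reproduces the arithmetic; the variable names are illustrative only.

```python
scenarios = 21
obfuscated_versions = 18
versions = obfuscated_versions + 1   # plus the unobfuscated baseline
repetitions = 30
platforms = 4                        # one per EMP

executions = scenarios * versions * repetitions * platforms
print(executions)  # 47880

hours = 924                          # approximate continuous execution time
weeks = hours / (24 * 7)
print(weeks)       # 5.5, i.e., "over 5 weeks"
```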

2.6.4. Post-processing. The final step in our procedure was to post-process the collected data by filtering it and converting it to a usable form. We first filtered the data to remove samples that occurred either before or after the execution. We then converted the current and voltage samples to power measurements in watts by multiplying them together and then dividing by 1000: watts = volts × milliamperes ÷ 1000. Finally, we converted the resulting power measurements to total energy usage in joules by summing the results of multiplying each power measurement by the length of time between itself and the following sample: joules = watts × seconds.
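The conversion described above can be sketched in Python as follows. The sample layout, a time-ordered list of (timestamp_s, volts, milliamps) tuples, and the function name energy_joules are assumptions made for illustration. The sketch also assumes that the final sample, which has no following sample, contributes no energy.

```python
def energy_joules(samples):
    """Total energy for time-ordered (timestamp_s, volts, milliamps) samples.

    Each power measurement (volts * milliamps / 1000, in watts) is multiplied
    by the time until the following sample, and the products are summed.
    """
    total = 0.0
    for (t0, volts, milliamps), (t1, _, _) in zip(samples, samples[1:]):
        watts = volts * milliamps / 1000.0
        total += watts * (t1 - t0)
    return total


# Example: 2.0 W for 0.5 s, then 4.0 W for 0.5 s.
trace = [(0.0, 4.0, 500.0), (0.5, 4.0, 1000.0), (1.0, 4.0, 0.0)]
print(energy_joules(trace))  # 3.0
```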

3. DATA ANALYSIS AND DISCUSSION

We refined our overall question of whether or not applying obfuscations can impact the energy usage of an application into the following specific research questions:

• RQ1: Impact. Do obfuscations impact the energy usage of an application? If so, how?
• RQ2: Consistency. Are there any significant differences in the impacts of the considered obfuscation tools or the considered obfuscation configurations?
• RQ3: Importance. Are the impacts of applying obfuscations likely to be meaningful or noticeable to a typical mobile application user?

The remainder of this section discusses the results of our study in terms of these research questions. Note that in answering these questions, we analyze the data for each platform separately. Because the replay-able executions are not identical across platforms (Section 2.6.2), it would be inappropriate to analyze the impacts of the obfuscations across platforms.

3.1. RQ1: Impact

To gather the data necessary to answer our first research question, we performed Mann–Whitney–Wilcoxon (wilcox) tests to determine whether the difference between the amount of energy consumed by each scenario when run using the unobfuscated version of the application and each obfuscated version of the application is statistically significant. We chose the Mann–Whitney–Wilcoxon test because we have one nominal variable (the obfuscation applied to the application) and one measurement value (the amount of energy consumed by the execution), and the test does not require that the data be normally distributed. The resulting p values were adjusted using Benjamini and Hochberg’s false discovery rate controlling method to account for performing multiple comparisons [6]. We chose an α of 0.05 and used R version 3.0.3’s implementation of the test (i.e., wilcox.test). We conducted 1512 tests in total, 378 (21 scenarios × 18 obfuscations) for each of our four platforms. Of these, 791 (≈52%) indicated a statistically significant difference in the amount of energy consumed by the unobfuscated and obfuscated versions. By platform, the number of statistically significant differences was 229 (≈61%) for the Nexus 3, 282 (≈75%) for the Nexus 4, 107 (≈28%) for the Galaxy S II, and 173 (≈46%) for the Galaxy S5.
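To make the multiple-comparison step concrete, the following sketch shows how Benjamini and Hochberg adjusted p values can be computed. It is an illustration only: the study used R, the function name is hypothetical, and the p values are made up rather than taken from the study's data.

```python
from typing import List, Sequence


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


ALPHA = 0.05
raw = [0.01, 0.04, 0.03, 0.20]   # made-up p values, not from the study
adj = bh_adjust(raw)
significant = [p <= ALPHA for p in adj]
```

In this made-up example, three raw p values fall below 0.05 but only the first remains significant after adjustment, which is why the correction matters when many scenario and obfuscation pairs are tested at once.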



For the cases where there is a statistically significant difference (i.e., p ≤ 0.05), we computed Vargha and Delaney’s A12 statistic‡ to calculate the size of the effect of applying the obfuscation [7]. In general, the A12 statistic ranges from 0 to 1 and indicates, on average, how often one technique outperforms another: when A12 is exactly 0.5, the two techniques achieve equal performance; when A12 is less than 0.5, the first technique performs worse; and when A12 is greater than 0.5, the second technique is worse. The closer A12 is to 0 or 1, the larger the effect. For our data, A12 represents the probability that the unobfuscated version consumes more energy than the obfuscated version.
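A minimal sketch of how the A12 statistic can be computed from two sets of measurements is shown below. The function names and the energy values are hypothetical; the pairwise-counting form used here is the standard definition of the statistic, with ties counted as half.

```python
from typing import Sequence


def a12(first: Sequence[float], second: Sequence[float]) -> float:
    """Probability that a value from `first` exceeds a value from `second`.

    Every pair (x, y) is compared; ties count as half a win.
    """
    greater = sum(1 for x in first for y in second if x > y)
    ties = sum(1 for x in first for y in second if x == y)
    return (greater + 0.5 * ties) / (len(first) * len(second))


def cliffs_delta(a12_value: float) -> float:
    """Inverse of the footnoted relation A12 = (delta + 1) / 2."""
    return 2.0 * a12_value - 1.0


# Made-up energy measurements in joules (not from the study).
unobfuscated = [10.0, 10.2, 10.4]
obfuscated = [10.1, 10.3, 10.5]
effect = a12(unobfuscated, obfuscated)
```

Passing the unobfuscated measurements first matches the interpretation used here: a value below 0.5 means the unobfuscated version is less likely to consume more energy.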

Figure 3a–d shows the A12 statistics we calculated. In the figures, the y-axis shows the considered scenarios, and the x-axis shows each obfuscation (combination of obfuscation tool and obfuscation configuration). For example, the first grouping shows the A12 statistics computed between the unobfuscated version of each application and the obfuscated versions produced by each obfuscation tool when using the all configuration. The color of each cell indicates the size and direction of the effect. Cells colored blue indicate cases where the unobfuscated version is more likely to consume more energy than the obfuscated version (i.e., A12 > 0.5), and cells colored red indicate cases where the unobfuscated version is more likely to consume less energy than the obfuscated version (i.e., A12 < 0.5). In addition, the color’s saturation indicates the size of the effect: the highest saturation indicates a large effect (A12 between 0.75 and 1.0 or between 0 and 0.25), intermediate saturation indicates a medium effect (A12 between 0.66 and 0.75 or between 0.25 and 0.33), and the lowest saturation indicates a small effect (A12 between 0.5 and 0.66 or between 0.33 and 0.5). Absent values indicate cases where there is not a statistically significant difference in energy usage between the versions.
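The effect-size bands described above can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, and because the stated ranges share their endpoints (e.g., 0.75 appears in both the large and medium bands), assigning boundary values to the larger category is an assumption made here.

```python
def effect_size(a12_value: float) -> str:
    """Map an A12 value to the large/medium/small bands used for Figure 3."""
    # Assumption: boundary values (0.25, 0.33, 0.66, 0.75) go to the larger band.
    if a12_value >= 0.75 or a12_value <= 0.25:
        return "large"
    if a12_value >= 0.66 or a12_value <= 0.33:
        return "medium"
    return "small"


def direction(a12_value: float) -> str:
    """Blue cells: unobfuscated likely uses more energy; red cells: less."""
    return "blue" if a12_value > 0.5 else "red"
```

The lookup is meant for cases that already passed the significance test, so a value of exactly 0.5 is not expected as input.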

From these data, we observe that, when all platforms are considered, obfuscations have a generally negative impact on energy usage (i.e., they increase energy usage). In 496 out of the 791 cases where there is a statistically significant difference in energy usage (≈63% of the time), the obfuscated version is more likely to consume more energy than the unobfuscated version. In the remaining 295 cases (≈37% of the time), the obfuscated version is more likely to consume less energy than the unobfuscated version. In addition, the size of the effect is most often large: the effect size is large for 575 cases (≈73% of the time), medium for 204 cases (≈26% of the time), and small for 12 cases (≈1% of the time).

When the platforms are considered individually, obfuscations also have a negative impact for applications that are executed on the Nexus 4 and Galaxy S5. In 235 out of the 282 cases (≈83% of the time) for the Nexus 4 and 127 out of the 173 cases (≈73% of the time) for the Galaxy S5 where there is a statistically significant difference in energy usage, the obfuscated version is more likely to consume more energy than the unobfuscated version. For applications that are executed on the Galaxy S II, the obfuscations have a more balanced impact. For only 55 out of the 107 cases (≈51% of the time), the obfuscated version is more likely to consume more energy than the unobfuscated version. Finally, for applications that are executed on the Nexus 3, obfuscations have a more positive impact. For 150 out of the 229 cases (≈66% of the time), the obfuscated version is more likely to consume less energy than the unobfuscated version.

Next, we investigated the magnitude of the differences caused by the obfuscations. To determine the magnitude of the differences, we again focused on the cases where there is a significant difference in energy usage. For each combination of usage scenario and obfuscation, we calculated the percentage change in mean energy usage between the obfuscated and unobfuscated versions. The results of these computations are shown in Figure 4a–d. The layout of these figures is similar to the layout of Figure 3a–d. The y-axis shows the usage scenarios, and the x-axis shows the obfuscations. The content of each cell shows the percentage change in mean energy usage. Again, the color of each cell indicates the direction and magnitude of the change. Blue cells indicate cases where the percentage change is negative (i.e., energy usage decreased), and red cells indicate cases where the percentage change is positive (i.e., energy usage increased); darker colors indicate larger values, and absent values indicate cases where there is not a statistically significant difference in energy usage.
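The percentage change in mean energy usage can be sketched as below. The function name and measurements are hypothetical; the sign convention (positive when the obfuscated version uses more energy) follows the description of the red and blue cells above.

```python
from statistics import mean
from typing import Sequence


def percent_change_in_mean(unobfuscated: Sequence[float],
                           obfuscated: Sequence[float]) -> float:
    """Percentage change in mean energy, relative to the unobfuscated mean."""
    baseline = mean(unobfuscated)
    return (mean(obfuscated) - baseline) / baseline * 100.0


# Made-up energy measurements in joules (not from the study).
increase = percent_change_in_mean([100.0, 100.0], [101.0, 103.0])
decrease = percent_change_in_mean([100.0, 100.0], [99.0, 99.0])
```

Here `increase` is a positive change (a red cell) and `decrease` is a negative change (a blue cell).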

Across all platforms, the percentage change in mean energy usage ranges from ≈−10.1% to ≈6.9% with a median and mean value of ≈0.5% and a standard deviation of ≈2.1 percentage points. For the Nexus 3, the percentage change in mean energy usage ranges from ≈−10.1% to ≈3.2% with a median value of ≈−0.7%, a mean value of ≈−1.1%, and a standard deviation of ≈2.2 percentage points. For the Nexus 4, the percentage change in mean energy usage ranges from ≈−3.7% to ≈6.6% with a median value of ≈1.2%, a mean value of ≈1.5%, and a standard deviation of ≈1.6 percentage points. For the Galaxy S II, the percentage change in mean energy usage ranges from ≈−5.5% to ≈5.5% with a median value of ≈0.2%, a mean value of ≈0.01%, and a standard deviation of ≈2.0 percentage points. For the Galaxy S5, the percentage change in mean energy usage ranges from ≈−1.6% to ≈6.9% with a median value of ≈1.2%, a mean value of ≈1.2%, and a standard deviation of ≈1.6 percentage points.

‡ Vargha and Delaney’s A12 statistic is a simple linear transformation of Cliff’s δ: A12 = (δ + 1)/2. We prefer A12 because it is in the interval [0, 1], while δ is in the interval [−1, 1]. Eliminating the negative sign makes Figure 3 more readable.

Figure 3. Vargha and Delaney’s A12: probability that an unobfuscated version consumes more energy than an obfuscated version when run on the (a) Nexus 3 platform (wilcox, p ≤ 0.05); (b) Nexus 4 platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05); (c) Galaxy S II platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05); and (d) Galaxy S5 platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05).

From these data, it is clear that, while overall obfuscations are more likely to cause an increase in energy usage than a decrease in energy usage, the magnitude of the change, regardless of direction, is likely to be less than 5%. When compared with the energy impacts of other code-level changes, the energy impacts of obfuscations are closer to the impacts of other focused changes (e.g., refactorings, whose impacts range from −7.50% to 4.54% [8], and switching application programming interface (API) implementations [9]) than to the impacts of broader changes (e.g., applying design patterns, whose impacts can approach several hundred percent [10]).



Based on our investigations into the impacts of obfuscations on energy usage, we have found the following:

(1) Obfuscations can, and often do, impact the energy usage of an application with statistical significance.

(2) Individually, all of our considered obfuscation tools and obfuscation configurations can both increase and decrease energy usage.

(3) Across all platforms, the likelihood of causing an increase in energy usage is higher than the likelihood of causing a decrease in energy usage.

(4) Across all platforms, the magnitude of the percentage change in energy usage is most likely to be less than 5%.

Figure 4. Percent change in mean energy usage when using an obfuscated version instead of an unobfuscated version when run on the (a) Nexus 3 platform (wilcox, p ≤ 0.05); (b) Nexus 4 platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05); (c) Galaxy S II platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05); and (d) Galaxy S5 platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05).

3.2. RQ2: Consistency

The goal of our second research question is to determine if there is a statistically significant benefit, with respect to energy usage, to using a specific obfuscation tool or specific obfuscation configuration. To answer this question, we performed several Kruskal–Wallis tests. We chose to use the Kruskal–Wallis test because we want to compare one measurement value (the amount of energy consumed by the execution) across multiple samples (obfuscation tools or obfuscation configurations), and we do not know if our data are normally distributed. We chose an α of 0.05 and used R version 3.1.2’s implementation of the test (i.e., kruskal.test). In general, if the p value calculated by the Kruskal–Wallis test is less than the chosen α, it indicates that at least one of the samples is significantly different from the others. It does not indicate how many differences occur or among which samples the differences exist. However, this information can be determined by running pairwise Mann–Whitney–Wilcoxon tests with an appropriate correction for performing multiple comparisons (e.g., the Bonferroni correction or the Benjamini and Hochberg correction).
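For readers unfamiliar with the test, the following sketch computes the tie-corrected Kruskal–Wallis H statistic from several samples. It is an illustration rather than the analysis code used here (the study relied on R's kruskal.test); the function name and the data are hypothetical, and the sketch stops at the statistic rather than computing the p value.

```python
from collections import Counter
from typing import Dict, Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more samples."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)

    # Assign each distinct value its average 1-based rank in the pooled data.
    counts = Counter(pooled)
    rank_of: Dict[float, float] = {}
    next_rank = 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = next_rank + (c - 1) / 2.0
        next_rank += c

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

    # Tie correction; it is zero (and H undefined) if all values are identical.
    correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction


# Made-up percentage changes for three hypothetical tools (not study data).
h_stat = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```

The p value is then obtained by comparing H against a chi-square distribution with one fewer degree of freedom than the number of samples; that step is omitted here to keep the sketch within the standard library.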

Our first set of Kruskal–Wallis tests checks whether there are any statistically significant differences in the percentage changes in mean energy usage among obfuscation tools for each obfuscation configuration. A p value less than our chosen α would indicate that at least one of the obfuscation tools is statistically different from the others. The results of these computations can be seen in Table V. In this table, the first column, Obfuscation Configuration, shows the name of each obfuscation configuration. The next four columns, Nexus 3 through Galaxy S5, show the p value when each platform is considered individually, and the final column, All, shows the p value when all four platforms are considered together. Because the computed p values are never less than our chosen α (0.05), we cannot reject the null hypothesis. In practice, this means that, with respect to energy usage, there is no statistical benefit to picking one obfuscation tool over another. Consequently, developers are free to choose their preferred obfuscation tool based on other factors, such as supported obfuscations, price, and ease of use, without having to worry about its impact on energy usage.

Our second set of Kruskal–Wallis tests checks whether there are any statistically significant differences in the percentage changes in mean energy usage among the obfuscation configurations for each obfuscation tool. The results of these computations can be seen in Table VI. The format of the table is similar to that of Table V. The first column, Obfuscation Tool, shows the name of each obfuscation tool. The remaining columns show the p value when each platform is considered individually (Nexus 3 through Galaxy S5) and when all platforms are considered together (All). Again, because the computed p values are never less than our chosen α (0.05), we cannot reject the null hypothesis. In practice, this means that, with respect to energy usage, there is no statistical benefit to picking one obfuscation configuration over another. Again, application developers are free to choose their preferred obfuscation configuration based on factors other than its impact on energy usage.

3.3. RQ3: Importance

Our first two research questions were primarily concerned with discovering if and how obfuscations impact the energy usage of applications. The goal of our third research question is to assess whether the observed impacts are likely to be meaningful or noticeable to mobile application users.

To answer this question, we first used Equation (1) to calculate, for each platform, the percentage of battery charge that is consumed by each scenario when it is executed using the unobfuscated version of its application and when it is executed using the obfuscated versions of its application.

$$\%\text{charge} = \frac{E}{V \times \frac{C}{1000} \times 3600} \times 100 \qquad (1)$$

Table V. For an obfuscation configuration, is there a statistically significant difference among the obfuscation tools (%change ~ tool)?

Obfuscation      p value
configuration    Nexus 3    Nexus 4    Galaxy S II    Galaxy S5    All
all              0.39       0.56       0.18           0.45         0.08
opt              0.51       0.92       0.20           0.38         0.56
rename           0.56       0.75       0.56           0.91         0.83
cf               0.67       0.90       0.89           0.90         0.93
se               0.80       0.52       0.92           0.87         0.90



In Equation (1), E is the amount of energy in joules consumed by an execution (here, we used the mean energy usage of each version over our 30 trials), V is the output voltage of the platform’s battery in volts, and C is the electric charge of the platform’s battery in milliampere hours. For the Nexus 3, V = 3.7 V and C = 1900 mAh; for the Nexus 4, V = 3.8 V and C = 2100 mAh; for the Galaxy S II, V = 3.6 V and C = 1800 mAh; and for the Galaxy S5, V = 3.8 V and C = 2800 mAh.
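As a worked illustration of Equation (1), the sketch below reads the equation as the energy of one execution divided by the battery's capacity in joules. The function names and the 287.28 J execution are hypothetical; the voltages and capacities are the values listed above.

```python
# (volts, milliampere hours) for each platform, as listed in the text.
BATTERIES = {
    "Nexus 3": (3.7, 1900.0),
    "Nexus 4": (3.8, 2100.0),
    "Galaxy S II": (3.6, 1800.0),
    "Galaxy S5": (3.8, 2800.0),
}


def battery_capacity_joules(volts: float, milliamp_hours: float) -> float:
    """V x (C / 1000) x 3600: mAh to Ah, then watt hours to joules."""
    return volts * (milliamp_hours / 1000.0) * 3600.0


def percent_charge(energy_joules: float, volts: float, milliamp_hours: float) -> float:
    """Percentage of a full battery consumed by one execution (Equation (1))."""
    return energy_joules / battery_capacity_joules(volts, milliamp_hours) * 100.0


volts, mah = BATTERIES["Nexus 4"]
capacity = battery_capacity_joules(volts, mah)   # about 28,728 J
share = percent_charge(287.28, volts, mah)       # made-up 287.28 J execution
```

Under these values, a hypothetical execution that consumes 287.28 J uses about 1% of a full Nexus 4 battery.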

We then calculated, using Equation (2), the amount of time needed to drain each platform’s battery from full to empty (i.e., battery life) if the scenario were executed continuously using each version of its application.

$$t_{\text{drain}} = \frac{100\%}{\%\text{charge}} \times D \qquad (2)$$

In Equation (2), %charge is the percentage of battery charge calculated using Equation (1) and D is the duration of the scenario (Table IV). Note that the unit of measurement for t_drain will be the same as the unit of measurement for D.
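Equation (2) amounts to multiplying the number of back-to-back executions a full battery can sustain by the duration of one execution. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def drain_time(percent_charge_per_run: float, duration: float) -> float:
    """Time to drain a full battery when a scenario runs continuously.

    The result is in the same unit as `duration` (Equation (2)).
    """
    runs_per_full_battery = 100.0 / percent_charge_per_run
    return runs_per_full_battery * duration


# Made-up scenario: each 60 s run consumes 0.5% of the battery.
seconds_to_empty = drain_time(0.5, 60.0)
hours_to_empty = seconds_to_empty / 3600.0
```

In this made-up case, the battery sustains 200 runs of 60 s each, i.e., 12,000 s or roughly 3.3 h.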

Table VII shows the results of this computation. In the table, the first two columns, Application and Name, show the scenario, and the remaining columns, Nexus 3 through Galaxy S5, show, for each platform, the mean battery life in hours when the unobfuscated version is run continuously, draining the battery from full to empty.

Table VI. For an obfuscation tool, is there a statistically significant difference among the obfuscation configurations (%change ~ configuration)?

Obfuscation      p value
tool             Nexus 3    Nexus 4    Galaxy S II    Galaxy S5    All
Allatori         0.35       0.73       0.37           0.82         0.95
DashO            0.71       0.79       0.95           0.66         0.72
Proguard         0.99       0.80       0.15           0.70         0.11
ZKM              0.98       0.58       0.77           0.98         0.69

Table VII. Battery life when using an unobfuscated version.

                                      Battery life (h)
Application         Name              Nexus 3    Nexus 4    Galaxy S II    Galaxy S5
AnkiDroid           New deck          4.6        4.6        7.7            8.8
                    Tutorial deck     4.1        4.7        7.8            8.1
Calculator          Advance           5.1        4.9        8.0            11.6
                    Standard          5.5        5.4        8.7            12.1
Calendar            Add event         3.7        4.2        7.8            8.0
Clock               Interval          4.9        4.2        8.9            7.0
                    Stopwatch         3.9        5.3        7.7            10.6
                    Timer             4.2        5.0        8.3            9.8
DailyMoney          Add detail        4.3        4.8        9.9            8.8
                    View lists        3.9        4.4        7.7            9.7
FrozenBubblePlus    Level 1           3.0        4.1        6.4            5.9
Nim                 Easy AI           3.8        3.9        7.2            7.1
OIFileManager       Create file       4.6        4.6        9.9            10.4
                    Play file         4.5        4.5        8.7            9.1
                    View file         4.0        4.1        7.6            5.7
OpenSudoku          Easy level 1      4.7        5.4        9.0            9.6
                    Hard level 1      4.8        4.7        8.6            9.2
SkyMap              Find Mars         2.7        3.3        3.3            5.7
                    Move zoom         2.9        3.0        2.2            5.6
                    Show component    3.4        4.1        3.6            7.2
Tomdroid            Notes             4.6        4.1        7.1            7.9



Finally, we computed the change in battery life for each scenario and obfuscation by subtracting the battery life of the unobfuscated version from the battery life of each obfuscated version, so that negative values indicate a shorter battery life after obfuscation. Figure 5a–d shows the results of these computations. The five groupings in each figure show the change in mean battery life in minutes when an obfuscated version is used instead of an unobfuscated version. Again, absent values indicate instances where there was no statistically significant difference in energy usage between the application versions, and the color of each cell indicates the direction and magnitude of the change. Blue cells indicate obfuscations that increase battery life (i.e., changes that are beneficial for users), and red cells indicate obfuscations that decrease battery life (i.e., changes that are detrimental to users).

Figure 5. Change in mean battery life when using an obfuscated version instead of an unobfuscated version when run on the (a) Nexus 3 platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05); (b) Nexus 4 platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05); (c) Galaxy S II platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05); and (d) Galaxy S5 platform (wilcox, Benjamini and Hochberg correction, p ≤ 0.05).
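To show how a small change in energy usage translates into minutes of battery life, the sketch below chains Equations (1) and (2) and takes the difference between versions. All names and the 100 J and 101 J executions are hypothetical. The sign convention (obfuscated minus unobfuscated, so that negative values mean a shorter battery life) is chosen to agree with the color coding of Figure 5 and the ranges reported in this section.

```python
def battery_life_hours(energy_joules: float, duration_seconds: float,
                       volts: float, milliamp_hours: float) -> float:
    """Battery life when one scenario is executed continuously."""
    capacity_joules = volts * (milliamp_hours / 1000.0) * 3600.0
    percent_charge = energy_joules / capacity_joules * 100.0       # Equation (1)
    drain_seconds = 100.0 / percent_charge * duration_seconds      # Equation (2)
    return drain_seconds / 3600.0


def battery_life_change_minutes(unobf_energy: float, obf_energy: float,
                                duration_seconds: float,
                                volts: float, milliamp_hours: float) -> float:
    """Obfuscated minus unobfuscated battery life, in minutes."""
    before = battery_life_hours(unobf_energy, duration_seconds, volts, milliamp_hours)
    after = battery_life_hours(obf_energy, duration_seconds, volts, milliamp_hours)
    return (after - before) * 60.0


# Made-up 60 s scenario on the Nexus 4 battery (3.8 V, 2100 mAh):
# 100 J unobfuscated versus 101 J obfuscated, i.e., a 1% energy increase.
change = battery_life_change_minutes(100.0, 101.0, 60.0, 3.8, 2100.0)
```

With these made-up inputs, a 1% increase in energy shortens a battery life of roughly 4.8 h by a little under 3 min, which is the same order of magnitude as the mean changes reported for the Nexus 4.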

Across all configurations, the change in battery life for the Nexus 3 ranges from ≈−8.4 to ≈22.0 min with a mean value of ≈2.5 min, a median value of ≈1.9 min, and a standard deviation of ≈4.6 min. The change in battery life for the Nexus 4 ranges from ≈−16.3 to ≈10.9 min with a mean value of ≈−3.9 min, a median value of ≈−3.1 min, and a standard deviation of ≈4.2 min. The change in battery life for the Galaxy S II ranges from ≈−13.9 to ≈11.5 min with a mean value of ≈−1.1 min, a median value of ≈−0.9 min, and a standard deviation of ≈5.6 min. The change in battery life for the Galaxy S5 ranges from ≈−31.4 to ≈9.8 min with a mean value of ≈−4.6 min, a median value of ≈−4.9 min, and a standard deviation of ≈7.0 min.

When only the all configuration is considered, the change in battery life for the Nexus 3 ranges from ≈−3.3 to ≈14.1 min with a mean value of ≈2.4 min, a median value of ≈2.1 min, and a standard deviation of ≈3.9 min. The change in battery life for the Nexus 4 ranges from ≈−14.9 to ≈2.9 min with a mean value of ≈−3.5 min, a median value of ≈−3.2 min, and a standard deviation of ≈3.7 min. The change in battery life for the Galaxy S II ranges from ≈−13.2 to ≈9.8 min with a mean value of ≈−0.4 min, a median value of ≈0.7 min, and a standard deviation of ≈6.2 min. The change in battery life for the Galaxy S5 ranges from ≈−29.4 to ≈5.5 min with a mean value of ≈−5.7 min, a median value of ≈−5.0 min, and a standard deviation of ≈6.2 min.

Based on these results, we believe that it is unlikely for an application user to notice a decrease in battery life due to an obfuscation. The observed changes in battery life range from ≈−31.4 to ≈22.0 min, which, even at the extremes, represents a change of less than 10% of the respective phone’s total battery life. Recall that these are the expected changes if the scenarios were executed continuously, draining the battery from full to empty. In practice, this is unlikely because mobile phone users rarely use an application continuously.

In retrospect, this result makes sense. For mobile applications, recent studies show that the majority of energy is consumed by the phone’s screen, radios, and sensors [11]. The changes made by the obfuscations do not change how the applications interact with or use these resources. Because the obfuscations make changes to parts of the application that do not consume much energy, the impacts of the obfuscations are overshadowed by the more energy-expensive parts of the execution.

While users are likely to be indifferent to this conclusion because obfuscations neither harm nor improve their battery life, it is good news for application developers. Developers are now able to protect their applications by applying obfuscations without needing to consider the obfuscations’ impacts on energy usage.

3.4. Threats to validity

One of the most significant threats to the validity of our results is the possibility of energy usage measurement errors due to either imprecise measurements or failing to control for potential sources of noise and nondeterminism. To minimize this threat, we took several steps. First, we designed and implemented our EMPs with the help of experienced electrical engineers from the Department of Electrical and Computer Engineering at the University of Delaware or used equipment that had been independently built and evaluated (i.e., Monsoon Power Monitor). Second, we offloaded the monitoring infrastructure to external hardware so that it could not add any overhead to the energy usage of the execution that was being measured. Third, we used an existing capture/replay tool to deterministically reproduce a recorded execution to eliminate biases in how the applications were executed. Finally, we terminated all unnecessary applications and processes and, when possible, enabled airplane mode, which helped reduce noise and sources of nondeterminism caused by external processes (e.g., garbage collection).

An additional threat to validity is that we considered only 11 applications. Although we selected these applications to cover different application types, it is possible that they may not be representative of all applications for reasons that would not be readily apparent from interacting with them. Similarly, considering only Android applications may prevent our conclusions from generalizing to all environments where obfuscations are used (e.g., Windows Phone applications and non-Android Java applications). In future work, we plan to address these threats by expanding our study to include additional applications and additional devices.

A more specific concern is that, owing to the limitations of RERAN, our set of considered applications does not contain applications that make heavy use of the network or sensors beyond the touch screen. However, although network connections and sensors are known to consume large amounts of power, the changes made by the obfuscation tools will not affect how the applications use these resources (i.e., the number of types of calls made to these APIs will not be changed). Consequently, we believe that the results we observed will also hold for such applications.

Finally, our choice of usage scenarios for driving each application may not be representative of how the applications are actually used in practice. However, we believe that this is unlikely because the scenarios were generated by using each application for its intended purposes.

4. RELATED WORK

The most closely related area of work is a recent group of papers that have attempted to identify the underlying causes of energy usage by empirically investigating the impact of various software development decisions. More specifically, researchers have investigated the impacts of refactorings [8, 12], advertisements [13], design patterns [10, 14, 15], sorting algorithms [16], test suite selection [17], web servers [18], programming models [19, 20], API usage [9, 21], and lock-free data structures [22] within a single application, in addition to investigating trends in an application’s energy usage among versions [23] and among separate implementations of the same specification [24, 25]. Researchers have also investigated how developers ask questions about energy usage [26].

Another area of related work is techniques that attempt to detect energy bugs (e.g., [24, 27–30]). Although such techniques are effective at detecting certain types of energy bugs, it is not clear whether they would be able to detect unnecessary energy usage as a result of applying obfuscations. In general, these techniques can only detect bugs that result in large, abnormal spikes in energy usage or repeated increases in energy usage without a corresponding user action (e.g., polling the GPS when the screen is off). Applying an obfuscation is unlikely to cause either of these situations to occur. Rather, it is likely to increase the overall energy usage by a small, constant amount.

In addition to our work at the high level, there is a significant amount of research on optimizing energy usage at the programming language and system levels. At the programming language level, there are several approaches to helping developers write more energy-efficient software. Such work includes new type of systems (e.g., [31]), new programming models (e.g., [32–34]), mechanisms for exposing energy-expensive architectural details (e.g., [35, 36]), and manipulating quality of service [37] and the precision of the results of the computation at runtime [38, 39]. At the compiler level, work has focused on optimizing code to use fewer instructions or a more efficient ordering of instructions; controlling hibernation, dynamic frequency, and voltage scaling; and remote task mapping (e.g., [40–50]). At the operating-system level, work has focused on the goals of allowing an operating system to manage energy in the same manner as other system resources (e.g., [51]) and optimizing the balance between power and performance via the automatic selection of power policies during application execution (e.g., [52–54]). At the hardware level, work has focused on reducing excessive CPU cycles (e.g., [55]), capping RAM energy usage (e.g., [56]), and the addition of special cores to support common virtual machine operations (e.g., [57]). There is also significant work in the area of high-performance computing (e.g., [58–64]).

Kansal et al. created Senergy, an API to improve the energy efficiency of applications that use sensor data [65], while Liu et al. created a technique to save energy when mobile applications access but do not use sensor data. Computation offloading (to another computer or the cloud) can provide energy savings that are particularly useful for mobile devices, whose batteries are limited, and several tools have been developed for helping programmers indicate which computations to offload [66–69]. Li et al. created Nyx to rewrite the server-side code of web applications so that the resulting pages would consume less energy when displayed on a smartphone [70].

The ability to measure the energy usage of a piece of software is a necessary prerequisite for optimizing its energy usage. Work in the area of energy usage measurement has been conducted at various levels. Hardware instrumentation-based approaches (e.g., [71], https://www.wattsupmeters.com/secure/index.php) use physical instrumentation (i.e., soldering wires to power leads) to measure the actual power usage of a system. Such approaches have the benefit of being accurate because they measure actual power usage; however, they are also expensive and difficult to use because they require specialized hardware. Simulation-based approaches (e.g., [72–74]) use a cycle-accurate simulator to replicate the actions of a processor at the architecture level and estimate the energy usage of each executed cycle. Like hardware instrumentation-based approaches, simulation-based approaches can be accurate, but they are also difficult to use. Finally, estimation-based approaches (e.g., [20, 23, 75–82]) build models of energy-influencing features and use such models to estimate energy usage. For example, Hao et al. and Seo et al. construct energy models of Java bytecode and use the models to estimate the energy usage of a given method or execution path [20, 76, 77]. Estimation-based approaches are frequently less accurate than hardware instrumentation-based or simulation-based approaches, but they have the benefits of being more widely applicable and easier to use.

5. CONCLUSIONS AND FUTURE WORK

We have presented an empirical study that investigated the impact of code obfuscations on the energy usage of mobile applications. We considered 11 commonly used Android applications, four obfuscation tools, five obfuscation configurations, 21 usage scenarios, and four platforms. In total, we generated and ran over 47,000 executions on our EMPs.

The results of this study demonstrate the following:

(1) Obfuscations can, and often do, impact the energy usage of applications with statistical significance.

(2) Obfuscations can both increase and decrease energy usage, but they are more likely to increase energy usage.

(3) The magnitude of the impacts of obfuscations is comparable with the magnitude of the impacts of other code-level changes, such as applying refactorings.

(4) The differences between the impacts of the considered obfuscations on energy usage are not statistically significant.

(5) The impacts of obfuscation on battery life are unlikely to be meaningful to mobile application users.

As such, we believe that these results are positive news for both mobile application users and mobile application developers. Developers can protect their applications without impacting the battery life of the devices where their applications execute.

Based on these conclusions, there are several potential areas for future work.

First, we plan on enlarging the scope of our study. Although we considered a significant number of applications and obfuscations, adding further applications, platforms (e.g., tablets), and architectures (e.g., Windows Phone) would potentially allow us to confirm or refute our observations.

Second, we plan on investigating the underlying reasons why the obfuscations exhibit the observed impacts and whether other types of information can be used to accurately predict the impacts of applying obfuscations. From the perspective of a software engineer, the answer to this question has a high utility. In most cases, developers do not have access to the type of custom, hardware-based energy profiling tools that we do. As a result, they have no way of identifying whether obfuscating their code increases or decreases energy usage. Providing them with the ability to make accurate predictions about the impacts of applying obfuscations would be very useful.

ACKNOWLEDGEMENTS

This work is supported in part by National Science Foundation grant no. 1216488 and no. 1321141. We would also like to thank Smardec, PreEmptive Solutions, and Zelix Pty. Ltd. for providing full-featured copies of their obfuscation tools. Finally, we would like to thank Abram Hindle for his initial guidance on building the Nexus 4-based EMP.

REFERENCES

1. Sahin C, Tornquist P, McKenna R, Pearson Z, Clause J. How do code obfuscations impact energy usage? Proceedingsof the 30th International Conference on Software Maintenance and Evolution, 2014; 131–140.

2. Arcuri A, Briand L. A practical guide for using statistical tests to assess randomized algorithms in software engineering.Proceedings of the 33rd International Conference on Software Engineering, 2011; 1–10.



3. Gomez L, Neamtiu I, Azim T, Millstein T. RERAN: timing- and touch-sensitive record and replay for Android.Proceedings of the 2013 International Conference on Software Engineering, 2013; 72–81.

4. Ceccato M, Di Penta M, Nagra J, Falcarin P, Ricca F, Torchiano M, Tonella P. The effectiveness of source codeobfuscation: an experimental assessment. Proceedings of 17th IEEE International Conference on Program Compre-hension, 2009; 178–187.

5. Ceccato M, Capiluppi A, Falcarin P, Boldyreff C. A large study on the effect of code obfuscation on the quality ofJava code. Empirical Software Engineering 2014; 1–39.

6. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing.Journal of the Royal Statistical Society. Series B (Methodological) 1995; 57(1):289–300.

7. Vargha A, Delaney HD. A critique and improvement of the ‘CL’ common language effect size statistics of McGrawand Wong. Journal of Educational and Behavioral Statistics 2000; 25(2):101–132.

8. Sahin C, Pollock L, Clause J. How do code refactorings affect energy usage? Proceedings of the 8th InternationalSymposium on Empirical Software Engineering and Measurement, 2014; 36:1–36:10.

9. Manotas I, Pollock L, Clause J. Seeds: a software engineer’s energy-optimization decision support framework. Pro-ceedings of the 2014 International Conference on Software Engineering, 2014. 503–514.

10. Sahin C, Cayci F, Gutiérrez ILM, Clause J, Kiamilev FE, Pollock LL, Winbladh K. Initial explorations on designpattern energy usage. Proceedings of the First International Workshop on Green and Sustainable Software 2012;55–61.

11. Li D, Hao S, Gui J, Halfond WG. An empirical study of the energy consumption of Android applications. Proceed-ings of the International Conference on Software Maintenance and Evolution (ICSME) 2014.

12. da Silva WGP, Brisolara L, Correa UB, Carro L. Evaluation of the impact of code refactoring on embedded softwareefficiency. Proceedings of the 1st Workshop de Sistemas Embarcados 2010; 145–150.

13. Gui J, Mcilroy S, Nagappan M, Halfond WGJ. Truth in advertising: the hidden cost of mobile ads for softwaredevelopers. Proceedings of the 37th International Conference on Software Engineering (ICSE), 2015. To Appear.

14. Litke A, Zotos K, Chatzigeorgiou A, Stephanides G. Energy consumption analysis of design patterns. Proceedings ofthe International Conference on Machine Learning and Software Engineering, 2005; 86–90.

15. Christian Bunse SS. On the energy consumption of design patterns. Proceedings of the 2nd WorkshopEASED@BUIS Energy Aware Software-Engineering and Development, 2013; 7–8.

16. Bunse C, Hopfner H, Roychoudhury S, Mansour E. Choosing the ‘best’ sorting algorithm for optimal energy con-sumption. Proceedings of the 4th International Conference on Software and Data Technologies, 2009; 199–206.

17. Li D, Jin Y, Sahin C, Clause J, Halfond WGJ. Integrated energy-directed test suite optimization. Proceedings of theInternational Symposium on Software Testing and Analysis (ISSTA), 2014.

18. Manotas I, Sahin C, Clause J, Pollock L, Winbladh K. Investigating the impacts of web servers on web applicationenergy usage. Proceedings of the Second International Workshop on Green and Sustainable Software, 2013.

19. Cohen M, Zhu HS, Senem EE, Liu YD. Energy types. Proceedings of the ACM international conference on Objectoriented programming systems languages and applications, 2012; 831–850.

20. Seo C, Malek S, Medvidovic N. Component-level energy consumption estimation for distributed Java-based soft-ware systems. Proceedings of the 11th International Symposium on Component-Based Software Engineering,2008; 97–113.

21. Linares-Vasquez M, Bavota G, Cardenas CEB, Oliveto R, Penta MD, Poshyvanyk D. Mining energy-greedy API us-age patterns in Android apps: an empirical study. Proceedings of the 11th Working Conference on Mining SoftwareRepositories, 2014; 2–11.

22. Hunt N, Sandhu P, Ceze L. Characterizing the performance and energy efficiency of lock-free data structures. Pro-ceedings of the 15th Workshop on Interaction between Compilers and Computer Architectures, 2011; 63–70.

23. Hindle A. Green mining: a methodology of relating software change to power consumption. Proceedings of the 9thIEEE Working Conference on Mining Software Repositories, 2012; 78–87.

24. Pathak A, Jindal A, Hu YC, Midkiff SP. What is keeping my phone awake?: characterizing and detecting no-sleepenergy bugs in smartphone apps. Proceedings of the 10th International Conference on Mobile Systems, Applications,and Services, 2012; 267–280.

25. Arunagiri S, Jordan VJ, Teller PJ, Deroba JC, Shires DR, Park SJ, Nguyen LH. Stereo matching: performance studyof two global algorithms 2011; 80 211Z–80 211Z–17.

26. Pinto G, Castor F, Liu YD. Mining questions about software energy consumption. Proceedings of the 11th WorkingConference on Mining Software Repositories, 2014; 22–31.

27. Liu Y, Xu C, Cheung S. Where has my battery gone? Finding sensor related energy black holes in smartphoneapplications. Proceedings of the 2013 IEEE International Conference on Pervasive Computing and Communica-tions, 2013; 2–10.

28. Priyantha B, Lymberopoulos D, Liu J. LittleRock: enabling energy-efficient continuous sensing on mobile phones.IEEE Pervasive Computing 2011; 12–15.

29. Ma X, Huang P, Jin X, Wang P, Park S, Shen D, Zhou Y, Saul LK, Voelker GM. eDoctor: automatically diagnosingabnormal battery drain issues on smartphones. Proceedings of the 10th USENIX Conference on Networked SystemsDesign and Implementation, 2013; 57–70.

30. Wan M, Jin Y, Li D, Halfond WGJ. Detecting display energy hotspots in Android apps. Proceedings of the 8th IEEEInternational Conference on Software Testing, Verification and Validation (ICST), 2015. To Appear.

31. Liu YD. Energy-efficient synchronization through program patterns. Proceedings of the First International Work-shop on Green and Sustainable Software, 2012; 35–40.

CAGRI SAHIN ET AL.


32. Bartenstein TW, Liu YD. Green streams for data-intensive software. Proceedings of the 2013 International Confer-ence on Software Engineering, 2013; 532–541.

33. Sorber J, Kostadinov A, Garber M, Brennan M, Corner MD, Berger ED. Eon: a language and runtime system for perpet-ual systems. Proceedings of the 5th International Conference on Embedded Networked Sensor Systems, 2007; 161–174.

34. Mainland G, Morrisett G, Welsh M. Flask: staged functional programming for sensor networks. Proceedings of the13th ACM SIGPLAN International Conference on Functional Programming, 2008; 335–346.

35. Liu S, Pattabiraman K, Moscibroda T, Zorn BG. Flikker: saving dram refresh-power through critical datapartitioning. Proceedings of the Sixteenth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems, 2011; 213–224.

36. Lu YH, Benini L, De Micheli G. Requester-aware power reduction. Proceedings of the 13th International Sympo-sium on System Synthesis, 2000; 18–23.

37. Baek W, Chilimbi TM. Green: a framework for supporting energy-conscious programming using controlled approx-imation. Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementa-tion, 2010; 198–209.

38. Esmaeilzadeh H, Sampson A, Ceze L, Burger D. Architecture support for disciplined approximate programming.Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languagesand Operating Systems, 2012; 301–312.

39. Sampson A, Dietl W, Fortuna E, Gnanapragasam D, Ceze L, Grossman D. Enerj: approximate data types for safe andgeneral low-power computation. Proceedings of the 32Nd ACM SIGPLAN Conference on Programming LanguageDesign and Implementation, 2011; 164–174.

40. Hsu CH, Kremer U. The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction. Pro-ceedings of the ACM SIGPLAN 2003Conference on Programming LanguageDesign and Implementation, 2003; 38–48.

41. Hu C, Jiménez DA, Kremer U. Efficient program power behavior characterization. Proceedings of the 2nd Interna-tional Conference on High Performance Embedded Architectures and Compilers, 2007; 183–197.

42. Huang PK, Ghiasi S. Efficient and scalable compiler-directed energy optimization for realtime applications. ACMTransactions on Design Automation of Electronic Systems 2008; 12:27:1–27:16.

43. Kadayif I, Kandemir M, Chen G, Vijaykrishnan N, Irwin MJ, Sivasubramaniam A. Compiler-directed high-level en-ergy estimation and optimization. ACM Transactions in Embedded Computing Systems 2005; 4:819–850.

44. Kremer U. Low power/energy compiler optimizations. Low-Power Electronics Design 2005; 2–5.45. Rangasamy A, Nagpal R, Srikant Y. Compiler-directed frequency and voltage scaling for a multiple clock domain

microarchitecture. Proceedings of the 5th Conference on Computing Frontiers, 2008; 209–218.46. Saputra H, Kandemir M, Vijaykrishnan N, Irwin MJ, Hu JS, Hsu CH, Kremer U. Energy-conscious compilation

based on voltage scaling. Proceedings of the Joint Conference on Languages, Compilers, and Tools for EmbeddedSystems: Software and Compilers for Embedded Systems, 2002; 2–11.

47. Fraser CW, Hanson DR, Proebsting TA. Engineering a simple, efficient code-generator generator. ACM Letters onProgramming Languages and Systems 1992; 1:213–226.

48. Su CL, Tsui CY, Despain AM. Low power architecture design and compilation techniques for high-performance pro-cessors. Compcon Spring’94, Digest of Papers, 1994; 489–498.

49. Davidson JW, Jinturkar S.Memory access coalescing: a technique for eliminating redundant memory accesses. Proceed-ings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, 1994; 186–195.

50. Mehta H, Owens RM, Irwin MJ, Chen R, Ghosh D. Techniques for low energy software. Proceedings of the 1997International Symposium on Low Power Electronics and Design, 1997; 72–75.

51. Zeng H, Ellis CS, Lebeck AR, Vahdat A. ECOSystem: managing energy as a first class operating system resource.ACM SIGOPS Operating Systems Review 2002; 36:123–132.

52. Pettis N, Ridenour J, Lu YH. Automatic run-time selection of power policies for operating systems. Proceedings ofthe Conference on Design, Automation, and Test in Europe, 2006; 508–513.

53. Flautner K, Reinhardt S, Mudge T. Automatic performance setting for dynamic voltage scaling. Proceedings of the7th Annual International Conference on Mobile Computing and Networking, 2001; 260–271.

54. Pering T, Burd T, Brodersen R. Voltage scheduling in the IpARM microprocessor system. Proceedings of the 2000International Symposium on Low Power Electronics and Design, 2000; 96–101.

55. Srinivasan V, Shenoy GR, Vaddagiri S, Sarma D, Pallipadi V. Energy-aware task and interrupt management inLinux. Proceedings of the Linux Symposium, vol. 2, 2008.

56. David H, Gorbatov E, Hanebutte UR, Khanna R, Le C. RAPL: memory power estimation and capping. Proceedingsof the 16th ACM/IEEE International Symposium on Low Power Electronics and Design, 2010; 189–194.

57. Cao T, Blackburn SM, Gao T, McKinley KS. The yin and yang of power and performance for asymmetric hardwareand managed software. Proceedings of the 39th International Symposium on Computer Architecture, 2012; 225–236.

58. Heo S, Barr K, Asanović K. Reducing power density through activity migration. Proceedings of the 2003 Interna-tional Symposium on Low Power Electronics and Design, 2003; 217–222.

59. Monchiero M, Canal R, González A. Design space exploration for multicore architectures: a power/performance/thermal view. Proceedings of the 20th Annual International Conference on Supercomputing, 2006; 177–186.

60. Musoll E. A thermal-friendly load-balancing technique for multi-core processors. Proceedings of the 9th Interna-tional Symposium on Quality Electronic Design, 2008; 549–552.

61. Douglis F, Krishnan P, Bershad BN. Adaptive disk spin-down policies for mobile computers. Proceedings of the 2ndSymposium on Mobile and Location-Independent Computing, 1995; 121–137.



62. Kravets R, Krishnan P. Power management techniques for mobile communication. Proceedings of the 4th ACM/IEEE International Conference on Mobile Computing and Networking, 1998; 157–168.

63. Iyer A, Marculescu D. Power aware microarchitecture resource scaling. Proceedings of the 2001 Design, Automationand Test in Europe, Conference and Exhibition, 2001; 190–196.

64. Pouwelse J, Langendoen K, Sips H. Dynamic voltage scaling on a low-power microprocessor. Proceedings of the 7thInternational Conference on Mobile Computing and Networking, 2001; 251–259.

65. Kansal A, Saponas S, Brush AB, McKinley KS, Mytkowicz T, Ziola R. The latency, accuracy, and battery (LAB)abstraction: programmer productivity and energy efficiency for continuous mobile context sensing. Proceedings ofthe 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Appli-cations, 2013; 661–676.

66. Kemp R, Palmer N, Kielmann T, Bal H. Cuckoo: a computation offloading framework for smartphones. MobileComputing, Applications, and Services. Second International ICST Conference, MobiCASE 2010, Santa Clara,CA, USA, October 25–28, 2010, Revised Selected Papers, 2012, doi:10.1007/978-3-642-29336-8_4.

67. Kumar K, Liu J, Lu YH. Bhargava B. A survey of computation offloading for mobile systems: Mobile NetworkApplications, 2012.

68. Chun BG, Maniatis P. Augmented smartphone applications through clone cloud execution. Proceedings of the 12thconference on Hot topics in operating systems, HotOS’09, USENIX Association: Berkeley, CA, USA, 2009; 8–8.URL http://dl.acm.org/citation.cfm?id=1855568.1855576.

69. Chen HY, Lin YH, Cheng CM. COCA: computation offload to clouds using AOP. Proceedings of the 2012 12thIEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), CCGRID’12, IEEEComputer Society: Washington, DC, USA, 2012; 466–473, doi:10.1109/CCGrid.2012.98. URL 10.1109/CCGrid.2012.98.

70. Li D, Tran AH, Halfond WGJ. Making web applications more energy efficient for OLED smartphones. Proceedingsof the International Conference on Software Engineering (ICSE), 2014.

71. Singh D, Peterson PAH, Reiher PL, Kaiser WJ. The Atom LEAP Platform for Energy-efficient Embedded Comput-ing: Architecture, Operation, and System Implementation. University of California: Los Angeles, 2010.

72. Gurumurthi S, Sivasubramaniam A, Irwin MJ, Vijaykrishnan N, Kandemir M, Li T, John LK. Using completemachine simulation for software power estimation: the SoftWatt approach. Proceedings of the 8th International Sym-posium on High-Performance Computer Architecture, 2002; 141–151.

73. Mudge T, Austin T, Grunwald D. The reference manual for the Sim-Panalyzer, version 2.0. http://web.eecs.umich.edu/ panalyzer/.

74. Brooks D, Tiwari V, Martonosi M. Wattch: a framework for architectural-level power analysis and optimizations.Proceedings of the 27th annual International Symposium on Computer Architecture, 2000; 83–94.

75. Tiwari V, Malik S, Wolfe A. Power analysis of embedded software: a first step towards software power minimiza-tion. Proceedings of the 1994 IEEE/ACM International Conference on Computer-aided Design, 1994; 384–390.

76. Seo C, Malek S, Medvidovic N. An energy consumption framework for distributed Java-based systems. Proceedingsof the twenty-second IEEE/ACM International Conference on Automated Software Engineering, 2007; 421–424.

77. Hao S, Li D, Halfond W, Govindan R. Estimating Android applications’ CPU energy usage via bytecode profiling.First International Workshop on Green and Sustainable Software 2012; 1–7.

78. Dong M, Zhong L. Self-constructive, high-rate energy modeling for battery-powered mobile systems. Proceedings ofthe ACM/USENIX International Conference on Mobile Systems, Applications, and Services, 2011.

79. Noureddine A, Bourdon A, Rouvoy R, Seinturier L. A preliminary study of the impact of software engineering onGreenIT. First International Workshop on Green and Sustainable Software 2012; 21–27.

80. Amsel N, Tomlinson B. Green tracker: a tool for estimating the energy consumption of software. Proceedings of the28th International Conference on Human Factors in Computing Systems: Extended Abstracts, 2010; 3337–3342.

81. Li D, Hao S, Halfond WG, Govindan R. Calculating source line level energy information for Android applications.Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), 2013.

82. Hao S, Li D, Halfond WGJ, Govindan R. Estimating mobile application energy consumption using program analysis.Proceedings of the 35th International Conference on Software Engineering (ICSE), 2013.

CAGRI SAHIN ET AL.

