TOWARDS AUTOMATIC GENERATION OFSECURITY-‐CENTRIC DESCRIPTIONS FOR
ANDROID APPSMu Zhang (NEC Labs)Yue Duan (Syracuse Univ.)Qian Feng (Syracuse Univ.)Heng Yin (Syracuse Univ.)
What an app claims to do What the app actually doesvs.
Permissions: 1) Hard to read.
Felt et al. (SOUPS’12)
2) Insufficient to tell “HOW”
Motivation: Limitation of App Descriptions
Contacts
Phone
Internet
Device ID
Textual Desc.:Not really about security.WHYPER (Security’13)AutoCog (CCS’14)
Features
Pricing
Service
How-‐to
Send “Contacts” over “Internet”
DESCRIBEME: Automatically Deriving Textual Descriptions from Android Program Code
Existing Work: Automated Java Program Summarization
• In Software Engineering Context– Java Methods (ASE’10)– Method Parameters (ICPC’11) – Classes (ICPC’13) – Conditional Statements (ASE’10) – Algorithmic Structure (ICSE’11)
• We are dealing with a DIFFERENT problemExisting Work DESCRIBEME
Purpose Review legacy code Understand security risks
Audience Experienced developers Average users
Desc. Level Intra-‐procedural, structure-‐based Whole-‐program, semantic-‐level
Security-‐awareness1
Conciseness2
Human-‐understandability3
Challenges & Requirements
Approach Overview
Security-‐awareness1 Conciseness2
Human-‐understandability3Human-‐
understandability3
Conciseness2
Human-‐understandability3
Behavior Graph
API Node
API Prototype
Context
Constant set
Constant setStatic Analysis: 22K LOC
Data Dependency
Condition
Security-‐awareness1
Condition Analysis
• Extract only user-‐aware conditions – User Interface– Device Status– Natural Environment
• Present simple logic to users– Equation/Inequation
Our condition analysis is focused only on the conditions that users can observe and evaluate.
Human-‐understandability3
Subgraph Mining
Conciseness2 Human-‐understandability3
Graph Compression: Replace the subgraphs with single nodes
Discovered 109 patterns fundamentally due to design patterns in Android apps.
Natural Language GenerationOnce “a GUI component”, “be”, “clicked”
depending on if “the user”, “select”, “the Button ``Confirm’’ “
Once a GUI component is
clicked
depending on if the user selects the Button ``Confirm’’
Description: Once a GUI component is clicked, the app retrieves your phone number, and encodes the data into format “100/app_id=an1005/ani=%s/dest=%s/phone_number=%s/company=%s/”, and sends data to network, depending on if the user selects the Button “Confirm”.
AggregationConciseness2
EVALUATION: Correctness
Question 1: Is generated description correct?
Total # Correct Missing Desc. False Statement
65 55 6 4
1. Points-‐to Analysis2. Exception handling3. Reflection
Run DESCRIBEME over DroidBench
EVALUATION: Security-‐Awareness
Question 2: Developer’s descriptions cannot faithfully reflect the usage of permissions. Can we do better?
EVALUATION: Improvement of Conciseness
Question 3: Is subgraph mining effective?
EVALUATION: Readability
Question 4: Can average users read the machine generated descriptions?
EVALUATION: Human-‐Understandability
Question 5: Can our descriptions help users avoid risks?
App Download Rate w/ old desc. w/ new desc.
Malware 63.4% 24.7%
Privacy-‐breaching 80.0% 28.2%
Clean 71.1% 59.3%
Conclusion
• We propose a novel technique that automatically describes security-‐related app behaviors to the end users in natural language.
• We implement DESCRIBEMEwhich combines program analysis, subgraph mining and natural language generation to create security-‐centric, concise and human-‐readable descriptions.
Related Work[1] Sridhara et al., Towards Automatically Generating Summary Comments for Java Methods, in ASE’10.[2] Buse et al., Automatically Documenting Program Changes, in ASE’10.[3] Sridhara et al., Automatically Detecting and Describing High Level Actions Within Methods, in ICSE’11.[4] Sridhara et al., Generating Parameter Comments and Integrating with Method Summaries, in ICPC’11.[5] Moreno et al., Automatic Generation of Natural Language Summaries for Java Classes, in ICPC’13.[6] Pandita et al., WHYPER: Towards Automating Risk Assessment of Mobile Applications, in USENIX Security’13[7] Qu et al., AutoCog: Measuring the Description-‐to-‐permission Fidelity in Android Applications
THANK YOU!
UI-related Triggering Conditions
• UI Analysis: – to correlate what the user sees to what the app does
19
<public type=“id” name=“send”
id=“0x7f040003” />
res/values/public.xml
<string name=“send_binarysms”>Send binary sms (to port 8091)</string>
res/values/strings.xml<CheckBox android:id=
“@+id/send” android:text=“@string/send_binarysms”/>
res/layout/main.xml
<id=“0x7f040003”,id name = “send”>
<string name=“send_binarysms”, text=“Send binary sms (to port 8091)”>
<id name = “send”, type=“CheckBox” , string name=“send_binarysms”>
<id=“0x7f040003”,type = “CheckBox”,
text = “Send binary sms (to port 8091)”>
Send binary sms (to port 8091)
Subgraph Mining
SmsManager.getDefault()
SmsManager.sendTextMessage()
1. Singleton Retrieval
divideMessage()
sendMultipartTextMessage()
2. Workflow
getLastKnownLocation()
getLongitude()
3. Hierarchical Data
getLatitude()
Description Model
• 3-‐tuple for APIs– createFromPdu(): {“the app”, “retrieve”, “incoming SMS message”}
• Manually modeling 306 APIs and 103 patterns• Guideline for Word Selection– Straightforward– Distinguishable– Counterexamples:
“Blow into the mic to extinguish the flame like a real candle”“You can now turn recordings into ringtones”
Human-‐understandability3