Research Article

A Case-Based Reasoning Approach for Automatic Adaptation of Classifiers in Mobile Phishing Detection

San Kyaw Zaw and Sangsuree Vasupongayya

Department of Computer Engineering, Faculty of Engineering, Prince of Songkla University, Hatyai, Songkhla 90112, Thailand

Correspondence should be addressed to Sangsuree Vasupongayya; vsangsur@coe.psu.ac.th

Received 5 April 2019; Accepted 28 May 2019; Published 19 June 2019

Guest Editor: Arash H. Lashkari

Copyright © 2019 San Kyaw Zaw and Sangsuree Vasupongayya. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi, Journal of Computer Networks and Communications, Volume 2019, Article ID 7198435, 14 pages, https://doi.org/10.1155/2019/7198435

Currently, the smartphone contains a lot of sensitive information. The increasing use of smartphones makes them more attractive to phishers. Existing phishing detection techniques work on their own specific features with selected classifiers to get their best accuracy. An effective phishing detection approach is required to adapt to the concept drift of mobile phishing and prevent degradation in accuracy. In this work, an adaptive phishing detection approach based on the case-based reasoning technique is proposed to handle the concept drift challenge in phishing apps. Several experiments are conducted in order to demonstrate the design decisions of our proposed model. The proposed model is evaluated with a large feature set containing 1,065 features from 10 different categories. These features are extracted from more than 10,000 Android applications. Five combinations of features are created in order to mimic new real-world Android apps to evaluate our experiments. Moreover, a reduced feature set is also studied in this work in order to improve the efficiency of the proposed model. Both the accuracy and the efficiency of the proposed model are evaluated. The experimental results show that our proposed model achieves acceptable accuracy and efficiency for phishing detection.

1. Introduction

Mobile communication is becoming more and more important within the context of Industry 4.0 [1]. The topmost security concern for mobile services is the phishing attack, which can compromise all confidential information of the mobile user [2]. Phishing attacks are increasing and evolving through a variety of newer methods despite the use of a number of detection approaches to battle mobile phishing attacks. Wombat Security revealed that 83% of organizations experienced phishing attacks in 2018 [3]. Figures published by the UK cyber security firm Alert Logic cited phishing attacks, ransomware, and data loss as the top concerns [4]. Moreover, cybercrimes such as advanced persistent threats (APTs) and ransomware often start from phishing [5]. Currently, phishers try to hide their malicious payloads from detection systems using methods such as emulator detection, application icon hiding, and reflection. The APWG Phishing Attack Trends Report released in March 2019 said that the detection of phishing sites has become harder because phishers were obfuscating phishing URLs with multiple redirections [6]. In the context of machine learning, this phenomenon is known as concept drift, and it has become the main challenge to mobile phishing detection. Thus, the machine learning classifiers applied in phishing detection models must adapt to this concept drift in order to prevent any degradation in their detection accuracy.

In earlier phishing detection works, variations of individual machine learning classification algorithms were applied. Each earlier phishing detection approach showed an acceptable detection accuracy while using specific feature patterns with selected detection algorithms in its specific application domain [7, 8]. Currently, the use of individual classification algorithms in phishing detection is developing into a combination of multiple classifiers in the form of ensemble methods to produce better accuracy with more efficiency [9–11]. Unfortunately, most existing ensemble classification techniques in phishing detection cannot adapt automatically to the variation of input feature patterns, and this remains a challenging issue in phishing detection works [12]. Therefore, finding a way to make automatic adaptation classifiers based on the variation
of input feature patterns will improve the key quality criteria of phishing detection, namely, accuracy and efficiency.

The main objective of this work is to create a mobile phishing detection system that uses a case-based reasoning (CBR) approach to adapt classifiers automatically to the incoming feature patterns. By selecting the most suitable classifier for the incoming features, the proposed system could provide the best performance by appropriately combining the strengths of all the methods used. The proposed adaptive phishing detection system can handle the concept drift challenge in phishing apps. CBR is applied to construct the phishing detection model: a knowledge base, or case base, controls the detection algorithm by utilizing phishing features as cases. Moreover, an experimental analysis is conducted to verify that the proposed case-based phishing detection is more suitable for handling the concept drift of mobile phishing attacks than existing detection approaches.

The rest of the paper is organized as follows. In Section 2, the background information on phishing attacks on smartphones is presented. The machine learning techniques for phishing detection and the background of case-based reasoning, which models the ensemble of classifier approaches as cases in a knowledge base, are also described. Next, the overall architecture of the proposed adaptive phishing detection system and its detailed processes are described in Section 3. The accuracy and performance analysis of the proposed system is presented in Section 4. The conclusions are given in Section 5.

2. Theoretical Background

The background technologies are described in this section. The nature of phishing attacks on smartphones and their attack techniques are presented in Section 2.1. Section 2.2 presents the literature review on existing phishing detection solutions based on machine learning techniques and their frequently used features. Lastly, case-based reasoning classification techniques are explained in Section 2.3.

2.1. Phishing on Smartphone. Nowadays, phishers are motivated to target smartphones for several reasons. A smartphone today is as powerful as a desktop or laptop computer. Smartphones usually contain a lot of sensitive information about their owners. The increasing use of smartphones makes them more attractive to attackers. Phishing attack techniques fall into two categories: application-oriented phishing attacks and website-oriented phishing attacks.

The application-oriented phishing attacks can be categorized into two types based on their launching methods. First, the phishing application attempts to hijack existing legitimate applications (task interception) and continuously performs task polling. The phishing application launches itself as soon as it detects the launch of a target application. These task interception attacks are mainly based on fake graphical user interface (GUI) techniques, which can easily impersonate the target and are hard to detect since a large touch screen is used as the primary user interface on most smartphones. As a result, the fake login interface is layered over the top of the real one, and the phishing app appears to be the target app. Second, the phishing application (a repackaged application) can directly present itself as the targeted legitimate app. This may occur when the user downloads fake applications from an unofficial app market.

The website-oriented phishing attacks can also be categorized into two types based on their techniques. First, a phishing website hides (spoofs) the URL bar of the targeted website. Second, a phishing website attempts to overlay the genuine website with a crafted pop-up window. URL spoofing is the process of creating a fake or forged URL which impersonates a legitimate and secure website. This kind of URL spoofing attack is harmful and dangerous because the website looks exactly like the original one [13]. The fake website asks the user to enter his/her username, password, credit card number, or other information. For a legitimate mobile app that includes an embedded web page served over HTTP, or a legitimate mobile app that allows an overlaying pop-up window, a network attacker can change the login button on the page or substitute a crafted pop-up window so that there is a link to a page owned by the attacker. When the user clicks the button, the user is taken to the phishing page within the embedded web frame. This way, the attacker can steal the user credentials. The attacker can then relay the credentials to the valid website in order to mimic the normal workflow.

Existing solutions in phishing detection show an acceptable accuracy in their specific domains using their targeted features and their specified machine learning techniques. Thus, an effective phishing detection approach that is less dependent on the feature patterns is still needed. This work aims to propose an adaptive phishing detection approach by combining many existing techniques.

2.2. Current Phishing Detection Solutions. Existing phishing countermeasures use techniques such as content filtering, visual matching, and blacklist or whitelist matching [14]. A content filtering system examines the content of webpages for suspected URLs. Content filtering can be achieved by identifying statistical differences between legitimate and suspected phishing contents or by constructing a set of rules [15, 16]. Visual matching computes a visual similarity between the phishing and the legitimate pages based on the images, blocks, and layout [17, 18]. In a blacklisting system, the known phishing URLs are listed based on a human verification method. This approach results in very low false positive rates. In a whitelisting system, users specify the links of trusted sites and frequently accessed websites. In contrast, other new websites are suspected as phishing attacks [15].
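The blacklist and whitelist matching described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the host names, the list contents, and the function names `host_of` and `check_url` are hypothetical, and matching is done on the host name only.

```python
from urllib.parse import urlparse

# Hypothetical example lists; a real system would load verified, curated data.
BLACKLIST = {"phish.example.net"}   # known phishing hosts (human verified)
WHITELIST = {"bank.example.com"}    # trusted, frequently accessed hosts


def host_of(url: str) -> str:
    """Return the lowercase host name of a URL (empty string if missing)."""
    return (urlparse(url).hostname or "").lower()


def check_url(url: str) -> str:
    """Classify a URL by list matching only."""
    host = host_of(url)
    if host in BLACKLIST:
        return "phishing"
    if host in WHITELIST:
        return "trusted"
    # Under a whitelisting policy, any unknown site is treated as suspect.
    return "suspected"
```

The sketch also shows the weakness noted in the text: a new phishing host that is on neither list is only ever reported as "suspected", so list matching alone cannot follow evolving attacks.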

Possible phishing attacks on mobile devices, which can be launched during control transfers, are discussed in [19]. An indicator of the application's identity on the system navigation bar, showing the currently running application or the current web page, was implemented in [20]. Personalized security indicators for mobile apps are proposed in [21]. However, a user-driven decision-making process is still needed.

A unified and trusted login user interface is used in another group of antiphishing techniques. A software keyboard which can be used safely for login input is provided in [22]. For handling credentials, hardware and software certificates that are used to confirm the login are proposed in [23]. However, these approaches require some modifications to the client application and some user effort. An antiphishing system for mobile platforms was presented in [24]. The work was continued in [4] to detect persistent account registry phishing attacks. They used an OCR technique, and their database needs to save every snapshot of the protected applications and webpages. The use of QR codes in phishing attacks was demonstrated and analyzed in [25]. They combined a client-server architecture with a digital signature to perform integrity checking and authentication. However, the work focused only on QR code phishing attacks, while phishing malware was not considered. Phishing Detective [26] was created to identify whether or not a link in the user's e-mail might send the user to a phishing page. However, the work relied entirely on the URL blacklist of the PhishTank database, so it might not handle other types of phishing attacks such as activity hijacking and repackaging attacks.

MP-Shield [27] is an Android application that aims to inspect the flow of IP packets between the origin and the destination of mobile user applications. Their work mainly emphasized URL monitoring for detection purposes. The types of phishing attack that can be mounted on mobile devices were identified in [19]. The authors analyzed the ways in which mobile applications and websites link to each other. The common control transfers on mobile devices and how phishing attacks can be mounted against these control transfer scenarios were discussed. The authors presented possible types of phishing attacks along with their legitimate behaviors, as summarized in Table 1.

In Table 1, the mobile sender means a mobile application that sends the user to a website or another mobile application, while the web sender means a website that sends the user to a mobile application or another website.

Our work covers these attack models with ten groups of selected feature categories. Each phishing detection approach showed an acceptable detection accuracy while using different features. Unfortunately, the majority of phishing detection approaches may lack the features needed for efficient detection of phishing malware. An optimized solution which uses different kinds of features of Android applications to prevent phishing and malware on Android smartphones is still needed. Our work contributes to finding an optimal solution for mobile phishing detection in the sense of using the features independently with various classifiers.

2.3. Case-Based Reasoning. Case-based reasoning (CBR) is a problem-solving approach that solves new problems by adapting or reusing old solutions that were used to solve similar problems [28]. Past experiences or previous problems are saved as cases, and each case contains the representative features (characteristics) of the problem and its solution. The case base is a collection of these cases. This knowledge base of problem-solving experience is used for solving new problems [29]. The solutions in the retrieved cases are reused as a proposed solution to the new problem. Thus, the solution to the new problem can be found from similar known solutions in the past.

If the new problem situation is exactly the same as a previous case, then the reuse is simple. CBR systems start their reasoning from knowledge units called cases, while data-mining systems most often start from raw data. CBR systems also belong to the instance-based learning systems in the field of machine learning, which are defined as systems that are capable of automatically improving their performance over time. As long as CBR systems learn new cases in the retain step, they qualify as learning systems and thus belong to machine learning systems [30]. The learning process of a case-based reasoning approach is shown in Figure 1.

A case-based reasoning system performs the learning process as follows:

(1) Retrieving the most similar case or cases from the case base to the new problem

(2) Reusing the previous solutions of the similar cases to solve the new problem

(3) Revising the proposed solution (if necessary)

(4) Retaining the solution of the new case for future problem solving

A new problem is represented as a case and is compared with the existing cases in the case base. The most similar case or cases are retrieved based on a similarity comparison of the case representations. These retrieved cases are adapted (i.e., combined and reused) to propose a solution for the new problem. The suggested solution may need to be evaluated and corrected (i.e., revised) if it is not the best solution. The verified solution can be added back as a new case to the case base (i.e., retained) or as an amendment to existing cases in the case base, to be used in future problem solving [28].
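The four steps of the cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Case` class, the Jaccard similarity over feature names, and the `evaluate` callback are hypothetical choices made for the example and are not the similarity measure or data model used in this work.

```python
from dataclasses import dataclass


@dataclass
class Case:
    problem: frozenset  # feature names describing the problem (assumed encoding)
    solution: str       # e.g., the name of the classifier that solved it


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity, used here only as a simple stand-in measure."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(case_base, problem):
    """Step 1: return the stored case most similar to the new problem."""
    return max(case_base, key=lambda c: similarity(c.problem, problem))


def solve(case_base, problem, evaluate):
    best = retrieve(case_base, problem)        # (1) retrieve
    proposed = best.solution                   # (2) reuse
    solution = evaluate(problem, proposed)     # (3) revise; may return a correction
    case_base.append(Case(problem, solution))  # (4) retain for future problems
    return solution
```

Here `evaluate` stands for whatever check accepts or corrects the proposed solution; passing `lambda problem, proposed: proposed` accepts every proposal unchanged.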

3. Architecture Overview

A case-based reasoning model is proposed for the automatic adaptation of classifiers in mobile phishing detection. The design of the case-based adaptive classification system is presented in this section. The proposed system consists of two main parts: the application on Android smartphones and the detection system in the cloud environment. Figure 2 shows the overall system design.

As shown in Figure 2, the features are extracted from the Android application for the phishing detection process. The detailed information on the features is discussed in Section 3.1. Then, the extracted features are sent to the cloud environment for the phishing detection processes. As the main objective of this work is to enhance the phishing detection processes, the detection is performed on the static and dynamic features from an Android malware dataset (described in Section 4.1). The detailed process of feature extraction is out of the scope of this paper.

The contribution of our work starts when the detection system receives the extracted features. The first process is to retrieve the most similar case from the case base (which stores previous Android phishing detection approaches along with the corresponding features). The case-retrieving process is described in Section 3.3. The case base must be set up before the case-retrieving process. The case base setup process is shown in Figure 3. The details of the case base setup process are presented in the following sections.

According to the retrieved case, the most suitable classification techniques are used for the adaptive classification. If the feature set extracted from the Android application does not match the sets of features stored in the case base, the adaptive classification selects suitable methods to process the extracted feature set according to the similarity ratio score. Selecting suitable methods means choosing multiple classifiers for the extracted feature set. Finally, the result for the active Android application is sent to the application on the Android smartphone to be displayed to the user.

Figure 1: Case-based reasoning approach (a new problem is defined; similar cases are retrieved from the stored cases in the case base; the retrieved solution is reused as a suggested solution, evaluated and revised into a corrected solution, and retained as a new case).

Figure 2: Overall system design (Android smartphone: Android application, extracted features, show result; cloud: case base, case selection, reuse, adaptive classification, classifiers).

Table 1: Legitimate behaviors and their respective phishing attack techniques.

| Sender | Legitimate behavior | Respective attack techniques |
| --- | --- | --- |
| Mobile sender | Social sharing, upgrades, game credits, opening a target in the browser, sending the user to an embedded HTTP page in the browser that links to an HTTPS login | Fake mobile login screen, task interception, scheme squatting, keylogging, URL bar hiding/spoofing, fake browser, active network attack plus URL bar spoofing |
| Web sender | Link to mobile e-mail or Twitter, payment via PayPal or Google Checkout, and user follows a link from HTTP to HTTPS | Website spoofs mobile app, task interception, scheme squatting, URL bar hiding/spoofing, active network attack plus URL bar spoofing |

3.1. Features for Mobile Phishing. Existing antiphishing solutions on mobile environments were collected, and the features they use to identify a phishing attack were extracted. Under an Android environment, the features can be extracted from miscellaneous sources such as program entities and program outputs of runtime monitoring. The features frequently used by existing antiphishing solutions can be classified into ten classes: Android components, Android API counts, API usage actions, security-sensitive data flows, hardware components, intent actions, permissions, shell commands and strings, contents and visual, and URLs. The details of each feature class are given below.

(1) Android components: a variety of component types with specific functionalities (e.g., components for providing GUIs and others for running background services) are declared within an Android app's manifest; these features are collected in [31–33].

(2) API count: the number of invocations of a specific Android API method (e.g., a malicious app accesses the location APIs twice and the telephony package 8 times); these features are collected in [4, 24, 27, 32].

(3) API usage actions: APIs can be used to develop applications on the Android platform and can also be misused for malicious purposes. There are many approaches to submit web requests and to exfiltrate captured data via the API without the Internet permission. Some existing phishing detection works [27, 31, 32, 34] collect API calls (e.g., API calls to access sensitive data, API calls to access network communications, API calls to send and receive SMS messages, API calls to execute external commands, and API calls frequently used for obfuscation).

(4) Security-sensitive data flows: a few approaches for Android malware detection [31, 34, 35] use data flows between security-sensitive Android interfaces to determine whether an app is malicious. Tracking this form of information is particularly useful for identifying privacy leaks.

(5) Hardware components: the hardware components used by the app are listed in AndroidManifest.xml (e.g., to access the camera, an app needs to declare the android.hardware.camera feature); these features are collected in [4, 36].

(6) Intent actions: Android malware is known to rely on tracking Intents (e.g., whether a package is installed or whether a device has recently completed booting) to determine when to perform a malicious behavior. These features are used in [32, 36].

(7) Permissions: Android malware acquires specific permissions provided by Android to execute some risky operations. These features are collected in [34, 37, 38].

(8) Shell commands and strings: the features of interesting strings associated with malicious behaviors and potentially risky shell commands are collected in [36, 39]. Some of the structural attributes of the APK file, such as the size of the code, the presence of zip files and binary files, and related information, are also included in this feature group.

(9) Contents and visual: the main display channel for the deception of phishing is the web content, which expresses the intention of the website. These features consist of page elements such as the page title, the submitted form, and the contained links. Some researchers also extract the logo, the icon, and the contained pictures from the web page and use an image recognition algorithm to identify the phishing website [16–18].

(10) URLs: web link features for phishing fraud are collected based on five criteria: URL and Domain Identity, Security and Encryption, Source Code and JavaScript, Page Style and Contents, and Web Address Bar. These features are collected in [4, 13, 40].

3.2. Case Representation. A case represents an experience at an operational level. Typically, a case includes the problem specification, the solution, and sometimes the outcome. This is the most common representation used. However, more elaborate case representations can be employed. Depending on the information included in a case, different types of results can be achieved from the system. Cases that describe a problem and its solution can be used to derive solutions to new problems.

In general, a case specification is described as a set of features. The features are those aspects of the domain and the problem that are considered to be most significant in determining the solution and/or outcome. A case represents an experience. In this situation, a case should represent the features of the application that are used to identify a phishing attack.

In our model, a case includes the combination of feature sets, the ensemble method of classifiers or the individual classification algorithm with its specific parameters, the accuracy and performance of the solution, and potential facilitations. The case description stored in the phishing detection system is shown in Table 2.
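As an illustration, the case description (case ID, feature pattern, classifier or ensemble method with parameters, accuracy, and runtime) could be held in a small record type. The Python sketch below is an assumption about one possible encoding; the class name `PhishingCase`, its field names, and all example values are hypothetical and are not results reported in this work.

```python
from dataclasses import dataclass, field


@dataclass
class PhishingCase:
    case_id: int                # case identification number
    feature_pattern: tuple      # combination of feature sets (category names)
    classifier: str             # ensemble method or individual algorithm name
    parameters: dict = field(default_factory=dict)  # classifier-specific settings
    accuracy: float = 0.0       # percentage of correct classification
    runtime_seconds: float = 0.0  # performance, measured as runtime


# Made-up example values, for illustration only.
example = PhishingCase(
    case_id=1,
    feature_pattern=("permissions", "intent actions", "URLs"),
    classifier="J48",
    parameters={"confidence_factor": 0.5},
    accuracy=95.0,
    runtime_seconds=1.2,
)
```

Keeping the feature pattern as a tuple of category names makes it straightforward to compare a stored case with the categories present in a newly extracted feature set.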

Figure 3: Setting up the case base (a set of feature patterns is compared with the case base by calculating the similarities; the most similar case is retrieved, reused, revised, and retained; potential exceptions in the feature patterns are identified, and the system is monitored and completed).

To define a new case in the case base, the input features are passed through different machine learning classifiers, and the results from each classifier are combined to produce the final result. Then, the input features, the classifiers with their parameters, the activation function, and the final result are stored in the case base (knowledge base) as a new case. The process of defining a new case to be stored in the case base is shown in Figure 4.

3.3. Case Retrieval. Case-based reasoning (CBR) solves a new problem by retrieving previously solved problems and their solutions from a knowledge source of cases called the case base. There are challenges related to the retrieving process that still need to be addressed. One issue is the computation of similarity, which is particularly important during the retrieving process. The effectiveness of a similarity measurement is determined by the usefulness of a retrieved case in solving a new problem.

The aim of using the CBR approach is to select the past phishing detection cases that are most similar to the new problem. A set of similar cases is selected from the case base according to a similarity criterion that requires the specification of weights corresponding to the attributes. The assessment of case similarity involves comparing the attribute values of the new case with those of the past cases stored in the case base. The retrieved old cases are ranked according to their similarity scores with respect to the attributes of the new case. In this work, the nearest neighbor method is applied to calculate the similarity score and the total similarity score of a potentially useful case.
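A common form of the weighted nearest neighbor score is the weighted fraction of matching attributes, sim(new, old) = Σ w_i · match_i / Σ w_i. The Python sketch below assumes this form, exact-match comparison of attribute values, and hypothetical weights; the exact similarity function and weights used in this work are not specified in this section.

```python
def weighted_similarity(new_case: dict, old_case: dict, weights: dict) -> float:
    """Weighted share of attributes on which the two cases agree (0.0 to 1.0)."""
    total = sum(weights.values())
    matched = sum(
        w
        for attr, w in weights.items()
        if attr in new_case and attr in old_case
        and new_case[attr] == old_case[attr]
    )
    return matched / total if total else 0.0


def rank_cases(new_case: dict, case_base: list, weights: dict) -> list:
    """Return (score, case) pairs ordered from most to least similar."""
    scored = [(weighted_similarity(new_case, c, weights), c) for c in case_base]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For example, with weights `{"permissions": 2, "urls": 1, "intents": 1}`, a stored case that agrees with the new case on `permissions` and `intents` but not on `urls` scores (2 + 1) / 4 = 0.75.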

3.4. Adaptive Classification System Design. The main objective of case-based adaptive classification is to assign a suitable classification technique to the target case (a feature set extracted from an Android application) by identifying and analysing the similar training cases (sets of features that are stored in the case base). The proposed case-based adaptive classification is shown in Figure 5. If the feature set extracted from the active Android application does not match any set of features stored in the case base (which means that the extracted feature set is not complete for the case-retrieving process), the adaptive classification selects suitable methods to process the extracted feature set. There are two options for selecting suitable methods. First, the possible features are added to the extracted feature set in order to perform the case-retrieving process and to choose a suitable classifier. Second, multiple classifiers are selected to process the extracted incomplete feature set. Under the second option, the multiple answers produced by the multiple classifiers are collected in order to produce a final answer by means of a weighted sum of all the answers.
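The second option, combining the answers of several classifiers by a weighted sum, might look like the following Python sketch. The encoding of answers as 1 (phishing) or 0 (benign), the use of per-classifier weights such as past accuracy, and the 0.5 decision threshold are all assumptions made for the example.

```python
def weighted_vote(answers: dict, weights: dict, threshold: float = 0.5):
    """Combine binary classifier answers into one label.

    answers: classifier name -> 1 (phishing) or 0 (benign)
    weights: classifier name -> non-negative weight (assumed, e.g., past accuracy)
    Returns the final label and the normalized weighted score.
    """
    total = sum(weights[name] for name in answers)
    if total == 0:
        raise ValueError("at least one classifier needs a positive weight")
    score = sum(weights[name] * label for name, label in answers.items()) / total
    label = "phishing" if score >= threshold else "benign"
    return label, score
```

With answers `{"J48": 1, "NB": 0, "IBK": 1}` and weights `{"J48": 0.9, "NB": 0.6, "IBK": 0.5}`, the normalized score is 1.4 / 2.0 = 0.7, so the combined answer is "phishing".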

4. Detection Model and Evaluation

This section explains how our detection model performs adaptively on the combination of individual classifiers and ensemble classifiers. To verify that our proposed model can improve the accuracy of mobile phishing detection, an experiment is conducted using the feature sets described in Section 3.1. The experiment was conducted by running Weka 3.8 on a laptop computer with a Core i7 processor, 8 GB of RAM, and the Windows 8.1 64-bit operating system. The cross-validation method is used as an evaluation technique to estimate the error rate efficiently and in an unbiased way by running repeated percentage splits. First, the dataset is divided into 10 pieces. Each piece is used as a testing dataset in turn, while the remaining 9 pieces together are used as a training dataset. We performed 10 simulations (i.e., the experiments are repeated 10 times). Then, all these results are averaged into a single estimate. Six existing machine learning algorithms are chosen from different categories and used with the 10-fold cross-validation method to evaluate the variation of accuracy and efficiency.
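The repeated 10-fold procedure can be sketched in plain Python as follows. This only illustrates the splitting and averaging logic and does not reproduce Weka's implementation (for example, no stratification is done); the function names and the `evaluate` callback, which would train on one index list and score on the other, are hypothetical.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples: int, evaluate, k: int = 10, repeats: int = 10) -> float:
    """Average evaluate(train, test) over `repeats` runs of k-fold splitting."""
    scores = []
    for run in range(repeats):
        # A different seed per run gives a different shuffle of the samples.
        for train, test in k_fold_indices(n_samples, k, seed=run):
            scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

Each run trains on nine tenths of the samples and tests on the remaining tenth, so every sample is used for testing exactly once per run.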

4.1. Dataset. The features are extracted from more than 10,000 Android malware samples collected from Android malware repositories including VirusShare [41], AndroZoo [42], DroidScreening [43], and RevealDroid [44]. The extracted features comprise 76 features of Android components, 31 features of API counts, 82 features of API usage actions, 421 features of security-sensitive flows, 6 features of hardware components, 109 features of intents, 82 features of permissions, 190 features of malicious shell commands and strings, 19 content visual features, and 49 URL features. Thus, there are 1,065 features in total. The feature sets used in this experiment are summarized in Table 3.

4.2. Machine Learning Classifiers. To detect and classify phishing applications, different machine learning classification techniques are used with an adaptive method. An adaptive classification system is proposed to automatically choose a combination of suitable classifiers for the features extracted from an active Android application. Various machine learning techniques were used as classifiers in existing works [31, 32, 34, 35]. Among them, six algorithms were selected from different categories to cover the different natures of classification. The six algorithms are C4.5 (J48), decision table (DT), k-nearest neighbors (IBK), logistic regression (LR), naive Bayes (NB), and support vector machine (SVM). According to pretesting of the effect of parameters on these classifiers [45], the following settings are chosen for our experiment: the naive Bayes (NB) classifier with the supervised discretization function, the default maximum number of iterations for logistic regression (LR), a confidence factor of 0.5 for tree pruning in the J48 classifier, and a 1-nearest-neighbor (IBK) classifier. The SVM and decision table classifiers are used with their default parameters.

Table 2: Case description for mobile phishing detection system.

| No. | Name | Value |
|---|---|---|
| 1 | Case ID | Case identification number |
| 2 | Feature pattern | Combination of feature sets |
| 3 | Ensemble methods of classifiers (or) classification algorithm | Boosting/bagging/Bayesian (or) algorithm name and their specific parameters |
| 4 | Accuracy | Percentage of correct classification |
| 5 | Performance | Runtime (seconds) |

Journal of Computer Networks and Communications

4.3. Experimental Results and Analysis. The accuracy comparison of the six classifiers on the 10 feature sets is shown in Table 4. The italicized values in Table 4 represent the maximum detection accuracy among the six classifiers for each feature set. It can be seen that the accuracy of each classification algorithm depends on the features. IBK provides the best accuracy on 6 feature sets, and J48 provides the best accuracy on the other 4. Our work aims to detect mobile phishing in a feature-independent manner with various classifiers. To mimic real-world applications, random feature combinations are created, because a new Android application can consist of any combination of features. In this experiment, 5 random combinations of features are created, as shown in Table 5.

Figure 4: Case defining process (define a new case and store it in the case base). (Diagram; labelled elements: input feature pattern, ML algorithm 1 to ML algorithm n, result 1 to result n, take the maximum accuracy, final result, define and store a case in case base.)

Figure 5: Adaptive classification. (Diagram; labelled elements: target APK, feature extraction, decision making, add feature, retrieve and reuse, classifier 1 to classifier n, decision making, display result.)

Table 3: Feature sets.

| No. | Feature sets | Number of features | Example features |
|---|---|---|---|
| 1 | Android components | 76 | android.media, android.media.effect, android.media.audiofx, android.service.textservice, android.service.notification |
| 2 | API counts | 31 | account_information, account_settings, audio, bluetooth, bluetooth_information |
| 3 | API usage actions | 82 | android.util, android.widget, android.renderscript, android.webkit, android.os, android.os.storage, android.content |
| 4 | Security-sensitive flows | 421 | system_settings____audio, system_settings____phone_connection, system_settings____voip, system_settings____database_information |
| 5 | Hardware components | 6 | android.hardware.display, android.hardware, android.hardware.usb, android.hardware.location, android.hardware.input |
| 6 | Intent_action | 109 | action_main, action_view, action_default, action_attach_data, action_edit, action_insert_or_edit |
| 7 | Permission | 82 | android.permission.access_cache_filesystem, android.permission.access_checkin_properties, android.permission.access_coarse_location, android.permission.access_gps |
| 8 | Shell_command_strings | 190 | runtimeexec, createSubprocess, cipher-classes, longstring, SecretKey, methodinvoke, small_code_size |
| 9 | Content_visual | 19 | HostnameLength, PathLength, QueryLength, DoubleSlashInPath, NumSensitiveWords, EmbeddedBrandName, PctExtHyperlinks |
| 10 | URLs | 49 | having_ip_address, url_length, shortining_service, having_at_symbol, double_slash_redirecting, prefix_suffix |
| | Total | 1,065 | |

These 5 feature combination patterns are tested with the six individual classifiers and three ensemble models to develop the cases for our adaptive model. Each ensemble combines the six classifiers and differs in how it produces the final answer: the average of probabilities (AVG), majority voting (MAJ), or the maximum probability (MAX). The detection results for the 5 scenarios of random feature combinations with the six base classifiers and three ensemble classifiers are given in Table 6. The italicized values in Table 6 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases.
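The three final-answer rules can be made concrete with a small Python sketch. It assumes each base classifier returns a probability row `[p_benign, p_phishing]`, and it breaks ties in favour of the lower class index; both choices are assumptions of this illustration rather than details given in the text.

```python
def average_of_probabilities(probs):
    """Pick the class with the highest mean probability across classifiers."""
    n_classes = len(probs[0])
    means = [sum(row[c] for row in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: means[c])


def majority_voting(probs):
    """Let each classifier vote for its most probable class; pick the most voted class."""
    n_classes = len(probs[0])
    votes = [max(range(n_classes), key=lambda c: row[c]) for row in probs]
    return max(range(n_classes), key=votes.count)


def maximum_probability(probs):
    """Pick the class that receives the single highest probability from any classifier."""
    n_classes = len(probs[0])
    peaks = [max(row[c] for row in probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: peaks[c])
```

With the made-up rows `[[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]`, the average and maximum rules answer class 0 (benign) because one classifier is very confident, while majority voting answers class 1 (phishing) because two of the three classifiers lean that way. This shows why the three ensembles can reach different accuracies on the same feature pattern.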

According to the results shown in Table 6, some feature patterns are better suited to ensemble techniques, while others are better handled by individual classification techniques. It can be concluded that the accuracy of classification techniques in mobile phishing detection relies heavily on the input features.

The adaptive method used in our model chooses the most suitable classification approach for a set of input features. Based on the results presented in Table 6, we can develop cases to be stored in the case base for an adaptive choice of suitable classifiers. The tentative cases for building our case-based phishing detection model are shown in Table 7.
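Defining a case from the evaluation results amounts to keeping the method (or tied methods) with the maximum accuracy for a feature pattern. A minimal Python sketch follows; the dictionary layout and the function name are hypothetical, and the numbers used in the usage note are made up for illustration.

```python
def define_case(case_id, feature_pattern, results):
    """Build a case record from per-method results.

    `results` maps a method name to (accuracy_percent, runtime_seconds).
    All methods that tie for the maximum accuracy are kept, in name order.
    """
    best_accuracy = max(accuracy for accuracy, _ in results.values())
    best = sorted(name for name, (accuracy, _) in results.items()
                  if accuracy == best_accuracy)
    return {
        "case_id": case_id,
        "feature_pattern": feature_pattern,
        "methods": best,
        "accuracy": best_accuracy,
        "runtime": [results[name][1] for name in best],
    }
```

For example, `define_case("01", "Pattern 1", {"J48": (95.0, 4.0), "AVG": (95.0, 30.0), "NB": (89.0, 0.6)})` stores both `AVG` and `J48` with their runtimes, which leaves room to prefer the faster method when two approaches are equally accurate.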

Performing the classification process on such a large number of features takes a long time. The runtimes needed to build the detection model with the 6 base classifiers and 3 ensemble approaches before feature selection are compared in Table 8.

To reduce the detection time, some features may be omitted because they may not have a high impact on the result. Therefore, some experiments are conducted to select a set of effective features in order to reduce the number of required features.

4.4. Selecting the Features. Feature selection is necessary to reduce the dimension of the feature space. To obtain the benefits of feature selection on a large dataset, such as reducing overfitting, improving accuracy, and reducing processing time, two feature selection techniques are applied in this experiment, and their results are compared to obtain the optimized results. The process of selecting the features can be described by the following steps.

Let $U$ be the universe of feature sets, $U = \{D_1, D_2, \ldots, D_v\}$. Let a dataset $D_i \in U$ with $v$ attributes be $D_i = \{A_1, A_2, \ldots, A_v\}$. The attributes can then be grouped into a feature group $FG_i = \{A_a, A_b, \ldots, A_n\}$. Each attribute is evaluated, and attributes are selected according to their worth, giving a selected feature set $FS_i = \{A_a, A_b, \ldots, A_m\}$, where $FS_i \subseteq FG_i$.

Two feature selection techniques are used in this experiment to confirm the advantages of feature selection in phishing detection. The first method is correlation-based feature selection with a ranker search method, which evaluates each attribute and lists the results in ranked order. The worth of each attribute is evaluated by measuring the (Pearson's) correlation between it and the class [46].
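The ranking step can be sketched in plain Python. The sketch computes Pearson's correlation between each attribute column and a numeric class label and sorts the attributes by the absolute value of that correlation. Ranking by absolute value and scoring constant columns as zero are assumptions of this illustration, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information (assumed convention)
    return cov / math.sqrt(var_x * var_y)


def rank_attributes(columns, labels):
    """Return (attribute name, |correlation with the class|) pairs, best first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Attributes at the bottom of the returned list are the candidates for elimination.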

Pearson's correlation coefficient is described in equation (1), where all variables have been standardized. The correlation between a composite and the class label is a function of the number of component variables (attributes) in the composite and the magnitude of the intercorrelations among them, together with the magnitude of the correlations between the attributes and the class label.

If the correlation between each attribute in a test and the class label is known, and the intercorrelation between each pair of attributes is given, then the correlation between a composite test consisting of the summed attributes and the class label can be predicted from the following equation:

$$r_{zc} = \frac{k\,\overline{r_{zi}}}{\sqrt{k + k(k-1)\,\overline{r_{ii}}}} \qquad (1)$$

where $r_{zc}$ is the correlation between the summed attributes and the class label, $k$ is the number of attributes, $\overline{r_{zi}}$ is the average of the correlations between the attributes and the class label, and $\overline{r_{ii}}$ is the average intercorrelation between attributes.
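Equation (1) is easy to explore numerically. The following Python function is a direct transcription of it; the function and parameter names are chosen here for illustration.

```python
import math


def composite_correlation(k, mean_attr_class_corr, mean_inter_corr):
    """Equation (1): correlation between k summed attributes and the class label."""
    return (k * mean_attr_class_corr) / math.sqrt(k + k * (k - 1) * mean_inter_corr)
```

With a single attribute (`k = 1`), the result is just that attribute's correlation with the class. With `k = 4` attributes that each correlate 0.5 with the class, the composite reaches 1.0 when the attributes are mutually uncorrelated (`mean_inter_corr = 0`) but stays at 0.5 when they are fully redundant (`mean_inter_corr = 1`). This is the intuition behind preferring attributes that correlate with the class but not with each other.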

We obtain the attributes ranked by their correlation with the class. Attributes with zero or low class correlation are eliminated. The resulting reduced feature sets are shown in Table 9.

Table 4: Accuracy comparison of classifiers on 10 feature sets.

| No. | Feature sets | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) |
|---|---|---|---|---|---|---|---|
| 1 | Android components | 93.23 | 89.02 | 93.40 | 90.16 | 84.67 | 87.95 |
| 2 | API count | 95.85 | 93.02 | 95.66 | 91.90 | 89.20 | 85.25 |
| 3 | APIusage_actions | 95.20 | 91.86 | 95.32 | 91.97 | 89.02 | 91.24 |
| 4 | Flow | 93.05 | 91.03 | 93.32 | 87.18 | 87.45 | 83.17 |
| 5 | Hardware components | 89.00 | 89.06 | 89.12 | 89.06 | 89.02 | 89.06 |
| 6 | Intent_action | 86.89 | 85.73 | 87.13 | 84.64 | 83.75 | 85.53 |
| 7 | Permission | 94.30 | 91.92 | 94.65 | 93.95 | 88.54 | 94.14 |
| 8 | Shell_command_strings | 75.40 | 71.18 | 74.08 | 70.28 | 68.74 | 70.22 |
| 9 | Content_visual | 97.20 | 95.79 | 95.53 | 94.49 | 95.77 | 93.87 |
| 10 | URLs | 96.03 | 93.24 | 97.18 | 93.99 | 92.98 | 93.80 |

The second method is information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated using the following equations. In the attribute evaluation process, the index $I$ measures the impurity of $D$, a data partition or set of training tuples, and is calculated using

$$I(D) = 1 - \sum_{i=1}^{m} p_i^{2} \qquad (2)$$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$. The sum is computed over $m$ classes. The index $I$ considers a binary split for each attribute. First, consider the case where $A$ is a discrete-valued attribute having $v$ distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in $D$. The expected information provided by that split is calculated by

$$I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j) \qquad (3)$$

In this equation, $D_j$ represents the observations that contain the $j$th value of attribute $A$. The information gain of a binary split on attribute $A$ is calculated by

$$\mathrm{Gain}(A) = I(D) - I_A(D) \qquad (4)$$

The information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation of the split information is

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right) \qquad (5)$$

This value represents the potential information generated by splitting the training dataset $D$ into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$. The gain ratio is defined as

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)} \qquad (6)$$

The attribute with the maximum gain ratio is selected as the highest-ranked attribute. The low-ranked attributes with a gain ratio of less than 0.0003 are eliminated. After the two feature selection techniques are applied to the dataset, the reduced feature sets are generated as shown in Table 9.
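Equations (2) to (6) can be followed step by step in a short Python sketch. It uses the impurity index of equation (2) exactly as written in this section and treats every attribute as discrete. The function names, the zero score returned for an attribute with a single value, and passing the elimination cut-off as a parameter are assumptions of this illustration.

```python
import math
from collections import Counter


def impurity(labels):
    """Equation (2): one minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gain_ratio(values, labels):
    """Equations (3) to (6) for one discrete attribute."""
    n = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    fractions = [len(part) / n for part in partitions.values()]
    # Equation (3): impurity expected after splitting on the attribute.
    expected = sum(f * impurity(part) for f, part in zip(fractions, partitions.values()))
    gain = impurity(labels) - expected                      # equation (4)
    split_info = -sum(f * math.log2(f) for f in fractions)  # equation (5)
    # A single-valued attribute has zero split information; score it as 0 (assumed).
    return gain / split_info if split_info > 0 else 0.0     # equation (6)


def select_attributes(columns, labels, threshold):
    """Keep the attributes whose gain ratio is not below the cut-off."""
    return [name for name, values in columns.items()
            if gain_ratio(values, labels) >= threshold]
```

On a toy dataset with labels `[0, 0, 1, 1]`, an attribute with values `["a", "a", "b", "b"]` separates the classes perfectly and scores 0.5, whereas `["a", "b", "a", "b"]` leaves both partitions as mixed as the original data and scores 0, so it would be eliminated by any positive cut-off.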

The same detection experiments are conducted with the 9 classification approaches on each selected feature set. The detection results of the 5 cases on the selected feature sets are given in Tables 10 and 11. In this experiment, the 9 classification approaches and their parameters are set up in the same way as in the previous experiments (described in Section 4.2).

According to the results for the datasets reduced with the correlation attribute evaluation method, shown in Table 10, the classification approaches with the best detection accuracy change slightly in 2 cases (feature patterns 3 and 4). Feature pattern 3 is a combination of API count, API usage, intent, and hardware features. The italicized values in Table 10 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. For feature pattern 3, the highest detection accuracy is now provided by the ensembles with the AVG and MAJ final answer methods, whereas it is provided by the ensemble with the AVG final answer method when the full feature set is used. The detection accuracy is slightly increased for most classifiers in feature pattern 4, which is a combination of flow and intent features.

According to the results shown in Table 11 for the datasets reduced with the information gain attribute evaluation method, the detection accuracy is increased in 4 cases (feature patterns 1, 3, 4, and 5). The italicized values in Table 11 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. Moreover, the classification approaches that produce the best detection accuracy change in 3 cases (feature patterns 3, 4, and 5). That is, an ensemble with the AVG final answer method provides the best accuracy for feature patterns 3, 4, and 5.

Table 5: Scenarios for random combinations of features.

| Case ID | Feature pattern | Combination of feature sets | Number of features |
|---|---|---|---|
| 01 | Pattern 1 | API count + API usage + hardware | 112 |
| 02 | Pattern 2 | API count + intent | 139 |
| 03 | Pattern 3 | API count + API usage + intent + hardware | 220 |
| 04 | Pattern 4 | Flow + intent | 529 |
| 05 | Pattern 5 | Flow + intent + API usage + hardware | 610 |

Table 6: Detection accuracy of 5 scenarios on randomly combined feature patterns.

| Case ID | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) | AVG (%) | MAJ (%) | MAX (%) |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.93 | 93.07 | 95.45 | 92.47 | 89.42 | 91.62 | 95.31 | 95.31 | 92.87 |
| 02 | 94.72 | 91.62 | 94.04 | 90.18 | 86.44 | 89.27 | 94.26 | 94.20 | 91.38 |
| 03 | 96.32 | 92.67 | 95.60 | 94.89 | 90.69 | 92.57 | 96.43 | 96.41 | 94.31 |
| 04 | 90.56 | 86.38 | 90.45 | 88.51 | 81.55 | 87.88 | 90.64 | 90.64 | 88.52 |
| 05 | 95.33 | 89.69 | 94.37 | 93.97 | 92.28 | 91.61 | 95.68 | 95.69 | 92.68 |

Table 7: Tentative cases for mobile phishing detection system.

| Case ID | Feature pattern | Adaptive method | Accuracy (%) | Run time (seconds) |
|---|---|---|---|---|
| 1 | Pattern 1 | J48 | 95.93 | 4.43 |
| 2 | Pattern 2 | J48 | 94.72 | 4.54 |
| 3 | Pattern 3 | AVG | 96.43 | 95.18 |
| 4 | Pattern 4 | AVG, MAJ | 90.64 | 174.4 & 174.6 |
| 5 | Pattern 5 | MAJ | 95.69 | 205.50 |

The detection accuracy percentages of the 5 cases using different algorithms are compared in Figure 6, which presents the detection results from Tables 6, 10, and 11. Each case is shown in 3 situations: no feature selection, after correlation attribute evaluation feature selection, and after information gain attribute evaluation feature selection. There are 15 points in the figure, representing the 5 cases under the 3 conditions. The best classifier for case 01 and case 02 is J48, while the AVG ensemble classifier is the best for case 03, case 04, and case 05. The cases with the best algorithm are used in the case-based reasoning detection method.

To highlight the performance of the feature selection techniques, the runtime results on the reduced feature sets are collected in Tables 12 and 13. The information gain attribute evaluation method results in a larger number of features than the correlation attribute evaluation method. The runtime with the information gain attribute evaluation method is also slightly larger than that with the correlation attribute evaluation method.

The runtimes for the 5 cases with and without feature selection are shown in Figure 7, which compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure, representing the 5 cases under the 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied to our feature sets to improve the accuracy and efficiency of our model. The detection accuracy on 4 feature patterns is improved, as shown in Table 11, while the detection runtime on all feature patterns is improved, as shown in Table 13. Table 14 compares the accuracy and efficiency of our proposed adaptive model on the full feature sets and the reduced feature sets. The italicized values in Table 14 are the accuracy values obtained with a reduced feature set that improve on their counterparts obtained with the full feature set.

The phishing malware detection task is an imbalanced classification problem. That is, there are two classes to be identified, phishing and benign, with one category representing the overwhelming majority of the data points. In this case, the positive class "phishing" is greatly outnumbered by the negative class. Problems of this type are a fairly common case in data science in which accuracy is not a good measure for assessing model performance. Intuitively, labelling all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.

Table 8: Runtime comparison of 5 scenarios on 9 classification approaches (in seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 4.43 | 18.63 | 0.01 | 1.93 | 0.64 | 3.01 | 29.87 | 29.52 | 26.16 |
| 02 | 4.54 | 31.80 | 0.0001 | 2.57 | 0.66 | 39.24 | 76.20 | 76.35 | 75.90 |
| 03 | 9.44 | 58.80 | 0.01 | 7.22 | 1.14 | 18.94 | 95.18 | 96.09 | 97.41 |
| 04 | 12.09 | 148.32 | 0.0001 | 5.28 | 1.39 | 6.25 | 174.4 | 174.6 | 174.61 |
| 05 | 17.09 | 167.14 | 0.01 | 7.86 | 1.93 | 3.61 | 203.62 | 205.50 | 203.51 |

Table 9: Information of selected feature sets for 5 cases.

| Case ID | Feature combination pattern | Features before feature selection | Features selected by Pearson's correlation | Features selected by information gain |
|---|---|---|---|---|
| 01 | Pattern 1 | 112 | 96 | 100 |
| 02 | Pattern 2 | 139 | 114 | 120 |
| 03 | Pattern 3 | 220 | 180 | 185 |
| 04 | Pattern 4 | 529 | 164 | 265 |
| 05 | Pattern 5 | 610 | 227 | 250 |

Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.87 | 93.10 | 95.45 | 92.47 | 89.42 | 91.61 | 95.31 | 95.32 | 92.82 |
| 02 | 94.68 | 91.53 | 94.04 | 90.18 | 86.44 | 89.28 | 94.24 | 94.18 | 91.33 |
| 03 | 96.37 | 92.70 | 95.60 | 94.90 | 90.69 | 92.57 | 96.38 | 96.38 | 94.37 |
| 04 | 90.73 | 86.51 | 90.45 | 88.51 | 81.55 | 87.89 | 90.73 | 90.72 | 88.64 |
| 05 | 95.38 | 89.54 | 94.37 | 93.96 | 92.28 | 91.61 | 95.68 | 95.69 | 92.72 |

Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.96 | 93.10 | 95.62 | 92.54 | 89.42 | 91.66 | 95.37 | 95.37 | 92.83 |
| 02 | 94.66 | 91.53 | 94.16 | 90.15 | 86.44 | 89.19 | 94.19 | 94.12 | 91.30 |
| 03 | 96.38 | 92.59 | 95.55 | 95.02 | 90.69 | 92.61 | 96.45 | 96.45 | 94.34 |
| 04 | 90.52 | 86.36 | 90.24 | 88.70 | 81.55 | 87.86 | 90.77 | 90.76 | 88.69 |
| 05 | 95.46 | 89.44 | 94.50 | 93.98 | 92.28 | 91.56 | 95.80 | 95.79 | 92.82 |

In order to assess the effectiveness of our proposed model, a confusion matrix evaluation is applied using accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances that our model predicts as positive that are actually positive. The following formulas give their definitions:

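$$\begin{aligned}
\mathrm{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN},\\
\mathrm{Precision} &= \frac{TP}{TP + FP},\\
\mathrm{Sensitivity} &= \frac{TP}{TP + FN}.
\end{aligned} \qquad (7)$$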
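The three measures in equation (7) can be computed directly from the four confusion matrix counts. The Python sketch below is an illustration; the function name and the convention of returning 0 when a denominator is empty are assumptions, and the counts in the usage note are made up.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and sensitivity as defined in equation (7)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Return 0.0 when nothing was predicted positive (assumed convention).
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Return 0.0 when there are no actual positives (assumed convention).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

On a made-up imbalanced sample of 90 benign and 10 phishing apps, a model that labels everything as benign gives `confusion_metrics(tp=0, fp=0, tn=90, fn=10)`: the accuracy is 0.9, but precision and sensitivity are both 0. This is why accuracy alone is not a sufficient measure for this task.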

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. (Plot; y-axis: accuracy (%), 80 to 100; x-axis: cases 01 to 05, each under no feature selection, correlation, and info gain; series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.)

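Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 3.94 | 18.36 | 0.01 | 1.97 | 0.64 | 3.38 | 27.45 | 26.92 | 27.18 |
| 02 | 3.84 | 25.35 | 0.0001 | 2.43 | 0.50 | 39.46 | 72.07 | 72.12 | 71.19 |
| 03 | 8.06 | 45.85 | 0.01 | 7.20 | 1.03 | 19.55 | 83.10 | 83.52 | 83.34 |
| 04 | 5.60 | 44.75 | 0.0001 | 5.15 | 0.56 | 6.27 | 61.95 | 61.99 | 62.00 |
| 05 | 8.84 | 69.88 | 0.0001 | 7.65 | 1.00 | 3.20 | 90.23 | 90.06 | 90.45 |

Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 4.05 | 20.25 | 0.01 | 1.63 | 0.54 | 3.02 | 29.02 | 27.49 | 27.27 |
| 02 | 3.86 | 29.43 | 0.001 | 2.35 | 0.56 | 31.96 | 67.63 | 66.52 | 66.40 |
| 03 | 9.77 | 55.46 | 0.01 | 6.31 | 0.95 | 17.06 | 87.31 | 90.63 | 90.69 |
| 04 | 6.83 | 87.36 | 0.01 | 2.25 | 0.93 | 6.76 | 102.86 | 93.21 | 93.15 |
| 05 | 8.42 | 104.52 | 0.001 | 5.37 | 1.21 | 3.95 | 111.80 | 107.53 | 108.07 |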


True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix shown in Figure 8.

The effectiveness of our proposed model in terms of accuracy, precision, and sensitivity is reported in Table 15. According to these results, our adaptive model achieves good detection accuracy on the phishing features, and all the selected classifiers achieve acceptable precision and sensitivity. According to the previous experiments, our adaptive phishing detection model using case-based reasoning performs well on diversely distributed features.

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. (Plot; y-axis: runtime (seconds), 0 to 200; x-axis: cases 01 to 05, each under no feature selection, correlation, and info gain; series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.)

Table 14: Accuracy and efficiency of proposed adaptive model.

| Case ID | Adaptive (before) | Adaptive (after) | Accuracy (before) | Accuracy (after) | Runtime (before) | Runtime (after) |
|---|---|---|---|---|---|---|
| 01 | J48 | J48 | 95.93 | 95.96 | 4.43 | 4.05 |
| 02 | J48 | J48 | 94.72 | 94.66 | 4.54 | 3.86 |
| 03 | AVG | AVG, MAJ | 96.43 | 96.45 | 95.18 | 87.31 & 90.63 |
| 04 | AVG, MAJ | AVG | 90.64 | 90.77 | 174.4 & 174.6 | 102.86 |
| 05 | MAJ | AVG | 95.69 | 95.80 | 205.50 | 111.80 |

Figure 8: Confusion matrix.

| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | TN | FP |
| Actual positive | FN | TP |

Table 15: Detection results achieved by the proposed model.

| Case | Classifier | Accuracy (%) | Precision (%) | Sensitivity (%) |
|---|---|---|---|---|
| 01 | J48 | 95.96 | 83 | 79 |
| 02 | J48 | 94.66 | 87 | 86 |
| 03 | AVG | 96.45 | 92 | 75 |
| 04 | AVG | 90.77 | 84 | 62 |
| 05 | AVG | 95.80 | 90 | 74 |

5. Conclusions

An adaptive mobile phishing detection model based on the variation of input feature patterns, using a case-based reasoning (CBR) technique, is proposed in this work. An experimental analysis is conducted to demonstrate the design decisions of our model and to verify the performance of the proposed model in handling the concept drift of mobile phishing attacks. The proposed model is evaluated with a large feature set that contains 1,065 features from 10 feature groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories to cover the different natures of classification across the diverse feature sets. Three ensembles of the six base classifiers are used, each of which uses a different final answer-finding method: average, majority voting, or maximum. In total, there are 9 classifiers. Because of the large number of features involved in the dataset and the use of multiple classifiers, efficiency degrades. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques are the information gain attribute evaluation method and Pearson's correlation coefficient attribute evaluation method. By addressing the optimal selection of a suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model can provide an accuracy improvement with an acceptable increase in runtime.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.

References

[1] W. Paul, H. A. Manolian, and S. Lapper, "Thinking digital in industry 4.0," Deloitte Insights, September 2018, https://www2.deloitte.com/insights/us/en/focus/industry-4-0/digital-leaders-in-manufacturing-fourth-industrial-revolution.html.

[2] "Spam and phishing in Q2 2018," Securelist-Kaspersky Lab's Cyberthreat Research and Reports, 2018.

[3] Proofpoint Security Awareness, "2019 state of the phish report," March 2019, https://www.wombatsecurity.com/state-of-the-phish.

[4] L. Wu, X. Du, and J. Wu, "Effective defense schemes for phishing attacks on mobile computing platforms," IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6678–6691, 2016.

[5] M. Moghimi and A. Y. Varjani, "New rule-based phishing detection method," Expert Systems with Applications, vol. 53, pp. 231–242, Jul 2016.

[6] Baunfire.com and SparkCMS, "APWG phishing attack trends report-4Q 2018," Anti-Phishing Working Group, March 2019, https://www.antiphishing.org/resources/apwg-reports.

[7] R. Basnet, S. Mukkamala, and A. H. Sung, "Detection of phishing attacks: a machine learning approach," in Soft Computing Applications in Industry, B. Prasad, Ed., pp. 373–383, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[8] A. K. Jain and B. B. Gupta, "Comparative analysis of features based machine learning approaches for phishing detection," in Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 2125–2130, New Delhi, India, March 2016.

[9] F. Toolan and J. Carthy, "Phishing detection using classifier ensembles," in Proceedings of the 2009 eCrime Researchers Summit, pp. 1–9, Tacoma, WA, USA, October 2009.

[10] H. S. Hota, A. K. Shrivas, and R. Hota, "An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique," Procedia Computer Science, vol. 132, pp. 900–907, 2018.

[11] A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles, ResearchGate, Berlin, Germany, 2019, https://www.researchgate.net/publication/325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles.

[12] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, "Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers," Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.

[13] A. Aleroud and L. Zhou, "Phishing environments, techniques, and countermeasures: a survey," Computers and Security, vol. 68, pp. 160–196, 2017.

[14] H. Shahriar, T. Klintic, and V. Clincy, "Mobile phishing attacks and mitigation techniques," Journal of Information Security, vol. 6, no. 3, pp. 206–212, 2015.

[15] T. M. Mahmoud and A. M. Mahfouz, "SMS spam filtering technique based on artificial immune system," International Journal of Computer Science Issues, vol. 9, no. 1, pp. 589–597, 2012.

[16] J. W. Yoon, H. Kim, and J. H. Huh, "Hybrid spam filtering for mobile communication," Computers and Security, vol. 29, no. 4, pp. 446–459, 2010.

[17] C. H. Hsu, P. Wang, and S. Pu, "Identify fixed-path phishing attack by STC," in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 172–175, Perth, Australia, September 2011.

[18] E. Medvet, E. Kirda, and C. Kruegel, "Visual-similarity-based phishing detection," in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, September 2008.

[19] A. P. Felt and D. Wagner, Phishing on Mobile Devices, University of California, Berkeley, CA, USA, 2011.

[20] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna, "What the app is that? Deception and countermeasures in the android user interface," in Proceedings of the 2015 IEEE Symposium on Security and Privacy, pp. 931–948, San Jose, CA, USA, May 2015.

[21] C. Marforio, R. J. Masti, C. Soriente, K. Kostiainen, and S. Capkun, "Personalized security indicators to detect application phishing attacks in mobile platforms," February 2015, http://arxiv.org/abs/1502.06824.

[22] D. Liu, E. Cuervo, V. Pistol, R. Scudellari, and L. P. Cox, "ScreenPass: secure password entry on touchscreen devices," in Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 291–304, Taipei, Taiwan, June 2013.

[23] D. Liu and L. P. Cox, "VeriUI: attested login for mobile devices," in Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, Santa Barbara, CA, USA, February 2014.

[24] L. Wu, X. Du, and J. Wu, "MobiFish: a lightweight anti-phishing scheme for mobile phones," in Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–8, Shanghai, China, August 2014.

[25] V. Mavroeidis and M. Nicho, "Quick response code secure: a cryptographically secure anti-phishing tool for QR code attacks," in Computer Network Security, pp. 313–324, 2017.

[26] "Phishing detective-apps on Google Play," March 2018, httpsplaygooglecomstoreappsdetailsidcomrsoftrandroidphishingdetectiveads.

[27] G. Bottazzi, E. Casalicchio, D. Cingolani, F. Marturana, and M. Piu, "MP-Shield: a framework for phishing detection in mobile devices," in Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1977–1983, Liverpool, UK, October 2015.

[28] M. M. Richter and R. O. Weber, Case-Based Reasoning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[29] S. Craw, N. Wiratunga, and R. C. Rowe, "Learning adaptation knowledge to improve case-based reasoning," Artificial Intelligence, vol. 170, no. 16-17, pp. 1175–1192, Nov 2006.

[30] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, "Case-based reasoning systems in the health sciences: a survey of recent trends and developments," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 4, pp. 421–434, Jul 2011.

[31] S. Arzt, "FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps," in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269, New York, NY, USA, June 2014.

[32] L. Li, A. Bartel, T. F. Bissyande et al., "IccTA: detecting inter-component privacy leaks in android apps," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 280–291, Piscataway, NJ, USA, May 2015.

[33] "Obfuscation-resilient, efficient, and accurate detection and family identification of android malware - Semantic Scholar," March 2018, httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23.

[34] W. Enck, "TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones," Transactions on Computer Systems, vol. 32, no. 2, 2014.

[35] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, "Information-flow analysis of android applications in DroidSafe," in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, February 2015.

[36] D. Arp, M. Spreitzenbarth, H. Gascon, and K. Rieck, "Drebin: effective and explainable detection of android malware in your pocket," in Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, February 2014.

[37] N. Peiravian and X. Zhu, "Machine learning for android malware detection using permission and API calls," in Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305, Herndon, VA, USA, November 2013.

[38] V. Avdiienko, K. Kuznetsov, A. Gorla et al., "Mining apps for abnormal usage of sensitive data," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 426–436, Florence, Italy, May 2015.

[39] H. V. Nath and B. M. Mehtre, "Static malware analysis using machine learning methods," in Recent Trends in Computer Networks and Distributed Systems Security, pp. 440–450, 2014.

[40] N. Abura'ed, H. Otrok, R. Mizouni, and J. Bentahar, "Mobile phishing attack for Android platform," in Proceedings of the 2014 10th International Conference on Innovations in Information Technology (IIT), pp. 18–23, Abu Dhabi, UAE, November 2014.

[41] VirusShare.com, https://virusshare.com.

[42] K. Allix, T. F. Bissyande, J. Klein, and Y. Le Traon, "Androzoo: collecting millions of android apps for the research community," in Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471, Austin, TX, USA, May 2016.

[43] J. Yu, Q. Huang, and C. Yian, "DroidScreening: a practical framework for real-world Android malware analysis," Security and Communication Networks, vol. 9, no. 11, pp. 1435–1449.

[44] Joshuaga/Revealdroid - Bitbucket, https://bitbucket.org/joshuaga/revealdroid/src/master.

[45] S. Kyaw Zaw and S. Vasupongayya, "Revealing the important features of mobile phishing," in Proceedings of the 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), pp. 222–226, Pattaya, Thailand, November 2018.

[46] M. A. Hall and L. A. Smith, "Feature subset selection: a correlation based filter approach," Progress in Connectionist-based Information Systems, vol. 2, pp. 855–858, 1997.

14 Journal of Computer Networks and Communications


of input features pattern will improve the key quality criteria of phishing detection accuracy and efficiency.

The main objective of this work is to create a mobile phishing detection system that uses a case-based reasoning approach to adapt classifiers automatically according to the incoming feature patterns. By using case-based reasoning to select the most suitable classifier for the incoming features, the proposed mobile phishing detection system could provide the best performance by appropriately combining the strengths of all the methods used. An adaptive phishing detection system based on a case-based reasoning (CBR) technique, which can handle the concept drift challenge in phishing apps, is proposed in this work. CBR is applied to construct a phishing detection model. A knowledge base, or case base, controls the detection algorithm by utilizing phishing features as cases. Moreover, an experimental analysis is conducted to verify that our proposed case-based phishing detection is more suitable for handling the concept drift of mobile phishing attacks than existing detection approaches.

The rest of the paper is organized as follows. In Section 2, the background information on phishing attacks on smartphones is presented. The machine learning techniques for phishing detection and the background of case-based reasoning, which models the ensemble of classifier approaches as cases in a knowledge base, are also illustrated. Next, the overall architecture of the proposed adaptive phishing detection system and its detailed processes are described in Section 3. The accuracy and performance analysis of the proposed system is presented in Section 4. The conclusions are described in Section 5.

2. Theoretical Background

The background technologies are described in this section. The nature of phishing attacks on smartphones and their attack techniques are presented in Section 2.1. Section 2.2 presents the literature review on existing phishing detection solutions based on machine learning techniques and their frequently used features. Lastly, case-based reasoning classification techniques are explained in Section 2.3.

2.1. Phishing on Smartphone. Nowadays, phishers are motivated to target smartphones for several reasons. A smartphone today is as powerful as a desktop or laptop computer. Smartphones usually contain a lot of sensitive information about their owners. The increasing use of smartphones makes them more interesting for attackers. Phishing attack techniques fall into two categories: application-oriented phishing attacks and website-oriented phishing attacks.

The application-oriented phishing attacks can be categorized into two types based on their launching methods. First, the phishing application attempts to hijack existing legitimate applications (task interception) and continuously performs task polling. The phishing application launches itself as soon as it detects the launch of the target application. These task interception attacks are based mainly on fake graphical user interface (GUI) techniques, which make impersonation easy and detection hard, since a large touch screen is used as the primary user interface on most smartphones. As a result, the fake login interface is layered over the top of the real one, and the phishing app appears to be the target app. Second, the phishing application (a repackaged application) can directly present itself as the targeted legitimate app. This event may occur when the user downloads fake applications from an unofficial app market.

The website-oriented phishing attacks can also be categorized into two types based on their techniques. First, a phishing website hides (spoofs) the URL bar of the targeted website. Second, a phishing website attempts to overlay the genuine website with a crafted pop-up window. URL spoofing is the process of creating a fake or forged URL which impersonates a legitimate and secure website. This kind of URL spoofing attack is harmful and dangerous because the website looks exactly like the original one [13]. The fake website asks the user to enter his/her username, password, credit card number, or other information. For a legitimate mobile app that includes an embedded web page served over HTTP, or a legitimate mobile app that allows an overlaying pop-up window, a network attacker can change the login button on the page or substitute a crafted pop-up window so that there is a link to a page owned by the attacker. When the user clicks the button, the user is taken to the phishing page within the embedded web frame. This way, the attacker can steal the user credentials. The attacker can then relay the credentials to the valid website in order to mimic the normal workflow.

Existing solutions in phishing detection show an acceptable accuracy in their specific domain using their targeted features and their specified machine learning techniques. Thus, an effective phishing detection approach that is less dependent on the feature pattern is still needed. This work aims to propose an adaptive phishing detection approach by combining many existing techniques.

2.2. Current Phishing Detection Solutions. Existing phishing countermeasures use techniques such as content filtering, visual matching, and blacklist or whitelist matching [14]. A content filtering system examines the content of webpages for suspected URLs. Content filtering can be achieved by identifying statistical differences between legitimate and suspected phishing contents or by constructing a set of rules [15, 16]. Visual matching computes a visual similarity between the phishing and the legitimate pages based on the images, blocks, and layout [17, 18]. In a blacklisting system, known phishing URLs are listed based on a human verification method. This approach results in very low false positive rates. In a whitelisting system, users specify the links of trusted sites and frequently accessed websites. By contrast, other new websites are suspected of being phishing attacks [15].

Possible phishing attacks on mobiles, which can be launched during control transfers, are discussed in [19]. An indicator of the application's identity, placed on the system navigation bar to show the currently running application or the current web page, was implemented in [20]. Personalized security indicators for mobile apps are proposed in [21]. However, a user-driven decision-making process is still needed.

A unified and trusted login user interface is used in another group of antiphishing techniques. A software keyboard which can be used safely for login input is provided in [22]. For handling credentials, hardware and software certificates that are used to confirm the login are proposed in [23]. However, these approaches require some modifications to the client application and some user effort. An antiphishing system for mobile platforms was presented in [24]. The work was continued in [4] to detect persistent account registry phishing attacks. They used an OCR technique, and their database needs to save every snapshot of the protected applications and webpages. The use of QR codes in phishing attacks was demonstrated and analyzed in [25]. They combined a client-server architecture with a digital signature to perform integrity checking and authentication. However, the work focused only on QR code phishing attacks, while phishing malware was not considered. Phishing Detective [26] was created to identify whether or not a link in the user's e-mail might send the user to a phishing page. However, because the work relied entirely on the blacklisted URLs of the PhishTank database, it might not be able to handle other types of phishing attacks such as activity hijacking and repackaging attacks.

MP-Shield [27] is an Android application that aims to inspect the flow of IP packets between the origin and the destination of mobile user applications. Their work mainly emphasized monitoring URLs for detection purposes. The types of phishing attack that can be mounted on mobile devices were identified in [19]. The authors conducted an analysis of the ways in which mobile applications and websites link to each other. The common control transfers on mobile devices and how phishing attacks can be mounted against these control transfer scenarios were discussed. The authors presented possible types of phishing attacks along with their legitimate behaviors, as summarized in Table 1.

In Table 1, the mobile sender is a mobile application that sends the user to a website or another mobile application, while the web sender is a website that sends the user to a mobile application or another website.

Our work covers these attack models with ten groups of selected feature categories. Each phishing detection approach showed an acceptable detection accuracy while using different features. Unfortunately, the majority of phishing detection approaches may lack the features needed for efficient detection of phishing malware. An optimized solution that uses different kinds of features of Android applications to prevent phishing and malware on Android smartphones is still needed. Our work contributes to finding an optimal solution for mobile phishing detection in the sense of using the features independently with various classifiers.

2.3. Case-Based Reasoning. Case-based reasoning (CBR) is a problem-solving approach that solves new problems by adapting or reusing old solutions that were used to solve similar problems [28]. Past experiences or previous problems are saved as cases, and each case contains the representative features (characteristics) of the problem and its solution. The case base is a collection of these cases. This knowledge base of problem-solving experience is used for solving new problems [29]. The solutions in the retrieved cases are reused as a proposed solution to the new problem. Thus, the solution to the new problem can be found from similar known solutions in the past.

If the new problem situation is exactly the same as a previous case, then the reuse is simple. CBR systems start their reasoning from knowledge units called cases, while data-mining systems most often start from raw data. CBR systems also belong to the instance-based learning systems in the field of machine learning, which are defined as systems that are capable of automatically improving their performance over time. As long as CBR systems learn new cases in the retain step, they qualify as learning systems and thus belong to machine learning systems [30]. The learning process of a case-based reasoning approach is shown in Figure 1.

A case-based reasoning system performs the learning process as follows:

(1) Retrieving the most similar case or cases from the case base for the new problem

(2) Reusing the previous solutions of the similar cases to solve the new problem

(3) Revising the proposed solution (if necessary)

(4) Retaining the solution of the new case for future problem solving

A new problem is represented as a case and is compared with existing cases in the case base. The most similar case or cases are retrieved based on the similarity comparison of case representations. These retrieved cases are adapted (i.e., combined and reused) to propose a solution for the new problem. The suggested solution may need to be evaluated and corrected (i.e., revised) if it is not the best solution. This verified solution can be added back to the case base as a new case (i.e., retained) or as amendments to existing cases, to be used in future problem solving [28].
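The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class names `Case` and `CaseBase`, the encoding of a problem as a set of feature names, and the use of Jaccard overlap as the similarity measure are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """A stored experience: a problem description and the solution that worked."""
    problem: frozenset  # hypothetical encoding: the set of feature names present
    solution: str       # e.g. the name of the classifier that worked best


@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    def retrieve(self, problem):
        """Step 1: return the stored case most similar to the new problem."""
        def jaccard(case):
            # Assumed similarity measure: overlap of the two feature-name sets.
            union = case.problem | problem
            return len(case.problem & problem) / len(union) if union else 1.0
        return max(self.cases, key=jaccard, default=None)

    def solve(self, problem, revise=None):
        """Run retrieve -> reuse -> revise -> retain for one new problem."""
        nearest = self.retrieve(problem)
        proposed = nearest.solution if nearest else None           # Step 2: reuse
        final = revise(problem, proposed) if revise else proposed   # Step 3: revise
        if final is not None:
            self.cases.append(Case(frozenset(problem), final))      # Step 4: retain
        return final
```

The optional `revise` callback stands in for whatever evaluation corrects a poor suggestion; when it is omitted, the reused solution is retained unchanged.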

3. Architecture Overview

A case-based reasoning model is proposed for the automatic adaptation of classifiers in mobile phishing detection. The design of the case-based adaptive classification system is presented in this section. The proposed system consists of two main parts: the application on Android smartphones and the detection system in the cloud environment. Figure 2 shows the overall system design.

As shown in Figure 2, the features are extracted from the Android application for the phishing detection process. The detailed information on the features is discussed in Section 3.1. Then, the extracted features are sent to the cloud environment for the phishing detection processes. As the main objective of this work is to enhance the phishing detection processes, the detection is performed on the static and dynamic features from an Android malware dataset (described in Section 4.1). The detailed process of feature extraction is outside the scope of this paper.

The contribution of our work starts when the detection system receives the extracted features. The first process is to retrieve the most similar case from the case base (which stores previous Android phishing detection approaches along with the corresponding features). The case-retrieving process is described in Section 3.3. The case base must be set up before the case-retrieving process. The case base setup process is shown in Figure 3. The details of the case base setup process are presented in the following sections.

According to the retrieved case, the most suitable classification techniques are used for the adaptive classification. If the feature set extracted from the Android application does not match the sets of features stored in the case base, the adaptive classification selects suitable methods to process the extracted feature set according to the similarity ratio score. Selecting suitable methods means choosing multiple classifiers for the extracted feature set. Finally, the result for the active Android application is sent to the application on the Android smartphone to be displayed to the user.

3.1. Feature for Mobile Phishing. Existing antiphishing solutions for mobile environments were collected, and the features they use to identify a phishing attack were extracted. In an Android environment, the features can be extracted from miscellaneous sources such as program entities and the program outputs of runtime monitoring. The features frequently used by existing antiphishing solutions can be classified into ten classes: Android components, Android API counts, API usage actions, security-sensitive data flows, hardware components, intent actions, permissions, shell commands and strings, contents and visual, and URLs. The details of each feature class are given below.

Figure 1: Case-based reasoning approach.

Figure 2: Overall system design.

Table 1: Legitimate behaviors and their respective phishing attack techniques.

Sender | Legitimate behavior | Respective attack techniques
Mobile sender | Social sharing, upgrades, game credits, opening a target in the browser, send user to embedded http page in browser that links to https login | Fake mobile login screen, task interception, scheme squatting, keylogging, URL bar hiding/spoofing, fake browser using active network attack plus URL bar spoofing
Web sender | Link to mobile e-mail or Twitter, payment via PayPal or Google checkout, and user follows link from http to https | Website spoofs mobile app, task interception, scheme squatting, URL bar hiding/spoofing, active network attack plus URL bar spoofing

(1) Android components: a variety of component types with specific functionalities (e.g., components for providing GUIs and others for running background services) are declared within an Android app's manifest. These features are collected in [31–33].

(2) API counts: the number of invocations of a specific Android API method (e.g., a malicious app accesses the location APIs twice and the telephony package 8 times). These features are collected in [4, 24, 27, 32].

(3) API usage actions: APIs can be used to develop applications on the Android platform and can also be misused for malicious purposes. There are many ways to submit web requests and to exfiltrate captured data via the API without the Internet permission. Some existing phishing detection works [27, 31, 32, 34] collect API calls (e.g., API calls to access sensitive data, API calls to access network communications, API calls to send and receive SMS messages, API calls to execute external commands, and API calls frequently used for obfuscation).

(4) Security-sensitive data flows: a few approaches for Android malware detection [31, 34, 35] use data flows between security-sensitive Android interfaces to determine if an app is malicious. Tracking this form of information is particularly useful for identifying privacy leaks.

(5) Hardware components: the hardware components used in the app are listed in AndroidManifest.xml (e.g., to access the camera, an app needs to include the android.hardware.camera feature). These features are collected in [4, 36].

(6) Intent actions: Android malware is known to rely on tracking an Intent (e.g., whether a package is installed or whether a device has recently completed booting) to determine when to perform a malicious behavior. These features are used in [32, 36].

(7) Permissions: specific permissions provided by Android to execute some risky operations are acquired by Android malware. These features are collected in [34, 37, 38].

(8) Shell commands and strings: strings of interest associated with malicious behaviors and potentially risky shell commands are collected in [36, 39]. Some structural attributes of the APK file, such as the size of the code, the presence of zip files or binary files, and related information, are also included in this feature group.

(9) Contents and visual: the main display channel for phishing deception is the web content, which expresses the intention of the website. These features consist of page elements such as the page title, the submitted form, and the contained links. Some researchers also extract the logo, icon, and contained pictures from the web page and use an image recognition algorithm to identify the phishing website [16–18].

(10) URLs: web link features for phishing fraud are collected based on five criteria: URL and Domain Identity, Security and Encryption, Source Code and JavaScript, Page Style and Contents, and Web Address Bar. These features are collected in [4, 13, 40].

3.2. Case Representation. A case represents an experience at an operational level. Typically, a case includes the problem specification, the solution, and sometimes the outcome. This is the most common representation used. However, more elaborate case representations can be employed. Depending on the information included in a case, different types of results can be achieved from the system. Cases that describe a problem and its solution can be used to derive solutions to new problems.

In general, a case specification is described as a set of features. The features are those aspects of the domain and the problem that are considered to be most significant in determining the solution and/or outcome. Here, a case should represent the features of the application that are used to identify a phishing attack.

In our model, a case includes the combination of feature sets, the ensemble method of classifiers or the individual classification algorithm with its specific parameters, the accuracy and performance of the solution, and potential facilitations. A case description stored in the phishing detection system is shown in Table 2.
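One way to picture such a case is as a small record type. The sketch below is an assumption about how the fields listed above could be held in memory; the class name `PhishingCase`, the field names, and the example values are hypothetical and are not taken from the paper's results.

```python
from dataclasses import dataclass, field


@dataclass
class PhishingCase:
    """Hypothetical in-memory form of one case in the case base."""
    case_id: int                   # case identification number
    feature_pattern: frozenset     # names of the feature sets combined in this case
    method: str                    # ensemble method or single algorithm name
    parameters: dict = field(default_factory=dict)  # algorithm-specific settings
    accuracy: float = 0.0          # percentage of correct classifications
    runtime_seconds: float = 0.0   # performance, measured as runtime


# Illustrative values only; these numbers are invented for the example.
example_case = PhishingCase(
    case_id=1,
    feature_pattern=frozenset({"permissions", "urls"}),
    method="J48",
    parameters={"confidence_factor": 0.5},
    accuracy=90.0,
    runtime_seconds=1.5,
)
```

Keeping the feature pattern as a set makes the later similarity comparison between a new feature set and a stored case straightforward.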

Figure 3: Setting up case base.

To define a new case in the case base, the input features are passed through different machine learning classifiers, and the results from each classifier are evaluated to produce the final result. Then, the input features, the classifiers with their parameters, the activation function, and the final result are stored in the case base (knowledge base) as a new case. The process of defining a new case to be stored in the case base is shown in Figure 4.
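The case-defining step can be sketched as follows, assuming that the final result is the method with the maximum accuracy, as the caption text of Figure 4 suggests. The function name `define_case` and the shape of the `evaluations` dictionary are assumptions made for this illustration.

```python
def define_case(case_id, feature_pattern, evaluations):
    """Pick the best-scoring method for a feature pattern and return it as a case.

    `evaluations` is a hypothetical mapping from a method name to a dict
    holding its measured "accuracy" and, optionally, its "parameters".
    """
    # Take the method with the maximum accuracy among all evaluated methods.
    best_method = max(evaluations, key=lambda name: evaluations[name]["accuracy"])
    best = evaluations[best_method]
    return {
        "case_id": case_id,
        "feature_pattern": frozenset(feature_pattern),
        "method": best_method,
        "parameters": best.get("parameters", {}),
        "accuracy": best["accuracy"],
    }
```

In practice the accuracies would come from an evaluation run such as cross-validation; here they are simply passed in so that the selection logic stays visible.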

3.3. Case Retrieval. Case-based reasoning (CBR) solves a new problem by retrieving previously solved problems and their solutions from a knowledge source of cases called the case base. There are challenges related to the retrieving process that still need to be addressed. One issue is the computation of similarity, which is particularly important during the retrieving process. The effectiveness of a similarity measurement is determined by the usefulness of a retrieved case in solving a new problem.

The aim of using the CBR approach is to select the past phishing detection cases most similar to the new problem. A set of similar cases is selected from the case base according to a similarity criterion that requires the specification of weights corresponding to attributes. The assessment of case similarity involves comparing the attribute values of the new case with those of the past cases stored in the case base. The retrieved old cases are ranked according to their similarity scores with respect to the attributes of the new case. In this work, the nearest neighbor method is applied to calculate the similarity score and the total similarity score of a potentially useful case.
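A common form of nearest neighbor retrieval in CBR is a weighted sum of per-attribute matches, normalized by the total weight. The sketch below assumes that form, with an exact-match local similarity (1 if the attribute values agree, 0 otherwise); the attribute names and weights are hypothetical, and the paper's exact formula may differ.

```python
def similarity(new_case, old_case, weights):
    """Weighted nearest neighbor score in [0, 1] between two attribute dicts."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    # Local similarity is assumed to be exact match: 1 if equal, else 0.
    matched = sum(
        w for attr, w in weights.items()
        if new_case.get(attr) == old_case.get(attr)
    )
    return matched / total


def rank_cases(new_case, case_base, weights):
    """Return (score, case) pairs sorted from most to least similar."""
    scored = [(similarity(new_case, old, weights), old) for old in case_base]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For example, with weights 2, 1, and 1 on three attributes, a stored case that agrees on the first and third attribute scores (2 + 1) / 4 = 0.75.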

3.4. Adaptive Classification System Design. The main objective of case-based adaptive classification is to assign a suitable classification technique to the target case (a feature set extracted from an Android application) by identifying and analysing the similar training cases (sets of features that are stored in the case base). The proposed case-based adaptive classification is shown in Figure 5. If the feature set extracted from the active Android application does not match any set of features stored in the case base (that is, the extracted feature set is not complete for the case-retrieving process), the adaptive classification selects suitable methods to process the extracted feature set. There are two options for selecting suitable methods. First, the possible features are added to the extracted feature set in order to perform the case-retrieving process and to choose a suitable classifier. Second, multiple classifiers are selected to process the extracted incomplete feature set. Under the second option, the multiple answers produced by the multiple classifiers are collected in order to produce a final answer as a weighted sum of all answers.
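The weighted sum under the second option could look like the sketch below. The text does not state what the weights are, so using each retrieved case's similarity score as the weight, treating each answer as a phishing probability, and applying a 0.5 decision threshold are all assumptions of this example.

```python
def weighted_vote(answers, threshold=0.5):
    """Combine classifier answers into one label by a normalized weighted sum.

    `answers` is a list of (phishing_probability, weight) pairs; the weight
    is assumed here to be the similarity score of the case that suggested
    the classifier.
    """
    total_weight = sum(weight for _, weight in answers)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(prob * weight for prob, weight in answers) / total_weight
    label = "phishing" if score >= threshold else "benign"
    return label, score
```

With answers of 0.9 (weight 0.75) and 0.2 (weight 0.25), the combined score is about 0.725, so the more similar case dominates the final answer.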

4. Detection Model and Evaluation

This section explains how our detection model performs adaptively on the combination of individual classifiers and ensemble classifiers. To verify that our proposed model can improve the accuracy of mobile phishing detection, an experiment is conducted using the feature sets described in Section 3.1. The experiment was conducted by running Weka 3.8 on a laptop computer with a Core i7 processor, 8 GB RAM, and the Windows 8.1 64-bit operating system. The cross-validation method is used as an evaluation technique to estimate the error rate efficiently and in an unbiased way by running repeated percentage splits. Firstly, the dataset is divided into 10 pieces. Each piece is used as a testing dataset in turn, while the remaining 9 pieces together are used as a training dataset. We performed 10 simulations (i.e., experiments are repeated 10 times). Then, all these results are averaged into a single estimation result. Six existing machine learning algorithms are chosen from different categories and used with the 10-fold cross-validation method to evaluate the variation of accuracy and efficiency.
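The 10-fold procedure described above can be illustrated with a small, library-free sketch. It is not Weka's implementation: the function names are hypothetical, the folds are built by plain shuffling without the class stratification that a toolkit may apply, and `evaluate` stands in for training and testing a real classifier.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal pieces
    for i, test in enumerate(folds):
        # The remaining k-1 pieces together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average the score returned by evaluate(train, test) over all k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

A caller would pass an `evaluate` function that fits a classifier on the training indices and returns its accuracy on the test indices; the averaged value is the single estimation result.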

4.1. Dataset. The features are extracted from more than 10,000 Android malware samples, which are collected from Android malware repositories including VirusShare [41], AndroZoo [42], DroidScreening [43], and RevealDroid [44]. There are 76 extracted features of Android components, 31 features of API counts, 82 features of API usage actions, 421 features of security-sensitive flows, 6 features of hardware components, 109 features of intents, 82 features of permissions, 190 features of malicious shell commands and strings, 19 features of content visual, and 49 features of URLs. Thus, there are 1,065 features in total. The information on the feature sets used in this experiment is shown in Table 3.

4.2. Machine Learning Classifiers. To detect and classify phishing applications, different machine learning classification techniques are used with an adaptive method. An adaptive classification system is proposed to automatically choose a combination of suitable classifiers for the extracted features of an active Android application. Various machine learning techniques were used as classifiers in existing works [31, 32, 34, 35]. Among them, six algorithms were selected from different categories to cover the different natures of classification. The six algorithms are C4.5 (J48), decision table (DT), k-nearest neighbors (IBK), logistic regression (LR), naive Bayes (NB), and support vector machine (SVM). According to pretesting of the effectiveness of parameters on these classifiers [45], the naive Bayes (NB) classifier with the supervised discretization function, the default maximum number of iterations in logistic regression (LR), a confidence factor of 0.5 for tree pruning in the J48 classifier, and a 1-nearest neighbor (IBK) classifier are chosen for our experiment. The SVM and decision table classifiers are used with their default parameters.

Table 2: Case description for mobile phishing detection system.

No. | Name | Value
1 | Case ID | Case identification number
2 | Feature pattern | Combination of feature sets
3 | Ensemble methods of classifiers (or) classification algorithm | Boosting/bagging/Bayesian (or) algorithm name and their specific parameters
4 | Accuracy | Percentage of correct classification
5 | Performance | Runtime (seconds)

4.3. Experimental Results and Analysis. The accuracy comparison of six classifiers on the 10 feature sets is shown in Table 4. The italicized values shown in Table 4 represent the maximum detection accuracy among six classifiers for each feature set. It can be seen that the accuracy of each classification algorithm depends on the features. IBK provides better accuracy on 6 feature sets, and J48 provides better accuracy on the other 4 feature sets. Our work aims to detect mobile phishing in a feature-independent manner with various classifiers. To mimic a real-world application, random feature combinations are created, because a new Android application can consist of any combination of features. In this experiment, 5 random combinations of features are created, as shown in Table 5.

Figure 4: Case defining process (define a new case and store in case base).

Figure 5: Adaptive classification.

Table 3: Feature sets.

No. | Feature set | Number of features | Example features
1 | Android components | 76 | android.media, android.media.effect, android.media.audiofx, android.service.textservice, android.service.notification
2 | API counts | 31 | account_information, account_settings, audio, bluetooth, bluetooth_information
3 | API usage actions | 82 | android.util, android.widget, android.renderscript, android.webkit, android.os, android.os.storage, android.content
4 | Security-sensitive flows | 421 | system_settings____audio, system_settings____phone_connection, system_settings____voip, system_settings____database_information
5 | Hardware components | 6 | android.hardware.display, android.hardware, android.hardware.usb, android.hardware.location, android.hardware.input
6 | Intent_action | 109 | action_main, action_view, action_default, action_attach_data, action_edit, action_insert_or_edit
7 | Permission | 82 | android.permission.access_cache_filesystem, android.permission.access_checkin_properties, android.permission.access_coarse_location, android.permission.access_gps
8 | Shell_command_strings | 190 | runtimeexec, createSubprocess, cipher-classes, longstring, SecretKey, methodinvoke, small_code_size
9 | Content_visual | 19 | HostnameLength, PathLength, QueryLength, DoubleSlashInPath, NumSensitiveWords, EmbeddedBrandName, PctExtHyperlinks
10 | URLs | 49 | having_ip_address, url_length, shortining_service, having_at_symbol, double_slash_redirecting, prefix_suffix
Total | | 1,065 |

These 5 feature combination patterns are tested with the six individual classifiers and three ensemble classifier models to develop cases for our adaptive model. Each model is an ensemble of the six classifiers with a different method of producing the final answer. The methods for combining the answers of the ensemble classifiers are the average of probabilities, majority voting, and maximum probability. The detection results for the 5 scenarios of random feature combination sets with the six base classifiers and three ensemble classifiers are shown in Table 6. The italicized values shown in Table 6 represent the maximum detection accuracy for each of the 5 cases among the nine classifiers.
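The three combination rules named above can be illustrated with a short sketch. It assumes that each base classifier returns a list of class probabilities (index 0 for benign, index 1 for phishing); the function names are hypothetical, and this is a plain re-implementation of the general idea rather than the code used in the experiments.

```python
from collections import Counter


def average_of_probabilities(probs):
    """Pick the class with the highest mean probability across classifiers."""
    n_classes = len(probs[0])
    means = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__)


def majority_voting(probs):
    """Each classifier votes for its most probable class; the most votes win."""
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in probs)
    return votes.most_common(1)[0][0]


def maximum_probability(probs):
    """Pick the class that received the single highest probability."""
    n_classes = len(probs[0])
    peaks = [max(p[c] for p in probs) for c in range(n_classes)]
    return max(range(n_classes), key=peaks.__getitem__)


# Illustrative outputs of three classifiers; the numbers are invented.
outputs = [[0.6, 0.4], [0.55, 0.45], [0.05, 0.95]]
```

On these invented outputs the rules disagree: two of three classifiers lean towards class 0, so majority voting returns 0, while one very confident classifier pulls both the average and the maximum rule to class 1. This is one reason the best ensemble rule can vary with the input features.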

According to the results shown in Table 6, some feature patterns are more suitable for ensemble techniques, while some are better handled by individual classification techniques. It can be concluded that the accuracy variation of classification techniques in mobile phishing detection relies heavily on the input features.

The adaptive method used in our model will choose the most suitable classification approach for a set of input features. Based on the results presented in Table 6, we can develop a case to be stored in the case base for an adaptive choice of suitable classifiers. The tentative cases for building our case-based phishing detection model are shown in Table 7.
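As an illustration of how such cases could be derived, the sketch below picks, for each feature pattern, the classification approach (or approaches, on ties) with the highest accuracy. The accuracy values are those of Table 6 read as percentages; the names `TABLE6` and `build_cases` are hypothetical, and the procedure is an assumption about the selection step rather than the authors' implementation.

```python
CLASSIFIERS = ["J48", "DT", "IBK", "LR", "NB", "SVM", "AVG", "MAJ", "MAX"]

# Detection accuracy (%) per feature pattern, in the column order above.
TABLE6 = {
    "Pattern 1": [95.93, 93.07, 95.45, 92.47, 89.42, 91.62, 95.31, 95.31, 92.87],
    "Pattern 2": [94.72, 91.62, 94.04, 90.18, 86.44, 89.27, 94.26, 94.20, 91.38],
    "Pattern 3": [96.32, 92.67, 95.60, 94.89, 90.69, 92.57, 96.43, 96.41, 94.31],
    "Pattern 4": [90.56, 86.38, 90.45, 88.51, 81.55, 87.88, 90.64, 90.64, 88.52],
    "Pattern 5": [95.33, 89.69, 94.37, 93.97, 92.28, 91.61, 95.68, 95.69, 92.68],
}


def build_cases(accuracy_table):
    """Return one case per pattern: the best method(s) and their accuracy."""
    cases = []
    for pattern, scores in accuracy_table.items():
        best = max(scores)
        methods = [name for name, acc in zip(CLASSIFIERS, scores) if acc == best]
        cases.append({"pattern": pattern, "methods": methods, "accuracy": best})
    return cases


for case in build_cases(TABLE6):
    print(case["pattern"], case["methods"], case["accuracy"])
```

Ties are kept on purpose, so a pattern on which two ensembles reach the same accuracy yields a case that lists both methods.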

Performing the classification process on this large number of features takes a long runtime. The comparison of the runtime to build the detection model on the 6 base classifiers and 3 ensemble approaches before feature selection is shown in Table 8.

To reduce the detection time, some features may be omitted because they may not have a high impact on the result. Therefore, some experiments are conducted to select a set of effective features in order to reduce the number of required features.

4.4. Selecting the Features. Feature selection is necessary to reduce the dimension of the feature space. With the aim of getting the benefits of performing a feature selection technique on a large data set, such as reducing overfitting, improving accuracy, and reducing processing time, two feature selection techniques are performed in this experiment and their results are compared to get the optimized results. The process of selecting the features can be described by the following steps.

Let $U$ be the universe of feature sets, $U = \{D_1, D_2, \ldots, D_v\}$, and let the dataset $D_i \in U$ with $v$ attributes $A$ be $D_i = \{A_1, A_2, \ldots, A_v\}$. Then the attributes can be grouped into a feature group $FG_i = \{A_a, A_b, \ldots, A_n\}$. Attribute evaluation is performed and attributes are selected on the worth of each attribute, which yields a selected feature set $FS_i = \{A_a, A_b, \ldots, A_m\}$, where $FS_i \subseteq FG_i$.

Two feature selection techniques are used in this experiment to confirm the advantages of selecting the features in phishing detection. The first method is a correlation-based feature selection with a ranker search method that evaluates each attribute and lists the results in ranked order. The worth of each attribute is evaluated by measuring the (Pearson's) correlation between it and the class [46].

Pearson's correlation coefficient is described in equation (1), where all variables have been standardized. The correlation between a composite and a class label is a function of the number of component variables (attributes) in the composite and the magnitude of the intercorrelations among them, together with the magnitude of the correlations between the attributes and the class label.

If the correlation between each of the attributes in a test and the class label is known, and the intercorrelation between each pair of attributes is given, then the correlation between a composite test consisting of the summed attributes and the class label can be predicted from the following equation:

$$r_{zc} = \frac{k\, r_{zi}}{\sqrt{k + k(k-1)\, r_{ii}}}, \qquad (1)$$

where $r_{zc}$ is the correlation between the summed attributes and the class label, $k$ is the number of attributes, $r_{zi}$ is the average of the correlations between the attributes and the class label, and $r_{ii}$ is the average intercorrelation between attributes.
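Equation (1) can be evaluated directly, as in the following minimal sketch. The function name `composite_correlation` and the sample inputs are hypothetical and serve only to show how the quantity behaves.

```python
import math


def composite_correlation(k, r_zi, r_ii):
    """Equation (1): correlation between k summed attributes and the class.

    k    -- number of attributes in the composite
    r_zi -- average attribute-class correlation
    r_ii -- average attribute-attribute intercorrelation
    """
    return (k * r_zi) / math.sqrt(k + k * (k - 1) * r_ii)


# A single attribute reduces to its own class correlation.
print(composite_correlation(1, 0.5, 0.3))
# Four uncorrelated attributes reinforce each other ...
print(composite_correlation(4, 0.5, 0.0))
# ... while four fully redundant attributes add nothing.
print(composite_correlation(4, 0.5, 1.0))
```

The sample inputs reflect the usual reading of the formula: attributes that correlate with the class but not with each other raise the composite correlation, whereas redundant attributes do not.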

We obtain the ranked attributes listed with their corresponding class correlation. Some attributes which have no

Table 4: Accuracy comparison of classifiers on 10 features.

No.  Feature set             J48 (%)  DT (%)  IBK (%)  LR (%)  NB (%)  SVM (%)
1    Android components      93.23    89.02   93.40    90.16   84.67   87.95
2    API count               95.85    93.02   95.66    91.90   89.20   85.25
3    APIusage_actions        95.20    91.86   95.32    91.97   89.02   91.24
4    Flow                    93.05    91.03   93.32    87.18   87.45   83.17
5    Hardware components     89.00    89.06   89.12    89.06   89.02   89.06
6    Intent_action           86.89    85.73   87.13    84.64   83.75   85.53
7    Permission              94.30    91.92   94.65    93.95   88.54   94.14
8    Shell_command_strings   75.40    71.18   74.08    70.28   68.74   70.22
9    Content_visual          97.20    95.79   95.53    94.49   95.77   93.87
10   URLs                    96.03    93.24   97.18    93.99   92.98   93.80


or low values on the class correlation measure are eliminated. The resulting reduced feature sets are shown in Table 9.

The second method is an information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated by using the following equations. In the attribute evaluation process, the I index, which measures the impurity of $D$, a data partition or a set of training tuples, is calculated using

$$I(D) = 1 - \sum_{i=1}^{m} p_i^2, \qquad (2)$$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. The sum is computed over $m$ classes. The I index considers a binary split for each attribute. First, the case where $A$ is a discrete-valued attribute having $v$ distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in $D$ is considered. The expected information provided by that split is calculated by

$$I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j). \qquad (3)$$

In this equation, $D_j$ represents the observations that contain the $j$th value of the attribute. The information gain of a binary split on attribute $A$ is calculated by

$$\mathrm{Gain}(A) = I(D) - I_A(D). \qquad (4)$$

Information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation for split information is provided in

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right). \qquad (5)$$

This value represents the potential information generated by splitting the training dataset $D$ into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$. The gain ratio is defined in

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}. \qquad (6)$$

The attribute with the maximum gain ratio is selected as the highest ranked attribute. The low-ranked attributes that provide a gain ratio less than 0.0003 are eliminated. After performing the two feature selection techniques on the dataset, the reduced feature sets are generated as shown in Table 9.
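Equations (2) to (6) and the ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit implementation used in the experiments: attributes are discrete, the function and variable names are hypothetical, and the default threshold is the elimination bound quoted above read as 0.0003.

```python
import math
from collections import Counter


def impurity(labels):
    """Equation (2): I(D) = 1 - sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gain_ratio(values, labels):
    """Equations (3) to (6) for one discrete attribute."""
    n = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    weights = [len(part) / n for part in partitions.values()]
    expected = sum(w * impurity(part) for w, part in zip(weights, partitions.values()))
    gain = impurity(labels) - expected                     # equation (4)
    split_info = -sum(w * math.log2(w) for w in weights)   # equation (5)
    return gain / split_info if split_info > 0 else 0.0    # equation (6)


def rank_and_select(columns, labels, threshold=0.0003):
    """Rank attributes by gain ratio and drop those below the threshold."""
    scored = [(gain_ratio(values, labels), name) for name, values in columns.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for score, name in scored if score >= threshold]


# Hypothetical toy data: 1 = phishing, 0 = benign.
labels = [1, 1, 0, 0]
columns = {
    "informative": [1, 1, 0, 0],  # separates the classes perfectly
    "noise": [1, 0, 1, 0],        # independent of the class
    "constant": [1, 1, 1, 1],     # a single value, so no split at all
}
print(rank_and_select(columns, labels))
```

An attribute with a single observed value has zero split information, so the sketch assigns it a gain ratio of zero instead of dividing by zero.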

The same detection experiments are conducted with the 9 classification approaches on each selected feature set. The detection results of the 5 cases on the selected feature sets are described in Tables 10 and 11. In this experiment, the 9 classification approaches and their related parameters are set up in the same way as in the previous experiments (described in Section 4.2).

According to the results of the reduced datasets with the correlation attribute evaluation method shown in Table 10, the classification approaches with the best detection accuracy are slightly changed in 2 cases (feature patterns 3 and 4). Feature pattern 3 is a combination of API count, API usage, Intent, and Hardware. The italicized values shown in Table 10 represent the maximum detection accuracy of each of the 5 cases among the nine classifiers. For feature pattern 3, the highest detection accuracy is now provided by the ensembles with the AVG and MAJ final answer methods, whereas it is provided by the ensemble with the AVG final answer method when the full feature set is used. The detection accuracy is slightly increased for most classifiers in feature pattern 4, which is a combination of Flow and Intent features.

According to the results shown in Table 11 for the reduced datasets with the information gain attribute evaluation

Table 5: Scenarios for random combinations of features.

Case ID  Feature pattern  Combination of feature sets                 Number of features
01       Pattern 1        API count + API usage + hardware            112
02       Pattern 2        API count + intent                          139
03       Pattern 3        API count + API usage + intent + hardware   220
04       Pattern 4        Flow + intent                               529
05       Pattern 5        Flow + intent + API usage + hardware        610

Table 6: Detection accuracy of 5 scenarios on randomly combined feature patterns.

Case ID  J48 (%)  DT (%)  IBK (%)  LR (%)  NB (%)  SVM (%)  AVG (%)  MAJ (%)  MAX (%)
01       95.93    93.07   95.45    92.47   89.42   91.62    95.31    95.31    92.87
02       94.72    91.62   94.04    90.18   86.44   89.27    94.26    94.20    91.38
03       96.32    92.67   95.60    94.89   90.69   92.57    96.43    96.41    94.31
04       90.56    86.38   90.45    88.51   81.55   87.88    90.64    90.64    88.52
05       95.33    89.69   94.37    93.97   92.28   91.61    95.68    95.69    92.68

Table 7: Tentative cases for mobile phishing detection system.

Case ID  Feature pattern  Adaptive method  Accuracy (%)  Run time (seconds)
1        Pattern 1        J48              95.93         4.43
2        Pattern 2        J48              94.72         4.54
3        Pattern 3        AVG              96.43         95.18
4        Pattern 4        AVG, MAJ         90.64         174.4 & 174.6
5        Pattern 5        MAJ              95.69         205.50


method, the detection accuracy is increased in 4 cases (feature patterns 1, 3, 4, and 5). The italicized values shown in Table 11 represent the maximum detection accuracy of each of the 5 cases among the nine classifiers. Moreover, the classification approaches which produced the best detection accuracy are changed in 3 cases (feature patterns 3, 4, and 5). That is, an ensemble with the AVG final answer finding method provides the best accuracy for feature patterns 3, 4, and 5.

The detection accuracy percentages of the 5 cases using different algorithms are compared in Figure 6. This figure represents the detection results from Tables 6, 10, and 11. Each case is represented in 3 situations: no feature selection, after correlation attribute evaluation feature selection, and after information gain attribute evaluation feature selection. There are 15 points in the figure, representing the 5 cases with 3 conditions. The best classifier for case 01 and case 02 is the J48 classifier, while the ensemble classifier AVG is the best one for case 03, case 04, and case 05. The cases with the best algorithm are used in the case-based reasoning detection method.

With the aim of highlighting the performance of the feature selection techniques, the runtime results of the reduced feature sets are collected as described in Tables 12 and 13. The information gain attribute evaluation method results in a larger number of features than the correlation attribute evaluation method. The runtime of the information gain attribute evaluation method is also slightly larger than that of the correlation attribute evaluation method.

The runtimes of the 5 cases after selecting the features are shown in Figure 7. This figure compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure, representing the 5 cases with 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied to our feature sets to improve our model for better accuracy and efficiency. The detection accuracy on 4 feature patterns is improved, as shown in Table 11, while the detection performance on all feature patterns is improved, as shown in Table 13. Table 14 shows the comparison of the accuracy and efficiency of the full feature sets and the reduced feature sets of our proposed adaptive model. The italicized values shown in Table 14 represent the accuracy values obtained with a reduced feature set that improve over their counterparts obtained with the full feature set.

The phishing malware detection task is an imbalanced classification problem. That is, there are two classes to be

Table 8: Runtime comparison of 5 scenarios on 9 classification approaches (in seconds).

Case ID  J48    DT      IBK     LR    NB    SVM    AVG     MAJ     MAX
01       4.43   18.63   0.01    1.93  0.64  3.01   29.87   29.52   26.16
02       4.54   31.80   0.0001  2.57  0.66  39.24  76.20   76.35   75.90
03       9.44   58.80   0.01    7.22  1.14  18.94  95.18   96.09   97.41
04       12.09  148.32  0.0001  5.28  1.39  6.25   174.4   174.6   174.61
05       17.09  167.14  0.01    7.86  1.93  3.61   203.62  205.50  203.51

Table 9: Information of selected feature sets for 5 cases.

Case ID  Feature combination pattern  Features before feature selection  Features selected by Pearson's correlation  Features selected by information gain
01       Pattern 1                    112                                96                                          100
02       Pattern 2                    139                                114                                         120
03       Pattern 3                    220                                180                                         185
04       Pattern 4                    529                                164                                         265
05       Pattern 5                    610                                227                                         250

Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

Case ID  J48    DT     IBK    LR     NB     SVM    AVG    MAJ    MAX
01       95.87  93.10  95.45  92.47  89.42  91.61  95.31  95.32  92.82
02       94.68  91.53  94.04  90.18  86.44  89.28  94.24  94.18  91.33
03       96.37  92.70  95.60  94.90  90.69  92.57  96.38  96.38  94.37
04       90.73  86.51  90.45  88.51  81.55  87.89  90.73  90.72  88.64
05       95.38  89.54  94.37  93.96  92.28  91.61  95.68  95.69  92.72

Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

Case ID  J48    DT     IBK    LR     NB     SVM    AVG    MAJ    MAX
01       95.96  93.10  95.62  92.54  89.42  91.66  95.37  95.37  92.83
02       94.66  91.53  94.16  90.15  86.44  89.19  94.19  94.12  91.30
03       96.38  92.59  95.55  95.02  90.69  92.61  96.45  96.45  94.34
04       90.52  86.36  90.24  88.70  81.55  87.86  90.77  90.76  88.69
05       95.46  89.44  94.50  93.98  92.28  91.56  95.80  95.79  92.82


identified, phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class "phishing" is greatly outnumbered by the negative class. These types of problems are examples of the fairly common case in data science where accuracy is not a good measure for assessing model performance. Intuitively, proclaiming all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.

In order to assess the effectiveness of our proposed model, three measures derived from the confusion matrix are applied: accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances predicted as positive by our model that are actually positive. The following formulas represent their definitions:

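$$\begin{aligned}
\mathrm{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN},\\
\mathrm{Precision} &= \frac{TP}{TP + FP},\\
\mathrm{Sensitivity} &= \frac{TP}{TP + FN}.
\end{aligned} \qquad (7)$$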

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. (Plot of accuracy (%), from 80 to 100, for J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX over cases 01 to 05, each under no feature selection, correlation, and information gain.)

Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

Case ID  J48   DT     IBK     LR    NB    SVM    AVG    MAJ    MAX
01       3.94  18.36  0.01    1.97  0.64  3.38   27.45  26.92  27.18
02       3.84  25.35  0.0001  2.43  0.50  39.46  72.07  72.12  71.19
03       8.06  45.85  0.01    7.20  1.03  19.55  83.10  83.52  83.34
04       5.60  44.75  0.0001  5.15  0.56  6.27   61.95  61.99  62.00
05       8.84  69.88  0.0001  7.65  1.00  3.20   90.23  90.06  90.45

Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

Case ID  J48   DT      IBK    LR    NB    SVM    AVG     MAJ     MAX
01       4.05  20.25   0.01   1.63  0.54  3.02   29.02   27.49   27.27
02       3.86  29.43   0.001  2.35  0.56  31.96  67.63   66.52   66.40
03       9.77  55.46   0.01   6.31  0.95  17.06  87.31   90.63   90.69
04       6.83  87.36   0.01   2.25  0.93  6.76   102.86  93.21   93.15
05       8.42  104.52  0.001  5.37  1.21  3.95   111.80  107.53  108.07


True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix, as shown in Figure 8.
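The three measures follow directly from the four outcomes, as in the minimal sketch below. The function name and the counts are hypothetical; the example only illustrates why accuracy alone can mislead on imbalanced data.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and sensitivity from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Guard against empty denominators, e.g. when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, sensitivity


# Hypothetical imbalanced set: 50 phishing apps among 1000, all predicted benign.
print(confusion_metrics(tp=0, fp=0, tn=950, fn=50))   # (0.95, 0.0, 0.0)

# Hypothetical detector that finds 40 of the 50 phishing apps with 10 false alarms.
print(confusion_metrics(tp=40, fp=10, tn=940, fn=10))  # (0.98, 0.8, 0.8)
```

In the first call, a model that labels every app as benign still reaches 95% accuracy while its precision and sensitivity are zero, which is why the latter two measures are reported alongside accuracy.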

The evaluation of the effectiveness of our proposed model by means of accuracy, precision, and sensitivity is described in Table 15. According to the results shown in Table 15, our adaptive model achieves a good detection accuracy for the phishing features. Meanwhile, all the selected classifiers achieve acceptable precision and sensitivity ratios. According to the previous experiments, our adaptive phishing detection model using case-based reasoning can perform well on diversely distributed features.

5. Conclusions

An adaptive mobile phishing detection model based on a variation of input feature patterns using a case-based reasoning (CBR) technique is proposed in this work. An experimental analysis is conducted to demonstrate the design decision of our model and to verify the performance of our proposed model in handling the concept drift of mobile phishing attacks. The proposed model is evaluated with a large feature set that contains 1,065 features from 10 feature

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. (Plot of runtime (seconds), from 0 to 200, for J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX over cases 01 to 05, each under no feature selection, correlation, and information gain.)

Table 14: Accuracy and efficiency of proposed adaptive model.

Case ID  Adaptive (before)  Adaptive (after)  Accuracy (before)  Accuracy (after)  Runtime (before)  Runtime (after)
01       J48                J48               95.93              95.96             4.43              4.05
02       J48                J48               94.72              94.66             4.54              3.86
03       AVG                AVG, MAJ          96.43              96.45             95.18             87.31 & 90.63
04       AVG, MAJ           AVG               90.64              90.77             174.4 & 174.6     102.86
05       MAJ                AVG               95.69              95.80             205.50            111.80

Figure 8: Confusion matrix. (Actual class against predicted class, with cells TP, FP, TN, and FN.)

Table 15: Detection results achieved by the proposed model.

Case  Classifier  Accuracy (%)  Precision (%)  Sensitivity (%)
01    J48         95.96         83             79
02    J48         94.66         87             86
03    AVG         96.45         92             75
04    AVG         90.77         84             62
05    AVG         95.80         90             74


groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined patterns of features are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories so that classifiers of different natures cover the diverse feature sets. Three ensembles of the six base classifiers are used, each of which uses a different final answer-finding method: average, majority voting, and maximum. In total, there are 9 classifiers. Because of the large number of features in the dataset and the use of multiple classifiers, efficiency degrades. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and Pearson's correlation coefficient attribute evaluation method. By addressing the optimal selection of the suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model could provide an accuracy improvement with an acceptable runtime increment.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.

References

[1] W. Paul, H. A. Manolian, and S. Lapper, "Thinking digital in industry 4.0," Deloitte Insights, September 2018, https://www2.deloitte.com/insights/us/en/focus/industry-4-0/digital-leaders-in-manufacturing-fourth-industrial-revolution.html.

[2] "Spam and phishing in Q2 2018," Securelist-Kaspersky Lab's Cyberthreat Research and Reports, 2018.

[3] Proofpoint Security Awareness, "2019 state of the phish report," March 2019, https://www.wombatsecurity.com/state-of-the-phish.

[4] L. Wu, X. Du, and J. Wu, "Effective defense schemes for phishing attacks on mobile computing platforms," IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6678–6691, 2016.

[5] M. Moghimi and A. Y. Varjani, "New rule-based phishing detection method," Expert Systems with Applications, vol. 53, pp. 231–242, Jul 2016.

[6] Baunfire.com and SparkCMS, "APWG phishing attack trends report-4Q 2018," Anti-Phishing Working Group, March 2019, https://www.antiphishing.org/resources/apwg-reports.

[7] R. Basnet, S. Mukkamala, and A. H. Sung, "Detection of phishing attacks: a machine learning approach," in Soft Computing Applications in Industry, B. Prasad, Ed., pp. 373–383, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[8] A. K. Jain and B. B. Gupta, "Comparative analysis of features based machine learning approaches for phishing detection," in Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 2125–2130, New Delhi, India, March 2016.

[9] F. Toolan and J. Carthy, "Phishing detection using classifier ensembles," in Proceedings of the 2009 eCrime Researchers Summit, pp. 1–9, Tacoma, WA, USA, October 2009.

[10] H. S. Hota, A. K. Shrivas, and R. Hota, "An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique," Procedia Computer Science, vol. 132, pp. 900–907, 2018.

[11] A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles, ResearchGate, Berlin, Germany, 2019, https://www.researchgate.net/publication/325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles.

[12] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, "Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers," Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.

[13] A. Aleroud and L. Zhou, "Phishing environments, techniques, and countermeasures: a survey," Computers and Security, vol. 68, pp. 160–196, 2017.

[14] H. Shahriar, T. Klintic, and V. Clincy, "Mobile phishing attacks and mitigation techniques," Journal of Information Security, vol. 6, no. 3, pp. 206–212, 2015.

[15] T. M. Mahmoud and A. M. Mahfouz, "SMS spam filtering technique based on artificial immune system," International Journal of Computer Science Issues, vol. 9, no. 1, pp. 589–597, 2012.

[16] J. W. Yoon, H. Kim, and J. H. Huh, "Hybrid spam filtering for mobile communication," Computers and Security, vol. 29, no. 4, pp. 446–459, 2010.

[17] C. H. Hsu, P. Wang, and S. Pu, "Identify fixed-path phishing attack by STC," in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 172–175, Perth, Australia, September 2011.

[18] E. Medvet, E. Kirda, and C. Kruegel, "Visual-similarity-based phishing detection," in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, September 2008.

[19] A. P. Felt and D. Wagner, Phishing on Mobile Devices, University of California, Berkeley, CA, USA, 2011.

[20] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna, "What the app is that? Deception and countermeasures in the android user interface," in Proceeding of the 2015 IEEE Symposium on Security and Privacy, pp. 931–948, San Jose, CA, USA, May 2015.

[21] C. Marforio, R. J. Masti, C. Soriente, K. Kostiainen, and S. Capkun, "Personalized security indicators to detect application phishing attacks in mobile platforms," February 2015, http://arxiv.org/abs/1502.06824.

[22] D. Liu, E. Cuervo, V. Pistol, R. Scudellari, and L. P. Cox, "ScreenPass: secure password entry on touchscreen devices," in Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 291–304, Taipei, Taiwan, June 2013.

[23] D. Liu and L. P. Cox, "VeriUI: Attested Login for Mobile Devices," in Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, Santa Barbara, CA, USA, February 2014.

[24] L. Wu, X. Du, and J. Wu, "MobiFish: A lightweight anti-phishing scheme for mobile phones," in Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–8, Shanghai, China, August 2014.

[25] V. Mavroeidis and M. Nicho, "Quick response code secure: a cryptographically secure anti-phishing tool for QR code attacks," in Computer Network Security, pp. 313–324, 2017.

[26] "Phishing detective-apps on Google Play," March 2018, httpsplaygooglecomstoreappsdetailsidcomrsoftrandroidphishingdetectiveads.

[27] G. Bottazzi, E. Casalicchio, D. Cingolani, F. Marturana, and M. Piu, "MP-Shield: A framework for phishing detection in mobile devices," in Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1977–1983, Liverpool, UK, October 2015.

[28] M. M. Richter and R. O. Weber, Case-Based Reasoning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[29] S. Craw, N. Wiratunga, and R. C. Rowe, "Learning adaptation knowledge to improve case-based reasoning," Artificial Intelligence, vol. 170, no. 16-17, pp. 1175–1192, Nov 2006.

[30] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, "Case-based reasoning systems in the health sciences: a survey of recent Trends and developments," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 4, pp. 421–434, Jul 2011.

[31] S. Arzt, "FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps," in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269, New York, NY, USA, June 2014.

[32] L. Li, A. Bartel, T. F. Bissyande et al., "IccTA: detecting inter-component privacy leaks in android apps," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 280–291, Piscataway, NJ, USA, May 2015.

[33] "Obfuscation-resilient, efficient, and accurate detection and family identification of android malware - semantic scholar," March 2018, httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23.

[34] W. Enck, "TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones," Transactions on Computer Systems, vol. 32, no. 2, 2014.

[35] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, "Information-flow analysis of android applications in DroidSafe," in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, February 2015.

[36] D. Arp, M. Spreitzenbarth, H. Gascon, and K. Rieck, "Drebin: effective and explainable detection of android malware in your pocket," in Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, February 2014.

[37] N. Peiravian and X. Zhu, "Machine learning for android malware detection using permission and API calls," in Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305, Herndon, VA, USA, November 2013.

[38] V. Avdiienko, K. Kuznetsov, A. Gorla et al., "Mining apps for abnormal usage of sensitive data," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 426–436, Florence, Italy, May 2015.

[39] H. V. Nath and B. M. Mehtre, "Static malware analysis using machine learning methods," in Recent Trends in Computer Networks and Distributed Systems Security, pp. 440–450, 2014.

[40] N. Abura'ed, H. Otrok, R. Mizouni, and J. Bentahar, "Mobile phishing attack for Android platform," in Proceedings of the 2014 10th International Conference on Innovations in Information Technology (IIT), pp. 18–23, Abu Dhabi, UAE, November 2014.

[41] VirusShare.com, https://virusshare.com.

[42] K. Allix, T. F. Bissyande, J. Klein, and Y. Le Traon, "Androzoo: collecting millions of android apps for the research community," in Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471, Austin, TX, USA, May 2016.

[43] J. Yu, Q. Huang, and C. Yian, "DroidScreening: a practical framework for real-world Android malware analysis," Security and Communication Networks, vol. 9, no. 11, pp. 1435–1449.

[44] Joshuaga, Revealdroid - Bitbucket, https://bitbucket.org/joshuaga/revealdroid/src/master.

[45] S. Kyaw Zaw and S. Vasupongayya, "Revealing the important features of mobile phishing," in Proceedings of the 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), pp. 222–226, Pattaya, Thailand, November 2018.

[46] M. A. Hall and L. A. Smith, "Feature subset selection: a correlation based filter approach," Progress in Connectionist-based Information Systems, vol. 2, pp. 855–858, 1997.


the current web page was implemented in [20]. The personalized security indicators for mobile apps are proposed in [21]. However, the user-driven decision-making process is still needed.

The unified and trusted login user interface is used in another group of antiphishing techniques. A software keyboard which can be used safely for login input is provided in [22]. For the purpose of handling the credential, hardware and software certificates that are used to confirm the login are proposed in [23]. However, these approaches require some modifications to the client application and some user effort. An antiphishing system for mobile platforms was presented in [24]. The work was continued in [4] to detect the persistent account registry phishing attacks. They used an OCR technique, and their database needs to save every snapshot of the protected applications and webpages. The use of QR codes in phishing attacks was demonstrated and analyzed in [25]. They combined the client-server architecture with a digital signature to perform integrity checking and authentication. However, the work only focused on QR code phishing attacks, while phishing malware was not considered. Phishing Detective [26] was created to identify whether or not a link in the user's e-mail might send the user to a phishing page. However, the work relied entirely on the URL blacklist of the PhishTank database; it might not be able to handle other types of phishing attacks, such as activity hijacking and repackaging attacks.

MP-Shield [27] is an Android application that aims to inspect the flow of IP packets between the origin and the destination of mobile user applications. Their work mainly emphasized monitoring URLs for detection purposes. The types of phishing attack that can be mounted on mobile devices were identified in [19]. The authors conducted an analysis of the ways in which mobile applications and web sites link to each other. The common control transfers on mobile and how phishing attacks can be mounted against the control transfer scenarios were discussed. The authors presented possible types of phishing attacks along with their legitimate behaviors, as summarized in Table 1.

In Table 1, the mobile sender means a mobile application that sends the user to a website or another mobile application, while the web sender means a website that sends the user to a mobile application or other web sites.

Our work will cover these attack models with ten groups of selected feature categories. Each phishing detection approach showed an acceptable detection accuracy while using different features. Unfortunately, the majority of phishing detection approaches may lack the features needed for efficient detection of phishing malware. An optimized solution that uses different kinds of features of Android applications to prevent phishing and malware on Android smartphones is still needed. Our work contributes to finding an optimal solution for mobile phishing detection, in the sense of using the features independently with various classifiers.

2.3. Case-Based Reasoning. Case-based reasoning (CBR) is a problem-solving approach that solves new problems by adapting or reusing old solutions that were used to solve similar problems [28]. Past experiences or previous problems are saved as cases, and each case contains representative features or characteristics of the problem and its solution. The case base is a collection of these cases. This knowledge base of problem-solving experience is used for solving new problems [29]. The solutions in the retrieved cases are reused as a proposed solution to the new problem. Thus, the solution to the new problem can be found from similar known solutions in the past.

If the new problem situation is exactly the same as a previous case, then the reuse is simple. CBR systems start their reasoning from knowledge units called cases, while data-mining systems most often start from raw data. CBR systems also belong to the instance-based learning systems in the field of machine learning, which are defined as systems that are capable of automatically improving their performance over time. As long as CBR systems learn new cases in the retain step, they qualify as learning systems and thus belong to the machine learning systems [30]. The learning process of a case-based reasoning approach is shown in Figure 1.

A case-based reasoning system performs the learning process as follows:

(1) Retrieving the most similar case or cases from the case base to the new problem

(2) Reusing the previous solutions of the similar cases to solve the new problem

(3) Revising the proposed solution (if necessary)

(4) Retaining the solution of the new case for future problem solving

A new problem is represented as a case and is compared with the existing cases in the case base. The most similar case or cases are retrieved based on a similarity comparison of the case representations. These retrieved cases are adapted (i.e., combined and reused) to propose a solution for the new problem. The suggested solution may need to be evaluated and corrected (i.e., revised) if it is not the best solution. This verified solution can be added back to the case base as a new case (i.e., retained) or as amendments to existing cases, to be used in future problem solving [28].
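The retrieve, reuse, revise, and retain steps can be sketched as a small loop. The following Python fragment is only an illustration and not the implementation used in this work: the `Case` structure, the Jaccard overlap used as the similarity measure, and the `evaluate` callback are hypothetical names and choices introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Case:
    features: Set[str]   # representative features of the problem
    solution: str        # solution that was used for this problem


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two feature sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def solve(problem: Set[str], case_base: List[Case],
          evaluate: Callable[[Set[str], str], str]) -> str:
    """Run one CBR cycle; `case_base` must contain at least one case."""
    # (1) Retrieve: find the stored case most similar to the new problem.
    best = max(case_base, key=lambda c: similarity(problem, c.features))
    # (2) Reuse: propose the solution of the retrieved case.
    proposed = best.solution
    # (3) Revise: let an evaluator confirm or correct the proposal.
    verified = evaluate(problem, proposed)
    # (4) Retain: store the verified solution as a new case.
    case_base.append(Case(features=set(problem), solution=verified))
    return verified


# Hypothetical usage: two stored cases and an evaluator that accepts any proposal.
case_base = [Case({"a", "b"}, "S1"), Case({"x", "y"}, "S2")]
answer = solve({"a", "b", "c"}, case_base, lambda problem, proposed: proposed)
```

In this toy run the first stored case is the closest one, so its solution is reused and a third case is retained in the case base.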

3. Architecture Overview

A case-based reasoning model is proposed for the automatic adaptation of classifiers in mobile phishing detection. This section presents how the case-based adaptive classification system is designed. The proposed system consists of two main parts: the application on Android smartphones and the detection system in the cloud environment. Figure 2 shows the overall system design.

As shown in Figure 2, the features will be extracted from the Android application for the phishing detection process. The detailed information on the features will be discussed in Section 3.1. Then, the extracted features will be sent to the cloud environment for the phishing detection processes. As the main objective of this work is to enhance the phishing detection processes, the detection will be performed on the static and dynamic features from an Android malware dataset (described in Section 4.1). The detailed process of feature extraction is out of the scope of this paper.

The contribution of our work starts when the detection system receives the extracted features. The first process is to retrieve the most similar case from the case base (which stores previous Android phishing detection approaches along with the corresponding features). The case-retrieving process will be described in Section 3.3. The case base must be set up before the case-retrieving process. The case base setup process is shown in Figure 3. The details of the case base setup process are presented in the following sections.

According to the retrieved case, the most suitable classification techniques will be used for the adaptive classification. If the feature set extracted from the Android application does not match the sets of features stored in the case base, the adaptive classification will select suitable methods to process the extracted feature set according to the similarity ratio score. Selecting suitable methods means choosing multiple classifiers for the extracted feature set. Finally, the final result for the active Android application will be sent to the application on the Android smartphone to be displayed to the user.

Figure 1: Case-based reasoning approach.
[Diagram labels: define new problem; new problem; retrieve; retrieved; stored cases; case base; reuse; solved; suggested solution; evaluate and revise; revised; corrected solution; retain; new; new case added to learn.]

Figure 2: Overall system design.
[Diagram labels: Android smartphone (Android application, extracted features, show result); cloud (case selection, reuse, adaptive classification, classifiers, case base).]

Table 1: Legitimate behaviors and their respective phishing attack techniques.

| | Legitimate behavior | Respective attack techniques |
|---|---|---|
| Mobile sender | Social sharing, upgrades, game credits, opening a target in the browser, send user to embedded http page in browser that links to https login | Fake mobile login screen, task interception, scheme squatting, keylogging, URL bar hiding/spoofing, fake browser, using active network attack plus URL bar spoofing |
| Web sender | Link to mobile e-mail or Twitter, payment via PayPal or Google checkout, and user follows link from http to https | Website spoofs mobile app, task interception, scheme squatting, URL bar hiding/spoofing, active network attack plus URL bar spoofing |

3.1. Feature for Mobile Phishing. Existing antiphishing solutions for mobile environments were collected, and the features they use to identify a phishing attack were extracted. In an Android environment, the features can be extracted from miscellaneous sources such as program entities and program outputs of runtime monitoring. The features frequently used by existing antiphishing solutions can be classified into ten classes: Android components, Android API counts, API usage actions, security-sensitive data flows, hardware components, intent actions, permissions, shell commands and strings, contents and visual, and URLs. The details of each feature class are given below.

(1) Android components: a variety of component types with specific functionalities (e.g., components for providing GUIs and others for running background services) are declared within an Android app's manifest, and these features are collected in [31–33].

(2) API count: the number of invocations of a specific Android API method (e.g., a malicious app accesses the location APIs twice and the telephony package 8 times) is collected in [4, 24, 27, 32].

(3) API usage actions: APIs can be used to develop applications on the Android platform and can also be misused for malicious purposes. There are many ways to submit web requests and to exfiltrate captured data via the APIs without the Internet permission. Some existing phishing detection works [27, 31, 32, 34] collect the API calls (e.g., API calls to access sensitive data, API calls to access network communications, API calls to send and receive SMS messages, API calls to execute external commands, and API calls frequently used for obfuscation).

(4) Security-sensitive data flows: a few approaches for Android malware detection [31, 34, 35] use data flows between security-sensitive Android interfaces to determine if an app is malicious. Tracking this form of information is particularly useful for identifying privacy leaks.

(5) Hardware components: the hardware components used in the app are listed in AndroidManifest.xml (e.g., to access the camera, an app needs to include the android.hardware.camera feature), and these features are collected in [4, 36].

(6) Intent actions: Android malware is known to rely on tracking an Intent (e.g., whether a package is installed or whether a device has recently completed booting) to determine when to perform a malicious behavior. These features are used in [32, 36].

(7) Permission: specific permissions provided by Android to execute some risky operations are acquired by Android malware. These features are collected in [34, 37, 38].

(8) Shell command and strings: the features of interesting strings associated with malicious behaviors and potentially risky shell commands are collected in [36, 39]. Some structural attributes of the APK file, such as the size of the code, the presence of zip files and binary files, and related information, are also included in this feature group.

(9) Contents and visual: the main display channel for the deception of phishing is the web content, which expresses the intention of the website. These features consist of page elements such as the page title, the submitted form, and the contained links. Some researchers also extract the logo, icon, and contained pictures from the web page and use an image recognition algorithm to identify the phishing website [16–18].

(10) URLs: web link features for phishing fraud are collected based on five criteria: URL and Domain Identity, Security and Encryption, Source Code and JavaScript, Page Style and Contents, and Web Address Bar. These features are collected in [4, 13, 40].

3.2. Case Representation. A case represents an experience at an operational level. Typically, a case includes the problem specification, the solution, and sometimes the outcome. This is the most common representation used. However, more elaborate case representations can be employed. Depending on the information included in a case, different types of results can be achieved from the system. Cases that describe a problem and its solution can be used to derive solutions to new problems.

In general, a case specification is described as a set of features. The features are those aspects of the domain and the problem that are considered to be most significant in determining the solution and/or outcome. A case represents an experience. In this situation, a case should represent the features of the application that are used to determine a phishing attack.

In our model, a case includes the combination of feature sets, the ensemble method of classifiers or the individual classification algorithm with its specific parameters, the accuracy and performance of the solution, and potential facilitations. A case description stored in the phishing detection system is shown in Table 2.
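As an illustration of this representation, a case can be written as a small record whose fields follow Table 2. This is a minimal sketch rather than the storage format used in this work: the class name `PhishingCase` and its field names are assumptions, and the example values are those reported for case 01 in Tables 5 and 7 and for J48 in Section 4.2.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class PhishingCase:
    case_id: str                      # case identification number
    feature_pattern: FrozenSet[str]   # combination of feature sets
    method: str                       # ensemble method or classification algorithm
    parameters: Dict[str, str] = field(default_factory=dict)  # algorithm-specific settings
    accuracy: float = 0.0             # percentage of correct classification
    runtime_seconds: float = 0.0      # runtime to build the detection model


# Example values taken from case 01 (Tables 5 and 7) and the J48 setting in Section 4.2.
case_01 = PhishingCase(
    case_id="01",
    feature_pattern=frozenset({"API count", "API usage", "hardware"}),
    method="J48",
    parameters={"confidence_factor": "0.5"},
    accuracy=95.93,
    runtime_seconds=4.43,
)
```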

Figure 3: Setting up case base.
[Diagram labels: a set of feature pattern; calculate the similarities; the most similar case; case base; retrieve; reuse; revise; retain; identification of potential exception in feature patterns; monitor and complete the system.]


To define a new case in the case base, the input features are passed through different machine learning classifiers, and the results from each classifier are calculated to produce the final result. Then, the input features, the classifiers with their parameters, the activation function, and the final result are stored in the case base (knowledge base) as a new case. The process of defining a new case to be stored in the case base is shown in Figure 4.
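The "take the maximum accuracy" step of Figure 4 can be sketched as follows. The function name `define_case` and the tuple it returns are hypothetical; the accuracies are those reported for feature pattern 1 (case 01) in Table 6. When two approaches tie, this sketch simply keeps the first one encountered, which is an assumption and not a rule stated in this work.

```python
from typing import Dict, Tuple


def define_case(case_id: str, feature_pattern: str,
                accuracies: Dict[str, float]) -> Tuple[str, str, str, float]:
    """Return (case_id, feature_pattern, best_method, best_accuracy)."""
    if not accuracies:
        raise ValueError("at least one classifier result is required")
    # Take the classifier (or ensemble) with the maximum accuracy.
    best_method = max(accuracies, key=accuracies.get)
    return case_id, feature_pattern, best_method, accuracies[best_method]


# Accuracies (%) reported for feature pattern 1 (case 01) in Table 6.
pattern_1 = {"J48": 95.93, "DT": 93.07, "IBK": 95.45, "LR": 92.47, "NB": 89.42,
             "SVM": 91.62, "AVG": 95.31, "MAJ": 95.31, "MAX": 92.87}
new_case = define_case("01", "Pattern 1", pattern_1)
```

For this input the selected method is J48, which matches the entry for case 1 in Table 7.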

3.3. Case Retrieval. Case-based reasoning (CBR) solves a new problem by retrieving previously solved problems and their solutions from a knowledge source of cases called the case base. There are challenges related to the retrieving process that still need to be addressed. One issue is the computation of similarity, which is particularly important during the retrieving process. The effectiveness of a similarity measurement is determined by the usefulness of a retrieved case in solving a new problem.

The aim of using the CBR approach is to select the past phishing detection cases that are most similar to the new problem. A set of similar cases is selected from the case base according to a similarity criterion that requires the specification of weights corresponding to the attributes. The assessment of case similarity involves comparing the attribute values of the new case with those of the past cases stored in the case base. The retrieved old cases are ranked according to their similarity scores with respect to the attributes of the new case. In this work, the nearest neighbor method is applied to calculate the similarity score and the total similarity score of a potentially useful case.
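A weighted nearest neighbor score of this kind is commonly computed as the weighted fraction of matching attributes. The sketch below assumes that form; the exact similarity function, the attribute names, and the equal weights are illustrative assumptions and are not specified in this work. Attributes here indicate whether a feature category is present (1) or absent (0).

```python
from typing import Dict, List, Tuple


def weighted_similarity(new_case: Dict[str, int], old_case: Dict[str, int],
                        weights: Dict[str, float]) -> float:
    """Weighted nearest neighbor score in [0, 1] over the attributes in `weights`."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Local similarity is 1 when the two attribute values match, else 0.
    matched = sum(w for attr, w in weights.items()
                  if new_case.get(attr, 0) == old_case.get(attr, 0))
    return matched / total


def rank_cases(new_case: Dict[str, int], case_base: Dict[str, Dict[str, int]],
               weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (case_id, score) pairs sorted from most to least similar."""
    scores = [(cid, weighted_similarity(new_case, old, weights))
              for cid, old in case_base.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical equal weights over five feature categories.
weights = {"api_count": 1.0, "api_usage": 1.0, "hardware": 1.0, "intent": 1.0, "flow": 1.0}
case_base = {
    "01": {"api_count": 1, "api_usage": 1, "hardware": 1},  # pattern 1 in Table 5
    "02": {"api_count": 1, "intent": 1},                    # pattern 2 in Table 5
}
new_app = {"api_count": 1, "api_usage": 1, "hardware": 1}
ranked = rank_cases(new_app, case_base, weights)
```

Here the new application matches case 01 on all five attributes and case 02 on only two of them, so case 01 is ranked first.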

3.4. Adaptive Classification System Design. The main objective of case-based adaptive classification is to assign a suitable classification technique to the target case (a feature set extracted from an Android application) by identifying and analysing the training case (one of the sets of features stored in the case base) that is most similar. The proposed case-based adaptive classification is shown in Figure 5. If the feature set extracted from the active Android application does not match any set of features stored in the case base (that is, the extracted feature set is not complete for the case-retrieving process), the adaptive classification will select suitable methods to process the extracted feature set. There are two options for selecting suitable methods. First, the possible features are added to the extracted feature set in order to perform the case-retrieving process and to choose a suitable classifier. Second, multiple classifiers are selected to process the extracted incomplete feature set. Under the second option, the multiple answers produced by the multiple classifiers are combined into a final answer by a weighted sum of all answers.
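The second option can be illustrated with a normalized weighted sum of per-classifier outputs. How the weights are chosen is not stated here, so the sketch below assumes, purely for illustration, that each classifier outputs a phishing probability and is weighted by a hypothetical prior accuracy; the function name and the 0.5 threshold are likewise assumptions.

```python
from typing import Dict


def weighted_vote(phishing_probs: Dict[str, float], weights: Dict[str, float],
                  threshold: float = 0.5) -> str:
    """Combine per-classifier phishing probabilities by a normalized weighted sum."""
    total = sum(weights[name] for name in phishing_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[name] * p for name, p in phishing_probs.items()) / total
    return "phishing" if score >= threshold else "benign"


# Hypothetical outputs of three classifiers and hypothetical accuracy-based weights.
probs = {"J48": 0.9, "IBK": 0.6, "NB": 0.2}
weights = {"J48": 0.96, "IBK": 0.95, "NB": 0.89}
final_answer = weighted_vote(probs, weights)
```

With these illustrative numbers the weighted score is about 0.58, so the combined answer is "phishing" even though one classifier disagrees.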

4. Detection Model and Evaluation

This section explains how our detection model performs adaptively on the combination of individual classifiers and ensemble classifiers. To verify that our proposed model can improve the accuracy of mobile phishing detection, an experiment is conducted using the feature sets described in Section 3.1. The experiment was conducted by running Weka 3.8 on a laptop computer with a Core i7 processor, 8 GB of RAM, and the Windows 8.1 64-bit operating system. The cross-validation method is used as the evaluation technique to estimate the error rate efficiently and in an unbiased way by running repeated percentage splits. First, the dataset is divided into 10 pieces. Each piece is used as the testing dataset in turn, while the remaining 9 pieces together are used as the training dataset. We performed 10 simulations (i.e., the experiments are repeated 10 times). Then, all these results are averaged into a single estimation result. Six existing machine learning algorithms are chosen from different categories and used with the 10-fold cross-validation method to evaluate the variation in accuracy and efficiency.
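The fold construction described above can be sketched in a few lines. This is a simplified illustration and not the Weka procedure used in the experiment: it assigns samples to folds in round-robin order and omits shuffling and stratification, and the function names are hypothetical.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k folds; return (train, test) index lists per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Fold i takes every k-th sample starting at i, so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # The remaining k - 1 folds together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validated_accuracy(fold_accuracies: List[float]) -> float:
    """Average the per-fold accuracies into a single estimate."""
    return sum(fold_accuracies) / len(fold_accuracies)


splits = k_fold_indices(25, k=10)
```

Each sample appears in exactly one testing fold, so every sample is used for testing once and for training nine times.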

4.1. Dataset. The features are extracted from more than 10,000 Android malware samples collected from Android malware repositories including VirusShare [41], AndroZoo [42], Droid screening [43], and Reveal droid [44]. The extracted features comprise 76 features of Android components, 31 features of API counts, 82 features of API usage actions, 421 features of security-sensitive flows, 6 features of hardware components, 109 features of intents, 82 features of permissions, 190 features of malicious shell commands and strings, 19 features of content visual, and 49 features of URLs. Thus, there are 1,065 features in total. The information on the feature sets used in this experiment is shown in Table 3.
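As a quick arithmetic check, the ten per-category counts listed above can be summed. The dictionary name is introduced here only for illustration; the counts are the ones stated in the text.

```python
# Number of features per category, as listed in Section 4.1.
feature_counts = {
    "Android components": 76,
    "API counts": 31,
    "API usage actions": 82,
    "Security-sensitive flows": 421,
    "Hardware components": 6,
    "Intents": 109,
    "Permissions": 82,
    "Shell commands and strings": 190,
    "Content visual": 19,
    "URLs": 49,
}
total_features = sum(feature_counts.values())
```

The sum agrees with the stated total of 1,065 features.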

Table 2: Case description for mobile phishing detection system.

| No. | Name | Value |
|---|---|---|
| 1 | Case ID | Case identification number |
| 2 | Feature pattern | Combination of feature sets |
| 3 | Ensemble methods of classifiers (or) classification algorithm | Boosting/bagging/Bayesian (or) algorithm name and their specific parameters |
| 4 | Accuracy | Percentage of correct classification |
| 5 | Performance | Runtime (seconds) |

4.2. Machine Learning Classifiers. To detect and classify the phishing applications, different machine learning classification techniques are used with an adaptive method. An adaptive classification system is proposed to automatically choose a combination of suitable classifiers for the extracted features of an active Android application. Various machine learning techniques were used as classifiers in existing works [31, 32, 34, 35]. Among them, six algorithms were selected from different categories to cover classifiers of different natures. The six algorithms are C4.5 (J48), decision table (DT), k-nearest neighbors (IBK), logistic regression (LR), naive Bayes (NB), and support vector machine (SVM). According to the pretesting of the effect of parameters on these classifiers [45], the naive Bayes (NB) classifier with the supervised discretization function, logistic regression (LR) with the default maximum number of iterations, the J48 classifier with a confidence factor of 0.5 for tree pruning, and a 1-nearest neighbor (IBK) classifier are chosen for our experiment. The SVM and decision table classifiers are used with their default parameters.

Figure 4: Case defining process (define a new case and store in case base).
[Diagram labels: input feature pattern; ML algorithm 1, ML algorithm 2, ..., ML algorithm n; result 1, result 2, ..., result n; take the maximum accuracy; final result; define and store a case in case base.]

Figure 5: Adaptive classification.
[Diagram labels: target APK; feature extraction; decision making; add feature; retrieve and reuse; classifier 1, classifier 2, ..., classifier n; decision making; display result.]

Table 3: Feature sets.

| No. | Feature sets | Number of features | Example features |
|---|---|---|---|
| 1 | Android components | 76 | android.media, android.media.effect, android.media.audiofx, android.service.textservice, android.service.notification |
| 2 | API counts | 31 | account_information, account_settings, audio, bluetooth, bluetooth_information |
| 3 | API usage actions | 82 | android.util, android.widget, android.renderscript, android.webkit, android.os, android.os.storage, android.content |
| 4 | Security-sensitive flows | 421 | system_settings____audio, system_settings____phone_connection, system_settings____voip, system_settings____database_information |
| 5 | Hardware components | 6 | android.hardware.display, android.hardware, android.hardware.usb, android.hardware.location, android.hardware.input |
| 6 | Intent_action | 109 | action_main, action_view, action_default, action_attach_data, action_edit, action_insert_or_edit |
| 7 | Permission | 82 | android.permission.access_cache_filesystem, android.permission.access_checkin_properties, android.permission.access_coarse_location, android.permission.access_gps |
| 8 | Shell_command_strings | 190 | runtimeexec, createSubprocess, cipher-classes, longstring, SecretKey, methodinvoke, small_code_size |
| 9 | Content_visual | 19 | HostnameLength, PathLength, QueryLength, DoubleSlashInPath, NumSensitiveWords, EmbeddedBrandName, PctExtHyperlinks |
| 10 | URLs | 49 | having_ip_address, url_length, shortining_service, having_at_symbol, double_slash_redirecting, prefix_suffix |
| | Total | 1,065 | |

4.3. Experimental Results and Analysis. The accuracy comparison of the six classifiers on the 10 feature sets is shown in Table 4. The italicized values in Table 4 represent the maximum detection accuracy among the six classifiers for each feature set. It can be seen that the accuracy of each classification algorithm depends on the features. IBK provides the best accuracy on 6 feature sets, and J48 provides the best accuracy on the other 4. Our work aims to detect mobile phishing in a feature-independent manner with various classifiers. To mimic real-world applications, random feature combinations are created, because a new Android application can consist of any combination of features. In this experiment, 5 random combinations of features are created, as shown in Table 5.

These 5 feature combination patterns are tested with the six individual classifiers and three ensemble classifier models to develop cases for our adaptive model. Each model is an ensemble of the six classifiers with a different method of producing the final answer. The final-answer methods of the ensemble classifiers are the average of probabilities (AVG), majority voting (MAJ), and maximum probability (MAX). The detection results for the 5 scenarios of random feature combination sets with the six base classifiers and three ensemble classifiers are given in Table 6. The italicized values in Table 6 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases.
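The three final-answer methods differ only in how the per-classifier class probabilities are combined. The following sketch illustrates that difference under stated assumptions: each base classifier is assumed to output a probability distribution over the classes (index 0 for benign, index 1 for phishing), and the function names and the example distributions are hypothetical.

```python
from collections import Counter
from typing import Sequence


def average_of_probabilities(dists: Sequence[Sequence[float]]) -> int:
    """Pick the class whose probability, averaged over the classifiers, is highest."""
    n_classes = len(dists[0])
    avg = [sum(d[c] for d in dists) / len(dists) for c in range(n_classes)]
    return avg.index(max(avg))


def majority_voting(dists: Sequence[Sequence[float]]) -> int:
    """Each classifier votes for its most probable class; the most common vote wins."""
    votes = [list(d).index(max(d)) for d in dists]
    return Counter(votes).most_common(1)[0][0]


def maximum_probability(dists: Sequence[Sequence[float]]) -> int:
    """Pick the class that receives the single highest probability from any classifier."""
    n_classes = len(dists[0])
    best = [max(d[c] for d in dists) for c in range(n_classes)]
    return best.index(max(best))


# Hypothetical outputs of three classifiers: [P(benign), P(phishing)].
dists = [[0.55, 0.45], [0.60, 0.40], [0.05, 0.95]]
avg_answer = average_of_probabilities(dists)
maj_answer = majority_voting(dists)
max_answer = maximum_probability(dists)
```

In this example two weakly confident classifiers outvote one strongly confident classifier under majority voting (class 0), whereas the average and maximum rules both follow the confident classifier (class 1). This is one way in which the three ensembles can reach different accuracies on the same feature pattern.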

According to the results shown in Table 6, some feature patterns are more suitable for ensemble techniques, while some are better used with individual classification techniques. It can be concluded that the accuracy of classification techniques in mobile phishing detection relies heavily on the input features.

The adaptive method used in our model will choose the most suitable classification approach for a set of input features. Based on the results presented in Table 6, we can develop the cases to be stored in the case base for an adaptive choice of suitable classifiers. The tentative cases for building our case-based phishing detection model are shown in Table 7.

Performing the classification process on such a large number of features takes a long runtime. The comparison of the runtime to build the detection model with the 6 base classifiers and 3 ensemble approaches, before selecting the features, is shown in Table 8.

To reduce the detection time, some features may be omitted because they may not have a high impact on the result. Therefore, some experiments are conducted to select a set of effective features in order to reduce the number of required features.

4.4. Selecting the Features. Feature selection is necessary to reduce the dimension of the feature space. To obtain the benefits of performing feature selection on a large dataset, such as reducing overfitting, improving accuracy, and reducing processing time, two feature selection techniques are performed in this experiment, and their results are compared to get the optimized results. The process of selecting the features can be described by the following steps.

Let $U$ be the universe of feature sets, $U = \{D_1, D_2, \ldots, D_v\}$, and let the dataset $D_i \in U$ with $v$ attributes $A$ be $D_i = \{A_1, A_2, \ldots, A_v\}$. Then, the attributes can be grouped into a feature group $FG_i$ as $FG_i = \{A_a, A_b, \ldots, A_n\}$. An attribute evaluation is performed on the worth of each attribute, and the selected attributes become a selected feature set $FS_i = \{A_a, A_b, \ldots, A_m\}$, where $FS_i \subseteq FG_i$.

Two feature selection techniques are used in this experiment to confirm the advantages of selecting the features in phishing detection. The first method is a correlation-based feature selection with a ranker search method that evaluates each attribute and lists the results in ranked order. The worth of each attribute is evaluated by measuring the (Pearson's) correlation between it and the class [46].

Pearson's correlation coefficient is described in equation (1), where all variables have been standardized. The correlation between a composite and a class label is a function of the number of component variables (attributes) in the composite and the magnitude of the intercorrelations among them, together with the magnitude of the correlations between the attributes and the class label.

If the correlation between each of the attributes in a test and the class label is known, and the intercorrelation between each pair of attributes is given, then the correlation between a composite test consisting of the summed attributes and the class label can be predicted from the following equation:

$$ r_{zc} = \frac{k\, r_{zi}}{\sqrt{k + k(k-1)\, r_{ii}}}, \tag{1} $$

where $r_{zc}$ is the correlation between the summed attributes and the class label, $k$ is the number of attributes, $r_{zi}$ is the average of the correlations between the attributes and the class label, and $r_{ii}$ is the average intercorrelation between attributes.
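To make the behavior of equation (1) concrete, the following sketch evaluates it for a few illustrative inputs. The function and argument names are introduced here for illustration, and the numeric inputs are hypothetical rather than values measured on the dataset.

```python
import math


def composite_correlation(k: int, r_zi: float, r_ii: float) -> float:
    """Equation (1): correlation between k summed attributes and the class label."""
    return (k * r_zi) / math.sqrt(k + k * (k - 1) * r_ii)


# Four attributes, each correlated 0.5 with the class on average.
uncorrelated = composite_correlation(4, 0.5, 0.0)  # attributes independent of each other
redundant = composite_correlation(4, 0.5, 1.0)     # attributes fully intercorrelated
single = composite_correlation(1, 0.3, 0.7)        # a single attribute
```

With mutually uncorrelated attributes the composite correlation rises to 1.0, whereas fully redundant attributes add nothing beyond a single attribute (0.5). For one attribute the formula reduces to that attribute's own class correlation.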

Table 4: Accuracy comparison of classifiers on 10 features.

| Feature sets | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) |
|---|---|---|---|---|---|---|
| 1. Android components | 93.23 | 89.02 | 93.40 | 90.16 | 84.67 | 87.95 |
| 2. API count | 95.85 | 93.02 | 95.66 | 91.90 | 89.20 | 85.25 |
| 3. API usage_actions | 95.20 | 91.86 | 95.32 | 91.97 | 89.02 | 91.24 |
| 4. Flow | 93.05 | 91.03 | 93.32 | 87.18 | 87.45 | 83.17 |
| 5. Hardware components | 89.00 | 89.06 | 89.12 | 89.06 | 89.02 | 89.06 |
| 6. Intent_action | 86.89 | 85.73 | 87.13 | 84.64 | 83.75 | 85.53 |
| 7. Permission | 94.30 | 91.92 | 94.65 | 93.95 | 88.54 | 94.14 |
| 8. Shell_command_strings | 75.40 | 71.18 | 74.08 | 70.28 | 68.74 | 70.22 |
| 9. Content_visual | 97.20 | 95.79 | 95.53 | 94.49 | 95.77 | 93.87 |
| 10. URLs | 96.03 | 93.24 | 97.18 | 93.99 | 92.98 | 93.80 |

We get the ranked attributes listed with their corresponding class correlation. Attributes with no or low class correlation are eliminated. The resulting reduced feature sets are shown in Table 9.

The second method is an information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated by using the following equations. In the attribute evaluation process, the I index measures the impurity of $D$, a data partition or a set of training tuples, and is calculated using

$$ I(D) = 1 - \sum_{i=1}^{m} p_i^2, \tag{2} $$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. The sum is computed over $m$ classes. The I index considers a binary split for each attribute. First, the case where $A$ is a discrete-valued attribute having $v$ distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in $D$ is considered. The expected information provided by that split is calculated by

$$ I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j). \tag{3} $$

In this equation, $D_j$ represents the observations that contain the $j$th value of attribute $A$. The information gain of a binary split on attribute $A$ is calculated by

$$ \mathrm{Gain}(A) = I(D) - I_A(D). \tag{4} $$

The information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation for split information is provided in

$$ \mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right). \tag{5} $$

This value represents the potential information generated by splitting the training dataset $D$ into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$. The gain ratio is defined in

$$ \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}. \tag{6} $$

The attribute with the maximum gain ratio is selected as the highest ranked attribute. The low-ranked attributes that provide a gain ratio of less than 0.0003 are eliminated. After performing the two feature selection techniques on the dataset, the reduced feature sets are generated as shown in Table 9.
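Equations (2) to (6) and the elimination step can be traced on a toy dataset. The sketch below follows the equations as given in this section; it is not the Weka evaluator used in the experiment. The function names and the four-sample dataset are hypothetical, and the cutoff is exposed as a parameter whose default is the threshold quoted above, read here as 0.0003.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def impurity(labels: Sequence[Hashable]) -> float:
    """Equation (2): I(D) = 1 - sum over classes of p_i squared."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def partition(values: Sequence[Hashable], labels: Sequence[Hashable]) -> List[list]:
    """Group the class labels by the attribute value of each tuple."""
    parts: Dict[Hashable, list] = {}
    for value, label in zip(values, labels):
        parts.setdefault(value, []).append(label)
    return list(parts.values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Equations (3) to (6) for one discrete-valued attribute."""
    n = len(labels)
    parts = partition(values, labels)
    expected = sum(len(p) / n * impurity(p) for p in parts)                # eq. (3)
    gain = impurity(labels) - expected                                     # eq. (4)
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)   # eq. (5)
    # A constant attribute has zero split information; score it as 0.
    return gain / split_info if split_info > 0 else 0.0                    # eq. (6)


def select_attributes(columns: Dict[str, Sequence[Hashable]],
                      labels: Sequence[Hashable], threshold: float = 0.0003) -> List[str]:
    """Keep attributes whose gain ratio reaches the threshold, highest ranked first."""
    scored = {name: gain_ratio(vals, labels) for name, vals in columns.items()}
    kept = [name for name, score in scored.items() if score >= threshold]
    return sorted(kept, key=scored.get, reverse=True)


# Hypothetical toy data: one informative, one uninformative, one constant attribute.
labels = ["phishing", "phishing", "benign", "benign"]
columns = {"informative": [1, 1, 0, 0], "uninformative": [1, 0, 1, 0], "constant": [1, 1, 1, 1]}
selected = select_attributes(columns, labels)
```

On this toy data the informative attribute separates the two classes perfectly and obtains a gain ratio of 0.5, while the other two attributes score 0 and are eliminated.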

The same detection experiments are conducted with the 9 classification approaches on each selected feature set. The detection results of the 5 cases on the selected feature sets are given in Tables 10 and 11. In this experiment, the 9 classification approaches and their related parameters are set up in the same way as in the previous experiments (described in Section 4.2).

According to the results of the reduced datasets with the correlation attribute evaluation method shown in Table 10, the classification approaches with the best detection accuracy are slightly changed in 2 cases (feature patterns 3 and 4). Feature pattern 3 is a combination of API count, API usage, Intent, and Hardware. The italicized values in Table 10 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. For feature pattern 3, the highest detection accuracy is now provided by the ensembles with the AVG and MAJ final answer methods, whereas it is provided by the ensemble with the AVG final answer method when the full feature set is used. The detection accuracy is slightly increased for most classifiers in feature pattern 4, which is a combination of Flows and Intents features.

Table 5: Scenarios for random combinations of features.

| Case ID | Feature pattern | Combination of feature sets | Number of features |
|---|---|---|---|
| 01 | Pattern 1 | API count + API usage + hardware | 112 |
| 02 | Pattern 2 | API count + intent | 139 |
| 03 | Pattern 3 | API count + API usage + intent + hardware | 220 |
| 04 | Pattern 4 | Flow + intent | 529 |
| 05 | Pattern 5 | Flow + intent + API usage + hardware | 610 |

Table 6: Detection accuracy of 5 scenarios on randomly combined feature patterns.

| Case ID | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) | AVG (%) | MAJ (%) | MAX (%) |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.93 | 93.07 | 95.45 | 92.47 | 89.42 | 91.62 | 95.31 | 95.31 | 92.87 |
| 02 | 94.72 | 91.62 | 94.04 | 90.18 | 86.44 | 89.27 | 94.26 | 94.20 | 91.38 |
| 03 | 96.32 | 92.67 | 95.60 | 94.89 | 90.69 | 92.57 | 96.43 | 96.41 | 94.31 |
| 04 | 90.56 | 86.38 | 90.45 | 88.51 | 81.55 | 87.88 | 90.64 | 90.64 | 88.52 |
| 05 | 95.33 | 89.69 | 94.37 | 93.97 | 92.28 | 91.61 | 95.68 | 95.69 | 92.68 |

Table 7: Tentative cases for mobile phishing detection system.

| Case ID | Feature pattern | Adaptive method | Accuracy (%) | Run time (seconds) |
|---|---|---|---|---|
| 1 | Pattern 1 | J48 | 95.93 | 4.43 |
| 2 | Pattern 2 | J48 | 94.72 | 4.54 |
| 3 | Pattern 3 | AVG | 96.43 | 95.18 |
| 4 | Pattern 4 | AVG, MAJ | 90.64 | 174.4 & 174.6 |
| 5 | Pattern 5 | MAJ | 95.69 | 205.50 |

According to the results shown in Table 11 for the reduced datasets with the information gain attribute evaluation method, the detection accuracy is increased in 4 cases (feature patterns 1, 3, 4, and 5). The italicized values in Table 11 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. Moreover, the classification approaches that produce the best detection accuracy are changed in 3 cases (feature patterns 3, 4, and 5). That is, the ensemble with the AVG final answer method provides the best accuracy for feature patterns 3, 4, and 5.

The detection accuracy percentages of the 5 cases using different algorithms are compared in Figure 6. This figure presents the detection results from Tables 6, 10, and 11. Each case is shown in 3 situations: no feature selection, after correlation attribute evaluation feature selection, and after information gain attribute evaluation feature selection. There are 15 points in the figure, representing the 5 cases under the 3 conditions. The best classifier for case 01 and case 02 is the J48 classifier, while the ensemble classifier AVG is the best one for case 03, case 04, and case 05. The cases with the best algorithm are used in the case-based reasoning detection method.

To highlight the performance of the feature selection techniques, the runtime results of the reduced feature sets are collected as given in Tables 12 and 13. The information gain attribute evaluation method results in a larger number of features than the correlation attribute evaluation method. The runtime of the information gain attribute evaluation method is also slightly larger than that of the correlation attribute evaluation method.

The runtimes of the 5 cases after selecting the features are shown in Figure 7. This figure compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure, representing the 5 cases under the 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied to our feature sets to improve the accuracy and efficiency of our model. The detection accuracy on 4 feature patterns is improved, as shown in Table 11, while the detection performance on all feature patterns is improved, as shown in Table 13. Table 14 shows the comparison of accuracy and efficiency between the full feature sets and the reduced feature sets for our proposed adaptive model. The italicized values in Table 14 represent the accuracy values that are improved when a reduced feature set is used, compared with their counterparts when the full feature set is used.

e phishing malware detection task is an imbalancedclassification problem at is there are two classes to be

Table 8 Runtime comparison of 5 scenarios on 9 classification approaches (in seconds)

Case ID J48 DT IBK LR NB SVM AVG MAJ MAX01 443 1863 001 193 064 301 2987 2952 261602 454 3180 00001 257 066 3924 7620 7635 759003 944 5880 001 722 114 1894 9518 9609 974104 1209 14832 00001 528 139 625 1744 1746 1746105 1709 16714 001 786 193 361 20362 20550 20351

Table 9: Information of selected feature sets for 5 cases.

| Case ID | Feature combination pattern | Features before feature selection | Features selected by Pearson’s correlation | Features selected by information gain |
|---|---|---|---|---|
| 01 | Pattern 1 | 112 | 96 | 100 |
| 02 | Pattern 2 | 139 | 114 | 120 |
| 03 | Pattern 3 | 220 | 180 | 185 |
| 04 | Pattern 4 | 529 | 164 | 265 |
| 05 | Pattern 5 | 610 | 227 | 250 |

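Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.87 | 93.10 | 95.45 | 92.47 | 89.42 | 91.61 | 95.31 | 95.32 | 92.82 |
| 02 | 94.68 | 91.53 | 94.04 | 90.18 | 86.44 | 89.28 | 94.24 | 94.18 | 91.33 |
| 03 | 96.37 | 92.70 | 95.60 | 94.90 | 90.69 | 92.57 | 96.38 | 96.38 | 94.37 |
| 04 | 90.73 | 86.51 | 90.45 | 88.51 | 81.55 | 87.89 | 90.73 | 90.72 | 88.64 |
| 05 | 95.38 | 89.54 | 94.37 | 93.96 | 92.28 | 91.61 | 95.68 | 95.69 | 92.72 |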

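Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.96 | 93.10 | 95.62 | 92.54 | 89.42 | 91.66 | 95.37 | 95.37 | 92.83 |
| 02 | 94.66 | 91.53 | 94.16 | 90.15 | 86.44 | 89.19 | 94.19 | 94.12 | 91.30 |
| 03 | 96.38 | 92.59 | 95.55 | 95.02 | 90.69 | 92.61 | 96.45 | 96.45 | 94.34 |
| 04 | 90.52 | 86.36 | 90.24 | 88.70 | 81.55 | 87.86 | 90.77 | 90.76 | 88.69 |
| 05 | 95.46 | 89.44 | 94.50 | 93.98 | 92.28 | 91.56 | 95.80 | 95.79 | 92.82 |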

10 Journal of Computer Networks and Communications

identified, phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class “phishing” is greatly outnumbered by the negative class. These types of problems are examples of the fairly common situation in data science where accuracy is not a good measure for assessing model performance. Intuitively, proclaiming all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.
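As a worked illustration with made-up numbers (not taken from the dataset in this work), suppose a test set holds 10,000 apps, of which 500 are phishing. A detector that labels every app as benign is correct on all 9,500 benign apps and on none of the phishing apps, so

$$
\text{accuracy} = \frac{0 + 9500}{10000} = 95\%, \qquad \text{share of phishing apps detected} = \frac{0}{500} = 0\%.
$$

A high accuracy can therefore coexist with a detector that never finds a phishing app, which is why measures focused on the positive class are also needed.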

In order to assess the effectiveness of our proposed model, an evaluation based on the confusion matrix is applied using accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances predicted as positive by our model that are actually positive. The following formulas give their definitions:

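$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN},\\
\text{Precision} &= \frac{TP}{TP + FP},\\
\text{Sensitivity} &= \frac{TP}{TP + FN}.
\end{aligned}
\tag{7}
$$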

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. (Plot: accuracy (%) from 80 to 100 on the vertical axis; cases 01 to 05, each under no feature selection, correlation, and info gain, on the horizontal axis; series J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX.)

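Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 3.94 | 18.36 | 0.01 | 1.97 | 0.64 | 3.38 | 27.45 | 26.92 | 27.18 |
| 02 | 3.84 | 25.35 | 0.0001 | 2.43 | 0.50 | 39.46 | 72.07 | 72.12 | 71.19 |
| 03 | 8.06 | 45.85 | 0.01 | 7.20 | 1.03 | 19.55 | 83.10 | 83.52 | 83.34 |
| 04 | 5.60 | 44.75 | 0.0001 | 5.15 | 0.56 | 6.27 | 61.95 | 61.99 | 62.00 |
| 05 | 8.84 | 69.88 | 0.0001 | 7.65 | 1.00 | 3.20 | 90.23 | 90.06 | 90.45 |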

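Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 4.05 | 20.25 | 0.01 | 1.63 | 0.54 | 3.02 | 29.02 | 27.49 | 27.27 |
| 02 | 3.86 | 29.43 | 0.001 | 2.35 | 0.56 | 31.96 | 67.63 | 66.52 | 66.40 |
| 03 | 9.77 | 55.46 | 0.01 | 6.31 | 0.95 | 17.06 | 87.31 | 90.63 | 90.69 |
| 04 | 6.83 | 87.36 | 0.01 | 2.25 | 0.93 | 6.76 | 102.86 | 93.21 | 93.15 |
| 05 | 8.42 | 104.52 | 0.001 | 5.37 | 1.21 | 3.95 | 111.80 | 107.53 | 108.07 |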


True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix shown in Figure 8.
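The three measures can be computed directly from the four counts. The sketch below is illustrative only: the function names are ours, the labels are assumed to be encoded as 1 for phishing and 0 for benign, and the counts in the usage line are made up rather than taken from the experiments.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for labels encoded as 1 (phishing) and 0 (benign)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and sensitivity from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision is undefined when nothing is predicted positive; report 0.0.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Sensitivity is undefined when there are no actual positives; report 0.0.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up counts for illustration.
print(confusion_metrics(tp=80, fp=20, tn=880, fn=20))
```

Returning 0.0 for an undefined ratio is a design choice of this sketch; raising an error or returning NaN would be equally defensible.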

The effectiveness of our proposed model in terms of accuracy, precision, and sensitivity is reported in Table 15. According to these results, our adaptive model achieves good detection accuracy on the phishing features. Meanwhile, all the selected classifiers reach acceptable precision and sensitivity. Together with the previous experiments, these results indicate that our adaptive phishing detection model using case-based reasoning can perform well on diversely distributed features.

5. Conclusions

An adaptive mobile phishing detection model based on a variation of input feature patterns using a case-based reasoning (CBR) technique is proposed in this work. An experimental analysis is conducted to demonstrate the design decisions of our model and to verify the performance of our proposed model in handling the concept drift of mobile phishing attacks. The proposed model is evaluated with a large feature set that contains 1,065 features from 10 feature

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. (Plot: runtime (seconds) from 0 to 200 on the vertical axis; cases 01 to 05, each under no feature selection, correlation, and info gain, on the horizontal axis; series J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX.)

Table 14: Accuracy and efficiency of proposed adaptive model.

| Case ID | Adaptive (before) | Adaptive (after) | Accuracy (before) | Accuracy (after) | Runtime (before) | Runtime (after) |
|---|---|---|---|---|---|---|
| 01 | J48 | J48 | 95.93 | 95.96 | 4.43 | 4.05 |
| 02 | J48 | J48 | 94.72 | 94.66 | 4.54 | 3.86 |
| 03 | AVG | AVG, MAJ | 96.43 | 96.45 | 95.18 | 87.31 & 90.63 |
| 04 | AVG, MAJ | AVG | 90.64 | 90.77 | 174.4 & 174.6 | 102.86 |
| 05 | MAJ | AVG | 95.69 | 95.80 | 205.50 | 111.80 |

| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | TN | FP |
| Actual positive | FN | TP |

Figure 8: Confusion matrix.

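Table 15: Detection results achieved by the proposed model.

| Case | Classifier | Accuracy (%) | Precision (%) | Sensitivity (%) |
|---|---|---|---|---|
| 01 | J48 | 95.96 | 83 | 79 |
| 02 | J48 | 94.66 | 87 | 86 |
| 03 | AVG | 96.45 | 92 | 75 |
| 04 | AVG | 90.77 | 84 | 62 |
| 05 | AVG | 95.80 | 90 | 74 |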


groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories to cover the different natures of classification across the diverse feature sets. Three ensembles of the six base classifiers are used, each of which uses a different method of finding the final answer: average, majority voting, and maximum. In total, there are 9 classifiers. Because of the large number of features involved in the dataset and the use of multiple classifiers, efficiency degrades. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and Pearson’s correlation coefficient attribute evaluation method. By using a case-based reasoning approach to select the most suitable classifier for the incoming features, the proposed mobile phishing detection model could provide an accuracy improvement with an acceptable runtime increment.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand’s Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.

References

[1] W. Paul, H. A. Manolian, and S. Lapper, “Thinking digital in industry 4.0,” Deloitte Insights, September 2018, https://www2.deloitte.com/insights/us/en/focus/industry-4-0/digital-leaders-in-manufacturing-fourth-industrial-revolution.html.

[2] “Spam and phishing in Q2 2018,” Securelist-Kaspersky Lab’s Cyberthreat Research and Reports, 2018.

[3] Proofpoint Security Awareness, “2019 state of the phish report,” March 2019, https://www.wombatsecurity.com/state-of-the-phish.

[4] L. Wu, X. Du, and J. Wu, “Effective defense schemes for phishing attacks on mobile computing platforms,” IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6678–6691, 2016.

[5] M. Moghimi and A. Y. Varjani, “New rule-based phishing detection method,” Expert Systems with Applications, vol. 53, pp. 231–242, Jul. 2016.

[6] Baunfire.com and SparkCMS, “APWG phishing attack trends report-4Q 2018,” Anti-Phishing Working Group, March 2019, https://www.antiphishing.org/resources/apwg-reports/.

[7] R. Basnet, S. Mukkamala, and A. H. Sung, “Detection of phishing attacks: a machine learning approach,” in Soft Computing Applications in Industry, B. Prasad, Ed., pp. 373–383, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[8] A. K. Jain and B. B. Gupta, “Comparative analysis of features based machine learning approaches for phishing detection,” in Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 2125–2130, New Delhi, India, March 2016.

[9] F. Toolan and J. Carthy, “Phishing detection using classifier ensembles,” in Proceedings of the 2009 eCrime Researchers Summit, pp. 1–9, Tacoma, WA, USA, October 2009.

[10] H. S. Hota, A. K. Shrivas, and R. Hota, “An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique,” Procedia Computer Science, vol. 132, pp. 900–907, 2018.

[11] A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles, ResearchGate, Berlin, Germany, 2019, https://www.researchgate.net/publication/325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles.

[12] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, “Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers,” Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.

[13] A. Aleroud and L. Zhou, “Phishing environments, techniques, and countermeasures: a survey,” Computers and Security, vol. 68, pp. 160–196, 2017.

[14] H. Shahriar, T. Klintic, and V. Clincy, “Mobile phishing attacks and mitigation techniques,” Journal of Information Security, vol. 6, no. 3, pp. 206–212, 2015.

[15] T. M. Mahmoud and A. M. Mahfouz, “SMS spam filtering technique based on artificial immune system,” International Journal of Computer Science Issues, vol. 9, no. 1, pp. 589–597, 2012.

[16] J. W. Yoon, H. Kim, and J. H. Huh, “Hybrid spam filtering for mobile communication,” Computers and Security, vol. 29, no. 4, pp. 446–459, 2010.

[17] C. H. Hsu, P. Wang, and S. Pu, “Identify fixed-path phishing attack by STC,” in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 172–175, Perth, Australia, September 2011.

[18] E. Medvet, E. Kirda, and C. Kruegel, “Visual-similarity-based phishing detection,” in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, September 2008.

[19] A. P. Felt and D. Wagner, Phishing on Mobile Devices, University of California, Berkeley, CA, USA, 2011.

[20] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna, “What the app is that? Deception and countermeasures in the android user interface,” in Proceeding of the 2015 IEEE Symposium on Security and Privacy, pp. 931–948, San Jose, CA, USA, May 2015.

[21] C. Marforio, R. J. Masti, C. Soriente, K. Kostiainen, and S. Capkun, “Personalized security indicators to detect application phishing attacks in mobile platforms,” February 2015, http://arxiv.org/abs/1502.06824.

[22] D. Liu, E. Cuervo, V. Pistol, R. Scudellari, and L. P. Cox, “ScreenPass: secure password entry on touchscreen devices,” in Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 291–304, Taipei, Taiwan, June 2013.

[23] D. Liu and L. P. Cox, “VeriUI: Attested Login for Mobile Devices,” in Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, Santa Barbara, CA, USA, February 2014.

[24] L. Wu, X. Du, and J. Wu, “MobiFish: A lightweight anti-phishing scheme for mobile phones,” in Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–8, Shanghai, China, August 2014.

[25] V. Mavroeidis and M. Nicho, “Quick response code secure: a cryptographically secure anti-phishing tool for QR code attacks,” in Computer Network Security, pp. 313–324, 2017.

[26] “Phishing detective-apps on Google Play,” March 2018, https://play.google.com/store/apps/details?id=com.rsoftr.android.phishingdetective.ads.

[27] G. Bottazzi, E. Casalicchio, D. Cingolani, F. Marturana, and M. Piu, “MP-Shield: A framework for phishing detection in mobile devices,” in Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1977–1983, Liverpool, UK, October 2015.

[28] M. M. Richter and R. O. Weber, Case-Based Reasoning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[29] S. Craw, N. Wiratunga, and R. C. Rowe, “Learning adaptation knowledge to improve case-based reasoning,” Artificial Intelligence, vol. 170, no. 16-17, pp. 1175–1192, Nov. 2006.

[30] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, “Case-based reasoning systems in the health sciences: a survey of recent Trends and developments,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 4, pp. 421–434, Jul. 2011.

[31] S. Arzt, “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,” in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269, New York, NY, USA, June 2014.

[32] L. Li, A. Bartel, T. F. Bissyande et al., “IccTA: detecting inter-component privacy leaks in android apps,” in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 280–291, Piscataway, NJ, USA, May 2015.

[33] “Obfuscation-resilient, efficient, and accurate detection and family identification of android malware - Semantic Scholar,” March 2018, httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23

[34] W. Enck, “TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones,” Transactions on Computer Systems, vol. 32, no. 2, 2014.

[35] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, “Information-flow analysis of android applications in DroidSafe,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, February 2015.

[36] D. Arp, M. Spreitzenbarth, H. Gascon, and K. Rieck, “Drebin: effective and explainable detection of android malware in your pocket,” in Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, February 2014.

[37] N. Peiravian and X. Zhu, “Machine learning for android malware detection using permission and API calls,” in Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305, Herndon, VA, USA, November 2013.

[38] V. Avdiienko, K. Kuznetsov, A. Gorla et al., “Mining apps for abnormal usage of sensitive data,” in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 426–436, Florence, Italy, May 2015.

[39] H. V. Nath and B. M. Mehtre, “Static malware analysis using machine learning methods,” in Recent Trends in Computer Networks and Distributed Systems Security, pp. 440–450, 2014.

[40] N. Abura’ed, H. Otrok, R. Mizouni, and J. Bentahar, “Mobile phishing attack for Android platform,” in Proceedings of the 2014 10th International Conference on Innovations in Information Technology (IIT), pp. 18–23, Abu Dhabi, UAE, November 2014.

[41] VirusShare.com, https://virusshare.com/.

[42] K. Allix, T. F. Bissyande, J. Klein, and Y. Le Traon, “Androzoo: collecting millions of android apps for the research community,” in Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471, Austin, TX, USA, May 2016.

[43] J. Yu, Q. Huang, and C. Yian, “DroidScreening: a practical framework for real-world Android malware analysis,” Security and Communication Networks, vol. 9, no. 11, pp. 1435–1449.

[44] Joshuaga/Revealdroid - Bitbucket, https://bitbucket.org/joshuaga/revealdroid/src/master/.

[45] S. Kyaw Zaw and S. Vasupongayya, “Revealing the important features of mobile phishing,” in Proceedings of the 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), pp. 222–226, Pattaya, Thailand, November 2018.

[46] M. A. Hall and L. A. Smith, “Feature subset selection: a correlation based filter approach,” Progress in Connectionist-based Information Systems, vol. 2, pp. 855–858, 1997.


main objective of this work is to enhance the phishing detection processes, the detection will be performed on the static and dynamic features from the Android malware dataset (described in Section 4.1). The detailed process of feature extraction is out of the scope of this paper.

The contribution of our work starts when the detection system receives the extracted features. The first process is to retrieve the most similar case from the case base (which stores previous Android phishing detection approaches along with the corresponding features). The case-retrieving process will be described in Section 3.3. The case base must be set up before the case-retrieving process. The case base setup process is shown in Figure 3. The details of the case base setup process are presented in the following section.

According to the retrieved case, the most suitable classification techniques will be used for the adaptive classification. If the feature set extracted from the Android application does not match the sets of features stored in the case base, the adaptive classification will select suitable methods to process the extracted feature set according to the similarity ratio score. The selection of suitable methods means choosing multiple classifiers for the extracted feature set. Finally, the final result for the active Android application will be sent to the application on the Android smartphone to be displayed to the user.

3.1. Feature for Mobile Phishing. Existing antiphishing solutions for mobile environments were collected, and the features they use to identify a phishing attack were extracted. In an Android environment, the features can be extracted from miscellaneous sources such as program entities and the program outputs of runtime monitoring. The features frequently used by existing antiphishing solutions can be classified into ten classes: Android components, Android API counts, API usage actions, security-sensitive

Figure 1: Case-based reasoning approach. (Diagram: a new problem is defined; similar stored cases are retrieved from the case base; the retrieved case is reused to give a suggested solution; the solution is evaluated and revised into a corrected solution; the solved case is retained as a new case added to the case base.)

Figure 2: Overall system design. (Diagram: on the Android smartphone, features are extracted from the Android application and the result is shown; in the cloud, case selection and reuse against the case base feed the adaptive classification and its classifiers.)

Table 1: Legitimate behaviors and their respective phishing attack techniques.

| | Legitimate behavior | Respective attack techniques |
|---|---|---|
| Mobile sender | Social sharing, upgrades, game credits, opening a target in the browser, send user to embedded http page in browser that links to https login | Fake mobile login screen, task interception, scheme squatting, keylogging, URL bar hiding/spoofing, fake browser using active network attack plus URL bar spoofing |
| Web sender | Link to mobile e-mail or Twitter, payment via PayPal or Google checkout, and user follows link from http to https | Website spoofs mobile app, task interception, scheme squatting, URL bar hiding/spoofing, active network attack plus URL bar spoofing |


data flows, hardware components, intent actions, permissions, shell commands and strings, contents and visual, and URLs. The details of each feature class are given below.

(1) Android components: a variety of component types with specific functionalities (e.g., components for providing GUIs and others for running background services) are declared within an Android app’s manifest, and these features are collected in [31–33].

(2) API count: the number of invocations of a specific Android API method (e.g., a malicious app accesses the location APIs twice and the telephony package 8 times) is collected in [4, 24, 27, 32].

(3) API usage actions: APIs can be used to develop applications on the Android platform and can also be misused for malicious purposes. There are many approaches to submit web requests and to exfiltrate the captured data via the API without the Internet permission. Some existing phishing detection works [27, 31, 32, 34] collect the API calls (e.g., API calls to access sensitive data, API calls to access network communications, API calls to send and receive SMS messages, API calls to execute external commands, and API calls frequently used for obfuscation).

(4) Security-sensitive data flows: a few approaches for Android malware detection [31, 34, 35] use data flows between security-sensitive Android interfaces to determine whether an app is malicious. Tracking this form of information is particularly useful for identifying privacy leaks.

(5) Hardware components: the hardware components used in the app are listed in AndroidManifest.xml (e.g., to access the camera, an app needs to include the android.hardware.camera feature), and these features are collected in [4, 36].

(6) Intent actions: Android malware is known to rely on tracking an Intent (e.g., whether a package is installed or whether a device has recently completed booting) to determine when to perform a malicious behavior. These features are used in [32, 36].

(7) Permission: specific permissions provided by Android to execute some risky operations are acquired by Android malware. These features are collected in [34, 37, 38].

(8) Shell command and strings: the features of strings of interest associated with malicious behaviors and potentially risky shell commands are collected in [36, 39]. Some of the structural attributes of the APK file, such as the size of the code, the presence of zip files and binary files, and related information, are also included in this feature group.

(9) Contents and visual: the main display channel for the deception of phishing is the web content, which expresses the intention of the website. These features consist of page elements such as the page title, the submitted form, and the contained links. Some researchers also extract the logo, the icon, and the contained pictures from the web page and use an image recognition algorithm to identify the phishing website [16–18].

(10) URLs: web link features for phishing fraud are collected based on five criteria: URL and Domain Identity, Security and Encryption, Source Code and JavaScript, Page Style and Contents, and Web Address Bar. These features are collected in [4, 13, 40].

3.2. Case Representation. A case represents an experience at an operational level. Typically, a case includes the problem specification, the solution, and sometimes the outcome. This is the most common representation used. However, more elaborate case representations can be employed. Depending on the information included in a case, different types of results can be achieved from the system. Cases that describe a problem and its solution can be used to derive solutions to new problems.

In general, a case specification is described as a set of features. The features are those aspects of the domain and the problem that are considered to be most significant in determining the solution and/or outcome. Since a case represents an experience, in this situation a case should represent the features of the application that are used to determine a phishing attack.

In our model, a case includes the combination of feature sets, the ensemble method of classifiers or the individual classification algorithm with its specific parameters, the accuracy and performance of the solution, and potential facilitations. A case description stored in the phishing detection system is shown in Table 2.

Figure 3: Setting up case base. (Diagram: a set of feature patterns is the input; similarities are calculated to retrieve the most similar case from the case base; the case is then reused, revised, and retained, with monitoring to complete the system and identification of potential exceptions in feature patterns.)


To define a new case in the case base, the input features have to pass through different machine learning classifiers, and the results from each classifier are calculated to produce the final result. Then, the input features, the classifiers with their parameters, the activation function, and the final result are stored in the case base (knowledge base) as a new case. The process of defining a new case to be stored in the case base is shown in Figure 4.
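The case-defining step can be sketched as follows, under stated assumptions: the case fields mirror the entries of Table 2, the per-classifier results are supplied by the caller (the classifiers themselves are not run here), the classifier with the maximum accuracy is kept, and ties are broken by the lower runtime. The tie-breaking rule, the field names, and the example values are ours, not the paper's.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """One stored case; fields follow the case description in Table 2."""
    case_id: str
    feature_pattern: frozenset            # combination of feature sets
    classifier: str                       # ensemble method or algorithm name
    parameters: dict = field(default_factory=dict)
    accuracy: float = 0.0                 # share of correct classifications
    runtime_seconds: float = 0.0          # performance


def define_case(case_id, feature_pattern, results, case_base):
    """Keep the classifier with the maximum accuracy and store it as a new case.

    `results` maps a classifier name to a dict with the keys
    "accuracy", "runtime", and "parameters".
    """
    # Highest accuracy wins; a lower runtime breaks ties (assumption of this sketch).
    best = max(results, key=lambda name: (results[name]["accuracy"],
                                          -results[name]["runtime"]))
    case = Case(case_id=case_id,
                feature_pattern=frozenset(feature_pattern),
                classifier=best,
                parameters=dict(results[best]["parameters"]),
                accuracy=results[best]["accuracy"],
                runtime_seconds=results[best]["runtime"])
    case_base.append(case)
    return case


# Hypothetical evaluation results for one feature pattern.
case_base = []
results = {
    "J48": {"accuracy": 0.95, "runtime": 4.0, "parameters": {"confidence_factor": 0.5}},
    "NB": {"accuracy": 0.89, "runtime": 0.6, "parameters": {}},
}
stored = define_case("01", {"permissions", "urls"}, results, case_base)
print(stored.classifier, stored.accuracy)
```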

3.3. Case Retrieval. Case-based reasoning (CBR) solves a new problem by retrieving previously solved problems and their solutions from a knowledge source of cases called the case base. There are challenges related to the retrieving process that still need to be addressed. One issue is the computation of similarity, which is particularly important during the retrieving process. The effectiveness of a similarity measurement is determined by the usefulness of a retrieved case in solving a new problem.

The aim of using the CBR approach is to select the past phishing detection cases that are most similar to the new problem. A set of similar cases is selected from the case base according to a similarity criterion that requires the specification of weights corresponding to attributes. The assessment of case similarity involves comparing the attribute values of the new case with those of the past cases stored in the case base. The retrieved old cases are ranked according to their similarity scores with respect to the attributes of the new case. In this work, the nearest neighbor method is applied to calculate the similarity score and the total similarity score of a potentially useful case.
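A weighted nearest-neighbor retrieval of this kind can be sketched as below. The sketch assumes that each attribute value is already normalized to the range [0, 1], that the local similarity of an attribute is one minus the absolute difference, and that an attribute missing from either case contributes zero. These choices, the attribute names, and the weights are illustrative assumptions; the exact similarity function is not given in this section.

```python
def case_similarity(new_attrs, old_attrs, weights):
    """Weighted average of per-attribute similarities, in the range [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    score = 0.0
    for name, weight in weights.items():
        if name in new_attrs and name in old_attrs:
            # Local similarity: 1 when the values agree, 0 when they are furthest apart.
            local = 1.0 - abs(new_attrs[name] - old_attrs[name])
            score += weight * local
    return score / total_weight


def retrieve(new_attrs, case_base, weights, top_k=1):
    """Rank stored cases by total similarity to the new case and return the best ones."""
    ranked = sorted(case_base,
                    key=lambda case: case_similarity(new_attrs, case["attributes"], weights),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical case base: each attribute marks the presence (1) or absence (0) of a feature group.
case_base = [
    {"case_id": "A", "attributes": {"permissions": 1, "urls": 1}, "classifier": "J48"},
    {"case_id": "B", "attributes": {"permissions": 0, "urls": 0}, "classifier": "AVG"},
]
weights = {"permissions": 2.0, "urls": 1.0}
new_case = {"permissions": 1, "urls": 0}
print(retrieve(new_case, case_base, weights)[0]["case_id"])
```

Dividing by the total weight keeps the score comparable across weightings, which makes it usable as the similarity ratio that decides whether a retrieved case is close enough to reuse.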

3.4. Adaptive Classification System Design. The main objective of case-based adaptive classification is to assign a suitable classification technique to the target case (a feature set extracted from an Android application) by identifying and analysing the similar training case (the sets of features stored in the case base). The proposed case-based adaptive classification is shown in Figure 5. If the feature set extracted from the active Android application does not match any set of features stored in the case base (meaning that the extracted feature set is not complete for the case-retrieving process), the adaptive classification will select suitable methods to process the extracted feature set. The selection of suitable methods has two options. First, the possible features are added to the extracted feature set in order to perform the case-retrieving process and to choose a suitable classifier. Second, multiple classifiers are selected to process the extracted incomplete feature set. Under the second option, the multiple answers resulting from the multiple classifiers are collected in order to produce a final answer by way of a weighted sum of all answers.
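The second option can be illustrated with a small sketch of a weighted sum over classifier answers. It assumes that each answer is a label (1 for phishing, 0 for benign) paired with a weight, for example the stored accuracy of the classifier that produced it, and that the normalized sum is compared with a threshold of 0.5. The weighting scheme and the threshold are assumptions made for illustration; they are not specified in this section.

```python
def weighted_vote(answers, threshold=0.5):
    """Combine (label, weight) answers into one decision by a normalized weighted sum.

    label is 1 for phishing and 0 for benign; weight is any non-negative number.
    """
    total_weight = sum(weight for _, weight in answers)
    if total_weight <= 0:
        raise ValueError("at least one answer must carry a positive weight")
    score = sum(label * weight for label, weight in answers) / total_weight
    decision = "phishing" if score >= threshold else "benign"
    return decision, score


# Hypothetical answers from three classifiers, weighted by their stored accuracy.
answers = [(1, 0.96), (0, 0.89), (1, 0.92)]
print(weighted_vote(answers))
```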

4. Detection Model and Evaluation

This section explains how our detection model performs adaptively on the combination of individual classifiers and ensemble classifiers. To verify that our proposed model can improve the accuracy of mobile phishing detection, an experiment is conducted using the feature sets described in Section 3.1. The experiment was conducted by running Weka 3.8 on a laptop computer with a Core i7 processor, 8 GB of RAM, and the Windows 8.1 64-bit operating system. The cross-validation method is used as the evaluation technique to estimate the error rate efficiently and in an unbiased way by running repeated percentage splits. Firstly, the dataset is divided into 10 pieces. Each piece is used as the testing dataset in turn, while the remaining 9 pieces together are used as the training dataset. We performed 10 simulations (i.e., the experiments are repeated 10 times). Then, all these results are averaged into a single estimation result. Six existing machine learning algorithms are chosen from different categories and used with the 10-fold cross-validation method to evaluate the variation in accuracy and efficiency.
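The 10-fold procedure described above can be sketched in a few lines. This is a plain, unstratified split written for illustration; it is not the Weka routine used in the experiments, and the `evaluate` callback is a placeholder that stands for training a classifier on the training indices and scoring it on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is used for testing exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k pieces of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# Placeholder evaluation: report the share of samples held out in each fold.
print(cross_validate(20, lambda train, test: len(test) / 20))
```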

4.1. Dataset. The features are extracted from more than 10,000 Android malware samples, which are collected from Android malware repositories including VirusShare [41], AndroZoo [42], DroidScreening [43], and RevealDroid [44]. The extracted features comprise 76 features of Android components, 31 features of API counts, 82 features of API usage actions, 421 features of security-sensitive flows, 6 features of hardware components, 109 features of intents, 82 features of permissions, 190 features of malicious shell commands and strings, 19 features of content visual, and 49 features of URLs. Thus, there are 1,065 features in total. The information on the feature sets used in this experiment is shown in Table 3.

4.2. Machine Learning Classifiers. To detect and classify phishing applications, different machine learning classification techniques are used with an adaptive method. An adaptive classification system is proposed to automatically choose a combination of suitable classifiers for the extracted features of an active Android application. Various machine learning techniques were used as classifiers in existing works [31, 32, 34, 35]. Among them, six algorithms were selected from different categories to cover the different natures of classification. The six algorithms are C4.5 (J48), decision table (DT), k-nearest neighbors (IBK), logistic regression (LR), naive Bayes (NB), and support vector machine (SVM). According to the pretesting of the effect of parameters on these classifiers [45], the naive Bayes (NB) classifier with the supervised discretization function, the

Table 2: Case description for mobile phishing detection system.

| No | Name | Value |
|---|---|---|
| 1 | Case ID | Case identification number |
| 2 | Feature pattern | Combination of feature sets |
| 3 | Ensemble methods of classifiers (or) classification algorithm | Boosting/bagging/bayesian (or) algorithm name and their specific parameters |
| 4 | Accuracy | Percentage of correct classification |
| 5 | Performance | Runtime (seconds) |


default maximum number of iterations in logistic regression (LR), a confidence factor of 0.5 for tree pruning in the J48 classifier, and a 1-nearest neighbor (IBK) classifier are chosen for our experiment. The SVM and decision table classifiers are used with their default parameters.

4.3. Experimental Results and Analysis. The accuracy comparison of the six classifiers on the 10 feature sets is shown in Table 4. The italicized values shown in Table 4 represent the maximum detection accuracy among the six classifiers for each feature set. It can be seen that the accuracy of each classification algorithm depends on the features. IBK provides the best accuracy on 6 feature sets, and J48 provides the best accuracy on the other 4 feature sets. Our work aims to detect mobile phishing in a feature-independent manner with various classifiers. To mimic a real-world application, random feature combinations are created, because a new Android application can consist of any combination of features. In this experiment, 5 random combinations of features are created, as shown in Table 5.

Figure 4: Case defining process (define a new case and store it in the case base). (Diagram labels: input feature pattern; ML algorithm 1, ML algorithm 2, ..., ML algorithm n; result 1, result 2, ..., result n; take the maximum accuracy; final result; define and store a case in case base.)

Figure 5: Adaptive classification. (Diagram labels: target APK; feature extraction; decision making; retrieve and reuse; add feature; classifier 1, classifier 2, ..., classifier n; decision making; display result.)

Table 3: Feature sets.

No. | Feature set | Number of features | Example features
1 | Android components | 76 | android.media, android.media.effect, android.media.audiofx, android.service.textservice, android.service.notification
2 | API counts | 31 | account_information, account_settings, audio, bluetooth, bluetooth_information
3 | API usage actions | 82 | android.util, android.widget, android.renderscript, android.webkit, android.os, android.os.storage, android.content
4 | Security-sensitive flows | 421 | system_settings____audio, system_settings____phone_connection, system_settings____voip, system_settings____database_information
5 | Hardware components | 6 | android.hardware.display, android.hardware, android.hardware.usb, android.hardware.location, android.hardware.input
6 | Intent_action | 109 | action_main, action_view, action_default, action_attach_data, action_edit, action_insert_or_edit
7 | Permission | 82 | android.permission.access_cache_filesystem, android.permission.access_checkin_properties, android.permission.access_coarse_location, android.permission.access_gps
8 | Shell_command_strings | 190 | runtimeexec, createSubprocess, cipher-classes, longstring, SecretKey, methodinvoke, small_code_size
9 | Content_visual | 19 | HostnameLength, PathLength, QueryLength, DoubleSlashInPath, NumSensitiveWords, EmbeddedBrandName, PctExtHyperlinks
10 | URLs | 49 | having_ip_address, url_length, shortining_service, having_at_symbol, double_slash_redirecting, prefix_suffix
Total | | 1,065 |

These 5 feature combination patterns are tested with the six individual classifiers and three ensemble classifier models to develop cases for our adaptive model. Each model is an ensemble of the six classifiers with a different method for producing the final answer. The final-answer methods of the ensemble classifiers are the average of probabilities (AVG), majority voting (MAJ), and maximum probability (MAX). The detection results for the 5 scenarios of random feature combination sets with the six base classifiers and three ensemble classifiers are given in Table 6. The italicized values shown in Table 6 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases.
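The three final-answer methods can be illustrated with a minimal Python sketch. It assumes that each base classifier outputs one probability per class; the function names, the class labels, and the six probability rows are hypothetical and are not taken from the experiments.

```python
from collections import Counter

def average_of_probabilities(prob_rows):
    """AVG: average each class probability over all classifiers, pick the largest."""
    labels = prob_rows[0].keys()
    avg = {lab: sum(p[lab] for p in prob_rows) / len(prob_rows) for lab in labels}
    return max(avg, key=avg.get)

def majority_voting(prob_rows):
    """MAJ: each classifier votes for its most probable class."""
    votes = Counter(max(p, key=p.get) for p in prob_rows)
    return votes.most_common(1)[0][0]

def maximum_probability(prob_rows):
    """MAX: pick the class that received the single highest probability."""
    best = {lab: max(p[lab] for p in prob_rows) for lab in prob_rows[0]}
    return max(best, key=best.get)

# Hypothetical outputs of six base classifiers for one application.
probs = [
    {"phishing": 0.95, "benign": 0.05},
    {"phishing": 0.45, "benign": 0.55},
    {"phishing": 0.45, "benign": 0.55},
    {"phishing": 0.45, "benign": 0.55},
    {"phishing": 0.45, "benign": 0.55},
    {"phishing": 0.40, "benign": 0.60},
]

print(average_of_probabilities(probs))  # one confident classifier dominates the mean
print(majority_voting(probs))           # five of six classifiers vote the other way
print(maximum_probability(probs))
```

In this constructed example the three rules do not agree, which is one reason why the most accurate ensemble can differ from one feature pattern to another.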

According to the results shown in Table 6, some feature patterns are better suited to ensemble techniques, while others are better suited to individual classification techniques. It can be concluded that the accuracy of classification techniques in mobile phishing detection relies heavily on the input features.

The adaptive method used in our model chooses the most suitable classification approach for a set of input features. Based on the results presented in Table 6, we can develop cases to be stored in the case base for an adaptive choice of suitable classifiers. The tentative cases for building our case-based phishing detection model are shown in Table 7.
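As an illustration of how a case could be derived from such results, the following sketch picks the most accurate approach for each feature pattern and records it with the fields of Table 2. The accuracies are those reported in Table 6 and the runtimes those reported in Table 8; the `Case` class and the `build_case` function are illustrative names, not the authors' implementation.

```python
from dataclasses import dataclass

NAMES = ["J48", "DT", "IBK", "LR", "NB", "SVM", "AVG", "MAJ", "MAX"]

# Accuracy (%) per case, as reported in Table 6.
ACCURACY = {
    "01": [95.93, 93.07, 95.45, 92.47, 89.42, 91.62, 95.31, 95.31, 92.87],
    "02": [94.72, 91.62, 94.04, 90.18, 86.44, 89.27, 94.26, 94.20, 91.38],
    "03": [96.32, 92.67, 95.60, 94.89, 90.69, 92.57, 96.43, 96.41, 94.31],
    "04": [90.56, 86.38, 90.45, 88.51, 81.55, 87.88, 90.64, 90.64, 88.52],
    "05": [95.33, 89.69, 94.37, 93.97, 92.28, 91.61, 95.68, 95.69, 92.68],
}

# Runtime (seconds) per case, as reported in Table 8.
RUNTIME = {
    "01": [4.43, 18.63, 0.01, 1.93, 0.64, 3.01, 29.87, 29.52, 26.16],
    "02": [4.54, 31.80, 0.0001, 2.57, 0.66, 39.24, 76.20, 76.35, 75.90],
    "03": [9.44, 58.80, 0.01, 7.22, 1.14, 18.94, 95.18, 96.09, 97.41],
    "04": [12.09, 148.32, 0.0001, 5.28, 1.39, 6.25, 174.4, 174.6, 174.61],
    "05": [17.09, 167.14, 0.01, 7.86, 1.93, 3.61, 203.62, 205.50, 203.51],
}

@dataclass
class Case:
    case_id: str
    feature_pattern: str
    methods: list    # best approach(es); ties keep every winner
    accuracy: float  # percent
    runtimes: list   # seconds, one entry per winning method

def build_case(case_id):
    """Select the approach(es) with the maximum accuracy for one feature pattern."""
    acc = ACCURACY[case_id]
    best = max(acc)
    winners = [i for i, a in enumerate(acc) if a == best]
    return Case(
        case_id,
        f"Pattern {int(case_id)}",
        [NAMES[i] for i in winners],
        best,
        [RUNTIME[case_id][i] for i in winners],
    )

case_base = [build_case(cid) for cid in ACCURACY]
for case in case_base:
    print(case)
```

Keeping every tied winner, rather than an arbitrary one, is a design choice made here so that the sketch reproduces the two methods listed for Pattern 4.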

Performing the classification process on such a large number of features takes a long time. The runtime to build the detection model with the 6 base classifiers and 3 ensemble approaches, before feature selection, is compared in Table 8.

To reduce the detection time, some features may be omitted because they may not have a high impact on the result. Therefore, experiments are conducted to select a set of effective features in order to reduce the number of required features.

4.4. Selecting the Features. Feature selection is necessary to reduce the dimension of the feature space. To obtain the benefits of feature selection on a large dataset, such as reducing overfitting, improving accuracy, and reducing processing time, two feature selection techniques are applied in this experiment, and their results are compared to obtain the best outcome. The process of selecting the features can be described by the following steps.

Let $U$ be the universe of feature sets, $U = \{D_1, D_2, \ldots, D_v\}$, and let each dataset $D_i \in U$ with $v$ attributes be $D_i = \{A_1, A_2, \ldots, A_v\}$. Then, the attributes can be grouped into a feature group $FG_i = \{A_a, A_b, \ldots, A_n\}$. Each attribute is evaluated and selected according to its worth, which yields a selected feature set $FS_i = \{A_a, A_b, \ldots, A_m\}$, where $FS_i \subseteq FG_i$.

Two feature selection techniques are used in this experiment to confirm the advantages of feature selection in phishing detection. The first method is a correlation-based feature selection with a ranker search method, which evaluates each attribute and lists the results in ranked order. The worth of each attribute is evaluated by measuring the (Pearson's) correlation between it and the class [46].
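A minimal sketch of ranking attributes by their Pearson correlation with the class is given below. The attribute names and the toy data are hypothetical, and ranking by the absolute value of the correlation is an assumption of this sketch rather than a detail stated in the text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_attributes(columns, labels):
    """Return (attribute, |correlation with class|) pairs, best first."""
    scores = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy data: 1 = phishing, 0 = benign (hypothetical attributes).
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "uses_camera": [1, 0, 1, 0, 1, 0],
    "always_on":   [1, 1, 1, 1, 1, 1],
    "sends_sms":   [1, 1, 1, 0, 0, 0],
}

ranked = rank_attributes(columns, labels)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Attributes at the bottom of such a ranking, with zero or near-zero correlation, are the candidates for elimination.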

Pearson's correlation coefficient is described in equation (1), where all variables have been standardized. The correlation between a composite and a class label is a function of the number of component variables (attributes) in the composite and the magnitude of the intercorrelations among them, together with the magnitude of the correlations between the attributes and the class label.

If the correlation between each of the attributes in a test and the class label is known, and the intercorrelation between each pair of attributes is given, then the correlation between a composite test consisting of the summed attributes and the class label can be predicted from the following equation:

$$ r_{zc} = \frac{k\, r_{zi}}{\sqrt{k + k(k-1)\, r_{ii}}}, \tag{1} $$

where $r_{zc}$ is the correlation between the summed attributes and the class label, $k$ is the number of attributes, $r_{zi}$ is the average of the correlations between the attributes and the class label, and $r_{ii}$ is the average intercorrelation between attributes.
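A brief numerical check of equation (1) may help to show its behavior. The function name and the input values below are hypothetical; only the formula itself comes from the text.

```python
import math

def composite_correlation(k, r_zi, r_ii):
    """Equation (1): correlation between k summed attributes and the class label."""
    return k * r_zi / math.sqrt(k + k * (k - 1) * r_ii)

# A single attribute: the composite correlation is just r_zi.
print(composite_correlation(1, 0.3, 0.0))

# Four attributes, each correlated 0.5 with the class and uncorrelated
# with each other: the composite is much stronger than any single attribute.
print(composite_correlation(4, 0.5, 0.0))

# The same four attributes, but perfectly intercorrelated (fully redundant):
# the composite is no better than one attribute alone.
print(composite_correlation(4, 0.5, 1.0))
```

The comparison between the last two calls shows why attributes that correlate with the class but not with each other are the most valuable ones.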

We get the ranked attributes listed with their corresponding class correlation. Attributes with zero or very low class-correlation values are eliminated. The resulting reduced feature sets are shown in Table 9.

Table 4: Accuracy comparison of classifiers on 10 feature sets.

No. | Feature set | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%)
1 | Android components | 93.23 | 89.02 | 93.40 | 90.16 | 84.67 | 87.95
2 | API count | 95.85 | 93.02 | 95.66 | 91.90 | 89.20 | 85.25
3 | API usage_actions | 95.20 | 91.86 | 95.32 | 91.97 | 89.02 | 91.24
4 | Flow | 93.05 | 91.03 | 93.32 | 87.18 | 87.45 | 83.17
5 | Hardware components | 89.00 | 89.06 | 89.12 | 89.06 | 89.02 | 89.06
6 | Intent_action | 86.89 | 85.73 | 87.13 | 84.64 | 83.75 | 85.53
7 | Permission | 94.30 | 91.92 | 94.65 | 93.95 | 88.54 | 94.14
8 | Shell_command_strings | 75.40 | 71.18 | 74.08 | 70.28 | 68.74 | 70.22
9 | Content_visual | 97.20 | 95.79 | 95.53 | 94.49 | 95.77 | 93.87
10 | URLs | 96.03 | 93.24 | 97.18 | 93.99 | 92.98 | 93.80

The second method is an information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated using the following equations. In the attribute evaluation process, the I index, which measures the impurity of $D$, a data partition or a set of training tuples, is calculated using

$$ I(D) = 1 - \sum_{i=1}^{m} p_i^2, \tag{2} $$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. The sum is computed over the $m$ classes. The I index considers a binary split for each attribute. First, the case where $A$ is a discrete-valued attribute having $v$ distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in $D$ is considered. The expected information provided by that split is calculated by

$$ I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j). \tag{3} $$

In this equation, $D_j$ represents the observations in $D$ that take the $j$th value of attribute $A$. The information gain of a binary split on attribute $A$ is calculated by

$$ \mathrm{Gain}(A) = I(D) - I_A(D). \tag{4} $$

Information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation for split information is provided in

$$ \mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right). \tag{5} $$

This value represents the potential information generated by splitting the training dataset $D$ into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$. The gain ratio is defined in

$$ \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}. \tag{6} $$

The attribute with the maximum gain ratio is selected as the highest-ranked attribute. The low-ranked attributes that provide a gain ratio of less than 0.0003 are eliminated. After performing the two feature selection techniques on the dataset, the reduced feature sets are generated, as shown in Table 9.
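The gain ratio computation of equations (2) to (6) and the 0.0003 cut-off can be sketched as follows. The sketch follows the equations as written above, using the impurity of equation (2); the attribute names and the toy data are hypothetical.

```python
import math
from collections import Counter

def impurity(labels):
    """Equation (2): I(D) = 1 - sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gain_ratio(values, labels):
    """Equations (3) to (6) for one discrete-valued attribute."""
    n = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    expected = sum(len(p) / n * impurity(p) for p in partitions.values())   # (3)
    gain = impurity(labels) - expected                                      # (4)
    split_info = -sum(len(p) / n * math.log2(len(p) / n)
                      for p in partitions.values())                         # (5)
    return gain / split_info if split_info > 0 else 0.0                     # (6)

THRESHOLD = 0.0003  # attributes below this gain ratio are eliminated

labels = ["phishing"] * 4 + ["benign"] * 4
attributes = {
    "requests_sms": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "has_icon":     [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the class
}

selected = [name for name, column in attributes.items()
            if gain_ratio(column, labels) >= THRESHOLD]
print(selected)
```

An attribute with a single value produces a split information of zero; the sketch treats its gain ratio as zero so that it is eliminated instead of causing a division by zero.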

The same detection experiments are conducted with the 9 classification approaches on each selected feature set. The detection results of the 5 cases on the selected feature sets are given in Tables 10 and 11. In this experiment, the 9 classification approaches and their parameters are set up in the same way as in the previous experiments (described in Section 4.2).

According to the results of the reduced datasets with the correlation attribute evaluation method, shown in Table 10, the classification approaches with the best detection accuracy change slightly in 2 cases (feature patterns 3 and 4). The italicized values shown in Table 10 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. Feature pattern 3 is a combination of API count, API usage, intent, and hardware features. For this pattern, the highest detection accuracy is now provided by the ensembles with the AVG and MAJ final-answer methods, whereas it is provided by the ensemble with the AVG final-answer method when the full feature set is used. The detection accuracy is slightly increased for most classifiers in feature pattern 4, which is a combination of flow and intent features.

According to the results of the reduced datasets with the information gain attribute evaluation method, shown in Table 11, the detection accuracy is increased in 4 cases (feature patterns 1, 3, 4, and 5). The italicized values shown in Table 11 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. Moreover, the classification approaches that produce the best detection accuracy change in 3 cases (feature patterns 3, 4, and 5). That is, the ensemble with the AVG final-answer method provides the best accuracy for feature patterns 3, 4, and 5.

Table 5: Scenarios for random combinations of features.

Case ID | Feature pattern | Combination of feature sets | Number of features
01 | Pattern 1 | API count + API usage + hardware | 112
02 | Pattern 2 | API count + intent | 139
03 | Pattern 3 | API count + API usage + intent + hardware | 220
04 | Pattern 4 | Flow + intent | 529
05 | Pattern 5 | Flow + intent + API usage + hardware | 610

Table 6: Detection accuracy of 5 scenarios on randomly combined feature patterns.

Case ID | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) | AVG (%) | MAJ (%) | MAX (%)
01 | 95.93 | 93.07 | 95.45 | 92.47 | 89.42 | 91.62 | 95.31 | 95.31 | 92.87
02 | 94.72 | 91.62 | 94.04 | 90.18 | 86.44 | 89.27 | 94.26 | 94.20 | 91.38
03 | 96.32 | 92.67 | 95.60 | 94.89 | 90.69 | 92.57 | 96.43 | 96.41 | 94.31
04 | 90.56 | 86.38 | 90.45 | 88.51 | 81.55 | 87.88 | 90.64 | 90.64 | 88.52
05 | 95.33 | 89.69 | 94.37 | 93.97 | 92.28 | 91.61 | 95.68 | 95.69 | 92.68

Table 7: Tentative cases for mobile phishing detection system.

Case ID | Feature pattern | Adaptive method | Accuracy (%) | Runtime (seconds)
1 | Pattern 1 | J48 | 95.93 | 4.43
2 | Pattern 2 | J48 | 94.72 | 4.54
3 | Pattern 3 | AVG | 96.43 | 95.18
4 | Pattern 4 | AVG, MAJ | 90.64 | 174.4 & 174.6
5 | Pattern 5 | MAJ | 95.69 | 205.50

The detection accuracy percentages of the 5 cases using the different algorithms are compared in Figure 6. This figure presents the detection results from Tables 6, 10, and 11. Each case is shown in 3 situations: no feature selection, after correlation attribute evaluation feature selection, and after information gain attribute evaluation feature selection. There are 15 points in the figure, representing the 5 cases under the 3 conditions. The best classifier for case 01 and case 02 is the J48 classifier, while the ensemble classifier AVG is the best one for case 03, case 04, and case 05. The cases with the best algorithm are used in the case-based reasoning detection method.
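One way the stored cases could be retrieved and reused for a new application is sketched below. It assumes a Jaccard similarity over the names of the feature sets present in the application, which is an illustrative choice and not a measure specified in this section. The feature combinations follow Table 5 and the classifiers are the best ones named above; the identifiers are hypothetical.

```python
# Case base: feature combinations from Table 5 with the best classifier per case.
CASE_BASE = [
    {"case_id": "01", "feature_sets": {"api_count", "api_usage", "hardware"},
     "classifier": "J48"},
    {"case_id": "02", "feature_sets": {"api_count", "intent"},
     "classifier": "J48"},
    {"case_id": "03", "feature_sets": {"api_count", "api_usage", "intent", "hardware"},
     "classifier": "AVG"},
    {"case_id": "04", "feature_sets": {"flow", "intent"},
     "classifier": "AVG"},
    {"case_id": "05", "feature_sets": {"flow", "intent", "api_usage", "hardware"},
     "classifier": "AVG"},
]

def jaccard(a, b):
    """Assumed similarity: shared feature sets divided by all feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve(feature_sets, case_base=CASE_BASE):
    """Return the stored case whose feature pattern is most similar to the query."""
    return max(case_base, key=lambda case: jaccard(feature_sets, case["feature_sets"]))

# A new application whose extracted features cover flows, intents, and hardware.
query = {"flow", "intent", "hardware"}
case = retrieve(query)
print(case["case_id"], case["classifier"])
```

When no stored case is similar enough, the case defining process of Figure 4 would be run on the new pattern and the resulting case added to the case base; the similarity threshold for that decision is left open here.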

To highlight the effect of the feature selection techniques on performance, the runtime results on the reduced feature sets are collected, as given in Tables 12 and 13. The information gain attribute evaluation method retains a larger number of features than the correlation attribute evaluation method. The runtime with the information gain attribute evaluation method is also slightly larger than that with the correlation attribute evaluation method.

The runtimes of the 5 cases with feature selection are shown in Figure 7. This figure compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure, representing the 5 cases under the 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied to our feature sets to improve the accuracy and efficiency of our model. The detection accuracy on 4 feature patterns is improved, as shown in Table 11, while the runtime of the detection on all feature patterns is improved, as shown in Table 13. Table 14 compares the accuracy and efficiency of our proposed adaptive model on the full and reduced feature sets. The italicized values shown in Table 14 represent the accuracy values obtained with a reduced feature set that improve on their counterparts obtained with the full feature set.
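The before-and-after comparison can be made concrete with a short calculation over the values reported in Table 14. Where Table 14 lists two tied methods, the sketch uses the AVG runtime, which is a simplification made here; the variable names are illustrative.

```python
# (accuracy before, accuracy after, runtime before, runtime after) per case,
# as reported in Table 14; accuracies in percent, runtimes in seconds.
TABLE14 = {
    "01": (95.93, 95.96, 4.43, 4.05),
    "02": (94.72, 94.66, 4.54, 3.86),
    "03": (96.43, 96.45, 95.18, 87.31),    # after: AVG runtime of the AVG/MAJ tie
    "04": (90.64, 90.77, 174.4, 102.86),   # before: AVG runtime of the AVG/MAJ tie
    "05": (95.69, 95.80, 205.50, 111.80),
}

# Cases whose accuracy improves after feature selection.
improved = [cid for cid, (acc0, acc1, _, _) in TABLE14.items() if acc1 > acc0]

# Relative runtime reduction in percent.
reduction = {cid: 100.0 * (t0 - t1) / t0 for cid, (_, _, t0, t1) in TABLE14.items()}

print("accuracy improved in cases:", improved)
for cid, pct in reduction.items():
    print(f"case {cid}: runtime reduced by {pct:.1f}%")
```

Under these values, the accuracy changes are within a few hundredths of a percentage point in either direction, while the runtime savings are largest for the two patterns with the most features.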

The phishing malware detection task is an imbalanced classification problem. That is, there are two classes to be identified, phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class, "phishing," is greatly outnumbered by the negative class. These types of problems are examples of the fairly common case in data science in which accuracy is not a good measure for assessing model performance. Intuitively, proclaiming all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.

Table 8: Runtime comparison of 5 scenarios on 9 classification approaches (in seconds).

Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX
01 | 4.43 | 18.63 | 0.01 | 1.93 | 0.64 | 3.01 | 29.87 | 29.52 | 26.16
02 | 4.54 | 31.80 | 0.0001 | 2.57 | 0.66 | 39.24 | 76.20 | 76.35 | 75.90
03 | 9.44 | 58.80 | 0.01 | 7.22 | 1.14 | 18.94 | 95.18 | 96.09 | 97.41
04 | 12.09 | 148.32 | 0.0001 | 5.28 | 1.39 | 6.25 | 174.4 | 174.6 | 174.61
05 | 17.09 | 167.14 | 0.01 | 7.86 | 1.93 | 3.61 | 203.62 | 205.50 | 203.51

Table 9: Information of selected feature sets for 5 cases.

Case ID | Feature combination pattern | Features before feature selection | Features selected by Pearson's correlation | Features selected by information gain
01 | Pattern 1 | 112 | 96 | 100
02 | Pattern 2 | 139 | 114 | 120
03 | Pattern 3 | 220 | 180 | 185
04 | Pattern 4 | 529 | 164 | 265
05 | Pattern 5 | 610 | 227 | 250

Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX
01 | 95.87 | 93.10 | 95.45 | 92.47 | 89.42 | 91.61 | 95.31 | 95.32 | 92.82
02 | 94.68 | 91.53 | 94.04 | 90.18 | 86.44 | 89.28 | 94.24 | 94.18 | 91.33
03 | 96.37 | 92.70 | 95.60 | 94.90 | 90.69 | 92.57 | 96.38 | 96.38 | 94.37
04 | 90.73 | 86.51 | 90.45 | 88.51 | 81.55 | 87.89 | 90.73 | 90.72 | 88.64
05 | 95.38 | 89.54 | 94.37 | 93.96 | 92.28 | 91.61 | 95.68 | 95.69 | 92.72

Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX
01 | 95.96 | 93.10 | 95.62 | 92.54 | 89.42 | 91.66 | 95.37 | 95.37 | 92.83
02 | 94.66 | 91.53 | 94.16 | 90.15 | 86.44 | 89.19 | 94.19 | 94.12 | 91.30
03 | 96.38 | 92.59 | 95.55 | 95.02 | 90.69 | 92.61 | 96.45 | 96.45 | 94.34
04 | 90.52 | 86.36 | 90.24 | 88.70 | 81.55 | 87.86 | 90.77 | 90.76 | 88.69
05 | 95.46 | 89.44 | 94.50 | 93.98 | 92.28 | 91.56 | 95.80 | 95.79 | 92.82

In order to assess the effectiveness of our proposed model, confusion matrix evaluation is applied using accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances predicted as positive by our model that are actually positive. The following formulas represent their definitions:

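$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN},\\
\text{Precision} &= \frac{TP}{TP + FP},\\
\text{Sensitivity} &= \frac{TP}{TP + FN}.
\end{aligned} \tag{7}
$$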

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. (Horizontal axis: cases 01 to 05, each with no feature selection, correlation, and information gain; vertical axis: accuracy (%), from 80 to 100; series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX.)

Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX
01 | 3.94 | 18.36 | 0.01 | 1.97 | 0.64 | 3.38 | 27.45 | 26.92 | 27.18
02 | 3.84 | 25.35 | 0.0001 | 2.43 | 0.50 | 39.46 | 72.07 | 72.12 | 71.19
03 | 8.06 | 45.85 | 0.01 | 7.20 | 1.03 | 19.55 | 83.10 | 83.52 | 83.34
04 | 5.60 | 44.75 | 0.0001 | 5.15 | 0.56 | 6.27 | 61.95 | 61.99 | 62.00
05 | 8.84 | 69.88 | 0.0001 | 7.65 | 1.00 | 3.20 | 90.23 | 90.06 | 90.45

Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX
01 | 4.05 | 20.25 | 0.01 | 1.63 | 0.54 | 3.02 | 29.02 | 27.49 | 27.27
02 | 3.86 | 29.43 | 0.001 | 2.35 | 0.56 | 31.96 | 67.63 | 66.52 | 66.40
03 | 9.77 | 55.46 | 0.01 | 6.31 | 0.95 | 17.06 | 87.31 | 90.63 | 90.69
04 | 6.83 | 87.36 | 0.01 | 2.25 | 0.93 | 6.76 | 102.86 | 93.21 | 93.15
05 | 8.42 | 104.52 | 0.001 | 5.37 | 1.21 | 3.95 | 111.80 | 107.53 | 108.07

True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix, as shown in Figure 8.
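The three measures of equation (7) can be computed directly from the four confusion matrix counts. The counts below are hypothetical and only illustrate why accuracy alone is misleading on imbalanced data; returning 0.0 for an undefined ratio is a convention chosen for this sketch.

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, precision, and sensitivity as defined in equation (7)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # no positive predictions
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # no actual positives
    return accuracy, precision, sensitivity

# Hypothetical test set: 1,000 apps, of which 50 are phishing.
# A detector that finds 40 of the 50 phishing apps with 8 false alarms:
detector = metrics(tp=40, fp=8, tn=942, fn=10)

# A trivial model that labels every app as benign:
all_negative = metrics(tp=0, fp=0, tn=950, fn=50)

print("detector:     accuracy=%.3f precision=%.3f sensitivity=%.3f" % detector)
print("all negative: accuracy=%.3f precision=%.3f sensitivity=%.3f" % all_negative)
```

The trivial model still reaches 95% accuracy while detecting no phishing app at all, which is why precision and sensitivity are reported alongside accuracy.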

The effectiveness of our proposed model in terms of accuracy, precision, and sensitivity is given in Table 15. According to the results shown in Table 15, our adaptive model achieves good detection accuracy for the phishing features. Meanwhile, the classifiers achieve acceptable precision and sensitivity. According to the previous experiments, our adaptive phishing detection model using case-based reasoning can perform well on diversely distributed features.

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. (Horizontal axis: cases 01 to 05, each with no feature selection, correlation, and information gain; vertical axis: runtime (seconds), from 0 to 200; series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX.)

Table 14: Accuracy and efficiency of proposed adaptive model.

Case ID | Adaptive (before) | Adaptive (after) | Accuracy (before) | Accuracy (after) | Runtime (before) | Runtime (after)
01 | J48 | J48 | 95.93 | 95.96 | 4.43 | 4.05
02 | J48 | J48 | 94.72 | 94.66 | 4.54 | 3.86
03 | AVG | AVG, MAJ | 96.43 | 96.45 | 95.18 | 87.31 & 90.63
04 | AVG, MAJ | AVG | 90.64 | 90.77 | 174.4 & 174.6 | 102.86
05 | MAJ | AVG | 95.69 | 95.80 | 205.50 | 111.80

Figure 8: Confusion matrix.

 | Predicted positive | Predicted negative
Actual positive | TP | FN
Actual negative | FP | TN

Table 15: Detection results achieved by the proposed model.

Case | Classifier | Accuracy (%) | Precision (%) | Sensitivity (%)
01 | J48 | 95.96 | 83 | 79
02 | J48 | 94.66 | 87 | 86
03 | AVG | 96.45 | 92 | 75
04 | AVG | 90.77 | 84 | 62
05 | AVG | 95.80 | 90 | 74

5. Conclusions

An adaptive mobile phishing detection model based on the variation of input feature patterns, using a case-based reasoning (CBR) technique, is proposed in this work. An experimental analysis is conducted to demonstrate the design decisions of our model and to verify the performance of the proposed model in handling the concept drift of mobile phishing attacks. The proposed model is evaluated with a large feature set that contains 1,065 features from 10 feature groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories to cover the different natures of classification across the diverse feature sets. Three ensembles of the six base classifiers are used, each of which uses a different final-answer method: average, majority voting, or maximum. In total, there are 9 classifiers. Because of the large number of features in the dataset and the use of multiple classifiers, efficiency is degraded. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques are the information gain attribute evaluation method and Pearson's correlation coefficient attribute evaluation method. By addressing the optimal selection of a suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model can provide an accuracy improvement with an acceptable runtime increase.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.

References

[1] W. Paul, H. A. Manolian, and S. Lapper, "Thinking digital in industry 4.0," Deloitte Insights, September 2018, https://www2.deloitte.com/insights/us/en/focus/industry-4-0/digital-leaders-in-manufacturing-fourth-industrial-revolution.html.

[2] "Spam and phishing in Q2 2018," Securelist-Kaspersky Lab's Cyberthreat Research and Reports, 2018.

[3] Proofpoint Security Awareness, "2019 state of the phish report," March 2019, https://www.wombatsecurity.com/state-of-the-phish.

[4] L. Wu, X. Du, and J. Wu, "Effective defense schemes for phishing attacks on mobile computing platforms," IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6678–6691, 2016.

[5] M. Moghimi and A. Y. Varjani, "New rule-based phishing detection method," Expert Systems with Applications, vol. 53, pp. 231–242, Jul. 2016.

[6] Baunfire.com and SparkCMS, "APWG phishing attack trends report-4Q 2018," Anti-Phishing Working Group, March 2019, https://www.antiphishing.org/resources/apwg-reports.

[7] R. Basnet, S. Mukkamala, and A. H. Sung, "Detection of phishing attacks: a machine learning approach," in Soft Computing Applications in Industry, B. Prasad, Ed., pp. 373–383, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[8] A. K. Jain and B. B. Gupta, "Comparative analysis of features based machine learning approaches for phishing detection," in Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 2125–2130, New Delhi, India, March 2016.

[9] F. Toolan and J. Carthy, "Phishing detection using classifier ensembles," in Proceedings of the 2009 eCrime Researchers Summit, pp. 1–9, Tacoma, WA, USA, October 2009.

[10] H. S. Hota, A. K. Shrivas, and R. Hota, "An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique," Procedia Computer Science, vol. 132, pp. 900–907, 2018.

[11] A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles, ResearchGate, Berlin, Germany, 2019, https://www.researchgate.net/publication/325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles.

[12] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, "Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers," Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.

[13] A. Aleroud and L. Zhou, "Phishing environments, techniques, and countermeasures: a survey," Computers and Security, vol. 68, pp. 160–196, 2017.

[14] H. Shahriar, T. Klintic, and V. Clincy, "Mobile phishing attacks and mitigation techniques," Journal of Information Security, vol. 6, no. 3, pp. 206–212, 2015.

[15] T. M. Mahmoud and A. M. Mahfouz, "SMS spam filtering technique based on artificial immune system," International Journal of Computer Science Issues, vol. 9, no. 1, pp. 589–597, 2012.

[16] J. W. Yoon, H. Kim, and J. H. Huh, "Hybrid spam filtering for mobile communication," Computers and Security, vol. 29, no. 4, pp. 446–459, 2010.

[17] C. H. Hsu, P. Wang, and S. Pu, "Identify fixed-path phishing attack by STC," in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 172–175, Perth, Australia, September 2011.

[18] E. Medvet, E. Kirda, and C. Kruegel, "Visual-similarity-based phishing detection," in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, September 2008.

[19] A. P. Felt and D. Wagner, Phishing on Mobile Devices, University of California, Berkeley, CA, USA, 2011.

[20] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna, "What the app is that? Deception and countermeasures in the android user interface," in Proceeding of the 2015 IEEE Symposium on Security and Privacy, pp. 931–948, San Jose, CA, USA, May 2015.

[21] C. Marforio, R. J. Masti, C. Soriente, K. Kostiainen, and S. Capkun, "Personalized security indicators to detect application phishing attacks in mobile platforms," February 2015, http://arxiv.org/abs/1502.06824.

[22] D. Liu, E. Cuervo, V. Pistol, R. Scudellari, and L. P. Cox, "ScreenPass: secure password entry on touchscreen devices," in Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 291–304, Taipei, Taiwan, June 2013.

[23] D. Liu and L. P. Cox, "VeriUI: Attested Login for Mobile Devices," in Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, Santa Barbara, CA, USA, February 2014.

[24] L. Wu, X. Du, and J. Wu, "MobiFish: A lightweight anti-phishing scheme for mobile phones," in Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–8, Shanghai, China, August 2014.

[25] V. Mavroeidis and M. Nicho, "Quick response code secure: a cryptographically secure anti-phishing tool for QR code attacks," in Computer Network Security, pp. 313–324, 2017.

[26] "Phishing detective-apps on Google Play," March 2018, https://play.google.com/store/apps/details?id=com.rsoftr.android.phishingdetective.ads.

[27] G. Bottazzi, E. Casalicchio, D. Cingolani, F. Marturana, and M. Piu, "MP-Shield: A framework for phishing detection in mobile devices," in Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1977–1983, Liverpool, UK, October 2015.

[28] M. M. Richter and R. O. Weber, Case-Based Reasoning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[29] S. Craw, N. Wiratunga, and R. C. Rowe, "Learning adaptation knowledge to improve case-based reasoning," Artificial Intelligence, vol. 170, no. 16-17, pp. 1175–1192, Nov. 2006.

[30] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, "Case-based reasoning systems in the health sciences: a survey of recent trends and developments," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 4, pp. 421–434, Jul. 2011.

[31] S. Arzt, "FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps," in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269, New York, NY, USA, June 2014.

[32] L. Li, A. Bartel, T. F. Bissyande et al., "IccTA: detecting inter-component privacy leaks in android apps," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 280–291, Piscataway, NJ, USA, May 2015.

[33] "Obfuscation-resilient, efficient, and accurate detection and family identification of android malware," Semantic Scholar, March 2018, httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23

[34] W. Enck, "TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones," Transactions on Computer Systems, vol. 32, no. 2, 2014.

[35] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, "Information-flow analysis of android applications in DroidSafe," in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, February 2015.

[36] D. Arp, M. Spreitzenbarth, H. Gascon, and K. Rieck, "Drebin: effective and explainable detection of android malware in your pocket," in Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, February 2014.

[37] N. Peiravian and X. Zhu, "Machine learning for android malware detection using permission and API calls," in Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305, Herndon, VA, USA, November 2013.

[38] V. Avdiienko, K. Kuznetsov, A. Gorla et al., "Mining apps for abnormal usage of sensitive data," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 426–436, Florence, Italy, May 2015.

[39] H. V. Nath and B. M. Mehtre, "Static malware analysis using machine learning methods," in Recent Trends in Computer Networks and Distributed Systems Security, pp. 440–450, 2014.

[40] N. Abura'ed, H. Otrok, R. Mizouni, and J. Bentahar, "Mobile phishing attack for Android platform," in Proceedings of the 2014 10th International Conference on Innovations in Information Technology (IIT), pp. 18–23, Abu Dhabi, UAE, November 2014.

[41] VirusShare.com, https://virusshare.com.

[42] K. Allix, T. F. Bissyande, J. Klein, and Y. Le Traon, "Androzoo: collecting millions of android apps for the research community," in Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471, Austin, TX, USA, May 2016.

[43] J. Yu, Q. Huang, and C. Yian, "DroidScreening: a practical framework for real-world Android malware analysis," Security and Communication Networks, vol. 9, no. 11, pp. 1435–1449.

[44] Joshuaga/Revealdroid, Bitbucket, https://bitbucket.org/joshuaga/revealdroid/src/master.

[45] S. Kyaw Zaw and S. Vasupongayya, "Revealing the important features of mobile phishing," in Proceedings of the 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), pp. 222–226, Pattaya, Thailand, November 2018.

[46] M. A. Hall and L. A. Smith, "Feature subset selection: a correlation based filter approach," Progress in Connectionist-based Information Systems, vol. 2, pp. 855–858, 1997.


data flow, hardware components, intent actions, permissions, shell command and strings, contents and visual, and URLs. The details of each feature are given below.

(1) Android components: a variety of component types with specific functionalities (e.g., components for providing GUIs and others for running background services) are declared within an Android app's manifest. These features are collected in [31–33].

(2) API count: the number of invocations of a specific Android API method (e.g., a malicious app accesses the location APIs twice and the telephony package 8 times). These features are collected in [4, 24, 27, 32].

(3) API usage actions: APIs can be used to develop applications on the Android platform and can also be misused for malicious purposes. There are many approaches to submit web requests and to exfiltrate captured data via the API without the Internet permission. Some existing phishing detection works [27, 31, 32, 34] collect the API calls (e.g., API calls to access sensitive data, API calls to access network communications, API calls to send and receive SMS messages, API calls to execute external commands, and API calls frequently used for obfuscation).

(4) Security-sensitive data flows: a few approaches for Android malware detection [31, 34, 35] use data flows between security-sensitive Android interfaces to determine if an app is malicious. Tracking this form of information is particularly useful for identifying privacy leaks.

(5) Hardware components: the hardware components used in the app are listed in AndroidManifest.xml (e.g., to access the camera, an app needs to include the android.hardware.camera feature). These features are collected in [4, 36].

(6) Intent actions: Android malware is known to rely on tracking an Intent (e.g., whether a package is installed or whether a device has recently completed booting) to determine when to perform a malicious behavior. These features are used in [32, 36].

(7) Permission: specific permissions that Android provides for executing risky operations are acquired by Android malware. These features are collected in [34, 37, 38].

(8) Shell command and strings: features of interesting strings associated with malicious behaviors and potentially risky shell commands are collected in [36, 39]. Some structural attributes of the APK file, such as the size of the code, the presence of a zip file or binary file, and related information, are also included in this feature group.

(9) Contents and visual: the main display channel for phishing deception is the web content, which expresses the intention of the website. These features consist of page elements such as the page title, the submitted form, and the contained links. Some researchers also extract the logo, icon, and contained pictures from the web page and use an image recognition algorithm to identify the phishing website [16–18].

(10) URLs: web link features for phishing fraud are collected based on five criteria: URL and Domain Identity, Security and Encryption, Source Code and JavaScript, Page Style and Contents, and Web Address Bar. These features are collected in [4, 13, 40].

3.2. Case Representation. A case represents an experience at an operational level. Typically, a case includes the problem specification, the solution, and sometimes the outcome. This is the most common representation used. However, more elaborate case representations can be employed. Depending on the information included in a case, different types of results can be achieved from the system. Cases that describe a problem and its solution can be used to derive solutions to new problems.

In general, a case specification is described as a set of features. The features are those aspects of the domain and the problem that are considered to be most significant in determining the solution and/or outcome. Since a case represents an experience, a case in this work should represent the features of the application that are used to determine a phishing attack.

In our model, a case includes the combination of feature sets, the ensemble method of classifiers or the individual classification algorithm with its specific parameters, the accuracy and performance of the solution, and potential facilitations. A case description stored in the phishing detection system is shown in Table 2.

Figure 3: Setting up case base. Diagram labels: case base; retrieve; reuse; revise; retain; a set of feature pattern; calculate the similarities; the most similar case; identification of potential exception in feature patterns; monitor and complete the system.


To define a new case in the case base, the input features have to pass through different machine learning classifiers, and the results from each classifier are calculated to produce the final result. Then, the input features, the classifiers with their parameters, the activation function, and the final result are stored in the case base (knowledge base) as a new case. The process of defining a new case to be stored in the case base is shown in Figure 4.
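The case-defining step can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Case` fields follow Table 2, while the names `Case` and `define_case`, the rule of breaking accuracy ties in favour of the faster classifier, and the numbers in the example are hypothetical and are not taken from the experiments.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One case-base entry, following the fields listed in Table 2."""
    case_id: int
    feature_pattern: tuple  # names of the combined feature sets
    method: str             # classifier or ensemble with the best result
    accuracy: float         # percentage of correct classification
    runtime_s: float        # runtime in seconds


def define_case(case_id, feature_pattern, results):
    """Build a case from per-classifier results.

    `results` maps a classifier name to (accuracy %, runtime s). The entry
    with the maximum accuracy is kept; ties go to the faster classifier
    (the tie rule is an assumption of this sketch).
    """
    method, (acc, rt) = max(results.items(),
                            key=lambda kv: (kv[1][0], -kv[1][1]))
    return Case(case_id, tuple(feature_pattern), method, acc, rt)


case_base = []
# Illustrative numbers only, not measurements from the paper.
results = {"J48": (95.0, 4.0), "IBK": (94.5, 0.1), "NB": (89.0, 0.6)}
case_base.append(define_case(1, ["API count", "API usage", "hardware"], results))
print(case_base[0])
```

Keeping the runtime next to the accuracy in each case lets a later retrieval step weigh efficiency as well as accuracy.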

3.3. Case Retrieval. Case-based reasoning (CBR) solves a new problem by retrieving previously solved problems and their solutions from a knowledge source of cases called the case base. There are challenges related to the retrieval process that still need to be addressed. One issue is the computation of similarity, which is particularly important during the retrieval process. The effectiveness of a similarity measurement is determined by the usefulness of a retrieved case in solving a new problem.

The aim of using the CBR approach is to select the past phishing detection cases that are most similar to the new problem. A set of similar cases is selected from the case base according to a similarity criterion that requires the specification of weights corresponding to attributes. The assessment of case similarity involves comparing the attribute values of the new case with those of the past cases stored in the case base. The retrieved old cases are ranked according to their similarity scores with respect to the attributes of the new case. In this work, the nearest neighbor method is applied to calculate the similarity score and the total similarity score of a potentially useful case.
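A weighted nearest neighbor retrieval of this kind might look like the following sketch. The exact similarity function is not specified in the text, so the sketch assumes binary attributes (a feature set is present or absent in a pattern), a local similarity of 1 for equal values and 0 otherwise, and equal weights; the attribute names and the helpers `pattern`, `similarity`, and `retrieve` are hypothetical. The five stored patterns mirror the feature combinations of Table 5.

```python
ATTRS = ["api_count", "api_usage", "hardware", "intent", "flow"]


def pattern(*present):
    """Encode a feature pattern as attribute -> 0/1 (absent/present)."""
    return {a: int(a in present) for a in ATTRS}


def similarity(new_case, old_case, weights):
    """Weighted nearest-neighbour score in [0, 1]: the weighted share of
    attributes on which the two cases agree."""
    total = sum(weights.values())
    matched = sum(w for attr, w in weights.items()
                  if new_case.get(attr, 0) == old_case.get(attr, 0))
    return matched / total


def retrieve(new_case, case_base, weights):
    """Return (score, case_id) pairs ranked from most to least similar."""
    scored = [(similarity(new_case, attrs, weights), cid)
              for cid, attrs in case_base.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


CASE_BASE = {
    1: pattern("api_count", "api_usage", "hardware"),
    2: pattern("api_count", "intent"),
    3: pattern("api_count", "api_usage", "intent", "hardware"),
    4: pattern("flow", "intent"),
    5: pattern("flow", "intent", "api_usage", "hardware"),
}
WEIGHTS = {a: 1.0 for a in ATTRS}  # equal weights are an assumption

new_case = pattern("api_count", "api_usage", "intent")
ranked = retrieve(new_case, CASE_BASE, WEIGHTS)
print(ranked)
```

With these equal weights, cases 2 and 3 tie as the nearest neighbors of the new pattern; unequal weights would be the natural way to break such ties.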

3.4. Adaptive Classification System Design. The main objective of case-based adaptive classification is to assign a suitable classification technique to the target case (a feature set extracted from an Android application) by identifying and analysing the similar training cases (sets of features that are stored in the case base). The proposed case-based adaptive classification is shown in Figure 5. If the feature set extracted from the active Android application does not match any set of features stored in the case base (that is, the extracted feature set is not complete for the case-retrieving process), the adaptive classification will select suitable methods to process the extracted feature set. There are two options for this selection. First, the possible features are added to the extracted feature set in order to perform the case-retrieving process and to choose a suitable classifier. Second, multiple classifiers are selected to process the extracted incomplete feature set. Under the second option, the multiple answers resulting from the multiple classifiers are collected in order to produce a final answer by way of a weighted sum of all answers.
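The second option, combining several answers by a weighted sum, can be sketched as follows. The text does not state how the weights are chosen or where the decision threshold lies, so this sketch assumes that each classifier returns a probability of the phishing class, that the weights are supplied by the caller, and that a normalized score of at least 0.5 means phishing. The function name `weighted_vote` is hypothetical.

```python
def weighted_vote(answers, weights, threshold=0.5):
    """Combine per-classifier phishing probabilities by a weighted sum.

    `answers` maps classifier name -> probability that the app is phishing.
    `weights` maps classifier name -> non-negative weight.
    Returns the final label and the normalized weighted score.
    """
    total = sum(weights[name] for name in answers)
    score = sum(weights[name] * p for name, p in answers.items()) / total
    label = "phishing" if score >= threshold else "benign"
    return label, score


# Illustrative values only.
answers = {"J48": 0.9, "IBK": 0.4, "NB": 0.2}
weights = {"J48": 2.0, "IBK": 1.0, "NB": 1.0}
label, score = weighted_vote(answers, weights)
print(label, round(score, 3))
```

In this example the more heavily weighted classifier outvotes the two others, which is the behaviour a weighted sum is meant to allow.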

4. Detection Model and Evaluation

This section explains how our detection model performs adaptively on the combination of individual classifiers and ensemble classifiers. To verify that our proposed model can improve the accuracy of mobile phishing detection, an experiment is conducted using the feature sets described in Section 3.1. The experiment was conducted by running Weka 3.8 on a laptop computer with a Core i7 processor, 8 GB of RAM, and the 64-bit Windows 8.1 operating system. The cross-validation method is used as an evaluation technique to estimate the error rate efficiently and in an unbiased way by running repeated percentage splits. First, the dataset is divided into 10 pieces. Each piece is used as a testing dataset in turn, while the remaining 9 pieces together are used as a training dataset. We performed 10 simulations (i.e., the experiments are repeated 10 times). Then, all these results are averaged into a single estimation result. Six existing machine learning algorithms are chosen from different categories and used with the 10-fold cross-validation method to evaluate the variation of accuracy and efficiency.
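The evaluation protocol described above can be sketched in plain Python. The experiments themselves were run in Weka, so this is not the implementation that was used; the helper names `k_fold_indices` and `cross_validate` and the shuffling seeds are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(evaluate, n_samples, k=10, repeats=10):
    """Average `evaluate(train, test)` over `repeats` runs of k folds."""
    scores = [evaluate(train, test)
              for r in range(repeats)
              for train, test in k_fold_indices(n_samples, k, seed=r)]
    return sum(scores) / len(scores)


# Placeholder scorer: returns the test share of each split. A real scorer
# would train a classifier on `train` and return its accuracy on `test`.
mean = cross_validate(lambda tr, te: len(te) / (len(tr) + len(te)), 100)
print(round(mean, 3))
```

Each of the 10 repetitions reshuffles the data before splitting, so the averaged result does not depend on a single partition.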

4.1. Dataset. The features are extracted from more than 10,000 Android malware samples, which are collected from Android malware repositories including VirusShare [41], AndroZoo [42], DroidScreening [43], and RevealDroid [44]. The extracted features comprise 76 features of Android components, 31 features of API counts, 82 features of API usage actions, 421 features of security-sensitive flows, 6 features of hardware components, 109 features of intents, 82 features of permissions, 190 features of malicious shell command and strings, 19 features of content visual, and 49 features of URLs. Thus, there are 1,065 features in total. The information of the feature sets used in this experiment is shown in Table 3.

4.2. Machine Learning Classifiers. To detect and classify phishing applications, different machine learning classification techniques are used with an adaptive method. An adaptive classification system is proposed to automatically choose a combination of suitable classifiers for the extracted features of an active Android application. Various machine learning techniques were used as the classifier in existing works [31, 32, 34, 35]. Among them, six algorithms were selected from different categories in order to cover the different natures of classification. The six algorithms are C4.5 (J48), decision table (DT), k-nearest neighbors (IBK), logistic regression (LR), naive Bayes (NB), and support vector machine (SVM). According to the pretesting of the effect of parameters on these classifiers [45], the naive Bayes (NB) classifier with the supervised discretization function, the default maximum number of iterations in logistic regression (LR), a confidence factor of 0.5 for tree pruning in the J48 classifier, and a 1-nearest neighbor (IBK) classifier are chosen for our experiment. The SVM and decision table classifiers are used with their default parameters.

Table 2: Case description for mobile phishing detection system.

| No. | Name | Value |
|---|---|---|
| 1 | Case ID | Case identification number |
| 2 | Feature pattern | Combination of feature sets |
| 3 | Ensemble methods of classifiers (or) classification algorithm | Boosting/bagging/Bayesian (or) algorithm name and their specific parameters |
| 4 | Accuracy | Percentage of correct classification |
| 5 | Performance | Runtime (seconds) |

Figure 4: Case defining process (define a new case and store in case base). Diagram labels: input feature pattern; ML algorithm 1, ML algorithm 2, ..., ML algorithm n; result 1, result 2, ..., result n; take the maximum accuracy; final result; define and store a case in case base.

Figure 5: Adaptive classification. Diagram labels: target APK; feature extraction; decision making; add feature; retrieve and reuse; classifier 1, classifier 2, ..., classifier n; decision making; display result.

Table 3: Feature sets.

| No. | Feature sets | Number of features | Example features |
|---|---|---|---|
| 1 | Android components | 76 | android.media, android.media.effect, android.media.audiofx, android.service.textservice, android.service.notification |
| 2 | API counts | 31 | account_information, account_settings, audio, bluetooth, bluetooth_information |
| 3 | API usage actions | 82 | android.util, android.widget, android.renderscript, android.webkit, android.os, android.os.storage, android.content |
| 4 | Security-sensitive flows | 421 | system_settings____audio, system_settings____phone_connection, system_settings____voip, system_settings____database_information |
| 5 | Hardware components | 6 | android.hardware.display, android.hardware, android.hardware.usb, android.hardware.location, android.hardware.input |
| 6 | Intent_action | 109 | action_main, action_view, action_default, action_attach_data, action_edit, action_insert_or_edit |
| 7 | Permission | 82 | android.permission.access_cache_filesystem, android.permission.access_checkin_properties, android.permission.access_coarse_location, android.permission.access_gps |
| 8 | Shell_command_strings | 190 | runtimeexec, createSubprocess, cipher-classes, longstring, SecretKey, methodinvoke, small_code_size |
| 9 | Content_visual | 19 | HostnameLength, PathLength, QueryLength, DoubleSlashInPath, NumSensitiveWords, EmbeddedBrandName, PctExtHyperlinks |
| 10 | URLs | 49 | having_ip_address, url_length, shortining_service, having_at_symbol, double_slash_redirecting, prefix_suffix |
| | Total | 1065 | |

4.3. Experimental Results and Analysis. The accuracy comparison of six classifiers on the 10 feature sets is shown in Table 4. The italicized values shown in Table 4 represent the maximum detection accuracy among the six classifiers for each feature set. It can be seen that the accuracy of each classification algorithm depends on the features. IBK provides better accuracy on 6 feature sets, and J48 provides better accuracy on the other 4 feature sets. Our work aims to detect mobile phishing in a feature-independent manner with various classifiers. To mimic a real-world application, random feature combinations are created because a new Android application can consist of any combination of features. In this experiment, 5 random combinations of features are created, as shown in Table 5.

These 5 feature combination patterns are tested with the six individual classifiers and three models of ensemble classifiers to develop a case for our adaptive model. Each model is an ensemble of the six classifiers with a different method of providing the final answer. The final answer-finding methods of the ensemble classifiers are the average of probabilities (AVG), majority voting (MAJ), and maximum probability (MAX). The detection results for the 5 scenarios of random feature combination sets with the six base classifiers and three ensemble classifiers are described in Table 6. The italicized values shown in Table 6 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases.
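To make the three final answer-finding methods concrete, the sketch below applies each rule to the phishing probabilities produced by six base classifiers. It is a simplified two-class illustration, not the ensemble code used in the experiments; the function names, the 0.5 decision threshold, and the treatment of ties as benign in majority voting are assumptions.

```python
def average_of_probabilities(probs):
    """AVG: phishing if the mean phishing probability reaches 0.5."""
    return sum(probs) / len(probs) >= 0.5


def majority_voting(probs):
    """MAJ: phishing if more than half of the classifiers vote phishing."""
    votes = sum(1 for p in probs if p >= 0.5)
    return votes > len(probs) / 2


def maximum_probability(probs):
    """MAX: compare the strongest support found for each class."""
    return max(probs) >= max(1 - p for p in probs)


# Hypothetical phishing probabilities from six base classifiers.
probs = [0.9, 0.4, 0.45, 0.3, 0.45, 0.6]
print(average_of_probabilities(probs),
      majority_voting(probs),
      maximum_probability(probs))
```

For this input the AVG and MAX rules answer phishing while the MAJ rule answers benign, which shows how the same base classifiers can yield different ensemble accuracies.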

According to the results shown in Table 6, some feature patterns are more suitable for ensemble techniques, while some are better used with individual classification techniques. It can be concluded that the accuracy variation of classification techniques in mobile phishing detection relies heavily on the input features.

The adaptive method used in our model chooses the most suitable classification approach for a set of input features. Based on the results presented in Table 6, we can develop a case to be stored in the case base for an adaptive choice of suitable classifiers. The tentative cases for building our case-based phishing detection model are shown in Table 7.
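The step from Table 6 to Table 7 amounts to taking, for each feature pattern, the classification approach with the maximum accuracy. The sketch below reproduces that selection; the accuracies are transcribed from Table 6, while the names `TABLE_6` and `best_methods` are hypothetical and ties are reported together.

```python
CLASSIFIERS = ["J48", "DT", "IBK", "LR", "NB", "SVM", "AVG", "MAJ", "MAX"]

# Detection accuracy (%) per feature pattern, transcribed from Table 6.
TABLE_6 = {
    "Pattern 1": [95.93, 93.07, 95.45, 92.47, 89.42, 91.62, 95.31, 95.31, 92.87],
    "Pattern 2": [94.72, 91.62, 94.04, 90.18, 86.44, 89.27, 94.26, 94.20, 91.38],
    "Pattern 3": [96.32, 92.67, 95.60, 94.89, 90.69, 92.57, 96.43, 96.41, 94.31],
    "Pattern 4": [90.56, 86.38, 90.45, 88.51, 81.55, 87.88, 90.64, 90.64, 88.52],
    "Pattern 5": [95.33, 89.69, 94.37, 93.97, 92.28, 91.61, 95.68, 95.69, 92.68],
}


def best_methods(accuracies):
    """Return every classification approach that reaches the top accuracy."""
    top = max(accuracies)
    return [name for name, acc in zip(CLASSIFIERS, accuracies) if acc == top]


tentative = {p: best_methods(row) for p, row in TABLE_6.items()}
for p, methods in tentative.items():
    print(p, methods, max(TABLE_6[p]))
```

The selected approaches agree with the adaptive methods listed in Table 7, including the AVG and MAJ tie for pattern 4.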

Performing the classification process on this large number of features takes a long runtime. The comparison of the runtime to build the detection model on the 6 base classifiers and 3 ensemble approaches before selecting the features is shown in Table 8.

To reduce the detection time, some features may be omitted because they may not have a high impact on the result. Therefore, some experiments are conducted to select a set of effective features in order to reduce the number of required features.

4.4. Selecting the Features. Feature selection is necessary to reduce the dimension of the feature space. With the aim of gaining the benefits of performing a feature selection technique on a large dataset, such as reducing overfitting, improving accuracy, and reducing processing time, two feature selection techniques are performed in this experiment, and their results are compared to get the optimized results. The process of selecting the features can be described by the following steps.

Let $U$ be the universe of feature sets, $U = \{D_1, D_2, \ldots, D_v\}$, and let the dataset $D_i \in U$ with $v$ attributes $A$ be $D_i = \{A_1, A_2, \ldots, A_v\}$. Then, the attributes can be grouped into a feature group $FG_i$ as $FG_i = \{A_a, A_b, \ldots, A_n\}$. An attribute evaluation is performed on the worth of each attribute, and the selected attributes become a selected feature set $FS_i = \{A_a, A_b, \ldots, A_m\}$, where $FS_i \subseteq FG_i$.

Two feature selection techniques are used in this experiment to confirm the advantages of selecting the features in phishing detection. The first method is a correlation-based feature selection with a ranker search method that evaluates each attribute and lists the results in ranked order. The worth of each attribute is evaluated by measuring the (Pearson's) correlation between it and the class [46].
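A ranker of this kind can be sketched in a few lines of Python. This is not the Weka evaluator used in the experiments: ranking by the absolute value of the correlation and scoring constant columns as zero are assumptions of the sketch, and the attribute names in the example are invented.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / (sx * sy)


def rank_attributes(columns, labels):
    """Rank attributes by |correlation with the class|, best first."""
    scored = [(abs(pearson(col, labels)), name)
              for name, col in columns.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


labels = [1, 1, 0, 0]  # 1 = phishing, 0 = benign
columns = {
    "sends_sms": [1, 1, 0, 0],    # perfectly aligned with the class
    "uses_camera": [1, 0, 1, 0],  # unrelated to the class
    "always_set": [1, 1, 1, 1],   # constant
}
ranked_attrs = rank_attributes(columns, labels)
print(ranked_attrs)
```

Attributes at the bottom of such a ranking are the candidates for elimination.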

Pearson's correlation coefficient is described in equation (1), where all variables have been standardized. The correlation between a composite and a class label is a function of the number of component variables (attributes) in the composite and the magnitude of the intercorrelations among them, together with the magnitude of the correlations between the attributes and the class label.

If the correlation between each of the attributes in a test and the class label is known, and the intercorrelation between each pair of attributes is given, then the correlation between a composite test consisting of the summed attributes and the class label can be predicted from the following equation:

$$r_{zc} = \frac{k\,r_{zi}}{\sqrt{k + k(k-1)\,r_{ii}}}, \tag{1}$$

where $r_{zc}$ is the correlation between the summed attributes and the class label, $k$ is the number of attributes, $r_{zi}$ is the average of the correlations between the attributes and the class label, and $r_{ii}$ is the average intercorrelation between attributes.
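A short numerical check of equation (1) shows how redundancy among attributes limits the composite correlation. The function name `composite_correlation` and the input values are chosen only for illustration.

```python
import math


def composite_correlation(k, r_zi, r_ii):
    """Equation (1): correlation between k summed attributes and the class.

    k    -- number of attributes
    r_zi -- average attribute-class correlation
    r_ii -- average attribute-attribute intercorrelation
    """
    return (k * r_zi) / math.sqrt(k + k * (k - 1) * r_ii)


# Four attributes, each correlated 0.5 with the class label.
print(composite_correlation(4, 0.5, 0.0))  # mutually uncorrelated attributes
print(composite_correlation(4, 0.5, 1.0))  # fully redundant attributes
```

With mutually uncorrelated attributes the composite reaches 1.0, whereas fully redundant attributes add nothing beyond a single attribute (0.5).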

We get the ranked attributes listed with their corresponding class correlation. Attributes with little or no correlation with the class are eliminated. The resulting reduced feature sets are shown in Table 9.

Table 4: Accuracy comparison of classifiers on 10 features.

| No. | Feature sets | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) |
|---|---|---|---|---|---|---|---|
| 1 | Android components | 93.23 | 89.02 | 93.40 | 90.16 | 84.67 | 87.95 |
| 2 | API count | 95.85 | 93.02 | 95.66 | 91.90 | 89.20 | 85.25 |
| 3 | APIusage_actions | 95.20 | 91.86 | 95.32 | 91.97 | 89.02 | 91.24 |
| 4 | Flow | 93.05 | 91.03 | 93.32 | 87.18 | 87.45 | 83.17 |
| 5 | Hardware components | 89.00 | 89.06 | 89.12 | 89.06 | 89.02 | 89.06 |
| 6 | Intent_action | 86.89 | 85.73 | 87.13 | 84.64 | 83.75 | 85.53 |
| 7 | Permission | 94.30 | 91.92 | 94.65 | 93.95 | 88.54 | 94.14 |
| 8 | Shell_command_strings | 75.40 | 71.18 | 74.08 | 70.28 | 68.74 | 70.22 |
| 9 | Content_visual | 97.20 | 95.79 | 95.53 | 94.49 | 95.77 | 93.87 |
| 10 | URLs | 96.03 | 93.24 | 97.18 | 93.99 | 92.98 | 93.80 |

The second method is an information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated by using the following equations. In the attribute evaluation process, the I index, which measures the impurity of $D$, a data partition or a set of training tuples, is calculated using

$$I(D) = 1 - \sum_{i=1}^{m} p_i^2, \tag{2}$$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. The sum is computed over the $m$ classes, and the I index considers a binary split for each attribute. First, the case where $A$ is a discrete-valued attribute having $v$ distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in $D$ is considered. The expected information provided by that split is calculated by

$$I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j). \tag{3}$$

In this equation, $D_j$ represents the observations in $D$ that take the $j$th value of attribute $A$. The information gain of a binary split on attribute $A$ is calculated by

$$\mathrm{Gain}(A) = I(D) - I_A(D). \tag{4}$$
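Equations (2) to (4) can be traced on a toy dataset with the following sketch. The function names and the four-sample dataset are illustrative assumptions; the partitions $D_j$ are formed by grouping the samples on the value of the attribute.

```python
from collections import Counter


def impurity(labels):
    """Equation (2): I(D) = 1 - sum_i p_i^2 over the class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(values, labels):
    """Equation (3): weighted impurity of the partitions D_j induced by A."""
    n = len(labels)
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    return sum(len(p) / n * impurity(p) for p in parts.values())


def gain(values, labels):
    """Equation (4): Gain(A) = I(D) - I_A(D)."""
    return impurity(labels) - split_impurity(values, labels)


labels = ["phishing", "phishing", "benign", "benign"]
print(impurity(labels))            # impurity of the balanced dataset
print(gain([1, 1, 0, 0], labels))  # attribute that separates the classes
print(gain([1, 0, 1, 0], labels))  # attribute unrelated to the class
```

An attribute that separates the classes perfectly removes all of the impurity (gain 0.5 here), while an unrelated attribute leaves it unchanged (gain 0).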

Information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation for split information is provided in

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right). \tag{5}$$

This value represents the potential information generated by splitting the training dataset $D$ into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$. The gain ratio is defined in

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}. \tag{6}$$

The attribute with the maximum gain ratio is selected as the highest-ranked attribute. The low-ranked attributes that provide a gain ratio of less than 0.0003 are eliminated. After performing the two feature selection techniques on the dataset, the reduced feature sets are generated as shown in Table 9.
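The split information of equation (5), the gain ratio of equation (6), and the elimination threshold can be sketched as follows. The helper names are hypothetical, the gain value is passed in rather than recomputed, and returning a ratio of 0 for an attribute with a single value (where the split information is zero) is an assumption made to avoid a division by zero.

```python
import math
from collections import Counter


def split_info(values):
    """Equation (5): split information of the partitions induced by A."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(gain_a, values):
    """Equation (6); a single-valued attribute gets ratio 0 (assumption)."""
    si = split_info(values)
    return gain_a / si if si > 0 else 0.0


def select(ratios, threshold=0.0003):
    """Keep attributes whose gain ratio reaches the threshold, best first."""
    kept = [(r, name) for name, r in ratios.items() if r >= threshold]
    return [name for r, name in sorted(kept, key=lambda s: (-s[0], s[1]))]


print(split_info([1, 1, 0, 0]))       # two equal partitions
print(gain_ratio(0.5, [1, 1, 0, 0]))  # gain 0.5 spread over that split
# Hypothetical gain ratios for three attributes.
print(select({"a": 0.5, "b": 0.0001, "c": 0.02}))
```

Only the attribute whose gain ratio falls below 0.0003 is dropped in the last call; the remaining attributes are returned in ranked order.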

The same detection experiments are conducted with the 9 classification approaches on each selected feature set. The detection results of the 5 cases on the selected feature sets are described in Tables 10 and 11. In this experiment, the 9 classification approaches and their related parameters are set up in the same way as in the previous experiments (described in Section 4.2).

According to the results of the reduced datasets with the correlation attribute evaluation method shown in Table 10, the classification approaches with the best detection accuracy are slightly changed in 2 cases (feature patterns 3 and 4). The italicized values shown in Table 10 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. Feature pattern 3 is a combination of API count, API usage, intent, and hardware features. For this pattern, the highest detection accuracy is now provided by the ensembles with the AVG and MAJ final answer methods, whereas it is provided by the ensemble with the AVG final answer method when the full feature set is used. The detection accuracy is slightly increased for most classifiers in feature pattern 4, which is a combination of flow and intent features.

According to the results shown in Table 11 for the reduced datasets with the information gain attribute evaluation method, the detection accuracy is increased in 4 cases (feature patterns 1, 3, 4, and 5). The italicized values shown in Table 11 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. Moreover, the classification approaches that produced the best detection accuracy are changed in 3 cases (feature patterns 3, 4, and 5). That is, the ensemble with the AVG final answer-finding method provides the best accuracy for feature patterns 3, 4, and 5.

Table 5: Scenarios for random combinations of features.

| Case ID | Feature pattern | Combination of feature sets | Number of features |
|---|---|---|---|
| 01 | Pattern 1 | API count + API usage + hardware | 112 |
| 02 | Pattern 2 | API count + intent | 139 |
| 03 | Pattern 3 | API count + API usage + intent + hardware | 220 |
| 04 | Pattern 4 | Flow + intent | 529 |
| 05 | Pattern 5 | Flow + intent + API usage + hardware | 610 |

Table 6: Detection accuracy of 5 scenarios on randomly combined feature patterns.

| Case ID | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) | AVG (%) | MAJ (%) | MAX (%) |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.93 | 93.07 | 95.45 | 92.47 | 89.42 | 91.62 | 95.31 | 95.31 | 92.87 |
| 02 | 94.72 | 91.62 | 94.04 | 90.18 | 86.44 | 89.27 | 94.26 | 94.20 | 91.38 |
| 03 | 96.32 | 92.67 | 95.60 | 94.89 | 90.69 | 92.57 | 96.43 | 96.41 | 94.31 |
| 04 | 90.56 | 86.38 | 90.45 | 88.51 | 81.55 | 87.88 | 90.64 | 90.64 | 88.52 |
| 05 | 95.33 | 89.69 | 94.37 | 93.97 | 92.28 | 91.61 | 95.68 | 95.69 | 92.68 |

Table 7: Tentative cases for mobile phishing detection system.

| Case ID | Feature pattern | Adaptive method | Accuracy (%) | Runtime (seconds) |
|---|---|---|---|---|
| 1 | Pattern 1 | J48 | 95.93 | 4.43 |
| 2 | Pattern 2 | J48 | 94.72 | 4.54 |
| 3 | Pattern 3 | AVG | 96.43 | 95.18 |
| 4 | Pattern 4 | AVG, MAJ | 90.64 | 174.4 & 174.6 |
| 5 | Pattern 5 | MAJ | 95.69 | 205.50 |

The detection accuracy percentages of the 5 cases using different algorithms are compared in Figure 6. This figure represents the detection results from Tables 6, 10, and 11. Each case is represented in 3 situations: no feature selection, after correlation attribute evaluation feature selection, and after information gain attribute evaluation feature selection. There are 15 points in the figure representing the 5 cases with 3 conditions. The best classifier for case 01 and case 02 is the J48 classifier, while the ensemble classifier AVG is the best one for case 03, case 04, and case 05. The cases with the best algorithm are used in the case-based reasoning detection method.

With the aim of highlighting the performance of the feature selection techniques, the runtime results of the reduced feature sets are collected, as described in Tables 12 and 13. The information gain attribute evaluation method results in a larger number of features than the correlation attribute evaluation method. The runtime of the information gain attribute evaluation method is also slightly larger than that of the correlation attribute evaluation method.

The runtimes of the 5 cases with feature selection are shown in Figure 7. This figure compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure representing the 5 cases with 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied on our feature sets to improve the accuracy and efficiency of our model. The detection accuracy on 4 feature patterns is improved, as shown in Table 11, while the runtime of the detection on all feature patterns is improved, as shown in Table 13. Table 14 shows the comparison of the accuracy and efficiency of the full feature sets and the reduced feature sets of our proposed adaptive model. The italicized values shown in Table 14 represent the accuracy values obtained with a reduced feature set that improve on their counterparts obtained with a full feature set.

Table 8: Runtime comparison of 5 scenarios on 9 classification approaches (in seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 4.43 | 18.63 | 0.01 | 1.93 | 0.64 | 3.01 | 29.87 | 29.52 | 26.16 |
| 02 | 4.54 | 31.80 | 0.0001 | 2.57 | 0.66 | 39.24 | 76.20 | 76.35 | 75.90 |
| 03 | 9.44 | 58.80 | 0.01 | 7.22 | 1.14 | 18.94 | 95.18 | 96.09 | 97.41 |
| 04 | 12.09 | 148.32 | 0.0001 | 5.28 | 1.39 | 6.25 | 174.4 | 174.6 | 174.61 |
| 05 | 17.09 | 167.14 | 0.01 | 7.86 | 1.93 | 3.61 | 203.62 | 205.50 | 203.51 |

Table 9: Information of selected feature sets for 5 cases.

| Case ID | Feature combination pattern | Features before feature selection | Features selected by Pearson's correlation | Features selected by information gain |
|---|---|---|---|---|
| 01 | Pattern 1 | 112 | 96 | 100 |
| 02 | Pattern 2 | 139 | 114 | 120 |
| 03 | Pattern 3 | 220 | 180 | 185 |
| 04 | Pattern 4 | 529 | 164 | 265 |
| 05 | Pattern 5 | 610 | 227 | 250 |

Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.87 | 93.10 | 95.45 | 92.47 | 89.42 | 91.61 | 95.31 | 95.32 | 92.82 |
| 02 | 94.68 | 91.53 | 94.04 | 90.18 | 86.44 | 89.28 | 94.24 | 94.18 | 91.33 |
| 03 | 96.37 | 92.70 | 95.60 | 94.90 | 90.69 | 92.57 | 96.38 | 96.38 | 94.37 |
| 04 | 90.73 | 86.51 | 90.45 | 88.51 | 81.55 | 87.89 | 90.73 | 90.72 | 88.64 |
| 05 | 95.38 | 89.54 | 94.37 | 93.96 | 92.28 | 91.61 | 95.68 | 95.69 | 92.72 |

Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.96 | 93.10 | 95.62 | 92.54 | 89.42 | 91.66 | 95.37 | 95.37 | 92.83 |
| 02 | 94.66 | 91.53 | 94.16 | 90.15 | 86.44 | 89.19 | 94.19 | 94.12 | 91.30 |
| 03 | 96.38 | 92.59 | 95.55 | 95.02 | 90.69 | 92.61 | 96.45 | 96.45 | 94.34 |
| 04 | 90.52 | 86.36 | 90.24 | 88.70 | 81.55 | 87.86 | 90.77 | 90.76 | 88.69 |
| 05 | 95.46 | 89.44 | 94.50 | 93.98 | 92.28 | 91.56 | 95.80 | 95.79 | 92.82 |

The phishing malware detection task is an imbalanced classification problem. That is, there are two classes to be identified, phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class "phishing" is greatly outnumbered by the negative class. These types of problems are examples of the fairly common case in data science in which accuracy is not a good measure for assessing model performance. Intuitively, proclaiming all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.

In order to assess the effectiveness of our proposed model, the confusion matrix evaluation is applied using accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances that our model predicts as positive that are actually positive. The following formulas represent their definitions:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Sensitivity} = \frac{TP}{TP + FN}. \tag{7}$$

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. The x-axis lists cases 01 to 05, each with no feature selection, correlation, and information gain; the y-axis is accuracy (%) from 80 to 100; the series are J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX.

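Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 3.94 | 18.36 | 0.01 | 1.97 | 0.64 | 3.38 | 27.45 | 26.92 | 27.18 |
| 02 | 3.84 | 25.35 | 0.0001 | 2.43 | 0.50 | 39.46 | 72.07 | 72.12 | 71.19 |
| 03 | 8.06 | 45.85 | 0.01 | 7.20 | 1.03 | 19.55 | 83.10 | 83.52 | 83.34 |
| 04 | 5.60 | 44.75 | 0.0001 | 5.15 | 0.56 | 6.27 | 61.95 | 61.99 | 62.00 |
| 05 | 8.84 | 69.88 | 0.0001 | 7.65 | 1.00 | 3.20 | 90.23 | 90.06 | 90.45 |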

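Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 4.05 | 20.25 | 0.01 | 1.63 | 0.54 | 3.02 | 29.02 | 27.49 | 27.27 |
| 02 | 3.86 | 29.43 | 0.001 | 2.35 | 0.56 | 31.96 | 67.63 | 66.52 | 66.40 |
| 03 | 9.77 | 55.46 | 0.01 | 6.31 | 0.95 | 17.06 | 87.31 | 90.63 | 90.69 |
| 04 | 6.83 | 87.36 | 0.01 | 2.25 | 0.93 | 6.76 | 102.86 | 93.21 | 93.15 |
| 05 | 8.42 | 104.52 | 0.001 | 5.37 | 1.21 | 3.95 | 111.80 | 107.53 | 108.07 |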


True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix, as shown in Figure 8.
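The three measures can be computed directly from the four confusion matrix counts, as in the sketch below. The counts are invented to illustrate the imbalance issue and are not results from the experiments; returning 0 when a denominator is empty is an assumption of the sketch.

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, precision, and sensitivity from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, sensitivity


# Hypothetical test set: 100 phishing apps and 900 benign apps.
all_negative = metrics(tp=0, fp=0, tn=900, fn=100)  # predicts benign always
detector = metrics(tp=80, fp=20, tn=880, fn=20)     # finds most phishing apps
print(all_negative)
print(detector)
```

The classifier that labels every app as benign still reaches 90% accuracy while detecting no phishing app at all, which is why precision and sensitivity are reported alongside accuracy.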

The evaluation of the effectiveness of our proposed model by means of accuracy, precision, and sensitivity is described in Table 15. According to the results shown in Table 15, our adaptive model achieves good detection accuracy for the phishing features. Meanwhile, all the classifiers achieve acceptable precision and sensitivity. According to the previous experiments, our adaptive phishing detection model using case-based reasoning can perform well on diversely distributed features.

5 Conclusions

An adaptive mobile phishing detection model based on avariation of input feature patterns using a case-based rea-soning (CBR) technique is proposed in this work An ex-perimental analysis is conducted to demonstrate the design

decision of our model and to verify the performance of ourproposed model in handling the concept drift of mobilephishing attacks e proposed model is evaluated with alarge feature set that contains 1065 features from 10 feature

Case

01

no

f- se

lect

ion

Case

01

corr

elat

ion

Case

01

info

gai

n

Case

02

no

f- se

lect

ion

Case

02

corr

elat

ion

Case

02

info

gai

n

Case

03

no

f- se

lect

ion

Case

03

corr

elat

ion

Case

03

info

gai

n

Case

04

no

f- se

lect

ion

Case

04

corr

elat

ion

Case

04

info

gai

n

Case

05

no

f- se

lect

ion

Case

05

corr

elat

ion

Case

05

info

gai

n

J48DTIBK

LRNBSVM

AVGMAJMAX

0

50

100

150

200

Runt

ime (

seco

nds)

Figure 7 Runtime comparison of 9 classifiers on 5 cases before and after feature selection

Table 14 Accuracy and efficiency of proposed adaptive model

Case ID Adaptive (before) Adaptive (after) Accuracy (before) Accuracy (after) Runtime (before) Runtime (after)01 J48 J48 9593 9596 443 40502 J48 J48 9472 9466 454 38603 AVG AVG MAJ 9643 9645 9518 8731 amp 906304 AVG MAJ AVG 9064 9077 1744 amp 1746 1028605 MAJ AVG 9569 9580 20550 11180

Figure 8: Confusion matrix.

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |

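Table 15: Detection results achieved by the proposed model.

| Case | Classifier | Accuracy (%) | Precision (%) | Sensitivity (%) |
|---|---|---|---|---|
| 01 | J48 | 95.96 | 83 | 79 |
| 02 | J48 | 94.66 | 87 | 86 |
| 03 | AVG | 96.45 | 92 | 75 |
| 04 | AVG | 90.77 | 84 | 62 |
| 05 | AVG | 95.80 | 90 | 74 |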

12 Journal of Computer Networks and Communications

groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories to cover the different natures of classification across the diverse feature sets. Three ensembles of the six base classifiers are used, each of which uses a different final answer-finding method: average, majority voting, and maximum. In total, there are 9 classifiers. Because of the many features in the dataset and the use of multiple classifiers, efficiency degraded. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and Pearson's correlation coefficient attribute evaluation method. By addressing the optimal selection of a suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model could provide an accuracy improvement with an acceptable runtime increment.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.

References

[1] W. Paul, H. A. Manolian, and S. Lapper, "Thinking digital in industry 4.0," Deloitte Insights, September 2018, https://www2.deloitte.com/insights/us/en/focus/industry-4-0/digital-leaders-in-manufacturing-fourth-industrial-revolution.html.

[2] "Spam and phishing in Q2 2018," Securelist-Kaspersky Lab's Cyberthreat Research and Reports, 2018.

[3] Proofpoint Security Awareness, "2019 state of the phish report," March 2019, https://www.wombatsecurity.com/state-of-the-phish.

[4] L. Wu, X. Du, and J. Wu, "Effective defense schemes for phishing attacks on mobile computing platforms," IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6678–6691, 2016.

[5] M. Moghimi and A. Y. Varjani, "New rule-based phishing detection method," Expert Systems with Applications, vol. 53, pp. 231–242, Jul 2016.

[6] Baunfire.com and SparkCMS, "APWG phishing attack trends report-4Q 2018," Anti-Phishing Working Group, March 2019, https://www.antiphishing.org/resources/apwg-reports.

[7] R. Basnet, S. Mukkamala, and A. H. Sung, "Detection of phishing attacks: a machine learning approach," in Soft Computing Applications in Industry, B. Prasad, Ed., pp. 373–383, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[8] A. K. Jain and B. B. Gupta, "Comparative analysis of features based machine learning approaches for phishing detection," in Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 2125–2130, New Delhi, India, March 2016.

[9] F. Toolan and J. Carthy, "Phishing detection using classifier ensembles," in Proceedings of the 2009 eCrime Researchers Summit, pp. 1–9, Tacoma, WA, USA, October 2009.

[10] H. S. Hota, A. K. Shrivas, and R. Hota, "An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique," Procedia Computer Science, vol. 132, pp. 900–907, 2018.

[11] A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles, ResearchGate, Berlin, Germany, 2019, https://www.researchgate.net/publication/325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles.

[12] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, "Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers," Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.

[13] A. Aleroud and L. Zhou, "Phishing environments, techniques, and countermeasures: a survey," Computers and Security, vol. 68, pp. 160–196, 2017.

[14] H. Shahriar, T. Klintic, and V. Clincy, "Mobile phishing attacks and mitigation techniques," Journal of Information Security, vol. 6, no. 3, pp. 206–212, 2015.

[15] T. M. Mahmoud and A. M. Mahfouz, "SMS spam filtering technique based on artificial immune system," International Journal of Computer Science Issues, vol. 9, no. 1, pp. 589–597, 2012.

[16] J. W. Yoon, H. Kim, and J. H. Huh, "Hybrid spam filtering for mobile communication," Computers and Security, vol. 29, no. 4, pp. 446–459, 2010.

[17] C. H. Hsu, P. Wang, and S. Pu, "Identify fixed-path phishing attack by STC," in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 172–175, Perth, Australia, September 2011.

[18] E. Medvet, E. Kirda, and C. Kruegel, "Visual-similarity-based phishing detection," in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, September 2008.

[19] A. P. Felt and D. Wagner, Phishing on Mobile Devices, University of California, Berkeley, CA, USA, 2011.

[20] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna, "What the app is that? Deception and countermeasures in the android user interface," in Proceeding of the 2015 IEEE Symposium on Security and Privacy, pp. 931–948, San Jose, CA, USA, May 2015.

[21] C. Marforio, R. J. Masti, C. Soriente, K. Kostiainen, and S. Capkun, "Personalized security indicators to detect application phishing attacks in mobile platforms," February 2015, http://arxiv.org/abs/1502.06824.

[22] D. Liu, E. Cuervo, V. Pistol, R. Scudellari, and L. P. Cox, "ScreenPass: secure password entry on touchscreen devices," in Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 291–304, Taipei, Taiwan, June 2013.

[23] D. Liu and L. P. Cox, "VeriUI: Attested Login for Mobile Devices," in Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, Santa Barbara, CA, USA, February 2014.

[24] L. Wu, X. Du, and J. Wu, "MobiFish: A lightweight anti-phishing scheme for mobile phones," in Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–8, Shanghai, China, August 2014.

[25] V. Mavroeidis and M. Nicho, "Quick response code secure: a cryptographically secure anti-phishing tool for QR code attacks," in Computer Network Security, pp. 313–324, 2017.

[26] "Phishing detective-apps on Google Play," March 2018, httpsplaygooglecomstoreappsdetailsidcomrsoftrandroidphishingdetectiveads

[27] G. Bottazzi, E. Casalicchio, D. Cingolani, F. Marturana, and M. Piu, "MP-Shield: A framework for phishing detection in mobile devices," in Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1977–1983, Liverpool, UK, October 2015.

[28] M. M. Richter and R. O. Weber, Case-Based Reasoning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[29] S. Craw, N. Wiratunga, and R. C. Rowe, "Learning adaptation knowledge to improve case-based reasoning," Artificial Intelligence, vol. 170, no. 16-17, pp. 1175–1192, Nov 2006.

[30] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, "Case-based reasoning systems in the health sciences: a survey of recent Trends and developments," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 4, pp. 421–434, Jul 2011.

[31] S. Arzt, "FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps," in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269, New York, NY, USA, June 2014.

[32] L. Li, A. Bartel, T. F. Bissyande et al., "IccTA: detecting inter-component privacy leaks in android apps," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 280–291, Piscataway, NJ, USA, May 2015.

[33] "Obfuscation-resilient, efficient, and accurate detection and family identification of android malware - Semantic Scholar," March 2018, httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23

[34] W. Enck, "TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones," Transactions on Computer Systems, vol. 32, no. 2, 2014.

[35] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, "Information-flow analysis of android applications in DroidSafe," in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, February 2015.

[36] D. Arp, M. Spreitzenbarth, H. Gascon, and K. Rieck, "Drebin: effective and explainable detection of android malware in your pocket," in Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, February 2014.

[37] N. Peiravian and X. Zhu, "Machine learning for android malware detection using permission and API calls," in Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305, Herndon, VA, USA, November 2013.

[38] V. Avdiienko, K. Kuznetsov, A. Gorla et al., "Mining apps for abnormal usage of sensitive data," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 426–436, Florence, Italy, May 2015.

[39] H. V. Nath and B. M. Mehtre, "Static malware analysis using machine learning methods," in Recent Trends in Computer Networks and Distributed Systems Security, pp. 440–450, 2014.

[40] N. Abura'ed, H. Otrok, R. Mizouni, and J. Bentahar, "Mobile phishing attack for Android platform," in Proceedings of the 2014 10th International Conference on Innovations in Information Technology (IIT), pp. 18–23, Abu Dhabi, UAE, November 2014.

[41] VirusShare.com, https://virusshare.com.

[42] K. Allix, T. F. Bissyande, J. Klein, and Y. Le Traon, "Androzoo: collecting millions of android apps for the research community," in Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471, Austin, TX, USA, May 2016.

[43] J. Yu, Q. Huang, and C. Yian, "DroidScreening: a practical framework for real-world Android malware analysis," Security and Communication Networks, vol. 9, no. 11, pp. 1435–1449.

[44] Joshuaga, Revealdroid - Bitbucket, https://bitbucket.org/joshuaga/revealdroid/src/master.

[45] S. Kyaw Zaw and S. Vasupongayya, "Revealing the important features of mobile phishing," in Proceedings of the 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), pp. 222–226, Pattaya, Thailand, November 2018.

[46] M. A. Hall and L. A. Smith, "Feature subset selection: a correlation based filter approach," Progress in Connectionist-based Information Systems, vol. 2, pp. 855–858, 1997.


To define a new case in the case base, the input features have to pass through different machine learning classifiers, and the results from each classifier are calculated to produce the final result. Then, the input features, the classifiers with their parameters, the activation function, and the final result are stored in the case base (knowledge base) as a new case. The process of defining a new case to be stored in the case base is shown in Figure 4.
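The case-defining step can be sketched as follows, assuming a case holds the fields listed in Table 2 and that the final result is the classifier with the maximum accuracy, as indicated in Figure 4. The `Case` class and `define_case` function are hypothetical names introduced for illustration; the three accuracy and runtime pairs are the Case 01 values as read from Tables 6 and 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str
    feature_pattern: frozenset   # combination of feature sets
    method: str                  # classifier or ensemble method
    accuracy: float              # percentage of correct classification
    runtime_seconds: float       # time to build the model


def define_case(case_id, feature_sets, results):
    """results maps a classifier name to (accuracy %, runtime s).

    The most accurate classifier is kept; on a tie, the first one listed wins.
    """
    best = max(results, key=lambda name: results[name][0])
    accuracy, runtime = results[best]
    return Case(case_id, frozenset(feature_sets), best, accuracy, runtime)


case_base = []
# Three of the nine classifiers for Case 01, for brevity.
results_01 = {"J48": (95.93, 4.43), "IBK": (95.45, 0.01), "AVG": (95.31, 29.87)}
case_base.append(define_case("01", ["API count", "API usage", "hardware"], results_01))
```

Storing the runtime alongside the accuracy keeps the efficiency information available when a stored case is later reused.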

3.3. Case Retrieval. Case-based reasoning (CBR) solves a new problem by retrieving previously solved problems and their solutions from a knowledge source of cases called the case base. There are challenges related to the retrieval process that still need to be addressed. One issue is the computation of similarity, which is particularly important during the retrieval process. The effectiveness of a similarity measurement is determined by the usefulness of a retrieved case in solving a new problem.

The aim of using the CBR approach is to select the past phishing detection cases that are most similar to the new problem. A set of similar cases is selected from the case base according to a similarity criterion that requires the specification of weights corresponding to attributes. The assessment of case similarity involves comparing the attribute values of the new case with those of the past cases stored in the case base. The retrieved old cases are ranked according to their similarity scores to the attributes of the new case. In this work, the nearest neighbor method is applied to calculate the similarity score and the total similarity score of a potentially useful case.
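A minimal sketch of weighted nearest-neighbor retrieval is given below. It assumes that each attribute is a binary indicator of whether a feature set is present in a pattern and that the total similarity is the weighted share of matching attributes. The attribute names, the equal weights, and both function names are assumptions made for illustration; the stored patterns follow Table 5.

```python
def similarity(query: set, stored: set, weights: dict) -> float:
    """Weighted share of attributes on which the two patterns agree."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    # An attribute matches when it is present in both patterns or absent from both.
    matched = sum(w for attr, w in weights.items() if (attr in query) == (attr in stored))
    return matched / total


def retrieve(query: set, case_base: dict, weights: dict) -> list:
    """Return (case_id, score) pairs ranked from most to least similar."""
    scored = [(cid, similarity(query, pattern, weights)) for cid, pattern in case_base.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Feature patterns of the five cases; equal weights are an assumption.
case_base = {
    "01": {"api_count", "api_usage", "hardware"},
    "02": {"api_count", "intent"},
    "03": {"api_count", "api_usage", "intent", "hardware"},
    "04": {"flow", "intent"},
    "05": {"flow", "intent", "api_usage", "hardware"},
}
weights = {name: 1.0 for name in ("api_count", "api_usage", "intent", "hardware", "flow")}
ranking = retrieve({"api_count", "api_usage", "hardware"}, case_base, weights)
```

The top-ranked case supplies the classifier to reuse; raising the weight of an attribute makes a mismatch on it cost more in the ranking.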

3.4. Adaptive Classification System Design. The main objective of case-based adaptive classification is to assign a suitable classification technique to the target case (a feature set extracted from an Android application) by identifying and analysing the similar training cases (sets of features that are stored in the case base). The proposed case-based adaptive classification is shown in Figure 5. If the feature set extracted from the active Android application does not match any set of features stored in the case base (that is, the extracted feature set is not complete for the case-retrieving process), the adaptive classification will select a suitable method to process the extracted feature set. There are two options. First, the possible features are added to the extracted feature set in order to perform the case-retrieving process and to choose a suitable classifier. Second, multiple classifiers are selected to process the extracted incomplete feature set. Under the second option, the multiple answers resulting from the multiple classifiers are collected in order to produce a final answer by way of a weighted sum of all answers.
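The second option can be sketched as a weighted sum of the classifier answers. The text does not specify the weights or how the sum is turned into a label, so the normalization by the total weight, the 0.5 threshold, and all values below are assumptions for illustration only.

```python
def weighted_sum_answer(answers: dict, weights: dict, threshold: float = 0.5) -> tuple:
    """Combine per-classifier phishing probabilities into one score and label."""
    total = sum(weights[name] for name in answers)
    if total == 0:
        raise ValueError("at least one classifier needs a non-zero weight")
    # Weighted sum of the answers, normalized so the score stays in [0, 1].
    score = sum(weights[name] * p for name, p in answers.items()) / total
    return score, ("phishing" if score >= threshold else "benign")


# Hypothetical classifier outputs and weights.
answers = {"J48": 0.9, "IBK": 0.6, "NB": 0.3}
weights = {"J48": 2.0, "IBK": 1.0, "NB": 1.0}
score, label = weighted_sum_answer(answers, weights)
```

One plausible choice, not stated in the text, is to derive each weight from the classifier's accuracy on the most similar stored case.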

4. Detection Model and Evaluation

This section explains how our detection model performs adaptively on the combination of individual classifiers and ensemble classifiers. To verify that our proposed model can improve the accuracy of mobile phishing detection, an experiment is conducted using the feature sets described in Section 3.1. The experiment was conducted by running Weka 3.8 on a laptop computer with a Core i7 processor, 8 GB RAM, and the Windows 8.1 64-bit operating system. The cross-validation method is used as an evaluation technique to estimate the error rate efficiently and in an unbiased way by running repeated percentage splits. Firstly, the dataset is divided into 10 pieces. Each piece is used as a testing dataset in turn, while the remaining 9 pieces together are used as a training dataset. We performed 10 simulations (i.e., experiments are repeated 10 times). Then, all these results are averaged into a single estimation result. Six existing machine learning algorithms are chosen from different categories and used with the 10-fold cross-validation method to evaluate the variation of accuracy and efficiency.
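The 10-fold procedure described above can be sketched in plain Python as follows. This is not the Weka implementation used in the experiments; the function names, the shuffling seed, and the `evaluate` callback are assumptions introduced for illustration.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(n_samples: int, evaluate, k: int = 10) -> float:
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_indices(25, k=10))
```

In practice, `evaluate` would train a classifier on the training indices and return its accuracy on the test indices; averaging over the folds gives the single estimate mentioned in the text.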

4.1. Dataset. The features are extracted from more than 10,000 Android malware samples, which are collected from Android malware repositories including VirusShare [41], AndroZoo [42], DroidScreening [43], and RevealDroid [44]. The extracted features comprise 76 features of Android components, 31 features of API counts, 82 features of API usage actions, 421 features of security-sensitive flows, 6 features of hardware components, 109 features of intents, 82 features of permissions, 190 features of malicious shell commands and strings, 19 content visual features, and 49 features of URLs. Thus, there are 1,065 features in total. The feature sets used in this experiment are summarized in Table 3.

4.2. Machine Learning Classifiers. To detect and classify phishing applications, different machine learning classification techniques are used with an adaptive method. An adaptive classification system is proposed to automatically choose a combination of suitable classifiers for the extracted features of an active Android application. Various machine learning techniques were used as classifiers in existing works [31, 32, 34, 35]. Among them, six algorithms were selected from different categories to cover the different natures of classification. The six algorithms are C4.5 (J48), decision table (DT), k-nearest neighbors (IBK), logistic regression (LR), naive Bayes (NB), and support vector machine (SVM). According to the pretesting on the effectiveness of parameters for these classifiers [45], the naive Bayes (NB) classifier with the supervised discretization function, the

Table 2: Case description for mobile phishing detection system.

| No. | Name | Value |
|---|---|---|
| 1 | Case ID | Case identification number |
| 2 | Feature pattern | Combination of feature sets |
| 3 | Ensemble methods of classifiers (or) classification algorithm | Boosting/bagging/Bayesian (or) algorithm name and their specific parameters |
| 4 | Accuracy | Percentage of correct classification |
| 5 | Performance | Runtime (seconds) |


default maximum number of iterations in logistic regression (LR), a confidence factor of 0.5 for tree pruning in the J48 classifier, and a 1-nearest neighbor (IBK) classifier are chosen for our experiment. The SVM and decision table classifiers are used with their default parameters.

4.3. Experimental Results and Analysis. The accuracy comparison of the six classifiers on the 10 feature sets is shown in Table 4. The italicized values shown in Table 4 represent the maximum detection accuracy among the six classifiers for each feature set. It can be seen that the accuracy of each

Figure 5: Adaptive classification. [Flowchart blocks: Target APK; Feature extraction; Decision making; Add feature; Retrieve and reuse; Classifier 1, Classifier 2, ..., Classifier n; Decision making; Display result.]

Figure 4: Case defining process (define a new case and store in case base). [Flowchart blocks: Input feature pattern; ML algorithm 1, ML algorithm 2, ..., ML algorithm n; Result 1, Result 2, ..., Result n; Take the maximum accuracy; Final result; Define and store a case in case base.]

Table 3: Feature sets.

| No. | Feature sets | Number of features | Example features |
|---|---|---|---|
| 1 | Android components | 76 | android.media, android.media.effect, android.media.audiofx, android.service.textservice, android.service.notification |
| 2 | API counts | 31 | account_information, account_settings, audio, bluetooth, bluetooth_information |
| 3 | API usage actions | 82 | android.util, android.widget, android.renderscript, android.webkit, android.os, android.os.storage, android.content |
| 4 | Security-sensitive flows | 421 | system_settings____audio, system_settings____phone_connection, system_settings____voip, system_settings____database_information |
| 5 | Hardware components | 6 | android.hardware.display, android.hardware, android.hardware.usb, android.hardware.location, android.hardware.input |
| 6 | Intent_action | 109 | action_main, action_view, action_default, action_attach_data, action_edit, action_insert_or_edit |
| 7 | Permission | 82 | android.permission.access_cache_filesystem, android.permission.access_checkin_properties, android.permission.access_coarse_location, android.permission.access_gps |
| 8 | Shell_command_strings | 190 | runtimeexec, createSubprocess, cipher-classes, longstring, SecretKey, methodinvoke, small_code_size |
| 9 | Content_visual | 19 | HostnameLength, PathLength, QueryLength, DoubleSlashInPath, NumSensitiveWords, EmbeddedBrandName, PctExtHyperlinks |
| 10 | URLs | 49 | having_ip_address, url_length, shortining_service, having_at_symbol, double_slash_redirecting, prefix_suffix |
| | Total | 1,065 | |


classification algorithm depends on the features. IBK provides the best accuracy on 6 feature sets, and J48 provides the best accuracy on the other 4 feature sets. Our work aims to detect mobile phishing in a feature-independent manner with various classifiers. To mimic real-world applications, random feature combinations are created, because a new Android application can consist of any combination of features. In this experiment, 5 random combinations of features are created, as shown in Table 5.

These 5 feature combination patterns are tested with the six individual classifiers and three ensemble models to develop cases for our adaptive model. Each model is an ensemble of the six classifiers with a different method of providing the final answer. The final answer-finding methods of the ensemble classifiers are the average of probabilities, majority voting, and maximum probability. The detection results for the 5 scenarios of random feature combination sets with the six base classifiers and three ensemble classifiers are given in Table 6. The italicized values shown in Table 6 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases.
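The three final answer-finding methods can be sketched as follows, assuming each base classifier outputs a probability for every class. This is an illustrative reimplementation rather than the code used in the experiments; the `combine` function and the example outputs are hypothetical.

```python
from collections import Counter


def combine(distributions: list, rule: str) -> str:
    """Combine per-classifier class-probability dicts with the AVG, MAJ, or MAX rule."""
    classes = sorted(distributions[0])
    if rule == "AVG":    # average of probabilities
        scores = {c: sum(d[c] for d in distributions) / len(distributions) for c in classes}
    elif rule == "MAJ":  # majority voting over each classifier's top class
        votes = Counter(max(classes, key=lambda c: d[c]) for d in distributions)
        scores = {c: votes.get(c, 0) for c in classes}
    elif rule == "MAX":  # maximum probability given by any classifier
        scores = {c: max(d[c] for d in distributions) for c in classes}
    else:
        raise ValueError(f"unknown rule: {rule}")
    # Ties go to the alphabetically first class.
    return max(classes, key=lambda c: scores[c])


# Hypothetical outputs of three base classifiers for one app.
outputs = [
    {"benign": 0.55, "phishing": 0.45},
    {"benign": 0.60, "phishing": 0.40},
    {"benign": 0.05, "phishing": 0.95},
]
```

For these outputs, AVG and MAX follow the single confident classifier, while MAJ follows the two weakly confident ones, which illustrates how the same base classifiers can yield different ensemble answers.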

According to the results shown in Table 6, some feature patterns are better suited to ensemble techniques, while others are better served by individual classification techniques. It can be concluded that the accuracy variation of classification techniques in mobile phishing detection relies heavily on the input features.

The adaptive method used in our model will choose the most suitable classification approach for a set of input features. Based on the results presented in Table 6, we can develop cases to be stored in the case base for an adaptive choice of suitable classifiers. The tentative cases for building our case-based phishing detection model are shown in Table 7.

Performing the classification process on this large number of features takes a long runtime. The comparison of the runtime to build the detection model with the 6 base classifiers and 3 ensemble approaches before feature selection is shown in Table 8.

To reduce the detection time, some features may be omitted because they may not have a high impact on the result. Therefore, some experiments are conducted to select a set of effective features in order to reduce the number of required features.

4.4. Selecting the Features. Feature selection is necessary to reduce the dimension of the feature space. To obtain the benefits of feature selection on a large dataset, such as reducing overfitting, improving accuracy, and reducing processing time, two feature selection techniques are performed in this experiment, and their results are compared to get the optimized results. The process of selecting the features can be described by the following steps.

Let $U$ be the universe of feature sets, $U = \{D_1, D_2, \ldots, D_v\}$, and let the dataset $D_i \in U$ with $v$ attributes $A$ be $D_i = \{A_1, A_2, \ldots, A_v\}$. Then, the attributes can be grouped into a feature group $FG_i$ as $FG_i = \{A_a, A_b, \ldots, A_n\}$. An attribute evaluation is performed on the worth of each attribute, and the selected attributes become a selected feature set $FS_i = \{A_a, A_b, \ldots, A_m\}$, where $FS_i \subseteq FG_i$.

Two feature selection techniques are used in this experiment to confirm the advantages of feature selection in phishing detection. The first method is a correlation-based feature selection with a ranker search method that evaluates each attribute and lists the results in ranked order. The worth of each attribute is evaluated by measuring the (Pearson's) correlation between it and the class [46].

Pearson's correlation coefficient is described in equation (1), where all variables have been standardized. The correlation between a composite and a class label is a function of the number of component variables (attributes) in the composite and the magnitude of the intercorrelations among them, together with the magnitude of the correlations between the attributes and the class label.

If the correlation between each of the attributes in a test and the class label is known, and the intercorrelation between each pair of attributes is given, then the correlation between a composite test consisting of the summed attributes and the class label can be predicted from the following equation:

$$r_{zc} = \frac{k\, r_{zi}}{\sqrt{k + k(k-1)\, r_{ii}}} \tag{1}$$

where $r_{zc}$ is the correlation between the summed attributes and the class label, $k$ is the number of attributes, $r_{zi}$ is the average of the correlations between the attributes and the class label, and $r_{ii}$ is the average intercorrelation between attributes.
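As a small worked example of equation (1), the function below evaluates the composite correlation for given averages. The function name and the input values are hypothetical and serve only to show how redundancy among attributes lowers the result.

```python
import math


def composite_correlation(k: int, r_zi: float, r_ii: float) -> float:
    """Equation (1): correlation between k summed attributes and the class label."""
    return (k * r_zi) / math.sqrt(k + k * (k - 1) * r_ii)


# Hypothetical averages: same attribute-class correlation, different redundancy.
low_redundancy = composite_correlation(k=4, r_zi=0.5, r_ii=0.0)
high_redundancy = composite_correlation(k=4, r_zi=0.5, r_ii=1.0)
```

With uncorrelated attributes the composite reaches 1.0, whereas with fully intercorrelated attributes it falls back to 0.5, the value of a single attribute. Adding redundant attributes therefore brings no gain.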

We obtain the ranked attributes listed with their corresponding class correlation. Some attributes that have no

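Table 4: Accuracy comparison of classifiers on 10 features.

| No. | Feature sets | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) |
|---|---|---|---|---|---|---|---|
| 1 | Android components | 93.23 | 89.02 | 93.40 | 90.16 | 84.67 | 87.95 |
| 2 | API count | 95.85 | 93.02 | 95.66 | 91.90 | 89.20 | 85.25 |
| 3 | APIusage_actions | 95.20 | 91.86 | 95.32 | 91.97 | 89.02 | 91.24 |
| 4 | Flow | 93.05 | 91.03 | 93.32 | 87.18 | 87.45 | 83.17 |
| 5 | Hardware components | 89.00 | 89.06 | 89.12 | 89.06 | 89.02 | 89.06 |
| 6 | Intent_action | 86.89 | 85.73 | 87.13 | 84.64 | 83.75 | 85.53 |
| 7 | Permission | 94.30 | 91.92 | 94.65 | 93.95 | 88.54 | 94.14 |
| 8 | Shell_command_strings | 75.40 | 71.18 | 74.08 | 70.28 | 68.74 | 70.22 |
| 9 | Content_visual | 97.20 | 95.79 | 95.53 | 94.49 | 95.77 | 93.87 |
| 10 | URLs | 96.03 | 93.24 | 97.18 | 93.99 | 92.98 | 93.80 |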


or low values on the class correlation measures are eliminated. The resulting reduced feature sets are shown in Table 9.

The second method is an information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated using the following equations. In the attribute evaluation process, the I index, which measures the impurity of $D$, a data partition or a set of training tuples, is calculated using

$$I(D) = 1 - \sum_{i=1}^{m} p_i^2 \tag{2}$$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. The sum is computed over the $m$ classes when the I index considers a binary split for each attribute. First, the case where $A$ is a discrete-valued attribute having $v$ distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in $D$ is considered. The expected information provided by that split is calculated by

$$I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j) \tag{3}$$

In this equation, $D_j$ represents the observations in $D$ that take the $j$th value of attribute $A$. The information gain of a binary split on attribute $A$ is calculated by

$$\mathrm{Gain}(A) = I(D) - I_A(D) \tag{4}$$

Information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation for split information is provided in

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right) \tag{5}$$

This value represents the potential information generated by splitting the training dataset $D$ into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$. The gain ratio is defined in

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)} \tag{6}$$

The attribute with the maximum gain ratio is selected as the highest ranked attribute. The low-ranked attributes that provide a gain ratio of less than 0.0003 are eliminated. After performing the two feature selection techniques on the dataset, the reduced feature sets are generated, as shown in Table 9.
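Equations (2) to (6) and the ranking step can be sketched for discrete attributes as follows. This is an illustrative reimplementation, not the Weka evaluator used in the experiments; the function names, the toy attributes, and the toy labels are hypothetical, while the 0.0003 cut-off follows the text.

```python
import math
from collections import Counter


def impurity(labels: list) -> float:
    """Equation (2): I(D) = 1 - sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gain_ratio(values: list, labels: list) -> float:
    """Equations (3) to (6) for one discrete attribute."""
    n = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    expected = sum(len(p) / n * impurity(p) for p in partitions.values())   # (3)
    gain = impurity(labels) - expected                                       # (4)
    split_info = -sum(len(p) / n * math.log2(len(p) / n)
                      for p in partitions.values())                          # (5)
    # An attribute with a single value has zero split information.
    return gain / split_info if split_info > 0 else 0.0                      # (6)


def select(attributes: dict, labels: list, threshold: float = 0.0003) -> list:
    """Rank attributes by gain ratio and drop those below the threshold."""
    ranked = sorted(((gain_ratio(v, labels), name) for name, v in attributes.items()),
                    reverse=True)
    return [name for score, name in ranked if score >= threshold]


# Toy data: 'uses_sms' separates the classes, 'has_icon' carries no information.
labels = ["phish", "phish", "benign", "benign"]
attributes = {"uses_sms": [1, 1, 0, 0], "has_icon": [1, 0, 1, 0]}
selected = select(attributes, labels)
```

Here `uses_sms` obtains a gain ratio of 0.5 and is kept, while `has_icon` obtains 0 and is eliminated by the threshold.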

The same detection experiments are conducted with the 9 classifiers on each selected feature set. The detection results of the 5 cases on the selected feature sets are given in Tables 10 and 11. In this experiment, the 9 classification approaches and their related parameters are set up in the same way as in the previous experiments (described in Section 4.2).

According to the results of the reduced datasets with the correlation attribute evaluation method shown in Table 10, the classification approaches with the best detection accuracy are slightly changed in 2 cases (feature patterns 3 and 4). Feature pattern 3 is a combination of API count, API usage, Intent, and Hardware. The italicized values shown in Table 10 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. For feature pattern 3, the highest detection accuracy is now provided by the ensembles with the AVG and MAJ final answer methods, whereas it is provided by the ensemble with the AVG final answer method when the full feature set is used. The detection accuracy is slightly increased for most classifiers in feature pattern 4, which is a combination of Flow and Intent features.

According to the results shown in Table 11 of the re-duced datasets with an information gain attribute evaluation

Table 5: Scenarios for random combinations of features.

| Case ID | Feature pattern | Combination of feature sets | Number of features |
|---|---|---|---|
| 01 | Pattern 1 | API count + API usage + hardware | 112 |
| 02 | Pattern 2 | API count + intent | 139 |
| 03 | Pattern 3 | API count + API usage + intent + hardware | 220 |
| 04 | Pattern 4 | Flow + intent | 529 |
| 05 | Pattern 5 | Flow + intent + API usage + hardware | 610 |

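Table 6: Detection accuracy of 5 scenarios on randomly combined feature patterns.

| Case ID | J48 (%) | DT (%) | IBK (%) | LR (%) | NB (%) | SVM (%) | AVG (%) | MAJ (%) | MAX (%) |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.93 | 93.07 | 95.45 | 92.47 | 89.42 | 91.62 | 95.31 | 95.31 | 92.87 |
| 02 | 94.72 | 91.62 | 94.04 | 90.18 | 86.44 | 89.27 | 94.26 | 94.20 | 91.38 |
| 03 | 96.32 | 92.67 | 95.60 | 94.89 | 90.69 | 92.57 | 96.43 | 96.41 | 94.31 |
| 04 | 90.56 | 86.38 | 90.45 | 88.51 | 81.55 | 87.88 | 90.64 | 90.64 | 88.52 |
| 05 | 95.33 | 89.69 | 94.37 | 93.97 | 92.28 | 91.61 | 95.68 | 95.69 | 92.68 |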

Table 7: Tentative cases for mobile phishing detection system.

| Case ID | Feature pattern | Adaptive method | Accuracy (%) | Run time (seconds) |
|---|---|---|---|---|
| 1 | Pattern 1 | J48 | 95.93 | 4.43 |
| 2 | Pattern 2 | J48 | 94.72 | 4.54 |
| 3 | Pattern 3 | AVG | 96.43 | 95.18 |
| 4 | Pattern 4 | AVG, MAJ | 90.64 | 174.4 & 174.6 |
| 5 | Pattern 5 | MAJ | 95.69 | 205.50 |


method, the detection accuracy is increased in 4 cases (feature patterns 1, 3, 4, and 5). The italicized values shown in Table 11 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. Moreover, the classification approaches that produced the best detection accuracy are changed in 3 cases (feature patterns 3, 4, and 5). That is, an ensemble with the AVG final answer-finding method provides the best accuracy for feature patterns 3, 4, and 5.

The detection accuracy percentages of the 5 cases using different algorithms are compared in Figure 6. This figure presents the detection results from Tables 6, 10, and 11. Each case is represented in 3 situations: no feature selection, after correlation attribute evaluation feature selection, and after information gain attribute evaluation feature selection. There are 15 points in the figure, representing the 5 cases with 3 conditions. The best classifier for case 01 and case 02 is the J48 classifier, while the ensemble classifier AVG is the best one for case 03, case 04, and case 05. The cases with the best algorithm are used in the case-based reasoning detection method.

With the aim of highlighting the performance of the feature selection techniques, the runtime results of the reduced feature sets are collected, as given in Tables 12 and 13. The information gain attribute evaluation method results in a larger number of features than the correlation attribute evaluation method. The runtime of the information gain attribute evaluation method is also slightly larger than that of the correlation attribute evaluation method.

The runtimes of the 5 cases with feature selection are shown in Figure 7. This figure compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure, representing the 5 cases under 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied to our feature sets to improve the accuracy and efficiency of our model. The detection accuracy is improved on 4 feature patterns, as shown in Table 11, while the detection performance (runtime) is improved on all feature patterns, as shown in Table 13. Table 14 compares the accuracy and efficiency of the full feature sets and the reduced feature sets for our proposed adaptive model. The italicized values in Table 14 mark the accuracy values obtained with a reduced feature set that improve on their counterparts obtained with the full feature set.

The phishing malware detection task is an imbalanced classification problem. That is, there are two classes to be

Table 8: Runtime comparison of 5 scenarios on 9 classification approaches (in seconds).

Case ID  J48    DT      IBK     LR    NB    SVM    AVG     MAJ     MAX
01       4.43   18.63   0.01    1.93  0.64  3.01   29.87   29.52   26.16
02       4.54   31.80   0.0001  2.57  0.66  39.24  76.20   76.35   75.90
03       9.44   58.80   0.01    7.22  1.14  18.94  95.18   96.09   97.41
04       12.09  148.32  0.0001  5.28  1.39  6.25   174.4   174.6   174.61
05       17.09  167.14  0.01    7.86  1.93  3.61   203.62  205.50  203.51

Table 9: Information of selected feature sets for 5 cases.

Case ID  Feature combination pattern  Features before feature selection  Features selected by Pearson's correlation  Features selected by information gain
01       Pattern 1                    112                                96                                          100
02       Pattern 2                    139                                114                                         120
03       Pattern 3                    220                                180                                         185
04       Pattern 4                    529                                164                                         265
05       Pattern 5                    610                                227                                         250

Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

Case ID  J48    DT     IBK    LR     NB     SVM    AVG    MAJ    MAX
01       95.87  93.10  95.45  92.47  89.42  91.61  95.31  95.32  92.82
02       94.68  91.53  94.04  90.18  86.44  89.28  94.24  94.18  91.33
03       96.37  92.70  95.60  94.90  90.69  92.57  96.38  96.38  94.37
04       90.73  86.51  90.45  88.51  81.55  87.89  90.73  90.72  88.64
05       95.38  89.54  94.37  93.96  92.28  91.61  95.68  95.69  92.72

Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

Case ID  J48    DT     IBK    LR     NB     SVM    AVG    MAJ    MAX
01       95.96  93.10  95.62  92.54  89.42  91.66  95.37  95.37  92.83
02       94.66  91.53  94.16  90.15  86.44  89.19  94.19  94.12  91.30
03       96.38  92.59  95.55  95.02  90.69  92.61  96.45  96.45  94.34
04       90.52  86.36  90.24  88.70  81.55  87.86  90.77  90.76  88.69
05       95.46  89.44  94.50  93.98  92.28  91.56  95.80  95.79  92.82


identified, phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class, "phishing", is greatly outnumbered by the negative class. These types of problems are examples of the fairly common case in data science in which accuracy is not a good measure for assessing model performance. Intuitively, proclaiming all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.

In order to assess the effectiveness of our proposed model, a confusion matrix evaluation is applied using accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances predicted as positive by our model that are actually positive. The following formulas give their definitions:

$$
\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN}. \tag{7}
$$

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. (Plot: accuracy (%) from 80 to 100 for J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX on cases 01 to 05, each with no feature selection, correlation, and information gain.)

Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

Case ID  J48   DT     IBK     LR    NB    SVM    AVG    MAJ    MAX
01       3.94  18.36  0.01    1.97  0.64  3.38   27.45  26.92  27.18
02       3.84  25.35  0.0001  2.43  0.50  39.46  72.07  72.12  71.19
03       8.06  45.85  0.01    7.20  1.03  19.55  83.10  83.52  83.34
04       5.60  44.75  0.0001  5.15  0.56  6.27   61.95  61.99  62.00
05       8.84  69.88  0.0001  7.65  1.00  3.20   90.23  90.06  90.45

Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

Case ID  J48   DT      IBK    LR    NB    SVM    AVG     MAJ     MAX
01       4.05  20.25   0.01   1.63  0.54  3.02   29.02   27.49   27.27
02       3.86  29.43   0.001  2.35  0.56  31.96  67.63   66.52   66.40
03       9.77  55.46   0.01   6.31  0.95  17.06  87.31   90.63   90.69
04       6.83  87.36   0.01   2.25  0.93  6.76   102.86  93.21   93.15
05       8.42  104.52  0.001  5.37  1.21  3.95   111.80  107.53  108.07


True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix, as shown in Figure 8.
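The three measures can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function name `confusion_metrics` and all counts are hypothetical and are not taken from the experiments. The second call shows why accuracy alone is misleading on imbalanced data: a detector that labels every app as benign still reaches 90% accuracy while its sensitivity is zero.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, precision and sensitivity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Guard against empty denominators (no positive predictions / no positives).
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical counts for a reasonable detector.
m = confusion_metrics(tp=80, fp=20, tn=880, fn=20)

# Hypothetical "everything is benign" detector on 100 phishing and 900 benign apps.
baseline = confusion_metrics(tp=0, fp=0, tn=900, fn=100)

print(m)
print(baseline)
```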

The evaluation of the effectiveness of our proposed model in terms of accuracy, precision, and sensitivity is given in Table 15. According to the results shown in Table 15, our adaptive model achieves good detection accuracy on the phishing features. Meanwhile, all the selected classifiers reach an acceptable precision and sensitivity. Together with the previous experiments, this indicates that our adaptive phishing detection model using case-based reasoning can perform well on diversely distributed features.

5. Conclusions

An adaptive mobile phishing detection model based on a variation of input feature patterns using a case-based reasoning (CBR) technique is proposed in this work. An experimental analysis is conducted to demonstrate the design decision of our model and to verify the performance of our proposed model in handling the concept drift of mobile phishing attacks. The proposed model is evaluated with a large feature set that contains 1,065 features from 10 feature

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. (Plot: runtime (seconds) from 0 to 200 for J48, DT, IBK, LR, NB, SVM, AVG, MAJ, and MAX on cases 01 to 05, each with no feature selection, correlation, and information gain.)

Table 14: Accuracy and efficiency of proposed adaptive model.

Case ID  Adaptive (before)  Adaptive (after)  Accuracy (before)  Accuracy (after)  Runtime (before)  Runtime (after)
01       J48                J48               95.93              95.96             4.43              4.05
02       J48                J48               94.72              94.66             4.54              3.86
03       AVG                AVG, MAJ          96.43              96.45             95.18             87.31 & 90.63
04       AVG, MAJ           AVG               90.64              90.77             174.4 & 174.6     102.86
05       MAJ                AVG               95.69              95.80             205.50            111.80

                   Predicted positive  Predicted negative
Actual positive    TP                  FN
Actual negative    FP                  TN

Figure 8: Confusion matrix.

Table 15: Detection results achieved by the proposed model.

Case  Classifier  Accuracy (%)  Precision (%)  Sensitivity (%)
01    J48         95.96         83             79
02    J48         94.66         87             86
03    AVG         96.45         92             75
04    AVG         90.77         84             62
05    AVG         95.80         90             74


groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories so that classifiers of different natures cover the diverse feature sets. Three ensembles of the six base classifiers are used, each with a different final-answer method: average, majority voting, and maximum. In total, there are 9 classifiers. Because of the many features involved in the dataset and the use of multiple classifiers, efficiency degrades. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and Pearson's correlation coefficient attribute evaluation method. By addressing the selection of the most suitable classifier for the incoming features with a case-based reasoning approach, the proposed mobile phishing detection model can provide an accuracy improvement with an acceptable runtime increment.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.

References

[1] W. Paul, H. A. Manolian, and S. Lapper, "Thinking digital in industry 4.0," Deloitte Insights, September 2018, https://www2.deloitte.com/insights/us/en/focus/industry-4-0/digital-leaders-in-manufacturing-fourth-industrial-revolution.html.

[2] "Spam and phishing in Q2 2018," Securelist-Kaspersky Lab's Cyberthreat Research and Reports, 2018.

[3] Proofpoint Security Awareness, "2019 state of the phish report," March 2019, https://www.wombatsecurity.com/state-of-the-phish.

[4] L. Wu, X. Du, and J. Wu, "Effective defense schemes for phishing attacks on mobile computing platforms," IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6678–6691, 2016.

[5] M. Moghimi and A. Y. Varjani, "New rule-based phishing detection method," Expert Systems with Applications, vol. 53, pp. 231–242, Jul 2016.

[6] Baunfire.com and SparkCMS, "APWG phishing attack trends report-4Q 2018," Anti-Phishing Working Group, March 2019, https://www.antiphishing.org/resources/apwg-reports.

[7] R. Basnet, S. Mukkamala, and A. H. Sung, "Detection of phishing attacks: a machine learning approach," in Soft Computing Applications in Industry, B. Prasad, Ed., pp. 373–383, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[8] A. K. Jain and B. B. Gupta, "Comparative analysis of features based machine learning approaches for phishing detection," in Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 2125–2130, New Delhi, India, March 2016.

[9] F. Toolan and J. Carthy, "Phishing detection using classifier ensembles," in Proceedings of the 2009 eCrime Researchers Summit, pp. 1–9, Tacoma, WA, USA, October 2009.

[10] H. S. Hota, A. K. Shrivas, and R. Hota, "An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique," Procedia Computer Science, vol. 132, pp. 900–907, 2018.

[11] A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles, ResearchGate, Berlin, Germany, 2019, https://www.researchgate.net/publication/325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles.

[12] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, "Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers," Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.

[13] A. Aleroud and L. Zhou, "Phishing environments, techniques, and countermeasures: a survey," Computers and Security, vol. 68, pp. 160–196, 2017.

[14] H. Shahriar, T. Klintic, and V. Clincy, "Mobile phishing attacks and mitigation techniques," Journal of Information Security, vol. 6, no. 3, pp. 206–212, 2015.

[15] T. M. Mahmoud and A. M. Mahfouz, "SMS spam filtering technique based on artificial immune system," International Journal of Computer Science Issues, vol. 9, no. 1, pp. 589–597, 2012.

[16] J. W. Yoon, H. Kim, and J. H. Huh, "Hybrid spam filtering for mobile communication," Computers and Security, vol. 29, no. 4, pp. 446–459, 2010.

[17] C. H. Hsu, P. Wang, and S. Pu, "Identify fixed-path phishing attack by STC," in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 172–175, Perth, Australia, September 2011.

[18] E. Medvet, E. Kirda, and C. Kruegel, "Visual-similarity-based phishing detection," in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, September 2008.

[19] A. P. Felt and D. Wagner, Phishing on Mobile Devices, University of California, Berkeley, CA, USA, 2011.

[20] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna, "What the app is that? Deception and countermeasures in the android user interface," in Proceeding of the 2015 IEEE Symposium on Security and Privacy, pp. 931–948, San Jose, CA, USA, May 2015.

[21] C. Marforio, R. J. Masti, C. Soriente, K. Kostiainen, and S. Capkun, "Personalized security indicators to detect application phishing attacks in mobile platforms," February 2015, http://arxiv.org/abs/1502.06824.

[22] D. Liu, E. Cuervo, V. Pistol, R. Scudellari, and L. P. Cox, "ScreenPass: secure password entry on touchscreen devices," in Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 291–304, Taipei, Taiwan, June 2013.

[23] D. Liu and L. P. Cox, "VeriUI: Attested Login for Mobile Devices," in Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, Santa Barbara, CA, USA, February 2014.

[24] L. Wu, X. Du, and J. Wu, "MobiFish: A lightweight anti-phishing scheme for mobile phones," in Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–8, Shanghai, China, August 2014.

[25] V. Mavroeidis and M. Nicho, "Quick response code secure: a cryptographically secure anti-phishing tool for QR code attacks," in Computer Network Security, pp. 313–324, 2017.

[26] "Phishing detective-apps on Google Play," March 2018, httpsplaygooglecomstoreappsdetailsidcomrsoftrandroidphishingdetectiveads.

[27] G. Bottazzi, E. Casalicchio, D. Cingolani, F. Marturana, and M. Piu, "MP-Shield: A framework for phishing detection in mobile devices," in Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1977–1983, Liverpool, UK, October 2015.

[28] M. M. Richter and R. O. Weber, Case-Based Reasoning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[29] S. Craw, N. Wiratunga, and R. C. Rowe, "Learning adaptation knowledge to improve case-based reasoning," Artificial Intelligence, vol. 170, no. 16-17, pp. 1175–1192, Nov 2006.

[30] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, "Case-based reasoning systems in the health sciences: a survey of recent trends and developments," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 4, pp. 421–434, Jul 2011.

[31] S. Arzt, "FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps," in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269, New York, NY, USA, June 2014.

[32] L. Li, A. Bartel, T. F. Bissyande et al., "IccTA: detecting inter-component privacy leaks in android apps," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 280–291, Piscataway, NJ, USA, May 2015.

[33] "Obfuscation-resilient, efficient, and accurate detection and family identification of android malware - Semantic Scholar," March 2018, httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23.

[34] W. Enck, "TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones," Transactions on Computer Systems, vol. 32, no. 2, 2014.

[35] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, "Information-flow analysis of android applications in DroidSafe," in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, February 2015.

[36] D. Arp, M. Spreitzenbarth, H. Gascon, and K. Rieck, "Drebin: effective and explainable detection of android malware in your pocket," in Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, February 2014.

[37] N. Peiravian and X. Zhu, "Machine learning for android malware detection using permission and API calls," in Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305, Herndon, VA, USA, November 2013.

[38] V. Avdiienko, K. Kuznetsov, A. Gorla et al., "Mining apps for abnormal usage of sensitive data," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 426–436, Florence, Italy, May 2015.

[39] H. V. Nath and B. M. Mehtre, "Static malware analysis using machine learning methods," in Recent Trends in Computer Networks and Distributed Systems Security, pp. 440–450, 2014.

[40] N. Abura'ed, H. Otrok, R. Mizouni, and J. Bentahar, "Mobile phishing attack for Android platform," in Proceedings of the 2014 10th International Conference on Innovations in Information Technology (IIT), pp. 18–23, Abu Dhabi, UAE, November 2014.

[41] VirusShare.com, https://virusshare.com.

[42] K. Allix, T. F. Bissyande, J. Klein, and Y. Le Traon, "Androzoo: collecting millions of android apps for the research community," in Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471, Austin, TX, USA, May 2016.

[43] J. Yu, Q. Huang, and C. Yian, "DroidScreening: a practical framework for real-world Android malware analysis," Security and Communication Networks, vol. 9, no. 11, pp. 1435–1449.

[44] Joshuaga/Revealdroid - Bitbucket, https://bitbucket.org/joshuaga/revealdroid/src/master.

[45] S. Kyaw Zaw and S. Vasupongayya, "Revealing the important features of mobile phishing," in Proceedings of the 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), pp. 222–226, Pattaya, Thailand, November 2018.

[46] M. A. Hall and L. A. Smith, "Feature subset selection: a correlation based filter approach," Progress in Connectionist-based Information Systems, vol. 2, pp. 855–858, 1997.


default maximum number of iterations in logistic regression (LR), a confidence factor of 0.5 for tree pruning in the J48 classifier, and a 1-nearest-neighbor (IBK) classifier are chosen for our experiment. The SVM and decision table classifiers are used with their default parameters.

4.3. Experimental Results and Analysis. The accuracy comparison of the six classifiers on the 10 feature sets is shown in Table 4. The italicized values in Table 4 mark the maximum detection accuracy among the six classifiers for each feature set. It can be seen that the accuracy of each

Figure 5: Adaptive classification. (Diagram labels: Target APK; Feature extraction; Decision making; Add feature; Retrieve and reuse; Classifier 1, Classifier 2, ..., Classifier n; Decision making; Display result.)

Figure 4: Case defining process (define a new case and store in case base). (Diagram labels: Input feature pattern; ML algorithm 1, ML algorithm 2, ..., ML algorithm n; Result 1, Result 2, ..., Result n; Take the maximum accuracy; Final result; Define and store a case in case base.)

Table 3: Feature sets.

No.  Feature sets              Number of features  Example features
1    Android components        76                  android.media, android.media.effect, android.media.audiofx, android.service.textservice, android.service.notification
2    API counts                31                  account_information, account_settings, audio, bluetooth, bluetooth_information
3    API usage actions         82                  android.util, android.widget, android.renderscript, android.webkit, android.os, android.os.storage, android.content
4    Security-sensitive flows  421                 system_settings____audio, system_settings____phone_connection, system_settings____voip, system_settings____database_information
5    Hardware components       6                   android.hardware.display, android.hardware, android.hardware.usb, android.hardware.location, android.hardware.input
6    Intent_action             109                 action_main, action_view, action_default, action_attach_data, action_edit, action_insert_or_edit
7    Permission                82                  android.permission.access_cache_filesystem, android.permission.access_checkin_properties, android.permission.access_coarse_location, android.permission.access_gps
8    Shell_command_strings     190                 runtimeexec, createSubprocess, cipher-classes, longstring, SecretKey, methodinvoke, small_code_size
9    Content_visual            19                  HostnameLength, PathLength, QueryLength, DoubleSlashInPath, NumSensitiveWords, EmbeddedBrandName, PctExtHyperlinks
10   URLs                      49                  having_ip_address, url_length, shortining_service, having_at_symbol, double_slash_redirecting, prefix_suffix
     Total                     1065


classification algorithm depends on the features. IBK provides the better accuracy on 6 feature sets, and J48 provides the better accuracy on the other 4 feature sets. Our work aims to detect mobile phishing in a feature-independent manner with various classifiers. To mimic real-world applications, random feature combinations are created, because a new Android application can consist of any combination of features. In this experiment, 5 random combinations of features are created, as shown in Table 5.
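One way to generate such random combinations is sketched below. This is an illustration only: the sampling rule (between 2 and 4 feature sets per pattern), the seed, and the dictionary keys are assumptions, and the total is a plain sum of the set sizes listed in Table 3, which need not match the feature counts reported for the five patterns used in the paper.

```python
import random

# Feature-set sizes as listed in Table 3 (keys are hypothetical identifiers).
FEATURE_SETS = {
    "android_components": 76, "api_counts": 31, "api_usage_actions": 82,
    "security_sensitive_flows": 421, "hardware_components": 6,
    "intent_action": 109, "permission": 82, "shell_command_strings": 190,
    "content_visual": 19, "urls": 49,
}

def random_pattern(rng, min_sets=2, max_sets=4):
    """Pick a random combination of feature sets and return (names, total size)."""
    k = rng.randint(min_sets, max_sets)
    chosen = sorted(rng.sample(sorted(FEATURE_SETS), k))
    return chosen, sum(FEATURE_SETS[name] for name in chosen)

rng = random.Random(2019)  # fixed seed so the sketch is repeatable
patterns = [random_pattern(rng) for _ in range(5)]
for names, size in patterns:
    print(size, " + ".join(names))
```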

These 5 feature combination patterns are tested with the six individual classifiers and three ensemble models to develop a case for our adaptive model. Each model is an ensemble of the six classifiers with a different method of providing the final answer. The final-answer methods of the ensemble classifiers are the average of probabilities, majority voting, and maximum probability. The detection results for the 5 scenarios of random feature combination sets with the six base classifiers and three ensemble classifiers are given in Table 6. The italicized values in Table 6 mark the maximum detection accuracy among the nine classifiers for each of the 5 cases.
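The three final-answer methods can be illustrated with a small sketch. It assumes each base classifier outputs a probability that an app is phishing; the 0.5 decision threshold, the tie-breaking toward "benign", and the function name `combine` are assumptions of this sketch rather than details given in the paper.

```python
from collections import Counter

def combine(probs, method):
    """Combine per-classifier P(phishing) values into one label.

    probs  : list of probabilities, one per base classifier
    method : "AVG" (average of probabilities), "MAJ" (majority voting)
             or "MAX" (maximum probability)
    """
    if method == "AVG":
        p = sum(probs) / len(probs)
        return "phishing" if p >= 0.5 else "benign"
    if method == "MAJ":
        votes = Counter("phishing" if p >= 0.5 else "benign" for p in probs)
        return "phishing" if votes["phishing"] > votes["benign"] else "benign"
    if method == "MAX":
        # The single most confident class probability decides.
        best_phishing = max(probs)
        best_benign = max(1.0 - p for p in probs)
        return "phishing" if best_phishing >= best_benign else "benign"
    raise ValueError(f"unknown method: {method}")

# Hypothetical outputs of six base classifiers for one app.
probs = [0.95, 0.40, 0.45, 0.30, 0.45, 0.48]
for method in ("AVG", "MAJ", "MAX"):
    print(method, combine(probs, method))
```

On this hypothetical input the three rules disagree (AVG and MAX answer "phishing", MAJ answers "benign"), which is one reason the best ensemble can differ from one feature pattern to another.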

According to the results shown in Table 6, some feature patterns are better suited to ensemble techniques, while others are better served by individual classification techniques. It can be concluded that the accuracy of a classification technique in mobile phishing detection relies heavily on the input features.

The adaptive method used in our model chooses the most suitable classification approach for a set of input features. Based on the results presented in Table 6, we can define a case to be stored in the case base for the adaptive choice of suitable classifiers. The tentative cases for building our case-based phishing detection model are shown in Table 7.
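The idea of defining a case from the accuracy results and reusing it for a new input can be sketched as follows. The accuracy values are those of Table 6 for two patterns and the feature-set combinations are those of Table 5; the data structures, the function names, and the use of Jaccard similarity for retrieval are assumptions of this sketch, since the similarity measure is not specified here.

```python
# Accuracy (%) of the nine classifiers for two feature patterns (values from Table 6).
RESULTS = {
    "Pattern 1": {"J48": 95.93, "DT": 93.07, "IBK": 95.45, "LR": 92.47, "NB": 89.42,
                  "SVM": 91.62, "AVG": 95.31, "MAJ": 95.31, "MAX": 92.87},
    "Pattern 3": {"J48": 96.32, "DT": 92.67, "IBK": 95.60, "LR": 94.89, "NB": 90.69,
                  "SVM": 92.57, "AVG": 96.43, "MAJ": 96.41, "MAX": 94.31},
}

# Feature sets combined in each pattern (from Table 5).
PATTERN_FEATURES = {
    "Pattern 1": {"api_count", "api_usage", "hardware"},
    "Pattern 3": {"api_count", "api_usage", "intent", "hardware"},
}

def define_case(pattern):
    """Store the classifier with the maximum accuracy as the case solution."""
    scores = RESULTS[pattern]
    best = max(scores, key=scores.get)
    return {"pattern": pattern, "features": PATTERN_FEATURES[pattern],
            "classifier": best, "accuracy": scores[best]}

case_base = [define_case(p) for p in RESULTS]

def retrieve(features):
    """Return the stored case most similar to the incoming feature sets.

    Jaccard similarity is an assumption made for this sketch.
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    return max(case_base, key=lambda case: jaccard(case["features"], features))

new_app = {"api_count", "intent", "hardware"}  # hypothetical incoming pattern
print(retrieve(new_app)["classifier"])
```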

Performing the classification process on this large number of features takes a long time. The runtime needed to build the detection model with the 6 base classifiers and 3 ensemble approaches before feature selection is compared in Table 8.

To reduce the detection time, some features may be omitted, because they may not have a high impact on the result. Therefore, some experiments are conducted to select a set of effective features in order to reduce the number of required features.

4.4. Selecting the Features. Feature selection is necessary to reduce the dimension of the feature space. To obtain the benefits of feature selection on a large dataset, such as reducing overfitting, improving accuracy, and reducing processing time, two feature selection techniques are applied in this experiment, and their results are compared to find the better outcome. The process of selecting the features can be described by the following steps.

Let $U$ be the universe of feature sets, $U = \{D_1, D_2, \ldots, D_v\}$, and let each dataset $D_i \in U$ with $v$ attributes be $D_i = \{A_1, A_2, \ldots, A_v\}$. Then the attributes can be grouped into a feature group $FG_i = \{A_a, A_b, \ldots, A_n\}$. An attribute evaluation is performed on the worth of each attribute, and the attributes selected by it form a selected feature set $FS_i = \{A_a, A_b, \ldots, A_m\}$, where $FS_i \subseteq FG_i$.

Two feature selection methods are used in this experiment to confirm the advantages of feature selection in phishing detection. The first method is correlation-based feature selection with a ranker search method, which evaluates each attribute and lists the results in ranked order. The worth of each attribute is evaluated by measuring the (Pearson's) correlation between it and the class [46].
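A minimal sketch of this kind of ranking is given below. The toy attribute names and values are hypothetical, and ranking by the absolute value of the correlation is an assumption of this sketch; it is not a description of the exact tool configuration used in the experiments.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has zero variance; treat its correlation as 0.
    return cov / (sx * sy) if sx and sy else 0.0

def rank_attributes(columns, labels):
    """Rank attributes by |correlation with the class|, highest first."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: 1 = phishing, 0 = benign (hypothetical).
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "uses_sms":      [1, 1, 0, 0, 1, 0],  # identical to the class label
    "uses_camera":   [1, 0, 1, 0, 1, 0],  # weakly related to the class
    "constant_flag": [1, 1, 1, 1, 1, 1],  # carries no information
}
ranked = rank_attributes(columns, labels)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```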

Pearson's correlation coefficient is described in equation (1), where all variables have been standardized. The correlation between a composite and a class label is a function of the number of component variables (attributes) in the composite and the magnitude of the intercorrelations among them, together with the magnitude of the correlations between the attributes and the class label.

If the correlation between each of the attributes in a test and the class label is known, and the intercorrelation between each pair of attributes is given, then the correlation between a composite test consisting of the summed attributes and the class label can be predicted from the following equation:

$$
r_{zc} = \frac{k\, r_{zi}}{\sqrt{k + k(k - 1)\, r_{ii}}}, \tag{1}
$$

where $r_{zc}$ is the correlation between the summed attributes and the class label, $k$ is the number of attributes, $r_{zi}$ is the average of the correlations between the attributes and the class label, and $r_{ii}$ is the average intercorrelation between attributes.
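Equation (1) can be evaluated directly. The sketch below uses hypothetical input values to show how redundancy among attributes lowers the composite correlation: with the same average attribute-class correlation, four uncorrelated attributes score higher than four perfectly intercorrelated ones.

```python
from math import sqrt

def composite_correlation(k, r_zi, r_ii):
    """Equation (1): correlation between k summed attributes and the class label.

    k    : number of attributes
    r_zi : average attribute-class correlation
    r_ii : average attribute-attribute intercorrelation
    """
    return (k * r_zi) / sqrt(k + k * (k - 1) * r_ii)

single = composite_correlation(1, 0.5, 0.0)       # one attribute: equals r_zi
independent = composite_correlation(4, 0.5, 0.0)  # no redundancy among attributes
redundant = composite_correlation(4, 0.5, 1.0)    # fully redundant attributes

print(single, independent, redundant)
```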

We obtain the ranked attributes listed with their corresponding class correlation. Attributes that have no

Table 4: Accuracy comparison of classifiers on 10 features.

No.  Feature sets           J48 (%)  DT (%)  IBK (%)  LR (%)  NB (%)  SVM (%)
1    Android components     93.23    89.02   93.40    90.16   84.67   87.95
2    API count              95.85    93.02   95.66    91.90   89.20   85.25
3    APIusage_actions       95.20    91.86   95.32    91.97   89.02   91.24
4    Flow                   93.05    91.03   93.32    87.18   87.45   83.17
5    Hardware components    89.00    89.06   89.12    89.06   89.02   89.06
6    Intent_action          86.89    85.73   87.13    84.64   83.75   85.53
7    Permission             94.30    91.92   94.65    93.95   88.54   94.14
8    Shell_command_strings  75.40    71.18   74.08    70.28   68.74   70.22
9    Content_visual         97.20    95.79   95.53    94.49   95.77   93.87
10   URLs                   96.03    93.24   97.18    93.99   92.98   93.80


or low values on the class correlation measure are eliminated. The resulting reduced feature sets are shown in Table 9.

The second method is information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated using the following equations. In the attribute evaluation process, the $I$ index measures the impurity of $D$, a data partition or a set of training tuples, and is calculated using

$$
I(D) = 1 - \sum_{i=1}^{m} p_i^2, \tag{2}
$$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. The sum is computed over the $m$ classes. The $I$ index considers a binary split for each attribute. First, the case where $A$ is a discrete-valued attribute having $v$ distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in $D$ is considered. The expected information provided by that split is calculated by

$$
I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j). \tag{3}
$$

In this equation, $D_j$ represents the observations that contain the $j$th value of attribute $A$. The information gain of a binary split on attribute $A$ is calculated by

$$
\text{Gain}(A) = I(D) - I_A(D). \tag{4}
$$

The information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation for split information is provided in

$$
\text{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\!\left(\frac{|D_j|}{|D|}\right). \tag{5}
$$

This value represents the potential information generated by splitting the training dataset $D$ into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$. The gain ratio is defined in

$$
\text{GainRatio}(A) = \frac{\text{Gain}(A)}{\text{SplitInfo}_A(D)}. \tag{6}
$$
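Equations (2) to (6) can be traced on a toy example. The sketch below follows the equations as written in this section, so the impurity is the $1 - \sum p_i^2$ index of equation (2). The attribute values and labels are hypothetical, and returning 0 when the split information is zero is an assumption made to avoid division by zero.

```python
from collections import Counter
from math import log2

def impurity(labels):
    """Equation (2): I(D) = 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute, following equations (3) to (6)."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    weights = [len(p) / n for p in partitions.values()]
    expected = sum(w * impurity(p) for w, p in zip(weights, partitions.values()))  # eq. (3)
    gain = impurity(labels) - expected                                             # eq. (4)
    split_info = -sum(w * log2(w) for w in weights)                                # eq. (5)
    return gain / split_info if split_info > 0 else 0.0                            # eq. (6)

labels = ["phishing", "phishing", "benign", "benign"]
perfect = ["a", "a", "b", "b"]   # attribute that separates the classes exactly
useless = ["a", "b", "a", "b"]   # attribute unrelated to the class

print(gain_ratio(perfect, labels))
print(gain_ratio(useless, labels))
```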

The attribute with the maximum gain ratio is selected as the highest ranked attribute. The low-ranked attributes that provide a gain ratio of less than 0.0003 are eliminated. After performing the two feature selection techniques on the dataset, the reduced feature sets are generated, as shown in Table 9.
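The ranking and elimination step can be sketched as a simple filter. The attribute names and scores below are hypothetical; only the cut-off value is taken from the text above.

```python
def select_by_threshold(scores, threshold=0.0003):
    """Rank attributes by score (highest first) and drop those below the threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]

# Hypothetical gain-ratio scores for four attributes.
scores = {
    "perm_send_sms": 0.12,
    "api_bluetooth": 0.0002,
    "intent_view": 0.031,
    "hw_usb": 0.0,
}
selected = select_by_threshold(scores)
print(selected)
```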

The same detection experiments are conducted with the 9 classification approaches on each selected feature set. The detection results of the 5 cases on the selected feature sets are given in Tables 10 and 11. In this experiment, the 9 classification approaches and their related parameters are set up in the same way as in the previous experiments (described in Section 4.2).

According to the results for the datasets reduced with the correlation attribute evaluation method, shown in Table 10, the classification approaches with the best detection accuracy change slightly in 2 cases (feature patterns 3 and 4). Feature pattern 3 is a combination of the API count, API usage, Intent, and Hardware features. The italicized values in Table 10 mark the maximum detection accuracy among the nine classifiers for each of the 5 cases. For feature pattern 3, the highest detection accuracy is now provided by the ensembles with the AVG and MAJ final-answer methods, whereas it is provided by the ensemble with the AVG final-answer method when the full feature set is used. The detection accuracy is slightly increased for most classifiers in feature pattern 4, which is a combination of the Flow and Intent features.

According to the results shown in Table 11 of the re-duced datasets with an information gain attribute evaluation

Table 5 Scenarios for random combinations of features

Case ID Feature pattern Combination of feature sets Number of features01 Pattern 1 API count +API usage + hardware 11202 Pattern 2 API count + intent 13903 Pattern 3 API count +API usage + intent + hardware 22004 Pattern 4 Flow+ intent 52905 Pattern 5 Flow+ intent +API usage + hardware 610

Table 6 Detection accuracy of 5 scenarios on randomly combined feature patterns

Case ID J48 () DT () IBK () LR () NB () SVM () AVG () MAJ () MAX ()01 9593 9307 9545 9247 8942 9162 9531 9531 928702 9472 9162 9404 9018 8644 8927 9426 9420 913803 9632 9267 9560 9489 9069 9257 9643 9641 943104 9056 8638 9045 8851 8155 8788 9064 9064 885205 9533 8969 9437 9397 9228 9161 9568 9569 9268

Table 7 Tentative cases for mobile phishing detection system

Case ID Featurepattern

Adaptivemethod Accuracy () Run time

(seconds)1 Pattern 1 J48 9593 4432 Pattern 2 J48 9472 4543 Pattern 3 AVG 9643 95184 Pattern 4 AVG MAJ 9064 1744 amp 17465 Pattern 5 MAJ 9569 20550

Journal of Computer Networks and Communications 9

method the detection accuracy is increased in 4 cases(feature patterns 1 3 4 and 5) e italicized values shownin Table 11 represent the maximum detection accuracy of 5cases among nine classifiers Moreover the classificationapproaches which produced the best detection accuracy arechanged in 3 cases (feature patterns 3 4 and 5) at is anensemble with AVG final answer finding method providesthe best accuracy for feature patterns 3 4 and 5

The detection accuracy percentages of the 5 cases using different algorithms are compared in Figure 6. This figure presents the detection results from Tables 6, 10, and 11. Each case is represented in 3 situations: no feature selection, after correlation attribute evaluation feature selection, and after information gain attribute evaluation feature selection. There are 15 points in the figure, representing the 5 cases under 3 conditions. The best classifier for case 01 and case 02 is the J48 classifier, while the ensemble classifier AVG is the best one for case 03, case 04, and case 05. The cases with the best algorithm are used in the case-based reasoning detection method.
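As a minimal sketch of how such cases could drive the classifier choice, the following Python fragment stores the five feature combinations from Table 5 together with the best classifiers named above and retrieves the closest case for a new app. The Jaccard similarity, the group identifiers, and the function names are assumptions made for illustration; the retrieval measure actually used by the model is not given in this section.

```python
# Hypothetical case base: feature-group combination -> best classifier.
# Combinations follow Table 5; classifiers follow the paragraph above.
CASE_BASE = [
    {"case_id": "01", "groups": frozenset({"api_count", "api_usage", "hardware"}), "classifier": "J48"},
    {"case_id": "02", "groups": frozenset({"api_count", "intent"}), "classifier": "J48"},
    {"case_id": "03", "groups": frozenset({"api_count", "api_usage", "intent", "hardware"}), "classifier": "AVG"},
    {"case_id": "04", "groups": frozenset({"flow", "intent"}), "classifier": "AVG"},
    {"case_id": "05", "groups": frozenset({"flow", "intent", "api_usage", "hardware"}), "classifier": "AVG"},
]


def jaccard(a, b):
    """Overlap of two sets of feature groups (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(groups):
    """Return the stored case whose feature groups best match the query."""
    query = frozenset(groups)
    return max(CASE_BASE, key=lambda case: jaccard(query, case["groups"]))


# A new app exposing only flow and intent features maps to case 04.
best = retrieve({"flow", "intent"})
print(best["case_id"], best["classifier"])
```

An exact match reproduces the stored case, while a partial match (for example, only API count and hardware features) falls back to the most similar stored pattern.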

To highlight the effect of the feature selection techniques on performance, the runtime results for the reduced feature sets are collected in Tables 12 and 13. The information gain attribute evaluation method retains a larger number of features than the correlation attribute evaluation method. The runtime with the information gain attribute evaluation method is also slightly larger than that with the correlation attribute evaluation method.

The runtimes of the 5 cases after selecting the features are shown in Figure 7. This figure compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure, representing the 5 cases under 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied to our feature sets to improve the accuracy and efficiency of our model. The detection accuracy percentages on 4 feature patterns are improved, as shown in Table 11, while the runtime performance of the detection on all feature patterns is improved, as shown in Table 13. Table 14 shows the comparison of accuracy and efficiency between the full feature sets and the reduced feature sets of our proposed adaptive model. The italicized values shown in Table 14 represent the accuracy values obtained with a reduced feature set that improve over their counterparts obtained with a full feature set.

Table 8: Runtime comparison of 5 scenarios on 9 classification approaches (in seconds).

Case ID  J48    DT      IBK     LR    NB    SVM    AVG     MAJ     MAX
01       4.43   18.63   0.01    1.93  0.64  3.01   29.87   29.52   26.16
02       4.54   31.80   0.0001  2.57  0.66  39.24  76.20   76.35   75.90
03       9.44   58.80   0.01    7.22  1.14  18.94  95.18   96.09   97.41
04       12.09  148.32  0.0001  5.28  1.39  6.25   174.4   174.6   174.61
05       17.09  167.14  0.01    7.86  1.93  3.61   203.62  205.50  203.51

Table 9: Information of selected feature sets for 5 cases.

Case ID  Feature combination pattern  Features before feature selection  Features selected by Pearson's correlation  Features selected by information gain
01       Pattern 1                    112                                96                                          100
02       Pattern 2                    139                                114                                         120
03       Pattern 3                    220                                180                                         185
04       Pattern 4                    529                                164                                         265
05       Pattern 5                    610                                227                                         250

Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

Case ID  J48    DT     IBK    LR     NB     SVM    AVG    MAJ    MAX
01       95.87  93.10  95.45  92.47  89.42  91.61  95.31  95.32  92.82
02       94.68  91.53  94.04  90.18  86.44  89.28  94.24  94.18  91.33
03       96.37  92.70  95.60  94.90  90.69  92.57  96.38  96.38  94.37
04       90.73  86.51  90.45  88.51  81.55  87.89  90.73  90.72  88.64
05       95.38  89.54  94.37  93.96  92.28  91.61  95.68  95.69  92.72

Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

Case ID  J48    DT     IBK    LR     NB     SVM    AVG    MAJ    MAX
01       95.96  93.10  95.62  92.54  89.42  91.66  95.37  95.37  92.83
02       94.66  91.53  94.16  90.15  86.44  89.19  94.19  94.12  91.30
03       96.38  92.59  95.55  95.02  90.69  92.61  96.45  96.45  94.34
04       90.52  86.36  90.24  88.70  81.55  87.86  90.77  90.76  88.69
05       95.46  89.44  94.50  93.98  92.28  91.56  95.80  95.79  92.82

The phishing malware detection task is an imbalanced classification problem. That is, there are two classes to be identified, phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class, "phishing," is greatly outnumbered by the negative class. These types of problems are examples of the fairly common case in data science in which accuracy is not a good measure for assessing model performance. Intuitively, proclaiming all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.

In order to assess the effectiveness of our proposed model, a confusion-matrix-based evaluation is applied using accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances predicted as positive by our model that are actually positive. The following formulas represent their definitions:

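$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Sensitivity} &= \frac{TP}{TP + FN}.
\end{aligned}
\tag{7}
$$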

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. (Vertical axis: accuracy (%), 80 to 100. Horizontal axis: cases 01 to 05, each with no feature selection, correlation, and information gain. Series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.)

Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

Case ID  J48   DT     IBK     LR    NB    SVM    AVG    MAJ    MAX
01       3.94  18.36  0.01    1.97  0.64  3.38   27.45  26.92  27.18
02       3.84  25.35  0.0001  2.43  0.50  39.46  72.07  72.12  71.19
03       8.06  45.85  0.01    7.20  1.03  19.55  83.10  83.52  83.34
04       5.60  44.75  0.0001  5.15  0.56  6.27   61.95  61.99  62.00
05       8.84  69.88  0.0001  7.65  1.00  3.20   90.23  90.06  90.45

Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

Case ID  J48   DT      IBK    LR    NB    SVM    AVG     MAJ     MAX
01       4.05  20.25   0.01   1.63  0.54  3.02   29.02   27.49   27.27
02       3.86  29.43   0.001  2.35  0.56  31.96  67.63   66.52   66.40
03       9.77  55.46   0.01   6.31  0.95  17.06  87.31   90.63   90.69
04       6.83  87.36   0.01   2.25  0.93  6.76   102.86  93.21   93.15
05       8.42  104.52  0.001  5.37  1.21  3.95   111.80  107.53  108.07


True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix, as shown in Figure 8.
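The definitions in equation (7) can be illustrated with a short Python sketch. The counts below are hypothetical (100 phishing apps and 900 benign apps) and are not taken from the experiments; they only show why accuracy alone is a weak measure on imbalanced data, since a detector that labels every app as benign still reaches 90% accuracy while finding no phishing app at all.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and sensitivity as defined in equation (7)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no positive predictions / no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, sensitivity


# Hypothetical imbalanced test set: 100 phishing apps, 900 benign apps.
all_negative = confusion_metrics(tp=0, fp=0, tn=900, fn=100)
detector = confusion_metrics(tp=80, fp=20, tn=880, fn=20)

print("all-negative:", all_negative)
print("detector:    ", detector)
```

Reporting precision and sensitivity next to accuracy, as done in Table 15, exposes the difference between these two hypothetical detectors.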

The evaluation of the effectiveness of our proposed model in terms of accuracy, precision, and sensitivity is described in Table 15. According to the results shown in Table 15, our adaptive model achieves good detection accuracy for the phishing features. Meanwhile, all the classifiers achieve acceptable precision and sensitivity. According to the previous experiments, our adaptive phishing detection model using case-based reasoning can perform well on diversely distributed features.

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. (Vertical axis: runtime (seconds), 0 to 200. Horizontal axis: cases 01 to 05, each with no feature selection, correlation, and information gain. Series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.)

Table 14: Accuracy and efficiency of proposed adaptive model.

Case ID  Adaptive (before)  Adaptive (after)  Accuracy (before)  Accuracy (after)  Runtime (before)  Runtime (after)
01       J48                J48               95.93              95.96             4.43              4.05
02       J48                J48               94.72              94.66             4.54              3.86
03       AVG                AVG, MAJ          96.43              96.45             95.18             87.31 & 90.63
04       AVG, MAJ           AVG               90.64              90.77             174.4 & 174.6     102.86
05       MAJ                AVG               95.69              95.80             205.50            111.80

Figure 8: Confusion matrix.

                 Predicted negative  Predicted positive
Actual negative  TN                  FP
Actual positive  FN                  TP

Table 15: Detection results achieved by the proposed model.

Case  Classifier  Accuracy (%)  Precision (%)  Sensitivity (%)
01    J48         95.96         83             79
02    J48         94.66         87             86
03    AVG         96.45         92             75
04    AVG         90.77         84             62
05    AVG         95.80         90             74

5. Conclusions

An adaptive mobile phishing detection model based on a variation of input feature patterns using a case-based reasoning (CBR) technique is proposed in this work. An experimental analysis is conducted to demonstrate the design decisions of our model and to verify the performance of our proposed model in handling the concept drift of mobile phishing attacks. The proposed model is evaluated with a large feature set that contains 1,065 features from 10 feature groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories in order to cover the range of classification behavior across the diverse feature sets. Three ensembles of the six base classifiers are used, each of which uses a different final answer-finding method: average, majority voting, and maximum. In total, there are 9 classifiers. Due to the large number of features involved in the dataset and the use of multiple classifiers, efficiency degradation occurs. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and the Pearson's correlation coefficient attribute evaluation method. By addressing the optimal selection of a suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model could provide an accuracy improvement with an acceptable runtime increment.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.

References

[1] W. Paul, H. A. Manolian, and S. Lapper, "Thinking digital in industry 4.0," Deloitte Insights, September 2018, https://www2.deloitte.com/insights/us/en/focus/industry-4-0/digital-leaders-in-manufacturing-fourth-industrial-revolution.html.

[2] "Spam and phishing in Q2 2018," Securelist-Kaspersky Lab's Cyberthreat Research and Reports, 2018.

[3] Proofpoint Security Awareness, "2019 state of the phish report," March 2019, https://www.wombatsecurity.com/state-of-the-phish.

[4] L. Wu, X. Du, and J. Wu, "Effective defense schemes for phishing attacks on mobile computing platforms," IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6678–6691, 2016.

[5] M. Moghimi and A. Y. Varjani, "New rule-based phishing detection method," Expert Systems with Applications, vol. 53, pp. 231–242, Jul. 2016.

[6] Baunfire.com and SparkCMS, "APWG phishing attack trends report-4Q 2018," Anti-Phishing Working Group, March 2019, https://www.antiphishing.org/resources/apwg-reports.

[7] R. Basnet, S. Mukkamala, and A. H. Sung, "Detection of phishing attacks: a machine learning approach," in Soft Computing Applications in Industry, B. Prasad, Ed., pp. 373–383, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[8] A. K. Jain and B. B. Gupta, "Comparative analysis of features based machine learning approaches for phishing detection," in Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 2125–2130, New Delhi, India, March 2016.

[9] F. Toolan and J. Carthy, "Phishing detection using classifier ensembles," in Proceedings of the 2009 eCrime Researchers Summit, pp. 1–9, Tacoma, WA, USA, October 2009.

[10] H. S. Hota, A. K. Shrivas, and R. Hota, "An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique," Procedia Computer Science, vol. 132, pp. 900–907, 2018.

[11] A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles, ResearchGate, Berlin, Germany, 2019, https://www.researchgate.net/publication/325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles.

[12] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, "Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers," Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.

[13] A. Aleroud and L. Zhou, "Phishing environments, techniques, and countermeasures: a survey," Computers and Security, vol. 68, pp. 160–196, 2017.

[14] H. Shahriar, T. Klintic, and V. Clincy, "Mobile phishing attacks and mitigation techniques," Journal of Information Security, vol. 6, no. 3, pp. 206–212, 2015.

[15] T. M. Mahmoud and A. M. Mahfouz, "SMS spam filtering technique based on artificial immune system," International Journal of Computer Science Issues, vol. 9, no. 1, pp. 589–597, 2012.

[16] J. W. Yoon, H. Kim, and J. H. Huh, "Hybrid spam filtering for mobile communication," Computers and Security, vol. 29, no. 4, pp. 446–459, 2010.

[17] C. H. Hsu, P. Wang, and S. Pu, "Identify fixed-path phishing attack by STC," in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 172–175, Perth, Australia, September 2011.

[18] E. Medvet, E. Kirda, and C. Kruegel, "Visual-similarity-based phishing detection," in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, September 2008.

[19] A. P. Felt and D. Wagner, Phishing on Mobile Devices, University of California, Berkeley, CA, USA, 2011.

[20] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna, "What the app is that? Deception and countermeasures in the android user interface," in Proceeding of the 2015 IEEE Symposium on Security and Privacy, pp. 931–948, San Jose, CA, USA, May 2015.

[21] C. Marforio, R. J. Masti, C. Soriente, K. Kostiainen, and S. Capkun, "Personalized security indicators to detect application phishing attacks in mobile platforms," February 2015, http://arxiv.org/abs/1502.06824.

[22] D. Liu, E. Cuervo, V. Pistol, R. Scudellari, and L. P. Cox, "ScreenPass: secure password entry on touchscreen devices," in Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 291–304, Taipei, Taiwan, June 2013.

[23] D. Liu and L. P. Cox, "VeriUI: Attested Login for Mobile Devices," in Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, Santa Barbara, CA, USA, February 2014.

[24] L. Wu, X. Du, and J. Wu, "MobiFish: A lightweight anti-phishing scheme for mobile phones," in Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–8, Shanghai, China, August 2014.

[25] V. Mavroeidis and M. Nicho, "Quick response code secure: a cryptographically secure anti-phishing tool for QR code attacks," in Computer Network Security, pp. 313–324, 2017.

[26] "Phishing detective-apps on Google Play," March 2018, https://play.google.com/store/apps/details?id=com.rsoftr.android.phishingdetective.ads.

[27] G. Bottazzi, E. Casalicchio, D. Cingolani, F. Marturana, and M. Piu, "MP-Shield: A framework for phishing detection in mobile devices," in Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1977–1983, Liverpool, UK, October 2015.

[28] M. M. Richter and R. O. Weber, Case-Based Reasoning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[29] S. Craw, N. Wiratunga, and R. C. Rowe, "Learning adaptation knowledge to improve case-based reasoning," Artificial Intelligence, vol. 170, no. 16-17, pp. 1175–1192, Nov. 2006.

[30] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, "Case-based reasoning systems in the health sciences: a survey of recent Trends and developments," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 4, pp. 421–434, Jul. 2011.

[31] S. Arzt, "FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps," in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269, New York, NY, USA, June 2014.

[32] L. Li, A. Bartel, T. F. Bissyande et al., "IccTA: detecting inter-component privacy leaks in android apps," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 280–291, Piscataway, NJ, USA, May 2015.

[33] "Obfuscation-resilient, efficient, and accurate detection and family identification of android malware - semantic scholar," March 2018, httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23.

[34] W. Enck, "TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones," Transactions on Computer Systems, vol. 32, no. 2, 2014.

[35] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, "Information-flow analysis of android applications in DroidSafe," in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, February 2015.

[36] D. Arp, M. Spreitzenbarth, H. Gascon, and K. Rieck, "Drebin: effective and explainable detection of android malware in your pocket," in Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, February 2014.

[37] N. Peiravian and X. Zhu, "Machine learning for android malware detection using permission and API calls," in Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305, Herndon, VA, USA, November 2013.

[38] V. Avdiienko, K. Kuznetsov, A. Gorla et al., "Mining apps for abnormal usage of sensitive data," in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 426–436, Florence, Italy, May 2015.

[39] H. V. Nath and B. M. Mehtre, "Static malware analysis using machine learning methods," in Recent Trends in Computer Networks and Distributed Systems Security, pp. 440–450, 2014.

[40] N. Abura'ed, H. Otrok, R. Mizouni, and J. Bentahar, "Mobile phishing attack for Android platform," in Proceedings of the 2014 10th International Conference on Innovations in Information Technology (IIT), pp. 18–23, Abu Dhabi, UAE, November 2014.

[41] VirusShare.com, https://virusshare.com.

[42] K. Allix, T. F. Bissyande, J. Klein, and Y. Le Traon, "Androzoo: collecting millions of android apps for the research community," in Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471, Austin, TX, USA, May 2016.

[43] J. Yu, Q. Huang, and C. Yian, "DroidScreening: a practical framework for real-world Android malware analysis," Security and Communication Networks, vol. 9, no. 11, pp. 1435–1449.

[44] Joshuaga/Revealdroid - Bitbucket, https://bitbucket.org/joshuaga/revealdroid/src/master.

[45] S. Kyaw Zaw and S. Vasupongayya, "Revealing the important features of mobile phishing," in Proceedings of the 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), pp. 222–226, Pattaya, Thailand, November 2018.

[46] M. A. Hall and L. A. Smith, "Feature subset selection: a correlation based filter approach," Progress in Connectionist-based Information Systems, vol. 2, pp. 855–858, 1997.


classification algorithm depends on the features. IBK provides better accuracy on 6 feature sets, and J48 provides better accuracy on the other 4 feature sets. Our work aims to detect mobile phishing in a feature-independent manner with various classifiers. To mimic a real-world application, random feature combinations are created, because a new Android application can consist of any combination of features. In this experiment, 5 random combinations of features are created, as shown in Table 5.

These 5 feature combination patterns are tested with the six individual classifiers and three ensemble classifier models to develop cases for our adaptive model. Each model is an ensemble of the six classifiers with a different method of providing the final answer. The final answer finding methods of the ensemble classifiers are the average of probabilities, majority voting, and maximum probability. The detection results for the 5 scenarios of random feature combination sets with the six base classifiers and three ensemble classifiers are described in Table 6. The italicized values shown in Table 6 represent the maximum detection accuracy of the 5 cases among the nine classifiers.
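The three final answer finding methods can be sketched in a few lines of Python. This is an illustrative reading of the average (AVG), majority voting (MAJ), and maximum probability (MAX) rules, not the implementation used in the experiments; the function names and the six probability vectors are hypothetical.

```python
from collections import Counter


def combine_average(prob_rows):
    """AVG: average the class-probability vectors, then pick the largest."""
    n = len(prob_rows)
    avg = [sum(col) / n for col in zip(*prob_rows)]
    return avg.index(max(avg))


def combine_majority(prob_rows):
    """MAJ: each base classifier votes for its own most probable class."""
    votes = Counter(row.index(max(row)) for row in prob_rows)
    return votes.most_common(1)[0][0]


def combine_maximum(prob_rows):
    """MAX: pick the class with the single highest probability of any classifier."""
    maxima = [max(col) for col in zip(*prob_rows)]
    return maxima.index(max(maxima))


# Hypothetical outputs of six base classifiers: [P(benign), P(phishing)].
probs = [
    [0.55, 0.45], [0.60, 0.40], [0.52, 0.48],
    [0.51, 0.49], [0.05, 0.95], [0.10, 0.90],
]

print(combine_average(probs), combine_majority(probs), combine_maximum(probs))
```

In this made-up example, four weakly confident classifiers outvote two strongly confident ones under MAJ, while AVG and MAX follow the confident minority, which is one way the three ensembles can reach different accuracies on the same feature pattern.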

According to the results shown in Table 6, some feature patterns are better suited to ensemble techniques, while others are better served by individual classification techniques. It can be concluded that the accuracy variation of classification techniques in mobile phishing detection relies heavily on the input features.

The adaptive method used in our model chooses the most suitable classification approach for a set of input features. Based on the results presented in Table 6, we can develop cases to be stored in the case base for an adaptive choice of suitable classifiers. The tentative cases for building our case-based phishing detection model are shown in Table 7.

Performing the classification process on such a large number of features takes a long runtime. The comparison of the runtime to build the detection model with the 6 base classifiers and 3 ensemble approaches before selecting the features is shown in Table 8.

To reduce the detection time, some features may be omitted because they may not have a high impact on the result. Therefore, some experiments are conducted to select a set of effective features in order to reduce the number of required features.

4.4. Selecting the Features. Feature selection is necessary to reduce the dimension of the feature space. With the aim of obtaining the benefits of performing feature selection on a large dataset, such as reducing overfitting, improving accuracy, and reducing processing time, two feature selection techniques are performed in this experiment, and their results are compared to obtain the optimized results. The process of selecting the features can be described by the following steps.

Let $U$ be the universe of feature sets, $U = \{D_1, D_2, \ldots, D_v\}$, and let a dataset $D_i \in U$ with $v$ attributes $A$ be $D_i = \{A_1, A_2, \ldots, A_v\}$. Then the attributes can be grouped into a feature group $FG_i = \{A_a, A_b, \ldots, A_n\}$. An attribute evaluation is performed, and attributes are selected according to the worth of each attribute, which yields a selected feature set $FS_i = \{A_a, A_b, \ldots, A_m\}$, where $FS_i \subseteq FG_i$.

Two feature selection techniques are used in this experiment to confirm the advantages of selecting the features in phishing detection. The first method is a correlation-based feature selection with a ranker search method that evaluates each attribute and lists the results in ranked order. The worth of each attribute is evaluated by measuring the (Pearson's) correlation between it and the class [46].

Pearson's correlation coefficient is described in equation (1), where all variables have been standardized. The correlation between a composite and a class label is a function of the number of component variables (attributes) in the composite and the magnitude of the intercorrelations among them, together with the magnitude of the correlations between the attributes and the class label.

If the correlation between each of the attributes in a test and the class label is known, and the intercorrelation between each pair of attributes is given, then the correlation between a composite test consisting of the summed attributes and the class label can be predicted from the following equation:

$$
r_{zc} = \frac{k\,\overline{r_{zi}}}{\sqrt{k + k(k-1)\,\overline{r_{ii}}}}, \tag{1}
$$

where $r_{zc}$ is the correlation between the summed attributes and the class label, $k$ is the number of attributes, $\overline{r_{zi}}$ is the average of the correlations between the attributes and the class label, and $\overline{r_{ii}}$ is the average intercorrelation between attributes.
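A small worked example may help to read equation (1). The Python sketch below evaluates the formula for hypothetical values of the two averages; the function and argument names are illustrative only. With four attributes that each correlate 0.5 with the class, the composite correlation is 1.0 when the attributes are uncorrelated with each other and drops back to 0.5 when they are fully redundant.

```python
import math


def composite_correlation(k, mean_attr_class, mean_attr_attr):
    """Equation (1): correlation between k summed attributes and the class label.

    mean_attr_class is the average attribute-class correlation and
    mean_attr_attr is the average attribute-attribute intercorrelation.
    """
    return (k * mean_attr_class) / math.sqrt(k + k * (k - 1) * mean_attr_attr)


independent = composite_correlation(4, 0.5, 0.0)  # attributes add new information
redundant = composite_correlation(4, 0.5, 1.0)    # attributes repeat each other

print(independent, redundant)
```

This is the usual motivation for preferring attributes that are highly correlated with the class but weakly correlated with one another.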

Table 4: Accuracy comparison of classifiers on 10 features.

Feature sets              J48 (%)  DT (%)  IBK (%)  LR (%)  NB (%)  SVM (%)
1  Android components     93.23    89.02   93.40    90.16   84.67   87.95
2  API count              95.85    93.02   95.66    91.90   89.20   85.25
3  APIusage_actions       95.20    91.86   95.32    91.97   89.02   91.24
4  Flow                   93.05    91.03   93.32    87.18   87.45   83.17
5  Hardware components    89.00    89.06   89.12    89.06   89.02   89.06
6  Intent_action          86.89    85.73   87.13    84.64   83.75   85.53
7  Permission             94.30    91.92   94.65    93.95   88.54   94.14
8  Shell_command_strings  75.40    71.18   74.08    70.28   68.74   70.22
9  Content_visual         97.20    95.79   95.53    94.49   95.77   93.87
10 URLs                   96.03    93.24   97.18    93.99   92.98   93.80

We obtain the attributes ranked by their corresponding class correlation. Attributes that have little or no class correlation are eliminated. The resulting reduced feature sets are shown in Table 9.
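The ranking and elimination step can be sketched as follows. This is a minimal illustration rather than the tool used in the experiments: the attribute names, the toy data, the use of the absolute correlation, and the cut-off value are all assumptions, since the elimination threshold for the correlation method is not stated here.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant attribute carries no class information.
    return cov / (sx * sy) if sx and sy else 0.0


def rank_by_class_correlation(columns, labels, threshold):
    """Rank attributes by |correlation with the class| and drop the weak ones."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score > threshold]


# Toy data: 1 = phishing, 0 = benign; attribute names are hypothetical.
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "uses_camera": [1, 0, 1, 0, 1, 0],
    "sends_sms": [1, 1, 0, 0, 1, 0],
    "always_on": [1, 1, 1, 1, 1, 1],
}
selected = rank_by_class_correlation(columns, labels, threshold=0.1)
print(selected)
```

Here the constant attribute is eliminated, and the remaining attributes are listed from the strongest to the weakest class correlation.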

The second method is an information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated using the following equations. In the attribute evaluation process, the I index, which measures the impurity of $D$, a data partition or a set of training tuples, is calculated using

$$
I(D) = 1 - \sum_{i=1}^{m} p_i^2, \tag{2}
$$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. The sum is computed over $m$ classes. The I index considers a binary split for each attribute. First, the case where $A$ is a discrete-valued attribute having $v$ distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in $D$ is considered. The expected information provided by that split is calculated by

$$
I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j). \tag{3}
$$

In this equation, $D_j$ represents the observations that contain the $j$th value of attribute $A$. The information gain of a binary split on attribute $A$ is calculated by

$$
\text{Gain}(A) = I(D) - I_A(D). \tag{4}
$$

The information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation for split information is provided in

$$
\text{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right). \tag{5}
$$

This value represents the potential information generated by splitting the training dataset $D$ into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$. The gain ratio is defined in

$$
\text{GainRatio}(A) = \frac{\text{Gain}(A)}{\text{SplitInfo}_A(D)}. \tag{6}
$$

The attribute with the maximum gain ratio is selected as the highest ranked attribute. The low-ranked attributes that provide a gain ratio less than 0.0003 are eliminated. After performing the two feature selection techniques on the dataset, the reduced feature sets are generated as shown in Table 9.
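Equations (2) to (6) can be followed step by step in the Python sketch below. It is a direct, unoptimized transcription of the formulas for a discrete-valued attribute, under the assumption that each distinct attribute value forms one partition; the function names and the toy data are hypothetical, and only the 0.0003 cut-off comes from the text.

```python
import math
from collections import Counter

GAIN_RATIO_THRESHOLD = 0.0003  # elimination cut-off quoted in the text


def impurity(labels):
    """Equation (2): I(D) = 1 - sum over classes of p_i squared."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gain_ratio(values, labels):
    """Equations (3) to (6) for one discrete-valued attribute."""
    n = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    weights = [len(part) / n for part in partitions.values()]
    expected = sum(w * impurity(part) for w, part in zip(weights, partitions.values()))  # (3)
    gain = impurity(labels) - expected                                                   # (4)
    split_info = -sum(w * math.log2(w) for w in weights)                                 # (5)
    # An attribute with a single value cannot split D; treat its ratio as zero.
    return gain / split_info if split_info > 0 else 0.0                                  # (6)


labels = ["phishing", "phishing", "benign", "benign"]
informative = gain_ratio(["y", "y", "n", "n"], labels)  # separates the classes
noise = gain_ratio(["y", "n", "y", "n"], labels)        # unrelated to the class

kept = [name for name, score in [("informative", informative), ("noise", noise)]
        if score >= GAIN_RATIO_THRESHOLD]
print(informative, noise, kept)
```

On this toy data, the attribute that separates the two classes keeps a gain ratio of 0.5, while the unrelated attribute scores zero and falls below the cut-off.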

The same detection experiments are conducted with the 9 classifiers on each selected feature set. The detection results of the 5 cases on the selected feature sets are described in Tables 10 and 11. In this experiment, the 9 classification approaches and their related parameters are set up in the same way as in the previous experiments (described in Section 4.2).

According to the results of the reduced datasets with acorrelation attribute evaluation method shown in Table 10 theclassification approaches with the best detection accuracy areslightly changed in 2 cases (feature patterns 3 and 4) Featurepattern 3 is a combination of API count API usage Intent andHardwaree italicized values shown in Table 10 represent themaximum detection accuracy of 5 cases among nine classifierse highest detection accuracy is now provided by ensembleswith AVG and MAJ final answer methods while the highestdetection accuracy is provided by ensembles with the AVGfinal answermethod when full feature set is usede detectionaccuracy is slightly increased for most classifiers in featurepattern 4 which is a combination of flows and Intents features

According to the results shown in Table 11 of the re-duced datasets with an information gain attribute evaluation

Table 5 Scenarios for random combinations of features

Case ID Feature pattern Combination of feature sets Number of features01 Pattern 1 API count +API usage + hardware 11202 Pattern 2 API count + intent 13903 Pattern 3 API count +API usage + intent + hardware 22004 Pattern 4 Flow+ intent 52905 Pattern 5 Flow+ intent +API usage + hardware 610

Table 6 Detection accuracy of 5 scenarios on randomly combined feature patterns

Case ID J48 () DT () IBK () LR () NB () SVM () AVG () MAJ () MAX ()01 9593 9307 9545 9247 8942 9162 9531 9531 928702 9472 9162 9404 9018 8644 8927 9426 9420 913803 9632 9267 9560 9489 9069 9257 9643 9641 943104 9056 8638 9045 8851 8155 8788 9064 9064 885205 9533 8969 9437 9397 9228 9161 9568 9569 9268

Table 7 Tentative cases for mobile phishing detection system

Case ID Featurepattern

Adaptivemethod Accuracy () Run time

(seconds)1 Pattern 1 J48 9593 4432 Pattern 2 J48 9472 4543 Pattern 3 AVG 9643 95184 Pattern 4 AVG MAJ 9064 1744 amp 17465 Pattern 5 MAJ 9569 20550

Journal of Computer Networks and Communications 9

method the detection accuracy is increased in 4 cases(feature patterns 1 3 4 and 5) e italicized values shownin Table 11 represent the maximum detection accuracy of 5cases among nine classifiers Moreover the classificationapproaches which produced the best detection accuracy arechanged in 3 cases (feature patterns 3 4 and 5) at is anensemble with AVG final answer finding method providesthe best accuracy for feature patterns 3 4 and 5

e detection accuracy percentages of 5 cases by usingdifferent algorithms are comparatively described in Figure 6is figure represented the detection results from Tables 610 and 11 Each case is represented in 3 situations such as nofeatures selection after correlation attribute evaluationfeature selection and after information gain attributesevaluation feature selectionere are 15 points in the figurerepresenting the 5 cases with 3 conditions e best classifierfor case 01 and case 02 is J48 classifier while ensembleclassifier AVG is the best one for case 03 case 04 and case05 e cases with the best algorithm are used in the case-based reasoning detection method

With the aim of highlighting the performance of featureselection techniques the runtime results of reduced featuresets are collected as described in Tables 12 and 13 e

information gain attribute evaluation method results in alarge number of features than the correlation attributeevaluation method e runtime of the information gainattribute evolution method is also slightly larger than that ofthe correlation attribute evaluation method

The runtimes of the 5 cases after feature selection are shown in Figure 7. This figure compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure, representing the 5 cases under 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied to our feature sets to improve the accuracy and efficiency of our model. The detection accuracy improves on 4 feature patterns, as shown in Table 11, while the detection runtime improves on all feature patterns, as shown in Table 13. Table 14 compares the accuracy and efficiency of the full feature sets and the reduced feature sets for our proposed adaptive model. The italicized values in Table 14 are the accuracy values obtained with a reduced feature set that improve over their counterparts obtained with the full feature set.
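The size of the runtime improvement can be made concrete with a short calculation over the runtimes reported in Table 14. This is only a sketch: where Table 14 lists two tied classifiers, the AVG runtime is used, which is a choice made here for illustration.

```python
# Runtime in seconds of the selected classifier, read from Table 14.
runtime_before = {"01": 4.43, "02": 4.54, "03": 95.18, "04": 174.4, "05": 205.50}
runtime_after = {"01": 4.05, "02": 3.86, "03": 87.31, "04": 102.86, "05": 111.80}


def percent_reduction(before, after):
    """Relative runtime saving, as a percentage of the original runtime."""
    return (before - after) / before * 100.0


reduction = {
    case: round(percent_reduction(runtime_before[case], runtime_after[case]), 1)
    for case in runtime_before
}
```

Under these readings, the saving is largest for the two patterns that include Flow features (cases 04 and 05), which are also the patterns with the most features before selection.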

The phishing malware detection task is an imbalanced classification problem. That is, there are two classes to be

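Table 8: Runtime comparison of 5 scenarios on 9 classification approaches (in seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 4.43 | 18.63 | 0.01 | 1.93 | 0.64 | 3.01 | 29.87 | 29.52 | 26.16 |
| 02 | 4.54 | 31.80 | 0.0001 | 2.57 | 0.66 | 39.24 | 76.20 | 76.35 | 75.90 |
| 03 | 9.44 | 58.80 | 0.01 | 7.22 | 1.14 | 18.94 | 95.18 | 96.09 | 97.41 |
| 04 | 12.09 | 148.32 | 0.0001 | 5.28 | 1.39 | 6.25 | 174.4 | 174.6 | 174.61 |
| 05 | 17.09 | 167.14 | 0.01 | 7.86 | 1.93 | 3.61 | 203.62 | 205.50 | 203.51 |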

Table 9: Information of selected feature sets for 5 cases.

| Case ID | Feature combination pattern | Features before feature selection | Features selected by Pearson's correlation | Features selected by information gain |
|---|---|---|---|---|
| 01 | Pattern 1 | 112 | 96 | 100 |
| 02 | Pattern 2 | 139 | 114 | 120 |
| 03 | Pattern 3 | 220 | 180 | 185 |
| 04 | Pattern 4 | 529 | 164 | 265 |
| 05 | Pattern 5 | 610 | 227 | 250 |

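Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.87 | 93.10 | 95.45 | 92.47 | 89.42 | 91.61 | 95.31 | 95.32 | 92.82 |
| 02 | 94.68 | 91.53 | 94.04 | 90.18 | 86.44 | 89.28 | 94.24 | 94.18 | 91.33 |
| 03 | 96.37 | 92.70 | 95.60 | 94.90 | 90.69 | 92.57 | 96.38 | 96.38 | 94.37 |
| 04 | 90.73 | 86.51 | 90.45 | 88.51 | 81.55 | 87.89 | 90.73 | 90.72 | 88.64 |
| 05 | 95.38 | 89.54 | 94.37 | 93.96 | 92.28 | 91.61 | 95.68 | 95.69 | 92.72 |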

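Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 95.96 | 93.10 | 95.62 | 92.54 | 89.42 | 91.66 | 95.37 | 95.37 | 92.83 |
| 02 | 94.66 | 91.53 | 94.16 | 90.15 | 86.44 | 89.19 | 94.19 | 94.12 | 91.30 |
| 03 | 96.38 | 92.59 | 95.55 | 95.02 | 90.69 | 92.61 | 96.45 | 96.45 | 94.34 |
| 04 | 90.52 | 86.36 | 90.24 | 88.70 | 81.55 | 87.86 | 90.77 | 90.76 | 88.69 |
| 05 | 95.46 | 89.44 | 94.50 | 93.98 | 92.28 | 91.56 | 95.80 | 95.79 | 92.82 |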


identified, phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class ("phishing") is greatly outnumbered by the negative class. These types of problems are examples of the fairly common case in data science in which accuracy is not a good measure for assessing model performance. Intuitively, proclaiming all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.
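A small worked example shows why accuracy alone can mislead here. The numbers are hypothetical and are not taken from the dataset used in this work: suppose there are 1,000 apps, of which 50 are phishing, and a detector that labels every app as benign.

```latex
\[
\text{Accuracy} = \frac{950 \text{ correctly labeled benign apps}}{1000 \text{ apps}} = 95\%,
\qquad
\text{phishing apps found} = \frac{0}{50} = 0\%.
\]
```

The detector looks accurate while missing every phishing app, which is why measures that focus on the positive class are also needed.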

In order to assess the effectiveness of our proposed model, the confusion matrix evaluation is applied, using accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances that our model predicts as positive that are actually positive. The following formulas represent their definitions:

$$
\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN}. \tag{7}
$$

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. [Plot; x-axis: cases 01 to 05, each under no feature selection, correlation, and information gain; y-axis: accuracy (%), 80 to 100; series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.]

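Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 3.94 | 18.36 | 0.01 | 1.97 | 0.64 | 3.38 | 27.45 | 26.92 | 27.18 |
| 02 | 3.84 | 25.35 | 0.0001 | 2.43 | 0.50 | 39.46 | 72.07 | 72.12 | 71.19 |
| 03 | 8.06 | 45.85 | 0.01 | 7.20 | 1.03 | 19.55 | 83.10 | 83.52 | 83.34 |
| 04 | 5.60 | 44.75 | 0.0001 | 5.15 | 0.56 | 6.27 | 61.95 | 61.99 | 62.00 |
| 05 | 8.84 | 69.88 | 0.0001 | 7.65 | 1.00 | 3.20 | 90.23 | 90.06 | 90.45 |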

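Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

| Case ID | J48 | DT | IBK | LR | NB | SVM | AVG | MAJ | MAX |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 4.05 | 20.25 | 0.01 | 1.63 | 0.54 | 3.02 | 29.02 | 27.49 | 27.27 |
| 02 | 3.86 | 29.43 | 0.001 | 2.35 | 0.56 | 31.96 | 67.63 | 66.52 | 66.40 |
| 03 | 9.77 | 55.46 | 0.01 | 6.31 | 0.95 | 17.06 | 87.31 | 90.63 | 90.69 |
| 04 | 6.83 | 87.36 | 0.01 | 2.25 | 0.93 | 6.76 | 102.86 | 93.21 | 93.15 |
| 05 | 8.42 | 104.52 | 0.001 | 5.37 | 1.21 | 3.95 | 111.80 | 107.53 | 108.07 |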


True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix, as shown in Figure 8.
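The three measures can be computed directly from the four confusion matrix counts. The following is a minimal sketch; the function name `confusion_metrics` and the example counts are made up for illustration and are not results from this work.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, sensitivity) as fractions in [0, 1]."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, sensitivity


# Made-up counts for 1,000 apps, 100 of which are phishing.
accuracy, precision, sensitivity = confusion_metrics(tp=79, fp=16, tn=884, fn=21)

# A detector that labels everything as benign: high accuracy, zero sensitivity.
all_negative = confusion_metrics(tp=0, fp=0, tn=950, fn=50)
```

Reporting precision and sensitivity next to accuracy exposes the all-negative detector, whose accuracy is high only because benign apps dominate.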

The effectiveness of our proposed model in terms of accuracy, precision, and sensitivity is reported in Table 15. According to these results, our adaptive model achieves good detection accuracy on the phishing features (90.77% to 96.45%). Meanwhile, the selected classifiers reach acceptable precision (83% to 92%) and sensitivity (62% to 86%). According to the previous experiments, our adaptive phishing detection model using case-based reasoning can perform well on diversely distributed features.

5. Conclusions

An adaptive mobile phishing detection model based on a variation of input feature patterns using a case-based reasoning (CBR) technique is proposed in this work. An experimental analysis is conducted to demonstrate the design decisions of our model and to verify the performance of our proposed model in handling the concept drift of mobile phishing attacks. The proposed model is evaluated with a large feature set that contains 1,065 features from 10 feature

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. [Plot; x-axis: cases 01 to 05, each under no feature selection, correlation, and information gain; y-axis: runtime (seconds), 0 to 200; series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.]

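Table 14: Accuracy (%) and efficiency (runtime in seconds) of the proposed adaptive model.

| Case ID | Adaptive (before) | Adaptive (after) | Accuracy (before) | Accuracy (after) | Runtime (before) | Runtime (after) |
|---|---|---|---|---|---|---|
| 01 | J48 | J48 | 95.93 | 95.96 | 4.43 | 4.05 |
| 02 | J48 | J48 | 94.72 | 94.66 | 4.54 | 3.86 |
| 03 | AVG | AVG, MAJ | 96.43 | 96.45 | 95.18 | 87.31 & 90.63 |
| 04 | AVG, MAJ | AVG | 90.64 | 90.77 | 174.4 & 174.6 | 102.86 |
| 05 | MAJ | AVG | 95.69 | 95.80 | 205.50 | 111.80 |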

Figure 8: Confusion matrix.

| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | TN | FP |
| Actual positive | FN | TP |

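Table 15: Detection results achieved by the proposed model.

| Case | Classifier | Accuracy (%) | Precision (%) | Sensitivity (%) |
|---|---|---|---|---|
| 01 | J48 | 95.96 | 83 | 79 |
| 02 | J48 | 94.66 | 87 | 86 |
| 03 | AVG | 96.45 | 92 | 75 |
| 04 | AVG | 90.77 | 84 | 62 |
| 05 | AVG | 95.80 | 90 | 74 |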


groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories to cover the range of classification behaviors across the diverse feature sets. Three ensembles of the six base classifiers are used, each with a different final answer-finding method: average, majority voting, and maximum. In total, there are 9 classifiers. Due to the large number of features in the dataset and the use of multiple classifiers, efficiency degrades. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and Pearson's correlation coefficient attribute evaluation method. By selecting the most suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model can provide an accuracy improvement with an acceptable runtime increment.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.



or less values on the class correlation measures are eliminated. The resulting reduced feature sets are shown in Table 9.

The second method is information gain attribute evaluation-based feature selection with a ranker search method. The information gain ratio is calculated using the following equations. In the attribute evaluation process, the I index, which measures the impurity of D (a data partition or a set of training tuples), is calculated using

$$
I(D) = 1 - \sum_{i=1}^{m} p_i^2, \tag{2}
$$

where $p_i$ is the probability that a tuple in D belongs to class $C_i$ and is estimated by $|C_{i,D}|/|D|$. The sum is computed over the m classes. The I index considers a binary split for each attribute. First, the case where A is a discrete-valued attribute having v distinct values $\{A_1, A_2, \ldots, A_v\}$ occurring in D is considered. The expected information provided by that split is calculated by

$$
I_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j). \tag{3}
$$

In this equation, $D_j$ represents the observations in D for which attribute A takes its jth value. The information gain of a binary split on attribute A is calculated by

$$
\text{Gain}(A) = I(D) - I_A(D). \tag{4}
$$
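Equations (2) to (4) can be traced on a toy example. The sketch below is illustrative only: the function names, the binary feature, and the six labels are made up, and the impurity in equation (2) is implemented exactly as written (it has the form commonly called Gini impurity).

```python
from collections import Counter


def impurity(labels):
    """Equation (2): I(D) = 1 - sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def expected_impurity(values, labels):
    """Equation (3): size-weighted impurity of the partitions D_j induced by A."""
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    n = len(labels)
    return sum(len(part) / n * impurity(part) for part in partitions.values())


def gain(values, labels):
    """Equation (4): Gain(A) = I(D) - I_A(D)."""
    return impurity(labels) - expected_impurity(values, labels)


# Toy data: one binary feature observed on six apps (made-up values).
labels = ["phishing", "phishing", "benign", "benign", "benign", "benign"]
feature = [1, 1, 0, 0, 0, 1]

base = impurity(labels)        # 1 - (1/9 + 4/9) = 4/9
info = gain(feature, labels)   # 4/9 - (1/2)(4/9) = 2/9
```

Splitting on the toy feature leaves one partition pure (all benign) and halves the weighted impurity, so the gain is 2/9.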

Information gain ratio attempts to correct the information gain calculation by introducing a split information value. The mathematical formulation for split information is provided in

$$
\text{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\!\left(\frac{|D_j|}{|D|}\right). \tag{5}
$$

This value represents the potential information generated by splitting the training dataset D into v partitions, corresponding to the v outcomes of a test on attribute A. The gain ratio is defined in

$$
\text{GainRatio}(A) = \frac{\text{Gain}(A)}{\text{SplitInfo}_A(D)}. \tag{6}
$$

The attribute with the maximum gain ratio is selected as the highest ranked attribute. The low-ranked attributes that provide a gain ratio of less than 0.0003 are eliminated. After performing the two feature selection techniques on the dataset, the reduced feature sets are generated as shown in Table 9.
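The ranking and elimination step can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the toy feature columns, and the handling of attributes with zero split information are choices made here, and only the 0.0003 cut-off comes from the text.

```python
import math
from collections import Counter


def impurity(labels):
    """Equation (2): one minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gain_ratio(values, labels):
    """Equations (3) to (6) for one attribute given as a list of values."""
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    n = len(labels)
    weights = [len(part) / n for part in partitions.values()]
    expected = sum(w * impurity(part) for w, part in zip(weights, partitions.values()))
    split_info = -sum(w * math.log2(w) for w in weights)
    if split_info == 0:
        return 0.0  # constant attribute: no split, treated as uninformative
    return (impurity(labels) - expected) / split_info


def select_features(columns, labels, threshold=0.0003):
    """Rank attributes by gain ratio and drop those below the threshold."""
    scores = {name: gain_ratio(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= threshold]


labels = ["phishing", "phishing", "benign", "benign", "benign", "benign"]
columns = {  # hypothetical binary features
    "uses_sms": [1, 1, 0, 0, 0, 1],
    "constant": [1, 1, 1, 1, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1],
}
kept = select_features(columns, labels)
```

In this toy setting only `uses_sms` survives: the constant attribute produces no split, and the `noise` attribute leaves the class proportions unchanged in both partitions, so its gain is zero.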

The same detection experiments are conducted with the 9 classification approaches on each selected feature set. The detection results of the 5 cases on the selected feature sets are described in Tables 10 and 11. In this experiment, the 9 classification approaches and their related parameters are set up the same as in the previous experiments (described in Section 4.2).

According to the results for the datasets reduced with the correlation attribute evaluation method, shown in Table 10, the classification approaches with the best detection accuracy change slightly in 2 cases (feature patterns 3 and 4). Feature pattern 3 is a combination of API count, API usage, Intent, and Hardware features. The italicized values shown in Table 10 represent the maximum detection accuracy of the 5 cases among the nine classifiers. For feature pattern 3, the highest detection accuracy is now provided by the ensembles with the AVG and MAJ final answer methods, whereas it is provided by the ensemble with the AVG final answer method when the full feature set is used. The detection accuracy increases slightly for most classifiers in feature pattern 4, which is a combination of Flow and Intent features.

According to the results shown in Table 11 for the datasets reduced with the information gain attribute evaluation

Table 5: Scenarios for random combinations of features.

| Case ID | Feature pattern | Combination of feature sets | Number of features |
|---|---|---|---|
| 01 | Pattern 1 | API count + API usage + hardware | 112 |
| 02 | Pattern 2 | API count + intent | 139 |
| 03 | Pattern 3 | API count + API usage + intent + hardware | 220 |
| 04 | Pattern 4 | Flow + intent | 529 |
| 05 | Pattern 5 | Flow + intent + API usage + hardware | 610 |


method, the detection accuracy is increased in 4 cases (feature patterns 1, 3, 4, and 5). The italicized values shown in Table 11 represent the maximum detection accuracy of the 5 cases among the nine classifiers. Moreover, the classification approaches that produce the best detection accuracy change in 3 cases (feature patterns 3, 4, and 5). That is, an ensemble with the AVG final answer-finding method provides the best accuracy for feature patterns 3, 4, and 5.

e detection accuracy percentages of 5 cases by usingdifferent algorithms are comparatively described in Figure 6is figure represented the detection results from Tables 610 and 11 Each case is represented in 3 situations such as nofeatures selection after correlation attribute evaluationfeature selection and after information gain attributesevaluation feature selectionere are 15 points in the figurerepresenting the 5 cases with 3 conditions e best classifierfor case 01 and case 02 is J48 classifier while ensembleclassifier AVG is the best one for case 03 case 04 and case05 e cases with the best algorithm are used in the case-based reasoning detection method

With the aim of highlighting the performance of featureselection techniques the runtime results of reduced featuresets are collected as described in Tables 12 and 13 e

information gain attribute evaluation method results in alarge number of features than the correlation attributeevaluation method e runtime of the information gainattribute evolution method is also slightly larger than that ofthe correlation attribute evaluation method

e runtime on 5 cases by selecting the features areshowed in Figure 7 is figure compared the runtime fromTables 8 12 and 13 ere are 15 points in the figurerepresenting the 5 cases with 3 conditions

Selecting the features with the information gain attributeevaluation approach is applied on our feature sets to im-prove our model for better accuracy and efficiency epercentages of detection accuracy on 4 feature patterns areimproved as shown in Table 11 while the performances ofthe detection on all feature patterns are improved as shownin Table 13 Table 14 shows the comparison of accuracy andefficiency of full feature sets and reduced feature sets of ourproposed adaptive model e italicized values shown inTable 14 represent the accuracy values when a reducedfeature set is used and the accuracy values are improvedover their counterpart when a full feature set is used

e phishing malware detection task is an imbalancedclassification problem at is there are two classes to be

Table 8 Runtime comparison of 5 scenarios on 9 classification approaches (in seconds)

Case ID J48 DT IBK LR NB SVM AVG MAJ MAX01 443 1863 001 193 064 301 2987 2952 261602 454 3180 00001 257 066 3924 7620 7635 759003 944 5880 001 722 114 1894 9518 9609 974104 1209 14832 00001 528 139 625 1744 1746 1746105 1709 16714 001 786 193 361 20362 20550 20351

Table 9 Information of selected feature sets for 5 cases

CaseID

Feature combinationpattern

Features before featureselection

Features selected by Pearsonrsquoscorrelation

Features selected by informationgain

01 Pattern 1 112 96 10002 Pattern 2 139 114 12003 Pattern 3 220 180 18504 Pattern 4 529 164 26505 Pattern 5 610 227 250

Table 10 Detection accuracy of 5 cases after correlation attribute evaluation feature selection

Case ID J48 DT IBK LR NB SVM AVG MAJ MAX01 9587 9310 9545 9247 8942 9161 9531 9532 928202 9468 9153 9404 9018 8644 8928 9424 9418 913303 9637 9270 9560 9490 9069 9257 9638 9638 943704 9073 8651 9045 8851 8155 8789 9073 9072 886405 9538 8954 9437 9396 9228 9161 9568 9569 9272

Table 11 Detection accuracy of 5 cases after information gain attribute evaluation feature selection

Case ID J48 DT IBK LR NB SVM AVG MAJ MAX01 9596 9310 9562 9254 8942 9166 9537 9537 928302 9466 9153 9416 9015 8644 8919 9419 9412 913003 9638 9259 9555 9502 9069 9261 9645 9645 943404 9052 8636 9024 8870 8155 8786 9077 9076 886905 9546 8944 9450 9398 9228 9156 9580 9579 9282

10 Journal of Computer Networks and Communications

identified including phishing and benign with one categoryrepresenting the overwhelming majority of the data pointsIn these cases the positive class ldquophishingrdquo is greatly out-numbered by the negative class ese types of problems areexamples of the fairly common case in the data science whenthe accuracy is not a good measure for assessing the modelperformance Intuitively proclaiming all data points asnegative in the phishing detection problem is not helpfuland instead we should focus on identifying the positivecases

In order to assess the effectiveness of our proposedmodel the confusion matrix evaluation is applied accuracyprecision and sensitivity While sensitivity expresses the

ability of a model to find all relevant instances in the datasetprecision expresses the proportion of the instances that ourmodel predicts as positive and they are actually positive efollowing formulas represent their definitions

Accuracy TP + TN

TP + FP + TN + FN

Precision TP

TP + FP

Sensitivity TP

TP + FN

(7)

Case

01

no

f- se

lect

ion

Case

01

corr

elat

ion

Case

01

info

gai

n

Case

02

no

f- se

lect

ion

Case

02

corr

elat

ion

Case

02

info

gai

n

Case

03

no

f- se

lect

ion

Case

03

corr

elat

ion

Case

03

info

gai

n

Case

04

no

f- se

lect

ion

Case

04

corr

elat

ion

Case

04

info

gai

n

Case

05

no

f- se

lect

ion

Case

05

corr

elat

ion

Case

05

info

gai

n

J48DTIBK

LRNBSVM

AVGMAJMAX

8081828384858687888990919293949596979899

100

Acc

urac

y (

)

Figure 6 Accuracy comparison of 9 classifiers on 5 cases before and after feature selection

Table 12 Runtime comparison after correlation attribute evaluation feature selection (seconds)

Case ID J48 DT IBK LR NB SVM AVG MAJ MAX01 394 1836 001 197 064 338 2745 2692 271802 384 2535 00001 243 050 3946 7207 7212 711903 806 4585 001 720 103 1955 8310 8352 833404 560 4475 00001 515 056 627 6195 6199 620005 884 6988 00001 765 100 320 9023 9006 9045

Table 13 Runtime comparison after information gain attribute evaluation feature selection (seconds)

Case ID J48 DT IBK LR NB SVM AVG MAJ MAX01 405 2025 001 163 054 302 2902 2749 272702 386 2943 0001 235 056 3196 6763 6652 664003 977 5546 001 631 095 1706 8731 9063 906904 683 8736 001 225 093 676 10286 9321 931505 842 10452 0001 537 121 395 11180 10753 10807

Journal of Computer Networks and Communications 11

True positive (TP) is the amount of correct positiveprediction false positive (FP) is the incorrect positiveprediction true negative (TN) is the amount of correctnegative prediction and false negative (FN) is the amount ofincorrect negative prediction ese four outcomes form theconfusion matrix as shown in Figure 8

e evaluation of effectiveness on our proposed modelby means of accuracy precision and sensitivity is describedin Table 15 According to the results shown in Table 15 ouradaptive model achieves a good detection accuracy for thephishing features Meanwhile the performance of all theclassifiers gets an acceptable precision and sensitivity ratioAccording to the previous experiments our adaptivephishing detection model using case-based reasoning canperform well on the diversely distributed features

5 Conclusions

An adaptive mobile phishing detection model based on avariation of input feature patterns using a case-based rea-soning (CBR) technique is proposed in this work An ex-perimental analysis is conducted to demonstrate the design

decision of our model and to verify the performance of ourproposed model in handling the concept drift of mobilephishing attacks e proposed model is evaluated with alarge feature set that contains 1065 features from 10 feature

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. (Figure not reproduced. Horizontal axis: Cases 01 to 05, each under no feature selection, correlation, and info gain; vertical axis: Runtime (seconds), 0 to 200; legend: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.)

Table 14: Accuracy (%) and efficiency (runtime in seconds) of the proposed adaptive model.

Case ID   Adaptive (before)   Adaptive (after)   Accuracy (before)   Accuracy (after)   Runtime (before)   Runtime (after)
01        J48                 J48                95.93               95.96              4.43               4.05
02        J48                 J48                94.72               94.66              4.54               3.86
03        AVG                 AVG, MAJ           96.43               96.45              95.18              87.31 & 90.63
04        AVG, MAJ            AVG                90.64               90.77              174.4 & 174.6      102.86
05        MAJ                 AVG                95.69               95.80              205.50             111.80

                   Predicted negative   Predicted positive
Actual negative    TN                   FP
Actual positive    FN                   TP

Figure 8: Confusion matrix.

Table 15: Detection results achieved by the proposed model.

Case   Classifier   Accuracy (%)   Precision (%)   Sensitivity (%)
01     J48          95.96          83              79
02     J48          94.66          87              86
03     AVG          96.45          92              75
04     AVG          90.77          84              62
05     AVG          95.80          90              74

groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories so that a range of classification behaviors is covered across the diverse feature sets. Three ensembles of the six base classifiers are used, each with a different final answer-finding method: average, majority voting, and maximum. In total, there are 9 classifiers. Because of the large number of features in the dataset and the use of multiple classifiers, efficiency degrades. To overcome this hurdle, 2 feature selection techniques are applied to the dataset to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and Pearson's correlation coefficient attribute evaluation method. By selecting the most suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model could provide an accuracy improvement with an acceptable runtime increment.

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.


method, the detection accuracy is increased in 4 cases (feature patterns 1, 3, 4, and 5). The italicized values shown in Table 11 represent the maximum detection accuracy among the nine classifiers for each of the 5 cases. Moreover, the classification approaches that produce the best detection accuracy change in 3 cases (feature patterns 3, 4, and 5). That is, an ensemble with the AVG final answer-finding method provides the best accuracy for feature patterns 3, 4, and 5.
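The three final answer-finding methods used by the ensembles (average, majority voting, and maximum) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each base classifier outputs a phishing probability in [0, 1], and the function names, the 0.5 threshold, the tie rule, and the reading of MAX as "the single most confident prediction wins" are all assumptions.

```python
from typing import List


def combine_avg(probs: List[float], threshold: float = 0.5) -> int:
    """AVG: average the phishing probabilities of the base classifiers."""
    return int(sum(probs) / len(probs) >= threshold)


def combine_maj(probs: List[float], threshold: float = 0.5) -> int:
    """MAJ: each base classifier casts one vote; ties go to benign (0)."""
    votes = sum(1 for p in probs if p >= threshold)
    return int(votes > len(probs) / 2)


def combine_max(probs: List[float]) -> int:
    """MAX: the class backed by the single most confident prediction wins."""
    max_phishing = max(probs)
    max_benign = max(1.0 - p for p in probs)
    return int(max_phishing > max_benign)


# Hypothetical outputs of six base classifiers for one app (1 = phishing)
probs = [0.9, 0.4, 0.45, 0.3, 0.48, 0.35]
print(combine_avg(probs), combine_maj(probs), combine_max(probs))
```

On this made-up input the three rules disagree: one very confident base classifier is enough for MAX to flag the app, while AVG and MAJ label it benign. Such disagreement is one reason the best ensemble can differ from one feature pattern to another.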

The detection accuracy percentages of the 5 cases using different algorithms are compared in Figure 6. This figure represents the detection results from Tables 6, 10, and 11. Each case is represented in 3 situations: no feature selection, after correlation attribute evaluation feature selection, and after information gain attribute evaluation feature selection. There are 15 points in the figure, representing the 5 cases under the 3 conditions. The best classifier for case 01 and case 02 is the J48 classifier, while the ensemble classifier AVG is the best one for case 03, case 04, and case 05. The cases with the best algorithm are used in the case-based reasoning detection method.
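Choosing the best classifier for each case can be sketched as a simple lookup over the accuracy results. The sketch below uses the Table 11 values (accuracy after information gain feature selection, read with two decimal places). The dictionary layout, the names, and the tie rule (the earlier column wins) are assumptions made for illustration, and the retrieval step of the case-based reasoning cycle is not shown.

```python
CLASSIFIERS = ["J48", "DT", "IBK", "LR", "NB", "SVM", "AVG", "MAJ", "MAX"]

# Accuracy (%) per case, in the column order of CLASSIFIERS (Table 11)
ACCURACY = {
    "01": [95.96, 93.10, 95.62, 92.54, 89.42, 91.66, 95.37, 95.37, 92.83],
    "02": [94.66, 91.53, 94.16, 90.15, 86.44, 89.19, 94.19, 94.12, 91.30],
    "03": [96.38, 92.59, 95.55, 95.02, 90.69, 92.61, 96.45, 96.45, 94.34],
    "04": [90.52, 86.36, 90.24, 88.70, 81.55, 87.86, 90.77, 90.76, 88.69],
    "05": [95.46, 89.44, 94.50, 93.98, 92.28, 91.56, 95.80, 95.79, 92.82],
}


def best_classifier(scores):
    """Return the name of the classifier with the highest accuracy."""
    # max() keeps the first maximum, so ties go to the earlier column
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSIFIERS[best_index]


# Hypothetical case base: feature pattern ID -> classifier to apply
case_base = {case: best_classifier(scores) for case, scores in ACCURACY.items()}
print(case_base)
```

With these values the lookup selects J48 for cases 01 and 02 and AVG for cases 03, 04, and 05. In case 03, AVG and MAJ tie at 96.45, which matches both being listed for that case in Table 14.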

With the aim of highlighting the performance of the feature selection techniques, the runtime results on the reduced feature sets are collected, as described in Tables 12 and 13. The information gain attribute evaluation method retains a larger number of features than the correlation attribute evaluation method (for example, 265 versus 164 features for case 04; see Table 9). The runtime of the information gain attribute evaluation method is also slightly larger than that of the correlation attribute evaluation method.
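Both attribute evaluation methods score each feature against the class label, and features with low scores can then be dropped. The following pure-Python sketch shows the two scores on toy data. It assumes discrete (here binary) features; the feature names and data are hypothetical, and the ranking cutoff used in the experiments is not stated in the text, so none is shown.

```python
import math
from typing import List


def entropy(labels: List[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def information_gain(feature: List[int], labels: List[int]) -> float:
    """H(class) - H(class | feature) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def pearson(feature: List[float], labels: List[float]) -> float:
    """Pearson correlation coefficient between a feature and the class."""
    n = len(labels)
    mean_x = sum(feature) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, labels))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


# Toy data: 1 = phishing, 0 = benign; two hypothetical binary features
labels = [1, 1, 0, 0]
uses_sms = [1, 1, 0, 0]  # perfectly aligned with the class
has_icon = [1, 0, 1, 0]  # independent of the class
print(information_gain(uses_sms, labels), pearson(uses_sms, labels))
print(information_gain(has_icon, labels), pearson(has_icon, labels))
```

The feature aligned with the class receives the highest possible score under both measures, and the independent feature scores zero under both. On real data the two rankings can differ, which is consistent with the two methods retaining different numbers of features.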

The runtimes on the 5 cases with feature selection are shown in Figure 7. This figure compares the runtimes from Tables 8, 12, and 13. There are 15 points in the figure, representing the 5 cases under the 3 conditions.

Feature selection with the information gain attribute evaluation approach is applied to our feature sets to improve the accuracy and efficiency of our model. The detection accuracy improves on 4 feature patterns, as shown in Table 11, while the runtime performance improves on all feature patterns, as shown in Table 13. Table 14 compares the accuracy and efficiency of our proposed adaptive model on the full feature sets and on the reduced feature sets. The italicized values in Table 14 mark the accuracy values obtained with a reduced feature set that improve over their counterparts obtained with the full feature set.

The phishing malware detection task is an imbalanced classification problem. That is, there are two classes to be

Table 8: Runtime comparison of 5 scenarios on 9 classification approaches (in seconds).

Case ID   J48     DT       IBK      LR     NB     SVM     AVG      MAJ      MAX
01        4.43    18.63    0.01     1.93   0.64   3.01    29.87    29.52    26.16
02        4.54    31.80    0.0001   2.57   0.66   39.24   76.20    76.35    75.90
03        9.44    58.80    0.01     7.22   1.14   18.94   95.18    96.09    97.41
04        12.09   148.32   0.0001   5.28   1.39   6.25    174.4    174.6    174.61
05        17.09   167.14   0.01     7.86   1.93   3.61    203.62   205.50   203.51

Table 9: Information of selected feature sets for 5 cases.

Case ID   Feature combination pattern   Features before feature selection   Features selected by Pearson's correlation   Features selected by information gain
01        Pattern 1                     112                                 96                                           100
02        Pattern 2                     139                                 114                                          120
03        Pattern 3                     220                                 180                                          185
04        Pattern 4                     529                                 164                                          265
05        Pattern 5                     610                                 227                                          250

Table 10: Detection accuracy (%) of 5 cases after correlation attribute evaluation feature selection.

Case ID   J48     DT      IBK     LR      NB      SVM     AVG     MAJ     MAX
01        95.87   93.10   95.45   92.47   89.42   91.61   95.31   95.32   92.82
02        94.68   91.53   94.04   90.18   86.44   89.28   94.24   94.18   91.33
03        96.37   92.70   95.60   94.90   90.69   92.57   96.38   96.38   94.37
04        90.73   86.51   90.45   88.51   81.55   87.89   90.73   90.72   88.64
05        95.38   89.54   94.37   93.96   92.28   91.61   95.68   95.69   92.72

Table 11: Detection accuracy (%) of 5 cases after information gain attribute evaluation feature selection.

Case ID   J48     DT      IBK     LR      NB      SVM     AVG     MAJ     MAX
01        95.96   93.10   95.62   92.54   89.42   91.66   95.37   95.37   92.83
02        94.66   91.53   94.16   90.15   86.44   89.19   94.19   94.12   91.30
03        96.38   92.59   95.55   95.02   90.69   92.61   96.45   96.45   94.34
04        90.52   86.36   90.24   88.70   81.55   87.86   90.77   90.76   88.69
05        95.46   89.44   94.50   93.98   92.28   91.56   95.80   95.79   92.82

identified, phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class, "phishing", is greatly outnumbered by the negative class. These types of problems are examples of the fairly common case in data science in which accuracy is not a good measure for assessing model performance. Intuitively, proclaiming all data points as negative in the phishing detection problem is not helpful; instead, we should focus on identifying the positive cases.
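A small numerical illustration makes the point concrete. The counts below are hypothetical (50 phishing apps among 1,000) and are not taken from the dataset used in this work. The detector shown labels every app as benign.

```python
# Hypothetical imbalanced test set: 50 phishing apps (1), 950 benign apps (0)
actual = [1] * 50 + [0] * 950

# A useless detector that labels every app as benign
predicted = [0] * len(actual)

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
found_phishing = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)

accuracy = correct / len(actual)
sensitivity = found_phishing / sum(actual)

print(f"accuracy = {accuracy:.2%}, sensitivity = {sensitivity:.2%}")
```

The detector reaches 95% accuracy while finding none of the phishing apps, which is why accuracy alone is not sufficient for this problem.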

In order to assess the effectiveness of our proposed model, confusion matrix evaluation is applied, using accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances predicted as positive by our model that are actually positive. The following formulas represent their definitions:

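$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Sensitivity} &= \frac{TP}{TP + FN}.
\end{aligned}
\tag{7}
$$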
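The three definitions translate directly into code. The sketch below is illustrative only: the class name is hypothetical, the counts are made up, and returning 0.0 when a denominator is zero is a convention chosen here rather than something specified in the text.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # phishing apps correctly flagged
    fp: int  # benign apps wrongly flagged
    tn: int  # benign apps correctly passed
    fn: int  # phishing apps missed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        predicted_positive = self.tp + self.fp
        return self.tp / predicted_positive if predicted_positive else 0.0

    def sensitivity(self) -> float:
        actual_positive = self.tp + self.fn
        return self.tp / actual_positive if actual_positive else 0.0


# Hypothetical counts for 1,000 apps, 50 of which are phishing
cm = ConfusionMatrix(tp=40, fp=10, tn=940, fn=10)
print(cm.accuracy(), cm.precision(), cm.sensitivity())
```

With these counts, accuracy is dominated by the 940 correctly passed benign apps, while precision and sensitivity depend only on how the phishing class is handled.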

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. (Figure not reproduced. Horizontal axis: Cases 01 to 05, each under no feature selection, correlation, and info gain; vertical axis: Accuracy (%), 80 to 100; legend: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.)

Table 12 Runtime comparison after correlation attribute evaluation feature selection (seconds)

Case ID J48 DT IBK LR NB SVM AVG MAJ MAX01 394 1836 001 197 064 338 2745 2692 271802 384 2535 00001 243 050 3946 7207 7212 711903 806 4585 001 720 103 1955 8310 8352 833404 560 4475 00001 515 056 627 6195 6199 620005 884 6988 00001 765 100 320 9023 9006 9045

Table 13 Runtime comparison after information gain attribute evaluation feature selection (seconds)

Case ID J48 DT IBK LR NB SVM AVG MAJ MAX01 405 2025 001 163 054 302 2902 2749 272702 386 2943 0001 235 056 3196 6763 6652 664003 977 5546 001 631 095 1706 8731 9063 906904 683 8736 001 225 093 676 10286 9321 931505 842 10452 0001 537 121 395 11180 10753 10807

Journal of Computer Networks and Communications 11

True positive (TP) is the amount of correct positiveprediction false positive (FP) is the incorrect positiveprediction true negative (TN) is the amount of correctnegative prediction and false negative (FN) is the amount ofincorrect negative prediction ese four outcomes form theconfusion matrix as shown in Figure 8

e evaluation of effectiveness on our proposed modelby means of accuracy precision and sensitivity is describedin Table 15 According to the results shown in Table 15 ouradaptive model achieves a good detection accuracy for thephishing features Meanwhile the performance of all theclassifiers gets an acceptable precision and sensitivity ratioAccording to the previous experiments our adaptivephishing detection model using case-based reasoning canperform well on the diversely distributed features

5 Conclusions

An adaptive mobile phishing detection model based on avariation of input feature patterns using a case-based rea-soning (CBR) technique is proposed in this work An ex-perimental analysis is conducted to demonstrate the design

decision of our model and to verify the performance of ourproposed model in handling the concept drift of mobilephishing attacks e proposed model is evaluated with alarge feature set that contains 1065 features from 10 feature

Case

01

no

f- se

lect

ion

Case

01

corr

elat

ion

Case

01

info

gai

n

Case

02

no

f- se

lect

ion

Case

02

corr

elat

ion

Case

02

info

gai

n

Case

03

no

f- se

lect

ion

Case

03

corr

elat

ion

Case

03

info

gai

n

Case

04

no

f- se

lect

ion

Case

04

corr

elat

ion

Case

04

info

gai

n

Case

05

no

f- se

lect

ion

Case

05

corr

elat

ion

Case

05

info

gai

n

J48DTIBK

LRNBSVM

AVGMAJMAX

0

50

100

150

200

Runt

ime (

seco

nds)

Figure 7 Runtime comparison of 9 classifiers on 5 cases before and after feature selection

Table 14 Accuracy and efficiency of proposed adaptive model

Case ID Adaptive (before) Adaptive (after) Accuracy (before) Accuracy (after) Runtime (before) Runtime (after)01 J48 J48 9593 9596 443 40502 J48 J48 9472 9466 454 38603 AVG AVG MAJ 9643 9645 9518 8731 amp 906304 AVG MAJ AVG 9064 9077 1744 amp 1746 1028605 MAJ AVG 9569 9580 20550 11180

Negative

Predicted

NegativeActu

al

Positive

Positive

FP

TP

TN

FN

Figure 8 Confusion matrix

Table 15 Detection results achieved by the proposed model

Case Classifier Accuracy () Precision () Sensitivity ()01 J48 9596 83 7902 J48 9466 87 8603 AVG 9645 92 7504 AVG 9077 84 6205 AVG 9580 90 74

12 Journal of Computer Networks and Communications

groups which are frequently collected from Android appsMoreover 5 cases of randomly combined patterns of fea-tures are created in order to provide a diversity of unknownpatterns to mimic new real-world mobile apps Six classi-fication algorithms are chosen from different categories forthe coverage usage of all classification nature on the di-version of feature sets ree ensembles of six base classifiersare used each of which uses different final answer-findingmethods including average majority voting and maximumIn total there are 9 classifiers Due to the involvement ofefficient features in the dataset and the uses of multipleclassifiers the efficiency degradation happened To over-come this hurdle 2 feature selection techniques are appliedon the dataset in order to reduce the size of the featureswhich is the size of the input to the classifiers e twofeature selection techniques used are information gain at-tribute evaluation method and Pearsonrsquos correlation co-efficient attribute evaluation method By addressing theoptimal selection of the suitable classifier to the incomingfeatures using a case-based reasoning approach the pro-posed mobile phishing detection model could provide anaccuracy improvement with an acceptable runtimeincrement

Data Availability

e dataset of the features used in this research is availablefrom the authors upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is research was supported by the Higher Education Re-search Promotion and the ailandrsquos Education Hub forSouthern Region of ASEAN Countries Project Office of theHigher Education Commission

References

[1] W Paul H A Manolian and S Lapper ldquoinking digital inindustry 40rdquo Deloitte Insights September 2018 httpswww2deloittecominsightsusenfocusindustry-4-0digital-leaders-in-manufacturing-fourth-industrial-revolutionhtml

[2] ldquoSpam and phishing in Q2 2018rdquo Securelist-Kaspersky LabrsquosCyberthreat Research and Reports 2018

[3] Proofpoint Security Awareness ldquo2019 state of the phish re-portrdquo March 2019 httpswwwwombatsecuritycomstate-of-the-phish

[4] L Wu X Du and J Wu ldquoEffective defense schemes forphishing attacks on mobile computing platformsrdquo IEEETransactions on Vehicular Technology vol 65 no 8pp 6678ndash6691 2016

[5] M Moghimi and A Y Varjani ldquoNew rule-based phishingdetection methodrdquo Expert Systems with Applications vol 53pp 231ndash242 Jul 2016

[6] Baunfirecom and SparkCMS ldquoAPWG phishing attack trendsreport-4Q 2018rdquo Anti-PhishingWorking GroupMarch 2019httpswwwantiphishingorgresourcesapwg-reports

[7] R Basnet S Mukkamala and A H Sung ldquoDetection ofphishing attacks a machine learning approachrdquo in SoftComputing Applications in Industry B Prasad Ed pp 373ndash383 Springer Berlin Heidelberg Berlin Heidelberg 2008

[8] A K Jain and B B Gupta ldquoComparative analysis of featuresbased machine learning approaches for phishing detectionrdquoin Proceedings of the 2016 3rd International Conference onComputing for Sustainable Global Development (INDIACom)pp 2125ndash2130 New Delhi India March 2016

[9] F Toolan and J Carthy ldquoPhishing detection using classifierensemblesrdquo in Proceedings of the 2009 eCrime ResearchersSummit pp 1ndash9 Tacoma WA USA October 2009

[10] H S Hota A K Shrivas and R Hota ldquoAn ensemble modelfor detecting phishing attack with proposed remove-replacefeature selection techniquerdquo Procedia Computer Sciencevol 132 pp 900ndash907 2018

[11] A Comparative Study of Phishing Websites ClassificationBased on Classifier Ensembles ResearchGate BerlinGermany 2019 httpswwwresearchgatenetpublication325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles

[12] W Wang Y Li X Wang J Liu and X Zhang ldquoDetectingAndroid malicious apps and categorizing benign apps withensemble of classifiersrdquo Future Generation Computer Systemsvol 78 pp 987ndash994 2018

[13] A Aleroud and L Zhou ldquoPhishing environments techniquesand countermeasures a surveyrdquo Computers and Securityvol 68 pp 160ndash196 2017

[14] H Shahriar T Klintic and V Clincy ldquoMobile phishing at-tacks and mitigation techniquesrdquo Journal of InformationSecurity vol 6 no 3 pp 206ndash212 2015

[15] T M Mahmoud and A M Mahfouz ldquoSMS spam filteringtechnique based on artificial immune systemrdquo InternationalJournal of Computer Science Issues vol 9 no 1 pp 589ndash5972012

[16] J W Yoon H Kim and J H Huh ldquoHybrid spam filtering formobile communicationrdquo Computers and Security vol 29no 4 pp 446ndash459 2010

[17] C H Hsu P Wang and S Pu ldquoIdentify fixed-path phishingattack by STCrdquo in Proceedings of the 8th Annual Collabo-ration Electronic Messaging Anti-Abuse and Spam Confer-ence pp 172ndash175 Perth Australia September 2011

[18] E Medvet E Kirda and C Kruegel ldquoVisual-similarity-basedphishing detectionrdquo in Proceedings of the 4th InternationalConference on Security and Privacy in CommunicationNetworks Istanbul Turkey September 2008

[19] A P Felt and D Wagner Phishing on Mobile DevicesUniversity of California Berkeley CA USA 2011

[20] A Bianchi J Corbetta L Invernizzi Y FratantonioC Kruegel and G Vigna ldquoWhat the app is that Deceptionand countermeasures in the android user interfacerdquo in Pro-ceeding of the 2015 IEEE Symposium on Security and Privacypp 931ndash948 San Jose CA USA May 2015

[21] C Marforio R J Masti C Soriente K Kostiainen andS Capkun ldquoPersonalized security indicators to detect ap-plication phishing attacks in mobile platformsrdquo February2015 httparxivorgabs150206824

[22] D Liu E Cuervo V Pistol R Scudellari and L P CoxldquoScreenPass secure password entry on touchscreen devicesrdquoin Proceeding of the 11th Annual International Conference on

Journal of Computer Networks and Communications 13

Mobile Systems Applications and Services pp 291ndash304Taipei Taiwan June 2013

[23] D Liu and L P Cox ldquoVeriUI Attested Login for MobileDevicesrdquo in Proceedings of the 15th Workshop on MobileComputing Systems and Applications Santa Barbara CAUSA February 2014

[24] L Wu X Du and J Wu ldquoMobiFish A lightweight anti-phishing scheme for mobile phonesrdquo in Proceedings of the2014 23rd International Conference on Computer Commu-nication and Networks (ICCCN) pp 1ndash8 Shanghai ChinaAugust 2014

[25] V Mavroeidis and M Nicho ldquoQuick response code secure acryptographically secure anti-phishing tool for QR code at-tacksrdquo in Computer Network Security pp 313ndash324 2017

[26] ldquoPhishing detective-apps on Google Playrdquo March 2018httpsplaygooglecomstoreappsdetailsidcomrsoftrandroidphishingdetectiveads

[27] G Bottazzi E Casalicchio D Cingolani F Marturana andM Piu ldquoMP-Shield A framework for phishing detection inmobile devicesrdquo in Proceedings of the 2015 IEEE InternationalConference on Computer and Information Technology Ubiq-uitous Computing and Communications Dependable Auto-nomic and Secure Computing Pervasive Intelligence andComputing pp 1977ndash1983 Liverpool UK October 2015

[28] M M Richter and R O Weber Case-Based ReasoningSpringer Berlin Heidelberg Berlin Heidelberg 2013

[29] S Craw N Wiratunga and R C Rowe ldquoLearning adaptationknowledge to improve case-based reasoningrdquo Artificial In-telligence vol 170 no 16-17 pp 1175ndash1192 Nov 2006

[30] S Begum M U Ahmed P Funk N Xiong and M FolkeldquoCase-based reasoning systems in the health sciences a surveyof recent Trends and developmentsrdquo IEEE Transactions onSystems Man and Cybernetics Part C (Applications andReviews) vol 41 no 4 pp 421ndash434 Jul 2011

[31] S Arzt ldquoFlowDroid precise context flow field object-sensitive and lifecycle-aware taint analysis for androidappsrdquo in Proceedings of the 35th ACM SIGPLAN Conferenceon Programming Language Design and Implementationpp 259ndash269 New York NY USA June 2014

[32] L Li A Bartel T F Bissyande et al ldquoIccTA detecting inter-component privacy leaks in android appsrdquo in Proceedings ofthe 37th International Conference on Software Engineeringvol 1 pp 280ndash291 Piscataway NJ USA May 2015

[33] Obfuscation-resilient efficient and accurate detection andfamily identification of android malwaremdashsemanticscholarrdquo March 2018 httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23

[34] W Enck ldquoTaintDroid an information-flow tracking systemfor realtime privacy monitoring on smartphonesrdquo Trans-actions on Computer Systems vol 32 no 2 2014

[35] M I Gordon D Kim J Perkins L Gilham N Nguyen andM Rinard ldquoInformation-flow analysis of android applicationsin DroidSaferdquo in Proceedings of the Network and DistributedSystem Security Symposium San Diego CA USA February2015

[36] D Arp M Spreitzenbarth H Gascon and K Rieck ldquoDrebineffective and explainable detection of android malware inyour pocketrdquo in Proceedings of the 2014 Network and Dis-tributed System Security Symposium San Diego CA USAFebruary 2014

[37] N Peiravian and X Zhu ldquoMachine learning for androidmalware detection using permission and API callsrdquo in Pro-ceedings of the 2013 IEEE 25th International Conference on

Tools with Artificial Intelligence pp 300ndash305 Herndon VAUSA November 2013

[38] V Avdiienko K Kuznetsov A Gorla et al ldquoMining apps forabnormal usage of sensitive datardquo in Proceedings of the 37thInternational Conference on Software Engineering vol 1pp 426ndash436 Florence Italy May 2015

[39] H V Nath and B M Mehtre ldquoStatic malware analysis usingmachine learning methodsrdquo in Recent Trends in ComputerNetworks and Distributed Systems Security pp 440ndash450 2014

[40] N Aburarsquoed H Otrok R Mizouni and J Bentahar ldquoMobilephishing attack for Android platformrdquo in Proceedings of the2014 10th International Conference on Innovations in In-formation Technology (IIT) pp 18ndash23 Abu Dhabi UAENovember 2014

[41] VirusSharecom httpsvirussharecom[42] K Allix T F Bissyande J Klein and Y Le Traon ldquoAndrozoo

collecting millions of android apps for the research com-munityrdquo in Proceedings of the 13th International Conferenceon Mining Software Repositories pp 468ndash471 Austin TXUSA May 2016

[43] J Yu Q Huang and C Yian ldquoDroidScreening a practicalframework for real-world Android malware analysisrdquo Secu-rity and Communication Networks vol 9 no 11pp 1435ndash1449

[44] JoshuagaRevealdroidmdashBitbucket httpsbitbucketorgjoshuagarevealdroidsrcmaster

[45] S Kyaw Zaw and S Vasupongayya ldquoRevealing the importantfeatures of mobile phishingrdquo in Proceedings of the 13th In-ternational Conference on Knowledge Information and Cre-ativity Support Systems (KICSS 2018) pp 222ndash226 Pattayaailand November 2018

[46] M A Hall and L A Smith ldquoFeature subset selection acorrelation based filter approachrdquo Progress in Connectionist-based Information Systems vol 2 pp 855ndash858 1997


identified, including phishing and benign, with one category representing the overwhelming majority of the data points. In these cases, the positive class “phishing” is greatly outnumbered by the negative class. Problems of this type are a common situation in data science in which accuracy alone is not a good measure of model performance. Intuitively, labeling every data point as negative in the phishing detection problem is not helpful, even though it yields a high accuracy; instead, we should focus on identifying the positive cases.

In order to assess the effectiveness of our proposed model, three measures derived from the confusion matrix are applied: accuracy, precision, and sensitivity. While sensitivity expresses the ability of a model to find all relevant instances in the dataset, precision expresses the proportion of the instances predicted as positive by our model that are actually positive. The following formulas give their definitions:

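$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN},\\
\text{Precision} &= \frac{TP}{TP + FP},\\
\text{Sensitivity} &= \frac{TP}{TP + FN}.
\end{aligned}
\tag{7}
$$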
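The definitions in (7) can be checked with a few lines of code. The following is a minimal sketch, not the authors' evaluation code: the class name `ConfusionCounts`, the zero-division convention, and the two sets of counts are assumptions made purely for illustration. It shows why accuracy alone is misleading on imbalanced data: a model that labels every app as benign still reaches 90% accuracy on the hypothetical test set while detecting no phishing app at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # phishing apps correctly flagged as phishing
    fp: int  # benign apps wrongly flagged as phishing
    tn: int  # benign apps correctly passed as benign
    fn: int  # phishing apps missed (predicted benign)


def accuracy(c: ConfusionCounts) -> float:
    """(TP + TN) / (TP + FP + TN + FN), as in equation (7)."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    """TP / (TP + FP); returns 0.0 if nothing is predicted positive (assumed convention)."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def sensitivity(c: ConfusionCounts) -> float:
    """TP / (TP + FN); returns 0.0 if there are no actual positives (assumed convention)."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


# Hypothetical test set: 100 phishing apps among 1,000 apps in total.
all_negative = ConfusionCounts(tp=0, fp=0, tn=900, fn=100)  # predicts "benign" for everything
detector = ConfusionCounts(tp=80, fp=20, tn=880, fn=20)     # an illustrative real detector

for name, counts in (("all-negative", all_negative), ("detector", detector)):
    print(name, accuracy(counts), precision(counts), sensitivity(counts))
```

With these made-up counts, both models look similar in accuracy, and only precision and sensitivity reveal that the all-negative model is useless for phishing detection.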

Figure 6: Accuracy comparison of 9 classifiers on 5 cases before and after feature selection. (Vertical axis: accuracy (%), 80 to 100. Horizontal axis: Cases 01 to 05, each with no feature selection, correlation, and information gain. Series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.)

Table 12: Runtime comparison after correlation attribute evaluation feature selection (seconds).

Case ID   J48    DT      IBK      LR     NB     SVM     AVG     MAJ     MAX
01        3.94   18.36   0.01     1.97   0.64   3.38    27.45   26.92   27.18
02        3.84   25.35   0.0001   2.43   0.50   39.46   72.07   72.12   71.19
03        8.06   45.85   0.01     7.20   1.03   19.55   83.10   83.52   83.34
04        5.60   44.75   0.0001   5.15   0.56   6.27    61.95   61.99   62.00
05        8.84   69.88   0.0001   7.65   1.00   3.20    90.23   90.06   90.45

Table 13: Runtime comparison after information gain attribute evaluation feature selection (seconds).

Case ID   J48    DT       IBK     LR     NB     SVM     AVG      MAJ      MAX
01        4.05   20.25    0.01    1.63   0.54   3.02    29.02    27.49    27.27
02        3.86   29.43    0.001   2.35   0.56   31.96   67.63    66.52    66.40
03        9.77   55.46    0.01    6.31   0.95   17.06   87.31    90.63    90.69
04        6.83   87.36    0.01    2.25   0.93   6.76    102.86   93.21    93.15
05        8.42   104.52   0.001   5.37   1.21   3.95    111.80   107.53   108.07
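The two feature selection criteria named in Tables 12 and 13 can be illustrated on a toy example. The sketch below is a from-scratch illustration rather than the tooling used in the experiments: the feature names (`sends_sms`, `reads_contacts`, `uses_internet`), the eight-sample dataset, and the helper `top_k` are all hypothetical, and the features are assumed to be binary. Each feature is scored against the class label by information gain and by the absolute Pearson correlation, and the highest-scoring features are kept.

```python
import math
from typing import Callable, Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for value in set(labels):
        p = sum(1 for y in labels if y == value) / n
        h -= p * math.log2(p)
    return h


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Entropy of the labels minus the entropy remaining after splitting on the feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def pearson(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson correlation between a feature and the label; 0.0 for a constant column."""
    n = len(labels)
    mean_x = sum(feature) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, labels))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k(features: Dict[str, Sequence[int]], labels: Sequence[int],
          score: Callable[[Sequence[int], Sequence[int]], float], k: int) -> List[str]:
    """Rank features by the absolute value of the score and keep the best k."""
    ranked = sorted(features, key=lambda name: abs(score(features[name], labels)), reverse=True)
    return ranked[:k]


# Hypothetical data: 1 = phishing, 0 = benign; one binary value per app and feature.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "sends_sms":      [1, 1, 1, 0, 0, 0, 0, 0],  # perfectly aligned with the label
    "reads_contacts": [1, 1, 0, 1, 0, 0, 0, 0],  # weakly informative
    "uses_internet":  [1, 1, 1, 1, 1, 1, 1, 1],  # constant, carries no information
}

print(top_k(features, labels, information_gain, k=2))
print(top_k(features, labels, pearson, k=2))
```

Dropping the low-scoring features shrinks the input to every classifier, which is the mechanism behind the runtime reductions the tables report.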


True positive (TP) is the number of correct positive predictions, false positive (FP) is the number of incorrect positive predictions, true negative (TN) is the number of correct negative predictions, and false negative (FN) is the number of incorrect negative predictions. These four outcomes form the confusion matrix, as shown in Figure 8.

The effectiveness of our proposed model in terms of accuracy, precision, and sensitivity is reported in Table 15. According to these results, our adaptive model achieves a detection accuracy between 90.77% and 96.45% across the five cases. Meanwhile, the selected classifiers reach a precision of 83% to 92% and a sensitivity of 62% to 86%, which we consider acceptable. Together with the previous experiments, these results indicate that our adaptive phishing detection model using case-based reasoning can perform well on diversely distributed features.

5. Conclusions

An adaptive mobile phishing detection model based on a variation of input feature patterns using a case-based reasoning (CBR) technique is proposed in this work. An experimental analysis is conducted to demonstrate the design decisions of our model and to verify the performance of our proposed model in handling the concept drift of mobile phishing attacks. The proposed model is evaluated with a large feature set that contains 1,065 features from 10 feature

Figure 7: Runtime comparison of 9 classifiers on 5 cases before and after feature selection. (Vertical axis: runtime (seconds), 0 to 200. Horizontal axis: Cases 01 to 05, each with no feature selection, correlation, and information gain. Series: J48, DT, IBK, LR, NB, SVM, AVG, MAJ, MAX.)

Table 14: Accuracy and efficiency of proposed adaptive model (accuracy in %, runtime in seconds).

Case ID   Adaptive (before)   Adaptive (after)   Accuracy (before)   Accuracy (after)   Runtime (before)   Runtime (after)
01        J48                 J48                95.93               95.96              4.43               4.05
02        J48                 J48                94.72               94.66              4.54               3.86
03        AVG                 AVG, MAJ           96.43               96.45              95.18              87.31 & 90.63
04        AVG, MAJ            AVG                90.64               90.77              174.4 & 174.6      102.86
05        MAJ                 AVG                95.69               95.80              205.50             111.80

                   Predicted negative   Predicted positive
Actual negative    TN                   FP
Actual positive    FN                   TP

Figure 8: Confusion matrix.

Table 15: Detection results achieved by the proposed model.

Case   Classifier   Accuracy (%)   Precision (%)   Sensitivity (%)
01     J48          95.96          83              79
02     J48          94.66          87              86
03     AVG          96.45          92              75
04     AVG          90.77          84              62
05     AVG          95.80          90              74


groups, which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories so that the main families of classifiers are covered across the diverse feature sets. Three ensembles of the six base classifiers are used, each with a different method of combining the base outputs into a final answer: average, majority voting, and maximum. In total, there are 9 classifiers. Because of the large number of features in the dataset and the use of multiple classifiers, efficiency degrades. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and the Pearson's correlation coefficient attribute evaluation method. By selecting the most suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model could provide an accuracy improvement with an acceptable runtime increment.
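The three ensemble combination methods mentioned above (average, majority voting, and maximum) can be sketched as follows. This is an illustrative reading of the three rules, not the authors' implementation: it assumes that each base classifier outputs a phishing probability, that 0.5 is the decision threshold, and that ties go to the benign class. The function names and the probability values are hypothetical.

```python
from typing import List

PHISHING, BENIGN = 1, 0


def average_rule(probs: List[float]) -> int:
    """Average the phishing probabilities of all base classifiers, then threshold at 0.5."""
    return PHISHING if sum(probs) / len(probs) >= 0.5 else BENIGN


def majority_rule(probs: List[float]) -> int:
    """Each base classifier casts one vote; a strict majority is needed for 'phishing'."""
    votes = sum(1 for p in probs if p >= 0.5)
    return PHISHING if votes * 2 > len(probs) else BENIGN


def maximum_rule(probs: List[float]) -> int:
    """For each class take the highest probability any classifier assigns to it; pick the larger."""
    max_phishing = max(probs)
    max_benign = max(1.0 - p for p in probs)
    return PHISHING if max_phishing > max_benign else BENIGN


# Hypothetical phishing probabilities from six base classifiers for two apps.
confident_outlier = [0.9, 0.4, 0.45, 0.3, 0.45, 0.4]  # one classifier is very sure, five are not
broad_agreement = [0.6, 0.7, 0.8, 0.55, 0.3, 0.65]    # five of six lean towards phishing

for probs in (confident_outlier, broad_agreement):
    print(average_rule(probs), majority_rule(probs), maximum_rule(probs))
```

For the first app the average and majority rules answer benign while the maximum rule answers phishing, which illustrates why the three ensembles can differ in accuracy on the same case.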

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand's Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.



True positive (TP) is the amount of correct positiveprediction false positive (FP) is the incorrect positiveprediction true negative (TN) is the amount of correctnegative prediction and false negative (FN) is the amount ofincorrect negative prediction ese four outcomes form theconfusion matrix as shown in Figure 8

e evaluation of effectiveness on our proposed modelby means of accuracy precision and sensitivity is describedin Table 15 According to the results shown in Table 15 ouradaptive model achieves a good detection accuracy for thephishing features Meanwhile the performance of all theclassifiers gets an acceptable precision and sensitivity ratioAccording to the previous experiments our adaptivephishing detection model using case-based reasoning canperform well on the diversely distributed features

5 Conclusions

An adaptive mobile phishing detection model based on avariation of input feature patterns using a case-based rea-soning (CBR) technique is proposed in this work An ex-perimental analysis is conducted to demonstrate the design

decision of our model and to verify the performance of ourproposed model in handling the concept drift of mobilephishing attacks e proposed model is evaluated with alarge feature set that contains 1065 features from 10 feature

Case

01

no

f- se

lect

ion

Case

01

corr

elat

ion

Case

01

info

gai

n

Case

02

no

f- se

lect

ion

Case

02

corr

elat

ion

Case

02

info

gai

n

Case

03

no

f- se

lect

ion

Case

03

corr

elat

ion

Case

03

info

gai

n

Case

04

no

f- se

lect

ion

Case

04

corr

elat

ion

Case

04

info

gai

n

Case

05

no

f- se

lect

ion

Case

05

corr

elat

ion

Case

05

info

gai

n

J48DTIBK

LRNBSVM

AVGMAJMAX

0

50

100

150

200

Runt

ime (

seco

nds)

Figure 7 Runtime comparison of 9 classifiers on 5 cases before and after feature selection

Table 14 Accuracy and efficiency of proposed adaptive model

Case ID Adaptive (before) Adaptive (after) Accuracy (before) Accuracy (after) Runtime (before) Runtime (after)01 J48 J48 9593 9596 443 40502 J48 J48 9472 9466 454 38603 AVG AVG MAJ 9643 9645 9518 8731 amp 906304 AVG MAJ AVG 9064 9077 1744 amp 1746 1028605 MAJ AVG 9569 9580 20550 11180

Negative

Predicted

NegativeActu

al

Positive

Positive

FP

TP

TN

FN

Figure 8 Confusion matrix

Table 15 Detection results achieved by the proposed model

Case Classifier Accuracy () Precision () Sensitivity ()01 J48 9596 83 7902 J48 9466 87 8603 AVG 9645 92 7504 AVG 9077 84 6205 AVG 9580 90 74

12 Journal of Computer Networks and Communications

groups which are frequently collected from Android appsMoreover 5 cases of randomly combined patterns of fea-tures are created in order to provide a diversity of unknownpatterns to mimic new real-world mobile apps Six classi-fication algorithms are chosen from different categories forthe coverage usage of all classification nature on the di-version of feature sets ree ensembles of six base classifiersare used each of which uses different final answer-findingmethods including average majority voting and maximumIn total there are 9 classifiers Due to the involvement ofefficient features in the dataset and the uses of multipleclassifiers the efficiency degradation happened To over-come this hurdle 2 feature selection techniques are appliedon the dataset in order to reduce the size of the featureswhich is the size of the input to the classifiers e twofeature selection techniques used are information gain at-tribute evaluation method and Pearsonrsquos correlation co-efficient attribute evaluation method By addressing theoptimal selection of the suitable classifier to the incomingfeatures using a case-based reasoning approach the pro-posed mobile phishing detection model could provide anaccuracy improvement with an acceptable runtimeincrement

Data Availability

e dataset of the features used in this research is availablefrom the authors upon request

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is research was supported by the Higher Education Re-search Promotion and the ailandrsquos Education Hub forSouthern Region of ASEAN Countries Project Office of theHigher Education Commission

References

[1] W Paul H A Manolian and S Lapper ldquoinking digital inindustry 40rdquo Deloitte Insights September 2018 httpswww2deloittecominsightsusenfocusindustry-4-0digital-leaders-in-manufacturing-fourth-industrial-revolutionhtml

[2] ldquoSpam and phishing in Q2 2018rdquo Securelist-Kaspersky LabrsquosCyberthreat Research and Reports 2018

[3] Proofpoint Security Awareness ldquo2019 state of the phish re-portrdquo March 2019 httpswwwwombatsecuritycomstate-of-the-phish

[4] L Wu X Du and J Wu ldquoEffective defense schemes forphishing attacks on mobile computing platformsrdquo IEEETransactions on Vehicular Technology vol 65 no 8pp 6678ndash6691 2016

[5] M Moghimi and A Y Varjani ldquoNew rule-based phishingdetection methodrdquo Expert Systems with Applications vol 53pp 231ndash242 Jul 2016

[6] Baunfirecom and SparkCMS ldquoAPWG phishing attack trendsreport-4Q 2018rdquo Anti-PhishingWorking GroupMarch 2019httpswwwantiphishingorgresourcesapwg-reports

[7] R Basnet S Mukkamala and A H Sung ldquoDetection ofphishing attacks a machine learning approachrdquo in SoftComputing Applications in Industry B Prasad Ed pp 373ndash383 Springer Berlin Heidelberg Berlin Heidelberg 2008

[8] A K Jain and B B Gupta ldquoComparative analysis of featuresbased machine learning approaches for phishing detectionrdquoin Proceedings of the 2016 3rd International Conference onComputing for Sustainable Global Development (INDIACom)pp 2125ndash2130 New Delhi India March 2016

[9] F Toolan and J Carthy ldquoPhishing detection using classifierensemblesrdquo in Proceedings of the 2009 eCrime ResearchersSummit pp 1ndash9 Tacoma WA USA October 2009

[10] H S Hota A K Shrivas and R Hota ldquoAn ensemble modelfor detecting phishing attack with proposed remove-replacefeature selection techniquerdquo Procedia Computer Sciencevol 132 pp 900ndash907 2018

[11] A Comparative Study of Phishing Websites ClassificationBased on Classifier Ensembles ResearchGate BerlinGermany 2019 httpswwwresearchgatenetpublication325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles

[12] W Wang Y Li X Wang J Liu and X Zhang ldquoDetectingAndroid malicious apps and categorizing benign apps withensemble of classifiersrdquo Future Generation Computer Systemsvol 78 pp 987ndash994 2018

[13] A Aleroud and L Zhou ldquoPhishing environments techniquesand countermeasures a surveyrdquo Computers and Securityvol 68 pp 160ndash196 2017

[14] H Shahriar T Klintic and V Clincy ldquoMobile phishing at-tacks and mitigation techniquesrdquo Journal of InformationSecurity vol 6 no 3 pp 206ndash212 2015

[15] T M Mahmoud and A M Mahfouz ldquoSMS spam filteringtechnique based on artificial immune systemrdquo InternationalJournal of Computer Science Issues vol 9 no 1 pp 589ndash5972012

[16] J W Yoon H Kim and J H Huh ldquoHybrid spam filtering formobile communicationrdquo Computers and Security vol 29no 4 pp 446ndash459 2010

[17] C H Hsu P Wang and S Pu ldquoIdentify fixed-path phishingattack by STCrdquo in Proceedings of the 8th Annual Collabo-ration Electronic Messaging Anti-Abuse and Spam Confer-ence pp 172ndash175 Perth Australia September 2011

[18] E Medvet E Kirda and C Kruegel ldquoVisual-similarity-basedphishing detectionrdquo in Proceedings of the 4th InternationalConference on Security and Privacy in CommunicationNetworks Istanbul Turkey September 2008

[19] A P Felt and D Wagner Phishing on Mobile DevicesUniversity of California Berkeley CA USA 2011

[20] A Bianchi J Corbetta L Invernizzi Y FratantonioC Kruegel and G Vigna ldquoWhat the app is that Deceptionand countermeasures in the android user interfacerdquo in Pro-ceeding of the 2015 IEEE Symposium on Security and Privacypp 931ndash948 San Jose CA USA May 2015

[21] C Marforio R J Masti C Soriente K Kostiainen andS Capkun ldquoPersonalized security indicators to detect ap-plication phishing attacks in mobile platformsrdquo February2015 httparxivorgabs150206824

[22] D Liu E Cuervo V Pistol R Scudellari and L P CoxldquoScreenPass secure password entry on touchscreen devicesrdquoin Proceeding of the 11th Annual International Conference on

Journal of Computer Networks and Communications 13

Mobile Systems Applications and Services pp 291ndash304Taipei Taiwan June 2013

[23] D Liu and L P Cox ldquoVeriUI Attested Login for MobileDevicesrdquo in Proceedings of the 15th Workshop on MobileComputing Systems and Applications Santa Barbara CAUSA February 2014

[24] L Wu X Du and J Wu ldquoMobiFish A lightweight anti-phishing scheme for mobile phonesrdquo in Proceedings of the2014 23rd International Conference on Computer Commu-nication and Networks (ICCCN) pp 1ndash8 Shanghai ChinaAugust 2014

[25] V Mavroeidis and M Nicho ldquoQuick response code secure acryptographically secure anti-phishing tool for QR code at-tacksrdquo in Computer Network Security pp 313ndash324 2017

[26] ldquoPhishing detective-apps on Google Playrdquo March 2018httpsplaygooglecomstoreappsdetailsidcomrsoftrandroidphishingdetectiveads

[27] G Bottazzi E Casalicchio D Cingolani F Marturana andM Piu ldquoMP-Shield A framework for phishing detection inmobile devicesrdquo in Proceedings of the 2015 IEEE InternationalConference on Computer and Information Technology Ubiq-uitous Computing and Communications Dependable Auto-nomic and Secure Computing Pervasive Intelligence andComputing pp 1977ndash1983 Liverpool UK October 2015

[28] M M Richter and R O Weber Case-Based ReasoningSpringer Berlin Heidelberg Berlin Heidelberg 2013

[29] S Craw N Wiratunga and R C Rowe ldquoLearning adaptationknowledge to improve case-based reasoningrdquo Artificial In-telligence vol 170 no 16-17 pp 1175ndash1192 Nov 2006

[30] S Begum M U Ahmed P Funk N Xiong and M FolkeldquoCase-based reasoning systems in the health sciences a surveyof recent Trends and developmentsrdquo IEEE Transactions onSystems Man and Cybernetics Part C (Applications andReviews) vol 41 no 4 pp 421ndash434 Jul 2011

[31] S Arzt ldquoFlowDroid precise context flow field object-sensitive and lifecycle-aware taint analysis for androidappsrdquo in Proceedings of the 35th ACM SIGPLAN Conferenceon Programming Language Design and Implementationpp 259ndash269 New York NY USA June 2014

[32] L Li A Bartel T F Bissyande et al ldquoIccTA detecting inter-component privacy leaks in android appsrdquo in Proceedings ofthe 37th International Conference on Software Engineeringvol 1 pp 280ndash291 Piscataway NJ USA May 2015

[33] Obfuscation-resilient efficient and accurate detection andfamily identification of android malwaremdashsemanticscholarrdquo March 2018 httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23

[34] W Enck ldquoTaintDroid an information-flow tracking systemfor realtime privacy monitoring on smartphonesrdquo Trans-actions on Computer Systems vol 32 no 2 2014

[35] M I Gordon D Kim J Perkins L Gilham N Nguyen andM Rinard ldquoInformation-flow analysis of android applicationsin DroidSaferdquo in Proceedings of the Network and DistributedSystem Security Symposium San Diego CA USA February2015

[36] D Arp M Spreitzenbarth H Gascon and K Rieck ldquoDrebineffective and explainable detection of android malware inyour pocketrdquo in Proceedings of the 2014 Network and Dis-tributed System Security Symposium San Diego CA USAFebruary 2014

[37] N Peiravian and X Zhu ldquoMachine learning for androidmalware detection using permission and API callsrdquo in Pro-ceedings of the 2013 IEEE 25th International Conference on

Tools with Artificial Intelligence pp 300ndash305 Herndon VAUSA November 2013

[38] V Avdiienko K Kuznetsov A Gorla et al ldquoMining apps forabnormal usage of sensitive datardquo in Proceedings of the 37thInternational Conference on Software Engineering vol 1pp 426ndash436 Florence Italy May 2015

[39] H V Nath and B M Mehtre ldquoStatic malware analysis usingmachine learning methodsrdquo in Recent Trends in ComputerNetworks and Distributed Systems Security pp 440ndash450 2014

[40] N Aburarsquoed H Otrok R Mizouni and J Bentahar ldquoMobilephishing attack for Android platformrdquo in Proceedings of the2014 10th International Conference on Innovations in In-formation Technology (IIT) pp 18ndash23 Abu Dhabi UAENovember 2014

[41] VirusSharecom httpsvirussharecom[42] K Allix T F Bissyande J Klein and Y Le Traon ldquoAndrozoo

collecting millions of android apps for the research com-munityrdquo in Proceedings of the 13th International Conferenceon Mining Software Repositories pp 468ndash471 Austin TXUSA May 2016

[43] J Yu Q Huang and C Yian ldquoDroidScreening a practicalframework for real-world Android malware analysisrdquo Secu-rity and Communication Networks vol 9 no 11pp 1435ndash1449

[44] JoshuagaRevealdroidmdashBitbucket httpsbitbucketorgjoshuagarevealdroidsrcmaster

[45] S Kyaw Zaw and S Vasupongayya ldquoRevealing the importantfeatures of mobile phishingrdquo in Proceedings of the 13th In-ternational Conference on Knowledge Information and Cre-ativity Support Systems (KICSS 2018) pp 222ndash226 Pattayaailand November 2018

[46] M A Hall and L A Smith ldquoFeature subset selection acorrelation based filter approachrdquo Progress in Connectionist-based Information Systems vol 2 pp 855ndash858 1997


groups which are frequently collected from Android apps. Moreover, 5 cases of randomly combined feature patterns are created in order to provide a diversity of unknown patterns that mimic new real-world mobile apps. Six classification algorithms are chosen from different categories so that the different natures of classification are covered across the diverse feature sets. Three ensembles of the six base classifiers are also used, each of which applies a different method for finding the final answer: average, majority voting, and maximum. In total, there are 9 classifiers. Because of the many features involved in the dataset and the use of multiple classifiers, efficiency degraded. To overcome this hurdle, 2 feature selection techniques are applied to the dataset in order to reduce the number of features, which is the size of the input to the classifiers. The two feature selection techniques used are the information gain attribute evaluation method and Pearson’s correlation coefficient attribute evaluation method. By selecting the most suitable classifier for the incoming features using a case-based reasoning approach, the proposed mobile phishing detection model could provide an accuracy improvement with an acceptable runtime increment.
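The three ensemble combination rules named above (average, majority voting, and maximum) can be sketched as follows. This is a minimal illustration in Python and not the implementation used in the experiments: the function names, the 0.5 decision threshold, the tie-breaking choice, and the example scores are all assumptions made for the sketch. Each base classifier is assumed to output a probability that an app is phishing.

```python
from statistics import mean

# Hypothetical threshold: a score at or above 0.5 counts as a "phishing" vote.
THRESHOLD = 0.5


def average_rule(probs):
    """Average the phishing probabilities of all base classifiers, then threshold."""
    return int(mean(probs) >= THRESHOLD)


def majority_vote_rule(probs):
    """Each base classifier casts one vote; a strict majority is required.

    With six base classifiers a 3-3 tie is possible. Here a tie is resolved
    as benign (0); this tie-break is an assumption of the sketch.
    """
    votes = sum(1 for p in probs if p >= THRESHOLD)
    return int(votes > len(probs) / 2)


def maximum_rule(probs):
    """Pick the class backed by the single most confident base classifier.

    For two classes this compares the highest phishing probability with the
    highest benign probability (1 - p) over all base classifiers.
    """
    best_phishing = max(probs)
    best_benign = max(1 - p for p in probs)
    return int(best_phishing >= best_benign)


if __name__ == "__main__":
    # Made-up scores from six base classifiers for one app (1 = phishing).
    scores = [0.9, 0.4, 0.45, 0.3, 0.48, 0.35]
    print("average :", average_rule(scores))        # 0, mean is 0.48
    print("majority:", majority_vote_rule(scores))  # 0, only one vote of six
    print("maximum :", maximum_rule(scores))        # 1, 0.9 beats 1 - 0.3
```

The example scores are chosen so that the rules disagree: one very confident base classifier decides the maximum rule, while the average and majority rules follow the other five. This kind of disagreement is one reason why a single combination rule is not expected to be best on every feature pattern.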

Data Availability

The dataset of the features used in this research is available from the authors upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Higher Education Research Promotion and the Thailand’s Education Hub for Southern Region of ASEAN Countries Project Office of the Higher Education Commission.

References

[1] W. Paul, H. A. Manolian, and S. Lapper, “Thinking digital in industry 4.0,” Deloitte Insights, September 2018, https://www2.deloitte.com/insights/us/en/focus/industry-4-0/digital-leaders-in-manufacturing-fourth-industrial-revolution.html.

[2] “Spam and phishing in Q2 2018,” Securelist-Kaspersky Lab’s Cyberthreat Research and Reports, 2018.

[3] Proofpoint Security Awareness, “2019 state of the phish report,” March 2019, https://www.wombatsecurity.com/state-of-the-phish.

[4] L. Wu, X. Du, and J. Wu, “Effective defense schemes for phishing attacks on mobile computing platforms,” IEEE Transactions on Vehicular Technology, vol. 65, no. 8, pp. 6678–6691, 2016.

[5] M. Moghimi and A. Y. Varjani, “New rule-based phishing detection method,” Expert Systems with Applications, vol. 53, pp. 231–242, Jul 2016.

[6] Baunfire.com and SparkCMS, “APWG phishing attack trends report-4Q 2018,” Anti-Phishing Working Group, March 2019, https://www.antiphishing.org/resources/apwg-reports.

[7] R. Basnet, S. Mukkamala, and A. H. Sung, “Detection of phishing attacks: a machine learning approach,” in Soft Computing Applications in Industry, B. Prasad, Ed., pp. 373–383, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[8] A. K. Jain and B. B. Gupta, “Comparative analysis of features based machine learning approaches for phishing detection,” in Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 2125–2130, New Delhi, India, March 2016.

[9] F. Toolan and J. Carthy, “Phishing detection using classifier ensembles,” in Proceedings of the 2009 eCrime Researchers Summit, pp. 1–9, Tacoma, WA, USA, October 2009.

[10] H. S. Hota, A. K. Shrivas, and R. Hota, “An ensemble model for detecting phishing attack with proposed remove-replace feature selection technique,” Procedia Computer Science, vol. 132, pp. 900–907, 2018.

[11] A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles, ResearchGate, Berlin, Germany, 2019, https://www.researchgate.net/publication/325483941_A_Comparative_Study_of_Phishing_Websites_Classification_Based_on_Classifier_Ensembles.

[12] W. Wang, Y. Li, X. Wang, J. Liu, and X. Zhang, “Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers,” Future Generation Computer Systems, vol. 78, pp. 987–994, 2018.

[13] A. Aleroud and L. Zhou, “Phishing environments, techniques, and countermeasures: a survey,” Computers and Security, vol. 68, pp. 160–196, 2017.

[14] H. Shahriar, T. Klintic, and V. Clincy, “Mobile phishing attacks and mitigation techniques,” Journal of Information Security, vol. 6, no. 3, pp. 206–212, 2015.

[15] T. M. Mahmoud and A. M. Mahfouz, “SMS spam filtering technique based on artificial immune system,” International Journal of Computer Science Issues, vol. 9, no. 1, pp. 589–597, 2012.

[16] J. W. Yoon, H. Kim, and J. H. Huh, “Hybrid spam filtering for mobile communication,” Computers and Security, vol. 29, no. 4, pp. 446–459, 2010.

[17] C. H. Hsu, P. Wang, and S. Pu, “Identify fixed-path phishing attack by STC,” in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 172–175, Perth, Australia, September 2011.

[18] E. Medvet, E. Kirda, and C. Kruegel, “Visual-similarity-based phishing detection,” in Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, Istanbul, Turkey, September 2008.

[19] A. P. Felt and D. Wagner, Phishing on Mobile Devices, University of California, Berkeley, CA, USA, 2011.

[20] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna, “What the app is that? Deception and countermeasures in the android user interface,” in Proceedings of the 2015 IEEE Symposium on Security and Privacy, pp. 931–948, San Jose, CA, USA, May 2015.

[21] C. Marforio, R. J. Masti, C. Soriente, K. Kostiainen, and S. Capkun, “Personalized security indicators to detect application phishing attacks in mobile platforms,” February 2015, http://arxiv.org/abs/1502.06824.

[22] D. Liu, E. Cuervo, V. Pistol, R. Scudellari, and L. P. Cox, “ScreenPass: secure password entry on touchscreen devices,” in Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 291–304, Taipei, Taiwan, June 2013.

[23] D. Liu and L. P. Cox, “VeriUI: attested login for mobile devices,” in Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, Santa Barbara, CA, USA, February 2014.

[24] L. Wu, X. Du, and J. Wu, “MobiFish: a lightweight anti-phishing scheme for mobile phones,” in Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–8, Shanghai, China, August 2014.

[25] V. Mavroeidis and M. Nicho, “Quick response code secure: a cryptographically secure anti-phishing tool for QR code attacks,” in Computer Network Security, pp. 313–324, 2017.

[26] “Phishing detective-apps on Google Play,” March 2018, httpsplaygooglecomstoreappsdetailsidcomrsoftrandroidphishingdetectiveads

[27] G. Bottazzi, E. Casalicchio, D. Cingolani, F. Marturana, and M. Piu, “MP-Shield: a framework for phishing detection in mobile devices,” in Proceedings of the 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1977–1983, Liverpool, UK, October 2015.

[28] M. M. Richter and R. O. Weber, Case-Based Reasoning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[29] S. Craw, N. Wiratunga, and R. C. Rowe, “Learning adaptation knowledge to improve case-based reasoning,” Artificial Intelligence, vol. 170, no. 16-17, pp. 1175–1192, Nov 2006.

[30] S. Begum, M. U. Ahmed, P. Funk, N. Xiong, and M. Folke, “Case-based reasoning systems in the health sciences: a survey of recent trends and developments,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 41, no. 4, pp. 421–434, Jul 2011.

[31] S. Arzt, “FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,” in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 259–269, New York, NY, USA, June 2014.

[32] L. Li, A. Bartel, T. F. Bissyande et al., “IccTA: detecting inter-component privacy leaks in android apps,” in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 280–291, Piscataway, NJ, USA, May 2015.

[33] “Obfuscation-resilient, efficient, and accurate detection and family identification of android malware - Semantic Scholar,” March 2018, httpspaperObfuscation-Resilient2C-Efficient2C-and-Accurate-and-Garcia-Hammad959093db69abc3b0fb4f7acc696a7f6ef39d0e23

[34] W. Enck, “TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones,” Transactions on Computer Systems, vol. 32, no. 2, 2014.

[35] M. I. Gordon, D. Kim, J. Perkins, L. Gilham, N. Nguyen, and M. Rinard, “Information-flow analysis of android applications in DroidSafe,” in Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, February 2015.

[36] D. Arp, M. Spreitzenbarth, H. Gascon, and K. Rieck, “Drebin: effective and explainable detection of android malware in your pocket,” in Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, February 2014.

[37] N. Peiravian and X. Zhu, “Machine learning for android malware detection using permission and API calls,” in Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 300–305, Herndon, VA, USA, November 2013.

[38] V. Avdiienko, K. Kuznetsov, A. Gorla et al., “Mining apps for abnormal usage of sensitive data,” in Proceedings of the 37th International Conference on Software Engineering, vol. 1, pp. 426–436, Florence, Italy, May 2015.

[39] H. V. Nath and B. M. Mehtre, “Static malware analysis using machine learning methods,” in Recent Trends in Computer Networks and Distributed Systems Security, pp. 440–450, 2014.

[40] N. Abura’ed, H. Otrok, R. Mizouni, and J. Bentahar, “Mobile phishing attack for Android platform,” in Proceedings of the 2014 10th International Conference on Innovations in Information Technology (IIT), pp. 18–23, Abu Dhabi, UAE, November 2014.

[41] VirusShare.com, https://virusshare.com.

[42] K. Allix, T. F. Bissyande, J. Klein, and Y. Le Traon, “Androzoo: collecting millions of android apps for the research community,” in Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471, Austin, TX, USA, May 2016.

[43] J. Yu, Q. Huang, and C. Yian, “DroidScreening: a practical framework for real-world Android malware analysis,” Security and Communication Networks, vol. 9, no. 11, pp. 1435–1449.

[44] Joshuaga/Revealdroid - Bitbucket, https://bitbucket.org/joshuaga/revealdroid/src/master.

[45] S. Kyaw Zaw and S. Vasupongayya, “Revealing the important features of mobile phishing,” in Proceedings of the 13th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2018), pp. 222–226, Pattaya, Thailand, November 2018.

[46] M. A. Hall and L. A. Smith, “Feature subset selection: a correlation based filter approach,” Progress in Connectionist-based Information Systems, vol. 2, pp. 855–858, 1997.

14 Journal of Computer Networks and Communications

