A Graph Based Framework for Malicious Insider Threat Detection
Anagi Gamachchi, Li Sun and Serdar Boztas
School of Science, RMIT University, Melbourne, Victoria, Australia
(anagi.gamachchi, li.sun, serdar.boztas)@rmit.edu.au
Abstract
While most security projects have focused on fending off attacks coming from outside the organizational boundaries, a real threat has arisen from the people who are inside those perimeter protections. Insider threats have shown their power by hugely affecting national security, financial stability, and the privacy of many thousands of people. What is in the news is the tip of the iceberg, with much more going on under the radar, and some threats never being detected. We propose a hybrid framework based on graphical analysis and anomaly detection approaches to combat this severe cyber security threat. Our framework analyzes heterogeneous data in isolating possible malicious users hiding behind others. Empirical results reveal this framework to be effective in distinguishing the majority of users who demonstrate typical behavior from the minority of users who show suspicious behavior.
1. Introduction
The battle between malicious but trusted insiders and organizations in safeguarding information assets is the biggest and fastest growing cyber security threat in this digital age. Insider threat, or the threat from a malicious insider, who is defined as "a current or former employee, contractor or business partner who has, or had, authorized access to an organization's network, system or data and intentionally exceeded or misused that access in a manner that negatively affected the confidentiality, integrity or availability of the organization's information or information systems" [7], has been identified as a primary concern within the cybersecurity community. The financial loss and reputation damage caused by this "known unknown" cybersecurity threat far outweigh those caused by external attacks. Thus, the majority of private and governmental organizations have identified the severity of the threat posed by insiders and focused on security control improvements. However, the seriousness of this problem is still rising at an alarming rate, threatening most critical infrastructure segments.
One of the most recent articles from CSO magazine [1] compared the cost between external and internal attacks and noted that while it takes about 50 days to fix a data breach caused by an internal attack, it only takes 2 to 5 days in the case of external attacks. Moreover, "attacks by malicious insiders are also the costliest to fix ($145,000), followed by denial of service ($127,000) and Web-based attacks ($96,000)", indicating the severity of this problem.
The unpredictable nature of human behavior makes this complex issue much more complicated than expected. This is aggravated by the mobility and hyper-connectivity of people. Insider threat research and surveys suggest this problem cannot be considered only as a data driven problem; it needs to be considered as a data and behavior driven problem [5]. A close examination of user behavior can spot trends, and such information can be used in tightening radars on suspicious users. Different parameters govern users' day to day actions, and behavioral changes exposed in workplace environments will extend the possibilities of isolating suspicious users from the rest of the employees. Organizations can suffer after effects such as unmotivated employees and inefficient work behavior if an innocent user is classified as suspicious. Thus, the decision to name a person as suspicious should be a smooth but careful process. Obviously, insider threat detection will focus on isolating suspicious users from the others, but it may not be practical to point to an employee as a malicious attacker. Also, the effectiveness of the process depends entirely on the ability to analyze as many parameters as possible. Consideration of the above facts led us to think of an insider threat detection framework as described in the rest of the paper.
Our Contribution: In this paper, we propose a framework for isolation of malicious users based on graphical and anomaly detection techniques. The proposed architecture is given in Figure 1, and has two major components, the "Graphical Processing Unit" (GPU) and the "Anomaly Detection Unit" (ADU). Data from multidimensional sources of an enterprise network is formatted and fed into the GPU, which generates a graph representing the interrelationships between informational assets of the network. These input streams can be from different, i.e., heterogeneous, informational sources with different data formats.
These data streams can be from event logs (logon/logoff), email logs, HTTP records, social network access data and
Proceedings of the 50th Hawaii International Conference on System Sciences | 2017
URI: http://hdl.handle.net/10125/41475  ISBN: 978-0-9981331-0-2  CC-BY-NC-ND
[Figure 1 shows four input sources (Logon/Logoff Records, Removable Media Usage Records, Psychometric Data, HTTP Access Records) feeding the Graphical Processing Unit (Generate Original Graph, Induced Sub Graphs, Graph/Sub graph Attribute Extraction); the extracted graph/subgraph attributes feed the Anomaly Detection Unit (Isolation Forest Algorithm), which outputs Anomaly Scores for Individual Users.]
Figure 1. Framework
various HR records such as psychometric data. Once the informational assets are mapped into a network, several graph parameters are calculated for each user. Since the final goal is to isolate the most anomalous users from the rest, all the attributes are computed for individual users. The next task of the GPU is to generate induced subgraphs of each user for different levels of neighborhoods. Several relevant subgraph properties (vertex count, edge count, density, diameter and number of peers) are calculated for each level of subgraphs. The calculated graph and subgraph parameters are fed into the ADU. In parallel to the above process, time-varying data is also fed into the ADU. The isolation forest algorithm is executed for isolating anomalous users within the ADU. Anomaly scores for each user are generated as the output of the ADU. These values are used in identifying and separating possible malicious users from the rest of the workforce.
The remainder of this paper is organized as follows. Section 2 describes the work most related to insider threat detection research. Section 3 describes the dataset we used in this research, while section 4 describes the adopted methodology. Experimental results are discussed in section 5. Section 6 concludes the paper, indicating conclusions and future directions.
2. Related Work
Intrusion Detection System (IDS) based approaches, visualization strategies, honeypot/honeynet approaches and system call based methods are several techniques adopted from external threat detection in finding solutions for the insider problem [17]. The work presented here is focused on the combination of graph-based techniques and anomaly detection approaches. This section will only discuss the most related work under the above two techniques, specifically in the insider threat domain.
The specialized network anomaly detection (SNAD) model proposed by [6] is a graph based approach used for detecting insider actions in collaborative information systems. In their model, access logs are mapped into a bipartite graph. The similarity of users is compared based on the number of subjects a user accesses from a collaborative information system, using the cosine similarity measure. In order to determine whether a particular access is anomalous or not, the authors considered the influence of a user on the similarity of the access network by suppressing one user at a time. Even though they achieved better performance over their competitors (the spectral anomaly detection model), they identified difficulties in implementation on real world networks.
Another study [4] proposed proactive insider threat detection by graph learning and psychological modeling of users. The proposed model is a combination of structural anomaly detection and psychological profiling, and explored the possibility of including dynamical properties of nodal attributes. They evaluated results based on a publicly available gaming data set, which might not be very similar to enterprise system and network data. Althebyan and Panda [2] have also suggested the use of graph theory to formalize two components, knowledge graphs and object dependency graphs. A knowledge graph represents knowledge units for a given insider, and these are updated over time. A dependency graph is a global hierarchical graph that shows all dependencies among various objects. Even though this model tries to include the accumulated knowledge of the insider over time on systems and objects, it could be improved by including several other parameters such as the user's behavioral patterns and psychological aspects.
Another study, performed by Nance and Marty [12], introduced the use of bipartite graphs for identifying and visualizing insider threats. They tried to establish acceptable insider behavior patterns based on workgroup role classifications. Although this approach is quite useful for detecting certain insider threats, it has the limitation of a high false positive rate. The framework suggested by [8] is another graph-based approach for malicious insider threat detection, which uses the minimum description length principle for detecting anomalous activities. A recent paper [10] proposed the use of authentication subgraphs for analyzing users' behavior within an enterprise network, utilizing a set of subgraph attributes in user profiling. Time series analysis of subgraphs and the use of bipartite graphs are also introduced in their work, targeting a more comprehensive analysis in their ongoing work. Another related framework, BANDIT (Behavioral Anomaly Detection for Insider Threat) [3], also proposed a two-stage anomaly detection approach for managing insider attacks, which compared a user's behaviour against their own and peer baselines.
3. The Data Set
Due to the lack of availability of proper insider threat datasets, we have utilized the insider threat dataset published by CERT at Carnegie Mellon University for this research [15]. The dataset "R4.2.tar.bz" has been used for this analysis. According to the dataset owners, this is a "dense needle" dataset with a fair amount of red team scenarios. This dataset consists of six broad types of data records (HTTP, logon, device, file, email and psychometric) for 1000 employees over a 17-month period. All HTTP records contain user, PC, URL and web page content with time stamps. "Logon.csv" consists of user logon/logoff activities with the corresponding PC and timestamps. "Logon" and "Logoff" are the two types of activities that can be found in the data records. A "Logon" activity corresponds to either a user login event or a screen unlock event, while the "Logoff" event corresponds to a user logoff event. Screen locks are not recorded in this dataset. The third data file, "device.csv", is a collection of data records of removable media usage. It indicates insert/remove actions with the relevant user, PC, and timestamp. Details of file copies are stored in the "file.csv" file with date, user, PC, filename, and content. To obtain the friendship network of users, the CERT dataset provides email communication records including from, to, cc and bcc fields. "psychometric.csv" provides psychometric scores based on the big five personality traits, or five-factor model (FFM), for the definition of personality.
Among the different work roles in the dataset, we performed our analysis with three job roles. Table 1 shows the statistics of the selected data records.
4. Methodology
The goal of this paper is to introduce a framework for mitigating the insider threat problem using a combination of a graph-based approach and an anomaly detection technique. This framework will utilize multidimensional inputs such as user interactions with hardware assets, web access records, email correspondences and psychometric figures. The graph-based approach is a prominent method of identifying inter-relationships between multidimensional entities. A user's interactions with devices are illustrated in a weighted, undirected, large scale bipartite graph G = (V, E, W), where V is the set of vertices, E is the set of edges, and W is the set of edge weights. The set of vertices comprises two types of entities, users and devices, while edges represent a user's interaction with a device. Edge weights correspond to the number of "Logoff" activities which appeared between an individual user and a device during the whole time duration of the dataset. Graph visualization was carried out using NodeXL [14] and all the other calculations were done using the R statistical computing language [13]. The following
Figure 2. First order subgraph topology for all users
subsections describe the theoretical background and the implemented methodology in detail.
Even though the dataset comprises both "logon" and "logoff" records for individual users, we utilize only the "logoff" events for network mapping. The reason behind this is that we cannot distinguish logon activities from screen unlocks, as both are recorded as "logon" events. However, screen locks are not recorded, and only the logoff events are recorded as "logoff" events. The use of removable media could also be represented by an edge between the corresponding user and the device. However, this would convert the graph into a multigraph, where multiple edges between two vertices are possible. To keep the simplicity of the first phase of analysis, we exclude the representation of such edges. Integration of the friendship network into the same network would change the structure of the network, and it would lose the multipartite property. Therefore, inter-user relationships based on email data have not been considered for graphical representation. The following graph attributes for individual users are captured from the above graph G for further analysis.
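As an illustrative sketch of this mapping (in plain Python rather than the R/NodeXL tooling used in this work, with a few hypothetical (user, PC, activity) tuples standing in for "Logon.csv" rows), the weighted bipartite graph can be held as an edge-weight table plus an adjacency list:

```python
from collections import Counter, defaultdict

# Hypothetical (user, pc, activity) tuples standing in for "Logon.csv" rows.
records = [
    ("U1", "PC1", "Logoff"), ("U1", "PC1", "Logoff"),
    ("U1", "PC2", "Logoff"), ("U2", "PC2", "Logoff"),
    ("U2", "PC2", "Logon"),  # "Logon" rows are ambiguous (login vs. unlock), so skipped
]

def build_device_graph(records):
    """G = (V, E, W): users and devices as vertices, one undirected edge per
    user-device pair, edge weight = number of "Logoff" events between them."""
    weights = Counter((u, pc) for u, pc, act in records if act == "Logoff")
    adj = defaultdict(set)
    for u, pc in weights:
        adj[u].add(pc)
        adj[pc].add(u)
    return weights, adj

weights, adj = build_device_graph(records)
print(len(adj["U1"]))          # vertex degree d_u: U1 accessed 2 devices
print(weights[("U1", "PC1")])  # edge weight: 2 "Logoff" events on that edge
```

Because only user-device pairs ever appear as edges, the two vertex classes stay disjoint and the bipartite property is preserved by construction.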
4.1. Graph Parameters
User’s vertex degree (du): The degree of a vertexis the number of edges connected to it. In the contextof this analysis, we calculated the degree only for theusers. Therefore this value represents the number ofdevices accessed by an individual user.
4.2. User Subgraph (USG) Parameters
As the next step of the analysis, the focus is on the construction of subgraphs for each user. A deep dive into the different orders of subgraphs is how we extract relevant graph parameters in graph analysis. In this work, user subgraphs are constructed up to the fifth order of neighborhood. We define the user subgraph (USG) as a weighted undirected graph Gu = (Vu, Eu, Wu) for the entire period of the dataset. Vu represents the vertex set of Gu, while Eu represents
TABLE 1. Data statistics

Functional Unit          | Department          | Number of Users | HTTP      | Logon   | Device | File   | Psychometric
Research And Engineering | Engineering         | 129             | 4,196,817 | 101,782 | 67,916 | 75,335 | 129
Research And Engineering | Software Management | 101             | 3,295,774 | 82,187  | 44,049 | 58,173 | 101
Research And Engineering | Research            | 101             | 3,332,576 | 79,362  | 30,906 | 41,292 | 101
the edge set of Gu. Link weights are the same as in the original graph. We have noticed that all the first order subgraphs have a star topology, as shown in Figure 2. The number of external nodes can range from 1 to the total number of devices. This is because of the disjoint nature of the two types of vertices. Even though we cannot extract much information on device access similarities from first order subgraphs, we will continue to use the following subgraph properties of first order subgraphs, in addition to the higher order subgraph properties, for the completeness of this analysis.
1) Vertex count vu(j), for j = 1:5
2) Edge count eu(j), for j = 1:5
3) Density ρu(j), for j = 1:5
The density of a graph is the ratio of edges to all possible edges given the number of vertices.
4) Diameter du(j), for j = 1:5
The diameter of a graph is the largest shortest path between any two vertices.
5) Number of peers pu(j), for j = 1:5
Since the main focus of this analysis is to identify the most anomalous users, we have chosen this parameter for evaluation in addition to the basic graph properties.
The distribution of the above properties is illustrated as histograms and further discussed in the "Experimental Results" section of this paper.
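To make the subgraph construction concrete, the following standard-library sketch extracts a user's j-th order neighborhood subgraph and computes vertex count, edge count, density and number of peers (the weighted diameter is omitted for brevity, and the adjacency, vertex names and the "U" prefix used to spot peer users are hypothetical):

```python
from collections import deque

# Hypothetical bipartite adjacency: users U*, devices PC*.
adj = {
    "U1": {"PC1", "PC2"}, "U2": {"PC2"}, "U3": {"PC2", "PC3"},
    "PC1": {"U1"}, "PC2": {"U1", "U2", "U3"}, "PC3": {"U3"},
}

def order_k_subgraph(adj, root, k):
    """BFS out to k hops: vertices of the k-th order neighborhood plus the
    edges induced among them."""
    dist, frontier = {root: 0}, deque([root])
    while frontier:
        v = frontier.popleft()
        if dist[v] == k:
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                frontier.append(w)
    verts = set(dist)
    edges = {frozenset((v, w)) for v in verts for w in adj[v] if w in verts}
    return verts, edges

def subgraph_attributes(adj, root, k):
    verts, edges = order_k_subgraph(adj, root, k)
    n, e = len(verts), len(edges)
    density = 2 * e / (n * (n - 1)) if n > 1 else 0.0  # edges / possible edges
    peers = sum(1 for v in verts if v.startswith("U") and v != root)
    return n, e, density, peers

print(subgraph_attributes(adj, "U1", 2))  # (5, 4, 0.4, 2)
```

Note that for k = 1 the peer count is always 0, matching the star topology of first order subgraphs: a user's immediate neighbors are all devices.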
4.3. Time Dependent Parameters
We believe that it is not sufficient to consider only the above parameters in identifying malicious insiders, without the temporal properties. To carry out a complete and comprehensive analysis, the following time-based parameters and personality values have been identified as important input parameters for the anomaly detection algorithm.
4.3.1. Individual logon logoff events. This parameter can be used in identifying users' abnormal logon/logoff activities, as most disgruntled insiders tend to commit malicious activities after hours [5]. Identifying users' baseline behavior on system/device access is an essential part of the malicious insider threat detection problem. For each user, four parameters (minimum, maximum, mean and mode) of the logon and logoff times have been calculated. Those parameters are also fed as input parameters to the anomaly detection unit.
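The four baseline statistics can be sketched as follows, using Python's standard library and hypothetical hour-of-day values in place of the real logon timestamps:

```python
from statistics import mean, multimode

# Hypothetical logon times per user, as hour-of-day values.
logon_hours = {
    "U1": [8.0, 8.5, 8.0, 9.0, 8.0],
    "U2": [8.5, 23.5, 8.5, 2.0],  # after-hours logons surface in min/max
}

def baseline(hours):
    """The four per-user baseline features: minimum, maximum, mean, mode."""
    return min(hours), max(hours), mean(hours), multimode(hours)[0]

print(baseline(logon_hours["U1"]))  # (8.0, 9.0, 8.3, 8.0)
```

Min and max capture the extremes (e.g. late-night logons), while mean and mode summarize a user's habitual logon time; together they give the ADU a compact behavioral baseline per user.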
4.3.2. Removable media usage events. Removable media is among the most popular methods used in theft of Intellectual Property (IP), for extracting confidential information from organizations [5]. Tracking the use of removable media can be an excellent information source for identifying suspicious events by trusted insiders. The baseline behavior of removable media usage is captured by the minimum, maximum, mean and mode times of "Insert" and "Remove" activities, as in the logon/logoff event analysis. The time gap between consecutive "Insert - Remove" actions has also been identified as a good source of information for capturing large file downloads. The daily number of files copied by an individual is also used in this analysis.
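The "Insert - Remove" gap can be computed by pairing each insert with the remove that follows it, as in this sketch over hypothetical, time-ordered "device.csv"-style rows:

```python
from datetime import datetime

# Hypothetical device events for one user: (timestamp, action), time-ordered.
events = [
    ("2010-01-04 09:10:00", "Insert"), ("2010-01-04 09:12:00", "Remove"),
    ("2010-01-05 20:00:00", "Insert"), ("2010-01-05 21:45:00", "Remove"),
]

def insert_remove_gaps(events):
    """Minutes between each Insert and its following Remove; unusually long
    gaps can flag potential large file downloads."""
    gaps, pending = [], None
    for ts, action in events:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if action == "Insert":
            pending = t
        elif action == "Remove" and pending is not None:
            gaps.append((t - pending).total_seconds() / 60)
            pending = None
    return gaps

print(insert_remove_gaps(events))  # [2.0, 105.0]
```

Here the second session is both after-hours and long-lived, exactly the combination the text singles out for closer scrutiny.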
4.3.3. Web Access Patterns. We can think of users' online behavior as a reflection of their offline behavior, as they tend to publish their feelings, thoughts, likes and dislikes through social media. In addition, web access patterns are also a good indication of their online behavior. Disgruntled insiders tend to access competitors' websites and recruitment agency websites to understand and gather information on potential opportunities. We have identified that the analysis of users' online behavior would be more comprehensive if we included multiple social media data sources as well as web access records. However, due to the limitations of data availability in all these domains, we restrict ourselves to web access records for this research work. Also, it is evident that the content of web pages may have a direct link with users' suspicious behavior. However, we exclude content analysis of web pages from this analysis and will consider it as a future direction of our continuing work.
4.4. Personality Parameters
Psychometric data: Psychological behavior is one of the other aspects linked to insider attacks. Sudden behavioral changes can be indications of misuse of privileges. Verbal behavior, personality traits, unauthorized absences and aggressive behavior are a few indicators which can be considered as small markers which come before the big attack. Therefore, nontechnical behavioral changes are captured through the psychometric data provided in the dataset.
Table 2 is a summary of all the properties we have identified in this analysis to consider as input parameters to the isolation forest algorithm to separate
TABLE 2. Selected parameter set

Module                    | Parameters
Graph                     | Degree of vertex
Sub Graphs                | Vertex Count; Edge Count; Density; Weighted Diameter; Number of Peers
Logon/Logoff Events       | Minimum/Maximum Logon Time; Mean/Mode Logon Time; Minimum/Maximum Logoff Time; Mean/Mode Logoff Time
Removable Media           | Minimum/Maximum Insert Time; Mean/Mode Insert Time; Minimum/Maximum Remove Time; Mean/Mode Remove Time; Maximum number of daily file copies; Mode of number of daily file copies
Web Access Patterns       | Number of Unique URLs
Psychometric Observations | O (Openness to experience); C (Conscientiousness); E (Extroversion); A (Agreeableness); N (Neuroticism)
the most anomalous users. Due to space limitations, we have listed some pairs of parameters on the same line in the table; e.g., the line "Minimum/Maximum Insert Time" covers two parameters, "Minimum Insert Time" and "Maximum Insert Time".
4.5. Anomaly Detection
Due to the complex nature of the insider threat problem, it is extremely hard to pinpoint a user as a malicious insider. Therefore, the first step should be the identification of possible malicious insiders who deviate maximally from their peers as well as from their own normal behavior. As the second stage of our analysis, we focus on implementing an anomaly detection algorithm based on the important graphical properties and time dependent properties identified at the previous stage of this analysis. The anomaly detection algorithm adopted in this analysis is the "Isolation forest" algorithm, which stands out in effectively separating anomalous events from the rest of the instances [11].
Isolation Forest Algorithm - iForest: The isolation forest algorithm is a model-based approach which explicitly isolates anomalies without constructing a typical profile instance. Its linear time complexity with a low constant, and its low memory requirements, drive us to use it in our experiments, as an enormous amount of information needs to be analyzed in the field of insider threat. The use of the isolation forest algorithm for this work is part of the overall research effort within our research group at RMIT University and CA Pacific, with the details as presented in [16], where it is applied
TABLE 3. Summary of number of parameters

Property                  | Number of Input Parameters
Graph properties          | 1
Sub graph properties      | 25
Logon/Logoff behavior     | 8
Removable media usage     | 10
Web access patterns       | 1
Psychometric observations | 5
to a very large enterprise system for anomaly detection. This algorithm also performs well with a large number of irrelevant attributes, and in instances where the training data set does not contain any anomalies. The method generates an ensemble of iTrees for a given dataset, and the instances with a short average path length over the iTrees are considered to be anomalies. If the calculated anomaly score value s is very close to 1, the instance can be regarded as a definite anomaly. Instances with s much smaller than 0.5 can be considered normal. If all the instances return s ≈ 0.5, then the entire sample is deemed not to have any distinct anomalies.
Based on the above-described algorithm, anomaly scores were calculated for each user, for each order of subgraph (1:5) separately, based on the five subgraph properties identified in subsection 4.2. In this case the iForest algorithm is executed with 5 input parameters. We believe it is much more effective to incorporate the parameter values calculated for the different orders of subgraphs when calculating anomaly scores. Therefore, anomaly scores corresponding to subgraph properties have been computed using 25 distinct values, obtained from the 5 parameters of the 1st, 2nd, 3rd, 4th and 5th order subgraphs. Similarly, anomaly scores corresponding to graph parameters, time-dependent parameters and personality parameters (as summarized in Table 2) were calculated independently using the iForest algorithm. Finally, the anomaly score corresponding to each user is calculated as a combination of all the parameters described in Table 2, in which case the number of input parameters for the algorithm was 50. The breakdown of the number of parameters is summarized in Table 3.
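As a minimal sketch of this scoring step (using scikit-learn's IsolationForest on synthetic data rather than the R implementation used in this work; the feature values are fabricated, with 50 columns per user mirroring the Table 3 breakdown), the combined per-user score can be obtained as:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Hypothetical 50-feature rows, one per user; most users cluster together,
# while row 0 deviates strongly on every feature.
X = rng.normal(loc=0.0, scale=1.0, size=(100, 50))
X[0] = 8.0  # an extreme user

clf = IsolationForest(n_estimators=100, random_state=0).fit(X)
# score_samples returns the negated anomaly score; with the default offset
# of -0.5, (0.5 - decision_function) recovers scores on the paper's [0, 1]
# scale, where values near 1 indicate definite anomalies.
scores = 0.5 - clf.decision_function(X)
print(scores[0] > scores[1:].max())  # True: the deviating user scores highest
```

Ranking users by this score, rather than thresholding it, matches the paper's stance that the output should isolate high-risk profiles for investigation rather than directly name attackers.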
5. Experimental Results
This section is dedicated to a comprehensive discussion of the results obtained through our analysis. The discussion is based on the "Research and Engineering/Engineering" work role, and the final results for all three work roles considered in this analysis are summarized in Table 4.
Figure 3 is an illustration of the users' device access network. Users are represented by spheres, while devices are represented by triangles. Vertex size corresponds to the degree of the vertex, which is an indication of
Figure 3. Users’ device access network
Figure 4. Degree distribution of users
the number of devices accessed by a particular user. The width of each edge corresponds to the number of "Logoff" events which occurred during the entire period of the dataset. We continue to use the graphical representation of informational assets as in our previous work [9], as it can be used to precisely indicate interrelationships between informational assets. Also, it is an efficient means of extracting basic essential parameters from massively dense log data.
5.1. Graph Parameters
The individual degree distribution is illustrated as a histogram in Figure 4. The degree of a user is the number of devices the user accesses in this analysis. This figure reveals that the majority of users have a lower degree, while a minority of users have a larger degree compared to the others. We can think of two possible reasons behind the small number of users in the tail of the distribution: either these employees are assigned multiple devices to perform their day to day operations, or this is anomalous behavior. As previously mentioned, we should not directly conclude that any of the insiders in the tail of the distribution are suspicious just by looking at the number of devices they accessed. However, we can treat them as high-risk profiles among the others.
5.2. Subgraph Parameters
Figure 5(a) illustrates the histogram of vertex count for the different orders of user subgraphs. These histograms show that the majority of users have a small number of vertices in their subgraphs, while a minority of users have a larger number of vertices, resulting in much more complex user subgraphs. Figure 5(b) is an illustration of the distribution of edge count across the different orders of subgraphs. These values also follow a distribution which is very similar to the vertex count. The subgraph density and the weighted diameter for all USGs are also shown in figure 5(c) and figure 5(d) respectively. The density histograms show a similar pattern for most of the cases. Even though we could not find any obvious reason for this nature of the distribution, we think that subgraph density is an important attribute for this kind of evaluation. Therefore, we continued to use that parameter as an input for the ADU. The diameter distribution for lower order subgraphs shows similar behavior with a single peak data bar, while higher order subgraphs show similar behavior with two distinct peaks. These peaks are an indication of clusters of users who have similar behavioral patterns. The other subgraph property, number of peers, also indicates two broader groups of users, corresponding to two significant data bars, and a few other small groups of users. These results indicate the significance of higher order subgraph analysis in finding little clues among the enormous amount of data.
5.3. Time Dependent Parameters
This subsection discusses the results obtained for the other time-varying properties, which were identified as some of the other governing parameters of the insider threat problem.
5.3.1. Individual Logon-Logoff Behavior. Figure 6(a) is an illustration of users' logon behavior for the entire period of the dataset. By looking at this graph, it is evident that the majority of logon activities occur during early office hours, which can be interpreted as the first logon event of the day. There are some other logon events, especially when we consider the "maximum" logon time, which occur during regular working hours; these can be treated as logins following screen locks during the day. The logon times to which we need to pay more attention are the events which happen after office hours. We can identify a few users whose minimum and maximum logon times occurred late at night, which might be unusual for normal operations. In real world enterprise networks, we can expect system user logon activities during these types of time periods for scheduled jobs such as backups, log rotations and routine activities. But if we find such logon activities for non-system users, that
[Histogram panels for subgraph orders 1 to 5: (a) Vertex count, (b) Edge count, (c) Density, (d) Weighted diameter, (e) Number of peers]
Figure 5. Histograms of subgraph properties
is, for human users, it needs to be further investigated to differentiate between a genuine and a suspicious activity. One of the other critical parameters of insider threat detection, the "logoff" behavior of users, is illustrated in Figure 6(b). This graph also shows the mean, mode, minimum and maximum logoff times of each user for the entire period. As can be seen in the graph, the majority of "logoff" events happen during late office hours. As in the case of "logon" behavior, we are concerned about after-hours logoff events, which are abnormal compared to the majority of events that happen during regular business hours.
5.3.2. Removable media usage. Figure 7 is an illustration of users' removable media usage statistics. Similar to the logon/logoff analysis, the time dependencies of removable media usage have also been investigated. Figures 7(a) and (b) show the maximum, minimum, mean and mode times for USB connect and disconnect events respectively. One important factor noticed through this analysis is that only about 29% (37/129) of employees from the selected designation used USB file copies, which cannot be considered a typical behavioral pattern among the chosen group. In this case also, events which occur during regular office hours can be regarded as normal, while events which happen after hours can be either suspicious or work related. Also, we have to be vigilant about large file copies after hours as well as during normal business hours. This property can be obtained by analysis of the time gap between consecutive connect and disconnect events, which we have not computed in this exercise. Figures 7(c) and (d) demonstrate the variation of users' daily number of file accesses. To identify suspicious file copies, we have considered only the maximum and mode of the number of file copies per day by an individual. If the difference between the maximum and the mode of the number of file copies is unusual, it can be considered a suspicious file download. By looking at the histogram, it is clear that this difference is less than four for the majority of employees, while a few of the users deviated from this pattern.
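The max-minus-mode check can be sketched as follows; the counts and the threshold of four (suggested by the histogram discussion above, but ultimately a tunable choice) are hypothetical:

```python
from statistics import multimode

# Hypothetical daily file-copy counts per user.
daily_copies = {
    "U1": [2, 3, 2, 2, 4],
    "U2": [1, 1, 2, 1, 46],  # a one-day burst far above the usual rate
}

def suspicious_burst(counts, threshold=4):
    """Flag users whose maximum daily copy count exceeds their mode (their
    habitual daily rate) by more than the threshold."""
    return max(counts) - multimode(counts)[0] > threshold

print([u for u, c in daily_copies.items() if suspicious_burst(c)])  # ['U2']
```

Using the mode rather than the mean as the baseline keeps a single large burst from masking itself by inflating the user's own average.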
5.3.3. Web access patterns. For the completeness of this work, we have selected a single parameter based on individual users' web access patterns. The distribution of the number of unique URLs accessed by individual users is illustrated in Figure 8. These results also reveal a few outliers from the rest of the group, which can be directed for further investigation. In the continuation of this work, we will explore integrating our previous work [9] to obtain more input parameters based on web access patterns.
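The unique-URL count of Figure 8 is a simple per-user aggregation; the log records below are hypothetical:

```python
from collections import defaultdict

# Hypothetical (user, url) web-access log records.
web_log = [
    ("CAM0102", "http://example.com/a"), ("CAM0102", "http://example.com/a"),
    ("CAM0102", "http://example.com/b"),
    ("KTG0091", "http://example.com/a"),
]

# Collect the distinct URLs seen for each user.
unique_urls = defaultdict(set)
for user, url in web_log:
    unique_urls[user].add(url)

counts = {u: len(s) for u, s in unique_urls.items()}
print(counts)  # {'CAM0102': 2, 'KTG0091': 1}
```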
5.4. Anomaly Detection
Figure 9(a) illustrates the anomaly score distribution of the user base for the different orders of subgraphs and for the combination of all subgraph properties. The anomaly score distribution clearly indicates two major types of users based on the proposed subgraph properties. The users in the tail of the distribution have small anomaly scores, which do not change with the order of the subgraph. The other set of users, in the main segment of the distribution, have higher anomaly scores that vary with the order of the subgraph. Based on the Isolation Forest algorithm,
[Figure panel residue removed: per-user logon times over a 00:00-24:00 axis; series max, mean, min, mode.]
(a) Logon behavior
[Figure panel residue removed: per-user logoff times over a 00:00-24:00 axis; series max, mean, min, mode.]
(b) Logoff behavior
Figure 6. Users’ logon and logoff behavior
[Figure panel residue removed: per-user USB connect times over a 00:00-24:00 axis; series max, mean, min, mode.]
(a) USB connect
[Figure panel residue removed: per-user USB disconnect times over a 00:00-24:00 axis; series max, mean, min, mode.]
(b) USB disconnect
[Figure panel residue removed: per-user daily file-transfer counts; series MAX, MODE.]
(c) USB file transfer statistics
[Histogram residue removed: number of users vs. difference between maximum and mode number of file copies, bins 0-10 with counts 17, 8, 4, 2, 0, 2, 0, 2, 2.]
(d) Difference between maximum and mode number of USB file transfers
Figure 7. Removable media usage behavior
users with anomaly scores very close to 1 can be considered definite anomalies, while instances with anomaly scores much smaller than 0.5 can safely be considered typical cases.
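This score interpretation follows the Isolation Forest formulation of Liu et al. [11], where s(x, n) = 2^(-E(h(x))/c(n)); a minimal sketch of the normalised score, with illustrative path lengths rather than values from our experiments:

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points,
    used to normalise isolation-tree path lengths (Liu et al., 2008)."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # H(n-1) approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_path_len, n):
    """s(x, n) = 2^(-E(h(x)) / c(n)): close to 1 => anomaly,
    well below 0.5 => typical case."""
    return 2.0 ** (-avg_path_len / c(n))

n = 256  # illustrative subsample size
print(anomaly_score(2.0, n))    # short average path -> score near 1
print(anomaly_score(c(n), n))   # average-length path -> exactly 0.5
```

Note the built-in calibration: a point whose average path length equals c(n) scores exactly 0.5, which is why scores clustered near 0.5 carry no anomaly signal.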
5.5. Parameter Dependency
Figure 9(b) indicates how the anomaly scores are distributed across the different input parameters chosen in this analysis. The dense graph shows a few points above the red horizontal line, which corresponds to an anomaly score of 0.8. Users corresponding to those points can be considered anomalous.
Figure 8. Distribution of unique URLs accessed by users
[Figure panel residue removed: per-user anomaly scores, 0.3-0.8; series All and subgraph orders 1-5.]
(a) Sub graph properties
[Figure panel residue removed: per-user anomaly scores, 0.4-0.8; series All, Graph, Logon/Logoff, Psychometry, SubGraph, USB, Web.]
(b) All properties
Figure 9. Anomaly score distribution
To better understand the results illustrated in Figure 9(b), the distribution of users with respect to anomaly scores computed over all identified parameters is shown as a histogram in Figure 10. We find that the majority of users have anomaly scores in the interval [0.4, 0.7], which can be considered normal, while a minority of users have anomaly scores above 0.7, which can be regarded as suspicious. The single outlier with an anomaly score above 0.8 can be tagged for further investigation.
In insider threat detection and prevention, the priority is the isolation of suspicious users from the rest of the workforce. Since this was the major intention of this work, we used the following technique to validate the above results by computing the percentage of suspicious users based on the different parameters identified. To do so, the calculated anomaly scores are mapped into a binary vector (0, 1) based on a predefined threshold value for each parameter. The threshold for each parameter is chosen as the maximum value minus 0.1, keeping a margin of 10% as in most of the experimental cases. In the particular case where anomaly scores are approximately 0.5 across the entire dataset, we exclude that parameter from the validation process, as it does not contribute much to finding suspicious activities. In all other cases, anomaly scores are mapped as described above. We propose the following two methods to check parameter dependencies.
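A minimal sketch of this binarisation, with illustrative scores; the tolerance used to treat a parameter's scores as "approximately 0.5" is our assumption:

```python
# Hypothetical per-parameter anomaly scores for three users.
scores = {
    "Logon/Logoff": {"u1": 0.45, "u2": 0.55, "u3": 0.82},
    "USB":          {"u1": 0.50, "u2": 0.50, "u3": 0.50},  # ~0.5 everywhere
    "Web":          {"u1": 0.40, "u2": 0.78, "u3": 0.44},
}

def binarise(param_scores, margin=0.1, tol=0.02):
    """Map scores to {0, 1} using threshold = max - margin; return None
    for a parameter whose scores all sit near 0.5, which the validation
    excludes as uninformative."""
    vals = param_scores.values()
    if all(abs(v - 0.5) <= tol for v in vals):
        return None
    threshold = max(vals) - margin
    return {u: int(v >= threshold) for u, v in param_scores.items()}

flags = {p: b for p, s in scores.items() if (b := binarise(s)) is not None}
```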
Case I: In this approach, the percentage of suspicious users is calculated by considering the predicted number of anomalous parameters at a time, from 1 to 6. The calculated percentages are summarized in Table 4. They reveal that more than 79% of users can be considered to have normal behavior, while the others show suspicious behavior. We also note that no users are flagged as suspicious when more than three parameters are considered together. The complex nature of the insider threat problem may explain this type of result.
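The Case I tally can be sketched as below; the users and binary flags are hypothetical, and the result mirrors the row layout of Table 4:

```python
from collections import Counter

# Hypothetical binary anomaly flags per user across the six
# parameter categories (1 = flagged as anomalous for that parameter).
user_flags = {
    "u1": [0, 0, 0, 0, 0, 0],
    "u2": [1, 0, 0, 0, 0, 0],
    "u3": [1, 1, 0, 0, 0, 0],
    "u4": [0, 0, 0, 0, 0, 0],
}

# Distribution of users over the number of anomalous parameters.
hist = Counter(sum(v) for v in user_flags.values())
total = len(user_flags)
pct = {k: 100.0 * hist.get(k, 0) / total for k in range(7)}
print(pct)  # column "0" gives the share of users with normal behavior
```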
Case II: In this method, we propose looking at all possible combinations of parameters from the six main categories selected for this analysis. Given the results of Case I, we do not expect a significant change in the outcome. However, we believe this would be a viable approach for identifying parameter dependencies.
6. Conclusions and Future Work
In this paper, we have introduced a framework based on graphical and anomaly detection approaches for identifying potential malicious insiders. The model generates anomaly scores for each user based on different input parameters. Considering the nature of insider attacks, a user can be deemed suspicious even if only a single parameter is found to be suspicious. We have adopted graph and subgraph properties along with statistical methods to generate input parameters for the anomaly detection algorithm from multi-domain real-world information. Empirical results reveal the importance of the selected properties in combating this patient and smart attack. We also found that more than 79% of users show common behavioral patterns, while the rest of the group shows suspicious behavior based on different parameters. Users belonging to the minority group can be tagged and directed for further investigation.
In the continuation of this work, we will focus on integrating as many input parameters as possible to improve the effectiveness of the proposed framework. These would include social network data, which can be considered a good source of online behavior, and other statistical inputs from email and instant messaging communications. We believe this framework would be much more useful if we could include data from content analysis of other sources, such as websites accessed by users, emails sent and received by users, and file transfers to removable media. We will also focus on integrating several other graphical parameters, and a few other statistical parameters such as the time gap between USB insert/remove
TABLE 4. Statistics on parameter dependency
(percentage of users corresponding to each number of anomalous parameters)

Functional unit            Department             0      1      2     3     4     5     6
Research And Engineering   Engineering          83.72  16.28  0.00  0.00  0.00  0.00  0.00
Research And Engineering   Software Management  82.18  15.84  0.99  0.99  0.00  0.00  0.00
Research And Engineering   Research             79.21  17.82  2.97  0.00  0.00  0.00  0.00
Figure 10. Anomaly score distribution of users
actions, in expanding this framework. We will also explore the possibility of integrating the similarity-based clustering mechanism introduced in our previous work on web access pattern analysis [9] with the proposed model. Another focus is extending the analysis to the entire dataset and validating the results against the two insider threat cases simulated in the dataset. A further aspect we are looking toward is the use of temporal properties in graph analysis to incorporate the time factor instead of purely statistical analysis. Finally, the main goal of this work, as well as our previous and future work, is to formulate an efficient, scalable, and automated insider threat detection and prediction framework.
Acknowledgments
The authors were supported by Australian Research Council (ARC) linkage grants LP110200321 and LP140100698. In addition, Anagi Gamachchi is supported by an ATN (Australian Technological Network) Industrial Doctoral Training Center (IDTC) PhD scholarship.
References
[1] “By the numbers: Cyber attack costs compared,”2016, accessed on 31/05/2016. [Online]. Avail-able: http://www.csoonline.com/article/3074826/security/by-the-numbers-cyber-attack-costs-compared.html
[2] Q. Althebyan and B. Panda, "A Knowledge-Base Model for Insider Threat Prediction," in Information Assurance and Security Workshop, 2007. IAW '07. IEEE SMC, 2007, pp. 239–246.
[3] V. H. Berk, G. Cybenko, I. G. d. Souza, and J. P. Murphy,“Managing malicious insider risk through bandit,” in SystemScience (HICSS), 2012 45th Hawaii International Conferenceon, Jan 2012, pp. 2422–2430.
[4] O. Brdiczka, L. Juan, B. Price, S. Jianqiang, A. Patil, R. Chow, E. Bart, and N. Ducheneaut, "Proactive Insider Threat Detection through Graph Learning and Psychological Context," in Security and Privacy Workshops (SPW), 2012 IEEE Symposium on, 2012, pp. 142–149.
[5] D. Cappelli, A. Moore, and R. Trzeciak, The CERT Guideto Insider Threats: How to Prevent, Detect, and Respondto Information Technology Crimes (Theft, Sabotage, Fraud).Addison-Wesley Professional, 2012.
[6] Y. Chen, S. Nyemba, W. Zhang, and B. Malin, “Specializingnetwork analysis to detect anomalous insider actions,” SecurityInformatics, vol. 1, no. 1, p. 5, 2012.
[7] D. Cappelli, A. Moore, R. Trzeciak, and T. J. Shimeall, "Common Sense Guide to Prevention and Detection of Insider Threats, 3rd Edition," 2009.
[8] W. Eberle and L. Holder, “Applying graph-based anomalydetection approaches to the discovery of insider threats,” inIntelligence and Security Informatics, 2009. ISI ’09. IEEEInternational Conference on, 2009, pp. 206–208.
[9] A. Gamachchi and S. Boztas, “Web access patterns revealinsiders behavior,” in 2015 Seventh International Workshopon Signal Design and its Applications in Communications(IWSDA), Sept 2015, pp. 70–74.
[10] A. D. Kent, L. M. Liebrock, and J. C. Neil, “Authenticationgraphs: Analyzing user behavior within an enterprise network,”Computers & Security, vol. 48, pp. 150–166, feb 2015.
[11] F. T. Liu, K. M. Ting, and Z. H. Zhou, “Isolation forest,” in2008 Eighth IEEE International Conference on Data Mining,Dec 2008, pp. 413–422.
[12] K. Nance and R. Marty, “Identifying and Visualizing theMalicious Insider Threat Using Bipartite Graphs,” in SystemSciences (HICSS), 2011 44th Hawaii International Conferenceon, 2011, pp. 1–9.
[13] R Core Team, R: A Language and Environment for StatisticalComputing, R Foundation for Statistical Computing, Vienna,Austria, 2013. [Online]. Available: http://www.R-project.org/
[14] M. Smith, N. Milic-Frayling, B. Shneiderman, E. Mendes Rodrigues, J. Leskovec, and C. Dunne, "NodeXL: a free and open network overview, discovery and exploration add-in for Excel 2007/2010." [Online]. Available: http://nodexl.codeplex.com
[15] “CERT Insider Threat Data Set,” Software Engineering Insti-tute, Carnegie Mellon University, CERT Division and ExactData LLC. [Online]. Available: https://www.cert.org/insider-threat/tools/
[16] L. Sun, S. Versteeg, S. Boztas, and A. Rao, “Detecting anoma-lous user behaviour using an extended Isolation Forest algo-rithm: An enterprise case study,” Sept 2016, arXiv:1609.06676.
[17] S. Zeadally, B. Yu, D. H. Jeong, and L. Liang, “DetectingInsider Threats: Solutions and Trends,” Information SecurityJournal: A Global Perspective, vol. 21, no. 4, pp. 183–192,2012.