doing user behaviour analytics through interactive visual ... · user overview group organization...

2
Doing User Behaviour Analytics through Interactive Visual User Profiles Phong H. Nguyen * City, University of London, UK Siming Chen Fraunhofer IAIS Natalia Andrienko Fraunhofer IAIS / City, University of London Gennady Andrienko § Fraunhofer IAIS / City, University of London Olivier Thonnard Amadeus, France Cagatay Turkay || City, University of London, UK ABSTRACT User behaviour analytics (UBA) systems offer sophisticated mod- els that capture users’ behaviour over time to identify fraudulent activities that do not match with their profiles. Our paper presents a visual analytics approach to build a comprehensive, multi-faceted understanding of user behaviour. We observe that with the aid of interactive visual interfaces, analysts are able to conduct exploratory and investigative analysis effectively. 1 I NTRODUCTION Fraudsters in online systems are using increasingly more sophis- ticated and complex approaches that are becoming challenging to identify such as by disguising themselves as legitimate users and mimicking users’ activities to stay under the radar inside the sys- tem [1]. To identify such insider threats, computational UBA solu- tions aim to learn probabilistic models of users’ behaviour through their past activities, and trigger alerts when users start behaving in unusual, unexpected ways. However, they are limited in providing in-depth views into users’ behaviour, and valuable analyst time is being lost in identifying the causes of the issues and deciding on whether the alerted signals are indeed problematic cases. In this paper, we propose a visual analytics approach to facilitate effective decision making for cyber security analysts using UBA systems through a multi-faceted understanding of user behaviour. We take a user-centred approach and characterise the problem do- main thorough a study of the challenges, goals, and the analytical tasks within the context of user behaviour analytics systems. We then design a visual analytics framework that encompasses a multi- faceted interactive visual user profile to help investigate the various perspectives of user behaviour concurrently. We demonstrate the efficacy of our approach through a number of use cases conducted as a multidisciplinary team and discuss the limitations observed. 2 DOMAIN CHARACTERISATION In this paper, we work with a UBA model that uses the logs recorded within a login and security server. The log data is split into sessions, each containing an ordered list of timestamped actions performed by the user conducting that session, and meta-information such as IP address, user ID and organisation. We also get an anomaly score that is provided by the underlying UBA. Actions in this context are semantically labelled such as “CreateLoginArea”, “SearchUsr” and DisplayOrgaDetails”. The dataset comprises of 18,956 sessions performed by 1,666 users during a 15 day time window, with 305 unique action types. * e-mail: [email protected] e-mail: [email protected] e-mail: [email protected] § e-mail: [email protected] e-mail: [email protected] || e-mail: [email protected] We take a user-centred approach to the design and development of our solutions. To understand the application domain, we conducted a series of workshops with analysts, who all have considerable experience with these types of investigation mentioned above. Based on these observations, we identified analytical tasks, designed and implemented initial versions of the solutions, presented them to the analysts in follow-up sessions, and gathered feedback to iteratively improve the designs. T1 Identifying users of interest. T2 Understanding the different facets of user behaviour. T3 Investigating patterns and deviations in user behaviour. T4 Putting sessions in the context of a user. 3 VASABI: VISUAL ANALYTICS FOR UNDERSTANDING USER BEHAVI OUR 3.1 Building User Profiles We categorise features of a profile into three broad categories: Inherent features that describe the low-level characteristics of how users utilise the application. Examples: meta-information associated with each session such as IP address and browser. Derived features that are extracted through straightforward statistical computation. Examples: the number of sessions and the number of unique actions. Latent features that provide an in-depth understanding into the behaviour of users through the application of a sophisticated computation. Examples: UBA modelling score. Users’ behaviour in digital systems are largely determined by different roles that they have in their organisations. Each role could be responsible for a few particular tasks. We consider task as a latent feature of user profile and apply a text-based clustering analysis to extract the tasks automatically. Each action is mapped to a word and each session is mapped to a document. A k-means clustering (k = 10) is used to extract the clusters, i.e., tasks. 3.2 Visualising User Profiles We design two views, one for exploring multiple facets of user profile and one for investigating performed tasks in more detail. 3.2.1 Profile View It is essential for analysts to observe many facets of user profile concurrently (T2). For each facet, we use a standard chart to show a summary of user sessions and composite the charts to make a compact representation of a user profile. The visual profiles are stacked together to enable comparison. Analysing anomalous ses- sions within the context of the users performing them helps identify the deviation of user behaviour from the norm (T3, T4). We support this analytical task by superimposing the sessions of interest, as small orange circles, on top of the visual summary. A random noise is added to the vertical position of the dots to avoid overplotting. Mouse hovering a session highlights the same session in other facets, enabling to observe different facets concurrently. 3.2.2 Task Overview Each task, or cluster, can be represented by five dominant actions within a cluster. A task is shown as a set of five squares, each for

Upload: others

Post on 31-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Doing User Behaviour Analytics through Interactive Visual ... · User Overview Group Organization Sort Groups Length Height # Sessions (log) Sort Items Score March Thu 02 Fri 03 Sat

Doing User Behaviour Analytics through Interactive Visual User ProfilesPhong H. Nguyen*

City, University of London, UKSiming Chen†

Fraunhofer IAISNatalia Andrienko‡

Fraunhofer IAIS / City, University of London

Gennady Andrienko§

Fraunhofer IAIS / City, University of LondonOlivier Thonnard¶

Amadeus, FranceCagatay Turkay||

City, University of London, UK

ABSTRACT

User behaviour analytics (UBA) systems offer sophisticated mod-els that capture users’ behaviour over time to identify fraudulentactivities that do not match with their profiles. Our paper presents avisual analytics approach to build a comprehensive, multi-facetedunderstanding of user behaviour. We observe that with the aid ofinteractive visual interfaces, analysts are able to conduct exploratoryand investigative analysis effectively.

1 INTRODUCTION

Fraudsters in online systems are using increasingly more sophis-ticated and complex approaches that are becoming challenging toidentify such as by disguising themselves as legitimate users andmimicking users’ activities to stay under the radar inside the sys-tem [1]. To identify such insider threats, computational UBA solu-tions aim to learn probabilistic models of users’ behaviour throughtheir past activities, and trigger alerts when users start behaving inunusual, unexpected ways. However, they are limited in providingin-depth views into users’ behaviour, and valuable analyst time isbeing lost in identifying the causes of the issues and deciding onwhether the alerted signals are indeed problematic cases.

In this paper, we propose a visual analytics approach to facilitateeffective decision making for cyber security analysts using UBAsystems through a multi-faceted understanding of user behaviour.We take a user-centred approach and characterise the problem do-main thorough a study of the challenges, goals, and the analyticaltasks within the context of user behaviour analytics systems. Wethen design a visual analytics framework that encompasses a multi-faceted interactive visual user profile to help investigate the variousperspectives of user behaviour concurrently. We demonstrate theefficacy of our approach through a number of use cases conductedas a multidisciplinary team and discuss the limitations observed.

2 DOMAIN CHARACTERISATION

In this paper, we work with a UBA model that uses the logs recordedwithin a login and security server. The log data is split into sessions,each containing an ordered list of timestamped actions performedby the user conducting that session, and meta-information such asIP address, user ID and organisation. We also get an anomaly scorethat is provided by the underlying UBA. Actions in this context aresemantically labelled such as “CreateLoginArea”, “SearchUsr” and“DisplayOrgaDetails”. The dataset comprises of 18,956 sessionsperformed by 1,666 users during a 15 day time window, with 305unique action types.

*e-mail: [email protected]†e-mail: [email protected]‡e-mail: [email protected]§e-mail: [email protected]¶e-mail: [email protected]||e-mail: [email protected]

We take a user-centred approach to the design and development ofour solutions. To understand the application domain, we conducteda series of workshops with analysts, who all have considerableexperience with these types of investigation mentioned above. Basedon these observations, we identified analytical tasks, designed andimplemented initial versions of the solutions, presented them to theanalysts in follow-up sessions, and gathered feedback to iterativelyimprove the designs.T1 Identifying users of interest.T2 Understanding the different facets of user behaviour.T3 Investigating patterns and deviations in user behaviour.T4 Putting sessions in the context of a user.

3 VASABI: VISUAL ANALYTICS FOR UNDERSTANDINGUSER BEHAVIOUR

3.1 Building User ProfilesWe categorise features of a profile into three broad categories:

• Inherent features that describe the low-level characteristics ofhow users utilise the application. Examples: meta-informationassociated with each session such as IP address and browser.

• Derived features that are extracted through straightforwardstatistical computation. Examples: the number of sessions andthe number of unique actions.

• Latent features that provide an in-depth understanding into thebehaviour of users through the application of a sophisticatedcomputation. Examples: UBA modelling score.

Users’ behaviour in digital systems are largely determined bydifferent roles that they have in their organisations. Each role couldbe responsible for a few particular tasks. We consider task as a latentfeature of user profile and apply a text-based clustering analysis toextract the tasks automatically. Each action is mapped to a wordand each session is mapped to a document. A k-means clustering(k = 10) is used to extract the clusters, i.e., tasks.

3.2 Visualising User ProfilesWe design two views, one for exploring multiple facets of userprofile and one for investigating performed tasks in more detail.

3.2.1 Profile ViewIt is essential for analysts to observe many facets of user profileconcurrently (T2). For each facet, we use a standard chart to showa summary of user sessions and composite the charts to make acompact representation of a user profile. The visual profiles arestacked together to enable comparison. Analysing anomalous ses-sions within the context of the users performing them helps identifythe deviation of user behaviour from the norm (T3, T4). We supportthis analytical task by superimposing the sessions of interest, assmall orange circles, on top of the visual summary. A random noiseis added to the vertical position of the dots to avoid overplotting.Mouse hovering a session highlights the same session in other facets,enabling to observe different facets concurrently.

3.2.2 Task OverviewEach task, or cluster, can be represented by five dominant actionswithin a cluster. A task is shown as a set of five squares, each for

Page 2: Doing User Behaviour Analytics through Interactive Visual ... · User Overview Group Organization Sort Groups Length Height # Sessions (log) Sort Items Score March Thu 02 Fri 03 Sat

User Overview Sort Items ScoreHeight # Sessions (log)Sort Groups LengthGroup Organization

March Thu 02 Fri 03 Sat 04 Mar 05 Mon 06 Tue 07 Wed 08 Thu 09 Fri 10 Sat 11 Mar 12 Mon 13 Tue 14 Wed 15 Thu 16 Fri 170

100

200300

400

Session Overview

CelestialMadonna

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 7 IE 0 1 3 4 5 6 7 8

Hypnotia0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 7 IE 0 1 3 4 5 6 7 8

ElMuerto

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 7 IE 0 1 3 4 5 6 7 8

AndroidMan

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 7 IE 0 1 3 4 5 6 7 8

Ka-Zar0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 7 IE 0 1 3 4 5 6 7 8

IronMonger

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 7 IE 0 1 3 4 5 6 7 8

Electro0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 7 IE 0 1 3 4 5 6 7 8

DoctorSun

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 7 IE 0 1 3 4 5 6 7 8

Daytripper0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 7 IE 0 1 3 4 5 6 7 8

Sessions Total Unique Actions Global Score Length Duration Starting Time Operating System Browser IP Address TaskUser Profiles

Bloodtide0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 10

Win 7Win 8.1

ChromeEdge

Firefox IE 0 1 2 3 4 6 7 8 9

BeautifulDreamer

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 10Win 7

Win 8.1Chrome

EdgeFirefox IE 0 1 2 3 4 6 7 8 9

Century,Turner

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 10Win 7

Win 8.1Chrome

EdgeFirefox IE 0 1 2 3 4 6 7 8 9

Destiny I0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 10

Win 7Win 8.1

ChromeEdge

Firefox IE 0 1 2 3 4 6 7 8 9

Headknocker0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 10

Win 7Win 8.1

ChromeEdge

Firefox IE 0 1 2 3 4 6 7 8 9

Giant-Man

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 10Win 7

Win 8.1Chrome

EdgeFirefox IE 0 1 2 3 4 6 7 8 9

Eros0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 10

Win 7Win 8.1

ChromeEdge

Firefox IE 0 1 2 3 4 6 7 8 9

Leech0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 10

Win 7Win 8.1

ChromeEdge

Firefox IE 0 1 2 3 4 6 7 8 9

CrimsonCommando

0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 10000:00

01:0002:00

03:0004:00

05:0000:00

06:0012:00

18:0000:00

Win 10Win 7

Win 8.1Chrome

EdgeFirefox IE 0 1 2 3 4 6 7 8 9

Fixx0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100

00:0001:00

02:0003:00

04:0005:00

00:0006:00

12:0018:00

00:00Win 10

Win 7Win 8.1

ChromeEdge

Firefox IE 0 1 2 3 4 6 7 8 9

Sessions Total Unique Actions Global Score Length Duration Starting Time Operating System Browser IP Address TaskUser Profiles

Change of shifts

An organisation with shared profile traits

An organisation with varying profile traits

Full temporal duration of sessions selected

Complementary Roles

Variations in IPs

0123456789

Task Overview

0123456789

Task Overview

Figure 1: Our visual analytics approach VASABI as used for behaviour analysis on users from two organisations. The organisations with differentcharacteristics are investigated together with the users working within them. Using the full temporal duration of the sessions (top-left corner:Session Overview), one organisation is identified as having common behaviour traits across its members, whereas the other with varyingcharacteristics. In the first organisation (selected through the User Overview at the bottom-left corner), all users seem to use the same systemsand share IPs (observed in the User Profile view at the middle). In the second, users have unique IPs and are more varied in terms of thesystems they use. Looking closely, interesting patterns emerge: what could potentially be a shift change between two employees is observed inthe top organisation with two users (Celestial Madonna and El Muerto) having complementary roles. Task Overviews (two views on the right)indicate the variations within the computationally extracted tasks as carried out in the organisations.

an action, coloured based on the action’s group. This grouping isdone semi-automatically based on a t-sne projection of a word2vectransformation. The left of the view shows two statistics: the distri-bution of tasks in the entire dataset (lighter bars) and the distributionof tasks within a selection of interest, such as sessions performed bya particular user in a given time range (darker bars). This enablescomparison across tasks within a user, or two sets of sessions (T3).

3.3 Visual Analytics EnvironmentThe two User Profile views are integrated into a visual analyticsenvironment for facilitating exploration of user behaviour. Theenvironment is described in our previous papers [2, 3], including thefollowing three views:

• Session Overview provides a temporal distribution of sessionsin the entire dataset.

• User Overview explores sessions of interest (selected in theSession view) with user and organisation perspectives.

• Session Timeline shows actions along a time axis according towhen they occur.

4 EVALUATION

We conducted two one-hour long sessions with domain analysts tounderstand how they would use VASABI to perform their analyses.Here, we describe one of the use cases that were identified duringthe evaluation session: Organisational working patterns. The goalof this use case is to observe the variations in the behaviour profilesacross the users within organisations.

The analysis starts with the analyst selecting the whole time rangeof the data in the Session Overview due to goal of identifying over-arching behaviour (Fig. 1). Two different organisations are selectedin the User Overview (figure edited here to concurrently display twodifferent organisations). The first organisation is observed to havehighly regular behavioural traits. The users in this organisation oftenwork with similar operating systems, browsers and use a sharedIP across the organisation (T1). This is contrasted with anotherorganisation where these traits are less regular and users using awider range of IPs and browsers. This implies that the models builtfor these organisations need to consider these characteristics, e.g.,a deviation from the regular IP should be a more significant source

of unexpectedness for the first organisation compared to the other(T1). Another interesting observation is what might be referred toas a shift change between the two most active users (Fig. 1, topmiddle) in the first organisation by observing their working routines– aspects that can be factored in for a more sophisticated user model(T2, T3). Observing the per-user and per-organisation task profilesalso reveals a further insight. Some users in the first organisation– Celestial Madonna and El Muerto seem to have complementaryroles and perform tasks that are visibly different (see the differencein the two Task Overviews).

5 CONCLUSION

In this paper, we investigate how a visual analytics approach canprovide a comprehensive, multi-faceted understanding of user be-haviour to facilitate effective decision making in UBA-enabled cybersecurity systems. Through a preliminary evaluation, the analystsfound the VASABI tool help them perform real-world tasks usingtheir analytical strategies, explain cases that would not otherwise bepossible, and reduce their analysis time. We also identify limitationsand rooms for improvement including restricted analysis workflowand several visualisation design issues.

ACKNOWLEDGMENTS

This work is supported by the European Commission through theH2020 programme under grant agreement 700692 (DiSIEM).

REFERENCES

[1] J. Care and T. Phillips. Market guide for online frauddetection. https://www.gartner.com/doc/3849295/

market-guide-online-fraud-detection, January 2018.[2] P. H. Nguyen, C. Turkay, G. Andrienko, N. Andrienko, and O. Thonnard.

A Visual Analytics Approach for User Behaviour Understanding throughAction Sequence Analysis. In EuroVis Workshop on Visual Analytics.The Eurographics Association, 2017.

[3] P. H. Nguyen, C. Turkay, G. Andrienko, N. Andrienko, O. Thonnard,and J. Zouaoui. Understanding user behaviour through action sequences:from the usual to the unusual. IEEE Transactions on Visualization andComputer Graphics (in press), 2018.