normalizing variations in feature vector structure in keystroke dynamics authentication systems

Normalizing variations in feature vector structurein keystroke dynamics authentication systems

Zahid Syed • Sean Banerjee • Bojan Cukic

� Springer Science+Business Media New York 2014

Abstract Usernames and passwords stubbornly remain the most prevalent authentication

mechanism. Password secrecy ensures that only genuine users are granted access. If the

secret is breached, impostors gain the access too. One method of strengthening password

authentication is through keystroke dynamics. Keystroke dynamics algorithms typically

constrain the authentication entry to one valid sequence of key presses. In this paper, we

introduce the concept of event sequences. We explore the nature of variations between

multiple valid key-entry sequences and propose a scheme that effectively represents these

variations. We test the efficacy of the new authentication method in distinguishing users.

The experimental results show that typing proficiency of individuals is not the only

determining authentication factor. We show that typing sequence variations contain suf-

ficient discriminatory information to warrant their inclusion into user authentication

methods. Based on these results, we present a novel strategy to create feature vectors for

keystroke dynamics-based authentication. The proposed approach ensures that the feature

vector’s length and structure are related only to the length of the password, independent of

its content or the order of keys pressed. This normalization of feature vector structure has

multiple advantages including leveraging the discriminatory power of event sequences,

faster search-and-retrieval in n-graph-based authentication systems, and simplicity. The

proposed authentication scheme is applicable to both static and continual authentication

systems.

Z. Syed (&)Department of Computer Science, Engineering, and Physics, University of Michigan - Flint, Flint,MI 48502, USAe-mail: [email protected]

S. BanerjeeThe Robotics Institute, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USAe-mail: [email protected]

B. CukicUniversity of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USAe-mail: [email protected]

123

Software Qual JDOI 10.1007/s11219-014-9263-1

Keywords Keystroke dynamics � Authentication � Biometrics � Continual authentication

1 Introduction and motivation

The username and password authentication is the de facto standard for computer access

control due to the low cost of implementation and infrastructure. The effectiveness of this

system hinges on the assumption that only the legitimate, genuine user knows his or her

credential. This type of authentication can be circumvented through either brute force or

other more complicated attacks (Vu et al. 2003). A variety of measures are currently used

to harden the credential checking process, including visual captchas, user-specific ques-

tions, account to device or location association, and regular password changes. With the

advent of social networks, user-specific information, possibly hinting on the content of

weak passwords, became easier to obtain (Gross and Acquisti 2005), while IP/MAC

addresses in two way authentication schemes can be faked using off-the-shelf software

(Crenshaw 2009), making password authentication breaches more common. The cost of

maintenance of password authentication is high too (resets, changes, etc.), making the

quest for an effective alternative all the more important.

Traditional biometric authentication techniques such as fingerprint, iris, and face rec-

ognition have not been widely used in consumer devices due to their need for relatively

costly sensors and interoperability issues (Ross and Jain 2004). Thus, enhancements to

low-cost password-based authentication which provide an additional level of trustworthi-

ness for traditional computing devices are appealing.

Keystroke dynamics is a behavioral biometric with good authentication performance

(Bleha et al. 1990). It evaluates the typing rhythm of a user by measuring the duration of

each key press, the latency between successive key presses etc. These time periods are

called the hold and delay times, respectively. Hold times will always exhibit positive

values as a finite amount of time is required to press a key, while delay times may be

positive or negative. A negative delay time occurs when a user presses the succeeding key

prior to releasing the current key.

Keystroke dynamics-based systems can be deployed under either a fixed text or a free

text scenario. Fixed text analysis is constrained to predetermined content and is effective

for building static user authentication systems (such as username and password login

schemes). Free text, on the other hand, can be used to develop continual authentication

systems (Young and Hammon 1989) that determine whether an impostor has taken over

another user’s session.

Current keystroke dynamics literature typically assumes that in order to measure sim-

ilarity between typing patterns of individuals, a text string is entered using the same

sequence of keys. However, the same string can be typed using different keystroke

sequences. For example, an uppercase alphabet (on an ANSI-US keyboard layout) can be

typed by either enabling the Caps-lock key or by using the Shift key modifier. The number

of ways to type it depends on the keyboard layout, the characters in the string and their

order.

This issue of variant typing sequences has not been addressed previously in the key-

stroke dynamics literature. Data collection and analysis avoided this issue by restricting the

keystroke entries used to type a string to a unique sequence (Montalvao et al. 2006;

Killourhy and Maxion 2009; Giot et al. 2009; Allen 2010; Bello et al. 2010). This may

Software Qual J

123

have resulted in removing the confounding factors in keystroke analysis and simplifying

the experimental setup. However, we show that variations in keystroke sequences can

provide valuable discriminatory information and improve the performance of keystroke

dynamics authentication systems.

Machine learning algorithms are constructed by providing data in two phases: the

training phase followed by the testing phase. Both the training and testing data are com-

prised of feature vectors. Each feature vector represents one training/testing instance and is

comprised of a set of attributes (also called features). The number of attributes and their

order within the feature vector are referred to as the feature vector structure.

A natural consequence of considering event sequences in user’s typing is that the feature

vector length and structure will vary for strings of the same length and even for identical

strings. To alleviate this problem, we propose a strategy to represent keystroke dynamics-

based feature vectors using a normalized data structure. This ensures that the feature vector

length and attribute order are dependent only on the string length and character order,

respectively. The proposed method has multiple advantages including leveraging event

sequences for better authentication accuracy, smaller search space for keystroke dynamics

systems that use multiple n-graphs to build user models, and easier data processing

capabilities due to the static feature vector structure.

The remainder of the paper is organized as follows. Section 2 provides a brief overview

of the various keyboard layouts. Section 3 lays out the concepts, terms, and representa-

tions. Section 4 elaborates on the types of variations in keystroke sequences and their

causes. Section 5 provides a description of the data set collected for the purposes of this

study. The effectiveness of user authentication performed over these keystroke sequences

is experimentally analyzed in Sect. 6. The proposed method for normalizing feature vector

structure is described in Sect. 7. Section 8 discusses related work. Section 9 summarizes

the paper and concludes with our thoughts for further work.

2 Keyboard layouts

A computer keyboard consists of character keys, modifier keys for altering the functions of

character keys, navigation keys for moving the text cursor on the screen, etc. Extended

keyboards also have a numeric keypad to facilitate calculations. A keyboard layout refers

to the arrangement of the keys, key labels, or key mappings on a keyboard. Keyboard

layouts can be categorized based on three criteria:

1. Mechanical layout refers to the physical keys and their placement on a keyboard as

shown in Fig. 1. Most keyboards use one of three mechanical layouts. The layouts

include:

(a) ISO layout is dictated by the ISO/IEC 9995 framework (ISO/IEC-9995-3:2010

2010).

(b) ANSI layout follows the ANSI-INCITS 154-1988 framework (ANSI-INCITS-

154-1988 1988).

(c) JIS layout follows the Japanese Industrial Standard JIS X 6002:1980 framework

(JISX-6002:1980 1988).

2. Visual layout refers to the arrangement of the legends (labels, markings, engravings)

that appear on the keys of a keyboard.

Software Qual J

123

3. Functional layout describes the software-based association of a key to character(s) for

all keys on the keyboard. The functional layout determines the key sequence required

to enter a character. For example, one functional layout outputs a character X by

pressing a certain key, while another layout would do so using another key (possibly

with a modifier key such as a Shift). Advanced users can also create custom functional

layouts using off-the-shelf software (Microsoft 2014a). Thus the functional layout

determines the keys to be pressed to output a character.

A functional layout organizes characters that the keyboard can generate into levels.

There are generally two levels on a keyboard. However, the ISO standard also allows for

the level 3. Characters at Level 1 are accessed by simply pressing the corresponding key.

Characters at higher levels are accessed using modifier keys. Level 2 characters are

accessed using the Shift key, and Level 3 characters (in ISO layout) are accessed via the

AltGr key (the ISO layout renames the right Alt key to AltGr—Microsoft 2014b). In

addition, the Caps-lock key can be used to change the case for dual-case characters, e.g, the

Latin alphabets A–Z on an ANSI-US layout. The key types relevant to this study are

enumerated below. These keys are present on all mechanical layouts:

1. Character keys: These are the keys denoted by letters A–Z, digits 0–9, punctuation

marks, and symbols on an ANSI-US layout.

Fig. 1 The ANSI (above) and ISO (below) mechanical layouts for keyboards. The shaded keys indicatethose that have different mappings compared with the other layout. The JIS layout is not pictured here

Software Qual J

123

2. Modifier keys: These are the Alt and Shift keys. A set of these is located on either side

of the keyboard (with the right Alt key replaced by the AltGr key in an ISO layout).

We also include the Caps-lock key in this category.

3. Numeric keypad: The keypad comprises of number keys 0–9 and the math operator

keys (?, �, /, *, .).

3 Terms and representations

For the purposes of this paper, the character keys as defined in Sect. 2 are denoted using

the letter they output on the screen when using the ANSI-US English layout. All keys other

than the character keys are denoted using the abbreviations shown in Table 1.

We define a string as the sequence of characters that are to be created using the

keyboard. A string can be a single character, a word, a sentence, a paragraph, or a com-

bination of these.

A keystroke is defined as the act of pressing and releasing a key. It consists of two

events: Key down (KeyDn) and Key up (KeyUp). The KeyDn event of any key X is

denoted as Xd and the KeyUp event as Xu. If the KeyDn and KeyUp events of a key occur

consecutively, it is denoted as Xdu.

Keystroke dynamics data are derived from the time stamps assigned to KeyDn and

KeyUp events. The primary attributes used to derive other metrics are hold and delay

times. The delay time is the time between two successive key presses, i.e., between two

successive KeyDn events. The hold time is the time a key is pressed, i.e., the time between

the KeyDn and KeyUp events for a key.

We also define a notation to represent the event schema and the event sequence. The

event sequence is the temporal sequence of all KeyDn and KeyUp events used to type

a string. The event schema is a generalized representation of similar event sequences.

The following example is an illustration. Suppose a user types the string W Ehat where

the underlined E indicates that the user erased it using the backspace (Bk) key. One

possible event sequence for this string, using the notation from Table 1, is

fRsd Wdu Rsu Lsd Edu Lsu Bkdu hdu adu tdug. Note that the right Shift key (Rs) is used to

type W while the left Shift key (Ls) is used to type E. Different Shift keys could be

used to type the uppercase letters, resulting in multiple event sequences for this string.

All of these event sequences can be represented using the event schema

fShd Wdu Shu Shd Edu Shu Bkdu hdu adu tdug, where Sh denotes any Shift key.

Table 1 Key abbreviations used in the paper

Key Notation Key Notation

Alt Al Right Shift Rs

Left Alt Lt Left Shift Ls

Right Alt Rt Either Shift Sh

Caps-lock Cl Numpad 0–9 N0–N9

Numpad symbols N?, N-, N/, N*, N., N=

Software Qual J

123

4 Causes of variations in event sequences

The number of event sequences for a given string is determined by the keyboard’s func-

tional layout, the variations in use of modifier keys, and the types of characters in the string

and their order. For clarity, we use the ANSI-based US functional layout to explain the

effect of each of these variables on the number of possible event sequences.

4.1 The effect of change in modifier key usage

The event sequence of prior character(s) affects possible event sequences to type the

subsequent character in a string. For example, consider a simple two-letter string ‘‘KA’’.

Assuming that the Caps-lock is initially disabled, the user can type the string using eight

different event sequences:

1. fShd Kdu Adu Shug—This event schema represents two distinct event sequences where

the user can use the right Shift (Rs) or the left Shift (Ls) key.

2. fShd Kdu Shu Shd Adu Shug—Each of the two characters can be typed using either of the

two Shift keys resulting in four event sequences.

3. fShd Kdu Shu Cldu Adug—Using either of the two Shift keys, the string can be typed in

this schema using two distinct event sequences.

However, if the first character is typed after first enabling the Caps-lock, it can be entered

as fCldu Kdu Adug. Thus, the string KA can be typed using nine different event sequences.

4.2 The effect of string content

Another reason for the variation in the number of event sequences for a fixed string is the

content of the string itself. In order to understand this, we divide characters according to

the number of event sequences that can be used to create them. For the ANSI-US layout of

character and modifier keys, we identify three types of event sequences:

1. Type I characters are typed using only one event sequence irrespective of the

characters before or after. Table 2 lists these for the ANSI-US layout. For example, the

semicolon (;) can only be typed with the event sequence {;du} even when it occurs with

other characters, e.g., fShd Adu Shu ;du wdug.2. Type II characters can be typed through one event sequence when they are typed in

isolation, e.g., the character ‘‘&’’ as in fShd &du Shug. However, when typed together

with other characters, they can be typed using different event sequences, e.g.,

fShd Adu &duShug or fShd Adu Shu Shd &duShug. Table 2 lists the Type II characters for

the ANSI-US layout.

3. Type III characters include all those not in Type I and II. These can be entered using

more than one event sequence both when isolated or when part of a string. On the

ANSI-US layout, alphabets A–Z (lowercase or uppercase) may be entered either by

first enabling Caps-lock such as fCldu Adug or by using a Shift key, e.g., fShd Adu Shug.Lowercase letters can be created by pressing a single key or by using the Shift key if

the Caps-lock is on, fClduShd adu Shug. Numbers 0–9 can be typed using the standard

number keys such as f6dug or using the numeric keypad e.g., fN6dug. Mathematical

operators ðþ � � =Þ can be typed using either the numeric keypad such as fNþdug or

using the number keys e.g., fShd þdu Shug.

Software Qual J

123

4.3 The effect of character order

Table 3 demonstrates how the number of possible event sequences varies based on the

character types in the string and their order. Notice that strings with the same length or

same characters can have different number of possible event sequences due to the change

in character order.

5 KDS-2 data set

We recently collected a keystroke data set (KDS-2) tailored to our research. Each user was

assigned two passwords. Each password contained twelve characters in a uniform format

SUUDLLLLDUUS, where S represents a symbol, U represents an uppercase character,

L represents a lowercase character, and D represents a numeric digit. We recruited 30

human subjects for the study. They were coupled into 15 virtual user pairs. Each user in a

pair was assigned the same two passwords. The users were unaware of the pairing process.

The pairing of users was done to simulate an environment where an impostor learns

someone else’s password, but does not have a chance to observe the genuine user’s

password typing rhythm.

The users were asked to submit their two passwords several times a day. The data were

entered via a web-based front-end application built using Javascript and collected at a

back-end server at our laboratory. The users submitted the data from their own computers.

The Javascript application provided the following information to the back-end server:

Table 2 Type I and II characterson an ANSI-US layout

Punctuation Symbols

Type I Type II Type I Type II

; : [ @ ^ _ \’ ’’ ] # & { [, ! \ $ ( }

? % ) |

Table 3 The number of possibleevent sequences changes due tothe string length or the order ofcharacters within the string

String length String 1 String 2 Event sequences

String 1 String 2

1 K 4 3 2

2 KA 4! 11 4

3 KA: 4!K 24 20

4 KA:a 4!Ka 30 40

4 KAa: 4!aK 38 36

5 KA:a3 4!Ka5 60 80

5 KAa:3 4!aK5 76 72

Software Qual J

123

1. The keys pressed and the time stamps of the key events.

2. IP address.

3. Browser type.

4. Date and time of submission.

In this paper, we only use the key event data, without the knowledge of the placement of

Shift and Alt keys. Consequently, irrespective of whether the typist used the right or left

Shift/Alt key, it is marked only as Shift/Alt key. This limitation in Javascript capture

resulted in some information loss. The subjects were selected with reasonable variation in

demographics so as to remove potential bias in experimental results. Each user was asked

to submit at least 40 legitimate entries per password. The features of the data set are

summarized in Table 4. Most users submitted 50–60 entries per password. The minimum

number of entries submitted for a password was 44. In order to create consistent experi-

ments, explained later, we used 40 entries per password from each user.

6 Experimental analysis

Having demonstrated the variations in event sequences for any given string, we proceed to

demonstrate observations from the point of view of realistic usage scenarios. We want to

learn whether different event sequences have negative impact on the accuracy of key-

stroke-based user authentication. For the experimental analysis, we used KDS-2 data set.

For the remainder of the paper, a user–password combination in the data set is referred to

as a subject. A subject pair refers to the user pair when they are typing the same password.

For example, say Users 3 and 24 typed passwords A and B. The combination of Users 3

and 24 when typing Password A is a subject pair. Similarly, Users 3 and 24 typing

Password B is another subject pair. There are 60 subject password pairs in the study (each

of the 15 user pairs is given two passwords; each can be used in genuine/impostor

attempts).

We set up the experiments to answer three research questions:

1. What is the quantitative difference between users with respect to the event sequences?

2. Is there any correlation between typing proficiency and event sequences exposed by a

user?

3. What is the effect of habituation on the event sequences of a user?

6.1 What is the quantitative difference between users with respect to event sequences?

In this experiment, we quantitatively measure the dissimilarity among paired users with

respect to the event sequences observed while they typed their passwords. To do so, all

keystroke entries in the data set were first represented with an event signature as shown in

Fig. 2. For each password, there are 40 event signatures ordered temporally. Each signature

Table 4 Characteristics ofKDS-2

Each user was asked torepeatedly type 2 passwords

Min # of entries/user 90

Max # of entries/user 245

Avg # of entries/user 118

Total users 30

Software Qual J

123

represents one password entry. Event signature is an abbreviated version of the event

sequence described in Sect. 3. It does not convey the same amount of information as the

event sequence. However, it is easier to construct from the data, and, given the knowledge

of the specific string/password, it can adequately represent every unique event sequence for

that string.

The event signature consists of concatenated tokens, as shown in Fig. 2. The first token

is numeric and represents the number of events in the event sequence for that keystroke

entry. The remaining tokens indicate the user’s decision to use or omit using the Shift or

Capslock key at instances where it was necessary to do so. These instances occurred

primarily when the case of the next character to be typed was different. The remaining

tokens are one of four types: S, NotS, C, NotC. ‘‘S,’’ and ‘‘C’’ indicate that the user used the

Shift or Capslock key, respectively, while ‘‘NotS’’ and ‘‘NotC’’ indicate that he/she did not.

As mentioned in Sect. 5, the data set does not contain information on whether the right or

left Shift key was used.

A sliding window is used on each subject pair (UserA-PasswordX and UserApair-Pass-

wordX) to calculate the proportion of event sequences that are unique to each subject. This

technique is illustrated in Fig. 2. The window is moved down the list one row at a time.

The idea behind this experiment is to simulate a scenario where an impostor has access to

the genuine user’s password and attempts to use it. In this sense, each user in a pair acts as

the impostor to the other while they are compared for similarity on the basis of their event

signatures only.

The sliding window technique generates 31 non-match scores for every subject. Each

non-match score indicates the percentage of entries in the window that are distinct from the

Fig. 2 Computing the difference between any two paired subjects with respect to event signatures: For thecurrent sliding window of 10 entries from each user, user A has two entries with event signatures that aredifferent from user B, while user B has eight entries that are different from user A. Thus, the non-matchscore for user A is 0.2 and 0.8 for user B

Software Qual J

123

paired subject with respect to their event signature, i.e., a higher non-match score indicates

less similarity between subjects. The mean and median non-match scores for each of the 60

subjects were computed, and the distribution of these scores is tabulated in Table 5.

The results show that across all subjects, event signatures were distinct among subjects

in *61 % of the entries in any given window. While this is the mean non-match score, we

further analyzed the distribution shown in Table 5. The results show the percentage of

subjects that had a mean and median non-match score of 100 %, [ 75 %, and [ 50 % for

any given window. These results indicate that for *31 % of the subjects, event sequences

are completely distinct from those of the paired subject in every single window, i.e., these

subject pairs did not have a single similar event sequence throughout the study. This

percentage is much higher (42.8 %) when considering the median non-match score.

Although subjects sometimes displayed high but less than perfect scores for some win-

dows, calculating the median score for a subject over all windows would still give a

median score of 10. Thus, a larger percentage of users appear to have a perfect score.

Another important observation is the degree of variation in dissimilarity between

subjects. 62 % of the subjects showed a score[50 %. For this group, the mean score was

91.3 %. The mean recall rate for subjects with a score of\50 % was 11.5 %. This uneven

distribution of scores is further illustrated in the histogram in Fig. 3.

The experimental results offer empirical evidence that users display considerably dif-

ferent event sequence typing patterns. There are comparatively few subject pairs with non-

match scores between 10 and 90 %. The distribution of non-match scores indicates that the

variations in keystroke sequences create a distinguishing factor in the majority of cases.

Furthermore, as we previously mentioned, this data set does not contain information on

which Shift key (right or left) was used. Based on results reported by other authors on the

importance of Shift key behavior (Bartlow and Cukic 2006), we believe that using this

information would distinguish users even further and boost the non-match scores.

6.2 Is there any correlation between typing proficiency and event sequences exposed

by a user?

Upon observing the uneven distribution of recall rates for the subject pairs, we hypothe-

sized that there may be a relation between the user’s typing proficiency and event

sequences. We theorized that fast typists tend to type in a manner that may be dissimilar to

slower typists and that this may be the cause of the uneven distribution of non-match scores

observed in the previous experiment. Therefore, the objective of this experiment is to

determine whether there is any correlation between the typing proficiency of users and the

Table 5 Distribution chart forrecall rates using event sequencesignatures

Mean recall rate % of users

Median Mean

= 100 % 42.3 30.8

[75 % 48.1 46.2

[50 % 61.5 61.5

Overall 60.5 60.8

Recall when [50 % 91.3 89.5

Recall when \50 % 11.5 14.9

Software Qual J

123

event sequences they might use. In order to test this, we first computed the difference in

median typing time and, additionally, the difference in non-match score for each subject

pair, as shown in Table 6. We computed the correlation coefficient between these two sets

of values. This value quantifies the correlation between the difference in typing proficiency

within a subject pair and the dissimilarity between their event sequences.

The analysis lead to the Pearson’s correlation coefficient of 0.016. This low correlation

value means that for a given pair of users, there is little or no correlation between the

difference in typing proficiency and a preference for certain event sequences. In other

words, users do not seem to prefer certain event sequences over others based on their

typing proficiency. Event sequences are independent of user’s typing proficiency. This

result suggests that event sequences provide a discriminatory authentication dimension that

is not dependent on user typing proficiency, but rather on other behavioral characteristics

of each user. Upon examining the data set further, we observed that some hunt-and-peck

users would use the same event sequence as their paired user who were proficient typists.

The fact that users tend to type a string or a password in their own way should be helpful in

differentiating between users even if they have similar typing proficiency. However, almost

the entire keystroke authentication literature ignores typing event sequences, rather con-

centrating on time intervals which are highly correlated with typing proficiency.

Table 6 Calculating correlation between typing proficiency (TT stands for Typing Time) and eventsequence usage

Subject # Grp A users Grp B users Difference

TT Recall TT Recall TT Recall

1 9,348 0.975 7,158 0.975 2,190 0

2 4,299 0.075 5,979 0.75 -1,680 0.025

… … … … … … …60 8,657 0.75 4,693 0.925 3,963 0.175

Correlation 0.016

Fig. 3 Distribution of user non-match scores

Software Qual J

123

6.3 What is the effect of time and habituation on the event sequences used by a user?

Previously, we showed that as users repeatedly type the same string, the typical scenario

for password entry, they exhibit habituation that reduces the time required to type the string

and makes the intra-key intervals become more uniform (Syed et al. 2011). We theorized

that, in a similar manner, users will tend to use fewer event sequences as they habituate to

typing a given string.

In order to analyze this behavior, we used the event signature representation of key-

strokes described in Sect. 6.1. Using a sliding window of ten entries, the similarity in event

signatures was measured using a similarity score S that assigns a higher score when fewer

variations of event signatures are observed. The similarity score is described in more detail

below. Since every subject contributes 40 entries, as the window size of 10 entries slides, it

creates 31 similarity scores as shown in Steps 1 and 2 in Fig. 4. As shown in Step 2 in

Fig. 4, scores denoted 2–31, for each subject, are split into 3 windows of equal length and

the median similarity score for each window is calculated. Thus, each of the 60 subject

pairs contributes a median similarity score for Windows 1, 2, and 3.

Our objective is to measure whether there is a significant difference in the similarity

scores of any of the three windows. This would indicate that users’ typing event sequence

patterns stabilize and exhibit fewer variations overtime. To test this hypothesis, we used

the Friedman’s test.

6.3.1 Friedman’s test

The Friedman test is a nonparametric test for treatment differences in a randomized

complete block design (Friedman 1937). It is used to test for differences between three or

more paired groups when the dependent variable being measured is ordinal. The groups in

this experiment are the windows. The N subjects and k windows are considered separate

independent variables in the analysis. The test statistic for the Friedman’s test is a Chi-

square with ðk � 1Þ degrees of freedom. Thus, the hypotheses for the comparison across

the windows are:

H0: The distributions of median similarity scores are the same across the windows.

H1: The distributions of median similarity scores across the windows are different.

Fig. 4 Procedure for testing user habituation using Friedman’s test

Software Qual J

123

As shown in Fig. 4, for a set of N subjects, three groups (windows) are compared to test

whether there are differences in similarity scores between any two. The values (similarity

scores) across each row are rank-ordered. The resulting ranks are then summed up for each

column. The null hypothesis asserts that there is no difference among the three groups.

Therefore, the sum of ranked scores in each column Rj should approximately bekðNþ1Þ

2. To

measure the degree to which each observed sum of ranked column varies from the null

hypothesis value, the Chi-square statistic v2r for Friedman test is calculated as

v2r ¼

12

Nkðk þ 1ÞXk

j¼1

R2j � 3Nðk þ 1Þ:

The results of Friedman’s test are defined by the parameters v2r and p. The test statistic v2

r is

distributed according to the v2r distribution with k � 1 degrees of freedom. The value p

defines the probability that the calculated v2r value will be obtained if the null hypothesis

was true. In our experiment, we chose the significance level of p ¼ 0:05, i.e., we reject the

null hypothesis if p� 0:05.

The Friedman’s test only determines whether there is a difference in the distribution of

similarity scores among the windows. It does not provide any information on which

pair(s) of windows are different. For this information, a post hoc analysis is carried out

using Wilcoxon signed-rank tests for multiple comparisons between the treatments.

A Bonferroni correction is applied to adjust for the multiple comparisons that are being

made (Bortz et al. 2000). We refer the reader to our previous work for a detailed expla-

nation of the Wilcoxon test in this scenario (Syed et al. 2011).

6.3.2 Similarity score S

The similarity in user typing pattern was measured over a window of consecutive entries

using a score. This similarity measure S is formulated as,

S ¼ Ncommon

Nseq

where Nseq is the number of different event sequences in the window and Ncommon is the

number of times the most common event sequence was typed. For example, consider the

window for Subject 1 shown in Fig. 4. Here Nseq ¼ 2 and Ncommon ¼ 6 and thus S ¼ 3.

6.3.3 Results

The procedure and calculated probability values for the Friedman’s test are given in Step 3 in

Fig. 4. The p value for the v2r statistic was calculated to be 0:0011. This indicates a significant

probability that the event sequence pattern of users changed overtime. Additionally, the post

hoc analysis indicated p ¼ 0:0002 for the difference between Windows 3 and 1, and p ¼0:0679 between Windows 3 and 2. This means that there is a significant difference between

event sequence patterns of Windows 3 and 1, and a near significant difference (the value of p

slightly higher that 0:05) between the subsequent Windows 3 and 2.

The results reveal that entries that are typed later are far more representative of the user

as compared to earlier entries. The implications of these results inform the development of

the authentication model and should be used to update matching strategies for keystroke

dynamics systems. These results reinforce those presented by Syed et al. (2011) where user

Software Qual J

123

habituation in the context of hold times and delay times was first confirmed. When

combined, these results show that in the course of an acclimation period, the user’s typing

profile (based upon hold time, delay time, total time, event sequence used, etc.) changes

significantly. Therefore, keystroke dynamics-based authentication that relies on complex

strings would see improved performance if users are asked to train on a string before

accepting it as a password. Such training would stabilize the event sequence too. Fur-

thermore, to leverage the discriminatory power of keystroke dynamics and event

sequences, it is necessary to discard early password entries whenever more mature ones

become available.

6.4 Threats to internal validity

1. Maturation In our data collections, users may have experienced boredom while typing

the same passwords repeatedly. This problem was addressed by asking the users to

submit as many entries as they were comfortable within each session. The duration of

the study also allowed sufficient time for the subjects to submit the required number of

entries. The users habituated to the password overtime. This effect has been a key part

of our study and, hence, not a threat to its validity.

2. Instrumentation The keyboard type was not controlled during the collection. Different

keyboards may exhibit differing hold and delay times based on their layout and

comfort level. Additionally, users may have used different keyboards while entering

the same password leading to additional variation in their typing patterns. This threat

to validity was reduced by monitoring the IP address and asking the user to specify

their keyboard type at the beginning of each session. Users were asked to memorize

their passwords prior to typing them too. However, it is impossible to ensure that each

user did so.

3. Impostor data The experimental setup coupled users into pairs where one unknowingly

acted as the imposter for the other. However, in the real world, imposter keystroke

samples could vary dramatically. Only a study with a larger user base would allow us

to automatically generate impostor keystroke data and, subsequently, build reasonable

impostor models.

4. Selection We used a user–password combination as a subject while performing

statistical tests. Since each user typed two passwords, each user appeared as two

subjects in the study. Independence between the subjects is required for the Friedman’s

test. It would be possible to suggest that there is a relationship between these ‘‘two’’

subjects due to the fact they represent the same user. However, our reasoning is that

since the passwords being typed are different, the two subjects represented by a single

user are likely independent.

6.5 Threats to external validity

1. Interaction of selection and treatments The age of users who participated in collec-

tions ranged from low-twenties to late sixties. They were of both genders and with

varying typing skills. Very few users listed ‘‘hunt and peck’’ as their typing profi-

ciency, and even among them all showed a typing speed of at least five characters per

second. ‘‘Hunt-and-peck’’ typists may be a more common demographic in the real

world, hence it may be argued that the test population is not representative of a broader

population.

Software Qual J

123

2. Interaction of settings and treatments The users who participated in the collection

were allowed to submit their data at their own convenience. Our objective was to

conduct the study in a realistic setting. Thus, factors such as posture, external

environment, the number of consecutive password entries, and other distractions were

not taken into account and may have an impact on the external validity of our findings.

7 Normalizing feature vector length and structure

7.1 Potential issues with feature vector generation

Based on the discussion about event sequences in Sect. 6, it is clear that this attribute holds

promise as a means to discriminate between users in authentication. It is also apparent that

event sequences may cause the feature vector length and attribute order to vary even for the

same string. This effect is illustrated in Table 7. Here, three feature vectors for the string

‘‘KA:a3’’ are shown when typed using distinct event sequences. The first two sequences are

identical except that the former uses the left Shift key while the latter uses the right Shift

key. The third sequence uses the Caps-lock key to type the uppercase letters. This causes

the feature vector length and attribute order to differ. This variation is problematic when

creating a user model through machine learning because feature vectors need have the

same structure to enable meaningful training.

A simple solution to this problem is to use a feature vector structure that is a union of

the disparate feature vectors. This method, illustrated in Table 7, will accommodate dif-

ferent feature vectors by populating similar attributes into the same column while padding

nonexistent attributes with a zero. However, this method poses some problems. For

example, each time, a different event sequence is encountered; the feature vector may need

to be reformatted to accommodate the new event sequence.

The problem of normalizing feature vector structure has not been addressed in the

existing literature. Table 10 describes the characteristics of current publicly available

keystroke dynamics data sets. The table shows that researchers have either exclusively

used lowercase letters during data collection (Montalvao et al. 2006) or simplified datasets

by converting uppercase letters to lowercase and thereby removing modifier key infor-

mation (Bello et al. 2010). In cases where users submitted complex strings as input, the

user was restricted to using a specific event sequence (Killourhy and Maxion 2009). While

Table 7 Using padding schemes to merge feature vectors generated for the same string ‘‘KA:a3’’ but withdifferent event sequences

Variations in hold times sequence in feature vectors

FV1 Lsh Kh Ah :h ah 3h

FV2 Rsh Kh Ah :h ah 3h

FV3 Clh Kh Ah Lsh :h Clh ah 3h

Merged feature vector Lsh Rsh Clh Kh Ah Lsh :h Clh ah 3h

Modified FV1 Lsh 0 0 Kh Ah 0 :h 0 ah 3h

Modified FV2 0 Rsh 0 Kh Ah 0 :h 0 ah 3h

Modified FV3 0 0 Clh Kh Ah Rsh :h Clh ah 3h

Padding for hold times is shown. Other attributes would be padded in a similar manner

Software Qual J

123

this strategy eliminates the problem of varying feature vector lengths, it seems impractical

and overlooks information that can provide a boost to the accuracy of KDB systems.

7.2 Proposed template

We propose a method that creates a statically formatted feature vector whose length is

defined only by the number and order of characters in the string. The basis of this method is

that any character must be typed using the character key and, possibly, modifier key(s).

Thus, we propose that the feature vector for a string be created by instantiating an attribute

module for every character in the string. Table 8 shows the structure of the attribute

module for any character c. The attribute module consists of data grouped according to the

character key CharKeyc and the modifier keys that might be used simultaneously with

CharKeyc. Each group structure within the attribute module is described below:

1. Caps-lock: This group of values is populated only if the Caps-lock key is used prior to

CharKeyc. This group consists only of the core attributes for the Caps-lock key. Core

attributes refer to keystroke dynamics data such as hold time, delay time, pressure

exerted, etc. Please note that in Table 8, only the hold and delay times (hi and di) are

used to represent core attributes.

2. Shift, Control and Alt key groups: Each of these groups is populated only when the

corresponding modifier key is used along with CharKeyc. In addition to the core

attributes, each of these groups contains a binary value LR to indicate whether the left

or right modifier key is used.

3. Modifier key order: When modifier keys are used, this field indicates their order. The

four modifier keys can be used in sixty-four distinct ways ð4P1 þ 4P2 þ 4P3 þ 4P4Þ, or

may not be used at all. Thus, this value has a range of 0–64.

4. Character key group: This field contains the core attributes for CharKeyc and its virtual

keycode. The virtual keycode is the system-level ID of the character key. It is based on

the key’s position on the keyboard and not the character it displays. Using the virtual

keycode allows an authentication system to easily distinguish between users with

different key mappings or functional layouts.

Table 9 shows the structure of a feature vector of a string of length l. The feature vector for

a string consists of two distinct parts: the attribute modules for each character in the string

and a collection of static attributes. The attribute module structure has been described

previously. The static attributes are string-specific or feature vector-specific attributes, such

as string typed, hardware ID, and operating system. Since these attributes are independent

of the string length and event sequence, the length of the feature vector remains constant

once such attributes have been defined.

Table 8 Structure of an attribute module

Modifier keys Character keyCharKeyc

Modifier key order Caps-lock Shift Control Alt

Coreattribs

LR Coreattribs

LR Coreattribs

LR Coreattribs

VKeycode Coreattribs

0–64 h1 d1 1/0 h2 d2 1/0 h3 d3 1/0 h3 d3 k h4 d4

Software Qual J

123

The above process normalizes the length and structure of the feature vector for a given

string, thus eliminating the variations in feature vector structure due to different event

sequences. Using the proposed method, the length of the feature vector lFV is

lFV ¼ l 5ncore þ nmisc þ 4ð Þ

where l is the string length, ncore is the number of core attributes used, and nmisc is the

number of static attributes.

7.3 Advantages of the proposed feature vectors

There are several advantages of the proposed feature vector format:

1. The length of the resultant feature vector is independent of the string contents and

event sequence used to type it.

2. It can be used in static and continual authentication systems. A normalized feature

vector structure allows for the straightforward training and testing of user models.

3. This approach could lead to faster and more efficient keystroke dynamics-based

continual authentication systems. Such systems aggregate keystroke dynamics data

based on n-graphs. A search-and-retrieve mechanism is then used during authenti-

cation to compare the appropriate n-graphs from the user model to those received as

input. The traditional feature vector format treats identical strings typed with different

event sequences as separate n-graphs. For example, consider the digraphs ‘‘KA’’ and

‘‘Ka.’’ The traditional feature vector format treats these two strings as two distinct

digraphs. However, the proposed feature vector structure can treat {KA, Ka, kA, ka}

as the same n-graph irrespective of their event sequence. This reduces the total number

of distinct n-graphs stored in the system, thus increasing the speed and efficiency of

data retrieval.

4. Barring some special cases explained in Sect. 7.4, every functional layout uses the

character keys and the modifier keys for character entry. This makes the template

suitable for any functional layout.

5. The agnostic nature of the template allows datasets to be portable between multiple

systems that use different sets of core attributes. For example, some keystroke

dynamics systems use attributes such as delay–delay times. Due to the static nature of

the template structure, it would take a fairly simple padding scheme to add or remove

core attributes from the feature vector template.

6. The virtual keycodes provide information on the functional layout of the used

keyboard. Thus, layout information can also be added to the feature vector for

discriminatory analysis.

7. It leverages more information that can be gathered from keystroke capture than current

feature vector templates. This includes all possible modifier keys, order of modifiers

keys used, and virtual keycode.

Table 9 The proposed feature vector structure for a string of length l

Attribute modules Static attributes

Mod 1 � � � Mod i � � � Mod l Attrib 1 � � � Attrib n

The feature vector length remains constant as it is dependent only on the string length

Software Qual J

123

8. The Shift key usage data are included in the feature vector. It has been shown that

using Shift key information increases classifier performance (Bartlow and Cukic

2006).

9. Some systems may not be able to use all the features described in the template. For

example, Java web-based systems do not have access to virtual keycodes produced by

the keyboard as this is a low-level operating system function. Thus, web-based

systems would simply ignore this attribute.

7.4 Limitations

The proposed method does not account for dead keys and compose keys. Dead keys

and compose keys are special modifier keys that are used to produce characters that are

not directly available on the keyboard layout. For example, if a keyboard has a dead

key for the grave accent (‘), the French character (a) can be generated by first pressing

(‘) and then (A). The compose key is used before a set of character keys to output a

special character. For example, the compose key followed by (*) and (n) outputs (n).

The proposed method treats dead keys and compose keys as regular character keys.

8 Related work

The concept of event sequences is a novel research idea although it is fairly intuitive. Our

survey for research related to this topic did not yield any results. We refer the reader to

Banerjee and Woodard (2012) for a comprehensive survey of the keystroke dynamics

research. The survey shows that researchers have made significant efforts in developing

better performing algorithms. However, the data used for research has not stepped out of

the early experimental bounds. Table 10 summarizes the type of strings for which the

keystroke dynamics data were collected and made publicly available. As can be seen,

current research has been limited in attempting to leverage the differences caused by

strings containing both uppercase and lowercase letters. In the case of Montalvao et al.

(2006), one of the databases contains free text data which does contain strings with

uppercase and lowercase letters. However, no information on the modifier keys exists in

any of the data sets. While Killourhy and Maxion (2009) use a complex string, an analysis

of the data set shows that the users were restricted to typing the strings using only one

event sequence.

Bartlow and Cukic (2006) showed that using data from the Shift key modifier in

passwords resulted in an improved classifier performance. The passwords used were of

the same form as those used in this paper. However, as has been explained in this work,

such a string can be typed using many different event sequences. The authors did not

discuss if and how the users and impostors were restricted, if at all, toward uniform

event sequences.

A survey of literature on keystroke dynamics shows that hold time and delay time

are the most commonly used attributes in keystroke dynamics (Banerjee and Woodard

2012). To our knowledge, there have been no work that addresses the problem of

variations in feature vectors for identical strings. This work, particularly suitable for

keystrokes-enhanced password authentication, represents the first attempt to tackle this

problem.

Software Qual J

123

9 Summary and future work

In this paper, we introduced the concept of event sequences in keystroke dynamics. It is

based on the flexibility that keyboards offer in typing a complex string using different key

sequences. Current keystroke data sets available for research restrict the user to type a

single event sequence string. We performed a series of experiments with keystroke event

sequences. These experiments ascertained the effectiveness offered by event sequences in

distinguishing users, the correlation between users’ typing proficiency and the event

sequences, and the effect of habituation on the event sequence uniqueness and stability.

The experiments indicate that event sequences increase separability between individual

users in keystroke dynamics authentication. We showed that as users habituate to a given

string, the variations in the event sequences decrease significantly.

It is important to note that further research is needed to better understand user habit-

uation behavior demonstrated in the paper. While the results in Sect. 6.1 show that event

sequences differentiate between keystroke sequences of paired users, the experiment with a

larger population of users would strengthen generalization. Our experimental results

indicate that the password acclimation period takes more than 30 entries. We also speculate

that some users probably exhibit a shorter acclimation periods. Furthermore, the password

acclimation period may also depend on the complexity of the string and the frequency with

which users type it.

Based on the concept of event sequences, we also described feature vector generation

and normalization. We proposed a strategy to normalize feature vector length and struc-

ture. The reasoning behind this contribution is based on the observation that secure

passwords and continual authentication inevitably deal with complex strings. Event

sequences and standardized feature vectors will lead to simpler, more robust, and,

potentially, more accurate keystroke dynamics authentication systems. Prior research has

shown that utilizing Shift key information (hold, delay times and left/right Shift key)

improves classifier accuracy. Additionally, the empirical evidence provided in this paper

illustrates that event sequences offer valuable discrimination dimensions. The consolida-

tion of multiple n-graphs into a single feature vector reduces the search space and leads to

Table 10 Characteristics of current publicly available keystroke datasets

Dataset Words used Remarks

Montalvaoet al.(2006)

‘‘chocolate, zebra, banana, taxi.’’,‘‘computador calcula’’

Lower case string. Single event sequence

Free text Contains uppercase letters but no shift orCaps-lock data

KillourhyandMaxion(2009)

‘‘.tie5Roanl’’ Single event sequence with noinformation on the Shift key used

Giot et al.(2009)

‘‘greyc laboratory’’ Single event sequence

Allen (2010) ‘‘pr7q1z’’, ‘‘jeffrey allen’’, ‘‘drizzle’’ Lower case string, single event sequence

Bello et al.(2010)

Paragraphs from Spanish translation of: OneThousand and One Arabian Nights and Warand Peace

Lower case paragraph, case insensitive.Quotes, dashes, hyphens removed.Users were allowed to make mistakesbut researchers used digraphs foranalysis

Software Qual J

123

faster search-and-retrieval of data. This may prove valuable in continual authentication

systems where decision times and efficiency are important. Furthermore, adherence to a

static feature vector format provides intangible benefits such as ease during data analysis

and better portability of data among researchers and systems.

References

Allen, J. D. (2010). An analysis of pressure-based keystroke dynamics algorithms. PhD Thesis, SouthernMethodist University.

ANSI-INCITS-154-1988. (1988). Office machines and supplies: Alphanumeric machines—keyboardarrangement. http://www.webstore.ansi.org/

Banerjee, S. P., & Woodard, D. L. (2012). Biometric authentication and identification using keystrokedynamics: A survey. Journal of Pattern Recognition Research, 7(1), 116–139.

Bartlow, N., & Cukic, B. (2006). Evaluating the reliability of credential hardening through keystrokedynamics. In 17th international symposium on software reliability engineering, 2006. ISSRE’06 (pp117–126). IEEE.

Bello, L., Bertacchini, M., Benitez, C., Pizzoni, J. C., & Cipriano, M. (2010). Collection and publication of afixed text keystroke dynamics dataset. In XVI Congreso Argentino de Ciencias de la Computacion.

Bleha, S., Slivinsky, C., & Hussien, B. (1990). Computer-access security systems using keystroke dynamics.IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(12), 1217–1222.

Bortz, J., Lienert, G. A., & Boehnke, K. (2000). Verteilungsfreie methoden in der biostatistik. Berlin:Springer.

Crenshaw, A. (2009). Changing your mac address in window xp/vista, linux and mac os x. http://www.irongeek.com/i.php?page=security/changemac

Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis ofvariance. Journal of the American Statistical Association, 32(200), 675–701.

Giot, R., El-Abed, M., & Rosenberger, C. (2009). Greyc keystroke: A benchmark for keystroke dynamicsbiometric systems. In IEEE 3rd international conference on biometrics: Theory, applications, andsystems, 2009. BTAS’09 (pp 1–6). IEEE.

Gross, R., & Acquisti, A. (2005). Information revelation and privacy in online social networks. In Pro-ceedings of the 2005 ACM workshop on privacy in the electronic society (pp. 71–80). New York:ACM.

ISO/IEC-9995-3:2010. (2010). Information technology: Keyboard layouts for text and office systems—part3: Complementary layouts of the alphanumeric zone of the alphanumeric section. http://www.iso.org/iso/home/store.htm

JISX-6002:1980. (1988). Keyboard layout for information processing using the jis 7 bit coded character set.http://www.webstore.jsa.or.jp/

Killourhy, K. S., & Maxion, R. A. (2009). Comparing anomaly-detection algorithms for keystrokedynamics. In IEEE/IFIP international conference on dependable systems and networks, 2009. DSN’09(pp. 125–134). IEEE.

Microsoft (2014a). The microsoft keyboard layout creator. http://msdn.microsoft.com/en-us/goglobal/bb964665.aspx

Microsoft (2014b) Windows keyboard layouts. http://msdn.microsoft.com/en-us/goglobal/bb964651.aspxMontalvao, J., Almeida, C. A. S., & Freire, E. O. (2006). Equalization of keystroke timing histograms for

improved identification performance. In 2006 International telecommunications symposium (pp.560–565). IEEE.

Ross, A., & Jain, A. (2004). Biometric sensor interoperability: A case study in fingerprints. In D. Maltoni &A. K. Jain (Eds.), Biometric authentication (pp. 134–145). Berlin, Heidelberg: Springer.

Syed, Z., Banerjee, S., Cheng, Q., & Cukic, B. (2011). Effects of user habituation in keystroke dynamics onpassword security policy. In 2011 IEEE 13th international symposium on high-assurance systemsengineering (HASE) (pp. 352–359). IEEE.

Vu, K. P. L., Bhargav, A., & Proctor, R. W. (2003). Imposing password restrictions for multiple accounts:Impact on generation and recall of passwords. In Proceedings of the human factors and ergonomicssociety annual meeting (Vol. 47, pp. 1331–1335). London:SAGE.

Young, J., & Hammon, R. (1989). Method and apparatus for verifying an individual’s identity. https://www.google.com/patents/US4805222. US Patent 4,805,222.

Software Qual J

123

http://www.webstore.ansi.org/

http://www.irongeek.com/i.php?page=security/changemac

http://www.irongeek.com/i.php?page=security/changemac

http://www.iso.org/iso/home/store.htm

http://www.iso.org/iso/home/store.htm

http://www.webstore.jsa.or.jp/

http://msdn.microsoft.com/en-us/goglobal/bb964665.aspx



https://www.google.com/patents/US4805222

https://www.google.com/patents/US4805222

Zahid Syed is an Assistant Professor at the Department of ComputerScience, Engineering, and Physics, University of Michigan - Flint,Flint MI.

Sean Banerjee is a post-doctoral scholar at The Robotics Institute,Carnegie Mellon University, Pittsburgh PA.

Bojan Cukic is a Professor and Department Chair of the College ofComputing and Informatics, University of North Carolina at Charlotte,Charlotte NC.

Software Qual J

123

normalizing variations in feature vector structure in keystroke dynamics authentication systems

Documents