Web Content Recommendation using Machine Learning on User Mouse Tracking Data

Sparsh Gupta

Pembroke College | Computing Laboratory, University of Oxford

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

September 2009


ABSTRACT

Websites are becoming more and more dynamic, but not more intelligent. Based on certain mouse clicks or user choices, today’s dynamic websites can mold themselves, but they cannot predict relevant data intelligently. The data contained in today’s websites is growing, and the number of users demanding unique, differing information is also ever increasing. This has created the challenging problem of delivering the right content to every user.

This thesis is an original work concentrating on solving this problem of generating relevant content for each individual user. One of the primary inputs used by the project is the mouse movement behavior of the user. If the website capturing mouse movements is built in such a way that the mouse pointer is mostly close to the user’s point of gaze, then tracking the mouse movement behavior would, in theory, amount to tracking the eye of the user. Based on this mouse movement data, further content can be predicted and personalized for each user using one or more machine learning models.

This thesis proposes a complete methodology for building and implementing such a system. As a proof of concept, an online shopping website has been built, and the tests conducted on it gave a remarkable accuracy of 84.09% when compared with the actual needs of the user.

The working demonstration of the project, along with its description, is available online at http://sparshgupta.name/MSc/Project

Keywords: adaptive web, machine learning, mouse movement, gaze point


ACKNOWLEDGEMENT

I am heartily thankful to my supervisor, Dr. Vasile Palade, whose encouragement, guidance, confidence in my idea and support from the initial to the final level enabled me to develop this project and understand the subject. I am thankful to the Computing Laboratory, University of Oxford, for accepting my proposal and giving me an opportunity to work on this idea. I gratefully acknowledge the support and help of all the volunteers who helped me collect the data for my work.

I would like to thank Prof. Luke Ong and Pembroke College for their co-operation and readiness to always help me when needed. I would also like to acknowledge the efforts of the staff of, and the facilities provided by, the Computing Laboratory Library, the Radcliffe Science Library and the Pembroke College Library.

Lastly, I offer my regards to my parents, my sister and my friends, who always supported me in all respects during the completion of this project.

Sparsh Gupta

TABLE OF CONTENTS

Abstract
Acknowledgement
Table of Contents
Table of Figures
Introduction
  1.1 A Primer
    1.1.1 The World Wide Web
    1.1.2 The computer mouse device
    1.1.3 Eye tracking
    1.1.4 WWW and the missing gap
    1.1.5 Tracking mouse pointer to track user’s eyes
  1.2 Motivation
  1.3 Objectives
  1.4 Structure of the dissertation
Background, Literature review and Project overview
  2.1 Coordination of mouse and eye movements
  2.2 Capturing mouse movements
  2.3 Tracking mouse movement to determine users’ behaviour
  2.4 Discussion
  2.5 Project overview
Data Collection and Pre-processing
  3.1 The initial website
    3.1.1 Specifications
    3.1.2 Implementation
      3.1.2.1 Webpage Design
      3.1.2.2 Database Design
      3.1.2.3 Implementing mouse tracking
      3.1.2.4 Final product bought by the user
    3.1.3 Testing the initial website
  3.2 Data collection
  3.3 Data compilation and cleaning
    3.3.1 Need and Specifications
    3.3.2 Implementation
      3.3.2.1 Data compilation
      3.3.2.2 Data cleaning
      3.3.2.3 Data normalization
Building machine learning models
  4.1 Machine Learning
    4.1.1 WEKA
    4.1.2 Why Machine Learning?
  4.2 Methods evaluated
    4.2.1 Decision Tree
    4.2.2 Neural Network
  4.3 Implemented algorithms
    4.3.1 Decision Tree (C4.5)
    4.3.2 Neural Network (Multilayer Perceptron)
  4.4 Model building
    4.4.1 Decision Tree
      4.4.1.1 Details of the chosen decision tree
      4.4.1.2 Testing the decision tree
        4.4.1.2.1 Testing on Training Data
        4.4.1.2.2 Testing by Cross-Validation (folds 10)
        4.4.1.2.3 Discussion
    4.4.2 Neural Network
      4.4.2.1 Details of the chosen neural network
      4.4.2.2 Testing the neural network model
        4.4.2.2.1 Testing on Training Data
        4.4.2.2.2 Testing by Cross-Validation (folds 10)
        4.4.2.2.3 Discussion
    4.4.3 Decision Tree Vs Neural Networks
Embedding the machine learning models in the website
  5.1 What and Why?
  5.2 Specifications
  5.3 Implementation
    5.3.1 Implementing the Decision Tree model
    5.3.2 Implementing the Neural Network model
  5.4 Using model outputs
  5.5 What next
Testing and Results
  6.1 Testing methodology
  6.2 Testing for model accuracy
    6.2.1 Testing data collection
    6.2.2 Model testing in WEKA using test data
      6.2.2.1 Decision Tree model
      6.2.2.2 Neural Network model
    6.2.3 Discussion
  6.3 Testing time performance of the models
    6.3.1 Decision Tree model
    6.3.2 Neural Network model
    6.3.3 Discussion
  6.4 Results
Conclusion
  Future Work
Bibliography
Appendix: Source Code
  HTML final webpage
  The JavaScript file
  The CSS file
  The PHP scripts
    data.php
    connect.php
    bought.php
    alignData.php
    predict.php

TABLE OF FIGURES

Figure 1: Project outline
Figure 2: Screenshot of the top half of the developed webpage
Figure 3: Screenshot of the developed webpage
Figure 4: Code given to each section of the webpage
Figure 5: Screenshot with a cell highlighted
Figure 6: Database table 'data'
Figure 7: Database table 'bought'
Figure 8: Parameters used for building the Decision Tree model
Figure 9: Parameters used for building the Neural Network model
Figure 10: Screenshot of the prediction done by the model


1 INTRODUCTION

This chapter gives a brief overview of a few terms. It then discusses the coordination between eye and mouse movement and how mouse movement data can be used as pseudo eye tracking data. Later, the chapter describes the motivation behind this project and clarifies the objectives of the research and the structure of this document.

1.1 A Primer

This section briefly discusses the history of the World Wide Web (WWW), the use of a computer mouse and current eye tracking technology. It then explains how the WWW can be improved by using eye tracking data and how a mouse pointer can be used to collect pseudo eye tracking data.

1.1.1 The World Wide Web

In 1990, CERN launched the world’s first website (1), which was only a few lines of text and hyperlinks. In the nineteen years since, websites have been completely transformed. Plain text is now accompanied by all sorts of rich media, including images, music, videos, animations, colours etc. Dynamic data from ever-increasing databases is rapidly replacing the static content of websites. Web servers are now capable of more real-time computing. Data can not only be shown to a user but can also be collected from him easily. Recently, the success of AJAX (2) has completely changed the web experience by making it much more interactive and more data driven.

Today, the Internet has changed everything: how we do business, how we study, how we connect with friends and, in general, how we live.

(1) CERN, Welcome to info.cern.ch/, http://info.cern.ch/.
(2) W3Schools, Ajax, http://www.w3schools.com/Ajax/.

1.1.2 The computer mouse device

Most people use a computer pointing device (generally a mouse) to navigate through a website. They click hyperlinks spread across different sections of a webpage, select text or scroll through a long page using a computer mouse. The mouse can safely be called a personal assistant while working on a computer, and especially while browsing a website.

1.1.3 Eye tracking

Eye tracking, or gaze tracking, is the process of measuring the gaze, i.e., keeping track of the point at which a user is looking. Most websites present visual information in the form of text, images, graphics, etc., and almost all the information a user obtains from a website is perceived through his eyes.

Eye tracking, when applied to a website, can be thought of as a method of determining the portion of the screen at which the user is looking. This information can potentially give a fair idea of the sections most relevant to him. The more time a user spends looking at a particular section, reading it or simply viewing it, the more interested he is in that section compared to the others on the same page.

1.1.4 WWW and the missing gap

Websites have started becoming dynamic by accepting inputs from a user, which are then used to select relevant content or information for him. The kinds of input that current websites primarily employ are mouse clicks, key presses, text entered and choices made by the user in the form elements of the page. Conversely, this means that if the user is not interested in giving any data as input, the website ends up being static, or without any information on the user’s needs.

Eye tracking data, if captured for a general user, can be put to extensive use in making today’s websites more adaptive and intelligent, by harnessing knowledge of the user’s interests and of the information he is most interested in. Without seeking any explicit input from the user, his interests and needs can be determined based on his eye movements, and he can be served the data he is most interested in.

1.1.5 Tracking mouse pointer to track user’s eyes

There has been a lot of research on improving the computing experience for a user by tracking his eye location, but there are a few drawbacks associated with it. Firstly, the tracking equipment is expensive and the user needs to physically wear the tracking gadget. Not everyone using the Internet would want to, or be able to, wear the tracking equipment, and hence general public websites cannot be made dependent on it. There is also ongoing research on determining the movement of the eyes using a camera, but as of now the accuracy of determining the gaze position is low, it depends on the movements of the user and on lighting conditions and, most importantly, the user needs to download external software. Because of these limitations of eye tracking methods, researchers have looked for alternatives.

Recently, Googlers Kerry Rodden and Xin Fu proposed in their paper (Rodden, et al. 2008) that mouse movements show potential as a way to estimate what the user has considered before deciding where to click. Other studies have provided a reasonable estimate of the coordination of mouse and eye, especially on a page on which a click is likely to happen. Hence, a user’s mouse movement data can sometimes be used as pseudo eye tracking data. There are several interface design techniques in Human Computer Interaction with which a website can make sure that, in most cases, the user’s mouse pointer is close to his point of gaze. One of the techniques employed in this project is mouseover cell highlighting. If the content at the current location of the mouse pointer is highlighted to make it stand out from the rest of the page, then this can almost always ensure that the mouse pointer movement is synchronized with the area the user is currently reading or gazing at.

1.2 Motivation

Many websites do not ask for any explicit input from the user but can still adapt themselves. They primarily use either geographical information (which can be obtained from the user’s IP address) or the browser and operating system specifications to adapt the web content for the user. This adaptation is, of course, not targeted at an individual user; it is only a broad adaptation that caters to a group of users having similar demographics or preferences. The adaptation of a website can be based on even the smallest bit of information from the user. The more information the website obtains about the user, the better it can adapt to his needs.

The primary medium of interaction between a user and a website is a mouse device, and it produces a huge amount of data in the form of mouse movement behaviour. The motivation behind this thesis and project is the existing gap between the demand for more user data to make a website adaptive and the availability of ample data from the user in the form of his mouse movements.

Further, if a website is designed in such a way that, more often than not, the user’s mouse pointer movement is synchronized with his point of gaze, as discussed in Section 1.1.5, then the data can also roughly be called eye tracking data.

1.3 Objectives

The objective of this project is to effectively utilize the mouse movement data of a user in making the web content more adaptive for him, by dynamically predicting further relevant content for him.

In order to achieve this main objective, the following sub-objectives need to be addressed:

• Collecting the initial training dataset of mouse movement behavior from a large set of users in order to train and build a model. This involves developing a website with well-defined areas, sections or elements where mouse movements can be tracked. The website needs to be such that the user’s mouse pointer synchronizes with his point of gaze.

• Asking volunteer users to visit this site and choose or select content for themselves as they would on any other website. The required data is the time spent at each section or element of the page while the user is browsing it. The target (predicted or dependent) variable is the content relevant to the user; hence, in order to train the model, this data point (collected explicitly from the user) also needs to be saved in the database.

• Cleaning and processing the data, an essential step after data collection, because it is important to remove all outliers that can harm the model. The time the user spent at different sections of the webpage should ideally be normalized by the total time he spent.

• Building machine learning models by using the collected mouse movement data as the training and initial testing dataset. The distribution of normalized time spent by the user at each section of the webpage provides the independent variables, and hence the input attributes of the model; the further content for the user is the output of the model, as the dependent variable.

• Embedding the machine learning models back into the website so that the model can be put to use. The website would continue tracking the user’s mouse movements and would use the built model to compute further content for him in real time.

• Testing the accuracy of the implementation. To do this, the predicted content needs to be compared with the actual content desired by the user.

To demonstrate the objectives, a sample shopping webpage has been developed. This webpage contains a comparison of the specifications of five laptop models. Based on the mouse movement behavior of a user across this page, the best laptop is recommended to him.

This can be visualized as follows: if a user has a browsing pattern that signifies that he is spending, say, 40% of his time reading about the RAM of the laptops (further distributed over the RAM sizes of the different models), 30% reading about the processors, 20% reading about the hard disk drive and the remaining 10% reading about other specifications, then, based on this data and the developed machine learning model, the most suitable laptop can be recommended to him. The accuracy of the recommendation can be checked by comparing the product finally bought by the user with the product recommended by the website.
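The normalization behind the percentages in this example can be sketched in a few lines. The snippet below is only an illustration under stated assumptions, not the thesis implementation: the section names and dwell times are hypothetical, and Python is used purely for readability. It divides each section's dwell time by the user's total time, which yields the kind of time distribution that would serve as the model's input attributes.

```python
def normalize_dwell_times(dwell_seconds):
    """Convert per-section dwell times (seconds) into fractions of the total.

    dwell_seconds: dict mapping a (hypothetical) section name to the time
    the mouse pointer spent over that section.
    """
    total = sum(dwell_seconds.values())
    if total <= 0:
        # No usable tracking data for this visit: return all zeros.
        return {section: 0.0 for section in dwell_seconds}
    return {section: t / total for section, t in dwell_seconds.items()}


# Hypothetical two-minute visit matching the 40/30/20/10 split in the text.
session = {"ram": 48.0, "processor": 36.0, "hard_disk": 24.0, "other": 12.0}
features = normalize_dwell_times(session)

for section, share in features.items():
    print(f"{section}: {share:.0%}")
```

Normalizing by the total makes visits of different lengths comparable, so a user who browses for twenty seconds and one who browses for five minutes can produce the same feature vector if their attention was distributed in the same way.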

1.4 Structure of the dissertation

This document starts by giving an idea of the related research being done across the globe. It then presents the complete implementation outline as a big picture of the project. Chapter 3 discusses the methodology for collecting the initial training data, including a complete description of the development of the initial website. It explains the process of collecting data, along with the structure of the databases and the data cleaning procedure. Chapter 4 gives the details of the machine learning models built and the procedure involved, along with the results of testing the models on the training data. Chapter 5 explains the procedure adopted to embed the built model into the website and the details of the AJAX communication link between the model, the data and the website. The thesis then explains the methodology used to collect testing data, the testing methodology and the results obtained with the models. It closes with some conclusions and the author’s view on the possibility of future work. The attached appendix contains all the source code.

The working demonstration of the project, along with its documentation and the source code under the GNU General Public License, is available online at http://sparshgupta.name/MSc/Project


2 BACKGROUND, LITERATURE REVIEW AND PROJECT OVERVIEW

This chapter describes previous and ongoing work around the world related to the problem. The chapter is divided into sections, each covering the independent and combined work done under that heading. The chapter then summarizes the ongoing work and presents an overview of the project carried out by the author.

The work done in this project is an original idea, and there is no record of any work using the same methodology. The problem has been tackled to some extent and has been considered by a few research groups, but their methodology and final conclusions were very different from what is proposed in this thesis. The following parts of the chapter highlight some of the recent developments and work done in related fields.

2.1 Coordination of mouse and eye movements

Whether mouse tracking can substitute for, or at least partially replicate, eye tracking is an active research question.

(Chen, Anderson and Sohn 2001) studied the relationship between the gaze position of a user and his cursor position on a computer screen during web browsing. They conducted tests on several websites, recorded the eye and mouse movements of the users and studied them separately. They concluded that there is a strong relationship between gaze position and cursor position and that there are regular patterns of coordination. They also argued that a mouse could provide more information than just x and y coordinates, which could be used to design better interfaces for human computer interaction. They wrote in their conclusion that “Our data show that the dwell time of cursor among different regions has strong correlation to how likely a user will look at that region. Also, in over 75% of chances, a mouse saccade will move to a meaningful region and, in these cases, it is quite likely that the eye gaze is very close to the cursor. This result implies that, by predicting the users' interests on web pages, mouse device could be a very good alternative to an eye-tracker as a tool for usability evaluation.”

According to the work done at Google labs (Rodden, et al. 2008), several different patterns of coordination between the eye and the mouse pointer were observed on a web search results page. The behavior patterns identified as indicating active usage were: following the eye horizontally, following the eye vertically and marking a particular result. This work was done entirely on a search results page, but it clearly concludes that coordination between the user’s eye and his mouse pointer exists.

There have been more studies, (Byrne, et al. 1999) among others, on the relationship and coordination between eye movements and mouse movements on the web. They found that some users use the mouse pointer to help them read the page, or to help them decide where to click. It was concluded that, given an intent or opportunity to click in the current user activity, the mouse is much more likely to be close to the eye. Eye tracking can provide insights into users’ behavior while using a search results page, but eye-tracking equipment is expensive and can only be used for studies where the user is physically present. The equipment also requires calibration, adding overhead to studies. In contrast, the coordinates of mouse movements on a web page can be collected accurately and easily, in a way that is transparent to the user. This means that mouse tracking can be used in studies involving a number of participants working simultaneously, or remotely through client-side implementations, greatly increasing the volume and variety of data available.

There is a basic rationale that states: "If I might click, I might as well keep the mouse close to my eyes." Where there is no potential to click, either because the user is in an evaluative mode or because the content of interest is devoid of links, the mouse and eye diverge.

2.2 Capturing mouse movements

There are several different methodologies for capturing the mouse movement behavior of a user over a webpage. The choice depends primarily on the type of data required and the mouse movement expected. (Arroya, Selker and Wei 2006) proposed a tool that needs no installation and is capable of tracking the user’s mouse movement. This mouse movement data can be visualized in a built-in system and can be used to further refine the usability of the webpage. They did not, however, propose any methodology to automatically refine the webpage.

(Edmonds, et al. 2007) discusses techniques for and uses of mouse tracking on a website, but purely from a usability point of view. It captures the mouse movement data of a user in a more detailed way, recording the coordinates and the row and column ID along with many other parameters. This methodology was found to be effective but has no direct bearing on the problem addressed here.

The paper (Torres and Hernando, Real time mouse tracking registration and visualization tool for usability evaluation on websites n.d.) proposes a methodology to track mouse movements on a webpage and visualize them in a tool that the authors developed. They used HTML and AJAX and proposed a method to link the mouse movements with the server logs and web-stat data to get additional information on the user’s behavior.

2.3 Tracking mouse movement to determine users’ behaviour

There was a famous project named ‘Cheese’ done at MIT (Mueller and Lockerd 2001), which extended the conventional web interface user model (based only on responses to mouse clicks) to account for all mouse movements on a page as an additional layer of information for inferring user interest. The authors developed a straightforward way to record all mouse movements on a page and conducted a user study to analyze and investigate mouse behavior trends, finding certain mouse behaviors common across many users. They also proposed that there are certain categories of mouse behavior and that, after tracking them, the website could be molded accordingly.

2.4 Discussion

The literature review showed that a lot of work has been done to establish and support the coordination of a user’s eye and mouse movements on a website. Eye tracking data has been used by Google to improve the usability of its search pages. There are several ongoing discussions on the effective use of eye or mouse tracking data to manually refine the content and usability design of a webpage.

It was, however, found that no work has been done on using mouse tracking data in a machine learning model to automatically refine or predict the content of a website for a user based on his mouse movement or eye movement behavior.


2.5 Project overview

The project undertaken can be stated as a method proposed to automatically refine or predict the contents of a webpage, for a user, based on his mouse movement behavior. From earlier studies, as stated in Section 2.1, it has been assumed that there is some coordination between a user’s eye movement and his mouse movement. Based on the mouse movements of an individual user, his preferences for content and his needs can be predicted, and this information can further be used by the owners of the website. If not the owners, this information can certainly help the user find the right content for himself.

To do this, the first task was to devise a methodology to track a user’s mouse movements on a webpage. Tracking can be done in several ways, and several different data points can be saved for a user based on his mouse movements. The thesis proposes a method to track the time spent by a user in every section of a webpage. Several JavaScript functions were written, and modifications were made to a standard website to enable mouse tracking in a hidden layer. AJAX was used to connect the JavaScript functions with the server-side PHP scripts, which were in turn connected to MySQL databases for storing the data. To demonstrate all this, a new dummy website imitating a shopping portal was developed. Once the website was developed with mouse tracking capabilities, it was made available to the public for two weeks to collect some initial data on users’ mouse movement behavior. The data collected was processed and cleaned before it was analyzed and modeled. This complete step of initial website development and data collection is explained in detail in Chapter 3 (Data Collection and Pre-processing).

It was then required to study and analyze the collected data and build a model on it so that it could be used in the future for new visitors. To do this, WEKA was used and different types of models were built. The models took the time spent by the mouse pointer in different sections of the webpage as the independent variables and predicted the relevant content for the user as the dependent variable. They were all built and trained on the initially collected data and were tested on the same training data. After several iterations, two models, one based on a Decision Tree and the other on a Neural Network, were obtained that gave significant accuracy on the training data. The complete model-building phase of the project, along with the test results obtained, is explained in Chapter 4 (Building machine learning models).

Once the two models (one Decision Tree and one Neural Network) were obtained, the task was to embed them both into the initial website. This was necessary so that the built models could be used for future visitors and the content relevant to them could be predicted based on their mouse movement activities. The models were coded in PHP on an Apache server and were connected with the front-end HTML page using AJAX. The PHP script was made to read the real-time mouse movement data of a given user directly from the MySQL databases and execute the model on it to predict further content for him. The whole procedure is explained in detail in Chapter 5 (Embedding the machine learning models in the website).

After the two models were embedded into the website, volunteers were again asked to visit the website. This time, not only were the user’s mouse movements captured, but he was also recommended appropriate content based on one of the two machine learning models. The mouse movement data was saved in the MySQL databases to be analyzed for accuracy later. This step is explained in Chapter 6 (Testing and Results).

The collected data was used as the test dataset and the two models were evaluated on their accuracy as well as their time performance. It was found that, under the present limitation of lack of data, the Decision Tree model edged out the Neural Network model on both accuracy and time performance. The details of this step are given in Chapter 7 (Conclusion).

The whole project can be outlined as follows:

1. Building the initial website capable of tracking mouse movements of the visitors.

2. Asking volunteering users to visit the website and capturing their mouse movements. Cleaning and compiling the collected data.

3. Using the captured mouse movement data of the users, building and training machine learning models.

4. Coding the obtained machine learning models back into the website.

5. Collecting the test dataset from the final website. The website is now capable of recommending the appropriate content for a user based on his mouse movement behavior.

6. Testing the accuracy of the built models using the collected test data. Also evaluating the time performance of the models on the web server.

Figure 1: Project outline


3 DATA COLLECTION AND PRE-PROCESSING

This chapter explains the complete steps of training dataset collection. It covers the details of the initial website developed and the steps followed to obtain the required training data from it. Later, the chapter explains the data compilation and cleaning steps performed on the initially collected data.

3.1 The initial website

To analyze the mouse movement behavior of the users on a webpage, the first step is the development of the website under consideration. Since the proposed method of analyzing and modeling the data is machine learning, some initial training data is also required. To cater to both needs, a dummy website capable of tracking the user’s mouse movements was built and made public. The website was kept live until the required data was collected. The specifications and details of the implementation are as follows:

3.1.1 Specifications

The functionalities, requirements and specifications of the initial webpage built are:

• The user interface design of the initial webpage needs to be exactly the same as that of the required final website. This is important because a user’s mouse movements depend on the interface of the webpage. It is necessary that the data collected to build and train the machine-learning model comes from the same webpage where the model is finally to be implemented.

• The mouse tracking needs to be implemented in a hidden layer so that the user can experience the web in the same rich way without any compromise on speed or performance. He should not be asked for any explicit information at any time.

• As stated in Section 1.3, the webpage developed was a dummy shopping portal showing five laptop models and comparing them on their configurations.

• There were 5 laptops with 22 attributes each. There was an empty (no laptop) specification heading space on the left-hand side of the page. The total number of sections in the built page was $(5 + 1) \times 22 = 132$, where 5 is the number of laptops, 1 is for the specification heading category (no laptop space) and 22 is the count of attributes per laptop.

• Each of these 132 sections of the webpage gets highlighted as soon as the mouse pointer reaches it. This makes it likely that the user reads the highlighted section of the webpage, and hence that the user’s mouse pointer is close to his point of gaze. In this way the mouse movement data provides pseudo eye tracking data of the user. The cell-highlighting feature was implemented using Cascading Style Sheets, where the cell color is changed as soon as the mouse pointer enters the cell.

• A MySQL database was connected for recording the time the mouse pointer spent on each section of the webpage. The final product bought by that user was also saved in the database.

3.1.2 Implementation

The webpage was developed in HTML using PHP as the server-side scripting language. JavaScript and AJAX were used to dynamically transfer data from the HTML fields to the PHP scripts. The database was designed in MySQL, and PHP scripts were written to connect and transfer data between MySQL and the Apache server.

3.1.2.1 Webpage Design

The webpage was designed in HTML in a tabular format with 6 columns and 22 rows. Column 1 had the headings of the specifications and the remaining 5 columns had the specifications of each laptop, with every row holding one specification. Each of the 132 cells thus obtained in the table corresponds to an independent variable (input variable) of the model. The screenshot of the top half of the developed page is shown in Figure 2 and the screenshot of the complete webpage is shown in Figure 3.

Figure 2: Screenshot of the top half of the developed webpage

It can be seen clearly that there are 6 columns and 22 rows on the webpage, and hence 132 cells. Since each of these cells is an input variable to the model, they were all given a code. Each laptop was given a number from 1 to 5 and the specification heading space was given the code 0. Each specification was given an alphabetic code from ‘a’ to ‘v’. Hence, each of the 132 sections of the webpage got a code combining the letter of the specification and the number of the laptop: a0, a1, a2, a3, a4, a5, b0, b1, b2, …, v3, v4, v5. The coding methodology for the first few cells is shown in Figure 4. These codes were not displayed anywhere on the webpage but were only used while calling the mouse tracking functions, as will be explained in subsequent sections.
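The cell coding scheme can be enumerated programmatically. The following is a minimal JavaScript sketch, not code from the thesis: the function name buildCellCodes is hypothetical, and the sketch assumes only what is stated in the text, namely 22 specification rows lettered ‘a’ to ‘v’ and 6 columns numbered 0 to 5.

```javascript
// Sketch only: enumerate the cell codes described in the text.
// buildCellCodes is a hypothetical helper, not part of the thesis code.
function buildCellCodes(rowCount, columnCount) {
  const codes = [];
  for (let row = 0; row < rowCount; row++) {
    // Rows are lettered 'a', 'b', ... in order of the specifications.
    const letter = String.fromCharCode("a".charCodeAt(0) + row);
    for (let col = 0; col < columnCount; col++) {
      // Column 0 is the specification heading; columns 1..5 are the laptops.
      codes.push(letter + col);
    }
  }
  return codes;
}

const cellCodes = buildCellCodes(22, 6);
console.log(cellCodes.length);      // total number of sections
console.log(cellCodes.slice(0, 7)); // first row plus the start of the second
```

Generating the codes once in this way would give the same list to the page markup and to any later data-processing step.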

Figure 3: Screenshot of the developed webpage


Figure 4: Code given to each section of the webpage

To make sure that, in most cases, the user’s mouse pointer is close to his point of gaze, a Cascading Style Sheet was attached to the HTML webpage. The CSS file had two different style formats that could be applied to each cell. One of the styles was the normal white background, whereas the other had a blue background to enable cell highlighting. As soon as the mouse enters a cell, the normal style is replaced by the highlighting style for that cell. This is reset as soon as the mouse leaves the highlighted cell. Similarly, the row and the column in which the mouse pointer is currently present are also highlighted in a light shade of blue. The CSS code of the different styles is available in the appendix of this thesis. The screenshot with the cell ‘g2’ highlighted is shown in Figure 5.

For every visitor of the website a unique user id was generated as soon as the page loaded. To keep the user id simple, it was set to the current JavaScript time value at page load. The JavaScript time function returns the current time in milliseconds since January 1, 1970. Within the current scope of the project, this was sufficient to give every visiting user a unique user id, since two visitors would have to load the page in the same millisecond to collide. The JavaScript code to generate the user id is:

    var userId = new Date();
    userId = userId.getTime(); // milliseconds since January 1, 1970

A JavaScript file named ‘mouseover.js’ was associated with this webpage, containing the JavaScript variables and functions required to track and record mouse movements. The HTML code of the website was also given an ‘onload’ event to call a JavaScript function named ‘start_It()’, which triggers the mouse tracking functionality of the website:

    <body onload="start_It();">

The algorithms of mouse tracking and the complete implementation are explained later, after the details of the database design.

Figure 5: Screenshot with a cell highlighted

3.1.2.2 Database Design

A database was created in MySQL with two tables, namely ‘data’ and ‘bought’. The attributes of the two tables are:


Figure 6: Database table 'data'

Figure 7: Database table 'bought'

Table: data

• userID: To record the user id of the user

• cellID: To save the cell ID that was assigned to each subsection of the webpage

• time: Contains the time in milliseconds spent in the cell ID

Table: bought

• userID: To record the user id of the user

• bought: To save the code of the final product bought by the user

The table ‘data’ saves the time spent by a user in each cell, i.e. section of the webpage. There can be 132 different sections (cell IDs) for each user, and each can appear multiple times. The time spent in each section by a user will be the independent variable for the model.

The table ‘bought’ records the final product selected by the user. The attribute ‘userID’ is the primary key of the ‘bought’ table and acts as the foreign key linking the two tables.

The rationale behind such a design was to implement database normalization so that data repetition could be avoided. Also, the insert queries would be simple and short, and hence efficient, and would not slow the webpage while it tracks the mouse and interacts with the databases simultaneously. The only drawback of such a design is that the data needs merging before it can be used for training the model.

3.1.2.3 Implementing mouse tracking

Each of the 132 cells of the webpage had JavaScript ‘onmouseover’ and ‘onmouseout’ event statements. OnMouseOver specifies that the ‘movement_in()’ JavaScript function be called every time the mouse comes over that cell. OnMouseOut similarly specifies that the ‘movement_out(‘cellID’)’ JavaScript function be called when the mouse pointer leaves the cell. The code snippet demonstrating these function calls is:

    <td onmouseout="movement_out('c1');" onmouseover="movement_in();"></td>

As soon as the mouse pointer enters a cell, the current DateTime is recorded in a temporary variable named ‘cellEntryDate’ in the function ‘movement_in()’. This function is not passed any argument. As soon as the mouse pointer exits a cell, the time spent in that cell in milliseconds is calculated by subtracting ‘cellEntryDate’ from the current DateTime in the function ‘movement_out(‘cellID’)’. The movement_out() function is also passed the unique two-character cell code to record the cell ID. The cell ID along with the time spent in the cell is appended to the data queue variable named ‘queue1’ or ‘queue2’. The JavaScript function definitions are as follows:

    function movement_in() {
        cellEntryDate = new Date();
    }

    function movement_out(cell) {
        cellExitDate = new Date();
        time = cellExitDate.getTime() - cellEntryDate.getTime();
        if (done == 0) {
            // Append "cell:time_" to whichever queue is currently active.
            if (flag == 0) {
                queue1 = queue1 + cell + ":" + time + "_";
            } else {
                queue2 = queue2 + cell + ":" + time + "_";
            }
        }
    }

The ‘done’ variable in the above code checks whether the current user is still active and has not already bought a product. ‘flag’ is a variable that indicates which queue variable is currently available.

Two instances of the queue variable were made to ensure that, while one queue is being transmitted to the server via AJAX, the other can record the cell movements. This is of great importance especially when the Internet bandwidth is low and data transfer can, in the worst case, take a lot of time. This step also ensures that the interaction experience of the user is not affected while mouse tracking is going on in the background.

As stated above, the built website had an ‘onload’ JavaScript event calling a function named ‘start_It()’. The start_It() function is a recursive function which calls the ‘sendData()’ function every 2 seconds. The sendData() function contains the AJAX statement to transfer the generated user ID (variable ‘userId’) and the queue variable, ‘queue1’ or ‘queue2’, to the ‘data.php’ file at the backend server. The JavaScript function definitions are as follows:

    function sendData() {
        var query_string;
        if (flag == 0) {
            // Switch recording to queue2, then send and clear queue1.
            queue2 = "";
            flag = 1;
            query_string = "data.php?userId=" + userId + "&queue=" + queue1;
            queue1 = "";
        } else {
            // Switch recording to queue1, then send and clear queue2.
            queue1 = "";
            flag = 0;
            query_string = "data.php?userId=" + userId + "&queue=" + queue2;
            queue2 = "";
        }
        http.open("GET", query_string, true);
        http.onreadystatechange = handleHttpResponse;
        http.send(null);
    }

    function start_It() {
        if (done == 0) {
            // Schedules one sendData() call after 2 seconds. The repetition
            // presumably comes from start_It() being called again by the
            // response handler, which is not listed in this section.
            setTimeout(sendData, 2000);
        }
    }

The ‘sendData()’ JavaScript function uses standard AJAX calls and the standard ‘http’ open, onreadystatechange and send functions. The ‘query_string’ variable contains the PHP file and the arguments passed to it via the GET method.

The ‘data.php’ file was coded such that it takes the queue variable sent by the JavaScript ‘sendData()’ function and explodes the string to extract the various cell IDs and the time values associated with them. It then opens a connection with the MySQL database and inserts records with the cell information into the ‘data’ table using the received userID. The complete code of the ‘data.php’ file is available in the appendix of the thesis.
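The queue string built by movement_out() has the form "cell:time_cell:time_…", and the server has to split it back into individual records. The following is a minimal sketch of that parsing step, written in JavaScript for illustration only; it is not the ‘data.php’ code from the appendix (which is in PHP), and the function name parseQueue and the validation rules are assumptions.

```javascript
// Sketch only: split a queue string such as "c1:350_a0:120_" into records.
// parseQueue is a hypothetical helper; the thesis does this step in PHP.
function parseQueue(queue) {
  return queue
    .split("_")                           // one entry per visited cell
    .filter((entry) => entry.length > 0)  // drop the empty tail after the last "_"
    .map((entry) => {
      const [cellId, time] = entry.split(":");
      return { cellId: cellId, time: Number(time) };
    })
    // Assumed validation: codes 'a0'..'v5' and a non-negative numeric time.
    .filter((rec) => /^[a-v][0-5]$/.test(rec.cellId) &&
                     Number.isFinite(rec.time) && rec.time >= 0);
}

const records = parseQueue("c1:350_a0:120_c1:80_");
console.log(records);
```

Each resulting record corresponds to one row to be inserted into the ‘data’ table together with the user id. The same cell can appear more than once, which is why the times are summed per cell later during data compilation.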


3.1.2.4 Final product bought by the user

Once the user had browsed through the webpage and scrolled over the table, reading about the various configurations of the five laptops (giving one case of the training data), he was required to select one of the products. This simulates the actual shopping portal scenario where a person reads about various products and finally buys one of them. To select a product, he clicks the ‘Buy Now’ button associated with the product, as shown in Figure 3.

As soon as any ‘Buy Now’ button on the webpage is clicked by the user, a JavaScript function named ‘bought(‘ProductID’)’ is invoked. This function uses AJAX and sends the user ID and the ID of the product clicked to the ‘bought.php’ file on the server. The ‘bought.php’ file on the web server connects to the MySQL database and inserts this information as a row in the table ‘bought’. The complete code of the PHP script ‘bought.php’ is available in the appendix and the JavaScript function is as follows:

    function bought(product) {
        done = 1; // stop further mouse tracking
        var query_bought = "bought.php?userId=" + userId + "&product=" + product;
        http.open("GET", query_bought, true);
        http.onreadystatechange = handleHttpResponseBought;
        http.send(null);
    }

Once the user selects the product, further mouse tracking is disabled. This is done by changing the value of the JavaScript ‘done’ variable.

3.1.3 Testing the initial website

The website, once completed, was hosted on a public web server and was tested thoroughly for bugs and errors. The main points in the checklist were:


• The queue variables (‘queue1’ and ‘queue2’) in the JavaScript file record the cell ID and time appropriately, and the data is extracted accurately at the server.

• Data is sent properly from the front-end JavaScript functions to the back-end PHP files via AJAX.

• The link between the database and the PHP files works correctly.

• Both tables in the database receive data and insert it properly without any error.

3.2 Data collection

When the website described in the previous section had been developed and tested completely, it was opened to the general public. Volunteers were invited via email and social media to visit the webpage. The volunteers were not selected by any criteria and came primarily from the contact group of the author. All the volunteers/visitors were asked to browse the webpage and buy a product on it (virtually, at zero cost), similar to the way they would on a real shopping site. From this sample the initial training data for the model was collected and saved into the databases as explained in the previous section. No personal information or any other data was requested from any visitor.

The duration of this step depends on the amount of initial training data required for building the model. The more sections the website has, i.e. the more independent variables the model has, the more cases are required in the initial training data to build a relevant model.

In a short span of 14 days, 292 unique visitors accessed the webpage. 244 rows were collected in the ‘bought’ table and 16401 tuples were saved in the ‘data’ table. Around 350-400 users were expected, but due to the lack of visibility of the project and no compensation being available to the volunteers, that number could not be reached. In view of the time constraints, the website was taken offline and the data was exported for further analysis and cleaning.

3.3 Data compilation and cleaning

3.3.1 Need and Specifications

The collected data in the two tables needs to be merged in such a way that each row of the new table corresponds to a single user and contains all information about him, i.e. each row is one case of the training data. Each case would include the times spent in the 132 sections of the webpage along with the user id and the product finally bought by the user. This is also the format required to train a machine-learning model in WEKA.

Moreover, the collected data needs to be analyzed properly and checked for errors. There might be some users who did not provide the information on the actual product bought, and hence the data related to them needs to be discarded. Some users are likely to have spent almost no time on the page, as they might have visited it accidentally; hence all users spending less than some threshold time need to be discarded. Similarly, users waiting on a section for more than a certain fixed time should be removed. These steps are important to ensure that there are no outliers in the collected data and that the model built and trained on this data is best suited for general usage on the website.

Since the absolute time spent on different elements of the webpage depends on a number of other factors, primarily the speed of an individual user, the data needs to be normalized. Dividing the time spent by a user on an individual section by the total time spent by that user on the website gives the proportion of time spent by him reading that section of the webpage.

Hence the final training data should contain only valid users’ responses, i.e. the product bought along with the normalized breakup of the time spent by them on the various sections of the webpage.

3.3.2 Implementation

First, all the data needs to be compiled into a single table as stated above, and then it needs to be cleaned.

3.3.2.1 Data compilation

A new PHP script named ‘alignData.php’ was written to compile the data into a more usable format. This file writes all the data to a new table named ‘finalData’ with the following 134 attributes: 132 corresponding to the time spent in the 132 sections of the webpage (independent variables), 1 to record the user ID of the user and 1 to save the code of the final product bought (target/dependent variable). The final product bought is the predicted variable in our model, which is discussed in the next chapter. The attributes of the ‘finalData’ table are:

• userID: To record the user id of the user

• a0: Time in milliseconds spent in cell ‘a0’ of the webpage

• a1: Time in milliseconds spent in cell ‘a1’ of the webpage

• a2: Time in milliseconds spent in cell ‘a2’ of the webpage

• a3: Time in milliseconds spent in cell ‘a3’ of the webpage

• a4: Time in milliseconds spent in cell ‘a4’ of the webpage

• a5: Time in milliseconds spent in cell ‘a5’ of the webpage

• b0: Time in milliseconds spent in cell ‘b0’ of the webpage

• b1: Time in milliseconds spent in cell ‘b1’ of the webpage

• b2: Time in milliseconds spent in cell ‘b2’ of the webpage

• … (similarly from ‘b3’ to ‘u3’)

• u4: Time in milliseconds spent in cell ‘u4’ of the webpage

• u5: Time in milliseconds spent in cell ‘u5’ of the webpage

• v0: Time in milliseconds spent in cell ‘v0’ of the webpage

• v1: Time in milliseconds spent in cell ‘v1’ of the webpage

• v2: Time in milliseconds spent in cell ‘v2’ of the webpage

• v3: Time in milliseconds spent in cell ‘v3’ of the webpage

• v4: Time in milliseconds spent in cell ‘v4’ of the webpage

• v5: Time in milliseconds spent in cell ‘v5’ of the webpage

• bought: To save the code of the final product bought by the user

The ‘alignData.php’ file selects all the responses stored in the tables ‘data’ and ‘bought’ and saves them in the table ‘finalData’. The attribute ‘userID’ is the primary key of the table. The algorithm implemented in the ‘alignData.php’ file was:

1. Select a list of unique users from the table ‘data’.

2. For each user with id ‘userID’, do:

   a. Select all the data (cell IDs and associated times) corresponding to that user from the table ‘data’. Use the SUM aggregate function in SQL on time and group by cell ID.

   b. This gives the total time spent on each visited cell, i.e. each section of the webpage visited by that user.

   c. The time spent on all other cells, i.e. sections not visited by that user, is set to zero.

   d. Insert all the time values for each cell into the ‘finalData’ table along with the user’s userID.

   e. Select the final product bought by the user using a select statement on the table ‘bought’. In case the user has not bought any product, i.e. the output from the ‘bought’ table for that user is empty, assign him the product number 0.

   f. Update the ‘finalData’ table by inserting the value for the ‘bought’ field corresponding to that user.
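The compilation algorithm can be illustrated in memory. The following is a minimal JavaScript sketch under stated assumptions: it is not the ‘alignData.php’ script (which runs SQL from PHP), the function name compileFinalData is hypothetical, the input arrays stand in for the rows of the ‘data’ and ‘bought’ tables, and only three cell codes are used to keep the example short.

```javascript
// Sketch only: merge 'data' rows and 'bought' rows into one row per user.
// compileFinalData is a hypothetical helper, not the thesis's PHP script.
function compileFinalData(dataRows, boughtRows, cellCodes) {
  const validCells = new Set(cellCodes);
  const boughtByUser = new Map(boughtRows.map((r) => [r.userID, r.bought]));
  const finalData = new Map();

  for (const { userID, cellID, time } of dataRows) {
    if (!finalData.has(userID)) {
      const row = { userID: userID };
      for (const code of cellCodes) row[code] = 0; // step c: unvisited cells stay 0
      // Step e: product 0 when the user has no entry in 'bought'.
      row.bought = boughtByUser.has(userID) ? boughtByUser.get(userID) : 0;
      finalData.set(userID, row);
    }
    // Steps a and b: sum the time per cell for this user.
    if (validCells.has(cellID)) finalData.get(userID)[cellID] += time;
  }
  return [...finalData.values()];
}

const sampleCells = ["a0", "a1", "b0"]; // the real page has 132 codes
const dataRows = [
  { userID: 1, cellID: "a0", time: 100 },
  { userID: 1, cellID: "b0", time: 30 },
  { userID: 1, cellID: "a0", time: 50 },
  { userID: 2, cellID: "a1", time: 10 },
];
const boughtRows = [{ userID: 1, bought: 3 }];
const finalRows = compileFinalData(dataRows, boughtRows, sampleCells);
console.log(finalRows);
```

In this example user 1 visited cell ‘a0’ twice, so the two times are summed, while user 2 never bought a product and is therefore assigned product 0.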

After successful execution of this algorithm in the ‘alignData.php’ script, the ‘finalData’ table contained all the data collected from the initial website in tabular form, with each row corresponding to a unique user. This data can now be used for model building in WEKA, but it first needs some cleaning.

The ‘data’ table had a total of 16401 tuples from 292 unique users, whereas the ‘bought’ table had 244 tuples. After executing the above script, the total number of tuples in the ‘finalData’ table was 292. Of these 292 tuples, 48 (292 minus 244) belonged to users who left the site without selecting any product. The table ‘finalData’ was then exported in a spreadsheet format (Microsoft Excel) for analysis, visualization and cleaning.

3.3.2.2 Data cleaning

With the 292 rows of data obtained in Excel, the next task is the data cleaning stage. This step removes all the outliers and other cases that can harm the training of the model and, eventually, the model itself. There can be multiple reasons behind the occurrence of such unwanted cases in the initial dataset, such as non-serious respondents, accidentally entering the webpage and closing it immediately, accidentally pressing the enter key, leaving the computer with the website open while working on something else, etc.

The following steps were followed to clean the collected data:

• All the tuples where the value of the attribute ‘bought’ is 0, i.e. the user has not bought any product, were deleted. This was because the objective of the project is to select the best product for a user, and hence the training set should contain only users who have bought a product. Training the model on data predicting that the user would not buy would make the model inappropriate for use in the current project.

  There were a total of 48 such tuples where the bought product value was 0. The remaining data contained 244 tuples, each corresponding to a unique visitor. All 244 users had bought a product (dependent variable is not 0).

• The total time spent on the webpage was calculated for every user using Excel’s built-in sum function. The distribution of the total time spent by different users on the built webpage was studied.

  It was found that the average time spent by a user on the webpage was 33.08 seconds. The maximum time spent by a user was 225.8 seconds, whereas the minimum was 1.2 seconds.

• The minimum and the maximum time spent by any user were analyzed to find the outliers. Since the minimum time in the current data is much lower than the minimum time any serious volunteer would be expected to spend, a threshold value of 8 seconds was selected. The maximum time of 225.8 seconds was found feasible and hence no upper limit was set.

  This value of 8 seconds was judged to be a feasible value keeping in mind the webpage design. It was assumed that any user taking less than 8 seconds on the webpage had given incorrect data and would be considered an outlier. There were 44 users who spent less than 8 seconds on the initial website while giving training data for model building. The rows associated with all 44 users were deleted from the collected sample, reducing the sample size to 200 tuples.

  The average time spent by a user became 40.26 seconds and the minimum time spent by a user in the new dataset became 8.3 seconds.
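The two cleaning rules (discard users who bought nothing, discard users below the 8-second threshold) can be sketched as a filter. The thesis performed this step manually in Excel; the JavaScript below is only an illustration, and the names cleanRows, totalTime and MIN_TOTAL_MS, as well as the sample rows, are hypothetical.

```javascript
// Sketch only: apply the two cleaning rules described in the text.
const MIN_TOTAL_MS = 8000; // 8-second threshold, expressed in milliseconds

// Total time a user spent on the page, summed over all cell columns.
function totalTime(row, cellCodes) {
  return cellCodes.reduce((sum, code) => sum + row[code], 0);
}

// Keep only users who bought a product and stayed at least the threshold time.
function cleanRows(rows, cellCodes, minTotalMs = MIN_TOTAL_MS) {
  return rows.filter(
    (row) => row.bought !== 0 && totalTime(row, cellCodes) >= minTotalMs
  );
}

const cells = ["a0", "a1"]; // the real table has 132 cell columns
const rows = [
  { userID: 1, a0: 5000, a1: 4000, bought: 2 },   // 9 s, bought: kept
  { userID: 2, a0: 700, a1: 500, bought: 1 },     // 1.2 s: removed as too short
  { userID: 3, a0: 20000, a1: 10000, bought: 0 }, // no purchase: removed
];
const cleaned = cleanRows(rows, cells);
console.log(cleaned.map((row) => row.userID));
```

No upper limit is applied here, matching the decision in the text that the maximum observed time was feasible.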

3.3.2.3 Data normalization

The data collected from the volunteers has 132 time fields, corresponding to the time spent in the 132 sections of the website, in absolute values. It was realized that data normalization would be required, because different people spent different amounts of time on the webpage. The time spent depends upon their individual browsing speed, reading speed and several other personal attributes. Since the desired model has to cater to a general audience, the time spent in one section relative to the time spent in the other sections was thought to be more appropriate.

There are several advantages to this step. The primary one is that the model is now capable of predicting in real time for a user who is still in the process of browsing the webpage. Whenever a prediction is needed, the current times spent in the various sections can be normalized and fed into the model. Since the model is now immune to the absolute time value, with every prediction for the same user, the model is not biased by the total time spent by him but depends only on the relative time spent on different sections of the webpage. Another advantage is that all the data used for training the model is now on an equivalent scale. The 200 cases in the training set are more comparable and do not vary on an absolute scale. This step is expected to train the model better.

Implementation

To carry out data normalization, the total time spent by a user was calculated in Excel (as was also done in the data cleaning step). The time spent in each individual section of the webpage by that user was then divided by the total time spent on the webpage by him. This step gave the proportion of the user’s time spent in each section of the webpage.
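The normalization is a division of each cell time by the user’s total time. A minimal JavaScript sketch follows; the thesis did this in Excel, so the function name normalizeRow and the zero-total guard are assumptions made for illustration.

```javascript
// Sketch only: convert absolute cell times into proportions of the total.
// normalizeRow is a hypothetical helper, not part of the thesis code.
function normalizeRow(row, cellCodes) {
  const total = cellCodes.reduce((sum, code) => sum + row[code], 0);
  const normalized = { userID: row.userID, bought: row.bought };
  for (const code of cellCodes) {
    // Assumed guard: a user with zero total time gets all-zero proportions.
    normalized[code] = total > 0 ? row[code] / total : 0;
  }
  return normalized;
}

const cellNames = ["a0", "a1", "b0"]; // the real table has 132 cell columns
const rawRow = { userID: 1, a0: 150, a1: 0, b0: 50, bought: 3 };
const normalizedRow = normalizeRow(rawRow, cellNames);
console.log(normalizedRow);
```

The same function could be applied to the partial times of a user who is still browsing, which is what makes real-time prediction possible: the proportions always sum to 1 regardless of how long the user has been on the page.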

The new dataset, with 200 tuples and 134 attributes (the user ID, 132 independent variables and 1 dependent variable) with normalized time data, was saved in the CSV format, which can be imported directly into WEKA for the model building task. The next chapter explains the procedure of building machine learning models in WEKA using the data collected in this chapter.


4 BUILDING MACHINE LEARNING MODELS

Using the collected data, various machine-learning models were built and tested. This chapter explains the complete methodology followed, along with the details of the models obtained. It then explains which models were selected as the best and the rationale behind the selection.

4.1 Machine Learning

According to Wikipedia1, “Machine Learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to learn based on data, such as from sensor data or databases.” It can be defined as a set of algorithms that automatically learn to recognize complex patterns and are capable of making intelligent decisions based on data.

Several software packages are available that can be used to build and implement machine-learning models. MATLAB and WEKA are two commonly used ones. The models used in the project were built using WEKA.

1 Wikipedia, Machine Learning - Wikipedia, http://en.wikipedia.org/wiki/Machine_learning.


4.1.1 WEKA

Weka1 is open source data mining software written in Java. It is primarily a collection of various machine-learning algorithms that can be applied directly and easily to different types of data. It has a built-in interface to visualize the data and can perform tasks like attribute selection, clustering, etc. It is available under the GNU General Public License and can be downloaded from its website.

4.1.2 Why Machine Learning?

The primary objective of the project is to automatically learn the user’s mouse movement behavior from the collected training data. Machine learning, as stated above, is a branch of science that deals with algorithms that are capable of learning patterns. This exactly fits the primary requirement.

The project further demands the capability to predict further content for a new user based on his mouse movements. Machine learning algorithms, once trained on a large set of data, are capable of predicting the value of the dependent variable for any new case. Moreover, machine-learning algorithms can be retrained repeatedly with new data. The complete objective of the project can therefore be met using machine-learning algorithms.

4.2 Methods evaluated

In machine learning, in order to classify or predict for any new case, a model is first built and trained on training data. A number of different types of models can be built, and there are many different algorithms for building a model. Types of machine learning models in general use include Decision Trees, Neural Networks, Genetic Algorithms, Fuzzy Networks, etc. To keep the scope of this project manageable, only Decision Tree and Neural Network based models were evaluated. The data was modeled using both methods, with the J48 classification algorithm for decision trees and multilayer perceptrons for neural networks. The two models were later evaluated on the training data.

1 The University of Waikato, Weka 3: Data Mining Software in Java, http://www.cs.waikato.ac.nz/ml/weka/.

4.2.1 Decision Tree

A decision tree can be defined as a decision support classifier that uses a tree-like structure of conditions and their possible consequences. Each node of a decision tree is either a leaf node or a decision node, where:

• Leaf node – holds a value of the dependent (target) variable.

• Decision node – contains one condition specifying a test on a single attribute value. Each outcome of the condition leads to a branch that ends in a sub-tree or a leaf node.

The attribute that is to be predicted is known as the dependent variable, since its value depends upon, or is decided by, the values of all the other attributes. The other attributes, which help in predicting the value of the dependent variable, are known as the independent variables in the dataset.
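The structure described above can be sketched as a small data structure with a traversal function. This is an illustrative JavaScript sketch, not code from the project: the node layout (`attribute`, `threshold`, `left`, `right`, `label`) and the function name `classify` are assumptions, and the two-leaf tree reuses one small branch (the split on `f5`) from the J48 output reported later in this chapter.

```javascript
// A decision node tests one attribute against a threshold; a leaf node holds
// a value of the dependent (target) variable. Field names are hypothetical.
const tree = {
  attribute: "f5",
  threshold: 0.001805,
  left: { label: 4 },  // branch taken when f5 <= threshold
  right: { label: 2 }, // branch taken when f5 > threshold
};

// Walk from the root to a leaf, following one branch per decision node.
function classify(node, instance) {
  while (node.label === undefined) {
    node = instance[node.attribute] <= node.threshold ? node.left : node.right;
  }
  return node.label;
}

console.log(classify(tree, { f5: 0.001 })); // 4
console.log(classify(tree, { f5: 0.01 }));  // 2
```

Here the independent variable is `f5` and the dependent variable is the product code stored in each leaf.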

4.2.2 Neural Network

“An Artificial Neural Network is an interconnected assembly of simple processing elements, units or nodes (neurons), whose functionality is inspired by the functioning of the natural neuron from brain. The processing ability of the neural network is stored in the inter-unit connection strengths, or weights, obtained by a process of learning from a set of training patterns.”¹

4.3 Implemented algorithms

Several decision tree algorithms are in common use nowadays, notably ID3, C4.5 and C5.0. After careful evaluation of these three algorithms, C4.5 was chosen for the project. The reasons for choosing C4.5 over ID3 and C5.0 were:

• C4.5 handles continuous variables better, by creating a threshold and then splitting the instances on that value. Since all the attributes in the required decision tree are continuous, whereas the target variable has five discrete values, C4.5 was used.

• C4.5 is able to prune trees. Pruning is a method of going backwards through a tree to remove any branches that do not help classification and replacing them with leaf nodes.

• C5.0 is generally ranked above C4.5 because of its higher tree-building speed and lower memory requirements. Since the project demanded neither of these, C5.0 offered no significant advantage. C5.0 can also weight attributes, which was not required for the problem under consideration.

Similarly, neural networks can be implemented in various ways, for example as a feedforward neural network, radial basis function network, Kohonen self-organizing network, recurrent network, stochastic neural network, modular neural network or holographic associative memory. The neural network implemented in the project was a feedforward neural network with a non-linear activation function.

¹ Kevin N. Gurney, An Introduction to Neural Networks, illustrated (CRC Press, 1997).

4.3.1 Decision Tree (C4.5)

WEKA implements the C4.5 decision tree algorithm as the ‘J48 decision tree classifier’. The C4.5 algorithm, as implemented by J48, works as follows:

• Whenever a set of items (training set) is encountered, the algorithm identifies the attribute that discriminates between the various instances most clearly. This is done using the standard equation of information gain.

• Among the possible values of this attribute, if there is any value for which there is no ambiguity, that is, for which all the data instances falling within its category have the same value of the target variable, then that branch is terminated and the target value obtained is assigned to it.

• For all other cases, the algorithm looks for another attribute that gives the highest information gain.

• This continues in the same manner until either a clear decision on the value of the target variable is reached through a combination of conditions on various independent variables/attributes, or the algorithm runs out of attributes.

• If the algorithm runs out of attributes, or gets an ambiguous result from the available information, the branch is assigned the target value that the majority of the items under this branch possess.
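The first step above, scoring an attribute by information gain, can be sketched for a continuous attribute split at a threshold. This is an illustrative JavaScript sketch under stated assumptions, not the J48 source: the row layout `{ x, y }`, the function names and the four toy rows are made up, and C4.5 itself goes on to normalize this quantity into a gain ratio, which is not shown here.

```javascript
// Shannon entropy (in bits) of an array of class labels.
function entropy(labels) {
  const counts = new Map();
  for (const y of labels) counts.set(y, (counts.get(y) || 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / labels.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Information gain of splitting `rows` on the test `attr <= threshold`.
// Each row is { x: { attr: value, ... }, y: classLabel } (hypothetical layout).
function informationGain(rows, attr, threshold) {
  const all = rows.map((r) => r.y);
  const left = rows.filter((r) => r.x[attr] <= threshold).map((r) => r.y);
  const right = rows.filter((r) => r.x[attr] > threshold).map((r) => r.y);
  const weighted =
    (left.length / all.length) * entropy(left) +
    (right.length / all.length) * entropy(right);
  return entropy(all) - weighted;
}

// Toy data, made up for illustration: the split separates the classes perfectly.
const rows = [
  { x: { b5: 0.01 }, y: 2 },
  { x: { b5: 0.02 }, y: 2 },
  { x: { b5: 0.06 }, y: 5 },
  { x: { b5: 0.07 }, y: 5 },
];
console.log(informationGain(rows, "b5", 0.04509)); // 1 bit: a perfect split
console.log(informationGain(rows, "b5", 0.1));     // 0: nothing is separated
```

The attribute and threshold with the highest score are chosen for the decision node, and the procedure is repeated on each branch.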

The name of the classifier in WEKA that follows the above-mentioned C4.5 algorithm is ‘weka.classifiers.trees.J48’.


4.3.2 Neural Network (Multilayer Perceptron)

A multilayer perceptron is a feedforward neural network based classifier that is trained using backpropagation. All the nodes in this network are sigmoid units, which means that the activation function is a sigmoid.

In a multilayer perceptron, there is an input layer with one node for each independent variable, at least one hidden layer, and an output layer with one node for each class of the target variable. The network is trained on initial data, which determines the appropriate weights for the connections between all the nodes of adjacent layers and also determines the bias/threshold value of each node.
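The computation performed by one sigmoid node can be sketched as follows. This is a generic JavaScript illustration, not WEKA's code; the function and parameter names are assumptions, and the weights are made-up numbers rather than values from the trained model.

```javascript
// Logistic sigmoid: squashes any real number into the interval (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Output of a single node: the weighted sum of its inputs plus the node's
// bias/threshold, passed through the sigmoid. Parameter names are hypothetical.
function nodeOutput(inputs, weights, bias) {
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += weights[i] * inputs[i];
  }
  return sigmoid(sum);
}

console.log(nodeOutput([0.5, 0.5], [1, -1], 0));   // 0.5, because the weighted sum is 0
console.log(nodeOutput([1, 1], [3, 3], 0) > 0.99); // true: large sums saturate near 1
```

Every hidden and output node repeats this calculation on the outputs of the previous layer.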

The name of the classifier in WEKA is ‘weka.classifiers.functions.MultilayerPerceptron’.

4.4 Model building

WEKA was opened in Explorer mode and the saved CSV file was loaded using the open file button in the Preprocess tab. From the attributes pane, the attribute userID was deleted, because this field is irrelevant to model building. The file was then saved in Attribute-Relation File Format (ARFF) by clicking the save button. The saved ARFF file was opened in a text editor to change the type of the predicted variable, i.e. the attribute ‘bought’, from numeric to nominal. This is an essential step because the ‘bought’ variable has only five discrete values, one for each product. It also enables the use of the J48 tree classifier, which requires a nominal predicted variable. To convert ‘bought’ from numeric to nominal, the type ‘numeric’ was changed to ‘{1,2,3,4,5}’, where 1, 2, 3, 4 and 5 are the codes for the five laptop products. The output expected from the models is one of the five laptop codes. The file was then saved and closed.
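The manual edit described above changes a single declaration line in the ARFF header. The same change could be scripted. The following JavaScript sketch assumes the header uses the usual `@attribute <name> <type>` form; `makeNominal` is a hypothetical helper (not part of WEKA), and the sample header, including the attribute name `a0`, is made up for illustration.

```javascript
// Rewrite the declaration of one attribute in ARFF text so that it becomes
// nominal. `makeNominal` is a hypothetical helper, not part of WEKA.
function makeNominal(arffText, attributeName, values) {
  const nominal = "{" + values.join(",") + "}";
  return arffText
    .split("\n")
    .map((line) => {
      const parts = line.trim().split(/\s+/);
      const isTarget =
        parts.length >= 3 &&
        parts[0].toLowerCase() === "@attribute" &&
        parts[1] === attributeName;
      return isTarget ? "@attribute " + attributeName + " " + nominal : line;
    })
    .join("\n");
}

// Made-up miniature header; the real file declares 133 attributes.
const before = [
  "@relation MLData_Normalized",
  "@attribute a0 numeric",
  "@attribute bought numeric",
  "@data",
].join("\n");

console.log(makeNominal(before, "bought", [1, 2, 3, 4, 5]));
```

Only the `bought` line changes; the numeric input attributes are left untouched.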


4.4.1 Decision Tree

The saved ARFF file was then re-opened in WEKA and, under the Classify tab, the J48 tree classifier was chosen. The J48 tree classifier has several parameters, such as binary splits, number of folds and pruning. Using trial and error, various parameter values were tried and each resulting model was tested for accuracy. Models were tested using two methodologies: testing directly on the training data and testing using cross-validation. The set of parameters giving the maximum percentage of correctly classified instances was chosen. The final model, giving maximum accuracy on the training dataset, was also saved for later use.

4.4.1.1 Details of the chosen decision tree

The final parameters selected, which gave the best output on the training data, are:

• binarySplits: By WEKA’s definition, this parameter is considered for nominal variables only. Since the dataset under consideration had no nominal independent variable, the value of this parameter had no impact on the tree built.

• confidenceFactor: This parameter defines the confidence factor used for pruning. It was found that a confidence factor of 0.75 gave a decision tree with good accuracy when C4.5 pruning was used.

• debug: This parameter is only used to output additional information to the console. Its value, whether true or false, did not affect the final model.

• minNumObj: This determines the minimum number of instances at every leaf node. It was set to a value of ‘2’.

• numFolds: This parameter determines the amount of data used for reduced-error pruning. For the decision tree built, numFolds was kept at ‘11’. This would mean that one fold is used for pruning and the rest for growing the tree.

• reducedErrorPruning: This signifies whether reduced-error pruning should be used instead of C4.5 pruning. It was set to ‘false’.

• saveInstanceData: This parameter only saves the instance data for future visualization.

• seed: The seed is used for randomizing the data when reduced-error pruning is used. Since reduced-error pruning was not used, the seed parameter had no relevance.

• subtreeRaising: Subtree raising while pruning is always advisable when used with a high confidence factor. Since a confidence factor of 0.75 was used, this parameter was set to ‘true’.

• unpruned: Since pruning was wanted, the ‘unpruned’ parameter was set to ‘false’.

• useLaplace: This parameter determines whether counts at leaves are smoothed based on Laplace. The parameter had no influence on the model output.

All the parameters used in the final decision tree are summarized in Figure 8.

Figure 8: Parameters used for building the Decision Tree model

The output from WEKA is as follows:

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -L -C 0.75 -M 2 -A
Relation:     MLData_Normalized-weka.filters.unsupervised.attribute.Remove-R1
Instances:    200
Attributes:   133
              [list of attributes omitted]
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

J48 pruned tree
------------------

b5 <= 0.04509
|   k4 <= 0.013828
|   |   v1 <= 0.000362
|   |   |   r0 <= 0.000626
|   |   |   |   d5 <= 0.003481
|   |   |   |   |   d5 <= 0.001586
|   |   |   |   |   |   g4 <= 0.033267
|   |   |   |   |   |   |   s3 <= 0.004874
|   |   |   |   |   |   |   |   u1 <= 0.002108
|   |   |   |   |   |   |   |   |   f1 <= 0.039667
|   |   |   |   |   |   |   |   |   |   f4 <= 0.028894
|   |   |   |   |   |   |   |   |   |   |   i4 <= 0.004699
|   |   |   |   |   |   |   |   |   |   |   |   d2 <= 0.001173
|   |   |   |   |   |   |   |   |   |   |   |   |   e5 <= 0.001377
|   |   |   |   |   |   |   |   |   |   |   |   |   |   e1 <= 0.029566
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   r3 <= 0.000861
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   c1 <= 0.043665
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   a3 <= 0.206815
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b1 <= 0.007319
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   f3 <= 0.001471
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b4 <= 0.00214: 2 (11.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b4 > 0.00214
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   a4 <= 0.004126: 3 (3.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   a4 > 0.004126: 2 (2.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   f3 > 0.001471: 3 (3.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b1 > 0.007319
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b3 <= 0.123969: 2 (12.0/2.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b3 > 0.123969: 1 (2.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   a3 > 0.206815: 1 (2.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   c1 > 0.043665: 1 (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   r3 > 0.000861: 3 (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   e1 > 0.029566: 1 (3.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   e5 > 0.001377: 3 (2.0)
|   |   |   |   |   |   |   |   |   |   |   |   d2 > 0.001173
|   |   |   |   |   |   |   |   |   |   |   |   |   s4 <= 0.002873: 2 (32.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   s4 > 0.002873: 4 (2.0)
|   |   |   |   |   |   |   |   |   |   |   i4 > 0.004699: 1 (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   f4 > 0.028894: 4 (2.0)
|   |   |   |   |   |   |   |   |   f1 > 0.039667: 3 (3.0)
|   |   |   |   |   |   |   |   u1 > 0.002108: 3 (6.0/1.0)
|   |   |   |   |   |   |   s3 > 0.004874
|   |   |   |   |   |   |   |   q1 <= 0.004708
|   |   |   |   |   |   |   |   |   r4 <= 0.007391: 3 (16.0)
|   |   |   |   |   |   |   |   |   r4 > 0.007391: 2 (2.0)
|   |   |   |   |   |   |   |   q1 > 0.004708: 2 (2.0/1.0)
|   |   |   |   |   |   g4 > 0.033267
|   |   |   |   |   |   |   g5 <= 0.004141
|   |   |   |   |   |   |   |   k4 <= 0.001354: 4 (8.0)
|   |   |   |   |   |   |   |   k4 > 0.001354: 3 (3.0/1.0)
|   |   |   |   |   |   |   g5 > 0.004141: 2 (3.0/1.0)
|   |   |   |   |   d5 > 0.001586: 4 (4.0)
|   |   |   |   d5 > 0.003481
|   |   |   |   |   g5 <= 0.004141
|   |   |   |   |   |   b5 <= 0.002996
|   |   |   |   |   |   |   g4 <= 0.003922: 2 (4.0)
|   |   |   |   |   |   |   g4 > 0.003922: 1 (2.0)
|   |   |   |   |   |   b5 > 0.002996: 3 (2.0)
|   |   |   |   |   g5 > 0.004141: 5 (3.0)
|   |   |   r0 > 0.000626: 4 (3.0/1.0)
|   |   v1 > 0.000362
|   |   |   s4 <= 0.005561
|   |   |   |   t4 <= 0.002371
|   |   |   |   |   e0 <= 0.001979
|   |   |   |   |   |   h2 <= 0.005305: 1 (18.0/1.0)
|   |   |   |   |   |   h2 > 0.005305: 2 (2.0)
|   |   |   |   |   e0 > 0.001979: 2 (2.0)
|   |   |   |   t4 > 0.002371: 2 (2.0/1.0)
|   |   |   s4 > 0.005561: 2 (2.0/1.0)
|   k4 > 0.013828
|   |   f5 <= 0.001805: 4 (9.0/1.0)
|   |   f5 > 0.001805: 2 (2.0/1.0)
b5 > 0.04509
|   t3 <= 0.000515
|   |   d4 <= 0.008991
|   |   |   e2 <= 0.011901
|   |   |   |   a1 <= 0.001341
|   |   |   |   |   g2 <= 0.001762: 4 (3.0/1.0)
|   |   |   |   |   g2 > 0.001762: 5 (2.0)
|   |   |   |   a1 > 0.001341: 5 (4.0)
|   |   |   e2 > 0.011901: 4 (3.0)
|   |   d4 > 0.008991: 2 (2.0/1.0)
|   t3 > 0.000515: 3 (3.0)

Number of Leaves  :     42

Size of the tree :      83

Time taken to build model: 0.75 seconds

4.4.1.2 Testing the decision tree

The model was tested using two different methodologies, namely testing directly on the training dataset and testing using cross-validation with 10 folds.

Testing on the training data gave an accuracy of 89.5%, whereas testing using cross-validation gave an accuracy of 66%. The complete results, along with the discussion, are as follows:

4.4.1.2.1 Testing on Training Data

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances         179        89.5    %
Incorrectly Classified Instances        21        10.5    %
Kappa statistic                          0.8586
Mean absolute error                      0.1650
Root mean squared error                  0.2382
Relative absolute error                 54.9013 %
Root relative squared error             61.5103 %
Total Number of Instances              200

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                0.848     0.030     0.848      0.848     0.848       0.953      1
                0.959     0.079     0.875      0.959     0.915       0.969      2
                0.932     0.019     0.932      0.932     0.932       0.994      3
                0.795     0.019     0.912      0.795     0.849       0.987      4
                0.818     0.000     1.000      0.818     0.900       0.999      5
Weighted Avg.   0.895     0.042     0.897      0.895     0.894       0.977

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 28  4  0  1  0 |  a = 1
  1 70  1  1  0 |  b = 2
  1  2 41  0  0 |  c = 3
  3  3  2 31  0 |  d = 4
  0  1  0  1  9 |  e = 5

4.4.1.2.2 Testing by Cross-Validation (10 folds)

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         132        66      %
Incorrectly Classified Instances        68        34      %
Kappa statistic                          0.5303
Mean absolute error                      0.2133
Root mean squared error                  0.308
Relative absolute error                 70.9833 %
Root relative squared error             79.5133 %
Total Number of Instances              200

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                0.545     0.072     0.600      0.545     0.571       0.865      1
                0.890     0.315     0.619      0.890     0.730       0.871      2
                0.500     0.000     1.000      0.500     0.667       0.874      3
                0.538     0.081     0.618      0.538     0.575       0.833      4
                0.545     0.016     0.667      0.545     0.600       0.983      5
Weighted Avg.   0.66      0.143     0.702      0.66      0.653       0.869

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 18 14  0  0  1 |  a = 1
  7 65  0  1  0 |  b = 2
  1 13 22  7  1 |  c = 3
  4 13  0 21  1 |  d = 4
  0  0  0  5  6 |  e = 5

4.4.1.2.3 Discussion

Testing directly on the training data classified 179 cases correctly out of 200, which is an accuracy of 89.5%. High accuracy on the training data is always desirable because it signifies the extent to which the model has learnt the training data. Since there were 5 classes in the target variable (5 products), any accuracy above 20% (the chance level if each class is equally probable, 1/5 = 0.2 = 20%) has to be considered good. An accuracy of 89.5% is well above this level and signifies that the decision tree has learnt the training data quite accurately.
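The headline figures can be recomputed from the confusion matrix of the training-set evaluation. The JavaScript sketch below is an illustration, not part of the project; the function name `summarize` is an assumption, and the kappa computed is the usual Cohen's kappa from row and column totals.

```javascript
// Confusion matrix from the "evaluate on training data" run of the decision
// tree: rows are actual classes 1..5, columns are predicted classes 1..5.
const confusion = [
  [28, 4, 0, 1, 0],
  [1, 70, 1, 1, 0],
  [1, 2, 41, 0, 0],
  [3, 3, 2, 31, 0],
  [0, 1, 0, 1, 9],
];

function summarize(matrix) {
  const rowTotals = matrix.map((row) => row.reduce((a, b) => a + b, 0));
  const colTotals = matrix[0].map((_, j) =>
    matrix.reduce((a, row) => a + row[j], 0)
  );
  const total = rowTotals.reduce((a, b) => a + b, 0);
  let correct = 0;
  let expected = 0; // agreement expected by chance, used by Cohen's kappa
  for (let i = 0; i < matrix.length; i++) {
    correct += matrix[i][i];
    expected += (rowTotals[i] * colTotals[i]) / (total * total);
  }
  const accuracy = correct / total;
  return {
    total,
    correct,
    accuracy,
    kappa: (accuracy - expected) / (1 - expected),
    recall: matrix.map((row, i) => row[i] / rowTotals[i]), // TP rate per class
  };
}

const summary = summarize(confusion);
console.log(summary.correct, summary.total);
console.log(summary.accuracy.toFixed(3), summary.kappa.toFixed(4));
console.log(summary.recall.map((r) => r.toFixed(3)).join(" "));
```

The diagonal sums to 179 of 200 instances, and the per-class recall values correspond to the TP Rate column of the WEKA output.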

Testing using cross-validation is a process of dividing the data into subsets, building the model on some of the subsets and testing it on the remainder. With 10 folds, the data is divided into 10 subsets; each subset in turn is held out for testing while the model is trained on the other nine, and the results of the 10 runs are combined into a single accuracy score. Again, as stated above, any accuracy of more than 20% is good. The achieved result of 132 correct classifications out of 200, an accuracy of 66%, is well within the desired range.
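The fold-by-fold procedure can be sketched as follows. This JavaScript sketch is a simplified illustration: the folds are assigned round-robin without shuffling or stratification (WEKA's report above is for stratified cross-validation), and `kFoldIndices`, `crossValidate` and the toy majority-class model are hypothetical names and data.

```javascript
// Split the indices 0..n-1 into k folds of near-equal size (round-robin).
function kFoldIndices(n, k) {
  const folds = Array.from({ length: k }, () => []);
  for (let i = 0; i < n; i++) folds[i % k].push(i);
  return folds;
}

// Generic k-fold cross-validation. `train(rows)` must return a model and
// `predict(model, row)` a class label; both are supplied by the caller.
function crossValidate(rows, k, train, predict) {
  let correct = 0;
  for (const testIdx of kFoldIndices(rows.length, k)) {
    const held = new Set(testIdx);
    const model = train(rows.filter((_, i) => !held.has(i)));
    for (const i of testIdx) {
      if (predict(model, rows[i]) === rows[i].y) correct++;
    }
  }
  return correct / rows.length; // every row is tested exactly once
}

// Toy usage: a "model" that always predicts the majority class it was trained on.
const rows = [2, 2, 2, 1, 2, 3, 2, 2, 1, 2].map((y) => ({ y }));
const trainMajority = (data) => {
  const counts = {};
  for (const r of data) counts[r.y] = (counts[r.y] || 0) + 1;
  return Number(Object.keys(counts).sort((a, b) => counts[b] - counts[a])[0]);
};
console.log(crossValidate(rows, 5, trainMajority, (model) => model));
```

With 200 instances and 10 folds, each model is trained on 180 instances and tested on the remaining 20.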

Ideally, the model should have been trained on more data. Due to the limitation of time, and with no compensation available to volunteers, only 200 tuples of useful data could be collected. It is expected that with a bigger training dataset, the accuracy of the models would increase.

4.4.2 Neural Network

The saved ARFF file was re-opened in WEKA and, under the Classify tab, the MultilayerPerceptron function was chosen. There are different parameters associated with this neural network function and, as with decision trees, trial and error was used to find the best set. The best set of parameters was the one that gave the maximum classification accuracy on the training dataset. Each model obtained was tested using two methodologies, namely testing directly on the training data and testing using cross-validation. After multiple iterations of trial and error, a model giving good classification accuracy was obtained. The model was also saved for later use.


4.4.2.1 Details of the chosen neural network

The final parameters selected, which gave the best output on the training data, are:

• GUI: The GUI parameter brings up an interface. It does not affect the final model unless changes to the learning rate and momentum are desired during training. It was set to ‘false’ in the project.

• autoBuild: The ANN was built automatically and hence this parameter was set to ‘true’.

• debug: This is used to view additional information on the console.

• decay: It was observed that setting decay to ‘true’ gave slightly lower accuracy, and hence in the final model ‘decay’ was set to ‘false’.

• hiddenLayers: Since an automatically sized neural network was desired, WEKA was left to decide the hidden layer, and hence the final set of parameters had a value of ‘a’ in the hiddenLayers field. ‘a’ is WEKA’s wildcard for a single hidden layer with (attributes + classes) / 2 nodes.

• learningRate: The rate at which the weights are updated was set to 0.1.

• momentum: A momentum of 0.2 was applied to the weights during updating.

• nominalToBinaryFilter: There were no nominal independent variables in the data and hence this parameter had no impact on the model.

• normalizeNumericClass: Since the class is nominal rather than numeric (and the input data was already normalized), this feature was of no use and was set to ‘false’.

• reset: When reset was set to ‘false’, no error message was received. Moreover, the chosen learning rate of 0.1 is already quite low, and hence this feature was set to ‘false’.

• seed: A seed value of 0 was used. As in the case of decision trees, this value is used to initialize the random number generator. Random numbers are used for setting the initial weights of the connections between nodes, and also for shuffling the training data.

• trainingTime: The number of epochs to train through was set to 5000.

• validationSetSize: The percentage size of the validation set was set to 0, which signifies that no validation set is used and the network instead trains for the specified number of epochs, i.e. for 5000 epochs.

• validationThreshold: This parameter was set to 20, which dictates that the validation set error can get worse 20 times in a row before training is terminated.

The parameters used in the final neural network model are summarized in Figure 9.

Figure 9: Parameters used for building the Neural Network model


It was found impossible to include the complete model output in this document, and hence only a summary of the model obtained is given here:

=== Run information ===

Scheme:       weka.classifiers.functions.MultilayerPerceptron -L 0.1 -M 0.2 -N 5000 -V 0 -S 0 -E 20 -H a -R
Relation:     MLData_Normalized-weka.filters.unsupervised.attribute.Remove-R1
Instances:    200
Attributes:   133
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

The chosen neural network had 1 hidden layer with 68 nodes. There were 132 input nodes accepting 132 normalized time values, one for each section of the webpage. The model had 5 output nodes, one for each of the five laptops.

There were a total of 73 threshold values for 73 nodes (68 hidden layer nodes + 5 output nodes) and there were 9316 weight values (132*68 + 68*5).
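The counts above follow directly from the layer sizes. A small JavaScript check is sketched below; `countParameters` is a hypothetical helper, and the hidden layer size is derived on the assumption that WEKA's ‘a’ setting means (attributes + classes) / 2, rounded down.

```javascript
// Count the weights and thresholds of a fully connected feedforward network,
// given the number of nodes in each layer from input to output.
function countParameters(layerSizes) {
  let weights = 0;
  let thresholds = 0;
  for (let i = 1; i < layerSizes.length; i++) {
    weights += layerSizes[i - 1] * layerSizes[i]; // one weight per connection
    thresholds += layerSizes[i];                  // one threshold per non-input node
  }
  return { weights, thresholds };
}

const inputNodes = 132;
const outputNodes = 5;
// Assumed meaning of the 'a' setting: (attributes + classes) / 2, rounded down.
const hiddenNodes = Math.floor((inputNodes + outputNodes) / 2);

console.log(hiddenNodes); // 68
console.log(countParameters([inputNodes, hiddenNodes, outputNodes]));
```

The result is 9316 weights and 73 thresholds, far more free parameters than the 200 training instances available.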

4.4.2.2 Testing the neural network model

The neural network model was tested in the same way as the decision tree, using two different methodologies: testing directly on the training set and testing using cross-validation with 10 folds.

It was found that testing on the training dataset gave an exceptionally good result of 95.0%, whereas testing using cross-validation with 10 folds gave a classification accuracy of 41.0%.


4.4.2.2.1 Testing on Training Data

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances         190        95      %
Incorrectly Classified Instances        10         5      %
Kappa statistic                          0.9335
Mean absolute error                      0.0219
Root mean squared error                  0.1313
Relative absolute error                  7.2772 %
Root relative squared error             33.8899 %
Total Number of Instances              200

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                0.939     0.012     0.939      0.939     0.939       0.966      1
                0.918     0.024     0.957      0.918     0.937       0.936      2
                1         0.026     0.917      1         0.957       0.993      3
                0.949     0.006     0.974      0.949     0.961       0.957      4
                1         0         1          1         1           1          5
Weighted Avg.   0.95      0.017     0.951      0.95      0.95        0.961

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 31  2  0  0  0 |  a = 1
  2 67  3  1  0 |  b = 2
  0  0 44  0  0 |  c = 3
  0  1  1 37  0 |  d = 4
  0  0  0  0 11 |  e = 5

4.4.2.2.2 Testing by Cross-Validation (10 folds)

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          82        41      %
Incorrectly Classified Instances       118        59      %
Kappa statistic                          0.2165
Mean absolute error                      0.236
Root mean squared error                  0.4551
Relative absolute error                 78.4778 %
Root relative squared error            117.4608 %
Total Number of Instances              200

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                0.333     0.12      0.355      0.333     0.344       0.614      1
                0.575     0.22      0.6        0.575     0.587       0.706      2
                0.295     0.237     0.26       0.295     0.277       0.626      3
                0.282     0.155     0.306      0.282     0.293       0.652      4
                0.455     0.042     0.385      0.455     0.417       0.856      5
Weighted Avg.   0.41      0.185     0.415      0.41      0.412       0.671

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 11  8  5  8  1 |  a = 1
  9 42 19  2  1 |  b = 2
  6 12 13 11  2 |  c = 3
  5  7 12 11  4 |  d = 4
  0  1  1  4  5 |  e = 5

4.4.2.2.3 Discussion

Testing on the training data classified 190 cases correctly out of 200, which is an accuracy of 95.0%. Such a high classification accuracy clearly signifies that the neural network model built has learnt the training data with high accuracy.

Testing using cross-validation is a process of dividing the data into subsets, building the model on some of the subsets and testing it on the remainder. With 10 folds, each of the 10 subsets in turn is held out for testing while the model is trained on the other nine, and the results are combined into a single accuracy score. The achieved result of 82 correct classifications out of 200, i.e. an accuracy of 41.0%, is comparatively low but is well within the desired range.

Ideally, the model should have been trained on more data. Due to the limitation of time, and with no compensation available to volunteers, only 200 tuples of useful data could be collected. Since there is a hidden layer with 68 nodes and a total of 9316 weight values are involved, a much bigger training dataset was required. It is expected that with a bigger training dataset, the accuracy of testing would increase.

4.4.3 Decision Tree vs Neural Networks

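Based on the initial 200 data cases, one decision tree model and one neural network model were trained. Under 10-fold cross-validation the decision tree showed better accuracy (66%) than the neural network model (41%), although the neural network scored higher when tested directly on the training data (95% against 89.5%). The other factors worth considering about the two models are: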

• Building a neural network model in WEKA is easy but time consuming; more importantly, it slows down the performance of the website after its implementation. The objective of the project is to determine the product for users in real time while they are still browsing, which requires very fast computation. Decision trees are a set of conditions, which can be evaluated much more efficiently than the calculations and temporary variables required by neural networks. However, if a parallel web server capable of performing calculations faster is used, a neural network could also be considered for implementation.

• Over time the website would keep accumulating more and more mouse movement data, and the model should be improved by training on new data whenever required. This would require re-implementing the new model every time an update is desired. As stated above, this would be more difficult, time consuming and error prone for neural networks than for decision trees.

• Decision trees are more transparent than neural network models. This means that, for a person inspecting the two models visually, a decision tree can convey some information whereas a neural network conveys nothing. This was, however, not one of the points considered before making the final choice of model.

Despite all these points, final models of both the neural network and the decision tree were implemented in two similar copies of the same website. Further tests of accuracy and performance were conducted later in order to determine the better model for the problem in hand.

The next chapter explains the steps required to put these models into the website so that they can be used in real time to predict relevant content for a user.


5 EMBEDDING THE MACHINE LEARNING MODELS IN THE WEBSITE

This chapter explains the complete methodology adopted to apply the built machine learning models in the website. It also explains the interaction between the model and the website and how a user’s mouse movement data was used to predict the best content for him in real time.

5.1 What and Why?

As explained in the previous chapter, a decision tree model and a neural network model capable of predicting the product the user is most likely to buy were built. These models need to be implemented in the website so that they can take the mouse movement behavior of new users as input and predict the appropriate product for each of them.

5.2 Specifications

The initial website built for collecting the training data, as explained in Chapter 3, was modified to implement the decision tree and neural network models. Some additional characteristics required from the website were:

• The model should reside on the server. This is essential from a security point of view; otherwise any user would have access to the model, which could be reverse engineered to reveal information about the products bought by other users.

• Real-time model evaluation on the real-time mouse movement data.

• Real-time transfer of the model output from the web server to the front-end website, so that the website can use the model prediction.

• Determining the product the user is most likely to buy, using the embedded models, periodically (say, every 10 seconds). This involves including the latest mouse movement data and transferring the output again to the front-end HTML website, so that any change in the predicted product can be reflected on the front end.

• All the tracking and model evaluation was carried out in a hidden layer: the user was not asked for any explicit information, and speed and performance were not compromised.

• Needless to say, the website should continue to track mouse movement as explained in earlier chapters.

5.3 Implementation

The website built initially to collect training data already had mouse movement tracking capability. A few new functions and scripts were added to enable model evaluation on the captured mouse movements.

A new JavaScript function named ‘predict()’ was added to the JavaScript file. ‘predict()’ effectively calls itself every 10 seconds: each call is re-scheduled with a timer once the previous response has been handled. This was because a prediction was expected from the machine learning model every 10 seconds. After each 10-second interval the database would contain more mouse movement data, which the machine learning models could use to, ideally, predict more accurately.

The ‘predict()’ function takes no arguments and calls a PHP script named ‘predict.php’, passing it the userID of the current user via the GET method. The ‘predict.php’ file resides on the server, and the call from JavaScript was programmed using standard AJAX techniques. The code snippet of the JavaScript ‘predict()’ function is:

    // Schedule the first prediction 10 seconds (10,000 ms) from now.
    function autoPredict() {
        setTimeout(predict, 10000);
    }

    // Ask the server-side script for a prediction for the current user.
    // 'http' (the AJAX request object) and 'userId' are defined elsewhere in the page.
    function predict() {
        http.open("GET", "predict.php?userId=" + userId, true);
        http.onreadystatechange = predictResponse;
        http.send(null);
    }

The ‘predict.php’ file connects to the MySQL database and selects the mouse movement data for the current user using a simple SQL ‘SELECT * …’ statement. The mouse movement data was saved into 132 temporary variables, one for each section of the webpage. The total time spent by the user so far was also calculated while these temporary variables were being filled. The absolute time spent in each section, as saved in the 132 temporary variables, was then replaced by the normalized time spent in that section, obtained by dividing the absolute time by the total time spent by that user.

Hence, after this step, the 132 temporary variables in the ‘predict.php’ file contain the normalized (relative) time spent by the user in the corresponding 132 sections of the webpage. These 132 temporary variables are the 132 independent input variables for the model.
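The normalization step performed in ‘predict.php’ can be sketched as follows. The project implements it in PHP; the sketch below uses JavaScript for consistency with the snippets in this chapter. The function name `normalizeTimes`, the guard for a zero total and the sample times are assumptions made for illustration.

```javascript
// Convert the absolute time spent in each section into a fraction of the
// total time. `absoluteTimes` is assumed to hold one non-negative number
// per section of the webpage (132 sections in the project).
function normalizeTimes(absoluteTimes) {
  const total = absoluteTimes.reduce((sum, t) => sum + t, 0);
  if (total === 0) {
    return absoluteTimes.map(() => 0); // no mouse activity recorded yet
  }
  return absoluteTimes.map((t) => t / total);
}

// Made-up example: 3 seconds in the first section, 1 second in the second.
const times = new Array(132).fill(0);
times[0] = 3;
times[1] = 1;

const normalized = normalizeTimes(times);
console.log(normalized[0], normalized[1]); // 0.75 0.25
```

Because every value is divided by the same total, the normalized inputs sum to 1 whenever any time has been recorded.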


The two models (decision tree and neural network) were then coded and given access to these 132 temporary variables, so that they could evaluate the normalized times and make their respective predictions. It should, however, be noted that for testing only one of the models was used at a time. The two models were tested separately later for comparison purposes. The implementation of the two models is as follows:

5.3.1 Implementing the Decision Tree model

A function named ‘decisionTree()’ was coded in the PHP file ‘predict.php’. This function had access to all 132 input variables described above.

The model built in WEKA was translated into a set of if-else statements covering the 83 nodes of the tree (83 being the size of the tree). All these if-else statements, along with the prediction values at the leaves, were coded in PHP. The if-else statements perform comparisons on the 132 independent variables so as to imitate the decision tree. The output of this set of if-else statements is a single value, which is also the output of the decision tree model: the product the user is most likely to buy according to the implemented decision tree. This value was returned to the main program by the function. The complete code of the function ‘decisionTree()’ and the ‘predict.php’ file is available in the appendix.
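The translation from tree to if-else statements can be illustrated with the smaller of the two top-level branches of the J48 output in Chapter 4 (the branch for b5 > 0.04509). The project's implementation is in PHP and is given in the appendix; the JavaScript below is only a sketch, and the function name `decisionTreeBranch`, the input object `x` and the `null` placeholder for the omitted branch are assumptions.

```javascript
// Partial translation of the J48 tree into nested if-else statements.
// `x` maps attribute names to normalized times. Only the branch for
// b5 > 0.04509 is written out; returning null for the other branch is a
// placeholder for this sketch, not the project's behaviour.
function decisionTreeBranch(x) {
  if (x.b5 <= 0.04509) {
    return null; // the larger b5 <= 0.04509 subtree is omitted here
  }
  if (x.t3 <= 0.000515) {
    if (x.d4 <= 0.008991) {
      if (x.e2 <= 0.011901) {
        if (x.a1 <= 0.001341) {
          return x.g2 <= 0.001762 ? 4 : 5;
        }
        return 5;
      }
      return 4;
    }
    return 2;
  }
  return 3;
}

console.log(decisionTreeBranch({ b5: 0.05, t3: 0.001 }));        // 3
console.log(decisionTreeBranch({ b5: 0.05, t3: 0, d4: 0.01 })); // 2
```

Each return value is a leaf of the tree, i.e. the code of the laptop predicted for that combination of conditions.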

5.3.2 Implementing the Neural Network model

Another function, named ‘neuralNetwork()’, was implemented. This function also had access to the 132 independent input variables described above.

The neural network built in WEKA had one hidden layer with 68 nodes. To implement this hidden layer, 68 new temporary variables named ‘Node5’, ‘Node6’, ‘Node7’, …, ‘Node72’ were created, with values computed using the standard neural network formula. All the coefficient values and the threshold values were used exactly as given by WEKA during model building.

To implement the output layer, the same formula was applied to these 68 temporary variables (the 68 hidden layer nodes, i.e. Node5, Node6, …, Node72). The output layer of the neural network model had 5 nodes corresponding to the five laptop products. The product corresponding to the node with the highest value was predicted as the laptop the current user is most likely to buy.
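The two-layer evaluation and the final choice of the highest output node can be sketched as follows. This is a JavaScript illustration rather than the project's PHP code: it assumes the standard formula is a sigmoid applied to a threshold plus a weighted sum, and the network object `net`, its field names and its tiny made-up weights (2 inputs, 2 hidden nodes, 3 outputs) are hypothetical stand-ins for the 132-68-5 network.

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// One layer: each node computes sigmoid(threshold + sum(weight * input)).
function layer(inputs, weights, thresholds) {
  return weights.map((w, j) =>
    sigmoid(w.reduce((sum, wij, i) => sum + wij * inputs[i], thresholds[j]))
  );
}

// Returns the 1-based product code of the output node with the highest value.
function predictProduct(inputs, net) {
  const hidden = layer(inputs, net.hiddenWeights, net.hiddenThresholds);
  const outputs = layer(hidden, net.outputWeights, net.outputThresholds);
  let best = 0;
  for (let k = 1; k < outputs.length; k++) {
    if (outputs[k] > outputs[best]) best = k;
  }
  return best + 1;
}

// Made-up miniature network, for illustration only.
const net = {
  hiddenWeights: [[4, -4], [-4, 4]],
  hiddenThresholds: [0, 0],
  outputWeights: [[2, -2], [-2, 2], [0, 0]],
  outputThresholds: [0, 0, 0],
};

console.log(predictProduct([0.9, 0.1], net)); // 1
console.log(predictProduct([0.1, 0.9], net)); // 2
```

In the project the same two steps run over 68 hidden nodes and 5 output nodes, using the weights and thresholds exported from WEKA.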

5.4 Using model outputs

As stated above, only one of the two models was used at a time for a given user. The output of the model used (decision tree or neural network) was sent back via AJAX to the front-end JavaScript function named ‘predictResponse()’. It should be noted that the model output was the code of the one of the 5 laptops that the current user is most likely to buy.

The ‘predictResponse()’ JavaScript function, after receiving the prediction, can be programmed as needed. In the current project, the author decided simply to highlight the border of the predicted laptop in red. The predicted laptop is the one that, according to one of the machine-learning models, the user is most likely to buy, based on the user’s mouse movement behavior. The function definition of the ‘predictResponse()’ function is as follows:

    function predictResponse() {
        // readyState 4 means the AJAX request has completed.
        if (http.readyState == 4) {
            predictProduct = http.responseText;
            // Product codes 1..5 map to the column ids "cg2".."cg6".
            var colName = Number(predictProduct) + 1;
            // Reset the style of every product column.
            for (var i = 2; i <= 6; i++) {
                document.getElementById("cg" + i).className = "";
            }
            // Highlight the column of the predicted product.
            document.getElementById("cg" + colName).className = "oce-predict";
            alert("Product : " + predictProduct);
            // Schedule the next prediction in 10 seconds.
            setTimeout(predict, 10000);
        }
    }

The function above obtains the response from the PHP script via the standard AJAX http.responseText property. The output is then used simply to change the style of the column containing that product. The style of all the other columns is first reset before the style of the predicted laptop’s column is changed. The JavaScript ‘predict()’ function is then called again after 10,000 milliseconds. In the current demonstration, a popup was also shown to the user with the code of the laptop he is most likely to buy. This was done using the alert statement.

There can be several other uses of the prediction. One can imagine that a customer would be served more easily and appropriately if the shopkeeper knew the product the customer is most likely to buy. The customer could be offered other options similar to the predicted product. If not used by the content generator of the website, the prediction can still be used by visitors to find the information they have been looking for. A screenshot of the prediction made by the Decision Tree model is shown in Figure 10.


Figure 10: Screenshot of the prediction done by the model

5.5 What next

Once the website was programmed and the machine learning models were embedded, it was again made public and users were invited to visit it again. All the mouse movement data was saved in the databases as designed earlier, along with the product the user bought. The users were also shown the real-time prediction made by the model every 10 seconds. The predictions made by the model were not saved in any database for the following reasons:

• Connecting the ‘predict.php’ file to the databases and saving data would take time. The time spent saving the predicted output would affect the performance of the website, mainly because it would delay the return of the model output from the ‘predict.php’ file to the JavaScript ‘predictResponse()’ function.

• The final prediction made for any user by the model can always be calculated again, as the databases keep a record of the mouse movement data for every user. This would be done later in the testing phase of the project.


• The prediction was made every 10 seconds. This means that there would be several predictions (4 on average) made for every user. The count of four predictions was estimated because it was found earlier, in section 3.3.2.2, that the average time spent by a user on the webpage is 40.26 seconds. Saving all predictions per user is again a performance issue, as the table storing them is expected to grow with time.
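The estimate of four predictions per user in the last point can be reproduced with a short calculation. The sketch below assumes that the first prediction is shown one full interval after the page loads; the function name `predictionsPerVisit` is hypothetical.

```javascript
// Number of predictions shown during one visit, assuming the first
// prediction fires one full interval after the page loads.
function predictionsPerVisit(visitSeconds, intervalSeconds) {
  return Math.floor(visitSeconds / intervalSeconds);
}

// Average visit of 40.26 seconds with a prediction every 10 seconds.
var averageCount = predictionsPerVisit(40.26, 10); // 4
```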

The final website, capable of predicting the product the user is most likely to buy, was made public and was kept online for 7 days. Users were again invited using emails, social media, chats, etc. and were asked to surf the final version of the webpage. The volunteers were required to buy one of the products after evaluating all the options (5 laptops) available on that page. While doing so, the users were shown the product they were most likely to buy. Visitors reported informally, via email and in-person conversations, that the predictions were quite accurate.

The next chapter explains a more formal and quantitative method of testing the predictions made by the two models. It also describes the methodology adopted to test the time performance of the two models.


6 TESTING AND RESULTS

This chapter describes the complete testing phase of the project. It describes the data collection steps and the parameters on which the models were evaluated. It also explains the testing methodology and summarizes the final results obtained.

6.1 Testing methodology

Two types of tests were conducted to evaluate the implementation. One test was conducted in WEKA on the collected test data to check the classification accuracy of the model (decision tree or neural network). The other test was conducted on the ‘predict.php’ file to check the time performance of the website after implementing the model.

Both of the above-mentioned tests were performed on both models separately. The methodology adopted and the results obtained are described in the following sections.

6.2 Testing for model accuracy

Testing data was collected while the final website was live and was used to further test the two models in WEKA. It was found that the decision tree model gave an accuracy of 84.09% whereas the neural network model gave an accuracy of 34.09% on the collected test data. Details of the test conducted are as follows:


6.2.1 Testing data collection

While the website with one of the machine learning models was live, the users’ mouse tracking data and the final product bought by each user were saved in the tables ‘data’ and ‘bought’ respectively. In the 7 days for which the test website was live, 49 unique users visited the webpage. There were 1275 tuples in the ‘data’ table and 44 tuples in the ‘bought’ table. The cardinality of the ‘bought’ table is lower than the number of visitors because 5 users (49 minus 44) didn’t click the buy button and left the site after browsing it for a while.

This data was processed in the same way as the initial data, as described in section 3.3.2. The steps followed to analyze and prepare the test data are as follows:

• The data was converted into a more usable format using the PHP script named ‘alignData.php’. The details of this script are given in section 3.3.2.1. This step converted the test data into a ‘one user per row’ format, with the time values of each user in a single row along with the product bought.

• This data was exported into Excel and normalized. To normalize the times, the total time spent by each user was calculated and then the time spent in each section/cell was divided by the total time. This is explained in detail in section 3.3.2.3.

• It should be noted that no outliers were removed in this step. The reason is that the data was collected from actual users; all kinds of people can be expected to use the website in all possible ways, and an accurate measure of accuracy is obtained only when all these cases, including any outliers, are taken into consideration.

• This data was saved in a CSV file that was then opened in WEKA.

• The CSV file was opened in WEKA and saved in WEKA's default ARFF format. The ARFF file was opened in a text editor and the type of the ‘bought’ attribute was changed from numeric to nominal, as stated in section 4.4.

This data was then opened in WEKA again and the model testing was carried out as explained in the following sections.
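As an illustration of the normalization step listed above (done in Excel in the project), the following JavaScript sketch divides each cell time by the user's total time. The function name `normalizeTimes`, the handling of an all-zero row and the sample values are assumptions made for the example.

```javascript
// Divides the time spent in each cell by the user's total time, so that
// every row sums to 1 regardless of how long the visit lasted.
function normalizeTimes(cellTimes) {
  var total = cellTimes.reduce(function (sum, t) { return sum + t; }, 0);
  if (total === 0) {
    // A user with no recorded time is left as all zeros (an assumption).
    return cellTimes.map(function () { return 0; });
  }
  return cellTimes.map(function (t) { return t / total; });
}

// Hypothetical user who spent 2 s, 6 s and 12 s in three cells.
var normalized = normalizeTimes([2, 6, 12]); // [0.1, 0.3, 0.6]
```

Normalizing in this way makes users who browse quickly and users who browse slowly comparable, because only the share of attention given to each cell is kept.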

6.2.2 Model testing in WEKA using test data

Using WEKA, the saved files of the two models were opened. In the classifier tab, the option to test on a supplied test dataset was chosen and, after pressing the set button, the collected and normalized test data file was opened. The loaded model was then made to evaluate this testing data by right-clicking the model and selecting “Re-evaluate the model on current test-set”. This method evaluates the model on the collected test dataset and shows the accuracy results on this test data.

This method is equivalent to running the model on the website using PHP. The output given by the model while testing in WEKA would be exactly the same as the one given by the PHP script online, because the WEKA model obtained was the one implemented in the website. This was the reason that the predictions were not saved, as stated in section 5.5. Checking the accuracy is now simply a matter of comparing the model prediction with the actual product bought by the user.

The details of the results given by both models when evaluated on the test set are explained in the following subsections.


6.2.2.1 Decision Tree model

The test dataset with 44 cases was evaluated using the built decision tree model. It was found that the tree was able to correctly classify 37 out of 44 cases, an accuracy of 84.0909%.

The output obtained after re-evaluation from WEKA was:

=== Re-evaluation on test set ===

User supplied test set
Relation:     MasterData-1-weka.filters.unsupervised.attribute.Remove-R1
Instances:    unknown (yet). Reading incrementally
Attributes:   133

=== Summary ===

Correctly Classified Instances          37               84.0909 %
Incorrectly Classified Instances         7               15.9091 %
Kappa statistic                          0.7825
Mean absolute error                      0.1916
Root mean squared error                  0.2814
Total Number of Instances               44

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.857     0.027     0.857       0.857    0.857       0.986      1
               0.875     0.143     0.778       0.875    0.824       0.872      2
               0.917     0.031     0.917       0.917    0.917       0.921      3
               0.833     0.026     0.833       0.833    0.833       0.945      4
               0.333     0         1           0.333    0.5         0.895      5
Weighted Avg.  0.841     0.068     0.851       0.841    0.834       0.915

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
  6  1  0  0  0 |  a = 1
  1 14  0  1  0 |  b = 2
  0  1 11  0  0 |  c = 3
  0  0  1  5  0 |  d = 4
  0  2  0  0  1 |  e = 5
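The accuracy and the per-class TP rate in the WEKA output above follow directly from the confusion matrix. The sketch below recomputes them in JavaScript; the function names `accuracy` and `recall` are hypothetical, and the matrix values are copied from the output above.

```javascript
// Rows are actual classes, columns are predicted classes (WEKA's layout).
var decisionTreeMatrix = [
  [6, 1, 0, 0, 0],
  [1, 14, 0, 1, 0],
  [0, 1, 11, 0, 0],
  [0, 0, 1, 5, 0],
  [0, 2, 0, 0, 1]
];

// Accuracy: correctly classified cases (the diagonal) over all cases.
function accuracy(matrix) {
  var correct = 0, total = 0;
  matrix.forEach(function (row, i) {
    row.forEach(function (count, j) {
      total += count;
      if (i === j) correct += count;
    });
  });
  return correct / total;
}

// Recall (TP rate) of one class: diagonal cell over the row total.
function recall(matrix, classIndex) {
  var rowTotal = matrix[classIndex].reduce(function (s, c) { return s + c; }, 0);
  return matrix[classIndex][classIndex] / rowTotal;
}

var treeAccuracy = (accuracy(decisionTreeMatrix) * 100).toFixed(4); // "84.0909"
var recallClass5 = recall(decisionTreeMatrix, 4).toFixed(3);        // "0.333"
```

The low recall for class 5 reflects that only 3 of the 44 test users bought that laptop, and 2 of them were classified as class 2.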


6.2.2.2 Neural Network model

The dataset of 44 test cases was evaluated on the neural network model and it was found that it classified 15 cases correctly. This gives the neural network model an accuracy of only 34.0909% on the testing data.

The output obtained from WEKA was:

=== Re-evaluation on test set ===

User supplied test set
Relation:     MasterData-1-weka.filters.unsupervised.attribute.Remove-R1
Instances:    unknown (yet). Reading incrementally
Attributes:   133

=== Summary ===

Correctly Classified Instances          15               34.0909 %
Incorrectly Classified Instances        29               65.9091 %
Kappa statistic                          0.1367
Mean absolute error                      0.2695
Root mean squared error                  0.5001
Total Number of Instances               44

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.429     0.135     0.375       0.429    0.4         0.695      1
               0.313     0.25      0.417       0.313    0.357       0.694      2
               0.25      0.281     0.25        0.25     0.25        0.505      3
               0.5       0.184     0.3         0.5      0.375       0.623      4
               0.333     0.024     0.5         0.333    0.4         0.78       5
Weighted Avg.  0.341     0.216     0.354       0.341    0.34        0.639

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
  3  3  0  1  0 |  a = 1
  2  5  7  2  0 |  b = 2
  3  4  3  2  0 |  c = 3
  0  0  2  3  1 |  d = 4
  0  0  0  2  1 |  e = 5
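The Kappa statistic in the output above measures how much better the model agrees with the actual purchases than would be expected by chance. As a sketch, it can be recomputed from the confusion matrix with the usual Cohen's kappa formula; the function name `kappa` is hypothetical and the matrix values are copied from the output above.

```javascript
// Rows are actual classes, columns are predicted classes.
var neuralNetworkMatrix = [
  [3, 3, 0, 1, 0],
  [2, 5, 7, 2, 0],
  [3, 4, 3, 2, 0],
  [0, 0, 2, 3, 1],
  [0, 0, 0, 2, 1]
];

// Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
function kappa(matrix) {
  var n = matrix.length;
  var total = 0, diagonal = 0;
  var rowSums = new Array(n).fill(0);
  var colSums = new Array(n).fill(0);
  for (var i = 0; i < n; i++) {
    for (var j = 0; j < n; j++) {
      total += matrix[i][j];
      rowSums[i] += matrix[i][j];
      colSums[j] += matrix[i][j];
      if (i === j) diagonal += matrix[i][j];
    }
  }
  var observed = diagonal / total;
  // Chance agreement: product of actual and predicted class proportions.
  var expected = 0;
  for (var k = 0; k < n; k++) {
    expected += (rowSums[k] / total) * (colSums[k] / total);
  }
  return (observed - expected) / (1 - expected);
}

var networkKappa = kappa(neuralNetworkMatrix).toFixed(4); // "0.1367"
```

A kappa this close to zero indicates that the neural network's predictions on the test set are only slightly better than chance agreement, whereas the decision tree's value of 0.7825 indicates substantial agreement.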


6.2.3 Discussion

On the training data, the decision tree gave an accuracy of 89.5% whereas the neural network gave an accuracy of 95%. The same decision tree and neural network models gave accuracies of 84.0909% and 34.0909% respectively when evaluated on the test dataset.

For model comparison, accuracy on the test dataset, i.e. data on which the model has not been trained, is one of the most important parameters. As discussed in section 4.4.3, there are several drawbacks of using neural networks in the present situation, and after evaluating the two models on the test dataset it is clear that the decision tree has outperformed the neural network and should be used for making predictions.

This however depends on a lot of parameters, the most important being the size of the training and testing datasets. Since the scope of this project was limited, a large amount of data could not be collected, but it is advised that both decision trees and neural networks be evaluated along with other machine learning models before settling on one of them.

6.3 Testing time performance of the models

A new PHP script was written and executed on the server to estimate the average time taken by the model processing when executed in real time. To do this, the PHP script was connected to the database containing the test data. Both model functions were then called and the time taken by each to evaluate all 44 test cases was measured. This was averaged over the 44 cases to estimate the average time each model takes to make one prediction in PHP. This process was carried out 10 times separately, so that the estimate would not be distorted by any unforeseen tasks at the server that might delay the model execution.
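The measurement procedure described above can be sketched as follows. This is a JavaScript analogue written for illustration, not the PHP script used in the project. The names `averageSecondsPerCase`, `repeatedTimings` and `dummyModel` are hypothetical, and the stand-in model and test cases are invented for the example.

```javascript
// Use the high-resolution timer when available, otherwise Date.now().
var now = (typeof performance !== "undefined" && performance.now)
  ? function () { return performance.now(); }
  : function () { return Date.now(); };

// Average time (in seconds) a model function takes per case.
function averageSecondsPerCase(modelFn, cases) {
  var start = now();
  for (var i = 0; i < cases.length; i++) {
    modelFn(cases[i]);
  }
  var elapsedMs = now() - start;
  return elapsedMs / 1000 / cases.length;
}

// Repeats the measurement several times so that one slow run caused by
// other work on the machine does not dominate the estimate.
function repeatedTimings(modelFn, cases, runs) {
  var timings = [];
  for (var r = 0; r < runs; r++) {
    timings.push(averageSecondsPerCase(modelFn, cases));
  }
  return timings;
}

// Hypothetical stand-in for a model: picks the largest of five inputs.
function dummyModel(inputs) {
  return inputs.indexOf(Math.max.apply(null, inputs)) + 1;
}

// 44 invented test cases, evaluated in 10 separate runs.
var testCases = [];
for (var c = 0; c < 44; c++) {
  testCases.push([c % 5, 1, 2, 3, 0.5]);
}
var timings = repeatedTimings(dummyModel, testCases, 10);
```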


The time taken by the model is an important feature, as the intelligent website is expected to predict the output as soon as possible and, of course, in real time. A model taking more than some threshold time for its calculations is of little use. The process and results are explained in the following sections.

6.3.1 Decision Tree model

The decision tree was made to execute on all 44 test cases and the time taken was averaged. This was done 10 times, and the average times in seconds taken by the script to evaluate the decision tree were:

0.000929258, 0.000544337, 0.000656968, 0.004135495,

0.000538674, 0.000537385, 0.000534681, 0.000545979,

0.000546981, 0.007368538

From the above 10 time values, the following can be seen:

• Minimum time taken by the model was approximately 0.00053 seconds

• Maximum time taken by the model was approximately 0.00737 seconds

• Average time taken by the model was 0.00163 seconds

6.3.2 Neural Network model

As for the decision tree, the neural network model was also made to evaluate the 44 test cases and the average time taken was noted. This was done 10 times. The average execution times in seconds taken by the neural network model were:

0.658177257, 0.543146627, 0.658050104, 0.746059109,

0.482899054, 0.536261456, 0.639314229, 0.505210876,

0.496645451, 0.707032805

From the above 10 time values, the following can be seen:

• Minimum time taken by the model was approximately 0.4829 seconds

• Maximum time taken by the model was approximately 0.7461 seconds

• Average time taken by the model was 0.5973 seconds

6.3.3 Discussion

It is clearly seen that the neural network model takes far more time to execute than the decision tree model. The chosen decision tree model runs at least 350 times faster than the chosen neural network.

Since the objective is to predict in real time, speed is a very important parameter, and the decision tree model has clearly won on time performance.
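The summary figures in sections 6.3.1 and 6.3.2 and the speed ratio quoted above can be checked from the ten measured values of each model. The sketch below does this in JavaScript; the function name `summarize` is hypothetical and the time values are copied from the two sections.

```javascript
// Average time per prediction (seconds) in each of the 10 runs.
var decisionTreeTimes = [0.000929258, 0.000544337, 0.000656968, 0.004135495,
  0.000538674, 0.000537385, 0.000534681, 0.000545979, 0.000546981, 0.007368538];
var neuralNetworkTimes = [0.658177257, 0.543146627, 0.658050104, 0.746059109,
  0.482899054, 0.536261456, 0.639314229, 0.505210876, 0.496645451, 0.707032805];

// Minimum, maximum and mean of a list of timings.
function summarize(times) {
  var sum = times.reduce(function (s, t) { return s + t; }, 0);
  return {
    min: Math.min.apply(null, times),
    max: Math.max.apply(null, times),
    mean: sum / times.length
  };
}

var tree = summarize(decisionTreeTimes);
var network = summarize(neuralNetworkTimes);
// Ratio of the mean times: how many times faster the decision tree ran.
var speedup = network.mean / tree.mean;
```

With these values the ratio of the means comes out at roughly 365, which is consistent with the statement that the decision tree runs at least 350 times faster.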

6.4 Results

After testing both models (decision tree and neural network) on prediction accuracy and time performance, it was clearly found that the decision tree is much better suited than the neural network for implementation in the current problem.

The results obtained in the tests are summarized below:

• Accuracy (on test dataset):

o Decision Tree: 84.0909%

o Neural Network: 34.0909%

• Time Performance (PHP scripts running on Apache):

o Decision Tree: 0.0016 seconds

o Neural Network: 0.5973 seconds


It should however be noted that these results were obtained when the models were trained on only 200 cases. The neural network model had a total of 73 nodes, including 68 hidden nodes. To properly train the neural network, at least a few thousand cases were required. The neural network model was built to establish the fact that it can be used on a website to predict relevant content for the user. The decision tree, on the other hand, is also expected to give better results when larger training and testing datasets are available.

It should also be noted that there were five classes of the dependent variable (5 possible laptop products), and hence a model would have been considered void only if its accuracy was close to 20% (20% being the accuracy expected by chance when each of the five classes is equally likely). Since the accuracy obtained for both the machine learning models was above the 20% benchmark (far above it in the case of the decision tree), both models have shown some promise that they have the potential to recommend relevant content for a user based on his mouse movement behavior.

The next chapter gives a brief conclusion of the work done.


7 CONCLUSION

This chapter gives the conclusion of the project and discusses the scope of possible future work. It also talks about some other possible implementations of the explained methodology.

It has been successfully demonstrated that, by building a machine-learning model on a user’s mouse movement data, appropriate content for him can be predicted. The dummy shopping website developed, embedded with a decision tree machine learning model, gave a remarkable accuracy of 84.09% on the test data. The accuracy was measured as the ratio of correct predictions to the total number of predictions made by the model. It was also found that implementing a decision tree model in a website would not affect the performance of the page, as the average time taken by the model was found to be around 1.6 milliseconds. A neural network model was similarly evaluated; it gave an accuracy of 34.09% and took an average time of 597.3 milliseconds to process a single case of data.

The objective of the project was to use the mouse movement behavior of a user to predict the appropriate content for him intelligently and in real time. This objective was successfully achieved, and several other sub-objectives were also reached while working on the project.

The user’s mouse tracking was implemented successfully using a completely new algorithm. This was done using PHP, AJAX, HTML and MySQL. The performance of the website after implementing mouse tracking was not compromised and the accuracy of the mouse tracking data collected was found to be very high. A webpage was developed imitating a shopping portal and some highlighting techniques were applied to it to make sure that the user’s mouse pointer is close to his point of gaze.

The initial website developed in PHP was live for around two weeks and it collected 200 cases of training data. The data was then used to train two separate machine-learning models, namely a Decision Tree model and a Neural Network model. Both models gave promising results when tested on the training data, which indicated that the models built had learned the mouse movement behavior appropriately.

Both machine learning models were coded back into the website using PHP and AJAX. The website was made to collect mouse movement data, which was dynamically read by the models to generate an output. This predicted output was sent to the webpage for further personalization. A total of 44 test cases were also collected from the final website.

Using the 44 collected test cases, both models were evaluated and the decision tree model was found to perform extremely well compared to the neural network model, in terms of both accuracy and time performance. The decision tree classified about 2.5 times as many cases correctly in the set of 44 cases (37 against 15), and was about 350 times faster than the neural network model. This however cannot be generalized, as it depends on the size of the initial training dataset (which was small in the current scope of the project) and on the number of independent variables (which was large in the current implementation).

The working demonstration of the project, along with its documentation and the source code released under the GNU General Public License, is available online at http://sparshgupta.name/MSc/Project


7.1 Future Work

The proposed idea has shown huge potential and there is a lot of scope for future innovations and improvements if properly explored. The lack of data was the prime limitation in the current study. If a commercial website is required to be intelligent, then models built on several thousand cases of training data should be used, and once that data is obtained, other machine learning algorithms could be explored.

The data collected in the testing phase can later be used to train the models. The proposed concept and implementation involve a never-ending chain of model training and improvement. This is because, with time, the website will accumulate a lot of data that can be used at regular intervals to further train the implemented model or to build a new model. It is expected that, with every improvement in the model, its capability to predict the relevant content for a new user will increase.

The proposed implementation requires that each section of the website calls the mouse tracking function whenever the mouse enters the section and leaves it. This requires explicit coding of function call statements in every cell. This might not be possible in highly dynamic websites, and hence work could be done on implementing the idea on any given website with almost no change to the existing web code.
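One possible direction for this, sketched here only as an illustration and not as part of the implemented project, is to attach a single delegated listener to the whole document instead of coding calls in every cell. The `DwellTracker` name and the `data-track` attribute used to mark sections are hypothetical.

```javascript
// Accumulates the time spent in each section from enter/leave events.
function DwellTracker() {
  this.totals = {};      // section id -> milliseconds spent so far
  this.current = null;   // section the pointer is currently in
  this.enteredAt = 0;
}

DwellTracker.prototype.enter = function (sectionId, timeMs) {
  this.leave(timeMs); // close the previous section, if any
  this.current = sectionId;
  this.enteredAt = timeMs;
};

DwellTracker.prototype.leave = function (timeMs) {
  if (this.current === null) return;
  var spent = timeMs - this.enteredAt;
  this.totals[this.current] = (this.totals[this.current] || 0) + spent;
  this.current = null;
};

// In a browser, one delegated listener on the document can feed the
// tracker; "data-track" is a hypothetical attribute marking sections.
if (typeof document !== "undefined") {
  var pageTracker = new DwellTracker();
  document.addEventListener("mouseover", function (event) {
    var section = event.target.closest ? event.target.closest("[data-track]") : null;
    if (section) pageTracker.enter(section.getAttribute("data-track"), Date.now());
    else pageTracker.leave(Date.now());
  });
}
```

For example, calling `enter("a1", 0)`, then `enter("b1", 1500)`, then `leave(2000)` on a tracker records 1500 milliseconds for section a1 and 500 milliseconds for section b1.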

In the current project, the information about the predicted content (i.e., the laptop the user is most likely to buy) was not exploited. Work can be done to make the website interact with the user like a salesman. The website can remove all the products in which the user would be least interested and show him only the products he is most likely to buy.


The current implementation involved using only a single machine-learning model at a time. Multiple models can be implemented in the webpage and the strength of the prediction made can also be used to further interact with the user. In case all the different implemented models give the same prediction, it can be assumed to be a strong prediction and hence the webpage can adapt accordingly immediately.
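The agreement check proposed above could look like the following sketch. The function name `combinePredictions` is hypothetical, and falling back to the most frequent prediction when the models disagree is an assumption added for the example; the text only defines the case in which all models agree.

```javascript
// Combines the product codes predicted by several models. The prediction
// is flagged as strong only when every model agrees, as proposed above.
function combinePredictions(predictions) {
  var counts = {};
  var best = null;
  predictions.forEach(function (code) {
    counts[code] = (counts[code] || 0) + 1;
    // Assumption: when models disagree, keep the most frequent code.
    if (best === null || counts[code] > counts[best]) best = code;
  });
  return {
    product: best,
    strong: predictions.length > 0 && counts[best] === predictions.length
  };
}

var agreed = combinePredictions([3, 3]);   // { product: 3, strong: true }
var split = combinePredictions([3, 1, 3]); // { product: 3, strong: false }
```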

Other implementations possible

A shopping portal with intelligent prediction of the product a user is most likely to buy is one of the many possible implementations of the proposed concept. Some other possible implementations could be:

• A Search Engine Feedback System: Current search engines display the results as a list of links, each with a small text snippet relevant to the search. Most users choose the links after reading the text snippet associated with the link, and they spend different amounts of time on different links. Current search feedback is based completely on mouse clicks, which in a sense is binary feedback (either Yes or No). The feedback system can be made more accurate by determining the relative time a user spent on a link compared to other links.

• News Content Prediction: An online news website shows several news items under different headings on a page. Different users have different priorities for news. Based on a user’s mouse movement activity, relevant news content can be shown to him. For example, if a user is spending more time around football and cricket news headlines than political headlines, then it can be predicted that he is more interested in sports news and, accordingly, the website can be molded for him.


APPENDIX: SOURCE CODE

HTML final webpage

The HTML code of the final website, which is capable of tracking the user’s mouse movements as well as predicting the relevant product for the user, is as follows:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <title>MSc Project - Compare Laptops</title> <link rel="stylesheet" type="text/css" href="mouseover.css"/> <script type="text/javascript" src="mouseover.js" ></script> </head> <body onload="start_It();"> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr> <td><p class="oce-first"><span class="bold">NOTE:</span> Surf on this page like you do on a shopping portal comparison page and decide on a model based on its configuration and buy it. Thanks</p></td> </tr> <tr> <td>&nbsp;</td> </tr> <tr> <td><table width="100%" border="0" align="center" cellpadding="0" cellspacing="0" onMouseOver="hiliteColumn(event);" onMouseOut="resetColumn(event);" class="one-column-emphasis"> <colgroup class="oce-first" id="na"></colgroup> <colgroup id="cg2" class=""></colgroup> <colgroup id="cg3" class=""></colgroup> <colgroup id="cg4" class=""></colgroup> <colgroup id="cg5" class=""></colgroup> <colgroup id="cg6" class=""></colgroup> <thead> <tr> <th onmouseout="movement_out('a0');" onmouseover="movement_in();">Product Name</th> <th onmouseout="movement_out('a1');" onmouseover="movement_in();">Lenovo IdeaPad Y650 4185</th> <th onmouseout="movement_out('a2');" onmouseover="movement_in();">HP Pavilion dv7-1285dx</th> <th onmouseout="movement_out('a3');" onmouseover="movement_in();">Sony VAIO VGN-P588E</th> <th onmouseout="movement_out('a4');" onmouseover="movement_in();">Dell Studio XPS 16</th> <th onmouseout="movement_out('a5');" onmouseover="movement_in();">Toshiba Satellite A205-S4617</th> </tr> </thead> <tbody> <tr> <td class="oce-first" onmouseout="movement_out('b0');" onmouseover="movement_in();">&nbsp;</td> <td onmouseout="movement_out('b1');" onmouseover="movement_in();"><img src="images/1.gif" width="120" height="90" border="0" 
/></td> <td onmouseout="movement_out('b2');" onmouseover="movement_in();"><img src="images/2.gif" width="120" height="90" border="0" /></td> <td onmouseout="movement_out('b3');" onmouseover="movement_in();"><img src="images/3.gif" width="120" height="90" border="0" /></td> <td onmouseout="movement_out('b4');" onmouseover="movement_in();"><img src="images/4.gif" width="120" height="90" border="0" /></td>


<td onmouseout="movement_out('b5');" onmouseover="movement_in();"><img src="images/5.gif" width="120" height="90" border="0" /></td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('c0');" onmouseover="movement_in();">Price</td> <td onmouseout="movement_out('c1');" onmouseover="movement_in();">$1,249.00</td> <td onmouseout="movement_out('c2');" onmouseover="movement_in();">$1,199.99</td> <td onmouseout="movement_out('c3');" onmouseover="movement_in();">$1,133.00</td> <td onmouseout="movement_out('c4');" onmouseover="movement_in();">$1,224.00</td> <td onmouseout="movement_out('c5');" onmouseover="movement_in();">$1,249.00</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('d0');" onmouseover="movement_in();">CNET editors' rating</td> <td onmouseout="movement_out('d1');" onmouseover="movement_in();">3.5/5.0</td> <td onmouseout="movement_out('d2');" onmouseover="movement_in();">3.5/5.0</td> <td onmouseout="movement_out('d3');" onmouseover="movement_in();">3.5/5.0</td> <td onmouseout="movement_out('d4');" onmouseover="movement_in();">3.5/5.0</td> <td onmouseout="movement_out('d5');" onmouseover="movement_in();">3.5/5.0</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('e0');" onmouseover="movement_in();">Average user rating</td> <td onmouseout="movement_out('e1');" onmouseover="movement_in();">No Data</td> <td onmouseout="movement_out('e2');" onmouseover="movement_in();">4.0/5.0</td> <td onmouseout="movement_out('e3');" onmouseover="movement_in();">2.0/5.0</td> <td onmouseout="movement_out('e4');" onmouseover="movement_in();">No Data</td> <td onmouseout="movement_out('e5');" onmouseover="movement_in();">3.0/5.0</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('f0');" onmouseover="movement_in();">Release date</td> <td onmouseout="movement_out('f1');" onmouseover="movement_in();">April 15, 2009</td> <td onmouseout="movement_out('f2');" onmouseover="movement_in();">February 01, 2009</td> <td onmouseout="movement_out('f3');" 
onmouseover="movement_in();">January 08, 2009</td> <td onmouseout="movement_out('f4');" onmouseover="movement_in();">January 07, 2009</td> <td onmouseout="movement_out('f5');" onmouseover="movement_in();">April 16, 2007</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('g0');" onmouseover="movement_in();">The Bottom Line</td> <td onmouseout="movement_out('g1');" onmouseover="movement_in();">Online media consumers who want a portable laptop with high style and plenty of screen real estate should give the Y650 a look.</td> <td onmouseout="movement_out('g2');" onmouseover="movement_in();">HP's Pavilion dv7-1245dx is a slick multimedia machine with great battery life, but for $1,200, we want a full 1080p display.</td> <td onmouseout="movement_out('g3');" onmouseover="movement_in();">Sony's upscale Atom-powered Lifestyle PC has the components of a cheaper machine but the design of a more expensive one. The end result will be a useful travel PC for some and a conversation piece for others.</td> <td onmouseout="movement_out('g4');" onmouseover="movement_in();">Dell's new 16:9 Studio XPS 16 adds upscale extras such as a leather trim and a backlit keyboard to a fairly standard set of components, without jacking up the price (too much).</td> <td onmouseout="movement_out('g5');" onmouseover="movement_in();">Toshiba adds faster Draft N Wi-Fi to this attractive if otherwise fairly conventional laptop. 
Just be sure you've got an 802.11n router to go along with it.</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('h0');" onmouseover="movement_in();">Similar Products</td> <td onmouseout="movement_out('h1');" onmouseover="movement_in();">&nbsp;</td> <td onmouseout="movement_out('h2');" onmouseover="movement_in();">&nbsp;</td> <td onmouseout="movement_out('h3');" onmouseover="movement_in();">&nbsp;</td> <td onmouseout="movement_out('h4');" onmouseover="movement_in();">&nbsp;</td> <td onmouseout="movement_out('h5');" onmouseover="movement_in();">&nbsp;</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('i0');" onmouseover="movement_in();">Networking</td> <td onmouseout="movement_out('i1');" onmouseover="movement_in();">Network adapter - Ethernet<br /> - IEEE 802.11a<br /> - IEEE 802.11b<br />


- IEEE 802.11g<br /> - Fast Ethernet<br /> - Gigabit Ethernet<br /> - Bluetooth 2.1 EDR<br /> - IEEE 802.11n (draft)</td> <td onmouseout="movement_out('i2');" onmouseover="movement_in();">Network adapter - Ethernet<br /> - IEEE 802.11a<br /> - IEEE 802.11b<br /> - IEEE 802.11g<br /> - Fast Ethernet<br /> - Gigabit Ethernet<br /> - IEEE 802.11n (draft) </td> <td onmouseout="movement_out('i3');" onmouseover="movement_in();">Network adapter - Ethernet<br /> - IEEE 802.11b<br /> - IEEE 802.11g<br /> - Fast Ethernet<br /> - Gigabit Ethernet<br /> - Bluetooth 2.1 EDR<br /> - IEEE 802.11n (draft) </td> <td onmouseout="movement_out('i4');" onmouseover="movement_in();">Network adapter - Gigabit Ethernet</td> <td onmouseout="movement_out('i5');" onmouseover="movement_in();">Network adapter - Ethernet<br /> - IEEE 802.11a<br /> - IEEE 802.11b<br /> - IEEE 802.11g<br /> - Fast Ethernet<br /> - IEEE 802.11n (draft)</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('j0');" onmouseover="movement_in();">Graphics Controller</td> <td onmouseout="movement_out('j1');" onmouseover="movement_in();">NVIDIA GeForce G105M - 256 MB</td> <td onmouseout="movement_out('j2');" onmouseover="movement_in();">NVIDIA GeForce 9600M GT - 512 MB</td> <td onmouseout="movement_out('j3');" onmouseover="movement_in();">Intel GMA 500</td> <td onmouseout="movement_out('j4');" onmouseover="movement_in();">ATI Mobility RADEON? 
HD 3670 - 512MB - 512 MB</td> <td onmouseout="movement_out('j5');" onmouseover="movement_in();">Intel GMA 950 - 8 MB</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('k0');" onmouseover="movement_in();">Notebook Camera</td> <td onmouseout="movement_out('k1');" onmouseover="movement_in();">Integrated - 1.3 Megapixel</td> <td onmouseout="movement_out('k2');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('k3');" onmouseover="movement_in();">Integrated</td> <td onmouseout="movement_out('k4');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('k5');" onmouseover="movement_in();">Info unavailable</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('l0');" onmouseover="movement_in();">Optical Storage</td> <td onmouseout="movement_out('l1');" onmouseover="movement_in();">DVD-Writer - Integrated</td> <td onmouseout="movement_out('l2');" onmouseover="movement_in();">DVD?RW (?R DL) / DVD-RAM with LightScribe Technology</td> <td onmouseout="movement_out('l3');" onmouseover="movement_in();">None</td> <td onmouseout="movement_out('l4');" onmouseover="movement_in();">8X DVD+/- RW(DVD/CD read/write) Slot Load Drive</td> <td onmouseout="movement_out('l5');" onmouseover="movement_in();">DVD?RW (?R DL) / DVD-RAM - Integrated</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('m0');" onmouseover="movement_in();">RAM</td> <td onmouseout="movement_out('m1');" onmouseover="movement_in();">4 GB (installed) / 8 GB (max) - DDR3 SDRAM - 1066 MHz - PC3-8500 ( 2 x 2 GB )</td>


<td onmouseout="movement_out('m2');" onmouseover="movement_in();">6 GB (installed) / 8 GB (max) - DDR2 SDRAM</td> <td onmouseout="movement_out('m3');" onmouseover="movement_in();">2 GB (installed) / 2 GB (max) - DDR2 SDRAM - 533 MHz ( 1 x 2 GB )</td> <td onmouseout="movement_out('m4');" onmouseover="movement_in();">4 GB DDR3 SDRAM</td> <td onmouseout="movement_out('m5');" onmouseover="movement_in();">2 GB (installed) / 4 GB (max) - DDR2 SDRAM - 667 MHz - PC2-5300</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('n0');" onmouseover="movement_in();">Cache Memory</td> <td onmouseout="movement_out('n1');" onmouseover="movement_in();">3 MB - L2 cache</td> <td onmouseout="movement_out('n2');" onmouseover="movement_in();">3 MB - L2 cache</td> <td onmouseout="movement_out('n3');" onmouseover="movement_in();">512 KB - L2 cache</td> <td onmouseout="movement_out('n4');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('n5');" onmouseover="movement_in();">2 MB - L2 cache</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('o0');" onmouseover="movement_in();">Processor</td> <td onmouseout="movement_out('o1');" onmouseover="movement_in();">Intel Core 2 Duo P8700 / 2.53 GHz ( Dual-Core )</td> <td onmouseout="movement_out('o2');" onmouseover="movement_in();">Intel Core 2 Duo P8600 / 2.4 GHz ( Dual-Core )</td> <td onmouseout="movement_out('o3');" onmouseover="movement_in();">Intel 1.33 GHz</td> <td onmouseout="movement_out('o4');" onmouseover="movement_in();">Intel Core 2 Duo P8700 / 2.53 GHz</td> <td onmouseout="movement_out('o5');" onmouseover="movement_in();">Intel Core 2 Duo T5500 / 1.66 GHz ( Dual-Core )</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('p0');" onmouseover="movement_in();">Hard Drive</td> <td onmouseout="movement_out('p1');" onmouseover="movement_in();">320 GB - Serial ATA-300 - 5400 rpm</td> <td onmouseout="movement_out('p2');" onmouseover="movement_in();">500 GB - Serial ATA-150 - 5400 
rpm</td> <td onmouseout="movement_out('p3');" onmouseover="movement_in();">64 GB - Serial ATA-150</td> <td onmouseout="movement_out('p4');" onmouseover="movement_in();">500 GB - 5400 rpm</td> <td onmouseout="movement_out('p5');" onmouseover="movement_in();">250 GB - Serial ATA-150 - 4200 rpm</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('q0');" onmouseover="movement_in();">Display</td> <td onmouseout="movement_out('q1');" onmouseover="movement_in();">16 in TFT active matrix 1366 x 768 ( WXGA ) - VibrantView</td> <td onmouseout="movement_out('q2');" onmouseover="movement_in();">17 in TFT active matrix 1440 x 900 ( WXGA+ ) - BrightView</td> <td onmouseout="movement_out('q3');" onmouseover="movement_in();">8 in TFT active matrix 1600 x 768</td> <td onmouseout="movement_out('q4');" onmouseover="movement_in();">16.0</td> <td onmouseout="movement_out('q5');" onmouseover="movement_in();">15.4 in TFT active matrix 1280 x 800 ( WXGA ) - 24-bit (16.7 million colors)</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('r0');" onmouseover="movement_in();">Battery</td> <td onmouseout="movement_out('r1');" onmouseover="movement_in();">Lithium ion</td> <td onmouseout="movement_out('r2');" onmouseover="movement_in();">Lithium ion</td> <td onmouseout="movement_out('r3');" onmouseover="movement_in();">Lithium ion</td> <td onmouseout="movement_out('r4');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('r5');" onmouseover="movement_in();">Lithium ion</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('s0');" onmouseover="movement_in();">Dimensions (WxDxH)</td> <td onmouseout="movement_out('s1');" onmouseover="movement_in();">15.4 in x 10.2 in x 1 in</td> <td onmouseout="movement_out('s2');" onmouseover="movement_in();">15.6 in x 11.2 in x 1.7 in</td> <td onmouseout="movement_out('s3');" onmouseover="movement_in();">9.6 in x 4.7 in x 0.8 in</td>


<td onmouseout="movement_out('s4');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('s5');" onmouseover="movement_in();">14.3 in x 10.6 in x 1.3 in</td> </tr><tr> <td class="oce-first" onmouseout="movement_out('t0');" onmouseover="movement_in();">Weight</td> <td onmouseout="movement_out('t1');" onmouseover="movement_in();">5.5 lbs</td> <td onmouseout="movement_out('t2');" onmouseover="movement_in();">7.7 lbs</td> <td onmouseout="movement_out('t3');" onmouseover="movement_in();">1.4 lbs</td> <td onmouseout="movement_out('t4');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('t5');" onmouseover="movement_in();">6.4 lbs</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('u0');" onmouseover="movement_in();">OS Provided</td> <td onmouseout="movement_out('u1');" onmouseover="movement_in();">Microsoft Windows Vista Home Premium 64-bit Edition</td> <td onmouseout="movement_out('u2');" onmouseover="movement_in();">Microsoft Windows Vista Home Premium</td> <td onmouseout="movement_out('u3');" onmouseover="movement_in();">Microsoft Windows Vista Home Premium Edition</td> <td onmouseout="movement_out('u4');" onmouseover="movement_in();">Microsoft Windows Vista</td> <td onmouseout="movement_out('u5');" onmouseover="movement_in();">Microsoft Windows Vista Home Premium</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('v0');" onmouseover="movement_in();">Attribute X</td> <td onmouseout="movement_out('v1');" onmouseover="movement_in();">x1</td> <td onmouseout="movement_out('v2');" onmouseover="movement_in();">x2</td> <td onmouseout="movement_out('v3');" onmouseover="movement_in();">x3</td> <td onmouseout="movement_out('v4');" onmouseover="movement_in();">x4</td> <td onmouseout="movement_out('v5');" onmouseover="movement_in();">x5</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('w0');" onmouseover="movement_in();">Attribute Y</td> <td onmouseout="movement_out('w1');" 
onmouseover="movement_in();">y1</td> <td onmouseout="movement_out('w2');" onmouseover="movement_in();">y2</td> <td onmouseout="movement_out('w3');" onmouseover="movement_in();">y3</td> <td onmouseout="movement_out('w4');" onmouseover="movement_in();">y4</td> <td onmouseout="movement_out('w5');" onmouseover="movement_in();">y5</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('x0');" onmouseover="movement_in();">Attribute Z</td> <td onmouseout="movement_out('x1');" onmouseover="movement_in();">z1</td> <td onmouseout="movement_out('x2');" onmouseover="movement_in();">z2</td> <td onmouseout="movement_out('x3');" onmouseover="movement_in();">z3</td> <td onmouseout="movement_out('x4');" onmouseover="movement_in();">z4</td> <td onmouseout="movement_out('x5');" onmouseover="movement_in();">z5</td> </tr> </tbody> <tfoot> <tr> <td class="oce-first" onmouseout="movement_out('z0');" onmouseover="movement_in();">&nbsp;</td> <td onmouseout="movement_out('z1');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('1');" value="Buy Now" /></td> <td onmouseout="movement_out('z2');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('2');" value="Buy Now" /></td> <td onmouseout="movement_out('z3');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('3');" value="Buy Now" /></td> <td onmouseout="movement_out('z4');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('4');" value="Buy Now" /></td> <td onmouseout="movement_out('z5');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('5');" value="Buy Now" /></td> </tr></tfoot> </table></td> </tr> </table> </body> </html>


The JavaScript file

// Shared request object and tracking state.
var http = getHTTPObject();
var cellEntryDate;
var cellExitDate;
var time;
var queue1 = "";
var queue2 = "";
var flag = 0;                // 0: movements go to queue1, 1: movements go to queue2
var done = 0;                // set to 1 once the user has clicked "Buy Now"
var tempQueue = "";
var userId = new Date();
userId = userId.getTime();   // the page-load timestamp serves as the user id
var startpredict = 0;
var predictProduct = 0;

// Schedule the first prediction request 10 seconds after tracking starts.
function autoPredict() {
  setTimeout(predict, 10000);
}

// Ask the server which product it currently predicts for this user.
function predict() {
  http.open("GET", "predict.php?userId=" + userId, true);
  http.onreadystatechange = predictResponse;
  http.send(null);
}

// Highlight the predicted product column and schedule the next prediction.
function predictResponse() {
  if (http.readyState == 4) {
    predictProduct = http.responseText;
    var colName = Number(predictProduct) + 1;
    document.getElementById("cg2").className = "";
    document.getElementById("cg3").className = "";
    document.getElementById("cg4").className = "";
    document.getElementById("cg5").className = "";
    document.getElementById("cg6").className = "";
    document.getElementById("cg" + colName).className = "oce-predict";
    alert("Product : " + predictProduct);
    setTimeout(predict, 10000);
  }
}

// After a batch has been stored, schedule the next one.
// The original listing called startIt() here, which is not defined anywhere
// in this file; the function defined below is start_It().
function handleHttpResponse() {
  if (http.readyState == 4) {
    start_It();
  }
}

function handleHttpResponseBought() {
  if (http.readyState == 4) {
    alert("Thanks for Participating");
  }
}

// Send the queued movements every 2 seconds until a purchase is made,
// and start the prediction loop once.
function start_It() {
  if (done == 0) {
    setTimeout(sendData, 2000);
  }
  if (startpredict == 0) {
    ++startpredict;
    autoPredict();
  }
}

// Send the queue that has been filling and switch recording to the other one.
function sendData() {
  var query_string;
  if (flag == 0) {
    queue2 = "";
    flag = 1;
    query_string = "data.php?userId=" + userId + "&queue=" + queue1;
    queue1 = "";
  } else {
    queue1 = "";
    flag = 0;
    query_string = "data.php?userId=" + userId + "&queue=" + queue2;
    queue2 = "";
  }
  http.open("GET", query_string, true);
  http.onreadystatechange = handleHttpResponse;
  http.send(null);
}

// Record the moment the pointer enters a table cell.
function movement_in() {
  cellEntryDate = new Date();
}

// On leaving a cell, append "cell:milliseconds_" to the active queue.
function movement_out(cell) {
  cellExitDate = new Date();
  time = cellExitDate.getTime() - cellEntryDate.getTime();
  if (done == 0) {
    if (flag == 0) {
      queue1 = queue1 + cell + ":" + time + "_";
    } else {
      queue2 = queue2 + cell + ":" + time + "_";
    }
  }
}

// Record the purchase and stop tracking.
function bought(product) {
  done = 1;
  var query_bought = "bought.php?userId=" + userId + "&product=" + product;
  http.open("GET", query_bought, true);
  http.onreadystatechange = handleHttpResponseBought;
  http.send(null);
}

// Create an XMLHttpRequest object, falling back to ActiveX on old versions
// of Internet Explorer (via JScript conditional compilation).
function getHTTPObject() {
  var xmlhttp;
  /*@cc_on
  @if (@_jscript_version >= 5)
    try {
      xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
    } catch (e) {
      try {
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
      } catch (E) {
        xmlhttp = false;
      }
    }
  @else
    xmlhttp = false;
  @end @*/
  if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
    try {
      xmlhttp = new XMLHttpRequest();
    } catch (e) {
      xmlhttp = false;
    }
  }
  return xmlhttp;
}

// Highlight the column group of the cell under the pointer.
function hiliteColumn(e) {
  var o = (document.all) ? e.srcElement : e.target;
  if (o.nodeName != "TD") return;
  document.getElementById("cg" + (o.cellIndex + 1)).className = "over";
}

// Remove the column highlight again.
function resetColumn(e) {
  var o = (document.all) ? e.srcElement : e.target;
  if (o.nodeName != "TD") return;
  document.getElementById("cg" + (o.cellIndex + 1)).className = "";
}
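The queue strings built by movement_out() and split again by data.php follow a simple "cell:milliseconds_" format. The sketch below illustrates that format in isolation; the helper names encodeQueue and decodeQueue and the sample values are assumptions made for illustration and are not part of the thesis code.

```javascript
// Hypothetical helpers: only the "cell:time_" wire format is taken from the listing.

// Build the queue string from a list of { cell, time } dwell events.
function encodeQueue(events) {
  return events.map(function (e) {
    return e.cell + ":" + e.time + "_";
  }).join("");
}

// Split a queue string back into dwell events, skipping the empty
// fragment that follows the trailing underscore.
function decodeQueue(queue) {
  return queue.split("_")
    .filter(function (part) { return part.length > 0; })
    .map(function (part) {
      var fields = part.split(":");
      return { cell: fields[0], time: Number(fields[1]) };
    });
}

// Illustrative values only: 420 ms on cell b5, then 1300 ms on cell c1.
var sample = encodeQueue([
  { cell: "b5", time: 420 },
  { cell: "c1", time: 1300 }
]);
var decoded = decodeQueue(sample);
```

Because the underscore terminates every entry, the server can count the underscores to learn how many entries a batch contains, which is what the loop bound in data.php relies on.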


The CSS file

body {
  margin-left: 0px;
  margin-top: 0px;
  margin-right: 0px;
  margin-bottom: 0px;
  text-align: left;
}

colgroup.over {
  background: #ebeeff;
}

.oce-first {
  background: #d0dafd;
  border-right: 10px solid transparent;
  border-left: 10px solid transparent;
  min-width: 199px;
  font-size: 14px;
  padding: 12px 15px;
  color: #039;
  text-align: justify;
}

.oce-predict {
  background: #d0dafd;
  border-right: 3px solid #F00;
  border-left: 3px solid #F00;
  border-top: 3px solid #F00;
  border-bottom: 3px solid #F00;
  min-width: 199px;
  font-size: 14px;
  padding: 12px 15px;
  color: #039;
  text-align: justify;
}

table.one-column-emphasis {
  font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
  font-size: 12px;
  width: 100%;
  border-collapse: collapse;
  color: #969;
}

table.one-column-emphasis th {
  font-size: 14px;
  font-weight: bold;
  padding: 12px 15px;
  color: #039;
  text-align: center;
}

table.one-column-emphasis td {
  padding: 10px 15px;
  color: #669;
  border-top: 1px solid #e8edff;
  min-width: 166px;
  text-align: center;
}

table.one-column-emphasis tr:hover td {
  background: #ebeeff;
  text-align: center;
}

table.one-column-emphasis tr:hover td:hover {
  color: #039;
  background: #94acff;
}

.bold {
  font-weight: bold;
}

.italics {
  font-style: italic;
}

.oce-first {
  text-align: justify;
}


The PHP scripts

data.php

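<?php
// Connect first so that request values can be escaped against the open connection.
include("connect.php");
// $_GET replaces the deprecated $HTTP_GET_VARS array.
$queue = $_GET['queue'];
$userId = mysql_real_escape_string($_GET['userId']);
// The queue has the form "cell:time_cell:time_"; each entry becomes one row.
$queueArray = explode("_", $queue);
for ($i = 0; $i < substr_count($queue, "_"); $i++) {
    $values = explode(":", $queueArray[$i]);
    mysql_query("INSERT into data values(\"".$userId."\",\"".mysql_real_escape_string($values[0])."\",\"".mysql_real_escape_string($values[1])."\")");
}
mysql_close($conn);
?>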

connect.php

<?php
$dbhost = 'localhost:8889';
$dbuser = 'root';
$dbpass = 'root';
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ('Error connecting to mysql');
$dbname = 'MSc';
mysql_select_db($dbname);
?>

bought.php

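<?php
// Connect first so that request values can be escaped against the open connection.
include("connect.php");
// $_GET replaces the deprecated $HTTP_GET_VARS array.
$product = mysql_real_escape_string($_GET['product']);
$userId = mysql_real_escape_string($_GET['userId']);
mysql_query("INSERT into bought values(\"".$userId."\",\"".$product."\")");
mysql_close($conn);
?>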

alignData.php

<?php
include("connect.php");
// For every completed purchase, collapse the user's raw movement rows into
// one row of finalData with one column per cell.
$result = mysql_query("SELECT * FROM `bought`");
while ($row = mysql_fetch_array($result)) {
    $result_1 = mysql_query("SELECT * FROM `data` WHERE `userID`=\"".$row['userId']."\" order by `cellID`");
    $columnNames = "";
    $values = "";
    $row_1 = mysql_fetch_array($result_1);
    $previous_column = $row_1['cellID'];
    $previous_value = $row_1['time'];
    // Rows arrive sorted by cell, so consecutive rows of the same cell are summed.
    while ($row_1 = mysql_fetch_array($result_1)) {
        if ($previous_column == $row_1['cellID']) {
            $previous_value += $row_1['time'];
        } else {
            $columnNames = $columnNames.",".$previous_column;
            $values = $values.",\"".$previous_value."\"";
            $previous_value = $row_1['time'];
            $previous_column = $row_1['cellID'];
        }
    }
    // Flush the last cell and append the purchased product as the class label.
    $columnNames = $columnNames.",".$previous_column.",product";
    $values = $values.",\"".$previous_value."\",\"".$row['product']."\"";
    mysql_query("INSERT INTO finalData(userID".$columnNames.") values (\"".$row['userId']."\"".$values.")");
    // Remove the raw rows once they have been aligned.
    mysql_query("DELETE from `bought` where `userId` = \"".$row['userId']."\"");
    mysql_query("DELETE from `data` where `userID` = \"".$row['userId']."\"");
}
mysql_close($conn);
?>
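The alignment step in alignData.php amounts to summing the dwell times of each cell and attaching the purchased product as the class label. The following sketch shows the same idea on in-memory data; sumByCell, toTrainingRow and the sample values are hypothetical names and figures chosen for illustration, and a lookup table is used here in place of the sorted-rows loop of the PHP script.

```javascript
// Hypothetical helper: collapse raw (cellID, time) rows into one total per cell.
function sumByCell(rows) {
  var totals = {};
  rows.forEach(function (row) {
    totals[row.cellID] = (totals[row.cellID] || 0) + row.time;
  });
  return totals;
}

// Hypothetical helper: build one training record with the product as the label.
function toTrainingRow(userId, rows, product) {
  var record = sumByCell(rows);
  record.userID = userId;
  record.product = product;
  return record;
}

// Illustrative values only: the user visited cell b5 twice and cell c1 once.
var exampleRow = toTrainingRow("1251234567890", [
  { cellID: "b5", time: 400 },
  { cellID: "c1", time: 250 },
  { cellID: "b5", time: 100 }
], "3");
```

Unlike the PHP loop, this version does not depend on the rows being sorted by cell, at the cost of holding all totals in memory at once.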

predict.php

<?php
include("connect.php");
$userId = mysql_real_escape_string($_GET['userId']);
$totalTime = 0;
$result_1 = mysql_query("SELECT * FROM `data` WHERE `userID`=\"".$userId."\" order by cellID");
$result_2 = mysql_query("SELECT * FROM `data` WHERE `userID`=\"".$userId."\" order by cellID");
// First pass: total dwell time over all cells.
while ($row_2 = mysql_fetch_array($result_2))
    $totalTime += $row_2['time'];
$columnNames = "";
$values = "";
// Second pass: fraction of the total time spent in each cell. The fractions are
// collected in $features and handed to the model, because variables created in
// the global scope are not visible inside a PHP function.
$features = array();
$row_1 = mysql_fetch_array($result_1);
$previous_column = $row_1['cellID'];
$previous_value = $row_1['time'];
while ($row_1 = mysql_fetch_array($result_1)) {
    if ($previous_column == $row_1['cellID']) {
        $previous_value += $row_1['time'];
    } else {
        $features[$previous_column] = $previous_value / $totalTime;
        $previous_value = $row_1['time'];
        $previous_column = $row_1['cellID'];
    }
}
$features[$previous_column] = $previous_value / $totalTime;
decisionTree($features);
//neuralNetwork($features);

function decisionTree($features)
{
    // Expose the per-cell fractions as local variables ($b5, $k4, ...).
    extract($features);
    $model_DT = 0;
    if($b5 <= 0.04509)
     if($k4 <= 0.013828)
      if($v1 <= 0.000362)
       if($r0 <= 0.000626)
        if($d5 <= 0.003481)
         if($d5 <= 0.001586)
          if($g4 <= 0.033267)
           if($s3 <= 0.004874)
            if($u1 <= 0.002108)
             if($f1 <= 0.039667)
              if($f4 <= 0.028894)
               if($i4 <= 0.004699)
                if($d2 <= 0.001173)
                 if($e5 <= 0.001377)
                  if($e1 <= 0.029566)
                   if($r3 <= 0.000861)
                    if($c1 <= 0.043665)
                     if($a3 <= 0.206815)
                      if($b1 <= 0.007319)
                       if($f3 <= 0.001471)
                        if($b4 <= 0.00214) $model_DT=2;
                        else if($a4 <= 0.004126) $model_DT=3;
                        else $model_DT=2;
                       else $model_DT=3;
                      else if($b3 <= 0.123969) $model_DT=2;
                      else $model_DT=1;
                     else $model_DT=1;
                    else $model_DT=1;
                   else $model_DT=3;
                  else $model_DT=1;
                 else $model_DT=3;
                else if($s4 <= 0.002873) $model_DT=2;
                else $model_DT=4;
               else $model_DT=1;
              else $model_DT=4;
             else $model_DT=3;
            else $model_DT=3;
           else if($q1 <= 0.004708)
            if($r4 <= 0.007391) $model_DT=3;
            else $model_DT=2;
           else $model_DT=2;
          else if($g5 <= 0.004141)
           if($k4 <= 0.001354) $model_DT=4;
           else $model_DT=3;
          else $model_DT=2;
         else $model_DT=4;
        else if($g5 <= 0.004141)
         if($b5 <= 0.002996)
          if($g4 <= 0.003922) $model_DT=2;
          else $model_DT=1;
         else $model_DT=3;
        else $model_DT=5;
       else $model_DT=4;
      else if($s4 <= 0.005561)
       if($t4 <= 0.002371)
        if($e0 <= 0.001979)
         if($h2 <= 0.005305) $model_DT=1;
         else $model_DT=2;
        else $model_DT=2;
       else $model_DT=2;
      else $model_DT=2;
     else if($f5 <= 0.001805) $model_DT=4;
     else $model_DT=2;
    else if($t3 <= 0.000515)
     if($d4 <= 0.008991)
      if($e2 <= 0.011901)
       if($a1 <= 0.001341)
        if($g2 <= 0.001762) $model_DT=4;
        else $model_DT=5;
       else $model_DT=5;
      else $model_DT=4;
     else $model_DT=2;
    else $model_DT=3;
    echo $model_DT;
}


function neuralNetwork($features)
{
    // Expose the per-cell fractions as local variables ($a0, $a1, ...).
    extract($features);
    $Node5=(-0.0209449762256399)
        +($a0*0.0120761574490061)+($a1*-0.0174298014185729)+($a2*-0.0175622955697642)+($a3*-0.000798046164731245)+($a4*-0.00566210278243689)+($a5*-0.00257021437573848)
        +($b0*0.0813554156049207)+($b1*-0.0383601651270091)+($b2*0.0315342748963075)+($b3*0.04750940128612)+($b4*0.00444930879229902)+($b5*0.0447743155601993)
        +($c0*0.0127846301489485)+($c1*0.0167829106398277)+($c2*0.0412283962113621)+($c3*0.0647197008365273)+($c4*0.026137495413712)+($c5*0.0292672102649498)
        +($d0*0.0575247995032596)+($d1*-0.0248903478567491)+($d2*-0.0356248056960633)+($d3*0.0131503378763436)+($d4*-0.00943722882163672)+($d5*0.0254130310753136)
        +($e0*0.0953293388209953)+($e1*-0.0358630730881965)+($e2*0.09184645890614)+($e3*0.0879998946588433)+($e4*-0.0210989430518799)+($e5*0.0236328879965554)
        +($f0*0.0521255666178908)+($f1*0.0562279524027289)+($f2*0.0420766593208718)+($f3*0.0219358641315261)+($f4*0.0500915161629286)+($f5*0.0598788090622592)
        +($g0*-0.0106339935340819)+($g1*0.0158371741591566)+($g2*0.0828753056435395)+($g3*-0.0152508552198513)+($g4*-0.00815101349601804)+($g5*0.0268439313590316)
        +($h0*0.070123678107641)+($h1*-0.0147305324346031)+($h2*0.0517135568746786)+($h3*-0.0117294349734072)+($h4*-0.00594235655570873)+($h5*0.0410639065208286)
        +($i0*-0.00105630930040345)+($i1*-0.00543787837624847)+($i2*0.0603755263497366)+($i3*0.0287693595250936)+($i4*0.0554227984526808)+($i5*0.0600355834517169)
        +($j0*0.0186135251521197)+($j1*0.00984875030922667)+($j2*0.0193290574626347)+($j3*0.021484574396215)+($j4*0.0484829773111019)+($j5*0.0233728871769681)
        +($k0*0.0410110073637687)+($k1*-0.00743846515678319)+($k2*0.0446579060767132)+($k3*0.00789530586935209)+($k4*0.0185589336156669)+($k5*0.0178833473514336)
        +($l0*0.0366297156412459)+($l1*0.0297884220860898)+($l2*0.0450253751867714)+($l3*0.0705159823038729)+($l4*0.074643360814636)+($l5*0.049178643898654)
        +($m0*0.00649293306157912)+($m1*0.0235761949995652)+($m2*0.0282972581223614)+($m3*0.00995247757969736)+($m4*0.0635360916248171)+($m5*-0.0185514952082912)
        +($n0*0.0798799834823821)+($n1*-0.0367274799798666)+($n2*0.0461992904934746)+($n3*0.0354383668658634)+($n4*-0.00123240277220675)+($n5*-0.0150807856098709)
        +($o0*-0.0260784636646052)+($o1*0.0553028912171675)+($o2*0.0802089447351997)+($o3*-0.0235601224487924)+($o4*-0.0281363990127924)+($o5*0.0319917291420718)
        +($p0*-0.0257109331590629)+($p1*-0.0279769700636828)+($p2*0.0433907293866429)+($p3*-0.0310545628159805)+($p4*0.0348153094694314)+($p5*-0.00776438719161176)
        +($q0*-0.0069736497593223)+($q1*0.0161811177301145)+($q2*0.0576906924312276)+($q3*0.0441712928131897)+($q4*0.0165528172670987)+($q5*-0.0274805831321372)
        +($r0*0.0120430047036489)+($r1*-0.000892653621313331)+($r2*0.0868045378672117)+($r3*0.0281943074796785)+($r4*0.0670839346752799)+($r5*0.0110772507057164)
        +($s0*0.0214207237015366)+($s1*-0.032511653106313)+($s2*0.0328856849361516)+($s3*0.0313926662260086)+($s4*0.0111177031525771)+($s5*0.0284289901014687)
        +($t0*0.0428425565992686)+($t1*0.0534413420371503)+($t2*0.0244766875457709)+($t3*0.0647078085232812)+($t4*0.0112235270733354)+($t5*0.0097765520400492)
        +($u0*0.0259846759422365)+($u1*-0.0430507927467189)+($u2*0.107107831659775)+($u3*0.0467301403971514)+($u4*0.0571975966844622)+($u5*-0.0079845822250066)
        +($v0*0.0303173561775128)+($v1*-0.0043169837441232)+($v2*0.0866140345320475)+($v3*0.00261036151061667)+($v4*0.00523185366643474)+($v5*-0.0239702999191261);
    // Similar code for the remaining 72 nodes has been omitted because it runs
    // to about 40 pages. The complete code is available online for reference.
    // Each output node is passed through the logistic function; the node with
    // the largest activation determines the predicted product.
    $max=max(
        (1/(1+(1/pow(2.718282,$Node0)))),
        (1/(1+(1/pow(2.718282,$Node1)))),
        (1/(1+(1/pow(2.718282,$Node2)))),
        (1/(1+(1/pow(2.718282,$Node3)))),
        (1/(1+(1/pow(2.718282,$Node4)))));
    if($max==(1/(1+(1/pow(2.718282,$Node0))))) echo "1";
    else if($max==(1/(1+(1/pow(2.718282,$Node1))))) echo "2";
    else if($max==(1/(1+(1/pow(2.718282,$Node2))))) echo "3";
    else if($max==(1/(1+(1/pow(2.718282,$Node3))))) echo "4";
    else if($max==(1/(1+(1/pow(2.718282,$Node4))))) echo "5";
}
mysql_close($conn);
?>
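Two small computations carry predict.php: each cell's dwell time is divided by the user's total dwell time, and the neural network output is chosen by applying the logistic function 1/(1+e^(-x)) to each output node and taking the largest value. The sketch below isolates both steps; the function names normalize, sigmoid and pickProduct, and all sample values, are assumptions made for illustration rather than code from the thesis.

```javascript
// Hypothetical helper: turn per-cell dwell totals into fractions of the total time.
function normalize(totals) {
  var sum = 0;
  var cell;
  for (cell in totals) sum += totals[cell];
  var fractions = {};
  for (cell in totals) fractions[cell] = sum > 0 ? totals[cell] / sum : 0;
  return fractions;
}

// Logistic function, equivalent to 1/(1+(1/pow(e, x))) in the PHP listing.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Hypothetical helper: return the 1-based index of the output node with the
// largest activation; the first node wins a tie, as in the PHP if/else chain.
function pickProduct(nodeValues) {
  var best = 0;
  for (var i = 1; i < nodeValues.length; i++) {
    if (sigmoid(nodeValues[i]) > sigmoid(nodeValues[best])) best = i;
  }
  return best + 1;
}

// Illustrative values only.
var fractions = normalize({ b5: 300, c1: 100 });    // b5: 0.75, c1: 0.25
var predicted = pickProduct([-1, 0.5, 2, 0, -3]);   // third node is largest: 3
```

Since the logistic function is strictly increasing, comparing the raw node values would select the same product; the activations matter only if the output is to be read as a confidence score.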