web content recommendation using machine learning on user mouse tracking data
Post on 28-Mar-2015
Web Content Recommendation using Machine Learning on User Mouse Tracking Data
Sparsh Gupta
Pembroke College | Computing Laboratory, University of Oxford
Submitted in partial fulfillment of the requirements for the degree of
Master of Science in Computer Science
September 2009
ABSTRACT
Websites are becoming more and more dynamic, but not more intelligent. Based on certain mouse clicks or user choices, today's dynamic websites can mold themselves but cannot intelligently predict relevant data. The data contained in today's websites is growing, and the number of users demanding unique, different information is also ever increasing. This has created the challenging problem of delivering the right content to every user.
This thesis is an original work concentrating on solving this problem of generating relevant content for each individual user. One of the primary inputs used by the project is the mouse movement behavior of the user. If the website capturing mouse movements is built in such a way that the mouse pointer is mostly close to the user's point of gaze, then tracking mouse movement behavior would, in theory, amount to tracking the user's eyes. Based on this mouse movement data, further content can be predicted and personalized for each user using one or more machine learning models.
This thesis proposes a complete methodology for building and implementing such a system. As a proof of concept, an online shopping website was built and further tests were conducted, which gave a remarkable accuracy of 84.09% when compared with the actual needs of the user.
The working demonstration of the project, along with its description, is available online at http://sparshgupta.name/MSc/Project
Keywords: adaptive web, machine learning, mouse movement, gaze point
ACKNOWLEDGEMENT
I am heartily thankful to my supervisor, Dr. Vasile Palade, whose encouragement, guidance, confidence in my idea and support from the initial to the final level enabled me to develop this project and understand the subject. I am thankful to the Computing Laboratory, University of Oxford, for accepting my proposal and giving me an opportunity to work on this idea. I gratefully acknowledge the support and help of all the volunteers who helped me collect the data for my work.
I would like to thank Prof. Luke Ong and Pembroke College for their co-operation and readiness to always help me when needed. I would also like to acknowledge the efforts and facilities provided by the staff of the Computing Laboratory Library, Radcliffe Science Library and Pembroke College Library.
Lastly, I offer my regards to my parents, my sister and friends, who always supported me in all respects during the completion of this project.
Sparsh Gupta
TABLE OF CONTENTS
Abstract
Acknowledgement
Table of Contents
Table of Figures
1 Introduction
1.1 A Primer
1.1.1 The World Wide Web
1.1.2 The computer mouse device
1.1.3 Eye tracking
1.1.4 WWW and the missing gap
1.1.5 Tracking mouse pointer to track user's eyes
1.2 Motivation
1.3 Objectives
1.4 Structure of the dissertation
2 Background, Literature review and Project overview
2.1 Coordination of mouse and eye movements
2.2 Capturing mouse movements
2.3 Tracking mouse movement to determine user's behaviour
2.4 Discussion
2.5 Project overview
3 Data Collection and Pre-processing
3.1 The initial website
3.1.1 Specifications
3.1.2 Implementation
3.1.2.1 Webpage Design
3.1.2.2 Database Design
3.1.2.3 Implementing mouse tracking
3.1.2.4 Final product bought by the user
3.1.3 Testing the initial website
3.2 Data collection
3.3 Data compilation and cleaning
3.3.1 Need and Specifications
3.3.2 Implementation
3.3.2.1 Data compilation
3.3.2.2 Data cleaning
3.3.2.3 Data normalization
4 Building machine learning models
4.1 Machine Learning
4.1.1 WEKA
4.1.2 Why Machine Learning?
4.2 Methods evaluated
4.2.1 Decision Tree
4.2.2 Neural Network
4.3 Implemented algorithms
4.3.1 Decision Tree (C4.5)
4.3.2 Neural Network (Multilayer Perceptron)
4.4 Model building
4.4.1 Decision Tree
4.4.1.1 Details of the chosen decision tree
4.4.1.2 Testing the decision tree
4.4.1.2.1 Testing on Training Data
4.4.1.2.2 Testing by Cross-Validation (folds 10)
4.4.1.2.3 Discussion
4.4.2 Neural Network
4.4.2.1 Details of the chosen neural network
4.4.2.2 Testing the neural network model
4.4.2.2.1 Testing on Training Data
4.4.2.2.2 Testing by Cross-Validation (folds 10)
4.4.2.2.3 Discussion
4.4.3 Decision Tree Vs Neural Networks
5 Embedding the machine learning models in the website
5.1 What and Why?
5.2 Specifications
5.3 Implementation
5.3.1 Implementing the Decision Tree model
5.3.2 Implementing the Neural Network model
5.4 Using model outputs
5.5 What next
6 Testing and Results
6.1 Testing methodology
6.2 Testing for model accuracy
6.2.1 Testing data collection
6.2.2 Model testing in WEKA using test data
6.2.2.1 Decision Tree model
6.2.2.2 Neural Network model
6.2.3 Discussion
6.3 Testing time performance of the models
6.3.1 Decision Tree model
6.3.2 Neural Network model
6.3.3 Discussion
6.4 Results
7 Conclusion
Future Work
Bibliography
Appendix: Source Code
HTML final webpage
The JavaScript file
The CSS file
The PHP scripts
data.php
connect.php
bought.php
alignData.php
predict.php
TABLE OF FIGURES
Figure 1: Project outline
Figure 2: Screenshot of the top half of the developed webpage
Figure 3: Screenshot of the developed webpage
Figure 4: Code given to each section of the webpage
Figure 5: Screenshot with a cell highlighted
Figure 6: Database table 'data'
Figure 7: Database table 'bought'
Figure 8: Parameters used for building the Decision Tree model
Figure 9: Parameters used for building the Neural Network model
Figure 10: Screenshot of the prediction done by the model
1 INTRODUCTION
This chapter includes a brief overview of a few terms. It then discusses the coordination between eye and mouse movement and how mouse movement data can be used as pseudo eye tracking data. Later, this chapter talks about the motivation behind this project and clarifies the objectives of the research and the structure of this document.
1.1 A Primer
This section of the chapter discusses a brief history of the World Wide Web (WWW), the use of a computer mouse and current eye tracking technology. It then explains how the WWW can be improved by using eye tracking data and how a mouse pointer can be used to collect pseudo eye tracking data.
1.1.1 The World Wide Web
In 1990, CERN launched the world's first website¹, which was only a few lines of text and hyperlinks. In the nineteen years since, websites have been completely revolutionized. Plain text is now accompanied by all sorts of rich media, including images, music, videos, animations, colours etc. Dynamic data from ever-increasing databases is rapidly replacing the static content of websites. Web servers are now capable of more real-time computing. Data can not only be shown to a user but can also be collected from him easily. Recently, the success of AJAX² has completely changed the web experience by making it much more interactive and more data driven.
Today, the Internet has changed everything: how we do business, how we study, how we connect with friends and, in general, how we live.
¹ CERN, Welcome to info.cern.ch/, http://info.cern.ch/.
² W3Schools, Ajax, http://www.w3schools.com/Ajax/.
1.1.2 The computer mouse device
Most people use a computer pointing device (generally a mouse) to navigate through a website. They click hyperlinks spread across different sections of a webpage, select text or scroll through a long page using a computer mouse. The mouse can safely be called a personal assistant while working on a computer, and especially while browsing a website.
1.1.3 Eye tracking
Eye tracking, or gaze tracking, is the process of measuring the gaze, i.e., keeping track of the point at which a user is looking. Most websites present visual information in the form of text, images, graphics, etc., and almost all the information a user obtains from a website is perceived through his eyes.
Eye tracking, when applied to a website, can be thought of as a method of determining the portion of the screen at which the user is looking. This information can potentially give a fair idea of the sections most relevant to him. The more time a user spends looking at a particular section, reading it or simply viewing it, the more interested he is in that section compared to the others on the same page.
1.1.4 WWW and the missing gap
Websites have started becoming dynamic by accepting inputs from a user, which are then used to select relevant content or information for him. The kinds of input that current websites primarily employ are mouse clicks, key presses, text entered or choices made by the user in the form elements of the page. This, however, means that if the user is not interested in giving any data as input, the website ends up being static, or without any information on the user's needs.
Eye tracking data, if captured for a general user, could be used extensively to make today's websites more adaptive and intelligent by harnessing knowledge of the user's interests and the information he is most interested in. Without seeking any external data from the user, his interests and needs can be determined from his eye movements, and he can be served the data he is most interested in.
1.1.5 Tracking mouse pointer to track user's eyes
There has been a lot of research on improving the computing experience for a user by tracking his eye location, but there are a few drawbacks associated with it. Firstly, the tracking equipment is expensive and the user needs to physically wear the tracking gadget. Not everyone using the Internet would want to, or be able to, wear the tracking equipment, and hence general public websites cannot be made dependent on it. There is also ongoing research on determining eye movement using a camera device, but as of now the accuracy of determining the gaze position is low, it depends on the movements of the user and on lighting conditions and, most importantly, the user needs to download external software. Because of these limitations of eye tracking methods, there has been research into finding alternatives.
Recently, Googlers Kerry Rodden and Xin Fu proposed in their paper (Rodden, et al. 2008) that mouse movements show potential as a way to estimate which parts of a page the user has considered before deciding where to click. Other studies have provided a reasonable estimate of the coordination of mouse and eye, especially on a page on which a click is likely to happen. Hence, a user's mouse movement data can sometimes be used as pseudo eye tracking data. There are several interface design techniques in Human Computer Interaction with which a website can make sure that, in most cases, the user's mouse pointer is close to his point of gaze. One of the techniques employed in the project is mouseover cell highlighting. If the content at the current location of the mouse pointer is highlighted to make it stand out from the rest of the page, then this can almost always ensure that the mouse pointer movement is synchronized with the area the user is currently reading or gazing at.
1.2 Motivation
Many websites do not ask for any explicit input from the user but can still adapt themselves. They primarily use either geographical information (which can be obtained from the user's IP address) or the browser/operating system specifications to adapt the web content for the user. This adaptation is of course not targeted at an individual user and is only a broad adaptation catering to a group of users having similar demographics or preferences. The adaptation of a website can be based on even the smallest bit of information from the user. The more information the website obtains about the user, the better it can adapt to his needs.
The primary medium of interaction between a user and a website is the mouse, and it produces a huge amount of data in the form of mouse movement behaviour. The motivation behind this thesis and project is the existing gap between a website's demand for more user data to make it adaptive and the availability of ample data from the user in the form of his mouse movements.
Further, if a website is designed in such a way that, more often than not, the user's mouse pointer movement is synchronized with his point of gaze, as discussed in Section 1.1.5, then the data can also be roughly regarded as eye tracking data.
1.3 Objectives
The objective of this project is to effectively utilize the mouse movement data of a user in making the web content more adaptive for him, by dynamically predicting further relevant content for him.
In order to achieve the above main objective, the following sub-objectives need to be addressed:
• Collecting the initial training dataset of mouse movement behavior from a large set of users in order to train and build a model. This involves developing a website with well-defined areas, sections or elements where mouse movements can be tracked. The website needs to be such that the user's mouse pointer is synchronized with his point of gaze.
• Asking volunteering users to visit this site and choose or select content for themselves as they would on any other website. The required data is the time spent at each section / element of the page while the user is browsing it. The target (predicted or dependent) variable is the content relevant to him, and hence, in order to train the model, this data point (as collected explicitly from the user) also needs to be saved in the databases.
• Data cleaning and processing, an essential step after data collection, because it is important to remove all outliers that can harm the model. The time a user spent at different sections of the webpage should ideally be normalized by the total time spent by him.
• Building machine learning models using the collected mouse movement data as the training and initial testing dataset. The distribution of normalized time spent by the user at each section of the webpage forms the independent variables, and hence the input attributes of the model; further content for the user is the output of the model, as the dependent variable.
• Embedding the machine learning models back into the website so that the models can be put to use. The website would continue tracking the user's mouse movements and would use the built model to compute further content for him in real time.
• Testing the accuracy of the implementation. To do this, the predicted content needs to be compared with the actual content desired by the user.
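The normalization step named in the third sub-objective can be sketched as follows. This is a minimal illustration in Python, not the thesis's own code; the function name, the section names and the timings are hypothetical.

```python
def normalize_dwell_times(dwell_seconds):
    """Return each section's share of the user's total time on the page."""
    total = sum(dwell_seconds.values())
    if total <= 0:
        # No usable activity recorded; treat every share as zero.
        return {section: 0.0 for section in dwell_seconds}
    return {section: seconds / total
            for section, seconds in dwell_seconds.items()}


# Hypothetical session: seconds the pointer spent over each specification.
session = {"ram": 20.0, "processor": 15.0, "hard_disc": 10.0, "other": 5.0}
shares = normalize_dwell_times(session)
print(shares)
```

Dividing by the session total makes a quick visitor and a slow visitor with the same relative interests produce the same input attributes, which is presumably why the per-user total is the chosen denominator.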
To demonstrate the objectives, a sample shopping webpage has been developed. This webpage contains a comparison of the specifications of five laptop models. Based on the mouse movement behavior of a user across this page, the best laptop would be recommended to him.
This can be visualized as follows: if a user has a browsing pattern signifying that he is spending, say, 40% of his time reading about the RAM of the laptops (further distributed across the RAM sizes of the different models), 30% reading about the processors, 20% reading about the hard disc drive and the remaining 10% reading about other specifications, then, based on this data and the developed machine learning model, the most suitable laptop can be recommended to him. The accuracy of the recommendation can be checked by comparing the product finally bought by the user with the product recommended by the website.
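The accuracy check described above amounts to counting how often the recommended product matches the product bought. A minimal Python sketch, assuming each session is stored as a (recommended, bought) pair; the product labels and sessions are made up for illustration and are not the thesis's test data.

```python
def recommendation_accuracy(records):
    """Fraction of sessions where the recommended product was the one bought."""
    if not records:
        raise ValueError("no sessions to evaluate")
    hits = sum(1 for recommended, bought in records if recommended == bought)
    return hits / len(records)


# Hypothetical (recommended, bought) pairs, one per visitor session.
sessions = [
    ("laptop_a", "laptop_a"),
    ("laptop_b", "laptop_c"),
    ("laptop_d", "laptop_d"),
    ("laptop_e", "laptop_e"),
]
accuracy = recommendation_accuracy(sessions)
print(f"{accuracy:.2%}")  # prints 75.00%
```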
1.4 Structure of the dissertation
This document starts by giving an idea of the related research being done across the globe. It then explains the complete implementation outline as a big picture of the project. Chapter 3 discusses the methodology of collecting initial training data, which also involves the complete description of the development procedure of the initial website. It explains the process of collecting data along with the structures of the databases and the data cleaning procedure. Chapter 4 gives the details of the machine learning models built and the procedure involved, along with the testing results of the models obtained on the training data. Chapter 5 explains the procedure adopted to implement the built models in the website and the details of the AJAX communication link between the model, the data and the website. The thesis then explains the methodology used to collect testing data, the testing methodology and the results obtained on the models. The thesis closes with some conclusions and the author's view on the possibility of future work. The attached appendix contains all the source code.
The working demonstration of the project, along with its documentation and the source code under the GNU General Public License, is available online at http://sparshgupta.name/MSc/Project
2 BACKGROUND, LITERATURE REVIEW AND PROJECT OVERVIEW
This chapter reviews previous and ongoing work related to the problem. The chapter is divided into sections, each covering the independent and combined work done under its heading. The chapter later summarizes the ongoing work and also presents an overview of the project carried out by the author.
The work done in the project is an original idea, and the author found no record of any work using the same methodology. The problem has been tackled to some extent and has been considered by a few research groups, but their methodology and final conclusions were very different from what is proposed in this thesis. The following parts of the chapter highlight some of the recent developments and work done in related fields.
2.1 Coordination of mouse and eye movements
The prime question of whether mouse tracking can substitute for, or at least partially replicate, eye tracking is an active one.
(Chen, Anderson and Sohn 2001) studied the relationship between the gaze position of a user and his cursor position on a computer screen during web browsing. They conducted tests on several websites, recorded the eye and mouse movements of the users and studied them separately. They concluded that there is a strong relationship between gaze position and cursor position, and also that there are regular patterns of coordination. They also argued that a mouse could provide more information than just x and y coordinates, which could be used to design better interfaces for human computer interaction. They wrote in their conclusion that “Our data show that the dwell time of cursor among different regions has strong correlation to how likely a user will look at that region. Also, in over 75% of chances, a mouse saccade will move to a meaningful region and, in these cases, it is quite likely that the eye gaze is very close to the cursor. This result implies that, by predicting the users' interests on web pages, mouse device could be a very good alternative to an eye-tracker as a tool for usability evaluation.”
According to the work done at Google labs (Rodden, et al. 2008), several different patterns of coordination between eye and mouse pointer were observed on a web search results page. The behavior patterns identified as indicating active usage were following the eye horizontally, following the eye vertically and marking a particular result. This work was done entirely on a search results page, but it clearly concludes that coordination between the user's eye and his mouse pointer exists.
There have been more studies, (Byrne, et al. 1999) and others, on the relationship and coordination between eye movements and mouse movements on the web. They found that some users use the mouse pointer to help them read the page, or to help them make a decision about where to click. It was concluded that, given an intent / opportunity to click in the current user activity, the mouse is much more likely to be close to the eye. Eye tracking can provide insights into users' behavior while using the search results page, but eye-tracking equipment is expensive and can only be used for studies where the user is physically present. The equipment also requires calibration, adding overhead to studies. In contrast, the coordinates of mouse movements on a web page can be collected accurately and easily, in a way that is transparent to the user. This means that mouse tracking can be used in studies involving a number of participants working simultaneously, or remotely by client-side implementations, greatly increasing the volume and variety of data available.
There is a basic rationale that states "If I might click, I might as well keep the mouse close to my eyes." Where there is no potential to click, either because the user is in an evaluative mode or the content of interest is devoid of links, the mouse and eye diverge.
2.2 Capturing mouse movements
There can be several different methodologies for capturing the mouse movement behavior of a user over a webpage. The choice primarily depends on the type of data required and the mouse movement expected. (Arroya, Selker and Wei 2006) proposed a tool that needs no installation and is capable of tracking users' mouse movements. This mouse movement data can be visualized in a built-in system and can be used to further refine the usability of the webpage. They did not, however, propose any methodology to automatically refine the webpage.
(Edmonds, et al. 2007) discusses the techniques and uses of mouse tracking on a website, but entirely from a usability point of view. It captures the mouse movement data of a user in more detail, recording the coordinates and the row and column ID along with many other parameters. This methodology was found effective but was not relevant to the problem addressed here.
The paper (Torres and Hernando, Real time mouse tracking registration and visualization tool for usability evaluation on websites n.d.) proposes a methodology to track mouse movements on a webpage and visualize them in a tool that they have developed. They used HTML and AJAX and proposed a method to link the mouse movements with the server logs and web-stat data to get additional information on the user's behavior.
2.3 Tracking mouse movement to determine user's behaviour
There was a well-known project named 'Cheese' done at MIT (Mueller and Lockerd 2001), which extended the conventional web interface user model (based only on responses to mouse clicks) to take into account all mouse movements on a page as an additional layer of information for inferring user interest. They developed a straightforward way to record all mouse movements on a page, and conducted a user study to analyze and investigate mouse behavior trends, finding certain mouse behaviors common across many users. They also proposed that there are certain categories of mouse behavior and that, after tracking them, the website could be molded accordingly.
2.4 Discussion
The literature review found that a lot of work has been done to demonstrate and support the coordination of a user's eye and mouse movements on a website. Eye tracking data has been used by Google to improve the usability of their search pages. There are several ongoing discussions on the effective use of eye or mouse tracking data to manually refine the content and usability design of a webpage.
However, the review found no work that uses mouse tracking data in a machine learning model to automatically refine or predict the content of a website for a user based on his mouse movement or eye movement behavior.
2.5 Project overview
The project undertaken can be stated as a method to automatically refine or predict the contents of a webpage for a user, based on his mouse movement behavior.
Based on the earlier studies discussed in Section 2.1, it has been assumed that there is some coordination between a user's eye movement and his mouse movement. Based on the mouse movements of an individual user, his preferences for content and his needs can be predicted, and this information can further be used by the owners of the website. Besides the owners, this information can help the user himself find the right content.
To do this, the first task was to devise a methodology to track a user's mouse movements on a webpage. Tracking can be done in several ways, and several different data points can be saved for a user based on his mouse movements. The thesis proposes a method to track the time spent by a user in every section of a webpage. Several JavaScript functions were written, and modifications were made to a standard website, to enable mouse tracking in a hidden layer. AJAX was used to connect the JavaScript functions with the server-side PHP scripts, which were in turn connected to MySQL databases for storing the data. To demonstrate all this, a new dummy website imitating a shopping portal was developed. Once the website had been developed with mouse tracking capabilities, it was made available to the public for two weeks. This was done to collect some initial data on users' mouse movement behavior. The data collected was processed and cleaned before being analyzed and modeled. This complete step of initial website development and data collection is explained in detail in Chapter 3 (Data Collection and Pre-processing).
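The idea of tracking the time spent in every section can be illustrated with a short sketch. The thesis implements the tracking in JavaScript with PHP and MySQL on the server; the Python below only shows the bookkeeping, under the assumption that each recorded event is a (timestamp in milliseconds, section) pair logged when the pointer enters a section, with None meaning the pointer left all tracked sections. The event format and section names are hypothetical.

```python
from collections import defaultdict


def accumulate_dwell_times(events):
    """Sum the time spent in each section from (timestamp_ms, section) events.

    Each event marks the moment the pointer entered `section`; a section of
    None means the pointer left all tracked sections (or the page closed).
    """
    dwell = defaultdict(int)
    previous_time, previous_section = None, None
    for timestamp, section in sorted(events, key=lambda event: event[0]):
        if previous_section is not None:
            # Credit the elapsed interval to the section being left.
            dwell[previous_section] += timestamp - previous_time
        previous_time, previous_section = timestamp, section
    return dict(dwell)


# Hypothetical event log for one visitor.
events = [
    (0, "ram_model_1"),
    (1200, "cpu_model_1"),
    (2000, "ram_model_2"),
    (4500, None),
    (5000, "cpu_model_1"),
    (5600, None),
]
dwell_ms = accumulate_dwell_times(events)
print(dwell_ms)
```

Time between a None event and the next entry is deliberately not credited to any section, so pauses outside the tracked cells do not inflate the totals.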
It was then necessary to study and analyze the collected data and build a model on it so that it could be used in the future for new visitors. To do this, WEKA was used and different types of models were built. The models took the time spent by the mouse pointer in different sections of the webpage as the independent variables and predicted the relevant content for the user as the dependent variable. They were all built and trained on the initially collected data and were tested on the same training data. After several iterations, two models, one based on a Decision Tree and the other on a Neural Network, were obtained that gave significant accuracy on the training data. The complete model-building phase of the project, along with the test results obtained, is explained in Chapter 4 (Building machine learning models).
Once the two models (one Decision Tree and one Neural Network) were obtained, the task was to embed them both into the initial website. This was necessary so that the built models could be used for future visitors and the content relevant to them could be predicted based on their mouse movement activities. The models were coded in PHP on an Apache server and were connected with the front-end HTML page using AJAX. The PHP script was made to read the real-time mouse movement data of a given user directly from the MySQL databases and execute the model on it to predict further content for him. The whole procedure is explained in detail in Chapter 5 (Embedding the machine learning models in the website).
After embedding the two models into the website, volunteers were again asked to visit the website. This time, not only were the user's mouse movements captured, but he was also recommended appropriate content based on one of the two machine learning models. The mouse movement data was saved in the MySQL databases to be analyzed for accuracy later. This step is explained in Chapter 6 (Testing and Results).
The collected data was used as the test dataset and the two models were evaluated on their accuracy as well as their time performance. It was found that, under the present limitation of a lack of data, the Decision Tree model edged out the Neural Network model on both the accuracy and the time performance front. The details of this step are given in Chapter 7 (Conclusion).
The whole project can be outlined as follows:
1. Building the initial website capable of tracking mouse movements of the visitors.
2. Asking volunteering users to visit the website and capturing their mouse movements. Cleaning and compiling the collected data.
3. Using the captured mouse movement data of the users, building and training machine learning models.
4. Coding the obtained machine learning models back into the website.
5. Collecting the test dataset from the final website. The website is now capable of recommending the appropriate content for a user based on his mouse movement behavior.
6. Testing the accuracy of the built models using the collected test data. Also evaluating the time performance of the models on the web server.
Figure 1: Project outline
3 DATA COLLECTION AND PRE-PROCESSING
This chapter explains the complete set of steps followed to collect the training data. It covers the details of the initial website developed and the steps followed to obtain the required training data from it. Later, the chapter explains the data compilation and cleaning steps performed on the initially collected data.
3.1 The initial website
To analyze the mouse movement behavior of the users on a webpage, the first step is the development of the website under consideration. Since the proposed method of analyzing and modeling the data is machine learning, some initial training data is also required. To cater to both needs, a dummy website capable of tracking the user's mouse movements was built and made public. The website was kept live until the required data had been obtained. The specifications and details of the implementation are as follows:
3.1.1 Specifications
The functionalities, requirements and specifications of the initial webpage are:
• The user interface design of the initial webpage needs to be exactly the same as that of the required final website. This is important because the user's mouse movements depend on the interface of the webpage. The data collected to build and train the machine-learning model must come from the same webpage on which the model is finally to be implemented.
• The mouse tracking needs to be implemented in a hidden layer so that the user can experience the web in the same rich way without any compromise on speed or performance. He should not be asked for any explicit information at any time.
• As stated in section 1.3, the webpage developed was a dummy shopping portal showing five laptop models and comparing them on their configurations.
• There were 5 laptops with 22 attributes each. There was an empty (no laptop) specification-heading space on the left-hand side of the page. The total number of sections in the built page was (5 + 1) × 22 = 132, where 5 is the number of laptops, 1 is for the specification-heading category (no laptop space) and 22 is the count of attributes per laptop.
• Each of these 132 sections of the webpage gets highlighted as soon as the mouse pointer reaches it. This makes the user most likely to read the highlighted section of the webpage and hence keeps the user's mouse pointer close to his point of gaze, so that the mouse movement data provides pseudo eye-tracking data of the user. The cell-highlighting feature was implemented using Cascading Style Sheets, with the cell color changed as soon as the mouse pointer enters the cell.
• A MySQL database was connected for recording the mouse pointer time on each section of the webpage. The final product bought by that user was also saved in the database.
3.1.2 Implementation
The webpage was developed in HTML using PHP as the server-side scripting language. JavaScript and AJAX were used to dynamically transfer data from the HTML fields to the PHP scripts. The database was designed in MySQL, and PHP scripts were written to connect and transfer data between MySQL and the Apache server.
3.1.2.1 Webpage Design
The webpage was designed in HTML in a tabular format with 6 columns and 22 rows. Column 1 held the specification headings and the remaining 5 columns held the specifications of each laptop, with one specification per row. Each of the 132 cells thus obtained in the table corresponded to an independent variable (input variable) for the model. A screenshot of the top half of the developed page is shown in Figure 2 and a screenshot of the complete webpage is shown in Figure 3.
Figure 2: Screenshot of the top half of the developed webpage
It can be seen clearly that there are 6 columns and 22 rows on the webpage and hence 132 cells. Since each of these cells is an input variable to the model, each was given a code. Each laptop was given a number from 1 to 5 and the specification-heading space was given the code 0. Each specification was given an alphabetic code from 'a' to 'v'. Hence, each of the 132 sections of the webpage got a code combining the letter of the specification and the number of the laptop: a0, a1, a2, a3, a4, a5, b0, b1, b2, …, v3, v4, v5. The coding methodology for the first few cells is shown in Figure 4. These codes were not displayed anywhere on the webpage but were only used when calling the mouse tracking functions, as will be explained in subsequent sections.
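The coding scheme can be reproduced programmatically. The following is only an illustrative sketch and not code from the thesis; the helper name buildCellIds is hypothetical.

```javascript
// Sketch of the cell-coding scheme: 22 specification rows ('a'..'v')
// crossed with 6 columns (0 = specification heading, 1..5 = laptops).
// buildCellIds is a hypothetical helper, not part of the thesis code.
function buildCellIds(rowCount, columnCount) {
  const ids = [];
  for (let row = 0; row < rowCount; row++) {
    const letter = String.fromCharCode("a".charCodeAt(0) + row);
    for (let col = 0; col < columnCount; col++) {
      ids.push(letter + col);
    }
  }
  return ids;
}

const cellIds = buildCellIds(22, 6);
console.log(cellIds.length, cellIds[0], cellIds[cellIds.length - 1]);
```

Generating the codes from the row and column counts keeps the list consistent with the table layout if the number of products or specifications changes.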
Figure 3: Screenshot of the developed webpage
Figure 4: Code given to each section of the webpage
To make sure that in most cases the user's mouse pointer is close to his point of gaze, a Cascading Style Sheet was attached to the HTML webpage. The CSS file had two different styles that could be applied to each cell. One was the normal style with a white background, whereas the other had a blue background to enable cell highlighting. As soon as the mouse enters a cell, the normal style is replaced by the highlighting style for that cell. This is reset as soon as the mouse leaves the highlighted cell. Similarly, the row and the column in which the mouse pointer is currently present are highlighted in a light shade of blue. The CSS code of the different styles is available in the appendix of this thesis. A screenshot with the cell 'g2' highlighted is shown in Figure 5.
For every visitor of the website, a unique user id was generated as soon as the page loaded. To keep the user id simple, it was set to the current JavaScript time value at page load. The JavaScript time function returns the current time in milliseconds since January 1, 1970. This ensured that, within the current scope of the project, all visiting users would have a unique user id (two visitors would collide only if their pages loaded in the same millisecond). The JavaScript code to generate the user id is:
    var userId = new Date();
    userId = userId.getTime();
A JavaScript file named 'mouseover.js' was associated with this webpage. It contained several JavaScript variables and functions required to track and record mouse movements. The HTML code of the website was also given an 'onload' event to call a JavaScript function named 'start_It()', which triggers the mouse tracking functionality of the website:
    <body onload="start_It();">
The mouse tracking algorithms and the complete implementation are explained later, after the details of the database design.
Figure 5: Screenshot with a cell highlighted
3.1.2.2 Database Design
A database was created in MySQL with two tables, namely 'data' and 'bought'. The attributes of the two tables are:
Figure 6: Database table 'data'
Figure 7: Database table 'bought'
Table: data
• userID: To record the user id of the user
• cellID: To save the cell ID that was assigned to each subsection of the webpage
• time: Contains the time in milliseconds spent in the cell given by cellID
Table: bought
• userID: To record the user id of the user
• bought: To save the code of the final product bought by the user
The table 'data' saves the time spent by a user in each cell, i.e. section of the webpage. There can be 132 different sections (cell IDs) for each user, and each of them can appear multiple times. The time spent in each section by a user is the independent variable for the model.
The table 'bought' records the final product selected by the user. The attribute 'userID' is present in both tables and serves as the foreign key linking them; it is also the primary key of the 'bought' table.
The rationale behind such a design was to normalize the database so that data repetition could be avoided. Also, the insert queries would be simple and short, and hence efficient, and would not slow the webpage down while it tracks the mouse and interacts with the database simultaneously. The only drawback of such a design is that the data needs merging before it can be used for training the model.
3.1.2.3 Implementing mouse tracking
Each of the 132 cells of the webpage had JavaScript 'onmouseover' and 'onmouseout' event attributes. The onmouseover attribute specifies that the 'movement_in()' JavaScript function be called every time the mouse comes over that cell. Similarly, onmouseout specifies that the 'movement_out('cellID')' JavaScript function be called when the mouse pointer leaves the cell. The code snippet demonstrating these function calls is:
    <td onmouseout="movement_out('c1');" onmouseover="movement_in();"></td>
As soon as the mouse pointer enters a cell, the current date and time is recorded in a temporary variable named 'cellEntryDate' in the function 'movement_in()'. This function takes no arguments. As soon as the mouse pointer exits a cell, the time spent in that cell in milliseconds is calculated in the function 'movement_out('cellID')' by subtracting 'cellEntryDate' from the current date and time. The movement_out() function is also passed the unique two-character cell code to record the cell ID. The time spent in the cell, along with the cell ID, is appended to the data queue variable named 'queue1' or 'queue2'. The JavaScript function definitions are as follows:
    function movement_in() {
        // Remember when the pointer entered the cell.
        cellEntryDate = new Date();
    }

    function movement_out(cell) {
        cellExitDate = new Date();
        // Time spent in the cell, in milliseconds.
        time = cellExitDate.getTime() - cellEntryDate.getTime();
        if (done == 0) {
            // Append "cellID:time_" to whichever queue is currently recording.
            if (flag == 0) {
                queue1 = queue1 + cell + ":" + time + "_";
            } else {
                queue2 = queue2 + cell + ":" + time + "_";
            }
        }
    }
The 'done' variable in the above code checks whether the current user is still active and has not already bought a product. 'flag' is a variable that indicates which queue variable is currently available.
Two queue variables were used so that while one queue's data is being transmitted to the server via AJAX, the other queue variable can keep recording the cell movements. This is of great importance especially when the Internet bandwidth is low and data transfer can, in the worst case, take a long time. This step also ensures that the user's interaction experience is not affected while mouse tracking is going on in the background.
As stated above, the built website had an 'onload' JavaScript event calling a function named 'start_It()'. The start_It() function is a recursive function which calls the 'sendData()' function every 2 seconds. The sendData() function contains the AJAX statement to transfer the generated user id (variable 'userId') and one of the queue variables, 'queue1' or 'queue2', to the 'data.php' file at the backend server. The self-explanatory JavaScript function definitions are as follows:
    function sendData() {
        var query_string;
        if (flag == 0) {
            // Switch recording to queue2, then send and clear queue1.
            queue2 = "";
            flag = 1;
            query_string = "data.php?userId=" + userId + "&queue=" + queue1;
            queue1 = "";
        } else {
            // Switch recording to queue1, then send and clear queue2.
            queue1 = "";
            flag = 0;
            query_string = "data.php?userId=" + userId + "&queue=" + queue2;
            queue2 = "";
        }
        http.open("GET", query_string, true);
        http.onreadystatechange = handleHttpResponse;
        http.send(null);
    }

    function start_It() {
        if (done == 0) {
            setTimeout(sendData, 2000);
        }
    }
The 'sendData()' JavaScript function uses standard AJAX calls and the standard 'http' open, onreadystatechange and send functions. The 'query_string' variable holds the name of the PHP file together with the arguments passed to it via the GET method.
The 'data.php' file was coded such that it takes the queue variable sent by the JavaScript 'sendData()' function and explodes the string to extract the various cell IDs and the time values associated with them. It then opens a connection to the MySQL database and inserts records with the cell information into the 'data' table using the received user id. The complete code of the 'data.php' file is available in the appendix of the thesis.
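The parsing step that 'data.php' performs on the queue string can be illustrated as follows. This is a sketch in JavaScript rather than the PHP actually used, and the helper name parseQueue is hypothetical; it only assumes the "cellID:time_" format built by movement_out().

```javascript
// Sketch of the queue-parsing step, written in JavaScript for illustration.
// parseQueue is a hypothetical helper; the thesis implements this in PHP.
function parseQueue(queue) {
  return queue
    .split("_")
    .filter((entry) => entry.length > 0) // drop the empty piece after the last "_"
    .map((entry) => {
      const [cellId, time] = entry.split(":");
      return { cellId: cellId, time: Number(time) };
    });
}

const records = parseQueue("c1:350_d2:1200_c1:80_");
console.log(records);
```

Each resulting record corresponds to one row that would be inserted into the 'data' table together with the user id.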
3.1.2.4 Final product bought by the user
Once the user had browsed through the webpage and scrolled over the table reading about the various configurations of the five laptops, giving one case of the training data, he was required to select one of the products. This simulates the actual shopping portal scenario in which a person reads about various products and finally buys one of them. To select a product, he clicks the 'Buy Now' button associated with the product, as shown in Figure 3.
As soon as any 'Buy Now' button on the webpage is triggered by the user, a JavaScript function named 'bought('ProductID')' is invoked. This function uses AJAX to send the user id and the ID of the product clicked to the 'bought.php' file on the server. The 'bought.php' file on the web server connects to the MySQL database and inserts this information as a row in the 'bought' table. The complete code of the PHP script 'bought.php' is available in the appendix and the JavaScript function is as follows:
    function bought(product) {
        // Stop further mouse tracking for this user.
        done = 1;
        var query_bought;
        query_bought = "bought.php?userId=" + userId + "&product=" + product;
        http.open("GET", query_bought, true);
        http.onreadystatechange = handleHttpResponseBought;
        http.send(null);
    }
Once the user selects the product, further mouse tracking is disabled. This is done by changing the value of the JavaScript 'done' variable.
3.1.3 Testing the initial website
The website, once completed, was hosted on a public web server and was tested thoroughly for bugs and errors. The main points in the checklist were:
• The queue variables ('queue1' and 'queue2') in the JavaScript file are recording the cell ID and time appropriately and the data is being extracted accurately at the server.
• Data is being sent properly from the front-end JavaScript functions to the back-end PHP files via AJAX.
• The link between the database and the PHP files is working correctly.
• Both tables in the database are receiving data and inserting it properly without any error.
3.2 Data collection
When the website explained in the previous section had been developed and tested completely, it was opened to the general public. Volunteers were invited via email and social media to visit the webpage. No selection criteria were applied to the volunteers, who came primarily from the contact group of the author. All the volunteers/visitors were asked to browse the webpage and buy a product on it (virtually, at zero cost) in the same way they would on a real shopping site. From this sample, the initial training data for the model was collected and saved into the databases as explained in the previous section. No personal information or any other data was asked of any visitor.
The duration of this step depends on the requirements of the initial training data for building the model. The more sections there are in the website, i.e. the more independent variables the model has, the more cases are required in the initial training data to build a relevant model.
In a short span of 14 days, 292 unique visitors accessed the webpage. 244 rows were collected in the 'bought' table and 16401 tuples were saved in the 'data' table. Around 350-400 users were expected, but due to the lack of visibility of the project and no compensation being available to the volunteers, that number could not be reached. In view of the time constraints, the website was taken offline and the data was exported for further analysis and cleaning.
3.3 Data compilation and cleaning
3.3.1 Need and Specifications
The collected data in the two tables needs to be merged in such a way that each row of the new table corresponds to a single user and contains all information about him, i.e. each row is one case of the training data. Each case includes the times spent in all 132 sections of the webpage along with the user id and the product finally bought by the user. This is also the format required to train a machine-learning model in WEKA.
Moreover, the collected data needs to be analyzed properly and checked for errors. There might be some users who did not provide information on the product actually bought, and the data related to them needs to be scrapped. Some users are likely to have spent almost no time on the webpage, as they might have visited it accidentally, and hence all users spending less than some calculated threshold time need to be scrapped. Similarly, users waiting on a section for more than a certain fixed time should be removed. These steps are important to ensure that there are no outliers in the collected data and that the model built and trained on this data is best suited for general usage on the website.
Since the absolute time spent on the different elements of the webpage depends on a number of other factors, primarily the speed of the individual user, the data needs to be normalized. Dividing the time spent by a user on an individual section by the total time spent by that user on the website gives the proportion of time he spent reading that section of the webpage.
Hence the final training data should contain only valid user responses of the product bought, along with the normalized breakdown of the time spent by them on the various sections of the webpage.
3.3.2 Implementation
First, all the data needs to be compiled into a single table as stated above, and then it needs to be cleaned.
3.3.2.1 Data compilation
A new PHP script named 'alignData.php' was written to compile the data into a more usable format. This file writes all the data to a new table named 'finalData' with the following 134 attributes: 132 corresponding to the time spent in the 132 sections of the webpage (independent variables), 1 to record the user id of the user and 1 to save the code of the final product bought (target/dependent variable). The final product bought is the predicted variable in our model, which is discussed in the next chapter. The attributes of the 'finalData' table are:
• userID: To record the user id of the user
• a0: Time in milliseconds spent in cell 'a0' of the webpage
• a1: Time in milliseconds spent in cell 'a1' of the webpage
• a2: Time in milliseconds spent in cell 'a2' of the webpage
• a3: Time in milliseconds spent in cell 'a3' of the webpage
• a4: Time in milliseconds spent in cell 'a4' of the webpage
• a5: Time in milliseconds spent in cell 'a5' of the webpage
• b0: Time in milliseconds spent in cell 'b0' of the webpage
• b1: Time in milliseconds spent in cell 'b1' of the webpage
• b2: Time in milliseconds spent in cell 'b2' of the webpage
• … (similarly from 'b3' to 'u3')
• u4: Time in milliseconds spent in cell 'u4' of the webpage
• u5: Time in milliseconds spent in cell 'u5' of the webpage
• v0: Time in milliseconds spent in cell 'v0' of the webpage
• v1: Time in milliseconds spent in cell 'v1' of the webpage
• v2: Time in milliseconds spent in cell 'v2' of the webpage
• v3: Time in milliseconds spent in cell 'v3' of the webpage
• v4: Time in milliseconds spent in cell 'v4' of the webpage
• v5: Time in milliseconds spent in cell 'v5' of the webpage
• bought: To save the code of the final product bought by the user
The 'alignData.php' file selects all the responses stored in the tables 'data' and 'bought' and saves them in the table 'finalData'. The attribute 'userID' is the primary key of the table. The algorithm implemented in the 'alignData.php' file was:
1. Select a list of unique users from the table 'data'.
2. For each user with id 'userID', do:
   a. Select all the data (cell IDs and associated times) corresponding to that user from the table 'data'. Use the SQL sum aggregate function on time and group the rows by cell ID.
   b. This gives the total time spent on each visited cell, i.e. each section of the webpage visited by that user.
   c. The time spent on all other cells, i.e. sections not visited by that user, is set to zero.
   d. Insert all the time values for each cell into the 'finalData' table along with the user's userID.
   e. Select the final product bought by the user using a select statement on the table 'bought'. In case the user has not bought any product, i.e. the output from the 'bought' table for that user is empty, assign him product number 0.
   f. Update the 'finalData' table by inserting the value for the 'bought' field corresponding to that user.
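The compilation algorithm above can be sketched in a few lines. The following is an in-memory JavaScript illustration only, whereas the thesis used PHP and SQL against MySQL; the names alignData, dataRows and boughtRows are hypothetical.

```javascript
// Sketch of the compilation algorithm in JavaScript (the thesis used PHP and SQL).
// alignData, dataRows and boughtRows are hypothetical names for illustration.
function alignData(dataRows, boughtRows, cellIds) {
  const boughtByUser = new Map(boughtRows.map((r) => [r.userID, r.bought]));
  const finalData = new Map();
  for (const { userID, cellID, time } of dataRows) {
    if (!finalData.has(userID)) {
      // Start every cell at zero so that unvisited sections stay at 0.
      const row = { userID: userID };
      for (const id of cellIds) row[id] = 0;
      // Users without a purchase are assigned product number 0.
      row.bought = boughtByUser.has(userID) ? boughtByUser.get(userID) : 0;
      finalData.set(userID, row);
    }
    // Equivalent of SUM(time) grouped by cellID for this user.
    const row = finalData.get(userID);
    row[cellID] = (row[cellID] || 0) + time;
  }
  return Array.from(finalData.values());
}

const sampleData = [
  { userID: 1, cellID: "a1", time: 300 },
  { userID: 1, cellID: "a1", time: 200 },
  { userID: 1, cellID: "b2", time: 100 },
  { userID: 2, cellID: "a0", time: 50 },
];
const sampleBought = [{ userID: 1, bought: 3 }];
const table = alignData(sampleData, sampleBought, ["a0", "a1", "b2"]);
console.log(table);
```

In the sample, user 1 visits cell 'a1' twice and the two times are summed, while user 2 never buys and therefore receives product number 0.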
After successful execution of this algorithm in the 'alignData.php' script, the 'finalData' table contained all the data collected from the initial website in a tabular manner, with each row corresponding to a unique user. This data can now be used directly for model building in WEKA, but it needs some cleaning.
The 'data' table had a total of 16401 tuples with 292 unique users, whereas the 'bought' table had 244 tuples. After executing the above script, the total number of tuples in the 'finalData' table was 292. Of these 292 tuples, 48 (292 minus 244) belonged to users who left the site without selecting any product. The table 'finalData' was then exported in a spreadsheet format (Microsoft Excel) for analysis, visualization and cleaning.
3.3.2.2 Data cleaning
Once the 292 rows of data had been obtained in Excel, the next task was data cleaning. This step removes all the outliers and other cases that could harm the training of the model and eventually the model itself. There can be multiple reasons behind the occurrence of such unwanted cases in the initial dataset, such as non-serious respondents, accidentally entering the webpage and closing it immediately, accidentally pressing the enter key, leaving the computer with the website open while working on something else, etc.
The following steps were followed to clean the collected data:
• All the tuples where the value of the attribute 'bought' is 0, i.e. the user has not bought any product, were deleted. This was because the objective of the project is to select the best product for a user, and hence the training set should only contain users who have bought a product. Training the model on data predicting that the user would not buy would make the model inappropriate for use in the current project.
  There were a total of 48 such tuples where the bought product value was 0. The remaining data contained 244 tuples, each corresponding to a unique visitor. All 244 users had bought a product (the dependent variable is not 0).
• The total time spent was calculated for every user using Excel's built-in sum function. The distribution of the total time spent by different users on the built webpage was studied.
  It was found that the average time spent by a user on the webpage was 33.08 seconds. The maximum time spent by a user was 225.8 seconds whereas the minimum was 1.2 seconds.
• The minimum and the maximum time spent by any user were analyzed to find the outliers. Since the minimum time in the current data is much lower than the minimum time any serious volunteer would be expected to spend, a threshold value of 8 seconds was selected. The maximum time of 225.8 seconds was found to be feasible and hence no upper limit was applied.
  The value of 8 seconds was judged feasible keeping in mind the webpage design. It was assumed that any user taking less than 8 seconds on that webpage had given incorrect data and would be considered an outlier. There were 44 users who spent less than 8 seconds on the initial website while giving training data for model building. The rows associated with all 44 users were deleted from the collected sample, reducing the sample size to 200 tuples.
  The average time spent by a user became 40.26 seconds and the minimum time spent by a user in the new dataset became 8.3 seconds.
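The two cleaning rules applied above (drop users with bought = 0, drop users whose total time is below 8 seconds) can be expressed as a simple filter. This is a sketch under the assumption that each row has the 'finalData' layout with times in milliseconds; the thesis performed these steps manually in Excel, and the names cleanRows, totalTime and MIN_TOTAL_MS are hypothetical.

```javascript
// Sketch of the cleaning rules; the thesis applied them by hand in Excel.
// cleanRows, totalTime and MIN_TOTAL_MS are hypothetical names.
const MIN_TOTAL_MS = 8000; // the 8-second threshold chosen in the text

function totalTime(row, cellIds) {
  return cellIds.reduce((sum, id) => sum + row[id], 0);
}

function cleanRows(rows, cellIds) {
  return rows.filter(
    (row) => row.bought !== 0 && totalTime(row, cellIds) >= MIN_TOTAL_MS
  );
}

const ids = ["a0", "a1"];
const rawRows = [
  { userID: 1, a0: 5000, a1: 4000, bought: 2 }, // kept: 9 s and a purchase
  { userID: 2, a0: 700, a1: 500, bought: 4 },   // dropped: only 1.2 s in total
  { userID: 3, a0: 9000, a1: 9000, bought: 0 }, // dropped: no purchase
];
const cleaned = cleanRows(rawRows, ids);
console.log(cleaned.map((row) => row.userID));
```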
3.3.2.3 Data normalization
The data collected from the volunteers has 132 time fields corresponding to the time spent in the 132 sections of the website, in absolute value. It was realized that data normalization would be required, because different people spent different amounts of time on the webpage. The time spent depends upon their individual browsing speed, reading speed and several other personal attributes. Since the desired model has to cater to a general audience, the time spent in one section relative to the time spent in the other sections was thought to be more appropriate.
There are several advantages to this step. The primary one is that the model is now capable of predicting in real time for a user who is still in the process of browsing the webpage. Whenever a prediction is needed, the current times spent in the various sections can be normalized and fed into the model. Since the model is now immune to the absolute time value, every prediction for the same user depends only on the relative time spent on different sections of the webpage and is not biased by the total time he has spent. Another advantage is that all the data used for training the model is now on an equivalent footing. The 200 cases in the training set are more comparable and do not vary on an absolute scale. This step is expected to train the model better.
Implementation
To carry out data normalization, the total time spent by a user was calculated in Excel (as was also done in the data cleaning step). The time spent by that user in each individual section of the webpage was then divided by the total time he spent on the webpage. This step gave the proportion of time spent by the user in each section of the webpage.
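The normalization step amounts to dividing each cell time by the user's total. A minimal sketch follows, assuming rows in the 'finalData' layout; the thesis did this in Excel, and the function name normalizeRow is hypothetical.

```javascript
// Sketch of the normalization step; normalizeRow is a hypothetical helper.
function normalizeRow(row, cellIds) {
  const total = cellIds.reduce((sum, id) => sum + row[id], 0);
  const normalized = { bought: row.bought };
  for (const id of cellIds) {
    // Guard against division by zero for a row with no recorded time.
    normalized[id] = total > 0 ? row[id] / total : 0;
  }
  return normalized;
}

const example = normalizeRow(
  { a0: 2000, a1: 6000, b2: 2000, bought: 3 },
  ["a0", "a1", "b2"]
);
console.log(example);
```

The same function could be applied to a partially completed visit at prediction time, since the result depends only on the relative times and always sums to 1 when any time has been recorded.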
The new dataset, with 200 tuples and 134 attributes (the userID, 132 independent variables and 1 dependent variable) and with normalized time data, was saved in CSV format, which can be imported directly into WEKA for the model-building task. The next chapter explains the procedure of building machine learning models in WEKA using the data collected in this chapter.
4 BUILDING MACHINE LEARNING MODELS
Using the collected data, various machine learning models were built and tested. This chapter explains the complete methodology followed along with the details of the models obtained. It later explains the best models that were selected and the rationale behind them.
4.1 Machine Learning
According to Wikipedia¹, "Machine Learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to learn based on data, such as from sensor data or databases." It can be defined as a set of algorithms that automatically learn and recognize complex patterns and are capable of making intelligent decisions based on data.
There are several software packages available that can be used to build and implement machine-learning models. MATLAB and WEKA are two commonly used ones. The models used in the project were built using WEKA.
¹ Wikipedia, Machine Learning - Wikipedia, http://en.wikipedia.org/wiki/Machine_learning.
4.1.1 WEKA
Weka¹ is open source data mining software written in Java. It is primarily a collection of various machine-learning algorithms that can be applied directly and easily to different types of data. It has a built-in interface to visualize the data and can perform tasks like attribute selection, clustering etc. It is available under the GNU General Public License and can be downloaded from its website.
¹ The University of Waikato, Weka 3: Data Mining Software in Java, http://www.cs.waikato.ac.nz/ml/weka/.
4.1.2 Why Machine Learning?
The primary objective of the project is to automatically learn the user's mouse movement behavior from the collected training data. Machine learning, as stated above, is a branch of science that deals with algorithms capable of learning patterns. This exactly fits the primary requirement.
The project further demands the capability to predict further content for a new user based on his mouse movements. Machine learning algorithms, once trained on a large set of data, are capable of predicting the value of the dependent variable for any new case. Moreover, machine-learning algorithms can be trained again and again with new data. The complete objective of the project can therefore be met using machine-learning algorithms.
4.2 Methods evaluated
In machine learning, in order to classify or predict for any new case, a model is first built and trained on training data. There are a number of different types of models that can be built, and many different algorithms to build each of them. The types of machine learning models generally used include Decision Trees, Neural Networks, Genetic Algorithms, Fuzzy Networks etc. To keep the project within scope, only Decision Tree and Neural Network based models were evaluated. The data was modeled using both methods, with the J48 classification algorithm for decision trees and multilayer perceptrons for the neural network. The two models were later evaluated on the training data.
4.2.1 Decision Tree
A decision tree can be defined as a decision support classifier that uses a tree-like structure of conditions and their possible consequences. Each node of a decision tree is either a leaf node or a decision node, where:
• Leaf node – These nodes give the value of the dependent (target) variable.
• Decision node – These nodes contain one condition each, specifying some test on a single attribute value. Each outcome of the condition leads to a branch that is either a sub-tree or a leaf node.
The attribute that is to be predicted is known as the dependent variable, since its value depends upon, or is decided by, the values of all the other attributes. The other attributes, which help in predicting the value of the dependent variable, are known as the independent variables in the dataset.
4.2.2 Neural Network
"An Artificial Neural Network is an interconnected assembly of simple processing elements, units or nodes (neurons), whose functionality is inspired by the functioning of the natural neuron from brain. The processing ability of the neural network is stored in the inter-unit connection strengths, or weights, obtained by a process of learning from a set of training patterns."¹
¹ Kevin N Gurney, An introduction to neural networks, illustrated (CRC Press, 1997).
4.3 Implemented algorithms
There are several algorithms for decision trees in common use nowadays, namely ID3, C4.5, C5.0 etc. After careful evaluation of these three algorithms, C4.5 was chosen for the project. The reasons for choosing C4.5 over ID3 and C5.0 were:
• C4.5 handles continuous variables in a better way, by creating a threshold and then splitting the list on that value. Since all the attributes in the required decision tree are continuous whereas the target variable has five discrete values, C4.5 was used.
• C4.5 is capable of pruning trees. Pruning is a method of going backwards through a tree to remove any branches that do not help in further classification and replace them with leaf nodes.
• C5.0 is generally ranked above C4.5 because of its higher speed in building a tree and its low memory requirements. Since the scope of the project demanded neither of these features, there was no significant advantage to C5.0. C5.0 can also weight attributes, which was not required in the problem under consideration.
Similarly, neural networks can be implemented in one of various available ways, namely feedforward neural networks, radial basis function networks, Kohonen self-organizing networks, recurrent networks, stochastic neural networks, modular neural networks, holographic associative memory etc. The neural network implemented in the project was a feedforward neural network with a non-linear activation function.
4.3.1 Decision Tree (C4.5)
WEKA implements the C4.5 decision tree algorithm as the 'J48 decision tree classifier'. The C4.5 algorithm, as well as its J48 implementation, works as follows:
• Whenever a set of items (training set) is encountered, the algorithm identifies the attribute that discriminates between the various instances most clearly. This is done using the standard equation of information gain.
• Among the possible values of this feature, if there is any value for which there is no ambiguity, that is, for which the data instances falling within its category all have the same value for the target variable, then that branch is terminated and the obtained target value is assigned to it.
• For all other cases, another attribute that gives the highest information gain is sought.
• This is continued in the same manner until either a clear decision on the value of the target variable is reached with a combination of conditions on various independent variables/attributes, or we run out of attributes.
• In the event of running out of attributes, or getting an ambiguous result from the available information, the branch is assigned the target value that the majority of the items under this branch possess.
The name of the classifier in WEKA that follows the above-mentioned C4.5 algorithm is 'weka.classifiers.trees.J48'.
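The information gain criterion referred to above can be made concrete for a continuous attribute split at a threshold. The following sketch is an illustration and not WEKA's code; the function names entropy and informationGain are hypothetical, and note that C4.5 itself normally ranks candidate splits by gain ratio, a normalized variant of the plain information gain computed here.

```javascript
// Shannon entropy (in bits) of a list of class labels.
function entropy(labels) {
  const counts = new Map();
  for (const label of labels) counts.set(label, (counts.get(label) || 0) + 1);
  let h = 0;
  for (const count of counts.values()) {
    const p = count / labels.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Information gain of the binary split "value <= threshold" on one
// continuous attribute. Hypothetical helper for illustration only.
function informationGain(values, labels, threshold) {
  const left = [];
  const right = [];
  values.forEach((v, i) => (v <= threshold ? left : right).push(labels[i]));
  const n = labels.length;
  const remainder =
    (left.length / n) * entropy(left) + (right.length / n) * entropy(right);
  return entropy(labels) - remainder;
}

// Toy data: proportion of time spent in one cell, and the product bought.
const times = [0.1, 0.2, 0.6, 0.7];
const products = [1, 1, 2, 2];
console.log(informationGain(times, products, 0.4));  // clean split
console.log(informationGain(times, products, 0.15)); // weaker split
```

In the toy data, the threshold 0.4 separates the two products perfectly and therefore yields a higher gain than the threshold 0.15.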
4.3.2 Neural Network (Multilayer Perceptron)
The multilayer perceptron is a feedforward neural network based classifier that is trained using backpropagation to classify instances. All the nodes in this network are sigmoids, which means that the activation function is a sigmoid.
In a multilayer perceptron, there is an input layer with one node for each independent variable, at least one hidden layer, and an output layer with one node for each class of the target variable. The network is trained on initial data, which determines the appropriate weights for the connections between all the nodes of adjacent layers and also determines the bias/threshold value of each node.
The name of the classifier in WEKA is 'weka.classifiers.functions.MultilayerPerceptron'.
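The forward pass of such a network, i.e. how a trained multilayer perceptron turns inputs into a predicted class, can be sketched as follows. The weights below are made up for illustration and are not the trained values from the thesis; the names layerForward, predict and toyNetwork are hypothetical.

```javascript
// Sigmoid activation used by every node.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// One fully connected layer: weights[j][i] connects input i to node j,
// biases[j] is the bias of node j.
function layerForward(inputs, weights, biases) {
  return weights.map((nodeWeights, j) => {
    const sum = nodeWeights.reduce((acc, w, i) => acc + w * inputs[i], biases[j]);
    return sigmoid(sum);
  });
}

// Run the inputs through every layer and return the index of the output
// node with the largest activation (the predicted class).
function predict(inputs, network) {
  let activations = inputs;
  for (const layer of network) {
    activations = layerForward(activations, layer.weights, layer.biases);
  }
  return activations.indexOf(Math.max(...activations));
}

// Toy network with 2 inputs, 2 hidden nodes and 2 output nodes.
// These weights are invented for the example, not learned.
const toyNetwork = [
  { weights: [[4, -4], [-4, 4]], biases: [0, 0] },
  { weights: [[4, -4], [-4, 4]], biases: [0, 0] },
];
console.log(predict([1, 0], toyNetwork), predict([0, 1], toyNetwork));
```

For the website described here, the input layer would have 132 nodes (one per cell) and the output layer 5 nodes (one per laptop); training with backpropagation, which determines the weights and biases, is not shown.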
4.4 Model building
WEKA was opened in Explorer mode and the saved CSV file was opened using the open
file button in the Preprocess tab of WEKA. From the attributes pane, the attribute
userID was deleted, because this field is irrelevant to the process of model
building. The file was then saved in Attribute-Relation File Format (ARFF) simply by
clicking the save button. The saved ARFF file was opened in a text editor to change the
type of the predicted variable, i.e. the attribute ‘bought’, from numeric to nominal.
This is an essential step because the ‘bought’ variable has only five discrete values, one
for each product. It also enables the use of the J48 tree classifier, which requires the
predicted variable to be nominal. To convert ‘bought’ from numeric to nominal,
the type ‘numeric’ was changed to ‘{1,2,3,4,5}’, where 1, 2, 3, 4, 5 were the
codes for the five laptop products. The output expected from the models is one of the
five laptop codes. The file was then saved and closed.
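The manual edit described above amounts to changing one declaration in the ARFF header. The sketch below shows the same change done programmatically; it is an illustration only, since the thesis made the change by hand in a text editor. The function name `makeNominal` and the sample header (including the attribute name `a1`) are assumptions.

```javascript
// Rewrites the declaration of one attribute in ARFF text from numeric to nominal.
// Assumes `attrName` contains no regular-expression metacharacters.
function makeNominal(arffText, attrName, values) {
  const pattern = new RegExp(
    "^(@attribute\\s+" + attrName + "\\s+)(numeric|real|integer)[ \\t]*$",
    "im"
  );
  return arffText.replace(pattern, "$1{" + values.join(",") + "}");
}

// Hypothetical, heavily shortened ARFF header.
const header = [
  "@relation MLData_Normalized",
  "@attribute a1 numeric",
  "@attribute bought numeric",
  "@data",
].join("\n");

console.log(makeNominal(header, "bought", [1, 2, 3, 4, 5]));
```

Only the ‘bought’ declaration is touched; the numeric input attributes keep their type.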
4.4.1 Decision Tree
The saved ARFF file was then re-opened in WEKA and, under the Classify tab, the J48 tree
classifier was chosen. The J48 tree classifier has several parameters, such as binary
splits, number of folds, pruning, etc. Using a trial-and-error method, various parameters
were changed and each model was tested for accuracy on the training data. Models
were tested using two methodologies, namely testing directly on the training data and
testing using cross-validation. The set of parameters giving the maximum percentage of
correctly classified instances was chosen. The final model giving maximum accuracy on
the training dataset was also saved for later use.
4.4.1.1 Details of the chosen decision tree
The final parameters selected, which gave the best output on the training data, are:
• binarySplits: By the WEKA definition of this parameter, it is considered for nominal
variables only. Since the dataset under consideration had no nominal
independent variable, the value of this attribute had no impact on the built tree.
• confidenceFactor: This attribute defines the confidence factor used for pruning.
It was found that with a confidence factor of 0.75, a decision tree with good accuracy
was obtained when C4.5 pruning was used.
• debug: This parameter is only used to output some additional information at the
console. Its value of either true or false didn’t impact the final model.
• minNumObj: This determines the minimum number of instances at every leaf
node. This attribute was set to a value of ‘2’.
• numFolds: This parameter determines the amount of data used for reduced-
error pruning. In the decision tree built, numFolds was kept at ‘11’. This would
mean that one fold is used for pruning and the rest for growing the tree. Since
reduced-error pruning was not used (see reducedErrorPruning), this value had no
effect on the final tree.
• reducedErrorPruning: This was set to ‘False’, as it signifies whether reduced-error
pruning should be used instead of C4.5 pruning.
• saveInstanceData: This attribute only controls whether the instance data is saved
for future visualization.
• seed: The seed used for randomizing the data when reduced-error pruning is
used. Since reduced-error pruning was not used, the seed parameter had no
relevance.
• subtreeRaising: Subtree raising while pruning is always advisable when used
with a high confidence factor. Since a confidence factor of 0.75 was used, this
parameter was set to ‘true’.
• unpruned: Since we wanted pruning to happen, the ‘unpruned’ parameter was
set to ‘false’.
• useLaplace: This parameter determines whether counts at leaves are smoothed based
on Laplace. The parameter had no influence on the model output.
All the parameters used in the final decision tree are summarized in Figure 8.
Figure 8: Parameters used for building the Decision Tree model
The output from WEKA is as follows:
=== Run information ===

Scheme:       weka.classifiers.trees.J48 -L -C 0.75 -M 2 -A
Relation:     MLData_Normalized-weka.filters.unsupervised.attribute.Remove-R1
Instances:    200
Attributes:   133
              [list of attributes omitted]
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

J48 pruned tree
------------------

b5 <= 0.04509
|   k4 <= 0.013828
|   |   v1 <= 0.000362
|   |   |   r0 <= 0.000626
|   |   |   |   d5 <= 0.003481
|   |   |   |   |   d5 <= 0.001586
|   |   |   |   |   |   g4 <= 0.033267
|   |   |   |   |   |   |   s3 <= 0.004874
|   |   |   |   |   |   |   |   u1 <= 0.002108
|   |   |   |   |   |   |   |   |   f1 <= 0.039667
|   |   |   |   |   |   |   |   |   |   f4 <= 0.028894
|   |   |   |   |   |   |   |   |   |   |   i4 <= 0.004699
|   |   |   |   |   |   |   |   |   |   |   |   d2 <= 0.001173
|   |   |   |   |   |   |   |   |   |   |   |   |   e5 <= 0.001377
|   |   |   |   |   |   |   |   |   |   |   |   |   |   e1 <= 0.029566
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   r3 <= 0.000861
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   c1 <= 0.043665
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   a3 <= 0.206815
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b1 <= 0.007319
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   f3 <= 0.001471
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b4 <= 0.00214: 2 (11.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b4 > 0.00214
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   a4 <= 0.004126: 3 (3.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   a4 > 0.004126: 2 (2.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   f3 > 0.001471: 3 (3.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b1 > 0.007319
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b3 <= 0.123969: 2 (12.0/2.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   b3 > 0.123969: 1 (2.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   a3 > 0.206815: 1 (2.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   c1 > 0.043665: 1 (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   r3 > 0.000861: 3 (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   |   e1 > 0.029566: 1 (3.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   e5 > 0.001377: 3 (2.0)
|   |   |   |   |   |   |   |   |   |   |   |   d2 > 0.001173
|   |   |   |   |   |   |   |   |   |   |   |   |   s4 <= 0.002873: 2 (32.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   s4 > 0.002873: 4 (2.0)
|   |   |   |   |   |   |   |   |   |   |   i4 > 0.004699: 1 (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   f4 > 0.028894: 4 (2.0)
|   |   |   |   |   |   |   |   |   f1 > 0.039667: 3 (3.0)
|   |   |   |   |   |   |   |   u1 > 0.002108: 3 (6.0/1.0)
|   |   |   |   |   |   |   s3 > 0.004874
|   |   |   |   |   |   |   |   q1 <= 0.004708
|   |   |   |   |   |   |   |   |   r4 <= 0.007391: 3 (16.0)
|   |   |   |   |   |   |   |   |   r4 > 0.007391: 2 (2.0)
|   |   |   |   |   |   |   |   q1 > 0.004708: 2 (2.0/1.0)
|   |   |   |   |   |   g4 > 0.033267
|   |   |   |   |   |   |   g5 <= 0.004141
|   |   |   |   |   |   |   |   k4 <= 0.001354: 4 (8.0)
|   |   |   |   |   |   |   |   k4 > 0.001354: 3 (3.0/1.0)
|   |   |   |   |   |   |   g5 > 0.004141: 2 (3.0/1.0)
|   |   |   |   |   d5 > 0.001586: 4 (4.0)
|   |   |   |   d5 > 0.003481
|   |   |   |   |   g5 <= 0.004141
|   |   |   |   |   |   b5 <= 0.002996
|   |   |   |   |   |   |   g4 <= 0.003922: 2 (4.0)
|   |   |   |   |   |   |   g4 > 0.003922: 1 (2.0)
|   |   |   |   |   |   b5 > 0.002996: 3 (2.0)
|   |   |   |   |   g5 > 0.004141: 5 (3.0)
|   |   |   r0 > 0.000626: 4 (3.0/1.0)
|   |   v1 > 0.000362
|   |   |   s4 <= 0.005561
|   |   |   |   t4 <= 0.002371
|   |   |   |   |   e0 <= 0.001979
|   |   |   |   |   |   h2 <= 0.005305: 1 (18.0/1.0)
|   |   |   |   |   |   h2 > 0.005305: 2 (2.0)
|   |   |   |   |   e0 > 0.001979: 2 (2.0)
|   |   |   |   t4 > 0.002371: 2 (2.0/1.0)
|   |   |   s4 > 0.005561: 2 (2.0/1.0)
|   k4 > 0.013828
|   |   f5 <= 0.001805: 4 (9.0/1.0)
|   |   f5 > 0.001805: 2 (2.0/1.0)
b5 > 0.04509
|   t3 <= 0.000515
|   |   d4 <= 0.008991
|   |   |   e2 <= 0.011901
|   |   |   |   a1 <= 0.001341
|   |   |   |   |   g2 <= 0.001762: 4 (3.0/1.0)
|   |   |   |   |   g2 > 0.001762: 5 (2.0)
|   |   |   |   a1 > 0.001341: 5 (4.0)
|   |   |   e2 > 0.011901: 4 (3.0)
|   |   d4 > 0.008991: 2 (2.0/1.0)
|   t3 > 0.000515: 3 (3.0)

Number of Leaves  :     42
Size of the tree  :     83

Time taken to build model: 0.75 seconds
4.4.1.2 Testing the decision tree
The model was tested using two different methodologies, namely testing directly on the
training dataset and testing using cross-validation with 10 folds.
Testing on the training data gave an accuracy of 89.5%, whereas testing using cross-
validation gave an accuracy of 66%. The complete results along with the discussion are as
follows:
4.4.1.2.1 Testing on Training Data
=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        179      89.5    %
Incorrectly Classified Instances       21      10.5    %
Kappa statistic                         0.8586
Mean absolute error                     0.1650
Root mean squared error                 0.2382
Relative absolute error                54.9013 %
Root relative squared error            61.5103 %
Total Number of Instances             200

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 0.848    0.030      0.848   0.848      0.848     0.953  1
                 0.959    0.079      0.875   0.959      0.915     0.969  2
                 0.932    0.019      0.932   0.932      0.932     0.994  3
                 0.795    0.019      0.912   0.795      0.849     0.987  4
                 0.818    0.000      1.000   0.818      0.900     0.999  5
Weighted Avg.    0.895    0.042      0.897   0.895      0.894     0.977

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 28  4  0  1  0 |  a = 1
  1 70  1  1  0 |  b = 2
  1  2 41  0  0 |  c = 3
  3  3  2 31  0 |  d = 4
  0  1  0  1  9 |  e = 5
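The accuracy and Kappa statistic in the summary above can be re-derived from the confusion matrix. The following JavaScript sketch does this for the training-set matrix of the decision tree; the function name `accuracyAndKappa` is an assumption, and the formula used is the standard Cohen's kappa (observed agreement corrected for the agreement expected by chance), which is assumed here to be what WEKA reports.

```javascript
// Rows are actual classes, columns are predicted classes.
function accuracyAndKappa(matrix) {
  const n = matrix.length;
  let total = 0;
  let diagonal = 0;
  const rowSums = new Array(n).fill(0);
  const colSums = new Array(n).fill(0);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      total += matrix[i][j];
      rowSums[i] += matrix[i][j];
      colSums[j] += matrix[i][j];
      if (i === j) diagonal += matrix[i][j];
    }
  }
  const observed = diagonal / total; // fraction classified correctly
  let expected = 0; // agreement expected by chance
  for (let i = 0; i < n; i++) {
    expected += (rowSums[i] * colSums[i]) / (total * total);
  }
  return { accuracy: observed, kappa: (observed - expected) / (1 - expected) };
}

// Decision tree confusion matrix on the 200 training instances.
const trainingMatrix = [
  [28, 4, 0, 1, 0],
  [1, 70, 1, 1, 0],
  [1, 2, 41, 0, 0],
  [3, 3, 2, 31, 0],
  [0, 1, 0, 1, 9],
];
const result = accuracyAndKappa(trainingMatrix);
console.log(result.accuracy, result.kappa.toFixed(4)); // 0.895 0.8586
```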
4.4.1.2.2 Testing by Cross-Validation (folds 10)
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        132      66      %
Incorrectly Classified Instances       68      34      %
Kappa statistic                         0.5303
Mean absolute error                     0.2133
Root mean squared error                 0.308
Relative absolute error                70.9833 %
Root relative squared error            79.5133 %
Total Number of Instances             200

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 0.545    0.072      0.600   0.545      0.571     0.865  1
                 0.890    0.315      0.619   0.890      0.730     0.871  2
                 0.500    0.000      1.000   0.500      0.667     0.874  3
                 0.538    0.081      0.618   0.538      0.575     0.833  4
                 0.545    0.016      0.667   0.545      0.600     0.983  5
Weighted Avg.     0.66    0.143      0.702    0.66      0.653     0.869

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 18 14  0  0  1 |  a = 1
  7 65  0  1  0 |  b = 2
  1 13 22  7  1 |  c = 3
  4 13  0 21  1 |  d = 4
  0  0  0  5  6 |  e = 5
4.4.1.2.3 Discussion
Testing directly on the training data classified 179 cases correctly out of 200, which is
an accuracy of 89.5%. A very high accuracy when testing on the training data is always
desired, because it signifies the extent to which the model has learnt the training data.
Since there were 5 classes in the target variable (5 products), any model accuracy above
20% (the accuracy of guessing uniformly among the classes, 1/5 = 0.2 = 20%) is better
than chance. An accuracy of 89.5% is well above this baseline and signifies that the built
decision tree has learnt the training data quite accurately.
Testing using cross-validation is a process of dividing the data into different subsets,
building the model on some of the subsets and testing it on the others. With 10 folds,
the data is divided into ten subsets; each subset in turn is held out for testing while
the model is trained on the remaining nine, and the results of the ten runs are combined.
Again, as stated above, any accuracy of more than 20% is better than chance. The
achieved result of a total of 132 correct classifications out of 200, i.e. an accuracy
of 66%, is well above this baseline.
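The bookkeeping behind 10-fold cross-validation can be sketched as follows. This is a simplified JavaScript illustration, not WEKA's procedure: WEKA stratifies the folds by class and shuffles the data, which this sketch does not. The function names and the `trainAndTest` callback are hypothetical.

```javascript
// Splits the indices 0..n-1 into k folds; each fold is used once as the test set
// while the remaining indices form the training set.
function kFoldSplits(n, k) {
  const splits = [];
  for (let fold = 0; fold < k; fold++) {
    const train = [];
    const test = [];
    for (let i = 0; i < n; i++) {
      (i % k === fold ? test : train).push(i);
    }
    splits.push({ train, test });
  }
  return splits;
}

// Pooled accuracy: correct predictions summed over all folds, divided by n.
// `trainAndTest(train, test)` is a hypothetical callback that builds a model on
// `train` and returns the number of correctly classified instances in `test`.
function crossValidate(n, k, trainAndTest) {
  let correct = 0;
  for (const { train, test } of kFoldSplits(n, k)) {
    correct += trainAndTest(train, test);
  }
  return correct / n;
}

const splits = kFoldSplits(200, 10);
console.log(splits.length, splits[0].train.length, splits[0].test.length); // 10 180 20
```

With 200 instances and 10 folds, every model is trained on 180 instances and tested on the 20 it has not seen, so each instance is tested exactly once.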
Ideally, the model should have been trained on more data. Due to the limitation of time,
and with no compensation available to volunteers, only 200 tuples of useful data could be
collected. It is expected that with a bigger training dataset, the accuracy of the models
would increase.
4.4.2 Neural Network
The saved ARFF file was re-opened in WEKA and, under the Classify tab, the
MultilayerPerceptron function was chosen. There are different parameters associated
with this neural network function and, as done with decision trees, a trial-and-error
method was used to find the best set. The best set of parameters was the one that gave
maximum accuracy of classification on the training dataset. Each obtained model was
tested using two methodologies, namely testing directly on the training data and testing
using cross-validation. After multiple iterations of trial and error, a model
giving a good accuracy of classification was obtained. The model was also saved for later
use.
4.4.2.1 Details of the chosen neural network
The final parameters selected, which gave the best output on the training data, are:
• GUI: The GUI parameter brings up an interface. It doesn’t really impact the final
model, unless some changes in the learning rate and momentum are desired
while training. It was set to ‘False’ in the project.
• autoBuild: An ANN was built automatically and hence this parameter was set to
‘true’.
• debug: This is used to view additional information on the console.
• decay: It was observed that setting decay to ‘true’ gave slightly lower accuracy, and
hence in the final model ‘decay’ was set to ‘false’.
• hiddenLayers: Since an automatic neural network was desired, WEKA was
left to decide the hidden layer structure, and hence the final set of parameters
had a value of ‘a’ in the field of hiddenLayers. ‘a’ is a WEKA wildcard that creates a
single hidden layer with (number of attributes + number of classes) / 2 nodes, which
gives (132 + 5) / 2, i.e. 68 nodes, for this dataset.
• learningRate: The amount by which the weights are updated was set to
0.1.
• momentum: Momentum of 0.2 was applied to the weights during updating.
• nominalToBinaryFilter: There were no nominal input variables in the data and hence
this parameter had no impact on the model.
• normalizeNumericClass: Since the class is nominal rather than numeric,
there was no use for this feature and hence it was set to ‘false’.
• reset: This option allows the network to restart with a lower learning rate if
training diverges. When reset was set to false, no error message was received. Moreover,
the chosen learning rate of 0.1 is already quite low, and hence this feature was set to
‘false’.
• seed: A seed value of 0 was used. As in the case of decision trees, this value is used to
initialize the random number generator. Random numbers are used for setting
the initial weights of the connections between nodes, and also for shuffling the
training data.
• trainingTime: The number of epochs to train through was set to 5000.
• validationSetSize: The percentage size of the validation set was made 0, which
signifies that no validation set will be used and instead the network will train for
the specified number of epochs, i.e. for 5000 epochs.
• validationThreshold: This parameter was set to 20, which dictates that the
validation set error can get worse 20 times in a row before training is terminated.
With a validation set size of 0, this threshold does not come into play.
The parameters used in the final neural network model are summarized in Figure 9.
Figure 9: Parameters used for building the Neural Network model
It was found impossible to include the complete model output in this document, and
hence only a summary of the model obtained is given here:

=== Run information ===

Scheme:       weka.classifiers.functions.MultilayerPerceptron -L 0.1 -M 0.2 -N 5000 -V 0 -S 0 -E 20 -H a -R
Relation:     MLData_Normalized-weka.filters.unsupervised.attribute.Remove-R1
Instances:    200
Attributes:   133
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===
The chosen neural network had 1 hidden layer with 68 nodes. There were 132 input
nodes accepting the 132 normalized time values corresponding to each section of the
webpage. The model had 5 output nodes, one for each of the five laptops.
There were a total of 73 threshold values for 73 nodes (68 hidden layer nodes + 5 output
nodes) and there were 9316 weight values (132 * 68 + 68 * 5).
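The counts above follow directly from the layer sizes, as the short JavaScript sketch below shows. The function name `countParameters` is an assumption; the hidden layer size is computed as (attributes + classes) / 2 rounded down, which is the rule behind WEKA's ‘a’ setting for hiddenLayers.

```javascript
// Counts the weights and thresholds of a fully connected feedforward network.
// `layerSizes` lists the number of nodes in each layer, input layer first.
function countParameters(layerSizes) {
  let weights = 0;
  let thresholds = 0;
  for (let i = 1; i < layerSizes.length; i++) {
    weights += layerSizes[i - 1] * layerSizes[i]; // one weight per connection
    thresholds += layerSizes[i]; // one threshold per non-input node
  }
  return { weights, thresholds };
}

// 132 input attributes and 5 classes give a hidden layer of 68 nodes.
const hiddenNodes = Math.floor((132 + 5) / 2);
const counts = countParameters([132, hiddenNodes, 5]);
console.log(hiddenNodes, counts.weights, counts.thresholds); // 68 9316 73
```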
4.4.2.2 Testing the neural network model
The neural network model was tested in the same way as the decision tree, using two
different methodologies, namely testing directly on the training set and testing using
cross-validation with 10 folds.
It was found that testing on the training dataset gave an exceptionally good result of
95.0%, whereas testing using cross-validation with 10 folds gave a classification accuracy
of 41.0%.
4.4.2.2.1 Testing on Training Data
=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        190      95      %
Incorrectly Classified Instances       10       5      %
Kappa statistic                         0.9335
Mean absolute error                     0.0219
Root mean squared error                 0.1313
Relative absolute error                 7.2772 %
Root relative squared error            33.8899 %
Total Number of Instances             200

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 0.939    0.012      0.939   0.939      0.939     0.966  1
                 0.918    0.024      0.957   0.918      0.937     0.936  2
                     1    0.026      0.917       1      0.957     0.993  3
                 0.949    0.006      0.974   0.949      0.961     0.957  4
                     1        0          1       1          1         1  5
Weighted Avg.     0.95    0.017      0.951    0.95       0.95     0.961

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 31  2  0  0  0 |  a = 1
  2 67  3  1  0 |  b = 2
  0  0 44  0  0 |  c = 3
  0  1  1 37  0 |  d = 4
  0  0  0  0 11 |  e = 5
4.4.2.2.2 Testing by Cross-Validation (folds 10)
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         82      41      %
Incorrectly Classified Instances      118      59      %
Kappa statistic                         0.2165
Mean absolute error                     0.236
Root mean squared error                 0.4551
Relative absolute error                78.4778 %
Root relative squared error           117.4608 %
Total Number of Instances             200

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 0.333     0.12      0.355   0.333      0.344     0.614  1
                 0.575     0.22        0.6   0.575      0.587     0.706  2
                 0.295    0.237       0.26   0.295      0.277     0.626  3
                 0.282    0.155      0.306   0.282      0.293     0.652  4
                 0.455    0.042      0.385   0.455      0.417     0.856  5
Weighted Avg.     0.41    0.185      0.415    0.41      0.412     0.671

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
 11  8  5  8  1 |  a = 1
  9 42 19  2  1 |  b = 2
  6 12 13 11  2 |  c = 3
  5  7 12 11  4 |  d = 4
  0  1  1  4  5 |  e = 5
4.4.2.2.3 Discussion
Testing on the training data classified 190 cases correctly out of 200, which is an
accuracy of 95.0%. Such a high value of classification accuracy clearly signifies that the
built neural network model has learnt the training data with high accuracy.
Testing using cross-validation is a process of dividing the data into different subsets,
building the model on some of the subsets and testing it on the others. With 10 folds,
each of the ten subsets is held out once for testing while the model is trained on the
remaining nine, and the results of the ten runs are combined. The achieved result of a
total of 82 correct classifications out of 200, i.e. an accuracy of 41.0%, is comparatively
low but is still above the 20% chance level. The large gap between the accuracy on the
training data (95.0%) and the cross-validated accuracy (41.0%) suggests that the network
has overfitted the training data.
Ideally, the model should have been trained on more data. Due to the limitation of time,
and with no compensation available to volunteers, only 200 tuples of useful data could be
collected. Since there is a hidden layer with 68 nodes and a total of 9316 weight values
are involved, a much bigger training dataset was required. It is expected that with a
bigger training dataset, the accuracy of testing would increase.
4.4.3 Decision Tree Vs Neural Networks
Based on the initial 200 data cases, one model each of decision tree and neural network
was trained. Upon testing by cross-validation, the decision tree showed better
accuracy (66%) than the neural network model (41%), although the neural network scored
higher on the training data itself (95% versus 89.5%). The other factors worth
considering about the two models are:
• Building a neural network model in WEKA is easy but time consuming; more
importantly, the model slows down the performance of the website after its
implementation. The objective of the project is to determine the product for the
users in real time while they are still browsing, and this requires very fast
computation. Decision trees are a set of conditions, which can be evaluated much
more efficiently than the calculations and temporary variables required in neural
networks. However, if a parallel web server is used which is capable of
performing calculations faster, a neural network could also be considered for
implementation.
• With time the website would keep accumulating more and more mouse
movement data, and the model should be improved / trained on new data
whenever required. This would require re-implementing the new model every
time an update is desired. As stated above, this would be more difficult, time
consuming and error prone with neural networks than with decision trees.
• Decision trees are more transparent than neural network models.
This means that for a person visually inspecting the two models, a decision tree could
give some information whereas a neural network can visually tell them
nothing. This was, however, not one of the points considered before taking a final
call on the model to be chosen.
Despite all these points, final models of both neural networks and decision trees were
implemented in two similar copies of the same website. Further tests of accuracy and
performance were conducted later in order to determine the better model for the problem
at hand.
The next chapter will explain the steps required to put these models into the website so
that they can be used in real time to predict relevant content for a user.
Embeddingthemachinelearningmodelsinthewebsite
56
5 EMBEDDING THE MACHINE LEARNING MODELS IN THE WEBSITE
This chapter explains the complete methodology adopted to apply the built
machine learning models in the website. It also explains the interaction
between the model and the website and how a user’s mouse movement data
was used to predict the best content for that user in real time.
5.1 What and Why?
As explained in the previous chapter, a decision tree and a neural network model
capable of predicting the product the user is most likely to buy were built. These
models need to be implemented in the website so that they can take the mouse
movement behavior of new users as input and predict the appropriate product
for each of them.
5.2 Specifications
The initial website, built as explained in Chapter 3 for collecting the training data, was
modified to implement the decision tree and neural network models. Some additional
characteristics required from the website were:
• The model should reside on the server. This is essential from a security point of
view; otherwise any user would have access to the model, which by reverse engineering
could give information about the products bought by other users.
• Real-time model evaluation on the real-time mouse movement data.
• Real-time transfer of the model output from the web server to the front-end website,
so that the website can use the model prediction.
• Determining the product the user is most likely to buy using the embedded
models periodically, say after every 10 seconds. This would involve including the
latest mouse movement data and transferring the output again to the front-end
HTML website, so that if any change is predicted in the final product, it can be
reflected on the front end.
• All the tracking and model evaluation should be carried out in the background, hidden
from the user; the user should not be asked for any explicit information and should not
have to compromise on speed and performance.
• Needless to say, the website should continue to track mouse movement as was
explained in earlier chapters.
5.3 Implementation
The website built initially to collect training data had mouse movement tracking
capability. A few new functions and scripts were added to enable the model evaluation
on the captured mouse movements.
A new JavaScript function named ‘predict()’ was programmed in the JavaScript file.
‘predict()’ was made to run again every 10 seconds: once each response has been
handled, the next call is scheduled with a timer. This was because the prediction was
expected to be made every 10 seconds using the machine learning model. After every
subsequent 10 seconds, the database would contain more mouse movement data that could
be used by the machine learning models to, ideally, predict more accurately.
The ‘predict()’ function takes no arguments and calls a PHP script named ‘predict.php’,
passing it the userID of the current user via the GET method. The ‘predict.php’ file
resides on the server, and the call from JavaScript was programmed using standard AJAX
techniques.
The code snippet of the JavaScript ‘predict()’ function is:

    // Schedules the first prediction 10 seconds from now.
    function autoPredict() {
        setTimeout(predict, 10000);
    }

    // Requests the current user's prediction from predict.php (asynchronous GET).
    // `http` is assumed to be the page's XMLHttpRequest object, `userId` the current user's ID.
    function predict() {
        http.open("GET", "predict.php?userId=" + userId, true);
        http.onreadystatechange = predictResponse;
        http.send(null);
    }

The ‘predict.php’ file connects to the MySQL databases and selects the mouse movement
data for the current user using a simple SQL ‘SELECT * …’ statement. The mouse movement
data was saved into 132 temporary variables that correspond to each section of the
webpage. The total time spent by the user so far was also calculated while saving
these temporary variables. The absolute time spent in each section, as saved in the
132 temporary variables, was then replaced by the normalized time spent in that section
by dividing the absolute time value by the total time spent by that user.
Hence, after this step the 132 temporary variables in the ‘predict.php’ file contain the
normalized time / relative time spent by the user in the corresponding 132 sections of the
webpage. These 132 temporary variables are the 132 independent input variables for
the model.
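The normalization step performed in ‘predict.php’ can be sketched as follows. The thesis implemented it in PHP; JavaScript is used here only for consistency with the other snippets. The function name `normalizeTimes` and the guard for a total of zero are assumptions, not part of the original script.

```javascript
// Divides the absolute time spent in each section by the user's total time.
// `absoluteTimes` holds one entry per page section (132 entries on the real page).
function normalizeTimes(absoluteTimes) {
  const total = absoluteTimes.reduce((sum, t) => sum + t, 0);
  if (total === 0) {
    // Assumption: with no time recorded yet, return zeros instead of dividing by zero.
    return absoluteTimes.map(() => 0);
  }
  return absoluteTimes.map(t => t / total);
}

// Made-up example with four sections instead of 132.
console.log(normalizeTimes([2, 6, 0, 12])); // [ 0.1, 0.3, 0, 0.6 ]
```

The normalized values always sum to 1, so users who browse slowly and users who browse quickly become comparable inputs for the model.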
The two models (decision tree and neural network) were then coded and were given
access to these 132 temporary variables so that they could evaluate the normalized times
and make their respective predictions. It should, however, be noted that for testing,
only one of the models was used at a time. Both models were tested separately later for
comparison purposes. The implementation of the two models is as follows:
5.3.1 Implementing the Decision Tree model
A function named ‘decisionTree()’ was coded in the PHP file ‘predict.php’. This function
had access to all the 132 input variables as stated above.
The model made in WEKA had a set of 83 if-else statements (83 being the size of the
tree). All these 83 if-else statements from the WEKA model, along with the prediction
values, were coded in PHP. The if-else statements perform comparisons on the 132
independent variables so as to imitate the decision tree. The output of this set of if-else
statements was one value, which is also the output of the decision tree model. This output is
the product the user is most likely to buy according to the implemented decision tree.
This value was returned to the main program by the function. The complete code of the
function ‘decisionTree()’ and the ‘predict.php’ file is available in the appendix.
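To illustrate how a tree translates into if-else statements, the sketch below encodes one branch of the J48 tree printed in section 4.4.1.1 (the branch for b5 > 0.04509). It is written in JavaScript rather than PHP and is not the author's ‘decisionTree()’ function; the function name and the convention of passing the inputs as an object keyed by attribute name are assumptions, and the other branch is deliberately left out.

```javascript
// One branch of the J48 tree as nested conditions.
// `x` maps attribute names (b5, t3, ...) to normalized times.
function decisionTreeFragment(x) {
  if (x.b5 > 0.04509) {
    if (x.t3 <= 0.000515) {
      if (x.d4 <= 0.008991) {
        if (x.e2 <= 0.011901) {
          if (x.a1 <= 0.001341) {
            return x.g2 <= 0.001762 ? 4 : 5;
          }
          return 5;
        }
        return 4;
      }
      return 2;
    }
    return 3;
  }
  return null; // b5 <= 0.04509: remaining subtree not reproduced in this sketch
}

console.log(decisionTreeFragment({ b5: 0.05, t3: 0.001 })); // 3
```

Each prediction costs at most a handful of comparisons, which is why the decision tree is cheap to evaluate on every request.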
5.3.2 Implementing the Neural Network model
Another function named ‘neuralNetwork()’ was implemented. This function also had
access to the 132 independent input variables as stated above.
The neural network built in WEKA had one hidden layer with 68 nodes. To implement
this hidden layer, 68 new temporary variables named ‘Node5’, ‘Node6’, ‘Node7’, …,
‘Node72’ were created, with values computed using the standard neural network formula.
All the coefficient values as well as the threshold values were used as given by WEKA
during model building.
To implement the output layer, the same formula was used based on these 68 temporary
variables (68 hidden layer nodes, i.e. Node5, Node6, …, Node72). The output layer of the
neural network model had 5 nodes corresponding to the five laptop products. The
product corresponding to the node with the highest value was predicted as the laptop the
current user is most likely to buy.
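The "standard neural network formula" referred to above is, for a sigmoid node, the sigmoid of the node's threshold plus the weighted sum of its inputs. The following JavaScript sketch shows the resulting forward pass and the choice of the highest output node. It is an illustration under stated assumptions: the model object layout, the function names and the toy weights are hypothetical, the real network was 132-68-5, and any input scaling that WEKA may apply internally is not modelled.

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Computes one layer: out[j] = sigmoid(thresholds[j] + sum_i weights[j][i] * inputs[i]).
function layer(inputs, weights, thresholds) {
  return weights.map((row, j) =>
    sigmoid(row.reduce((sum, w, i) => sum + w * inputs[i], thresholds[j]))
  );
}

// Returns the 1-based product code of the output node with the highest value.
function predictLaptop(inputs, model) {
  const hidden = layer(inputs, model.hiddenWeights, model.hiddenThresholds);
  const output = layer(hidden, model.outputWeights, model.outputThresholds);
  return output.indexOf(Math.max(...output)) + 1;
}

// Toy model (made-up weights): 2 inputs, 2 hidden nodes, 3 output nodes.
const toyModel = {
  hiddenWeights: [[4, -4], [-4, 4]],
  hiddenThresholds: [0, 0],
  outputWeights: [[3, -3], [-3, 3], [0, 0]],
  outputThresholds: [0, 0, -1],
};
console.log(predictLaptop([0.9, 0.1], toyModel)); // 1
```

For the real model, each prediction needs 9316 multiplications plus 73 sigmoid evaluations, compared with a few comparisons for the decision tree.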
5.4 Using model outputs
As stated above, only one of the two models was used at a time for a given user. After
receiving the output from the model in use (decision tree or neural network), the output
was sent back to the front-end JavaScript function named ‘predictResponse()’ via AJAX. It
should be noted that the model output was the code of the one of the 5 laptops that the
current user is most likely to buy.
The ‘predictResponse()’ JavaScript function, after receiving the prediction, can be
programmed as per the needs of the website. In the current project, the author decided to
simply highlight the border of the predicted laptop in red. The predicted laptop is the one
the user is most likely to buy, as predicted by one of the machine learning
models based on the user’s mouse movement behavior. The function definition of the
‘predictResponse()’ function is as follows:

    function predictResponse() {
        if (http.readyState == 4) { // 4 = request finished and response ready
            predictProduct = http.responseText; // laptop code returned by predict.php
            var colName = Number(predictProduct) + 1; // product n is shown in column "cg" + (n + 1)
            // Reset the style of all five product columns (cg2 to cg6).
            for (var i = 2; i <= 6; i++) {
                document.getElementById("cg" + i).className = "";
            }
            // Highlight the column of the predicted laptop.
            document.getElementById("cg" + colName).className = "oce-predict";
            alert("Product : " + predictProduct);
            // Schedule the next prediction 10 seconds from now.
            setTimeout(predict, 10000);
        }
    }

The function above gets the response from the PHP script via the standard AJAX
http.responseText property. The output was then used to simply change the style of
the column containing that product. The style of all other columns is first reset
before changing the style of the predicted laptop's column. The JavaScript ‘predict()’
function is then scheduled to run again after 10,000 milliseconds. In the current
demonstration, a popup was also shown to the user with the code of the laptop they are
most likely to buy. This was done using the alert statement.
There can be several other uses of the prediction. It can be imagined that a customer
would be served more easily and appropriately if the shopkeeper knows the product the
customer is most likely to buy. The customer could be given other options similar to the
product that was predicted. If not used by the content generator of the website, this
prediction can always be used by visitors in finding the information they have been looking
for. The screenshot of the prediction made by the Decision Tree model is shown in
Figure 10.
Figure 10: Screenshot of the prediction done by the model
5.5 What next
Once the website was programmed and the machine learning models were embedded, it
was again made public and the users were invited to visit it again. All the mouse
movement data was saved in the databases as designed earlier, along with the product
the user buys. The users were also shown the real-time prediction of the model
after every 10 seconds. The prediction done by the model was not saved in any
database for the following reasons:
• Connecting the ‘predict.php’ file with the databases and saving data will certainly
take time. This time used up in saving the predicted output would affect the
performance of the website, mainly because it would delay the return of the model
output from the ‘predict.php’ file to the JavaScript ‘predictResponse()’ function.
• The final prediction done for any user as per the model can always be calculated
again, as the databases keep a record of the mouse movement data for
every user. This would be done later in the testing phase of the project.
• The prediction was done every 10 seconds. This means that there would
be several predictions (4 on average) done for every user. The count of
four predictions was estimated because it was earlier found in section 3.3.2.2
that the average time spent by a user on the webpage is 40.26 seconds. Saving all
predictions per user is again a performance issue, as the table saving them is
expected to grow with time.
The final website capable of predicting the product the user is most likely to buy was
made public and was kept online for 7 days. The users were again invited using emails,
social media, chats, etc. and were asked to surf the final version of the webpage. The
volunteers were required to buy one of the products after evaluating all the options (5
laptops) available on that page. While doing so, the users were shown the product they
were most likely to buy. The visitors reported informally, via email and in-person
conversations, that the predictions were quite accurate.
The next chapter will explain a more formal and quantitative method of testing the
predictions made by the two models. It will also describe the methodology adopted to
test the time performance of the two models.
TestingandResults
64
6 TESTING AND RESULTS
This chapter describes the complete testing phase of the project. It describes
the data collection steps and the parameters on which the models were
evaluated. It also explains the testing methodology and gives a summary of the final
results obtained.
6.1 Testing methodology
Two types of tests were conducted to evaluate the implementation. One test was
conducted in WEKA on the collected test data to check the classification accuracy of
the model (decision tree or neural network). The other test was conducted on the
‘predict.php’ file to check the time performance of the website after implementing the
model.
Both of the above-mentioned tests were performed on both models separately. The
methodology adopted and the results obtained are described in the following sections.
6.2 Testing for model accuracy
Testing data was collected while the final website was live and was used to further test
the two models in WEKA. It was found that the decision tree model gave an accuracy of
84.09%, whereas the neural network model gave an accuracy of 34.09%, on the collected
test data. Details about the tests conducted are as follows:
6.2.1 Testing data collection
While the website with one of the machine learning models was live, the users’ mouse
tracking data and the final product bought by each user were saved in the tables
‘data’ and ‘bought’ respectively. It was found that in 7 days (the duration for which the
test website was live), 49 unique users visited the webpage. There were 1275 tuples in
the ‘data’ table and 44 tuples in the ‘bought’ table. The difference between the
cardinality of the ‘bought’ table and the number of visitors arose because 5 users (49 minus
44) didn’t click the buy button and left the site after browsing it for a while.
This data was processed in the same way as the initial data, as described in section
3.3.2. The steps followed to analyze and prepare the test data are as follows:
• The data was converted into a more usable format using the PHP script named
‘alignData.php’. The details of this script are given in section 3.3.2.1. This
step converted the test data into a ‘one user per row’ format, with the time values of
each user in a single row along with the product bought.
• This data was exported into Excel and was normalized. To normalize the times, the
total time spent by each user was calculated and then the time spent in each section
/ cell was divided by the total time. This is explained in detail in section 3.3.2.3.
• It should be noted that in this step no outliers were removed. The reason is that
the data was collected from actual users, and it is expected that all kinds of
people will use the website in all possible ways; the most accurate measure of
accuracy is obtained when all these cases are taken into consideration, including
any outliers.
• This data was saved in a CSV file, which was then opened in WEKA.
• The CSV file was opened in WEKA and saved in WEKA’s default ARFF format. The
ARFF file was opened in a text editor and the type of the ‘bought’ attribute
was changed from numeric to nominal, as stated in section 4.4.
This data was then opened in WEKA again and the model testing was carried out as
explained in the following sections:
6.2.2 Model testing in WEKA using test data
Using WEKA, the saved files of the two models were opened. In the Classify tab, the
‘Supplied test set’ option was chosen and, after pressing the Set button, the
collected and normalized test data file was opened. The loaded model was then made to
evaluate on this testing data by right-clicking the model and selecting “Re-evaluate the
model on current test-set”. This method evaluates the model on the collected test dataset
and shows the accuracy results on this test data.
This method is equivalent to running the model on the website using PHP. The output given
by the model while testing in WEKA would be exactly the same as the one given by the PHP
script online, because the obtained WEKA model was the one which was
implemented in the website. This was the reason that the predictions were not saved, as
stated in section 5.5. Checking for accuracy is then simply a matter of comparing the model
prediction with the actual product bought by the user.
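The comparison described above reduces to counting matches between two lists. A minimal JavaScript sketch follows; the function name `accuracy` and the five-element example are made up for illustration and are not the collected test data.

```javascript
// Fraction of cases where the predicted product code equals the product actually bought.
function accuracy(predicted, actual) {
  if (predicted.length !== actual.length) {
    throw new Error("predicted and actual must have the same length");
  }
  let correct = 0;
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] === actual[i]) correct++;
  }
  return correct / actual.length;
}

// Made-up example: 4 of 5 predictions match the product bought.
console.log(accuracy([1, 2, 2, 4, 5], [1, 2, 3, 4, 5])); // 0.8
```

Applied to the figures reported in this chapter, 37 matches out of 44 test cases give 37 / 44 = 84.09%, and 15 matches out of 44 give 34.09%.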
The details of the results given by both models when evaluated on the test set are
explained in the following subsections.
6.2.2.1 Decision Tree model
The test dataset with 44 cases was evaluated using the built decision tree model. It was
found that the tree was able to correctly classify 37 out of 44 cases, an accuracy of
84.0909%.
The output obtained after re-evaluation from WEKA was:

=== Re-evaluation on test set ===

User supplied test set
Relation:     MasterData-1-weka.filters.unsupervised.attribute.Remove-R1
Instances:    unknown (yet). Reading incrementally
Attributes:   133

=== Summary ===

Correctly Classified Instances         37      84.0909 %
Incorrectly Classified Instances        7      15.9091 %
Kappa statistic                         0.7825
Mean absolute error                     0.1916
Root mean squared error                 0.2814
Total Number of Instances              44

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                 0.857    0.027      0.857   0.857      0.857     0.986  1
                 0.875    0.143      0.778   0.875      0.824     0.872  2
                 0.917    0.031      0.917   0.917      0.917     0.921  3
                 0.833    0.026      0.833   0.833      0.833     0.945  4
                 0.333        0          1   0.333        0.5      0.89  5
Weighted Avg.    0.841    0.068      0.851   0.841      0.834     0.915

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
  6  1  0  0  0 |  a = 1
  1 14  0  1  0 |  b = 2
  0  1 11  0  0 |  c = 3
  0  0  1  5  0 |  d = 4
  0  2  0  0  1 |  e = 5
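The per-class precision and recall figures in the output above can be checked against the confusion matrix. The JavaScript sketch below does so for class 2; the function name `precisionRecall` is an assumption, and the matrix is the decision tree's confusion matrix on the 44 test cases as reported above.

```javascript
// Rows are actual classes, columns are predicted classes (WEKA's layout).
// `k` is the zero-based index of the class of interest.
function precisionRecall(matrix, k) {
  const truePositives = matrix[k][k];
  const actualTotal = matrix[k].reduce((s, v) => s + v, 0);
  const predictedTotal = matrix.reduce((s, row) => s + row[k], 0);
  return {
    precision: predictedTotal === 0 ? 0 : truePositives / predictedTotal,
    recall: actualTotal === 0 ? 0 : truePositives / actualTotal,
  };
}

const testMatrix = [
  [6, 1, 0, 0, 0],
  [1, 14, 0, 1, 0],
  [0, 1, 11, 0, 0],
  [0, 0, 1, 5, 0],
  [0, 2, 0, 0, 1],
];
const class2 = precisionRecall(testMatrix, 1);
console.log(class2.precision.toFixed(3), class2.recall.toFixed(3)); // 0.778 0.875
```

The same calculation for class 5 gives a precision of 1 but a recall of only 0.333: the tree is never wrong when it predicts product 5, but it finds only one of the three users who bought it.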
6.2.2.2 Neural Network model
The dataset of 44 test cases was evaluated on the neural network model, and it was
found that it classified 15 cases correctly. This shows an accuracy of only 34.0909% on
the testing data for the neural network model.
The output obtained from WEKA was:
=== Re-evaluation on test set ===

User supplied test set
Relation:     MasterData-1-weka.filters.unsupervised.attribute.Remove-R1
Instances:    unknown (yet). Reading incrementally
Attributes:   133

=== Summary ===

Correctly Classified Instances          15               34.0909 %
Incorrectly Classified Instances        29               65.9091 %
Kappa statistic                          0.1367
Mean absolute error                      0.2695
Root mean squared error                  0.5001
Total Number of Instances               44

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.429     0.135      0.375     0.429      0.4        0.695     1
                 0.313     0.25       0.417     0.313      0.357      0.694     2
                 0.25      0.281      0.25      0.25       0.25       0.505     3
                 0.5       0.184      0.3       0.5        0.375      0.623     4
                 0.333     0.024      0.5       0.333      0.4        0.78      5
Weighted Avg.    0.341     0.216      0.354     0.341      0.34       0.639

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
  3  3  0  1  0 |  a = 1
  2  5  7  2  0 |  b = 2
  3  4  3  2  0 |  c = 3
  0  0  2  3  1 |  d = 4
  0  0  0  2  1 |  e = 5
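The Kappa statistic in the WEKA summaries measures agreement between predicted and actual classes after discounting the agreement expected by chance. As a hedged illustration, the sketch below recomputes it from the two confusion matrices using the standard Cohen's kappa formula; the function name `cohenKappa` is an assumption and the code is not taken from the project.

```javascript
// Cohen's kappa from a square confusion matrix
// (rows = actual class, columns = predicted class).
function cohenKappa(matrix) {
  const rowTotals = matrix.map((row) => row.reduce((s, v) => s + v, 0));
  const colTotals = matrix[0].map((_, j) =>
    matrix.reduce((s, row) => s + row[j], 0)
  );
  const total = rowTotals.reduce((s, v) => s + v, 0);

  let observed = 0; // proportion of cases on the diagonal
  let expected = 0; // agreement expected if predictions were independent of truth
  for (let k = 0; k < matrix.length; k++) {
    observed += matrix[k][k] / total;
    expected += (rowTotals[k] * colTotals[k]) / (total * total);
  }
  return (observed - expected) / (1 - expected);
}

// Matrices copied from the two WEKA outputs in this section.
const treeMatrix = [
  [6, 1, 0, 0, 0],
  [1, 14, 0, 1, 0],
  [0, 1, 11, 0, 0],
  [0, 0, 1, 5, 0],
  [0, 2, 0, 0, 1],
];
const networkMatrix = [
  [3, 3, 0, 1, 0],
  [2, 5, 7, 2, 0],
  [3, 4, 3, 2, 0],
  [0, 0, 2, 3, 1],
  [0, 0, 0, 2, 1],
];

console.log(cohenKappa(treeMatrix).toFixed(4));    // 0.7825
console.log(cohenKappa(networkMatrix).toFixed(4)); // 0.1367
```

Both values agree with the Kappa statistic lines reported by WEKA, which is a useful check that the confusion matrices were read correctly.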
6.2.3 Discussion
On the training data, the decision tree gave an accuracy of 89.5% whereas the neural
network gave an accuracy of 95%. The same decision tree and neural network models gave
accuracies of 84.0909% and 34.0909% respectively when evaluated on the test dataset.
The large gap between the neural network's training and test accuracy suggests that it
overfitted the small training set.
For model comparison, accuracy on the test dataset, i.e. the data on which the model has
not been trained, is one of the most important parameters. As discussed in section
4.4.3, there are several drawbacks of using neural networks in the present situation,
and after evaluating the two models on the test dataset it is clear that the decision
tree has outperformed the neural network and should be used for making predictions.
This however depends on a lot of parameters, the most important being the size of the
training and testing datasets. Since the scope of this project was limited, a large amount
of data could not be collected, but it is advised that both decision trees and neural
networks be evaluated along with other machine learning models before settling on one
of them.
6.3 Testing time performance of the models
A new PHP script was written and executed on the server to estimate the average time
the model processing takes when executed in real time. To do this, the PHP script
was connected to the database containing the test data. Both model functions were
then called, and the time taken by each to evaluate all 44 test cases was calculated.
This was averaged over the 44 cases to estimate the average time each model takes
to make a single prediction in PHP. This process was carried out 10 times separately
so that clashes with unforeseen tasks on the server, which might delay the model
execution, would not distort the estimate.
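The measurement procedure described above can be sketched as follows. This is only an illustration in JavaScript rather than the PHP script used in the project: `averagePredictionTimes`, `dummyPredict` and `dummyCases` are hypothetical names, and the clock is passed in as a parameter so that the routine can be checked with a fake clock.

```javascript
// Average per-case prediction time over several repeated runs.
// `predict` stands in for either model function (hypothetical here);
// `now` returns the current time in seconds.
function averagePredictionTimes(predict, cases, runs, now) {
  const averages = [];
  for (let run = 0; run < runs; run++) {
    const start = now();
    for (const testCase of cases) {
      predict(testCase);
    }
    const elapsed = now() - start;
    // Average time per prediction within this run.
    averages.push(elapsed / cases.length);
  }
  return averages;
}

// Wall-clock time in seconds.
const wallClockSeconds = () => Date.now() / 1000;

// Usage with placeholder data: 44 cases, 10 repeated runs.
const dummyCases = Array.from({ length: 44 }, (_, i) => ({ id: i }));
const dummyPredict = (testCase) => (testCase.id % 5) + 1;
const runAverages = averagePredictionTimes(
  dummyPredict, dummyCases, 10, wallClockSeconds
);
console.log(runAverages.length); // 10
```

Repeating the whole run and keeping each run's average, as the text describes, makes outliers caused by other server activity visible instead of hiding them inside a single figure.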
Time taken by the model is an important feature, as the intelligent website is expected
to predict the output as soon as possible and, of course, in real time. A model
taking more than some threshold value for its calculations is of no practical use. The
process and results are explained in the following sections:
6.3.1 Decision Tree model
The decision tree was made to execute on all 44 test cases and the time taken was
averaged out. This was done 10 times, and the average times in seconds taken by the
script to evaluate the decision tree were:
0.000929258, 0.000544337, 0.000656968, 0.004135495,
0.000538674, 0.000537385, 0.000534681, 0.000545979,
0.000546981, 0.007368538
From the above 10 time values, the following can be seen:
• Minimum time taken by the model was approximately 0.00053 seconds
• Maximum time taken by the model was approximately 0.00737 seconds
• Average time taken by the model was 0.00163 seconds
6.3.2 Neural Network model
As done for the decision tree, the neural network model was also made to evaluate the 44
test cases. The average time taken was noted. This was done 10 times. The average
times of execution in seconds taken by the neural network model were:
0.658177257, 0.543146627, 0.658050104, 0.746059109,
0.482899054, 0.536261456, 0.639314229, 0.505210876,
0.496645451, 0.707032805
From the above 10 time values, the following can be seen:
• Minimum time taken by the model was approximately 0.4829 seconds
• Maximum time taken by the model was approximately 0.7461 seconds
• Average time taken by the model was 0.5973 seconds
6.3.3 Discussion
It is clearly seen that the neural network model takes far more time to execute than
the decision tree model. Comparing the average times (about 0.5973 seconds against
0.00163 seconds per prediction), the chosen decision tree model runs at least 350 times
faster than the chosen neural network.
Since the objective is to predict in real time, speed is a very important parameter, and
the decision tree model is the clear winner on time performance.
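The minimum, maximum and average figures quoted in sections 6.3.1 and 6.3.2, and the speed ratio discussed here, can be recomputed from the twenty measured values. The sketch below copies those values from the text; the helper name `summarize` is an assumption.

```javascript
// Per-run average times in seconds, as listed in sections 6.3.1 and 6.3.2.
const decisionTreeTimes = [
  0.000929258, 0.000544337, 0.000656968, 0.004135495, 0.000538674,
  0.000537385, 0.000534681, 0.000545979, 0.000546981, 0.007368538,
];
const neuralNetworkTimes = [
  0.658177257, 0.543146627, 0.658050104, 0.746059109, 0.482899054,
  0.536261456, 0.639314229, 0.505210876, 0.496645451, 0.707032805,
];

// Minimum, maximum and arithmetic mean of a list of timings.
function summarize(times) {
  const sum = times.reduce((s, t) => s + t, 0);
  return {
    min: Math.min(...times),
    max: Math.max(...times),
    mean: sum / times.length,
  };
}

const tree = summarize(decisionTreeTimes);
const net = summarize(neuralNetworkTimes);

console.log(tree.mean.toFixed(5)); // 0.00163
console.log(net.mean.toFixed(4));  // 0.5973
// Ratio of the two averages: how many times faster the decision tree is.
console.log((net.mean / tree.mean).toFixed(1));
```

The ratio of the two averages comes out slightly above 365, which is consistent with the "at least 350 times faster" statement.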
6.4 Results
After testing both models (decision tree and neural network) on prediction accuracy
and time performance, it was clearly found that the decision tree is much better
suited than the neural network for implementation in the current problem.
The results obtained in the tests are summarized below:
• Accuracy (on test dataset):
  o Decision Tree: 84.0909%
  o Neural Network: 34.0909%
• Time Performance (PHP scripts running on Apache):
  o Decision Tree: 0.0016 seconds
  o Neural Network: 0.5973 seconds
It should however be noted that these results were obtained when the models were
trained on only 200 cases. The neural network model had a total of 73 nodes, including
68 hidden nodes. To train the neural network properly, at least a few thousand cases
would be required. The neural network model was built to establish that it can be
used on a website to predict relevant content for the user. The decision tree, on the
other hand, is also expected to give better results when larger training and testing
datasets are available.
It should also be noted that there were five classes of the dependent variable (5 possible
laptop products), and hence a model would have been considered void only if its
accuracy were close to 20% (20% being the accuracy expected from guessing among five
equally likely classes). Since the accuracy obtained for both machine learning models
was above the 20% benchmark (far above it in the case of the decision tree), both models
have shown some promise that they have the potential to recommend relevant content for
a user based on his mouse movement behavior.
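One way to make the comparison with the 20% benchmark more concrete is to ask how likely a pure guesser would be to reach the observed number of correct predictions. The sketch below is an added illustration, not part of the original evaluation: it assumes each of the 44 test cases is guessed independently with a 1-in-5 chance of being right, and the function name `binomialTail` is hypothetical.

```javascript
// P(X >= k) for X ~ Binomial(n, p): the probability that a guesser with
// per-case success probability p gets at least k of n cases right.
// Assumes 0 < p < 1.
function binomialTail(n, k, p) {
  let pmf = Math.pow(1 - p, n); // P(X = 0)
  let tail = 0;
  for (let i = 0; i <= n; i++) {
    if (i >= k) tail += pmf;
    // Recurrence: P(X = i + 1) = P(X = i) * (n - i) / (i + 1) * p / (1 - p)
    pmf *= ((n - i) / (i + 1)) * (p / (1 - p));
  }
  return tail;
}

const testCases = 44;
const chance = 1 / 5; // five equally likely laptop classes (assumption)

// Probability of matching or beating each model's score by guessing alone.
console.log(binomialTail(testCases, 15, chance)); // neural network: 15 correct
console.log(binomialTail(testCases, 37, chance)); // decision tree: 37 correct
```

The smaller the printed probability, the less plausible it is that the corresponding accuracy arose from guessing. With only 44 test cases, such a check is a rough guide rather than a rigorous significance test.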
The next chapter will give a brief conclusion of the work done.
7 CONCLUSION
This chapter gives the conclusion of the project and discusses the scope of
future work possible. It also talks about some other possible implementations
of the explained methodology.
It has been successfully demonstrated that by building a machine-learning model on
a user's mouse movement data, appropriate content for him can be predicted. The dummy
shopping website developed, embedded with a decision tree machine learning model,
gave a remarkable accuracy of 84.09% on the test data. The accuracy was measured as
the ratio of correct predictions to the total number of predictions made by the
model. It was also found that implementing a decision tree model in a website would
not affect the performance of the page, as the average time taken by the dummy model
was found to be around 1.6 milliseconds. A neural network model was similarly
evaluated; it gave an accuracy of 34.09% and took an average time of 597.3
milliseconds to process a single case of data.
The objective of the project was to use the mouse movement behavior of a user and
predict the appropriate content for him intelligently and in real time. This objective was
successfully achieved and several other sub-objectives were also reached while working
on the project.
The user's mouse tracking was implemented successfully using a completely new algorithm.
This was done using PHP, AJAX, HTML and MySQL. The performance of the website was not
compromised after implementing mouse tracking, and the accuracy of the mouse
tracking data collected was found to be very high. A webpage was developed imitating a
shopping portal, and some highlighting techniques were applied to it to make sure that
the user's mouse pointer is close to his point of gaze.
The initial website developed in PHP was live for around two weeks and collected 200
cases of training data. The data was then used to train two separate machine-learning
models, namely a Decision Tree model and a Neural Network model. Both models
gave promising results when tested on the training data, which indicated that the models
built had learned the mouse movement behavior appropriately.
Both machine learning models were coded back into the website using PHP and
AJAX. The website was made to collect mouse movement data, which was dynamically
read by the models, and an output was generated. This predicted output was sent to the
webpage for further personalization. A total of 44 test cases were also collected from
the final website.
Using the collected 44 test cases, both models were evaluated and the decision tree
model was found to perform far better than the neural network model in both accuracy
and time performance. The decision tree classified about 2.5 times as many of the 44
cases correctly (37 against 15) and was at least 350 times faster than the neural network
model. This however cannot be generalized, as it depends on the size of the initial
training dataset (which was small in the current scope of the project) and on the
number of independent variables (which was large in the current implementation).
The working demonstration of the project, along with its documentation and source code
under the GNU General Public License, is available online at
http://sparshgupta.name/MSc/Project
7.1 Future Work
The proposed idea has shown huge potential, and there is a lot of scope for future
innovations and improvements if it is properly explored. The lack of data was the prime
limitation of the current study. If a commercial website is required to be intelligent, then
models built on several thousand cases of training data should be used, and once that
data is obtained, other machine learning algorithms could be explored.
The data collected in the testing phase can later be used to train the models. The
proposed concept and implementation involve a never-ending chain of model training and
improvement. This is because, with time, the website will accumulate a lot of data that
can be used at regular intervals to further train the implemented model or to build a
new model. It is expected that with every improvement in the model, its capability to
predict the relevant content for a new user will increase.
The proposed implementation requires that each section of the website call the mouse
tracking function whenever the mouse enters the section and leaves it. This requires
explicit coding of function call statements in every cell. This might not be possible in
highly dynamic websites, and hence work could be done on implementing the idea on
any given website while requiring almost no change to the existing web code.
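As a rough illustration of how the per-cell bookkeeping could be separated from the page markup, the sketch below keeps the enter and leave logic in a single object that accumulates the time spent in each cell. It is not the tracking algorithm used in the project: the class name `DwellTracker` and its methods are hypothetical, and the clock is injectable so the logic can be exercised without a browser.

```javascript
// Accumulates the time the pointer spends inside each cell.
class DwellTracker {
  constructor(now = () => Date.now()) {
    this.now = now;          // returns the current time in milliseconds
    this.currentCell = null; // id of the cell the pointer is in, if any
    this.enteredAt = null;   // time at which the current cell was entered
    this.totals = new Map(); // cell id -> accumulated milliseconds
  }

  // Called when the pointer enters a cell.
  enter(cellId) {
    this.leave(); // close any cell that was still open
    this.currentCell = cellId;
    this.enteredAt = this.now();
  }

  // Called when the pointer leaves the current cell.
  leave() {
    if (this.currentCell === null) return;
    const spent = this.now() - this.enteredAt;
    const previous = this.totals.get(this.currentCell) || 0;
    this.totals.set(this.currentCell, previous + spent);
    this.currentCell = null;
    this.enteredAt = null;
  }
}

// Usage with a manually advanced clock.
let clockMs = 0;
const tracker = new DwellTracker(() => clockMs);
tracker.enter("c1");
clockMs = 300;
tracker.enter("c2"); // implicitly leaves c1 after 300 ms
clockMs = 500;
tracker.leave();     // leaves c2 after 200 ms
console.log(Object.fromEntries(tracker.totals)); // { c1: 300, c2: 200 }
```

In a browser, one pair of `mouseover` and `mouseout` listeners registered on the table with `addEventListener` could call `enter` and `leave`, identifying the cell from the event target. That wiring is an assumption about one possible design and would remove the need for handler attributes on every cell.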
In the current project, the information about the predicted content (i.e., the laptop the
user is most likely to buy) was not exploited. Work can be done to make the website
interact with the user like a salesman. The website can remove all the products
in which the user would be least interested and show him only the products he is most
likely to buy.
The current implementation used only a single machine-learning model at a time.
Multiple models can be implemented in the webpage, and the strength of the prediction
made can also be used to further interact with the user. In case all the different
implemented models give the same prediction, then it can be assumed to be a strong
prediction, and hence the webpage can adapt accordingly immediately.
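The idea of treating agreement between models as prediction strength can be sketched with a simple vote count. The function name `combinePredictions`, the tie-breaking rule (first model wins) and the returned fields are assumptions made for illustration; the thesis does not prescribe a particular scheme.

```javascript
// Combine the product predicted by each model into one recommendation,
// together with a measure of how strongly the models agree.
function combinePredictions(predictions) {
  if (predictions.length === 0) {
    throw new Error("at least one prediction is required");
  }
  const votes = new Map();
  for (const product of predictions) {
    votes.set(product, (votes.get(product) || 0) + 1);
  }
  let best = null;
  let bestVotes = 0;
  // Map iteration follows insertion order, so ties go to the earliest model.
  for (const [product, count] of votes) {
    if (count > bestVotes) {
      best = product;
      bestVotes = count;
    }
  }
  return {
    product: best,
    agreement: bestVotes / predictions.length,
    unanimous: bestVotes === predictions.length,
  };
}

// Usage: three hypothetical models predicting a product number from 1 to 5.
console.log(combinePredictions([2, 2, 2])); // unanimous: adapt the page at once
console.log(combinePredictions([2, 3, 2])); // partial agreement: weaker signal
```

A page could adapt immediately when `unanimous` is true and wait for more mouse data otherwise, which is one way of reading the "strong prediction" described above.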
Other implementations possible
A shopping portal with intelligent prediction of the product a user is most likely to buy
is one of the many possible implementations of the proposed concept. Some other
possible implementations could be:
• A Search Engine Feedback System: Current search engines display the results as
  a list of links, each with a small text snippet relevant to the search. Most
  users choose links after reading the text snippet associated with each link, and
  they spend different amounts of time on different links. Current search feedback is
  based entirely on mouse clicks, which in a sense is binary feedback (either Yes
  or No). The feedback system can be made more accurate by determining the
  relative time a user spends on a link compared to other links.
• News Content Prediction: An online news website shows several news items under
  different heads on a page. Different users have different priorities for news.
  Based on a user's mouse movement activity, relevant news content can be shown
  to him. For example, if a user is spending more time around football and cricket
  news headlines than political headlines, then it can be predicted that he is more
  interested in sports news and, accordingly, the website can be molded for him.
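Both of these ideas rest on the relative time a user spends on one item compared with the others. A minimal sketch of that calculation follows; the function name `relativeDwell` and the example figures are assumptions used only for illustration.

```javascript
// Convert the time spent near each item into a share of the total time,
// giving a graded signal instead of a binary click / no-click one.
function relativeDwell(secondsByItem) {
  const total = Object.values(secondsByItem).reduce((s, v) => s + v, 0);
  const shares = {};
  for (const [item, seconds] of Object.entries(secondsByItem)) {
    shares[item] = total === 0 ? 0 : seconds / total;
  }
  return shares;
}

// Usage: hypothetical seconds spent around three groups of headlines.
const shares = relativeDwell({ football: 30, cricket: 20, politics: 10 });
console.log(shares.football); // 0.5
console.log(shares.politics < shares.cricket); // true
```

The same shares could be computed per search result link or per news section, and the items with the largest share could then be given more prominence.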
APPENDIX: SOURCE CODE
HTML final webpage
The HTML code of the final website developed, capable of tracking the user's mouse
movements as well as predicting the relevant product for the user, is as
follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <title>MSc Project - Compare Laptops</title> <link rel="stylesheet" type="text/css" href="mouseover.css"/> <script type="text/javascript" src="mouseover.js" ></script> </head> <body onload="start_It();"> <table width="100%" border="0" cellspacing="0" cellpadding="0"> <tr> <td><p class="oce-first"><span class="bold">NOTE:</span> Surf on this page like you do on a shopping portal comparison page and decide on a model based on its configuration and buy it. Thanks</p></td> </tr> <tr> <td> </td> </tr> <tr> <td><table width="100%" border="0" align="center" cellpadding="0" cellspacing="0" onMouseOver="hiliteColumn(event);" onMouseOut="resetColumn(event);" class="one-column-emphasis"> <colgroup class="oce-first" id="na"></colgroup> <colgroup id="cg2" class=""></colgroup> <colgroup id="cg3" class=""></colgroup> <colgroup id="cg4" class=""></colgroup> <colgroup id="cg5" class=""></colgroup> <colgroup id="cg6" class=""></colgroup> <thead> <tr> <th onmouseout="movement_out('a0');" onmouseover="movement_in();">Product Name</th> <th onmouseout="movement_out('a1');" onmouseover="movement_in();">Lenovo IdeaPad Y650 4185</th> <th onmouseout="movement_out('a2');" onmouseover="movement_in();">HP Pavilion dv7-1285dx</th> <th onmouseout="movement_out('a3');" onmouseover="movement_in();">Sony VAIO VGN-P588E</th> <th onmouseout="movement_out('a4');" onmouseover="movement_in();">Dell Studio XPS 16</th> <th onmouseout="movement_out('a5');" onmouseover="movement_in();">Toshiba Satellite A205-S4617</th> </tr> </thead> <tbody> <tr> <td class="oce-first" onmouseout="movement_out('b0');" onmouseover="movement_in();"> </td> <td onmouseout="movement_out('b1');" onmouseover="movement_in();"><img src="images/1.gif" width="120" height="90" border="0" /></td> 
<td onmouseout="movement_out('b2');" onmouseover="movement_in();"><img src="images/2.gif" width="120" height="90" border="0" /></td> <td onmouseout="movement_out('b3');" onmouseover="movement_in();"><img src="images/3.gif" width="120" height="90" border="0" /></td> <td onmouseout="movement_out('b4');" onmouseover="movement_in();"><img src="images/4.gif" width="120" height="90" border="0" /></td>
<td onmouseout="movement_out('b5');" onmouseover="movement_in();"><img src="images/5.gif" width="120" height="90" border="0" /></td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('c0');" onmouseover="movement_in();">Price</td> <td onmouseout="movement_out('c1');" onmouseover="movement_in();">$1,249.00</td> <td onmouseout="movement_out('c2');" onmouseover="movement_in();">$1,199.99</td> <td onmouseout="movement_out('c3');" onmouseover="movement_in();">$1,133.00</td> <td onmouseout="movement_out('c4');" onmouseover="movement_in();">$1,224.00</td> <td onmouseout="movement_out('c5');" onmouseover="movement_in();">$1,249.00</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('d0');" onmouseover="movement_in();">CNET editors' rating</td> <td onmouseout="movement_out('d1');" onmouseover="movement_in();">3.5/5.0</td> <td onmouseout="movement_out('d2');" onmouseover="movement_in();">3.5/5.0</td> <td onmouseout="movement_out('d3');" onmouseover="movement_in();">3.5/5.0</td> <td onmouseout="movement_out('d4');" onmouseover="movement_in();">3.5/5.0</td> <td onmouseout="movement_out('d5');" onmouseover="movement_in();">3.5/5.0</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('e0');" onmouseover="movement_in();">Average user rating</td> <td onmouseout="movement_out('e1');" onmouseover="movement_in();">No Data</td> <td onmouseout="movement_out('e2');" onmouseover="movement_in();">4.0/5.0</td> <td onmouseout="movement_out('e3');" onmouseover="movement_in();">2.0/5.0</td> <td onmouseout="movement_out('e4');" onmouseover="movement_in();">No Data</td> <td onmouseout="movement_out('e5');" onmouseover="movement_in();">3.0/5.0</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('f0');" onmouseover="movement_in();">Release date</td> <td onmouseout="movement_out('f1');" onmouseover="movement_in();">April 15, 2009</td> <td onmouseout="movement_out('f2');" onmouseover="movement_in();">February 01, 2009</td> <td onmouseout="movement_out('f3');" 
onmouseover="movement_in();">January 08, 2009</td> <td onmouseout="movement_out('f4');" onmouseover="movement_in();">January 07, 2009</td> <td onmouseout="movement_out('f5');" onmouseover="movement_in();">April 16, 2007</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('g0');" onmouseover="movement_in();">The Bottom Line</td> <td onmouseout="movement_out('g1');" onmouseover="movement_in();">Online media consumers who want a portable laptop with high style and plenty of screen real estate should give the Y650 a look.</td> <td onmouseout="movement_out('g2');" onmouseover="movement_in();">HP's Pavilion dv7-1245dx is a slick multimedia machine with great battery life, but for $1,200, we want a full 1080p display.</td> <td onmouseout="movement_out('g3');" onmouseover="movement_in();">Sony's upscale Atom-powered Lifestyle PC has the components of a cheaper machine but the design of a more expensive one. The end result will be a useful travel PC for some and a conversation piece for others.</td> <td onmouseout="movement_out('g4');" onmouseover="movement_in();">Dell's new 16:9 Studio XPS 16 adds upscale extras such as a leather trim and a backlit keyboard to a fairly standard set of components, without jacking up the price (too much).</td> <td onmouseout="movement_out('g5');" onmouseover="movement_in();">Toshiba adds faster Draft N Wi-Fi to this attractive if otherwise fairly conventional laptop. 
Just be sure you've got an 802.11n router to go along with it.</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('h0');" onmouseover="movement_in();">Similar Products</td> <td onmouseout="movement_out('h1');" onmouseover="movement_in();"> </td> <td onmouseout="movement_out('h2');" onmouseover="movement_in();"> </td> <td onmouseout="movement_out('h3');" onmouseover="movement_in();"> </td> <td onmouseout="movement_out('h4');" onmouseover="movement_in();"> </td> <td onmouseout="movement_out('h5');" onmouseover="movement_in();"> </td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('i0');" onmouseover="movement_in();">Networking</td> <td onmouseout="movement_out('i1');" onmouseover="movement_in();">Network adapter - Ethernet<br /> - IEEE 802.11a<br /> - IEEE 802.11b<br />
- IEEE 802.11g<br /> - Fast Ethernet<br /> - Gigabit Ethernet<br /> - Bluetooth 2.1 EDR<br /> - IEEE 802.11n (draft)</td> <td onmouseout="movement_out('i2');" onmouseover="movement_in();">Network adapter - Ethernet<br /> - IEEE 802.11a<br /> - IEEE 802.11b<br /> - IEEE 802.11g<br /> - Fast Ethernet<br /> - Gigabit Ethernet<br /> - IEEE 802.11n (draft) </td> <td onmouseout="movement_out('i3');" onmouseover="movement_in();">Network adapter - Ethernet<br /> - IEEE 802.11b<br /> - IEEE 802.11g<br /> - Fast Ethernet<br /> - Gigabit Ethernet<br /> - Bluetooth 2.1 EDR<br /> - IEEE 802.11n (draft) </td> <td onmouseout="movement_out('i4');" onmouseover="movement_in();">Network adapter - Gigabit Ethernet</td> <td onmouseout="movement_out('i5');" onmouseover="movement_in();">Network adapter - Ethernet<br /> - IEEE 802.11a<br /> - IEEE 802.11b<br /> - IEEE 802.11g<br /> - Fast Ethernet<br /> - IEEE 802.11n (draft)</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('j0');" onmouseover="movement_in();">Graphics Controller</td> <td onmouseout="movement_out('j1');" onmouseover="movement_in();">NVIDIA GeForce G105M - 256 MB</td> <td onmouseout="movement_out('j2');" onmouseover="movement_in();">NVIDIA GeForce 9600M GT - 512 MB</td> <td onmouseout="movement_out('j3');" onmouseover="movement_in();">Intel GMA 500</td> <td onmouseout="movement_out('j4');" onmouseover="movement_in();">ATI Mobility RADEON? 
HD 3670 - 512MB - 512 MB</td> <td onmouseout="movement_out('j5');" onmouseover="movement_in();">Intel GMA 950 - 8 MB</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('k0');" onmouseover="movement_in();">Notebook Camera</td> <td onmouseout="movement_out('k1');" onmouseover="movement_in();">Integrated - 1.3 Megapixel</td> <td onmouseout="movement_out('k2');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('k3');" onmouseover="movement_in();">Integrated</td> <td onmouseout="movement_out('k4');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('k5');" onmouseover="movement_in();">Info unavailable</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('l0');" onmouseover="movement_in();">Optical Storage</td> <td onmouseout="movement_out('l1');" onmouseover="movement_in();">DVD-Writer - Integrated</td> <td onmouseout="movement_out('l2');" onmouseover="movement_in();">DVD?RW (?R DL) / DVD-RAM with LightScribe Technology</td> <td onmouseout="movement_out('l3');" onmouseover="movement_in();">None</td> <td onmouseout="movement_out('l4');" onmouseover="movement_in();">8X DVD+/- RW(DVD/CD read/write) Slot Load Drive</td> <td onmouseout="movement_out('l5');" onmouseover="movement_in();">DVD?RW (?R DL) / DVD-RAM - Integrated</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('m0');" onmouseover="movement_in();">RAM</td> <td onmouseout="movement_out('m1');" onmouseover="movement_in();">4 GB (installed) / 8 GB (max) - DDR3 SDRAM - 1066 MHz - PC3-8500 ( 2 x 2 GB )</td>
<td onmouseout="movement_out('m2');" onmouseover="movement_in();">6 GB (installed) / 8 GB (max) - DDR2 SDRAM</td> <td onmouseout="movement_out('m3');" onmouseover="movement_in();">2 GB (installed) / 2 GB (max) - DDR2 SDRAM - 533 MHz ( 1 x 2 GB )</td> <td onmouseout="movement_out('m4');" onmouseover="movement_in();">4 GB DDR3 SDRAM</td> <td onmouseout="movement_out('m5');" onmouseover="movement_in();">2 GB (installed) / 4 GB (max) - DDR2 SDRAM - 667 MHz - PC2-5300</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('n0');" onmouseover="movement_in();">Cache Memory</td> <td onmouseout="movement_out('n1');" onmouseover="movement_in();">3 MB - L2 cache</td> <td onmouseout="movement_out('n2');" onmouseover="movement_in();">3 MB - L2 cache</td> <td onmouseout="movement_out('n3');" onmouseover="movement_in();">512 KB - L2 cache</td> <td onmouseout="movement_out('n4');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('n5');" onmouseover="movement_in();">2 MB - L2 cache</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('o0');" onmouseover="movement_in();">Processor</td> <td onmouseout="movement_out('o1');" onmouseover="movement_in();">Intel Core 2 Duo P8700 / 2.53 GHz ( Dual-Core )</td> <td onmouseout="movement_out('o2');" onmouseover="movement_in();">Intel Core 2 Duo P8600 / 2.4 GHz ( Dual-Core )</td> <td onmouseout="movement_out('o3');" onmouseover="movement_in();">Intel 1.33 GHz</td> <td onmouseout="movement_out('o4');" onmouseover="movement_in();">Intel Core 2 Duo P8700 / 2.53 GHz</td> <td onmouseout="movement_out('o5');" onmouseover="movement_in();">Intel Core 2 Duo T5500 / 1.66 GHz ( Dual-Core )</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('p0');" onmouseover="movement_in();">Hard Drive</td> <td onmouseout="movement_out('p1');" onmouseover="movement_in();">320 GB - Serial ATA-300 - 5400 rpm</td> <td onmouseout="movement_out('p2');" onmouseover="movement_in();">500 GB - Serial ATA-150 - 5400 
rpm</td> <td onmouseout="movement_out('p3');" onmouseover="movement_in();">64 GB - Serial ATA-150</td> <td onmouseout="movement_out('p4');" onmouseover="movement_in();">500 GB - 5400 rpm</td> <td onmouseout="movement_out('p5');" onmouseover="movement_in();">250 GB - Serial ATA-150 - 4200 rpm</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('q0');" onmouseover="movement_in();">Display</td> <td onmouseout="movement_out('q1');" onmouseover="movement_in();">16 in TFT active matrix 1366 x 768 ( WXGA ) - VibrantView</td> <td onmouseout="movement_out('q2');" onmouseover="movement_in();">17 in TFT active matrix 1440 x 900 ( WXGA+ ) - BrightView</td> <td onmouseout="movement_out('q3');" onmouseover="movement_in();">8 in TFT active matrix 1600 x 768</td> <td onmouseout="movement_out('q4');" onmouseover="movement_in();">16.0</td> <td onmouseout="movement_out('q5');" onmouseover="movement_in();">15.4 in TFT active matrix 1280 x 800 ( WXGA ) - 24-bit (16.7 million colors)</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('r0');" onmouseover="movement_in();">Battery</td> <td onmouseout="movement_out('r1');" onmouseover="movement_in();">Lithium ion</td> <td onmouseout="movement_out('r2');" onmouseover="movement_in();">Lithium ion</td> <td onmouseout="movement_out('r3');" onmouseover="movement_in();">Lithium ion</td> <td onmouseout="movement_out('r4');" onmouseover="movement_in();">Info unavailable</td> <td onmouseout="movement_out('r5');" onmouseover="movement_in();">Lithium ion</td> </tr> <tr> <td class="oce-first" onmouseout="movement_out('s0');" onmouseover="movement_in();">Dimensions (WxDxH)</td> <td onmouseout="movement_out('s1');" onmouseover="movement_in();">15.4 in x 10.2 in x 1 in</td> <td onmouseout="movement_out('s2');" onmouseover="movement_in();">15.6 in x 11.2 in x 1.7 in</td> <td onmouseout="movement_out('s3');" onmouseover="movement_in();">9.6 in x 4.7 in x 0.8 in</td>
    <td onmouseout="movement_out('s4');" onmouseover="movement_in();">Info unavailable</td>
    <td onmouseout="movement_out('s5');" onmouseover="movement_in();">14.3 in x 10.6 in x 1.3 in</td>
  </tr>
  <tr>
    <td class="oce-first" onmouseout="movement_out('t0');" onmouseover="movement_in();">Weight</td>
    <td onmouseout="movement_out('t1');" onmouseover="movement_in();">5.5 lbs</td>
    <td onmouseout="movement_out('t2');" onmouseover="movement_in();">7.7 lbs</td>
    <td onmouseout="movement_out('t3');" onmouseover="movement_in();">1.4 lbs</td>
    <td onmouseout="movement_out('t4');" onmouseover="movement_in();">Info unavailable</td>
    <td onmouseout="movement_out('t5');" onmouseover="movement_in();">6.4 lbs</td>
  </tr>
  <tr>
    <td class="oce-first" onmouseout="movement_out('u0');" onmouseover="movement_in();">OS Provided</td>
    <td onmouseout="movement_out('u1');" onmouseover="movement_in();">Microsoft Windows Vista Home Premium 64-bit Edition</td>
    <td onmouseout="movement_out('u2');" onmouseover="movement_in();">Microsoft Windows Vista Home Premium</td>
    <td onmouseout="movement_out('u3');" onmouseover="movement_in();">Microsoft Windows Vista Home Premium Edition</td>
    <td onmouseout="movement_out('u4');" onmouseover="movement_in();">Microsoft Windows Vista</td>
    <td onmouseout="movement_out('u5');" onmouseover="movement_in();">Microsoft Windows Vista Home Premium</td>
  </tr>
  <tr>
    <td class="oce-first" onmouseout="movement_out('v0');" onmouseover="movement_in();">Attribute X</td>
    <td onmouseout="movement_out('v1');" onmouseover="movement_in();">x1</td>
    <td onmouseout="movement_out('v2');" onmouseover="movement_in();">x2</td>
    <td onmouseout="movement_out('v3');" onmouseover="movement_in();">x3</td>
    <td onmouseout="movement_out('v4');" onmouseover="movement_in();">x4</td>
    <td onmouseout="movement_out('v5');" onmouseover="movement_in();">x5</td>
  </tr>
  <tr>
    <td class="oce-first" onmouseout="movement_out('w0');" onmouseover="movement_in();">Attribute Y</td>
    <td onmouseout="movement_out('w1');" onmouseover="movement_in();">y1</td>
    <td onmouseout="movement_out('w2');" onmouseover="movement_in();">y2</td>
    <td onmouseout="movement_out('w3');" onmouseover="movement_in();">y3</td>
    <td onmouseout="movement_out('w4');" onmouseover="movement_in();">y4</td>
    <td onmouseout="movement_out('w5');" onmouseover="movement_in();">y5</td>
  </tr>
  <tr>
    <td class="oce-first" onmouseout="movement_out('x0');" onmouseover="movement_in();">Attribute Z</td>
    <td onmouseout="movement_out('x1');" onmouseover="movement_in();">z1</td>
    <td onmouseout="movement_out('x2');" onmouseover="movement_in();">z2</td>
    <td onmouseout="movement_out('x3');" onmouseover="movement_in();">z3</td>
    <td onmouseout="movement_out('x4');" onmouseover="movement_in();">z4</td>
    <td onmouseout="movement_out('x5');" onmouseover="movement_in();">z5</td>
  </tr>
  </tbody>
  <tfoot>
  <tr>
    <td class="oce-first" onmouseout="movement_out('z0');" onmouseover="movement_in();"> </td>
    <td onmouseout="movement_out('z1');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('1');" value="Buy Now" /></td>
    <td onmouseout="movement_out('z2');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('2');" value="Buy Now" /></td>
    <td onmouseout="movement_out('z3');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('3');" value="Buy Now" /></td>
    <td onmouseout="movement_out('z4');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('4');" value="Buy Now" /></td>
    <td onmouseout="movement_out('z5');" onmouseover="movement_in();"><input type="submit" name="button" onclick="bought('5');" value="Buy Now" /></td>
  </tr></tfoot>
  </table></td>
  </tr>
  </table>
  </body>
  </html>
The JavaScript file
var http = getHTTPObject();
var cellEntryDate;
var cellExitDate;
var time;
var queue1 = "";
var queue2 = "";
var flag = 0;
var done = 0;
var tempQueue = "";
var userId = new Date();
userId = userId.getTime();
var startpredict = 0;
var predictProduct = 0;

// Schedule the first prediction request 10 seconds from now.
function autoPredict() {
  setTimeout(predict, 10000);
}

// Ask the server which product it predicts for this user.
function predict() {
  http.open("GET", "predict.php?userId=" + userId, true);
  http.onreadystatechange = predictResponse;
  http.send(null);
}

// Highlight the predicted product column, then poll again after 10 seconds.
function predictResponse() {
  if (http.readyState == 4) {
    predictProduct = http.responseText;
    var colName = Number(predictProduct) + 1;
    document.getElementById("cg2").className = "";
    document.getElementById("cg3").className = "";
    document.getElementById("cg4").className = "";
    document.getElementById("cg5").className = "";
    document.getElementById("cg6").className = "";
    document.getElementById("cg" + colName).className = "oce-predict";
    alert("Product : " + predictProduct);
    setTimeout(predict, 10000);
  }
}

function handleHttpResponse() {
  if (http.readyState == 4) {
    // The original called startIt(), which this file does not define;
    // the function declared below is start_It().
    start_It();
  }
}

function handleHttpResponseBought() {
  if (http.readyState == 4) {
    alert("Thanks for Participating");
  }
}

// Start (or continue) the 2-second upload cycle and, once, the prediction cycle.
function start_It() {
  if (done == 0) {
    setTimeout(sendData, 2000);
  }
  if (startpredict == 0) {
    ++startpredict;
    autoPredict();
  }
}

// Upload the queue that has been filling and switch recording to the other one.
function sendData() {
  var query_string;
  if (flag == 0) {
    queue2 = "";
    flag = 1;
    query_string = "data.php?userId=" + userId + "&queue=" + queue1;
    queue1 = "";
  } else {
    queue1 = "";
    flag = 0;
    query_string = "data.php?userId=" + userId + "&queue=" + queue2;
    queue2 = "";
  }
  http.open("GET", query_string, true);
  http.onreadystatechange = handleHttpResponse;
  http.send(null);
}

// Record the moment the pointer enters a table cell.
function movement_in() {
  cellEntryDate = new Date();
}

// Append "cell:milliseconds_" for the cell the pointer has just left.
function movement_out(cell) {
  cellExitDate = new Date();
  time = cellExitDate.getTime() - cellEntryDate.getTime();
  if (done == 0) {
    if (flag == 0) {
      queue1 = queue1 + cell + ":" + time + "_";
    } else {
      queue2 = queue2 + cell + ":" + time + "_";
    }
  }
}

// Record the purchase and stop tracking.
function bought(product) {
  done = 1;
  var query_bought = "bought.php?userId=" + userId + "&product=" + product;
  http.open("GET", query_bought, true);
  http.onreadystatechange = handleHttpResponseBought;
  http.send(null);
}

// Create a request object: ActiveX on old Internet Explorer, XMLHttpRequest elsewhere.
function getHTTPObject() {
  var xmlhttp;
  /*@cc_on
  @if (@_jscript_version >= 5)
    try {
      xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
    } catch (e) {
      try {
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
      } catch (E) {
        xmlhttp = false;
      }
    }
  @else
    xmlhttp = false;
  @end @*/
  if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
    try {
      xmlhttp = new XMLHttpRequest();
    } catch (e) {
      xmlhttp = false;
    }
  }
  return xmlhttp;
}

function hiliteColumn(e) {
  var o = (document.all) ? e.srcElement : e.target;
  if (o.nodeName != "TD") return;
  document.getElementById("cg" + (o.cellIndex + 1)).className = "over";
}

function resetColumn(e) {
  var o = (document.all) ? e.srcElement : e.target;
  if (o.nodeName != "TD") return;
  document.getElementById("cg" + (o.cellIndex + 1)).className = "";
}
The CSS file
body {
  margin-left: 0px;
  margin-top: 0px;
  margin-right: 0px;
  margin-bottom: 0px;
  text-align: left;
}
colgroup.over {
  background: #ebeeff;
}
.oce-first {
  background: #d0dafd;
  border-right: 10px solid transparent;
  border-left: 10px solid transparent;
  min-width: 199px;
  font-size: 14px;
  padding: 12px 15px;
  color: #039;
  text-align: justify;
}
.oce-predict {
  background: #d0dafd;
  border-right: 3px solid #F00;
  border-left: 3px solid #F00;
  border-top: 3px solid #F00;
  border-bottom: 3px solid #F00;
  min-width: 199px;
  font-size: 14px;
  padding: 12px 15px;
  color: #039;
  text-align: justify;
}
table.one-column-emphasis {
  font-family: "Lucida Sans Unicode", "Lucida Grande", Sans-Serif;
  font-size: 12px;
  width: 100%;
  border-collapse: collapse;
  color: #969;
}
table.one-column-emphasis th {
  font-size: 14px;
  font-weight: bold;
  padding: 12px 15px;
  color: #039;
  text-align: center;
}
table.one-column-emphasis td {
  padding: 10px 15px;
  color: #669;
  border-top: 1px solid #e8edff;
  min-width: 166px;
  text-align: center;
}
table.one-column-emphasis tr:hover td {
  background: #ebeeff;
  text-align: center;
}
table.one-column-emphasis tr:hover td:hover {
  color: #039;
  background: #94acff;
}
.bold {
  font-weight: bold;
}
.italics {
  font-style: italic;
}
.oce-first {
  text-align: justify;
}
The PHP scripts
data.php
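<?php
// Store one batch of "cell:time_" dwell-time records uploaded by sendData().
include("connect.php");
// $_GET is used in place of the deprecated $HTTP_GET_VARS, and every value is
// escaped before it is placed inside an SQL string.
$queue = $_GET['queue'];
$userId = mysql_real_escape_string($_GET['userId']);
$queueArray = explode("_", $queue);
$count = substr_count($queue, "_");
for ($i = 0; $i < $count; $i++) {
    $values = explode(":", $queueArray[$i]);
    $cell = mysql_real_escape_string($values[0]);
    $time = mysql_real_escape_string($values[1]);
    mysql_query("INSERT into data values(\"".$userId."\",\"".$cell."\",\"".$time."\")");
}
mysql_close($conn);
?>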
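The queue string that data.php splits apart is the one movement_out() builds on the client: a run of "cell:time_" entries. As an illustration only, the same decoding can be sketched in JavaScript. The helper name parseQueue and the sample string are assumptions made for this sketch and are not part of the thesis code.

```javascript
// Hypothetical helper (not in the thesis code): decode the
// "cell:time_cell:time_" string built by movement_out().
function parseQueue(queue) {
  // Like data.php, keep only entries that are terminated by "_":
  // split() leaves one trailing piece after the last "_", which is dropped.
  return queue
    .split("_")
    .slice(0, -1)
    .map(function (entry) {
      var parts = entry.split(":");
      return { cell: parts[0], time: Number(parts[1]) };
    });
}

// Sample queue: cell t1 visited twice, cell u2 once (times in milliseconds).
var records = parseQueue("t1:120_u2:340_t1:60_");
console.log(records);
```

Each decoded record corresponds to one row that data.php inserts into the data table for the current user.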
connect.php
<?php
$dbhost = 'localhost:8889';
$dbuser = 'root';
$dbpass = 'root';
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ('Error connecting to mysql');
$dbname = 'MSc';
mysql_select_db($dbname);
?>
bought.php
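<?php
// Record which product the user finally bought.
include("connect.php");
// $_GET is used in place of the deprecated $HTTP_GET_VARS, and both values are
// escaped before they are placed inside an SQL string.
$product = mysql_real_escape_string($_GET['product']);
$userId = mysql_real_escape_string($_GET['userId']);
mysql_query("INSERT into bought values(\"".$userId."\",\"".$product."\")");
mysql_close($conn);
?>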
alignData.php
<?php
include("connect.php");
// For every user who bought a product, collapse that user's raw rows into one
// training row of finalData: one column per visited cell plus the product bought.
$result = mysql_query("SELECT * FROM `bought`");
while ($row = mysql_fetch_array($result)) {
    $result_1 = mysql_query("SELECT * FROM `data` WHERE `userID`=\"".$row['userId']."\" order by `cellID`");
    $columnNames = "";
    $values = "";
    $row_1 = mysql_fetch_array($result_1);
    $previous_column = $row_1['cellID'];
    $previous_value = $row_1['time'];
    // Rows are ordered by cell, so consecutive rows for the same cell are summed.
    while ($row_1 = mysql_fetch_array($result_1)) {
        if ($previous_column == $row_1['cellID']) {
            $previous_value += $row_1['time'];
        }
        else {
            $columnNames = $columnNames.",".$previous_column;
            $values = $values.",\"".$previous_value."\"";
            $previous_value = $row_1['time'];
            $previous_column = $row_1['cellID'];
        }
    }
    $columnNames = $columnNames.",".$previous_column.",product";
    $values = $values.",\"".$previous_value."\",\"".$row['product']."\"";
    mysql_query("INSERT INTO finalData(userID".$columnNames.") values (\"".$row['userId']."\"".$values.")");
    // The raw rows are no longer needed once the training row has been written.
    mysql_query("DELETE from `bought` where `userId` = \"".$row['userId']."\"");
    mysql_query("DELETE from `data` where `userID` = \"".$row['userId']."\"");
}
mysql_close($conn);
?>
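alignData.php sums the dwell time per cell, and predict.php additionally divides each sum by the user's total time so that every feature is a fraction of the session. A minimal JavaScript sketch of those two steps follows. The function names and the sample records are hypothetical and serve only to illustrate the arithmetic.

```javascript
// Hypothetical helpers (not in the thesis code) illustrating the aggregation.

// Sum the dwell time of every record that belongs to the same cell.
function totalTimePerCell(records) {
  var totals = {};
  records.forEach(function (r) {
    totals[r.cell] = (totals[r.cell] || 0) + r.time;
  });
  return totals;
}

// Divide each cell total by the overall total, as predict.php does.
function normalize(totals) {
  var cells = Object.keys(totals);
  var sum = cells.reduce(function (acc, cell) { return acc + totals[cell]; }, 0);
  var fractions = {};
  cells.forEach(function (cell) {
    // Guard against an empty session, where the total time is zero.
    fractions[cell] = sum > 0 ? totals[cell] / sum : 0;
  });
  return fractions;
}

// Sample data: cell t1 visited twice, cell u2 once (times in milliseconds).
var sample = [
  { cell: "t1", time: 120 },
  { cell: "u2", time: 340 },
  { cell: "t1", time: 60 }
];
var totals = totalTimePerCell(sample);   // t1: 180, u2: 340
var fractions = normalize(totals);       // t1: 180/520, u2: 340/520
console.log(totals, fractions);
```

The fractions are on the same 0 to 1 scale as the thresholds that appear in the decision tree of predict.php.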
predict.php
<?php
include("connect.php");
$totalTime = 0;
$userId = mysql_real_escape_string($_GET['userId']);
$result_1 = mysql_query("SELECT * FROM `data` WHERE `userID`=\"".$userId."\" order by cellID");
$result_2 = mysql_query("SELECT * FROM `data` WHERE `userID`=\"".$userId."\" order by cellID");
// First pass: total time the user has spent over all cells.
while ($row_2 = mysql_fetch_array($result_2))
    $totalTime += $row_2['time'];
$columnNames = "";
$values = "";
$row_1 = mysql_fetch_array($result_1);
$previous_column = $row_1['cellID'];
$previous_value = $row_1['time'];
// Second pass: sum the time per cell and store its share of the total in a
// global variable named after the cell (for example $b5).
while ($row_1 = mysql_fetch_array($result_1)) {
    if ($previous_column == $row_1['cellID']) {
        $previous_value += $row_1['time'];
    }
    else {
        $$previous_column = $previous_value/$totalTime;
        $previous_value = $row_1['time'];
        $previous_column = $row_1['cellID'];
    }
}
$$previous_column = $previous_value/$totalTime;
decisionTree();
//neuralNetwork();

function decisionTree() {
    // PHP functions do not see global variables by default, so the per-cell
    // values computed above are imported into the local scope here.
    extract($GLOBALS, EXTR_SKIP);
    $model_DT = 0;
    if($b5 <= 0.04509)
      if($k4 <= 0.013828)
        if($v1 <= 0.000362)
          if($r0 <= 0.000626)
            if($d5 <= 0.003481)
              if($d5 <= 0.001586)
                if($g4 <= 0.033267)
                  if($s3 <= 0.004874)
                    if($u1 <= 0.002108)
                      if($f1 <= 0.039667)
                        if($f4 <= 0.028894)
                          if($i4 <= 0.004699)
                            if($d2 <= 0.001173)
                              if($e5 <= 0.001377)
                                if($e1 <= 0.029566)
                                  if($r3 <= 0.000861)
                                    if($c1 <= 0.043665)
                                      if($a3 <= 0.206815)
                                        if($b1 <= 0.007319)
                                          if($f3 <= 0.001471)
                                            if($b4 <= 0.00214) $model_DT=2;
                                            else
                                              if($a4 <= 0.004126) $model_DT=3;
                                              else $model_DT=2;
                                          else $model_DT=3;
                                        else
                                          if($b3 <= 0.123969) $model_DT=2;
                                          else $model_DT=1;
                                      else $model_DT=1;
                                    else $model_DT=1;
                                  else $model_DT=3;
                                else $model_DT=1;
                              else $model_DT=3;
                            else
                              if($s4 <= 0.002873) $model_DT=2;
                              else $model_DT=4;
                          else $model_DT=1;
                        else $model_DT=4;
                      else $model_DT=3;
                    else $model_DT=3;
                  else
                    if($q1 <= 0.004708)
                      if($r4 <= 0.007391) $model_DT=3;
                      else $model_DT=2;
                    else $model_DT=2;
                else
                  if($g5 <= 0.004141)
                    if($k4 <= 0.001354) $model_DT=4;
                    else $model_DT=3;
                  else $model_DT=2;
              else $model_DT=4;
            else
              if($g5 <= 0.004141)
                if($b5 <= 0.002996)
                  if($g4 <= 0.003922) $model_DT=2;
                  else $model_DT=1;
                else $model_DT=3;
              else $model_DT=5;
          else $model_DT=4;
        else
          if($s4 <= 0.005561)
            if($t4 <= 0.002371)
              if($e0 <= 0.001979)
                if($h2 <= 0.005305) $model_DT=1;
                else $model_DT=2;
              else $model_DT=2;
            else $model_DT=2;
          else $model_DT=2;
      else
        if($f5 <= 0.001805) $model_DT=4;
        else $model_DT=2;
    else
      if($t3 <= 0.000515)
        if($d4 <= 0.008991)
          if($e2 <= 0.011901)
            if($a1 <= 0.001341)
              if($g2 <= 0.001762) $model_DT=4;
              else $model_DT=5;
            else $model_DT=5;
          else $model_DT=4;
        else $model_DT=2;
      else $model_DT=3;
    echo $model_DT;
}
function neuralNetwork() {
    // PHP functions do not see global variables by default, so the per-cell
    // values computed above are imported into the local scope here.
    extract($GLOBALS, EXTR_SKIP);
    $Node5=(-0.0209449762256399)
        +($a0*0.0120761574490061)+($a1*-0.0174298014185729)+($a2*-0.0175622955697642)+($a3*-0.000798046164731245)+($a4*-0.00566210278243689)+($a5*-0.00257021437573848)
        +($b0*0.0813554156049207)+($b1*-0.0383601651270091)+($b2*0.0315342748963075)+($b3*0.04750940128612)+($b4*0.00444930879229902)+($b5*0.0447743155601993)
        +($c0*0.0127846301489485)+($c1*0.0167829106398277)+($c2*0.0412283962113621)+($c3*0.0647197008365273)+($c4*0.026137495413712)+($c5*0.0292672102649498)
        +($d0*0.0575247995032596)+($d1*-0.0248903478567491)+($d2*-0.0356248056960633)+($d3*0.0131503378763436)+($d4*-0.00943722882163672)+($d5*0.0254130310753136)
        +($e0*0.0953293388209953)+($e1*-0.0358630730881965)+($e2*0.09184645890614)+($e3*0.0879998946588433)+($e4*-0.0210989430518799)+($e5*0.0236328879965554)
        +($f0*0.0521255666178908)+($f1*0.0562279524027289)+($f2*0.0420766593208718)+($f3*0.0219358641315261)+($f4*0.0500915161629286)+($f5*0.0598788090622592)
        +($g0*-0.0106339935340819)+($g1*0.0158371741591566)+($g2*0.0828753056435395)+($g3*-0.0152508552198513)+($g4*-0.00815101349601804)+($g5*0.0268439313590316)
        +($h0*0.070123678107641)+($h1*-0.0147305324346031)+($h2*0.0517135568746786)+($h3*-0.0117294349734072)+($h4*-0.00594235655570873)+($h5*0.0410639065208286)
        +($i0*-0.00105630930040345)+($i1*-0.00543787837624847)+($i2*0.0603755263497366)+($i3*0.0287693595250936)+($i4*0.0554227984526808)+($i5*0.0600355834517169)
        +($j0*0.0186135251521197)+($j1*0.00984875030922667)+($j2*0.0193290574626347)+($j3*0.021484574396215)+($j4*0.0484829773111019)+($j5*0.0233728871769681)
        +($k0*0.0410110073637687)+($k1*-0.00743846515678319)+($k2*0.0446579060767132)+($k3*0.00789530586935209)+($k4*0.0185589336156669)+($k5*0.0178833473514336)
        +($l0*0.0366297156412459)+($l1*0.0297884220860898)+($l2*0.0450253751867714)+($l3*0.0705159823038729)+($l4*0.074643360814636)+($l5*0.049178643898654)
        +($m0*0.00649293306157912)+($m1*0.0235761949995652)+($m2*0.0282972581223614)+($m3*0.00995247757969736)+($m4*0.0635360916248171)+($m5*-0.0185514952082912)
        +($n0*0.0798799834823821)+($n1*-0.0367274799798666)+($n2*0.0461992904934746)+($n3*0.0354383668658634)+($n4*-0.00123240277220675)+($n5*-0.0150807856098709)
        +($o0*-0.0260784636646052)+($o1*0.0553028912171675)+($o2*0.0802089447351997)+($o3*-0.0235601224487924)+($o4*-0.0281363990127924)+($o5*0.0319917291420718)
        +($p0*-0.0257109331590629)+($p1*-0.0279769700636828)+($p2*0.0433907293866429)+($p3*-0.0310545628159805)+($p4*0.0348153094694314)+($p5*-0.00776438719161176)
        +($q0*-0.0069736497593223)+($q1*0.0161811177301145)+($q2*0.0576906924312276)+($q3*0.0441712928131897)+($q4*0.0165528172670987)+($q5*-0.0274805831321372)
        +($r0*0.0120430047036489)+($r1*-0.000892653621313331)+($r2*0.0868045378672117)+($r3*0.0281943074796785)+($r4*0.0670839346752799)+($r5*0.0110772507057164)
        +($s0*0.0214207237015366)+($s1*-0.032511653106313)+($s2*0.0328856849361516)+($s3*0.0313926662260086)+($s4*0.0111177031525771)+($s5*0.0284289901014687)
        +($t0*0.0428425565992686)+($t1*0.0534413420371503)+($t2*0.0244766875457709)+($t3*0.0647078085232812)+($t4*0.0112235270733354)+($t5*0.0097765520400492)
        +($u0*0.0259846759422365)+($u1*-0.0430507927467189)+($u2*0.107107831659775)+($u3*0.0467301403971514)+($u4*0.0571975966844622)+($u5*-0.0079845822250066)
        +($v0*0.0303173561775128)+($v1*-0.0043169837441232)+($v2*0.0866140345320475)+($v3*0.00261036151061667)+($v4*0.00523185366643474)+($v5*-0.0239702999191261);
    // Similar code for the remaining 72 nodes has been omitted because it runs to about 40 pages.
    // The complete code is available online for reference.
    // $Node0 to $Node4 (the output nodes) are computed in the omitted code.
    $out1 = 1/(1+(1/pow(2.718282,$Node0)));
    $out2 = 1/(1+(1/pow(2.718282,$Node1)));
    $out3 = 1/(1+(1/pow(2.718282,$Node2)));
    $out4 = 1/(1+(1/pow(2.718282,$Node3)));
    $out5 = 1/(1+(1/pow(2.718282,$Node4)));
    // Report the product whose output node has the highest activation.
    $max = max($out1, $out2, $out3, $out4, $out5);
    if($max == $out1) echo "1";
    else if($max == $out2) echo "2";
    else if($max == $out3) echo "3";
    else if($max == $out4) echo "4";
    else if($max == $out5) echo "5";
}
mysql_close($conn);
?>
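The last step of neuralNetwork() applies a sigmoid to each of the five output nodes and reports the product whose activation is highest. The following JavaScript sketch shows that selection step in isolation. The function names and the sample node values are made up for illustration and do not come from the trained model.

```javascript
// Hypothetical helpers (not in the thesis code) illustrating the output step.

// Logistic sigmoid; 1/(1+(1/pow(e, x))) in predict.php is the same function.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Return the 1-based product number whose output node has the highest
// activation. On a tie the earliest node wins, as in the PHP if/else chain.
function predictClass(nodeValues) {
  var best = 0;
  for (var i = 1; i < nodeValues.length; i++) {
    if (sigmoid(nodeValues[i]) > sigmoid(nodeValues[best])) {
      best = i;
    }
  }
  return best + 1;
}

// Made-up values for the five output nodes; the third one is the largest.
var predicted = predictClass([-1.2, 0.4, 2.1, -0.3, 0.9]);
console.log(predicted); // prints 3
```

Because the sigmoid is an increasing function, the node with the largest raw value is also the node with the largest activation.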