Workshop: Make Predictions From Your Data Using R. UTM 15 July 2019

Workshop Instructors: Dr. Norhaiza Ahmad, Dr. Noraslinda M. Ismail, Dr. Shariffah Suhaila Syed Jamaludin

Make Predictions from your Data using R

PART A: R INTRO
Dr. Norhaiza Ahmad
Department of Mathematical Sciences
Faculty of Science
Universiti Teknologi Malaysia
http://people.utm.my/norhaiza/

@haiza

Start your R Session on the PC in this Lab!

1. Go to Desktop
2. Click Folder: Mathematics Software or Math Software
3. Click Folder: R
4. There are three R applications:
   i. R i386 3.4.0
   ii. R x64 3.4.0
   iii. RStudio

Choose RStudio

Otherwise go to the START button and search for RStudio

Workshop Schedule: Make Predictions from your Data using R

PART     ITEM                                                    DETAILS
PART A   Intro to Predictive Linear Model                        Introduction to Modelling: theory & terminology. Data structure.
         R Quickie                                               About R. R base. Basic R Syntax. RStudio Interface. Packages. Help.
PART B   Predictive Linear Model for Continuous Data (Response)  (i) Basic Regression Model, Single Predictor Variable
                                                                 (ii) Multiple Linear Regression, Model Assessment
PART C   Predictive Linear Model for Counts Data (Response)      Intro to Generalised Linear Model

Data: Big Data vs Small Data

Extract Information From Data

• Classification
• Discrimination
• Comparison
• Relationship
• Prediction

Example: Predicting Body Mass Index

Database of individuals. Variables measured: height, weight, body fat

How to use BMI to predict body fat percentage?

Predictive Modelling

Predictive modeling is a technique that uses mathematical and computational methods to predict or forecast an event or outcome.

Types of Predictive Models

Parametric Models
• models use a fixed number of parameters
• based on an underlying probabilistic model

Non-Parametric Models
• models do not have a fixed number of parameters
• based on an underlying probabilistic model
• e.g. Decision Trees etc.

This workshop focusses on: Regression Model, i.e. linear regression model

General modeling framework

We will be using a dataset called house_prices

• House sale prices for King County, Washington, USA, which includes Seattle.
• It includes homes sold between May 2014 and May 2015.
• 21613 observations. 21 variables.

id             1         2         3         4         5         6         ..  21613
date           20141013  20141209  20150225  20141209  20150218  20140512  ..  20141015
price          221900    538000    180000    604000    510000    1225000   ..  325000
bedrooms       3         3         2         4         3         4         ..  2
bathrooms      1         2.25      1         3         2         4.5       ..  0.75
sqft_living    1180      2570      770       1960      1680      5420      ..  1020
sqft_lot       5650      7242      10000     5000      8080      101930    ..  1076
floors         1         2         1         1         1         1         ..  2
waterfront     0         0         0         0         0         0         ..  0
view           0         0         0         0         0         0         ..  0
condition      3         3         3         5         3         3         ..  3
grade          7         7         6         7         8         11        ..  7
sqft_above     1180      2170      770       1050      1680      3890      ..  1020
sqft_basement  0         400       0         910       0         1530      ..  0
yr_built       1955      1951      1933      1965      1987      2001      ..  2008
yr_renovated   0         1991      0         0         0         0         ..  0
zipcode        98178     98125     98028     98136     98074     98053     ..  98144
lat            47.5112   47.721    47.7379   47.5208   47.6168   47.6561   ..  47.5941
long           -122.257  -122.319  -122.233  -122.393  -122.045  -122.005  ..  -122.299
sqft_living15  1340      1690      2720      1360      1800      4760      ..  1020
sqft_lot15     5650      7639      8062      5000      7503      101930    ..  1357

Question: Can we predict the sale price of houses based on their features?
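A minimal sketch for loading the data in R. The slide does not say where the data set comes from; this assumes it is the `house_prices` data frame shipped with the moderndive package (one of the packages listed later in this workshop) and that the package may or may not be installed yet.

```r
# Assumption: house_prices is the data frame bundled with the moderndive package.
if (requireNamespace("moderndive", quietly = TRUE)) {
  data("house_prices", package = "moderndive")
  print(dim(house_prices))      # number of observations and number of variables
  print(names(house_prices))    # variable names, e.g. price, bedrooms, sqft_living
} else {
  message("Package 'moderndive' is not installed yet.")
}
```

If the package is missing, the block only prints a message, so it is safe to run before the installation steps.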

Consider only certain features of the houses: sqft_living, condition, bedrooms, yr_built, waterfront

id           1       2       3       4       5       6        ..  21613
price        221900  538000  180000  604000  510000  1225000  ..  325000
bedrooms     3       3       2       4       3       4        ..  2
sqft_living  1180    2570    770     1960    1680    5420     ..  1020
waterfront   0       0       0       0       0       0        ..  0
condition    3       3       3       5       3       3        ..  3
yr_built     1955    1951    1933    1965    1987    2001     ..  2008

Can we predict the sale price of houses based on the features of houses?

$$y = f(x) + \varepsilon$$

• $y$: house sale prices (in dollars). Response/outcome variable of interest.
• $x$: square feet, condition, bedrooms, year house was built, waterfront. Explanatory/predictor variables.
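To make the response/predictor split concrete, here is a small sketch that types in only the six observations printed on the slide (not the full 21613 rows) and separates the response from the predictors. The object names `houses`, `response` and `predictors` are chosen here for illustration.

```r
# The six houses shown on the slide, entered by hand.
houses <- data.frame(
  price       = c(221900, 538000, 180000, 604000, 510000, 1225000),
  bedrooms    = c(3, 3, 2, 4, 3, 4),
  sqft_living = c(1180, 2570, 770, 1960, 1680, 5420),
  waterfront  = c(0, 0, 0, 0, 0, 0),
  condition   = c(3, 3, 3, 5, 3, 3),
  yr_built    = c(1955, 1951, 1933, 1965, 1987, 2001)
)

str(houses)                      # structure: 6 observations of 6 variables

response   <- houses$price       # y: the outcome we want to predict
predictors <- houses[, c("sqft_living", "condition", "bedrooms",
                         "yr_built", "waterfront")]   # x: explanatory variables
```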

Background: General modeling framework formula

$$y = f(x) + \varepsilon$$

where
• $y$: response/outcome variable of interest
• $x$: explanatory/predictor variable(s)
• $f$: function of the relationship between $y$ and $x$
• $\varepsilon$: unsystematic error component, i.e. noise
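The framework can be illustrated by simulating data in R. This is only a sketch: the straight-line `f` and the noise level are made-up assumptions, chosen so that the signal and the noise can be seen side by side.

```r
set.seed(1)                                  # makes the random noise reproducible

x   <- seq(1, 10, by = 1)                    # predictor values
f   <- function(x) 5 + 2 * x                 # hypothetical f, for illustration only
eps <- rnorm(length(x), mean = 0, sd = 1)    # unsystematic error (noise)
y   <- f(x) + eps                            # observed response = signal + noise

data.frame(x, signal = f(x), noise = eps, y)
```

In real data only `x` and `y` are observed; the `signal` and `noise` columns are visible here only because the data were simulated.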

The modelling problem

Consider

$$y = f(x) + \varepsilon$$

• $y$ (response/outcome variable of interest): known, given by data ($n$ observations)
• $x$ (explanatory/predictor variables): known, given by data ($n$ observations)
• $f$ (function of the relationship between $y$ and $x$): unknown
• $\varepsilon$: unknown

Aim:
1. Fit a model $\hat{f}(x)$ that approximates $f(x)$ while ignoring $\varepsilon$. ⇒ Separate signal from noise.
2. Generate fitted/predicted values $\hat{y} = \hat{f}(x)$.
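The two aims can be sketched on simulated data, where the true $f$ is known and the fit can be checked against it. The true line $2 + 3x$ and the noise level are assumptions made for this illustration; `lm()` and `fitted()` are standard functions from the stats package.

```r
set.seed(2019)                       # reproducible simulation

n <- 200
x <- runif(n, min = 0, max = 10)
f <- function(x) 2 + 3 * x           # hypothetical true f (unknown in practice)
y <- f(x) + rnorm(n, sd = 2)         # observed data: signal plus noise

fit   <- lm(y ~ x)                   # Aim 1: fit a model, our estimate of f
y_hat <- fitted(fit)                 # Aim 2: fitted/predicted values

coef(fit)                            # estimates should be near 2 and 3
head(cbind(y, y_hat, residual = y - y_hat))
```

The residuals `y - y_hat` are what is left over after the fitted signal is removed.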

Fitted (Predicted) Linear Regression Model

General modelling, i.e. the observed value of $y$ is

$$y = f(x) + \varepsilon$$

To fit a Linear Regression Model, we estimate $f(\cdot)$ by $\hat{f}(\cdot)$.

Thus, the Fitted (Predicted) Linear Regression Model is given by

$$\hat{y} = \hat{f}(x)$$
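As a rough illustration, a straight line can be fitted to the six houses shown earlier in the slides, using price as the response and sqft_living as a single predictor. Six observations are far too few for a serious model, and the new house of 2000 square feet is a hypothetical input; the point is only to show where $\hat{f}$ and $\hat{y}$ appear in R.

```r
# First six observations from the house_prices slide.
houses <- data.frame(
  price       = c(221900, 538000, 180000, 604000, 510000, 1225000),
  sqft_living = c(1180, 2570, 770, 1960, 1680, 5420)
)

fit <- lm(price ~ sqft_living, data = houses)   # f-hat: a fitted straight line
coef(fit)                                       # estimated intercept and slope
fitted(fit)                                     # y-hat for the six observed houses

new_house <- data.frame(sqft_living = 2000)     # hypothetical new house
predict(fit, newdata = new_house)               # predicted price for the new house
```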

Recall: Types of Data

Data
• Qualitative
   - Nominal (Categorical)
   - Ordinal (Rank)
• Quantitative
   - Discrete (Counts, Frequency)
   - Continuous (Interval)

SCOPE OF WORKSHOP

Data → Prediction Using Linear Regression Model, Using R Software

• Continuous Data (Response Variable): Simple Linear Regression, Multiple Linear Regression
• Discrete Data, Counts Data (Response Variable): Generalized Linear Model

About R: a quickie
1. About R
2. Interface options for R
3. RStudio
4. Anatomy of R: Functions and Packages
5. Saving & Quitting

About R?

• A computer language, with orientation towards statistical applications
• Open-source software, non-commercial, FREE
   § open exchange, publicly accessible, community-oriented software
• Origin in academia:
   § solid foundation of core statistical and numerical algorithms, and continues to grow to this end.

Why R?

• FREE
• Large community of R users
• Grows in tandem with development in Statistics
• Economic sustainability
• Application codes: latest research work is likely to be available to use.
• Help available: extensive help documentation in system. Just ask! Or browse the archived Q&A.

How to download R?

R base: http://www.r-project.org/

Points to Note

• When you download and install R, you are downloading and installing base R and selected packages.
• Once the R download is complete:
   § an R icon will appear on your desktop.

Main Interface Options - R Users

• R base
• R Commander
• RStudio

How to download RStudio?

Procedure to download RStudio on your own PC:
(1) Make sure you have already downloaded R base
(2) Download the free version of RStudio at https://www.rstudio.com/products/rstudio/download/

R Interface: R base vs RStudio

[Screenshots: R base, RStudio]

Start your RStudio Session!

1. Go to Desktop
2. Click Folder: Mathematics Software or Math Software
3. Click Folder: R
4. Choose RStudio

Otherwise go to the START button and search for RStudio

RStudio
• Navigate Panels
• Entering & Executing Commands
   - R console
   - R script
• Functions in R & Help
• R packages

Navigating RStudio: The Panels

4 panels shown:
• Source Panel: write your commands here (like a notepad)
• Console Panel: write your commands after the prompt >
• Environment Panel
• Viewer Panel

Panels on left: run codes/commands
Panels on right: maintain the working environment

ASIDE: Navigating RStudio

You can specify which panel to display:
At the Ribbon, choose tab View → Panes

Or display a particular pane, e.g. to display Console only:
At the Ribbon, choose tab View → Panes → Zoom Console

To return display of all panels:
At the Ribbon, choose tab View → Panes → Show All Panes

How to Run & Execute R commands?

Method 1: R-CONSOLE
- Commands are executed
- Like a rough paper
- Not convenient for code storage

Method 2: R-SCRIPT

How to Run commands in R: Method 1: R Console

Type the following on the R Console Panel:

> 6+3
[1] 9

Here 6+3 is what you write and [1] 9 is the output. The [1] is the index of the first entry of the output.

• Write your R commands after each R prompt (>)
• Hit Enter to execute command
• Other operators: +, -, *, /
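The other operators can be tried the same way. A short sketch; in a script or code listing the `>` prompt is left out, and the comment after each line states the value R prints.

```r
6 + 3    # addition, prints 9
6 - 3    # subtraction, prints 3
6 * 3    # multiplication, prints 18
6 / 3    # division, prints 2
```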

Handy Tips

• Use the up and down arrow keys to recall previous R line commands in the console

Making Comments in R

• Comment lines in R are denoted by "#"
• Anything written after "#" on a line will not be read as an R command
• Comment lines are useful for making notes in your program

> 31 %% 7 #remainder after division of 31 by 7

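A short sketch of both comment styles. The `%/%` operator is not on the slide; it is included here only as a companion to `%%`.

```r
# A whole-line comment: R ignores everything on this line
31 %% 7     # remainder after division of 31 by 7, which is 3
31 %/% 7    # integer division of 31 by 7, which is 4
```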
R Object Assignment

• R is an object-oriented program.
• Each input/output can be assigned/stored to an object
• Object names are case-sensitive
• Once an R object is assigned, it can be called upon at any time as long as it is saved. Here the number 2 is stored in an object called "x".

# use symbol '=' or '<-' for assignment
> x = 2
> x
> y <- 2
> y
# CALL UP the r-object to display results (if required)
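A small sketch of assignment and case sensitivity. The object `X` is an extra example added here to show that upper and lower case names refer to different objects.

```r
x = 2        # assignment with =
y <- 2       # assignment with <-
x            # call up the object to display its value
y

X <- 10      # X is a different object from x (names are case-sensitive)
x + X        # uses both objects
```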

R Object Assignment

• In general, R objects are stored in an R workspace, also known as the global environment.

# Use ';' to combine line commands
> x = 2; x
> len = 2; len
> x=2; len=2; x+2

Handy Tips

• Use brackets around assignment to automatically call up stored object

# Use '()' to auto call up assignment
> (x=2)
> (len=2)
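Both shortcuts can be combined in one short session. A sketch; the object `z` is introduced here for illustration, and `ls()` is the standard function that lists the objects currently stored in the workspace.

```r
x = 2; len = 2; x + len    # three commands on one line; the last one prints 4

(z <- x * len)             # brackets: assign and display in one step, prints 4

ls()                       # list the objects stored in the workspace
```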

Important Rules on object assignment

• Variable names are case sensitive
• No blanks in name (can use _ or . to join words, but not -)
• Start with a letter (capital or lower-case)
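The rules can be checked directly in R. A sketch with made-up object names; `make.names()` is a base R function that converts arbitrary text into valid names, which gives a quick way to see what R accepts.

```r
house_price <- 221900    # underscore joins words: valid
house.price <- 221900    # dot joins words: valid
HousePrice  <- 538000    # a different object: names are case-sensitive

# house-price <- 1       # not valid: R reads this as house minus price
# house price <- 1       # not valid: blanks are not allowed in a name

# make.names() shows how R would repair invalid names
make.names(c("house price", "house-price", "2house"))
```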

How to Run & Execute R commands?

Method 2: R-SCRIPT

A script file: a file where you can type your commands and run them on the console at your own convenience. It is similar to a notepad/text file!

How to Run commands in R: Method 2: R Script

Now open a script file.

At the Ribbon, choose tab File → New File → R Script

This is a new script, "Untitled1". You can name it and save it like a notepad file.

Writing and executing commands from an R Script

Step 1: Type these in your R script file. Note: There is no ">" prompt in a script file.

# This is my first Rscript
6+3
6-3

Step 2: To run commands on a script file:
Place the cursor on the line you want to execute. Press the Run button on the tab. ⇒ The results of executing these lines will appear in the Console.

Step 3: Save your script file in your preferred drive/folder. ⇒ Similar to saving a notepad/text file.
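A saved script can also be run in one go from the console with `source()`. In this sketch the script is written to a temporary folder so that the example is self-contained; the file name `my_first_script.R` is hypothetical, and in practice you would point to the file you saved in Step 3.

```r
# Create a small script file (hypothetical name) in a temporary folder
script_file <- file.path(tempdir(), "my_first_script.R")
writeLines(c("# This is my first Rscript", "6+3", "6-3"), script_file)

# Run every line of the script; echo = TRUE shows each command and its result
source(script_file, echo = TRUE)
```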

Handy Tip

• You can also type the R codes in the Script file (Source panel) and execute each line by pressing Cmd + Enter (Ctrl + Enter on Windows). This is an alternative to pressing the Run icon on the Source Panel.
• It is advisable to write your code in the R Script file (Source panel), so that you will be able to save your work at the end of your coding session.

How R works: Anatomy of R

Packages in R = BRAIN of R.

• R has many codes for many inbuilt functions, datasets & Help documentation
• These are contained in 'Packages' developed by the R-team and the community

When you download and install R for the first time, you will automatically be downloading and installing the base package and selected packages (from CRAN).

• Base package
• Selected packages
• Add-on packages

Other packages can be added on too when required!

How R works: Anatomy of R
Examples of 'Packages' in R

Each package (e.g. 'base', 'stats', 'ggplot2') contains various:
• Functions
• Datasets
• Help

base package
• Core package
• Automatically installed when you download R

stats package
• General statistical applications

ggplot2 package
• Fancy data visualization

Many more packages with different capabilities!

How R works: Anatomy of R

• R Packages are stored in certain Repositories:

CRAN: R official/default
Bioconductor: R specific to bioinformatics
R-Forge: includes development versions of packages
GitHub: other repositories; not specific to R, but a repository for many open-source projects

Each repository holds many packages, and each package contains various functions, datasets and help.

Finding Packages in RStudio

To see all the packages already installed in your PC:

Go to Viewer Panel. Press the tab Packages. The list of packages already installed in your PC appears. Use the search finder to find a specific package name.

These are all the packages that are available in your R.

Installing Packages in RStudio

To install packages not available in your R session:

Go to Viewer Panel. Press the tab Packages.

1. Click Install
2. A pop-up window appears
3. Write the name of the package in the pop-up box. Example: abind
4. Press Install

Notice that the package that you have installed is now contained in the list of packages of your R session.

Loading Packages in RStudio

To load (i.e. use) a package in RStudio:

Go to Viewer Panel. Press the tab Packages.

1. Find the package that you want to use
2. Tick the box next to the name of the package, e.g. abind

Notice that: > library(abind) is shown in the R console

(Alternatively: manually write at the R prompt > library("name of package"))
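The same install-then-load steps can be typed as commands. A sketch using abind, the example package from the slides; the installation line is left commented out because it needs an internet connection, so uncomment it when you are ready to install.

```r
pkg <- "abind"    # example package name from the slides

if (!requireNamespace(pkg, quietly = TRUE)) {
  # install.packages(pkg)    # uncomment to download and install from CRAN
  message("Package '", pkg, "' is not installed yet.")
} else {
  library(pkg, character.only = TRUE)    # same effect as library(abind)
}
```

`character.only = TRUE` is needed only because the package name is stored in the object `pkg`; with the name typed directly, `library(abind)` is enough.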

Task

In the following part of the workshop, we will use specific packages in R.

Check if these packages are available in the RStudio of your PC:

• moderndive
• dplyr
• ggplot2
• tidyverse
• MASS
• glm2

If they are not in the list of packages ⇒ install the package(s).
If they are already in the list of packages ⇒ check if they are loaded in the system, otherwise load them up.
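The check can also be done with a few commands instead of scrolling through the Packages tab. A sketch, assuming nothing about what is installed on your PC; the object names are chosen here for illustration.

```r
needed <- c("moderndive", "dplyr", "ggplot2", "tidyverse", "MASS", "glm2")

is_installed  <- needed %in% rownames(installed.packages())
not_installed <- needed[!is_installed]
not_installed                          # packages that still need to be installed

# install.packages(not_installed)      # uncomment to install (needs internet)

is_loaded <- needed %in% .packages()   # TRUE for packages loaded in this session

data.frame(package = needed, installed = is_installed, loaded = is_loaded)
```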

R functions

• R commands are executed by running a function
• A function is a name, which is typed followed by a pair of brackets. Arguments are added inside the brackets.

> sqrt(2)
> sin(pi)

• Sometimes functions in R have extra arguments.

> sum(2,3,5)
> log(10,10)

• R has many in-built functions.
• These functions are contained in specific packages
• These functions can also be combined and programmed manually
• As functions are being built and contributed all the time ⇒ use HELP in R to (a) know which package they are in, (b) know how to use them
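A short sketch of the function calls above, plus one hand-written function to show what "programmed manually" can look like. The function `cube` is a made-up example, not part of R.

```r
sqrt(2)          # square root of 2
sin(pi)          # practically zero (a tiny rounding error is normal)

sum(2, 3, 5)     # several arguments: 2 + 3 + 5 = 10
log(10, 10)      # second argument is the base: log of 10 to base 10 = 1
log(10)          # without a base, log() gives the natural logarithm

# Combining existing functions into your own (hypothetical example)
cube <- function(x) x^3
cube(2)          # 2 * 2 * 2 = 8
```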

Getting help for R functions: in RStudio

Go to Viewer Panel. Press tab Help.

Help functions in RStudio

At the Viewer Panel, Help tab, use keyword search to find the function & related package.

Task: Search for linear model

Example search result: ggplot2: fortify.lm
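Help can also be requested from the console. A sketch using `lm`, the linear model function in the stats package, as the example; all of the calls are standard R help functions.

```r
?lm                            # open the help page for the function lm
help.search("linear model")    # keyword search across installed packages
args(lm)                       # quick look at the arguments lm accepts
```

`help.search("linear model")` can also be written as `??"linear model"`.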

Saving Stuff & Exiting

SAVING
• Make it a practice to save your work on a script file.
• Workspace is your current working environment. This includes all the functions, objects etc. that you have created in that session.
• Unless you need to recall certain objects regularly, we do not need to save the workspace.

EXITING
• To exit: hit X (top right corner), or at the R prompt:

> quit()

• This brings up the exit screen (shown as a screenshot on the original slide).
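If you do need certain objects again later, they can be saved to a file and loaded back. A sketch; the file name `my_workspace.RData` is hypothetical and a temporary folder is used so the example does not leave files on your PC.

```r
x <- 2                                                  # an object worth keeping

ws_file <- file.path(tempdir(), "my_workspace.RData")   # hypothetical file name
save(x, file = ws_file)                                 # save selected objects

rm(x)                                                   # remove x from the workspace
load(ws_file)                                           # bring it back from the file
x

# save.image() saves the whole workspace instead of selected objects.
# quit(save = "no") exits R without saving the workspace.
```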

NEXT

• PART B (i): Predictive Linear Model for Continuous Data (Response) - single predictor variable