Linear Regression in R

www.edureka.co/r-for-analytics

Upload: edureka

Post on 25-Jan-2017

What will you learn today?

What is Linear Regression?

How to Design a Linear Regression Model?

How to Compare Regression Models?

Hands-On: Linear Regression in R

Problem

Let's assume you own a restaurant where tips are part of a waiter's pay, and the amount of a tip depends on the total bill.

Let's see how we can predict the tip amount from the bill using linear regression.

Predicting the Tip

Suppose you don't have the bill amounts, so the only data you have is the tip amount for each of six orders: $5, $17, $11, $8, $14 and $5.

For the first meal order the waiter got a $5 tip, for the second meal order a $17 tip, and so on.

Let's visualize the data that we have

How to predict the next tip?

How can I predict the tip amount?

Since the only data we have is the tip amounts, all we can do is take their mean.

Conclusion

So the best estimate we can make for the next tip from the data we have is $10, which is the mean of all the tip amounts:

Mean = (5 + 17 + 11 + 8 + 14 + 5) / 6 = 10, i.e. $10

Note that when you have only one variable and no other information, the best prediction you can make is the mean of the sample data itself.
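As a quick check, the mean-only estimate can be reproduced in R from the six tip amounts in the formula above. The variable names `tips` and `best_guess` are illustrative choices, not names taken from the slides.

```r
# Tip amounts (in dollars) for the six meal orders
tips <- c(5, 17, 11, 8, 14, 5)

# With no other variable available, the sample mean is the estimate
best_guess <- mean(tips)
print(best_guess)  # 10
```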

Residuals (Errors)

The deviation between an actual value and its estimated value is called a residual, or error.

Note that when the estimate is the mean, the residuals always sum to zero: if you add up all the positive and negative deviations you get zero. In other words, the total positive deviation always equals the total negative deviation.
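A short R sketch of this property, using the six tip amounts from the earlier slides; `residuals_mean` is an assumed name, and the residual is taken as actual value minus estimated value.

```r
tips <- c(5, 17, 11, 8, 14, 5)

# Residual = actual value - estimated value (here the estimate is the mean)
residuals_mean <- tips - mean(tips)
print(residuals_mean)       # -5  7  1 -2  4 -5
print(sum(residuals_mean))  # 0
```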

Sum of Squared Residuals (Errors)

Squaring each residual (-5, 7, 1, -2, 4, -5) and adding the squares gives 25 + 49 + 1 + 4 + 16 + 25 = 120, so the sum of squared errors (SSE) is 120.
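The same SSE can be checked in R in one line. This is only a sketch on the six tip amounts from the earlier slides; `sse_mean` is an assumed name.

```r
tips <- c(5, 17, 11, 8, 14, 5)

# Square each deviation from the mean, then add the squares up
sse_mean <- sum((tips - mean(tips))^2)
print(sse_mean)  # 120
```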

Why Square the Residuals?

What do we get from squaring the residuals?

Key Points

By squaring the residuals (errors) we achieve the following:

It makes every deviation positive and emphasizes the larger ones, making them more obvious

It helps in comparing different analysis models

The goal of linear regression is to create a linear model that minimizes the sum of squared residuals (errors), SSE.
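To illustrate what minimizing SSE means before any bill data is involved, the sketch below compares a few constant guesses for the tip. The helper `sse_for_guess` and the candidate guesses are hypothetical, chosen only to show that the mean gives the smallest SSE of the three.

```r
tips <- c(5, 17, 11, 8, 14, 5)

# SSE when every tip is predicted by the same constant guess
sse_for_guess <- function(guess) sum((tips - guess)^2)

guesses <- c(8, 10, 12)
print(sapply(guesses, sse_for_guess))  # 144 120 144
```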

Improving the Current Model

The waiter's tip depends on the amount of the bill.

Until now we have used only the values of previous tips to estimate the next tip.

Next we will design a linear regression model that estimates the tip amount from the bill amount.

Let's visualize the data that we have

Note that the tip amount is the dependent variable, which depends on the bill amount, and the bill amount is the independent variable.

Linear Regression

Note that in linear regression the predicted value of the dependent variable (e.g. tip amount) is the mean of the values expected at a given value of the independent variable, not just a single observed value.

Linear Regression Equation
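The equation itself did not survive in this transcript. Using the b0 and b1 notation that appears on the later slides, the standard form of a simple linear regression line can be sketched as follows, where x (the bill amount) and \hat{y} (the predicted mean tip) are assumed symbol names:

```latex
\hat{y} = b_0 + b_1 x
```

Here b_0 is the Y intercept and b_1 is the slope of the line.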

Linear Regression Types

A linear regression model in which the data points are narrowly distributed around the regression line is much better than one in which they are broadly distributed.

[Figure: Narrow Distribution vs. Broad Distribution]

Linear Regression – a closer look

To draw a linear regression line we need the value of the slope (b1) and the value of the Y intercept (b0).

Linear Regression – Calculation

Linear Regression – Calculating Slope

The value of the slope (b1) is 0.1462.
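The calculation table is not reproduced in this transcript, so the sketch below uses hypothetical bill amounts (`bill`) picked only for illustration. With these assumed values, the usual least-squares slope formula gives the 0.1462 quoted above.

```r
# ASSUMPTION: bill amounts are illustrative, not taken from the transcript
bill <- c(34, 108, 64, 88, 99, 51)
tips <- c(5, 17, 11, 8, 14, 5)

# Least-squares slope: sum of cross-deviations / sum of squared x-deviations
b1 <- sum((bill - mean(bill)) * (tips - mean(tips))) /
  sum((bill - mean(bill))^2)
print(round(b1, 4))  # 0.1462
```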

Linear Regression – Calculating Y Intercept

The value of the Y intercept (b0) is -0.8188.
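The intercept follows from the fact that a least-squares line passes through the point of means. The sketch below again relies on hypothetical bill amounts (`bill`), since the slide's table is not in this transcript. With these assumed values, plugging in the rounded slope 0.1462 gives -0.8188, while the unrounded slope gives roughly -0.8203.

```r
# ASSUMPTION: bill amounts are illustrative, not taken from the transcript
bill <- c(34, 108, 64, 88, 99, 51)
tips <- c(5, 17, 11, 8, 14, 5)

b1 <- sum((bill - mean(bill)) * (tips - mean(tips))) /
  sum((bill - mean(bill))^2)

# The least-squares line passes through (mean(bill), mean(tips))
b0 <- mean(tips) - b1 * mean(bill)
b0_rounded_slope <- mean(tips) - 0.1462 * mean(bill)

print(round(b0, 4))                # -0.8203
print(round(b0_rounded_slope, 4))  # -0.8188
```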

Linear Regression – Putting the values

Let's put the values of the slope and Y intercept into the linear regression equation:

Predicted tip = -0.8188 + 0.1462 × bill amount

Linear Regression – Predicting Tip amount

Let's calculate the predicted tip amount for each bill.
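A minimal sketch of this step in R, using the slope and intercept quoted on the slides. The bill amounts are hypothetical, because the slide's data table is not part of this transcript, and `predicted_tip` is an assumed name.

```r
# ASSUMPTION: bill amounts are illustrative, not taken from the transcript
bill <- c(34, 108, 64, 88, 99, 51)

# Coefficients as quoted on the slides
b0 <- -0.8188
b1 <- 0.1462

# Apply the regression equation to every bill at once
predicted_tip <- b0 + b1 * bill
print(round(predicted_tip, 2))
```

Because R arithmetic is vectorized, one expression produces a prediction for every bill.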

Linear Regression – Drawing the regression line

Linear Regression – Calculating Residuals

Let's calculate the residuals (errors)

Linear Regression – Regression line with residuals

Linear Regression – Squaring the residuals (errors)

Let's calculate the sum of squared residuals
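The residual table is not reproduced in this transcript. As a sketch under the assumption of hypothetical bill amounts (`bill`), the regression residuals and their squared sum could be computed like this:

```r
# ASSUMPTION: bill amounts are illustrative, not taken from the transcript
bill <- c(34, 108, 64, 88, 99, 51)
tips <- c(5, 17, 11, 8, 14, 5)

b1 <- sum((bill - mean(bill)) * (tips - mean(tips))) /
  sum((bill - mean(bill))^2)
b0 <- mean(tips) - b1 * mean(bill)

# Residual = actual tip - tip predicted by the regression line
residuals_reg <- tips - (b0 + b1 * bill)
sse_reg <- sum(residuals_reg^2)

print(round(sse_reg, 2))  # 30.07
```

With these assumed values the regression residuals also sum to zero, up to floating-point error.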

Summing it up – Comparison

The second approach (regressing the tip on the bill amount) provides a better estimate than the first (the mean of the tips), as it decreases the sum of squared errors (SSE).
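The comparison can be sketched in R by computing both SSE values side by side. The bill amounts are hypothetical (the slide's data is not in this transcript), so the second SSE is only what these assumed values produce, not a figure quoted from the slides.

```r
# ASSUMPTION: bill amounts are illustrative, not taken from the transcript
bill <- c(34, 108, 64, 88, 99, 51)
tips <- c(5, 17, 11, 8, 14, 5)

# First approach: predict every tip with the mean
sse_mean <- sum((tips - mean(tips))^2)

# Second approach: predict each tip from its bill with a least-squares line
b1 <- sum((bill - mean(bill)) * (tips - mean(tips))) /
  sum((bill - mean(bill))^2)
b0 <- mean(tips) - b1 * mean(bill)
sse_reg <- sum((tips - (b0 + b1 * bill))^2)

# Fraction of the baseline SSE removed by using the bill amount
reduction <- 1 - sse_reg / sse_mean

print(sse_mean)             # 120
print(round(sse_reg, 2))    # 30.07
print(round(reduction, 4))  # 0.7494
```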

Hands-on: Linear Regression in R
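The hands-on session itself is not part of this transcript. As a minimal sketch of how the same fit might be reproduced with R's built-in `lm()` function, the example below assumes a data frame named `orders` with hypothetical bill amounts. Neither the name nor the bill values come from the slides.

```r
# ASSUMPTION: bill amounts are illustrative, not taken from the transcript
orders <- data.frame(
  bill = c(34, 108, 64, 88, 99, 51),
  tip  = c(5, 17, 11, 8, 14, 5)
)

# Fit tip as a linear function of bill by ordinary least squares
fit <- lm(tip ~ bill, data = orders)

print(round(coef(fit), 4))          # intercept about -0.8203, slope about 0.1462
print(round(sum(resid(fit)^2), 2))  # SSE of the fitted line

# Predict the tip for a new, hypothetical bill of 70
print(predict(fit, newdata = data.frame(bill = 70)))
```

`coef()`, `resid()` and `predict()` are the standard accessors for a fitted `lm` object, so none of the manual formulas is needed once the model is fitted.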

Survey

Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your experience better!

Please spare a few minutes to take the survey after the webinar.

Thank You

Questions/Queries/Feedback

The recording and presentation will be made available to you within 24 hours.