1 Spreadsheet Problem Solving fitting models to data straight-line regression multilinear regression nonlinear regression model building and selection Data Analysis Regression tool Trendline Solver using

Upload: duane-wilcox

Post on 22-Dec-2015


Page 1

Spreadsheet Problem Solving

fitting models to data: straight-line regression, multilinear regression, nonlinear regression, model building and selection

using: Data Analysis Regression tool, Trendline, Solver

Page 2

Review of Straight-line Linear Regression [from Class #6]

For each data point, there is an error between that point and the model line. Fitting the model has to do with minimizing these errors.

[Figure: data points on x-y axes with the model line y = ax + b; for one data point, the error e is the vertical gap between the measured y and the model's predicted ŷ at the same x.]

Page 3

Finding the model parameters that give the best fit

For the straight-line model, the model parameters are the slope (a) and the intercept (b).

The problem is then to find the values of a and b that give the best fit. What is meant by the best fit?

The standard measure of goodness of fit is the sum of squares of the errors:

$$ SSE = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad \hat{y}_i = a x_i + b $$

So, the problem reduces to finding the minimum of SSE by adjusting a and b.
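As a rough illustration of what "adjusting a and b" means, the sketch below computes SSE for a few candidate lines. The function name `sse` and the four data points are hypothetical, chosen only for this example; they are not from the slides.

```python
def sse(xs, ys, a, b):
    """Sum of squared errors between measured ys and the line a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# Try a few candidate (slope, intercept) pairs and compare their SSE.
for a, b in [(1.0, 2.0), (2.0, 1.0), (1.94, 1.15)]:
    print(f"a={a}, b={b}: SSE={sse(xs, ys, a, b):.4f}")
```

The candidate with the smallest SSE is the best fit among those tried; regression finds the pair that is best among all possible values.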

Page 4

Fitting a straight-line model to data

The minimization of SSE can be solved by calculus to give formulas for the best values of a and b:

$$ a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} $$

$$ b = \frac{\sum_{i=1}^{n} y_i}{n} - a \, \frac{\sum_{i=1}^{n} x_i}{n} $$

and Excel solves problems like this with either formulas or built-in tools (Data Analysis Regression & Trendline).
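The same formulas can be sketched outside the spreadsheet. The function name `fit_line` and the data below are hypothetical; the arithmetic follows the slope and intercept formulas above term by term.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the model y = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = sum_y / n - a * sum_x / n
    return a, b

# Hypothetical points lying exactly on y = 2x + 1.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```

When x holds large values such as calendar years, the sums become large and nearly cancel, so subtracting the mean year from x before fitting is a common way to keep the arithmetic well behaved.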

Page 5

Example: straight-line fit

Page 6

Transfer the data to an Excel spreadsheet and create a graph

[Figure: "CO2 Emissions for the US"; x-axis: Year (1989 to 2000); y-axis: CO2 Emissions (MMT C), 1320 to 1520.]

Page 7

Calculating the slope and intercept using Excel formulas

$$ a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} $$

$$ b = \frac{\sum_{i=1}^{n} y_i}{n} - a \, \frac{\sum_{i=1}^{n} x_i}{n} $$

Page 8

The formulas behind the numbers

Page 9

Using the model straight-line equation to compute the predictions:

and copy these to the graph, displaying as a straight line

Page 10

[Figure: "CO2 Emissions for the US" with the fitted straight line; x-axis: Year (1989 to 2000); y-axis: CO2 Emissions (MMT C), 1300 to 1550; line equation: y = 21.32x - 41090.]

Page 11

Using an alternate, shortcut approach: Trendline

[Figure: "CO2 Emissions for the US"; x-axis: Year (1989 to 2000); y-axis: CO2 Emissions (MMT C), 1320 to 1520.]

Start with a simple graph of the data

Select the data series by clicking on it

Right-click on a data point to get the context-sensitive menu

Select the Add Trendline option

Page 12

The Add Trendline dialog box

Linear selected by default

OK for this problem

Click on the Options tab

Page 13

Options tab

Set for Display equation on chart

Click OK

Page 14

[Figure: "CO2 Emissions for the US" with trendline; x-axis: Year (1989 to 2000); y-axis: CO2 Emissions (MMT C), 1300 to 1550; trendline equation: y = 21.315x - 41090.]

Initial form of graph with straight line added

Fix up equation display

Page 15

[Figure: "CO2 Emissions for the US" with trendline; x-axis: Year (1989 to 2000); y-axis: CO2 Emissions (MMT C), 1300 to 1550; trendline equation: y = 21.315x - 41090.]

Looks just like before, but we got there quicker

But neither of these approaches gives us much information about the model, how good it is, etc.

Page 16

A 2nd alternate approach: Data Analysis Regression tool

Recall that, if Data Analysis does not appear on the Tools menu, you will need to check Analysis ToolPak in the Add-ins dialog box [if it’s not there, you will have to go back to Microsoft Office/Excel set-up]

Tools > Data Analysis

Initial, empty Regression dialog box

Page 17

Regression dialog box set up for our problem

checking Residuals will also give us model predictions

Page 18

Initial (poorly formatted) Regression output display [on new worksheet]

Format > AutoFormat > OK

and fix up display for appropriate significant figures

Page 19

Final Display of Regression Output

[tons of info, most of which you will not understand for a couple of years]

used to judge goodness of fit

intercept and slope values

used to judge whether terms “belong” in the model

add to data graph for visual comparison with model

Page 20

Judging Goodness of Fit

correlation coefficient: if close to +1 or –1, indicates strong correlation between x and y [something we already know from the original graph!]

coefficient of determination (R2): percentage of the variability in y that’s accounted for by the model

Adjusted R2: adjustment to R2 that penalizes the value for using a model with too many terms

Standard Error: gives an idea of how far off the model predictions will be

Adjusted R2 or Standard Error can be used to compare different models and choose which fits best. The higher the value of Adjusted R2 the better; the lower the value of Standard Error the better.
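For readers who want to see how these three numbers relate, here is a minimal sketch using the usual textbook definitions for a model that includes an intercept. The function name `fit_stats` and the sample values are hypothetical; `n_params` counts every fitted coefficient, intercept included.

```python
import math

def fit_stats(ys, preds, n_params):
    """Return (R^2, adjusted R^2, standard error) for a fitted model."""
    n = len(ys)
    y_mean = sum(ys) / n
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
    sst = sum((y - y_mean) ** 2 for y in ys)            # total variation in y
    dof = n - n_params                                  # residual degrees of freedom
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / dof) / (sst / (n - 1))
    std_err = math.sqrt(sse / dof)
    return r2, adj_r2, std_err

# Hypothetical measurements and straight-line predictions (2 parameters).
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = [1.2, 1.8, 3.0, 4.2, 4.8]
r2, adj_r2, std_err = fit_stats(ys, preds, n_params=2)
print(r2, adj_r2, std_err)
```

Adding a term always lowers SSE (or leaves it unchanged), so R2 alone never penalizes extra terms; the division by the residual degrees of freedom is what lets Adjusted R2 and Standard Error get worse when a term does not earn its place.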

Page 21

Judging whether terms belong in the model

P-values estimate the probability of seeing a coefficient this far from zero purely by chance if its true value were zero

P-values that are quite small, like these, indicate that there is little question about the significance of the term coefficients. In our case here, that means that both the intercept term and the slope term belong in the model.

A P-value of 5% (0.05) or greater causes suspicion that the coefficient may not be significant and that the term should probably be dropped from the model

Page 22

The Data Analysis Regression tool appears much more complicated and involved than the shortcut Trendline tool, so . . .

Why use Data Analysis Regression?

1) It provides more information that lets us judge the goodness of fit and significance of model terms

2) It can handle model forms that cannot be handled by Trendline

So, generally, when using Excel, we prefer the Data Analysis Regression tool over Trendline

but Trendline is still quite good for “quick and dirty” looks at the data

Learn to use both!

Page 23

More complicated models

Polynomial models

$$ y = a + b x + c x^2 + d x^3 $$

General linear models

$$ y = a \, f_1(x) + b \, f_2(x) + c \, f_3(x) + d \, f_4(x) $$

Examples: polynomial models above

$$ y = a + b \, \frac{1}{x} + c \ln x $$

Multilinear models

$$ y = a \, f_1(x_1, x_2, \ldots) + b \, f_2(x_1, x_2, \ldots) + c \, f_3(x_1, x_2, \ldots) $$

Examples:

$$ y = a + b x_1 + c x_2 + d x_1 x_2 \qquad\qquad y = a \, e^{x_1 / x_2} $$

Note: it is called linear regression, even when there are nonlinear terms in x, because the terms are linear in the model parameters, a, b, c, etc.
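To make the "linear in the parameters" point concrete, the sketch below fits y = a + b(1/x) + c ln(x) by ordinary least squares using the normal equations. This is an illustrative stand-in, not what Excel does internally; the helper names `solve` and `fit_linear_model` and the noise-free synthetic data are assumptions made for the example.

```python
import math

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("basis functions are linearly dependent on this data")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def fit_linear_model(basis, xs, ys):
    """Least-squares coefficients for y = sum_k coef[k] * basis[k](x)."""
    F = [[f(x) for f in basis] for x in xs]  # one column per basis function
    k = len(basis)
    FtF = [[sum(row[i] * row[j] for row in F) for j in range(k)] for i in range(k)]
    Fty = [sum(row[i] * y for row, y in zip(F, ys)) for i in range(k)]
    return solve(FtF, Fty)  # normal equations: (F^T F) coef = F^T y

# Nonlinear in x, but linear in a, b, c.
basis = [lambda x: 1.0, lambda x: 1.0 / x, lambda x: math.log(x)]
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 + 3.0 / x - 1.5 * math.log(x) for x in xs]  # synthetic, noise-free
a, b, c = fit_linear_model(basis, xs, ys)
print(a, b, c)
```

In a spreadsheet the equivalent step is to add one column per basis function (here 1/x and ln x) and select those columns as the X range for the regression.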

Page 24

Nonlinear models

Transformable to linear

$$ y = a \, e^{b x} \quad\Longrightarrow\quad \ln y = \ln a + b x $$

straight-line regression!

Not transformable

$$ P = 10^{\, A - \frac{B}{T + C}} $$

We can use the Data Analysis Regression tool for everything except the nonlinear models that can’t be transformed into linear. For those, we can use the Solver.
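A short sketch of the transformable case: take ln(y), run a straight-line fit against x, then convert the intercept back with exp. The function name `fit_exponential` and the data are hypothetical.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by straight-line regression of ln(y) on x."""
    if any(y <= 0 for y in ys):
        raise ValueError("ln(y) needs strictly positive y values")
    n = len(xs)
    ln_ys = [math.log(y) for y in ys]
    sum_x, sum_ly = sum(xs), sum(ln_ys)
    sum_xly = sum(x * ly for x, ly in zip(xs, ln_ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xly - sum_x * sum_ly) / (n * sum_xx - sum_x ** 2)
    intercept = sum_ly / n - slope * sum_x / n
    return math.exp(intercept), slope  # a = exp(ln a), b = slope

# Synthetic, noise-free data generated from a = 2, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

One caveat worth knowing: the transformed fit minimizes squared errors in ln(y), not in y, so with noisy data it can give slightly different parameters than a direct nonlinear fit of the original equation.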

Page 25

Example: polynomial regression

[Figure: "Viscosity of Water at Atmospheric Pressure"; x-axis: Temperature (degF), 0 to 250; y-axis: Viscosity (cp), 0.000 to 2.000; annotation: curvature evident.]

Page 26

Setting up for polynomial fits

Select for quadratic model, etc

Page 27

Data Analysis Regression tool

check Labels because headings are included in selections for Y and X

check Residuals

Page 28

Quadratic model regression results

copy to graph

model coefficients

model performance: adjR2

Page 29

[Figure: "Viscosity of Water at Atmospheric Pressure"; x-axis: Temperature (degF), 0 to 250; y-axis: Viscosity (cp), 0.000 to 2.000; series: Data, Quadratic.]

Quadratic model really doesn’t “capture” behavior of data

Page 30

Continue with fits of cubic, 4th- & 5th-order polynomials

Summary of results

Looks like 5th-order offers best performance

but improvement is marginal over 4th-order.

Resulting model:

$$ \mathrm{Visc} = 3.161 - 0.05699\,T + 5.023 \times 10^{-4}\,T^2 - 2.162 \times 10^{-6}\,T^3 + 3.593 \times 10^{-9}\,T^4 $$
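As a quick check on how such a model is used, the sketch below evaluates the 4th-order polynomial with Horner's rule. It assumes the coefficient signs alternate (+, -, +, -, +), with T in degF and viscosity in cp; under that assumption the model gives about 0.68 cp at 100 degF, in line with the plotted data. The names `COEFFS` and `viscosity_cp` are hypothetical.

```python
# Coefficients in increasing powers of T (degF); result in cp.
# The alternating signs are an assumption of this sketch.
COEFFS = [3.161, -0.05699, 5.023e-4, -2.162e-6, 3.593e-9]

def viscosity_cp(temp_f):
    """Evaluate the 4th-order polynomial with Horner's rule."""
    result = 0.0
    for c in reversed(COEFFS):
        result = result * temp_f + c
    return result

for t in (50, 100, 200):
    print(t, round(viscosity_cp(t), 3))
```

The polynomial should only be trusted inside the temperature range of the data it was fitted to; outside that range a high-order polynomial can drift away from physical behavior quickly.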

Page 31

[Figure: "Viscosity of Water at Atmospheric Pressure"; x-axis: Temperature (degF), 20 to 220; y-axis: Viscosity (cp), 0.000 to 2.000; series: Data, Quadratic, Cubic, 4th Order.]

Page 32

Precautions on polynomial fitting

Try to use the lowest-order model that gives a good fit.

Higher-order models will have “wiggles” between data points that will cause prediction errors.

In fact, an (n-1)th-order polynomial will provide a perfect fit to the n data points, but it will usually do bizarre things in between the data points.
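The "perfect fit but bizarre in between" behavior can be demonstrated with a small sketch. It uses Lagrange interpolation (which gives the same polynomial a least-squares fit of order n-1 would find for n points with distinct x values) on 11 equally spaced samples of a smooth test curve, 1/(1 + 25x^2). The test curve and all names are assumptions chosen for illustration, not data from the slides.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through n points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def smooth_curve(x):
    """A smooth, bell-shaped test function standing in for real data."""
    return 1.0 / (1.0 + 25.0 * x * x)

# 11 "data points", so the interpolating polynomial is 10th order.
xs = [-1.0 + 0.2 * k for k in range(11)]
ys = [smooth_curve(x) for x in xs]

# Error exactly at the data points versus error on a fine grid in between.
at_points = max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))
grid = [-1.0 + 0.01 * k for k in range(201)]
worst = max(abs(lagrange_eval(xs, ys, t) - smooth_curve(t)) for t in grid)
print("max error at data points:", at_points)
print("max error between points:", worst)
```

The error at the data points is zero, yet between points near the ends of the range the polynomial swings far from the underlying curve, which never exceeds 1 in magnitude.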

Page 33

Example: multi-linear regression

X-input range includes two independent variables: x1 and x2

Model 1: $y = a + b x_1 + c x_2$

Model 2: $y = b x_1 + c x_2$

High P value for intercept in Model 1 suggests Model 2 without intercept, but there is a significant loss in adjR2

Page 34

[Figure: "Multilinear Model Performance"; x-axis: Measured y (0 to 12); y-axis: Predicted y (0.0 to 12.0); series: Model 1, Model 2.]

Model performance isn’t that great for either model, and Model 1 doesn’t appear dramatically better than Model 2

Note: for multi-linear models, we plot Predicted vs Measured y. A perfect model would place points directly on the 45-degree line.

Page 35

Nonlinear Regression

Fitting the parameters of the van der Waals’ equation of state

Data for SO2

$$ P = \frac{RT}{\hat{V} - b} - \frac{a}{\hat{V}^2} $$

Find the values of a and b that give the best predictions for P, when compared to the measured values of P

Page 36

Strategy for Nonlinear Regression

1) estimate initial values for a and b

2) compute predicted P’s using data for $\hat{V}$ and T

3) compute errors between predicted P’s and measured P’s

4) sum the squares of these errors to compute SSE

5) have the Solver minimize SSE by adjusting the values of a and b
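The five steps can be sketched in code. The minimizer here is a simple compass (pattern) search, swapped in as a crude stand-in for the Excel Solver, which uses a different algorithm. The data are synthetic, generated from illustrative values of a and b at a single assumed temperature, not the SO2 measurements from the slides; all names, step sizes, and the upper bound on b are assumptions.

```python
R = 8.314  # gas constant, J/(mol K)

def p_vdw(v, t, a, b):
    """van der Waals pressure (Pa) for molar volume v (m^3/mol) and temperature t (K)."""
    return R * t / (v - b) - a / v ** 2

def sse(params, data):
    """Steps 2 to 4: predict P, take errors against measured P, sum the squares."""
    a, b = params
    return sum((p_vdw(v, t, a, b) - p) ** 2 for v, t, p in data)

def compass_search(f, start, steps, lower, upper, sweeps=400):
    """Step 5 stand-in: try +/- a step on each parameter, halve the steps on failure."""
    best = list(start)
    best_val = f(best)
    steps = list(steps)
    for _ in range(sweeps):
        improved = False
        for k in range(len(best)):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[k] = min(max(trial[k] + sign * steps[k], lower[k]), upper[k])
                val = f(trial)
                if val < best_val:  # accept only strict improvements
                    best, best_val, improved = trial, val, True
        if not improved:
            steps = [s / 2.0 for s in steps]
    return best, best_val

# Synthetic, noise-free "measurements" from illustrative parameter values.
A_TRUE, B_TRUE = 0.68, 5.6e-5          # Pa m^6/mol^2 and m^3/mol (assumed)
T = 450.0                              # K (assumed)
volumes = [4e-4, 6e-4, 1e-3, 2e-3, 4e-3]
data = [(v, T, p_vdw(v, T, A_TRUE, B_TRUE)) for v in volumes]

start = [0.0, 0.0]                     # step 1: a = b = 0 is the ideal gas law
sse_start = sse(start, data)
(a_fit, b_fit), sse_fit = compass_search(
    lambda p: sse(p, data), start,
    steps=[0.1, 1e-5],
    lower=[float("-inf"), 0.0],                  # b >= 0, as in the Solver setup
    upper=[float("inf"), 0.9 * min(volumes)],    # keeps V - b positive (assumed bound)
)
print("a =", a_fit, "b =", b_fit)
print("SSE at start:", sse_start, "SSE after search:", sse_fit)
```

Because a and b differ by several orders of magnitude, the search uses a separate step size for each; poorly scaled parameters are a common reason nonlinear fits stall, in a spreadsheet solver as much as here.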

Page 37

Basic data

Calculated Pressure by both ideal gas law and van der Waals

Sum of squares of this column

Page 38

Ideal Gas Calculation

van der Waals Calculation

Error Calculation

Sum of Squares Calculation

Page 39

Setting up Solver Parameters

SSE as Target Cell

Minimize

by adjusting a and b

with b>=0 constraint

Results

Page 40

Results

Page 41

[Figure: "Fit of van der Waals Eqn for SO2 and Comparison to Ideal Gas Law"; x-axis: Measured Pressure (Pa), 0 to 12000000; y-axis: Predicted Pressure (Pa), 0 to 12000000; series: van der Waals, Ideal Gas.]

Note departure of ideal gas predictions at higher pressures