Why Model? Make predictions or forecasts where we don’t have data

Upload: lucinda-carson

Post on 03-Jan-2016


TRANSCRIPT

Page 1: Why Model? Make predictions or forecasts where we don’t have data

Why Model?

• Make predictions or forecasts where we don’t have data

Page 2

Linear Regression

Image source: Wikipedia

Page 3

Modeling Process

Observe

Define Theory/Type of Model

Design Experiment

Collect Data

Select Model

Evaluate the Model

Qualify Data

Estimate Parameters

Publish Results

Page 4

Bouncing Balls

• Observation: balls bounce more when dropped from a greater height
• Theory: there is a linear relationship between the height of a drop and the number of bounces

Image source: people.rit.edu

Page 5

Bouncing Balls (cont’d)

• Experimental Design?
• Collect Data?
• Qualify Data?
• Select Model:
  – Start with linear regression

Page 6

Parameter Estimation

• Excel spreadsheet
• X, Y columns
• Add “trend line”
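The same trend line can be computed outside a spreadsheet. The following is a minimal sketch in Python, assuming a single predictor and the usual least-squares formulas; the function name `fit_trend_line` and the height and bounce values are hypothetical, made up only to illustrate the steps.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: drop height (X column) and bounces counted (Y column).
heights = [20, 40, 60, 80, 100]
bounces = [3, 5, 6, 8, 10]

slope, intercept = fit_trend_line(heights, bounces)
print(f"bounces = {slope:.3f} * height + {intercept:.3f}")
```

The two returned numbers play the same role as the coefficients shown in a spreadsheet trend-line label of the form y = ax + b.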

Page 7

Definitions

Horizontal axis: Used to create prediction
– Independent variable
– Predictor variable
– Covariate
– Explanatory variable
– Control variable
– Typically a raster
– Examples:
  • Temperature, aspect, SST, precipitation

Vertical axis: What we are trying to predict
– Dependent variable
– Response variable
– Measured value
– Explained
– Outcome
– Typically an attribute of points
– Examples:
  • Height, abundance, percent, diversity, …

Page 8

Linear Regression: Assumptions

• Predictors are error free
• Linearity of response to predictors
• Constant variance within and for all predictors (homoscedasticity)
• Independence of errors
• Lack of multicollinearity
• Also:
  – All points are equally important
  – Residuals are normally distributed (or close)

Page 9

Linear Regression 

 

Page 10

Normal Distribution

[Figure: normal distribution curve, with tails extending to negative infinity and to positive infinity]

Page 11

Linear Data Fitted w/Linear Model

The plotted residuals should fall along a diagonal line if they are normally distributed
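One way to produce the points for such a diagnostic is a normal quantile-quantile comparison. This is a sketch only, assuming the plot on the slide is of that kind; `qq_points`, the plotting position `(i + 0.5) / n`, and the residual values are illustrative assumptions, not taken from the slides.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair each sorted residual with the normal quantile expected at its rank."""
    n = len(residuals)
    # Normal distribution with the same mean and spread as the residuals.
    fitted = NormalDist(mean(residuals), stdev(residuals))
    ordered = sorted(residuals)
    # (i + 0.5) / n is one common plotting position; other choices exist.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals from a fitted line.
residuals = [-1.2, 0.4, -0.3, 0.9, 0.1, -0.6, 1.1, -0.4]

for expected, observed in qq_points(residuals):
    print(f"{expected:7.3f} {observed:7.3f}")
```

Plotting the first column against the second gives the diagnostic: points close to the diagonal suggest roughly normal residuals, while a curved pattern suggests otherwise.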

Page 12

Non-Linear Data Fitted with a Linear Model

This shows the residuals are not normally distributed

Page 13

Homoscedasticity

• Residuals have the same normal distribution throughout the range of the data
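A rough numerical check of this property compares the spread of the residuals in the lower and upper halves of the predictor range. The sketch below is a crude illustration under that assumption, not a formal test (formal tests such as Breusch-Pagan exist); `variance_ratio` and all data values are hypothetical.

```python
from statistics import pvariance


def variance_ratio(xs, residuals):
    """Residual variance in the upper half of x divided by that in the lower half."""
    # Order the residuals by their x value.
    ordered = [r for _, r in sorted(zip(xs, residuals))]
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[-half:]
    return pvariance(high) / pvariance(low)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
even_spread = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]  # hypothetical
fanning_out = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]  # hypothetical

ratio_even = variance_ratio(xs, even_spread)
ratio_fan = variance_ratio(xs, fanning_out)
print(ratio_even, ratio_fan)
```

A ratio near 1 is consistent with constant variance; a ratio far from 1 suggests the spread of the residuals changes across the range of the data.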

Page 14

Ordinary Least Squares
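For reference, a standard statement of ordinary least squares for a single predictor is given below. The notation is an assumption, not taken from the slides: $(x_i, y_i)$ are the $n$ observations, $b_0$ is the intercept, $b_1$ is the slope, $\hat{y}_i$ is the fitted value, and $e_i$ is the residual.

```latex
\hat{y}_i = b_0 + b_1 x_i, \qquad
e_i = y_i - \hat{y}_i, \qquad
\min_{b_0,\, b_1} \sum_{i=1}^{n} e_i^2
  = \min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
```

In words: the fitted line is the one that makes the sum of the squared vertical distances between the data points and the line as small as possible.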

Page 15

Linear Regression

[Figure: linear regression plot with a residual labeled]

Page 16

Parameter Estimation
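For a single predictor, the least-squares parameters have a well-known closed form. The notation here is assumed for illustration: $b_1$ is the slope, $b_0$ is the intercept, and $\bar{x}$, $\bar{y}$ are the means of the $n$ observed $x_i$ and $y_i$ values.

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The second formula means the fitted line always passes through the point $(\bar{x}, \bar{y})$.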


Page 17

Evaluate the Model


Page 18

Evaluation

• Find the highest performing model in Excel for the golf ball data

• https://www.youtube.com/watch?v=fss3i1XMMIY

Page 19

“Goodness of fit”
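A common goodness-of-fit measure for a regression line is R², the share of the variation in the response that the model accounts for. The sketch below assumes that definition (1 minus residual sum of squares over total sum of squares); `r_squared`, the data, and the line coefficients are hypothetical.

```python
def r_squared(ys, predicted):
    """Return 1 - SS_res / SS_tot for observed values and model predictions."""
    mean_y = sum(ys) / len(ys)
    # Variation left unexplained by the model.
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    # Total variation of the observations around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical observations and a hypothetical fitted line y = 0.085x + 1.3.
heights = [20, 40, 60, 80, 100]
bounces = [3, 5, 6, 8, 10]
predicted = [0.085 * h + 1.3 for h in heights]

r2 = r_squared(bounces, predicted)
print(f"R^2 = {r2:.4f}")
```

Values near 1 mean the line tracks the data closely; values near 0 mean it explains little more than the mean of the response would.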


Page 20

 

[Figure: scatter plot with fitted trend line, y = 0.0024x + 0.4347, R² = 0.0051]

Page 21

 

[Figure: scatter plot with fitted trend line, y = 1.0029x + 0.4188, R² = 0.999]

Page 23

Two Approaches

• Hypothesis Testing
  – Is a hypothesis supported or not?
  – What is the chance that what we are seeing is random?
• Which is the best model?
  – Assumes the hypothesis is true (implied)
  – Model may or may not support the hypothesis
• Data mining
  – Discouraged in spatial modeling
  – Can lead to erroneous conclusions

Page 24

Significance (p-value)

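• H0: null hypothesis (the regression line is flat, i.e. the slope is zero)
• Hypothesis: the regression line is not flat
• The smaller the p-value, the more evidence we have against H0
  – This lends support to our hypothesis but does not prove it
• The p-value measures how likely we are to get a certain sample result, or a result “more extreme,” assuming H0 is true
• Informally, it indicates how plausible it is that the observed relationship arose by chance alone; it is not the probability that H0 is true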

http://www.childrensmercy.org/stats/definitions/pvalue.htm
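The "as extreme or more extreme, assuming H0 is true" definition can be illustrated with a permutation test, which is a different technique from the formula-based p-value a spreadsheet or statistics package reports. In this sketch, shuffling the response values simulates H0 (no relationship); `permutation_p_value`, the number of trials, the seed, and the data are all hypothetical choices.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Share of shuffles whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(trials):
        # Shuffling breaks any real link between x and y, as H0 assumes.
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (trials + 1)


# Hypothetical data: drop height and bounces counted.
heights = [20, 40, 60, 80, 100, 120, 140, 160]
bounces = [3, 5, 6, 8, 10, 11, 13, 15]

p_value = permutation_p_value(heights, bounces)
print(f"p = {p_value:.4f}")
```

A small value says that slopes as steep as the observed one are rare when the pairing between height and bounces is random, which is evidence against the flat-line null hypothesis.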

Page 25

Confidence Intervals

• If the sampling were repeated many times, about 95 percent of the 95% confidence intervals computed would contain the true value
• Methods:
  – Moments (mean, variance)
  – Likelihood
  – Significance tests (p-values)
  – Bootstrapping
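Of the methods listed, bootstrapping is the easiest to sketch in a few lines. The example below is a minimal percentile bootstrap for the slope of a regression line, under the assumption that the data points can be resampled independently; `bootstrap_slope_interval`, the resample count, the seed, and the data are hypothetical.

```python
import random


def fit_slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def bootstrap_slope_interval(points, level=0.95, resamples=2000, seed=0):
    """Percentile bootstrap interval for the slope."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < resamples:
        # Resample the points with replacement, keeping the sample size.
        sample = [rng.choice(points) for _ in points]
        if len({x for x, _ in sample}) < 2:
            continue  # slope is undefined when every x is identical
        slopes.append(fit_slope(sample))
    slopes.sort()
    tail = (1 - level) / 2
    lower = slopes[int(tail * resamples)]
    upper = slopes[int((1 - tail) * resamples) - 1]
    return lower, upper


# Hypothetical data: (drop height, bounces counted).
points = [(20, 3), (40, 5), (60, 6), (80, 8),
          (100, 10), (120, 11), (140, 13), (160, 15)]

lower, upper = bootstrap_slope_interval(points)
print(f"95% interval for the slope: [{lower:.4f}, {upper:.4f}]")
```

The interval is read off the sorted resampled slopes, so it needs no distributional formula, at the cost of depending on the number of resamples and on how representative the original sample is.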

Page 26

Model Evaluation

• Parameter sensitivity
• Ground truthing
• Uncertainty in data AND predictors
  – Spatial
  – Temporal
  – Attributes/Measurements
• Alternative models
• Alternative parameters

Page 28

Robust models

• Domain/scope is well defined
• Data is well understood
• Uncertainty is documented
• Model can be tied to phenomenon
• Model validated against other data
• Sensitivity testing completed
• Conclusions are within the domain/scope or are “possibilities”
• See: https://www.youtube.com/watch?v=HuyMQ-S9jGs

Page 29

Modeling Process II

Investigate

Find Data

Select Model

Evaluate the Model

Qualify Data

Estimate Parameters

Publish Results

Page 30

Research Papers

• Introduction
  – Background
  – Goal
• Methods
  – Area of interest
  – Data “sources”
  – Modeling approaches
  – Evaluation methods
• Results
  – Figures
  – Tables
  – Summary results
• Discussion
  – What did you find?
  – Broader impacts
  – Related results
• Conclusion
  – Next steps
• Acknowledgements
  – Who helped?
• References
  – Include long URLs