Upload: nathaniel-campbell

Post on 03-Jan-2016


Applications of Regression to Water Quality Analysis

Unit 5: Module 18, Lecture 1


Developed by: Host Updated: Jan. 21, 2003 U5-m18a-s2

Statistics

A branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data

Descriptive Statistics (Lecture 1)
  Basic description of a variable

Hypothesis Testing (Lecture 2)
  Asks the question: is X different from Y?

Predictions (Lecture 3)
  What will happen if…


Objectives

Introduce the basic concepts and assumptions of regression analysis
  Making predictions
  Correlation vs. causal relationships
  Applications of regression

Basic linear regression
  Assumptions
  Techniques

What if it is not linear: data transformations

Water quality applications of regression analyses

Survey of regression software

[Figure: plot of Fish Weight (oz) against Fish Length (in)]

Regression defined

A statistical technique to define the relationship between a response variable and one or more predictor variables

Here, fish length is the predictor variable (also called an “independent” variable).

Fish weight is the response variable (also called a “dependent” variable).


Regression and correlation

Regression
  Identify the relationship between a predictor variable and a response variable

Correlation
  Estimate the degree to which two variables vary together
  Does not express one variable as a function of the other
  No distinction between dependent and independent variables
  Do not assume that one is the cause of the other
  Do typically assume that the two variables are both effects of a common cause
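A small Python sketch of this distinction, using made-up length and weight pairs (not the lecture's data). The correlation coefficient r comes out the same whichever variable is listed first, while the regression slope changes when the predictor and response swap roles. The function names are illustrative only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (xs is the predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, for illustration only
lengths = [6.0, 8.0, 10.0, 12.0, 14.0]
weights = [2.0, 6.0, 15.0, 26.0, 33.0]

r_lw = pearson_r(lengths, weights)        # correlation: order does not matter
r_wl = pearson_r(weights, lengths)
slope_w_on_l = slope(lengths, weights)    # weight as a function of length
slope_l_on_w = slope(weights, lengths)    # length as a function of weight
```

Neither number says anything about cause; both only describe how the two variables vary together in the sample.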


Basic linear regression

Assumes there is a straight-line relationship between a predictor (or independent) variable X and a response (or dependent) variable Y

Equation for a line:

Y = mX + b

m: the slope coefficient (increase in Y per unit increase in X)

b: the constant or Y intercept (value of Y when X = 0)

Regression analysis finds the ‘best fit’ line that describes the dependence of Y on X
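The lecture does not name the fitting method. Ordinary least squares is the usual choice and is assumed in the sketch below: it picks the slope m and intercept b that minimize the sum of squared vertical distances between the points and the line. The data values and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line Y = m*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # the fitted line passes through (mean_x, mean_y)
    return m, b

# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

m, b = fit_line(xs, ys)
```

Statistical packages and spreadsheets report the same two numbers as the slope and intercept of the regression model.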

Outputs of regression

Regression model: Y = mX + b

Weight = 4.48*Length - 28.722
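As a worked example, the fitted fish model can be turned into a small prediction function. The coefficients come from the slide; the function name and the test lengths are illustrative assumptions.

```python
SLOPE = 4.48          # oz of weight per inch of length (from the slide)
INTERCEPT = -28.722   # oz (from the slide)

def predict_weight_oz(length_in):
    """Predicted fish weight (oz) for a given length (in) from the fitted line."""
    return SLOPE * length_in + INTERCEPT

weight_at_10 = predict_weight_oz(10.0)   # 4.48 * 10 - 28.722 = 16.078 oz
weight_at_5 = predict_weight_oz(5.0)     # 4.48 * 5 - 28.722 = -6.322 oz

# Length at which the fitted line crosses zero weight
zero_crossing_in = -INTERCEPT / SLOPE    # about 6.41 in
```

The negative prediction at 5 in is a reminder that a straight line is only a description of the data it was fitted to; predictions near or beyond the edge of the observed lengths should be treated with caution.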

Coefficient of determination: R² = 0.89


How good is the fit? The Coefficient of Determination

R²: the proportion of the total variation that is explained by the regression

Coefficient of determination R² = 0.89

Ranges from 0.00 to 1.00
  0.00: no correlation
  1.00: perfect correlation, no scatter around line
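The definition can be written as R² = 1 - (residual variation) / (total variation). The sketch below computes it for a line that is assumed to be already fitted; the data are hypothetical and the slope and intercept are the least-squares values for those four points.

```python
def r_squared(xs, ys, m, b):
    """Proportion of the total variation in ys explained by the line m*x + b."""
    mean_y = sum(ys) / len(ys)
    # Total variation of Y about its mean
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left over around the regression line
    ss_residual = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_residual / ss_total

# Hypothetical data, with its least-squares slope and intercept
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
r2 = r_squared(xs, ys, m=1.94, b=1.15)

# Points lying exactly on a line leave no scatter, so R² is 1
r2_perfect = r_squared([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], m=2.0, b=1.0)
```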

Example coefficients of determination

[Figure: two example plots, one with R² = 0.08 and one with R² = 0.54]


Four assumptions of linear regression (adapted from Sokal and Rohlf, 1981)

The independent variable X is measured without error
  Under control of the investigator
  X’s are ‘fixed’

The expected value for Y for a given value of X is described by the standard linear function Y = mX + b

For any value of X, the Y’s are independently and normally distributed
  [Figure: figure 14.4 from Sokal and Rohlf (1981)]

The variance around the regression line is constant; variability of Y does not depend on value of X
  Extra credit word: the samples are homoscedastic
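An informal way to look at the constant-variance assumption is to compare the size of the residuals at low and high values of X. The sketch below is a rough screening check under stated assumptions, not a formal test and not a method from the lecture; the data, the already-fitted line Y = 2X, and the function name are all hypothetical.

```python
def residual_spread_ratio(xs, ys, m, b):
    """Mean squared residual in the upper half of X divided by the lower half."""
    pairs = sorted(zip(xs, ys))                      # order the points by X
    residuals = [y - (m * x + b) for x, y in pairs]  # distance from the line
    half = len(residuals) // 2
    low = residuals[:half]
    high = residuals[-half:]
    ms_low = sum(r * r for r in low) / len(low)
    ms_high = sum(r * r for r in high) / len(high)
    return ms_high / ms_low

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Scatter of the same size everywhere (homoscedastic)
even_scatter = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
ys_even = [2.0 * x + e for x, e in zip(xs, even_scatter)]
ratio_even = residual_spread_ratio(xs, ys_even, m=2.0, b=0.0)

# Scatter that fans out as X increases (heteroscedastic)
fan_scatter = [0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0]
ys_fan = [2.0 * x + e for x, e in zip(xs, fan_scatter)]
ratio_fan = residual_spread_ratio(xs, ys_fan, m=2.0, b=0.0)
```

A ratio near 1 is consistent with constant variance; a ratio far from 1 suggests the scatter depends on X, in which case a data transformation may help.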


Data transformations: What if data are not linear?

It is often possible to ‘linearize’ data in order to use linear models

This is particularly true of exponential relationships
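For an exponential relationship Y = a·exp(kX), taking the natural log of Y gives ln(Y) = ln(a) + kX, which is a straight line in X. The sketch below fits that line by least squares and back-transforms the intercept. The data are generated from assumed values a = 2 and k = 0.5 purely for illustration, and `fit_line` is a hypothetical helper.

```python
from math import exp, log

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical exponential data: Y = 2 * exp(0.5 * X)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * exp(0.5 * x) for x in xs]

# Linearize: regress ln(Y) on X
log_ys = [log(y) for y in ys]
k, log_a = fit_line(xs, log_ys)

# Back-transform the intercept to recover the multiplier a
a = exp(log_a)
```

With real, noisy data the recovered values would only approximate the underlying relationship, and the fit minimizes error on the log scale rather than on the original scale.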


Applications: Standard curves for lab analyses

A classic use of regression: calibrate a lab instrument to predict some response variable – a “calibration curve”

In this example, absorbance from a spectrophotometer is measured from a series of standards with fixed N concentrations.

Once the relationship between absorbance and concentration is established, measuring the absorbance of an unknown sample can be used to predict its N concentration
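A minimal sketch of a calibration curve, assuming made-up standards and readings (the concentrations, absorbances, and function names are not from the lecture). Absorbance is regressed on concentration because the standards are the fixed, known X values; the fitted line is then inverted to estimate the concentration of an unknown sample.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical standards: known N concentration (mg/L) and measured absorbance
standard_conc = [0.0, 0.5, 1.0, 2.0, 4.0]
standard_abs = [0.002, 0.051, 0.101, 0.199, 0.402]

slope, intercept = fit_line(standard_conc, standard_abs)

def concentration_from_absorbance(absorbance):
    """Invert absorbance = slope * concentration + intercept."""
    return (absorbance - intercept) / slope

# Estimated N concentration for an unknown sample reading 0.150
unknown_conc = concentration_from_absorbance(0.150)
```

The estimate is only trustworthy for absorbances inside the range spanned by the standards.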


Using regression to estimate stream nutrient and bacteria concentrations in streams

The USGS has real-time water quality monitors installed at several stream gaging sites in Kansas

Using regression to estimate stream nutrient and bacteria concentrations in streams: data flow

[Figure: data flow diagram]


Using regression to estimate stream nutrient and bacteria concentrations in streams: Results

USGS developed a series of single or multiple regression models

Total P = 0.000606*Turbidity + 0.186
  R² = 0.964

Total N = 0.0018*Turbidity + 0.0000940*Discharge + 1.08
  R² = 0.916

Total N = 0.000325*Turbidity + 0.0214*Temperature + 0.0000796*Conductance + 0.515
  R² = 0.764

Fecal Coliform = 3.14*Turbidity + 24.2
  R² = 0.62
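A sketch of how two of these models could be applied to a sensor reading. The coefficients are the ones listed above; the slide gives no units, so the turbidity value, the criterion, and the function names below are hypothetical placeholders rather than USGS values.

```python
def total_p_from_turbidity(turbidity):
    """Total P estimate from the single-variable model on the slide."""
    return 0.000606 * turbidity + 0.186

def fecal_coliform_from_turbidity(turbidity):
    """Fecal coliform estimate from the single-variable model on the slide."""
    return 3.14 * turbidity + 24.2

# Hypothetical real-time turbidity reading (units not given in the source)
turbidity_reading = 100.0

total_p = total_p_from_turbidity(turbidity_reading)                  # 0.2466
fecal_coliform = fecal_coliform_from_turbidity(turbidity_reading)    # 338.2

# Hypothetical criterion, used only to show the comparison step
HYPOTHETICAL_CRITERION = 200.0
exceeds_criterion = fecal_coliform > HYPOTHETICAL_CRITERION
```

Because the fecal coliform model has the lowest R² of the four, its estimates carry the most scatter and are better suited to flagging likely exceedances than to reporting exact densities.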


Using regression to estimate stream nutrient and bacteria concentrations in streams: Important Considerations

Explanatory variables were only included if they had a significant physical basis for their inclusion
  Water temperature is correlated with season and therefore application of fertilizers
  Conductance is inversely related to TN and TP, which tend to be high during high flow
  Turbidity is a measure of particulate matter; TN and TP are related to sediment loads

The USGS needed a separate model for each stream!
  The basins were different enough that a general model could not be developed

By using the models with the real-time sensors, USGS can predict events, e.g. when fecal coliform concentrations exceed criteria


Measured and regression estimated density

Using regression to estimate stream nutrient and bacteria concentrations in streams: Important Considerations

Concentration estimates can be coupled with flow data to estimate nutrient loads

Finally, these regressions can be useful tools for estimating TMDL’s
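Coupling concentration with flow is a unit conversion: load = concentration × discharge. The sketch below assumes concentration in mg/L and discharge in cubic feet per second, and assumes both stay constant over the day; the lecture does not state units, so these choices and the function name are illustrative.

```python
LITERS_PER_CUBIC_FOOT = 28.316846592
SECONDS_PER_DAY = 86_400
MG_PER_KG = 1_000_000

def daily_load_kg(concentration_mg_per_l, discharge_cfs):
    """Mass carried past the gage in one day, in kg, at constant conditions."""
    liters_per_day = discharge_cfs * LITERS_PER_CUBIC_FOOT * SECONDS_PER_DAY
    return concentration_mg_per_l * liters_per_day / MG_PER_KG

# 1 mg/L at 1 cubic foot per second is about 2.45 kg per day
unit_load = daily_load_kg(1.0, 1.0)

# Hypothetical example: 0.25 mg/L at 500 cubic feet per second
example_load = daily_load_kg(0.25, 500.0)
```

In practice the concentration estimate and the discharge both change through the day, so loads are summed over short time steps rather than computed from a single pair of values.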


Software for regression analyses

Any basic statistical package will do regressions
  SigmaStat
  Systat
  SAS

Excel and other spreadsheets also have regression functions
  Excel requires the Analysis ToolPak add-in
  Tools > Add-Ins > Analysis ToolPak