
Background · Data · Statistical Calculations · Results · Future Areas for Research · Questions

Bacterial Contamination in Texas Coastal Bays:

Data Characterization

James Seppi
CE397 – Statistics in Water Resources

Spring 2009

Background

The CWA (Clean Water Act) mandates classification of impaired water bodies.
  › "Median fecal coliform concentration in bay and gulf waters, exclusive of buffer zones, shall not exceed 14 colonies per 100 ml, with not more than 10% of all samples exceeding 43 colonies per 100 ml." (TAC, Title 30, Part 1, Chapter 307, Rule §307.7)

Future work at the CRWR (Center for Research in Water Resources): modeling to determine a TMDL (total maximum daily load)
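The two-part criterion in the TAC rule can be checked with a short sketch. It assumes every sample is a measured (uncensored) value in cfu/100 mL; the function name and arguments are hypothetical, and neither buffer-zone exclusion nor nondetect handling is modeled.

```python
from statistics import median


def meets_bay_criterion(samples, median_limit=14.0, single_limit=43.0, max_exceed_frac=0.10):
    """Return (passes, median, fraction_over) for fecal coliform samples in cfu/100 mL.

    Hypothetical helper: treats every value as measured (no '<DL' results).
    """
    if not samples:
        raise ValueError("need at least one sample")
    med = median(samples)
    # Fraction of samples strictly above the single-sample limit (43 cfu/100 mL)
    frac_over = sum(v > single_limit for v in samples) / len(samples)
    passes = med <= median_limit and frac_over <= max_exceed_frac
    return passes, med, frac_over


print(meets_bay_criterion([2, 4, 8, 10, 12, 20, 30, 50, 6, 3]))  # (True, 9.0, 0.1)
```

Because most results in these bays are nondetects, the median term is exactly where the censoring methods discussed later matter.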

Background - Bays

› East Matagorda Bay
› Cedar Lakes
› Tres Palacios/Turtle Bays
› Lavaca/Chocolate Bays
› Cox Bay
› Carancahua Bay
› San Antonio/Hynes/Guadalupe Bays
› Copano Bay
› Matagorda Bay

Data

› TCEQ (Texas Commission on Environmental Quality) Surface Water Quality Monitoring data, accessible online
› Fecal coliform, colony-forming units (cfu)/100 mL
› ~1972–2005
› Detection limit of 2 cfu/100 mL
› Censored data ("less thans")
  › Ex: <10 cfu/100 mL
› Measured at multiple stations per bay
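Censored results have to be stored as a value plus a flag rather than as plain numbers. The sketch below assumes results arrive as text with a leading "<" for nondetects; that layout and the `parse_result` name are hypothetical, not the actual TCEQ export format.

```python
def parse_result(raw):
    """Split a reported result such as '<10' or '33' into (value, is_censored).

    For a nondetect, the returned value is the reporting limit, not a measurement.
    """
    text = raw.strip()
    if text.startswith("<"):
        return float(text[1:]), True
    return float(text), False


results = [parse_result(r) for r in ["<10", "33", "2", "<2"]]
print(results)  # [(10.0, True), (33.0, False), (2.0, False), (2.0, True)]
```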

Data

Statistics - Project Goals

1) Confirm the data are lognormally distributed
2) Calculate median and 90th percentiles
   › Calculate confidence intervals
   › For the period of record, the last 5 years, and the last 7 years
3) Calculate prediction intervals

Statistics

How to deal with all the censored data and those at the detection limit?

Best method of estimation? The data sets are (mostly) large.

Bay                       n      n.cen   % Censored
Cedar Lake                65     3       4.62%
Lavaca Bay                5839   2754    47.17%
Copano Bay                1787   1266    70.84%
Cox Bay                   483    297     61.49%
Carancahua Bay            1054   617     58.54%
East Matagorda Bay        1668   1192    71.46%
Matagorda Bay             2777   1632    58.77%
San Antonio Bay           2742   1599    58.32%
Tres Palacios/Turtle Bay  3777   2025    53.61%
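The n, n.cen and % Censored columns are simple counts per bay. A sketch, assuming records shaped as hypothetical `(bay, value, is_censored)` tuples:

```python
from collections import defaultdict


def censoring_summary(records):
    """Tabulate n, n.cen and % censored per bay.

    records: iterable of (bay, value, is_censored) tuples (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # bay -> [n, n_cen]
    for bay, _value, is_censored in records:
        counts[bay][0] += 1
        counts[bay][1] += bool(is_censored)
    return {bay: (n, n_cen, round(100 * n_cen / n, 2)) for bay, (n, n_cen) in counts.items()}


# Reproduce the Cedar Lake row: 65 results, 3 of them nondetects
cedar = [("Cedar Lake", 10.0, True)] * 3 + [("Cedar Lake", 33.0, False)] * 62
print(censoring_summary(cedar))  # {'Cedar Lake': (65, 3, 4.62)}
```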

Statistics - NADA (Nondetects And Data Analysis)

Underused in the field, even though environmental data contain many nondetects.

Very important!

Statistics – NADA

Three approaches detailed:
› Substitution
› Maximum Likelihood Estimation
› Regression on Order Statistics
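Substitution replaces each nondetect with a fixed fraction of its detection limit (commonly 0, 1/2, or 1 times the DL). The sketch below is a generic illustration with a hypothetical `substitute` helper; it shows how strongly the answer depends on that arbitrary choice.

```python
from statistics import median


def substitute(values, censored, fraction=0.5):
    """Replace each nondetect '<DL' with fraction * DL.

    values holds the detection limit for nondetects; common choices are
    fraction = 0, 0.5 or 1.
    """
    return [v * fraction if c else v for v, c in zip(values, censored)]


values = [2, 2, 2, 10, 33]  # three '<2' results and two detects
censored = [True, True, True, False, False]
for f in (0.0, 0.5, 1.0):
    print(f, median(substitute(values, censored, f)))  # medians 0.0, 1.0, 2.0
```

When more than half the results are nondetects, the median is simply the substituted constant, so it reflects the analyst's choice rather than the data.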

Statistics – NADA MLE

Three approaches detailed:
› Substitution
› Maximum Likelihood Estimation
  › 50-80% censored data
  › Large number of data points
› Regression on Order Statistics
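MLE for left-censored lognormal data works on the log values: detects contribute the normal density and nondetects contribute the probability of falling below their detection limit. The sketch below maximizes that likelihood with the EM algorithm, chosen here only because it needs nothing beyond the standard library; it is not NADA's code, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean


def lognormal_mle_censored(values, censored, tol=1e-10, max_iter=5000):
    """Log-scale (mu, sigma) MLE for left-censored lognormal data, fitted by EM.

    values: measured value, or the detection limit for a '<DL' nondetect.
    censored: True where the result is a nondetect. Needs at least two detects.
    """
    logs = [math.log(v) for v in values]
    detected = [y for y, c in zip(logs, censored) if not c]
    if len(detected) < 2:
        raise ValueError("need at least two detected values")
    # Start from the detects only; the EM iterations then account for the nondetects
    mu = fmean(detected)
    sigma = max(math.sqrt(fmean([(y - mu) ** 2 for y in detected])), 1e-3)
    std = NormalDist()
    for _ in range(max_iter):
        ey, ey2 = [], []
        for y, c in zip(logs, censored):
            if c:
                # E-step: first two moments of N(mu, sigma^2) truncated above at log(DL)
                z = (y - mu) / sigma
                lam = std.pdf(z) / max(std.cdf(z), 1e-300)
                ey.append(mu - sigma * lam)
                ey2.append(mu * mu + sigma * sigma - sigma * lam * (y + mu))
            else:
                ey.append(y)
                ey2.append(y * y)
        # M-step: normal MLE from the expected sufficient statistics
        new_mu = fmean(ey)
        new_sigma = math.sqrt(max(fmean(ey2) - new_mu * new_mu, 1e-12))
        done = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma
```

With no nondetects this reduces to the mean and population SD of the logs; with heavy censoring EM converges slowly, which is one reason dedicated routines use Newton-type optimizers instead.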

Statistics – NADA MLE

These don’t look so good… MLE might be overestimating SD

Bay                       Mean (cfu/100 mL)   SD (cfu/100 mL)
Cedar Lake                158.67              1425.47
Copano Bay                118.83              59419.63
Cox Bay                   97.36               17599.06
Carancahua Bay            390.41              149704.98
East Matagorda Bay        199.75              175014.38
Lavaca Bay                392.71              57165.47
Matagorda Bay             273.56              96400.48
San Antonio Bay           77.24               5482.49
Tres Palacios/Turtle Bay  265.00              45315.73
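For a lognormal with log-scale parameters μ and σ, the arithmetic mean is exp(μ + σ²/2) and SD/mean = sqrt(exp(σ²) - 1), which grows explosively with σ. As a check, backing μ and σ out of the Cedar Lake MLE median (17.55) and 90th percentile (258.37) on the Results slide gives a mean of about 158.7 and an SD of about 1426, consistent with this table. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile, about 1.2816


def params_from_median_p90(median, p90):
    """Back out log-scale (mu, sigma) of a lognormal from its median and 90th percentile."""
    mu = math.log(median)
    return mu, (math.log(p90) - mu) / Z90


def lognormal_moments(mu, sigma):
    """Arithmetic mean and SD implied by log-scale parameters mu, sigma."""
    mean = math.exp(mu + sigma ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1)  # SD/mean depends only on sigma
    return mean, sd


mean, sd = lognormal_moments(*params_from_median_p90(17.55, 258.37))
print(round(mean, 1), round(sd))  # 158.7 1426
```

So a modest overestimate of σ on the log scale is enough to produce the very large arithmetic SDs above.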

Results – NADA MLE Plots

Results – NADA MLE

All values in cfu/100 mL.

Bay                       Median   Lower Conf. Limit   Upper Conf. Limit
Cedar Lake                17.55    10.18               30.26
Copano Bay                0.24     0.17                0.33
Cox Bay                   0.54     0.34                0.86
Carancahua Bay            1.02     0.76                1.36
East Matagorda Bay        0.23     0.16                0.32
Lavaca Bay                2.70     2.45                2.98
Matagorda Bay             0.78     0.64                0.94
San Antonio Bay           1.09     0.93                1.27
Tres Palacios/Turtle Bay  1.55     1.36                1.77

Bay                       90th Percentile   Lower Conf. Limit   Upper Conf. Limit
Cedar Lake                258.37            127.07              525.33
Copano Bay                21.78             17.26               27.49
Cox Bay                   33.55             22.28               50.52
Carancahua Bay            84.65             62.92               113.90
East Matagorda Bay        25.51             19.84               32.80
Lavaca Bay                154.04            137.29              172.82
Matagorda Bay             62.55             52.17               74.99
San Antonio Bay           45.89             39.27               53.62
Tres Palacios/Turtle Bay  94.41             81.62               109.21
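These MLE intervals are symmetric on the log scale (for Cedar Lake, ln 10.18 and ln 30.26 sit about 0.545 on either side of ln 17.55). A sketch of that kind of interval: the p-th percentile is exp(μ + z_p σ), and a Wald interval is built around μ + z_p σ on the log scale. The standard error here assumes the large-sample variances for uncensored normal data (Var μ = σ²/n, Var σ = σ²/2n, no covariance); censoring inflates them, and a real fit would take them from its information matrix.

```python
import math
from statistics import NormalDist


def lognormal_percentile_ci(mu, sigma, n, p=0.5, level=0.95):
    """Percentile of a lognormal plus a delta-method (Wald) CI built on the log scale.

    Assumes uncensored large-sample variances: Var(mu) = sigma^2 / n,
    Var(sigma) = sigma^2 / (2n), Cov = 0. Optimistic for heavily censored data.
    """
    z_p = NormalDist().inv_cdf(p)
    z_ci = NormalDist().inv_cdf(0.5 + level / 2)
    log_q = mu + z_p * sigma
    se = sigma * math.sqrt(1 / n + z_p ** 2 / (2 * n))
    return math.exp(log_q), math.exp(log_q - z_ci * se), math.exp(log_q + z_ci * se)
```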

Statistics – NADA ROS

Three approaches detailed:
› Substitution
› Maximum Likelihood Estimation
› [Robust] Regression on Order Statistics
  › Regression equation on probability plot
  › Use sample data where we have it
  › Assume a distribution only for censored data
  › Impute values for censored points
  › Best for small data sets

Results – NADA ROS Plots

Results – NADA ROS

All values in cfu/100 mL.

Bay                       Median   Lower Conf. Limit   Upper Conf. Limit
Cedar Lake                17.00    3.07                30.00
Copano Bay                0.50     0.47                0.68
Cox Bay                   1.06     0.99                2.00
Carancahua Bay            1.96     1.66                2.57
East Matagorda Bay        0.62     0.56                0.81
Lavaca Bay                4.00     4.00                5.00
Matagorda Bay             1.47     1.59                2.12
San Antonio Bay           2.05     1.99                2.51
Tres Palacios/Turtle Bay  2.59     2.50                3.15

Bay                       90th Percentile   Lower Conf. Limit   Upper Conf. Limit
Cedar Lake                306.00            110.00              920.00
Copano Bay                23.00             20.00               33.00
Cox Bay                   33.00             23.00               70.00
Carancahua Bay            79.00             70.00               130.00
East Matagorda Bay        33.00             23.00               33.00
Lavaca Bay                170.00            130.00              170.00
Matagorda Bay             70.00             49.00               79.00
San Antonio Bay           46.00             33.00               49.00
Tres Palacios/Turtle Bay  110.00            79.00               130.00
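The slides do not say how the ROS confidence limits were obtained. One common distribution-free option, sketched here purely as an assumption, takes order statistics whose ranks bracket n·p by z·sqrt(n·p·(1 - p)), the large-sample normal approximation to the binomial. Limits built this way always land on observed (or imputed) values.

```python
import math
from statistics import NormalDist


def quantile_ci_order_stats(values, p=0.5, level=0.95):
    """Distribution-free confidence interval for the p-th quantile.

    Picks order statistics whose ranks bracket n*p by z*sqrt(n*p*(1-p)),
    the large-sample normal approximation to the binomial.
    """
    x = sorted(values)
    n = len(x)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(n * p * (1 - p))
    lo_rank = max(1, math.floor(n * p - half))  # 1-based ranks
    hi_rank = min(n, math.ceil(n * p + half))
    return x[lo_rank - 1], x[hi_rank - 1]


print(quantile_ci_order_stats(range(1, 101)))  # (40, 60)
```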

Results – Prediction Intervals

Bay                       Mean        Lower PI    Upper PI
Cedar Lake                158.66759   158.14966   158.9488
Copano Bay                118.83158   114.54587   121.1857
Cox Bay                   97.35657    94.49062    98.92486
Carancahua Bay            390.40849   388.28908   391.5455
East Matagorda Bay        199.75109   195.32916   202.1536
Lavaca Bay                392.71178   391.40887   393.4091
Matagorda Bay             273.56241   271.14292   274.8631
San Antonio Bay           77.23869    75.22121    78.338
Tres Palacios/Turtle Bay  265.00299   263.28773   265.9233

Prediction interval: "bracket the range of locations for … observations not currently in the data set."

A new observation should fall outside a 95% prediction interval only 1 - 0.95 = 5% of the time.

Parameters were estimated with the MLE method.
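For lognormal data, a prediction interval for one new observation is usually built on the log scale as μ ± z·σ·sqrt(1 + 1/n) and then exponentiated; the 1/n term carries the uncertainty in μ. The slide does not give its exact formula, so this is a sketch under stated assumptions: μ and σ come from a fit such as MLE, and the normal quantile z stands in for Student's t (a large-n approximation). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_prediction_interval(mu, sigma, n, level=0.95):
    """Interval expected to contain one new observation with probability `level`.

    Works on the log scale: mu +/- z * sigma * sqrt(1 + 1/n), then exponentiates.
    Uses the normal quantile z in place of Student's t (large-n approximation).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma * math.sqrt(1 + 1 / n)
    return math.exp(mu - half), math.exp(mu + half)
```

Because it covers a single future sample rather than a summary statistic, this interval is centered on the median exp(μ) and spans roughly a factor of exp(±1.96σ) around it.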

Future Work

› Repeat for the last 5 years and last 7 years of data
  › Is water quality in the bays improving or declining?
› Use the method/findings in the Copano Bay project to predict the median/90th percentile given a geometric mean from the model
› Look at spatial variation within each bay
  › Though regulation is not done this way
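Under a lognormal assumption, the geometric mean exp(μ) is also the median, and the 90th percentile is GM·exp(z₀.₉σ). So a modeled geometric mean plus a log-scale σ (taken here, as an assumption, from the historical fit) yields both regulatory statistics. The inputs and function name are hypothetical.

```python
import math
from statistics import NormalDist, geometric_mean


def median_and_p90_from_geomean(modeled_values, sigma):
    """Median and 90th percentile implied by a modeled geometric mean.

    Assumes lognormal concentrations whose log-scale SD `sigma` comes from the
    historical fit (hypothetical input); for a lognormal, median == geometric mean.
    """
    gm = geometric_mean(modeled_values)
    return gm, gm * math.exp(NormalDist().inv_cdf(0.90) * sigma)
```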

Thanks & Questions

Thanks to:
› Stephanie Johnson
› Grace Chen
› Sammy Sandoval
› Dr. Maidment

Results without NADA

All values in cfu/100 mL.

Bay                       Median   Lower Conf. Limit   Upper Conf. Limit
Cedar Lake                17       10                  30
Chocolate Bay             4        4                   5
Copano Bay                2        2                   2
Cox Bay                   2        2                   2
Carancahua Bay            2        2                   2
East Matagorda            2        2                   2
Matagorda Bay             2        2                   2
San Antonio               2        2                   2
Tres Palacios             2        2                   2

Bay                       90th Percentile   Lower Conf. Limit   Upper Conf. Limit
Cedar Lake                350               110                 920
Chocolate Bay             170               130                 170
Copano Bay                23                20                  33
Cox Bay                   33                23                  70
Carancahua Bay            79                70                  130
East Matagorda            33                23                  33
Matagorda Bay             70                49                  79
San Antonio               46                33                  49
Tres Palacios             110               79                  130