digital spatial data for observed, predicted, and ...digital spatial data for observed, predicted,...

10
U.S. Department of the Interior U.S. Geological Survey Data Series 697 Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate and Arsenic Concentrations in Basin-Fill Aquifers in the Southwest Principal Aquifers Study Area National Water-Quality Assessment Program Observed arsenic concentration, micrograms per liter Less than 1.0 1.0 to 1.9 2.0 to 2.9 3.0 to 4.9 5.0 to 9.9 10 to 24 Equal to or greater than 25 EXPLANATION Arsenic Training Observations Trinity Trinity River River Eel River Eel River Russian River Russian River River River Sacramento Sacramento Feather River Feather River North North Yuba Yuba River River American River American River Truckee Truckee River River Mokelumne River Mokelumne River River River Sacramento Sacramento River River American American North Fork North Fork River River Tuolumne Tuolumne Merced River Merced River River River San Joaquin San Joaquin River River San Joaquin San Joaquin River River Kings Kings Salinas River Salinas River Kern River Kern River Quinn River Quinn River River River Humboldt Humboldt Reese River Reese River Sevier River Sevier River River River Bear Bear Jordan River Jordan River Meadow Valley Wash Meadow Valley Wash White River White River Pahranagat Wash Pahranagat Wash Muddy River Muddy River Amaragosa Amaragosa River River Verde River Verde River Salt River Salt River Colorado River Colorado River Gila River Gila River Mojave River Mojave River Rio Grande Rio Grande Rio Chama Rio Chama Rio Puerco Rio Puerco Owens River Owens River Grande Grande Rio Rio ARIZONA NEW MEXICO UTAH COLORADO MEXICO OREGON CALIFORNIA IDAHO WYOMING NEVADA 0 45 90 135 180 MILES 0 50 100 150 200 KILOMETERS 35° 40° 124° 117° 110° Study-area boundary Reno Reno Sacramento Sacramento San Francisco San Francisco Stockton Stockton Modesto Modesto San Jose San Jose Fresno Fresno Bakersfield Bakersfield Las Vegas Las Vegas Los Angeles Los Angeles San Diego San Diego Yuma Yuma Phoenix Phoenix Tucson Tucson El Paso El Paso Flagstaff Flagstaff Santa Fe Santa Fe Albuquerque Albuquerque Salt Lake City Salt Lake City U.S. Geological Survey digital data, 1:2,000,000, 2,500,000, and 5,000,000 scale, 2003, 2005, and 2006 National Elevation Data 1:24,000, 1999 Albers Equal Area Conic Projection, central meridian -113, NAD 83 Pacific Ocean Las Cruces Las Cruces

Upload: others

Post on 28-Sep-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate

U.S. Department of the InteriorU.S. Geological Survey

Data Series 697

Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate and Arsenic Concentrations in Basin-Fill Aquifers in the Southwest Principal Aquifers Study Area

National Water-Quality Assessment Program

Observed arsenic concentration, micrograms per literLess than 1.01.0 to 1.92.0 to 2.93.0 to 4.95.0 to 9.910 to 24Equal to or greater than 25

EXPLANATION

Arsenic Training Observations

TrinityTrinity

RiverRiver

Eel RiverEel River

Russian River

Russian River

RiverRiver

Sacramento

Sacramento

Feath

er R

iver

Feath

er R

iver

NorthNorthYubaYuba River

River

American RiverAmerican River

TruckeeTruckee RiverRiver

Mokelumne River

Mokelumne RiverRiverRiver

Sacramento

Sacramento

RiverRiverAmericanAmerican

North

For

k

North

For

k

RiverRiverTuolumneTuolumne

Merced River

Merced River

RiverRiver

San JoaquinSan Joaquin

RiverRiver

San Joaquin

San Joaquin

River

River

KingsKings

Salinas River

Salinas River

Kern River

Kern River

Quinn R

iver

Quinn R

iver

RiverRiver

Humboldt

Humboldt

Rees

e Rive

r

Rees

e Rive

r

Sevier R

iver

Sevier R

iver

River

River

Bear

Bear

Jordan RiverJordan River

Mea

dow

Valle

y Was

h

Mea

dow

Valle

y Was

hW

hite RiverW

hite River

Pahranagat Wash

Pahranagat Wash

Muddy

RiverM

uddy

River

Amaragosa

Amaragosa

RiverRiver

Verde River

Verde River

Salt RiverSalt River

Colorado River

Colorado River

Gila RiverGila River

Mojave River

Mojave River

Rio Grande

Rio Grande

RioChama

RioChama

Rio PuercoRio Puerco

Owens River

Owens River

GrandeGrande

RioRio

ARIZONA NEW MEXICO

UTAH COLORADO

MEXICO

OREGON

CALIFORNIA

IDAHO

WYOMINGNEVADA

0 45 90 135 180 MILES

0 50 100 150 200 KILOMETERS

35°

40°

124°

117°

110°

Study-areaboundary

RenoReno

SacramentoSacramento

SanFrancisco

SanFrancisco

StocktonStockton

ModestoModesto

San JoseSan Jose

FresnoFresno

BakersfieldBakersfield

Las VegasLas Vegas

Los AngelesLos Angeles

San DiegoSan DiegoYumaYuma

PhoenixPhoenix

TucsonTucson

El PasoEl Paso

FlagstaffFlagstaff

Santa FeSanta Fe

AlbuquerqueAlbuquerque

Salt Lake CitySalt Lake City

U.S. Geological Survey digital data, 1:2,000,000, 2,500,000, and 5,000,000 scale, 2003, 2005, and 2006National Elevation Data 1:24,000, 1999Albers Equal Area Conic Projection, central meridian -113, NAD 83

Pacific Ocean

LasCrucesLasCruces

Page 2: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate
Page 3: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate

Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate and Arsenic Concentrations in Basin-Fill Aquifers in the Southwest Principal Aquifers Study Area

By Tim S. McKinney and David W. Anning

National Water-Quality Assessment Program

Data Series 697

U.S. Department of the InteriorU.S. Geological Survey

Page 4: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate

U.S. Department of the InteriorKEN SALAZAR, Secretary

U.S. Geological SurveyMarcia K. McNutt, Director

U.S. Geological Survey, Reston, Virginia: 2012

For more information on the USGS—the Federal source for science about the Earth, its natural and living resources, natural hazards, and the environment, visit http://www.usgs.gov or call 1–888–ASK–USGS.

For an overview of USGS information products, including maps, imagery, and publications, visit http://www.usgs.gov/pubprod

To order this and other USGS information products, visit http://store.usgs.gov

Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Although this report is in the public domain, permission must be secured from the individual copyright owners to reproduce any copyrighted materials contained within this report.

Suggested citation:McKinney, T.S. and Anning, D.W., 2012, Digital spatial data for observed, predicted, and misclassification errors for observations in the training dataset for nitrate and arsenic concentrations in basin-fill aquifers in the Southwest Principal Aquifers study area: U.S. Geological Survey Data Series Report 697, 2 p. Available at http://pubs.usgs.gov/ds/697.

Page 5: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate

iii

ContentsAbstract ......................................................................................................................................................... 1Background ................................................................................................................................................... 1Development of Classifiers ......................................................................................................................... 1Supplemental Information .......................................................................................................................... 1References Cited .......................................................................................................................................... 2

Page 6: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate

iv

Page 7: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate

Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate and Arsenic Concentrations in Basin-Fill Aquifers in the Southwest Principal Aquifers Study Area

AbstractThis product “Digital spatial data for observed, predicted,

and misclassification errors for observations in the training dataset for nitrate and arsenic concentrations in basin-fill aquifers in the Southwest Principal Aquifers study area” is a 1:250,000-scale point spatial dataset developed as part of a regional Southwest Principal Aquifers (SWPA) study (Anning and others, 2012). The study examined the vulnerability of basin-fill aquifers in the southwestern United States to nitrate contamination and arsenic enrichment. Statistical models were developed by using the random forest classifier algorithm to predict concentrations of nitrate and arsenic across a model grid that represents local- and basin-scale measures of source, aquifer susceptibility, and geochemical conditions.

BackgroundThis dataset was developed as part of a regional Southwest

Principal Aquifers (SWPA) study and for the National Water-Quality Assessment (NAWQA) Program. The dataset repre-sents observed nitrate and arsenic concentrations that were used to train the confirmatory and prediction classifiers.

Development of ClassifiersSeparate classifiers were developed for nitrate and arsenic

because each constituent was expected to be affected by a different set of factors, and each factor could have a different magnitude or directional influence (increase/decrease) on con-centration. For each constituent, two different classifiers were developed: a prediction classifier and a confirmatory classi-fier. The prediction classifiers were developed specifically to predict nitrate and arsenic concentrations in basin-fill aquifers across the SWPA study area and were based on explanatory variables representing source and susceptibility conditions. These explanatory variables were available throughout the entire SWPA study area and, therefore, did not pose a limita-tion for using the classifiers to predict concentrations.

The confirmatory classifiers were developed to supplement the prediction classifiers in the evaluation of the conceptual model. The name, “confirmatory,” reflects the classifier’s pur-pose for evaluation of a-priori hypotheses and contrasts other general types of statistical models, such as those used for pre-diction or exploratory purposes. The confirmatory classifiers included the explanatory variables used in the prediction clas-sifiers, as well as additional variables representing geochemi-cal conditions and basin groundwater budget components. The inclusion of the geochemical and basin groundwater budget variables in the confirmatory classifiers allowed for further evaluation of the conceptual models, which was not possible with the prediction classifiers alone. The geochemical data, however, were only available at specific well locations, and consistent water-budget data were not available for every basin in the study area. The limited availability of the data for these variables constrained the confirmatory classifiers to obser-vations from 16 case-study basins and precluded use of the confirmatory classifier for predicting concentrations across the SWPA study area. To contrast the scope of the two classifiers, the confirmatory classifiers were developed by using all avail-able explanatory variables but with observations restricted to the 16 case-study basins, whereas the prediction classifiers were unrestricted with respect to spatial extent because these were developed by using a subset of the explanatory variables that were available throughout the study area.

Supplemental InformationThe nitrate and arsenic observations, predictions, and

misclassification error data for observations are part of a larger dataset of explanatory variables and model input data. Those data are archived in tabular form along with the model report. The explanatory data may be joined to this dataset via the site identifier. For more information on the development of the explanatory variables, see “Compilation and Processing of Explanatory Variables” in the “Approach and Methods” sec-tion of Anning and others (2012).

The digital dataset can be downloaded from the USGS at http://water.usgs.gov/lookup/getspatial?ds2012-697_SWPA_NO3_As_training

Page 8: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate

2 Digital Spatial Data for Observed, Predicted, and Misclassification Errors

References Cited

Anning, D.W., Paul, A.P., McKinney, T.S., Huntington, J.M., Bexfield, L.M., and Thiros, S.A., 2012, Predicted nitrate and arsenic concentrations in basin-fill aquifers of the south-western United States: U.S. Geological Survey Scientific Investigations Report 2012–5065, 78 p. Available at http://pubs.usgs.gov/sir/2012/5065.

Page 9: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate
Page 10: Digital Spatial Data for Observed, Predicted, and ...Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate

McKinney and Anning—

Digital Spatial D

ata for Observed, Predicted, and M

isclassification Errors for Observations in the Training D

ataset for Nitrate and A

rsenic Concentrations in B

asin-Fill Aquifers in the Southw

est Principal Aquifers Study A

rea—Data Series 697