![Page 1: Big data for big river science: data intensive tools, …...Big data for big river science: data intensive tools, techniques, and projects at the USGS/Columbia Environmental Research](https://reader035.vdocuments.site/reader035/viewer/2022062302/5ec47fc81209b03b346ebf12/html5/thumbnails/1.jpg)
Big data for big river science: data intensive tools, techniques, and projects at the USGS/Columbia Environmental Research Center
Ed Bulliner, U.S. Geological Survey, Columbia Environmental Research Center
Goals of Presentation
• How do the data available to us differ from those of the past?
• What different approaches are needed to analyze these data?
• What questions are we asking and answering that we could not before?
• ‘Big river science’: four examples
• How does this relate to NRDAR/ecological restoration?
“Big Data”
• What is “big data”?
  • Emerging field
  • Several definitions: volume, variety, variability
• Do we work with ‘big data’ or ‘lots of data’?
  • Is that distinction important?
  • Regardless of semantics, the scale and complexity of problems, and of the data they require, are increasing
• What do increasing amounts of data mean for science and scientists?
• How do we get the most value from the data available to us?
• Why is this important?
Data Intensive Science
• Paradigm shift in how we do science
• Can ask (and answer) new kinds of questions
• New tools and techniques
Traditional versus Data-Intensive Analyses
• Where do we see ‘data-intensive’ science?
  • Within river science?
  • Within USGS/government?
• Why now? (What’s different?)
  • Data availability
  • Data resolution
  • Computational power
• What are the different tools and approaches currently used?
Tools for Data-Intensive Analyses
• Data storage
  • Increased hard drive space
  • Databases
• Data manipulation
  • Scripting languages
  • Web scraping/data ‘munging’/data mining
• Modeling
  • Scripting languages
  • Modeling packages
• Data visualization
Python
[Figure: uses of Python: OS operations, web queries, database integration, IDL, ArcGIS & ArcPy, data visualization, statistics]
• General-purpose scripting language
• Lots of modules
• Free*
• Tools for:
  • Data management
  • Data filtering/cleaning
  • Scientific computing
  • Geospatial analyses
  • Plotting
  • Collaborating
Pretty cool, but what can we use it for?
Question: Where do riverine sandbars exist, and how do they change over time?
• Create a database of rivers and flows
• Mask the active channel within the overlap of rivers and Landsat images
• Integrate Landsat metadata with corresponding discharge data through a relational database
• Query imagery by discharge/date
• Automated download and analysis of imagery: time series of sandbars
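The relational step above (linking image metadata to same-day discharge, then querying imagery by flow) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the project's database: the table layout, the scene and gage IDs, the values, and the helper `scenes_in_flow_range` are all hypothetical.

```python
import sqlite3

# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scenes (
        scene_id TEXT PRIMARY KEY,
        acq_date TEXT NOT NULL,      -- ISO date the image was acquired
        cloud_cover REAL             -- percent cloud cover from scene metadata
    );
    CREATE TABLE discharge (
        gage_id TEXT NOT NULL,
        obs_date TEXT NOT NULL,      -- ISO date of the daily mean discharge
        q_cms REAL NOT NULL,         -- daily mean discharge, m^3/s
        PRIMARY KEY (gage_id, obs_date)
    );
    """
)
conn.executemany(
    "INSERT INTO scenes VALUES (?, ?, ?)",
    [("SCENE_A", "2010-07-01", 5.0),
     ("SCENE_B", "2010-08-02", 40.0),
     ("SCENE_C", "2011-07-15", 2.0)],
)
conn.executemany(
    "INSERT INTO discharge VALUES (?, ?, ?)",
    [("GAGE_1", "2010-07-01", 850.0),
     ("GAGE_1", "2010-08-02", 900.0),
     ("GAGE_1", "2011-07-15", 4200.0)],
)

def scenes_in_flow_range(conn, gage_id, q_min, q_max, max_cloud=10.0):
    """Return (scene_id, acq_date, q_cms) for scenes acquired on days when the
    gage discharge was within [q_min, q_max] and cloud cover was acceptable."""
    sql = """
        SELECT s.scene_id, s.acq_date, d.q_cms
        FROM scenes AS s
        JOIN discharge AS d ON d.obs_date = s.acq_date
        WHERE d.gage_id = ?
          AND d.q_cms BETWEEN ? AND ?
          AND s.cloud_cover <= ?
        ORDER BY s.acq_date
    """
    return conn.execute(sql, (gage_id, q_min, q_max, max_cloud)).fetchall()

low_flow = scenes_in_flow_range(conn, "GAGE_1", 500.0, 1000.0)
```

Here `SCENE_B` is dropped for cloud cover and `SCENE_C` for discharge, leaving one low-flow scene whose ID could then be handed to an automated download step.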
• Identified areas of persistent sand
• Investigated flows at which sand was exposed
• Examined spatial variation
• Used metrics of exposure to help model the success of Least Tern nests
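One plausible exposure metric is the fraction of usable observations in which a pixel was classified as sand; pixels near 1.0 would be "persistent." The sketch below assumes per-scene boolean sand masks and validity masks (cloud-free, inside the active channel) have already been produced; the function name and the 0.9 threshold are hypothetical, not the values used in the study.

```python
import numpy as np

def sand_exposure_frequency(sand_masks, valid_masks):
    """Fraction of valid observations in which each pixel was sand.

    sand_masks, valid_masks: boolean arrays shaped (n_scenes, rows, cols).
    Pixels never observed get NaN.
    """
    sand = np.asarray(sand_masks, dtype=bool)
    valid = np.asarray(valid_masks, dtype=bool)
    n_sand = np.sum(sand & valid, axis=0)
    n_valid = np.sum(valid, axis=0)
    freq = np.full(n_valid.shape, np.nan)
    observed = n_valid > 0
    freq[observed] = n_sand[observed] / n_valid[observed]
    return freq

# Three toy 2x2 scenes (illustrative values only).
sand = np.array([[[1, 0], [1, 0]],
                 [[1, 0], [0, 0]],
                 [[1, 1], [0, 0]]], dtype=bool)
valid = np.array([[[1, 1], [1, 0]],
                  [[1, 1], [1, 0]],
                  [[1, 1], [1, 0]]], dtype=bool)

freq = sand_exposure_frequency(sand, valid)
# Hypothetical "persistent sand" threshold; unobserved pixels count as not persistent.
persistent = np.nan_to_num(freq, nan=0.0) >= 0.9
```

Dividing by valid observations rather than by the total scene count keeps cloudy scenes from being read as "no sand."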
Main Points
• Scripts and databases allow for automated downloading and linking of multiple data types
• Too much data for manual analysis
• Python can be used to batch-process images across programs without manual intervention
• Scripted tools can be used to directly query, plot, and perform statistics on image data
Question: What information can we synthesize from a 400+ day archive of field measurements?
[Figure: ADCP measurement schematic. Color scale: velocity, in meters per second, from 0 (slow) to 2.5 (fast). Labels: velocity ensemble, velocity bin, river bottom, water column, 4-beam depths]
• Velocities and depths measured along regular transects
• Lateral, longitudinal, and vertical variability
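Each transect is a matrix of ensembles (lateral/longitudinal position) by bins (vertical position). A common first reduction is one depth and one depth-averaged velocity per ensemble. The sketch below is a simplification under stated assumptions: depth is taken as the mean of the four beam depths, invalid bins are stored as NaN, and the unmeasured zones near the surface and bed are ignored rather than extrapolated as full ADCP processing would do. Function names and values are hypothetical.

```python
import numpy as np

def ensemble_depth(beam_depths):
    """Mean of the four beam depths for each ensemble; input shaped (n_ensembles, 4)."""
    return np.mean(np.asarray(beam_depths, dtype=float), axis=1)

def depth_averaged_velocity(bin_velocity):
    """Mean of valid (non-NaN) velocity bins for each ensemble.

    bin_velocity: array shaped (n_ensembles, n_bins), m/s, with NaN for bins
    below the river bottom or otherwise invalid. Simplification: unmeasured
    near-surface and near-bed zones are not extrapolated.
    """
    return np.nanmean(np.asarray(bin_velocity, dtype=float), axis=1)

bins = np.array([[1.25, 1.0, 0.75, np.nan],
                 [1.5, 1.25, np.nan, np.nan]])
beams = np.array([[3.0, 3.2, 2.8, 3.0],
                  [2.5, 2.5, 2.5, 2.5]])

v_avg = depth_averaged_velocity(bins)   # one value per ensemble
depth = ensemble_depth(beams)
```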
ADCP and single-beam survey dates, locations, and discharges, 2000-2015
[Figure legend: flow percentile: low <25%, medium 25-75%, high >75%]
• Compiled over 32,000 individual cross-sections from 2000-2015
• Joined dataset to river mile and gage to allow discharge-specific queries
• Data can be grouped by location along the river and by discharge level for comparison
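Grouping by location and discharge can be sketched with the standard library alone. The flow classes follow the survey map legend (low <25th percentile, medium 25th-75th, high >75th); everything else is assumed for illustration: the record layout, the 10-mile reach bins, the helper names, and the toy numbers.

```python
import statistics
from collections import defaultdict

def flow_class(q, p25, p75):
    """Low <25th percentile, medium 25th-75th, high >75th."""
    if q < p25:
        return "low"
    if q > p75:
        return "high"
    return "medium"

def summarize_by_reach_and_flow(sections, gage_record, reach_length_mi=10.0):
    """Group cross-sections by river-mile reach and flow class.

    sections: iterable of dicts with 'river_mile', 'discharge', 'mean_velocity'.
    gage_record: discharges used to define the flow percentiles.
    Returns {(reach_start_mile, flow_class): (n_sections, mean_velocity)}.
    """
    p25, _, p75 = statistics.quantiles(gage_record, n=4)
    groups = defaultdict(list)
    for s in sections:
        reach = (s["river_mile"] // reach_length_mi) * reach_length_mi
        key = (reach, flow_class(s["discharge"], p25, p75))
        groups[key].append(s["mean_velocity"])
    return {k: (len(v), statistics.mean(v)) for k, v in groups.items()}

# Toy gage record and cross-sections (illustrative only).
gage_record = [float(q) for q in range(100, 1100, 100)]
sections = [
    {"river_mile": 12.3, "discharge": 150.0, "mean_velocity": 0.9},
    {"river_mile": 17.8, "discharge": 160.0, "mean_velocity": 1.1},
    {"river_mile": 25.0, "discharge": 500.0, "mean_velocity": 1.4},
    {"river_mile": 26.0, "discharge": 1000.0, "mean_velocity": 2.0},
]
result = summarize_by_reach_and_flow(sections, gage_record)
```

Defining percentiles from the gage record, rather than from the survey days themselves, keeps the classes tied to the river's flow regime instead of to when crews happened to be in the field.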
• Ongoing restoration question: how does habitat (velocity) compare between river chutes and the main channel?
  • Chutes represent restoration sites
  • 37 field days on which measurements in chutes were taken, incidentally or deliberately
• Geospatial tools and scripts can be used to construct relevant comparisons
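A chute-versus-main-channel comparison can be restricted to days with measurements in both, so each pair shares roughly the same discharge. This sketch assumes each measurement has already been labeled 'chute' or 'main' by a prior spatial join against channel polygons; the tuple layout, the function name, and the sample values are hypothetical.

```python
import statistics
from collections import defaultdict

def chute_vs_main(measurements):
    """Per field day, median velocity in chutes and in the main channel.

    measurements: iterable of (field_date, channel_type, velocity_m_s), where
    channel_type is 'chute' or 'main'. Days lacking either type are skipped.
    Returns a list of (date, chute_median, main_median, ratio) sorted by date.
    """
    by_day = defaultdict(lambda: {"chute": [], "main": []})
    for date, kind, v in measurements:
        by_day[date][kind].append(v)
    out = []
    for date in sorted(by_day):
        chute, main = by_day[date]["chute"], by_day[date]["main"]
        if chute and main:
            c, m = statistics.median(chute), statistics.median(main)
            out.append((date, c, m, c / m))
    return out

measurements = [
    ("2012-06-01", "chute", 0.5), ("2012-06-01", "chute", 0.7),
    ("2012-06-01", "main", 1.2), ("2012-06-01", "main", 1.6),
    ("2012-06-02", "main", 1.5),   # no chute data that day: skipped
]
rows = chute_vs_main(measurements)
```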
Measurement archive in lieu of a hydrodynamic model: sturgeon spawning locations?
Main Points
• Scripts and databases allow for efficient querying and cleaning of archived datasets
• Python can be used to quickly and interactively summarize datasets by specific groupings
• Existing data can be repurposed and integrated with new data for value-added analyses using scripting
Question: How can we better visualize field measurements of channel velocity and bathymetry?
• Measurements of velocity collected along ‘regular’ transects
• Python used to interpolate data onto a structured grid (3D matrix)
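The slides do not say which interpolation method was used, so the sketch below swaps in inverse-distance weighting (IDW) as a simple, clearly named stand-in for turning scattered transect samples into a regular 3D grid. The function name, the power of 2, and the toy points are assumptions.

```python
import numpy as np

def idw_to_grid(points, values, x_nodes, y_nodes, z_nodes, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation of scattered (x, y, z) samples
    onto a regular grid. Returns an array shaped (nx, ny, nz)."""
    pts = np.asarray(points, dtype=float)          # (n, 3)
    vals = np.asarray(values, dtype=float)         # (n,)
    gx, gy, gz = np.meshgrid(x_nodes, y_nodes, z_nodes, indexing="ij")
    nodes = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])  # (m, 3)
    # Distance from every grid node to every sample: shape (m, n).
    d = np.linalg.norm(nodes[:, None, :] - pts[None, :, :], axis=2)
    # eps keeps weights finite when a node coincides with a sample.
    w = 1.0 / np.maximum(d, eps) ** power
    grid = (w @ vals) / w.sum(axis=1)
    return grid.reshape(gx.shape)

points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
values = [1.0, 3.0]
grid = idw_to_grid(points, values, [0.0, 5.0, 10.0], [0.0], [0.0])
```

The all-pairs distance matrix is fine for a sketch but grows as grid nodes × samples, so archives with millions of points would need a spatial index or tiling. Because depth spans meters while transects span hundreds of meters, the vertical axis would usually be rescaled before computing distances.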
ParaView
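One way to hand a gridded 3D array to ParaView without extra dependencies is the ASCII legacy VTK `STRUCTURED_POINTS` format, which ParaView reads. The slides do not say which file format was used, so treat this as an assumed route. Legacy VTK lists point values with x varying fastest, then y, then z, which is column-major (Fortran) order for an array shaped (nx, ny, nz). The function name, spacing, and output path are hypothetical.

```python
import os
import tempfile

import numpy as np

def write_vtk_structured_points(path, grid, origin=(0.0, 0.0, 0.0),
                                spacing=(1.0, 1.0, 1.0), name="velocity"):
    """Write a 3D array shaped (nx, ny, nz) as an ASCII legacy VTK file."""
    g = np.asarray(grid, dtype=float)
    nx, ny, nz = g.shape
    with open(path, "w") as f:
        f.write("# vtk DataFile Version 3.0\n")
        f.write("Interpolated ADCP velocity\n")      # free-text title line
        f.write("ASCII\n")
        f.write("DATASET STRUCTURED_POINTS\n")
        f.write(f"DIMENSIONS {nx} {ny} {nz}\n")
        f.write("ORIGIN {} {} {}\n".format(*origin))
        f.write("SPACING {} {} {}\n".format(*spacing))
        f.write(f"POINT_DATA {g.size}\n")
        f.write(f"SCALARS {name} float 1\n")
        f.write("LOOKUP_TABLE default\n")
        # x fastest, then y, then z: Fortran order for an (nx, ny, nz) array.
        for v in g.ravel(order="F"):
            f.write(f"{v:.6g}\n")

demo = np.arange(8, dtype=float).reshape(2, 2, 2)
out_path = os.path.join(tempfile.mkdtemp(), "velocity_demo.vtk")
write_vtk_structured_points(out_path, demo, spacing=(5.0, 5.0, 0.25))
with open(out_path) as f:
    lines = f.read().splitlines()
```

For tens of millions of points a binary format would be much smaller and faster to load; ASCII is used here only because it is easy to inspect.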
• Flowlines around structures can be visualized (of biological interest)
• Identified bias in field measurements?
• Noticed systematic bias
• Collaborating with ILWSC
33 million+ data points!
Main Points
• Python scripts allow for interpolation and visualization of field data
• Using open-source (free) tools along with Python makes it possible to replicate capabilities of more expensive software
• New insights can be gained from visualizing data in different ways
Question: How can we better characterize inundation patterns along the Missouri River?
• Hydrodynamic (HEC-RAS) model provided by USACE describing water surface elevations at cross sections over time
• Used scripting to extend cross sections across the Missouri River floodplain
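Extending a cross section across the floodplain can be as simple as lengthening its line beyond both endpoints while keeping its orientation. The sketch assumes straight cross sections in projected coordinates (meters) and fixed extension distances. Real HEC-RAS cross sections may be polylines, and the actual extension rule is not given in the slides. The function name and numbers are hypothetical.

```python
import math

def extend_cross_section(x1, y1, x2, y2, left_ext, right_ext):
    """Lengthen a straight cross section beyond both endpoints.

    (x1, y1), (x2, y2): original endpoints in projected coordinates.
    left_ext / right_ext: distances added beyond the first / second endpoint.
    Returns the new ((x, y), (x, y)) endpoints on the same bearing.
    """
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("cross-section endpoints coincide")
    ux, uy = dx / length, dy / length   # unit vector along the section
    return ((x1 - ux * left_ext, y1 - uy * left_ext),
            (x2 + ux * right_ext, y2 + uy * right_ext))

# A 500 m section on a 3-4-5 bearing, extended 1,000 m left and 500 m right.
start, end = extend_cross_section(0.0, 0.0, 300.0, 400.0, 1000.0, 500.0)
```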
• Merged LIDAR and channel data provide a high-resolution characterization of floodplain elevation
• Spatial interpolations of water elevation
• Calculations of inundation depths
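The last two steps combine as depth = water-surface elevation − ground elevation, clipped at zero where the ground is higher. The sketch assumes each raster cell has already been assigned a river station (for example, the river mile of its extended cross section), so the water surface can be interpolated linearly along the river with `np.interp`. This is a simplification of spatial interpolation, and the names and values are illustrative.

```python
import numpy as np

def inundation_depth(cell_station, ground_elev, xs_station, xs_wse):
    """Water depth per cell from modeled cross-section water surfaces.

    cell_station: river station assigned to each raster cell (any shape).
    ground_elev: merged LIDAR/channel elevation for each cell (same shape).
    xs_station: cross-section stations, increasing (required by np.interp).
    xs_wse: modeled water-surface elevation at each station for one date.
    """
    wse = np.interp(cell_station, xs_station, xs_wse)   # same shape as cells
    return np.maximum(wse - np.asarray(ground_elev, dtype=float), 0.0)

xs_station = [0.0, 10.0]
xs_wse = [100.0, 101.0]
cell_station = np.array([[0.0, 5.0], [10.0, 10.0]])
ground = np.array([[98.0, 101.0], [100.5, 102.0]])
depth = inundation_depth(cell_station, ground, xs_station, xs_wse)
```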
Inundation return interval statistics
Base unit for calculations: one water-depth raster grid (30 m) for one area on one date
Time series of rasters: one per day for 29,892 modeled days
Stack over time: a structured 3-dimensional matrix of data
• x and y are geospatial coordinates (raster dimensions)
• z is the time coordinate (29,892 days)
• Water depth for each x, y, z
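Stacking the daily rasters along a third axis produces the x, y, time cube, so any pixel's full record is a single slice. The toy 3 × 4 rasters stand in for real 30 m grids. The size estimate at the end uses a hypothetical 1,000 × 1,000-cell area stored as 32-bit floats. It shows why 29,892 days of such a grid (about 120 GB) would not fit in one machine's memory.

```python
import numpy as np

def stack_daily_rasters(rasters):
    """Stack equally sized 2D depth rasters (rows, cols) into one array
    shaped (rows, cols, n_days); axis 2 is time."""
    return np.stack(rasters, axis=2)

days = [np.full((3, 4), float(d)) for d in range(5)]   # toy daily depth grids
cube = stack_daily_rasters(days)
pixel_series = cube[1, 2, :]   # the full time series for one pixel

# Hypothetical area size: 1,000 x 1,000 cells, float32 (4 bytes), 29,892 days.
bytes_needed = 1000 * 1000 * 29_892 * 4
```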
Data stored on disk in Hierarchical Data Format (HDF) to allow computationally efficient slicing in the time domain
Setting an inundation threshold allows inundated periods to be identified for each pixel
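Once one pixel's time series has been read (the HDF layout is what makes that read cheap), thresholding and finding inundated periods is a run-detection problem. The sketch pads the wet/dry series with dry days so every wet run has a rising and a falling edge, then locates edges with `np.diff`. The threshold value and function name are assumptions.

```python
import numpy as np

def inundated_periods(depth_series, threshold=0.0):
    """(start, end) index pairs, end exclusive, of consecutive days on which
    depth exceeds the threshold, for one pixel's time series."""
    wet = np.asarray(depth_series) > threshold
    # Pad with dry days so each wet run starts with +1 and ends with -1.
    edges = np.diff(np.concatenate(([False], wet, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

depths = [0.0, 0.2, 0.5, 0.0, 0.0, 0.1, 0.3]
periods = inundated_periods(depths, threshold=0.05)   # hypothetical threshold
```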
Data can be aggregated by year
Evaluate inundation status by criteria (such as the longest consecutive inundated period during the growing season)
Summarize metrics across all modeled years
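The aggregation above can be sketched for one pixel: keep only growing-season days, compute the longest wet run in each year, then summarize across years. The May 1 to September 30 season, the 14-day criterion, and the return interval defined as 1 / (fraction of years meeting the criterion) are all assumptions for illustration, not the study's definitions. The run count also assumes the daily record has no missing days.

```python
import datetime as dt
import statistics
from collections import defaultdict

def longest_run(flags):
    """Length of the longest run of True values."""
    best = cur = 0
    for f in flags:
        cur = cur + 1 if f else 0
        best = max(best, cur)
    return best

def annual_growing_season_metric(dates, wet, season=((5, 1), (9, 30))):
    """Longest consecutive inundated period within the (hypothetical)
    growing season, per year. dates: ordered daily datetime.date values;
    wet: matching booleans (depth above the inundation threshold)."""
    (m0, d0), (m1, d1) = season
    by_year = defaultdict(list)
    for day, flag in zip(dates, wet):
        if (m0, d0) <= (day.month, day.day) <= (m1, d1):
            by_year[day.year].append(flag)
    return {year: longest_run(f) for year, f in sorted(by_year.items())}

def summarize(annual, min_days=14):
    """Across-year summary; return interval = 1 / fraction of years meeting min_days."""
    hits = sum(1 for v in annual.values() if v >= min_days)
    freq = hits / len(annual)
    return {
        "median_longest_run": statistics.median(annual.values()),
        "fraction_of_years": freq,
        "return_interval_years": (1 / freq) if hits else float("inf"),
    }

start = dt.date(2001, 1, 1)
dates = [start + dt.timedelta(days=i) for i in range(730)]   # 2001-2002
wet = [
    dt.date(2001, 6, 1) <= d < dt.date(2001, 6, 21)     # 20 wet days in season
    or dt.date(2002, 3, 1) <= d < dt.date(2002, 4, 30)  # wet, but before season
    or dt.date(2002, 7, 1) <= d < dt.date(2002, 7, 6)   # 5 wet days in season
    for d in dates
]
annual = annual_growing_season_metric(dates, wet)
stats = summarize(annual)
```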
Main Points
• Python scripts make it possible to handle data too big for one computer
  • Processing across virtual machines
  • Processing large files
• Time-series analyses on large datasets are useful for answering management questions
• Computational models are a useful supplement to field data
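A common way to handle a cube too large for memory is to split the raster into independent tiles and process each tile's full time series separately. The sketch below uses a thread pool on one machine as a stand-in for spreading tiles across virtual machines; the slides do not describe the actual partitioning. `read_block` is a hypothetical callable, for example one that slices an HDF dataset, and the per-pixel statistic (fraction of days inundated) is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def block_windows(n_rows, n_cols, block_rows, block_cols):
    """Yield (row_start, row_stop, col_start, col_stop) tiles covering a raster."""
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            yield (r0, min(r0 + block_rows, n_rows),
                   c0, min(c0 + block_cols, n_cols))

def wet_fraction_blockwise(read_block, n_rows, n_cols, block=(256, 256),
                           threshold=0.0, workers=4):
    """Fraction of days each pixel is inundated, computed tile by tile.

    read_block(r0, r1, c0, c1) must return the (r1-r0, c1-c0, n_days) depth
    sub-array for one tile, so only a few tiles are in memory at once.
    """
    out = np.empty((n_rows, n_cols))

    def work(win):
        r0, r1, c0, c1 = win
        # Tiles are disjoint, so concurrent writes never overlap.
        out[r0:r1, c0:c1] = (read_block(r0, r1, c0, c1) > threshold).mean(axis=2)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, block_windows(n_rows, n_cols, *block)))  # re-raises errors
    return out

# Toy in-memory cube standing in for a file on disk.
cube = np.zeros((5, 7, 10))
cube[:, :3, :4] = 1.0   # first three columns wet on 4 of 10 days
frac = wet_fraction_blockwise(lambda r0, r1, c0, c1: cube[r0:r1, c0:c1, :],
                              5, 7, block=(2, 3))
```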
Data Intensive Restoration?
• There have been many attempts at ecological restoration
• Meta-analysis of restoration success is nothing new
• What data are available to us in USGS/DOI that might lend themselves to these approaches?
• What data are needed by people implementing NRDAR restoration?
• How can NRDAR projects contribute useful information?
NRDAR Case Map and Document Library
Conclusions
• As scientists, we work in an expanding world of ‘big data’
• We can’t analyze data by ourselves; we need tools
• Sharing data is important
• Ongoing projects are just beginning to use the full scope of available datasets and the capabilities of tools like Python
• What existing data are not fully utilized?
• Think big
• Add value
Questions?