cdcol · 2017. 9. 26. · cdcol a geoscience data cube that meets colombian needs christian...
Post on 28-Feb-2021
CDCOL A GEOSCIENCE DATA CUBE THAT MEETS COLOMBIAN NEEDS
Christian Ariza-Porras, Germán Bravo, Mario Villamizar, Andrés Moreno, Harold Castro, Gustavo Galindo, Edersson Cabera, Saralux Valbuena, and Pilar Lozano
Problem
Analysts’ time
Effort replication
Processing
Variety of sources and tools
Replicability
Processing power and storage
Developers are a scarce resource
Results can be reused only if they can be trusted
Traditional remote sensing product generation process
Source: Held A. 2015. Power Point presentation First Workshop Data Cube Colombia
For the majority of end users, this saves up to 80% of collective effort and costs.
New Vision – Analysis ready data
Source: Held A. 2015. Power Point presentation First Workshop Data Cube Colombia
CDCol Goals
Data ownership
Extensibility
Lineage
Replicability
Standardization
Reusability
Complexity abstraction
Ease of use
Parallelization
Solution Strategy
Roles
Bank of algorithms and results
Web UI
Parallelization strategy
Bulk Ingestion
Training Workshop
CDCol User Roles
System Administrator
Data Administrator
Developer
Analyst
Algorithms Life Cycle
Development
Complexity Abstraction
• Independent of datacube-core
• Automatic parallelization
• Well-known Python libraries
• NumPy
• xarray
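As an illustration of what working only with arrays can look like, here is a minimal sketch of an NDVI function written against plain NumPy arrays. The function name, its signature, and the no-data value of -9999 are assumptions made for this example; they are not taken from CDCol itself.

```python
import numpy as np


def ndvi(red, nir, nodata=-9999):
    """Compute NDVI = (NIR - red) / (NIR + red) on plain arrays.

    `nodata` is an assumed fill value; pixels carrying it become NaN.
    """
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")

    # Flag pixels where either band holds the fill value.
    invalid = (red == nodata) | (nir == nodata)

    total = nir + red
    # A zero denominator is also treated as invalid.
    invalid |= total == 0

    # Start from an all-NaN result and divide only where the pixel is valid.
    out = np.full(red.shape, np.nan)
    np.divide(nir - red, total, out=out, where=~invalid)
    return out
```

The function never touches the data cube API: it receives arrays and returns an array, so loading the data and running the function in parallel can be handled outside the algorithm.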
Execution
CDCol Web UI
Empowers users to work on a large set of satellite images from any device
Reduces learning curve
Authentication and roles management
CDCol Demo
Parallelization Strategy
Automatic
By Tile
Generic Task
Celery
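The by-tile idea can be sketched as follows. This is only an illustration: it uses a thread pool from the Python standard library as a stand-in for Celery workers, and the names `split_into_tiles`, `generic_task`, `run_by_tile`, and `tile_area`, as well as the one-degree tile size, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(lat_range, lon_range, tile_size=1):
    """Cover the requested area with (lat_min, lat_max, lon_min, lon_max) tiles."""
    tiles = []
    for lat in range(lat_range[0], lat_range[1], tile_size):
        for lon in range(lon_range[0], lon_range[1], tile_size):
            tiles.append((lat, min(lat + tile_size, lat_range[1]),
                          lon, min(lon + tile_size, lon_range[1])))
    return tiles


def generic_task(algorithm, tile, params):
    """Run any algorithm on one tile; the same wrapper serves every algorithm."""
    return tile, algorithm(tile, **params)


def run_by_tile(algorithm, lat_range, lon_range, params, workers=4):
    """Submit one generic task per tile and collect the results by tile."""
    tiles = split_into_tiles(lat_range, lon_range)
    # Stand-in for a Celery worker pool (assumption for this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(generic_task, algorithm, t, params) for t in tiles]
        return dict(f.result() for f in futures)


def tile_area(tile, scale=1):
    """Toy algorithm: area of the tile in square degrees, times a scale factor."""
    lat_min, lat_max, lon_min, lon_max = tile
    return (lat_max - lat_min) * (lon_max - lon_min) * scale


results = run_by_tile(tile_area, (0, 2), (-75, -72), {"scale": 2})
```

Because the task wrapper is generic, the algorithm author does not write any parallel code; the area of interest is split and dispatched automatically.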
Bulk Ingestion
Initial ingestion
15854 Scenes
Landsat 5, 7, and 8 (T1 Surface Reflectance products from USGS)
15 years
Training Workshops
Training and diffusion workshops are essential to the success of the data cube.
Developers
• Python fundamentals
• Multidimensional array manipulation in Python
Analysts
• Datacube workflow
Results
Bank of algorithms
◦ Algorithms
◦ Temporal median composites
◦ NDVI
◦ Forest/non-forest classification
◦ Change detection using PCA
◦ WOfS (adapted)
Workshop participants developed their own algorithms
Repeatable results
Set of available tools to analysts
Time reduction (a task that used to take 72 hours can now be done in 12 hours)
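To give a sense of what an algorithm in the bank might look like, the following is a minimal sketch of a temporal median composite over a stack of scenes. It assumes the stack is shaped (time, y, x) and that -9999 marks missing observations; both choices, and the function name, are assumptions for illustration rather than the CDCol implementation.

```python
import warnings

import numpy as np


def temporal_median(stack, nodata=-9999):
    """Per-pixel median over the time axis, ignoring no-data observations.

    `stack` is assumed to be shaped (time, y, x).
    """
    data = np.asarray(stack, dtype="float64")
    # Replace the assumed fill value with NaN so nanmedian skips it.
    data = np.where(data == nodata, np.nan, data)

    # Pixels with no valid observation at all yield NaN; silence the
    # "All-NaN slice" RuntimeWarning that NumPy emits in that case.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(data, axis=0)
```

Each output pixel depends only on the time series of that same pixel, so the computation can be run independently on each tile.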
Results
15 years DATA FROM 2000-2015
30 meters PIXEL RESOLUTION
342 scenes
LANDSAT 7/8
2 h PROCESSING
Results
15 years DATA FROM 2000-2015
30 meters PIXEL RESOLUTION
466 images
LANDSAT 7/8
2 min per year PROCESSING
Forest / Other land cover
Results
15 years DATA FROM 2000-2015
30 meters PIXEL RESOLUTION
45 images
LANDSAT 7
20 min per period PROCESSING
Conclusions
Data ownership
• 15 years of curated images from different sources
Extensibility
• Developers can implement, with a low learning curve, new algorithms
• Data administrators can add new images to the collection and create new data types to support new sources.
Lineage and Replicability
• Results are replicable because execution parameters and algorithm versions are logged.
Complexity abstraction
• Algorithms are independent of the data cube core API. Developers work only with multidimensional arrays, using well-established Python packages.
Ease of use
• Easy to use web user interface.
Parallelism
• Automatic parallelism by tile.
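The lineage point above, logging execution parameters and algorithm versions, can be sketched with a small record builder. The field names and the SHA-256 fingerprint are assumptions for illustration; the slides do not describe how CDCol stores this information.

```python
import hashlib
import json


def execution_record(algorithm, version, params):
    """Build a log entry that identifies one execution unambiguously.

    The record layout is hypothetical. The fingerprint is computed over a
    canonical JSON form, so the same algorithm, version, and parameters
    always give the same value regardless of parameter order.
    """
    payload = {"algorithm": algorithm, "version": version, "parameters": params}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    payload["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return payload


record = execution_record(
    "ndvi", "1.0", {"lat": [0, 2], "lon": [-75, -72], "years": [2000, 2015]}
)
```

Two executions with matching fingerprints used the same algorithm version and parameters, which gives an analyst a quick check before reusing someone else's result.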
Future Work
Horizontal Scaling
Algorithm dependent parallelization schemes
Workflows management
New sensors
New algorithms
Training
Cloud-enabled CDCol
Acknowledgements
We thank Brian Killough from NASA, and Alfredo Delos Santos and Kayla Fox from the AMA team, for their support and fruitful discussions. We also thank the CEOS Australia group for its work and for sharing it with the world. We also thank the Environmental Ministry for financial support.
CDCol uses the NetCDF format (UCAR/Unidata) to store ingested data and results (http://doi.org/10.5065/D6H70CW6).