BULK DATA RETRIEVAL

ECHO Technical Interchange Meeting

April 30 & May 1, 2013

Raytheon EED Program | ECHO Technical Interchange 2013


What is it For?

• Quick access to Publicly Available Data via URLs
• No processing options
• User Driven Pull
• Near-instant


State of the Union (Reverb)

• Put items in your cart, click “Download”
• URL Options:
  • Data
  • Metadata
  • Browse
• Download Options:
  • Text File
  • FTP Batch Script


How Does Reverb Do It?

• Catalog-REST!
• Granule Searches
  • “atom” format results
• Scan for “links” to URLs
• Download a file containing those links.


How do we do it?

• Catalog-REST!
• Granule Searches
  • “atom” format results
• Scan for “links” to URLs
• Create a file containing those links.
• Get them.


Example (cURL)*

• curl -gG "https://testbed.echo.nasa.gov/catalog-rest/echo_catalog/granules.atom?echo_collection_id=C3878-LPDAAC_ECS&bounding_box=10.488%2C-0.703%2C53.331%2C68.906&temporal[]=2009-01-01T10%3A00%3A00Z%2C2010-03-10T12%3A00%3A00Z"
• This gets all granules with:
  • echo_collection_id of: C3878-LPDAAC_ECS
  • Spatial bounding box: 10.488, -0.703, 53.331, 68.906 (W, S, E, N)
  • Time constraint: 2009-01-01T10:00:00Z - 2010-03-10T12:00:00Z
• ~80 hits! To see the count, add -I to the curl options and look for the “Echo-Hits” header.

* also from perl/bulk/get_bulk.pl
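The query string in the example is easy to get wrong by hand, so here is a minimal Python sketch that assembles the same URL from its parts. The function name build_granule_query and its argument layout are illustrative assumptions, not part of ECHO or of get_bulk.pl; only the endpoint and the three parameter names come from the slide.

```python
from urllib.parse import urlencode

# Granule-search endpoint copied from the slide.
BASE_URL = "https://testbed.echo.nasa.gov/catalog-rest/echo_catalog/granules.atom"


def build_granule_query(collection_id, bbox, start, end):
    """Return the search URL for one collection, bounding box, and time range.

    bbox is (west, south, east, north); start and end are ISO 8601 strings.
    This helper is hypothetical and only mirrors the slide's example.
    """
    params = [
        ("echo_collection_id", collection_id),
        ("bounding_box", ",".join(str(v) for v in bbox)),
        ("temporal[]", f"{start},{end}"),
    ]
    # safe="[]" keeps the brackets of "temporal[]" literal, as on the slide;
    # the commas and colons in the values are percent-encoded.
    return BASE_URL + "?" + urlencode(params, safe="[]")


url = build_granule_query(
    "C3878-LPDAAC_ECS",
    (10.488, -0.703, 53.331, 68.906),
    "2009-01-01T10:00:00Z",
    "2010-03-10T12:00:00Z",
)
print(url)
```

The resulting string can be passed to curl -gG exactly as in the example above.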


What do the Results Look Like?

<entry xmlns:georss="http://www.georss.org/georss/10" xmlns:time="http://a9.com/-/opensearch/extensions/time/1.0/" xmlns:echo="http://www.echo.nasa.gov/esip" xmlns:gml="http://www.opengis.net/gml">
  <id>G10607-LPDAAC_ECS</id>
  <title type="text">SC:MCD43A4.005:2075808749</title>
  <updated>2009-10-15T14:01:49.076Z</updated>
  <echo:datasetId>MODIS/Terra+Aqua Nadir BRDF-Adjusted Reflectance 16-Day L3 Global 500m SIN Grid V005</echo:datasetId>
  <echo:producerGranuleId>MCD43A4.A2009257.h21v08.005.2009276131145.hdf</echo:producerGranuleId>
  <echo:granuleSizeMB>57.7068</echo:granuleSizeMB>
  <echo:dataCenter>LPDAAC_ECS</echo:dataCenter>
  <time:start>2009-09-14T00:00:00.000Z</time:start>
  <time:end>2009-09-29T23:59:59.999Z</time:end>
  <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf" hreflang="en-US" rel="http://esipfed.org/ns/fedsearch/1.1/data#"/>
  <link href="ftp://e4ftl01.cr.usgs.gov/WORKING/BRWS/Browse.001/2009.10.03/BROWSE.MCD43A4.A2009257.h21v08.005.2009276091235.1.jpg" hreflang="en-US" title=" (BROWSE)" type="image/jpeg" rel="http://esipfed.org/ns/fedsearch/1.1/browse#"/>
  <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf.xml" hreflang="en-US" title=" (METADATA)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
  <link href="http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/qaFlagPage.cgi?sat=aqua" hreflang="en-US" title=" (DatasetDisclaimer)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
  <link href="http://lpdaac.usgs.gov/modis/dataprod.html" hreflang="en-US" title="Documents page for LP DAAC MODIS Products. (MiscInformation)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
  <link href="http://testbed.echo.nasa.gov/LPDAAC_ECS/2010/04/06/:BR:Browse.001:2075808773:1.BINARY" hreflang="en-US" type="application/x-hdfeos" rel="http://esipfed.org/ns/fedsearch/1.1/browse#" length="29953"/>
  <georss:polygon>3.85518158489962e-05 29.8878504914521 -0.00342414555683897 40.0119084380163 9.99985925957934 40.6260396971992 10.0030323070386 30.3449665340407 3.85518158489962e-05 29.8878504914521</georss:polygon>
  <echo:onlineAccessFlag>true</echo:onlineAccessFlag>
  <echo:browseFlag>true</echo:browseFlag>
  <echo:dayNightFlag>DAY</echo:dayNightFlag>
</entry>

• YIKES!


“link” is your friend!

• <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf" hreflang="en-US" rel="http://esipfed.org/ns/fedsearch/1.1/data#"/>

• <link href="ftp://e4ftl01.cr.usgs.gov/WORKING/BRWS/Browse.001/2009.10.03/BROWSE.MCD43A4.A2009257.h21v08.005.2009276091235.1.jpg" hreflang="en-US" title=" (BROWSE)" type="image/jpeg" rel="http://esipfed.org/ns/fedsearch/1.1/browse#"/>

• <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf.xml" hreflang="en-US" title=" (METADATA)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>

• <link href="http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/qaFlagPage.cgi?sat=aqua" hreflang="en-US" title=" (DatasetDisclaimer)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>

• <link href="http://lpdaac.usgs.gov/modis/dataprod.html" hreflang="en-US" title="Documents page for LP DAAC MODIS Products. (MiscInformation)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>

• <link href="http://testbed.echo.nasa.gov/LPDAAC_ECS/2010/04/06/:BR:Browse.001:2075808773:1.BINARY" hreflang="en-US" type="application/x-hdfeos" rel="http://esipfed.org/ns/fedsearch/1.1/browse#" length="29953"/>
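The rel attribute on each link says what kind of URL it is, which lines up with the Data / Metadata / Browse choices Reverb offers. As a rough sketch of sorting links that way, the Python below parses an entry with the standard library and groups hrefs by the last part of the rel URI. The function name links_by_kind and the shortened SAMPLE_ENTRY are illustrative assumptions; only the link values are taken from the slide.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Common prefix of the rel URIs seen in the slide's output.
REL_PREFIX = "http://esipfed.org/ns/fedsearch/1.1/"


def links_by_kind(atom_xml):
    """Group link hrefs by kind ("data", "browse", "metadata")."""
    groups = defaultdict(list)
    for element in ET.fromstring(atom_xml).iter():
        # Compare local names so this works with or without a default namespace.
        if element.tag.rsplit("}", 1)[-1] != "link":
            continue
        rel = element.get("rel", "")
        if rel.startswith(REL_PREFIX):
            kind = rel[len(REL_PREFIX):].rstrip("#")
            groups[kind].append(element.get("href"))
    return dict(groups)


# Trimmed-down entry holding three of the links shown above.
SAMPLE_ENTRY = """<entry>
  <link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf" hreflang="en-US" rel="http://esipfed.org/ns/fedsearch/1.1/data#"/>
  <link href="ftp://e4ftl01.cr.usgs.gov/WORKING/BRWS/Browse.001/2009.10.03/BROWSE.MCD43A4.A2009257.h21v08.005.2009276091235.1.jpg" hreflang="en-US" title=" (BROWSE)" type="image/jpeg" rel="http://esipfed.org/ns/fedsearch/1.1/browse#"/>
  <link href="http://lpdaac.usgs.gov/modis/dataprod.html" hreflang="en-US" title="Documents page for LP DAAC MODIS Products. (MiscInformation)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>
</entry>"""

kinds = links_by_kind(SAMPLE_ENTRY)
for kind, hrefs in sorted(kinds.items()):
    print(kind, len(hrefs))
```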


Only link…/data# Please!

• curl -gG "https://testbed.echo.nasa.gov/catalog-rest/echo_catalog/granules.atom?echo_collection_id=C3878-LPDAAC_ECS&bounding_box=10.488%2C-0.703%2C53.331%2C68.906&temporal[]=2009-01-01T10%3A00%3A00Z%2C2010-03-10T12%3A00%3A00Z" | perl -ne "print if m/link.*\/data#/;"
  • “Just show me the results that have ‘data’ type in the links”
• Slightly more clever perl, which prints only the URL (the text between the first pair of double quotes on the line):
  • … | perl -anF/\"/ -e "print qq(\$F[1]\n) if m/link.*\/data#/;"
• But watch out for Windows vs. Mac/Linux quoting! The backslash escapes shown here are written for a Mac/Linux shell and need adjusting on Windows.
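For anyone who would rather sidestep the shell-quoting differences, the same filter can be sketched in Python. This is an assumed equivalent, not the presenters' script: the name data_urls is made up for illustration, and like the Perl version it relies on href appearing before rel inside each link tag, as it does in the output shown earlier.

```python
import re

# A <link> tag whose rel ends in "/data#"; the group captures its href.
# Assumes href comes before rel, as in the sample output on the slides.
DATA_LINK = re.compile(r'<link\s+href="([^"]+)"[^>]*rel="[^"]*/data#"')


def data_urls(atom_text):
    """Return the href of every data link found in the Atom text."""
    return DATA_LINK.findall(atom_text)


SAMPLE = (
    '<link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/'
    '2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf" hreflang="en-US" '
    'rel="http://esipfed.org/ns/fedsearch/1.1/data#"/>\n'
    '<link href="ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/'
    '2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf.xml" hreflang="en-US" '
    'title=" (METADATA)" rel="http://esipfed.org/ns/fedsearch/1.1/metadata#"/>\n'
)

for found in data_urls(SAMPLE):
    print(found)
```

Only the .hdf link is printed; the metadata link is skipped because its rel ends in /metadata#, not /data#.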


Loop it Over your Echo-Hits

• Append &page_size=500 to URL_String
• end_pages = (Echo-Hits DIV page_size) + 1
• for page in 1 .. end_pages
  • curl (URL_String + &page_num=$page) | clever.perl >> output.URLs
• Now you have an output.URLs with lots of URLs in it…
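The paging arithmetic can be sketched in Python as follows. The helper names page_count and page_urls are hypothetical; page_size and page_num are the query parameters named on the slide. This sketch uses ceiling division, which gives the same answer as the slide's formula except when the hit count is an exact multiple of the page size, where the slide's version asks for one extra (empty) page.

```python
def page_count(hits, page_size=500):
    """Number of pages needed to cover `hits` results (ceiling division)."""
    return -(-hits // page_size)


def page_urls(base_url, hits, page_size=500):
    """Yield one search URL per page.

    base_url is assumed to already contain a query string, so the paging
    parameters are appended with "&".
    """
    for page in range(1, page_count(hits, page_size) + 1):
        yield f"{base_url}&page_size={page_size}&page_num={page}"


# With about 80 hits, as in the earlier example, a single page is enough.
for page_url in page_urls("https://example.invalid/granules.atom?x=1", 80):
    print(page_url)
```

In a real run the hit count would come from the Echo-Hits response header; the example.invalid address is a placeholder.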


So What if I Have Some URLs?

• Scripting curl to the rescue!
• Linux/Mac/Unix:
  • for url in $(<output.URLs); do curl -OL -s "$url"; done
• Windows:
  • for /f %f in (output.URLs) do curl %f -OL
  • (inside a batch file, write %%f instead of %f)
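Where curl is not available, the same loop can be approximated with the Python standard library. This is a minimal sketch under stated assumptions: local_name and download_all are made-up names, the list file holds one URL per line, and no retries, authentication, or resume logic is attempted.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlsplit


def local_name(url):
    """File name curl -O would pick: the last path segment of the URL."""
    return posixpath.basename(urlsplit(url).path)


def download_all(list_path, dest_dir="."):
    """Fetch every URL listed in list_path (one per line) into dest_dir."""
    with open(list_path) as listing:
        urls = [line.strip() for line in listing if line.strip()]
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        # urlopen handles http(s) and ftp URLs and follows HTTP redirects.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)


print(local_name(
    "ftp://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD43A4.005/"
    "2009.09.14/MCD43A4.A2009257.h21v08.005.2009276131145.hdf"
))
```

Calling download_all("output.URLs") would then play the role of the curl loops above.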


Questions?
