GeoMesa – Spatio-Temporal Indexing in Accumulo

James Hughes Mathematician and Systems Engineer Commonwealth Computer Research, Inc [email protected]


Post on 01-Jul-2015


TRANSCRIPT

Page 1: GeoMesa – Spatio-Temporal Indexing in Accumulo

James Hughes
Mathematician and Systems Engineer
Commonwealth Computer Research, Inc
[email protected]

Page 2: GeoMesa – Spatio-Temporal Indexing in Accumulo

What is GeoMesa?

•A flexible spatio-temporal index built on Accumulo.

•An implementation of GeoTools interfaces to make integration seamless.

•A set of GeoServer plugins for OGC compliant access to data.

Page 3: GeoMesa – Spatio-Temporal Indexing in Accumulo

What is Accumulo?

“The Accumulo sorted distributed key/value store is a robust, high performance data storage and retrieval system” http://accumulo.apache.org

http://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html
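
To illustrate why a sorted key/value store matters for what follows, here is a minimal in-memory sketch. It is not Accumulo's API; the `SortedKV` class and the sample keys are hypothetical. It only shows that when keys are kept in lexicographic order, everything sharing a prefix can be read with one contiguous range scan.

```python
import bisect


class SortedKV:
    """Toy in-memory stand-in for a sorted key/value table (not Accumulo's API)."""

    def __init__(self):
        self._keys = []    # kept in lexicographic order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


table = SortedKV()
for k in ["dqb0", "9q8y", "dqcj", "u4pr", "dqbz"]:
    table.put(k, "feature@" + k)

# Keys sharing the prefix "dq" are adjacent in sort order,
# so a prefix query becomes a single range scan from "dq" up to "dr".
dq_rows = [k for k, _ in table.scan("dq", "dr")]
print(dq_rows)
```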

Page 4: GeoMesa – Spatio-Temporal Indexing in Accumulo

What is Accumulo?

Page 5: GeoMesa – Spatio-Temporal Indexing in Accumulo

How Do We Store Multi-Dimensional Data in a Dictionary?

• Space Filling Curves project multiple dimensions into a single dimension

•Base32 encoding induces an Accumulo friendly lexicographic ordering

•Recursive nesting facilitates storing different resolutions of data

•GeoHashes are common in web services

http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves
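
The points above can be sketched with a small GeoHash encoder. This follows the commonly published GeoHash scheme (alternating longitude and latitude bisection, five bits per Base32 character); it is an illustration and not GeoMesa's own code. The function name `geohash_encode` is an assumption made for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=7):
    """Interleave longitude/latitude bisection bits and emit Base32 characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one Base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, precision=5)
fine = geohash_encode(57.64911, 10.40744, precision=8)
print(coarse)                    # u4pru
print(fine.startswith(coarse))   # True: a finer cell nests inside the coarser one
```

Because a longer hash always extends the shorter one, cells at different resolutions share a prefix, and nearby points tend to sort near each other in a lexicographically ordered store.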

Page 6: GeoMesa – Spatio-Temporal Indexing in Accumulo

How Does GeoMesa’s Index Work?

•Constructs a key beginning with a shard id for horizontal scalability.

•Uses Space Filling Curves to encode spatio-temporal data in Accumulo keys.

•Stacks server-side iterators to apply (E)CQL standard queries in parallel at scan time.
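
A rough sketch of these three ideas follows. The key layout (`shard~geohash~hour~id`), the shard count, and the names `make_key`, `scan_ranges`, and `query` are all assumptions made for illustration; the slides do not specify GeoMesa's actual key format. The predicate stands in for a filter that a server-side iterator might apply during the scan, and the per-shard loop runs sequentially here where a real deployment would scan shards in parallel.

```python
import bisect
import zlib
from datetime import datetime

NUM_SHARDS = 4  # hypothetical shard count


def make_key(feature_id, geohash, when):
    """Build an illustrative row key: shard ~ geohash ~ hour ~ id."""
    shard = zlib.crc32(feature_id.encode("utf-8")) % NUM_SHARDS
    return "{:02d}~{}~{}~{}".format(
        shard, geohash[:5], when.strftime("%Y%m%d%H"), feature_id)


def scan_ranges(geohash_prefix):
    """One (start, end) key range per shard covering a geohash prefix."""
    ranges = []
    for shard in range(NUM_SHARDS):
        start = "{:02d}~{}".format(shard, geohash_prefix)
        # "\uffff" sorts after every character used in the keys above
        ranges.append((start, start + "\uffff"))
    return ranges


def query(sorted_keys, geohash_prefix, predicate):
    """Range-scan every shard, then apply a fine-grained filter to each hit."""
    hits = []
    for start, end in scan_ranges(geohash_prefix):
        lo = bisect.bisect_left(sorted_keys, start)
        hi = bisect.bisect_left(sorted_keys, end)
        hits.extend(k for k in sorted_keys[lo:hi] if predicate(k))
    return hits


events = [
    ("e1", "u4pru", datetime(2013, 4, 1, 10)),
    ("e2", "u4prv", datetime(2013, 4, 2, 11)),
    ("e3", "9q8yy", datetime(2013, 4, 1, 10)),
    ("e4", "u4pxx", datetime(2013, 4, 1, 12)),
]
keys = sorted(make_key(*e) for e in events)

# Spatial part: geohash prefix "u4pr". Temporal part: events on 2013-04-01.
hits = query(keys, "u4pr", lambda k: k.split("~")[2].startswith("20130401"))
print(hits)
```

The shard prefix spreads writes across the key space, at the cost of issuing one range per shard for every query.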

Page 7: GeoMesa – Spatio-Temporal Indexing in Accumulo

What is the GeoMesa Model?

Page 8: GeoMesa – Spatio-Temporal Indexing in Accumulo

How Does GeoMesa Perform?

GDELT - Global Database of Events, Language, and Tone

Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA. http://gdelt.utdallas.edu/about.html

220 million geocoded events from 1979 to the present.

Exhibits pathologies common in spatio-temporal data sets:

Hot spots

Bad geocoding

Page 9: GeoMesa – Spatio-Temporal Indexing in Accumulo

Performance

PostGIS 1000 responses in > 30 seconds

GeoMesa 1000 responses in < 1 second

Page 10: GeoMesa – Spatio-Temporal Indexing in Accumulo

GeoTools Integration

Page 11: GeoMesa – Spatio-Temporal Indexing in Accumulo

The Big, Open Picture

Storage, Querying, Filtering

Aggregation and analysis

Visualization

Using Open Source

Page 12: GeoMesa – Spatio-Temporal Indexing in Accumulo

Roadmap

•Integrate with M/R, Scalding, Spark, etc.

•Build statistical index and query optimization

o Bring Your Own Space Filling Curve

o “VACUUM ANALYZE”

•Improve our WMS support

•Continue contributing through LocationTech