IoT data ingestion in Spark Streaming using Kaa
Andrew Shvayka
[email protected]
kaaproject.org
© 2015 CyberVision, Inc. All rights reserved.
Agenda
➢ Data ingestion challenges
➢ Why Kaa?
➢ Why Spark?
➢ Reference architecture overview
➢ Hands-on
    ➢ Environment setup
    ➢ Intel Edison application code walkthrough
    ➢ Spark application code walkthrough
    ➢ Live demo
➢ Q&A
Data ingestion requirements/challenges
Must have:
➢ Guaranteed data delivery
➢ Scalability
➢ Security
➢ Performance
➢ Low latency
Nice to have:
➢ Built-in data structure validation
➢ Device platform independence
➢ Low footprint
➢ Low bandwidth support
Why Kaa?
➢ Fully-featured IoT middleware platform
➢ 10 Kb RAM footprint (with C SDK)
➢ Guaranteed data delivery and reliable local storage
➢ Built-in transport security
➢ Efficient data serialization
➢ Horizontally scalable and fault tolerant
➢ 100% open-source (Apache license 2.0)
➢ Rapid application development using C / C++ / Java SDKs
➢ Integration with popular device platforms
Why Spark?
➢ Fast and performant cluster computing
➢ Rapid application development
➢ SQL support
➢ Streaming analytics support
➢ Machine learning and graph processing support
➢ 100% open-source (Apache license 2.0)
➢ Easy deployment
Problem description
[Diagram labels: Zone 1, Zone 2, Zone 3, Zone 4, Zone 5, Zone 6; Kaa cluster/sandbox; Spark cluster/sandbox]
Reference architecture
[Diagram. Components: solar panels; two Intel Edison boards, each running a client application with the Kaa SDK; Kaa node; Flume agent; Spark node. Data labels: raw data, structured data, Flume event.]
Development environment setup
Sample project repository: https://github.com/kaaproject/kaa-spark-sample
Apache Spark (Standalone mode): http://spark.apache.org/docs/latest/spark-standalone.html
Kaa Sandbox: http://www.kaaproject.org/download-kaa
Intel Edison: https://docs.kaaproject.org/display/KAA/Intel+Edison
Spark processing
DStream<SparkFlumeEvent>
    → FlatMap → JavaPairDStream<ZoneId, ZoneStats>
    → ReduceByKey → JavaPairDStream<ZoneId, ZoneStats>
    → Sort → JavaPairDStream<ZoneId, ZoneStats>
    → Map → JavaDStream<String>
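
The four stages listed on this slide (FlatMap, ReduceByKey, Sort, Map) can be sketched for a single batch in plain Java, with java.util.stream standing in for the Spark Streaming API. This is an illustration of the data flow only, not the sample project's code. The Reading class, the fields of ZoneStats, the voltage measurement, and sorting by zone id are all assumptions made for the example; the slides show only the type names ZoneId and ZoneStats.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class ZonePipelineSketch {

    /** Hypothetical reading decoded from one event: a zone id and a panel voltage. */
    static final class Reading {
        final int zoneId;
        final double voltage;

        Reading(int zoneId, double voltage) {
            this.zoneId = zoneId;
            this.voltage = voltage;
        }
    }

    /** Hypothetical per-zone aggregate; the real ZoneStats fields are not shown in the slides. */
    static final class ZoneStats {
        final int readingCount;
        final double voltageSum;

        ZoneStats(int readingCount, double voltageSum) {
            this.readingCount = readingCount;
            this.voltageSum = voltageSum;
        }

        /** Combines two partial aggregates, the role a ReduceByKey function plays. */
        ZoneStats merge(ZoneStats other) {
            return new ZoneStats(readingCount + other.readingCount, voltageSum + other.voltageSum);
        }

        double averageVoltage() {
            return voltageSum / readingCount;
        }
    }

    /** Runs the four stages over one batch; each inner list stands for the readings in one event. */
    static List<String> summarize(List<List<Reading>> events) {
        // FlatMap: unpack every event into (zoneId, stats-of-one-reading) pairs.
        // ReduceByKey: merge the pairs that share a zone id.
        Map<Integer, ZoneStats> statsByZone = events.stream()
                .flatMap(List::stream)
                .collect(Collectors.toMap(
                        reading -> reading.zoneId,
                        reading -> new ZoneStats(1, reading.voltage),
                        ZoneStats::merge));

        // Sort: order zones by id (an assumed sort key).
        // Map: render each (zoneId, stats) pair as a report line.
        return statsByZone.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(entry -> String.format(Locale.ROOT, "zone %d: %d readings, avg %.2f V",
                        entry.getKey(),
                        entry.getValue().readingCount,
                        entry.getValue().averageVoltage()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Reading>> events = Arrays.asList(
                Arrays.asList(new Reading(1, 2.0), new Reading(2, 4.0)),
                Arrays.asList(new Reading(1, 4.0)));
        summarize(events).forEach(System.out::println);
    }
}
```

In a Spark Streaming job the same shape would be expressed with DStream transformations applied to every micro-batch; the merge function is the part that has to be associative so that partial results from different partitions can be combined in any order.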