IoT data ingestion in Spark Streaming using Kaa

Andrew Shvayka
[email protected]

kaaproject.org
© 2015 CyberVision, Inc. All rights reserved.



Agenda

➢ Data ingestion challenges
➢ Why Kaa?
➢ Why Spark?
➢ Reference architecture overview
➢ Hands-on
    ➢ Environment setup
    ➢ Intel Edison application code walkthrough
    ➢ Spark application code walkthrough
    ➢ Live demo
➢ Q&A


Data ingestion requirements/challenges

Must have:
➢ Guaranteed data delivery
➢ Scalability
➢ Security
➢ Performance
➢ Low latency

Nice to have:
➢ Built-in data structure validation
➢ Device platform independence
➢ Low footprint
➢ Low bandwidth support


Why Kaa?

➢ Fully-featured IoT middleware platform
➢ 10 KB RAM footprint (with C SDK)
➢ Guaranteed data delivery and reliable local storage
➢ Built-in transport security
➢ Efficient data serialization
➢ Horizontally scalable and fault tolerant
➢ 100% open-source (Apache License 2.0)
➢ Rapid application development using C / C++ / Java SDKs
➢ Integration with popular device platforms
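The slides do not show how guaranteed delivery with local storage is implemented, so the following is only a minimal store-and-forward sketch of the general idea: a record stays in a local queue until the upload is acknowledged. The class `StoreAndForwardSketch`, its methods, and the `uploader` callback are hypothetical names for illustration and are not part of the Kaa SDK.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical illustration only; this is not the Kaa SDK API.
final class StoreAndForwardSketch {

    // Records waiting for a server acknowledgement, oldest first.
    // Assumption: kept in memory here; a real device would persist
    // them to flash or disk so they survive a restart.
    private final Deque<String> pending = new ArrayDeque<>();

    // Store the record locally before any upload is attempted.
    void log(String record) {
        pending.addLast(record);
    }

    // Try to upload pending records in order. The uploader returns true
    // only when the server acknowledged the record. On the first failure
    // we stop and keep the remaining records for the next attempt.
    int flush(Predicate<String> uploader) {
        int delivered = 0;
        while (!pending.isEmpty()) {
            String next = pending.peekFirst();
            if (!uploader.test(next)) {
                break;
            }
            pending.removeFirst();
            delivered++;
        }
        return delivered;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        StoreAndForwardSketch buffer = new StoreAndForwardSketch();
        buffer.log("a");
        buffer.log("b");
        buffer.log("c");
        // Simulate a link that drops while sending "b".
        System.out.println(buffer.flush(r -> !r.equals("b")));
        // The link is back: the remaining records go out.
        System.out.println(buffer.flush(r -> true));
    }
}
```

Removing a record only after the acknowledgement gives at-least-once delivery: a record may be sent twice after a lost acknowledgement, but it is never dropped silently.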

Why Spark?

➢ Fast and performant cluster computing
➢ Rapid application development
➢ SQL support
➢ Streaming analytics support
➢ Machine learning and graph processing support
➢ 100% open-source (Apache License 2.0)
➢ Easy deployment

Problem description

[Diagram labels: Zone 1, Zone 2, Zone 3, Zone 4, Zone 5, Zone 6, Kaa cluster/sandbox, Spark cluster/sandbox]

Reference architecture

[Diagram labels: Solar panels, Raw data, Intel Edison (Client application, Kaa SDK), Structured data, Kaa node, Flume agent, Flume event, Spark node]
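The diagram distinguishes raw data from the solar panels and structured data leaving the client application. As a rough sketch of that step, the code below turns a raw text line into a validated, typed record. Everything here is an assumption made for illustration: the raw format `zoneId,panelId,power`, the class `RawToStructuredSketch`, and the record `PanelSample` do not come from the slides or from the Kaa SDK.

```java
import java.util.Optional;

// Hypothetical illustration only; names and the raw format are assumptions.
final class RawToStructuredSketch {

    // Structured form of one reading from one panel.
    static final class PanelSample {
        final int zoneId;
        final int panelId;
        final double power;

        PanelSample(int zoneId, int panelId, double power) {
            this.zoneId = zoneId;
            this.panelId = panelId;
            this.power = power;
        }
    }

    // Parses "zoneId,panelId,power". Returns an empty Optional when the
    // line is malformed, so bad readings never reach the server.
    static Optional<PanelSample> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String[] parts = raw.trim().split(",");
        if (parts.length != 3) {
            return Optional.empty();
        }
        try {
            int zoneId = Integer.parseInt(parts[0].trim());
            int panelId = Integer.parseInt(parts[1].trim());
            double power = Double.parseDouble(parts[2].trim());
            // Reject values that parse but make no physical sense.
            if (zoneId < 0 || panelId < 0 || Double.isNaN(power) || power < 0) {
                return Optional.empty();
            }
            return Optional.of(new PanelSample(zoneId, panelId, power));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("3,7,12.5").isPresent());
        System.out.println(parse("3,x,12.5").isPresent());
    }
}
```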

Development environment setup

Sample project repository: https://github.com/kaaproject/kaa-spark-sample
Apache Spark (Standalone mode): http://spark.apache.org/docs/latest/spark-standalone.html
Kaa Sandbox: http://www.kaaproject.org/download-kaa
Intel Edison: https://docs.kaaproject.org/display/KAA/Intel+Edison

Spark processing

DStream<SparkFlumeEvent>
    → FlatMap → JavaPairDStream<ZoneId, ZoneStats>
    → ReduceByKey → JavaPairDStream<ZoneId, ZoneStats>
    → Sort → JavaPairDStream<ZoneId, ZoneStats>
    → Map → JavaDStream<String>
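The four steps on this slide (FlatMap, ReduceByKey, Sort, Map) can be tried without a Spark cluster. The sketch below reproduces the same shape with plain `java.util.stream` instead of Spark's DStream API, so it is a stand-in and not the sample project's code. The `Reading` class, the fields of `ZoneStats`, the choice to sort by average power in descending order, and the output format are all assumptions; the slides name only the types and the operations.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java stand-in for the slide's pipeline; no Spark classes are used.
final class ZonePipelineSketch {

    // Hypothetical reading: one power sample reported for a zone.
    static final class Reading {
        final int zoneId;
        final double power;

        Reading(int zoneId, double power) {
            this.zoneId = zoneId;
            this.power = power;
        }
    }

    // Hypothetical per-zone aggregate; the real ZoneStats fields are not shown.
    static final class ZoneStats {
        final int count;
        final double totalPower;

        ZoneStats(int count, double totalPower) {
            this.count = count;
            this.totalPower = totalPower;
        }

        ZoneStats merge(ZoneStats other) {
            return new ZoneStats(count + other.count, totalPower + other.totalPower);
        }

        double average() {
            return totalPower / count;
        }
    }

    // Each inner list mimics the readings carried by one incoming event.
    static List<String> report(List<List<Reading>> events) {
        // FlatMap: unpack events into readings.
        // ReduceByKey: merge the per-reading stats that share a zone id.
        Map<Integer, ZoneStats> statsByZone = events.stream()
            .flatMap(List::stream)
            .collect(Collectors.toMap(
                r -> r.zoneId,
                r -> new ZoneStats(1, r.power),
                ZoneStats::merge));

        // Sort: highest average power first (an assumed ordering).
        // Map: render each zone as one line of text.
        return statsByZone.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<Integer, ZoneStats> e) -> e.getValue().average()).reversed())
            .map(e -> String.format(Locale.ROOT, "zone %d: avg %.1f over %d samples",
                e.getKey(), e.getValue().average(), e.getValue().count))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Reading>> events = List.of(
            List.of(new Reading(1, 10.0), new Reading(2, 30.0)),
            List.of(new Reading(1, 20.0)));
        report(events).forEach(System.out::println);
    }
}
```

In Spark the same stages would run per micro-batch and across the cluster; the merge function is the part that carries over unchanged, because ReduceByKey needs the same associative combine step.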

THANK YOU FOR YOUR ATTENTION
QUESTIONS?

Andrew Shvayka
[email protected]

kaaproject.org
cybervisiontech.com

Fault-tolerance and horizontal scalability

[Diagram labels: Endpoints, Bootstrap servers, Operations servers, Control servers (active/standby), Zookeeper quorum]