![Page 1: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/1.jpg)
IOT, timeseries and prediction
with Android, Cassandra and Spark
Amira Lakhal @Miralak
![Page 2: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/2.jpg)
10/02/2016 @Miralak #jfokus
About me
Agile Java Developer and running addict
Paris 2016 Marathon
@Miralak
github.com/MiraLak
![Page 3: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/3.jpg)
Duchess France
www.duchess-france.org
@duchessfr
![Page 4: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/4.jpg)
The internet of things
![Page 5: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/5.jpg)
Internet of things (IoT)
![Page 6: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/6.jpg)
Internet of things
![Page 7: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/7.jpg)
Big Data Era
Source: micronautomata.com
![Page 8: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/8.jpg)
Transform data
![Page 9: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/9.jpg)
Future?
![Page 10: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/10.jpg)
Far far away
Source: aldebaran
![Page 11: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/11.jpg)
My connected objects
![Page 12: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/12.jpg)
Goal
Physical activity recognition with measurements coming from an accelerometer:
walking, jogging, sitting ...
Health care
![Page 13: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/13.jpg)
Accelerometer
● A motion sensor
● Measures proper acceleration
● Multiple usages
![Page 14: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/14.jpg)
Accelerometer
Each acceleration contains:
● a timestamp (e.g., 1428773040488)
● acceleration force along the x axis (unit is m/s²)
● acceleration force along the y axis (unit is m/s²)
● acceleration force along the z axis (unit is m/s²)
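One way to picture a single reading in code is a small value class. This is only a sketch: the class name `Acceleration`, its field names, and the `magnitude()` helper are illustrative assumptions, not taken from the talk's repositories.

```java
// Hypothetical value class for one accelerometer reading (names are assumptions).
final class Acceleration {
    final long timestamp; // epoch milliseconds, e.g. 1428773040488
    final double x;       // acceleration force along the x axis, in m/s^2
    final double y;       // acceleration force along the y axis, in m/s^2
    final double z;       // acceleration force along the z axis, in m/s^2

    Acceleration(long timestamp, double x, double y, double z) {
        this.timestamp = timestamp;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Euclidean norm of the acceleration vector, in m/s^2 (illustrative helper). */
    double magnitude() {
        return Math.sqrt(x * x + y * y + z * z);
    }

    public static void main(String[] args) {
        Acceleration a = new Acceleration(1428773040488L, 3.0, 4.0, 0.0);
        System.out.println(a.timestamp + " -> " + a.magnitude() + " m/s^2");
    }
}
```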
![Page 15: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/15.jpg)
Timeseries
Source: cityzendata.com
![Page 16: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/16.jpg)
Collect data
https://github.com/MiraLak/accelerometer-rest-to-cassandra
![Page 17: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/17.jpg)
Android App
https://github.com/MiraLak/AccelerometerAndroidApp
![Page 18: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/18.jpg)
Store data
![Page 19: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/19.jpg)
![Page 20: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/20.jpg)
History
● Created by Facebook
● Open-source in 2008
● Current version 3.3
● Column-oriented ☞ distributed table
![Page 21: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/21.jpg)
Cassandra Key Facts
● Linear scalability
● Master-less: peer to peer
● Operational simplicity
● Multi-datacenter
● No SPOF ☞ Continuous availability (≈100% up-time)
![Page 22: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/22.jpg)
Last Write Wins
INSERT INTO users(login, name, age) VALUES ('DuchessFr', 'Duchess France', '5');
DuchessFr | name(t1) = Duchess France | age(t1) = 5
t1 = auto-generated timestamp
![Page 23: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/23.jpg)
Last Write Wins
UPDATE users SET age = '6' WHERE login = 'DuchessFr';
SSTable1 | DuchessFr | name(t1) = Duchess France | age(t1) = 5
SSTable2 | DuchessFr | age(t2) = 6
![Page 24: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/24.jpg)
Last Write Wins
DELETE age FROM users WHERE login = 'DuchessFr';
SSTable1 | DuchessFr | name(t1) = Duchess France | age(t1) = 5
SSTable2 | DuchessFr | age(t2) = 6
SSTable3 | DuchessFr | age(t3) = (deleted)
![Page 25: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/25.jpg)
Last Write Wins
SELECT age FROM users WHERE login = 'DuchessFr';
SSTable1 | DuchessFr | name(t1) = Duchess France | age(t1) = 5
SSTable2 | DuchessFr | age(t2) = 6
SSTable3 | DuchessFr | age(t3) = (deleted)
![Page 26: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/26.jpg)
Compaction
SSTable1 | DuchessFr | name(t1) = Duchess France | age(t1) = 5
SSTable2 | DuchessFr | age(t2) = 6
SSTable3 | DuchessFr | age(t3) = (deleted)
newSSTable | DuchessFr | name(t1) = Duchess France | age(t3) = (deleted)
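The last-write-wins rule and the compaction step can be sketched in plain Java. This is a simplified model under stated assumptions, not Cassandra's implementation: the class `LastWriteWinsSketch`, the `Cell` type, and the use of a `null` value to stand for a delete marker are all hypothetical, and ties between equal timestamps are not handled the way Cassandra handles them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model: one partition, each SSTable maps a column name to a cell.
final class LastWriteWinsSketch {

    /** A column value with its write timestamp; value == null models a delete marker. */
    static final class Cell {
        final String value;
        final long writeTime;

        Cell(String value, long writeTime) {
            this.value = value;
            this.writeTime = writeTime;
        }
    }

    /** Merges SSTables column by column, keeping the cell with the highest write timestamp. */
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (old, candidate) -> candidate.writeTime > old.writeTime ? candidate : old);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> sstable1 = new HashMap<>();
        sstable1.put("name", new Cell("Duchess France", 1)); // INSERT at t1
        sstable1.put("age", new Cell("5", 1));
        Map<String, Cell> sstable2 = new HashMap<>();
        sstable2.put("age", new Cell("6", 2));               // UPDATE at t2
        Map<String, Cell> sstable3 = new HashMap<>();
        sstable3.put("age", new Cell(null, 3));              // DELETE at t3

        Map<String, Cell> merged = compact(Arrays.asList(sstable1, sstable2, sstable3));
        System.out.println("name = " + merged.get("name").value);
        System.out.println("age  = " + merged.get("age").value); // the delete at t3 wins
    }
}
```

A read follows the same rule as the merge: for each column it keeps the cell with the latest timestamp, which is why the SELECT on `age` sees the deletion rather than 5 or 6.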
![Page 27: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/27.jpg)
Timeseries with Cassandra
We want to use historical data
Use a timeseries data model
User ID and timestamp are unique together
Store as many as needed
![Page 28: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/28.jpg)
Timeseries data model
CREATE TABLE accelerometer (
  user_id text,
  timestamp bigint,
  x double, y double, z double,
  PRIMARY KEY ((user_id), timestamp)
);
Partition key: user_id
Clustering column: timestamp
![Page 29: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/29.jpg)
Storage Model: Logical View
user_id    timestamp      x     y     z
TEST_USER  1447034733210  10.9  34.5  12.6
TEST_USER  1447034733344  15.4  39.9  16.3
TEST_USER  1447034733462  13.5  36.1  20.3
TEST_USER  1447034733556  13.0  41.3  22.8
![Page 30: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/30.jpg)
Storage model: Disk layout
TEST_USER
1447034733210       1447034733344       1447034733462
x     y     z       x     y     z       x     y     z
10.9  34.5  12.6    15.4  39.9  16.3    13.5  36.1  20.3
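The wide-row layout above can be pictured as a map of sorted maps: the partition key selects one row, and inside it the clustering column keeps the measurements ordered by timestamp. The sketch below is an in-memory analogy only; `WideRowSketch` and its methods are hypothetical names and say nothing about Cassandra's actual storage engine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class WideRowSketch {
    // partition key (user_id) -> clustering column (timestamp, kept sorted) -> {x, y, z}
    private final Map<String, TreeMap<Long, double[]>> partitions = new HashMap<>();

    void insert(String userId, long timestamp, double x, double y, double z) {
        partitions.computeIfAbsent(userId, k -> new TreeMap<>())
                  .put(timestamp, new double[] {x, y, z});
    }

    /** Rows of one partition, ordered by timestamp regardless of insertion order. */
    TreeMap<Long, double[]> partition(String userId) {
        return partitions.getOrDefault(userId, new TreeMap<>());
    }

    public static void main(String[] args) {
        WideRowSketch table = new WideRowSketch();
        // Inserted out of order on purpose; iteration is still time-ordered.
        table.insert("TEST_USER", 1447034733462L, 13.5, 36.1, 20.3);
        table.insert("TEST_USER", 1447034733210L, 10.9, 34.5, 12.6);
        table.insert("TEST_USER", 1447034733344L, 15.4, 39.9, 16.3);
        table.partition("TEST_USER").forEach((ts, xyz) ->
                System.out.println(ts + " -> x=" + xyz[0] + " y=" + xyz[1] + " z=" + xyz[2]));
    }
}
```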
![Page 31: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/31.jpg)
Timeseries data model
CREATE TABLE accelerometer (
  user_id text,
  date text,
  timestamp bigint,
  x double, y double, z double,
  PRIMARY KEY ((user_id, date), timestamp)
);
![Page 32: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/32.jpg)
Storage Model: Logical View
user_id:date          timestamp      x     y     z
TEST_USER:24-09-2015  1446034733566  10.9  34.5  12.6
TEST_USER:24-09-2015  1446034733631  15.4  39.9  16.3
TEST_USER:24-09-2015  1446034733740  13.5  36.1  20.3
TEST_USER:24-09-2015  1446034733830  13.0  41.3  22.8
![Page 33: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/33.jpg)
Storage model: Disk layout
TEST_USER:24-09-2015
1446034733566       1446034733631       ..  1446034734120
x     y     z       x     y     z       ..  x     y     z
10.9  34.5  12.6    15.4  39.9  16.3    ..  13.5  36.1  20.3
TEST_USER:25-09-2015
1447034733210       1447034733300       ..  1447034733810
x     y     z       x     y     z       ..  x     y     z
13.3  30.5  12.1    25.2  34.9  16.1    ..  15.0  32.1  12.4
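With a (user_id, date) partition key, the writer has to derive the date bucket from each timestamp. The slides do not show how this is done; the sketch below assumes a dd-MM-yyyy day bucket computed in UTC, and `DateBucket` and its methods are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DateBucket {
    // Assumption: one partition per user and per UTC day, formatted like the slide (dd-MM-yyyy).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("dd-MM-yyyy").withZone(ZoneOffset.UTC);

    /** Day bucket (UTC) for an epoch-millisecond timestamp. */
    static String of(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    /** Composite partition key written the way the slide draws it: user_id:date. */
    static String partitionKey(String userId, long epochMillis) {
        return userId + ":" + of(epochMillis);
    }

    public static void main(String[] args) {
        System.out.println(partitionKey("TEST_USER", 1428773040488L));
    }
}
```

Bucketing by day keeps each partition bounded in size, at the cost of one query per day when reading a longer period.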
![Page 34: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/34.jpg)
Query patterns
Range queries ☞ Slice on disk
SELECT user_id, date, timestamp, x, y, z
FROM accelerometer
WHERE user_id = 'TEST_USER'
  AND date = '24-04-2015'
  AND timestamp > 1447034733462 AND timestamp < 1447034733890;
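Because rows inside a partition are stored sorted by the clustering column, a range query reads one contiguous slice. A rough in-memory analogy, assuming a `TreeMap` stands in for one partition (`RangeSliceSketch` is a hypothetical name):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class RangeSliceSketch {

    /** Rows with from < timestamp < to, mirroring the strict inequalities of the query. */
    static NavigableMap<Long, double[]> slice(NavigableMap<Long, double[]> partition,
                                              long from, long to) {
        return partition.subMap(from, false, to, false);
    }

    public static void main(String[] args) {
        NavigableMap<Long, double[]> partition = new TreeMap<>();
        partition.put(1447034733210L, new double[] {10.9, 34.5, 12.6});
        partition.put(1447034733344L, new double[] {15.4, 39.9, 16.3});
        partition.put(1447034733462L, new double[] {13.5, 36.1, 20.3});
        partition.put(1447034733556L, new double[] {13.0, 41.3, 22.8});

        // Both bounds are exclusive, so the row at 1447034733462 is not returned.
        slice(partition, 1447034733462L, 1447034733890L)
                .forEach((ts, xyz) -> System.out.println(ts + " -> " + xyz[0] + ", " + xyz[1] + ", " + xyz[2]));
    }
}
```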
![Page 35: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/35.jpg)
Timeseries with latest columns first
CREATE TABLE latest_accelerations (
  user_id text,
  timestamp bigint,
  x double, y double, z double,
  PRIMARY KEY (user_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
![Page 36: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/36.jpg)
Timeseries with expiring columns
INSERT INTO latest_accelerations (user_id, timestamp, x, y, z)
VALUES ('TEST_USER', 1447034733462, 10.9, 34.5, 12.6)
USING TTL 20;
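The effect of `USING TTL 20` (a time to live in seconds) can be mimicked in a few lines: each row remembers when it expires, and reads ignore rows past that point. This is a sketch of the idea only; `TtlSketch` and its methods are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class TtlSketch {

    private static final class Entry {
        final double[] xyz;
        final long expiresAtMillis;

        Entry(double[] xyz, long expiresAtMillis) {
            this.xyz = xyz;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<Long, Entry> rows = new HashMap<>();

    /** Stores a row that stops being visible ttlSeconds after nowMillis. */
    void insert(long timestamp, double[] xyz, int ttlSeconds, long nowMillis) {
        rows.put(timestamp, new Entry(xyz, nowMillis + ttlSeconds * 1000L));
    }

    /** Returns the row only if it exists and has not expired at nowMillis. */
    Optional<double[]> read(long timestamp, long nowMillis) {
        Entry e = rows.get(timestamp);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            return Optional.empty();
        }
        return Optional.of(e.xyz);
    }

    public static void main(String[] args) {
        TtlSketch table = new TtlSketch();
        table.insert(1447034733462L, new double[] {10.9, 34.5, 12.6}, 20, 0L);
        System.out.println("after 10 s: " + table.read(1447034733462L, 10_000L).isPresent());
        System.out.println("after 20 s: " + table.read(1447034733462L, 20_000L).isPresent());
    }
}
```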
![Page 37: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/37.jpg)
Analyse data
![Page 38: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/38.jpg)
![Page 39: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/39.jpg)
Spark History
● Created by AMPLab
● Open-source in 2010
● Current version 1.5.2
● Written in Scala
● Fast cluster computing
![Page 40: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/40.jpg)
Spark ecosystem
Source: Databricks
![Page 41: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/41.jpg)
Spark data sources
Source: Databricks
![Page 42: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/42.jpg)
Resilient Distributed Dataset (RDD)
● Abstraction: collection of objects
● Operated on in parallel
● Fault tolerant without replication
Source: Databricks
![Page 43: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/43.jpg)
RDD transformations and actions
Source: http://www.bogotobogo.com
![Page 44: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/44.jpg)
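Word count sample
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

SparkConf conf = new SparkConf()
    .setAppName("Wordcount")
    .setMaster("local[*]");
JavaSparkContext sc = new JavaSparkContext(conf);
// Transformations (lazy); in the Spark 1.x Java API flatMap returns an Iterable
JavaRDD<String> words = sc.textFile(myFilePath)
    .flatMap(line -> Arrays.asList(line.split(" ")));
JavaPairRDD<String, Integer> pairs = words
    .mapToPair(x -> new Tuple2<String, Integer>(x, 1))
    .reduceByKey((a, b) -> a + b)
    .filter(tuple -> tuple._2() > 3); // keep words counted more than 3 times
// Action: triggers the computation and returns the counted pairs to the driver
List<Tuple2<String, Integer>> result = pairs.collect();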
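To make the three transformation steps concrete without a Spark cluster, the same pipeline can be written against an in-memory list with Java streams. This is a local analogy, not Spark code: `LocalWordCount` and `frequentWords` are hypothetical names, and nothing here is distributed or lazy in the RDD sense.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class LocalWordCount {

    /** Counts words across all lines and keeps those seen more than minCount times. */
    static Map<String, Long> frequentWords(List<String> lines, long minCount) {
        return lines.stream()
                // like flatMap: one line becomes many words
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // like mapToPair + reduceByKey: count occurrences per word
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                // like filter: keep only the frequent words
                .filter(e -> e.getValue() > minCount)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a a b", "a a c", "b");
        System.out.println(frequentWords(lines, 3)); // only "a" appears more than 3 times
    }
}
```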
![Page 47: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/47.jpg)
Spark on cluster
● Mesos
● YARN
● Standalone
Source: http://spark.apache.org
![Page 48: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/48.jpg)
Spark key facts
● Fast
● Flexible
● Easy to use
![Page 49: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/49.jpg)
Real time analytics
![Page 50: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/50.jpg)
&
![Page 51: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/51.jpg)
Spark Cassandra Connector
● Open source
● Version 1.5
● Implemented in Scala
● Loads data from Cassandra to Spark
● Writes data from Spark to Cassandra
https://github.com/datastax/spark-cassandra-connector
![Page 52: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/52.jpg)
Spark Cassandra connector
Exposes Cassandra tables as Spark RDDs
C*
Cassandra Java driver
Spark Cassandra Connector
Spark
![Page 53: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/53.jpg)
Spark connection setup
SparkConf sparkConf = new SparkConf()
    .setAppName("activityRecognition")
    .set("spark.cassandra.connection.host", "127.0.0.1")
    .set("spark.cassandra.connection.native.port", "9142")
    .setMaster("local");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
![Page 54: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/54.jpg)
From Cassandra to Spark
// javaFunctions is a static import from com.datastax.spark.connector.japi.CassandraJavaUtil
CassandraJavaRDD<CassandraRow> cassandraRowsRDD =
    javaFunctions(sc).cassandraTable("activityrecognition", "training");
// Read only the timestamp column of the rows matching one user and one activity
JavaRDD<Long> times = cassandraRowsRDD
    .select("timestamp")
    .where("user_id=? AND activity=?", user, activity)
    .map(CassandraRow::toMap)
    .map(entry -> (long) entry.get("timestamp"))
    .cache();
![Page 56: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/56.jpg)
Spark streaming
Scalable, high-throughput, fault-tolerant stream processing
Source: http://spark.apache.org
![Page 57: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/57.jpg)
Spark Streaming
// declare a streaming context with a 2-second batch interval
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
JavaReceiverInputDStream<String> cassandraReceiver = ssc.receiverStream(
    new CassandraReceiver(StorageLevel.MEMORY_ONLY(), ssc.sparkContext()) // custom Cassandra receiver
);
… // The transformations are done here
cassandraReceiver.print(); // print the receiver value
ssc.start();
ssc.awaitTermination(); // Wait for the computation to terminate
![Page 61: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/61.jpg)
Spark Streaming
public class CassandraReceiver extends Receiver<String> {
    ...
    @Override
    public void onStart() {
        // Start the thread that receives data over a connection
        new Thread() {
            @Override
            public void run() {
                receive(); // Read data from Cassandra, compute features and write prediction into Cassandra
            }
        }.start();
    }

    @Override
    public void onStop() {
        // Nothing to do
    }
    ...
}
![Page 63: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/63.jpg)
Time for prediction
![Page 64: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/64.jpg)
Activity Recognition
Possible activities: walking, jogging, sitting or standing
Measurements from the accelerometer: timeseries
Classifying a timeseries into physical activity classes
Machine Learning
![Page 65: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/65.jpg)
Multiclass classification
Labelling unknown patterns based on known patterns
Logistic regression
Naive Bayes
Random forest
![Page 66: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/66.jpg)
Decision tree model
![Page 67: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/67.jpg)
Supervised learning
Train: features + label (for each labeled example)
Predict: features + ? (label to be predicted)
![Page 68: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/68.jpg)
Predictive model
● Collect labeled data
● Identify the features
● Compute the features
● Create and train the random forest model
● Predict the activity
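The "compute the features" step can be sketched as follows. The slides do not list which features are used, so the choice here (mean and variance per axis over one window) is purely an assumption for illustration, and `WindowFeatures` is a hypothetical name. Each input array is assumed to be non-empty.

```java
final class WindowFeatures {

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Population variance: average squared distance to the mean. */
    static double variance(double[] values) {
        double m = mean(values);
        double sumSquares = 0.0;
        for (double v : values) {
            sumSquares += (v - m) * (v - m);
        }
        return sumSquares / values.length;
    }

    /** Assumed feature vector: [meanX, meanY, meanZ, varX, varY, varZ]. */
    static double[] compute(double[] xs, double[] ys, double[] zs) {
        return new double[] {
            mean(xs), mean(ys), mean(zs),
            variance(xs), variance(ys), variance(zs)
        };
    }

    public static void main(String[] args) {
        double[] xs = {10.9, 15.4, 13.5, 13.0};
        double[] ys = {34.5, 39.9, 36.1, 41.3};
        double[] zs = {12.6, 16.3, 20.3, 22.8};
        System.out.println(java.util.Arrays.toString(compute(xs, ys, zs)));
    }
}
```

A vector like this, paired with the known activity label, is the kind of (features, label) example a random forest could be trained on.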
![Page 69: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/69.jpg)
Special thanks
![Page 70: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/70.jpg)
Collecting data
![Page 71: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/71.jpg)
Android App
![Page 72: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/72.jpg)
Training data
![Page 73: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/73.jpg)
Training data
13679 training accelerations => only 1.6 MB
![Page 74: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/74.jpg)
Prepare data: windows
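One possible reading of this step is that the raw acceleration stream is cut into fixed-length windows, with features then computed per window. The sketch below illustrates that idea in plain Java. The `Windows` class, the `split` method and the choice of non-overlapping windows are assumptions made for illustration, not details taken from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: cuts a series of samples into consecutive fixed-size windows. */
final class Windows {

    /**
     * Splits samples (each row is {x, y, z}) into non-overlapping windows of
     * windowSize rows. A trailing partial window is dropped so that every
     * window holds the same number of samples.
     */
    static List<double[][]> split(double[][] samples, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<double[][]> windows = new ArrayList<>();
        for (int start = 0; start + windowSize <= samples.length; start += windowSize) {
            // copyOfRange copies the row references for [start, start + windowSize)
            windows.add(Arrays.copyOfRange(samples, start, start + windowSize));
        }
        return windows;
    }
}
```

With 7 samples and a window size of 3, this yields two windows and discards the last sample.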
![Page 75: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/75.jpg)
Timeseries: sitting vs standing
Standing: Y > X; sitting: Y < X
Source: cityzendata.com
![Page 76: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/76.jpg)
Timeseries: walking vs jogging
Y peak to peak amplitude
Source: cityzendata.com
![Page 77: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/77.jpg)
Feature extraction
● Average acceleration (for each axis)
● Average difference for X and Y axis
● Variance (for each axis)
● Standard deviation (for each axis) √(1/n * ∑ (x - mean_x)²)
● Average absolute difference (for each axis)
● Average resultant acceleration 1/n * ∑ √(x² + y² + z²)
● Average peak to peak amplitude for Y axis
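The features listed above can be sketched in plain Java, without Spark, for a single window of samples. This is a minimal illustration under stated assumptions: the `WindowFeatures` class and its method names are hypothetical, the variance uses the 1/n form written on the slide, the average absolute difference is taken around the mean, and peak-to-peak is simplified to max minus min over the window.

```java
/** Hypothetical helper computing per-window features from raw axis values. */
final class WindowFeatures {

    /** Average acceleration of one axis: 1/n * sum of x. */
    static double mean(double[] axis) {
        double sum = 0.0;
        for (double v : axis) {
            sum += v;
        }
        return sum / axis.length;
    }

    /** Variance of one axis in the 1/n form: 1/n * sum of (x - mean)^2. */
    static double variance(double[] axis) {
        double m = mean(axis);
        double sum = 0.0;
        for (double v : axis) {
            sum += (v - m) * (v - m);
        }
        return sum / axis.length;
    }

    /** Standard deviation: square root of the variance. */
    static double standardDeviation(double[] axis) {
        return Math.sqrt(variance(axis));
    }

    /** Average absolute difference from the mean: 1/n * sum of |x - mean|. */
    static double averageAbsoluteDifference(double[] axis) {
        double m = mean(axis);
        double sum = 0.0;
        for (double v : axis) {
            sum += Math.abs(v - m);
        }
        return sum / axis.length;
    }

    /** Average resultant acceleration: 1/n * sum of sqrt(x^2 + y^2 + z^2). */
    static double averageResultantAcceleration(double[] x, double[] y, double[] z) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += Math.sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
        }
        return sum / x.length;
    }

    /** Simplified peak-to-peak amplitude: max - min of the axis over the window. */
    static double peakToPeak(double[] axis) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : axis) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return max - min;
    }
}
```

Note that a library may define variance differently (for example with an n - 1 denominator), so values computed this way are not guaranteed to match a library summary exactly.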
![Page 78: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/78.jpg)
Compute features
JavaRDD<double[]> accelerationData = dataFromCassandra.map(CassandraRow::toMap).map(
    row -> new double[]{(double) row.get("x"), (double) row.get("y"), (double) row.get("z")});
JavaRDD<Vector> vectorsXYZ = accelerationData.map(Vectors::dense);
// column statistics are computed on the vectors (colStats expects an RDD<Vector>)
MultivariateStatisticalSummary statisticalSummary = Statistics.colStats(vectorsXYZ.rdd());
double[] meanArray = statisticalSummary.mean().toArray();
double[] varianceArray = statisticalSummary.variance().toArray();
double difference = extractFeature.computeDifferenceBetweenAxes(meanArray);
// build LabeledPoint with an activity label
LabeledPoint labeledPoint = new LabeledPoint(label, Vectors.dense(features));
![Page 79: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/79.jpg)
![Page 80: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/80.jpg)
![Page 81: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/81.jpg)
Decision Tree
Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>(); // empty map: all features are treated as continuous
int numClasses = ActivityType.values().length; // number of classes = number of activities to predict
String impurity = "gini"; // measure of the homogeneity of the labels at the node: ∑_{i=1..C} f_i (1 − f_i)
int maxDepth = 20;
int maxBins = 32; // maximum number of bins used to discretize continuous features
// create model
final DecisionTreeModel model = DecisionTree.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins);
model.save(sc.sc(), "predictionModel/decisionTree");
// Compute classification accuracy on test data
final long correctPredictionCount = testData
    .mapToPair(p -> new Tuple2<>(model.predict(p.features()), p.label()))
    .filter(pl -> pl._1().equals(pl._2()))
    .count();
Double classificationAccuracy = 1.0 * correctPredictionCount / testData.count();
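To make the "gini" impurity setting concrete, here is a small plain-Java sketch of the formula, where f_i is the fraction of samples at a node that belong to class i. The `Gini` class and its `impurity` method are hypothetical names used only for illustration; this is not the library's internal implementation.

```java
/** Hypothetical helper illustrating the Gini impurity of a tree node. */
final class Gini {

    /**
     * Computes sum over classes of f_i * (1 - f_i), where f_i is the share of
     * samples at the node that carry class i. Returns 0 for an empty node.
     */
    static double impurity(long[] classCounts) {
        long total = 0;
        for (long count : classCounts) {
            total += count;
        }
        if (total == 0) {
            return 0.0;
        }
        double impurity = 0.0;
        for (long count : classCounts) {
            double f = (double) count / total;
            impurity += f * (1.0 - f);
        }
        return impurity;
    }
}
```

A node holding a single class has impurity 0, while a node split evenly between two classes has impurity 0.5, so lower values mean more homogeneous labels.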
![Page 82: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/82.jpg)
![Page 83: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/83.jpg)
![Page 84: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/84.jpg)
Random forest
// parameters
Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<Integer, Integer>();
int numTrees = 10;
int numClasses = ActivityType.values().length;
String featureSubsetStrategy = "auto"; // number of features to consider for splits at each node
String impurity = "gini";
int maxDepth = 20;
int maxBins = 32;
int randomSeeds = 12345; // random seed for bootstrapping and choosing feature subsets
// create model
RandomForestModel model = RandomForest.trainClassifier(trainingData, numClasses, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, randomSeeds);
model.save(sc.sc(), "predictionModel/randomForest");
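For classification, a random forest typically combines the predictions of its individual trees by majority vote. The sketch below shows that aggregation step alone in plain Java. The `MajorityVote` class is hypothetical, and the tie-breaking rule (smallest label wins) is an assumption chosen to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: combines per-tree class predictions by majority vote. */
final class MajorityVote {

    /** Returns the label predicted by the most trees; ties go to the smallest label. */
    static int predict(int[] treePredictions) {
        if (treePredictions.length == 0) {
            throw new IllegalArgumentException("at least one tree prediction is required");
        }
        // count how many trees voted for each label
        Map<Integer, Integer> votes = new HashMap<>();
        for (int label : treePredictions) {
            votes.merge(label, 1, Integer::sum);
        }
        int bestLabel = -1;
        int bestVotes = -1;
        for (Map.Entry<Integer, Integer> entry : votes.entrySet()) {
            int label = entry.getKey();
            int count = entry.getValue();
            if (count > bestVotes || (count == bestVotes && label < bestLabel)) {
                bestLabel = label;
                bestVotes = count;
            }
        }
        return bestLabel;
    }
}
```

With `numTrees = 10`, each prediction would aggregate ten such votes, which is one reason a forest can be more robust than a single tree.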
![Page 85: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/85.jpg)
Accuracy
● 13679 training accelerations
● 15 features
Decision tree accuracy 69.23%
Random forest accuracy 92.3%
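Accuracy here is the share of test samples whose predicted label matches the true label. The plain-Java sketch below shows that computation. The `Accuracy` class and the integer label encoding are hypothetical. As a side observation only, 9/13 and 12/13 round to the two percentages shown, but the slides do not state the test-set size.

```java
/** Hypothetical helper: classification accuracy over a labeled test set. */
final class Accuracy {

    /** Returns the fraction of positions where predicted[i] equals actual[i]. */
    static double of(int[] predicted, int[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) {
                correct++;
            }
        }
        return (double) correct / predicted.length;
    }
}
```

For example, three matching labels out of four give an accuracy of 0.75.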
![Page 86: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/86.jpg)
Time for the demo!
![Page 87: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/87.jpg)
Conclusion
● Collect data from a device and analyse it
● Sensors can be combined
● Infinite possibilities with IoT
![Page 88: IOT, timeseries and prediction with Android, Cassandra and ... · IOT, timeseries and prediction with Android, Cassandra and Spark Amira Lakhal @Miralak](https://reader030.vdocuments.site/reader030/viewer/2022040508/5e4851fc14b4f572661a3176/html5/thumbnails/88.jpg)