Utilizing Data warehousing and Data Mining Algorithms on information gathered with IoT Sensors
Eric Matthews – Mohsen Tavakoli
Fall 2016
Emerging Non-Traditional Database Systems: Data Warehousing and Mining (03-60-539)
Dr. Ezeife
Contents
● Software and Hardware
● Data Warehouse
● Roll-Up Function
● WEKA Clustering
  ○ 3 Clusters
  ○ 6 Clusters
● Conclusion
Software and Hardware
● Arduino / Arduino IDE 1.6.13
● Ubuntu Linux Server
● Python 2.7 (Server & Client)
● MySQL Server v5.5.46
● WEKA 3.8

Sensors:
● Sound (RB-Wav-26)
● Ultrasonic Distance (SR04)
● Temperature (DHT11)
● Light (Photoresistor)
● Motion Sensor (HC-SR501)
Data Warehouse
Our data warehouse consists of the following fields:
● location_id (a unique ID for each location where the device is placed)
● average, maximum, and minimum over 10 readings of:
  ○ Distance
  ○ Light
  ○ Sound
  ○ Temperature
  ○ Humidity
  ○ Motion count
● time_collected (time at which the client collected the data)
● srv_time_collected (time at which the server collected the data)
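As a rough illustration of how the average, maximum, and minimum for one field could be derived from a batch of 10 readings, here is a minimal Python sketch. The function name summarize and the sample temperature values are hypothetical; they are not taken from the project's client code.

```python
def summarize(readings):
    """Return (average, maximum, minimum) for one batch of sensor readings."""
    if not readings:
        raise ValueError("need at least one reading")
    average = sum(readings) / float(len(readings))
    return average, max(readings), min(readings)

# Hypothetical batch of 10 temperature readings (illustrative values only).
batch = [21, 22, 21, 23, 22, 22, 21, 24, 22, 22]
avg_t, max_t, min_t = summarize(batch)
```

Each warehouse row would then hold one such triple per sensor, alongside location_id and the two timestamps.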
Roll-Up
We have created a stored procedure in MySQL that lets us roll up our data by any time interval and location:
CALL database_project.rollup_time(time_interval_seconds, location_id)
This call aggregates our data into fact tables by any time interval (minute, hour, day, year, or any number of seconds) and location.
We do this using GROUP BY on our time_collected field in MySQL.
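The body of rollup_time is not shown in these slides, so the following is only a sketch of the GROUP BY idea described above. It uses an in-memory SQLite database rather than MySQL so that it runs on its own. The table name readings, the columns avg_light and max_motion_count, and the choice to store time_collected as integer Unix seconds are all assumptions made for illustration.

```python
import sqlite3

def rollup_time(conn, interval_seconds, location_id):
    """Roll up one location's rows into buckets of interval_seconds."""
    # Integer division maps every timestamp to the start of its bucket.
    query = """
        SELECT (time_collected / ?) * ? AS bucket_start,
               AVG(avg_light)           AS mean_light,
               MAX(max_motion_count)    AS peak_motion,
               COUNT(*)                 AS n_rows
        FROM readings
        WHERE location_id = ?
        GROUP BY bucket_start
        ORDER BY bucket_start
    """
    params = (interval_seconds, interval_seconds, location_id)
    return conn.execute(query, params).fetchall()

# Hypothetical schema and rows (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "location_id INTEGER, time_collected INTEGER, "
    "avg_light REAL, max_motion_count INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [(5, 0, 0.25, 0), (5, 30, 0.75, 2), (5, 60, 0.5, 1),
     (5, 119, 1.0, 0), (3, 10, 0.9, 5)],
)
per_minute = rollup_time(conn, 60, 5)
```

Passing 3600 or 86400 instead of 60 would roll the same rows up by hour or by day.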
WEKA Clustering - Location 5 - 3 Clusters
Using EM clustering with a maximum of 3 clusters, we obtained per-minute clusters for location 5, which we call Not Home, Passively Home, and Actively Home.
● 53% being used (Passively Home 9% + Actively Home 44%)
● 47% not being used (Not Home)
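For readers unfamiliar with EM (expectation-maximization), the sketch below fits a mixture of Gaussians to a single attribute. It is a simplified, one-dimensional illustration with a fixed number of iterations and is not WEKA's implementation. The function names, the initial values, and the sample light readings are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_1d(data, init_means, init_var=0.05, iterations=30, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with a fixed number of EM iterations."""
    k = len(init_means)
    means = [float(m) for m in init_means]
    variances = [float(init_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            scores = [w * gaussian_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor keeps a cluster from collapsing
    return weights, means, variances

# Hypothetical normalized light readings: three dark minutes, three bright ones.
light = [0.0, 0.05, 0.1, 0.9, 0.95, 1.0]
weights, means, variances = em_1d(light, init_means=[0.0, 1.0])
```

On this toy input the two component means settle near the centers of the dark and bright groups, and each component ends up with about half of the total weight.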
Location 5 - Cluster Centroids
Using 3 of our attributes (Light, Motion, and Sound), we calculated these centroids for our clusters in location 5. Data has been normalized.

                   Passively Home   Not Home    Actively Home
Avg Light          0.2533           0.7012      0.6758
Max Motion Count   0.1172           0           0.2433
Avg Sound          0.0431           0.0306      0.0819
# of Data Points   176 (9%)         899 (47%)   851 (44%)
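The slides do not say which normalization was applied. Since every centroid value lies between 0 and 1, one plausible reading is min-max scaling of each attribute, which can be sketched as follows. The function name and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in values]
    span = float(hi - lo)
    return [(v - lo) / span for v in values]

# Hypothetical raw light readings (illustrative values only).
scaled = min_max_normalize([10, 20, 30])
```

Scaling each attribute to the same range keeps a sensor with large raw values, such as light, from dominating the clustering.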
WEKA Clustering - Location 5 - 6 Clusters
Using EM on location 5 with no maximum cluster parameter resulted in 6 clusters:
● Highly Active
● 2 Lights, No Activity
● Quietly Active
● No Light, No Activity
● Main Light, No Activity
● 1 Light, Quietly Active
Based on the clusters, we came to the conclusion that:
● 51% location being used
● 49% location not being used
Location 5 - Cluster Centroids
                   Highly   2 Lights,     Quietly   No Light,     Main Light,   1 Light,
                   Active   No Activity   Active    No Activity   No Activity   Quietly Active
Avg Light          0.6901   0.8603        0.6777    0             0.6485        0.3654
Max Motion Count   0.2052   0             0.2438    0             0             0.1826
Avg Sound          0.3034   0.0194        0.0717    0.0317        0.0338        0.049
# of Data Points: 896 (52%), 519 (30%), 316 (18%), 73 (4%), 166 (9%), 94 (5%)
Conclusion
- We can conclude that it is possible to define three different states of home presence, namely: Not Home, Passively Home, and Actively Home
- Any new readings can be categorized into these clusters to determine whether the subject is home or not
- Also, we can gain finer detail about the state of a location by using more clusters:
  ● Determine when lights or heating/cooling are turned on but nobody is using the location
  ● Monitor sources of ambient or constant noise
  ● Detect presence during usual periods of no activity (locked building or house)
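Categorizing a new reading into the three clusters could be sketched as a nearest-centroid lookup over the location 5 centroids (Avg Light, Max Motion Count, Avg Sound). This is a simplification: EM assigns points by cluster probability, whereas the sketch below uses plain Euclidean distance. The function name and the sample readings are hypothetical.

```python
import math

# Centroids from the 3-cluster table: (Avg Light, Max Motion Count, Avg Sound).
CENTROIDS = {
    "Passively Home": (0.2533, 0.1172, 0.0431),
    "Not Home": (0.7012, 0.0, 0.0306),
    "Actively Home": (0.6758, 0.2433, 0.0819),
}

def nearest_cluster(reading):
    """Return the name of the centroid closest to a normalized reading."""
    def distance(name):
        centroid = CENTROIDS[name]
        return math.sqrt(sum((r - c) ** 2 for r, c in zip(reading, centroid)))
    return min(CENTROIDS, key=distance)

# Hypothetical normalized readings (illustrative values only).
quiet_lit_room = nearest_cluster((0.70, 0.0, 0.03))
busy_room = nearest_cluster((0.68, 0.25, 0.08))
```

A new reading must be normalized the same way as the training data before it is compared with the centroids.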
Future Work
We hope to learn more from our data by:
● Collecting more data
● Rolling up over larger time intervals
● Using different subsets of our data for different hypotheses
● Using different clustering algorithms