Utilizing Data Warehousing and Data Mining Algorithms on Information Gathered with IoT Sensors

Eric Matthews, Mohsen Tavakoli
Fall 2016
Emerging Non-Traditional Database Systems: Data Warehousing and Mining (03-60-539), Dr. Ezeife


Contents

● Software and Hardware
● Data Warehouse
● Roll-Up Function
● WEKA Clustering
  ○ 3 Clusters
  ○ 6 Clusters
● Conclusion

Software and Hardware

● Arduino / Arduino IDE 1.6.13
● Ubuntu Linux Server
● Python 2.7 (Server & Client)
● MySQL Server v5.5.46
● WEKA 3.8

Sensors:

● Sound (RB-Wav-26)
● Ultrasonic Distance (SR04)
● Temperature (DHT11)
● Light (Photoresistor)
● Motion Sensor (HC-SR501)

Data Warehouse

Our data warehouse consists of the following fields:

● location_id (a unique ID for each location where the device is placed)
● average, maximum, and minimum over 10 readings of:
  ○ Distance
  ○ Light
  ○ Sound
  ○ Temperature
  ○ Humidity
  ○ Number of motion counts
● time_collected (time at which the client collected the data)
● srv_time_collected (time at which the server collected the data)
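To make the "average, maximum and minimum over 10 readings" fields concrete, here is a minimal sketch of how one batch of raw samples could be collapsed into the values stored in a warehouse row. The function name `summarize_batch`, the dictionary keys, and the sample values are assumptions for illustration only; the project's actual client code is not shown in the slides.

```python
def summarize_batch(readings):
    """Collapse a batch of raw readings for one sensor into avg, max, and min."""
    if not readings:
        raise ValueError("at least one reading is required")
    return {
        "avg": sum(readings) / float(len(readings)),
        "max": max(readings),
        "min": min(readings),
    }

# Hypothetical batch of 10 raw light readings (values are made up).
light_batch = [512, 530, 498, 505, 520, 515, 507, 499, 525, 510]
light_summary = summarize_batch(light_batch)

# Hypothetical motion flags for the same 10 readings (1 = motion detected).
motion_flags = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
motion_count = sum(motion_flags)

print(light_summary, motion_count)
```

The same helper could be applied to each measured quantity (distance, sound, temperature, humidity) before the row is sent to the server.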

[Figure: Data warehouse - Location 5 data]

[Figure: Location table - Data warehouse]

Roll-Up

We have created a stored procedure in MySQL that lets us roll up our data by any time interval and location:

CALL database_project.rollup_time(time_interval_seconds, location_id)

This call aggregates our data into fact tables by any time interval (minute, hour, day, year, or any number of seconds) and location.

We do this using GROUP BY on our time_collected field in MySQL.
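The body of the stored procedure is not shown, so the following is only a sketch of the general idea: each timestamp is floored to the start of its time bucket, and rows are grouped by that bucket. It uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server. The table name `readings`, the columns `avg_light` and `max_motion`, and the use of integer epoch seconds for `time_collected` are all assumptions.

```python
import sqlite3

def rollup_time(conn, interval_seconds, location_id):
    """Aggregate one location's rows into fixed-width time buckets."""
    query = """
        SELECT (time_collected / :w) * :w AS bucket_start,
               AVG(avg_light)             AS avg_light,
               MAX(max_motion)            AS max_motion,
               COUNT(*)                   AS n_rows
        FROM readings
        WHERE location_id = :loc
        GROUP BY time_collected / :w
        ORDER BY bucket_start
    """
    params = {"w": interval_seconds, "loc": location_id}
    return conn.execute(query, params).fetchall()

# Small in-memory demo with made-up rows (times are integer seconds).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "location_id INTEGER, time_collected INTEGER, "
    "avg_light REAL, max_motion INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [(5, 0, 0.2, 0), (5, 10, 0.4, 1), (5, 20, 0.6, 0), (3, 5, 0.9, 2)],
)

# Roll up location 5 into 15-second buckets.
rows = rollup_time(conn, 15, 5)
for row in rows:
    print(row)
```

Integer division does the bucketing here; in MySQL the equivalent expression would likely need an explicit floor over a Unix timestamp.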

[Figure: Roll-up (15s) - Location 5 - Example]

WEKA Clustering - Location 5 - 3 Clusters

Using EM clustering with a maximum of 3 clusters, we retrieved per-minute clusters for location 5, which we call Not Home, Passively Home, and Actively Home.

[Figure: clusters labeled Passively Home, Not Home, and Actively Home]

● 53% being used (Passively Home and Actively Home)

● 47% not being used (Not Home)
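For readers unfamiliar with EM (expectation-maximization), the sketch below fits a mixture of Gaussians to a single attribute in plain Python. It is a deliberately simplified stand-in, not WEKA's implementation: WEKA's EM clusterer works on several attributes at once and can choose the number of clusters by cross-validation, neither of which is done here. The function names, the initialization scheme, and the sample values are assumptions.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_1d(values, k, iterations=50, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    lo, hi = min(values), max(values)
    # Deterministic start: means spread evenly from min to max (requires k >= 2).
    means = [lo + (hi - lo) * i / float(k - 1) for i in range(k)]
    variances = [max(((hi - lo) / (2.0 * k)) ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            scores = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                      for j in range(k)]
            total = sum(scores)
            if total == 0.0:
                resp.append([1.0 / k] * k)  # numerical underflow fallback
            else:
                resp.append([s / total for s in scores])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2
                         for r, x in zip(resp, values)) / nj
            variances[j] = max(spread, min_var)
    return weights, means, variances

# Made-up normalized light values: four dark minutes and four bright minutes.
values = [0.02, 0.03, 0.05, 0.04, 0.68, 0.70, 0.72, 0.69]
weights, means, variances = em_1d(values, 2)
print(weights, means)
```

On this toy input the two fitted means settle near the averages of the dark and bright groups, which is the same kind of separation the cluster labels above describe.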

Location 5 - Cluster Centroids

Using 3 of our attributes (Light, Motion, and Sound), we calculated these centroids for our clusters in location 5. The data has been normalized.

                   Passively Home   Not Home     Actively Home
Avg Light          0.2533           0.7012       0.6758
Max Motion Count   0.1172           0            0.2433
Avg Sound          0.0431           0.0306       0.0819
# of Data Points   176 (9%)         899 (47%)    851 (44%)
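One simple way to label a new reading with these clusters is to pick the centroid closest to it. The sketch below does that with Euclidean distance over the three normalized attributes, using the centroid values from the table above. This is a simplification: EM assigns points by cluster probability rather than by distance alone, and the function name `nearest_cluster` and the sample readings are assumptions.

```python
import math

# Centroids from the table above: (avg light, max motion count, avg sound).
CENTROIDS = {
    "Passively Home": (0.2533, 0.1172, 0.0431),
    "Not Home": (0.7012, 0.0, 0.0306),
    "Actively Home": (0.6758, 0.2433, 0.0819),
}

def nearest_cluster(reading):
    """Return the name of the centroid closest to a normalized reading."""
    def distance(centroid):
        return math.sqrt(sum((r - c) ** 2 for r, c in zip(reading, centroid)))
    return min(CENTROIDS, key=lambda name: distance(CENTROIDS[name]))

# Made-up normalized readings: (avg light, max motion count, avg sound).
bright_and_still = nearest_cluster((0.70, 0.0, 0.03))
dim_with_motion = nearest_cluster((0.25, 0.10, 0.05))
bright_with_motion = nearest_cluster((0.68, 0.25, 0.08))

print(bright_and_still, dim_with_motion, bright_with_motion)
```

New readings would need to be normalized the same way as the training data before being compared with these centroids.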

WEKA Clustering - Location 5 - 6 Clusters

Using EM on location 5 with no maximum cluster parameter resulted in 6 clusters:

● Highly Active
● 2 Lights, No Activity
● Quietly Active
● No Light, No Activity
● Main Light, No Activity
● 1 Light, Quietly Active

Based on the clusters, we came to the conclusion that:

● 51%: location being used
● 49%: location not being used

Location 5 - Cluster Centroids

Cluster                   Avg Light   Max Motion Count   Avg Sound   # of Data Points
Highly Active             0.6901      0.2052             0.3034      896 (52%)
2 Lights, No Activity     0.8603      0                  0.0194      519 (30%)
Quietly Active            0.6777      0.2438             0.0717      316 (18%)
No Light, No Activity     0           0                  0.0317      73 (4%)
Main Light, No Activity   0.6485      0                  0.0338      166 (9%)
1 Light, Quietly Active   0.3654      0.1826             0.049       94 (5%)

Conclusion

- We can conclude that it is possible to define three different states of home presence: Not Home, Passively Home, and Actively Home
- Any new readings can be categorized into these clusters to determine whether the subject is home or not
- We can also gain finer detail about the state of a location by using more clusters:
  ● Determine when lights or heating/cooling are turned on but nobody is using the location
  ● Monitor sources of ambient or constant noise
  ● Detect presence during usual periods of no activity (locked building or house)

Future Work

We hope to find out more information from our data by:

● Collecting more data
● Rolling up larger amounts of time
● Using different subsets of our data for different hypotheses
● Using different algorithms for clustering

Thank You

Any Questions?