Managing and mining smart meter data – at scale CSE Project Showcase 9 July 2013 Twitter: @cse_bristol #SmartMeterData

Managing and mining

smart meter data – at scale

CSE Project Showcase 9 July 2013

Twitter: @cse_bristol #SmartMeterData

Introduction

Contents

-Introduction to the project, the data, and its applications

-Managing SM data at scale

-Getting valuable knowledge out of SM data

-Demo: Smart Meter Analytics, Scaled by Hadoop (SMASH)

-Where next?

-Discussion

Introduction

Project Background

“Generating Value from Smart Electricity Meter Data”

18-month TSB-supported collaboration

CSE, University of Bristol, SSE and Western Power Distribution

Three themes:

• Managing the data at scale
• Extracting useful knowledge
• Integrating the above in a user-facing application

Introduction

The data

A half-hourly timeseries for each smart meter / register

Content: date, time, consumption in the half hour.

For a single register: 17,520 records per year.

This is what 18 months look like:

Introduction

The data

EDRP:

• 18 months
• 16,250 smart metered households
• 16,250 smart electricity meters
• 9,364 smart gas meters
• 670m half-hourly records (E: 420m, G: 250m)
• 40GB of raw csv file data

Post rollout, per year, domestic only:

• 25m smart metered households
• 25m smart electricity meters
• 20m smart gas meters
• 800 billion half-hourly records (E: 450Bn, G: 350Bn)
• 50TB of raw csv file data

EDRP ~ 0.1% of a year’s domestic data
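
The volumes quoted above follow from simple arithmetic on the half-hourly resolution. A minimal sketch in Python that reproduces them from the meter counts on the slide; the variable names are illustrative and not part of the project:

```python
# Back-of-envelope check of the data volumes quoted on the slides.
HALF_HOURS_PER_DAY = 48
DAYS_PER_YEAR = 365

# One record per half hour per register.
records_per_register_year = HALF_HOURS_PER_DAY * DAYS_PER_YEAR

# Post-rollout meter counts quoted on the slide (domestic only).
electricity_meters = 25_000_000
gas_meters = 20_000_000

electricity_records = electricity_meters * records_per_register_year
gas_records = gas_meters * records_per_register_year
national_records = electricity_records + gas_records

# EDRP trial size quoted on the slide.
edrp_records = 670_000_000
edrp_share = edrp_records / national_records

print(f"{records_per_register_year:,} records per register per year")
print(f"{national_records / 1e9:.0f} billion records per year nationally")
print(f"EDRP is {edrp_share:.2%} of one year's domestic data")
```

The computed national total comes out slightly under the rounded 800 billion on the slide, and the EDRP share is a little under 0.1%, which matches the order of magnitude quoted.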

Introduction

What might we use it for?

Improve existing processes

• Settlement
• Billing, reconciliation, audit
• Demand profiling
• Customer profiling & segmentation

New processes not possible without HH data at scale

• Localised prediction
• Distribution network planning and modelling
• Automated DSM – prediction and verification
• System state detection
• Individualised consumer energy services

Introduction

What are the essential processes?

Ingestion – getting the data into the system

Storage – keeping it there securely

Analysis and reporting

• Ad-hoc queries
• Transaction reports
• Descriptives and summaries (e.g. OLAP)
• Mining and modelling
• Visualisation

Data management & processing

More fundamentally

Moving data between storage, memory and CPU

Transforming it in the CPU into desired forms

There are physical constraints on the speed of this.

(These are relevant at the scale of smart meter datasets).

Data management & processing

Single machine RDBMS

[Diagram: a single machine]

• MEMORY: ~10s of GB per machine
• CPU: ~2.5GHz
• STORAGE: ~1TB per disk
• Transfer rates shown on the diagram: ~100 MB/s and ~1000 MB/s

Using SQL Server to sum half-hourly consumption:

• 4 bn records: ~ 1 hour
• 40 bn records: ~ 10 hours
• 1 year’s worth: ~ 200 hours
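
The three timings above are consistent with a constant processing rate. A rough sketch, assuming runtime grows linearly with record count (a simplification) and taking the year's volume as the 800 billion records quoted earlier; the function name is hypothetical:

```python
# Extrapolate aggregation runtime from the single-machine rate on the slide.
# Assumption: runtime scales linearly with the number of records.
OBSERVED_RECORDS = 4_000_000_000  # 4 bn records ...
OBSERVED_HOURS = 1.0              # ... summed in about 1 hour


def estimated_hours(records: int) -> float:
    """Estimated hours to sum `records` rows at the observed rate."""
    records_per_hour = OBSERVED_RECORDS / OBSERVED_HOURS
    return records / records_per_hour


YEAR_OF_RECORDS = 800_000_000_000  # post-rollout domestic volume per year

print(estimated_hours(40_000_000_000))  # 10.0
print(estimated_hours(YEAR_OF_RECORDS))  # 200.0
```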

Problem: the throughput of a single machine has not kept up with the growth in the size of datasets.

Solution: harness multiple individual machines (‘horizontal scaling’).

Problem: this is difficult and expensive using traditional relational database applications

Data management & processing

Solution

Move away from traditional databases and use a purpose-designed (‘big data’) framework to get horizontal scaling:

• 1 machine (~£10k): 2.5GHz, 1 GB/s, 100MB/s; ~ a week
• 10 node cluster (~£50k): 25GHz, 10 GB/s, 1 GB/s; ~ a day
• 100 node cluster (~£300k): 250GHz, 100 GB/s, 10 GB/s; ~ an hour
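
The week/day/hour progression can be roughly reproduced by asking how long it takes simply to read a year of raw data once. A sketch under stated assumptions: the 100 MB/s figure is treated as the per-machine disk read rate, reads scale perfectly with the number of nodes, and all other costs are ignored. The function name is hypothetical.

```python
# Time to read one year of raw smart meter data once, by cluster size.
RAW_BYTES_PER_YEAR = 50e12   # ~50TB of raw csv per year (from the slides)
DISK_BYTES_PER_SEC = 100e6   # ~100 MB/s per machine (assumed to be disk read rate)


def scan_hours(nodes: int) -> float:
    """Hours to read the data once, assuming ideal linear scaling across nodes."""
    aggregate_rate = nodes * DISK_BYTES_PER_SEC
    return RAW_BYTES_PER_YEAR / aggregate_rate / 3600


for nodes in (1, 10, 100):
    print(f"{nodes:>3} node(s): {scan_hours(nodes):6.1f} hours")
```

On these assumptions a single machine needs several days, ten nodes need well under a day, and a hundred nodes need between one and two hours, which is the same order of magnitude as the figures on the slide.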

Data management & processing

Hadoop

Designed to solve the problem of exponentially growing data volumes (originally, Google’s searchable copy of the web)

Harness a large number of commodity machines and low cost networking and storage.

Software takes a job (query, calculation, whatever) and ‘maps’ it out across the cluster.

In parallel each node locally processes a subset of the problem, before the results are ‘reduced’ back to a single dataset.

(Hence ‘Map/Reduce’)
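
The map and reduce steps described above can be illustrated in plain Python. This is a single-process sketch of the idea, not Hadoop's API and not the SMASH implementation; the record layout (`meter_id,date,time,kwh`) and the sample values are invented for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical csv layout: meter_id,date,time,kwh (sample values are made up).
SAMPLE_LINES = [
    "m1,2013-07-09,00:00,0.21",
    "m1,2013-07-09,00:30,0.19",
    "m2,2013-07-09,00:00,0.40",
    "m2,2013-07-09,00:30,0.35",
]


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Map: turn each raw record into a (meter_id, consumption) pair."""
    for line in lines:
        meter_id, _date, _time, kwh = line.split(",")
        yield meter_id, float(kwh)


def reduce_phase(pairs: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Reduce: sum consumption for each meter_id."""
    totals: dict[str, float] = defaultdict(float)
    for meter_id, kwh in pairs:
        totals[meter_id] += kwh
    return dict(totals)


# Simulate two nodes, each processing only its own local split of the data.
splits = [SAMPLE_LINES[:2], SAMPLE_LINES[2:]]
partials = [reduce_phase(map_phase(split)) for split in splits]

# A final reduce merges the per-node partial sums into a single result.
totals = reduce_phase(
    (meter_id, kwh) for partial in partials for meter_id, kwh in partial.items()
)
print(totals)
```

Summation works well here because partial sums can be combined in any order, which is what lets each node work on its own subset independently.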

Data management & processing

Experiments: SQL Server

Single high-performance machine: bottlenecked by the speed of the hard drive

[Figure: Aggregation query performance versus dataset size. Series: SQL rows/second. Horizontal axis: dataset size, 0 to 6,000,000,000 records (annotated ~400GB). Vertical axis: 0 to 2,500,000, labelled "Runtime in seconds" on the slide.]

Data management & processing

Experiments: Hadoop

11 node physical cluster (~£50k hardware cost)

[Figure: Aggregation query performance versus dataset size. Series: SMASH rows per second vs dataset size. Horizontal axis: dataset size, 0 to 40,000,000,000 records (annotated ~2,500GB). Vertical axis: 0 to 3,500,000, labelled "Runtime in seconds" on the slide.]

Data management & processing

Experiments compared

Not straightforward to get SQL Server to run over ~ 10Bn records.

[Figure: Aggregation query performance versus dataset size. Series: SMASH rows per second vs dataset size; SQL rows/second. Horizontal axis: dataset size, 0 to 40,000,000,000 records (annotated ~2,500GB). Vertical axis: 0 to 3,500,000, labelled "Runtime in seconds" on the slide.]

Data management & processing

Experiments: growing the cluster

Fixed dataset size of 500m records

[Figure: Aggregation query performance versus cluster size. Series: SMASH speed in records per second vs cluster size (R² = 0.9148). Horizontal axis: cluster size (nodes), 1 to 11. Vertical axis: rows per second, 0 to 1,200,000.]

Data management & processing

Hadoop

Pros

• Open source software – free and customisable
• Adjustable data redundancy (data is replicated over the cluster)
• Incrementally scalable on both performance and cost measures: just add machines and the system adapts automatically
• Responsive and cooperative developer community

Cons

• Not the last word in user-friendliness (but this is changing)
• Sledgehammer to crack a nut below a certain scale
• Less mature (but rapidly developing) software ecosystem
• Algorithms must fit the framework

Conclusion: low cost option for smart meter data processing

Data mining and visualisation

Finding value in the data

Improve existing processes

• Settlement
• Billing, reconciliation, audit
• Demand profiling
• Customer profiling & segmentation

New processes not possible without HH data at scale

• Localised prediction
• Distribution network planning and modelling
• Automated DSM – prediction and verification
• System state detection
• Individualised consumer energy services

Data mining and visualisation

Finding value in the data

Collaborative approach with industry partners to identify business needs

Focus on:

(1) Datamining for subgroup discovery – classifying end users

(2) Cluster analysis on demand data – finding profiles

(3) Innovative visualisation of consumption data and datamining results

Data mining and visualisation

Subgroup discovery

“Pattern features”: 14 variables describing each household

•Income, geography, access to gas, size of house, value of house etc.

“Target features”: describe the behaviour of interest

•Profile error: how different is usage from the assigned profile?

Outputs:

•Groups of households with significantly different profile errors
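
The slides do not define how profile error is calculated. As an illustration only, one plausible definition scales the assigned profile shape to the household's total consumption and reports the summed absolute deviation as a percentage of that total. The function name and the sample numbers below are hypothetical.

```python
from typing import Sequence


def percent_profile_error(actual: Sequence[float], profile_shape: Sequence[float]) -> float:
    """Percentage profile error under an ASSUMED definition (not from the slides).

    The profile shape is rescaled to the household's total consumption, then the
    summed absolute difference is expressed as a percentage of that total.
    """
    if len(actual) != len(profile_shape):
        raise ValueError("actual and profile_shape must have the same length")
    total = sum(actual)
    shape_total = sum(profile_shape)
    expected = [total * p / shape_total for p in profile_shape]
    abs_diff = sum(abs(a - e) for a, e in zip(actual, expected))
    return 100.0 * abs_diff / total


# Toy example: four periods of consumption against a flat profile.
actual = [1.0, 3.0, 2.0, 2.0]
flat_profile = [1.0, 1.0, 1.0, 1.0]
print(percent_profile_error(actual, flat_profile))  # 25.0
```

A household whose usage exactly follows its assigned profile scores 0 under this definition; larger values indicate a poorer fit.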

Data mining and visualisation

Subgroup discovery

Looking at % annual profile error against sociodemographics

Data mining and visualisation

Clustering

Can we use demand data to create better profiles?

Define target features: the properties of the waveform that are of interest

Two examples: using imposed and emergent properties.

Each using 3 clusters.

Data mining and visualisation

Clustering

E.g. 1: the average weekday as 5 pairs of numbers:

Data mining and visualisation

Clustering

E.g. 2: Frequency spectrum of the demand timeseries
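
To illustrate what a frequency spectrum of a demand timeseries looks like, here is a small sketch using a naive discrete Fourier transform from the standard library. The demand series is synthetic (a single daily cycle over four days), and the function name is hypothetical; the slides do not say how the project computed its spectra.

```python
import cmath
import math
from typing import Sequence


def amplitude_spectrum(series: Sequence[float]) -> list[float]:
    """Naive DFT: amplitude at each frequency index from 0 up to N/2."""
    n = len(series)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)
        )
        spectrum.append(abs(coeff) / n)
    return spectrum


# Synthetic demand: 4 days of half-hourly readings with one cycle per day.
READINGS_PER_DAY = 48
DAYS = 4
demand = [
    1.0 + 0.5 * math.sin(2 * math.pi * t / READINGS_PER_DAY)
    for t in range(READINGS_PER_DAY * DAYS)
]

spectrum = amplitude_spectrum(demand)

# Skip index 0 (the mean level) and find the strongest periodic component.
dominant = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
cycles_per_day = dominant / DAYS
print(f"dominant component: {cycles_per_day} cycle(s) per day")
```

Real household data would show several peaks (daily, half-daily, weekly), and the relative sizes of those peaks are the kind of emergent property that can be fed into a clustering step.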

Data mining and visualisation

Cluster analysis

Project competition results (the University won)

[Figure: Average % difference from the cluster centroid; axis range 0.25 to 0.35.]
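
The slides do not give the formula behind this metric. A minimal sketch of one plausible reading: for each profile in a cluster, sum the absolute differences from the centroid and divide by the centroid's total, then average over the cluster. Function names and sample data are hypothetical.

```python
from typing import Sequence


def centroid(profiles: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of the profiles in one cluster."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def mean_relative_distance(profiles: Sequence[Sequence[float]]) -> float:
    """Average difference from the centroid as a fraction (ASSUMED definition)."""
    c = centroid(profiles)
    c_total = sum(c)
    distances = [
        sum(abs(x - ci) for x, ci in zip(profile, c)) / c_total
        for profile in profiles
    ]
    return sum(distances) / len(distances)


# Toy cluster of two profiles with two periods each.
cluster = [[1.0, 3.0], [3.0, 1.0]]
print(centroid(cluster))                # [2.0, 2.0]
print(mean_relative_distance(cluster))  # 0.5
```

Lower values mean the members of a cluster sit closer to its centroid, so the centroid is a better stand-in profile for them.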

Data mining and visualisation

Conclusions from datamining

Subgroup discovery results suggest the approach is useful as long as you have metadata on the households

Cluster analysis work suggests it is possible to improve on the standard profile classes using SM data

Further work needs to be carried out on more representative datasets

There are many other potential applications!

The SMASH application

Web application

Installation of Hadoop on UoB and CSE clusters

• 11 node physical cluster at the university (£50k)
• 8 node virtual cluster at CSE (£15k)

Integration of a range of Hadoop-friendly data management components

Development of a proof-of-concept web application for user interaction, job management, visualisation etc.

Deployment on both clusters

The SMASH application

Web application

Currently running on the CSE virtual Hadoop cluster

Generating Value from SM Data

Where next?

We have a proof-of-concept system developed with TSB R&D funding support.

We have mastered the underlying technologies and established that this approach has the potential to be a low-cost solution to a number of industry data challenges.

On a technical level the next steps are to:

• Further develop the web application
• Refine the datamining algorithms (with more data)
• Implement selected DM algorithms directly on the cluster

On a policy/programme level we want to ensure this knowledge is incorporated into SM rollout infrastructure decision making.

Questions and discussion

@cse_bristol #SmartMeterData

Contacts:

Simon Roberts [email protected]

Joshua Thumim [email protected]

Web: www.cse.org.uk Sign up to our monthly e-news through our website

Follow us on Twitter @cse_bristol