Power Grid Data Management and Analysis (March 2013)
Presented by: Terence Critchlow
PNNL-SA-94183
The core of the power grid has changed little over the past 25 years
Relatively small number of power producers
Large number of consumers
Transmission grid moves power from producers to distribution points
Distribution network moves power to consumers
Information on grid status is relatively sparse
SCADA data every 2 sec – 1 min
Meter data every month
Top Engineering Achievement of the 20th Century – US NAE
The future power grid must be smarter, not just bigger
Climate change
Developing nations
New applications
Constraints:
Regulatory
Social
Physical
Integration of renewables
Distributed generation
Real time markets
Electric vehicles
Integrating renewables at scale requires faster understanding of transmission grid status
In order to meet statutory requirements, renewables must be integrated into the system
Do not consistently generate power => Need to be smoothed
Do not operate at fixed output levels => need reliable predictions
Phasor Measurement Units (PMUs) are expected to be the dominant source of insight into transmission network status
~48 bytes/record * 60 records/sec ≈ 2.88 KB/sec per PMU
50,000 PMUs * 2.88 KB/sec ≈ 144 MB/sec
144 MB/sec * 60*60*24 ≈ 12.5 TB/day
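The data-rate arithmetic above can be checked directly; the 50,000-PMU deployment is the slide's hypothetical, and the exact daily total comes out just under the slide's rounded 12.5 TB figure.

```python
# Back-of-envelope PMU data rates, using the figures from the slide.
RECORD_BYTES = 48          # ~48 bytes per PMU record
RECORDS_PER_SEC = 60       # 60 samples/sec per PMU
NUM_PMUS = 50_000          # hypothetical full deployment

per_pmu_bps = RECORD_BYTES * RECORDS_PER_SEC   # 2,880 B/s ≈ 2.88 KB/s
total_bps = per_pmu_bps * NUM_PMUS             # 144,000,000 B/s ≈ 144 MB/s
per_day = total_bps * 60 * 60 * 24             # bytes per day

print(per_pmu_bps)         # 2880
print(total_bps / 1e6)     # 144.0 (MB/s)
print(per_day / 1e12)      # 12.4416 (TB/day; the slide rounds to 12.5)
```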
Distributed generation means power production could occur anywhere
Consumer based electricity generation is on the rise
Amount of grid-supplied energy required by a particular consumer could vary dramatically based on external conditions
What happens if the power is not needed? Significant power coming from distribution system could decrease stability
Establishing real time markets will moderate both supply and demand
Current prices are fixed
What if prices changed every ~5-15 min?
Utility sets prices based on model (expected availability and usage)
Millions of meters receive prices
Meter estimates consumption based on price and status
Each appliance determines response
Responses are aggregated at meter
Meter returns proposed / actual consumption
To consumer, behavior appears the same
Predictive models
Smart Meters
Adaptive appliances
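The price-response loop described above can be sketched in a few lines. This is a hypothetical illustration, not the talk's implementation: the appliance parameters, price thresholds, and function names are all invented for the example.

```python
# Hypothetical sketch of a smart meter aggregating per-appliance
# responses to a broadcast price. Thresholds are illustrative only.
def appliance_response(baseline_kw, max_price, price):
    """Each appliance decides its own draw: run at its baseline load
    if the price is acceptable, otherwise defer (draw nothing)."""
    return baseline_kw if price <= max_price else 0.0

def meter_response(appliances, price):
    """The meter aggregates appliance responses into one proposed
    consumption figure returned to the utility."""
    return sum(appliance_response(kw, max_p, price) for kw, max_p in appliances)

# (baseline kW, max acceptable $/kWh) per appliance -- invented values
home = [(3.5, 0.30),   # HVAC: runs unless prices spike
        (0.5, 0.15),   # dishwasher: defers readily
        (1.2, 0.50)]   # refrigerator: nearly always runs

print(meter_response(home, 0.10))  # all three run: 5.2 kW
print(meter_response(home, 0.20))  # dishwasher defers: 4.7 kW
print(meter_response(home, 0.40))  # only the fridge runs: 1.2 kW
```

To the consumer the behavior appears the same: the deferral decisions happen per appliance, and only the aggregate is reported upstream.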
EVs can act as both producers and consumers of electricity
Need to be ready to go when needed by driver
Discretion on when to re-charge batteries
Connected to grid most of the day
Does not have to start charging as soon as plugged in
Strategy could vary based on where you are
By selling stored electricity, could act as a distributed generator
Could employ a buy-low, sell-high strategy
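A buy-low, sell-high policy with driver readiness as the overriding constraint might look like the following. Every threshold here is a hypothetical assumption for illustration; the talk does not specify a policy.

```python
# Illustrative charge/discharge policy for an EV acting as a small
# distributed generator. All thresholds are invented for this sketch.
def ev_action(price, soc, reserve_soc=0.4, buy_below=0.10, sell_above=0.30):
    """price in $/kWh, soc (state of charge) in [0, 1]."""
    if soc < reserve_soc:
        return "charge"            # readiness for the driver comes first
    if price <= buy_below and soc < 1.0:
        return "charge"            # buy low
    if price >= sell_above:
        return "sell"              # sell high, down to the reserve
    return "idle"                  # no need to start charging when plugged in

print(ev_action(0.05, 0.30))  # charge (below the readiness reserve)
print(ev_action(0.35, 0.80))  # sell
print(ev_action(0.20, 0.80))  # idle
```

The `reserve_soc` guard encodes "ready to go when needed by the driver"; in practice it would vary with location and expected departure time, as the slide notes.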
Data analysis is key to maintaining stability of the future power grid
Data flow is complex
Multiple types of information (pricing, weather, sensor)
Information moving in both directions
Relatively high, sustained data rates
Privacy must be preserved
Utilities will require significant analysis capabilities
Effective model development requires a flexible, scalable data analysis pipeline
Sensor Streams
Data Analysis
Infrastructure
Data Storage
Models over streaming data
Accessible Repository
Community Resource
Goal: gain insights from real sensor data using event detection models
Out-of-sync events
Determine when the network partitions itself
Requires comparison across different PMUs
Generator trip events
Sudden drop in frequency that occurs across the network
Looking at average behavior of PMUs
2TB PMU data set
38 PMUs
1.5 years
53.7B sensor readings
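The generator-trip signature above (a sudden, network-wide frequency drop, judged from the average behavior of the PMUs) can be sketched as follows. The 0.02 Hz step threshold and the toy readings are illustrative assumptions, not values from the talk.

```python
# Sketch: average PMU frequencies at each instant and flag any
# sudden network-wide drop as a candidate generator trip.
def avg_series(pmu_series):
    """Column-wise mean across PMUs: one averaged value per time step."""
    return [sum(col) / len(col) for col in zip(*pmu_series)]

def trip_candidates(avg, drop=0.02):
    """Indices where the averaged frequency falls by more than `drop`
    Hz in a single step (threshold is an assumption)."""
    return [i for i in range(1, len(avg)) if avg[i - 1] - avg[i] > drop]

pmus = [
    [60.00, 60.00, 59.95, 59.96],   # every PMU sees the same sudden dip,
    [60.01, 60.00, 59.94, 59.95],   # which is what distinguishes a trip
    [59.99, 60.00, 59.96, 59.96],   # from a local sensor glitch
]
print(trip_candidates(avg_series(pmus)))  # [2]
```

Averaging first means a dip seen by only one PMU is diluted, so the detector responds to network-wide events rather than single-sensor noise.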
Our iterative approach uses historical data to validate the models
Data-Driven Model Development
Use actual data to guide definition of the models
Analyze the data
Identify events of interest
Create event extraction model based on data subset
Execute model against entire data set to extract events
Validate results
Models can be adapted to work on data streams
within a distributed, agent-based framework
Real-Time Event Detection
Models applied to live data streams
Our approach leverages R, Hadoop, and our institutional resources
R
Flexible statistical scripting language
Thousands of packages
RHIPE interface to Hadoop
Easy to prototype models
Hadoop
Scalable
PNNL Institutional Computing
19,200 cores (used max 2,048 cores)
102TFlops
4PB Lustre file system
Initial event detection runs highlighted significant data quality errors
Over 10,000 candidate out-of-sync events detected
No good models of sensor errors were available
Errors:
Occurred over time
Required analysis across sensors
Included transient errors
Needed to differentiate between data that couldn’t occur and anomalous data
Exploratory data analysis is beneficial when you don't quite know what you are looking for
Define initial problem
Define model
Run model over entire data set
Select interesting subsets of the data
Analyze results / patterns
Model validated
Refine model
This led to the rediscovery of lost knowledge about status flags
Originally told flag 132 means bad data
A detailed look at regions with high concentrations of 59.999 Hz readings revealed correlations > 0.95 between certain flag / value combinations
After additional investigation, we found specifications indicating any flag > 128 indicates bad data
8B records with bad data flags
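The rediscovered rule, any status flag above 128 marks bad data, amounts to a one-line record filter. The record layout below is hypothetical; the talk does not describe the actual schema.

```python
# Sketch of the flag-based filter implied by the slide: drop any
# record whose status flag exceeds 128. Field names are assumed.
readings = [
    {"pmu": "A", "freq": 59.999, "flag": 132},  # bad: 132 > 128
    {"pmu": "A", "freq": 60.001, "flag": 0},    # good
    {"pmu": "B", "freq": 59.999, "flag": 200},  # bad
    {"pmu": "B", "freq": 59.998, "flag": 128},  # good (rule is strictly > 128)
]

def flag_filter(records):
    return [r for r in records if r["flag"] <= 128]

clean = flag_filter(readings)
print(len(clean))  # 2
```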
Some PMUs were consistently less reliable than others
> 50% of the PMUs report no error flags
1 PMU reports nothing but error flagged data
Raises lots of questions we can't answer with the data we have
Are certain devices less reliable?
How do errors relate to maintenance?
Are certain locations inherently less reliable?
Frequency was unreliably reported when only spurious data was recorded
On certain days, there was an (unknown) problem that prevented most data from specific PMUs from being recorded
Only a small number of values were present, resulting in large gaps
Stored data appeared random within these time frames
1.19B records removed from specific dates
Sometimes sensors get stuck repeating the same value
Experts thought a change should occur at least every 5 sec; the data indicated up to 10 sec was reasonable
Use a geometric distribution to filter out sequences longer than statistically plausible
These would be difficult to find if we sampled the data or if we were only looking at summaries
~124M records removed
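The stuck-value filter might be sketched like this: model run lengths of identical readings as geometric (each sample repeats with some probability), and drop any run so long that its tail probability falls below a cutoff. The repeat probability and cutoff below are illustrative assumptions; at 60 samples/sec, the talk's 5-10 sec expert bound corresponds to runs of roughly 300-600 samples.

```python
from itertools import groupby
from math import log

def max_plausible_run(p_repeat, alpha):
    """Largest run length n whose tail probability p_repeat**(n-1)
    is still >= alpha under a geometric model."""
    return int(log(alpha) / log(p_repeat)) + 1

def filter_stuck(values, p_repeat=0.9, alpha=1e-6):
    """Keep only runs of identical values short enough to be
    statistically plausible. p_repeat and alpha are assumptions."""
    limit = max_plausible_run(p_repeat, alpha)
    out = []
    for _, run in groupby(values):
        run = list(run)
        if len(run) <= limit:      # plausible run: keep it
            out.extend(run)
    return out

data = [60.0] * 5 + [59.999] * 200 + [60.001] * 3   # 200-long stuck run
print(max_plausible_run(0.9, 1e-6))   # 132
print(len(filter_stuck(data)))        # 8 (the stuck run is removed)
```

As the slide notes, runs like these are exactly what sampling or summary statistics would hide: the filter has to see consecutive raw values.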
Since the network is connected, everything should look essentially the same
There will be time delays between sensors, but the overall patterns should be similar
Differences in patterns can signal a network partitioning
Frequency data cannot change randomly; physics dictates how much variation between values is possible
Valid sensor data reflects the constraints on the underlying network
There should be a strong correlation between a current value and the preceding values
An autocorrelation analysis identified areas where the data was completely uncorrelated (compared to high correlation in normal use case)
Much of the random data fell into “acceptable” limits, so it would not be identified by thresholds
~25.5M records removed
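The white-noise test above, correlated data is physical, uncorrelated data is suspect, can be sketched with a lag-1 autocorrelation over a window. The window length and correlation cutoffs below are illustrative assumptions, not the talk's parameters.

```python
import math, random

def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a window of samples."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 1.0  # constant window: perfectly correlated
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

random.seed(0)
# Smooth, physically plausible signal: slow drift around 60 Hz
smooth = [60 + 0.01 * math.sin(i / 20) for i in range(500)]
# In-range but uncorrelated noise: every value is "acceptable",
# which is why simple thresholds miss it
noise = [60 + random.uniform(-0.01, 0.01) for _ in range(500)]

print(round(lag1_autocorr(smooth), 3))  # close to 1
print(round(lag1_autocorr(noise), 3))   # close to 0
```

Both series stay within the same limits; only the correlation structure separates them, which matches the slide's point that thresholds alone could not catch this data.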
Applying the models developed to clean the data had a significant impact on data quality
2TB of historical PMU data
53.7B records
Identified 9.475B bad records (18% of the original data)
Defined 4 data-cleaning filters:
Flag based (8.13B records)
Missing data (1.19B records)
Constant values (124M records)
White noise (25.5M records)
Pipeline:
53.7B PMU sensor records
→ Filter error flags → 45.56B records
→ Filter bad dates → 44.37B records
→ Filter repeated seqs → 44.25B records
→ Filter white noise → 44.21B records
→ OOS freq algorithm / Gen trip algorithm → Event Repository
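The stage counts on the slide can be tabulated directly; per-stage removals recomputed from the rounded counts differ slightly from the per-filter figures (e.g. 8.14B vs. 8.13B for the flag filter) because the slide rounds to two decimals.

```python
# Record counts (billions) at each stage of the cleaning pipeline,
# as reported on the slide.
stages = [("raw PMU sensor records", 53.70),
          ("after error-flag filter", 45.56),
          ("after bad-date filter", 44.37),
          ("after repeated-seq filter", 44.25),
          ("after white-noise filter", 44.21)]

for (_, prev), (name, count) in zip(stages, stages[1:]):
    print(f"{name:26s} {count:6.2f}B (removed {prev - count:5.2f}B)")

total_bad = stages[0][1] - stages[-1][1]
pct = 100 * total_bad / stages[0][1]
print(f"total removed: {total_bad:.2f}B ({pct:.0f}% of the original data)")
```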
Once data was cleaned, event detection algorithms worked much better
Generator trips
329 candidate events detected
Most represent real events
Also detected unexpected, anomalous data spikes
Out-of-sync frequency
73 events detected, instead of 10,000 with the original data set
No islanding events detected
Most reflect offsets / shifts in frequency
Once the basic models work, more interesting questions can be answered
Where is the least stable generator?
Find the PMU that first identifies the trip
Start with data around the trips
Frequency 15 std dev below the mean
Count number of times each PMU is first
Least stable generator is closest to that PMU
With additional information could triangulate actual generator
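The tally described above, credit the PMU that first sees each trip, then count firsts, can be sketched as follows. The data layout, baseline window, and toy traces are hypothetical; the 15-standard-deviation threshold is the slide's.

```python
from collections import Counter

def first_pmu(event, baseline_n=50):
    """event: {pmu_id: [(t, freq), ...]}. Returns the PMU whose
    frequency first drops 15 sigma below its baseline mean, with
    baseline statistics taken from the pre-trip samples."""
    best_pmu, best_t = None, float("inf")
    for pmu, series in event.items():
        base = [f for _, f in series[:baseline_n]]
        mean = sum(base) / len(base)
        sigma = (sum((f - mean) ** 2 for f in base) / len(base)) ** 0.5
        threshold = mean - 15 * max(sigma, 1e-9)   # guard sigma == 0
        for t, f in series[baseline_n:]:
            if f < threshold:
                if t < best_t:
                    best_pmu, best_t = pmu, t
                break                              # only the first crossing counts
    return best_pmu

def make_series(dip_t):
    """Toy PMU trace: tiny oscillation around 60 Hz, one dip at dip_t."""
    series = [(t, 60 + 0.001 * (-1) ** t) for t in range(100)]
    series[dip_t] = (dip_t, 59.9)
    return series

events = [{"A": make_series(55), "B": make_series(57)},
          {"A": make_series(60), "B": make_series(52)},
          {"A": make_series(53), "B": make_series(70)}]

counts = Counter(first_pmu(e) for e in events)
print(counts.most_common(1))  # [('A', 2)] -- "A" sees the trip first most often
```

The PMU that wins the tally most often is presumed closest to the least stable generator; as the slide notes, triangulating the actual generator would need additional topology information.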
We have demonstrated our ability to run at scale
Tested on data up to 128 TB
Duplicated data set
43 months of data for 1000 PMUs
Complete data set analysis in under 10 hours on 128 nodes
Good scalability demonstrated
Primary limitations are file-system related
Could increase number of nodes for faster analysis of large data
Hardware & software updated since these tests completed
[Chart: analysis runtime scaling across data sets of 684, 1,368, 2,736, 5,472, and 43,776 PMU-months]
We are now applying our models to real-time data streams
Existing R models
Process data faster than data arrives
Incremental / windowed processing minimizes data requirements
Minor modifications to allow filters to work on streams instead of files
Being deployed in a framework designed to manage limited resources in a distributed environment
Generating artificial, but realistic data streams
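The "minor modifications" to run file-based filters over streams can be illustrated with a generator that applies the flag and stuck-value checks record by record, holding only the current run in memory. Everything here (names, thresholds, record layout) is an illustrative assumption, not the deployed framework.

```python
# Sketch: a file-based filter recast as a streaming generator. Only
# the current run of identical values is buffered, so the filter can
# run indefinitely over a live feed.
def stream_filter(records, max_run=600):
    """Drop error-flagged records and implausibly long stuck-value
    runs, yielding clean records incrementally."""
    run, last = [], object()
    for rec in records:
        if rec["flag"] > 128:          # flag filter, applied per record
            continue
        if rec["freq"] == last:
            run.append(rec)            # extend the current run
            continue
        if len(run) <= max_run:        # value changed: flush a plausible run
            yield from run
        run, last = [rec], rec["freq"]
    if run and len(run) <= max_run:    # flush whatever remains at end of stream
        yield from run

# Artificial stream: good data, a 700-long stuck run, a flagged record
stream = ([{"freq": 60.0 + 0.001 * i, "flag": 0} for i in range(5)]
          + [{"freq": 59.999, "flag": 0}] * 700
          + [{"freq": 60.2, "flag": 200}]
          + [{"freq": 60.001, "flag": 0}])
clean = list(stream_filter(stream))
print(len(clean))  # 6: five good records plus the final one survive
```

Because the generator yields as it goes, the same logic works whether `records` is a list read from a file or a live socket feed, which is the essence of the adaptation described on the slide.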
How to store and disseminate data is a significant issue within the community
NASPI working on Data Repository white paper
CPUC Energy Data workshop
How can researchers access data efficiently?
IEEE activity initiated Dec '12
Organized by IBM
Brings together leaders from industry, research and academic organizations
Goal: a demonstration data center within 2 years
Critchlow (PNNL) leads architecture sub-committee
Ongoing and future activities build on the current capabilities
Data analysis research
Refine analysis questions
Incorporate multi-modal data
Apply appropriate machine learning algorithms
Improve scalability
Investigate in-memory solutions
Apply to streaming + historical data simultaneously
Data facility
Define data access policies / requirements
Distributed or monolithic?
Data transfer capabilities
Data standards
Application libraries
Curation requirements
Thank you
Questions?