Power Grid Data Management and Analysis (March 2013)
Presented by: Terence Critchlow
PNNL-SA-94183
The core of the power grid has changed little over the past 25 years
Relatively small number of power producers
Large number of consumers
Transmission grid moves power from producers to distribution points
Distribution network moves power to consumers
Information on grid status is relatively sparse
SCADA data every 2 sec – 1 min
Meter data every month
Top Engineering Achievement of the 20th Century – US NAE
The future power grid must be smarter, not just bigger
Climate change
Developing nations
New applications
Constraints:
Regulatory
Social
Physical
Integration of renewables
Distributed generation
Real time markets
Electric vehicles
Integrating renewables at scale requires faster understanding of transmission grid status
In order to meet statutory requirements, renewables must be integrated into the system
Do not consistently generate power => Need to be smoothed
Do not operate at fixed output levels => need reliable predictions
Phasor Measurement Units (PMUs) are expected to be the dominant source of insight into transmission network status
~48 bytes/record * 60 records/sec ≈ 2.88 KB/sec per PMU
50,000 PMUs * 2.88 KB/sec ≈ 144 MB/sec
144 MB/sec * 60*60*24 ≈ 12.5 TB/day
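The data-rate arithmetic above can be checked directly; the 50,000-PMU deployment is the slide's hypothetical, and the exact daily total comes out just under the slide's rounded 12.5 TB figure.

```python
# Back-of-envelope PMU data rates, using the figures from the slide.
RECORD_BYTES = 48          # ~48 bytes per PMU record
RECORDS_PER_SEC = 60       # 60 samples/sec per PMU
NUM_PMUS = 50_000          # hypothetical full deployment

per_pmu_bps = RECORD_BYTES * RECORDS_PER_SEC   # 2,880 B/s ≈ 2.88 KB/s
total_bps = per_pmu_bps * NUM_PMUS             # 144,000,000 B/s ≈ 144 MB/s
per_day = total_bps * 60 * 60 * 24             # bytes per day

print(per_pmu_bps)         # 2880
print(total_bps / 1e6)     # 144.0 (MB/s)
print(per_day / 1e12)      # 12.4416 (TB/day; the slide rounds to 12.5)
```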
Distributed generation means power production could occur anywhere
Consumer based electricity generation is on the rise
Amount of grid-supplied energy required by a particular consumer could vary dramatically based on external conditions
What happens if the power is not needed? Significant power coming from distribution system could decrease stability
Establishing real time markets will moderate both supply and demand
Current prices are fixed
What if prices changed every ~5-15 min?
Utility sets prices based on model (expected availability and usage)
Millions of meters receive prices
Meter estimates consumption based on price and status
Each appliance determines response
Responses are aggregated at meter
Meter returns proposed / actual consumption
To consumer, behavior appears the same
Predictive models
Smart Meters
Adaptive appliances
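The price-response loop described above can be sketched in a few lines. This is a hypothetical illustration, not the talk's implementation: the appliance parameters, price thresholds, and function names are all invented for the example.

```python
# Hypothetical sketch of a smart meter aggregating per-appliance
# responses to a broadcast price. Thresholds are illustrative only.
def appliance_response(baseline_kw, max_price, price):
    """Each appliance decides its own draw: run at its baseline load
    if the price is acceptable, otherwise defer (draw nothing)."""
    return baseline_kw if price <= max_price else 0.0

def meter_response(appliances, price):
    """The meter aggregates appliance responses into one proposed
    consumption figure returned to the utility."""
    return sum(appliance_response(kw, max_p, price) for kw, max_p in appliances)

# (baseline kW, max acceptable $/kWh) per appliance -- invented values
home = [(3.5, 0.30),   # HVAC: runs unless prices spike
        (0.5, 0.15),   # dishwasher: defers readily
        (1.2, 0.50)]   # refrigerator: nearly always runs

print(meter_response(home, 0.10))  # all three run: 5.2 kW
print(meter_response(home, 0.20))  # dishwasher defers: 4.7 kW
print(meter_response(home, 0.40))  # only the fridge runs: 1.2 kW
```

To the consumer the behavior appears the same: the deferral decisions happen per appliance, and only the aggregate is reported upstream.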
EVs can act as both producers and consumers of electricity
Need to be ready to go when needed by driver
Discretion on when to re-charge batteries
Connected to grid most of the day
Does not have to start charging as soon as plugged in
Strategy could vary based on where you are
By selling stored electricity, could act as a distributed generator
Could employ a buy-low, sell-high strategy
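A buy-low, sell-high policy with driver readiness as the overriding constraint might look like the following. Every threshold here is a hypothetical assumption for illustration; the talk does not specify a policy.

```python
# Illustrative charge/discharge policy for an EV acting as a small
# distributed generator. All thresholds are invented for this sketch.
def ev_action(price, soc, reserve_soc=0.4, buy_below=0.10, sell_above=0.30):
    """price in $/kWh, soc (state of charge) in [0, 1]."""
    if soc < reserve_soc:
        return "charge"            # readiness for the driver comes first
    if price <= buy_below and soc < 1.0:
        return "charge"            # buy low
    if price >= sell_above:
        return "sell"              # sell high, down to the reserve
    return "idle"                  # no need to start charging when plugged in

print(ev_action(0.05, 0.30))  # charge (below the readiness reserve)
print(ev_action(0.35, 0.80))  # sell
print(ev_action(0.20, 0.80))  # idle
```

The `reserve_soc` guard encodes "ready to go when needed by the driver"; in practice it would vary with location and expected departure time, as the slide notes.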
Data analysis is key to maintaining stability of the future power grid
Data flow is complex
Multiple types of information (pricing, weather, sensor)
Information moving in both directions
Relatively high, sustained data rates
Privacy must be preserved
Utilities will require significant analysis capabilities
Effective model development requires a flexible, scalable data analysis pipeline
Sensor Streams
Data Analysis
Infrastructure
Data Storage
Models over streaming data
Accessible Repository
Community Resource
Goal: gain insights from real sensor data using event detection models
Out-of-sync events
Determine when the network partitions itself
Requires comparison across different PMUs
Generator trip events
Sudden drop in frequency that occurs across the network
Looking at average behavior of PMUs
2TB PMU data set
38 PMUs
1.5 years
53.7B sensor readings
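The generator-trip signature above (a sudden, network-wide frequency drop, judged from the average behavior of the PMUs) can be sketched as follows. The 0.02 Hz step threshold and the toy readings are illustrative assumptions, not values from the talk.

```python
# Sketch: average PMU frequencies at each instant and flag any
# sudden network-wide drop as a candidate generator trip.
def avg_series(pmu_series):
    """Column-wise mean across PMUs: one averaged value per time step."""
    return [sum(col) / len(col) for col in zip(*pmu_series)]

def trip_candidates(avg, drop=0.02):
    """Indices where the averaged frequency falls by more than `drop`
    Hz in a single step (threshold is an assumption)."""
    return [i for i in range(1, len(avg)) if avg[i - 1] - avg[i] > drop]

pmus = [
    [60.00, 60.00, 59.95, 59.96],   # every PMU sees the same sudden dip,
    [60.01, 60.00, 59.94, 59.95],   # which is what distinguishes a trip
    [59.99, 60.00, 59.96, 59.96],   # from a local sensor glitch
]
print(trip_candidates(avg_series(pmus)))  # [2]
```

Averaging first means a dip seen by only one PMU is diluted, so the detector responds to network-wide events rather than single-sensor noise.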
Our iterative approach uses historical data to validate the models
Data-Driven Model Development
Use actual data to guide definition of the models
Analyze the data
Identify events of interest
Create event extraction model based on data subset
Execute model against entire data set to extract events
Validate results
Models can be adapted to work on data streams
within a distributed, agent-based framework
Real-Time Event Detection
Models applied to live data streams
Our approach leverages R, Hadoop, and our institutional resources
R
Flexible statistical scripting language
Thousands of packages
RHIPE interface to Hadoop
Easy to prototype models
Hadoop
Scalable
PNNL Institutional Computing
19,200 cores (used max 2,048 cores)
102TFlops
4PB Lustre file system
Initial event detection runs highlighted significant data quality errors
Over 10,000 candidate out-of-sync events detected
No good models of sensor errors were available
Errors:
Occurred over time
Required analysis across sensors
Included transient errors
Needed to differentiate between data that couldn’t occur and anomalous data
Exploratory data analysis is beneficial when you don't quite know what you are looking for
Define initial problem
Define model
Run model over entire data set
Select interesting subsets of the data
Analyze results / patterns
Model validated
Refine model
This led to the rediscovery of lost knowledge about status flags
Originally told flag 132 means bad data
A detailed look at regions with high concentrations of 59.999 Hz readings revealed correlations > 0.95 between certain flag / value combinations
After additional investigation, we found specifications indicating any flag > 128 indicates bad data
8B records with bad data flags
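The rediscovered rule, any status flag above 128 marks bad data, amounts to a one-line record filter. The record layout below is hypothetical; the talk does not describe the actual schema.

```python
# Sketch of the flag-based filter implied by the slide: drop any
# record whose status flag exceeds 128. Field names are assumed.
readings = [
    {"pmu": "A", "freq": 59.999, "flag": 132},  # bad: 132 > 128
    {"pmu": "A", "freq": 60.001, "flag": 0},    # good
    {"pmu": "B", "freq": 59.999, "flag": 200},  # bad
    {"pmu": "B", "freq": 59.998, "flag": 128},  # good (rule is strictly > 128)
]

def flag_filter(records):
    return [r for r in records if r["flag"] <= 128]

clean = flag_filter(readings)
print(len(clean))  # 2
```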
Some PMUs were consistently less reliable than others
> 50% of the PMUs report no error flags
1 PMU reports nothing but error flagged data
Raises lots of questions we can't answer with the data we have
Are certain devices less reliable?
How do errors relate to maintenance?
Are certain locations inherently less reliable?
Frequency was unreliably reported when only spurious data was recorded
On certain days, there was an (unknown) problem that prevented most data from specific PMUs from being recorded
Only a small number of values were present, resulting in large gaps
Stored data appeared random within these time frames
1.19B records removed from specific dates
Sometimes sensors get stuck repeating the same value
Experts thought a change should occur at least every 5 sec; the data indicated up to 10 sec was reasonable
Use a geometric distribution to filter out sequences longer than statistically plausible
These would be difficult to find if we sampled the data or if we were only looking at summaries
~124M records removed
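The stuck-value filter might be sketched like this: model run lengths of identical readings as geometric (each sample repeats with some probability), and drop any run so long that its tail probability falls below a cutoff. The repeat probability and cutoff below are illustrative assumptions; at 60 samples/sec, the talk's 5-10 sec expert bound corresponds to runs of roughly 300-600 samples.

```python
from itertools import groupby
from math import log

def max_plausible_run(p_repeat, alpha):
    """Largest run length n whose tail probability p_repeat**(n-1)
    is still >= alpha under a geometric model."""
    return int(log(alpha) / log(p_repeat)) + 1

def filter_stuck(values, p_repeat=0.9, alpha=1e-6):
    """Keep only runs of identical values short enough to be
    statistically plausible. p_repeat and alpha are assumptions."""
    limit = max_plausible_run(p_repeat, alpha)
    out = []
    for _, run in groupby(values):
        run = list(run)
        if len(run) <= limit:      # plausible run: keep it
            out.extend(run)
    return out

data = [60.0] * 5 + [59.999] * 200 + [60.001] * 3   # 200-long stuck run
print(max_plausible_run(0.9, 1e-6))   # 132
print(len(filter_stuck(data)))        # 8 (the stuck run is removed)
```

As the slide notes, runs like these are exactly what sampling or summary statistics would hide: the filter has to see consecutive raw values.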
Since the network is connected, everything should look essentially the same
There will be time delays between sensors, but the overall patterns should be similar
Differences in patterns can signal a network partitioning
Frequency data cannot change randomly; physics dictates how much variation between values is possible
Valid sensor data reflects the constraints on the underlying network
There should be a strong correlation between a current value and the preceding values
An autocorrelation analysis identified areas where the data was completely uncorrelated (compared to high correlation in normal use case)
Much of the random data fell into “acceptable” limits, so it would not be identified by thresholds
~25.5M records removed
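The white-noise test above, correlated data is physical, uncorrelated data is suspect, can be sketched with a lag-1 autocorrelation over a window. The window length and correlation cutoffs below are illustrative assumptions, not the talk's parameters.

```python
import math, random

def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a window of samples."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 1.0  # constant window: perfectly correlated
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

random.seed(0)
# Smooth, physically plausible signal: slow drift around 60 Hz
smooth = [60 + 0.01 * math.sin(i / 20) for i in range(500)]
# In-range but uncorrelated noise: every value is "acceptable",
# which is why simple thresholds miss it
noise = [60 + random.uniform(-0.01, 0.01) for _ in range(500)]

print(round(lag1_autocorr(smooth), 3))  # close to 1
print(round(lag1_autocorr(noise), 3))   # close to 0
```

Both series stay within the same limits; only the correlation structure separates them, which matches the slide's point that thresholds alone could not catch this data.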
Applying the models developed to clean the data had a significant impact on data quality
2TB of historical PMU data
53.7B records
Identified 9.475B bad records (18% of the original data)
Defined 4 data-cleaning filters:
Flag based (8.13B records)
Missing data (1.19B records)
Constant values (124M records)
White noise (25.5M records)
Pipeline:
53.7B PMU sensor records
→ Filter error flags → 45.56B records
→ Filter bad dates → 44.37B records
→ Filter repeated seqs → 44.25B records
→ Filter white noise → 44.21B records
→ OOS freq algorithm / Gen trip algorithm → Event Repository
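The stage counts on the slide can be tabulated directly; per-stage removals recomputed from the rounded counts differ slightly from the per-filter figures (e.g. 8.14B vs. 8.13B for the flag filter) because the slide rounds to two decimals.

```python
# Record counts (billions) at each stage of the cleaning pipeline,
# as reported on the slide.
stages = [("raw PMU sensor records", 53.70),
          ("after error-flag filter", 45.56),
          ("after bad-date filter", 44.37),
          ("after repeated-seq filter", 44.25),
          ("after white-noise filter", 44.21)]

for (_, prev), (name, count) in zip(stages, stages[1:]):
    print(f"{name:26s} {count:6.2f}B (removed {prev - count:5.2f}B)")

total_bad = stages[0][1] - stages[-1][1]
pct = 100 * total_bad / stages[0][1]
print(f"total removed: {total_bad:.2f}B ({pct:.0f}% of the original data)")
```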
Once data was cleaned, event detection algorithms worked much better
Generator trips
329 candidate events detected
Most represent real events
Also detected unexpected, anomalous data spikes
Out-of-sync frequency
73 events detected, instead of 10,000 with the original data set
No islanding events detected
Most reflect offsets / shifts in frequency
Once the basic models work, more interesting questions can be answered
Where is the least stable generator?
Find the PMU that first identifies the trip
Start with data around the trips
Frequency 15 std dev below the mean
Count number of times each PMU is first
Least stable generator is closest to that PMU
With additional information could triangulate actual generator
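The tally described above, credit the PMU that first sees each trip, then count firsts, can be sketched as follows. The data layout, baseline window, and toy traces are hypothetical; the 15-standard-deviation threshold is the slide's.

```python
from collections import Counter

def first_pmu(event, baseline_n=50):
    """event: {pmu_id: [(t, freq), ...]}. Returns the PMU whose
    frequency first drops 15 sigma below its baseline mean, with
    baseline statistics taken from the pre-trip samples."""
    best_pmu, best_t = None, float("inf")
    for pmu, series in event.items():
        base = [f for _, f in series[:baseline_n]]
        mean = sum(base) / len(base)
        sigma = (sum((f - mean) ** 2 for f in base) / len(base)) ** 0.5
        threshold = mean - 15 * max(sigma, 1e-9)   # guard sigma == 0
        for t, f in series[baseline_n:]:
            if f < threshold:
                if t < best_t:
                    best_pmu, best_t = pmu, t
                break                              # only the first crossing counts
    return best_pmu

def make_series(dip_t):
    """Toy PMU trace: tiny oscillation around 60 Hz, one dip at dip_t."""
    series = [(t, 60 + 0.001 * (-1) ** t) for t in range(100)]
    series[dip_t] = (dip_t, 59.9)
    return series

events = [{"A": make_series(55), "B": make_series(57)},
          {"A": make_series(60), "B": make_series(52)},
          {"A": make_series(53), "B": make_series(70)}]

counts = Counter(first_pmu(e) for e in events)
print(counts.most_common(1))  # [('A', 2)] -- "A" sees the trip first most often
```

The PMU that wins the tally most often is presumed closest to the least stable generator; as the slide notes, triangulating the actual generator would need additional topology information.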
We have demonstrated our ability to run at scale
Tested on data up to 128 TB
Duplicated data set
43 months of data for 1000 PMUs
Complete data set analysis in under 10 hours on 128 nodes
Good scalability demonstrated
Primary limitations are file-system related
Could increase number of nodes for faster analysis of large data
Hardware & software updated since these tests completed
[Chart: analysis runtime scaling across data sets of 684, 1,368, 2,736, 5,472, and 43,776 PMU-months]
We are now applying our models to real-time data streams
Existing R models
Process data faster than data arrives
Incremental / windowed processing minimizes data requirements
Minor modifications to allow filters to work on streams instead of files
Being deployed in a framework designed to manage limited resources in a distributed environment
Generating artificial, but realistic data streams
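The "minor modifications" to run file-based filters over streams can be illustrated with a generator that applies the flag and stuck-value checks record by record, holding only the current run in memory. Everything here (names, thresholds, record layout) is an illustrative assumption, not the deployed framework.

```python
# Sketch: a file-based filter recast as a streaming generator. Only
# the current run of identical values is buffered, so the filter can
# run indefinitely over a live feed.
def stream_filter(records, max_run=600):
    """Drop error-flagged records and implausibly long stuck-value
    runs, yielding clean records incrementally."""
    run, last = [], object()
    for rec in records:
        if rec["flag"] > 128:          # flag filter, applied per record
            continue
        if rec["freq"] == last:
            run.append(rec)            # extend the current run
            continue
        if len(run) <= max_run:        # value changed: flush a plausible run
            yield from run
        run, last = [rec], rec["freq"]
    if run and len(run) <= max_run:    # flush whatever remains at end of stream
        yield from run

# Artificial stream: good data, a 700-long stuck run, a flagged record
stream = ([{"freq": 60.0 + 0.001 * i, "flag": 0} for i in range(5)]
          + [{"freq": 59.999, "flag": 0}] * 700
          + [{"freq": 60.2, "flag": 200}]
          + [{"freq": 60.001, "flag": 0}])
clean = list(stream_filter(stream))
print(len(clean))  # 6: five good records plus the final one survive
```

Because the generator yields as it goes, the same logic works whether `records` is a list read from a file or a live socket feed, which is the essence of the adaptation described on the slide.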
How to store and disseminate data is a significant issue within the community
NASPI working on Data Repository white paper
CPUC Energy Data workshop
How can researchers access data efficiently?
IEEE activity initiated Dec '12
Organized by IBM
Brings together leaders from industry, research and academic organizations
Goal: a demonstration data center within 2 years
Critchlow (PNNL) leads architecture sub-committee
Ongoing and future activities build on the current capabilities
Data analysis research
Refine analysis questions
Incorporate multi-modal data
Apply appropriate machine learning algorithms
Improve scalability
Investigate in-memory solutions
Apply to streaming + historical data simultaneously
Data facility
Define data access policies / requirements
Distributed or monolithic?
Data transfer capabilities
Data standards
Application libraries
Curation requirements
Thank you
Questions?