Big Data Types & Sources
Focus Group, Texas A&M University
April 18, 2017
Big Data Challenges & Tools for Smart Grids
Characteristics:
• Volume
  • Small Packets to Large Files
• Velocity
  • Nanoseconds to years
• Veracity
  • Trust, Provenance, Security, Attacks, Fidelity, Integrity
• Variety
  • Sensor data to manufacturing specs to social media
  • Structured to unstructured
  • Real-time to static data
• Value
  • Time is of the essence
• Findability
  • Discovery & Metadata
• Availability
  • Sharing and access
• Management/Organization
  • Authentication, Authorization, Auditing, Accounting
Tools:
• Hadoop
• Spark
• Storm
• AMQP Middleware Systems
• Distributed Data/Compute Grids
• Cluster Computing
• Cloud Computing
• Edge Computing
• App Engines
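Hadoop and Spark both parallelize work using variants of the map/shuffle/reduce pattern. The sketch below shows that pattern in plain Python on a single machine, summing interval kWh per feeder. It does not use either framework's API, and the `(meter_id, feeder_id, kWh)` record layout and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (meter_id, feeder_id, kWh) for one interval.
Reading = Tuple[str, str, float]


def map_phase(readings: Iterable[Reading]) -> Iterator[Tuple[str, float]]:
    """Map: emit one (key, value) pair per reading, keyed by feeder."""
    for _meter_id, feeder_id, kwh in readings:
        yield feeder_id, kwh


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle: group values by key (a cluster framework does this across nodes)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce: collapse each key's values to a single total."""
    return {key: sum(values) for key, values in groups.items()}


def feeder_totals(readings: Iterable[Reading]) -> Dict[str, float]:
    return reduce_phase(shuffle(map_phase(readings)))


if __name__ == "__main__":
    sample = [("m1", "F1", 1.5), ("m2", "F1", 2.0), ("m3", "F2", 0.75)]
    print(feeder_totals(sample))  # {'F1': 3.5, 'F2': 0.75}
```

Because the map and reduce steps touch only one record or one key at a time, a framework can spread them across many machines. Only the shuffle needs to move data between nodes.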
Analytics:
• Post-Mortem
• Predictive
• Prescriptive
• Real-Time
• Workflows/Pipelines
• Rule-based Analytics
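Rule-based and real-time analytics can be combined by evaluating rules against each reading as it arrives. The following is a minimal sketch under stated assumptions. The 114-126 V band (±5% around a 120 V nominal, in the spirit of ANSI C84.1 Range A), the three-sample window, and the 115 V "sustained low" threshold are illustrative values, not utility settings. All names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Assumed limits: +/-5% around a 120 V nominal (114-126 V), similar to the
# ANSI C84.1 Range A service band; real thresholds are utility-specific.
V_MIN, V_MAX = 114.0, 126.0
# Hypothetical parameters for a "sustained low voltage" rule.
WINDOW = 3
SUSTAINED_LOW = 115.0


def rule_alerts(samples: Iterable[Tuple[int, float]]) -> Iterator[Tuple[int, str]]:
    """Evaluate simple rules on (timestamp, volts) samples as they stream in.

    At most one alert is emitted per sample; out-of-band rules take priority.
    """
    window: Deque[float] = deque(maxlen=WINDOW)
    for ts, volts in samples:
        window.append(volts)
        # Rule 1: a single sample outside the allowed band.
        if volts < V_MIN:
            yield ts, "undervoltage"
        elif volts > V_MAX:
            yield ts, "overvoltage"
        # Rule 2: the moving average over a full window sits near the lower limit.
        elif len(window) == WINDOW and sum(window) / WINDOW < SUSTAINED_LOW:
            yield ts, "sustained_low"
```

For example, `list(rule_alerts([(0, 120.0), (1, 113.0), (2, 115.0), (3, 114.5), (4, 127.0)]))` returns `[(1, 'undervoltage'), (3, 'sustained_low'), (4, 'overvoltage')]`. Because the function is a generator, it can consume an unbounded stream while holding only one window of state.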
Higher-level Challenges in Smart Grids
Data-rich operational controls have a lot of potential to:
• Better integrate variable renewable energy resources like wind and solar
  • Multiple production points – two-way power transfers
  • Distributed Energy Resource (DER) Integration
  • Breaking Silos
• Aggregate and match distribution resources to meet energy demand in near real-time
  • Rapid Demand Response (DR)
  • Prevent power outages
  • Optimize unit commitment
• Improve planning in ways that minimize excessive costs from unnecessary infrastructure
  • Reduce the need to build new power plants
  • Forecast energy demand
• Incent load-shifting to increase energy efficiencies
  • Co-opt consumers in an effort to affect their usage patterns
  • Integrate Environmental/Weather/Planning Data
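Demand forecasting often starts from a very simple baseline before moving on to weather-driven or machine learning models. The sketch below uses a same-hour average: each slot of the next period is forecast as the mean of that slot over the last `days` periods. The function name and defaults are hypothetical, and this is a naive baseline, not a production forecasting method.

```python
from typing import List, Sequence


def same_hour_average_forecast(
    history: Sequence[float], period: int = 24, days: int = 7
) -> List[float]:
    """Forecast the next `period` values as the mean of the same slot
    over the last `days` periods of history."""
    if period <= 0 or days <= 0:
        raise ValueError("period and days must be positive")
    if len(history) < period * days:
        raise ValueError("need at least period * days observations")
    # The window length is a multiple of `period`, so recent[slot] always
    # lines up with slot `slot` of the next period.
    recent = list(history[-period * days:])
    forecast = []
    for slot in range(period):
        # Values that fell in this slot on each of the last `days` periods.
        values = recent[slot::period]
        forecast.append(sum(values) / len(values))
    return forecast
```

With hourly load data, `period=24` and `days=7` average each hour of the day over the past week. As a small check, `same_hour_average_forecast([1, 2, 3, 4, 3, 4, 5, 6], period=4, days=2)` returns `[2.0, 3.0, 4.0, 5.0]`.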
Internal Challenges
• Customer-Facing Analytics
  • Understanding of customer behavior
  • Customer accounts
  • Demand response & variable pricing
    • Time-of-use rates or peak-time rebates
  • Customers as partners in the grid management challenge
  • Reduce energy consumption to reduce system peak loads
  • Conservation voltage reduction (CVR)
  • Integrate smart meter data and customer insight into long-range ratemaking and grid planning
• Protection
  • Theft detection (revenue protection)
  • Data privacy and security
• New models
  • Telecommunications
  • Home security and home improvement providers
• Strategic Initiatives
  • AMI Data Integration and Analysis
  • Grid Edge Management
  • Outage Detection & Restoration
  • Asset Management
    • Diagnosing the health of grid systems
  • Workforce Deployment
• Data Integration
  • Data Lake, Data Mart, Data Grid
  • Cloud-based software-as-a-service (SaaS) models
• Predictive Analytics
  • Downtime, Line downs, …
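Time-of-use rates reward customers for moving consumption out of peak hours. The sketch below compares a flat tariff with a time-of-use tariff for one day of hourly usage. All rates and the 16:00-21:00 peak window are hypothetical numbers chosen for illustration. Costs are kept in cents to avoid floating-point rounding.

```python
from typing import Sequence

# Hypothetical tariff for illustration only, in cents per kWh.
FLAT_RATE = 12
PEAK_RATE = 25
OFF_PEAK_RATE = 8
PEAK_HOURS = range(16, 21)  # assumed peak window: 16:00 to 20:59


def flat_bill(hourly_kwh: Sequence[float]) -> float:
    """Cost in cents when every kWh is billed at the same rate."""
    return FLAT_RATE * sum(hourly_kwh)


def tou_bill(hourly_kwh: Sequence[float]) -> float:
    """Cost in cents under time-of-use pricing for one day of hourly usage."""
    if len(hourly_kwh) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(
        (PEAK_RATE if hour in PEAK_HOURS else OFF_PEAK_RATE) * kwh
        for hour, kwh in enumerate(hourly_kwh)
    )


if __name__ == "__main__":
    baseline = [1] * 24
    # Same 24 kWh in total, with the five peak-hour kWh moved to hours 0-4.
    shifted = [2] * 5 + [1] * 11 + [0] * 5 + [1] * 3
    print(flat_bill(baseline), tou_bill(baseline))  # 288 277
    print(flat_bill(shifted), tou_bill(shifted))    # 288 192
```

Under these assumed rates, shifting load leaves the flat bill unchanged but lowers the time-of-use bill. That difference is the incentive behind load-shifting and peak reduction.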
Getting value out of Data
• Federation (pull together multi-resource, heterogeneous data)
• Virtualization (hide complexity and location information)
• Standardization (operationalize for ease of use)
• Metadata & Ontology (capture contextual, descriptive & system info)
• Operational and Management Automation (Policy-driven)
• Data Organization (Logical Aggregation)
• Discovery Tools (Human and Machine-oriented findability)
• Data Services (Dataflow/Workflow)
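Metadata capture, virtualization, and discovery tools can be illustrated with a tiny in-memory catalog. In this sketch, search runs on descriptive keywords and returns dataset names, while the physical location sits behind a separate lookup. That separation mirrors the idea of hiding location information. All class names, fields, dataset names, and storage URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry holding descriptive and system metadata."""
    name: str
    location: str  # system info, e.g. a storage URI; not exposed by search
    description: str
    keywords: Set[str] = field(default_factory=set)


class Catalog:
    """Minimal in-memory catalog supporting keyword discovery."""

    def __init__(self) -> None:
        self._records: List[DatasetRecord] = []

    def register(self, record: DatasetRecord) -> None:
        self._records.append(record)

    def discover(self, *terms: str) -> List[str]:
        """Return names of datasets tagged with every search term (case-insensitive)."""
        wanted = {t.lower() for t in terms}
        return [
            r.name
            for r in self._records
            if wanted <= {k.lower() for k in r.keywords}
        ]

    def resolve(self, name: str) -> str:
        """Look up where a dataset lives, only when a consumer actually needs it."""
        for r in self._records:
            if r.name == name:
                return r.location
        raise KeyError(name)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register(DatasetRecord("pmu_feed", "s3://example-bucket/pmu/",
                                   "Synchrophasor measurements",
                                   {"synchrophasor", "PMU", "operational"}))
    catalog.register(DatasetRecord("ami_hourly", "s3://example-bucket/ami/",
                                   "Hourly meter reads",
                                   {"AMI", "metering", "operational"}))
    print(catalog.discover("operational"))  # ['pmu_feed', 'ami_hourly']
    print(catalog.discover("pmu"))          # ['pmu_feed']
```

A real catalog would add an ontology, so that related terms match each other, and persistent storage. The interface split between `discover` and `resolve` is the part this sketch is meant to show.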
Data Sources & Types (non-exhaustive list)
• Operational Parametric Data
• Synchrophasor/phasor measurement unit (PMU) data
• Event Data
• Equipment Monitors
• Wearables, Appliances and Devices (IoT)
• Fault Logs
• Metering Data (advanced metering infrastructure)
• Line Ratings Data
• Congestion Management Data
• Load and Interface Flow Data
• SCADA Historical Data
• Billing Data
• Non-operational Data
• Enviro/Eco/Weather/Seismic/Hydro Data
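Synchrophasor data comes from phasor measurement units, which estimate the magnitude and angle of a voltage or current waveform against a common time reference. One classic way to estimate a phasor is a one-cycle discrete Fourier transform. The sketch below assumes exactly one cycle of evenly spaced samples at nominal frequency. It ignores the GPS time synchronization, filtering, and off-nominal frequency handling that a real PMU (for example, one built to IEEE C37.118) performs. The function name is hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def dft_phasor(samples: Sequence[float]) -> Tuple[float, float]:
    """Estimate (RMS magnitude, angle in degrees) from exactly one cycle of samples."""
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 samples per cycle")
    # Correlate the waveform with a unit phasor rotating at the fundamental.
    acc = sum(x * cmath.exp(-2j * math.pi * k / n) for k, x in enumerate(samples))
    # Scaling by sqrt(2)/n turns the peak amplitude into an RMS magnitude.
    phasor = math.sqrt(2) / n * acc
    return abs(phasor), math.degrees(cmath.phase(phasor))


if __name__ == "__main__":
    n = 32
    wave = [170.0 * math.cos(2 * math.pi * k / n + math.radians(30)) for k in range(n)]
    mag, ang = dft_phasor(wave)
    print(round(mag, 1), round(ang, 1))  # 120.2 30.0
```

For a pure cosine $X_m \cos(2\pi k/N + \varphi)$, the sum reduces to $\tfrac{N X_m}{2} e^{j\varphi}$, so the scaled result is the RMS phasor $\tfrac{X_m}{\sqrt{2}} e^{j\varphi}$.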
Data Marts (John McDonald)
GE-Tarigma GEM
Operational vs. Non-Operational Data
John D. McDonald, GE
[Figure: Ability Cloud architecture diagram showing an Embedded Device, an Edge Gateway (ABB Wireless), a Cloud Gateway, and the Edge Cloud]
• Andy Sun (ISyE, Georgia Tech)
• How can big data help?
  – Dealing with uncertainty:
    • Robust/stochastic short- and long-term planning (natural resource uncertainty)
    • Robust demand response (human behavior uncertainty)
  – Static & dynamic physical flow problems:
    • Optimal power flow and optimal topology reconfiguration
    • State estimation
    • Voltage stability & load modeling
  – Methodology research:
    • Uncertainty quantification & optimization
    • Multistage robust/stochastic optimization
    • Nonconvex quadratic optimization and convexification
Andy Sun
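State estimation, listed above, is commonly posed as a weighted least squares problem. The derivation below assumes a linear (DC) measurement model $z = Hx + e$, where $z$ is the measurement vector, $x$ the state (for example, bus voltage angles), $H$ the measurement matrix, and $e$ zero-mean error with covariance $R$. Setting the gradient of the weighted residual to zero gives the normal equations:

```latex
\begin{aligned}
\hat{x} &= \arg\min_{x} J(x), \qquad J(x) = (z - Hx)^{\top} W (z - Hx), \qquad W = R^{-1} \\
\nabla_x J(x) &= -2\, H^{\top} W (z - Hx) = 0
  \;\Longrightarrow\; H^{\top} W H \,\hat{x} = H^{\top} W z \\
\hat{x} &= \left( H^{\top} W H \right)^{-1} H^{\top} W z
\end{aligned}
```

The inverse exists only when the network is observable, meaning $H^{\top} W H$ is nonsingular. In the nonlinear AC case, $H$ is replaced by the Jacobian of the measurement function, and the same step is iterated (Gauss-Newton).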
Center for Analysis and Prediction of Storms (CAPS), University of Oklahoma
High-Resolution Forecast Ensembles
Vision: Future Operational Numerical Weather Prediction
• Increasing resolution – operational models down to 1-3 km grid spacing
• Able to model thunderstorms and thunderstorm complexes
• Improved modeling of clouds compared with global models
• Ensembles can provide probabilistic information and a weighted mean that is better than a single deterministic model
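Ensemble output is commonly summarized in two ways: a (weighted) ensemble mean, and the fraction of members exceeding a threshold, which serves as a simple probability estimate. The sketch below assumes one scalar value per member (for example, a rainfall total at a single grid point) and optional member weights. The function name and sample values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def ensemble_summary(
    members: Sequence[float],
    threshold: float,
    weights: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (weighted mean, weighted fraction of members above threshold)."""
    if not members:
        raise ValueError("need at least one ensemble member")
    if weights is None:
        weights = [1.0] * len(members)  # equal weights by default
    if len(weights) != len(members) or sum(weights) <= 0:
        raise ValueError("weights must match members and sum to a positive value")
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, members)) / total
    # Probability estimate: share of (weighted) members exceeding the threshold.
    prob = sum(w for w, x in zip(weights, members) if x > threshold) / total
    return mean, prob
```

For instance, `ensemble_summary([10.0, 20.0, 30.0, 40.0], 25.0)` returns `(25.0, 0.5)`. A single deterministic run would yield only one number and no probability.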
CAPS Storm-Scale Ensemble Forecasts (SSEF)
• Run a large ensemble of convection-allowing NWP forecasts (3 km) over CONUS
• 25-50 NWP ensemble members in total, run at NSF XSEDE centers
• Explore new methods for severe weather prediction in the 12-60 h time frame
• 5-6 weeks in spring: application to severe weather forecasting
• 4 weeks in summer: application to flash flood forecasting
Datasets
Dataset / Domain                  Size/Member   Size/Day               Size/Season (30 days)
Full CONUS                        1.64 TB       16.4 TB (10 members)   492 TB
Significant subdomain (200x200)   13.5 GB       54 GB (4 members)      1.62 TB
Keith Brewster
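The columns of the CAPS dataset table are consistent with simple multiplication: daily size = size per member × members, and season size = daily size × 30 days. The check below assumes Size/Member is the daily output of one member and uses decimal units (1 TB = 1000 GB). `Decimal` avoids floating-point rounding in the comparisons. The function names are hypothetical.

```python
from decimal import Decimal


def daily_volume(size_per_member: Decimal, members: int) -> Decimal:
    """Storage per day for an ensemble (assumes size_per_member is per day)."""
    return size_per_member * members


def season_volume(size_per_member: Decimal, members: int, days: int = 30) -> Decimal:
    """Storage for a full season of daily runs."""
    return daily_volume(size_per_member, members) * days


if __name__ == "__main__":
    conus = season_volume(Decimal("1.64"), members=10)     # in TB
    subdomain = season_volume(Decimal("13.5"), members=4)  # in GB
    print(conus, "TB;", subdomain / 1000, "TB")
```

Scaled this way, a season of full-CONUS output is roughly 300 times larger than the subdomain subset. That gap is why extracting a significant subdomain matters for sharing and analysis.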
Big Data Types and Sources: Discussion Notes
Summary
• Many types and sources
• Silos at utility – data is not efficiently used in all the places within an organization that could benefit from it
• How to share the data you have with others
• CEII (Critical Energy/Electric Infrastructure Information) and proprietary data – data belongs not just to the utility but also to the stakeholders, or even the instrument
• Metrics drive data requirements, but good metrics are hard to develop without data
Path Forward
• Understand and manage how/when/why utilities can release what types of data
• Identify sources that may not be traditional but are helpful
• Licensing of data to provide assurance of how it will be handled, altered, etc.
• How to close the loop and get results back quickly to those providing the data
• Define how the process should work when someone wants to release and share data
• How do we navigate different data types and sources to reach particular end goals?
• How can the data be organized efficiently to get the most value out of it? Policy-driven, supporting both human and machine learning applications
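A policy-driven approach to release decisions, as raised in these notes, can encode data classification and requester attributes as explicit rules instead of case-by-case judgment. The sketch below is entirely hypothetical. The three sensitivity levels, the NDA and CEII-approval attributes, and the possible outcomes are illustrative and are not drawn from any utility's or regulator's actual policy.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical data classification levels."""
    PUBLIC = "public"
    PROPRIETARY = "proprietary"
    CEII = "ceii"


@dataclass(frozen=True)
class Requester:
    """Hypothetical attributes a release policy might check."""
    signed_nda: bool
    ceii_approved: bool


def release_decision(sensitivity: Sensitivity, requester: Requester) -> str:
    """Return 'full', 'aggregated', or 'deny' under an illustrative policy."""
    if sensitivity is Sensitivity.PUBLIC:
        return "full"
    if sensitivity is Sensitivity.PROPRIETARY:
        # Without an agreement in place, share only aggregated data.
        return "full" if requester.signed_nda else "aggregated"
    # CEII: raw data only for approved requesters who also signed an NDA.
    if requester.ceii_approved and requester.signed_nda:
        return "full"
    return "deny"
```

Keeping the rules in one function (or in a table read by it) makes the policy auditable. It also lets the same decision logic serve both human requests and automated data pipelines.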
Google Document link with more live notes:
https://docs.google.com/document/d/1eL3r04fjZ_fSQafI_Fhdf_iJEt5_mdDcwEZopQQiJXI/edit