TDTWG Report to RMS SCR Addressing ERCOT System Outages Tuesday, May 10

Upload: briana-young

Post on 20-Jan-2018



TRANSCRIPT

Page 1

TDTWG Report to RMS

SCR Addressing ERCOT System Outages

Tuesday, May 10

Page 2

Background

Original design for ERCOT system architecture did not include “high availability” for ERCOT Systems.

ERCOT built its systems to comply with Protocol timing specific to ERCOT.

Increased criticality of Market process timing and transaction volume has driven the need for ERCOT Systems to be more robust than originally designed and built.

Several Market system changes for ERCOT have been approved, and many have been implemented, but none included “high availability” for the ERCOT systems supporting the Retail Market.

ERCOT Unplanned System Outages have become burdensome on Market Participants and have in some cases impacted processes directly related to supporting customers.

Page 3

Background continued…

While it is difficult to determine the exact extent to which a customer or Market Participant has been impacted by an ERCOT System Outage, it is apparent that these Outages, if allowed to grow in number, may eventually be a detriment to the Texas Retail Market.

RMS asked TDTWG to review the reasons for the outages and determine whether anything can be done.

TDTWG has completed a review of each outage as well as the activities necessary to restore successful system processing in each event. The product of that work is the SCR being presented at this meeting.

Page 4

TDTWG Approach

The TDTWG approach included reviewing procedures for systems in an attempt to ensure all aspects surrounding an ERCOT Outage were taken into consideration. To support this, ERCOT IT provided multiple overviews of processes and systems so that TDTWG members could take an informed approach before proceeding with their analysis.

These included:
• Presentation of existing system architecture
• Understanding of current processing capability/limitations
• Details of transaction process timing
• Overview and discussion of internal processes supporting systems

TDTWG completed a detailed review of system outages, including: date, length of outage, system affected, description of outage, and the action necessary to resolve the outage, including what is necessary to ensure the outage does not happen again.

This information is contained in the “Outage Appendix” of the SCR.
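The review items listed above can be pictured as a simple record. The following is a minimal sketch in Python, not the actual format of the Outage Appendix: the class name, field names, helper function, and sample values are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class OutageRecord:
    """One reviewed outage; field names are hypothetical and mirror the review items."""
    outage_date: date
    length: timedelta
    system_affected: str
    description: str
    resolution: str   # action necessary to resolve the outage
    prevention: str   # what is necessary to keep the outage from recurring


def downtime_by_system(records):
    """Sum outage length per affected system."""
    totals = {}
    for rec in records:
        previous = totals.get(rec.system_affected, timedelta())
        totals[rec.system_affected] = previous + rec.length
    return totals


# Placeholder values only; none of these come from the Outage Appendix.
sample = [
    OutageRecord(date(2000, 1, 1), timedelta(hours=2), "System A",
                 "placeholder description", "placeholder action", "placeholder follow-up"),
    OutageRecord(date(2000, 1, 8), timedelta(minutes=30), "System A",
                 "placeholder description", "placeholder action", "placeholder follow-up"),
    OutageRecord(date(2000, 1, 9), timedelta(hours=1), "System B",
                 "placeholder description", "placeholder action", "placeholder follow-up"),
]

totals = downtime_by_system(sample)
```

Grouping records by affected system in this way is one possible means of seeing which systems account for the most downtime.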

Page 5

Existing NAESB

The ERCOT Network provides Internet redundancy in the form of dual ISP connections to disparate providers, in addition to high-speed metro links between sites.

The ERCOT Network provides high availability in the form of redundant firewalls, switches, and routers.

NAESB is currently available only in Taylor, with one sender and one receiver. However, test systems are available in Austin with the capability of back-end connectivity to Taylor.

A single server failure can take NAESB offline.

Network maintenance can render NAESB unavailable.

[Diagram: existing NAESB architecture at Taylor, TX. Labels: Market Participant, Internet, Perimeter Router, Internet Switches, Internet Firewall, DMZ Switches, NAESB Sender, NAESB Receiver, Production Firewall, Inside Switch, To Austin (10 Mb Ethernet)]

Page 6

While ERCOT should complete a full evaluation of systems and processes, TDTWG agrees with the following as the recommended enhancement for NAESB Reliability.

TDTWG would like ERCOT to take this recommendation into consideration while developing their evaluation of systems.

Install an extra sender and receiver at each site

Load balance these new servers behind a redundant set of content switches

A single server failure can no longer take NAESB completely offline

Firewall or Router maintenance will no longer render NAESB unavailable. This will mimic the failover functionality of our other high availability systems.
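The effect of the extra sender and receiver can be illustrated with a small simulation. This is a sketch under assumptions, not ERCOT's design: the Server and LoadBalancer classes, the round-robin policy, and the server names are hypothetical, and the load balancer itself is treated as always available (the recommendation addresses that separately with a redundant set of content switches).

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


class LoadBalancer:
    """Round-robin over healthy servers (hypothetical policy for illustration)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index of the server to try first on the next request

    def route(self):
        """Return the name of the server that takes the request, or None if all are down."""
        count = len(self.servers)
        for offset in range(count):
            index = (self._next + offset) % count
            if self.servers[index].healthy:
                self._next = (index + 1) % count
                return self.servers[index].name
        return None  # no healthy server: the service is offline


# Existing layout: a single receiver, so one failure takes the service offline.
existing = LoadBalancer([Server("receiver-1")])
existing.servers[0].healthy = False
existing_result = existing.route()

# Recommended layout: two receivers, so one failure is absorbed by the other.
proposed = LoadBalancer([Server("receiver-1"), Server("receiver-2")])
proposed.servers[0].healthy = False
proposed_result = proposed.route()
```

In this sketch the single-receiver pool returns None once its only server fails, while the two-receiver pool keeps routing to the surviving server.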

[Diagram: Recommendation to improve NAESB Reliability, Taylor, TX. Labels: Market Participant, Internet, Perimeter Router, Inside Switch (x2), Production Firewall (x2), DMZ Switches, Load Balancer, NAESB Sender, NAESB Receiver, To Austin (10 Mb Ethernet). Annotations: "Add redundant set of load balancers"; "Add additional NAESB sender and receiver"]

Page 7

Next Steps

• SCR process to begin
• TDTWG will follow the SCR through the process
• ERCOT will complete their Evaluation and respond to the Market
• Market workshops may be held throughout the process as needed

Page 8

Timeline and Process for SCR

• SCR to be posted by May 20
• SCR to be reviewed by the Market for 21 days
• Following the 21-day comment period, ERCOT Market Rules will send all comments received for the SCR to the RMS listserv
• SCR and comments will be presented at the June 15 RMS meeting
• RMS may vote to move forward with the SCR at the June 15 RMS meeting
• Following the RMS vote, ERCOT will begin an impact analysis
• Impact analysis will be presented at the July 13 RMS meeting – RMS may vote to approve the SCR, which will include the impact analysis
• Market workshop may be called to review
• At that point, if approved, ERCOT will update the SCR with the RMS recommendation and send it to PRS for prioritization
• August 4, TAC considers for approval
• Following August 4, the TAC recommendation is posted by ERCOT and a 30-day period for review and comments begins
• October ERCOT Board meeting, Board to consider
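The two fixed-length periods in this timeline can be checked with simple date arithmetic. A minimal sketch in Python follows. The year is an assumption (the slides give none), the count is assumed to start on the posting date, and the variable names are illustrative.

```python
from datetime import date, timedelta

ASSUMED_YEAR = 2005  # assumption: the slides do not state a year

# 21-day Market comment period, counted from the latest posting date.
scr_posted_by = date(ASSUMED_YEAR, 5, 20)
comment_period_ends = scr_posted_by + timedelta(days=21)

# Days left between the end of the comment period and the June RMS meeting.
june_rms_meeting = date(ASSUMED_YEAR, 6, 15)
days_before_meeting = (june_rms_meeting - comment_period_ends).days

# The 30-day review starts when ERCOT posts the TAC recommendation, which
# happens some time after August 4, so this is only the earliest possible end.
tac_consideration = date(ASSUMED_YEAR, 8, 4)
earliest_review_end = tac_consideration + timedelta(days=30)

print(comment_period_ends)   # 2005-06-10
print(days_before_meeting)   # 5
print(earliest_review_end)   # 2005-09-03
```

Under these assumptions the comment period closes on June 10, five days before the June 15 RMS meeting, and the 30-day review cannot end before September 3. The month and day results do not depend on the assumed year.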

Page 9

Section 21, System Change Requests Timeline for the SCR

[Timeline diagram. Labels: ERCOT Posts SCR by 5/20; 21 Day Comment Period; June 15th RMS Consideration (1st Consideration); May 15th; 21 Day Comment Period; August 4th TAC Consideration; ERCOT Posts SCR Rec; ERCOT Posts TAC Rec; ERCOT Posts BOD Decision; 25-Day IA Period; July 13th RMS Consideration (2nd Consideration); ERCOT Updates RMS Rec, send to PRS for priority; October BOD Consideration; 30-Day IA Period]

Page 10

Questions?