1
TDTWG Report to RMS
SCR Addressing ERCOT System Outages
Tuesday, May 10
2
Background
Original design for ERCOT system architecture did not include “high availability” for ERCOT Systems.
ERCOT built its systems to comply with the Protocol timing specific to ERCOT.
Increased criticality of Market process timing and transaction volume has driven the need for ERCOT Systems to be more robust than originally designed and built.
Several Market system changes for ERCOT have been approved, and many have been implemented, but none of them included “high availability” for the ERCOT systems supporting the Retail Market.
ERCOT Unplanned System Outages have become burdensome on Market Participants and have in some cases impacted processes directly related to supporting customers.
3
Background continued…
While it is difficult to determine exactly how, or to what extent, a customer or Market Participant has been impacted by an ERCOT System Outage, it is apparent that these Outages, if allowed to grow in number, may eventually pose a detriment to the Texas Retail Market.
RMS asked TDTWG to review the reasons for the outages and determine whether anything can be done.
TDTWG has completed a review of each outage as well as the activities necessary to restore successful system processing in each event. The product of that work is the SCR being presented at this meeting.
4
TDTWG Approach
The TDTWG approach included reviewing procedures for systems in an attempt to ensure all aspects surrounding an ERCOT Outage were taken into consideration. In order to do this ERCOT IT provided multiple overviews of processes and systems to help TDTWG members be able to take an informed approach prior to proceeding with their analysis.
These include:
• Presentation of existing system architecture
• Understanding of current processing capability/limitations
• Details of transaction process timing
• Overview and discussion of internal processes supporting systems
TDTWG completed a detailed review of system outages, covering for each one: the date, the length of the outage, the system affected, a description of the outage, the action necessary to resolve it, and what is necessary to ensure the outage does not happen again.
This information is contained in the “Outage Appendix” of the SCR.
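The review fields listed above map naturally onto a small record type. The following is a minimal sketch only: the class name, field names, and the helper `total_downtime_by_system` are hypothetical and are not taken from the SCR or its Outage Appendix.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class OutageRecord:
    """One reviewed outage. Field names are illustrative, not from the SCR."""
    outage_date: date
    duration: timedelta      # length of outage
    system_affected: str
    description: str
    resolution: str          # action necessary to resolve the outage
    prevention: str          # what is needed so it does not happen again


def total_downtime_by_system(records):
    """Sum outage durations per affected system."""
    totals = defaultdict(timedelta)  # timedelta() is a zero-length duration
    for record in records:
        totals[record.system_affected] += record.duration
    return dict(totals)
```

Grouping by system in this way would make it easy to see which systems account for most of the downtime, assuming the appendix data were loaded into such records.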
5
Existing NAESB
The ERCOT Network provides Internet redundancy in the form of dual ISP connections to disparate providers in addition to high speed metro links between sites.
The ERCOT Network provides high-availability in the form of redundant firewalls, switches, and routers.
NAESB is currently only available in Taylor. There is only one sender and one receiver. However, test systems are available in Austin with the capability of back end connectivity to Taylor.
A single server failure has the capability to take NAESB offline.
Network maintenance can render NAESB unavailable.
[Diagram: existing NAESB network at Taylor, TX. Components shown: Market Participant, Internet, Perimeter Router, Internet Switches, Internet Firewall, DMZ Switches, NAESB Sender, NAESB Receiver, Production Firewall, Inside Switch, and a link to Austin (10 Mb Ethernet).]
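The single-server exposure described on this slide can be illustrated with a small availability check. This is a sketch under stated assumptions: the sender and receiver counts of one come from the slide, while the function names and the modelling approach are hypothetical. It covers server failures only, not the network-maintenance case.

```python
# Instance counts for the existing Taylor deployment, per the slide:
# one NAESB sender and one NAESB receiver.
EXISTING = {
    "NAESB Sender": 1,
    "NAESB Receiver": 1,
}


def single_points_of_failure(instances):
    """Return components where losing one instance leaves none running."""
    return sorted(name for name, count in instances.items() if count == 1)


def is_online(instances, failed):
    """The service stays up only if every component keeps one healthy instance.

    `failed` maps a component name to the number of failed instances.
    """
    return all(
        count - failed.get(name, 0) >= 1
        for name, count in instances.items()
    )


print(single_points_of_failure(EXISTING))
print(is_online(EXISTING, {"NAESB Sender": 1}))
```

With a count of one for each server, both components are reported as single points of failure, and failing either one takes the modelled service offline.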
6
While ERCOT should complete a full evaluation of systems and processes, TDTWG agrees with the following as the recommended enhancement for NAESB reliability.
TDTWG would like ERCOT to take this recommendation into consideration while developing its evaluation of systems.
Install an extra sender and receiver at each site
Load balance these new servers behind a redundant set of content switches
A single server failure no longer has the capability to completely take NAESB offline
Firewall or Router maintenance will no longer render NAESB unavailable. This will mimic the failover functionality of our other high availability systems.
[Diagram: Recommendation to improve NAESB Reliability. Components shown at Taylor, TX: Market Participant, Internet, Perimeter Router, Inside Switches, DMZ Switches, Production Firewalls, Load Balancer, NAESB Sender, NAESB Receiver, and a link to Austin (10 Mb Ethernet). Annotations: "Add redundant set of load balancers" and "Add additional NAESB sender and receiver".]
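The effect of load balancing a second sender and receiver can be sketched as a round-robin selector that skips unhealthy servers. This is an illustration only: the class, its methods, and the server names are hypothetical, and it does not describe how ERCOT's content switches actually behave.

```python
from itertools import cycle


class LoadBalancer:
    """Round-robin over servers, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or raise if none remain."""
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers")


# Hypothetical server names for two receivers behind the balancer.
receivers = LoadBalancer(["receiver-1", "receiver-2"])
receivers.mark_down("receiver-1")
print(receivers.pick())  # traffic continues on the remaining receiver
```

With two servers behind the balancer, losing one leaves the other taking all traffic; only the loss of both makes `pick` fail.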
7
Next Steps
• SCR process to begin
• TDTWG will follow the SCR through the process
• ERCOT will complete their Evaluation and respond to the Market
• Market workshops may be held throughout the process as needed
8
Timeline and Process for SCR
• SCR to be posted by May 20
• SCR to be reviewed by the Market for 21 days
• Following the 21 day comment period, ERCOT Market rules will send all comments received for the SCR to the RMS listserve
• SCR and comments will be presented at the June 15 RMS meeting
• RMS may vote to move forward with the SCR at the June 15 RMS meeting
• Following the RMS vote, ERCOT will begin an impact analysis
• Impact analysis will be presented at the July 13 RMS meeting – RMS may vote to approve the SCR which will include the impact analysis
• Market workshop may be called to review
• At that point, if approved ERCOT will update the SCR with the RMS recommendation and send to PRS for prioritization
• August 4, TAC considers for approval
• Following August 4, TAC recommendation is posted by ERCOT and a 30 day period for review and comments
• October ERCOT Board meeting, Board to consider
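The date arithmetic behind the timeline can be checked with a short calculation. This is a sketch under assumptions: the slides do not state the year, so it is passed in as a parameter; the function name is hypothetical; and the 30 day review period is assumed to start on August 4 itself, although the actual posting date of the TAC recommendation is not given.

```python
from datetime import date, timedelta


def scr_windows(year):
    """Compute when the comment and review windows would close.

    `year` is an assumption supplied by the caller; the slides give only
    month and day.
    """
    posted_by = date(year, 5, 20)
    comment_close = posted_by + timedelta(days=21)   # 21 day comment period
    tac_consideration = date(year, 8, 4)
    # Assumes the 30 day period starts on the TAC date itself.
    review_close = tac_consideration + timedelta(days=30)
    return {
        "posted_by": posted_by,
        "comment_close": comment_close,
        "tac_consideration": tac_consideration,
        "review_close": review_close,
    }


windows = scr_windows(2005)  # the year here is only an example value
print(windows["comment_close"], windows["review_close"])
```

If the SCR is posted on May 20, the 21 day comment period closes on June 10, ahead of the June 15 RMS meeting; a 30 day period starting August 4 would close on September 3.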
9
Section 21, System Change Requests Timeline for the SCR
[Timeline diagram. Labels shown: May 15th; ERCOT Posts SCR by 5/20; 21 Day Comment Period; June 15th RMS Consideration (1st Consideration); 25-Day IA Period; July 13th RMS Consideration (2nd Consideration); ERCOT Updates RMS Rec; send to PRS for priority; ERCOT Posts SCR Rec; August 4th TAC Consideration; ERCOT Posts TAC Rec; 30-Day IA Period; October BOD Consideration; ERCOT Posts BOD Decision.]
10
Questions?