SCADA System Failure Impact Study at Lorne Park and Lakeview WTPs. Shawn Xiong - Cole Engineering; Jeff Hennings, Jason Little, Claudio Cuffolo - Region of Peel; Dale Barker, Eric Zhang - OCWA. OWWA Conference 2014, May 5th, London, ON.

SCADA System Failure Impact Study at Lorne Park and Lakeview WTPs

Shawn Xiong - Cole Engineering
Jeff Hennings, Jason Little, Claudio Cuffolo - Region of Peel
Dale Barker, Eric Zhang - OCWA

OWWA Conference 2014, May 5th
London, ON

Presentation Outline

∗ Project Background
∗ Project Objectives
∗ Analysis Approach
∗ Recommendations

Background

Lorne Park WTP and Lakeview WTP

Supply water to South Peel and York Region

Owned by Peel Region and operated by OCWA

Background

Employ advanced treatment processes: membranes, UV, ozone, BACC, GACC, and conventional filters

Advanced SCADA system for automation

Background

SCADA failures have been experienced at both plants

Adverse impact on the plants’ ability to maintain water production

Project Objectives

Maintain three objectives during a SCADA failure event:

∗ Water quality
∗ Water production
∗ Compliance reporting

Project Objectives

∗ Establish critical process areas at each plant that are essential to the plant’s ability to meet water quality and quantity requirements.
∗ Identify critical instruments for MOE compliance reporting.
∗ Review each critical process area to identify operational and data reporting problems that could occur if SCADA were disrupted.

Project Objectives

∗ Recommend operating procedures to follow during a SCADA failure for each of the problems identified, in order to minimize risks to water quality, water quantity, or compliance reporting.
∗ Conduct a network or SCADA failure modes and effect analysis (FMEA).
∗ Document the evaluation and recommendations in an organized format.

Analysis Approach

A systematic approach is required to analyze such a complex system, which spans many subject areas.

[Diagram labels: Causes; Failure; Impact; Design Improvement; Emergency Preparedness Plan]

Analysis Approach

∗ AWWA G440 - Emergency Preparedness Practices
∗ AWWA M19 - Emergency Planning for Water Utilities
∗ AWWA J100 - Risk and Resilience Management of Water and Wastewater Systems

Analysis Approach

Risk Management Framework
∗ Identify Risks
∗ Analyze Risks
∗ Evaluate Risks
∗ Treat Risks

Analysis Approach

Risk Analysis Processes and Tools
∗ RAMCAP - Risk Analysis and Management for Critical Asset Protection
∗ HAZOP - HAZard and OPerability study
∗ FMEA - Failure Modes and Effect Analysis

Analysis Approach

Combination of RAMCAP and FMEA

[Figure: RAMCAP process (from AWWA J100) combined with FMEA]

Analysis Approach

Project Methodology
∗ Step 1 - Asset Characterization (RAMCAP Step 1)
∗ Step 2 - Threat Characterization (RAMCAP Step 2)
∗ Step 3 - FMEA Analysis (Consequence, Risk Analysis and Management, RAMCAP Steps 3-7)

Asset Characterization - Lorne Park

[Process block diagram labels: Low Lift; MUG (Membrane, AOUV, GAC); Filters 9-12 UV; Conventional Filters; Reservoir; High Lift]

Critical Process Areas - Lorne Park

∗ Low Lift Pumping
∗ Plant 1 - MUG - Membranes
∗ Plant 1 - MUG - AOUV System
∗ Plant 1 - MUG - GAC Contactors
∗ Plant 2 - Filters No. 9-12
∗ Plant 2 - Filters No. 9-12 UV
∗ High Lift Pumping
∗ Chemical Systems - Coagulant, Bisulphite for WW supernatant dechlorination, Hypochlorite for disinfection

Compliance Instruments - Lorne Park

[Instrument location diagram. Panels: Membrane/UV/GACC; Conventional Filters 9-12 & Residual Management; Chemical System; High/Low Lift Pump Station; Control Room; Reservoir. Tags are listed in extracted order, with diagram labels in parentheses at their extracted position; the tag-to-panel layout is not recoverable.]

Turbidity analyzers (each labelled TURB): AIT39401, AIT39301, AIT39201, AIT39101, AIT34801, AIT34701, AIT34601, AIT34501, AIT34401, AIT34301, AIT34201, AIT34101, AIT35601, AIT35501, AIT35401, AIT35301, AIT35201, AIT35101, AIT35001, AIT34901

Other analyzers: AIT87501, AIT87502 (TURB, CL2/pH), AIT87504 (FL), AIT42002 (CL2)

Flow transmitters: FIT64421 (COAG), FIT16301, FIT17301, FIT17311, FIT39401, FIT39301, FIT39201, FIT39101, FIT39801, FIT39001 (BW, GACBW), FIT42001 (SUPERNATANT), FIT55101, FIT55201, FIT55301, FIT55401, FIT55501, FIT55601, FIT55701, FIT55801 (GAC FLOW), FIT38801, FIT38701, FIT38601, FIT38501, FIT38401, FIT38301, FIT38201, FIT38101

Level transmitters: LIT87501, LIT87511

Temperature transmitters: TIT87501, TIT87511

PLC & CCP – Lorne Park

SCADA System Failure Modes

Threat Characterization

[Failure mode tree diagram; node labels in extracted order:]
∗ SCADA SYSTEM FAILURE
∗ SCADA NETWORK FAILURE
∗ SCADA SERVER FAILURE
∗ PLC FAILURE
∗ RING SWITCH FAILURE
∗ CABLE FAILURE - FIBRE RING
∗ PANEL SWITCH FAILURE
∗ POWER SUPPLY FAILURE
∗ RACK FAILURE
∗ CPU FAILURE
∗ I/O MODULE FAILURE
∗ BOTH SERVERS FAIL
∗ VIRTUAL MACHINE CRASH
∗ PRIMARY SERVER FAILURE
∗ SECONDARY SERVER FAILURE
∗ CABLE FAILURE - CAT6 CABLE
∗ BUS NETWORK FAILURE
∗ CONTROLNET FAILURE
∗ MODBUS FAILURE
∗ DEVICENET FAILURE
∗ SECURITY BREACH
∗ POWER FAILURE
∗ INSTRUMENT FAILURE

Threat Characterization

Failure Mode: SCADA Network Failure
Failure Cause: Ring Switch Software Failure (RSTP) - unidirectional link, duplex mismatch, software bug, etc.
Detection: SCADA freezing
Impacts: Shutdown of the plant or process areas; loss of compliance reporting; loss of control of the plant
Response/Remediation: Shut down the network and reboot switches one by one
Criticality: High

Failure Mode: PLC Hardware Failure - Major
Failure Cause: Power Supply Failure - Remote Rack
Detection: All lights off at rack
Impacts: Loss of control of some equipment
Response/Remediation: Replace power supply
Criticality: Medium

Failure Mode: PLC Hardware Failure - Major
Failure Cause: Remote Rack Failure
Detection: All lights off at rack
Impacts: Loss of control of some equipment
Response/Remediation: Replace rack
Criticality: Medium

Failure Mode: ControlNet Failure
Failure Cause: ControlNet Module Failure
Detection: OK Status LED steady red, display "FAIL"
Impacts: Loss of control of the affected process area
Response/Remediation: Test, reboot, or replace module
Criticality: High

Failure Mode: ControlNet Failure
Failure Cause: ControlNet Media Failure
Detection: Module indicates "NET ERR"; Network Channel Status A/B red or flashing red
Impacts: Loss of control of the affected process area
Response/Remediation: Reconnect media
Criticality: High

Failure Mode: Instrument Failure
Failure Cause: Compliance Instrument Failure
Detection: Fault alarm; out of range alarm; loss of echo
Impacts: Loss of compliance data
Response/Remediation: Manual sampling and recording
Criticality: High

FMEA Analysis

∗ Select Process/System for Evaluation
∗ Select Team
∗ Diagram system blocks
∗ Identify failure modes
∗ Identify failure causes
∗ Evaluate failure severity
∗ Evaluate failure probability
∗ Evaluate failure detection & rectification
∗ Calculate risk priority number (RPN)
∗ Recommend actions to reduce RPN
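The RPN step above can be illustrated with a minimal sketch. It assumes the common FMEA convention that the RPN is the product of the severity, probability, and detection/rectification scores (each on a 1 to 5 scale in this study); the slides do not state the formula. The class name, function name, failure names, and scores are hypothetical illustrations, not study results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One FMEA row. Names and scores used below are illustrative only."""
    name: str
    severity: int      # 1 (not perceptible) to 5 (very significant)
    probability: int   # 1 (improbable) to 5 (high)
    detection: int     # 1 (very probable detection) to 5 (very small chance)

    def __post_init__(self) -> None:
        # Reject scores outside the 1..5 scales used in the study.
        for label in ("severity", "probability", "detection"):
            score = getattr(self, label)
            if not 1 <= score <= 5:
                raise ValueError(f"{label} must be in 1..5, got {score}")

    @property
    def rpn(self) -> int:
        # Assumed convention: RPN = severity x probability x detection.
        return self.severity * self.probability * self.detection


def rank_by_rpn(modes: list[FailureMode]) -> list[FailureMode]:
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores for illustration only.
modes = [
    FailureMode("Remote rack power supply failure", 3, 3, 2),
    FailureMode("Ring switch software failure", 5, 5, 2),
    FailureMode("Compliance instrument failure", 3, 4, 1),
]
for mode in rank_by_rpn(modes):
    print(f"{mode.name}: RPN = {mode.rpn}")
```

With these made-up scores the ranking is 50, 18, 12, so the ring switch entry would be addressed first. Under the assumed product formula, a recommended action reduces the RPN by lowering any one of the three scores.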

FMEA Analysis

Failure impacts
∗ Water production
∗ Water quality
∗ Compliance reporting
∗ Operational resources

FMEA Analysis

Scoring of Failure Severity

Score | Severity | Rationale
1 | not perceptible | No effect.
2 | small | This failure is not significant. Treatment process can continue with no interruption.
3 | medium | This failure is significant enough to affect the treatment process system operation, compliance, water production, and water quality. Additional resources for plant operation are required.
4 | significant | Inoperability of a significant process in the treatment plant. Significant requirement of additional resources to operate the process in local-manual mode.
5 | very significant | Total inoperability of the treatment plant.

FMEA Analysis

Scoring of Failure Probability (MTBF = mean time between failures)

Score | Probability | Rationale
1 | improbable | MTBF > 25 years
2 | very small | MTBF 10-25 years
3 | small | MTBF 3-10 years
4 | medium | MTBF 1-3 years
5 | high | MTBF < 1 year, or has occurred previously
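As a sketch of how the probability scale above could be applied, the function below maps an estimated MTBF to a score. The function name and parameters are hypothetical. The scale's bands share their end points (3 and 10 years each appear in two bands), so the tie-breaking here is an assumption: a shared end point is assigned to the longer-MTBF band, and exactly 25 years stays at score 2 because score 1 requires MTBF > 25 years.

```python
def probability_score(mtbf_years: float, occurred_previously: bool = False) -> int:
    """Map an estimated MTBF (in years) to the 1-5 failure probability score."""
    if mtbf_years <= 0:
        raise ValueError("mtbf_years must be positive")
    # Score 5: MTBF under one year, or the failure has occurred previously.
    if occurred_previously or mtbf_years < 1:
        return 5
    if mtbf_years < 3:      # 1 to <3 years (assumed tie-break at 3)
        return 4
    if mtbf_years < 10:     # 3 to <10 years (assumed tie-break at 10)
        return 3
    if mtbf_years <= 25:    # 10 to 25 years
        return 2
    return 1                # more than 25 years


print(probability_score(0.5))                             # MTBF under one year
print(probability_score(30, occurred_previously=True))    # prior occurrence dominates
```

The `occurred_previously` flag takes priority over the MTBF estimate, matching the "or has occurred previously" wording of the score 5 row.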

FMEA Analysis

Scoring of Failure Detection and Rectification

Score | Detection/Rectification | Rationale
1 | very probable | Very probable and immediate detection and rectification of failure (< 1 hr)
2 | high | Chance of discovery high, time to detect and rectify failure is short (< 4 hr)
3 | medium | Time to detect and rectify the failure within a day (< 24 hr)
4 | small | Failure may be undetected for extended period of time or it may take days to troubleshoot and rectify the failure (< 1 week)
5 | very small | Very small chance of discovery or it will take a long time to rectify the failure (> 1 week)
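The time limits in the detection and rectification scale lend themselves to a threshold lookup. The sketch below is one possible way to score a failure from its combined detect-and-rectify time in hours; the function and constant names are hypothetical, and treating exactly one week as score 5 is an assumption, since the scale only defines "< 1 week" and "> 1 week".

```python
from bisect import bisect_right

HOURS_PER_WEEK = 7 * 24

# Upper limits (exclusive) in hours for scores 1 to 4; anything beyond is score 5.
_UPPER_LIMITS_HOURS = [1, 4, 24, HOURS_PER_WEEK]


def detection_score(hours_to_detect_and_rectify: float) -> int:
    """Map time to detect and rectify a failure (hours) to the 1-5 score."""
    if hours_to_detect_and_rectify < 0:
        raise ValueError("duration must be non-negative")
    # bisect_right counts how many limits are <= the duration, so a duration
    # exactly on a limit moves to the next (worse) score.
    return bisect_right(_UPPER_LIMITS_HOURS, hours_to_detect_and_rectify) + 1


print(detection_score(0.5))   # under one hour
print(detection_score(36))    # a day and a half
```

Keeping the limits in one list means a change to the scale only touches `_UPPER_LIMITS_HOURS`, not the lookup logic.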

Recommendations

Design Improvement
∗ I/O distribution
∗ Redundant network cabling
∗ Environmental control for network equipment
∗ Local data recording
∗ Local operation of critical process equipment
∗ Signal communication for process control

Recommendations

Operations Improvement
∗ Treatment process response to SCADA system failure
∗ Instrumentation response to SCADA system failure
∗ SCADA system maintenance

Recommendations

Low Lift Pumps for Conventional Filters 9-12

Process Criticality: High

Rationale: Supplies raw water to Plant 2.

SCADA Communication Dependency: None.

Local-Manual Operation: Low Lift Pump Operation - Operate the pumps in Local-Manual mode from the pump Control Gallery or Local Control Station. Start the required number of pumps based on the filters' feed flow requirement. Adjust flow control valves FCV16301/16302 to meet the flow setpoint and to maintain a target level in the Settled Water Conduit.

SCADA Failure Response:
(1) If the SCADA network fails:
    a. Low Lift Pumps for Plant 2 can stay under SCADA control, as no communication between PLC-116 and PLC-398 is required to operate the Low Lift Pumps.
(2) If PLC-116 fails:
    a. Check the feed flow demand for Plant 2 and start the appropriate number of low lift pumps.
    b. Manually adjust the position of flow control valves FCV16301/16302 to achieve the target feed flow demand and to maintain a target level in the Settled Water Conduit.

Additional Operational Resources Requirements: One operator is required to monitor and control the low lift pumps to maintain water supply to Plant 2 when the pumps are operated in Local-Manual mode.
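A procedure sheet like the one above could also be held as a simple lookup so that the response steps for a given failure are retrievable in one place. This is only a sketch: the enum, dictionary, and function names are hypothetical, the step text is paraphrased from the sheet, and the mapping of a PLC-116 failure to Local-Manual operation (and therefore to the one additional operator) is an inference from the sheet rather than something it states outright.

```python
from enum import Enum


class Failure(Enum):
    SCADA_NETWORK = "SCADA network failure"
    PLC_116 = "PLC-116 failure"


# Response steps paraphrased from the low lift pump procedure sheet.
RESPONSES: dict[Failure, list[str]] = {
    Failure.SCADA_NETWORK: [
        "Leave the Low Lift Pumps for Plant 2 under SCADA control.",
    ],
    Failure.PLC_116: [
        "Check the Plant 2 feed flow demand and start the required pumps.",
        "Manually adjust FCV16301/16302 to meet the feed flow demand and "
        "hold the target level in the Settled Water Conduit.",
    ],
}

# Assumption: only a PLC-116 failure forces Local-Manual operation.
LOCAL_MANUAL_FAILURES = {Failure.PLC_116}


def response_plan(failure: Failure) -> dict:
    """Return the steps, operating mode, and extra operators for a failure."""
    local_manual = failure in LOCAL_MANUAL_FAILURES
    return {
        "steps": RESPONSES[failure],
        "mode": "Local-Manual" if local_manual else "SCADA",
        "extra_operators": 1 if local_manual else 0,
    }


plan = response_plan(Failure.PLC_116)
print(plan["mode"], plan["extra_operators"])
```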

Recommendations

Emergency Preparedness Plan (EPP)
∗ Objectives
∗ Emergency Response Team
∗ Communications Chart
∗ Plan Activation
∗ Component Specific Plans (process & instrumentation)
∗ Emergency Response and Recovery
∗ Training, Testing & Drilling

Recommendations

[Flowchart: response to an abnormal event. Boxes and branch labels are listed in extracted order; the arrows between them are not recoverable.]

∗ Abnormal Event
∗ SCADA Problem?
∗ Actual/Probable Adverse Impact to Water Supply?
∗ OCWA designate contacts Peel On-call SCADA Tech
∗ Peel SCADA follows up with OCWA on action taken/required
∗ NO
∗ OCWA ORO/Designate contacts Peel On-call Compliance Rep.
∗ Activate EPP
∗ YES
∗ Determine Failure Impact and Affected Process Areas
∗ OCWA follows appropriate Treatment Process and Instrumentation Plans
∗ OCWA/Peel SCADA troubleshoots the SCADA System
∗ Calls in additional operations resources if required
∗ Communicates with internal/external contacts per Communication Chart
∗ Isolate event causes and apply remedial actions
∗ Treatment Process Recovery
∗ Event Documentation
∗ Engage the Emergency Response Team
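One plausible reading of the flowchart is sketched below as a decision function. The branch layout is an assumption: it takes the NO branch (no adverse impact to water supply) to be the routine contact with the Peel on-call SCADA tech and the YES branch to be EPP activation followed by the remaining boxes, and it places "Engage the Emergency Response Team" right after activation. The function name and action strings are illustrative paraphrases, not the plan's wording.

```python
def scada_event_actions(scada_problem: bool, adverse_impact: bool) -> list[str]:
    """Return ordered actions for an abnormal event (assumed branch layout)."""
    if not scada_problem:
        # Assumption: non-SCADA events are handled outside this flowchart.
        return []
    if not adverse_impact:
        # Assumed NO branch: routine follow-up, EPP is not activated.
        return [
            "OCWA designate contacts Peel on-call SCADA tech",
            "Peel SCADA follows up with OCWA on action taken/required",
        ]
    # Assumed YES branch: activate the EPP and work through the response.
    return [
        "OCWA ORO/designate contacts Peel on-call compliance rep.",
        "Activate EPP",
        "Engage the Emergency Response Team",
        "Determine failure impact and affected process areas",
        "OCWA follows treatment process and instrumentation plans",
        "OCWA/Peel SCADA troubleshoots the SCADA system",
        "Call in additional operations resources if required",
        "Communicate with contacts per Communication Chart",
        "Isolate event causes and apply remedial actions",
        "Treatment process recovery",
        "Event documentation",
    ]


for step in scada_event_actions(scada_problem=True, adverse_impact=True):
    print("-", step)
```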

Acknowledgement

∗ Regional Municipality of Peel
∗ Ontario Clean Water Agency
∗ AECOM
∗ CH2M Hill