3 risk reliability and availability 2009

1 University Of Western Australia Subsea Technology module OENA8589 RISK, RELIABILITY AND AVAILABILITY Kevin Mullen Risk


Page 1: 3 Risk Reliability and Availability 2009


University Of Western Australia

Subsea Technology module OENA8589

RISK, RELIABILITY AND AVAILABILITY

Kevin Mullen

Risk


What is Risk?

• “The chance of something happening that will have an impact on the objective”

• Frequency x Consequence

• “Expected value of an unwanted outcome measured in dollars”


– Injury or death of personnel

– Damage or destruction of the environment

– Excessive production costs

– Reduction or loss of production

– Project delays
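As a rough illustration of the "Frequency x Consequence" definition, the sketch below computes an expected annual loss in dollars. The event names, frequencies and dollar figures are invented for the example and do not come from the lecture.

```python
# Risk as expected loss: frequency (events per year) x consequence ($ per event).
# All figures below are illustrative assumptions, not data from the slides.

def annual_risk(frequency_per_year: float, consequence_dollars: float) -> float:
    """Expected annual loss in dollars for a single unwanted event."""
    return frequency_per_year * consequence_dollars

# Hypothetical hazards: (frequency per year, cost per event in dollars)
hazards = {
    "valve leak": (0.1, 2_000_000),
    "umbilical failure": (0.02, 50_000_000),
}

for name, (freq, cost) in hazards.items():
    print(f"{name}: ${annual_risk(freq, cost):,.0f} per year")

total = sum(annual_risk(freq, cost) for freq, cost in hazards.values())
print(f"total: ${total:,.0f} per year")
```

Note that a rare, expensive event can dominate the total even when a frequent, cheaper event looks more visible day to day.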


Typical Risk Matrix

Rows: Consequence. Columns: Likelihood.

                    Never heard of   Has occurred   Has occurred   Occurs often   Occurs often
Consequence         in industry      in industry    in company     in company     at site
No injury           LOW              LOW            LOW            LOW            LOW
Slight injury       LOW              LOW            MED            MED            MED
Minor injury        LOW              MED            MED            HIGH           HIGH
Major injury        MED              MED            HIGH           HIGH           VERY HIGH
Fatality            MED              HIGH           HIGH           VERY HIGH      VERY HIGH
Multiple fatality   HIGH             HIGH           VERY HIGH      VERY HIGH      VERY HIGH

VERY HIGH   Rectify immediately
HIGH        Rectify with urgency, unless clearly impracticable
MED         Reduce risk as far as practicable
LOW         Accept, but manage through competency and awareness
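The typical risk matrix above is, in effect, a lookup table. A minimal sketch of that lookup follows; the dictionary layout and the function name `rank` are choices made for this example, while the cell values and actions are taken from the matrix.

```python
# Lookup for the typical risk matrix: rows are consequences, columns are likelihoods.
LIKELIHOODS = [
    "never heard of in industry",
    "has occurred in industry",
    "has occurred in company",
    "occurs often in company",
    "occurs often at site",
]

MATRIX = {
    "no injury":         ["LOW", "LOW", "LOW", "LOW", "LOW"],
    "slight injury":     ["LOW", "LOW", "MED", "MED", "MED"],
    "minor injury":      ["LOW", "MED", "MED", "HIGH", "HIGH"],
    "major injury":      ["MED", "MED", "HIGH", "HIGH", "VERY HIGH"],
    "fatality":          ["MED", "HIGH", "HIGH", "VERY HIGH", "VERY HIGH"],
    "multiple fatality": ["HIGH", "HIGH", "VERY HIGH", "VERY HIGH", "VERY HIGH"],
}

ACTIONS = {
    "VERY HIGH": "Rectify immediately",
    "HIGH": "Rectify with urgency, unless clearly impracticable",
    "MED": "Reduce risk as far as practicable",
    "LOW": "Accept, but manage through competency and awareness",
}

def rank(consequence: str, likelihood: str) -> tuple[str, str]:
    """Return (risk level, required action) for one matrix cell."""
    level = MATRIX[consequence][LIKELIHOODS.index(likelihood)]
    return level, ACTIONS[level]

print(rank("major injury", "has occurred in company"))
```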


Appendix A - Enterprise-Wide Risk Ranking Matrix

Primary driver: Occupational Health and Safety driven; Safety Management System driven; Enterprise Risk Management driven

Risk ranking by likelihood of occurrence (rows) and severity of consequences (columns):

Likelihood       Incidental (5)   Minor (4)   Serious (3)   Major (2)   Threat to Enterprise (1)
Remote (5)       6                6           6             5           4
Unlikely (4)     6                6           6             4           4
Seldom (3)       6                5           4             3           3
Occasional (2)   6                4           3             2           2
Frequent (1)     5                3           2             2           1

Likelihood of occurrence

• Remote (5): Highly unlikely, although statistics show that a similar event has happened. Statistical probability: P < 10^-6

• Unlikely (4): Given current practices and procedures, this incident is not likely to occur at this facility. Statistical probability: 10^-4 > P > 10^-6

• Seldom (3): Incident has occurred at a similar facility and may reasonably occur at this facility. Statistical probability: 10^-3 > P > 10^-4

• Occasional (2): Incident may occur at this facility some time during its lifetime. Statistical probability: 10^-2 > P > 10^-3

• Frequent (1): Incident is very likely to occur at this facility, possibly several times during its lifetime. Statistical probability: P > 10^-2

Severity of consequences

Incidental (5)

• PERSONNEL - Minor or no injury, no lost time.
• COMMUNITY - No injury, hazard, or annoyance to the public.
• ENVIRONMENTAL - Environmentally recordable event with no Agency notification or Permit violation.
• FACILITY - Minimal equipment damage at an estimated cost less than $100,000; negligible downtime.

Minor (4)

• PERSONNEL - Single injury, not severe, possible lost time.
• COMMUNITY - Odor or noise complaint from the public.
• ENVIRONMENTAL - Release which results in Agency notification or Permit violation.
• FACILITY - Some equipment damage at an estimated cost greater than $100,000 but less than $1,000,000; 1 to 10 days of downtime.

Serious (3)

• PERSONNEL - One or more severe injuries, including permanently disabling injuries.
• COMMUNITY - One or more minor injuries.
• ENVIRONMENTAL - Significant release with serious off-site impact.
• FACILITY - Damage to process area(s) at an estimated cost greater than $1,000,000 but less than $10,000,000; 10 to 90 days of downtime.

Major (2)

• PERSONNEL - One or several fatalities, limited to immediate area of incident.
• COMMUNITY - One or more severe injuries.
• ENVIRONMENTAL - Significant release with serious off-site impact and more likely than not to cause immediate or long-term health effects.
• FACILITY - Damage to installation(s) estimated at a cost greater than $10,000,000 but less than $100,000,000; downtime in excess of 90 days.

Threat to Enterprise (Catastrophic) (1)

• PERSONNEL - Multiple (five or more) fatalities.
• COMMUNITY - Widespread impact to nearby communities.
• ENVIRONMENTAL - Long term environmental impact, and/or adverse, worldwide publicity.
• FACILITY - Total destruction to installation(s) estimated at a cost greater than $100,000,000; extended facility shutdown, and/or potential for permanent closure. For floating production systems, loss of floating structure.

Risk Assessments

1. Identifying what could go wrong

2. Estimating the likelihood of these events occurring

3. Examining the possible consequences of these events

4. Deciding which risks are tolerable and which aren’t

5. Modifying the activity so the intolerable risks are reduced or eliminated.

[Figure labels: Risk Analysis; Risk Assessment; Risk Management - changes to design and operational practice; QRA – Quantitative Risk Assessment]


Fatal Accident Rates

Implied Cost of Averting a Fatality (ICAF)

58. In making an assessment of reasonable practicability, there is a need to set criteria on the value of a life or implied cost of averting a statistical fatality (ICAF). HSE’s ‘Reducing Risks Protecting People’ document sets the value of a life at £1,000,000 and by implication therefore the level at which the costs are disproportionate to the benefits gained. In simplistic terms, a measure that costs less than £1,000,000 and saves a life over the lifetime of an installation is reasonably practicable, while one that costs significantly more than £1,000,000, is disproportionate and therefore is not justified. However case law indicates that costs should be grossly disproportionate and therefore costs in excess of this figure (usually multiples) are used in the offshore industry. In reality of course there is no simple cut-off and a whole range of factors, including uncertainty need to be taken account of in the decision making process.

59. In the offshore industry there is a need to take account of the increased focus on societal (or group) risk, i.e. the risk of multiple fatalities in a single event, as a result of society's perceptions of these types of accident. Therefore the offshore industry typically addresses this by using a high proportion factor for the maximum level of sacrifice that can be borne without it being judged ‘grossly disproportionate’; this has the effect of increasing the ICAF value used for decision-making. The typical ICAF value used by the offshore industry is around £6,000,000, i.e. a proportion factor of 6. HSE considers this to be the minimum level for the application of Cost Benefit Analysis (CBA) in the offshore industry.

60. Use of a proportion factor of 6 ensures that any CBA tends towards the conservative end of the spectrum and therefore takes account of the potential for multiple fatalities and uncertainty. Although a proportion factor of 6 tends to be used, there are no agreed standards and it is for each duty holder to apply higher levels if appropriate, for example in very novel designs.

Extract from Assessment Principles for Offshore Safety Cases (APOSC)

Issued March 2006

UK Health and Safety Executive
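The screening logic described in the extract can be sketched as a comparison between the implied cost of averting a fatality and the value of a life multiplied by the proportion factor. The £1,000,000 figure and the factor of 6 come from the extract; the function names and the example measure (cost and fatalities averted) are hypothetical, and as the extract says, a real decision involves no simple cut-off.

```python
# Gross-disproportion screening using the figures quoted in the APOSC extract.
VALUE_OF_LIFE_GBP = 1_000_000  # HSE 'Reducing Risks, Protecting People' figure
PROPORTION_FACTOR = 6          # typical offshore proportion factor per the extract

def icaf(cost_gbp: float, fatalities_averted: float) -> float:
    """Implied cost of averting one statistical fatality."""
    return cost_gbp / fatalities_averted

def is_grossly_disproportionate(cost_gbp: float, fatalities_averted: float,
                                proportion_factor: float = PROPORTION_FACTOR) -> bool:
    """True when the ICAF exceeds value of life x proportion factor."""
    return icaf(cost_gbp, fatalities_averted) > VALUE_OF_LIFE_GBP * proportion_factor

# Hypothetical measure: averts 0.5 statistical fatalities over the installation life.
print(is_grossly_disproportionate(2_000_000, 0.5))   # ICAF = 4M GBP, below 6M GBP
print(is_grossly_disproportionate(10_000_000, 0.5))  # ICAF = 20M GBP, above 6M GBP
```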


Safety Terminology

• Risk Assessment - a subjective evaluation, involving judgment, intuition and experience, where the level of risk is classified in four levels, with associated measures in Fatalities/Person/Year:

– 1) Tolerable Risk - level prepared to accept but will continue to seek reduction. 10^-3 to 10^-5

– 2) Acceptable Risk - level prepared to accept without seeking further reduction. Below 10^-5

– 3) Unacceptable Risk - level prepared to reject for oneself and others. Above 10^-3

– 4) ALARP - As low as reasonably practicable.

• The usual measure of risk at a global level is Fatalities/Person/Year, but for the local view, i.e., for your immediate corporate mission, risk can be viewed as simply the “failure of your product.”

• The usual format for the analysis of Risk Assessment is a “Cost-Benefit” Analysis, lives saved versus monetary costs.
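The risk bands quoted above can be written as a simple classifier. This sketch assumes the single figures mean "below 10^-5" for Acceptable and "above 10^-3" for Unacceptable, in fatalities per person per year; the treatment of values exactly on a boundary is also an assumption.

```python
# Classify individual risk (fatalities per person per year) into the quoted bands.
# Boundary handling is an assumption made for this example.
UNACCEPTABLE_ABOVE = 1e-3
ACCEPTABLE_BELOW = 1e-5

def classify_individual_risk(fatalities_per_person_year: float) -> str:
    if fatalities_per_person_year > UNACCEPTABLE_ABOVE:
        return "unacceptable"
    if fatalities_per_person_year < ACCEPTABLE_BELOW:
        return "acceptable"
    # Between the two limits: accept, but keep seeking reduction (ALARP).
    return "tolerable"

for risk in (5e-3, 2e-4, 3e-6):
    print(risk, classify_individual_risk(risk))
```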

What is Risk Management?

Risk Management is the effective identification, assessment and control of Risk

• Establish Context and Scope

• Identify the Hazards

• Assess the Risk

– frequency

– consequences

– safeguards

• Rank the Risks

• Eliminate / Minimise the Risk

• Ongoing review and monitoring


How is Risk Managed?

• Useful Tools:

– QRA

– RAM studies

– FMECA

– HAZID / HAZOP

– Audits

• Best implemented during design

• Qualitatively first, then quantitatively

Why is Risk Management needed?

• Legislation / Standards

• Control of Major Hazard Facilities

• Pipeline Acts

• OS&H Regulations 1984

• AS/NZS 4360 Risk Management

• Necessary for business optimisation ($)

• Increase value by:

– minimising loss ($)

– maximising opportunity ($)

• Optimises the performance of the facility

• Reduces probability of becoming:

– Piper Alpha

– Longford

– Exxon Valdez


History of Major Hazards Control

1960’s. Incident shown: Flixborough, UK, 1974 (explosion and fire)

Prescriptive

• Recommendations for design and operation

• (USA) style statutory provisions

• Consideration of the operation of safety procedures

1970’s. Incident shown: Alexander L. Kielland, 1980 (accommodation platform capsize)

The “Safety Report” approach.

• Operator has to describe safety management to the Regulator.

1980’s. Incident shown: Bhopal, India, 1984 (toxic release)

• Concept Safety Evaluations based on Quantified Risk Analysis (QRA) techniques

• Aims to identify and quantify risks and reduce them to an acceptable level

1990’s. Incident shown: Piper Alpha oil platform, 1988 (explosion and fire)

The “Safety Case” approach.

• Operator has to convince Regulator on safety management.

• Companies now responsible for their actions: they must assess and determine the level of risk

2000’s. Incident shown: Bombay High North platform, 2005 (explosion and fire)

Control of Major Hazards

• Safety Integrity Levels (SILs)

Bowtie Diagram

[Figure: bowtie diagram with the critical event at the centre, events leading to the critical event on one side and events following the critical event on the other]

The process of risk analysis: a sequence of events leading to a hazardous situation (critical event), followed by a series of events leading to a variety of possible consequences.


Identify the Control Measures

[Figure: control measures along the chain Hazards, Causes, Incidents, Outcomes. Labels: Proactive Controls; Reactive Controls; Elimination measures; Prevention measures; Reduction measures; Mitigation measures; Prevention of escalation; Emergency Response]

Safety Case

“A documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment”

To implement a safety case we need to:

• make an explicit set of claims about the system

• produce the supporting evidence

• provide a set of safety arguments that link the claims to the evidence

• make clear the assumptions and judgements underlying the arguments

The Safety Case must demonstrate that the control measures are adequate to eliminate, or reduce as far as practicable, the risks associated with Major Incidents

Demonstration is typically achieved through:

• Reference to Codes of Practice, Standards, Guidance, etc.

• Risk assessment (qualitative or quantitative)

The safety case is a “living document” which evolves over the safety life-cycle.


Reliability

RAM DEFINITIONS

• RAM – Reliability, Availability, Maintainability

• Reliability - The ability of an item to perform a required function under stated conditions for a stated period of time (BS4778) –UPTIME

• Failure – The termination of the ability of an item to perform a required function (BS4778) - FAILURE EVENT

• Maintainability - The ability of an item, under stated conditions of use, to be retained in, or restored to, a state in which it can perform its required functions, when maintenance is performed under stated conditions and using prescribed procedures and resources (BS4778) - DOWNTIME

• Availability - The ability of an item (under combined aspects of its reliability, maintainability and maintenance support) to perform a required function at a stated instant of time or over a stated period of time (BS4778) - UPTIME / (UPTIME + DOWNTIME) or MTTF / (MTTF + MTTR)

• Deliverability – The ability of a system to deliver gas to the LNG plant (under combined aspects of availability and capacity) under stated conditions and at a stated instant of time or over a stated period of time – (AVAILABILITY * CAPACITY)


Reliability: Key Design Requirement

• Reliability is as fundamental a design requirement as function and performance

• For every Functional requirement a Reliability requirement can (in principle) be specified

– Function: Seal A must not leak

– Reliability: P(seal A does not leak) > 0.99

• For every Performance requirement a Reliability requirement can (in principle) be specified

– Function: Valve must close in less than 10 seconds

– Reliability: P(time to close < 10) > 0.99

Failure Characteristics

• Different components fail in different patterns

– Flow components, chokes & valves - wear out

– Mechanical components, wellheads – long life

– Electronic components - fail early or last a long time

– Pressure containment, pipes – system fails pressure test, or long life

– Environmental influences, CO2, H2S, chlorides, over-protective CP and H2 build-up – corrode progressively or induce rapid cracking failures

• These give rise to various failure distributions: Normal, Exponential, Weibull, etc.

• Simple prediction uses the Exponential distribution, R(t) = e ^ (-t/MTTF), as the approximation for constant failure rates

• Complex Simulation programs use distributions matched to components


Factors influencing failure rate

In general the failure rate of a component or element depends on four main factors:

(a) Quality

(b) Temperature

(c) Environment

(d) Stress

These factors are influenced by:

• the design process

• manufacture

• the way the system is operated

Probabilistic Design

Probability Distribution Function of Load and Resistance


Stress and Strength

Overlapping of stress and strength distributions
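The overlap between the stress and strength distributions is the region where stress can exceed strength, i.e. where failure occurs. As a minimal sketch, assuming both quantities are independent and normally distributed (an assumption made here, since the slides do not state the distributions), the failure probability has a closed form. The numerical values are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def failure_probability(mean_strength: float, sd_strength: float,
                        mean_stress: float, sd_stress: float) -> float:
    """P(stress > strength) for independent, normally distributed stress and strength."""
    margin_mean = mean_strength - mean_stress
    margin_sd = math.hypot(sd_strength, sd_stress)  # sqrt(sd1^2 + sd2^2)
    return normal_cdf(-margin_mean / margin_sd)

# Hypothetical values in consistent units (e.g. MPa): safety margin of 3 standard deviations.
p_fail = failure_probability(mean_strength=500.0, sd_strength=30.0,
                             mean_stress=350.0, sd_stress=40.0)
print(f"P(failure) = {p_fail:.5f}")
```

Widening either distribution, or moving the means closer together, enlarges the overlap and raises the failure probability.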

Failure Rate and Mean Time To Failure

Example: Constant Failure Rate

• Set h(t) = λ, a constant failure rate. Integrate to find the reliability R(t)

• R(t) = exp (-λ t),

This is often used in reliability analysis of systems.

Mean Time To Failure (MTTF) - average time a device or system will operate, without repair, before failure. From the Expected Value Theorem:

• E(x) = ∫ x f(x) dx

Integrating by parts, it follows that the MTTF can be determined as (integrals taken from 0 to ∞):

• MTTF = ∫ t f(t) dt = ∫ R(t) dt

For the special case of a constant failure rate:

• MTTF = 1 / λ
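The constant-failure-rate result can be checked numerically by integrating R(t) = exp(-λ t) and comparing with 1 / λ. The sketch below uses a plain trapezoidal rule; the failure rate, the truncation point of the integral and the step count are arbitrary choices for the example.

```python
import math

def reliability(t: float, lam: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-lam * t)

def mttf_numeric(lam: float, t_max_multiple: float = 50.0, steps: int = 200_000) -> float:
    """Trapezoidal integral of R(t) from 0 to t_max_multiple / lambda."""
    t_max = t_max_multiple / lam
    dt = t_max / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(t_max, lam))
    for i in range(1, steps):
        total += reliability(i * dt, lam)
    return total * dt

lam = 0.02  # failures per year (assumed value)
print(f"numeric MTTF = {mttf_numeric(lam):.4f} years, 1/lambda = {1 / lam:.4f} years")
```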


Availability

Availability Improvement

• Availability = MTTF / (MTTF+MTTR)

• It is expressed as a fixed ratio, NOT time dependent

• Availability can be improved in 2 ways:

– Extend failure free operating period (reliability)

– Reduce time to restore system (maintainability)

• Subsea time to repair must include: Detection, Location, Analysis of repair, Spares / repair kit, Qualification, Mobilisation, Deployment, Repair execution, Commissioning.

• Increased value in driving for Reliability rather than Maintainability to achieve Availability


Reliability & Repair Data

Reliability / Availability of Repairable Items

Assessment Period (t): 30 years

Columns: MTTF (years); failure rate λ (years^-1); quantity of items (No.); reliability over period, Re = exp(-λt); unreliability over period, 1-Re; MTTR (days); repair rate μ (years^-1); availability proportion, A = μ / (λ + μ); unavailability proportion, 1-A.

Hydraulic System Elements

Item  Repairable item                                  MTTF    λ       Qty  Re       1-Re    MTTR  μ       A         1-A
1     Production Piping                                10000   0.0001  1    0.99700  0.0030  100   3.650   0.999973  0.000027
2     Test / Vent Piping                               5000    0.0002  1    0.99402  0.0060  100   3.650   0.999945  0.000055
3     10 inch 10 kpsi gate valve, Isolation function   1000    0.0010  1    0.97045  0.0296  70    5.214   0.999808  0.000192
4     10 inch 10 kpsi gate valve, HIPPS function       250     0.0040  1    0.88692  0.1131  20    18.250  0.999781  0.000219
5     1/2" Test Valve                                  250     0.0040  1    0.88692  0.1131  20    18.250  0.999781  0.000219
6     1/2" Vent Valve                                  250     0.0040  1    0.88692  0.1131  20    18.250  0.999781  0.000219
7     PZT Sensor                                       50      0.0200  1    0.54881  0.4512  20    18.250  0.998905  0.001095
8     HIPPS Hydraulic Module                           210     0.0048  1    0.86688  0.1331  20    18.250  0.999739  0.000261
9     Check valve                                      500     0.0020  1    0.94176  0.0582  20    18.250  0.999890  0.000110
10    HIPPS SEM                                        42      0.0238  1    0.48954  0.5105  20    18.250  0.998697  0.001303
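The table rows can be reproduced from just the MTTF and MTTR of each item. The sketch below applies the column formulas (failure rate = 1 / MTTF, reliability = exp(-rate x t), repair rate = 365 / MTTR in days, availability = repair rate / (failure rate + repair rate)); the function name and the dictionary keys are choices made for this example.

```python
import math

ASSESSMENT_PERIOD_YEARS = 30
DAYS_PER_YEAR = 365

def repairable_item(mttf_years: float, mttr_days: float,
                    t_years: float = ASSESSMENT_PERIOD_YEARS) -> dict:
    """Reliability over the period and steady-state availability of one item."""
    failure_rate = 1.0 / mttf_years                 # per year
    repair_rate = DAYS_PER_YEAR / mttr_days         # per year
    reliability = math.exp(-failure_rate * t_years)
    availability = repair_rate / (failure_rate + repair_rate)
    return {
        "failure_rate": failure_rate,
        "reliability": reliability,
        "unreliability": 1.0 - reliability,
        "repair_rate": repair_rate,
        "availability": availability,
        "unavailability": 1.0 - availability,
    }

# PZT Sensor row: MTTF 50 years, MTTR 20 days.
pzt = repairable_item(mttf_years=50, mttr_days=20)
print({key: round(value, 6) for key, value in pzt.items()})
```

The example shows why the two measures answer different questions: the sensor has only about a 55% chance of surviving 30 years without failure, yet it is available about 99.9% of the time because each repair is short relative to the MTTF.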

Types of Redundancy

• Classified on how the redundant elements are introduced into the circuit

• Active or Static Redundancy

– External components are not required to perform the function of detection, decision and switching when an element or path in the structure fails.

• Standby or Dynamic Redundancy

– External elements are required to detect, make a decision and switch to another element or path as a replacement for a failed element or path.

• Generally subsea systems (e.g. umbilicals, the MCS) use active redundancy – hot standby

• As an alternative to redundancy, consider Diversity

– using alternative arrangements of a different kind

– e.g. the Back-Up Intervention Control system (BUICS) available on Snohvit, in case the umbilical fails


Simple Parallel Redundancy (Active - Type 1)

In its simplest form, redundancy consists of a simple parallel combination of elements. If any element fails open, identical paths exist through parallel redundant elements.

Bimodal Parallel Redundancy (Active - Type 3)

A series connection of parallel redundant elements provides protection against shorts and opens. A direct short across the network due to a single element shorting is prevented by a redundant element in series. An open across the network is prevented by the parallel element. Network (a) is useful when the primary element failure mode is open. Network (b) is useful when the primary element failure mode is short.

(a) Bimodal Parallel/Series Redundancy

(b) Bimodal Series/Parallel Redundancy


Series and Parallel Availability Calculations

SAP: Series - Availability - Product

• Element 1: Av 90.000%, UnAv 10.000%

• Element 2: Av 80.000%, UnAv 20.000%

• Series result: Availability 72.000%, UnAvail 28.000%

PUP: Parallel - Unavailability - Product

• Element A: Re 90.000%, UnRe 10.000%, MTTF 4.5 years, MTTR 0.5 years

• Element B: Re 90.000%, UnRe 10.000%, MTTF 4.5 years, MTTR 0.5 years

• Parallel (OR) result: Re 99.000%, UnRe 1.000%

[Figure labels: Umbilical; Subsea; SCM A; SCM B]
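The two rules in this figure (series: multiply availabilities; parallel: multiply unavailabilities) can be sketched as follows, using the percentages shown. The function names are choices made for this example, and the parallel rule assumes the redundant elements fail independently.

```python
from math import prod

def series_availability(availabilities: list[float]) -> float:
    """All elements must work: the product of the availabilities."""
    return prod(availabilities)

def parallel_availability(availabilities: list[float]) -> float:
    """Any one element suffices: 1 minus the product of the unavailabilities.
    Assumes independent failures."""
    return 1.0 - prod(1.0 - a for a in availabilities)

def steady_state_availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR), both in the same time unit."""
    return mttf / (mttf + mttr)

print(series_availability([0.90, 0.80]))          # two elements in series
element = steady_state_availability(4.5, 0.5)      # MTTF 4.5 years, MTTR 0.5 years
print(parallel_availability([element, element]))  # two redundant elements
```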

Maintainability


Maintainability

• Philosophy - preventative, corrective, opportunistic

• Actions to demonstrate function is in good condition

– In service monitoring, testing and footprinting

– Corrosion monitoring

– Noise / vibration monitoring

– Fluid monitoring, sand detection, SRBs, chlorides, scale

• Repair planning and contingencies, pipeline repair systems, spares stock holding, stand-by or call-off intervention contracts, alternative temporary systems

• Access systems and tooling

• All aim to reduce MTTR

• Reliability Centred Maintenance

• Historic records, Trends, Predictive capability & feed back loops

Maintenance Philosophy

• Subsea → Excess Capacity (typical)

• Subsea → High Redundancy (typical)

– spare wells

– valves

– spare control systems

• Mobilise maintenance when…?


Maintaining the Gorgon Field

Deliverability


Deliverability

• Deliverability = Availability * Capacity

• Useful terms

– DCQ, Daily Contract Quantity

– Shortfall, Quantity not supplied

• Security of supply

• Contract shape and style

• Business Risk and Exposure

• Best Programs focus on the issue

• Used to understand, Quantify risk & contract accordingly

• Shapes contract terms DCQ to rolling 24 hour average quantity

• “It’s about the money, stupid”
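A minimal sketch of the deliverability relation and of a shortfall against the Daily Contract Quantity follows. The availability, capacity and DCQ figures are hypothetical, and treating deliverability as an average daily quantity that is compared directly with the DCQ is a simplification made for the example.

```python
def deliverability(availability: float, capacity: float) -> float:
    """Deliverability = Availability * Capacity (capacity in any consistent unit)."""
    return availability * capacity

def shortfall(dcq: float, delivered: float) -> float:
    """Quantity not supplied against the Daily Contract Quantity (never negative)."""
    return max(0.0, dcq - delivered)

# Hypothetical system: 95% available, capacity 1000 units per day.
average_delivery = deliverability(0.95, 1000.0)
print(average_delivery)
print(shortfall(dcq=900.0, delivered=average_delivery))  # excess capacity covers the DCQ
print(shortfall(dcq=980.0, delivered=average_delivery))  # DCQ set too close to capacity
```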

Deliverability

• How to get high deliverability

– System analysis & engineering

– Understanding frequency & duration of failures

– Standard sizes and component rating at no extra cost

– De-bottlenecking & tuning capacity of system

– Line pack and storage

– Ability of downstream to respond to peak turn-up rates

– Capacity and ullage as pressure drops due to well failure

– Temporary increase of flow velocity / erosion limits wrt life

– N out of M philosophy and sparing insurance

• Operability studies & modelling

• Supply chain models based on “Just In Time” logistics

• Define the value of reliability (Re), availability (Av) and deliverability (De) in relation to the project


Safety Integrity Levels

What is a Safety Integrity Level?

Safety Integrity Level   Low demand mode of operation (average probability of failure to perform its design function on demand)
4                        ≥ 10^-5 to < 10^-4
3                        ≥ 10^-4 to < 10^-3
2                        ≥ 10^-3 to < 10^-2
1                        ≥ 10^-2 to < 10^-1

Safety Integrity Level   High demand or continuous mode of operation (probability of a dangerous failure per annum)
4                        ≥ 10^-5 to < 10^-4
3                        ≥ 10^-4 to < 10^-3
2                        ≥ 10^-3 to < 10^-2
1                        ≥ 10^-2 to < 10^-1

Safety Integrity Level is the required “reliability” of a safety function
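The low demand bands can be turned into a small lookup that maps an average probability of failure on demand to a SIL. The band limits are those tabulated above; the function name and the choice to return `None` outside the tabulated range are assumptions made for this example.

```python
# (SIL, lower limit inclusive, upper limit exclusive) for low demand mode.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]

def sil_from_pfd(pfd_avg: float):
    """Return the SIL whose band contains pfd_avg, or None if outside all bands."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None

for pfd in (1.1e-2, 5.5e-3, 1.2e-4, 0.5):
    print(pfd, sil_from_pfd(pfd))
```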


PFD

• Risk reduction requiring a SIL 4 function should not be implemented. Rather, this should prompt a redistribution of required risk reduction across other measures.

Classic HIPPS Configuration


SIL 3 HIPPS example

Risk Reduction

[Figure: risk reduction scale for the SIL 3 HIPPS example, with risk increasing along the axis. Labels: Initial risk of high pressure getting past the tree production choke (Pressure Regulating System); Tolerable risk; Residual risk; Necessary risk reduction; Actual risk reduction. Values marked: 10^0 (once per annum); 10^-5 per annum (acceptable failure rate per DNV); 1.87 x 10^-6 per annum]


Risk Reduction

Layers of Protection

Pressure Protection System for Pipeline

[Figure: risk reduction scale for the pipeline pressure protection system, with risk increasing along the axis. Labels: Initial risk of hydrate blockage, and overpressuring the pipeline; Tolerable risk; Residual risk; Necessary risk reduction; Actual risk reduction; Partial risk covered by other systems, e.g. manual shutdown, Pipeline Simulator etc.; Risk reduction by Pressure Regulating System: SIL 2; Risk reduction by Pressure Safety System: SIL 3; Risk reduction achieved by all safety-related systems and external risk reduction facilities. Values marked: 10^0 (once per annum); 10^-5 (acceptable failure rate per DNV)]


Equipment Failure Rates

Equipment PFDs


PFD as a function of Test Interval

[Figure: Probability of Failure on Demand plotted against time, with the test interval τ_i, PFDavg and the TIF (Test Independent Failure) level marked]

PFDAVG = ½ λ τ_i

PFD for a simple system

Proof Test = 1 yr

For the Pressure Transmitter,

PFDSE = 0.44 x 10^-3

For the logic solving element,

PFDLS = 7.0 x 10^-3

For the final element,

PFDFE = 3.5 x 10^-3

Therefore, for the safety function,

PFDAVG = 0.44 x 10^-3 + 7.0 x 10^-3 + 3.5 x 10^-3 = 1.1 x 10^-2

≡ Safety Integrity Level 1

Change proof test interval to 6 months

PFDSE = 0.22 x 10^-3

PFDLS = 3.5 x 10^-3

PFDFE = 1.75 x 10^-3

PFDAVG = 5.5 x 10^-3

≡ Safety Integrity Level 2
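The effect of the proof test interval can be reproduced with the relation PFDavg = ½ λ τ applied to each element and summed over the safety function. The slide gives element PFDs rather than failure rates, so the rates below are back-calculated from the 1 year figures (λ = 2 x PFD / τ); that back-calculation, and the names used, are assumptions made for this example. Summing element PFDs is the usual approximation for small values.

```python
def pfd_avg(failure_rate_per_year: float, test_interval_years: float) -> float:
    """PFDavg = 1/2 * lambda * tau for one periodically proof-tested element."""
    return 0.5 * failure_rate_per_year * test_interval_years

# Failure rates (per year) back-calculated from the 1 year PFDs on the slide.
FAILURE_RATES = {
    "pressure transmitter": 0.88e-3,  # gives PFD 0.44e-3 at 1 year
    "logic solver": 14.0e-3,          # gives PFD 7.0e-3 at 1 year
    "final element": 7.0e-3,          # gives PFD 3.5e-3 at 1 year
}

def function_pfd(test_interval_years: float) -> float:
    """Sum of element PFDs for the whole safety function."""
    return sum(pfd_avg(rate, test_interval_years) for rate in FAILURE_RATES.values())

annual = function_pfd(1.0)
six_monthly = function_pfd(0.5)
print(f"annual test:      PFDavg = {annual:.2e}")
print(f"six-monthly test: PFDavg = {six_monthly:.2e}")
```

Halving the test interval halves each element's PFD, which is enough here to move the function from the SIL 1 band into the SIL 2 band.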


Layered Protection System

[Figure labels: Subsea Control Module; Subsea Electronics Module; Gas Plant DCS; PPS card; Dump Valve]

Single layer

PFDAVG = 1.1 x 10^-2 ≡ Safety Integrity Level 1 (annual testing)

Dual layers

PFDAVG = (1.1 x 10^-2) x (1.1 x 10^-2) = 1.2 x 10^-4 (assuming no common mode failure)

≡ “Safety Integrity Level 3” (annual testing)
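The dual layer figure is the product of the two layer PFDs, which holds only if the layers fail independently (no common mode failure, as the slide notes). A minimal sketch, with the function name chosen for this example:

```python
def combined_pfd(layer_pfds: list[float]) -> float:
    """PFD of independent protection layers: all layers must fail on demand."""
    result = 1.0
    for pfd in layer_pfds:
        result *= pfd
    return result

single_layer = 1.1e-2  # annual testing, from the single layer calculation
dual_layer = combined_pfd([single_layer, single_layer])

print(f"dual layer PFDavg = {dual_layer:.2e}")
# SIL 3 band for low demand mode: >= 1e-4 and < 1e-3
print("within SIL 3 band:", 1e-4 <= dual_layer < 1e-3)
```

Any shared cause of failure between the layers would make the real figure worse than this product.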

Conclusion


The cost of failure - BP experience

These are the direct costs only. Foinaven also incurred:

• FPSO demurrage charges

• NPV of production (20% * 80,000 bbl/d * 300 days * 25 USD/bbl): 120 MUSD

• Share value erosion and significantly lower dividends for the period

• Loss of public / shareholder confidence in BP's ability to manage technology

• Reputation damage

• Tangible losses > 250 MUSD; measurable losses at least the same again

• Changed BP contracting philosophy, EPC to EPCM Managed Engineering

• Schiehallion SCMs were run at a single high pressure, but the DCV pilots were not requalified and were subsequently overstressed and leaked.

The BP Bathtub Curve


Value of Performance

An interesting echo from the 1970’s

or SAFETY