
Reliability and Availability of the Large Hadron Collider (LHC) Machine Protection System

Jan Uythoven

CERN, Geneva, Switzerland

Thanks to R. Schmidt, B. Goddard, R. Filippini* and the many other colleagues working on the LHC Protection System

*Presently at PSI, Zürich


ITER RAMI Workshop 6-7 December 2007

The Large Hadron Collider (LHC) at CERN - Geneva

The world's largest particle accelerator, with a circumference of 27 km

1232 Superconducting dipole magnets operating at 1.9 K

Operation with beam foreseen for 2008


LHC Layout


LHC Stored Energy

For nominal beam intensity at 7 TeV:

Energy stored in one beam: 360 MJ
Energy stored in the superconducting magnets: 10 GJ

[Figure: stored energy versus momentum [GeV/c] on logarithmic axes, comparing LHC top energy, LHC injection (12 SPS batches), LHC energy in magnets, SPS batch to LHC, SPS fixed target, SPS ppbar, SNS, LEP2, HERA, TEVATRON and RHIC proton; one annotation marks a factor of about 200]

Energy to heat and melt one kg of copper: 700 kJ
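To put the stored energies in perspective, the figures quoted on this slide can be converted into an equivalent mass of copper. The sketch below uses only the three numbers from the slide (360 MJ, 10 GJ, 700 kJ per kg); the constant and function names are illustrative.

```python
# Figures quoted on the slide.
BEAM_ENERGY_J = 360e6      # energy stored in one beam at 7 TeV
MAGNET_ENERGY_J = 10e9     # energy stored in the superconducting magnets
COPPER_J_PER_KG = 700e3    # energy quoted per kg of copper


def copper_equivalent_kg(energy_j: float) -> float:
    """Mass of copper that the given energy corresponds to at 700 kJ/kg."""
    return energy_j / COPPER_J_PER_KG


beam_kg = copper_equivalent_kg(BEAM_ENERGY_J)
magnet_kg = copper_equivalent_kg(MAGNET_ENERGY_J)
print(f"one beam: about {beam_kg:.0f} kg of copper")
print(f"magnets:  about {magnet_kg:.0f} kg of copper")
```

One beam therefore corresponds to roughly half a tonne of copper, and the magnet system to roughly 14 tonnes.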


Quench Protection and Energy Extraction System

When one magnet quenches:
• quench heaters are fired for this magnet
• the current in the quenched magnet decays in about 200 ms
• the current from the other magnets in series flows through the bypass diode, which can withstand the current for about 100-200 seconds

[Figure: circuit with Magnet 1, Magnet 2, ..., Magnet i, ..., Magnet 154 and the Power Converter]


13 kA Energy Extraction in tunnel adjacent to accelerator

Resistors absorbing the energy

Switches - for switching the resistors into series with the magnets


Quench Protection and Energy Extraction System

• 8 separate systems: one for each sector
• Energies per sector similar to the HERA and Tevatron accelerators
• Needs to work very reliably, as the damage potential is huge
• Reliability studies of the system have been done
• ‘Traditional’ technologies
• Limited dependence on other systems

This talk is mainly on protection from the beam energy

PhD thesis A. Vergara: http://documents.cern.ch/cgi-bin/setlink?base=preprint&categ=cern&id=cern-thesis-2004-019


How to protect the machine from the Beam Energy?

Machine Protection System which:
• Detects “any fault” in the machine:
  – Hardware not working properly, despite the fault tolerant design of safety critical systems
  – Effect of failures, including beam instabilities, leading to beam losses
• Safely dumps the beam before it can cause any damage
• Has a fast reaction time: beams to be dumped within 3 turns of detection of a problem = 300 µs

Beam Dump Block: where the beam should go in case of any ‘problems’ detected
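The three-turn reaction time can be cross-checked from the ring circumference. This is a rough sketch that assumes the protons travel at essentially the speed of light and uses the rounded 27 km circumference quoted earlier in the talk; the names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
CIRCUMFERENCE_M = 27_000.0   # rounded value from the talk
TURNS_TO_DUMP = 3            # beams to be dumped within 3 turns


def revolution_period_s(circumference_m: float) -> float:
    """Time for one turn, assuming the beam moves at the speed of light."""
    return circumference_m / SPEED_OF_LIGHT_M_PER_S


turn_us = revolution_period_s(CIRCUMFERENCE_M) * 1e6
dump_us = TURNS_TO_DUMP * turn_us
print(f"one turn: {turn_us:.1f} us, three turns: {dump_us:.0f} us")
```

One turn takes about 90 µs, so three turns come to about 270 µs, in line with the roughly 300 µs reaction time stated on the slide.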

[Figure: Systems detecting failures and LHC Beam Interlocks. Boxes connected to the Beam Interlock System and the Beam Dumping System: Injection Interlock; Powering Interlocks sc magnets; Powering Interlocks nc magnets; QPS (several 1000); Power Converters; Magnets; Magnet Current Monitor; Cryo OK; RF System; Movable Detectors; LHC Experiments; Beam Loss Monitors; Experimental Magnets; Collimation System; Collimator Positions; Environmental parameters; Transverse Feedback; Beam Aperture Kickers; Beam Lifetime FBCM; Screens / Mirrors; Access System; Doors EIS; Vacuum System; Vacuum valves; Access Safety Blocks; RF Stoppers; Beam loss monitors; Special BLMs; Monitors at aperture limits (some 100); Monitors in arcs (several 1000); Timing System (Post Mortem Trigger); Operator Buttons; Safe LHC Parameter; Software Interlocks; LHC Devices; Sequencer; Safe Beam Parameter Distribution; Safe Beam Flag. Annotation: little beam dependence.]


Principle of the LHC Machine Protection System

• ‘User systems’ can detect failures and send hardwired signal to beam interlock system

• Range from Experimental Detectors to Vacuum Valves

• Each user system provides a status signal, the user permit signal.

• The beam interlock system combines the user permits and produces the beam permit

• The beam permit is a hardwired signal that is provided to the dump kicker

• The Beam Dumping System combines many high technology techniques
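The way user permits combine into a beam permit can be sketched in a few lines. This is only an illustration: the slide says the beam interlock system "combines" the user permits, and the logical AND used here, the fail-safe handling of an empty input, and the example system names are assumptions rather than the actual hardwired implementation (which also allows masking of some channels).

```python
from typing import Mapping


def beam_permit(user_permits: Mapping[str, bool]) -> bool:
    """Beam permit is given only if every user system gives its permit.

    An empty set of inputs is treated as 'no permit' (fail-safe choice).
    """
    return bool(user_permits) and all(user_permits.values())


def dump_requested(user_permits: Mapping[str, bool]) -> bool:
    """Losing the beam permit is what triggers the dump kicker."""
    return not beam_permit(user_permits)


# Hypothetical user systems, ranging from detectors to vacuum valves.
permits = {"experiment_detectors": True, "vacuum_valves": True, "beam_loss_monitors": True}
all_ok = beam_permit(permits)

permits["vacuum_valves"] = False   # one user system withdraws its permit
dump_after_fault = dump_requested(permits)
print(all_ok, dump_after_fault)
```

A single withdrawn user permit is enough to remove the beam permit, which is the behaviour the bullets above describe.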

[Figure: user permit signals enter the Beam Interlock System, which provides the beam ‘permit’ to the LHC dump kicker. Hardware links / systems are fully redundant and use different technologies.]


Organisation for the LHC

Machine Protection includes many different hardware systems
• Many different departments and groups responsible for their equipment

Coordination of machine protection by two working groups
• General coordination – definition of the system
• Commissioning working group – accent is on procedures to be applied

Reviews and external audits are used for obtaining external advice
• General review LHC Machine Protection System
• Audit of Beam Interlock Controller done
• Audit of Beam Dumping System planned
• Audit of Beam Loss Monitoring System requested


Requirements concerning Machine Protection System

Safety Assessment (‘reliability’)
• IEC 61508 standard defining the different Safety Integrity Levels (SIL), ranking from SIL1 to SIL4
• Based on Risk Classes = Consequence x Frequency
• Machine Protection System for the LHC should be SIL3, taking the definition of Protection Systems, with a probability of failure between 10^-8 and 10^-7 per hour (because of short mission times)
• Catastrophe = beam should have been dumped and this did not take place; can possibly cause large damage

Availability
• Definition:
  – Beam is dumped when it was not required
  – Operation cannot take place because the protection system does not give the green light (is not ready)
• Requirement:
  – Definition not according to any standard
  – Downtime comparable to other accelerator equipment; maximum tens of operations per year
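The SIL requirement can be expressed as a lookup from a failure probability per hour to a SIL band. Only the SIL3 band (10^-8 to 10^-7 per hour) is quoted on this slide; the other decade-wide bands below are filled in from the IEC 61508 high-demand / continuous-mode convention as an assumption, and the function name is illustrative.

```python
from typing import Optional

# (SIL level, lower bound inclusive, upper bound exclusive), per hour.
# Only the SIL3 band is quoted in the talk; the others are assumed.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_level(failures_per_hour: float) -> Optional[int]:
    """Return the SIL whose band contains the rate, or None if outside all bands."""
    for level, low, high in SIL_BANDS:
        if low <= failures_per_hour < high:
            return level
    return None


print(sil_level(5e-8))   # inside the SIL3 band quoted on the slide
```

A rate of 5 × 10^-8 per hour falls in the band the talk requires for the LHC Machine Protection System.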


Approach Adopted: “Strategy”

End of the ’90s: start of an “Interlock Manager”, which later continued as a Machine Protection System
• Until then Particle Accelerators mainly considered Equipment Protection
• Since then ‘Machine Protection’ has become a common approach in high power accelerators

Dual Approach
• Prevent fault at the source (= old fashioned approach)
  &
• Detect the effect resulting from any fault, including beam instabilities, and react fast enough to prevent damage

Deployment in SPS accelerator to test concepts


Are the requirements fulfilled?

Reduce the Protection System to the basic elements. The other systems give an additional protection.

BIC: Beam Interlock Controller
LBDS: Beam Dumping System
BLM: Beam Loss Monitors (6 BLMs per sc quad, 4000 in total)
PIC: Power Interlock Controller
QPS: Quench Protection System

[Figure: Systems detecting failures and LHC Beam Interlocks (same diagram as on the earlier slide)]


Main Systems

Thorough design from the start
• Based on redundancy

For each of the 5 main components of the Machine Protection System
• Dependability numbers (= reliability & availability) have been calculated
• Basically one PhD thesis per system!
• Some details for the Beam Dumping System calculations are given later
• Assume operational scenario

Combination of these numbers gives the Machine Protection Dependability estimate
• Shows weak links


Resulting Unsafety and Availability Numbers

System       Unsafety probability per year      False dumps per year
                                                Average    Std. dev.
LBDS (OP1)   2.4 × 10^-7 (2x)                   4 (2x)     +/-1.9
BIC          1.4 × 10^-8                        0.5        +/-0.5
BLM          1.44 × 10^-3 (front-end)           17         +/-4.0
             0.06 × 10^-3 (back-end VME)
PIC          0.5 × 10^-3                        1.5        +/-1.2
QPS          0.4 × 10^-3                        15.8       +/-3.9
MPS          2.3 × 10^-4                        41         +/-6.0
             5.75 × 10^-8 /h (SIL3)

ASSUMPTIONS

Operational scenario
• 200 days/year of operations: 400 beam operations (10 h each), each followed by checks (2 h).

Diagnostics effectiveness
• LBDS and BIC “as good as new” after checks (BLM, partially)
• QPS and PIC “as good as new” after periodic inspection or power abort

Dump request (DR) apportionment
• 60% planned dumps
• 15% fast beam losses
• 15% slow beam losses
• 10% others

Redundancy
• No cross-redundancy within the Beam Loss Monitors (P = 0, worst-case)
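The yearly MPS numbers can be related to the operational scenario with a little arithmetic. The sketch below assumes the per-hour figure is obtained by dividing the yearly unsafety by the beam hours of the scenario (400 operations of 10 h); that reading is an assumption, although it reproduces the 5.75 × 10^-8 per hour quoted in the table. The variable names are illustrative.

```python
MISSIONS_PER_YEAR = 400            # beam operations per year (operational scenario)
MISSION_HOURS = 10.0               # duration of each beam operation
MPS_UNSAFETY_PER_YEAR = 2.3e-4     # MPS unsafety probability per year (table)
MPS_FALSE_DUMPS_PER_YEAR = 41      # MPS average false dumps per year (table)

beam_hours = MISSIONS_PER_YEAR * MISSION_HOURS
unsafety_per_hour = MPS_UNSAFETY_PER_YEAR / beam_hours
false_dump_fraction = MPS_FALSE_DUMPS_PER_YEAR / MISSIONS_PER_YEAR

print(f"beam hours per year: {beam_hours:.0f}")
print(f"unsafety per hour:   {unsafety_per_hour:.3g}")
print(f"missions ended by a false dump: {false_dump_fraction:.1%}")
```

Under these assumptions about one beam operation in ten would end with a false dump.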


Sensitivity of Safety to the Model Parameters

Sensitivity to the type of dump request
• The fast beam losses contribute by two orders of magnitude more to the overall unsafety.
• 45% of fast beam losses assumed instead of 15%: safety moves from 2.3 × 10^-4 /y to 6.8 × 10^-4 /y (SIL2)

Sensitivity to the redundancy of the BLM
• Same dump request apportionment, but a beam loss is detectable by two monitors with a probability 0 < P < 1.
• If P moves from 0 to 1, the safety is recovered from 6.8 × 10^-4 /y to 2.8 × 10^-5 /y (SIL4)

RESULTS on LOG scale!


Failure Rates of a Single Sub-System

(… open brackets …

[Results table repeated from the previous slides; the following slides detail the LHC Beam Dumping System (LBDS) entry]

The LBDS: LHC Beam Dumping System

LBDS inventory

Extraction    15 kicker magnets + 15 generators
              15 septum magnets + 1 power converter
Dilution      10 kicker magnets + 10 generators
Absorption    One dump block
Electronics   Beam energy measurement (BEM)
              Beam energy tracking (BET)
              Triggering and re-triggering
              Post mortem diagnostics (check of every beam dump)
Beam line     975 m from extraction point to TDE

[Figure: layout of the beam dumping system]
1) MKD: the 15 kicker magnets deflect the beam horizontally
2) The quadrupole enhances the horizontal deflection
3) MSD: the 15 septum magnets deflect the beam vertically
4) MKB: the 10 kicker magnets dilute the beam energy
5) TDE: the beam is absorbed in a graphite block

[Figure: the beam sweep at the front face of the TDE absorber at 450 GeV]

The LBDS: Safety in Design
Fault Tolerant Features

No single point of failure should exist in the LBDS
• Redundancy is introduced to allow failures up to a certain threshold.
• Surveillance detects failures and issues a fail safe dump request.

[Figure: fault tolerant features annotated on the LBDS layout]

Redundancy: 14 out of 15 MKD, 1 out of 2 MKD generator branches
Surveillance: energy tracking, retriggering

Redundancy: 1 out of 4 MKBH, 1 out of 6 MKBV
Surveillance: energy tracking

Surveillance: energy tracking, fast current change monitoring

Redundancy: 1 out of 2 trigger generation and distribution
Surveillance: synchronization tracking

Surveillance: TX/RX error detection, voting of inputs

The Modeling Framework

FMECA = Failure Modes, Effects and Criticality Analysis

No detailed assessment of fault consequences. Two failure modes only:
• Fail Safe
• Fail Unsafe

Reliability Prediction
• Failure rates are deduced at component level from standard literature (i.e. Military Handbook 217F).
• The logic expressions of the failure modes are translated into probabilities and into failure rates.
• Example: the failure mode F1MKD of the MKD system:

1. Logic Expression

2 out of 15 [(PT1A AND PT1B) OR (SP1A AND SP1B) OR (SC1A AND SC1B) OR (CP2A AND CP2B) OR (COS12A AND COS12B) OR (COS22A AND COS22B) OR M]

2. Probability

$$P_{F1,MKD} = 1 - (1 - P_{silent,MKD})^{15} - 15\,P_{silent,MKD}\,(1 - P_{silent,MKD})^{14}$$

$$P_{silent,MKD} = 1 - (1 - P_{PT})(1 - P_{SP})(1 - P_{SC})(1 - P_{CP})(1 - P_{COS1})(1 - P_{COS2})(1 - P_{M})$$

Each $P_X$ is the probability of the corresponding OR term of the logic expression.

3. Failure rate

$$P_{F1,MKD}(t) = 1 - \exp\left(-\int_0^t \lambda_{F1,MKD}(\tau)\,d\tau\right) \quad\Longleftrightarrow\quad \lambda_{F1,MKD}(t) = \frac{dP_{F1,MKD}(t)/dt}{1 - P_{F1,MKD}(t)}$$
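The translation from the logic expression of F1MKD into a probability can be sketched numerically. The component failure probabilities below are hypothetical placeholders (the talk does not quote them), the components are assumed independent, and the function names are illustrative. The OR of the seven terms gives the probability that one branch fails silently; "2 out of 15" is read as at least two of the 15 branches failing.

```python
from math import comb


def branch_silent_probability(pair_probabilities, p_magnet):
    """OR of independent terms: 1 - product of (1 - p) over all terms."""
    survive = 1.0 - p_magnet
    for p_pair in pair_probabilities:
        survive *= 1.0 - p_pair
    return 1.0 - survive


def at_least_k_of_n(p, k, n):
    """Probability that at least k of n independent branches fail."""
    fewer_than_k = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - fewer_than_k


# Hypothetical numbers: each redundant component fails with probability 1e-3,
# so a pair (A AND B) fails with 1e-6; the magnet term M is also set to 1e-6.
P_COMPONENT = 1e-3
pairs = [P_COMPONENT * P_COMPONENT] * 6   # PT, SP, SC, CP, COS1, COS2
p_silent = branch_silent_probability(pairs, p_magnet=1e-6)
p_f1 = at_least_k_of_n(p_silent, k=2, n=15)
print(f"P_silent = {p_silent:.3g}, P_F1 = {p_f1:.3g}")
```

With these placeholder values the redundancy brings the system-level probability several orders of magnitude below the single-branch probability, which is the effect the "2 out of 15" logic is meant to capture.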

Results: Failure Modes and Rates of the LBDS

The FMECA and reliability prediction have been performed for all sub-systems in the LBDS.

More than 2100 failure modes have been classified at component level.

They have been arranged into 21 failure modes at system level.

Operation Scenarios for one Mission
State Transition Diagrams

Fail safe rates FS\Xk are decreasing with time

Fail unsafe rates FU\Xk are increasing with time

STATES
Available: X0, X1 (no BETS), X2 (no RTS), X3 (no BETS, RTS)
Failsafe: X4
Failed unsafe: X5

Compact State Based Approach
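A state-based model of this kind can be illustrated with a small Markov chain over the six states X0 to X5. The sketch below is a simplification under stated assumptions: the transition rates are hypothetical and constant, whereas the talk describes time-dependent rates, and the choice of which transitions exist is illustrative only. The state probabilities are propagated over one 10 h mission with a simple Euler scheme.

```python
# Hypothetical constant transition rates (per hour) between states X0..X5.
RATES = {
    (0, 1): 1e-4,  # X0 -> X1: BETS no longer available
    (0, 2): 1e-4,  # X0 -> X2: RTS no longer available
    (1, 3): 1e-4,  # X1 -> X3: RTS lost as well
    (2, 3): 1e-4,  # X2 -> X3: BETS lost as well
    (0, 4): 1e-3,  # any available state -> X4: fail safe (false dump)
    (1, 4): 1e-3,
    (2, 4): 1e-3,
    (3, 4): 1e-3,
    (3, 5): 1e-5,  # X3 -> X5: failed unsafe
}


def euler_step(probabilities, rates, dt_hours):
    """Move probability along each transition for one small time step."""
    updated = list(probabilities)
    for (source, target), rate in rates.items():
        flow = probabilities[source] * rate * dt_hours
        updated[source] -= flow
        updated[target] += flow
    return updated


def propagate(mission_hours, dt_hours=0.01):
    """State probabilities after one mission, starting in X0."""
    probabilities = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    for _ in range(round(mission_hours / dt_hours)):
        probabilities = euler_step(probabilities, RATES, dt_hours)
    return probabilities


state_probabilities = propagate(10.0)
for index, value in enumerate(state_probabilities):
    print(f"X{index}: {value:.3e}")
```

Because X4 and X5 have no outgoing transitions they act as absorbing states: the mission ends either with a safe dump or, with far lower probability, in the unsafe state.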

State Transition Diagrams: The Sequence of Missions and Checks

Missions are driven either by internal false dumps or by external dump requests.

At checks the system is recovered to the initial state.

The process starts in X = 0 of Mission 1 and stops when one year of operation is reached.

The sequence of N missions and checks is a non-homogeneous Markov process of 2N5 states.

Operational Scenario

• Missions of random duration alternate with 2 hours of checks, over 200 days of operations.
  – In addition to a false dump, the end of the mission is determined by an external dump request, which is either a planned dump request (Weibull) or beam induced.
• The dump request rate is:

[Equation: dump request rate as a function of time t. Its legend defines a scale factor, a shape factor, the value of the rate at t = 0 and its value at a second reference time.]

Planned dump: parameters 5 and 1/11
Beam induced dump: parameters 0.001 and 0.1 (the latter being the value at t = 0)

[Figure: Distribution of dump requests]
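The planned-dump parameters can be checked against the 10 h missions assumed elsewhere in the talk. The symbols were lost in the slide, so this sketch assumes that 5 is the Weibull shape factor and that 1/11 is an inverse scale in 1/h, that is a scale of 11 h. Under that reading the mean time to a planned dump comes out close to 10 h.

```python
from math import exp, gamma

SHAPE = 5.0          # assumed to be the Weibull shape factor
SCALE_HOURS = 11.0   # assumed reading of "1/11" as an inverse scale per hour


def weibull_mean(shape: float, scale: float) -> float:
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * gamma(1.0 + 1.0 / shape)


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous dump request rate at time t."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Probability that no planned dump has been requested before time t."""
    return exp(-((t / scale) ** shape))


mean_hours = weibull_mean(SHAPE, SCALE_HOURS)
print(f"mean time to planned dump: {mean_hours:.1f} h")
print(f"rate at 5 h:  {weibull_hazard(5.0, SHAPE, SCALE_HOURS):.3f} per hour")
print(f"rate at 10 h: {weibull_hazard(10.0, SHAPE, SCALE_HOURS):.3f} per hour")
```

A shape factor well above 1 gives a request rate that rises steeply with time, so planned dumps cluster around the intended fill length instead of occurring at random.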


… close brackets …)

[Results table repeated from the earlier slide, pointing back to the LHC Beam Dumping System entry]

PhD thesis Roberto Filippini: http://doc.cern.ch/archive/electronic/cern/preprints/thesis/thesis-2006-054.pdf

Availability of other systems not studied, can be done if required


Also Analysis being done with a Different Approach

Hybrid methodology combining fault tree for component failure rates and simulations in the time domain for the complete system
• Results concerning protection system reliability and beam availability
• Option to disable part of a system and see the effect

Collaboration with Laboratory for Safety Analysis, ETH Zürich, Sigrid Wagner

Ongoing…

[Figure: hybrid methodology. System level: global frame, agent-based approach (User System, BIS, LBDS; dangerous conditions). Component level: established reliability method, e.g. fault tree (individual failure behaviour).]


Key issues concerning Design of Sub-Systems

Requirements to obtain a “safe system”
• No single point of failure
  – Redundancy of critical components
  – Redundancy of signal paths between (sub-)systems
• Periodic checks to get back to a state which is ‘as good as new’
  – Failure rates of redundant systems increase in time – get back to zero (different from aging)
• Surveillance of critical signals
• Safe mission abort
• Trade off between availability and reliability


Following the Design Studies and Manufacturing

Test equipment in operational environment
• Quench Protection System operational during Hardware Commissioning of the LHC magnets
• Reliability run starting for the Beam Dumping System with about 3 months of continuous operation
  – Can give upper limit of failure rate of most critical components because of redundancy
• Logging and Post Mortem systems (analysis of events using logging data, and special ‘fast’ buffers triggered after a beam dump) used during Hardware Commissioning

Install similar equipment or components in operational accelerators
• Beam Interlock System installed and operational in the LHC injection chain
• Fast Magnet Current Change Monitor already operational
• Energy tracking system of the LHC beam dump working for the extraction system of the SPS injector
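How a reliability run yields an upper limit on a failure rate can be sketched with the standard zero-failure bound. The assumptions here are not from the talk: a constant failure rate (exponential model), no failure observed during the run, a run of 90 days as a stand-in for "about 3 months", and a 95% confidence level.

```python
from math import log


def failure_rate_upper_limit(hours_without_failure: float, confidence: float = 0.95) -> float:
    """Upper limit on a constant failure rate after a failure-free run.

    Solves exp(-rate * T) = 1 - confidence for the rate.
    """
    return -log(1.0 - confidence) / hours_without_failure


RUN_HOURS = 90 * 24   # assumed length of the reliability run
limit_per_hour = failure_rate_upper_limit(RUN_HOURS)
print(f"95% upper limit: {limit_per_hour:.2e} failures per hour")
```

A single three-month run can therefore only bound a failure rate at the level of about 10^-3 per hour for one unit; tighter limits need more units in parallel or longer operation.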


General Test Procedures

Before operation with beam:
• Thorough testing required of all installed equipment
• Definition and follow-up of test procedures for the individual equipment
• Machine Protection System Commissioning Working Group which approves the test procedures

Tests with beam required
• Define tests before going into a next beam commissioning phase
• Example: Provoke a quench of a magnet and check Beam Loss Monitoring signals
• Measure delays between detection and actual beam dump
• Safe beam flag to allow masking of some interlock channels in case of low intensity / low energy beams

How to enforce these tests?


Lessons Learned from the Exercise

Absolute failure rate levels depend largely on model assumptions, but do indicate the weak links in the system
• Confidence in relative numbers and sensitivity effects

Hardware of some systems was adapted to obtain reliability numbers similar to the other systems
• Add redundancy

Periodic testing, sometimes several times per day, will contribute to the safety of the system
• Test the presence of the assumed redundancy


Human Aspects

Hardware
• Design
• Dependability Studies
• Testing of prototypes
• Testing of series in Laboratory
• Testing once installed
• Tests with beam

Procedures
• Testing during production
• Testing after installation
• During Operation
  – Confirm Redundancy
  – Post Mortem
  – Re-establish confidence
• When changing hardware
• When changing settings

Gained experience

A lot of discussions…


Example of Human Aspects

Beam accident in 2004 while extracting a high intensity beam from the SPS injector, in which a vacuum chamber was damaged

Noise on temperature sensors induced by the beam caused a magnet interlock, stopping the magnet power converter

Error in the protection logic: the magnet power converter was stopped before inhibiting extraction

No clear procedures on what to do: the experiment was continued without sorting out the problem

No clear responsibility: several people were in charge at the same time and nobody said ‘stop’

Created a lot of awareness of potential problems for the LHC


LHC Strategy presently under Discussion

How to change Beam Loss Monitor thresholds & masking of signals
• Thousands of values – avoid errors
• Are they correct when put in for the first time?
• Who is allowed to adapt the thresholds?
• What will be the procedures?

The Post Mortem Analysis of the Beam Dumping System indicates a fault
• What are the procedures to recover?
• Who can give the ‘ok’ again?
• “The same problem happened last month; after 1 day of testing we just continued. We are near the end of the physics run of this year…”
• Who is in charge? Will there be a group of ‘safety experts’ and what will be their role?


Conclusions

Safety and Reliability have become an accepted topic for high power accelerators

The LHC has a coherent Machine Protection System following interdisciplinary work for almost 20 years

Producing dependability numbers is very time consuming and the result depends largely on the model assumptions

However the benefits are that
• The weak links can be shown
• Designs have been adapted accordingly
• Awareness has been raised

On paper the numbers look good, but testing is required during installation, cold check-outs and operation with beam

Procedures during normal operation
• Checks required almost continuously to confirm the redundancy of the systems

Procedures in case an abnormality is detected
• Who is responsible in the control room?

Organisational issues will be important
• Enforcing procedures / exceptions
