Reliability and Availability of the Large Hadron Collider (LHC) Machine Protection System
Jan Uythoven
CERN, Geneva, Switzerland
Thanks to R. Schmidt, B. Goddard, R. Filippini* and the many other colleagues working on the LHC Protection System
*Presently at PSI, Zürich
Jan Uythoven, CERN
ITER RAMI Workshop 6-7 December 2007
Page 2
The Large Hadron Collider (LHC) at CERN - Geneva
The world's largest particle accelerator, with a circumference of 27 km
1232 Superconducting dipole magnets operating at 1.9 K
Operation with beam foreseen for 2008
LHC Layout
LHC Stored Energy
For nominal beam intensity at 7 TeV:
Energy stored in one beam: 360 MJ
Energy stored in the superconducting magnets: 10 GJ
[Figure: energy stored in the beam [MJ] versus momentum [GeV/c], both axes logarithmic. Points: LHC top energy, LHC injection (12 SPS batches), SPS batch to LHC, ISR, SNS, LEP2, SPS fixed target, HERA, TEVATRON, SPS ppbar, RHIC proton. Also marked: LHC energy in magnets; Factor ~200.]
Energy to heat and melt one kg of copper: 700 kJ
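To put these numbers in perspective, the copper figure can be turned into an equivalent mass. The sketch below is plain arithmetic on the values quoted on the slide (360 MJ per beam, 10 GJ in the magnets, 700 kJ to heat and melt 1 kg of copper); the function name is illustrative only.

```python
# Values quoted on the slide.
COPPER_HEAT_AND_MELT_J_PER_KG = 700e3   # 700 kJ per kg of copper
BEAM_ENERGY_J = 360e6                   # 360 MJ stored in one beam
MAGNET_ENERGY_J = 10e9                  # 10 GJ stored in the magnets


def copper_melted_kg(energy_j: float) -> float:
    """Mass of copper that the given energy could heat and melt."""
    return energy_j / COPPER_HEAT_AND_MELT_J_PER_KG


beam_kg = copper_melted_kg(BEAM_ENERGY_J)
magnet_kg = copper_melted_kg(MAGNET_ENERGY_J)

print(f"One beam: about {beam_kg:.0f} kg of copper")
print(f"Magnets:  about {magnet_kg / 1000:.1f} tonnes of copper")
```

One nominal beam therefore carries enough energy to melt roughly half a tonne of copper.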
Quench Protection and Energy Extraction System
• When one magnet quenches, quench heaters are fired for this magnet
• The current in the quenched magnet decays in about 200 ms
• The current in series from the other magnets flows through the bypass diode, which can stand the current for about 100-200 seconds
[Figure: Power Converter feeding Magnet 1, Magnet 2, …, Magnet i, …, Magnet 154]
13 kA Energy Extraction in tunnel adjacent to accelerator
Resistors absorbing the energy
Switches - for switching the resistors into series with the magnets
Quench Protection and Energy Extraction System
• 8 separate systems: one for each sector
• Energies per sector similar to the HERA and Tevatron accelerators
• Needs to work very reliably, as the damage potential is huge
• Reliability studies of the system have been done
• ‘Traditional’ technologies
• Limited dependence on other systems
This talk is mainly on protection from the beam energy
PhD thesis, A. Vergara: http://documents.cern.ch/cgi-bin/setlink?base=preprint&categ=cern&id=cern-thesis-2004-019
How to protect the machine from the Beam Energy?
Machine Protection System which:
• Detects “any fault” in the machine:
  – Hardware not working properly, despite the fault-tolerant design of safety-critical systems
  – Effects of failures, including beam instabilities, leading to beam losses
• Safely dumps the beam before it can cause any damage
• Fast reaction time: beams to be dumped within 3 turns of detection of a problem = 300 µs
[Figure: Beam Dump Block, where the beam should go in case of any ‘problems’ detected]
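As a rough cross-check of the three-turn reaction time, the sketch below computes how long a particle travelling at about the speed of light needs for three turns of the 27 km ring. The rounded circumference and the use of c as the beam velocity are simplifying assumptions, so the result only confirms the order of magnitude (a few hundred microseconds).

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
CIRCUMFERENCE_M = 27_000.0  # rounded value from the talk


def turns_duration_s(turns: int,
                     circumference_m: float = CIRCUMFERENCE_M,
                     velocity_m_per_s: float = SPEED_OF_LIGHT_M_PER_S) -> float:
    """Time needed for the given number of turns around the ring."""
    return turns * circumference_m / velocity_m_per_s


one_turn_s = turns_duration_s(1)
three_turns_s = turns_duration_s(3)

print(f"One turn:    {one_turn_s * 1e6:.0f} microseconds")
print(f"Three turns: {three_turns_s * 1e6:.0f} microseconds")
```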
Systems detecting failures and LHC Beam Interlocks
[Diagram: inputs to the Beam Interlock System, which acts on the Beam Dumping System. Labels: Injection Interlock; Powering Interlocks sc magnets; Powering Interlocks nc magnets; QPS (several 1000); Power Converters ~1500; AUG; UPS; Power Converters; Magnets; Magnet Current Monitor; Cryo OK; RF System; Movable Detectors; LHC Experiments; Beam Loss Monitors BCM; Experimental Magnets; Collimation System; Collimator Positions; Environmental parameters; Transverse Feedback; Beam Aperture Kickers; Beam Lifetime FBCM; Screens / Mirrors BTV; Access System; Doors EIS; Vacuum System; Vacuum valves; Access Safety Blocks; RF Stoppers; Beam loss monitors BLM; Special BLMs; Monitors aperture limits (some 100); Monitors in arcs (several 1000); Timing System (Post Mortem Trigger); Operator Buttons CCC; Safe LHC Parameter; Software Interlocks; LHC Devices; Sequencer; Safe Beam Parameter Distribution; Safe Beam Flag.]
Little beam dependence
Principle of the LHC Machine Protection System
• ‘User systems’ can detect failures and send a hardwired signal to the beam interlock system
• They range from Experimental Detectors to Vacuum Valves
• Each user system provides a status signal, the user permit signal
• The beam interlock system combines the user permits and produces the beam permit
• The beam permit is a hardwired signal that is provided to the dump kicker
• The Beam Dumping System combines many high-technology techniques
[Diagram: user permit signals enter the Beam Interlock System, which provides the Beam ‘Permit’ to the LHC Dump kicker. Labels: “Hardware links / systems, fully redundant”; “Many different technologies” (twice).]
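The permit logic described on this slide can be illustrated with a small sketch. It assumes that the beam permit is simply the logical AND of all user permits, so that a single missing permit removes the beam permit; the real system is hardwired and redundant, and the names `UserPermit`, `beam_permit` and `missing_permits`, as well as the example states, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class UserPermit:
    """Status signal of one user system (True means 'OK for beam')."""
    system: str
    ok: bool


def beam_permit(permits: Sequence[UserPermit]) -> bool:
    """Assumed AND logic: permit only if every user system gives its permit.

    An empty input list is treated as 'no permit' so that the sketch
    fails safe when nothing is connected.
    """
    return bool(permits) and all(p.ok for p in permits)


def missing_permits(permits: Sequence[UserPermit]) -> List[str]:
    """Names of the user systems that currently withhold their permit."""
    return [p.system for p in permits if not p.ok]


# Hypothetical snapshot of three user systems.
permits = [
    UserPermit("Beam Loss Monitors", True),
    UserPermit("Vacuum valves", True),
    UserPermit("Powering Interlocks", False),
]

if not beam_permit(permits):
    print("Beam permit removed, dump requested by:", missing_permits(permits))
```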
Organisation for the LHC
• Machine Protection includes many different hardware systems
• Many different departments and groups are responsible for their equipment
• Coordination of machine protection by two working groups:
  – General coordination: definition of the system
  – Commissioning working group: accent is on procedures to be applied
• Reviews and external audits are used for obtaining external advice:
  – General review of the LHC Machine Protection System
  – Audit of Beam Interlock Controller done
  – Audit of Beam Dumping System planned
  – Audit of Beam Loss Monitoring System requested
Requirements concerning Machine Protection System
Safety Assessment (‘reliability’)
• IEC 61508 standard defining the different Safety Integrity Levels (SIL), ranking from SIL1 to SIL4
• Based on Risk Classes = Consequence x Frequency
• Machine Protection System for the LHC should be SIL3, taking the definition for Protection Systems, with a probability of failure between 10^-8 and 10^-7 per hour (because of short mission times)
• Catastrophe = beam should have been dumped and this did not take place; can possibly cause large damage
Availability
• Definition:
  – Beam is dumped when it was not required
  – Operation cannot take place because the protection system does not give the green light (is not ready)
• Requirement:
  – Definition not according to any standard
  – Downtime comparable to other accelerator equipment; maximum tens of operations per year
Approach Adopted: “Strategy”
• End of the ’90s: start of an “Interlock Manager”, which later continued as a Machine Protection System
• Until then, particle accelerators mainly considered Equipment Protection
• Since then ‘Machine Protection’ has become a common approach in high power accelerators
• Dual approach:
  – Prevent faults at the source (= old-fashioned approach)
  – & detect the effect resulting from any fault, including beam instabilities, and react fast enough to prevent damage
• Deployment in the SPS accelerator to test concepts
Are the requirements fulfilled?
Reduce the Protection System to the basic elements. The other systems give additional protection.
• BIC: Beam Interlock Controller
• LBDS: Beam Dumping System
• BLM: Beam Loss Monitors (6 BLMs per sc quad, 4000 in total)
• PIC: Power Interlock Controller
• QPS: Quench Protection System
Systems detecting failures and LHC Beam Interlocks
[Diagram repeated from the earlier slide of the same title]
Main Systems
• Thorough design from the start
• Based on redundancy
• For each of the 5 main components of the Machine Protection System, dependability numbers (= reliability & availability) have been calculated
  – Basically one PhD thesis per system!
  – Some details for the Beam Dumping System calculations are given later
  – Assume operational scenario
• Combination of these numbers gives the Machine Protection Dependability estimate
  – Shows weak links
Resulting Unsafety and Availability Numbers

System       Unsafety/y (probability)      False dumps/y (average)   Std. dev.
LBDS (OP1)   2.4×10^-7 (2x)                4 (2x)                    +/-1.9
BIC          1.4×10^-8                     0.5                       +/-0.5
BLM          1.44×10^-3 (Front-end)        17                        +/-4.0
             0.06×10^-3 (Back-end VME)
PIC          0.5×10^-3                     1.5                       +/-1.2
QPS          0.4×10^-3                     15.8                      +/-3.9
MPS          2.3×10^-4                     41                        +/-6.0
             5.75×10^-8/h (SIL3)

ASSUMPTIONS
• Operational scenario: 200 days/year of operations: 400 beam operations (10 h each) followed by checks (2 h).
• Diagnostics effectiveness: LBDS and BIC “as good as new” after checks (BLM, partially); QPS and PIC “as good as new” after periodic inspection or power abort
• Dump request (DR) apportionment: 60% planned dumps, 15% fast beam losses, 15% slow beam losses, 10% others
• Redundancy: no cross-redundancy within the Beam Loss Monitors (P = 0, worst-case)
Sensitivity of Safety to the Model Parameters
• Sensitivity to the type of dump request: the fast beam losses contribute two orders of magnitude more to the overall unsafety.
  – With 45% fast beam losses assumed instead of 15%, safety moves from 2.3×10^-4 /y to 6.8×10^-4 /y (SIL2)
• Sensitivity to the redundancy of the BLM: same dump request apportionment, but a beam loss is detectable by two monitors with a probability 0<P<1.
  – If P moves from 0 to 1, the safety will be recovered from 6.8×10^-4 /y to 2.8×10^-5 /y (SIL4)
RESULTS on LOG scale!
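The SIL labels attached to these numbers can be cross-checked with a short calculation. The sketch assumes that the yearly unsafety is spread over the 400 beam operations of 10 h each from the operational scenario (4000 h of beam per year), and it uses the IEC 61508 bands for dangerous failures per hour, of which only the SIL3 band is quoted in the talk. The function names are illustrative.

```python
from typing import Optional

# Operational scenario from the talk: 400 beam operations of 10 h each.
BEAM_HOURS_PER_YEAR = 400 * 10

# IEC 61508 bands for dangerous failures per hour: (SIL, lower, upper).
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def per_hour(unsafety_per_year: float,
             hours: float = BEAM_HOURS_PER_YEAR) -> float:
    """Convert a yearly unsafety into an average rate per beam hour."""
    return unsafety_per_year / hours


def sil_level(failures_per_hour: float) -> Optional[int]:
    """Return the SIL whose band contains the rate, or None if outside."""
    for level, low, high in SIL_BANDS:
        if low <= failures_per_hour < high:
            return level
    return None


cases = {
    "baseline (15% fast losses, P = 0)": 2.3e-4,
    "45% fast losses, P = 0": 6.8e-4,
    "45% fast losses, P = 1": 2.8e-5,
}
for label, unsafety in cases.items():
    rate = per_hour(unsafety)
    print(f"{label}: {rate:.2e} per hour, SIL{sil_level(rate)}")
```

Under these assumptions the three cases fall into the SIL3, SIL2 and SIL4 bands respectively, matching the labels on the slides.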
Failure Rates of a Single Sub-System
(…open brackets…
LHC Beam Dumping System
The LBDS: LHC Beam Dumping System
LBDS inventory
Extraction    15 Kicker Magnets + 15 generators
              10 Septum Magnets + 1 power converter
Dilution      10 Kicker Magnets + 10 generators
Absorption    One dump block
Electronics   Beam energy measurement (BEM)
              Beam energy tracking (BET)
              Triggering and re-triggering
              Post mortem diagnostics (check of every beam dump)
Beam line     975 m from extraction point to TDE

1) MKD: The 15 kicker magnets deflect the beam horizontally
2) Q4: The quadrupole enhances the horizontal deflection
3) MSD: The 15 septum magnets deflect the beam vertically
4) MKB: The 10 kicker magnets dilute the beam energy
5) TDE: The beam is absorbed in a graphite block
[Figure: The beam sweep at the front face of the TDE absorber at 450 GeV]
The LBDS: Safety in Design
Fault Tolerant Features
No single point of failure should exist in the LBDS
• Redundancy is introduced to allow failures up to a certain threshold.
• Surveillance detects failures and issues a fail safe dump request.
[Figure annotations, one group per part of the system:]
• Redundancy: 14 out of 15 MKD, 1 out of 2 MKD generator branches. Surveillance: Energy tracking, Retriggering
• Redundancy: 1 out of 4 MKBH, 1 out of 6 MKBV. Surveillance: Energy tracking
• Surveillance: Energy tracking, Fast current change monitoring
• Redundancy: 1 out of 2 trigger generation and distribution. Surveillance: Synchronization tracking
• Surveillance: TX/RX error detection, Voting of inputs
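To make the “14 out of 15” notation concrete, the sketch below evaluates the probability that a k-out-of-n group still works when each unit fails independently with the same probability. Independence, identical units and the example failure probability of 1e-3 are assumptions for illustration, not numbers from the LBDS study.

```python
from math import comb


def k_out_of_n_success(k: int, n: int, p_fail: float) -> float:
    """Probability that at least k of n independent, identical units work."""
    p_ok = 1.0 - p_fail
    return sum(comb(n, m) * p_ok ** m * p_fail ** (n - m)
               for m in range(k, n + 1))


P_UNIT_FAIL = 1e-3  # hypothetical failure probability of one unit

all_15_needed = 1.0 - k_out_of_n_success(15, 15, P_UNIT_FAIL)
one_spare = 1.0 - k_out_of_n_success(14, 15, P_UNIT_FAIL)

print(f"15 out of 15 required: group fails with probability {all_15_needed:.2e}")
print(f"14 out of 15 required: group fails with probability {one_spare:.2e}")
```

Tolerating a single failed unit lowers the group failure probability by roughly two orders of magnitude for this example value.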
The Modeling Framework
FMECA = Failure Modes Effects and Criticalities Analysis
No detailed assessment of fault consequences. Two failure modes only:
•Fail Safe
•Fail Unsafe
Reliability Prediction
• Failure rates are deduced at component level from standard literature (i.e. Military Handbook 217F).
• The logic expressions of the failure modes are translated into probabilities and into failure rates.
• Example: the failure mode F1MKD of the MKD system:
1. Logic Expression
2 out of 15 [(PT1A AND PT1B) OR (SP1A AND SP1B) OR (SC1A AND SC1B) OR (CP2A AND CP2B) OR (COS12A AND COS12B) OR (COS22A AND COS22B) OR M]
2. Probability
\[ P_{MKD\_silent} = 1 - (1 - P_{PT1}^2)(1 - P_{SP1}^2)(1 - P_{SC1}^2)(1 - P_{CP2}^2)(1 - P_{COS12}^2)(1 - P_{COS22}^2)(1 - P_M) \]
\[ P_{F1\_MKD} = 1 - 15\,(1 - P_{MKD\_silent})^{14} + 14\,(1 - P_{MKD\_silent})^{15} \]
3. Failure rate
\[ P_{F1\_MKD}(t) = 1 - e^{-\int_0^t \lambda_{F1\_MKD}(\tau)\,d\tau} \quad\Rightarrow\quad \lambda_{F1\_MKD}(t) = \frac{dP_{F1\_MKD}(t)/dt}{1 - P_{F1\_MKD}(t)} \]
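A numerical sketch of the three steps for the failure mode F1MKD follows. It rests on assumptions that are not in the slides: every component has a constant failure rate (the values in the code are hypothetical), components fail independently, the A and B branches of each redundant pair are identical, and the failure rate is obtained from the probability as (dP/dt)/(1 - P) by finite differences.

```python
import math

# Hypothetical constant failure rates per hour for one branch of each pair.
PAIR_RATES = {"PT1": 1e-3, "SP1": 2e-3, "SC1": 1e-3,
              "CP2": 5e-4, "COS12": 1e-3, "COS22": 1e-3}
RATE_M = 1e-4  # hypothetical rate of the non-redundant element M


def p_component(rate: float, t: float) -> float:
    """Failure probability at time t for a constant failure rate."""
    return 1.0 - math.exp(-rate * t)


def p_mkd_silent(t: float) -> float:
    """One MKD branch fails: any pair (A AND B) fails, OR M fails."""
    survive = 1.0 - p_component(RATE_M, t)
    for rate in PAIR_RATES.values():
        survive *= 1.0 - p_component(rate, t) ** 2
    return 1.0 - survive


def p_f1_mkd(t: float, n: int = 15) -> float:
    """At least 2 out of n branches fail.

    Written as 1 - P(0 fail) - P(exactly 1 fails), which for n = 15
    equals 1 - 15 q^14 + 14 q^15 with q = 1 - p.
    """
    p = p_mkd_silent(t)
    return 1.0 - (1.0 - p) ** n - n * p * (1.0 - p) ** (n - 1)


def failure_rate(t: float, dt: float = 0.01) -> float:
    """Failure rate (dP/dt) / (1 - P), using a central difference."""
    dp_dt = (p_f1_mkd(t + dt) - p_f1_mkd(t - dt)) / (2.0 * dt)
    return dp_dt / (1.0 - p_f1_mkd(t))


for hours in (1.0, 5.0, 10.0):
    print(f"t = {hours:4.1f} h: P = {p_f1_mkd(hours):.3e}, "
          f"rate = {failure_rate(hours):.3e} per hour")
```

With these illustrative inputs the failure rate of the redundant arrangement starts near zero and grows with mission time.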
Results
Failure Modes and Rates of the LBDS
[Figure: MKD]
The FMECA and reliability prediction have been performed for all sub-systems in the LBDS.
More than 2100 failure modes have been classified at component level.
They have been arranged into 21 failure modes at system level.
Operation Scenarios for one Mission
State Transition Diagrams
• Failsafe rates FS\Xk are decreasing with time
• Fail unsafe rates FU\Xk are increasing with time
STATES
X0: Available
X1: no BETS
X2: no RTS
X3: no BETS, RTS
X4: Failsafe
X5: Failed unsafe
Compact State Based Approach
State Transition Diagrams
The Sequence of Missions and Checks
Missions are driven either by internal false dumps or by external dump requests.
At checks the system is recovered to the initial state.
The process starts in X = 0 of Mission 1 and stops when one year of operation is reached.
The sequence of N missions and checks is a non-homogeneous Markov process of 2N5 states.
Operational Scenario
• Missions of random duration alternate with 2 hours of checks, over 200 days of operations.
  – In addition to a false dump, the end of the mission is determined by an external dump request, which is either a planned dump request (Weibull) or a beam induced one.
• The dump request rate is:
[Equation not recoverable from the transcript. Its legend names a scale factor, a shape factor, and the values of the rate at two reference times t.]
Planned dump: =5, = 1/11
Beam induced dump: = 0.001, 0 = 0.1
[Figure: Distribution of dump requests]
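The planned-dump parameters can plausibly be read as a Weibull distribution with shape 5 and scale 11 h (that is, an inverse scale of 1/11 per hour). This reading is an assumption, because the symbols are lost in the transcript, but it reproduces the mission length of about 10 h used in the operational scenario. The sketch computes the Weibull mean, hazard rate and survival probability under that assumption.

```python
import math

SHAPE = 5.0      # assumed Weibull shape factor for planned dumps
SCALE_H = 11.0   # assumed Weibull scale in hours (inverse of 1/11 per hour)


def weibull_mean(shape: float, scale: float) -> float:
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Hazard (request) rate: (shape/scale) * (t/scale)^(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Probability that no planned dump has been requested by time t."""
    return math.exp(-((t / scale) ** shape))


mean_mission_h = weibull_mean(SHAPE, SCALE_H)
print(f"Mean time to planned dump: {mean_mission_h:.1f} h")
for hours in (4.0, 8.0, 12.0):
    print(f"t = {hours:4.1f} h: rate = {weibull_hazard(hours, SHAPE, SCALE_H):.3f} per hour, "
          f"still running with probability {weibull_survival(hours, SHAPE, SCALE_H):.2f}")
```

A shape factor well above 1 makes planned dumps unlikely early in a mission and increasingly likely as the mission approaches the scale time.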
… close brackets…)
LHC Beam Dumping System
PhD thesis, Roberto Filippini: http://doc.cern.ch/archive/electronic/cern/preprints/thesis/thesis-2006-054.pdf
Availability of other systems not studied; can be done if required
Also Analysis being done with a Different Approach
Hybrid methodology combining fault tree for component failure rates and simulations in the time domain for the complete system
Results concerning protection system reliability and beam availability
Option to disable part of a system and see the effect
Collaboration with Laboratory for Safety Analysis, ETH Zürich, Sigrid Wagner
Ongoing…
[Diagram: hybrid methodology. System level: global frame, agent-based approach. Component level: established reliability method, e.g. fault tree. Labels: User System, BIS, LBDS, MPS, dangerous conditions, Dump, fault tree, individual failure behaviour.]
Key issues concerning Design of Sub-Systems
Requirements to obtain a “safe system”:
• No single point of failure
  – Redundancy of critical components
  – Redundancy of signal paths between (sub-)systems
• Periodic checks to get back to a state which is ‘as good as new’
  – Failure rates of redundant systems increase in time; checks bring them back to zero (different from aging)
• Surveillance of critical signals
• Safe mission abort
• Trade-off between availability and reliability
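Why the failure rate of a redundant system grows with time and returns to zero after a check can be seen in a minimal model: two identical, independent channels, each with a constant failure rate, where the system fails only when both channels have failed. The model, the rate of 1e-4 per hour and the 12 h check interval are illustrative assumptions, not taken from the LHC studies.

```python
import math


def pair_failure_rate(t: float, lam: float) -> float:
    """Hazard rate f(t) / R(t) of a 1-out-of-2 redundant pair.

    q is the probability that one channel has failed by time t;
    the pair has failed when both channels have failed (q * q).
    """
    q = 1.0 - math.exp(-lam * t)
    density = 2.0 * lam * q * math.exp(-lam * t)  # d(q^2)/dt
    return density / (1.0 - q * q)


def rate_with_checks(t: float, lam: float, check_interval: float) -> float:
    """Same pair, restored to 'as good as new' at every periodic check."""
    return pair_failure_rate(t % check_interval, lam)


LAMBDA = 1e-4           # hypothetical channel failure rate per hour
CHECK_INTERVAL_H = 12.0  # hypothetical time between checks

for hours in (0.0, 6.0, 11.9, 12.0, 18.0):
    print(f"t = {hours:5.1f} h: rate = "
          f"{rate_with_checks(hours, LAMBDA, CHECK_INTERVAL_H):.3e} per hour")
```

Unlike a single channel, whose rate stays at the constant value, the pair starts each interval with a rate of zero, which is why frequent checks pay off for redundant designs.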
Following the Design Studies and Manufacturing
• Test equipment in operational environment
  – Quench Protection System operational during Hardware Commissioning of the LHC magnets
  – Reliability run starting for the Beam Dumping System with about 3 months of continuous operation
  – Can give upper limit of failure rate of most critical components because of redundancy
  – Logging and Post Mortem systems (analysis of events using logging data, and special ‘fast’ buffers triggered after a beam dump) used during Hardware Commissioning
• Install similar equipment or components in operational accelerators
  – Beam Interlock System installed and operational in the LHC injection chain
  – Fast Magnet Current Change Monitor already operational
  – Energy tracking system of the LHC beam dump working for the extraction system of the SPS injector
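How a reliability run yields an upper limit on a failure rate can be sketched with the standard zero-failure bound: if n identical components run for T hours without a failure and the rate is constant, then with confidence CL the rate is below -ln(1 - CL) / (n T). The constant-rate model, the 95% confidence level, the reading of “about 3 months” as 90 days and the population of 15 units are assumptions chosen for illustration.

```python
import math


def rate_upper_limit(n_units: int, hours: float,
                     confidence: float = 0.95) -> float:
    """Upper limit on a constant failure rate after a failure-free run."""
    if n_units <= 0 or hours <= 0.0:
        raise ValueError("n_units and hours must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return -math.log(1.0 - confidence) / (n_units * hours)


RUN_HOURS = 90 * 24  # about 3 months of continuous operation

single = rate_upper_limit(1, RUN_HOURS)
population = rate_upper_limit(15, RUN_HOURS)

print(f"1 unit, no failure:   rate < {single:.2e} per hour (95% CL)")
print(f"15 units, no failure: rate < {population:.2e} per hour (95% CL)")
```

The bound tightens in proportion to the number of identical units observed, which is one reason redundant populations help a reliability run.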
General Test Procedures
• Before operation with beam:
  – Thorough testing required of all installed equipment
  – Definition and follow-up of test procedures for the individual equipment
  – Machine Protection System Commissioning Working Group, which approves the test procedures
• Tests with beam required
  – Define tests before going into a next beam commissioning phase
  – Example: provoke a quench of a magnet and check Beam Loss Monitoring signals
  – Measure delays between detection and actual beam dump
• Safe beam flag to allow masking of some interlock channels in case of low intensity / low energy beams
How to enforce these tests?
Lessons Learned from the exercise
• Absolute failure rate levels depend largely on model assumptions, but do indicate the weak links in the system
  – Confidence in relative numbers and sensitivity effects
• Hardware of some systems was adapted to obtain reliability numbers similar to the other systems
  – Add redundancy
• Periodic testing, sometimes several times per day, will contribute to the safety of the system
  – Test the presence of the assumed redundancy
Human Aspects
Hardware
• Design
• Dependability Studies
• Testing of prototypes
• Testing of series in Laboratory
• Testing once installed
• Tests with beam
Procedures
• Testing during production
• Testing after installation
• During Operation
  – Confirm Redundancy
  – Post Mortem
  – Re-establish confidence: when changing hardware, when changing settings
Gained experience
A lot of discussions…
Example of Human Aspects
Beam accident extracting high intensity beam in 2004 from the SPS injector by which vacuum chamber was damaged
Noise on temperature sensors induced by the beam caused magnet interlock, stopping the magnet power converter
Error in the protection logic: Magnet power converter was stopped before inhibiting extraction
No clear procedures what to do: the experiment was continued without sorting out the problem
No clear responsibility: several people were in charge at the same time and nobody said ‘stop’
Created a lot of awareness of potential problems for the LHC
LHC Strategy presently under Discussion
• How to change Beam Loss Monitor thresholds & masking of signals
  – Thousands of values: avoid errors
  – Are they correct when put in for the first time?
  – Who is allowed to adapt the thresholds?
  – What will be the procedures?
• The Post Mortem Analysis of the Beam Dumping System indicates a fault
  – What are the procedures to recover?
  – Who can give the ‘ok’ again?
  – “The same problem happened last month; after 1 day of testing we just continued. We are near the end of the physics run of this year…”
• Who is in charge? Will there be a group of ‘safety experts’ and what will be their role?
Conclusions
• Safety and Reliability has become an accepted topic for high power accelerators
• The LHC has a coherent Machine Protection System following interdisciplinary work for almost 20 years
• Producing dependability numbers is very time consuming and the result depends largely on the model assumptions
• However the benefits are that
  – The weak links can be shown
  – Designs have been adapted accordingly
  – Awareness has been raised
• On paper the numbers look good, but testing is required during installation, cold check-outs and operation with beam
• Procedures during normal operation: checks required almost continuously to confirm the redundancy of the systems
• Procedures in case an abnormality is detected
  – Who is responsible in the control room?
• Organisational issues will be important
  – Enforcing procedures / exceptions