

Safety Informing Design

Barry Kirwan
Eurocontrol Experimental Centre, Bois des Bordes BP 15, Brétigny-sur-Orge, F-91222 CEDEX, France

[email protected]

Abstract

Evidence suggests that the roots of many accidents occur at the early system design stages. This paper gives an account of one organisation’s attempt to consider safety at the concept exploration or research stage – very early on in the design process. The industry of concern is Air Traffic Management, and the nature of the work involves the development of new concepts to make ATM more effective (e.g. handling more traffic in line with increasing public demand) while maintaining or improving safety. The scope of the safety work at this early stage is more qualitative than quantitative, focusing in particular on hazard and human error assessment, and gaining safety insights from real-time simulations. Examples are given of the detailed safety approach. The safety management framework and organisational commitment necessary to sustain and channel such activities and safety results are also discussed.

Keywords: Safety, safety assessment, human error, air traffic management, safety management

1. Introduction

1.1 The ATM Context

Air Traffic Management (ATM) involves air traffic controllers in control towers and air traffic control centres controlling civil air traffic through a sophisticated network of airspace, using radar and voice communication as the primary media to know where the aircraft are, to help them get to where they are going, and to keep the individual aircraft apart.

The fact that ATM does this so well has earned it a reputation as a ‘high reliability’ operation, with very few accidents. The accident rate in European aviation due to civil air traffic management is approximately 1.6 × 10⁻⁸ fatal accidents per aircraft flight hour, which equates to 0.6 accidents per year given current air traffic levels. ATM as a direct contribution accounts for only about 4% of all civil aviation accidents; most such accidents are therefore caused by a host of other events outside the current remit of air traffic management. ATM is therefore very safe compared to other industries and other modes of transport. However, to put this into perspective, there have been three recent fatal accidents with ATM causal contribution in Europe since 2000: a runway collision at Paris CDG killing one person (flight crew), a runway collision at Milan Linate Airport killing 116 people (ANSV, 2001), and a mid-air collision over Germany (Lake Constance) in Swiss-managed airspace killing 74 people (BFU, 2004).
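As a rough cross-check, the quoted rate and the quoted annual accident count together imply an annual traffic volume. The short sketch below is illustrative arithmetic only: the variable names are not from the paper, and the flight-hour figure it derives is an implication of the two quoted numbers, not a statistic reported by the author.

```python
# Figures quoted in the text.
FATAL_ACCIDENT_RATE_PER_FLIGHT_HOUR = 1.6e-8
ACCIDENTS_PER_YEAR = 0.6

# Derived here (not quoted in the paper): the annual flight hours
# that make the two quoted figures consistent with each other.
implied_flight_hours_per_year = (
    ACCIDENTS_PER_YEAR / FATAL_ACCIDENT_RATE_PER_FLIGHT_HOUR
)

# Average interval between ATM-related fatal accidents at that rate.
mean_years_between_accidents = 1.0 / ACCIDENTS_PER_YEAR

print(f"{implied_flight_hours_per_year:.3g} flight hours per year")
print(f"{mean_years_between_accidents:.2f} years between accidents on average")
```

On these figures, one ATM-related fatal accident would be expected roughly every year and a half to two years, which is why three accidents in a short period stand out.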

Figure 1 shows that for some time ATM-induced accidents were rare, but with these three recent accidents there is cause for concern – either because a trend is perhaps occurring (it is too early to say statistically), or because three such accidents in a short space of time is simply unacceptable. These accidents have indeed triggered significant additional safety efforts aimed at reinforcing the safety of the current European ATM system (Eurocontrol, 2003). There is therefore no room for complacency in ATM safety, especially since the number of passengers and aircraft flying is increasing, and is set to double in the next 10-15 years. This capacity increase means that ATM as an industry must work hard to maintain its current level of safety, and even harder to improve on that level.

Figure 1 – European ATM accident occurrences
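The capacity argument can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that flight hours scale in proportion to traffic and that growth is at a constant yearly rate; neither assumption is stated in the paper, and the function names are hypothetical.

```python
# Figures quoted earlier in the text.
QUOTED_RATE = 1.6e-8             # fatal accidents per flight hour
QUOTED_ACCIDENTS_PER_YEAR = 0.6


def annual_growth_rate(doubling_years: float) -> float:
    """Constant yearly growth rate that doubles traffic in `doubling_years`."""
    return 2.0 ** (1.0 / doubling_years) - 1.0


def required_rate(target_accidents_per_year: float,
                  flight_hours_per_year: float) -> float:
    """Accident rate per flight hour needed to hit a yearly accident target."""
    return target_accidents_per_year / flight_hours_per_year


# Assumption: flight hours double when traffic doubles.
hours_now = QUOTED_ACCIDENTS_PER_YEAR / QUOTED_RATE
hours_doubled = 2.0 * hours_now

# If the per-hour rate stays the same, the yearly accident count doubles.
accidents_if_rate_unchanged = QUOTED_RATE * hours_doubled

# To keep the yearly accident count where it is, the per-hour rate must halve.
rate_to_hold_accident_count = required_rate(QUOTED_ACCIDENTS_PER_YEAR,
                                            hours_doubled)

for years in (10, 15):
    print(f"doubling in {years} years: {annual_growth_rate(years):.1%} per year")
print(f"accidents/year at unchanged rate: {accidents_if_rate_unchanged:.2f}")
print(f"rate needed to hold the count:   {rate_to_hold_accident_count:.2e}")
```

Under these assumptions, merely holding the rate per flight hour constant would still mean more accidents per year, which is one way of reading the statement that the industry must work even harder to improve on its current level.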

ATM in Europe is also undergoing fundamental changes, partly to adapt to the increasing capacity demands, and partly to remove the in-built hindrances to an already complex and near-saturated system. New designs and approaches have therefore been conceived to enhance ATM system performance. Some of these aim to support the controller, the human responsible for maintaining a safe, orderly and efficient flow of traffic, whether in a control tower, or an en route air traffic control centre, via new sensors, procedures, displays, warning devices, and other tools to support efficient communication and control of traffic. Other approaches are concerned with how the airspace is organized and managed, in particular moving away from traditional national boundaries to ‘functional airspace blocks’ which are based instead on predominant traffic flows and routes. Rather than having forty or so different countries each with their own style of procedures and ways of working traffic, and each with their own national rules and regulations, the aim is for a ‘Single Sky’ for Europe, with more collaborative approaches and harmonisation of air traffic control practices and procedures. Therefore, the whole ATM system in Europe is undergoing a paradigm shift, with key dates for its transformation being the 2012 – 2017 period, when many new tools, procedures, and practices will be implemented. Such a change offers both challenges and opportunities for safety, and of course for the designers of this new ATM system paradigm.

1.2 The Design Process & Safety

1.2.1 Eurocontrol¹

Eurocontrol arose some time ago as an organisation from a European Convention, and from a realisation that different member states needed to have a body working for the common interest. Eurocontrol is therefore non-profit making, and does not produce ATM ‘products’: it is not a manufacturer. Eurocontrol’s mission is the development and harmonisation of the European ATM system. Hence it aims to show the way, to help different European States towards mutually beneficial consensus, and also to promote best practice in ATM. At the kernel of its mission is safety, since ATM is centrally concerned with the travelling public.

Eurocontrol headquarters is in Brussels; it has its own air traffic control centre in Maastricht, a controller training centre in Luxembourg, and a research centre in Brétigny south of Paris, France, where real-time simulations or ‘experiments’ are carried out on future ATM concepts.

¹ EUROCONTROL is the European organisation for the safety of air navigation. This civil and military organisation, which currently numbers 38 member states, has as its primary objective the development of a seamless, pan-European ATM system.


1.2.2 Eurocontrol and Design

At the Eurocontrol Experimental Centre (EEC), the research arm of the Eurocontrol Agency, a number of future concepts for ATM are being explored, both for the mid-term (2012 – 2017) and beyond (e.g. 2025). Typically, concepts are explored together with interested European Member States as primary stakeholders, until such concepts are either discarded or stabilised for further development and ultimate deployment. The further development is carried out by the main body of Eurocontrol, whilst final development and deployment is passed over to the stakeholders in industry. This handover from the EEC to Eurocontrol in general, and from Eurocontrol to its stakeholders, is represented in figure 2.

In practice however it is not such a ‘clean’ picture: some concepts stay at the EEC for a long time, reaching a relatively advanced stage of maturity before being considered ready for development towards industrial systems or tools to help the controllers do their job better. Thus, sometimes tools may be developed to the stage that they can even be tested in real ATM centres in Europe under closely controlled conditions (usually called ‘shadow-mode’ trials). However, such tools will still be handed over to other stakeholders at a certain point in order to carry out formal requirements engineering and software building etc. to generate true products that could be licensed to work in such ATM centres. This is to say that whilst the safety approaches embodied in this paper are aimed at early concept design, sometimes the examples herein will appear far beyond the ‘concept’ stage. This state of affairs is not unique to the ATM industry, but may contrast with certain others such as nuclear power and offshore petrochemical, where the handover from concept to detailed design is likely to be a more black-and-white picture with clear handover points.

New ATM concepts are developed and designed at the EEC, whether these are ideas on how to improve the capacity of the whole ATM system via better procedures, or tools to help the controller handle more traffic effectively, or warning tools to support safety. Once developed, these new concepts are tested either in small-scale simulations with real controllers to gain early feedback and improve the concept (figure 3), or in large-scale simulations to test a more mature concept with a larger and more representative sample of controllers from various member states.


ANSP: Air Navigation Service Providers
EEC: Eurocontrol Experimental Centre
FHA: Functional Hazard Assessment
H: Human
HW: Hardware
PISC: Pre-Implementation Safety Case
POSC: Post-Operational Safety Case
PSSA: Preliminary System Safety Assessment
STD: Standardisation
SW: Software

Figure 2: EEC design and safety assessment

Figure 3 – Small-scale simulation investigating a new ATM concept at the EEC

The people developing these new concepts come from a range of backgrounds, including a number of controllers from different national Air Navigation Service Providers (ANSPs) in many European countries and, relevant to this paper, a small team of safety assessors.


1.2.3 ATM & Safety Assurance

ATM safety assessment is mandated in European member states by the ESARR 4 (Eurocontrol Safety and Regulatory Requirement 4) guidance on risk assessment (Eurocontrol 2003b). This states that any new systems or significant changes must not lead to exceeding the agreed target level of safety (TLS) for ATM in Europe, which is currently set at 1.55 × 10⁻⁸ accidents per flight hour. The means of demonstrating compliance with this TLS is via a three-stage safety assessment approach, as shown in figure 4.

The first of these stages, Functional Hazard Assessment (FHA), entails a consideration of the potential hazards of the new design. The second stage, Preliminary System Safety Assessment (PSSA), entails a detailed qualitative and quantitative demonstration that the system is tolerably safe (i.e. within the TLS). In particular this stage identifies the safety requirements for the proposed system architecture, specifying in effect what will keep the system safe, and the required integrity of the safety attributes and properties of the system, including its human elements. The third stage, System Safety Assessment (SSA), ensures that the system, as implemented, achieves tolerable safety, i.e. is still compliant with the safety objectives, requirements and integrity levels, and is therefore safe to go live. This includes the implementation, verification and monitoring of risk mitigation measures, together with demonstrating that the level of risk has been reduced as low as reasonably practicable (ALARP), i.e. all the measures have been taken to reduce the risk unless their cost is grossly disproportionate to the reduction in risk they achieve. The ultimate deliverable is the safety case itself, where the service provider assembles and documents evidence that justifies the adequacy of the safety provisions at his facilities.

Figure 4: Three-Step Safety Assessment Process (ESARR 4 Safety Assessment Methodology [SAM])
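The quantitative side of the PSSA can be pictured as a budget check against the TLS. What follows is a minimal sketch under stated assumptions, not the ESARR 4 Safety Assessment Methodology itself: the hazard names, frequencies and conditional probabilities are hypothetical, contributions are simply summed (a rare-event approximation), and the whole TLS is treated as available to one change, whereas in practice only a share of it would be apportioned.

```python
from dataclasses import dataclass
from typing import Iterable

# Target level of safety quoted in the text (accidents per flight hour).
TLS_ACCIDENTS_PER_FLIGHT_HOUR = 1.55e-8


@dataclass(frozen=True)
class HazardContribution:
    """One hazard's assumed contribution to accident risk (illustrative)."""
    name: str
    hazard_rate_per_flight_hour: float   # assumed frequency of the hazard
    p_accident_given_hazard: float       # assumed conditional probability

    @property
    def accident_rate(self) -> float:
        return self.hazard_rate_per_flight_hour * self.p_accident_given_hazard


def total_accident_rate(hazards: Iterable[HazardContribution]) -> float:
    """Sum the contributions (valid only while all rates are very small)."""
    return sum(h.accident_rate for h in hazards)


def within_tls(hazards: Iterable[HazardContribution],
               tls: float = TLS_ACCIDENTS_PER_FLIGHT_HOUR) -> bool:
    """True if the summed accident rate does not exceed the target."""
    return total_accident_rate(hazards) <= tls


# Hypothetical hazards for a new controller tool.
example_hazards = [
    HazardContribution("conflict alert missed", 1e-5, 5e-4),
    HazardContribution("incorrect clearance issued", 2e-6, 1e-3),
]

print(f"total: {total_accident_rate(example_hazards):.2e} per flight hour")
print("within TLS:", within_tls(example_hazards))
```

A check of this kind only becomes meaningful once hazard frequencies can be estimated with some confidence, which is part of why the early-concept work described later in the paper stays largely qualitative.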


In practice, this explicit and pro-active safety assurance approach is fairly new to ATM, and is still undergoing refinement. The three safety assessment steps have only been applied in total to a few ATM systems within Eurocontrol at the time of writing, although a number of future system elements have passed the FHA and PSSA stages, and many more are entering this overall assessment process, whether such assessments are carried out by Eurocontrol or member states. Additionally, because the approaches are relatively new to ATM, the actual means of compliance, the techniques applied to achieve the three steps, are varied, and still to an extent finding their way. This is because at the time ESARR 4 was developed, there was no off-the-shelf assessment system and associated toolkit sitting waiting to be used for ATM, although considerable information already existed. Also, although ATM has borrowed approaches from other industries where for example Probabilistic Safety Assessment (PSA) is more commonplace and mature (e.g. the nuclear and oil and gas industries), ATM has some notable differences which mean that such borrowing is insufficient. Four of these intrinsic differences are noteworthy, both for the designer and the safety assessor:

There is high reliance on the human, both the controller who makes critical decisions every minute (critical in that if they are wrong an incident may well occur), and the pilot of the aircraft being controlled.

The situation is live and highly dynamic, and events can escalate rapidly, with a shift from normal operations to a serious event and risk of accident often within minutes or even seconds, rather than tens of minutes or hours as in many other industries.

ATM is a truly open system, literally a global system, so that problems in one area can impinge on other areas and other transport sectors (e.g. an event causing an airport to close, particularly a large airport, will cause significant perturbation to a very large segment of the airspace and will increase the loading at other airports and cause increased demand in other transport domains (principally rail and road) that must attempt to take up the load in terms of people travelling). This means that the system elements are highly interactive and coupled to other elements of the system and outside its intended boundaries of operations and responsibility. Consequently, since aviation system designers do not (and cannot) attempt to construct the whole system simultaneously at a detailed level, reductionism – breaking the overall system down into components - is generally applied. However, this obviously requires the subsystem interactions to be well-understood and possible operational and safety consequences of such interactions to be determined. Design and safety assessment therefore have to work in this increasingly complex system environment.

There is no easy way to shut down the system quickly. There is not the equivalent of an emergency shutdown function, since the aircraft have to land in order for the system to reach an intrinsically safe shutdown state.

Many of the current developments within Eurocontrol cut across traditional divisions within aviation, i.e. they impact airworthiness, route spacing standards, ground systems, space infrastructure, procedure design, etc. This in particular exposes projects to the full and varied set of target levels of safety.

Nevertheless, ATM has borrowed significantly from other industries, and its safety technique toolbox looks similar to other technologies, except for its extensive use of real-time simulations when evaluating new concepts such as new airspace designs, controller tools, etc.

1.2.4 Safety Assurance & Concept Design: Safety Opportunism, Design Enhancement

Concept exploration and associated research is typically the province of the EEC (other member states and research centres do of course explore concepts independently of Eurocontrol). Once concepts are mature and are taken on as part of the developing programme of work for implementation, they will be subject to a safety assessment process as dictated by the appropriate safety regulatory authority (e.g. compliant with ESARR 4), as would be expected in any mature industry. However, before such maturity, when concepts are not stable but are still being explored, there is still the opportunity to consider safety, even at this very early stage. Yet traditionally this has not been done in any formal way, either in ATM or in other industries with a more mature and explicit safety assessment process.

The two papers by Kinnersley & Roelen and Taylor in this special issue have shown that accidents often have their roots in the design process. This appears to be a common fact across a range of different industries. Furthermore, it is clear that the roots of accidents are sometimes at the early design stages. But there are several problems with trying to address safety in the concept exploration or concept research phases:

There is often little detail on the procedures or controller working practices proposed for the concept. This amounts to a lack of a mature operational concept, one that is sufficiently detailed to allow safety hypotheses (e.g. ‘what would happen if…?’) to be answered (other than – ‘well, it depends how we operate it’) or even asked.

The way in which the specific research concept is developed (e.g. a new tool for the controller or a new means of controller-pilot communication) will interface with other parts and dimensions of the system (e.g. other tools and airspace design and procedural constraints) that may not yet be known or developed. Research often explores new system elements or element replacements rather than entire system architectures, and work on integration into the full system concept will come later. This makes consideration of hazards due to interactions with other parts of the system, and determination of the impact of the new system element on the overall ATM target level of safety difficult, to say the least.

Safety assessment of new concepts requires incorporating expert judgements where data are not available or not representative. This requires identifying hazards that might have never occurred in the past. Moreover, experience has shown that it can be difficult to find experts with sufficient ATM expertise, who are also able to relate to a risk model.

People in aviation have a legendary ‘can-do’ attitude, which contributes to the success of the industry, but some people may have difficulty admitting that something ‘can’t’ or ‘should not’ be done; that the margin has been cut too short. This can apply to controllers involved in the design process as domain experts.

The people developing new concepts are trying to find better ways of optimising the system, and do not necessarily want to be too constrained at an early stage with burdensome safety assessment procedures and processes. Indeed, some promising concepts could be deterred at an early stage by too much safety stringency, whereas perhaps later on the safety issues could be resolved, and the positive benefits of the proposed system change could still be realised.

The question is therefore one of whether there is value in doing safety at an early concept research stage. The potential advantages, however, are significant:

1. Since accidents often have their roots in design, the sooner safety starts the better. In particular, hazards or hazard causes identifiable early on may become more difficult to find or correct later, with the risk that they become latent errors in the system design.

2. An early involvement, one that is not too stringent or constraining for the designers, will lead to designers also thinking about safety from the start, rather than thinking it is something that comes later, and not their job or concern. It will help to avoid ‘bolt-on’ safety or safety fixes, and should lead to safety becoming a more integral part of the concept. This requires safety personnel to be present and active and serve as a channel for the voicing of safety concerns and aiding system designers to build safety recommendations into the design architecture and design detail.

3. At the very early stages, the greater degrees of freedom can lead to the discovery of safety opportunities, i.e. ways of making the overall ATM system safer, whether or not the new system element is adding hazard potential to the ATM system. Therefore, at such an early stage there can be added safety value, either by building in certain properties or checks in the proposed system element, or via realisation of joint safety benefits from partnerships in different developing concepts.

4. Safety culture can be enhanced by early consideration of safety in the design process. Not only do the designers become more exposed to safety and its mission and practices, but other stakeholders, from project managers to controllers taking part in early simulations, realise that safety is being addressed in a useful way, reinforcing the importance of safety for all concerned, and its continual presence throughout the entire system life cycle.

5. Some potential ‘show-stoppers’ related to safety can be identified early on², when there is time enough to try to derive thorough protection mechanisms, rather than having to rely overly on controllers saving a flawed system concept when handling live aircraft.

1.2.5 Commitment from the Top

Since there are clearly potential advantages, and because the management of the EEC decided that safety was sufficiently important to investigate it early on in the new developing ATM paradigm, a safety policy was developed for the research centre. The main content of the EEC safety policy is shown in figure 5. What is notable about the safety policy is firstly that it exists for a research and development organisation, which is unusual. Secondly, it makes it clear that safety is the responsibility of all. At the EEC it is mostly technical people who are involved in concept exploration and who are effectively doing design via research – they are not safety people. Thirdly, it clearly states the need to build safety into design at an early stage. Although a safety policy is ‘just another piece of paper’, its role should not be underestimated. It acts as an anchor point for the safety culture of an organisation. Although it guarantees neither commitment of resources, nor an insightful safety-design partnership, it does send a signal through the organisation, and gives the safety people and at least the willing designers a mandate to work together on producing safe concepts.

What is especially relevant to this paper and to designers and safety assessors alike, is point 6 in the first part of the safety policy – ‘Proactive approach to safety benefits’. The point is not only to make sure that the new concepts do not add any new risk – this should always be a given – rather, the aim is to look additionally for safety opportunities, i.e. ways of increasing safety. This is a recurring theme in this paper, and is believed to be one hallmark of a positive safety culture.

1.2.6 From Policy to Practice

The question then becomes one of how to enact such a policy, and make it work. This can be tested via three questions:

Can an approach be evolved which fits concept design/exploration activities?

Can such an approach lead to tangible safety improvements?

Are those involved in concept development and the recipients of safety effort convinced of the utility of the approach?

² As an example, a particular controller interface design contained a ‘waste-paper basket’ function, for deleting information, some of which was critical. However, it was too easy to accidentally and permanently delete active critical information, which could lead to dangerous occurrences. Therefore there needed to be either an ‘undo’ function, or else access to the ‘bin’ in case information needed to be retrieved.


The last question arises because safety at this stage of design or conceptualisation is not a regulatory requirement. Even with a clear safety policy and procedures, design personnel will not spend time on activities they believe are not adding value to their concept, and cannot be forced to do so. There is therefore the requirement for persuasion, the best form probably being via example and involvement, so that actual benefits are seen first hand. Nevertheless, if the answer to any of these questions is ‘no’, then the approach will either not work or will not lead to a sustainable safety-in-design process. The remainder of this paper therefore sets about showing how the approach works in practice, and formulates answers to these questions.


Figure 5: Extracts from EEC Safety Policy for Concept Exploration (EEC, 2004)

EUROCONTROL EXPERIMENTAL CENTRE SAFETY POLICY

SAFETY CHALLENGES AND FUTURE ATM SYSTEMS

To cope with the future expectations and objectives for the overall Air Transport System, ATM is evolving towards a unique Pan-European, fully inter-operable and integrated ATM system supporting the Single European Sky. The EUROCONTROL Experimental Centre (EEC) together with its Stakeholders aims at achieving this vision for future ATM. Because there is so much change in the landscape envisaged, a significant, integrated and explicit safety effort is required.

OUR SAFETY MANAGEMENT SYSTEM

Working under the Agency-wide Safety Policy, the EEC is committed to operate a Safety Management System, which ensures:

1. HIGHEST PRIORITY FOR SAFETY
Achieving future safety objectives is afforded the highest priority with respect to commercial, operational, environmental or social pressures;

2. LEADERSHIP AND COMMITMENT
Leadership commits necessary and sufficient investments in safety resources to continuously improve safety performance;

3. RESPONSIBILITY
Our staff is responsible for improving safety;

4. FUTURE ATM SAFETY
There is an increase in safety along with the implementation of the future ATM system;

5. SAFETY BUILT IN DESIGN
Safety activities will accompany the research, development and industrialisation phases of future ATM Systems;

6. PRO-ACTIVE APPROACH TO SAFETY BENEFITS
Safety in design activities will pro-actively identify areas where safety benefits can be achieved; and

7. SAFETY WITH OUR STAKEHOLDERS
Work in attaining future safety requirements is co-ordinated with all our Stakeholders.


Figure 5 – continued

OUR SAFETY MANAGEMENT APPROACH

Therefore, the EEC is committed to:

1. R&D TO INCREASE SAFETY
Carry out safety R&D and safety evaluation of R&D to enable full scope assessment of the future ATM system thereby reducing ATM-related safety risks in the face of rising air traffic demand. Safety R&D addresses the need to adopt, develop where needed and maintain appropriate Safety Assessment methods that help R&D to predict potential safety performance of proposed ATM enhancements;

2. REQUIRED SAFETY RESOURCES
Plan required resources to assess the changes in the aviation landscape and their implications for safety;

3. SAFETY TRAINING AND COACHING
Conduct safety training programmes and provide coaching designed to teach, motivate and sustain safety performance;

4. HOLISTIC APPROACH TO ATM SAFETY
Develop an Integrated Risk Model showing the relative safety contributions of different systems as part of the future ATM vision thereby ensuring that when put together, all aspects remain safe and resilient;

5. SAFETY ASSESSMENT AND SAFETY ASSURANCE
Carry out safety assessment and assurance activities on all aspects and elements of the future ATM system;

6. EFFICIENT APPROACH TO SAFETY
Achieve safety benefits in the most cost-efficient manner and in the shortest time possible; and

7. SAFETY PROMOTION
Disseminate in an efficient and timely manner safety information and lessons learnt to all stakeholders involved.

2. Safety Assessment Approach for ATM Concept Design

The safety approach developed has been adapted to the way that design and concept exploration are carried out at the EEC. For some time this has been through a loose aggregation of concept elements. For example, there may be a project for a new tool to help tower controllers see where aircraft are even at night or if obscured by buildings or other aircraft. There may be tools developed to maximise runway throughput, and tools to help controllers see when aircraft may stray into each other’s paths. Other developments are more like services, e.g. the ability to send messages to aircraft electronically, like current mobile telephone SMS messages, rather than always having to rely on voice communication via radio-telephony (RT) and having to wait until an RT channel is clear before a new message can be transmitted. Such new concepts are clearly good ideas, but they are to an extent piecemeal, each being an addition to the system that could probably be added on its own. Alternatively, a more ‘macro-concept’ can be developed which integrates a number of such functions to work together more coherently, delivering a major benefit in terms of capacity, efficiency, or safety. In ATM, there is work ongoing at both the concept element level and the overall system concept level.

Safety assessment at the EEC has adapted to this two-level situation by adopting safety techniques that are ‘bottom-up’, and more strategic tools that are ‘top-down’. These two major arms of the safety approach are called respectively SAND (Safety Assessment for New Designs) and the Integrated Risk Picture (IRP). These are outlined separately below, as both have much to offer safety and design personnel. This paper first presents SAND, because in practice bottom-up safety is easier to grasp than the top-down version, especially since several of the SAND approaches are used by IRP. Most emphasis is on SAND, since this has been in operation longer than IRP, which is still under development.

The ‘bottom-up’ (SAND) approach that has been developed up to this time is largely qualitative rather than quantitative, although this may change in the future. This is because at the very early stages attempts to quantify, for example, the impact of a new ATM tool on the target level of safety are likely to be imprecise and therefore not reliable for decision-making. However, hazards that are identified can nevertheless be evaluated initially in terms of their risk levels using generalised categories of severity and frequency, so that their significance and need for amelioration and/or mitigation can be assessed and addressed by the project team. Ultimately, however, this requires rationalising and aggregating the different hazards into a risk model (such as a fault tree). This is the future connection that will be made between SAND and IRP (discussed later in section 2.2).
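As a minimal sketch of this kind of initial evaluation, the Python fragment below bands hazards using generalised severity and frequency categories. The category names, the additive scoring rule, the band thresholds and the ratings given to the two example hazards are all illustrative assumptions; none of them is taken from SAND, IRP or ESARR 4.

```python
from dataclasses import dataclass

# Hypothetical ordinal categories, lowest to highest (assumed for illustration).
SEVERITY = ["negligible", "minor", "major", "hazardous", "catastrophic"]
FREQUENCY = ["extremely rare", "rare", "occasional", "frequent"]


@dataclass(frozen=True)
class Hazard:
    description: str
    severity: str
    frequency: str


def risk_score(hazard: Hazard) -> int:
    """Sum of the ordinal positions of severity and frequency."""
    return SEVERITY.index(hazard.severity) + FREQUENCY.index(hazard.frequency)


def risk_level(hazard: Hazard) -> str:
    """Map the score onto a coarse band (thresholds are assumptions)."""
    score = risk_score(hazard)
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


def prioritise(hazards):
    """Return hazards ordered from highest to lowest score."""
    return sorted(hazards, key=risk_score, reverse=True)


# Example hazards with hypothetical ratings.
hazards = [
    Hazard("Target aircraft difficult to select", "minor", "frequent"),
    Hazard("Instruction sent to wrong aircraft", "hazardous", "occasional"),
]

for h in prioritise(hazards):
    print(f"{risk_level(h):6s} {h.description}")
```

A qualitative ranking of this sort only orders hazards for attention by the project team; it is not a substitute for the aggregated risk model mentioned above.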

2.1 SAND: Safety Assessment for New Designs

The overall philosophy of approach is shown in figure 6 [Eurocontrol-FAA AP 15, 2004], which is a generalised process model of safety assurance; one which can be used to address design processes as well as operating systems. Notably, it explicitly includes feedback loops and hence safety learning. As will be seen below, this aspect became a key element of the approach for safety at the concept exploration stage. This stage also does not imply the absolute need for quantification as used in some industries, although as already mentioned, ESARR 4 does require this at present for mature designs and existing systems undergoing safety assessment.

Figure 6: A Generalized Seven-Stage Safety Assessment Process

1. Scoping the assessment
2. Modelling the nominal system
3. Identifying hazards
4. Combining hazards into a risk framework
5. Evaluating risk
6. Supporting risk mitigation
7. Confirming actual risk is tolerable or reducing

Diagram labels: Iteration; Feedback/Forward to Operations, Assessment, and Design

The ‘instantiation’ of the process embodied in figure 6 for concept exploration in ATM is shown in figure 7. This diagram includes more general aspects of the currently developing Safety Management System (italicised in figure 7: Safety policy; Safety Plan; Safety Information Data Exchange System), which are not assessment activities, but are shown for completeness. The link to IRP is also represented, though IRP is not part of SAND (see later in section 2.2).

The following sub-sections describe the individual SAND elements.

Figure 7: Safety Approach for New Designs (SAND)

Diagram elements:
Safety policy – commitment of EEC to safety in its ‘products’ in its three main ‘research areas’
Safety Plans: Scoping the safety needs for EEC research projects
System representation – development of operational scenarios and task analyses
Hazard identification
Review of incidents via SAFLEARN
Formal hazard identification techniques: SWIFT, HAZOP, TRACER, HF Case
Hazards identified from early simulations (SAFSIM)
Risk evaluation: severity and frequency estimation
Identification of safety requirements: means by which to assure safety
Hazard logging and documentation; feed-forward to other concepts and total system safety
Integrated risk picture project – determining the safety benefits and ‘costs’ of each project in the larger ATM system safety picture
Cross-boundary HAZOP – looking at hazards from interacting elements
SIDES: Safety Information Data Exchange System for stakeholders: feed forward hazard evaluations and safety requirements to FHA & PSSA stages of viable concepts; safety monitoring requirements for future concepts when implemented
Live trial HAZOP of concepts being tested as prototypes in the field

2.1.1 Safety Plans

At the EEC there are three Research Areas particularly focused on developing systems for the 2012 – 2017 operational ATM concept. These three research areas are concerned with Airports, Sector³ Design and Tools, and Airspace Network Capacity and Demand. Each of these research areas has an over-arching design for 2012, but within that design there are individual projects or work packages dealing with the development of controller tools, procedures, and airspace design and traffic management concepts. A safety plan now exists for each of these research areas, and each research area has a safety manager to organise its safety activities. The safety plan explains what needs to be done, and is signed by the safety manager and by the research area manager. Typically it is a 50-page document, but its essentials can be condensed into a table of safety activities planned for each project. As an example, table 1 below shows the preliminary safety activities planned for the ‘airport’ projects in the EEC, with concept elements in the rows and safety techniques in the columns. The techniques are explained in the subsections of 2.1.3.

Table 1: Provisional safety activities planned for EEC Airport R&D concept elements⁴

Columns: ATM process/procedure; SWIFT; HAZOP; SAFLEARN; HF Case; Cross Boundary HAZOP; SAFSIM; Live Trial HAZOP

Reduced wake vortex separation – cross wind departures: √ √ √ √ √
Time-based separation: √ √ √ √ √ √ √
‘OPTIMAL’ procedures - curved segmented approaches: √ √ √ √ √ √ √
ASMGCS - control & planning functions: √ √ √ √ √ √ √
Integrated Tower CWP: √ √ √ √

The Safety Plan also gives resource estimates and a timescale for carrying out the safety activities, and acts as a ‘living document’, updated as required and as safety evidence accrues for the various projects, or as new project requirements arise.

³ A sector is a geographical piece of airspace controlled by two air traffic controllers who manage traffic through their sector. The whole airspace is divided up into a large number of neighbouring sectors.

⁴ Wake vortex is the turbulence left behind an aircraft (its ‘wake’). When there are cross winds at an airport, wake vortices are dispersed faster, thus potentially allowing aircraft to land sooner behind each other. Time-based separation also refers to new procedures to increase runway landing frequency. OPTIMAL allows for more efficient airport approaches for certain aircraft. ASMGCS (Advanced Surface Movement Guidance and Control System) helps the tower controllers know where all aircraft are, night or day. CWP (Controller Working Position) is a new enhanced workstation for the future tower controller.

2.1.2 Concept Representation: Operational Scenarios and Task Analyses

It is usually necessary to analyse the new concept element in the context of an operational scenario rather than purely functionally, since the functional architecture is sometimes not yet fully specified. A primary safety focus is on the human element, which must be considered in a meaningful (i.e. operational rather than abstract) context. The degree to which this can be achieved depends on the design stage, i.e. the maturity of the design concept. There is therefore a trade-off here: the more advanced the design, the more the scenarios and operator interfaces can be described, but the harder it is to make or justify safety-related design changes.

Two approaches are used, operational scenario definition (context definition), and task analysis (see Kirwan and Ainsworth, 1992; Callan et al, 2004). A much-simplified example of each is shown in figures 8 and 9 below. Figure 8 represents an airport terminal manoeuvring area (in this case Paris Orly), where aircraft are approaching the airport and preparing to land. The figure is a simplified schematic, but enough for operational controllers familiar with Orly to use as a basis for hazard identification discussions in a HAZOP (see section 2.1.3.2 below). Figure 9 is a corresponding simplified task analysis for ‘airborne separation assurance procedures’, a new concept wherein controllers will delegate certain tasks to pilots, in this case to merge behind other aircraft as they are manoeuvring in the airspace. This delegation can in theory optimise landing rates, thus enhancing capacity.

2.1.3 Hazard Identification

Hazard identification is the centrepiece of SAND, since at this stage the primary focus is on identifying hazards which could be of concern. Later on in the detailed design phase the focus shifts to more precise evaluation of the risks and system protection devices, and ultimately to assurance and demonstration that the system is indeed safe. But at this early stage, both in a design and a safety sense, the minds of those concerned are open, and are considering ‘what if..?’ type issues. Since hazard identification benefits from looking at the same problem from several different perspectives, more techniques are utilised at this stage to help ensure comprehensiveness of hazard identification. The hazard identification techniques used at this stage are each defined below.

2.1.3.1 SAFLEARN

SAFLEARN (Bonini and Joyce, 2004; Joyce and Bonini, 2005) is aimed at learning from past events to protect the future. Whereas accidents and incidents are routinely analysed for operational lessons in many European States, this currently only acts as a short-term learning cycle. Whilst this is important, there is a longer learning cycle that can be advantageous. The simple philosophy of SAFLEARN is that designers should be aware of those safety issues that have already occurred, so they can try to design them out of the next generation ATM system. The safety database created concerns the very few accidents that have occurred, and a plethora of near miss incidents, where standard safe separation between aircraft (either vertical or horizontal separation, or both) was lost due to controller or pilot error, technical failure, or environmental disturbances, or a combination of these factors. However, making designers aware of these issues is not as easy as it seems. Designers are no more routine readers of incident reports than safety professionals are routine readers of design magazines. Furthermore, it is not always apparent how a very specific safety-related incident (and incidents are always ‘specific’, with local factors) relates to a general design concept being explored.

Figure 8 – Safety Scenario: Airborne Separation Assistance System (ASAS) Procedure – ‘Heading then Merge’: Hypothetical scenario based on Paris Orly airspace

Context Definition Headings:
Communications, Navigation and Surveillance (CNS)/ATM capabilities – equipment available to the controller
Aircraft equipment – equipment available to the pilot
Traffic characteristics – traffic levels and aircraft type mix
Airspace design – airspace features in the scenario
Sector manning – organisation of controllers on each sector
Controller working position – equipment on the workstation
Separation – separation standards compared to today’s standards
Weather conditions – range of conditions applicable

Overall Description: Unlike today, when the controllers must continuously control all aircraft inbound to the airport, once inside the sector an aircraft may be instructed to identify an aircraft ahead of it (the ‘target’ aircraft), then to follow a particular heading, and then to merge behind the target, keeping a particular distance behind it. This reduces controller workload and allows the controller to optimise the landing rate.

Diagram labels: Airport; Sector boundary; Sector entry leg

Figure 9: Simplified extract from task analysis for ASAS (Airborne Separation Assistance System) scenario (heading then merge)

Columns: Task Step; Controller; Pilot Non-Flying (PNF) (e.g. Captain); Pilot Flying (e.g. First Officer)

Task step 1: First call aircraft B
Identification of Aircraft B
Announce to controller

Task step 3: Target selection for B
1. Instruction + mouse click
2. Read-back
3. Input code (cockpit display)
4. Visualisation and positioning to PNF (Navigational display)
5. Crosscheck of positioning (Navigational display)

Task step 4: Target identification for B
1. Target positioning to ATC
2. Target aircraft confirmation
3. Target validation (cockpit display)

Task step 5: Continue heading then merge for B
1. Instruction + click
2. Read-back instruction
3. Input instruction, waypoint and spacing value (cockpit display)
3bis. Maintain current heading
4. Check feasibility (onboard computer)
5. Instruction validation (cockpit display)
6. Monitor the acquisition of spacing (Navigational display)
7. Initiate direct to Waypoint
8. Crosscheck direct to Pilot Flying
9. Report to ATC
10. Maintenance of spacing by speed actions
11. Crosscheck of speed actions and monitoring the spacing

Sometimes general lessons are clear: for example, a common problem is when two aircraft on a controller’s radar screen have similar ‘call-signs’ (e.g. BAW 123 and BAW 223). In such a situation, a busy controller, perhaps through distraction or cognitive overload, may give the right command (e.g. to descend) to the wrong aircraft (this is called call-sign confusion). Such an error can lead to a loss of separation, depending on whether there is another aircraft already at the level below. New technology called ‘datalink’ could help avoid such errors, since this technology involves menu selection and mouse selection of the aircraft and automatic relaying of information to the aircraft, rather than a speech command from controller to pilot. However, other errors could of course occur with datalink. Mis-selection is still possible; typing or menu-selection errors could still occur, leading to similar unsafe outcomes. Therefore, the approach of SAFLEARN is to carry out more detailed assessments with project personnel, using the detailed incidents to try to determine how the new system element could overcome current safety disadvantages and vulnerabilities.
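The call-sign confusion example can be made concrete with a small sketch that flags pairs of call-signs differing in only a few character positions. This is purely illustrative: it is not a SAFLEARN or datalink function, the position-by-position comparison and the `max_diff` threshold are assumptions, and the call-sign AFR 456 is invented for the example.

```python
from itertools import combinations


def differing_positions(a: str, b: str) -> int:
    """Count the character positions at which two call-signs differ.

    Spaces are ignored; a shorter call-sign is padded so that every
    unmatched position counts as a difference.
    """
    a, b = a.replace(" ", ""), b.replace(" ", "")
    width = max(len(a), len(b))
    return sum(x != y for x, y in zip(a.ljust(width), b.ljust(width)))


def confusable_pairs(call_signs, max_diff=1):
    """Return pairs of call-signs that differ in at most max_diff positions."""
    return [
        (a, b)
        for a, b in combinations(call_signs, 2)
        if differing_positions(a, b) <= max_diff
    ]


# BAW 123 and BAW 223 come from the text; AFR 456 is a made-up call-sign.
on_screen = ["BAW 123", "BAW 223", "AFR 456"]
print(confusable_pairs(on_screen))
```

Visual or spoken similarity is more subtle than a character count, so a check like this would at best prompt discussion of which pairings deserve attention.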

An example of SAFLEARN in practice concerns the CORA (Conflict Resolution Assistant) project. CORA (figure 10) is an automated controller resolution assistant. It detects possible ‘conflicts’ between aircraft (i.e. where separation might be lost) and suggests alternative trajectories to avoid loss of separation. In the example in figure 10 CORA is integrated onto the normal controller radar screen. Each of the small white objects in the figure is an aircraft. CORA shows two lines that intersect, suggesting a conflict, and then another line involving a small deviation that will avoid loss of separation. The two embedded windows reflect the vertical picture (the bottom left window) since radar screens are primarily ‘plan view’ (i.e. view from above) in nature, and the urgency for executing a ‘resolving’ manoeuvre (window towards top right).

A SAFLEARN study for CORA started with around 100 incidents in a relevant en-route airspace⁵. Of these 100 incidents, approximately one third could have been avoided had CORA been present and used by the controllers. This first finding was very useful to the CORA team, since it already showed the safety value of the developing tool.

The second result from the SAFLEARN workshop (using SAFLEARN safety people together with the CORA design team) was that additional safety requirements needed to be taken on board to maintain this safety value and also to avoid introducing other problems. For example, CORA’s operational concept had not sufficiently taken account (from a safety perspective) of problems with military aircraft activities (with their fast climb and descent rates and rate of turn), and the use of CORA across sector boundaries needed further attention. Additionally, its usage during controller training (since final controller training is done ‘on the job’ together with an instructor handling live traffic) needed further consideration.

The third main ‘result’ from this SAFLEARN study was that there existed a significant safety opportunity, if the design of CORA could be linked to the design of datalink procedures and technology. At present CORA could reduce the number of losses of separation, but CORA and datalink together, if designed appropriately and synergistically, could almost double this risk reduction effect.

⁵ CORA would not be used near an airport – it would cause too many false alarms due to the high degree of manoeuvring near airports.

Figure 10 - CORA (Conflict Resolution Assistant) tool for aircraft conflict avoidance.

2.1.3.2 HAZOP & SWIFT

HAZOP (Hazard and Operability study - Kletz, 1974) is a well-known hazard identification technique and is now used in the offshore oil and gas and nuclear power industries, and more recently the ATM industry (see also the paper by Jagtman [10] on HAZOP for the automobile industry). HAZOP uses a group of experts led by a chairperson to consider hazards, their consequences, and possible mitigations. HAZOP has been used in ATM, for example, with the design of electronic strips, which replace the paper strips controllers have been using for decades to inform them and remind them of pertinent details of aircraft in or approaching the controllers’ sector of air traffic control. The typical HAZOP ‘guidewords’ are shown in table 2, and table 3 gives an indication of the kind of discussions that occur, adapted from an actual HAZOP transcript for electronic strips (Kirwan and Kennedy, 2001). Table 4 shows an example of ATM early design HAZOP output. In this example, prior to sending an electronic message to an aircraft, the controller must position the mouse target over the aircraft symbol on the radar screen. The first error is relatively minor; the second is more significant (e.g. it could result in an instruction for the wrong aircraft to descend).

Table 2 – HAZOP guidewords for ATM

Basic Guidewords: No action; More action; Less action; Wrong action; Part of action; Extra action; Other action; More time; Less time; Out of sequence; More information; Less information; No information; Wrong information

Additional Concepts: Purpose; Clarity; Training; Abnormal conditions; Maintenance; Safety
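To illustrate how guidewords of this kind drive a HAZOP session, the sketch below crosses one system function with each basic guideword to produce the deviations a chairperson might put to the group. The grouping of guidewords follows one reading of table 2, and the function name `deviation_prompts` and the wording of the prompts are assumptions made for this example, not part of any Eurocontrol tool.

```python
# Basic guidewords as read from table 2 (grouping is an interpretation).
BASIC_GUIDEWORDS = [
    "No action", "More action", "Less action", "Wrong action",
    "Part of action", "Extra action", "Other action",
    "More time", "Less time", "Out of sequence",
    "More information", "Less information",
    "No information", "Wrong information",
]

# Additional concepts from table 2, listed for completeness.
ADDITIONAL_CONCEPTS = [
    "Purpose", "Clarity", "Training",
    "Abnormal conditions", "Maintenance", "Safety",
]


def deviation_prompts(function, guidewords=BASIC_GUIDEWORDS):
    """Pair a system function with every guideword to form discussion prompts."""
    return [
        f"{function} / {guideword}: what could cause this, and what follows?"
        for guideword in guidewords
    ]


# The function name is taken from the HAZOP output in table 4.
prompts = deviation_prompts("Highlight object (aircraft label)")
for prompt in prompts[:3]:
    print(prompt)
```

The value of a HAZOP lies in the expert discussion each prompt triggers; generating the prompts mechanically only helps to ensure that no guideword is skipped.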

HAZOP has been found to be very useful in revealing credible hazards related to new concepts. It has three significant advantages. Firstly, when the safety assessor is trying to identify hazards when there is, as yet, no mature operational concept or procedures, this can be very difficult. There are simply too many open questions to make progress. However, in a HAZOP where controllers (and if appropriate pilots) are present, these experts can usually ‘fill in the gaps’, interpolating between current practice and future practice of how the system should operate. This means that hazards can be identified. The second and related advantage is that the composition of the HAZOP group, comprising project design and controller reviewers, often lends itself to identifying promptly design solutions that will increase safety. Once the hazard is identified, they will often determine an immediate solution via a system change or procedural modification (as in table 3). The third advantage is that such a process involves the project in safety discussions. If they really believe a hazard is incredible they will say so, but if not there is implicit acceptance that such a hazard must be addressed. The approach therefore gains ‘buy-in’ from the project team. They own the results as much as the safety assessors, rather than being given an ‘external’ safety assessment that is a ‘fait accompli’, which they have to accept only on the basis that they trust that the safety people knew what they were doing.

SWIFT (Structured What-If Technique) is a faster version of HAZOP and is used for cases where rapid hazard identification is needed, or where there is little time or resource to run HAZOPs. A typical HAZOP in ATM at the EEC lasts several days, though longer ones have occurred. This contrasts with full HAZOPs of mature systems in other industries, which may take weeks.

Table 3. Excerpt from HAZOP discussion (of electronic strip system design)

Columns: HAZOP Group Member; Discussion

Human Factors Specialist: If these two objects overlap, could the controller operate on the wrong object, i.e. aiming a message for the one on top but actually communicating with the one that is now flying beneath the top one?

Designer 1: Well, we expect the controller always to move the objects so that they won’t overlap, before transmitting a command.

Controller: Well, there might not always be time, but as long as you’re operating on the one on top, one coming underneath won’t be selected without it being clear, will it?

Designer 1: Hmm. Well, it isn’t impossible actually, depending on how long you leave the cursor without entering a command. Presumably that would be no more than a couple of seconds would it?

Controller 2: Not necessarily, I mean if I’m in the middle of something and then a higher priority call comes in, I’ll leave the cursor there and then come back to it. It could be a while, up to a minute.

Designer 2: Right. Okay, we need to take another look at this, and implement some way of highlighting that the original target has been de-selected and must be re-acquired, otherwise the right message could be sent to the wrong aircraft.

Chairman: So, are we agreed then that we need an action on the designers here to …?

Table 4: Extract of related ATM HAZOP output (Kennedy & Kirwan, 1999).

Function: Highlight Object (aircraft label)

Guide Word: No Action
Cause: Another item preventing access to target
Consequence: Difficulty in hooking target aircraft
Indication: No highlighting of target
System Defences: None
Human Recovery: Drag blocking object out of way; Strategic management of screen items
Recommendations: Design objects to ‘roll around’ each other; use Height filtering; Flip system to move between object on top and the one beneath; Highlight background

Guide Word: Other Action
Cause: Clustering results in different aircraft being highlighted instead of target
Consequence: Instruction may be given to wrong aircraft on the system
Indication: As Above
System Defences: Highlighting is colour coded to indicate direction of travel; Call sign is displayed on all menus
Human Recovery: As Above
Recommendations: As Above
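A HAZOP output of the kind shown in table 4 maps naturally onto a simple record structure, which makes it easier to log hazards and feed them forward. The sketch below is one possible representation, not the format used at the EEC: the class name `HazopEntry`, the helper `without_system_defence`, and the decision to expand ‘As Above’ into the values of the preceding row are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HazopEntry:
    """One row of a HAZOP log, using the column headings of table 4."""
    function: str
    guide_word: str
    cause: str
    consequence: str
    indication: str
    system_defences: List[str]
    human_recovery: List[str]
    recommendations: List[str]


RECOVERY = ["Drag blocking object out of way",
            "Strategic management of screen items"]
RECOMMENDATIONS = ["Design objects to 'roll around' each other",
                   "Use height filtering",
                   "Flip system to move between object on top and the one beneath",
                   "Highlight background"]

# 'As Above' cells are expanded to the previous row's values (an assumption).
log = [
    HazopEntry("Highlight object (aircraft label)", "No action",
               "Another item preventing access to target",
               "Difficulty in hooking target aircraft",
               "No highlighting of target",
               [], RECOVERY, RECOMMENDATIONS),
    HazopEntry("Highlight object (aircraft label)", "Other action",
               "Clustering results in different aircraft being highlighted",
               "Instruction may be given to wrong aircraft on the system",
               "No highlighting of target",
               ["Highlighting colour coded to indicate direction of travel",
                "Call sign displayed on all menus"],
               RECOVERY, RECOMMENDATIONS),
]


def without_system_defence(entries):
    """Return entries that rely on human recovery alone."""
    return [e for e in entries if not e.system_defences]


for entry in without_system_defence(log):
    print(entry.guide_word, "->", entry.consequence)
```

Filtering for entries with no system defence is one simple way to draw attention to hazards that currently depend entirely on the controller noticing and recovering.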

2.1.3.3 TRACER

TRACER (Technique for Retrospective Analysis of Cognitive Errors in ATM - Shorrock and Kirwan, 2002, figure 11) is a single analyst human error identification approach. It was developed to classify the causes of incidents, and then a further version (TRACER-Lite) was developed for predictive purposes. It requires first a task analysis, and then applies a series of guidewords and other error-related taxonomies to identify both errors and (qualitative) error recovery likelihoods (table 5).

Figure 11 – TRACER-Lite Overview

Table 5 – TRACER-Lite Internal Error Taxonomy (Guidewords)

Perception
Internal Error Modes: Mishear; Mis-see; No detection (auditory); No detection (visual)
Internal Error Mechanisms: Expectation; Confusion; Discrimination failure; Perceptual overload; Distraction / Preoccupation

Memory
Internal Error Modes: Forget action; Forget information; Misrecall information
Internal Error Mechanisms: Confusion; Memory overload; Insufficient learning; Distraction / Preoccupation

Decision Making
Internal Error Modes: Misprojection; Poor decision or poor plan; Late decision or late plan; No decision or no plan
Internal Error Mechanisms: Misinterpretation; Failure to consider side- or long-term effects; Mind set / Assumption; Knowledge problem; Decision overload

Action
Internal Error Modes: Selection error; Unclear information; Incorrect information
Internal Error Mechanisms: Variability; Confusion; Intrusion; Distraction / Preoccupation; Other slip
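The taxonomy in table 5 can be held as a simple lookup structure, so that an analyst who has chosen an internal error mode for a task step is offered the candidate mechanisms from the same cognitive domain. The sketch below is an illustrative reading of the table rather than the TRACER-Lite tool itself; the names `TAXONOMY`, `domain_of` and `candidate_mechanisms` are hypothetical.

```python
# Table 5 as a nested dictionary: domain -> error modes and mechanisms.
TAXONOMY = {
    "Perception": {
        "modes": ["Mishear", "Mis-see", "No detection (auditory)",
                  "No detection (visual)"],
        "mechanisms": ["Expectation", "Confusion", "Discrimination failure",
                       "Perceptual overload", "Distraction / Preoccupation"],
    },
    "Memory": {
        "modes": ["Forget action", "Forget information",
                  "Misrecall information"],
        "mechanisms": ["Confusion", "Memory overload", "Insufficient learning",
                       "Distraction / Preoccupation"],
    },
    "Decision Making": {
        "modes": ["Misprojection", "Poor decision or poor plan",
                  "Late decision or late plan", "No decision or no plan"],
        "mechanisms": ["Misinterpretation",
                       "Failure to consider side- or long-term effects",
                       "Mind set / Assumption", "Knowledge problem",
                       "Decision overload"],
    },
    "Action": {
        "modes": ["Selection error", "Unclear information",
                  "Incorrect information"],
        "mechanisms": ["Variability", "Confusion", "Intrusion",
                       "Distraction / Preoccupation", "Other slip"],
    },
}


def domain_of(error_mode):
    """Return the cognitive domain containing the error mode, or None."""
    for domain, entry in TAXONOMY.items():
        if error_mode in entry["modes"]:
            return domain
    return None


def candidate_mechanisms(error_mode):
    """List the mechanisms an analyst would consider for this error mode."""
    domain = domain_of(error_mode)
    return [] if domain is None else list(TAXONOMY[domain]["mechanisms"])


print(domain_of("Mis-see"), candidate_mechanisms("Mis-see"))
```

Choosing among the candidate mechanisms remains a matter of analyst judgement informed by the task analysis.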

TRACER-Lite’s outputs are similar to those for HAZOP, though often they are at a finer level of resolution or granularity. An evaluation of the two approaches in ATM based on application to three Eurocontrol design projects (Shorrock, 2003) showed that TRACER-Lite was potentially more comprehensive in error and hazard identification. However, HAZOP was more easily applied to very early concepts, was better at identification of mitigations (safety requirements), and had the added advantage of including the project design team directly in the hazard identification process. In practice, HAZOP is used as an initial approach, and then TRACER is carried out to analyse the working practices (procedures) in more detail if this is warranted, and if the concept is at the maturity level that allows for a detailed and stable task analysis to be developed. Tables 6 and 7 show respectively the types of recommendations that can be derived from a TRACER analysis, and the typical outputs from such analyses from which these are derived.

Table 6. Types of system design recommendations derived from TRACER analysis of ASAS (Airborne Separation Assistance System) (Gordon et al, 2005)

Problem area identified: Number of ASAS Separation pairings that can be created
Recommendation provided: The number and mix of ASAS Separation pairings that may be created, and associated human factors impacts, should be investigated and set in procedures and training.

Problem area identified: Number of procedure checks to be made by controller and flight crew.
Recommendation provided: An ergonomically designed aide-memoire should be designed to help the controller to ensure that all relevant procedural checks are made.

Problem area identified: Number and clarity of manoeuvre symbols, and potential for error in linking aircraft on the PVD.
Recommendation provided: Controllers should have a permanent electronic indication of ASAS equipage, ASAS application status and associated parameters. The impacts on human performance of the symbology used within the Target Data Block (TDB – the aircraft symbol and associated data as displayed to the controller on the radar screen) should be investigated thoroughly, taking into account other technologies that impact on TDB display.

Problem area identified: Flight crew identification of designated aircraft and input of code.
Recommendation provided: The impact of CDTI (Cockpit Display of Traffic Information for the pilots) interaction and monitoring on flight crew ‘head-down’ time should be investigated.

Problem area identified: Initiation of ASAS Separation procedure.
Recommendation provided: The potential for flight crew and controller confusion between different ASAS services should be examined.

Problem area identified: Necessity to inform designated aircraft about ASAS procedure.
Recommendation provided: It should be considered whether separation phraseology should re-emphasise the critical element of the transmission, e.g. ‘below’.

Table 7 – Example of TRACER human error analysis for part of ASAS task step 1.1.1 ‘(Controller) Detect potential for ASAS separation’ (Gordon et al, 2005)

Task Step: 1.1.1 (Controller) Detect potential for ASAS separation*
Error Mode: 1. Detect potential for ASAS separation when inappropriate
Internal Error: 1. Inappropriate decision/plan
Consequences: 1. Abort ASAS; Increased workload; Potential loss of separation
Detection Means: 1. Further visual monitoring/check; Planner (other) controller; Flight crew check; MTCD; STCA.
Comments: Even more problematic when you have a number of ASAS separations at the same time

Task Step: 1.1.1.1 Identify reference and designated aircraft
Error Mode: 1. Create too many pairings 2. Mis-assess which should be designated aircraft 3. Identify wrong (unintended) a/c 4. Take too long to identify ASAS a/c
Internal Error: 1. Inappropriate decision/plan 2. Misprojection 3. Mis-see 4. Late decision
Consequences: 1. Monitoring problems due to lack of task involvement; Confusion; Controller may forget about a/c; Sudden increase in controller workload if multiple aircraft abort; Possible loss of separation. 2, 3, 4. Loss of time; Increased workload.
Detection Means: 1, 2, 3. Further visual monitoring/check; Planner (other) controller. 1. MTCD; STCA.
Comments: Is there a limit on number and mix of pairings to reduce complexity? Does the system detect when there are too many pairings? How are pairings indicated on HMI? When does controller mark aircraft on HMI? If the first a/c has a problem and has to deviate route (e.g. bad weather) then all the other a/c will have a problem. Are all a/c linked to the first a/c?

* A/c: Aircraft; ASAS: Airborne Separation Assurance System; HMI: Human Machine Interface; MTCD: Medium Term Conflict Detection; STCA: Short Term Conflict Alert

2.1.3.4 Human Factors Case

The Human Factors case (HF Case) approach was developed by Eurocontrol to provide a bounded assessment of the key human factors issues relevant to the new concept. It covers a range of issues that are also pertinent to safety, as represented in the human factors ‘gear-wheel’ shown in figure 12. This figure shows the key factors driving system performance, and the key under-pinning human performance factors. The gear-wheels are simply a metaphor that has been found to be useful, since project and operations managers tend to focus on the gear-wheel to the left, whereas HF and safety practitioners solve problems in the left gear-wheel by focusing on investigations at the more detailed human factors level of the right gear-wheel.

The HF Case approach starts in a similar way to a HAZOP, via a subject matter expert group workshop that identifies key human factors issues that need to be addressed. The issues are then analysed in more detail and compared to best practice guidance, to determine how to optimise the system from a human factors perspective. The approach is holistic, in that detailed interface design aspects will not be considered in isolation from their training and procedural counterparts, as a good interface without appropriate training or with conflicting or ambiguous procedures will not yield good human performance. Similarly, potential negative impacts of new interfaces (or for example skill changes caused by a new automated support tool) on workload, situation awareness, teamwork and trust must also be counter-balanced. This holistic or systems-based approach is necessary to achieve the desired human-system performance. The final result is a set of interrelated Human Factors recommendations for the system concept being developed. These can be highly detailed and may refer to specific training, workload, human-machine interface design requirements, etc.

Although human performance underpins safety, determining whether the workload or training, for example, will be adequate, is something of a ‘grey area’. The utility of the HF case methodology is that it pre-analyses all the potentially relevant human factors issues, and selects the key ones that appear to be dominant or influential in the ATM context. Therefore only key issues relevant to controller performance and safety are addressed. This also means that the HF case is not seen as a ‘never-ending story’, but as a way for the project managers to manage human factors within their project scope, time-scale and budget.

At the EEC there exists detailed design guidance on how to achieve a high degree of usability and consistency for the interface design (radar screen, controller support tool appearance and integration on the radar screen, colour coding, message format, etc.)6. Additionally, there is a human factors laboratory for testing various prototype displays and human performance aspects such as workload, situation awareness, error rates, etc. The HF Case is a means of determining which human factors issues are critical for a concept element, and how to ensure a high standard of human factors in the concept design to achieve the desired overall system performance. At present this is a relatively new developmental approach, and has only been applied fully to one project, a situation awareness display for pilots. However, the approach will be applied to a number of other projects in the near future, helping to ensure that hazards from indirect sources are also addressed, and helping to ensure that ATM maintains its high reliability status and resilience, by ensuring that the controllers have a good interface and supporting tools, training, procedures and working practices, rather than tools and displays with which they have to ‘make do’.

6 For more detail on the Human Factors area in the air traffic context and practical issues and solutions for ATM human factors problems, see Kirwan et al 2005.

Figure 12. ‘Gear-Wheels’ as used in the HF Case (Schaefer, 2003, Eurocontrol human factors training module for project managers).
[Figure not reproduced. Gear-wheel labels: Safety; Capacity / Efficiency; Quality of Service; Job Satisfaction; Human-Machine Interaction; Procedures, Roles, Responsibilities; Skills, Training and Development; Personnel; Teams and Communication; Workload; Situation Awareness; Decision Making / Problem Solving; Trust; Stress; Human Error and Reliability; Recovery from Failures.]

Other industries use a range of approaches to deal with human factors issues, and these typically fall into two main categories: on the one hand company standards on human factors (e.g. on human machine interface design); and on the other analytic techniques such as task analysis and prototyping to deal with more specific or detailed issues and problems that can arise. Techniques of human reliability assessment (Kirwan, 1994), including such tools as HAZOP and TRACER become the formal link between human factors analyses and safety work.

2.1.3.5 Cross-Boundary HAZOP

Cross-boundary HAZOP is a developing approach to consider the following:

• How different system elements might interact (intentionally and unintentionally) to lead to, or to prevent/mitigate hazards (e.g. an arrival manager tool [which is used to optimise flows of traffic towards an airport] is not a safety tool, but there could possibly be unintended interactions between the use of such a tool and safety devices used by the controller)

• How common mode failures (e.g. problems with data integrity) can affect system elements and the system as a whole. For example, datalink is a general concept that will allow electronic communication between ground and air side. Therefore failure or corruption of datalink could have widespread and complex consequences on otherwise unrelated functions in the ATM system.

• How system elements might ‘export’ risk to other parts of the system or even outside the ATM system boundaries. This can include exporting indirect risks such as additional workload to other parts of the ATM system, e.g. from the controller to the pilot.

• How system elements can be influenced in terms of safety by external events (e.g. social context and disturbances; organisational events such as privatisation; business changes; changes in regulation and/or certification philosophy; etc.)

Cross-Boundary HAZOP is still under development at this stage. Until a mature technique is available, it is being carried out using traditional HAZOP approaches to consider impacts on other system elements. Ways of considering interactions between concept elements are also being explored via the Integrated Risk Picture approach (Section 2.2).

2.1.3.6 Simulations & SAFSIM

Even at the concept exploration stage simulations may be run to consider how controllers would react, and whether the tools and new procedures would work in realistic air traffic scenarios, and be acceptable to controllers. Whilst there is some fast-time simulation (i.e. computer simulations without real controllers or pilots) at the EEC, the predominant approach uses real-time simulations with real controllers with realistic current or future projected traffic patterns, together with ‘pseudo-pilots’ who interact with the controllers (most pseudo-pilots are actual trained commercial pilots, who are given a script and work at a specialised console in a room adjacent to the simulated air traffic operations room).

Within real-time simulations, a number of safety and human factors related measures can be considered, for example:

• Losses of separation between (simulated) aircraft
• Communication load (time ‘on frequency’)
• Workload subjective measures: measured concurrently (during the task) and terminally (at the end of the task)
• Workload physiological measures (usually in the HF laboratory): heart rate variability; electroencephalograms (EEG); pupil diameter & eye movements
• Situation awareness
• Errors

From safety observations of simulations, it is possible to detect design problems that can be rectified well before the design becomes too mature, and the aspect that needs changing has become too embedded in the system design architecture. In practice, the most frequent design impacts arising from simulations relate to procedures (so-called ‘controller working practices’), interface design and usability. Such issues can often have a negative effect on safety (e.g. poor usability becomes a distraction that controllers can ill afford when busy). A concrete example is that it has been noticed in ASAS simulations that there is a need for the pilot following another aircraft and maintaining a certain spacing behind that aircraft, to more closely monitor the speed of his or her own aircraft than is the case with normal operations. It is therefore useful to have a ‘speed less than x’ alert to ensure that speed is not minimised too much (therefore slowing down towards the aircraft behind, who may also be under a spacing instruction).

To facilitate safety observations and collection of safety-relevant evidence from simulations, a guide to safety measures was developed at the EEC (Antonini and Kermarquer, 2004) for a range of measurements (e.g. NASA-TLX and ISA for workload measurement; SAGAT for Situation Awareness measurement7; etc. – see Antonini & Kermarquer, op cit). In practice, safety or human factors observers, and debriefing of the controllers, are most often used to find hazards that occurred or hazard potential in the scenarios in the simulation. Sometimes hazardous scenarios are presented to the controllers to see how they deal with them (e.g. see Gordon et al, 2005). Such ‘seeding’ of safety events into simulations can be very useful to determine the range of controller responses that can occur. Examples of such scenarios would include the following: a pilot making an error (e.g. the wrong pilot responding to a controller’s call and instruction); erroneous data appearing on the radar screen; failure of a controller tool or radio-telephone communication with the aircraft; full black out of radar information. In each case, the aim is to see its impact and the speed with which such failures are detected and recovered, and then to determine how to enhance such detection and recovery processes. Table 8 shows the types of insights derived from real-time simulations.

Although simulations do not give reliable quantitative information about the frequency of responses to events8, they are useful qualitatively to determine what the controller is likely to do. Furthermore, after such simulations, it is possible to use expert judgement procedures to help the controllers extrapolate their experiences to real life use of the system, and so to quantify human error probabilities for the future system. This approach can help at the risk evaluation stage (see section 2.1.4) as it helps determine priorities between the different hazards identified.

7 NASA-TLX is a workload scale used to assess workload for the overall traffic scenario which lasted typically an hour; ISA [Instantaneous Subjective Assessment] records subjective workload on a five point scale every 2 minutes; SAGAT [Situation Awareness Global Assessment Technique] measures controller situation awareness by interrupting the simulation and asking pre-defined questions about events and what is likely to happen next.

8 In a typical three-week simulation with thirty controller working positions, no such rare events would be likely to occur spontaneously, whereas it might be necessary to introduce such events a dozen times to explore controller responses, and it would also be necessary from an ethical standpoint to warn the controllers that adverse events are to be expected. Therefore attempts to deduce quantitative human reliability data from such simulations would lead to inaccurate and invalid results.


Table 8. The main hazard categories identified during the ASAS Separation simulation (with examples of some results; Gordon et al, 2005)

Columns: Label (n)*; Relevant Scenario; Causes / Consequences; Detection; Mitigation

1. Delegated aircraft selects a wrong target (n=3)
   Relevant scenario: ASAS subject applied wrong target (read-back incorrect and input incorrect in onboard system).
   Causes / consequences: Inputting SSR code is error-prone due to arbitrary & long number of digits.
   Detection: Pilot must read back ‘clock-position’ of target aircraft for controller to detect error. Detection with clock position may be ambiguous at long distances.
   Mitigation: Automatic downlink of “airborne data” (e.g. selected target, ASAS “status” of delegated a/c).

2. Unexpected movement of third aircraft interfering with ASAS pair (n=2)
   Relevant scenario: Unexpected movement of surrounding traffic interfering with ASAS configuration. An aircraft is conflicting with an ASAS in-trail chain of 3 aircraft.
   Causes / consequences: Situation highlights the diminished flexibility of ASAS pairing (at least from a controller’s point of view). Controllers seemed reluctant to interrupt ASAS.
   Detection: Situation can be easily detected since it is based on pilot’s report.
   Mitigation: Fallback procedure: descend interfering a/c to lower flight level. Automatic downlink to Controller Working Position may be a technical solution to mitigate ATCos’ errors, or bad coordination.

3. Delegated does not report ‘clear of traffic’ (n=3)
   Relevant scenario: Delegated a/c did not report clear of traffic and did not resume navigation.
   Causes / consequences: Definition of “clear of target” is not clear enough from ATCos’ point of view. ATCos must rely on pilots’ reports (also with radar coverage).
   Detection: Situation is not easily detected, since there is no explicit reminder for ATCo. If a/c is noticeably off-track, ATCo may notice it on a routine scan.
   Mitigation: Provide automatic monitoring aids based on ASAS envelope to mitigate ‘not resume’ navigation; provide operational definition of ‘clear of traffic’ for controllers or set automatic reminders to ATCo that prompt them to contact pilot to confirm ‘clear of traffic’.

4. Technical or communication failure (n=5)
   Relevant scenario: Delegated onboard system failure, delegated a/c reports unable ASAS.
   Causes / consequences: If pilot reports on time, ATCo can easily recover the situation. If technical failure is detected late, situation may become impossible to recover. ATCo’s tasks and duties in an already compromised situation appear as a problematic issue that should be further clarified.
   Detection: Situation is easily detected if pilot reports (critically dependent on this) as it is unlikely that ATCo can notice the problem without pilot’s communication. Situation may also be detected if ATCo is monitoring the two a/c. But unless deviation is really noticeable, it is unlikely that ATCo will intervene (or even detect) before pilot reports.
   Mitigation: Fallback procedure: descend delegated a/c to lower standard FL (flight level), or to non-standard (FL for traffic in opposite direction), or to emergency separation (500 ft). Controllers indicated that they prefer to have a lower FL available when clearing ASAS pair in high traffic area as a fallback procedure.

a/c – aircraft, ASAS – Airborne Separation Assistance System, ATCo – Air Traffic Controller, SSR – Secondary Surveillance Radar


2.1.3.7 Live Trial HAZOP

Although the EEC’s projects are generally at the concept exploration stage, occasionally they are developed into prototypes, and a stakeholder agrees to test the tool in a real environment. This is called a ‘live trial’ if the controllers are using the tool with current traffic, and a ‘shadow mode trial’ if the controllers are watching real traffic and considering how it might be used, but the tool is not actually influencing real current traffic. In both cases, a HAZOP is carried out to assure the safety of the trial itself, and is known as a ‘live trial HAZOP’.

The aim of the live trial HAZOP is to consider what could go wrong with the trial itself, either due to problems with the tool, or the experimental platform that is being used to run the tool and that may be connected to the real operational system. Consideration is also given to the case where an actual loss of separation occurs, whether due to the trial, or simply happening contiguously with the trial but unconnected (causally) with it. In practice this approach leads to the development of very clear trial safety guidance and ‘reversionary procedures’ in case something happens or goes wrong. So far the approach has been applied to three live trials. In one case it was necessary to briefly invoke the safety procedures; in another case, it was realised that there remained significant issues that needed to be addressed with the system procedures (which were effectively under-specified) before it was safe to carry out the trial, and the trial itself was postponed six months. Whereas it is sometimes acceptable to use a real time simulation to work out the detail of procedures, this can be dangerous if continued into the practice of the live trial itself.

2.1.4 Risk Evaluation

All hazards or abnormal events/errors identified are classified into severity and frequency categories such as those shown in table 9. This enables a qualitative assessment of the relative risks and hence their acceptability and the need to develop safety requirements. In some cases this amounts to a need to consider a change in the design or procedures. The severity and frequency categories are based on ESARR 4 (see footnote 9), since this is the regulation for ATM in Europe. Even though this regulatory requirement does not pertain to concept exploration, it makes little sense to derive a different classification system, which would then lead to problems later on when the concept reached sufficient maturity to require the formal FHA, PSSA and SSA safety assessment processes. In practice at the concept stage, anything with a severity category of 1 to 3 is considered more fully, potentially to the point of deriving safety requirements for the design. This is because at this early design stage, the design should be made more robust while it is still flexible. If left until later on, more difficult ‘trade-offs’ must be made in terms of cost-benefits, such that only risks that are more severe will be countered by risk reduction measures. By eliminating risks early on, the design is made more robust.

It can be asked why this is necessary, and whether there is a danger of ‘over-protection’ of the resultant design. The answer is that at this early design stage, it is ‘reasonably practicable’ (see footnote 10) to implement such design changes, and design project managers usually agree that such changes can be made in the interests of safety. There are of course instances where the required design change or safety requirement is not merely a change in procedures or working practices, nor a simple change to the interface design, but a more radical one such as new automation to avoid a certain human error. In such cases, which would dramatically alter the design envelope, there is usually a need to carry out a formal quantitative risk assessment (in terms of ESARR4 SAM this would be a PSSA), in order to decide whether the change is indeed warranted from a risk perspective.

9 Note these are slightly different from those in ESARR4, although they can be related to the ESARR4 categories – the difference is due to their usage at the concept stage rather than a later more detailed design or operational stage.

10 Generally safety must be implemented as far as is reasonably practicable. Where a credible risk has been identified, and means to resolve it are seen as reasonably practicable, such that costs to alleviate the risk are not grossly disproportionate, action must be taken. The point is that what is seen as ‘disproportionate’ increases as the design progresses, hence early treatment is a better solution.

Severity categories (matrix columns):
1 Potential Accident (e.g. mid-air collision)
2 Close Air Miss (less than half the required aircraft separation)
3 Air Miss (less than the required aircraft separation)
4 Workload or Situation Awareness impacts (which indirectly affect risk)
5 No effect

Frequency categories for an en-route air traffic control centre (matrix rows):
1 Frequent (1/week)
2 Occasional (1/month)
3 Rare (1/year)
4 Very rare (1/10 years)
5 Never

Table 9. Simplified Risk Evaluation Matrix

Severity category 4 in table 9 refers to workload or situation awareness impacts. These do not necessarily lead directly to specific hazards or risks or risk-related events, but can contribute to error rates and to poor detection and recovery from abnormal events or emergencies. Where only one or two such instances of severity category 4 are identified with a reasonable frequency (e.g. 2), there may not be noticeable impact on actual operations. However, if there are more, it may be indicative of a more general problem in the underlying human factors properties of the design. This is where the HF case must be used to address such problems, and may result in human factors requirements, which will have the same status as safety requirements. This can also be expressed in terms of ‘human performance assurance levels’, the equivalent of ‘software assurance levels’, as developed for contemporary industrial software applications. Additionally, it is worth noting that if there do appear to be frequent concerns raised in risk studies about human performance (severity category 4), then the designers are likely to want to tackle them in any case, as such factors are likely otherwise to impede system performance, even if they do not greatly impact on risk.
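The screening logic described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the EEC's tooling: the category labels are transcribed from table 9, while the example hazards, the function names and the thresholds (`isolated_limit`, `frequency_limit`) are assumptions introduced here to mirror the 'one or two instances' and 'e.g. 2' wording in the text.

```python
from dataclasses import dataclass

# Category labels transcribed from Table 9.
SEVERITY = {
    1: "Potential accident",
    2: "Close air miss",
    3: "Air miss",
    4: "Workload or situation awareness impact",
    5: "No effect",
}
FREQUENCY = {
    1: "Frequent (1/week)",
    2: "Occasional (1/month)",
    3: "Rare (1/year)",
    4: "Very rare (1/10 years)",
    5: "Never",
}


@dataclass(frozen=True)
class Hazard:
    label: str
    severity: int   # 1 (most severe) to 5 (no effect)
    frequency: int  # 1 (frequent) to 5 (never)

    def __post_init__(self):
        if self.severity not in SEVERITY or self.frequency not in FREQUENCY:
            raise ValueError("severity and frequency must be categories 1 to 5")


def needs_fuller_consideration(hazard):
    """Severity 1 to 3 is considered more fully at the concept stage."""
    return hazard.severity <= 3


def hf_case_indicated(hazards, isolated_limit=2, frequency_limit=2):
    """Flag a possible general human factors problem.

    Assumption: more than `isolated_limit` severity-4 hazards occurring at
    frequency category `frequency_limit` or more often suggest an HF Case.
    """
    count = sum(
        1 for h in hazards
        if h.severity == 4 and h.frequency <= frequency_limit
    )
    return count > isolated_limit


# Hypothetical hazards, for illustration only.
hazards = [
    Hazard("Wrong target aircraft selected", severity=3, frequency=3),
    Hazard("Extra read-back workload", severity=4, frequency=2),
    Hazard("Display clutter reduces awareness", severity=4, frequency=2),
    Hazard("Late 'clear of traffic' report", severity=4, frequency=1),
]

for h in hazards:
    if needs_fuller_consideration(h):
        print(f"{h.label}: {SEVERITY[h.severity]}, {FREQUENCY[h.frequency]}")
print("HF Case indicated:", hf_case_indicated(hazards))
```

With the hypothetical list above, one hazard falls in severity categories 1 to 3 and three severity-4 hazards occur at least occasionally, so the sketch flags the design for an HF Case.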

2.1.5 Quantitative Risk Evaluation

At the PSSA (Preliminary System Safety Assessment) stage, the analysis becomes quantitative, in order to see how the new system or system element impacts on the target level of safety (TLS). As noted in the early part of the paper, this risk level amounts to approximately one accident due to ATM system failure every two years for Europe, or about one accident (mid-air collision; controlled flight into terrain; etc.) in thirty years of air traffic operations for a single member state (national air navigation service provider). Any new system or system change must not increase this overall level of risk at the national or European levels. This requires a means of aggregating all the different risks to assess their overall impact, which in turn means that each identified hazard must be assessed to find its likelihood of occurrence. Since most hazards do not lead directly to an accident, because there are safety nets or barriers built into the system, the likelihood of these barriers failing must also be assessed to establish the true risk of an accident.


Broadly speaking, the likelihood of a hazard is determined using a fault tree (see Cox and Tait, 1991; Andrews and Moss, 2002; Eurocontrol-FAA, 2005). This is a logical representation of the causes of a hazard. It builds from lower events and conditions (called base events) through a series of ‘gates’ that define the conditions for proceeding upwards towards a ‘top event’ that is usually either a hazard (e.g. loss of separation) or an accident (mid-air collision). An example of a fault tree from a design assessment is shown in figure 13. This figure is part of a larger fault tree, and concerns the risks from a new way of guiding aircraft towards airport runways during their approach phase (the new system is called GBAS – Ground Based Augmentation System). The main event triggering the potential hazard in this tree is ‘Hazardously Misleading Information’, or ‘HMI’11, and is indicated in the box on the left side of the figure. The problem is that, during maintenance on the ground, wrong information could be sent to the aircraft. This will only lead to the main hazard (called simply here ‘H1b GBAS’), which refers to an aircraft that will be heading to the wrong place (e.g. wrong runway), if the existing safeguards fail. These safeguards are ‘Notices to Airmen’ (NOTAMs), ATIS (Automatic Terminal Information Service), ground detection systems that should detect the erroneous information being sent to the aircraft, and the air traffic controller.

[Fault tree, reconstructed from the figure labels:]

F1 H1b - GBAS (top event), Q = 2.125e-10
  ONGOING: HMI during maintenance, Q = 3.000e-4
  SAFEGUARDS: Failure of safeguards, Q = 7.085e-7
    NOTAM: Failure of NOTAMs, Q = 2.700e-2
    ATIS: Failure of ATIS, Q = 4.100e-2
    STATUS 1: Ground station status failure, Q = 2.000e-3
    ATC ASSIGN: Failure in ATC assignment, Q = 3.200e-1

Figure 13. Partial fault tree example for Ground Based Augmentation System (Perrin, 2005)

The numbers indicated under each box refer to failure probabilities – the smaller the number, the lower the risk. Thus, the controller does not reduce the risk by much in this case, because he or she will probably be unaware of the HMI failure. The most effective safeguard in this example is the ground station status system, which should warn of the problem, and has a failure probability of 2e-3 or 0.002, i.e. one failure in every five hundred times it needs to be used when maintenance is occurring. Also, the occurrence of HMI during maintenance is itself quite a rare event, happening only three times in ten thousand opportunities.

11 Note this is a different acronym from HMI used elsewhere in this paper, which refers to the Human Machine Interface.


For the ‘top event’ to occur in this case, all of these events must occur, and hence they sit under an ‘AND gate’, which means that their probabilities are effectively multiplied. The other main type of gate is an ‘OR gate’, where any one event under such a gate could lead to the event above it, irrespective of whether the others occurred. For such gates the rule is effectively to add the probabilities rather than multiplying them. In practice the mathematics is a little more involved, and aspects such as dependence between these different events mean that the resultant top event probability or frequency is not as low as a pure multiplication would imply (see Andrews and Moss, op cit). Nevertheless, the resultant top event for this tree is very small, and within the acceptable limits for the TLS (also at present this still represents only a hazard, not an accident outcome). However, this is only one GBAS hazard, and so all hazards arising from GBAS must be assessed and aggregated.
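As a worked illustration of the gate arithmetic, the short Python sketch below reproduces the values shown in figure 13 under the simplifying assumption that all base events are independent (which, as noted above, real analyses cannot always assume). The function names are illustrative only; the probabilities are those given in the figure.

```python
from math import prod


def and_gate(probabilities):
    """All inputs must occur: multiply (assumes independent events)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Any input suffices: 1 - P(none occur) (assumes independent events).

    For small probabilities this is close to simply adding them.
    """
    return 1.0 - prod(1.0 - p for p in probabilities)


# Base-event failure probabilities from Figure 13.
notam = 2.7e-2        # Failure of NOTAMs
atis = 4.1e-2         # Failure of ATIS
status = 2.0e-3       # Ground station status failure
atc_assign = 3.2e-1   # Failure in ATC assignment
hmi = 3.0e-4          # HMI during maintenance

safeguards = and_gate([notam, atis, status, atc_assign])
top_event = and_gate([hmi, safeguards])

print(f"Failure of safeguards: {safeguards:.3e}")  # 7.085e-07
print(f"H1b - GBAS top event:  {top_event:.3e}")   # 2.125e-10

# OR gate comparison: exact value versus the 'add the probabilities' rule.
print(or_gate([1e-3, 2e-3]), 1e-3 + 2e-3)
```

The computed values agree with the figure, which confirms that both gates in this partial tree are AND gates. The last line shows that for rare events the exact OR-gate result differs only slightly from the sum.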

Often it is necessary to consider how different barriers might work to reduce risk, and this is often done using an event tree (Cox and Tait, op cit; Andrews and Moss, op cit; Eurocontrol-FAA, op cit). An event tree example is shown in figure 14.

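[Event tree, reconstructed from the figure labels. Initiating event: deviation towards terrain.]

Deviation towards terrain: Yes
• Recovery via onboard detection: Yes, CFIT prevented; No, continue to next barrier
• Recovery via ATC detection: Yes, CFIT prevented; No, continue to next barrier
• Recovery via MSAW: Yes, CFIT prevented; No, continue to next barrier
• Recovery via visual cues: Yes, CFIT prevented; No, continue to next barrier
• Recovery via (E)GPWS: Yes, CFIT prevented; No, CFIT

Deviation towards terrain: No
• Outcome: no CFIT

Figure 14. Event tree example for Ground Based Augmentation System (Perrin, 2005)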

The main accident outcome here is CFIT (Controlled Flight Into Terrain), as could conceivably happen due to a hazard caused by GBAS. However, there are a number of potential ‘recovery’ steps as shown in the top row of the event tree: either by the crew onboard the aircraft as they monitor their equipment; the air traffic controllers as they monitor approach on their radar screens; a ground-based automatic warning system (if present) called MSAW (Minimum Safe Altitude Warning system); visual cues to the aircrew (depending on visibility and category of approach); and finally a normal or Enhanced Ground Proximity Warning System (GPWS or EGPWS) onboard the aircraft that gives clear aural warnings. All of these must fail or be unavailable in order to have a CFIT accident. Event trees enable a clear representation of the accident and recovery sequences following a hazard, and can be quantified to give the final accident risk.
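To show how such an event tree can be quantified, the following Python sketch walks the sequence of recovery barriers from figure 14. The barrier names come from the figure, but the failure probabilities are purely hypothetical placeholders (the paper gives none), and the barriers are assumed to fail independently of one another.

```python
def event_tree(initiating_frequency, barriers):
    """Quantify a sequential event tree.

    barriers: list of (name, failure_probability) in the order they act.
    Returns a list of (outcome, frequency); the last entry is the accident.
    Assumes each barrier fails independently of the others.
    """
    outcomes = []
    remaining = initiating_frequency
    for name, p_fail in barriers:
        # Barrier works: the sequence ends here with the accident prevented.
        outcomes.append((f"CFIT prevented via {name}", remaining * (1.0 - p_fail)))
        # Barrier fails: carry the residual frequency to the next barrier.
        remaining *= p_fail
    outcomes.append(("CFIT", remaining))
    return outcomes


# HYPOTHETICAL failure probabilities, for illustration only.
barriers = [
    ("onboard detection", 0.1),
    ("ATC detection", 0.3),
    ("MSAW", 0.2),
    ("visual cues", 0.5),
    ("(E)GPWS", 0.05),
]

# An initiating frequency of 1.0 gives conditional probabilities
# per deviation towards terrain.
outcomes = event_tree(1.0, barriers)
for outcome, frequency in outcomes:
    print(f"{outcome}: {frequency:.6f}")
```

Because every branch is accounted for, the outcome frequencies sum to the initiating frequency. In a bow-tie analysis the initiating frequency would be the top-event value from the corresponding fault tree.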

Fault and event trees are often used together, the fault tree leading to a hazard, the event tree determining the logical possible outcomes from that hazard. When put together they look like a ‘bow-tie’ as shown in Figure 15, and so this combined risk approach is sometimes called the bow-tie approach to risk analysis (Eurocontrol-FAA, 2005).

[Figure not reproduced: a fault tree (Fault Tree Analysis) converges on a central hazard, from which an event tree (Event Tree Analysis) branches through Yes/No nodes to outcomes 1 to 4. The diagram is annotated with Safety Targets, Safety Objectives and Safety Requirements.]

Figure 15. ‘Bow-Tie’ Analysis

Clearly this type of analysis is highly detailed, and more achievable at a detailed level of design than during concept exploration. Therefore, such analysis does not usually occur until after the concept phase is finished. However, it can and should build on the hazard identification work carried out in earlier phases of design. As shown in the figure, the two approaches can lead to specific safety objectives and safety requirements, which are discussed next.

2.1.6 Safety Objectives and Safety Requirements

‘Safety objectives’ is a technical term relating mainly to the FHA and PSSA stages of risk assessment according to ESARR4. In principle, when a hazard is evaluated according to its severity and frequency, it may be found to offer potential added risk to the system, depending on its frequency (likelihood) of occurrence. In order to ensure that the target level of safety is not exceeded, a ‘safety objective’ may be stated that the frequency of such a hazard must not be greater than a specified value. In the early stages of design, such objectives may have broad frequency statements – e.g. the likelihood of event A must be no greater than ‘rare’. However, later on, such specifications must be made more precise, and these become design requirements to ensure that the relevant system element performs to a particular level. As an example, radio communication between pilot and controller is obviously a critical component to air safety and performance. A stringent specification must be made for any new communications medium to ensure that its failure likelihood is very rare, and when such failure does occur, that it will be only for a few seconds, and also that there should be a back-up. Safety objectives translate into detailed ‘safety requirements’ that give the designer a design target to aim for that will achieve the required safety levels.

Safety requirements are therefore the most important information transmission from safety assessors to designers, and are developed by the safety people working with the design team.


Safety requirements become a key part of the concept definition, and the stakeholders who may go on to develop, build and deploy the new tool must ensure that the safety requirements have been satisfied in the later formal and regulated safety assurance process (the PSSA and/or the later SSA). Safety requirements can be qualitative or quantitative. Examples of the format of safety requirements, in the context of the CORA concept discussed earlier are the following:

SR30: Limits on the traffic level that can be safely handled (to be determined at a later stage)
SR31: Design of procedures and training (including regular refresher training) for reducing traffic if CORA 2 fails
SR32: Job design to keep tactical controller ‘engaged’ (situationally aware), and ‘in the loop’
SR33: Diagnostics and alerts for imminent failure of CORA 2

Note: the four requirements above would need to be tested via real-time simulation to determine their adequacy in managing risk.

If hazards have been identified, failure requirements may be expressed quantitatively:

SR1: CORA2 Resolver: maximum tolerable probability per flight hour: 2 × 10^-7
SR2: Trajectory Predictor / Medium Term Conflict Detection and identification of conflict: maximum tolerable probability per flight hour: 5 × 10^-7
SR3: Environment data: maximum tolerable probability per flight hour: 5 × 10^-7
SR4: CORA2 implementer: maximum tolerable probability per flight hour: 7.5 × 10^-7
Etc.

These requirements are designed to be specific enough to go ahead and develop a working prototype system.

Since, as seen above, there may be several stages of analysis, a means of tracking the various identified hazards, their causes and safeguards, and the resultant safety requirements, needs to be available. This is achieved via hazard logging, and is discussed next.

2.1.7 Hazard Logging & Corporate Memory – HARTS & SIDES

During a simulation or a HAZOP, a good number of hazards, adverse events or unsafe acts (human errors) may be identified. These must be tracked until they have been resolved, meaning that the hazards are either found to have negligible safety impact, or safety requirements have been identified to neutralise the hazard impact, or else they have been recorded to be resolved in future design stages. An example of a hazard log from the system called HARTS (Hazard and Requirements Tracking System) is shown in Table 10 below, which is an early example of a hazard log for airborne separation assurance system (ASAS) applications. The main hazards in this log concern the identification of the wrong target aircraft. The ‘Comments’ column on the right-hand side of the table contains the seeds of future safety requirements¹².

12 The abbreviations in the table are as follows: CDU (Control Display Unit); HMI (Human Machine Interface); ND (Navigational Display); SSR (Secondary Surveillance Radar)


Table 10 – Example of early Hazard Log for airborne separation assurance system (ASAS)

| ID | Key Operational Task | Failure Mode/Error | Cause | Operational Consequence | Safeguards | Comment/Recommendation |
|---|---|---|---|---|---|---|
| | 3. TARGET SELECTION | | | | | |
| | 3.1 Read back Code | | | | | |
| | 3.2 Input code (select target) | | | | | |
| B1 | Select target | Long time to select target | Busy in terminal area (surrounding traffic?) | | A direct input of the target identifier (secondary surveillance radar code) | Full list of a/c on MCDU (Multi-purpose Cockpit Display Unit) has been removed - now only the target aircraft is displayed |
| B1 | Select target | Pilot unable to identify target | Misunderstanding with controller | | | |
| B2 | Select target | Mistake in entering target code | Misunderstanding (quality of communication) | | Cockpit display informing about unknown target or alerting for inconsistent target-own configuration | |
| B2 | Select target | Pilot selects wrong target | | Potential for air collision | Target positioning by pilot (but this is not compulsory at present); Confirmation of target; Controller monitoring of the a/c may identify that the pilot has the wrong target later in the task; The use of Anti-Overlap software tool on the controller's HMI | Consider making target positioning by pilot a compulsory sub-task in target selection. Explore how data link technology could be used to support both controller and pilot when selecting a target during ASAS spacing. |
| B2 | Entering target code | Error inputting the code | | | Explicit positioning of the target (which should contribute to detecting if the target on the ND is the expected one) and if the code entered does not correspond to an existing target, an explicit message "invalid SSR code" is displayed on the CDU | |
| B2 | Entering target code | Slip in entering target code | | | Position target (pilot may detect if target appearing on cockpit displays is where expected target should be) | |
| B2 | Entering target code | Mistake in entering target code | Erroneous controller instruction | | Position target | |


The hazard log is a living document, and is the joint property of the safety manager for the project and the project (design) manager. It gives the project manager a clear indication of what the key safety issues are in his or her project or new design, and a current status of how safety is being built into that concept. For the safety manager, it gives a clear focus on where safety requirements still need to be developed.
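The tracking rule described for the hazard log (each hazard stays open until it is judged negligible, covered by a safety requirement, or deferred to a later design stage) can be illustrated with a small data structure. This is only a sketch: the class names, field names and the status assigned to each entry are assumptions made for illustration, and do not describe how HARTS itself is implemented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    OPEN = "open"                    # still being tracked
    NEGLIGIBLE = "negligible"        # found to have negligible safety impact
    REQUIREMENT = "requirement set"  # a safety requirement neutralises it
    DEFERRED = "deferred"            # recorded for a future design stage


@dataclass
class HazardEntry:
    # Field names mirror the columns of Table 10; they are illustrative only.
    hazard_id: str
    task: str
    failure_mode: str
    cause: str = ""
    consequence: str = ""
    safeguards: List[str] = field(default_factory=list)
    comment: str = ""
    status: Status = Status.OPEN


def unresolved(log):
    """Return the entries that still need a safety requirement or a decision."""
    return [entry for entry in log if entry.status is Status.OPEN]


# Two rows adapted from Table 10; the statuses are hypothetical.
log = [
    HazardEntry("B1", "Select target", "Long time to select target",
                cause="Busy in terminal area",
                safeguards=["Direct input of the target identifier"],
                status=Status.REQUIREMENT),
    HazardEntry("B2", "Select target", "Pilot selects wrong target",
                consequence="Potential for air collision",
                safeguards=["Confirmation of target"]),
]

open_ids = [entry.hazard_id for entry in unresolved(log)]
print(open_ids)
```

A view such as `unresolved(log)` corresponds to the safety manager's question of where safety requirements still need to be developed.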

A new development in relation to hazard logging at the EEC is SIDES – the Safety Information Data Exchange System. Since ATM concepts are still evolving, they are not necessarily stable. Some hazards may become irrelevant if the concept itself has a significant course change. However, since the design process is sometimes cyclical or helical in nature, old concepts or versions of them may be returned to later on, or indeed other stakeholders may revert to an earlier concept design. SIDES aims simply to make hazard information more flexible, so that stakeholders can see what hazards have been identified in total, for example for datalink or ASAS, and then to see which of those hazards relate to the particular conceptualisation of the datalink function that they are currently working on. This is a new approach to hazard logging that is needed when dealing with safety at the concept stage. In the short term SIDES will amount to a managed database using the safety team as an intelligent front end for designers, to determine hazard applicability to concepts which may effectively be moving targets or with ‘movable goalposts’. However, in the future, it is likely that some textual and keyword-based processing will facilitate the development of an intelligent front end to the developing SIDES database.
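The idea of viewing all hazards for a concept element and then narrowing to one conceptualisation can be sketched as a simple tagged store. The record layout, the variant labels (`spacing-v1`, `spacing-v2`, `dl-v1`) and the function names below are hypothetical; the paper does not describe the SIDES data model, and in the short term the safety team performs this filtering manually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    hazard_id: str
    element: str         # concept element, e.g. "ASAS" or "datalink"
    variants: frozenset  # concept variants the hazard is judged to apply to
    description: str


def hazards_for_element(hazards, element):
    """All hazards ever identified for a concept element."""
    return [h for h in hazards if h.element == element]


def hazards_for_variant(hazards, element, variant):
    """Only the hazards relevant to one conceptualisation of that element."""
    return [h for h in hazards_for_element(hazards, element)
            if variant in h.variants]


# Hypothetical store; the variant labels are invented for illustration.
store = [
    Hazard("H1", "ASAS", frozenset({"spacing-v1", "spacing-v2"}),
           "Pilot selects wrong target"),
    Hazard("H2", "ASAS", frozenset({"spacing-v1"}),
           "Long time to select target"),
    Hazard("H3", "datalink", frozenset({"dl-v1"}),
           "Placeholder datalink hazard"),
]

all_asas = [h.hazard_id for h in hazards_for_element(store, "ASAS")]
current = [h.hazard_id for h in hazards_for_variant(store, "ASAS", "spacing-v2")]
print(all_asas, current)
```

Keeping hazards that apply only to superseded variants, rather than deleting them, is what allows an earlier concept to be revisited without repeating the analysis.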

Since ATM concepts often have long timescales for development, e.g. in excess of ten years, SIDES is a useful additional tool. It is not uncommon for people to be heard saying ‘I’m sure this aspect was looked at years ago’ and to be correct, but without a good storage and retrieval system (in other words, a ‘memory’) work will have to be re-done, with the danger that previously identified concerns or understanding may not be repeated or resurrected successfully. Additionally, by looking across safety assessments of variants of the same concept, it may be possible to identify generic lessons about a system element, that might help all the variants and specifically the one that is finally chosen and implemented. Time will tell if this occurs, but the aim of SIDES is not merely a memory or database of safety information, but ultimately a source of knowledge and understanding about the safety of future ATM concepts.

2.2 Top-Down Safety: The Integrated Risk Picture

The foregoing has concerned itself mainly with the bottom-up approach to safety, i.e., considering each new system element (CORA; ASAS; Datalink; etc.) in relative isolation. This is necessary to elicit detailed safety concerns and solutions for the specific element. It also fits with the design approach, which, whilst not fragmented, is still at a ‘modular’ level.

The danger is that of compartmentalised safety; that every concept element or project in its own right is individually safe, but the whole system when put together will not be. The problem is one of system interactivity and complexity, and in safety terms it is one of dependencies and unforeseen interactions between different system elements.

There is an additional problem that, whilst individual system elements may individually stay within the TLS, there is a need to add up the risks from all these components to check that the combined impact remains acceptable. This is akin to considering that there is an overall ‘safety budget’ in terms of adding additional risk to the current system. When piecemeal safety is done, there is no control over who spends the budget (one project may spend its own and several others’ budgets), or even a check that the whole budget has not been exceeded.
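The ‘safety budget’ check amounts to simple arithmetic: add up the risk allocated to each element and compare the total with the overall budget. The sketch below uses the four CORA figures quoted earlier (SR1 to SR4) purely as sample inputs; the budget value and the function name are assumptions, and plain summation is only a reasonable approximation when the individual probabilities are small.

```python
from math import isclose


def check_budget(allocations, budget):
    """Sum per-element risk allocations and compare them with a budget."""
    total = sum(allocations.values())
    return {
        "total": total,
        "remaining": budget - total,
        "within_budget": total <= budget,
        "largest": max(allocations, key=allocations.get),
    }


# Maximum tolerable probabilities per flight hour, taken from SR1 to SR4.
allocations = {
    "CORA2 resolver": 2e-7,
    "Trajectory predictor / MTCD": 5e-7,
    "Environment data": 5e-7,
    "CORA2 implementer": 7.5e-7,
}

HYPOTHETICAL_BUDGET = 2.5e-6  # assumed value, not taken from the paper

result = check_budget(allocations, HYPOTHETICAL_BUDGET)
print(result)
```

Reporting the largest allocation alongside the total addresses the concern that one project may spend its own and several others' share of the budget.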

There is also the likelihood that safety opportunities from synergies of different project elements will be overlooked. If the safety assessment for a project is primarily there to ensure that it is not unsafe, this can be seen (from a system’s viewpoint) as a failure to see where safety could be increased, thereby increasing the overall system safety budget. It is likely that at least some concept elements may increase risk; the aim after all is to increase system capacity, which means more aircraft, closer proximity when landing, etc. Therefore some other system elements should aim to increase safety, to compensate for inevitable increases in risk elsewhere in the system. Moreover, we need to understand where the relative risks are in the current system, to identify the best places to develop new safety enhancement measures in the future.

Lastly, there is a need to check that the new system elements, when introduced, do actually deliver the safety levels for which they have been assessed. Safety assessment, especially quantitative safety assessment, has uncertainty attached to it. It is therefore necessary to see that safety is actually staying within the prescribed targets. This translates into safety monitoring requirements, in terms of monitoring incidents that occur when new concepts become an operational reality. Often there may be a temporary increase in risk, which reduces when the system settles down. However, if the risk endures at its new higher level, then this is a problem that has to be addressed urgently. In order to monitor safety intelligently, it is necessary to have a precise safety understanding of the new system elements and what types of risks could be expected when they are introduced and begin interacting with the rest of the ATM system¹³.

The solution to all of these issues is found in the Integrated Risk Picture (IRP - Kirwan and Perrin, 2004; Perrin and Spouge, 2005). The IRP is a project that aims to do four things:

Show (based on incident and accident experience) where the relative risks in ATM are now, and hence the priorities for safety investment.

Integrate the safety assessment experience for future projects (including interactions) into a complete ATM safety picture for a future key date or period (e.g. 2012-17), thus also showing whether safety targets in the future are likely to be met.

Determine safety monitoring requirements so that, as new concepts are actually deployed, their actual safety performance can be compared against their predicted safety performance.

Identify where new safety concepts must be developed to ensure future safety targets are met and where possible improved upon.

The IRP acts as a validation of the lower level (SAND) safety work discussed so far, which is at the project or system element level. IRP aims to determine whether the system will be safe enough when all these new concepts are put together into one coherent operational concept in the 2012 – 2017 time frame. Additionally, since the various concepts will be implemented incrementally rather than in a ‘big bang’ fashion, it will also consider the safety ramifications of different sequences of deployment against a steady rise in traffic levels (capacity).

The IRP avoids a compartmentalised approach to safety assurance, whereby each new element justifies its own safety without considering interactions with other concepts under development. IRP can also help identify common safety issues that may affect a number of new concepts, for example with respect to certain human factors issues that need to be addressed via ‘cross-cutting’ safety requirements for a range of concepts. Situation awareness and distractions are two such issues already identified at this stage by IRP.

IRP must describe and assess the whole system architecture. It models the whole system using an approach called SADT (Structured Analysis and Design Technique [Marca & McGowan 1988] – see figure 16 for an example), and then applies a set of fault and event trees to determine overall risk and risk contributions from different parts of the system. This allows a top down perspective for individual projects or system elements, to see what their risk budget should be, and where they should aim to increase safety.

13 A noteworthy point here is that the new system elements will not be introduced all together in a ‘Big Bang’ fashion, but over a period of several years, leading to incremental changes rather than a sudden paradigm shift. This is generally seen as a safer strategy.

Figure 16 – SADT model example used in the Integrated Risk Picture

Key: MTCD (Medium Term Conflict Detection); AMAN (Arrival Manager); LOA (Letter of Agreement); SSR (Secondary Surveillance Radar); OLDI (On-Line Data Interchange); STCA (Short Term Conflict Alert)

The IRP also allows a better consideration of interactions and dependencies from system elements that are ‘services’ for other concept elements (e.g. datalink). In safety terms, it allows a high level overview of common cause failures that can inflict risk on different parts of the whole system simultaneously.
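The fault tree calculations referred to above combine basic event probabilities through AND and OR gates. The following is a minimal sketch of that arithmetic under the assumption of independent events; the event names and probabilities are hypothetical and are not taken from the IRP model.

```python
from math import prod, isclose


def and_gate(probs):
    """Output occurs only if every input occurs (independent inputs)."""
    return prod(probs)


def or_gate(probs):
    """Output occurs if at least one input occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic event probabilities, for illustration only.
p_detection_tool_fails = 1e-3
p_controller_misses_conflict = 1e-2
p_shared_service_fails = 1e-4

# Both barriers must fail for the conflict to go undetected.
p_barriers_fail = and_gate([p_detection_tool_fails,
                            p_controller_misses_conflict])

# Top event: either both barriers fail, or the shared service fails.
p_top = or_gate([p_barriers_fail, p_shared_service_fails])
print(p_barriers_fail, p_top)
```

The independence assumption is exactly what a common cause failure violates: if one shared service feeds both barriers, multiplying their failure probabilities understates the risk, which is why a whole-system view of dependencies is needed.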

Technically IRP is not part of SAND, but is SAND’s ‘big brother’. Experience to date is showing that the bottom up work (SAND) enables designers to make their concepts safer and more robust against risk, often at a very detailed and precise level. The IRP, a newer arrival in the EEC safety management approach, when finished will then give higher level or strategic insights into where safety must be made more robust, either via existing projects, or possibly by new safety initiatives outside the current system design concept.

3. Evaluation

SAND has been developed in the period from 2002 to the present (mid 2005), and so is fairly new. It has been applied to ten projects, and has resulted in safety requirements for several of these; others are still exploring and assessing their risk priorities. All the methods listed above have been applied, with the exception of cross-boundary HAZOP, which is still under development. The list of EEC projects and safety activities continues to grow. In this sense therefore, the approach appears to fit the design activities in the EEC, and is working (delivering safety insights and requirements) and gaining ground. Additionally, some stakeholders outside the EEC and Eurocontrol have requested support for safety evaluation of concepts using SAND techniques.

SAND is administered by a group of eight safety practitioners called the Safety Research Team (SRT), who also carry out research into ATM safety issues and new methodologies. Although the EEC is developing a Safety Management System (SMS), it will be another year and a half before this is complete, and so it is too early to say whether SAND will be sustainable in the long term or not. However, an independent and internal safety culture survey¹⁴ has shown that the safety culture in the EEC has improved in the two years since a baseline safety culture survey was carried out, resulting in a better understanding of safety practices and the need for safety. Such a survey, being conducted independently from the SRT, helps answer positively the more general question of whether the EEC designers are more convinced about the added value of safety assessment than two years ago.

The ultimate evaluation will come much later, however, firstly as concepts mature and require formal safety assessment. At this stage it will be seen how useful the early safety work will be in helping the formal safety case processes. The utility of the safety requirements identified during the SAND stage of assessment will also become clearer. Later on, as the concepts themselves are deployed, their ability to increase safety will be measured via incident rates and other safety performance indicators. This will be the ultimate test of the value of the SAND approach.

For IRP, the overall risk picture for 2004 is currently under validation, before determining the relative safety risks and potential risk apportionments to different parts of the ATM system. In parallel, the IRP for 2012, embodying the key system concept elements for the new ATM paradigm, is under development, to be completed in late 2006. It is envisaged that these two IRPs will enable a ‘safety roadmap’ to be developed that will show what is needed to stay acceptably safe by 2012, with intermediate measurable safety targets to ensure that ATM stays safe on its way to the future.

4. Conclusions

It was decided that the safety of early concepts should be assessed at EEC, with a view to building safety into design from a very early stage. The process which has been developed to do this is called SAND, with a particular emphasis on hazard identification and requirements determination, and on recording and feeding forward hazard understanding and safety requirements to future system design and development stakeholders. The approach appears to be working, in that it is being applied and gaining reasonable results, and has a degree of credibility with the designers themselves. It is still in its formative stages, so it is too early to see the longer-term impacts on actual ATM safety, or indeed whether the approach will become a sustainable process.

A further test of the SAND approach will be to see whether others in the ATM industry, or even in other industries, adopt a similar or parallel approach, or at least decide to evaluate safety at such an early stage. In any event, a case has been made for doing safety at the early concept stages of design, and the approach has been found to work. Furthermore, designers, or at least concept developers, have found that they can work with safety at this early stage. Designers want their products to be safe, but need to be able to understand and manage the impact of the safety process on their work and domain. The simplified safety approaches embodied in SAND have all had an underlying philosophy of meaningful communication, working with the designers often in their own design language rather than the language of technical safety and risk assessment. When the relevance of specific incidents is made clear to the designers in terms of their own specific concepts, when task analyses and operational scenarios, and above all hazards, are meaningfully described to designers, safety moves from being an external issue to being another design problem to solve, and designers are good at problem solving. At this point of meaningful communication, safety enters the design process, and the designers work to improve safety.

14 This survey was carried out by DNV in 2005, based on the Safety Culture Survey tool developed at the EEC with which a baseline survey was carried out in 2003 (see Gordon and Kirwan, 2005).

IRP in the near future will enable more strategic thinking about safety in ATM. When the IRP for 2012 and its resultant safety roadmap are completed, there will be a major design challenge to develop the required safety interventions to keep ATM safe against a constant pressure to increase capacity and quality of service. Such safety achievement will not happen without focused design and safety effort, working together as partners. It is hoped that SAND and the Integrated Risk Picture will become useful tools, enabling a common language or shared vision of risk, to help this alliance of safety and design reach their mutual goal of a safe and effective ATM service in the next decade.

Disclaimer

The contents of this paper are the opinions of the author alone and do not necessarily represent those of Eurocontrol or related organisations.

Acknowledgements

The author wishes to acknowledge the following members of the Safety Research Team who have worked on SAND or IRP or related work over the past three years: Eric Perrin, Rachael Gordon, Catherine Gandolfi, Fabrice Drogoul, Brian Hickling, Andrea Pechhacker, Paul Humphreys, Tony Joyce, Garfield Dean, Adrian Gizdavu, Veronique Begault, Yann Kermarquer. The following contractors are also acknowledged: Deirdre Bonini, Steve Shorrock, Ed Smith, Mariken Everdij, Andreas Antonini, John Spouge, and Katie Callan. The author would also like to acknowledge Dirk Schaefer, Michiel Woldring and Alistair Jackson in the Human Factors area, and members of the CORA and ASAS project teams, and last but by no means least, Pierre Andribet, operational manager of the EEC, who has supported Safety Research at the Experimental Centre.

5. References

Andrews, J.D. and Moss, T.R. 2002, Reliability and Risk Assessment (2nd Edition). London: Professional Engineering Publishing Ltd.

ANSV 2001, Final Report: Accident Involved Aircraft Boeing MD-87, registration SE-DMA and Cessna 525-A, registration D-IEVX, Milano Linate airport October 8, 2001. ANSV 20/01/04 - N.A/1/04. www.ansv.it

Antonini, A. and Kermarquer, Y. 2004, HITL (Human in the loop) Safety Experiments guide, EEC publications, Brétigny, France. (available on web-link [30]).

Bonini, D., Joyce, A. 2004, Designing Safety into future Air traffic Control system by Learning from operational, Human Factors and Ergonomic Society Conference, Delft, Netherlands.

BFU Report 2004, Investigation report AX001-1-2/02. Bundesstelle für Flugunfalluntersuchung (BFU), http://www.bfu-web.de, May.

Callan, K., Siemieniuch, C., Sinclair, M., Rognin, L., Kirwan, B., and Gordon, R. 2004, Review of Task Analysis for Use with Human Error Assessment techniques within ATC Domain, Contemporary Ergonomics, Swansea, UK, pp.293-297.

Cox, S.J. and Tait, N.R.S. 1991, Reliability Safety and Risk Management. London: Butterworth-Heinemann.

EEC 2004, Safety Policy, EEC Publication, Brétigny, France. http://www.eurocontrol.int/eec/gallery/content/public/documents/EEC_safety_documents/EEC_Safety_Policy_001.pdf


Eurocontrol 2003a, Strategic Safety Action Plan: http://www.eurocontrol.int/ssap/public/standard_page/agas.html

Eurocontrol 2003b, ESARR 4 & Safety Assessment Methodology: http://www.eurocontrol.int/src/public/standard_page/esarr4.html

Eurocontrol-FAA Action Plan 15 (2005), ATM Safety Techniques and Toolbox. http://www.eurocontrol.int/eec/public/standard_page/safety.html

Everdij, M. 2004, Review of techniques to Support the EATMP Safety Assessment Methodology, EEC Note 2004-1, Brétigny, France.

Gordon, R. & Kirwan, B. (2005) Developing a safety culture in a research and development environment, in de Waard, D., Brookhuis, K.A., van Egmond, R., & Boersema, T. (Eds.) Human Factors in Design, Safety and Management, pp. 493 – 504. Maastricht: Shaker Publishing.

Gordon, R., Shorrock, S., Pozzi, S., & Boschiero, A. (2005) Predicting and simulating human errors in using the airborne separation assurance system procedure. Human Factors and Aerospace Safety, 5, 1, 43 – 60.

Joyce, A. & Bonini, D. 2005, Final Report on the SAFLEARN Process. EEC Note. Brétigny, France.

Kirwan, B. & Ainsworth, L.K. 1992 (Eds.) A Guide to Task Analysis. London: Taylor & Francis.

Kirwan, B. 1994, A Guide to Practical Human Reliability Assessment. London: Taylor & Francis.

Kirwan, B., & Kennedy, R. (2001) Assessing Safety & Usability of Air Traffic Management (ATM) Systems Using a HAZOP Approach. In Human Error & Safety System Development (HESSD 2001), Linkoping, Sweden, June 11-12.

Kirwan, B. 2003, Safety Research & Development Plan, March 2004, EEC Publication, Brétigny, France.

Kirwan, B. & Perrin, E. (2004) Imagining Safety in European Air Traffic Management. 3rd International Conference on Occupational Risk Prevention, Eds. Mondelo, P.R., Mattila, M., Karwowski, W., & Hale, A.R. (ORP 2004; Santiago, Spain, June 2-4). ISBN 84-933328-2-8

Kirwan, B., Rodgers, M., & Schaefer, D. [Eds.] (2005) Human Factors Impacts in ATM. Aldershot: Ashgate.

Kletz, T. 1974, HAZOP & HAZAN – Notes on the Identification and Assessment of Hazards. Rugby: Institution of Chemical Engineers.

Marca, D.A. & McGowan, C.L. (1988). SADT: Structured Analysis and Design Technique. New York: McGraw-Hill.

Perrin, E. 2004, Category I (CAT-I) Ground Based Augmentation System (GBAS) PSSA Quantification Report. Eurocontrol Experimental Centre, Brétigny, F-91222 France, March.

Perrin, E. & Spouge, J. 2005, Safety Management Coping with Complexity in Air Traffic Management 27 – 30 June, ESREL 2005, Poland.

Perrin, E. 2005, First step towards a uniform and system wide approach to safety: the Integrated Risk Picture for European Air Traffic Management in 2004. EEC Report.

Shorrock, S.T. & Kirwan, B. (2002) Development and Application of a Human Error Identification Tool for Air Traffic Control. Applied Ergonomics, 33, 319 - 336.

Shorrock, S. 2003, Individual and Group Approaches to human error identification, Eurocontrol Note 2003-8, Brétigny, France.