software system safety
TRANSCRIPT
c
.
Nancy G. Leveson
Software System Safety
to the source is given. Abstractingwith credit is permitted.that the copies are not made or distributed for direct commercial advantage and provided that credit
http://sunnyday.mit.edu
MIT Aero/Astro Dept. ([email protected])
Copyright by the author, November 2004.All rights reserved. Copying without fee is permitted provided
��������� ��� ���
��������� ��� ��� ���������������
���������������Accident with No Component Failures
LC
COMPUTER
WATER
COOLING
CONDENSER
VENT
REFLUX
REACTOR
VAPOR
LA
CATALYST
GEARBOX
c
c
Caused by interactive complexity and tight coupling
Exacerbated by the introduction of computers.
Arise in interactions among components
Single or multiple component failures
Component Failure Accidents
Types of Accidents
Usually assume random failure
System Accidents
No components may have "failed"
..
From a blue ribbon panel report on the V−22 Osprey problems:
From an FAA report on ATC software architectures:
Confusing Safety and Reliability
reliability."en route automation systems must posses ultra−highconsideration as a safety−critical system. Therefore,"The FAA’s en route automation meets the criteria for
����� ��� ������������� � � � � �
����� ��� ������������� � � � � �
Recommendation: Improve reliability, then verify byextensive test/fix/test in challenging environments."
.
..
to safety accordingly.their nature, and we must change our approachesAccidents in high−tech systems are changing
ReliabilitySafety
"Safety [software]: ...
.
�������������"!$#
�������������"!$�
c
c
����� ��� ������������� � � � � �
����� ��� ������������� � � � � �
Reliability: The probability an item will perform its requiredfunction in the specified manner over a given timeperiod and under specified or assumed conditions.
Concerned primarily with failures and failure rate reduction
Parallel redundancyStandby sparingSafety factors and marginsDeratingScreeningTimed replacements
Reliability Engineering Approach to Safety
(Note: Most software−related accidents result from errors
from assumed conditions.)in specified requirements or function and deviations
�������������"!$�
�������������"!$%
c
c
Failure: Nonperformance or inability of system or componentto perform its intended function for a specified timeunder specified environmental conditions.
A basic abnormal occurrence, e.g.,
burned out bearing in a pump
relay not closing properly when voltage applied
Fault: Higher−order events, e.g.,relay closes at wrong time due to improper functioningof an upstream component.
All failures are faults but not all faults are failures.
Does Software Fail?
�&��� ��� ������������� � � � � �
�&��� ��� ������������� � � � � �
Highly reliable components are not necessarily safe.
Incomplete or wrong assumptions about operation of
�������������"!$'
c
Are usually caused by flawed requirements
�������������"!$(
controlled system or required operation of computer.
Unhandled controlled−system states and environmentalconditions.
Merely trying to get the software ‘‘correct’’ or to make it
Software−Related Accidents
reliable will not make it safer under these conditions.
c
Reliability Engineering Approach to Safety (2)
Assumes accidents are the result of component failure.
Techniques exist to increase component reliabilityFailure rates in hardware are quantifiable.
Omits important factors in accidents.May even decrease safety.
Many accidents occur without any component ‘‘failure’’
e.g. Accidents may be caused by equipment operationoutside parameters and time limits upon which reliability analyses are based.
Or may be caused by interactions of componentsall operating according to specification
Example (batch reactor)System safety constraint:
Water must be flowing into reflux condenser whenevercatalyst is added to reactor.
Software must always open water valve before catalyst valve
constraints of materials to intellectual limits
A Possible Solution
Enforce discipline and control complexity
Build safety in by enforcing constraints on behavior
Limits have changed from structural integrity and physical
Improve communication among engineers
����� ��� ������������� � � � � �
������)�*�*�� ��+ �
Software safety constraint:
c
��������������,
��������������#�-
�������������"!$.
what is specified in requirements.Software has unintended (and unsafe) behavior beyond
Requirements do not specify some particular behavior
behavior unsafe from a system perspective.Correctly implements requirements but specified
required for system safety (incomplete)
Software may be highly reliable and ‘‘correct’’ and stillbe unsafe.
Software−Related Accidents (con’t.)
c
c
The primary safety problem in computer−based systems
is the lack of appropriate constraints on design.
The job of the system safety engineer is to identify the
design constraints necessary to maintain safety and to
ensure the system and software design enforces them.
The Problem to be Solved
.
������)�*�*�� ��+ �c ��������������#/!
������)�*�*�� ��+ �
������)�*�*�� ��+ �
identification
management
evaluationelimination control
A planned, disciplined, and systematic approach topreventing or reducing accidents throughout the life
‘‘Organized common sense ’’ (Mueller, 1968)
cycle of a system.
Primary concern is the management of hazards:
System Safety
MIL−STD−882
��������������'/!
��������������'�#
design
c
c
Engineers should recognize that reducing risk is not animpossible task, even under financial and time constraints.All it takes in many cases is a different perspective on thedesign problem.
Mike Martin and Roland SchinzingerEthics in Engineering
An Overview of The Approach
Hazard
throughanalysis
Process Steps
2. Perform a System Hazard Analysis (not just Failure Analysis) Identifies potential causes of hazards
Produces hazard list
4. Design at system level to eliminate or control hazards.
5. Trace unresolved hazards and system hazard controls to
and humans.3. Identify appropriate design constraints on system, software,
software requirements.
������)�*�*�� ��+ �
������)�*�*�� ��+ �
1. Perform a Preliminary Hazard Analysis
developmentConceptual
throughout system development and use.
Design Development Operations
Hazard identification
Hazard resolution
Verification
Change analysis
Operational feedback
System Safety (2)
Management
c ��������������'��
Hazard analysis and control is a continuous, iterative process
c ��������������'��
Hazard resolution precedence:
1. Eliminate the hazard2. Prevent or minimize the occurrence of the hazard3. Control the hazard if it occurs.4. Minimize damage.
Process Steps (2)
6.
Human factors analyses (usability, workload, etc.)
Mode confusion and other human error analyses
Robustness (environment) analysis
Software hazard analysis
Simulation and animation
Software requirements review and analysis
������)�*�*�� ��+ �
Derive from system hazard analysis
Specifying Safety Constraints
What must not do is not inverse of what must do
Need to specify what software must NOT do
Need to specify off−nominal behavior
Most software requirements only specify nominal behavior
������)�*�*�� ��+ �
Completeness
��������������'�'
��������������'�%c
c
9.
Process Steps (4)
Periodic audits
Performance monitoring
Incident and accident analysis
Change analysis
Operational Analysis and Auditing
������)�*�*�� ��+ �
8.
7.
Off−nominal and safety testing
Exception−handling etc.
Elimination of unnecessary functions
Separation of critical functions
Assertions and run−time checking
Defensive programming
Implementation with safety in mind
������)�*�*�� ��+ �
Process Steps (3)
��������������'�,
��������������'�(c
c
021436587/7�9 :�;/<=1
Usability Analysis
Other Human FactorsEvaluation (workload, situationawareness, etc.)
Performance Monitoring
Task Allocation Principles
Training Requirements
Operator Goals and
>@? A�BDC E�F�? A�GIHKJIL$M
Operator Task and
Responsibilities
>@NDODN�P$QSRTF�Q�N�? GSA�MU C VSBDW U�XSYZU Q�[�VS\D? XS]
P^V�N�WDNZV�A�FIB X R`_ X A�Q�ASPJ X FSQ�CDV�ASFIQ�\DV�C E�V�P^Q X _�Q ] V�P X�]
a ? Q�C FIP^Q�NDP^? ASG�b�? A�NDP^V�C C VSP$? X ASb
cZ_SQ ] VSP$? X A�N
VSA�FIP ] V�? A�? A�G
Hazard List
Simulation/Experiments
Change Analysis
Incident and accident analysis
Periodic audits
Performance MonitoringChange Analysis
Periodic audits
Preliminary Hazard Analysis
System Hazard Analysis
Safety Verification
Operational AnalysisOperational Analysis
Operator Task Analysis
Preliminary Task Analysis
Fault Tree Analysis
Safety Requirements andConstraints
Completeness/ConsistencyAnalysis
State Machine Hazard Analysis
Deviation Analysis (FMECA)
Mode Confusion Analysis
Human Error Analysis
Timing and other analyses
Safety TestingSoftware FTA
Simulation and Animation
System
VSA�FIF�Q�N�? G�AIB X A�NDP ] VS? ASP$NX _�Q ] V�P$? X ASV�C ] Q�d�E�? ] Q�RIQ�A�P^NeZQSA�Q ] V�P$QIN�ODNDP^Q�RTVSA�F
f Q ] ? g$? BDVSP$? X A
VSA�F X _�Q ] V�P X�] RIV�A�ESV�C NFS? N�_�C VSODNDbSP ] V�? A�? ASGIR`VSP$Q ] ? V�C NDbB X RI_ X A�QSA�P^NDb�B X A�P ]@X C NZV�A�Fh QSND? G�AIV�A�FIB X A�NDP ] E�B�P
QSA�\D? ]�X A�RIQ�A�P^V�CDV�N�NDE�RI_�P^? X A�NL^F�QSA�P$? g^OZNDODNDP^Q�RiG X V�C NZV�A�F
c
GSQ�A�Q ] V�P^QINDODNDP^Q�RiF�Q�ND? G�Aj C C X BDVSP$QIP^V�NDWDNZVSA�F
System SafetyEngineeringHuman Factors
A Human−Centered, Safety−Driven Design Process
k43�lm3�nm:porqts/upv4w/x
y ��z�� � {�)&� � � ���� �
4. Establish the hazard log.
1. Identify system hazards
2. Translate system hazards into high−level
3. Assess hazards if required to do so.
system safety design constraints.
Preliminary Hazard Analysis
.
.
..
y ��z�� � {�)&� � � ���� �
Door that closes on an obstruction does not reopen or reopeneddoor does not reclose.
Doors cannot be opened for emergency evacuation.
����������������(
����������������,
c
Door closes while someone is in doorway
Door opens while improperly aligned with station platform.
Door opens while train is in motion.
Train starts with door open.
System Hazards for Automated Train Doors
c
other than a safe point of touchdown on assigned runway (CFIT)
y ��z�� � {�)&� � � ���� �
violate minimum separation.
with stationary objects or leaves the paved area.
Controlled aircraft executes an extreme maneuver within its
Identify the system hazards for this cruise−control systemExercise:
The cruise control system operates only when the engine is running.
traveling at that instant is maintained. The system monitors the car’sWhen the driver turns the system on, the speed at which the car is
speed by sensing the rate at which the wheels are turning, and itmaintains desired speed by controlling the throttle position. After the system has been turned on, the driver may tell it to start increasingspeed, wait a period of time, and then tell it to stop increasing speed.Throughout the time period, the system will increase the speed at afixed rate, and then will maintain the final speed reached.
The driver may turn off the system at any time. The system will turnoff if it senses that the accelerator has been depressed far enough tooverride the throttle control. If the system is on and senses that thebrake has been depressed, it will cease maintaining speed but will notturn off. The driver may tell the system to resume speed, whereuponit will return to the speed it was maintaining before braking and resumemaintenance of that speed.
authorization.
Controlled airborne aircraft and an intruder in controlled airspace
y ��z�� � {�)&� � � ���� �
Controlled airborne aircraft gets too close to a fixed obstable
Controlled aircraft operates outside its performance envelope.
Aircraft on ground comes too close to moving objects or collides
Aircraft enters a runway for which it does not have clearance.
performance envelope.
Loss of aircraft control.
System Hazards for Air Traffic ControlControlled aircraft violate minimum separation standards (NMAC).
Airborne controlled aircraft enters an unsafe atmospheric region.
Controlled airborne aircraft enters restricted airspace without
����������������.
����������������-
c
c
y ��z�� � {�)&� � � ���� �
1. A pair of controlled aircraft
1b. ATC shall provide conflict alerts.
maintain safe separation betweenaircraft.
1a. ATC shall provide advisories that
direct aircraft into areas with unsafeatmospheric conditions.
2a. ATC must not issue advisories that
2b. ATC shall provide weather advisoriesand alerts to flight crews.
2c. ATC shall warn aircraft that enter an unsafe atmospheric region.
Hazards must be translated into design constraints.
Door areas must be clear before door
Door opens while train is in motion.
violate minimum separation
�������������S�/!c
Example PHA for ATC Approach Control
areas, thunderstorm cells)(icing conditions, windshear
REQUIREMENTS/CONSTRAINTSHAZARDS
unsafe atmospheric region.2. A controlled aircraft enters an
standards.
doorway.
Door must be capable of opening only after
motion.Doors must remain closed while train is in
any door open.Train must not be capable of moving withTrain starts with door open.
DESIGN CRITERIONHAZARD
train is stopped and properly aligned with
Doors cannot be opened foremergency evacuation.
Door that closes on an obstructiondoes not reopen or reopened door does not reclose.
Door closes while someone is in
with station platform.Door opens while improperly aligned
emergency evacuation.anywhere when the train is stopped forMeans must be provided to open doors
reclose.removal of obstruction and then automaticallyAn obstructed door must reopen to permit
closing begins.
platform unless emergency exists (see below).
y ��z�� � {�)&� � � ���� �
Example PHA for ATC Approach Control (2)
c
to avoid intruders if at all possible.
HAZARDS REQUIREMENTS/CONSTRAINTS
6. Loss of controlled flight or lossof airframe integrity.
safety of flight.
the pilot or aircraft cannot fly or that 6c. ATC must not issue advisories that
6b. ATC advisories must not distractor disrupt the crew from maintaining
degrade the continued safe flight of the aircraft.
it at the wrong place.
that cause an aircraft to fall below
6a. ATC must not issue advisories outsidethe safe performance envelope of theaircraft.
6d. ATC must not provide advisories
the standard glidepath or intersect
�������������S��#
5. ATC shall provide alerts and advisories
HAZARDS REQUIREMENTS/CONSTRAINTS
3. A controlled aircraft entersrestricted airspace withoutauthorization.
4. A controlled aircraft gets too close to a fixed obstacle or terrain other than a safe point oftouchdown on assigned runway.
5. A controlled aircraft and anintruder in controlled airspaceviolate minimum separationstandards.
3a. ATC must not issue advisories thatdirect an aircraft into restricted airspaceunless avoiding a greater hazard.
3b. ATC shall provide timely warnings toaircraft to prevent their incursion intorestricted airspace.
4. ATC shall provide advisories that maintain safe separation betweenaircraft and terrain or physical obstacles.
y ��z�� � {�)&� � � ���� �
y ��z�� � {�)&� � � ���� �
10 12 12
121212121110
|t}�~�}������������������������� ��}���� ��� �
��������������������� ���
��������������������� ���
������� ��}���� ��� �|t}�~�}������������������
�������������� ������}���� � � ������6��}�~�}��������������������� � ���|t}�~�}������������
����������������}�~�}������� � ��� ��}����������������� ���������
6
Impossible
CatastrophicI
Critical
II
Marginal
III
Negligible
IV
1 2 3 4 129
12127643
5 8
�t����� ����������� ��
�� �����¡� �¢�£���¤���¡�¥
¦t��¤���¡�¥�§�¨����©����� �����¡� �¢ ¢ ��¥ �¡6£���¤���¡�¥ª ¡� �©���©�� ¢ � ��«¡���¥�¨�����¥
¬����¢ � ��� ©�¢ ��£���¤���¡�¥
�����¨�¡4¡��������® § ª ������ ©�¢ �¯K����¨�§���°Z� ¢ ¢
�� ��� �����¨�¡
¬Z �¡�§���¢ ¢ «±�� ���� �������²�²����p��� ³��
A B C D E F
´�µ�¶�µ�·�¸�¹�º"»$¼�½
Another Example Hazard Level Matrix
��¢ � §�� �������� �¡
¡���¥�¨�����¥ª ¡� �©���©�� ¢ � ��« �¡6£���¤���¡�¥©����� �����¡� �¢ ¢ ��¥¦t��¤���¡�¥�§�¨�����t����� ����������� ��
¡�����¨�� ¡���¥��� ��¢ � §�� �������� �¡�� �����¡� �¢�£���¤���¡�¥
�t����� ����������� ��¡�����¨�� ¡���¥��� ��¢ � §�� �������� �¡�� �����¡� �¢�£���¤���¡�¥
�t����� ����������� ��¡�����¨�� ¡���¥��� ��¢ � §�� �������� �¡�� �����¡� �¢�£���¤���¡�¥
�t����� ����������� ��¡�����¨�� ¡���¥��� ��¢ � §�� �������� �¡�� �����¡� �¢�£���¤���¡�¥
�t����� ����������� ��¡�����¨�� ¡���¥���
Improbable
Remote
Unlikely
Impossible
Catastrophic Critical Marginal Negligible
I−A
I−B
I−C
I−E
I−F
II−A
II−B
II−C
II−D
II−E
II−F
III−A
III−B
III−C
III−D
Occasional
c
c ´�µ�¶�µ�·�¸�¹�º"»$¼�¾
A
B
C
D
E
F
Frequent
Moderate
III−E
Frequent Probable Occasional Remote
III−F
IV−A
IV−B
IV−C
IV−D
IV−E
IV−F
I−D
IVIIIIII
LIKELIHOOD
SEVERITY
Classic Hazard Level Matrix
¿&À ·�Á µ�ÂÄÃ�Å�Æ�Å�Ç È�É&¹�Å�Ê À ·�Ë ·
¿&À ·�Á µ�ÂÄÃ�Å�Æ�Å�Ç È�É&¹�Å�Ê À ·�Ë ·
detailed constraints.
StatesInitiatingFinal
EventsInitiating
nonhazard
nonhazard
nonhazard
X
Z
Y
W
D
C
B
A
Final
´�µ�¶�µ�·�¸Ì¹�º"»�»$½
´�µ�¶�µ�·�¸Ì¹�º"»�»$¾
Forward vs. Backward Search
HAZARDHAZARD
Backward Search
nonhazard
nonhazard
nonhazard
X
Z
Y
W
D
C
B
A
StatesEvents
Forward Search
Used to refine the high−level safety constraints into more
Hazard Causal Analysis
Requires some type of model (even if only in head of analyst)
BackwardForward Bottom−upTop−down
to system hazards.system design (model) for states or conditions that could leadAlmost always involves some type of search through the
c
c
Can use to refine system design constraints.
¿&À ·�Á µ�ÂÄÃ�Å�Æ�Å�Ç È�É&¹�Å�Ê À ·�Ë ·
software behavior.
¿&À ·�Á µ�ÂÄÃ�Å�Æ�Å�Ç È�É&¹�Å�Ê À ·�Ë ·
not openvalve 1
too highPressure
fails on
PositionIndicator
Valve 1
too late Light fails on
Open Indicator
failureComputer does
failureValve
inattentiveOperator
open valve 2not know to
Operator does
FailureSensor
does not openRelief valve 2
Computer
´�µ�¶�µ�·�¸Ì¹�º"»�»$Í
´�µ�¶�µ�·�¸Ì¹�º"»�»$Îc
System fault trees helpful in identifying potentially hazardous
FTA and Software
Not looking for failures but incorrect paths (functions)
provides some assurance they don’t exist.Identifies any paths from inputs to hazardous outputs or
FTA can be used to verify code.
Appropriate for qualitative analyses, not quantitative ones
Fault Tree Example
or
or
and
and
or
open valve 1command to
does not issueComputer
output
c
does not open
Valve
Explosion
Relief valve 1
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
Ð@ÑÓÒÕÔ�Ñ×ÖSØÚÙÜÛ$Ô�ÝtÔ�ÙßÞ@ÐÞ�Ù×Û à$áßÔ�âÝ2Ñ×ãÓãÓá6ÖßÛ Ý2Ù6Ð&Û Ñ×Ö Ý�Ñ×ãÓãäá6Ö6Û Ý�Ù6ÐåÛ Ñ×Ö
æ ÑÜÖrÐåÔ�Ñ×à$à â×ÔKÛ ç2çtárâßç
èêé èëé
é Ùßì×Û ÑÓÞ@ÙÜÛ$à á6Ô�â é Ù6ì×Û ÑÓÑ×Ö�ÒÕÔ�Ñ×ÖrØÞåÔ�âßí×árâÜÖrÝ�î
ï ç�î�ÝñðrÑ×à Ñ6Ø×Û Ý�ÙÜà�çtà Û$ò órÙ×ôSâ×à�Û ÖãÓÛ çñà â6Ùßì×Û ÖrØò6à ÙßÝ�âÓÑ×Ö
çñòrâ6âßìÚÙßì6õtÛ ç�Ñ×Ô�îÞ�Ù×Û à$áßÔ�â
ö á6ã÷Ù×Öï ðrî�çñÛ Ý2Ù×à
èëé
èëéÙ×Û$Ô�ÝñÔ�ÙßÞ@ÐßÐ@Ñøã�ÙÜù�âøÖrâßÝ�â6ç�ç2Ù×Ô�îúçtòrâßâ6ìÓÝtðrÙÜÖrØ6â
æ ÑÜÖrÐåÔ�Ñ×à$à â×ÔKÛ$ÖSç�ÐåÔ�árÝ�ÐåÛ ÑÜÖrçúì6ÑøÖrÑßÐ6Ý�ÙÜárç�â
Example Fault Tree for ATC Arrival Traffic
èëé
èëé
ãÓÛ Ö6Û$ãäá6ãûç�â×òrÙÜÔ�Ù6ÐåÛ Ñ×Ö�ç2Ð@ÙÜÖrì6ÙÜÔ�ìßçü òrÙ×Û Ô"ÑßÞ6Ý�ÑÜÖrÐ&Ô�ÑÜà$à â6ìÓÙ×Û Ô�ÝtÔ�Ù6Þ�Ð6õtÛ Ñ×à Ù6Ð�â
´Ìµ�¶�µ�·�¸�¹�ºß»�»$ý
þ Ô�Ñ×ÖSØøà ÙÜôrâ×à
´Ìµ�¶�µ�·�¸�¹�ºß»�»$ÿ
Example Fault Tree for ATC Arrival Traffic (2)
æ ÑÜÖrÐåÔ�Ñ×à$à â×ÔKÛ ç2çtárâßç
ç�â×òrÙÜÔ�Ù6ÐåÛ Ñ×ÖõtÛ ÑÜà ÙßÐ&Û Ñ×Ö��
Ð@ÑßÑøà Ù6Ð�âÓÐ�ÑÓÙ6õ�ÑÜÛ ìçtòrâ6âßìÓÙ6ì6õñÛ ç�ÑÜÔ�î
õtÛ ÑÜà ÙßÐ&Û Ñ×ÖÙ6õ�Ñ×Û ìÓç�âÜòrÙ×Ô�Ù6ÐåÛ Ñ×ÖÐ&ðSÙ6Ð6ìßÑ6â6çëÖrÑßÐçtòrâ6âßìÓÙ6ì6õñÛ ç�ÑÜÔ�îæ ÑÜÖrÐåÔ�Ñ×à$à â×ÔKÛ ç2çtárâßç
ÖrÑßÐ6Þ�Ñ×à à Ñ6Ò Û Ð��Ô�â6Ý2â×Û õ2â6çëÛ ÐÜô6árÐßì6Ñ6âßç
ç�ÝtÔ�â6âÜÖ
Ù6ç�ç2Ñ6ÝtÛ Ù6Ð�â6ìÓÒÕÛ ÐåðÙ×Û Ô�ÝtÔ�Ù6Þ�Ð6Ñ×Öò6à Ù×ÖrõñÛ â6Ò ì×Û çtò6à Ù6î
æ Ñ×ÖSÐ&Ô�ÑÜà$à â×Ô ì6Ñ6âßçÖSÑ6Ð×Û ç�çtáSâÓçtòrâ6âßìÙßì6õtÛ ç�Ñ×Ô�î
æ Ñ×ÖSÐ&Ô�ÑÜà$à â×ÔKÛ ç�çñárâ6çÙÜò6ò6Ô�Ñ×òßÔ�Û ÙßÐ@âÓçtòSâ6â6ìÙßì6õtÛ ç�Ñ×Ô�îëô6áSÐ×ò6Û à ÑßÐìßÑ6â6çëÖrÑßÐ×Ô�â6Ý2â×Û õ2âøÛ Ð��
æ Ñ×ÖSÐ&Ô�Ñ×à à â×Ô Û ç�çtáSâ6çÙ×òßò6Ô�Ñ×òßÔmÛ Ù6Ð�âÓçtòrâßâ6ìÙ6ìßõtÛ ç�Ñ×Ô�îúÙ×ÖrìøòßÛ$à Ñ6Ð
c
Ù×ò6òßÔ�Ñ6Ù6Ýñð Ð�ÑøòrÙÜÔ�Ù×à$à â×àÔ�á6ÖrÒ Ù6î2çëÖrÑ6ÐßçtòrÙ6ÐåÛ Ù×à à îç�Ð�Ù6Ø6Øßâ×Ô�â6ì��
� Ò ÑÓÙÜÛ$Ô�ÝtÔ�ÙßÞ@ÐÜà Ù×ÖSì×Û ÖrØÝ�ÑÜÖrç�âßÝtárÐåÛ õ2â×à îúÑÜÖ�ì×Û Þ�Þ�â×Ô�â×ÖSÐÔ�á6ÖSÒ Ùßî�çëÛ$ÖÓÛ ÖrÐ�â×Ô�ç�âßÝ�ÐåÛ$ÖrØÓÑÜÔÝ�ÑÜÖrõ�âÜÔ�ØÜÛ$ÖSØÓÑ×òrâ×Ô�ÙßÐ&Û Ñ×ÖSç õñÛ ÑÜà Ù6Ð�âãÓÛ Ö6Û ãÓá6ãûì×Û Þ@Þ�â×Ô�âÜÖrÝ�âøÛ ÖÐåð6Ô�â6çtðSÑ×à ìÓÝtÔ�Ñ6ç�çñÛ$ÖrØÓÐåÛ ã�â��
ü Ö�ÙÜÛ$Ô�ÝtÔ�ÙßÞ@ÐßõtÛ ÑÜà ÙßÐ@âßç ÐåðrâÖrÑ×Ö�� ÐåÔ�Ù×Örç2Ø×Ô�â6ç�çtÛ Ñ×Ö�2Ñ×ÖrâÒÕð6Û à âÓÙ×Û Ô�òrÑ×Ô�ÐÜÛ çúÝ�ÑÜÖrì×áSÝ�Ð&Û ÖrØÛ$ÖSì6â×òSâ×Örìßâ×ÖrÐ�@ó�� Ù×ò6òßÔ�Ñ6Ù6ÝtðSâ6çÐ�Ñ òSÙ×Ô�Ù×à à âÜà�Ô�á6ÖrÒ Ùßî�ç �
� Ò ÑÓÙ×Û Ô�ÝtÔ�Ù6Þ�Ð6ÑÜÖ�Þ&Û ÖrÙ×à
Û Ö��tÐ&Ô�Ù×Û à�ç�âÜòrÙ×Ô�ÙßÐ&Û Ñ×Ö�ÒÕð6Û à â� Û Ñ×à ÙßÐ&Û Ñ×Ö�ÑßÞ×ãÓÛ Ö6Û$ãäá6ã
ÑÜÖ�Þ&Û ÖrÙ×àDÙ×ò6òßÔ�Ñ6Ù6Ýtð�Ð�Ñç2Ù×ã�âøÔ�á6ÖrÒ Ùßî
� Û Ñ×à ÙßÐ&Û Ñ×Ö�ÑßÞ6ì×Û ç�Ð�Ù×ÖrÝ2âÚÑÜÔ"ÐåÛ ã�âç2â×òrÙÜÔ�ÙßÐ&Û Ñ×ÖÓôSâ6Ð@Ò âßâ×Ö�ç�ÐåÔ�â6Ù×ã�çÑßÞ6Ù×Û Ô�ÝtÔ�Ù6Þ�Ð×à Ù×ÖrìÜÛ$ÖSØÚÑÜÖ�ì×Û Þ@Þ�â×Ô�â×ÖSÐÔ�áßÖrÒ Ù6î�ç
� Û ÑÜà Ù6ÐåÛ Ñ×Ö�Ñ6ÞÜãÓÛ$ÖßÛ$ãÓáßã ç2â×òrÙÜÔ�ÙßÐ&Û Ñ×Ö
ì6âÜòrÙ×Ô�Ðåá6Ô�âÓÐåÔ�Ù6Þ�Þ&Û ÝúÞ&Ô�Ñ×ã ÖSâ6Ù×Ô�ôrîôrâßÐ@Ò âßâ×Ö�Ù×Ô�Ô�Û õ2Ù×à�ÐåÔ�Ù6Þ�Þ&Û ÝúÙ×Örì
Þ�â6â6ìßâ×Ô Ù×Û$Ô�òSÑ×Ô�Ð@ç��
ü Ö�ÙÜÛ$Ô�ÝtÔ�ÙßÞ@ÐßÞ@ÙÜÛ$à çÐ�Ñ ã÷Ù×ù�âÓÐåá6Ô�ÖÞåÔ�ÑÜã ôSÙ6ç�âÓÐ�ÑÞåÛ$ÖrÙÜà�Ù×òßò6Ô�Ñ6Ù6Ýñð��
c
¿ ¸�� Á �ÏÅ�Ç µ�ÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
¿ ¸�� Á �ÏÅ�Ç µ�ÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
Added what have learned from accidents
Mathematical completenessStates, inputs, outputs, transitions
Added basic engineering principles (e.g., feedback)
Defined completeness for each part of state machine
Mapped the parts of a control loop to a state machine
How were criteria derived?
I/O
I/O
´�µ�¶�µ�·�¸�¹�º"»$Î�Î
´�µ�¶�µ�·�¸�¹�º"»$Î�Í
Requirements Completeness Criteria (2)
c
c
Completeness: Requirements are sufficient to distinguishthe desired behavior of the software fromthat of any other undesired program thatmight be designed.
Most software−related accidents involve software requirementsdeficiencies.
Accidents often result from unhandled and unspecified cases.
We have defined a set of criteria to determine whether arequirements specification is complete.
Derived from accidents and basic engineering principles.
Validated (at JPL) and used on industrial projects.
Requirements Completeness
��µ�����Ë Ç µ�Â�µ�¹�Á ·tÉ&¹�Å�Ê À ·�Ë ·
tools can check them.Most integrated into SpecTRM−RL language design or simple
Mode transitions
c ´�µ�¶�µ�·�¸Ì¹�º�½�ÿ
Value and timing
Requirements Completeness Criteria (3)
Path RobustnessPreemptionReversibilityFeedbackLatencyData ageRobustness
Load and capacity
Inputs and outputs
Startup, shutdown
About 60 criteria in all including human−computer interaction.
(won’t go through them all they are in the book)
Human−computer interfaceFailure states and transitionsEnvironment capacity
¿ ¸�� Á �ÏÅ�Ç µ�ÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
Automatic code generation?
Human Error Analysis
Software Deviation Analysis
State Machine Hazard Analysis (backwards reachability)
Model Execution, Animation, and Visualization
Requirements Analysis
Completeness
¿ ¸�� Á �ÏÅ�Ç µ�ÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
Test Coverage Analysis and Test Case Generation
Results of execution could be input into a graphical
output can go into another model or simulator.Inputs can come from another model or simulator and
Model Execution and Animation
´�µ�¶�µ�·�¸�¹�º"»$Î�½
SpecTRM−RL models are executable.
´�µ�¶�µ�·�¸�¹�º"»$Î�¾c
visualization
Model execution is animated
c
��µ�·�Ë ��¹
��µ�·�Ë ��¹
Design should incorporate basic safety design principles
RedundancySafety Factors and Margins
Reducing exposureIsolation and containmentProtection systems and fail−safe design
HAZARD ELIMINATION
HAZARD REDUCTION
HAZARD CONTROL
DAMAGE REDUCTION
Safe Design Precedence
Software design must enforce safety constraints
Should be able to trace from requirements to code (vice versa)
Design for Safety
´�µ�¶�µ�·�¸�¹�º�½��
´�µ�¶�µ�·�¸�¹�º�Î�¼
Failure Minimization
c
c
Increasing effectiveness
Decreasing cost
SubstitutionSimplificationDecouplingElimination of human errorsReduction of hazardous materials or conditions
Design for controllabilityBarriers
Lockins, Lockouts, Interlocks
(Systems Theory Accident Modeling and Processes)
STAMP is a new theoretical underpinning for developing
.
..
STAMP
more effective hazard analysis techniques for complex systems.
¿�À ·�Á µ�ÂKÃ�Å�Æ�Å�Ç È�É&¹�Å�Ê À ·�Ë ·
¿�À ·�Á µ�ÂKÃ�Å�Æ�Å�Ç È�É&¹�Å�Ê À ·�Ë ·
Design for safety
Hazard analysis
´�µ�¶�µ�·�¸�¹�º"»�»��
´�µ�¶�µ�·�¸�¹�º"»���¼
Investigating and analyzing accidents
Preventing accidents
Accident models provide the basis for
c
suitable for use)Assessing risk (determining whether systems are
c
Performance modeling and defining safety metrics
¿�¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
¿�À ·�Á µ�ÂKÃ�Å�Æ�Å�Ç È�É&¹�Å�Ê À ·�Ë ·
��� �"!$#"%"&(' )"!*#"+-,. �0/210,3#4,�1(+"5�1-6*1-7
' ,�' 89)"� #08�8:&(�;' <2#"=">? ' /2' +"' , 79�0�(,�1(+"5@6BA"' C #D #(#")*)"#(� 82�"+"+(#"C���� �"!)(�08�8:' E"C #4�3� 1"F"!*#(+0, 8:>8�/2�;#"#(+4,��4/2�"+-,�1(' +G � � ? ' =(#*!*#082A
�"&0,��0�",31"+(5H>,��*5�#"#()*!*�"' 8�,�&(�;#
1-8�,31"+"5�1(F"#082>
I 82#*="#-8:' /�/:1(+0,
J #"="&-/:#*)(�;#08�82&"�;#
rupturemetalWeakened
projectedFragments
injuredPersonnel
damaged
8:&08�/2#")0,3' E"C #4,3�*="1(!*1"F"#(>
´�µ�¶�µ�·�¸�¹�º"»����
´�µ�¶�µ�·�¸�¹�º"»��/»
1"+(=4��� 1"F"!*#(+0,310,3' �"+">#0K�,3#"+082' ? #*=(1"!*1"F(#="�(#082L")"� # ? #"+0,3' +"F*!*�(�;#,3�*�;&")-,�&(�;#*E"#-���(�;#4,31"+"5I 8:#*E"&(� 8�,�=(' 1()"A"� 1"F(!
�3�"�;#-8:#(#"1"E(C #*C ' �3#0,3' !*#">�31"' C &"� #*)"�"' +-,�="&"� ' +"F�;#(="&0/2#48�,�� #"+"F-,�A4,3�/2�"�;� �082' �"+46B' C CH+"�-,,3A"' /25H+"#-8H8�8:�M ? #"� ="#082' F"+*!*#-,�1(C
!*�"' 8�,3&"�;#(>/2�"+0,310/�,"6B' ,�A8�,�#(#"CN,3�*)"� # ? #"+0,)"C 1-,�#4/21"� E"�"+8�,�#(#"CH�(�H/:�(10,��-�I 82#48�,�1(' +"C #08�8
c
AND OR
Events almost always involve component failure, human error, or energy−related event
Form the basis of most safety−engineering and reliabilityengineering analysis:
e.g., Fault Tree Analysis, Probabilistic Risk Assessment,FMEA, Event Trees
and design:
e.g., redundancy, overdesign, safety margins, ...
as a forward chain over time.Explain accidents in terms of multiple events, sequenced
Chain−of−Events Models
c
Equipment
pressureOperating
Moisture Corrosion Tank
Chain−of−Events Example
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
Software error
System accidents
Models need to include the social system as well as the technology and its underlying science.
OQP*R SUTWVWX R;Y*Z\[^]�_Za`*b^X;P*RaZ�cdZ�e�Y*fge�T*PUehSU]�`Ui�X j*Y*ZkSUl*]�SU`*Z^Y*m*n*`*P*R;Zam*P*o*jWj*Y*b^X Z^X;`*oWba]HXpe�Y*]�X;P*_Zab^X;Y*o*baYWY*fWY*]�ndY*Za_rqsitY*]HR c�X o*nWYUi�Y*]�cue�Y*baT*o*X;b^P*Ra`*]vbaX-itX;RaZ�cdZ�e�Y*fwX;Z\PP*Rpe�Td`*l*n*Tre�T*Yre�Y*baT*o*`*R;`*n4cxfyP4cxz*Yr{|Y*R;Raj*YUi�Y*R `aSUY*jWR `*o*nWz*Y*}�`*]HYre�T*Y~|o*j*Y*]�R c�X o*nWYUi�Y*]�cue�Y*baT*o*`*R;`*n4c�X;ZsPUe4R Y*P*Z�e4`*o*YWz*P*Z^X bsZab^X Y*o*b^Y*m
Limitations of Event Chain Models:
Social and organizational factors in accidents
E1: Worker washes pipes without inserting slip blind
E2: Water leaks into MIT tank
E6: Wind carries MIC into populated area around plant
E5: MIC vented into air
Chain−of−Events Example: Bhopal
E4: Relief valve opens
E3: Explosion occurs
c
´Ìµ�¶�µ�·�¸�¹�ºß»���¾
´Ìµ�¶�µ�·�¸�¹�ºß»���½
c
can easily be identified.of possible accidentsCombinatorial structure
for the trees.very likely will not see the forestdepartments in operational contextDecision makers from separate
Operational Decision Making:Accident Analysis:
Zeebrugge
HarborDesign
CargoManagement
PassengerManagement
TrafficScheduling
VesselOperation
DesignVessel
Time pressureOperations management
Captain’s planning
Berth design
procedure
Major accidents involve systematic migration of organizational
aggressive, competitive environment.behavior under pressure toward cost effectiveness in an
in isolation from theit into individual decisions and actions and studying itCannot effectively model human behavior by decomposing
Adaptation
dynamic work processvalue system in which it takes placephysical and social context
Deviation from normative procedure vs. established practice
Human error
Limitations of Event Chain Models (2)
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
to ZeebruggeTransfer of Herald
heuristicsOperations management
Standing orders
Calais
Operations management
Berth design
Excess numbersPassenger management
Capsizing
Unsafe
patternsCrew working
procedure
Change ofdocking
DesignStability Analysis
Shipyard Equipmentload added
Truck companies
Impairedstability
Excess load routines
Docking
c
c ´Ìµ�¶�µ�·�¸�¹�ºß»���Î
´Ìµ�¶�µ�·�¸�¹�ºß»���Í
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
3.
Analytic Reduction (Descartes)
Statistics
Treat as a structureless mass with interchangeable parts.
in their behavior that they can be studied statistically.
Use Law of Large Numbers to describe behavior in
Assumes components sufficiently regular and random
terms of averages.
Ways to Cope with Complexity (con’t.)
´Ìµ�¶�µ�·�¸�¹�ºß»���ÿ
´Ìµ�¶�µ�·�¸�¹�ºß»���ý
c
into the whole are themselves straightforward.
c
Ways to Cope with Complexity
Divide system into distinct parts for analysis purposes.
Examine the parts separately.
Three important assumptions:
The division into parts will not distort the phenomenon being studied.
1.
Components are the same when examined singlyas when playing their part in the whole.
2.
Principles governing the assembling of the components
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
(basis of system engineering)
systems −− how they interact and fit together.These properties derive from relationships between the parts of
Some properties can only be treated adequately in their entirety,taking into account all social and technical aspects.
Systems Theory
Developed for biology (Bertalanffly) and cybernetics (Norbert Weiner)
c
Most important properties are emergent.
Separation into non−interacting subsystems distorts results
For systems too complex for complete analysis
and too organized for statistical analysis
Concentrates on analysis and design of whole as distinct from parts
´Ìµ�¶�µ�·�¸�¹�ºß»����
´Ìµ�¶�µ�·�¸�¹�ºß»$¾�¼
c
Too organized for statistics
Separation into non−interacting subsystems distortsthe results.
The most important properties are emergent.
Too much underlying structure that distortsthe statistics.
What about software?
Too complex for complete analysis:
Irreducible
Represent constraints upon the degree of freedom of components a lower level.
Safety is an emergent system property
It is NOT a component property.
Hierarchies characterized by control processes working atthe interfaces between levels.
A control action imposes constraints upon the activity at one level of a hierarchy.
Control in open systems implies need for communication
information and control.in a state of dynamic equilibrium by feedback loops ofOpen systems are viewed as interrelated components kept
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
¿&À ·�Á µ�ÂKÃÏÅ�Æ�Å�Ç È�É�¹�Å�Ê À ·�Ë ·
It can only be analyzed in the context of the whole.
c ´Ìµ�¶�µ�·�¸�¹�ºß»$¾��
´Ìµ�¶�µ�·�¸�¹�ºß»$¾/»c
Levels characterized by emergent properties
2. Communication and control
Systems Theory (3)
Two pairs of ideas:
Systems Theory (2)
1. Emergence and hierarchy
Levels of organization, each more complex than one below.
¿�� É��h�
¿�� É��h�
that violate the constraints on safe component
People
Accidents arise from interactions among
behavior and interactions.
Physical system components
Engineering activities
Societal and organizational structures
Need to include the entire socio−technical system
c
c
c
Not simply chains of events or linear causality,but more complex types of causal connections.
c
Safety is an emergent system property.
A Systems Theory Model of Accidents
´�µ�¶�µ�·�¸�¹�º"»$½�ý
´�µ�¶�µ�·�¸�¹�º"»$½�ÿ
To understand accidents, need to examine control structure
and why events occurred.
Result from lack of enforcement of safety constraints
Mars Polar Lander.Software did not adequately control descent speed of
sealing gap in field joint
Views accidents as a control problem
STAMP (2)
e.g., O−ring did not control propellant gas release by
Events are the result of the inadequate control
¿�� É��h�
¿�� É��h�
itself to determine why inadequate to maintain safety constraints
continually adapting to achieve its ends and toA socio−technical system is a dynamic process
and adaptation.structure to enforce constraints on system behaviorPreventing accidents requires designing a control
react to changes in itself and its environment
(Systems−Theoretic Accident Model and Processes)
´�µ�¶�µ�·�¸�¹�º"»$Î�¼
´�µ�¶�µ�·�¸�¹�º"»$½��
Systems not treated as a static design
STAMP
Based on systems and control theory
c
c
c
c
Maintenance
Congress and Legislatures
Legislation
Company
Congress and Legislatures
Legislation
Legal penaltiesCertificationStandardsRegulations
Government ReportsLobbyingHearings and open meetingsAccidents
Case LawLegal penaltiesCertificationStandards
Problem reports
Incident ReportsRisk AssessmentsStatus Reports
Test reports
Test RequirementsStandards
Review Results
Safety Constraints
Implementation
Hazard Analyses
Progress Reports
Safety Standards Hazard AnalysesProgress Reports
Design, Work Instructions Change requests
Audit reports
Regulations
Insurance Companies, CourtsUser Associations, Unions,
Industry Associations,Government Regulatory Agencies
Management
Management
ManagementProject
Government Regulatory AgenciesIndustry Associations,
User Associations, Unions,
Documentation
and assurance
and Evolution
SYSTEM OPERATIONS
Insurance Companies, Courts
Physical
Actuator(s)
SYSTEM DEVELOPMENT
Accidents and incidents
Government ReportsLobbyingHearings and open meetingsAccidents
WhistleblowersChange reportsMaintenance ReportsOperations reportsAccident and incident reports
Change RequestsPerformance Audits
IncidentsProblem Reports
Hardware replacementsSoftware revisions
Hazard AnalysesOperating Process
Case Law
Safety−Related Changes
Operating AssumptionsOperating Procedures
Revisedoperating procedures
WhistleblowersChange reportsCertification Info.
Procedures
safety reports
work logs
Manufacturinginspections
Hazard Analyses
Documentation
Design Rationale
Company
ResourcesStandards
Safety Policy Operations Reports
ManagementOperations
ResourcesStandards
Safety Policy
audits
ManufacturingManagement
SafetyReports
Work
Policy, stds.
Process
Controller
Sensor(s)
Human Controller(s)
Automated
Controller 1
Controller 2
¿�� É��h�
Process 2
Accidents occur when:
Design does not enforce safety constraints
Inadequate control actions
Control structure degrades over time, asynchronous evolution
Control actions inadequately coordinated among multiplecontrollers.
unhandled disturbances, failures, dysfunctional interactions
Overlap areas (side effects of decisions and control actions)
Boundary areas
Process 1
¿�� É��h�
Controller 1
Controller 2Process
New model includes what do now and more
But does imply the need to enforce the safety constraintsin some way.
e.g., redundancy, interlocks, fail−safe design
maintenance procedures
manufacturing processes and procedures
or through process
Component failures may be controlled through design
Note:
Does not imply need for a "controller"
´�µ�¶�µ�·�¸�¹�º"»$Î��
´�µ�¶�µ�·�¸�¹�º"»$Î�¾cc
cc
Model of
¿�� É��h�
Process Models
(Controller) Human Supervisor Automated Controller
InterfacesProcessModel of Model of
Sensors
Actuators
ProcessControlled
inputsProcess
Controls
DisplaysDisturbancesAutomation
Accidents occur when the models do not match the process and
Time lags not accounted for
[Note these are related to what we called system accidents]
inadvertently commanding system into a hazardous stateunhandled or incorrectly handled system component failures
unhandled process statese.g. uncontrolled disturbances
Wrong from beginning
Relationship between Safety and Process Model
¿�� É��h�
incorrect control commands are given (or correct ones not given)
ProcessModel of
variablesMeasured
Controlled
Process
variables
outputs
The ways the process can change stateCurrent state (values of process variables)Required relationship among process variables
Process models must contain:
Missing or incorrect feedback and not updated correctly
Explains most software−related accidents
How do they become inconsistent?
´�µ�¶�µ�·�¸�¹�º"»$Î�Î
´�µ�¶�µ�·�¸�¹�º"»$Î�½c
cc
c
Also explains most human/computer interaction problems
How do I get it to do what I want?How did it get us into this state?What will it do next?Why did it do that?What did it just do?
What caused the failure?What can we do so it does not
happen again?
Or don’t get feedback to update mental models or disbelieve it
Safety and Human Mental Models
Explains developer errors
Why won’t it let us do that?
¿�� É��h�
¿�� É��h�
Pilots and others are not understanding the automation
In preventing accidents
Hazard analysis
Designing for safety
Is it better for these purposes than the chain−of−events model?
Is it useful?
etc.physical lawsdevelopment processrequired system or software behavior
May have incorrect model of
In accident and mishap investigation
Validating and Using the Model
Can it explain (model) accidents that have already occurred?
´�µ�¶�µ�·�¸�¹�º"»$Î�ÿ
´�µ�¶�µ�·�¸�¹�º"»$Î�Íc
c
c
c
¿�� É��h�
¿�� É��h�
Modeling Accidents Using STAMP
Three types of models are needed:
1. Static safety control structure
2. Dynamic structure
3. Behavioral dynamics
Dynamic processes behind the changes, i.e., why the system changes
Shows how the safety control structure changed over time
Safety requirements and constraintsFlawed control actionsContext (social, political, etc.)Mental model flawsCoordination flaws
Using STAMP in Accident and
Root Cause Analysis
Mishap Investigation and
c
c
c
c ´�µ�¶�µ�·�¸�¹�º"»$Î��
´�µ�¶�µ�·�¸�¹�º"»$Î�ý
Diagnostic andflight information
Horizontal velocity
command
commandMain engine
Horizontalvelocity
Main engineNozzle
OBC
SRI
Backup SRI
BoosterNozzles
platformStrapdown inertial
Nozzle
�:�������
being sent to nozzles.an attitude deviation that had not occurred. Results in incorrect commands
Process Model: Model of the current launch attitude is incorrect, i.e., it contains
nozzle to make a large correction for an attitude deviation that had not occurred.Unsafe Behavior: Control command sent to booster nozzles and later to main engine
of attack of more than 20 degrees.
Executes flight program; Controls nozzles of solidboosters and Vulcain cryogenic engine
Measures attitude oflauncher and its movements in space
Measures attitude oflauncher and its movements in space;Takes over if SRI unableto send guidance info
result in the launcher operating outside its safe envelope.
Full nozzle deflections of solid boosters and main engine lead to angleNozzles:
stage at altitude of 4 km and 1 km from launch pad.Triggered (as designed) by boosters separating from mainSelf−Destruct System:
OBC Safety Constraint Violated: Commands from the OBC to the nozzles must not
to disintegrate at 39 seconds after command for main engine ignition (H0).high angle of attack create aerodynamic forces that cause the launcher
OBC (On−Board Computer)
uses it for flight control calculations. With both SRI and backup SRI shut downControl Algorithm Flaw: Interprets diagnostic information from SRI as flight data and
and therefore no possibility of getting correct guidance and attitude information,loss was inevitable.
A rapid change in attitude and high aerodynamic loads stemming from a Ariane 5:
SRI that is available on the databus. to determine which) − does not include the diagnostic information from the
Interface Model: Incomplete or incorrect (not enough information in accident report
Feedback: Diagnostic information received from SRI
A
���H�0���0�h�����N���
B
cc
D C
C
ARIANE 5 LAUNCHER
Diagnostic andflight information
Nozzlecommand
command
Horizontal velocity
Main engine
Horizontalvelocity
Main engineNozzle
OBC
SRI
Backup SRI
BoosterNozzles
platformStrapdown inertial
�:�������
Process Model: Does not match Ariane 5 (based on Ariane 4 trajectory data);
where horizontal bias variable does not get large enough to cause an overflow. exception while calculating the horizontal bias. Algorithm reused from Ariane 4floating point value to a 16−bit signed integer leads to an unhandled overflow velocity input from the strapdown inertial platform (C). Conversion from a 64−bitused as an indicator of alignment precision over time) using the horizontal
Control Algorithm: Calculates the Horizontal Bias (an internal alignment variable
SRI Safety Constraint Violated: The backup SRI must continue to send guidance
inertial platform.
Executes flight program; Controls nozzles of solidboosters and Vulcain cryogenic engine
Measures attitude oflauncher and its movements in space
Measures attitude oflauncher and its movements in space;Takes over if SRI unableto send guidance info
Assumes smaller horizontal velocity values than possible on Ariane 5.
Process Model: Does not match Ariane 5 (based on Ariane 4 trajectory data);Assumes smaller horizontal velocity values than possible on Ariane 5.
information as long as it can get the necessary information from the strapdown
inertial platform.
Backup SRI (Inertial Reference System):
the bus (D).turns itself off (as it was designed to do) after putting diagnostic information on
results in the same behavior, i.e., shutting itself off.
information as long as it can get the necessary information from the strapdown
Unsafe Behavior: At 36.75 seconds after H0, SRI detects an internal error and
Because the algorithm was the same in both SRI computers, the overflow
exception while calculating the horizontal bias. Algorithm reused from Ariane 4where horizontal bias variable does not get large enough to cause an overflow.
Unsafe Behavior: At 36.75 seconds after H0, backup SRI detects an internal error and turns itself off (as it was designed to do).
SRI (Inertial Reference System):
SRI Safety Constraint Violated: The SRI must continue to send guidance
Control Algorithm: Calculates the Horizontal Bias (an internal alignment variable used as an indicator of alignment precision over time) using the horizontal velocity input from the strapdown inertial platform (C). Conversion from a 64−bitfloating point value to a 16−bit signed integer leads to an unhandled overflow
B
���H���H�0�h�����N�h�
D
cc
C
C
ARIANE 5 LAUNCHER
A
���0�H�h�
DEVELOPMENT
�d�4�^� *¡h�4¢¤£*¡¤¥ ¦�§�¦y¨d© ª4«��4���¬¯®4£4¬4¢*� ¢°¡hª4¬*� ¡h«^± ²h¢4�°± ª4£4¢¤¬¯£4¨4�³µ´�¶¸·º¹N´¼»4½¾» ¿4ÀÁ*Â
Ãı ±(��£4ů�4¬¯Æ�Ç�«^©;¬�� � «�£*±(¢4£4¬¯£¤£d¡h¢¤��ª4ů¬¯Èµ£*© �¤É°²h��¬4Ê4�°� ¡h«^± ²h¢4�4¢¥ ¦9§�¦yÉ°²h��¬4Ê4�¤¨4�*©;ůª*©;ÉQ�4¢$ª*¡Ä¬¯®4�¤£4��Ç�Å�± ª4ÈË¡Ä��Æ���¬¯�4É
Ì�¿4Í-Î4·pϤ³µ´�¶¸Á4·(¹N¿*Я¶Ñ·(Â
Òd���4¢¤¢4�4ů£*²4± ¬4Ó�£*± ²h�4�Ôůª*©Ñ¬¯�4��¬�� ¡h ¤��ª4ů¬¯ÈU£*©;�°� ɤ¨*± �4ɤ�*¡h¬¯£4¬�� ª*¡¦�£*± � ¢4£4¬¯�4¢¤¢4�4��� *¡Ä«�ªd¡h��¬¯£*¡h¬4Ê*²h¬*¡hª4¬4£4«�¬�²h£d±(«�ª*¡h��¬¯£*¡h¬
Õ¤� �^²4¡h¢4�*©;��¬¯£*� ¡h¢*� ¡h ¤ª4Å*± ª4£4¢¤¬¯£4¨4�¤«�© �4£4¬�� ª*¡Ä¨*© ª4«��4���Õ¤� �^²4¡h¢4�*©;��¬¯£*¡h¢*� ¡h ¤£4Ê4ª*²h¬4ȵ®4£4¬4«�ª*²4± ¢¤Ê4�¤¬¯�4��¬¯�a¢
Ö Î*¶Ñ·-¿*» Ö ´ ×¾Î*»4½ » ¿aÀÁ*Â
Titan 4/Centaur/Milstar OPERATIONS
LMAAnalex Denver
Engineering
IV&V of flight softwareHoneywell
Aerospace
development and testMonitor software
LMA Quality
Flight Control Software
Software Design and Development
IMS software
LMA System
Assurance
Ø$Ù*Ú*Û;Ü4Ý�Þ�ß$à�ß
á�â�ã3â�ä¯åpæhçaè�é�êc
operations management)(Responsible for ground
Third Space Launch Squadron (3SLS)
of LMA contract)(Responsible for administration
Center Launch Directorate (SMC)Space and Missile Systems
oversee the process
contract administrationsoftware surveillance
Management CommandDefense Contract
c
verify designAnalex−Cleveland
IV&VAnalex
construction of flight control system)(Responsible for design and
Prime Contractor (LMA)
System test of INULMA FAST Lab
Titan/Centaur/Milstar
(CCAS)Ground Operations
�������h�
�������h�
ë4ì í�ì î-ï;ð ñ@ò�ó
ôºõ�ö â�÷�ä õ�øhù�ú â�ä
ûËü-ý-þ"ÿ������0ýpþ��
���� � �� �þ"ü� ý��
��� ú å�÷ � æ�âh÷ â�ä�� ��� õ�ú¯ø â õ ä���÷ â ø â�æ ö
� å�ä ù � ö õ�ú ÷ â ù å�÷ ö ä���� æ ù � ö�� ÷ å ø*ø â���� � õ�ú � å øhø ��æ�� ö �
� ��ã�� ä3å�÷ � â�ä�� ô(õ ÷ æ�� æ���ä
� å ø:ùpú õ � æ ö ä
ô(õ�ö â�÷
Shallow location
Heavy rains
Porous bedrockMinimal overburden
��� �"!
No chlorinatorDesign flaw:
#�$&% %�'Design flaw:
#($&% %*)
� å�æ ö õ�ø � æ õ æ ö ä
ûËü-ý-þ"ÿ������0ýpþ��
÷ â�+���â�ä ö ä � å�÷,� æ � å
. .
..
- ò/.�02ð í/1202í3ï
3 ��� ��â ú � æ�â�ä
46587:9;9tò&1<14ì îpî(ì ò í�02ð î
ô(õ�ö â�÷
Shallow location
Heavy rains
Porous bedrockMinimal overburden
��� �"!
No chlorinatorDesign flaw:
#�$&% %�'Design flaw:
#($&% %*)
� å�æ ö õ�ø � æ õ æ ö ä
=>������â ö
Dynamic Structure
? ã3â�÷ ä�� � � ö
� å ú � � � â�ä
� æ�ä ù â � ö � å�æ õ æ��hå ö � âp÷0÷ â ù å�÷ ö ä
÷ â ù å�÷ ö äô(õ�ö â�÷0ä õ�øhù�ú â�ä
÷ â ù å�÷ ö äA@ 0�îpï ì í�B�C*D&E÷ â ù å�÷ ö ä
÷ â���� ú õ�ö � å�æ�ä÷ â ù å�÷ ö ä�2?&F � æ�ä ù â � ö � å�æ
? ù â�÷ õ�ö åp÷ � â�÷ ö � � � � õ�ö � å�æ
G,� æ õ æ � � õ�úIH æ � å�J
ûËü-ý-þ"ÿ������0ýpþ��
K -ALAM ë20/N ì O�D&P
÷ â�+���â�ä ö ä÷ â ù å�÷ öä ö õ�ö ��ä
÷ â ù å�÷ ö ä
Q�������â ö ä�� ú õ�ô ä
Q�������â ö ä�� ú õ�ô ä
Q�������â ö ä�� ú õ�ô ä
÷ â ù å�÷ ö ä
����� ��â ú � æ�â�äG0â���â�÷ õ�ú 5 ð ò/.(ì í*Oºì D P
÷ â ù å�÷ ö ä
R:0/D P ï"S TU0 V3ïXW3ò�ó&RU0>D&P ïXS ���� � �� �þ"ü� ý��
��� ú å�÷ � æ õ�ö � å�æ
ë4ì í�ì î-ï ð ñ@ò�ó ?&Y�Z[? ��\ � ú å�÷ � æ õ�ö � å�æ2=>� ú ú â ö � æ\(â�÷ ö � � � � õ�ö â�ä�å � � ù�ù ÷ å�ã õ�ú
Z â ú úä3â ú â � ö � å�æò V�0 ð D�ï ì ò í3î4(D&P ]"0 ð ï ò2í<5&7U9
õ æ��
^&9`_ M
� å�æ ö õ�ø � æ õ æ ö ä
÷ â���� ú õ�ö å�÷ � ù å ú � � �
÷ â���� ú õ�ö å�÷ � ù å ú � � �
46587:9;9tò&1<14ì îpî(ì ò í�02ð î
� ��ã�� ä3å�÷ � â�ä�� ô(õ ÷ æ�� æ���ä
� å�ä ù � ö õ�ú ÷ â ù å�÷ ö ä���� æ ù � ö�� ÷ å ø*ø â���� � õ�ú � å øhø ��æ�� ö �
��� ú å�÷ � æ�âh÷ â�ä�� ��� õ�ú¯ø â õ ä���÷ â ø â�æ ö
���� � �� �þ"ü� ý��
ûËü-ý-þ"ÿ������0ýpþ��
ôºõ�ö â�÷�ä õ�øhù�ú â�ä
ë4ì í�ì î-ï;ð ñ@ò�óR:0/D P ï"S
- ò/.�02ð í/1202í3ï
÷ â ù å�÷ ö ä
5 ð ò/.(ì í*Oºì D PG0â���â�÷ õ�ú
System Hazard: 5&a/E/P ì O ì îb0/c�V3ò�î�0/N4ï ò<0&W*Opò P ì;ò ðpò�ï"S*0 ð:S*0/D&P ïXS�d:ð 0&P D�ï 0/N<O-ò í�ï D 14ì í�D í�ï î@ïXS�ð ò&a*B&S2N ð ì í/]pì í*B<e2D�ï 02ð W
System Safety Constraints: f g�h 4(D�ï 02ð�i&a�D P ì ï ñj1ka�îpï2í3ò�ï E�0<O-ò&1<V�ð ò&14ì î�0>N&Wf l/h 5&a/E/P ì OjS�0/D P ïXS<120/D�î:a�ð 0�îj1<a3î-ï ð 0>N&a�O�0*ð ì î�]�ò�ó>0/c�V3ò�î�a�ð 0*ì ó/e2D�ï 02ð�i&a�D P ì ï ñ ì îbOpò 1<V�ð ò 14ì î�0/N f 0&W B&W m�í3ò�ï;ì ó ì O�D�ï ì ò2í2D í*N�V�ð ò>O�0>N&a�ð 0�î@ï ò4ó ò&P P ò/e h@ S�04î�D�ó 0�ï ñAO-ò í�ï;ð ò&P î-ï ð a�O-ïXa�ð 0�1<a�îpï V�ð 0>.�02í3ï/0>c:V�ò�î�a�ð 04ò�ó�ïXS�0�V/a>E/P ì O@ï ò<Opò2í3ï D&14ì í�D�ï 0>N<e2D�ï 0 ð W
. .
? ù â�÷ õ�ö åp÷ � â�÷ ö � � � � õ�ö � å�æ\(â�÷ ö � � � � õ�ö â�ä�å � � ù�ù ÷ å�ã õ�ú
÷ â ù å�÷ ö ä
� å ú � � � â�ä Z â ú úä3â ú â � ö � å�æ
÷ â ù å�÷ ö ä
÷ â ù å�÷ ö ä
ò V�0 ð D�ï ì ò í3î
ä ö õ�ö ��ä÷ â ù å�÷ ö÷ â�+���â�ä ö ä
K -ALAM ë20/N ì O�D&P
4(D&P ]"0 ð ï ò2í<5&7U9
TU0 V3ïXW3ò�ó&RU0>D&P ïXS÷ â ù å�÷ ö ä
? ã3â�÷ ä�� � � ö
õ æ��
÷ â���� ú õ�ö � å�æ�ä�2?&F � æ�ä ù â � ö � å�æ÷ â ù å�÷ ö ä
^&9`_ M
� å�æ ö õ�ø � æ õ æ ö ä� æ�ä ù â � ö � å�æ õ æ��hå ö � âp÷0÷ â ù å�÷ ö ä
÷ â���� ú õ�ö å�÷ � ù å ú � � �
÷ â���� ú õ�ö å�÷ � ù å ú � � �
ô(õ�ö â�÷0ä õ�øhù�ú â�ä
?&Y�Z[? ��\ � ú å�÷ � æ õ�ö � å�æ2=>� ú ú â ö � æ
Q�������â ö ä�� ú õ�ô ä
Q�������â ö ä�� ú õ�ô ä
@ 0�îpï ì í�B�C*D&E
���� � �� �þ"ü� ý��
��� ú å�÷ � æ õ�ö � å�æ
Q�������â ö ä�� ú õ�ô äë4ì í�ì î-ï ð ñ@ò�ó
÷ â ù å�÷ ö ä
����� ��â ú � æ�â�ä
á�â�ã3â�ä3å�æhçaè�é�n
á�â�ã3â�ä3å�æhçaè�é�é
c
c
c
5 ð ì .�D�ï 0
c
- ò/.�0 ð í>120 í3ï
oU0�îºì N/02í3ï î
4(D&P ]"0 ð ï ò2íoU0�îºì N/02í3ï î
ïXS*0�_�í*.ºì ð ò í/1202í3ï
@ 0�îpï ì í�B�C�D E
4(D&P ]"0 ð ï ò2í
- ò/.�0 ð í>120 í3ï
opa�ð D&P"^�ó ó D ì ð îq-ò�ò/N m�D í*N^&B2ð ì O:a>P ï"a�ð 0&më4ì í�ì îpï ð ñ@ò�ó
ïXS*0�_�í*.ºì ð ò í/1202í3ï
opa�ð D&P"^�ó ó D ì ð îq-ò�ò/N m�D í*N^&B2ð ì O:a>P ï"a�ð 0&më4ì í�ì îpï ð ñ@ò�ó
r8s2tvuxw
Modeling Behavioral Dynamics
yAz�{�|8},~
��},�}�� �8�x�;���8�
ux�j�
ux�v�u z��8� �� ���8� � ~ �
� �8}��"�,~*���
yA� �jr
ux�v��� �}�� �� |���~
� ���,~*�"����� � �8��� � �8�����8}�� �
� �"��� w ������ �,�� �����j�,��~*�w �"}����z��"}`~*� � z,~Ay�z8{�|8},~
c
� ��� � � {8}��,�}
� � � },��~*� �}��8},����� �t z�~��8�x�,~*� � r ��z,~*{������r ����~*}��"�
�`}����8� ~�}8{x vz8��� � ~ �� � r �8�x��� }8{`¡¢��~�}8�
� ��}8����~��8�� ���x��}�~�}8�,�}
�<� �£v� � � � � ��� �¤ � � },��~*� ���
w �"},�}���� }x� �r ��z,~*{������ r ����~*}���`}�¥�z8� �"}��x}8�,~ � � ���
¦�"����~�� �����8���<�,~*}� � ¤ � � },��~*� ���
� � � },��~*� �}��8},����� �yA� �jr§t {,�� ��8�"� },�
t ���8�"}���}����w z8¨�� � �
� ��}��"��~��8�
c
� � � },��~*� �}8��},����� �� ����~��"����� � �8�x� � �8�����8}�� �yA}�~ �x}�}�� u z8��� �� �8��� � ~ ����yA� �jr
w �"},�}���� }x� ����8¨x�<}��8��� ~*� ��|
©v}��"�,�� ~ ��� �� ��}8�"�,~*���w �"��¨�� }��ª�`}����8� ~�� ��|�<� �£v� �� ����~��8��� ���,~*� �8�x� �«<�"� �8£�� ��|`¡¬�,~*}��
jz��8� � ~ � � ����~��"�8� r ����~*}��� z��x���8� � �8�,~*�"��� � }�{`¡¬�,~*}��� � � },��~*� �}��8},����� �
� � � },��~*� �}��8},����� �r ���x��� � ��|
�<��~�}x� � ¤ �,��"}����}
� �}8� �� |���~
� � � },��~*� �}8��},����� �yA},~ ��}�}8�x����¨x���j�,��~��
� �8}��"�,~*���� �8�x��� � �8�,�}
� � w z��8� ����x}8�,~� ��}8�"�,~*���2¦}8���
��� ~*�x�<}8{ s �8��}t �8���,��8�,�}
� � w �"�,�},���u }8�,~*��� u ��{8}��
� � w �"�,�},���u }8�,~*��� u ��{8}��
� � w �"���},���u }���~��8� u �8{�}��� � w �"�,�}����u }��,~*��� u �8{�}��
� � w �"�,�},���u }8�,~*��� u ��{8}��
u z���� �� ���8� � ~ �
� � � �8�,~*���x� �8�,~*� ���x�<� �£
�`�,~*}x� � � �,��"}����}x� �¤ � � },��~*� ���x�`� �£
w }�� ��8�����t ¨�� � � ~*� }��
� � � },��~*� �}8��},����� �� ¥�z�� ���x}��,~u ��� �,~*}����8�,�}
w }�� � ���"�x���,�}x� �� �8� ���"� ���,~*��� �
¨,��yA� �vr�<}����z�� �},�t ���� � ��¨8� }2�}x� �
ux�j�� �}�� �� |���~
jz���� � ~ �®� �s �"��� �8� ��|
jz��8� � ~ ��� � ¡¬}�� �
���0�H�h�
RiskPerceived
Budget cutsdirected toward
safety
ExternalPressure
PerformancePressure
Expectations
Launch Rate
¯2° ±6°X²
of the Columbia AccidentA (Partial) System Dynamics Model
safety programsPriority of
Success RateSuccess
³`´
in complacencyRate of increase
Complacency
³�µ
¶¸·2¹�¹�º2»�»
Accident Rate
safety
Systemsafetyefforts
cutsBudget
³�µ¼`½�¾2¿2À º ± »;Á2Â<Ã,º¿ º2º2Ä6Å ° Æ º2Ç
¯2° ±6°X² » ² ¾
Èɵ¼ ·2»8Á ° Ä2Ê ² Á2º
ËÌ° ¹ ¾2À Â2»®ÍÌ· À Â2¹
safety increaseRate of
Safety
cc á�â�ã3â�ä¯åpæhçaè�é�Î
�������h�
�������h�
Coordination flaws
1. Identify
Mental model flaws
Change those factors if possible
Dynamic processes in effect that led to changesChanges to static safety control structure over time
2. Model dynamic aspects of accident:
Steps in a STAMP analysis:
Examines interrelationships rather than linear cause−effect chains
Looks at the processes behind the events
Includes entire socio−economic system
Includes behavioral dynamics (changes over time)
Want to not just react to accidents and impose controls for a while, but understand why controls drift toward ineffectiveness over time and
Context in which decisions made
3. Create the overall explanation for the accident
STAMP vs. Traditional Accident Models
Detect the drift before accidents occur
System hazardsSystem safety constraints and requirementsControl structure in place to enforce constraints
Control flaws (e.g., missing feedback loops)
Inadequate control actions and decisions
á�â�ã3â�ä3å�æhçaèÐÏ�Ñ
á�â�ã3â�ä3å�æhçaèÐÏ�è
c
c
c
c
STAMP−Based Hazard Analysis (STPA)
executable and analyzable
Assists in designing safety into system from the beginning
design, development, manufacturing, and operationsUsed to eliminate, reduce, and control hazards in system
Not just after−the−fact analysis
violated.
Can use a concrete model of control (SpecTRM−RL) that is
regulatory authoritiesIncludes software, operators, system accidents, management,
�������h�
�������h�
Provides information about how safety constraints could be
Risk Assessment
Safety Metrics and Performance Auditing
Hazard Analysis
Using STAMP to Prevent Accidents
á�â�ã3â�ä3å�æhçaèÐÏ�ê
á�â�ã3â�ä3å�æhçaèÐÏ�Ò
c
c
c
c
Operating
Aural AlertsDisplays
Pilot
Aural AlertsDisplays
Radar
Advisories
Radio
PilotAdvisories
Controller
Air TrafficFAA
Mode
6. Interference with ATC safety−related advisory
5. Interference with ground−based ATC system
4. Interference with other safety−related aircraft systems
3. Loss of control of aircraft
2. A controlled maneuver into the ground
(a pair of controlled aircraft violate minimum separationstandards)
1. A near mid−air collision (NMAC)
TCAS Hazards
�H�0���h�
�H�0���h�requirements and constraints on behavior
STPA − Step1: Identify hazards and translate into high−level
STPA − Step 2: Define basic control structure
Aircraft
Aircraft
Aircraft InformationOwn and Other
Aircraft InformationOwn and Other
TCAS
OperatingMode TCAS
OpsAirline
Mgmt.Ops
Airline
Flight DataProcessor
Mgmt.OpsATCLocal
Mgmt.
cc
c
ápâ�ã¯â�ä3å�æhç^èÐÏ�n
ápâ�ã¯â�ä3å�æhç^èÐÏ�Óc
�������h�
STPA − Step 3: Identify potential inadequate control actions that
could lead to hazardous process state
4. A correct control action is stopped too soon
provided too late (at the wrong time)3. A potentially correct or inadequate control action is
2. An incorrect or unsafe control action is provided.
1. A required control action is not provided
In general:
3. The pilot applies the RA but too late to avoid the NMAC
2. The pilot incorrectly executes the TCAS resolution advisory.
by TCAS (does not respond to the RA)
Pilot:1. The pilot does not follow the resolution advisory provided
4. The pilot stops the RA maneuver too soon.
�������h�
For the NMAC hazard:
1. The aircraft are on a near collision course and TCAS does not provide an RA
TCAS:
2. The aircraft are in close proximity and TCAS provides anRA that degrades vertical separation
3. The aircraft are on a near collision course and TCAS providesan RA too late to avoid an NMAC
4. TCAS removes an RA too soon.
á�â�ã3â�ä3å�æhçaèÐÏ�é
á�â�ã3â�ä3å�æhçaèÐÏ�Ïc
cc
c
Eliminate from design or control or mitigate in design or operations
STPA − Step 4: Determine how potentially hazardous control
Guided by set of generic control loop flaws
Where human or organization involved must evaluate:
Behavior−shaping mechanisms (influences)Context in which decisions made
Step 4a: Augment control structure with process models for eachcontrol component
Step 4b: For each of inadequate control actions, examine parts ofcontrol loop to see if could cause it.
actions could occur.
Can use a concrete model in SpecTRM−RL
�������h�
In general:
�������h�
Step 4c: Consider how designed controls could degrade over time
Assists with communication and completeness of analysis
Provides a continuous simulation and analysis environmentto evaluate impact of faults and effectiveness of mitigationfeatures.
á�â�ã3â�ä3å�æhçaèÐÏ�Ô
á�â�ã3â�ä3å�æhçaèÐÏ�Îc
cc
c
Classification
Current RA Level
Current RA Sense
Other Aircraft (1..30) Model
Reversal
Crossing
StatusStatus
RA Strength
Altitude Reporting
Sensivity Level
On
Fault DetectedSystem Start
1
Other Altitude
TCAS
Sensitivity Level
RangeOther BearingOther Bearing Valid Mode S Address
Other Altitude Valid
RA Sense
Own Aircraft Model
Increase Climb InhibitClimb Inhibit
Descent Inhibit Increase Descent Inhibit
Altitude Layer
INPUTS FROM OWN AIRCRAFT
Non−CrossingInt−CrossingOwn−CrossUnknown
DescendClimbNone
NoneUnknown
UnknownLostNoYesOn ground
AirborneUnknown
Proximate Traffic
ThreatUnknown
Other Traffic
Potential Threat
INPUTS FROM OTHER AIRCRAFT
2
Unknown
34567
Not InhibitedInhibited
Unknown
Layer 1
Layer 2Layer 3
Layer 4
Unknown
VSL 0VSL 500 VSL 1000VSL 2000
Increase 2500Nominal 1500
Unknown
Not Selected
ReversedNot Reversed
Equippage
DisturbancesDisplays
Controls
Processinputs
ControlledProcess
Actuators
Sensors
Model of
outputsProcess
Controlledvariables
Measuredvariables
Model ofProcess
Model ofAutomation
Config Climb InhibitAircraft Altitude Limit Õ�Ö,×>Ø2Ù
Prox Traffic DisplayAltitude RateAir StatusBarometric Altimeter Status
Radio Altitude StatusBarometric Altitude
Radio Altitude
Model ofProcess Interfaces
Automated ControllerHuman Supervisor(Controller)
Õ�Ö,×>Ø2Ù
Traffic Display Permitted
Own MOde S addressAltitude Climb InhibitIncrease Climb Inhibit Discrete
None
Not Inhibited
ClimbDescend
NoneVSL 0VSL 500VSL 1000VSL 2000Unknown
Inhibited
Unknown
Unknown
UnknownNot Inhibited
Inhibited
Inhibited
UnknownNot Inhibited
cc Ú�Û�Ü�Û�Ý�Þ�ß2à¸áÐâUá
Ú�Û�Ü�Û�Ý�Þ�ß2à¸áÐâ�ãcc
inadequate control actions
Design of control algorithm (process) does not enforce constraints
Process models inconsistent, incomplete, or incorrect (lack of linkup)
Communication flaw
Flaw(s) in creation or updating process Inadequate or missing feedback
Not provided in system design
Inadequate sensor operation (incorrect or no information provided)
Time lags and measurement inaccuracies not accounted for
Inadequate coordination among controllers and decision−makers(boundary and overlap areas)
STPA − Step 4b: Examine control loop for potential to cause
Inadequate Execution of Control ActionCommunication flaw
Time lagInadequate "actuator" operation
Inadequate Control Actions (enforcement of constraints)
Õ>Ö,×�Ø2Ù
Õ>Ö�×>Ø2Ù
e.g. operational procedures
Use information to design protection against changes:
over time.
E.g., specified procedures ==> effective procedures
controls over changes and maintenance activities
Use system dynamics models?
STPA − Step4c: Consider how designed controls could degrade
management feedback channels to detect unsafe changes
auditing procedures and performance metrics
Ú�Û�Ü�Û�ÝIÞ�ß2à¸áÐâ�ä
Ú�Û�Ü�Û�Ý�Þ�ß2à`áÐâ�åc
cc
c
STPA results more comprehensive
Top−down (vs. bottom−up like FMECA)
Includes HAZOP model but more general
caused by deviations in system variablesHAZOP guidewords based on model of accidents being
General model of inadequate control
Compared with TCAS II Fault Tree (MITRE)
Not physical structure (HAZOP) but control (functional) structure
Concrete model (not just in head)
Handles dysfunctional interactions, software, management, etc.
Guidance in doing analysis (vs. FTA)
Considers more than just component failures and failure events
Comparisons with Traditional HA Techniques
Õ�Ö,×>Ø2ÙÚ�Û�Ü�Û�Ý�Þ�ß2à¸áÐâ�æcc