
Page 1: Completeness, Robustness, and Safety

Completeness, Robustness, and Safety
in the Requirements Engineering Process for Safety-Critical Software Systems

Dr. M. S. Jaffe
Embry-Riddle Aeronautical University

Page 2: Completeness, Robustness, and Safety

4/18/09 © Copyright 2009, M.S. Jaffe. All rights reserved. MSJ - 2

Objectives

Understand the types of information required for a complete software requirements specification

Understand the dependencies in their development and their relationships to other software engineering activities

Understand the limitations (or better, the consequences) as well as the full scope of possible uses of abstraction and step-wise refinement in the requirements process

Understand a standard set of principles by which additional requirements are derivable from an initial set of requirements

Understand the relationship between completeness of software requirements and robustness and safety in the specified behavior

Understand some of the hazard analyses that should be performed on the software requirements for safety-critical systems

Page 3: Completeness, Robustness, and Safety


The Background and Motivation

Current life cycle models and consensus documentation standards such as ANSI/IEEE Std 830 are not intended to be guides to actually doing requirements engineering

And OOA techniques such as UML tend to focus more on requirements elicitation and an easy-to-read portrayal of high-level information than on the detailed elaboration of behavioral characteristics where safety issues often reside

[Life-cycle diagram: Requirements analysis → Design → Code (Implementation) → Test → Maintenance]

Page 4: Completeness, Robustness, and Safety


Requirements: The Most Critical and Least Well Understood Phase in Software Engineering

Software errors found in field operations can be up to several hundred times more expensive to fix than if they were found in the requirements phase

Requirements errors are responsible for a disproportionate share of fielded problems

Published results range from over 30% up to over 60%

For safety critical systems, requirements errors can be a lot more distressing than merely $$$

Page 5: Completeness, Robustness, and Safety


Issues With Requirements Engineering (Particularly Important for Safety-Critical Systems)

No agreement as to:

What really is “a” requirement

What information is really required to specify a requirement

How many different levels of detail are there in theory

How many different levels of detail pertain to a given requirement

Where does requirements specification leave off and design begin

What are the dependencies among the derivation and specification of the different types and levels of requirements information

No rigorous definition of a stopping point – when are the requirements complete?

Page 6: Completeness, Robustness, and Safety


The Requirements Engineering Process for Safety-Critical Software (Overview)

1. Initial outputs and constraints
2. Detailed behavioral characteristics (expressed in terms of inputs)
3. Standard robustness
4. Completeness and consistency
5. Output hazard analyses

[Diagram: additional derived outputs feed back into the process; an initial architectural design and an initial logical design branch off]

Page 7: Completeness, Robustness, and Safety


The Relationship Between Requirements and Design

An initial logical design* can be done well before most of the requirements details are developed

An initial architectural design can usually be developed based on the delineation of the set of required inputs and timing details well before the completion of all of the ultimately required requirements analysis

1. Initial outputs and constraints
2. Detailed behavioral characteristics
3. Standard robustness
4. Completeness and consistency
5. Output hazard analysis

[Diagram: additional derived outputs feed back into the process; an initial architectural design and an initial logical design* branch off early]

* The phrase “logical design” is not standard (few software engineering terms are); it's used here to mean the partitioning into loosely-coupled sets of outputs

Page 8: Completeness, Robustness, and Safety


Outputs: The Starting Point for Requirements Engineering

Paraphrasing David Parnas, the only purpose (i.e., function) of software is to produce its outputs correctly

Since good engineering starts from consideration of intended purpose (form follows function), the characteristics of black box outputs should be the starting point

Page 9: Completeness, Robustness, and Safety


Some Key Questions About Outputs

The key questions to be answered include: How much information is necessary to completely describe the requirement(s) for an output?

Where is stepwise refinement in levels of abstraction useful in the requirements process and why isn't it always uniformly applicable?

Where must it end up and why?

Why is there variation in where it is possible?

What are the risks of abstraction in designing safety critical systems?

Given a set of known outputs, what principles allow us to adduce the need for additional outputs?

Where and how do inputs and algorithms fit into the picture?

Is there other, legitimately black box information that should be derived and specified in a requirements specification (e.g., input capacity)? If so, what does that say about our notions of completeness?

Page 10: Completeness, Robustness, and Safety


Roadmap

1. Initial outputs and constraints
2. Detailed behavioral characteristics
3. Standard robustness
4. Completeness and consistency
5. Output hazard analyses

Page 11: Completeness, Robustness, and Safety


Initial Outputs, Boundaries, Safety Requirements, and Constraints

1. Outputs, boundaries, and constraints

1.1 Initial outputs
1.1.1 Principal outputs
1.1.2 Some initial derived outputs (via the preliminary hazard analysis)

1.2 Black-box boundary identification (drawing on semantic HMI design, use-case narratives, specifications, and existing interface documentation)

1.3 Constraints – i.e., “thou shalt not ...”

Page 12: Completeness, Robustness, and Safety


Principal Outputs

Principal outputs represent some original perception of the “purpose” of the software – e.g., the autopilot software shall generate outputs that control the flaperons

There are a variety of ways that principal outputs are identified, collected, and/or synthesized, not all of which necessarily pertain on any single project

Allocation from “higher level” systems engineering documents, such as system specifications, existing interface specifications, and HMI documentation

Various requirements elicitation techniques such as use-case analysis

Observation or extraction from the behavior or documentation of predecessor systems, particularly prototypes

Et cetera

Page 13: Completeness, Robustness, and Safety


Principal Outputs (cont'd)

Since “perception of purpose” is inherently subjective and a function of both context and point-of-view (example to follow), it seems unlikely that there can be any rigorous method of identifying the principal outputs or any rigorous definition of completeness of just the principal outputs in and of themselves, in isolation from other requirements derived later in the software engineering process

Principal requirements might thus be best considered as the “axioms” of a given project's requirements engineering – the starting point for further derivation and analysis, one type of which is completeness

Page 14: Completeness, Robustness, and Safety


An Example of the Context-Dependency of Completeness of Principal Outputs

Consider two different autopilot programs: One required to provide control for just elevator and ailerons

The other required to control three surfaces: elevator, ailerons, and rudder

Further suppose, for somewhat artificial example, that the requirements for control of just the elevator and ailerons were exactly the same in both cases (2-axis and 3-axis control)

Then a potentially complete set of requirements in the elevator-and-aileron-only system would be clearly incomplete if the purpose of the system were to control all three surfaces

Hence the completeness of a set of principal outputs can't be an attribute of the characteristics of the outputs themselves, but emerges only in the context of the system's purpose

Page 15: Completeness, Robustness, and Safety


The Black-Box Boundary

Precise delineation of the exact boundary of the software whose requirements are being specified is an important early step

Many derived requirements owe their existence to the location of the black box boundary – e.g., assuming that some given I/O interface requires a series of initialization messages, does this software have to initialize it or is that handled by some other software (e.g., the operating system)?

Correct specification of many timing requirements depends on where the observation point is – e.g., exactly where's the black box boundary to which we have 100 milliseconds to deliver our output? If we have to include 15 ms of OS processing in there, maybe our application only has 85 milliseconds, not 100. Or is our requirement only to deliver the output to the OS within 100ms?
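The budget arithmetic above can be sketched directly; a minimal illustration in Python, using the numbers assumed in the discussion (100 ms end-to-end, 15 ms of OS processing):

```python
# The timing budget depends on where the black box boundary is drawn.
END_TO_END_DEADLINE_MS = 100   # deadline measured at the outer boundary
OS_PROCESSING_MS = 15          # time consumed outside the application software

# If the boundary excludes the OS, the application's budget shrinks:
app_budget = END_TO_END_DEADLINE_MS - OS_PROCESSING_MS
print(app_budget)  # 85 – the application has 85 ms, not 100
```

Trivial as the subtraction is, getting the wrong answer to "whose 15 ms is that?" is exactly the kind of boundary error that produces missed deadlines in the field.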


Page 16: Completeness, Robustness, and Safety


Initial Derived Outputs

A required output that is not a “principal” output is often said to be a “derived” output (the vocabulary is not standard)

For example, the main thing we really want is for the software to display radar plot data to some human operators, but first the software has to negotiate addressing and security protocols with some external router or it will never receive any plot data to display

The distinction between “principal” and “derived” can be pretty subjective: to the accounting office, the logging and billing of the CPU usage data may be the most important output a system produces

In the end, the fuzzy boundary really doesn't matter: a required output is still a requirement to be documented, implemented, and verified, regardless of whether or not it was the first thing we thought of – the key question here is, if we don't think of it initially, are there engineering principles that will help ensure we derive it later?


Page 17: Completeness, Robustness, and Safety


Initial Derived Outputs (cont'd)

Common sources of initial derived output requirements include:

I/O protocol messages (e.g., ack/nack)

Data logging requirements

Redundancy and backup/restart preparation messages

Reconfiguration control messages

Interface initialization and data request messages

Et cetera

Many of these initial requirements, too, are system-context dependent, meaning that identifying them is more a case of engineering judgment than rigorous analysis, although, as the list above suggests, some items are so common across a wide range of real-time applications as to suggest that a standard “checklist” of initial derived requirement types could be helpful


Page 18: Completeness, Robustness, and Safety


Reviewing Principal Requirements

As a procedural matter, a specific review of just the principal outputs by themselves for “subjective completeness” seems a good idea – have we really identified everything we want, as opposed to what we may ultimately need (in the way of other outputs) to get the ones we really want under all the circumstances they're wanted?

“What! You wanted the software to control the coffeemaker, too, not just the ailerons? Why didn't you say so six months ago?”

Page 19: Completeness, Robustness, and Safety


Reviewing Derived Requirements

Review of the derived requirements (initial or otherwise) can be accomplished later, in the “standard” technical/management reviews for software requirements specifications

In practice, in much of the time spent in formal requirements reviews, the emphasis actually shifts subtly from “do we really know what we're trying to do?” to “do we really know how to do it?”

But much of that “how” is still in terms of black box behavioral sequences (e.g., use-cases) rather than architecture or design, so it's still requirements engineering; but it's requirements engineering for detailed/derived requirements rather than principal ones which can, and generally should, be reviewed and baselined much earlier

Page 20: Completeness, Robustness, and Safety


Preliminary Hazard Analysis (PHA) as a Source of Initial Requirements

For safety critical systems, the PHA can be a source of requirements – e.g., the software must command an audible warning alert 2 seconds before any movement of the robot control arm

Note again that the distinction between principal and derived requirement cannot be seen in either the syntactic or semantic nature of this requirement

In the initial development of a robot welding system, for example, the initial engineering emphasis might well be on the logic behind the commands to control the arm and the torch, and safety requirements are only derived later

Suppose, however, that the initial control logic is hardwired analogue and that, after the first accident brings in OSHA, a new computer-controlled safety monitor is installed – the exact same requirement may now be the sole purpose of the entire computer system, its one and only principal requirement

Page 21: Completeness, Robustness, and Safety


Preliminary Hazard Analysis (PHA) as a Source of Constraints

The PHA is also the source for many (most? all?) of the thou-shalt-not-never-nohow-noway constraints – e.g., the system must never generate a reactor command with a power level setting of greater than 110%

These are often/usually called constraints rather than requirements – again, the distinction is not universally agreed upon – since they usually must be handled differently downstream in the engineering process

Adequate verification of a “thou shalt never” statement cannot usually be achieved via testing – other than by exhaustive testing, which is generally not practical in the real world

They are therefore usually verified analytically somehow, later in the software engineering process – section 4 will address one possible form of analysis in more detail

Page 22: Completeness, Robustness, and Safety


Roadmap

1. Initial outputs and constraints
2. Detailed behavioral characteristics (expressed in terms of inputs)
3. Standard robustness
4. Completeness and consistency
5. Output hazard analyses

Page 23: Completeness, Robustness, and Safety


Detailed Behavioral Characteristics? Detailed Requirements? Design? Other?

The question of whether to consider the documentation of the detailed behavioral characteristics of the software to be part of the requirements phase or part of the design phase is apparently religious in nature

Some authors are explicit in stating that such details are design, not requirements; others say exactly the opposite

Perhaps the real answer is, what difference does it make? The information will have to be derived, documented, and analyzed for hazards eventually; what we call that derivation/documentation activity is a lot less important than making sure we know why and how to do it

Since design is usually considered “whitebox” or “glassbox” information, to me it seems better to differentiate it from behavior visible outside the blackbox, which should then be considered requirements

Page 24: Completeness, Robustness, and Safety


Detailed Behavioral Characteristics(Expressed Ultimately in Terms of Inputs)

2. Output characteristics (and their referenced inputs, and then the characteristics of those inputs, and so eventually, more outputs)

2.1 Output fields
2.1.1 Delineation and classification
2.1.2 Reference definition, a.k.a. initial algorithm definition

2.2 Output timing
2.2.1 Basic abstraction(s)
2.2.2 Proximate triggers

2.3 Preconditions (a.k.a. states)

(Builds on 1.1 initial outputs plus additional derived outputs and 1.2 the black box boundary; feeds sections 3, 4, & 5)

Timing and modularization/backup/redundancy strategies are a major input to an architectural design process

Coupling and cohesion analysis leads to or confirms/revises any initial modularization (top level logical design)

Page 25: Completeness, Robustness, and Safety


Output Fields: Delineation/Identification

A composite output such as aircraft position must eventually be decomposed into constituent fields (e.g., range and bearing)

Stepwise refinement is common – simply identifying the fields is a prerequisite to a great deal of requirements analysis or design that can proceed without knowing all the details, which will ultimately include such things as:

Precise field locations (bit positions within larger aggregates sharing common timing characteristics)

Representation conventions (e.g., big-endian 2's complement)

Interpretation conventions (e.g., miles or kilometers)

Failure to take these messy little details seriously, however, regardless of whether they're considered to be requirements or design, has been the cause of numerous accidents – e.g., the Mars Climate Orbiter

Page 26: Completeness, Robustness, and Safety


Output Fields: Classification

Fields must be classified as approximate or exact*

The requirement for an exact field specifies exactly what bit pattern must be present for the output to be acceptable – e.g., the field must contain the ASCII bit pattern for the string “Hi there”; “Hi therd” or “Hi therf” would not be an acceptable output; the specification may be conditional, but under a given set of conditions, only one particular bit pattern will meet spec

An approximate field permits some indeterminacy which then forces the specification of two pieces of information: an accuracy and a reference

Aircraft range output shall be accurate to within ± ½ mile of ???

That “??? ” reference has historically been the source of some confusion in the requirements engineering process

* There are some other theoretic possibilities that are rarely (if ever) applicable to real time specifications
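The exact/approximate distinction can be made concrete with a small verification sketch in Python (the function names are hypothetical; the examples reuse the “Hi there” string and the ±½-mile range requirement from the slide):

```python
def verify_exact(observed: bytes, required: bytes) -> bool:
    """An exact field admits exactly one acceptable bit pattern."""
    return observed == required

def verify_approximate(observed: float, reference: float, accuracy: float) -> bool:
    """An approximate field needs BOTH an accuracy and a reference value."""
    return abs(observed - reference) <= accuracy

# "Hi there" is acceptable; "Hi therd" is not.
print(verify_exact(b"Hi there", b"Hi there"))   # True
print(verify_exact(b"Hi therd", b"Hi there"))   # False

# Aircraft range accurate to within ±0.5 mile of the reference value.
print(verify_approximate(12.3, 12.0, 0.5))      # True
print(verify_approximate(13.0, 12.0, 0.5))      # False
```

Note that `verify_approximate` is unusable until someone supplies the `reference` argument – which is exactly the “???” the next slides are about.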

Page 27: Completeness, Robustness, and Safety


Accuracy References and Algorithms

It is often advantageous to develop the accuracy reference through a series of stepwise refinements, proceeding from references visible outside the blackbox boundary to references visible to the software at the boundary

The output aircraft range shall be sufficiently accurate to permit intercept guidance to acquire the target 98% of the time

The output aircraft range shall be accurate to within ±½ mile of the actual range of the actual aircraft at the time the range is output

The output aircraft range shall be accurate to within ±¼ mile of the reference value computed by the following reference algorithm:

… [20 pages of mathematics]

Page 28: Completeness, Robustness, and Safety


Sidebar on Stepwise Refinement

There is as yet no easy mapping of levels of abstraction to “stages” of systems or requirements engineering or “standard” engineering specification levels (if such really existed, which they don't):

The survival likelihood over a 2-hour mission shall exceed 98%

The individual target Pk shall exceed 99%

The single shot Pk shall exceed 95%

The output aircraft range shall be sufficiently accurate to permit intercept guidance to acquire the target 98% of the time

The output aircraft range shall be accurate to within ±½ mile of the actual range of the actual aircraft at the time the range is output

Output aircraft range shall be accurate to within ±¼ mile of the reference value computed by the following reference algorithm: [20 pages of math]

Page 29: Completeness, Robustness, and Safety


Accuracy References and Algorithms (cont'd)

In the past, that last stage of specification of such a requirement was often written:

The software shall compute aircraft position using the following algorithm …

There are at least two problems with that language:

That's not a testable requirement at the black box level; you can't see what algorithm has actually been implemented without looking inside the box

It has also led in the past to occasional arguments between systems engineering and software engineering, who wanted to use “an equivalent” algorithm – e.g., a table lookup for sin(x) rather than a Taylor series – or between software engineering and perhaps overly literal-minded QA types who wanted to see the implemented algorithm exactly matching the specified requirement, e.g., “the spec says 'compute using X=Y+Z' but you coded X=Z+Y”

Page 30: Completeness, Robustness, and Safety


Accuracy References and Algorithms (cont'd)

By noting that the algorithm is not actually a requirement but the definition of a reference against which the observable behavior will be verified, we can have our cake and eat it too:

Analysis, derivation, and specification of reference algorithms is still appropriately considered a requirements engineering activity (can't write the requirements spec without a reference for an accuracy requirement)

Downstream design may choose to implement alternative algorithms, but the notion of equivalence is now well defined – equivalent to the reference algorithm within the specified accuracy, over the range of valid inputs

The reference algorithms themselves generally make reference to inputs, which are then a source of additional derived requirements, e.g.: … accurate to within ¼ mile of the average of the last 3 inputs, but only if they are valid inputs, where valid is defined as … [And if we get an invalid input, then what? Need a new requirement here!]

Page 31: Completeness, Robustness, and Safety


Accuracy References and Algorithms (cont'd)

One cause of misunderstanding here is that not all outputs permit meaningful specification at every level of abstraction

There may not be any externally observable reference; look at the difference between:

The output aircraft range shall be accurate to within ±½ mile of the actual range of the actual aircraft at the time the range is output

versus

The recommended course to intercept shall be accurate to within ±3° of ???

Page 32: Completeness, Robustness, and Safety


Accuracy References and Algorithms (cont'd)

The recommended course to intercept shall be accurate to within ± 3° of ???

There's no observable phenomenon to use as a (more abstract) reference for that latter requirement

It may take years of analysis to pick an appropriate reference algorithm; but technically, it's a definition, not in and of itself a requirement and not per se design – although programmers are usually unlikely to want to duplicate the years of labor to come up with their own (demonstrably equivalent) algorithm

Page 33: Completeness, Robustness, and Safety


Exact Fields and Algorithms

Even the specification of acceptable values for an exact field may require an algorithm

It may be so simple that to consider it an algorithm seems silly – e.g., output the name of the user

But a slightly more complex example starts to look more algorithmic: e.g., if there are more than ten students select the three with the highest GPAs and display the last names in alphabetical order; if there are ten or fewer students, select only the top two

Regardless of how “algorithm-like” the language used in such requirements, it's still a definition of an acceptable output and hence part of a requirement, not a design constraint – e.g., the designer may choose to sort the entire set of students or merely insert all entries into a big-end-up heap based on GPA
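The student-selection example above can be written out as an acceptable-output definition; a sketch in Python (hypothetical data model: a list of `(last_name, gpa)` tuples):

```python
def acceptable_output(students):
    """Defines the acceptable output: with more than ten students, the three
    highest GPAs; with ten or fewer, the top two – last names alphabetized."""
    n = 3 if len(students) > 10 else 2
    top = sorted(students, key=lambda s: s[1], reverse=True)[:n]
    return sorted(last for last, _ in top)

roster = [("Adams", 3.9), ("Baker", 3.5), ("Clark", 4.0), ("Davis", 2.8)]
print(acceptable_output(roster))  # ['Adams', 'Clark'] – four students, so top two
```

As the slide says, this "algorithm-like" definition fixes *what* output is acceptable; the designer remains free to produce it via a full sort, a heap, or anything else that yields the same result.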

Page 34: Completeness, Robustness, and Safety


Summary: Algorithm-As-Requirement Versus Algorithm-As-Design

It is important to distinguish between the two uses, particularly in what have sometimes been referred to as “implicit” requirements/design methodologies (such as SSA) where requirements and design are intermixed and a single document explicitly limited to the specification of only blackbox requirements is not normally produced

Algorithm-as-requirement is a definition, and, for an approximate field, an acceptable behavioral equivalence factor (i.e., accuracy) must also be specified

Algorithm-as-design is an instruction to a coder: code it this way

In textual specifications, pick an unambiguous set of phrases and enforce their use – e.g., “shall compute” for design, “is defined as” for requirements – and don't use “shall compute” in a document clearly labeled a requirements specification

Page 35: Completeness, Robustness, and Safety


Roadmap: Output Timing

2. Output characteristics (and their referenced inputs, and then the characteristics of those inputs, and so eventually, more outputs)

2.1 Output fields
2.1.1 Delineation and classification
2.1.2 Reference definition, a.k.a. initial algorithm definition

2.2 Output timing
2.2.1 Basic abstraction(s)
2.2.2 Proximate triggers

2.3 Preconditions (a.k.a. states)

(Builds on 1.1 initial outputs plus additional derived outputs and 1.2 the black box boundary; feeds sections 3, 4, & 5)

Timing and modularization considerations (along with backup strategies) lead to architectural design

Coupling and cohesion analysis leads to or confirms/revises any initial modularization (top level logical design)

Page 36: Completeness, Robustness, and Safety


Specification of Initial Timing Abstractions for Outputs

Real time systems tend to use only two basic timing abstractions:

Stimulus-response (with or without graceful degradation)

Periodic

Both of these are abstractions, useful and appropriate as initial requirements statements, but …

Abstraction is nonetheless a two-edged sword: to abstract is to “omit unnecessary detail,” but one engineer's unnecessary detail can wind up being another's accident

We'll look later at how to determine just how much information is necessary to refine the initial abstractions and either:

Achieve a complete, unambiguous specification of timing behavior; or

Identify and document the omitted details and provide the rationale for considering them unnecessary
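The two basic timing abstractions can each be phrased as a check over observed output times; a minimal sketch in Python (the function names, deadline, period, and jitter values are hypothetical):

```python
def check_stimulus_response(stimulus_t, output_t, deadline):
    """Stimulus-response: the output must appear within `deadline` of its stimulus."""
    return 0 <= output_t - stimulus_t <= deadline

def check_periodic(output_times, period, jitter):
    """Periodic: consecutive outputs must be `period` apart, within `jitter`."""
    gaps = [b - a for a, b in zip(output_times, output_times[1:])]
    return all(abs(g - period) <= jitter for g in gaps)

print(check_stimulus_response(0.0, 0.08, 0.1))            # True
print(check_periodic([0.0, 1.01, 2.0, 2.99], 1.0, 0.05))  # True
```

Even these simple checks expose omitted detail: what jitter is tolerable, and measured at which boundary? Those are exactly the refinements the initial abstractions leave out.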

Page 37: Completeness, Robustness, and Safety


Proximate Triggers and State Preconditions

Outputs are required when certain events are observed at the black box boundary while the system is in a given state – e.g., upon receipt of a “one ping only” command from operator A, the system shall generate a “sonar pulse control” message, provided that operator B has previously enabled active sonar transmissions

States are histories of prior events – technically, a state is an equivalence class of possible histories

The proximate trigger is the final event (or non event, about which more later) that requires an output response and serves as a lower bound for the valid observable time for the output
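The trigger/state split in the sonar example can be sketched as a tiny state machine in Python (the class and message names are hypothetical):

```python
class SonarController:
    """The state summarizes prior events; the proximate trigger is the final
    event that demands the output, and the output fires only when the state
    precondition holds."""
    def __init__(self):
        self.active_enabled = False   # state: has operator B enabled active sonar?

    def on_event(self, event):
        if event == "ENABLE_ACTIVE":          # operator B's prior event updates state
            self.active_enabled = True
            return None
        if event == "ONE_PING_ONLY":          # proximate trigger from operator A
            if self.active_enabled:           # state precondition
                return "SONAR_PULSE_CONTROL"
            return None                       # trigger without precondition: no output
        return None

c = SonarController()
print(c.on_event("ONE_PING_ONLY"))   # None – precondition not yet satisfied
c.on_event("ENABLE_ACTIVE")
print(c.on_event("ONE_PING_ONLY"))   # SONAR_PULSE_CONTROL
```

The same command produces different behavior depending on history – which is all a state is: an equivalence class of possible histories.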

Page 38: Completeness, Robustness, and Safety


Stepwise Refinement of Proximate Triggers and State Preconditions

Both triggers and preconditions may initially be stated with reference to events outside the black box boundary

Sound the MSAW alarm when the aircraft descends below 500' AGL [Parnas, SCR]

But eventually, they too must be expressed with reference only to I/O events visible to the software at its black-box boundary

Sound the MSAW alarm when a radar altimeter message is received with altitude < 500'
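The MSAW refinement above – from an environment-level trigger to an I/O-level trigger visible at the black-box boundary – can be sketched in Python (the message format is a hypothetical assumption):

```python
MSAW_THRESHOLD_FT = 500  # from the requirement: descent below 500' AGL

def process_message(msg):
    """Refined trigger: a radar altimeter message with altitude < 500'.
    msg: dict with a 'type' and, for altimeter messages, an 'altitude_ft'."""
    if (msg.get("type") == "RADAR_ALTIMETER"
            and msg.get("altitude_ft", float("inf")) < MSAW_THRESHOLD_FT):
        return "SOUND_MSAW_ALARM"
    return None

print(process_message({"type": "RADAR_ALTIMETER", "altitude_ft": 450}))  # SOUND_MSAW_ALARM
print(process_message({"type": "RADAR_ALTIMETER", "altitude_ft": 600}))  # None
```

Note what the refinement quietly introduces: the software now reacts to *messages*, not to the aircraft itself – what happens if the altimeter messages stop arriving is a new, derivable robustness requirement.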

Page 39: Completeness, Robustness, and Safety


Summary of Detailed Behavioral Characteristics

2. Output characteristics (and their referenced inputs, and then their characteristics, and eventually, more outputs)

2.1 Output fields
2.1.1 Delineation and classification
2.1.2 Reference definition, a.k.a. initial algorithm definition

2.2 Output timing
2.2.1 Basic abstraction(s)
2.2.2 Proximate triggers

2.3 Preconditions (a.k.a. states)

(Builds on 1.1 initial outputs plus additional derived outputs and 1.2 the black box boundary)

There's a lot of information necessary to completely characterize the observable behavior of an output

It doesn't all have to be developed at the same time or via the same levels of abstraction (stepwise refinement)

But failing to pay enough formal attention to such details can be hazardous to a project's health

Page 40: Completeness, Robustness, and Safety


Roadmap

1. Initial outputs and constraints
2. Detailed behavioral characteristics
3. Standard robustness
4. Completeness and consistency
5. Output hazard analyses

(additional derived outputs feed back into the process)


Robustness and Hazards of Omission

There is a close relationship between completeness, robustness, and safety in real-time software requirements specifications – the definitions are intertwined

In particular, many hazards in safety-critical systems come from incomplete software requirements specifications omitting "robustness" requirements to detect things going wrong:

Failure to diagnose and respond to "principal" hazards in the environment¹, ²

Failure to diagnose and respond to possible malfunctions in the environment or the controlling system

Failure of the software system to diagnose and respond to possible inconsistencies between it and its environment

¹ The phrase "principal hazard" is not in widespread use, but I can't find anything better
² Identification of these hazards is not a software requirements engineering activity


Sidebar: Why “Principal” Hazards Aren't a S/W Requirements Engineering Issue, but Omitted Requirements Are

"Principal" hazards: e.g., the reactor coolant temperature exceeding some threshold

It's not usually the job of the ordinary software requirements engineer to know that that's hazardous or what that threshold is – that's a domain safety expert's job

What we're looking for are software engineering principles for analyzing a set of software requirements for potentially hazardous omissions

Fixing the problem (or deciding that it is safe to ignore the possibility) will require knowledge of the safety characteristics of the domain

But knowing a standard set of potential problems that the software could detect should be a software engineering responsibility – who else will do it?


Developing Robustness – Anticipating Unexpected, Unwanted, or Even Downright Impossible Events

3. Standard robustness

3.1 Input validity definition

3.1.1 Input fields

3.1.1.1 Delineation & classification

3.1.1.2 Validity definition

3.1.2 Assumptions about the environment's behavior

3.1.2.1 State predictability

3.1.2.2 Input timing

3.2 Responses to invalid inputs

3.3 Semi-final triggers and state preconditions


Robustness in the Requirements

How well will the software system respond to undesired events originating outside the software's black box boundary:

Failure of the environment to obey our specified understanding of its own rules

Inconsistency between the actual state of the environment and the executing software's internal model of it


Value Robustness: Trivial, But It's Been Overlooked Before

Always check for input values within specified ranges

Or provide a signed, written explanation of why not!

And then do it anyway!

Generate new software requirements to respond to input values “out-of-range”
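A minimal sketch of such a "value robustness" check follows; the field names and ranges are invented for illustration, and the response token stands in for whatever derived requirement the analysis produces:

```python
# Sketch only: VALID_RANGES and the response names are assumptions.
# The point is that an out-of-range value is itself a trigger for a
# separate, explicitly specified requirement.

VALID_RANGES = {
    "longitude_deg": (-180.0, 180.0),
    "altitude_ft": (0.0, 60000.0),
}

def validate(field, value):
    lo, hi = VALID_RANGES[field]
    if lo <= value <= hi:
        return "ACCEPT"
    # Derived robustness requirement: respond to the "impossible" value
    return "RAISE_OUT_OF_RANGE_RESPONSE"
```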


Example (Caught in Testing): Porting of US Air Traffic Control Software

Logic developed initially for the US

Input data format included East or West longitude designation

Software logic did not check the designation, since all US airspace is West longitude

When the software was ported to England, tracks east of the Greenwich meridian were displayed to the west of it


Actual Accident: Ariane 5

The > $10⁹ loss of the first Ariane 5 was a direct consequence of the re-use of software whose requirements were not "value-robust"

The Ariane 4 hardware precluded the software from receiving an excessively large value

Since the value was “impossible”, the software requirements specification did not require an “out of range check”

In the Ariane 5 environment, this “impossible” event occurred quite readily


Hard to Do in Practice: Out-of-Range Responses Are Often Not Trivial

I know it's impossible to get x>45, but what if we do? What could that possibly mean? How could it happen? You're absolutely certain it can't? Willing to sign your name to it? Even if the software is re-used in a different environment? Is there anything that could be safely done?

Total or partial shutdown? Alert operator but ignore the erroneous value? Just log it and then ignore it? Then use the prior value? … ?

The fact that this analysis is hard to do does not obviate its necessity; for safety-critical systems, someone needs to look at such cases and either figure out a new requirement to deal with them or provide a documented analysis of why it is permissible for the software to ignore them – tacitly ignoring such cases has led to accidents


Specification of "Data Out-of-Range" Response Requirements (cont'd)

Thinking about the impossible and possible responses to it is a good idea for software safety engineers

At the very least, omission of such requirements (which would, after all, consume CPU time and memory when implemented) should be documented, along with the rationale

Either way, there is documentation available for review by other, knowledgeable engineers, domain experts, and safety analysts


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

3.1 Input validity definition

3.1.1 Input fields

3.1.2 Assumptions about the environment's behavior

3.1.2.1 State predictability

  Reversibility

  More complex external state predictability

  Responsiveness

  Spontaneity

3.1.2.2 Time dependent states

3.2 Responses to invalid inputs

3.3 Semi-final triggers and state preconditions

4. Completeness and consistency

5. Output hazard analyses


Environmental State Predictability: Example

Typical start of a too casually specified requirement:

If an input with valve_position = 'open' is received, then do X … .

But suppose we know (or should know) that the environment is supposed to only report when its key values change?

Should then have at least three requirements:

If an input is received with valve_position = 'open' and the prior input said valve_position = 'closed', then do X … .

But if an input is received with valve_position = 'open' and the immediately prior input did not say valve_position = 'closed', then do [something else]

What's the third one?
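The three-way split can be sketched as follows. Names are invented, and the third case shown here (a first-ever report, with no prior input) is only one candidate answer to the slide's question:

```python
# Sketch, assuming the environment reports only state *changes*.
# "DO_X" etc. are placeholder responses; the third requirement shown
# (no prior input at all) is one candidate, not the slide's answer.

def on_valve_report(position, prior):
    if position == "open" and prior == "closed":
        return "DO_X"                       # expected change-of-state report
    if position == "open" and prior is not None:
        return "HANDLE_PROTOCOL_VIOLATION"  # environment broke its own rules
    if position == "open" and prior is None:
        return "HANDLE_FIRST_REPORT"        # a candidate third requirement
    return "IGNORE"
```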


Environmental State Predictability (cont'd)

Prior example was simple reversibility; there are many other “standard” environmental possibilities. E.g.,

Environment is supposed to report first A then B then C before it should report A again

Once the environment first reports an increased value, there should be at least three successive increasing values before it is possible for there to be a decreasing value

Once the environment reports A, at least 10 but no more than 20 seconds should elapse before it reports B

⋮

Understand the expected behavior of the environment and specify it in pre-conditions for the appropriate outputs


Untransformed Inputs

An output requirement may not “use” the extra information in describing the output characteristics (i.e., the program won't compute anything involving transformation of such data), so all we seem to need to say in the requirement is:

If an input with valve_position = 'open' is received, then do X … .

but:

To help make a robust requirements specification, incorporate all possible checks on predictable and detectable behavior in the external environment and generate new requirements as necessary

If important environmental state conditions and their changes are not detectable by the software, consider what system design changes might be requested to make them visible


An Incident Involving Omitted Environmental State Conditions

Context: Aircraft Stores Management System (SMS)

As specified, the SMS was required to release ordnance without checking for aircraft G-loading

During a test flight, the pilot commanded ordnance release while the aircraft was inverted (negative G)

The "correctly" released ordnance then bounced onto the underside of the wings


Environmental State Predictability: Responsiveness

Classify each required system output as to closed-loop or open-loop interaction – i.e., is the environment expected to respond to it or not?

For each requirement for a response-expected output, there must be at least two additional s/w requirements:

The environment responds correctly, including within the proper (specified) time

It doesn't
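The two derived requirements can be sketched as a single classification; the deadline value and names are assumptions:

```python
# Sketch: every "closed loop" (response-expected) output gets a deadline.
# The 2.0 s default and the response tokens are invented for illustration.

def check_response(output_sent_at, response_at, deadline_s=2.0):
    """Classify the environment's reaction to a response-expected output."""
    if response_at is not None and response_at - output_sent_at <= deadline_s:
        return "RESPONDED_IN_TIME"       # derived requirement 1: normal case
    return "RAISE_NO_RESPONSE_HANDLING"  # derived requirement 2: it didn't
```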


An Accident Involving a Failure to Check for Responsiveness

Context: A steel smelting furnace controller:

Upon cold start, computer ordered 100% power to gas tubes

Burned out power supply to a thermometer resulted in a constant (low) reported temperature

Control logic did not require a check for rising temperature within some specified period of time after power applied

Result: Burnt out furnace


Environmental State Predictability: Spontaneity

Some inputs to our software can come at more or less arbitrary times (as seen by our software, but perhaps subject to the environment's own state transition rules)

• But some inputs should come only in response to prior outputs

Classify all inputs as to spontaneous or responsive

The spontaneous arrival of what is supposed to be a non-spontaneous input should be the trigger for a separate (error-handling) requirement


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

3.1 Input validity definition

3.1.1 Input fields

3.1.2 Assumptions about the environment's behavior

3.1.2.1 State Predictability

3.1.2.2 Time dependent states

  Silence

  Emissivity

  Absorptivity

4. Completeness and consistency

5. Output hazard analyses


Environmental State Predictability: Silence

It's generally not a good idea to let the environment have the “right” to remain silent indefinitely (if something breaks, how will you know?)

What is the maximum time between arrivals for each type of input before our system should do something?

And it would be nice if the environment were responsive here so that we could "jiggle" it if it were silent too long

May well need (at least) two requirements here:

“Jiggle”

“Unjiggle”

And then of course various robust responses to the environment failing to jiggle or unjiggle properly
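A watchdog-style sketch of the silence rule; the per-input timeouts and response names are invented:

```python
# Sketch: if an input type is quiet past its maximum inter-arrival time,
# emit a "jiggle" probe; if the probe also goes unanswered, escalate.
# MAX_SILENCE_S values and the response tokens are assumptions.

MAX_SILENCE_S = {"temperature": 5.0, "pressure": 10.0}

def silence_action(input_name, quiet_for_s, jiggled):
    limit = MAX_SILENCE_S[input_name]
    if quiet_for_s <= limit:
        return "OK"
    if not jiggled:
        return "SEND_JIGGLE"          # probe the silent environment
    return "RAISE_SILENCE_HANDLING"   # the jiggle failed: escalate
```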


Environmental State Predictability: Variability

And even if the environment is not silent, but is reporting “constant” state, it would be nice to jiggle it eventually to see if that constant value is “real” and the environment is still capable of responding

Of course before committing to requiring jiggle commands (for whatever reason), jiggles will need to have their own hazard analysis

Routine jiggling of nuclear reactor control rods may be more hazardous than the risks of silence or stuck rods


Environment Characterization: Emissivity

For software running on interrupt capable hardware, there is always a finite capacity to deal with interrupts

The environment's ability to emit such signals may exceed the software system's capacity to respond to them “normally”

Even if we require our design to have the “right” capacity:

The environment might change some day

Malfunctions in the environment could cause it to exceed our assumptions

Software might get re-used in a different environment


An Accident Involving Emissivity

Context: Flight Control System on a fly-by-wire aircraft

Mechanical malfunction (survivable) caused an accelerated rate of input to the flight controller software

Controller not programmed to detect excessive rates

Controller malfunctioned (not survivable) resulting in loss of flight vehicle


Environment Characterization: Emissivity (cont'd)

Robust specifications should require software to explicitly recognize and respond somehow to approaching capacity limits

Typical response options for "impending overload" requirements include:

"Slow down" requests to the external environment

Graceful degradation

Function shedding

Inhibition (masking) of the interrupt


Environment Characterization: Emissivity (cont'd)

Can (should?) have a hierarchy of such multiple capacity recognition requirements, each leading to a progressively more severe response:

If the count of inputs of type X received within the last 10 seconds is > y but ≤ z, output a "reduce input rate" command

If the count of inputs of type X received within the last 10 seconds is > z but ≤ w, relax the accuracy requirement for output A [so that it can be computed using an alpha-beta filter instead of the "normal" Kalman filter]

If the count of inputs of type X received within the last 10 seconds is > w but ≤ q, cease all outputs of type O

If the count of inputs of type X received within the last 10 seconds is > q, mask input interrupts for X.
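The tiered scheme can be sketched as a single precondition check. The thresholds y < z < w < q are placeholders (the numeric defaults are invented), and the return tokens stand in for the required responses:

```python
# Sketch of a tiered overload precondition; counts are per 10 seconds.
# Threshold values are assumptions chosen only so that y < z < w < q.

def overload_response(count_x, y=100, z=200, w=400, q=800):
    if count_x <= y:
        return "NORMAL"
    if count_x <= z:
        return "OUTPUT_REDUCE_INPUT_RATE"
    if count_x <= w:
        return "RELAX_ACCURACY_OF_A"     # e.g., alpha-beta filter instead of Kalman
    if count_x <= q:
        return "CEASE_OUTPUTS_OF_TYPE_O" # function shedding
    return "MASK_INTERRUPTS_FOR_X"       # last resort: inhibit the interrupt
```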


Environment Characterization: Capacity Overload Responses

"Slow down" requests to the external environment, assuming the environment is "throttle-able"

Graceful degradation:

Value: reducing the accuracy required of certain outputs

Allows use of a higher-speed algorithm

There are other, exotic, uncommon alternatives as well

Time: allowing a specified increase in response times

Note: Not even necessarily for the "normal" output (may still require the normal output in normal response time but permit some other, less critical outputs to be delayed)

Specification of gracefully degrading response times has some interesting potential pitfalls of its own, to be discussed later


Environment Characterization: Capacity Overload Responses (cont'd)

Function shedding – “Dropping” requirements for certain outputs completely until some input rate moderates

The state pre-conditions for “droppable” requirements include lower capacity bounds

Inhibition (masking) of the interrupt – assuming the hardware allows masking of the interrupt

Presumably the masking must be reversible – so there must be "new" requirements to "re-enable" the interrupt

Same safety issues as system startup (to be covered later)


Environment Characterization: Capacity Overload Responses (cont'd)

Hazard analysis is obviously required for all of these possible "new" requirements

Interrupt masking and function shedding have obvious possible hazards, as does even simple graceful degradation

Graceful degradation, function shedding, and interrupt masking induce internal (and possibly even external!) state changes and hence presumably require new state transition requirements

When is the software allowed (required) to resume "normal" behavior? (And how does it "know" this?)

Is the environment “OK” with the resumption?

And if the overload recurs too quickly?


Environment Characterization: Absorptivity

Often a system is effectively a transducer – a requirement appears to be to simply forward some input on to the environment, with or without some degree of transformation

The system may have the capacity to correctly process N of these inputs per unit time, but that does not allow the assumption that the environment can absorb N of the required outputs in that same time period

Categorize the environment's ability to absorb our software's outputs, then, if necessary, develop new requirements to deal with an input rate exceeding the environment's absorptivity
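A sketch of the forwarding decision; the rates and response names are invented for illustration:

```python
# Sketch: before forwarding, check the *environment's* absorption limit,
# not just our own processing capacity. The 50/s default is an assumption.

def forward_decision(output_rate_per_s, env_absorb_limit_per_s=50.0):
    if output_rate_per_s <= env_absorb_limit_per_s:
        return "FORWARD"
    # Exceeding absorptivity triggers a derived requirement:
    # buffer, thin the stream, or raise an overload response.
    return "RAISE_ABSORPTIVITY_HANDLING"
```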


Summary of Standard Robustness

3. Standard robustness

3.1 Input validity definition

3.1.1 Input fields

3.1.1.1 Delineation & classification

3.1.1.2 Validity definition

3.1.2 Assumptions about the environment's behavior

3.1.2.1 State predictability

3.1.2.2 Input timing

3.2 Responses to invalid inputs

3.3 Semi-final triggers and state preconditions

• There is a "standard" set of possible environmental anomalies or inconsistencies definable with reference to each software input identified/documented as part of the activities discussed in 2.1.2, 2.2.2, and 2.3, if not earlier

• Dealing safely with the standard set is necessary, not necessarily sufficient, for robust software

• There is obviously no limit on the complexity of "undesired events", but the more complex they are, the more system-specific they are likely to be

• The standard set defined here seems of broad applicability for embedded systems


Summary of Standard Robustness (cont'd)

Determining safe responses to the standard anomalies may require domain safety expertise and extensive analysis; but highlighting the necessary cases is within the scope of good software engineering


Roadmap

1. Initial outputs and constraints

2. Detailed behavioral characteristics

3. Standard robustness

4. Completeness and consistency

5. Output hazard analyses


Completeness and Consistency

4. Completeness & consistency

4.1 Individual requirements completeness

4.1.1 Stimulus

4.1.1.1 Events, conditions, and states

4.1.1.2 Proximate triggers

4.1.1.2.1 Positive

4.1.1.2.2 Negative

4.1.2 Response

4.1.2.1 Uniqueness

4.1.2.2 Timing

4.1.2.3 Value

4.1.2.4 Abstraction Refinement

4.2 Set completeness

4.3 Consistency*

4.3.1 Determinacy: Consistency among output requirements

4.3.2 Safety (part I): Consistency between requirements and safety constraints

* not discussed here


More on Completeness, Robustness, and Safety

A safe set of requirements must be complete:

Omitted cases cannot be analyzed or tested for hazards

It may not be possible to show that “never ever ever” requirements are inconsistent with other requirements if some of the other requirements are missing

Several additional requirements for robust behavior are a natural fallout from a rigorous definition of completeness


Two Major Aspects to Completeness in Requirements Specifications

Individual requirement's completeness

What is the minimum set of essential information required for the precise description/specification of an “individual software [behavioral] requirement”?

The description must be sufficiently detailed to allow discrimination between the desired behavior and all possible similar but undesired behaviors

Set completeness

When does the existence of one or more requirements imply the need for additional requirements?


Some Formalism: The General Form of an Individual Behavioral Requirement

S ⇔ R, where

S is the stimulus predicate – the set of statements about observable (testable) phenomena that, when true, require the program to produce an output satisfying the predicate R

R is the response predicate – the set of statements about the required observable characteristics of some one or more required outputs


Significance of the Bi-Directionality

It's not enough for S → R, meaning that the program must make some output response R when it receives some input stimulus S

Programs always have a too-often-implicit requirement that should be explicit: Don't make any output unless it's required

So the existence of an output must imply that the specified pre-conditions were met, R → S

Hence S ⇔ R


Significance of the Bi-Directionality (cont'd)

And when we start thinking about what an output really means, we'll discover several other phenomena that we can always infer from the existence of an output

These inferences from R must in fact be considered as preconditions and hence specified in S, where downstream analysis (set completeness) will allow us to derive more (robustness) requirements from them


General Form of S, the Stimulus Predicate: Triggers and States (Part 1)

An output is required when something happens while the program is in a certain state

The proximate trigger is an event, an instant in time at which point a response becomes required

A state is a set of conditions that are true for a (possibly indefinite) interval of time

A state is a history of prior events

Formally, a state is an equivalence class, a set of one or more possible histories (sequences) of events that are equivalent in terms of some subsequent behavioral possibilities

States are also known as prerequisites or preconditions


Proximate Triggers

A proximate trigger is the last observable event after which either a required output must appear or the system's behavior has violated its specification:

Positive triggers: Black box observable events, typically (but not always) inputs or prior outputs

Negative triggers: Passage of a specified period of time without a specified event (or set of events)

The proximate trigger is the greatest observable lower bound* on the appearance time of a required output

* Not quite the same thing as the actual greatest lower bound – e.g., there may be a required minimum delay before the required output appears


Positive Trigger Issues: Observability

Some software requirements techniques (e.g., Parnas' SCR) allow reference to external events not necessarily immediately and obviously translatable to events visible at the software black-box boundary

If the aircraft altitude drops below 500' AGL ...

Exactly how will the software detect such events?

Inputs ultimately used to detect that event may (generally will) provide other information that is useful for robustness and hence lead to additional output requirements

So specifying those inputs is either requirements engineering already or design feedback to the requirements engineering phase – a distinction without too much difference


Refinement of Observability Abstractions

Do a requirements specification with a "tight" definition of the black-box boundary – i.e., with respect only to software inputs and outputs

Can call it design or make it a "downstream" phase of requirements documentation, but it needs to be done explicitly sometime

As noted earlier, exploitation of the characteristics of the inputs referenced in a "tight" specification is central to any notion of robustness, so it seems appropriate to consider this "tightening" a requirements analysis/specification activity


Positive Trigger Issues: Observability and Safety

The event must be observable at the black box boundary of the program being specified

Specification of an “impossible” trigger value as part of the S for a “safety-positive”* output means that that output cannot in fact ever be produced

Ensure that all proximate triggers for safety-positive* outputs are physically realizable

Testing should typically pick up such problems, but better to do it sooner rather than later

* Discussed in section 5, output hazard analyses


Positive Trigger Issues: Capacity

Remember, R S

If we see an output proximately triggered by the occurrence of an event of type X, don't we know something about how many X's there were earlier?

Could the program really be producing correct outputs if it received 10³ triggers in the last second?

Yes? OK, how about 10⁹? 10²⁰?


Positive Trigger Issues: Capacity (cont'd)

Capacity requirements don't have to be linked solely to raw interrupts:

Could have specified capacities with regard to inputs of value > 200 and another capacity for the same input but with value > 500 and so on

Such value qualifications allow capacity to be defined for polled inputs as well as interrupt-based inputs

Formally, capacity is not a separate "type" of requirement; it is a limit used in the definition of a state pre-condition for one or more output requirements

Every interrupt-triggered output requirement must explicitly specify the limit on the prior count of its proximate trigger that is a precondition to the program's production of this particular type of output

May also include counts with respect to other events; but a recent proximate trigger count within the specified capacity limit is mandatory


Positive Trigger Issues: Capacities vs Loads

Some systems specify a maximum load in place of, or in addition to, various capacity limits

These terms don't seem to be standard

My definitions here:

Actual input count_I – the actual number of inputs of a single type, I, per some unit of time

Capacity_I – the specified maximum allowable value for a count; so part of some state condition(s) may be: input count_I ≤ capacity_I

Actual load – the (possibly weighted) sum of a number of different counts (must all be normalized, i.e., counted with respect to the same [specified] period of time)

Load limit – analogous to capacity, but a limit on actual load instead of an individual count; part of a state condition


Positive Trigger Issues: Capacities vs Load (cont'd)

As for capacity, load is not itself a new type of requirement* (different than the output requirements we have been discussing up until now) but a state precondition that must be specified as a pre-requisite for all appropriate (e.g., “normal load”) outputs

A requirements specification (and any individual requirement) may include both loads and capacities

Specification of a maximum capacity limit is mandatory for each interrupt-signaled positive proximate trigger in the specification

Some environments also permit assumptions to be made (and then, of course, specified) about minimum arrival rates, m(Ij ) as well

The relationships among capacity, load(s), and minimum arrival rates sometimes permit either capacity or load to be deduced from other explicitly specified conditions (in which case redundant specification may lead to potential consistency problems)

* e.g., “performance”


Implicit Specification of a Capacity Limit C(Ik ) via an Inclusive Load Limit L

If a system specifies a load limit

L = Σj wj · count(Ij)

where count(Ij) is the count of inputs of type Ij received within some time period, and wj is the (positive) weight for that category of input (often all Ij are of equal “significance,” so the wj are all 1), and there is no explicit capacity limit C(Ik) specified for an Ik that is one of the Ij above, then it is being implicitly specified that C(Ik) = L / wk, i.e., all the other input counts could be zero; unless there are minimum arrival rate assumptions m(Ij) specified for some of the Ij, in which case

C(Ik) = ( L − Σj≠k wj · m(Ij) ) / wk

(with all weights equal to 1, these reduce to C(Ik) = L and C(Ik) = L − Σj≠k m(Ij), respectively)
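The deduction above can be sketched directly; treating unspecified minimum arrival rates as zero is my reading of the slide's intent, and the names are illustrative.

```python
def implicit_capacity(k, load_limit, weights, min_rates):
    """Implicit C(Ik) when no explicit capacity is specified: the load limit
    minus the weighted minimum arrival rates of the *other* inputs, divided
    by Ik's own weight (all rates normalized to the same period)."""
    reserved = sum(weights.get(j, 1) * m for j, m in min_rates.items() if j != k)
    return (load_limit - reserved) / weights.get(k, 1)

# All weights 1, L = 20/minute, m(I1) = 3/minute (the worked example coming up)
print(implicit_capacity("I1", 20, {}, {}))         # 20.0 - no minimum rates known
print(implicit_capacity("I2", 20, {}, {"I1": 3}))  # 17.0 - I1's minimum rate is reserved
```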


Positive Trigger Issues: Minimum Time Between Events TBEmin

Another common real-world pre-condition too often omitted is where the environment can't (or shouldn't) generate trigger events of type X without a specified minimum time delay TBEmin(X) between them

So if we get a second such event too soon on the heels of the last one, something's wrong

In general, the more we know (and specify explicitly) about the nominal behavior of triggers, the more robust our specifications will be – assuming we make them set-complete (coming up)

Classify inputs as to whether or not they're supposed to have a minimum delay between successive occurrences, and then specify appropriate triggers accordingly


Implicit Specification of C(Ii ) via TBEmin(Ii )

Note that a TBEmin(Ij ) implies a C(Ij ), but not vice versa:

A capacity limit alone says nothing about spacing; unless otherwise specified, nothing permits the assumption that all C(Ij ) inputs of some Ij don't arrive at the same time

But given a specified TBEmin(Ij ), C(Ij ) = 1 / TBEmin(Ij ) per unit time. For example, if TBEmin(Ij ) = 20 seconds, then C(Ij ) = 3/minute
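The conversion is one line of arithmetic; a hedged helper (names are mine) reproduces the example:

```python
def capacity_from_tbe_min(tbe_min_s, period_s=60.0):
    """C(I) implied by a minimum time between events: at most one input can
    arrive per TBEmin, so C(I) = period / TBEmin inputs per period."""
    return period_s / tbe_min_s

print(capacity_from_tbe_min(20.0))      # 3.0 per minute, as in the example
print(capacity_from_tbe_min(2.0, 1.0))  # 0.5 per second, i.e. 1 every 2 seconds
```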


More Positive Trigger Issues Related to Minimum Time Between Events

A minimum time delay expectation can also pertain to the interval between a prior, “response expected” output and the current input

Again, characterize the environment!

If the input is a simple “acknowledged” (in response to some prior output), there may be no meaningful minimum time delay expected, but ...


More Positive Trigger Issues Related to Minimum Time Between Events (cont'd)

Example: An output of a “close valve” command issued when the valve is open

Expected response: “Valve closed” status input received within 2 seconds

Known environmental characteristic: Closing an open valve takes at least a full second

Conclusion: We should smell a rat if the “valve closed” message comes back in 0.3 seconds
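A minimal sketch of validating a responsive input against both ends of its expected time window; the thresholds mirror the valve example and are illustrative only:

```python
def classify_response(t_command, t_response, min_delay, max_delay):
    """Check a responsive input against its expected [min, max] time window."""
    dt = t_response - t_command
    if dt < min_delay:
        return "suspiciously fast"  # physically impossible response
    if dt > max_delay:
        return "late"               # also needs a specified handling requirement
    return "plausible"

# "Valve closed" 0.3 s after the command, but closing takes at least 1 s:
print(classify_response(10.0, 10.3, min_delay=1.0, max_delay=2.0))  # suspiciously fast
print(classify_response(10.0, 11.5, min_delay=1.0, max_delay=2.0))  # plausible
```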


More Positive Trigger Issues Related to Minimum Time Between Events (cont'd)

If you don't know the response characteristics of the environment, find out!

Classify every input as to spontaneous or responsive

For every responsive input, specify the output being responded to and any appropriate minimum as well as a maximum time allowed for the response to be valid (and possible data value considerations as well)

Specify requirements to handle all “invalid” responses as well as the total lack of a (timely) response


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and consistency

4.1 Individual requirements completeness

4.1.1 Completeness in stimulus predicates S

4.1.1.1 Events, conditions, and states

4.1.1.2 Proximate triggers

4.1.1.2.1 Positive triggers

4.1.2.2.2 Negative triggers

5. Output hazard analyses ⋮


Negative Triggers

A negative trigger is one that says, informally, “if something hasn't happened that was supposed to … .”

If there is no input of type X for 5 seconds, the program shall output an alarm of type Y

Issue: Suppose, continuing the example, above, that 10 seconds elapses between successive inputs of type X being received. How many outputs of type Y are required and when?

After each separate silent interval of five seconds duration, even if they overlap? With a granularity of a second? A millisecond? A microsecond?

There is nothing inconsistent with the requirement as worded above if the program produces 100 outputs after 5 seconds of silence after the last receipt of an input of type X – but that's probably not what is really intended/required
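One disambiguated reading (a single alarm per maximal silent interval, re-armed only by the next input) can be sketched as follows; this is one interpretation of the ambiguous wording, not the only defensible one:

```python
def silence_alarms(input_times, silence, horizon):
    """Emit exactly one alarm per continuous silent interval longer than
    `silence`, measured from the last input (or from the final input to
    the observation horizon)."""
    alarms, last = [], None
    for t in sorted(input_times):
        if last is not None and t - last > silence:
            alarms.append(last + silence)  # alarm the moment the window expires
        last = t
    if last is not None and horizon - last > silence:
        alarms.append(last + silence)
    return alarms

# Inputs of type X at t=0 and t=10, a 5 s window, observed until t=20:
print(silence_alarms([0, 10], silence=5, horizon=20))  # [5, 15] - one alarm per gap
```

Under this reading the 10-second gap in the example produces exactly one type-Y output, at the 5-second mark, rather than 100 of them.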


Negative Trigger Principles: Maximum Time Between Events (TBEMAX)

The bounds on the non-existence interval TBEmax must eventually be refined to tight black box observables

Examples:

If no input of type X is received within 5 seconds after the receipt of an input of type …

If no input of type X is received within ±2.5 seconds of the receipt of an input of type …

If no input of type X is received within Z seconds of the receipt of an input of type …

where Z is defined as the average of … .


Maximum Time Between Events (TBEmax) vs Minimum Arrival Rate, m

Unless otherwise specified (i.e., via a negatively triggered requirement specifying the TBEmax), nothing permits us to assume that all m(Ii) of some Ii don't arrive at the same time; so given an explicit m(Ii), TBEmax still can't be assumed to be less than the time interval over which the m(Ii) is specified. E.g., if m(Ii) is 3 every second, then TBEmax could still be 1 second

But given an explicit TBEmax(Ii ), m(Ii) = 1 / TBEmax(Ii ). E.g., if TBEmax is 2 seconds, then m(Ii) = 1 every 2 seconds


Example: TBEmax , m, L, and C

Suppose TBEmax(I1 ) = 20 seconds, in other words, an input I1 is expected to arrive at least every 20 seconds

Then the (implicit) m(I1 ), normalized upward to a per minute rate, for this example, would be 1/(1/3 minute) or 3/minute

Suppose the max system load L is specified as a total of 20 inputs of either of types I1 or I2 per minute (in any mixture); further suppose that there is no TBEmax specified for I2

Further suppose that no explicit capacity limit has been specified for either I1 or I2

Then the implicit maximum C(I1 ) = L = 20/minute but C(I2 ) = L - m (I1 ) = 17/minute


Summary of Minimum and Maximum Times and Rates

TBEmax(I) should be mandatory for most embedded systems inputs (indefinite silence is the mark of something wrong); at the least, document why not

TBEmax(I) directly implies an m(I), but not directly vice versa; so even if an m(I) is specified, there often should be a separate specification for TBEmax(I); and if not, document why not


Summary of Minimum and Maximum Times and Rates (cont'd)

C(Ij ) should be mandatory for most interrupt-driven embedded system inputs - no processor has infinite capacity, and an impending overload is a detectable abnormality that should have a specified response

C(I) can be specified in one of three ways:

Explicitly

Implicitly via an inclusive specified load, L

Implicitly via TBEmin(I )

Load limit L can be specified either explicitly or implicitly; in the implicit case L = Σj C(Ij ), and the software will be required to handle all inputs arriving at their individually maximum rates in the same time frame


One Last Comment on Negative Triggers on TBEmax(I )

After 25 years, the program still hasn't received an input of type X

It made its (one!) required output 24.9999... years ago

Still no other requirement to do anything else?

Consider a progression of “increasing silence” requirements


An Example of a Progression of “Silence” Requirements

Req #1: If no input of type X is received within 5 seconds after the receipt of an input of type Y …

Req #2: If still no input of type X is received within 60 seconds after the receipt of an input of type Y …

Alternate form of req #2: If still no input of type X is received within 55 seconds after the output produced in response to the first trigger in req #1, above, …

Is this a valid trigger phrase? Does it specify the same required behavior as the first one? Exactly? (You're sure?)


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and consistency

4.1 Individual requirements completeness

4.1.1 Completeness in stimulus predicates S

4.1.2 Completeness of response predicates, R

4.1.2.1 Uniqueness of O

4.1.2.2 Output predicate completeness for t(O)

4.1.2.3 Output predicate completeness for v(O)

4.1.2.4 Single trigger, multiple output issues

5. Output hazard analyses


Necessary Digression for More Formalism: A More Detailed Look at S ⇒ R

S, the stimulus that requires a response, often says “there exists an event I (the proximate trigger) such that some predicate P1 (I) about I (and some preconditions) is true”

This stimulus condition S being true not only requires the production of some response R but this stimulus condition S must be true whenever R is observed

Similarly, R says that there exists some observable/testable output event O, such that some predicate P2 (O) is true about O


A More Detailed Look at S ⇒ R (cont'd)

For analytic purposes, detailed output requirements must either look like, or be equivalent to, one of:

∃I P1( I ) ⇒ ∃O P2( O )

or

∃I P1( I ) ⇔ ∃O P2( O )

where the predicates P1 and P2 must be tight black box predicates involving only:

Constants

v(X) or t(X), the value or time of occurrence, respectively, of black box events X, where X can only be either:
I, an input
O, the required output
Ej, other black box events (whose existence or non-existence must be explicitly existentially quantified), such as prior inputs or outputs


Why the Formalism?

Some safety critical projects may have to provide “proofs of correctness”, proving the consistency of code with requirements – if the requirements are incorrect, such proofs are useless

Some requirements can only be understood formally

It increases the degree of “trust” in these completeness principles by showing their origin


More Necessary Formalism: The Need for Uniqueness

∃I P1( I ) ⇔ ∃O P2( O )

meaning: an input I exists satisfying specified preconditions P1(I) if and only if an output O exists satisfying conditions P2(O)

That usually can't be right, since it allows an arbitrary positive number of outputs to be triggered by a single I. In response to a low temperature report, the system shall output “move control rod up” [and then another? and then another?]

What is incompatible with the requirement as worded above if we actually generate 20 individually apparently correct outputs?

Permitting the output of an indeterminate number of such outputs can't be what's intended


The Need for Uniqueness (cont'd)

Usually, what's intended is that each unique input I satisfying some preconditions requires a matching (unique) output O

Formally, that's written

∃! I P1( I ) ⇔ ∃! O P2( O )

where ∃! means “there exists a unique”*

* Zermelo-Fraenkel notation

State transition diagrams avoid this ambiguity


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and consistency

4.1 Individual requirements completeness

4.1.1 Completeness in stimulus predicates S

4.1.2 Completeness of response predicates, R

4.1.2.1 Uniqueness of O

4.1.2.2 Output predicate completeness for t(O)

4.1.2.2.1 Latency

4.1.2.2.2 Order preservation

5. Output hazard analyses⋮


Output Predicate t(O) Completeness

The legal time of observation t(O) of an output O must be bounded (both upper limit and lower limit specified)

The lower limit is often t(I), the time of observation of the proximate trigger; certainly the lower limit must at least be > t(I)

The upper limit is usually t(I) plus some response time

There are theoretic alternatives to this, but I doubt very much that they ever have any practical utility

There are some mildly non-trivial implications of doing this rigorously that deserve attention:

Latency implications for negatively triggered outputs

Order preservation for positively triggered outputs


The Latency Limit for Negative Triggers

Recall that, informally, a negative trigger is one that says “if something hasn't happened for some period of time ... .”

Also recall that the end points of the silence period must be well defined in terms of black-box observables; e.g., Once 5 seconds elapses after the receipt of the last input of type I without the receipt of a subsequent input of type I (within the 5 second window), turn on the red light.

Now note that it would be erroneous to conclude from that example that if the red light came on, there hadn't been an input of type I within the previous 5 seconds; the significance (consequence, inference) of the red light coming on is a little more complicated than that


The Latency Interval for Negative Triggers (cont'd)

Look at it formally:

… ∃! I1 ¬∃ I2 [ t( I1 ) < t( I2 ) < t( I1 ) + si ] …

⇒ ∃! O [ t( I1 ) + si < t( O ) < t( I1 ) + si + rt ] …

where:

si is the silence interval

rt > 0 is the response time (or better, latency limit) for O

In other words, the output O may be latent for a period of up to rt , meaning that the system is going to evince the behavior O even though it has not yet done so – but it is now required to evince O since all specified preconditions have been met


The Latency Interval for Negative Triggers (cont'd)

So, we may want O to signal that there has been no recent I within the silence interval prior to O, but it can never actually mean precisely that:

An input I could have arrived after the silence interval after some previous event but within the latency period rt, and thus before O

The latency limit can usually be made quite small, often negligibly so, but it cannot be made identically zero

And what constitutes “negligible” is, of course, dependent on the application environment and its hazards

Note that once O actually appears, its actual latency can be calculated, Δt = t(O) − [ t( I1 ) + si ], and that actual latency Δt must be < rt, or the evinced behavior will not be within spec


Negative Trigger Latency Example:

Requirement:

Once 5 seconds elapses after the receipt of the last input of type I without the receipt of a subsequent input of type I (within the 5 second window), turn on the red light within ¼ second.

Observed behavior: An input of type I arrives at time 27, red light appears at time 32.15

Latency limit specified: 0.25 seconds; actual latency for this occurrence of O: 0.15 seconds
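The arithmetic of the example (Δt = t(O) − [ t(I) + si ]) can be checked directly with a hypothetical helper:

```python
def actual_latency(t_output, t_last_input, silence):
    """Actual latency dt = t(O) - [t(I1) + si]; must be below the limit rt."""
    return t_output - (t_last_input + silence)

# Input at time 27, 5 s silence window, red light at time 32.15
dt = actual_latency(32.15, 27.0, 5.0)
print(round(dt, 2))  # 0.15
print(dt < 0.25)     # True: within the specified 0.25 s latency limit
```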


Negative Triggers, Δt, and Safety

The following possible requirements safety question needs to be addressed for each negatively triggered requirement: should there be an additional requirement, for each input of type I having a negative trigger on it, to check whether it (an input of type I ) occurred within the latency limit of the subsequent negatively triggered output O? The purpose would be to warn the external world that the last O really wasn't quite right: there actually was an input I received just before it (the output O), even if not by much, i.e., within Δt


Negative Triggers, Δt, and Safety: Conclusion

The actual or observed latency, Δt, as opposed to the specified latency limit, can be used in the specification of the new requirement(s), e.g.:

If the nominally inhibiting input I arrives within an actual latency period of 0.1•rt just prior to the (pseudo-false*) output O, then do X; else do Y

Which should lead one to consider whether or not one might also need a requirement to announce that, although the last output O was, in fact, correct (and not pseudo-false*), there was what would otherwise have been an inhibiting input just after it

So an input I (arriving “close” to the negatively triggered O) might then possibly trigger multiple different outputs, depending on various values (both positive and negative) of actual latency

* “Pseudo-false” (my neologism), because, as discussed earlier, the output O can never really mean that there wasn't an earlier input I


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and Consistency

4.1 Individual requirements completeness

4.1.1 Completeness in stimulus predicates S

4.1.2 Completeness of response predicates, R

4.1.2.1 Uniqueness of O

4.1.2.2 Output predicate completeness for t(O)

4.1.2.2.1 Latency

4.1.2.2.2 Order preservation

5. Output hazard analyses


Δt and Order Preservation for Positive Triggers

For negative triggers, Δt is a necessary, if sometimes unwanted, consideration bounding the interval where the order we really want can't be guaranteed (i.e., no last-minute I before the O)

For positive triggers, Δt is an easy way to guarantee order preservation, outputs always occurring in the same order as their triggering inputs

Order preservation may not always be a requirement, but it's always something that should be thought about and the analysis documented

Lack of guaranteed order preservation should be explicitly addressed in a hazard analysis and the requirements specification


Order Preservation with Positive Triggers

Simple requirement with a response time of rt :

∃! I P( I ) ⇔ ∃! O [ … t( I ) < t( O ) < t( I ) + rt … ]

If TBEmin( I ) < rt , order preservation is not being required, where TBEmin(I) is the minimum time delay required (assumed) between successive inputs I, and if not specified explicitly, it must be assumed to be 0

If the minimum-time-between-events assumption TBEmin(I) < rt, the requirements specification must include additional logic for order preservation or it is not being required

[Timeline: I1 and I2 arrive TBEmin(I) apart; each output may appear anywhere within rt of its trigger, so t(O2 ) can legally precede t(O1 )]


Δt and Order Preservation for Positive Triggers

If order preservation is to be required and TBEmin(I) < rt, specify a Δt delay after the trigger, i.e., the output must be delayed by at least Δt after its trigger, where rt > Δt > rt − TBEmin(I)

∃! I P( I ) ⇔ ∃! O [ … t( I ) + Δt < t( O ) < t( I ) + rt … ]

Result:

[Timeline: with the Δt delay, the earliest legal t(O2 ) falls after the latest legal t(O1 ), so output order matches input order]
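A small sketch of the resulting output windows, with the rt and TBEmin values chosen only for illustration:

```python
def output_windows(input_times, rt, tbe_min):
    """Legal output windows [t(I)+dt, t(I)+rt], with the delay dt chosen
    strictly inside rt > dt > rt - TBEmin, which forces order preservation."""
    dt = rt - tbe_min / 2  # one valid choice of the delay
    return [(round(t + dt, 3), round(t + rt, 3)) for t in input_times]

# rt = 1.0 s, TBEmin = 0.4 s, two inputs arriving as close together as allowed
w = output_windows([0.0, 0.4], rt=1.0, tbe_min=0.4)
print(w)  # [(0.8, 1.0), (1.2, 1.4)]: O2's earliest time is after O1's latest
```

Because Δt > rt − TBEmin, the earliest legal t(O2) = t(I2) + Δt always exceeds the latest legal t(O1) = t(I1) + rt.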


Δt and Order Preservation for Positive Triggers (cont'd)

Simply stating a “meta” requirement somewhere, e.g., like “unless otherwise specified, all outputs shall occur in the order of their respective triggers”, may often be sufficient

If formal proof of behavioral correctness is needed, you may need the math:

∃! I P( I ) ⇔ ∃! O [ … t( I ) + Δt < t( O ) < t( I ) + rt … ]

There are other formal alternatives, but they are harder to specify and possibly/probably harder to implement and/or verify


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and consistency

4.1 Individual requirements completeness

4.1.1 Completeness in stimulus predicates S

4.1.2 Completeness of response predicates, R

4.1.2.1 Uniqueness of O

4.1.2.2 Output predicate completeness for t(O)

4.1.2.3 Output predicate completeness for v(O)

4.1.2.3.1 Logical completeness

4.1.2.3.2 Existential completeness

4.1.2.3.3 Data age

5. Output hazard analyses


Any conditional logic for the specification of the output value must be logically complete

Can't just say, for example, “turn on the red light if the input value is greater than 0 ”; have to also specify what to output if the value of the input is less than or equal to 0

Not just

v(O) = x1, if v(I) > 0 …

but

v(O) = x1, if v(I) > 0 …
       x2, if v(I) ≤ 0 …

Output Value Predicate: Logical Completeness
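As an executable illustration of logical completeness (the names x1/x2 are kept from the slide; the Python encoding is mine):

```python
def v_out(v_in):
    """Logically complete output-value conditional: every possible v(I)
    falls into exactly one branch."""
    if v_in > 0:
        return "x1"  # e.g. turn on the red light
    return "x2"      # covers v(I) <= 0, the case the incomplete spec omitted

print(v_out(7))   # x1
print(v_out(0))   # x2
print(v_out(-3))  # x2
```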


Output Value Predicate: Existential Completeness

So,

v(O) = x1, if v(I) > 0 …
       x2, if v(I) ≤ 0 …

Problem: If that I, above, is the unique proximate trigger (that requires the response O in the first place), we know it exists, so that logically complete predicate, above, seems OK; but ...

Suppose I is not the proximate trigger?

When the operator pushes the “sensor status request” button, output “OK!” if the last value received from the sensor > 0; otherwise output “Bum data”

How do we know there is a last value of I (the sensor data, in this example)?


Output Value Predicate: Existential Completeness (cont'd)

Even if I is not the proximate trigger, it might be that I is an absolute precondition for O – can't have an O unless that I existed earlier

Formally, its existence (an absolute precondition) is part of every disjoint phrase in the trigger predicate S, when S is reduced to disjunctive normal form (DNF)

Otherwise, need to specify an existence phrase in that predicate

When the operator pushes the “status request” button, output “OK!” if there has been an earlier sensor report and the last value received from the sensor was greater than 0; otherwise, if the last sensor value received was less than or equal to 0, output “Bum data”; finally, if there's no prior sensor report at all, output “No sensor report received”.


Output Value Predicate: Existential Completeness (cont'd)

Formally, an event referenced in P(O) must either be an absolute precondition for O or be existentially quantified in P(O) itself

v(O) = x1, if ∃I ∧ v(I) > 0 …
       x2, if ∃I ∧ v(I) ≤ 0 …
       x3, if ¬∃I …
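The existentially complete three-case predicate maps naturally onto code; here `None` stands for “no prior input exists,” an illustrative encoding of my choosing:

```python
def v_out(last_v):
    """Existentially complete: the non-existence of the referenced input
    is an explicit case, not an accident of an 'otherwise' branch."""
    if last_v is None:
        return "x3"  # no prior input I at all
    return "x1" if last_v > 0 else "x2"

print(v_out(2.5))   # x1 - "OK!"
print(v_out(-1.0))  # x2 - "Bum data"
print(v_out(None))  # x3 - "No sensor report received"
```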


Output Value Predicate: Data Age Specification

Typical … terms in v(O) refer to events like “the most recent input I such that … .” E.g.,

Within 3 seconds of the operator pushing the “request sensor data” button, output the last received sensor input of type X, if such an input exists; otherwise output 0.

Still not good enough. How old can that most recent input be before it's not ok to use it? 5 seconds? 5 years? 500 years?

Each event X existentially quantified in P(O) must have upper and lower bounds placed on t(X)

v(O) = x1, if ∃! I [ 0 < t(O) − t(I) < 5 seconds ∧ v(I) > 0 ] …
       x2, …


Output Value Predicate: Data Age Specification (cont'd)

Mathematically, a simple “… otherwise” clause is legitimate, but better (“safer”) to make explicit at least three cases:

Got a last input of the right type and it's recent enough

Got a last input, but it's too old

Don't have any such input, since … when?

Ever in the entire history of the universe?

Most recent startup of this program (most common)?

Last event of type X ? (E.g., “reset”)

?
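The three explicit cases translate into a small sketch; the 5-second age limit mirrors the earlier example, and the `None`-for-no-input encoding and epoch choice (e.g., program startup) are illustrative assumptions:

```python
def sensor_reply(now, last_input):
    """Three explicit cases: recent enough, too old, or no input at all
    since the chosen epoch. last_input is a (time, value) pair or None."""
    if last_input is None:
        return "No sensor report received"
    t_in, v_in = last_input
    if now - t_in >= 5.0:
        return "Sensor data too old"  # superseded by the data age limit
    return "OK!" if v_in > 0 else "Bum data"

print(sensor_reply(100.0, (97.0, 2.5)))  # OK!
print(sensor_reply(100.0, (40.0, 2.5)))  # Sensor data too old
print(sensor_reply(100.0, None))         # No sensor report received
```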


Data Age, Informally

In a real time system, very little data is “valid” forever, even if not superseded

At the least, some documentation should record that the issue has been examined and that data item X really is safe to consider immortal; e.g., perhaps the last operator-entered default value

Safe specifications state explicitly how current the data values used in outputs must be

And then logical set completeness will force the specification to include explicit requirements for dealing with obsolete data

This analysis will also force consideration of issues related to possible times of non-observability – times, such as pre-startup (about which more later), when events could have been generated by the environment but not “observed” by the software


An Accident Involving Data Age

The context: B-1B offensive avionics suite testing

The bomb bay doors had a mechanical interlock to allow them to be locked open for certain tests

The bomb bay door controller was programmed to apply a valid cockpit ordered command (without checking its age)

Someone had pushed the “Close bomb bay doors” button in the cockpit after the bomb bay doors were locked open hours earlier for some mechanical testing

When a test tech removed the mechanical interlock, the bomb bay doors closed around his head


Two Lessons from the B-1B Accident

1. Response time with reference to a proximate trigger assures that an output is based on current proximate trigger data; for all other referenced data, include a data currency (data age) check

2. Remember that from a safety standpoint, systems testing may be as real as “real” operations. The test environment may induce hazards that could not exist in “real operations”; hazard analyses must be performed for test environments as well as operational ones
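Lesson 1 might be sketched as a guard in the door controller (hypothetical names and threshold; the actual B-1B code is not reproduced here):

```python
def apply_door_command(command, command_time, now, max_command_age_s=5.0):
    """Apply a cockpit door command only if it is still current - the
    data-age check whose absence contributed to the accident above.
    Returns the action taken."""
    if now - command_time > max_command_age_s:
        return "reject-stale"          # stale order: require a fresh command
    return "actuate-" + command
```

A "close" order pushed hours earlier is rejected instead of being applied the moment the interlock is removed.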


Output Value Predicate: General Summary

When defining the observable characteristics of v(O), terms used in the definition must be observable and (eventually) well defined at the black-box boundary:

Constants

v(X) and t(X) for absolute pre-condition events X and existentially quantified events X

All conditionals must be tautologically complete

Data age limits must be placed on all events save unique proximate triggers (where response time forces the same result)

The same rules apply to the bounds on t(O), although their specification is typically much simpler than that of v(O) – but rigorous specification of graceful degradation can be a problem


Non-Unique Proximate Triggers

If the proximate trigger X contains a field v(X) and the output contains a field v(O) whose requirements are defined with respect to v(X), then the currency of that v(X) is known to be within the response time limit of the output

But that's only true if the proximate trigger is unique: depending on the formalism (if any) of the requirements language, it may be possible to write a requirement like: if operator A pushes the red button or operator B pushes the foot switch, then output v(O) …

If the output requirements for v(O) make reference to fields of the "red button" proximate trigger, it too will need to be existentially quantified in the predicate for v(O), since the red button proximate trigger is one of several possible proximate triggers for this output and hence the age (or even existence!) of its fields cannot be assumed


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and consistency

4.1 Individual requirements completeness

4.1.1 Completeness in stimulus predicates S

4.1.2 Completeness of response predicates, R

4.1.2.1 Uniqueness of O

4.1.2.2 Output predicate completeness for t(O)

4.1.2.3 Output predicate completeness for v(O)

4.1.2.4 Refinement of Abstractions

5. Output hazard analyses


Refinement of Abstractions

As was discussed previously, the same type of output can, depending on the formalism, be required in response to semantically distinct trigger events

Contrariwise, and perhaps more common, the same trigger event can require multiple responses, differing somehow in value or time or both

A set of successive, periodic outputs in response to the same basic trigger event is a common example; there are others

This is one place where our commonly used timing abstractions can dangerously mislead us


Example of Imprecision in Abstractions as a Source of Safety Problems

How much information does it take to specify a periodic?

The program shall output X every 2 seconds

The program shall output X every 2 seconds ± 100 ms

After the operator presses the button, the program shall output X every 2 seconds ± 100 ms

Within 500 ms after the operator presses the button, the program shall start to output X every 2 seconds ± 100 ms

[ … every 2 seconds … ] measured relative to what? Previous output? System (or other) clock?


Abstraction Can Conceal Alternative Formalisms for a Periodic

Periodic requirement #1: Periodicity interval p measured against external time (sometimes known as "phase locked"):

∀I P(I) → ∃O₀: t(I) < t(O₀) < t(I) + rt
∀k > 0, ∃Oₖ: t(O₀) + kp − a/2 < t(Oₖ) < t(O₀) + kp + a/2

Periodic requirement #2: Periodicity interval p measured against the time of the prior output (relative timing):

∀I P(I) → ∃O₀: t(I) < t(O₀) < t(I) + rt
∀k > 0, ∃Oₖ: t(Oₖ₋₁) + p − a/2 < t(Oₖ) < t(Oₖ₋₁) + p + a/2

where rt is the response time limit for the first output, p is the required periodicity interval, and a is the required accuracy of the periodicity
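The behavioral difference between the two formalisms shows up if we simulate both schedulers, each always emitting at the earliest legal time (a sketch; the function names are mine):

```python
def phase_locked_times(t0, p, a, n):
    """Periodic #1: each t(O_k) lies within a/2 of t(O_0) + k*p, measured
    against external time; here we always take the earliest legal time."""
    return [t0 + k * p - a / 2 for k in range(1, n + 1)]

def relative_times(t0, p, a, n):
    """Periodic #2: each t(O_k) lies within a/2 of t(O_{k-1}) + p; taking
    the earliest legal time every cycle lets the error accumulate."""
    times, prev = [], t0
    for _ in range(n):
        prev = prev + p - a / 2
        times.append(prev)
    return times
```

With p = 2 s and a = 100 ms, after ten outputs Periodic #1 is still within 50 ms of external time, while Periodic #2 has drifted a full 500 ms.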


Why Bother With the Math: Clock Drift

[Figure: timing diagram comparing Periodic #1 and Periodic #2 against an external time reference marked at t(O₀), +p, +2p, +3p. Legend: shaded bands are the ranges of legal output times; dots are the actual output times – for this example, always the earliest legal time. p is the required periodicity interval; a is the required accuracy of the periodicity.]


An Accident Involving Digital Counters

Timing drift between two separate programs during Desert Storm caused the Patriot missile systems to progressively lose accuracy -- the result was a missed intercept and loss of life
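The published analyses attribute the drift to truncating the 0.1 s clock tick in a 24-bit fixed-point register (effectively 21 fractional bits); a rough reconstruction of that arithmetic, with those bit widths treated as assumptions:

```python
def patriot_time_error(hours):
    """Accumulated clock error from a truncated fixed-point representation
    of 0.1 s: roughly 9.5e-8 s per tick, one tick per tenth of a second."""
    approx_tenth = int(0.1 * 2**21) / 2**21    # 0.1 truncated to 21 bits
    ticks = hours * 3600 * 10                  # the counter ticks every 0.1 s
    return ticks * (0.1 - approx_tenth)
```

After the roughly 100 hours the battery had been running, this gives about a third of a second of accumulated drift – enough, at closing speeds of over a mile per second, to miss the intercept window.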


Other Common Problems With Periodics

Can some future event(s) prevent any of the “remaining” (previously triggered and hence required) multiple outputs from being required? Which outputs get inhibited?

All remaining?

Just a select few? Which ones?

When is the periodic resumed? I.e., what event occurring in what state?

Upon resumption, what is the time reference (i.e., what is the relationship of the time of the newly "resumed" periodic to the last output prior to temporary inhibition)?


Another Complex Abstraction: Graceful Degradation

Graceful degradation is an intuitively appealing abstraction (and useful in practical terms as well), but …

It is very hard to nail down precisely

Is the response time for all affected outputs, for example, supposed to degrade roughly equally?

Or can they vary all over the lot – some normally fast, some much slower?

Couldn't this have safety implications in the right (or wrong) system or environment?


Possibly Overly Simplistic Graceful Degradation of Response Time

[Figure: actual response times plotted against an inverse log scale of actual input arrival times (load increasing with time), all falling beneath a specified response time limit that increases as input load increases.]


Gracefully Degrading Response Time with Gracefully Degrading (But Not Monotonic) Predictability

[Figure: actual response times against an inverse log scale of actual input arrival times (load increasing with time), between an upper bound on legal response time and a lower bound on required response delay; the gap between the bounds – the response time predictability – widens with load, but not monotonically.]


Gracefully Degrading Response Time with Gracefully Degrading, Monotonic Predictability

[Figure: as before, but the gap between the upper bound on legal response time and the lower bound on required response delay – the response time predictability – widens monotonically as load increases.]


Gracefully Degrading Response Time with Constant Predictability

[Figure: both the upper bound on legal response time and the lower bound on required response delay increase with load, while the gap between them – the response time predictability – stays constant.]


Summary of Factors to be Considered for Graceful Degradation

Independent behavioral concepts:

Upper limit on response time (as a function of load)

Lower limit on response time (bounds predictability)

Monotonicity

Order preservation (discussed earlier)


Safety and Abstractions: Summary

Abstractions (such as "periodic" or "graceful degradation") are vital to our ability to understand and communicate about the behavior of our systems; but …

Abstractions are intentionally incomplete in that they intentionally omit "irrelevant" detail

And in a safety-critical system, less may be irrelevant than we would like

Unless you're really sure that your abstraction captures everything of behavioral significance for the environment of operation, eventually refine the requirement and use some mathematical formalism to express it – it's a useful exercise that forces someone to really think through all the possibilities, always a good idea for safety-critical software

And even if you're really sure it's unnecessary, document the decision for the future (re-use in a different environment)


Stepwise Refinement of Abstractions: Conclusion

It's not just designs that get produced via stepwise refinement; the requirements specifications themselves may require stepwise refinement – particularly for safety-critical systems

First draft with loose black box boundaries and/or “common” abstractions

Later version(s) with “tight” boundaries and little or no abstraction – spell it out with the math (or other unambiguous mechanisms) or state explicitly why not


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and consistency

4.1 Individual requirements completeness

4.2 Set Completeness

4.2.1 Definitions and limitations

4.2.2 Engineering completeness

4.2.2.1 Logical completeness

4.2.2.2 Relationship to robustness

4.2.2.3 Semantic completeness

5. Output hazard analyses


Completeness of a Set of Requirements

"External" completeness of principal requirements (sometimes called functional completeness) is not analytically tractable: "What?! You want it to control the coffeemaker, too, as well as the flaperons? Why didn't you say so (six months ago)?"

But a useful definition for "standard engineering completeness" is possible:

Internal or logical completeness

Completeness with respect to a set of domain-independent "undesired event" cases applicable to all real-time software (robustness)

Semantic completeness


Engineering Completeness of a Set of Requirements

What mathematical or engineering rules govern the derivation of new requirements from an initially smaller set of requirements?

The robustness principles discussed earlier are part of the answer, but they are somewhat ad hoc (based on a lot of engineering experience, however); here I want to examine a mathematical formulation which is both more general and more formal:

Generality – it encompasses the previous robustness requirements but leads to additional types of requirements as well

Formality – it makes clear the interdependence (not equivalence) of the concepts of "completeness" and "robustness":

A complete specification will provide robust behavior with regard to a wide range (not all-inclusive) of aberrant behaviors

A robust specification must be complete – omitted cases can't be said to be either safe or unsafe


Logical Completeness: Definition

Formally, a set of requirements { Sᵢ → Rᵢ } is logically complete iff the logical 'or' of the Sᵢ is a tautology:

⋁ᵢ Sᵢ = true

Informally, if you've specified a response triggered by an input I whose value v(I) < 0, there'd better be some other requirement somewhere specifying what to do with an input trigger with v(I) ≥ 0
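The definition can be checked mechanically, at least over a sampled input domain (a sketch; a real proof needs the whole domain or a solver):

```python
def is_logically_complete(stimuli, samples):
    """A set {S_i -> R_i} is logically complete iff the OR of the S_i is a
    tautology; here we brute-force that OR over a sample of inputs."""
    return all(any(s(x) for s in stimuli) for x in samples)
```

With only the v(I) < 0 requirement from this slide the check fails; adding a requirement covering v(I) >= 0 makes the pair complete.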


Logical Completeness: Some Consequences

The richer the set of explicitly specified input assumptions (a.k.a. pre-conditions) in the Sᵢ, the more set completeness leads to many "standard" robustness requirements

Set completeness gets a lot more interesting when we consider state dependencies

Many specifications do an apparently good job on the state preconditions for a given output but then fail totally to address the question of what to do when an identical trigger arrives at other times (states)


Logical Completeness (cont'd)

Maybe it's ok to ignore these “unnecessary”, “meaningless”, or even “impossible” events and do nothing, but I think a safety critical requirements specification should say that explicitly (and somewhere document why)

Often, these inputs arriving at the "wrong" or "unexpected" or even "impossible" times are evidence of a disconnect between the system and its environment

As noted earlier, robustness dictates that these events be used whenever possible to help recognize potential hazards


Logical Completeness and Robustness

Remember that R ⇒ S; observation of the response implies that there was a proper stimulus that satisfied the pre-conditions (assumptions) of the predicate S

The more we make assumptions explicit in S:

1. The subjectively "safer" the software will be

(a) More thinking required by the requirements engineer

(b) More chance for someone else knowledgeable to review and say, "that's not right, we don't know that, we can't assume that"

2. The more logical completeness will force robustness by requiring the specification of responses to triggers arriving under "wrong" or "unexpected" conditions


Startup as a Pre-Condition (and Its Consequences)

So, what's one thing we know, when we observe an O, that is rarely explicitly specified as a precondition in a requirements specification? The program is running!

Formally, most "normal" requirements should start something like ∃E₀ ∀I: t(E₀) < t(I) …, where E₀ is the most recent startup not yet superseded by a shutdown

E₀ is surely visible at the software black box boundary, although the observation mechanism is different from the inside than from the outside


More on Startup Requirements

Given a requirement like ∃E₀ ∀I: t(E₀) < t(I) …, set completeness dictates consideration of the case ∃E₀, ¬[ ∃I: t(E₀) < t(I) ] …, which in fact will require considering requirements for at least two cases:

The software cannot tell if there's ever been an input I

The most recent observable input I occurred prior to E₀

Are these valid requirements in the real world?

Silence on input I after startup is always observable

The observability of pre-startup events depends on the hardware


Startup Requirements: Responding to Silence

For each referenced input I, consider the need for a requirement to deal with the environment exceeding a "maximum silence after startup"; call it Tmsas

Same basic situation as exceeding TBEmax, the maximum time between inputs

There have certainly been requirements specifications written which somehow managed to catch one case but not the other – and it should not be tacitly assumed that TBEmax = Tmsas

As before, consideration should be given to a series of progressively stronger responses
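Both silence cases can be monitored against their own limits – a hypothetical sketch in which Tmsas and TBEmax are deliberately separate parameters:

```python
def silence_violations(input_times, startup_time, shutdown_time, t_msas, tbe_max):
    """Detect both kinds of silence on one input stream: maximum silence
    after startup (Tmsas) and maximum time between events (TBEmax).
    The two limits are distinct and must not be tacitly equated."""
    violations = []
    # Silence after startup: measure to the first input, or to shutdown
    # if no input ever arrived.
    first = input_times[0] if input_times else shutdown_time
    if first - startup_time > t_msas:
        violations.append("Tmsas-exceeded")
    # Silence between successive inputs.
    for a, b in zip(input_times, input_times[1:]):
        if b - a > tbe_max:
            violations.append("TBEmax-exceeded")
    return violations
```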


Startup Requirements: Pre-Startup Events

Depending on the nature of the hardware involved, there could be a detectable input "waiting" on the lines from before startup, and the software might not be able to make any assumption about how long it had been there! Nor how many other messages the environment might have placed there and overwritten! (If the hardware allows that.)

Since the actual time of arrival itself would not be observable by the software, the only way the software could know when the message arrived would be if the message included a time tag
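A startup handler along these lines makes the time-tag dependence explicit (hypothetical message format and limits):

```python
def accept_at_startup(message, startup_time, max_pre_startup_age):
    """Decide whether a message found already waiting at startup may
    trigger a normal response. Without a time tag its age is
    unobservable, so it must be treated as suspect."""
    tag = message.get("time_tag")
    if tag is None:
        return "discard-unknown-age"     # age cannot be established at all
    if startup_time - tag > max_pre_startup_age:
        return "discard-stale"           # arrived too long before startup
    return "process"
```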


Startup Requirements: Pre-Startup Events (cont'd)

This is a really great way to get the system and the environment thoroughly out of synch

Examples:

This time-tagged message is intended to be a time synch message from the external time source – it will be used to set a software-settable clock

The environment has sent 200 messages and incremented some counter of its own accordingly; the software, of course, has only received the most recent one and its counter is incremented by 1

The environment "expects" us to know its current state even though it has changed state while our software was not running


Startup Requirements: Pre-Startup Events (cont'd)

If the hardware permits inputs to "hang" indefinitely (or even "too long") without being read by software, there should be special software requirements to respond to those inputs apparently already present right at startup

The point is to not let the specification blindly allow the input to trigger a "normal" response (when the input event itself may actually have happened two years earlier)

Same potential hazard exists when unmasking interrupts, which is why masking them is not something to do casually (i.e., without a great deal of documented and reviewed analysis)


Generalizing: Startup Inconsistencies

Changes in the external state of the environment when software is not “watching” are a potential cause of inconsistency and hence hazards

Software requirements engineers should know enough to go looking for such possibilities or for acceptable (documented) assurance that they can't happen – and then spec some sort of requirement anyway!


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and consistency

4.1 Individual requirements completeness

4.2 Set Completeness

4.2.1 Definitions and limitations

4.2.2 Engineering completeness

4.2.2.1 Logical completeness

4.2.2.2 Relationship to robustness

4.2.2.3 Semantic completeness

5. Output hazard analyses


Semantic Completeness (a.k.a. Output Value Completeness)

Does the specification describe the characteristics of all appropriate output values, particularly semantically discrete (a.k.a. enumerated) values?

Example: the specification requires generation of an output O with v(O) = "close" but does not require any output with v(O) = "open"

May seem trivially obvious, but it's been overlooked before


Summary of Completeness and Consistency

4. Completeness & consistency

4.1 Individual requirements completeness

4.1.1 Stimulus

4.1.1.1 Events, conditions, and states

4.1.1.2 Proximate triggers

4.1.1.2.1 Positive

4.1.1.2.2 Negative

4.1.2 Response

4.1.2.1 Uniqueness

4.1.2.2 Timing

4.1.2.3 Value

4.1.2.4 Abstraction Refinement

4.2 Set completeness

4.3 Consistency*

4.3.1 Determinacy: Consistency among output requirements

4.3.2 Safety (part I): Consistency between requirements and safety constraints

* not discussed here

[Diagram annotation: the analyses loop back to produce additional derived outputs.]


Roadmap

1. Initial outputs and constraints

2. Detailed behavioral characteristics

3. Standard robustness

4. Completeness and consistency

5. Output hazard analyses

[Diagram annotation: the analyses loop back to produce additional derived outputs.]


Additional Software Safety and Hazard Analyses

Precision in the specification of proximate triggers, trigger sets, sub-triggers, and states

(a.k.a. avoiding ambiguity, confusion, irritation, and other such inelegant, bad, and probably downright unsafe things in the specification of state transitions)

Safety related state transition rules


Note on State Transitions

State transitions are not formally black box requirements themselves but are interpretations given to behavioral relationships visible in requirements documents

See Weinberg, An Introduction to General Systems Thinking, for a good (and pleasant reading) discussion of this point

Several modern, graphically oriented requirements techniques lose this distinction anyway and just call one set of diagrams “state transition” diagrams


Trigger Sets, Proximate Triggers, and States

A trigger set is a (non-empty) set of events that trigger an output and, in so doing, are "consumed"

There can't be another output of that type until all the events in the trigger set occur again (possibly order or timing dependent, possibly not – up to the spec)

The proximate trigger is just the last event in the trigger set

The trigger set is just a state, but I recommend that it have its own nomenclature:

Triggers get "consumed" by one output, whereas

States can "enable" multiple, identically required outputs


Trigger Sets, Proximate Triggers, and States (cont'd)

Confusion as to whether an event (that is unambiguously an absolute precondition) is part of a trigger or just part of a state could obviously lead to safety problems:

Generate the missile firing command upon receipt of a sensor report of type X, but only if a prior authorization has been received

One authorization required per shot? Or is one blanket authorization good for multiple shots?
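The two readings correspond to two different state machines; making the choice explicit (a sketch with invented names) removes the ambiguity:

```python
class FiringController:
    """Two readings of "fire on sensor report X, but only if authorized":
    consumable=True treats the authorization as part of the trigger set
    (one authorization per shot); consumable=False treats it as a state
    that enables any number of shots."""
    def __init__(self, consumable):
        self.consumable = consumable
        self.authorized = False

    def authorize(self):
        self.authorized = True

    def sensor_report_x(self):
        if not self.authorized:
            return None                  # wrong state: no firing output
        if self.consumable:
            self.authorized = False      # trigger consumed by this output
        return "fire"
```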


Proximate Triggers, Sub-Triggers and Optional Transitions

Proximate triggers don't necessarily force state changes

Informally, it's possible to stay in the same state and keep responding the same way each time a new proximate trigger arrives

Formally, that's usually described as a state transition that returns to the original state

That's only possible if the trigger set contains only a single event (which is then, by definition, the proximate trigger)


Trigger Sets and Mandatory Transitions

The proximate trigger of a trigger set containing multiple events does force a state change

Sub-triggers are consumed by the arrival of (and response to) a proximate trigger

That output is no longer possible (until the system changes states again as the sub-triggers accumulate)

The real point? Requirements worded like

Generate a missile firing command upon receipt of a sensor report of type X, but only if a prior authorization has been received

are ambiguous

For many safety-critical systems, you shouldn't write (the final version of) state transition requirements in English

Use graphical, hierarchical state transition diagrams, e.g., UML state diagrams


Rescinding Sub-Triggers

State transition diagrams also make explicit the problems of partial “rescinding” of sub-triggers

Generate the missile firing command upon receipt of a sensor report of type X, but only if a prior, and still unrescinded, authorization has been received from both operator A and operator B

OK, now an operator rescinds an authorization; before the system can fire again, does it require re-authorization from just that operator or from both?

Again, the problem is the English; use diagrams or math


Are All Input Events Always Proximate Triggers Sometimes?

I think the presumption should be that every possible observable input event (including every possible hardware trap) is a proximate trigger in every state

Usually, it's only in some states that the event really is intended to be the proximate trigger of an "intended output"

But, as noted earlier, an event arriving when the system is in the "wrong" state is often a sign of malfunction or inconsistency somewhere and so should be the proximate trigger for an error response of some sort

The burden should be on the requirements/safety analyst to state (in writing) why the system need not respond at all to a given input in a given state


Are All Input Events Always Proximate Triggers Sometimes? (cont'd)

Note that the notion of hierarchical states keeps the problem much more manageable than it might first appear to be

A requirement specifying what to do upon the arrival of an input in superstate A eliminates the need to specify individual requirements for that input arriving in each of the substates of A

Of course, the hazard level of the output triggered by that input had better be constant across all those sub-states


Are All Input Events Always Proximate Triggers Sometimes? (cont'd)

Note that not being required to observably respond is not the same thing as not processing – but processing refers to design, not behavior

If it's part of a capacity precondition, for example, an event's arrival will still cause the incrementing of a counter or the making of some other internal (design) record of its arrival

Sub-triggers can cause internal (design) state changes without producing external outputs

But, as just discussed, “unannounced” transitions (arrival of input events that are not proximate triggers – i.e., that do not themselves require output responses in the current state) should be documented somewhere

A requirements engineer or system designer should have a really good reason for allowing an "unannounced" transition to a state with an increased hazard level – i.e., from which a hazardous output can be produced in the future


Roadmap

1. Initial outputs, boundaries, and constraints

2. Output characteristics and their referenced inputs

3. Standard robustness

4. Completeness and consistency

5. Output hazard analyses

5.1 Precision in the specification of proximate triggers, trigger sets, sub- triggers, and states

5.2 Safety related state transition rules5.2 Safety related state transition rules


Classify Each Output for Safety ImpactClassify Each Output for Safety Impact

Safety-positive – never leads to an increased hazard level, sometimes reduces hazard levels

Safety-negative – never decreases hazard levels, sometimes increases them

Intrinsically hazardous – always increases hazard levels

Safety-neutral – never affects (direct?) hazard levels

Safety-ambiguous – safety effects unknown or sometimes increases hazard levels, sometimes decreases them


Minimize Safety-Ambiguous OutputsMinimize Safety-Ambiguous Outputs

Is there any (cost-effective) way to obtain more data from the environment to clarify the ambiguity and split the safety-ambiguous requirement into two or more unambiguous ones? (Two separate sets of states)

Safety-ambiguous requirements are indicators of potential problems – you may often have to live with them, but don't accept them casually

Document the rationale for acceptance


Classify Each State for Hazard Level

Safe state – no safety-negative or safety-ambiguous outputs can be produced from this state

Unsafe state – one or more safety-negative outputs can be produced from this state

Safety-ambiguous state – everything else:

No safety-negative outputs possible

One or more safety-ambiguous outputs possible
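These classification rules are mechanical enough to sketch in code. The following is an illustrative Python sketch only – the output names, the `OUTPUT_CLASS` table, and the class labels are invented, not from the slides:

```python
# Hypothetical per-output safety classifications (invented examples)
OUTPUT_CLASS = {
    "open_vent": "safety-positive",
    "arm_igniter": "safety-negative",
    "log_status": "safety-neutral",
    "retry_link": "safety-ambiguous",
}

def classify_state(producible_outputs):
    """Classify a state from the safety classes of the outputs that can
    be produced from it: unsafe if any safety-negative (or intrinsically
    hazardous) output is possible; safe if neither safety-negative nor
    safety-ambiguous outputs are possible; safety-ambiguous otherwise."""
    classes = {OUTPUT_CLASS[o] for o in producible_outputs}
    if "safety-negative" in classes or "intrinsically-hazardous" in classes:
        return "unsafe"
    if "safety-ambiguous" in classes:
        return "safety-ambiguous"
    return "safe"

print(classify_state({"open_vent", "log_status"}))   # safe
print(classify_state({"retry_link"}))                # safety-ambiguous
print(classify_state({"arm_igniter", "open_vent"}))  # unsafe
```

Note the asymmetry the rules impose: a single safety-negative output makes the whole state unsafe, no matter how many safety-positive outputs it also offers.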


Reachability Analysis and Path Robustness

Definitions: A path is a sequence of events

A path from state X to state Y is a sequence of events that “moves” the system from state X to state Y

A state is O-possible if the output O can be produced from it (including any of its sub-states)

Note: There can be multiple paths between states

The same output characteristics can describe an output required to be produced from several different states, e.g.:

If the system receives an input X and the prior input was A, output Y

If the system receives an input X and the total count of prior inputs of type Z during the last z seconds is greater than z_max, output Y
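The O-possible set and reachability between states can both be computed by simple graph search. A minimal sketch, in which the states, events, and output labels are all invented for illustration:

```python
# Transition graph: state -> list of (input_event, next_state)
TRANSITIONS = {
    "idle":   [("X", "armed")],
    "armed":  [("A", "idle"), ("X", "firing")],
    "firing": [("reset", "idle")],
}
# Outputs producible from each state (illustrative)
PRODUCES = {"idle": set(), "armed": {"Y"}, "firing": {"Y", "fire_cmd"}}

def o_possible(output):
    """States from which `output` can be produced."""
    return {s for s, outs in PRODUCES.items() if output in outs}

def reachable(start):
    """All states reachable from `start` along any path of events."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(nxt for _, nxt in TRANSITIONS.get(s, []))
    return seen

print(sorted(o_possible("Y")))   # ['armed', 'firing']
print(sorted(reachable("idle"))) # ['armed', 'firing', 'idle']
```

Here the output Y is producible from two different states, matching the two-requirement example above.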


Sidebar: Disjunctive Normal Form and Absolute Preconditions

Two apparently separate requirements

If the system receives an input X and the prior input was A, output Y

If the system receives an input X and the total count of prior inputs of type Z during the last z seconds is greater than z_max, output Y

can be re-written as

If the system receives an input X and the prior input was A, or the system receives an input X and the total count of prior inputs of type Z during the last z seconds is greater than z_max, output Y

which is disjunctive normal form (DNF)

A phrase (e.g., “the system receives an input X ”) present in all clauses of the DNF form is an absolute precondition for the output being specified and may be referred to in the output predicate R without existential quantification

All other events in R must be existentially quantified
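Finding the absolute preconditions is just a set intersection over the DNF clauses. A hedged sketch, with each clause represented as a frozenset of atomic phrases (the phrases themselves are paraphrased from the example above):

```python
# Each DNF clause is a frozenset of atomic condition phrases
clauses = [
    frozenset({"receives input X", "prior input was A"}),
    frozenset({"receives input X", "count of Z in last z seconds > z_max"}),
]

def absolute_preconditions(dnf_clauses):
    """Phrases present in every clause of the DNF form; these are the
    absolute preconditions for the specified output."""
    return frozenset.intersection(*dnf_clauses)

print(absolute_preconditions(clauses))  # frozenset({'receives input X'})
```

Everything not in the intersection (“prior input was A”, the Z-count condition) must be existentially quantified in R.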


Reachability Analysis and Path Robustness (cont'd)

If a state X is not O-possible and an input event S is on every path from state X to every O-possible state, then S is a soft failure point for O:

If the ability to receive input S is ever lost, the system's ability to produce O may be lost (if the system is, or subsequently winds up, in state X )
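One way to check this mechanically: delete all S-labeled transitions and see whether any O-possible state is still reachable from X. A sketch under invented state/event names (not from the slides):

```python
# state -> list of (input_event, next_state); illustrative data
TRANSITIONS = {
    "init":   [("S", "ready"), ("T", "init")],
    "ready":  [("go", "active")],
    "active": [],
}
O_POSSIBLE = {"active"}  # states from which output O can be produced

def reachable_without(start, banned_event):
    """States reachable from `start` using no `banned_event` transitions."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(n for e, n in TRANSITIONS.get(s, []) if e != banned_event)
    return seen

def is_soft_failure_point(event, state):
    """True if losing `event` while in `state` can cost the system its
    ability to ever produce O."""
    if state in O_POSSIBLE:
        return False
    return not (reachable_without(state, event) & O_POSSIBLE)

print(is_soft_failure_point("S", "init"))  # True: S is on every path to O
print(is_soft_failure_point("T", "init"))  # False: T is only a self-loop
```

The check is per-state: S may be a soft failure point for O at X yet harmless at states that have S-free paths to an O-possible state.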


Reachability Analysis and Path Robustness (cont'd)

If an event H is in every path into every O-possible state, then H is a potentially hard failure event for the output O

If the system is not in an O-possible state when it loses the ability to receive event H, the system may never again generate an O

Even if it is in an O-possible state when it loses the ability to receive H, if it leaves that state without generating O, it will never again generate that O
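The hard-failure check strengthens the soft one: with H-labeled transitions removed, no state outside the O-possible set may reach an O-possible state. A sketch with invented states and events:

```python
# state -> list of (input_event, next_state); illustrative data
TRANSITIONS = {
    "off":     [("H", "standby")],
    "standby": [("H", "on"), ("stop", "off")],
    "on":      [("stop", "off")],
}
O_POSSIBLE = {"on"}  # states from which output O can be produced

def reachable_without(start, banned_event):
    """States reachable from `start` using no `banned_event` transitions."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(n for e, n in TRANSITIONS.get(s, []) if e != banned_event)
    return seen

def is_hard_failure_event(event):
    """True if every path into every O-possible state contains `event`,
    i.e. removing `event` strands every non-O-possible state."""
    return all(
        not (reachable_without(s, event) & O_POSSIBLE)
        for s in TRANSITIONS if s not in O_POSSIBLE
    )

print(is_hard_failure_event("H"))     # True
print(is_hard_failure_event("stop"))  # False
```

Unlike a soft failure point, a hard failure event is global: it does not matter which non-O-possible state the system is in when the ability to receive H is lost.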


Safety Transition Rules

Startup and initialization states should be safe

There (must? should?) be some safe state (not necessarily the same one) reachable from every state of the system's behavior

If an output O is safety-positive, there probably should be no soft failure events for O (much less hard ones)

Every unsafe or safety-negative output O should usually have at least one hard inhibitor event (or set of events)
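The second rule is directly checkable by reachability analysis over the state graph. A sketch – the graph and the set of safe states are invented for illustration:

```python
# state -> list of (input_event, next_state); illustrative data
TRANSITIONS = {
    "startup": [("init_ok", "monitor")],
    "monitor": [("fault", "alarm")],
    "alarm":   [("ack", "monitor")],
}
SAFE_STATES = {"startup", "monitor"}  # per the state hazard classification

def reachable(start):
    """All states reachable from `start`."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(n for _, n in TRANSITIONS.get(s, []))
    return seen

def safe_state_reachable_everywhere():
    """True iff from every state some safe state is reachable."""
    return all(reachable(s) & SAFE_STATES for s in TRANSITIONS)

print(safe_state_reachable_everywhere())  # True
```

Here the check passes because even “alarm” has a path (via ack) back to the safe “monitor” state; a state with no escape to any safe state would fail it.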


Safety Transition Rules: Hard Inhibitors for Safety Negative Outputs

Other ways of looking at this:

Examine the set of all paths into all states from which safety negative outputs O can be produced to see if there are any “sneak” paths there

See if O can be “confined” to being generated from a single (possibly super-) state and examine and control all entries into that state

Same idea for higher-hazard-level safety-ambiguous outputs (even if they're not totally unsafe)

Less likely to be totally true (or cost-effective)

But document the rationale for not doing this
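Enumerating every entry into the O-possible (or confinement) state is a simple pass over the transition graph; anything unexpected in the resulting list is a candidate sneak path. The states and events below are invented for illustration:

```python
# state -> list of (input_event, next_state); illustrative data
TRANSITIONS = {
    "idle":     [("arm", "armed"), ("test", "selftest")],
    "selftest": [("done", "idle"), ("override", "armed")],  # possible sneak path
    "armed":    [("disarm", "idle")],
}
O_POSSIBLE = {"armed"}  # states from which safety-negative O can be produced

def entries_into(states):
    """All (from_state, event) pairs whose transition enters `states`;
    each one is an entry that must be reviewed and controlled."""
    return sorted(
        (src, evt)
        for src, edges in TRANSITIONS.items()
        for evt, dst in edges
        if dst in states
    )

print(entries_into(O_POSSIBLE))  # [('idle', 'arm'), ('selftest', 'override')]
```

The intended entry (idle/arm) and the unreviewed one (selftest/override) both show up; the analysis's job is to justify or eliminate the latter.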


Safety Transition Rules (cont'd)

Each unsafe state U should have a finite duration:

For each positive proximate trigger event I that transitions the system into U, there must be a negatively triggered requirement (time passed after I ) which eventually moves the system “out of ” U into a “safer” state

Same idea for high hazard safety-ambiguous states

Again, less likely to be totally true, and

Again, still document the reasons for not doing this
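This rule is checkable if transitions are annotated to distinguish negatively triggered (timeout) ones. A sketch – the state data, hazard ranking, and timeout annotation scheme are all invented for illustration:

```python
# state -> list of (trigger, is_timeout, next_state); illustrative data
TRANSITIONS = {
    "venting_closed": [("open_cmd", False, "venting_open"),
                       ("t > 5s since entry", True, "safe_shutdown")],
    "venting_open":   [("close_cmd", False, "venting_closed")],
}
UNSAFE = {"venting_closed"}
# Higher rank = more hazardous (assumed ordering for the example)
HAZARD_RANK = {"safe_shutdown": 0, "venting_open": 1, "venting_closed": 2}

def has_bounded_duration(state):
    """True if some negatively triggered (timeout) transition leaves
    `state` for a strictly safer state."""
    return any(
        is_timeout and HAZARD_RANK[nxt] < HAZARD_RANK[state]
        for _, is_timeout, nxt in TRANSITIONS.get(state, [])
    )

print(all(has_bounded_duration(s) for s in UNSAFE))  # True
```

An unsafe state whose only exits are positively triggered (i.e., depend on receiving some input) would fail the check, since a lost input could leave the system parked there indefinitely.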


Cyclic Reachability

Most behavior for most real-time systems is intended to be repeatable

E.g., if we turn off a valve, we probably want to be able to turn it on and off again in the future

Check that all states are part of all necessary cycles; e.g.,

There can be multiple states from which turning off a valve is possible

It is usually not enough that just one or more of those states has a path to one where the valve can be opened; presumably, they all should – if not, why not?
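The valve example can be checked the same way as the other reachability rules: every close-possible state must reach some open-possible state, not just one of them. The states and events below are invented for illustration:

```python
# state -> list of (input_event, next_state); illustrative data
TRANSITIONS = {
    "running": [("close_valve", "stopped")],
    "stopped": [("reset", "ready"), ("shutdown", "halted")],
    "ready":   [("open_valve", "running")],
    "halted":  [],
}
CLOSE_POSSIBLE = {"running"}  # states from which the valve can be closed
OPEN_POSSIBLE = {"ready"}     # states from which it can be reopened

def reachable(start):
    """All states reachable from `start`."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(n for _, n in TRANSITIONS.get(s, []))
    return seen

def all_cycles_closed():
    """Every close-possible state can reach some open-possible state."""
    return all(reachable(s) & OPEN_POSSIBLE for s in CLOSE_POSSIBLE)

print(all_cycles_closed())  # True
```

If a second close-possible state were added with no path back to “ready”, the check would fail and the missing cycle would need either a fix or a documented rationale.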


Where Are We? (a.k.a. Are We Done Yet?)

1. Initial outputs and constraints

2. Detailed behavioral characteristics

3. Standard robustness

4. Completeness and consistency

5. Output hazard analyses

(additional derived outputs feed back at each step)

(Almost, Not Quite)


Additional Data and Activities During the Requirements Phase for Safety Critical Software

Activity: Preliminary Hazard Analysis (PHA)
Product: Separate hazard document(s); some initial software safety requirements

Activity: Input (environment) characterization
Product: State diagrams for the environment; annotations to input descriptions and environment state diagrams; additional (robustness) requirements; additional behavioral data in individual requirements

Activity: Completeness analyses
Product: Additional (robustness) requirements; additional behavioral data in individual requirements; documentation of rationale for omission of apparent requirements


Additional Data and Activities During the Requirements Phase for Safety Critical Software (cont'd)

Activity: Output classifications
Product: Additional (robustness) requirements; annotations to individual requirements

Activity: State safety classification
Product: State diagrams (if not already required); annotations to state diagrams; additional safety requirements

Activity: State graph analysis
Product: Possible modifications to state diagrams (and hence to requirements); annotations to state diagrams; rationale for odd or omitted graph features


Now We're Done!