CSE 466 – Spring 2000 - Introduction - 1
The Final
Hardware: probably something on memory-mapped I/O (HW and SW)
OS: probably a task diagram of some kind, related to Tiny
Something related to how the I2C bus works
Something about frames, protocols, and layers…closely related to the physical and transport layers that we discussed in class (not the master/slave one)
A safety question
CSE 466 – Spring 2000 - Introduction - 2
Time is a Factor
The TUV Fault Assessment Flow Chart captures time and the notion of “latent faults”
  T0: fault tolerance time of the first failure
  T1: time after which a second fault is likely (based on MTBF data)
Safety requires that Ttest < T0 < T1
[Figure: TUV fault assessment flow chart. Nodes: 1st Fault; “hazard after T0?”; “Fault detected before T1?”; 2nd Fault; “hazard?”. Each decision has yes/no branches; outcomes are System Safe and System Unsafe.]
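A minimal sketch of the Ttest < T0 < T1 condition as a design-time check. The class name, method name, and millisecond units are assumptions for illustration, and Ttest is taken here to mean the time the system needs to run its fault test.

```java
// Hypothetical helper, not from the slides: checks the timing condition
// that the fault assessment requires for a system to be considered safe.
final class FaultTiming {
    /**
     * @param tTestMs time needed to test for the fault (assumed meaning of Ttest)
     * @param t0Ms    fault tolerance time of the first failure (T0)
     * @param t1Ms    time after which a second fault is likely (T1)
     * @return true when Ttest < T0 < T1
     */
    static boolean timingIsSafe(long tTestMs, long t0Ms, long t1Ms) {
        return tTestMs < t0Ms && t0Ms < t1Ms;
    }
}
```

A self test that takes longer than T0 fails the check even when T1 is very large, because the first fault could cause a hazard before it is detected.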
CSE 466 – Spring 2000 - Introduction - 3
Latent Faults
Any fault that is not detectable by the system during operation has a probability of 1, so it doesn’t count in the single-fault tolerance assessment
Detection might not mean diagnosis. If the system can detect a secondary effect of a device failure before a hazard arises, then this could be considered safe
[Diagram: Main H2 Valve Control and Backup H2 Valve Control, linked by a watchdog handshake]
Stuck valves could be latent if the controllers cannot check their state. May as well assume that they are stuck!
CSE 466 – Spring 2000 - Introduction - 4
Fail-Safe Design (just an example)
On reset, the processor checks status. If bad, enter “safe mode”:
  power off
  reduced/altered functionality
  alarm
  restart
Safe mode is application dependent
[Diagram: Processor and Watchdog. Labels: protocol, protocol failure, reset, status]
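A rough sketch of the watchdog and reset check described on this slide. All names, the timeout-based protocol, and the two-mode enum are assumptions for illustration; a real watchdog is usually a hardware timer, and what safe mode does is application dependent.

```java
// Hypothetical software model of a watchdog: the processor must "kick"
// it within timeoutMs, otherwise the protocol is considered failed.
final class Watchdog {
    private final long timeoutMs;
    private long lastKickMs;

    Watchdog(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastKickMs = nowMs;
    }

    /** Called by the processor as its half of the handshake. */
    void kick(long nowMs) {
        lastKickMs = nowMs;
    }

    /** True when the handshake is overdue, i.e. the processor should be reset. */
    boolean protocolFailed(long nowMs) {
        return nowMs - lastKickMs > timeoutMs;
    }
}

final class FailSafeController {
    enum Mode { NORMAL, SAFE }

    /** On reset, check status; a bad status selects safe mode. */
    static Mode onReset(boolean statusOk) {
        return statusOk ? Mode.NORMAL : Mode.SAFE;
    }
}
```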
CSE 466 – Spring 2000 - Introduction - 5
Safety Architectures
Self Checking (Single Channel Protected Design)
Redundancy
Diversity or Heterogeneity
[Diagram: chain of Brake Pedal, Pedal Sensor, Computer, Bus, Computer, Brake, and Engine Control. Annotations: watchdog protocol; parity/CRC; periodic internal CRC/checksum computation (code/data corruption)]
Beef up each link in the chain
CSE 466 – Spring 2000 - Introduction - 6
Single Channel Protection
Self Checking
  Perform periodic checksums on code and data
  How long does this take? Is Ttest < T0?
Feasibility of Self Checking
  Fault tolerance time
  Speed of the processors
  Amount of ROM/RAM
  Recurring cost v. development cost tradeoff
[Diagram: Computer (code corruption), Bus, Computer, Brake, Engine Control. Annotation: parity/CRC on the bus]
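A minimal sketch of a periodic self check over a code or data image, using the standard `java.util.zip.CRC32` class. The class and method names are hypothetical, and on a small embedded target the checksum would more likely be a hand-written routine over ROM than a library call.

```java
import java.util.zip.CRC32;

// Hypothetical self-check helper: compares the CRC of a memory image
// against a reference value recorded when the image was known to be good.
final class SelfCheck {
    /** Computes the CRC-32 of the whole image. */
    static long checksum(byte[] image) {
        CRC32 crc = new CRC32();
        crc.update(image, 0, image.length);
        return crc.getValue();
    }

    /** True when the image still matches its reference checksum. */
    static boolean imageIntact(byte[] image, long expectedCrc) {
        return checksum(image) == expectedCrc;
    }
}
```

The time this check takes grows with the size of the image, which is why the slide asks whether Ttest < T0 still holds.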
CSE 466 – Spring 2000 - Introduction - 7
Redundancy
Homogeneous Redundancy
  Low development cost…just duplicate
  High recurring cost
  No protection against systemic faults
[Diagram: three Computers (one marked “code corruption”) on a Voting Bus driving Brake and Engine Control]
Voting bus could be implemented similar to collision detection
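One way to picture the voting step is a 2-of-3 majority vote in software. This is only a sketch under assumptions: the slides suggest a voting bus (possibly similar to collision detection), not this code, and the names here are hypothetical.

```java
import java.util.OptionalInt;

// Hypothetical 2-of-3 majority voter over the outputs of three channels.
final class Voter {
    /** Returns the value at least two channels agree on, or empty if all three differ. */
    static OptionalInt vote(int a, int b, int c) {
        if (a == b || a == c) {
            return OptionalInt.of(a);
        }
        if (b == c) {
            return OptionalInt.of(b);
        }
        return OptionalInt.empty();
    }
}
```

The voter masks one randomly faulty channel, but if all three identical channels share the same design error they agree on the wrong answer, which is the “no protection against systemic faults” point.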
CSE 466 – Spring 2000 - Introduction - 8
Multi-Channel Protection
Heterogeneous Redundancy (Diversity)
  Protects against random and some systemic faults.
  Best if implementation teams are kept separated
  Space shuttle: five computers, 4 same, 1 different
[Diagram: Proc/SW 1 and Proc/SW 2 on a Voting Bus driving Brake and Engine Control]
CSE 466 – Spring 2000 - Introduction - 9
Design for Safety
1. Hazard Identification and Fault Analysis
2. Risk Assessment
3. Define Safety Measures
4. Create Safe Requirements
5. Implement Safety
6. Assure Safety Process
7. Test, Test, Test, Test, Test
CSE 466 – Spring 2000 - Introduction - 10
1. Hazard Identification – Ventilator Example
Hazard | Severity | Tolerance Time | Fault Example | Likelihood | Detection Time | Mechanism | Exposure Time
-------|----------|----------------|---------------|------------|----------------|-----------|--------------
Hypo-ventilation | Severe | 5 min | Vent fails | Rare | 30 sec | Indep. pressure sensor w/ alarm | 40 sec
 | | | Esophageal intubation | Medium | 30 sec | CO2 sensor alarm | 40 sec
 | | | User mis-attaches breathing hoses | Never | N/A | Different mechanical fittings for intake and exhaust | N/A
Over-pressurization | Severe | 0.05 sec | Release valve failure | Rare | 0.01 sec | Secondary valve opens | 0.01 sec
Callout: Human in Loop
CSE 466 – Spring 2000 - Introduction - 11
FMEA – Working Forward
Failure Mode: how a device can fail
  Battery: never voltage spike, only low voltage
  Valve: Stuck open? Stuck closed? Leaky?
  Motor Controller: Stuck fast, stuck slow?
  Hydrogen sensor: Will it be latent or mimic the presence of hydrogen?
  Thermistor: Looks hot or looks cold?
FMEA
  For each failure mode of each device, perform hazard analysis as in the previous flow chart
  Huge search space
CSE 466 – Spring 2000 - Introduction - 12
2. Risk Assessment
Determine how risky your system is
S: Extent of Damage
  S1: Slight injury
  S2: Single death
  S3: Several deaths
  S4: Catastrophe
E: Exposure Time
  E1: infrequent
  E2: continuous
G: Preventability
  G1: Possible
  G2: Impossible
W: Probability
  W1: low
  W2: medium
  W3: high
[Figure: TUV risk graph. A path through S1–S4, E1–E2, and G1–G2 selects a row; columns W3, W2, W1 give the risk level, from 1 to 8, with “-” where no level applies.]
CSE 466 – Spring 2000 - Introduction - 13
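Example Risk Assessment
Device | Hazard | Extent of Damage | Exposure Time | Hazard Prevention | Probability | TUV Risk Level
-------|--------|------------------|---------------|-------------------|-------------|---------------
Microwave Oven | Irradiation | S2 | E2 | G2 | W3 | 5
Pacemaker | Pacing too slowly; pacing too fast | S2 | E2 | G2 | W3 | 5
Power station burner control | Explosion | S3 | E1 | -- | W3 | 6
Airliner | Crash | S4 | E2 | G2 | W2 | 8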
CSE 466 – Spring 2000 - Introduction - 14
3. Define the Safety Measures
Obviation: Make it physically impossible (mechanical hookups, etc.).
Education: Educate users to prevent misuse or dangerous use.
Alarming: Inform the users/operators or higher-level automatic monitors of hazardous conditions.
Interlocks: Take steps to eliminate the hazard when conditions exist (shut off power, fuel supply, explode, etc.).
Restrict Access: High voltage sources should be in compartments that require tools to access, w/ proper labels.
Labeling
Consider
  Tolerance time
  Supervision of the system: constant, occasional, unattended. Airport people movers have to be designed to a much higher level of safety than attended trains, even if they both have fully automated control.
CSE 466 – Spring 2000 - Introduction - 15
4. Create Safe Requirements: Specifications
Document the safety functionality
  e.g. The system shall NOT pass more than 10mA through the ECG lead.
  Typically the use of NOT implies a much more general requirement about functionality…in ALL CASES
Create Safe Designs
  Start w/ a safe architecture
  Keep hazard/risk analysis up to date.
  Search for common mode failures
  Assign responsibility for safe design…hire a safety engineer.
  Design systems that check for latent faults
  Use safe design practices…this is very domain specific, we will talk about software
CSE 466 – Spring 2000 - Introduction - 16
5. Implement Safety – Safe Software
Language Features
  Type and Range Safe Systems
  Exception Handling
Re-use, Encapsulation
  Objects
  Operating Systems
  Protocols
Testing
  Regression Testing
  Exception Testing (Fault Seeding)
Nuts and Bolts
CSE 466 – Spring 2000 - Introduction - 17
Language Features
Type and Range Safe Systems: Pascal, Ada….Java?

    Program WontCompile1;
    type
      MySubRange = 10 .. 20;
      Day = (Mo, Tu, We, Th, Fr, Sa, Su);
    var
      MyVar: MySubRange;
      MyDate: Day;
    begin
      MyVar := 9;    {will not compile – range error}
      MyDate := 0;   {will not compile – wrong type}
    end.

True type safety also requires runtime checking.
  a[j] := b; what must be checked here to guarantee type safety?
  The range of j and the range of b – this takes a lot of time!
Overhead in time and code size. But safety may require this.
Does type-safe = safe? If no, then what good is a type safe system?
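As a sketch of what the runtime checks for `a[j] := b` cost, here is a hypothetical Java wrapper that checks both the index and the value range on every store. The class name and the inclusive lo..hi range are assumptions; Java already checks the index natively, so only the value-range test is extra work.

```java
import java.util.Arrays;

// Hypothetical range-checked array: every store pays for two comparisons
// on the index and two on the value, which is the overhead the slide mentions.
final class RangeCheckedArray {
    private final int[] data;
    private final int lo;   // smallest allowed element value, inclusive
    private final int hi;   // largest allowed element value, inclusive

    RangeCheckedArray(int size, int lo, int hi) {
        this.data = new int[size];
        this.lo = lo;
        this.hi = hi;
        Arrays.fill(data, lo);   // start with every element inside the range
    }

    /** Equivalent of a[j] := b with both checks made explicit. */
    void set(int j, int b) {
        if (j < 0 || j >= data.length) {
            throw new IndexOutOfBoundsException("index " + j);
        }
        if (b < lo || b > hi) {
            throw new IllegalArgumentException("value " + b + " outside " + lo + ".." + hi);
        }
        data[j] = b;
    }

    int get(int j) {
        return data[j];
    }
}
```

Passing these checks only shows that the value is legal, not that it is the right value, which is one reading of the “does type-safe = safe?” question.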
CSE 466 – Spring 2000 - Introduction - 18
Guidelines
Make it right before you make it fast
Verify during program execution
  Pre-condition invariants: things that must be true before you attempt to perform an operation.
  Post-condition invariants: things that must be true after an operation is performed, e.g.

    while (item != tail) {
        process(item);
        if (item.next == null) {
            throw new CorruptListException("Item " + item.id());
        } else {
            item = item.next;
        }
    }

  Callout: who should be responsible for this check?
Exception handling
  What should happen in the event of an exception?
CSE 466 – Spring 2000 - Introduction - 19
Exception Handling
It’s NOT okay to just let the system crash if some operation fails! You must, at least, get into safe mode.
With status codes, it is up to the designer to perform error checking on the values returned by f1 and f2.
  Easily put off, or ignored.
  Can’t distinguish error-handling code from functional code, and there is no guarantee that all errors are handled gracefully.

    a = f1(&b, &c);
    if (a) switch (a) {
        case 1: /* handle exception 1 */ break;
        case 2: /* handle exception 2 */ break;
        /* … */
    }
    b = f2(&e, &f);
    if (b) switch (b) {
        case 1: /* handle exception 1 */ break;
        case 2: /* handle exception 2 */ break;
        /* … */
    }
CSE 466 – Spring 2000 - Introduction - 20
Exception Handling in Java
    void myMethod() throws FatalException {
        try {
            a = x.f1(b, c);
            b = x.f2(e, f);
        } catch (IOException ex) {
            // recover and continue
        } catch (ArrayIndexOutOfBoundsException ex) {
            // not recoverable
            throw new FatalException("I'm Dead");
        } finally {
            // finish up and exit
        }
    }

Exceptions that are not handled will terminate the current procedure and raise the exception to the caller, and so on.
Exceptions are subclassed so that you can have very general or very specific exception handlers.
Callout: separates throwing exceptions, functional code, and exception handling
CSE 466 – Spring 2000 - Introduction - 21
Safety of Object Oriented SW
Strongly typed at compile time
Run time checking is not native, but can be built into class libraries for extensive modularization and re-use.
The class author can force the app to deal with exceptions by throwing them!

    class embeddedList extends embeddedObject {
        public void add(embeddedObject item) throws tooBigException {
            if (this.len() >= this.max())
                throw new tooBigException("List size too big");
            else
                addItem2List();
        }
    }

If you call embeddedList.add() you have three choices:
  Catch the exception and handle it.
  Catch the exception and map it into one of your exceptions by throwing an exception of a type declared in your own throws clause.
  Declare the exception in your throws clause and let the exception pass through your method (although you might have a finally clause that cleans up first).
Compiler will make you aware of any exceptions you forgot to consider!
When to use exceptions and when to use status codes or other means?
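The three choices for a caller of a method with a throws clause can be sketched as follows. Every class and method name here is hypothetical (the list class is a stand-in, not the embeddedList from the slide); only the three patterns themselves come from the text above.

```java
// Hypothetical checked exceptions for the sketch.
class TooBigException extends Exception {
    TooBigException(String message) { super(message); }
}

class AppException extends Exception {
    AppException(String message, Throwable cause) { super(message, cause); }
}

// Stand-in for a list whose author forces callers to consider overflow.
final class BoundedList {
    private final int max;
    private int len;

    BoundedList(int max) { this.max = max; }

    void add(Object item) throws TooBigException {
        if (len >= max) {
            throw new TooBigException("List size too big");
        }
        len++;   // storage of the item is omitted in this sketch
    }
}

final class Callers {
    /** Choice 1: catch the exception and handle it (here: report failure). */
    static boolean tryAdd(BoundedList list, Object item) {
        try {
            list.add(item);
            return true;
        } catch (TooBigException ex) {
            return false;
        }
    }

    /** Choice 2: catch it and map it into one of your own exception types. */
    static void addOrFail(BoundedList list, Object item) throws AppException {
        try {
            list.add(item);
        } catch (TooBigException ex) {
            throw new AppException("could not queue item", ex);
        }
    }

    /** Choice 3: declare it and let it pass through to your caller. */
    static void passThrough(BoundedList list, Object item) throws TooBigException {
        list.add(item);
    }
}
```

Removing the throws clause from `passThrough` or the try/catch from `tryAdd` makes the code fail to compile, which is how the compiler makes you aware of exceptions you forgot to consider.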
CSE 466 – Spring 2000 - Introduction - 22
More Language Features
Garbage collection
  What is this for?
  Is it good or bad for embedded systems?
Inheritance
  Means that type safe systems can still have functions that operate on generic objects.
  Means that we can re-use commonalities between objects.
Encapsulation
  Means that the creator of the data structure also gets to define how the data structure is accessed and used
  Means that the data structure can change without changing the users of the data structure (is the queue an array or a linked list…who cares!)
Re-use
  Use trusted systems that have been thoroughly tested: OS, networking, etc.
Next Friday…how would Java be mapped to an embedded processor…say C++ to C51. What restrictions would you need to support that?
CSE 466 – Spring 2000 - Introduction - 23
Testing
Regression Test
Fault Seeding
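A small sketch of fault seeding, read here as deliberately injecting a fault to confirm that the exception path actually fires. It reuses the corrupt-list check from the Guidelines slide; the `Node` and `ListWalker` classes and the unchecked exception type are assumptions for illustration.

```java
// Hypothetical singly linked node.
final class Node {
    final int id;
    Node next;

    Node(int id) { this.id = id; }
}

// Unchecked here only to keep the sketch short.
class CorruptListException extends RuntimeException {
    CorruptListException(String message) { super(message); }
}

final class ListWalker {
    /** Walks from head up to (not including) tail and counts the items visited. */
    static int walk(Node head, Node tail) {
        int visited = 0;
        Node item = head;
        while (item != tail) {
            visited++;
            if (item.next == null) {
                // The chain ended before reaching tail: the list is corrupt.
                throw new CorruptListException("Item " + item.id);
            }
            item = item.next;
        }
        return visited;
    }
}
```

A regression test would first walk an intact list, then seed the fault by cutting a link (setting a `next` field to null) and check that `CorruptListException` is thrown instead of a crash or an endless loop.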
CSE 466 – Spring 2000 - Introduction - 24
Safe Design Process
Mainly, the hazard/risk/FMEA analysis is a process, not an event!
How you do things is as important as what you do.
Standards for specification, documentation, design, review, and test
ISO9000 defines quality process…one quality level is stable and predictable.
CSE 466 – Spring 2000 - Introduction - 25
Next Week
No Monday
Wednesday: PCB Layout Contest
  maybe something on UML/Java?
Friday
  Embedded Java
  UML Example for Engine Controller
  Demo: Air Trombone
  Demo: Talk Application
  Demo: Hi Fidelity
  Outcome of the PCB contest