8/8/00 1
The Failure of a Small Satellite and the Loss of a Space Science Mission
R. Katz
National Aeronautics and Space Administration
Electrical Systems Center
Goddard Space Flight Center
8/8/00 2
Overview
• Background and Introduction
• How did the mission* fail?
• Why did the mission fail?
* SMEX/WIRE: Small Explorer (SMEX) / Wide-Field Infrared Explorer (WIRE)
8/8/00 3
"rk"
• Experience: JPL, NASA GSFC
• Design Engineer, Electrical
– Galileo, Magellan, Cassini, ISTP, SIRTF, MGS, SMEX, etc.
• Research and Technology Development
– Logic, FPGAs, Radiation, Design Techniques
• Reviews, Failure Investigations
– Cassini, HST, EOS-AM, AXAF, HETE-2, SIRTF, etc.
– Small Explorer WIRE
8/8/00 4
Failure Examples (Simplified)
Mars Climate Orbiter: Units
Mars Polar Lander: 1 Line of Missing Software
Ariane V/501: Operand Error, Unprotected
Sea Launch: Ground S/W Logic; Valve Config
Intelsat VI: “Two wires crossed”
Terriers: Inverted Sign
IUS 21: Tape/Thermal Wrap
Titan IV: Data Entry Error
SMEX/WIRE: 1 Wire, Disable Buffer
8/8/00 5
Payload/Launcher Success Rates
[Chart: success rate (%), axis 84 to 98, vs. year, 1990 to 1999; series: Payload, Payload Fit, Launch Vehicle, Launcher Fit]
8/8/00 6
1999 Payload Failures
1. WIRE (NASA)
2. TERRIERS (Boston University/AeroAstro)
3. Abrixas (Germany)
4. SACI 1 (Brazil)
All Small Scientific Satellites
8/8/00 7
Small Explorer (SMEX) Program
Spacecraft Mass (kg) Launch Date
Galileo 2,562 1989
SMEX 150-300 1992-1999
SMEX/WIRE 250 1999
UoSAT-12 325 1999
SNAP-1 7 2000
8/8/00 8
Wide-Field Infrared Explorer: Programmatic
PI: JPL
Spacecraft: NASA Goddard Space Flight Center
Instrument: Utah State University - SDL
Launch: Orbital Sciences Corp. - Pegasus XL
Cost: $75 million
Duration: 4 Months
8/8/00 9
Wide-Field Infrared Explorer: Technical
Objective: Deep Infrared, Extragalactic Survey
Detectors: Two 128 x 128 Si:As Arrays
Telescope: 30 cm Cassegrain
Cryostat: Solid Hydrogen; Dual Stage 7 K/12 K.
Orbit: 540 km
8/8/00 10
Logic System Overview
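[Block diagram labels: Spacecraft: SPE, SCS, +28V, ARM, FIRE; Pyro Box: LM117 regulator (+5VDC); 200 kHz crystal oscillator; power-on reset (R, C, 4093B) producing a POR pulse; A1020 FPGA with 200 kHz, POR, ARM, and FIRE inputs; Relay/FET switches driving the PYRO]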
8/8/00 11
WIRE Spacecraft
[Figure: WIRE spacecraft, labeled: Composite Spacecraft, Star Tracker, Modular Solar Array, Aperture Shade]
8/8/00 12
The WIRE Mission
March 4th: Launch, Vandenberg Air Force Base/L-1011
T+9 min: Separation Nominal
T+29 min: Antarctica Pass - Vent Command Transmitted
T+79 min: NORAD Tracks 3 Objects, Including Cover
T+99 min: Alaska Pass - Tumbling*
T+36 Hrs: Cryogen Supply Exhausted
March 8th: Mission Declared Lost
* Eventually Spun up to 60 rpm
8/8/00 13
Loss of Control - Telemetry
8/8/00 14
Root Cause of Failure (1)
The root cause of a failure is the mechanism that directly caused the mishap.
Significant contributing causes include events or conditions that could have been used to identify this condition as the phenomenon has been understood.
Contributing factors are other events or conditions that might have prevented the mishap and that should have been handled significantly better.
8/8/00 15
Root Cause of Failure (2)
The root cause of the WIRE mission loss is a digital logic design error in the instrument pyro electronics box.
The transient performance of components was not adequately accounted for in its design.
The failure was caused by two distinct mechanisms that, either singly or in concert, resulted in inadvertent pyrotechnic device firing during the initial pyro box power-up.
8/8/00 16
Requirements for Failure
• Design Error (2)
• Errors Not Caught In:
– Analysis
– Simulation
– Design Reviews
– Box Level Tests
– Instrument Level Tests
– Spacecraft Integration Tests
– Spacecraft Systems Tests
– Final Reviews
8/8/00 17
SMEX/WIRE System
8/8/00 18
Why Did WIRE “Spin Up?”
• Zero Thrust Vent - a “T.”
• Vent Located To Minimize Pressure (Temperature).
• One Side of “T” Pointed At Connector.
• No Analysis of Exit Design During a Worst-Case Venting Scenario.
• ACS Could Not Overcome Force
• Spun Up To 60 RPM
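The "ACS could not overcome force" point reduces to a rigid-body balance: if the unbalanced vent torque exceeds the opposing torque the ACS can apply, the net torque keeps adding angular momentum and the spin rate grows. The sketch below assumes a single rigid axis, constant torques, and a start from rest; every numeric value in it is a hypothetical placeholder, not WIRE data.

```python
import math


def spin_rate_rpm(vent_torque_nm: float, acs_torque_nm: float,
                  inertia_kg_m2: float, duration_s: float) -> float:
    """Spin rate after duration_s under a constant net torque (simplified model).

    Assumptions (hypothetical, not WIRE values): one rigid axis, constant
    vent torque, ACS applying its maximum opposing torque, starting at rest.
    """
    net_torque = vent_torque_nm - acs_torque_nm
    if net_torque <= 0.0:
        # ACS authority is sufficient: no net spin-up in this model.
        return 0.0
    omega_rad_s = net_torque * duration_s / inertia_kg_m2  # omega = tau * t / I
    return omega_rad_s * 60.0 / (2.0 * math.pi)            # rad/s -> rpm


# Hypothetical inputs, chosen only to illustrate the arithmetic.
print(round(spin_rate_rpm(0.01, 0.002, 10.0, 600.0), 2))
```

In this model the spin rate keeps rising for as long as venting continues with a positive net torque, which is why ACS sizing appears later as a trade-off.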
8/8/00 19
"System" Perspective
[Diagram labels: Spacecraft: Spacecraft Power Electronics, Spacecraft Computer System (80386/387), +28V BUS, Pyros; Instrument: PYRO BOX with +28V, ARM, and FIRE inputs; "PYRO Subsystem": Pyros, Cover, Vent]
A 4th level of protection was an arming plug.
8/8/00 20
Basic Pyro Characteristics
• NASA Standard Initiator, Type 1 (NSI-1)
• No-Fire: 1 Amp and 1 Watt for 5 minutes
• Bridgewire Impedance: ~1 Ω
• Fire Time: ~ 1 ms @ 5 amps
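The no-fire and fire figures are related through the power dissipated in the bridgewire, P = I²R. A small sketch, assuming R ≈ 1 Ω as on the slide; the helper names are illustrative only, and the duration part of the rating is deliberately ignored.

```python
def bridgewire_power_w(current_a: float, resistance_ohm: float = 1.0) -> float:
    """Power dissipated in the bridgewire: P = I^2 * R (R ~ 1 ohm assumed)."""
    return current_a ** 2 * resistance_ohm


def within_no_fire_rating(current_a: float, resistance_ohm: float = 1.0) -> bool:
    """True if the current stays inside the 1 A / 1 W no-fire rating.

    The rating is stated for 5 minutes; this simplified check ignores duration.
    """
    return current_a <= 1.0 and bridgewire_power_w(current_a, resistance_ohm) <= 1.0


print(bridgewire_power_w(1.0))  # 1.0 W at the no-fire current
print(bridgewire_power_w(5.0))  # 25.0 W at the nominal fire current (~1 ms)
```

At ~1 Ω the 5 A fire current dissipates about 25 times the no-fire power, so a driver that closes for even a few milliseconds is enough to fire the device.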
8/8/00 21
"Pyro Box" Perspective
[Diagram: +28V power and the Arm and Fire logic signals enter the Instrument Pyro Box, which drives multiple pyro functions, including the Vent Cover]
• Pyro Box is powered off during launch
• Logic functions: Pulse forming, Timing, Lockouts, Filtering
• FPGA - Complex
– FSM
– Counters
8/8/00 22
Voltage Regulation
8/8/00 23
Regulator Circuit
[Schematic: +28V IN to +5V OUT, with 15 µF and 0.1 µF capacitors]
8/8/00 24
EM Regulator Performance
[Oscilloscope trace: +28V input and +5 VDC output, 5 ms/Div]
8/8/00 25
Logic Design (1)
Reset Circuitry and Crystal Clock Oscillator
8/8/00 26
Flight Oscillator on System Board
8/8/00 27
Crystal Oscillator Characteristics
It is known that crystal oscillators do not start immediately with the application of power. From Horowitz and Hill's The Art of Electronics, 2nd Edition:
... However, because of its high-resonant Q, a crystal oscillator cannot start up instantaneously, and an oscillator in the megahertz range typically takes 5-20 ms to start up; a 32 kHz oscillator can take up to a second (Q = 10⁵). ...
Startup time for oscillators is sometimes not included in the specification.
- The SMEX/WIRE Class S screening specification did not include a startup time limit.
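The quoted startup times follow from the resonator's high Q. For a resonator with quality factor Q at frequency f, the amplitude changes with a time constant of roughly τ = 2Q/ω = Q/(πf), and startup takes on the order of a few τ. This is only an order-of-magnitude sketch under that assumption: it ignores the drive circuit's excess gain, which can shorten startup considerably, and it does not predict the WIRE 200 kHz oscillator's behavior, which depended on supply rise time.

```python
import math


def amplitude_time_constant_s(freq_hz: float, q: float) -> float:
    """Resonator amplitude time constant: tau = 2Q / omega = Q / (pi * f)."""
    return q / (math.pi * freq_hz)


# 32 kHz watch crystal with Q = 1e5, the case quoted from Horowitz and Hill.
tau = amplitude_time_constant_s(32_768.0, 1e5)
print(f"tau = {tau:.2f} s")
```

A τ near one second for the 32 kHz, Q = 10⁵ case agrees in order of magnitude with "up to a second"; it also shows why a startup limit belongs in the part specification rather than being assumed.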
8/8/00 28
Example Oscillator Start Time
[Oscilloscope trace: +5 VDC supply and 200 kHz oscillator output, 1 ms/Div]
Power Supply Rise Time = 1 ms for this example
8/8/00 29
Summary of Oscillator Start Times
[Plot: SMEX WIRE Oscillator Startup Time Test, T = 10 °C; start time from power supply at startup (msec, log scale, 1 to 1000) vs. power supply rise time (msec, measured from 10% to 90%, 0 to 200)]
8/8/00 30
Summary of Oscillator Start Times
[Plot: SMEX WIRE Oscillator Startup Time Test, T = 10 °C; start time from power supply at startup (msec, linear scale, 0 to 250) vs. power supply rise time (msec, measured from 10% to 90%, 0 to 200)]
8/8/00 31
Oscillator Startup on WIRE EM
[Oscilloscope trace: +28V, +5V, and 200 kHz oscillator output, 5 ms/Div; annotated interval: 23 ms]
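The reset and oscillator slides together imply a timing check: a reset that clocked flip-flops must sample only takes effect if enough clock edges arrive while the POR pulse is still asserted. A minimal sketch, assuming power and POR assert together at t = 0, the clock is clean once the oscillator starts, and the reset needs `edges_needed` edges (a hypothetical parameter). It reads the 23 ms annotation above as the oscillator start delay. The POR widths in the example are hypothetical.

```python
import math


def edges_during_reset(por_width_s: float, osc_start_s: float, f_clk_hz: float) -> int:
    """Full clock periods that fit between oscillator start and POR release."""
    window_s = por_width_s - osc_start_s
    if window_s <= 0.0:
        return 0  # POR released before the clock ever ran
    return math.floor(window_s * f_clk_hz)


def reset_is_effective(por_width_s: float, osc_start_s: float, f_clk_hz: float,
                       edges_needed: int = 3) -> bool:
    """edges_needed is a hypothetical value, not taken from the WIRE design."""
    return edges_during_reset(por_width_s, osc_start_s, f_clk_hz) >= edges_needed


OSC_START_S = 23e-3  # assumed meaning of the 23 ms annotation on the EM trace
F_CLK_HZ = 200e3     # 200 kHz oscillator

for por_width_s in (10e-3, 50e-3):  # hypothetical POR pulse widths
    print(por_width_s, reset_is_effective(por_width_s, OSC_START_S, F_CLK_HZ))
```

If the POR pulse expires before the oscillator starts, the flip-flops keep whatever state they powered up in, which is the case the random power-up analysis addresses.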
8/8/00 32
Logic Analysis: Assuming Random Power-Up Of Flip-Flops
• Reset Flip-Flops
– 3 Flip-Flops
– At Least One Must Be A “0” To Be Safe
– 7 Chances In 8
• ARMCNT Block
– 14 Flip-Flops
– All Must Be A “0” To Be Safe
– One Chance In 16,384
• TIMECNT Block
– 8 Flip-Flops
– All Must Be A “0” To Be Safe
– One Chance In 256
Note: Two Sides, P(Failure) ~ 25%
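The counts on this slide follow from treating each flip-flop as an independent, equally likely 0 or 1 at power-up. The sketch below reproduces them exactly with fractions and combines them under two labeled assumptions: that a side is unsafe when the reset flip-flops are unsafe and the counters are not both all-zero, and that the two sides power up independently with either one able to fire.

```python
from fractions import Fraction
from itertools import product


def p_all_ones(n_ffs: int) -> Fraction:
    """Reset FFs are unsafe when none powers up as 0 (exhaustive enumeration)."""
    states = list(product((0, 1), repeat=n_ffs))
    unsafe = sum(1 for s in states if 0 not in s)
    return Fraction(unsafe, len(states))


def p_all_zeros(n_ffs: int) -> Fraction:
    """A counter block is safe only if every FF powers up as 0."""
    return Fraction(1, 2 ** n_ffs)


def p_side_unsafe() -> Fraction:
    # ASSUMPTION: a side is unsafe when the reset FFs are unsafe AND the
    # ARMCNT (14 FFs) and TIMECNT (8 FFs) blocks are not both all-zero.
    counters_safe = p_all_zeros(14) * p_all_zeros(8)
    return p_all_ones(3) * (1 - counters_safe)


def p_failure(sides: int = 2) -> float:
    # ASSUMPTION: sides power up independently; either one can fire the pyro.
    return float(1 - (1 - p_side_unsafe()) ** sides)


print(p_all_ones(3), p_all_zeros(14), p_all_zeros(8))  # 1/8 1/16384 1/256
print(f"{p_failure(2):.3f}")
```

The per-side result is dominated by the 1-in-8 reset term. For two sides the sketch gives about 23%, in line with the slide's ~25%, which also matches the simpler estimate 2 × 1/8.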
8/8/00 33
Logic Design (2)
FPGA Transient Behavior
8/8/00 34
FPGA and Drivers
[Diagram labels: A1020 FPGA; +5VDC; 200 kHz; POR; ARM; FIRE; Relay/FET switches; +28 VDC; PYRO]
8/8/00 35
FPGA Implementation: Charge Pump And Isolation FETs
[Diagram labels: Charge Pump; HV Isolation FETs; Module Input; Module Output; Antifuse]
8/8/00 36
A1020 Output Transient: Overview
• Documented In Actel App Notes; EEE Links, WWW Site
• Not Documented In Data Sheet
• I/O May Power-up Uncontrolled
– Inputs May Source Current
– Outputs May Be Invalid
– Truth Tables Not Followed
• Device Architecture
– Requires HV Isolation FETs ON
– Charge Pump Needs Time To Start, Bias HV FETs
8/8/00 37
Output Transient - Investigation
• Flight Pattern Obtained From SDL
• Devices Programmed For Bench Test
– A1020B’s (3)
– Non-flight A1020 (1)
– Flight A1020 (2)
• Transients Observed On Critical Outputs
• Critical Outputs May Be Latched High
8/8/00 38
A1020 Sample Transient
[Oscilloscope trace: VCC, Arm, and Cover outputs, 5 ms/Div]
Device Had Been Powered Off For 2 Days
8/8/00 39
A1020 FPGA Output Transient Summary
• Longer power supply rise times:
– Increase the probability of the transient
– Increase the size of the transient
• Quick power cycles tend to eliminate transients
• Long power-off times tend to increase the chance of a transient (memory effect)
Now it was known how to test the Engineering Model.
8/8/00 40
Failure Demonstration on EM
[Oscilloscope trace: A Side Power Input, 5 A/Div; annotated intervals: 13.5 msec and 1.6 msec]
8/8/00 41
Instrument Level Testing
Fidelity of Spacecraft Power Electronics (SPE) Simulation
8/8/00 42
Relay Operating Characteristics
WIRE Failure Analysis: Relay Operate Time, Flight Spare S/N 001
NASA GSFC, May 19, 1999
[Plot: operate time (msec), 0 to 140, vs. coil voltage (volts), 11 to 18]
Notes:
1. Pulse width = 800 msec
2. Neither of the two relays would operate at 11V
8/8/00 43
Instrument Level Testing
[Oscilloscope trace: +28V Bench Power Supply, 50 ms/Div, 10V/Div; annotations: Logic Begins To Function, Relay Starts To Operate, Relay Closes]
8/8/00 44
Spacecraft Level Testing
Fidelity of Pyrotechnic Simulation
8/8/00 45
EED Simulator - Input Stage
Easy To “Trip”
Low-Impedance Switched In After Delay
8/8/00 46
EED Simulator - Delay
[Oscilloscope trace: +5VDC at 2V/Div and CURRENT at 1 A/Div, 10 ms/Div; annotated delay: 23 ms]
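The fidelity issue can be framed as a comparison of two time scales from the slides: a real NSI-1 fires in about 1 ms at 5 A, while this simulator switched in its low-impedance load only after about 23 ms. The model below is a deliberate simplification (an assumption, not the simulator's actual circuit): a pulse counts as a real fire at ≥ 5 A for ≥ ~1 ms, and the simulator presents its load only if the pulse is still present at switch-in. The 2 ms pulse is hypothetical.

```python
NSI_FIRE_TIME_S = 1e-3   # ~1 ms @ 5 amps (Basic Pyro Characteristics slide)
SIM_SWITCH_IN_S = 23e-3  # low-impedance load switched in after ~23 ms


def real_nsi_fires(pulse_s: float, current_a: float) -> bool:
    # Simplification: >= 5 A for >= ~1 ms is treated as a fire.
    return current_a >= 5.0 and pulse_s >= NSI_FIRE_TIME_S


def simulator_loads(pulse_s: float) -> bool:
    # Simplification: the low-impedance load appears only if the pulse
    # is still present when the delayed switch-in occurs.
    return pulse_s >= SIM_SWITCH_IN_S


def masked_by_simulator(pulse_s: float, current_a: float) -> bool:
    """True if a pulse would fire a real NSI but never sees the simulator's load."""
    return real_nsi_fires(pulse_s, current_a) and not simulator_loads(pulse_s)


print(masked_by_simulator(2e-3, 5.0))  # hypothetical 2 ms, 5 A pulse
```

Any pulse between roughly 1 ms and 23 ms falls in the gap. In this model the simulator does not reproduce the current a real initiator would draw, though its sensitive input stage could still register a trip.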
8/8/00 47
Spacecraft Level Testing
Problem Reporting and Analysis
8/8/00 48
Reporting Mechanism Not Used
• Simulator Box Tripped In System Level Tests
• Procedure Was To Reset The Simulator
– Dispositioned "OK" By Similarity to Previous Mission With Different Hardware Set
– Not Troubleshot in Depth
– Design Engineer Not Involved
• No Failure Report Written
– Eliminated Reviews of Failure Report
8/8/00 49
Conclusions and Points for Discussion
8/8/00 50
Reviews
• Single System Review
• Pyro Box Not Ready For Review
– Never Reviewed: “Fell Through The Cracks”
• Would Reviews Have Prevented Mission Loss?
– SDL Engineers Not Familiar With Startup Transient In A1020 Device
– Neither Was The Local Actel FAE
– Customer Review Board Members?
Makeup Of Review Teams And Depth of Reviews Are Critical
8/8/00 51
Simulation
• Simulation Is A Valuable Tool
• Simulation And Analysis Work On Models Of Hardware
• Simulation Models Are Not 100% Accurate.
– Frequently Poor For Transient Conditions, Like Startup, In Digital Circuits
Logic Simulation (Machine) Can Not Replace Analysis (Human)
8/8/00 52
Testing
• Fidelity of Test Equipment Critical
• “Test As You Fly; Fly As You Test”
– End-to-End Testing with Realistic Timelines
• Qualification By Test Limited
– Reliability Can Not Be "Tested Into" A System
• Process For Failure Reporting And Disposition
– Real-Time Disposition Without Full Documentation and Proper Analysis Is Obviously Risky
8/8/00 53
Fault Tolerance
• Designers Concerned About Getting The Cover Off, Not Keeping It On.
• Analysis Of Worst-Case Situations
– Worst-Case Venting Scenarios for WIRE Not Analyzed.
• Sizing of Attitude Control System
Trade-offs of Risk vs. Cost/Size vs. Performance
8/8/00 54
Complexity
• Design More Complex Than Required.
• Extra Protection Features.
• Redundancy Doubled Probability of Failure.
– Parallel Reliability Model For WIRE Architecture
• WIRE Designers' Comments on Critical Function Architectures
– “KISS” principle
• Analyze requirements
• Develop several approaches to meet same
• Analyze from different perspectives
• Pick the simplest one, all other things being equal
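Whether redundancy helps or hurts depends on which failure mode is counted. A minimal sketch, assuming independent, identical sides: a failure to function requires every side to fail (qⁿ), while an inadvertent function needs only one side to act (1 - (1 - p)ⁿ). The per-side probabilities are hypothetical.

```python
def p_fail_to_function(q_side: float, n_sides: int) -> float:
    """All n independent sides must fail for the function to be lost."""
    return q_side ** n_sides


def p_inadvertent_function(p_side: float, n_sides: int) -> float:
    """Any one of n independent sides acting on its own causes the event."""
    return 1.0 - (1.0 - p_side) ** n_sides


# Hypothetical per-side probabilities, for illustration only.
for n in (1, 2):
    print(n, p_fail_to_function(0.01, n), round(p_inadvertent_function(0.1, n), 4))
```

Adding a second side shrinks the chance of failing to remove the cover, but nearly doubles the chance of an inadvertent firing. That is the parallel reliability effect the slide attributes to the WIRE architecture.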
8/8/00 55
Mission Outcome As A Function of Complexity and Budget
WIRE
8/8/00 56
Mission Outcome As A Function of Complexity and Schedule
WIRE
8/8/00 57
Additional Reading and References
• “WIRE Mishap Investigation Board Report,” Darrell R. Branscome, Chairman, NASA Headquarters, June 8, 1999.
• “Small Explorer WIRE Failure Investigation Report,” Richard B. Katz, NASA Goddard Space Flight Center, May 29, 1999.
• “Startup Design and Analysis Note,” Richard B. Katz, NASA Goddard Space Flight Center, May 12, 1999.
• “Start up Application Concerns with Actel Corp. Field Programmable Gate Arrays (FPGAs),” NASA Parts Advisory NA-046, May 27, 1999.
• “Why Space Mishaps Are On The Rise,” Marco Caceres, AIAA Aerospace America, July 2000, pp. 18-20.
• “Use of FPGA's in Critical Space Flight Applications - A Hard Lesson,” W. Gibbons and H. Ames, Utah State University, Mil/Aero Applications of Programmable Logic Devices International Conference, 1999.
• Failure Reports For Various Missions Collected at: http://rk.gsfc.nasa.gov/reports.htm
• “Aerospace Corp. Study Shows Limits of Faster-Better-Cheaper,” Michael A. Dornheim, Aviation Week & Space Technology, June 12, 2000, pp. 47-49.
• “Recovery of the Wide-Field Infrared Explorer Spacecraft,” D. Everett, T. Correll, S. Schick, and K. Brown, 14th Annual AIAA/USU Conference on Small Satellites, 2000.