TRANSCRIPT
Presented by Nathanael Paul
September 25, 2002
Y2K: Perrow’s Normal Accident Theory (NAT) Tested and “Normal Accidents - Yesterday and Today”
Some questions…
What is the most code you have ever written? Largest project (number of lines of code) that you have ever worked on?
Y2K – the ultimate system failure Were you an optimist or a pessimist?
Society and Systems after 1984
First non-negotiable deadline. 180 billion lines of code need inspecting.
Social Security: 30 million lines to fix. After 400 people and 5 years, only 6 million fixed.
“organizations are always screwing up… uncertainty drives system accidents, and this is a hallmark of Y2K”
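The scale of the inspection problem came from code like the following; a minimal sketch (not from the talk) of the classic two-digit-year arithmetic that had to be found and fixed:

```python
# Minimal sketch of the two-digit-year bug: many legacy programs stored
# years as two digits to save space, so date arithmetic silently broke
# when "00" (meaning 2000) compared as less than "99" (meaning 1999).

def years_between(yy_start, yy_end):
    """Naive difference of two-digit years, as legacy code often computed it."""
    return yy_end - yy_start

# Before the rollover this looks fine:
print(years_between(95, 99))  # 4

# At the rollover, 2000 is stored as 0, so the result goes negative:
print(years_between(99, 0))   # -99: an account opened in 1999 is "-99 years old"
```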
Failures and Social Incoherence
Interdependency
Slight inconvenience or isolated hardships Not as interconnected Handling of problem was better than at first thought
Tight coupling One way of doing things w/o too much slack
“web of connections” Closest analogy to software yet… Alternative paths Testing
Optimists and Pessimists
Pessimists Biblical Apocalypse Computer and Financial experts
Optimists Industrial trade groups, government, and companies Late in their response
Key points to both sides
P: Everything is linked, so everything is “mission critical” Hard to prioritize
O: Experience with failures before will see us through
O: Testing results not announced b/c of liability P: Accident becomes a catastrophic disaster
(multiple failures with coupled single systems)
The Chip
Chips can’t be reprogrammed 7 billion programmable microcontrollers in ‘96
Air Traffic Control’s problems with mainframes. People locked in a car plant, prisoners let loose. “We know something about unexpected interactions and are more prepared to look for them than ever before.”
“The Butterfly Effect”
Electric Power
Society’s lifeblood. Complex interconnected “grids”. 1998: most of North America’s electric power companies (75%) were still in the awareness/assessment phase (same findings in Jan. 1999).
“Just in time production” Nuclear facilities not “expecting” problems
Lack of Interest
Jan. 1998 at the industry’s premier tech conference: no sessions, no meetings on Y2K. One presentation was scheduled; people were angry, insisting Y2K was a hoax and the presenter a profiteering consultant.
March 1998 at the 3rd annual industry-wide meeting on Y2K
70 of 8,000 companies were there
One summer’s power meeting canceled b/c of lack of interest
More on Power
Interconnectivity No telecom, no power. No power, no telecom. Available fuel supply and delivery
No service obligations to provide base load power to bulk power entities
Gov’t intervention not wanted. Merge, but no fix
And last, but not least… Nuclear Power
Jan. 1999: Only 31 percent ready Harder to fix?
Not expecting problems. Hard to test all components. If not ready by 2000, shutdown. Provided 25% of power (40% in the Northeast).
Y2K going wrong… We give up.
Y2K compliance vs. Y2K readiness “The Domino Effect”
Banking Shipping Farming and hybrid seeds
Just show them the software warranty You’re probably not liable anyway
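One reason “compliance” and “readiness” diverged was how dates were actually remediated. A minimal sketch of “windowing”, a widely used fix that reinterprets two-digit years around a pivot rather than widening every stored field (the pivot value 50 here is an illustrative assumption, not a standard):

```python
# "Windowing" remediation sketch: instead of converting every two-digit
# field to four digits, interpret stored values relative to a pivot year.
# PIVOT = 50 is an assumed cutoff for illustration.

PIVOT = 50

def expand_year(yy):
    """Map a two-digit year to a four-digit year using a fixed window."""
    return 1900 + yy if yy >= PIVOT else 2000 + yy

print(expand_year(99))  # 1999
print(expand_year(0))   # 2000
print(expand_year(49))  # 2049
```

The fix was cheap but only deferred the problem: dates outside the chosen window (here, before 1950 or after 2049) are still misinterpreted.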
Conclusions about NAT and accidents today
Which of the characteristics of NAT does software normally exhibit? Tight/loose coupling? Interdependency? Linear/Complex? “web of connections”
What has been done in the past to help in reducing “accidents” of software? (reduction of tight coupling/complexity/interdependencies)
Let’s see what Strauch has to say…
Other Accidents and Views
Challenger and Chernobyl Do these accidents support NAT?
Operator Error Someone to blame? Justified blame?
Chemical refineries, nuclear power, and commercial aviation have all seen drops in accidents, or in the types of accidents Perrow described.
Can Perrow’s assertion that more system accidents would happen be justified?
1995 crash of American Airlines Boeing 757 in Cali, Colombia
Saving time and expense by landing to the south (Miami to Cali)
Many tasks performed to get ready for approach. Approach named after the final approach fix (unusual; not named after the initial approach). Initial approach beacon, Tulua, deleted from approach data. Flew to the final approach (not the initial).
Factors involved in Cali crash
“Hide the results of operator actions from operators themselves”
Navigational database design. Abbreviations used and instrument approach procedures. Nepal 1992 accident, very similar to the Cali crash: the lesson was not learned.
Accident Frequency since ‘84
Depends on country and particular system Perrow’s assertions affected by:
Industry variables Cultural variables Hindsight of his work in helping others And…
Technology
Airbus Industrie (A-320): introduced new technology, with time to familiarize. Fatality rate went from high to lower as time went on.
Training: better able to emulate the real system. Focus on what people need to see in training. Training-related accidents all but eliminated. Operator error reduced through training? Was Perrow still right?
Aviation Technology
CFIT: Ground Proximity Warning Systems (GPWS) not good at high speeds (the Cali crash is a good example). Terrain Awareness and Warning System (TAWS): no TAWS-equipped aircraft in a CFIT crash, yet…
In-flight collision of 2 aircraft: Traffic Alert and Collision Avoidance System (TCAS). No 2 planes with TCAS in a collision, yet…
Organizational Accidents – Organizational Features in system safety
ValuJet ’96 crash of a DC-9. Canisters of chemical oxygen generators. Non-traditional contracting out of work. Maintenance personnel were rushed to work on 2 aircraft to meet deadlines. Canisters not labeled correctly. Warehousing personnel returned the canisters to their rightful owner.
What Happened?
Cost reductions over safety. Regulation (FAA) failed where the accident might have been prevented. Enron.
Learning from our mistakes
Something done before 1984… Shortcomings of navigational databases addressed (Cali accident). FAA operational oversight addressed (ValuJet). Financial system deficiencies addressed (Enron). Rejected takeoffs decreased after better training. What about the Exxon Valdez oil spill (the vessel’s master and alcohol)?
Doomed to repeat, if there is no change
Airplane flaps and slats: ’87 Northwest Airlines crash in Detroit (better training and aural warnings). Dallas-Ft. Worth crash b/c of flaps and slats (made sure this didn’t happen again).
Concorde: engine could ingest tire debris, and the tires sat in front of the engine. Nothing done until the 2000 Paris accident (problem cited much earlier by engineers).
Conclusions
Was NAT successful? Why? “Features” can create deficiencies Are systems any more comprehensible? Operator error vs. Design Error Why the reduction in system accidents?
Have we truly stopped accusing the operator and started looking at the systems?
Technology Did it help or hurt more in system accidents?