Why Is the FDA So Interested in Software Process?
Relearning the lessons of history
Brian Shoemaker, ShoeBar Associates
and Christine Shoemaker, Perceptive Informatics
The Team
FDA - Software Process
• In FDA-regulated settings, software people often find validation their biggest headache.
• We begin to understand when we see the FDA’s own history of tragedy and response.
• The Therac-25 story warns us what happens when we assume the software is infallible.
• A study of medical-device SW failures found lack of process steps a common root cause.
• Safety requires a process aimed at preventing errors, not removing them.
Any Big Pharma Projects Lately?
• Development documents galore!
    • User Requirements
    • Detailed Design
    • Traceability Matrix
• Detailed, scripted manual tests
• Documented testing at multiple phases
    • Unit testing
    • System testing
    • User Acceptance Test (a.k.a. “UAT”)
Pharma Projects: Developer’s Nightmare
Any Big Pharma Projects Lately? (cont.)
• Related process documents
    • Quality Assurance Plan
    • Configuration Management Plan
    • Problem Handling Guideline
    • Change Control Procedure
• Can anyone define IQ / OQ / PQ?
• Why all this bureaucracy anyway?
FDA – A Brief History
• 1862 – Department of Agriculture created with a Bureau of Chemistry
• 1906 – Upton Sinclair's The Jungle published (filthy conditions in Chicago meat-packing plants); Food and Drug Act of 1906 passed
• 1931 – Food and Drug Administration formed
• 1937 – Sulfanilamide Incident
    • For strep infections; liquid form developed using diethylene glycol (antifreeze)
    • 107 people, mostly children, died
Regulation: Tragedies and Responses
FDA – A Brief History (2)
• 1938 – Federal Food, Drug, and Cosmetic Act passed
    • Required scientific proof of safety before new drugs could be marketed
    • Outlawed or tightly regulated addition of poisonous substances to food
    • Created authority for factory inspections
    • Permitted Federal court injunctions against violators
• 1941 – Supreme Court ruling, safety violations
    • Established that FDA can hold individuals, as well as companies, responsible for violations
• 1939 – 1945: World War II
• 1947 – Nuremberg Code
    • From trials of Nazi war crimes; first ethical code concerning medical research on humans
FDA – A Brief History (3)
• 1962 – Thalidomide incident
    • Used for sleep disorders and morning sickness; western Europe – thousands of birth defects
• 1962 – Kefauver-Harris Drug Amendments
    • First time drug manufacturers required to prove effectiveness before marketing
    • Required informing patients if they are taking experimental drugs
    • Required reporting any adverse effects found during clinical trials
• 1964 – Declaration of Helsinki
    • Adopted by World Medical Assembly; outlines basic principles of human experimentation
FDA – A Brief History (4)
• 1972 – Exposé of Tuskegee Study
    • Studied effects of syphilis in black men; researchers did not provide penicillin even though they knew it would cure the men
• 1974 – National Research Act
    • Established National Commission for the Protection of Human Subjects
    • Required IRB review of all clinical research
• 1979 – Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research
FDA – A Brief History (5)
• 1970s – Dalkon Shield
    • IUD made by A.H. Robins; series of 12 deaths from miscarriage-related infections linked with larger of two sizes
    • Drove device off the market and Robins into bankruptcy, though some 2.8 million women had used the Shield
• 1976 – Medical Device Amendment passed
• 1982 – Seven deaths in Chicago from Tylenol contaminated with cyanide
• 1983 – Tamper-resistant packaging regulations
FDA – A Brief History (6)
• 1980s – Silicone breast implants
• 1985-1987 – Therac-25 accidents
• 1990 – Safe Medical Devices Act
• 1992 – Mammography Quality Standards Act
• 1999 – Jesse Gelsinger – U. Pennsylvania
• 2000 – Office for Human Research Protections (OHRP)
Background
• Linear accelerator system built for cancer therapy
• Developers: Atomic Energy of Canada Ltd. (AECL) and CGR (French company)
• Instrument was a further advancement of earlier models
    • Therac-6: 6 MeV, X-ray only, primarily mechanical
    • Therac-20: 20 MeV, both X-ray and electron beam; added computer controls for convenience
    • Therac-25: used "beam folding" to achieve 25 MeV energy; designed around control entirely by PDP-11 computer (machine control as well as safety limits; no hardware interlocks)
• 11 units installed in US and Canada; hundreds of patients treated
• Six massive overdoses occurred in 1985-1987; recalled in 1987
Therac-25: Technological Hubris
Linear Accelerators
• Use radiation to destroy cancer tissue
    • Electron beam treats shallow tissue
    • X-rays penetrate deeper with minimal damage to overlying area
• X-rays produced by hitting metal target with high-energy electrons
• Patient dosed inside a treatment room; operator stands outside
The Safety Issue
• Single electron gun produces both modes
• In x-ray mode, electron energy must be ~100x higher (target is a good attenuator)
• Low energy + target = underdose
• High energy + no target = huge overdose
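The beam-level and target combinations above can be written out as a small truth table. The sketch below is illustrative only: the function names and the "low"/"high" labels are assumptions made for this example and are not taken from the Therac-25 software.

```python
def dose_outcome(beam_level: str, target_in_place: bool) -> str:
    """Classify the four beam/target combinations described on the slide."""
    if beam_level == "high" and not target_in_place:
        return "huge overdose"
    if beam_level == "low" and target_in_place:
        return "underdose"
    return "as intended"


def beam_permitted(beam_level: str, target_in_place: bool) -> bool:
    """Interlock-style check (hypothetical): high level only with the target."""
    return (beam_level == "high") == target_in_place


# Print all four combinations with their outcome and the interlock decision.
for level in ("low", "high"):
    for target in (False, True):
        print(level, target, dose_outcome(level, target), beam_permitted(level, target))
```

An independent check of this kind, in hardware or software, only permits the two intended combinations, which is the role the hardware interlocks played on the earlier models.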
Therac-25 Software
• Evolved from Therac-6 system; also included reused Therac-20 code
    • Written in PDP-11 assembly language
    • Custom multitasking real-time kernel
    • Shared-memory multiprogrammed application with no explicit concurrency control
    • Little documentation during development; no problem tracking after release
    • Minimal unit and integration testing
    • QA was primarily 2700 hours of use as integrated system
    • Programmer left AECL in 1986; little information about his background
[Figure: Typical Therac-25 Installation]
[Figure: Therac-25 Turntable]
[Figure: Therac-25 Input Screen]
Incident History
• June 1985: Patient at Marietta GA overdosed (shoulder, arm damaged)
    • Technician informed overdose is impossible
• July 1985: Hamilton Ont. patient severely burned (hip destroyed)
    • Operator overrode "no dose" indications. Overdose not suspected until patient returned.
    • AECL focused on possible position-sensor fault
• December 1985: Yakima WA patient overdosed (striped burn pattern)
    • Not ascribed to overdose until second accident
• March 1986: First Tyler TX accident (patient died 5 months later)
    • Audio/video room monitors out of service
    • Patient felt jolt, got up and got another jolt, pounded on door!
    • Malfunction 54; sensor read underdosage
    • AECL found no electrical faults, claimed no previous incidents
Incident History (cont.)
• April 1986: 2nd Tyler TX accident (burned in face, died 3 weeks later)
    • Hospital physicist & operator reproduced SW race condition
    • AECL: "fix" by disabling up-arrow key. FDA, CHPB inquiries began.
• January 1987: 2nd Yakima WA accident (burn pattern; died in April)
    • FDA, CHPB recalled device
• July 1987: Equipment repairs approved
• November 1988: Final safety report
Sequence for Beam Mode Failure
1) Operator selects X-ray mode, realizes mistake, and switches back to electron-beam mode (all within 8 seconds)
2) During this time, system is setting up bending magnets and ignoring keyboard entry. New mode is never copied to the variable read by the gun emission control task.
3) Other tasks register the edit, however; turntable is moved to electron-beam position and screen updated.
4) Screen indicates "Beam Ready." Technician enters "B" to fire the beam.
5) Screen indicates "Malfunction 54" and dose monitor shows substantial underdose (because detectors are saturated).
6) Treatment Pause state allows operator to press [P] key to proceed, trying again up to 5 times.
Result: Instrument administers 15,000-25,000 rads instead of the intended 80-200 (1000 rads whole body can be fatal)
Reproduced on Therac-20, but interlocks shut down the system!
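The sequence above is a lost-update race on shared state. The following is a minimal, deterministic sketch of that pattern, not the PDP-11 code: the `Machine` class, its field names, and the `cross_check` flag are all hypothetical and stand in for the shared variables and the interlock that the Therac-20 had in hardware.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    entered_mode: str = "xray"   # mode the operator last typed
    gun_mode: str = "xray"       # copy read by the gun-emission task
    turntable: str = "xray"      # turntable position set by another task
    magnets_busy: bool = False   # True while bending magnets are being set


def start_magnet_setup(m: Machine) -> None:
    # The mode is latched for the gun task; keyboard edits are then ignored.
    m.gun_mode = m.entered_mode
    m.magnets_busy = True


def operator_edit(m: Machine, new_mode: str) -> None:
    m.entered_mode = new_mode
    m.turntable = new_mode       # other tasks do register the edit
    if not m.magnets_busy:
        m.gun_mode = new_mode    # this update is lost during magnet setup


def fire(m: Machine, cross_check: bool) -> str:
    # cross_check models an independent interlock (an assumption of this sketch).
    if cross_check and m.gun_mode != m.turntable:
        return "blocked: gun and turntable disagree"
    if m.gun_mode == "xray" and m.turntable == "electron":
        return "overdose: x-ray beam level with no target"
    return "ok"


m = Machine()
start_magnet_setup(m)
operator_edit(m, "electron")     # quick correction inside the setup window
m.magnets_busy = False           # setup ends; the stale gun_mode remains
print(fire(m, cross_check=False))
print(fire(m, cross_check=True))
```

Without the cross-check the stale gun setting and the updated turntable combine into the overdose case; with it, the disagreement stops the beam, which is consistent with the Therac-20 behavior noted above.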
Reconstructed Cause: Set-Up Failure
• Operator enters prescription data at keyboard (outside) before completing setup of machine parameters (inside)
• Operator positions patient by use of "field light" feature: turntable set with mirror in place to shine light, not electrons or x-rays, on treatment area
• Set-up subroutine loops during manual positioning. If turntable not in position, increments a counter (nonzero = inhibit beam)
• Housekeeping task uses the same counter as a flag: if nonzero, run position check (if zero, skip position check)
• Counter is only 8 bits; it overflows every 256 ticks, and the "flag" indicates a zero (proceed) condition
• If "set" command is given at precisely that instant, software turns on full 25 MeV electron beam with no target or scanning. Blocking tray (part of field light attachment) produces striped pattern.
Result: patient receives 4,000 - 5,000 rads (doubled, because of 2 tries) instead of 86 rads prescribed
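The counter wraparound described above is easy to reproduce. This is a minimal sketch under stated assumptions: the function names are invented for illustration, and the fixed-value flag is shown only as one possible way to avoid the wraparound, not as a description of the actual repair.

```python
def increment_8bit(value: int) -> int:
    """Increment and wrap like an 8-bit register (255 + 1 becomes 0)."""
    return (value + 1) & 0xFF


def set_fixed_flag(_value: int) -> int:
    """Alternative update: always store the same nonzero value."""
    return 1


def zero_ticks(update, ticks: int) -> list:
    """Return the tick numbers at which the flag reads zero (check skipped)."""
    value = 0
    zeros = []
    for tick in range(1, ticks + 1):
        value = update(value)    # turntable still out of position on every pass
        if value == 0:
            zeros.append(tick)
    return zeros


print(zero_ticks(increment_8bit, 600))   # [256, 512]
print(zero_ticks(set_fixed_flag, 600))   # []
```

With the incrementing counter, the position check is skipped once in every 256 passes even though the turntable never reached position; a flag that cannot wrap has no such window.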
Selected Issues
System:
• Cryptic error messages; errors and malfunctions common
• Operators could, and often did, override Treatment Pause
• Did not produce audit trail that could help diagnose problems
AECL's Responses:
• Repeatedly denied possibility of accidental overdose
• Focused on convenience over safety
• Did not alert all users when accidents occurred
• Failed to look for root causes - seized on each issue as the problem
• Ignored portions of both Canadian and US safety-improvement requests
• Treated requirements / design / directed testing as troublesome afterthought
FDA:
• Had no requirement that users report medical device incidents
• Could not force a recall, only recommend one
Dangerous S/W Errors: not just history
• June 6, 2006: Ventilators recalled (http://www.fda.gov/oc/po/firmrecalls/hamilton06_06.html)
    Ventilators with older generation software, under specific conditions following oxygen cell calibration without compressed air supply, can be put in a state where no visible or audible alarms are triggered because of a software algorithm designed to suppress false positive alarms. Note that oxygen cell calibration is intended to be performed while the ventilator is connected to both air and oxygen high pressure gas sources (i.e., the error occurred when the instructed procedure was not followed).
• March 6, 2006: Dialysis device recalled (http://www.fda.gov/cdrh/recalls/recall-081605.html)
    Class I recall of dialysis device (11 injuries, 9 deaths): excessive fluid loss may result if caregiver overrides device's "incorrect weight change detected" alarm. (Device used for continuous solute and/or fluid removal in patients with acute renal failure.)
• March 15, 2005: Infusion pump fails if it receives data
    Serial port on back of infusion pump allows nurse call system or hospital information system to monitor pump. If external system sends data to pump, however, failure condition 16:336 may result; if this occurs during infusion, pump must be powered off and restarted.
Learn from past software mistakes?
D. Wallace & R. Kuhn (NIST, 1999):
• 383 SW-related device recalls
• Manufacturer recalls 1983-1997
• No deaths or serious injuries
• Data only from FDA records
• Could only classify fault type for 342
The Lessons of Failure
[Chart: Recall Study – Device Types]
[Chart: Recall Study – Fault Type Breakdown]
[Chart: Recall Study – Fault Distribution]
Recall Study: Practices Suggested
Development / Maintenance (error prevention)
• Complete specification of requirements
• Traceability of all artifacts
• Software configuration management
• Change impact analysis
• Traceability & CM of all changes
• Expertise in application domain
• Attention to details of current process
• Training
Recall Study: Practices Suggested
Quality Assurance (error prevention)
• Disciplined inspection / review
• Traceability Analysis
• Mental execution of trouble spots
• Code reading
• Recording & using fault information
• Recording symptoms that indicate faults
• Employing checklists, questions, methods to force symptoms to occur
• Proving algorithm correctness, formally or informally
• Simulating complex situations
Recall Study: Practices Suggested
Testing (error detection)
• Build test cases to elicit known problem symptoms
• Stress test
• Analyze impact of changes; regression test
• Release new versions only with impact analysis & regression testing
• Integration tests: focus on interface values
• System test under many environmental conditions and with bad input data
• Record test results, especially failures
Recall Study: Building a Handbook
• Authors have placed paper on a NIST site (http://hissa.nist.gov, Fault & Failure Analysis Repository)
• Failure study is only one part of the collection
• Also compiled: failure types / prevention techniques for C++ and for OO
• Related information available from British research collaboration on dependability (DIRC) (http://www.dirc.org.uk; health system failures at http://www.dirc.org.uk/publications/inproceedings/papers/29.pdf)
Safety: Where Software Meets Regulation
• Tragedies and responses form the background
• Therac-25: attention to software and devices
• Failure study - identify error prevention practices
• SQA philosophy harmonizes with GMP, GLP, GCP
The principles are the same as in any design field: figure it out, plan it, check the pieces and the whole.
FDA is interested in SQA process!
Software developer hit with FDA warning letter
March 17, 2006 (http://www.fda.gov/foi/warning_letters/g5783d.htm)
The FDA issued a … warning letter to … Agile Radiological Technologies because [its] Radiation Analyzer (RAy) Film Dosimetry software does not meet good manufacturing practices (GMP) standards and deviates from quality system (QS) regulations.
Issues:
Corrective and Preventive Actions
• Software errors ("bugs") not used to identify existing and potential quality problems
Design Controls
• No complete procedures to control design of the device
• Design changes not documented, validated, reviewed, approved
• No Design History File (DHF) established or maintained
Production and Process Controls
• No validation of software used as part of the quality system
Management Controls
• Management failed to ensure that an effective quality system has been established and implemented
• No adequate procedures established for quality audits, nor were audits conducted
Sources
• Therac-25 Investigation: Leveson, N.G., Safeware: System Safety and Computers. 1st ed. 1995: Addison-Wesley Publishing Company Inc. Revised version at: http://sunnyday.mit.edu/papers/therac.pdf
• Medical Device Software Failure Study: Wallace, D.R. and D.R. Kuhn, Lessons from 342 Medical Device Failures, in 4th IEEE International Symposium on High-Assurance Systems Engineering. 1999. Washington, D.C.: IEEE. NIST version at: http://hissa.nist.gov/project/lessonsfailures.pdf
• British study of healthcare system failures: Mackie, J. and I. Sommerville, Failures of healthcare systems, in Proceedings of the First Dependability IRC Workshop, Edinburgh, UK, March 22-23, 2000.
Questions? Comments?
Brian Shoemaker
Principal Consultant
ShoeBar Associates
781-929-5927