systems prognostic health management emis 7305 march 28, 2006
Systems Prognostic Health Management
EMIS 7305
March 28, 2006
Christopher Thompson
Senior Research Engineer
Lockheed Martin Missiles and Fire Control
Systems Engineering Program
Disclaimer: This briefing is unclassified and contains no proprietary information. Any views expressed by the author are his, and in no way represent those of Lockheed Martin Corporation.
2
Topic Outline
• Introduction
• Definitions
• The Goal of Prognostic Health Management
• PHM Stakeholders
• PHM Modeling
• Sensors
• Prognostics Analysis Tools
• Availability
• Examples
3
Introduction
Education
B.S. in Electrical Engineering, SMU (1997)
M.S. in Mechanical Engineering, SMU (2001)
- Focus: Fatigue and Fracture Mechanics
M.S. in Systems Engineering (one class remaining)
- Focus: Reliability, Statistical Analysis
Ph.D. in Applied Science (anticipated ~ 2008)
- Proposed Dissertation Title: Sensor Optimization for Systems Prognostic-Diagnostic Health Management in an Unmanned Ground Combat Vehicle
4
Introduction
Experience
Lockheed Martin Missiles and Fire Control, Dallas TX
- Systems Engineer - Multifunction Utility/Logistics Equipment (MULE)
- Reliability Engineer - Army Tactical Missile System (TACMS)
Lockheed Martin Aeronautics, Fort Worth TX
- Vehicle Systems - Prognostic Health Management - F-35 Joint Strike Fighter
SMU School of Engineering
- TA for Dr. Jerrell Stracener
5
Introduction
Future Combat Systems MULE Program
6
Introduction
Some keys to the successful fielding of the U.S. Army’s Future Combat Systems are:
• Reducing the Logistics footprint
• Increasing Availability
• Reducing total cost of ownership
• Implementing Performance Based Logistics
• Improvements in the ‘ilities’ (RAM-T)
  – Reliability
  – Availability
  – Maintainability
  – Testability
  – Supportability
7
Some Definitions
Prognostics - Of or relating to prediction; a sign of a future happening; a portent.
Prognostics is the process of calculating and reporting an estimate of remaining useful life for a component, within sufficient time to repair or replace it before failure occurs.
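One way to make "an estimate of remaining useful life" concrete is to extrapolate a degradation trend to a failure threshold. The sketch below is only an illustration under stated assumptions: the health indicator, its linear decline, the threshold value, and the function names `fit_line`, `estimate_rul` and `repair_should_be_scheduled` are all hypothetical and do not come from the briefing.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_rul(hours, health, failure_threshold):
    """Estimate remaining useful life in operating hours.

    Assumes (hypothetically) that `health` declines roughly linearly with
    operating hours and that failure occurs when it reaches
    `failure_threshold`. Returns None when no declining trend is visible.
    """
    slope, intercept = fit_line(hours, health)
    if slope >= 0:
        return None  # no degradation trend to extrapolate
    hours_at_threshold = (failure_threshold - intercept) / slope
    return max(0.0, hours_at_threshold - hours[-1])


def repair_should_be_scheduled(rul_hours, lead_time_hours):
    """True when the remaining life no longer exceeds the time needed to
    obtain parts and perform the repair (the 'sufficient time' in the
    definition above)."""
    return rul_hours is not None and rul_hours <= lead_time_hours


# Hypothetical data: health indicator sampled every 100 operating hours.
rul = estimate_rul([0, 100, 200, 300], [1.0, 0.9, 0.8, 0.7], failure_threshold=0.5)
```

With these example numbers the trend reaches the threshold at about 500 hours, so roughly 200 hours of life remain after the last sample. Real degradation is rarely linear, which is why the briefing later lists physics-of-failure, statistical and data-driven methods.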
8
Some Definitions
Prognostic Health Management (PHM) – The implementation of an integrated software and hardware system which monitors the health, status and performance of a vehicle or system, tracks consumables (oil, batteries, ammunition, filters, fuel, coolant…) and configuration (software versions, part history…), and determines remaining life of all safety and performance critical components, predicting failures before they occur, thereby enhancing logistics and maintenance activities. PHM consists of ‘on-board’ as well as ‘off-board’ components.
9
Some Definitions
Diagnostics - The identification of a fault or failure condition of an element, component, sub-system or system, combined with the deduction of the lowest measurable cause of that condition through confirmation, localization, and isolation.
• Confirmation is the process of validation that a failure/fault has occurred, the filtering of false alarms, and assessment of intermittent behavior.
• Localization is the process of restricting a failure to a subset of possible causes.
• Isolation is the process of identifying a specific cause of failure, down to the smallest possible ambiguity group.
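The confirmation step described above (filtering false alarms and assessing intermittent behavior) is often approximated with a persistence check. The following is a minimal sketch of an "M of the last N samples" filter; the technique and the class name `FaultConfirmer` are assumptions chosen for illustration, not a method stated in the briefing.

```python
from collections import deque


class FaultConfirmer:
    """Confirm a fault only if it was indicated in at least m of the
    last n samples (hypothetical persistence filter)."""

    def __init__(self, m, n):
        if not 0 < m <= n:
            raise ValueError("require 0 < m <= n")
        self.m = m
        self.window = deque(maxlen=n)  # oldest samples drop off automatically

    def update(self, fault_indicated):
        """Record one raw fault indication; return True once confirmed."""
        self.window.append(bool(fault_indicated))
        return sum(self.window) >= self.m


# A single glitch is filtered out; a persistent indication is confirmed.
glitch = FaultConfirmer(m=3, n=5)
glitch_results = [glitch.update(x) for x in [True, False, False, False, False]]

persistent = FaultConfirmer(m=3, n=5)
persistent_results = [persistent.update(x) for x in [True, True, False, True]]
```

Choosing m and n trades detection delay against false alarm rate: a larger m rejects more glitches but confirms real faults later.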
10
Some Definitions
Fault – A condition that prevents an element from performing its required function at the desired level of performance, or that forces it to operate in a degraded mode.
Failure – The inability of a component, system or sub-system to perform its intended function as designed. Failure may be the result of one or more faults.
Fault Tolerance – The design of a system so that it will continue to operate in a degraded or reduced level rather than failing completely, when some part of the system fails.
11
Some Definitions
Failure Cascade – The result when a failure occurs in a system of interconnected components, in which the successful operation of each component depends on the successful operation of a preceding component. A single failure can therefore trigger the failure of successive parts, and potentially amplify the result or impact. Redundancy and fault tolerant design can reduce the criticality or impact of the cascade, but cannot necessarily prevent a failure.
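A failure cascade can be pictured as a walk over a dependency graph. The sketch below assumes a deliberately simple rule (a component fails as soon as any component it depends on fails, unless it is marked fault tolerant); the component names and the function `cascade` are hypothetical.

```python
from collections import deque


def cascade(dependents, initial_failure, fault_tolerant=frozenset()):
    """Return the set of components lost when `initial_failure` fails.

    `dependents` maps a component to the components that depend on it.
    Components in `fault_tolerant` are assumed to survive the loss of an
    upstream component, which stops the cascade at that point.
    """
    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for downstream in dependents.get(current, []):
            if downstream in failed or downstream in fault_tolerant:
                continue
            failed.add(downstream)
            queue.append(downstream)
    return failed


# Hypothetical vehicle power chain.
dependents = {
    "battery": ["controller"],
    "controller": ["motor", "radio"],
    "motor": ["wheel"],
}
unprotected = cascade(dependents, "battery")
protected = cascade(dependents, "battery", fault_tolerant={"controller"})
```

In the unprotected case the loss of the battery takes every downstream component with it; making the controller fault tolerant (for example with a backup supply) limits the impact to the battery itself, which matches the point that redundancy reduces the impact of a cascade without preventing the initial failure.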
12
Some Definitions
Design Failures – These take place due to inherent errors or flaws in the system design.
Infant Mortality Failures - These cause newly manufactured systems to fail, and can generally be attributed to errors in the manufacturing process, or poor material quality control.
Random Failures - These can occur at any time during the entire life of a system. Electrical systems are more likely to fail in this manner.
Wear Out Failures - As a system ages, degradation will cause systems to fail. Mechanical systems are more likely to fail in this manner.
13
Some Definitions
One-To-One Redundancy - Each active component in a system has a redundant backup on standby. The active component is monitored at all times, and the standby component will activate if the primary component fails. Since the probability of both components failing at the same time is low, One-To-One Redundancy provides the highest level of availability, but at a considerable disadvantage of requiring double the size, weight, power and cost, while reducing reliability (more components which can fail).
14
Some Definitions
N + X Redundancy – N components are required to perform a function, but the system is configured with N + X components. When any of the N components fails, one of the X spare modules activates. When X is smaller than N, the advantage over One-To-One Redundancy lies in reduced size, weight, power and cost. The disadvantage is lower system availability when multiple components fail.
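The availability trade between One-To-One and N + X redundancy can be illustrated numerically. This sketch assumes identical, independent components with the same availability `a` and perfect switchover to spares; those assumptions and the function names are illustrative and are not taken from the briefing.

```python
from math import comb


def no_redundancy_availability(a, n):
    """All n required components must be up."""
    return a ** n


def one_to_one_availability(a, n):
    """Each of the n components has its own standby: a pair is up if
    at least one of its two members is up."""
    pair = 1 - (1 - a) ** 2
    return pair ** n


def n_plus_x_availability(a, n, x):
    """At least n of the n + x pooled components must be up
    (binomial k-out-of-n model)."""
    total = n + x
    return sum(
        comb(total, k) * a ** k * (1 - a) ** (total - k)
        for k in range(n, total + 1)
    )


# Hypothetical case: 4 required components, each 90% available.
baseline = no_redundancy_availability(0.9, 4)   # 4 components
paired = one_to_one_availability(0.9, 4)        # 8 components
pooled = n_plus_x_availability(0.9, 4, 1)       # 5 components
```

For these numbers the availabilities are about 0.656 with no redundancy, 0.919 with one pooled spare, and 0.961 with full one-to-one backup, at a cost of 4, 5 and 8 components respectively.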
15
Some Definitions
Load Sharing – Multiple components share a combined load. A higher level component manages load distribution, and monitors the health and status of the components. If one of the load sharing components fails, the load is re-distributed among the others, allowing for graceful performance degradation. In this scheme, there is almost no extra cost. The main disadvantage is that, with multiple failures, system performance may degrade below an acceptable level.
16
The Ultimate Goal of Prognostics
The purpose of Prognostic Health Management is to repair systems before they fail, while maximizing useful life consumption, and to have the necessary parts, tools and maintainers waiting nearby to resolve the correct problem as quickly and efficiently as possible.
17
PHM Stakeholders
[Diagram: PHM stakeholder disciplines and the PHM activities associated with them]
Disciplines: Systems Engineering; Software & Simulation; Test Engineering; Mechanical Engineering; Electrical Engineering; Training & Prod. Supp.
Activities shown: PHM Model Design; Interface Management; Requirements Development; Sensor Optimization; CAIV/WAIV Analysis; Prognostic Trending; System Architecture; PHM Model Integration; Software Interfaces; Fault/Failure Simulation (appears twice); Continuous BIT/PHM; Test Planning; Fault/Failure Criticality; Fault/Failure Propagation; Platform Integration; Crack Growth Sensing; Stress/Strain Sensing; Corrosion Sensing; Vibration Sensing; Consumables Monitoring; Acoustic Sensing; Thermal Sensing; Sensor Implementation; Sensor Integration; Data Management; Data Architecture; Reliability/Failure Modes; Maintainability & Testability; Logistics & Sustainment; Training; Safety
18
Systems Engineering’s Role in PHM
• Requirements Development
• System Integration
• System Architecture
• Interface Management
• Risk Assessment
• Performance Measures: TPMs & KPPs
• System Modeling & Knowledge Integration
• Functional Decomposition
19
PHM Requirements
• The PHM system shall isolate X percent of all detected failures to a single component, with Y percent confidence.
• The PHM system shall predict X percent of expected failures for the next Y hours of operation.
• The PHM system shall predict all failures that can result in a Safety Critical Failure.
• The PHM system shall incorporate sensors to assess platform health, status and performance.
• The PHM system shall incorporate sensors to monitor platform consumables.
• The PHM system shall record and store all sensor data in onboard memory.
20
The ‘Ilities’ & Product Support
• Reliability
  - FMECA: Failure Modes, Effects and Criticality Analysis
  - FRACAS: Failure Reporting, Analysis and Corrective Action System
  - Measures: MTBF, MTBSA, MTBEFF, MTBUMA
• Maintainability
  - Maintenance Ratio
  - Preventive Maintenance Checks
  - Condition Based Maintenance
  - Design for Maintainability
• Availability
  - AO, AI, AA (operational, inherent, achieved)
21
The ‘Ilities’ & Product Support
• Testability
  - Verification and Validation
  - Fault Insertion
  - Simulation
• Supportability
  - Consumables Monitoring
  - Supply Planning and Prediction
• System Safety
  - Single & Multiple Fault Tolerant Design
  - Safety Critical Failures
  - Human/Machine Interaction
22
PHM Modeling
• eXpress Modeling Tool
• Model Based Reasoning
• Case Based Reasoning
• Knowledge Bases
• Prognostics Analysis Tools
23
eXpress Modeling Tool
[Diagram: eXpress modeling tool capabilities and outcomes]
Outcomes: Mission Assurance, Availability & Success; Performance Based Logistics; Run-Time Prognostic Health Management
Capabilities: Data Mining; Sensor Fusion; Life Cycle Tradespace; Requirements Analysis; Diagnostic, Prognostic & PHM Design; Risk Assessment; Business Cases; FRACAS & FMECA Development; CONOPS, Specs & Logistics
24
Impact Technologies
Prognostics developed at Impact Technologies:
• Gas Turbine Engines and Auxiliary Systems
• Avionics PHM and Reasoning
• Aircraft Actuators (EMA, EHA)
• Switching Mode Power Supplies, GPS Receivers and Power Electronics
• Generators and Electric Drive Systems
• Bearings, Gears, Shafts, Drive Trains, and Clutches
• Hydraulic, Lube Oil and Fuel Systems
• Structures and Components
• Diesel Engines
25
Impact Technologies
Prognostics modules have been developed and successfully tested on the following systems:
• Pratt & Whitney F-100 engine on F-15 and F-22
• Engine, generator, lubrication system and gearbox on Honeywell F124
• Oil wetted components on GE F110-129, GE F404, Rolls Royce F405
• CH-47 T-55 engine and drive-train
• CH-60 intermediate gearbox
• Blackhawk Carrier Plate Prognosis System
• JSF Clutch Wear and Lift-Fan Prognosis System
• Fuel system and Power generation system on DDG-class Navy Ships
26
Impact Technologies
A number of different techniques have been used in the development of these prognostics:
• Analytical and stochastic physics of failure models
• Advanced signal processing
• Feature extraction methods
• Health state estimation and prediction algorithms
• Statistical reliability
• Bayesian updating methods
• Component damage accumulation models
• Probabilistic remaining useful life estimation
• Data driven modeling techniques
27
Model Based Reasoning
Model Based Reasoning (MBR) is a qualitative scheme in which a model of the system is combined with an inference engine to accomplish fault detection and fault isolation. The qualitative model, or ‘Knowledge Base’, describes the elements and components, interconnections, and input/output behavior of the system being diagnosed, and establishes an envelope of ‘correct behavior’. To accomplish diagnosis, the actual behavior of the system is compared with the behavior predicted by the model. The inference engine uses the differences found by this comparison to accomplish the fault isolation task.
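The compare-then-infer loop of Model Based Reasoning can be sketched in a few lines. Everything specific here is an assumption for illustration: the toy pump model, the tolerances, the fault signature table, and the function names `detect_discrepancies` and `isolate` are hypothetical and are not taken from any particular MBR tool.

```python
def detect_discrepancies(model, inputs, measured, tolerance):
    """Fault detection: compare measured outputs with model predictions.

    Returns {output_name: residual} for every output that falls outside
    the envelope of correct behavior defined by `tolerance`.
    """
    expected = model(inputs)
    residuals = {}
    for name, expected_value in expected.items():
        residual = measured[name] - expected_value
        if abs(residual) > tolerance[name]:
            residuals[name] = residual
    return residuals


def isolate(residuals, fault_signatures):
    """Fault isolation: return the faults whose expected set of
    out-of-envelope outputs matches what was observed."""
    observed = frozenset(residuals)
    return [
        fault
        for fault, signature in fault_signatures.items()
        if frozenset(signature) == observed
    ]


def pump_model(inputs):
    """Hypothetical model of a healthy pump: outputs scale with voltage."""
    return {"current": inputs["voltage"] / 12.0, "flow": inputs["voltage"] * 0.5}


# Hypothetical knowledge: which outputs each fault disturbs.
fault_signatures = {
    "clogged filter": {"flow"},
    "winding short": {"current"},
    "supply sag": {"current", "flow"},
}

residuals = detect_discrepancies(
    pump_model,
    inputs={"voltage": 24.0},
    measured={"current": 2.0, "flow": 6.0},
    tolerance={"current": 0.2, "flow": 1.0},
)
candidates = isolate(residuals, fault_signatures)
```

Here the measured current matches the model but the flow is far below the prediction, so only the flow residual is reported and the signature table isolates the clogged filter. When several faults share a signature, `isolate` returns all of them, which corresponds to an ambiguity group.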
28
Case Based Reasoning
Case Based Reasoning (CBR) is the process of solving problems based on past understanding of similar problems. The vast majority of this type of information resides with the maintainers and operators, in the experience and knowledge of the people using the system in question. CBR retrieves a past case that resembles the target problem, identifies the commonalities between the retrieved case and the target problem, and in doing so forms an implicit generalization of the case.
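The retrieval step of Case Based Reasoning can be sketched as a similarity search over past maintenance cases. The example below uses Jaccard similarity between symptom sets; that similarity measure, the case records and the function names are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Similarity of two symptom sets: shared symptoms / all symptoms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(case_base, symptoms):
    """Return the past case whose symptoms best match the target problem."""
    return max(case_base, key=lambda case: jaccard(case["symptoms"], symptoms))


# Hypothetical case base built from maintainer experience.
case_base = [
    {"symptoms": {"engine overheats", "coolant low"}, "fix": "replace coolant hose"},
    {"symptoms": {"engine overheats", "fan not spinning"}, "fix": "replace fan relay"},
    {"symptoms": {"hard start", "battery voltage low"}, "fix": "replace battery"},
]

target = {"engine overheats", "fan not spinning", "fuse blown"}
best = retrieve(case_base, target)
commonalities = best["symptoms"] & target
```

The retrieved case suggests a starting point for the repair; a fuller CBR system would also adapt the old solution to the new problem and store the outcome as a new case.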
29
Knowledge Bases
[Diagram: data sources feeding the Knowledge Base]
KNOWLEDGE BASE
• FMECA data
• fault/failure propagation
• system level interactions
• functional interdependencies
• physical interdependencies
• design knowledge
• prognostic trend analysis
• CAD models
• circuit layouts
• ‘inorganic’ sensor data
• ‘organic’ sensor data
• subsystem/LRU internal sensor data
• BIT data
• consumables monitors
• sensor fusion and signal conditioning
• maintainer inputs
• Database Management: Data Mining & Feature Extraction
• off-board prognostic trend analysis
30
Prognostic Analysis Tools
Learning Systems & Artificial Intelligence
• Genetic Algorithms
• Expert Systems
• Fuzzy Logic
• Neural Networks
Database Techniques
• Feature Extraction
• Data Mining
Mathematical Techniques
• Kalman Filtering
• Dempster-Shafer Method
• Wavelets
• Statistical Analysis
• Chaos Math?
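As one concrete instance of the mathematical techniques listed above, the following is a minimal scalar Kalman filter for smoothing a noisy sensor reading. It assumes a random-walk state model (the true value drifts slowly) with known process and measurement variances; those assumptions, the parameter values and the function name `kalman_1d` are illustrative only.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly drifting quantity.

    x is the state estimate and p its variance. Each step predicts
    (variance grows by process_var) and then corrects the estimate
    toward the new measurement by the Kalman gain.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var        # predict: state unchanged, uncertainty grows
        gain = p / (p + meas_var)  # weight given to the new measurement
        x = x + gain * (z - x)     # update estimate with the innovation
        p = (1.0 - gain) * p       # update estimate variance
        estimates.append(x)
    return estimates


# Hypothetical readings alternating around a true value of 5.0.
readings = [5.2, 4.8] * 25
smoothed = kalman_1d(readings, process_var=1e-4, meas_var=0.1)
```

After the initial transient the filtered estimate stays much closer to 5.0 than the raw readings do, which is the property that makes this kind of filter useful ahead of trend analysis.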
31
Prognostic Analysis Tools
Traditional Academic Solutions to PHM:
• Run-to-Failure analysis of large, expensive systems, such as ship or rail engines
• Analysis involves impractical, complex math models that require years of training to understand and interpret
• Very expensive
• Time consuming process
• Rarely offer concrete design guidelines or solutions
32
Prognostic Analysis Tools
Why Engineers in Industry Need More:
• We have bottom lines and schedules to meet!
• We have customer requirements to satisfy!
• Systems Engineers work with designers who don’t like impractical, complex math models that require years of training to understand and interpret!
• We have program managers who don’t like very expensive, time consuming solutions!
• We like concrete design guidelines and solutions!
33
Sensor Technology
• BIT/BITE
• Sensor Fusion and Virtual Sensors
• Sensor Conditioning and Filtering
• Smart Sensors
34
• Availability, Achieved

$$A_A = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$

where
MTBF = Mean Time Between Failure
MTTR = Mean Time To Repair
Availability Analysis
35
• Availability, Operational

$$A_O = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{\text{MTBUMA}}{\text{MTBUMA} + \text{ALDT} + \text{MTTR}}$$

where
MTBUMA = Mean Time Between Unscheduled Maintenance Actions
ALDT = Administrative Logistical Down Time
MTTR = Mean Time To Repair
Availability Analysis
36
• MTBUMA = Mean Time Between Unscheduled Maintenance Actions

$$\text{MTBUMA} = \frac{1}{\dfrac{1}{\text{MTBF}} + \dfrac{1}{\text{MTBM}_{\text{induced}}} + \dfrac{1}{\text{MTBM}_{\text{no defect}}}}$$

where
MTBF = Mean Time Between Failures
MTBM = Mean Time Between Maintenance
Availability Analysis
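The availability relations on the preceding slides, read from their variable definitions as A_A = MTBF / (MTBF + MTTR), A_O = MTBUMA / (MTBUMA + ALDT + MTTR), and 1/MTBUMA = 1/MTBF + 1/MTBM(induced) + 1/MTBM(no defect), can be checked with a short calculation. The numeric values below are hypothetical and chosen only to show the arithmetic; all times are in hours.

```python
def achieved_availability(mtbf, mttr):
    """A_A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def mtbuma(mtbf, mtbm_induced, mtbm_no_defect):
    """Unscheduled maintenance rates add, so MTBUMA is the reciprocal
    of the sum of the three reciprocal times."""
    return 1.0 / (1.0 / mtbf + 1.0 / mtbm_induced + 1.0 / mtbm_no_defect)


def operational_availability(mtbuma_hours, aldt, mttr):
    """A_O = MTBUMA / (MTBUMA + ALDT + MTTR)."""
    return mtbuma_hours / (mtbuma_hours + aldt + mttr)


# Hypothetical platform figures.
MTBF, MTBM_INDUCED, MTBM_NO_DEFECT = 500.0, 2000.0, 1000.0
MTTR, ALDT = 4.0, 20.0

a_a = achieved_availability(MTBF, MTTR)
m = mtbuma(MTBF, MTBM_INDUCED, MTBM_NO_DEFECT)
a_o = operational_availability(m, ALDT, MTTR)

# Effect of halving administrative and logistical down time.
a_o_faster_logistics = operational_availability(m, ALDT / 2, MTTR)
```

With these figures MTBUMA is about 286 hours, well below the 500 hour MTBF, because induced and no-defect maintenance actions also take the system down. Operational availability is about 0.92, and halving ALDT raises it to about 0.95.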
37
• How can we improve AO?
- By decreasing Administrative & Logistical Down Time (ALDT)
- By increasing Mean Time Between Failures (MTBF)
- By decreasing Mean Time To Repair (MTTR)
- By increasing Mean Time Between Unscheduled Maintenance Actions (MTBUMA) – [by increasing MTBM induced and MTBM no defect, i.e. by reducing induced and no-defect maintenance actions]
Availability Analysis
38
• How can we decrease ALDT?
- By improving Logistics
  Improve scheduling of inspections
  Improve commonality of parts
  Decrease time to get replacements
- By improving Prognostics
  Replace parts before they fail, not after
  Maximize use of component life
  Improve off-board prognostics trending
  More sensors!!
Availability Analysis
39
• How can we increase MTBF?
- By improving Reliability
  Select more rugged components
  Improve life screening and testing
  Improve thermal management
- By improving Quality
  Better parts screening
  Better manufacturing processes
- By adding Redundancy
  At the cost of Size, Weight and Power!
Availability Analysis
40
• How can we decrease MTTR?
- By improving Maintainability
  Improve quality and efficacy of training
  Simplify fault isolation
  Decrease number of tools and special equipment
  Decrease access time (panels, connectors…)
  Improve Preventative Maintenance
- By improving Diagnostics
  Improve BIT and BITE
  Decrease ambiguity group size
  Improve maintenance manuals and training
Availability Analysis
41
• How can we increase MTBM (induced/no defect)?
- By improving Safety
  Limit the potential for accidental damage
- By improving Prognostics
  Improve PHM models to monitor induced damage
- By improving Diagnostics
  Lower the false alarm rate
  Don’t repair/replace things which aren’t broken!
Availability Analysis
42
Engine Health/Performance Monitoring:
- Place an acoustic sensor on the engine housing.
- Establish ‘nominal’ operating parameters.
- Develop a library relating fault precedents (odd sounds which warn of impending failure) to failures.
- Monitor for an ‘out of nominal’ acoustic signature.
Sensor Example
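The "establish nominal, then monitor for out of nominal" steps of the acoustic example can be sketched with a very simple feature. The code below uses the RMS level of each audio frame and a three-sigma band around the nominal level; a fielded system would more likely use spectral features, and the feature choice, the threshold and the function names are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def rms(frame):
    """Root-mean-square level of one frame of audio samples."""
    return sqrt(sum(s * s for s in frame) / len(frame))


def build_baseline(nominal_frames):
    """Characterize nominal operation as (mean, std dev) of frame RMS.
    Requires at least two frames recorded from a healthy engine."""
    levels = [rms(frame) for frame in nominal_frames]
    return mean(levels), stdev(levels)


def out_of_nominal(frame, baseline, z_limit=3.0):
    """Flag a frame whose RMS level is more than z_limit standard
    deviations away from the nominal mean."""
    mu, sigma = baseline
    return abs(rms(frame) - mu) > z_limit * sigma


# Hypothetical frames recorded from a healthy engine.
nominal_frames = [
    [0.10, -0.10, 0.10, -0.10],
    [0.11, -0.11, 0.11, -0.11],
    [0.09, -0.09, 0.09, -0.09],
]
baseline = build_baseline(nominal_frames)

quiet_alarm = out_of_nominal([0.105, -0.105, 0.105, -0.105], baseline)
loud_alarm = out_of_nominal([0.5, -0.5, 0.5, -0.5], baseline)
```

The frame at a level of 0.105 stays inside the nominal band, while the frame at 0.5 is flagged. Matching a flagged signature against the library of fault precedents is a separate step that is not shown here.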
43
Consider a toaster: Not just any toaster, but the toaster on the first mission to Mars. NASA could only afford to send one, and it must work, every time, or else the astronauts won’t have toast. The toaster must also not endanger the mission by causing a safety hazard, and it must not waste bread.
Mission Critical Function:
- make toast
Safety Critical Functions:
- don’t injure the astronauts
- don’t damage the spaceship
- don’t burn the toast!
PHM Example
44
• Identify the elements of a toaster.
• What are the failure modes?
• What should we monitor for safety hazards?
• What elements should we monitor for diagnostics?
• What data should we collect for prognostics?
• How would we optimize the sensor coverage and data collection?
PHM Example
45
• Continually monitoring sensors and storing all that data for analysis will quickly consume available bandwidth and storage space.
• Capturing ‘profound knowledge’ of a complex engineered system and its myriad failure modes is very difficult, and involves integrating knowledge which crosses discipline boundaries: SE, EE, ME, RAM-T, Safety, Software, Math, Statistics, Physics…
• Prognostic analysis of data is a very difficult problem, with no easy or universal solution.
• PHM is a relatively new field.
Issues Related to PHM
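The first issue on this slide, that storing every sample quickly consumes bandwidth and storage, is commonly eased by recording a value only when it changes meaningfully. The following deadband filter is a minimal sketch of that idea; the technique, the deadband value and the function name are assumptions and are not recommendations from the briefing.

```python
def deadband_filter(samples, deadband):
    """Keep a (timestamp, value) sample only when its value differs from
    the last stored value by more than `deadband`. The first sample is
    always stored."""
    stored = []
    for timestamp, value in samples:
        if not stored or abs(value - stored[-1][1]) > deadband:
            stored.append((timestamp, value))
    return stored


# Hypothetical coolant temperature log: (seconds, degrees C).
raw = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.5), (4, 21.6), (5, 25.0)]
kept = deadband_filter(raw, deadband=1.0)
```

Here six samples reduce to three. The trade-off is that slow drifts smaller than the deadband between stored points are not visible in the record, so the deadband has to be chosen with the prognostic trend analysis in mind.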
46
• Do I have any practical PHM suggestions?
- Aim for the low hanging fruit
  Use the sensors you already have in creative ways.
  Only add sensors when you must.
  You can’t monitor everything, so don’t try.
- Don’t reinvent the wheel
  Build on others’ work and experience.
  Find good tools to design your system.
Final Remarks
47
Additional Prognostic Analysis Tool