systems prognostic health management emis 7305 march 28, 2006

Systems Prognostic Health Management

EMIS 7305

March 28, 2006

Christopher Thompson

Senior Research Engineer

Lockheed Martin Missiles and Fire Control

Systems Engineering Program

Disclaimer: This briefing is unclassified and contains no proprietary information. Any views expressed by the author are his, and in no way represent those of Lockheed Martin Corporation.

2

Topic Outline

• Introduction
• Definitions
• The Goal of Prognostic Health Management
• PHM Stakeholders
• PHM Modeling
• Sensors
• Prognostics Analysis Tools
• Availability
• Examples

3

Introduction

Education

B.S. in Electrical Engineering, SMU (1997)

M.S. in Mechanical Engineering, SMU (2001)
- Focus: Fatigue and Fracture Mechanics

M.S. in Systems Engineering (one class remaining)
- Focus: Reliability, Statistical Analysis

Ph.D. in Applied Science (anticipated ~ 2008)
- Proposed Dissertation Title: Sensor Optimization for Systems Prognostic-Diagnostic Health Management in an Unmanned Ground Combat Vehicle

4

Introduction

Experience

Lockheed Martin Missiles and Fire Control, Dallas TX
- Systems Engineer - Multifunction Utility/Logistics Equipment (MULE)
- Reliability Engineer - Army Tactical Missile System (TACMS)

Lockheed Martin Aeronautics, Fort Worth TX
- Vehicle Systems - Prognostic Health Management - F-35 Joint Strike Fighter

SMU School of Engineering
- TA for Dr. Jerrell Stracener

5

Introduction

Future Combat Systems MULE Program

6

Introduction

Some keys to the successful fielding of the U.S. Army’s Future Combat Systems are:

• Reducing the Logistics footprint
• Increasing Availability
• Reducing total cost of ownership
• Implementing Performance Based Logistics
• Improvements in the ‘ilities’ (RAM-T)
  – Reliability
  – Availability
  – Maintainability
  – Testability
  – Supportability

7

Some Definitions

Prognostics - Of or relating to prediction; a sign of a future happening; a portent.

Prognostics is the process of calculating and reporting an estimate of remaining useful life for a component, within sufficient time to repair or replace it before failure occurs.
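As a minimal sketch of what "an estimate of remaining useful life" can look like in practice, the snippet below fits a straight line to a degrading health indicator and extrapolates to a failure threshold. The function name, the linear-degradation assumption, and the sample numbers are illustrative assumptions, not part of the briefing; fielded prognostics typically use richer damage models.

```python
def estimate_rul(times, health, failure_threshold):
    """Least-squares line through (time, health), extrapolated to the threshold.

    Assumes health decreases as the component degrades (hypothetical indicator).
    Returns remaining useful life in the same units as `times`, or None when
    no downward trend is present.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_h = sum(health) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (h - mean_h) for t, h in zip(times, health))
    slope = sxy / sxx
    intercept = mean_h - slope * mean_t
    if slope >= 0:
        return None  # indicator is flat or improving: nothing to extrapolate
    time_at_failure = (failure_threshold - intercept) / slope
    return max(0.0, time_at_failure - times[-1])


# Hypothetical readings: health drops 1 unit per operating hour.
rul_hours = estimate_rul([0, 10, 20, 30], [100, 90, 80, 70], failure_threshold=50)
```

The estimate is only useful if it arrives early enough to schedule the repair, which is why the definition ties it to "sufficient time to repair or replace".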

8

Some Definitions

Prognostic Health Management (PHM) – The implementation of an integrated software and hardware system which monitors the health, status and performance of a vehicle or system, tracks consumables (oil, batteries, ammunition, filters, fuel, coolant…) and configuration (software versions, part history…), and determines remaining life of all safety and performance critical components, predicting failures before they occur, thereby enhancing logistics and maintenance activities. PHM consists of ‘on-board’ as well as ‘off-board’ components.

9

Some Definitions

Diagnostics - The identification of a fault or failure condition of an element, component, sub-system or system, combined with the deduction of the lowest measurable cause of that condition through confirmation, localization, and isolation.

• Confirmation is the process of validation that a failure/fault has occurred, the filtering of false alarms, and assessment of intermittent behavior.

• Localization is the process of restricting a failure to a subset of possible causes.

• Isolation is the process of identifying a specific cause of failure, down to the smallest possible ambiguity group.
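The three steps can be sketched with simple set operations. This is an illustrative sketch only: it assumes a single fault, assumes that a passing test exonerates every component it covers, and uses a hypothetical test-coverage table and hit count for confirmation.

```python
def confirm(fault_flags, min_hits=3):
    """Confirmation: accept a fault only if it was flagged at least min_hits times,
    which filters one-off false alarms (threshold is a hypothetical choice)."""
    return sum(fault_flags) >= min_hits


def ambiguity_group(failed_tests, passed_tests, coverage):
    """Localization then isolation under a single-fault assumption.

    coverage maps each test name to the set of components it exercises.
    """
    suspects = None
    for test in failed_tests:
        # Localization: the culprit must be covered by every failed test.
        suspects = set(coverage[test]) if suspects is None else suspects & coverage[test]
    suspects = suspects or set()
    for test in passed_tests:
        # Isolation: components exercised by a passing test are exonerated.
        suspects -= coverage[test]
    return suspects


# Hypothetical coverage table for three built-in tests.
COVERAGE = {
    "t1": {"pump", "valve", "sensor"},
    "t2": {"valve", "sensor"},
    "t3": {"sensor"},
}
group = ambiguity_group(["t1", "t2"], ["t3"], COVERAGE)
```

A smaller returned set means a smaller ambiguity group, and therefore fewer parts pulled unnecessarily.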

10

Some Definitions

Fault – A condition that renders an element unable to perform its required function at desired levels of performance, or in a degraded mode.

Failure – The inability of a component, system or sub-system to perform its intended function as designed. Failure may be the result of one or more faults.

Fault Tolerance – The design of a system so that it will continue to operate in a degraded or reduced level rather than failing completely, when some part of the system fails.

11

Some Definitions

Failure Cascade – The result when a failure occurs in a system of interconnected components, and the successful operation of a component depends on the successful operation of a preceding component. Conversely, a failure can trigger the failure of successive parts, and potentially amplify the result or impact. Redundancy and fault tolerant design can reduce the criticality or impact of the cascade, but not necessarily prevent a failure.

12

Some Definitions

Design Failures – These take place due to inherent errors or flaws in the system design.

Infant Mortality Failures - These cause newly manufactured systems to fail, and can generally be attributed to errors in the manufacturing process, or poor material quality control.

Random Failures - These can occur at any time during the entire life of a system. Electrical systems are more likely to fail in this manner.

Wear Out Failures - As a system ages, degradation will cause systems to fail. Mechanical systems are more likely to fail in this manner.

13

Some Definitions

One-To-One Redundancy - Each active component in a system has a redundant backup on standby. The active component is monitored at all times, and the standby component will activate if the primary component fails. Since the probability of both components failing at the same time is low, One-To-One Redundancy provides the highest level of availability, but at a considerable disadvantage of requiring double the size, weight, power and cost, while reducing reliability (more components which can fail).

14

Some Definitions

N + X Redundancy – N components are required to perform a function, but the system is configured with N + X components. When any of the N components fails, one of the X modules activates. The advantage lies in reduced size, weight, power and cost of the system, in the case where X is smaller than N. In case of multiple component failures, this scheme provides lower system availability than One-To-One Redundancy.
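The trade between the two redundancy schemes can be put in numbers with a k-out-of-n calculation. This is a sketch under stated assumptions, none of which come from the briefing: components are identical and fail independently, switching to a spare is perfect, and each component has availability `a`.

```python
from math import comb


def k_of_n_availability(n_required, n_spares, a):
    """Probability that at least n_required of (n_required + n_spares)
    independent, identical components are up, each with availability a."""
    total = n_required + n_spares
    return sum(
        comb(total, k) * a**k * (1 - a) ** (total - k)
        for k in range(n_required, total + 1)
    )


# Hypothetical comparison for a function that needs 4 components at a = 0.9.
no_spares = k_of_n_availability(4, 0, 0.9)          # 4 components, none spare
n_plus_one = k_of_n_availability(4, 1, 0.9)         # 5 components (N + 1)
one_to_one = k_of_n_availability(1, 1, 0.9) ** 4    # 8 components (4 pairs)
```

With these illustrative numbers the three cases come out near 0.656, 0.919 and 0.961: one-to-one gives the highest availability, but with twice the hardware, while N + 1 recovers most of the benefit with one extra component.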

15

Some Definitions

Load Sharing – Multiple components share a combined load. A higher level component manages load distribution, and monitors the health and status of the components. If one of the load sharing components fails, the load is re-distributed among the others, allowing for graceful performance degradation. In this scheme, there is almost no extra cost. The main disadvantage is that with multiple failures, system performance may degrade below an acceptable level.
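A minimal sketch of the load manager's redistribution step follows. The proportional-to-capacity rule, the function name and the numbers are assumptions made for illustration; the briefing does not specify how load is divided.

```python
def redistribute(total_load, capacities, healthy):
    """Share total_load across healthy units in proportion to capacity.

    Returns (per-unit shares, fraction of the demanded load actually carried).
    """
    available = sum(c for c, ok in zip(capacities, healthy) if ok)
    if available == 0:
        return [0.0] * len(capacities), 0.0
    delivered = min(total_load, available)  # cannot carry more than surviving capacity
    shares = [
        delivered * c / available if ok else 0.0
        for c, ok in zip(capacities, healthy)
    ]
    return shares, delivered / total_load


# Hypothetical system: three 40-unit components carrying a 90-unit load.
all_up = redistribute(90, [40, 40, 40], [True, True, True])
one_down = redistribute(90, [40, 40, 40], [True, True, False])
two_down = redistribute(90, [40, 40, 40], [True, False, False])
```

In this example one failure leaves the system carrying about 89% of the demand and two failures about 44%, which is the graceful degradation, and the eventual unacceptable degradation, described above.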

16

The Ultimate Goal of Prognostics

The purpose of Prognostic Health Management is to repair systems before they fail, while maximizing useful life consumption, and to have the necessary parts, tools and maintainers waiting nearby to resolve the correct problem as quickly and efficiently as possible.

17

PHM Stakeholders

[Diagram: PHM activities grouped by stakeholder discipline]

Disciplines: Systems Engineering; Software & Simulation; Test Engineering; Mechanical Engineering; Electrical Engineering; Training & Product Support

Activities shown: PHM Model Design; Interface Management; Requirements Development; Sensor Optimization; CAIV/WAIV Analysis; Prognostic Trending; System Architecture; PHM Model Integration; Software Interfaces; Fault/Failure Simulation (appears twice); Continuous BIT/PHM; Test Planning; Fault/Failure Criticality; Fault/Failure Propagation; Platform Integration; Crack Growth Sensing; Stress/Strain Sensing; Corrosion Sensing; Vibration Sensing; Consumables Monitoring; Acoustic Sensing; Thermal Sensing; Sensor Implementation; Sensor Integration; Data Management; Data Architecture; Reliability/Failure Modes; Maintainability & Testability; Logistics & Sustainment; Training; Safety

18

Systems Engineering’s Role in PHM

• Requirements Development
• System Integration
• System Architecture
• Interface Management
• Risk Assessment
• Performance Measures: TPMs & KPPs
• System Modeling & Knowledge Integration
• Functional Decomposition

19

PHM Requirements

• The PHM system shall isolate X percent of all detected failures to a single component, within Y percent confidence interval.
• The PHM system shall predict X percent of expected failures for the next Y hours of operation.
• The PHM system shall predict all failures that can result in a Safety Critical Failure.
• The PHM system shall incorporate sensors to assess platform health, status and performance.
• The PHM system shall incorporate sensors to monitor platform consumables.
• The PHM system shall record and store all sensor data in onboard memory.

20

The ‘Ilities’ & Product Support

• Reliability
- FMECA: Failure Modes, Effects and Criticality Analysis
- FRACAS: Failure Reporting, Analysis and Corrective Action System
- Measures: MTBF, MTBSA, MTBEFF, MTBUMA

• Maintainability
- Maintenance Ratio
- Preventive Maintenance Checks
- Condition Based Maintenance
- Design for Maintainability

• Availability
- AO, AI, AA (operational, inherent, achieved)

21

The ‘Ilities’ & Product Support

• Testability
- Verification and Validation
- Fault Insertion
- Simulation

• Supportability
- Consumables Monitoring
- Supply Planning and Prediction

• System Safety
- Single & Multiple Fault Tolerant Design
- Safety Critical Failures
- Human/Machine Interaction

22

PHM Modeling

• eXpress Modeling Tool

• Model Based Reasoning

• Case Based Reasoning

• Knowledge Bases

• Prognostics Analysis Tools

23

eXpress Modeling Tool

[Diagram: eXpress modeling tool and the activities it supports]

Outcomes shown: Mission Assurance, Availability & Success; Performance Based Logistics; Run-Time Prognostic Health Management

Activities shown: Data Mining; Sensor Fusion; Life Cycle Tradespace; Requirements Analysis; Diagnostic, Prognostic & PHM Design; Risk Assessment; Business Cases; FRACAS & FMECA Development; CONOPS, Specs & Logistics

24

Impact Technologies

Prognostics developed at Impact Technologies:
• Gas Turbine Engines and Auxiliary Systems
• Avionics PHM and Reasoning
• Aircraft Actuators (EMA, EHA)
• Switching Mode Power Supplies, GPS Receivers and Power Electronics
• Generators and Electric Drive Systems
• Bearings, Gears, Shafts, Drive Trains, and Clutches
• Hydraulic, Lube Oil and Fuel Systems
• Structures and Components
• Diesel Engines

25

Impact Technologies

Prognostics modules have been developed and successfully tested on the following systems:
• Pratt & Whitney F-100 engine on F-15 and F-22
• Engine, generator, lubrication system and gearbox on Honeywell F124
• Oil wetted components on GE F110-129, GE F404, Rolls Royce F405
• CH-47 T-55 engine and drive-train
• CH-60 intermediate gearbox
• Blackhawk Carrier Plate Prognosis System
• JSF Clutch Wear and Lift-Fan Prognosis System
• Fuel system and Power generation system on DDG-class Navy Ships

26

Impact Technologies

A number of different techniques have been used in the development of these prognostics:
• Analytical and stochastic physics of failure models
• Advanced signal processing
• Feature extraction methods
• Health state estimation and prediction algorithms
• Statistical reliability
• Bayesian updating methods
• Component damage accumulation models
• Probabilistic remaining useful life estimation
• Data driven modeling techniques

27

Model Based Reasoning

Model Based Reasoning (MBR) is a qualitative scheme where a model of the system is combined with an inference engine that is able to accomplish fault detection and fault isolation. The qualitative model, or ‘Knowledge Base’, describes the system elements and components, their interconnections, and the input/output behavior of the system being diagnosed, and establishes an envelope of ‘correct behavior’. To accomplish diagnosis, the actual behavior of the system is compared with the behavior predicted by the model. The inference engine uses the resulting differences to accomplish the fault isolation task.
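The model-versus-measurement comparison can be sketched as follows. Everything specific here is a hypothetical stand-in: the resistive heater model, the tolerance values that define the envelope of correct behavior, and the lookup table playing the role of the inference engine.

```python
def heater_model(voltage_v, resistance_ohm):
    """Hypothetical system model: Ohm's law and electrical power for a heater."""
    current_a = voltage_v / resistance_ohm
    return {"current_a": current_a, "power_w": voltage_v * current_a}


def detect_discrepancies(expected, measured, tolerance):
    """Fault detection: outputs whose measured value leaves the model's envelope."""
    return frozenset(
        name for name, value in measured.items()
        if abs(value - expected[name]) > tolerance[name]
    )


# Hypothetical inference table: discrepancy pattern -> most likely cause.
FAULT_SIGNATURES = {
    frozenset(): "no fault detected",
    frozenset({"current_a", "power_w"}): "heating element degraded or open",
    frozenset({"power_w"}): "power sensor suspect",
    frozenset({"current_a"}): "current sensor suspect",
}


def diagnose(expected, measured, tolerance):
    """Fault isolation: map the discrepancy pattern to a candidate cause."""
    pattern = detect_discrepancies(expected, measured, tolerance)
    return FAULT_SIGNATURES.get(pattern, "unknown fault pattern")


expected = heater_model(voltage_v=120.0, resistance_ohm=12.0)
tolerance = {"current_a": 0.5, "power_w": 50.0}
verdict = diagnose(expected, {"current_a": 0.0, "power_w": 0.0}, tolerance)
```

A real inference engine reasons over the interconnection model rather than a fixed table; the table simply keeps the sketch short.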

28

Case Based Reasoning

Case Based Reasoning (CBR) is the process of solving problems based on past understanding of similar problems. The vast majority of this type of information resides with the maintainers and operators, in the experience and knowledge of the people using the system in question. CBR retrieves a similar past case, forms an implicit generalization of that case, and then identifies commonalities between the retrieved case and the target problem.
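The retrieval step of CBR can be illustrated with a small similarity search. The case library, the symptom names and the choice of Jaccard similarity are assumptions for illustration; the briefing does not prescribe a similarity measure.

```python
def jaccard(a, b):
    """Similarity of two symptom sets: shared symptoms over all symptoms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_case(case_library, symptoms):
    """Return the stored case most similar to the target problem."""
    return max(case_library, key=lambda case: jaccard(case["symptoms"], symptoms))


# Hypothetical maintenance history captured from maintainers.
CASES = [
    {"symptoms": {"high vibration", "high oil temp"}, "fix": "replace bearing"},
    {"symptoms": {"low oil pressure", "high oil temp"}, "fix": "replace oil pump"},
    {"symptoms": {"no start", "low battery voltage"}, "fix": "replace battery"},
]

best = retrieve_case(CASES, {"high vibration", "high oil temp", "metal in oil"})
```

The retrieved case suggests a fix to try first; confirming or rejecting it becomes a new case for the library.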

29

Knowledge Bases

[Diagram: contents of and inputs to the PHM Knowledge Base]

Knowledge Base: FMECA data; fault/failure propagation; system level interactions; functional interdependencies; physical interdependencies; design knowledge; prognostic trend analysis; CAD models; circuit layouts

Other labels shown: ‘inorganic’ sensor data; ‘organic’ sensor data; subsystem/LRU internal sensor data; BIT data; consumables monitors; sensor fusion and signal conditioning; maintainer inputs; Database Management: Data Mining & Feature Extraction; off-board prognostic trend analysis

30

Prognostic Analysis Tools

Learning Systems & Artificial Intelligence
• Genetic Algorithms
• Expert Systems
• Fuzzy Logic
• Neural Networks

Database Techniques
• Feature Extraction
• Data Mining

Mathematical Techniques
• Kalman Filtering
• Dempster-Shafer Method
• Wavelets
• Statistical Analysis
• Chaos Math?
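As one concrete instance of the mathematical techniques listed, here is a minimal scalar Kalman filter for smoothing a noisy sensor reading. It assumes the underlying quantity is roughly constant (a random-walk model); the variance values and initial state are illustrative assumptions, not tuned settings.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly varying level.

    x is the state estimate, p its variance. Returns the estimate after
    each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var      # predict: uncertainty grows between samples
        k = p / (p + meas_var)   # Kalman gain: how much to trust the measurement
        x = x + k * (z - x)      # update the estimate toward the measurement
        p = (1 - k) * p          # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


# Hypothetical constant signal of 5.0 observed 20 times.
estimates = kalman_1d([5.0] * 20, process_var=0.0, meas_var=1.0)
```

With zero process variance the estimate approaches the true level monotonically; a non-zero process variance lets the filter follow a drifting, degrading signal.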

31


Traditional Academic Solutions to PHM:
• Run-to-Failure analysis of large, expensive systems, such as ship or rail engines
• Analysis involves impractical, complex math models that require years of training to understand and interpret
• Very expensive
• Time consuming process
• Rarely offer concrete design guidelines or solutions

32


Why Engineers in Industry Need More:
• We have bottom lines and schedules to meet!
• We have customer requirements to satisfy!
• Systems Engineers work with designers who don’t like impractical, complex math models that require years of training to understand and interpret!
• We have program managers who don’t like very expensive, time consuming solutions!
• We like concrete design guidelines and solutions!

33

Sensor Technology

• BIT/BITE

• Sensor Fusion and Virtual Sensors

• Sensor Conditioning and Filtering

• Smart Sensors

34

Availability Analysis

• Availability, Achieved

$$A_A = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{MTBF}{MTBF + MTTR}$$

where

MTBF = Mean Time Between Failure

MTTR = Mean Time To Repair
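A quick numeric check of the achieved availability ratio, MTBF / (MTBF + MTTR), is sketched below. The function name and the sample values (500 hours between failures, 5 hours to repair) are illustrative assumptions.

```python
def achieved_availability(mtbf_hours, mttr_hours):
    """A_A = MTBF / (MTBF + MTTR): the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system: fails every 500 h on average, takes 5 h to repair.
a_a = achieved_availability(500.0, 5.0)   # about 0.990
```

Halving MTTR to 2.5 hours raises the result to about 0.995, the same gain as doubling MTBF to 1000 hours.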

35

• Availability, Operational

$$A_O = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{MTBUMA}{MTBUMA + ALDT + MTTR}$$

where

MTBUMA = Mean Time Between Unscheduled Maintenance Actions

ALDT = Administrative Logistical Down Time

MTTR = Mean Time To Repair


36

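• MTBUMA = Mean Time Between Unscheduled Maintenance Actions

$$MTBUMA = \left( \frac{1}{MTBF} + \frac{1}{MTBM_{\text{induced}}} + \frac{1}{MTBM_{\text{no defect}}} \right)^{-1}$$

where

MTBF = Mean Time Between Failures

MTBM = Mean Time Between Maintenance (for induced and no-defect maintenance actions)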
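The sketch below chains the two relationships: MTBUMA as the reciprocal of the summed rates of failures, induced maintenance and no-defect maintenance, then operational availability as MTBUMA / (MTBUMA + ALDT + MTTR). That reading of the slide formulas, the function names and all numbers are assumptions for illustration.

```python
def mtbuma(mtbf, mtbm_induced, mtbm_no_defect):
    """Combine three unscheduled-maintenance rates into one mean interval."""
    return 1.0 / (1.0 / mtbf + 1.0 / mtbm_induced + 1.0 / mtbm_no_defect)


def operational_availability(mtbuma_hours, aldt_hours, mttr_hours):
    """A_O = MTBUMA / (MTBUMA + ALDT + MTTR)."""
    return mtbuma_hours / (mtbuma_hours + aldt_hours + mttr_hours)


# Hypothetical values, all in hours.
interval = mtbuma(mtbf=600.0, mtbm_induced=1200.0, mtbm_no_defect=1200.0)
a_o = operational_availability(interval, aldt_hours=20.0, mttr_hours=5.0)
```

In this example, induced and no-defect maintenance halve the 600-hour MTBF to a 300-hour MTBUMA, and the 20 hours of ALDT costs more availability than the 5-hour repair itself.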


37

• How can we improve AO?

- By decreasing Administrative & Logistical Down Time (ALDT)

- By increasing Mean Time Between Failures (MTBF)

- By decreasing Mean Time To Repair (MTTR)

- By increasing Mean Time Between Unscheduled Maintenance Actions (MTBUMA) – [by increasing MTBM induced and MTBM no defect]


38

• How can we decrease ALDT?

- By improving Logistics
  – Improve scheduling of inspections
  – Improve commonality of parts
  – Decrease time to get replacements

- By improving Prognostics
  – Replace parts before they fail, not after
  – Maximize use of component life
  – Improve off-board prognostics trending
  – More sensors!!


39

• How can we increase MTBF?

- By improving Reliability
  – Select more rugged components
  – Improve life screening and testing
  – Improve thermal management

- By improving Quality
  – Better parts screening
  – Better manufacturing processes

- By adding Redundancy
  – At the cost of Size, Weight and Power!


40

• How can we decrease MTTR?

- By improving Maintainability
  – Improve quality and efficacy of training
  – Simplify fault isolation
  – Decrease number of tools and special equipment
  – Decrease access time (panels, connectors…)
  – Improve Preventative Maintenance

- By improving Diagnostics
  – Improve BIT and BITE
  – Decrease ambiguity group size
  – Improve maintenance manuals and training


41

• How can we increase MTBM (induced/no defect)?

- By improving Safety
  – Limit the potential for accidental damage

- By improving Prognostics
  – Improve PHM models to monitor induced damage

- By improving Diagnostics
  – Lower the false alarm rate
  – Don’t repair/replace things which aren’t broken!


42

Sensor Example

Engine Health/Performance Monitoring:

• Place an acoustic sensor on the engine housing.
• Establish ‘nominal’ operating parameters.
• Develop a library relating fault precedents (odd sounds which warn of impending failure) to failures.
• Monitor for ‘out of nominal’ acoustic signature.
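The "establish nominal, then monitor for out of nominal" steps can be sketched with a simple statistical band. This is only an illustration: it reduces the acoustic signature to a single sound-level number, and the mean plus or minus three standard deviations rule and the sample readings are assumptions.

```python
from statistics import mean, stdev


def nominal_band(baseline_levels, k=3.0):
    """Learn a 'nominal' band as mean +/- k standard deviations of a baseline."""
    mu = mean(baseline_levels)
    sigma = stdev(baseline_levels)
    return mu - k * sigma, mu + k * sigma


def out_of_nominal(levels, band):
    """Return the indices of readings that fall outside the nominal band."""
    low, high = band
    return [i for i, level in enumerate(levels) if not low <= level <= high]


# Hypothetical sound levels (dB) recorded on a healthy engine.
baseline = [60, 61, 59, 60, 62, 58, 60, 61, 59, 60]
band = nominal_band(baseline)
alerts = out_of_nominal([60, 61, 70, 59], band)   # third reading is anomalous
```

A fielded system would compare spectral features against the fault-precedent library rather than a single level, but the monitoring logic has the same shape.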

43

PHM Example

Consider a toaster: Not just any toaster, but the toaster on the first mission to Mars. NASA could only afford to send one, and it must work, every time, or else the astronauts won’t have toast. The toaster must also not endanger the mission by causing a safety hazard or wasting bread.

Mission Critical Function:
- make toast

Safety Critical Functions:
- don’t injure the astronauts
- don’t damage the spaceship
- don’t burn the toast!

44

PHM Example

• Identify the elements of a toaster.

• What are the failure modes?

• What should we monitor for safety hazards?

• What elements should we monitor for diagnostics?

• What data should we collect for prognostics?

• How would we optimize the sensor coverage and data collection?

45

Issues Related to PHM

• Continually monitoring sensors and storing all that data for analysis will quickly consume available bandwidth and storage space.

• Capturing ‘profound knowledge’ of a complex engineered system and its myriad failure modes is very difficult, and involves integrating knowledge which crosses discipline boundaries: SE, EE, ME, RAM-T, Safety, Software, Math, Statistics, Physics…

• Prognostic analysis of data is a very difficult problem, with no easy or universal solution.

• PHM is a relatively new field.
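The storage concern is easy to quantify with back-of-envelope arithmetic. The sensor count, sample rate and sample size below are hypothetical values chosen for illustration, not figures from any program.

```python
def raw_data_bytes(n_sensors, sample_rate_hz, bytes_per_sample, hours):
    """Bytes produced by recording every sample from every sensor."""
    return n_sensors * sample_rate_hz * bytes_per_sample * hours * 3600


# Hypothetical platform: 50 sensors at 1 kHz, 2 bytes per sample, 8-hour mission.
mission_bytes = raw_data_bytes(50, 1000, 2, 8)
mission_gigabytes = mission_bytes / 1e9   # 2.88 GB per mission
```

Numbers like these are why on-board feature extraction and selective recording matter: the raw stream grows linearly with every added sensor and every added hour.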

46

Final Remarks

• Do I have any practical PHM suggestions?

- Aim for the low hanging fruit
  – Use the sensors you already have in creative ways.
  – Only add sensors when you must.
  – You can’t monitor everything, so don’t try.

- Don’t reinvent the wheel
  – Build on others’ work and experience.
  – Find good tools to design your system.

47

Additional Prognostic Analysis Tool
