
Page 1: National Aeronautics and Space Administration Adapting

National Aeronautics and Space Administration

IMM

Adapting NASA-STD-7009 to Assess the Credibility of Biomedical Models and Simulations

Lealem Mulugeta1, Marlei Walton2, Emily Nelson3 and Jerry Myers3

1. Universities Space Research Association, DSLS 2. Wyle Science, Technology & Engineering Group

3. NASA Glenn Research Center

ASME Verification and Validation Conference May 7-9, 2014 – Las Vegas, NV

Page 2

Background

• The standard was initially developed for engineering systems M&S

• NASA’s Digital Astronaut Project (DAP) and Integrated Medical Model (IMM) have successfully adapted NASA-STD-7009 to biomedical models used in clinical, management, and research applications

• Given the highly comprehensive nature of the standard, substantial steps have been taken to establish a systematic process for applying it to HRP needs:
– Systematic analysis of model application criticality
– Weighting of factors for consistency with the model application


Page 3

M&S Criticality and Risk Assessment

• How are the models and simulations (M&S) going to be used?
– What are the decisions to be made?
– Is it for research or clinical applications?
– Do the M&S provide insight to guide decisions, or are they the decision-making tool?
– Is there substantial data to strengthen confidence in the results?

• What is the impact on human health or the mission?


Must apply 7009 per HRP-47069

[Criticality/risk chart positioning DAP: Reduced-g, DAP: Micro-g, and IMM]

Page 4

Main Elements of 7009

1. System & Analysis Frameworks – This is where the evaluator documents details regarding the real world system (RWS) to be represented and includes the basic structure of the M&S, along with the abstractions and assumptions.

2. M&S Analysis Results & Caveats - This is where the evaluator documents details regarding the uncertainty in the M&S results and any further qualifying statements surrounding the analysis.

3. M&S Credibility Assessment – This is where the evaluator documents details regarding the integrity of the data and processes used to develop and vet the M&S.


Page 5

M&S System & Analysis Frameworks (Scope)


Legend (A) – Analyst (D) – Developer (O) – Operator

Page 6

M&S Analysis Results & Caveats


Legend (A) – Analyst (D) – Developer (O) – Operator

Page 7

Credibility Levels of Evidence – Sufficiency Thresholds

Red: IMM thresholds
Blue: DAP Biomechanics model thresholds

Page 8

Technical Review Subfactor Scoring

Levels for the Technical Review subfactor (subfactor specific):
4: Favorable external peer review accompanied by independent factor evaluation
3: Favorable external peer review
2: Favorable formal internal review
1: Favorable informal internal review
0: Insufficient evidence

Example:
Evidence subfactor scoring level: weight 0.7, assessed score 2
Review subfactor scoring level: weight 0.3, assessed score 3
Factor score: 2.3

• 5 factors require technical review: Verification, Validation, Input Pedigree, Results Uncertainty, Results Robustness
• Review (subfactor) scoring listed in the table
• Weighting between evidence and peer review is customer defined
• Peer review NOT to be weighted more than 30%
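The subfactor roll-up in the example above can be sketched as a simple weighted sum of the two subfactor levels. This is an illustrative sketch only; the function and parameter names are not taken from NASA-STD-7009.

```python
# Sketch of the technical-review subfactor roll-up, assuming a weighted
# sum of the evidence and peer-review subfactor levels (each 0-4).
# Function and parameter names are illustrative.

def factor_score(evidence_level, review_level,
                 evidence_weight=0.7, review_weight=0.3):
    """Combine the evidence and peer-review subfactor levels into one
    factor score; peer review may not be weighted more than 30%."""
    if review_weight > 0.3:
        raise ValueError("peer-review weight must not exceed 0.3")
    if abs(evidence_weight + review_weight - 1.0) > 1e-9:
        raise ValueError("subfactor weights must sum to 1.0")
    return evidence_weight * evidence_level + review_weight * review_level

# The worked example above: evidence level 2 (weight 0.7),
# review level 3 (weight 0.3)
print(round(factor_score(2, 3), 3))  # 2.3
```

With the slide's numbers this reproduces the example factor score: 0.7 × 2 + 0.3 × 3 = 2.3.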

Page 9

Credibility Assessment Matrix: Proposed Weighting Strategy

Factor (proposed Deterministic / Probabilistic weight):
1. Verification: 0.2 / 0.075
2. Validation: 0.25 / 0.1
3. Input Pedigree: 0.1 / 0.175
4. Results Uncertainty: 0.1 / 0.2
5. Results Robustness: 0.1 / 0.15
6. Use History: 0.15 / 0.15
7. M&S Management: 0.05 / 0.05
8. People Qualifications: 0.05 / 0.1

TOTAL: 1.0 / 1.0

0.05 ≤ Wi ≤ 0.25; ΣWi = 1

Based on Application by DAP and IMM for HRP
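The weighting strategy above amounts to a weighted sum, CS = ΣWi·Si, subject to the stated constraints. A minimal sketch, using the proposed deterministic weights; the function name and the sample factor scores are illustrative:

```python
# Minimal sketch of the weighted credibility score CS = sum(Wi * Si),
# with the slide's constraints: 0.05 <= Wi <= 0.25 and sum(Wi) = 1.
# Sample scores below are hypothetical, for illustration only.

DETERMINISTIC_WEIGHTS = {
    "Verification": 0.20, "Validation": 0.25, "Input Pedigree": 0.10,
    "Results Uncertainty": 0.10, "Results Robustness": 0.10,
    "Use History": 0.15, "M&S Management": 0.05,
    "People Qualifications": 0.05,
}

def credibility_score(weights, scores):
    """Weighted credibility score over factor scores in the 0-4 range."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    for name, w in weights.items():
        if not 0.05 <= w <= 0.25:
            raise ValueError(f"weight for {name} outside [0.05, 0.25]")
    return sum(w * scores[name] for name, w in weights.items())

# Hypothetical assessment: every factor scored at level 2
scores = {name: 2 for name in DETERMINISTIC_WEIGHTS}
print(round(credibility_score(DETERMINISTIC_WEIGHTS, scores), 3))  # 2.0
```

Because the weights sum to 1, uniform factor scores pass through unchanged, which is a quick sanity check on any weighting scheme.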

Page 10

Example of Credibility Scoring – With Factor Weighting


*Threshold: The required score agreed to by the end-user/customer and M&S provider to achieve sufficient confidence in the M&S for intended use

Unweighted – Model would have a CS = 0

Page 11

Lessons Learned and Takeaways

• The sooner M&S credibility assessment is integrated into the M&S development and implementation process, the more likely:
– Researchers and decision makers gain confidence in the M&S
– The M&S can have a positive impact on biomedical research and operations
– The greater medical community will see the potential of M&S to inform clinical interventions

• The sooner the end-user/customer is engaged to inform the M&S development and implementation process, the more likely the end product will have a higher impact

• It is important to appropriately weight the different credibility assessment factors for the problem of interest
– M&S should be applied within their validation domain to maintain the highest confidence in results

• The greater medical community recognizes the importance of rigorously vetting computational models and looks to NASA for leadership


Getting It Right: Better Validation Key to Progress in Biomedical Computing - Bringing models closer to reality - 10/19/12

Page 12

National Aeronautics and Space Administration

IMM

Thank you! Questions?

Page 13

National Aeronautics and Space Administration

IMM

Backup slides


Page 14

Verification and Validation

• Verification is the process of determining whether the model implementation accurately represents the developer’s conceptual/mathematical description (underlying physical principles) and its solution.

• Validation is the process of determining the degree to which a model is an accurate representation of the real-world system from the perspective of the intended uses of the model (e.g., comparing simulated exercise outputs with data from subjects performing the same exercise).
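One illustrative way to quantify such a simulated-versus-measured comparison (a sketch, not a method prescribed by NASA-STD-7009) is a root-mean-square error over paired samples. All names and data values here are hypothetical:

```python
# Illustrative validation metric: RMS error between simulated outputs
# and measurements from subjects performing the same exercise.
# The data values below are hypothetical.
import math

def rms_error(simulated, measured):
    """RMS difference between paired simulated and measured samples."""
    if len(simulated) != len(measured):
        raise ValueError("series must be the same length")
    return math.sqrt(
        sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(simulated)
    )

sim = [100.0, 110.0, 120.0]   # simulated output, e.g. a joint load (N)
meas = [98.0, 113.0, 119.0]   # measured values for the same trial
print(round(rms_error(sim, meas), 2))  # 2.16
```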


Page 15

What is NASA-STD-7009?

Comprehensive set of requirements and processes for developing and applying models and simulations (M&S)
• Credibility assessment ensures that the application domains of the M&S are appropriate
• Provides a foundation for deriving the confidence level for any given M&S
• Documentation is critical for appropriate interpretation of the M&S results by the end-user


Page 16

NASA-STD-7009: Standard for Models and Simulations (7009)

https://standards.nasa.gov/documents/detail/3315599

Page 17

M&S Implementation Key Personnel

• Operators (O) – Execute the model to perform a simulation; generally the least technical role, but the most familiar with using the model.

• Analysts (A) – Usually define the initial conditions and boundaries of a simulation and review its results. Above all, analysts are responsible for the credibility/validation of the simulations (not the model).

• Developers (D) – Develop the fundamental principles and mathematical abstractions of the model. They can (and should) play a role in the other two areas; however, their responsibility is the scientific/technical application of various principles to provide a means of creating relevant simulations. They are responsible for the credibility and validation of the model.


Page 18

Credibility Levels of Evidence


Page 19

Credibility Levels of Evidence - Thresholds

[Bar chart: sufficiency threshold levels (0–4) for each credibility factor – Verification, Validation, Input Pedigree, Results Uncertainty, Results Robustness, Use History, M&S Management, People Qualifications – comparing DAP Biomechanics model thresholds with IMM thresholds]

Page 20

Weighting of Credibility Assessment Score – Deterministic M&S

Deterministic Models and Simulations – default weights and rationale:

Verification (0.2): The complexity of such models requires that verification of the implementation of the underlying concept be of relatively high importance to the model.

Validation (0.25): Given the use of the model, achieving the customer’s desired level of validation, quantified by direct comparison to the real-world system, is considered imperative and is assigned the highest weighting possible.

Input Pedigree (0.1): Although important, the IP is assumed to be at the highest level possible due to limited HRP data set availability. Weighting should reflect this situational condition.

Results Uncertainty (0.1): From an HRP point of view, RU is more critical in understanding the limits of the validation activity. Weighting should reflect the partial capture of this parameter under the validation condition.

Results Robustness (0.1): Sensitivity of the model to parameter variation is partially captured in the validation parameter. Weighting should reflect the importance of understanding model performance outside the known operational space.

Use History (0.15): Under HRP, successful use of the model for decision or research support in respected works is considered important and is thus weighted third in the overall weighting strategy.

M&S Management (0.05): Management is relatively equal during the model development activities due to program and project oversight and required processes. Weighting reflects these in-place conditions.

People Qualifications (0.05): Although critical in general, the use of competitive peer review of proposed work is considered to recruit qualified people specific to the model application. Weighting reflects this built-in quality control.

Total: 1.0 (NOTE: the sum of the weightings must equal 1.0)

Developed by DAP and IMM for HRP

The customer/end-user may use a different weighting scheme, but the minimum weight that can be assigned to any factor is 0.05 and the maximum is 0.25.

Page 21

Weighting of Credibility Assessment Score – Probabilistic M&S

Probabilistic Models and Simulations – default weights and justification:

Verification (0.075): Such models are considered mathematically straightforward. Verification remains important; however, the implementation of the underlying model is not considered complex, so less weight is placed on its contribution to credibility.

Validation (0.1): Achieving the customer’s desired level of validation remains important, although quantified direct comparison to the real-world system is difficult. It should contribute significantly to overall credibility whenever performing the validation is possible.

Input Pedigree (0.175): The second most critical factor in defining likelihood and consequence. The assumption is that IP must be at the highest level possible due to limited HRP data set availability. Weighting should reflect this important situational condition.

Results Uncertainty (0.2): From an HRP point of view, RU is the most critical in capturing knowledge regarding likelihood and consequence. Weighting should reflect the importance of this parameter under the validation condition.

Results Robustness (0.15): Sensitivity of the model to parameter variation is critical in understanding the importance of contributing parameters in the underlying logic. Weighting should reflect this importance.

Use History (0.15): Under HRP, successful use of the model for decision or research support in respected works is considered important and is thus weighted highly in contributing to credibility.

M&S Management (0.05): Management is relatively equal during the model development activities due to program and project oversight and required processes. Weighting reflects these in-place conditions.

People Qualifications (0.1): Although critical in general, the use of competitive peer review of proposed work is considered to recruit qualified people specific to the model application. Weighting reflects this built-in quality control.

Total: 1.0 (NOTE: the sum of the weightings must equal 1.0)

Developed by DAP and IMM for HRP

The customer/end-user may use a different weighting scheme, but the minimum weight that can be assigned to any factor is 0.05 and the maximum is 0.25.

Page 22

Key Steps to Credibility Assessment

1. Sufficiency threshold levels need to be established for each credibility factor (highly dependent on the available data and expertise)

2. The target user community or customer should be consulted in setting the minimum thresholds
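The two steps above can be sketched as a simple per-factor sufficiency check. The threshold and score values below are illustrative, not the actual DAP or IMM thresholds:

```python
# Sketch of a per-factor sufficiency check: compare each factor's
# assessed credibility level against the threshold agreed with the
# customer. Thresholds and scores here are hypothetical.

def insufficient_factors(assessed, thresholds):
    """Return the factors whose assessed level falls below the agreed
    sufficiency threshold (an empty list means the M&S is sufficient)."""
    return [f for f, t in thresholds.items() if assessed.get(f, 0) < t]

thresholds = {"Verification": 2, "Validation": 3, "Input Pedigree": 2}
assessed = {"Verification": 3, "Validation": 2, "Input Pedigree": 2}
print(insufficient_factors(assessed, thresholds))  # ['Validation']
```

Reporting the shortfall per factor, rather than a single pass/fail, mirrors the deck's practice of comparing each factor's score to its own threshold.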


Page 23

Example: DAP’s M&S Development and Implementation Strategy


ARED M&S have had an impact on exercise research and operations sooner than anticipated and continue to provide high value

Page 24

M&S Validation and Application Domain


Page 25

M&S Credibility Assessment (1 of 2)


Sufficiency threshold = target score

Page 26

M&S Credibility Assessment (2 of 2)


Sufficiency threshold = target score

Page 27

Visual Representation of Credibility Assessment


[Spider (radar) plot of per-factor credibility scores against the sufficiency threshold]

Page 28

Impact in the Medical/Healthcare Field (3 of 3)


Getting It Right: Better Validation Key to Progress in Biomedical Computing - Bringing models closer to reality

The ground laid by DAP and IMM was featured in the 2012 fall issue (10/19/12) of the Biomedical Computation Review magazine and lauded as a “Comprehensive Validation” method.

http://biomedicalcomputationreview.org/content/getting-it-right-better-validation-key-progress-biomedical-computing

Page 29

Example of Credibility Scoring – Without Factor Weighting


*Threshold: The required score agreed to by the end-user/customer and M&S provider to achieve sufficient confidence in the M&S for intended use

Page 30

Impact in the Medical/Healthcare Field (2 of 3)

• As a direct consequence of a presentation given to NIH/IMAG regarding how NASA uses 7009 to vet biomedical models, the Food and Drug Administration is heavily leveraging 7009 to develop a new standard for “Verification and Validation of Computational Modeling of Medical Devices”

• The FDA regularly consults with IMM and DAP in the development of this new standard

• The DAP Project Scientist has been invited to be a member of the ASME V&V40 Subcommittee that is working with the FDA to develop the standard


Page 31

Weighting of Credibility Assessment Score – Overview

Factor: Deterministic / Probabilistic / Uniform
Verification: 0.2 / 0.075 / 0.125
Validation: 0.25 / 0.1 / 0.125
Input Pedigree: 0.1 / 0.175 / 0.125
Results Uncertainty: 0.1 / 0.2 / 0.125
Results Robustness: 0.1 / 0.15 / 0.125
Use History: 0.15 / 0.15 / 0.125
M&S Management: 0.05 / 0.05 / 0.125
People Qualifications: 0.05 / 0.1 / 0.125

Total: 1.0 / 1.0 / 1.0

The customer/end-user may use a different weighting scheme, but the minimum weight that can be assigned to any factor is 0.05 and the maximum is 0.25.

Developed by DAP and IMM for HRP