TRANSCRIPT
Injecting Statistical Rigor to Drive Your Modeling & Simulation Validation Program
Jim Simpson, Jim Wisnowski, Stargel Doane, Laura Freeman, Kelly Avery
ITEA, Dec 2018
DEFINITIONS, POLICY AND EXAMPLES
Introduction
• Modeling and Simulation (M&S):
• "The discipline that comprises the development and/or use of models and simulations." (DoDD 5000.59, DoDI 5000.61)
• "The use of models, including emulators, prototypes, simulators, and stimulators, either statically or over time, to develop data as a basis for making managerial or technical decisions." (DMSCO M&S Glossary)
• "Using computers to imitate, or simulate, the operations of various kinds of real-world facilities or processes (a system)." (Law 2015)
Live Testing and Simulation Integrated
[Figure: the experiment path and the modeling-and-simulation path both approximate reality, producing experimental data and a simulation result that can be compared]
Adapted from Coleman, H. W., & Steele, W. G. (2018). Experimentation, validation, and uncertainty analysis for engineers. John Wiley & Sons.
Policy Example
• Per DOT&E Memorandum: Guidance on the Validation of Models and Simulation used in Operational Test and Live Fire Assessments dated 16 March 2016
• “Whenever M&S is used for operational test and evaluation, I need to have the same understanding of and confidence in the data obtained from M&S as I do any other data collected during an operational or live fire test.”
• "…I expect the validation of M&S to include the same rigorous statistical and analytical principles that have become standard practice when designing live tests."
DOD M&S Example: Space System
• Examine coverage, probability of detection, data collection, etc.
DOD M&S Example: Missile Program
• Estimate miss distance and probability of kill for missiles intercepting threats
Image courtesy of US Air Force, approved for public release
PLANNING FOR M&S V&V
Simulation Information vs. Test Data
Adapted From Coleman, H. W., & Steele, W. G. (2018). Experimentation, validation, and uncertainty analysis for engineers. John Wiley & Sons.
Design of Experiments (DOE)
• Statistical Design of Experiments: "the process of planning the experiment so that appropriate data that can be analyzed by statistical methods will be collected, resulting in valid and objective conclusions." (Montgomery, 2004)
• Refers to a methodology (or overall thought process)
• Not limited to factorial designs
• Proven in industry and the DoD
• Sequential experimentation: Screen → Characterize → Optimize
Risk (Coverage and Sample Size)
• Coverage of the operational envelope and sample size drive validation risk: Referent vs. Simulation Prediction
[Figure: referent coverage plotted against simulation coverage; a small N and a simple model give high risk, while a large N and a complex model give low risk]
Model-Test-Model Method for V&V
• Build the M&S: intended use and requirements focused
• Collect M&S data via design: use DOE for defining runs
• Analyze M&S runs: inform live fire, sensitivity analysis, statistical emulator
• Collect SME feedback: qualitative "first cut" at validation
• Collect live-fire data via design
• Compare M&S to live
• Use calibration to update model
Source: MITRE
TEST DESIGN
Factors in a Test Space
We compare designs by displaying test points allocated to combinations of factors and levels geometrically, where each factor occupies a physical dimension.
With a single factor (Range) at two levels (4, 8), the design can be displayed as a line segment.
With two factors (Range, Altitude) at two levels each (4, 8 for Range; 5000, 10000 for Altitude), the design can be displayed as a square.
[Figure: one-factor design on the Range (Factor A) axis at levels 4 and 8; two-factor design on the A: Range and B: Altitude axes at levels 4, 8 and 5000, 10000. The corner at Range = 4, Altitude = 10000 is one weapon release condition]
3-Factor Test Space and Design
Add the third factor (Airspeed) with two levels (450, 650), and the 3-factor test space is a cube over Range (4, 8), Altitude (5000, 10000), and Airspeed (450, 650).
There is one test space point for each run in the test design table.
[Figure: cube plot of the test space with axes A: Range, B: Altitude, C: Airspeed]
Test Design Table (one row per test space point):

A: Range   B: Altitude   C: Airspeed
4          5000          450
8          5000          450
4          10000         450
8          10000         450
4          5000          650
8          5000          650
4          10000         650
8          10000         650
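The full-factorial table above can be generated programmatically. A minimal sketch, using the factor names and levels from the slide (nothing here is part of the original tutorial's tooling):

```python
from itertools import product

# Factor levels from the slide's 3-factor example
levels = {
    "Range": [4, 8],
    "Altitude": [5000, 10000],
    "Airspeed": [450, 650],
}

# Full 2^3 factorial: one run per combination of factor levels
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]
for run in runs:
    print(run)
```

Each dictionary printed is one row of the test design table; the same pattern extends to any number of factors and levels.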
Fractional Factorial Design – 4 Factors / Mixed-level Fractional Factorial Design
[Figure: cube plots in factors A, B, C shown at the low (D−) and high (D+) levels of factor D]
Experiment Designs
Model for main effects and two-factor interactions:

Y = β0 + Σ(i=1..k) βi·xi + Σ(i<j) βij·xi·xj + ε

Higher Order Design – Nested CCD / Face-Centered Design (FCD)
Model adding pure quadratic and cubic terms:

Y = β0 + Σ(i=1..k) βi·xi + Σ(i<j) βij·xi·xj + Σ(i=1..k) βii·xi² + Σ(i=1..k) βiii·xi³ + ε
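The number of terms in each model sets a floor on the number of runs a design must support. A short sketch of that count (the function names are mine, not from the tutorial):

```python
from math import comb

def n_params_me_2fi(k):
    # Intercept + k main effects + C(k, 2) two-factor interactions
    return 1 + k + comb(k, 2)

def n_params_me_2fi_quad_cubic(k):
    # Adds k pure quadratic and k pure cubic terms, as in the
    # higher-order model above
    return n_params_me_2fi(k) + 2 * k

for k in (3, 5, 7):
    print(k, n_params_me_2fi(k), n_params_me_2fi_quad_cubic(k))
```

A design with fewer runs than parameters cannot estimate the model; extra runs beyond the minimum buy power, pure-error estimates, and lack-of-fit checks.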
Metrics for Evaluating Designs

Screen / Compare / Characterize designs:
• Objectives: Screen, Compare, Characterize
• Assumptions: replicates for pure error; same factors; General Model = ME + 2FI; Type I error = 0.05; response type: numeric
• Criteria: sample size; power (ME, 2FI); orthogonality; terms aliased (word length count); VIF; categoric balance; interaction balance (GBM); partial aliasing; model misspecification (lack of fit: 3FI, curvature, quadratic); range of inputs; robustness to outliers/missing data; points total for replication/lack of fit; functionality; levels per factor (intended vs. design)

Estimate / Predict / Optimize / Map designs:
• Objectives: Estimate, Predict, Optimize, Map
• Assumptions: same factors; General Model = ME + 2FI + PQ; Type I error = 0.05
• Criteria: sample size; prediction variance (50%, 90%, 95% FDS); G-efficiency (min max prediction); I-efficiency (avg prediction variance); replicates for pure error; orthogonality; VIF; prediction uniformity; rotatability; uniform precision; model misspecification (lack of fit: 3FI, pure cubic); range of inputs; sensitivity to outliers/missing data; influence/leverage; points total for replication/lack of fit; minimize Euclidean distance among points; design functionality; levels per factor; number of evenly spaced levels
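Power, one of the criteria listed above, can be approximated from first principles for a two-level balanced design. A rough stdlib sketch, assuming a two-sided z-test on a single main effect with known σ (a normal approximation, not the exact t-based calculation design software performs):

```python
from statistics import NormalDist

def factorial_power(n_runs, delta_over_sigma, alpha=0.05):
    # In a balanced two-level design with +/-1 coding, the effect
    # estimate (high average minus low average) has standard error
    # 2*sigma/sqrt(n_runs); work in sigma units.
    nd = NormalDist()
    se = 2 / n_runs ** 0.5
    z_crit = nd.inv_cdf(1 - alpha / 2)
    z_shift = delta_over_sigma / se
    # Probability of declaring the effect significant
    return 1 - nd.cdf(z_crit - z_shift) + nd.cdf(-z_crit - z_shift)

print(factorial_power(16, 1.0))  # power for a 1-sigma effect in 16 runs
```

Doubling the run count shrinks the standard error by √2, which is why the screen-then-augment sequence buys power exactly where it is needed.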
Space Filling Designs - Fast Flexible Filling

Altitude   OBA        Range      Airspeed   Tgt Speed   Tgt Aspect   Aircraft
8124.77    73.56656   2464.397   182.0344   9.2927026   67.0516708   MQ-1C
6544.412   51.86306   1996.116   201.4982   20.347079   37.3369392   MQ-9
7772.047   20.48332   4650.107   143.0561   12.073336   28.8426356   MQ-8B
3089.45    1.429844   7207.649   180.714    29.360262   127.313819   MQ-1C
7567.705   10.95524   3520.424   208.0989   5.3408474   159.578213   MQ-8B
5085.954   18.3226    2312.825   168.6706   24.859351   95.0826254   MQ-9
5846.776   5.124027   2069.501   150.6884   8.8732435   170.617956   MQ-8B
3143.386   55.45448   5114.701   133.8235   15.27493    163.497747   MQ-1C
7417.629   32.65899   7051.655   134.9096   24.684361   177.454579   MQ-9
6944.406   78.24684   1281.389   139.1982   15.96613    98.9226386   MQ-1C
8483.266   86.45598   3184.39    132.5854   22.499819   17.2042035   MQ-8B
4166.927   61.31156   1494.924   148.5866   29.132976   3.89247765   MQ-9
3301.864   14.67895   1051.136   176.388    7.1261783   23.9119169   MQ-8B
5926.307   2.465024   5229.345   205.979    26.845357   9.47937507   MQ-9
3650.802   43.724     4137.213   157.0095   21.586499   71.2319827   MQ-1C
6018.577   87.01505   7585.281   190.7568   13.461514   154.844957   MQ-9
4210.502   79.39332   4600.135   209.3737   24.003057   137.916939   MQ-8B
5217.174   68.12297   6710.76    136.4136   7.196721    85.490677    MQ-1C
8960.589   60.87401   7813.957   193.0084   27.894638   61.3639641   MQ-1C
6447.587   3.213803   7988.409   130.6679   19.625356   64.916259    MQ-8B
7497.99    81.26777   5711.103   174.7418   29.719593   40.2173831   MQ-9
8571.817   44.36984   5476.161   164.9517   5.9050451   124.202816   MQ-8B
6140.96    34.43484   7860.747   169.7476   10.125772   3.43389979   MQ-1C
4321.363   21.93552   6420.699   196.6034   16.343319   77.5481755   MQ-9
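Fast Flexible Filling is a proprietary JMP algorithm, but the idea of spreading runs across the factor space can be illustrated with a generic greedy maximin sketch (my own simplification, not the algorithm that produced the table above):

```python
import random

def maximin_design(n_runs, bounds, n_candidates=2000, seed=1):
    # Greedy maximin space-filling sketch: grow the design by always
    # adding the candidate point farthest from its nearest already
    # chosen point, working in [0, 1] coordinates, then scale back
    # to the engineering units in `bounds`.
    rng = random.Random(seed)
    cands = [[rng.random() for _ in bounds] for _ in range(n_candidates)]
    design = [cands.pop()]
    while len(design) < n_runs:
        best = max(cands, key=lambda c: min(
            sum((a - b) ** 2 for a, b in zip(c, d)) for d in design))
        cands.remove(best)
        design.append(best)
    lo_hi = list(bounds.values())
    return [[lo + u * (hi - lo) for u, (lo, hi) in zip(pt, lo_hi)]
            for pt in design]

pts = maximin_design(10, {"Altitude": (3000, 9000), "Range": (1000, 8000)})
```

Space-filling designs like this support emulators and sensitivity studies where the response surface shape is unknown in advance.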
Hybrid Design
Start with a space-fill design and augment for a quadratic model.
[Figure: the SF design, the SF + I-optimal augmentation, and the resulting statistical power]
Missile Example – Hellfire II Romeo
AGM-114R Hellfire II (Hellfire Romeo)
• Target: all target types
• Range: 8,000 m (8,749 yd)
• Guidance: semi-active laser homing
• Warhead: multi-function warhead
• Weight: 50 kg (110 lb)
• Speed: Mach 1.3
• Unit cost: $99,600 (All-Up Round, 2015 USD)
The AGM-114R "Romeo" Hellfire II entered service in late 2012. It uses a semi-active laser homing guidance system and a K-charge multipurpose warhead to engage targets that previously needed multiple Hellfire variants. It will replace AGM-114K, M, N, and P variants in U.S. service. – Source: Wikipedia
Example Replicated 2² Design
Consider a weapon effectiveness performance test of an AGM-114R Hellfire II Romeo, involving ground range and height above target (HAT).
[Figure: 2² design square with Range and HAT each at their low (−) and high (+) levels]
ANALYSIS AND VALIDATION
Generic M&S and Live Framework
[Figure: Design-Expert response surface of Miss Distance (ft), 0 to 25, over A: Range (m), 2000 to 6000, and B: HAT (ft), 300 to 1900, with design points shown above and below the predicted surface]
Regression Model Illustrated

General model: y = β0 + βA·xA + βB·xB + βAB·xA·xB + ε

Parameter estimates: β̂0 = 15, β̂A = 3.22, β̂B = −4.72, β̂AB = 3.52

Fitted regression model: ŷ = 15 + 3.2·xA − 4.7·xB + 3.5·xA·xB

[Figure: response surface of the fitted regression model]
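The estimates above come from ordinary least squares on the replicated 2² design. A minimal sketch with notional, noise-free responses generated from the fitted model (the data are illustrative, not the slide's actual test results):

```python
import numpy as np

# Coded +/-1 design points for a replicated 2^2 design (two reps per corner)
xA = np.array([-1, 1, -1, 1] * 2)
xB = np.array([-1, -1, 1, 1] * 2)
# Notional responses generated from the slide's fitted model, noise-free
y = 15 + 3.2 * xA - 4.7 * xB + 3.5 * xA * xB

# Model matrix: intercept, main effects, interaction
X = np.column_stack([np.ones_like(xA), xA, xB, xA * xB])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [15, 3.2, -4.7, 3.5]
```

With real test data the responses carry noise, and the replicates provide the pure-error estimate used to judge lack of fit.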
Example: SDB II Weapon Effectiveness
• A notional live fire program considered 7 factors initially in a Resolution IV screening design (16 runs and 4 centers), then augmented the 5 significant factors to a face-centered CCD
• Simulated the same experimental runs
• Goal is to see if the two are approximately equal
Analysis of Live Data: SDB II Quick Comparison
First cut to see if the two models are similar
Live Fire Versus Simulated: SDB II Slopes
Compare the differences in slopes (parameter estimates) between live and simulated values, then test statistically to see if the parameter values are significantly different.
Live Fire Versus Simulated: SDB II Slopes
Test whether all parameters from the live design are equal to the values given by the simulation: Custom Test (F).
Not enough evidence to suggest the joint regression surfaces differ.
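The custom F-test above compares the live and simulated regression surfaces jointly. A generic Chow-style sketch of the same idea (function name and structure are mine; design software packages implement this with their own conventions):

```python
import numpy as np

def coefficient_equality_F(X_live, y_live, X_sim, y_sim):
    # Chow-style F statistic: do the live and simulated regressions
    # share one set of coefficients? Compare the residual sum of
    # squares of a pooled fit against separate fits.
    def rss(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.sum((y - X @ beta) ** 2))

    p = X_live.shape[1]                      # parameters per model
    n = len(y_live) + len(y_sim)             # total observations
    rss_pooled = rss(np.vstack([X_live, X_sim]),
                     np.concatenate([y_live, y_sim]))
    rss_sep = rss(X_live, y_live) + rss(X_sim, y_sim)
    return ((rss_pooled - rss_sep) / p) / (rss_sep / (n - 2 * p))
```

A large F (small p-value) says the surfaces differ; failing to reject, as in the SDB II example, supports treating the simulation's regression surface as consistent with the live one.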
Summary
• Problem: more emphasis is being placed on simulation to verify and validate system adequacy throughout the acquisition life cycle
• Solution: V&V practices that have recently adopted a more rigorous test science and statistical approach have shown promise
• Attempts are being made to inform organizations and programs of these methods, and to develop a process that can be routinely practiced