TRANSCRIPT
Injecting Statistical Rigor to Drive Your Modeling & Simulation Validation Program
Jim Simpson, Jim Wisnowski, Stargel Doane, Laura Freeman, Kelly Avery
ITEA, Dec 2018
DEFINITIONS, POLICY AND EXAMPLES
Introduction
• Modeling and Simulation (M&S):
• "The discipline that comprises the development and/or use of models and simulations." (DoDD 5000.59, DoDI 5000.61)
• "The use of models, including emulators, prototypes, simulators, and stimulators, either statically or over time, to develop data as a basis for making managerial or technical decisions." (DMSCO M&S Glossary)
• "Using computers to imitate, or simulate, the operations of various kinds of real-world facilities or processes (a system)." (Law 2015)
Live Testing and Simulation Integrated
[Figure: the experiment path and the modeling-and-simulation path both approximate reality, producing experimental data and a simulation result that can be compared]
Adapted from Coleman, H. W., & Steele, W. G. (2018). Experimentation, validation, and uncertainty analysis for engineers. John Wiley & Sons.
Policy Example
• Per DOT&E Memorandum: Guidance on the Validation of Models and Simulation used in Operational Test and Live Fire Assessments dated 16 March 2016
• “Whenever M&S is used for operational test and evaluation, I need to have the same understanding of and confidence in the data obtained from M&S as I do any other data collected during an operational or live fire test.”
• "…I expect the validation of M&S to include the same rigorous statistical and analytical principles that have become standard practice when designing live tests."
DOD M&S Example: Space System
• Examine coverage, probability of detection, data collection, etc.
DOD M&S Example: Missile Program
• Estimate miss distance and probability of kill for missiles intercepting threats
Image courtesy of US Air Force, approved for public release
PLANNING FOR M&S V&V
Simulation Information vs. Test Data
Adapted From Coleman, H. W., & Steele, W. G. (2018). Experimentation, validation, and uncertainty analysis for engineers. John Wiley & Sons.
Design of Experiments (DOE)
• Statistical Design of Experiments: "the process of planning the experiment so that appropriate data that can be analyzed by statistical methods will be collected, resulting in valid and objective conclusions." (Montgomery, 2004)
• Refers to a methodology (or overall thought process)
• Not limited to factorial designs
• Proven in industry and the DoD
• Sequential experimentation: Screen → Characterize → Optimize
Risk (Coverage and Sample Size)
• Coverage of the operational envelope and sample size drive validation risk: Referent vs. Simulation Prediction
[Figure: referent coverage plotted against simulation coverage; a small N and a simple model give high risk, while a large N and a complex model give low risk]
Model-Test-Model Method for V&V
• Build the M&S: intended use and requirements focused
• Collect M&S data via design: use DOE for defining runs
• Analyze M&S runs: inform live fire, sensitivity analysis, statistical emulator
• Collect SME feedback: qualitative "first cut" at validation
• Collect live-fire data via design
• Compare M&S to live
• Use calibration to update model
Source: MITRE
TEST DESIGN
Factors in a Test Space
We compare designs by displaying test points allocated to combinations of factors and levels geometrically, where each factor occupies a physical dimension.
With a single factor (Range) at two levels (4, 8), the design can be displayed as a line segment.
With two factors (Range, Altitude) at two levels each (4, 8 for Range; 5000, 10000 for Altitude), the design can be displayed as a square.
[Figure: one-factor design on the Range (Factor A) axis at levels 4 and 8; two-factor design on the A: Range and B: Altitude axes at levels 4, 8 and 5000, 10000. The corner at Range = 4, Altitude = 10000 is one weapon release condition]
3-Factor Test Space and Design
Add the third factor (Airspeed) with two levels (450, 650), and the 3-factor test space is a cube over Range (4, 8), Altitude (5000, 10000), and Airspeed (450, 650).
There is one test space point for each run in the test design table.
[Figure: cube plot of the test space with axes A: Range, B: Altitude, C: Airspeed]
Test Design Table (one row per test space point):

A: Range   B: Altitude   C: Airspeed
4          5000          450
8          5000          450
4          10000         450
8          10000         450
4          5000          650
8          5000          650
4          10000         650
8          10000         650
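The full-factorial table above can be generated programmatically. A minimal sketch, using the factor names and levels from the slide (nothing here is part of the original tutorial's tooling):

```python
from itertools import product

# Factor levels from the slide's 3-factor example
levels = {
    "Range": [4, 8],
    "Altitude": [5000, 10000],
    "Airspeed": [450, 650],
}

# Full 2^3 factorial: one run per combination of factor levels
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]
for run in runs:
    print(run)
```

Each dictionary printed is one row of the test design table; the same pattern extends to any number of factors and levels.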
Fractional Factorial Design – 4 Factors / Mixed-level Fractional Factorial Design
[Figure: cube plots in factors A, B, C shown at the low (D−) and high (D+) levels of factor D]
Experiment Designs
Model for main effects and two-factor interactions:

Y = β0 + Σ(i=1..k) βi·xi + Σ(i<j) βij·xi·xj + ε

Higher Order Design – Nested CCD / Face-Centered Design (FCD)
Model adding pure quadratic and cubic terms:

Y = β0 + Σ(i=1..k) βi·xi + Σ(i<j) βij·xi·xj + Σ(i=1..k) βii·xi² + Σ(i=1..k) βiii·xi³ + ε
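The number of terms in each model sets a floor on the number of runs a design must support. A short sketch of that count (the function names are mine, not from the tutorial):

```python
from math import comb

def n_params_me_2fi(k):
    # Intercept + k main effects + C(k, 2) two-factor interactions
    return 1 + k + comb(k, 2)

def n_params_me_2fi_quad_cubic(k):
    # Adds k pure quadratic and k pure cubic terms, as in the
    # higher-order model above
    return n_params_me_2fi(k) + 2 * k

for k in (3, 5, 7):
    print(k, n_params_me_2fi(k), n_params_me_2fi_quad_cubic(k))
```

A design with fewer runs than parameters cannot estimate the model; extra runs beyond the minimum buy power, pure-error estimates, and lack-of-fit checks.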
Metrics for Evaluating Designs

Screen / Compare / Characterize designs:
• Objectives: Screen, Compare, Characterize
• Assumptions: replicates for pure error; same factors; General Model = ME + 2FI; Type I error = 0.05; response type: numeric
• Criteria: sample size; power (ME, 2FI); orthogonality; terms aliased (word length count); VIF; categoric balance; interaction balance (GBM); partial aliasing; model misspecification (lack of fit: 3FI, curvature, quadratic); range of inputs; robustness to outliers/missing data; points total for replication/lack of fit; functionality; levels per factor (intended vs. design)

Estimate / Predict / Optimize / Map designs:
• Objectives: Estimate, Predict, Optimize, Map
• Assumptions: same factors; General Model = ME + 2FI + PQ; Type I error = 0.05
• Criteria: sample size; prediction variance (50%, 90%, 95% FDS); G-efficiency (min max prediction); I-efficiency (avg prediction variance); replicates for pure error; orthogonality; VIF; prediction uniformity; rotatability; uniform precision; model misspecification (lack of fit: 3FI, pure cubic); range of inputs; sensitivity to outliers/missing data; influence/leverage; points total for replication/lack of fit; minimize Euclidean distance among points; design functionality; levels per factor; number of evenly spaced levels
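Power, one of the criteria listed above, can be approximated from first principles for a two-level balanced design. A rough stdlib sketch, assuming a two-sided z-test on a single main effect with known σ (a normal approximation, not the exact t-based calculation design software performs):

```python
from statistics import NormalDist

def factorial_power(n_runs, delta_over_sigma, alpha=0.05):
    # In a balanced two-level design with +/-1 coding, the effect
    # estimate (high average minus low average) has standard error
    # 2*sigma/sqrt(n_runs); work in sigma units.
    nd = NormalDist()
    se = 2 / n_runs ** 0.5
    z_crit = nd.inv_cdf(1 - alpha / 2)
    z_shift = delta_over_sigma / se
    # Probability of declaring the effect significant
    return 1 - nd.cdf(z_crit - z_shift) + nd.cdf(-z_crit - z_shift)

print(factorial_power(16, 1.0))  # power for a 1-sigma effect in 16 runs
```

Doubling the run count shrinks the standard error by √2, which is why the screen-then-augment sequence buys power exactly where it is needed.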
Space Filling Designs - Fast Flexible Filling

Altitude   OBA        Range      Airspeed   Tgt Speed   Tgt Aspect   Aircraft
8124.77    73.56656   2464.397   182.0344   9.2927026   67.0516708   MQ-1C
6544.412   51.86306   1996.116   201.4982   20.347079   37.3369392   MQ-9
7772.047   20.48332   4650.107   143.0561   12.073336   28.8426356   MQ-8B
3089.45    1.429844   7207.649   180.714    29.360262   127.313819   MQ-1C
7567.705   10.95524   3520.424   208.0989   5.3408474   159.578213   MQ-8B
5085.954   18.3226    2312.825   168.6706   24.859351   95.0826254   MQ-9
5846.776   5.124027   2069.501   150.6884   8.8732435   170.617956   MQ-8B
3143.386   55.45448   5114.701   133.8235   15.27493    163.497747   MQ-1C
7417.629   32.65899   7051.655   134.9096   24.684361   177.454579   MQ-9
6944.406   78.24684   1281.389   139.1982   15.96613    98.9226386   MQ-1C
8483.266   86.45598   3184.39    132.5854   22.499819   17.2042035   MQ-8B
4166.927   61.31156   1494.924   148.5866   29.132976   3.89247765   MQ-9
3301.864   14.67895   1051.136   176.388    7.1261783   23.9119169   MQ-8B
5926.307   2.465024   5229.345   205.979    26.845357   9.47937507   MQ-9
3650.802   43.724     4137.213   157.0095   21.586499   71.2319827   MQ-1C
6018.577   87.01505   7585.281   190.7568   13.461514   154.844957   MQ-9
4210.502   79.39332   4600.135   209.3737   24.003057   137.916939   MQ-8B
5217.174   68.12297   6710.76    136.4136   7.196721    85.490677    MQ-1C
8960.589   60.87401   7813.957   193.0084   27.894638   61.3639641   MQ-1C
6447.587   3.213803   7988.409   130.6679   19.625356   64.916259    MQ-8B
7497.99    81.26777   5711.103   174.7418   29.719593   40.2173831   MQ-9
8571.817   44.36984   5476.161   164.9517   5.9050451   124.202816   MQ-8B
6140.96    34.43484   7860.747   169.7476   10.125772   3.43389979   MQ-1C
4321.363   21.93552   6420.699   196.6034   16.343319   77.5481755   MQ-9
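Fast Flexible Filling is a proprietary JMP algorithm, but the idea of spreading runs across the factor space can be illustrated with a generic greedy maximin sketch (my own simplification, not the algorithm that produced the table above):

```python
import random

def maximin_design(n_runs, bounds, n_candidates=2000, seed=1):
    # Greedy maximin space-filling sketch: grow the design by always
    # adding the candidate point farthest from its nearest already
    # chosen point, working in [0, 1] coordinates, then scale back
    # to the engineering units in `bounds`.
    rng = random.Random(seed)
    cands = [[rng.random() for _ in bounds] for _ in range(n_candidates)]
    design = [cands.pop()]
    while len(design) < n_runs:
        best = max(cands, key=lambda c: min(
            sum((a - b) ** 2 for a, b in zip(c, d)) for d in design))
        cands.remove(best)
        design.append(best)
    lo_hi = list(bounds.values())
    return [[lo + u * (hi - lo) for u, (lo, hi) in zip(pt, lo_hi)]
            for pt in design]

pts = maximin_design(10, {"Altitude": (3000, 9000), "Range": (1000, 8000)})
```

Space-filling designs like this support emulators and sensitivity studies where the response surface shape is unknown in advance.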
Hybrid Design
Start with a space-fill design and augment for a quadratic model.
[Figure: the SF design, the SF + I-optimal augmentation, and the resulting statistical power]
Missile Example – Hellfire II Romeo
AGM-114R Hellfire II (Hellfire Romeo)
• Target: all target types
• Range: 8,000 m (8,749 yd)
• Guidance: semi-active laser homing
• Warhead: multi-function warhead
• Weight: 50 kg (110 lb)
• Speed: Mach 1.3
• Unit cost: $99,600 (All-Up Round, 2015 USD)
The AGM-114R "Romeo" Hellfire II entered service in late 2012. It uses a semi-active laser homing guidance system and a K-charge multipurpose warhead to engage targets that previously needed multiple Hellfire variants. It will replace AGM-114K, M, N, and P variants in U.S. service. – Source: Wikipedia
Example Replicated 2² Design
Consider a weapon effectiveness performance test of an AGM-114R Hellfire II Romeo, involving ground range and height above target (HAT).
[Figure: 2² design square with Range and HAT each at their low (−) and high (+) levels]
ANALYSIS AND VALIDATION
Generic M&S and Live Framework
[Figure: Design-Expert response surface of Miss Distance (ft), 0 to 25, over A: Range (m), 2000 to 6000, and B: HAT (ft), 300 to 1900, with design points shown above and below the predicted surface]
Regression Model Illustrated

General model: y = β0 + βA·xA + βB·xB + βAB·xA·xB + ε

Parameter estimates: β̂0 = 15, β̂A = 3.22, β̂B = −4.72, β̂AB = 3.52

Fitted regression model: ŷ = 15 + 3.2·xA − 4.7·xB + 3.5·xA·xB

[Figure: response surface of the fitted regression model]
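The estimates above come from ordinary least squares on the replicated 2² design. A minimal sketch with notional, noise-free responses generated from the fitted model (the data are illustrative, not the slide's actual test results):

```python
import numpy as np

# Coded +/-1 design points for a replicated 2^2 design (two reps per corner)
xA = np.array([-1, 1, -1, 1] * 2)
xB = np.array([-1, -1, 1, 1] * 2)
# Notional responses generated from the slide's fitted model, noise-free
y = 15 + 3.2 * xA - 4.7 * xB + 3.5 * xA * xB

# Model matrix: intercept, main effects, interaction
X = np.column_stack([np.ones_like(xA), xA, xB, xA * xB])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [15, 3.2, -4.7, 3.5]
```

With real test data the responses carry noise, and the replicates provide the pure-error estimate used to judge lack of fit.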
Example: SDB II Weapon Effectiveness
• A notional live fire program considered 7 factors initially in a Resolution IV screening design (16 runs and 4 centers), then augmented the 5 significant factors to a face-centered CCD
• Simulated the same experimental runs
• Goal is to see if the two are approximately equal
Analysis of Live Data: SDB II Quick Comparison
First cut to see if the two models are similar
Live Fire Versus Simulated: SDB II Slopes
Compare the differences in slopes (parameter estimates) between live and simulated values, then test statistically to see if the parameter values are significantly different.
Live Fire Versus Simulated: SDB II Slopes
Test whether all parameters from the live design are equal to the values given by the simulation: Custom Test (F).
Not enough evidence to suggest the joint regression surfaces differ.
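The custom F-test above compares the live and simulated regression surfaces jointly. A generic Chow-style sketch of the same idea (function name and structure are mine; design software packages implement this with their own conventions):

```python
import numpy as np

def coefficient_equality_F(X_live, y_live, X_sim, y_sim):
    # Chow-style F statistic: do the live and simulated regressions
    # share one set of coefficients? Compare the residual sum of
    # squares of a pooled fit against separate fits.
    def rss(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.sum((y - X @ beta) ** 2))

    p = X_live.shape[1]                      # parameters per model
    n = len(y_live) + len(y_sim)             # total observations
    rss_pooled = rss(np.vstack([X_live, X_sim]),
                     np.concatenate([y_live, y_sim]))
    rss_sep = rss(X_live, y_live) + rss(X_sim, y_sim)
    return ((rss_pooled - rss_sep) / p) / (rss_sep / (n - 2 * p))
```

A large F (small p-value) says the surfaces differ; failing to reject, as in the SDB II example, supports treating the simulation's regression surface as consistent with the live one.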
Summary
• Problem: more emphasis is being placed on simulation to verify and validate system adequacy throughout the acquisition life cycle
• Solution: V&V practices that have recently adopted a more rigorous test science and statistical approach have shown promise
• Attempts are being made to inform organizations and programs of these methods, and to develop a process that can be routinely practiced