If you fix everything, you lose fixes for everything else
Tim Menzies (WVU), Jairus Hihn (JPL), Oussama Elrawas (WVU), Dan Baker (WVU), Karen Lum (JPL)
International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007
This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government.
2
What does this mean?
Q: for what models does (a few peeks) = (many hard stares)?
A supposedly NP-hard task
abduction over first-order theories
nogood/2
3
A: models with “collars”
Grow
– Monte Carlo a model
  – Picking input settings at random
– For each run
  – Score each output
  – Add score to each input setting
Harvest
– Rule generation experiments, favoring settings with better scores
If “collars”, then
– … small rules …
– … learned quickly …
– … will suffice
“Collar” variables set the other variables
– Narrows: Amarel in the 60s
– Minimal environments: DeKleer ’85
– Master variables: Crawford & Baker ’94
– Feature subset selection: Kohavi & John ’97
– Back doors: Williams et al. ’03
– Etc.
Implications for uncertainty?
Feather & Menzies RE’02
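The grow/harvest loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_model`, the input names, the value ranges, and the "lower score is better" convention are all hypothetical, and the sketch ranks settings by mean score per setting (a normalized form of "add score to each input setting").

```python
import random
from collections import defaultdict

def toy_model(s):
    # Hypothetical model: "a" acts as the collar; "b" and "c" barely matter.
    return 10 * s["a"] + 0.1 * s["b"] + 0.1 * s["c"]

def grow(model, ranges, runs, rng):
    """Monte Carlo the model; add each run's score to every setting it used."""
    total = defaultdict(float)
    count = defaultdict(int)
    for _ in range(runs):
        settings = {name: rng.choice(values) for name, values in ranges.items()}
        score = model(settings)
        for item in settings.items():   # item = (input name, value)
            total[item] += score
            count[item] += 1
    # Mean score per setting (assumption: normalizes for unequal counts).
    return {item: total[item] / count[item] for item in total}

def harvest(mean_score, k):
    """Keep the k settings with the best (lowest) mean score as a small rule."""
    return sorted(mean_score, key=mean_score.get)[:k]

rng = random.Random(1)
RANGES = {"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]}
rule = harvest(grow(toy_model, RANGES, runs=3000, rng=rng), k=1)
```

In this toy model a one-setting rule on the collar input is enough to pin the output near its minimum, which is the "small rules will suffice" point of the slide.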
4
STAR: collars + simulated annealing on Boehm’s USC software process models
USC software process models for effort, defects, threats
– y[i] = impact[i] * project[i] + b[i] for i ∈ {1,2,3,…}
– α ≤ project[i] ≤ β : uncertainty in project description
– χ ≤ impact[i] ≤ δ : uncertainty in model calibration
Random solution
– pick project[i] and impact[i] from any α .. β, χ .. δ
– α .. β set via domain knowledge; e.g. process maturity in 3 to 5
– range of χ .. δ known from history
Score solution by effort (Ef), defects (De) and threat (Th)
For example: [figure: model inputs labeled “uncontrollable” and “controllable”]
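A minimal Python sketch of one "random solution" as described on this slide. The ranges, the offsets in `B`, and the function name `random_solution` are hypothetical values chosen for illustration; the slide does not say how the y[i] terms are combined into Ef, De and Th, so that step is left out.

```python
import random

def random_solution(project_ranges, impact_ranges, b, rng):
    """Pick project[i] and impact[i] uniformly from their ranges, then apply
    y[i] = impact[i] * project[i] + b[i]."""
    project = [rng.uniform(lo, hi) for lo, hi in project_ranges]
    impact = [rng.uniform(lo, hi) for lo, hi in impact_ranges]
    y = [m * p + c for m, p, c in zip(impact, project, b)]
    return project, impact, y

# Hypothetical numbers, not taken from the USC models.
PROJECT_RANGES = [(3, 5), (1, 6)]              # alpha .. beta, e.g. process maturity in 3 to 5
IMPACT_RANGES = [(-0.2, -0.1), (0.05, 0.15)]   # chi .. delta, calibration uncertainty
B = [1.5, 0.8]

rng = random.Random(0)
project, impact, y = random_solution(PROJECT_RANGES, IMPACT_RANGES, B, rng)
```

Sampling `impact` as well as `project` is what separates this from a calibrated model, where `impact` would be fixed from historical data.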
5
Two studies: y[i] = impact[i] * project[i] + b[i]
Study one: certain methods (tame uncontrollables via historical records)
– Using much historical data
– Learn the magnitude of the impact[i] relationship
– With fixed impact[i], Monte Carlo at random across the project[i] settings
E.g.
– Regression-based tools that learn impact[i] from historical records
– 93 records of JPL systems
– SCAT: JPL’s current methods
– 2CEE: WVU’s improvement over SCAT (currently under test)
Study two: methods with more uncertainty
– Using no historical data
– Monte Carlo at random across the project[i] settings and impact[i] settings
E.g.
– STAR
  – Monte Carlo a model
  – Score each output
  – Sort settings by their “C”, where “C” = cumulative score
  – Rule generation experiments, favoring settings with better “C”
6
Inside STAR
1. sampling - simulated annealing
2. summarizing - post-processor
for setting ∈ Sx { value[setting] += E }
Sort all settings by their value
– Ignore uncontrollables impact[i]
– Assume the top (1 ≤ i ≤ max) project[i] settings
– Randomly select the rest
“Policy point”:
– smallest i with lowest E
Median = 50th percentile
– Spread = (75th - 50th) percentile
[Figure: settings sorted from Good to Bad; 22 good ideas, 38 not-so-good ideas]
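The summarizing post-processor can be sketched as follows. This is a hedged illustration, not STAR's code: uniform random sampling stands in for simulated annealing, `toy_energy` and its ranges are hypothetical, every input is treated as a controllable project setting, and the ranking keeps only the best value per input so that the top i settings never conflict.

```python
import random
import statistics
from collections import defaultdict

def rank_settings(samples):
    """samples: list of (settings, E) pairs, lower E is better.
    Returns one (name, value) per input, best first."""
    total, count = defaultdict(float), defaultdict(int)
    for settings, energy in samples:
        for item in settings.items():
            total[item] += energy       # value[setting] += E
            count[item] += 1
    mean = {item: total[item] / count[item] for item in total}
    best = {}                           # best value seen for each input name
    for name, value in sorted(mean, key=mean.get):
        best.setdefault(name, value)
    return list(best.items())           # insertion order = best first

def policy_point(model, ranked, ranges, rng, trials=200):
    """Fix the top i settings and pick the rest at random, for i = 1..max.
    Returns all (i, median, spread) rows and the smallest i with lowest median E."""
    rows = []
    for i in range(1, len(ranked) + 1):
        fixed = dict(ranked[:i])
        energies = []
        for _ in range(trials):
            settings = {n: rng.choice(v) for n, v in ranges.items()}
            settings.update(fixed)
            energies.append(model(settings))
        _, q50, q75 = statistics.quantiles(energies, n=4)
        rows.append((i, q50, q75 - q50))   # median, spread = 75th - 50th
    lowest = min(row[1] for row in rows)
    return rows, next(row for row in rows if row[1] == lowest)

def toy_energy(s):
    # Hypothetical model: "a" dominates the energy.
    return 10 * s["a"] + s["b"] + s["c"]

rng = random.Random(7)
RANGES = {"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]}
samples = []
for _ in range(3000):
    s = {n: rng.choice(v) for n, v in RANGES.items()}
    samples.append((s, toy_energy(s)))
ranked = rank_settings(samples)
rows, policy = policy_point(toy_energy, ranked, RANGES, rng)
```

Each row of `rows` corresponds to one point on the median/spread plot; `policy` is the row a reader would pick off that plot by eye.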
7
SCAT vs 2CEE vs STAR project[i]
[Figure: results for SCAT, 2CEE and STAR]
– SCAT, 2CEE: control impact[i] via historical data
– STAR: stagger around superset of possible impact[i]
Median: 50% point
Spread: (75 - 50)%
STAR/2CEE = 400/1600 = 25%; STAR/SCAT = 400/1900 = 21%
STAR/2CEE = 50/800 = 6%; STAR/SCAT = 50/1300 = 4%
STAR/2CEE = 30/620 = 5%; STAR/SCAT = 30/730 = 4%
STAR/2CEE = 180/400 = 45%; STAR/SCAT = 180/1900 = 60% [sic]
Ignoring historical data is useful (!!!?)
If you fix everything, you lose fixes for everything else
Luke, trust the force, I mean, collars
IEEE Computer, Jan 2007: “The strangest thing about software”
Extra Material
19
Related work
Feather, DDP, treatment learning
– Optimization of requirement models
Xerox PARC, 1980s, qualitative representations (QR)
– Not overly specific
– Quickly collected in a new domain
– Used for model diagnosis and repair
– Can find creative solutions in the larger space of possible qualitative behaviors, rather than in the tighter space of precise quantitative behaviors
Abduction:
– World W = minimal set of assumptions A (w.r.t. size) such that
  – T ∪ A => G
  – Not(T ∪ A => error)
– Framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test case generation, prediction, …
– Theoretically slow (NP-hard) but this should be practical:
  – Abduction + stochastic sampling
  – Find collars
  – Learn constraints on collars
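To make the abduction definition concrete, here is a small brute-force Python sketch. It assumes a propositional Horn-clause theory (far simpler than the first-order case mentioned earlier) and enumerates assumption sets by size, which is exactly the slow exhaustive search that stochastic sampling is meant to avoid. The rule set, the atom names, and the functions `closure` and `abduce` are hypothetical.

```python
from itertools import combinations

def closure(rules, facts):
    """Forward-chain Horn rules (body, head) from the given facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and set(body) <= known:
                known.add(head)
                changed = True
    return known

def abduce(rules, assumables, goals, error="error"):
    """Return every smallest assumption set A such that
    T ∪ A derives all goals and T ∪ A does not derive error."""
    for size in range(len(assumables) + 1):
        worlds = []
        for combo in combinations(sorted(assumables), size):
            derived = closure(rules, combo)
            if set(goals) <= derived and error not in derived:
                worlds.append(set(combo))
        if worlds:          # stop at the first size that yields any world
            return worlds
    return []

# Hypothetical toy theory T.
RULES = [
    (("rain",), "wet_grass"),
    (("sprinkler",), "wet_grass"),
    (("wet_grass",), "slippery"),
    (("rain", "sunny"), "error"),
]
ASSUMABLES = {"rain", "sprinkler", "sunny"}
worlds = abduce(RULES, ASSUMABLES, goals={"slippery"})
```

The loop over `combinations` grows exponentially with the number of assumables, which is why the slide pairs abduction with sampling and collars rather than exhaustive search.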
20
Possible optimizations (not used here)
STAR, an example of a general process:
– Stochastic sampling
– Sort settings by “value”
– Rule generation experiments favoring highly “value”-ed settings
See also: elite sampling in the cross-entropy method
If SA convergence too slow
– Try moving back select into the SA
– Constrain solution mutation to prefer highly “value”-ed settings
BORE (best or rest)
– n runs
– Best = top 10% scores
– Rest = remaining 90%
– {a,b} = frequency of discretized range in {best, rest}
– Sort settings by -1 * (a/n)^2 / (a/n + b/n)
Other valuable tricks:
– Incremental discretization: Gama & Pinto’s PID + Fayyad & Irani
– Limited discrepancy search: Harvey & Ginsberg
– Treatment learning: Menzies & Yu
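The BORE ranking above can be sketched in Python as follows. The sketch assumes that a higher score is better and that each run is a dictionary of already-discretized settings; the function name `bore_rank` and the toy runs are hypothetical.

```python
from collections import Counter

def bore_rank(runs, best_fraction=0.10):
    """runs: list of (settings, score) pairs, higher score = better (assumption).
    Returns settings sorted by -1 * (a/n)^2 / (a/n + b/n), most promising first."""
    n = len(runs)
    ordered = sorted(runs, key=lambda run: run[1], reverse=True)
    cut = max(1, round(n * best_fraction))
    best, rest = ordered[:cut], ordered[cut:]
    # a, b = how often each (name, value) setting appears in best and in rest
    a = Counter(item for settings, _ in best for item in settings.items())
    b = Counter(item for settings, _ in rest for item in settings.items())

    def key(item):
        fa, fb = a[item] / n, b[item] / n
        return -1 * fa ** 2 / (fa + fb)

    return sorted(set(a) | set(b), key=key)

# Hypothetical toy runs: x drives the score, y is irrelevant.
RUNS = []
for i in range(20):
    settings = {"x": i % 2, "y": (i // 2) % 2}
    RUNS.append((settings, 10 * settings["x"] + 0.01 * i))

ranking = bore_rank(RUNS)
```

Squaring a/n rewards settings that are frequent in best, while dividing by (a/n + b/n) penalizes settings that are just as common in rest.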
Ask me why, off-line: “Uncertainty helps planning”
(questions? comments?)
22
At the “policy point”, STAR’s random solutions are surprisingly accurate
LC: learn impact[i] via regression (JPL data)
STAR: no tuning, randomly pick impact[i]
Diff = ∑ mre(lc) / ∑ mre(star)
Mre = abs(predicted - actual) / actual
{“●”, “❍”}: same at {95, 99}% confidence (MWU)
Why so little Diff (median = 75%)?
– Most influential inputs tightly constrained

∑ mre(lc) / ∑ mre(star)
            tactical    strategic
ground      63%         66%
OSP         111% ●❍     112% ●❍
OSP2        125% ●❍     99%
flight      121% ●❍     101% ●❍
all         75%         91%
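The two formulas on this slide translate directly into Python. The prediction and actual values below are made-up numbers for illustration only, not data from the talk, and the Mann-Whitney significance test is not reproduced here.

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(lc_predictions, star_predictions, actuals):
    """Diff = sum of mre(lc) / sum of mre(star)."""
    lc = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc / star

# Hypothetical effort values.
ACTUALS = [100.0, 200.0, 400.0]
LC = [110.0, 180.0, 400.0]      # regression-tuned predictions
STAR = [120.0, 200.0, 480.0]    # untuned predictions

ratio = diff(LC, STAR, ACTUALS)
```

By this definition a ratio below 100% means the LC errors sum to less than STAR's, and a ratio above 100% means the reverse.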
23
(Model uncertainty = collars) << inputs
In many models, a few “collar” variables set the other variables
– Narrows (Amarel in the 60s)
– Minimal environments (DeKleer ’85)
– Master variables (Crawford & Baker ’94)
– Feature subset selection (Kohavi & John ’97)
– Back doors (Williams et al. ’03)
– See “The Strangest Thing About Software” (IEEE Computer, Jan ’07)
Collars appear in all execution traces (by definition)
– You don’t have to find the collars, they’ll find you
So, to handle uncertainty
– Write a simulator
– Stagger over uncertainties
– From stagger, find collars
– Constrain collars
This talk: a very simple example of this process
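The four-step recipe (write a simulator, stagger, find collars, constrain collars) can be sketched on a toy simulator. Everything named here is an assumption for illustration: the `model`, its input names, the "shift in mean output" heuristic for spotting the collar, and the choice to shrink the collar's range to its lowest tenth.

```python
import random
import statistics

def model(s):
    # Hypothetical simulator: "collar" dominates, x1 and x2 barely matter.
    return 10 * s["collar"] + 0.1 * s["x1"] + 0.1 * s["x2"]

def stagger(ranges, rng, runs=2000):
    """Sample every uncertain input at random and record the output."""
    samples = []
    for _ in range(runs):
        s = {n: rng.uniform(lo, hi) for n, (lo, hi) in ranges.items()}
        samples.append((s, model(s)))
    return samples

def find_collar(samples):
    """Pick the input whose low half and high half differ most in mean output."""
    def shift(name):
        ordered = sorted(samples, key=lambda sample: sample[0][name])
        half = len(ordered) // 2
        low = statistics.mean(y for _, y in ordered[:half])
        high = statistics.mean(y for _, y in ordered[half:])
        return abs(high - low)
    return max(samples[0][0].keys(), key=shift)

def spread(samples):
    """Spread as used in this talk: 75th percentile minus 50th percentile."""
    _, q50, q75 = statistics.quantiles([y for _, y in samples], n=4)
    return q75 - q50

rng = random.Random(42)
RANGES = {"collar": (0.0, 1.0), "x1": (0.0, 1.0), "x2": (0.0, 1.0)}
free = stagger(RANGES, rng)
collar = find_collar(free)
lo, hi = RANGES[collar]
constrained = stagger(dict(RANGES, **{collar: (lo, lo + (hi - lo) / 10)}), rng)
```

Comparing `spread(free)` with `spread(constrained)` shows the effect: the other inputs stay fully uncertain, yet the output spread shrinks once the collar is constrained.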
24
Comparisons
Standard software process modeling
– Models written more than run (PROSIM community)
  – Limited sensitivity analysis
  – Limited trade space
– Or, expensive, error-prone, incomplete data collection programs
  – Point solutions
Here:
– No data collection
– Found stable conclusions within a space of possibilities
– Search: very simple
– Solution, not brittle
  – With trade-off space
[Figure: 22 good ideas, sorted]
25
Summary
Living with uncertainty
– Sometimes, simpler than you may think
– More useful than you might think
Simple:
– Here, the smallest change to simulated annealing
Useful:
– Sometimes uncertainty can teach you more than certainty
– If you fix everything, you lose fixes to everything else
Collars control certainty
– Uncertainty plus constrained collars → more certainty
– Also, can drive model to better performance
An example you can explain to any business user
[Figure: 22 good ideas, sorted from Good to Bad]