If you fix everything, you lose fixes for everything else

Tim Menzies (WVU), Jairus Hihn (JPL), Oussama Elrawas (WVU), Dan Baker (WVU), Karen Lum (JPL)

International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007

This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government.

[email protected]@mix.wvu.edu


What does this mean?

Q: for what models does (a few peeks) = (many hard stares)?

A supposedly NP-hard task

abduction over first-order theories

nogood/2


A: models with “collars”

Grow
– Monte Carlo a model
  – Picking input settings at random
– For each run
  – Score each output
  – Add score to each input setting

Harvest
– Rule generation experiments, favoring settings with better scores

If “collars”, then
– … small rules …
– … learned quickly …
– … will suffice

“Collar” variables set the other variables
– Narrows: Amarel in the 60s
– Minimal environments: DeKleer ’85
– Master variables: Crawford & Baker ’94
– Feature subset selection: Kohavi & John ’97
– Back doors: Williams et al ’03
– Etc.

Implications for uncertainty?

Feather & Menzies RE’02
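
The grow and harvest steps on this slide can be sketched in a few lines of Python. This is an illustrative toy, not the authors' code: `toy_model`, its three binary inputs, and the weight of 10 on input `a` (standing in for a collar variable) are all assumptions made up for the example.

```python
import random
from collections import defaultdict

INPUTS = ("a", "b", "c")  # hypothetical input names

def toy_model(settings):
    # Assumed toy model: input "a" dominates the output, so it acts as a collar.
    return 10 * settings["a"] + settings["b"] + settings["c"]

def grow(n_runs, rng):
    """Monte Carlo the model; add each run's score to every setting it used."""
    score_of = defaultdict(float)
    for _ in range(n_runs):
        # Pick input settings at random.
        settings = {name: rng.choice([0, 1]) for name in INPUTS}
        score = toy_model(settings)
        for setting in settings.items():  # a (name, value) pair
            score_of[setting] += score
    return score_of

def harvest(score_of):
    """Sort settings so those with better (here: higher) scores come first."""
    return sorted(score_of, key=score_of.get, reverse=True)

ranked = harvest(grow(2000, random.Random(1)))
print(ranked[0], ranked[-1])
```

In this toy, the two settings of the collar input `a` end up at opposite ends of the ranking, while the settings of `b` and `c` cluster in the middle, so a rule that constrains only `a` captures most of the effect.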


STAR: collars + simulated annealing on Boehm’s USC software process models

USC software process models for effort, defects, threats
– y[i] = impact[i] * project[i] + b[i] for i ∈ {1,2,3,…}
– α ≤ project[i] ≤ β : uncertainty in project description
– χ ≤ impact[i] ≤ δ : uncertainty in model calibration

Random solution
– pick project[i] and impact[i] from any α .. β, χ .. δ
– α .. β set via domain knowledge; e.g. process maturity in 3 to 5
– range of χ .. δ known from history

Score solution by effort (Ef), defects (De) and threat (Th)

For example

[Figure labels: uncontrollable, controllable]
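
A minimal sketch of drawing one random solution for y[i] = impact[i] * project[i] + b[i]. Uniform sampling is an assumption (the slide only says values are picked from the ranges), and all numbers are invented for illustration except the 3 to 5 range for process maturity, which comes from the slide.

```python
import random

def random_solution(project_ranges, impact_ranges, b, rng):
    """Pick project[i] and impact[i] from their ranges and return every y[i]."""
    ys = []
    for (alpha, beta), (chi, delta), b_i in zip(project_ranges, impact_ranges, b):
        project_i = rng.uniform(alpha, beta)   # uncertainty in project description
        impact_i = rng.uniform(chi, delta)     # uncertainty in model calibration
        ys.append(impact_i * project_i + b_i)
    return ys

# Illustrative values only.
project_ranges = [(3, 5), (1, 2)]            # first range: process maturity in 3 to 5
impact_ranges = [(-0.2, -0.1), (0.5, 1.0)]   # made-up calibration ranges
b = [1.0, 0.0]                               # made-up offsets

ys = random_solution(project_ranges, impact_ranges, b, random.Random(0))
print(ys)
```

Each call yields one candidate solution; scoring it by effort, defects and threat would be done by the USC models themselves, which are not reproduced here.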


Two studies: y[i] = impact[i] * project[i] + b[i]

One: certain methods
– Using much historical data
– Learn the magnitude of the impact[i] relationship
– With fixed impact[i], Monte Carlo at random across the project[i] settings
– Tame uncontrollables via historical records

E.g.
– Regression-based tools that learn impact[i] from historical records
  – 93 records of JPL systems
– SCAT: JPL’s current methods
– 2CEE: WVU’s improvement over SCAT (currently under test)

Two: methods with more uncertainty
– Using no historical data
– Monte Carlo at random across the project[i] settings and impact[i] settings

E.g.
– STAR
  – Monte Carlo a model
  – Score each output
  – Sort settings by their “C”, where “C” = cumulative score
  – Rule generation experiments, favoring settings with better “C”
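
The difference between the two studies can be sketched as follows. This is a toy under stated assumptions: a single term with b = 0, uniform sampling, and invented ranges. It is not SCAT, 2CEE or STAR.

```python
import random

def simulate(project_range, impact, n_runs, rng):
    """Monte Carlo y = impact * project (b assumed 0).

    `impact` is a fixed number (study one: learned from history) or a
    (low, high) tuple (study two: sampled along with the project settings).
    """
    ys = []
    for _ in range(n_runs):
        project_i = rng.uniform(*project_range)
        impact_i = rng.uniform(*impact) if isinstance(impact, tuple) else impact
        ys.append(impact_i * project_i)
    return ys

def width(ys):
    """Range of the simulated outputs."""
    return max(ys) - min(ys)

rng = random.Random(0)
study_one = simulate((3, 5), 2.0, 1000, rng)          # impact fixed
study_two = simulate((3, 5), (1.0, 3.0), 1000, rng)   # impact uncertain too
print(width(study_one), width(study_two))
```

Sampling impact[i] as well as project[i] widens the space of outputs the search has to stagger around, which is the extra uncertainty the second study accepts.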


Inside STAR

1. sampling - simulated annealing
2. summarizing - post-processor

for setting ∈ Sx { value[setting] += E }

Sort all settings by their value
– Ignore uncontrollables impact[i]
– Assume the top (1 ≤ i ≤ max) project[i] settings
– Randomly select the rest

“Policy point”:
– smallest i with lowest E

Median = 50th percentile
– Spread = 75th percentile - 50th percentile

[Figure labels: Bad, Good; 22 good ideas; 38 not-so-good ideas]
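
A minimal sketch of the summarizing post-processor, assuming that lower accumulated E marks a more promising setting. The setting names (`p1=hi` and so on), the run data, and the list of median energies are hypothetical. In STAR the medians would come from re-running the simulation with the top i settings fixed, a step omitted here.

```python
from collections import defaultdict

def rank_settings(runs):
    """runs: list of (settings, E). Implements value[setting] += E,
    then sorts settings by value, lowest accumulated energy first."""
    value = defaultdict(float)
    for settings, energy in runs:
        for setting in settings:
            value[setting] += energy
    return sorted(value, key=value.get)

def policy_point(median_energy_by_i):
    """Smallest i (1-based) that reaches the lowest median E."""
    lowest = min(median_energy_by_i)
    return median_energy_by_i.index(lowest) + 1

# Made-up runs: (settings used in the run, energy E of the run).
runs = [
    ({"p1=hi", "p2=hi"}, 2.0),
    ({"p1=hi", "p2=lo"}, 3.0),
    ({"p1=lo", "p2=hi"}, 6.0),
    ({"p1=lo", "p2=lo"}, 9.0),
]
ranked = rank_settings(runs)

# Made-up median E after assuming the top 1, 2, 3, ... settings.
point = policy_point([9.0, 4.0, 2.5, 2.5, 3.0])
print(ranked, point)
```

Taking the smallest i keeps the recommended policy as short as possible: settings past the policy point add constraints without lowering E any further.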


SCAT vs 2CEE vs STAR: project[i]



Control impact[i] via historical data



Stagger around superset of possible impact[i]



Median: 50% point
Spread: (75 - 50)%
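
The two summary statistics can be computed with the Python standard library. This is a sketch: the input values are invented, and `statistics.quantiles` uses its default "exclusive" interpolation, which may differ slightly from whatever percentile method produced the slides.

```python
from statistics import quantiles

def median_and_spread(values):
    """Median = 50th percentile; spread = 75th percentile - 50th percentile."""
    _q25, q50, q75 = quantiles(values, n=4)  # the three quartile cut points
    return q50, q75 - q50

median, spread = median_and_spread([10, 20, 30, 40, 50, 60, 70])
print(median, spread)
```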






STAR/2CEE = 50/800 = 6%
STAR/SCAT = 50/1300 = 4%





STAR/2CEE = 400/1600 = 25%
STAR/SCAT = 400/1900 = 21%
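
The percentages on this slide and the previous one are plain ratios rounded to whole percents. A quick check, with the numerators and denominators copied from the slides (the helper name `percent` is ours):

```python
def percent(numerator, denominator):
    """Ratio expressed as a whole-number percentage."""
    return round(100 * numerator / denominator)

ratios = {
    "STAR/2CEE, first slide": percent(50, 800),
    "STAR/SCAT, first slide": percent(50, 1300),
    "STAR/2CEE, second slide": percent(400, 1600),
    "STAR/SCAT, second slide": percent(400, 1900),
}
print(ratios)
```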


























Ignoring historical data is useful (!!!?)




















If you fix everything, you lose fixes for everything else
Ignoring historical data is useful (!!!?)

Luke, trust the force, I mean, collars

IEEE Computer, Jan 2007
“The strangest thing about software”

Extra Material


Related work

Feather, DDP, treatment learning
– Optimization of requirement models

Xerox PARC, 1980s, qualitative representations (QR)
– Not overly specific
– Quickly collected in a new domain
– Used for model diagnosis and repair
– Can find creative solutions in the larger space of possible qualitative behaviors, rather than in the tighter space of precise quantitative behaviors

Abduction:
– World W = minimal set of assumptions A (w.r.t. size) such that
  – T ∪ A => G
  – Not(T ∪ A => error)
– Framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test case generation, prediction, …
– Theoretically slow (NP-hard) but this should be practical:
  – Abduction + stochastic sampling
  – Find collars
  – Learn constraints on collars


Possible optimizations (not used here)

STAR, an example of a general process:
– Stochastic sampling
– Sort settings by “value”
– Rule generation experiments favoring highly “value”-ed settings

See also, elite sampling in the cross-entropy method

If SA convergence too slow
– Try moving back select into the SA;
– Constrain solution mutation to prefer highly “value”-ed settings

BORE (best or rest)
– n runs
– Best = top 10% scores
– Rest = remaining 90%
– {a,b} = frequency of discretized range in {best, rest}
– Sort settings by -1 * (a/n)^2 / (a/n + b/n)
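
A small sketch of the BORE ranking described above. It assumes higher scores are better and that each run is represented as a set of already-discretized settings. The setting names and scores are made up.

```python
from collections import Counter

def bore_rank(runs, best_fraction=0.1):
    """runs: list of (settings, score). Sort settings by -1 * (a/n)^2 / (a/n + b/n),
    where a and b count how often a setting appears in best and rest."""
    n = len(runs)
    ordered = sorted(runs, key=lambda run: run[1], reverse=True)
    n_best = max(1, round(n * best_fraction))
    best, rest = ordered[:n_best], ordered[n_best:]
    a = Counter(s for settings, _ in best for s in settings)
    b = Counter(s for settings, _ in rest for s in settings)

    def key(setting):
        fa, fb = a[setting] / n, b[setting] / n
        return -1 * fa ** 2 / (fa + fb)  # most negative sorts first

    return sorted(set(a) | set(b), key=key)

# Ten made-up runs: only the top-scoring run uses x=1; every run uses y=0.
runs = [({"x=1", "y=0"}, 10)] + [({"x=0", "y=0"}, s) for s in range(1, 10)]
ranked = bore_rank(runs)
print(ranked)
```

Squaring a/n rewards settings that are frequent in best, while the denominator penalizes settings that are just as common in rest. Here `y=0` appears in every run and so ranks below `x=1`, which appears only in the best run.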

Other valuable tricks:
– Incremental discretization: Gama & Pinto’s PID + Fayyad & Irani
– Limited discrepancy search: Harvey & Ginsberg
– Treatment learning: Menzies & Yu

Ask me why, off-line

“Uncertainty helps planning”

(questions? comments?)


At the “policy point”, STAR’s random solutions are surprisingly accurate
– LC: learn impact[i] via regression (JPL data)
– STAR: no tuning, randomly pick impact[i]

Diff = ∑ mre(lc) / ∑ mre(star)
Mre = abs(predicted - actual) / actual

{ “●” “❍” } same at {95, 99}% confidence (MWU)

Why so little Diff (median = 75%)?
– Most influential inputs tightly constrained

∑ mre(lc) / ∑ mre(star)

          tactical    strategic
ground    63%         66%
OSP       111% ●❍     112% ●❍
OSP2      125% ●❍     99%
flight    121% ●❍     101% ●❍
all       75%         91%

[Slide annotations, row order not recoverable: diff same; diff diff; same same; diff diff; same same]
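
The Diff measure defined on this slide can be written out directly. The sketch below uses invented actuals and predictions purely to exercise the formula. None of the numbers come from the JPL data.

```python
def mre(predicted, actual):
    """Magnitude of relative error: abs(predicted - actual) / actual."""
    return abs(predicted - actual) / actual

def diff(actuals, lc_predictions, star_predictions):
    """Diff = sum of mre(lc) / sum of mre(star)."""
    lc_total = sum(mre(p, a) for p, a in zip(lc_predictions, actuals))
    star_total = sum(mre(p, a) for p, a in zip(star_predictions, actuals))
    return lc_total / star_total

# Made-up example: LC is off by 10% on each project, STAR by 20%.
actuals = [100.0, 200.0]
lc_predictions = [110.0, 180.0]
star_predictions = [120.0, 160.0]

ratio = diff(actuals, lc_predictions, star_predictions)
print(f"{ratio:.0%}")
```

A Diff below 100% means LC's total relative error is smaller than STAR's, and a Diff above 100% means the reverse.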


(Model uncertainty = collars) << inputs

In many models, a few “collar” variables set the other variables
– Narrows (Amarel in the 60s)
– Minimal environments (DeKleer ’85)
– Master variables (Crawford & Baker ’94)
– Feature subset selection (Kohavi & John ’97)
– Back doors (Williams et al ’03)
– See “The Strangest Thing About Software” (IEEE Computer, Jan ’07)

Collars appear in all execution traces (by definition)
– You don’t have to find the collars, they’ll find you

So, to handle uncertainty
– Write a simulator
– Stagger over uncertainties
– From stagger, find collars
– Constrain collars

This talk: a very simple example of this process


Comparisons

Standard software process modeling
– Models written more than run (PROSIM community)
  – Limited sensitivity analysis
  – Limited trade space
– Or, expensive, error-prone, incomplete data collection programs
  – Point solutions

Here:
– No data collection
– Found stable conclusions within a space of possibilities
– Search: very simple
– Solution, not brittle
  – With trade-off space

[Figure label: 22 good ideas, sorted]


Summary

Living with uncertainty
– Sometimes, simpler than you may think
– More useful than you might think

Simple:
– Here, the smallest change to simulated annealing

Useful:
– Sometimes uncertainty can teach you more than certainty
– If you fix everything, you lose fixes to everything else

Collars control certainty
– Uncertainty plus constrained collars → more certainty
– Also, can drive model to better performance

An example you can explain to any business user

[Figure labels: Bad, Good; 22 good ideas, sorted]

