
Page 1

On Parameter Tuning in Search-Based Software Engineering: A Replicated Empirical Study

Abdel Salam Sayyad, Katerina Goseva-Popstojanova, Tim Menzies, Hany Ammar

West Virginia University, USA

International Workshop on Replication in Software Engineering Research (RESER)

Oct 9, 2013

Page 2

Sound bites

Search-Based Software Engineering
• Is here… to stay. A helper… not an alternative to human SE.
• Randomness… is an essential part of search algorithms…
• … hence the need for statistical examination (a lot to learn from empirical SE).

Parameter Tuning
• A real problem… Default values (rules of thumb) do exist… and (sadly?) they are being followed.
• Default parameter values fail to optimize performance… as seen in the original study, and in this replication.
• No Free Lunch Theorems for Optimization [Wolpert and Macready ‘97]: the same parameter values don’t optimize all algorithms for all problems.

Page 3

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 4

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 5

Searching for what?

• Correct solutions…
  – Conform to system relationships and constraints.
• Optimal solutions…
  – Achieve user objectives/preferences…
• Complex problems have big search spaces…
  – Exhaustive search is not a practical idea.

Page 6

Genetic Algorithm

• Start with a large population of candidate solutions… (How large?)
• Evaluate the fitness of your solutions.
• Let your candidate solutions crossover – exchange genes… (How often?)
• Mutate a small portion of your solutions. (How small?)
• How do those choices affect performance? (See the sketch below.)
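To make those knobs concrete, here is a minimal bitstring GA sketch in Python. This is an illustration only, not the study's implementation: POP_SIZE, CROSSOVER_RATE, MUTATION_RATE, N_BITS and the count-the-ones fitness are all assumed placeholder values.

```python
import random

# The three knobs from the slide: "How large?", "How often?", "How small?"
# All values below are illustrative placeholders, not the study's settings.
POP_SIZE = 100        # How large a population?
CROSSOVER_RATE = 0.8  # How often do parents exchange genes?
MUTATION_RATE = 0.01  # How small a portion of bits gets flipped?
N_BITS = 50

def fitness(ind):
    # Toy fitness: count the ones. A real SBSE fitness would score a candidate solution.
    return sum(ind)

def evolve(generations=50):
    pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
    for _ in range(generations):
        # Binary tournament selection of parents.
        parents = [max(random.sample(pop, 2), key=fitness) for _ in range(POP_SIZE)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < CROSSOVER_RATE:          # single-point crossover
                cut = random.randrange(1, N_BITS)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):                           # per-bit mutation
                children.append([bit ^ (random.random() < MUTATION_RATE) for bit in child])
        pop = children
    return max(pop, key=fitness)

print(fitness(evolve()))
```

Each of the three constants changes how the search explores the space, which is exactly the tuning question the rest of the talk examines.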

Page 7

Multi-objective Optimization

[Figure: a Pareto front of trade-off solutions; higher-level decision making selects the chosen solution from the front.]

Page 8

Survival of the fittest (according to NSGA-II [Deb et al. 2002])

• Boolean dominance (x dominates y, or it does not) – a sketch follows below:
  – In no objective is x worse than y.
  – In at least one objective, x is better than y.
• Crowd pruning.
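The Boolean dominance test above translates almost directly into code. A minimal sketch, assuming all objectives are to be maximized (flip the comparisons for minimization):

```python
def dominates(x, y):
    """Boolean dominance: x is no worse than y in every objective
    and strictly better in at least one (objectives assumed maximized)."""
    no_worse = all(xi >= yi for xi, yi in zip(x, y))
    strictly_better = any(xi > yi for xi, yi in zip(x, y))
    return no_worse and strictly_better

print(dominates((3, 5), (3, 4)))  # True: equal on one objective, better on the other
print(dominates((3, 5), (4, 1)))  # False: worse on the first objective
```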

Page 9

Indicator-Based Evolutionary Algorithm (IBEA) [Zitzler and Kunzli ‘04]

1) For {old generation + new generation}:
   – Add up every individual’s amount of dominance with respect to everyone else (its fitness F).
   – Sort all individuals by F.
   – Delete the worst, recalculate, delete the worst, recalculate, …
2) Then run a standard GA (crossover, mutation) on the survivors to create a new generation. Back to 1. (A rough sketch of the pruning loop follows.)
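A rough sketch of that fitness-and-pruning loop, using the additive epsilon indicator and the exponential fitness from Zitzler and Kunzli ‘04. The scaling constant KAPPA and the toy objective vectors are assumptions for illustration, and all objectives are treated as maximized; this is not the study's code.

```python
import math

KAPPA = 0.05  # assumed indicator-scaling constant

def eps_indicator(a, b):
    # Smallest shift that would make solution a weakly dominate solution b (maximization).
    return max(bj - aj for aj, bj in zip(a, b))

def prune(objs, keep):
    """Repeatedly recompute fitness over the surviving pool and delete the worst."""
    pool = list(range(len(objs)))
    while len(pool) > keep:
        # Each individual's fitness sums how strongly everyone else dominates it.
        fit = {i: sum(-math.exp(-eps_indicator(objs[j], objs[i]) / KAPPA)
                      for j in pool if j != i)
               for i in pool}
        pool.remove(min(pool, key=fit.get))   # delete worst, then recalculate
    return [objs[i] for i in pool]

# Toy usage: four 2-objective vectors, keep the best three; (0.0, 0.0) is dropped.
print(prune([(1.0, 5.0), (2.0, 4.0), (0.0, 0.0), (5.0, 1.0)], keep=3))
```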

Page 10

NSGA-II… the default algorithm

• Much prior work in SBSE used NSGA-II (*)
• … and didn’t state why!

--------------------------
(*) Sayyad and Ammar, RAISE’13

Page 11

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 12

The Original Study

• A. Arcuri and G. Fraser, "On Parameter Tuning in Search Based Software Engineering," in Proc. SSBSE, 2011, pp. 33-47.

• A. Arcuri and G. Fraser, "Parameter Tuning or Default Values? An Empirical Investigation in Search-Based Software Engineering," Empirical Software Engineering, Feb 2013.

• Problem: generating test vectors for object-oriented software.

• Fitness function: percentage of test coverage.

Page 13

Results of original study

• Different parameter settings cause very large variance in the performance.

• Default parameter settings perform relatively well, but are far from optimal on individual problem instances.

Page 14

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 15

Feature-oriented domain analysis [Kang 1990]

• Feature models = a lightweight method for defining a space of options

• De facto standard for modeling variability, e.g. Software Product Lines

[Figure: example feature model tree, with cross-tree constraints.]

Page 16

What are the user preferences?

• Suppose each feature had the following metrics:
  1. Boolean USED_BEFORE?
  2. Integer DEFECTS
  3. Real COST
• Show me the space of “best options” according to the objectives (a sketch follows below):
  1. That satisfies the most domain constraints (0 ≤ #violations ≤ 100%).
  2. That offers the most features.
  3. Maximize the number of features that were used before (promote reuse).
  4. Minimize overall known defects.
  5. Minimize cost.
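To make the five objectives concrete, here is a hypothetical sketch of how one candidate product (a 0/1 feature-selection vector) could be scored. The feature metrics and the violation count are made-up illustrations, not the study's actual models; maximization objectives are negated so that an optimizer can treat everything uniformly as minimization.

```python
# Hypothetical scoring of one candidate product; field names are illustrative.
def objectives(selection, features, n_violations):
    chosen = [f for bit, f in zip(selection, features) if bit]
    return (
        n_violations,                               # 1. minimize constraint violations
        -len(chosen),                               # 2. maximize number of features
        -sum(f["used_before"] for f in chosen),     # 3. maximize reuse
        sum(f["defects"] for f in chosen),          # 4. minimize known defects
        sum(f["cost"] for f in chosen),             # 5. minimize cost
    )

# Toy usage with two made-up features; the first is selected, the second is not.
features = [{"used_before": 1, "defects": 2, "cost": 10.0},
            {"used_before": 0, "defects": 0, "cost": 3.5}]
print(objectives([1, 0], features, n_violations=0))
```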

Page 17

Previous Work [Sayyad et al. ICSE’13]

• IBEA (continuous dominance criterion) beats NSGA-II and a host of other algorithms based on Boolean dominance criterion.

• Especially with a high number of objectives.
• Quality indicators:
  – Percentage of conforming (usable) solutions – we’re interested in 100%-conforming solutions (see the sketch below).
  – Hypervolume (how close to optimal?)
  – Spread (how diverse?)
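The first quality indicator is simple to state precisely. A small sketch, assuming each returned solution carries a count of violated constraints (the field name is illustrative):

```python
def percent_conforming(front):
    """Percentage of solutions in a returned front that violate no constraints at all."""
    return 100.0 * sum(1 for s in front if s["violations"] == 0) / len(front)

print(percent_conforming([{"violations": 0}, {"violations": 2}, {"violations": 0}]))  # ~66.7
```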

Page 18

Setup

Page 19

What are “default settings”?

• Population size = 100

• Crossover rate = 80%
  – within the recommended range 60% < crossover rate < 90% [A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing, Springer, 2003]
• Mutation rate = 1/#Features
  – [on average, one bit flipped out of the whole string]
(These defaults are written out as a small configuration sketch below.)
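Written out as a plain configuration sketch; the feature count is a placeholder, not a value from the paper.

```python
N_FEATURES = 300  # placeholder: length of the feature-selection bitstring for one feature model

DEFAULT_SETTINGS = {
    "population_size": 100,
    "crossover_rate": 0.80,              # inside the 60-90% range recommended by Eiben and Smith
    "mutation_rate": 1.0 / N_FEATURES,   # on average, one bit flipped per candidate string
}
print(DEFAULT_SETTINGS)
```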

Page 20

Research Questions

Page 21

Results [10 seconds per algorithm per feature model (FM)]

Page 22

Answer to RQ1

• RQ1: How Large is the Potential Impact of a Wrong Choice of Parameter Settings?

• We confirm Arcuri and Fraser’s conclusion: “Different parameter settings cause very large variance in the performance.”

Page 23

Answer to RQ2

• RQ2: How Does a “Default” Setting Compare to the Best and Worst Achievable Performance?

• Arcuri and Fraser concluded that: “Default parameter settings perform relatively well, but are far from optimal on individual problem instances.”

• We make a stronger conclusion: “Default parameter settings perform generally poorly, but might perform relatively well on individual problem instances.”

Page 24

Answer to RQ3

• RQ3: How does the performance of IBEA’s best tuning compare to NSGA-II’s best tuning?

• Our results show that “IBEA’s best tuning performs generally much better than NSGA-II’s best tuning.”

Page 25

RQ4: Parameter Training

• Find the best tuning for a group of problem instances and apply it to a new problem instance: would it be the best tuning for the new problem? (A schematic sketch follows at the end of this slide.)

• Arcuri and Fraser concluded that: “Tuning should be done on a very large sample of problem instances. Otherwise, the obtained parameter settings are likely to be worse than arbitrary default values.”

• Our conclusion: “Tuning on a sample of problem instances does not, in general, result in the best parameter values for a new problem instance, but the obtained settings are generally better than the default settings.”
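A schematic sketch of the RQ4 protocol: pick the setting that is best on average over a training group of instances, then compare it on a held-out instance against that instance's own best setting and the defaults. The `score(setting, instance)` callable and all values are stand-ins for running the tuned algorithm and measuring a quality indicator (higher is better); none of this is the study's code.

```python
def best_transferred_setting(settings, train_instances, score):
    """Pick the setting with the best average score over the training instances."""
    return max(settings,
               key=lambda s: sum(score(s, i) for i in train_instances) / len(train_instances))

def compare_on_new_instance(settings, train_instances, new_instance, default, score):
    """Transferred setting vs. the new instance's own best tuning vs. the defaults."""
    transferred = best_transferred_setting(settings, train_instances, score)
    own_best = max(settings, key=lambda s: score(s, new_instance))
    return {"transferred": score(transferred, new_instance),
            "own best": score(own_best, new_instance),
            "default": score(default, new_instance)}

# Toy usage with made-up scores: the transferred setting is not the best on the
# new instance, but it still beats the defaults (mirroring the conclusion above).
fake = {("A", "p1"): 0.9, ("A", "p2"): 0.8, ("B", "p1"): 0.6, ("B", "p2"): 0.7,
        ("A", "p_new"): 0.6, ("B", "p_new"): 0.9, ("default", "p_new"): 0.5}
print(compare_on_new_instance(["A", "B"], ["p1", "p2"], "p_new", "default",
                              score=lambda s, i: fake[(s, i)]))
```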

Page 26

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 27

Conclusion

• Default parameter values fail to optimize performance…
• … and, sadly, many SBSE researchers choose “default” algorithms (e.g. NSGA-II) along with “default” parameters.
• Alternatives? A long way to go!
  – Parameter control
  – Adaptive parameter control

Acknowledgment

This research work was funded by the Qatar National Research Fund under the National Priorities Research Program.