
Page 1

On Parameter Tuning in Search-Based Software Engineering: A Replicated Empirical Study

Abdel Salam Sayyad, Katerina Goseva-Popstojanova, Tim Menzies, Hany Ammar

West Virginia University, USA

International Workshop on Replication in Software Engineering Research (RESER)

Oct 9, 2013

Page 2

Sound bites

Search-Based Software Engineering
• Is here… to stay. A helper… not an alternative to human SE.
• Randomness… is an essential part of search algorithms…
• … hence the need for statistical examination (a lot to learn from empirical SE).

Parameter Tuning
• A real problem… Default values (rules of thumb) do exist… and (sadly?) they are being followed.
• Default parameter values fail to optimize performance… as seen in the original study, and in this replication.
• No Free Lunch Theorems for Optimization [Wolpert and Macready ‘97]: the same parameter values don’t optimize all algorithms for all problems.

Page 3

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 4

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 5

Searching for what?

• Correct solutions…
  – Conform to system relationships and constraints.
• Optimal solutions…
  – Achieve user objectives/preferences…
• Complex problems have big search spaces…
  – Exhaustive search is not a practical idea.

Page 6

Genetic Algorithm

• Start with a large population of candidate solutions… (How large?)
• Evaluate the fitness of your solutions.
• Let your candidate solutions crossover – exchange genes… (How often?)
• Mutate a small portion of your solutions. (How small?)
• How do those choices affect performance? (See the sketch below.)
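To make those knobs concrete, here is a minimal bitstring GA sketch in Python. This is an illustration only, not the study's implementation: POP_SIZE, CROSSOVER_RATE, MUTATION_RATE, N_BITS and the count-the-ones fitness are all assumed placeholder values.

```python
import random

# The three knobs from the slide: "How large?", "How often?", "How small?"
# All values below are illustrative placeholders, not the study's settings.
POP_SIZE = 100        # How large a population?
CROSSOVER_RATE = 0.8  # How often do parents exchange genes?
MUTATION_RATE = 0.01  # How small a portion of bits gets flipped?
N_BITS = 50

def fitness(ind):
    # Toy fitness: count the ones. A real SBSE fitness would score a candidate solution.
    return sum(ind)

def evolve(generations=50):
    pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
    for _ in range(generations):
        # Binary tournament selection of parents.
        parents = [max(random.sample(pop, 2), key=fitness) for _ in range(POP_SIZE)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < CROSSOVER_RATE:          # single-point crossover
                cut = random.randrange(1, N_BITS)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):                           # per-bit mutation
                children.append([bit ^ (random.random() < MUTATION_RATE) for bit in child])
        pop = children
    return max(pop, key=fitness)

print(fitness(evolve()))
```

Each of the three constants changes how the search explores the space, which is exactly the tuning question the rest of the talk examines.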

Page 7

Multi-objective Optimization

[Figure: a Pareto front of trade-off solutions; higher-level decision making selects the chosen solution from the front.]

Page 8

Survival of the fittest (according to NSGA-II [Deb et al. 2002])

• Boolean dominance (x dominates y, or it does not) – a sketch follows below:
  – In no objective is x worse than y.
  – In at least one objective, x is better than y.
• Crowd pruning.
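The Boolean dominance test above translates almost directly into code. A minimal sketch, assuming all objectives are to be maximized (flip the comparisons for minimization):

```python
def dominates(x, y):
    """Boolean dominance: x is no worse than y in every objective
    and strictly better in at least one (objectives assumed maximized)."""
    no_worse = all(xi >= yi for xi, yi in zip(x, y))
    strictly_better = any(xi > yi for xi, yi in zip(x, y))
    return no_worse and strictly_better

print(dominates((3, 5), (3, 4)))  # True: equal on one objective, better on the other
print(dominates((3, 5), (4, 1)))  # False: worse on the first objective
```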

Page 9

Indicator-Based Evolutionary Algorithm (IBEA) [Zitzler and Kunzli ‘04]

1) For {old generation + new generation}:
   – Add up every individual’s amount of dominance with respect to everyone else (its fitness F).
   – Sort all individuals by F.
   – Delete the worst, recalculate, delete the worst, recalculate, …
2) Then run a standard GA (crossover, mutation) on the survivors to create a new generation. Back to 1. (A rough sketch of the pruning loop follows.)
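A rough sketch of that fitness-and-pruning loop, using the additive epsilon indicator and the exponential fitness from Zitzler and Kunzli ‘04. The scaling constant KAPPA and the toy objective vectors are assumptions for illustration, and all objectives are treated as maximized; this is not the study's code.

```python
import math

KAPPA = 0.05  # assumed indicator-scaling constant

def eps_indicator(a, b):
    # Smallest shift that would make solution a weakly dominate solution b (maximization).
    return max(bj - aj for aj, bj in zip(a, b))

def prune(objs, keep):
    """Repeatedly recompute fitness over the surviving pool and delete the worst."""
    pool = list(range(len(objs)))
    while len(pool) > keep:
        # Each individual's fitness sums how strongly everyone else dominates it.
        fit = {i: sum(-math.exp(-eps_indicator(objs[j], objs[i]) / KAPPA)
                      for j in pool if j != i)
               for i in pool}
        pool.remove(min(pool, key=fit.get))   # delete worst, then recalculate
    return [objs[i] for i in pool]

# Toy usage: four 2-objective vectors, keep the best three; (0.0, 0.0) is dropped.
print(prune([(1.0, 5.0), (2.0, 4.0), (0.0, 0.0), (5.0, 1.0)], keep=3))
```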

Page 10

NSGA-II… the default algorithm

• Much prior work in SBSE used NSGA-II (*)
• … and didn’t state why!

--------------------------
(*) Sayyad and Ammar, RAISE’13

Page 11

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 12

The Original Study

• A. Arcuri and G. Fraser, "On Parameter Tuning in Search Based Software Engineering," in Proc. SSBSE, 2011, pp. 33-47.

• A. Arcuri and G. Fraser, "Parameter Tuning or Default Values? An Empirical Investigation in Search-Based Software Engineering," Empirical Software Engineering, Feb 2013.

• Problem: generating test vectors for object-oriented software.

• Fitness function: percentage of test coverage.

Page 13

Results of original study

• Different parameter settings cause very large variance in the performance.

• Default parameter settings perform relatively well, but are far from optimal on individual problem instances.

Page 14

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 15

Feature-oriented domain analysis [Kang 1990]

• Feature models = a lightweight method for defining a space of options

• De facto standard for modeling variability, e.g. Software Product Lines

[Figure: example feature model tree, with cross-tree constraints.]

Page 16

What are the user preferences?

• Suppose each feature had the following metrics:
  1. Boolean USED_BEFORE?
  2. Integer DEFECTS
  3. Real COST
• Show me the space of “best options” according to the objectives (a sketch follows below):
  1. That satisfies the most domain constraints (0 ≤ #violations ≤ 100%).
  2. That offers the most features.
  3. Maximize the number of features that were used before (promote reuse).
  4. Minimize overall known defects.
  5. Minimize cost.
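To make the five objectives concrete, here is a hypothetical sketch of how one candidate product (a 0/1 feature-selection vector) could be scored. The feature metrics and the violation count are made-up illustrations, not the study's actual models; maximization objectives are negated so that an optimizer can treat everything uniformly as minimization.

```python
# Hypothetical scoring of one candidate product; field names are illustrative.
def objectives(selection, features, n_violations):
    chosen = [f for bit, f in zip(selection, features) if bit]
    return (
        n_violations,                               # 1. minimize constraint violations
        -len(chosen),                               # 2. maximize number of features
        -sum(f["used_before"] for f in chosen),     # 3. maximize reuse
        sum(f["defects"] for f in chosen),          # 4. minimize known defects
        sum(f["cost"] for f in chosen),             # 5. minimize cost
    )

# Toy usage with two made-up features; the first is selected, the second is not.
features = [{"used_before": 1, "defects": 2, "cost": 10.0},
            {"used_before": 0, "defects": 0, "cost": 3.5}]
print(objectives([1, 0], features, n_violations=0))
```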

Page 17

Previous Work [Sayyad et al. ICSE’13]

• IBEA (continuous dominance criterion) beats NSGA-II and a host of other algorithms based on Boolean dominance criterion.

• Especially with a high number of objectives.
• Quality indicators:
  – Percentage of conforming (usable) solutions – we’re interested in 100%-conforming solutions (see the sketch below).
  – Hypervolume (how close to optimal?)
  – Spread (how diverse?)
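The first quality indicator is simple to state precisely. A small sketch, assuming each returned solution carries a count of violated constraints (the field name is illustrative):

```python
def percent_conforming(front):
    """Percentage of solutions in a returned front that violate no constraints at all."""
    return 100.0 * sum(1 for s in front if s["violations"] == 0) / len(front)

print(percent_conforming([{"violations": 0}, {"violations": 2}, {"violations": 0}]))  # ~66.7
```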

Page 18

Setup

Page 19

What are “default settings”?

• Population size = 100

• Crossover rate = 80%
  – within the recommended range 60% < crossover rate < 90% [A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing, Springer, 2003]
• Mutation rate = 1/#Features
  – [on average, one bit flipped out of the whole string]
(These defaults are written out as a small configuration sketch below.)
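Written out as a plain configuration sketch; the feature count is a placeholder, not a value from the paper.

```python
N_FEATURES = 300  # placeholder: length of the feature-selection bitstring for one feature model

DEFAULT_SETTINGS = {
    "population_size": 100,
    "crossover_rate": 0.80,              # inside the 60-90% range recommended by Eiben and Smith
    "mutation_rate": 1.0 / N_FEATURES,   # on average, one bit flipped per candidate string
}
print(DEFAULT_SETTINGS)
```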

Page 20

Research Questions

Page 21

Results [10 seconds per algorithm per feature model (FM)]

Page 22

Answer to RQ1

• RQ1: How Large is the Potential Impact of a Wrong Choice of Parameter Settings?

• We confirm Arcuri and Fraser’s conclusion: “Different parameter settings cause very large variance in the performance.”

Page 23

Answer to RQ2

• RQ2: How Does a “Default” Setting Compare to the Best and Worst Achievable Performance?

• Arcuri and Fraser concluded that: “Default parameter settings perform relatively well, but are far from optimal on individual problem instances.”

• We make a stronger conclusion: “Default parameter settings perform generally poorly, but might perform relatively well on individual problem instances.”

Page 24

Answer to RQ3

• RQ3: How does the performance of IBEA’s best tuning compare to NSGA-II’s best tuning?

• Our results show that “IBEA’s best tuning performs generally much better than NSGA-II’s best tuning.”

Page 25

RQ4: Parameter Training

• Find the best tuning for a group of problem instances and apply it to a new problem instance: would it be the best tuning for the new problem? (A schematic sketch follows at the end of this slide.)

• Arcuri and Fraser concluded that: “Tuning should be done on a very large sample of problem instances. Otherwise, the obtained parameter settings are likely to be worse than arbitrary default values.”

• Our conclusion: “Tuning on a sample of problem instances does not, in general, result in the best parameter values for a new problem instance, but the obtained settings are generally better than the default settings.”
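A schematic sketch of the RQ4 protocol: pick the setting that is best on average over a training group of instances, then compare it on a held-out instance against that instance's own best setting and the defaults. The `score(setting, instance)` callable and all values are stand-ins for running the tuned algorithm and measuring a quality indicator (higher is better); none of this is the study's code.

```python
def best_transferred_setting(settings, train_instances, score):
    """Pick the setting with the best average score over the training instances."""
    return max(settings,
               key=lambda s: sum(score(s, i) for i in train_instances) / len(train_instances))

def compare_on_new_instance(settings, train_instances, new_instance, default, score):
    """Transferred setting vs. the new instance's own best tuning vs. the defaults."""
    transferred = best_transferred_setting(settings, train_instances, score)
    own_best = max(settings, key=lambda s: score(s, new_instance))
    return {"transferred": score(transferred, new_instance),
            "own best": score(own_best, new_instance),
            "default": score(default, new_instance)}

# Toy usage with made-up scores: the transferred setting is not the best on the
# new instance, but it still beats the defaults (mirroring the conclusion above).
fake = {("A", "p1"): 0.9, ("A", "p2"): 0.8, ("B", "p1"): 0.6, ("B", "p2"): 0.7,
        ("A", "p_new"): 0.6, ("B", "p_new"): 0.9, ("default", "p_new"): 0.5}
print(compare_on_new_instance(["A", "B"], ["p1", "p2"], "p_new", "default",
                              score=lambda s, i: fake[(s, i)]))
```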

Page 26

Roadmap

① Randomness of Search
② The original study
③ The replication
④ Conclusion

Page 27

Conclusion

• Default parameter values fail to optimize performance…
• … and, sadly, many SBSE researchers choose “default” algorithms (e.g. NSGA-II) along with “default” parameters.
• Alternatives? A long way to go!
  – Parameter control
  – Adaptive parameter control

Acknowledgment

This research work was funded by the Qatar National Research Fund under the National Priorities Research Program.