vrp2013 - comp aspects vrp
TRANSCRIPT
Computational Aspects Of Vehicle Routing
Victor Pillac, May 20
VRP 2013, Angers, France
1
Agenda
2
• Introduction: What is complexity? Why is it important?
• Data structures: How to represent a solution efficiently?
• Algorithmic tricks: What are the main bottlenecks and how to avoid them?
• Parallelization: How does parallel computing work? Why, when, and how to parallelize?
• Software engineering: How to design flexible and reusable code?
• Resources: How to avoid reinventing the wheel?
INTRODUCTION
3
About Me
• Finished my Ph.D. in 2012 at the École des Mines de Nantes (France) and Universidad de Los Andes (Colombia)
  • Dynamic vehicle routing: solution methods and computational tools
• Since Oct. 2012, researcher at NICTA (Melbourne, Australia)
  • Disaster management team
• NICTA in a few numbers:
  • 700 staff, 260 PhDs
  • 7 research groups, 4 business teams
  • 550+ publications in 2012
4
Assumptions
• General knowledge of vehicle routing
• General knowledge of common heuristics
  • Local search
  • Variable Neighborhood Search (VNS)
• General knowledge of object-oriented programming
  • Examples are in Java
5
Time Complexity
6
• Measures the worst-case number of operations
• Expressed as a function of the size of the problem n

For i = 1 to n
  a = 1 + i
  b = 2 * a
  c = a * b + 2

Performs n*(1+1+2) = 4n operations: complexity is O(n)

For S ⊆ {1..n}
  a = 1 + |S|

Performs 2^n operations: complexity is O(2^n)
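The contrast between the two loops above can be reproduced in Java: the first performs a constant amount of work per iteration (O(n)), while enumerating every subset of {1..n} takes 2^n iterations. A minimal sketch (class and method names are illustrative):

```java
public class ComplexityDemo {
    // O(n): a constant number of operations per loop iteration
    static long linear(int n) {
        long ops = 0;
        for (int i = 1; i <= n; i++) {
            int a = 1 + i;      // 1 operation
            int b = 2 * a;      // 1 operation
            int c = a * b + 2;  // 2 operations
            ops += 4;
        }
        return ops;
    }

    // O(2^n): one operation per subset S of {1..n},
    // each subset encoded as a bitmask
    static long exponential(int n) {
        long ops = 0;
        for (long mask = 0; mask < (1L << n); mask++) {
            int a = 1 + Long.bitCount(mask); // a = 1 + |S|
            ops++;
        }
        return ops;
    }

    public static void main(String[] args) {
        System.out.println(linear(1000));    // 4000
        System.out.println(exponential(20)); // 1048576
    }
}
```

Already at n = 40 the second loop would take around a trillion iterations, which is why exponential neighborhoods must be pruned or avoided.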
Space Complexity
7
• Measures the worst-case memory usage
• Expressed as a function of the size of the problem n

For i = 1 to n
  a = 1 + i
  b = 2 * a
  c = a * b + 2

Stores at most 4 integers simultaneously: complexity is O(1)

For S ⊆ {1..n}
  a = 1 + |S|

Stores at most n+1 integers simultaneously: complexity is O(n)
Complexity In Practice
8

             10       100      1000     10,000   100,000   1,000,000
  n          0.1 ns   1 ns     10 ns    100 ns   1 µs      10 µs
  n·log(n)   0.1 ns   2 ns     30 ns    400 ns   5 µs      60 µs
  n^2        1 ns     100 ns   10 µs    1 ms     100 ms    10 s
  n^3        1 ns     10 µs    10 ms    1 s      2.7 h     115 d
  e^n        22 µs    8.5·10^24 years   -        -         -         -

Computational time for a single floating point operation on a recent desktop processor

Complexity In Practice
9

             10       100      1000     10,000   100,000   1,000,000
  n          320 b    3.2 kb   32 kb    320 kb   3.2 Mb    32 Mb
  n·log(n)   320 b    64 kb    96 kb    1.28 Mb  16 Mb     190 Mb
  n^2        3.2 kb   320 kb   32 Mb    3.2 Gb   320 Gb    32 Tb
  n^3        32 kb    32 Mb    32 Gb    32 Tb    32 Pb     32 Eb*
  e^n        700 kb   8·10^26 Eb        -        -         -         -

Memory requirement to store a single floating point precision number
(*The world’s storage capacity is estimated to be 300 Eb, or 300 billion Gb)
Local Search & Terminology
10
• Initial solution: S0
• Neighborhood: the set of solutions that can be reached from S0
• Move: a transformation of the current solution into one of its neighbors
• Executed move: the move actually applied, yielding the new current solution (S1, then S3, ...)
DATA STRUCTURES
11
Representing Routes
12
• Routes are the basic building block when solving vehicle routing problems
• It is critical to have efficient data structures to store them
• There is no single best data structure
  • Performance depends on how it is used
  • Tradeoff between simplicity and performance
• The choice should be motivated by
  • Purpose: prototype vs. state-of-the-art algorithm
  • Usage: what are the most common operations?
Dynamic Array List
13
• Common operation complexity
  • Access to a customer by position: O(1)
  • Access to the position of a customer by id: O(n)
  • Iteration: O(1)
  • Insertion/deletion: O(n)
• See [ArrayListRoute.java]

[Figure: a route through nodes 0 to 7 stored as a dynamic array indexed by position.]
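As a rough sketch of what an array-backed route such as [ArrayListRoute.java] might look like (the class below is illustrative, not the tutorial's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative array-backed route: O(1) access by position, O(n) lookup by id. */
public class ArrayRoute {
    private final List<Integer> nodes = new ArrayList<>();

    public void append(int node)            { nodes.add(node); }            // amortized O(1)
    public int getNodeAt(int position)      { return nodes.get(position); } // O(1)
    public int getPositionOf(int node)      { return nodes.indexOf(node); } // O(n) scan
    public void insertAt(int pos, int node) { nodes.add(pos, node); }       // O(n) shift
    public int removeFirst()                { return nodes.remove(0); }     // O(n) shift
    public int length()                     { return nodes.size(); }
}
```

Appending and positional access are cheap, but every insertion or deletion shifts the tail of the backing array, which is what dominates in a local search that constantly modifies routes.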
Doubly Linked List
14
• Common operation complexity
  • Access to a customer by position: O(n)
  • Access to the position of a customer by id: O(n)
  • Iteration: O(1)
  • Insertion/deletion: O(1)
• See [LinkedListRoute.java]

[Figure: the same route stored as a chain of doubly linked nodes.]
Doubly Linked List V2
15
• Common operation complexity
  • Access to a customer by position: O(n)
  • Access to the position of a customer by id: O(1)
  • Iteration: O(1)
  • Insertion/deletion: O(1)
• Implementation can be tricky, especially for repeated nodes (e.g., the depot)
• Warning: the implementation in VroomModeling is incomplete

[Figure: predecessor and successor arrays indexed by node id, plus pointers to the first and last nodes of the route.]
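A hedged sketch of the V2 idea: store, for each node id, its predecessor and successor in plain arrays, so the position-independent operations become O(1). The class is illustrative, and it sidesteps the repeated-depot subtlety noted above by assuming each node id appears at most once:

```java
/** Illustrative id-indexed doubly linked route (each node id appears at most once). */
public class IdLinkedRoute {
    private final int[] pred;
    private final int[] succ;
    private int first = -1, last = -1;

    public IdLinkedRoute(int maxId) {
        pred = new int[maxId + 1];
        succ = new int[maxId + 1];
        java.util.Arrays.fill(pred, -1);
        java.util.Arrays.fill(succ, -1);
    }

    /** Append a node at the end of the route: O(1). */
    public void append(int node) {
        if (first == -1) { first = node; } else { succ[last] = node; pred[node] = last; }
        last = node;
    }

    /** Insert 'node' right after 'after': O(1). */
    public void insertAfter(int after, int node) {
        int next = succ[after];
        succ[after] = node; pred[node] = after;
        succ[node] = next;
        if (next != -1) pred[next] = node; else last = node;
    }

    /** Remove a node: O(1), no scan needed because arrays are indexed by id. */
    public void remove(int node) {
        int p = pred[node], s = succ[node];
        if (p != -1) succ[p] = s; else first = s;
        if (s != -1) pred[s] = p; else last = p;
        pred[node] = succ[node] = -1;
    }

    public int successorOf(int node)   { return succ[node]; }
    public int predecessorOf(int node) { return pred[node]; }
    public int getFirst()              { return first; }
    public int getLast()               { return last; }
}
```

Supporting a depot that appears at both ends of every route requires either duplicating its id or storing routes separately, which is where real implementations get tricky.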
16
/* HANDS ON */
17
Resources / Solutions: http://victorpillac.com/vrp2013
18
Naming Conventions:
  m: prefix for instance fields (e.g., mMyField)
  s: prefix for static fields (e.g., sMyStaticField)
  I: prefix for interface names (e.g., IMyInterface)
  Base: suffix for abstract types (e.g., MyTypeBase)
Logging: uses Log4J, see VRPLogging.java
19
[Screenshot: the Eclipse workspace, showing the open files, the documentation and console views, the package explorer with source files, and the class structure outline.]
20-22
[Screenshots: a compilation error in the editor; clicking it shows an explanation of the error and possible quick fixes.]
algorithms package
23
Implementations of IVRPOptimizationAlgorithm: CW, VND, VNS, GRASP, ParallelGRASP, HeuristicConcentration
• CW: Clarke and Wright heuristic to generate routes
• VND: explores a number of neighborhoods
• GRASP: starts with a solution from CW and applies VND
• ParallelGRASP: parallel implementation of GRASP
• HeuristicConcentration: takes a set of routes and builds a solution

examples package
• Each class contains a main method that we will use to run the examples

util package
• Classes to make our life easier
24
[ExampleRoutesAtomic.java]
25
• Compares ArrayListRoute and LinkedListRoute
  • Append a node
  • Get a node at a random position
  • Remove the first node

Results:
  ArrayList:  Append: 123.1 ms, GetNodeAt: 18.7 ms, RemoveFirst: 134.2 ms
  LinkedList: Append: 129.4 ms, GetNodeAt: 66.6 ms, RemoveFirst: 110.6 ms
[CW.java]
26
• Clarke and Wright constructive heuristic
  • Initialization: create one route per node
  • Each step: merge the two routes that yield the greatest saving
  • Repeat until no feasible merge remains
• Implemented in VroomHeuristics, in package vroom.common.heuristics.cw

[Figure: savings-based merging, step by step, of single-node routes into combined routes.]
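The savings at the heart of the heuristic are cheap to compute: merging the route ending at i with the route starting at j saves s(i,j) = c(i,0) + c(0,j) - c(i,j), where 0 is the depot. A minimal sketch of the savings computation (the distance matrix and class names are illustrative, not the VroomHeuristics implementation):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative Clarke and Wright savings list (node 0 is the depot). */
public class Savings {
    /** s(i,j) = c(i,0) + c(0,j) - c(i,j): the cost saved by serving j right after i. */
    static double saving(double[][] c, int i, int j) {
        return c[i][0] + c[0][j] - c[i][j];
    }

    /** All (i,j) pairs sorted by decreasing saving; merges are then attempted in this order. */
    static List<int[]> sortedSavings(double[][] c) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 1; i < c.length; i++)
            for (int j = 1; j < c.length; j++)
                if (i != j) pairs.add(new int[]{i, j});
        pairs.sort(Comparator.comparingDouble((int[] p) -> -saving(c, p[0], p[1])));
        return pairs;
    }

    public static void main(String[] args) {
        // Depot 0 and nodes 1..3 on a line: 0 -- 1 -- 2 -- 3
        double[][] c = {
            {0, 1, 2, 3},
            {1, 0, 1, 2},
            {2, 1, 0, 1},
            {3, 2, 1, 0}
        };
        int[] best = sortedSavings(c).get(0);
        // Adjacent nodes far from the depot give the largest saving, e.g. s(2,3) = 2 + 3 - 1 = 4
        System.out.println(best[0] + "," + best[1] + " saving=" + saving(c, best[0], best[1]));
    }
}
```

The full heuristic then walks this sorted list and merges the two routes of each pair whenever the merge is feasible (same route ends, capacity respected).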
[ExampleCW.java]
27
[Screenshot: the code of ExampleCW, with callouts on the parameter "true" and the logging level "LEVEL_WARN".]
Variable Neighborhood Descent
28
• Explore different neighborhoods sequentially
• The final solution is a local optimum for all neighborhoods

[Flowchart: start with neighborhood N1; if an improvement is found, restart from N1; otherwise move on to N2, and so on up to Nn; end when no neighborhood yields an improvement.]
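The flowchart above can be sketched as a loop over an ordered list of neighborhoods, restarting from the first one whenever an improvement is found. The Neighborhood interface below is an illustrative stand-in, not the tutorial's actual API:

```java
import java.util.List;

/** Illustrative neighborhood: tries to improve the solution in place, true if it succeeded. */
interface Neighborhood<S> {
    boolean findAndApplyImprovement(S solution);
}

public class VndSketch {
    /** Variable Neighborhood Descent over an ordered list of neighborhoods. */
    public static <S> void vnd(S solution, List<Neighborhood<S>> neighborhoods) {
        int k = 0;
        while (k < neighborhoods.size()) {
            if (neighborhoods.get(k).findAndApplyImprovement(solution)) {
                k = 0; // improvement found: restart from the first neighborhood
            } else {
                k++;   // no improvement: move on to the next neighborhood
            }
        }
        // on exit, the solution is a local optimum for every neighborhood
    }
}
```

A common rule of thumb is to order the neighborhoods from cheapest to most expensive to explore, so that the cheap ones absorb most of the improving iterations.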
[VND.java]
29
• The constraints are defined separately from the neighborhoods; each constraint is responsible for checking whether a move is feasible
• Instantiate the neighborhoods that will be used later
[VND.java]
30
• localSearch performs a local search in the neighborhood of the current solution
• The parameters control how the search is performed, in this case deterministic and best improvement
[ExampleVND.java]
31
• Run the main method
• Is the ordering of neighborhoods in VND.java logical? How could it be improved?
• Is the localSearch implementation in VND.java coherent with the definition of VND?
[ExampleRoutesOptim.java]
32
• Compares ArrayListRoute and LinkedListRoute
  • Constructive heuristic (CW)
  • Variable Neighborhood Descent optimization (VND)

Results:
  ArrayList:  CW: 617.1 ms, VND: 67,576.7 ms
  LinkedList: CW: 414.4 ms, VND: 86,443.3 ms
Store Routes
33
• Store routes for future use
• Requirements
  • Memory-efficient: avoid repeated routes, store a minimalistic route representation
  • Low computation overhead
• Two approaches
  • Exhaustive list (issue: repeated routes)
  • Hash-based set

Hash Functions
34
• Compress the information stored in a route
• Desired characteristics
  • Determinism
  • Uniformity
• Issues
  • Two different routes can have the same hash (hash collision)
  • Computational cost of hash evaluation
35
/* HANDS ON */

Sequence Dependent Hash
36
• See Groer et al. (2010): [GroerSolutionHasher.java]
• Produces a 32-bit integer that depends on the set and sequence of nodes in the route
• Building block: bitwise XOR, e.g., 010 XOR 111 = 101 (2 XOR 7 = 5)

Input:
  rnd: an array of n random integers
  route: a route
Output:
  a hash value for route

1. if route.first > route.last
     route ← reverse ordering of route
2. hash ← 0
3. for each edge (i,j) in route
     hash ← hash XOR rnd[(i+j) % n]
4. return hash
Sequence Independent Hash
37
• See Pillac et al. (2012): [NodeSetSolutionHasher.java]
• Produces a 32-bit integer that depends on the set of nodes visited by the route
• Advantage: implicit filtering of duplicated routes

Input:
  rnd: an array of n random integers
  route: a route
Output:
  a hash value for route

1. hash ← 0
2. for each node i in route
     hash ← hash XOR rnd[i % n]
3. return hash
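Both hash functions above fit in a few lines of Java. This is a minimal sketch of the pseudocode, not the actual GroerSolutionHasher or NodeSetSolutionHasher implementations:

```java
import java.util.Random;

/** Illustrative XOR-based route hashers, following the pseudocode above. */
public class RouteHashers {
    private final int[] rnd;

    public RouteHashers(int n, long seed) {
        rnd = new int[n];
        Random r = new Random(seed);
        for (int i = 0; i < n; i++) rnd[i] = r.nextInt();
    }

    /** Depends on the set AND sequence of nodes: hashes the (canonically oriented) edges. */
    public int sequenceDependentHash(int[] route) {
        int[] r = route;
        if (r[0] > r[r.length - 1]) { // canonical orientation: a route and its reverse hash alike
            r = new int[route.length];
            for (int i = 0; i < route.length; i++) r[i] = route[route.length - 1 - i];
        }
        int hash = 0;
        for (int k = 0; k + 1 < r.length; k++)
            hash ^= rnd[(r[k] + r[k + 1]) % rnd.length];
        return hash;
    }

    /** Depends only on the set of nodes: permutations of the same nodes collide on purpose. */
    public int nodeSetHash(int[] route) {
        int hash = 0;
        for (int i : route) hash ^= rnd[i % rnd.length];
        return hash;
    }
}
```

Because XOR is commutative, the node-set hash is identical for any permutation of the same customers, which is exactly the implicit duplicate filtering mentioned above.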
Example
38
• Greedy Randomized Adaptive Search Procedure (GRASP)

[Flowchart: start → randomized constructive heuristic → local search → end, iterated a number of times.]
[GRASP.java]
39
• Clarke and Wright construction heuristic
• Variable Neighborhood Descent optimization

[ExampleGRASP.java]
40
• Runs the GRASP procedure on a single instance
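Putting the two building blocks together, each GRASP iteration randomizes the constructive heuristic, applies local search, and keeps the best solution seen so far. A hedged sketch, where the functional interfaces are illustrative stand-ins for the tutorial's CW and VND classes:

```java
import java.util.Random;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Illustrative GRASP loop: randomized construction + local search, keep the best. */
public class GraspSketch {
    public static <S> S grasp(int iterations,
                              Function<Random, S> randomizedConstruction, // e.g., randomized CW
                              UnaryOperator<S> localSearch,               // e.g., VND
                              Function<S, Double> cost,
                              long seed) {
        Random rnd = new Random(seed);
        S best = null;
        for (int it = 0; it < iterations; it++) {
            S s = localSearch.apply(randomizedConstruction.apply(rnd));
            if (best == null || cost.apply(s) < cost.apply(best)) best = s;
        }
        return best;
    }
}
```

Each iteration is independent of the others, which is exactly what makes the ParallelGRASP variant straightforward.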
Heuristic Concentration
41
[Flowchart: start → randomized constructive heuristic → local search, with the resulting routes collected in a route pool; after the iterations, a set covering problem over the pool produces the final solution → end.]
Heuristic Concentration
42
• Set covering model:

  min   Σ_{p ∈ Ω} c_p x_p
  s.t.  Σ_{p ∈ Ω} a_ip x_p ≥ 1   ∀i ∈ N
        x_p ∈ {0,1}              ∀p ∈ Ω

  where Ω is the set of routes, N the set of nodes, c_p the cost of route p, a_ip = 1 if route p visits node i, and x_p = 1 if route p is selected
[ExampleGRASPHC.java]
43
• Adapt the GRASP procedure to collect routes
  • Add the following fragment where needed
  • Hint: we want to collect as many routes as possible
• Experiment with different route pools
  • What is the impact on the number of routes and the HC time?
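For intuition, the set covering step can be sketched by brute force over a tiny route pool; a real implementation would hand the model to a MIP solver, and everything below is illustrative:

```java
import java.util.List;

/** Illustrative brute-force set covering over a small route pool. */
public class SetCoveringSketch {
    /** Cheapest subset of routes covering all nodes 0..nNodes-1, as a bitmask over the pool. */
    public static int solve(List<int[]> routes, double[] cost, int nNodes) {
        int bestMask = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int mask = 1; mask < (1 << routes.size()); mask++) { // every subset of the pool
            boolean[] covered = new boolean[nNodes];
            double c = 0;
            for (int p = 0; p < routes.size(); p++) {
                if ((mask & (1 << p)) == 0) continue;
                c += cost[p];                               // c_p * x_p
                for (int i : routes.get(p)) covered[i] = true; // a_ip * x_p
            }
            boolean all = true;
            for (boolean b : covered) all &= b;             // every node covered at least once
            if (all && c < bestCost) { bestCost = c; bestMask = mask; }
        }
        return bestMask;
    }
}
```

The enumeration is O(2^|Ω|), which is exactly why large route pools require an integer programming solver instead.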
Heuristic Concentration
44
[Recap: the full flowchart, from randomized construction and local search through the route pool to the set covering step.]
ALGORITHMIC TRICKS
45
Bottlenecks In Heuristics For VRP
46
• Size of the neighborhood
  • Some areas of the neighborhood are not interesting
  • Only minor changes are made to the solution at each move
    • How different is the new neighborhood? How to avoid restarting from scratch?
• Move evaluation
  • Cost & feasibility, performed millions of times
  • Which is most costly? Which should be done first?
Granular Neighborhoods
47
• Reduce the size of the neighborhoods; see Toth and Vigo (2003)
• Costly (long) arcs are less likely to be in good solutions
  • Filter out moves that involve only costly arcs
• Costly arc threshold:

  θ = β · z0 / (n + K0)

  where z0 is the cost of a heuristic solution, n the number of nodes, K0 the number of vehicles, and β a sparsification parameter (e.g., β = 2.5)

[Figure: inserting node 5 between 3 and 4 involves 2 costly arcs, while inserting it between 1 and 2 involves only 1 costly arc.]
Static Move Descriptor (SMD)
48
• Store information between moves; see Zachariadis and Kiranoudis (2010)
• Precompute and maintain the cost of all moves
• Example with relocate (relocation of a single node)
  • A table indexed by (n1, n2) stores the cost of relocating n1 after n2
  • Cost of relocating 4 after 3: c3,4 + c0,5 - c5,4 - c3,0
  • Cost of relocating 1 after 5: c0,5 + c1,4 - c0,1 - c5,4

SMD Update
49
• One SMD table is created per neighborhood
• Static update rules are predefined to know which SMDs need to be updated after a move is executed

[Figure: after a move is executed, only the table entries affected by the modified route segments are recomputed.]
Selecting The Best Neighbor
50
• All SMDs are stored in a Fibonacci heap
  • O(1) access to the lowest cost SMD
  • O(1) insertion
  • O(log n) amortized deletion
• How to find the best feasible neighbor? Pop the lowest cost SMD until a feasible move is found

Source: http://en.wikipedia.org/wiki/Fibonacci_heap
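The pop-until-feasible idea can be sketched with Java's PriorityQueue, which is a binary heap; the paper uses a Fibonacci heap for its better insertion and decrease-key bounds, but the selection logic is the same. The Move class and the feasibility predicate are illustrative:

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.Predicate;

public class BestNeighborSketch {
    /** Illustrative move descriptor: just an id and a cost (negative = improving). */
    record Move(int id, double cost) {}

    /** Pop the lowest-cost move until a feasible one is found
     *  (a real implementation would re-insert or penalize the popped infeasible moves). */
    static Move bestFeasible(PriorityQueue<Move> heap, Predicate<Move> feasible) {
        while (!heap.isEmpty()) {
            Move m = heap.poll();        // O(log n) extraction of the current best
            if (feasible.test(m)) return m;
        }
        return null; // no feasible move: local optimum for this neighborhood
    }

    public static void main(String[] args) {
        PriorityQueue<Move> heap = new PriorityQueue<>(Comparator.comparingDouble(Move::cost));
        heap.add(new Move(1, -5.0)); // best saving, but assume it is infeasible
        heap.add(new Move(2, -3.0));
        heap.add(new Move(3, -1.0));
        Move m = bestFeasible(heap, mv -> mv.id() != 1); // feasibility check is problem-specific
        System.out.println(m.id()); // 2
    }
}
```

Separating cost ordering (the heap) from feasibility (the predicate) mirrors the slide's point: check the cheap criterion first and defer the expensive one.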
SMD In Practice
51
Comparison of computational times, from Zachariadis and Kiranoudis (2010): with identical search rules, the classic and SMD move representations produce identical solutions, but their CPU times differ. Up to roughly 350-400 customers the two are comparable, since the SMD speedup is offset by the internal overhead of the Fibonacci heap; beyond that, the classic representation's CPU time grows quadratically while the SMD representation's grows linearithmically. For the 1200-customer instance, SMD reduces the total search time by 87.96% (classic: 2485.89 s, SMD: 299.36 s).
Source: Zachariadis and Kiranoudis (2010)
Sequential Search
52
• Explore neighborhoods in a smart way; see Irnich et al. (2006)
• Decompose moves into partial moves
• Example with swap

[Figure: a swap move between two nodes decomposed into its partial moves.]
Sequential Search In Practice
53
• Neighborhoods are explored by considering partial moves
• Exploration is pruned using bounds on the partial move cost

Speedup for swap and 3-Opt* neighborhoods, from Irnich et al. (2006): sequential search substantially accelerates the classical lexicographic search for all neighborhoods considered (factors between 24 and more than 750 for string-exchange, and between 300 and more than 10,000 for 3-opt*). The acceleration factor mainly depends on f, the average number of customers in a route: the smaller f, the more constrained the instance and the smaller the speedup, because sequential search prunes on cost only, not on feasibility.
Source: Irnich et al. (2006)
Store Cumulative Information
54
• Reduce the complexity of move evaluation: store and maintain useful information
• For example, waiting time / forward slack time; see Savelsbergh (1992)
  • Constant-time time window feasibility check
• More details in Module 2
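As a simpler illustration of the same idea (cumulative load rather than Savelsbergh's forward slack time, which Module 2 covers), storing prefix sums of demand along a route makes the load of any route segment available in constant time:

```java
/** Illustrative cumulative information: prefix sums of demand along a route. */
public class CumulativeLoad {
    private final int[] prefix; // prefix[k] = total demand of the first k positions

    public CumulativeLoad(int[] demands) {
        prefix = new int[demands.length + 1];
        for (int k = 0; k < demands.length; k++)
            prefix[k + 1] = prefix[k] + demands[k];
    }

    /** Demand of positions [from, to] in O(1), instead of an O(n) scan. */
    public int segmentLoad(int from, int to) {
        return prefix[to + 1] - prefix[from];
    }

    public int totalLoad() { return prefix[prefix.length - 1]; }
}
```

Like the time window quantities, these prefix sums must be updated after each executed move, which is the price paid for constant-time move evaluation.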
PARALLELIZATION
55
Moore’s Law
56
[Chart: transistor counts over time; one trend line doubles every 2 years, another doubles every 3 years.]
Source: http://en.wikipedia.org/wiki/Moore's_law
Clock Frequency
57
Source: http://cpudb.stanford.edu/visualize/clock_frequency
Promises Of Parallelization
58
• Overcome the stalling of CPU performance increase• Increased availability of parallel computing
• Personal computers with multiples CPUs/cores• Most universities have access to large grids• On demand cloud services (e.g., Amazon)
The parallelization illusion:
Parallel CPU time = Sequential CPU time / Number of CPUs
Architecture Overview
59
[Figure: memory hierarchy. Each core of a CPU executes threads and has its own L1 cache; the cores share an L2 cache (~MB, very fast). Below that sit the RAM (~GB, fast) and the HDD (~TB, slow to extremely slow).]
Concepts And Limitations
60
My Program:
executeMySequentialCode()
thread1 = new Thread(do A, do B)
thread1.run()
thread2 = new Thread(do C)
thread2.run()

Operating System:
thread1: create a new thread, assign it to CPU core #4, execute the instructions A, B
thread2: create a new thread, assign it to CPU core #1, execute the instruction C

Limitations:
• Thread creation takes time
• Limited control on the actual execution sequence
• Increased memory usage
• Concurrent access to shared resources
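In Java, the slide's pseudo-code corresponds roughly to the sketch below. Note that `Thread.start()`, not `run()`, is what actually spawns a new thread (calling `run()` directly executes the code in the current thread), and which core runs each thread is decided by the OS:

```java
// Minimal sketch of the slide's pseudo-code in Java.
class ThreadDemo {
    static final StringBuffer log = new StringBuffer(); // thread-safe buffer

    public static void main(String[] args) {
        Thread thread1 = new Thread(() -> { log.append("A"); log.append("B"); });
        Thread thread2 = new Thread(() -> log.append("C"));
        thread1.start();  // spawns a new thread; the OS picks the core
        thread2.start();
        try {
            thread1.join();  // wait for both threads to finish
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // "AB" and "C" may interleave in any order: ABC, ACB, or CAB
        System.out.println(log);
    }
}
```

The unpredictable interleaving of "C" with "AB" is exactly the "limited control on the actual execution sequence" mentioned above.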
Sharing Is Caring
61
Two threads read and modify a shared object without synchronization (initially x = 1):

Thread 1: z = object.getX()   → z = 1
Thread 2: y = object.getX()   → y = 1
Thread 1: z = z + 2
Thread 2: y = y + 1
Thread 2: object.setX(y)      → x = 2
Thread 1: object.setX(z)      → x = 3

Expected: x = 1 + 1 + 2 = 4, but the actual result is x = 3 ✗
One update is lost, and the final value of x depends on the interleaving of the two threads.
Sharing With Care
62
The same example with a lock on the shared object (initially x = 1):

Thread 1: lock(object)        → acquires the lock
Thread 2: lock(object)        → waiting
Thread 1: y = object.getX()   → y = 1
Thread 1: y = y + 1
Thread 1: object.setX(y)      → x = 2
Thread 1: release(object)
Thread 2: getlock(object)     → acquires the lock
Thread 2: z = object.getX()   → z = 2
Thread 2: z = z + 2
Thread 2: object.setX(z)      → x = 4
Thread 2: release(object)

Result: x = 1 + 1 + 2 = 4 ✓
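In Java, the lock(object)/release(object) mechanism above maps naturally onto `synchronized` methods. A sketch of the slide's scenario (the class and method names are made up): each read-modify-write is atomic, so x always ends at 1 + 1 + 2 = 4, regardless of the interleaving:

```java
// The slide's scenario in Java: two threads update a shared value.
// synchronized plays the role of lock(object) ... release(object).
class SharedX {
    private int x = 1;

    synchronized void add(int delta) {  // acquires the lock on entry, releases on exit
        int tmp = x;                    // read
        tmp = tmp + delta;              // modify
        x = tmp;                        // write
    }

    synchronized int get() { return x; }
}

class SharingWithCareDemo {
    public static void main(String[] args) {
        SharedX object = new SharedX();
        Thread t1 = new Thread(() -> object.add(1));  // y = x + 1
        Thread t2 = new Thread(() -> object.add(2));  // z = x + 2
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        System.out.println(object.get());  // always 4
    }
}
```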
The Limits Of Sharing
63
[Diagram: each thread holds a lock while the other waits for it, so the two threads effectively execute one after the other]
• Lock/release mechanisms force threads to wait
• In the worst case the execution is sequential
• In general:
• Lock an object while it may be modified
• Do not lock for read-only operations
• Check for inconsistencies at runtime
Parallelization In Practice
64
The parallelization illusion:
Parallel CPU time = Sequential CPU time / Number of CPUs
In practice:
Parallel CPU time = (Sequential CPU time / Number of CPUs)
  × Random(1, +∞)                         (not so random)
  × (1 − e^(−time spent programming))     (converges to 1)
  × (1 − e^(−time spent debugging))
  × (1 − e^(−number of headaches))
Amdahl's Law
65
Given:
- A fraction α of the code can be parallelized
- P processors
The speedup is bounded by:
S = 1 / ((1 − α) + α/P)
[Chart: speedup S versus number of processors P for several values of α, with the "illusion" line S = P shown for comparison]
Source: http://en.wikipedia.org/wiki/Parallel_computing

"When a task cannot be partitioned because of sequential constraints, the application of more effort has no effect on the schedule. The bearing of a child takes nine months, no matter how many women are assigned."
Fred Brooks
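The bound is easy to compute. A small sketch, with α as the fraction of the code that can be parallelized: no matter how many processors are added, the speedup never exceeds 1/(1 − α):

```java
// Amdahl's law: S = 1 / ((1 - alpha) + alpha / P),
// bounded above by 1 / (1 - alpha) as P grows.
class Amdahl {
    static double speedup(double alpha, int p) {
        return 1.0 / ((1.0 - alpha) + alpha / p);
    }

    public static void main(String[] args) {
        // Even with 95% parallel code, 1024 processors give less than a 20x speedup
        for (int p : new int[]{2, 8, 64, 1024}) {
            System.out.printf("P=%4d  S=%.2f%n", p, speedup(0.95, p));
        }
    }
}
```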
Two Approaches
• Run a sequential algorithm in different threads
• E.g., different experiments, or runs of a same algorithm
• No synchronization issues
• Limited shared-resource issues
• Design a parallel algorithm
• Potentially a real speedup of the algorithm
• Increased complexity and harder to debug
66
Learnt From Experience
• Limit the number of shared resources
• Avoid the risk of concurrent modifications
• Use bulletproof synchronization / locks / error checks
• Limit complex debugging
• Limit communication between threads
• Reduce waiting for other threads to exchange information
• Execute a significant number of operations in each thread
• Execution time ≫ thread creation overhead
67
68
/* HANDS ON */
A Simple Example
• Parallel Greedy Randomized Adaptive Search Procedure (GRASP)
69
[Diagram: Start → three parallel threads, each running a Randomized Constructive Heuristic followed by a Local Search → End]
[ParallelGRASP.java]
70
• Ask the system how many processors are available
• Create one GRASP instance per iteration
• The executor will be responsible for the creation of threads
[ParallelGRASP.java]
71
• The executor creates threads as needed, executes the GRASP subprocesses, and returns the results
• Loop through pairs <GRASP subprocess, Best solution>
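The file ParallelGRASP.java itself is not reproduced in the transcript; the pattern described above might be sketched with java.util.concurrent as follows (all names and the dummy cost function are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Hedged sketch of the slide's pattern: one Callable per GRASP iteration,
// submitted to a fixed pool sized from the number of available processors;
// the best result is collected at the end.
class ParallelGraspSketch {
    /** Stand-in for one GRASP iteration: randomized construction + local search. */
    static double graspIteration(long seed) {
        java.util.Random rnd = new java.util.Random(seed);
        double cost = 100 + rnd.nextDouble() * 50;  // randomized constructive heuristic
        return cost * 0.9;                          // pretend local search improves it
    }

    static double run(int iterations) {
        int cores = Runtime.getRuntime().availableProcessors();  // ask the system
        ExecutorService executor = Executors.newFixedThreadPool(cores);
        List<Future<Double>> futures = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            final long seed = i;  // one independent GRASP instance per iteration
            futures.add(executor.submit(() -> graspIteration(seed)));
        }
        double best = Double.POSITIVE_INFINITY;
        try {
            for (Future<Double> f : futures)     // loop through <subprocess, solution> pairs
                best = Math.min(best, f.get());  // blocks until that iteration finishes
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        executor.shutdown();
        return best;
    }

    public static void main(String[] args) {
        System.out.println("Best cost: " + run(100));
    }
}
```

Because each iteration has its own seed and no shared state, the threads never contend for a lock.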
[ExampleParallelGRASP.java]
• Run it on different instances and compare with the sequential version
• What is the speedup?
• Are the solutions identical?
• Going further...
• Why do we create GRASP instances with a single iteration?
• What are the synchronization issues?
72
Variable Neighborhood Search
• Similar to the VND
• Random exploration of each neighborhood
• Local search
73
[Diagram: start → N1 → LS → improvement found? → if no, N2 → LS → ... → Nn → LS → improvement found? → end; on any improvement, the search restarts from N1]
See [VNS.java]
NeighborhoodExplorer
[VNS.java]
74
• Define 4 string-exchange neighborhoods of increasing size
• Define an explorer for each neighborhood: include one neighborhood and a local search
Parallel Variable Neighborhood Search
• Explore all neighborhoods in parallel
• Select best neighbor
75
[Diagram: start → N1, N2, ..., Nn explored in parallel, each followed by LS → improvement found? → yes: restart; no: end]
See [ParallelVNS.java]
NeighborhoodExplorer
[ParallelVNS.java]
• In the provided version, neighborhoods are explored in the same thread
• Exercise: explore each neighborhood in a separate thread
• Hints:
• Use mExecutor
• See ParallelGRASP.java for reference
• Compare the speed-up for small and large instances
76
Parallel Algorithms Classification
• Classification according to three dimensions (Crainic 2008)
• Search control cardinality
• 1-control / p-control
• Search control and communications
• Rigid / Knowledge synchronization
• Collegial / Knowledge collegial
• Search differentiation
• Same initial point / Multiple initial points
• Same search strategy / Different search strategies
• Into which category do ParallelGRASP and ParallelVNS fall?
77
Synchronous 1-Control
78
[Diagram: a single control spawns Threads 1-4 ("Do your assignments"), waits for them, then collects their output ("Show me your results")]
• The control starts new threads to run part of the optimization in parallel
• Once all threads are finished, the control gathers the information and proceeds with the optimization
Synchronous P-Control
79
[Diagram: Threads 1-4, each with its own control, coordinated by a main control; at synchronization points the threads announce, e.g., "I found a new local optimum!" or "I found a new best solution!"]
• At fixed points of the optimization, some threads synchronize and exchange information
Asynchronous P-Control
80
[Diagram: Threads 1-4, each with its own control, read and write a shared information store at arbitrary times: "I found a new best solution!", "I found a new local optimum!", "I'm stuck, give me the best solution found so far"]
• At arbitrary points of the optimization, each thread exchanges information with a centralized component
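The shared-information component of such a scheme could be as simple as a synchronized incumbent pool. A minimal sketch (all names are made up; a real pool would store the solution itself, not just its cost):

```java
// Hedged sketch of the "shared information" component in an asynchronous
// p-control scheme: threads publish improvements and query the incumbent
// at arbitrary points; synchronization keeps the pool consistent.
class SolutionPool {
    private double bestCost = Double.POSITIVE_INFINITY;

    /** "I found a new best solution!" Returns true if it improved the incumbent. */
    synchronized boolean offer(double cost) {
        if (cost < bestCost) { bestCost = cost; return true; }
        return false;
    }

    /** "I'm stuck, give me the best solution found so far." */
    synchronized double best() { return bestCost; }
}

class AsyncPControlDemo {
    public static void main(String[] args) {
        SolutionPool pool = new SolutionPool();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < 4; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int it = 0; it < 1000; it++)
                    pool.offer(100.0 + (id * 7 + it * 13) % 50);  // fake search
            });
            workers[i].start();
        }
        try { for (Thread w : workers) w.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        System.out.println("Incumbent: " + pool.best());
    }
}
```

Keeping the synchronized section tiny (a comparison and an assignment) follows the earlier advice: limit the number of shared resources and the time spent holding locks.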
NOTES ON SOFTWARE DEVELOPMENT
81
Software Development For Research
82
Specifications → Design → Implementation → Test → Prototype → Final product
• Specifications: What do I need now? What will I need in the future? What may I need in the future?
• Design: How to implement what I need, will need, and may need? How to ensure I will be able to reuse/extend my code?
• Implementation: How to do what I need now?
• Test: Is everything working as expected? Is something that worked before now broken?
A Typical Design Problem
83
• Current need: two-opt local search for the VRP
• Data model
• How to represent an instance, customer, solution, route?
• Optimization algorithm
• How to represent:
• A local search?
• A neighborhood?
• A move?
• How to check the feasibility of a move?
A First Design
84
Instance: int[] customers; double[] demands; double[][] distances; int fleetSize; double vehicleCapacity
Solution: list of int[] routes; double[] loads
TwoOpt:
twoOpt(Instance, Solution) {
  for each move:
    check if feasible
    evaluate
  return best move
}
• What if I now want to solve the VRPTW?
• What if I want an Or-Opt local search?
Some Design Tips
• Identify what is reusable
• For instance, logic common to all neighborhoods
• Separate responsibilities clearly
• An instance stores the data
• A solution stores a solution
• An objective function evaluates a solution and moves
• A constraint evaluates the feasibility of a solution or move
• Keep in mind possible extensions
• What other problems may I have to solve?
• Warning: avoid over-designing
85
Flexible And Extensible Designs
86
Neighborhood (abstract):
Constraint<> constraints
localSearch(instance, solution, objective) {
  for each move in listAllMoves(instance, solution):
    for each constraint in constraints:
      constraint.check(move)
    objective.evaluate(instance, solution, move)
  return best feasible move
}
abstract listAllMoves(instance, solution)

TwoOpt: listAllMoves(instance, solution) { ... }
OrOpt: listAllMoves(instance, solution) { ... }

Flexible And Extensible Designs
87
Constraint (abstract): abstract check(move)
Capacity: check(move) { ... }
TimeWindow: check(move) { ... }
MaxDuration: check(move) { ... }
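The pseudo-code above could be written in Java roughly as follows. The types here are simplified stand-ins (a real Move would carry route positions, and check/evaluate would take the instance and solution as arguments); the point is that the generic local-search loop lives in the abstract class, subclasses only enumerate their moves, and feasibility is delegated to pluggable Constraint objects:

```java
import java.util.List;

// Sketch of the slide's design: shared logic in the abstract class,
// neighborhoods provide only listAllMoves(), constraints are pluggable.
interface Move { double delta(); }          // cost change of applying the move

interface Constraint { boolean check(Move move); }

abstract class Neighborhood {
    protected final List<Constraint> constraints;

    Neighborhood(List<Constraint> constraints) { this.constraints = constraints; }

    /** Common logic, reused by TwoOpt, OrOpt, and any future neighborhood. */
    Move localSearch() {
        Move best = null;
        for (Move move : listAllMoves()) {
            boolean feasible = true;
            for (Constraint c : constraints)
                if (!c.check(move)) { feasible = false; break; }
            if (feasible && (best == null || move.delta() < best.delta()))
                best = move;
        }
        return best;  // best feasible move, or null if none
    }

    /** The only part each concrete neighborhood must provide. */
    abstract List<Move> listAllMoves();
}
```

Adding a VRPTW is then a matter of registering a TimeWindow constraint; adding an Or-Opt search is a matter of writing one more listAllMoves().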
Designing Tools
88
• Create UML diagrams to model the organization of the code
• Generate code from a model
• Once the design is stable
• Generate the whole code skeleton in one click
• Generate a model from code (hazardous)
• Examples
• Visual Paradigm (free community edition)
• Enterprise Architect ($$$)
• Check with your university / computer science department
Implementation
• Document your code and use coherent conventions
• Explain the inputs, outputs, and main steps
• Saves a lot of time when you have to come back to it
• Make your code reusable and extensible
• Use the benefits of object-oriented programming
• Spend time now, save time tomorrow
• Build on top of existing libraries
• Avoid reinventing the wheel
89
Testing
• Create simple test cases that check key functionalities
• Unit test cases
• E.g., check that the methods to manipulate a solution are working
• More elaborate test cases
• E.g., solution found by a 2-Opt neighborhood
• Profile your code to detect bottlenecks and memory leaks
90
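For example, a framework-free unit test might look like the sketch below (RouteSolution is a toy stand-in for a real solution class; a real project would use JUnit):

```java
// A tiny solution class and a hand-rolled unit test for its manipulation methods.
class RouteSolution {
    private final java.util.List<Integer> route = new java.util.ArrayList<>();
    void append(int customer) { route.add(customer); }
    void remove(int customer) { route.remove(Integer.valueOf(customer)); }
    int length() { return route.size(); }
}

class SolutionTest {
    public static void main(String[] args) {
        RouteSolution s = new RouteSolution();
        s.append(3);
        s.append(7);
        check(s.length() == 2, "append should grow the route");
        s.remove(3);
        check(s.length() == 1, "remove should shrink the route");
        System.out.println("All tests passed");
    }

    static void check(boolean ok, String msg) {
        if (!ok) throw new AssertionError(msg);
    }
}
```

Running such tests after every change catches the "something that worked before is now broken" case early.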
Development Process
91
Define problem → Relaxation (simplify the problem) → Select approach → Design & Implement → Test & Debug → Benchmark & Profile → Restore relaxation / Adjust parameters → Publish paper!
(The inner loop produces a prototype; the full cycle produces the final product.)
LIBRARIES & FRAMEWORKS
92
Vehicle Routing
93
• VROOM (Java) - http://victorpillac.com/vroom/
• VROOM-Modelling
• Library to manipulate VRP instances
• VROOM-Heuristics
• Library of common (meta)heuristics
• CW, [Adaptive] VNS, [Parallel] [Adaptive] LNS, GRASPx[ILS,ELS]
• VROOM-Technicians
• Improved implementations for the TRSP
• VROOM-jMSA
• Event-driven multiple scenario approach for dynamic vehicle routing
Vehicle Routing
• VRPH (C++) - https://sites.google.com/site/vrphlibrary/
• Library of heuristics for the VRP
• CW, VNS
• Symphony-VRP - https://projects.coin-or.org/SYMPHONY
• Exact solver based on the Symphony and Concorde solvers
• CVRPSEP - Lysgaard (2004)
• Valid inequality generation
• Concorde - http://www.tsp.gatech.edu/concorde.html
• Exact solver for the TSP
94
Parallelization• Java
• Since Java 5: java.util.concurrent framework
• Since Java 7: Fork/Join framework
• C++• POSIX Threads (http://computing.llnl.gov/tutorials/pthreads)
• OpenMP (http://computing.llnl.gov/tutorials/openMP)
• Boost.Thread (http://www.boost.org/)
• Python• Parallel Python (http://www.parallelpython.com/)
95
Logging• Java
• Log4J
• java.util.logging package
• C++• Boost Log / Logging
• Log4cpp
• Python• logging module
96
VRPRep Instance Repository
97
• VRPREP website: http://rhodes.ima.uco.fr/vrprep/web/home
• XML schema to describe most vehicle routing problems
• Easy to read for your program
• XML data binding: creates the objects for you
• Repository of existing instances
• Possibility to define your own problem
• Tool to generate a sample XML file
• Upload your instances
WRAP UP
98
Today We Have Seen ...
99
• Introduction• What is complexity? Why is it important?
• Data structures• How to represent a solution efficiently?
• Algorithmic tricks• What are the main bottlenecks and how to avoid them?
• Parallelization
• How does parallel computing work? Why, when, and how to parallelize?
• Software engineering
• How to design flexible and reusable code?• Resources
• How to avoid reinventing the wheel?
Take Away1. Developing efficient optimization algorithms requires careful
software engineering✓ Complexity of the problems at hand✓ Efficient data structures, Algorithmic tricks, Parallelization
2. Invest in developing flexible and extensible code✓ Detailed design, Documentation✓ Will save you time later
3. Use existing libraries and share your code✓ Do not reinvent the wheel✓ Help others (good for your resumé too)
100
Discrete Optimization
101
• Pascal Van Hentenryck - http://www.coursera.org/course/optimization• Online community of thousands of students
• Topics
• Dynamic programming• Constraint programming
• Local search• Linear programming
• Join the challenge to solve TSPs and VRPs!
2nd International Optimisation Summer School
• 12th to 17th January 2014, Kioloa, NSW, Australia• http://www.cse.unsw.edu.au/~tw/school/ • Lectures
• Constraint programming, Integer programming, Column generation
• Modelling• Uncertainty• Vehicle routing, Scheduling, Supply networks• Research skills.
102
[Closing slide: NICTA infographic. NICTA is Australia's national ICT research Centre of Excellence: 700 staff, 340 graduates and 260 enrolled PhD students (25% of ICT PhD students in Australia), 17 partner universities (including UNSW, UoM, USYD, Monash, and ANU), labs in Brisbane, Sydney, Canberra, and Melbourne, and 11 spin-outs (e.g., Audinate, Open Kernel Labs, Saluda Medical, Opturion, Yuruware) across infrastructure, finance, agriculture, transport, and medicine.]
References
104
• Crainic, T., Parallel Solution Methods for Vehicle Routing Problems, The Vehicle Routing Problem: Latest Advances and New Challenges, Operations Research/Computer Science Interfaces Volume 43, 2008, pp 171-198
• Groer, C.; Golden, B. & Wasil, E., A library of local search heuristics for the vehicle routing problem, Mathematical Programming Computation, Springer Berlin / Heidelberg, 2010, 2, 79-101
• Irnich, S., Funke, B., & Grünert, T. (2006). Sequential search and its application to vehicle-routing problems. Computers & Operations Research, 33(8), 2405-2429.
• Lysgaard, J. (2004). CVRPSEP: A package of separation routines for the capacitated vehicle routing problem. Working Paper 03-04.
• Pillac, V.; Guéret, C. & Medaglia, A. L. (2012). A parallel matheuristic for the Technician Routing and Scheduling Problem, Optimization Letters, doi:10.1007/s11590-012-0567-4
• Savelsbergh, M. (1992). The vehicle routing problem with time windows: minimizing route duration. ORSA Journal on Computing, 4(2), 146-154, doi:10.1287/ijoc.4.2.146.
• Toth, P. and Vigo, D. (2003). The Granular Tabu Search and its application to the vehicle-routing problem. INFORMS Journal on Computing, 15, 333-346.
• Zachariadis, E. E., & Kiranoudis, C. T. (2010). A strategy for reducing the computational complexity of local search-based methods for the vehicle routing problem. Computers & Operations Research, 37(12), 2089-2105.