CHAPTER 2: DIRECT METHODS FOR STOCHASTIC SEARCH
• Organization of chapter in ISSO
  – Introductory material
  – Random search methods
    • Attributes of random search
    • Blind random search (algorithm A)
    • Two localized random search methods (algorithms B and C)
  – Random search with noisy measurements
  – Nonlinear simplex (Nelder-Mead) algorithm
    • Noise-free and noisy measurements
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
Some Attributes of Direct Random Search with Noise-Free Loss Measurements

• Ease of programming
• Use of only L values (vs. gradient values)
  – Avoid "artful contrivance" of more complex methods
• Reasonable computational efficiency
• Generality
  – Algorithms apply to virtually any function
• Theoretical foundation
  – Performance guarantees, sometimes in finite samples
  – Global convergence in some cases
Algorithm A: Simple Random ("Blind") Search

Step 0 (initialization) Choose an initial value of θ = θ̂_0 inside of Θ. Set k = 0.

Step 1 (candidate value) Generate a new independent value θ_new(k+1) ∈ Θ, according to the chosen probability distribution. If L(θ_new(k+1)) < L(θ̂_k), set θ̂_{k+1} = θ_new(k+1). Else take θ̂_{k+1} = θ̂_k.

Step 2 (return or stop) Stop if maximum number of L evaluations has been reached or user is otherwise satisfied with the current estimate for θ; else, return to step 1 with the new k set to the former k+1.
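The following is a minimal Python sketch of algorithm A for a hyperrectangular Θ sampled uniformly; the uniform sampling distribution, the bound arguments, and the function names are illustrative choices, not prescribed by ISSO.

import numpy as np

def blind_random_search(L, lower, upper, budget, rng=None):
    # Algorithm A: blind random search over the box Theta = [lower, upper]
    # with a uniform sampling distribution. Returns the best theta found and
    # its loss after `budget` evaluations of L beyond the initial one.
    rng = np.random.default_rng() if rng is None else rng
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    theta = rng.uniform(lower, upper)          # step 0: initial value in Theta
    best_loss = L(theta)
    for _ in range(budget):
        candidate = rng.uniform(lower, upper)  # step 1: independent candidate
        cand_loss = L(candidate)
        if cand_loss < best_loss:              # keep candidate only if it improves L
            theta, best_loss = candidate, cand_loss
    return theta, best_loss                    # step 2: stop after the evaluation budget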
First Several Iterations of Algorithm A on Problem with Solution θ* = [1.0, 1.0]^T
(Example 2.1 in ISSO)

  k    θ_new(k)^T        L(θ_new(k))    θ̂_k^T             L(θ̂_k)
  0    —                 —              [2.00, 2.00]      8.00
  1    [2.25, 1.62]      7.69           [2.25, 1.62]      7.69
  2    [2.81, 2.58]      14.55          [2.25, 1.62]      7.69
  3    [1.93, 1.19]      5.14           [1.93, 1.19]      5.14
  4    [2.60, 1.92]      10.45          [1.93, 1.19]      5.14
  5    [2.23, 2.58]      11.63          [1.93, 1.19]      5.14
  6    [1.34, 1.76]      4.89           [1.34, 1.76]      4.89
Functions for Convergence (Parts (a) and (b)) and Nonconvergence (Part (c)) of Blind Random Search

[Figure: three example loss functions on Θ = [0, ∞)]
(a) Continuous L(θ); probability density for θ_new is > 0 on Θ = [0, ∞)
(b) Discrete L(θ); discrete sampling for θ_new with P(θ_new = i) > 0 for i = 0, 1, 2, ...
(c) Noncontinuous L(θ); probability density for θ_new is > 0 on Θ = [0, ∞)
Algorithm B: Localized Random Search

Step 0 (initialization) Choose an initial value of θ = θ̂_0 inside of Θ. Set k = 0.

Step 1 (candidate value) Generate a random d_k. Check if θ̂_k + d_k ∈ Θ. If not, generate a new d_k or move θ̂_k + d_k to the nearest valid point. Let θ_new(k+1) be θ̂_k + d_k or the modified point.

Step 2 (check for improvement) If L(θ_new(k+1)) < L(θ̂_k), set θ̂_{k+1} = θ_new(k+1). Else take θ̂_{k+1} = θ̂_k.

Step 3 (return or stop) Stop if maximum number of L evaluations has been reached or if user satisfied with current estimate; else, return to step 1 with new k set to former k+1.
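A corresponding sketch of algorithm B, assuming Θ is again a hyperrectangle, d_k is drawn from a Gaussian distribution, and infeasible candidates are clipped to the nearest valid point (one of the two options in step 1); the Gaussian choice and all names are illustrative.

import numpy as np

def localized_random_search(L, theta0, lower, upper, budget, sigma=0.5, rng=None):
    # Algorithm B: localized random search. Perturb the current estimate by a
    # random d_k (here Gaussian) and accept the move only if it lowers L.
    # Infeasible candidates are clipped to the nearest point of [lower, upper].
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    loss = L(theta)
    for _ in range(budget):
        d = rng.normal(scale=sigma, size=theta.shape)   # random perturbation d_k
        candidate = np.clip(theta + d, lower, upper)    # nearest valid point in Theta
        cand_loss = L(candidate)
        if cand_loss < loss:                            # step 2: accept only on improvement
            theta, loss = candidate, cand_loss
    return theta, loss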
Algorithm C: Enhanced Localized Random Search

• Similar to algorithm B
• Exploits knowledge of good/bad directions
• If move in one direction produces a decrease in loss, add bias to next iteration to continue moving the algorithm in the "good" direction
• If move in one direction produces an increase in loss, add bias to next iteration to move the algorithm in the opposite way
• Slightly more complex implementation than algorithm B (a code sketch follows below)
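A sketch of the bias idea behind algorithm C; the specific bias-update coefficients below are illustrative placeholders, not the coefficients used in ISSO.

import numpy as np

def enhanced_localized_search(L, theta0, lower, upper, budget,
                              sigma=0.5, bias_gain=0.4, bias_decay=0.5, rng=None):
    # Algorithm C sketch: localized random search with a bias term that nudges
    # later perturbations toward directions that recently lowered L and away
    # from directions that raised it.
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    loss = L(theta)
    bias = np.zeros_like(theta)
    for _ in range(budget):
        d = rng.normal(scale=sigma, size=theta.shape)
        candidate = np.clip(theta + bias + d, lower, upper)
        cand_loss = L(candidate)
        if cand_loss < loss:
            bias = bias_decay * bias + bias_gain * d   # "good" direction: keep pushing this way
            theta, loss = candidate, cand_loss
        else:
            bias = bias_decay * bias - bias_gain * d   # "bad" direction: bias the opposite way
    return theta, loss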
Formal Convergence of Random Search Algorithms

• Well-known results on convergence of random search
  – Apply to convergence of θ and/or L values
  – Apply when noise-free L measurements used in algorithms
• Algorithm A (blind random search) converges under very general conditions (paraphrased below)
  – Applies to continuous or discrete functions
• Conditions for convergence of algorithms B and C somewhat more restrictive, but still quite general
  – ISSO presents theorem for continuous functions
  – Other convergence results exist
• Convergence rate theory also exists: how fast does the algorithm converge?
  – Algorithm A generally slow in high-dimensional problems
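A paraphrase of the flavor of the blind-random-search result (not the exact theorem statement in ISSO): with a fixed sampling distribution that places positive probability on every loss-level set around the optimum, the monotone sequence of best loss values converges almost surely,
\[
P\bigl(\theta_{\text{new}} \in \{\theta \in \Theta : L(\theta) \le L(\theta^*) + \eta\}\bigr) > 0
\ \ \text{for all } \eta > 0
\ \ \Longrightarrow\ \
L(\hat{\theta}_k) \to L(\theta^*) \ \text{a.s. as } k \to \infty,
\]
with θ̂_k → θ* a.s. when θ* is the unique minimizer and the level sets above shrink to θ* as η → 0 (cf. parts (a) and (b) of the preceding figure).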
Example Comparison of Algorithms A, B, and C
• Relatively simple p = 2 problem (Examples 2.3 and 2.4 in ISSO)
– Quartic loss function (plot on next slide)
• One global solution; several local minima/maxima
• Started all algorithms at common initial condition and compared based on common number of loss evaluations
– Algorithm A needed no tuning
– Algorithms B and C required “trial runs” to tune algorithm coefficients
Multimodal Quartic Loss Function for p = 2 Problem (Example 2.3 in ISSO)
[Figure: plot of the quartic loss surface]
Example 2.3 in ISSO (cont'd): Sample Means of Terminal Values L(θ̂_k) – L(θ*) in Multimodal Loss Function (with Approximate 95% Confidence Intervals)

                 Algorithm A       Algorithm B       Algorithm C
  Sample mean    2.51              0.78              0.49
  95% interval   [1.94, 3.08]      [0.51, 1.04]      [0.32, 0.67]

Notes:
• Sample means from 40 independent runs.
• Confidence intervals for algorithms B and C overlap slightly since 0.51 < 0.67
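The construction of the "approximate 95% confidence intervals" is not shown on the slide; a standard t-based interval for a sample mean from n = 40 runs would take the form
\[
\bar{X} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}},
\qquad n = 40,\ \ t_{0.975,\,39} \approx 2.02,
\]
where X̄ and s are the sample mean and sample standard deviation of the 40 terminal values of L(θ̂_k) – L(θ*).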
Examples 2.3 and 2.4 in ISSO (cont'd): Typical Adjusted Loss Values (L(θ̂_k) – L(θ*)) and θ Estimates in Multimodal Loss Function (One Run)

                      Algorithm A    Algorithm B    Algorithm C
  Adjusted L value    2.60           0.80           0.49

[Table also lists the θ estimate corresponding to each adjusted L value]

Note: θ* = [2.904, 2.904]^T
Random Search Algorithms with Noisy Loss Function Measurements

• Basic implementation of random search assumes perfect (noise-free) values of L
• Some applications require use of noisy measurements: y(θ) = L(θ) + noise
• Simplest modification is to form average of y values at each iteration as approximation to L
• Alternative modification is to set a threshold τ > 0 for improvement before new value is accepted in algorithm
• Thresholding in algorithm B with modified step 2 (a code sketch follows below):
  Step 2 (modified) If y(θ_new(k+1)) < y(θ̂_k) – τ, set θ̂_{k+1} = θ_new(k+1). Else take θ̂_{k+1} = θ̂_k.
• Very limited convergence theory with noisy measurements
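A sketch of the threshold modification applied to algorithm B, where y(θ) = L(θ) + noise; the Gaussian perturbation, the clipping to a box, and the name tau for the threshold are illustrative choices.

import numpy as np

def thresholded_localized_search(y, theta0, lower, upper, budget, tau, sigma=0.5, rng=None):
    # Algorithm B with noisy measurements y(theta) = L(theta) + noise.
    # A candidate replaces the current estimate only if its noisy loss beats the
    # current noisy loss by more than the threshold tau > 0 (modified step 2).
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    y_current = y(theta)
    for _ in range(budget):
        d = rng.normal(scale=sigma, size=theta.shape)
        candidate = np.clip(theta + d, lower, upper)
        y_candidate = y(candidate)
        if y_candidate < y_current - tau:    # require a clear improvement before accepting
            theta, y_current = candidate, y_candidate
    return theta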
Nonlinear Simplex (Nelder-Mead) Algorithm

• Nonlinear simplex method is popular search method (e.g., fminsearch in MATLAB; a SciPy usage sketch follows below)
• Simplex is convex hull of p + 1 points in ℝ^p
  – Convex hull is smallest convex set enclosing the p + 1 points
  – For p = 2 convex hull is triangle
  – For p = 3 convex hull is pyramid
• Algorithm searches for θ* by moving convex hull within Θ
• If algorithm works properly, convex hull shrinks/collapses onto θ*
• No injected randomness (contrast with algorithms A, B, and C), but allowance for noisy loss measurements
• Frequently effective, but no general convergence theory and many numerical counterexamples to convergence
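For reference, SciPy provides a routine comparable to MATLAB's fminsearch; a minimal usage sketch with a placeholder quadratic loss (the loss, starting point, and option values are illustrative only):

from scipy.optimize import minimize

def L(theta):
    # Placeholder loss; substitute the loss function of interest.
    return (theta[0] - 1.0) ** 2 + (theta[1] - 1.0) ** 2

result = minimize(L, x0=[2.0, 2.0], method="Nelder-Mead",
                  options={"xatol": 1e-6, "fatol": 1e-6, "maxfev": 500})
print(result.x, result.fun)   # approximate minimizer and its loss value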
Steps of Nonlinear Simplex Algorithm

Step 0 (Initialization) Generate an initial set of p + 1 extreme points in ℝ^p, θ_i (i = 1, 2, …, p + 1), the vertices of the initial simplex.

Step 1 (Reflection) Identify where the max, second-highest, and min loss values occur; denote the corresponding vertices by θ_max, θ_2max, and θ_min, respectively. Let θ_cent = centroid (mean) of all θ_i except for θ_max. Generate candidate vertex θ_refl by reflecting θ_max through θ_cent using θ_refl = (1 + α)θ_cent – αθ_max (α > 0).

Step 2a (Accept reflection) If L(θ_min) ≤ L(θ_refl) < L(θ_2max), then θ_refl replaces θ_max; proceed to step 3; else go to step 2b.

Step 2b (Expansion) If L(θ_refl) < L(θ_min), then expand reflection using θ_exp = γθ_refl + (1 – γ)θ_cent, γ > 1; else go to step 2c. If L(θ_exp) < L(θ_refl), then θ_exp replaces θ_max; otherwise reject expansion and replace θ_max by θ_refl. Go to step 3.
Steps of Nonlinear Simplex Algorithm (cont'd)

Step 2c (Contraction) If L(θ_refl) ≥ L(θ_2max), then contract simplex: either case (i) L(θ_refl) < L(θ_max), or case (ii) L(θ_max) ≤ L(θ_refl). Contraction point is θ_cont = βθ_max/refl + (1 – β)θ_cent, 0 ≤ β ≤ 1, where θ_max/refl = θ_refl if case (i), otherwise θ_max/refl = θ_max. In case (i), accept contraction if L(θ_cont) ≤ L(θ_refl); in case (ii), accept contraction if L(θ_cont) < L(θ_max). If accepted, replace θ_max by θ_cont and go to step 3; otherwise go to step 2d.

Step 2d (Shrink) If L(θ_cont) ≥ L(θ_max), shrink entire simplex using a factor 0 < δ < 1, retaining only θ_min. Go to step 3.

Step 3 (Termination) Stop if convergence criterion or maximum number of function evaluations is met; else return to step 1.
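A small sketch of the candidate-point computations in steps 1, 2b, and 2c for a given simplex; the coefficient values α = 1, γ = 2, β = 0.5 are common defaults, not values prescribed by the slides. The accept/reject comparisons of steps 2a-2d then decide which of these candidates (if any) replaces θ_max, or whether the whole simplex is shrunk toward θ_min.

import numpy as np

def simplex_candidates(simplex, L, alpha=1.0, gamma=2.0, beta=0.5):
    # Compute the reflection, expansion, and "inside" contraction points used in
    # steps 1, 2b, and 2c for the simplex whose rows are the p + 1 vertices.
    simplex = np.asarray(simplex, dtype=float)
    losses = np.array([L(v) for v in simplex])
    i_max = np.argmax(losses)                                      # worst vertex, theta_max
    theta_max = simplex[i_max]
    theta_cent = np.delete(simplex, i_max, axis=0).mean(axis=0)    # centroid of the rest
    theta_refl = (1 + alpha) * theta_cent - alpha * theta_max      # step 1: reflection
    theta_exp = gamma * theta_refl + (1 - gamma) * theta_cent      # step 2b: expansion
    theta_cont = beta * theta_max + (1 - beta) * theta_cent        # step 2c: contraction, case (ii)
    return theta_refl, theta_exp, theta_cont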
Illustration of Steps of Nonlinear Simplex Algorithm with p = 2

[Figure: sequence of triangles (p = 2 simplexes) with vertices θ_max, θ_2max, θ_min, centroid θ_cent, and candidate points θ_refl, θ_exp, θ_cont, illustrating:]
• Reflection
• Expansion when L(θ_refl) < L(θ_min)
• Contraction ("outside") when L(θ_refl) < L(θ_max)
• Contraction ("inside") when L(θ_refl) ≥ L(θ_max)
• Shrink after failed contraction when L(θ_refl) < L(θ_max)