
[IEEE 2011 2nd International Conference on Intelligent Systems, Modelling and Simulation (ISMS), Phnom Penh, Cambodia, 25-27 January 2011]

Improved Simulated Annealing using Momentum Terms

Mohammad Mehdi Keikha

Dept. of Computer Engineering, Engineering Faculty, University of Isfahan

Isfahan, Iran [email protected]

Abstract— Simulated Annealing is one of the important evolutionary algorithms and can be used in many applications, especially optimization problems. Simulated Annealing has two main phases: the annealing schedule and the acceptance probability function. I propose three annealing schedule methods and one acceptance probability function. The idea of adding momentum terms is used to improve the speed and accuracy of the annealing schedulers and to prevent extreme changes in the values of the acceptance probability function. Some of the proposed methods show good accuracy, and the others significantly improve the speed of Simulated Annealing compared with the functions used in the original algorithm.

Keywords- Simulated Annealing; Annealing Schedule; Acceptance Probability; Temperature.

I. INTRODUCTION

The name and inspiration of SA (Simulated Annealing) come from annealing in metallurgy. Annealing involves heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. If the temperature decreases very slowly, the system settles into a stable state with large crystals formed regularly beside each other; these have minimum energy and meet our needs. But if the temperature decreases quickly, the crystals do not have enough time to form regular, stable structures, because their energy stays high, and they cannot reach a stable state. SA decreases the temperature slowly and at each temperature considers a neighbor of the current state. If the energy of the neighbor is better than that of the current state, SA moves the system to the neighbor state; if the energy of the neighbor is worse, SA uses an acceptance function to decide whether or not to move to the neighbor state. Sometimes SA moves to a bad state, based on the acceptance function, in order to reach the final state later: at present the system may have fallen into a local minimum, and such a move lets it escape the local minimum. This step repeats until the system reaches a state that is good enough for the application, or until a given computational budget has been exhausted. I search for a minimum of the cost function; I may move to an undesired state, hoping to reach a good state in the future.

For implementing SA, three factors must be determined: (1) the starting point, the point in the search space from which the search starts; (2) neighbor-state generation, the task of generating neighbor states; and (3) the annealing schedule, the parameters that specify how the temperature is decreased. For example, an annealing scheduler determines when the temperature decreases and by how much. Fig. 1 shows the steps of the SA algorithm.

Figure 1. Simulated Annealing algorithm [15]
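The loop in Fig. 1 can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the geometric cooling, the Boltzmann acceptance, and all parameter defaults are my choices for the example.

```python
import math
import random

def simulated_annealing(cost, neighbor, state, t0=100.0, t_min=1e-3, alpha=0.95):
    """Generic SA loop: geometric cooling and Boltzmann acceptance."""
    t = t0
    best = state
    while t > t_min:
        cand = neighbor(state)
        delta = cost(cand) - cost(state)
        # Accept improvements outright; accept worse states with
        # probability exp(-delta / t) so the search can escape local minima.
        if delta < 0 or random.random() < math.exp(-delta / t):
            state = cand
        if cost(state) < cost(best):
            best = state
        t *= alpha  # annealing schedule: decrease the temperature
    return best
```

For instance, minimizing `x**2` from a start of `10.0` with a random-walk neighbor drives the state toward zero as the temperature drops.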

SA is used for optimization problems because every optimization problem has a final state of minimum energy; the goal of each optimization problem is to reach that final state, in which the system is stable. Since SA drives a system toward a stable state, it can be used for optimization problems.

The paper is organized as follows. Section II reviews the details of each step of the SA algorithm and the related work on each; it has two subsections, the first studying the annealing module of SA and the second studying various acceptance probability functions. Section III explains the main ideas of this paper. Finally, Section IV compares experimental results for the TSP problem on the TSPLIB data set.

2011 Second International Conference on Intelligent Systems, Modelling and Simulation
978-0-7695-4336-9/11 $26.00 © 2011 IEEE
DOI 10.1109/ISMS.2011.18

II. SA MODULES

A. Annealing module

The first module of the SA algorithm specifies when and by how much the temperature must be decreased. This module is very important because it determines the speed and much of the accuracy of the system; the annealing schedule affects accuracy, but not alone, since the acceptance probability function is also influential. The annealing module contains the following subsections:

1. Starting temperature

Rayward-Smith has asserted that SA should start with a high temperature, decreased until about 60% of the undesired states are accepted; from that point to the end, the temperature is decreased more slowly than before [7]. Dowsland holds a similar view, with some changes in determining the point at which the slower decrease must start [3].

2. Ending temperature

This parameter is determined by the type of problem and by how important it is to return a solution quickly, but it is usually set to zero.

3. Annealing schedule

There are many annealing methods for decreasing the temperature; the major ones, used across various types of applications, are listed below (in all equations, k is the state number or loop count index):

• Aarts: In this method the temperature is held fixed during each loop and updated at the end of it:

T_{k+1} = T_k / (1 + T_k·log(1 + δ)/(3σ_k))  (1)

In (1), σ_k is the standard deviation of the observed values of the cost function during the k-th loop of the algorithm, and δ is a small tunable parameter [1, 8, 9].

• Geometric: At the end of each loop the temperature is updated with the following rule:

T_{k+1} = α·T_k  (2)

Usually α is selected from 0.8 < α < 0.99. The speed of decreasing the temperature in this approach depends strongly on the chosen value of α [10].

• Lundy:

T_k = T_0 / (1 + β·k·T_0)  (3)

T_0 is the starting temperature and β is a parameter tuned to the problem [5, 18]. The temperature is updated in each iteration.

• Logarithmic:

T_k = C / log(k + n_0)  (4)

C is a tentative parameter with 100 < C < 1000 (although values outside this range are possible), k is the number of iterations, and n_0 is a base offset. The disadvantage of this method is that it is slower than the others, but it has the highest accuracy among all annealing schedulers. It also updates the temperature every iteration [10].

• Boltzmann: This method has high accuracy but, like Logarithmic, is very slow; its accuracy is lower than Logarithmic's. It also updates the temperature at each iteration [1]. The update rule is:

T_k = T_0 / log(1 + k)  (5)

• Cauchy: One of the fastest cooling methods. It updates the temperature by the following rule:

T_k = T_0 / (1 + k)  (6)

It usually performs well with the Cauchy acceptance function, which will be explained below. It also updates the temperature in each iteration [12, 13].

4. Fixed temperature

In this subsection an appropriate fixed temperature must be found. I have done so experimentally, by running SA at a fixed temperature over a range of temperatures and choosing the one that gives the best performance, say the best average solution in N iterations. Connolly proposes determining a fixed temperature by running a fast cooling algorithm and noting the temperature at which the best solution was first found [14].
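The cooling rules (2)-(6) above can be transcribed as simple update functions. The defaults here (α = 0.9, β = 10⁻³, n₀ = 2) are illustrative choices, not values from the paper:

```python
import math

def geometric(t_k, alpha=0.9):
    # Eq. (2): T_{k+1} = alpha * T_k, with alpha typically in (0.8, 0.99).
    return alpha * t_k

def lundy(t0, k, beta=1e-3):
    # Eq. (3): T_k = T_0 / (1 + beta*k*T_0).
    return t0 / (1.0 + beta * k * t0)

def logarithmic(c, k, n0=2.0):
    # Eq. (4): T_k = C / log(k + n_0), with C typically in (100, 1000).
    return c / math.log(k + n0)

def boltzmann(t0, k):
    # Eq. (5): T_k = T_0 / log(1 + k), for k >= 1.
    return t0 / math.log(1.0 + k)

def cauchy(t0, k):
    # Eq. (6): T_k = T_0 / (1 + k).
    return t0 / (1.0 + k)
```

Note how the forms differ: Geometric is applied per loop to the previous temperature, while Lundy, Logarithmic, Boltzmann and Cauchy compute T_k directly from the starting temperature and the iteration count.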

B. Acceptance Probability module

The second module of the SA algorithm is the choice of the acceptance probability function. In a local optimization algorithm a new state is accepted only when it improves the cost function, whereas SA can also accept an undesirable state, based on the value of an acceptance probability function. The two major acceptance probability functions are Boltzmann and Cauchy. The Boltzmann function is defined as follows:

P(ΔE) = exp(−ΔE/(b·t))  (7)

b is the Boltzmann constant, t is the temperature, and ΔE is the difference between the energy of the neighbor state and that of the current state [1, 15, 16]. SA can also accept an undesired state with the Cauchy probability function, which is defined by the following rule [12, 13, 17]:

P(ΔE) = T/(T² + ΔE²)  (8)

Usually the acceptance probability function is chosen according to the annealing schedule and the optimization problem.
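The two acceptance rules (7) and (8), as given above, can be sketched directly; setting the Boltzmann constant b to 1 is a common simplification and my choice here, not the paper's:

```python
import math

def boltzmann_acceptance(delta_e, t, b=1.0):
    # Eq. (7): P(dE) = exp(-dE / (b*t)); b is the Boltzmann constant
    # (often set to 1 in practice) and t is the current temperature.
    return math.exp(-delta_e / (b * t))

def cauchy_acceptance(delta_e, t):
    # Eq. (8): P(dE) = T / (T^2 + dE^2), exactly as stated in the text.
    return t / (t * t + delta_e * delta_e)
```

Both functions shrink as the energy gap ΔE grows, so large uphill moves become unlikely as the system cools.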

III. PROPOSED METHODS

The proposed methods improve speed and, compared with some of the methods above, also achieve higher accuracy. The annealing schedule methods are described first, followed by the acceptance probability function. The functions in the following formulas have no units, and no units are assigned to their variables, because the variables are independent parameters; the returned values are interpreted as temperatures and probabilities.

A. Annealing schedule methods

Three methods are proposed, all based on the idea of adding momentum terms to the available methods.

1. Hybrid

The first method, Hybrid, decreases the temperature at a controlled, regular rate. Observing the speed defects of the methods above, I first built a formula with a single momentum term that subtracts a proportion of the current temperature in order to influence the cooling speed; experiments showed that this term alone could not affect the speed sufficiently. For further improvement I added a second momentum term, and the resulting formula keeps the temperature dropping slowly enough to prevent extreme changes in the cost function:

T_{k+1} = T_k − α·T_k·e^(k·ΔT/T_k)  (9)

In (9), α plays the same role as in the Geometric method but is usually small, k is the number of iterations, and ΔT is the difference between the current and the previous temperature, so the next temperature depends on both of them. ΔT acts as a damper: because ΔT < 0 always holds during cooling, the second (exponential) momentum term decays and becomes the key to improving speed. With the first momentum term alone the system still cools slowly and cannot establish a trade-off between speed and accuracy, which is why the second term is needed. With it the system moves toward the global minimum faster, because the term tends to zero as k becomes large: while the system is far from the optimal solution the temperature drops quickly, but close to the solution the second momentum term vanishes and prevents further changes in the cost function, so the system reaches the minimum cost faster.

2. Extended logarithmic

The Logarithmic method has the highest accuracy among the available methods but is also very slow. The main reason is the logarithm in the denominator: its argument is the iteration count, and incrementing that count by one barely changes the logarithm's output, so the temperature changes very slowly. I therefore added momentum terms that drop the temperature faster while retaining the accuracy of the Logarithmic method, extending the original formula as follows:

T_k = C/log(k + n_0) − T_0/e^k − log(k)  (10)

In (10), T_0 is the starting temperature, k is the number of iterations, and C is the same as in the original Logarithmic method. The first momentum term plays the same role as in (9), while the second momentum term grows with k, but its changes are not extreme, so it can still affect the accuracy of the cost function even when k is large; it helps the formula reach the global minimum of the cost function faster. Extended Logarithmic is faster than the original Logarithmic, but its accuracy is lower.

3. Extended Boltzmann

To make the Boltzmann method cool faster I added a logarithmic term that acts like the second momentum term of the Extended Logarithmic method in (10). The extended rule is:

T_k = T_0/log(1 + k) − log(1 + k)  (11)

In (11), k is the number of iterations. As with Extended Logarithmic, I first tried adding the first momentum term of (10) to (11) but did not see the expected results, which was related to the way Boltzmann decreases the temperature. After discovering this I tried other functions for the momentum term and found that the system performs best with the log function. It should be stated that Extended Logarithmic has higher accuracy than Extended Boltzmann, while Extended Boltzmann is faster than Extended Logarithmic.


B. Extended Boltzmann acceptance probability

I reached a new acceptance probability function based on the idea of adding a momentum term to the Boltzmann probability function. The new function performs well, especially when used with the Extended Boltzmann annealing method. It is calculated according to the following rule:

P(ΔE) = exp(−ΔE/T)  (12)

The difference between equations (7) and (12) lies in the definition of ΔE. In (12), ΔE is calculated as follows:

ΔE′ = E_i − E_j  (13)

ΔE = ΔE′ − α·T·√(ΔE′)  (14)

In (7), ΔE is given by (13) alone, but in (12) it is calculated from both (13) and (14). In (14), α is a run-time parameter that depends on the problem; its value depends on the annealing method and the starting temperature.

Note that in (14) ΔE′ > 0, because if ΔE′ < 0 the new state is accepted unconditionally: its energy is lower than the current state's, so it is closer to the minimum cost. Hence there is no concern about a negative term under the radical, since that condition never occurs.
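Putting (12)-(14) together gives the sketch below. The function name, the α default, and the cap at 1 (so the value can be used directly as a probability) are my additions:

```python
import math

def extended_acceptance(e_neighbor, e_current, t, alpha=0.1):
    # Eq. (13): dE' = energy difference between neighbor and current state.
    d_e_prime = e_neighbor - e_current
    if d_e_prime <= 0:
        return 1.0  # a better (or equal) state is always accepted
    # Eq. (14): dE = dE' - alpha*T*sqrt(dE'); dE' > 0 here, so the
    # radical is always safe, as noted in the text.
    d_e = d_e_prime - alpha * t * math.sqrt(d_e_prime)
    # Eq. (12): P(dE) = exp(-dE/T), capped at 1 for use as a probability.
    return min(1.0, math.exp(-d_e / t))
```

Because the momentum term is subtracted from ΔE′, this function accepts an uphill move at least as often as the plain Boltzmann rule (7) at the same temperature.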

IV. EXPERIMENTAL RESULTS

I use the Travelling Salesman Problem (TSP) to test my approach. TSP is an NP-hard problem in combinatorial optimization, studied in operations research and theoretical computer science: given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once.

Tables I-III show the results of running all the original annealing schedules; in each table one of the acceptance probability functions is used. The tables report only the accuracy of the methods, because by the rules for comparing TSP results only accuracy is compared. Each value is the average of 10 runs of a method on the given problem size. The proposed methods have lower accuracy than the original Logarithmic and Boltzmann methods but are faster than both; my aim was a trade-off between speed and accuracy in the SA algorithm.

I use the Lin318, Att532, Rat783 and Fl1400 benchmarks, which have 318, 532, 783 and 1400 cities respectively.
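The paper does not specify its neighbor-generation operator for TSP; a common choice, sketched here under that assumption, is the 2-opt move, which reverses a random segment of the tour:

```python
import random

def tour_length(tour, dist):
    # Total length of the closed tour; dist[i][j] is the city distance matrix.
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def two_opt_neighbor(tour):
    # Reverse a random segment of the tour: the classic 2-opt neighbor move.
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```

Plugging `tour_length` in as the cost function and `two_opt_neighbor` as the neighbor generator yields an SA solver for any TSP instance given as a distance matrix.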

TABLE I. Accuracy of annealing schedule methods with Boltzmann acceptance probability

Annealing schedule      Lin318   Att532   Rat783   Fl1400
Optimal solution         42029    27686     8806    20127
Geometric                42112    27734     9032    20871
Lundy                    42848    28342     9469    21562
Logarithmic              42029    27693     8862    20281
Boltzmann                42038    27705     8897    20316
Cauchy                   42083    27729     9014    20518
Hybrid                   42035    27709     8879    20316
Extended Logarithmic     42052    27719     9006    20438
Extended Boltzmann       42073    27728     9013    20463

In Table I the accuracy of the Logarithmic method is the highest, with Extended Logarithmic next.

TABLE II. Accuracy of annealing schedule methods with Cauchy acceptance probability

Annealing schedule      Lin318   Att532   Rat783   Fl1400
Optimal solution         42029    27686     8806    20127
Geometric                42112    27734     9032    20871
Lundy                    42848    28342     9469    21562
Logarithmic              42029    27693     8862    20281
Boltzmann                42038    27705     8897    20316
Cauchy                   42083    27729     9014    20518
Hybrid                   42035    27709     8879    20316
Extended Logarithmic     42052    27719     9006    20438
Extended Boltzmann       42073    27728     9013    20463

In Table II, again, the accuracy of the Logarithmic method is the highest, and Extended Logarithmic is third.

TABLE III. Accuracy of annealing schedule methods with the extended acceptance probability

Annealing schedule      Lin318   Att532   Rat783   Fl1400
Optimal solution         42029    27686     8806    20127
Geometric                42257    27781     9063    20913
Lundy                    42927    28579     9464    21618
Logarithmic              42161    27693     8870    20299
Boltzmann                42148    27711     8898    20338
Cauchy                   42191    27742     9057    20525
Hybrid                   42076    27709     8883    20351
Extended Logarithmic     42096    27728     9035    20438
Extended Boltzmann       42212    27734     9041    20465


In Table III, again, the accuracy of the Logarithmic method is the highest, with Extended Logarithmic next. Tables II and III illustrate that the proposed acceptance function gives better results than the Cauchy function.

V. CONCLUSION

I have proposed three annealing schedule methods. The idea behind all three is adding momentum terms to the original annealing schedules, which makes the algorithm faster. Besides these schedulers, I obtained a new acceptance function that prevents extreme changes in the acceptance probability; its idea is likewise adding a momentum term to the Boltzmann probability.

REFERENCES

[1] E. H. L. Aarts and J. H. M. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing, Wiley-Interscience Series in Discrete Mathematics and Optimization, Wiley, Chichester, 1989, pp. 24-95.

[2] V. Černý, "A thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm," Journal of Optimization Theory and Applications, vol. 45, 1985, pp. 41-51.

[3] K. A. Dowsland, "Simulated annealing," in Modern Heuristic Techniques for Combinatorial Problems (ed. C. R. Reeves), McGraw-Hill, 1995.

[4] S. Kirkpatrick et al., "Optimization by simulated annealing," Science, vol. 220, 1983, pp. 671-680.

[5] M. Lundy and A. Mees, "Convergence of an annealing algorithm," Mathematical Programming, vol. 34, 1986, pp. 111-124.

[6] N. Metropolis et al., "Equation of state calculations by fast computing machines," Journal of Chemical Physics, vol. 21, 1953, pp. 1087-1092.

[7] V. J. Rayward-Smith et al., Modern Heuristic Search Methods, John Wiley & Sons, 1996.

[8] E. H. L. Aarts and P. J. M. van Laarhoven, "Statistical cooling: A general approach to combinatorial optimization problems," Philips Journal of Research, vol. 40, 1985, pp. 193-226.

[9] P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications, D. Reidel Publishing Company, Kluwer, Dordrecht, 1987.

[10] H. Cohn and M. Fielding, "Simulated annealing: Searching for an optimal temperature schedule," SIAM Journal on Optimization, vol. 9, 1999, pp. 779-802.

[11] H. Szu and R. Hartley, "Fast simulated annealing," Physics Letters A, vol. 122, 1987, pp. 157-162.

[12] L. Ingber, "Very fast simulated re-annealing," Mathematical and Computer Modelling, vol. 12, no. 8, 1989, pp. 967-973.

[13] L. Ingber and B. Rosen, "Genetic algorithms and very fast simulated reannealing: A comparison," Mathematical and Computer Modelling, vol. 16, 1992, pp. 87-100.

[14] D. T. Connolly, "An improved annealing scheme for the QAP," European Journal of Operational Research, vol. 46, 1990, pp. 93-100.

[15] J. W. Pepper and B. Golden, "Solving the traveling salesman problem with annealing-based heuristics: A computational study," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 32, no. 1, January 2002.

[16] E. Aarts et al., Search Methodologies, Springer, 2005.

[17] L. Lamberti, "An efficient simulated annealing algorithm for design optimization of truss structures," Computers & Structures, vol. 86, October 2008, pp. 1936-1953.

[18] D. Štefankovič et al., "Adaptive simulated annealing: A near-optimal connection between sampling and counting," Journal of the ACM, vol. 56, no. 3, May 2009, pp. 1-36.
