A Fully Saturated OpenCL Particle Swarm Optimizer

Donald Pupecki


Post on 12-Jan-2016







Why GPUs?

GPUs are scaling faster than CPUs [1].

Intel recently wrote a paper claiming a two-year-old GPU was “only 10x faster” than their top-end CPU [2].



Why OpenCL?

Portable: can run on most parallel platforms (CPU, GPU, FPGA, Cell B.E.).

Industry-wide support.

Bindings for many languages.


Why PSO?

Particle Swarm Optimization (PSO)

Created in 1995 by Kennedy and Eberhart [5] and later modified by Shi and Eberhart [6].

Inspired by BOIDS, a bird flocking simulation by Craig Reynolds [3][4].

Provides good exploration; exploitation sometimes requires parameter tuning [7].

A subset of Swarm Intelligence, which is itself a subset of Evolutionary Computation.

Video: http://vimeo.com/17407010 [8]


Is it parallelizable?

The PSO, like most evolutionary computations, is iterative. General process:

while not term_conditions
    update position, velocity
    recalculate fitness
    update local bests

Within each iteration there is a lot to do.
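The general process above can be sketched as a small serial program. This is only an illustrative sketch, not the OCL PSO code: the inertia and acceleration constants (`w`, `c1`, `c2`), the search bounds, the swarm size and the use of a single global best are all assumptions chosen for the example.

```python
import random

def sphere(x):
    """Sphere test function: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso(fitness, dim=5, n_particles=20, iters=200, lo=-5.0, hi=5.0,
        w=0.729, c1=1.49445, c2=1.49445, seed=1):
    """Minimal serial PSO with a global best (all parameters are assumed values)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # local (personal) bests
    best_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=best_fit.__getitem__)
    gbest_pos, gbest_fit = best_pos[g][:], best_fit[g]

    for _ in range(iters):                      # while not term_conditions
        for i in range(n_particles):
            for d in range(dim):                # update position, velocity
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])                 # recalculate fitness
            if f < best_fit[i]:                 # update local bests
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f < gbest_fit:               # and the global best
                    gbest_fit, gbest_pos = f, pos[i][:]
    return gbest_pos, gbest_fit
```

The two inner loops (over particles and over dimensions) carry no dependency between their iterations other than the shared bests, which is what makes each PSO iteration a candidate for parallel execution.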


Parallel Gotchas

Bank conflicts [18]
Divergent warps
Struct sizes
Coalescing
Debugging
SIMD
No branch prediction


Goals: Implement More Test Functions

The previous version of this work only implemented four test functions. More non-parabolic test functions could help confirm the soundness of the algorithm.


Test Functions Implemented

Sphere: unimodal, symmetric
Rosenbrock: multimodal, non-symmetric, banana shaped
Rastrigin: highly multimodal, symmetric
Griewank: regularly distributed minima
Ackley: highly multimodal, funnel shaped
Schwefel: globally distant minima
Sum of Powers: unimodal, non-symmetric
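For reference, the seven functions can be written in their commonly published textbook forms. The slides do not give the formulas, so the exact variants, constants and offsets below (for example the 418.9829 term in Schwefel) are assumptions and may differ from what OCL PSO evaluates.

```python
import math

def sphere(x):
    return sum(v * v for v in x)

def rosenbrock(x):
    # Minimum 0 at (1, ..., 1).
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)

def griewank(x):
    total = sum(v * v for v in x) / 4000.0
    prod = 1.0
    for i, v in enumerate(x):
        prod *= math.cos(v / math.sqrt(i + 1))
    return 1.0 + total - prod

def ackley(x):
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e

def schwefel(x):
    # Minimum close to 0 near x_i = 420.9687.
    return 418.9829 * len(x) - sum(v * math.sin(math.sqrt(abs(v))) for v in x)

def sum_of_powers(x):
    # Sum of |x_i|^(i+1) with i counted from 1.
    return sum(abs(v) ** (i + 2) for i, v in enumerate(x))
```

Every one of these is a sum or product over the dimensions, which is the property the parallelism goal later in the deck relies on.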



Goals: Shift Test Functions

PSOs have a tendency to flock to the middle [10].

Remedied by the “Region Scaling” [10][11][12] or “Center Offset” approach.

This work used center offset with an offset of 2.5.
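A center offset can be illustrated as a wrapper that moves a function's optimum away from the origin. This sketch assumes the same offset is subtracted from every coordinate; the slides only state the value 2.5, so the wrapper name `center_offset` and the per-coordinate interpretation are hypothetical.

```python
def sphere(x):
    return sum(v * v for v in x)

def center_offset(fitness, offset=2.5):
    """Return a version of `fitness` whose optimum is shifted by `offset` in each dimension."""
    def shifted(x):
        return fitness([v - offset for v in x])
    return shifted

shifted_sphere = center_offset(sphere)
```

With this wrapper the shifted sphere is zero at (2.5, 2.5) rather than at the origin, so a swarm that merely collapses toward the middle no longer finds the optimum by accident.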


Goals: More Parallelism

Version 1.0 only parallelizes fitness calculations over the number of particles.

Almost all of the per-dimension calculations contain products or sums over all dimensions; these should be done in parallel and combined with a reduction.
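The reduction pattern can be sketched serially. The function below emulates the pairwise (tree) scheme typically used inside a work-group; it is an assumption about the general technique, not the kernel used in OCL PSO, and the name `tree_reduce_sum` is made up for the example.

```python
def tree_reduce_sum(values):
    """Sum `values` using the pairwise (tree) pattern of a work-group reduction."""
    n = 1
    while n < len(values):
        n *= 2
    scratch = list(values) + [0.0] * (n - len(values))   # pad to a power of two
    stride = n // 2
    while stride > 0:
        # On a GPU each i would be one work-item, followed by a barrier.
        for i in range(stride):
            scratch[i] += scratch[i + stride]
        stride //= 2
    return scratch[0]

# Per-dimension terms of the sphere function, combined by the reduction.
terms = [x * x for x in [1.0, 2.0, 3.0, 4.0, 5.0]]
fitness = tree_reduce_sum(terms)
```

Each pass halves the number of active elements, so a sum over D dimensions takes about log2(D) parallel steps instead of D serial additions.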


Goals: Move Everything to Card

Less than 1/10th of the time in the 1.0 version was spent doing calculation.

Method                 Time      Data transferred
WriteBuffer            0.08559   2.34
WriteBuffer            0.07935   2.34
WriteBuffer            0.07701   0.23
WriteBuffer            0.07851   2.34
WriteBuffer            0.07641   0.08
rast__k3_ATI RV7701    0.04503   0
ReadBuffer             0.02778   2.34
ReadBuffer             0.02052   2.34
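The "less than 1/10th" figure can be checked from the profile with a few lines of arithmetic. The sketch assumes that all values in the Time column share one unit and that the `rast__k3_ATI RV7701` row is the only compute entry.

```python
# (method, time) rows copied from the profile; units are whatever the profiler reported.
profile = [
    ("WriteBuffer", 0.08559),
    ("WriteBuffer", 0.07935),
    ("WriteBuffer", 0.07701),
    ("WriteBuffer", 0.07851),
    ("WriteBuffer", 0.07641),
    ("rast__k3_ATI RV7701", 0.04503),
    ("ReadBuffer", 0.02778),
    ("ReadBuffer", 0.02052),
]

total = sum(t for _, t in profile)
compute = sum(t for name, t in profile if name not in ("WriteBuffer", "ReadBuffer"))
fraction = compute / total
print(f"compute share: {fraction:.1%}")
```

Under those assumptions the kernel accounts for roughly 9% of the measured time, with the rest spent in buffer writes and reads.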


OCLPSO 1.1

Get everything to the card!

Needed to implement an LCRNG (linear congruential random number generator) on the card.

Eliminated large data transfers.

Added a hill climber to fall back on.

Got inconsistent results.

Note on synchronization: barriers, memory fences, floating-point atomics, Compute Capability 2.x, and the lack of a global sync.
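A linear congruential generator is small enough to run per work-item, which is presumably why it suits an on-card implementation. The sketch below uses the widely published 32-bit constants from Numerical Recipes; the slides do not say which constants or state width OCL PSO used, so these and the function names are assumptions.

```python
MASK32 = 0xFFFFFFFF  # keep the state in 32 bits, as a uint would on the device

def lcg_next(state):
    """Advance a 32-bit linear congruential generator by one step."""
    return (1664525 * state + 1013904223) & MASK32

def lcg_uniform(state):
    """Return (new_state, value) with value a float in [0, 1)."""
    state = lcg_next(state)
    return state, state / 2**32

# One independent state per work-item, derived here from a base seed (hypothetical scheme).
states = [lcg_next(42 + i) for i in range(4)]
```

Because each work-item carries only one integer of state, no random numbers need to be generated on the host and copied across, which is what removes the large transfers.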


OCL PSO 2.0

Rewrite.

Introduced the Fully Saturated concept.

Compute units are now self-contained.

All dimensions of all particles of all swarms are evaluated in parallel.
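One way to picture "all dimensions of all particles of all swarms" is a flat index space with one work-item per (swarm, particle, dimension) triple. The layout below is an assumed mapping for illustration only; the slides do not describe how OCL PSO 2.0 actually assigns work-items, and `decompose` is a hypothetical helper.

```python
def decompose(global_id, n_particles, n_dims):
    """Map a flat work-item id to (swarm, particle, dimension), dimension varying fastest."""
    dim = global_id % n_dims
    particle = (global_id // n_dims) % n_particles
    swarm = global_id // (n_dims * n_particles)
    return swarm, particle, dim

# Example: 3 swarms of 4 particles in 3 dimensions gives 36 work-items.
n_swarms, n_particles, n_dims = 3, 4, 3
triples = [decompose(g, n_particles, n_dims)
           for g in range(n_swarms * n_particles * n_dims)]
```

With dimension as the fastest-varying index, neighbouring work-items touch neighbouring dimensions of the same particle, which is the access pattern that favours coalesced reads.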


OCLPSO 2.0 vs. CUDA PSO 2.0

The CUDA PSO [16][17] was submitted for the GECCO competition on GPGPGPU.com.

It features a ring topology for its bests.
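In a ring topology each particle is guided by the best result in its immediate neighbourhood rather than by one global best. The sketch assumes a neighbourhood of radius 1 (the particle and its two adjacent ring neighbours) and minimization; the slides do not give the details of the CUDA PSO's ring, and `ring_best` is a made-up name.

```python
def ring_best(best_fit, i):
    """Index of the lowest personal-best fitness among particle i and its two ring neighbours."""
    n = len(best_fit)
    neighbours = ((i - 1) % n, i, (i + 1) % n)
    return min(neighbours, key=lambda j: best_fit[j])

fits = [5.0, 1.0, 4.0, 3.0, 2.0]
guides = [ring_best(fits, i) for i in range(len(fits))]
```

Here particle 3 follows particle 4 even though particle 1 holds the overall best, which slows the spread of information and tends to keep the swarm exploring longer than a global best does.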



OCLPSO 2.0 vs. SPSO

The 2006 Standard PSO was written by Clerc and Kennedy.

Pretty frills-free compared to newer (2010) Standard PSO codes or TRIBES.

Non-parallel but extremely efficient.


Conclusions

OCL PSO 2.0 performs competitively compared to CUDA PSO and trumps SPSO.

This is despite its lack of neighborhood bests (it uses only a global best).

This is also without the hill climb component, as I needed to strip out as much complexity as I could in the final rewrite.


Future Work

Re-implement the hill climber; it was a great help for peaks.

Re-add neighborhoods.

Implement on the newest generation of CUDA Compute Capability 2.x cards, which support global floating-point atomics.

Implement TRIBES on the GPU.


References

[1] Nvidia, “TESLA GPU Computing: Supercomputing at 1/10th the Cost,” http://www.nvidia.com/docs/IO/96958/Tesla_Master_Deck.pdf, 2011.

[2] Nvidia, “Why Choose Tesla,” http://www.nvidia.com/object/why-choose-tesla.html, 2011.

[3] C. Reynolds, “Boids,” http://www.red3d.com/cwr/boids/

[4] Reynolds, Craig (1987). "Flocks, herds and schools: A distributed behavioral model". SIGGRAPH '87: Proceedings of the 14th annual conference on Computer graphics and interactive techniques (Association for Computing Machinery): 25–34.

[5] Kennedy, J.; Eberhart, R. (1995). "Particle Swarm Optimization". Proceedings of IEEE International Conference on Neural Networks. IV. pp. 1942–1948. doi:10.1109/ICNN.1995.488968. http://www.engr.iupui.edu/~shi/Coference/psopap4.html

[6] Shi, Y.; Eberhart, R.C. (1998). "A modified particle swarm optimizer". Proceedings of IEEE International Conference on Evolutionary Computation. pp. 69–73.

[7] Clerc, M.; Kennedy, J. (2002). "The particle swarm - explosion, stability, and convergence in a multidimensional complex space". IEEE Transactions on Evolutionary Computation.

[8] M. Nobile, “Particle Swarm Optimization,” http://vimeo.com/17407010, 2011.

[9] M. Clerc, Confinements and Biases in Particle Swarm Optimization. http://clerc.maurice.free.fr/pso/, 2006.

[10] C. K. Monson and K. D. Seppi, "Exposing Origin-Seeking Bias in PSO," presented at GECCO'05, Washington, DC, USA, 2005, pp. 241-248.

[11] P. J. Angeline. Using selection to improve particle swarm optimization. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC 1998), Anchorage, Alaska, USA, 1998.

[12] D. K. Gehlhaar and D. B. Fogel. Tuning evolutionary programming for conformationally flexible molecular docking. In Evolutionary Programming, pages 419–429, 1996.

[13] D. Pupecki, “OpenCL PSO, Development, Benchmarking, Lessons, Future Work,” 2011. http://web.cs.sunyit.edu/~pupeckd/oclpsopres.pdf

[14] Molga, M., Smutnicki, C., 2005. “Test functions for optimization needs”, http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf

[15] Pohlheim, Hartmut. Genetic and Evolutionary Algorithm Toolbox for use with MATLAB. Technical Report, Technical University Ilmenau, 1998.

[16] L. Mussi, F. Daolio, and S. Cagnoni. Evaluation of parallel particle swarm optimization algorithms within the CUDA architecture. Information Sciences, 2011, in press.

[17] L. Mussi, Y.S.G. Nashed, and S. Cagnoni. GPU-based Asynchronous Particle Swarm Optimization. Proc. GECCO 2011, 2011.

[18] NVIDIA, OpenCL Programming Guide for the CUDA Architecture, http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/NVIDIA_OpenCL_ProgrammingGuide.pdf, 2011.

[19] M. Clerc. TRIBES - un exemple d’optimisation par essaim particulaire sans paramètres de contrôle. In Optimisation par Essaim Particulaire (OEP 2003), Paris, France, 2003.