data-intensive computing for competent genetic algorithms: a pilot study using meandre
DESCRIPTION
Data-intensive computing has positioned itself as a valuable programming paradigm to efficiently approach problems requiring processing very large volumes of data. This paper presents a pilot study about how to apply the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases (selectorecombinative genetic algorithms and estimation of distribution algorithms) are presented, analyzed, and discussed. This study shows that equivalent data-intensive computing evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification.TRANSCRIPT
![Page 1: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/1.jpg)
Xavier Llorà
National Center for Supercomputing ApplicationsUniversity of Illinois at Urbana-ChampaignUrbana, Illinois, 61801
[email protected]://www.ncsa.illinois.edu/~xllora
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre
![Page 2: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/2.jpg)
Outline
• Data-intensive computing and HPC?• Is this related at all to evolutionary computation?• Data-intensive computing with Meandre• GAs and competent GAs• Data-intensive computing for GAs
![Page 3: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/3.jpg)
2 Minute HPC History
• The eighties and early nineties picture • Commodity hardware rare, slow, and costly• Supercomputers were extremely expensive• Most of them hand crafted and only few units• Two competing families
• CISC (e.g. Cray C90 with up to 16 processors)• RISC (e.g. Connection Machine CM-5 with up 4,096 processors)
• Late nineties commodity hardware hit main stream• Start becoming popular, cheaper, and faster• Economy of scale• Massive parallel computers build from commodity components become a
viable option
![Page 4: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/4.jpg)
Two Visions
• C90 like supercomputers were like a comfy pair of trainers • Oriented to scientific computing• Complex vector oriented supercomputers• Shared memory (lots of them)• Multiprocessor enabled via some intercommunication networks• Single system image
• CM5 like computers did not get massive traction, but a bit• General purpose (as long as you can chop the work in simple units)• Lots of simple processors available• Distributed memory pushed new programming models (message passing)• Complex interconnection networks
• NCSA have shared memory, distributed memory, and gpgpu based
![Page 5: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/5.jpg)
Miniaturization Building Bridges
• Multicores and gpgpus are reviving the C90 flavor• The CM-5 flavor now survives as distributed clusters of not so
simple units
![Page 6: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/6.jpg)
Control Models of Parallelization in EC
Run 2 Run 6
Run 3 Run 7
Run 1 Run 5
Run 10
Run 11
Run 9 Master
Slave
Individual Evaluation
Slave Slave
Migration
![Page 7: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/7.jpg)
But Data is also Part of the Equation
• Google and Yahoo! revived an old route• Usually refers to:
• Infrastructure• Programming techniques/paradigms
• Google made it main stream after their MapReduce model• Yahoo! provides and open source implementation
• Hadoop (MapReduce)• HDFS (Hadoop distributed filesystem)
• Store petabytes reliably on commodity hardware (fault tolerant)• Programming model
• Map: Equivalent to the map operation on functional programming• Reduce: The reduction phase after maps are computed
![Page 8: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/8.jpg)
n!
i=0
x2 ! reduce(map(x, sqr), sum)
A Simple Example
x x x x
x2 x2 x2 x2
sum
map map map map
reduce reducereduce reduce
![Page 9: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/9.jpg)
Is This Related to EC?
• How can we easily benefit of the current core race painlessly?• NCSA’s Blue Waters estimated may top on 100K• Yes on several facets
• Large optimization problems need to deal with large population sizes (Sastry, Goldberg & Llorà, 2007)
• Large-scale data mining using genetic-based machine learning (Llorà et al. 2007)
• Competent GAs model building extremely costly and data rich (Pelikan et al. 2001)
• The goal?• Rethink parallelization as data flow processes• Show that traditional models can be map to data-intensive computing
models• Foster you curiosity
![Page 10: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/10.jpg)
Data-Intensive Computing with Meandre
![Page 11: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/11.jpg)
The Meandre Infrastructure Challenges
• NCSA infrastructure effort on data-intensive computing• Transparency
• From a single laptop to a HPC cluster• Not bound to a particular computation fabric• Allow heterogeneous development
• Intuitive programming paradigm • Modular Components assembled into Flows• Foster Collaboration and Sharing
• Open Source• Service Orientated Architecture (SOA)
![Page 12: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/12.jpg)
Basic Infrastructure Philosophy
• Dataflow execution paradigm• Semantic-web driven• Web oriented• Facilitate distributed computing• Support publishing services• Promote reuse, sharing, and collaboration• More information at http://seasr.org/meandre
![Page 13: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/13.jpg)
Data Flow Execution in Meandre
• A simple example c ← a+b• A traditional control-driven language
a = 1b = 2c = a+b
• Execution following the sequence of instructions • One step at a time
• a+b+c+d requires 3 steps • Could be easily parallelized
![Page 14: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/14.jpg)
Data Flow Execution in Meandre
• Data flow execution is driven by data• The previous example may have 2 possible data flow versions
value(a)value(c)+
value(b)
Stateless data flow
value(a) value(c)value(b)
State-based data flow
?+
![Page 15: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/15.jpg)
The Basic Building Blocks: Components
Component
RDF descriptor of thecomponents behavior
The componentimplementation
![Page 16: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/16.jpg)
Go with the Flow: Creating Complex Tasks
• Directed multigraph of components creates a flow
PushText
PushText
ConcatenateText
To UpperCase Text
PrintText
![Page 17: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/17.jpg)
Automatic Parallelization:Speed and Robustness
• Meandre ZigZag language allow automatic parallelization
PushText
PushText
ConcatenateText
To UpperCase Text
PrintText
To UpperCase Text
To UpperCase Text
![Page 18: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/18.jpg)
GAs and Competent GAs
![Page 19: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/19.jpg)
Selectorecombinative GAs
1. Initialize the population with random individuals
2. Evaluate the fitness value of the individuals
3. Select good solutions by using s-wise tournament selection
without replacement (Goldberg, Korb & Deb, 1989)
4. Create new individuals by recombining the selected population
using uniform crossover (Sywerda, 1989)
5. Evaluate the fitness valueof all offspring
6. Repeat steps 3-5 until convergence criteria are met
![Page 20: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/20.jpg)
Extended Compact Genetic Algorithm
• Harik et al. 2006
• Initialize the population (usually random initialization)
• Evaluate the fitness of individuals
• Select promising solutions (e.g., tournament selection)
• Build the probabilistic model• Optimize structure & parameters to best fit selected individuals
• Automatic identification of sub-structures
• Sample the model to create new candidate solutions• Effective exchange of building blocks
• Repeat steps 2–7 till some convergence criteria are met
![Page 21: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/21.jpg)
eCGA Model Building Process
• Use model-building procedure of extended compact GA• Partition genes into (mutually) independent groups• Start with the lowest complexity model• Search for a least-complex, most-accurate model
Model Structure Metric[X0] [X1] [X2] [X3] [X4] [X5] [X6] [X7] [X8] [X9] [X10] [X11] 1.0000
[X0] [X1] [X2] [X3] [X4X5] [X6] [X7] [X8] [X9] [X10] [X11] 0.9933
[X0] [X1] [X2] [X3] [X4X5X7] [X6] [X8] [X9] [X10] [X11] 0.9819
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8] [X9] [X10] [X11] 0.9644
… [X0] [X1] [X2] [X3] [X4X5X6X7] [X8X9X10X11] 0.9273
… [X0X1X2X3] [X4X5X6X7] [X8X9X10X11] 0.8895
![Page 22: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/22.jpg)
Data-Intensive Flows for Competent GAs
![Page 23: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/23.jpg)
Selectorecombinative GA
![Page 24: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/24.jpg)
ucbps
noit
twrops
eps
soed
sbp
0
5000
10000
15000
20000
sGAs Execution Profile and Parallelization
noitucbps/reducer
ucbps/paralell/3ucbps/paralell/2ucbps/paralell/1ucbps/paralell/0ucbps/mapper
twropseps/reducer
eps/paralell/3eps/paralell/2eps/paralell/1eps/paralell/0eps/mapper
soedsbp
0
5000
10000
15000
20000
• Intel 2.8Ghz QuadCore, 4Gb RAM. Average of 20 runs.
![Page 25: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/25.jpg)
eCGA Model Model building
![Page 26: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/26.jpg)
print_model
greedy_ecga_mb
update_partitions
init_ecga
0
10000
20000
30000
40000
50000
eCGA Execution Profile and Parallelization
print_model
greedy_ecga_mb
update_partitions/reduce
update_partitions/paralell/3
update_partitions/paralell/2
update_partitions/paralell/1
update_partitions/paralell/0
update_partitions/mapper
update_partitions/mapper
update_partitions/mapper
init_ecga
0
10000
20000
30000
40000
50000
• Intel 2.8Ghz QuadCore, 4Gb RAM. Average of 20 runs.
![Page 27: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/27.jpg)
●
●
●
●1
23
45
Number of cores
Spee
dup
vs. O
rigin
al e
CGA
Mod
el B
uild
ing
1 2 3 4
eCGA Model Building Speedup
• Intel 2.8Ghz QuadCore, 4Gb RAM. Average of 20 runs.• Speedup against original eCGA model building
![Page 28: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/28.jpg)
Scalability on NUMA Systems
• Run on NCSA’s SGI Altix Cobalt• 1,120 processors and up to 5 TB of RAM• SGI NUMAlink• NUMA architecture• Test for speedup behavior• Average of 20 independent runs• Automatic parallelization of the partition evaluation• Results still show the linear trend (despite the NUMA)
• 16 processors, speedup = 14.01• 32 processors, speedup = 27.96
![Page 29: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/29.jpg)
Wrapping Up
![Page 30: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/30.jpg)
Summary
• Evolutionary computation is data rich• Data-intensive computing can provide to EC:
• Tap into parallelism quite painless• Provide a simple programming and modeling• Boost reusability• Tackle otherwise intractable problems
• Shown that equivalent data-intensive computing versions of traditional algorithms exist
• Linear parallelism can be tap transparently
![Page 31: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre](https://reader033.vdocuments.site/reader033/viewer/2022042814/55500d9bb4c90535638b4829/html5/thumbnails/31.jpg)
Xavier Llorà
National Center for Supercomputing ApplicationsUniversity of Illinois at Urbana-ChampaignUrbana, Illinois, 61801
[email protected]://www.ncsa.illinois.edu/~xllora
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre