S0214 : “GPU Based Stacking Sequence Generation For Composite Skins Using GA”
Date: 16th May 2012 Wed, 3pm to 3.25pm(Adv. Session)
Sathyanarayana K., Manish Banga, and Ravi Kumar G. V. V. Engineering Services,
Infosys Limited, Electronic City, Hosur Road, Bangalore, India
www.infosys.com
Contents
• Overview
• Composite Skin Engineering - Overview
• Optimization of Aircraft Composite Skins
  Assumptions
  Decision variables, objective, constraints
• Genetic Algorithms Based Composite Stacking Sequence Generation Approach
  Encoding and decoding, initial solutions generation
  Improving the solutions: genetic algorithm operators
  Convergence criteria
• Speedup using GPU
  Graphics Processing Unit (GPU) and data parallel applications
  Method of using CUDA based GPU and flow chart
  Speed-up achieved and observations
• Stacking Sequence Results for three typical examples
• Future work and Conclusion
[Slide graphics: Part 1, Part 2, Part 3. Pictures on this page are for illustration only.]
Composite Laminate Skins- Overview
• Composites relevance to Aircraft industry.
• Plies, fiber orientations, laminate, zones
[Figure: laminate stacking (45°/-45°/0°/90°) showing the fiber orientation of the 90°, 45°, 0° and -45° plies]
[Figure: plan view of aircraft wing skin divided into zones 1 to 18]
Optimization of Aircraft Composite Skins
• Optimization of real-life aircraft composite skins is performed in two stages:
  First stage: a gradient-based optimization technique.
  Second stage: stacking sequence generation, which is the scope of the current study.
• The objectives of the current study are to:
  demonstrate the utility of genetic algorithms for this problem
  showcase the performance benefits of parallel computing using GPUs.
Problem Formulation
Assumptions
  First-level composite skin optimization is already performed.
Constraints
  The stacking sequence generation is subject to the following “stacking rules”.
S. No.  Stacking Sequence Rule
1       Laminate should be symmetric.
2       Number of plies of each orientation should remain the same.
3       Laminate stacking sequence should not contain more than 4 plies in the same orientation together.
4       Laminate stacking sequence should not have more than 2 plies in the same orientation together at the top of the laminate.
5       Maximum difference in angle orientations between two consecutive plies must be equal to 45°.
6       At the top of the laminate, the 0° ply should be placed such that there are at least 3 plies between the 0° ply and the outer surface of the laminate.
Mathematical formulation (Contd.)
Decision Variables
  For each layer of the laminate, assign one of the orientations: (0°, 90°, +45°, -45°)
Objective function
  Minimize violation of stacking rules, i.e.: Minimize Penalty P = P1 + P2 + … + Pn
  where P1, P2, …, Pn are the penalties for non-compliance with the 1st, 2nd, …, nth stacking sequence rule
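The penalty objective P = P1 + P2 + … + Pn can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, each violated rule is assumed to add a penalty of 1, and only rules 3, 4 and 6 are sketched. Rule 1 (symmetry) holds by construction because the half laminate is mirrored.

```python
from itertools import groupby


def longest_run(plies):
    """Length of the longest run of identical consecutive orientations."""
    return max((len(list(group)) for _, group in groupby(plies)), default=0)


def penalty(half):
    """Sum unit penalties for rules 3, 4 and 6 (assumed weighting of 1 each).

    `half` lists ply angles from the outer surface to the mid-plane.
    Mirroring it gives a symmetric laminate, so rule 1 holds by construction.
    """
    full = half + half[::-1]
    # Rule 3: no more than 4 plies of the same orientation together.
    p3 = 1 if longest_run(full) > 4 else 0
    # Rule 4: no more than 2 plies of the same orientation together at the top.
    p4 = 1 if len(half) >= 3 and len(set(half[:3])) == 1 else 0
    # Rule 6: at least 3 plies between the outer surface and the first 0 ply.
    p6 = 1 if 0 in half[:3] else 0
    return p3 + p4 + p6


print(penalty([-45, 45, 90, 0, 0]))  # half of [-45/45/90/0_2]s
```

A sequence with no violations scores 0; the GA then prefers solutions with lower penalty, that is, higher fitness.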
Genetic Algorithms
• Genetic algorithms are search techniques based on the principles of natural selection. They are suitable for combinatorial problems such as stacking sequence generation.
  They can generate good solutions by evaluating only a fraction of all possible solutions.
  They need no gradient information about the objective function, so they do not depend on the problem domain.
• The genetic algorithm process starts by randomly assigning the allowable ply orientations (0°, 45°, 90° and -45°) to the design variables.
• The fitness (the reciprocal of the number of rule violations) of each such solution is then computed.
GA Steps
A simple genetic algorithm has the following steps:
Step 1. Encoding, decoding and initial solutions creation: random 1’s and 0’s
Step 2. Improving the initial solutions: selection, crossover, mutation
Step 3. Convergence and stopping criterion: 85-90% similarity in solutions, or reaching a preset maximum number of iterations.
[Figure: GA flow across generations. Step 1: encoding, decoding and initial solution; Step 2: improving initial solutions using selection, crossover and mutation; Step 3: convergence. Three panels (First Generation, Second Generation, Last Generation) list Solution 1 to Solution N as 16-bit strings with their fitness values, with selection, crossover and mutation applied between generations. In the Last Generation most solutions are the same string, 1100110110001010, with fitness 100.0.]
GA Step 1: Encoding, decoding and initial solutions creation
• Design variables are encoded into bits of 1’s and 0’s.
• In the current problem, the design variables are the ply orientations at each level of the laminate.
• Each ply orientation is encoded with two bits as given below.
Ply Orientation  Encoded Bits
0°               00
45°              01
-45°             10
90°              11
• The objective function in the current problem, i.e. the number of stacking rule violations, is translated into the ‘fitness’ of an individual solution.
[Figure: encoding maps the problem space (laminate stacking sequence, e.g. (90/45/90/0/-45/0)) to the solution space (encoded laminate stacking sequences, Sol 1: 110111001000, Sol 2: 011111001000, ... Sol n), where each solution is evaluated; decoding maps back to the problem space.]
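The two-bit encoding table can be exercised with a short Python sketch. The function and dictionary names are hypothetical; the bit assignments are the ones given in the table, and the sample sequence is the one shown on the slide.

```python
# Two-bit code per ply orientation, as listed in the encoding table.
ENCODE = {0: "00", 45: "01", -45: "10", 90: "11"}
DECODE = {bits: angle for angle, bits in ENCODE.items()}


def encode(plies):
    """Problem space -> solution space: ply angles to a bit string."""
    return "".join(ENCODE[angle] for angle in plies)


def decode(bits):
    """Solution space -> problem space: bit string to ply angles."""
    return [DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2)]


print(encode([90, 45, 90, 0, -45, 0]))  # 110111001000
print(decode("011111001000"))           # [45, 90, 90, 0, -45, 0]
```

Because all four two-bit patterns map to a valid orientation, any bit string of even length decodes to a valid laminate, so crossover and mutation never produce an undecodable solution.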
GA Step 2: Improving the initial solutions: GA operators
Three operators: 1. selection, 2. crossover, 3. mutation
1. Selection: The selection operator is based on survival of the fittest, i.e. each solution gets a number of copies in the new solution space in proportion to its fitness.
2. Crossover: The crossover operator picks two solutions at random within a generation and swaps bits between them to create two new solutions.
3. Mutation: The mutation operator randomly flips bits in a solution according to a preset probability. It increases the chances of avoiding a local minimum by keeping at least a minimal level of diversity in the population. The probability of mutation is usually low, e.g. 0.001.
[Figure: selection wheel with a selection point and wheel rotation; segments of 40%, 35%, 12%, 8% and 5%. The fittest solution occupies the largest area on the wheel and the least fit solution occupies the smallest segment.]
[Figure: crossover of Parent1 and Parent2 at the cross sites to produce Offspring1 and Offspring2; mutation.]
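The three operators can be sketched in Python on bit-string solutions. This is an illustrative sketch rather than the code used in the study: the function names are hypothetical, and crossover is simplified to a single cross site even though the slides mention a configurable number of crossover points.

```python
import random


def roulette_select(population, fitness, rng):
    """Fitness-proportional (roulette wheel) selection.

    Draws as many individuals as the population holds; fitter ones are
    expected to appear more often. At least one fitness must be positive.
    """
    return rng.choices(population, weights=fitness, k=len(population))


def crossover(parent1, parent2, site):
    """Swap the tails of two bit strings at one cross site (simplification)."""
    return parent1[:site] + parent2[site:], parent2[:site] + parent1[site:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = []
    for bit in bits:
        if rng.random() < rate:
            bit = "1" if bit == "0" else "0"
        flipped.append(bit)
    return "".join(flipped)


rng = random.Random(0)  # seeded only to make the sketch repeatable
pool = roulette_select(["110111001000", "011111001000"], [0.40, 0.35], rng)
child1, child2 = crossover(pool[0], pool[1], site=6)
print(mutate(child1, 0.001, rng), mutate(child2, 0.001, rng))
```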
GA Step 3: Convergence Criteria
• The number of iterations reaches a preset maximum.
• 85-90% similarity among the solutions in the current generation is reached.
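The two stopping criteria could be checked as in the sketch below. The slides do not define how similarity is measured, so this sketch assumes it is the share of the population identical to the most common individual; the function names and the default of 500 generations are hypothetical.

```python
from collections import Counter


def similarity(population):
    """Assumed measure: fraction of individuals equal to the most common one."""
    most_common_count = Counter(population).most_common(1)[0][1]
    return most_common_count / len(population)


def should_stop(population, generation, max_generations=500, threshold=0.85):
    """Stop on the iteration limit or once the similarity threshold is met."""
    if generation >= max_generations:
        return True
    return similarity(population) >= threshold


population = ["1100110110001010"] * 9 + ["1001001001010101"]
print(similarity(population))        # 0.9
print(should_stop(population, 12))   # True
```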
Scope for Parallelization
Parallelization can be achieved using:
1. Multiple CPUs : expensive, limited cores
2. GPGPUs : becoming less expensive, commonly used for graphics processing,
considered in this study
• Computations across generations are dependent on each other: they cannot be run in parallel.
• Computations within a generation are independent of each other: they can be run in parallel.
[Figure: the GA generations diagram from the GA Steps slide (First, Second and Last Generation, Solution 1 to Solution N with fitness values), repeated.]
GPUs Introduction
• A GPU is an additional computational device in a computer that works alongside the CPU to perform faster computations.
• GPUs are designed to run thousands of computations in parallel and are traditionally used for large-scale matrix operations.
• The serial code needs to be parallelized using GPU-specific languages such as:
  CUDA (NVIDIA’s GPU API), used in this study
  OpenCL (platform neutral).
Steps in Programming Using a GPU
Step  Task
1     CPU: declare pointers to host data (host is another name for the CPU). GPU: declare pointers to device data (device is another name for the GPU).
2     CPU: allocate host pointers with malloc. GPU: allocate device pointers with cudaMalloc.
3     CPU: populate the input data pointers of the host. GPU: copy data from host to device using cudaMemcpy (with parameter cudaMemcpyHostToDevice).
4     Specify the number of blocks and the number of threads (kernel configuration).
5     Specify the kernel code (the code to be run on the device for each thread) and make a call to the kernel code.
6     Perform processing on the device.
7     Copy the result data from device to host using cudaMemcpy (with parameter cudaMemcpyDeviceToHost).
8     If required, post-process the result on the host and present the results to the user.
Flow Chart
Start
• Allocate host and device pointers.
• Populate the population data and other host data.
• Copy data from host to device using cudaMemcpy (with argument cudaMemcpyHostToDevice).
• Launch as many threads on the device as there are individuals in the population, with the number of blocks equal to the number of zones.
• Call the kernel function with parameters such as pointers to the matrices of the initial population, pointers to the output data, genetic parameters (like the number of crossover points), and stiffness information of the plies and their orientations.
• Each thread computes the fitness of one solution of the stacking sequence and performs the genetic algorithm iterations.
Routines that run on the GPU (the flow chart also marks the positions of the calls to __syncthreads()):
  Fitness computation: the penalized objective function is computed based on constraint violations.
  Selection
  Cross Over
  Mutation
  Converged? If no, repeat the loop. If yes, continue.
• Copy the computed results from device to host using cudaMemcpy (with argument cudaMemcpyDeviceToHost).
• Write the solution to an output file.
Stop
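The launch layout described in the flow chart (one block per zone, one thread per individual) can be emulated serially in Python to show how each (zone, individual) pair maps to its own output slot. This is a sketch under assumptions: the zone-major flat layout and all function names are hypothetical, and the loop variables only play the role of CUDA's `blockIdx.x` and `threadIdx.x`.

```python
def flat_index(zone, individual, pop_size):
    """Offset of one individual's fitness in an assumed zone-major flat array."""
    return zone * pop_size + individual


def evaluate_generation(populations, fitness_fn):
    """Serial emulation of one fitness step over all zones.

    Every (zone, individual) pair writes to its own slot, which is why the
    work within a generation can be distributed over GPU threads.
    """
    n_zones, pop_size = len(populations), len(populations[0])
    out = [0.0] * (n_zones * pop_size)
    for zone in range(n_zones):              # role of blockIdx.x
        for individual in range(pop_size):   # role of threadIdx.x
            slot = flat_index(zone, individual, pop_size)
            out[slot] = fitness_fn(populations[zone][individual])
    return out


# Toy fitness (count of 1 bits), used only to show the data flow.
print(evaluate_generation([["00", "01"], ["10", "11"]], lambda b: b.count("1")))
```

No iteration of the inner loop reads a value written by another, so the iterations can run in any order or at the same time.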
Comparison of execution time with CPU alone and with GPU programming
Problem                   Number of Zones  CPU Alone (Seconds)  CPU + GPU (Seconds)  Performance benefit
Composite Skin Problem 1  5                2.25                 0.5                  4.5 times
Composite Skin Problem 2  6                3.2                  0.52                 6.15 times
Composite Skin Problem 3  40               300                  37                   8.1 times

CPU Used: Intel Xeon Processor with 2GB RAM, 2.53GHz clock rate
GPGPU Used: 1.3 NVIDIA Tesla T10 Processor with 4GB global memory, 30 multiprocessors, 240 cores, 16K shared memory per block, warp size 32, 512 threads per block, 1.3GHz clock rate
Observations:
1. Speed-ups of up to 8 times were observed when GPU computation was used.
2. As the size of the problem increases, the performance benefit is higher because more of the GPU's capacity is utilized.
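The performance benefit column is the ratio of CPU-alone time to CPU + GPU time. The short Python sketch below recomputes it from the reported timings; the variable names are hypothetical.

```python
# (CPU alone seconds, CPU + GPU seconds) as reported in the comparison table.
timings = {
    "Composite Skin Problem 1": (2.25, 0.5),
    "Composite Skin Problem 2": (3.2, 0.52),
    "Composite Skin Problem 3": (300, 37),
}

# Speed-up = time on CPU alone / time with GPU.
speedup = {name: cpu / gpu for name, (cpu, gpu) in timings.items()}

for name, value in speedup.items():
    print(f"{name}: {value:.2f} times")
```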
Composite Skin Problem1
This composite skin contains four zones; the number of plies, the initial thickness law and the initial stacking sequence of each zone are given below. Each ply is 0.125mm thick, which is the same for all the composite skins analyzed in the current work.
[Figure: plan view of the skin with dimensions 150mm and 300mm marked, divided into zones 1 to 4]
Zone Number  Number of Plies  Initial Stacking Sequence (un-optimized)
1            40               (0₁₀/45₁₀/90₁₀/-45₁₀)
2            30               (0₁₂/45₆/90₆/-45₆)
3            24               (0₈/45₆/90₄/-45₆)
4            10               (0₄/45₂/90₂/-45₂)
Generated Stacking Sequence
Zone  Stacking Sequence
1     [-45₂/ 90/ 45/ 0/ 45/ 0/ 90/ 0/ 45/-45/ 45₂/-45/ 0/ 90/ 0/ 90₂/-45]s
2     [45/ 90/ 45/-45/ 45/-45/ 0/ 90/ 0₂/ 0₂/ 90/ 0/-45]s
3     [90/ 45/ 90/-45/ 0₂/ 45/-45/ 0/ 45/ 0/-45]s
4     [-45/ 45/ 90/ 0₂]s
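A generated sequence can be checked against rule 2 (the number of plies of each orientation stays the same) by expanding the symmetric notation and counting plies. The Python sketch below does this for zone 4, whose initial sequence has four 0° plies and two plies each of 45°, 90° and -45°; the function name is hypothetical.

```python
from collections import Counter


def expand_symmetric(half):
    """Expand [ ... ]s notation: mirror the half laminate about the mid-plane."""
    return half + half[::-1]


zone4_half = [-45, 45, 90, 0, 0]            # zone 4 generated sequence, half laminate
zone4_full = expand_symmetric(zone4_half)
counts = Counter(zone4_full)

print(len(zone4_full))   # 10 plies, as in the zone table
print(dict(counts))      # ply count per orientation
```

The counts match the initial zone 4 laminate, so the generated sequence only reorders the plies.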
[Figure: initial stacking and best solution]
Zone 1 solution  Stacking Sequence                                                          Rule violated
1                [-45₂/ 90/ 45/ 0/ 45/ 0/ 90/ 0/ 45/-45/ 45₂/-45/ 0/ 90/ 0/ 90₂/-45]s   None
2                [90₃/-45/ 0/ 45/ 0₂/ 45/ 0/ 90/ 45/ 0/ 45/-45/ 90/-45/ 45/-45₂]s        4
3                [90/ 45/ 90/ 45/-45/ 0/ 45/ 0/ 45/ 0/ 90/ 45/-45/ 90/ 0₂/ 90/-45₃]s     3

Performance benefit
CPU Alone (Seconds)  CPU+GPU (Seconds)  Performance benefit
2.25                 0.5                4.5 times
Composite Skin Problem2
This tapered composite skin has 6 zones, with 2 zones having the same thickness. The geometry and thickness law for this composite skin are shown.
[Figure: plan view of the tapered skin with dimensions 500mm, 75mm and 1700mm marked, divided into zones 1 to 6]
Zone Number  Number of Plies  Thickness Law & Initial Stacking Sequence (un-optimized)
1            40               (0₁₀/45₁₀/90₁₀/-45₁₀)
2            30               (0₁₂/45₆/90₆/-45₆)
3            24               (0₈/45₆/90₄/-45₆)
4            20               (0₈/45₄/90₄/-45₄)
5            16               (0₆/45₄/90₂/-45₄)
6            8                (0₂/45₂/90₂/-45₂)
Generated Stacking Sequence
Zone  Stacking Sequence
1     [-45₂/ 90/ 45/ 0/ 45/ 0/ 90/ 0/ 45/-45/ 45₂/-45/ 0/ 90/ 0/ 90₂/-45]s
2     [45/ 90/ 45/-45/ 45/-45/ 0/ 90/ 0₂/ 0₂/ 90/ 0/-45]s
3     [90/ 45/ 90/-45/ 0₂/ 45/-45/ 0/ 45/ 0/-45]s
4     [90/-45/ 45/-45/ 0/ 90/ 0/ 45/ 0₂]s
5     [45/ 90/-45/ 45/ 0₂/-45/ 0]s
6     [-45/ 45/ 90/ 0]s

Performance benefit
CPU Alone (Seconds)  CPU+GPU (Seconds)  Performance benefit
3.2                  0.52               6.15 times
Composite Skin Problem3
• 40 zones skin
Zone Number  Number of Plies  Thickness Law & Initial Stacking Sequence (un-optimized)
1            200              (0₅₀/45₅₀/90₅₀/-45₅₀)
2            190              (0₇₆/45₃₈/90₃₈/-45₃₈)
3            176              (0₆₂/45₄₄/90₂₆/-45₄₄)
4            166              (0₈₄/45₂₄/90₃₄/-45₂₄)
5            216              (0₅₄/45₅₄/90₅₄/-45₅₄)
6            202              (0₈₂/45₄₀/90₄₀/-45₄₀)
7            188              (0₆₆/45₄₆/90₃₀/-45₄₆)
8            184              (0₉₂/45₂₈/90₃₆/-45₂₈)
9            216              (0₅₄/45₅₄/90₅₄/-45₅₄)
10           200              (0₈₀/45₄₀/90₄₀/-45₄₀)
11           194              (0₆₈/45₄₈/90₃₀/-45₄₈)
12           178              (0₉₀/45₂₆/90₃₆/-45₂₆)
13           160              (0₄₀/45₄₀/90₄₀/-45₄₀)
14           182              (0₇₄/45₃₆/90₃₆/-45₃₆)
15           192              (0₇₈/45₃₈/90₃₈/-45₃₈)
16           196              (0₄₄/45₄₄/90₄₄/-45₄₄)
17           168              (0₄₂/45₄₂/90₄₂/-45₄₂)
18           144              (0₃₆/45₃₆/90₃₆/-45₃₆)
19           136              (0₆₈/45₂₀/90₂₈/-45₂₀)
20           142              (0₇₀/45₂₀/90₂₈/-45₂₄)
21           96               (0₂₄/45₂₄/90₂₄/-45₂₄)
22           96               (0₄₈/45₁₄/90₂₀/-45₁₄)
22           104              (0₅₂/45₁₆/90₂₀/-45₁₆)
23           102              (0₄₂/45₂₀/90₂₀/-45₂₀)
24           104              (0₅₂/45₁₆/90₂₀/-45₁₆)
25           100              (0₄₀/45₂₀/90₂₀/-45₂₀)
26           80               (0₂₀/45₂₀/90₂₀/-45₂₀)
27           90               (0₄₆/45₁₂/90₂₀/-45₁₂)
28           90               (0₃₆/45₁₈/90₁₈/-45₁₈)
29           96               (0₄₈/45₁₄/90₂₀/-45₁₄)
30           82               (0₃₄/45₁₆/90₁₆/-45₁₆)
31           16               (0₄/45₄/90₄/-45₄)
32           28               (0₁₄/45₄/90₆/-45₄)
33           30               (0₁₂/45₆/90₆/-45₆)
34           36               (0₁₈/45₄/90₈/-45₆)
35           32               (0₁₄/45₆/90₆/-45₆)
36           14               (0₆/45₂/90₄/-45₂)
37           10               (0₄/45₂/90₂/-45₂)
38           12               (0₆/45₂/90₂/-45₂)
39           16               (0₈/45₂/90₄/-45₂)
40           8                (0₂/45₂/90₂/-45₂)
Composite Skin Problem 3 - Results
The final stacking sequences obtained for a few zones are shown in the table below.
Zone  Stacking Sequence
1     [45/ 90/-45₂/ 0/ 90/ 0₄/ 90/ 45/ 0/ 45/ 0/ 45/ 0/ 90/ 0/ 90/ 45/-45/ 0/ 90/ 45/ 0/ 90/ 45/-45/ 0/ 45/ 0/ 45/ 0/ 45/
      0/ 45/-45/ 0/ 90/ 45/ 0/ 90/ 45/-45/ 45/ 0/ 90/ 0/ 45/-45/ 0/ 90/ 45/ 0/ 45/ 0/ 45/-45/ 45/ 0/ 90/ 0/ 90/ 45/-45/
      45/ 45/-45/ 0/ 45/-45/ 45/-45/ 45/-45/ 90/-45₂/ 90₄/-45/-45/ 90/-45₃/ 90/-45₃/ 90₂/-45/ 90/-45/ 90₂]s
2     [90₂/-45/ 45₂/ 0/ 45/-45/ 45/-45/ 45/-45/ 0/ 90/ 0/ 90/ 0/ 90/ 45/-45/ 0/ 45/-45/ 0/ 90/ 0₂/ 90/ 45/ 0/ 45/-45/ 45/
      -45/ 45₄/ 0/ 45/-45/ 45/ 0/ 90/ 0₂/ 45/-45/ 45/-45/ 0/-45₂/ 0/ 90/ 0₂/ 90/-45/-45/ 0₂/ 90/ 0₄/ 90/ 0₄/ 90/ 0/-45/ 0/
      90/ 0/ 90/ 0/ 90/ 0/ 90/ 0₄/ 90/ 0/-45₃/ 90]s
3     [90/-45/ 90/ 90/-45/ 0/ 45/ 0/ 90/ 45/-45/ 45₂/-45/ 0/ 90/ 0/ 90/ 45/-45/ 0/ 45/-45/ 0/ 45/-45/ 0/ 45/ 0/ 45/-45/
      0/ 45/ 0/ 90/ 45/ 0/ 90/ 45/-45/ 0/ 90/ 45/-45/ 45₂/ 0/ 90/ 45/ 0/ 45/-45/ 45/ 0/ 90/ 0/ 45/-45/ 45₃/-45/ 90/ 0₃/
      -45/ 0/ 90/ 0₃/ 0/-45/ 0/-45/ 0₃/-45/ 0/-45₃/ 0/-45₂/ 0₂]s
6     [-45₂/ 45/ 0/ 90/ 45/-45/ 0/ 90₂/-45/ 45₄/ 0/ 45/ 0/ 45/ 0/ 45/ 0/ 90/ 45/-45/ 0/ 45/ 0/ 45/-45/ 0/ 45/ 0/ 90/ 0/
      90/ 45/ 0/ 45/-45/ 45₃/-45/ 45/-45/ 45/-45/ 90/ 0/ 90/-45/ 0/ 90/ 0/ 90/ 0/-45₂/ 0₃/ 90/ 0/ 90/ 0/ 90/ 0/-45/ 0/
      90/-45/ 0/ 90/ 0/ 90/ 0₃/ 90/ 0/ 90/ 0/-45/ 0/ 90/ 0₂/ 90/-45/ 0₂/-45₂/ 0₄/-45/ 0₂]s
Performance benefit
CPU Alone (Seconds)  CPU+GPU (Seconds)  Performance benefit
300                  37                 8.1 times
Conclusions
• A genetic algorithm based stacking sequence generation approach has been presented, which can be used to solve large-scale composite skin generation problems in the commercial aircraft industry.
• The approach is scalable and has been successfully demonstrated on large-scale stacking sequence generation problems. Three important composite skin stacking sequence generation problems have been solved using the current approach. All the stacking sequence rules are satisfied in the final results.
• Results demonstrate that use of a GPGPU gives a “speed-up of up to 8 times” (in the stacking sequence generation domain) compared to computation using only the CPU.
Future work
• Further investigation is needed on how inter-zonal harmonization can be brought into the genetic algorithm based generation framework.
• There can be more than one ply material and more than four orientations; formulating these into the model will increase its complexity.
The contents of this document are proprietary and confidential to Infosys Limited and may not be disclosed in
whole or in part at any time, to any third party without the prior written consent of Infosys Limited.
© 2011 Infosys Limited. All rights reserved. Copyright in the whole and any part of this document belongs to
Infosys Limited. This work may not be used, sold, transferred, adapted, abridged, copied or reproduced in
whole or in part, in any manner or form, or in any media, without the prior written consent of Infosys Limited.