Software and Hardware Implementation of Cellular Automata for Structural Analysis and Design
Zafer Gürdal* & Mark T. Jones**
Virginia Tech
* Depts. of Aerospace and Ocean Eng., & Engineering Science and Mechanics
** The Bradley Department of Electrical and Computer Engineering
06/17/03 National Institute of Aerospace, Hampton VA
Support
NASA LaRC, NRA 98, Innovative Algorithms for Aerospace Engineering Analysis and Optimization, PM: Jarek Sobieski
NASA LaRC, Mechanics and Durability Branch, PM: Damodar Ambur
Virginia Tech, ASPIRES Program
CA Software Hardware Implementation
June 17, 2003
Outline
Introduction
– Evolutionary Design
– Elements of Cellular Automata
CA Applied to Engineering Design
– Truss Domain
– Composite Laminate Design
Hardware Implementation
– Configurable Computing: FPGAs
– CA Implementation
– Results
Multigrid Acceleration
Evolutionary Design
Mimic the natural evolution of biological systems for structural design
Evolutionary design often relies on local optimality and decision making by independent parts
Examples: reaction wood, bone growth
Cellular Automata: decomposition of a seemingly complex macro behavior into small, basic local problems
Evolutionary Design of Structures
Evolutionary Design
– Genetic Algorithms (species)
– ESO, MMD, CA (individual designs)
Cellular Automata: local evolution of analysis and design
ESO, MMD: local rules for design, global analysis
Cellular Automata
Wiener (1946), Ulam (1952), von Neumann (1966)
– Automata networks
– Cell dynamic scheme
Idealizations of complex natural systems
– Flock behavior
– Diffusion in gaseous systems
– Solidification and crystal growth
– Hydrodynamic flow and turbulence
General characteristics
– Locality
– Vast parallelism
– Simplicity
Elements of Cellular Automata
Cell definitions
Lattice configurations
Neighborhoods
Boundaries
Update rules
Iteration schemes
Elements of Cellular Automata
Definition of the state of a cell and the update rule:
$C^{(t+1)} = \varphi\left[C^{(t)}, N^{(t)}\right]$
where $t$ is the time step, $C$ is the center cell (cell ID), and $N$ denotes the neighborhood cells
Two-dimensional lattice configurations: rectangular, triangular, hexagonal
Neighborhood Definition
Rectangular neighborhoods:
– von Neumann: N, S, E, W
– Moore: N, S, E, W, NE, NW, SE, SW
– MvonN: N, S, E, W, NE, NW, SE, SW, NN, SS, EE, WW
Boundaries: periodic, location specific
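The following is a minimal Python sketch of these elements. It assumes a rectangular lattice stored as a list of rows, with the row index increasing toward the south, and implements periodic boundaries by index wrap-around. The offset tables mirror the three neighborhoods listed above. `step` applies a user-supplied rule φ synchronously, so every new state is computed from the previous lattice. The names `neighbors`, `step`, and the offset dictionaries are illustrative and do not come from the slides.

```python
# (row, column) offsets on a rectangular lattice; row index grows toward S (assumption).
VON_NEUMANN = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
MOORE = {**VON_NEUMANN, "NE": (-1, 1), "NW": (-1, -1), "SE": (1, 1), "SW": (1, -1)}
MVONN = {**MOORE, "NN": (-2, 0), "SS": (2, 0), "EE": (0, 2), "WW": (0, -2)}


def neighbors(grid, i, j, offsets):
    """States of the neighborhood of cell (i, j) with periodic (wrap-around) boundaries."""
    rows, cols = len(grid), len(grid[0])
    return {name: grid[(i + di) % rows][(j + dj) % cols]
            for name, (di, dj) in offsets.items()}


def step(grid, rule, offsets):
    """One synchronous CA iteration: C(t+1) = rule(C(t), N(t)) for every cell."""
    return [[rule(grid[i][j], neighbors(grid, i, j, offsets))
             for j in range(len(grid[0]))]
            for i in range(len(grid))]
```

For example, `step(grid, lambda c, n: sum(n.values()) / len(n), MOORE)` replaces each cell by the mean of its eight Moore neighbors. A location-specific boundary would replace the modulo wrap with an explicit rule for edge cells.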
Update Rule – 2D Truss Domain Analysis
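$u_C^{(t+1)} = \varphi\left[u_C^{(t)}, u_N^{(t)}\right]$
Cell state: $\varphi_C^{(t)} = \{\{u, v\}, \{\text{Material}\}, \{\text{Sizing Variables}\}, \{f_x, f_y\}\}$
Ground structure: a single cell $C$ (displacements $u_C$, $v_C$) is connected by bars to its neighbors N, S, E, W, NE, NW, SE, SW (e.g., neighbor SE with displacements $u_{SE}$, $v_{SE}$)
Displacement update:
$\sum_{k=1}^{8} \frac{EA_k}{l_k} c_k \left[ c_k (u_k - u_C) + s_k (v_k - v_C) \right] + f_x = 0$
$\sum_{k=1}^{8} \frac{EA_k}{l_k} s_k \left[ c_k (u_k - u_C) + s_k (v_k - v_C) \right] + f_y = 0$
where $c_k$ and $s_k$ are the cosine and sine of the orientation of bar $k$, and $l_k$ is its length.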
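Both equilibrium equations are linear in $u_C$ and $v_C$, so the displacement update reduces to solving a 2×2 system per cell. The Python sketch below assumes each bar is described by a hypothetical tuple `(EA, length, angle, u_k, v_k)`, with the angle measured from the x axis toward neighbor k. It is an illustration, not the authors' code.

```python
import math


def truss_cell_update(bars, fx, fy):
    """Solve the two equilibrium equations of cell C for (u_C, v_C).

    bars: list of (EA, length, angle, u_k, v_k) for each bar joining C to neighbor k,
    angle in radians from the x axis toward k (hypothetical data layout).
    """
    kxx = kxy = kyy = 0.0
    rx, ry = fx, fy
    for EA, length, angle, uk, vk in bars:
        c, s = math.cos(angle), math.sin(angle)
        k = EA / length
        kxx += k * c * c                 # coefficients of u_C and v_C
        kxy += k * c * s
        kyy += k * s * s
        along = c * uk + s * vk          # neighbor displacement along the bar axis
        rx += k * c * along              # terms that depend only on neighbors and loads
        ry += k * s * along
    det = kxx * kyy - kxy * kxy
    return (kyy * rx - kxy * ry) / det, (kxx * ry - kxy * rx) / det
```

A quick sanity check is a rigid translation: if all eight neighbors move by the same (u, v) and no load acts, the cell moves by exactly the same amount.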
Sample Truss Analysis Results
[Figure panels: Undeformed, CA Analysis, FEM Analysis; applied force or displacement]
• Linear Analysis
• Nonlinear Analysis
Linear vs. Nonlinear Analysis
[Plots: total reaction vs. # of iterations, linear and nonlinear analysis; annotated: 1641 iterations, 2985 iterations]
Sizing/Design Rules
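Cell state: $\varphi_C^{(t)} = \{\{u, v\}, \{E\}, \{A_1, \ldots, A_8\}, \{f_x, f_y\}\}$, with bar stresses $\sigma_k = E \varepsilon_k$
Design rules: local optimization formulation; sequential move and size; fully stressed design
Fully stressed design: $A_k^{(t+1)} = A_k^{(t)} \, \frac{\sigma_k^{(t)}}{\sigma_{all}}$
Example: geometry and basic ground structure (CDF = 1) with 75 kN and 100 kN loads and 40 m and 60 m dimensions; dense truss solution (CDF = 40)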
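The fully stressed design rule is a purely local resizing step, which is why it fits the CA setting. The Python sketch below assumes the bar strains of a cell are already available from the displacement update. Two choices are assumptions not stated on the slide. First, the stress magnitude is used, so compression bars are resized like tension bars. Second, the minimum area `a_min` is a hypothetical safeguard that keeps bars from vanishing.

```python
def fully_stressed_resize(areas, strains, E, sigma_all, a_min=1e-6):
    """One fully stressed design step for the bars of a single cell (sketch)."""
    new_areas = []
    for A, eps in zip(areas, strains):
        sigma = E * eps                       # sigma_k = E * eps_k
        A_new = A * abs(sigma) / sigma_all    # A_k(t+1) = A_k(t) * |sigma_k| / sigma_all
        new_areas.append(max(A_new, a_min))   # hypothetical minimum-gauge bound
    return new_areas
```

Alternating the displacement update with this resizing is intended to drive each surviving bar toward $|\sigma_k| = \sigma_{all}$.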
Design of Fiber Reinforced Panels
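Minimum compliance design: $\min_{\theta(x,y)} \; \mathbf{f}^{T}\mathbf{u}$
where $\theta(x,y)$ is the fiber angle distribution
Principal strain direction: $\theta_p = \frac{1}{2}\arctan\left[\frac{2\varepsilon_{xy}}{\varepsilon_x - \varepsilon_y}\right]$
Minimum strain energy density (Pedersen 1990): piecewise optimal angle $\theta^*$ (an ArcCos expression with three cases)
Axes: panel axes x, y; material axes X, Y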
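In the CA setting, each cell can evaluate the principal strain direction from its own strains. The Python sketch below assumes tensor shear strain $\varepsilon_{xy}$ (half the engineering shear strain $\gamma_{xy}$). It uses `atan2` instead of a plain arctangent, so the returned angle points to the maximum principal strain and the case $\varepsilon_x = \varepsilon_y$ needs no special handling. Pedersen's piecewise optimum is not reproduced here.

```python
import math


def principal_strains(eps_x, eps_y, eps_xy):
    """Return (theta_p, eps_1, eps_2) for a 2D strain state (sketch).

    theta_p: angle in radians from the x axis to the maximum principal strain eps_1.
    eps_xy: tensor shear strain, i.e. gamma_xy / 2.
    """
    theta_p = 0.5 * math.atan2(2.0 * eps_xy, eps_x - eps_y)
    mean = 0.5 * (eps_x + eps_y)
    radius = math.hypot(0.5 * (eps_x - eps_y), eps_xy)   # Mohr's circle radius
    return theta_p, mean + radius, mean - radius
```

For pure shear the principal direction is 45°, which is the classical orientation for fibers carrying a shear load.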
Panel with a Circular Hole in Shear
Optimality Criteria (OC) Design
Quarter Panel Model (figure labels: 20 kN/m shear load, 0.5)
Panel with a Circular Hole in Shear
Pattern Matching + OC Design
Pattern Matching + Discrete Design
Panel with a Circular Hole in Shear
Topology + Orientation Design
Topology + Discrete Fiber Orientation
Hardware Integration
Domain modeled ≡ hardware domain
Current parallel architectures are limited
Specialized CA machines mimic CA domains
Configurable Computing and Field Programmable Gate Arrays (FPGAs)
Definitions and Potential
Configurable computers are a relatively new class of computer architecture in which hardware circuits are (re-)configured for a specific algorithm
Offer “ASIC-like” speeds without the cost of designing and fabricating a chip
– ASIC costs can run into many millions of dollars
– General-purpose CPUs are slow by comparison
Configurable computers are often built using FPGAs because of their widespread availability (>>$1B market)
Field Programmable Gate Array (FPGA) Layout
An FPGA consists of a large array of Configurable Logic Blocks (CLBs) - typically 1,000 to 8,000 CLBs per chip
Each CLB contains registers and look-up tables (LUTs); each LUT can implement any 4-input logic function
By programming the CLBs and the interconnections, large circuits can be represented in the FPGA
One Xilinx XC2V4000 FPGA can represent a circuit of up to 1M gates
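A LUT can implement any 4-input function because it is simply a 16-entry truth table addressed by the four inputs. The small Python model below illustrates that idea. The helper names and bit ordering are illustrative assumptions, not Xilinx's configuration format.

```python
def make_lut(func):
    """Return a 16-bit truth table for a 4-input Boolean function.

    Bit `idx` of the table holds func(a, b, c, d), where a is bit 0 of idx,
    b bit 1, c bit 2 and d bit 3 (an arbitrary, illustrative ordering).
    """
    table = 0
    for idx in range(16):
        a, b, c, d = ((idx >> i) & 1 for i in range(4))
        if func(a, b, c, d):
            table |= 1 << idx
    return table


def lut_eval(table, a, b, c, d):
    """Evaluate the LUT: the four inputs simply form the address of one stored bit."""
    idx = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> idx) & 1
```

Reconfiguring the FPGA amounts, at this level, to loading different 16-bit tables and different routing between them.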
DINI DN3000k10 Board
DINI DN3000k10 is an FPGA-based PCI card
Contains five Xilinx XC2V4000 FPGAs connected by a 226-bit-wide bus
One of the FPGAs has a separate connection for communicating with a PC via the PCI bus
FPGAs can be configured through the PCI bus or configurations can be stored on board
Algorithms for FPGAs
Target FPGA strengths: parallel, pipelined, customized
– Goal is to have every part of the chip actively computing at the highest possible clock speed
Do: re-think the algorithm to
– Expose the natural parallelism
– Pipeline time-consuming operations
– Examine the precision that is really necessary
Do not: implement algorithms as you would in software on a traditional computer
Multiplier Options
[Plot: array multiplier speed (maximum frequency, MHz) and usage (% of CLBs*) vs. precision (4, 8, 16, 32 bits)]
*Percentage of CLBs used in an XC2V4000; the XC2V4000 contains 5760 CLBs
Application Performance
HokieGene – Genome Matching Project (2003)
– Matching engine executes on one FPGA (XC2V1000)
– Performs 200 billion cell updates per second
– 1,200 billion operations per second (1.2 TOPS)
BYU – Network Intrusion Detection Systems (2002)
– Hardware implementation uses one FPGA (XC2V1000)
– Outperformed the software version running on a 750 MHz Pentium III:
• Up to 400 times more throughput than the software version
• Up to 1000 times less latency than the software version
Xilinx – High Performance DES Encryption (2000)
– Implemented on one small FPGA (XCV150)
– Maximum throughput 10.75 GB/sec
– Outperformed the best ASIC implementation
University of Texas at Austin – Target Recognition System (2000)
– System built using one FPGA (ORCA 40k) and Myrinet interfacing
– Capable of processing 900 templates per second
– 2,800 billion operations per second (2.8 TOPS)
Iterative Methods for Linear Systems
Consider Jacobi’s method
– $D x_{i+1} = (D - A) x_i + b$
– In software, we would select either single- or double-precision floating point
On a configurable computer, we can select any format in which to store and compute values
– Choose the desired precision of the solution
– Reconstruct the method for fast computation
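For reference, here is a plain floating-point version of this iteration in Python, with $D$ the diagonal of $A$. It is a sketch: convergence assumes a condition such as diagonal dominance, which the slide does not discuss.

```python
def jacobi(A, b, iters=100):
    """Jacobi iteration D x_{i+1} = (D - A) x_i + b, with D = diag(A) (sketch)."""
    size = len(b)
    x = [0.0] * size
    for _ in range(iters):
        # (D - A) x_i keeps only the off-diagonal terms, with a minus sign
        x = [(b[r] - sum(A[r][c] * x[c] for c in range(size) if c != r)) / A[r][r]
             for r in range(size)]
    return x
```

Every row update depends only on the previous iterate, so all rows can be computed in parallel. This is the same locality that the CA update exploits.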
Iterative Methods Continued
Re-cast as an iterative improvement scheme:
• $r_i = b - A x_i$ (compute in n bits)
• $\Delta x_i = A^{-1} r_i$ (compute in k bits)
• $x_{i+1} = x_i + \Delta x_i$ (compute in n bits)
Use Jacobi to solve for $\Delta x_i$ in compact, fast k-bit hardware (cost ~ bits²)
Thm: The convergence rate is independent of k
Thm: The optimal choice of k is $k \sim n / (\#\,\text{iterations})^{1/3}$
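The scheme can be imitated in software to show why a low-precision inner solve does not limit the final accuracy. In the Python sketch below, the k-bit datapath is modeled by scaling each residual to unit magnitude and rounding every stored inner value to k fractional bits. This model is a hypothetical simplification. The residual and the update stay in ordinary double precision, standing in for the n-bit operations. The sweep and iteration counts are illustrative choices.

```python
def quantize(x, k):
    """Round x to k fractional bits: a stand-in for a k-bit fixed-point register (hypothetical)."""
    scale = 1 << k
    return round(x * scale) / scale


def low_precision_jacobi(A, r, k, sweeps):
    """Approximate dx = A^{-1} r with a few Jacobi sweeps on k-bit quantized values."""
    size = len(r)
    s = max(abs(v) for v in r)
    if s == 0.0:
        return [0.0] * size
    rs = [quantize(v / s, k) for v in r]          # residual scaled to [-1, 1], then quantized
    dx = [0.0] * size
    for _ in range(sweeps):
        dx = [quantize((rs[i] - sum(A[i][j] * dx[j] for j in range(size) if j != i)) / A[i][i], k)
              for i in range(size)]
    return [s * v for v in dx]                    # undo the scaling


def refine(A, b, k=8, sweeps=5, outer=40):
    """Iterative improvement: residual and update in double precision, inner solve in k bits."""
    size = len(b)
    x = [0.0] * size
    for _ in range(outer):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(size)) for i in range(size)]  # n-bit step
        dx = low_precision_jacobi(A, r, k, sweeps)                                  # k-bit step
        x = [x[i] + dx[i] for i in range(size)]                                     # n-bit step
    return x
```

Each correction only has to be accurate relative to the current residual, so a coarse inner solve still reduces the error by a fixed factor per outer iteration.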
Convergence
[Plot: solution error vs. number of iterations for k = 3, 6, 9 decimal digits]
No difference in convergence rate
Performance Advantage
Execution Cost (number of bit operations) vs. the size of the matrix
Compares cost of normal vs. modified algorithm
Convergence for each algorithm is identical
Euler Beam Formulation
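[Figure: beam along x under load F, divided into cells spaced h apart; control volume around center cell C with neighbors L and R, nodal deflections and rotations $w_L, \theta_L$, $w_C, \theta_C$, $w_R, \theta_R$, forces and moments $F_L, M_L$, $F_C, M_C$, $F_R, M_R$, and beam depth d(x)]
Cell equilibrium:
$K_C u_C = f_g + f_C$
where
$u_C = \begin{Bmatrix} w_C \\ h\theta_C \end{Bmatrix}, \quad f_C = \begin{Bmatrix} F_C \\ M_C/h \end{Bmatrix}$
$f_g = \frac{EI^*_L}{h^3} \begin{Bmatrix} 12 w_L + 6 h\theta_L \\ -6 w_L - 2 h\theta_L \end{Bmatrix} + \frac{EI^*_R}{h^3} \begin{Bmatrix} 12 w_R - 6 h\theta_R \\ 6 w_R - 2 h\theta_R \end{Bmatrix}$
$K_C = \frac{1}{h^3} \begin{bmatrix} 12 (EI^*_L + EI^*_R) & 6 (EI^*_R - EI^*_L) \\ 6 (EI^*_R - EI^*_L) & 4 (EI^*_L + EI^*_R) \end{bmatrix}$
Panels: cell neighborhood; cell equilibrium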
Cellular Automata Model
Multiple Cells per Processing Element
Beam Design
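Equilibrium update:
– Residual: $r = f_g + f_C - K_C u_C$
– Error: $e = K_C^{-1} r$
– Correction: $u_C^{k+1} = u_C^{k} + e$
Design update: new depth d computed from the local bending moment M (parameters α, E, γ)
Flow: equilibrium update → converged? NO: repeat the equilibrium update; YES: design update → converged? NO: return to the equilibrium update; YES: end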
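The equilibrium-update branch of this flowchart (residual, error, correction) can be sketched in Python for a cantilever. The sketch rests on several assumptions:
– It uses the standard Euler-Bernoulli element stiffness with cell unknowns $(w_C, h\theta_C)$ and loads $(F_C, M_C/h)$.
– It treats a missing neighbor as a zero-stiffness element.
– It sweeps the cells in place (Gauss-Seidel style); the slides do not say which ordering the hardware uses.
– It omits the design update.
The function names are hypothetical.

```python
def cell_update(w, th, EI, h, F, M, c):
    """Residual/correction update of cell c (hypothetical helper).

    w, th: nodal deflections and rotations; EI[e]: bending stiffness of the element
    between cells e and e+1; F, M: applied nodal force and moment.
    Returns the size of the correction e.
    """
    n = len(w)
    EIL = EI[c - 1] if c > 0 else 0.0      # missing neighbor -> zero stiffness
    EIR = EI[c] if c < n - 1 else 0.0
    g1 = g2 = 0.0                          # f_g: coupling to neighbor displacements
    if c > 0:
        a = EIL / h**3
        g1 += a * (12 * w[c - 1] + 6 * h * th[c - 1])
        g2 += a * (-6 * w[c - 1] - 2 * h * th[c - 1])
    if c < n - 1:
        a = EIR / h**3
        g1 += a * (12 * w[c + 1] - 6 * h * th[c + 1])
        g2 += a * (6 * w[c + 1] - 2 * h * th[c + 1])
    k11 = 12 * (EIL + EIR) / h**3          # cell stiffness K_C in (w, h*theta)
    k12 = 6 * (EIR - EIL) / h**3
    k22 = 4 * (EIL + EIR) / h**3
    u1, u2 = w[c], h * th[c]
    r1 = g1 + F[c] - (k11 * u1 + k12 * u2)         # residual r = f_g + f_C - K_C u_C
    r2 = g2 + M[c] / h - (k12 * u1 + k22 * u2)
    det = k11 * k22 - k12 * k12
    e1 = (k22 * r1 - k12 * r2) / det               # error e = K_C^{-1} r
    e2 = (k11 * r2 - k12 * r1) / det
    w[c] = u1 + e1                                 # correction u <- u + e
    th[c] = (u2 + e2) / h
    return max(abs(e1), abs(e2))


def solve_cantilever(n_elem=4, L=1.0, EI_val=1.0, P=1.0, tol=1e-13, max_sweeps=200000):
    """Cantilever clamped at cell 0 with tip force P; sweep until corrections < tol."""
    n = n_elem + 1
    h = L / n_elem
    w, th = [0.0] * n, [0.0] * n
    EI = [EI_val] * n_elem
    F, M = [0.0] * n, [0.0] * n
    F[-1] = P
    for _ in range(max_sweeps):
        corr = 0.0
        for c in range(1, n):              # cell 0 is clamped (w = theta = 0)
            corr = max(corr, cell_update(w, th, EI, h, F, M, c))
        if corr < tol:
            break
    return w, th
```

For a tip load P on a uniform cantilever, the converged tip deflection should match the beam-theory value $PL^3/(3EI)$, because cubic beam elements are exact at the nodes for point loads.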
Algorithm Strategy
The limited-precision algorithm illustrated for Jacobi’s method earlier is applied to CA
– Much smaller, faster circuits for applying CA rule updates in k-bit operations
– Built-in 18x18 multipliers compute the residual
Built-in high-speed memories provide
– Storage for intermediate and permanent quantities
– Many customizable word lengths
– Extremely high memory bandwidth
Processing Element
FPGA Performance
[Plot: cell updates per second (millions) vs. number of processing elements (9 to 20) for the 8-bit and 16-bit models]
CA Performance
[Plot: total number of cell updates vs. number of cells (10 to 45)]
Multigrid Acceleration
[Figure: beam under load F on lattices h, 2h, 4h, and 8h, with V-cycle and W-cycle schedules]
E: equilibrium update to convergence
S: equilibrium updated α times
Restriction (on r); prolongation (on e)
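Prolongation
Fine-lattice node coinciding with coarse node i:
$w_{2i}^{h} = w_i^{2h}, \qquad h\theta_{2i}^{h} = \frac{1}{2}\left(2h\theta_i^{2h}\right)$
Fine-lattice node midway between coarse nodes i and i+1:
$w_{2i+1}^{h} = \frac{1}{2} w_i^{2h} + \frac{1}{8}\left(2h\theta_i^{2h}\right) + \frac{1}{2} w_{i+1}^{2h} - \frac{1}{8}\left(2h\theta_{i+1}^{2h}\right)$
$h\theta_{2i+1}^{h} = -\frac{3}{4} w_i^{2h} - \frac{1}{8}\left(2h\theta_i^{2h}\right) + \frac{3}{4} w_{i+1}^{2h} - \frac{1}{8}\left(2h\theta_{i+1}^{2h}\right)$
[Figure: lattice 2h nodes $(w_i^{2h}, 2h\theta_i^{2h})$ and $(w_{i+1}^{2h}, 2h\theta_{i+1}^{2h})$ above lattice h nodes $(w_{2i}^{h}, h\theta_{2i}^{h})$, $(w_{2i+1}^{h}, h\theta_{2i+1}^{h})$, $(w_{2i+2}^{h}, h\theta_{2i+2}^{h})$]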
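Prolongation/Restriction
Prolongation operator, column block for coarse node i (columns $w_i^{2h}$, $2h\theta_i^{2h}$; rows $w$, $h\theta$ at fine nodes 2i−1, 2i, 2i+1):
$I_{2h}^{h} = \begin{bmatrix} 1/2 & -1/8 \\ 3/4 & -1/8 \\ 1 & 0 \\ 0 & 1/2 \\ 1/2 & 1/8 \\ -3/4 & -1/8 \end{bmatrix}$
Correction prolongation: $e^{h} = I_{2h}^{h} e^{2h}$
Residual restriction: $r^{2h} = I_{h}^{2h} r^{h}$, where $I_{h}^{2h} = \left(I_{2h}^{h}\right)^{T}$
[Figure: lattice 2h above lattice h, nodes labeled as in the prolongation formulas]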
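The prolongation coefficients (1, 1/2, 1/8, 3/4) are those of cubic Hermite interpolation across a coarse cell, so the operator reproduces any cubic deflection exactly, which makes it easy to check. The Python sketch below works directly in the scaled unknowns: pairs $(w, 2h\theta)$ on lattice 2h map to pairs $(w, h\theta)$ on lattice h. It builds the restriction as the transpose. The function names are illustrative.

```python
def prolong(coarse):
    """coarse: list of (w_i, 2h*theta_i) on lattice 2h -> list of (w, h*theta) on lattice h."""
    fine = []
    for i, (a, A) in enumerate(coarse):
        fine.append((a, 0.5 * A))                          # coincident fine node 2i
        if i + 1 < len(coarse):
            b, B = coarse[i + 1]
            fine.append((0.5 * a + A / 8 + 0.5 * b - B / 8,       # midpoint w
                         -0.75 * a - A / 8 + 0.75 * b - B / 8))   # midpoint h*theta
    return fine


def restrict(fine_r):
    """Residual restriction r^{2h} = (I_{2h}^h)^T r^h, acting on (w, theta) pairs."""
    m = (len(fine_r) - 1) // 2
    coarse = []
    for i in range(m + 1):
        rw, rt = fine_r[2 * i]
        cw, ct = rw, 0.5 * rt
        if i > 0:                          # fine node 2i-1: coarse node i is its right end
            pw, pt = fine_r[2 * i - 1]
            cw += 0.5 * pw + 0.75 * pt
            ct += -pw / 8 - pt / 8
        if i < m:                          # fine node 2i+1: coarse node i is its left end
            nw, nt = fine_r[2 * i + 1]
            cw += 0.5 * nw - 0.75 * nt
            ct += nw / 8 - nt / 8
        coarse.append((cw, ct))
    return coarse
```

With coarse nodes at x = 0, 1, 2 (so 2h = 1) and $w = x^3$, `prolong` returns the exact fine values at x = 0.5 and 1.5. Using the transpose for restriction keeps the coarse correction consistent with the fine-lattice energy.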
Nested Iteration for MG Accelerated CA
[Plots: A/Ao vs. x/L for designs with 3, 5, 17, 65, and 257 cells; beam depth d(x)]
CA Design Performance with Full MG
[Plot: total number of cell updates (log scale, 10^0 to 10^8) vs. number of cells (log scale, 1 to 1000)]
Concluding Remarks
Summary
– The CA paradigm has been demonstrated for various structural systems
– The CA paradigm matches well with configurable computing acceleration
– Full multigrid acceleration for CA improves design convergence
Future Work
– Expand the design capabilities in terms of structural details and the types of field problems that can be solved
– Tools that will enable engineers to effortlessly use configurable computers for CA applications
– Continue to investigate algorithms to improve CA performance