Software and Hardware Implementation of Cellular Automata for Structural Analysis and Design

Zafer Gürdal* & Mark T. Jones**

Virginia Tech

* Depts. of Aerospace and Ocean Engineering, and Engineering Science and Mechanics

** The Bradley Department of Electrical and Computer Engineering

06/17/03 National Institute of Aerospace, Hampton VA

Support

NASA LaRC, NRA 98, Innovative Algorithms for Aerospace Engineering Analysis and Optimization, PM: Jarek Sobieski

NASA LaRC, Mechanics and Durability Branch, PM: Damodar Ambur

Virginia Tech, ASPIRES Program

CA Software Hardware Implementation

June 17, 2003

Outline

• Introduction
  – Evolutionary Design
  – Elements of Cellular Automata
• CA Applied to Engineering Design
  – Truss Domain
  – Composite Laminate Design
• Hardware Implementation
  – Configurable Computing – FPGAs
  – CA Implementation
  – Results
• Multigrid Acceleration



Evolutionary Design

• Mimic natural evolution of biological systems for structural design
• Evolutionary design often relies on local optimality/decision making of independent parts
• Examples: reaction wood, bone growth
• Cellular Automata: decomposition of a seemingly complex macro behavior into basic small local problems



Evolutionary Design of Structures

Evolutionary Design
– Genetic Algorithms: evolution of species
– ESO, MMD, CA: evolution of individual designs
   – Cellular Automata: local evolution of analysis and design
   – ESO, MMD: local rules for design, global analysis



Cellular Automata

• Wiener (1946), Ulam (1952), von Neumann (1966)
  – Automata networks
  – Cell dynamic scheme
• Idealizations of complex natural systems
  – Flock behavior
  – Diffusion of gaseous systems
  – Solidification and crystal growth
  – Hydrodynamic flow and turbulence
• General characteristics
  – Locality
  – Vast parallelism
  – Simplicity



Elements of Cellular Automata

• Cell definitions
• Lattice configurations
• Neighborhoods
• Boundaries
• Update rules
• Iteration schemes



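Elements of Cellular Automata

Definition for the state of a cell and the update rule:

$C_c^{(t+1)} = \Phi\left[C_c^{(t)}, N_c^{(t)}\right]$

where t is the time step, c is the cell ID, $C_c$ is the state of the center cell, and $N_c$ collects the states of its neighborhood cells.

Two-dimensional lattice configurations: rectangular, triangular, hexagonal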
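The update rule is applied to every cell at once: each new state $C_c^{(t+1)}$ depends only on time-t states. A minimal sketch of one synchronous step, assuming a one-dimensional ring of scalar states and a hypothetical diffusion-like rule `diffusion_rule` (neither the names nor the rule come from the slides):

```python
from typing import Callable, List, Sequence

# Hypothetical rule signature: Phi(center_state, neighbor_states) -> new_state
Rule = Callable[[float, Sequence[float]], float]


def ca_step(cells: List[float], rule: Rule) -> List[float]:
    """One synchronous update C_c(t+1) = Phi[C_c(t), N_c(t)] on a 1D ring.

    Every new state is computed from the time-t snapshot `cells`, so the
    result does not depend on the order in which cells are visited.
    """
    n = len(cells)
    new_cells = []
    for c in range(n):
        # Left and right neighbors, with periodic (wrap-around) indexing
        neighbors = (cells[(c - 1) % n], cells[(c + 1) % n])
        new_cells.append(rule(cells[c], neighbors))
    return new_cells


def diffusion_rule(center: float, neighbors: Sequence[float]) -> float:
    """Illustrative local rule: move halfway toward the neighborhood mean."""
    return 0.5 * center + 0.5 * sum(neighbors) / len(neighbors)


state = [0.0, 0.0, 1.0, 0.0, 0.0]
for _ in range(3):
    state = ca_step(state, diffusion_rule)
```

Because this particular rule only redistributes values between neighbors, `sum(state)` stays at 1.0 from step to step; writing into a fresh list rather than updating `cells` in place is what makes the step synchronous.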



Neighborhood Definition

Rectangular neighborhoods:
– von Neumann: N, S, E, W
– Moore: N, S, E, W, NE, NW, SE, SW
– MvonN: the Moore neighbors plus NN, SS, EE, WW

Boundaries: periodic, location specific
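The three rectangular neighborhoods differ only in their offset sets, and the boundary type decides what an off-lattice neighbor returns. A minimal sketch, assuming (row, column) offsets with north as row - 1, and assuming that a location-specific boundary can be modeled by a fixed `boundary_value` (a simplification; real location-specific boundaries may vary per cell):

```python
from typing import Dict, List, Optional, Tuple

Offset = Tuple[int, int]  # (row offset, column offset); north is row - 1

VON_NEUMANN: Dict[str, Offset] = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
MOORE: Dict[str, Offset] = {**VON_NEUMANN,
                            "NE": (-1, 1), "NW": (-1, -1), "SE": (1, 1), "SW": (1, -1)}
# MvonN: Moore neighbors plus the second ring along the two axes
MVONN: Dict[str, Offset] = {**MOORE,
                            "NN": (-2, 0), "SS": (2, 0), "EE": (0, 2), "WW": (0, -2)}


def neighbors(grid: List[List[float]], r: int, c: int,
              stencil: Dict[str, Offset],
              boundary_value: Optional[float] = None) -> Dict[str, float]:
    """Collect the neighbor states of cell (r, c).

    boundary_value=None: periodic boundaries (indices wrap around).
    Otherwise off-lattice neighbors take this fixed value, an assumed
    stand-in for a location-specific boundary condition.
    """
    rows, cols = len(grid), len(grid[0])
    out: Dict[str, float] = {}
    for name, (dr, dc) in stencil.items():
        rr, cc = r + dr, c + dc
        if boundary_value is None:
            out[name] = grid[rr % rows][cc % cols]
        elif 0 <= rr < rows and 0 <= cc < cols:
            out[name] = grid[rr][cc]
        else:
            out[name] = boundary_value
    return out
```

For a corner cell such as (0, 0), the periodic option pulls N and W from the opposite edges of the lattice, while the fixed-value option replaces them with `boundary_value`.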



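Update Rule – 2D Truss Domain Analysis

$\mathbf{u}_C^{(t+1)} = \Phi\left[\mathbf{u}_C^{(t)}, \mathbf{u}_N^{(t)}\right]$

Cell state: $\phi_C^{(t)} = \{\{u, v\}, \{\text{Material}, \text{Sizing Variables}\}, \{f_x, f_y\}\}$

Ground structure: a single cell C connected by members to its eight neighbors N, NE, E, SE, S, SW, W, NW; every cell carries displacements (u, v), e.g., $(u_{SE}, v_{SE})$ for the SE neighbor.

Displacement update (equilibrium of cell C in x and y, summed over the k = 1, ..., 8 members joining C to its neighbors; $A_k$ and $l_k$ are the member area and length, and $c_k$, $s_k$ are the cosine and sine of the member orientation):

$\sum_{k=1}^{8} \frac{E A_k}{l_k}\, c_k\left[(u_k - u_C)\,c_k + (v_k - v_C)\,s_k\right] + f_x = 0$

$\sum_{k=1}^{8} \frac{E A_k}{l_k}\, s_k\left[(u_k - u_C)\,c_k + (v_k - v_C)\,s_k\right] + f_y = 0$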
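With all neighbor displacements frozen at step t, the two equilibrium equations are linear in $(u_C, v_C)$, so the displacement update reduces to a 2 × 2 solve per cell. A minimal sketch, assuming each member is described by a hypothetical `Member` record (area, orientation angle from C toward neighbor k, length, neighbor displacements) and a single modulus E for the cell:

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Member:
    """One of the (up to eight) bars joining the center cell C to a neighbor k."""
    area: float    # A_k
    angle: float   # orientation of the bar from C toward neighbor k (radians)
    length: float  # l_k
    u: float       # neighbor displacement u_k
    v: float       # neighbor displacement v_k


def truss_cell_update(E: float, members: List[Member],
                      fx: float, fy: float) -> Tuple[float, float]:
    """Return (u_C, v_C) satisfying both cell equilibrium equations,
    with the neighbor displacements held at their current values."""
    kxx = kxy = kyy = 0.0
    rx, ry = fx, fy
    for m in members:
        c, s = math.cos(m.angle), math.sin(m.angle)
        a = E * m.area / m.length      # axial stiffness E A_k / l_k
        kxx += a * c * c
        kxy += a * c * s
        kyy += a * s * s
        proj = c * m.u + s * m.v       # neighbor displacement along the bar
        rx += a * c * proj
        ry += a * s * proj
    det = kxx * kyy - kxy * kxy
    if abs(det) <= 1e-12 * (kxx + kyy) ** 2:
        raise ValueError("cell is a mechanism: members do not restrain both directions")
    # Cramer's rule for [[kxx, kxy], [kxy, kyy]] (u_C, v_C) = (rx, ry)
    return (rx * kyy - kxy * ry) / det, (kxx * ry - kxy * rx) / det
```

For one horizontal and one vertical bar to fixed neighbors (E = 200 GPa, A = 1e-4 m², l = 1 m) and fx = 1000 N, the cell moves u_C = fx / (EA/l) = 5e-5 m, as hand calculation of the first equation gives.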



Sample Truss Analysis Results

[Figure: undeformed truss, CA analysis, and FEM analysis under an applied force or displacement]

• Linear analysis

• Nonlinear analysis



Linear vs. Nonlinear Analysis

[Figure: total reaction vs. number of iterations for linear and nonlinear analysis; linear analysis: 1641 iterations, nonlinear analysis: 2985 iterations]



Sizing/Design Rules

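Cell state: $\phi_C^{(t)} = \{\{u, v\}, \{E, A_1, \ldots, A_8\}, \{f_x, f_y\}\}$

Local optimization formulation: sequential move and size

Fully stressed design: $\sigma_k = E\,\varepsilon_k$, $\quad A_k^{(t+1)} = A_k^{(t)}\,\frac{\sigma_k^{(t)}}{\sigma_{all}}$

Example: geometry and basic ground structure (CDF = 1) with 75 kN and 100 kN loads and 40 m and 60 m dimensions; dense truss solution (CDF = 40)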
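The fully stressed rule rescales each member area by the ratio of its current stress to the allowable stress. A minimal sketch for the eight members of one cell, assuming that the stress magnitude is used (so tension and compression are treated alike) and assuming a hypothetical lower bound `a_min` that keeps unloaded members from vanishing; neither detail is stated on the slide:

```python
from typing import List


def fully_stressed_resize(areas: List[float], stresses: List[float],
                          sigma_allow: float, a_min: float = 1e-6) -> List[float]:
    """One fully stressed design step for the members of a cell:
    A_k(t+1) = A_k(t) * |sigma_k(t)| / sigma_allow, clipped below at a_min.

    The absolute value and the a_min floor are assumptions of this sketch.
    """
    return [max(a * abs(s) / sigma_allow, a_min) for a, s in zip(areas, stresses)]
```

If a member force $N_k = \sigma_k A_k$ does not change during the resize (a statically determinate truss), one step brings that member exactly to the allowable stress; in an indeterminate ground structure the forces redistribute, which is why sizing alternates with the displacement update.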



Design of Fiber Reinforced Panels

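Minimum compliance design:

$\min_{\theta(x,y)} \; \mathbf{f}^{T}\mathbf{u}$

where θ(x, y) is the fiber angle distribution.

Minimum strain energy density (Pedersen 1990): the locally optimal fiber angle θ*(x, y) follows a three-case rule (one case given by an arccos expression) built on the principal strain direction

$\theta_p = \frac{1}{2}\arctan\left[\frac{2\,\varepsilon_{xy}}{\varepsilon_x - \varepsilon_y}\right]$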


Panel with a Circular Hole in Shear

Optimality Criteria (OC) Design

20 kN/m

0.5

Quarter Panel Model




Pattern Matching + OC Design

Pattern Matching + Discrete Design




Topology + Orientation Design

Topology + Discrete Fiber Orientation



Hardware Integration

• Domain modeled ≡ hardware domain
• Current parallel architectures are limited
• Specialized CA machines mimic CA domains



Configurable Computing and Field Programmable Gate Arrays (FPGAs)



Definitions and Potential

• Configurable computers are a relatively new class of computer architecture in which hardware circuits are (re)configured for a specific algorithm
• They offer ASIC-like speeds without the cost of designing and fabricating a chip
  – ASIC costs can run into many millions of dollars
  – General-purpose CPUs are slow
• Configurable computers are often built using FPGAs because of their widespread availability (well over $1B market)



Field Programmable Gate Array (FPGA) Layout

An FPGA consists of a large array of Configurable Logic Blocks (CLBs) - typically 1,000 to 8,000 CLBs per chip

Each CLB contains registers and look-up tables (LUTs); each LUT can implement any 4-input logic function

By programming the CLBs and interconnections, large circuits can be represented in the FPGA

One Xilinx XC2V4000 FPGA can represent a circuit up to 1M gates
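A 4-input LUT is a 16-entry truth table: configuring it stores one output bit per input combination, and evaluating it uses the four inputs as an address. A minimal behavioral sketch (the 16-bit packing below is an illustrative assumption, not the bitstream format used by Xilinx tools):

```python
from typing import Callable


def make_lut4(func: Callable[[int, int, int, int], int]) -> int:
    """'Configure' a 4-input LUT: tabulate func over all 16 input
    combinations and pack the outputs into a 16-bit configuration word."""
    config = 0
    for index in range(16):
        a, b, c, d = (index >> 3) & 1, (index >> 2) & 1, (index >> 1) & 1, index & 1
        if func(a, b, c, d) & 1:
            config |= 1 << index
    return config


def eval_lut4(config: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate the LUT: the four inputs simply select one stored bit."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return (config >> index) & 1


# Example: the sum bit of a full adder (a XOR b XOR c); input d is unused
xor3 = make_lut4(lambda a, b, c, d: a ^ b ^ c)
```

Because any Boolean function of four inputs is just a different 16-bit word, the same physical LUT can be reprogrammed for a new algorithm without changing the silicon.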



DINI DN3000k10 Board

DINI DN3000k10 is an FPGA based PCI card

Contains five Xilinx XC2V4000 FPGAs connected by a 226-bit-wide bus

One of the FPGAs has a separate connection for communicating with a PC via the PCI bus

FPGAs can be configured through the PCI bus or configurations can be stored on board



Algorithms for FPGAs

• Target FPGA strengths: parallel, pipelined, customized
  – Goal is to have every part of the chip actively computing at the highest possible clock speed
• Do: re-think the algorithm to
  – Expose the natural parallelism
  – Pipeline time-consuming operations
  – Examine the precision that is really necessary
• Do not: implement algorithms as you would in software on a traditional computer



Multiplier Options

[Figure: array multiplier maximum frequency (MHz) and usage (% of CLBs*) vs. precision (4, 8, 16, 32 bits)]

*Percentage of CLBs used in an XC2V4000; the XC2V4000 contains 5,760 CLBs



Application Performance

• HokieGene – Genome Matching Project (2003)
  – Matching engine executes on one FPGA (XC2V1000)
  – Performs 200 billion cell updates per second
  – 1,200 billion operations per second (1.2 TOPS)
• BYU – Network Intrusion Detection Systems (2002)
  – Hardware implementation uses one FPGA (XC2V1000)
  – Outperformed the software version running on a 750 MHz Pentium III:
    • Up to 400 times more throughput than the software version
    • Up to 1000 times less latency than the software version
• Xilinx – High Performance DES Encryption (2000)
  – Implemented on one small FPGA (XCV150)
  – Maximum throughput 10.75 GB/sec
  – Outperformed the best ASIC implementation
• University of Texas at Austin – Target Recognition System (2000)
  – System built using one FPGA (ORCA 40k) and Myrinet interfacing
  – Capable of processing 900 templates per second
  – 2,800 billion operations per second (2.8 TOPS)



Iterative Methods for Linear Systems

• Consider Jacobi's method
  – $D\mathbf{x}_{i+1} = (D - A)\,\mathbf{x}_i + \mathbf{b}$, where D is the diagonal of A
  – In software, we would select either single- or double-precision floating point
• On a configurable computer, we can select any format in which to store and compute values
  – Choose the desired precision of the solution
  – Reconstruct the method for fast computation
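To make the "any format" point concrete, the Jacobi iteration can store each iterate in a chosen number format. A minimal sketch, assuming a simple fixed-point format with `frac_bits` fractional bits as a stand-in for a custom hardware word (the format and helper names are hypothetical):

```python
from typing import List, Optional

Matrix = List[List[float]]


def quantize(x: float, frac_bits: Optional[int]) -> float:
    """Round x to a fixed-point grid with `frac_bits` fractional bits
    (None keeps ordinary double precision)."""
    if frac_bits is None:
        return x
    scale = 1 << frac_bits
    return round(x * scale) / scale


def jacobi(A: Matrix, b: List[float], iters: int,
           frac_bits: Optional[int] = None) -> List[float]:
    """Jacobi iteration D x_{i+1} = (D - A) x_i + b, storing every
    iterate in the chosen number format."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x_new = []
        for r in range(n):
            # Row r of (D - A) x_i is minus the off-diagonal part of A x_i
            s = b[r] - sum(A[r][c] * x[c] for c in range(n) if c != r)
            x_new.append(quantize(s / A[r][r], frac_bits))
        x = x_new
    return x
```

With A = [[4, 1], [1, 3]] and b = [1, 2], double precision reaches the exact solution (1/11, 7/11) to rounding, while an 8-bit fractional format stalls within a few thousandths of it: the stored format, not the iteration count, limits the attainable accuracy.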



Iterative Methods Continued

• Re-cast as an iterative improvement scheme
  – $\mathbf{r}_i = \mathbf{b} - A\mathbf{x}_i$ (compute in n bits)
  – $\Delta\mathbf{x}_i = A^{-1}\mathbf{r}_i$ (compute in k bits)
  – $\mathbf{x}_{i+1} = \mathbf{x}_i + \Delta\mathbf{x}_i$ (compute in n bits)
• Use Jacobi to solve for $\Delta\mathbf{x}_i$ in compact, fast k-bit hardware (cost ~ bits²)
• Thm: Convergence rate is independent of k
• Thm: Optimal choice of $k \sim n/(\#\,\text{iterations})^{1/3}$
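The improvement scheme keeps the residual and the accumulated solution in full precision and delegates only the correction solve to cheap k-bit arithmetic. A minimal sketch, assuming double precision plays the role of the n-bit format, assuming the residual is scaled to unit size before the k-bit solve (a block floating-point choice of this sketch, not stated on the slides), and using a fixed, small number of Jacobi sweeps; it illustrates the scheme but does not demonstrate the two theorems:

```python
from typing import List

Matrix = List[List[float]]


def _q(x: float, bits: int) -> float:
    # Fixed-point rounding with `bits` fractional bits (stand-in for k-bit hardware)
    s = 1 << bits
    return round(x * s) / s


def low_precision_jacobi(A: Matrix, r: List[float], sweeps: int, bits: int) -> List[float]:
    """Approximate A^{-1} r with a few Jacobi sweeps stored in k-bit fixed point."""
    n = len(r)
    d = [0.0] * n
    for _ in range(sweeps):
        d = [_q((r[i] - sum(A[i][j] * d[j] for j in range(n) if j != i)) / A[i][i], bits)
             for i in range(n)]
    return d


def refine(A: Matrix, b: List[float], outer: int,
           sweeps: int = 4, bits: int = 10) -> List[float]:
    n = len(b)
    x = [0.0] * n
    for _ in range(outer):
        # r_i = b - A x_i in full (here: double) precision
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        scale = max(abs(v) for v in r)
        if scale == 0.0:
            break
        # dx_i ~ A^{-1} r_i, computed on the unit-scaled residual in k bits
        dx = low_precision_jacobi(A, [v / scale for v in r], sweeps, bits)
        # x_{i+1} = x_i + dx_i in full precision
        x = [x[i] + scale * dx[i] for i in range(n)]
    return x
```

Each low-precision correction is only accurate to a few digits, yet because every outer step recomputes the residual in full precision, the errors do not accumulate and the iterate approaches the double-precision solution.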



Convergence

[Figure: solution error vs. number of iterations for k = 3, 6, and 9 decimal digits]

No difference in convergence rate



Performance Advantage

Execution Cost (number of bit operations) vs. the size of the matrix

Compares cost of normal vs. modified algorithm

Convergence for each algorithm is identical



Euler Beam Formulation

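[Figure: beam along x under load F, divided into cells of spacing h; cell C and its left and right neighbors L and R carry deflections and rotations $(w_L, \theta_L)$, $(w_C, \theta_C)$, $(w_R, \theta_R)$; the control volume around C is loaded by $F_C$, $M_C$ and by the end forces and moments $F_L$, $M_L$, $F_R$, $M_R$ from the adjacent segments; d(x) is the design variable distribution]

Cell equilibrium:

$\mathbf{K}_C\,\mathbf{u}_C = \mathbf{f}_C + \mathbf{f}_g, \qquad \mathbf{u}_C = \begin{Bmatrix} w_C \\ \theta_C \end{Bmatrix}, \quad \mathbf{f}_C = \begin{Bmatrix} F_C \\ M_C \end{Bmatrix}$

where

$\mathbf{K}_C = \frac{1}{h^3}\begin{bmatrix} 12\,(EI_L^* + EI_R^*) & 6h\,(EI_R^* - EI_L^*) \\ 6h\,(EI_R^* - EI_L^*) & 4h^2\,(EI_L^* + EI_R^*) \end{bmatrix}$

Cell neighborhood contribution:

$\mathbf{f}_g = \frac{EI_L^*}{h^3}\begin{Bmatrix} 12\,w_L + 6h\,\theta_L \\ -6h\,w_L - 2h^2\,\theta_L \end{Bmatrix} + \frac{EI_R^*}{h^3}\begin{Bmatrix} 12\,w_R - 6h\,\theta_R \\ 6h\,w_R - 2h^2\,\theta_R \end{Bmatrix}$

with $EI_L^*$ and $EI_R^*$ the bending stiffnesses of the segments joining C to L and to R (signs follow the Euler–Bernoulli beam element stiffness with θ = dw/dx).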



Cellular Automata Model: Multiple Cells per Processing Element



Beam Design

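Equilibrium update (cell C, iteration k):
– Residual: $\mathbf{r}_C = \mathbf{f}_g + \mathbf{f}_C - \mathbf{K}_C\,\mathbf{u}_C^{k}$
– Error: $\mathbf{e} = \mathbf{K}_C^{-1}\,\mathbf{r}_C$
– Correction: $\mathbf{u}_C^{k+1} = \mathbf{u}_C^{k} + \mathbf{e}$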

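Design update: $d = \left(\frac{\alpha M^2}{E\gamma}\right)^{1/4}$

Flow: Equilibrium Update → converged? NO: repeat the equilibrium update; YES: Design Update → converged? NO: return to the Equilibrium Update; YES: End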
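The equilibrium update (residual, error, correction) can be exercised on a cantilever, where the exact answer is known. A minimal sketch, assuming uniform EI, a clamped cell 0, a tip load at the last cell, and $\mathbf{K}_C$ and $\mathbf{f}_g$ assembled from the standard Euler–Bernoulli beam element stiffness (the sign convention is an assumption). Cells are corrected in place, one after another (Gauss–Seidel ordering), which is guaranteed to converge for this symmetric positive definite system; the ordering used on the FPGA may differ:

```python
from typing import List, Tuple


def beam_cell_update(i: int, w: List[float], th: List[float], EI_seg: List[float],
                     h: float, F: List[float], M: List[float]) -> Tuple[float, float]:
    """Residual-correction step for cell i; returns e = K_C^{-1} r_C."""
    n = len(w) - 1
    EL = EI_seg[i - 1]                   # segment between cells i-1 and i
    ER = EI_seg[i] if i < n else 0.0     # no right segment at the free end
    h2, h3 = h * h, h ** 3
    # K_C: stiffness of cell i against its own (w_C, theta_C)
    k11 = 12.0 * (EL + ER) / h3
    k12 = 6.0 * h * (ER - EL) / h3
    k22 = 4.0 * h2 * (EL + ER) / h3
    # f_g: forces transmitted from the neighbors' current (w, theta)
    fg1 = EL / h3 * (12.0 * w[i - 1] + 6.0 * h * th[i - 1])
    fg2 = EL / h3 * (-6.0 * h * w[i - 1] - 2.0 * h2 * th[i - 1])
    if i < n:
        fg1 += ER / h3 * (12.0 * w[i + 1] - 6.0 * h * th[i + 1])
        fg2 += ER / h3 * (6.0 * h * w[i + 1] - 2.0 * h2 * th[i + 1])
    # Residual r_C = f_g + f_C - K_C u_C
    r1 = fg1 + F[i] - (k11 * w[i] + k12 * th[i])
    r2 = fg2 + M[i] - (k12 * w[i] + k22 * th[i])
    det = k11 * k22 - k12 * k12
    return (k22 * r1 - k12 * r2) / det, (k11 * r2 - k12 * r1) / det


def cantilever_ca(n: int, length: float, EI: float, tip_load: float, tol: float = 1e-12,
                  max_sweeps: int = 200_000) -> Tuple[List[float], List[float], int]:
    h = length / n
    w, th = [0.0] * (n + 1), [0.0] * (n + 1)
    EI_seg = [EI] * n
    F, M = [0.0] * (n + 1), [0.0] * (n + 1)
    F[n] = tip_load
    for sweep in range(1, max_sweeps + 1):
        largest = 0.0
        for i in range(1, n + 1):        # cell 0 is clamped: w = theta = 0
            e1, e2 = beam_cell_update(i, w, th, EI_seg, h, F, M)
            w[i] += e1                   # u_C(k+1) = u_C(k) + e
            th[i] += e2
            largest = max(largest, abs(e1), abs(e2))
        if largest < tol:
            return w, th, sweep
    raise RuntimeError("CA iteration did not converge")


w, th, sweeps = cantilever_ca(4, 1.0, 1.0, 1.0)
```

The converged tip values match the beam-theory results PL³/(3EI) = 1/3 and PL²/(2EI) = 1/2. The number of sweeps grows quickly as cells are added, which is the slow convergence that multigrid acceleration targets.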



Algorithm Strategy

• The limited-precision algorithm illustrated for Jacobi's method earlier is applied to CA
  – Much smaller, faster circuits for applying CA rule updates in k-bit operations
  – Built-in 18 × 18 multipliers compute the residual
• Built-in high-speed memories provide
  – Storage for intermediate and permanent quantities
  – Many customizable word lengths
  – Extremely high memory bandwidth



Processing Element



FPGA Performance

[Figure: cell updates per second (millions) vs. number of processing elements (9 to 20) for the 8-bit and 16-bit models]



CA Performance

[Figure: total number of cell updates vs. number of cells (10 to 45)]
ates



Multigrid Acceleration

[Figure: beam under load F modeled on lattices h, 2h, 4h, and 8h]

E: equilibrium update to convergence

S: equilibrium updated α times

↓: restriction (on r); ↑: prolongation (on e)

V-cycle: S(h) → S(2h) → S(4h) → E(8h) → S(4h) → S(2h) → S(h)

W-cycle: S(h) → S(2h) → S(4h) → E(8h) → S(4h) → E(8h) → S(4h) → S(2h) → S(4h) → E(8h) → S(4h) → E(8h) → S(4h) → S(2h) → S(h)
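The V-cycle schedule (smooth, restrict the residual, solve on the coarsest lattice, prolongate the error, smooth) can be sketched recursively. The slides apply it to the beam CA; this sketch instead swaps in the textbook 1D Poisson model problem -u'' = f with u = 0 at both ends, weighted Jacobi smoothing (ω = 2/3), full-weighting restriction, and linear prolongation. All of these are assumptions chosen for brevity, not the authors' beam operators:

```python
import math
from typing import List

Vec = List[float]


def smooth(u: Vec, f: Vec, h: float, sweeps: int, omega: float = 2.0 / 3.0) -> Vec:
    """'S' step: weighted Jacobi sweeps for -u'' = f with u = 0 at both ends."""
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            jac = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jac
        u = new
    return u


def residual(u: Vec, f: Vec, h: float) -> Vec:
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r: Vec) -> Vec:
    """Full weighting from lattice h to lattice 2h (restriction on r)."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc


def prolong(ec: Vec) -> Vec:
    """Linear interpolation from lattice 2h to lattice h (prolongation on e)."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for j in range(nc + 1):
        e[2 * j] = ec[j]
    for j in range(nc):
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return e


def v_cycle(u: Vec, f: Vec, h: float, nu: int = 2) -> Vec:
    if len(u) == 3:                       # coarsest lattice ('E'): one unknown, solve exactly
        return [0.0, 0.5 * h * h * f[1], 0.0]
    u = smooth(u, f, h, nu)               # pre-smoothing ('S')
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, nu)
    u = [a + b for a, b in zip(u, prolong(ec))]
    return smooth(u, f, h, nu)            # post-smoothing ('S')


N = 64
h = 1.0 / N
xs = [i * h for i in range(N + 1)]
f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]   # exact solution: sin(pi x)
u = [0.0] * (N + 1)
r0 = max(abs(v) for v in residual(u, f, h))
for _ in range(15):
    u = v_cycle(u, f, h)
```

Each cycle cuts the residual by a roughly constant factor independent of N, whereas plain smoothing on the finest lattice slows down as the lattice is refined.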



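Prolongation (lattice 2h → lattice h; fine node 2i coincides with coarse node i)

Coincident nodes:

$w^h_{2i} = w^{2h}_i, \qquad h\,\theta^h_{2i} = \frac{1}{2}\left(2h\,\theta^{2h}_i\right)$

Nodes midway between coarse nodes i and i+1:

$w^h_{2i+1} = \frac{1}{2}\,w^{2h}_i + \frac{1}{8}\left(2h\,\theta^{2h}_i\right) + \frac{1}{2}\,w^{2h}_{i+1} - \frac{1}{8}\left(2h\,\theta^{2h}_{i+1}\right)$

$h\,\theta^h_{2i+1} = -\frac{3}{4}\,w^{2h}_i - \frac{1}{8}\left(2h\,\theta^{2h}_i\right) + \frac{3}{4}\,w^{2h}_{i+1} - \frac{1}{8}\left(2h\,\theta^{2h}_{i+1}\right)$

(These coefficients reproduce cubic Hermite interpolation of w over each coarse segment.)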



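Prolongation/Restriction

Prolongation operator: contribution of coarse node i, with variables $\left(w^{2h}_i,\; 2h\,\theta^{2h}_i\right)$, to fine nodes 2i-1, 2i, 2i+1, with variables $\left(w^h,\; h\,\theta^h\right)$:

$$I_{2h}^{h}\Big|_{i} = \begin{bmatrix} 1/2 & -1/8 \\ 3/4 & -1/8 \\ 1 & 0 \\ 0 & 1/2 \\ 1/2 & 1/8 \\ -3/4 & -1/8 \end{bmatrix}$$

Correction prolongation: $\mathbf{e}^{h} = I_{2h}^{h}\,\mathbf{e}^{2h}$

Residual restriction: $\mathbf{r}^{2h} = I_{h}^{2h}\,\mathbf{r}^{h}$, where $I_{h}^{2h} = \left(I_{2h}^{h}\right)^{T}$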
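The prolongation coefficients (1/2, 1/8, 3/4) act on deflections and scaled rotations, so their correctness is easy to check: interpolating from lattice 2h to lattice h must reproduce any cubic deflection exactly. A minimal sketch, assuming the signs come from cubic Hermite interpolation with θ = dw/dx (an assumption about the slides' convention) and using a hypothetical helper name `prolong_beam`:

```python
from typing import List, Tuple


def prolong_beam(w2h: List[float], th2h: List[float],
                 h: float) -> Tuple[List[float], List[float]]:
    """Prolongate (w, theta) from lattice 2h to lattice h.

    Rotations are handled in scaled form (2h*theta on the coarse lattice,
    h*theta on the fine one), matching the 1/2, 1/8, 3/4 coefficients;
    signs follow cubic Hermite interpolation (assumed convention).
    """
    nc = len(w2h) - 1
    wh = [0.0] * (2 * nc + 1)
    thh = [0.0] * (2 * nc + 1)
    for i in range(nc + 1):
        wh[2 * i] = w2h[i]
        # h*theta^h = (1/2)(2h*theta^2h), i.e. the rotation itself is copied
        thh[2 * i] = th2h[i]
    for i in range(nc):
        a, b = 2 * h * th2h[i], 2 * h * th2h[i + 1]   # scaled coarse rotations
        wh[2 * i + 1] = 0.5 * w2h[i] + a / 8 + 0.5 * w2h[i + 1] - b / 8
        thh[2 * i + 1] = (-0.75 * w2h[i] - a / 8 + 0.75 * w2h[i + 1] - b / 8) / h
    return wh, thh
```

For example, sampling w = x³ and θ = 3x² on a coarse lattice with spacing 0.25 and prolongating with h = 0.125 yields fine-lattice values equal to x³ and 3x² at every fine node.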



Nested Iteration for MG Accelerated CA

[Figure: design d(x), plotted as A/Ao vs. x/L, refined by nested iteration from a design with 5 cells up to a design with 257 cells]



CA Design Performance with Full MG

[Figure: total number of cell updates (10^0 to 10^8) vs. number of cells (1 to 1000), log-log axes]



Concluding Remarks

Summary
– CA paradigm has been demonstrated for various structural systems
– CA paradigm matches well with configurable computing acceleration
– Full multigrid acceleration for CA improves design convergence

Future Work
– Expand the design capabilities in terms of structural details and the types of field problems that can be solved
– Tools that will enable engineers to effortlessly use configurable computers for CA applications
– Continue to investigate algorithms to improve CA performance
