Efficient Reinforced Concrete Design Using Modified Linear Elastic Finite Element
Analysis and its GPU Implementation
by
Xing TAN
BEng
This thesis is presented for the degree of
Doctor of Philosophy
of
The University of Western Australia
School of Civil and Resource Engineering
May 2012
Supervisor: Professor Andrew J. Deeks
Abstract
Although the strut-and-tie approach is a rational and reasonable method for the design of
non-flexural members in concrete structures, it may lead to suboptimal designs, as much of
the material present in the member is neglected. Other difficulties, such as the time
consumed and the dependence of the solutions on the individual designer, have also been
encountered in its implementation. To avoid these problems, design may be undertaken using
conventional linear elastic finite element analysis (LEFEA), which can yield more efficient designs with
less material usage. However, the conventional linear elastic finite element method is also
inefficient when the non-flexural members contain stress singularities, such as occur in a deep
beam with square or rectangular web openings. These singularities produce local stress fields
that always violate the yield criterion. This thesis proposes a modified linear
elastic finite element method which can successfully remove the stress singularities by
adjusting the elastic modulus in certain regions. This new approach involves stress
redistribution in terms of both compressive stress and tensile stress. Three different types of
beams, namely shallow beams, deep beams and deep beams with rectangular openings are
used to demonstrate its efficiency. Additionally, both the conventional strut-and-tie method
and the conventional LEFEA method are performed for comparison. Results show that the
modified linear elastic finite element approach to design (MLEFEA) is efficient, as it can
overcome some of the inefficiencies involved in both the conventional strut-and-tie design
approach and
the conventional linear elastic finite element design approach. Furthermore, to verify its safety,
the performance of the designs resulting from the new method is assessed through non-linear
finite element analysis using ABAQUS; the results indicate that MLEFEA is safe and can
be used as a design approach.
In order to make the MLEFEA analysis more efficient in terms of computing time, this thesis
also describes the implementation of the method on Graphics Processing Units (GPUs). GPUs
are now widely used in a variety of scientific computing applications due to their high
memory bandwidth and massively parallel computational capacity. This thesis applies GPUs to
the stress redistribution process arising
from the analysis of deep beams with rectangular openings. The basic process of stress
redistribution and the GPU architecture are first introduced, then several parallel techniques
for iterative methods are reviewed. Finally, the preconditioned conjugate gradient (PCG)
method is chosen as the most suitable approach for the current application. This is followed
by an introduction to the compressed sparse row (CSR) storage format and the sparse
matrix-vector multiplication (SpMV) algorithm. The GPU-PCG method used for solving the
systems of equations
is then described, and the stiffness matrix assembly in CSR format is also presented.
Finally, the efficiency of the GPU implementation is demonstrated by comparing the speed of
the GPU-based and sequential CPU-based stress redistribution algorithms for the example of a
deep beam with web openings.
Acknowledgement
I owe my sincere and deepest gratitude to my supervisor Professor Andrew J. Deeks for his
generous help and guidance in this study. With great patience and enthusiasm, he always
provided invaluable discussion and diligent support throughout my whole study. This thesis
would certainly not have been completed without him.
Additionally, I deeply appreciate his effort in giving me the unique opportunity to pursue my
PhD with him and in providing me with a scholarship during my time at both the University of
Western Australia and Durham University in the UK. What I learned from this remarkable
experience with him will benefit me greatly throughout my future life.
Finally, my unconditional love goes to my parents and my brothers, whose heartfelt
encouragement and never-ending support gave me strength all along the way. Without their
love and support, I could not have achieved anything.
Statement of Candidate Contribution
I certify that except where references are made in the text to the work of others, the contents
of this thesis are original and have not been submitted to any other university.
This thesis is the result of my own work.
Xing Tan
May 2012
Table of Contents
Abstract ............................................................................................................................... i
Acknowledgement .............................................................................................................. v
Statement of Candidate Contribution ................................................................................ vii
Table of Contents ............................................................................................................... ix
List of Figures ................................................................................................................... xiii
List of Tables .................................................................................................................... xvii
List of Appendices ............................................................................................................. xix
1 Introduction ................................................................................................................ 1
1.1 Background and Motivation for This Work ................................................................ 1
1.2 Outline of This Thesis ................................................................................................. 2
PART I: EFFICIENT REINFORCED CONCRETE DESIGN METHOD ............................................... 4
2 Basic Theory and Literature Review ............................................................................. 5
2.1 Structural Design Methods ......................................................................................... 5
2.1.1 Working Stress Method ................................................................................ 6
2.1.2 Ultimate Load Method .................................................................................. 7
2.1.3 Limit State Method ....................................................................................... 8
2.2 Design of Reinforced Concrete Structures ............................................................... 10
2.2.1 Equivalent Stress Block Method ................................................................. 10
2.2.2 Strut and Tie Method .................................................................................. 13
2.2.3 Linear Elastic Finite Element Method ......................................................... 19
2.3 Stress Singularities .................................................................................................... 22
2.4 The Finite Element Method ...................................................................................... 25
2.5 Summary ................................................................................................................... 30
3 Comparison of Conventional Design Approaches with LEFEA-based Design ................. 32
3.1 Design of a Flexural Reinforced Concrete Beam ...................................................... 32
3.1.1 Application of Conventional Design Approach (Equivalent Stress Block Approach) .................................................................................................................. 33
3.1.2 Application of Conventional LEFEA Approach ............................................. 34
3.1.3 Cost Comparison and Remarks .................................................................... 39
3.2 Application to Design of Non-flexural Reinforced Concrete Beams without Rectangular Openings ......................................................................................................... 40
3.2.1 Application of Conventional Design Approach (STM) ................................. 40
3.2.2 Application of Conventional LEFEA Approach ............................................. 42
3.2.3 Cost Comparison and Remarks .................................................................... 47
3.3 Design of Non-flexural Reinforced Concrete Beams with Rectangular Openings .... 48
3.3.1 Application of Conventional Design Approach (STM) ................................. 49
3.3.2 Application of Conventional LEFEA Approach ............................................. 52
3.3.3 Cost Comparison and Remarks .................................................................... 56
3.4 Summary ................................................................................................................... 57
4 Modified Linear Elastic Finite Element Method ........................................................... 59
4.1 Stress Redistribution ................................................................................................. 59
4.1.1 Finite Element Implementation ................................................................... 61
4.1.2 Application to L-shaped Plate ...................................................................... 63
4.2 Summary ................................................................................................................... 70
5 Adaptive Stress Redistribution Approach ................................................................... 71
5.1 Adaptive Compressive Stress Redistribution Approach ........................................... 71
5.1.1 Application to Flexural Reinforced Concrete Beam .................................... 75
5.1.1.1 Cost Comparison and Remarks ..................................................... 79
5.1.2 Application to Non-flexural Reinforced Concrete Beams without Rectangular Openings ............................................................................................... 80
5.1.2.1 Cost Comparison and Remarks ..................................................... 83
5.1.3 Application to Non-flexural Reinforced Concrete Beams with Rectangular Openings ................................................................................................................... 84
5.1.3.1 Cost Comparison and Remarks .................................................... 86
5.2 Adaptive Tensile Stress Redistribution Approach .................................................... 87
5.3 Adaptive Stress Redistribution Approach for both Compressive and Tensile Stress 91
5.3.1 Application to Flexural Reinforced Concrete Beam .................................... 92
5.3.1.1 Cost Comparison and Remarks .................................................... 93
5.3.1.2 Nonlinear verification .................................................................. 96
5.3.2 Application to Non-flexural Reinforced Concrete Beams without Rectangular Openings .............................................................................................. 97
5.3.2.1 Cost Comparison and Remarks .................................................... 99
5.3.2.2 Non-linear Verification ............................................................... 100
5.3.3 Application to Non-flexural Reinforced Concrete Beams with Rectangular Openings ................................................................................................................. 101
5.3.3.1 Cost Comparison and Remarks .................................................. 102
5.3.3.2 Nonlinear verification ................................................................ 103
5.4 Summary ................................................................................................................. 104
PART II: EFFICIENT GRAPHICS PROCESSING UNIT (GPU) IMPLEMENTATION ............................... 106
6 Basic Theory & Literature Review ............................................................................. 107
6.1 Graphics Processing Unit (GPU) ............................................................................. 107
6.2 GPU Implementation of Finite Element Analysis ................................................... 113
6.2.1 Stiffness Matrix Assembly ......................................................................... 113
6.2.2 Stiffness Matrix Solving ............................................................................. 114
6.3 Summary ................................................................................................................. 118
7 Efficient GPU Implementation of the Modified LEFEA Approach ................................ 120
7.1 GPU Implementation of Preconditioned Conjugate Gradient Method (GPU-PCG) 120
7.2 GPU Implementation of Modified LEFEA Approach (GPU-MLEFEA) ...................... 123
7.3 Results Comparison (Speedup Results) .................................................................. 124
7.4 Summary ................................................................................................................. 125
8 Conclusions ............................................................................................................. 126
References ...................................................................................................................... 129
Appendices ..................................................................................................................... 140
List of Figures
Figure 1: Conditions at Mu in a singly reinforced concrete section (Warner 2007) ................... 11
Figure 2: Equivalent Rectangular Stress Block (Warner 2007) ................................................... 11
Figure 3: Geometry of Deep Beam ............................................................................................. 15
Figure 4: Von Mises Stress Plot for Deep Beam (Linear Elastic Analysis) ................................... 15
Figure 5: Strut-and-Tie Model for Deep Beam............................................................................ 15
Figure 6: Geometry of the Deep Beam with Rectangular Openings ........................................... 23
Figure 7: Finite Element Model (Left) and Von Mises Stress (Right) .......................................... 24
Figure 8: Isoparametric Bilinear Quadrilateral Element in Local Coordinates............................ 26
Figure 9: Geometry of Shallow Beam ......................................................................................... 32
Figure 10: Model for the Shallow Beam ..................................................................................... 36
Figure 11: Plot of Principal Compressive Stress .......................................................................... 37
Figure 12: Trapezoidal Rule for the Integration .......................................................................... 38
Figure 13: Plot of Tensile Stresses across Mid-span of Beam ..................................................... 38
Figure 14: Geometry of Deep Beam ........................................................................................... 40
Figure 15: Strut and Tie Model for Deep Beam .......................................................................... 41
Figure 16: Model for the Deep Beam ......................................................................................... 43
Figure 17: Plot of Principal Compressive Stress .......................................................................... 44
Figure 18: Plot of Principal Tensile Stress ................................................................................... 45
Figure 19: Plot of Tensile Stresses across Mid-span of Beam ..................................................... 46
Figure 20: Von Mises Stress for Deep Beam using LEFEA ........................................................... 47
Figure 21: Geometry of Deep Beam with Rectangular Openings ............................................... 49
Figure 22: Strut and Tie Model for Deep Beam with Rectangular Openings .............................. 49
Figure 23: Strut and Tie Model for Deep Beam with Rectangular Openings (Half Model) ........ 50
Figure 24: Force Equilibrium for the Applied Load ..................................................................... 50
Figure 25: STM model for Bottle Shaped Strut and Force Equilibrium (Warner 2007) .............. 51
Figure 26: Model for the Deep Beam with Rectangular Openings ............................................. 53
Figure 27: Plot of Principal Compressive Stress .......................................................................... 54
Figure 28: Plot of Principal Tensile Stress ................................................................................... 55
Figure 29: Plot of Tensile Stresses across Mid-span of Beam ..................................................... 56
Figure 30: Linear Reduction in Elastic Modulus .......................................................................... 60
Figure 31: L-shape Plate .............................................................................................................. 63
Figure 32: Von Mises Stress of L-shaped Plate (Coarse Mesh) ................................................... 64
Figure 33: Von Mises Stress of L-shaped Plate (Finer Mesh) ...................................................... 65
Figure 34: Von Mises Stress over X Direction ............................................................................. 66
Figure 35: Von Mises Stress over Y Direction ............................................................................. 66
Figure 36: Procedure for adjusting Elastic Modulus ................................................................... 67
Figure 37: Relative Value for Elastic Modulus ............................................................................ 67
Figure 38: Von Mises Stress of L-shaped Plate after Stress Redistribution ................................ 68
Figure 39: Principal Compressive Stress for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right) . 68
Figure 40: Stress in X Direction for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right) ............... 69
Figure 41: Stress in Y Direction for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right) ............... 69
Figure 42: Principal Tensile Stress for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right) .......... 70
Figure 43: Flowchart of the Practical Implementation of Adaptive Compressive Stress
Redistribution .............................................................................................................................. 72
Figure 44: Principal Compressive Stress for Shallow Beam---MLEFEA ........................................ 76
Figure 45: Difference of Principal Compressive Stress for Shallow Beam--- (LEFEA minus
MLEFEA) ....................................................................................................................................... 77
Figure 46: Stresses across Mid-span after Compressive Stress Redistribution ........................... 77
Figure 47: Relative Value of Elastic Modulus across Mid-span after Compressive Stress
Redistribution .............................................................................................................................. 78
Figure 48: Plots of Principal Compressive Stress using LEFEA with Adaptive Compressive Stress
Redistribution .............................................................................................................................. 81
Figure 49: Plots of Principal Tensile Stress using LEFEA with Adaptive Compressive Stress
Redistribution .............................................................................................................................. 81
Figure 50: Plot of Stresses at Mid-span ....................................................................................... 82
Figure 51: Stresses across Mid-span after Compressive Stress Redistribution (First and Last
Iteration) ...................................................................................................................................... 83
Figure 52: Plots of Principal Compressive Stress using adaptive compressive stress
redistribution approach ............................................................................................................... 85
Figure 53: Plots of Principal Tensile Stress using LEFEA with Adaptive Compressive Stress
Redistribution .............................................................................................................................. 85
Figure 54: Plot of Stresses at Mid-span ....................................................................................... 86
Figure 55: Flowchart of the Practical Implementation for Tensile Stress Redistribution ........... 89
Figure 56: Flowchart of the Practical Implementation for both Compressive and Tensile Stress
Redistributions ............................................................................................................................ 91
Figure 57: Stresses across Mid-span after Stress Redistribution for both Compressive and
Tensile Stresses............................................................................................................................ 92
Figure 58: Relative Value of Elastic Modulus across Mid-span after Stress Redistribution for
both Compressive and Tensile Stresses ...................................................................................... 93
Figure 59: Design Results for Shallow Beam ............................................................................... 94
Figure 60: Stress Blocks for Different Approaches ...................................................................... 96
Figure 61: Load vs. Deflection Curve for Shallow Beam .............................................................. 97
Figure 62: Stresses across Mid-span after Stress Redistribution for both Compressive and
Tensile Stresses............................................................................................................................ 98
Figure 63: Relative Elastic Modulus along Mid-span after Stress Redistribution for both
Compressive and Tensile Stresses ............................................................................................... 98
Figure 64: Design Results for Deep Beam ................................................................................. 100
Figure 65: Load vs. Deflection Curve for Deep Beam ................................................................ 101
Figure 66: Stresses across Mid-span after Stress Redistribution for both Compressive and
Tensile Stresses.......................................................................................................................... 102
Figure 67: Design Results for Deep Beam with Openings ......................................................... 103
Figure 68: Load vs. Deflection Curve for Deep Beam with Openings ........................................ 104
Figure 69: Different Design Philosophies for CPUs and GPUs ................................................... 108
Figure 70: Execution of a CUDA Program .................................................................................. 110
Figure 71: Hierarchy of CUDA Threads ...................................................................................... 111
Figure 72: Hierarchy of GPU Memory ....................................................................................... 112
Figure 73: CSR Representation for a Sparse Matrix K ............................................................... 122
Figure 74: SpMV Kernel for the Sparse Matrix in CSR Format .................................................. 122
Figure 75: GPU-PCG vs. CPU-PCG .............................................................................................. 125
List of Tables
Table 1: Cost Comparison for Shallow Beam Design .................................................................. 39
Table 2: Cost Comparison for Deep Beam Design ...................................................................... 47
Table 3: Cost Comparison for designs of Deep Beam with Openings ......................................... 57
Table 4: Stress Reduction Factors ............................................................................................... 73
Table 5: Comparison of MLEFEA with different value of ε ......................................................... 79
Table 6: Approaches Comparison for Shallow Beam .................................................................. 79
Table 7: Approaches Comparison for Deep Beam ...................................................................... 84
Table 8: Cost Comparison for designs of Deep Beam with Openings ......................................... 87
Table 9: Approaches Comparison for Shallow Beam (with Updated MLEFEA) .......................... 94
Table 10: Approaches Comparison for Deep Beam (with Updated MLEFEA) ............................ 99
Table 11: Comparison between GPU-PCG and PCG ................................................................. 124
List of Appendices
Appendix A: Complete Listing of the Program .......................................................................... 140
1 Introduction
1.1 Background and Motivation for This Work
As a popular and particularly effective building material, Portland cement concrete, typically
referred to simply as "concrete", has been widely used in construction in Australia and around
the world. It is made by mixing cement, water, fine aggregate (e.g. sand or finely crushed rock)
and coarse aggregate (e.g. gravel). In most cases, fly ash, limestone or blast furnace slag is
also used within the concrete mix.
The manufacture of cement is a major contributor to the release of carbon dioxide into the
atmosphere, with approximately 1 tonne of carbon dioxide released for each tonne of cement
produced. Concrete can therefore be a significant source of carbon emissions through its
production and delivery.
This issue has received widespread attention. For example, in Australia, to reduce the carbon
emissions due to construction activity, building owners are currently rewarded with green
stars if they include 20% fly ash or blast furnace slag in their concrete mixes.
However, this may not lead to the desired environmental outcome, as suppliers may increase
the total quantity of cement in the mix in order to meet a performance-based specification
(e.g. with respect to strength gain) while retaining 20% fly ash or slag. Since the production
of concrete is a major generator of carbon, reducing the amount of concrete in a design is a
much more effective way of reducing the environmental impact of concrete than such
arbitrary requirements.
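A back-of-envelope comparison makes this point concrete. The figure of roughly 1 tonne of CO2 per tonne of cement is the one quoted above; the 300 kg/m³ cement content, the one-for-one slag substitution and the `embodied_co2` helper are purely illustrative assumptions:

```python
# Illustrative only: compares two carbon-reduction routes for a pour.
# Assumed: ~1 t CO2 per t cement (figure quoted above) and a nominal
# cement content of 300 kg/m^3; both are rough illustrative values.

CO2_PER_T_CEMENT = 1.0   # t CO2 per t cement (approximate)
CEMENT_CONTENT = 0.3     # t cement per m^3 of concrete (assumed)

def embodied_co2(volume_m3, slag_fraction=0.0):
    """CO2 (tonnes) from the cement in a pour, assuming slag replaces
    cement one-for-one with negligible emissions of its own."""
    cement_t = volume_m3 * CEMENT_CONTENT * (1.0 - slag_fraction)
    return cement_t * CO2_PER_T_CEMENT

baseline = embodied_co2(100.0)                      # 100 m^3, plain mix
with_slag = embodied_co2(100.0, slag_fraction=0.2)  # same volume, 20% slag
leaner = embodied_co2(80.0)                         # 20% less concrete

# A supplier meeting a strength spec may add extra cement while keeping
# the 20% slag, eroding the first saving; the second cannot be offset.
```

With these assumptions both routes save 6 t of CO2 on a 100 m³ pour, but only the reduced-volume route is immune to the cement top-up described above.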
This thesis develops techniques that can effectively reduce the carbon emission during
construction activities by reducing the amount of concrete being used in structures through a
more efficient design approach based on Modified Linear Elastic Finite Element Analysis
(MLEFEA). By ensuring that concrete is used optimally, the environmental impact of concrete
building activity can be reduced to the minimum level necessary. Using
less concrete in buildings reduces the environmental impact throughout the complete lifecycle
of the building, and contributes to sustainable development.
1.2 Outline of This Thesis
This thesis starts with the preceding introduction to the overall background and motivation for
the research project. The main content of the thesis is then divided into two parts: the first
part is about the efficient reinforced concrete design approach, and the second is about the
efficient GPU implementation of this approach. In the first part, the basic theory of
reinforced concrete structural design and the literature on this topic are reviewed. The
concept of stress singularities and the finite element method are also introduced here.
In Chapter 3 a comparison between conventional design approaches and LEFEA-based design is
performed. Designs are conducted for three different types of structures: shallow beams, deep
beams and deep beams with rectangular openings. For the conventional approaches, the
equivalent stress block approach is used to design the shallow beams, while the strut-and-tie
approach is used for the deep beams. The solutions provide a general idea of the efficiency
and inefficiency of the LEFEA approach in various engineering applications. The cost comparison in
terms of concrete and steel is included.
Chapter 4 proposes a new approach (named MLEFEA) to perform the compressive stress
redistribution, using the example of an L-shaped plate. The results show that this approach can
successfully remove the stress singularity encountered at the re-entrant corner of the plate.
The details of the process required to perform the MLEFEA are presented.
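The essence of the procedure — locally reduce the elastic modulus wherever the stress criterion is violated, re-solve, and iterate until the stress field is admissible — can be illustrated on a deliberately simple surrogate. The parallel-bar model, the reduction factor `beta` and the tolerance below are illustrative assumptions, not the finite element implementation presented in the thesis:

```python
def redistribute(E, areas, force, stress_limit, beta=0.9, max_iter=200):
    """Toy stress redistribution: parallel bars of equal length share an
    axial force in proportion to stiffness, sigma_i = F*E_i / sum(E_j*A_j).
    Wherever the stress exceeds the limit, the modulus is scaled down by
    `beta` and the system is re-solved, shedding load onto the other bars."""
    E = list(E)
    for _ in range(max_iter):
        k_total = sum(e * a for e, a in zip(E, areas))
        stress = [force * e / k_total for e in E]
        over = [i for i, s in enumerate(stress) if s > stress_limit * 1.001]
        if not over:
            return E, stress          # admissible stress field reached
        for i in over:
            E[i] *= beta              # local reduction of elastic modulus
    return E, stress                  # last state if not converged

# Three unit-area bars; bar 0 is twice as stiff and initially overstressed.
E, sig = redistribute([200.0, 100.0, 100.0], [1.0, 1.0, 1.0],
                      force=240.0, stress_limit=90.0)
```

After a few iterations the overstressed bar's modulus has been reduced until every stress sits below the limit, while the total force carried is unchanged — the same mechanism by which the thesis removes singular stress peaks.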
Chapter 5 develops an adaptive MLEFEA for three different models. Additionally, both the
compressive stress redistribution and the tensile stress redistribution are introduced into the
MLEFEA and examined. At the same time, the non-linear analysis of the load capacity is
undertaken for each design using ABAQUS.
The rationale for using GPUs to perform MLEFEA is investigated in Chapter 6, which also
includes a general overview of part two. This chapter also deals with the basic theory and
literature about the GPU and its use in finite element analysis, and approaches to stiffness
matrix assembly and reduction are investigated.
Chapter 7 presents the GPU implementation of MLEFEA. To begin with, the GPU-based PCG
approach is introduced, then the GPU-based MLEFEA is developed, with several effective
optimization approaches being presented. This chapter ends with a speed comparison between
the CPU and GPU implementations.
The thesis finishes with a conclusion in Chapter 8 covering the key developments and findings
of this study. Appendices and references follow.
PART I: EFFICIENT REINFORCED CONCRETE DESIGN
METHOD
In this part, theories and literature relevant to the reinforced concrete design approaches are
reviewed first. The conventional design approaches are then investigated for three different
types of structures (beams). The basic idea of MLEFEA is proposed and is demonstrated with
an L-shaped plate application. This is followed by the introduction of an adaptive MLEFEA
approach, and the three types of beams are analysed again to demonstrate the
efficiency of this refined approach. The designs resulting from the approach are then verified
as safe and reasonable through non-linear finite element analysis.
2 Basic Theory and Literature Review
In this chapter, the basic theories involved in this work, including structural design methods,
reinforced concrete structure design methods and the stress singularities resulting from linear
elastic stress analysis of certain structural configurations, are introduced. In addition, the
literature about these theories is also reviewed. For the structural design methods, three
different design methods, namely the Working Stress Method, the Ultimate Load Method, and
the Limit State Method, are introduced separately. For the design of reinforced concrete
structures, including flexural and non-flexural members, the conventional Equivalent Stress
Block Method, the Strut and Tie Method, and the Linear Elastic Finite Element Method are
described. Finally, the steps for conducting finite element analysis of elastic structures are also
explained, as this method forms the basis for the approach taken in the work described in this
thesis.
2.1 Structural Design Methods
The fundamental objective of reinforced concrete structural design is to ensure that the
structure can achieve its intended purposes over its intended lifetime, with the following
properties being maintained (Varghese 2004; Punmia et al. 2007; Mosley et al. 2007):
Adequate Performance in terms of stability and strength: The structure should have adequate
strength to resist any overloads occurring during its life time and to perform well under service
conditions without collapse or excessive cracking.
Adequate Serviceability in terms of durability and stiffness: The structure should not exhibit
excessive deformation, and should maintain functionality over its intended lifetime, resisting
unexpected loads without great loss of stiffness.
Reasonable cost: Construction costs should be as economical as possible, while still meeting
the requirements of performance and serviceability.
With the above objectives being borne in mind, the following three methods are commonly
used to conduct the design of reinforced concrete structures.
1. Working Stress Method, also known as Modular Ratio Method;
2. Ultimate Load Method, also known as Load Factor Method;
3. Limit State Method (Cevahir et al. 2010a).
2.1.1 Working Stress Method
The Working Stress Method is a traditional design method adopted in early reinforced
concrete structural design codes (Varghese 2004; Punmia et al. 2007; Mosley et al. 2007). This
method allows for the conversion of a member constructed with different materials into an
“equivalent” section which has a homogeneous and single elastic modulus for design purposes,
which is why it is also called the “Modular Ratio Method”. In this approach, the structure is
analysed under specified design loads, and the stresses in each member are checked to ensure
a sufficient factor of safety against failure or yielding of that member. The magnitude of the
factor of safety required for a particular structural action depends upon the degree of safety
required. Calculation of these stresses usually involves idealising the structure as a collection
of beams and columns, and then calculating the stress resultants (internal axial forces, shear
forces, bending moments and torques) for every member. Elastic behaviour of the material is
assumed, and it is also assumed that plane sections remain plane. This method is very easy to
undertake and simple to understand. At the same time, designs produced by this method
generally result in relatively large structural member sections compared to designs produced by
the Ultimate Load Method, and therefore this method will give designs with comparatively
better serviceability and performance under working loads.
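The idea of the "equivalent" section can be illustrated with a short calculation. The following sketch transforms the steel into an equivalent area of concrete using the modular ratio n = Es/Ec and finds the cracked elastic section stresses; all numerical inputs (section size, steel area, moduli, service moment) are assumed example values, not data from this thesis.

```python
# Hedged sketch of the "equivalent section" idea behind the Working Stress
# (Modular Ratio) Method. The steel is replaced by n = Es/Ec times its area
# of concrete, and stresses follow from the cracked elastic section.
import math

def cracked_section_stresses(M, b, d, As, Es=200e3, Ec=25e3):
    n = Es / Ec                                  # modular ratio
    # Neutral axis depth x from first moments: b*x^2/2 = n*As*(d - x)
    a, bb, c = b / 2.0, n * As, -n * As * d
    x = (-bb + math.sqrt(bb**2 - 4 * a * c)) / (2 * a)
    I_cr = b * x**3 / 3.0 + n * As * (d - x)**2  # cracked second moment of area
    fc = M * x / I_cr                            # peak concrete stress (MPa)
    fs = n * M * (d - x) / I_cr                  # steel stress (MPa)
    return x, fc, fs

# 300 mm wide section, d = 450 mm, 1800 mm^2 of steel, service moment 100 kN.m
x, fc, fs = cracked_section_stresses(M=100e6, b=300.0, d=450.0, As=1800.0)
print(f"x = {x:.0f} mm, fc = {fc:.1f} MPa, fs = {fs:.0f} MPa")
```

In a working stress design these stresses would then be compared against permissible values, each incorporating its own factor of safety.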
However, because this method assumes that all stresses in the steel reinforcement remain in
the linear elastic range, it does not utilise the real strength of the structure and generally
results in a higher factor of safety against failure than is needed. Secondly, it generally results in
uneconomical designs when members use compressive reinforcement and when dealing with
compression members, as it will require a larger volume of compressive steel compared with
the Limit State Method. Finally, replacing the concrete and steel with a homogeneous material
for the purpose of analysis is not realistic, as the creep and non-linear behaviour of the
concrete mean that concrete does not have a definite elastic modulus.
In short, the design load applied to a structure designed in this manner will usually be far
below its actual ultimate collapse load, and so the design will be conservative. Consequently, it is
hard to obtain an optimal and economical design by using the Working Stress Method.
2.1.2 Ultimate Load Method
To overcome the shortcoming of the Working Stress Method, namely its inability to give designs
with the target factor of safety against failure, the Ultimate Load Method was introduced into
reinforced concrete design (Varghese 2004; Punmia et al. 2007; Mosley et al. 2007). This
method uses a load factor to ensure the safety of structures, taking this factor as the ratio of
the ultimate load of the structure to the working load carried by the structure. The structure is
then designed to collapse at the ultimate load. Unlike the Working Stress Method, which
considers only the elastic range of material behaviour, the Ultimate Load Method can provide
the required margin of safety against failure, since it considers the full non-linear stress-strain
relationship for both concrete and steel; experimental tests of structures designed in
this way show that the actual collapse loads are close to the design ultimate loads.
However, because this method utilises the full strength of the members, the structure sections
designed by this method may be very thin or slender, which may result in excessive cracking
and deformation under service loads and lead to a lack of serviceability. The method does not
consider the effects of creep and shrinkage in the concrete, which may exacerbate
serviceability problems.
To summarise, the Ultimate Load Method successfully ensures the safety of a structure and
results in efficient designs, while neglecting serviceability and performance under service loads.
Like the Working Stress Method, this method has been effectively superseded by the modern Limit State Method.
2.1.3 Limit State Method
As discussed above, the Working Stress Method gives reasonable serviceability and
performance but only partially utilizes the actual strength of the designed structure, while on
the other hand, the Ultimate Load Method provides the target structure strength without
considering adequate serviceability and performance under service loads. An ideal method
would take full advantage of both these two methods by guaranteeing the serviceability and
performance, while at the same time considering the target ultimate strength of structure.
As the Limit State Design approach can meet all of those requirements, it is now accepted and
widely adopted in many modern international reinforced concrete design codes,
including the Australian concrete design code AS3600 and the steel design code AS4100. There is
a substantial literature on the limit state method (Kotsovos & Pavlovic 1995; Varghese 2004;
Punmia et al. 2007). Importantly, when assessing a particular limit state of a structure by using
the Limit State Method, it is necessary to take all the variables (e.g. material strengths, load
types) into consideration. This can be achieved by using characteristic values for materials and
loads.
Limit states are specified on serviceability, ensuring that the structure performs adequately at
the expected working loads without deforming or cracking excessively, and also on strength.
The strength limit states specify loading regimes sufficiently far above expected working loads
that a structure designed to collapse under these regimes has a sufficiently low probability of
failure during its lifetime. This means the probability of occurrence of the limit state load
combinations during the lifetime of the structure is small enough that designs based on them
can be considered safe.
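A strength limit state check of this kind can be sketched in a few lines. The combination factors (1.2G + 1.5Q, in the style of AS/NZS 1170.0) and the capacity reduction factor of 0.8 below are assumptions chosen for illustration, not values taken from this thesis.

```python
# Illustrative strength limit state check: factored design action effect
# versus reduced design capacity. Factors are assumed example values.

def design_action(G, Q):
    """Factored action effect for the permanent + imposed load combination."""
    return 1.2 * G + 1.5 * Q

def design_capacity(R_ultimate, phi=0.8):
    """Design capacity: nominal ultimate capacity reduced by factor phi."""
    return phi * R_ultimate

# Example: bending moments (kN.m) at a critical section.
G, Q = 120.0, 80.0          # action effects from permanent and imposed loads
Mu = 350.0                  # nominal ultimate moment capacity of the section

Ed = design_action(G, Q)    # factored design action effect
Rd = design_capacity(Mu)    # design capacity
print(f"E_d = {Ed:.0f} kN.m, phi*R_u = {Rd:.0f} kN.m, OK = {Ed <= Rd}")
```

The characteristic values for material strength and load enter through Mu, G and Q, while the partial factors cover the variability of each.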
Obviously, there is still space to improve and optimize the techniques used within this method
so that designers can fully utilize the redundancy within structures. For example, designers can
allow some members to reach stresses which will cause significant non-linear behaviour
between the serviceability and the strength limit states.
To obtain the efficiencies possible through the limit state method, the moment redistribution
approach (Williams 2009) is widely employed in various applications of structural design,
including high-strength concrete beams (Carmo et al. 2005), reinforced concrete flexural
members (Scott & Whittle 2005) and plated reinforced concrete flexural members (Oehlers et
al. 2004b; Oehlers et al. 2004a). However, moment redistribution is most readily performed by
hand calculation for continuous beams. Moment redistribution for frames is more difficult.
Although there are some ‘tricks’ which can be used with structural analysis packages to
perform moment redistribution for frames, only a few designers are willing to use them. As
many designers do not feel confident enough to perform this type of moment redistribution,
they simply use the conventional elastic actions to perform the design. In this situation, the full
advantage possible by using the limit state design method is not obtained.
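The hand calculation for a continuous beam can be sketched briefly. The example below idealises an interior span under uniform load as fixed at both supports, reduces the elastic support moments by 20% and restores equilibrium at midspan; the span, load and redistribution level are assumptions for illustration (codes tie the permitted percentage to section ductility).

```python
# Hedged sketch of elastic moment redistribution for an interior span of a
# continuous beam under uniform load w, idealised as fixed at both supports.

def redistribute(w, L, reduction=0.20):
    """Reduce the elastic support moments and restore midspan equilibrium."""
    M_support_elastic = w * L**2 / 12.0   # elastic fixed-end moment
    M0 = w * L**2 / 8.0                   # total static (simply supported) moment
    M_support = (1.0 - reduction) * M_support_elastic
    # Equilibrium for equal support moments: M_midspan + M_support = M0.
    M_midspan = M0 - M_support
    return M_support, M_midspan

w, L = 30.0, 8.0   # kN/m, m -- assumed example values
Ms, Mm = redistribute(w, L)
# Elastic: 160 kN.m at the supports, 80 kN.m at midspan.
# After 20% redistribution: 128 kN.m at the supports, 112 kN.m at midspan,
# with the static moment wL^2/8 = 240 kN.m preserved.
```

The support sections are designed for the reduced moment and the midspan section for the increased one, relying on ductile hinge behaviour at the supports.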
Nowadays, all modern codes of design of structures are based on limit state analysis, where
two principal theorems of limit analysis, namely the lower bound theorem and the upper
bound theorem, are used (Koopman & Lance 1965; Sloan 1989; Frier & Damkilde 2009). The
lower bound theorem states that if a load is in equilibrium with an internal stress distribution
where no stress exceeds the local value of plastic stress, this load is equal to or less than, in
other words is a lower bound of, the true plastic limit load. The upper bound theorem states
that if a load is computed on the basis of an assumed kinematically admissible collapse
mechanism, this load is equal to or greater than, or in other words is an upper bound of, the
true plastic limit load. There is much research on the application of the lower and upper
bound theorems: the book by Neal (Neal 1985) gives a detailed introduction to plastic analysis
for beams and plane frames, Sloan (Sloan 1989) developed a perfectly plastic soil model for
computing rigorous upper bounds on limit loads under conditions of plane strain, and Frier and
Damkilde (Frier & Damkilde 2009) demonstrated lower bound limit state analysis by applying an
adapted interior-point method with a spatially varying barrier function.
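The two theorems can be illustrated with a classic textbook case (not taken from this thesis): a fixed-ended beam of span L under uniform load w, with plastic moment capacity Mp. Scaling the elastic stress field gives a static (lower bound) estimate, while a work balance on the three-hinge mechanism gives a kinematic (upper bound) estimate.

```python
# Illustration of the bound theorems for a fixed-ended beam under uniform
# load. All numerical values are assumed purely for demonstration.

def lower_bound_udl(Mp, L):
    """Scale the elastic stress field until the largest moment (the support
    moment wL^2/12) reaches Mp, giving w_lb = 12 Mp / L^2."""
    return 12.0 * Mp / L**2

def upper_bound_udl(Mp, L):
    """Work balance for the mechanism with hinges at both supports and
    midspan gives the kinematic estimate w_ub = 16 Mp / L^2."""
    return 16.0 * Mp / L**2

Mp, L = 100.0, 5.0              # kN.m, m
w_lb = lower_bound_udl(Mp, L)   # static estimate: never above the true load
w_ub = upper_bound_udl(Mp, L)   # kinematic estimate: never below it
print(f"lower bound {w_lb:.0f} kN/m <= collapse load <= upper bound {w_ub:.0f} kN/m")
```

For this beam the true collapse load coincides with the upper bound, because sufficient ductility allows full redistribution from the elastic distribution to the mechanism; the gap between the two estimates is the margin the purely elastic (lower bound) design leaves unused.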
2.2 Design of Reinforced Concrete Structures
The current Australian Standard for the design of concrete structures, AS3600, describes three
different methods for the design of the reinforced concrete members (Standards Australia
2009). These methods are Equivalent Stress Block Method, Strut and Tie Method, and Linear
Elastic Finite Element Method. The designer chooses from these methods depending on the
type of structure that needs to be designed. For example, because of the complexity of
non-flexural members (such as deep beams), the most common approach is the Strut and Tie
Method, while for the flexural members (such as shallow beams), the Equivalent Stress Block
Method is used (Warner 2007). Application of the linear elastic finite element method to
routine design is currently limited.
2.2.1 Equivalent Stress Block Method
According to AS3600, during the conventional design of flexural beams, as a means of
simplification, an Equivalent Rectangular Compression Stress Block may be used as a
replacement of the actual shape of the concrete compressive stress block (Warner 2007). This
is an approximation to the actual stress distribution in a reinforced concrete member that
takes advantage of the fact that the stress-strain curve of low to medium strength concrete
has a wide plateau region where the maximum stress is maintained reasonably constant with
increasing strain.
Figure 1: Conditions at Mu in a singly reinforced concrete section (Warner 2007)
Figure 1 shows the conditions at a cracked section in a singly reinforced
concrete beam under the ultimate moment Mu. The section is rectangular with width b
and overall depth D, and d is the depth of the steel below the top fibre.
Figure 2: Equivalent Rectangular Stress Block (Warner 2007)
In Australia and many other countries, the Equivalent Stress Block Method, as shown in Figure
2, is used to design the flexural members by simplifying the “true” stress block using the
following two conditions (Warner 2007):
1: The total volume of the “Equivalent” rectangular stress block and the “true” stress block
should be equal, so that the resultant force is the same in each case.
2: The location of the centroid for both the “Equivalent” and “true” stress blocks should be at
the same height in the cross-section, ensuring that the lever arm of the resultant force is the
same in each case.
The parameters used to describe the stress block in this method, according to AS3600, are:
α2 = 1.0 − 0.003 fc'  (0.67 ≤ α2 ≤ 0.85) (Eq. 2-1)
γ = 1.05 − 0.007 fc'  (0.67 ≤ γ ≤ 0.85) (Eq. 2-2)
Also, for the value of extreme fibre concrete strain εcu at which the ultimate moment Mu will
occur, the Australian Standard adopts the limiting strain:
εcu = 0.003 (Eq. 2-3)
In the case of the rectangular stress block, the compressive force in the concrete is:
C = α2 fc' γ dn b (Eq. 2-4)
In the absence of pre-stress or compressive reinforcement, the compressive force within the
concrete should equal the tensile force in the steel, leading to the equation:
α2 fc' γ dn b = As fsy (Eq. 2-5)
Here fsy is the yield stress of the steel and As is the area of the tensile reinforcing steel.
The lever arm between the resultant force of the equivalent stress block and the resultant
force of the stress in the reinforcing steel is:
z = d − γ dn / 2 (Eq. 2-6)
So the moment capacity is then calculated as:
Mu = As fsy (d − γ dn / 2) = As fsy d (1 − γ ku / 2) (Eq. 2-7)
Here, ku is the neutral axis parameter at the ultimate moment, which is defined as:
ku = dn / d (Eq. 2-8)
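The sequence from Eq. 2-1 to Eq. 2-8 can be collected into a short calculation. In the sketch below the section dimensions and material properties are assumed example values, not data from this thesis; the equilibrium equation (Eq. 2-5) is solved for the neutral axis depth dn, after which the lever arm and moment capacity follow directly.

```python
# Sketch of the Equivalent Stress Block calculation (Eqs. 2-1 to 2-8) for a
# singly reinforced rectangular section. Numerical inputs are assumed values.

def clamp(x, lo=0.67, hi=0.85):
    """AS3600-style limits on the stress block parameters."""
    return max(lo, min(hi, x))

def moment_capacity(fc, b, d, As, fsy=500.0):
    alpha2 = clamp(1.0 - 0.003 * fc)           # Eq. 2-1
    gamma = clamp(1.05 - 0.007 * fc)           # Eq. 2-2
    dn = As * fsy / (alpha2 * fc * gamma * b)  # Eq. 2-5 solved for dn
    z = d - gamma * dn / 2.0                   # Eq. 2-6, lever arm
    Mu = As * fsy * z                          # Eq. 2-7 (N.mm)
    ku = dn / d                                # Eq. 2-8
    return Mu, ku

# 32 MPa concrete, 300 mm wide section, d = 450 mm, 1800 mm^2 of 500 MPa steel
Mu, ku = moment_capacity(fc=32.0, b=300.0, d=450.0, As=1800.0)
print(f"Mu = {Mu / 1e6:.1f} kN.m, ku = {ku:.3f}")
```

A small ku here indicates an under-reinforced, ductile section, which is the condition required for the steel to yield before the concrete crushes.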
From the point of view of this thesis, replacing the actual distribution of compressive stress in
the concrete with the equivalent stress block, in which the stress is limited to α2 fc', can be
seen as an application of the lower bound theorem of plasticity. A distribution of stress over
the cross section has been found which is in equilibrium with the moment Mu and which does
not exceed the nominal plastic stress anywhere. Hence Mu is a lower bound on the actual
strength of the cross section.
2.2.2 Strut and Tie Method
Compared to flexural members, non-flexural members are more complicated to design and
analyse. Currently, the Strut and Tie method (STM) is widely used for the design of such
members. The STM is also an application of the lower bound theorem of plasticity. In the STM, the
complex flow of internal force in the region under consideration is idealized as a truss carrying
the imposed loading through the region to its supports. This truss is called a strut-and-tie
model and is a statically admissible stress field leading to a lower bound solution. Like a real
truss, the strut-and-tie model consists of struts and ties which are interconnected at nodes.
The struts represent regions of concrete which are assumed to carry compressive stress, while
the ties represent the reinforcement carrying tensile stress (Warner 2007). Once the truss
model is chosen, stress resultants within the struts and ties are calculated by using the
principles of statics. The stresses in the compressive struts are checked to ensure they do not
exceed a value representative of the concrete strength, while sufficient reinforcing steel is
provided in the ties to carry the required tensile force at yield. Concrete outside the
compressive struts and node regions is assumed not to carry any load. Generally speaking, the
strut and tie methodology allows designers to choose a rational load transfer path through the
structure, and then to design that load path to be strong enough to carry the strength limit
state loads. As the stress field of the STM is in internal equilibrium and equilibrates the applied
loads, it provides a rational approach to representing a complex structural member with an
appropriate simplified truss model. This approach is now widely used and has proved to be
very useful in the design and analysis of such disturbed regions of structural members.
In detail, to start the design process using the STM, the boundaries of the structural element
should first be defined, and the boundary forces (the ultimate design forces) on the element
should be determined from the imposed loads. Based on an understanding of how the applied
forces will be carried through the element, the designer must then sketch an appropriate
arrangement of struts and ties forming a truss, and then solve for the truss member forces.
The next step is to select the reinforcing steel to provide the necessary tie capacity and to
ensure that this reinforcement is properly anchored at the nodes. After this, the dimensions of
the struts and nodes are evaluated to ensure that the capacity of all struts and nodes is
sufficient to carry the strut member forces. The final step is to provide distributed
reinforcement to ensure ductile behaviour of the region (Foster 1998; Warner 2007; Wight et al.
2003; Zhang & Tan 2007a).
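The statics step of this procedure can be sketched for the simplest model, two inclined struts running from a midspan load point down to the supports and one horizontal tie joining the supports. The geometry, load, steel strength and capacity reduction factor below are all assumptions chosen for illustration only.

```python
# Hedged sketch of the statics of a symmetric two-strut, one-tie model for a
# simply supported deep beam with a central point load. Assumed values only.
import math

def stm_forces(P, span, lever_arm):
    """Member forces from equilibrium at the loaded node and the supports."""
    theta = math.atan2(lever_arm, span / 2.0)   # strut inclination to the tie
    strut = (P / 2.0) / math.sin(theta)         # compression in each strut
    tie = (P / 2.0) / math.tan(theta)           # tension in the bottom tie
    return theta, strut, tie

def tie_steel_area(tie_force_N, fsy=500.0, phi=0.7):
    """Reinforcement area needed for the tie to carry its force at yield."""
    return tie_force_N / (phi * fsy)

P = 1000e3  # 1000 kN point load at midspan (N)
theta, strut, tie = stm_forces(P, span=3000.0, lever_arm=1500.0)
As = tie_steel_area(tie)
print(f"theta = {math.degrees(theta):.0f} deg, strut = {strut/1e3:.0f} kN, "
      f"tie = {tie/1e3:.0f} kN, As = {As:.0f} mm^2")
```

In a full design the strut and node stresses would then be checked against the concrete strength limits, and the tie steel anchored at the nodes, as described above.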
A typical example of the application of the STM is a simply supported deep beam
with a force applied at midspan. The geometry and a plot of the Von Mises stresses
calculated by linear stress analysis are shown in Figure 3 and Figure 4 respectively. From Figure
4 it is clear that, unlike in a flexural member, the force is transferred directly from the
load point to the supports, rather than being transmitted laterally as shear force, which
invalidates the conventional design method for flexural beams. However, this can be
overcome by using the STM to model this particular load path. Figure 5 shows a typical strut
and tie model for this case, imitating the load transfer path reasonably
correctly.
As mentioned above, the STM is an application of the lower bound theorem of plasticity. The
assumed stress field within the concrete element is quite different from the actual stress field.
Concrete outside of the strut and node regions is assumed to be unstressed. However, the
assumed stress field is in internal equilibrium, and does not exceed the equivalent plastic
stress anywhere, and so the load applied in the analysis (which was the required strength limit
load) becomes a lower bound to the collapse load of the element. However, in order for the
lower bound theorem to apply, the material must have sufficient ductility to redistribute the
stresses to the assumed stress field. As concrete has limited ductility, distributed web
reinforcement may be necessary to ensure that the design load is achieved.
Figure 3: Geometry of Deep Beam
Figure 4: Von Mises Stress Plot for Deep Beam (Linear Elastic Analysis)
Figure 5: Strut-and-Tie Model for Deep Beam
The concept of STM is derived from the truss analogy introduced by Ritter (Ritter 1899) and
Mörsch (Mörsch 1902) in Switzerland and Germany. The truss analogy was then extended by
Rausch (Rausch 1929), who implemented it for beams subjected to torsion. He assumed that
the loads are carried by a truss consisting of longitudinal tension and compression chords
representing the tensile steel reinforcement and the concrete compression zones respectively,
while the stirrups provide the vertical members joining the longitudinal chords. He also
pointed out that beams with insufficient web reinforcement will suffer shear-cracking, and
asserted that the crack inclination was hard to calculate (Rausch 1929). However, Kupfer and
Hilsdorf (Kupfer & Hilsdorf 1969) solved this problem and proposed an equation to calculate
the crack inclination by minimizing the strain energy of the whole truss model. Similarly,
Baumann (Baumann 1972) developed an equation for the calculation of crack direction within
the reinforced concrete structures subjected to in-plane stresses. Later, Schlaich et al. (Schlaich
& Jennewein 1987) applied the truss analogy to all types of reinforced and prestressed
concrete structures using strut and tie systems.
Ashour et al. (Ashour 1997) tested several reinforced concrete continuous deep beams with
various span-to-depth ratios, amounts and type of web reinforcement, and amounts of main
longitudinal reinforcement, and found that the vertical web reinforcement had more influence
on the ultimate load capacity than the horizontal web reinforcement, contrary to code
predictions. To explain this, Foster (Foster 1998) developed a rational formulation to obtain
the minimum web reinforcement. This formulation is based on the assumption that the beams
will not fail suddenly due to the diagonal concrete cracking, and then the required minimum
reinforcement is that needed to carry the bursting forces at the time of cracking.
Tan (Tan 2001) developed a strut and tie model taking the effect of pre-stressing into
consideration. In the same paper he also proposed a model which takes into account the
combined tensile strength contributions from longitudinal and web reinforcement, prestressed
strands and concrete, and uses a linear interactive failure criterion modified from the Mohr-
Coulomb theory. This model can be used for both pre- and post- tensioned deep beams.
Based on the same failure criterion, Tong (Tan et al. 2003) presented a simple and direct STM
model which takes into account the contribution of different web reinforcement
configurations (vertical, horizontal, or inclined) and prestressing tendons. In this model,
because an interactive stress-based failure criterion is adopted, there is no need
to use other stress limits to calculate the ultimate strength of the beam.
Zhang and Tan (Zhang & Tan 2007b) discussed the size effect for deep beams, which is typified
by a reduction in measured shear strength due to an increase in the height of deep beams.
They pointed out that the size effect can be eased by properly configuring the dimensions of
the loading and support plates within the strut and tie model.
Using a different approach from Tan's strut and tie model for prestressed deep beams (Tan 2001),
Wang et al. (Wang & Meng 2008) developed a modified strut-and-tie model for prestressed
concrete deep beams which successfully predicts the strength of the pre-stressed concrete
deep beam. In this model prestressing is represented by equivalent external loads, and the
Kupfer-Gerstle biaxial tension compression criterion is adopted to take concrete softening into
consideration.
Some further work has also been done regarding the design and analysis of high strength
concrete deep beams. Tan (Tan et al. 1997) performed some tests on high strength concrete
deep beams subjected to combined top and bottom loading. In addition, Park and Hoque
(Hoque 2006; Park 2005) applied the STM to the analysis of fibre reinforced polymer (FRP)
strengthened deep reinforced concrete members.
However, compared to simple deep beams, the analysis and design of deep beams with web
openings has not attracted as much attention. Mansur and Tan (Mansur et al. 2001; Tan et al.
2003) proposed strut-and-tie models for the analysis of reinforced concrete beams containing
geometric discontinuities as a result of a circular opening. Hu et al. (Hu & Tan 2007) also
investigated the behaviour and shear strength of large reinforced concrete deep beams with
web openings. For high-strength concrete deep beams, Yang et al. (Yang et al. 2006) estimated
the influence of web openings experimentally and analytically.
The basic requirement for the STM is to choose a rational strut and tie model that, in most
cases, is similar to the true load transfer path within the structures. However, this can be
difficult to do when facing very complex structures, particularly those with penetrations (Tjhin
& Kuchma 2007). Even when choosing the strut and tie model arrangement with the aid of
linear finite element analysis to give the stress field information, the designers may still need
to try several times before identifying a suitable model. The literature indicates that web
openings often obstruct the most direct strut and tie models, and suggests many complicated
truss models to try to overcome the problem (Guan 2005; Mansur et al. 2001; Tan et al. 2003).
This makes the design procedure for deep beams with openings very tedious and laborious.
As stated previously, using the STM may lead to inefficient and conservative designs, as it only
considers the contribution of concrete within the struts to the whole strength capacity of the
structures, while neglecting the concrete outside the struts. This can lead to more concrete
being used than is actually necessary, and unnecessary carbon emissions in the production of
that concrete.
Recently, several researchers have developed an approach to choose more rational strut and
tie models through topology optimization, removing the inefficient material gradually from the
component being designed. Liang and Pillai (Liang et al. 2000; Nagarajan & Pillai 2008)
presented a topology optimization method that can automatically generate ‘optimal’ strut-
and-tie models for reinforced concrete structures. This is achieved by progressively removing
elements that have the least contribution to the stiffness from the discretized concrete
member. Guan (Guan 2005) demonstrated and evaluated a procedure for the design of
strut-and-tie models through topology optimization of continuum structures. Bruggi (Bruggi 2009)
implemented a minimum compliance optimization to generate strut and tie models in both
2D and 3D situations. However, while such approaches may lead to more effective strut-and-tie
models, the material that is discarded in the analysis process is still present in the actual
structure, while the contribution of that material to the true strength of the structure has also
been discarded in the design process. Therefore these approaches do not overcome the
fundamental inefficiency of the STM.
2.2.3 Linear Elastic Finite Element Method
While linear elastic frame analysis is extensively used in structural engineering design, the use
of linear elastic stress analysis in reinforced concrete design has so far attracted little attention
in the literature. In contrast, there is much research focussed on non-linear finite element
analysis of reinforced concrete structures (Hoque 2006; Roy & Thiagarajan 2007; Dabbagh &
Foster 2006; Au & Bai 2007). However, non-linear finite element analysis is mainly concerned
with analysing the performance of a particular structure under a particular loading regime,
rather than being a central tool in the design process (Kotsovos & Pavlovic 1995). There are
two reasons for this. One is that detailed information about the structural sections must be
known before non-linear analysis can be performed, whereas, in reality, the sizes and properties
of the structure and reinforcement are not known at the beginning of the design process. The
other is that results from non-linear analysis are highly dependent on both the load
combination and the load time history. Consequently the non-linear analysis method is usually
used to verify the performance of the design under certain extreme loadings, following design
of the structure using other techniques, such as the strut-tie method.
In contrast, linear elastic frame analysis, most often performed with the use of computer, can
be conducted at an early stage of a design using quite coarse assumptions about the member
sizes. From the results of structural analysis, the worst stress resultants can be obtained and
then, with the aid of moment redistribution, an efficient design can be achieved.
The 2008 revisions of AS3600 explicitly permit the use of linear elastic stress analysis (and by
implication the linear elastic finite element method) to design reinforced concrete structures
or members.
The Linear Elastic Finite Element Method has been available since the 1960s. It has been used
in the design of massive concrete structures such as dams and nuclear power plants, but has
not been used widely in general structural concrete design. Part of the reason for this
unpopularity in concrete design is that the results of finite element analysis are approximate
and are highly dependent on the mesh size. For example, relatively coarse meshes may result in
inaccurate stress results, where the stress fields do not satisfy equilibrium locally. If the stress
field used in the design does not satisfy equilibrium locally, the lower bound theorem of
plasticity does not hold, and the corresponding external load may not be the lower bound of
the collapse load of a structure. A good example of this is the collapse of the Sleipner A
platform in 1991 (Jakobsen 1994; Deeks 2008). During the design process, ineffective
modelling of the tri-cell legs of the platform using linear elastic finite element analysis led to
the shear stress in the concrete between the cells being significantly underestimated and
inadequate reinforcement being specified. This led to the failure and total loss of the structure,
with the economic loss exceeding $700m.
When compared with the conventional STM, there are many advantages to be gained by using
the linear finite element method (Foster 2003). It allows the stresses within a structure of
arbitrary geometry to be calculated, without the necessity of assuming plane sections remain
plane. With modern computer software it is very easy to apply and the finite element model is
easy to set up. In addition, linear analysis can accommodate multiple load cases quickly. A
design based on the linear stress field will place the greatest quantity of reinforcement at the
high-tensile stress areas, which can efficiently control crack widths. Also, since the stress field
is computed directly by the computer, less work is required from the designer.
Compared to the STM method, the linear finite element method considers the contribution of
concrete outside the conventional compressive struts and nodes, and so a more efficient
design can be achieved, not only in terms of less material usage and thus cost saving, but also
in terms of less time required from the designer.
The problems of the mesh dependency of finite element results and the non-equilibrium of
stress field locally when meshes are coarse can effectively be overcome by the application of
effective adaptive procedures. Through adaptive procedures, an initial coarse mesh specified
by the designer at the beginning of the process can be automatically refined to obtain a stress
field of specified accuracy.
Unfortunately, when a sufficiently fine mesh or an adaptive procedure is used, especially when
dealing with members with certain geometric discontinuities or boundary conditions (such as
deep beam with square or rectangular web openings), local stress concentrations are often
identified (Augarde & Deeks 2008; Zhu, Hinton & Zienkiewicz 1991; Zienkiewicz & Zhu 1992).
Adaptive refinement in the areas of stress concentration or singularity will result in the
calculation of progressively higher stresses.
The previous version of AS3600 allowed the design of non-flexural members using linear
elastic analysis, and using “accepted principles of mechanics”. There is no clear guidance as to
what the design process should be and no literature referred to. However, the code does
specify that the tensile stresses should all be carried by the reinforcement or tendons, but
there is no guidance as to what distance this stress can be integrated over, or what the
maximum bar spacing should be. The compressive stress is limited to a value dependent on the
characteristic concrete strength, and can be averaged over a distance of 100 mm to reduce the
peak stresses. Again, there is no literature quoted to support the use of such a stress averaging
process.
In contrast, the 2008 version of AS3600 allows the peak tensile stresses to be averaged over
an area “appropriate to the size of the element”, with no guidance about the way to choose
this “appropriate area”. As for the compressive stress, it is now limited to β fc', where β is
an efficiency factor which depends on the tensile stress and the concrete confinement. In
addition, the averaging process for the compressive stress has been removed. This makes it
impossible to use linear elastic finite element analysis when dealing with members with stress
singularities, as these stress singularities violate the maximum stress specified by the code.
2.3 Stress Singularities
Obviously, in order to take advantage of linear elastic finite element analysis for the design of
concrete members, a way must be found to deal with the stress concentrations and
singularities that violate the maximum stress criteria. Considering forces exerted on the
boundary of the member, the bearing stress can be approximated by “Stress = Force/Area”.
Based on this formula, the stress will be very large if the area is very small. The area of a point
(2D) or a line (3D) is theoretically zero, and this is why applying constraints and forces to points
or lines in a finite element model will result in regions where the stresses do not converge, but
keep getting higher as the mesh is refined. These points are referred to as stress singularities.
For problems of elasticity, stress singularities are also associated with particular geometric
discontinuities or boundary conditions, such as sharp corners (Barber 2002).
According to Williams (Williams 1952), the stresses near a re-entrant corner can be described
in terms of polar coordinates (r, θ), with the origin located at the corner. Williams deduced an
expression for the stress field given particular boundary conditions. According to this
expression, the stresses in the vicinity of the corner are proportional to \(r^{\lambda-1}\), with
the power of the singularity determined by the eigenvalue \(\lambda\), which depends on the corner
angle. For a re-entrant corner in a homogeneous material \(\lambda < 1\), and this results
in the stresses being singular at the corner, where r = 0.
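For illustration (this sketch is not part of the thesis), the character of such a singular field can be evaluated numerically. The eigenvalue \(\lambda \approx 0.5445\) used below is the value commonly quoted for a right-angle re-entrant corner with traction-free faces, and is an assumption here; any \(\lambda < 1\) shows the same unbounded growth as r approaches zero:

```python
# Illustrative sketch only: relative stress magnitude sigma ~ r**(lam - 1)
# near a re-entrant corner. lam = 0.5445 is the commonly quoted Williams
# eigenvalue for a right-angle re-entrant corner (assumed value).
lam = 0.5445

def stress_factor(r):
    """Relative stress at distance r (mm) from the corner, arbitrary scale."""
    return r ** (lam - 1.0)

for r in (10.0, 1.0, 0.1, 0.01):
    print(f"r = {r:6.2f} mm  ->  relative stress = {stress_factor(r):10.2f}")
```

The printed values grow without bound as r decreases, which is exactly the behaviour observed when the mesh near such a corner is refined.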
In this thesis, the most complex example used is a deep beam with rectangular openings. The
geometric discontinuities and boundary conditions for this problem which lead to stress
concentrations or singularities are: the re-entrant corners and the force and restraint
boundary conditions. These geometric features and boundary conditions for the structure
mean that it is not possible to find an analytical solution for the stress field. However, by using
the finite element method with a very fine mesh, the stress field near the singularities can be
approximated. Figure 6 shows the geometry of a deep beam with openings and Figure 7 shows
the finite element model used and the resulting Von Mises stresses due to a central applied
force. Concentration of the stress at the support point and at two of the re-entrant corners is
obvious. If the mesh is further refined by reducing the size of the elements, the stress
singularities become more obvious, and tend to infinity as the element size tends to zero.
Figure 6: Geometry of the Deep Beam with Rectangular Openings
Figure 7: Finite Element Model (Left) and Von Mises Stress (Right)
However, in an actual structure, infinite stresses at such points are not realistic, as any
engineering material, including concrete, would crack, crush or yield under the high stresses,
effectively removing these singularities. Furthermore, when constructed in practice, the
geometry of the structure is not truly ‘sharp’. The corners will have a finite radius, but it is
difficult to include this in a finite element model, as the radius will be much smaller than the
dimensions of the structure, and much smaller than the elements used in the example shown
in Figure 7.
Obviously, the presence of stress singularities is a significant barrier to the use of conventional
analysis methods, including adaptive finite element methods, as they involve a lot of effort to
refine the mesh locally in the areas near the stress singularities.
One approach to solving the problem of stress singularities is to adopt the scaled boundary
finite element method (SBFEM), which has been applied to linear elastic fracture of
unreinforced concrete elements (Yang & Deeks 2007). This method has proved to be an
efficient way to analyse stress singularities and discontinuities extremely accurately (Chidgzey
& Deeks 2005; Deeks & Wolf 2002a; Deeks & Wolf 2002b; Deeks & Wolf 2002c). However,
from the point of view of structural concrete design using linear stress analysis, obtaining the
analytical form of the stress singularities does not permit the stress field to be used for design
purposes, as the stresses still exceed those permissible.
The work reported in this thesis investigates a different approach to the problem. Utilising the
lower bound theorem of plasticity in the same way as moment redistribution does for
continuous concrete beams, the stresses in the regions of stress concentration are
redistributed, while at the same time maintaining equilibrium internally and externally. This
stress redistribution is achieved within linear elastic analysis by reducing the elastic
modulus of the material in the areas of high stress. The method combines the merits of the
ready automation of linear elastic finite element analysis with the versatility of the STM.
2.4 The Finite Element Method
Repeated references have been made above to the Linear Finite Element Method. This
method will be used extensively in the work reported here, and so, for completeness, a brief
description of the method is given here. There are many good books describing the method in
great detail (Logan 2002; Huebner et al. 1995; Rao 1989), to which the reader is referred for
more information.
Within the area of mathematical physics and engineering, the finite element method is widely
used and is an efficient and versatile numerical method. For the problems analysed here,
linear stress analysis using the finite element method is implemented through the
displacement/stiffness method, in which the displacements within each element are
approximated locally between the nodes using displacement shape functions. The nodal
displacements are then related to the stresses by using the strain/stress and
strain/displacement relationships (Logan 2002). Variational principles or the principle of virtual
work are then used to relate the nodal displacements to the applied forces. All the equations
resulting from this process can be easily written in matrix and vector form, and can be easily
evaluated in a computer programme or by mathematical software such as Matlab.
In this study the stress analysis is conducted using four-node bilinear quadrilateral plane stress
elements. Further information about this element and the basic formulation process will be
explained in this section.
Step 1 Discretizing Model and Selecting the Element Types
This step involves discretizing the problem domain into finite elements interconnected by
nodes. There are many element types available to choose from, depending on the problem
and the performance the analyst wants to achieve. Normally, the smaller the elements and
the higher their order, the more accurate the results will be; however, the computational cost
and effort required will also be higher, so engineering judgment must be exercised during this
process. In this study, as the concrete structural members being considered are all of
rectangular shape, the simplest quadrilateral element is chosen: the isoparametric four-node
bilinear quadrilateral plane stress element. Figure 8 illustrates this kind of element in terms of
a local coordinate system s-t.
Figure 8: Isoparametric Bilinear Quadrilateral Element in Local Coordinates
Step 2 Selecting the Shape Function
For the isoparametric element, the same shape functions used to interpolate the
displacements between the nodes are also used to map the element coordinates into the
global coordinates of the structure x-y:
\[ \begin{Bmatrix} x \\ y \end{Bmatrix} =
\begin{bmatrix} N_1 & 0 & N_2 & 0 & N_3 & 0 & N_4 & 0 \\ 0 & N_1 & 0 & N_2 & 0 & N_3 & 0 & N_4 \end{bmatrix}
\begin{Bmatrix} x_1 \\ y_1 \\ x_2 \\ y_2 \\ x_3 \\ y_3 \\ x_4 \\ y_4 \end{Bmatrix} \]  (Eq. 2-9)
where \(N_i\) is the shape function for node \(i\), with the usual counter-clockwise node numbering starting from the corner at \(s = t = -1\):
\[ N_1 = \tfrac{1}{4}(1-s)(1-t) \]  (Eq. 2-10)
\[ N_2 = \tfrac{1}{4}(1+s)(1-t) \]  (Eq. 2-11)
\[ N_3 = \tfrac{1}{4}(1+s)(1+t) \]  (Eq. 2-12)
\[ N_4 = \tfrac{1}{4}(1-s)(1+t) \]  (Eq. 2-13)
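As a numerical check on these expressions, the shape functions can be evaluated directly; the sketch below (illustrative only, assuming the counter-clockwise node numbering just described) verifies the interpolation properties:

```python
import numpy as np

def shape_functions(s, t):
    """Bilinear shape functions N1..N4 (Eq. 2-10 to 2-13) at local
    coordinates (s, t) in [-1, 1]^2, nodes numbered counter-clockwise."""
    return 0.25 * np.array([(1 - s) * (1 - t),
                            (1 + s) * (1 - t),
                            (1 + s) * (1 + t),
                            (1 - s) * (1 + t)])

# Each Ni equals 1 at its own node and 0 at the others, and the four
# functions sum to 1 at any interior point (partition of unity):
print(shape_functions(-1, -1))           # N1 = 1 at node 1
print(shape_functions(0.3, -0.7).sum())  # -> 1.0
```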
Step 3 Defining the Strain/Displacement and Stress/Strain Relationship
In order to derive the relationship between the nodal displacements and nodal forces for each
element, which can be used to evaluate the element stiffness matrices, it is first necessary to
relate the stresses and strains within the element to the displacements of its nodes. For the
bilinear quadrilateral elements used here, each element has four nodes and each node has two
degrees of freedom, so the displacement field is determined by eight nodal displacements. As
with the isoparametric mapping shown previously, the displacement field within the element is
described by the displacement functions u(s,t) and v(s,t):
\[ \begin{Bmatrix} u \\ v \end{Bmatrix} =
\begin{bmatrix} N_1 & 0 & N_2 & 0 & N_3 & 0 & N_4 & 0 \\ 0 & N_1 & 0 & N_2 & 0 & N_3 & 0 & N_4 \end{bmatrix}
\begin{Bmatrix} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{Bmatrix} = [N]\{d\} \]  (Eq. 2-14)
The strains within the element can then be related to the nodal displacements as:
\[ \{\varepsilon\} = \begin{Bmatrix} \varepsilon_x \\ \varepsilon_y \\ \gamma_{xy} \end{Bmatrix} = [B]\{d\} \]  (Eq. 2-15)
The derivatives of the shape functions are contained in the matrix [B], which can be expressed as:
\[ [B] = \begin{bmatrix}
\dfrac{\partial N_1}{\partial x} & 0 & \cdots & \dfrac{\partial N_4}{\partial x} & 0 \\[4pt]
0 & \dfrac{\partial N_1}{\partial y} & \cdots & 0 & \dfrac{\partial N_4}{\partial y} \\[4pt]
\dfrac{\partial N_1}{\partial y} & \dfrac{\partial N_1}{\partial x} & \cdots & \dfrac{\partial N_4}{\partial y} & \dfrac{\partial N_4}{\partial x}
\end{bmatrix} \]  (Eq. 2-16)
Finally, the stress/displacement relationship is expressed as:
\[ \{\sigma\} = [D]\{\varepsilon\} = [D][B]\{d\} \]  (Eq. 2-17)
with the constitutive matrix [D] (plane stress here):
\[ [D] = \frac{E}{1-\nu^2} \begin{bmatrix} 1 & \nu & 0 \\ \nu & 1 & 0 \\ 0 & 0 & \frac{1-\nu}{2} \end{bmatrix} \]  (Eq. 2-18)
Step 4 Forming the Element Stiffness Matrix
Applying variational principles or the principle of virtual work, the element stiffness matrix
relating the nodal forces to the nodal displacements can now be obtained as:
\[ [k] = \int_{-1}^{1}\!\int_{-1}^{1} [B]^T [D] [B]\, h\, |J|\, ds\, dt \]  (Eq. 2-19)
where \(|J|\) is the determinant of the Jacobian matrix \([J]\), which is used to transform
derivatives from the local coordinates s-t into the global coordinates x-y, and h is the
thickness of the element:
\[ [J] = \begin{bmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[4pt]
\dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{bmatrix} \]  (Eq. 2-20)
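In practice the integral of Eq. 2-19 is evaluated by 2×2 Gauss quadrature. The following sketch (illustrative only; the elastic modulus of 24500 MPa and Poisson's ratio of 0.2 are the values used later for the beam analyses) assembles the 8×8 stiffness matrix of a single element:

```python
import numpy as np

def element_stiffness(coords, E=24500.0, nu=0.2, h=300.0):
    """Stiffness matrix of a 4-node bilinear plane-stress quadrilateral.

    coords: (4, 2) array of nodal x-y coordinates, counter-clockwise.
    Evaluates Eq. 2-19 with 2x2 Gauss quadrature (unit weights)."""
    D = E / (1 - nu**2) * np.array([[1, nu, 0],
                                    [nu, 1, 0],
                                    [0, 0, (1 - nu) / 2]])
    g = 1.0 / np.sqrt(3.0)               # Gauss point coordinate
    k = np.zeros((8, 8))
    for s in (-g, g):
        for t in (-g, g):
            # Derivatives of N1..N4 with respect to s (row 0) and t (row 1)
            dN = 0.25 * np.array([[-(1 - t), (1 - t), (1 + t), -(1 + t)],
                                  [-(1 - s), -(1 + s), (1 + s), (1 - s)]])
            J = dN @ coords              # 2x2 Jacobian matrix (Eq. 2-20)
            dNxy = np.linalg.solve(J, dN)  # derivatives w.r.t. x and y
            B = np.zeros((3, 8))
            B[0, 0::2] = dNxy[0]         # du/dx terms
            B[1, 1::2] = dNxy[1]         # dv/dy terms
            B[2, 0::2] = dNxy[1]         # shear terms
            B[2, 1::2] = dNxy[0]
            k += B.T @ D @ B * h * np.linalg.det(J)
    return k

# A 25 mm square element, as used later in the deep-beam model:
k = element_stiffness(np.array([[0, 0], [25, 0], [25, 25], [0, 25]], float))
print(np.allclose(k, k.T))  # the stiffness matrix is symmetric -> True
```

A useful sanity check is that a rigid-body translation produces no nodal forces, i.e. k multiplied by a uniform displacement vector is (numerically) zero.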
Step 5 Assembling the Global Stiffness Matrix and Applying Boundary Conditions
The global stiffness matrix can be easily assembled once the individual element stiffness
matrices are formed. This assembly process can be expressed by:
\[ [K] = \sum_{e} [k^{(e)}] \]  (Eq. 2-21)
The final global equation written in matrix form is:
\[ \{F\} = [K]\{d\} \]  (Eq. 2-22)
where \([K]\) is the global stiffness matrix, \(\{F\}\) is the vector of global nodal forces, and
\(\{d\}\) is the vector of known and unknown nodal degrees of freedom of the structure.
Step 6 Solving the Global Equation
A number of numerical methods can be used at this step to solve the global equation, including
Gaussian elimination, the conjugate gradient method and the preconditioned conjugate
gradient method.
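As an illustration of one of these solvers (a minimal sketch, not the implementation used in this work), the conjugate gradient method for a symmetric positive-definite system can be written as:

```python
import numpy as np

def conjugate_gradient(K, F, tol=1e-10, max_iter=1000):
    """Solve K d = F for symmetric positive-definite K by the
    conjugate gradient method (illustrative sketch only)."""
    d = np.zeros_like(F)
    r = F - K @ d          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Kp = K @ p
        alpha = rs / (p @ Kp)
        d += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Small SPD test system:
K = np.array([[4.0, 1.0], [1.0, 3.0]])
F = np.array([1.0, 2.0])
print(conjugate_gradient(K, F))  # matches np.linalg.solve(K, F)
```

For the large sparse systems produced by finite element models, this iterative approach avoids forming a factorisation of [K], which is why it (and its preconditioned variant) is attractive for the GPU implementation discussed later in the thesis.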
Step 7 Finding out the Stresses and Strains
Once the global equation has been solved, the displacements at all the nodes are known. The
stresses and strains within each element can then be found using the strain/displacement
and stress/strain relationships introduced in (Eq. 2-15) and (Eq. 2-17) respectively.
As only the displacement field is continuous between elements, the element stresses
and strains are normally discontinuous across element boundaries, so some form of stress
recovery algorithm is often used to smooth the stresses. For the bilinear quadrilateral plane
stress element used in this study, the super-convergent patch recovery (SPR) method is
used (Zienkiewicz & Zhu 1992).
Step 8 Interpreting the Results
The final step is to interpret and analyze the results for use in the design/analysis process.
Computer graphics play an important role in this part of the procedure, as typically finite
element models contain thousands (or even millions) of nodes and elements. Displacements
can be visualised through projections of deformed meshes, while stresses are commonly
displayed using contour plots.
2.5 Summary
In this chapter, some basic theory about reinforced concrete structure design and the finite
element method is introduced and the relevant literature reviewed. In particular, the
conventional equivalent stress block approach, STM and the linear elastic finite element
method are discussed. The equivalent stress block approach is efficient for the design of
flexural members such as shallow beams, as it imitates the nonlinear behaviour of the concrete
when overloaded. This can be interpreted as a lower bound approach, as it uses an assumed
stress field which is in internal and external equilibrium with the applied loads and does not
exceed the maximum compressive strength of the concrete anywhere. For the design of non-
flexural members such as deep beams, the STM is widely used. This is also a lower bound
approach, as the designer assumes a distribution of struts and ties which are in internal and
external equilibrium with the applied loads, and within which the maximum strength of the
concrete or steel is not exceeded anywhere. However, the truss model neglects the concrete
outside the assumed struts, and thus the STM potentially leads to concrete waste and
unnecessary carbon emission.
To attempt to avoid this material waste, the linear elastic finite element method can be
applied, allowing the contribution of all the concrete within the structure to its strength to be
taken into account. Less work is required by the designer as it can be performed by computer,
and there is less designer-dependency of the final design.
However, if the geometry or the boundary conditions of a member result in stress singularities
or concentrations, then when a sufficiently fine mesh is used to ensure internal equilibrium is
satisfied, the peak compressive stresses will always exceed the maximum permissible stress in
the concrete, and the conventional linear elastic finite element approach cannot be used.
In the next chapter, different conventional approaches are used to design various reinforced
concrete members and compared with LEFEA based designs. In the following chapter the
modified linear elastic finite element method suggested in this chapter is fully developed. This
stress redistribution approach will be shown to be effective in handling problems caused by
stress singularities and concentrations.
3 Comparison of Conventional Design Approaches with
LEFEA-based Design
This chapter demonstrates the application of conventional design approaches (stress block and
strut-tie) and the LEFEA-based design approach to three types of simple structures. These
structures are shallow (flexural) beams, deep beams, and deep beams with web openings. The
efficiency of the resulting designs is compared. For the flexural reinforced concrete beam, both
the equivalent compressive stress block approach and LEFEA approach are presented, while
for the non-flexural beams without openings, both the STM and the LEFEA approach are
presented. Finally, for the non-flexural beams with openings, even though the stress
singularities involved in such beams invalidate the LEFEA approach for determining the
concrete thickness, the steel area indicated by the LEFEA approach is compared with the steel
area required by the STM.
3.1 Design of a Flexural Reinforced Concrete Beam
Figure 9 shows a simply supported shallow beam with a span of 3700 mm and a height of
450 mm. A concentrated load P is applied at the middle of the beam, which is comprised of
concrete with characteristic strength \(f'_c\). The yield stress of the reinforcement to be used
in this design is \(f_{sy}\).
Figure 9: Geometry of Shallow Beam
3.1.1 Application of Conventional Design Approach (Equivalent Stress
Block Approach)
For the design of shallow beams, the equivalent rectangular compression stress block method,
which was introduced in section 2.2.1, is widely used. In accordance with AS3600, the
rectangular stress block for the given concrete strength \(f'_c\) is described by an intensity
factor \(\alpha_2\) and a depth factor \(\gamma\), both functions of \(f'_c\):
\[ \alpha_2 = \alpha_2(f'_c) \]  (Eq. 3-1)
\[ \gamma = \gamma(f'_c) \]  (Eq. 3-2)
and the compressive force in the concrete is:
\[ C = \alpha_2 f'_c\, \gamma k_u d\, b \]  (Eq. 3-3)
while the tensile force in the reinforcement is:
\[ T = A_{st} f_{sy} \]  (Eq. 3-4)
Within the structure, the compressive force in the concrete is equal to the tensile force in the
reinforcement, so:
\[ C = T \]  (Eq. 3-5)
\[ A_{st} f_{sy} = \alpha_2 f'_c\, \gamma k_u d\, b \]  (Eq. 3-6)
In addition, from the design load, the moment that needs to be carried by the simply
supported structure can be calculated as:
\[ M^* = \frac{P L}{4} \]  (Eq. 3-7)
The moment capacity of the structure can be expressed as:
\[ M_u = C\left(d - \frac{\gamma k_u d}{2}\right) = \alpha_2 f'_c\, \gamma k_u d\, b \left(d - \frac{\gamma k_u d}{2}\right) \]  (Eq. 3-8)
where \(d\) is the steel depth below the top fibre of the beam, chosen here as given in
(Eq. 3-9). Equating \(M^*\) and \(M_u\), the position of the neutral axis \(d_n\) can be found
(Eq. 3-10). Then the ductility ratio is:
\[ k_u = \frac{d_n}{d} \]  (Eq. 3-11)
Because the ductility ratio is smaller than 0.36, the value required by the balanced failure and
neutral axis depth limits, the design is satisfactory and no compressive steel is needed.
Finally, the tensile force can be obtained from the equilibrium condition \(T = C\) (Eq. 3-12),
followed by the area of tensile reinforcement:
\[ A_{st} = \frac{T}{f_{sy}} = 1329.6\,\mathrm{mm^2} \]  (Eq. 3-13)
3.1.2 Application of Conventional LEFEA Approach
For comparison, the conventional linear elastic finite element analysis (LEFEA) approach is
applied here to design the shallow beam whose geometry is shown in Figure 9. However, for
this simple beam a linear stress analysis can also be conducted by elastic beam theory. This will
be done first, so that the principles can be demonstrated, and then the LEFEA approach will be
used.
An initial plane thickness of 300 mm is assumed. According to AS3600, the maximum principal
compressive stress permitted within the beam, expressed as a fraction of \(f'_c\), is:
\[ \sigma_{c,\max} = 13.5\,\mathrm{MPa} \]  (Eq. 3-14)
By assuming the same concrete thickness as before, 300 mm, the maximum moment the
section can carry at the limiting stress is implied as:
\[ M_{\max} = \sigma_{c,\max}\,\frac{t D^2}{6} = 13.5 \times \frac{300 \times 450^2}{6} = 136.7\,\mathrm{kN\,m} \]  (Eq. 3-15)
At mid-span the maximum principal tensile stress is equal in magnitude to the maximum
principal compressive stress \(\sigma_{c,\max}\), and the tensile stress varies linearly from this value at
the bottom fibre to zero at mid-height, so the tensile force at mid-span can be found via:
\[ T = \frac{1}{2}\,\sigma_{c,\max}\left(\frac{D}{2}\right) t \]  (Eq. 3-16)
Finally, according to AS3600, the tensile force should be carried entirely by the reinforcement,
with the steel stresses not exceeding \(f_{sy}\). Therefore, the area of reinforcement for this
design is:
\[ A_{st} = \frac{T}{f_{sy}} \]  (Eq. 3-17)
The position of the reinforcement should coincide with the centroid of the tensile stress block.
For the triangular tensile stress distribution, this can approximately be taken as a distance
from the centre of the steel to the top fibre of the beam of:
\[ \bar{y} = \frac{5D}{6} = 375\,\mathrm{mm} \]  (Eq. 3-18)
However, the actual moment resulting from the applied force is:
\[ M^* = \frac{P L}{4} \]  (Eq. 3-19)
This is larger than the maximum moment calculated by elastic beam theory, and would result
in violation of the stress limit. To solve this problem, the thickness of the beam can be
increased to ensure it can carry the design moment. The new thickness of the beam can be
found through:
\[ \sigma_{c,\max} = \frac{6 M^*}{t D^2} \]  (Eq. 3-20)
Thus:
\[ t = \frac{6 M^*}{\sigma_{c,\max} D^2} = 406\,\mathrm{mm} \]  (Eq. 3-21)
With the new beam thickness of 406 mm, the actual tensile force at mid-span is now:
\[ T = \frac{1}{2}\,\sigma_{c,\max}\left(\frac{D}{2}\right) t = \frac{1}{2} \times 13.5 \times 225 \times 406 = 616.6\,\mathrm{kN} \]  (Eq. 3-22)
The area of reinforcement for this design is then specified as:
\[ A_{st} = \frac{T}{f_{sy}} = 1541.0\,\mathrm{mm^2} \]  (Eq. 3-23)
The same linear stress analysis will now be performed by the finite element method using 2D
planar elements, and the same design procedure followed. Again the initial beam thickness is
taken as 300 mm. The value of the elastic modulus is not important, as it will not affect the
stresses. The analysis here specifies the elastic modulus and Poisson's ratio as 24500 MPa and
0.2 respectively. Due to the symmetry of the beam, only half of the beam will be analysed.
Figure 10 shows the boundary conditions for this beam, where the supports are modelled as
relatively soft springs to avoid the occurrence of stress singularities within these regions.
Figure 10: Model for the Shallow Beam
The principal compressive stress using the conventional LEFEA approach is shown in Figure 11,
where the value of the maximum principal compressive stress is 17.28MPa. However, the
maximum principal compressive stress allowed by AS3600 within the beam is 13.5MPa, as
shown in (Eq. 3-14). This means that the initial value of beam thickness is too small and more
concrete is required.
Figure 11: Plot of Principal Compressive Stress
Because of the linearity of this method, the actual thickness of concrete required for the beam
can be obtained by scaling the initial thickness by the ratio of the computed maximum
compressive stress to the permitted principal compressive stress limit:
\[ t = 300 \times \frac{17.28}{13.5} = 384\,\mathrm{mm} \]  (Eq. 3-24)
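Because the analysis is linear, this scaling step is trivial to automate; a minimal sketch (illustrative only) is:

```python
def required_thickness(t_trial, sigma_max, sigma_limit):
    """Scale a trial thickness so the peak stress just meets the limit.
    Valid because the analysis is linear: stresses scale with 1/thickness."""
    return t_trial * sigma_max / sigma_limit

print(required_thickness(300.0, 17.28, 13.5))  # -> 384.0 mm
```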
Integrating the tensile stresses along the mid-span can provide the design tensile force and
thus the required reinforcement area. To obtain the maximum resultant tensile force along the
mid-span cross-section, the trapezoidal rule is used to perform the integration in (Eq. 3-25), i.e.
by summing up all the trapezoidal areas under the curve shown in Figure 13 in the way
illustrated in Figure 12 and (Eq. 3-26).
Figure 12: Trapezoidal Rule for the Integration
\[ \int_{a}^{b} \sigma_t(y)\, dy \approx \sum_{i=1}^{n-1} \frac{\sigma_t(y_i) + \sigma_t(y_{i+1})}{2}\,(y_{i+1} - y_i) \]  (Eq. 3-25)
\[ T = t \int_{0}^{D} \sigma_t(y)\, dy \]  (Eq. 3-26)
Figure 13: Plot of Tensile Stresses across Mid-span of Beam
As the yield stress in the steel is \(f_{sy}\), the quantity of steel reinforcement required at
mid-span is approximately:
\[ A_{st} = \frac{T}{f_{sy}} = 1417.9\,\mathrm{mm^2} \]  (Eq. 3-27)
The centroid of the steel area should coincide with the centroid of the tensile stress to
maintain equilibrium. The location of the steel, in terms of the distance from the bottom fibre
of the beam to the centroid of the steel, is calculated using the same trapezoidal rule for
integration as:
\[ \bar{y} = \frac{\int_0^{D} \sigma_t(y)\, y\, dy}{\int_0^{D} \sigma_t(y)\, dy} \]  (Eq. 3-28)
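The integration and centroid calculation can be sketched as follows. The stress profile below is hypothetical (a linear distribution, not the thesis data), and the 400 MPa yield stress used for the steel area is an assumed value for illustration:

```python
import numpy as np

def trapz(f, y):
    """Trapezoidal rule: sum of the trapezoid areas between sample points."""
    return float(np.sum((f[:-1] + f[1:]) / 2.0 * np.diff(y)))

# Hypothetical tensile-stress profile (illustrative values only): linear
# from 13.5 MPa at the bottom fibre to zero at mid-height of the section.
y = np.linspace(0.0, 225.0, 10)       # height above bottom fibre, mm
sigma = 13.5 * (1.0 - y / 225.0)      # tensile stress, MPa

t = 384.0                             # beam thickness, mm
T = t * trapz(sigma, y)               # tensile force, N       (Eq. 3-26)
y_bar = trapz(sigma * y, y) / trapz(sigma, y)   # centroid, mm (Eq. 3-28)
A_st = T / 400.0                      # steel area for an assumed 400 MPa yield

print(T, y_bar, A_st)
```

For this linear profile the trapezoidal rule recovers the exact force; the centroid is only approximate, converging on the exact value as more sample points are used.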
3.1.3 Cost Comparison and Remarks
Table 1 shows the cost comparisons between the conventional equivalent stress block method
and the linear stress analysis method.
Table 1: Cost Comparison for Shallow Beam Design

Approach                       | Area of Steel (mm²) | Thickness of Concrete (mm)
-------------------------------|---------------------|---------------------------
Equivalent Stress Block Theory | 1329.6              | 300
Elastic Beam Theory            | 1541.0              | 406
LEFEA                          | 1417.9              | 384
From Table 1 it can be seen that, for flexural members, both elastic beam theory and the
conventional LEFEA approach result in less efficient designs, requiring more steel and
concrete. Clearly the linear stress analysis method is not an efficient approach for designing
shallow beams. One reason is that the linear stress analysis cannot capture the non-linear
behaviour of the concrete when overloaded, while the conventional equivalent stress block
approach models this non-linear behaviour quite well by assuming a rectangular stress block.
The other reason is that the centroid of the steel must be located at the centroid of the
tensile stress diagram to preserve internal equilibrium, whereas locating the steel as close to
the bottom of the beam as possible (limited by serviceability requirements) would maximise
the lever arm and minimise the amount of steel required.
However, for non-flexural members, the benefits of LEFEA are more obvious. This will be
demonstrated in the next section.
3.2 Application to Design of Non-flexural Reinforced Concrete
Beams without Rectangular Openings
Figure 14 gives the geometry of a simple deep beam comprised of concrete with characteristic
strength \(f'_c\); a 1000 kN point load is applied at the centre of the beam over a bearing
plate.
Figure 14: Geometry of Deep Beam
3.2.1 Application of Conventional Design Approach (STM)
For the design of the deep beam shown in Figure 14, the STM introduced in section 2.2.2 is
widely used. A possible, and the most intuitive, truss model for this deep beam is shown in
Figure 15.
Figure 15: Strut and Tie Model for Deep Beam
For the STM, from the truss geometry and the force equilibrium, the load within each inclined
strut can be found as:
\[ F_{strut} = \sqrt{2} \times \frac{P}{2} = \sqrt{2} \times 500 = 707.1\,\mathrm{kN} \]  (Eq. 3-29)
According to AS3600 DR05252, the maximum allowable compressive capacity of the strut in
this example is obtained via:
\[ C_{\max} = \phi_{st}\,\beta_s\, 0.9 f'_c\, A_c \]  (Eq. 3-30)
where \(A_c\) is the cross-sectional area of the strut and can be calculated as:
\[ A_c = d\, t \]  (Eq. 3-31)
Therefore, setting the capacity equal to the strut force:
\[ t = \frac{F_{strut}}{\phi_{st}\,\beta_s\, 0.9 f'_c\, d} \]  (Eq. 3-32)
According to the truss geometry, the strut width d is 353.55 mm, so the required thickness of
this deep beam can be calculated as:
\[ t = 246\,\mathrm{mm} \]  (Eq. 3-33)
Then, from the truss geometry and the force equilibrium, the tensile force in the tie is:
\[ T = \frac{P}{2} = 500\,\mathrm{kN} \]  (Eq. 3-34)
Assuming the yield stress of the steel is \(f_{sy}\), the quantity of reinforcement can be
calculated as:
\[ A_{st} = \frac{T}{f_{sy}} = 1250\,\mathrm{mm^2} \]  (Eq. 3-35)
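The truss statics behind these values can be sketched as follows (illustrative only; the 45-degree strut inclination is assumed from the geometry of Figure 15):

```python
import math

def stm_forces(P_kN, theta_deg):
    """Strut and tie forces for a symmetric two-strut truss under a central
    point load P. theta is the strut inclination to the horizontal.
    (Illustrative statics only; 45-degree geometry assumed below.)"""
    R = P_kN / 2.0                                   # each support reaction
    strut = R / math.sin(math.radians(theta_deg))    # compression in strut
    tie = R / math.tan(math.radians(theta_deg))      # tension in bottom tie
    return strut, tie

strut, tie = stm_forces(1000.0, 45.0)
print(round(strut, 1), round(tie, 1))  # -> 707.1 500.0
```

At 45 degrees the strut carries √2 times the support reaction and the tie carries the reaction itself, matching Eq. 3-29 and Eq. 3-34.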
3.2.2 Application of Conventional LEFEA Approach
For comparison, the conventional linear elastic finite element analysis (LEFEA) approach is also
performed to design the deep beam shown in Figure 14.
Firstly, the initial thickness of the beam is taken as 300 mm and the linear stress analysis is
conducted. Due to the symmetry of the beam, only half of the beam is analysed. Figure 16
shows the boundary conditions for this beam, where the supports are modelled as relatively
soft springs to avoid the occurrence of stress singularities within these regions. Square 2D
planar elements with a size of 25 mm are used here.
Figure 16: Model for the Deep Beam
Figure 17 shows the plot of principal compressive stress. The maximum principal compressive
stress obtained from the LEFEA is 8.837 MPa, whereas the maximum principal compressive
stress allowed by AS3600 within the beam is 13.5 MPa, as shown in (Eq. 3-14). This means that
the initial value of beam thickness is too large, and a smaller thickness is sufficient.
Figure 17: Plot of Principal Compressive Stress
Because of the linearity of this method, the actual thickness of concrete required for the beam
can be obtained by scaling the initial thickness by the ratio of the computed maximum
compressive stress to the permitted principal compressive stress limit:
\[ t = 300 \times \frac{8.837}{13.5} = 196.4\,\mathrm{mm} \]  (Eq. 3-36)
Figure 18: Plot of Principal Tensile Stress
The plots of principal tensile stress and the tensile stress along the mid-span are shown in
Figure 18 and Figure 19 respectively.
To obtain the maximum resultant tensile force along the mid-span cross-section, the
trapezoidal rule is again used to perform the integration, in the way expressed in Figure 12
and (Eq. 3-25):
\[ T = t \int_{0}^{D} \sigma_t(y)\, dy \]  (Eq. 3-37)
Figure 19: Plot of Tensile Stresses across Mid-span of Beam
Assuming the yield stress in the steel is \(f_{sy}\), the quantity of steel reinforcement
required at mid-span is approximately:
\[ A_{st} = \frac{T}{f_{sy}} = 1397.4\,\mathrm{mm^2} \]  (Eq. 3-38)
The location of the steel, specified in terms of the distance from the centroid of the steel to
the bottom fibre of the beam, is found using the same trapezoidal rule for integration to
calculate the centroid of the horizontal tensile stresses in Figure 19:
\[ \bar{y} = \frac{\int_0^{D} \sigma_t(y)\, y\, dy}{\int_0^{D} \sigma_t(y)\, dy} \]  (Eq. 3-39)
Figure 20: Von Mises Stress for Deep Beam using LEFEA
For interest, the Von Mises stress plot is shown in Figure 20, where the arch effect can be
observed: the load is transferred directly from the load point to the supports. This shows that
the strut and tie model shown in Figure 15 for the deep beam is reasonable.
3.2.3 Cost Comparison and Remarks
Table 2 shows the cost comparisons between the conventional STM and the LEFEA method.
Table 2: Cost Comparison for Deep Beam Design

Approach            | Area of Steel (mm²) | Thickness of Concrete (mm)
--------------------|---------------------|---------------------------
Strut and Tie (STM) | 1250                | 246
LEFEA               | 1397.4              | 196.38
The results presented in Table 2 demonstrate that for the non-flexural deep beams without
openings, designs based on the LEFEA approach can require less concrete usage and thus less
carbon emission (20% less in this case). This is because the conventional STM only considers
the contribution of concrete within the struts to the strength of the member, while the LEFEA
approach considers the contribution of all the concrete, regardless of its position, which will
give a more accurate and reasonable stress field and a more efficient design.
Importantly, as the stress field resulting from the LEFEA is in internal equilibrium and
equilibrates the applied load, designs based on the LEFEA are safe according to the lower
bound theorem of plasticity. When using LEFEA, the stress path is determined by the computer
instead of the designer, so less work is required of the designer. The conventional STM, in
contrast, is highly dependent on the designer's experience in choosing a rational truss model,
which may be laborious and time-consuming for complex structures.
On the other hand, the beam designed using LEFEA requires more steel than that designed by
STM (12% in this case). This is because, as with the shallow beam, the centroid of the steel
must be located at the centroid of the tensile stress diagram to preserve internal equilibrium,
whereas locating the steel as close to the bottom of the beam as possible (limited by
serviceability requirements) will maximise the lever arm and minimise the amount of steel
required. In the STM the steel can be located closer to the bottom of the beam.
3.3 Design of Non-flexural Reinforced Concrete Beams with
Rectangular Openings
Figure 21 shows the geometry of a deep beam with two rectangular openings, comprised of
concrete with characteristic strength \(f'_c\), with a 1000 kN point load applied at the centre
through a bearing plate.
Figure 21: Geometry of Deep Beam with Rectangular Openings
3.3.1 Application of Conventional Design Approach (STM)
For the design of non-flexural beams with openings, such as that shown in Figure 21, the STM
is widely used, and one possible truss model for this beam is shown in Figure 22.
Figure 22: Strut and Tie Model for Deep Beam with Rectangular Openings
Due to the symmetry of the model, only the half truss model shown in Figure 23 will be
analysed.
Figure 23: Strut and Tie Model for Deep Beam with Rectangular Openings (Half Model)
Figure 24: Force Equilibrium for the Applied Load
Figure 25: STM model for Bottle Shaped Strut and Force Equilibrium (Warner 2007)
As for the STM, firstly, from the truss geometry and the force equilibrium shown in Figure 24,
the load carried by each of the bottle-shaped struts at the top is found via:
\[ F_{strut} = \sqrt{2} \times \frac{P}{2} = \sqrt{2} \times 500 = 707.1\,\mathrm{kN} \]  (Eq. 3-40)
Then, according to AS3600 DR05252, the maximum allowable compressive capacity of the strut
in this example is specified via:
\[ C_{\max} = \phi_{st}\,\beta_s\, 0.9 f'_c\, A_c \]  (Eq. 3-41)
According to the truss geometry, the strut width d is 353.55 mm, so the thickness of this deep
beam can be calculated as:
\[ t = 246\,\mathrm{mm} \]  (Eq. 3-42)
As shown in Figure 25, the tensile force \(T_b\) carried by the diagonal reinforcement within
the bottle-shaped struts is obtained from the bursting force equilibrium (Eq. 3-43). Then, from
the truss geometry and the force equilibrium, the tensile force T in the longitudinal
reinforcement is found (Eq. 3-44). Assuming the yield stress of the steel is \(f_{sy}\), the
quantity of the longitudinal reinforcement is calculated as:
\[ A_{st} = \frac{T}{f_{sy}} \]  (Eq. 3-45)
and the quantity of the diagonal reinforcement as:
\[ A_{sd} = \frac{T_b}{f_{sy}} \]  (Eq. 3-46)
Multiplying the quantity of each reinforcement by its length, the volume of steel required for
this truss arrangement is \(6{,}132{,}000\,\mathrm{mm^3}\). The concrete thickness is 246 mm.
Other strut-tie configurations may lead to slightly different results for concrete thickness and
steel area; however, for the purposes of comparing with the LEFEA approach, a single strut-tie
design is considered to provide sufficient comparison.
3.3.2 Application of Conventional LEFEA Approach
For comparison, a conventional linear elastic finite element analysis (LEFEA) approach is also
used to design the deep beam shown in Figure 21.
Firstly, the initial thickness of the beam is taken as 300 mm and the linear stress analysis is
conducted. Due to the symmetry of the beam, only half of the beam is analysed. Figure 26
shows the boundary conditions for this beam, where the supports are modelled as relatively
soft springs to avoid the occurrence of stress singularities within these regions. Furthermore,
to better capture the stress singularities at the inner corners of the openings, square 2D
planar elements with a size of 10 mm are used.
Figure 26: Model for the Deep Beam with Rectangular Openings
Figure 27 shows the plot of principal compressive stress. The maximum principal compressive
stress obtained from the LEFEA is 12.52MPa. However, the maximum principal compressive
stress allowed by AS3600 within the beam is 13.5MPa.
Figure 27: Plot of Principal Compressive Stress
Because of the linearity of this method, the actual thickness of concrete required for the beam
can be obtained by scaling the initial thickness by the ratio of the computed maximum
compressive stress to the permitted principal compressive stress limit:
\[ t = 300 \times \frac{12.52}{13.5} = 278.2\,\mathrm{mm} \]  (Eq. 3-47)
Figure 28: Plot of Principal Tensile Stress
The plots of principal tensile stress and the tensile stress along the mid-span are shown in
Figure 28 and Figure 29 respectively. The resultant tensile force along the mid-span cross-
section can be found by integrating the tensile stresses using the trapezoidal rule:
\[ T = t \int_{0}^{D} \sigma_t(y)\, dy \]  (Eq. 3-48)
Figure 29: Plot of Tensile Stresses across Mid-span of Beam
If the yield stress in the steel is f_sy, the quantity of steel reinforcement required
at mid-span is approximately:

A_st = T / (φ f_sy) (Eq. 3-49)

The location of the steel, specified as the distance from the centroid of the steel to the bottom
fibre of the beam, is calculated using the same trapezoidal rule for integration as:

ȳ = ∫ y σ_t(y) dy / ∫ σ_t(y) dy (Eq. 3-50)
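The trapezoidal-rule calculations of this kind can be sketched as follows. The stress profile, thickness, φ and f_sy used here are illustrative assumptions for demonstration, not the thesis values:

```python
# Illustrative sketch of dimensioning steel from a mid-span tensile stress
# profile using the trapezoidal rule. The sampled stress profile, thickness,
# phi and f_sy below are assumed values, not results from the thesis.
def trapz(values, x):
    """Trapezoidal-rule integral of sampled values over x."""
    return sum((values[i] + values[i + 1]) / 2.0 * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

y = [0.0, 50.0, 100.0, 150.0, 200.0]   # distance from bottom fibre (mm)
sigma = [4.0, 3.0, 2.0, 1.0, 0.0]      # sampled tensile stress (MPa)
t = 300.0                              # beam thickness (mm)

T = t * trapz(sigma, y)                # resultant tensile force (N)
A_st = T / (0.8 * 500.0)               # steel area (mm^2), assuming phi*f_sy = 400 MPa
y_bar = (trapz([yi * si for yi, si in zip(y, sigma)], y)
         / trapz(sigma, y))            # centroid of the tensile stress block (mm)
```

The centroid `y_bar` is where equilibrium requires the steel to be placed, which for this monotonically decreasing profile sits well above the bottom fibre.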
3.3.3 Cost Comparison and Remarks
Table 3 shows the cost comparison between the conventional STM and the LEFEA method for
the designs of the deep beam with openings.
Table 3: Cost Comparison for designs of Deep Beam with Openings

Approaches            Volume of Steel (mm³)   Thickness of Concrete (mm)
Strut and Tie (STM)   6132000                 246
LEFEA                 5336400                 278.2
Table 3 demonstrates that the conventional LEFEA is less efficient in terms of concrete usage
than the conventional STM, and cannot be used effectively to design non-flexural beams
with openings. The reason is that, for the conventional LEFEA approach, the concrete thickness
is controlled by the maximum principal compressive stress, while the stress singularities involved
in such members mean that the smaller the elements used, the higher the calculated stress.
Consequently, when dealing with structures involving stress singularities, the concrete
savings offered by the conventional LEFEA are no longer obtainable. However, these results do show a
saving in total steel requirements.
3.4 Summary
This chapter has shown that for the design of flexural beams, the conventional LEFEA approach
is not efficient, as it cannot model the non-linear behaviour of the concrete and does not allow
the tensile steel to be located in the most effective position.
For the design of non-flexural deep beams without openings, design based on the LEFEA
approach is more efficient than conventional strut and tie designs in terms of the concrete
usage (although not steel usage, as again tensile steel cannot be located at the most effective
position).
For the design of non-flexural deep beams with openings, the stress singularities involved in
such beams invalidate the LEFEA approach for determining the concrete thickness, since the
finer the finite element mesh used in the analysis, the greater the calculated compressive
stresses, and hence the greater the required concrete thickness. However, the steel
requirements resulting from the LEFEA approach are less than those of the strut-and-tie approach, so if the
stress singularity issue can be overcome, LEFEA can potentially lead to a more efficient design.
In the next chapter, a modified LEFEA (MLEFEA) will be developed and discussed. The MLEFEA
approach can remove the stress singularities from the stress field, which means that beams
with square and rectangular web openings can be dimensioned so that the stress does not
exceed the maximum allowable principal compressive stress allowed by AS3600 at any point.
4 Modified Linear Elastic Finite Element Method
In this chapter, an efficient way to perform linear stress analysis involving stress
redistribution is developed. The basic process of stress redistribution is introduced, and an
L-shaped plate is used as an example to demonstrate the efficiency of the proposed method.
4.1 Stress Redistribution
When using linear stress analysis to analyse a concrete beam with openings that have re-entrant
corners, the occurrence of stress singularities means that the method provided in the current
Australian concrete design code cannot be applied, as the compressive stresses at the
singularities will always exceed the maximum allowable compressive stress. To avoid this
problem, the method proposed here introduces a stress redistribution process to redistribute
those stress singularities to limit the maximum stress to the allowable stress while preserving
internal and external equilibrium, enabling the design rules specified in the code to be applied
directly.
For the design of structures theoretically containing stress singularities, the actual properties
of the material should be taken into consideration. In practice stress singularities do not occur
because engineering materials, including concrete, locally yield or fail at some finite level of
stress, never reaching the infinite stress as predicted by the elastic stress field method (Barber
2002). In reality the concrete at a point of stress concentration would crack or crush, removing
the stress singularities altogether and the load would be shed to surrounding material,
increasing the surrounding stresses. Non-linear finite element analysis can model this
behaviour. However, as explained previously, the results of non-linear analysis are dependent
on the loading history and the details of the concrete constitutive model used. In linear
analysis, this process can be imitated by reduction of elastic modulus. In this work, a new
method to locally reduce the stress by the reduction of elastic modulus at the points of stress
singularities in the linear stress field will be presented.
Reducing the elastic modulus at particular points will effectively reduce the local stresses at
those points and redistribute them to the surrounding elements. The stresses in the
surrounding elements must change in order to preserve both internal and external equilibrium.
For stress singularity problems, this can be achieved by specifying an elastic modulus of zero at
the tip of the corners causing the stress singularities.
However, it should be borne in mind that, since the elements being used are
quadrilateral elements, step changes of the elastic modulus between elements will cause
further stress singularities where the boundary between areas of different elastic modulus
contains corners. To avoid this problem, a continuous change in the elastic modulus may be
used, modifying the finite element formulation to allow a variation in elastic modulus over an
element. For example, the variation of the elastic modulus function, in polar terms with the
radius measured from the point of singularity, could be defined by the plot in Figure 30.
Figure 30: Linear Reduction in Elastic Modulus
The grading interval is essential when the elastic modulus at the tip of the corner is specified as
zero. However, the size of the interval and the consequent rate of elastic modulus softening are
somewhat arbitrary, and the variation does not need to be linear. As long as the resulting stress field is
able to satisfy the yield criterion and equilibrium, it can be used as a basis on which to design,
according to the lower bound theory of plasticity.
According to the lower bound theory of plasticity, a stress field that satisfies equilibrium and does not
exceed the material yield criterion at any point provides a lower-bound estimate of the capacity of
an elastic-perfectly plastic structure. For this to be applicable to reinforced concrete, complete
crushing of the concrete must not occur prior to yielding of the reinforcement.
Furthermore, the consideration of ductility is essential when designing using the lower bound
theory of plasticity. Ductility is the capacity of the material to deform in the inelastic range
without significant loss of its load-bearing capacity (Carmo, Ricardo & Lopes 2005). Sufficient
ductility must be present to allow the structure to redistribute stress to the load path assumed
by the designer. The most effective measure to increase ductility of concrete structures is the
provision of confining reinforcement.
4.1.1 Finite Element Implementation
As introduced earlier, it is well-known (Logan 2002; Timoshenko 1969) that in the finite
element method, to construct the element stiffness matrix, the following equations are usually
used:
[k] = ∫∫ [B]^T [D] [B] |J| ds dt (Eq. 4-1)
Here [D] is the constitutive matrix, which for plane stress is:
[D] = E / (1 − ν²) × [ 1    ν    0
                       ν    1    0
                       0    0    (1−ν)/2 ]   (Eq. 4-2)
In most applications the elastic modulus E is considered constant, since the material is
homogeneous. However, the approach proposed here modifies the constitutive matrix by
using an elastic modulus E(s, t) which varies in terms of the local spatial coordinates (s, t) of the
element, while keeping Poisson's ratio constant.
[D(s, t)] = E(s, t) / (1 − ν²) × [ 1    ν    0
                                   ν    1    0
                                   0    0    (1−ν)/2 ]   (Eq. 4-3)
The approach adopted here uses the same isoparametric shape functions N_i to define the
variation of the elastic modulus function E(s, t) as are used to approximate the variation of
displacement between the nodes and to map the local coordinates to the global coordinates:

E(s, t) = Σ N_i(s, t) E_i (Eq. 4-4)

where E_i are the nodal values of the elastic modulus.
In this work four-node bilinear quadrilateral elements are used. The shape functions N_i for
the 4-noded isoparametric bilinear quadrilateral element can be expressed (Logan 2002;
Timoshenko 1969) in terms of the local coordinates (s, t) as:

N_i(s, t) = ¼ (1 + s s_i)(1 + t t_i), i = 1, …, 4 (Eq. 4-5)

where (s_i, t_i) are the local coordinates of node i.
Here N_i are the same functions presented in chapter 2.4, and the stress in each element can
then be determined in the conventional way, but with the constitutive matrix [D(s, t)] varying
spatially due to the spatially varying E(s, t):

{σ(s, t)} = [D(s, t)] [B] {d} (Eq. 4-6)

where {d} is the vector of element nodal displacements.
Standard finite element analysis can now be easily conducted using commercial finite element
software packages, many of which allow the designer to specify detailed models for the
behaviour of the element material. In packages such as ABAQUS and ANSYS, designers can
easily define the element material behaviour by using a custom definition, and this technique
is widely used in practice. However, as the proposed method here requires the element
material properties to vary spatially over the elements, Matlab is used to conduct the analysis.
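A minimal sketch of such an element routine is given below in Python/NumPy (the thesis implementation used Matlab; the node ordering, ν = 0.2 and the 2×2 Gauss rule here are illustrative assumptions). It evaluates the stiffness integral of (Eq. 4-1) with the constitutive matrix varying over the element, interpolating the nodal moduli with the bilinear shape functions:

```python
import numpy as np

def quad_stiffness(xy, E_nodes, nu=0.2, thickness=1.0):
    """Stiffness matrix of a 4-node bilinear quadrilateral in plane stress,
    with the elastic modulus interpolated from nodal values.
    Assumed node order: (-1,-1), (1,-1), (1,1), (-1,1) in local coordinates."""
    g = 1.0 / np.sqrt(3.0)                      # 2x2 Gauss points, weights = 1
    K = np.zeros((8, 8))
    for s in (-g, g):
        for t in (-g, g):
            N = 0.25 * np.array([(1-s)*(1-t), (1+s)*(1-t),
                                 (1+s)*(1+t), (1-s)*(1+t)])
            dN = 0.25 * np.array([[-(1-t), (1-t), (1+t), -(1+t)],
                                  [-(1-s), -(1+s), (1+s), (1-s)]])
            J = dN @ xy                         # 2x2 Jacobian matrix
            dNxy = np.linalg.solve(J, dN)       # shape derivatives w.r.t. x, y
            B = np.zeros((3, 8))                # strain-displacement matrix
            B[0, 0::2] = dNxy[0]
            B[1, 1::2] = dNxy[1]
            B[2, 0::2] = dNxy[1]
            B[2, 1::2] = dNxy[0]
            E = N @ np.asarray(E_nodes)         # interpolated modulus at the Gauss point
            D = (E / (1 - nu**2)) * np.array([[1, nu, 0],
                                              [nu, 1, 0],
                                              [0, 0, (1 - nu) / 2]])
            K += thickness * (B.T @ D @ B) * np.linalg.det(J)
    return K
```

Because E is evaluated at each Gauss point from nodal values, a modulus field that is continuous across element boundaries yields a stiffness free of the artificial corner singularities discussed above.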
4.1.2 Application to L-shaped Plate
Figure 31 gives the geometry and boundary conditions for an L-shaped plate which has a
uniformly distributed load applied at the bottom.
Figure 31: L-shape Plate
Firstly, a conventional analysis is performed using a constant elastic modulus, and the Von
Mises stress plots are shown as Figure 32.
Figure 32: Von Mises Stress of L-shaped Plate (Coarse Mesh)
From Figure 32, the stress singularity can be seen at the re-entrant corner of the plate. Using a
finer mesh, the stress singularity becomes more evident, as shown in Figure 33: the maximum
stress is more than doubled. If an even finer mesh were used, the stress at the re-entrant corner
would become higher still; no matter how far the mesh was refined, the stress would never
converge. In the figures which follow, the results generated from the finer of the two meshes
will be used, as they show the effect of the singularity better.
Figure 33: Von Mises Stress of L-shaped Plate (Finer Mesh)
To determine the extent of the singularity's influence, plots of Von Mises stresses along the
centrelines of the plate in both the X and Y directions were generated, and are shown in
Figure 34 and Figure 35, respectively.
Figure 34: Von Mises Stress over X Direction
Figure 35: Von Mises Stress over Y Direction
From Figure 34 and Figure 35, the stress singularity can be seen to significantly disrupt the
stress field within a circular area of approximate radius 0.1. Therefore, to remove the stress
singularities, the stress redistribution process should be performed with the elastic modulus
graded linearly down to zero at the re-entrant corner from its full value at a circular arc with a
radius of 0.1. The change of elastic modulus is shown in Figure 36 and Figure 37.
Figure 36: Procedure for adjusting Elastic Modulus
Figure 37: Relative Value for Elastic Modulus
After defining the spatially varying elastic modulus, this structure is analysed again using the
finite element implementation introduced above. The new Von Mises stress result with the
stress redistribution is shown in Figure 38, from which it can be seen that, in comparison to
Figure 33, the stress singularities are removed successfully.
Figure 38: Von Mises Stress of L-shaped Plate after Stress Redistribution
Figure 39: Principal Compressive Stress for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right)
Figure 40: Stress in X Direction for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right)
Figure 41: Stress in Y Direction for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right)
Figure 39, Figure 40 and Figure 41, respectively, present the comparison of principal
compressive stress, stress in X and Y direction for the L-shaped plate by using LEFEA and
MLEFEA involving stress redistribution. Results also show that the stress singularities in the re-
entrant corner are successfully redistributed to the stiffer part of the model, where the overall
force equilibrium is preserved.
Figure 42: Principal Tensile Stress for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right)
In addition, the comparison of principal tensile stress for the L-shaped plate using LEFEA
and MLEFEA involving stress redistribution is shown in Figure 42, from which it can be seen
that after the stress redistribution the tensile stress increased in some areas. Therefore,
unduly softening the member will result in a less optimal stress field, where more
reinforcement is required in order to maintain ductility.
4.2 Summary
The above application of stress redistribution to the L-shaped plate shows that the proposed
method is efficient in terms of removing the stress singularities. As the stress field generated
by this method is statically admissible and in internal equilibrium, according to the lower
bound theory of plasticity it can be used as a design approach for complex structures, such as
deep beams with openings. The redistributed stress field is smooth and does not have any
significant peaks. For the L-shaped plate it more closely resembles the stress field that would
intuitively result from connecting a horizontal beam and a vertical beam-column to a square
block of concrete.
5 Adaptive Stress Redistribution Approach
While the stress redistribution process introduced in the previous chapter is shown to be
effective and reasonable, softening more elements than is absolutely necessary to remove the
singularity may result in a less optimal stress field, which in turn may require more
reinforcement to provide the necessary strength and ductility. At the same time, the approach
requires the designer to choose the area to be softened, which could be very laborious,
especially if the structure has a great number of discontinuities. This chapter proposes an
adaptive stress redistribution method to overcome these shortcomings. When the
compressive stress in the beam exceeds the specified compressive strength limit, this adaptive
approach will locally adjust the elastic modulus to reduce the over-limit stresses to acceptable
values. Another limitation of LEFEA based designs noted in previous chapters is the inability to
locate the tensile steel in the most effective position. This chapter will also apply adaptive
stress redistribution to the tensile stresses in order to locate the centroid of the tensile
stresses (and hence the centroid of the reinforcing steel) in a predefined position, allowing the
lever arm to be maximised and the steel to be used effectively.
5.1 Adaptive Compressive Stress Redistribution Approach
Under the current design code AS3600, the principal compressive stress should not exceed the
maximum allowable value. However, when facing non-flexural
members such as deep beams with openings containing re-entrant corners, stress singularities always
violate this criterion (provided the finite element mesh is fine enough to give accurate results)
and thus obstruct the application of the design approach. The work presented here introduces
an iterative process to redistribute the over-limit compressive stresses to the surrounding
areas by reducing the elastic modulus in the overstressed areas.
This iterative process continues until all the stresses are within the compressive stress limit. In
summary, the process involves four stages:

Stage 1: Specify the finite element model with initial Young's modulus values, and then
conduct the analysis as usual (for singular points an elastic modulus of
zero is specified; for all other points the initial elastic modulus is homogeneous);
Stage 2: If needed, soften the areas where peak stresses exceed the yield criterion by reducing
the elastic modulus at appropriate nodes, while maintaining the Poisson’s ratio constant;
Stage 3: Re-run the analysis with the modified model and check to see whether all the yield
criteria are satisfied. If they are not, return to stage 2; otherwise continue to stage 4;
Stage 4: Calculate the reinforcement dimensions required to carry all the principal tensile
stresses present in the concrete.
A flowchart for this method is shown below in Figure 43.
Figure 43: Flowchart of the Practical Implementation of Adaptive Compressive Stress Redistribution
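The loop structure of the four stages can be illustrated with a deliberately simple surrogate: two parallel springs sharing a fixed load, where each spring's stress is proportional to its stiffness share. Softening an overstressed spring sheds load to its neighbour while total equilibrium is preserved. In the real method, the surrogate analysis is replaced by a full linear finite element solve at each pass; all names and values below are illustrative assumptions:

```python
# A deliberately simple surrogate for the four-stage adaptive loop: two
# parallel unit-area springs share a fixed load P, each carrying stress in
# proportion to its stiffness. Softening an overstressed spring transfers
# load to its neighbour while the total (equilibrium) is preserved.
def spring_stresses(E, P=100.0):
    """Stress in each unit-area spring under a shared load P (analysis step)."""
    total = sum(E)
    return [P * e / total for e in E]

def adaptive_redistribution(E, sigma_allow, eps=0.8, max_iter=200):
    for _ in range(max_iter):
        sigma = spring_stresses(E)             # stage 1/3: analyse and check
        over = [i for i, s in enumerate(sigma) if abs(s) > sigma_allow]
        if not over:
            return E, sigma                    # all yield criteria satisfied
        for i in over:                         # stage 2: soften proportionally
            E[i] *= eps * sigma_allow / abs(sigma[i])
    raise RuntimeError("redistribution did not converge")
```

Starting from unequal stiffnesses, the overstressed spring is softened over a few iterations until both stresses fall within the limit, while their sum (the applied load) never changes.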
Initialising the Elastic Modulus
An initial value of elastic modulus is needed before the initial structural analysis can be
conducted. The precise value of this initial elastic modulus is not very important, since the
redistribution of stresses is determined by the relative values of elastic modulus across the model. In
the work reported here, a value characteristic of concrete is chosen as the initial elastic
modulus.
However, when dealing with members containing stress singularities, such as a deep beam
with rectangular openings, stress singularities resulting from the geometric or boundary
discontinuities can be removed. This can be achieved by specifying the elastic modulus as zero
for the points of singularity at this stage. As stated previously, this is reasonable, as the
concrete at those points will crack or crush, and thus the material here will not carry any
stresses.
Once initial values of the elastic modulus are defined, finite element analysis of the structure
can be performed in the conventional way.
Adjusting the Elastic Modulus Iteratively
Results from the previous stage are used to adjust the elastic modulus in regions where the
stresses are larger than the allowable value. According to the AS3600, the maximum allowable
principal compressive stress is:
σ_c,max = φ β f'c (Eq. 5-1)
Table 4: Stress Reduction Factors

Material                  Stress Reduction Factor φ
Concrete in compression   0.6
Steel in tension          0.8
Here φ is the stress reduction factor specified in Table 4; β is the effective
compressive strength factor, which can be evaluated as follows (Standards Australia 2009):

“(i) in regions not containing confining reinforcement: β = 1.0 when the principal tensile
stress does not exceed 0.33√f'c, and β = 0.6 otherwise;

(ii) in regions where effective confining reinforcement is provided: β shall be evaluated by
rational calculation taking account of the amount of confining steel and the details used, but
shall not exceed 2”;
The factor β was originally developed by Vecchio & Collins (Vecchio & Collins 1986), from which the
following relationship is obtained:

β = 1 / (0.8 + 170 ε₁) ≤ 1.0 (Eq. 5-2)

where ε₁ is the principal tensile strain.
Foster (Foster 2003) states that this relationship “accounts for both confinement effects, as is
the case for concrete in biaxial or triaxial compression, and disturbance effects such as caused
by the transmission of tension fields through compression fields”. At the same time, Foster
(2003) conservatively suggests that the factor β can be taken as 0.6 when the principal tensile
stress due to the applied load exceeds 0.33√f'c, and as 1.0 when it does not; that is:

β = 1.0 when σ₁ ≤ 0.33√f'c
β = 0.6 when σ₁ > 0.33√f'c (Eq. 5-3)
In order to soften the elastic modulus in a continuous way, the work here reduces the elastic
modulus proportionally using:

E_i^(n+1) = ε (σ_c,max / |σ_i^(n)|) E_i^(n) (Eq. 5-4)
Here the factor ε (0 < ε < 1) is chosen to reduce the elastic modulus at a faster rate; the
detailed explanation for the choice of this factor is given in section 5.1.1. By
substituting (Eq. 5-1) into (Eq. 5-4), the following expression is obtained:

E_i^(n+1) = ε (φ β f'c / |σ_i^(n)|) E_i^(n) (Eq. 5-5)

Substituting (Eq. 5-3) into (Eq. 5-5), the rule used to adjust the elastic modulus in stage 2 is
specified as:

E_i^(n+1) = ε (φ f'c / |σ_i^(n)|) E_i^(n) when σ₁ ≤ 0.33√f'c
E_i^(n+1) = 0.6 ε (φ f'c / |σ_i^(n)|) E_i^(n) when σ₁ > 0.33√f'c (Eq. 5-6)

Here, the subscript i indicates the associated node, while the superscript n denotes the
iteration number.
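The update rule at a single node can be sketched as follows. The code is an illustrative Python sketch, not the thesis's Matlab implementation; φ = 0.6 comes from Table 4, while the 0.33√f'c threshold selecting β is an assumption following the Foster (2003) criterion discussed above:

```python
import math

# Sketch of the nodal modulus-update rule: phi = 0.6 for concrete in
# compression (Table 4); the 0.33*sqrt(fc) tensile-stress threshold
# selecting beta is an assumed value following the Foster (2003) criterion.
def update_modulus(E, sigma_c, sigma_1, fc, eps=0.8, phi=0.6):
    """One iteration of modulus softening at a node with compressive
    stress sigma_c and coexisting principal tensile stress sigma_1."""
    beta = 1.0 if sigma_1 <= 0.33 * math.sqrt(fc) else 0.6
    sigma_allow = phi * beta * fc          # allowable compressive stress
    if abs(sigma_c) <= sigma_allow:
        return E                           # criterion already satisfied
    return E * eps * sigma_allow / abs(sigma_c)
```

Nodes already within the limit keep their modulus, so the softening is confined to the overstressed regions and the iteration terminates once the whole field satisfies the criterion.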
Dimensioning Reinforcement
Once this iterative procedure has ensured all the stresses are within the allowable stress
requirements, the reinforcement can be dimensioned to carry all the tensile stresses present
within the structure. As stated in AS3600 (Standards Australia 2009), “reinforcement and/or
tendons shall be provided to carry all of the internal tensile forces, with stresses not exceeding
φf_sy and φf_py, respectively”, where φ is shown in Table 4.
5.1.1 Application to Flexural Reinforced Concrete Beam
To illustrate the adaptive compressive stress redistribution approach, the same flexural
reinforced concrete beam shown in Figure 9 is used as an example.
As expressed in (Eq. 3-14) and (Eq. 3-19), the maximum principal compressive stress allowed within the
beam is 13.5 MPa and the ultimate moment generated from the applied load is 185 kNm. If the
same beam thickness as in the conventional stress block method (300 mm) is used, the
conventional LEFEA method results in a stress field violating the
stress criterion specified by AS3600:

σ_c,max > 13.5 MPa (Eq. 5-7)
Obviously, this problem can be eliminated by increasing the beam thickness. However, the
alternative is to keep using the original beam thickness (300mm) and perform the modified
LEFEA approach to redistribute the over-stressed stresses.
The adaptive compressive stress redistribution approach introduced in section 5.1 is
performed with ε = 0.8 (the selection of ε will be discussed later). Figure 44 presents the
principal compressive stress for the shallow beam by using the adaptive compressive stress
redistribution approach. The results show that the maximum principal compressive stress is
now 13.347MPa, which meets the criteria required by AS3600. Figure 45 indicates the
difference of principal compressive stress for shallow beam between the conventional LEFEA
approach and the adaptive compressive stress redistribution approach. It can be seen that the
peak stresses are successfully redistributed to the stiffer part of the model, while the overall
force equilibrium is preserved.
Figure 44: Principal Compressive Stress for Shallow Beam---MLEFEA
Figure 45: Difference of Principal Compressive Stress for Shallow Beam--- (LEFEA minus MLEFEA)
The stress plot and the elastic modulus variation across the mid-span of the beam are shown in
Figure 46 and Figure 47, respectively.
Figure 46: Stresses across Mid-span after Compressive Stress Redistribution
Figure 47: Relative Value of Elastic Modulus across Mid-span after Compressive Stress Redistribution
Figure 46 shows that all the principal compressive stresses are now below the stress limit
specified by AS3600, which in this case is 13.5MPa. From the stress plot, the tensile force can
be calculated using the trapezoidal rule for integration as:
T = t ∫ σ_t(y) dy (Eq. 5-8)

The area of reinforcement required is therefore:

A_st = T / (φ f_sy) (Eq. 5-9)

To preserve equilibrium, the centroid of the steel area must be located a distance above the
bottom fibre of the beam of:

ȳ = ∫ y σ_t(y) dy / ∫ σ_t(y) dy (Eq. 5-10)
The rate of convergence of this approach is significantly affected by the value of ε in
(Eq. 5-4). To find out the effect of ε, a parametric study was conducted under the same
applied load of 200 kN and beam thickness of 300 mm but using different values of ε, and the
results are shown in Table 5. These results show that values of ε less than one significantly
increase the rate of convergence, but that the smaller the value of ε, the lower the quality of the
solution. Based on this parametric study, a value of ε = 0.8 has been used throughout the rest
of this study.
Table 5: Comparison of MLEFEA with different values of ε

ε            1     0.9   0.8   0.7   0.6   0.5   0.4
Area (mm²)   1451  1459  1469  1480  1500  1510  1520
Xc (mm)      72.3  72.0  71.6  71.0  70.1  69.3  68.5
Iterations   184   8     6     4     4     3     3
5.1.1.1 Cost Comparison and Remarks
Table 6: Approaches Comparison for Shallow Beam

Approaches                                            Area of Steel (mm²)   Concrete Thickness (mm)   Steel Position (mm)
Conventional (Equivalent Stress Block)                1329.6                300                       50
Conventional LEFEA                                    1548.4                407.8                     75
MLEFEA (Adaptive Compressive Stress Redistribution)   1459.3                300                       72
Table 6 compares the cost of the design resulting from the new approach with the designs
resulting from the equivalent stress block method and the conventional LEFEA approach. The
concrete requirements for the new method are now equivalent to those of the equivalent
stress block method and 25% less than those of the conventional LEFEA approach. However,
the steel requirements are still 10% more as the position of the steel is still not in the optimum
position.
5.1.2 Application to Non-flexural Reinforced Concrete Beams without
Rectangular Openings
In this section, the Modified LEFEA approach is applied to the design of the deep beam without
rectangular openings. The geometry and loading of the beam are shown in Figure 14. Using the
conventional LEFEA approach, the analysis reported in section 3.2.2 required that the design
thickness of the beam be 196.38mm. If the beam thickness is decreased, e.g. to 180mm, based
on the linearity of the model, the stresses will then violate the maximum principal compressive
stress allowed by AS3600:
σ_c,max = 13.5 × 196.38 / 180 = 14.73 MPa > 13.5 MPa (Eq. 5-11)
However, by using the adaptive compressive stress redistribution approach, a reduced beam
thickness is possible with the stress criteria being maintained. The adaptive compressive stress
redistribution approach will lead to a more efficient design, as it can redistribute the peak
compressive stresses into the surrounding areas and ensure all the stresses are below the
stress limitation specified by AS3600.
Figure 48 presents the principal compressive stress field computed by conducting the adaptive
compressive stress redistribution with a concrete thickness of 180 mm. From the analysis, the
largest principal compressive stress resulting from the load of 1000 kN is 13.412 MPa, which is
smaller than the stress limit (13.5 MPa) of AS3600.
Figure 48: Plots of Principal Compressive Stress using LEFEA with Adaptive Compressive Stress Redistribution
As with the conventional LEFEA, the actual concrete thickness can be obtained by scaling the
trial thickness by the ratio of the calculated maximum stress to the allowable principal
compressive stress limit:

t = 180 × 13.412 / 13.5 = 178.8 mm (Eq. 5-12)
Figure 49: Plots of Principal Tensile Stress using LEFEA with Adaptive Compressive Stress Redistribution
The principal tensile stresses and the stresses across mid-span are shown in Figure 49 and
Figure 50 respectively. The resultant tensile force can be found by using the trapezoidal rule to
integrate the tensile stress across mid-span of the beam and is:
T = t ∫ σ_t(y) dy (Eq. 5-13)

The quantity of steel reinforcement required at mid-span is approximately:

A_st = T / (φ f_sy) (Eq. 5-14)

The required position of the steel centroid measured from the bottom of the beam is
calculated as:

ȳ = ∫ y σ_t(y) dy / ∫ σ_t(y) dy (Eq. 5-15)
Figure 50: Plot of Stresses at Mid-span
Figure 50 shows that all the stresses are now below the stress criteria (that is 13.5MPa)
required by AS3600.
Figure 51: Stresses across Mid-span after Compressive Stress Redistribution (First and Last Iteration)
Figure 51 illustrates the process of stress redistribution by indicating the first and last iteration
steps. The analysis in the first iteration is the same as the conventional LEFEA method, where
the maximum stresses violate the stress criteria required by AS3600. After the application of
MLEFEA, the peak stresses are redistributed to the surrounding areas and the final stress field
is within the required limit.
5.1.2.1 Cost Comparison and Remarks
Table 7 compares the designs resulting from the conventional STM, the conventional LEFEA
approach and the adaptive compressive stress redistribution approach. It can be seen that the
adaptive compressive stress redistribution approach is more efficient than the conventional
LEFEA approach in terms of the concrete usage, as expected. However, there is no
improvement in the area of steel required, as the steel is still not located in the optimum
position.
Table 7: Approaches Comparison for Deep Beam

Approaches                                            Area of Steel (mm²)   Concrete Thickness (mm)   Steel Position (mm)
Conventional STM (Strut and Tie)                      1250                  246                       250
Conventional LEFEA                                    1397.4                196.4                     341.5
MLEFEA (Adaptive Compressive Stress Redistribution)   1397.4                178.8                     341.1
5.1.3 Application to Non-flexural Reinforced Concrete Beams with
Rectangular Openings
This section demonstrates the application of the adaptive compressive stress redistribution
approach to the deep beam with rectangular openings shown in Figure 21. As discussed earlier,
the stress singularities involved in the deep beam with square or rectangular openings
invalidate the conventional LEFEA approach. To permit direct comparison, the adaptive
compressive stress redistribution approach is performed with a predefined beam thickness of
246 mm, which is the beam thickness used in the conventional STM. The value of ε is 0.8 and a
finer mesh of 10 mm is used so as to compute the stress singularities accurately.
Figure 52 presents the principal compressive stress after conducting the adaptive compressive
stress redistribution approach. The largest principal compressive stress resulting from the load
of 1000 kN is 12.3 MPa.
Figure 52: Plots of Principal Compressive Stress using adaptive compressive stress redistribution approach
As with the conventional LEFEA approach, the actual concrete thickness can be obtained by
scaling the trial thickness by the ratio of the calculated maximum stress to the allowable
principal compressive stress limit:

t = 246 × 12.3 / 13.5 = 224 mm (Eq. 5-16)
Figure 53: Plots of Principal Tensile Stress using LEFEA with Adaptive Compressive Stress Redistribution
The plots of principal tensile stress and stresses across mid-span are shown in Figure 53 and
Figure 54 respectively. The resultant tensile force can be found by integrating the tensile stress
across mid-span of the beam and is:
T = t ∫ σ_t(y) dy (Eq. 5-17)

The quantity of steel reinforcement required at mid-span is approximately:

A_st = T / (φ f_sy) (Eq. 5-18)

The distance of the centroid of the steel from the bottom of the beam is then calculated as:

ȳ = ∫ y σ_t(y) dy / ∫ σ_t(y) dy (Eq. 5-19)
Figure 54: Plot of Stresses at Mid-span
5.1.3.1 Cost Comparison and Remarks
Table 8 compares the designs resulting from the conventional STM, the conventional LEFEA
approach and the adaptive compressive stress redistribution approach. The new approach
results in savings in both concrete and steel in comparison to the more conventional
approaches.
Table 8: Cost Comparison for designs of Deep Beam with Openings

Approaches                                            Volume of Steel (mm³)   Thickness of Concrete (mm)   Steel Position (mm)
Conventional STM (Strut and Tie)                      6132000                 246                          250
Conventional LEFEA                                    5336400                 278.2                        317
MLEFEA (Adaptive Compressive Stress Redistribution)   5330800                 224                          317
5.2 Adaptive Tensile Stress Redistribution Approach
The tensile stresses are all carried by the reinforcement, and the distribution of tensile stresses determines the area and position of the reinforcement. However, following this procedure may
not result in the most efficient structure. For example, if this approach is used to design a
conventional beam, the reinforcement will be placed at the centroid of the calculated tensile
stress, rather than as close as possible to the bottom of the beam (maximizing the lever arm
for the tensile force), as would be done in conventional design. This will result in more steel
being necessary in the finite element based design than in the conventional design. In most
structural design situations, the designer will be able to identify the optimum position of the
tensile steel in a structure before performing the analysis. In doing this he or she will take into
account requirements about the concrete cover for the steel.
To increase the efficiency of the designs resulting from the proposed stress redistribution
process, in this section the steel position is assumed to be pre-defined before the analysis is
performed. The redistribution process is used to ensure the distribution of tensile stresses
matches well to the steel position, in the sense that the position of the tensile stress resultant
coincides with the proposed steel position.
Before considering the redistribution of tensile stress, the procedure for using linear stress
analysis to dimension the reinforcement from the tensile stress distribution is reviewed.
Firstly, linear stress analysis is conducted, and then cross-sectional cuts are taken across the principal tensile stress field, perpendicular to the direction of the principal tensile stress vector, giving plots of the principal tensile stresses.
The resultant tensile force can then be found from the integration of the stress plots along cross-sections using:
T = t ∫ σ(y) dy (Eq. 5-20)
Here t is the thickness of the structure. Plane stress conditions have been assumed.
The required action line of steel can be obtained by determining the centroid of tensile stress
plots via:
ȳ = ∫ y σ(y) dy / ∫ σ(y) dy (Eq. 5-21)
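These two quantities are straightforward to evaluate numerically once the principal tensile stresses have been sampled along a cut. The sketch below uses trapezoidal integration on an illustrative triangular stress distribution; the sample values are made up for demonstration and are not taken from the thesis's analyses.

```python
# Numerical evaluation of the tensile resultant (Eq. 5-20) and its
# centroid (Eq. 5-21) from stresses sampled along a cross-section.
# Illustrative values only: y in mm, stress in MPa, thickness t in mm.

def tensile_resultant_and_centroid(y, sigma, t):
    """Trapezoidal integration of T = t * integral(sigma dy) and
    y_bar = integral(y*sigma dy) / integral(sigma dy), keeping only
    the tensile (positive) part of the stress distribution."""
    sigma_t = [max(s, 0.0) for s in sigma]     # tensile stresses only
    area = moment = 0.0
    for i in range(len(y) - 1):
        dy = y[i + 1] - y[i]
        area += 0.5 * (sigma_t[i] + sigma_t[i + 1]) * dy
        moment += 0.5 * (y[i] * sigma_t[i] + y[i + 1] * sigma_t[i + 1]) * dy
    T = t * area                 # resultant tensile force (N)
    y_bar = moment / area        # centroid height above bottom fibre (mm)
    return T, y_bar

# Triangular tensile distribution over the bottom 300 mm of a section,
# maximum at the bottom fibre:
y = [0.0, 100.0, 200.0, 300.0]
sigma = [6.0, 4.0, 2.0, 0.0]
T, y_bar = tensile_resultant_and_centroid(y, sigma, t=300.0)
print(T, y_bar)   # 270000.0 N (270 kN), centroid ~88.9 mm
```

With finer sampling the computed centroid converges to the exact value of d/3 = 100 mm for a triangular distribution; the coarse four-point sample above gives approximately 89 mm.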
In the case that the position of the steel is pre-defined, the difference between the position given by the above equation and the pre-defined value can be used as the criterion to drive the redistribution of tensile stress. The flowchart of the tensile stress redistribution is shown in Figure 55.
in Figure 55.
Figure 55: Flowchart of the Practical Implementation for Tensile Stress Redistribution
The position criterion here (in Figure 55) refers to the difference Δ between the actual steel position ȳ and the predefined position ȳ₀. In this work, Δ is introduced as:
Δ = |ȳ − ȳ₀| (Eq. 5-22)
The procedure for redistributing the tensile stress suggested here is based on the observation
that the optimum position for steel generally coincides with the peak tensile stress regions,
while the centroid of the tensile stresses will be located at a position of lower stress. For
example, in a simple beam the optimum position for the steel is at the bottom of the beam,
where the tensile stress is maximum. However, linear stress analysis will result in a triangular
distribution of tensile stress with the resultant located 1/3 d from the bottom of the beam,
where d is the distance to the neutral axis. In practical concrete designs, the requirement that
there is sufficient concrete cover over the steel limits the extent to which the steel can be
placed in the optimal position.
Based on the observation that the designer will usually want to shift the centroid of the tensile
stress field from its linear elastic position towards the highest tensile stress position, an
iterative procedure is proposed where the elastic modulus in areas of high tensile stress is
artificially increased. This attracts more tensile stress to these areas, moving the centroid of
the tensile stress field towards the high tensile stress location and increasing the magnitude of
the tensile stress. The increase in elastic modulus in these areas is continued until the position
of the tensile stress centroid matches the selected steel position.
In a similar (but reverse) process to the adaptive compressive stress redistribution process, the
adaptive tensile stress redistribution approach increases the elastic modulus proportionally by
introducing a factor (larger than one) to increase the rate of increase of E. The sensitivity of
the process to the factor will be discussed later. The rule used for the increase of E is:
(Eq. 5-23)
Here the stress entering the rule is the value of tensile stress calculated at each node in the previous iteration, which is compared against the hardening stress. A subscript indicates the associated node, while a superscript denotes the iteration number.
The hardening stress σ_h is chosen on the basis of the maximum tensile stress σ_max multiplied by a factor β (less than one), and can be represented as:
σ_h = β σ_max (Eq. 5-24)
Numerical trials showed that, in order for the tensile stress centroid to converge reliably to the selected steel position, the value of the hardening factor has to be progressively decreased. To do this, another two parameters were introduced, with:
(Eq. 5-25)
The main steps of the adaptive tensile stress redistribution approach are:
Step 1: Perform a conventional linear analysis and find the position of the tensile stress resultant;
Step 2: Compare this position with the pre-selected steel position and obtain their difference from (Eq. 5-22);
Step 3: If the difference is smaller than the tolerance, proceed to Step 4. Otherwise, find the maximum tensile stress, choose a value of the factor to set the hardening stress via (Eq. 5-24), perform the stress hardening as shown in (Eq. 5-23), re-run the linear analysis and return to Step 2;
Step 4: Find the position and area of reinforcement required to carry all the principal tensile stress in the concrete.
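The loop formed by Steps 1 to 3 can be sketched schematically as follows. This is a toy illustration, not the thesis's implementation: the finite element solve is replaced by a model in which nodal stress is the product of a fixed strain profile and the nodal elastic modulus, so that stiffened regions attract proportionally more stress. The names gamma, beta and tol are illustrative placeholders, and beta is held fixed here rather than progressively decreased as the thesis does.

```python
# Schematic of the adaptive tensile stress redistribution loop (Steps 1-3).
# The "linear analysis" is a toy model: sigma_i = E_i * strain_i over a
# fixed strain profile, so stiffening a node attracts more stress to it.
# gamma (> 1), beta (< 1) and tol are placeholder parameter names.

def centroid(y, sigma):
    return sum(yi * si for yi, si in zip(y, sigma)) / sum(sigma)

def redistribute(y, strain, y_target, gamma=1.05, beta=0.8, tol=1.0):
    E = [1.0] * len(y)                    # relative elastic moduli
    for iteration in range(1000):
        sigma = [Ei * ei for Ei, ei in zip(E, strain)]   # "linear analysis"
        y_bar = centroid(y, sigma)
        if abs(y_bar - y_target) < tol:   # position criterion (Eq. 5-22)
            return y_bar, iteration
        sigma_h = beta * max(sigma)       # hardening stress (Eq. 5-24)
        # stiffen nodes whose tensile stress exceeds the hardening stress
        E = [Ei * gamma if si > sigma_h else Ei
             for Ei, si in zip(E, sigma)]
    return y_bar, iteration

# Tension zone 0-300 mm above the bottom fibre, triangular strain profile
# (maximum at the bottom); shift the resultant from ~97 mm down to 60 mm.
y = list(range(0, 301, 10))
strain = [(300.0 - yi) / 300.0 for yi in y]
y_bar, n = redistribute(y, strain, y_target=60.0)
print(round(y_bar, 1), n)
```

Increasing E where the stress exceeds the hardening threshold progressively pulls the centroid towards the peak-stress region, mirroring the behaviour described above; in a real application each pass through the loop is a full finite element solve.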
5.3 Adaptive Stress Redistribution Approach for both
Compressive and Tensile Stress
The two stress redistribution processes can be performed together, as shown in Figure 56:
Figure 56: Flowchart of the Practical Implementation for both Compressive and Tensile Stress Redistributions
The application of this adaptive method will be introduced in the following sections.
5.3.1 Application to Flexural Reinforced Concrete Beam
To demonstrate the adaptive stress redistribution for both compressive and tensile stress, the
flexural reinforced concrete beam shown in Figure 9 is used again. The preselected steel
position from the bottom of the beam is 50mm, which is the same as the steel position used in
the conventional design approach.
The parameters used for the adaptive stress redistribution for both compressive and tensile
stress are ; ; ; ; ; . The resulting stress plot
across the mid-span of the beam is shown as Figure 57, and the elastic modulus variation
across mid-span of the beam is shown in Figure 58.
Figure 57: Stresses across Mid-span after Stress Redistribution for both Compressive and Tensile Stresses
Figure 58: Relative Value of Elastic Modulus across Mid-span after Stress Redistribution for both Compressive and
Tensile Stresses
The tensile force is calculated through:
T = t ∫ σ(y) dy (Eq. 5-26)
With the area of reinforcement:
(Eq. 5-27)
The position of the steel centroid above the bottom fibre of the beam is:
ȳ = ∫ y σ(y) dy / ∫ σ(y) dy (Eq. 5-28)
As hoped, this is very close to the target 50mm.
5.3.1.1 Cost Comparison and Remarks
At this stage four different approaches have been used for the design of the flexural beam. These
approaches are the conventional equivalent stress block approach, the conventional LEFEA
approach, the adaptive compressive stress redistribution approach and the adaptive stress
redistribution approach for both compressive and tensile stress. A comparison of the designs
resulting from these four different approaches is shown in Table 9.
Table 9: Approaches Comparison for Shallow Beam (with Updated MLEFEA)

Approach | Area of Steel (mm²) | Concrete Thickness (mm) | Steel Position (mm)
Conventional (Equivalent Stress Block) | 1329.6 | 300 | 50
Conventional LEFEA | 1417.9 | 384 | 75.29
MLEFEA (Adaptive Compressive Stress Redistribution) | 1459.3 | 300 | 72
MLEFEA (Adaptive Stress Redistribution for Both Compressive and Tensile Stress) | 1213.5 | 300 | 50.58
Table 9 shows that the new MLEFEA method with redistribution of both compressive and
tensile stress provides the most efficient design in terms of both concrete and steel use.
In order to compare these four approaches more thoroughly, a parametric study was carried out, performing several parallel designs of the same beam, each under an applied load of 200 kN, but with a range of thicknesses: 200 mm, 250 mm, 300 mm, 350 mm, 400 mm and 450 mm. The results are shown in Figure 59.
Figure 59: Design Results for Shallow Beam
[Figure 59 plots Area of Steel (mm²) against Thickness of Beam (mm) for the shallow beam, with series MLEFEA (comp), LEFEA, Equivalent Stress Block and MLEFEA (both).]
Figure 59 clearly shows the trade-off between the thickness of the beam and the area of reinforcement, which gives designers the flexibility to choose which material they want to save during the design. Compared to both the conventional equivalent stress block approach and the adaptive stress redistribution for both compressive and tensile stress, the design produced by the conventional LEFEA method is extremely conservative and wastes material, in terms of both concrete and steel.
There are only slight differences between the adaptive stress redistribution for both compressive and tensile stress and the conventional equivalent stress block approach. The approach involving stress redistribution for both compressive and tensile stresses uses the same steel position as the conventional method. Overall, the MLEFEA approach gives similar (though slightly better) results than the conventional equivalent stress block approach.
The reason for the slight differences between the conventional equivalent stress block method and the MLEFEA method is that, although the total volume of the equivalent compressive stress block approximately equals that of the MLEFEA compressive stress block, there is a small difference in the distribution. This is shown in Figure 60: the volumes of the compressive stress blocks (negative stresses are compressive) for all three approaches are quite similar.
Figure 60: Stress Blocks for Different Approaches
5.3.1.2 Nonlinear verification
To verify the safety of this proposed approach, ABAQUS/CAE is used to conduct the non-linear
finite element analysis of the final design produced by the new approach. The damaged
plasticity model is used to describe the behaviour of concrete, while an elastic-plastic model is
used to model the reinforcement. The connection between concrete and steel is considered as
embedded. For the spring supports, two reference points are used to apply the boundary
conditions on the two supports.
Based on the non-linear analysis, the load vs. deflection curve of this shallow beam is shown in Figure 61, which indicates that the ultimate mid-span load is higher than the design load of 200 kN. This result means the design is safe and the approach is reasonable.
[Figure 60 plots the stresses across mid-span (MPa) against the distance from the bottom fibre of the beam (mm), with series MLEFEA (Both), MLEFEA (OnlyComp) and Equivalent Stress Block.]
Figure 61: Load vs. Deflection Curve for Shallow Beam
5.3.2 Application to Non-flexural Reinforced Concrete Beams without
Rectangular Openings
The adaptive stress redistribution for both compressive and tensile stress is now used to
design the non-flexural reinforced concrete beam without rectangular openings shown in
Figure 14. The pre-selected steel position is 250mm, which is the same as the steel position for
the conventional STM, and is much lower than that for the conventional LEFEA method
(341.5mm).
The parameters used in this application are ; ; ; ; ;
. The stress plot across the mid-span of the beam is shown in Figure 62, and the elastic modulus variation across the mid-span is presented in Figure 63.
Figure 62: Stresses across Mid-span after Stress Redistribution for both Compressive and Tensile Stresses
Figure 63: Relative Elastic Modulus along Mid-span after Stress Redistribution for both Compressive and Tensile
Stresses
The tensile force is calculated as:
T = t ∫ σ(y) dy (Eq. 5-29)
The area of reinforcement is:
(Eq. 5-30)
The target steel centroid position was 250 mm above the bottom fibre of the beam, and the
position achieved was very close to the target at:
ȳ = ∫ y σ(y) dy / ∫ σ(y) dy (Eq. 5-31)
5.3.2.1 Cost Comparison and Remarks
The designs resulting from the four different approaches are summarised in Table 10.
Table 10: Approaches Comparison for Deep Beam (with Updated MLEFEA)

Approach | Area of Steel (mm²) | Concrete Thickness (mm) | Steel Position (mm)
Conventional STM (Strut and Tie) | 1250 | 246 | 250
Conventional LEFEA | 1397.4 | 196.4 | 341.5
MLEFEA (Adaptive Compressive Stress Redistribution) | 1397.4 | 181.2 | 341.1
MLEFEA (Adaptive Stress Redistribution for Both Compressive and Tensile Stress) | 1292.3 | 180 | 249.1
The adaptive stress redistribution for both compressive and tensile stress results in the lowest use of concrete (a 27% reduction over the strut-and-tie design) and only a small increase in steel over the strut-and-tie approach (3%).
Figure 59 showed that, for shallow beams, there is a clear trade-off between using less concrete and using more steel to support a given load. The parametric study was repeated for deep beams. Figure 64 shows the comparison between these approaches for the designs of the deep beam without openings under the same applied load (1000 kN) with different beam thicknesses. The selected steel position is 250 mm from the bottom of the beam. The benefit of the MLEFEA approach is clear in terms of concrete saving when compared to the conventional LEFEA and STM approaches. Compared to shallow beams, there is much less of a trade-off between concrete and steel.
For the design of deep beams without openings, the MLEFEA with compressive stress redistribution requires the same area of steel as the conventional LEFEA, while requiring less concrete. In terms of steel area, designs based on the MLEFEA with both stresses redistributed are quite similar to the strut-and-tie design, while using significantly less concrete.
Figure 64: Design Results for Deep Beam
At the same time, it can be seen that for the MLEFEA approach, there is a limit to how far the
thickness can be decreased, as a minimum amount of concrete is always needed to carry the
compressive stress within the beam.
5.3.2.2 Non-linear Verification
As was done for the shallow beam, to verify the safety of the proposed approach ABAQUS/CAE
is used to conduct the non-linear finite element analysis of the final design produced by the
MLEFEA with both compressive and tensile stress redistribution.
[Figure 64 plots Area of Steel (mm²) against Thickness of Beam (mm) for the deep beam, with series MLEFEA (comp), LEFEA, Strut and Tie and MLEFEA (both).]
The load vs. deflection curve of the deep beam is shown in Figure 65, which indicates that the ultimate mid-span load is higher than the applied load (1000 kN), and therefore the design based on the MLEFEA approach is safe and the approach is reasonable.
Figure 65: Load vs. Deflection Curve for Deep Beam
5.3.3 Application to Non-flexural Reinforced Concrete Beams with
Rectangular Openings
In this section, the MLEFEA with redistribution of both compressive and tensile stresses is
applied to the design of the non-flexural reinforced concrete beam with rectangular openings.
The geometry of the beam is shown in Figure 21.
The parameters used in this application are ; ; ; ; ;
. The resulting stress plot across the mid-span of the beam is presented in Figure 66.
Figure 66: Stresses across Mid-span after Stress Redistribution for both Compressive and Tensile Stresses
The tensile force is calculated as:
T = t ∫ σ(y) dy (Eq. 5-32)
The area of reinforcement is:
(Eq. 5-33)
The target position of the steel centroid was again 250mm above the bottom of the beam, and
the achieved position was:
ȳ = ∫ y σ(y) dy / ∫ σ(y) dy (Eq. 5-34)
5.3.3.1 Cost Comparison and Remarks
For the designs of the deep beam with rectangular openings, Figure 67 illustrates the differences between the various approaches in terms of steel and concrete usage. The data in this figure was generated by conducting similar designs using the same parameters under the same applied load (1000 kN), but with different beam thicknesses. The same steel position is selected for both the
conventional STM and the MLEFEA approach. It is clear from Figure 67 that the MLEFEA
approach generates more efficient designs in terms of concrete saving than both the
conventional LEFEA and the STM. This is because the MLEFEA approach solves the problem of
stress singularities violating the stress criterion in AS3600 and fully utilizes the strength capacity
of the concrete within the structure.
As with deep beams without web openings, although the MLEFEA approach is more efficient in saving concrete than the conventional approaches, there is a limit to how far the stress redistribution approach can reduce the concrete thickness.
Figure 67: Design Results for Deep Beam with Openings
5.3.3.2 Nonlinear verification
With the aid of ABAQUS, non-linear finite element analysis is performed to verify the safety of
the design for the deep beam with rectangular openings produced by the adaptive stress
redistribution of both compressive and tensile stress. The load vs. deflection curve of the beam obtained from this analysis is shown in Figure 68, which indicates that the ultimate mid-span load is higher than the applied load (1000 kN), so the design is safe and the approach is reasonable.
[Figure 67 plots Area of Steel (mm²) against Thickness of Beam (mm) for the deep beam with openings, with series MLEFEA (comp), LEFEA, Strut and Tie and MLEFEA (both).]
Figure 68: Load vs. Deflection Curve for Deep Beam with Openings
5.4 Summary
This chapter demonstrates the application of the new MLEFEA adaptive stress redistribution
approach to three different types of simple structures, namely shallow (flexural) beams, deep
beams, and deep beams with web openings. The efficiency of the resulting designs is examined.
For the adaptive stress redistribution approach, both adaptive compressive stress
redistribution and adaptive stress redistribution for compressive and tensile stress are
investigated.
For the shallow beams, the stress redistribution approach results in designs similar to those of the conventional equivalent stress block approach, indicating that it is able to obtain designs which are close to the maximum efficiency possible.
For the deep beams, the proposed approach generates more efficient designs than the conventional LEFEA and strut-and-tie approaches in terms of concrete savings, as it fully considers the contribution of the concrete to the overall strength of the structure. This saving is significant, as the approach can be used in the design of other members, not only beams.
For the deep beams with web openings, the proposed approach overcomes the difficulties
confronted in conventional LEFEA with respect to stress singularities. The adaptive stress
redistribution can obtain a stress field suitable for design by successfully removing the stress
singularities.
Furthermore, preliminary tests of the designs resulting from the new approach were
performed using non-linear finite element analysis through ABAQUS. Analysis results show the
designs produced on the basis of the proposed approach are safe, collapsing well after
reaching the design load. As the stress fields resulting from the proposed approach are statically admissible, in equilibrium with the applied loads, and satisfy the yield condition, designs generated using the approach are reasonable and comply with the lower bound theorem of plasticity.
This chapter has shown that the adaptive stress redistribution approach for both compressive
and tensile stress can lead to more efficient designs than the conventional approaches,
particularly in terms of concrete savings. Therefore, this approach has the potential to reduce carbon emissions and environmental pollution.
However, the approach requires use of a fine finite element mesh and an iterative procedure,
with the global stiffness matrix being reconstructed at each step. The remaining part of this
study will look at the possibility of speeding up the computational process by using GPUs.
PART II: EFFICIENT GRAPHICS PROCESSING UNIT (GPU) IMPLEMENTATION
The first part of this thesis successfully developed an efficient MLEFEA approach which leads to more efficient designs than either the conventional strut-and-tie approach or the LEFEA approach.
This new approach performs the stress redistribution adaptively to generate a suitable stress
field which is in equilibrium both internally and externally, does not exceed a specified yield
criterion anywhere, and has the tensile stress resultant at a location selected by the designer.
In order to achieve this result the process may require many iterations, and within each iteration a finite element analysis with a fine mesh must be performed. The stiffness matrix changes in each iteration, due to changes in the elastic modulus. Consequently, the factorisation of the stiffness matrix in one iteration cannot be reused in the next, and the process is potentially computationally expensive. This part of the thesis examines how the
Graphics Processing Units (GPUs) on a normal PC graphics card can be used to accelerate the process.
This part of the thesis will discuss the GPU, its use in finite element analysis, and its application
to the new MLEFEA approach. To begin with, the basic theory of GPU processing is introduced,
and the literature regarding the GPU and its use in finite element analysis is reviewed. This is followed by an investigation of the CSR storage format for the stiffness matrix and the SpMV algorithm. After presenting the GPU-based Preconditioned Conjugate Gradient (PCG) method and the process required to assemble the stiffness matrix in CSR format, this part ends with a speed comparison between CPU and GPU algorithms implementing the new MLEFEA approach applied to an
example of a deep beam with rectangular openings.
6 Basic Theory & Literature Review
In this chapter, the basic theory of GPU programming is introduced and the literature about
GPU use in finite element analysis is reviewed. For the GPU basic theory, the books
“Programming massively parallel processors: a hands-on approach” (Kirk & Hwu 2010) and the
“NVIDIA CUDA C Programming Guide” (NVIDIA Corporation 2007) are heavily referenced. For
the GPU usage in finite element analysis, GPU methods for stiffness matrix assembly and
solving are presented, with an emphasis on stiffness matrix solving, where both direct and
iterative solvers are explained. Iterative solvers based on the Jacobi method, Gauss-Seidel
method, and conventional and preconditioned conjugate gradient methods are introduced.
Research reported in the literature about the most time-consuming part of iterative solvers,
Sparse Matrix Vector Multiplication (SpMV), is also reviewed.
6.1 Graphics Processing Unit (GPU)
Motivated by the strong competition in the gaming industry, the programmable Graphics Processing Unit, or GPU, was originally designed to accelerate computer graphics applications. These days the GPU is considered a good alternative to the CPU for applications requiring high computing power and speed, because it offers higher computation speed at a lower price. For example, in 2009 the ratio between GPUs and CPUs in terms of potential peak floating-point performance was about 10 to 1 (1 teraflop to 100 gigaflops). Most significantly, the performance of GPUs is still growing rapidly, while the improvement of CPUs is relatively slow.
The main reason for this huge difference in speed is the different design philosophies of CPUs and GPUs, as shown in Figure 69. CPUs are designed mainly to optimise sequential code performance, making full use of sophisticated control logic to maintain the appearance of sequential execution. Large cache memories are used to reduce the data access latencies of complex applications. However, both the control logic and the cache memories consume chip area and so reduce the CPU's potential peak floating-point performance. In addition, because CPUs must satisfy the bandwidth requirements of legacy applications, operating systems and input/output devices, it is very difficult for them to match the memory bandwidth of graphics chips, which benefit from frame buffer requirements and a relaxed memory model.
Figure 69: Different Design Philosophies for CPUs and GPUs
In contrast, the design philosophy of the GPUs is to dedicate the maximum chip area to the
floating-point calculations by minimizing the control logic execution. Unlike CPUs, there are
fewer legacy requirements and simpler memory models for GPUs, so GPUs can achieve higher
bandwidth. Most importantly, because there are a large number of threads in GPUs, the GPU
hardware can automatically find some threads to execute when some of the other threads are
waiting for long-latency memory accesses such as the global memory accesses. Furthermore,
small cache memories are used in GPUs to help control the bandwidth requirement, so threads accessing the same memory data do not all need to go to the Dynamic Random Access Memory (DRAM). Consequently, much more chip area can be used for calculation.
Besides the above mentioned advantages in terms of performance, there are some other
reasons for the popularity of GPUs. Firstly, GPUs are very cheap and are supplied in most PCs, in contrast to traditional parallel computing systems, which are accessible to far fewer users. Also, modern GPUs support the Institute of Electrical and Electronics Engineers
(IEEE) floating point standard, which makes GPUs suitable for different numerical applications.
In addition, instead of using conventional, application-limited Application Programming Interface (API) functions such as OpenGL or Direct3D to access the graphics chip, GPUs can easily be programmed in parallel using the Compute Unified Device Architecture (CUDA) programming language. CUDA extends standard C/C++ with a set of special GPU functions, enabling developers to program the GPU easily without knowledge of the conventional graphics pipeline.
More recently, other ways of programming GPUs have been developed, such as CUDA Fortran, which was developed by PGI and NVIDIA and is only available in PGI 2010 and later releases. It includes a Fortran 2003 compiler and tool chain for programming NVIDIA GPUs using Fortran. Unlike the CUDA C compiler, the CUDA Fortran compiler is not free, and is thus less popular. Another way of accessing the power of GPUs is through MATLAB CUDA, which provides the basis for CUDA GPU-accelerated MATLAB operations. There are three ways to accelerate MATLAB using the GPU: the MATLAB plug-in for CUDA using MEX files, the Jacket Engine for MATLAB acceleration, and GPUmat (Zhang et al. 2011). However, MATLAB CUDA is still maturing and has quite a number of limitations, and its usage is confined to some specific problems.
A CUDA program runs on a host, which is a traditional Central Processing Unit (CPU), and accesses one or more GPU devices, which are parallel processors. The sequential host program running on the CPU can call kernels written in CUDA to run on the GPU devices. The host and device code is separated by the NVIDIA C compiler (nvcc). The host code is conventional ANSI C code and is compiled as a standard CPU process, while the device code, written using ANSI C extensions and compiled by nvcc, is executed on a GPU device. The execution of a CUDA program is shown in Figure 70. Execution starts with the host serial code (CPU code); when a kernel function is invoked, control passes to the device parallel code (GPU code), where the threads being executed in the kernel are organized as a grid. When all the threads in a kernel have executed completely, the corresponding grid terminates and the code continues to execute on the host until another kernel is invoked.
Figure 70: Execution of a CUDA Program
To achieve parallel execution, a CUDA program follows the Single Instruction, Multiple Threads (SIMT) model, in which each instruction is executed in parallel by a set of threads. The threads are organized into blocks, with one or more blocks constituting a kernel. The hierarchy of CUDA threads is presented in Figure 71, where only a small number of threads are shown for simplicity.
Generally, grids are two-dimensional arrangements of blocks, and all blocks in a grid have the same dimensions. Each dimension of a grid can range from 1 to 65535. Blocks are usually three-dimensional arrangements of threads. The maximum number of threads in a block is 512, with flexibility in how the threads are distributed across the three dimensions, as long as the total number of threads in the block does not exceed 512. Because of this hierarchical architecture, every thread has a unique integer index within its block, and every block has a unique integer index within its grid.
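The unique global index of a thread is obtained by flattening its block and thread coordinates. The arithmetic can be illustrated in plain Python for the simple one-dimensional case (blockIdx, blockDim and threadIdx are modelled here as ordinary arguments rather than CUDA built-ins):

```python
# Flattening CUDA's hierarchical (block, thread) coordinates into the
# unique global index each thread uses to select its data element.
# A one-dimensional grid of one-dimensional blocks is assumed.

def global_thread_index(block_idx, block_dim, thread_idx):
    # Equivalent to CUDA's: blockIdx.x * blockDim.x + threadIdx.x
    return block_idx * block_dim + thread_idx

# With blocks of 256 threads, thread 10 of block 2 handles element 522:
print(global_thread_index(2, 256, 10))   # 522

# Every (block, thread) pair maps to a distinct global index:
indices = {global_thread_index(b, 256, t)
           for b in range(4) for t in range(256)}
assert len(indices) == 4 * 256
```

In a kernel this index typically selects the vector entry or matrix row a thread works on, which is how a data-parallel loop is distributed over the grid.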
Figure 71: Hierarchy of CUDA Threads
In order to mitigate the global memory's long access latencies (hundreds of clock cycles) and finite access bandwidth, CUDA provides several types of memory in a unique memory hierarchy, illustrated in Figure 72. At the bottom of Figure 72 are the global memory, the constant memory and the texture memory, all of which are available to every grid and can be written and read by the host code (running on the CPU) by calling API functions. The constant memory and the texture memory are read-only to the device code (running on the GPU), and offer faster data transfer and more parallel access paths than the global memory. Both the host code and the device code can write to the global memory.
Registers are located in every individual thread, and each thread can only access its own
corresponding registers. Shared memory is located in every thread block, and all the threads
within a block can read and write data to this block’s shared memory. This is a very efficient
way for threads in a block to share and incorporate data. In CUDA, the shared memory is
usually used to store the portion of global memory which will be heavily used in a kernel
execution.
Figure 72: Hierarchy of GPU Memory
For thread execution, the most important issue is thread scheduling. In CUDA, once a kernel is
invoked, each block is assigned to a streaming multiprocessor (SM), and the threads in a block
are divided into warps, each of which consists of 32 threads. Warps are the units through which
a CUDA program tolerates long-latency operations. When the next instruction for the threads of
a warp must wait on a long-latency operation, that warp is set aside, and another warp which
does not need to wait is selected for execution. If more than one warp is ready for execution, a
priority mechanism is used to select among them. This process is known as latency hiding.
As there are a large number of threads and thus a great number of warps in any CUDA
execution, the hardware is able to find a warp to execute at any time. This latency hiding
makes full use of the capacity of the hardware, despite the presence of long-latency
operations. Partly as a result of this, GPUs achieve greater performance in terms of computing
time compared to CPUs.
Because of their peak computing capability and large memory bandwidth, GPUs are now being
widely used for scientific computations, including
physics simulations, cloth simulations (Cecka et al. 2010), fluid dynamics (Elsen et al. 2008;
Goddeke et al. 2009), finite element simulations (Galoppo et al. 2005), as well as many other
applications (Cevahir et al. 2010b).
The finite element method involves a huge number of operations which could potentially be
performed in parallel, such as stiffness matrix assembly and the solution of the stiffness
equations. This makes it possible to use GPUs to accelerate the FEM. The GPU work reported in
this thesis optimises the solution of the stiffness equations using a GPU Preconditioned
Conjugate Gradient (GPU-PCG) approach, which makes full use of the massively parallel
computation features of the GPU.
6.2 GPU Implementation of Finite Element Analysis
6.2.1 Stiffness Matrix Assembly
In finite element analysis, stiffness matrix assembly is the stage of the process in which the
nodal data, element connectivity and boundary conditions are used to assemble the linear system
of equations. Because each element contributes only to the equations associated with its own
nodes, the assembly is well suited to parallel GPU implementation. The GPU naturally fits a
structured mesh, where the patterns are regular and no further mesh connectivity information is
needed. For an unstructured mesh, Cecka, Lew et al. (Cecka et al. 2011) have investigated the
use of a single GPU to accelerate the stiffness matrix
assembly. They introduced three different ways to assemble the stiffness matrix: assembly by
Non-Zero element (NZ), assembly by row, and assembly by element. They also provided a
detailed description of assembly by element via colouring (Komatitsch et al. 2009), and
assembly by NZ in local memory (Bolz et al. 2003), in global memory (Tejada & Ertl 2005) and
in shared memory. Based on the examples of geometric flow and fluid simulation, Bolz and
Farmer et al. (Bolz et al. 2003) also implemented a conjugate gradient solver (Shewchuk 1994)
and a multigrid solver on GPU hardware for unstructured and structured meshes respectively, and
demonstrated the powerful potential of these fundamental computational kernels when run on the
GPU. Rodriguez-Navarro and Susin (Navarro & Susin 2006) presented an FEM cloth simulation
implemented on the GPU, which successfully detected cloth collisions and self-collision by
using image-based collision methods. The visual results showed that the GPU-based approach was
more efficient than two conventional methods.
In order to turn a large nonlinear optimization problem into a GPU-suitable process, Hillesland
et al. (Hillesland et al. 2005) developed a framework for building image-based models on
graphics hardware, taking advantage of minimal storage overhead and the absence of a resampling
step. Klöckner et al. (Klöckner et al. 2009) discussed the implementation of discontinuous
Galerkin methods on the GPU and applied the implementation to Maxwell's equations, while
Komatitsch and Michéa et al. (Komatitsch et al. 2009) discussed the assembly of high-order
continuous Galerkin methods using a colouring scheme to handle the summation operations over
nodes. The performance results showed that a maximum speedup of 25 could be obtained for a
seismic wave propagation problem.
6.2.2 Stiffness Matrix Solving
After the assembly of the stiffness matrix and generation of the linear equation system KU=F,
the next step is to solve the equations for the unknowns U, which in the field of structural
engineering normally contain the nodal displacements. Direct or iterative solvers are used to
solve the sparse linear equation system. Direct solvers, including Cholesky decomposition, QR
decomposition, Gauss elimination and LU decomposition (Meyer 1988), are extensively used for
dense matrices. Jung and O'Leary (Jung & O'Leary 2006) presented an efficient GPU
implementation of Cholesky decomposition for solving dense symmetric positive definite linear
systems. However, due to the lack of double precision support on the GPU, the interior point
algorithm presented did not converge as well as the double precision CPU implementation. Volkov
and Demmel (Volkov & Demmel 2008) demonstrated significant speedups by using the GPU in the
three most widely used factorizations in dense linear algebra, namely LU factorization, QR
factorization and Cholesky factorization. Ino et al. (Ino et al. 2005) introduced a GPU
implementation of LU decomposition, and concluded that numerical errors invalidated the
GPU implementation of LU decomposition because of the lack of double-precision support.
With the aid of the GPU, Göddeke et al. (Göddeke et al. 2005) developed a mixed precision
defect correction approach to achieve double precision accuracy in finite element simulation,
while still exhibiting improved performance compared to a double precision CPU solver. Since
then, problems arising from the lack of precision have been overcome by the emergence of GPUs
supporting double precision. Galoppo et al. (Galoppo et al. 2005) presented a novel GPU-based
LU factorization for solving dense linear systems by reducing the problem to a series of
rasterization problems on the GPU. The data representations were chosen to match the blocked
rasterization order and cache pre-fetch technology of the GPU, and the results showed that this
GPU-based LU factorization uses cache and bandwidth more efficiently than conventional
approaches. In 2007, based on the iterative refinement algorithm (Buttari et al. 2007),
Barrachina et al. (Barrachina et al. 2008) developed a new padding and hybrid GPU-CPU iterative
refinement algorithm which obtains full accuracy in the solution of dense linear systems.
Research reported in several papers (Tomov et al. 2010; Baboulin & Volkov 2008; Tomov et al.
2009) has introduced a GPU-based dense linear algebra library, Matrix Algebra on GPU and
Multicore Architectures (MAGMA), which is similar to the LAPACK library but targets hybrid
architectures. Besides MAGMA, other libraries are also available, such as CUDAZTEC (Neckels)
and GPUmatrix (Bonneel), which provide solvers for GMRES and for conventional decompositions
(LU, QR and Cholesky) respectively.
As for iterative solvers, the Jacobi method, the Gauss-Seidel method, the conjugate gradient
method (CG), multigrid methods and the preconditioned conjugate gradient method (PCG) are all
widely used. The Jacobi method is easily derived by examining each of the n equations in the
linear system Ax=b in isolation, and works well if the system is dominated by its diagonal
elements. Because the equations are treated independently, the Jacobi method is ideally suited
to parallel programming, where all the equations can be solved concurrently. However, although
it is very easy to understand and implement, its convergence rate is slow, and so it is not a
common first choice (Bathe & Wilson 1976; Demmel 1997).
While following a similar process to the Jacobi method, the Gauss-Seidel method examines the
equations one at a time in sequence and uses updated values as soon as they become available
within the current sweep. Generally speaking, if the Jacobi method converges, the Gauss-Seidel
method will converge faster, though still relatively slowly (Saad 2003; Demmel 1997).
In contrast, the Conjugate Gradient (CG) method is well known for its efficiency in solving
symmetric positive definite systems. As its name suggests, the method generates a sequence of
mutually orthogonal residuals together with conjugate search directions constructed from those
residuals; CG is in fact a special case of the method of Conjugate Directions (Shewchuk 1994).
It is the most effective method for solving symmetric positive definite equations. Furthermore,
because there is little data dependency in the algorithm, no significant restructuring is
needed to run it in a parallel environment: only the matrix-vector product, the parallel
reduction, the two vector updates and the inner product routines need to be parallelised. In
structural engineering the coefficient matrix is sparse, symmetric and positive definite, and
so CG is the method of choice for solving such problems (Shewchuk 1994; Demmel 1997).
Importantly, the convergence rate of iterative methods depends greatly on the spectrum of the
coefficient matrix. One way to increase the convergence rate is therefore to transform the
system of linear equations into one with the same solution set but a more favourable spectrum.
A preconditioner is a matrix which performs such a transformation.
As explained above, CG is ideal for solving the stiffness equations, as the stiffness matrix is
symmetric positive definite. In this work the preconditioned conjugate gradient method (PCG) is
chosen, because in PCG the preconditioned coefficient matrix has a more favourable spectrum,
leading to a higher convergence rate (Shewchuk 1994).
Simply put, PCG is the same as CG except that it applies a preconditioner to increase the
convergence speed. The simplest preconditioner is a diagonal matrix whose diagonal elements are
the same as those of the original coefficient matrix. Applying this preconditioner is known as
diagonal preconditioning or Jacobi preconditioning, and the diagonally preconditioned conjugate
gradient method (Shewchuk 1994) is used in the work presented here.
In order to optimise the solution of the linear equation system, a significant amount of
research on the application of GPUs has used conjugate gradient (Wiggers et al. 2007; Bolz et
al. 2003; Krüger & Westermann 2003) and multigrid techniques (Göddeke et al. 2008; Göddeke et
al. 2005). As for the preconditioned conjugate gradient method (PCG) used in this work, Buatois
et al. (Buatois, Caumon & Levy 2009) developed a general sparse linear solver called the
Concurrent Number Cruncher (CNC), based on PCG with the block compressed row storage (BCRS)
format for the matrices. This solver proved efficient, but only for general sparse matrices, as
it results in non-optimal global memory access. In addition to single GPU implementations, a
number of researchers (Playne & Hawick 2010; Cevahir et al. 2009; Cevahir, Nukada & Matsuoka
2010a; Ament et al. 2010) have described multiple-GPU implementations of the conjugate gradient
method with good accuracy and performance. In all of the above iterative solvers, the sparse
matrix-vector multiplication (SpMV) is of particular importance: it is the most time consuming
part and the bottleneck for acceleration of these solvers.
In fact SpMV is one of the most important computational operations in sparse matrix
computation, and is used extensively in the iterative methods for solving large linear equation
systems (Ax=b) and eigenvalue problems (Ax=λx), where many matrix-vector products are required
to reach convergence. Because of its importance, there is a large body of literature concerning
SpMV operations on the GPU. Garland (Garland 2008) explored the application of GPUs to general
SpMV using the compressed sparse row (CSR) representation for general unstructured sparse
matrices, and proposed scan (data-parallel) primitives which convert seemingly irregular
computation into regular computation that can be implemented on massively parallel hardware
such as GPUs. Sengupta et al. (Sengupta et al. 2007) applied segmented scan to SpMV, and
concluded that the scan primitives are an excellent match for a broad set of problems on
parallel hardware, and specifically on the GPU. Shahnaz and Usman (Shahnaz & Usman 2007)
developed an efficient sparse matrix-vector multiplication approach in which the matrices are
stored in a Transposed Jagged Diagonal Storage (TJDS) format. This format is particularly
suitable for parallel and distributed processing because the references to the non-zero values
of the matrix are preserved by the data partition scheme. Bell and Garland (Bell & Garland
2009) presented a GPU implementation of SpMV for various matrix storage formats, and designed a
hybrid (HYB) format for matrix storage which has proved to be one of the fastest formats for
unstructured matrices. They also compared the performance of SpMV for various sparse matrix
storage schemes and various patterns of sparse matrices (Bell & Garland 2009; Bell & Garland
2008). In the work reported here the Compressed Sparse Row (CSR) storage format is used, as it
is a popular format which is easy to implement and widely used (Garland 2008).
6.3 Summary
This chapter introduced the basic concepts of GPUs and reviewed the literature about the
implementation of FEM on GPUs, including both stiffness matrix assembly and stiffness matrix
solving. Much of the literature highlights the important role of the SpMV operation in
determining the computing speed of FEM problems.
In order to optimize the new MLEFEA approach for concrete design, the work presented here
used a GPU-based PCG approach to solve the FEM stiffness matrix equations. The
implementation of this approach will be described in the following chapter.
120
7 Efficient GPU Implementation of the Modified LEFEA
Approach
This chapter starts with an introduction to the CSR storage format used in the GPU
implementation of the Preconditioned Conjugate Gradient approach (GPU-PCG). Taking the deep
beam with rectangular openings as an example, a comparison between CPU-PCG and GPU-PCG is
presented, and the results show that the GPU algorithm is more effective than the CPU one in
terms of computing time. In addition, the GPU implementation of the MLEFEA (GPU-MLEFEA) is also
developed.
7.1 GPU Implementation of Preconditioned Conjugate Gradient
Method (GPU-PCG)
The stiffness matrix associated with the finite element method is not only symmetric and
positive definite, but is also banded, with many zero elements both inside the band and
outside the band. The bandwidth of the stiffness matrix depends on the numbering and
connectivity of the nodes. Since the stiffness matrix is a square matrix with dimension equal to
the number of degrees of freedom in the model, as the number of nodes and elements
increases the amount of storage required for the stiffness matrix increases significantly. To
mitigate this problem to some extent, finite element programs usually use bandwidth or
skyline storage approaches, taking advantage of the symmetry of the matrix. The effectiveness
of both of these approaches depends on the sequence of node numbering, and various
routines have been developed to renumber nodes to try to minimise the storage required.
Both the bandwidth and skyline storage approaches allow the stiffness matrix to be reduced in
place, and so are suited to direct solvers.
To permit effective GPU implementation of the Preconditioned Conjugate Gradient method
(GPU-PCG), the Compressed Sparse Row (CSR) storage format is used to store the stiffness
matrix. Compared to the storage formats mentioned above, there is no need to store any zero
elements or to renumber the nodes to reduce the bandwidth or skyline of the matrix. Therefore,
CSR saves memory space and avoids unnecessary calculations, resulting in a faster algorithm. As
will be seen below, the CSR storage format is ideally suited to parallelisation of the SpMV
operation used extensively in indirect solvers. It is not suitable for direct solvers, as the
stiffness matrix cannot be reduced in place.
To store a sparse stiffness matrix in CSR format, three arrays (here named elem, rowptr and col)
are needed. Zero based indexing is used. The one-dimensional double precision elem array
stores all the non-zero elements in the matrix in row-major order. The rowptr and col arrays
are index or pointer arrays that allow the elements in elem to be accessed according to their
position in the original stiffness matrix. The one-dimensional integer rowptr array stores the
position of the first non-zero element for each row in the array elem, and is of dimension
(number of rows + 1), so the first element will be zero while the last element will be the
number of non-zero elements in the stiffness matrix. The one-dimensional integer array col
stores the column index for every non-zero element in the same order as the non-zero
elements are arranged in elem, and so is of the same dimension as elem. As an example, the
CSR representation for a sparse matrix K is illustrated in Figure 73.
Figure 73: CSR Representation for a Sparse Matrix K
Because each element of the vector resulting from the product of a matrix and a column vector
depends only on the corresponding row of the matrix, and so can be calculated independently of
all the other rows, the SpMV for the CSR format is
very easy to parallelise. Figure 74 presents the SpMV parallel kernel for the sparse matrix in
CSR format. Note that the kernel describes the operation of each thread. Each thread works on
one particular row, and there will be one thread created for each row. The THREAD_ID
indicates the number of the thread and hence the number of the row. The GPU or GPUs can
execute as many threads in parallel as the hardware allows.
Figure 74: SpMV Kernel for the Sparse Matrix in CSR Format
7.2 GPU Implementation of Modified LEFEA Approach (GPU-
MLEFEA)
In this section the GPU implementation of the Modified Linear Elastic Finite Element Analysis
(MLEFEA) method is introduced.
The MLEFEA proposed in this work involves a lot of iterations in which most of the finite
element model (and hence stiffness matrix) stays the same, but there is some modification of
the elastic modulus in regions where stress redistribution is required. If a direct solver is used,
the complete stiffness matrix is factorised or reduced in place, and so there is no way to take
advantage of the similarity of one iteration to the next. However, when an indirect solver is
used, the stiffness matrix is preserved between iterations. The solution from one iteration can
also be used as a starting point for the iterative solution of the next iteration.
As a result, the program written for this work stores all the stiffness matrix data generated
in the first iteration. During the elastic modulus adjustment process in the following
iterations, the areas which are affected, and where modifications of the elastic modulus are
required, are identified. Then only the stiffness contributions for the affected areas need to
be reassembled, while the previously stored stiffness matrix data are used directly for the
unaffected areas.
In addition to the acceleration of the matrix solution by the parallel GPU-PCG approach and the
advantage taken of the similarity of one iteration to the next, further speedup is obtained by
assembling the stiffness matrix directly in CSR format, so that there is no need to convert the
stiffness matrix from dense format to CSR. Most importantly, no matter how many iterations the
MLEFEA approach takes, the index or pointer arrays of the CSR stiffness matrix representation
(rowptr and col) remain constant and do not have to be recomputed, as there is no change in the
structure of the finite element mesh and the same elements remain zero. Only the elem array of
the CSR representation changes during the iteration steps, and even then only within the
affected areas.
A complete listing of the program is provided in Appendix A.
7.3 Results Comparison (Speedup Results)
To demonstrate the efficiency of the GPU implementation of the MLEFEA and the GPU-PCG for
solving stiffness equations Ku=f, the coefficient matrix produced from the case of the deep
beam with two web openings (Figure 21) is used.
Different mesh sizes are used to generate different sets of coefficient matrices. In this work,
mesh sizes of 50mm, 25mm, 12.5mm and 10mm were used, giving coefficient matrices of size 3236
by 3236, 12552 by 12552, 49424 by 49424 and 76980 by 76980 respectively. For comparison, a
plain sequential PCG solution of the sparse matrix equation system is also performed. The
results in terms of computing speed for the GPU-based PCG and the CPU (sequential) PCG
algorithms are shown in Table 11 and Figure 75. Both the sequential PCG and the parallel
GPU-PCG codes were executed one hundred times, and the average time is taken as the final
result, giving a reliable elapsed time.
Table 11: Comparison between GPU-PCG and CPU-PCG

Mesh Size   Matrix Size     Non-zero Elements   CPU Time (ms)   GPU Time (ms)   Error   Speedup
50mm        3236 x 3236     45924               180             109             0.01%   1.65
25mm        12552 x 12552   191592              1485            350             0.01%   4.24
12.5mm      49424 x 49424   737684              11680           1842            0.02%   6.34
10mm        76980 x 76980   1042168             21862           2772            0.03%   7.89

Times are for the equation solving part only. The errors are from comparison with the Matlab results, and are the same for the CPU and GPU codes.
Figure 75: GPU-PCG vs. CPU-PCG
From Table 11 and Figure 75, the efficiency of the GPU-based PCG in terms of computing
speed can be observed, with the same high precision being maintained.
7.4 Summary
This chapter introduces the basic theory and reviews the relevant literature about the GPU and
its use in the area of finite element analysis. In order to optimize the MLEFEA approach, a GPU-
based PCG algorithm is used to handle the most time-consuming SpMV for solving the stiffness
matrix. Finally, the efficiency of the GPU implementation is demonstrated by providing speed
comparison results between the CPU and GPU algorithm for stress redistribution for the
example of a deep beam with web openings.
8 Conclusions
This thesis develops a new approach to the use of linear finite element analysis in the design of
reinforced concrete members. The new approach involves the use of modified linear elastic
finite element analysis (MLEFEA), in which stress is redistributed within the member through
selective modification of the elastic modulus. This approach is similar to the use of moment
redistribution in the design of continuous reinforced concrete beams. Based on the lower bound
theory of plasticity, the approach works on the basis of identifying a stress field which is in
internal equilibrium, in equilibrium with the applied design load, and does not exceed the
specified maximum stress anywhere.
Compared with using standard LEFEA, the new approach overcomes the problems associated
with stress singularities and concentrations. Using LEFEA, stress singularities at features such
as re-entrant corners lead to stresses which are mesh dependent, increasing without bound as
the mesh is refined. Ignoring these stresses or using a coarse mesh for design leads to a risk
that the stress field used for design is not in internal equilibrium. By redistributing the stress
and ensuring design is done with a stress field which is in internal and external equilibrium, the
designer can have confidence in the ability of the design to carry the required load. In addition,
a technique was developed to redistribute tensile stress to ensure the resultant tension is at a
prescribed location. This allows the designer to position the reinforcing steel at the optimum
position (e.g. as close as possible to the bottom of a deep beam), which is not possible with
standard LEFEA.
Compared with the strut-tie method, the new approach takes into account the load carrying
capability of all the concrete, not just that in the struts. The examples included in this thesis
show that up to 27% of concrete volume can be saved in the construction of non-flexural
members by using the new approach in preference to the strut-tie method. Such a saving can
significantly reduce the environmental impact of concrete construction. The new approach is
also less time consuming and more designer independent, as a suitable strut-and-tie model does
not have to be chosen.
Three different types of simple structures, shallow (flexural) beams, deep beams, and deep
beams with web openings, were examined using both conventional design approaches and the
MLEFEA approach. Cost comparison in terms of concrete and steel usage showed that the new
method is more efficient than currently available techniques, particularly for non-flexural
members.
In addition, with the help of ABAQUS, preliminary numerical tests of the proposed designs
have been performed using non-linear finite element analysis. The results showed that designs
based on this new approach are safe and reasonable. As the stress field resulting from this new
approach is statically admissible and able to satisfy the yield condition, designs generated
using this method will reach or exceed the ultimate design load, in accordance with the lower
bound theory of plasticity.
To further optimize the MLEFEA approach in terms of computing time, the use of GPUs was
also explored. The use of GPUs in finite element analysis was examined through a literature
review, and a GPU-based PCG algorithm selected for use in the MLEFEA. The GPU-based PCG
algorithm using the CSR format to store the stiffness matrix was shown to be an efficient way
to solve the stiffness matrix, compared to the CPU-based algorithm. Furthermore, combined
with the GPU-based PCG approach for solving the stiffness matrix, the overall implementation
of the MLEFEA on GPUs was developed, including stiffness matrix assembly directly in CSR
format, and only reassembling those parts of the stiffness matrix that have changed within
each iteration.
To ensure that the concrete savings predicted in this thesis can be obtained in practice, future
work in this area should include conducting full scale experimental tests of typical structural
components designed with the new method. These tests should demonstrate that the
components reach the desired design performance with the use of less concrete than a
conventional design. Without such work, practising engineers may be reluctant to adopt the
proposed approach.
References
Ament, M, Knittel, G, Weiskopf, D & Strasser, W 2010, 'A Parallel Preconditioned Conjugate
Gradient Solver for the Poisson Problem on a Multi-GPU Platform', in 18th Euromicro
International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 583-
592.
Ashour, AF 1997, Tests of reinforced concrete continuous deep beams, vol. 94, American
Concrete Institute, Farmington Hills, MI, USA.
Au, FTK & Bai, ZZ 2007, 'Two-dimensional nonlinear finite element analysis of monotonically
and non-reversed cyclically loaded RC beams', Engineering Structures, vol. 29, no. 11, pp. 2921-
2934.
Augarde, CE & Deeks, AJ 2008, 'The use of Timoshenko's exact solution for a cantilever beam in
adaptive analysis', Finite elements in analysis and design., vol. 44, no. 9-10, pp. 595-601.
Barber, JR 2002, Elasticity, 2nd edn, Kluwer Academic, Dordrecht, London.
Barrachina, S, Castillo, M, Igual, FD, Mayo, R & Quintana-Orti, ES 2008, 'Solving Dense Linear
Systems on Graphics Processors', Proceedings of the 14th international Euro-Par conference on
Parallel Processing.
Bathe, KJ & Wilson, EL 1976, Numerical methods in finite element analysis, Prentice-Hall,
Stanford.
Baumann, T 1972, 'Zur Frage der Netzbewehrung von Flächentragwerken (On the Problem of
Net Reinforcement of Surface Structures)', Bauingenieur, vol. 47, no. 10, pp. 367-377.
Bell, N & Garland, M 2008, Efficient Sparse Matrix-Vector Multiplication on CUDA, NVIDIA
Corporation.
Bell, N & Garland, M 2009, 'Implementing sparse matrix-vector multiplication on throughput-
oriented processors', Proceedings of the Conference on High Performance Computing
Networking, Storage and Analysis.
Bolz, J, Farmer, I, Grinspun, E & Schroder, P 2003, 'Sparse matrix solvers on the GPU: conjugate
gradients and multigrid', ACM Trans. Graph., vol. 22, no. 3, pp. 917-924.
Bonneel, N, GPUMatrix. Available from: <http://sourceforge.net/projects/gpumatrix/>.
Bruggi, M 2009, 'Generating strut-and-tie patterns for reinforced concrete structures using
topology optimization', Computers & Structures, vol. 87, no. 23-24, pp. 1483-1495.
Buatois, L, Caumon, G & Levy, B 2009, 'Concurrent number cruncher: a GPU implementation of
a general sparse linear solver', Int. J. Parallel Emerg. Distrib. Syst., vol. 24, no. 3, pp. 205-223.
Buttari, A, Dongarra, J, Langou, J, Langou, J, Luszczek, P & Kurzak, J 2007, 'Mixed Precision
Iterative Refinement Techniques for the Solution of Dense Linear Systems', Int. J. High Perform.
Comput. Appl., vol. 21, no. 4, pp. 457-466.
Carmo, D, Ricardo, NF & Lopes, SMR 2005, 'Ductility and linear analysis with moment
redistribution in reinforced high-strength concrete beams', Canadian Journal of Civil
Engineering, vol. 32, pp. 194-203.
Cecka, C, Lew, AJ & Darve, E 2011, 'Assembly of finite element methods on graphics
processors', International Journal for Numerical Methods in Engineering, vol. 85, no. 5, pp.
640-669.
Cevahir, A, Nukada, A & Matsuoka, S 2009, 'Fast Conjugate Gradients with Multiple GPUs', in
Computational Science – ICCS 2009, vol. 5544, eds G Allen, J Nabrzyski, E Seidel, G van Albada, J
Dongarra & P Sloot, Springer Berlin / Heidelberg, pp. 893-903.
Cevahir, A, Nukada, A & Matsuoka, S 2010b, 'High performance conjugate gradient solver on
multi-GPU clusters using hypergraph partitioning', Computer Science - Research and
Development, vol. 25, no. 1, pp. 83-91.
Chidgzey, SR & Deeks, AJ 2005, 'Determination of coefficients of crack tip asymptotic fields
using the scaled boundary finite element method', Engineering Fracture Mechanics, vol. 72, no.
13, pp. 2019-2036.
Dabbagh, H & Foster, SJ 2006, 'A Smeared-Fixed Crack Model for FE Analysis of RC
Membranes Incorporating Aggregate Interlock', Advances in Structural Engineering, vol. 9, no.
1, pp. 91-102.
Deeks, AJ 2008, 'The pursuit of accuracy in computational mechanics'.
Deeks, AJ & Wolf, JP 2002a, 'An h-hierarchical adaptive procedure for the scaled boundary
finite-element method', International Journal for Numerical Methods in Engineering, vol. 54,
no. 4, pp. 585-605.
Deeks, AJ & Wolf, JP 2002b, 'Stress recovery and error estimation for the scaled boundary
finite-element method', International Journal for Numerical Methods in Engineering, vol. 54,
no. 4, pp. 557-583.
Deeks, AJ & Wolf, JP 2002c, 'A virtual work derivation of the scaled boundary finite-element
method for elastostatics', Computational Mechanics, vol. 28, no. 6, pp. 489-504.
Demmel, JW 1997, Applied Numerical Linear Algebra, Society for Industrial and Applied
Mathematics.
Elsen, E, LeGresley, P & Darve, E 2008, 'Large calculation of the flow over a hypersonic vehicle
using a GPU', J. Comput. Phys., vol. 227, no. 24, pp. 10148-10161.
Foster, SJ 1998, 'Design of non-flexural members for shear', Cement and Concrete Composites,
vol. 20, no. 6, pp. 465-475.
Foster, SJ, Marti, P & Mojsilovic, N 2003, 'Design of Reinforced Concrete Solids Using Stress
Analysis', ACI Structural Journal, vol. 100, no. 6, pp. 758-764.
Frier, C & Damkilde, L 2009, 'Lower Bound Limit State Analysis using the Interior-Point Method
with Spatial Varying Barrier Function', in Proceedings of the Twenty Second Nordic Seminar on
Computational Mechanics, Aalborg University, pp. 173-176.
Göddeke, D, Strzodka, R, Jamaludin, MY, McCormick, P, Wobker, H, Becker, C & Turek, S 2008,
'Using GPUs to improve multigrid solver performance on a cluster', Int. J. Comput. Sci. Eng., vol.
4, no. 1, pp. 36-55.
Göddeke, D, Strzodka, R & Turek, S 2005, 'Accelerating Double Precision FEM Simulations with
GPUs', Proceedings of ASIM 2005 - 18th Symposium on Simulation Technique.
Galoppo, N, Govindaraju, NK, Henson, M & Manocha, D 2005, 'LU-GPU: Efficient Algorithms for
Solving Dense Linear Systems on Graphics Hardware', in Supercomputing, 2005. Proceedings of
the ACM/IEEE SC 2005 Conference, pp. 3-3.
Garland, M 2008, 'Sparse matrix computations on manycore GPU's', Proceedings of the 45th
annual Design Automation Conference.
Göddeke, D, Buijssen, SHM, Wobker, H & Turek, S 2009, 'GPU acceleration of an unmodified
parallel finite element Navier-Stokes solver', in High Performance Computing & Simulation,
2009. HPCS '09. International Conference on, pp. 12-21.
Guan, H 2005, 'Effect of Sizes and Positions of Web Openings on Strut-and-Tie Model of Deep
Beams', Advances in Structural Engineering, vol. 8, no. 1, pp. 69-84.
Hillesland, KE, Molinov, S & Grzeszczuk, R 2005, 'Nonlinear optimization framework for image-
based modeling on programmable graphics hardware', ACM SIGGRAPH 2005 Courses.
Hoque, MM 2006, 3D nonlinear mixed finite-element analysis of RC beams and plates with and
without FRP reinforcement, thesis, University of Manitoba.
Hu, OE & Tan, KH 2007, Large reinforced-concrete deep beams with web openings: test and
strut-and-tie results, vol. 59, Telford, London, UK, p. 12.
Huebner, KH, Thornton, EA & Byrom, TG 1995, The finite element method for engineers, Wiley,
New York.
Ino, F, Matsui, M, Goda, K & Hagihara, K 2005, 'Performance Study of LU Decomposition on
the Programmable GPU', in Proceedings of HiPC 2005, pp. 83-94.
Jakobsen, B 1994, 'The Sleipner accident and its causes', Engineering Failure Analysis, vol. 1, no.
3, pp. 193-199.
Jung, JH & O'Leary, DP 2006, 'Cholesky Decomposition and Linear Programming on a GPU', in
Proceedings of Workshop on Edge Computing Using New Commodity Architectures (EDGE),
Chapel Hill, NC.
Kirk, D & Hwu, W 2010, Programming massively parallel processors: a hands-on approach,
Morgan Kaufmann Publishers.
Klöckner, A, Warburton, T, Bridge, J & Hesthaven, JS 2009, 'Nodal discontinuous Galerkin
methods on graphics processors', J. Comput. Phys., vol. 228, no. 21, pp. 7863-7882.
Komatitsch, D, Michéa, D & Erlebacher, G 2009, 'Porting a high-order finite-element
earthquake modeling application to NVIDIA graphics cards using CUDA', Journal of Parallel and
Distributed Computing, vol. 69, no. 5, pp. 451-460.
Koopman, DCA & Lance, RH 1965, 'On linear programming and plastic limit analysis', Journal of
the Mechanics and Physics of Solids, vol. 13, no. 2, pp. 77-87.
Kotsovos, MD & Pavlovic, M 1995, Structural concrete: finite-element analysis for limit-state
design, Thomas Telford.
Krüger, J & Westermann, R 2003, 'Linear algebra operators for GPU implementation of
numerical algorithms', ACM Trans. Graph., vol. 22, no. 3, pp. 908-916.
Kupfer, H & Hilsdorf, HK 1969, 'Behavior of Concrete Under Biaxial Stresses', ACI Journal, vol.
66, no. 8, pp. 656-666.
Liang, QQ, Xie, YM & Grant, PS 2000, 'Topology optimization of strut-and-tie models in
reinforced concrete structures using an evolutionary procedure', ACI Structural Journal, vol. 97,
no. 2, pp. 322-330.
Logan, DL 2002, A first course in the finite element method, 3rd edn, Brooks/Cole, Pacific Grove,
CA.
Mörsch, E 1902, Der Eisenbetonbau - seine Theorie und Anwendung (Reinforced Concrete
Construction - Its Theory and Application), 5th edn, Verlag Konrad Wittwer, Stuttgart.
Baboulin, M, Dongarra, J, Tomov, S & Volkov, V 2008, 'Enhancing the Performance of Dense
Linear Algebra Solvers on GPUs', Poster at Supercomputing 2008.
Mansur, MA, Tan, KH & Weng, W 2001, 'Analysis of Reinforced Concrete Beams with Circular
Openings Using Strut-and-Tie Model', in Structural Engineering, Mechanics and Computation,
ed. A Zingoni, Elsevier Science, Oxford, pp. 311-318.
Meyer, A 1988, 'An efficient implementation of LU decomposition in C', Adv. Eng. Softw., vol.
10, no. 3, pp. 123-130.
Mosley, B, Bungey, J & Hulse, R 2007, Reinforced Concrete Design to Eurocode 2, 6th edn,
Palgrave Macmillan, New York.
Nagarajan, P & Pillai, TMM 2008, 'Development of strut and tie models for simply supported
deep beams using topology optimization', Songklanakarin Journal of Science and Technology,
vol. 30, no. 5, pp. 641-647.
Navarro, JR & Susin, A 2006, 'Non structured meshes for Cloth GPU simulation using FEM',
Workshop On Virtual Reality Interaction and Physical Simulation.
Neal, BG 1985, The plastic methods of structural analysis, Chapman and Hall.
Neckels, D, CUDAZTEC. Available from: <http://www.ohloh.net/p/cudaztec>.
NVIDIA Corporation, NVIDIA CUDA Programming Guide (Version 3.0). Available from:
<http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_Prog
rammingGuide.pdf>.
Oehlers, DJ, Ju, G, Liu, IST & Seracino, R 2004a, 'Moment redistribution in continuous plated RC
flexural members. Part 1: neutral axis depth approach and tests', Engineering Structures, vol.
26, no. 14, pp. 2197-2207.
Oehlers, DJ, Liu, IST, Ju, G & Seracino, R 2004b, 'Moment redistribution in continuous plated RC
flexural members. Part 2: Flexural rigidity approach', Engineering Structures, vol. 26, no. 14, pp.
2209-2218.
Park, S 2005, Analysis of FRP strengthened deep RC members using the STM and the FEM
approaches, thesis, Syracuse University.
Playne, DP & Hawick, KA 2010, 'Asynchronous Communication Schemes for Finite Difference
Methods on Multiple GPUs', in Cluster, Cloud and Grid Computing (CCGrid), 2010 10th
IEEE/ACM International Conference on, pp. 763-768.
Punmia, BC, Jain, AK & Jain, AK 2007, Limit State Design of Reinforced Concrete, 1st edn,
Laxmi Publications Ltd, New Delhi.
Rao, SS 1989, The finite element method in engineering, Pergamon, Oxford.
Rausch, E 1929, Berechnung des Eisenbetons gegen Verdrehung und Abscheren (Design of
reinforced concrete for torsion and shear), Julius Springer Verlag, Berlin.
Ritter, W 1899, 'Die Bauweise Hennebique (The Hennebique Method of Construction)',
Schweizerische Bauzeitung, vol. 33, no. 7, pp. 59-61.
Roy, S & Thiagarajan, G 2007, 'Nonlinear Finite-Element Analysis of Reinforced Concrete Bridge
Approach Slab', Journal of Bridge Engineering, vol. 12, no. 6, p. 6.
Saad, Y 2003, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied
Mathematics.
Schlaich, J, Schäfer, K & Jennewein, M 1987, 'Toward a Consistent Design of Structural
Concrete', PCI Journal, vol. 32, no. 3, pp. 74-150.
Scott, RH & Whittle, RT 2005, 'Moment redistribution effects in beams', Magazine of Concrete
Research, vol. 57, no. 1, pp. 9-20.
Sengupta, S, Harris, M, Zhang, Y & Owens, JD 2007, 'Scan primitives for GPU computing',
Proceedings of the 22nd ACM Siggraph/Eurographics symposium on Graphics hardware.
Shahnaz, R & Usman, A 2007, 'An Efficient Sparse Matrix-Vector Multiplication on Distributed
Memory Parallel Computers', International Journal of Computer Science and Network Security,
vol. 7, no. 1, pp. 77-82.
Shewchuk, JR 1994, An Introduction to the Conjugate Gradient Method Without the Agonizing
Pain, Carnegie Mellon University.
Sloan, SW 1989, 'Upper bound limit analysis using finite elements and linear programming',
International Journal for Numerical and Analytical Methods in Geomechanics, vol. 13, no. 3, pp.
263-282.
Standards Australia 2009, DR05252 Concrete Structures, Australian Standard, Australia.
Tan, KH, Kong, FK & Weng, LW 1997, 'High Strength Concrete Deep Beams Subjected to
Combined Top- and Bottom-Loading', The Structural Engineer, vol. 75, no. 11, pp. 191-197.
Tan, KH, Tang, CY & Tong, K 2003, 'A direct method for deep beams with web reinforcement',
Magazine of Concrete Research, vol. 55, no. 1, pp. 53-63.
Tan, KH, Tong, KB & Tang, CY 2003, 'Consistent strut-and-tie modelling of deep beams with
web openings', Magazine of Concrete Research, vol. 55, no. 1, pp. 65-75.
Tan, KH, Tong, K & Tang, CY 2001, 'Direct Strut-and-Tie Model for Prestressed Deep Beams',
Journal of Structural Engineering, ASCE, vol. 127, no. 9, pp. 1076-1084.
Tejada, E & Ertl, T 2005, 'Large steps in GPU-based deformable bodies simulation', Simulation
Modelling Practice and Theory, vol. 13, no. 8, pp. 703-715.
Timoshenko, S 1969, Theory of Elasticity, 3rd edn, McGraw-Hill, New York.
Tjhin, TN & Kuchma, DA 2007, 'Integrated analysis and design tool for the strut-and-tie
method', Engineering Structures, vol. 29, no. 11, pp. 3042-3052.
Tomov, S, Nath, R, Du, P & Dongarra, J, MAGMA Users' Guide version 0.2. Available from:
<http://icl.cs.utk.edu/magma/>.
Tomov, S, Nath, R, Ltaief, H & Dongarra, J 2010, 'Dense linear algebra solvers for multicore
with GPU accelerators', in Proceedings of IPDPS Workshops, pp. 1-8.
Varghese, PC 2004, Limit State Design of Reinforced Concrete, PHI Learning Pvt. Ltd.
Vecchio, FJ & Collins, MP 1986, 'The modified compression-field theory for reinforced concrete
elements subjected to shear', ACI Journal Proceedings, vol. 83, no. 2, pp. 219-231.
Volkov, V & Demmel, J 2008, LU, QR and Cholesky Factorizations using Vector Capabilities of
GPUs, UCB/EECS-2008-49, EECS Department, University of California, Berkeley.
Wang, G & Meng, S 2008, 'Modified strut-and-tie model for prestressed concrete deep beams',
Engineering Structures, vol. 30, no. 12, pp. 3489-3496.
Warner, RF, Foster, SJ & Kilpatrick, AE 2007, Reinforced Concrete Basics: Analysis and Design
of Reinforced Concrete Structures, Pearson Prentice Hall, Frenchs Forest, NSW.
Wiggers, WA, Bakker, V, Kokkeler, ABJ & Smit, GJM 2007, 'Implementing the conjugate
gradient algorithm on multi-core systems', International Symposium on System-on-Chip, SoC.
Wight, JK & Parra-Montesinos, GJ 2003, Strut-and-tie model for deep beam design, vol. 25,
American Concrete Institute, Farmington Hills, MI, USA, p. 8.
Williams, A 2009, 'Moment distribution methods', in Structural Analysis, Butterworth-
Heinemann, Boston, pp. 293-378.
Williams, ML 1952, 'Stress singularities resulting from various boundary conditions in angular
corners of plates in extension', Journal of Applied Mechanics, vol. 19, pp. 526-534.
Yang, K-H, Eun, H-C & Chung, H-S 2006, 'The influence of web openings on the structural
behavior of reinforced high-strength concrete deep beams', Engineering Structures, vol. 28, no.
13, pp. 1825-1834.
Yang, Z & Deeks, AJ 2007, 'Modelling cohesive crack growth using a two-step finite element-
scaled boundary finite element coupled method', International Journal of Fracture, vol. 143, no.
4, pp. 333-354.
Zhang, B, Xu, S, Zhang, F, Bi, Y & Huang, LQ 2011, 'Accelerating MatLab code using GPU: A
review of tools and strategies', in Artificial Intelligence, Management Science and Electronic
Commerce (AIMSEC), 2011 2nd International Conference on, pp. 1875-1878.
Zhang, N & Tan, KH 2007a, 'Direct strut-and-tie model for single span and continuous deep
beams', Engineering Structures, vol. 29, no. 11, pp. 2987-3001.
Zhang, N & Tan, KH 2007b, 'Size effect in RC deep beams: Experimental investigation and STM
verification', Engineering Structures, vol. 29, no. 12, pp. 3241-3254.
Zhu, JZ, Hinton, E & Zienkiewicz, OC 1991, 'Adaptive finite element analysis with quadrilaterals',
Computers & Structures, vol. 40, no. 5, pp. 1097-1104.
Zienkiewicz, OC & Zhu, JZ 1992, 'The superconvergent patch recovery (SPR) and adaptive finite
element refinement', Comput. Methods Appl. Mech. Eng., vol. 101, no. 1-3, pp. 207-224.
Appendices
Appendix A: Complete Listing of the Program
//Important Header Files//
//*******Function to Calculate Jacobian Matrix during FEM*******************//
void Jacobian4N(double *st,int **nc, double **J)
{
double s=st[0];
double t=st[1];
double *dNds, *dNdt;
int *temp1, *temp2;
dNds= (double * )malloc(4*sizeof(double));
vector_zero (4, dNds);
dNdt= (double * )malloc(4*sizeof(double));
vector_zero(4,dNdt);
temp1=(int * )malloc(4*sizeof(int));
vector_zero_int(4,temp1);
temp2= (int * )malloc(4*sizeof(int));
vector_zero_int(4,temp2);
dNds[0]=0.25*(-1+t); dNds[1]=0.25*(1-t);
dNds[2]=0.25*(1+t); dNds[3]=0.25*(-1-t);
dNdt[0]=0.25*(-1+s); dNdt[1]=0.25*(-1-s);
dNdt[2]=0.25*(1+s); dNdt[3]=0.25*(1-s);
temp1[0]=nc[0][0]; temp1[1]=nc[1][0];
temp1[2]=nc[2][0]; temp1[3]=nc[3][0];
temp2[0]=nc[0][1]; temp2[1]=nc[1][1];
temp2[2]=nc[2][1]; temp2[3]=nc[3][1];
J[0][0]=MxMulMx(4,dNds,temp1);
J[0][1]=MxMulMx(4,dNds,temp2);
J[1][0]=MxMulMx(4,dNdt,temp1);
J[1][1]=MxMulMx(4,dNdt,temp2);
//free the memory //
free(dNds);
free(dNdt);
free(temp1);
free(temp2);
}
//****************Function to Calculate Local Stiffness Matrix*******************//
void Element4NStiffness(int **nc,double thickness, double **D, double **Ke)
{
int i,j,k;
double **B, **gp, **TransB, **J, **invJ, **dNdxy, **temp2, **temp3, **temp4;
double s, t, det;
double *dNds, *dNdt, *temp, *wt;
//initialize the array //
B= (double **) malloc(3 * sizeof(double *));
for(int i = 0; i < 3; i++)
B[i] = (double * )malloc(8* sizeof(double));
matrix_zero (3, 8, B);
matrix_zero (8, 8, Ke);
gp= (double **) malloc(4 * sizeof(double *));
for(int i = 0; i < 4; i++)
gp[i] = (double * )malloc(2* sizeof(double));
matrix_zero (4, 2, gp);
TransB= (double **) malloc(8 * sizeof(double *));
for(int i = 0; i < 8; i++)
TransB[i] = (double * )malloc(3* sizeof(double));
matrix_zero (8, 3, TransB);
J= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
J[i] = (double * )malloc(2* sizeof(double));
invJ= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
invJ[i] = (double * )malloc(2* sizeof(double));
matrix_zero (2, 2, invJ);
dNdxy= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
dNdxy[i] = (double * )malloc(4* sizeof(double));
matrix_zero (2, 4, dNdxy);
temp2= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
temp2[i] = (double * )malloc(4* sizeof(double));
matrix_zero (2, 4, temp2);
temp3= (double **) malloc(8 * sizeof(double *));
for(int i = 0; i < 8; i++)
temp3[i] = (double * )malloc(3* sizeof(double));
matrix_zero (8, 3, temp3);
temp4= (double **) malloc(8 * sizeof(double *));
for(int i = 0; i < 8; i++)
temp4[i] = (double * )malloc(8* sizeof(double));
matrix_zero (8, 8, temp4);
/*initialize the vector */
dNds= (double * )malloc(4*sizeof(double));
vector_zero (4, dNds);
dNdt= (double * )malloc(4*sizeof(double));
vector_zero (4, dNdt);
temp= (double * )malloc(2*sizeof(double));
vector_zero (2, temp);
wt= (double * )malloc(4*sizeof(double));
vector_zero (4, wt);
gp[0][0]=-1.0/sqrt(3.0); gp[0][1]=-1.0/sqrt(3.0);
gp[1][0]=-1.0/sqrt(3.0); gp[1][1]=1.0/sqrt(3.0);
gp[2][0]=1.0/sqrt(3.0); gp[2][1]=-1.0/sqrt(3.0);
gp[3][0]=1.0/sqrt(3.0); gp[3][1]=1.0/sqrt(3.0);
wt[0]=1; wt[1]=1; wt[2]=1; wt[3]=1;
for(k=0;k<4;k++)
{
s=gp[k][0];
t=gp[k][1];
dNds[0]=0.25*(-1+t); dNds[1]=0.25*(1-t);
dNds[2]=0.25*(1+t); dNds[3]=0.25*(-1-t);
dNdt[0]=0.25*(-1+s); dNdt[1]=0.25*(-1-s);
dNdt[2]=0.25*(1+s); dNdt[3]=0.25*(1-s);
temp[0]=gp[k][0];
temp[1]=gp[k][1];
Jacobian4N(temp,nc,J);
MxInv(2, J, invJ);
for(i=0;i<4;i++)
{
temp2[0][i]=dNds[i];
temp2[1][i]=dNdt[i];
}
MxMulMx_Multi(2, 2, 4, invJ, temp2, dNdxy);
for(i=0;i<4;i++)
{
B[0][2*i]=dNdxy[0][i]; B[0][2*i+1]=0;
B[1][2*i]=0; B[1][2*i+1]=dNdxy[1][i];
B[2][2*i]=dNdxy[1][i]; B[2][2*i+1]=dNdxy[0][i];
}
det=MxDet2(J);
MxTrans(3, 8, B, TransB);
MxMulMx_Multi(8, 3, 3, TransB, D, temp3);
MxMulMx_Multi(8, 3, 8, temp3, B, temp4);
for (i=0;i<8;i++)
for(j=0;j<8;j++)
Ke[i][j]=Ke[i][j]+wt[k]*temp4[i][j]*thickness*det;
}
//free the memory //
for(i=0;i<3;i++)
free(B[i]);
free(B);
for(i=0;i<4;i++)
free(gp[i]);
free(gp);
for(i=0;i<8;i++)
free(TransB[i]);
free(TransB);
for(i=0;i<2;i++)
free(invJ[i]);
free(invJ);
for(i=0;i<2;i++)
free(dNdxy[i]);
free(dNdxy);
for(i=0;i<2;i++)
free(temp2[i]);
free(temp2);
for(i=0;i<8;i++)
free(temp3[i]);
free(temp3);
for(i=0;i<8;i++)
free(temp4[i]);
free(temp4);
free(dNds);
free(dNdt);
free(temp);
free(wt);
}
//**********************SpMV Serial Code on CPU*****************************//
void spmv_csr_serial(int num_rows, int *ptr, int *indices, double *data, double *x, double *y)
{
for (int row=0;row<num_rows;row++)
{
double dot=0.0;
int row_start= ptr[row];
int row_end = ptr[row+1];
for(int jj=row_start;jj<row_end;jj++)
dot+=data[jj]*x[indices[jj]];
y[row] +=dot;
}
}
//*******************Function to Calculate Local Stress Matrix*********************//
void Element4NStress(double **st, double **nc,double *u, double **D, double **sx)
{
int i, k;
double **invJ, **J, **temp, **dNdxy, **B, **temp3;
double *dNds, *dNdt, *temp4;
int *temp1, *temp2;
double det, s, t;
//initialize the array //
invJ= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
invJ[i] = (double * )malloc(2* sizeof(double));
matrix_zero (2, 2, invJ);
J= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
J[i] = (double * )malloc(2* sizeof(double));
matrix_zero (2, 2, J);
B= (double **) malloc(3 * sizeof(double *));
for(int i = 0; i < 3; i++)
B[i] = (double * )malloc(8* sizeof(double));
matrix_zero (3, 8, B);
dNdxy= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
dNdxy[i] = (double * )malloc(4* sizeof(double));
matrix_zero (2, 4, dNdxy);
temp= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
temp[i] = (double * )malloc(4* sizeof(double));
matrix_zero (2, 4, temp);
temp3= (double **) malloc(3 * sizeof(double *));
for(int i = 0; i < 3; i++)
temp3[i] = (double * )malloc(8* sizeof(double));
matrix_zero (3, 8, temp3);
/*initialize the vector */
temp1= (int * )malloc(4*sizeof(int));
vector_zero_int (4, temp1);
temp2= (int * )malloc(4*sizeof(int));
vector_zero_int (4, temp2);
temp4= (double * )malloc(3*sizeof(double));
vector_zero (3, temp4);
dNds= (double * )malloc(4*sizeof(double));
vector_zero (4, dNds);
dNdt= (double * )malloc(4*sizeof(double));
vector_zero (4, dNdt);
for(k=0; k<4; k++)
{
s=st[k][0];
t=st[k][1];
dNds[0]=0.25*(-1+t); dNds[1]=0.25*(1-t);
dNds[2]=0.25*(1+t); dNds[3]=0.25*(-1-t);
dNdt[0]=0.25*(-1+s); dNdt[1]=0.25*(-1-s);
dNdt[2]=0.25*(1+s); dNdt[3]=0.25*(1-s);
for (i=0;i<4;i++)
{
temp1[i]=nc[i][0];
temp2[i]=nc[i][1];
}
J[0][0]=MxMulMx(4,dNds,temp1);J[0][1]=MxMulMx(4,dNds,temp2);
J[1][0]=MxMulMx(4,dNdt,temp1);J[1][1]=MxMulMx(4,dNdt,temp2);
MxInv(2, J, invJ);
for(i=0;i<4;i++)
{
temp[0][i]=dNds[i];
temp[1][i]=dNdt[i];
}
MxMulMx_Multi(2, 2, 4, invJ, temp, dNdxy);
for(i=0;i<4;i++)
{
B[0][2*i]=dNdxy[0][i]; B[0][2*i+1]=0;
B[1][2*i]=0; B[1][2*i+1]=dNdxy[1][i];
B[2][2*i]=dNdxy[1][i]; B[2][2*i+1]=dNdxy[0][i];
}
det=MxDet2(J);
MxMulMx_Multi(3, 3, 8, D, B, temp3);
MxMulMx_Single(3, 8, 1, temp3, u, temp4);
for(int p=0;p<3;p++)
sx[k][p]=temp4[p];
}
//free the memory //
for(i=0;i<2;i++)
free(J[i]);
free(J);
for(i=0;i<3;i++)
free(B[i]);
free(B);
for(i=0;i<2;i++)
free(invJ[i]);
free(invJ);
for(i=0;i<2;i++)
free(dNdxy[i]);
free(dNdxy);
for(i=0;i<2;i++)
free(temp[i]);
free(temp);
for(i=0;i<3;i++)
free(temp3[i]);
free(temp3);
free (temp1);
free (temp2);
free (temp4);
free(dNds);
free(dNdt);
}
//********************Function for Interpolation during FEM***********//
void Element4NInterpolate(double **gp, double **sx, double **rsx)
{
int i;
double **N2_temp, **A2, **N2;
N2_temp= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
N2_temp[i] = (double * )malloc(4*sizeof(double));
matrix_zero(4, 4, N2_temp);
A2= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
A2[i] = (double * )malloc(4*sizeof(double));
matrix_zero (4, 4, A2);
N2= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
N2[i] = (double * )malloc(4*sizeof(double));
matrix_zero(4, 4, N2);
A2[0][0]=0.25; A2[0][1]=0.25; A2[0][2]=0.25; A2[0][3]=0.25;
A2[1][0]=-0.25; A2[1][1]=0.25; A2[1][2]=0.25; A2[1][3]=-0.25;
A2[2][0]=-0.25; A2[2][1]=-0.25; A2[2][2]=0.25; A2[2][3]=0.25;
A2[3][0]=0.25; A2[3][1]=-0.25; A2[3][2]=0.25; A2[3][3]=-0.25;
for(int p=0;p<4;p++)
{
N2_temp[p][0]=1.0;
N2_temp[p][1]=gp[p][0];
N2_temp[p][2]=gp[p][1];
N2_temp[p][3]=gp[p][0]*gp[p][1];
}
MxMulMx_Multi(4, 4, 4, N2_temp, A2, N2);
MxMulMx_Multi(4, 4, 2, N2, sx, rsx);
//free the memory //
for(i=0;i<4;i++)
free(N2_temp[i]);
free(N2_temp);
for(i=0;i<4;i++)
free(N2[i]);
free(N2);
for(i=0;i<4;i++)
free(A2[i]);
free(A2);
}
//***** Preparation Function to Calculate Arrays of CSR Format Directly during FEM Stiffness Matrix Assembly ******//
void MakeNodeConn(int numNodes, int **element, int numElements, int **nodeConn)
{
int i;
int en1, en2, en3, en4;
for(i = 0; i < numNodes; i++) // this loop could be parallelized; nodeConn holds the numbers of the nodes connected to each node
nodeConn[i][4]=i+1;
for(i=0; i<numElements; i++)
{
en1=element[i][0];
en2=element[i][1];
en3=element[i][2];
en4=element[i][3];
nodeConn[en1][7] = en2+1;
nodeConn[en1][8] = en3+1;
nodeConn[en1][5] = en4+1;
nodeConn[en2][5] = en3+1;
nodeConn[en2][2] = en4+1;
nodeConn[en2][1] = en1+1;
nodeConn[en3][1] = en4+1;
nodeConn[en3][0] = en1+1;
nodeConn[en3][3] = en2+1;
nodeConn[en4][3] = en1+1;
nodeConn[en4][6] = en2+1;
nodeConn[en4][7] = en3+1;
}
}
//***Function to Calculate the Array of *rowPtr for CSR Format Directly during FEM Stiffness Matrix Assembly****//
void MakeCSR( int *rowPtr, int *col, int numNonZero, int numNodes, int **nodeConn)
{
int i, j, j1, k;
// first determine number of non-zero elements in stiffness matrix
rowPtr[0]=0;
//now populate the rowPtr and col vectors
int nnz=0;
for(i=0;i<numNodes;i++)
for (j=0;j<2;j++)
for(k=0;k<9;k++)
{
if(nodeConn[i][k]!=0)
{
for (j1=0;j1<2;j1++)
{
col[nnz]=2*(nodeConn[i][k]-1)+j1;
nnz=nnz+1;
}
}
rowPtr[2*i+j+1] = nnz;
}
}
//****Function to Calculate the Arrays of *col and *elem for CSR Format Directly during FEM Stiffness Matrix Assembly****//
void GetCSRElemIndex (int *node, int *rowPtr, int *col, int **elemIndex)
{
int i, j, n1, n2, n3, n4;
int n, k2, m, **elemIndex2;
elemIndex2= (int **) malloc(8 * sizeof(int *));
for(int i = 0; i < 8; i++)
elemIndex2[i] = (int * )malloc(8* sizeof(int));
matrix_zero_int(8, 8, elemIndex2);
//dof=(int *) malloc(8* sizeof(int));
//vector_zero_int (8, dof);
//Returns 8x8 integer matrix containing indexes into the CSR elem array
//these indexes allow the element stiffness matrix to be assembled ..
//directly into the CSR representation of the global stiffness matrix
//node is 1 x 4 containing the nodes this element connects to
n1 = node[0]; // assumes node numbering starts from 1
n2 = node[3]; // nodes re-ordered to run from smallest to largest
n3 = node[1];
n4 = node[2];
int dof[8]={2*n1, 2*n1+1, 2*n2, 2*n2+1, 2*n3, 2*n3+1, 2*n4, 2*n4+1};
for (i=0;i<8;i++)
{
n=rowPtr[dof[i]];
k2=1;
for(j=0;j<8;j++)
{
m=dof[j];
while(col[n+k2-1]!=m)
k2=k2+1;
elemIndex[i][j]=n+k2-1; // this is 1 based
}
}
int reindex[8]={0, 1, 4, 5, 6, 7, 2, 3}; // change node order back
for (i=0;i<8;i++)
for (j=0;j<8;j++)
elemIndex2[i][j] = elemIndex[reindex[i]][reindex[j]];
for (i=0;i<8;i++)
for (j=0;j<8;j++)
elemIndex[i][j] = elemIndex2[i][j];
for(i=0;i<8;i++)
free(elemIndex2[i]);
free(elemIndex2);
}
//**Function to Calculate the Nodal Stresses using Superconvergent Patch Recovery (SPR) Approach**//
void SPRStress(double *u, double *E, double *N, double v, int nElements, int nElementsX, int
nElementsY, int nNodes,int nNodesX,int nNodesY, int **nodeCoord, int **element, int ndof,
double MeshSize, double b, double **gp, double **rStress)
{
int i,j, k1,k;
double **xegp, **nc2, **ss_temp, **A, **pcoeffs, **AA, **invAA, **BB, **transAA,
**D2;
double *pc_temp, *pc_temp2, *uu2;
double T=0;
int elnum, kk=0;
int *indexArray2,*indexArray, *indexArray_temp;
int **count3;
xegp= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
xegp[i] = (double * )malloc(2*sizeof(double));
matrix_zero (4, 2, xegp);
D2= (double **) malloc(3* sizeof(double *));
for(i = 0; i < 3; i++)
D2[i] = (double * )malloc(3*sizeof(double));
matrix_zero (3, 3, D2);
nc2= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
nc2[i] = (double * )malloc(2*sizeof(double));
matrix_zero (4, 2, nc2);
ss_temp= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
ss_temp[i] = (double * )malloc(3*sizeof(double));
matrix_zero (4, 3, ss_temp);
A= (double **) malloc(16* sizeof(double *));
for(i = 0; i < 16; i++)
A[i] = (double * )malloc(4*sizeof(double));
matrix_zero (16, 4, A);
pcoeffs= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
pcoeffs[i] = (double * )malloc(3*sizeof(double));
matrix_zero (4, 3, pcoeffs);
AA= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
AA[i] = (double * )malloc(4*sizeof(double));
matrix_zero (4, 4, AA);
invAA= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
invAA[i] = (double * )malloc(4*sizeof(double));
matrix_zero (4, 4, invAA);
BB= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
BB[i] = (double * )malloc(16*sizeof(double));
matrix_zero (4, 16, BB);
transAA= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
transAA[i] = (double * )malloc(16*sizeof(double));
matrix_zero (4, 16, transAA);
count3= (int **) malloc(nNodes* sizeof(int *));
for(i = 0; i < nNodes; i++)
count3[i] = (int * )malloc(3*sizeof(int));
matrix_zero_int (nNodes, 3, count3);
pc_temp= (double *) malloc(4* sizeof(double));
vector_zero (4, pc_temp);
pc_temp2= (double *) malloc(3* sizeof(double));
vector_zero (3, pc_temp2);
indexArray2= (int * )malloc(200*sizeof(int));
vector_zero_int (200, indexArray2);
uu2= (double *) malloc(8* sizeof(double));
vector_zero (8, uu2);
indexArray= (int * )malloc(8*sizeof(int)); // only the 8 dofs of one element are used
vector_zero_int (8, indexArray);
double **scStresses, **xgp;
xgp= (double **) malloc(16* sizeof(double *));
for(int p = 0; p < 16; p++)
xgp[p] = (double * )malloc(2*sizeof(double));
scStresses= (double **) malloc(16* sizeof(double *));
for(int p = 0; p < 16; p++)
scStresses[p] = (double * )malloc(3*sizeof(double));
for(i=1;i<nNodesX-1;i++) //patches are centered on internal nodes//
{
for(j=1;j<nNodesY-1;j++) // i.e. not the nodes on the boundary//
{
matrix_zero (16, 2, xgp);
matrix_zero (16, 3, scStresses);
for(k1=0;k1<4;k1++) //elements around the patch node, anticlockwise starting from top left//
{
elnum=(i-1)*nElementsY+j-1;
if(k1==2 || k1==3)
elnum=elnum+nElementsY;
if(k1==0 || k1==3)
elnum=elnum+1;
for (int p=0;p<4;p++)
for(int q=0;q<2;q++)
nc2[p][q]=nodeCoord[element[elnum][p]][q];
Element4NInterpolate (gp, nc2, xegp); //convert gauss points from local to global coords//
for(int j1=0;j1<4;j1++)
{
indexArray[2*j1]=2*element[elnum][j1];
indexArray[2*j1+1]=2*element[elnum][j1]+1;
}
for(int p=0;p<8;p++)
uu2[p]=u[indexArray[p]];
T=N[0]*E[element[elnum][0]]+N[1]*E[element[elnum][1]]+N[2]*E[element[elnum][2]]
+N[3]*E[element[elnum][3]];
D2[0][0] = T/(1-v*v); D2[0][1] = v* T/(1-v*v); D2[0][2] = 0;
D2[1][0] = v* T/(1-v*v); D2[1][1] = T/(1-v*v); D2[1][2] = 0;
D2[2][0] = 0; D2[2][1] = 0; D2[2][2] = 0.5*(1-v)* T/(1-v*v);
Element4NStress(gp, nc2, uu2, D2, ss_temp);
for(int p=k1*4;p<(k1+1)*4;p++)
{
for(int q=0;q<3;q++)
scStresses[p][q]=ss_temp[p%4][q];
for(int qq=0;qq<2;qq++)
xgp[p][qq]=xegp[p%4][qq];
}
}
for(int p=0;p<16;p++)
{
A[p][0]=1.0;
A[p][1]=xgp[p][0];
A[p][2]=xgp[p][1];
A[p][3]=xgp[p][0]*xgp[p][1];
}
MxTrans(16,4,A,transAA);
MxMulMx_Multi(4, 16, 4, transAA, A, AA);
MxInv(4,AA,invAA);
MxMulMx_Multi(4, 4, 16, invAA, transAA, BB);
MxMulMx_Multi(4, 16, 3, BB, scStresses, pcoeffs); //fit curve to stresses at super-convergent points//
int cont1=1;
for(k1=0; k1<4; k1++) //elements around the patch node, anticlockwise starting from top left//
{
//handle edges
if(i==1 && (k1==0 || k1==1))
cont1=cont1+2;
if(i==nElementsX-1 && (k1==2 || k1==3))
cont1=cont1+2;
if(j==1 && (k1==1 || k1==2))
cont1=cont1+2;
if(j==nElementsY-1 && (k1==0 || k1==3))
cont1=cont1+2;
}
int *indexArray2;
indexArray2= (int * )malloc(cont1*sizeof(int));
vector_zero_int (cont1, indexArray2);
indexArray2[0]=i*nNodesY+j+1; //node in the center of the patch//
cont1=1;
for(k1=0; k1<4; k1++) //elements around the patch node, anticlockwise starting from top left//
{
elnum=(i-1)*nElementsY+j-1;
if(k1==2 || k1==3)
elnum=elnum+nElementsY;
if(k1==0 || k1==3)
elnum=elnum+1;
//handle edges
if(i==1 && (k1==0 || k1==1))
{
indexArray2[cont1]=element[elnum][3]+1;
indexArray2[cont1+1]=element[elnum][0]+1;
cont1=cont1+2;
}
if(i==nElementsX-1 && (k1==2 || k1==3))
{
indexArray2[cont1]=element[elnum][1]+1;
indexArray2[cont1+1]=element[elnum][2]+1;
cont1=cont1+2;
}
if(j==1 && (k1==1 || k1==2))
{
indexArray2[cont1]=element[elnum][0]+1;
indexArray2[cont1+1]=element[elnum][1]+1;
cont1=cont1+2;
}
if(j==nElementsY-1 && (k1==0 || k1==3))
{
indexArray2[cont1]=element[elnum][2]+1;
indexArray2[cont1+1]=element[elnum][3]+1;
cont1=cont1+2;
}
}
kk=Unique2(cont1,indexArray2);
indexArray_temp= (int *) malloc(kk* sizeof(int));
vector_zero_int (kk, indexArray_temp);
Unique3(cont1, indexArray2, indexArray_temp);
for (int q=0;q<kk;q++)
indexArray_temp[q]=indexArray_temp[q]-1;
for(int p=0;p<kk;p++)
{
pc_temp[0]=1.0;
pc_temp[1]=nodeCoord[indexArray_temp[p]][0];
pc_temp[2]=nodeCoord[indexArray_temp[p]][1];
pc_temp[3]=nodeCoord[indexArray_temp[p]][0]*nodeCoord[indexArray_temp[p]][1];
VecMulMx(1, 4, 3, pc_temp, pcoeffs, pc_temp2);
//matrix_zero_int (nNodes, 3, count3);
for(k=0;k<3;k++)
{
rStress[indexArray_temp[p]][k]=
rStress[indexArray_temp[p]][k]+ pc_temp2[k];
count3[indexArray_temp[p]][k]=
count3[indexArray_temp[p]][k]+1;
}
}
free(indexArray2);
free(indexArray_temp);
}
}
for(i=0;i<16;i++)
free(xgp[i]);
free(xgp);
for(i=0;i<16;i++)
free(scStresses[i]);
free(scStresses);
for(i=0;i<nNodes;i++)
for(j=0;j<3;j++)
rStress[i][j]=rStress[i][j]*1.0/count3[i][j]; // actually in this case there is no need to average - only one recovery for each node
//free the memory //
for(i=0;i<4;i++)
free(xegp[i]);
free(xegp);
for(i=0;i<3;i++)
free(D2[i]);
free(D2);
for(i=0;i<4;i++)
free(nc2[i]);
free(nc2);
for(i=0;i<4;i++)
free(ss_temp[i]);
free(ss_temp);
for(i=0;i<16;i++)
free(A[i]);
free(A);
for(i=0;i<4;i++)
free(pcoeffs[i]);
free(pcoeffs);
for(i=0;i<4;i++)
free(AA[i]);
free(AA);
for(i=0;i<4;i++)
free(transAA[i]);
free(transAA);
for(i=0;i<nNodes;i++)
free(count3[i]);
free(count3);
for(i=0;i<4;i++)
free(invAA[i]);
free(invAA);
for(i=0;i<4;i++)
free(BB[i]);
free(BB);
free(pc_temp);
free(pc_temp2);
free(uu2);
}
//**************** Main function for GPU-MLEFEA ****************************//
int main(void)
{
printf("\n\nProgram is running!\n\n\n");
double MeshSize=10, thickness=300,Evalue=24500, v=0.2, TOL=0.2, PredefinedXc=70;
double Load=2e5, phy=0.6, fcdot=25, kesai=0.9, kesai2=1.2, kesai3=0.6, ratio=1.0,
ratio_step=0.1;
int length=2000, b=500, Max_iterations=50, Ky=5000, iterations, niter=4;
int SupPosition=250, RightPosition=length, SupOutCount, SupInCount,
RightBoundaryCount, numNonZero=0;
int i, j, k, count, xc, j1, m;
int *coord_x, *middle, *flag, *flag2, *flag3, *coord_below, *coord_above;
int ** nodeConn, *rowPtr, *col, **elemIndex, *temp_node,**nodeCoord, **element;
double T, Maxcomp_stress, Maxtens_stress, tensile_force, steel_area, moment2, Xc2,
Allowable_tens_stress;
double *elem, *u, *E, *f_CSR, *N, *middle_x, *middle_y, *middle_xy, *below_tensile_stress, *above_compressive_stress;
double **rStress, **gp, **D, **Ke;
int ndof, nElementsX, nElementsY, nElements, nNodes, nNodesX, nNodesY, maxflag;
int *LoadPoint, *SupIn, *SupOut, *RightBoundary;
int **nc;
double Max_comp_stress;
double *cpu_out, *gpu_out, *Xc_value2, *real_tensile_force, *Area_of_Steel, *real_moment2, *sigma1, *sigma3;
nElementsX = (int)length/MeshSize;
nElementsY = (int) b/MeshSize;
nElements = nElementsX * nElementsY;
nNodesX = nElementsX + 1;
nNodesY = nElementsY + 1;
nNodes = nNodesX * nNodesY;
ndof=2*nNodes; //two degrees of freedom (x and y translation) for each node
//initialize the array //
nodeCoord= (int **) malloc(nNodes * sizeof(int *));
for(i = 0; i < nNodes; i++)
nodeCoord[i] = (int * )malloc(2* sizeof(int));
matrix_zero_int (nNodes, 2, nodeCoord); //each row is one node: column 0 holds the x coord, column 1 the y coord
element= (int **) malloc(nElements * sizeof(int *));
for(i = 0; i < nElements; i++)
element[i] = (int * )malloc(4* sizeof(int));
matrix_zero_int (nElements, 4, element); //each row is one element; each column holds one node number
gp= (double **) malloc(4 * sizeof(double *));
for(i = 0; i < 4; i++)
gp[i] = (double * )malloc(2* sizeof(double));
matrix_zero (4, 2, gp);
Ke= (double **) malloc(8 * sizeof(double *));
for(i = 0; i < 8; i++)
Ke[i] = (double * )malloc(8* sizeof(double));
matrix_zero (8, 8, Ke);
cpu_out= (double *) malloc(ndof * sizeof(double ));
vector_zero (ndof, cpu_out);
gpu_out= (double *) malloc(ndof * sizeof(double ));
vector_zero (ndof, gpu_out);
nc= (int **) malloc(4 * sizeof(int *));
for(i = 0; i < 4; i++)
nc[i] = (int * )malloc(2* sizeof(int));
matrix_zero_int (4, 2, nc);
D= (double **) malloc(3 * sizeof(double *));
for(i = 0; i < 3; i++)
D[i] = (double * )malloc(3* sizeof(double));
matrix_zero (3, 3, D);
rStress= (double **) malloc(nNodes * sizeof(double *));
for(i = 0; i < nNodes; i++)
rStress[i] = (double * )malloc(3* sizeof(double));
//initialize the vector //
E= (double * )malloc(nNodes*sizeof(double));
vector_zero (nNodes, E);
sigma1= (double * )malloc(nNodes*sizeof(double));
vector_zero (nNodes, sigma1);
sigma3= (double * )malloc(nNodes*sizeof(double));
vector_zero (nNodes, sigma3);
N= (double * )malloc(4*sizeof(double));
vector_zero (4, N);
f_CSR= (double * )malloc(ndof*sizeof(double));
vector_zero (ndof, f_CSR);
LoadPoint= (int * )malloc(ndof*sizeof(int));
vector_zero_int ((int)(250/MeshSize+1), LoadPoint);
SupIn= (int * )malloc(ndof*sizeof(int));
vector_zero_int ((int) (SupPosition/MeshSize-1), SupIn);
SupOut= (int * )malloc(ndof*sizeof(int));
vector_zero_int (2, SupOut);
RightBoundary= (int * )malloc(ndof*sizeof(int));
vector_zero_int ((int) (b/MeshSize+1), RightBoundary);
u= (double * )malloc(ndof*sizeof(double));
vector_zero (ndof, u);
flag= (int * )malloc(nNodes*sizeof(int));
flag2= (int * )malloc(nNodes*sizeof(int));
flag3= (int * )malloc(nNodes*sizeof(int));
middle= (int * )malloc((int)(b/MeshSize+1)*sizeof(int));
vector_zero_int ((int)(b/MeshSize+1), middle);
middle_x= (double * )malloc((int)(b/MeshSize+1)*sizeof(double));
vector_zero ((int)(b/MeshSize+1), middle_x);
middle_y= (double * )malloc((int)(b/MeshSize+1)*sizeof(double));
vector_zero ((int)(b/MeshSize+1), middle_y);
middle_xy= (double * )malloc((int)(b/MeshSize+1)*sizeof(double));
vector_zero ((int)(b/MeshSize+1), middle_xy);
coord_x= (int * )malloc((int)(b/MeshSize+1)*sizeof(int));
vector_zero_int ((int)(b/MeshSize+1), coord_x);
real_tensile_force= (double * )malloc(Max_iterations*sizeof(double));
vector_zero (Max_iterations, real_tensile_force);
Area_of_Steel= (double * )malloc(Max_iterations*sizeof(double));
vector_zero (Max_iterations, Area_of_Steel);
Xc_value2= (double * )malloc(Max_iterations*sizeof(double));
vector_zero (Max_iterations, Xc_value2);
real_moment2= (double * )malloc(Max_iterations*sizeof(double));
vector_zero (Max_iterations, real_moment2);
//work out nodal coordinates (nodeCoord)//
k= 0;
for (i=0; i<nNodesX; i++)
{
xc= i*length/nElementsX;
for (j=0; j< nNodesY; j++)
{
nodeCoord[k][0]= xc;
nodeCoord[k][1]= b* j /nElementsY;
k=k+1;
}
}
for (i=0; i<nNodes; i++)
E[i]=Evalue;
Max_comp_stress=phy*0.9*fcdot;
//work out which elements connect to which nodes//
k=0;
for (i=0;i<nElementsX; i++)
{
for (j=0;j<nElementsY;j++)
{
int n= i*nNodesY+j;
element[k][0]=n; element[k][1]=n+nNodesY; element[k][2]=n+nNodesY+1; element[k][3]=n+1;
k=k+1;
}
}
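The two nested loops above number the nodes column by column (x index outer, y index fastest), so element k touches nodes n, n+nNodesY, n+nNodesY+1 and n+1 counter-clockwise from the bottom-left corner. The same rule in isolation, with hypothetical names, useful for checking the numbering on a small grid:

```c
#include <assert.h>

/* Structured-grid Q4 connectivity: nodes numbered column-major (y fastest).
   elem[k] lists the four corner nodes counter-clockwise from bottom-left. */
void quad_connectivity(int nElementsX, int nElementsY, int elem[][4])
{
    int nNodesY = nElementsY + 1;
    int k = 0;
    for (int i = 0; i < nElementsX; i++)
        for (int j = 0; j < nElementsY; j++) {
            int n = i * nNodesY + j;      /* bottom-left node */
            elem[k][0] = n;
            elem[k][1] = n + nNodesY;     /* bottom-right */
            elem[k][2] = n + nNodesY + 1; /* top-right */
            elem[k][3] = n + 1;           /* top-left */
            k++;
        }
}
```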
gp[0][0]=-1.0/sqrt(3.0); gp[0][1]=-1.0/sqrt(3.0);
gp[1][0]=-1.0/sqrt(3.0); gp[1][1]=1.0/sqrt(3.0);
gp[2][0]=1.0/sqrt(3.0); gp[2][1]=-1.0/sqrt(3.0);
gp[3][0]=1.0/sqrt(3.0); gp[3][1]=1.0/sqrt(3.0);
N[0]=0.25*(1-1.0/sqrt(3.0)-1.0/sqrt(3.0)+1.0/sqrt(3.0)*(1.0/sqrt(3.0)));
N[1]=0.25*(1+1.0/sqrt(3.0)-1.0/sqrt(3.0)-1.0/sqrt(3.0)*(1.0/sqrt(3.0)));
N[2]=0.25*(1+1.0/sqrt(3.0)+1.0/sqrt(3.0)+1.0/sqrt(3.0)*(1.0/sqrt(3.0)));
N[3]=0.25*(1-1.0/sqrt(3.0)+1.0/sqrt(3.0)-1.0/sqrt(3.0)*(1.0/sqrt(3.0)));
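The four constants above are the bilinear (Q4) shape functions evaluated at the Gauss point ξ = η = 1/√3. A general sketch of those shape functions (hypothetical helper name) that reproduces the listed values and satisfies the partition-of-unity property:

```c
#include <assert.h>
#include <math.h>

/* Bilinear (Q4) shape functions on the reference square [-1,1]^2.
   N[a] weights the contribution of corner a; the four values sum to 1. */
void q4_shape(double xi, double eta, double N[4])
{
    N[0] = 0.25 * (1 - xi) * (1 - eta);
    N[1] = 0.25 * (1 + xi) * (1 - eta);
    N[2] = 0.25 * (1 + xi) * (1 + eta);
    N[3] = 0.25 * (1 - xi) * (1 + eta);
}
```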
nodeConn= (int **) malloc(nNodes * sizeof(int *));
for(i = 0; i < nNodes; i++)
nodeConn[i] = (int * )malloc(9* sizeof(int));
matrix_zero_int (nNodes, 9, nodeConn);
rowPtr= (int *) malloc((2*nNodes+1)* sizeof(int));
vector_zero_int (2*nNodes+1, rowPtr);
temp_node= (int *) malloc(4 * sizeof(int));
vector_zero_int (4, temp_node);
//********* starting the iterations ****************//
for (iterations=0;iterations<Max_iterations;iterations++)
{
elemIndex= (int **) malloc(8 * sizeof(int *));
for(i = 0; i < 8; i++)
elemIndex[i] = (int * )malloc(8* sizeof(int));
matrix_zero_int (8, 8, elemIndex);
//CSR format assembly//
//find information for CSR storage of stiffness matrix
MakeNodeConn(nNodes, element, nElements, nodeConn);
numNonZero=0;
for (i=0;i<nNodes; i++)
for(j=0;j<9;j++)
if(nodeConn[i][j]!=0)
numNonZero=numNonZero+4; //two elements in two rows
col= (int *) malloc(numNonZero* sizeof(int));
vector_zero_int (numNonZero, col);
elem= (double *) malloc(numNonZero* sizeof(double)); // storage space for CSR
vector_zero(numNonZero, elem);
MakeCSR(rowPtr, col, numNonZero, nNodes, nodeConn);
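MakeCSR fills the standard compressed-sparse-row triplet (rowPtr, col, elem): row i owns the entries rowPtr[i] to rowPtr[i+1]-1. A minimal serial matrix-vector product over the same layout (a hypothetical sanity-check function, not the thesis kernel) shows how the three arrays are read back:

```c
#include <assert.h>

/* y = A*x with A in CSR format: row i owns entries rowPtr[i]..rowPtr[i+1]-1,
   col[k] giving the column index and elem[k] the value of entry k. */
void csr_spmv(int n, const int *rowPtr, const int *col, const double *elem,
              const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++)
            sum += elem[k] * x[col[k]];
        y[i] = sum;
    }
}
```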
//Assemble the stiffness matrix in CSR format//
for (i=0; i<nElements; i++)
{
for (int p=0;p<4;p++)
for(int q=0; q<2;q++)
{
nc[p][q]=nodeCoord[element[i][p]][q];
}
T=N[0]*E[element[i][0]]+N[1]*E[element[i][1]]+N[2]*E[element[i][2]]+N[3]*E[element[i][3]];
D[0][0] = T/(1-v*v); D[0][1] = v* T/(1-v*v); D[0][2] = 0;
D[1][0] = v* T/(1-v*v); D[1][1] = T/(1-v*v); D[1][2] = 0;
D[2][0] = 0; D[2][1] = 0; D[2][2] = 0.5*(1-v)* T/(1-v*v);
Element4NStiffness(nc,thickness,D,Ke);
for (k=0; k<4; k++)
temp_node[k]=element[i][k];
GetCSRElemIndex (temp_node, rowPtr, col, elemIndex);
for(j1=0; j1<8; j1++)
for(int kk1=0; kk1<8; kk1++)
{
m = elemIndex[j1][kk1];
elem[m]=elem[m] + Ke[j1][kk1];
}
}
//add surface forces//
count=0;
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==(int)b && nodeCoord[i][0]>((int)length-250) && nodeCoord[i][0]<(int)length)
{
LoadPoint[count]=i;
count++;
}
}
for (i=0;i<count;i++)
f_CSR[2*LoadPoint[i]+1]=-Load*1.0/2/(250/MeshSize);
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==(int)b && nodeCoord[i][0]==((int)length-250))
f_CSR[2*i+1]=-0.5*Load/2/(250/MeshSize);
if (nodeCoord[i][1]==(int)b && nodeCoord[i][0]==(int)length)
f_CSR[2*i+1]=-0.5*Load/2/(250/MeshSize);
}
//add restraint equations for the support nodes//
SupOutCount=0;
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==0)
if ( nodeCoord[i][0]==SupPosition || nodeCoord[i][0]==0)
{
SupOut[SupOutCount]=i;
SupOutCount++;
}
}
for (j=0; j<SupOutCount; j++)
for (i=rowPtr[2*SupOut[j]+1]; i<rowPtr[2*SupOut[j]+1+1]; i++ )
if (col[i]==(2*SupOut[j]+1))
elem[i]=elem[i]+Ky/2;
SupInCount=0;
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==0 && nodeCoord[i][0]<SupPosition && nodeCoord[i][0]>0)
{
SupIn[SupInCount]=i;
SupInCount++;
}
}
for (j=0; j<SupInCount; j++)
for (i=rowPtr[2*SupIn[j]+1]; i<rowPtr[2*SupIn[j]+1+1]; i++ )
if (col[i]==(2*SupIn[j]+1))
elem[i]=elem[i]+Ky;
// Apply boundary conditions for the right-side nodes using Penalty Method //
RightBoundaryCount=0;
for(i=0; i<nNodes; i++)
{
if (nodeCoord[i][0]==(int)RightPosition) //RightPosition can be set to length
{
RightBoundary[RightBoundaryCount]=i;
RightBoundaryCount++;
}
}
for(j=0; j<RightBoundaryCount; j++)
for(i=rowPtr[2*RightBoundary[j]]; i<rowPtr[2*RightBoundary[j]+1]; i++ )
if (col[i]==(2*RightBoundary[j]))
elem[i]=elem[i]*1e10;
for(i=0; i<RightBoundaryCount; i++)
f_CSR[2*RightBoundary[i]]=0;
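Scaling the diagonal stiffness entry by 1e10 and zeroing the corresponding load is the penalty method: the inflated diagonal forces the constrained displacement toward zero without renumbering equations. A toy dense 2x2 illustration of the effect (hypothetical names, solved by Cramer's rule for checking only):

```c
#include <assert.h>
#include <math.h>

/* Penalty method on a dense system K u = f: inflating K[dof][dof] and
   zeroing f[dof] makes the solved u[dof] approximately zero. */
void penalty_fix(int n, double K[][2], double *f, int dof, double big)
{
    (void)n;               /* kept for symmetry with a general interface */
    K[dof][dof] *= big;
    f[dof] = 0.0;
}
```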
// find nodal displacements *******************u = K\f; = inv(K)*f //
//cpu approach
cpu_pcg_solve( ndof, rowPtr, col, elem, f_CSR, cpu_out);
//gpu approach
gpu_pcg_solve(rowPtr, (ndof+1), col, numNonZero, elem, numNonZero, f_CSR, ndof, gpu_out);
for(i=0;i<ndof;i++)
u[i]=gpu_out[i];
matrix_zero (nNodes, 3, rStress);
//Nodal stress calculation
SPRStress(u, E, N, v, nElements, nElementsX, nElementsY, nNodes, nNodesX, nNodesY, nodeCoord, element, ndof, MeshSize, b, gp, rStress);
for(i=0; i<nNodes; i++)
{
sigma1[i]=(rStress[i][0]+rStress[i][1])/2+sqrt(pow((rStress[i][0]-rStress[i][1])/2,2)+pow(rStress[i][2],2));
sigma3[i]=(rStress[i][0]+rStress[i][1])/2-sqrt(pow((rStress[i][0]-rStress[i][1])/2,2)+pow(rStress[i][2],2));
}
Maxcomp_stress=max_abs_double(nNodes, sigma3);
Maxtens_stress=max_abs_double(nNodes,sigma1);
//find max_flag
vector_zero_int (nNodes, flag);
for(i=0; i<nNodes; i++)
if(fabs(sigma3[i])>Max_comp_stress)
flag[i]=1;
maxflag=max_int(nNodes, flag);
//find the position of central line
int middleCount=0;
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][0]==length && nodeCoord[i][1]<b)
{
middle[middleCount]=i;
middleCount++;
}
}
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][0]==length && nodeCoord[i][1]==b)
{
middle[middleCount]=i;
middleCount++;
}
}
for (i=0; i<middleCount; i++)
{
middle_x[i]=rStress[middle[i]][0];
middle_y[i]=rStress[middle[i]][1];
middle_xy[i]=rStress[middle[i]][2];
coord_x[i]=nodeCoord[middle[i]][1];
}
//find the tensile and compressive stress
int below_tensile_stressCount=0;
int above_compressive_stressCount=0;
int coord_belowCount=0;
int coord_aboveCount=0;
for(i=0;i<middleCount;i++)
{
if (middle_x[i]>=0)
{
below_tensile_stressCount++;
coord_belowCount++;
}
else
{
above_compressive_stressCount++;
coord_aboveCount++;
}
}
below_tensile_stress= (double *) malloc(below_tensile_stressCount* sizeof(double));
vector_zero (below_tensile_stressCount, below_tensile_stress);
above_compressive_stress= (double *) malloc(above_compressive_stressCount* sizeof(double));
vector_zero (above_compressive_stressCount, above_compressive_stress);
coord_below= (int*) malloc(coord_belowCount* sizeof(int));
vector_zero_int (coord_belowCount, coord_below);
coord_above= (int*) malloc(coord_aboveCount* sizeof(int));
vector_zero_int (coord_aboveCount, coord_above);
below_tensile_stressCount=0;
above_compressive_stressCount=0;
coord_belowCount=0;
coord_aboveCount=0;
for(i=0;i<middleCount;i++)
{
if (middle_x[i]>=0)
{
below_tensile_stress[below_tensile_stressCount]=middle_x[i];
below_tensile_stressCount++;
coord_below[coord_belowCount]=coord_x[i];
coord_belowCount++;
}
else
{
above_compressive_stress[above_compressive_stressCount]=middle_x[i];
above_compressive_stressCount++;
coord_above[coord_aboveCount]=coord_x[i];
coord_aboveCount++;
}
}
//find the tensile force and the area of reinforcement
tensile_force=area(coord_belowCount, coord_below, below_tensile_stress, thickness);
real_tensile_force[iterations]=tensile_force;
steel_area=tensile_force/400;
Area_of_Steel[iterations]=steel_area;
//find the position of steel-xc
moment2=Moment2(coord_belowCount,coord_below,below_tensile_stress,thickness);
Xc2=(moment2)/tensile_force;
Xc_value2[iterations]=Xc2;
real_moment2[iterations]=moment2;
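area() and Moment2() are assumed here to integrate the mid-span tensile stress profile over the sampled depth (times the section thickness) to obtain the resultant tensile force and its first moment, from which Xc2 locates the resultant. A plausible trapezoidal-rule sketch of that force integral (hypothetical helper, not necessarily the thesis implementation):

```c
#include <assert.h>
#include <math.h>

/* Trapezoidal integral of a sampled stress profile s(y) times thickness t:
   force = t * sum over segments of 0.5*(s[i]+s[i+1])*(y[i+1]-y[i]). */
double stress_resultant(int n, const int *y, const double *s, double t)
{
    double force = 0.0;
    for (int i = 0; i < n - 1; i++)
        force += 0.5 * (s[i] + s[i + 1]) * (double)(y[i + 1] - y[i]);
    return force * t;
}
```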
//compressive stress redistribution//
if (maxflag!=0)
{
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==b && nodeCoord[i][0]==length)
E[i]=0;
if (fabs(sigma3[i])>Max_comp_stress)
if(sigma1[i]<=0.33*sqrt(fcdot))
E[i]=kesai*phy*0.9*fcdot/fabs(sigma3[i])*E[i];
else
E[i]=kesai*phy*0.54*fcdot/fabs(sigma3[i])*E[i];
}
}
//tensile stress redistribution
if ((iterations+1)%niter==1)
{
Allowable_tens_stress= Maxtens_stress*(ratio-ratio_step);
}
if ((Xc2-PredefinedXc)>TOL)
{
vector_zero_int(nNodes, flag2);
for(i=0;i<nNodes;i++)
if (sigma1[i]>Allowable_tens_stress)
{
flag2[i]=1;
E[i]=kesai2*fabs(sigma1[i])/Allowable_tens_stress*E[i];
}
}
if ((Xc2-PredefinedXc)<-TOL)
{
vector_zero_int(nNodes, flag3);
for(i=0;i<nNodes;i++)
if (sigma1[i]>Allowable_tens_stress)
{
flag3[i]=1;
E[i]=kesai3*fabs(sigma1[i])/Allowable_tens_stress*E[i];
}
}
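The redistribution branches apply multiplicative modulus updates: overstressed compressive regions are softened by the factor kesai*sigma_allow/|sigma3| (less than 1 when the yield limit is violated, shedding stress), while the tensile branches scale E by kesai2 or kesai3 times |sigma1|/sigma_allow to shift the tensile resultant. The two rules in isolation (hypothetical function names, a sketch rather than the thesis routines):

```c
#include <assert.h>
#include <math.h>

/* Compressive branch: reduce stiffness of an overstressed region so it
   sheds stress toward the allowable level. */
double soften_E(double E, double sigma, double sigma_allow, double xi)
{
    return xi * sigma_allow / fabs(sigma) * E;
}

/* Tensile branch: scale stiffness in proportion to the overstress ratio
   (xi > 1 attracts stress to the region, xi < 1 sheds it). */
double stiffen_E(double E, double sigma, double sigma_allow, double xi)
{
    return xi * fabs(sigma) / sigma_allow * E;
}
```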
printf("iter=%d\n\n",iterations);
if(maxflag==0 && fabs(Xc2-PredefinedXc)<TOL)
{
printf("\n\nFinal_iter is %d\n\n", iterations);
break;
}
for(i=0;i<8;i++)
free(elemIndex[i]);
free(elemIndex);
free(col);
free(elem);
free(below_tensile_stress);
free(above_compressive_stress);
free(coord_above);
free(coord_below);
printf("iterations= %d; maxflag=%d; steel_area=%lf; Xc2=%lf; x_diff= %lf; Maxcomp_stress=%lf\n", iterations, maxflag, steel_area, Xc2, Xc2-PredefinedXc, Maxcomp_stress);
}
printf("iterations= %d; maxflag=%d; steel_area=%lf; Xc2=%lf; x_diff= %lf; Maxcomp_stress=%lf\n", iterations, maxflag, steel_area, Xc2, Xc2-PredefinedXc, Maxcomp_stress);
//free all variables//
for(i=0;i<nNodes;i++)
free(nodeCoord[i]);
free(nodeCoord);
for(i=0;i<nElements;i++)
free(element[i]);
free(element);
for(i=0;i<4;i++)
free(gp[i]);
free(gp);
for(i=0;i<8;i++)
free(Ke[i]);
free(Ke);
for(i=0;i<4;i++)
free(nc[i]);
free(nc);
for(i=0;i<3;i++)
free(D[i]);
free(D);
free(cpu_out);
free(gpu_out);
free(E);
free(sigma1);
free(sigma3);
free(N);
free(f_CSR);
free(LoadPoint);
free(SupIn);
free(SupOut);
free(RightBoundary);
for(i=0;i<nNodes;i++)
free(nodeConn[i]);
free(nodeConn);
free(rowPtr);
free(temp_node);
free(u);
free(flag);
free(flag2);
free(flag3);
free(middle);
free(middle_x);
free(middle_y);
free(middle_xy);
free(coord_x);
free(real_tensile_force);
free(Area_of_Steel);
free(Xc_value2);
free(real_moment2);
return 0;
}
//****************** Main Program of GPU-PCG *****************************//
// using PCG approach to solve Ax = b for x with A in CSR format.
// rowptr : matrix row pointer
// col : matrix column pointer
// elem : matrix values
// size* : size of each vector
// vec : pointer to RHS vector
// x_final : solution (x) is returned here
void gpu_pcg_solve(int* rowptr, int size_findrm, int *col, int size_colm, double* elem, int matrix_val_size, double* vec, int rhs_val_size, double *x_final)
{
clock_t gputime, gpustartingtime, gpuendingtime;
int GPUITER;
int sumGPUtime=0;
for(GPUITER = 0; GPUITER < MAXGPUITER; GPUITER++)
{
gpustartingtime=clock();
// CSR Matrix on the GPU
int *k_findrm, *k_colm;
double *k_val;
// Vectors on the GPU
double *k_b, *k_x, *k_r, *k_d, *k_q, *k_s;
// Diagonal matrix on the GPU (stored as a vector)
double* k_jac;
// Scalars on the GPU
double *k_alpha, *k_snew, *k_beta, *k_sold, *k_s0;
// Scalars on the host
double s0, snew;
int iterations = 0;
// Allocate space on the GPU for the CSR matrix and RHS vector, and copy from host to GPU
cudaMalloc((void**)&k_findrm, sizeof(int)*(size_findrm));
cudaMemcpy(k_findrm, rowptr, sizeof(int)*(size_findrm), cudaMemcpyHostToDevice);
cudaMalloc((void**)&k_colm, sizeof(int)*(size_colm));
cudaMemcpy(k_colm, col, sizeof(int)*(size_colm), cudaMemcpyHostToDevice);
cudaBindTexture(NULL, texture_colm, k_colm, sizeof(int)*(size_colm));
cudaMalloc((void**)&k_val, sizeof(double)*(matrix_val_size));
cudaMemcpy(k_val, elem, sizeof(double)*(matrix_val_size), cudaMemcpyHostToDevice);
cudaMalloc((void**)&k_b, sizeof(double)*(rhs_val_size));
cudaMemcpy(k_b, vec, sizeof(double)*(rhs_val_size), cudaMemcpyHostToDevice);
// Allocate space for vectors on the GPU
cudaMalloc((void**)&k_x, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_r, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_d, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_q, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_s, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_jac, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_alpha, sizeof(double));
cudaMalloc((void**)&mid_temp, sizeof(double)*NUM_BLOCKS);
cudaMalloc((void**)&k_snew, sizeof(double)*NUM_BLOCKS);
cudaMalloc((void**)&k_sold, sizeof(double));
cudaMalloc((void**)&k_beta, sizeof(double));
cudaMalloc((void**)&k_s0, sizeof(double));
// Dimensions of blocks and grid on the GPU
dim3 BlockDim(NUM_THREADS);
dim3 GridDim(NUM_BLOCKS);
// Create diagonal preconditioning matrix (J = 1/diag(M))
create_diag<<<1,BlockDim>>>(rhs_val_size, k_findrm, k_colm, k_val, k_jac);
// Bind the matrix to the texture cache - this was not done earlier as we modified the matrix
cudaBindTexture(NULL, texture_val, k_val, sizeof(double)*(matrix_val_size));
// Initialise result vector (x=0)
veczero<<<1,BlockDim>>>(rhs_val_size, k_x);
// r=b-Ax (r=b since x=0), and d=M^(-1)r
cudaMemcpy(k_r, k_b, sizeof(double)*(rhs_val_size), cudaMemcpyDeviceToDevice);
diag_spmv<<<1,BlockDim>>>(rhs_val_size, k_jac, k_r, k_d);
// s0 = r.d
vecdot(rhs_val_size, k_r, k_d, k_s0);
// snew = s0
scalarassign(k_snew, k_s0);
// Copy snew and s0 back to host so that host can evaluate stopping condition
cudaMemcpy(&snew, k_snew, sizeof(double), cudaMemcpyDeviceToHost);
cudaMemcpy(&s0, k_s0, sizeof(double), cudaMemcpyDeviceToHost);
// While i < imax and snew > epsilon^2*s0
while(( iterations<IMAX) && (snew>(Epsilon*Epsilon*s0)))
{
kernel<<<GridDim,BlockDim>>>(k_findrm, k_d, k_colm, k_val, k_q, rhs_val_size);
// alpha = snew/(d.q)
vecdot(rhs_val_size, k_d, k_q, k_alpha);
scalardiv<<<1,1>>>(k_snew, k_alpha, k_alpha);
// x = x + alpha*d
axpy<<<GridDim,BlockDim>>>(rhs_val_size, k_alpha, k_d, k_x, k_x);
// r = r - alpha*q
ymax<<<GridDim,BlockDim>>>(rhs_val_size, k_alpha, k_q, k_r);
// s = M^(-1)r
diag_spmv<<<GridDim,BlockDim>>>(rhs_val_size, k_jac, k_r, k_s);
// sold = snew
scalarassign(k_sold, k_snew);
// snew = r.s
vecdot(rhs_val_size, k_r, k_s, k_snew);
// beta = snew/sold
scalardiv<<<1,1>>>(k_snew, k_sold, k_beta);
// d = s + beta*d
axpy<<<GridDim,BlockDim>>>(rhs_val_size, k_beta, k_d, k_s, k_d);
// Copy back snew so the host can evaluate the stopping condition
cudaMemcpy(&snew, k_snew, sizeof(double), cudaMemcpyDeviceToHost);
iterations++;
}
// Copy result vector back from GPU
cudaMemcpy(x_final, k_x, sizeof(double)*(rhs_val_size), cudaMemcpyDeviceToHost);
//ThreadSynchronize
cudaThreadSynchronize();
// free memory
cudaUnbindTexture( texture_colm);
cudaUnbindTexture( texture_val);
cudaFree(k_findrm);
cudaFree(k_colm);
cudaFree(k_val);
cudaFree(k_b);
cudaFree(k_x);
cudaFree(k_r);
cudaFree(k_d);
cudaFree(k_q);
cudaFree(k_jac);
cudaFree(k_alpha);
cudaFree(k_snew);
cudaFree(k_sold);
cudaFree(k_beta);
cudaFree(k_s0);
cudaFree(mid_temp);
gpuendingtime=clock();
gputime=gpuendingtime-gpustartingtime;
sumGPUtime=sumGPUtime+gputime;
}
//printf(" \nSolving time on GPU for %d steps is %d ms\n\n", GPUITER, sumGPUtime/GPUITER);
}
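create_diag on the GPU and cpu_creat_jac on the CPU are both assumed to build the Jacobi preconditioner jac[i] = 1/A[i][i] from the CSR arrays, which the diag_spmv kernels then apply as a pointwise product. A serial sketch under that assumption (hypothetical function name):

```c
#include <assert.h>

/* Jacobi preconditioner from CSR storage: jac[i] = 1/A[i][i].
   Scans row i of (rowPtr, col, elem) for the diagonal entry. */
void csr_jacobi(int n, const int *rowPtr, const int *col, const double *elem,
                double *jac)
{
    for (int i = 0; i < n; i++)
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++)
            if (col[k] == i)
                jac[i] = 1.0 / elem[k];
}
```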
//******************* Main Program of CPU-PCG *********************************//
// Host implementation of the conjugate gradient method to solve Ax = b for x, with A in CSR format //
void cpu_pcg_solve( int rhs_val_size, int *k_findrm, int *k_colm, double *k_val, double *k_b, double *k_x)
{
int CPUITER;
int sumCPUtime=0;
clock_t cputime, startingtime, endingtime;
for(CPUITER = 0; CPUITER < MAXCPUITER; CPUITER++)
{
startingtime=clock();
int iterations=0;
double k_s0, k_snew, k_alpha, k_sold, k_beta;
double *k_jac = (double *)malloc(rhs_val_size * sizeof(double));
double *k_r = (double *)malloc(rhs_val_size * sizeof(double));
double *k_d = (double *)malloc(rhs_val_size * sizeof(double));
double *k_q = (double *)malloc(rhs_val_size * sizeof(double));
double *tmp = (double *)malloc(rhs_val_size * sizeof(double));
double *k_s = (double *)malloc(rhs_val_size * sizeof(double));
//create the diagonal preconditioning matrix (J=1/diag(M))
cpu_creat_jac(rhs_val_size,k_findrm,k_colm,k_val,k_jac);
//initialize result vector (x=0)
vector_zero(rhs_val_size,k_x);
//r=b-Ax (r=b since x=0),and d=M^(-1)r
memcpy(k_r,k_b,rhs_val_size*sizeof(double));
cpu_diag_spmv(rhs_val_size,k_jac,k_r, k_d);
//s0=r.d
k_s0=vec_dot_vec(rhs_val_size,k_r,k_d);
//snew=s0
k_snew=k_s0;
vector_zero(rhs_val_size,k_q);
while(( iterations<IMAX) && (k_snew>(Epsilon*Epsilon*k_s0)))
{
vector_zero(rhs_val_size,k_q);
spmv_csr_serial(rhs_val_size, k_findrm, k_colm,k_val,k_d,k_q);
k_alpha=vec_dot_vec(rhs_val_size,k_d,k_q);
k_alpha=k_snew/k_alpha;
sca_mul_vec(rhs_val_size,k_alpha,k_d,tmp);
vec_add(rhs_val_size,k_x,tmp);
//r=r-alpha*q
sca_mul_vec(rhs_val_size,k_alpha,k_q,tmp);
vec_sub(rhs_val_size,k_r,tmp);
cpu_diag_spmv(rhs_val_size,k_jac,k_r, k_s);
k_sold=k_snew;
//snew=r.s
k_snew=vec_dot_vec(rhs_val_size,k_r,k_s);
//beta=snew/sold
k_beta=k_snew/k_sold;
//d=s+beta*d
sca_mul_vec(rhs_val_size,k_beta,k_d,tmp);
vec_add_vec(rhs_val_size,k_s,tmp,k_d);
iterations++;
}
free(k_jac);
free(k_r);
free(k_d);
free(k_q);
free(tmp);
free(k_s);
endingtime=clock();
cputime=endingtime-startingtime;
sumCPUtime=sumCPUtime+cputime;
}
//printf(" \nSolving time on CPU for %d steps is %d ms\n\n", CPUITER, sumCPUtime/CPUITER);
}