Efficient Reinforced Concrete Design Using Modified Linear Elastic Finite Element
Analysis and its GPU Implementation
by
Xing TAN
BEng
This thesis is presented for the degree of
Doctor of Philosophy
of
The University of Western Australia
School of Civil and Resource Engineering
May 2012
Supervisor: Professor Andrew J. Deeks
Abstract
Although the strut-and-tie approach is a rational and reasonable method for the design of
non-flexural members in concrete structures, it may lead to suboptimal designs, as much of
the material present in the member is neglected. Other difficulties, such as the time
consumed and the dependence of the solutions on the individual designer, have also been
encountered in its implementation. To avoid these problems, design may be undertaken using
conventional linear elastic finite element analysis (LEFEA), which can yield more efficient designs with
less material usage. However, the conventional linear elastic finite element method is also
inefficient when the non-flexural members contain stress singularities, such as occur in a deep
beam with square or rectangular web openings. These singularities produce local stress fields
that always violate the yield criterion. This thesis proposes a modified linear
elastic finite element method which can successfully remove the stress singularities by
adjusting the elastic modulus in certain regions. This new approach involves stress
redistribution in terms of both compressive stress and tensile stress. Three different types of
beams, namely shallow beams, deep beams and deep beams with rectangular openings are
used to demonstrate its efficiency. Additionally, both the conventional strut-and-tie method
and the conventional LEFEA method are performed for comparison. Results show that the
modified linear elastic finite element approach to design (MLEFEA) is efficient, as it can
overcome some of the inefficiencies involved in both the conventional strut-and-tie design
approach and
the conventional linear elastic finite element design approach. Furthermore, to verify its safety,
the performance of the designs resulting from the new method is assessed through non-linear
finite element analysis using ABAQUS; the results indicate that MLEFEA is safe and can
be used as a design approach.
In order to make the MLEFEA analysis more efficient in terms of computing time, this thesis
also describes the implementation of the method on Graphics Processing Units (GPUs). GPUs
are now widely used in a variety of scientific computing applications due to their high
memory bandwidth and massively parallel computational capacity. This thesis applies GPUs to
the stress redistribution process arising
from the analysis of deep beams with rectangular openings. The basic process of stress
redistribution and the GPU architecture are first introduced, then several parallel techniques
for iterative methods are reviewed. Finally, the preconditioned conjugate gradient (PCG)
method is chosen as the most suitable approach for the current application. This is followed
by an introduction to the compressed sparse row (CSR) storage format and the sparse
matrix-vector multiplication (SpMV) algorithm. The GPU-PCG method used for solving the
systems of equations
is then described, and the stiffness matrix assembly in CSR format is also presented.
Finally, the efficiency of the GPU implementation is demonstrated by comparing the speed of
the GPU-based and sequential CPU-based stress redistribution algorithms for the example of a
deep beam with web openings.
Acknowledgement
I owe my sincere and deepest gratitude to my supervisor Professor Andrew J. Deeks for his
generous help and guidance in this study. With great patience and enthusiasm, he always
provided invaluable discussion and diligent support throughout my whole study. This thesis
would certainly not have been completed without him.
Additionally, I deeply appreciate his effort in giving me the unique opportunity to pursue my
PhD with him and in providing me with a scholarship during my time at both the University of
Western Australia and Durham University in the UK. What I learned from this remarkable
experience with him will benefit me greatly throughout my future life.
Finally, my unconditional love goes to my parents and my brothers, whose heartfelt
encouragement and never-ending support gave me strength all along the way. Without their
love and support, I could not have achieved anything.
Statement of Candidate Contribution
I certify that except where references are made in the text to the work of others, the contents
of this thesis are original and have not been submitted to any other university.
This thesis is the result of my own work.
Xing Tan
May 2012
Table of Contents
Abstract ............................................................................................................................... i
Acknowledgement .............................................................................................................. v
Statement of Candidate Contribution ................................................................................ vii
Table of Contents ............................................................................................................... ix
List of Figures ................................................................................................................... xiii
List of Tables .................................................................................................................... xvii
List of Appendices ............................................................................................................. xix
1 Introduction ................................................................................................................ 1
1.1 Background and Motivation for This Work ................................................................ 1
1.2 Outline of This Thesis ................................................................................................. 2
PART I: EFFICIENT REINFORCED CONCRETE DESIGN METHOD ............................................... 4
2 Basic Theory and Literature Review ............................................................................. 5
2.1 Structural Design Methods ......................................................................................... 5
2.1.1 Working Stress Method ................................................................................ 6
2.1.2 Ultimate Load Method .................................................................................. 7
2.1.3 Limit State Method ....................................................................................... 8
2.2 Design of Reinforced Concrete Structures ............................................................... 10
2.2.1 Equivalent Stress Block Method ................................................................. 10
2.2.2 Strut and Tie Method .................................................................................. 13
2.2.3 Linear Elastic Finite Element Method ......................................................... 19
2.3 Stress Singularities .................................................................................................... 22
2.4 The Finite Element Method ...................................................................................... 25
2.5 Summary ................................................................................................................... 30
3 Comparison of Conventional Design Approaches with LEFEA-based Design ................. 32
3.1 Design of a Flexural Reinforced Concrete Beam ...................................................... 32
3.1.1 Application of Conventional Design Approach (Equivalent Stress Block Approach) .................................................................................................................. 33
3.1.2 Application of Conventional LEFEA Approach ............................................. 34
3.1.3 Cost Comparison and Remarks .................................................................... 39
3.2 Application to Design of Non-flexural Reinforced Concrete Beams without Rectangular Openings ......................................................................................................... 40
3.2.1 Application of Conventional Design Approach (STM) ................................. 40
3.2.2 Application of Conventional LEFEA Approach ............................................. 42
3.2.3 Cost Comparison and Remarks .................................................................... 47
3.3 Design of Non-flexural Reinforced Concrete Beams with Rectangular Openings .... 48
3.3.1 Application of Conventional Design Approach (STM) ................................. 49
3.3.2 Application of Conventional LEFEA Approach ............................................. 52
3.3.3 Cost Comparison and Remarks .................................................................... 56
3.4 Summary ................................................................................................................... 57
4 Modified Linear Elastic Finite Element Method ........................................................... 59
4.1 Stress Redistribution ................................................................................................. 59
4.1.1 Finite Element Implementation ................................................................... 61
4.1.2 Application to L-shaped Plate ...................................................................... 63
4.2 Summary ................................................................................................................... 70
5 Adaptive Stress Redistribution Approach ................................................................... 71
5.1 Adaptive Compressive Stress Redistribution Approach ........................................... 71
5.1.1 Application to Flexural Reinforced Concrete Beam .................................... 75
5.1.1.1 Cost Comparison and Remarks ..................................................... 79
5.1.2 Application to Non-flexural Reinforced Concrete Beams without Rectangular Openings ............................................................................................... 80
5.1.2.1 Cost Comparison and Remarks ..................................................... 83
5.1.3 Application to Non-flexural Reinforced Concrete Beams with Rectangular Openings ................................................................................................................... 84
5.1.3.1 Cost Comparison and Remarks .................................................... 86
5.2 Adaptive Tensile Stress Redistribution Approach .................................................... 87
5.3 Adaptive Stress Redistribution Approach for both Compressive and Tensile Stress 91
5.3.1 Application to Flexural Reinforced Concrete Beam .................................... 92
5.3.1.1 Cost Comparison and Remarks .................................................... 93
5.3.1.2 Nonlinear verification .................................................................. 96
5.3.2 Application to Non-flexural Reinforced Concrete Beams without Rectangular Openings .............................................................................................. 97
5.3.2.1 Cost Comparison and Remarks .................................................... 99
5.3.2.2 Non-linear Verification ............................................................... 100
5.3.3 Application to Non-flexural Reinforced Concrete Beams with Rectangular Openings ................................................................................................................. 101
5.3.3.1 Cost Comparison and Remarks .................................................. 102
5.3.3.2 Nonlinear verification ................................................................ 103
5.4 Summary ................................................................................................................. 104
PART II: EFFICIENT GRAPHICS PROCESSING UNIT (GPU) IMPLEMENTATION ............................... 106
6 Basic Theory & Literature Review ............................................................................. 107
6.1 Graphics Processing Unit (GPU) ............................................................................. 107
6.2 GPU Implementation of Finite Element Analysis ................................................... 113
6.2.1 Stiffness Matrix Assembly ......................................................................... 113
6.2.2 Stiffness Matrix Solving ............................................................................. 114
6.3 Summary ................................................................................................................. 118
7 Efficient GPU Implementation of the Modified LEFEA Approach ................................ 120
7.1 GPU Implementation of Preconditioned Conjugate Gradient Method (GPU-PCG) 120
7.2 GPU Implementation of Modified LEFEA Approach (GPU-MLEFEA) ...................... 123
7.3 Results Comparison (Speedup Results) .................................................................. 124
7.4 Summary ................................................................................................................. 125
8 Conclusions ............................................................................................................. 126
References ...................................................................................................................... 129
Appendices ..................................................................................................................... 140
List of Figures
Figure 1: Conditions at Mu in a singly reinforced concrete section (Warner 2007) ................... 11
Figure 2: Equivalent Rectangular Stress Block (Warner 2007) ................................................... 11
Figure 3: Geometry of Deep Beam ............................................................................................. 15
Figure 4: Von Mises Stress Plot for Deep Beam (Linear Elastic Analysis) ................................... 15
Figure 5: Strut-and-Tie Model for Deep Beam............................................................................ 15
Figure 6: Geometry of the Deep Beam with Rectangular Openings ........................................... 23
Figure 7: Finite Element Model (Left) and Von Mises Stress (Right) .......................................... 24
Figure 8: Isoparametric Bilinear Quadrilateral Element in Local Coordinates............................ 26
Figure 9: Geometry of Shallow Beam ......................................................................................... 32
Figure 10: Model for the Shallow Beam ..................................................................................... 36
Figure 11: Plot of Principal Compressive Stress .......................................................................... 37
Figure 12: Trapezoidal Rule for the Integration .......................................................................... 38
Figure 13: Plot of Tensile Stresses across Mid-span of Beam ..................................................... 38
Figure 14: Geometry of Deep Beam ........................................................................................... 40
Figure 15: Strut and Tie Model for Deep Beam .......................................................................... 41
Figure 16: Model for the Deep Beam ......................................................................................... 43
Figure 17: Plot of Principal Compressive Stress .......................................................................... 44
Figure 18: Plot of Principal Tensile Stress ................................................................................... 45
Figure 19: Plot of Tensile Stresses across Mid-span of Beam ..................................................... 46
Figure 20: Von Mises Stress for Deep Beam using LEFEA ........................................................... 47
Figure 21: Geometry of Deep Beam with Rectangular Openings ............................................... 49
Figure 22: Strut and Tie Model for Deep Beam with Rectangular Openings .............................. 49
Figure 23: Strut and Tie Model for Deep Beam with Rectangular Openings (Half Model) ........ 50
Figure 24: Force Equilibrium for the Applied Load ..................................................................... 50
Figure 25: STM model for Bottle Shaped Strut and Force Equilibrium (Warner 2007) .............. 51
Figure 26: Model for the Deep Beam with Rectangular Openings ............................................. 53
Figure 27: Plot of Principal Compressive Stress .......................................................................... 54
Figure 28: Plot of Principal Tensile Stress ................................................................................... 55
Figure 29: Plot of Tensile Stresses across Mid-span of Beam ..................................................... 56
Figure 30: Linear Reduction in Elastic Modulus .......................................................................... 60
Figure 31: L-shape Plate .............................................................................................................. 63
Figure 32: Von Mises Stress of L-shaped Plate (Coarse Mesh) ................................................... 64
Figure 33: Von Mises Stress of L-shaped Plate (Finer Mesh) ...................................................... 65
Figure 34: Von Mises Stress over X Direction ............................................................................. 66
Figure 35: Von Mises Stress over Y Direction ............................................................................. 66
Figure 36: Procedure for adjusting Elastic Modulus ................................................................... 67
Figure 37: Relative Value for Elastic Modulus ............................................................................ 67
Figure 38: Von Mises Stress of L-shaped Plate after Stress Redistribution ................................ 68
Figure 39: Principal Compressive Stress for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right) . 68
Figure 40: Stress in X Direction for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right) ............... 69
Figure 41: Stress in Y Direction for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right) ............... 69
Figure 42: Principal Tensile Stress for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right) .......... 70
Figure 43: Flowchart of the Practical Implementation of Adaptive Compressive Stress
Redistribution .............................................................................................................................. 72
Figure 44: Principal Compressive Stress for Shallow Beam---MLEFEA ........................................ 76
Figure 45: Difference of Principal Compressive Stress for Shallow Beam--- (LEFEA minus
MLEFEA) ....................................................................................................................................... 77
Figure 46: Stresses across Mid-span after Compressive Stress Redistribution ........................... 77
Figure 47: Relative Value of Elastic Modulus across Mid-span after Compressive Stress
Redistribution .............................................................................................................................. 78
Figure 48: Plots of Principal Compressive Stress using LEFEA with Adaptive Compressive Stress
Redistribution .............................................................................................................................. 81
Figure 49: Plots of Principal Tensile Stress using LEFEA with Adaptive Compressive Stress
Redistribution .............................................................................................................................. 81
Figure 50: Plot of Stresses at Mid-span ....................................................................................... 82
Figure 51: Stresses across Mid-span after Compressive Stress Redistribution (First and Last
Iteration) ...................................................................................................................................... 83
Figure 52: Plots of Principal Compressive Stress using adaptive compressive stress
redistribution approach ............................................................................................................... 85
Figure 53: Plots of Principal Tensile Stress using LEFEA with Adaptive Compressive Stress
Redistribution .............................................................................................................................. 85
Figure 54: Plot of Stresses at Mid-span ....................................................................................... 86
Figure 55: Flowchart of the Practical Implementation for Tensile Stress Redistribution ........... 89
Figure 56: Flowchart of the Practical Implementation for both Compressive and Tensile Stress
Redistributions ............................................................................................................................ 91
Figure 57: Stresses across Mid-span after Stress Redistribution for both Compressive and
Tensile Stresses............................................................................................................................ 92
Figure 58: Relative Value of Elastic Modulus across Mid-span after Stress Redistribution for
both Compressive and Tensile Stresses ...................................................................................... 93
Figure 59: Design Results for Shallow Beam ............................................................................... 94
Figure 60: Stress Blocks for Different Approaches ...................................................................... 96
Figure 61: Load vs. Deflection Curve for Shallow Beam .............................................................. 97
Figure 62: Stresses across Mid-span after Stress Redistribution for both Compressive and
Tensile Stresses............................................................................................................................ 98
Figure 63: Relative Elastic Modulus along Mid-span after Stress Redistribution for both
Compressive and Tensile Stresses ............................................................................................... 98
Figure 64: Design Results for Deep Beam ................................................................................. 100
Figure 65: Load vs. Deflection Curve for Deep Beam ................................................................ 101
Figure 66: Stresses across Mid-span after Stress Redistribution for both Compressive and
Tensile Stresses.......................................................................................................................... 102
Figure 67: Design Results for Deep Beam with Openings ......................................................... 103
Figure 68: Load vs. Deflection Curve for Deep Beam with Openings ........................................ 104
Figure 69: Different Design Philosophies for CPUs and GPUs ................................................... 108
Figure 70: Execution of a CUDA Program .................................................................................. 110
Figure 71: Hierarchy of CUDA Threads ...................................................................................... 111
Figure 72: Hierarchy of GPU Memory ....................................................................................... 112
Figure 73: CSR Representation for a Sparse Matrix K ............................................................... 122
Figure 74: SpMV Kernel for the Sparse Matrix in CSR Format .................................................. 122
Figure 75: GPU-PCG vs. CPU-PCG .............................................................................................. 125
List of Tables
Table 1: Cost Comparison for Shallow Beam Design .................................................................. 39
Table 2: Cost Comparison for Deep Beam Design ...................................................................... 47
Table 3: Cost Comparison for designs of Deep Beam with Openings ......................................... 57
Table 4: Stress Reduction Factors ............................................................................................... 73
Table 5: Comparison of MLEFEA with different value of ε ......................................................... 79
Table 6: Approaches Comparison for Shallow Beam .................................................................. 79
Table 7: Approaches Comparison for Deep Beam ...................................................................... 84
Table 8: Cost Comparison for designs of Deep Beam with Openings ......................................... 87
Table 9: Approaches Comparison for Shallow Beam (with Updated MLEFEA) .......................... 94
Table 10: Approaches Comparison for Deep Beam (with Updated MLEFEA) ............................ 99
Table 11: Comparison between GPU-PCG and PCG ................................................................. 124
List of Appendices
Appendix A: Complete Listing of the Program .......................................................................... 140
1 Introduction
1.1 Background and Motivation for This Work
As a popular and particularly effective building material, Portland cement concrete, typically
referred to simply as "concrete", has been widely used in construction in Australia and around
the world. It is made by mixing cement, water, fine aggregate (e.g. sand or finely crushed rock)
and coarse aggregate (e.g. gravel). In most cases, fly ash, limestone or blast furnace slag is
also used within the concrete mix.
The manufacture of cement is a major contributor to the release of carbon dioxide into the
atmosphere, with approximately 1 tonne of carbon dioxide released for each tonne of cement
produced. Concrete can therefore be a significant source of carbon emissions through its
production and delivery.
This issue has received widespread attention. For example, in Australia, to reduce the carbon
emissions due to construction activity, building owners are currently rewarded with green
stars if they include 20% fly ash or blast furnace slag in their concrete mixes.
However, this may not lead to the desired environmental outcome, as suppliers may increase
the total quantity of cement in the mix in order to meet a performance-based specification
(e.g. with respect to strength gain) while retaining 20% fly ash or slag. Since the production
of concrete is a major generator of carbon, reducing the amount of concrete in a design is a
much more effective way of reducing the environmental impact of concrete than such
arbitrary requirements.
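A back-of-envelope comparison makes this point concrete. The figure of roughly 1 tonne of CO2 per tonne of cement is the one quoted above; the 300 kg/m³ cement content, the one-for-one slag substitution and the `embodied_co2` helper are purely illustrative assumptions:

```python
# Illustrative only: compares two carbon-reduction routes for a pour.
# Assumed: ~1 t CO2 per t cement (figure quoted above) and a nominal
# cement content of 300 kg/m^3; both are rough illustrative values.

CO2_PER_T_CEMENT = 1.0   # t CO2 per t cement (approximate)
CEMENT_CONTENT = 0.3     # t cement per m^3 of concrete (assumed)

def embodied_co2(volume_m3, slag_fraction=0.0):
    """CO2 (tonnes) from the cement in a pour, assuming slag replaces
    cement one-for-one with negligible emissions of its own."""
    cement_t = volume_m3 * CEMENT_CONTENT * (1.0 - slag_fraction)
    return cement_t * CO2_PER_T_CEMENT

baseline = embodied_co2(100.0)                      # 100 m^3, plain mix
with_slag = embodied_co2(100.0, slag_fraction=0.2)  # same volume, 20% slag
leaner = embodied_co2(80.0)                         # 20% less concrete

# A supplier meeting a strength spec may add extra cement while keeping
# the 20% slag, eroding the first saving; the second cannot be offset.
```

With these assumptions both routes save 6 t of CO2 on a 100 m³ pour, but only the reduced-volume route is immune to the cement top-up described above.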
This thesis develops techniques that can effectively reduce the carbon emission during
construction activities by reducing the amount of concrete being used in structures through a
more efficient design approach based on Modified Linear Elastic Finite Element Analysis
(MLEFEA). By ensuring that concrete is used optimally, the environmental impact of concrete
building activity can be reduced to the minimum level necessary. Using
less concrete in buildings reduces the environmental impact throughout the complete lifecycle
of the building, and contributes to sustainable development.
1.2 Outline of This Thesis
This thesis starts with the preceding introduction to the overall background and motivation for
the research project. The main content of the thesis is then divided into two parts: the first
part is about the efficient reinforced concrete design approach, and the second is about the
efficient GPU implementation of this approach. In the first part, the basic theory of
reinforced concrete structural design and the literature on this topic are reviewed. The
concept of stress singularities and the finite element method are also introduced here.
In Chapter 3 a comparison between conventional design approaches and LEFEA-based design is
performed. Designs are conducted for three different types of structures: shallow beams, deep
beams and deep beams with rectangular openings. For the conventional approaches, the
equivalent stress block approach is used to design the shallow beams, while the strut-and-tie
approach is used for the deep beams. The solutions provide a general idea of the efficiency
and inefficiency of the LEFEA approach in various engineering applications. The cost comparison in
terms of concrete and steel is included.
Chapter 4 proposes a new approach (named MLEFEA) to perform the compressive stress
redistribution, using the example of an L-shaped plate. The results show that this approach can
successfully remove the stress singularity encountered at the re-entrant corner of the plate.
The details of the process required to perform the MLEFEA are presented.
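The essence of the procedure — locally reduce the elastic modulus wherever the stress criterion is violated, re-solve, and iterate until the stress field is admissible — can be illustrated on a deliberately simple surrogate. The parallel-bar model, the reduction factor `beta` and the tolerance below are illustrative assumptions, not the finite element implementation presented in the thesis:

```python
def redistribute(E, areas, force, stress_limit, beta=0.9, max_iter=200):
    """Toy stress redistribution: parallel bars of equal length share an
    axial force in proportion to stiffness, sigma_i = F*E_i / sum(E_j*A_j).
    Wherever the stress exceeds the limit, the modulus is scaled down by
    `beta` and the system is re-solved, shedding load onto the other bars."""
    E = list(E)
    for _ in range(max_iter):
        k_total = sum(e * a for e, a in zip(E, areas))
        stress = [force * e / k_total for e in E]
        over = [i for i, s in enumerate(stress) if s > stress_limit * 1.001]
        if not over:
            return E, stress          # admissible stress field reached
        for i in over:
            E[i] *= beta              # local reduction of elastic modulus
    return E, stress                  # last state if not converged

# Three unit-area bars; bar 0 is twice as stiff and initially overstressed.
E, sig = redistribute([200.0, 100.0, 100.0], [1.0, 1.0, 1.0],
                      force=240.0, stress_limit=90.0)
```

After a few iterations the overstressed bar's modulus has been reduced until every stress sits below the limit, while the total force carried is unchanged — the same mechanism by which the thesis removes singular stress peaks.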
Chapter 5 develops an adaptive MLEFEA for three different models. Additionally, both the
compressive stress redistribution and the tensile stress redistribution are introduced into the
MLEFEA and examined. At the same time, the non-linear analysis of the load capacity is
undertaken for each design using ABAQUS.
The rationale for using GPUs to perform MLEFEA is investigated in Chapter 6, which also
includes a general overview of part two. This chapter also deals with the basic theory and
literature about the GPU and its use in finite element analysis, and approaches to stiffness
matrix assembly and reduction are investigated.
Chapter 7 presents the GPU implementation of MLEFEA. To begin with, the GPU-based PCG
approach is introduced, then the GPU-based MLEFEA is developed, with several effective
optimization approaches being presented. This chapter ends with a speed comparison between
the CPU and GPU implementations.
The thesis finishes with a conclusion in Chapter 8 covering the key developments and findings
of this study. Appendices and references follow.
PART I: EFFICIENT REINFORCED CONCRETE DESIGN
METHOD
In this part, theories and literature relevant to the reinforced concrete design approaches are
reviewed first. The conventional design approaches are then investigated for three different
types of structures (beams). The basic idea of MLEFEA is proposed and is demonstrated with
an L-shaped plate application. This is followed by the introduction of an adaptive MLEFEA
approach, and the three types of beams are analysed again to demonstrate the
efficiency of this refined approach. The designs resulting from the approach are then verified
as safe and reasonable through non-linear finite element analysis.
2 Basic Theory and Literature Review
In this chapter, the basic theories involved in this work, including structural design methods,
reinforced concrete structure design methods and the stress singularities resulting from linear
elastic stress analysis of certain structural configurations, are introduced. In addition, the
literature about these theories is also reviewed. For the structural design methods, three
different design methods, namely the Working Stress Method, the Ultimate Load Method, and
the Limit State Method, are introduced separately. For the design of reinforced concrete
structures, including flexural and non-flexural members, the conventional Equivalent Stress
Block Method, the Strut and Tie Method, and the Linear Elastic Finite Element Method are
described. Finally, the steps for conducting finite element analysis of elastic structures are also
explained, as this method forms the basis for the approach taken in the work described in this
thesis.
2.1 Structural Design Methods
The fundamental objective of reinforced concrete structural design is to ensure that the
structure can achieve its intended purposes over its intended lifetime, with the following
properties being maintained (Varghese 2004; Punmia et al. 2007; Mosley et al. 2007):
Adequate Performance in terms of stability and strength: The structure should have adequate
strength to resist any overloads occurring during its life time and to perform well under service
conditions without collapse or excessive cracking.
Adequate Serviceability in terms of durability and stiffness: The structure should not exhibit
excessive deformation, and should maintain functionality over its intended lifetime, resisting
unexpected loads without great loss of stiffness.
Reasonable cost: Construction costs should be as economical as possible, while still meeting
the requirements of performance and serviceability.
With the above objectives being borne in mind, the following three methods are commonly
used to conduct the design of reinforced concrete structures.
1. Working Stress Method, also known as Modular Ratio Method;
2. Ultimate Load Method, also known as Load Factor Method;
3. Limit State Method (Cevahir et al. 2010a).
2.1.1 Working Stress Method
The Working Stress Method is a traditional design method adopted in early reinforced
concrete structural design codes (Varghese 2004; Punmia et al. 2007; Mosley et al. 2007). This
method allows for the conversion of a member constructed with different materials into an
“equivalent” section which has a homogeneous and single elastic modulus for design purposes,
which is why it is also called the “Modular Ratio Method”. In this approach, the structure is
analysed under specified design loads, and the stresses in each member are checked to ensure
a sufficient factor of safety against failure or yielding of that member. The magnitude of the
factor of safety required for a particular structural action depends upon the degree of safety
required. Calculation of these stresses usually involves idealising the structure as a collection
of beams and columns, and then calculating the stress resultants (internal axial forces, shear
forces, bending moments and torques) for every member. Elastic behaviour of the material is
assumed, and it is also assumed that plane sections remain plane. This method is very easy to
undertake and simple to understand. At the same time, designs produced by this method
generally result in relatively large structural member sections compared to designs produced by
the Ultimate Load Method, and therefore this method will give designs with comparatively
better serviceability and performance under working loads.
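The idea of the "equivalent" section can be illustrated with a short calculation. The following sketch transforms the steel into an equivalent area of concrete using the modular ratio n = Es/Ec and finds the cracked elastic section stresses; all numerical inputs (section size, steel area, moduli, service moment) are assumed example values, not data from this thesis.

```python
# Hedged sketch of the "equivalent section" idea behind the Working Stress
# (Modular Ratio) Method. The steel is replaced by n = Es/Ec times its area
# of concrete, and stresses follow from the cracked elastic section.
import math

def cracked_section_stresses(M, b, d, As, Es=200e3, Ec=25e3):
    n = Es / Ec                                  # modular ratio
    # Neutral axis depth x from first moments: b*x^2/2 = n*As*(d - x)
    a, bb, c = b / 2.0, n * As, -n * As * d
    x = (-bb + math.sqrt(bb**2 - 4 * a * c)) / (2 * a)
    I_cr = b * x**3 / 3.0 + n * As * (d - x)**2  # cracked second moment of area
    fc = M * x / I_cr                            # peak concrete stress (MPa)
    fs = n * M * (d - x) / I_cr                  # steel stress (MPa)
    return x, fc, fs

# 300 mm wide section, d = 450 mm, 1800 mm^2 of steel, service moment 100 kN.m
x, fc, fs = cracked_section_stresses(M=100e6, b=300.0, d=450.0, As=1800.0)
print(f"x = {x:.0f} mm, fc = {fc:.1f} MPa, fs = {fs:.0f} MPa")
```

In a working stress design these stresses would then be compared against permissible values, each incorporating its own factor of safety.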
However, because this method assumes that all stresses in the steel reinforcement remain in
the linear elastic range, it does not utilise the real strength of the structure and generally
results in a higher factor of safety against failure than is needed. Secondly, it generally results in
uneconomical designs when members use compressive reinforcement and when dealing with
compression members, as it will require a larger volume of compressive steel compared with
the Limit State Method. Finally, replacing the concrete and steel with a homogeneous material
for the purpose of analysis is not realistic, as the creep and non-linear behaviour of the
concrete mean that concrete does not have a definite elastic modulus.
In short, the design load applied to a structure designed in this manner will usually be far
below its actual ultimate collapse load, and so the design will be conservative. Consequently, it is
hard to obtain an optimal and economical design by using the Working Stress Method.
2.1.2 Ultimate Load Method
To overcome the shortcoming of the Working Stress Method, namely its inability to give designs
with the target factor of safety against failure, the Ultimate Load Method was introduced into
reinforced concrete design (Varghese 2004; Punmia et al. 2007; Mosley et al. 2007). This
method uses a load factor to ensure the safety of structures, taking this factor as the ratio of
the ultimate load of the structure to the working load carried by the structure. The structure is
then designed to collapse at the ultimate load. Unlike the Working Stress Method, which
considers only the elastic range of material behaviour, the Ultimate Load Method can provide
the required margin of safety against failure, since it considers the full non-linear stress-strain
relationship for both concrete and steel; experimental tests of structures designed in
this way show that the actual collapse loads are close to the design ultimate loads.
However, because this method utilises the full strength of the members, the structure sections
designed by this method may be very thin or slender, which may result in excessive cracking
and deformation under service loads and lead to a lack of serviceability. The method does not
consider the effects of creep and shrinkage in the concrete, which may exacerbate
serviceability problems.
To summarise, the Ultimate Load Method successfully ensures the safety of a structure and
results in efficient designs, while neglecting serviceability and performance under service loads.
Like the Working Stress Method, this method has been effectively superseded by the modern Limit State Method.
2.1.3 Limit State Method
As discussed above, the Working Stress Method gives reasonable serviceability and
performance but only partially utilizes the actual strength of the designed structure, while on
the other hand, the Ultimate Load Method provides the target structure strength without
considering adequate serviceability and performance under service loads. An ideal method
would take full advantage of both these two methods by guaranteeing the serviceability and
performance, while at the same time considering the target ultimate strength of structure.
As the Limit State Design approach can meet all of those requirements, it is now accepted and
widely adopted in many modern international reinforced concrete design codes,
including the Australian concrete design code AS3600 and the steel design code AS4100. There is
a substantial literature on the limit state method (Kotsovos & Pavlovic 1995; Varghese 2004;
Punmia et al. 2007). Importantly, when assessing a particular limit state of a structure by using
the Limit State Method, it is necessary to take all the variables (e.g. material strengths, load
types) into consideration. This can be achieved by using characteristic values for materials and
loads.
Limit states are specified on serviceability, ensuring that the structure performs adequately at
the expected working loads without deforming or cracking excessively, and also on strength.
The strength limit states specify loading regimes sufficiently far above expected working loads
that a structure designed to collapse under these regimes has a sufficiently low probability of
failure during its lifetime. This means the probability of occurrence of the limit state load
combinations during the lifetime of the structure is small enough that designs based on them
can be considered safe.
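A strength limit state check of this kind can be sketched in a few lines. The combination factors (1.2G + 1.5Q, in the style of AS/NZS 1170.0) and the capacity reduction factor of 0.8 below are assumptions chosen for illustration, not values taken from this thesis.

```python
# Illustrative strength limit state check: factored design action effect
# versus reduced design capacity. Factors are assumed example values.

def design_action(G, Q):
    """Factored action effect for the permanent + imposed load combination."""
    return 1.2 * G + 1.5 * Q

def design_capacity(R_ultimate, phi=0.8):
    """Design capacity: nominal ultimate capacity reduced by factor phi."""
    return phi * R_ultimate

# Example: bending moments (kN.m) at a critical section.
G, Q = 120.0, 80.0          # action effects from permanent and imposed loads
Mu = 350.0                  # nominal ultimate moment capacity of the section

Ed = design_action(G, Q)    # factored design action effect
Rd = design_capacity(Mu)    # design capacity
print(f"E_d = {Ed:.0f} kN.m, phi*R_u = {Rd:.0f} kN.m, OK = {Ed <= Rd}")
```

The characteristic values for material strength and load enter through Mu, G and Q, while the partial factors cover the variability of each.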
Obviously, there is still space to improve and optimize the techniques used within this method
so that designers can fully utilize the redundancy within structures. For example, designers can
allow some members to reach stresses which will cause significant non-linear behaviour
between the serviceability and the strength limit states.
To obtain the efficiencies possible through the limit state method, the moment redistribution
approach (Williams 2009) is widely employed in various applications of structural design,
including high-strength concrete beams (Carmo et al. 2005), reinforced concrete flexural
members (Scott & Whittle 2005) and plated reinforced concrete flexural members (Oehlers et
al. 2004b; Oehlers et al. 2004a). However, moment redistribution is most readily performed by
hand calculation for continuous beams. Moment redistribution for frames is more difficult.
Although there are some ‘tricks’ which can be used with structural analysis packages to
perform moment redistribution for frames, only a few designers are willing to use them. As
many designers do not feel confident enough to perform this type of moment redistribution,
they simply use the conventional elastic actions to perform the design. In this situation, the full
advantage possible by using the limit state design method is not obtained.
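The hand calculation for a continuous beam can be sketched briefly. The example below idealises an interior span under uniform load as fixed at both supports, reduces the elastic support moments by 20% and restores equilibrium at midspan; the span, load and redistribution level are assumptions for illustration (codes tie the permitted percentage to section ductility).

```python
# Hedged sketch of elastic moment redistribution for an interior span of a
# continuous beam under uniform load w, idealised as fixed at both supports.

def redistribute(w, L, reduction=0.20):
    """Reduce the elastic support moments and restore midspan equilibrium."""
    M_support_elastic = w * L**2 / 12.0   # elastic fixed-end moment
    M0 = w * L**2 / 8.0                   # total static (simply supported) moment
    M_support = (1.0 - reduction) * M_support_elastic
    # Equilibrium for equal support moments: M_midspan + M_support = M0.
    M_midspan = M0 - M_support
    return M_support, M_midspan

w, L = 30.0, 8.0   # kN/m, m -- assumed example values
Ms, Mm = redistribute(w, L)
# Elastic: 160 kN.m at the supports, 80 kN.m at midspan.
# After 20% redistribution: 128 kN.m at the supports, 112 kN.m at midspan,
# with the static moment wL^2/8 = 240 kN.m preserved.
```

The support sections are designed for the reduced moment and the midspan section for the increased one, relying on ductile hinge behaviour at the supports.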
Nowadays, all modern codes of design of structures are based on limit state analysis, where
two principal theorems of limit analysis, namely the lower bound theorem and the upper
bound theorem, are used (Koopman & Lance 1965; Sloan 1989; Frier & Damkilde 2009). The
lower bound theorem states that if a load is in equilibrium with an internal stress distribution
where no stress exceeds the local value of plastic stress, this load is equal to or less than, in
other words is a lower bound of, the true plastic limit load. The upper bound theorem states
that if a load is computed on the basis of an assumed kinematically admissible collapse
mechanism, this load is equal to or greater than, or in other words is an upper bound of, the
true plastic limit load. There is much research on the application of the lower and upper
bound theorems: the book by Neal (Neal 1985) gives a detailed introduction to plastic analysis
for beams and plane frames, Sloan (Sloan 1989) developed a perfectly plastic soil model for
computing rigorous upper bounds on limit loads under conditions of plane strain, and Frier and
Damkilde (Frier & Damkilde 2009) demonstrated lower bound limit state analysis by applying an
adapted interior-point method with a spatially varying barrier function.
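The two theorems can be illustrated with a classic textbook case (not taken from this thesis): a fixed-ended beam of span L under uniform load w, with plastic moment capacity Mp. Scaling the elastic stress field gives a static (lower bound) estimate, while a work balance on the three-hinge mechanism gives a kinematic (upper bound) estimate.

```python
# Illustration of the bound theorems for a fixed-ended beam under uniform
# load. All numerical values are assumed purely for demonstration.

def lower_bound_udl(Mp, L):
    """Scale the elastic stress field until the largest moment (the support
    moment wL^2/12) reaches Mp, giving w_lb = 12 Mp / L^2."""
    return 12.0 * Mp / L**2

def upper_bound_udl(Mp, L):
    """Work balance for the mechanism with hinges at both supports and
    midspan gives the kinematic estimate w_ub = 16 Mp / L^2."""
    return 16.0 * Mp / L**2

Mp, L = 100.0, 5.0              # kN.m, m
w_lb = lower_bound_udl(Mp, L)   # static estimate: never above the true load
w_ub = upper_bound_udl(Mp, L)   # kinematic estimate: never below it
print(f"lower bound {w_lb:.0f} kN/m <= collapse load <= upper bound {w_ub:.0f} kN/m")
```

For this beam the true collapse load coincides with the upper bound, because sufficient ductility allows full redistribution from the elastic distribution to the mechanism; the gap between the two estimates is the margin the purely elastic (lower bound) design leaves unused.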
2.2 Design of Reinforced Concrete Structures
The current Australian Standard for the design of concrete structures, AS3600, describes three
different methods for the design of the reinforced concrete members (Standards Australia
2009). These methods are Equivalent Stress Block Method, Strut and Tie Method, and Linear
Elastic Finite Element Method. The designer chooses from these methods depending on the
type of structure that needs to be designed. For example, because of the complexity of
non-flexural members (such as deep beams), the most common approach is the Strut and Tie
Method, while for the flexural members (such as shallow beams), the Equivalent Stress Block
Method is used (Warner 2007). Application of the linear elastic finite element method to
routine design is currently limited.
2.2.1 Equivalent Stress Block Method
According to AS3600, during the conventional design of flexural beams, as a means of
simplification, an Equivalent Rectangular Compression Stress Block may be used as a
replacement of the actual shape of the concrete compressive stress block (Warner 2007). This
is an approximation to the actual stress distribution in a reinforced concrete member that
takes advantage of the fact that the stress-strain curve of low to medium strength concrete
has a wide plateau region where the maximum stress is maintained reasonably constant with
increasing strain.
Figure 1: Conditions at Mu in a singly reinforced concrete section (Warner 2007)
Figure 1 shows the conditions at a cracked section in a singly reinforced
concrete beam under the ultimate moment Mu. The section is rectangular with width b
and overall depth D, and d is the depth of the steel below the top fibre.
Figure 2: Equivalent Rectangular Stress Block (Warner 2007)
In Australia and many other countries, the Equivalent Stress Block Method, as shown in Figure
2, is used to design the flexural members by simplifying the “true” stress block using the
following two conditions (Warner 2007):
1: The total volume of the “Equivalent” rectangular stress block and the “true” stress block
should be equal, so that the resultant force is the same in each case.
2: The location of the centroid for both the “Equivalent” and “true” stress blocks should be at
the same height in the cross-section, ensuring that the lever arm of the resultant force is the
same in each case.
The parameters used to describe the stress block in this method, according to AS3600, are:
α2 = 1.0 − 0.003 fc'  (0.67 ≤ α2 ≤ 0.85) (Eq. 2-1)
γ = 1.05 − 0.007 fc'  (0.67 ≤ γ ≤ 0.85) (Eq. 2-2)
Also, for the value of extreme fibre concrete strain εcu at which the ultimate moment Mu will
occur, the Australian Standard adopts the limiting strain:
εcu = 0.003 (Eq. 2-3)
In the case of the rectangular stress block, the compressive force in the concrete is:
C = α2 fc' γ dn b (Eq. 2-4)
In the absence of pre-stress or compressive reinforcement, the compressive force within the
concrete should equal the tensile force in the steel, leading to the equation:
α2 fc' γ dn b = As fsy (Eq. 2-5)
Here fsy is the yield stress of the steel and As is the area of the tensile reinforcing steel.
The lever arm between the resultant force of the equivalent stress block and the resultant
force of the stress in the reinforcing steel is:
z = d − γ dn / 2 (Eq. 2-6)
So the moment capacity is then calculated as:
Mu = As fsy (d − γ dn / 2) = As fsy d (1 − γ ku / 2) (Eq. 2-7)
Here, ku is the neutral axis parameter at the ultimate moment, which is defined as:
ku = dn / d (Eq. 2-8)
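The sequence from Eq. 2-1 to Eq. 2-8 can be collected into a short calculation. In the sketch below the section dimensions and material properties are assumed example values, not data from this thesis; the equilibrium equation (Eq. 2-5) is solved for the neutral axis depth dn, after which the lever arm and moment capacity follow directly.

```python
# Sketch of the Equivalent Stress Block calculation (Eqs. 2-1 to 2-8) for a
# singly reinforced rectangular section. Numerical inputs are assumed values.

def clamp(x, lo=0.67, hi=0.85):
    """AS3600-style limits on the stress block parameters."""
    return max(lo, min(hi, x))

def moment_capacity(fc, b, d, As, fsy=500.0):
    alpha2 = clamp(1.0 - 0.003 * fc)           # Eq. 2-1
    gamma = clamp(1.05 - 0.007 * fc)           # Eq. 2-2
    dn = As * fsy / (alpha2 * fc * gamma * b)  # Eq. 2-5 solved for dn
    z = d - gamma * dn / 2.0                   # Eq. 2-6, lever arm
    Mu = As * fsy * z                          # Eq. 2-7 (N.mm)
    ku = dn / d                                # Eq. 2-8
    return Mu, ku

# 32 MPa concrete, 300 mm wide section, d = 450 mm, 1800 mm^2 of 500 MPa steel
Mu, ku = moment_capacity(fc=32.0, b=300.0, d=450.0, As=1800.0)
print(f"Mu = {Mu / 1e6:.1f} kN.m, ku = {ku:.3f}")
```

A small ku here indicates an under-reinforced, ductile section, which is the condition required for the steel to yield before the concrete crushes.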
From the point of view of this thesis, replacing the actual distribution of compressive stress in
the concrete with the equivalent stress block, in which the stress is limited to α2 fc', can be
seen as an application of the lower bound theorem of plasticity. A distribution of stress over
the cross section has been found which is in equilibrium with the moment Mu and which does
not exceed the nominal plastic stress anywhere. Hence Mu is a lower bound on the actual
strength of the cross section.
2.2.2 Strut and Tie Method
Compared to flexural members, non-flexural members are more complicated to design and
analyse. Currently, the Strut and Tie method (STM) is widely used for the design of such
members. The STM is also an application of the lower bound theorem of plasticity. In the STM, the
complex flow of internal force in the region under consideration is idealized as a truss carrying
the imposed loading through the region to its supports. This truss is called a strut-and-tie
model and is a statically admissible stress field leading to a lower bound solution. Like a real
truss, the strut-and-tie model consists of struts and ties which are interconnected at nodes.
The struts represent regions of concrete which are assumed to carry compressive stress, while
the ties represent the reinforcement carrying tensile stress (Warner 2007). Once the truss
model is chosen, stress resultants within the struts and ties are calculated by using the
principles of statics. The stresses in the compressive struts are checked to ensure they do not
exceed a value representative of the concrete strength, while sufficient reinforcing steel is
provided in the ties to carry the required tensile force at yield. Concrete outside the
compressive struts and node regions is assumed not to carry any load. Generally speaking, the
strut and tie methodology allows designers to choose a rational load transfer path through the
structure, and then to design that load path to be strong enough to carry the strength limit
state loads. As the stress field of the STM is in internal equilibrium and equilibrates the applied
loads, it provides a rational approach to representing a complex structural member with an
appropriate simplified truss model. This approach is now widely used and has proved to be
very useful in the design and analysis of such disturbed regions of structural members.
In detail, to start the design process using the STM, the boundaries of the structural element
should first be defined, and the boundary forces (the ultimate design forces) on the element
should be determined from the imposed loads. Based on an understanding of how the applied
forces will be carried through the element, the designer must then sketch an appropriate
arrangement of struts and ties forming a truss, and then solve for the truss member forces.
The next step is to select the reinforcing steel to provide the necessary tie capacity and to
ensure that this reinforcement is properly anchored at the nodes. After this, the dimensions of
the struts and nodes are evaluated to ensure that the capacity of all struts and nodes is
sufficient to carry the strut member forces. The final step is to provide distributed
reinforcement to ensure ductile behaviour of the region (Foster 1998; Warner 2007; Wight et al.
2003; Zhang & Tan 2007a).
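The statics step of this procedure can be sketched for the simplest model, two inclined struts running from a midspan load point down to the supports and one horizontal tie joining the supports. The geometry, load, steel strength and capacity reduction factor below are all assumptions chosen for illustration only.

```python
# Hedged sketch of the statics of a symmetric two-strut, one-tie model for a
# simply supported deep beam with a central point load. Assumed values only.
import math

def stm_forces(P, span, lever_arm):
    """Member forces from equilibrium at the loaded node and the supports."""
    theta = math.atan2(lever_arm, span / 2.0)   # strut inclination to the tie
    strut = (P / 2.0) / math.sin(theta)         # compression in each strut
    tie = (P / 2.0) / math.tan(theta)           # tension in the bottom tie
    return theta, strut, tie

def tie_steel_area(tie_force_N, fsy=500.0, phi=0.7):
    """Reinforcement area needed for the tie to carry its force at yield."""
    return tie_force_N / (phi * fsy)

P = 1000e3  # 1000 kN point load at midspan (N)
theta, strut, tie = stm_forces(P, span=3000.0, lever_arm=1500.0)
As = tie_steel_area(tie)
print(f"theta = {math.degrees(theta):.0f} deg, strut = {strut/1e3:.0f} kN, "
      f"tie = {tie/1e3:.0f} kN, As = {As:.0f} mm^2")
```

In a full design the strut and node stresses would then be checked against the concrete strength limits, and the tie steel anchored at the nodes, as described above.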
A typical example of the application of the STM is a simply supported deep beam
with a force applied at midspan. The geometry and a plot of the Von Mises stresses
calculated by linear stress analysis are shown in Figure 3 and Figure 4 respectively. From Figure
4 it is clear that, unlike in a flexural member, the force is transferred directly from the
load point to the supports, rather than being transmitted laterally as shear force, which
invalidates the conventional design method for flexural beams. However, this can be
overcome by using the STM to model this particular load path. Figure 5 shows a typical strut
and tie model for this case, imitating the load transfer path reasonably
correctly.
As mentioned above, the STM is an application of the lower bound theorem of plasticity. The
assumed stress field within the concrete element is quite different from the actual stress field.
Concrete outside of the strut and node regions is assumed to be unstressed. However, the
assumed stress field is in internal equilibrium, and does not exceed the equivalent plastic
stress anywhere, and so the load applied in the analysis (which was the required strength limit
load) becomes a lower bound to the collapse load of the element. However, in order for the
lower bound theorem to apply, the material must have sufficient ductility to redistribute the
stresses to the assumed stress field. As concrete has limited ductility, distributed web
reinforcement may be necessary to ensure that the design load is achieved.
Figure 3: Geometry of Deep Beam
Figure 4: Von Mises Stress Plot for Deep Beam (Linear Elastic Analysis)
Figure 5: Strut-and-Tie Model for Deep Beam
The concept of STM is derived from the truss analogy introduced by Ritter (Ritter 1899) and
Mörsch (Mörsch 1902) in Switzerland and Germany. The truss analogy was then extended by
Rausch (Rausch 1929), who implemented it for beams subjected to torsion. He assumed that
the loads are carried by a truss consisting of longitudinal tension and compression chords
representing the tensile steel reinforcement and the concrete compression zones respectively,
while the stirrups provide the vertical members joining the longitudinal chords. He also
pointed out that beams with insufficient web reinforcement will suffer shear-cracking, and
asserted that the crack inclination was hard to calculate (Rausch 1929). However, Kupfer and
Hilsdorf (Kupfer & Hilsdorf 1969) solved this problem and proposed an equation to calculate
the crack inclination by minimizing the strain energy of the whole truss model. Similarly,
Baumann (Baumann 1972) developed an equation for the calculation of crack direction within
the reinforced concrete structures subjected to in-plane stresses. Later, Schlaich et al. (Schlaich
& Jennewein 1987) applied the truss analogy to all types of reinforced and prestressed
concrete structures using strut and tie systems.
Ashour et al. (Ashour 1997) tested several reinforced concrete continuous deep beams with
various span-to-depth ratios, amounts and type of web reinforcement, and amounts of main
longitudinal reinforcement, and found that the vertical web reinforcement had more influence
on the ultimate load capacity than the horizontal web reinforcement, contrary to code
predictions. To explain this, Foster (Foster 1998) developed a rational formulation to obtain
the minimum web reinforcement. This formulation is based on the assumption that the beams
will not fail suddenly due to the diagonal concrete cracking, and then the required minimum
reinforcement is that needed to carry the bursting forces at the time of cracking.
Tan (Tan 2001) developed a strut and tie model taking the effect of pre-stressing into
consideration. In the same paper he also proposed a model which takes into account the
combined tensile strength contributions from longitudinal and web reinforcement, prestressed
strands and concrete, and uses a linear interactive failure criterion modified from the Mohr-
Coulomb theory. This model can be used for both pre- and post- tensioned deep beams.
Based on the same failure criterion, Tong (Tan et al. 2003) presented a simple and direct STM
model which takes into account the contribution of different web reinforcement
configurations (vertical, horizontal, or inclined) and prestressing tendons. In this model,
because an interactive stress-based failure criterion is adopted, there is no need
to use other stress limits to calculate the ultimate strength of the beam.
Zhang and Tan (Zhang & Tan 2007b) discussed the size effect for deep beams, which is typified
by a reduction in measured shear strength due to an increase in the height of deep beams.
They pointed out that the size effect can be eased by properly configuring the dimensions of
the loading and support plates within the strut and tie model.
Using a different approach from Tan's strut and tie model for prestressed deep beams (Tan 2001),
Wang et al. (Wang & Meng 2008) developed a modified strut-and-tie model for prestressed
concrete deep beams which successfully predicts the strength of the pre-stressed concrete
deep beam. In this model prestressing is represented by equivalent external loads, and the
Kupfer-Gerstle biaxial tension compression criterion is adopted to take concrete softening into
consideration.
Some further work has also been done regarding the design and analysis of high strength
concrete deep beams. Tan (Tan et al. 1997) performed some tests on high strength concrete
deep beams subjected to combined top and bottom loading. In addition, Park and Hoque
(Hoque 2006; Park 2005) applied the STM to the analysis of fibre reinforced polymer (FRP)
strengthened deep reinforced concrete members.
However, compared to simple deep beams, the analysis and design of deep beams with web
openings has not attracted as much attention. Mansur and Tan (Mansur et al. 2001; Tan et al.
2003) proposed strut-and-tie models for the analysis of reinforced concrete beams containing
geometric discontinuities as a result of a circular opening. Hu et al. (Hu & Tan 2007) also
investigated the behaviour and shear strength of large reinforced concrete deep beams with
web openings. For high-strength concrete deep beams, Yang et al. (Yang et al. 2006) estimated
the influence of web openings experimentally and analytically.
The basic requirement for the STM is to choose a rational strut and tie model that, in most
cases, is similar to the true load transfer path within the structures. However, this can be
difficult to do when facing very complex structures, particularly those with penetrations (Tjhin
& Kuchma 2007). Even when choosing the strut and tie model arrangement with the aid of
linear finite element analysis to give the stress field information, the designers may still need
to try several times before identifying a suitable model. The literature indicates that web
openings often obstruct the most direct strut and tie models, and suggests many complicated
truss models to try to overcome the problem (Guan 2005; Mansur et al. 2001; Tan et al. 2003).
This makes the design procedure for deep beams with openings very tedious and laborious.
As stated previously, using the STM may lead to inefficient and conservative designs, as it only
considers the contribution of concrete within the struts to the whole strength capacity of the
structures, while neglecting the concrete outside the struts. This can lead to more concrete
being used than is actually necessary, and unnecessary carbon emissions in the production of
that concrete.
Recently, several researchers have developed an approach to choose more rational strut and
tie models through topology optimization, removing the inefficient material gradually from the
component being designed. Liang and Pillai (Liang et al. 2000; Nagarajan & Pillai 2008)
presented a topology optimization method that can automatically generate ‘optimal’ strut-
and-tie models for reinforced concrete structures. This is achieved by progressively removing
elements that have the least contribution to the stiffness from the discretized concrete
member. Guan (Guan 2005) demonstrated and evaluated a procedure for the design of
strut-and-tie models through topology optimization of continuum structures. Bruggi (Bruggi 2009)
implemented a minimum compliance optimization to generate strut and tie models in both
2D and 3D situations. However, while such approaches may lead to more effective strut-and-tie
models, the material that is discarded in the analysis process is still present in the actual
structure, while the contribution of that material to the true strength of the structure has also
been discarded in the design process. Therefore these approaches do not overcome the
fundamental inefficiency of the STM.
2.2.3 Linear Elastic Finite Element Method
While linear elastic frame analysis is extensively used in structural engineering design, the use
of linear elastic stress analysis in reinforced concrete design has so far attracted little attention
in the literature. In contrast, there is much research focussed on non-linear finite element
analysis of reinforced concrete structures (Hoque 2006; Roy & Thiagarajan 2007; Dabbagh &
Foster 2006; Au & Bai 2007). However, non-linear finite element analysis is mainly concerned
with analysing the performance of a particular structure under a particular loading regime,
rather than being a central tool in the design process (Kotsovos & Pavlovic 1995). There are
two reasons for this. One is that detailed information about the structural sections must be
known before non-linear analysis can be performed, whereas, in reality, the sizes and properties
of the structure and reinforcement are not known at the beginning of the design process. The
other is that results from non-linear analysis are highly dependent on both the load
combination and the load time history. Consequently the non-linear analysis method is usually
used to verify the performance of the design under certain extreme loadings, following design
of the structure using other techniques, such as the strut-tie method.
In contrast, linear elastic frame analysis, most often performed with the use of computer, can
be conducted at an early stage of a design using quite coarse assumptions about the member
sizes. From the results of structural analysis, the worst stress resultants can be obtained and
then, with the aid of moment redistribution, an efficient design can be achieved.
The 2008 revisions of AS3600 explicitly permit the use of linear elastic stress analysis (and by
implication the linear elastic finite element method) to design reinforced concrete structures
or members.
The Linear Elastic Finite Element Method has been available since the 1960s. It has been used
in the design of massive concrete structures such as dams and nuclear power plants, but has
not been used widely in general structural concrete design. Part of the reason for this
unpopularity in concrete design is that the results of finite element analysis are approximate
and are highly dependent on the mesh size. For example, relatively coarse meshes may result in
inaccurate stress results, where the stress fields do not satisfy equilibrium locally. If the stress
field used in the design does not satisfy equilibrium locally, the lower bound theorem of
plasticity does not hold, and the corresponding external load may not be the lower bound of
the collapse load of a structure. A good example of this is the collapse of the Sleipner A
platform in 1991 (Jakobsen 1994; Deeks 2008). During the design process, ineffective
modelling of the tri-cell legs of the platform using linear elastic finite element analysis led to
the shear stress in the concrete between the cells being significantly underestimated and
inadequate reinforcement being specified. This led to the failure and total loss of the structure,
with the economic loss exceeding $700m.
When compared with the conventional STM, there are many advantages to be gained by using
the linear finite element method (Foster 2003). It allows the stresses within a structure of
arbitrary geometry to be calculated, without the necessity of assuming plane sections remain
plane. With modern computer software it is very easy to apply and the finite element model is
easy to set up. In addition, linear analysis can accommodate multiple load cases quickly. A
design based on the linear stress field will place the greatest quantity of reinforcement at the
high-tensile stress areas, which can efficiently control crack widths. Also, since the stress field
is computed directly by the computer, less work is required from the designer.
Compared to the STM method, the linear finite element method considers the contribution of
concrete outside the conventional compressive struts and nodes, and so a more efficient
design can be achieved, not only in terms of less material usage and thus cost saving, but also
in terms of less time required from the designer.
The problems of the mesh dependency of finite element results and the non-equilibrium of
stress field locally when meshes are coarse can effectively be overcome by the application of
effective adaptive procedures. Through adaptive procedures, an initial coarse mesh specified
by the designer at the beginning of the process can be automatically refined to obtain a stress
field of specified accuracy.
Unfortunately, when a sufficiently fine mesh or an adaptive procedure is used, especially when
dealing with members with certain geometric discontinuities or boundary conditions (such as
deep beam with square or rectangular web openings), local stress concentrations are often
identified (Augarde & Deeks 2008; Zhu, Hinton & Zienkiewicz 1991; Zienkiewicz & Zhu 1992).
Adaptive refinement in the areas of stress concentration or singularity will result in the
calculation of progressively higher stresses.
The previous version of AS3600 allowed the design of non-flexural members using linear
elastic analysis, and using “accepted principles of mechanics”. There is no clear guidance as to
what the design process should be and no literature referred to. However, the code does
specify that the tensile stresses should all be carried by the reinforcement or tendons, but
there is no guidance as to what distance this stress can be integrated over, or what the
maximum bar spacing should be. The compressive stress is limited to a value dependent on the
characteristic concrete strength, and can be averaged over a distance of 100 mm to reduce the
peak stresses. Again, there is no literature quoted to support the use of such a stress averaging
process.
In contrast, the 2008 version of AS3600 allows the peak tensile stresses to be averaged over
an area “appropriate to the size of the element”, with no guidance about the way to choose
this “appropriate area”. As for the compressive stress, it is now limited to β fc', where β is
an efficiency factor which depends on the tensile stress and the concrete confinement. In
addition, the averaging process for the compressive stress has been removed. This makes it
impossible to use linear elastic finite element analysis when dealing with members with stress
singularities, as these stress singularities violate the maximum stress specified by the code.
2.3 Stress Singularities
Obviously, in order to take advantage of linear elastic finite element analysis for the design of
concrete members, a way must be found to deal with the stress concentrations and
singularities that violate the maximum stress criteria. Considering forces exerted on the
boundary of the member, the bearing stress can be approximated by “Stress = Force/Area”.
Based on this formula, the stress will be very large if the area is very small. The area of a point
(2D) or a line (3D) is theoretically zero, and this is why applying constraints and forces to points
or lines in a finite element model will result in regions where the stresses do not converge, but
keep getting higher as the mesh is refined. These points are referred to as stress singularities.
For problems of elasticity, stress singularities are also associated with particular geometric
discontinuities or boundary conditions, such as sharp corners (Barber 2002).
According to Williams (Williams 1952), the stresses near a re-entrant corner can be described
in terms of polar coordinates (r, θ), with the origin located at the corner. Williams deduced an
expression for the stress field given particular boundary conditions. According to this
expression, the stresses in the vicinity of the corner are proportional to \(r^{\lambda-1}\), with
the power of the singularity determined by the eigenvalue \(\lambda\), which depends on the corner
angle. For a re-entrant corner in a homogeneous material \(\lambda < 1\), and this results
in the stresses being singular at the corner, where r = 0.
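For illustration (this sketch is not part of the thesis), the character of such a singular field can be evaluated numerically. The eigenvalue \(\lambda \approx 0.5445\) used below is the value commonly quoted for a right-angle re-entrant corner with traction-free faces, and is an assumption here; any \(\lambda < 1\) shows the same unbounded growth as r approaches zero:

```python
# Illustrative sketch only: relative stress magnitude sigma ~ r**(lam - 1)
# near a re-entrant corner. lam = 0.5445 is the commonly quoted Williams
# eigenvalue for a right-angle re-entrant corner (assumed value).
lam = 0.5445

def stress_factor(r):
    """Relative stress at distance r (mm) from the corner, arbitrary scale."""
    return r ** (lam - 1.0)

for r in (10.0, 1.0, 0.1, 0.01):
    print(f"r = {r:6.2f} mm  ->  relative stress = {stress_factor(r):10.2f}")
```

The printed values grow without bound as r decreases, which is exactly the behaviour observed when the mesh near such a corner is refined.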
In this thesis, the most complex example used is a deep beam with rectangular openings. The
geometric discontinuities and boundary conditions for this problem which lead to stress
concentrations or singularities are: the re-entrant corners and the force and restraint
boundary conditions. These geometric features and boundary conditions for the structure
mean that it is not possible to find an analytical solution for the stress field. However, by using
the finite element method with a very fine mesh, the stress field near the singularities can be
approximated. Figure 6 shows the geometry of a deep beam with openings and Figure 7 shows
the finite element model used and the resulting Von Mises stresses due to a central applied
force. Concentration of the stress at the support point and at two of the re-entrant corners is
obvious. If the mesh is further refined by reducing the size of the elements, the stress
singularities become more obvious, and tend to infinity as the element size tends to zero.
Figure 6: Geometry of the Deep Beam with Rectangular Openings
Figure 7: Finite Element Model (Left) and Von Mises Stress (Right)
However, in an actual structure, infinite stresses at such points are not realistic, as any
engineering material, including concrete, would crack, crush or yield under the high stresses,
effectively removing these singularities. Furthermore, when constructed in practice, the
geometry of the structure is not truly ‘sharp’. The corners will have a finite radius, but it is
difficult to include this in a finite element model, as the radius will be much smaller than the
dimensions of the structure, and much smaller than the elements used in the example shown
in Figure 7.
Obviously, the presence of stress singularities is a significant barrier to the use of conventional
analysis methods, including adaptive finite element methods, as they involve a lot of effort to
refine the mesh locally in the areas near the stress singularities.
One approach to solving the problem of stress singularities is to adopt the scaled boundary
finite element method (SBFEM), which has been applied to linear elastic fracture of
unreinforced concrete elements (Yang & Deeks 2007). This method has proved to be an
efficient way to analyse stress singularities and discontinuities extremely accurately (Chidgzey
& Deeks 2005; Deeks & Wolf 2002a; Deeks & Wolf 2002b; Deeks & Wolf 2002c). However,
from the point of view of structural concrete design using linear stress analysis, obtaining the
analytical form of the stress singularities does not permit the stress field to be used for design
purposes, as the stresses still exceed those permissible.
The work reported in this thesis investigates a different approach to the problem. Utilising the
lower bound theorem of plasticity in the same way as moment redistribution does for
continuous concrete beams, the stresses in the regions of stress concentration are
redistributed, while at the same time maintaining equilibrium internally and externally. This
stress redistribution is achieved within linear elastic analysis by reducing the elastic
modulus of the material in the areas of high stress. The method combines the merits of the
ready automation of linear elastic finite element analysis with the versatility of the STM.
2.4 The Finite Element Method
Repeated references have been made above to the Linear Finite Element Method. This
method will be used extensively in the work reported here, and so, for completeness, a brief
description of the method is given here. There are many good books describing the method in
great detail (Logan 2002; Huebner et al. 1995; Rao 1989), to which the reader is referred for
more information.
Within the area of mathematical physics and engineering, the finite element method is widely
used and is an efficient and versatile numerical method. For the problems analysed here,
linear stress analysis using the finite element method is implemented through the
displacement/stiffness method, in which the displacements within each element are
approximated locally between the nodes using displacement shape functions. The nodal
displacements are then related to the stresses by using the strain/stress and
strain/displacement relationships (Logan 2002). Variational principles or the principle of virtual
work are then used to relate the nodal displacements to the applied forces. All the equations
resulting from this process can be easily written in matrix and vector form, and can be easily
evaluated in a computer programme or by mathematical software such as Matlab.
In this study the stress analysis is conducted using four-node bilinear quadrilateral plane stress
elements. Further information about this element and the basic formulation process will be
explained in this section.
Step 1 Discretizing Model and Selecting the Element Types
This step involves discretizing the problem domain into finite elements interconnected by
nodes. There are many element types available to choose from, depending on the problem
and the performance the analyst wants to achieve. Normally, the smaller the elements and
the higher their order, the more accurate the results will be; however, the computational cost
and effort required will also be higher, so engineering judgment must be exercised during this
process. In this study, as the concrete structural members being considered are all of
rectangular shape, the simplest quadrilateral element is chosen: the isoparametric four-node
bilinear quadrilateral plane stress element. Figure 8 illustrates this kind of element in terms of
a local coordinate system s-t.
Figure 8: Isoparametric Bilinear Quadrilateral Element in Local Coordinates
Step 2 Selecting the Shape Function
For the isoparametric element, the same shape functions used to interpolate the
displacements between the nodes are also used to map the element coordinates into the
global coordinates of the structure x-y:
\[ \begin{Bmatrix} x \\ y \end{Bmatrix} =
\begin{bmatrix} N_1 & 0 & N_2 & 0 & N_3 & 0 & N_4 & 0 \\ 0 & N_1 & 0 & N_2 & 0 & N_3 & 0 & N_4 \end{bmatrix}
\begin{Bmatrix} x_1 \\ y_1 \\ x_2 \\ y_2 \\ x_3 \\ y_3 \\ x_4 \\ y_4 \end{Bmatrix} \]  (Eq. 2-9)
where \(N_i\) is the shape function for node \(i\), with the usual counter-clockwise node numbering starting from the corner at \(s = t = -1\):
\[ N_1 = \tfrac{1}{4}(1-s)(1-t) \]  (Eq. 2-10)
\[ N_2 = \tfrac{1}{4}(1+s)(1-t) \]  (Eq. 2-11)
\[ N_3 = \tfrac{1}{4}(1+s)(1+t) \]  (Eq. 2-12)
\[ N_4 = \tfrac{1}{4}(1-s)(1+t) \]  (Eq. 2-13)
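As a numerical check on these expressions, the shape functions can be evaluated directly; the sketch below (illustrative only, assuming the counter-clockwise node numbering just described) verifies the interpolation properties:

```python
import numpy as np

def shape_functions(s, t):
    """Bilinear shape functions N1..N4 (Eq. 2-10 to 2-13) at local
    coordinates (s, t) in [-1, 1]^2, nodes numbered counter-clockwise."""
    return 0.25 * np.array([(1 - s) * (1 - t),
                            (1 + s) * (1 - t),
                            (1 + s) * (1 + t),
                            (1 - s) * (1 + t)])

# Each Ni equals 1 at its own node and 0 at the others, and the four
# functions sum to 1 at any interior point (partition of unity):
print(shape_functions(-1, -1))           # N1 = 1 at node 1
print(shape_functions(0.3, -0.7).sum())  # -> 1.0
```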
Step 3 Defining the Strain/Displacement and Stress/Strain Relationship
In order to derive the relationship between the nodal displacements and nodal forces for each
element, which can be used to evaluate the element stiffness matrices, it is first necessary to
relate the stresses and strains within the element to the displacements of its nodes. For the
bilinear quadrilateral elements used here, each element has four nodes and each node has two
degrees of freedom, so the displacement field is determined by eight nodal displacements. As
with the isoparametric mapping shown previously, the displacement field within the element is
described by the displacement functions u(s,t) and v(s,t):
\[ \begin{Bmatrix} u \\ v \end{Bmatrix} =
\begin{bmatrix} N_1 & 0 & N_2 & 0 & N_3 & 0 & N_4 & 0 \\ 0 & N_1 & 0 & N_2 & 0 & N_3 & 0 & N_4 \end{bmatrix}
\begin{Bmatrix} u_1 \\ v_1 \\ \vdots \\ u_4 \\ v_4 \end{Bmatrix} = [N]\{d\} \]  (Eq. 2-14)
The strains within the element can then be related to the nodal displacements as:
\[ \{\varepsilon\} = \begin{Bmatrix} \varepsilon_x \\ \varepsilon_y \\ \gamma_{xy} \end{Bmatrix} = [B]\{d\} \]  (Eq. 2-15)
The derivatives of the shape functions are contained in the matrix [B], which can be expressed as:
\[ [B] = \begin{bmatrix}
\dfrac{\partial N_1}{\partial x} & 0 & \cdots & \dfrac{\partial N_4}{\partial x} & 0 \\[4pt]
0 & \dfrac{\partial N_1}{\partial y} & \cdots & 0 & \dfrac{\partial N_4}{\partial y} \\[4pt]
\dfrac{\partial N_1}{\partial y} & \dfrac{\partial N_1}{\partial x} & \cdots & \dfrac{\partial N_4}{\partial y} & \dfrac{\partial N_4}{\partial x}
\end{bmatrix} \]  (Eq. 2-16)
Finally, the stress/displacement relationship is expressed as:
\[ \{\sigma\} = [D]\{\varepsilon\} = [D][B]\{d\} \]  (Eq. 2-17)
with the constitutive matrix [D] (plane stress here):
\[ [D] = \frac{E}{1-\nu^2} \begin{bmatrix} 1 & \nu & 0 \\ \nu & 1 & 0 \\ 0 & 0 & \frac{1-\nu}{2} \end{bmatrix} \]  (Eq. 2-18)
Step 4 Forming the Element Stiffness Matrix
Applying variational principles or the principle of virtual work, the element stiffness matrix
relating the nodal forces to the nodal displacements can now be obtained as:
\[ [k] = \int_{-1}^{1}\!\int_{-1}^{1} [B]^T [D] [B]\, h\, |J|\, ds\, dt \]  (Eq. 2-19)
where \(|J|\) is the determinant of the Jacobian matrix \([J]\), which is used to transform
derivatives from the local coordinates s-t into the global coordinates x-y, and h is the
thickness of the element:
\[ [J] = \begin{bmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[4pt]
\dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{bmatrix} \]  (Eq. 2-20)
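In practice the integral of Eq. 2-19 is evaluated by 2×2 Gauss quadrature. The following sketch (illustrative only; the elastic modulus of 24500 MPa and Poisson's ratio of 0.2 are the values used later for the beam analyses) assembles the 8×8 stiffness matrix of a single element:

```python
import numpy as np

def element_stiffness(coords, E=24500.0, nu=0.2, h=300.0):
    """Stiffness matrix of a 4-node bilinear plane-stress quadrilateral.

    coords: (4, 2) array of nodal x-y coordinates, counter-clockwise.
    Evaluates Eq. 2-19 with 2x2 Gauss quadrature (unit weights)."""
    D = E / (1 - nu**2) * np.array([[1, nu, 0],
                                    [nu, 1, 0],
                                    [0, 0, (1 - nu) / 2]])
    g = 1.0 / np.sqrt(3.0)               # Gauss point coordinate
    k = np.zeros((8, 8))
    for s in (-g, g):
        for t in (-g, g):
            # Derivatives of N1..N4 with respect to s (row 0) and t (row 1)
            dN = 0.25 * np.array([[-(1 - t), (1 - t), (1 + t), -(1 + t)],
                                  [-(1 - s), -(1 + s), (1 + s), (1 - s)]])
            J = dN @ coords              # 2x2 Jacobian matrix (Eq. 2-20)
            dNxy = np.linalg.solve(J, dN)  # derivatives w.r.t. x and y
            B = np.zeros((3, 8))
            B[0, 0::2] = dNxy[0]         # du/dx terms
            B[1, 1::2] = dNxy[1]         # dv/dy terms
            B[2, 0::2] = dNxy[1]         # shear terms
            B[2, 1::2] = dNxy[0]
            k += B.T @ D @ B * h * np.linalg.det(J)
    return k

# A 25 mm square element, as used later in the deep-beam model:
k = element_stiffness(np.array([[0, 0], [25, 0], [25, 25], [0, 25]], float))
print(np.allclose(k, k.T))  # the stiffness matrix is symmetric -> True
```

A useful sanity check is that a rigid-body translation produces no nodal forces, i.e. k multiplied by a uniform displacement vector is (numerically) zero.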
Step 5 Assembling the Global Stiffness Matrix and Applying Boundary Conditions
The global stiffness matrix can be easily assembled once the individual element stiffness
matrices are formed. This assembly process can be expressed by:
\[ [K] = \sum_{e} [k^{(e)}] \]  (Eq. 2-21)
The final global equation written in matrix form is:
\[ \{F\} = [K]\{d\} \]  (Eq. 2-22)
where \([K]\) is the global stiffness matrix, \(\{F\}\) is the vector of global nodal forces, and
\(\{d\}\) is the vector of known and unknown nodal degrees of freedom of the structure.
Step 6 Solving the Global Equation
A number of numerical methods can be used at this step to solve the global equation, including
Gaussian elimination, the conjugate gradient method and the preconditioned conjugate
gradient method.
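As an illustration of one of these solvers (a minimal sketch, not the implementation used in this work), the conjugate gradient method for a symmetric positive-definite system can be written as:

```python
import numpy as np

def conjugate_gradient(K, F, tol=1e-10, max_iter=1000):
    """Solve K d = F for symmetric positive-definite K by the
    conjugate gradient method (illustrative sketch only)."""
    d = np.zeros_like(F)
    r = F - K @ d          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Kp = K @ p
        alpha = rs / (p @ Kp)
        d += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Small SPD test system:
K = np.array([[4.0, 1.0], [1.0, 3.0]])
F = np.array([1.0, 2.0])
print(conjugate_gradient(K, F))  # matches np.linalg.solve(K, F)
```

For the large sparse systems produced by finite element models, this iterative approach avoids forming a factorisation of [K], which is why it (and its preconditioned variant) is attractive for the GPU implementation discussed later in the thesis.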
Step 7 Finding out the Stresses and Strains
Once the global equation has been solved, the displacements at all the nodes are known. The
stresses and strains within each element can then be found using the strain/displacement
and stress/strain relationships introduced in (Eq. 2-15) and (Eq. 2-17) respectively.
As only the displacement field is continuous between elements, the element stresses
and strains are normally discontinuous across element boundaries, so some form of stress
recovery algorithm is often used to smooth the stresses. For the bilinear quadrilateral plane
stress element used in this study, the super-convergent patch recovery (SPR) method is
used (Zienkiewicz & Zhu 1992).
Step 8 Interpreting the Results
The final step is to interpret and analyze the results for use in the design/analysis process.
Computer graphics play an important role in this part of the procedure, as typically finite
element models contain thousands (or even millions) of nodes and elements. Displacements
can be visualised through projections of deformed meshes, while stresses are commonly
displayed using contour plots.
2.5 Summary
In this chapter, some basic theory about reinforced concrete structure design and the finite
element method is introduced and the relevant literature reviewed. In particular, the
conventional equivalent stress block approach, STM and the linear elastic finite element
method are discussed. The equivalent stress block approach is efficient for the design of
flexural members such as shallow beams, as it imitates the nonlinear behaviour of the concrete
when overloaded. This can be interpreted as a lower bound approach, as it uses an assumed
stress field which is in internal and external equilibrium with the applied loads and does not
exceed the maximum compressive strength of the concrete anywhere. For the design of non-
flexural members such as deep beams, the STM is widely used. This is also a lower bound
approach, as the designer assumes a distribution of struts and ties which are in internal and
external equilibrium with the applied loads, and within which the maximum strength of the
concrete or steel is not exceeded anywhere. However, the truss model neglects the concrete
outside the assumed struts, and thus the STM potentially leads to concrete waste and
unnecessary carbon emission.
To attempt to avoid this material waste, the linear elastic finite element method can be
applied, allowing the contribution of all the concrete within the structure to its strength to be
taken into account. Less work is required by the designer as it can be performed by computer,
and there is less designer-dependency of the final design.
However, if the geometry or the boundary conditions of a member result in stress singularities
or concentrations, then when a sufficiently fine mesh is used to ensure internal equilibrium is
satisfied, the peak compressive stresses will always exceed the maximum permissible stress in
the concrete, and the conventional linear elastic finite element approach cannot be used.
In the next chapter, different conventional approaches are used to design various reinforced
concrete members and compared with LEFEA based designs. In the following chapter the
modified linear elastic finite element method suggested in this chapter is fully developed. This
stress redistribution approach will be shown to be effective in handling problems caused by
stress singularities and concentrations.
3 Comparison of Conventional Design Approaches with
LEFEA-based Design
This chapter demonstrates the application of conventional design approaches (stress block and
strut-tie) and the LEFEA-based design approach to three types of simple structures. These
structures are shallow (flexural) beams, deep beams, and deep beams with web openings. The
efficiency of the resulting designs is compared. For the flexural reinforced concrete beam, both
the equivalent compressive stress block approach and LEFEA approach are presented, while
for the non-flexural beams without openings, both the STM and the LEFEA approach are
presented. Finally, for the non-flexural beams with openings, even though the stress
singularities involved in such beams invalidate the LEFEA approach for determining the
concrete thickness, the steel area indicated by the LEFEA approach is compared with the steel
area required by the STM.
3.1 Design of a Flexural Reinforced Concrete Beam
Figure 9 shows a simply supported shallow beam with a span of 3700 mm and a height of
450 mm. A concentrated load P is applied at the middle of the beam, which is comprised of
concrete with characteristic strength \(f'_c\). The yield stress of the reinforcement to be used
in this design is \(f_{sy}\).
Figure 9: Geometry of Shallow Beam
3.1.1 Application of Conventional Design Approach (Equivalent Stress
Block Approach)
For the design of shallow beams, the equivalent rectangular compression stress block method,
which was introduced in section 2.2.1, is widely used. In accordance with AS3600, the
rectangular stress block for the given concrete strength \(f'_c\) is described by an intensity
factor \(\alpha_2\) and a depth factor \(\gamma\), both functions of \(f'_c\):
\[ \alpha_2 = \alpha_2(f'_c) \]  (Eq. 3-1)
\[ \gamma = \gamma(f'_c) \]  (Eq. 3-2)
and the compressive force in the concrete is:
\[ C = \alpha_2 f'_c\, \gamma k_u d\, b \]  (Eq. 3-3)
while the tensile force in the reinforcement is:
\[ T = A_{st} f_{sy} \]  (Eq. 3-4)
Within the structure, the compressive force in the concrete is equal to the tensile force in the
reinforcement, so:
\[ C = T \]  (Eq. 3-5)
\[ A_{st} f_{sy} = \alpha_2 f'_c\, \gamma k_u d\, b \]  (Eq. 3-6)
In addition, from the design load, the moment that needs to be carried by the simply
supported structure can be calculated as:
\[ M^* = \frac{P L}{4} \]  (Eq. 3-7)
The moment capacity of the structure can be expressed as:
\[ M_u = C\left(d - \frac{\gamma k_u d}{2}\right) = \alpha_2 f'_c\, \gamma k_u d\, b \left(d - \frac{\gamma k_u d}{2}\right) \]  (Eq. 3-8)
where \(d\) is the steel depth below the top fibre of the beam, chosen here as given in
(Eq. 3-9). Equating \(M^*\) and \(M_u\), the position of the neutral axis \(d_n\) can be found
(Eq. 3-10). Then the ductility ratio is:
\[ k_u = \frac{d_n}{d} \]  (Eq. 3-11)
Because the ductility ratio is smaller than 0.36, the value required by the balanced failure and
neutral axis depth limits, the design is satisfactory and no compressive steel is needed.
Finally, the tensile force can be obtained from the equilibrium condition \(T = C\) (Eq. 3-12),
followed by the area of tensile reinforcement:
\[ A_{st} = \frac{T}{f_{sy}} = 1329.6\,\mathrm{mm^2} \]  (Eq. 3-13)
3.1.2 Application of Conventional LEFEA Approach
For comparison, the conventional linear elastic finite element analysis (LEFEA) approach is
applied here to design the shallow beam whose geometry is shown in Figure 9. However, for
this simple beam a linear stress analysis can also be conducted by elastic beam theory. This will
be done first, so that the principles can be demonstrated, and then the LEFEA approach will be
used.
An initial plane thickness of 300 mm is assumed. According to AS3600, the maximum principal
compressive stress permitted within the beam, expressed as a fraction of \(f'_c\), is:
\[ \sigma_{c,\max} = 13.5\,\mathrm{MPa} \]  (Eq. 3-14)
By assuming the same concrete thickness as before, 300 mm, the maximum moment the
section can carry at the limiting stress is implied as:
\[ M_{\max} = \sigma_{c,\max}\,\frac{t D^2}{6} = 13.5 \times \frac{300 \times 450^2}{6} = 136.7\,\mathrm{kN\,m} \]  (Eq. 3-15)
At mid-span the maximum principal tensile stress is equal in magnitude to the maximum
principal compressive stress \(\sigma_{c,\max}\), and the tensile stress varies linearly from this value at
the bottom fibre to zero at mid-height, so the tensile force at mid-span can be found via:
\[ T = \frac{1}{2}\,\sigma_{c,\max}\left(\frac{D}{2}\right) t \]  (Eq. 3-16)
Finally, according to AS3600, the tensile force should be carried entirely by the reinforcement,
with the steel stresses not exceeding \(f_{sy}\). Therefore, the area of reinforcement for this
design is:
\[ A_{st} = \frac{T}{f_{sy}} \]  (Eq. 3-17)
The position of the reinforcement should coincide with the centroid of the tensile stress block.
For the triangular tensile stress distribution, this can approximately be taken as a distance
from the centre of the steel to the top fibre of the beam of:
\[ \bar{y} = \frac{5D}{6} = 375\,\mathrm{mm} \]  (Eq. 3-18)
However, the actual moment resulting from the applied force is:
\[ M^* = \frac{P L}{4} \]  (Eq. 3-19)
This is larger than the maximum moment calculated by elastic beam theory, and would result
in violation of the stress limit. To solve this problem, the thickness of the beam can be
increased to ensure it can carry the design moment. The new thickness of the beam can be
found through:
\[ \sigma_{c,\max} = \frac{6 M^*}{t D^2} \]  (Eq. 3-20)
Thus:
\[ t = \frac{6 M^*}{\sigma_{c,\max} D^2} = 406\,\mathrm{mm} \]  (Eq. 3-21)
With the new beam thickness of 406 mm, the actual tensile force at mid-span is now:
\[ T = \frac{1}{2}\,\sigma_{c,\max}\left(\frac{D}{2}\right) t = \frac{1}{2} \times 13.5 \times 225 \times 406 = 616.6\,\mathrm{kN} \]  (Eq. 3-22)
The area of reinforcement for this design is then specified as:
\[ A_{st} = \frac{T}{f_{sy}} = 1541.0\,\mathrm{mm^2} \]  (Eq. 3-23)
The same linear stress analysis will now be performed by the finite element method using 2D
planar elements, and the same design procedure followed. Again the initial beam thickness is
taken as 300 mm. The value of the elastic modulus is not important, as it will not affect the
stresses. The analysis here specifies the elastic modulus and Poisson's ratio as 24500 MPa and
0.2 respectively. Due to the symmetry of the beam, only half of the beam will be analysed.
Figure 10 shows the boundary conditions for this beam, where the supports are modelled as
relatively soft springs to avoid the occurrence of stress singularities within these regions.
Figure 10: Model for the Shallow Beam
The principal compressive stress using the conventional LEFEA approach is shown in Figure 11,
where the value of the maximum principal compressive stress is 17.28MPa. However, the
maximum principal compressive stress allowed by AS3600 within the beam is 13.5MPa, as
shown in (Eq. 3-14). This means that the initial value of beam thickness is too small and more
concrete is required.
Figure 11: Plot of Principal Compressive Stress
Because of the linearity of this method, the actual thickness of concrete required for the beam
can be obtained by scaling the initial thickness by the ratio of the computed maximum
compressive stress to the permitted principal compressive stress limit:
\[ t = 300 \times \frac{17.28}{13.5} = 384\,\mathrm{mm} \]  (Eq. 3-24)
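Because the analysis is linear, this scaling step is trivial to automate; a minimal sketch (illustrative only) is:

```python
def required_thickness(t_trial, sigma_max, sigma_limit):
    """Scale a trial thickness so the peak stress just meets the limit.
    Valid because the analysis is linear: stresses scale with 1/thickness."""
    return t_trial * sigma_max / sigma_limit

print(required_thickness(300.0, 17.28, 13.5))  # -> 384.0 mm
```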
Integrating the tensile stresses along the mid-span can provide the design tensile force and
thus the required reinforcement area. To obtain the maximum resultant tensile force along the
mid-span cross-section, the trapezoidal rule is used to perform the integration in (Eq. 3-25), i.e.
by summing up all the trapezoidal areas under the curve shown in Figure 13 in the way
illustrated in Figure 12 and (Eq. 3-26).
Figure 12: Trapezoidal Rule for the Integration
\[ \int_{a}^{b} \sigma_t(y)\, dy \approx \sum_{i=1}^{n-1} \frac{\sigma_t(y_i) + \sigma_t(y_{i+1})}{2}\,(y_{i+1} - y_i) \]  (Eq. 3-25)
\[ T = t \int_{0}^{D} \sigma_t(y)\, dy \]  (Eq. 3-26)
Figure 13: Plot of Tensile Stresses across Mid-span of Beam
As the yield stress in the steel is \(f_{sy}\), the quantity of steel reinforcement required at
mid-span is approximately:
\[ A_{st} = \frac{T}{f_{sy}} = 1417.9\,\mathrm{mm^2} \]  (Eq. 3-27)
The centroid of the steel area should coincide with the centroid of the tensile stress to
maintain equilibrium. The location of the steel, in terms of the distance from the bottom fibre
of the beam to the centroid of the steel, is calculated using the same trapezoidal rule for
integration as:
\[ \bar{y} = \frac{\int_0^{D} \sigma_t(y)\, y\, dy}{\int_0^{D} \sigma_t(y)\, dy} \]  (Eq. 3-28)
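The integration and centroid calculation can be sketched as follows. The stress profile below is hypothetical (a linear distribution, not the thesis data), and the 400 MPa yield stress used for the steel area is an assumed value for illustration:

```python
import numpy as np

def trapz(f, y):
    """Trapezoidal rule: sum of the trapezoid areas between sample points."""
    return float(np.sum((f[:-1] + f[1:]) / 2.0 * np.diff(y)))

# Hypothetical tensile-stress profile (illustrative values only): linear
# from 13.5 MPa at the bottom fibre to zero at mid-height of the section.
y = np.linspace(0.0, 225.0, 10)       # height above bottom fibre, mm
sigma = 13.5 * (1.0 - y / 225.0)      # tensile stress, MPa

t = 384.0                             # beam thickness, mm
T = t * trapz(sigma, y)               # tensile force, N       (Eq. 3-26)
y_bar = trapz(sigma * y, y) / trapz(sigma, y)   # centroid, mm (Eq. 3-28)
A_st = T / 400.0                      # steel area for an assumed 400 MPa yield

print(T, y_bar, A_st)
```

For this linear profile the trapezoidal rule recovers the exact force; the centroid is only approximate, converging on the exact value as more sample points are used.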
3.1.3 Cost Comparison and Remarks
Table 1 shows the cost comparisons between the conventional equivalent stress block method
and the linear stress analysis method.
Table 1: Cost Comparison for Shallow Beam Design

Approach                       | Area of Steel (mm²) | Thickness of Concrete (mm)
-------------------------------|---------------------|---------------------------
Equivalent Stress Block Theory | 1329.6              | 300
Elastic Beam Theory            | 1541.0              | 406
LEFEA                          | 1417.9              | 384
From Table 1 it can be seen that, for flexural members, both elastic beam theory and the
conventional LEFEA approach result in less efficient designs, requiring more steel and
concrete. Clearly the linear stress analysis method is not an efficient approach for designing
shallow beams. One reason is that the linear stress analysis cannot capture the non-linear
behaviour of the concrete when overloaded, while the conventional equivalent stress block
approach models this non-linear behaviour quite well by assuming a rectangular stress block.
The other reason is that the centroid of the steel must be located at the centroid of the
tensile stress diagram to preserve internal equilibrium, whereas locating the steel as close to
the bottom of the beam as possible (limited by serviceability requirements) would maximise
the lever arm and minimise the amount of steel required.
However, for non-flexural members, the benefits of LEFEA are more obvious. This will be
demonstrated in the next section.
3.2 Application to Design of Non-flexural Reinforced Concrete
Beams without Rectangular Openings
Figure 14 gives the geometry of a simple deep beam comprised of concrete with characteristic
strength \(f'_c\); a 1000 kN point load is applied at the centre of the beam over a bearing
plate.
Figure 14: Geometry of Deep Beam
3.2.1 Application of Conventional Design Approach (STM)
For the design of the deep beam shown in Figure 14, the STM introduced in section 2.2.2 is
widely used. A possible, and the most intuitive, truss model for this deep beam is shown in
Figure 15.
Figure 15: Strut and Tie Model for Deep Beam
For the STM, from the truss geometry and the force equilibrium, the load within each inclined
strut can be found as:
\[ F_{strut} = \sqrt{2} \times \frac{P}{2} = \sqrt{2} \times 500 = 707.1\,\mathrm{kN} \]  (Eq. 3-29)
According to AS3600 DR05252, the maximum allowable compressive capacity of the strut in
this example is obtained via:
\[ C_{\max} = \phi_{st}\,\beta_s\, 0.9 f'_c\, A_c \]  (Eq. 3-30)
where \(A_c\) is the cross-sectional area of the strut and can be calculated as:
\[ A_c = d\, t \]  (Eq. 3-31)
Therefore, setting the capacity equal to the strut force:
\[ t = \frac{F_{strut}}{\phi_{st}\,\beta_s\, 0.9 f'_c\, d} \]  (Eq. 3-32)
According to the truss geometry, the strut width d is 353.55 mm, so the required thickness of
this deep beam can be calculated as:
\[ t = 246\,\mathrm{mm} \]  (Eq. 3-33)
Then, from the truss geometry and the force equilibrium, the tensile force in the tie is:
\[ T = \frac{P}{2} = 500\,\mathrm{kN} \]  (Eq. 3-34)
Assuming the yield stress of the steel is \(f_{sy}\), the quantity of reinforcement can be
calculated as:
\[ A_{st} = \frac{T}{f_{sy}} = 1250\,\mathrm{mm^2} \]  (Eq. 3-35)
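The truss statics behind these values can be sketched as follows (illustrative only; the 45-degree strut inclination is assumed from the geometry of Figure 15):

```python
import math

def stm_forces(P_kN, theta_deg):
    """Strut and tie forces for a symmetric two-strut truss under a central
    point load P. theta is the strut inclination to the horizontal.
    (Illustrative statics only; 45-degree geometry assumed below.)"""
    R = P_kN / 2.0                                   # each support reaction
    strut = R / math.sin(math.radians(theta_deg))    # compression in strut
    tie = R / math.tan(math.radians(theta_deg))      # tension in bottom tie
    return strut, tie

strut, tie = stm_forces(1000.0, 45.0)
print(round(strut, 1), round(tie, 1))  # -> 707.1 500.0
```

At 45 degrees the strut carries √2 times the support reaction and the tie carries the reaction itself, matching Eq. 3-29 and Eq. 3-34.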
3.2.2 Application of Conventional LEFEA Approach
For comparison, the conventional linear elastic finite element analysis (LEFEA) approach is also
performed to design the deep beam shown in Figure 14.
Firstly, the initial thickness of the beam is taken as 300 mm and the linear stress analysis is
conducted. Due to the symmetry of the beam, only half of the beam is analysed. Figure 16
shows the boundary conditions for this beam, where the supports are modelled as relatively
soft springs to avoid the occurrence of stress singularities within these regions. Square 2D
planar elements with a size of 25 mm are used here.
Figure 16: Model for the Deep Beam
Figure 17 shows the plot of principal compressive stress. The maximum principal compressive
stress obtained from the LEFEA is 8.837 MPa, whereas the maximum principal compressive
stress allowed by AS3600 within the beam is 13.5 MPa, as shown in (Eq. 3-14). This means that
the initial value of beam thickness is too large, and a smaller thickness is sufficient.
Figure 17: Plot of Principal Compressive Stress
Because of the linearity of this method, the actual thickness of concrete required for the beam
can be obtained by scaling the initial thickness by the ratio of the computed maximum
compressive stress to the permitted principal compressive stress limit:
\[ t = 300 \times \frac{8.837}{13.5} = 196.4\,\mathrm{mm} \]  (Eq. 3-36)
Figure 18: Plot of Principal Tensile Stress
The plots of principal tensile stress and the tensile stress along the mid-span are shown in
Figure 18 and Figure 19 respectively.
To obtain the maximum resultant tensile force along the mid-span cross-section, the
trapezoidal rule is again used to perform the integration, in the way expressed in Figure 12
and (Eq. 3-25):
\[ T = t \int_{0}^{D} \sigma_t(y)\, dy \]  (Eq. 3-37)
Figure 19: Plot of Tensile Stresses across Mid-span of Beam
Assuming the yield stress in the steel is \(f_{sy}\), the quantity of steel reinforcement
required at mid-span is approximately:
\[ A_{st} = \frac{T}{f_{sy}} = 1397.4\,\mathrm{mm^2} \]  (Eq. 3-38)
The location of the steel, specified in terms of the distance from the centroid of the steel to
the bottom fibre of the beam, is found using the same trapezoidal rule for integration to
calculate the centroid of the horizontal tensile stresses in Figure 19:
\[ \bar{y} = \frac{\int_0^{D} \sigma_t(y)\, y\, dy}{\int_0^{D} \sigma_t(y)\, dy} \]  (Eq. 3-39)
Figure 20: Von Mises Stress for Deep Beam using LEFEA
For interest, the Von Mises stress plot is shown in Figure 20, where the arch effect can be
observed: the load is transferred directly from the load point to the supports. This shows that
the strut and tie model shown in Figure 15 for the deep beam is reasonable.
3.2.3 Cost Comparison and Remarks
Table 2 shows the cost comparisons between the conventional STM and the LEFEA method.
Table 2: Cost Comparison for Deep Beam Design

Approach            | Area of Steel (mm²) | Thickness of Concrete (mm)
--------------------|---------------------|---------------------------
Strut and Tie (STM) | 1250                | 246
LEFEA               | 1397.4              | 196.38
The results presented in Table 2 demonstrate that for the non-flexural deep beams without
openings, designs based on the LEFEA approach can require less concrete usage and thus less
carbon emission (20% less in this case). This is because the conventional STM only considers
the contribution of concrete within the struts to the strength of the member, while the LEFEA
approach considers the contribution of all the concrete, regardless of its position, which will
give a more accurate and reasonable stress field and a more efficient design.
Importantly, as the stress field resulting from the LEFEA is in internal equilibrium and
equilibrates the applied load, designs based on the LEFEA are safe according to the lower
bound theorem of plasticity. When using LEFEA, the stress path is determined by the computer
instead of the designer, so less work is required of the designer. The conventional STM, in
contrast, is highly dependent on the designer's experience in choosing a rational truss model,
which may be laborious and time-consuming for complex structures.
On the other hand, the beam designed using LEFEA requires more steel than that designed by
STM (12% in this case). This is because, as with the shallow beam, the centroid of the steel
must be located at the centroid of the tensile stress diagram to preserve internal equilibrium,
whereas locating the steel as close to the bottom of the beam as possible (limited by
serviceability requirements) will maximise the lever arm and minimise the amount of steel
required. In the STM the steel can be located closer to the bottom of the beam.
3.3 Design of Non-flexural Reinforced Concrete Beams with
Rectangular Openings
Figure 21 shows the geometry of a deep beam with two rectangular openings, comprised of
concrete with characteristic strength \(f'_c\), with a 1000 kN point load applied at the centre
through a bearing plate.
Figure 21: Geometry of Deep Beam with Rectangular Openings
3.3.1 Application of Conventional Design Approach (STM)
For the design of non-flexural beams with openings, such as that shown in Figure 21, the STM
is widely used, and one possible truss model for this beam is shown in Figure 22.
Figure 22: Strut and Tie Model for Deep Beam with Rectangular Openings
Due to the symmetry of the model, only the half truss model shown in Figure 23 will be
analysed.
Figure 23: Strut and Tie Model for Deep Beam with Rectangular Openings (Half Model)
Figure 24: Force Equilibrium for the Applied Load
Figure 25: STM model for Bottle Shaped Strut and Force Equilibrium (Warner 2007)
As for the STM, firstly, from the truss geometry and the force equilibrium shown in Figure 24,
the load carried by each of the bottle-shaped struts at the top is found via:
\[ F_{strut} = \sqrt{2} \times \frac{P}{2} = \sqrt{2} \times 500 = 707.1\,\mathrm{kN} \]  (Eq. 3-40)
Then, according to AS3600 DR05252, the maximum allowable compressive capacity of the strut
in this example is specified via:
\[ C_{\max} = \phi_{st}\,\beta_s\, 0.9 f'_c\, A_c \]  (Eq. 3-41)
According to the truss geometry, the strut width d is 353.55 mm, so the thickness of this deep
beam can be calculated as:
\[ t = 246\,\mathrm{mm} \]  (Eq. 3-42)
As shown in Figure 25, the tensile force \(T_b\) carried by the diagonal reinforcement within
the bottle-shaped struts is obtained from the bursting force equilibrium (Eq. 3-43). Then, from
the truss geometry and the force equilibrium, the tensile force T in the longitudinal
reinforcement is found (Eq. 3-44). Assuming the yield stress of the steel is \(f_{sy}\), the
quantity of the longitudinal reinforcement is calculated as:
\[ A_{st} = \frac{T}{f_{sy}} \]  (Eq. 3-45)
and the quantity of the diagonal reinforcement as:
\[ A_{sd} = \frac{T_b}{f_{sy}} \]  (Eq. 3-46)
Multiplying the quantity of each reinforcement by its length, the volume of steel required for
this truss arrangement is \(6{,}132{,}000\,\mathrm{mm^3}\). The concrete thickness is 246 mm.
Other strut-tie configurations may lead to slightly different results for concrete thickness and
steel area; however, for the purposes of comparing with the LEFEA approach, a single strut-tie
design is considered to provide sufficient comparison.
3.3.2 Application of Conventional LEFEA Approach
For comparison, a conventional linear elastic finite element analysis (LEFEA) approach is also
used to design the deep beam shown in Figure 21.
Firstly, the initial thickness of the beam is taken as 300 mm and the linear stress analysis is
conducted. Due to the symmetry of the beam, only half of the beam is analysed. Figure 26
shows the boundary conditions for this beam, where the supports are modelled as relatively
soft springs to avoid the occurrence of stress singularities within these regions. Furthermore,
to better capture the stress singularities at the inner corners of the openings, square 2D
planar elements with a size of 10 mm are used.
Figure 26: Model for the Deep Beam with Rectangular Openings
Figure 27 shows the plot of principal compressive stress. The maximum principal compressive
stress obtained from the LEFEA is 12.52MPa. However, the maximum principal compressive
stress allowed by AS3600 within the beam is 13.5MPa.
Figure 27: Plot of Principal Compressive Stress
Because of the linearity of this method, the actual thickness of concrete required for the beam
can be obtained by scaling the initial thickness by the ratio of the computed maximum
compressive stress to the permitted principal compressive stress limit:
\[ t = 300 \times \frac{12.52}{13.5} = 278.2\,\mathrm{mm} \]  (Eq. 3-47)
Figure 28: Plot of Principal Tensile Stress
The plots of principal tensile stress and the tensile stress along the mid-span are shown in
Figure 28 and Figure 29 respectively. The resultant tensile force along the mid-span cross-
section can be found by integrating the tensile stresses using the trapezoidal rule:
\[ T = t \int_{0}^{D} \sigma_t(y)\, dy \]  (Eq. 3-48)
Figure 29: Plot of Tensile Stresses across Mid-span of Beam
If the yield stress in the steel is f_sy, the quantity of steel reinforcement required
at mid-span is approximately:

A_st = T / (φ f_sy) (Eq. 3-49)

The location of the steel, specified as the distance from the centroid of the steel to the bottom
fibre of the beam, is calculated using the same trapezoidal rule for integration as:

ȳ = ∫ y σ_t(y) dy / ∫ σ_t(y) dy (Eq. 3-50)
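The trapezoidal-rule calculations of this kind can be sketched as follows. The stress profile, thickness, φ and f_sy used here are illustrative assumptions for demonstration, not the thesis values:

```python
# Illustrative sketch of dimensioning steel from a mid-span tensile stress
# profile using the trapezoidal rule. The sampled stress profile, thickness,
# phi and f_sy below are assumed values, not results from the thesis.
def trapz(values, x):
    """Trapezoidal-rule integral of sampled values over x."""
    return sum((values[i] + values[i + 1]) / 2.0 * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

y = [0.0, 50.0, 100.0, 150.0, 200.0]   # distance from bottom fibre (mm)
sigma = [4.0, 3.0, 2.0, 1.0, 0.0]      # sampled tensile stress (MPa)
t = 300.0                              # beam thickness (mm)

T = t * trapz(sigma, y)                # resultant tensile force (N)
A_st = T / (0.8 * 500.0)               # steel area (mm^2), assuming phi*f_sy = 400 MPa
y_bar = (trapz([yi * si for yi, si in zip(y, sigma)], y)
         / trapz(sigma, y))            # centroid of the tensile stress block (mm)
```

The centroid `y_bar` is where equilibrium requires the steel to be placed, which for this monotonically decreasing profile sits well above the bottom fibre.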
3.3.3 Cost Comparison and Remarks
Table 3 shows the cost comparison between the conventional STM and the LEFEA method for
the designs of the deep beam with openings.
Table 3: Cost Comparison for designs of Deep Beam with Openings

Approaches            Volume of Steel (mm³)   Thickness of Concrete (mm)
Strut and Tie (STM)   6132000                 246
LEFEA                 5336400                 278.2
Table 3 demonstrates that the conventional LEFEA is less efficient in terms of concrete usage
than the conventional STM, and cannot be used effectively to design non-flexural beams
with openings. The reason is that, for the conventional LEFEA approach, the concrete thickness
is controlled by the maximum principal compressive stress, while the stress singularities involved
in such members mean that the smaller the elements used, the higher the calculated stress.
Consequently, when dealing with structures involving stress singularities, the concrete
savings offered by the conventional LEFEA are no longer obtainable. However, these results do show a
saving in total steel requirements.
3.4 Summary
This chapter has shown that for the design of flexural beams, the conventional LEFEA approach
is not efficient, as it cannot model the non-linear behaviour of the concrete and does not allow
the tensile steel to be located in the most effective position.
For the design of non-flexural deep beams without openings, design based on the LEFEA
approach is more efficient than conventional strut and tie designs in terms of the concrete
usage (although not steel usage, as again tensile steel cannot be located at the most effective
position).
For the design of non-flexural deep beams with openings, the stress singularities involved in
such beams invalidate the LEFEA approach for determining the concrete thickness, since the
finer the finite element mesh used in the analysis, the greater the calculated compressive
stresses, and hence the greater the required concrete thickness. However, the steel
requirements resulting from the LEFEA approach are less than those of the strut-and-tie approach, so if the
stress singularity issue can be overcome, LEFEA can potentially lead to a more efficient design.
In the next chapter, a modified LEFEA (MLEFEA) will be developed and discussed. The MLEFEA
approach can remove the stress singularities from the stress field, which means that beams
with square and rectangular web openings can be dimensioned so that the stress does not
exceed the maximum allowable principal compressive stress allowed by AS3600 at any point.
4 Modified Linear Elastic Finite Element Method
In this chapter, an efficient way to perform linear stress analysis involving stress
redistribution is developed. The basic process of stress redistribution is introduced, and an
L-shaped plate is used as an example to demonstrate the efficiency of the proposed method.
4.1 Stress Redistribution
When using linear stress analysis to analyse a concrete beam with openings that have re-entrant
corners, the occurrence of stress singularities means that the method provided in the current
Australian concrete design code cannot be applied, as the compressive stresses at the
singularities will always exceed the maximum allowable compressive stress. To avoid this
problem, the method proposed here introduces a stress redistribution process to redistribute
those stress singularities to limit the maximum stress to the allowable stress while preserving
internal and external equilibrium, enabling the design rules specified in the code to be applied
directly.
For the design of structures theoretically containing stress singularities, the actual properties
of the material should be taken into consideration. In practice stress singularities do not occur
because engineering materials, including concrete, locally yield or fail at some finite level of
stress, never reaching the infinite stress as predicted by the elastic stress field method (Barber
2002). In reality the concrete at a point of stress concentration would crack or crush, removing
the stress singularities altogether and the load would be shed to surrounding material,
increasing the surrounding stresses. Non-linear finite element analysis can model this
behaviour. However, as explained previously, the results of non-linear analysis are dependent
on the loading history and the details of the concrete constitutive model used. In linear
analysis, this process can be imitated by reduction of elastic modulus. In this work, a new
method to locally reduce the stress by the reduction of elastic modulus at the points of stress
singularities in the linear stress field will be presented.
Reducing the elastic modulus at particular points will effectively reduce the local stresses at
those points and redistribute them to the surrounding elements. The stresses in the
surrounding elements must change in order to preserve both internal and external equilibrium.
For stress singularity problems, this can be achieved by specifying an elastic modulus of zero at
the tip of the corners causing the stress singularities.
However, it should be borne in mind that, since the elements being used are
quadrilateral elements, step changes of the elastic modulus between elements will cause
further stress singularities where the boundary between areas of different elastic modulus
contains corners. To avoid this problem, a continuous change in the elastic modulus may be
used, modifying the finite element formulation to allow a variation in elastic modulus over an
element. For example, the variation of the elastic modulus function, in polar terms with the
radius measured from the point of singularity, could be defined by the plot in Figure 30.
Figure 30: Linear Reduction in Elastic Modulus
The grading interval is essential when the elastic modulus at the tip of the corner is specified as
zero. However, the size of the interval and the consequent rate of elastic modulus softening are
somewhat arbitrary, and the variation does not need to be linear. As long as the resulting stress field is
able to satisfy the yield criterion and equilibrium, it can be used as a basis on which to design,
according to the lower bound theory of plasticity.
According to the lower bound theory of plasticity, a stress field that satisfies equilibrium and does not
exceed the material yield criterion at any point provides a lower-bound estimate of the capacity of
an elastic-perfectly plastic structure. For this to be applicable to reinforced concrete, complete
crushing of the concrete must not occur prior to yielding of the reinforcement.
Furthermore, the consideration of ductility is essential when designing using the lower bound
theory of plasticity. Ductility is the capacity of the material to deform in the inelastic range
without significant loss of its load-bearing capacity (Carmo, Ricardo & Lopes 2005). Sufficient
ductility must be present to allow the structure to redistribute stress to the load path assumed
by the designer. The most effective measure to increase ductility of concrete structures is the
provision of confining reinforcement.
4.1.1 Finite Element Implementation
As introduced earlier, it is well-known (Logan 2002; Timoshenko 1969) that in the finite
element method, to construct the element stiffness matrix, the following equations are usually
used:
[k] = ∫∫ [B]^T [D] [B] |J| ds dt (Eq. 4-1)
Here [D] is the constitutive matrix, which for plane stress is:
[D] = E / (1 − ν²) × [ 1    ν    0
                       ν    1    0
                       0    0    (1−ν)/2 ]   (Eq. 4-2)
In most applications the elastic modulus E is considered constant, since the material is
homogeneous. However, the approach proposed here modifies the constitutive matrix by
using an elastic modulus E(s, t) which varies in terms of the local spatial coordinates (s, t) of the
element, while keeping Poisson's ratio constant.
[D(s, t)] = E(s, t) / (1 − ν²) × [ 1    ν    0
                                   ν    1    0
                                   0    0    (1−ν)/2 ]   (Eq. 4-3)
The approach adopted here uses the same isoparametric shape functions N_i to define the
variation of the elastic modulus function E(s, t) as are used to approximate the variation of
displacement between the nodes and to map the local coordinates to the global coordinates:

E(s, t) = Σ N_i(s, t) E_i (Eq. 4-4)

where E_i are the nodal values of the elastic modulus.
In this work four-node bilinear quadrilateral elements are used. The shape functions N_i for
the 4-noded isoparametric bilinear quadrilateral element can be expressed (Logan 2002;
Timoshenko 1969) in terms of the local coordinates (s, t) as:

N_i(s, t) = ¼ (1 + s s_i)(1 + t t_i), i = 1, …, 4 (Eq. 4-5)

where (s_i, t_i) are the local coordinates of node i.
Here N_i are the same functions presented in chapter 2.4, and the stress in each element can
then be determined in the conventional way, but with the constitutive matrix [D(s, t)] varying
spatially due to the spatially varying E(s, t):

{σ(s, t)} = [D(s, t)] [B] {d} (Eq. 4-6)

where {d} is the vector of element nodal displacements.
Standard finite element analysis can now be easily conducted using commercial finite element
software packages, many of which allow the designer to specify detailed models for the
behaviour of the element material. In packages such as ABAQUS and ANSYS, designers can
easily define the element material behaviour by using a custom definition, and this technique
is widely used in practice. However, as the proposed method here requires the element
material properties to vary spatially over the elements, Matlab is used to conduct the analysis.
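A minimal sketch of such an element routine is given below in Python/NumPy (the thesis implementation used Matlab; the node ordering, ν = 0.2 and the 2×2 Gauss rule here are illustrative assumptions). It evaluates the stiffness integral of (Eq. 4-1) with the constitutive matrix varying over the element, interpolating the nodal moduli with the bilinear shape functions:

```python
import numpy as np

def quad_stiffness(xy, E_nodes, nu=0.2, thickness=1.0):
    """Stiffness matrix of a 4-node bilinear quadrilateral in plane stress,
    with the elastic modulus interpolated from nodal values.
    Assumed node order: (-1,-1), (1,-1), (1,1), (-1,1) in local coordinates."""
    g = 1.0 / np.sqrt(3.0)                      # 2x2 Gauss points, weights = 1
    K = np.zeros((8, 8))
    for s in (-g, g):
        for t in (-g, g):
            N = 0.25 * np.array([(1-s)*(1-t), (1+s)*(1-t),
                                 (1+s)*(1+t), (1-s)*(1+t)])
            dN = 0.25 * np.array([[-(1-t), (1-t), (1+t), -(1+t)],
                                  [-(1-s), -(1+s), (1+s), (1-s)]])
            J = dN @ xy                         # 2x2 Jacobian matrix
            dNxy = np.linalg.solve(J, dN)       # shape derivatives w.r.t. x, y
            B = np.zeros((3, 8))                # strain-displacement matrix
            B[0, 0::2] = dNxy[0]
            B[1, 1::2] = dNxy[1]
            B[2, 0::2] = dNxy[1]
            B[2, 1::2] = dNxy[0]
            E = N @ np.asarray(E_nodes)         # interpolated modulus at the Gauss point
            D = (E / (1 - nu**2)) * np.array([[1, nu, 0],
                                              [nu, 1, 0],
                                              [0, 0, (1 - nu) / 2]])
            K += thickness * (B.T @ D @ B) * np.linalg.det(J)
    return K
```

Because E is evaluated at each Gauss point from nodal values, a modulus field that is continuous across element boundaries yields a stiffness free of the artificial corner singularities discussed above.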
4.1.2 Application to L-shaped Plate
Figure 31 gives the geometry and boundary conditions for an L-shaped plate which has a
uniformly distributed load applied at the bottom.
Figure 31: L-shape Plate
Firstly, a conventional analysis is performed using a constant elastic modulus, and the Von
Mises stress plots are shown as Figure 32.
Figure 32: Von Mises Stress of L-shaped Plate (Coarse Mesh)
From Figure 32, the stress singularity can be seen at the re-entrant corner of the plate. Using a
finer mesh, the stress singularity becomes more evident, as shown in Figure 33: the maximum
stress is more than doubled. If an even finer mesh were used, the stress at the re-entrant corner
would become higher still; no matter how far the mesh was refined, the stress would never
converge. In the figures which follow, the results generated from the finer of the two meshes
will be used, as they show the effect of the singularity better.
Figure 33: Von Mises Stress of L-shaped Plate (Finer Mesh)
To determine the extent of the singularity's influence, plots of Von Mises stresses along the
centrelines of the plate in both the X and Y directions were generated, and are shown in
Figure 34 and Figure 35, respectively.
Figure 34: Von Mises Stress over X Direction
Figure 35: Von Mises Stress over Y Direction
From Figure 34 and Figure 35, the stress singularity can be seen to significantly disrupt the
stress field within a circular area of approximate radius 0.1. Therefore, to remove the stress
singularities, the stress redistribution process should be performed with the elastic modulus
graded linearly down to zero at the re-entrant corner from its full value at a circular arc with a
radius of 0.1. The change of elastic modulus is shown in Figure 36 and Figure 37.
Figure 36: Procedure for adjusting Elastic Modulus
Figure 37: Relative Value for Elastic Modulus
After defining the spatially varying elastic modulus, this structure is analysed again using the
finite element implementation introduced above. The new Von Mises stress result with the
stress redistribution is shown in Figure 38, from which it can be seen that, in comparison to
Figure 33, the stress singularities are removed successfully.
Figure 38: Von Mises Stress of L-shaped Plate after Stress Redistribution
Figure 39: Principal Compressive Stress for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right)
Figure 40: Stress in X Direction for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right)
Figure 41: Stress in Y Direction for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right)
Figure 39, Figure 40 and Figure 41, respectively, present the comparison of principal
compressive stress, stress in X and Y direction for the L-shaped plate by using LEFEA and
MLEFEA involving stress redistribution. Results also show that the stress singularities in the re-
entrant corner are successfully redistributed to the stiffer part of the model, where the overall
force equilibrium is preserved.
Figure 42: Principal Tensile Stress for L-shaped Plate---LEFEA (Left) vs. MLEFEA (Right)
In addition, the comparison of principal tensile stress for the L-shaped plate using LEFEA
and MLEFEA involving stress redistribution is shown in Figure 42, from which it can be seen
that after the stress redistribution the tensile stress increased in some areas. Therefore,
unduly softening the member will result in a less optimal stress field, where more
reinforcement is required in order to maintain ductility.
4.2 Summary
The above application of stress redistribution to the L-shaped plate shows that the proposed
method is efficient in terms of removing the stress singularities. As the stress field generated
by this method is statically admissible and in internal equilibrium, according to the lower
bound theory of plasticity it can be used as a design approach for complex structures, such as
deep beams with openings. The redistributed stress field is smooth and does not have any
significant peaks. For the L-shaped plate it more closely resembles the stress field that would
intuitively result from connecting a horizontal beam and a vertical beam-column to a square
block of concrete.
5 Adaptive Stress Redistribution Approach
While the stress redistribution process introduced in the previous chapter is shown to be
effective and reasonable, softening more elements than is absolutely necessary to remove the
singularity may result in a less optimal stress field, which in turn may require more
reinforcement to provide the necessary strength and ductility. At the same time, the approach
requires the designer to choose the area to be softened, which could be very laborious,
especially if the structure has a great number of discontinuities. This chapter proposes an
adaptive stress redistribution method to overcome these shortcomings. When the
compressive stress in the beam exceeds the specified compressive strength limit, this adaptive
approach will locally adjust the elastic modulus to reduce the over-limit stresses to acceptable
values. Another limitation of LEFEA based designs noted in previous chapters is the inability to
locate the tensile steel in the most effective position. This chapter will also apply adaptive
stress redistribution to the tensile stresses in order to locate the centroid of the tensile
stresses (and hence the centroid of the reinforcing steel) in a predefined position, allowing the
lever arm to be maximised and the steel to be used effectively.
5.1 Adaptive Compressive Stress Redistribution Approach
Under the current design code AS3600, the principal compressive stress should not exceed the
maximum allowable value. However, when facing non-flexural
members such as deep beams with openings containing re-entrant corners, stress singularities always
violate this criterion (provided the finite element mesh is fine enough to give accurate results)
and thus obstruct the application of the design approach. The work presented here introduces
an iterative process to redistribute the over-limit compressive stresses to the surrounding
areas by reducing the elastic modulus in the overstressed areas.
This iterative process continues until all the stresses are within the compressive stress limit. In
summary, the process involves four stages:

Stage 1: Specify the finite element model with initial Young's modulus values, and then
conduct the analysis as usual (for singular points an elastic modulus of
zero is specified; for all other points the initial elastic modulus is homogeneous);
Stage 2: If needed, soften the areas where peak stresses exceed the yield criterion by reducing
the elastic modulus at appropriate nodes, while maintaining the Poisson’s ratio constant;
Stage 3: Re-run the analysis with the modified model and check to see whether all the yield
criteria are satisfied. If they are not, return to stage 2; otherwise continue to stage 4;
Stage 4: Calculate the reinforcement dimensions required to carry all the principal tensile
stresses present in the concrete.
A flowchart for this method is shown below in Figure 43.
Figure 43: Flowchart of the Practical Implementation of Adaptive Compressive Stress Redistribution
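The loop structure of the four stages can be illustrated with a deliberately simple surrogate: two parallel springs sharing a fixed load, where each spring's stress is proportional to its stiffness share. Softening an overstressed spring sheds load to its neighbour while total equilibrium is preserved. In the real method, the surrogate analysis is replaced by a full linear finite element solve at each pass; all names and values below are illustrative assumptions:

```python
# A deliberately simple surrogate for the four-stage adaptive loop: two
# parallel unit-area springs share a fixed load P, each carrying stress in
# proportion to its stiffness. Softening an overstressed spring transfers
# load to its neighbour while the total (equilibrium) is preserved.
def spring_stresses(E, P=100.0):
    """Stress in each unit-area spring under a shared load P (analysis step)."""
    total = sum(E)
    return [P * e / total for e in E]

def adaptive_redistribution(E, sigma_allow, eps=0.8, max_iter=200):
    for _ in range(max_iter):
        sigma = spring_stresses(E)             # stage 1/3: analyse and check
        over = [i for i, s in enumerate(sigma) if abs(s) > sigma_allow]
        if not over:
            return E, sigma                    # all yield criteria satisfied
        for i in over:                         # stage 2: soften proportionally
            E[i] *= eps * sigma_allow / abs(sigma[i])
    raise RuntimeError("redistribution did not converge")
```

Starting from unequal stiffnesses, the overstressed spring is softened over a few iterations until both stresses fall within the limit, while their sum (the applied load) never changes.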
Initialising the Elastic Modulus
An initial value of elastic modulus is needed before the initial structural analysis can be
conducted. The precise value of this initial elastic modulus is not very important, since the
redistribution of stresses is determined by the relative values of elastic modulus across the model. In
the work reported here, a value characteristic of concrete is chosen as the initial elastic
modulus.
However, when dealing with members containing stress singularities, such as a deep beam
with rectangular openings, stress singularities resulting from the geometric or boundary
discontinuities can be removed. This can be achieved by specifying the elastic modulus as zero
for the points of singularity at this stage. As stated previously, this is reasonable, as the
concrete at those points will crack or crush, and thus the material here will not carry any
stresses.
Once initial values of the elastic modulus are defined, finite element analysis of the structure
can be performed in the conventional way.
Adjusting the Elastic Modulus Iteratively
Results from the previous stage are used to adjust the elastic modulus in regions where the
stresses are larger than the allowable value. According to the AS3600, the maximum allowable
principal compressive stress is:
σ_c,max = φ β f'c (Eq. 5-1)
Table 4: Stress Reduction Factors

Material                  Stress Reduction Factor φ
Concrete in compression   0.6
Steel in tension          0.8
Here φ is the stress reduction factor specified in Table 4; β is the effective
compressive strength factor, which can be evaluated as follows (Standards Australia 2009):

“(i) in regions not containing confining reinforcement: β = 1.0 when the principal tensile
stress does not exceed 0.33√f'c, and β = 0.6 otherwise;

(ii) in regions where effective confining reinforcement is provided: β shall be evaluated by
rational calculation taking account of the amount of confining steel and the details used, but
shall not exceed 2”;
The factor β was originally developed by Vecchio & Collins (Vecchio & Collins 1986), from which the
following relationship is obtained:

β = 1 / (0.8 + 170 ε₁) ≤ 1.0 (Eq. 5-2)

where ε₁ is the principal tensile strain.
Foster (Foster 2003) states that this relationship “accounts for both confinement effects, as is
the case for concrete in biaxial or triaxial compression, and disturbance effects such as caused
by the transmission of tension fields through compression fields”. At the same time, Foster
(2003) conservatively suggests that the factor β can be taken as 0.6 when the principal tensile
stress due to the applied load exceeds 0.33√f'c, and as 1.0 when it does not; that is:

β = 1.0 when σ₁ ≤ 0.33√f'c
β = 0.6 when σ₁ > 0.33√f'c (Eq. 5-3)
In order to soften the elastic modulus in a continuous way, the work here reduces the elastic
modulus proportionally using:

E_i^(n+1) = ε (σ_c,max / |σ_i^(n)|) E_i^(n) (Eq. 5-4)
Here the factor ε (0 < ε < 1) is chosen to reduce the elastic modulus at a faster rate; the
detailed explanation for the choice of this factor is given in section 5.1.1. By
substituting (Eq. 5-1) into (Eq. 5-4), the following expression is obtained:

E_i^(n+1) = ε (φ β f'c / |σ_i^(n)|) E_i^(n) (Eq. 5-5)

Substituting (Eq. 5-3) into (Eq. 5-5), the rule used to adjust the elastic modulus in stage 2 is
specified as:

E_i^(n+1) = ε (φ f'c / |σ_i^(n)|) E_i^(n) when σ₁ ≤ 0.33√f'c
E_i^(n+1) = 0.6 ε (φ f'c / |σ_i^(n)|) E_i^(n) when σ₁ > 0.33√f'c (Eq. 5-6)

Here, the subscript i indicates the associated node, while the superscript n denotes the
iteration number.
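The update rule at a single node can be sketched as follows. The code is an illustrative Python sketch, not the thesis's Matlab implementation; φ = 0.6 comes from Table 4, while the 0.33√f'c threshold selecting β is an assumption following the Foster (2003) criterion discussed above:

```python
import math

# Sketch of the nodal modulus-update rule: phi = 0.6 for concrete in
# compression (Table 4); the 0.33*sqrt(fc) tensile-stress threshold
# selecting beta is an assumed value following the Foster (2003) criterion.
def update_modulus(E, sigma_c, sigma_1, fc, eps=0.8, phi=0.6):
    """One iteration of modulus softening at a node with compressive
    stress sigma_c and coexisting principal tensile stress sigma_1."""
    beta = 1.0 if sigma_1 <= 0.33 * math.sqrt(fc) else 0.6
    sigma_allow = phi * beta * fc          # allowable compressive stress
    if abs(sigma_c) <= sigma_allow:
        return E                           # criterion already satisfied
    return E * eps * sigma_allow / abs(sigma_c)
```

Nodes already within the limit keep their modulus, so the softening is confined to the overstressed regions and the iteration terminates once the whole field satisfies the criterion.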
Dimensioning Reinforcement
Once this iterative procedure has ensured all the stresses are within the allowable stress
requirements, the reinforcement can be dimensioned to carry all the tensile stresses present
within the structure. As stated in AS3600 (Standards Australia 2009), “reinforcement and/or
tendons shall be provided to carry all of the internal tensile forces, with stresses not exceeding
φf_sy and φf_py, respectively”, where φ is shown in Table 4.
5.1.1 Application to Flexural Reinforced Concrete Beam
To illustrate the adaptive compressive stress redistribution approach, the same flexural
reinforced concrete beam shown in Figure 9 is used as an example.
As expressed in (Eq. 3-14) and (Eq. 3-19), the maximum principal compressive stress allowed within the
beam is 13.5 MPa and the ultimate moment generated from the applied load is 185 kNm. If the
same beam thickness as in the conventional stress block method (300 mm) is used, the
conventional LEFEA method results in a stress field violating the
stress criterion specified by AS3600:

σ_c,max > 13.5 MPa (Eq. 5-7)
Obviously, this problem can be eliminated by increasing the beam thickness. However, the
alternative is to keep using the original beam thickness (300mm) and perform the modified
LEFEA approach to redistribute the over-stressed stresses.
The adaptive compressive stress redistribution approach introduced in section 5.1 is
performed with ε = 0.8 (the selection of ε will be discussed later). Figure 44 presents the
principal compressive stress for the shallow beam by using the adaptive compressive stress
redistribution approach. The results show that the maximum principal compressive stress is
now 13.347MPa, which meets the criteria required by AS3600. Figure 45 indicates the
difference of principal compressive stress for shallow beam between the conventional LEFEA
approach and the adaptive compressive stress redistribution approach. It can be seen that the
peak stresses are successfully redistributed to the stiffer part of the model, while the overall
force equilibrium is preserved.
Figure 44: Principal Compressive Stress for Shallow Beam---MLEFEA
Figure 45: Difference of Principal Compressive Stress for Shallow Beam--- (LEFEA minus MLEFEA)
The stress plot and the elastic modulus variation across the mid-span of the beam are shown in
Figure 46 and Figure 47, respectively.
Figure 46: Stresses across Mid-span after Compressive Stress Redistribution
Figure 47: Relative Value of Elastic Modulus across Mid-span after Compressive Stress Redistribution
Figure 46 shows that all the principal compressive stresses are now below the stress limit
specified by AS3600, which in this case is 13.5MPa. From the stress plot, the tensile force can
be calculated using the trapezoidal rule for integration as:
T = t ∫ σ_t(y) dy (Eq. 5-8)

The area of reinforcement required is therefore:

A_st = T / (φ f_sy) (Eq. 5-9)

To preserve equilibrium, the centroid of the steel area must be located a distance above the
bottom fibre of the beam of:

ȳ = ∫ y σ_t(y) dy / ∫ σ_t(y) dy (Eq. 5-10)
The rate of convergence of this approach is significantly affected by the value of ε in
(Eq. 5-4). To find out the effect of ε, a parametric study was conducted under the same
applied load of 200 kN and beam thickness of 300 mm but using different values of ε, and the
results are shown in Table 5. These results show that values of ε less than one significantly
increase the rate of convergence, but that the smaller the value of ε, the lower the quality of the
solution. Based on this parametric study, a value of ε = 0.8 has been used throughout the rest
of this study.
Table 5: Comparison of MLEFEA with different values of ε

ε            1     0.9   0.8   0.7   0.6   0.5   0.4
Area (mm²)   1451  1459  1469  1480  1500  1510  1520
Xc (mm)      72.3  72.0  71.6  71.0  70.1  69.3  68.5
Iterations   184   8     6     4     4     3     3
5.1.1.1 Cost Comparison and Remarks
Table 6: Approaches Comparison for Shallow Beam

Approaches                                            Area of Steel (mm²)   Concrete Thickness (mm)   Steel Position (mm)
Conventional (Equivalent Stress Block)                1329.6                300                       50
Conventional LEFEA                                    1548.4                407.8                     75
MLEFEA (Adaptive Compressive Stress Redistribution)   1459.3                300                       72
Table 6 compares the cost of the design resulting from the new approach with the designs
resulting from the equivalent stress block method and the conventional LEFEA approach. The
concrete requirements for the new method are now equivalent to those of the equivalent
stress block method and 25% less than those of the conventional LEFEA approach. However,
the steel requirements are still 10% more as the position of the steel is still not in the optimum
position.
5.1.2 Application to Non-flexural Reinforced Concrete Beams without
Rectangular Openings
In this section, the Modified LEFEA approach is applied to the design of the deep beam without
rectangular openings. The geometry and loading of the beam are shown in Figure 14. Using the
conventional LEFEA approach, the analysis reported in section 3.2.2 required that the design
thickness of the beam be 196.38mm. If the beam thickness is decreased, e.g. to 180mm, based
on the linearity of the model, the stresses will then violate the maximum principal compressive
stress allowed by AS3600:
σ_c,max = 13.5 × 196.38 / 180 = 14.73 MPa > 13.5 MPa (Eq. 5-11)
However, by using the adaptive compressive stress redistribution approach, a reduced beam
thickness is possible with the stress criteria being maintained. The adaptive compressive stress
redistribution approach will lead to a more efficient design, as it can redistribute the peak
compressive stresses into the surrounding areas and ensure all the stresses are below the
stress limitation specified by AS3600.
Figure 48 presents the principal compressive stress field computed by conducting the adaptive
compressive stress redistribution with a concrete thickness of 180 mm. From the analysis, the
largest principal compressive stress resulting from the load of 1000 kN is 13.412 MPa, which is
smaller than the stress limit (13.5 MPa) of AS3600.
Figure 48: Plots of Principal Compressive Stress using LEFEA with Adaptive Compressive Stress Redistribution
As with the conventional LEFEA, the actual concrete thickness can be obtained by scaling the
trial thickness by the ratio of the calculated maximum stress to the allowable principal
compressive stress limit:

t = 180 × 13.412 / 13.5 = 178.8 mm (Eq. 5-12)
Figure 49: Plots of Principal Tensile Stress using LEFEA with Adaptive Compressive Stress Redistribution
The principal tensile stresses and the stresses across mid-span are shown in Figure 49 and
Figure 50 respectively. The resultant tensile force can be found by using the trapezoidal rule to
integrate the tensile stress across mid-span of the beam and is:
T = t ∫ σ_t(y) dy (Eq. 5-13)

The quantity of steel reinforcement required at mid-span is approximately:

A_st = T / (φ f_sy) (Eq. 5-14)

The required position of the steel centroid measured from the bottom of the beam is
calculated as:

ȳ = ∫ y σ_t(y) dy / ∫ σ_t(y) dy (Eq. 5-15)
Figure 50: Plot of Stresses at Mid-span
Figure 50 shows that all the stresses are now below the stress criteria (that is 13.5MPa)
required by AS3600.
Figure 51: Stresses across Mid-span after Compressive Stress Redistribution (First and Last Iteration)
Figure 51 illustrates the process of stress redistribution by indicating the first and last iteration
steps. The analysis in the first iteration is the same as the conventional LEFEA method, where
the maximum stresses violate the stress criteria required by AS3600. After the application of
MLEFEA, the peak stresses are redistributed to the surrounding areas and the final stress field
is within the required limit.
5.1.2.1 Cost Comparison and Remarks
Table 7 compares the designs resulting from the conventional STM, the conventional LEFEA
approach and the adaptive compressive stress redistribution approach. It can be seen that the
adaptive compressive stress redistribution approach is more efficient than the conventional
LEFEA approach in terms of the concrete usage, as expected. However, there is no
improvement in the area of steel required, as the steel is still not located in the optimum
position.
Table 7: Approaches Comparison for Deep Beam

Approaches                                            Area of Steel (mm²)   Concrete Thickness (mm)   Steel Position (mm)
Conventional STM (Strut and Tie)                      1250                  246                       250
Conventional LEFEA                                    1397.4                196.4                     341.5
MLEFEA (Adaptive Compressive Stress Redistribution)   1397.4                178.8                     341.1
5.1.3 Application to Non-flexural Reinforced Concrete Beams with
Rectangular Openings
This section demonstrates the application of the adaptive compressive stress redistribution
approach to the deep beam with rectangular openings shown in Figure 21. As discussed earlier,
the stress singularities involved in the deep beam with square or rectangular openings
invalidate the conventional LEFEA approach. To permit direct comparison, the adaptive
compressive stress redistribution approach is performed with a predefined beam thickness of
246 mm, which is the beam thickness used in the conventional STM. The value of ε is 0.8 and a
finer mesh of 10 mm is used so as to compute the stress singularities accurately.
Figure 52 presents the principal compressive stress after conducting the adaptive compressive
stress redistribution approach. The largest principal compressive stress resulting from the load
of 1000 kN is 12.3 MPa.
Figure 52: Plots of Principal Compressive Stress using adaptive compressive stress redistribution approach
As with the conventional LEFEA approach, the actual concrete thickness can be obtained by
scaling the trial thickness by the ratio of the calculated maximum stress to the allowable
principal compressive stress limit:

t = 246 × 12.3 / 13.5 = 224 mm (Eq. 5-16)
Figure 53: Plots of Principal Tensile Stress using LEFEA with Adaptive Compressive Stress Redistribution
The plots of principal tensile stress and stresses across mid-span are shown in Figure 53 and
Figure 54 respectively. The resultant tensile force can be found by integrating the tensile stress
across mid-span of the beam and is:
T = t ∫ σ_t(y) dy (Eq. 5-17)

The quantity of steel reinforcement required at mid-span is approximately:

A_st = T / (φ f_sy) (Eq. 5-18)

The distance of the centroid of the steel from the bottom of the beam is then calculated as:

ȳ = ∫ y σ_t(y) dy / ∫ σ_t(y) dy (Eq. 5-19)
Figure 54: Plot of Stresses at Mid-span
5.1.3.1 Cost Comparison and Remarks
Table 8 compares the designs resulting from the conventional STM, the conventional LEFEA
approach and the adaptive compressive stress redistribution approach. The new approach
results in savings in both concrete and steel in comparison to the more conventional
approaches.
Table 8: Cost Comparison for designs of Deep Beam with Openings

Approaches                                            Volume of Steel (mm³)   Thickness of Concrete (mm)   Steel Position (mm)
Conventional STM (Strut and Tie)                      6132000                 246                          250
Conventional LEFEA                                    5336400                 278.2                        317
MLEFEA (Adaptive Compressive Stress Redistribution)   5330800                 224                          317
5.2 Adaptive Tensile Stress Redistribution Approach
The tensile stresses are all carried by the reinforcement, and the distribution of tensile stresses determines the area and position of the reinforcement. However, following this procedure may
not result in the most efficient structure. For example, if this approach is used to design a
conventional beam, the reinforcement will be placed at the centroid of the calculated tensile
stress, rather than as close as possible to the bottom of the beam (maximizing the lever arm
for the tensile force), as would be done in conventional design. This will result in more steel
being necessary in the finite element based design than in the conventional design. In most
structural design situations, the designer will be able to identify the optimum position of the
tensile steel in a structure before performing the analysis. In doing this he or she will take into
account requirements about the concrete cover for the steel.
To increase the efficiency of the designs resulting from the proposed stress redistribution
process, in this section the steel position is assumed to be pre-defined before the analysis is
performed. The redistribution process is used to ensure the distribution of tensile stresses
matches well to the steel position, in the sense that the position of the tensile stress resultant
coincides with the proposed steel position.
Before considering the redistribution of tensile stress, the procedure for using linear stress
analysis to dimension the reinforcement from the tensile stress distribution is reviewed.
Firstly, linear stress analysis is conducted, and then cross-sectional cuts are taken across the principal tensile stress field, perpendicular to the direction of the principal tensile stress vector, giving plots of the principal tensile stresses.
The resultant tensile force can then be found from the integration of the stress plots along cross-sections using:
T = t ∫ σ(y) dy (Eq. 5-20)
Here t is the thickness of the structure. Plane stress conditions have been assumed.
The required action line of steel can be obtained by determining the centroid of tensile stress
plots via:
ȳ = ∫ y σ(y) dy / ∫ σ(y) dy (Eq. 5-21)
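These two quantities are straightforward to evaluate numerically once the principal tensile stresses have been sampled along a cut. The sketch below uses trapezoidal integration on an illustrative triangular stress distribution; the sample values are made up for demonstration and are not taken from the thesis's analyses.

```python
# Numerical evaluation of the tensile resultant (Eq. 5-20) and its
# centroid (Eq. 5-21) from stresses sampled along a cross-section.
# Illustrative values only: y in mm, stress in MPa, thickness t in mm.

def tensile_resultant_and_centroid(y, sigma, t):
    """Trapezoidal integration of T = t * integral(sigma dy) and
    y_bar = integral(y*sigma dy) / integral(sigma dy), keeping only
    the tensile (positive) part of the stress distribution."""
    sigma_t = [max(s, 0.0) for s in sigma]     # tensile stresses only
    area = moment = 0.0
    for i in range(len(y) - 1):
        dy = y[i + 1] - y[i]
        area += 0.5 * (sigma_t[i] + sigma_t[i + 1]) * dy
        moment += 0.5 * (y[i] * sigma_t[i] + y[i + 1] * sigma_t[i + 1]) * dy
    T = t * area                 # resultant tensile force (N)
    y_bar = moment / area        # centroid height above bottom fibre (mm)
    return T, y_bar

# Triangular tensile distribution over the bottom 300 mm of a section,
# maximum at the bottom fibre:
y = [0.0, 100.0, 200.0, 300.0]
sigma = [6.0, 4.0, 2.0, 0.0]
T, y_bar = tensile_resultant_and_centroid(y, sigma, t=300.0)
print(T, y_bar)   # 270000.0 N (270 kN), centroid ~88.9 mm
```

With finer sampling the computed centroid converges to the exact value of d/3 = 100 mm for a triangular distribution; the coarse four-point sample above gives approximately 89 mm.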
In the case that the position of the steel is pre-defined, the difference between the position given by the above equation and the pre-defined value can be used as the criterion to drive the redistribution of tensile stress. The flowchart of the tensile stress redistribution is shown in Figure 55.
in Figure 55.
Figure 55: Flowchart of the Practical Implementation for Tensile Stress Redistribution
The position criterion here (in Figure 55) refers to the difference Δ between the actual steel position ȳ and the predefined position ȳ₀. In this work, Δ is introduced as:
Δ = |ȳ − ȳ₀| (Eq. 5-22)
The procedure for redistributing the tensile stress suggested here is based on the observation
that the optimum position for steel generally coincides with the peak tensile stress regions,
while the centroid of the tensile stresses will be located at a position of lower stress. For
example, in a simple beam the optimum position for the steel is at the bottom of the beam,
where the tensile stress is maximum. However, linear stress analysis will result in a triangular
distribution of tensile stress with the resultant located 1/3 d from the bottom of the beam,
where d is the distance to the neutral axis. In practical concrete designs, the requirement that
there is sufficient concrete cover over the steel limits the extent to which the steel can be
placed in the optimal position.
Based on the observation that the designer will usually want to shift the centroid of the tensile
stress field from its linear elastic position towards the highest tensile stress position, an
iterative procedure is proposed where the elastic modulus in areas of high tensile stress is
artificially increased. This attracts more tensile stress to these areas, moving the centroid of
the tensile stress field towards the high tensile stress location and increasing the magnitude of
the tensile stress. The increase in elastic modulus in these areas is continued until the position
of the tensile stress centroid matches the selected steel position.
In a similar (but reverse) process to the adaptive compressive stress redistribution process, the
adaptive tensile stress redistribution approach increases the elastic modulus proportionally by
introducing a factor (larger than one) to increase the rate of increase of E. The sensitivity of
the process to the factor will be discussed later. The rule used for the increase of E is:
(Eq. 5-23)
Here the stress entering the rule is the value of tensile stress calculated at each node in the previous iteration, which is compared against the hardening stress. A subscript indicates the associated node, while a superscript denotes the iteration number.
The hardening stress σ_h is chosen on the basis of the maximum tensile stress σ_max multiplied by a factor β (less than one), and can be represented as:
σ_h = β σ_max (Eq. 5-24)
Numerical trials showed that, in order for the tensile stress centroid to converge reliably to the selected steel position, the value of the hardening factor has to be progressively decreased. To do this, another two parameters were introduced, with:
(Eq. 5-25)
The main steps of the adaptive tensile stress redistribution approach are:
Step 1: Perform a conventional linear analysis and find the position of the tensile stress resultant;
Step 2: Compare this position with the pre-selected steel position and obtain their difference from (Eq. 5-22);
Step 3: If the difference is smaller than the tolerance, proceed to Step 4. Otherwise, find the maximum tensile stress, choose a value of the factor to set the hardening stress via (Eq. 5-24), perform the stress hardening as shown in (Eq. 5-23), re-run the linear analysis and return to Step 2;
Step 4: Find the position and area of reinforcement required to carry all the principal tensile stress in the concrete.
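The loop formed by Steps 1 to 3 can be sketched schematically as follows. This is a toy illustration, not the thesis's implementation: the finite element solve is replaced by a model in which nodal stress is the product of a fixed strain profile and the nodal elastic modulus, so that stiffened regions attract proportionally more stress. The names gamma, beta and tol are illustrative placeholders, and beta is held fixed here rather than progressively decreased as the thesis does.

```python
# Schematic of the adaptive tensile stress redistribution loop (Steps 1-3).
# The "linear analysis" is a toy model: sigma_i = E_i * strain_i over a
# fixed strain profile, so stiffening a node attracts more stress to it.
# gamma (> 1), beta (< 1) and tol are placeholder parameter names.

def centroid(y, sigma):
    return sum(yi * si for yi, si in zip(y, sigma)) / sum(sigma)

def redistribute(y, strain, y_target, gamma=1.05, beta=0.8, tol=1.0):
    E = [1.0] * len(y)                    # relative elastic moduli
    for iteration in range(1000):
        sigma = [Ei * ei for Ei, ei in zip(E, strain)]   # "linear analysis"
        y_bar = centroid(y, sigma)
        if abs(y_bar - y_target) < tol:   # position criterion (Eq. 5-22)
            return y_bar, iteration
        sigma_h = beta * max(sigma)       # hardening stress (Eq. 5-24)
        # stiffen nodes whose tensile stress exceeds the hardening stress
        E = [Ei * gamma if si > sigma_h else Ei
             for Ei, si in zip(E, sigma)]
    return y_bar, iteration

# Tension zone 0-300 mm above the bottom fibre, triangular strain profile
# (maximum at the bottom); shift the resultant from ~97 mm down to 60 mm.
y = list(range(0, 301, 10))
strain = [(300.0 - yi) / 300.0 for yi in y]
y_bar, n = redistribute(y, strain, y_target=60.0)
print(round(y_bar, 1), n)
```

Increasing E where the stress exceeds the hardening threshold progressively pulls the centroid towards the peak-stress region, mirroring the behaviour described above; in a real application each pass through the loop is a full finite element solve.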
5.3 Adaptive Stress Redistribution Approach for both
Compressive and Tensile Stress
The two stress redistribution processes can be performed together, as shown in Figure 56:
Figure 56: Flowchart of the Practical Implementation for both Compressive and Tensile Stress Redistributions
The application of this adaptive method will be introduced in the following sections.
5.3.1 Application to Flexural Reinforced Concrete Beam
To demonstrate the adaptive stress redistribution for both compressive and tensile stress, the
flexural reinforced concrete beam shown in Figure 9 is used again. The preselected steel
position from the bottom of the beam is 50mm, which is the same as the steel position used in
the conventional design approach.
The parameters used for the adaptive stress redistribution for both compressive and tensile
stress are ; ; ; ; ; . The resulting stress plot
across the mid-span of the beam is shown as Figure 57, and the elastic modulus variation
across mid-span of the beam is shown in Figure 58.
Figure 57: Stresses across Mid-span after Stress Redistribution for both Compressive and Tensile Stresses
Figure 58: Relative Value of Elastic Modulus across Mid-span after Stress Redistribution for both Compressive and
Tensile Stresses
The tensile force is calculated through:
T = t ∫ σ(y) dy (Eq. 5-26)
With the area of reinforcement:
(Eq. 5-27)
The position of the steel centroid above the bottom fibre of the beam is:
ȳ = ∫ y σ(y) dy / ∫ σ(y) dy (Eq. 5-28)
As hoped, this is very close to the target 50mm.
5.3.1.1 Cost Comparison and Remarks
At this stage four different approaches have been used for the design of the flexural beam. These
approaches are the conventional equivalent stress block approach, the conventional LEFEA
approach, the adaptive compressive stress redistribution approach and the adaptive stress
redistribution approach for both compressive and tensile stress. A comparison of the designs
resulting from these four different approaches is shown in Table 9.
Table 9: Approaches Comparison for Shallow Beam (with Updated MLEFEA)

Approach | Area of Steel (mm²) | Concrete Thickness (mm) | Steel Position (mm)
Conventional (Equivalent Stress Block) | 1329.6 | 300 | 50
Conventional LEFEA | 1417.9 | 384 | 75.29
MLEFEA (Adaptive Compressive Stress Redistribution) | 1459.3 | 300 | 72
MLEFEA (Adaptive Stress Redistribution for Both Compressive and Tensile Stress) | 1213.5 | 300 | 50.58
Table 9 shows that the new MLEFEA method with redistribution of both compressive and
tensile stress provides the most efficient design in terms of both concrete and steel use.
In order to compare these four approaches more thoroughly, a parametric study was carried out, performing several parallel designs of the same beam, each under an applied load of 200 kN, but with a range of thicknesses: 200 mm, 250 mm, 300 mm, 350 mm, 400 mm and 450 mm. The results are shown in Figure 59.
Figure 59: Design Results for Shallow Beam
[Figure 59 plots Area of Steel (mm²) against Thickness of Beam (mm) for the shallow beam, with series MLEFEA (comp), LEFEA, Equivalent Stress Block and MLEFEA (both).]
Figure 59 clearly shows the trade-off between the thickness of the beam and the area of reinforcement, which gives designers the flexibility to choose which material they want to save during the design. Compared to both the conventional equivalent stress block approach and the adaptive stress redistribution for both compressive and tensile stress, the design produced by the conventional LEFEA method is extremely conservative and wastes material, in terms of both concrete and steel.
There are only slight differences between the adaptive stress redistribution for both compressive and tensile stress and the conventional equivalent stress block approach. The approach involving stress redistribution for both compressive and tensile stresses uses the same steel position as the conventional method. Overall, the MLEFEA approach gives similar (though slightly better) results than the conventional equivalent stress block approach.
The reason for the slight differences between the conventional equivalent stress block method and the MLEFEA method is that, although the total volume of the equivalent compressive stress block approximately equals that of the MLEFEA compressive stress block, there is a small difference in the distribution. This is shown in Figure 60: the volumes of the compressive stress blocks (negative stresses are compressive) for all three approaches are quite similar.
Figure 60: Stress Blocks for Different Approaches
5.3.1.2 Nonlinear verification
To verify the safety of this proposed approach, ABAQUS/CAE is used to conduct the non-linear
finite element analysis of the final design produced by the new approach. The damaged
plasticity model is used to describe the behaviour of concrete, while an elastic-plastic model is
used to model the reinforcement. The connection between concrete and steel is considered as
embedded. For the spring supports, two reference points are used to apply the boundary
conditions on the two supports.
Based on the non-linear analysis, the load vs. deflection curve of this shallow beam is shown in Figure 61, which indicates that the ultimate mid-span load is higher than the design load of 200 kN. This result means the design is safe and the approach is reasonable.
[Figure 60 plots the stresses across mid-span (MPa) against the distance from the bottom fibre of the beam (mm), with series MLEFEA (Both), MLEFEA (OnlyComp) and Equivalent Stress Block.]
Figure 61: Load vs. Deflection Curve for Shallow Beam
5.3.2 Application to Non-flexural Reinforced Concrete Beams without
Rectangular Openings
The adaptive stress redistribution for both compressive and tensile stress is now used to
design the non-flexural reinforced concrete beam without rectangular openings shown in
Figure 14. The pre-selected steel position is 250mm, which is the same as the steel position for
the conventional STM, and is much lower than that for the conventional LEFEA method
(341.5mm).
The parameters used in this application are ; ; ; ; ;
. The stress plot across the mid-span of the beam is shown in Figure 62, and the elastic modulus variation across the mid-span is presented in Figure 63.
Figure 62: Stresses across Mid-span after Stress Redistribution for both Compressive and Tensile Stresses
Figure 63: Relative Elastic Modulus along Mid-span after Stress Redistribution for both Compressive and Tensile
Stresses
The tensile force is calculated as:
T = t ∫ σ(y) dy (Eq. 5-29)
The area of reinforcement is:
(Eq. 5-30)
The target steel centroid position was 250 mm above the bottom fibre of the beam, and the
position achieved was very close to the target at:
ȳ = ∫ y σ(y) dy / ∫ σ(y) dy (Eq. 5-31)
5.3.2.1 Cost Comparison and Remarks
The designs resulting from the four different approaches are summarised in Table 10.
Table 10: Approaches Comparison for Deep Beam (with Updated MLEFEA)

Approach | Area of Steel (mm²) | Concrete Thickness (mm) | Steel Position (mm)
Conventional STM (Strut and Tie) | 1250 | 246 | 250
Conventional LEFEA | 1397.4 | 196.4 | 341.5
MLEFEA (Adaptive Compressive Stress Redistribution) | 1397.4 | 181.2 | 341.1
MLEFEA (Adaptive Stress Redistribution for Both Compressive and Tensile Stress) | 1292.3 | 180 | 249.1
The adaptive stress redistribution for both compressive and tensile stress results in the lowest use of concrete (a 27% reduction over the strut-and-tie design) and only a small increase in steel over the strut-and-tie approach (3%).
Figure 59 showed that, for shallow beams, there is a clear trade-off between using less concrete and using more steel to support a given load. The parametric study was repeated for deep beams. Figure 64 shows the comparison between these approaches for the designs of the deep beam without openings under the same applied load (1000 kN) with different beam thicknesses. The selected steel position is 250 mm from the bottom of the beam. The benefit of the MLEFEA approach is clear in terms of concrete saving when compared to the conventional LEFEA and STM approaches. Compared to shallow beams, there is much less of a trade-off between concrete and steel.
For the design of deep beams without openings, the MLEFEA with compressive stress redistribution requires the same area of steel as the conventional LEFEA, while requiring less concrete. In terms of steel area, designs based on the MLEFEA with both stresses redistributed are quite similar to the strut-and-tie design, while using significantly less concrete.
Figure 64: Design Results for Deep Beam
At the same time, it can be seen that for the MLEFEA approach, there is a limit to how far the
thickness can be decreased, as a minimum amount of concrete is always needed to carry the
compressive stress within the beam.
5.3.2.2 Non-linear Verification
As was done for the shallow beam, to verify the safety of the proposed approach ABAQUS/CAE
is used to conduct the non-linear finite element analysis of the final design produced by the
MLEFEA with both compressive and tensile stress redistribution.
[Figure 64 plots Area of Steel (mm²) against Thickness of Beam (mm) for the deep beam, with series MLEFEA (comp), LEFEA, Strut and Tie and MLEFEA (both).]
The load vs. deflection curve of the deep beam is shown in Figure 65, which indicates that the ultimate mid-span load is higher than the applied load (1000 kN), and therefore the design based on the MLEFEA approach is safe and the approach is reasonable.
Figure 65: Load vs. Deflection Curve for Deep Beam
5.3.3 Application to Non-flexural Reinforced Concrete Beams with
Rectangular Openings
In this section, the MLEFEA with redistribution of both compressive and tensile stresses is
applied to the design of the non-flexural reinforced concrete beam with rectangular openings.
The geometry of the beam is shown in Figure 21.
The parameters used in this application are ; ; ; ; ;
. The resulting stress plot across the mid-span of the beam is presented in Figure 66.
Figure 66: Stresses across Mid-span after Stress Redistribution for both Compressive and Tensile Stresses
The tensile force is calculated as:
T = t ∫ σ(y) dy (Eq. 5-32)
The area of reinforcement is:
(Eq. 5-33)
The target position of the steel centroid was again 250mm above the bottom of the beam, and
the achieved position was:
ȳ = ∫ y σ(y) dy / ∫ σ(y) dy (Eq. 5-34)
5.3.3.1 Cost Comparison and Remarks
For the designs of the deep beam with rectangular openings, Figure 67 illustrates the differences between the various approaches in terms of steel and concrete usage. The data in this figure was generated by conducting similar designs using the same parameters under the same applied load (1000 kN), but with different beam thicknesses. The same steel position is selected for both the
conventional STM and the MLEFEA approach. It is clear from Figure 67 that the MLEFEA
approach generates more efficient designs in terms of concrete saving than both the
conventional LEFEA and the STM. This is because the MLEFEA approach solves the problem of
stress singularities violating the stress criterion in AS3600 and fully utilizes the strength capacity
of the concrete within the structure.
As with deep beams without web openings, although the MLEFEA approach is more efficient in saving concrete than the conventional approaches, there is a limit to how far the stress redistribution approach can reduce the concrete thickness.
Figure 67: Design Results for Deep Beam with Openings
5.3.3.2 Nonlinear verification
With the aid of ABAQUS, non-linear finite element analysis is performed to verify the safety of
the design for the deep beam with rectangular openings produced by the adaptive stress
redistribution of both compressive and tensile stress. The load vs. deflection curve of the beam obtained from this analysis is shown in Figure 68, which indicates that the ultimate mid-span load is higher than the applied load (1000 kN), so the design is safe and the approach is reasonable.
[Figure 67 plots Area of Steel (mm²) against Thickness of Beam (mm) for the deep beam with openings, with series MLEFEA (comp), LEFEA, Strut and Tie and MLEFEA (both).]
Figure 68: Load vs. Deflection Curve for Deep Beam with Openings
5.4 Summary
This chapter demonstrates the application of the new MLEFEA adaptive stress redistribution
approach to three different types of simple structures, namely shallow (flexural) beams, deep
beams, and deep beams with web openings. The efficiency of the resulting designs is examined.
For the adaptive stress redistribution approach, both adaptive compressive stress
redistribution and adaptive stress redistribution for compressive and tensile stress are
investigated.
For the shallow beams, the stress redistribution approach results in designs similar to those of the conventional equivalent stress block approach, indicating that it is able to obtain designs which are close to the maximum efficiency possible.
For the deep beams, the proposed approach generates more efficient designs than the conventional LEFEA and strut-and-tie approaches in terms of concrete savings, as it fully considers the contribution of the concrete to the overall strength of the structure. This saving is significant, as the approach can be used in the design of other members, not only beams.
For the deep beams with web openings, the proposed approach overcomes the difficulties
confronted in conventional LEFEA with respect to stress singularities. The adaptive stress
redistribution can obtain a stress field suitable for design by successfully removing the stress
singularities.
Furthermore, preliminary tests of the designs resulting from the new approach were
performed using non-linear finite element analysis through ABAQUS. Analysis results show the
designs produced on the basis of the proposed approach are safe, collapsing well after
reaching the design load. As the stress fields resulting from the proposed approach are statically admissible, in equilibrium with the applied loads, and satisfy the yield condition, designs generated using the approach are reasonable and comply with the lower bound theorem of plasticity.
This chapter has shown that the adaptive stress redistribution approach for both compressive
and tensile stress can lead to more efficient designs than the conventional approaches,
particularly in terms of concrete savings. Therefore, this approach has the potential to reduce carbon emissions and environmental pollution.
However, the approach requires use of a fine finite element mesh and an iterative procedure,
with the global stiffness matrix being reconstructed at each step. The remaining part of this
study will look at the possibility of speeding up the computational process by using GPUs.
PART II: EFFICIENT GRAPHICS PROCESSING UNIT (GPU) IMPLEMENTATION
The first part of this thesis successfully developed an efficient MLEFEA approach which leads to more efficient designs than either the conventional strut-and-tie approach or the LEFEA approach.
This new approach performs the stress redistribution adaptively to generate a suitable stress
field which is in equilibrium both internally and externally, does not exceed a specified yield
criterion anywhere, and has the tensile stress resultant at a location selected by the designer.
In order to achieve this result the process may require many iterations, and within each iteration a finite element analysis with a fine mesh must be performed. The stiffness matrix changes in each iteration, due to changes in the elastic modulus. Consequently, the factorisation of the stiffness matrix in one iteration cannot be reused in the next, and the process is potentially computationally expensive. This part of the thesis examines how the
Graphics Processing Units (GPUs) on a normal PC graphics card can be used to accelerate the process.
This part of the thesis will discuss the GPU, its use in finite element analysis, and its application
to the new MLEFEA approach. To begin with, the basic theory of GPU processing is introduced,
and the literature regarding the GPU and its use in finite element analysis is reviewed. This is followed by an investigation of the CSR storage format for the stiffness matrix and the SpMV algorithm. After presenting the GPU-based Preconditioned Conjugate Gradient (PCG) method and the process required to assemble the stiffness matrix in CSR format, this part ends with a speed comparison between CPU and GPU algorithms implementing the new MLEFEA approach applied to an
example of a deep beam with rectangular openings.
6 Basic Theory & Literature Review
In this chapter, the basic theory of GPU programming is introduced and the literature about
GPU use in finite element analysis is reviewed. For the GPU basic theory, the books
“Programming massively parallel processors: a hands-on approach” (Kirk & Hwu 2010) and the
“NVIDIA CUDA C Programming Guide” (NVIDIA Corporation 2007) are heavily referenced. For
the GPU usage in finite element analysis, GPU methods for stiffness matrix assembly and
solving are presented, with an emphasis on stiffness matrix solving, where both direct and
iterative solvers are explained. Iterative solvers based on the Jacobi method, Gauss-Seidel
method, and conventional and preconditioned conjugate gradient methods are introduced.
Research reported in the literature about the most time-consuming part of iterative solvers,
Sparse Matrix Vector Multiplication (SpMV), is also reviewed.
6.1 Graphics Processing Unit (GPU)
Motivated by the strong competition in the gaming industry, the programmable Graphics Processing Unit, or GPU, was originally designed to accelerate computer graphics applications. These days the GPU is considered a good alternative to the CPU for applications requiring high computing power and speed, because it offers higher computation speed at a lower price. For example, in 2009 the ratio between GPUs and CPUs in terms of potential peak floating-point performance was about 10 to 1 (1 teraflop to 100 gigaflops). Most significantly, the performance of GPUs is still growing rapidly, while the improvement of CPUs is relatively slow.
The main reason for this huge difference in speed is the different design philosophies of CPUs and GPUs, as shown in Figure 69. CPUs are designed mainly to optimise sequential code performance, making full use of sophisticated control logic to maintain the appearance of sequential execution. Large cache memories are used to reduce the data access latencies of complex applications. However, both the control logic and the cache memories consume chip area and so reduce the CPU's potential peak floating-point performance. In addition, because CPUs must satisfy the bandwidth requirements of legacy applications, operating systems and input/output devices, it is very difficult for them to match the memory bandwidth of graphics chips, which benefit from frame buffer requirements and a relaxed memory model.
Figure 69: Different Design Philosophies for CPUs and GPUs
In contrast, the design philosophy of the GPUs is to dedicate the maximum chip area to the
floating-point calculations by minimizing the control logic execution. Unlike CPUs, there are
fewer legacy requirements and simpler memory models for GPUs, so GPUs can achieve higher
bandwidth. Most importantly, because there are a large number of threads in GPUs, the GPU
hardware can automatically find some threads to execute when some of the other threads are
waiting for long-latency memory accesses such as the global memory accesses. Furthermore,
small cache memories are used in GPUs to help control the bandwidth requirement, so threads accessing the same memory data do not all need to go to the Dynamic Random Access Memory (DRAM). Consequently, much more chip area can be used for calculation.
Besides the above mentioned advantages in terms of performance, there are some other
reasons for the popularity of GPUs. Firstly, GPUs are very cheap and are supplied in most PCs, in contrast to traditional parallel computing systems, which are accessible to far fewer users. Also, modern GPUs support the Institute of Electrical and Electronics Engineers
(IEEE) floating point standard, which makes GPUs suitable for different numerical applications.
In addition, instead of using conventional, application-limited Application Programming Interface (API) functions such as OpenGL or Direct3D to access the graphics chip, GPUs can easily be programmed in parallel using the Compute Unified Device Architecture (CUDA) programming language. CUDA extends standard C/C++ with a set of special GPU functions, enabling developers to program the GPU easily without knowledge of the conventional graphics pipeline.
More recently, other ways of programming GPUs have been developed, such as CUDA Fortran, which was developed by PGI and NVIDIA and is only available in PGI 2010 and later releases. It includes a Fortran 2003 compiler and tool chain for programming NVIDIA GPUs using Fortran. Unlike the CUDA C compiler, the CUDA Fortran compiler is not free, and is thus less popular. Another way of accessing the power of GPUs is through MATLAB CUDA, which provides the basis for CUDA GPU-accelerated MATLAB operations. There are three ways to accelerate MATLAB using the GPU: the MATLAB plug-in for CUDA using MEX files, the Jacket Engine for MATLAB acceleration, and GPUmat (Zhang et al. 2011). However, MATLAB CUDA is still maturing and has quite a number of limitations, and its usage is confined to some specific problems.
A CUDA program runs on a host, which is a traditional Central Processing Unit (CPU), and accesses one or more GPU devices, which are parallel processors. The sequential host program running on the CPU can call kernels written in CUDA to run on the GPU devices. The host and device code is separated by the NVIDIA C compiler (nvcc). The host code is conventional ANSI C code and is compiled as a standard CPU process, while the device code, written using ANSI C extensions and compiled by nvcc, is executed on a GPU device. The execution of a CUDA program is shown in Figure 70. Execution starts with the host serial code (CPU code); when a kernel function is invoked, control passes to the device parallel code (GPU code), where the threads being executed in the kernel are organized as a grid. When all the threads in a kernel have executed completely, the corresponding grid terminates and the code continues to execute on the host until another kernel is invoked.
Figure 70: Execution of a CUDA Program
To achieve parallel execution, a CUDA program follows the Single Instruction, Multiple Threads (SIMT) model, in which each instruction is executed in parallel by a set of threads. The threads are organized into blocks, with one or more blocks constituting a kernel. The hierarchy of CUDA threads is presented in Figure 71, where only a small number of threads are shown for simplicity.
Generally, grids are two-dimensional arrangements of blocks, and all blocks in a grid have the same dimensions. Each dimension of a grid can range from 1 to 65535. Blocks are usually three-dimensional arrangements of threads. The maximum number of threads in a block is 512, with flexibility in how the threads are distributed across the three dimensions, as long as the total number of threads in the block does not exceed 512. Because of this hierarchical architecture, every thread has a unique integer index within its block, and every block has a unique integer index within its grid.
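The unique global index of a thread is obtained by flattening its block and thread coordinates. The arithmetic can be illustrated in plain Python for the simple one-dimensional case (blockIdx, blockDim and threadIdx are modelled here as ordinary arguments rather than CUDA built-ins):

```python
# Flattening CUDA's hierarchical (block, thread) coordinates into the
# unique global index each thread uses to select its data element.
# A one-dimensional grid of one-dimensional blocks is assumed.

def global_thread_index(block_idx, block_dim, thread_idx):
    # Equivalent to CUDA's: blockIdx.x * blockDim.x + threadIdx.x
    return block_idx * block_dim + thread_idx

# With blocks of 256 threads, thread 10 of block 2 handles element 522:
print(global_thread_index(2, 256, 10))   # 522

# Every (block, thread) pair maps to a distinct global index:
indices = {global_thread_index(b, 256, t)
           for b in range(4) for t in range(256)}
assert len(indices) == 4 * 256
```

In a kernel this index typically selects the vector entry or matrix row a thread works on, which is how a data-parallel loop is distributed over the grid.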
Figure 71: Hierarchy of CUDA Threads
In order to mitigate the global memory's long access latencies (hundreds of clock cycles) and finite access bandwidth, CUDA provides several types of memory in a unique memory hierarchy, illustrated in Figure 72. At the bottom of Figure 72 are the global memory, the constant memory and the texture memory, all of which are available to every grid and can be written and read by the host code (running on the CPU) by calling API functions. The constant memory and the texture memory are read-only to the device code (running on the GPU), and offer faster data transfer and more parallel access paths than the global memory. Both the host code and the device code can write to the global memory.
Registers are located in every individual thread, and each thread can only access its own
corresponding registers. Shared memory is located in every thread block, and all the threads
within a block can read and write data to this block’s shared memory. This is a very efficient
way for threads in a block to share and incorporate data. In CUDA, the shared memory is
usually used to store the portion of global memory which will be heavily used in a kernel
execution.
Figure 72: Hierarchy of GPU Memory
For thread execution, the most important issue is thread scheduling. In CUDA, once a kernel is
invoked, each block is assigned to a streaming multiprocessor (SM), and the threads in a block
are divided into warps, each of which consists of 32 threads. Warps are the units through which
a CUDA program tolerates long-latency operations. When the next instruction for the threads of
a warp must wait on a long-latency operation, that warp is set aside, and another warp which
does not need to wait is selected for execution. If more than one warp is ready for execution, a
priority mechanism is used to select among them. This process is known as latency hiding.
As there are a large number of threads and thus a great number of warps in any CUDA
execution, the hardware is able to find a warp to execute at any time. This latency hiding
makes full use of the capacity of the hardware, despite the presence of long-latency
operations. Partly as a result of this, GPUs achieve greater performance in terms of computing
time compared to CPUs.
Because of their peak computing capability and large memory bandwidth, GPUs are now being
widely used for scientific computations, including
physics simulations, cloth simulations (Cecka et al. 2010), fluid dynamics (Elsen et al. 2008;
Goddeke et al. 2009), finite element simulations (Galoppo et al. 2005), as well as many other
applications (Cevahir et al. 2010b).
The finite element method involves a huge number of operations which could potentially be
performed in parallel, such as stiffness matrix assembly and the solution of the stiffness
equations. This makes it possible to use GPUs to accelerate the FEM. The GPU work reported in
this thesis optimises the solution of the stiffness equations using a GPU Preconditioned
Conjugate Gradient (GPU-PCG) approach, which makes full use of the massively parallel
computation features of the GPU.
6.2 GPU Implementation of Finite Element Analysis
6.2.1 Stiffness Matrix Assembly
In finite element analysis, stiffness matrix assembly is the stage of the process in which the
nodal data, element connectivity and boundary conditions are used to assemble the linear system
of equations. Because each element contributes only to the equations associated with its own
nodes, the assembly is well suited to parallel GPU implementation. The GPU naturally fits a
structured mesh, where the patterns are regular and no further mesh connectivity information is
needed. For an unstructured mesh, Cecka, Lew et al. (Cecka et al. 2011) have investigated the
use of a single GPU to accelerate the stiffness matrix
assembly. They introduced three different ways to assemble the stiffness matrix: assembly by
Non-Zero element (NZ), assembly by row, and assembly by element. They also provided a
detailed description of assembly by element via colouring (Komatitsch et al. 2009), and
assembly by NZ in local memory (Bolz et al. 2003), in global memory (Tejada & Ertl 2005) and
in shared memory. Based on the examples of geometric flow and fluid simulation, Bolz and
Farmer et al. (Bolz et al. 2003) also implemented a conjugate gradient solver (Shewchuk 1994)
and a multigrid solver on GPU hardware for unstructured and structured meshes respectively, and
demonstrated the powerful potential of these fundamental computational kernels when run on the
GPU. Rodriguez-Navarro and Susin (Navarro & Susin 2006) presented an FEM cloth simulation
implemented on the GPU, which successfully detected cloth collisions and self-collision by
using image-based collision methods. The visual results showed that the GPU-based approach was
more efficient than two conventional methods.
In order to turn a large nonlinear optimization problem into a GPU-suitable process, Hillesland
et al. (Hillesland et al. 2005) developed a framework for building image-based models on
graphics hardware, taking advantage of minimal storage overhead and the absence of a resampling
step. Klöckner et al. (Klöckner et al. 2009) discussed the implementation of discontinuous
Galerkin methods on the GPU and applied the implementation to Maxwell's equations, while
Komatitsch and Michéa et al. (Komatitsch et al. 2009) discussed the assembly of high-order
continuous Galerkin methods using a colouring scheme to handle the summation operations over
nodes. The performance results showed that a maximum speedup of 25 could be obtained for a
seismic wave propagation problem.
6.2.2 Stiffness Matrix Solving
After the assembly of the stiffness matrix and generation of the linear equation system KU=F,
the next step is to solve the equations for the unknowns U, which in the field of structural
engineering normally contain the nodal displacements. Direct or iterative solvers are used to
solve the sparse linear equation system. Direct solvers, including Cholesky decomposition, QR
decomposition, Gauss elimination and LU decomposition (Meyer 1988), are extensively used for
dense matrices. Jung and O'Leary (Jung & O'Leary 2006) presented an efficient GPU
implementation of Cholesky decomposition for solving dense symmetric positive definite linear
systems. However, due to the lack of double precision support on the GPU, the interior point
algorithm presented did not converge as well as the double precision CPU implementation. Volkov
and Demmel (Volkov & Demmel 2008) demonstrated significant speedups by using the GPU in the
three most widely used factorizations in dense linear algebra, namely LU factorization, QR
factorization and Cholesky factorization. Ino et al. (Ino et al. 2005) introduced a GPU
implementation of LU decomposition, and concluded that numerical errors invalidated the
GPU implementation of LU decomposition because of the lack of double-precision support.
With the aid of the GPU, Göddeke et al. (Göddeke et al. 2005) developed a mixed precision
defect correction approach to achieve double precision accuracy in finite element simulation,
while still exhibiting improved performance compared to a double precision CPU solver. Since
then, problems arising from the lack of precision have been overcome by the emergence of GPUs
supporting double precision. Galoppo et al. (Galoppo et al. 2005) presented a novel GPU-based
LU factorization for solving dense linear systems by reducing the problem to a series of
rasterization problems on the GPU. The data representations were chosen to match the blocked
rasterization order and cache pre-fetch technology of the GPU, and the results showed that this
GPU-based LU factorization uses cache and bandwidth more efficiently than conventional
approaches. In 2007, based on the iterative refinement algorithm (Buttari et al. 2007),
Barrachina et al. (Barrachina et al. 2008) developed a new padding and hybrid GPU-CPU iterative
refinement algorithm which obtains full accuracy in the solution of dense linear systems.
Research reported in several papers (Tomov et al. 2010; Baboulin & Volkov 2008; Tomov et al.
2009) has introduced a GPU-based dense linear algebra library, Matrix Algebra on GPU and
Multicore Architectures (MAGMA), which is similar to the LAPACK library but targets hybrid
architectures. Besides MAGMA, other libraries are also available, such as CUDAZTEC (Neckels)
and GPUmatrix (Bonneel), which provide solvers for GMRES and for conventional decompositions
(LU, QR and Cholesky) respectively.
As for iterative solvers, the Jacobi method, the Gauss-Seidel method, the conjugate gradient
method (CG), multigrid methods and the preconditioned conjugate gradient method (PCG) are all
widely used. The Jacobi method is easily derived by examining each of the n equations in the
linear system Ax=b in isolation, and works well if the system is dominated by its diagonal
elements. Because the equations are treated independently, the Jacobi method is ideally suited
to parallel programming, where all the equations can be solved concurrently. However, although
it is very easy to understand and implement, its convergence rate is slow, and so it is not a
common first choice (Bathe & Wilson 1976; Demmel 1997).
While following a similar process to the Jacobi method, the Gauss-Seidel method examines the
equations one at a time in sequence and uses updated values as soon as they become available
within the current sweep. Generally speaking, if the Jacobi method converges, the Gauss-Seidel
method will converge faster, though still relatively slowly (Saad 2003; Demmel 1997).
In contrast, the Conjugate Gradient (CG) method is well known for its efficiency in solving
symmetric positive definite systems. As its name suggests, the method generates a sequence of
mutually orthogonal residuals together with conjugate search directions constructed from those
residuals; CG is in fact a special case of the method of Conjugate Directions (Shewchuk 1994).
It is the most effective method for solving symmetric positive definite equations. Furthermore,
because there is little data dependency in the algorithm, no significant restructuring is
needed to run it in a parallel environment: only the matrix-vector product, the parallel
reduction, the two vector updates and the inner product routines need to be parallelised. In
structural engineering the coefficient matrix is sparse, symmetric and positive definite, and
so CG is the method of choice for solving such problems (Shewchuk 1994; Demmel 1997).
Importantly, the convergence rate of iterative methods depends greatly on the spectrum of the
coefficient matrix. One way to increase the convergence rate is therefore to transform the
system of linear equations into one with the same solution set but a more favourable spectrum.
A preconditioner is a matrix which performs such a transformation.
As explained above, CG is ideal for solving the stiffness equations, as the stiffness matrix is
symmetric positive definite. In this work the preconditioned conjugate gradient method (PCG) is
chosen, because in PCG the preconditioned coefficient matrix has a more favourable spectrum,
leading to a higher convergence rate (Shewchuk 1994).
Simply put, PCG is the same as CG except that it applies a preconditioner to increase the
convergence speed. The simplest preconditioner is a diagonal matrix whose diagonal elements are
the same as those of the original coefficient matrix. Applying this preconditioner is known as
diagonal preconditioning or Jacobi preconditioning, and the diagonally preconditioned conjugate
gradient method (Shewchuk 1994) is used in the work presented here.
In order to optimise the solution of the linear equation system, a significant amount of
research on the application of GPUs has used conjugate gradient (Wiggers et al. 2007; Bolz et
al. 2003; Krüger & Westermann 2003) and multigrid techniques (Göddeke et al. 2008; Göddeke et
al. 2005). As for the preconditioned conjugate gradient method (PCG) used in this work, Buatois
et al. (Buatois, Caumon & Levy 2009) developed a general sparse linear solver called the
Concurrent Number Cruncher (CNC), based on PCG with the block compressed row storage (BCRS)
format for the matrices. This solver proved efficient, but only for general sparse matrices, as
it results in non-optimal global memory access. In addition to single GPU implementations, a
number of researchers (Playne & Hawick 2010; Cevahir et al. 2009; Cevahir, Nukada & Matsuoka
2010a; Ament et al. 2010) have described multiple-GPU implementations of the conjugate gradient
method with good accuracy and performance. In all of the above iterative solvers, the sparse
matrix-vector multiplication (SpMV) is of particular importance: it is the most time consuming
part and the bottleneck for acceleration of these solvers.
In fact SpMV is one of the most important computational operations in sparse matrix
computation, and is used extensively in the iterative methods for solving large linear equation
systems (Ax=b) and eigenvalue problems (Ax=λx), where many matrix-vector products are required
to reach convergence. Because of its importance, there is a large body of literature concerning
SpMV operations on the GPU. Garland (Garland 2008) explored the application of GPUs to general
SpMV using the compressed sparse row (CSR) representation for general unstructured sparse
matrices, and proposed scan (data-parallel) primitives which convert seemingly irregular
computation into regular computation that can be implemented on massively parallel hardware
such as GPUs. Sengupta et al. (Sengupta et al. 2007) applied segmented scan to SpMV, and
concluded that the scan primitives are an excellent match for a broad set of problems on
parallel hardware, and specifically on the GPU. Shahnaz and Usman (Shahnaz & Usman 2007)
developed an efficient sparse matrix-vector multiplication approach in which the matrices are
stored in a Transposed Jagged Diagonal Storage (TJDS) format. This format is particularly
suitable for parallel and distributed processing because the references to the non-zero values
of the matrix are preserved by the data partition scheme. Bell and Garland (Bell & Garland
2009) presented a GPU implementation of SpMV for various matrix storage formats, and designed a
hybrid (HYB) format for matrix storage which has proved to be one of the fastest formats for
unstructured matrices. They also compared the performance of SpMV for various sparse matrix
storage schemes and various patterns of sparse matrices (Bell & Garland 2009; Bell & Garland
2008). In the work reported here the Compressed Sparse Row (CSR) storage format is used, as it
is a popular format which is easy to implement and widely used (Garland 2008).
6.3 Summary
This chapter introduced the basic concepts of GPUs and reviewed the literature about the
implementation of FEM on GPUs, including both stiffness matrix assembly and stiffness matrix
solving. Much of the literature highlights the important role of the SpMV operation in
determining the computing speed of FEM problems.
In order to optimize the new MLEFEA approach for concrete design, the work presented here
used a GPU-based PCG approach to solve the FEM stiffness matrix equations. The
implementation of this approach will be described in the following chapter.
120
7 Efficient GPU Implementation of the Modified LEFEA
Approach
This chapter starts with an introduction to the CSR storage format used in the GPU
implementation of the Preconditioned Conjugate Gradient approach (GPU-PCG). Taking the deep
beam with rectangular openings as an example, a comparison between CPU-PCG and GPU-PCG is
presented, and the results show that the GPU algorithm is more effective than the CPU one in
terms of computing time. In addition, the GPU implementation of the MLEFEA (GPU-MLEFEA) is also
developed.
7.1 GPU Implementation of Preconditioned Conjugate Gradient
Method (GPU-PCG)
The stiffness matrix associated with the finite element method is not only symmetric and
positive definite, but is also banded, with many zero elements both inside the band and
outside the band. The bandwidth of the stiffness matrix depends on the numbering and
connectivity of the nodes. Since the stiffness matrix is a square matrix with dimension equal to
the number of degrees of freedom in the model, as the number of nodes and elements
increases the amount of storage required for the stiffness matrix increases significantly. To
mitigate this problem to some extent, finite element programs usually use bandwidth or
skyline storage approaches, taking advantage of the symmetry of the matrix. The effectiveness
of both of these approaches depends on the sequence of node numbering, and various
routines have been developed to renumber nodes to try to minimise the storage required.
Both the bandwidth and skyline storage approaches allow the stiffness matrix to be reduced in
place, and so are suited to direct solvers.
To permit effective GPU implementation of the Preconditioned Conjugate Gradient method
(GPU-PCG), the Compressed Sparse Row (CSR) storage format is used to store the stiffness
matrix. Compared to the storage formats mentioned above, there is no need to store any zero
elements or to renumber the nodes to reduce the bandwidth or skyline of the matrix. Therefore,
CSR saves memory space and avoids unnecessary calculations, resulting in a faster algorithm. As
will be seen below, the CSR storage format is ideally suited to parallelisation of the SpMV
operation used extensively in indirect solvers. It is not suitable for direct solvers, as the
stiffness matrix cannot be reduced in place.
To store a sparse stiffness matrix in CSR format, three arrays (here named elem, rowptr and col)
are needed. Zero based indexing is used. The one-dimensional double precision elem array
stores all the non-zero elements in the matrix in row-major order. The rowptr and col arrays
are index or pointer arrays that allow the elements in elem to be accessed according to their
position in the original stiffness matrix. The one-dimensional integer rowptr array stores the
position of the first non-zero element for each row in the array elem, and is of dimension
(number of rows + 1), so the first element will be zero while the last element will be the
number of non-zero elements in the stiffness matrix. The one-dimensional integer array col
stores the column index for every non-zero element in the same order as the non-zero
elements are arranged in elem, and so is of the same dimension as elem. As an example, the
CSR representation for a sparse matrix K is illustrated in Figure 73.
Figure 73: CSR Representation for a Sparse Matrix K
Because each element of the vector resulting from the product of a matrix and a column vector
depends only on the corresponding row of the matrix, and so can be calculated independently of
all the other rows, the SpMV for the CSR format is
very easy to parallelise. Figure 74 presents the SpMV parallel kernel for the sparse matrix in
CSR format. Note that the kernel describes the operation of each thread. Each thread works on
one particular row, and there will be one thread created for each row. The THREAD_ID
indicates the number of the thread and hence the number of the row. The GPU or GPUs can
execute as many threads in parallel as the hardware allows.
Figure 74: SpMV Kernel for the Sparse Matrix in CSR Format
7.2 GPU Implementation of Modified LEFEA Approach (GPU-
MLEFEA)
In this section the GPU implementation of the Modified Linear Elastic Finite Element Analysis
(MLEFEA) method is introduced.
The MLEFEA proposed in this work involves a lot of iterations in which most of the finite
element model (and hence stiffness matrix) stays the same, but there is some modification of
the elastic modulus in regions where stress redistribution is required. If a direct solver is used,
the complete stiffness matrix is factorised or reduced in place, and so there is no way to take
advantage of the similarity of one iteration to the next. However, when an indirect solver is
used, the stiffness matrix is preserved between iterations. The solution from one iteration can
also be used as a starting point for the iterative solution of the next iteration.
As a result, the program written for this work stores all the stiffness matrix data generated
in the first iteration. During the elastic modulus adjustment process in the following
iterations, the areas which are affected, and where modifications of the elastic modulus are
required, are identified. Then only the stiffness contributions for the affected areas need to
be reassembled, while the previously stored stiffness matrix data are used directly for the
unaffected areas.
In addition to the acceleration of the matrix solution by the parallel GPU-PCG approach and the
advantage taken of the similarity of one iteration to the next, further speedup is obtained by
assembling the stiffness matrix directly in CSR format, so that there is no need to convert the
stiffness matrix from dense format to CSR. Most importantly, no matter how many iterations the
MLEFEA approach takes, the index or pointer arrays of the CSR stiffness matrix representation
(rowptr and col) remain constant and do not have to be recomputed, as there is no change in the
structure of the finite element mesh and the same elements remain zero. Only the elem array of
the CSR representation changes during the iteration steps, and even then only within the
affected areas.
A complete listing of the program is provided in Appendix A.
7.3 Results Comparison (Speedup Results)
To demonstrate the efficiency of the GPU implementation of the MLEFEA and the GPU-PCG for
solving stiffness equations Ku=f, the coefficient matrix produced from the case of the deep
beam with two web openings (Figure 21) is used.
Different mesh sizes are used to generate different sets of coefficient matrices. In this work,
mesh sizes of 50mm, 25mm, 12.5mm and 10mm were used, giving coefficient matrices of size 3236
by 3236, 12552 by 12552, 49424 by 49424 and 76980 by 76980 respectively. For comparison, a
plain sequential PCG solution of the sparse matrix equation system is also performed. The
results in terms of computing speed for the GPU-based PCG and the CPU (sequential) PCG
algorithms are shown in Table 11 and Figure 75. Both the sequential PCG and the parallel
GPU-PCG codes were executed one hundred times, and the average time is taken as the final
result, giving a reliable elapsed time.
Table 11: Comparison between GPU-PCG and CPU-PCG

Mesh Size   Matrix Size     Non-zero Elements   CPU Time (ms)   GPU Time (ms)   Error   Speedup
50mm        3236 x 3236     45924               180             109             0.01%   1.65
25mm        12552 x 12552   191592              1485            350             0.01%   4.24
12.5mm      49424 x 49424   737684              11680           1842            0.02%   6.34
10mm        76980 x 76980   1042168             21862           2772            0.03%   7.89

Times are for the equation solving part only. The errors are from comparison with the Matlab results, and are the same for the CPU and GPU codes.
Figure 75: GPU-PCG vs. CPU-PCG
From Table 11 and Figure 75, the efficiency of the GPU-based PCG in terms of computing
speed can be observed, with the same high precision being maintained.
7.4 Summary
This chapter introduces the basic theory and reviews the relevant literature about the GPU and
its use in the area of finite element analysis. In order to optimize the MLEFEA approach, a GPU-
based PCG algorithm is used to handle the most time-consuming SpMV for solving the stiffness
matrix. Finally, the efficiency of the GPU implementation is demonstrated by providing speed
comparison results between the CPU and GPU algorithm for stress redistribution for the
example of a deep beam with web openings.
8 Conclusions
This thesis develops a new approach to the use of linear finite element analysis in the design of
reinforced concrete members. The new approach involves the use of modified linear elastic
finite element analysis (MLEFEA), in which stress is redistributed within the member through
selective modification of the elastic modulus. This approach is similar to the use of moment
redistribution in the design of continuous reinforced concrete beams. Based on the lower bound
theory of plasticity, the approach works on the basis of identifying a stress field which is in
internal equilibrium, in equilibrium with the applied design load, and does not exceed the
specified maximum stress anywhere.
Compared with using standard LEFEA, the new approach overcomes the problems associated
with stress singularities and concentrations. Using LEFEA, stress singularities at features such
as re-entrant corners lead to stresses which are mesh dependent, increasing without bound as
the mesh is refined. Ignoring these stresses or using a coarse mesh for design leads to a risk
that the stress field used for design is not in internal equilibrium. By redistributing the stress
and ensuring design is done with a stress field which is in internal and external equilibrium, the
designer can have confidence in the ability of the design to carry the required load. In addition,
a technique was developed to redistribute tensile stress to ensure the resultant tension is at a
prescribed location. This allows the designer to position the reinforcing steel at the optimum
position (e.g. as close as possible to the bottom of a deep beam), which is not possible with
standard LEFEA.
Compared with the strut-tie method, the new approach takes into account the load carrying
capability of all the concrete, not just that in the struts. The examples included in this thesis
show that up to 27% of concrete volume can be saved in the construction of non-flexural
members by using the new approach in preference to the strut-tie method. Such a saving can
significantly reduce the environmental impact of concrete construction. The new approach is
also less time consuming and more designer independent, as a suitable strut-and-tie model does
not have to be chosen.
Three different types of simple structures, shallow (flexural) beams, deep beams, and deep
beams with web openings, were examined using both conventional design approaches and the
MLEFEA approach. Cost comparison in terms of concrete and steel usage showed that the new
method is more efficient than currently available techniques, particularly for non-flexural
members.
In addition, with the help of ABAQUS, preliminary numerical tests of the proposed designs
have been performed using non-linear finite element analysis. The results showed that designs
based on this new approach are safe and reasonable. As the stress field resulting from this new
approach is statically admissible and able to satisfy the yield condition, designs generated
using this method will reach or exceed the ultimate design load, in accordance with the lower
bound theory of plasticity.
To further optimize the MLEFEA approach in terms of computing time, the use of GPUs was
also explored. The use of GPUs in finite element analysis was examined through a literature
review, and a GPU-based PCG algorithm selected for use in the MLEFEA. The GPU-based PCG
algorithm using the CSR format to store the stiffness matrix was shown to be an efficient way
to solve the stiffness matrix, compared to the CPU-based algorithm. Furthermore, combined
with the GPU-based PCG approach for solving the stiffness matrix, the overall implementation
of the MLEFEA on GPUs was developed, including stiffness matrix assembly directly in CSR
format, and only reassembling those parts of the stiffness matrix that have changed within
each iteration.
To ensure that the concrete savings predicted in this thesis can be obtained in practice, future
work in this area should include conducting full scale experimental tests of typical structural
components designed with the new method. These tests should demonstrate that the
components reach the desired design performance with the use of less concrete than a
conventional design. Without such work, practising engineers may be reluctant to adopt the
proposed approach.
References
Ament, M, Knittel, G, Weiskopf, D & Strasser, W 2010, 'A Parallel Preconditioned Conjugate
Gradient Solver for the Poisson Problem on a Multi-GPU Platform', in 18th Euromicro
International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 583-
592.
Ashour, AF 1997, Tests of reinforced concrete continuous deep beams, vol. 94, American
Concrete Institute, Farmington Hills, MI, USA.
Au, FTK & Bai, ZZ 2007, 'Two-dimensional nonlinear finite element analysis of monotonically
and non-reversed cyclically loaded RC beams', Engineering Structures, vol. 29, no. 11, pp. 2921-
2934.
Augarde, CE & Deeks, AJ 2008, 'The use of Timoshenko's exact solution for a cantilever beam in
adaptive analysis', Finite elements in analysis and design., vol. 44, no. 9-10, pp. 595-601.
Barber, JR 2002, Elasticity, 2nd edn, Kluwer Academic, Dordrecht, London.
Barrachina, S, Castillo, M, Igual, FD, Mayo, R & Quintana-Orti, ES 2008, 'Solving Dense Linear
Systems on Graphics Processors', Proceedings of the 14th international Euro-Par conference on
Parallel Processing.
Bathe, KJ & Wilson, EL 1976, Numerical methods in finite element analysis, Prentice-Hall,
Stanford.
Baumann, T 1972, 'Zur Frage der Netzbewehrung von Flächentragwerken (On the Problem of
Net Reinforcement of Surface Structures)', Bauingenieur, vol. 47, no. 10, pp. 367-377.
Bell, N & Garland, M 2008, Efficient Sparse Matrix-Vector Multiplication on CUDA, NVIDIA
Corporation.
Bell, N & Garland, M 2009, 'Implementing sparse matrix-vector multiplication on throughput-
oriented processors', Proceedings of the Conference on High Performance Computing
Networking, Storage and Analysis.
Bolz, J, Farmer, I, Grinspun, E & Schroder, P 2003, 'Sparse matrix solvers on the GPU: conjugate
gradients and multigrid', ACM Trans. Graph., vol. 22, no. 3, pp. 917-924.
Bonneel, N, GPUMatrix. Available from: <http://sourceforge.net/projects/gpumatrix/>.
Bruggi, M 2009, 'Generating strut-and-tie patterns for reinforced concrete structures using
topology optimization', Computers & Structures, vol. 87, no. 23-24, pp. 1483-1495.
Buatois, L, Caumon, G & Levy, B 2009, 'Concurrent number cruncher: a GPU implementation of
a general sparse linear solver', Int. J. Parallel Emerg. Distrib. Syst., vol. 24, no. 3, pp. 205-223.
Buttari, A, Dongarra, J, Langou, J, Langou, J, Luszczek, P & Kurzak, J 2007, 'Mixed Precision
Iterative Refinement Techniques for the Solution of Dense Linear Systems', Int. J. High Perform.
Comput. Appl., vol. 21, no. 4, pp. 457-466.
Carmo, D, Ricardo, NF & Lopes, SMR 2005, 'Ductility and linear analysis with moment
redistribution in reinforced high-strength concrete beams', Canadian Journal of Civil
Engineering, vol. 32, pp. 194-203.
Cecka, C, Lew, AJ & Darve, E 2011, 'Assembly of finite element methods on graphics
processors', International Journal for Numerical Methods in Engineering, vol. 85, no. 5, pp.
640-669.
Cevahir, A, Nukada, A & Matsuoka, S 2009, 'Fast Conjugate Gradients with Multiple GPUs', in
Computational Science – ICCS 2009, vol. 5544, eds G Allen, J Nabrzyski, E Seidel, G van Albada, J
Dongarra & P Sloot, Springer Berlin / Heidelberg, pp. 893-903.
Cevahir, A, Nukada, A & Matsuoka, S 2010b, 'High performance conjugate gradient solver on
multi-GPU clusters using hypergraph partitioning', Computer Science - Research and
Development, vol. 25, no. 1, pp. 83-91.
Chidgzey, SR & Deeks, AJ 2005, 'Determination of coefficients of crack tip asymptotic fields
using the scaled boundary finite element method', Engineering Fracture Mechanics, vol. 72, no.
13, pp. 2019-2036.
Dabbagh, H & Foster, SJ 2006, 'A Smeared-Fixed Crack Model for FE Analysis of RC
Membranes Incorporating Aggregate Interlock', Advances in Structural Engineering, vol. 9, no.
1, pp. 91-102.
Deeks, AJ 2008, 'The pursuit of accuracy in computational mechanics'.
Deeks, AJ & Wolf, JP 2002a, 'An h-hierarchical adaptive procedure for the scaled boundary
finite-element method', International Journal for Numerical Methods in Engineering, vol. 54,
no. 4, pp. 585-605.
Deeks, AJ & Wolf, JP 2002b, 'Stress recovery and error estimation for the scaled boundary
finite-element method', International Journal for Numerical Methods in Engineering, vol. 54,
no. 4, pp. 557-583.
Deeks, AJ & Wolf, JP 2002c, 'A virtual work derivation of the scaled boundary finite-element
method for elastostatics', Computational Mechanics, vol. 28, no. 6, pp. 489-504.
Demmel, JW 1997, Applied Numerical Linear Algebra, Society for Industrial and Applied
Mathematics.
Elsen, E, LeGresley, P & Darve, E 2008, 'Large calculation of the flow over a hypersonic vehicle
using a GPU', J. Comput. Phys., vol. 227, no. 24, pp. 10148-10161.
Foster, SJ 1998, 'Design of non-flexural members for shear', Cement and Concrete Composites,
vol. 20, no. 6, pp. 465-475.
Foster, SJ, Marti, P & Mojsilovic, N 2003, 'Design of Reinforced Concrete Solids Using Stress
Analysis', ACI Structural Journal, vol. 100, no. 6, pp. 758-764.
Frier, C & Damkilde, L 2009, 'Lower Bound Limit State Analysis using the Interior-Point Method
with Spatial Varying Barrier Function', in Proceedings of the Twenty Second Nordic Seminar on
Computational Mechanics, Aalborg University, pp. 173-176.
Göddeke, D, Strzodka, R, Jamaludin, MY, McCormick, P, Wobker, H, Becker, C & Turek, S 2008,
'Using GPUs to improve multigrid solver performance on a cluster', Int. J. Comput. Sci. Eng., vol.
4, no. 1, pp. 36-55.
Göddeke, D, Strzodka, R & Turek, S 2005, 'Accelerating Double Precision FEM Simulations with
GPUs', Proceedings of ASIM 2005 - 18th Symposium on Simulation Technique.
Galoppo, N, Govindaraju, NK, Henson, M & Manocha, D 2005, 'LU-GPU: Efficient Algorithms for
Solving Dense Linear Systems on Graphics Hardware', in Supercomputing, 2005. Proceedings of
the ACM/IEEE SC 2005 Conference, pp. 3-3.
Garland, M 2008, 'Sparse matrix computations on manycore GPU's', Proceedings of the 45th
annual Design Automation Conference.
Göddeke, D, Buijssen, SHM, Wobker, H & Turek, S 2009, 'GPU acceleration of an unmodified
parallel finite element Navier-Stokes solver', in High Performance Computing & Simulation,
2009. HPCS '09. International Conference on, pp. 12-21.
Guan, H 2005, 'Effect of Sizes and Positions of Web Openings on Strut-and-Tie Model of Deep
Beams', Advances in Structural Engineering, vol. 8, no. 1, pp. 69-84.
Hillesland, KE, Molinov, S & Grzeszczuk, R 2005, 'Nonlinear optimization framework for image-
based modeling on programmable graphics hardware', ACM SIGGRAPH 2005 Courses.
Hoque, MM 2006, 3D nonlinear mixed finite-element analysis of RC beams and plates with and
without FRP reinforcement, thesis, University of Manitoba.
Hu, OE & Tan, KH 2007, Large reinforced-concrete deep beams with web openings: test and
strut-and-tie results, vol. 59, Telford, London, UK, p. 12.
Huebner, KH, Thornton, EA & Byrom, TG 1995, The finite element method for engineers, Wiley,
New York.
Ino, F, Matsui, M, Goda, K & Hagihara, K 2005, 'Performance Study of LU Decomposition on
the Programmable GPU', in Proceedings of HiPC 2005, pp. 83-94.
Jakobsen, B 1994, 'The Sleipner accident and its causes', Engineering Failure Analysis, vol. 1, no.
3, pp. 193-199.
Jung, JH & O'Leary, DP 2006, 'Cholesky Decomposition and Linear Programming on a GPU', in
Proceedings of Workshop on Edge Computing Using New Commodity Architectures (EDGE),
Chapel Hill, NC.
Kirk, D & Hwu, W 2010, Programming massively parallel processors: a hands-on approach,
Morgan Kaufmann Publishers.
Klöckner, A, Warburton, T, Bridge, J & Hesthaven, JS 2009, 'Nodal discontinuous Galerkin
methods on graphics processors', J. Comput. Phys., vol. 228, no. 21, pp. 7863-7882.
Komatitsch, D, Michéa, D & Erlebacher, G 2009, 'Porting a high-order finite-element
earthquake modeling application to NVIDIA graphics cards using CUDA', Journal of Parallel and
Distributed Computing, vol. 69, no. 5, pp. 451-460.
Koopman, DCA & Lance, RH 1965, 'On linear programming and plastic limit analysis', Journal of
the Mechanics and Physics of Solids, vol. 13, no. 2, pp. 77-87.
Kotsovos, MD & Pavlovic, M 1995, Structural concrete: finite-element analysis for limit-state
design, Thomas Telford.
Krüger, J & Westermann, R 2003, 'Linear algebra operators for GPU implementation of
numerical algorithms', ACM Trans. Graph., vol. 22, no. 3, pp. 908-916.
Kupfer, H & Hilsdorf, HK 1969, 'Behavior of Concrete Under Biaxial Stresses', ACI Journal, vol.
66, no. 8, pp. 656-666.
Liang, QQ, Xie, YM & Grant, PS 2000, 'Topology optimization of strut-and-tie models in
reinforced concrete structures using an evolutionary procedure', ACI Structural Journal, vol. 97,
no. 2, pp. 322-330.
Logan, DL 2002, A first course in the finite element method, 3rd edn, Brooks/Cole, Pacific Grove,
CA.
Mörsch, E 1902, Der Eisenbetonbau - seine Theorie und Anwendung (Reinforced Concrete
Construction - Its Theory and Application), 5th edn, Verlag Konrad Wittwer, Stuttgart.
Baboulin, M, Dongarra, J, Tomov, S & Volkov, V 2008, 'Enhancing the Performance of Dense
Linear Algebra Solvers on GPUs', Poster at Supercomputing 2008.
Mansur, MA, Tan, KH & Weng, W 2001, 'Analysis of Reinforced Concrete Beams with Circular
Openings Using Strut-and-Tie Model', in Structural Engineering, Mechanics and Computation,
ed. A Zingoni, Elsevier Science, Oxford, pp. 311-318.
Meyer, A 1988, 'An efficient implementation of LU decomposition in C', Adv. Eng. Softw., vol.
10, no. 3, pp. 123-130.
Mosley, B, Bungey, J & Hulse, R 2007, Reinforced Concrete Design to Eurocode 2, 6th edn,
Palgrave Macmillan, New York.
Nagarajan, P & Pillai, TMM 2008, 'Development of strut and tie models for simply supported
deep beams using topology optimization', Songklanakarin Journal of Science and Technology,
vol. 30, no. 5, pp. 641-647.
Navarro, JR & Susin, A 2006, 'Non structured meshes for Cloth GPU simulation using FEM',
Workshop On Virtual Reality Interaction and Physical Simulation.
Neal, BG 1985, The plastic methods of structural analysis, Chapman and Hall.
Neckels, D, CUDAZTEC. Available from: <http://www.ohloh.net/p/cudaztec>.
NVIDIA Corporation, NVIDIA CUDA Programming Guide (Version 3.0). Available from:
<http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_Prog
rammingGuide.pdf>.
Oehlers, DJ, Ju, G, Liu, IST & Seracino, R 2004a, 'Moment redistribution in continuous plated RC
flexural members. Part 1: neutral axis depth approach and tests', Engineering Structures, vol.
26, no. 14, pp. 2197-2207.
Oehlers, DJ, Liu, IST, Ju, G & Seracino, R 2004b, 'Moment redistribution in continuous plated RC
flexural members. Part 2: Flexural rigidity approach', Engineering Structures, vol. 26, no. 14, pp.
2209-2218.
Park, S 2005, Analysis of FRP strengthened deep RC members using the STM and the FEM
approaches, thesis, Syracuse University.
Playne, DP & Hawick, KA 2010, 'Asynchronous Communication Schemes for Finite Difference
Methods on Multiple GPUs', in Cluster, Cloud and Grid Computing (CCGrid), 2010 10th
IEEE/ACM International Conference on, pp. 763-768.
Punmia, BC, Jain, AK & Jain, AK 2007, Limit State Design of Reinforced Concrete, 1st edn,
Laxmi Publications Ltd, New Delhi.
Rao, SS 1989, The finite element method in engineering, Pergamon, Oxford.
Rausch, E 1929, Berechnung des Eisenbetons gegen Verdrehung und Abscheren (Design of
reinforced concrete for torsion and shear), Julius Springer Verlag, Berlin.
Ritter, W 1899, 'Die Bauweise Hennebique (The Hennebique Method of Construction)',
Schweizerische Bauzeitung, vol. 33, no. 7, pp. 59-61.
Roy, S & Thiagarajan, G 2007, 'Nonlinear Finite-Element Analysis of Reinforced Concrete Bridge
Approach Slab', Journal of Bridge Engineering, vol. 12, no. 6, p. 6.
Saad, Y 2003, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied
Mathematics.
Schlaich, J, Schäfer, K & Jennewein, M 1987, 'Toward a Consistent Design of Structural
Concrete', PCI Journal, vol. 32, no. 3, pp. 74-150.
Scott, RH & Whittle, RT 2005, 'Moment redistribution effects in beams', Magazine of Concrete
Research, vol. 57, no. 1, pp. 9-20.
Sengupta, S, Harris, M, Zhang, Y & Owens, JD 2007, 'Scan primitives for GPU computing',
Proceedings of the 22nd ACM Siggraph/Eurographics symposium on Graphics hardware.
Shahnaz, R & Usman, A 2007, 'An Efficient Sparse Matrix-Vector Multiplication on Distributed
Memory Parallel Computers', International Journal of Computer Science and Network Security,
vol. 7, no. 1, pp. 77-82.
Shewchuk, JR 1994, An Introduction to the Conjugate Gradient Method Without the Agonizing
Pain, Carnegie Mellon University.
Sloan, SW 1989, 'Upper bound limit analysis using finite elements and linear programming',
International Journal for Numerical and Analytical Methods in Geomechanics, vol. 13, no. 3, pp.
263-282.
Standards Australia 2009, DR05252 Concrete Structures, Australian Standard, Australia.
Tan, KH, Kong, FK & Weng, LW 1997, 'High Strength Concrete Deep Beams Subjected to
Combined Top- and Bottom-Loading', The Structural Engineer, vol. 75, no. 11, pp. 191-197.
Tan, KH, Tang, CY & Tong, K 2003, 'A direct method for deep beams with web reinforcement',
Magazine of Concrete Research, vol. 55, no. 1, pp. 53-63.
Tan, KH, Tong, KB & Tang, CY 2003, 'Consistent strut-and-tie modelling of deep beams with
web openings', Magazine of Concrete Research, vol. 55, no. 1, pp. 65-75.
Tan, KH, Tong, K & Tang, CY 2001, 'Direct Strut-and-Tie Model for Prestressed Deep Beams',
Journal of Structural Engineering, ASCE, vol. 127, no. 9, pp. 1076-1084.
Tejada, E & Ertl, T 2005, 'Large steps in GPU-based deformable bodies simulation', Simulation
Modelling Practice and Theory, vol. 13, no. 8, pp. 703-715.
Timoshenko, S 1969, Theory of Elasticity, 3rd edn, McGraw-Hill, New York.
Tjhin, TN & Kuchma, DA 2007, 'Integrated analysis and design tool for the strut-and-tie
method', Engineering Structures, vol. 29, no. 11, pp. 3042-3052.
Tomov, S, Nath, R, Du, P & Dongarra, J, MAGMA Users' Guide version 0.2. Available from:
<http://icl.cs.utk.edu/magma/>.
Tomov, S, Nath, R, Ltaief, H & Dongarra, J 2010, 'Dense linear algebra solvers for multicore
with GPU accelerators', in Proceedings of IPDPS Workshops, pp. 1-8.
Varghese, PC 2004, Limit State Design of Reinforced Concrete, PHI Learning Pvt. Ltd.
Vecchio, FJ & Collins, MP 1986, 'The modified compression-field theory for reinforced concrete
elements subjected to shear', ACI Journal Proceedings, vol. 83, no. 2, pp. 219-231.
Volkov, V & Demmel, J 2008, LU, QR and Cholesky Factorizations using Vector Capabilities of
GPUs, UCB/EECS-2008-49, EECS Department, University of California, Berkeley.
Wang, G & Meng, S 2008, 'Modified strut-and-tie model for prestressed concrete deep beams',
Engineering Structures, vol. 30, no. 12, pp. 3489-3496.
Warner, RF, Foster, SJ & Kilpatrick, AE 2007, Reinforced Concrete Basics: Analysis and Design
of Reinforced Concrete Structures, Pearson Prentice Hall, Frenchs Forest, NSW.
Wiggers, WA, Bakker, V, Kokkeler, ABJ & Smit, GJM 2007, 'Implementing the conjugate
gradient algorithm on multi-core systems', International Symposium on System-on-Chip, SoC.
Wight, JK & Parra-Montesinos, GJ 2003, Strut-and-tie model for deep beam design, vol. 25,
American Concrete Institute, Farmington Hills, MI, USA, p. 8.
Williams, A 2009, 'Moment distribution methods', in Structural Analysis, Butterworth-
Heinemann, Boston, pp. 293-378.
Williams, ML 1952, 'Stress singularities resulting from various boundary conditions in angular
corners of plates in extension', Journal of Applied Mechanics, vol. 19, pp. 526-534.
Yang, K-H, Eun, H-C & Chung, H-S 2006, 'The influence of web openings on the structural
behavior of reinforced high-strength concrete deep beams', Engineering Structures, vol. 28, no.
13, pp. 1825-1834.
Yang, Z & Deeks, AJ 2007, 'Modelling cohesive crack growth using a two-step finite element-
scaled boundary finite element coupled method', International Journal of Fracture, vol. 143, no.
4, pp. 333-354.
Zhang, B, Xu, S, Zhang, F, Bi, Y & Huang, LQ 2011, 'Accelerating MatLab code using GPU: A
review of tools and strategies', in Artificial Intelligence, Management Science and Electronic
Commerce (AIMSEC), 2011 2nd International Conference on, pp. 1875-1878.
Zhang, N & Tan, KH 2007a, 'Direct strut-and-tie model for single span and continuous deep
beams', Engineering Structures, vol. 29, no. 11, pp. 2987-3001.
Zhang, N & Tan, KH 2007b, 'Size effect in RC deep beams: Experimental investigation and STM
verification', Engineering Structures, vol. 29, no. 12, pp. 3241-3254.
Zhu, JZ, Hinton, E & Zienkiewicz, OC 1991, 'Adaptive finite element analysis with quadrilaterals',
Computers & Structures, vol. 40, no. 5, pp. 1097-1104.
Zienkiewicz, OC & Zhu, JZ 1992, 'The superconvergent patch recovery (SPR) and adaptive finite
element refinement', Comput. Methods Appl. Mech. Eng., vol. 101, no. 1-3, pp. 207-224.
Appendices
Appendix A: Complete Listing of the Program
//Important Header Files//
//*******Function to Calculate Jacobian Matrix during FEM*******************//
void Jacobian4N(double *st,int **nc, double **J)
{
double s=st[0];
double t=st[1];
double *dNds, *dNdt;
int *temp1, *temp2;
dNds= (double * )malloc(4*sizeof(double));
vector_zero (4, dNds);
dNdt= (double * )malloc(4*sizeof(double));
vector_zero(4,dNdt);
temp1=(int * )malloc(4*sizeof(int));
vector_zero_int(4,temp1);
temp2= (int * )malloc(4*sizeof(int));
vector_zero_int(4,temp2);
dNds[0]=0.25*(-1+t); dNds[1]=0.25*(1-t);
dNds[2]=0.25*(1+t); dNds[3]=0.25*(-1-t);
dNdt[0]=0.25*(-1+s); dNdt[1]=0.25*(-1-s);
dNdt[2]=0.25*(1+s); dNdt[3]=0.25*(1-s);
temp1[0]=nc[0][0]; temp1[1]=nc[1][0];
temp1[2]=nc[2][0]; temp1[3]=nc[3][0];
temp2[0]=nc[0][1]; temp2[1]=nc[1][1];
temp2[2]=nc[2][1]; temp2[3]=nc[3][1];
J[0][0]=MxMulMx(4,dNds,temp1);
J[0][1]=MxMulMx(4,dNds,temp2);
J[1][0]=MxMulMx(4,dNdt,temp1);
J[1][1]=MxMulMx(4,dNdt,temp2);
//free the memory //
free(dNds);
free(dNdt);
free(temp1);
free(temp2);
}
//****************Function to Calculate Local Stiffness Matrix*******************//
void Element4NStiffness(int **nc,double thickness, double **D, double **Ke)
{
int i,j,k;
double **B, **gp, **TransB, **J, **invJ, **dNdxy, **temp2, **temp3, **temp4;
double s, t, det;
double *dNds, *dNdt, *temp, *wt;
//initialize the array //
B= (double **) malloc(3 * sizeof(double *));
for(int i = 0; i < 3; i++)
B[i] = (double * )malloc(8* sizeof(double));
matrix_zero (3, 8, B);
matrix_zero (8, 8, Ke);
gp= (double **) malloc(4 * sizeof(double *));
for(int i = 0; i < 4; i++)
gp[i] = (double * )malloc(2* sizeof(double));
matrix_zero (4, 2, gp);
TransB= (double **) malloc(8 * sizeof(double *));
for(int i = 0; i < 8; i++)
TransB[i] = (double * )malloc(3* sizeof(double));
matrix_zero (8, 3, TransB);
J= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
J[i] = (double * )malloc(2* sizeof(double));
invJ= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
invJ[i] = (double * )malloc(2* sizeof(double));
matrix_zero (2, 2, invJ);
dNdxy= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
dNdxy[i] = (double * )malloc(4* sizeof(double));
matrix_zero (2, 4, dNdxy);
temp2= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
temp2[i] = (double * )malloc(4* sizeof(double));
matrix_zero (2, 4, temp2);
temp3= (double **) malloc(8 * sizeof(double *));
for(int i = 0; i < 8; i++)
temp3[i] = (double * )malloc(3* sizeof(double));
matrix_zero (8, 3, temp3);
temp4= (double **) malloc(8 * sizeof(double *));
for(int i = 0; i < 8; i++)
temp4[i] = (double * )malloc(8* sizeof(double));
matrix_zero (8, 8, temp4);
/*initialize the vector */
dNds= (double * )malloc(4*sizeof(double));
vector_zero (4, dNds);
dNdt= (double * )malloc(4*sizeof(double));
vector_zero (4, dNdt);
temp= (double * )malloc(2*sizeof(double));
vector_zero (2, temp);
wt= (double * )malloc(4*sizeof(double));
vector_zero (4, wt);
gp[0][0]=-1.0/sqrt(3.0); gp[0][1]=-1.0/sqrt(3.0);
gp[1][0]=-1.0/sqrt(3.0); gp[1][1]=1.0/sqrt(3.0);
gp[2][0]=1.0/sqrt(3.0); gp[2][1]=-1.0/sqrt(3.0);
gp[3][0]=1.0/sqrt(3.0); gp[3][1]=1.0/sqrt(3.0);
wt[0]=1; wt[1]=1; wt[2]=1; wt[3]=1;
for(k=0;k<4;k++)
{
s=gp[k][0];
t=gp[k][1];
dNds[0]=0.25*(-1+t); dNds[1]=0.25*(1-t);
dNds[2]=0.25*(1+t); dNds[3]=0.25*(-1-t);
dNdt[0]=0.25*(-1+s); dNdt[1]=0.25*(-1-s);
dNdt[2]=0.25*(1+s); dNdt[3]=0.25*(1-s);
temp[0]=gp[k][0];
temp[1]=gp[k][1];
Jacobian4N(temp,nc,J);
MxInv(2, J, invJ);
for(i=0;i<4;i++)
{
temp2[0][i]=dNds[i];
temp2[1][i]=dNdt[i];
}
MxMulMx_Multi(2, 2, 4, invJ, temp2, dNdxy);
for(i=0;i<4;i++)
{
B[0][2*i]=dNdxy[0][i]; B[0][2*i+1]=0;
B[1][2*i]=0; B[1][2*i+1]=dNdxy[1][i];
B[2][2*i]=dNdxy[1][i]; B[2][2*i+1]=dNdxy[0][i];
}
det=MxDet2(J);
MxTrans(3, 8, B, TransB);
MxMulMx_Multi(8, 3, 3, TransB, D, temp3);
MxMulMx_Multi(8, 3, 8, temp3, B, temp4);
for (i=0;i<8;i++)
for(j=0;j<8;j++)
Ke[i][j]=Ke[i][j]+wt[k]*temp4[i][j]*thickness*det;
}
//free the memory //
for(i=0;i<3;i++)
free(B[i]);
free(B);
for(i=0;i<4;i++)
free(gp[i]);
free(gp);
for(i=0;i<8;i++)
free(TransB[i]);
free(TransB);
for(i=0;i<2;i++)
free(invJ[i]);
free(invJ);
for(i=0;i<2;i++)
free(dNdxy[i]);
free(dNdxy);
for(i=0;i<2;i++)
free(temp2[i]);
free(temp2);
for(i=0;i<8;i++)
free(temp3[i]);
free(temp3);
for(i=0;i<8;i++)
free(temp4[i]);
free(temp4);
free(dNds);
free(dNdt);
free(temp);
free(wt);
}
//**********************SpMV Serial Code on CPU*****************************//
void spmv_csr_serial(int num_rows, int *ptr, int *indices, double *data, double *x, double *y)
{
for (int row=0;row<num_rows;row++)
{
double dot=0.0;
int row_start= ptr[row];
int row_end = ptr[row+1];
for(int jj=row_start;jj<row_end;jj++)
dot+=data[jj]*x[indices[jj]];
y[row] +=dot;
}
}
//*******************Function to Calculate Local Stress Matrix*********************//
void Element4NStress(double **st, double **nc,double *u, double **D, double **sx)
{
int i, k;
double **invJ, **J, **temp, **dNdxy, **B, **temp3;
double *dNds, *dNdt, *temp4;
int *temp1, *temp2;
double det, s, t;
//initialize the array //
invJ= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
invJ[i] = (double * )malloc(2* sizeof(double));
matrix_zero (2, 2, invJ);
J= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
J[i] = (double * )malloc(2* sizeof(double));
matrix_zero (2, 2, J);
B= (double **) malloc(3 * sizeof(double *));
for(int i = 0; i < 3; i++)
B[i] = (double * )malloc(8* sizeof(double));
matrix_zero (3, 8, B);
dNdxy= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
dNdxy[i] = (double * )malloc(4* sizeof(double));
matrix_zero (2, 4, dNdxy);
temp= (double **) malloc(2 * sizeof(double *));
for(int i = 0; i < 2; i++)
temp[i] = (double * )malloc(4* sizeof(double));
matrix_zero (2, 4, temp);
temp3= (double **) malloc(3 * sizeof(double *));
for(int i = 0; i < 3; i++)
temp3[i] = (double * )malloc(8* sizeof(double));
matrix_zero (3, 8, temp3);
/*initialize the vector */
temp1= (int * )malloc(4*sizeof(int));
vector_zero_int (4, temp1);
temp2= (int * )malloc(4*sizeof(int));
vector_zero_int (4, temp2);
temp4= (double * )malloc(3*sizeof(double));
vector_zero (3, temp4);
dNds= (double * )malloc(4*sizeof(double));
vector_zero (4, dNds);
dNdt= (double * )malloc(4*sizeof(double));
vector_zero (4, dNdt);
for(k=0; k<4; k++)
{
s=st[k][0];
t=st[k][1];
dNds[0]=0.25*(-1+t); dNds[1]=0.25*(1-t);
dNds[2]=0.25*(1+t); dNds[3]=0.25*(-1-t);
dNdt[0]=0.25*(-1+s); dNdt[1]=0.25*(-1-s);
dNdt[2]=0.25*(1+s); dNdt[3]=0.25*(1-s);
for (i=0;i<4;i++)
{
temp1[i]=nc[i][0];
temp2[i]=nc[i][1];
}
J[0][0]=MxMulMx(4,dNds,temp1);J[0][1]=MxMulMx(4,dNds,temp2);
J[1][0]=MxMulMx(4,dNdt,temp1);J[1][1]=MxMulMx(4,dNdt,temp2);
MxInv(2, J, invJ);
for(i=0;i<4;i++)
{
temp[0][i]=dNds[i];
temp[1][i]=dNdt[i];
}
MxMulMx_Multi(2, 2, 4, invJ, temp, dNdxy);
for(i=0;i<4;i++)
{
B[0][2*i]=dNdxy[0][i]; B[0][2*i+1]=0;
B[1][2*i]=0; B[1][2*i+1]=dNdxy[1][i];
B[2][2*i]=dNdxy[1][i]; B[2][2*i+1]=dNdxy[0][i];
}
det=MxDet2(J);
MxMulMx_Multi(3, 3, 8, D, B, temp3);
MxMulMx_Single(3, 8, 1, temp3, u, temp4);
for(int p=0;p<3;p++)
sx[k][p]=temp4[p];
}
//free the memory //
for(i=0;i<2;i++)
free(J[i]);
free(J);
for(i=0;i<3;i++)
free(B[i]);
free(B);
for(i=0;i<2;i++)
free(invJ[i]);
free(invJ);
for(i=0;i<2;i++)
free(dNdxy[i]);
free(dNdxy);
for(i=0;i<2;i++)
free(temp[i]);
free(temp);
for(i=0;i<3;i++)
free(temp3[i]);
free(temp3);
free (temp1);
free (temp2);
free (temp4);
free(dNds);
free(dNdt);
}
//********************Function for Interpolation during FEM***********//
void Element4NInterpolate(double **gp, double **sx, double **rsx)
{
int i;
double **N2_temp, **A2, **N2;
N2_temp= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
N2_temp[i] = (double * )malloc(4*sizeof(double));
matrix_zero(4, 4, N2_temp);
A2= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
A2[i] = (double * )malloc(4*sizeof(double));
matrix_zero (4, 4, A2);
N2= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
N2[i] = (double * )malloc(4*sizeof(double));
matrix_zero(4, 4, N2);
A2[0][0]=0.25; A2[0][1]=0.25; A2[0][2]=0.25; A2[0][3]=0.25;
A2[1][0]=-0.25; A2[1][1]=0.25; A2[1][2]=0.25; A2[1][3]=-0.25;
A2[2][0]=-0.25; A2[2][1]=-0.25; A2[2][2]=0.25; A2[2][3]=0.25;
A2[3][0]=0.25; A2[3][1]=-0.25; A2[3][2]=0.25; A2[3][3]=-0.25;
for(int p=0;p<4;p++)
{
N2_temp[p][0]=1.0;
N2_temp[p][1]=gp[p][0];
N2_temp[p][2]=gp[p][1];
N2_temp[p][3]=gp[p][0]*gp[p][1];
}
MxMulMx_Multi(4, 4, 4, N2_temp, A2, N2);
MxMulMx_Multi(4, 4, 2, N2, sx, rsx);
//free the memory //
for(i=0;i<4;i++)
free(N2_temp[i]);
free(N2_temp);
for(i=0;i<4;i++)
free(N2[i]);
free(N2);
for(i=0;i<4;i++)
free(A2[i]);
free(A2);
}
//***** Preparation Function to Calculate Arrays of CSR Format Directly during FEM Stiffness Matrix Assembly ******//
void MakeNodeConn(int numNodes, int **element, int numElements, int **nodeConn)
{
int i;
int en1, en2, en3, en4;
for(i = 0; i < numNodes; i++) // this loop could be parallelized; nodeConn holds the numbers of the nodes connected to each node
nodeConn[i][4]=i+1;
for(i=0; i<numElements; i++)
{
en1=element[i][0];
en2=element[i][1];
en3=element[i][2];
en4=element[i][3];
nodeConn[en1][7] = en2+1;
nodeConn[en1][8] = en3+1;
nodeConn[en1][5] = en4+1;
nodeConn[en2][5] = en3+1;
nodeConn[en2][2] = en4+1;
nodeConn[en2][1] = en1+1;
nodeConn[en3][1] = en4+1;
nodeConn[en3][0] = en1+1;
nodeConn[en3][3] = en2+1;
nodeConn[en4][3] = en1+1;
nodeConn[en4][6] = en2+1;
nodeConn[en4][7] = en3+1;
}
}
//***Function to Calculate the Array of *rowPtr for CSR Format Directly during FEM Stiffness Matrix Assembly****//
void MakeCSR( int *rowPtr, int *col, int numNonZero, int numNodes, int **nodeConn)
{
int i, j, j1, k;
// first determine number of non-zero elements in stiffness matrix
rowPtr[0]=0;
//now populate the rowPtr and col vectors
int nnz=0;
for(i=0;i<numNodes;i++)
for (j=0;j<2;j++)
for(k=0;k<9;k++)
{
if(nodeConn[i][k]!=0)
{
for (j1=0;j1<2;j1++)
{
col[nnz]=2*(nodeConn[i][k]-1)+j1;
nnz=nnz+1;
}
}
rowPtr[2*i+j+1] = nnz;
}
}
//****Function to Calculate the Arrays of *col and *elem for CSR Format Directly during FEM Stiffness Matrix Assembly****//
void GetCSRElemIndex (int *node, int *rowPtr, int *col, int **elemIndex)
{
int i, j, n1, n2, n3, n4;
int n, k2, m, **elemIndex2;
elemIndex2= (int **) malloc(8 * sizeof(int *));
for(int i = 0; i < 8; i++)
elemIndex2[i] = (int * )malloc(8* sizeof(int));
matrix_zero_int(8, 8, elemIndex2);
//dof=(int *) malloc(8* sizeof(int));
//vector_zero_int (8, dof);
//Returns 8x8 integer matrix containing indexes into the CSR elem array
//these indexes allow the element stiffness matrix to be assembled ..
//directly into the CSR representation of the global stiffness matrix
//node is 1 x 4 containing the nodes this element connects to
n1 = node[0]; // assumes node numbering starts from 1
n2 = node[3]; // nodes re-ordered to run from smallest to largest
n3 = node[1];
n4 = node[2];
int dof[8]={2*n1, 2*n1+1, 2*n2, 2*n2+1, 2*n3, 2*n3+1, 2*n4, 2*n4+1};
for (i=0;i<8;i++)
{
n=rowPtr[dof[i]];
k2=1;
for(j=0;j<8;j++)
{
m=dof[j];
while(col[n+k2-1]!=m)
k2=k2+1;
elemIndex[i][j]=n+k2-1; // this is 1 based
}
}
int reindex[8]={0, 1, 4, 5, 6, 7, 2, 3}; // change node order back
for (i=0;i<8;i++)
for (j=0;j<8;j++)
elemIndex2[i][j] = elemIndex[reindex[i]][reindex[j]];
for (i=0;i<8;i++)
for (j=0;j<8;j++)
elemIndex[i][j] = elemIndex2[i][j];
for(i=0;i<8;i++)
free(elemIndex2[i]);
free(elemIndex2);
}
//**Function to Calculate the Nodal Stresses using Superconvergent Patch Recovery (SPR) Approach**//
void SPRStress(double *u, double *E, double *N, double v, int nElements, int nElementsX, int
nElementsY, int nNodes,int nNodesX,int nNodesY, int **nodeCoord, int **element, int ndof,
double MeshSize, double b, double **gp, double **rStress)
{
int i,j, k1,k;
double **xegp, **nc2, **ss_temp, **A, **pcoeffs, **AA, **invAA, **BB, **transAA,
**D2;
double *pc_temp, *pc_temp2, *uu2;
double T=0;
int elnum, kk=0;
int *indexArray2,*indexArray, *indexArray_temp;
int **count3;
xegp= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
xegp[i] = (double * )malloc(2*sizeof(double));
matrix_zero (4, 2, xegp);
D2= (double **) malloc(3* sizeof(double *));
for(i = 0; i < 3; i++)
D2[i] = (double * )malloc(3*sizeof(double));
matrix_zero (3, 3, D2);
nc2= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
nc2[i] = (double * )malloc(2*sizeof(double));
matrix_zero (4, 2, nc2);
ss_temp= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
ss_temp[i] = (double * )malloc(3*sizeof(double));
matrix_zero (4, 3, ss_temp);
A= (double **) malloc(16* sizeof(double *));
for(i = 0; i < 16; i++)
A[i] = (double * )malloc(4*sizeof(double));
matrix_zero (16, 4, A);
pcoeffs= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
pcoeffs[i] = (double * )malloc(3*sizeof(double));
matrix_zero (4, 3, pcoeffs);
AA= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
AA[i] = (double * )malloc(4*sizeof(double));
matrix_zero (4, 4, AA);
invAA= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
invAA[i] = (double * )malloc(4*sizeof(double));
matrix_zero (4, 4, invAA);
BB= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
BB[i] = (double * )malloc(16*sizeof(double));
matrix_zero (4, 16, BB);
transAA= (double **) malloc(4* sizeof(double *));
for(i = 0; i < 4; i++)
transAA[i] = (double * )malloc(16*sizeof(double));
matrix_zero (4, 16, transAA);
count3= (int **) malloc(nNodes* sizeof(int *));
for(i = 0; i < nNodes; i++)
count3[i] = (int * )malloc(3*sizeof(int));
matrix_zero_int (nNodes, 3, count3);
pc_temp= (double *) malloc(4* sizeof(double));
vector_zero (4, pc_temp);
pc_temp2= (double *) malloc(3* sizeof(double));
vector_zero (3, pc_temp2);
indexArray2= (int * )malloc(200*sizeof(int));
vector_zero_int (200, indexArray2);
uu2= (double *) malloc(8* sizeof(double));
vector_zero (8, uu2);
indexArray= (int * )malloc(8*sizeof(int)); // only the 8 dofs of one element are used
vector_zero_int (8, indexArray);
double **scStresses, **xgp;
xgp= (double **) malloc(16* sizeof(double *));
for(int p = 0; p < 16; p++)
xgp[p] = (double * )malloc(2*sizeof(double));
scStresses= (double **) malloc(16* sizeof(double *));
for(int p = 0; p < 16; p++)
scStresses[p] = (double * )malloc(3*sizeof(double));
for(i=1;i<nNodesX-1;i++) //patches are centered on internal nodes//
{
for(j=1;j<nNodesY-1;j++) // i.e. not the nodes on the boundary//
{
matrix_zero (16, 2, xgp);
matrix_zero (16, 3, scStresses);
for(k1=0;k1<4;k1++) //elements around the patch node, anticlockwise starting from top left//
{
elnum=(i-1)*nElementsY+j-1;
if(k1==2 || k1==3)
elnum=elnum+nElementsY;
if(k1==0 || k1==3)
elnum=elnum+1;
for (int p=0;p<4;p++)
for(int q=0;q<2;q++)
nc2[p][q]=nodeCoord[element[elnum][p]][q];
Element4NInterpolate (gp, nc2, xegp); //convert gauss points from local to global coords//
for(int j1=0;j1<4;j1++)
{
indexArray[2*j1]=2*element[elnum][j1];
indexArray[2*j1+1]=2*element[elnum][j1]+1;
}
for(int p=0;p<8;p++)
uu2[p]=u[indexArray[p]];
T=N[0]*E[element[elnum][0]]+N[1]*E[element[elnum][1]]+N[2]*E[element[elnum][2]]
+N[3]*E[element[elnum][3]];
D2[0][0] = T/(1-v*v); D2[0][1] = v* T/(1-v*v); D2[0][2] = 0;
D2[1][0] = v* T/(1-v*v); D2[1][1] = T/(1-v*v); D2[1][2] = 0;
D2[2][0] = 0; D2[2][1] = 0; D2[2][2] = 0.5*(1-v)* T/(1-v*v);
Element4NStress(gp, nc2, uu2, D2, ss_temp);
for(int p=k1*4;p<(k1+1)*4;p++)
{
for(int q=0;q<3;q++)
scStresses[p][q]=ss_temp[p%4][q];
for(int qq=0;qq<2;qq++)
xgp[p][qq]=xegp[p%4][qq];
}
}
for(int p=0;p<16;p++)
{
A[p][0]=1.0;
A[p][1]=xgp[p][0];
A[p][2]=xgp[p][1];
A[p][3]=xgp[p][0]*xgp[p][1];
}
MxTrans(16,4,A,transAA);
MxMulMx_Multi(4, 16, 4, transAA, A, AA);
MxInv(4,AA,invAA);
MxMulMx_Multi(4, 4, 16, invAA, transAA, BB);
MxMulMx_Multi(4, 16, 3, BB, scStresses, pcoeffs); //fit curve to stresses at super-convergent points//
int cont1=1;
for(k1=0; k1<4; k1++) //elements around the patch node, anticlockwise starting from top left//
{
//handle edges
if(i==1 && (k1==0 || k1==1))
cont1=cont1+2;
if(i==nElementsX-1 && (k1==2 || k1==3))
cont1=cont1+2;
if(j==1 && (k1==1 || k1==2))
cont1=cont1+2;
if(j==nElementsY-1 && (k1==0 || k1==3))
cont1=cont1+2;
}
int *indexArray2;
indexArray2= (int * )malloc(cont1*sizeof(int));
vector_zero_int (cont1, indexArray2);
indexArray2[0]=i*nNodesY+j+1; //node in the center of the patch//
cont1=1;
for(k1=0; k1<4; k1++) //elements around the patch node, anticlockwise starting from top left//
{
elnum=(i-1)*nElementsY+j-1;
if(k1==2 || k1==3)
elnum=elnum+nElementsY;
if(k1==0 || k1==3)
elnum=elnum+1;
//handle edges
if(i==1 && (k1==0 || k1==1))
{
indexArray2[cont1]=element[elnum][3]+1;
indexArray2[cont1+1]=element[elnum][0]+1;
cont1=cont1+2;
}
if(i==nElementsX-1 && (k1==2 || k1==3))
{
indexArray2[cont1]=element[elnum][1]+1;
indexArray2[cont1+1]=element[elnum][2]+1;
cont1=cont1+2;
}
if(j==1 && (k1==1 || k1==2))
{
indexArray2[cont1]=element[elnum][0]+1;
indexArray2[cont1+1]=element[elnum][1]+1;
cont1=cont1+2;
}
if(j==nElementsY-1 && (k1==0 || k1==3))
{
indexArray2[cont1]=element[elnum][2]+1;
indexArray2[cont1+1]=element[elnum][3]+1;
cont1=cont1+2;
}
}
kk=Unique2(cont1,indexArray2);
indexArray_temp= (int *) malloc(kk* sizeof(int));
vector_zero_int (kk, indexArray_temp);
Unique3(cont1, indexArray2, indexArray_temp);
for (int q=0;q<kk;q++)
indexArray_temp[q]=indexArray_temp[q]-1;
for(int p=0;p<kk;p++)
{
pc_temp[0]=1.0;
pc_temp[1]=nodeCoord[indexArray_temp[p]][0];
pc_temp[2]=nodeCoord[indexArray_temp[p]][1];
pc_temp[3]=nodeCoord[indexArray_temp[p]][0]*nodeCoord[indexArray_temp[p]][1];
VecMulMx(1, 4, 3, pc_temp, pcoeffs, pc_temp2);
//matrix_zero_int (nNodes, 3, count3);
for(k=0;k<3;k++)
{
rStress[indexArray_temp[p]][k]=
rStress[indexArray_temp[p]][k]+ pc_temp2[k];
count3[indexArray_temp[p]][k]=
count3[indexArray_temp[p]][k]+1;
}
}
free(indexArray2);
free(indexArray_temp);
}
}
for(i=0;i<16;i++)
free(xgp[i]);
free(xgp);
for(i=0;i<16;i++)
free(scStresses[i]);
free(scStresses);
for(i=0;i<nNodes;i++)
for(j=0;j<3;j++)
rStress[i][j]=rStress[i][j]*1.0/count3[i][j]; // actually in this case there is no need to average - only one recovery for each node
//free the memory //
for(i=0;i<4;i++)
free(xegp[i]);
free(xegp);
for(i=0;i<3;i++)
free(D2[i]);
free(D2);
for(i=0;i<4;i++)
free(nc2[i]);
free(nc2);
for(i=0;i<4;i++)
free(ss_temp[i]);
free(ss_temp);
for(i=0;i<16;i++)
free(A[i]);
free(A);
for(i=0;i<4;i++)
free(pcoeffs[i]);
free(pcoeffs);
for(i=0;i<4;i++)
free(AA[i]);
free(AA);
for(i=0;i<4;i++)
free(transAA[i]);
free(transAA);
for(i=0;i<nNodes;i++)
free(count3[i]);
free(count3);
for(i=0;i<4;i++)
free(invAA[i]);
free(invAA);
for(i=0;i<4;i++)
free(BB[i]);
free(BB);
free(pc_temp);
free(pc_temp2);
free(uu2);
}
//**************** Main function for GPU-MLEFEA ****************************//
int main(void)
{
printf("\n\nProgram is running!\n\n\n");
double MeshSize=10, thickness=300,Evalue=24500, v=0.2, TOL=0.2, PredefinedXc=70;
double Load=2e5, phy=0.6, fcdot=25, kesai=0.9, kesai2=1.2, kesai3=0.6, ratio=1.0,
ratio_step=0.1;
int length=2000, b=500, Max_iterations=50, Ky=5000, iterations, niter=4;
int SupPosition=250, RightPosition=length, SupOutCount, SupInCount,
RightBoundaryCount, numNonZero=0;
int i, j, k, count, xc, j1, m;
int *coord_x, *middle, *flag, *flag2, *flag3, *coord_below, *coord_above;
int ** nodeConn, *rowPtr, *col, **elemIndex, *temp_node,**nodeCoord, **element;
double T, Maxcomp_stress, Maxtens_stress, tensile_force, steel_area, moment2, Xc2,
Allowable_tens_stress;
double *elem, *u, *E, *f_CSR, *N, *middle_x, *middle_y, *middle_xy, *below_tensile_stress, *above_compressive_stress;
double **rStress, **gp, **D, **Ke;
int ndof, nElementsX, nElementsY, nElements, nNodes, nNodesX, nNodesY, maxflag;
int *LoadPoint, *SupIn, *SupOut, *RightBoundary;
int **nc;
double Max_comp_stress;
double *cpu_out, *gpu_out, *Xc_value2, *real_tensile_force, *Area_of_Steel, *real_moment2, *sigma1, *sigma3;
nElementsX = (int)length/MeshSize;
nElementsY = (int) b/MeshSize;
nElements = nElementsX * nElementsY;
nNodesX = nElementsX + 1;
nNodesY = nElementsY + 1;
nNodes = nNodesX * nNodesY;
ndof=2*nNodes; //two degrees of freedom (x and y translation) for each node
//initialize the array //
nodeCoord= (int **) malloc(nNodes * sizeof(int *));
for(i = 0; i < nNodes; i++)
nodeCoord[i] = (int * )malloc(2* sizeof(int));
matrix_zero_int (nNodes, 2, nodeCoord); //each row is one node: column 0 holds the x coord, column 1 the y coord
element= (int **) malloc(nElements * sizeof(int *));
for(i = 0; i < nElements; i++)
element[i] = (int * )malloc(4* sizeof(int));
matrix_zero_int (nElements, 4, element); //each row is one element; each column holds one node number
gp= (double **) malloc(4 * sizeof(double *));
for(i = 0; i < 4; i++)
gp[i] = (double * )malloc(2* sizeof(double));
matrix_zero (4, 2, gp);
Ke= (double **) malloc(8 * sizeof(double *));
for(i = 0; i < 8; i++)
Ke[i] = (double * )malloc(8* sizeof(double));
matrix_zero (8, 8, Ke);
cpu_out= (double *) malloc(ndof * sizeof(double ));
vector_zero (ndof, cpu_out);
gpu_out= (double *) malloc(ndof * sizeof(double ));
vector_zero (ndof, gpu_out);
nc= (int **) malloc(4 * sizeof(int *));
for(i = 0; i < 4; i++)
nc[i] = (int * )malloc(2* sizeof(int));
matrix_zero_int (4, 2, nc);
D= (double **) malloc(3 * sizeof(double *));
for(i = 0; i < 3; i++)
D[i] = (double * )malloc(3* sizeof(double));
matrix_zero (3, 3, D);
rStress= (double **) malloc(nNodes * sizeof(double *));
for(i = 0; i < nNodes; i++)
rStress[i] = (double * )malloc(3* sizeof(double));
//initialize the vector //
E= (double * )malloc(nNodes*sizeof(double));
vector_zero (nNodes, E);
sigma1= (double * )malloc(nNodes*sizeof(double));
vector_zero (nNodes, sigma1);
sigma3= (double * )malloc(nNodes*sizeof(double));
vector_zero (nNodes, sigma3);
N= (double * )malloc(4*sizeof(double));
vector_zero (4, N);
f_CSR= (double * )malloc(ndof*sizeof(double));
vector_zero (ndof, f_CSR);
LoadPoint= (int * )malloc(ndof*sizeof(int));
vector_zero_int ((int)(250/MeshSize+1), LoadPoint);
SupIn= (int * )malloc(ndof*sizeof(int));
vector_zero_int ((int) (SupPosition/MeshSize-1), SupIn);
SupOut= (int * )malloc(ndof*sizeof(int));
vector_zero_int (2, SupOut);
RightBoundary= (int * )malloc(ndof*sizeof(int));
vector_zero_int ((int) (b/MeshSize+1), RightBoundary);
u= (double * )malloc(ndof*sizeof(double));
vector_zero (ndof, u);
flag= (int * )malloc(nNodes*sizeof(int));
flag2= (int * )malloc(nNodes*sizeof(int));
flag3= (int * )malloc(nNodes*sizeof(int));
middle= (int * )malloc((int)(b/MeshSize+1)*sizeof(int));
vector_zero_int ((int)(b/MeshSize+1), middle);
middle_x= (double * )malloc((int)(b/MeshSize+1)*sizeof(double));
vector_zero ((int)(b/MeshSize+1), middle_x);
middle_y= (double * )malloc((int)(b/MeshSize+1)*sizeof(double));
vector_zero ((int)(b/MeshSize+1), middle_y);
middle_xy= (double * )malloc((int)(b/MeshSize+1)*sizeof(double));
vector_zero ((int)(b/MeshSize+1), middle_xy);
coord_x= (int * )malloc((int)(b/MeshSize+1)*sizeof(int));
vector_zero_int ((int)(b/MeshSize+1), coord_x);
real_tensile_force= (double * )malloc(Max_iterations*sizeof(double));
vector_zero (Max_iterations, real_tensile_force);
Area_of_Steel= (double * )malloc(Max_iterations*sizeof(double));
vector_zero (Max_iterations, Area_of_Steel);
Xc_value2= (double * )malloc(Max_iterations*sizeof(double));
vector_zero (Max_iterations, Xc_value2);
real_moment2= (double * )malloc(Max_iterations*sizeof(double));
vector_zero (Max_iterations, real_moment2);
//work out nodal coordinates (nodeCoord)//
k= 0;
for (i=0; i<nNodesX; i++)
{
xc= i*length/nElementsX;
for (j=0; j< nNodesY; j++)
{
nodeCoord[k][0]= xc;
nodeCoord[k][1]= b* j /nElementsY;
k=k+1;
}
}
for (i=0; i<nNodes; i++)
E[i]=Evalue;
Max_comp_stress=phy*0.9*fcdot;
//work out which elements connect to which nodes//
k=0;
for (i=0;i<nElementsX; i++)
{
for (j=0;j<nElementsY;j++)
{
int n= i*nNodesY+j;
element[k][0]=n; element[k][1]=n+nNodesY; element[k][2]=n+nNodesY+1; element[k][3]=n+1;
k=k+1;
}
}
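The two nested loops above number the nodes column by column (x index outer, y index fastest), so element k touches nodes n, n+nNodesY, n+nNodesY+1 and n+1 counter-clockwise from the bottom-left corner. The same rule in isolation, with hypothetical names, useful for checking the numbering on a small grid:

```c
#include <assert.h>

/* Structured-grid Q4 connectivity: nodes numbered column-major (y fastest).
   elem[k] lists the four corner nodes counter-clockwise from bottom-left. */
void quad_connectivity(int nElementsX, int nElementsY, int elem[][4])
{
    int nNodesY = nElementsY + 1;
    int k = 0;
    for (int i = 0; i < nElementsX; i++)
        for (int j = 0; j < nElementsY; j++) {
            int n = i * nNodesY + j;      /* bottom-left node */
            elem[k][0] = n;
            elem[k][1] = n + nNodesY;     /* bottom-right */
            elem[k][2] = n + nNodesY + 1; /* top-right */
            elem[k][3] = n + 1;           /* top-left */
            k++;
        }
}
```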
gp[0][0]=-1.0/sqrt(3.0); gp[0][1]=-1.0/sqrt(3.0);
gp[1][0]=-1.0/sqrt(3.0); gp[1][1]=1.0/sqrt(3.0);
gp[2][0]=1.0/sqrt(3.0); gp[2][1]=-1.0/sqrt(3.0);
gp[3][0]=1.0/sqrt(3.0); gp[3][1]=1.0/sqrt(3.0);
N[0]=0.25*(1-1.0/sqrt(3.0)-1.0/sqrt(3.0)+1.0/sqrt(3.0)*(1.0/sqrt(3.0)));
N[1]=0.25*(1+1.0/sqrt(3.0)-1.0/sqrt(3.0)-1.0/sqrt(3.0)*(1.0/sqrt(3.0)));
N[2]=0.25*(1+1.0/sqrt(3.0)+1.0/sqrt(3.0)+1.0/sqrt(3.0)*(1.0/sqrt(3.0)));
N[3]=0.25*(1-1.0/sqrt(3.0)+1.0/sqrt(3.0)-1.0/sqrt(3.0)*(1.0/sqrt(3.0)));
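The four constants above are the bilinear (Q4) shape functions evaluated at the Gauss point ξ = η = 1/√3. A general sketch of those shape functions (hypothetical helper name) that reproduces the listed values and satisfies the partition-of-unity property:

```c
#include <assert.h>
#include <math.h>

/* Bilinear (Q4) shape functions on the reference square [-1,1]^2.
   N[a] weights the contribution of corner a; the four values sum to 1. */
void q4_shape(double xi, double eta, double N[4])
{
    N[0] = 0.25 * (1 - xi) * (1 - eta);
    N[1] = 0.25 * (1 + xi) * (1 - eta);
    N[2] = 0.25 * (1 + xi) * (1 + eta);
    N[3] = 0.25 * (1 - xi) * (1 + eta);
}
```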
nodeConn= (int **) malloc(nNodes * sizeof(int *));
for(i = 0; i < nNodes; i++)
nodeConn[i] = (int * )malloc(9* sizeof(int));
matrix_zero_int (nNodes, 9, nodeConn);
rowPtr= (int *) malloc((2*nNodes+1)* sizeof(int));
vector_zero_int (2*nNodes+1, rowPtr);
temp_node= (int *) malloc(4 * sizeof(int));
vector_zero_int (4, temp_node);
//********* starting the iterations ****************//
for (iterations=0;iterations<Max_iterations;iterations++)
{
elemIndex= (int **) malloc(8 * sizeof(int *));
for(i = 0; i < 8; i++)
elemIndex[i] = (int * )malloc(8* sizeof(int));
matrix_zero_int (8, 8, elemIndex);
//CSR format assembly//
//find information for CSR storage of stiffness matrix
MakeNodeConn(nNodes, element, nElements, nodeConn);
numNonZero=0;
for (i=0;i<nNodes; i++)
for(j=0;j<9;j++)
if(nodeConn[i][j]!=0)
numNonZero=numNonZero+4; //two elements in two rows
col= (int *) malloc(numNonZero* sizeof(int));
vector_zero_int (numNonZero, col);
elem= (double *) malloc(numNonZero* sizeof(double)); // storage space for CSR
vector_zero(numNonZero, elem);
MakeCSR(rowPtr, col, numNonZero, nNodes, nodeConn);
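MakeCSR fills the standard compressed-sparse-row triplet (rowPtr, col, elem): row i owns the entries rowPtr[i] to rowPtr[i+1]-1. A minimal serial matrix-vector product over the same layout (a hypothetical sanity-check function, not the thesis kernel) shows how the three arrays are read back:

```c
#include <assert.h>

/* y = A*x with A in CSR format: row i owns entries rowPtr[i]..rowPtr[i+1]-1,
   col[k] giving the column index and elem[k] the value of entry k. */
void csr_spmv(int n, const int *rowPtr, const int *col, const double *elem,
              const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++)
            sum += elem[k] * x[col[k]];
        y[i] = sum;
    }
}
```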
//Assemble the stiffness matrix in CSR format//
for (i=0; i<nElements; i++)
{
for (int p=0;p<4;p++)
for(int q=0; q<2;q++)
{
nc[p][q]=nodeCoord[element[i][p]][q];
}
T=N[0]*E[element[i][0]]+N[1]*E[element[i][1]]+N[2]*E[element[i][2]]+N[3]*E[element[i][3]];
D[0][0] = T/(1-v*v); D[0][1] = v* T/(1-v*v); D[0][2] = 0;
D[1][0] = v* T/(1-v*v); D[1][1] = T/(1-v*v); D[1][2] = 0;
D[2][0] = 0; D[2][1] = 0; D[2][2] = 0.5*(1-v)* T/(1-v*v);
Element4NStiffness(nc,thickness,D,Ke);
for (k=0; k<4; k++)
temp_node[k]=element[i][k];
GetCSRElemIndex (temp_node, rowPtr, col, elemIndex);
for(j1=0; j1<8; j1++)
for(int kk1=0; kk1<8; kk1++)
{
m = elemIndex[j1][kk1];
elem[m]=elem[m] + Ke[j1][kk1];
}
}
//add surface forces//
count=0;
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==(int)b && nodeCoord[i][0]>((int)length-250) && nodeCoord[i][0]<(int)length)
{
LoadPoint[count]=i;
count++;
}
}
for (i=0;i<count;i++)
f_CSR[2*LoadPoint[i]+1]=-Load*1.0/2/(250/MeshSize);
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==(int)b && nodeCoord[i][0]==((int)length-250))
f_CSR[2*i+1]=-0.5*Load/2/(250/MeshSize);
if (nodeCoord[i][1]==(int)b && nodeCoord[i][0]==(int)length)
f_CSR[2*i+1]=-0.5*Load/2/(250/MeshSize);
}
//add restraint equations for the support nodes//
SupOutCount=0;
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==0)
if ( nodeCoord[i][0]==SupPosition || nodeCoord[i][0]==0)
{
SupOut[SupOutCount]=i;
SupOutCount++;
}
}
for (j=0; j<SupOutCount; j++)
for (i=rowPtr[2*SupOut[j]+1]; i<rowPtr[2*SupOut[j]+1+1]; i++ )
if (col[i]==(2*SupOut[j]+1))
elem[i]=elem[i]+Ky/2;
SupInCount=0;
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==0 && nodeCoord[i][0]<SupPosition && nodeCoord[i][0]>0)
{
SupIn[SupInCount]=i;
SupInCount++;
}
}
for (j=0; j<SupInCount; j++)
for (i=rowPtr[2*SupIn[j]+1]; i<rowPtr[2*SupIn[j]+1+1]; i++ )
if (col[i]==(2*SupIn[j]+1))
elem[i]=elem[i]+Ky;
// Apply boundary conditions for the right-side nodes using Penalty Method //
RightBoundaryCount=0;
for(i=0; i<nNodes; i++)
{
if (nodeCoord[i][0]==(int)RightPosition) //RightPosition can be set to length
{
RightBoundary[RightBoundaryCount]=i;
RightBoundaryCount++;
}
}
for(j=0; j<RightBoundaryCount; j++)
for(i=rowPtr[2*RightBoundary[j]]; i<rowPtr[2*RightBoundary[j]+1]; i++ )
if (col[i]==(2*RightBoundary[j]))
elem[i]=elem[i]*1e10;
for(i=0; i<RightBoundaryCount; i++)
f_CSR[2*RightBoundary[i]]=0;
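Scaling the diagonal stiffness entry by 1e10 and zeroing the corresponding load is the penalty method: the inflated diagonal forces the constrained displacement toward zero without renumbering equations. A toy dense 2x2 illustration of the effect (hypothetical names, solved by Cramer's rule for checking only):

```c
#include <assert.h>
#include <math.h>

/* Penalty method on a dense system K u = f: inflating K[dof][dof] and
   zeroing f[dof] makes the solved u[dof] approximately zero. */
void penalty_fix(int n, double K[][2], double *f, int dof, double big)
{
    (void)n;               /* kept for symmetry with a general interface */
    K[dof][dof] *= big;
    f[dof] = 0.0;
}
```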
// find nodal displacements *******************u = K\f; = inv(K)*f //
//cpu approach
cpu_pcg_solve( ndof, rowPtr, col, elem, f_CSR, cpu_out);
//gpu approach
gpu_pcg_solve(rowPtr, (ndof+1), col, numNonZero, elem, numNonZero, f_CSR, ndof, gpu_out);
for(i=0;i<ndof;i++)
u[i]=gpu_out[i];
matrix_zero (nNodes, 3, rStress);
//Nodal stress calculation
SPRStress(u, E, N, v, nElements, nElementsX, nElementsY, nNodes, nNodesX, nNodesY, nodeCoord, element, ndof, MeshSize, b, gp, rStress);
for(i=0; i<nNodes; i++)
{
sigma1[i]=(rStress[i][0]+rStress[i][1])/2+sqrt(pow((rStress[i][0]-rStress[i][1])/2,2)+pow(rStress[i][2],2));
sigma3[i]=(rStress[i][0]+rStress[i][1])/2-sqrt(pow((rStress[i][0]-rStress[i][1])/2,2)+pow(rStress[i][2],2));
}
Maxcomp_stress=max_abs_double(nNodes, sigma3);
Maxtens_stress=max_abs_double(nNodes,sigma1);
//find max_flag
vector_zero_int (nNodes, flag);
for(i=0; i<nNodes; i++)
if(fabs(sigma3[i])>Max_comp_stress)
flag[i]=1;
maxflag=max_int(nNodes, flag);
//find the position of central line
int middleCount=0;
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][0]==length && nodeCoord[i][1]<b)
{
middle[middleCount]=i;
middleCount++;
}
}
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][0]==length && nodeCoord[i][1]==b)
{
middle[middleCount]=i;
middleCount++;
}
}
for (i=0; i<middleCount; i++)
{
middle_x[i]=rStress[middle[i]][0];
middle_y[i]=rStress[middle[i]][1];
middle_xy[i]=rStress[middle[i]][2];
coord_x[i]=nodeCoord[middle[i]][1];
}
//find the tensile and compressive stress
int below_tensile_stressCount=0;
int above_compressive_stressCount=0;
int coord_belowCount=0;
int coord_aboveCount=0;
for(i=0;i<middleCount;i++)
{
if (middle_x[i]>=0)
{
below_tensile_stressCount++;
coord_belowCount++;
}
else
{
above_compressive_stressCount++;
coord_aboveCount++;
}
}
below_tensile_stress= (double *) malloc(below_tensile_stressCount* sizeof(double));
vector_zero (below_tensile_stressCount, below_tensile_stress);
above_compressive_stress= (double *) malloc(above_compressive_stressCount* sizeof(double));
vector_zero (above_compressive_stressCount, above_compressive_stress);
coord_below= (int*) malloc(coord_belowCount* sizeof(int));
vector_zero_int (coord_belowCount, coord_below);
coord_above= (int*) malloc(coord_aboveCount* sizeof(int));
vector_zero_int (coord_aboveCount, coord_above);
below_tensile_stressCount=0;
above_compressive_stressCount=0;
coord_belowCount=0;
coord_aboveCount=0;
for(i=0;i<middleCount;i++)
{
if (middle_x[i]>=0)
{
below_tensile_stress[below_tensile_stressCount]=middle_x[i];
below_tensile_stressCount++;
coord_below[coord_belowCount]=coord_x[i];
coord_belowCount++;
}
else
{
above_compressive_stress[above_compressive_stressCount]=middle_x[i];
above_compressive_stressCount++;
coord_above[coord_aboveCount]=coord_x[i];
coord_aboveCount++;
}
}
//find the tensile force and the area of reinforcement
tensile_force=area(coord_belowCount, coord_below, below_tensile_stress, thickness);
real_tensile_force[iterations]=tensile_force;
steel_area=tensile_force/400;
Area_of_Steel[iterations]=steel_area;
//find the position of steel-xc
moment2=Moment2(coord_belowCount,coord_below,below_tensile_stress,thickness);
Xc2=(moment2)/tensile_force;
Xc_value2[iterations]=Xc2;
real_moment2[iterations]=moment2;
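area() and Moment2() are assumed here to integrate the mid-span tensile stress profile over the sampled depth (times the section thickness) to obtain the resultant tensile force and its first moment, from which Xc2 locates the resultant. A plausible trapezoidal-rule sketch of that force integral (hypothetical helper, not necessarily the thesis implementation):

```c
#include <assert.h>
#include <math.h>

/* Trapezoidal integral of a sampled stress profile s(y) times thickness t:
   force = t * sum over segments of 0.5*(s[i]+s[i+1])*(y[i+1]-y[i]). */
double stress_resultant(int n, const int *y, const double *s, double t)
{
    double force = 0.0;
    for (int i = 0; i < n - 1; i++)
        force += 0.5 * (s[i] + s[i + 1]) * (double)(y[i + 1] - y[i]);
    return force * t;
}
```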
//compressive stress redistribution//
if (maxflag!=0)
{
for(i=0;i<nNodes;i++)
{
if (nodeCoord[i][1]==b && nodeCoord[i][0]==length)
E[i]=0;
if (fabs(sigma3[i])>Max_comp_stress)
if(sigma1[i]<=0.33*sqrt(fcdot))
E[i]=kesai*phy*0.9*fcdot/fabs(sigma3[i])*E[i];
else
E[i]=kesai*phy*0.54*fcdot/fabs(sigma3[i])*E[i];
}
}
//tensile stress redistribution
if ((iterations+1)%niter==1)
{
Allowable_tens_stress= Maxtens_stress*(ratio-ratio_step);
}
if ((Xc2-PredefinedXc)>TOL)
{
vector_zero_int(nNodes, flag2);
for(i=0;i<nNodes;i++)
if (sigma1[i]>Allowable_tens_stress)
{
flag2[i]=1;
E[i]=kesai2*fabs(sigma1[i])/Allowable_tens_stress*E[i];
}
}
if ((Xc2-PredefinedXc)<-TOL)
{
vector_zero_int(nNodes, flag3);
for(i=0;i<nNodes;i++)
if (sigma1[i]>Allowable_tens_stress)
{
flag3[i]=1;
E[i]=kesai3*fabs(sigma1[i])/Allowable_tens_stress*E[i];
}
}
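The redistribution branches apply multiplicative modulus updates: overstressed compressive regions are softened by the factor kesai*sigma_allow/|sigma3| (less than 1 when the yield limit is violated, shedding stress), while the tensile branches scale E by kesai2 or kesai3 times |sigma1|/sigma_allow to shift the tensile resultant. The two rules in isolation (hypothetical function names, a sketch rather than the thesis routines):

```c
#include <assert.h>
#include <math.h>

/* Compressive branch: reduce stiffness of an overstressed region so it
   sheds stress toward the allowable level. */
double soften_E(double E, double sigma, double sigma_allow, double xi)
{
    return xi * sigma_allow / fabs(sigma) * E;
}

/* Tensile branch: scale stiffness in proportion to the overstress ratio
   (xi > 1 attracts stress to the region, xi < 1 sheds it). */
double stiffen_E(double E, double sigma, double sigma_allow, double xi)
{
    return xi * fabs(sigma) / sigma_allow * E;
}
```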
printf("iter=%d\n\n",iterations);
if(maxflag==0 && fabs(Xc2-PredefinedXc)<TOL)
{
printf("\n\nFinal_iter is %d\n\n", iterations);
break;
}
for(i=0;i<8;i++)
free(elemIndex[i]);
free(elemIndex);
free(col);
free(elem);
free(below_tensile_stress);
free(above_compressive_stress);
free(coord_above);
free(coord_below);
printf("iterations= %d; maxflag=%d; steel_area=%lf; Xc2=%lf; x_diff= %lf; Maxcomp_stress=%lf\n", iterations, maxflag, steel_area, Xc2, Xc2-PredefinedXc, Maxcomp_stress);
}
printf("iterations= %d; maxflag=%d; steel_area=%lf; Xc2=%lf; x_diff= %lf; Maxcomp_stress=%lf\n", iterations, maxflag, steel_area, Xc2, Xc2-PredefinedXc, Maxcomp_stress);
//free all variables//
for(i=0;i<nNodes;i++)
free(nodeCoord[i]);
free(nodeCoord);
for(i=0;i<nElements;i++)
free(element[i]);
free(element);
for(i=0;i<4;i++)
free(gp[i]);
free(gp);
for(i=0;i<8;i++)
free(Ke[i]);
free(Ke);
for(i=0;i<4;i++)
free(nc[i]);
free(nc);
for(i=0;i<3;i++)
free(D[i]);
free(D);
free(cpu_out);
free(gpu_out);
free(E);
free(sigma1);
free(sigma3);
free(N);
free(f_CSR);
free(LoadPoint);
free(SupIn);
free(SupOut);
free(RightBoundary);
for(i=0;i<nNodes;i++)
free(nodeConn[i]);
free(nodeConn);
free(rowPtr);
free(temp_node);
free(u);
free(flag);
free(flag2);
free(flag3);
free(middle);
free(middle_x);
free(middle_y);
free(middle_xy);
free(coord_x);
free(real_tensile_force);
free(Area_of_Steel);
free(Xc_value2);
free(real_moment2);
return 0;
}
//****************** Main Program of GPU-PCG *****************************//
// using PCG approach to solve Ax = b for x with A in CSR format.
// rowptr : matrix row pointer
// col : matrix column pointer
// elem : matrix values
// size* : size of each vector
// vec : pointer to RHS vector
// x_final : solution (x) is returned here
void gpu_pcg_solve(int* rowptr, int size_findrm, int *col, int size_colm, double* elem, int matrix_val_size, double* vec, int rhs_val_size, double *x_final)
{
clock_t gputime, gpustartingtime, gpuendingtime;
int GPUITER;
int sumGPUtime=0;
for(GPUITER = 0; GPUITER < MAXGPUITER; GPUITER++)
{
gpustartingtime=clock();
// CSR Matrix on the GPU
int *k_findrm, *k_colm;
double *k_val;
// Vectors on the GPU
double *k_b, *k_x, *k_r, *k_d, *k_q, *k_s;
// Diagonal matrix on the GPU (stored as a vector)
double* k_jac;
// Scalars on the GPU
double *k_alpha, *k_snew, *k_beta, *k_sold, *k_s0;
// Scalars on the host
double s0, snew;
int iterations = 0;
// Allocate space on the GPU for the CSR matrix and RHS vector, and copy from host to GPU
cudaMalloc((void**)&k_findrm, sizeof(int)*(size_findrm));
cudaMemcpy(k_findrm, rowptr, sizeof(int)*(size_findrm), cudaMemcpyHostToDevice);
cudaMalloc((void**)&k_colm, sizeof(int)*(size_colm));
cudaMemcpy(k_colm, col, sizeof(int)*(size_colm), cudaMemcpyHostToDevice);
cudaBindTexture(NULL, texture_colm, k_colm, sizeof(int)*(size_colm));
cudaMalloc((void**)&k_val, sizeof(double)*(matrix_val_size));
cudaMemcpy(k_val, elem, sizeof(double)*(matrix_val_size), cudaMemcpyHostToDevice);
cudaMalloc((void**)&k_b, sizeof(double)*(rhs_val_size));
cudaMemcpy(k_b, vec, sizeof(double)*(rhs_val_size), cudaMemcpyHostToDevice);
// Allocate space for vectors on the GPU
cudaMalloc((void**)&k_x, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_r, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_d, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_q, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_s, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_jac, sizeof(double)*(rhs_val_size));
cudaMalloc((void**)&k_alpha, sizeof(double));
cudaMalloc((void**)&mid_temp, sizeof(double)*NUM_BLOCKS);
cudaMalloc((void**)&k_snew, sizeof(double)*NUM_BLOCKS);
cudaMalloc((void**)&k_sold, sizeof(double));
cudaMalloc((void**)&k_beta, sizeof(double));
cudaMalloc((void**)&k_s0, sizeof(double));
// Dimensions of blocks and grid on the GPU
dim3 BlockDim(NUM_THREADS);
dim3 GridDim(NUM_BLOCKS);
// Create diagonal preconditioning matrix (J = 1/diag(M))
create_diag<<<1,BlockDim>>>(rhs_val_size, k_findrm, k_colm, k_val, k_jac);
// Bind the matrix to the texture cache - this was not done earlier as we modified the matrix
cudaBindTexture(NULL, texture_val, k_val, sizeof(double)*(matrix_val_size));
// Initialise result vector (x=0)
veczero<<<1,BlockDim>>>(rhs_val_size, k_x);
// r=b-Ax (r=b since x=0), and d=M^(-1)r
cudaMemcpy(k_r, k_b, sizeof(double)*(rhs_val_size), cudaMemcpyDeviceToDevice);
diag_spmv<<<1,BlockDim>>>(rhs_val_size, k_jac, k_r, k_d);
// s0 = r.d
vecdot(rhs_val_size, k_r, k_d, k_s0);
// snew = s0
scalarassign(k_snew, k_s0);
// Copy snew and s0 back to host so that host can evaluate stopping condition
cudaMemcpy(&snew, k_snew, sizeof(double), cudaMemcpyDeviceToHost);
cudaMemcpy(&s0, k_s0, sizeof(double), cudaMemcpyDeviceToHost);
// While i < imax and snew > epsilon^2*s0
while(( iterations<IMAX) && (snew>(Epsilon*Epsilon*s0)))
{
kernel<<<GridDim,BlockDim>>>(k_findrm, k_d, k_colm, k_val, k_q, rhs_val_size);
// alpha = snew/(d.q)
vecdot(rhs_val_size, k_d, k_q, k_alpha);
scalardiv<<<1,1>>>(k_snew, k_alpha, k_alpha);
// x = x + alpha*d
axpy<<<GridDim,BlockDim>>>(rhs_val_size, k_alpha, k_d, k_x, k_x);
// r = r - alpha*q
ymax<<<GridDim,BlockDim>>>(rhs_val_size, k_alpha, k_q, k_r);
// s = M^(-1)r
diag_spmv<<<GridDim,BlockDim>>>(rhs_val_size, k_jac, k_r, k_s);
// sold = snew
scalarassign(k_sold, k_snew);
// snew = r.s
vecdot(rhs_val_size, k_r, k_s, k_snew);
// beta = snew/sold
scalardiv<<<1,1>>>(k_snew, k_sold, k_beta);
// d = s + beta*d
axpy<<<GridDim,BlockDim>>>(rhs_val_size, k_beta, k_d, k_s, k_d);
// Copy back snew so the host can evaluate the stopping condition
cudaMemcpy(&snew, k_snew, sizeof(double), cudaMemcpyDeviceToHost);
iterations++;
}
// Copy result vector back from GPU
cudaMemcpy(x_final, k_x, sizeof(double)*(rhs_val_size), cudaMemcpyDeviceToHost);
//ThreadSynchronize
cudaThreadSynchronize();
// free memory
cudaUnbindTexture( texture_colm);
cudaUnbindTexture( texture_val);
cudaFree(k_findrm);
cudaFree(k_colm);
cudaFree(k_val);
cudaFree(k_b);
cudaFree(k_x);
cudaFree(k_r);
cudaFree(k_d);
cudaFree(k_q);
cudaFree(k_jac);
cudaFree(k_alpha);
cudaFree(k_snew);
cudaFree(k_sold);
cudaFree(k_beta);
cudaFree(k_s0);
cudaFree(mid_temp);
gpuendingtime=clock();
gputime=gpuendingtime-gpustartingtime;
sumGPUtime=sumGPUtime+gputime;
}
//printf(" \nSolving time on GPU for %d steps is %d ms\n\n", GPUITER, sumGPUtime/GPUITER);
}
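create_diag on the GPU and cpu_creat_jac on the CPU are both assumed to build the Jacobi preconditioner jac[i] = 1/A[i][i] from the CSR arrays, which the diag_spmv kernels then apply as a pointwise product. A serial sketch under that assumption (hypothetical function name):

```c
#include <assert.h>

/* Jacobi preconditioner from CSR storage: jac[i] = 1/A[i][i].
   Scans row i of (rowPtr, col, elem) for the diagonal entry. */
void csr_jacobi(int n, const int *rowPtr, const int *col, const double *elem,
                double *jac)
{
    for (int i = 0; i < n; i++)
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++)
            if (col[k] == i)
                jac[i] = 1.0 / elem[k];
}
```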
//******************* Main Program of CPU-PCG *********************************//
// Host implementation of the conjugate gradient method to solve Ax = b for x, with A in CSR format //
void cpu_pcg_solve( int rhs_val_size, int *k_findrm, int *k_colm, double *k_val, double *k_b, double *k_x)
{
int CPUITER;
int sumCPUtime=0;
clock_t cputime, startingtime, endingtime;
for(CPUITER = 0; CPUITER < MAXCPUITER; CPUITER++)
{
startingtime=clock();
int iterations=0;
double k_s0, k_snew, k_alpha, k_sold, k_beta;
double *k_jac = (double *)malloc(rhs_val_size * sizeof(double));
double *k_r = (double *)malloc(rhs_val_size * sizeof(double));
double *k_d = (double *)malloc(rhs_val_size * sizeof(double));
double *k_q = (double *)malloc(rhs_val_size * sizeof(double));
double *tmp = (double *)malloc(rhs_val_size * sizeof(double));
double *k_s = (double *)malloc(rhs_val_size * sizeof(double));
//create the diagonal preconditioning matrix (J=1/diag(M))
cpu_creat_jac(rhs_val_size,k_findrm,k_colm,k_val,k_jac);
//initialize result vector (x=0)
vector_zero(rhs_val_size,k_x);
//r=b-Ax (r=b since x=0),and d=M^(-1)r
memcpy(k_r,k_b,rhs_val_size*sizeof(double));
cpu_diag_spmv(rhs_val_size,k_jac,k_r, k_d);
//s0=r.d
k_s0=vec_dot_vec(rhs_val_size,k_r,k_d);
//snew=s0
k_snew=k_s0;
vector_zero(rhs_val_size,k_q);
while(( iterations<IMAX) && (k_snew>(Epsilon*Epsilon*k_s0)))
{
vector_zero(rhs_val_size,k_q);
spmv_csr_serial(rhs_val_size, k_findrm, k_colm,k_val,k_d,k_q);
k_alpha=vec_dot_vec(rhs_val_size,k_d,k_q);
k_alpha=k_snew/k_alpha;
sca_mul_vec(rhs_val_size,k_alpha,k_d,tmp);
vec_add(rhs_val_size,k_x,tmp);
//r=r-alpha*q
sca_mul_vec(rhs_val_size,k_alpha,k_q,tmp);
vec_sub(rhs_val_size,k_r,tmp);
cpu_diag_spmv(rhs_val_size,k_jac,k_r, k_s);
k_sold=k_snew;
//snew=r.s
k_snew=vec_dot_vec(rhs_val_size,k_r,k_s);
//beta=snew/sold
k_beta=k_snew/k_sold;
//d=s+beta*d
sca_mul_vec(rhs_val_size,k_beta,k_d,tmp);
vec_add_vec(rhs_val_size,k_s,tmp,k_d);
iterations++;
}
free(k_jac);
free(k_r);
free(k_d);
free(k_q);
free(tmp);
free(k_s);
endingtime=clock();
cputime=endingtime-startingtime;
sumCPUtime=sumCPUtime+cputime;
}
//printf(" \nSolving time on CPU for %d steps is %d ms\n\n", CPUITER, sumCPUtime/CPUITER);
}